JOURNALS // Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia // Archive

Dokl. RAN. Math. Inf. Proc. Upr., 2025 Volume 527, Pages 68–83 (Mi danma668)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Pairwise image matching for plagiarism detection

D. D. Dorin (a, b), K. D. Varlamova (a, b), A. V. Grabovoy (a, b)

a Antiplagiat Company, Moscow
b Moscow Institute of Physics and Technology, Moscow, Russia

Abstract: Plagiarism detection is a critical task across various fields, including academic publishing, journalism, e-commerce, and media verification. While substantial attention focuses on identifying textual plagiarism, image plagiarism remains a significant concern, particularly in biology and medicine. Automated retrieval systems often surface numerous potential candidates, but a high rate of false positives (pairs incorrectly flagged as plagiarism) necessitates highly accurate pairwise matching for verification. Manual alterations to images, such as rotation, mirroring, conversion to grayscale, and color distortion, constitute forms of plagiarism. This work addresses the need to minimize the false positive rate (FPR) in pairwise image plagiarism detection through rigorous analysis of similarity scoring models. The proposed approach employs a siamese network with three key components: a weight-shared encoder, a symmetric fusion module with order-invariant embedding combination, and a similarity classification head. Training employs a hybrid self-supervised strategy with plagiarism-mimicking augmentations, combining cross-entropy loss and contrastive regularization. Ablation studies evaluate encoder architectures and fusion strategies. For comparison, identical siamese architectures use frozen state-of-the-art self-supervised representations (Barlow Twins and CLIP), with fusion modules and classification heads trained identically. Experimental validation across multi-domain images demonstrates that end-to-end trained models consistently outperform approaches using frozen state-of-the-art representations.
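
Illustrative note: the abstract does not say which operation the symmetric fusion module uses. The sketch below assumes one common order-invariant choice, concatenating the element-wise absolute difference and the element-wise product of the two embeddings, followed by a hypothetical linear head with a sigmoid. The function names, weights, and embedding values are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def fuse_symmetric(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Order-invariant fusion (assumed form): [|u - v|, u * v] element-wise.

    Both parts are unchanged when u and v are swapped, so the fused
    vector does not depend on the order of the image pair.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    product = [a * b for a, b in zip(u, v)]
    return abs_diff + product


def similarity_score(
    u: Sequence[float],
    v: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Hypothetical classification head: linear layer plus sigmoid."""
    fused = fuse_symmetric(u, v)
    if len(weights) != len(fused):
        raise ValueError("weights must match the fused vector length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy embeddings standing in for the output of a shared encoder.
emb_a = [0.5, -1.0, 2.0]
emb_b = [1.5, 0.25, -2.0]
head_weights = [0.1] * 6  # placeholder values, not trained parameters

score_ab = similarity_score(emb_a, emb_b, head_weights)
score_ba = similarity_score(emb_b, emb_a, head_weights)
```

Because the fusion is symmetric, `score_ab` and `score_ba` are identical, so the verdict for a pair does not depend on which image is treated as the query.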

Keywords: image plagiarism detection, near-duplicate image, image matching, contrastive learning, siamese networks, image similarity

UDC: 004.9

Received: 18.08.2025
Accepted: 15.09.2025

DOI: 10.7868/S2686954325070069





© Steklov Math. Inst. of RAS, 2025