
Informatika i ee Primeneniya (Informatics and Applications), 2021, volume 15, issue 1, pages 30–41 (Mi ia709)

Methods for detecting cross-lingual text reuse in large text collections
R. V. Kuznetsova, O. Yu. Bakhteev, Yu. V. Chekhovich

References

1. Nikitov A. V., Orchakov O. A., Chekhovich Y. V., “Plagiarism in papers of students and graduate students: The problem and methods of counteraction”, University Management: Practice and Analysis, 5 (2012), 61–68 (in Russian)
2. Khritankov A., Botov P., Surovenko N., Tsarkov S., Viuchnov D., Chekhovich Y., “Discovering text reuse in large collections of documents: A study of theses in history sciences”, Artificial Intelligence and Natural Language & Information Extraction, Social Media and Web Search FRUCT Conference, IEEE, 2015, 26–32
3. Zelenkov I. V., Segalovich I. V., “Comparative analysis of methods for determining fuzzy duplicates for Web-documents”, Digital Libraries: Advanced Methods and Technologies, Electronic Collections, Proceedings of the 9th All-Russian Scientific Conference RCDL, Pereslavl-Zalessky University, Pereslavl-Zalessky, 2007, 166–174 (in Russian)
4. Franco-Salvador M., Gupta P., Rosso P., “Cross-language plagiarism detection using a multilingual semantic network”, European Conference on Information Retrieval, Lecture notes in computer science ser., 7814, eds. P. Serdyukov, P. Braslavski, S. O. Kuznetsov, et al., Springer, Berlin–Heidelberg, 2013, 710–713
5. Franco-Salvador M., Gupta P., Rosso P., Banchs R., “Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language”, Knowl.-Based Syst., 111 (2016), 87–99
6. Grman J., Ravas R., “Improved implementation for finding text similarities in large collections of data”, Notebook papers of CLEF Labs and Workshops, CEUR Workshop Proceedings, 1177, eds. V. Petras, P. Forner, P. D. Clough, Amsterdam, The Netherlands, 2011, 6 pp. http://ceur-ws.org/Vol-1177/CLEF2011wn-PAN-GrmanEt2011.pdf (accessed January 18, 2021)
7. Grozea C., Popescu M., “The encoplot similarity measure for automatic detection of plagiarism”, Notebook papers of CLEF Labs and Workshops, CEUR Workshop Proceedings, 1177, eds. V. Petras, P. Forner, P. D. Clough, Amsterdam, The Netherlands, 2011 http://ceur-ws.org/Vol-1177/CLEF2011wn-PAN-GrozeaEt2011.pdf (accessed January 18, 2021)
8. Muhr M., Kern R., Zechner M., Granitzer M., “External and intrinsic plagiarism detection using a cross-lingual retrieval and segmentation system”, Notebook papers of CLEF Labs and Workshops (Padua, Italy, 2010), CEUR Workshop Proceedings, 1176, eds. M. Braschler, D. Harman, E. Pianta, 2010 http://ceur-ws.org/Vol-1176/CLEF2010wn-PAN-MuhrEt2010.pdf (accessed January 18, 2021)
9. Bakhteev O., Kuznetsova R., Romanov A., Khritankov A., “A monolingual approach to detection of text reuse in Russian–English collection”, Artificial Intelligence and Natural Language & Information Extraction, Social Media and Web Search FRUCT Conference, IEEE, 2015, 3–10
10. Koehn P., Hoang H., Birch A., et al., “Moses: Open source toolkit for statistical machine translation”, 45th Annual Meeting of the Association for Computational Linguistics, Companion Volume: Proceedings of the Demo and Poster Sessions, ACL, 2007, 177–180
11. Tai K., Socher R., Manning C., “Improved semantic representations from tree-structured long short-term memory networks”, 53rd Annual Meeting of the Association for Computational Linguistics and the 7th Joint Conference (International) on Natural Language Processing Proceedings, v. 1, ACL, 2015, 1556–1566
12. Wieting J., Bansal M., Gimpel K., Livescu K., Towards universal paraphrastic sentence embeddings, 2015, arXiv: 1511.08198 (accessed January 18, 2021)
13. Iyyer M., Manjunatha V., Boyd-Graber J., Daumé H., “Deep unordered composition rivals syntactic methods for text classification”, 53rd Annual Meeting of the Association for Computational Linguistics and the 7th Joint Conference (International) on Natural Language Processing Proceedings, v. 1, ACL, 2015, 1681–1691
14. Kuznetsova R., Bakhteev O., Ogaltsov A., “Variational learning across domains with triplet information”, 3rd Workshop on Bayesian Deep Learning (Montreal, Canada, 2018) http://bayesiandeeplearning.org/2018/papers/65.pdf (accessed January 18, 2021)
15. Wang J., Shen H., Song J., Ji J., Hashing for similarity search: A survey, 2014, 29 pp., arXiv: 1408.2927 [cs.DS] (accessed January 18, 2021)
16. Alain G., Bengio Y., “What regularized auto-encoders learn from the data-generating distribution”, J. Mach. Learn. Res., 15:1 (2014), 3563–3593
17. Jenssen M., Joos F., Perkins W., “On kissing numbers and spherical codes in high dimensions”, Adv. Math., 335 (2018), 307–321
18. Cybenko G., “Approximation by superpositions of a sigmoidal function”, Math. Control Signals Systems, 2:4 (1989), 303–314
19. Synthetic dataset for the cross-lingual text reuse detection problem, https://tiny.cc/cl_ru_en (accessed January 18, 2021)
20. Bojanowski P., Grave E., Joulin A., Mikolov T., “Enriching word vectors with subword information”, Transactions of the Association for Computational Linguistics, 5 (2017), 135–146
21. Chung J., Gulcehre C., Cho K., Bengio Y., Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, 9 pp., arXiv: 1412.3555 (accessed January 18, 2021)
22. Tiedemann J., “News from OPUS — a collection of multilingual parallel corpora with tools and interfaces”, Advances in natural language processing, v. 5, John Benjamins, Amsterdam–Philadelphia, 2009, 237–248

