Informatics and Automation, 2021, Volume 20, Issue 3, Pages 623–653 (Mi trspy1155)

This article is cited in 2 papers

Artificial Intelligence, Knowledge and Data Engineering

Efficient natural language classification algorithm for detecting duplicate unsupervised features

S. Altaf (a), S. Iqbal (b), M. Soomro (c)

a Pir Mehr Ali Shah Arid Agriculture University
b Pakistan Space and Upper Atmosphere Research Commission (SUPARCO), Pakistan
c Manukau Institute of Technology

Abstract: This paper focuses on capturing the meaning of text with Natural Language Understanding (NLU) features in order to detect duplicate unsupervised features. The NLU features are compared with lexical approaches to determine which classification technique is more suitable. A transfer-learning approach is used to train the feature extractors on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets: Bosch bug reports and Wikipedia articles. The study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR).
The main contribution of this paper is a comparative study of semantic similarity measures. The experimental results report the performance of the Term Frequency–Inverse Document Frequency (TF-IDF) feature on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence and thereby improve classification.

Keywords: clustering, information retrieval, TF-IDF feature, Par2Vec, natural language texts, lexical approaches.

UDC: 006.72

Language: English

DOI: 10.15622/ia.2021.3.5


