Abstract:
Named entity recognition (NER) aims to extract important information from unstructured data in the form of natural-language texts. In this paper, we investigate the effectiveness of a modern multi-task NER approach on Russian corpora by employing several different NER datasets and a part-of-speech (POS) tagging dataset. We apply a state-of-the-art neural architecture based on bidirectional LSTMs and conditional random fields, with convolutional neural networks used to learn character-level features. We carry out an extensive experimental evaluation on three standard datasets of Russian-language news. The proposed multi-task model achieves state-of-the-art results, with an F1 score of 88.04% on Gareev's dataset and 99.49% on the Person-1000 dataset.
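The architecture above places a conditional random field on top of a bidirectional LSTM. The following is a minimal Viterbi decoding sketch that illustrates what the CRF layer does at inference time. It assumes the BiLSTM has already produced per-token emission scores and that tag-transition scores are given. The function name `viterbi_decode`, the three-tag BIO tag set, and all numeric scores are hypothetical and are not taken from the paper or its implementation.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (hypothetically from a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    # Best score of any partial path ending in each tag at the current token.
    scores = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag i that maximises score[i] + transition(i -> j).
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    best_last = max(range(n_tags), key=lambda j: scores[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]


# Hypothetical BIO tag set and scores for a three-token sentence.
TAGS = ["O", "B-PER", "I-PER"]
transitions = [
    [0.0, 0.0, -10.0],  # from O: O -> I-PER is strongly penalised
    [0.0, 0.0, 1.0],    # from B-PER
    [0.0, 0.0, 1.0],    # from I-PER
]
emissions = [
    [0.1, 2.0, 0.0],
    [0.5, 0.0, 1.0],
    [2.0, 0.0, 0.0],
]
path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], score)  # ['B-PER', 'I-PER', 'O'] 6.0
```

The transition matrix is what lets the CRF rule out invalid label sequences such as `O` followed by `I-PER`. A per-token softmax over the BiLSTM outputs would not capture these dependencies between neighbouring tags.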
Key words and phrases: named entity recognition, NER, LSTM, CRF, multi-task learning.