Abstract:
We suggest a new language-independent architecture of robust word vectors (RoVe). It is designed to alleviate the issue of typos and misspellings, common in almost any user-generated content, which hinder automatic text processing. Our model is morphologically motivated, which allows it to deal with unseen word forms in morphologically rich languages. We present the results on a number of natural language processing (NLP) tasks and languages for a variety of related architectures and show that the proposed architecture is robust to typos.
Key words and phrases:word vectors, distributed representations, natural language processing.