Abstract:
We address the problem of natural language identification in short texts using a Bayesian classifier. We propose an extension of the language identification model that incorporates additional Cyrillic-script languages of the small nations of Russia.
Keywords: statistical language model, natural language identification, languages of the small nations of Russia.