Associative portraits of subject areas as a tool for automated construction of big data systems for knowledge extraction: theory, methods, visualization, and application
Abstract:
The paper presents a technique for developing knowledge extraction systems based on the automated formation of an associative portrait of a subject area (APSA) and the construction of a semantic context space (SCS). The APSA approach rests on the distributional hypothesis, which states that semantically equivalent (or related) lexemes occur in similar contexts and, conversely, that lexemes occurring in similar contexts are semantically close. The model uses an extended form of this hypothesis: it examines similarities and differences in the contexts not only of individual words but also of arbitrary multi-lexeme fragments of meaningful word combinations. Examples of implemented projects for different subject domains are given.
Keywords: semantic modeling; associations; mathematical statistics; distributional semantics; big data; automated extraction of knowledge; digital natural language text corpora; semantic search; intelligent Internet technology.
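
The extended distributional hypothesis stated in the abstract can be illustrated with a minimal sketch. It is not the APSA or SCS construction itself, whose details the abstract does not give. It assumes raw co-occurrence counts in a fixed window as the context representation and cosine similarity as the closeness measure; the function names `context_vector` and `cosine`, the window size, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, fragment, window=2):
    """Count the words found within `window` tokens to the left and right
    of every occurrence of a multi-lexeme `fragment` (a tuple of words)."""
    fragment = tuple(fragment)
    n = len(fragment)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == fragment:
            left = tokens[max(0, i - window):i]
            right = tokens[i + n:i + n + window]
            counts.update(left + right)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (assumption): two fragments share contexts, a third does not.
corpus = (
    "the neural network learns from large text corpora . "
    "the language model learns from large text corpora . "
    "the neural network predicts the next word . "
    "the language model predicts the next word . "
    "the river bank floods after heavy rain ."
).split()

nn = context_vector(corpus, ("neural", "network"))
lm = context_vector(corpus, ("language", "model"))
rb = context_vector(corpus, ("river", "bank"))

sim_related = cosine(nn, lm)    # fragments used in near-identical contexts
sim_unrelated = cosine(nn, rb)  # fragments sharing only function words

print(sim_related > sim_unrelated)
```

In this sketch the two fragments that appear in similar contexts receive a higher similarity than the pair that shares only function words and punctuation. A practical system would presumably weight or filter such frequent tokens, which is omitted here for brevity.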