Abstract:
The paper describes a method for extracting two-word domain
terms by combining several of their features. The features are computed
from three sources: the word usage statistics in a domain-specific text collection,
the statistics of global search engines, and a domain-specific thesaurus.
The approach is evaluated against the terminology
of the Ontology on natural sciences and technology. We show
that the use of multiple features considerably improves the
automatic extraction of domain-specific terms.
Keywords: knowledge acquisition; term extraction; thesaurus; machine learning; search engine; Internet.