Abstract:
This paper analyzes semantic relatedness measures between Wikipedia articles and their applications in text processing and information retrieval tasks. It formulates the computational efficiency requirements that a semantic relatedness measure must satisfy to be useful in practical systems. Two distinct computational problems that arise when using semantic relatedness measures are identified: computing the semantic relatedness between a pair of articles, and ranking all articles with respect to a query article. For a class of semantic relatedness measures, heuristic methods are presented that solve both problems efficiently. Experiments were conducted to validate the proposed approach. Applications of the proposed measure and techniques are presented in the context of the Texterra system.
Keywords: semantic relatedness, Wikipedia, natural language processing, information retrieval.