Abstract:
Social media posts reflect the state of financial markets, and their analysis is a powerful tool for trading.
This article presents the results of a study of the impact of social media activity on financial market
movements. The most authoritative influencers are selected, and Twitter posts are used as data. Such texts
usually include slang and abbreviations, so methods for preprocessing the raw text data, including Stanza
and regular expressions, are presented. Two approaches to representing a point in time as text data are
considered. The difference between the influence of a single tweet and that of a whole batch of tweets
collected over a certain period of time is investigated. A statistical approach in the form of frequency analysis
is also considered, and metrics are introduced that measure the significance of a particular word in identifying
the relationship between price changes and Twitter posts. Frequency analysis examines the occurrence
distributions of words and bigrams in the text for positive, negative, or general trends. To build the
labels, market changes are converted into a binary vector using various parameters, which frames the problem as
binary classification. The parameters for Binance candlesticks are searched over to better describe the
movement of the cryptocurrency market, and their variability is also explored in this article. Sentiment is studied
using Stanford CoreNLP. The results of the statistical analysis are relevant to feature selection for subsequent
binary or multiclass classification tasks. The presented text analysis methods help increase the accuracy of
models for natural language processing problems by selecting words and improving the quality of
vectorization. Such algorithms are often used in automated trading strategies to predict the price of an asset
or the trend of its movement.
Keywords: text analysis, natural language processing, Twitter activity, frequency analysis, feature
selection, classification problem, financial markets.
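
As a rough illustration of the labeling and frequency-analysis steps summarized in the abstract, the following minimal Python sketch converts a series of closing prices into a binary vector and counts words and bigrams separately for positive and negative labels. The function names, the `horizon` and `threshold` parameters, the tokenizer, and the `word_significance` score are all assumptions made for this example; they are not the metrics or parameters defined in the article.

```python
import re
from collections import Counter


def label_candles(closes, horizon=1, threshold=0.0):
    """Binary labels: 1 if the relative change of the close price over
    `horizon` candles exceeds `threshold`, else 0 (assumed labeling rule)."""
    labels = []
    for i in range(len(closes) - horizon):
        change = (closes[i + horizon] - closes[i]) / closes[i]
        labels.append(1 if change > threshold else 0)
    return labels


def tokenize(text):
    """Very simple lowercase tokenizer (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(texts, labels):
    """Count words and bigrams separately for positive and negative labels."""
    pos, neg = Counter(), Counter()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
        (pos if label == 1 else neg).update(tokens + bigrams)
    return pos, neg


def word_significance(pos, neg, term):
    """Hypothetical score: share of a term's occurrences that fall in
    positively labeled texts; None if the term never occurs."""
    total = pos[term] + neg[term]
    return pos[term] / total if total else None


closes = [100.0, 102.0, 101.0, 101.5]          # made-up close prices
tweets = ["BTC to the moon", "selling btc now", "btc dump"]  # made-up texts
labels = label_candles(closes, horizon=1, threshold=0.01)
pos, neg = ngram_counts(tweets, labels)
print(labels, word_significance(pos, neg, "btc"))
```

Terms whose score lies far from the overall share of positive labels would be candidates for feature selection; in practice a minimum occurrence count would also be needed so that rare terms do not dominate.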