Initial analysis on the Russo-Ukrainian War corpus on Twitter

Twitter is one of the most popular social networks, attracting millions of users, including malicious entities. In this study, we examine a recent dataset on the Russo-Ukrainian war in order to analyze malicious activity and the suspension of users from Twitter. The aim of this work is to investigate online malicious activity on Twitter, as part of our goal in CONCORDIA Task T1.5: User-Centric Security.

On 24 February 2022, Russia invaded Ukraine. The conflict, now known as the Russo-Ukrainian War, gained the attention of international public opinion and sparked public conversations on social media. Twitter is one of the most popular online social networks and one of the main channels of communication and dissemination in the online world, and it has been used in the past to analyze political crises. Considering this, we have initiated an ongoing dataset acquisition from the Twitter API. So far, the dataset has reached 57.3 million tweets, originating from 7.7 million users [1].

We apply an initial volume and sentiment analysis. The dataset can also support further exploratory investigation, such as topic and sentiment analysis, in order to reveal the main trends and topics discussed in this online discourse, follow the tendencies of users, discover malicious entities and identify suspended users. The analysis shown here is updated on a daily basis and can be found here: https://alexdrk14.github.io/RussiaUkraineWar/.


Figure 1: Daily volume and activity of registered users.

In figure 1 we present the volume of tweets per day, while in figure 2 we show the volume of suspended or deactivated accounts per day. We notice increased activity in the first days of the attack, while the number of suspended accounts keeps increasing. Figure 3 shows the daily volume of traffic by tweet text language. We present only the ten most popular languages in the collected dataset. The majority of tweets are written in English.

Figure 2: Daily volume of suspended/deactivated accounts.
Figure 3: Daily volume of traffic based on text language.
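
The per-day, per-language counts behind figures 1 and 3 could be computed along the following lines. This is only a minimal sketch: the record layout (a dict with an ISO-formatted `created_at` string and a `lang` code) and the function name `daily_language_volume` are assumptions made for illustration, not the pipeline actually used for the dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_language_volume(tweets, top_n=10):
    """Count tweets per day and language, keeping the top_n languages overall.

    Each tweet is assumed (hypothetically) to be a dict with an ISO 8601
    'created_at' timestamp and a 'lang' code.
    """
    # Rank languages by total volume over the whole collection.
    totals = Counter(t["lang"] for t in tweets)
    top_langs = {lang for lang, _ in totals.most_common(top_n)}

    per_day = defaultdict(Counter)
    for t in tweets:
        if t["lang"] in top_langs:
            day = datetime.fromisoformat(t["created_at"]).date().isoformat()
            per_day[day][t["lang"]] += 1

    # Return plain dicts, ordered by day.
    return {day: dict(counts) for day, counts in sorted(per_day.items())}


sample = [
    {"created_at": "2022-02-24T08:15:00", "lang": "en"},
    {"created_at": "2022-02-24T09:30:00", "lang": "en"},
    {"created_at": "2022-02-24T10:00:00", "lang": "fr"},
    {"created_at": "2022-02-25T11:45:00", "lang": "en"},
]
volume = daily_language_volume(sample)
print(volume)
```

Summing each day's counts over all languages gives a total daily tweet volume of the kind plotted in figure 1.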

Figure 4 presents the daily volume of hashtag (HT) traffic, including only the ten most popular hashtags in the collected dataset. The most popular hashtag is #Ukraine.

Figure 4: Daily volume of HTs traffic.
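
A hashtag ranking of this kind can be sketched with a regular expression and a counter. The snippet below is illustrative only: the function name `top_hashtags`, the choice to match hashtags case-insensitively, and the choice to count each hashtag at most once per tweet are assumptions, and the original analysis may have handled these details differently.

```python
import re
from collections import Counter

# '#' followed by word characters (simplified; real hashtag rules are richer).
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(texts, n=10):
    """Return the n most frequent hashtags as (hashtag, tweet_count) pairs."""
    counts = Counter()
    for text in texts:
        # Assumption: case-insensitive matching, one count per tweet.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        counts.update(tags)
    return counts.most_common(n)


texts = [
    "#Ukraine needs help #StandWithUkraine",
    "News from #Kyiv #ukraine",
    "#Ukraine #Ukraine again",
]
print(top_hashtags(texts, n=1))  # → [('ukraine', 3)]
```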

Finally, in figure 5 we show the total volume of tweets by text language, this time including all the languages in the collected dataset. Consistent with figure 3, the most popular language in tweets is English.

Figure 5: Total tweets’ volume based on text language.

Additionally, we apply sentiment analysis using VADER (Hutto and Gilbert, 2014). The results are shown in the figures below.
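
As a rough illustration of how daily sentiment curves could be derived, the sketch below averages the positive and negative VADER scores per day over tweets that mention a given entity. Several things here are assumptions rather than the method used in this study: selecting tweets by simple keyword matching, averaging the raw `pos` and `neg` scores, the `(day, text)` input layout, and the helper names `make_vader_scorer` and `daily_sentiment`. The VADER scorer relies on the third-party `vaderSentiment` package and is imported lazily, so the aggregation can also be run with any other scoring function.

```python
from collections import defaultdict


def make_vader_scorer():
    """Return a text -> score-dict function backed by VADER.

    Requires the third-party 'vaderSentiment' package; imported lazily.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', 'compound'.
    return analyzer.polarity_scores


def daily_sentiment(tweets, score_fn, keywords):
    """Average 'pos' and 'neg' scores per day for tweets mentioning a keyword.

    tweets   : iterable of (day, text) pairs (hypothetical layout)
    score_fn : function mapping text to a dict with 'pos' and 'neg' keys
    keywords : lowercase substrings identifying the entity of interest
    """
    sums = defaultdict(lambda: {"pos": 0.0, "neg": 0.0, "n": 0})
    for day, text in tweets:
        lowered = text.lower()
        if any(k in lowered for k in keywords):
            scores = score_fn(text)
            bucket = sums[day]
            bucket["pos"] += scores["pos"]
            bucket["neg"] += scores["neg"]
            bucket["n"] += 1
    return {
        day: {"pos": b["pos"] / b["n"], "neg": b["neg"] / b["n"]}
        for day, b in sorted(sums.items())
    }
```

With the package installed, a call such as `daily_sentiment(tweets, make_vader_scorer(), ("ukraine",))` would yield one positive and one negative average per day for tweets mentioning Ukraine; repeating it with other keywords gives the series for the other entity.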

Figure 6: Daily positive sentiment between Ukraine and Russia.

Specifically, in figure 6 we show the daily positive sentiment for Ukraine and Russia, with higher values representing greater support from Twitter users.

Figure 7 shows the daily negative sentiment for Ukraine and Russia, with lower values corresponding to stronger disagreement (sadness, rage, etc.) among Twitter users.

Figure 7: Daily negative sentiment between Ukraine and Russia.

Finally, figures 8 and 9 plot the positive and negative sentiment, respectively, for the presidents of Ukraine and Russia. As before, higher values in the positive plot indicate greater support from Twitter users, while lower values in the negative plot indicate stronger disagreement (sadness, rage, etc.).

Figure 8: Daily positive sentiment between Ukraine and Russia Presidents.
Figure 9: Daily negative sentiment between Ukraine and Russia Presidents.
Hashtag              # of tweets
#Ukraine              28,578,739
#Russia                9,070,451
#StandWithUkraine      6,826,617
#Putin                 4,851,536
#UkraineRussiaWar      4,007,785
#StopRussia            2,346,969
#StopPutin             2,332,136
#StopWar               1,877,518
#Kyiv                  1,777,401
#NATO                  1,686,092
Table 1: Ten most popular hashtags in our dataset

In table 1 we show the total number of tweets containing each hashtag, for the ten most popular hashtags in our dataset, while in table 2 we show the total number of tweets for each language, for the ten most popular languages in our dataset.

Language             # of tweets
English               35,007,332
Unclear/Mix            3,883,865
French                 3,821,685
German                 2,333,747
Spanish                1,606,082
Italian                1,541,243
Japanese               1,353,848
Hindi                  1,076,816
Thai                   1,047,481
Polish                 1,012,907
Table 2: Ten most popular text languages in our dataset

References:

  1. The dataset is available on GitHub at https://github.com/alexdrk14/RussoUkrainianWar_Dataset; due to privacy restrictions applied by the Twitter API, we provide only the tweet IDs.

(By Despoina Antonakaki, Foundation for Research and Technology – Hellas (FORTH), Greece)