Events and Wikipedia traffic: the case of Terrorism
Data visualisation lab final project, 2016
The purpose of this project is to understand the motivations behind different patterns of access to Wikipedia pages, primarily through the analysis of traffic data. We pay particular attention to substantial variations in the pattern of page views; in particular, we study how sudden increases in traffic correlate with external events, such as news. The objective of our work is to analyze how large deviations from the ‘normal’ traffic of a page can generate and influence traffic to secondary pages.
The dataset used for the introductory section of this work is the Global Terrorism Database (GTD). The GTD collects data on terrorist incidents that occurred between 1970 and December 2014, with the latest update dated July 2015. The database is managed by the National Consortium for the Study of Terrorism and Responses to Terrorism.
Since the data for 2015 will not be published until August 2016, the dataset had to be supplemented with data on the terrorist incidents of that year. These data were taken from two Wikipedia pages dedicated to such incidents; however, this source is much less detailed than the GTD, as can be seen in the visualization.
The dataset was cleaned to retain only the terrorist attacks that took place between 2000 and 2015. The CartoDB platform was used to create the visualization.
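The cleaning step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the project: the column name `iyear` is assumed to be the year field of the export, the function `filter_by_year` is a hypothetical helper, and the sample rows are made up rather than taken from the GTD.

```python
import csv
import io


def filter_by_year(rows, first_year=2000, last_year=2015, year_field="iyear"):
    """Keep only the rows whose year falls within [first_year, last_year]."""
    kept = []
    for row in rows:
        try:
            year = int(row[year_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or malformed year
        if first_year <= year <= last_year:
            kept.append(row)
    return kept


# Tiny made-up sample standing in for the real export (not actual GTD records).
SAMPLE_CSV = """iyear,country_txt
1998,Exampleland
2000,Exampleland
2007,Otherland
2015,Otherland
2016,Exampleland
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
recent = filter_by_year(sample_rows)
print([row["iyear"] for row in recent])  # ['2000', '2007', '2015']
```

Rows with a missing or unparseable year are dropped rather than guessed, which keeps the filtered set conservative.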
The second section of this work results in two distinct visualizations.
First, we used Seealsology, which returns a dataset of pages related to a main page, or “seed”. The aim was to identify the semantic network that develops from the Wikipedia page “Terrorism”, analyzing both direct (first-level) and indirect (second-level) connections.
The resulting dataset was processed in Gephi, and the result is shown in the second visualization.
Subsequently, we selected the five pages with the highest degree, i.e. those with the most links connected to them, in order to compare the variations in their pattern of views during November and December 2015 with those of the source page. Data were collected with Wikipedia Article Traffic Statistics. The resulting datasets were then processed with Tableau Public. The result is shown in the third and last visualization.
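The degree-based selection can be illustrated with a minimal sketch. It assumes the network is available as a list of (source, target) link pairs; the function `top_pages_by_degree` and the page names below are hypothetical placeholders, and in the project itself the degree was obtained through Gephi.

```python
from collections import Counter


def top_pages_by_degree(edges, seed, k=5):
    """Rank pages by degree (number of links touching them), excluding the seed."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    degree.pop(seed, None)  # the seed page is compared separately
    # Sort by degree (descending), then by name so that ties are deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up edge list with placeholder page names (not the real network).
edges = [
    ("Terrorism", "Page A"),
    ("Terrorism", "Page B"),
    ("Terrorism", "Page C"),
    ("Page A", "Page B"),
    ("Page A", "Page D"),
    ("Page B", "Page D"),
    ("Page A", "Page E"),
]

print(top_pages_by_degree(edges, seed="Terrorism", k=3))
# [('Page A', 4), ('Page B', 3), ('Page D', 2)]
```

Here the degree counts incoming and outgoing links together; a different choice (for example in-degree only) would produce a different ranking.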
The first visualization gives an overview of how terrorist attacks have evolved over time in the new millennium, up to December 2015.
The second visualization shows the network of terrorism-related Wikipedia pages; the layout and the color palette highlight the structure of the semantic network.