
Events and Wikipedia traffic: the case of Terrorism

Data visualisation lab final project, 2016

The project

The purpose of this project is to understand the motivations behind the different patterns of access to Wikipedia pages, mainly through traffic analysis. We pay particular attention to marked variations in the pattern of page views; in particular, we study how sudden increases in traffic correlate with external events, such as news. Our objective is to analyze how large deviations from the ‘normal’ traffic of a page can generate and influence traffic to secondary pages.

Process

The dataset used for the introductory section of this work is the Global Terrorism Database (GTD). The GTD collects data on terrorist incidents that occurred between the 1970s and December 2014, with the latest update dated July 2015. The database is managed by the National Consortium for the Study of Terrorism and Responses to Terrorism.

Because the data for 2015 will only be published in August 2016, we had to supplement the dataset with data on the terrorist incidents of the past year. We took these data from two Wikipedia pages dedicated to them; this source, however, is much less detailed than the GTD, as the visualization shows.

We cleaned the dataset, keeping only the terrorist attacks that took place between 2000 and 2015. The visualization was created with the CartoDB platform.
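
The year filter described above can be pictured with a short sketch. It is only an illustration under stated assumptions, not the procedure actually followed for the project: the column name iyear, the sample rows and the helper filter_by_year are all hypothetical.

```python
import csv
import io

# Hypothetical sample in a GTD-like layout. The column names and the rows
# are illustrative assumptions, not values taken from the real database.
SAMPLE_CSV = """iyear,country,city
1998,Exampleland,A
2000,Exampleland,B
2007,Otherland,C
2015,Otherland,D
2016,Exampleland,E
"""


def filter_by_year(rows, first_year=2000, last_year=2015):
    """Keep only the incidents whose year lies in [first_year, last_year]."""
    return [row for row in rows if first_year <= int(row["iyear"]) <= last_year]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_by_year(rows)
print([row["city"] for row in kept])
```

With a real export, the same function could be applied to the rows read from the file before uploading the result to the mapping tool.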

The second section of this work results in two distinct visualizations.

First, we used Seealsology, which returns a dataset of pages related to a main page, or “seed”. The aim was to identify the semantic network that develops from the Wikipedia page “Terrorism”, analyzing both direct (first-level) and indirect (second-level) connections.
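
A two-level network of this kind can be pictured as a breadth-first traversal that stops expanding pages at the second level. The sketch below is not Seealsology's actual implementation; the SEE_ALSO mapping, its page names and the crawl helper are hypothetical placeholders standing in for real pages and their related-page links.

```python
from collections import deque

# Hypothetical related-page links. Page names are placeholders.
SEE_ALSO = {
    "Seed": ["A", "B"],
    "A": ["C", "Seed"],
    "B": ["C", "D"],
    "C": ["E"],
}


def crawl(seed, links, max_depth=2):
    """Breadth-first crawl: returns the edge list and the level of each page."""
    depth = {seed: 0}
    edges = []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # pages on the last level are kept but not expanded
        for target in links.get(page, []):
            edges.append((page, target))
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return edges, depth


edges, depth = crawl("Seed", SEE_ALSO)
print(depth)
```

In this toy example, pages A and B are direct connections of the seed, C and D are indirect ones, and E is never reached because it lies on a third level.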


The resulting dataset was processed in Gephi; the result is shown in the second visualization.

Next, we selected the five pages with the highest degree, i.e. those with the most links connected to them, in order to compare the variations in their pattern of views during November and December 2015 with those of the seed page. The data were collected with Wikipedia Article Traffic Statistics and then processed with Tableau Public. The result is shown in the third and last graph.
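
Ranking pages by degree can be sketched as follows. This is a minimal illustration that treats every link as undirected and counts one unit of degree per edge endpoint; the edge list, the page names and the top_by_degree helper are hypothetical, and the project itself obtained the degree values from Gephi.

```python
from collections import Counter

# Hypothetical edge list. Page names are placeholders.
EDGES = [
    ("Seed", "A"), ("Seed", "B"), ("Seed", "C"),
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "D"),
]


def top_by_degree(edges, n=5, exclude=()):
    """Return the n pages with the highest degree, skipping excluded pages."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    ranked = [(page, d) for page, d in degree.most_common() if page not in exclude]
    return ranked[:n]


# The seed is excluded because it is the page the others are compared with.
top_pages = top_by_degree(EDGES, n=3, exclude={"Seed"})
print(top_pages)
```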

Results

The first visualization gives an overview of the temporal evolution of terrorist attacks in the new millennium, up to December 2015.

The second visualization shows the network of terrorism-related Wikipedia pages; the layout and the color palette bring out the structure of the semantic network.

gephi-label.png

The last graph reveals that both the structure of the semantic network and the involvement of users with the news determine the relevance of the news itself.

What is immediately evident from the third chart is the peaks in views of the Terrorism, Terrorism in the European Union and Islamic Terrorism pages, which coincide with the Paris attacks of November 2015. What is striking, however, is that the highest peak was reached by the page on terrorism in the European Union and not by the seed: it is not only the structure of the semantic network that determines the relevance of a page, but also the page's relationship with the news.

visualizzazioni-wiki-def.png

In this case, article traffic reveals that users took the Terrorism in the European Union page as the starting point of their research, which makes the Terrorism page a first-level link from it.
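
One way to make such a comparison of peaks explicit is to relate each page's highest daily view count to its usual level. The sketch below is an assumption-laden illustration, not the analysis performed in Tableau Public: the daily counts are invented, the page names are placeholders, and using the median as the ‘normal’ traffic is our own choice for the example.

```python
from statistics import median

# Hypothetical daily view counts. All numbers are invented for illustration.
VIEWS = {
    "Seed page": [100, 110, 90, 105, 400, 150, 100],
    "Related page": [20, 25, 22, 18, 900, 60, 21],
}


def peak_summary(series):
    """Return (index of the peak day, peak value, peak / median baseline)."""
    baseline = median(series)  # assumed stand-in for the 'normal' traffic
    peak_day = max(range(len(series)), key=series.__getitem__)
    return peak_day, series[peak_day], series[peak_day] / baseline


summary = {page: peak_summary(series) for page, series in VIEWS.items()}
for page, (day, value, ratio) in summary.items():
    print(f"{page}: peak {value} on day {day}, {ratio:.1f}x the median")
```

In this invented example both pages peak on the same day, but the related page deviates far more from its own baseline than the seed does, which is the kind of pattern discussed above.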

The results of this work indicate that, as assumed, the structure of Wikipedia creates a network of interconnected nodes without a true origin. Pages refer to one another, producing a random navigation flow in which users independently build their own path within the collaborative encyclopedia.

Team members

Martina Agogliati, Valentina Zanelli
