Like oil, goods, or capital, data is a valuable resource for countries. It is important for a nation to protect and preserve the data produced within it. Data sovereignty is the quality of having independent authority over citizens' information.

In this visualization we can observe how user data flows across countries. The color of each country represents how sovereign it is over the data it produces (red: sovereign, yellow: not sovereign, gray: not available). Lines represent how data moves between web servers.

One can observe that the United States and China are the most data-sovereign countries. However, most traffic around the world flows towards the United States, meaning that China is not importing data from other countries.

Motivation

People's privacy on the Internet is always at risk, since almost everyone's life is being recorded. Even though most users give information willingly to some companies (for example, by uploading personal information), information is often collected without users' consent. The real value of an internet company lies mostly in the amount and quality of the data it holds.

Governments are able to request personal information about users from companies. The more information companies collect, the more governments know about their own citizens and the citizens of other countries. This is a threat to collective and individual privacy and to national security.

For this visualization, we design a methodology to summarize this information and to understand which internet companies are offering services to users and which are tracking them. Our main claim is that the threat to people's freedom is the amount of data that is being logged.

Methodology

When you browse a webpage, the browser (such as Firefox or Google Chrome) shares information with servers around the world. While a fraction of these exchanges is necessary to retrieve the data you want, others are used to track your web traffic behind your back.

We are interested in observing how much companies know about your traffic and where this data is (geographically) located.

For this purpose, we use a Google Chrome extension to record all data exchanged between websites while a person is browsing.
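As a rough illustration of what such a recording might reduce to, the sketch below aggregates logged requests into a directed graph between domains. The log format (pairs of page URL and requested URL), the function names, the sample URLs, and the naive domain extraction are all assumptions made for illustration; they are not taken from the extension itself.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the last two labels of the hostname (naive: ignores suffixes like co.uk)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def build_request_graph(log):
    """Count requests from each visited domain to each contacted domain.

    `log` is a hypothetical list of (page_url, request_url) pairs.
    """
    edges = Counter()
    for page_url, request_url in log:
        src, dst = domain_of(page_url), domain_of(request_url)
        if src and dst and src != dst:  # keep only cross-domain requests
            edges[(src, dst)] += 1
    return edges


# Hypothetical sample log, for illustration only.
log = [
    ("https://www.example.com/news", "https://ad.doubleclick.com/pixel.gif"),
    ("https://www.example.com/news", "https://www.example.com/style.css"),
    ("https://en.wikipedia.org/wiki/Data", "https://upload.wikimedia.org/a.png"),
    ("https://www.example.com/sports", "https://ad.doubleclick.com/pixel.gif"),
]
graph = build_request_graph(log)
```

Each key of `graph` is a (source domain, target domain) pair and each value is the number of requests observed between them. Requests that stay within the same domain are dropped here, on the assumption that only third-party exchanges matter for tracking.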

We use this information to classify sites as trackers or service providers.

Some sites are pure trackers (such as doubleclick.com), and others are pure service providers (such as wikipedia.org). The majority of websites, however, fall between these two extremes: they offer a front-end service to the user, but at the same time they track the user's information.

We estimate how much a website is a tracker or a service provider using the HITS algorithm. HITS is a link analysis algorithm that rates web pages. It computes two values for each page: its authority, which estimates how many requests end up at the website, and its hub value, which estimates how much information it shares with other websites. In our case, a high authority value means that the website is tracking the user, whereas a high hub value means that it is not.
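To make the two scores concrete, here is a minimal sketch of the standard HITS power iteration applied to a small request graph. The edge list, the domain names, and the fixed iteration count are hypothetical, and this is not necessarily how the scores in the visualization were computed.

```python
import math


def hits(edges, iterations=50):
    """Run HITS on a directed graph given as a list of (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes that send requests here.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of the authority scores of the nodes this node sends requests to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical request graph: two sites both call the same third-party domain.
edges = [
    ("news.example", "doubleclick.com"),
    ("blog.example", "doubleclick.com"),
    ("news.example", "cdn.example"),
]
hub, auth = hits(edges)
```

In this toy graph, `doubleclick.com` receives requests from both sites and ends up with the highest authority score, while `news.example`, which only sends requests, has zero authority and the highest hub score. Both score vectors are normalized to unit Euclidean length on every iteration so that the values stay bounded.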

Visualization

In this visualization, you can see the traffic of one person. Each web domain is represented by a circle whose color depends on whether it is a tracker (red) or not (blue). The position of each circle on the map corresponds to the geographical location of the domain's servers.

Note: You can interact with the visualization by dragging and zooming the map with your mouse pointer or trackpad, and by clicking on any label to see its provider/tracker score.

Download the data
