While Edward Snowden’s recent revelations about NSA surveillance have shed light on potentially reprehensible uses of mass data analysis, it’s important to remember that big data can be used for good, too. Data can be graphed to make a point and promote change, for example, or analyzed in real time to predict the outcome of important events. Collected datasets – location and phone metadata, especially – can also be used to make so-called data art: interactive visualizations of large pools of information. That’s what former Google employee Eric Fischer and engineers at enterprise mapping company MapBox recently teamed up to produce. The subject? The geographic distribution of smartphone users by mobile operating system, as recorded by tweets containing geodata and device identifiers.
Smartphone platforms are represented on the map by colored dots: red for iPhone, green for Android, and shades of purple for BlackBerry and others. At first glance, socioeconomics seems to determine which phone the majority of a population owns. The default map view shows a large concentration of red dots – iPhone users – in affluent cities and suburbs, and green dots – Android phones – more evenly spread across rural and lower-income areas. However, it’s important to note that these distributions aren’t rigorous science, nor were they meant to be: Fischer told Wired the maps represent only “a tiny fraction” of what is happening.
Arguably, the methods behind the map are more interesting than the map itself. To produce the pretty graphics, Fischer and MapBox had to aggregate an enormous amount of metadata: 3 billion tweets in total, or every geotagged tweet since September 2011. Social data specialists from Gnip provided access to the data, while Fischer built the interactive interface and “de-duplicated” the data points, weeding out geographically overlapping tweets. After de-duplication, the crew had nearly 280 million unique tweets to work with.
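For readers curious what “de-duplicating” geographically overlapping tweets might look like in practice, here is a minimal Python sketch. It is not Fischer’s actual method, which the article does not describe. It assumes that two tweets “overlap” when their coordinates round to the same grid cell, and the `Tweet` type, the `dedupe_by_location` function and the `precision` parameter are all hypothetical names invented for illustration.

```python
from typing import Iterable, List, NamedTuple


class Tweet(NamedTuple):
    """Hypothetical record: just the fields the map would need."""
    lat: float
    lon: float
    device: str


def dedupe_by_location(tweets: Iterable[Tweet], precision: int = 4) -> List[Tweet]:
    """Keep only the first tweet seen in each rounded (lat, lon) cell.

    Assumption: "overlapping" means the coordinates agree after rounding
    to `precision` decimal places (4 places is roughly 10 meters).
    """
    seen = set()
    unique = []
    for tweet in tweets:
        cell = (round(tweet.lat, precision), round(tweet.lon, precision))
        if cell not in seen:
            seen.add(cell)
            unique.append(tweet)
    return unique


# Made-up sample points: the first two fall in the same grid cell.
sample = [
    Tweet(40.74844, -73.98566, "iPhone"),
    Tweet(40.74841, -73.98569, "Android"),
    Tweet(51.50070, -0.12460, "BlackBerry"),
]
unique_tweets = dedupe_by_location(sample)
```

In this sketch the second sample tweet is dropped because it lands in the same cell as the first. A coarser `precision` merges more points and a finer one keeps more, which is one reason the surviving count depends heavily on how “overlap” is defined.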
In addition to the smartphone map, Fischer and MapBox used the tweets to create two more (and probably less contentious) visualizations. One shows the places tourists tend to congregate in a given city, while the other groups tweets by the language in which they were written. Both are available for perusal on MapBox’s website.