Google is not a search engine; it’s a personal data broker. Search, maps and mail are not their end products; they are just some of the ways they harvest personal data.
This data is used to provide more relevant search results, music recommendations and YouTube videos, but these are still essentially just more ways to collect personal data. Google’s whole purpose (and where they make their money) is “targeted advertising”, i.e. using all the information they can learn about you, together with the latest tricks from psychology and behavioural science, to put ads in front of your eyes which reach you in the deepest, most emotional way possible.
Google Analytics (GA) is a service from Google which owners of websites can use to see information about how their site is used. GA provides details such as numbers of visitors, times of website visits, the age and gender of visitors and the city they are browsing from. Google can guess your location from your IP address, but for details like age and gender, they likely guess based on your Google search history and the websites you have visited.
Google don’t just know the things you have searched for. Whenever you visit a page which contains a Google+ widget or a custom search widget, it creates a connection to Google which allows them to record your visit to that site.
Google Analytics also involves creating a connection to Google’s servers. Since Google Analytics is used by over one third of all the most popular sites on the internet[1], Google could use it to track individuals’ activities across huge swathes of the web. Google claim that they don’t do this, and that each visit to every separate site using Google Analytics is recorded separately, but there is no way to know whether Google correlate users’ records on their own servers. Given their interest in personal data there may be cause to suspect their claim.
We decided to investigate one small aspect of Google Analytics’ tracking abilities. Google Analytics knows whether a user is returning or visiting a site for the first time, and links a visitor to Google’s profile of them by using cookies. But Google Analytics also collects information about a visitor’s browser, operating system and screen, and all of these can be combined to create a unique fingerprint for that user. If a user deleted the Google Analytics cookie, Google could potentially recreate it based on their fingerprint to continue to identify that user.
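To make the idea of a fingerprint concrete, here is a minimal sketch in Python. It is not Google's method, which we have no visibility into; the attribute names and values are invented for illustration, and hashing the combined attributes is just one simple way such an identifier could be derived.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a stable identifier.

    Illustrative only: the attribute names are assumptions, not the
    fields Google Analytics actually collects.
    """
    # Sort keys so the same attributes always serialise identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical visitor: browser, operating system and screen details.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "os": "Linux",
    "screen": "1366x768",
    "colour_depth": 24,
    "language": "en-GB",
}

first_visit = fingerprint(visitor)

# Deleting cookies changes none of these attributes, so the
# fingerprint computed on the next visit is identical.
after_cookie_clear = fingerprint(dict(visitor))

# Changing a single attribute produces a different fingerprint.
different_screen = fingerprint({**visitor, "screen": "1920x1080"})
```

Because the identifier depends only on properties of the browser and not on anything stored in it, clearing cookies alone would not be enough to shake it off.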
OpenWPM[2] is a tool which allows us to programmatically browse the web and record all the details of the browsing experience. We used OpenWPM to browse to a random selection of the most popular sites on the web, allowing Google Analytics to build up a fingerprint of the browser, before visiting our target site which used Google Analytics. We hypothesised that if Google were recreating cookies or re-identifying users based on their fingerprint, we would see GA reporting repeated visits to our site from the same user.
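The shape of each browsing session can be sketched as follows. This deliberately does not use OpenWPM's API; it only shows how a visit order of the kind described above might be built, and the function name, site list and target URL are all hypothetical.

```python
import random


def build_visit_plan(popular_sites, target_site, sample_size, seed=None):
    """Return a list of URLs: a random warm-up sample, then the target.

    Hypothetical helper for illustration; the real crawl was driven
    by OpenWPM, whose API is not shown here.
    """
    rng = random.Random(seed)
    # Visit a random selection of popular sites first, so that any
    # tracker present on them has a chance to observe the browser.
    warm_up = rng.sample(popular_sites, sample_size)
    # The target site (which runs Google Analytics) always comes last.
    return warm_up + [target_site]


# Placeholder URLs, not the sites used in the experiment.
popular = [f"https://popular-site-{i}.example" for i in range(100)]
plan = build_visit_plan(popular, "https://our-target.example", 10, seed=1)
```

Each session would then start from a browser with no cookies and walk through the plan in order, so that any re-identification at the target could only come from something other than a stored cookie.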
What we actually found was that these visits were reported as separate users. This suggests that Google are not using fingerprinting to re-identify users who delete their GA cookie. However, as we have no way of knowing how Google treat the data they collect once it reaches their servers, we cannot say definitively that Google don’t continue tracking users who delete their cookies as they could use fingerprinting to combine or link their records. All we can say for sure is that these are reported as separate visitors to the users of Google Analytics.
We also noticed Google Analytics reporting that some of our visits came from male users and others from female users. As we continued visiting the site, Google Analytics changed to reporting “No data available”. Since all visits to the site came from exactly the same bot, this suggests that Google Analytics’ gender reports may not be very reliable. Additionally, we found that simply changing the browser’s user agent string was all it took for Google Analytics to misidentify the browser used. Since we can see that Google Analytics examines other aspects of the browser which could be used to identify it, they could use these to corroborate the actual browser used.
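As an illustration of how a second signal could corroborate the user agent, the sketch below compares the browser named in the user agent string with the value of the JavaScript property navigator.vendor. This is our own example, not something we observed Google Analytics doing, and the vendor values in the table are what these browsers commonly report rather than a guarantee.

```python
def browser_from_user_agent(user_agent: str) -> str:
    """Naively guess the browser from the user agent string alone."""
    # Order matters: Chrome's user agent also contains "Safari/".
    for token, name in (("Firefox/", "Firefox"),
                        ("Chrome/", "Chrome"),
                        ("Safari/", "Safari")):
        if token in user_agent:
            return name
    return "Unknown"


# Assumed typical navigator.vendor values for each browser.
EXPECTED_VENDOR = {
    "Chrome": "Google Inc.",
    "Firefox": "",
    "Safari": "Apple Computer, Inc.",
}


def looks_spoofed(user_agent: str, vendor: str) -> bool:
    """Flag a visit whose vendor contradicts its claimed browser."""
    claimed = browser_from_user_agent(user_agent)
    expected = EXPECTED_VENDOR.get(claimed)
    return expected is not None and expected != vendor


chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36")
firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"

# Firefox pretending to be Chrome: the string says Chrome, the vendor does not.
spoofed = looks_spoofed(chrome_ua, vendor="")
# Firefox reporting itself honestly.
honest = looks_spoofed(firefox_ua, vendor="")
```

A check like this is easy to defeat by also changing the second property, but it shows why relying on the user agent string alone makes misidentification trivial.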
These shortcomings of Google Analytics likely make little difference to those who use it on their websites, but are perhaps a boon to those privacy-conscious people who change their browser’s properties and disallow or regularly clear their cookies. Of course, just because this personal information is not available to the users of Google Analytics doesn’t mean it’s not available to Google.
[1] Lerner, Simpson, Kohno & Roesner. “Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016.” 25th USENIX Security Symposium, 2016.
[2] https://github.com/citp/OpenWPM