Introduction to Digital Fingerprinting

Not Surprised

So I had been thinking about getting a Polaroid camera. I had probably mentioned it to friends and family… specifically in rants about amateur photographers snapping endless reels of selfies that remain buried in the depths of their iCloud for none to see. Anyhow, around the time of my last birthday I began seeing online adverts for Polaroid cameras popping up on reddit.com. I didn’t think much of it; perhaps I’d googled them previously and been picked up by a re-targeting pixel. Come my birthday, lo and behold, I received a Polaroid camera from my partner. Result!

So the ads popping up on Reddit got me thinking: why was I re-targeted? Come to think of it, from cosmetics right through to women’s shoes, I was getting other curve-ball ads too. Great work Reddit: re-targeting my partner’s searches, likely via our shared IP address, you’re bound to hit the target 50% of the time. To me this method seemed simplistic. Why not, say, record our device OS too? I’m using Android and she’s on iOS, so that alone would double the accuracy of the re-targeting pixel. Moreover, why stop there? We must leave a digital fingerprint all over the web.

The plot thickens

Back at The Lab the team had some other thoughts. Kevin, in response to the explicit opt-in requirement for consent to store user cookies in web browsers (see GDPR: Secure, Diligent and Ethical Working), had begun investigating alternatives to conventional cookie tracking methods. If you collect data from a set of features that together carry enough entropy, a probabilistic model can use them to confidently match one set of data points to another, in other words to recognise the same visitor twice.
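To put a number on "entropy", here is a minimal sketch that computes the Shannon entropy, in bits, of a single feature from the values seen across a group of visitors. The function name and the sample data are made up for illustration. If features were independent their bits would simply add up, and roughly 33 bits is enough to single out one person among 8 billion.

```javascript
// Shannon entropy (in bits) of one feature, given the values observed across visitors.
function entropyBits(values) {
  const counts = new Map();
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / values.length; // share of visitors reporting this value
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Hypothetical sample: the OS reported by eight visitors, four values equally common.
const osSample = ["Android", "iOS", "Android", "Windows", "iOS", "Windows", "macOS", "macOS"];
console.log(entropyBits(osSample)); // 2 bits: knowing the OS narrows the crowd to a quarter
```

A feature that everybody shares contributes 0 bits, which is why a tracker would want many features, each splitting the crowd a little further.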

But where to find our features? The JavaScript APIs exposed by modern web browsers provide read access to many properties, such as device information (OS, CPU, make, model), user agent (browser version, language) and GPU, to name a few. If a website were to collect this data when a user visits then, when the user returns, a probabilistic model should be able to identify and therefore track them by matching their digital fingerprint. Not a saved cookie in sight.
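As a rough sketch of the collection step, the snippet below reads a few standard browser properties and squashes them into a short identifier. Some assumptions to flag: the function names are ours, the choice of properties is just a sample, the visitor object is invented, and exact hash matching with FNV-1a stands in for the probabilistic matching described above purely to keep the example short.

```javascript
// Collect a handful of read-only browser properties.
// `env` should look like `window`; it is a parameter so the sketch also runs outside a browser.
function collectFeatures(env) {
  const nav = env.navigator || {};
  const scr = env.screen || {};
  return {
    userAgent: nav.userAgent,
    language: nav.language,
    platform: nav.platform,
    hardwareConcurrency: nav.hardwareConcurrency,
    screen: [scr.width, scr.height, scr.colorDepth].join("x"),
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
}

// 32-bit FNV-1a over UTF-16 code units (a simplification; the usual definition hashes bytes).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Serialise the features in a stable key order so the same browser always hashes the same way.
function fingerprint(features) {
  const canonical = Object.keys(features)
    .sort()
    .map((key) => `${key}=${features[key]}`)
    .join(";");
  return fnv1a(canonical);
}

// Hypothetical visitor, standing in for `window`.
const visitor = {
  navigator: { userAgent: "ExampleBrowser/1.0", language: "en-GB", platform: "Linux armv8l", hardwareConcurrency: 8 },
  screen: { width: 412, height: 915, colorDepth: 24 },
};
console.log(fingerprint(collectFeatures(visitor)));
```

In a real page the call would be `fingerprint(collectFeatures(window))`. Nothing is written to the browser, so there is no cookie to consent to or to clear.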

Return to the scene of the crime

So let’s go leave our sticky fingers on the web and see who’s dusting for prints. Take a library designed specifically for implementing digital fingerprinting on a website (the aptly named fingerprint2.js Git repo will do). Scrape out the native JavaScript calls doing the fingerprinting to form our list of features. Then go download websites and see who’s using those features in their source code.

Starting with our old friend reddit.com, we manually scraped the files downloaded by a browser when visiting the website and then looped over the JavaScript source code, counting how many of the features each file used. In total we highlighted 14 files, each containing 7 or more of the fingerprinting features.
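The counting step can be sketched in a few lines. Treat everything here as an assumption: the feature strings below are an illustrative sample and not the list we pulled out of fingerprint2.js, the file contents are invented, and plain substring matching is the simplest possible stand-in for the scan.

```javascript
// Illustrative feature strings only; NOT the actual list extracted from fingerprint2.js.
const FEATURES = [
  "navigator.userAgent", "navigator.language", "navigator.platform",
  "hardwareConcurrency", "deviceMemory", "colorDepth", "getTimezoneOffset",
  "toDataURL", "getContext", "WEBGL_debug_renderer_info",
  "navigator.plugins", "sessionStorage", "localStorage", "indexedDB",
];

// Number of distinct features mentioned anywhere in one source file.
function countFeatures(source, features = FEATURES) {
  return features.filter((feature) => source.includes(feature)).length;
}

// Keep only the files that reach the threshold (7 in the article).
function flagFiles(files, threshold = 7, features = FEATURES) {
  return Object.entries(files)
    .map(([name, source]) => ({ name, hits: countFeatures(source, features) }))
    .filter((file) => file.hits >= threshold);
}

// Hypothetical downloads: file name mapped to its JavaScript source.
const downloaded = {
  "tracker.js":
    "navigator.userAgent; navigator.language; navigator.platform; " +
    "navigator.hardwareConcurrency; screen.colorDepth; new Date().getTimezoneOffset(); " +
    "canvas.toDataURL(); canvas.getContext('webgl');",
  "menu.js": "document.querySelector('#menu'); localStorage.getItem('theme');",
};
console.log(flagFiles(downloaded)); // only tracker.js is flagged, with 8 hits
```

Substring matching is crude: it cannot tell a call from a comment, and minified or obfuscated code that builds property names at runtime would slip straight past it.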

To be clear, we are not making the claim that digital fingerprinting techniques are being used to track users. Our observation is that collecting data from these features would provide sufficiently high entropy to identify a user within a probabilistic model.

Out in the wild one would expect to find most of these features within ordinary JavaScript source code. It is worth noting, however, that one feature stood out for the sheer pointlessness of finding it in a conventional website: the WebGL API, which draws through the HTML5 canvas element. Canvas-based tracking is a fingerprinting method based on JavaScript that uses the HTML5 canvas element to uniquely identify a user [1]. The technique consists of drawing an object on a canvas element and then saving a base64-encoded PNG image that contains the whole content of the canvas element. We feel this is clear evidence of digital fingerprinting when observed on a website (not to mention the waste of battery power). For further reading on canvas-based fingerprinting and code obfuscation, see Towards accurate detection of obfuscated web tracking [1].
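For the curious, the draw-then-export trick looks roughly like this. It is a sketch using the standard Canvas 2D API, not code lifted from any particular tracker; the function name, the shapes and the text are arbitrary choices of ours. The idea is that the same drawing instructions come out with subtly different pixels depending on the GPU, drivers and installed fonts, so the exported image doubles as an identifier.

```javascript
// Draw a fixed scene on an off-screen canvas and export it as a base64 PNG data URL.
// `doc` is the page's `document`; it is a parameter so the function can be exercised with a stub.
function canvasFingerprint(doc) {
  const canvas = doc.createElement("canvas");
  canvas.width = 240;
  canvas.height = 60;

  const ctx = canvas.getContext("2d");
  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#f60";
  ctx.fillRect(10, 10, 100, 30); // a coloured block
  ctx.fillStyle = "#069";
  ctx.fillText("fingerprint test 123", 4, 20); // text rendering varies most between machines

  return canvas.toDataURL(); // PNG by default: "data:image/png;base64,..."
}

// Only runs where a real DOM is available, i.e. in a browser.
if (typeof document !== "undefined") {
  console.log(canvasFingerprint(document).slice(0, 40));
}
```

The canvas is never attached to the page, so the visitor sees nothing. In practice the data URL would be hashed down to a short string before being sent anywhere.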

Anyhow, we don’t like to see good code go to waste, so we built the Etic Lab Fingerprinting Tool, which the public can use to submit a website address and check for signs of digital fingerprinting. Our tool uses OpenWPM, the automated web privacy measurement platform developed at Princeton University [2], to visit the website address, download the files, loop over the features list and return the number of files each containing 7 or more fingerprinting features. I’m planning an in-depth write-up of the build, including how to use OpenWPM and WebSockets for interactive communication sessions between the user’s browser and a server.

Currently our tool is keeping an eye on the Alexa top 100 global website list. Feel free to submit a website address to the Etic Lab Fingerprinting Tool. I’ll let Casper take it from here – he’s going to tell you about his experience experimenting with Google Analytics and fingerprinting techniques.

 

[1] Towards accurate detection of obfuscated web tracking. Hoan Le, Federico Fallace and Pere Barlet-Ros. Universitat Politecnica de Catalunya, UPC-BarcelonaTech, Barcelona, Spain. {hoan, fallace, pbarlet}@ac.upc.edu
[2] OpenWPM: An automated platform for web privacy measurement. Steven Englehardt, Chris Eubank, Peter Zimmerman, Dillon Reisman and Arvind Narayanan. Princeton University. {ste,cge,peterz,dreisman,arvindn}@cs.princeton.edu