A browsing history is like a fingerprint - very easily identifiable
- Browsing histories are "highly distinctive and stable"
- Advertisers (and others) can quickly create accurate profiles of individuals...
- … from small samples of browsing histories - despite privacy settings
- 80 per cent accuracy garnered from 150 domain visits
"Replication: Why We Still Can't Browse in Peace: On the Uniqueness and Re-identifiability of Web Browsing Histories" is the title of a new academic research paper by Mozilla, the not-for-profit organisation behind the respected Firefox web browser. The publication comes to the sobering conclusion that, because most people have distinctive, routine web-browsing habits and tend to visit the same or similar sites day after day, online advertisers (and others) can quickly create remarkably accurate profiles of individuals despite any privacy settings.
The Mozilla data scientists, Sarah Bird, Ilana Segall and Martin Lopatka, show that collected data can be aggregated and then used to re-identify individual users across different sets of user data, even when those sets contain only tiny samples of a person's full browsing history. In other words, you can try to hide by anonymising your browsing history, but the slightest chink in the armour will allow advertisers to lever their way in and then bombard you with endless and increasingly targeted ads. And, of course, the potential exists for more sinister incursions by third parties and the further stripping away of already depleted privacy.
The Mozilla report on privacy and browsing histories was presented online in August at the 29th USENIX Security Symposium. It revealed that a list of as few as 50 of a person's most often visited domains permits advertisers to create a tracking profile that is close to 50 per cent accurate, whilst a list of the 150 most visited sites provides an 80 per cent accurate re-identification rate.
The new report replicates and updates an earlier study, "Why Johnny Can’t Browse in Peace: On the Uniqueness of Web Browsing History Patterns", published back in 2012 by researchers Lukasz Olejnik, Claude Castelluccia and Artur Janc. That paper analysed the browsing data of 380,000 individuals in one of the most detailed and in-depth examinations ever undertaken.
The net result (pun intended) was that 97 per cent of them had, over time, constructed a browsing profile comprising a unique list of individual preferences that, when re-visited, easily enabled re-identification of the individuals concerned.
Eight years ago this could be done with 38 per cent accuracy for a user's top 50 most visited domains and 70 per cent accuracy for data sets of 500 domains. Since then, time and technologies have moved on, and today the re-identification accuracy rate is 50 per cent based on 50 domains and 80 per cent based on 150 domains rather than 500.
The 2020 Results: Worrying
Mozilla's latest experiment was conducted over a four-week period in July and August of last year and was based on 52,000 volunteers providing anonymous browsing data. It was a much smaller sample than in the 2012 experiment, but the Mozilla scientists had designed the new investigation to collect data of the same types and at the same levels as those that commercial analytics companies harvest from individuals using their various methodologies and systems.
Participants shared their browsing history with the Mozilla team over the course of the first fortnight of the project. Then, during the second two weeks, the scientists worked to determine whether they could re-identify the individuals from the massive amount of data they had provided. That data comprised 35 million website visits to 660,000 unique domains. Astonishingly, 99 per cent of the profiles were found to be unique to a single user.
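To make the idea concrete, here is a minimal sketch of how re-identification from browsing profiles can work in principle. It is not the Mozilla team's code: the user names, domains and helper functions are invented for illustration, and Jaccard similarity between sets of top domains is an assumed matching measure chosen for simplicity.

```python
from collections import Counter


def top_domains(visits, n):
    """Return the set of the n most frequently visited domains."""
    return frozenset(domain for domain, _ in Counter(visits).most_common(n))


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unique_fraction(profiles):
    """Fraction of users whose profile is shared with no other user."""
    counts = Counter(profiles.values())
    return sum(1 for p in profiles.values() if counts[p] == 1) / len(profiles)


def reidentify(first_period, second_period):
    """Match each second-period profile to the most similar first-period one."""
    return {
        user: max(first_period, key=lambda cand: jaccard(first_period[cand], profile))
        for user, profile in second_period.items()
    }


# Toy data, invented for illustration only (hypothetical users and domains).
week_one = {
    "alice": ["news.example", "mail.example", "news.example", "shop.example"],
    "bob": ["games.example", "video.example", "games.example"],
    "carol": ["mail.example", "recipes.example", "recipes.example"],
}
week_two = {
    "alice": ["news.example", "shop.example", "maps.example"],
    "bob": ["video.example", "games.example", "forum.example"],
    "carol": ["recipes.example", "mail.example", "recipes.example"],
}

profiles_one = {u: top_domains(v, 3) for u, v in week_one.items()}
profiles_two = {u: top_domains(v, 3) for u, v in week_two.items()}

matches = reidentify(profiles_one, profiles_two)
accuracy = sum(matches[u] == u for u in matches) / len(matches)
print(f"unique profiles: {unique_fraction(profiles_one):.0%}")
print(f"re-identified:   {accuracy:.0%}")
```

Even in this toy example, a user's later browsing overlaps far more with their own earlier profile than with anyone else's, which is the intuition behind the re-identification rates the report describes.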
In the completed report on the new experiment, the Mozilla team says "the feasibility of re-identifying users through distinctive profiles of their browsing history visible to websites and third parties" was amply proven, as was the potential threat to individual privacy "posed by the aggregation of browsing histories".
Further, the team observed "numerous third parties [being] pervasive enough to gather web histories sufficient to leverage browsing history." Meanwhile, "Third-party trackers remain a major concern; their prevalence and mass tracking activity is well documented. This makes the threat of history-based profiling even more tangible and urgent now than when originally proposed."
Most damningly, Mozilla's recent work depicts increasingly sophisticated tracking technologies sustaining the targeted behavioural advertising industry. The report concludes, "We also see continued increases in scale, a profound lack of transparency in disclosure of personal information flows and consolidation of the internet economy to fewer, larger, dominant parties." You have been warned.