Even when users take steps to opt out of online tracking, many ad companies still track their activity, according to preliminary research findings from Stanford University's Center for Internet and Society.
As Arvind Narayanan, a postdoctoral fellow at the Center for Internet and Society, puts it: "A 1993 New Yorker cartoon famously proclaimed, 'On the Internet, nobody knows you're a dog.' The Web is a very different place today; you now leave countless footprints online. You log into websites. You share stuff on social networks. You search for information about yourself and your friends, family, and colleagues. And yet, in the debate about online tracking, ad networks and tracking companies would have you believe we're still in the early '90s — they regularly advance, and get away with, 'anonymization' or 'we don't collect personally identifiable information' as an answer to privacy concerns.
In the language of computer science, clickstreams — the browsing histories that companies collect — are not anonymous at all; rather, they are pseudonymous. The latter term is not only more technically appropriate, it also better reflects the fact that at any point after the data has been collected, the tracking company might try to attach an identity to the pseudonym (a unique ID) that your data is labeled with. Identification of a user thus affects not only future tracking, but also, retroactively, the data that has already been collected. And identification needs to happen only once, ever, per user.
Will tracking companies actually take steps to identify or deanonymize users? It’s hard to tell, but there are hints that this is already happening: for example, many companies claim to be able to link online and offline activity, which is impossible without identity.
Regardless, what I will show you is that if they're not doing it, it's not because there are any technical barriers. Essentially, then, the privacy assurance reduces to: 'Trust us. We won't misuse your browsing history.'" I highly recommend you read his full article.
Advertisers fund the internet – in exchange for personal information
Remember the dot-com bubble burst of 2000? It happened because internet companies built their content and services on one key assumption: that we, the consumers, would subscribe to use their services. There was just one fatal flaw – consumers wanted everything to be free. But free doesn't pay the bills, let alone turn a profit, and internet companies either went bankrupt or switched to an ad-funded revenue model.
Reasonably, advertisers want a return on their investment for funding the internet, and their primary requirement – as with any advertising – is the ability to segment internet users by demographics so they don't waste money marketing shaving cream to toddlers.
Internet companies quickly learned that the more targeted the ads could be, the more advertisers were willing to pay for access to their users… and from there it doesn't take a leap to understand how we've come to a place where ads follow us around the web, and behavioral advertising is the name of the game.
In theory you can opt out – in reality you'll never know
A do-not-track feature has been added to both Mozilla Firefox and Microsoft Internet Explorer 9 that supposedly allows users to check a box in their browser preferences indicating they do not wish to have their online purchases, browsing patterns, search strings, or personal information tracked. Once checked, every website the user visits receives notice of that preference.
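That "notice" is just an extra HTTP request header. The sketch below (using only Python's standard library, with a local echo server standing in for a website) shows what the signal looks like on the wire; honoring it is entirely up to the receiving site.

```python
# Sketch: what the do-not-track preference looks like on the wire. The browser
# simply adds a "DNT: 1" request header; honoring it is entirely voluntary.
# A local echo server stands in for a website here; names are illustrative.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoDNT(BaseHTTPRequestHandler):
    def do_GET(self):
        # A compliant site would check this header before tracking the visitor.
        dnt = self.headers.get("DNT", "unset")
        body = f"DNT header: {dnt}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), EchoDNT)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The request a DNT-enabled browser would make:
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/", headers={"DNT": "1"}
)
with urllib.request.urlopen(req) as resp:
    result = resp.read().decode()
server.shutdown()
print(result)  # DNT header: 1
```

Note that nothing on the server side is obligated to do anything with the header, which is exactly the problem the research highlights.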
However, there is no law requiring companies to respect consumers' do-not-track preference, and according to Stanford's research, few websites comply with users' requests for privacy, choosing instead to continue tracking users without their knowledge. They do so in at least five ways, as shown on Stanford's website and paraphrased here:
1. The third party is sometimes a first party
Companies with the biggest reach in terms of third-party tracking, such as Google and Facebook, are often also companies that users have a first-party relationship with. When you visit these sites directly, you’re giving them your identity, and there is no technical barrier to them associating your identity with your clickstream collected in the third-party context.
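The association is just a join on the tracking cookie's ID. A minimal sketch, with entirely hypothetical data:

```python
# Sketch with hypothetical data: a tracker's third-party log is keyed by a
# pseudonymous cookie ID. One first-party login that sees the same cookie
# retroactively attaches a name to the entire clickstream.
third_party_log = {
    "cookie-7f3a": [
        "news-site.example/article",
        "health-site.example/condition",
        "shop.example/cart",
    ],
}

# A single first-party event: the user logs in on the tracker's own site,
# where the same cookie is visible.
first_party_logins = {"cookie-7f3a": "alice@example.com"}

# Joining the two datasets on the cookie ID is a one-liner.
identified = {
    first_party_logins.get(cookie, cookie): urls
    for cookie, urls in third_party_log.items()
}
print(identified)  # the whole history is now labeled with an identity
```

One login, ever, is enough: every past and future entry under that cookie ID inherits the identity.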
2. Leakage of identifiers from first-party to third-party sites
In a paper published just a few months ago, Balachander Krishnamurthy, Konstantin Naryshkin and Craig Wills exposed the various ways in which users' information can and does leak from first parties to third parties. Fully three-quarters of the sites studied leaked sensitive information or user IDs. There are at least four mechanisms by which identity leaks: an email address or user ID in the Referer header; potentially identifying demographic information (gender, ZIP code, interests) in the Request-URI; identifiers in cookies shared with "hidden third-party" servers; and a username or real name in the page title.
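The first mechanism is easy to see concretely. In the sketch below (hypothetical URLs), a first-party page embeds an identifier in its own URL; when that page loads a third-party resource, the browser sends the full page URL as the Referer header, handing the identifier to the tracker:

```python
# Sketch of Referer leakage with hypothetical URLs: a first-party page whose
# URL embeds an identifier loads a third-party resource, and the browser sends
# the full page URL in the Referer header of that request.
from urllib.parse import parse_qs, urlparse

# The page the user is viewing; note the email address in the query string.
page_url = "https://social.example/profile?email=alice%40example.com&zip=94305"

# The page embeds a tracking pixel; the browser's request to the tracker
# carries roughly these headers.
request_to_tracker = {
    "url": "https://tracker.example/pixel.gif",
    "Referer": page_url,   # added automatically by the browser
    "Cookie": "uid=7f3a",  # the tracker's pseudonymous ID for this user
}

# On the tracker's side, recovering the identifier is trivial.
leaked = parse_qs(urlparse(request_to_tracker["Referer"]).query)
print(leaked["email"][0], leaked["zip"][0])  # alice@example.com 94305
```

Because the tracker receives its own cookie in the same request, the leaked email and the pseudonymous clickstream are linked in a single step.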
3. The third party buys your identity
Ever seen one of those “Win a free iPod!” surveys? The business model for many of these outfits, going by the euphemism “lead-generation sites,” is to collect and sell your personal information. Increasingly, these sites have ties with tracking companies.
When you reveal your identity to a survey site, there are two ways in which it could get associated with your browsing history. First, the survey site itself could have a significant third-party presence on other sites you visit. When you visit the survey site and sign up, they can simply associate that information with the clickstream they’ve already collected about you. Later on, they can also act as an identity provider to sites on which they have a third-party presence.
Alternately, they could pass on your identity to trackers that are embedded in the survey site, allowing the tracker to link your identifying information with their cookie, and in turn associate it with your browsing history. In other words, the tracker has your browsing history, the survey site has your identity, and the two can be linked via the Referer header and other types of information leakage.
4. Exploiting browser bugs

A variety of browser and server-side bugs can be exploited to discover users' social identities. The known bugs have all been fixed, but computer security is a never-ending process of finding and fixing bugs.
5. The data itself can be deanonymized

So far I've talked about identifying a user when they interact with the third party directly or indirectly. However, if the mountain of deanonymization research that has accumulated over the last few years has shown us one thing, it is that the data itself can be deanonymized by correlating it with external, public information.
The logic is straightforward: in the course of a typical day, you might comment on a news article about your hometown, tweet a recipe from your favorite cooking site, and have a conversation on a friend's blog. By these actions, you have established a public record of having visited those three specific URLs. How many other people do you expect visited all three, at roughly the same times that you did? With very high probability, no one else. This means that an algorithm combing through a database of anonymized clickstreams can easily match your clickstream to your identity. And that's in a single day. Tracking logs usually stretch to months and years.
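The intersection attack described above can be sketched in a few lines. The data here is entirely hypothetical, and timestamps are ignored for simplicity; the idea is just that a person's public footprint (URLs they demonstrably visited) is usually contained in exactly one anonymized clickstream:

```python
# Sketch of the intersection attack with hypothetical data: match an
# anonymized clickstream to an identity by finding the pseudonym whose
# history uniquely contains a person's publicly known URL visits.
anonymized_clickstreams = {
    "pseudonym-A": {
        "news.example/hometown-story",
        "cooking.example/recipe42",
        "blog.example/friends-post",
        "weather.example",
    },
    "pseudonym-B": {"news.example/other-story", "sports.example/scores"},
}

# Public record: URLs each named person demonstrably visited
# (comments, tweets, blog conversations).
public_footprints = {
    "alice": {
        "news.example/hometown-story",
        "cooking.example/recipe42",
        "blog.example/friends-post",
    },
    "bob": {"sports.example/scores"},
}

def deanonymize(clickstreams, footprints):
    matches = {}
    for name, visited in footprints.items():
        # Candidate pseudonyms whose history contains every public visit.
        hits = [p for p, urls in clickstreams.items() if visited <= urls]
        if len(hits) == 1:  # a unique match identifies the user
            matches[hits[0]] = name
    return matches

print(deanonymize(anonymized_clickstreams, public_footprints))
```

A real attack would also use timestamps, which only sharpens the match; with logs spanning months, the footprint becomes unique almost immediately.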
The unveiling of secret tracking has galvanized Congress, the FTC, and even the President. Bills have been proposed to create do-not-track lists, with industry compliance requirements, for all users and for minors. The European Union's "right to be forgotten" model, which would give users the right to require companies to remove all of their information from websites, is gaining favor.
If your data privacy matters to you – and it should – don’t remain silent. Let your elected officials know you support legislation that gives you the ultimate control over your information.