WebTAP

What it is: The Web Transparency & Accountability Project monitors the data collection practices of websites, in an effort to provide oversight and inform the public about deceptive and unethical data practices.

How it's built: WebTAP's software is called OpenWPM, and it is built on Firefox and Selenium. In essence, OpenWPM simulates a human being using a browser to access websites and services, much in the way a feature-test suite simulates user behavior in an app. By automating browser activity and recording what the browser observes, researchers can run tests that simulate the behavior of thousands of users interacting with a site or service.
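To make the "automated browser that records observations" idea concrete, here is a minimal sketch using plain Selenium with headless Firefox. It is not OpenWPM's actual API: the names `Observation`, `make_firefox`, and `crawl` are made up for illustration, and running it against real sites assumes the `selenium` package and geckodriver are installed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Observation:
    """One recorded page visit (hypothetical record type for this sketch)."""
    requested_url: str
    final_url: str
    title: str


def make_firefox():
    """Create a headless Firefox driver. Requires selenium and geckodriver."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def crawl(urls: Iterable[str], driver_factory: Callable = make_firefox) -> List[Observation]:
    """Visit each URL in one browser session and record what the browser saw."""
    driver = driver_factory()
    observations = []
    try:
        for url in urls:
            driver.get(url)  # load the page as a real browser would
            observations.append(
                Observation(url, driver.current_url, driver.title)
            )
    finally:
        driver.quit()  # always shut the browser down, even on errors
    return observations
```

Calling `crawl(["https://example.com"])` would open one browser session, visit each page in turn, and return one record per visit. A real measurement platform records far more than the title and final URL (cookies, scripts, network requests), but the loop has the same shape.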

How to use it: OpenWPM's GitHub page gives good guidance for anyone wanting to use the software to conduct their own research. The WebTAP website also has a large collection of talks, articles, and research papers that the project has produced.

THE STORY

I was reading the book "Weapons of Math Destruction" by data scientist Cathy O'Neil (mathbabe.org), and the author mentioned WebTAP. This was in the context of discussing the complex, opaque algorithms that are used across business and government to make high-stakes decisions, often relying on shaky statistical methods, questionable objectives, and outright deception. O'Neil highlights the opacity as a particularly dangerous aspect of these algorithms: "The models are, by design, inscrutable black boxes. That makes it harder to question the score or protest against it, or to definitively answer the question: 'does the model work against the subject's interest?'"1

And it's true. We all know, for example, that Amazon serves us up products based in part on the data it has collected about us -- our shopping history, what we have searched for, the websites we frequent, maybe even the people we interact with online. But how can we know if these decisions are being made with any care for our wellbeing or our rights? And how can we speak up if we think they are not? And what happens when similar models are used to determine whether or not we qualify for a loan, or move forward in a job application, or get flagged for police observation? If the methods of reaching these decisions are opaque, even to the people employing them, what does that say about our rights and freedoms in the face of their power?

Efforts to increase transparency and oversight are essential. So I was thrilled when O'Neil introduced me to WebTAP. WebTAP was started in 2013, and its founders state that the project is "focused on monitoring and reverse-engineering web tracking...mostly studying 'third party online tracking.'"2 The methodology is very interesting -- using Firefox and Selenium, the OpenWPM platform allows researchers to simulate not only the behaviors of actual human users, but also different demographics, by "building up a specified history of activity in the automated browser" (ibid, 8). By taking advantage of automated browsing, WebTAP has been able to "visit as many as one million websites every month" (ibid, 8). To anyone who has used Selenium as part of a testing suite before (raises hand), this is an impressive/terrifying scale.
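For a feel of what "third party" means in practice, here is a small sketch that takes the URL of a visited page plus the URLs the browser requested while loading it, and keeps only the requests that went to a different site. This is my own simplification, not WebTAP's method: the helper names are hypothetical, and comparing the last two labels of the hostname is a rough stand-in for a proper Public Suffix List lookup (it would mishandle domains like example.co.uk).

```python
from urllib.parse import urlsplit


def site_of(url):
    """Approximate a URL's site as the last two labels of its hostname.

    Assumption for this sketch: real tools use the Public Suffix List instead.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def third_party_requests(page_url, request_urls):
    """Return the requests whose site differs from the visited page's site."""
    first_party = site_of(page_url)
    return [u for u in request_urls if site_of(u) != first_party]
```

Fed the list of requests observed during a page load, `third_party_requests` leaves you with the outside parties that learned you visited the page, which is the raw material for studying who is tracking whom.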

I'm not going to go in depth on any of the research and presentations that WebTAP has produced, because if you are interested you will be better served investigating them yourself. If you aren't sure where to start, I humbly recommend WebTAP founder Arvind Narayanan's 20-minute presentation "The Web Tracking Arms Race: Past, Present, and Future" (on their talks page). My hope is that if we are in an arms race when it comes to data-collection technology, projects like WebTAP continue to get the funding and attention they need in order to keep up.

Thanks for reading!