# mathr

## 2,405,518,376 flies can't be wrong

EDIT: some practical things you can do to mitigate the problems below (and others) are mentioned at fixtracking.com, thanks to whoever pointed me to that link.

The internet has many layers. One layer is the world wide web, based on HTTP. When you visit a page in your browser, the browser sends a request to the web server. A simple HTTP request for http://example.com/index.html looks like this:

GET /index.html HTTP/1.1
Host: example.com
Accept: */*

and the response from that server looks like this:

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html
Date: Fri, 04 Oct 2013 10:26:16 GMT
Etag: "359670651"
Expires: Fri, 11 Oct 2013 10:26:16 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (fll/073E)
X-Cache: HIT
x-ec-custom-error: 1
Content-Length: 1270

which is followed by a blank line and then the contents of the page. The Content-Type header shows that this is an HTML page. HTML is text with markup tags that describe the structure of the document. Some of those tags refer to other resources, like images, stylesheets or scripts, and the web browser will automatically fetch these too. When it does, it tells the server which page needed those resources by sending that page's address in the Referer header.
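To make the Referer mechanism concrete, here is a minimal Python sketch of the request a browser might send for a resource embedded in the page. The helper `build_request`, the host `images.example.net` and the path `/logo.png` are made up for illustration, and a real browser sends many more headers than this.

```python
def build_request(path, host, referer=None):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: */*",
    ]
    if referer is not None:
        # The browser names the page that embedded this resource.
        lines.append(f"Referer: {referer}")
    # Header lines end with CRLF; an empty line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example: index.html embeds an image hosted on another server.
page = "http://example.com/index.html"
request = build_request("/logo.png", "images.example.net", referer=page)
print(request)
```

The point is that the second server, which you never chose to visit, learns which page you were reading.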

Moreover, every server can set cookies in its response headers, and the browser is expected to store them and pass them back to the domain that set them. The server uses them to recognise you as the same visitor when you visit another page on that site.
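The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `visitor`, its value and its attributes are invented for this example, and the helper `cookie_header_from` is hypothetical. Real browsers also apply domain, path and expiry rules that this sketch ignores.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value):
    """Turn the value of a Set-Cookie response header into the value of
    the Cookie request header a browser would send back later."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs are returned to the server; attributes such as
    # Path and Max-Age stay in the browser.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical response header: Set-Cookie: visitor=abc123; Path=/; Max-Age=31536000
set_cookie = "visitor=abc123; Path=/; Max-Age=31536000"
cookie_header = cookie_header_from(set_cookie)
print("Cookie:", cookie_header)
```

Every later request to that domain carries the same identifier, which is what lets the server link your visits together.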

I did an experiment. I installed VirtualBox on my machine, and installed a minimal Debian 7.1 inside a virtual machine I called "turd". I installed --without-recommends iceweasel xfce4 gdm3 xorg, and shut down the virtual machine. I then enabled network capture on the virtual machine and started it up.

VBoxManage modifyvm "turd" --nictrace1 on --nictracefile1 turd.pcap
VirtualBox -startvm "turd"

I launched Iceweasel, and entered the terms snowden lanchester in the search box on the start page. I clicked the link in the search results, and read the article. I then shut down the virtual machine and opened the network capture in Wireshark, setting the display filter to show only outbound HTTP requests:

http && ip.src==10.0.2.15

I edited the preferences to alter the columns: I deleted them all and added two Custom columns:

http.request.full_uri
http.referer


I then printed the results to a plain text file (packet summary line for all displayed packets - I had to delete the columns I didn't want printed, as just hiding them didn't seem to work). The text file was quite long. Here's what I found with some simple shell scripts.
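The original shell scripts are not reproduced here, but the tallying could be sketched in Python along these lines. It assumes the capture has already been reduced to (full URI, Referer) pairs, and it treats each distinct hostname as a separate "domain", which may not be how the figures below were counted. The function `summarise` and the sample data are invented for illustration and are not taken from the capture.

```python
from urllib.parse import urlsplit


def summarise(requests, page):
    """Count requests whose Referer is `page`, grouped by hostname.

    `requests` is an iterable of (full_uri, referer) pairs; the referer is
    an empty string when the header was absent.
    """
    hosts = {}
    for full_uri, referer in requests:
        if referer != page:
            continue
        host = urlsplit(full_uri).hostname
        hosts[host] = hosts.get(host, 0) + 1
    return sum(hosts.values()), hosts


# Made-up sample data standing in for the exported packet summary.
page = "http://example.com/index.html"
sample = [
    ("http://example.com/index.html", ""),
    ("http://example.com/style.css", page),
    ("http://ads.example.net/banner.js", page),
    ("http://ads.example.net/pixel.gif", page),
    ("http://ads.example.net/more.js", "http://ads.example.net/banner.js"),
]

total, hosts = summarise(sample, page)
print(total, "requests across", len(hosts), "hosts")
for host, count in sorted(hosts.items()):
    print(count, host)
```

Requests triggered by a fetched script carry that script's address as Referer rather than the page's, so a count like this only covers requests made directly on behalf of the page.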

Browsing the article page:

http://www.theguardian.com/world/2013/oct/03/edward-snowden-files-john-lanchester

resulted in 182 additional HTTP requests with the Referer header set to the article page, across 57 domains: