$Id: cookies.html,v 1.11 2003/02/17 23:49:10 dean Exp $

I wrote this page (while working at HotWired) as a generic response to the significant number of user complaints about HotWired tracking cookies that have been forwarded to me. I am generally the contact person for cookie questions at HotWired. I believe the users' complaints are misplaced -- instead of complaining about the sites using tracking cookies, they should be asking the browser developers for more features to control cookies. I am not going to touch on the privacy issues of tracking, but I will mention that cookies are just the tip of the iceberg. I am not going to examine HotWired's motivation to use tracking cookies; that's not my job at all.

I'm hoping to provide enough information for the technical and non-technical readers... and I'm sure I'm failing for both.

Stateful vs. Stateless Protocols

HTTP is a stateless protocol. This means that an HTTP server has no information in a request to tie it to any other request. The data in a response is based only on the information the client sends in the request. It's like doing a math problem in high school -- you are only allowed to use the facts given in the problem plus mathematical logic to derive an answer.

HTTP stands out from all the other protocols you're probably familiar with. Those protocols are all "stateful": information divulged in one request can be used to modify future requests. In fact, these protocols have a concept of a "session" wherein a batch of requests is sent and responses are received. FTP (file transfer protocol) has many states, including "the current directory". SMTP (simple mail transfer protocol) and POP (post office protocol) both include a concept of "who you are" which is used for all requests. NNTP (network news transfer protocol) allows you to "change usenet groups" to direct where future requests for articles will be retrieved from.

Stateless protocols generally have the advantage that they require fewer resources on the server -- the resources are pushed into the client. But the disadvantage is that the client needs to tell the server enough information on each request to be able to get the proper answer. Cookies are a method for a server to ask the client to store arbitrary data for use in future connections. The server is asking the client to keep state information.
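To make the exchange concrete, here is a minimal sketch in Python of the round trip: the server asks the client to store a value, and the client hands it back on a later request. The function names and the cookie name "visitor" are made up for illustration; only the standard library's http.cookies module is used.

```python
from http.cookies import SimpleCookie


def server_set_cookie(name, value):
    """Build the value of the Set-Cookie header a server would send."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


def client_cookie_header(set_cookie_value):
    """Store the cookie on the client and build the Cookie request header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server; attributes such as Path stay behind.
    return "; ".join(f"{morsel.key}={morsel.value}" for morsel in jar.values())


def server_read_cookie(cookie_header, name):
    """Recover the stored state from a later request, or None if it is absent."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return jar[name].value if name in jar else None


if __name__ == "__main__":
    header = server_set_cookie("visitor", "abc123")  # hypothetical name and value
    sent_back = client_cookie_header(header)
    print(header)
    print(sent_back)
    print(server_read_cookie(sent_back, "visitor"))
```

The server itself remembers nothing between the two requests; all of the state lives in what the client chooses to send back.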

An Optional Standard

Cookies are not part of the HTTP/1.0 specification. They are an optional extension designed by Netscape. For this reason not all clients support cookies. The standard does not specify any method for a client to tell a server that it supports or doesn't support cookies. A server essentially has to guess if a browser supports cookies. One guess is to use the User-Agent string (this is a piece of text that identifies your browser -- "Mozilla/3.01" would indicate Netscape 3.01). But testing for that indicates whether the browser supports cookies, not whether the user wants their browser to support cookies.
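A rough sketch of the User-Agent guess, in Python. The table of products and minimum versions is an assumption invented for this example, not a list any server is known to use, and the function name is hypothetical.

```python
import re

# Assumed table: product token -> lowest major version presumed to handle cookies.
# These entries are illustrative guesses only.
ASSUMED_COOKIE_SUPPORT = {"Mozilla": 1}


def guess_cookie_support(user_agent):
    """Guess from the User-Agent string whether the browser can handle cookies."""
    match = re.match(r"([^/\s]+)/(\d+)", user_agent or "")
    if not match:
        return False
    product, major = match.group(1), int(match.group(2))
    minimum = ASSUMED_COOKIE_SUPPORT.get(product)
    return minimum is not None and major >= minimum


if __name__ == "__main__":
    print(guess_cookie_support("Mozilla/3.01"))
    print(guess_cookie_support("Lynx/2.6"))
```

Note what the guess cannot do: it says nothing about whether the person behind the browser wants cookies at all.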

In typical tracking cookie implementations, an attempt is made to send a cookie on every hit that didn't have a cookie in the request. Here are potential server-side modifications to tracking cookies which I don't feel are satisfactory solutions. I'd be interested in hearing more.
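For reference, the typical implementation described at the start of the previous paragraph (not any modification of it) can be sketched in a few lines of Python. The cookie name "tracker", the function name, and the use of a random UUID as the identifier are all assumptions made for illustration.

```python
import uuid
from http.cookies import SimpleCookie

TRACKING_COOKIE = "tracker"  # hypothetical cookie name


def tracking_header_for(request_headers):
    """Return a Set-Cookie value if the request carried no tracking cookie, else None."""
    incoming = SimpleCookie()
    incoming.load(request_headers.get("Cookie", ""))
    if TRACKING_COOKIE in incoming:
        return None  # the client already sent one; nothing to do
    outgoing = SimpleCookie()
    outgoing[TRACKING_COOKIE] = uuid.uuid4().hex  # assumed id format
    outgoing[TRACKING_COOKIE]["path"] = "/"
    return outgoing[TRACKING_COOKIE].OutputString()


if __name__ == "__main__":
    first = tracking_header_for({})
    print(first)
    print(tracking_header_for({"Cookie": first.split(";")[0]}))
```

A client that declines the cookie is simply offered a fresh one on its next hit, which is why users who refuse cookies see the prompt over and over.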

Fix it in the Browser

The solution which deals with these problems in the best way is to add the following easy to implement features to the "accept cookie" dialogue:

For best benefits, the browser should save your choice so that you don't have to make the choice again each time you visit the site. If you don't want tracking cookies, then deny all cookies of that name. If you don't want any cookies at all from a particular site, then choose that. But note that for some things, cookies are necessary. (For example, you can't use the "set as default" button on HotBot without accepting the cookie.)
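Saving the choice is not much work for a browser. Here is a minimal sketch in Python, assuming choices are remembered per site and per cookie name within a site; the class name, method names, and host names are invented for the example.

```python
class CookiePolicy:
    """Remembers accept/deny choices so the user is asked only once."""

    def __init__(self):
        self._site_rules = {}  # site -> True (accept) or False (deny)
        self._name_rules = {}  # (site, cookie name) -> True or False

    def remember_site(self, site, accept):
        self._site_rules[site] = accept

    def remember_name(self, site, name, accept):
        self._name_rules[(site, name)] = accept

    def decide(self, site, name):
        """Return the saved choice, or None if the user still has to be asked."""
        if (site, name) in self._name_rules:
            return self._name_rules[(site, name)]  # the more specific rule wins
        return self._site_rules.get(site)


if __name__ == "__main__":
    policy = CookiePolicy()
    policy.remember_name("www.example.com", "tracker", False)  # no tracking cookie
    policy.remember_name("search.example.com", "prefs", True)  # needed for saved settings
    print(policy.decide("www.example.com", "tracker"))
    print(policy.decide("search.example.com", "prefs"))
    print(policy.decide("other.example.com", "anything"))
```

Letting the per-name rule override the per-site rule is a design choice of this sketch: it allows a user to refuse a site in general while still accepting the one cookie a feature depends on.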

HotWired is one of thousands of sites using tracking cookies. Apache, the most commonly used server, has included a module that implements tracking cookies since at least revision 1.0. Even if you convinced HotWired to tweak our tracking cookie system, are you going to convince the rest of the sites? By contrast (based on HotWired user-agent statistics), there's an 80% chance that you are using either Netscape Navigator or Microsoft Internet Explorer, so there are only two companies to convince to add functionality to their browsers.

I completely understand that Netscape and Microsoft are large companies, and are not likely to respond at all to any single request from a single user. However, if all the people who have complained to HotWired wrote to Netscape and Microsoft, those companies would be more likely to listen. I'm sure the original "accept cookie" dialogue was prompted by users annoyed at the privacy issues of the cookie spec.

The HTTP State Management Mechanism draft proposal also requires browsers and servers to provide more information to the user for controlling cookies. It also deals with many of the other (gross) problems that the current cookie spec has.

Appendix: Tracking Without Cookies

I have a more recent document which expands on these methods.

So you're busy avoiding tracking cookies, thinking you're protecting your privacy. Allow me to outline a basic technique, one you cannot control, that allows your session to be tracked. Your IP address is the key to this. Your IP address identifies you uniquely during your session (modulo firewall considerations -- but for the vast majority of dialup and university users, this statement is true).

The heuristics are clear. Consider any hits from an IP address within 10 minutes of each other to be part of the same session. Use the Referer header to track the progress of that user through your site.
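The heuristic fits in a few lines. Here is a minimal sketch in Python, assuming each hit has already been parsed from the server log into a (timestamp in seconds, IP address, URL, Referer) tuple; that tuple layout and the function names are assumptions of the example.

```python
from collections import defaultdict

SESSION_GAP = 10 * 60  # ten minutes, in seconds


def sessions_by_ip(hits):
    """Group hits into sessions: same IP, each hit within 10 minutes of the previous one.

    hits is an iterable of (timestamp, ip, url, referer) tuples.
    Returns a dict mapping ip -> list of sessions, each a list of hits in time order.
    """
    by_ip = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h[0]):
        sessions = by_ip[hit[1]]
        if sessions and hit[0] - sessions[-1][-1][0] <= SESSION_GAP:
            sessions[-1].append(hit)  # continues the most recent session
        else:
            sessions.append([hit])  # gap too long (or first hit): new session
    return dict(by_ip)


def path_through_site(session):
    """List (referer, url) pairs in order, showing how the visitor moved around."""
    return [(referer, url) for _, _, url, referer in session]


if __name__ == "__main__":
    hits = [
        (0, "192.0.2.1", "/", ""),
        (300, "192.0.2.1", "/news.html", "/"),
        (2000, "192.0.2.1", "/", ""),
        (100, "192.0.2.7", "/", ""),
    ]
    for ip, sessions in sessions_by_ip(hits).items():
        for session in sessions:
            print(ip, path_through_site(session))
```

Nothing here requires the visitor's cooperation: every field comes from what the client sends with any ordinary request.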

There are even more tricks that can glean tracking information. Suppose you go to a page with frames. Unless you use "view document info" or "view document source", you have no idea of the URLs used to load the frame components. It would be only moderately difficult to insert tracking information into those URLs, something similar to what pathfinder does, but without the URL being obvious. If you've got JavaScript enabled, then the links on the pages can show any message they want in the message window at the bottom -- so even if the link goes to some butt-ugly URL with a tracking id embedded in it, the message you see looks all nice and pretty.
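Embedding an id in a URL takes very little. Here is a minimal sketch in Python using the standard urllib.parse module; the query parameter name "tid" and both function names are assumptions made for illustration, and a real site could hide the id anywhere in the path instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, tracking_id, param="tid"):
    """Return url with the tracking id added as a query parameter (replacing any old one)."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, tracking_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_tag(url, param="tid"):
    """Recover the tracking id from a requested URL, or None if it carries none."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)


if __name__ == "__main__":
    framed = tag_url("http://www.example.com/frame/nav.html", "abc123")
    print(framed)
    print(read_tag(framed))
```

A server would run every link and frame source on an outgoing page through something like tag_url, then call read_tag on each incoming request to tie it back to the same visitor.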

Then there are sites like Network Fusion which access all their information from databases and use entirely cryptic URLs. The URLs certainly contain tracking information -- you can't read the site without registering, and have to "log in" in order to read. Unfortunately they haven't done it very well because there's absolutely no way to direct someone to documents on the site with a URL. They use "docids" to refer people to parts of their site in email. But this problem can be solved (similar to how you can give URLs to parts of pathfinder's site by removing the cookie).

Here's a method that uses an extension to HTTP/1.0 (a part of HTTP/1.1) called keepalive. In HTTP/1.0, each HTTP request requires a new TCP/IP connection to be initiated and then torn down after the response. This unfortunately wastes a significant amount of bandwidth on the bookkeeping for each TCP/IP connection. Keepalive addresses this by defining how to issue multiple requests and receive responses over a single TCP/IP connection. With appropriate care in the server implementation, it can be ensured that your client will open essentially only 4 connections (4, or whatever your "simultaneous network connections" setting is) to the server for your entire session. The server knows when you've left because your client closes (one or more of) the connections. So not only can you be tracked through the entire session, but the server also knows how long you've been visiting. You won't even know it's happening. (I don't think any site presently does this, but I'd be interested in hearing of any that do.)
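The bookkeeping a server would need for this is small. The Python sketch below only models it: connection events are fed in by hand rather than read from real sockets, connections are grouped by client address (an assumption of this sketch), and the class and method names are invented.

```python
class ConnectionTracker:
    """Models a server that follows a visit through its persistent connections."""

    def __init__(self):
        self._open = {}    # connection id -> client address
        self._visits = {}  # client address -> {"start", "end", "requests"}

    def opened(self, conn_id, client, now):
        self._open[conn_id] = client
        visit = self._visits.setdefault(client, {"start": now, "end": None, "requests": []})
        visit["end"] = None  # a new connection means the visit is ongoing

    def request(self, conn_id, url):
        # Every request on a known connection belongs to that connection's visitor.
        self._visits[self._open[conn_id]]["requests"].append(url)

    def closed(self, conn_id, now):
        client = self._open.pop(conn_id)
        if client not in self._open.values():
            self._visits[client]["end"] = now  # last connection closed: visitor has left

    def requests(self, client):
        return list(self._visits[client]["requests"])

    def duration(self, client):
        """Length of the visit in seconds, or None while a connection is still open."""
        visit = self._visits[client]
        return None if visit["end"] is None else visit["end"] - visit["start"]


if __name__ == "__main__":
    tracker = ConnectionTracker()
    tracker.opened(1, "192.0.2.1", now=0)
    tracker.opened(2, "192.0.2.1", now=1)
    tracker.request(1, "/")
    tracker.request(2, "/logo.gif")
    tracker.closed(1, now=50)
    tracker.closed(2, now=90)
    print(tracker.requests("192.0.2.1"), tracker.duration("192.0.2.1"))
```

No cookie, URL rewriting, or Referer header is involved; the connection itself is the identifier.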

In short, I'm saying that cookies are only one form of tracking. With the additions in the draft proposal mentioned above, plus the dialogue additions I'm asking for, cookies are quite manageable. You know the cookies are there, and you can see the site trying to track you. Isn't that much better than trying to dumb down your browser enough that there's no known way to track you? It's an arms race.

Dean Gaudet