The bugfix that could make the Internet 5% faster (cereto.net)
71 points by dudus on Jan 12, 2012 | 27 comments


I don't think this argument stands up to technical examination. His claim is based on this:

1. The average size of HTTP request headers is 700 bytes.
2. Of that, around 25% is Google Analytics.
3. About 50% of web sites use GA.

Thus you could save 12% of HTTP request header size across the Internet.
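The arithmetic behind the 12% figure can be checked in a few lines of Python. The inputs are the three figures from the list above, and the function name is illustrative. Like the original claim, it assumes every request to a GA-using site carries the GA cookie.

```python
def expected_header_savings(avg_header_bytes: float,
                            ga_share_of_header: float,
                            ga_adoption: float) -> tuple[float, float]:
    """Return (bytes saved per average request, fraction of header bytes saved).

    Assumes every request to a GA-using site carries the GA cookie, which
    overstates the effect when sites serve assets from cookieless hostnames.
    """
    saved_per_ga_request = avg_header_bytes * ga_share_of_header
    saved_per_request = saved_per_ga_request * ga_adoption
    return saved_per_request, saved_per_request / avg_header_bytes


saved_bytes, saved_fraction = expected_header_savings(700, 0.25, 0.50)
print(f"{saved_bytes:.1f} bytes, {saved_fraction:.1%}")  # 87.5 bytes, 12.5%
```

The 12.5% result is a saving in request-header bytes only, not in total traffic or page-load time.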

Then he posits that this would make the net 5% faster. How?

If the average size is really 700 bytes then they will fit inside a single TCP packet and so changing it from 700 to 525 will make no difference at all. The entire request will be in a packet. The real cost is multiple packets requiring ACKs and that occurs with the larger header sizes that SPDY specifically targets with HTTP header compression. I've seen many egregious cookies, but the GA ones are tiny.
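To make the packet argument concrete: the number of TCP segments a request needs is its size divided by the maximum segment size (MSS), rounded up. The sketch below assumes an MSS of 1460 bytes, which is typical for a 1500-byte Ethernet MTU with 20-byte IPv4 and 20-byte TCP headers and no TCP options; real paths may use a smaller value.

```python
import math

# Assumed MSS: 1500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP
# headers (no TCP options). Real paths may negotiate a smaller value.
DEFAULT_MSS = 1460


def segments_needed(request_bytes: int, mss: int = DEFAULT_MSS) -> int:
    """Number of TCP segments needed to carry a request of the given size."""
    if request_bytes <= 0:
        return 0
    return math.ceil(request_bytes / mss)


for size in (700, 525):
    print(size, "bytes ->", segments_needed(size), "segment(s)")
```

Both sizes fit in a single segment, so trimming 175 bytes does not change the packet count. Only a request that straddles a multiple of the MSS would benefit.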

If you really want to speed up the web you need something like SPDY plus SDCH to do data-aware compression of repeated chunks of HTML.


Packet sizes make very little difference on an unsaturated line, which is often the case on the client side. It's very often not the case on the server side, or on mobile networks. There, it's not at all uncommon to see line saturation at peak usage, and under those circumstances pure bandwidth optimizations can be helpful.


The server's uplink is certain to be full-duplex and outbound heavy, so inbound bandwidth is effectively unconstrained. You still could run into line congestion somewhere in the middle, in which case bandwidth optimizations would help, but router congestion is much more common.

Except in mobile networks, as you note.


I'd guess that the mean request size is misleading though - it is presumably the average of a mix of lightweight CDN requests and gargantuan app requests loaded up with the egregious cookies jgrahamc mentions. It's not hard to imagine a healthy number of requests hovering right at the brink of breaking into two packets with a 1500 byte MTU.
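One way to test this guess against real data is to count how many requests drop from two segments to one when a cookie of a given size is removed. The sample sizes and the 175-byte cookie below are made-up illustrative numbers, not measurements, and the 1460-byte MSS assumes a 1500-byte MTU with plain IPv4/TCP headers. In this sample the mean is 840 bytes, well under one segment, yet three of the ten requests would save a segment.

```python
import math

MSS = 1460  # assumed: 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers


def segments(size: int) -> int:
    """TCP segments needed for a request of `size` bytes."""
    return math.ceil(size / MSS)


def requests_saved_a_segment(sizes: list[int], cookie_bytes: int) -> int:
    """Count requests that need fewer segments once cookie_bytes are removed."""
    return sum(1 for s in sizes
               if segments(max(s - cookie_bytes, 0)) < segments(s))


# Hypothetical sizes in bytes: small CDN-style requests plus a few
# cookie-heavy app requests. Illustrative only, not measured data.
sample_sizes = [300, 320, 340, 360, 380, 400, 1480, 1520, 1600, 1700]
print("mean size:", sum(sample_sizes) / len(sample_sizes))
print("requests saving a segment:", requests_saved_a_segment(sample_sizes, 175))
```

The mean alone hides this: what matters is how many requests sit within one cookie's length above a segment boundary.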


I agree that my math is flawed and that I ignored the whole TCP packet-splitting argument. Still, that GA cookie could easily be pushing millions of requests into an extra TCP packet every day. But even if the impact is much lower, why should we keep sending such unnecessary data in those HTTP requests?


The speed of the request is probably not linear with the size either.

At 700-800 bytes, an average HTTP request is below the size of a single packet. Given the overhead of processing the packet, the extra bytes are pretty cheap: a 1-byte packet is not going to be 1000% faster than a 1K packet.

Now, if the cookie pushed the average request size over the packet boundary then the cost would be huge.


Exactly, and even if it were 5%, which it by no means is, what percentage of the time is spent on the request vs. the response? This sounds like a solution to a non-problem.


We must also consider the relative cost of Google implementing fully cross-browser-compatible localStorage code in ga.js, with a fallback to cookies. They're likely to prefer putting the burden on the client rather than on their own bandwidth.


Please, I beg you, do not call the web "the internet". The web is but a subset of the internet. Yes, it is taking over the rest of the internet (web-mail and such). Yes, client-side HTTP is often the only part of the internet you can have from many networks. But no, that is not a good thing.

But because of sloppy wording, non-technical people get the idea that if they can query web servers, then "they have the internet". Most restrictions go unnoticed.

I know it may sound pedantic. It may even remind you of Stallman's GNU/Linux. But really, it can't hurt to say "web" instead of "internet": unlike "GNU/Linux", it is shorter than the incorrect word.


The moment you notice GA is present in about 50% of top websites you notice that useless GA cookies going around the internet represent 12% of all HTTP requests.

This is incorrect. "GA is present in about 50% of top websites" does not mean "50% of HTTP requests are on the domain with the cookies." Many websites load external images, etc. See ytimg.com, etc.


Of course, websites that use cookieless domains for resources reduce this issue by an order of magnitude, but it still exists for at least the HTML document request. Besides, if you consider that the top million sites, half of which use GA, probably represent 70%-80% of total internet traffic, the real gain may be even larger.


1) How can half the traffic on a subset ever represent more than half of the traffic on the superset? The 12% number assumes that these top million sites have effectively all global traffic; in practice it will only ever be less than that.
2) These top million sites are exactly the sort that are likely to serve static content from a different hostname.
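Point 1 can be written as a product of fractions, each at most 1, so the result can never exceed the GA adoption rate; point 2 shows up as the third factor. The function and parameter names are hypothetical, and the 0.75 and 0.4 values are placeholders, not measurements.

```python
def share_of_requests_with_ga_cookie(top_sites_traffic_share: float,
                                     ga_adoption: float,
                                     same_host_request_share: float) -> float:
    """Fraction of all HTTP requests that carry a GA cookie (hypothetical model).

    top_sites_traffic_share: fraction of all requests going to the top sites.
    ga_adoption: fraction of the top sites' requests going to GA-using sites.
    same_host_request_share: fraction of a GA site's requests sent to the
        cookie-bearing hostname (assets on a cookieless domain are excluded).
    """
    for name, value in (("top_sites_traffic_share", top_sites_traffic_share),
                        ("ga_adoption", ga_adoption),
                        ("same_host_request_share", same_host_request_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction in [0, 1], got {value}")
    return top_sites_traffic_share * ga_adoption * same_host_request_share


# Placeholder inputs: 75% of traffic to top sites, 50% GA adoption,
# 40% of requests on the cookie-bearing hostname.
print(share_of_requests_with_ga_cookie(0.75, 0.5, 0.4))
```

Even with the first and third factors at their maximum of 1, the result is capped at the 50% adoption rate; any cookieless asset domains only push it lower.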


"We're talking about the average speed of the whole internet"

5% fewer bytes in the request headers would seem to be well within the noise in terms of user experience when you consider latency and application runtime.
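A rough way to see why: compare the time it takes to serialize the saved bytes onto the link with a single round trip. The 1 Mbit/s uplink and 100 ms round-trip time below are illustrative assumptions, not figures from the thread; 175 bytes is 25% of a 700-byte header.

```python
def serialization_ms(num_bytes: int, link_bits_per_second: float) -> float:
    """Time to put num_bytes on the wire at the given link rate, in ms."""
    return num_bytes * 8 / link_bits_per_second * 1000


SAVED_BYTES = 175          # 25% of a 700-byte request header
UPLINK_BPS = 1_000_000     # assumed 1 Mbit/s client uplink
RTT_MS = 100.0             # assumed round-trip time

saved_ms = serialization_ms(SAVED_BYTES, UPLINK_BPS)
print(f"saved {saved_ms:.2f} ms vs RTT {RTT_MS:.0f} ms "
      f"({saved_ms / RTT_MS:.1%} of one round trip)")
```

Under these assumptions the saving is about 1.4 ms, roughly 1.4% of a single round trip, before counting server time or the response.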

This is probably a good argument for using a secondary hostname for assets (like facebook, with fbcdn.net), though, if you want to shave the last few bytes off of requests you don't need to track.
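The reason a separate hostname helps is cookie scoping: a browser attaches a cookie only to requests whose host domain-matches the cookie's domain. Below is a simplified sketch of the domain-match rule from RFC 6265 (section 5.1.3). It ignores host-only cookies, IP-address hosts, public-suffix checks, and the Path and Secure attributes, and the hostnames are illustrative.

```python
def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match: would a cookie whose Domain
    attribute is cookie_domain be sent with a request to request_host?"""
    host = request_host.lower()
    domain = cookie_domain.lower()
    # A single leading dot in the Domain attribute is ignored (RFC 6265, 5.2.3).
    if domain.startswith("."):
        domain = domain[1:]
    # Exact match, or host ends with "." + domain (so "notfacebook.com"
    # does not match "facebook.com").
    return host == domain or host.endswith("." + domain)


# Hostnames below are illustrative.
print(domain_match("www.facebook.com", "facebook.com"))   # True: cookie is sent
print(domain_match("static.fbcdn.net", "facebook.com"))   # False: cookie-free
print(domain_match("notfacebook.com", ".facebook.com"))   # False
```

Because fbcdn.net never domain-matches facebook.com, none of the facebook.com cookies ride along on asset requests.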


I never understood that: why would moving assets to another domain name speed things up? Wouldn't it involve another DNS lookup, thus adding to download time?

edit: I remember reading about this on Yahoo's speed guide, actually: http://developer.yahoo.com/performance/rules.html#cookie_fre...

Still don't quite get it though. Cookie data is always passed in HTTP requests for some reason?


Yeah, there's an initial hit, but only one DNS lookup per session-or-so if you've got your TTLs right. You're likely to hit hundreds (if not thousands) of assets in a typical Facebook session, though.

Edit: Actually, in Facebook's case, Akamai has a 20 second TTL. But it's not difficult to perform ~1k asset requests in that amount of time... just scroll through a large friend list.
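The amortization argument as arithmetic: one lookup per TTL window, spread over however many requests go to that hostname in the window. The 20-second TTL and roughly 1,000 requests come from the comment above; the 50 ms lookup time is an assumed placeholder, and the sketch assumes the resolver cache honours the TTL.

```python
def amortized_dns_ms(lookup_ms: float, requests_per_ttl_window: int) -> float:
    """Average DNS cost per request when one lookup serves a whole TTL window."""
    if requests_per_ttl_window <= 0:
        raise ValueError("requests_per_ttl_window must be positive")
    return lookup_ms / requests_per_ttl_window


LOOKUP_MS = 50.0   # assumed cold DNS lookup time
REQUESTS = 1000    # ~1k asset requests within one 20-second TTL window

print(f"{amortized_dns_ms(LOOKUP_MS, REQUESTS):.3f} ms per request")
```

Spread over that many requests, the extra lookup costs a few hundredths of a millisecond per asset.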


This article presupposes that a vast share of the internet's data consists of basic, tracked HTTP requests.

Huge quantities of the traffic on the internet are video data being slung about (which has nowhere near the GA penetration the poster is assuming). Cisco's estimate is that video will be >50% of all traffic in 2012.

Source: http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/...


Or block ga entirely in your browser for 100% speed recovery against tracking.

But seriously, GA will still work if you add DEFER and make it load only after page load. Then there is zero impact on the page.


You're missing the point. I'm not worried about the rendering time here. I'm worried about the HTTP request size. And even though blocking GA fixes the issue for you, it's still present for others.


But then GA might not record that someone loaded the page and clicked a link to navigate away, before the page was done loading.


So what? Why should you count this case?


Forgive me if I am missing something, but wouldn't moving the cookies to localStorage still require an additional call to send the data to the server in the end, not to mention the additional JavaScript required to handle the localStorage management and AJAX calls?


The idea is just to get rid of the extra overhead on the HTTP request, because that overhead is multiplied by the number of requests you make. Any extra code in ga.js is cached and doesn't incur much overhead.


But you would still have an additional HTTP request sending the same information plus headers for another call. In the end it would be the same performance-wise as if you were DEFERing the request, but you wouldn't be saving any overhead on the data being passed around.
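A back-of-the-envelope model of the trade-off debated in this exchange: in both designs the analytics request (the beacon) carries the GA state once, but with a cookie the browser also attaches that state to every request to the cookie-bearing hostname. The function, the assumption that the beacon carries the state in both cases, and the 40-request, 175-byte figures are all hypothetical; the beacon's own fixed headers are ignored because they are sent either way.

```python
def ga_bytes_per_page(same_host_requests: int, ga_cookie_bytes: int,
                      use_local_storage: bool) -> int:
    """Bytes of GA state sent while loading one page (hypothetical model).

    Both designs send the state once in the analytics beacon. With a cookie,
    the browser additionally attaches it to every same-host request; with
    localStorage, ordinary requests no longer carry it.
    """
    beacon_bytes = ga_cookie_bytes  # state sent once to the analytics server
    if use_local_storage:
        return beacon_bytes
    return beacon_bytes + same_host_requests * ga_cookie_bytes


print(ga_bytes_per_page(40, 175, use_local_storage=False))  # 7175
print(ga_bytes_per_page(40, 175, use_local_storage=True))   # 175
```

Under this model the beacon still goes out, as the comment above points out, but the per-request copies disappear; how much that matters depends on how many same-host requests a page makes.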


That's not a bug at all, it's a feature.

How is Google supposed to track repeat visitors without any kind of state except IP? And IP is an awful variable for state because of NAT.

The heading is misleading and overly sensational.


The idea is not to get rid of the data stored in the cookie, but to move it into localStorage instead. It is still accessible through JS, but it doesn't go into the HTTP request.


But the data still needs to go the Google Analytics server to actually do the tracking...in an HTTP request.


Yes, but you wouldn't have to send the cookie to Google's servers on every page load. This is a great use of localStorage or sessionStorage in HTML5. Another obvious benefit of sessionStorage in particular is that you can store your session ID there and not have to worry about logging in from a new tab causing a session to leak from one tab to the other.

Cookies are shared between tabs and across browser sessions, but sessionStorage is not: it is scoped to a single tab. localStorage is similar, except that it persists across browser sessions and is shared between tabs.



