Caching Improvements in Internet Explorer 9

The network plays a crucial role in the overall performance of a web browser. The best way to improve network performance is to minimize the volume of network traffic by using HTTP compression and taking advantage of the browser cache.

We’ve made a tremendous number of improvements to the way that Internet Explorer 9 caches content to ensure that as many resources as possible are loaded from the cache. This post describes those improvements which are now available in the third IE9 Platform Preview which was released last month.

Understanding Caching

Let’s start with a quick refresher on how caching works in browsers. At a high level, web browsers make two types of requests over HTTP and HTTPS—conditional requests and unconditional requests.

An unconditional request is made when the client browser does not have a cached copy of the resource available locally. In this case, the server is expected to return the resource with a HTTP/200 OK response. If the response’s headers permit it, the client may cache this response in order to reuse it later.

If the browser later needs a resource which is in the local cache, that resource’s headers are checked to determine if the cached copy is still fresh. If the cached copy is fresh, then no network request is made and the client simply reuses the resource from the cache.

If a cached response is stale (older than its max-age or past the Expires date), then the client will make a conditional request to the server to determine whether the previously cached response is still valid and should be reused. The conditional request contains an If-Modified-Since and/or If-None-Match header that indicates to the server what version of the content the browser cache already contains. The server can indicate that the client’s version is still fresh by returning HTTP/304 Not Modified headers with no body, or it can indicate that the client’s version is obsolete by returning a HTTP/200 OK response with the new version of the content.

Obviously, conditional requests result in better performance than unconditional requests (because the server need not retransmit the entire file if the client’s version is fresh) but the best performance is obtained when the client knows that the version in its cache is fresh and the conditional revalidation can be avoided entirely.

Extremely Long-Life Cache Headers

While RFC2616 recommends that servers limit freshness to one year, some servers send Cache-Control directives specifying a much longer freshness lifetime. Prior to IE9, Internet Explorer would treat as stale any resource with a Cache-Control: max-age value over 2147483648 (2^31) seconds, approximately 68 years.

With Internet Explorer 9, we now accept any value up to 2^63 for the max-age value, although internally the freshness interval will be truncated to 2^31 seconds.

Vary Improvements

The HTTP/1.1 Vary response header allows a server to specify that a fresh cached resource is valid for future reuse without server revalidation only if the specified request headers in the later request match the request headers in the original request.

For example, this enables a server to return content in English with a Vary: Accept-Language header. If the user later changes their browser’s Accept-Language from en-US to ja-JP, the previously cached content will not be reused, because the Accept-Language request header no longer matches the request header at the time that the original English response was cached.

With Internet Explorer 9, we’ve enhanced support for key Vary header scenarios. Specifically, IE9 will no longer require server revalidation for responses which contain Vary: Accept-Encoding and Vary: Host directives.

We can safely support these two directives because:

  1. All requests implicitly vary by Host, because the host is a component of the request URL.
  2. IE always decompresses HTTP responses in the cache, making Vary: Accept-Encoding redundant.

Like IE6 and above, IE9 will also ignore the Vary: User-Agent directive.

If a response contains a Vary directive that specifies a header other than Accept-Encoding, Host, or User-Agent (or any combination of these) then Internet Explorer will still cache the response if the response contains an ETAG header. However, that response will be treated as stale and a conditional HTTP request will be made before reuse to determine if the cached copy is valid.

Redirect Caching

Internet Explorer 9 now supports caching of HTTP redirect responses, as described by RFC 2616. Responses with Permanent Redirect status (301) are cacheable unless there are headers which forbid it (e.g. Cache-Control: no-cache) and those with Temporary Redirect status (302 or 307) are cacheable if there are headers which permit it (e.g. Cache-Control: max-age=120).

While this significantly improves performance, web applications that are misconfigured might not work as expected after this change. For example, we’ve found a few commonly-visited sites that use a pattern which looks like this:

> GET / HTTP/1.1 
< 301 Redirect to /SetCookie.asp

> GET /SetCookie.asp HTTP/1.1 
< 301 Redirect to /

The site’s goal is to have the homepage determine if the user has a cookie set, and if not, send them to a page that sets the cookie. The problem is that the server has chosen a 301 for this task, and a 301 is cacheable. Hence, IE will simply redirect between these two cached redirects on the client (never again contacting the server) until the user gets bored and closes the browser. Notably, any version of IE would hit a redirect loop in the scenario above if the user had disabled cookie storage for the site in question.

If your site makes use of redirects, you should ensure that it is configured to avoid redirect loops by ensuring that any redirect that relies upon side-effects (e.g. testing or setting a cookie) is marked uncacheable.

HTTPS Caching Improvements

A few months ago, I mentioned that Internet Explorer will not reuse a previously-cached resource delivered over HTTPS until at least one secure connection to the target host has been established by the current process. This can cause previously-cached resources to be ignored, leading to unconditional network requests for content that was already in the local cache.

In IE9, the unnecessary cross-host HTTPS requests are now conditional requests, so the server can simply return a HTTP/304 Not Modified response for unchanged content. While a round-trip cost is still incurred, significant performance improvements are gained because the server does not need to retransmit the entire resource.

Back/Forward Optimization

For IE9, we’ve made improvements so that clicking the back and forward buttons results in faster performance.

RFC2616 specifically states that a browser’s Back/Forward mechanisms are not subject to cache directives:

History mechanisms and caches are different. In particular history
mechanisms SHOULD NOT try to show a semantically transparent view of
the current state of a resource. Rather, a history mechanism is meant
to show exactly what the user saw at the time when the resource was
retrieved.

By default, an expiration time does not apply to history mechanisms.
If the entity is still in storage, a history mechanism SHOULD display
it even if the entity has expired, unless the user has specifically
configured the agent to refresh expired history documents.

In previous versions of Internet Explorer, when the user navigated back or forward, IE would check the freshness of resources if they had been sent with the must-revalidate cache-control directive, and in numerous other circumstances depending on how recently the resource was downloaded. In IE9, the INTERNET_FLAG_FWD_BACK flag behaves as described on MSDN, and IE will not check the freshness of cached resources when the user navigates Back or Forward.

As a result of this optimization, Internet Explorer 9 can perform far fewer conditional HTTP requests when navigating with Back and Forward. For example, the following table shows the improvement when going Back to a typical article on a popular website:

IE8

IE9

Improvement

Back/Forward Navigation

Request Count: 21

Bytes Sent: 12,475

Bytes Recv: 216,580

Request Count: 1

Bytes Sent: 325

Bytes Recv: 144,617

Request Count: -20 (-95%)

Bytes Sent: -12,150 (-97.4%)

Bytes Recv:-71,963 (-33.3%)

Now, I mentioned that we ignore caching directives when navigating back and forward, so alert readers may be wondering why IE9 still makes one request when clicking Back on this site. The reason is that IE will not commit to the cache any uncacheable resource. An uncacheable resource is one delivered with a Cache-Control: no-cache directive or with an Expires date in the past or an Expires date not later than the Date header. Therefore, the browser is forced to redownload such resources when the user navigates Back and Forward. To improve performance and enable a resource to be reused in Back/Forward navigation while still requiring revalidation for other uses, simply replace Cache-Control: no-cache with Cache-Control: max-age=0.

Unlike the other improvements in described in this post, back/forward optimizations are not visible in the Platform Preview build because it does not have a back button.

Heuristic Cache Improvements

Best practices recommend that web developers should specify an explicit expiration time for their content in order to ensure that the browser is able to reuse the content without making conditional HTTP requests to revalidate the content with the server. However, many sites deliver some content with no expiration information at all, relying upon the browser to use its own rules to judge the content’s freshness.

Internet Explorer allows the user to configure what should happen when content is delivered without expiration information. Inside Tools > Internet Options > Browsing history > Settings, you will see four options:

These four options have the following behavior:

Every time I visit the webpage

Any resource without explicit freshness information is treated as stale and will be revalidated with the server before each reuse.

Every time I start Internet Explorer

Any resource without explicit freshness information is validated with the server at least once per browser session (and every 12 hours thereafter in that session).

Automatically (Default)

Internet Explorer will use heuristics to determine freshness.

Never

Any cached resource will be treated as fresh and will not be revalidated.

These options only control the browser’s behavior when content is delivered without expiration information; if the content specifies an explicit policy (e.g. Cache-Control: max-age=3600 or Cache-Control: no-cache) then the browser will respect the server’s directive and the options here have no effect.

In earlier IE versions, the Automatic Heuristics were simple and only affected cached images, but IE9 improves the heuristics to match RFC2616’s suggested behavior:

if the response does have a Last-Modified time, the heuristic
expiration value SHOULD be no more than some fraction of the interval
since that time. A typical setting of this fraction might be 10%.

If Internet Explorer 9 retrieves a cacheable resource which does not explicitly specify its freshness lifetime, a heuristic lifetime is calculated as follows:

max-age = (DownloadTime – LastModified) * 0.1

If a Last-Modified header wasn’t present in the server’s response, then Internet Explorer will fall back to the “Once per browser session” revalidation behavior.

As a result of the improvement to heuristic caching, Internet Explorer 9 can perform far fewer conditional HTTP requests when reloading many pages. For example, the following table shows the improvement when revisiting a typical article on a popular website:

IE8

IE9

Improvement

Revisit in new browser session (PLT2)

Request Count: 42

Bytes Sent: 26,050

Bytes Recv: 220,681

Request Count: 2

Bytes Sent: 1,134

Bytes Recv: 145,217

Request Count: -40

(-95.3%)

Bytes Sent: -24,916 (-95.6%)

Bytes Recv: -75,464(-34.2%)

The Caching Inspector in Fiddler will show you when a response expires, based on the headers provided on that response. For instance, here’s what you see for a response which contains an ETAG and Last-Modified header, but no expiration information:

Other Networking Improvements

In this post, I’ve enumerated the improvements in Internet Explorer’s caching code that help ensure web sites can make the most efficient possible use of the network. Of course, web developers should continue to follow best practices and specify their desired cache behavior using the Expires and Cache-Control headers, but even sites that fail to do so will load more quickly in IE9.

In a future post, I’ll describe other improvements we’ve made to the IE9 Networking Stack to further improve page load times.

-Eric Lawrence


IEBlog