Advances in JavaScript Performance in IE10 and Windows 8

On Thursday, May 31, 2012, we delivered the Windows 8 Release Preview and the
Sixth IE10 Platform Preview. Windows 8 includes one HTML5 browsing engine that powers both browsing experiences (Metro style and desktop) as well as Metro
style applications that use HTML5 and JavaScript. The release preview represents a major revision of the same modern JavaScript engine, Chakra, which first debuted
with IE9. With each platform preview we make progress against our goals to create an engine that delivers great performance on the Web while ensuring that it is
highly compatible, interoperable, and secure. This post will explore how the JavaScript engine has been enhanced to deliver great performance for emerging Web application
scenarios.

Performance for Real Web Applications

Web applications have been evolving rapidly in recent years. A decade ago the Web consisted primarily of Web sites with static content, like what you may encounter
in a blog, a small business landing
page, or on Wikipedia. The emergence of AJAX helped spawn more complex and interactive sites like what you see on
Facebook or JetSetter. Subsequent advances in performance allowed for large and complex applications to be created,
such as
Office 365, Bing Maps, etc. Most recently, the expansion of the W3C standard APIs, gains in JavaScript performance,
and hardware accelerated graphics made building even sophisticated games on the Web possible, for example, Angry Birds, Pirates Love Daisies,
Cut The Rope, etc.

Diagram showing spectrum of Web pages and their performance characteristics. On the left are Basic Web Pages where Page Load Time is the driving performance goal. On the right are Web Applications, HTML5 Games, and Windows 8 Metro style apps where JavaScript Execution Speed, DOM Interactions, and Accelerated graphics have the biggest impact on performance.

As applications evolve, the performance factors affecting user experience change. For traditional Web sites, initial page load determines how quickly the user can
see the content. Interactive Web sites and large Web applications may be gated by the efficiency of DOM operations, CSS processing, and manipulation of large internal
state in memory. HTML5 games often depend on fast canvas rendering, JavaScript execution and efficient garbage collection. In short, browser performance is a complex
problem, which requires taking into account the needs of a broad spectrum of diverse applications.

In this post we will focus on the performance of only one browser subsystem, the JavaScript engine. With recent gains in JavaScript performance, for many Web applications
JavaScript execution is no longer a limiting factor. On the other hand, as performance increases, new scenarios emerge that place additional demands on the JavaScript
engine. We continually look for opportunities to evolve Chakra to match performance requirements of real JavaScript-intensive applications.

Two-dimensional chart showing screenshots of various sites plotted on two axes: Use of Other Browser Components (Y) and JavaScript Execution (X). Content sites are illustrated in the lower left (least use of other browser components and least use of JavaScript). Graphics-intensive games such as Angry Birds are shown in the top right quadrant.
Dimensions of Web Application Performance

Internals of Chakra

From its inception in IE9, the Chakra JavaScript engine was designed around two guiding principles, which remain equally important in IE10:

  • Minimize the amount of work on the critical path for the user experience. This involves deferring as much work as possible until absolutely necessary, avoiding
    work altogether, making use of periods of inactivity, and parallelizing work to minimize impact on the responsiveness of the application.
  • Take advantage of all available hardware. This translates to utilizing all available CPU cores, as well as generating advanced specialized CPU instructions, for example,
    Intel’s SSE2, if available.

Diagram illustrating the Chakra JavaScript engine's use of two processor cores.
Chakra’s Parallel Architecture

Chakra, though only one of the browser subsystems, is itself composed of several components that work together to process and execute JavaScript code. When the
browser downloads a JavaScript file it hands its content over to Chakra’s parser to verify its syntactical correctness. This is the only operation that applies
to the entire file. Subsequent steps are performed individually on each function (including the global function). As a function is about to be executed (the global
function is run immediately after parsing) Chakra’s parser builds an abstract syntax tree (AST) representation of the code, and hands it off to the bytecode generator,
which produces an intermediate form (bytecode) suitable for execution by the interpreter (but not directly by the CPU). Both the AST and the function bytecode are
preserved so they don’t need to be recreated on subsequent executions. The interpreter is then invoked to run the function. As the interpreter executes individual
operations it collects information (a profile) about the types of inputs it encounters and keeps track of how many times the function was called.

As the number of calls reaches a certain threshold, the interpreter queues the function up for compilation. Unlike in other browsers, Chakra’s just-in-time (JIT)
compiler runs on a separate dedicated thread and thus does not interfere with script execution. The sole job of the compiler is to generate optimized machine instructions
for each function in the compilation queue. Once a function is compiled, the availability of the machine code is signaled to the main script thread. Upon the next
invocation, the entry point to the function is redirected to the newly compiled machine code and execution proceeds directly on the CPU. It’s important to note that
functions that are called only once or twice never actually get compiled, which saves time and resources.
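
The hand-off from interpreter to compiled code can be modeled in a few lines of JavaScript. This is a toy sketch rather than Chakra's implementation: the threshold value, `makeTiered`, and the two function bodies standing in for interpreted and compiled code are all hypothetical, and the "compilation" here is instantaneous instead of happening on a background thread.

```javascript
// Toy model of call-count-based tiering. All names and the threshold are
// hypothetical; the post does not state Chakra's actual heuristics.
var JIT_THRESHOLD = 3; // assumed value, for illustration only

function makeTiered(slowImpl, fastImpl) {
    var calls = 0;
    var entryPoint = slowImpl; // every function starts in the "interpreter"

    function tiered() {
        calls++;
        if (entryPoint === slowImpl && calls > JIT_THRESHOLD) {
            // Stand-in for "machine code is available": redirect the entry point.
            entryPoint = fastImpl;
        }
        return entryPoint.apply(this, arguments);
    }
    tiered.isCompiled = function () { return entryPoint === fastImpl; };
    return tiered;
}

var square = makeTiered(
    function interpreted(x) { return x * x; },
    function compiled(x) { return x * x; }
);

square(2);
square(3);
square(4);
var stillInterpreted = !square.isCompiled(); // three calls: below the threshold
square(5);                                   // fourth call crosses it
```

A function called only a handful of times never leaves the first tier in this model, which mirrors the point above about code that runs once or twice.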

JavaScript is a managed runtime in that memory management is hidden from the developer and performed by an automatic garbage collector, which runs periodically
to clean up any objects that are no longer in use. Chakra employs a conservative, quasi-generational, mark-and-sweep garbage collector that does most of its work
concurrently on a dedicated thread to minimize script execution pauses that would interrupt the user experience.

This architecture allows Chakra to start executing JavaScript code almost immediately during page load. On the other hand, during periods of intense JavaScript
activity, Chakra can parallelize work and saturate up to three CPU cores by running script, compiling and collecting garbage at the same time.

Fast Page Load Time

Even relatively static Web sites tend to use JavaScript for interactivity, advertising, or social sharing. In fact, the volume of JavaScript included in
Alexa’s top 1 million pages has been steadily increasing, as reported by Steve Souders’ HTTP Archive.

Chart showing volume of JavaScript in Alexa’s Top 1 Million Pages
Volume of JavaScript in Alexa’s Top 1 Million Pages

The JavaScript code included in these Web sites must be processed by the browser’s JavaScript engine and the global function of each script file must be executed
before the content can be fully rendered. Consequently, it is crucial that the amount of work performed on this critical path be minimized. Chakra’s parser and
bytecode interpreter were designed with this objective in mind.

Bytecode Interpreter. JavaScript code executed during page load often performs initialization and setup that is executed only once. To minimize the overall
page load time it is imperative to start executing this code immediately – without waiting for a just-in-time compiler to process the code and emit machine instructions.
The interpreter starts running JavaScript code as soon as it is translated into bytecode. To further reduce the time to first executed instruction, Chakra processes
and emits bytecode only for functions that are about to be executed using a mechanism called deferred parsing.

Chart showing the fraction of code executed in 11 popular Web sites. The amount ranges from a little over 30% to a little over 50%.

Deferred Parsing. The JSMeter project from Microsoft Research showed that typical Web pages use only a fraction
of the code that they download, generally on the order of 40-50% (see chart). Intuitively, this makes sense: developers often include popular JavaScript libraries like
jQuery or dojo or custom ones like those used in Office 365, but only leverage a fraction of the functionality the
library supports.

To optimize such scenarios, Chakra performs only the most basic syntax-only parsing of the source code. The rest of the work (building the abstract syntax tree
and generating bytecode) is performed one function at a time only when the function is about to be invoked. This strategy not only helps with the responsiveness
of the browser when loading Web pages, but also reduces the memory footprint.

In IE9 there was one limitation of Chakra’s deferred parsing. Functions nested inside other functions had to be parsed immediately with their enclosing functions.
This restriction proved important because many JavaScript libraries employ the so-called “module
pattern,” in which most of the library’s code is enclosed in a large function which is immediately executed. In IE10 we removed this restriction and Chakra
now defers parsing and bytecode generation of any function that is not immediately executed.
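
For readers unfamiliar with the module pattern, here is a minimal example. The `geometry` library and its functions are invented for illustration; the point is only that the helpers are nested inside an immediately executed function, which is the situation described above.

```javascript
// A library wrapped in the "module pattern": one large function expression
// that is executed immediately and returns the public API.
var geometry = (function () {
    // Nested helpers. A page that only calls area() never invokes
    // perimeter() or diagonal(), so an engine that can defer nested
    // functions has no need to build an AST or bytecode for them.
    function area(w, h) { return w * h; }
    function perimeter(w, h) { return 2 * (w + h); }
    function diagonal(w, h) { return Math.sqrt(w * w + h * h); }

    return { area: area, perimeter: perimeter, diagonal: diagonal };
})();

var a = geometry.area(3, 4); // the only library function this page executes
```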

Performance Improvements for JavaScript-Intensive Applications

In IE10, as in IE9 before, we strive to improve the performance of real Web applications. However, Web applications depend on JavaScript performance to a varying
degree. To discuss the enhancements in IE10 it’s most useful to focus on those applications which are JavaScript-intensive, where improvements in Chakra yield substantial
performance gains. An important class of JavaScript-intensive applications includes HTML5 games and simulations.

At the outset of IE10 development we analyzed a sample of popular JavaScript games (for example, Angry Birds,
Cut the Rope, or Tankworld) and simulations (for example,
FishIE Tank, HTML5 Fish Bowl,
Ball Pool, Particle System) to understand what performance improvements would
have the most significant impact on the user experience. Our analysis revealed a number of common characteristics and coding patterns. All of the applications are driven
by a high frequency timer callback. Most of them use canvas for rendering, but some rely on animating DOM elements, and some use a combination of the two. In most
applications at least portions of the code are written in the object oriented style – either in application code or in included libraries (for example,
Box2d.js). Short functions are common, as are frequent property reads and writes, and polymorphism. All of the applications perform floating point arithmetic
and many allocate a fair amount of memory putting pressure on the garbage collector. These common patterns became the focus of our performance work in IE10. The
following sections describe the changes we’ve made in response.

Just-in-Time Compiler – Reconsidered and Improved

IE10 includes substantial improvements to Chakra’s JIT compiler. We added support for two additional processor architectures: x64 and ARM. As a result, whether your
JavaScript application is experienced by the user on a 64-bit PC or an ARM-based tablet, it enjoys the benefits of executing directly on the CPU.

We also changed the fundamental approach to generating machine code. JavaScript is a very dynamic language, which limits how much a compiler can know when generating
code. For example, when compiling the function below, the compiler doesn’t know the shape (property layout) of the objects involved or types of their properties.

function compute(v, w) {
    return v.x + w.y;
}

In IE9 Chakra’s compiler generated code that located every property at runtime and handled all plausible operations (in the example above: integer addition, floating
point addition, or even string concatenation). Some of these operations were handled directly in machine code, while others required help from Chakra’s runtime.

In IE10, the JIT compiler generates profile-based, type-specialized machine code. In other words, it generates machine code that is tailored to objects of a particular
shape and values of a particular type. To emit the right code the compiler needs to know what types of input values to expect. Because JavaScript is a dynamic language,
this information is not available in the source code. We enhanced Chakra’s interpreter to collect it at runtime, a technique we call dynamic profiling. When a function
is scheduled for JIT compilation, the compiler examines the runtime profile gathered by the interpreter and emits code tailored to the expected inputs.

The interpreter gathers information for the runs it observes, but it’s possible that the execution of the program will lead to runtime values which violate assumptions
made in the generated optimized code. For every assumption it makes, the compiler emits a runtime check. If a later execution results in an unexpected value, the check
fails, execution bails out of the specialized machine code, and is continued in the interpreter. The reason for bailout (the failed check) is recorded, additional profile
information is collected by the interpreter, and the function is recompiled with different assumptions. Bailout and re-compilation are two fundamentally new capabilities
in IE10.
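
The guard-and-bailout idea can be mimicked in plain JavaScript. This is a conceptual model only: real specialized code is machine code, and the names `computeGeneric`, `computeSpecialized`, and `BAILOUT` are hypothetical, as is the choice to count bailouts in a global variable.

```javascript
// Conceptual model of type-specialized code with runtime checks.
var BAILOUT = {};  // sentinel meaning "an assumption was violated"
var bailouts = 0;  // stand-in for recording why and how often we bailed out

// General path: handles whatever the + operator is given.
function computeGeneric(v, w) {
    return v.x + w.y;
}

// Path specialized for "both properties are numbers", guarded by a check.
function computeSpecialized(v, w) {
    if (typeof v.x !== "number" || typeof w.y !== "number") {
        return BAILOUT; // runtime check failed
    }
    return v.x + w.y;   // numeric addition only
}

function compute(v, w) {
    var result = computeSpecialized(v, w);
    if (result === BAILOUT) {
        bailouts++;                  // remember that the assumption failed
        return computeGeneric(v, w); // continue on the general path
    }
    return result;
}

var n = compute({ x: 1.5 }, { y: 2.25 }); // stays on the specialized path
var s = compute({ x: "a" }, { y: "b" });  // check fails, falls back
```

In the model, as in the description above, the specialized path stays short because everything unexpected is pushed out to the fallback.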

The net effect is that Chakra’s IE10 compiler generates fewer machine instructions for your code, reducing the overall memory footprint and speeding up execution.
This particularly impacts apps with floating point arithmetic and object property access, like the HTML5 games and simulations we previously discussed.

If you write JavaScript code in the object oriented style, your code will also benefit from Chakra’s support for function inlining. Object oriented code commonly
contains a large proportion of relatively small methods, for which the overhead of the function call is significant compared to the execution time of the function.
Function inlining allows Chakra to reduce this overhead, but more importantly it greatly expands the scope of other traditional compiler optimizations, such as
loop invariant code motion or copy propagation.
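
To make the benefit concrete, the sketch below shows a loop as a developer might write it, followed by a hand-written approximation of what it amounts to once the small methods are inlined. `Point`, `sumCoords`, and `sumCoordsInlined` are illustrative names; the compiler performs this transformation on its internal representation, not on source code, and it may hoist `points.length` only when it can prove the array is not modified in the loop.

```javascript
function Point(x, y) { this.x = x; this.y = y; }
Point.prototype.getX = function () { return this.x; };
Point.prototype.getY = function () { return this.y; };

// As written: two method calls per iteration.
function sumCoords(points) {
    var total = 0;
    for (var i = 0; i < points.length; i++) {
        total += points[i].getX() + points[i].getY();
    }
    return total;
}

// Roughly what remains after inlining getX/getY: no calls in the loop body,
// and the loop-invariant length read has been moved out of the loop.
function sumCoordsInlined(points) {
    var total = 0;
    for (var i = 0, n = points.length; i < n; i++) {
        var p = points[i];
        total += p.x + p.y;
    }
    return total;
}

var pts = [new Point(1, 2), new Point(3, 4)];
```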

Faster Floating Point Arithmetic

Most JavaScript programs perform some amount of integer arithmetic. As the example below illustrates, even in programs that don’t focus primarily on arithmetic,
integer values are commonly used as iteration variables in loops or as indices into arrays.

function findString(s, a) {
    for (var i = 0, al = a.length; i < al; i++) {
        if (a[i] == s) return i;
    }
    return -1;
}

Floating point math, on the other hand, is typically restricted to certain classes of applications such as games, simulations, sound, image or video processing,
etc. Historically, few such applications were written in JavaScript, but recent advances in browser performance have made JavaScript implementations viable. In
IE9 we optimized Chakra for the more common integer operations. In IE10 we dramatically improved floating point math.

function compute(a, b, c, d) {
    return (a + b) * (c - d);
}

Given the simple function above, a JavaScript compiler cannot determine the types of arguments a, b, c and d from the source code. The IE9 compiler would assume that
the arguments were likely to be integer numbers and generate fast integer machine instructions. This worked very well if during execution the arguments were, indeed,
integers. If floating point numbers were used instead, the code had to rely on much slower helper functions in Chakra’s runtime. The overhead of function calls
was further exacerbated by boxing and unboxing of intermediate values on the heap (in most 32-bit JavaScript engines, including Chakra, individual floating point
values must be allocated on the heap). In the expression above the result of each operation required a heap allocation, followed by storing the value on the heap,
and then retrieval of the value from the heap for the next operation.

In IE10, the compiler takes advantage of the profile information collected by the interpreter to generate dramatically faster floating point code. In the example
above, if the profile indicates that all arguments are likely to be floating point numbers, the compiler will emit floating point machine instructions. The entire
expression will be computed in just three machine instructions (assuming all arguments are already in registers), all intermediate values will be stored in registers,
and only one heap allocation will be required to return the final result.
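
The difference in heap traffic can be counted with a small model. Here `box` is a hypothetical stand-in for allocating a floating point value on the heap, and the two functions imitate the boxed and unboxed strategies described above; neither is real engine code.

```javascript
var allocations = 0;

// Stand-in for a heap-allocated ("boxed") floating point number.
function box(n) {
    allocations++;
    return { value: n };
}

// Boxed model: every intermediate result is allocated on the heap.
function computeBoxed(a, b, c, d) {
    var sum = box(a.value + b.value);
    var diff = box(c.value - d.value);
    return box(sum.value * diff.value);
}

// Unboxed model: intermediates stay in local variables ("registers");
// only the final result is boxed.
function computeUnboxed(a, b, c, d) {
    var sum = a.value + b.value;
    var diff = c.value - d.value;
    return box(sum * diff);
}

var args = [box(1.5), box(2.5), box(4), box(1.5)];

allocations = 0;
var r1 = computeBoxed(args[0], args[1], args[2], args[3]);
var boxedAllocations = allocations;   // three allocations

allocations = 0;
var r2 = computeUnboxed(args[0], args[1], args[2], args[3]);
var unboxedAllocations = allocations; // one allocation
```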

For floating point intensive applications this is a massive performance gain. Experiments show that in IE10 floating point operations execute about 50% faster than
in IE9. In addition, the reduced rate of memory allocation means fewer garbage collections.

Faster Objects and Property Access

JavaScript objects are a convenient and broadly used mechanism for grouping logically related sets of values. Whether you’re using JavaScript objects in a structured
object oriented programming style or merely as flexible packaging for values, your code will greatly benefit from the improvements in object allocation and property
access performance added in IE10.

As mentioned earlier, efficient property access is complicated in JavaScript because the shape of an object isn’t known during compilation. JavaScript objects can
be created ad hoc without a predefined type or class. New properties can be added to (or even removed from) objects on the fly and in any order. As a result, when
compiling the following method, the compiler doesn’t know where to find the values of properties x, y, and z on the Vector object.

Vector.prototype.magnitude = function() {
    return Math.sqrt(this.x * this.x + this.y * this.y + this.z * this.z);
};

In IE9 we introduced inline caches which greatly speed up access to properties. Inline caches remember the shape of the object and the location in the object’s
memory where a given property can be found. Inline caches can remember only one object shape and work well if all objects a function works with are of the same
shape. In IE10 we added a secondary caching mechanism which improves performance of code operating on objects of different shapes (polymorphic).
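
A single-entry inline cache is easy to model, and the model shows why it struggles with polymorphic code. Everything below is a simplification for illustration: `Shape`, `makeObject`, and `makeInlineCache` are hypothetical names, and real caches live in generated code rather than in closures.

```javascript
// A "shape" maps property names to slot offsets.
function Shape(names) {
    this.offsets = {};
    for (var i = 0; i < names.length; i++) {
        this.offsets[names[i]] = i;
    }
}

function makeObject(shape, slots) {
    return { shape: shape, slots: slots };
}

// One cache per property-access site; it remembers a single shape.
function makeInlineCache(name) {
    var cachedShape = null;
    var cachedOffset = -1;
    var stats = { hits: 0, misses: 0 };

    function load(obj) {
        if (obj.shape !== cachedShape) {
            stats.misses++;                         // slow path: look up by name
            cachedShape = obj.shape;
            cachedOffset = obj.shape.offsets[name];
        } else {
            stats.hits++;                           // fast path: offset already known
        }
        return obj.slots[cachedOffset];
    }
    load.stats = stats;
    return load;
}

var xy = new Shape(["x", "y"]);
var zxy = new Shape(["z", "x", "y"]);
var a = makeObject(xy, [1, 2]);
var b = makeObject(xy, [3, 4]);
var c = makeObject(zxy, [9, 5, 6]);

var mono = makeInlineCache("x");
mono(a); mono(b); mono(a);          // same shape: one miss, then hits

var poly = makeInlineCache("x");
poly(a); poly(c); poly(a); poly(c); // alternating shapes: every access misses
```

With objects of one shape the cache misses once and then always hits; with two alternating shapes it never hits, which is the case a secondary cache is meant to help.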

Before a property value can be read the compiler must verify that the object’s shape matches that stored in the inline cache. To do that, in IE9, the compiler
generates a runtime shape check before every property access. Because programs often read or write multiple properties of the same object in close succession (as
in the example below), all these checks add overhead.

function collide(b1, b2) {
    var dx = b1.x - b2.x;
    var dy = b1.y - b2.y;
    var dvx = b1.vx - b2.vx;
    var dvy = b1.vy - b2.vy;
    var distanceSquare = (dx * dx + dy * dy) || 1.0;
    //...
}

In IE10, Chakra generates code tailored to the expected object shape. Through careful symbol tracking combined with bailout and re-compilation capabilities the
new compiler dramatically reduces the number of runtime shape checks performed. In the example above, instead of 8 separate shape checks, only 2 are done, one each
for b1 and b2. In addition, once the shape of an object has been established, all property locations are known, and read or write operations are as efficient as
in C++.
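
The saving can be counted in a small model of the property reads in `collide`. The names `BODY_SHAPE`, `guard`, and `read` are invented for this sketch, and throwing an error is only a placeholder for a bailout.

```javascript
var shapeChecks = 0;
var BODY_SHAPE = {}; // hypothetical token standing in for an internal shape

function makeBody(x, y, vx, vy) {
    return { shape: BODY_SHAPE, x: x, y: y, vx: vx, vy: vy };
}

// A runtime shape check; a failure would trigger a bailout in a real engine.
function guard(obj) {
    shapeChecks++;
    if (obj.shape !== BODY_SHAPE) throw new Error("bailout");
}

// Checked read: one shape check before every property access.
function read(obj, name) {
    guard(obj);
    return obj[name];
}

// One check per access: 8 checks for 8 reads.
function deltasPerAccess(b1, b2) {
    return [read(b1, "x") - read(b2, "x"), read(b1, "y") - read(b2, "y"),
            read(b1, "vx") - read(b2, "vx"), read(b1, "vy") - read(b2, "vy")];
}

// One check per object: 2 checks, then unchecked reads.
function deltasHoisted(b1, b2) {
    guard(b1);
    guard(b2);
    return [b1.x - b2.x, b1.y - b2.y, b1.vx - b2.vx, b1.vy - b2.vy];
}

var b1 = makeBody(5, 7, 1, 1);
var b2 = makeBody(2, 3, 0, 2);

shapeChecks = 0;
var d1 = deltasPerAccess(b1, b2);
var checksPerAccess = shapeChecks; // 8

shapeChecks = 0;
var d2 = deltasHoisted(b1, b2);
var checksHoisted = shapeChecks;   // 2
```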

In ECMAScript 5 objects may contain a new kind of property, called accessor properties. Accessor properties differ from traditional data properties in that custom
get and set functions are invoked to handle the read and write operations. Accessor properties are a convenient mechanism for adding data encapsulation, computed
properties, data validation, or change notification. Chakra’s internal type system and inline caches were designed to accommodate accessor properties and facilitate
efficient reading and writing of their values.
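
As a brief illustration, the standard ECMAScript 5 function `Object.defineProperty` can define an accessor property that combines a computed value with validation. The `Temperature` type and its `_celsius` backing field are made up for this example.

```javascript
function Temperature(celsius) {
    this._celsius = celsius; // ordinary data property used as backing storage
}

// "fahrenheit" is an accessor property: reads and writes run custom functions.
Object.defineProperty(Temperature.prototype, "fahrenheit", {
    get: function () {
        return this._celsius * 9 / 5 + 32;   // computed on every read
    },
    set: function (f) {
        if (typeof f !== "number" || isNaN(f)) {
            throw new TypeError("fahrenheit must be a number"); // validation
        }
        this._celsius = (f - 32) * 5 / 9;
    },
    enumerable: true,
    configurable: true
});

var t = new Temperature(100);
var boiling = t.fahrenheit; // the getter runs here
t.fahrenheit = 32;          // the setter runs here and updates _celsius
```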

If you write an HTML5 game or animation, you often need a physics engine which performs computation required to produce realistic movement of objects under the
force of gravity, simulate collisions, etc. For very simple physics, you may build your own engine, but for more complex requirements, you would typically use
one of the popular physics libraries now available in JavaScript, such as Box2d.js (ported from
Box2d). These libraries often use small objects, such as Point, Vector or Color. On every animation frame a large number of these objects are created and
promptly discarded. Therefore, it’s important that the JavaScript runtime create objects efficiently.

var Vector = function(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
};

Vector.prototype = {
    //...
    normalize : function() {
        var m = Math.sqrt((this.x * this.x) + (this.y * this.y) + (this.z * this.z));
        return new Vector(this.x / m, this.y / m, this.z / m);
    },

    add : function(v, w) {
        return new Vector(w.x + v.x, w.y + v.y, w.z + v.z);
    },

    cross : function(v, w) {
        return new Vector(-v.z * w.y + v.y * w.z, v.z * w.x - v.x * w.z, -v.y * w.x + v.x * w.y);
    },
    //...
};

In IE10, the internal layout of JavaScript objects is optimized to streamline object creation. In IE9 every object consisted of a fixed-size header and an expandable
property array. The latter is necessary to accommodate additional properties that may be added after the object has been created. Not all JavaScript applications
exploit this flexibility, and objects often receive most of their properties at construction. This trait allows Chakra to allocate most of the properties for such
objects directly with the header, which results in only one memory allocation (instead of two) for every newly created object. This change also reduces the number
of memory dereferences required to read or write the object’s property, and improves register utilization. Improved object layout and fewer runtime shape checks
result in up to 50% faster property access.
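
The coding pattern this optimization targets looks like the first function below, where every property is assigned in the constructor. The second function builds an equivalent object by adding properties one at a time to an empty object. Both names are illustrative, and this post does not say how Chakra treats the second form; it states only that objects which receive their properties at construction can have them allocated together with the header.

```javascript
// All properties are assigned at construction time.
function Particle(x, y) {
    this.x = x;
    this.y = y;
    this.vx = 0;
    this.vy = 0;
}

// The same properties, added after the object already exists.
function makeParticleLate(x, y) {
    var p = {};
    p.x = x;
    p.y = y;
    p.vx = 0;
    p.vy = 0;
    return p;
}

var p1 = new Particle(1, 2);
var p2 = makeParticleLate(1, 2);
```

The two objects hold the same data, so the choice between the forms is about how much the engine can learn at allocation time, not about behavior.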

Garbage Collection Enhancements

As discussed above, HTML5 games and animations often create and discard objects at a very high rate. JavaScript programs don’t explicitly destroy discarded objects
and reclaim memory. Instead, they rely on the engine’s garbage collector to periodically reclaim memory occupied by unused objects to make room for new ones. Automatic
garbage collection makes programming easier, but typically requires JavaScript execution to pause every now and then for the collector to do its work. If the collector
takes a long time to run, the whole browser may become unresponsive. In HTML5 games, even short pauses (tens of milliseconds) are disruptive because they are perceptible
by the user as glitches in animation.

In IE10 we made a number of enhancements to our memory allocator and garbage collector. We already discussed object layout changes and generation of machine code
specialized for floating point arithmetic, which result in fewer memory allocations. In addition, Chakra now allocates leaf objects (for example, numbers and strings) from
a separate memory space. Leaf objects don’t hold pointers to other objects, so they don’t require as much attention during garbage collection as regular objects.
Allocating leaf objects from a separate space has two advantages. First, this entire space can be skipped during the mark phase, which reduces its duration. Second,
during concurrent collection, new allocations from the leaf object space don’t require rescanning affected pages. Because Chakra’s collector works concurrently
with the main script thread, the running script may modify or create new objects on pages that have already been processed. To make sure such objects aren’t prematurely
collected, Chakra write-protects pages before the mark phase starts. Pages that have been written to during the mark phase must be later rescanned on the main script
thread. Because leaf objects don’t require such processing, pages from the leaf object space don’t need to be write-protected or rescanned later. This saves precious
time on the main script thread, reducing pauses. HTML5 games and animations benefit significantly from this change, because they often work heavily with floating point
numbers and devote much of the allocated memory to heap-boxed numbers.
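
A toy mark phase shows why leaf objects are cheap for the collector. This is a heavily simplified model with hypothetical names (`makeObject`, `makeLeaf`, `mark`); it ignores concurrency, write protection, and separate memory spaces, and only counts how many objects must be scanned for outgoing references.

```javascript
// Regular objects hold references to other objects; leaf objects do not.
function makeObject(refs) { return { leaf: false, refs: refs, marked: false }; }
function makeLeaf(value) { return { leaf: true, value: value, marked: false }; }

// Marks everything reachable from the roots and returns the number of
// objects whose contents had to be scanned for references.
function mark(roots) {
    var scanned = 0;
    var stack = roots.slice();
    while (stack.length > 0) {
        var obj = stack.pop();
        if (obj.marked) continue;
        obj.marked = true;
        if (obj.leaf) continue; // a leaf holds no pointers: nothing to scan
        scanned++;
        for (var i = 0; i < obj.refs.length; i++) {
            stack.push(obj.refs[i]);
        }
    }
    return scanned;
}

var leafA = makeLeaf(1.5);
var leafB = makeLeaf("name");
var leafC = makeLeaf(2.5);
var root = makeObject([leafA, leafB, makeObject([leafC])]);

var scannedCount = mark([root]); // five live objects, only two scanned
```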

When the user interacts directly with a Web application, it is critical that the application’s code be executed as fast as possible, ideally without interruptions
for garbage collection. However, when the user switches away from the browser, or even just changes tabs, it is important to reduce the memory footprint of the
now inactive site or application. That’s why in IE9 Chakra triggered collection upon exiting JavaScript code if enough memory had been allocated. This worked well
for most applications, but proved problematic for applications driven by high frequency timers, such as HTML5 games and animations. For such applications collections
were triggered too frequently and resulted in dropped frames and overall degradation of the user experience. Perhaps the most apparent manifestation of this problem
was the Tankworld game, but other HTML5 simulations also exhibited pauses in animation induced by frequent garbage collections.

In IE10 we solved this problem by coordinating garbage collections with the rest of the browser. Chakra now delays the garbage collection at the end of script execution
and requests a callback from the browser after an interval of script inactivity. If the interval elapses before any script executes, Chakra starts a collection,
otherwise collection is further postponed. This technique permits us to shrink memory footprint when the browser (or one of its tabs) becomes inactive, while at
the same time greatly reducing frequency of collections in animation-driven applications.
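
The scheduling policy can be sketched as follows. This is a model under stated assumptions, not the browser's actual interface: `makeIdleCollector`, the 100 ms idle interval, and the explicit `now` timestamps are all hypothetical, chosen so the example runs deterministically without real timers.

```javascript
// Collect only after `idleMs` of script inactivity.
function makeIdleCollector(idleMs, collect) {
    var lastScriptEnd = 0;
    var pending = false;
    return {
        // Called whenever a script finishes running.
        scriptFinished: function (now) {
            lastScriptEnd = now;
            pending = true;      // a collection is wanted, but delayed
        },
        // Called by the host some time later (the "browser callback").
        hostCallback: function (now) {
            if (pending && now - lastScriptEnd >= idleMs) {
                pending = false;
                collect();       // the interval elapsed with no script activity
            }
            // Otherwise script ran too recently: keep postponing.
        }
    };
}

var collections = 0;
var gc = makeIdleCollector(100, function () { collections++; });

// An animation whose timer callback runs every 16 ms for about one second.
for (var t = 0; t < 1000; t += 16) {
    gc.hostCallback(t);
    gc.scriptFinished(t);
}
var duringAnimation = collections; // never idle long enough to collect

gc.hostCallback(1200);             // the animation stopped; now idle
var afterIdle = collections;
```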

Combined, these changes reduced the time spent in garbage collection on the main thread by an average factor of four on the HTML5 simulations measured. As a proportion
of JavaScript execution time, garbage collection dropped from around 27% to about 6%.

Summary

IE10 achieves dramatic performance gains for JavaScript-intensive applications, particularly HTML5 games and simulations. These gains were accomplished through
a range of important improvements in Chakra: from new fundamental capabilities of the JIT compiler to changes in the garbage collector.

As we wrap up development on IE10 we celebrate the progress we’ve made, but we are keenly aware that performance is a perpetual quest. New applications emerge almost
daily that test the limits of modern browsers and their JavaScript engines. Without a doubt there will be plenty to work on in the next release!

If you’re a JavaScript developer, we’d love to hear from you. If the new capabilities and performance advances in IE10 helped you create entirely new experiences
for your users, or make existing applications better, please let us know. If you’ve hit any performance limitations in IE, please drop us a note as well. We carefully
read all the comments on this blog, and we strive to make IE10 and Windows 8 the most comprehensive and performant application platform available.

—Andrew Miadowicz, Program Manager, JavaScript


IEBlog