Tuesday, July 13, 2010

Experiments with 'Static' JavaScript: As Fast As Native Code?

I don't really know much about computer language theory. I just mess around with it in my spare time (weekends mostly). Here is one thing I've been thinking about: I want the speed of native code on the web - because I want to run things like game engines there - but I don't want Java, or NaCl, or some plugin. I want to use standard, platform-agnostic web technologies.

So, how about if we could compile JavaScript into native code? Of course, in general we can't, at least not very well. But if a JavaScript program - or part of a program - happens to be implicitly statically typed, and 'performance-friendly' in other ways, then we should be able to compile at least that code and run it very quickly.

In fact PyPy does basically that with RPython - a subset of Python that it can translate into C. So, building on that, I made a demo of the following process:
  • Begin with some benchmark in JavaScript
  • Parse the JavaScript using the SpiderMonkey parser (a small hack to its code was needed to make it dump the syntax tree in a convenient format)
  • Translate the parsed code into Python (using a Python script, see below)
  • If the generated code happens to be RPython, happily run that through PyPy's RPython translator to get native code, and execute that
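The translation step (JavaScript syntax tree to Python source) can be sketched roughly like this. This is a minimal illustration, not the code used for the demo: the nested-tuple AST format, node kinds such as "binop", and the function names are all hypothetical, and only a handful of constructs are handled.

```python
# A toy translator from a hypothetical JavaScript AST (nested tuples) to
# Python source. The tuple format is an assumption for illustration only;
# it is not the format produced by the modified SpiderMonkey parser.

# JS operators with a direct Python equivalent. "/" and "%" are deliberately
# left out: JS "%" truncates toward zero while Python's floors, so they need
# more careful handling than a table lookup.
BINOPS = {
    "+": "+", "-": "-", "*": "*",
    "<": "<", "<=": "<=", ">": ">", ">=": ">=",
    "===": "==", "!==": "!=", "&&": "and", "||": "or",
}


def expr(node):
    """Translate an expression node into a Python expression string."""
    kind = node[0]
    if kind == "num":
        return repr(node[1])
    if kind == "name":
        return node[1]
    if kind == "binop":
        _, op, left, right = node
        return f"({expr(left)} {BINOPS[op]} {expr(right)})"
    raise NotImplementedError(f"expression kind {kind!r}")


def block(body, depth):
    """Translate a list of statements; an empty JS block becomes 'pass'."""
    lines = [line for s in body for line in stmt(s, depth)]
    return lines or ["    " * depth + "pass"]


def stmt(node, depth):
    """Translate one statement node into a list of indented Python lines."""
    pad = "    " * depth
    kind = node[0]
    if kind in ("var", "assign"):  # this sketch treats both as local assignment
        _, name, value = node
        return [f"{pad}{name} = {expr(value)}"]
    if kind == "return":
        return [f"{pad}return {expr(node[1])}"]
    if kind == "while":
        _, cond, body = node
        return [f"{pad}while {expr(cond)}:"] + block(body, depth + 1)
    if kind == "if":
        _, cond, then, orelse = node
        lines = [f"{pad}if {expr(cond)}:"] + block(then, depth + 1)
        if orelse:
            lines += [f"{pad}else:"] + block(orelse, depth + 1)
        return lines
    if kind == "function":
        _, name, params, body = node
        return [f"{pad}def {name}({', '.join(params)}):"] + block(body, depth + 1)
    raise NotImplementedError(f"statement kind {kind!r}")


def translate(program):
    """Translate a list of top-level statements into Python source."""
    return "\n".join(line for s in program for line in stmt(s, 0)) + "\n"


# function sum_below(n) { var total = 0; var i = 0;
#   while (i < n) { total = total + i; i = i + 1; } return total; }
SUM_AST = [("function", "sum_below", ["n"], [
    ("var", "total", ("num", 0)),
    ("var", "i", ("num", 0)),
    ("while", ("binop", "<", ("name", "i"), ("name", "n")), [
        ("assign", "total", ("binop", "+", ("name", "total"), ("name", "i"))),
        ("assign", "i", ("binop", "+", ("name", "i"), ("num", 1))),
    ]),
    ("return", ("name", "total")),
])]
```

Running `exec(translate(SUM_AST), ns)` with an empty dict `ns` defines `ns["sum_below"]`, and `ns["sum_below"](10)` returns 45. Whether the resulting Python is also valid RPython is a separate question, decided by the RPython translator.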
The code I wrote for this is horribly ugly, but if you must look at it, here it is (if I decide this idea is worth more work, I will almost certainly rewrite it from scratch).

Here are the results for the fannkuch benchmark (with n=10, on a Core 2 Duo laptop):


The first row in the chart shows a C++ implementation of fannkuch, compiled with -O3. All the other rows start with a JavaScript implementation and proceed from there. First there is simply the result of running V8 and SpiderMonkey on the benchmark. Next are the results of automatically converting the JavaScript to Python and running it in CPython, which is very slow. Finally, that Python is run through PyPy's RPython translator, which generates native code that runs very fast.

The final result is that the JavaScript => Python => PyPy pipeline runs the code only 26% slower than a C++ implementation of the same benchmark compiled at -O3. For comparison, V8 running that same JavaScript takes 3.76 times as long as the C++ version. So, it seems there is potential here to get (some) JavaScript running very fast, close to the speed of optimized native code.
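Put in terms of those two ratios, with the C++ running time as the baseline, the pipeline also comes out roughly three times faster than V8 on this benchmark:

```latex
t_{\text{pipeline}} \approx 1.26\, t_{\text{C++}}, \qquad
t_{\text{V8}} \approx 3.76\, t_{\text{C++}}
\quad\Longrightarrow\quad
\frac{t_{\text{V8}}}{t_{\text{pipeline}}} \approx \frac{3.76}{1.26} \approx 2.98
```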

Of course, there is a big startup cost here - the conversion process takes a lot of time (at least PyPy is fun to watch as it runs ;). And JavaScript on the web needs to start up quickly. But there should be technical solutions for this, for example running the code normally while attempting the conversion in a separate thread (so, on another CPU core). If the conversion finishes successfully, swap in the faster version of the code (or, if you can't hot-swap it, at least remember the compiled version and use it the next time you visit that website, which would still help for websites you visit often).
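The idea of running the code normally while a conversion happens in the background, then swapping in the result and caching it for later visits, can be sketched in Python with a thread and a dictionary. This is only an illustration of the control flow under assumptions: `HotSwappable`, `compile_fn`, and keying the cache by the script's URL are hypothetical names and choices, and a browser would be swapping compiled machine code rather than Python callables.

```python
import threading


class HotSwappable:
    """Run a slow implementation until a background 'compile' finishes.

    `interpreted` is the normal (slow) callable, `compile_fn` is a
    hypothetical conversion step that returns a faster callable or raises,
    and `cache` maps a key (e.g. the script's URL) to earlier results so a
    later visit can use the fast version immediately.
    """

    def __init__(self, interpreted, compile_fn, cache, key):
        self._lock = threading.Lock()
        self._cache = cache
        self._key = key
        self._thread = None
        if key in cache:
            # Compiled on an earlier visit: use it from the start.
            self._impl = cache[key]
        else:
            self._impl = interpreted
            # Run the expensive conversion on a separate thread while the
            # interpreted version keeps serving calls.
            self._thread = threading.Thread(
                target=self._compile, args=(compile_fn,), daemon=True)
            self._thread.start()

    def _compile(self, compile_fn):
        try:
            compiled = compile_fn()
        except Exception:
            return  # Conversion failed: just keep using the normal version.
        with self._lock:
            self._cache[self._key] = compiled  # remember it for next time
            self._impl = compiled              # hot-swap for this session

    def __call__(self, *args):
        with self._lock:
            impl = self._impl
        return impl(*args)
```

Note that in CPython the global interpreter lock keeps two Python threads from executing Python code on separate cores at the same time, so this sketch shows only the swap-and-cache logic, not the parallelism a real engine would get.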

So, as I said, this is just a little demo I worked on in my spare time, to see if it can even work, and to learn more about the topic. It actually seems to sort of work (I was not very optimistic before I started), but I'm still unsure whether it makes sense to pursue. Feedback is welcome.

(I also have a few other experiments I'm working on, that are closely related, and involve LLVM. I'll leave those for a future post.)


Edit: Jernej asked about Rhino in a comment, so for comparison I also ran this code in Rhino and Jython. Rhino took 22.51 seconds - more than twice as long as SpiderMonkey - and Jython took 74.53 seconds, closer to CPython's time than to that of the next-slowest result (Rhino). Neither of these - Rhino or Jython - is particularly fast in general, but they do give other significant benefits (integration with the JVM, proper multithreading, security, etc.).

Edit 2: Following comments by Sayre here and dmandelin elsewhere, here is the result for JaegerMonkey on this benchmark: 4.62 seconds. That is much faster than TraceMonkey, and almost as fast as V8 (very cool!).