Friday, December 21, 2012

Emscripten News: LLVM 3.2, etc.

LLVM 3.2 was just released today, and as with every LLVM release emscripten is switching to it. (We only support a single LLVM version at a time; if you don't want to upgrade to LLVM 3.2 just yet, you can use older revisions of emscripten.)

LLVM 3.2 brings, as usual, a large number of general improvements and bugfixes. There isn't anything in particular that will be noticeable with emscripten, except for a change in how LLVM does linking. It now requires an explicit list of symbols to keep alive. This was quite puzzling to me at first, but respindola explained it, and it is a very nice change for LLVM to make: linking is more consistent there now. We do have all the necessary symbol information in emscripten, but were not passing it to LLVM; emscripten has now been modified to do so.

The result is that in some cases more unneeded code can be removed, resulting in smaller generated code, which is great (for example ammo.js is 2% smaller). However, if you do not explicitly keep a function alive (either by using EMSCRIPTEN_KEEPALIVE or __used__ in the C/C++, or adding it to EXPORTED_FUNCTIONS), then LLVM may remove it.

Another improvement landing in master together with this, unrelated to LLVM 3.2, is better linking of .a archives. We now only use the object files that are actually required, and will not link in others. This can also reduce the size of the generated code, but again, if you are not careful, needed functions may be removed, in particular because the link order of archives matters (libA.a libB.a will only link in parts of libA that were required by things before it on the commandline, not things in libB).
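To make the link-order point concrete, here is a toy model in JavaScript. It is only a sketch of the general "pull in archive members that satisfy currently undefined symbols" behavior described above, not emscripten's actual linking code, and the archive contents and symbol names are made up for illustration.

```javascript
// Toy model: each archive is a list of object files with the symbols they
// define and the symbols they need. All names here are hypothetical.
function linkArchives(initialUndefined, archives) {
  const undefinedSyms = new Set(initialUndefined);
  const definedSyms = new Set();
  const linked = [];
  for (const archive of archives) {
    let progress = true;
    // Keep scanning this archive until none of its remaining objects is needed.
    while (progress) {
      progress = false;
      for (const obj of archive.objects) {
        if (linked.includes(obj.name)) continue;
        // Only pull in an object that defines something still undefined.
        if (!obj.defines.some((s) => undefinedSyms.has(s))) continue;
        linked.push(obj.name);
        obj.defines.forEach((s) => { definedSyms.add(s); undefinedSyms.delete(s); });
        obj.needs.forEach((s) => { if (!definedSyms.has(s)) undefinedSyms.add(s); });
        progress = true;
      }
    }
  }
  return { linked, unresolved: [...undefinedSyms] };
}

const libA = { objects: [{ name: 'a.o', defines: ['helper'], needs: [] }] };
const libB = { objects: [{ name: 'b.o', defines: ['run'], needs: ['helper'] }] };

// The main program needs run(); run() in libB needs helper() from libA.
const aThenB = linkArchives(['run'], [libA, libB]);
const bThenA = linkArchives(['run'], [libB, libA]);
console.log('libA.a libB.a ->', aThenB);
console.log('libB.a libA.a ->', bThenA);
```

In this model, putting libA first leaves helper unresolved, because nothing before libA on the commandline asked for it yet, while putting libB first links both objects.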

Finally, another change you might notice if you use emscripten is that it now has better support for systems with both Python 2 and 3 installed at the same time. ack wrote a big patch to make our usage of python much cleaner to enable that. One significant consequence is that we now look for python2 in the python script shebangs. So if you run ./emcc and do not have python2 in your path, you will get an error. Solutions are to either run python emcc or add a symlink named python2 that points to python.

Monday, November 12, 2012

Emscripten Compiler Upgrades

Several major architecture improvements have landed in the last few weeks in Emscripten, here is an overview.

New Eliminator

The eliminator optimization phase was originally written by Max Shawabkeh. It basically removes unneeded variables, so

  var x = f(a);
  var y = x + g(b);

can be optimized into

  var y = f(a) + g(b);

This can greatly reduce the size of the code as well as improve performance, and was fundamental for our approach of relying on a combination of the eliminator + the closure compiler to go from LLVM's SSA representation into a register-like format: The eliminator removes large amounts of unneeded variables, and the closure compiler then reduces the number of variables further by reusing the ones that remain.

The eliminator could be slow on large functions, however, because it calculated the transitive closure of dependencies between all the variables, an expensive calculation. It also missed out on some opportunities to optimize because of some simplifying assumptions it made in its design. A final downside was it integrated poorly with the rest of our optimizations (in part due to being written in a different language, CoffeeScript).

I rewrote the eliminator entirely from scratch, in order to do a more precise analysis of which variables can be removed. I also simplified the problem slightly by only eliminating variables that have a single use - this makes it far faster, and I don't see any downside in the quality of the generated code (in fact it avoids some possible bad cases, although it took a long time to figure out what was going on in them). The new version is faster in general and far faster on the rare bad cases (100x even), and generates better-performing code to boot.

Parallel Optimizer

With all of the emscripten optimization passes now in JavaScript, I then worked on parallelizing that. We can't run one pass before the previous one finishes, but within each pass, we can work separately in each function - optimizing each function is independent of the other (we used to have some global optimization passes, but their benefit was very limited).

The parallelization is done using Python's process pool. It splits up the JavaScript into chunks and runs those in multiple JavaScript optimizer instances. The speedup can be close to linear in the number of cores. On BananaBread, the optimization passes become almost twice as fast on my dual-core laptop.
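The chunking step can be sketched roughly as follows. The real implementation uses a Python process pool, so this JavaScript version only illustrates the idea of grouping functions into chunks of similar size; the function name and the size limit are assumptions for the example.

```javascript
// Group function source strings into chunks of roughly `maxChunkSize`
// characters, so each chunk could be handed to a separate optimizer instance.
// The names and the size limit are illustrative assumptions.
function chunkFunctions(functionSources, maxChunkSize) {
  const chunks = [];
  let current = [];
  let currentSize = 0;
  for (const src of functionSources) {
    // Start a new chunk when adding this function would exceed the limit.
    if (current.length > 0 && currentSize + src.length > maxChunkSize) {
      chunks.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(src);
    currentSize += src.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const funcs = [
  'function a() { return 1 }',
  'function b() { return 2 }',
  'function c() { return 3 }',
];
// Each function is 25 characters, so a limit of 55 fits two per chunk.
const chunks = chunkFunctions(funcs, 55);
console.log(chunks.length, 'chunks');
```

Since each function is optimized independently, the chunks can be processed in any order and the outputs concatenated afterwards.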

Parallel Compiler

With the optimizer parallel, there remain two phases that can be slow: The compiler (the initial code conversion from LLVM to JavaScript) and Closure Compiler. We can't do much for Closure, but in the long term it will become less and less important: we are implementing specific optimizations for the things we used to rely on it for, which leaves just minifying the code.

For the LLVM to JS compiler, I made the emscripten compiler parallel as well: It splits up the LLVM IR into 3 main parts: type data, function data, and globals. The function data part is unsurprisingly by far the largest in all cases I checked (95% or so), and it can in principle be parallelized - so I did that. Like in the optimizer, we use a Python process pool which feeds chunks of function data to multiple JavaScript compiler instances. There is some overhead due to chunking, and the type data and globals phases are not parallelized, but overall this can be a close to linear speedup.

Overall, pretty much all the speed-intensive parts of code generation and optimization in Emscripten are now parallel, and will utilize all your CPU cores automatically. That means that if you experience slow compilation speeds, you can just throw more hardware at it. I hear they are selling 16-core CPUs now ;)

New Relooper

The relooper is an optimization performed (unlike most optimizations) during initial code generation. It takes basic blocks of code and branching information between them and generates high-level JS control flow structures like loops and ifs, which makes the code run far faster. The original relooper algorithm was developed together with the implementation I wrote in the compiler. Eventually some aspects of how it works were found to be suboptimal, so specific optimizations were added to the JS optimizer ('hoistMultiples', 'loopOptimizer'), overall giving us pretty good generated code.

Meanwhile I wrote a new version of the relooper in C++. There were 2 reasons for that choice of language: First, because other projects needed something like it, and C++ was a better language for them, and second, because we had plans to evaluate writing an LLVM backend for emscripten that would also need to reloop in C++ (note: we decided against the LLVM backend in the end). The new version avoids the limitations of the first, and generates better code. In particular it has no need for additional optimizations done after the fact. It also implements some additional tweaks that are missing in the first one, like node splitting in some cases and more precise removal of loop labels when they are not needed, etc. It's also a much cleaner codebase.

I brought that new version of the relooper into Emscripten by compiling it to JS and using it in the JS compiler. This makes compilation faster both because the new relooper is faster than the previous one (not surprising as often compiled code is faster than handwritten code), and because the additional later optimizations are no longer needed, for overall about a 20% speedup on compiling BananaBread. It also generates better code, for example it can avoid a lot of unneeded nesting that the previous relooper had (which caused problems for projects like jsmess).

Note that this update makes Emscripten a 'self-hosting compiler' in a sense: one of the major optimization passes must be compiled to JS from C++, using Emscripten itself. Since this is an optimization pass, there is no chicken-and-egg problem: We bootstrap the relooper by first compiling it without optimizations, which works because we don't need to reloop there. We then use that unoptimized build of the relooper (which reloops properly, but slowly since it itself is unoptimized) in Emscripten to compile the relooper once more, generating the final fully-optimized version of the relooper, or "relooped relooper" if you will.


Thursday, October 25, 2012

Emscripten News: BananaBread, Nebula3, GDC Online, Websockets, Worker API, Performance, etc

I haven't found time to blog about individual things, so here is one post that summarizes various Emscripten-related things that happened over the last few months.

BananaBread

BananaBread, a port of the Cube 2 game engine to the web, was launched and then received a few minor updates with bugfixes and some additional experimental levels. Feedback was good, and it was linked to by a Firefox release announcement and later a Chromium release announcement, in both cases to show that each browser is now capable of running first person shooter games.

Nebula 3

Cube 2 isn't the only game engine being ported to the web using Emscripten, this post by a Nebula 3 dev is worth reading, and check out the demo it links to. Nebula is a powerful game engine that like id tech engines gets open source releases now and then, and has been used in some impressive games (like this). Very cool to see it working well in JS+WebGL, especially given the dev's initial skepticism - read the blogpost! :)

GDC Online

I gave a talk together with Kevin Gadd at GDC Online, here are my slides. We talked about compiling games to HTML5, I focused on C++ and Kevin on C#, so overall we covered a lot of potential codebases that could be automatically ported to the web.

Among the demos I gave was of course BananaBread, as an example of a 3D first person shooter compiled from C++ and OpenGL to JavaScript and WebGL. Interestingly, Adobe gave a talk later that day about porting games to web browsers, which compared 4 platforms: WebGL/JS, Flash, NaCl, and Unity, and for the WebGL/JS demo they also presented BananaBread, so it ended up being shown twice ;)

Workers API

Support for worker threads is in the incoming branch, look in emscripten.h and for tests with "worker_api" in them in tests/runner.py. This API basically lets you compile code into "worker libraries" that the main thread can then call and get responses from, giving you an easy way to do message-passing style concurrency.

The API is in initial stages, feedback is welcome.

Networking

Initial support for networking using websockets has also been implemented, see tests with "websockets" in their name. Basic sockets usage works, however we have had troubles with setting up a testing websocket server with binary support, see the issue for details. Because of that this won't work on arbitrary binary data yet. If you know websockets and websocket servers and are interested in helping with this, that would be great.

Another approach we intend to work on, and where help would be welcome, is WebRTC. WebRTC could actually be easier to work with since it supports p2p connections, so it's easy to test a connection from one page to itself. It also supports UDP-style unreliable data, so we should be able to get multiplayer working in BananaBread when that is complete.

Library Bindings to JavaScript

We currently have the "bindings generator" which is used to make ammo.js and box2d.js. It works for them, but needs manual hacking and has various limitations. A more proper approach is being worked on, contributed by Chad Austin, which he called "embind". This is a more explicit, controllable approach to bindings generation, and in time it should give us big improvements in projects like ammo.js and box2d. If you use those projects and want them to improve, the best way is to help with the new embind bindings approach. We have some initial test infrastructure set up, and there are various bugs filed with the tag "embind" if you are interested.

LLVM backend

I did some experiments with an LLVM backend for Emscripten when I had free time over the last few months. The results were interesting, and I got some "hello world" stuff working, during which I learned a lot about how LLVM backends are built.

Overall this is a promising approach, and it is how pretty much all other compilers from languages like C into JS work. However, this is going to be low priority, for two main reasons. First, we simply lack the resources: There are many, many other things that are very important for me to work on in Emscripten (see other points in this blogpost for some), and we have not had luck in interesting people to collaborate on this topic so far. Second, while my investigations were mostly positive, they also found negatives in going the LLVM backend route. Some types of optimizations that make sense for JavaScript are an uncomfortable fit for LLVM's backends, which is not surprising given how different JS is from most target languages. It's possible to overcome those issues, of course, but it isn't the optimal route.

Why do pretty much all the other compilers go the LLVM backend route? I suspect it might have to do with the fact that they typically do not just compile to JS. For example, if you already have a compiler into various targets, then when you consider also compiling into JS, it is simplest to modify your existing approach to do that as well. Emscripten on the other hand is 100% focused on JS only, so that's a fundamental difference. If all you care about is targeting JS, it is not clear that an LLVM backend is the best way to go. (In fact I suspect it is not, but to be 100% sure I would need to fully implement a backend to compare it to.)

Compiler and Code Perf

To continue the previous point, there is however one aspect of an LLVM backend that is greatly beneficial - it's written in efficient C++ and will compile your code quickly. Emscripten on the other hand is written in JavaScript and has some complex optimization passes that do a lot of work on a JS AST, and these can take a long time. A fully optimized build of BananaBread, for example, takes about 3 minutes on my laptop, and while it's a big project there are bigger ones of course that would take even more.

On the one hand, this doesn't matter that much - it's done offline by the compiler. People running the generated code don't notice it. But of course, making developers' lives easier is important too.

In Emscripten the goal has always been to focus more on performance of the generated code rather than performance of the compiler itself, so we have added new optimization passes even when they were expensive in compilation time, as long as they made the generated code faster. And we rely on tools like Closure Compiler that take a long time to run but are worth it.

But compiler perf vs code perf isn't an all-or-nothing decision. Right now on the incoming branch there are some (not fully finished and slightly buggy, but almost ready) optimizations that improve compilation time quite a bit. And with those in place we can move towards parallel compilation in almost all of the optimization passes, so with 8 cores you might get close to 8x speedups in compilation, etc.

So the current goal is to focus on the existing compiler. It will get much faster than it is now, but it will probably never get close to the speed an LLVM backend could get, that's the tradeoff we are making in order to focus on generating faster code. An additional reason this tradeoff makes sense is that we currently have plans for several new types of optimizations to make the generated code yet faster, and it is far easier for us to work on them in the current compiler than an LLVM backend.

Record/Replay

Finally, we added a system for recording and replaying of Emscripten-compiled projects (see reproduceriter.py). With it you basically compile your project in a special mode, run it in record mode and do stuff, then you can run the project in replay mode and it will replay the exact same output you saw before.

The main use case for this is benchmarks: If you have a program that depends on user input and random things like timing or Math.random(), then it is very hard to generate a good benchmark from it because you get different code being run each time. With the record/replay facility you can basically make a reproducible execution trace.
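The general idea of record/replay can be sketched in a few lines of JavaScript. This is a simplified illustration of the concept only, not how reproduceriter.py works internally; the helper names and the stand-in "program" are made up.

```javascript
// Wrap a source of nondeterminism so its values are logged as they are used.
function makeRecorder(source) {
  const log = [];
  return {
    next() {
      const value = source();
      log.push(value);
      return value;
    },
    log,
  };
}

// Feed a previously recorded log back in, in the same order.
function makeReplayer(log) {
  let index = 0;
  return {
    next() {
      if (index >= log.length) throw new Error('replay log exhausted');
      return log[index++];
    },
  };
}

// A stand-in "program" whose behavior depends on random input.
function simulate(random, steps) {
  let position = 0;
  for (let i = 0; i < steps; i++) {
    position += random.next() < 0.5 ? -1 : 1;
  }
  return position;
}

const recorder = makeRecorder(Math.random);
const recordedResult = simulate(recorder, 100);
const replayedResult = simulate(makeReplayer(recorder.log), 100);
console.log(recordedResult === replayedResult); // same execution both times
```

Because the replay run sees exactly the values the recorded run saw, it takes the same branches, which is what makes the trace usable as a benchmark.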

This has been tested on BananaBread so far, and used to create BananaBench, a benchmark based on BananaBread. You can either run it in the browser or in the shell, and hopefully a real-world benchmark like this will make it easier to optimize browsers for this kind of content.

Tuesday, July 24, 2012

BananaBread (or any compiled codebase) Startup Experience

This post talks about startup experience improvements in BananaBread. BananaBread is a specific 3D game engine compiled from C++ to JavaScript, but the principles are general and could be applied to any large compiled codebase.

Starting up "nicely"

Starting up as quickly as possible is always best on every platform. This is a general issue so I won't focus on it. On the web, there is also another important criterion, which is to start up in as asynchronous a way as possible. By asynchronous I mean to not run in a single large event on the main thread: Instead it is better for as much as possible to be done on background threads (web workers) and for what does run on the main thread to at least be broken up into small pieces.

Why is being asynchronous important? A single long-running event makes the page unresponsive - no input events are being handled and no output is being shown. This might seem not that important for startup, when there is little interaction anyhow. But even during startup you want to at least show a progress bar to give the user an indication that things are moving along, and also most browsers will eventually warn the user about nonresponsive web pages, showing a scary "slow script" dialog with an option to cancel the script or close the page.

Asynchronize all the things..?

If you're writing a new codebase, you would indeed make everything asynchronous. All pure startup calculations would be done in background threads, and main thread events would be very short. Here is an example of such a recently-launched product: The worst main thread stall during startup seems to be about half a second, not bad at all, and a friendly progress bar updates you on the current status. When you are writing a new codebase it is straightforward to design in a way that makes nice startup like that achievable.

When you're compiling an existing codebase, things are harder, though. A normal desktop application does not need to be written in an asynchronous manner. While it might have a main loop that can easily be made asynchronous (run each main loop iteration separately), startup can be just a single continuous process of execution with some periodic notifications to update a progress meter. That is exactly the situation with Cube 2, the game engine compiled to JavaScript in the BananaBread project.

Now, there is a way to have long-running JavaScript code in browsers: Run it in a web worker. That would be the perfect solution, however, workers do not have access to WebGL or audio, and there is no way for them to send synchronous messages to the main thread, so even proxying those APIs to them is not practical. So unless you can easily box off "pure calculation" parts into workers, you do need to run most or all of your code on the main thread.

But you can still asynchronize even such a codebase. Here is what startup was like until recently: BananaBread r13, and here is what it looks like now: BananaBread r15. The worst main thread stall is 1.4 seconds on my laptop, which is not great but definitely enough to prevent "slow script" warnings on most machines, and there is now a progress bar.

Means of asynchronization

The first important thing is to find small chunks of computation that are easily done ahead of time and their results cached for later:
  • In BananaBread jpg and png images must be decoded into pixel data. Emscripten does that during the preloading phase, each one is decoded by a separate Image element. This not only breaks things up into small pieces, it also uses the browser's native decoders, so it happens faster than if we had compiled a decoding library with the rest of the game engine. (A clever browser might also do these decodings in parallel..)
  • Crunch files need to be decompressed using a compiled JavaScript decoder. We do that during preloading as well, with the decoder running in a web worker.
  • Cube 2 levels (or maps as they are called) are gzip compressed, and the engine decompresses them during startup. I refactored that and BananaBread now decompresses them using zee.js during preloading, also in a worker.
Taking these three points together, BananaBread can use three cores during the preloading phase. This is actually an improvement on the original game engine which is single-threaded!

After preloading, the compiled engine starts to run and we are necessarily single-threaded. The important thing to do at this stage is to at least break up the startup code into small-enough pieces to avoid freezing the main thread. This requires refactoring the original source code, and is not the most fun thing in the world, but definitely possible. Emscripten provides an API to help with this (emscripten_push_main_loop_blocker etc.), you can define a queue of functions to be called in sequence, each in a separate invocation. So the tricky part is just to deal with the codebase you are porting.
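The "queue of functions, each in a separate invocation" idea looks roughly like this when written directly in JavaScript. This is a sketch of the pattern, not the implementation behind emscripten_push_main_loop_blocker; the function names and startup steps are hypothetical, and the scheduler is passed in so the example can run outside a browser.

```javascript
// Run a queue of startup steps one at a time, yielding between steps.
// `schedule` is injectable; in a page it would default to setTimeout.
function runBlockers(blockers, onProgress, schedule) {
  schedule = schedule || ((fn) => setTimeout(fn, 0));
  let done = 0;
  function step() {
    if (done === blockers.length) return;
    blockers[done]();                      // one short piece of startup work
    done++;
    onProgress(done / blockers.length);    // e.g. update a progress bar
    schedule(step);                        // let events be handled, then continue
  }
  schedule(step);
}

// Usage with hypothetical startup steps and a manual scheduler that stands
// in for the browser event loop.
const order = [];
const progress = [];
const pending = [];
runBlockers(
  [() => order.push('textures'), () => order.push('shaders'), () => order.push('map')],
  (fraction) => progress.push(fraction),
  (fn) => pending.push(fn)
);
while (pending.length > 0) pending.shift()();
console.log(order, progress);
```

The main thread is never blocked for longer than the slowest single step, which is why breaking up the biggest functions matters most.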

Over a few days I broke up the biggest functions called during startup, getting from a maximum of 6 seconds to 1.4 seconds. Browsers seem to complain after around 10 seconds, so 1.4 isn't perfect, but on machines 7x slower than my laptop things should still be ok. Further breaking up will be hard as it starts to get into intricate parts of the game engine - it's possible, but it would take serious time and effort.

Other notes

Of course, there are other big factors with startup:
  • Download time: My personal server that hosts the BananaBread links above is not that fast, and doesn't even do gzip compression. We hope to get a more serious hosting solution before BananaBread hits 1.0.
  • GPU factors: BananaBread compiles a lot of shaders and uploads a lot of textures during startup. On the plus side, the time these take is probably not much different than a native build of the engine, but it's noticeable there too.
  • Data: Smaller levels lead to faster startup and vice versa. Our levels aren't done yet, we'll optimize them some more.
  • Subjective factors: You can't render during long-running JavaScript calculations, but music will keep playing, which makes for a less-boring experience for the user. Also, a nice splash screen during startup is a good idea, we should do that... ;)

Thursday, July 19, 2012

Experimenting with an LLVM backend for JavaScript

We have started to experiment with an LLVM backend for JavaScript. The reasons, approach and other notes are all on the relevant emscripten wiki page.

If you know LLVM and JavaScript and want to help, please get in touch!

Thursday, July 5, 2012

Scripting BananaBread / Using Compiled C/C++ Code in JavaScript

In BananaBread we compile an entire 3D game engine from C++ into JavaScript. The simplest way to use it is to compile everything you need and just run it. However, it might be useful to let the compiled code be controlled from JavaScript, that way you can use normal JavaScript to control the game engine. Here is an example of that: Fireworks Demo.

In that demo two APIs are provided from the compiled game engine: camera control and particle effect creation. The scripting API used in them begins here, and an example use can be seen here.

How does this work? There are 4 main steps to accessing compiled C/C++ from normal JavaScript:
  • Make a C API if the code is in C++. You can use C++, but then you need to deal with name mangling and this pointers in a manual way - C is easier. Example.
  • Use EMSCRIPTEN_KEEPALIVE to keep the code alive. EMSCRIPTEN_KEEPALIVE is a macro that uses compiler attributes to tell the compiler not to eliminate code as dead even if it isn't used (it will be used from JavaScript, but without this the compiler doesn't know that). Example.
  • Export the function through Closure Compiler. In -O2 closure compiler is used to minify and optimize the code. As a consequence, the original function names are unrecognizable, and closure will also remove code it sees is never used. The way to do this is to add the function to EXPORTED_FUNCTIONS when calling emcc to compile to JavaScript. The function will then show up on the Module object even after closure compiler runs (side note, all of the exports through closure are on the Module object, for example you can access memory through Module.HEAP8, etc.). Example (scroll to EXPORTED_FUNCTIONS).
  • Access the code through ccall or cwrap. ccall does a one-time call to a function, while cwrap returns a native JavaScript function that wraps the C function. Both take as arguments the return type and argument types (see docs). Example.
You can then access the code, and if you want, you can write a nice JavaScript API on top of the C-like interface ccall/cwrap give you.
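As a rough sketch of the last step, here is what a small JavaScript API on top of cwrap might look like. The C function names and signatures below are hypothetical (they are not BananaBread's actual scripting API), and they would need to be kept alive and listed in EXPORTED_FUNCTIONS, typically with a leading underscore, as described above.

```javascript
// Build a small JavaScript API on top of exported C functions.
// `Module` is the object emscripten generates. The C function names and
// signatures here are made up for illustration.
function makeCameraApi(Module) {
  // cwrap(name, returnType, argTypes) returns a callable JS function.
  const setPosition = Module.cwrap('bb_camera_set_position', null, ['number', 'number', 'number']);
  const getFov = Module.cwrap('bb_camera_get_fov', 'number', []);
  return {
    moveTo(x, y, z) { setPosition(x, y, z); },
    get fov() { return getFov(); },
  };
}
```

In a page you would call makeCameraApi(Module) once after the compiled code has loaded and then use the returned object like any other JavaScript API. For a one-off call, Module.ccall takes the same name and type arguments plus an array of argument values.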

Returning to BananaBread specifically, we now have the infrastructure to allow JavaScript access to all the compiled game engine's functionality. Right now as mentioned before we have camera and particle effect APIs (was very quick to start with those), but straightforward work can let us control how characters move, the rules of the game (how you earn points, get ammo, etc.), how objects behave, how weapons work, etc. The underlying engine is mainly used for first person shooters, but it is easy to use it for other things, from visual demos like the fireworks from before to 2D games to non-game 3D virtual worlds and so forth, once you have the proper scripting APIs in place (in fact I did something very similar a few years ago using the same engine).

If that kind of thing is interesting to you please get in touch, ideas for how to design the JavaScript part of the API are welcome.

Tuesday, June 26, 2012

BananaBread 0.2: Levels!

BananaBread, the port of the Sauerbraten first person shooter from C++ and OpenGL to JavaScript and WebGL, is making good progress. We are starting to work on polish and our artist gk is in the process of making some very cool levels!


Here are some screenshots. First, here are parts of the larger of the three levels,



and here is another part of that level, where water effects are turned on maximum (both reflection and refraction, and glare),


Here is the medium-sized level,


which has a very different theme to it. You can also see a bot charging towards me there. Finally, here is the smaller level,


and here in that level is a ferocious squirrel on the attack,


(the squirrel model is from the Yo Frankie game).

What the screenshots can't show is that playing a first person shooter in a web browser (without any plugins!) is an interesting experience, I guess because it isn't a common thing yet. Try it :)

Tuesday, June 19, 2012

StatCounter and Statistics

"In summary - the Net Applications sample is small (it is based on 40,000 websites compared to our 3,000,000). Weighting the initial unrepresentative data from that small sample will NOT produce meaningful information. Instead, applying a weighting factor will simply inflate inaccuracies due to the small sample size." http://gs.statcounter.com/press/open-letter-ms
Very misleading. I have no idea which of StatCounter and Net Applications is more accurate. But that argument is off.

In statistics, sample size is basically irrelevant past a certain minimal size. That's how a survey of 300 people in the US can predict pretty well for 300 million. The number of people doesn't matter in two ways: First, it could be 1 million or 1 billion, the actual population size is irrelevant, and second, it could be 3,000 or 30,000 and it would not be much better than 300. The only exception to those two facts is if the population size is very small, say 100. Then a sample size of 100 is guaranteed to be representative, obviously. And for very small sample sizes like say 5, you have poor accuracy in most cases. But just 300 people is enough for any large population.

The reason is the basic statistical law that the standard deviation of a sample average (the standard error) is the standard deviation of the population divided by the square root of the sample size. If you are measuring something like % of people using a browser, the first factor doesn't matter much. That leaves the second. Happily for statistics, 1 over the square root drops off quickly at first. You get to accuracy of a few percent with just a few hundred people, no matter what the population size.
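Here is a small sketch that puts numbers on this, using the textbook formula for the standard error of a proportion from a simple random sample. The 95% multiplier (1.96, from the normal approximation) and the worst-case share of 50% are assumptions chosen for the example, not figures from either company.

```javascript
// Standard error of an estimated proportion p from a simple random sample
// of size n: sqrt(p * (1 - p) / n). Population size does not appear at all.
function standardError(p, n) {
  return Math.sqrt((p * (1 - p)) / n);
}

// Approximate 95% margin of error (normal approximation, z = 1.96).
function marginOfError(p, n) {
  return 1.96 * standardError(p, n);
}

// p = 0.5 is the worst case for a proportion; sample sizes are illustrative.
for (const n of [5, 300, 40000, 3000000]) {
  console.log(n, (100 * marginOfError(0.5, n)).toFixed(2) + '%');
}
```

With 300 unbiased samples the margin is already under 6 percentage points, and at 40,000 it is about half a point, so going from there to 3,000,000 buys very little compared to any bias in how the samples were collected.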

So that StatCounter has 3,000,000 websites and Net Applications has 40,000 means practically nothing (note that 40,000 even understates it, since those are websites. The number of people visiting those sites is likely much larger). 40,000 is definitely large enough: In fact, just a few hundred datapoints would be enough! Of course, that is only if the sample is unbiased. That's the crucial factor, not sample size. We don't really know which of StatCounter and Net Applications is less biased. But the difference in sample size between them is basically irrelevant. Past a minimal sample size, more doesn't matter, even if it seems intuitively like it must make you more representative.

Many More Falling Blocks

Several months ago I made a demo of box2d.js, a port of the 2D physics engine Box2D to JavaScript. I made the demo using 15 falling blocks because that's what ran well at the time. Checking the demo now, I see that JavaScript Engine improvements allow for the possibility of many more falling blocks. Here is a version of that demo with 80 falling blocks.

On the current Firefox GA Release (13) the frame rate with 80 blocks often drops briefly to 20fps, which is not great. But on Firefox Nightly (16), I get very smooth frame rates, very close to 60fps! I see similar results in Chrome Dev (21) as well [1]. So it looks like it is now possible to run games on the web with lots of 2D physics in them.

[1] On Opera 12 I manually enabled WebGL, but sadly the page doesn't render properly.

Friday, June 15, 2012

Debugging JavaScript is Awesome

I sometimes find myself debugging large amounts of JavaScript, typically a combination of Emscripten-compiled C++ and handwritten JavaScript that it interfaces with. It turns out to be pretty easy and fun to do.

Obviously that's a personal opinion and it depends on the type of code you debug as well as your debugging style. My C++ debugging style tends to be to add printfs in relevant places, recompile and run, only using gdb when there are odd crashes and such. So in JavaScript this turns out to be better than C++: You add your prints but you don't need to recompile, just reload the page.

But it gets even better. You can use JavaScript language features to make your life easier. For example, I've been debugging OpenGL compiled to WebGL, and sometimes a test would start to fail when I made a change. It is really really easy to add automatic logging of every single WebGL command that is generated, see the 30 or so lines of code starting here. That wraps the WebGL context and logs everything it does. So I can log the output before a commit and after, diff those, and see what went wrong. I also have similar code in Emscripten to log out each call to a libc function. Of course that sort of thing is possible in C++ too, but it is much trickier, while in JavaScript it is pretty simple.
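The wrapping trick is roughly the following. This is a minimal sketch written for this post rather than the code linked above, and it is shown with a plain stand-in object so it runs anywhere; in a page `ctx` would be the WebGL context.

```javascript
// Wrap every method of an object so each call is logged before being
// forwarded to the original. Any object works, not just a WebGL context.
function wrapWithLogging(ctx, log) {
  const wrapper = {};
  for (const name in ctx) {            // for...in also visits inherited methods
    const value = ctx[name];
    if (typeof value === 'function') {
      wrapper[name] = function (...args) {
        log(name + '(' + args.map((a) => JSON.stringify(a)).join(', ') + ')');
        return value.apply(ctx, args); // call the real method on the real object
      };
    } else {
      wrapper[name] = value;           // copy constants such as TRIANGLES
    }
  }
  return wrapper;
}

// Usage with a stand-in object that mimics a tiny part of a GL context.
const lines = [];
const fakeContext = {
  TRIANGLES: 4,
  drawArrays(mode, first, count) { return mode + first + count; },
};
const loggedContext = wrapWithLogging(fakeContext, (line) => lines.push(line));
loggedContext.drawArrays(loggedContext.TRIANGLES, 0, 3);
console.log(lines);
```

One limitation of this sketch is that non-function properties are copied once when the wrapper is created, so values that change later on the real object would not be reflected.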

And actually it is better still. Unlike a regular debugger like gdb, when you debug JavaScript you can script debugging tasks directly in the code with immediate effect. For example, when debugging BananaBread I might see something wrong in the particle effects but nowhere else. If so I can just jump into the source code, set a variable to 1 when starting to render particles, and set it to 0 when leaving. I can then check that variable when logging GL stuff, and I'll only see the relevant code. It's also useful for more complex situations like logging specific data on the Nth call to a function or only when certain situations hold. Since reloads are so fast, this is very efficient and effective. I heard gdb has a python scripting option, and maybe other debuggers have similar tools, but really nothing can beat scripting your debug procedure using the same language as your code like you can with JS: There is nothing to learn, you have the full power of the language, and you just hit reload.
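Both tricks, the flag that limits logging to one part of the renderer and the check for the Nth call, might look roughly like this. Every name here (debugParticles, logGL, renderParticles, update) is made up for the sketch and is not taken from BananaBread.

```javascript
// Illustrative names only; none of this is code from BananaBread.
var debugParticles = 0;
var glCallLog = [];

function logGL(text) {
  // Only record GL activity while the flag is set.
  if (debugParticles) glCallLog.push(text);
}

function renderWorld() {
  logGL('draw world');
}

function renderParticles() {
  debugParticles = 1; // set when starting to render particles
  logGL('draw particles');
  debugParticles = 0; // cleared when leaving
}

// Logging specific data only on the Nth call to a function.
var updateCalls = 0;
var updateLog = [];
function update(state) {
  updateCalls++;
  if (updateCalls === 3) updateLog.push(JSON.stringify(state));
}

renderWorld();
renderParticles();
for (var i = 0; i < 5; i++) update({ frame: i });
console.log(glCallLog, updateLog);
```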

And of course there are other nice things like being able to print out new Error().stack to get a stack trace at any point in time, JSON.stringify(x, null, 2) to get nice pretty-printed output of any object, etc.
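A small sketch of both helpers, using made-up function and object names:

```javascript
function inner() {
  // The exact format of the stack string differs between JavaScript engines.
  return new Error().stack;
}
function outer() {
  return inner();
}

var trace = outer();
console.log(trace);

// The third argument to JSON.stringify is the indentation width.
var pretty = JSON.stringify({ name: 'block', size: [2, 3] }, null, 2);
console.log(pretty);
```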


Tuesday, May 29, 2012

Reloop All The Blocks

One of the main results from the development of Emscripten was the Relooper algorithm. The Relooper takes basic blocks of code - chunks of simple code, at the end of which are branches to other blocks of code - and generates high-level structure from that using loops and ifs. This is important because LLVM gives you basic blocks, and JavaScript requires loops and ifs to be fast, so when compiling C++ to JavaScript you need to bridge the two. So if you have any sort of compiler from a representation with basic blocks into JavaScript - or for that matter any high-level language that does not have gotos, but does have labelled loops - then the Relooper might be useful for you. The Relooper is known to be used in the two main C++ to JS compilers, Emscripten and Mandreel.

Emscripten's Relooper implementation is in JavaScript, which was very useful for experimenting with different approaches and developing the algorithm. However, there are two downsides to that implementation: First, that it was built for experimentation, not speed, and second, that being in JavaScript it is not easily reusable by non-JavaScript projects. So I have been working on a C++ Relooper, which is intended to implement a more optimized version of the Relooper algorithm, in a fast way, and to make embedding in other projects as easy as possible.

That implementation is not fully optimized yet, but it has gotten to the point where it is usable by other projects. It got to that point last week, after I wrote a fuzzer for it, which generates random basic blocks, then implements them in JavaScript in the trivial switch-in-a-loop manner, and then uses the Relooper to compile them into fast JavaScript. The fuzzer then runs both programs and checks for identical output. This found a few bugs, and after fixing them the fuzzer can be run for a very very long time without finding anything, so hopefully there are no remaining bugs or at least very very few.
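To make the two forms concrete, here is a hand-written sketch of four basic blocks, first in the trivial switch-in-a-loop manner and then as structured code of the kind a relooping pass aims to produce. Neither function is actual Relooper output, and the final loop only mimics the fuzzer's compare-both-programs check on one fixed example.

```javascript
// Trivial translation: a label variable selects the next basic block to run.
function sumSwitchLoop(n) {
  var i, sum, label = 0;
  while (true) {
    switch (label) {
      case 0: i = 0; sum = 0; label = 1; break;     // entry block
      case 1: label = (i < n) ? 2 : 3; break;       // loop condition
      case 2: sum += i; i++; label = 1; break;      // loop body
      case 3: return sum;                           // exit block
    }
  }
}

// Structured form: the same blocks expressed with a real loop.
function sumStructured(n) {
  var i = 0, sum = 0;
  while (i < n) {
    sum += i;
    i++;
  }
  return sum;
}

// Differential check: both forms must agree on every input tried.
var mismatches = 0;
for (var n = 0; n < 50; n++) {
  if (sumSwitchLoop(n) !== sumStructured(n)) mismatches++;
}
console.log('mismatches:', mismatches);
```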

The C++ Relooper code linked to before comes with some testcases, which are good examples for how to use it. As you can see there, using the Relooper is very simple: There are both C++ and C APIs, and what you do in them is basically
  • Define the output buffer
  • Create the basic blocks, specifying the text they contain and which other blocks they branch to
  • Create a relooper instance and add blocks to it
  • Tell the relooper to perform its calculation on those blocks, and finally to render to the output buffer
There is also a debugging mode, in which a lot of debug info will be printed out, including (from the C API) a C program using the C API, which is useful for generating testcases.

Friday, May 25, 2012

Emscripten and LLVM 3.1

LLVM 3.1 support for Emscripten just landed in master. All tests pass, and all benchmarks either remain the same or improve from 3.0.

LLVM 3.1 is now the officially supported version, and all testing from now on will be on 3.1. The Emscripten tutorial has been updated to reflect that.

(3.0 might work, it does right now, but over time that might change.)

Thursday, May 3, 2012

Emscripten OpenGL / WebGL Conversion Progress

Here is a very early demo of a 3D game engine, written in C++ and using OpenGL, compiled to JavaScript and WebGL using Emscripten. The game engine is Sauerbraten (aka Cube 2), one of the best open source game engines out there, and we nicknamed the port BananaBread.

After loading the demo link, press the "fullscreen" button, then click "GO!" to start the game. Move with WASD, jump with space, look around with the mouse. You can shoot a little by clicking the mouse. Please note that
  • The C++ game code has not been optimized at all in any way yet
  • The generated JavaScript is itself not fully optimized yet, nor even minified
  • The level you see when you press "GO!" was made by me, a person with 0 artistic talent
  • The game assets (textures) have not been optimized for faster downloads at all
So this is a very very early demo - ignore performance and content quality. Also, it might not work in all browsers yet, sorry about that: It seems fine in Firefox 15, including pointer lock and fullscreen mode, but for some reason pointer lock isn't working in Chrome 20. I do get 60fps in both of them though on my 2 year old MacBook (note that the frame rate is capped at 60 using requestAnimationFrame, so the fact that it does not go higher is not an indication of anything).

After the disclaimers, I did want to blog about this because despite being very early, I think it does show the potential of this approach. We are taking a C++ game engine using an oldish version of OpenGL, and with almost no changes to the source code we can compile it using open source tools to something that runs on the web thanks to modern JS engines, the fullscreen and pointer lock APIs and WebGL, at a reasonable frame rate, even before optimizing it.

A few technical details:
  • Emscripten supports the WebGL-friendly subset of OpenGL and OpenGL ES quite well. That subset is basically OpenGL ES 2.0 minus clientside arrays. If you are writing C++ code with the goal of compiling it to WebGL, using that subset is the best thing to do, it can be compiled into something very efficient. We should currently support all of that subset, but most of it is untested - please submit testcases if you can.
  • Emscripten now also supports some amount of non-WebGL-friendly OpenGL stuff. I don't think we will ever support all of desktop OpenGL - that would amount to writing an OpenGL driver - but we can add parts that are important. Note that we are doing this carefully, so that it does not affect performance of code that uses just the WebGL-friendly subset: the additional overhead for supporting the nonfriendly features is only suffered if you deviate from the friendly subset.
    • Specifically, the non-friendly features we partially support include pieces of immediate mode, clientside state and arrays, and shader conversion to WebGL's GLSL. Again, we have only partial support for those - it is best to not rely on them and to use the WebGL-friendly subset. The parts we support are motivated by what Sauerbraten's renderer requires (note that even to render the GUI, you need immediate mode support, that's all done with OpenGL and not some 2D API).
  • The demo is the result of about a month of work. Almost all of that time was spent in learning OpenGL (which I had never used before) and writing the emulation layer for OpenGL features not present in WebGL, basically proceeding testcase by testcase after generating testcases from Sauerbraten. Aside from that, everything else pretty much just worked when compiled to JS. 
The plan is to continue this port, and help is welcome. Basically we want to get the entire game working, including model rendering (the main part of the Sauerbraten renderer I haven't looked at yet), AI bots and so forth, and to use professionally designed levels and models. At some point we will probably want to optimize the code as well. The goal is to end up with a playable, good looking 3D FPS game that runs on the web, that is open source and is built using open source tools, so other people can learn from the project or even use the code directly. The game will initially be single player versus some bots, but eventually using WebRTC we should be able to get multiplayer mode working as well (WebRTC should land in most browsers later this year).

Aside from this specific game port, Emscripten's OpenGL support has greatly improved, and there are other projects using it already. If you use the WebGL-friendly subset of OpenGL, it is ready for use now, with the disclaimer that while everything should work we have not rigorously tested it yet, help with testing and testcases would be welcome. In particular if you have some application you want to port, if you find problems in our OpenGL support please file a bug with a testcase, for the WebGL-friendly subset those should be easy to fix and we can add the testcase to our test suite so we don't regress on the features your project needs.


Friday, March 23, 2012

HOWTO: Port a C/C++ Library to JavaScript (xml.js)

I've been porting various libraries to JavaScript recently (lzma.js, sql.js) and I thought it might be useful to write up details about how this kind of thing is done. So here is how I ported libxml - an open source library that can validate XML schemas - in response to a request. Note that this isn't a general HOWTO, it's more a detailed writeup of what I did to port libxml in particular, but I hope it's useful for understanding the general technique.

If you just want to see the final result, the ported project is called xml.js, and there is an online demo here (thanks syssgx!)

Part 1: Get the Source Code and Check It Natively

I downloaded the latest libxml source code from the project's website and compiled it natively. One of the generated files is xmllint, a commandline tool to validate schemas. I made sure it works on a simple example. This is important first of all as a sanity check on the code being compiled (especially important if you are porting code you never used or looked at, which is the case here!), and second having the testcase will let us easily check the JavaScript version later on. Running xmllint looks like this:

  $ ./xmllint --noout --schema test.xsd test.xml
  test.xml validates


Just to be sure everything is working properly, I introduced some errors into those files, and indeed running xmllint on them produces error messages.

Part 2: Run Configure


  emconfigure ./configure

emconfigure runs a command with some environment variables set to make configure use emcc, the Emscripten replacement for gcc or clang, instead of the local native compiler.

When looking at the results of configure, I saw it includes a lot of functionality we don't really need, for example HTTP and FTP support (we only want to validate schemas directly given to us). So I re-ran configure with the options to disable those features. In general, it's a good idea to build just the features you need: First, unneeded code leads to larger code size, which matters on the web, and second, you will need to make sure the additional features compile properly with emcc, and sometimes headers need some modifications (mainly since we use newlib and not glibc).

Part 3: Build the Project

  emmake make


emmake is similar to emconfigure, in that it sets some environment variables. emconfigure sets them in order for configure to work, including configure's configuration tests (which build native executables), whereas emmake sets them in order for actually building the project to work. Specifically, it makes the project's build system use LLVM bitcode as the generated code format instead of native code. It works that way because if we generated JS for each object file, we would need to write a JS linker and so forth, whereas this way we can use LLVM's bitcode linking etc.


Make succeeds, and there are various generated files. But they can't be run! As mentioned above, they contain LLVM bitcode (you can see that by inspecting their contents, they begin with 'BC'). So we have an additional step as described next.
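If you would rather check that from a script than by eye, a minimal sketch of the test could look like the following. The function name startsWithBC is made up, and the byte arrays are hand-written stand-ins; in Node.js the bytes could instead come from reading one of the generated files with fs.readFileSync.

```javascript
// Returns true if the data starts with the ASCII bytes 'B' (0x42) and 'C' (0x43).
function startsWithBC(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x42 && bytes[1] === 0x43;
}

// Hand-made stand-ins keep the sketch self-contained.
var bitcodeLike = new Uint8Array([0x42, 0x43, 0xC0, 0xDE]);
var nativeLike = new Uint8Array([0x7F, 0x45, 0x4C, 0x46]); // starts like an ELF file

console.log(startsWithBC(bitcodeLike), startsWithBC(nativeLike));
```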

Part 4: Final Conversion to JavaScript

For xmllint, we need xmllint.o. We also need libxml2.a, however. We need to manually specify it because LLVM bitcode linking does not support dynamic linking, so dynamic linking is basically ignored by emcc. But it's pretty obvious in most cases what you need, here, just libxml2.a.

Slightly less obvious is that we also need libz (the open source compression library). Again, dynamic linking was ignored, but we can see it was in the link command. I actually missed this the first time around, but it is no big deal, you get a clear error message at runtime saying a function is not defined, in this case gzopen. A quick grep through the headers shows gzopen is in libz, so I grabbed libz.bc from the emscripten test suite (if it wasn't there, I would have had to make a quick build of it).

Ok, let's convert this to JavaScript! The following will work:

  emcc -O2 xmllint.o .libs/libxml2.a libz.a -o xmllint.test.js --embed-file test.xml --embed-file test.xsd


Let's see what this means:
  • emcc is as mentioned before a drop-in replacement for gcc or clang.
  • -O2 means to optimize. This does both LLVM optimizations and additional JS-level optimizations, including Closure Compiler advanced opts.
  • The files we want to build together are then specified.
  • The output file will be xmllint.test.js. Note that the suffix tells emcc what to generate, in this case, JavaScript.
  • Finally, the odd bit is the two --embed-file options we specify. What this does is actually embed the contents of those files into the generated code, and set up the emulated filesystem so that the files are accessible normally through stdio calls (fopen, fread, etc.). Why do we need this? It's the simplest way to just access some files from compiled code. Without this, if we run the code in a JS console shell, we are likely to run into inconsistencies of how those shells let JS read files (binary files in particular are an annoyance), and if we run the code in a web page, we have issues with synchronous binary XHRs being disallowed except for web workers. So to avoid all those issues, a simple flag to emcc lets us bundle files with the code for easy testing.

Part 5: Test the Generated JavaScript

A JavaScript shell like Node.js, the SpiderMonkey shell or V8's d8 console can be used to run the code. Running it gives this:


  $ node xmllint.test.js --noout --schema test.xsd test.xml
  test.xml validates


Which is exactly what the native build gave us for those two files! Success :) Also, introducing intentional errors into the input files leads to the same errors as in the native build. So everything is working exactly as expected.

Note that we passed the same commandline arguments to the JavaScript build as to the native build of xmllint - the two builds behave exactly the same.

Part 6: Make it Nice and Reusable

What we have now is hardcoded to run on the two example files, and we want a general function that given any XML file and schema, can validate them. This is pretty easy to do, but to make sure it also works with Closure Compiler optimizations is a little trickier. Still, it's not that bad, details are below, and it's definitely worth the effort because Closure Compiler makes the code much smaller.

The first thing we need is to use emcc's --pre-js option. This adds some JavaScript alongside the generated code (in this case before it because we say pre and not post). Importantly, --pre-js adds the code before optimizations are run. That means that the code will be minified by Closure Compiler together with the compiled code, allowing us to access the compiled code properly - otherwise, Closure Compiler might eliminate as dead code functions that we need.

Here are the contents of the file we will include using --pre-js:

  Module['preRun'] = function() {
    FS.createDataFile(
      '/',
      'test.xml',
      Module['intArrayFromString'](Module['xml']),
      true,
      true);
    FS.createDataFile(
      '/',
      'test.xsd',
      Module['intArrayFromString'](Module['schema']),
      true,
      true);
  };
  Module['arguments'] = ['--noout', '--schema', 'test.xsd', 'test.xml'];
  Module['return'] = '';
  Module['print'] = function(text) {
    Module['return'] += text + '\n';
  };

What happens there is as follows:
  • Module is an object through which Emscripten-compiled code communicates with other JavaScript. By setting properties on it and reading others after the code runs, we can interact with the code.
  • Note that we use string names to access Module, Module['name'] instead of Module.name. Closure will minify the former to the latter, but importantly it will leave the name unminified.
  • Moving on to the actual code: The first thing we modify is Module.preRun, which is code that executes just before running the compiled code itself (but after we set up the runtime environment). What we do in preRun is set up two data files using the Emscripten FileSystem API. For simplicity, we use the same filenames as in the testcase from before, test.xml and test.xsd. We set the data in those files to be equal to Module['xml'] and Module['schema'], which we will explain later; for now, we assume those properties of Module have been set and contain strings with XML or an XML schema, respectively. We need to convert those strings to an array of values in 0-255 using intArrayFromString.
  • We set Module.arguments, which contains the commandline arguments. We want the compiled code to behave exactly as it did in the testcase! So we pass it the same arguments. The only difference will be that the files will have user-defined content in them.
  • Module.print is called when the compiled code does printf or a similar stdio call. Here we customize printing to save to a buffer. After the compiled code runs, we can then access that buffer, as we will see later.
In summary, we "sandbox" the compiled code in the sense that we set up the input files to contain the data we need, and capture the output so that we can do whatever we want to with it later.
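To give a feel for two of those pieces, here is a small standalone sketch of the string-to-bytes conversion and the output capturing. The function toByteArray is a simplified stand-in written for this example and is not Emscripten's intArrayFromString; it simply masks each character code to 0-255, which is only adequate for plain ASCII input. The captured object and capturePrint are likewise made-up names.

```javascript
// Simplified stand-in for the conversion step: one value in 0-255 per character.
// This is NOT Emscripten's intArrayFromString, just an illustration of the idea.
function toByteArray(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i) & 0xFF);
  }
  return bytes;
}

// Capturing printed lines into a buffer instead of showing them.
var captured = { text: '' };
function capturePrint(line) {
  captured.text += line + '\n';
}

capturePrint('test.xml validates');
console.log(toByteArray('<a/>'));
console.log(JSON.stringify(captured.text));
```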

We are not yet done, but we can compile the code now - the final thing that remains will be done after we compile it. Compiling can be done with this command:

  emcc -O2 xmllint.o .libs/libxml2.a libz.a -o xmllint.raw.js --pre-js pre.js

This is basically the command from before, except we no longer embed files. Instead, we use --pre-js to include pre.js which we discussed before.

After that command runs, we have an optimized and minified build of the code. We wrap that with something we do not want to be optimized and minified, because we want it to be usable from normal JavaScript in a normal way:

  function validateXML(xml, schema) {
    var Module = {
      xml: xml,
      schema: schema
    };
    {{{ GENERATED_CODE }}}
    return Module.return;
  }

GENERATED_CODE should be replaced with the output we got before from the compiler. So, what we do here is wrap the compiled code in a function. The function receives the xml and schema and stores them in Module, where as we saw before we access them to set up the "files" that contain their data. After the compiled code runs, we then simply return Module.return which as we set up before will contain the printed output.
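The wrapping pattern can be tried in isolation with a stand-in for the generated code. In the sketch below, fakeCompiledProgram is a hypothetical placeholder that does no XML validation at all; it only reads its inputs from Module and reports through Module['print'], which is enough to show how the data flows in and out of the wrapper. validateSketch is also a made-up name.

```javascript
// `fakeCompiledProgram` is a hypothetical stand-in for the generated code.
// It does NOT validate anything; it only mimics how data flows through Module.
function fakeCompiledProgram(Module) {
  var ok = Module['xml'].length > 0 && Module['schema'].length > 0;
  Module['print'](ok ? 'test.xml validates' : 'test.xml fails to validate');
}

function validateSketch(xml, schema) {
  // A fresh Module per call, so no state leaks from one call to the next.
  var Module = { xml: xml, schema: schema };
  Module['return'] = '';
  Module['print'] = function(text) {
    Module['return'] += text + '\n';
  };
  fakeCompiledProgram(Module);
  return Module['return'];
}

console.log(validateSketch('<a/>', '<xs:schema/>'));
```

Because each call builds its own Module object, two calls in a row do not see each other's captured output.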

That's it! xml.js can now be used from normal JS. All you need to do is include the final .js file (xmllint.js in the xml.js repo, for now - still need to clean that up and make a nicer function wrapping, pull requests welcome), and then call validateXML with a string representing some XML and another string representing some XML schema.


Tuesday, February 21, 2012

box2d.js: Box2D on the Web is Getting Faster

Box2D is a popular open source 2D physics library, used for example in Angry Birds. It's been ported to various platforms, including JavaScript through a previous port to ActionScript. box2d.js is a new port, straight from C++ to JavaScript using Emscripten. Here is a demo.

Last December, Joel Webber benchmarked various versions of Box2D. Of the JavaScript versions, the best (Mandreel's build) was 12x slower than C. Emscripten did worse, which was not surprising since back then Emscripten could not yet support all LLVM optimizations. Recently however that support has landed, so I ran the numbers and on the trunk version of SpiderMonkey (Firefox's JavaScript engine), Emscripten's version is now around 6x slower than C. That's twice as fast as the previous best result from December (three times as fast as Emscripten's result at that time).

That should get even faster as JavaScript engines and the compilers to JavaScript continue to improve. The rate of improvement is quite fast in fact, you will likely see a big difference between stable and development versions of browsers when running processing-intensive code like Box2D.

Aside from speed, it's important that the compiled code be easily usable. box2d.js uses the Emscripten bindings generator to wrap compiled C++ classes in friendly JavaScript classes, see the demo code for an example. Basically, you can write natural JavaScript like new Box2D.b2Vec2(0.0, -10.0) and it will call the compiled code for you.

(And of course, box2d.js is zlib licensed, like Box2D - usable for free in any way.)

Monday, January 23, 2012

Emscripten Standard Library Support, Now With More C++

I just landed much more comprehensive support for the C++ standard library in Emscripten, which now allows you to compile pretty much any C++ code using the standard C++ library to JavaScript. So I figured it was a good time to write up an overview of how Emscripten handles standard libraries.

As background, one of the initial design decisions in Emscripten was to focus on generating good code, even when that has some potential downsides elsewhere. Good code means both fast code and small code, both of which are particularly important on the web: While fast code is important everywhere, JavaScript is not yet as fast as native code, so to counter that we need to really focus on generating efficient code, and regarding code size, you might not care much about linking to a 5MB shared library on your desktop, but downloading and parsing a 5MB script is something significant.

For that reason, it didn't seem like a good idea to build C and C++ standard libraries, ship them with your code, and link them at runtime: The standard libraries are quite large. Furthermore, Emscripten doesn't have a single ABI, it has several code generation modes, as one example there are two typed array modes and one mode without typed arrays, and code compiled with one is not interface-compatible with another. Not having a stable ABI lets us generate more specialized and efficient code, but it is another reason for not shipping separate linkable standard libraries.

Instead, when compiling code with Emscripten we build the standard library along with your project's code. Everything is then shipped as a single file. This gives the advantages mentioned before: Smaller code size since we know which parts of the standard library you actually need, and faster code since we can specialize both the standard library and your own code, not just your own.

However, there are two disadvantages of this approach. The first is that combining the standard libraries with your own code means they form a single "unit". When normally you build your code and then link to the LGPL-licensed GNU standard libraries, it's clear that your code does not need to comply with the LGPL (you just need to comply with the LGPL regarding the LGPL'd library itself). But if you build your project together with the standard library, intertwining them in an optimized way, it is less clear how the LGPL applies here. I actually don't think there is a problem - it seems equivalent to the former "normal" case to me, despite the differences between them - however, I am not a lawyer, and also it is better to avoid any possible confusion and concern. In addition, even if the LGPL still applies just to the library, you would be shipping the library yourself, meaning you need to comply with the LGPL for it (which I don't think is a problem myself, but it is a concern for other people). For those reasons, Emscripten doesn't include any LGPL code. That ruled out using the GNU standard C and C++ libraries, which would otherwise be the first choice because of their familiarity and compatibility with existing code.

Given that decision, I looked at the other options and decided to use the Newlib C library. There are then two options: Use just the Newlib headers, or use the existing Newlib implementation code, porting it to the new platform. I decided to use just the headers, because (1) Newlib is already not 100% compatible with the GNU C library, so there would anyhow be inconsistencies and missing parts we would need to work around, and it is easier to do so in our own new code, (2) by implementing the C standard library in JavaScript, we can optimize it using the existing capabilities of the web platform (for example, we use JavaScript's sort inside qsort), (3) porting Newlib to the web would mean writing new C code inside Newlib, and interfacing that with JavaScript code that hooks into the web platform APIs themselves, which is a little less convenient than writing just JavaScript, and finally (4) porting a C standard library means working with the internals of that library, as opposed to implementing the familiar C standard library interface which is higher-level. So, Emscripten has an implementation of the C standard library written in JavaScript, primarily using the Newlib headers. (The only exception is malloc and free, which we compile from dlmalloc, because writing an effective malloc implementation is not trivial. However, we should implement malloc and free in JavaScript eventually since we could optimize it quite a bit.)
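To illustrate the qsort point, here is a much simplified sketch of a sort routine over compiled-code memory that delegates to JavaScript's own sort. It is not Emscripten's implementation: HEAP32 here is a small stand-in typed array, qsortInt32 is a made-up name, and unlike the real qsort it handles only 32-bit integers and hands values rather than pointers to the comparator.

```javascript
// Hypothetical stand-in for the compiled program's memory.
var HEAP32 = new Int32Array(16);

// Sorts `num` 32-bit integers starting at byte offset `base`.
// The real qsort also takes an element size and passes pointers to the comparator.
function qsortInt32(base, num, compare) {
  var start = base >> 2; // convert the byte offset to an Int32 index
  var items = Array.prototype.slice.call(HEAP32.subarray(start, start + num));
  items.sort(compare);   // delegate the actual sorting to the JavaScript engine
  HEAP32.set(items, start);
}

HEAP32.set([5, 3, 9, 1], 2); // place four values at indices 2..5 (byte offset 8)
qsortInt32(8, 4, function(a, b) { return a - b; });
console.log(Array.prototype.slice.call(HEAP32, 2, 6));
```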

For C++, again we couldn't use the GNU C++ standard library. Instead we started out with the libc++ headers. Just using the headers was enough to get a lot of code to run, because a lot of the functionality is in the headers themselves. We did need to introduce some ugly hacks in the headers though, as well as implement some bits in JavaScript. This was enough for a lot of projects to work, but was still missing a lot of stuff, almost everything that wasn't implemented in a header.

That problem is what I worked on fixing last week: As of Saturday, we will build the libc++ sources, if they are needed by your project, and include them. To get this to be efficient, I also enabled LLVM's global dead code elimination, so that while we link in the entire C++ standard library, we immediately eliminate all the parts you don't actually need before proceeding to compile to JavaScript. This helps quite a lot with the size of the generated code (and also is nice for compilation times). Aside from that, the main challenge here was getting libc++ to build using the Newlib C standard library headers. The end result is that all of our hacks in the libc++ headers are now removed, we now build stock libc++ and include it if necessary, which means pretty much any C++ program that uses the C++ standard library should work. (With the obvious caveats of no multithreading with shared state and so forth, which are general limitations of compiling to JavaScript.)
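The idea behind that kind of elimination is reachability: start from the functions that must be kept and keep only what they can reach. The sketch below shows the idea on a tiny made-up call graph. It is a conceptual illustration only, not how LLVM implements its pass, and all the function names in the graph are invented.

```javascript
// Call graph: function name -> names of the functions it calls. All names are made up.
var callGraph = {
  main: ['parse', 'print'],
  parse: ['alloc'],
  print: [],
  alloc: [],
  unusedHelper: ['alloc'],
  unusedFormatter: ['unusedHelper']
};

// Returns the sorted names of all functions reachable from the given roots.
function liveFunctions(graph, roots) {
  var live = {};
  var worklist = roots.slice();
  while (worklist.length > 0) {
    var name = worklist.pop();
    if (live[name]) continue; // already visited
    live[name] = true;
    (graph[name] || []).forEach(function(callee) {
      worklist.push(callee);
    });
  }
  return Object.keys(live).sort();
}

console.log(liveFunctions(callGraph, ['main']));
```

Everything not in the returned list (here unusedHelper and unusedFormatter) can be dropped before the expensive compile-to-JavaScript step.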

I mentioned before that there were two downsides to building the standard libraries with your project, the first of which was licensing, which led us to avoid LGPL code. The other disadvantage is build time: You compile the standard library into JavaScript when you build your project, as opposed to just building your own project and linking it with prebuilt standard libraries. While I believe this is definitely worth the advantages of the approach (faster and smaller generated code), it is a concern. We get around a lot of the problem by (1) having the C standard library implemented in JavaScript, so there is no compilation time for it, (2) using LLVM's dead code elimination to quickly get rid of parts of the C++ standard library (and dlmalloc, as mentioned before that we also build, if it is used) early in the compilation process, (3) only linking in the C++ standard library (and dlmalloc) if they are actually used, and (4) caching the bitcode result of compiling the C++ standard library (and dlmalloc), so that we only compile it once to bitcode. With these in place, while emcc is still slower than gcc, it isn't very significant, except for the very first time you compile libc++ from source into bitcode (which as mentioned before, is done once and then cached).