Thursday, October 25, 2012

Emscripten News: BananaBread, Nebula3, GDC Online, Websockets, Worker API, Performance, etc

I haven't found time to blog about individual things, so here is one post that summarizes various Emscripten-related things that happened over the last few months.

BananaBread

BananaBread, a port of the Cube 2 game engine to the web, was launched and then received a few minor updates with bugfixes and some additional experimental levels. Feedback was good, and it was linked to by a Firefox release announcement and later a Chromium release announcement, in both cases to show that the browser in question can now run first person shooter games.

Nebula 3

Cube 2 isn't the only game engine being ported to the web using Emscripten: this post by a Nebula 3 dev is worth reading, and check out the demo it links to. Nebula is a powerful game engine that, like the id Tech engines, gets open source releases now and then, and it has been used in some impressive games (like this). Very cool to see it working well in JS+WebGL, especially given the dev's initial skepticism - read the blogpost! :)

GDC Online

I gave a talk together with Kevin Gadd at GDC Online; here are my slides. We talked about compiling games to HTML5: I focused on C++ and Kevin on C#, so between us we covered a lot of potential codebases that could be automatically ported to the web.

Among the demos I gave was of course BananaBread, as an example of a 3D first person shooter compiled from C++ and OpenGL to JavaScript and WebGL. Interestingly, Adobe gave a talk later that day about porting games to web browsers, which compared four platforms (WebGL/JS, Flash, NaCl, and Unity), and for the WebGL/JS demo they also showed BananaBread, so it ended up being presented twice ;)

Workers API

Support for worker threads is in the incoming branch; see emscripten.h, and look for tests with "worker_api" in their name in tests/runner.py. This API basically lets you compile code into "worker libraries" that the main thread can then call and get responses from, giving you an easy way to do message-passing style concurrency.
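To give a rough idea of the shape of the API, here is a minimal sketch. The function names come from emscripten.h, but since the API is still in its initial stages the exact signatures and build flags may change, so treat this as illustrative and check emscripten.h and the worker_api tests for the authoritative version:

```cpp
// worker.cpp - compiled separately into a "worker library"
#include <emscripten.h>

extern "C" void do_work(char *data, int size) {
  // runs inside the worker thread: process the input buffer...
  for (int i = 0; i < size; i++) data[i] += 1;
  // ...then send a response back to the main thread
  emscripten_worker_respond(data, size);
}

// main.cpp - the main thread side
#include <emscripten.h>
#include <string.h>

void on_result(char *data, int size, void *arg) {
  // called asynchronously when the worker responds
}

int main() {
  static char buffer[] = "hello";
  worker_handle worker = emscripten_create_worker("worker.js");
  emscripten_call_worker(worker, "do_work", buffer, sizeof(buffer),
                         on_result, NULL);
  // later, when the worker is no longer needed:
  //   emscripten_destroy_worker(worker);
  return 0;
}
```

Note that the worker library and the main program are two separate compilation outputs; the tests show how each side is built.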

The API is in initial stages, feedback is welcome.

Networking

Initial support for networking over WebSockets has also been implemented; see the tests with "websockets" in their name. Basic socket usage works, but we have had trouble setting up a test WebSocket server with binary support (see the issue for details), so this won't work on arbitrary binary data yet. If you know WebSockets and WebSocket servers and are interested in helping with this, that would be great.
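To make the intended usage concrete: the idea is that ordinary BSD-style socket code gets implemented on top of WebSockets when compiled with Emscripten. The sketch below is illustrative only - the address and port are made up, the flow is written as if the calls were blocking, and in practice the underlying implementation is asynchronous and non-blocking - so see the websockets tests for code that is known to work:

```cpp
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main() {
  // A normal TCP socket; Emscripten maps it onto a WebSocket connection,
  // so the server on the other end must speak the WebSocket protocol.
  int sock = socket(AF_INET, SOCK_STREAM, 0);

  struct sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(8080);                     // hypothetical test port
  addr.sin_addr.s_addr = inet_addr("127.0.0.1");   // hypothetical test host

  connect(sock, (struct sockaddr*)&addr, sizeof(addr));

  const char *msg = "hello";
  send(sock, msg, strlen(msg), 0);   // data goes out over the WebSocket

  char buf[256];
  recv(sock, buf, sizeof(buf), 0);   // and comes back the same way

  close(sock);
  return 0;
}
```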

Another approach we intend to work on, and where help would be welcome, is WebRTC. WebRTC could actually be easier to work with since it supports p2p connections, so it's easy to test a connection from one page to itself. It also supports UDP-style unreliable data, so we should be able to get multiplayer working in BananaBread when that is complete.

Library Bindings to JavaScript

We currently have the "bindings generator", which is used to make ammo.js and box2d.js. It works for them, but it needs manual hacking and has various limitations. A more proper approach, called "embind", is being worked on, contributed by Chad Austin. This is a more explicit, controllable approach to bindings generation, and in time it should give us big improvements in projects like ammo.js and box2d.js. If you use those projects and want them to improve, the best way to help is to pitch in on the new embind approach. We have some initial test infrastructure set up, and there are various bugs filed with the tag "embind" if you are interested.
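Since embind is still taking shape, the details below may change, but roughly the idea is that you describe the bindings explicitly in C++ instead of relying on generated glue. A minimal sketch, based on the current test infrastructure (the names here are illustrative, and the exact registration API may differ as the work progresses):

```cpp
#include <emscripten/bind.h>

using namespace emscripten;

float lerp(float a, float b, float t) {
  return (1 - t) * a + t * b;
}

class Counter {
public:
  Counter(int start) : value(start) {}
  void increment() { value++; }
  int get() const { return value; }
private:
  int value;
};

// Explicitly register what should be visible from JavaScript.
EMSCRIPTEN_BINDINGS(my_module) {
  function("lerp", &lerp);
  class_<Counter>("Counter")
    .constructor<int>()
    .function("increment", &Counter::increment)
    .function("get", &Counter::get);
}
```

On the JavaScript side this would then be used as something like `Module.lerp(1, 2, 0.5)` or `var c = new Module.Counter(0); c.increment(); c.get();`, again subject to change while embind is in development.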

LLVM backend

I did some experiments with an LLVM backend for Emscripten when I had free time over the last few months. The results were interesting: I got some "hello world" stuff working, and learned a lot about how LLVM backends are built along the way.

Overall this is a promising approach, and it is how pretty much all other compilers from languages like C to JS work. However, this is going to be low priority, for two main reasons. First, we simply lack the resources: there are many, many other things that are very important for me to work on in Emscripten (see other points in this blogpost for some), and we have not had luck getting people interested in collaborating on this topic so far. Second, while my investigations were mostly positive, they also found negatives in going the LLVM backend route. Some types of optimizations that make sense for JavaScript are an uncomfortable fit for LLVM's backends, which is not surprising given how different JS is from most target languages. It's possible to overcome those issues, of course, but it isn't the optimal route.

Why do pretty much all the other compilers go the LLVM backend route? I suspect it might have to do with the fact that they typically do not just compile to JS. For example, if you already have a compiler that emits various targets, then when you consider also compiling to JS, it is simplest to extend your existing approach to do that as well. Emscripten, on the other hand, is 100% focused on JS, so that's a fundamental difference. If all you care about is targeting JS, it is not clear that an LLVM backend is the best way to go. (In fact I suspect it is not, but to be 100% sure I would need to fully implement a backend to compare against.)

Compiler and Code Perf

To continue the previous point, there is however one aspect of an LLVM backend that is greatly beneficial: it's written in efficient C++ and will compile your code quickly. Emscripten, on the other hand, is written in JavaScript and has some complex optimization passes that do a lot of work on a JS AST, and these can take a long time. A fully optimized build of BananaBread, for example, takes about 3 minutes on my laptop, and while it's a big project, there are of course bigger ones that would take even longer.

On the one hand, this doesn't matter that much - it's done offline by the compiler, and people running the generated code don't notice it. But of course, making developers' lives easier is important too.

In Emscripten the goal has always been to focus more on the performance of the generated code than on the performance of the compiler itself, so we have added new optimization passes even when they were expensive in compilation time, as long as they made the generated code faster. And we rely on tools like Closure Compiler that take a long time to run but are worth it.

But compiler perf vs code perf isn't an all or nothing decision. Right now on the incoming branch there are some (not fully finished and slightly buggy, but almost ready) optimizations that improve compilation time quite a bit. And with those in place we can move towards running almost all of the optimization passes in parallel, so with 8 cores you might get close to an 8x speedup in compilation, etc.

So the current goal is to focus on the existing compiler. It will get much faster than it is now, but it will probably never get close to the speed an LLVM backend could reach; that's the tradeoff we are making in order to focus on generating faster code. An additional reason this tradeoff makes sense is that we currently have plans for several new types of optimizations to make the generated code yet faster, and it is far easier for us to work on them in the current compiler than in an LLVM backend.

Record/Replay

Finally, we added a system for recording and replaying Emscripten-compiled projects (see reproduceriter.py). With it you basically build your project in a special mode, run it in record mode and interact with it, and then you can run the project in replay mode and it will reproduce the exact same output you saw before.

The main use case for this is benchmarks: if you have a program that depends on user input and nondeterministic things like timing or Math.random(), then it is very hard to generate a good benchmark from it, because a different execution happens each time. With the record/replay facility you can basically create a reproducible execution trace.
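To illustrate the general idea (this is just a conceptual sketch, not how reproduceriter.py is implemented - that works on the generated JavaScript): each source of nondeterminism gets wrapped so that record mode logs the values it produced and replay mode feeds the same values back, which makes the whole run reproducible:

```cpp
#include <cstdio>
#include <cstdlib>

// Conceptual record/replay wrapper around a nondeterministic source.
enum Mode { RECORD, REPLAY };
static Mode mode = RECORD;
static FILE *trace = NULL;

double wrapped_random() {
  double r;
  if (mode == RECORD) {
    r = (double)rand() / RAND_MAX;             // the "real" random source
    fprintf(trace, "%.17g\n", r);              // log the value to the trace
  } else {
    if (fscanf(trace, "%lf", &r) != 1) r = 0;  // replay the recorded value
  }
  return r;
}

int main(int argc, char **argv) {
  mode = (argc > 1 && argv[1][0] == 'p') ? REPLAY : RECORD;
  trace = fopen("trace.txt", mode == RECORD ? "w" : "r");
  for (int i = 0; i < 3; i++)
    printf("%f\n", wrapped_random());          // identical output in both modes
  fclose(trace);
  return 0;
}
```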

This has been tested on BananaBread so far, and used to create BananaBench, a benchmark based on BananaBread. You can either run it in the browser or in the shell, and hopefully a real-world benchmark like this will make it easier to optimize browsers for this kind of content.