Web workers – an overview

We recently had a talk about about service worker at BrisJS — the local Javasccript meetup — but I wanted to take a step back and talk a little more generally how workers all fit together.

You can view my original slides & interactive presentation on GitHub.

What’s a worker?

A worker is essentially a separate process in which you can run some Javascript code.

The thing about Javascript is that like most programming languages it’s single-threaded, which means when you’re executing code you can only do one thing at a time. We've got it fairly good with async tasks in Javascript land, with things like AJAX and upcoming ES6 promises, but even still if you've got a long-running task it’s going to lock up your thread until it’s done.

This doesn't sound so bad until you consider that the thread you're executing Javascript in also includes the thread that renders your page, which means your animation, scrolling, the whole shebang will stop working until your JS is done.

Thus separating your Javascript into a separate worker process can significantly improve performance on your site if you’re doing a lot of processing.

Quick description of workers

So should you just stick your code in a worker and hope for the best, right? Not quite.

A worker gives you a more Javascript-oriented environment than your standard script is used to. To start with, there's no window object, so things like window.onload don't do anything (the script in the worker starts firing once it's loaded, and doesn't need to wait for anything else). There's also no DOM or XML parsing, and you only get a subset of the APIs you’d have access to in a regular Javascript page.

This means that the code in your worker is essentially sandboxed from the regular goings on of your page, and you have to structure your app accordingly.

With such a restricted environment, it's easy to discredit workers, but there are several things a worker is good for:

Pretty good sandbox – makes you more mindful of your code, and helps you write good, testable modules.
Ideal for discrete tasks – Things that you can boil down to a standalone component are great for use in workers.
Longer running, CPU intensive tasks – These are where workers shine. If you're doing anything at all that might take longer than the 16.66 milliseconds/60 fps of your page draw, consider taking it out of the main thread for super performance gains & smooth animation/scrolling.

So this might include things like browser games or web apps with large amounts of data.

Types of worker

There’s three main types of worker:

Dedicated workers are the bread and butter worker, and work in almost all browsers. They’re really simple, you can fire them up in your page, do some processing, tear them down. Easy!
Shared workers are similar to dedicated workers, except you can fire one up and access it from anywhere else your entire origin. Multiple tabs and all. These aren't especially well supported, and in my opinion have been mostly superseded by:
Service workers! A relatively new type of worker with a longer lifecycle, which can run in the background even when your page is closed, and has a few extra APIs. They’re essentially designed to make the web more app-like.

Start up a worker

Starting up a worker looks a little something like this:

// index.html
var echoWorker = new Worker('echoworker.js');
echoWorker.onmessage = function(message){
    console.log('%cMessage from worker: ', 'color: darkred', message);
};
echoWorker.postMessage('Hello worker!');

In this example, we're firing up a worker by pointing to an external script echoworker.js and listening for any messages it might send us.

Then we're sending a hello world message to our new worker.

// echoworker.js
addEventListener('message', function(event) {
    console.log('%cMessage from parent: ', 'color: blue', event.data);
    postMessage(event.data);
}, false);

Inside the worker we can run any code we like. In this example we're simply listening for any messages our parent might send to us, and echoing them back once they've been received.

You can try this code on this page by opening up the console in the presentation and running echoWorker.postMessage('Hello worker!');.

Postmessage API

The postMessage API is the main way of communicating with a worker and has two parameters: message and transferList.

myWorker.postMessage([mixed] aMessage, [array] transferList);

In the initial spec, the message parameter only supported passing of strings, but modern browsers support structured clone which is a spec that’s similar to JSON.stringify except it supports a bunch more things, like blobs, files, regexes etc. This means that you can send a large number of data types through the message parameter, including objects.

One point to note is that passing messages using this API works differently due to the cross-process nature. While traditional Javsascript objects are passed by reference and exist only once in memory no matter how many times you pass them around, objects passed through postMessage are duplicated in order to exist in both processes. So if you're sending large amounts of data between workers, you may end up accruing a lot of duplicated memory that needs to be garbage collected.

To get around this, the transferlist parameter lets you transfer certain data types, rather than duplicating them. At the moment the only data types you can transfer are ArrayBuffers (binary data), but this covers a number of high-profile use cases including Canvas graphics and file handling.

The following example transfers the ArrayBuffer rather than duplicating it:

var myMessage = {
    command: 'doTheThing',
    data: someArrayBuffer
};
echoWorker.postMessage(myMessage, [someArrayBuffer]);

Worker sope

There's not a lot of API devoted to managing workers. Essentially you have:

Worker.terminate()
WorkerGlobalScope.close()
WorkerGlobalScope.importScripts()

Terminate lets you terminate a worker from the main script. Close lets you terminate a worker from the inside.

importScripts is the fun one, it's essentially the equivalent of putting a <script> tag into your HTML document, in that it blocks the page and loads the script at that point in time. Of course since we're in a worker it doesn't matter if we block the Javascript because it doesn't affect what's in the page, so this is useful for loading any dependencies you might need before your script runs.

My preference here is to simply bundle everything up with Browserify (because you get all the benefits of NPM and don't have to worry about the file management and extra requests that ensue) but you can just as easily use the native implementation.

Some gotchas

One of the biggest gotchas I've hit when using workers is scripts expecting window to exist. You've probably seen stuff like this around:

if(window.XMLHttpRequest){
    // do the magic
}

It's a handy shortcut to check if something exists, but it's a bit of an antipattern these days because often these functions may exist in another global scope. In Node you've got global and in workers you've got self, so in each of these environments your code is going to throw an error and break everything.

There's two ways you can solve this:

Either wrap the offending code in a closure:
(function(window){ […] })(self);
or ensure window exists:
self.window = self;

The first option is nicer because it's a less grotty-globally hack, but the second is by far the easiest and doesn't seem to cause any problems.

How fast are workers

Guillaume Cedric Marty did a bunch of benchmarks on some of their low end Firefox OS phones over at the Mozilla Hacks blog recently, so if you're interested in the details you can check that out..

The summary: firing up a new worker is the slowest part of the whole affair, everything else is blindingly fast.

Operation	Cost
Initialisation	~40ms
postMessage latency	~0.5ms
Communication speed	45kB/ms

The take-out from this is that you should fire up only as many workers as you need, and reuse them for common tasks. You can do this by having all your tasks bundled into the one codebase, and then running one or more identical workers that you can share tasks across as required.

Another consideration in terms of performance is only firing up as many workers as will benefit your system. Considering your render thread is already taking up a process, you should consider firing up cores - 1 workers (or if your render thread won't be doing much, fire up a worker per core.)

You can get the number of cores in a system using the webkit-specific navigator.hardwareConcurrency or using one of the polyfills depending on your needs.

Actually writing an app with workers

So you're writing an app that will be using workers, one of the most important things to get right is the communication. postMessage is a super simple communication mmethod, so you'll probably want to build something on top to make this smoother.

My favourite thing to do is write a little async ‘do something’ api, so you can post commands and receive callbacks.
Something I've been playing with is running the whole model in a worker. This can be useful when dealing with lots of data & processing.
An existing library: there's a bunch of them. I like workerproxy because it gives you load balancing and some debugging for free, but I've found it can eat errors and make things difficult sometimes.