Distributed Computing in the Browser

Why Do It?

Scientists and researchers turned their attention to distributed computing long ago. Without it, Facebook could not have achieved such remarkable success in the social-network market. As many interested people know, distributed computing can, and does, rely on enormous clusters of comparatively cheap machines to make good use of their combined computing power. Today, large-scale computing is still expensive and largely closed to ordinary people and schools; that would change if millions of smart devices could contribute to computing tasks. This blueprint invites a question: what if we could exploit the browser for distributed computing?

It is quite a reasonable idea. Nowadays the computing capability of an iPhone exceeds that of many early scientific computers. But why choose the browser? Take Chrome as an example: thanks to its V8 engine, JavaScript runs much faster than Python in many benchmarks. Moreover, the browser is one of the few pieces of infrastructure that requires no additional environment configuration, which makes large-scale distributed computing far more practical. After all, running everything while people browse the internet sounds much better than asking them to download and install a C++ toolchain such as gcc.

Status Quo

Basically, the emergence of the Web Worker makes all of these ideas possible. It allows scripts in the browser to run in separate threads, so multiple tasks can proceed at the same time. While a user browses Facebook, for instance, spare CPU cycles can be devoted to research work in the background, because another thread handles the computation separately.
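A minimal sketch of the idea, assuming a page context: the heavy function is plain JavaScript, and in the browser it can be shipped off the main thread via a Worker built from an inline Blob (the guard keeps the sketch harmless where `Worker` is unavailable):

```javascript
// The heavy computation itself is an ordinary function.
function sumOfSquares(numbers) {
  return numbers.reduce((acc, x) => acc + x * x, 0);
}

// In a browser, the same function can run off the main thread.
// Sketch only: the worker source is assembled from the function's text,
// so no separate worker.js file is needed.
if (typeof Worker !== 'undefined') {
  const workerSource = `
    onmessage = (event) => {
      const sumOfSquares = ${sumOfSquares.toString()};
      postMessage(sumOfSquares(event.data));
    };
  `;
  const url = URL.createObjectURL(new Blob([workerSource]));
  const worker = new Worker(url);
  worker.onmessage = (event) => console.log('worker result:', event.data);
  worker.postMessage([1, 2, 3]); // the main thread stays responsive meanwhile
}
```

The key property is that `postMessage` is asynchronous: the page keeps rendering while the worker grinds through the data.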

What is more, WebSocket and WebRTC (Web Real-Time Communication) bring these specifications and prototypes to life. WebSocket gives programmers a stable, persistent connection on top of TCP, while WebRTC enables direct peer-to-peer channels between browsers. As far as I know, many newer websites, including Zhihu, use WebSocket for continuous notifications in the frontend. Inspired and supported by these technologies, several libraries have taken the first step toward distributed computing in the browser. Two of the most attractive are Dnode and Parallel.js: Dnode builds a higher-level RPC (Remote Procedure Call) layer, and Parallel.js tackles parallel computing.
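The essence of RPC, as libraries like Dnode implement it over such connections, is to serialize a call, ship it across a channel, invoke the named function remotely, and ship the result back. A toy in-memory sketch of that pattern (not Dnode's actual API; a direct function call stands in for the WebSocket transport, and all names are illustrative):

```javascript
// Server side: a registry of functions that remote peers may call.
const serverMethods = {
  add: (a, b) => a + b,
  upper: (s) => s.toUpperCase(),
};

// The "transport": in a real system this request would travel as a
// WebSocket message; here it is still a JSON string, as on the wire.
function serverHandle(rawRequest) {
  const { method, args } = JSON.parse(rawRequest);
  const result = serverMethods[method](...args);
  return JSON.stringify({ result });
}

// Client side: a stub that serializes the call and parses the reply.
function rpcCall(method, ...args) {
  const reply = serverHandle(JSON.stringify({ method, args }));
  return JSON.parse(reply).result;
}

console.log(rpcCall('add', 2, 3));      // 5
console.log(rpcCall('upper', 'dnode')); // "DNODE"
```

Dnode goes further by letting callbacks cross the wire in both directions, but the serialize-dispatch-reply core is the same.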

With their help, programmers can write Map/Reduce in the frontend, in a parallel and even asynchronous way. That is to say, once the server is ready, it can send tasks to any browser (any tab, in fact) viewing the relevant page, and the tabs then interact with the server: not merely finishing a task and returning the final result, but calling remote functions on the server, and vice versa. From this perspective, every tab in every browser on every device can serve as a diligent worker, and with careful task scheduling the outcome can be fantastic. Chris Wellons, a curious geek, managed to organize a volunteer distributed computation, powered by reddit users, to search for a specific permutation of the alphabet within a space of 403,291,461,126,605,635,584,000,000 (26!) possibilities [4]. That successful experiment leads to the following thoughts.
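The scheduling idea can be sketched in plain JavaScript: the server splits a job into chunks, each connected tab maps its chunk, and the server reduces the partial results. In this sketch the "tabs" are simulated as local async functions, and all names are hypothetical:

```javascript
// Split an array into roughly equal chunks, one per worker (tab).
function splitIntoChunks(data, numChunks) {
  const size = Math.ceil(data.length / numChunks);
  const chunks = [];
  for (let i = 0; i < data.length; i += size) {
    chunks.push(data.slice(i, i + size));
  }
  return chunks;
}

// A simulated tab: applies the map function to its chunk asynchronously,
// as a real tab would after receiving the task over a WebSocket.
function simulatedTab(mapFn) {
  return async (chunk) => chunk.map(mapFn);
}

// The server-side driver: fan the chunks out, await every tab's partial
// result, then fold them together with the reduce function.
async function distributedMapReduce(data, mapFn, reduceFn, initial, numTabs) {
  const tabs = Array.from({ length: numTabs }, () => simulatedTab(mapFn));
  const chunks = splitIntoChunks(data, numTabs);
  const partials = await Promise.all(chunks.map((c, i) => tabs[i](c)));
  return partials.flat().reduce(reduceFn, initial);
}

// Example: sum of squares of 1..8, spread across four "tabs".
distributedMapReduce([1, 2, 3, 4, 5, 6, 7, 8], x => x * x, (a, b) => a + b, 0, 4)
  .then(total => console.log(total)); // 204
```

In a real deployment the hard part is exactly what the sketch glosses over: tabs appear and disappear as users navigate, so the scheduler must reassign chunks whose workers vanish.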

Future Direction

Let us make an assumption: suppose the V8 engine had more access to hardware, for example the GPU. Why not give machine learning a chance? In fact, a Stanford PhD student, Andrej Karpathy, has made this real with his experimental project ConvNetJS, which trains deep learning models entirely in the browser. At present it has around 5,000 stars on GitHub. Its problem is speed: it is simply too slow. I could not finish its MNIST digit classification example within a whole day. The MXNet developers therefore settled on a compromise: use the browser to do inference with already-trained models.
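The compromise works because inference only needs forward passes over fixed weights, which is far cheaper than training. As a toy illustration of the idea (not the MXNet or ConvNetJS API; the weights and sizes are made up), here is a single dense layer applying pre-trained parameters:

```javascript
// A pre-trained model reduced to its essentials: fixed weights and biases.
// A real deployment would fetch these as a binary blob from the server.
const model = {
  weights: [[2, -1], [1, 3]], // 2 inputs -> 2 outputs
  biases: [1, 0],
};

// One forward pass: matrix-vector product plus bias, then ReLU.
function forward(model, input) {
  return model.weights.map((row, i) => {
    const sum = row.reduce((acc, w, j) => acc + w * input[j], model.biases[i]);
    return Math.max(0, sum); // ReLU activation
  });
}

// The browser only evaluates the trained model; no backpropagation needed.
console.log(forward(model, [1, 2])); // [1, 7]
```

Training needs gradients, many passes over the data, and lots of memory; evaluation is just this multiply-add loop, which even a phone's browser handles comfortably.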

Every time an impressive new piece of research comes out, peers try to reproduce the experiment. The reproduction, however, may take so long that waiting in front of the desk becomes tedious torture. In their paper “MLitB: Machine Learning in the Browser” [6], the authors support the idea that researchers can tap the computing power of massive numbers of browsers to accelerate local reproduction. In addition, distributed machine learning in the browser brings this otherwise opaque work to the public, improving people’s understanding of, and interest in, the field. Machine learning may well become ubiquitous this way.

What is more, as distributed computing in the browser matures, the poor efficiency of memory usage must improve, and vector computation should be supported in JavaScript as a fundamental core library. Only when browsers can reach the GPU in an elegant way, say through a CUDA-like interface, can distributed computing in the browser become swift and realistic.
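Today the closest thing to such a core library is typed arrays, which at least give dense, contiguous numeric storage. A sketch of a dot product over `Float32Array` (a hand-rolled loop, not SIMD or GPU acceleration):

```javascript
// Float32Array stores numbers in one contiguous binary buffer, unlike a
// plain Array of boxed doubles, so it is compact and cache-friendly.
function dot(a, b) {
  if (a.length !== b.length) throw new Error('length mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

const u = new Float32Array([1, 2, 3]);
const v = new Float32Array([4, 5, 6]);
console.log(dot(u, v)); // 32
```

The loop is still scalar, element by element; a true vector core library would let the engine dispatch the whole operation to SIMD units or the GPU at once.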

Personal Conclusion

From my perspective, the browser is one of the infrastructures of this era, and distributed computing is evolving into more than the “cloud”. Though obstacles such as speed and sandbox limitations remain, large-scale computing in the browser holds a promising edge, given the trend of ever-cheaper devices and the growing demand for computing power, not least from machine learning.