Thursday, October 11, 2012

Quick and Dirty Node.js Process (Job) Queue

(LOL @ Windows Clipart)

I recently began experimenting with Node.js for a small web scraping project. I wrote a tiny script that goes out to lots of URLs and downloads files to disk. The simple solution was to iterate through the list and, for each URL, call exec() to run wget and download the page.

Too Many Open Files

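Unfortunately, there is a limit on the number of simultaneous exec() calls you can make. Because running an external command via exec() is non-blocking, a loop of back-to-back calls starts every command at once, and each child process holds file descriptors open. Once the process runs out of them, spawning fails with EMFILE ("too many open files"):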
        throw e; // process.nextTick error, or error event on first tick
Error: spawn EMFILE
    at errnoException (child_process.js:481:11)
    at ChildProcess.spawn (child_process.js:444:11)
    at child_process.js:342:9
    at Object.execFile (child_process.js:252:15)
    at child_process.js:220:18
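
For reference, the naive loop that runs into this error looks roughly like the sketch below. It is a reconstruction, not the exact script: the helper name downloadAllAtOnce and its optional run parameter (there so the loop can be tried without actually launching wget) are made up for illustration.

```javascript
var exec = require('child_process').exec;

// Hypothetical helper: starts one command per URL with no limit at all.
// `run` defaults to exec; pass a stand-in function to try it without wget.
function downloadAllAtOnce(urls, run) {
  run = run || exec;
  var started = 0;
  urls.forEach(function (url) {
    started += 1;
    // exec returns immediately, so the loop never waits for a download
    run('wget ' + url, function (err, stdout, stderr) {
      if (err) {
        console.error('failed: ' + url + ' (' + err.message + ')');
      }
    });
  });
  return started;  // every URL has been started by the time the loop ends
}
```

With a long enough list, every command is started before the first one finishes, which is how the process runs out of file descriptors.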

Maximum Simultaneous Calls

To solve this, I implemented something like the following:
var exec = require('child_process').exec;

var queue = [];
var MAX = 20;  // only allow 20 simultaneous exec calls
var count = 0;  // holds how many execs are running
var urls = [ /* ... */ ];  // long list of urls

// our callback for each exec call
function wget_callback(err, stdout, stderr) {
  count -= 1;  // the exec that just finished frees up a slot
  if (queue.length > 0 && count < MAX) {  // get next item in the queue!
    count += 1;
    var url = queue.shift();
    exec('wget '+url, wget_callback);
  }
}

urls.forEach( function(url) {
  if (count < MAX) {  // go get the file!
    count += 1;
    exec('wget '+url, wget_callback);
  } else {  // queue it up...
    queue.push(url);
  }
});

This allows at most MAX exec() calls to run simultaneously. The rest of the URLs wait in a queue until a slot becomes available for them. Checking (and shifting) the queue is done in the callback function wget_callback(): each time a download finishes, it fetches the next URL out of the queue, but only if fewer than MAX exec() calls are already running. I keep track of how many calls are currently running using count, incrementing it when a call starts and decrementing it when one finishes.
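
The same bookkeeping can be wrapped in a small reusable helper. The following is only a sketch of that idea, not a drop-in library: createLimiter, push, running and downloadAll are hypothetical names, and each task is assumed to be a function that receives a done callback and calls it when its work has finished.

```javascript
var exec = require('child_process').exec;

// Hypothetical helper: runs at most `max` tasks at a time, queues the rest.
function createLimiter(max) {
  var running = 0;   // same role as `count` above
  var waiting = [];  // same role as `queue` above

  function next() {
    while (running < max && waiting.length > 0) {
      running += 1;
      startTask(waiting.shift());
    }
  }

  function startTask(task) {
    var finished = false;
    task(function done() {
      if (finished) return;  // ignore a second call from the same task
      finished = true;
      running -= 1;          // free the slot...
      next();                // ...and hand it to the next queued task
    });
  }

  return {
    push: function (task) { waiting.push(task); next(); },
    running: function () { return running; }
  };
}

// Example use (defined but not called here): limit wget to 20 at a time.
function downloadAll(urls) {
  var limiter = createLimiter(20);
  urls.forEach(function (url) {
    limiter.push(function (done) { exec('wget ' + url, done); });
  });
}
```

Node runs these callbacks one at a time on a single thread, so two callbacks never update the counter or the queue at the same moment.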

I'm sure there are tons of libraries that do this, but I decided to implement a quick and dirty solution to this problem and thought I'd share!


  1. thanks a lot....

  2. This comment has been removed by the author.

  3. I love how clean this is. I like how the loop pre-loads the processes, which process the queue with exactly the right number of callbacks. I'm not sure I would've thought of doing it this cleanly.

    Could there be a race condition between multiple calls to the callback modifying the enclosed variables (queue and count)?
