I recently began experimenting with Node.js for a small web scraping project. I wrote a tiny script that goes out to lots of URLs and downloads files to disk. The simple solution was to iterate through the list and, for each URL, run wget through exec() to download the page.
Too Many Open Files
Unfortunately, there is a limit on the number of exec() calls you can have running at once. Because exec() is non-blocking, it returns immediately while the external command is still running, so making many back-to-back calls starts all of those commands at roughly the same time. Each child process holds file descriptors open, and once the process runs out of them (EMFILE means "too many open files"), spawning fails with the following:

    node.js:201
            throw e; // process.nextTick error, or error event on first tick
                  ^
    Error: spawn EMFILE
        at errnoException (child_process.js:481:11)
        at ChildProcess.spawn (child_process.js:444:11)
        at child_process.js:342:9
        at Object.execFile (child_process.js:252:15)
        at child_process.js:220:18
Maximum Simultaneous Calls
To solve this, I implemented something like the following:

    var exec = require('child_process').exec;

    var queue = [];
    var MAX = 20;     // only allow 20 simultaneous exec calls
    var count = 0;    // holds how many execs are running
    var urls = [...]  // long list of urls

    // our callback for each exec call
    function wget_callback(err, stdout, stderr) {
        // the call that just finished frees up a slot
        count -= 1;

        if (queue.length > 0 && count < MAX) {
            // get next item in the queue!
            count += 1;
            var url = queue.shift();
            exec('wget ' + url, wget_callback);
        }
    }

    urls.forEach(function (url) {
        if (count < MAX) {
            // go get the file!
            count += 1;
            exec('wget ' + url, wget_callback);
        } else {
            // queue it up...
            queue.push(url);
        }
    });

This allows at most MAX exec() calls to run at the same time. The rest of the URLs are stored in a queue until a slot becomes available for them. Checking (and shifting) the queue is done in the callback function wget_callback(). I fetch the next URL to download out of the queue only if there are fewer than MAX exec() calls already running. I keep track of how many calls are currently running using count, incrementing it when a call starts and decrementing it when a call finishes.
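The same idea can be pulled out into a small reusable function. The following is only a sketch, not the code from my script: runLimited, worker, and done are names I made up for illustration, and it assumes the worker always calls its callback exactly once.

```javascript
// Hypothetical helper (not part of the original script): calls
// worker(item, cb) for every item, keeping at most `max` calls in flight.
function runLimited(items, max, worker, done) {
  var queue = items.slice(); // copy so the caller's array is left untouched
  var running = 0;           // how many worker calls are currently in flight
  var finished = false;      // guards against calling done() more than once

  function next() {
    // Start work until we hit the limit or run out of queued items.
    while (running < max && queue.length > 0) {
      running += 1;
      worker(queue.shift(), onFinish);
    }
    // Nothing running and nothing queued: everything has completed.
    if (running === 0 && queue.length === 0 && !finished) {
      finished = true;
      if (done) done();
    }
  }

  // Passed to the worker as its callback; frees a slot and refills it.
  function onFinish() {
    running -= 1;
    next();
  }

  next();
}
```

For the downloads above, the worker could be something like function (url, cb) { execFile('wget', [url], cb); }, using execFile from child_process. Passing the URL as an argument rather than concatenating it into a command string also avoids handing it to a shell.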
I'm sure there are tons of libraries that do this, but I decided to implement a quick and dirty solution to this problem and thought I'd share!
Thanks! Good idea.
Thanks a lot.
I love how clean this is. I like how the loop pre-loads the processes, which process the queue with exactly the right number of callbacks. I'm not sure I would've thought of doing it this cleanly.
Could there be a race condition between multiple calls to the callback modifying the enclosed variables (queue and count)?