Sitebulb is designed to be a responsible crawler, with rate limiting in place that slows the tool down if the server appears to be struggling.
However, this can be too limiting for some users, who really want (or need) to crawl a lot faster than the default settings allow. This document explains how you can override these settings to crawl as fast as you can.
The first thing to note, when it comes to speed, is that crawling with the HTML Crawler is much quicker, so if speed is a concern, use that where possible.
You can select the crawler you wish to use in the Crawler Configuration settings from the left-hand menu. All the other speed-related settings are also in these Crawler Settings:
You can adjust the speed by updating these settings:
The biggest difference-maker is the number of threads you use when crawling. How fast you can crawl, and the effect it has on your machine, depends on the number of logical processors (cores) that your machine has.
In the example above, the machine has only 4 logical processors, so increasing the threads above 4 will start to hammer your CPU. You can increase this up to 16 threads; however, auditing with far more threads than are actually available can lead to thread starvation, which causes your computer to slow down and sometimes crash.
Additionally, a default limitation is applied via the tickbox Limit URL Speed, which you can override either by unticking the box or by changing the dropdown value for Max HTML URLs per Second.
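If you are not sure how many logical processors your machine has, a short script can tell you. The sketch below is not part of Sitebulb and does not change any Sitebulb setting; it only reads the processor count from the operating system and suggests a conservative thread value, assuming the rule of thumb described above (stay at or below the number of logical processors, and never above 16). The function name `suggested_thread_cap` is hypothetical.

```python
import os

# Upper limit on threads mentioned in the text above.
MAX_THREADS = 16


def suggested_thread_cap(logical_processors=None):
    """Return a conservative thread count for this machine.

    Assumption: staying at or below the number of logical processors
    avoids overloading the CPU. This is a rule of thumb, not an
    official Sitebulb formula.
    """
    if logical_processors is None:
        # os.cpu_count() can return None if the count is unknown.
        logical_processors = os.cpu_count() or 1
    return max(1, min(logical_processors, MAX_THREADS))


print("Logical processors:", os.cpu_count())
print("Suggested thread cap:", suggested_thread_cap())
```

You would then enter the suggested value (or something lower) in the threads setting by hand.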
This limitation exists to help you crawl responsibly, and if you want to learn more about that we suggest you read our guide on crawling responsibly.
However, if you are looking for the fastest crawl you can do with Sitebulb, do the following:
Please note that this is still limited by the machine itself. A computer with 16 cores will be able to crawl faster than one with 8 cores, all else being equal.
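To get a feel for what a given Max HTML URLs per Second value means in practice, you can estimate the best-case crawl time with simple arithmetic: total URLs divided by URLs per second. The sketch below assumes the crawl actually sustains the configured maximum, which real crawls often will not, so treat the result as a lower bound. The site size and rates used are made-up example figures, and the function names are hypothetical.

```python
def estimate_crawl_seconds(total_urls, urls_per_second):
    """Best-case crawl time in seconds at a sustained URL rate."""
    if total_urls < 0:
        raise ValueError("total_urls must not be negative")
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be greater than zero")
    return total_urls / urls_per_second


def format_duration(seconds):
    """Format a number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Hypothetical example: a 100,000 URL site at two different rate caps.
for rate in (5, 25):
    seconds = estimate_crawl_seconds(100_000, rate)
    print(f"{rate} URLs/second: {format_duration(seconds)}")
```

Comparing the two estimates shows how much time a higher cap could save, which helps when deciding whether the extra load on the server and on your machine is worth it.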
If you selected the Chrome Crawler from the Crawler Type dropdown, the Crawler Configuration page will look slightly different.
There is the option to select how many Chrome instances you wish to use for crawling. Again, this is dependent upon the number of logical processors you have on your machine, and pushing the value up may have adverse effects on your machine while crawling.
Adjusting these values will affect how fast Sitebulb is able to crawl:
If you wish to learn more about adjusting the crawling speed, we suggest you read our documentation, How to control URLs/second for Chrome Crawler.
Please note that we only recommend pushing up the speed options if you have permission to crawl and the website owner is comfortable with you crawling the website fast. Ideally, this would be a site you know can handle a high number of connections at once.
If you want to learn more about this subject, we suggest you read our guide on crawling responsibly.