Configuration
CrawlerOptions is the top-level knob bag passed to StartAsync. It holds the crawl-wide settings (MaxCrawlTime and NumberOfProcessors) and groups the download-level and scheduler-level options.
CrawlerOptions
| Option | Default | Description |
|---|---|---|
| MaxCrawlTime | No limit | Maximum total duration for the crawl |
| NumberOfProcessors | 3 | Number of parallel page processors |
| DownloadOptions | new DownloadOptions() | See below |
| SchedulerOptions | new SchedulerOptions() | See below |
DownloadOptions
| Option | Default | Description |
|---|---|---|
| RetryCount | 3 | Number of retries for HTTP 5xx errors |
| Timeout | No limit | Timeout per individual page download |
| UserAgent | UserAgentLibrary.Chrome | User agent string sent with requests |
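To make the RetryCount semantics concrete, here is a minimal sketch of retrying only on HTTP 5xx responses. It is not the library's downloader: RetrySketch and SendWithRetries are hypothetical names, and the sketch assumes the first attempt does not count as a retry, so RetryCount = 3 allows up to four requests in total.

```csharp
using System;

// Hypothetical illustration only: not the library's actual downloader.
public static class RetrySketch
{
    // sendRequest performs one request and returns its HTTP status code.
    // Assumption: the initial attempt is not counted as a retry, so at most
    // retryCount + 1 calls to sendRequest are made.
    public static int SendWithRetries(Func<int> sendRequest, int retryCount)
    {
        int status = sendRequest();

        // Retry only while the server reports a 5xx error and retries remain.
        for (int attempt = 0; attempt < retryCount && IsServerError(status); attempt++)
        {
            status = sendRequest();
        }

        return status;
    }

    private static bool IsServerError(int status) => status >= 500 && status <= 599;
}
```

In this sketch a 4xx response is returned immediately, since only 5xx errors are described as retried.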
SchedulerOptions
| Option | Default | Description |
|---|---|---|
| MaxQueueCount | No limit | Maximum items in the queue. New URIs are dropped when the limit is reached. |
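The drop-on-full behaviour of MaxQueueCount can be pictured with a small sketch. This is not the library's scheduler: BoundedUriQueue and TryEnqueue are hypothetical names, used only to show how a new URI is discarded once the limit is reached while a missing limit accepts everything.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: not the library's actual scheduler.
public sealed class BoundedUriQueue
{
    private readonly Queue<Uri> _queue = new Queue<Uri>();
    private readonly int? _maxQueueCount; // null means "no limit"

    public BoundedUriQueue(int? maxQueueCount = null)
    {
        _maxQueueCount = maxQueueCount;
    }

    public int Count => _queue.Count;

    // Returns false, and drops the URI, when the queue is already at its limit.
    public bool TryEnqueue(Uri uri)
    {
        if (_maxQueueCount.HasValue && _queue.Count >= _maxQueueCount.Value)
        {
            return false;
        }

        _queue.Enqueue(uri);
        return true;
    }
}
```

A practical consequence of this design is that a small limit can silently shrink the crawl, so the value is worth choosing with the expected site size in mind.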
Full example
    var crawler = new Crawler();
    var options = new CrawlerOptions
    {
        MaxCrawlTime = TimeSpan.FromMinutes(10),
        NumberOfProcessors = 5,
        DownloadOptions = new DownloadOptions
        {
            RetryCount = 3,
            Timeout = TimeSpan.FromSeconds(30),
            UserAgent = UserAgentLibrary.Chrome
        },
        SchedulerOptions = new SchedulerOptions
        {
            MaxQueueCount = 1000
        }
    };
    var result = await crawler.StartAsync(new Uri("https://example.com/"), options);
User agent library
UserAgentLibrary provides built-in user agent strings so you don't have to type them out:
- UserAgentLibrary.Chrome (default)
- UserAgentLibrary.Firefox
- UserAgentLibrary.Edge
- UserAgentLibrary.Googlebot
- UserAgentLibrary.Bingbot
- UserAgentLibrary.DuckDuckBot
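As a short configuration fragment, switching to a different built-in user agent might look like the following. It uses only names that appear in this document. The namespace import is omitted because it is not stated here, and the fragment assumes it runs inside an async method.

```csharp
// Configuration fragment: crawl with the Googlebot user agent instead of
// the Chrome default. All other options keep their defaults.
var crawler = new Crawler();
var options = new CrawlerOptions
{
    DownloadOptions = new DownloadOptions
    {
        UserAgent = UserAgentLibrary.Googlebot
    }
};
var result = await crawler.StartAsync(new Uri("https://example.com/"), options);
```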
Cancellation
All StartAsync overloads accept a CancellationToken:
    using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
    var result = await crawler.StartAsync(new Uri("https://example.com/"), cancellationToken: cts.Token);
    Console.WriteLine($"Cancelled: {result.IsCancelled}, Elapsed: {result.Elapsed}");
When cancelled, StartAsync still returns a CrawlerResult — pages crawled before cancellation are included and IsCancelled is true.
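The "return partial results instead of throwing" behaviour can be sketched with a plain loop that checks the token between pages. This is an illustration under assumptions, not the library's crawl loop: PartialResult, CancellationSketch, and Crawl are hypothetical names, and pages are reduced to strings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration only: not the library's actual result type.
public sealed class PartialResult
{
    public PartialResult(bool isCancelled, IReadOnlyList<string> pages)
    {
        IsCancelled = isCancelled;
        Pages = pages;
    }

    public bool IsCancelled { get; }
    public IReadOnlyList<string> Pages { get; }
}

public static class CancellationSketch
{
    // Processes pending pages in order. If cancellation is requested before
    // a page starts, returns the pages finished so far with IsCancelled set,
    // rather than throwing OperationCanceledException.
    public static PartialResult Crawl(
        IEnumerable<string> pending,
        Action<string> process,
        CancellationToken token)
    {
        var done = new List<string>();

        foreach (var page in pending)
        {
            if (token.IsCancellationRequested)
            {
                return new PartialResult(true, done);
            }

            process(page);
            done.Add(page);
        }

        // Every pending page was processed.
        return new PartialResult(false, done);
    }
}
```

The token is checked only between pages here, so a page already in progress is allowed to finish. Whether the real crawler does the same is not stated in this document.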
CrawlerResult
The CrawlerResult returned from StartAsync exposes:
- IsCancelled: whether the crawl was cancelled.
- Elapsed: total crawl duration.
- GetRequestedPages(): every page that was requested.
- GetFinalPages(): distinct final pages (after following redirects), ordered by redirect count.
Each crawled page includes the HTTP status code, redirect chain, final URI, content type, download time, and page title.