Getting started
Install
dotnet add package Tharga.Crawler
Tharga.Crawler targets .NET 8, .NET 9, and .NET 10.
Namespace
The entry-point types live under Tharga.Crawler:
using Tharga.Crawler;
Minimal example
var crawler = new Crawler();
var result = await crawler.StartAsync(new Uri("https://example.com/"));
The default Crawler constructor wires up an HttpClientDownloader, a BasicPageProcessor, and a MemoryScheduler — enough to crawl a public site without any configuration.
Multiple starting points
var crawler = new Crawler();
var uris = new[] { new Uri("https://example.com/"), new Uri("https://example.com/blog") };
var result = await crawler.StartAsync(uris);
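Once the crawl finishes, the returned result can be inspected directly. The sketch below is a minimal illustration that reuses the `GetFinalPages()` call shown in the Events section. It assumes the value returned by `StartAsync` is the same `CrawlerResult` type that the completion event exposes, and that `GetFinalPages()` returns an enumerable sequence.

```csharp
using System;
using System.Linq;
using Tharga.Crawler;

var crawler = new Crawler();
var uris = new[] { new Uri("https://example.com/"), new Uri("https://example.com/blog") };

// Await the whole crawl; nothing is printed until every queued page is handled.
var result = await crawler.StartAsync(uris);

// Assumption: GetFinalPages() is available on the returned result, as it is on
// e.CrawlerResult in the CrawlerCompleteEvent handler.
Console.WriteLine($"Crawled {result.GetFinalPages().Count()} pages.");
```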
Events
Three events let you observe progress without awaiting the final result:
crawler.CrawlerCompleteEvent += (s, e) =>
    Console.WriteLine($"Done: {e.CrawlerResult.GetFinalPages().Count()} pages.");
crawler.PageCompleteEvent += (s, e) =>
    Console.WriteLine($"Downloaded: {e.CrawlContent.FinalUri} ({e.CrawlContent.StatusCode})");
crawler.PageFailedEvent += (s, e) =>
    Console.WriteLine($"Failed: {e.CrawlContent.RequestUri} - {e.CrawlContent.StatusCode}");
PageCompleteEvent fires for HTTP 2xx responses; PageFailedEvent fires for non-2xx responses and when an exception occurs.
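Because the two page events split results into successes and failures, they can be used to keep running totals. The following is a rough sketch rather than a recommended pattern. The counters and the use of `Interlocked` are illustrative choices: it is an assumption that handlers may be raised from concurrent downloads, and that all page events have fired by the time `StartAsync` completes.

```csharp
using System;
using System.Threading;
using Tharga.Crawler;

var crawler = new Crawler();
var succeeded = 0;
var failed = 0;

// Interlocked guards against handlers running on different threads (an assumption
// about how the crawler raises events; it is harmless if they are raised serially).
crawler.PageCompleteEvent += (s, e) => Interlocked.Increment(ref succeeded);
crawler.PageFailedEvent += (s, e) => Interlocked.Increment(ref failed);

await crawler.StartAsync(new Uri("https://example.com/"));

Console.WriteLine($"{succeeded} succeeded, {failed} failed.");
```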
Dependency injection
Register the crawler in your service collection:
services.AddCrawler();
All components are registered as transient, so multiple parallel crawler instances are supported. Inject ICrawler where you need one:
public class MyService(ICrawler crawler)
{
    public Task<CrawlerResult> Crawl(Uri uri) => crawler.StartAsync(uri);
}
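Outside a hosted application, the same registration can be exercised from a plain console program. This sketch assumes the `Microsoft.Extensions.DependencyInjection` package is referenced, that the `AddCrawler` extension method is visible through the `Tharga.Crawler` namespace, and that no further registrations are required. None of these points is confirmed here.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Tharga.Crawler;

var services = new ServiceCollection();
services.AddCrawler();

// Dispose the provider when the program ends so that disposable services are released.
using var provider = services.BuildServiceProvider();

// Components are transient, so each resolution yields a fresh crawler.
var crawler = provider.GetRequiredService<ICrawler>();
var result = await crawler.StartAsync(new Uri("https://example.com/"));
```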
For scenarios where you need to construct a crawler with custom components at runtime, inject ICrawlerProvider:
public class MyService(ICrawlerProvider crawlerProvider)
{
    public Task<CrawlerResult> Crawl(Uri uri, IScheduler myScheduler)
    {
        var crawler = crawlerProvider.GetCrawlerInstance(scheduler: myScheduler);
        return crawler.StartAsync(uri);
    }
}
You can also replace any default component at registration time:
services.AddCrawler(options =>
{
    options.Scheduler = provider => new MyCustomScheduler();
    options.Downloader = provider => new MyCustomDownloader();
});
Next steps
- Configuration: CrawlerOptions, DownloadOptions, SchedulerOptions, cancellation, and result shape.
- Custom services: replace any of the four pluggable components.