🌐 Recursive Web Crawler in C#

A simple web crawler in C# that recursively explores links from a webpage!

📋 Features

  • 🔍 Parse Links: Starts by parsing all links from a given URL.
  • 🔄 Recursive Crawling: Visits each parsed link and extracts further links until the maximum limit is reached.
  • 🔗 Link Extraction: Uses regular expressions to extract URLs from the page content (see the extraction sketch after this list).
  • 🛑 Duplicate Protection: Maintains a HashSet of visited URLs to avoid revisiting pages and prevent infinite loops.
  • 🎯 Customizable: Set the maximum number of URLs to crawl.
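As a rough illustration of the extraction step, here is a minimal sketch of regex-based link parsing. The pattern shown (absolute URLs inside `href="..."` attributes) is an assumption for illustration; the repository's actual regular expression may differ.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkParser
{
    // Matches absolute URLs inside href="..." attributes -- an assumed
    // pattern; the project's actual expression may be broader or stricter.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"(https?://[^\"]+)\"", RegexOptions.IgnoreCase);

    public static IEnumerable<string> ExtractLinks(string html)
    {
        // Yield the captured URL from every href match in the page content.
        foreach (Match match in HrefPattern.Matches(html))
            yield return match.Groups[1].Value;
    }
}
```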

🚀 How It Works

  1. Start Crawling: Specify the starting URL.
  2. Extract Links: The program fetches the content of the page and extracts all links.
  3. Recursive Visits: It recursively visits those links, repeating the process.
  4. Stop Condition: Crawling continues until the defined maximum number of URLs is reached, as in the sketch below.
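Putting the steps together, a minimal sketch of the recursive crawl might look like the following. The class and member names (`Crawler`, `maxUrls`, `_visited`) are illustrative, not the repository's actual API, and `LinkParser` refers to the extraction sketch above.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

class Crawler
{
    private readonly HttpClient _http = new HttpClient();
    private readonly HashSet<string> _visited = new HashSet<string>();
    private readonly int _maxUrls;

    public Crawler(int maxUrls) => _maxUrls = maxUrls;

    public async Task CrawlAsync(string url)
    {
        // Stop condition: give up once the limit is hit, and skip any
        // URL already visited (HashSet.Add returns false in that case).
        if (_visited.Count >= _maxUrls || !_visited.Add(url))
            return;

        Console.WriteLine($"Visiting: {url}");

        string html;
        try
        {
            html = await _http.GetStringAsync(url);
        }
        catch (HttpRequestException)
        {
            return; // unreachable page: skip it and keep crawling
        }

        // Recursively visit every link extracted from this page.
        foreach (var link in LinkParser.ExtractLinks(html))
            await CrawlAsync(link);
    }
}
```

For example, `await new Crawler(maxUrls: 100).CrawlAsync("https://example.com");` would crawl up to 100 distinct URLs starting from the given page.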
