This project implements a search engine for the University of North Texas (unt.edu) using a vector space retrieval model. The search engine crawls and parses web pages from unt.edu to build a dictionary of terms for efficient retrieval.
The search engine provides the following features:
- Vector space retrieval model: The search engine utilizes a vector space model to index and retrieve relevant documents based on query terms.
- Web crawling and parsing: The search engine includes a web crawler and parser that collects web pages from unt.edu and extracts relevant terms for indexing.
- Term-based search: Users can input queries, and the search engine returns a ranked list of documents based on term relevance.
- Evaluation of system: The system allows for evaluating the search results by comparing them with the original unt.edu search.
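The parsing and indexing step can be illustrated with a minimal sketch. It is not the project's actual code: the names `TextExtractor`, `tokenize`, and `build_dictionary` are hypothetical, the tokenizer is a deliberately simple lowercase alphanumeric split, and the pages are assumed to have been downloaded already (the crawl itself is not shown).

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def tokenize(text):
    # Assumption: terms are lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(pages):
    """pages maps URL -> raw HTML. Returns term -> {url: term frequency}."""
    dictionary = defaultdict(dict)
    for url, html in pages.items():
        extractor = TextExtractor()
        extractor.feed(html)
        extractor.close()
        for term in tokenize(" ".join(extractor.chunks)):
            dictionary[term][url] = dictionary[term].get(url, 0) + 1
    return dict(dictionary)
```

Storing per-document term frequencies under each term keeps lookup by query term cheap, which is what the ranking step needs. A real crawler would also need to restrict links to unt.edu and avoid revisiting pages.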
Once the search engine is up and running, you can perform searches by following these steps:
- Enter your query in the search box on the search engine interface.
- Click the "Search" button or press Enter.
- The search engine will retrieve and rank relevant documents based on the query.
- View the search results, which will include the document title, snippet, and link.
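The retrieve-and-rank step above can be sketched with TF-IDF weights and cosine similarity, a common way to realize a vector space model. The weighting scheme here (log-scaled term frequency, `log(N/df) + 1` for inverse document frequency) is an assumption, as are the function names; the project may use a different variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(term_counts, idf):
    # Log-scaled term frequency times IDF; terms unseen in the collection are dropped.
    return {t: (1 + math.log(c)) * idf[t] for t, c in term_counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """docs maps doc id -> text. Returns [(doc_id, score)], best match first."""
    counts = {d: Counter(tokenize(text)) for d, text in docs.items()}
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for c in counts.values() for t in c)
    idf = {t: math.log(n / df_t) + 1.0 for t, df_t in df.items()}
    doc_vecs = {d: tfidf_vector(c, idf) for d, c in counts.items()}
    q_vec = tfidf_vector(Counter(tokenize(query)), idf)
    scored = [(d, cosine(q_vec, v)) for d, v in doc_vecs.items()]
    # Documents sharing no term with the query score 0 and are left out.
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```

For example, `rank("admissions", {"a": "graduate admissions deadlines", "b": "campus parking map"})` returns only document `a`, since `b` shares no term with the query. This sketch recomputes all vectors on every call for clarity; a deployed engine would precompute document vectors from the term dictionary.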
To evaluate the search engine's performance, follow these steps:
- Select a word or query of your choice.
- Perform the same query on the original unt.edu search engine.
- Perform the same query on this search engine.
- Compare the two result lists and note any differences in which documents appear and how they are ranked.
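The comparison in the last step can be made a little more systematic with a small helper. This is only a sketch under the assumption that the top result URLs from both engines have been collected by hand into two ordered lists; `overlap_at_k` and `differences` are hypothetical names, and top-k overlap is one simple measure among many.

```python
def overlap_at_k(ours, reference, k=10):
    """Fraction of the reference top-k URLs that also appear in our top-k."""
    ref_top = set(reference[:k])
    if not ref_top:
        return 0.0
    return len(set(ours[:k]) & ref_top) / len(ref_top)


def differences(ours, reference, k=10):
    """Lists the URLs that appear in only one of the two top-k result lists."""
    ours_top, ref_top = ours[:k], reference[:k]
    return {
        "only_ours": [u for u in ours_top if u not in ref_top],
        "only_reference": [u for u in ref_top if u not in ours_top],
    }
```

URLs are compared as exact strings here, so variants such as a trailing slash or `http` versus `https` would count as different pages unless they are normalized first.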