This is a Java program for text indexing. It takes a file that lists the names of the files to be indexed, scans each listed file to extract its words, and assigns each file a document number. It then builds a hash map from each unique word to the document number(s) in which it appears. Stop word removal and stemming are applied to improve the quality of the indexed words. The output is a dictionary of words and the documents in which each appears.
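The pipeline described above can be illustrated with a minimal sketch. It is not the repository's code: the class name `InvertedIndexSketch`, the stop-word list, and the `stem` helper are all hypothetical, documents are passed in as strings rather than read from files, and the suffix stripping is only a crude stand-in for whatever stemmer the project actually uses. A `TreeMap` is used in place of a plain hash map so that the printed dictionary comes out in sorted order.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Hypothetical stop-word list; the repository's actual list is not shown.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "is", "of", "to", "in");

    // Crude suffix stripping, used here only as a placeholder for a real stemmer.
    public static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Strip a suffix only if at least three characters remain.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Maps each stemmed, non-stop word to the sorted set of document numbers
    // (1-based, assigned by position in the list) in which it appears.
    public static Map<String, TreeSet<Integer>> buildIndex(List<String> documents) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int i = 0; i < documents.size(); i++) {
            int docNumber = i + 1;
            String text = documents.get(i).toLowerCase(Locale.ROOT);
            // Split on any run of characters that are not ASCII letters or digits.
            for (String token : text.split("[^a-z0-9]+")) {
                if (token.isEmpty() || STOP_WORDS.contains(token)) {
                    continue;
                }
                index.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docNumber);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The cat is sleeping", "Cats and dogs");
        // Print each term followed by the documents that contain it.
        buildIndex(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

In this sketch, "Cats" and "cat" collapse to the same term after lowercasing and stemming, so the entry for that term lists both documents, while stop words such as "the" and "and" never enter the index.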
shubhamshubhankar/Indexer
About
This is a program which indexes all the terms of the documents of the corpus.