
The Smart Document Retrieval System uses the FastAPI framework with Python to implement APIs that retrieve data from Elasticsearch based on user-specified queries.


yaseen-asaliya/Smart-Document-Retrieval-System



Smart Document Retrieval System

The Smart Document Retrieval System is a search tool built with the FastAPI framework in Python. It lets users tailor their searches by criteria such as article location, author, and topic.

Application interface

image

Key Features

  • Customized Searches: Users can refine their searches by specifying the location of the article, the author, and the topic.

  • Result Retrieval: The system retrieves results from Elasticsearch, ensuring they align with the user's specified criteria.

  • User-Friendly GUI: A graphical user interface (GUI) is provided, complete with a search box, to offer an intuitive and personalized search experience.

  • Visualizing the Structure of Stored Articles in Elasticsearch
article_1
{
  "date": "YYYY-MM-DD HH:mm:ss",
  "topics": ["topic1", "topic2", ...],
  "title": "Article Title",
  "author": [{"firstname": "author1_firstname", "surname": "author1_surname"}, {"firstname": "author2_firstname", "surname": "author2_surname"}, ...],
  "analized-body": ["term1", "term2", ...],
  "body": "Main text content of the article.",
  "temporal-expression": [{"expression": "time1"}, {"expression": "time2"}, ...],
  "geopoints": [{"lat": latitude1, "lon": longitude1}, {"lat": latitude2, "lon": longitude2}, ...],
  "georeferences": [{"lat": latitude1, "lon": longitude1}, {"lat": latitude2, "lon": longitude2}, ...]
}
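
As a minimal sketch of how a document with this structure could be assembled before indexing, the helper below builds a dictionary with a subset of the fields. The function names `normalize_date` and `make_article` are hypothetical and not taken from the repository. The `lon` key follows the index mapping. The date conversion is an assumption: Elasticsearch's default `date` format is `strict_date_optional_time||epoch_millis`, so a value with a space separator may be rejected unless the mapping declares a matching `format`, and the README does not say how the project handles this.

```python
from datetime import datetime


def normalize_date(raw: str) -> str:
    """Convert 'YYYY-MM-DD HH:mm:ss' into ISO 8601 ('YYYY-MM-DDTHH:mm:ss')."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").isoformat()


def make_article(title, body, date, authors, topics, geopoints=()):
    """Build an article document (hypothetical helper, subset of the fields).

    authors   -- iterable of (firstname, surname) pairs
    geopoints -- iterable of (lat, lon) pairs
    """
    return {
        "date": normalize_date(date),
        "topics": list(topics),
        "title": title,
        "author": [{"firstname": f, "surname": s} for f, s in authors],
        "body": body,
        "geopoints": [{"lat": lat, "lon": lon} for lat, lon in geopoints],
    }


doc = make_article(
    title="Article Title",
    body="Main text content of the article.",
    date="2020-01-02 03:04:05",
    authors=[("Jane", "Doe")],
    topics=["topic1"],
    geopoints=[(31.9, 35.2)],
)
```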
  • Index Mapping and Settings
index_mapping = {
    "mappings": {
        "properties": {
            "date": {"type": "date"},
            "topics": {"type": "keyword"},
            "title": {"type": "text", "analyzer": "autocomplete", "search_analyzer": "autocomplete_search"},
            "author": {
                "type": "nested",
                "properties": {
                    "firstname": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
                    "surname": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
                }
            },
            "analized-body": {"type": "text"},
            "body": {"type": "text"},
            "temporal-expression": {
                "type": "nested",
                "properties": {
                    "expression": {"type": "text"}
                }
            },
            "geopoints": {
                "type": "nested",
                "properties": {
                    "lon": {"type": "double"},
                    "lat": {"type": "double"}
                }
            },
            "georeferences": {
                "type": "nested",
                "properties": {
                    "lon": {"type": "double"},
                    "lat": {"type": "double"}
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": ["lowercase"]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase"
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 3,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    }
}
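
To make the `autocomplete` settings above more concrete, the following pure-Python sketch approximates which terms the `edge_ngram` tokenizer (with `min_gram` 3, `max_gram` 10, and `token_chars` of letters and digits) would emit for a title, and which terms the `lowercase` tokenizer would emit for a search string. It is an approximation written for illustration, not Elasticsearch's own implementation, and the function names are hypothetical.

```python
import re
from typing import List

MIN_GRAM, MAX_GRAM = 3, 10  # values taken from the tokenizer settings


def autocomplete_tokens(text: str) -> List[str]:
    """Approximate the index-time 'autocomplete' analyzer."""
    tokens = []
    # token_chars ["letter", "digit"]: any other character splits the text
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()  # the "lowercase" token filter
        # emit prefixes from MIN_GRAM up to MAX_GRAM characters
        for n in range(MIN_GRAM, min(len(word), MAX_GRAM) + 1):
            tokens.append(word[:n])
    return tokens


def search_tokens(text: str) -> List[str]:
    """Approximate the search-time 'lowercase' tokenizer (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


indexed = autocomplete_tokens("Oil prices")
# A query matches when one of its terms appears among the indexed terms.
matches = any(term in indexed for term in search_tokens("PRI"))
```

Under this approximation, words shorter than three characters produce no index terms, so a two-letter query would not match on the `title` field.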

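Given this mapping, the customized searches listed under Key Features could be expressed as an Elasticsearch `bool` query. The sketch below is one possible shape, not the project's actual query: the function name `build_search_body`, its parameters, and the choice of a bounding box for location are all assumptions. Because `author` and `geopoints` are mapped as `nested`, their clauses are wrapped in `nested` queries, and because `lat` and `lon` are plain `double` fields, location is filtered with `range` clauses.

```python
from typing import Any, Dict, Optional, Tuple


def build_search_body(
    text: Optional[str] = None,
    author_surname: Optional[str] = None,
    topic: Optional[str] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,
) -> Dict[str, Any]:
    """Build a search request body (hypothetical helper).

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    must, filters = [], []
    if text:
        # full-text match against the autocomplete-analyzed title
        must.append({"match": {"title": text}})
    if author_surname:
        # exact match on the keyword sub-field inside the nested author objects
        filters.append({"nested": {
            "path": "author",
            "query": {"term": {"author.surname.keyword": author_surname}},
        }})
    if topic:
        filters.append({"term": {"topics": topic}})
    if bbox:
        min_lat, min_lon, max_lat, max_lon = bbox
        filters.append({"nested": {
            "path": "geopoints",
            "query": {"bool": {"filter": [
                {"range": {"geopoints.lat": {"gte": min_lat, "lte": max_lat}}},
                {"range": {"geopoints.lon": {"gte": min_lon, "lte": max_lon}}},
            ]}},
        }})
    if not must and not filters:
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must, "filter": filters}}}


body = build_search_body(text="oil", topic="crude")
```

The resulting dictionary would be sent as the request body of a `_search` call against the articles index.
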
  • Article Date Distribution

This image was generated using one of the application's APIs.

image
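
The README does not show how the distribution is computed. One plausible approach, sketched here as an assumption, is a `date_histogram` aggregation on the `date` field. The aggregation name `articles_over_time` and both function names are hypothetical.

```python
def build_date_histogram(interval: str = "year") -> dict:
    """Request body that counts articles per calendar interval."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "articles_over_time": {
                "date_histogram": {
                    "field": "date",
                    "calendar_interval": interval,
                }
            }
        },
    }


def buckets_to_counts(response: dict) -> dict:
    """Flatten the aggregation response into {date_string: doc_count}."""
    buckets = response["aggregations"]["articles_over_time"]["buckets"]
    return {b["key_as_string"]: b["doc_count"] for b in buckets}


# Illustrative response shape with made-up counts, not real project data.
sample_response = {
    "aggregations": {"articles_over_time": {"buckets": [
        {"key_as_string": "2019-01-01T00:00:00.000Z", "key": 1546300800000, "doc_count": 2},
        {"key_as_string": "2020-01-01T00:00:00.000Z", "key": 1577836800000, "doc_count": 5},
    ]}}
}
counts = buckets_to_counts(sample_response)
```

The resulting mapping of dates to counts could then be handed to any plotting library to draw a chart like the one above.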
