Skip to content

Latest commit

 

History

History
98 lines (88 loc) · 3.64 KB

README.md

File metadata and controls

98 lines (88 loc) · 3.64 KB

Smart Document Retrieval System

Smart document retrieval system, a powerful tool implemented using the FastAPI framework with Python. This system empowers users to tailor their searches based on specific criteria such as article location, author, and topic.

Application interface

image

Key Features

  • Customized Searches: Users can refine their searches by specifying the location of the article, the author, and the topic.

  • Result Retrieval: The system retrieves results from Elasticsearch, ensuring they align with the user's specified criteria.

  • User-Friendly GUI: A graphical user interface (GUI) is provided, complete with a search box, to offer an intuitive and personalized search experience.

  • Visualizing the Structure of Stored Articles in Elasticsearch
article_1
{
  "date": "YYYY-MM-DD HH:mm:ss",
  "topics": ["topic1", "topic2", ...],
  "title": "Article Title",
  "author": [{"firstname": author1_firstname, "surname": author1_surname}, {"firstname": author2_firstname, "surname": author2_surname}, ...],
  "analized-body": ["term1", "term2", ...],
  "body": "Main text content of the article.",
  "temporal-expression": ["expression": "time1", "expression": "time2", ...],
  "geopoints": [{"lat": latitude1, "lot": longitude1}, {"lat": latitude2, "lot": longitude2}, ...],   
  "georeferences": [{"lat": latitude1, "lot": longitude1}, {"lat": latitude2, "lot": longitude2}, ...]
}
  • Index Mapping and Setting
index_mapping = {
    "mappings": {
        "properties": {
            "date": {"type": "date"},
            "topics": {"type": "keyword"},
            "title": {"type": "text", "analyzer": "autocomplete", "search_analyzer": "autocomplete_search"},
            "author": {
                "type": "nested",
                "properties": {
                    "firstname": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
                    "surname": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
                }
            },
            "analized-body": {"type": "text"},
            "body": {"type": "text"},
            "temporal-expression": {
                "type": "nested",
                "properties": {
                    "expression": {"type": "text"}
                }
            },
            "geopoints": {
                "type": "nested",
                "properties": {
                    "lon": {"type": "double"},
                    "lat": {"type": "double"}
                }
            },
            "georeferences": {
                "type": "nested",
                "properties": {
                    "lon": {"type": "double"},
                    "lat": {"type": "double"}
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": ["lowercase"]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase"
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 3,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    }
}

  • Articles Dates Distribution

This image was generated using an API from the applications.

image