Commit
* oauth refactored and added to all routes
* Added auth to query service
* Initial auths added
* Authorization moved to the request level, as return objects are not needed for many functions
* Added params model_name
* List all downloaded ollama models
* feat ✨: Extend llm_utils module with OLLAMA model list
  - This commit extends the `llm_utils` module to provide a function for listing all available OLLAMA models.
  - Improved the query service and added model list functionality.
* feat ✨: Improved user interface
  - Improved the user interface by adding a select dropdown to filter models.
  - Added the ability to send a question along with the model name when pressing enter.
* feat ✨: Enhanced data modeling and documentation
  - Implemented enhanced `pydantic` models for clearer data modeling.
  - Added a status field and timestamps to Documents model objects.
  - The service file has been updated to incorporate document progress tracking and returns the validated DocumentPydantic model object for each document.
  - Improved document processing for better query response clarity and structure.
* docs 📝: Error handling for document retrieval updated
  - Replaced error handling for document retrieval with a default empty list.
* feat ✨: Consistent datetime format for timestamps
  - The codebase now uses a consistent datetime format for both `created_at` and `updated_at`.
* feat ✨: UI Document upload feature implementation
  - Implemented a document upload feature with a popup and modal.
* feat ✨: UI Store Document management functionality added
  - Added document management functionality to the main store.
* feat ✨: User authentication token clearing
  - Functionality for clearing the user's authentication token has been added.
  - The code implements a mechanism to handle unauthorized access attempts and redirect the user to the login page.
* feat ✨: New response template implementation
  - The code implements a new response template for user queries based on the provided context.
* feat ✨: Support for different document types
  - Added support for accepting different file types for document upload.
* feat ✨: Improved vectorstore and AI assistant
  - The code updates the vectorstore initialization and embedding function to improve efficiency and accuracy.
  - Implemented an improved prompt template for AI assistant responses.
* feat ✨: Monitoring system for document updates
  - Implemented a monitoring system to check for changes in the uploaded documents and update their status.
* feat ✨: Update watchdog with new vector database
  - The code updates the `watchdog` to use a new vector database and update document status in the background.
* fix 🐛: Simplified database configuration
  - Simplified database configuration for improved consistency and reduced complexity.
* feat ✨: Use 'aora.db' for SQLite persistence
  - Changed database connection settings to use the 'aora.db' file for SQLite persistence.
* feat ✨: Build and store vector embeddings for PDF documents
  - The code rewrites the logic to build and store vector embeddings of PDF documents into a persistent vector database.
* feat ✨: Document saving and hashing method
  - The code defines a method to save documents into the designated directory and hash them for persistence in the database.
* feat ✨: Implement environment variable loading
  - The code implements environment variable loading for the application, then sets up a context manager.

---------

Co-authored-by: raikarn <nikhil.raikar@apl-landau.de>
Co-authored-by: NikhilRaikar17 <nikhilraikar88@gmail.com>
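The message mentions a method that saves documents into a directory and hashes them, but that method is not part of the diff shown on this page. The following is only a minimal sketch of what such a helper could look like, assuming SHA-256 (the algorithm used in the commented-out line of the watcher script). The names `save_and_hash` and `hash_file` and their parameters are hypothetical, not the project's API.

```python
import hashlib
from pathlib import Path


def save_and_hash(data: bytes, filename: str, directory: Path) -> str:
    """Write `data` to `directory/filename` and return its SHA-256 hex digest.

    Hypothetical helper: the name and signature are illustrative only.
    """
    directory.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a crafted name cannot escape `directory`.
    target = directory / Path(filename).name
    target.write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def hash_file(path: Path, block_size: int = 65536) -> str:
    """Hash an existing file in fixed-size blocks instead of reading it whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Storing the digest next to the filename would let a watcher tell a re-upload of identical content apart from a changed file, which a filename check alone cannot do.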
1 parent 77cc4f6 commit 0e13e6d
Showing 16 changed files with 420 additions and 65 deletions.
@@ -0,0 +1,108 @@
+import os
+import sys
+import time
+import hashlib
+from sqlalchemy import create_engine
+from sqlalchemy.orm import declarative_base
+
+
+from sqlalchemy.orm import sessionmaker
+from datetime import datetime
+from models.sqlalchemy_models import Documents
+
+
+from langchain_community.document_loaders import PyPDFLoader
+from langchain_community.embeddings import OllamaEmbeddings
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+from langchain_chroma import Chroma
+
+from dotenv import load_dotenv
+
+
+load_dotenv()
+
+
+vectordatastore_directory = os.getenv("VECTORSTORE_DATABASE_PATH")
+documenst_directory = os.getenv("DOCUMENTS_DIRECTORY")
+
+DATABASE_URL = os.getenv("DATABASE_URL", "aora.db")
+engine = create_engine("sqlite:///"+DATABASE_URL)
+Base = declarative_base()
+
+
+
+
+
+Base.metadata.create_all(engine)
+Session = sessionmaker(bind=engine)
+session = Session()
+
+def check_file_in_db(filename):
+    return session.query(Documents).filter_by(filename=filename).first()
+
+def update_file_status(filename):
+    file = check_file_in_db(filename)
+    # file_hash = hashlib.sha256(open(os.path.join(documenst_directory, filename), "rb").read()).hexdigest()
+
+    if file and file.status != "done":
+
+        file.status = "uploaded"
+        file.updated_at = datetime.now()
+        session.commit()
+        create_vectorstore(filename)
+
+        file.status = "done"
+        file.updated_at = datetime.now()
+        session.commit()
+
+
+def create_vectorstore(filename):
+
+    text_splitter = RecursiveCharacterTextSplitter(
+        # Chunk size and overlap are counted in characters, since length_function=len.
+        chunk_size=1300,
+        chunk_overlap=110,
+        length_function=len,
+    )
+
+
+    loader = PyPDFLoader(os.path.join(documenst_directory, filename))
+    doc = loader.load()
+    document_split = text_splitter.split_documents(doc)
+
+    Chroma.from_documents(
+        collection_name=os.environ.get("COLLECTION_NAME"),
+        documents=document_split,
+        embedding=OllamaEmbeddings(model="mxbai-embed-large"),
+        persist_directory=vectordatastore_directory,
+        collection_metadata={"hnsw:space": "cosine"}
+    )
+
+    print("vectorstore created...")
+
+def monitor_directory(directory):
+    previous_files = set()
+    while True:
+        try:
+            current_files = set(os.listdir(directory))
+            print("current_files:", current_files)
+            new_files = current_files - previous_files
+            for filename in new_files:
+                file_path = os.path.join(directory, filename)
+                if os.path.isfile(file_path):
+                    update_file_status(filename)
+            previous_files = current_files
+            print(50 * "*")
+            time.sleep(3)
+        except KeyboardInterrupt:
+            print('Stopping script...')
+            session.close()
+            sys.exit(0)
+
+
+def main():
+
+    monitor_directory(documenst_directory)
+
+if __name__ == '__main__':
+    main()
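The polling loop in `monitor_directory` only catches `KeyboardInterrupt`, so any other exception raised while a file is being processed ends the watcher. A minimal sketch of one alternative, not the author's implementation: a single polling pass pulled out into a function that takes the per-file callback as a parameter, records a file as processed only when the callback succeeds, and retries failures on the next pass. The name `poll_once` and its signature are hypothetical.

```python
import os
from typing import Callable, Set


def poll_once(directory: str, seen: Set[str], handler: Callable[[str], None]) -> Set[str]:
    """Run one polling pass and return the updated set of processed filenames.

    Hypothetical helper: `handler` stands in for a per-file callback such as
    the script's `update_file_status`.
    """
    current = set(os.listdir(directory))
    # Forget files that disappeared, so a later re-upload is picked up again.
    processed = seen & current
    for filename in sorted(current - seen):
        if not os.path.isfile(os.path.join(directory, filename)):
            continue  # skip sub-directories, as the original loop does
        try:
            handler(filename)
        except Exception as exc:  # keep the watcher alive; retry on the next pass
            print(f"failed to process {filename}: {exc}")
            continue
        processed.add(filename)
    return processed
```

A caller would keep the returned set between passes and sleep in between, for example calling `seen = poll_once(documenst_directory, seen, update_file_status)` inside the existing `while True` loop. Keeping one pass free of the sleep and the infinite loop also makes it straightforward to exercise against a temporary directory.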