Vectorize Docs for RAG Chatbot #320

Open: wants to merge 2 commits into base: main

@patriciaflutterflow patriciaflutterflow commented Apr 17, 2025

Link to RAG Chatbot PRD

Two BigQuery tables hold the documentation for the RAG Chatbot:

  1. doc_text (doc id + doc text)
  2. doc_text_vector (doc id + chunk id + vectorized chunk + chunk text)

The code in this PR processes each documentation page and writes the results to these tables.
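
As a rough illustration of the two row shapes described above, here is a minimal Python sketch. The field names (`doc_id`, `doc_text`, `chunk_id`, `embedding`, `chunk_text`), their types, and the example path are assumptions inferred from the description, not the actual BigQuery schema used in this PR.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DocTextRow:
    """One row of doc_text: a whole documentation page."""
    doc_id: str    # assumed: an identifier for the page, e.g. its path
    doc_text: str  # full text of the page


@dataclass
class DocTextVectorRow:
    """One row of doc_text_vector: a single chunk of a page."""
    doc_id: str             # links the chunk back to its doc_text row
    chunk_id: int           # assumed: position of the chunk within the page
    embedding: List[float]  # the vectorized chunk
    chunk_text: str         # the text that was vectorized


# Hypothetical example values, for illustration only.
page = DocTextRow(doc_id="docs/intro.md", doc_text="Welcome. Getting started.")
chunk = DocTextVectorRow(
    doc_id=page.doc_id, chunk_id=0, embedding=[0.1, 0.2], chunk_text="Welcome."
)
row_dict = asdict(chunk)  # plain dict, ready to serialize as JSON
```

Keeping `doc_id` on both row types is what lets a retrieved chunk be traced back to its full page.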

What happens:

  • A GitHub Action (vectorize_docs.yaml) runs every time a markdown file in docs changes; it calls process_single_file in vectorize.py.
  • The markdown file is split into chunks, and each chunk is vectorized.
  • The changes are written to BigQuery.
  • A separate function backfills all existing docs.
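
The chunk-then-vectorize step above can be sketched roughly as follows. This is not the PR's `process_single_file`; the helper names (`chunk_markdown`, `fake_embed`, `build_rows`), the paragraph-based chunking strategy, and the 200-character limit are all assumptions for illustration, and `fake_embed` is a placeholder rather than a real embedding model.

```python
import hashlib
from typing import Callable, Dict, List


def chunk_markdown(text: str, max_chars: int = 200) -> List[str]:
    """Split on blank lines, then pack paragraphs into chunks of at most
    max_chars. A single paragraph longer than max_chars is kept whole."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def fake_embed(chunk: str) -> List[float]:
    """Placeholder embedding: four floats derived from a hash.
    A real pipeline would call an embedding model here."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


def build_rows(
    doc_id: str,
    text: str,
    embed: Callable[[str], List[float]] = fake_embed,
) -> List[Dict]:
    """Produce one dict per chunk, shaped like a doc_text_vector row."""
    return [
        {"doc_id": doc_id, "chunk_id": i, "embedding": embed(c), "chunk_text": c}
        for i, c in enumerate(chunk_markdown(text))
    ]
```

Passing the embedding function as a parameter keeps the chunking logic testable without network access; the same `build_rows` loop could serve both the per-file path and a backfill over all docs.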

ToDo: I sometimes get a 429 "quota exceeded" error after vectorizing. Need to check what the quota limits are and how to work around them.
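
One common way to handle intermittent 429 responses is to retry with exponential backoff and jitter. The sketch below is only an illustration of that pattern: `QuotaExceededError` is a hypothetical stand-in for whatever exception the embedding client actually raises, and the retry count and delays are assumed values, not tuned to any real quota.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on QuotaExceededError with exponential backoff.

    The wait before retry n is base_delay * 2**n plus random jitter of up
    to base_delay, so parallel workers do not all retry at the same moment.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except QuotaExceededError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Only reached if max_retries is negative, so the loop never ran.
    raise RuntimeError("max_retries must be >= 0")
```

Usage would look like `with_backoff(lambda: embed(chunk))`, where `embed` is the real embedding call. Backoff only smooths over short bursts; if the quota is a hard daily cap, batching requests or requesting a higher limit would still be needed.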

@github-actions github-actions bot requested a review from PoojaB26 April 17, 2025 06:38
@PoojaB26 (Collaborator)

Is there a ticket to this or maybe an explanation of what it does? @patriciaflutterflow

@patriciaflutterflow patriciaflutterflow changed the title from "added logic to vectorize docs and github action" to "Vectorize Docs for RAG Chatbot" Apr 17, 2025
@patriciaflutterflow (Author)

> Is there a ticket to this or maybe an explanation of what it does? @patriciaflutterflow

Hello @PoojaB26! Yes, I added a brief overview at the top.
