source code: Add Multimodal RAG with Elasticsearch Gotham City tutorial #390
This needs to be run locally to check that everything works, but I spotted a couple of minor things that need changing.
Adjustments made for the new review.
@salgado, I'm still running into the same issues. I'm around if you want to hop on a call sometime next week and see if we can get the environment to work on my computer.
Yes, I think that's better. I just replicated a new environment from scratch, and it worked... Let's try to schedule a time next week to adjust these details. Thanks again.
I was wondering if this code, which looks quite neat, shouldn't be in the example-apps directory for better visibility? Also, I'd like to experiment later with observability on this, as multimodal may present interesting challenges.
So I think this example will be really hard for someone to use on macOS due to the PyTorch dependency, which is very hard to get right. While I've not verified it personally, and it may be too much, there's an alternative implementation to consider: https://pypi.org/project/onnxruntime-silicon/
Also, I think there are places where you might be able to reduce dependencies by using an ELSER pipeline like this, or something that uses it internally like langchain-elasticsearch (done in chatbot-rag-app):
PUT _ingest/pipeline/elser-pipeline
{
"processors": [{
"text_expansion": {
"model_id": ".elser_model_2",
"field": "text",
"prediction_field": "ml.tokens"
}
}]
}
Note that I am not an expert on ML, rather quite the opposite, but I do think the aim is for folks to be able to run this. Failing this, I think it needs a Dockerfile, and if you think that's the way out I can try to help with it. At the moment, it isn't easy to run.
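For anyone who would rather create that pipeline from Python than from the console, here is a minimal sketch. It is not code from this PR: the helper names `build_elser_processors` and `put_elser_pipeline` are hypothetical, and it assumes the official `elasticsearch` 8.x client, whose `ingest.put_pipeline` accepts `id` and `processors` keyword arguments. The processor body is copied from the comment above unchanged; depending on the Elasticsearch version, an ELSER ingest step may need to be written as an `inference` processor instead, so verify the body against your cluster before relying on it.

```python
from typing import Any


def build_elser_processors(
    field: str = "text",
    prediction_field: str = "ml.tokens",
    model_id: str = ".elser_model_2",
) -> list[dict[str, Any]]:
    """Return the processor list from the review comment as a Python structure."""
    return [
        {
            "text_expansion": {
                "model_id": model_id,
                "field": field,
                "prediction_field": prediction_field,
            }
        }
    ]


def put_elser_pipeline(client: Any, pipeline_id: str = "elser-pipeline") -> Any:
    """Create or replace the ingest pipeline.

    `client` is assumed to be an already-configured
    `elasticsearch.Elasticsearch` instance (8.x client).
    """
    return client.ingest.put_pipeline(
        id=pipeline_id,
        processors=build_elser_processors(),
    )
```

With a real cluster this would be called as `put_elser_pipeline(Elasticsearch("http://localhost:9200"))`; the client is passed in so the body can be inspected or tested without a running server.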
    """
    try:
        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
Read this from the environment, e.g. os.getenv("CHAT_MODEL").
    try:
        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
Same here: read it from os.getenv("CHAT_MODEL").
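A minimal sketch of what the two suggestions above could look like in practice. The helper name `resolve_chat_model` and the choice to fall back to the previously hard-coded model are assumptions for illustration; the PR may have handled the default differently.

```python
import os

# Fallback mirrors the value that was hard-coded in the reviewed snippet.
DEFAULT_CHAT_MODEL = "gpt-4-turbo-preview"


def resolve_chat_model(default: str = DEFAULT_CHAT_MODEL) -> str:
    """Return the model name from CHAT_MODEL, or `default` if unset or blank."""
    value = os.getenv("CHAT_MODEL", "").strip()
    return value or default
```

The call site would then pass `model=resolve_chat_model()` to `self.client.chat.completions.create(...)`, so switching models only requires changing the environment (for example in `.env`).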
@salgado so this runs now. Note that I changed the code to authenticate against my local Elasticsearch, and I installed on macOS per the notes I made earlier. I'm out of time today, but if you like, please weave in any of the polishings you can. Tomorrow, I'll box time to help on a Docker image.
$ python stages/01-stage/files_check.py
All files are correctly organized!
$ python stages/02-stage/test_embedding_generation.py
Downloading ImageBind weights...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.47G/4.47G [01:26<00:00, 55.4MB/s]
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
(1024,)
$ python stages/03-stage/index_all_modalities.py
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
INFO:elastic_transport.transport:HEAD http://localhost:9200/multimodal_content [status:404 duration:0.015s]
INFO:elastic_transport.transport:PUT http://localhost:9200/multimodal_content [status:200 duration:0.208s]
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.211s]
INFO:__main__:
Indexed vision: {
"result": "created",
"_id": "ANI_PJUBWASaLF64_TED",
"_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.010s]
INFO:__main__:
Indexed vision: {
"result": "created",
"_id": "DdI_PJUBWASaLF64_zH8",
"_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.050s]
INFO:__main__:
Indexed vision: {
"result": "created",
"_id": "DtJAPJUBWASaLF64AjEW",
"_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.040s]
INFO:__main__:
Indexed audio: {
"result": "created",
"_id": "D9JAPJUBWASaLF64AzFz",
"_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.009s]
INFO:__main__:
Indexed text: {
"result": "created",
"_id": "ENJAPJUBWASaLF64BDE2",
"_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.015s]
INFO:__main__:
Indexed text: {
"result": "created",
"_id": "EdJAPJUBWASaLF64BDHQ",
"_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.011s]
INFO:__main__:
Indexed depth: {
"result": "created",
"_id": "EtJAPJUBWASaLF64BDH_",
"_index": "multimodal_content"
}
$ python stages/04-stage/rag_crime_analyze.py
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
INFO:elastic_transport.transport:HEAD http://localhost:9200/multimodal_content [status:200 duration:0.016s]
INFO:__main__:✅ All components initialized successfully
INFO:__main__:🔍 Collecting evidence...
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.110s]
INFO:__main__:✅ Data retrieved for vision: 2 results
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.005s]
INFO:__main__:✅ Data retrieved for audio: 2 results
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.013s]
INFO:__main__:✅ Data retrieved for text: 2 results
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.004s]
INFO:__main__:✅ Data retrieved for depth: 2 results
INFO:__main__:
📝 Generating forensic report...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llm_analyzer:
📋 Forensic Report Generated:
INFO:llm_analyzer:==================================================
INFO:llm_analyzer:**Prime Suspect:** The Joker
**Evidence Supporting Conclusion:**
- **Visual Evidence:**
- The crime scene photo features playing cards scattered around, which is a known signature of the Joker. The presence of sinister graffiti depicting the Joker laughing adds a psychological element of fear and chaos, aligning with his modus operandi. The similarity score of 0.83 indicates a high likelihood that this scene is directly related to the Joker.
- A photo of the Joker in an urban night setting, with his distinctive green hair, white face paint, and sinister smile, further corroborates his presence in the vicinity of the crime. The similarity score of 0.69 suggests a moderate to high likelihood of his involvement.
- **Auditory Evidence:**
- A sinister laugh captured near the crime scene with a similarity score of 1.00 perfectly matches the Joker's known laugh, serving as a strong auditory signature of his presence.
- The second audio piece, despite its lower similarity score of 0.57, still suggests the Joker's involvement due to the unique characteristics of his voice and laughter.
- **Textual Evidence:**
- The mysterious note found at the location, with a similarity score of 0.70, likely contains a message or riddle typical of the Joker's communication style, further implicating him in the crime.
- The description of the Joker in the text evidence matches the visual and auditory evidence, reinforcing the conclusion of his involvement.
- **Depth Evidence:**
- The depth sensor capture of the suspect with a similarity score of 0.77 suggests a figure matching the Joker's known height and build was present at the crime scene.
- Although the mysterious note's depth capture has a lower similarity score (0.55), it may indicate the note was left in a hurry or placed in a manner that suggests the Joker's hasty departure from the scene.
**Behavioral Patterns:**
The Joker is known for his love of chaos, use of symbolic markers (like playing cards), and leaving cryptic messages at his crime scenes. His signature is not just the physical evidence he leaves behind but also the psychological impact on the city and its inhabitants. The combination of visual, auditory, and textual clues, along with the depth sensor data, aligns perfectly with the Joker's behavioral patterns and criminal signature.
**Confidence Level:** 95%
**Next Steps:** No further evidence required.
The evidence collected and analyzed from multiple modalities strongly points to the Joker as the prime suspect in the Gotham Central Bank case. The high confidence level is based on the consistency and convergence of evidence across visual, auditory, textual, and depth data, all of which align with the Joker's known characteristics and criminal behavior.
INFO:llm_analyzer:==================================================
INFO:__main__:✅ Forensic report generated successfully
INFO:__main__:
📊 Report Preview:
INFO:__main__:++++++++++++++++++++++++++++++++++++++++++++++++++
INFO:__main__:**Prime Suspect:** The Joker
**Evidence Supporting Conclusion:**
- **Visual Evidence:**
- The crime scene photo features playing cards scattered around, which is a known signature of the Joker. The presence of sinister graffiti depicting the Joker laughing adds a psychological element of fear and chaos, aligning with his modus operandi. The similarity score of 0.83 indicates a high likelihood that this scene is directly related to the Joker.
- A photo of the Joker in an urban night setting, with his distinctive green hair, white face paint, and sinister smile, further corroborates his presence in the vicinity of the crime. The similarity score of 0.69 suggests a moderate to high likelihood of his involvement.
- **Auditory Evidence:**
- A sinister laugh captured near the crime scene with a similarity score of 1.00 perfectly matches the Joker's known laugh, serving as a strong auditory signature of his presence.
- The second audio piece, despite its lower similarity score of 0.57, still suggests the Joker's involvement due to the unique characteristics of his voice and laughter.
- **Textual Evidence:**
- The mysterious note found at the location, with a similarity score of 0.70, likely contains a message or riddle typical of the Joker's communication style, further implicating him in the crime.
- The description of the Joker in the text evidence matches the visual and auditory evidence, reinforcing the conclusion of his involvement.
- **Depth Evidence:**
- The depth sensor capture of the suspect with a similarity score of 0.77 suggests a figure matching the Joker's known height and build was present at the crime scene.
- Although the mysterious note's depth capture has a lower similarity score (0.55), it may indicate the note was left in a hurry or placed in a manner that suggests the Joker's hasty departure from the scene.
**Behavioral Patterns:**
The Joker is known for his love of chaos, use of symbolic markers (like playing cards), and leaving cryptic messages at his crime scenes. His signature is not just the physical evidence he leaves behind but also the psychological impact on the city and its inhabitants. The combination of visual, auditory, and textual clues, along with the depth sensor data, aligns perfectly with the Joker's behavioral patterns and criminal signature.
**Confidence Level:** 95%
**Next Steps:** No further evidence required.
The evidence collected and analyzed from multiple modalities strongly points to the Joker as the prime suspect in the Gotham Central Bank case. The high confidence level is based on the consistency and convergence of evidence across visual, auditory, textual, and depth data, all of which align with the Joker's known characteristics and criminal behavior.
INFO:__main__:++++++++++++++++++++++++++++++++++++++++++++++++++
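The run above shows a 1024-dimensional embedding from stage 02 and, in stage 04, one `_search` request per modality returning 2 results each. A minimal sketch of how such a mapping and per-modality query could be expressed follows. It is an illustration only: the field names `embedding` and `modality`, the cosine similarity setting, and the use of a kNN search with a term filter are all assumptions, not details confirmed by the logs or taken from the PR's `elastic_manager.py`.

```python
from typing import Any, Sequence

EMBEDDING_DIMS = 1024  # matches the (1024,) shape printed by stage 02


def build_index_mappings(vector_field: str = "embedding") -> dict[str, Any]:
    """Mappings for an index holding one vector per document plus its modality."""
    return {
        "properties": {
            "modality": {"type": "keyword"},  # hypothetical field name
            vector_field: {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMS,
                "index": True,
                "similarity": "cosine",  # assumed; the PR may use another metric
            },
        }
    }


def build_knn_search(
    query_vector: Sequence[float],
    modality: str,
    k: int = 2,
    vector_field: str = "embedding",
) -> dict[str, Any]:
    """Search body that returns the k nearest documents of a single modality."""
    if len(query_vector) != EMBEDDING_DIMS:
        raise ValueError(
            f"expected {EMBEDDING_DIMS} dimensions, got {len(query_vector)}"
        )
    return {
        "knn": {
            "field": vector_field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": max(10 * k, 100),
            "filter": {"term": {"modality": modality}},
        },
        "size": k,
    }
```

With the 8.x Python client, the body could be sent as `client.search(index="multimodal_content", **build_knn_search(vector, "audio"))`. Filtering inside the `knn` clause keeps the two returned hits within the requested modality rather than filtering after the nearest neighbours are chosen.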
I also got this code working with 3.12. I made a change to my Python path, and I changed the way I was running files in the virtual environment. So, I was using I also had to install an audio backend:
Then, install the Python package:
The output I got was:
I have Docker just about working locally and plan to push a commit to your branch, including some of the polishing I mentioned. I will have it up tomorrow.
@codefromthecrypt @JessicaGarson, thanks again for reviewing and running the code. @codefromthecrypt, could you comment here with the Dockerfile so I can try to replicate and test it today while you haven't pushed it yet?
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
@salgado I pushed the commits before my last comment, so you should be able to pull them in and test. You can do whatever you like after as well; I just didn't want to block you. P.S. Formatting was needed to pass the linter.
@codefromthecrypt I just ran the Docker command, and it worked!!
$ docker compose run --build --rm search-and-analyze
[+] Building 2.8s (39/39) FINISHED docker:desktop-linux
Evidence Supporting Conclusion:
Behavioral Patterns: The Joker's criminal signature includes the use of playing cards, sinister graffiti, and taunting notes left at his crime scenes. These elements are not only trademarks of his identity but also serve to instill fear and chaos. His motives often revolve around creating anarchy and challenging Batman, rather than financial gain, which aligns with the theatrical and high-profile nature of the crime at the Gotham Central Bank.
Confidence Level: 95%
The evidence collectively points to the Joker with high confidence. The visual, auditory, and textual clues align closely with his known behavioral patterns and criminal signature. The depth sensor capture, while slightly less conclusive, still supports the identification based on physical appearance.
Next Steps: No further evidence required.
The combination of multimodal evidence strongly supports the conclusion that the Joker is the prime suspect. Additional evidence, such as forensic analysis of the playing cards or further examination of the mysterious note for fingerprints or DNA, could provide supplementary confirmation but is not necessary for a confident identification.
P.S. If my Docker stuff is causing more harm than good, please remove Dockerfile, docker-compose.yml and .dockerignore, and the corresponding additions to README.md. There were some other things I polished, so reverting everything I did may be throwing out the baby with the bathwater. In any case, I'm glad the code is working however it is intended to be run. Good job!
@carlyrichmond, @JessicaGarson and @codefromthecrypt I believe this blog will provide significantly more value to our audience by presenting two distinct execution methods: Docker and Jupyter Notebook. And we don’t need to discard any of the latest excellent contributions.
Docker provides a production-ready, reproducible environment that ensures consistent execution regardless of the user's setup. This is ideal for deployment scenarios and users who prefer containerized solutions. On the other hand, the Jupyter Notebook option creates an interactive learning experience, allowing users to experiment with the code step by step, making it much more accessible for educational purposes and quick exploration.
By maintaining both options, we are not duplicating code but rather offering flexibility in how users can engage with the same underlying technology. This approach accommodates both technical users who need reproducibility and learners who benefit from interactivity.
Please check the update I made in the README.md, where I clearly explain both approaches while keeping the core functionality identical. This inclusive approach will help us reach a broader audience, from ML engineers setting up production pipelines to data scientists exploring multimodal techniques for the first time.
@salgado I would recommend picking one option. Having two options makes it harder to maintain the example and could cause confusion about which one is better to pick.
Ok, so I made the adjustments to keep only the Jupyter Notebook.
Good to know @salgado! Can we confirm that others are able to run the notebook please? 🤞 If not, I would add Docker back in.
@carlyrichmond @salgado. I could run it with a few enhancements (adding a few import statements and extra pip installs). I committed the improvements to the file.
As discussed with @JessicaGarson, Jess is going to run and check things once more before it's merged. After the merge, @salgado, please raise a PR to update the git clone command. Then if @JessicaGarson and @justincastilla can quickly check it still runs, that would be great. Thanks all! If you need help, ping me!
This PR adds a new tutorial demonstrating how to build a Multimodal RAG system with Elasticsearch and ImageBind.
The tutorial covers:
The code is organized in stages for easy understanding and includes sample data for testing.