source code: Add Multimodal RAG with Elasticsearch Gotham City tutorial #390

Merged
merged 25 commits into from
Feb 28, 2025

Conversation

salgado
Contributor

@salgado salgado commented Feb 8, 2025

This PR adds a new tutorial demonstrating how to build a Multimodal RAG system with Elasticsearch and ImageBind.

The tutorial covers:

  • Multimodal embedding generation with ImageBind
  • Vector storage and search in Elasticsearch
  • Cross-modal similarity search
  • Evidence analysis with GPT-4

The code is organized in stages for easy understanding and includes sample data for testing.
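
As a rough illustration of the storage and search steps listed above, the sketch below builds an index mapping with a `dense_vector` field and a kNN search body as plain dictionaries. The field names (`embedding`, `modality`, `description`) and both helper names are assumptions made for illustration and are not taken from the tutorial code; the 1024 dimension matches the embedding shape printed in the test run later in this thread.

```python
from typing import Any, Optional

EMBEDDING_DIMS = 1024  # shape printed by the embedding test in this thread


def build_index_mapping(dims: int = EMBEDDING_DIMS) -> dict[str, Any]:
    """Mapping with one vector field shared by all modalities (hypothetical field names)."""
    return {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": "cosine",
            },
            "modality": {"type": "keyword"},
            "description": {"type": "text"},
        }
    }


def build_knn_query(
    query_vector: list[float], modality: Optional[str] = None, k: int = 2
) -> dict[str, Any]:
    """kNN search body. Because all modalities share one embedding space,
    a vector computed from one modality can retrieve documents of another."""
    knn: dict[str, Any] = {
        "field": "embedding",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": max(10 * k, 100),
    }
    if modality is not None:
        # Restrict results to a single modality, e.g. only audio evidence.
        knn["filter"] = {"term": {"modality": modality}}
    return {"knn": knn}
```

With the 8.x Python client, the `knn` section could then be passed as the `knn` argument of `search`, and the mapping as the `mappings` argument of `indices.create`.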

Contributor

@carlyrichmond carlyrichmond left a comment

This needs to be run locally to check that everything works, but I spotted a couple of minor things that need changing.

@salgado
Contributor Author

salgado commented Feb 14, 2025

Adjustments made for new review

@JessicaGarson
Contributor

@salgado, still running into the same issues. I'm around if you want to try to hop on a call and see if we can get the environment to work on my computer sometime next week

@salgado
Contributor Author

salgado commented Feb 14, 2025

Yes, I think that's better. I just replicated a new environment from scratch, and it worked... Let's try to schedule a time next week to adjust these details. Thanks again.

@codefromthecrypt
Collaborator

I was wondering whether this code, which looks quite neat, should be in the example-apps directory for better visibility. Also, I'd like to experiment later with observability on this, as multi-modal may present interesting challenges.

Collaborator

@codefromthecrypt codefromthecrypt left a comment

I think this example will be really hard for someone to use on macOS because of the PyTorch dependency, which is very hard to get right. I've not verified it personally, and it may be too much, but there's an alternative implementation to consider: https://pypi.org/project/onnxruntime-silicon/

Also, I think there are places where you might be able to reduce dependencies by using an ELSER pipeline like this, or something that uses it internally, like langchain-elasticsearch (as done in chatbot-rag-app):

PUT _ingest/pipeline/elser-pipeline
{
  "processors": [{
    "inference": {
      "model_id": ".elser_model_2",
      "input_output": [{
        "input_field": "text",
        "output_field": "ml.tokens"
      }]
    }
  }]
}
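
For anyone driving this from Python rather than the console, a pipeline like this could also be created through the official client. The sketch below uses the `inference` ingest processor, which is the processor type Elasticsearch documents for running ELSER at ingest time. It assumes an `elasticsearch` 8.x client object is passed in and that the ELSER model is already deployed; both function names are hypothetical.

```python
from typing import Any


def elser_pipeline_processors(field: str = "text") -> list[dict[str, Any]]:
    """Processor list for an ELSER ingest pipeline using the inference processor."""
    return [
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": field, "output_field": "ml.tokens"}
                ],
            }
        }
    ]


def create_elser_pipeline(client: Any, pipeline_id: str = "elser-pipeline") -> Any:
    """Create or update the ingest pipeline.

    `client` is assumed to be an elasticsearch.Elasticsearch instance (8.x),
    whose ingest.put_pipeline accepts `id` and `processors` keyword arguments.
    """
    return client.ingest.put_pipeline(
        id=pipeline_id, processors=elser_pipeline_processors()
    )
```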

Note that I am not an ML expert, quite the opposite, but I do think the aim is for folks to be able to run this. Failing that, I think it needs a Dockerfile, and if you think that's the way out I can try to help with it. At the moment, it isn't easy to run.


"""
try:
response = self.client.chat.completions.create(
model="gpt-4-turbo-preview",
Collaborator

read this from ENV e.g. os.getenv("CHAT_MODEL")
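
A minimal sketch of what that could look like, assuming the hard-coded model name is kept as the fallback when `CHAT_MODEL` is unset; the helper name `resolve_chat_model` is hypothetical:

```python
import os

# The value currently hard-coded in the diff, kept as the default.
DEFAULT_CHAT_MODEL = "gpt-4-turbo-preview"


def resolve_chat_model(default: str = DEFAULT_CHAT_MODEL) -> str:
    """Return the chat model name from CHAT_MODEL, falling back to a default."""
    value = os.getenv("CHAT_MODEL", "").strip()
    # An empty or whitespace-only variable is treated the same as unset.
    return value or default
```

The call site would then pass `model=resolve_chat_model()` instead of the string literal.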


try:
response = self.client.chat.completions.create(
model="gpt-4-turbo-preview",
Collaborator

same here read from os.getenv("CHAT_MODEL")

Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
@codefromthecrypt
Collaborator

@salgado so this runs now. Note that I changed the code to authenticate against my local Elasticsearch, and I installed on macOS per the notes I made earlier. I'm out of time today, but if you like, please weave in any of the polishing you can. Tomorrow, I'll timebox some time to help on a Docker image.
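
The authentication change itself is not shown in this thread. One way to avoid editing code per environment is to build the client arguments from environment variables, as in the sketch below. The variable names (`ELASTICSEARCH_URL`, `ELASTICSEARCH_API_KEY`, `ELASTICSEARCH_USER`, `ELASTICSEARCH_PASSWORD`) and the helper name are assumptions; `hosts`, `api_key`, and `basic_auth` are constructor arguments of the 8.x Python client.

```python
import os
from typing import Any, Mapping


def es_client_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for elasticsearch.Elasticsearch(...) from the environment.

    All variable names here are hypothetical, not taken from the tutorial code.
    """
    kwargs: dict[str, Any] = {
        "hosts": [env.get("ELASTICSEARCH_URL", "http://localhost:9200")]
    }
    api_key = env.get("ELASTICSEARCH_API_KEY")
    user = env.get("ELASTICSEARCH_USER")
    password = env.get("ELASTICSEARCH_PASSWORD")
    if api_key:
        # Prefer an API key when one is provided.
        kwargs["api_key"] = api_key
    elif user and password:
        kwargs["basic_auth"] = (user, password)
    return kwargs
```

The result could be used as `Elasticsearch(**es_client_kwargs())`, so a local cluster and a cloud deployment differ only in their environment.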

$ python stages/01-stage/files_check.py
All files are correctly organized!

$ python stages/02-stage/test_embedding_generation.py
Downloading ImageBind weights...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.47G/4.47G [01:26<00:00, 55.4MB/s]
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
(1024,)

$  python stages/03-stage/index_all_modalities.py
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
INFO:elastic_transport.transport:HEAD http://localhost:9200/multimodal_content [status:404 duration:0.015s]
INFO:elastic_transport.transport:PUT http://localhost:9200/multimodal_content [status:200 duration:0.208s]
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.211s]
INFO:__main__:

Indexed vision: {
  "result": "created",
  "_id": "ANI_PJUBWASaLF64_TED",
  "_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.010s]
INFO:__main__:

Indexed vision: {
  "result": "created",
  "_id": "DdI_PJUBWASaLF64_zH8",
  "_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.050s]
INFO:__main__:

Indexed vision: {
  "result": "created",
  "_id": "DtJAPJUBWASaLF64AjEW",
  "_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.040s]
INFO:__main__:

Indexed audio: {
  "result": "created",
  "_id": "D9JAPJUBWASaLF64AzFz",
  "_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.009s]
INFO:__main__:

Indexed text: {
  "result": "created",
  "_id": "ENJAPJUBWASaLF64BDE2",
  "_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.015s]
INFO:__main__:

Indexed text: {
  "result": "created",
  "_id": "EdJAPJUBWASaLF64BDHQ",
  "_index": "multimodal_content"
}
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_doc [status:201 duration:0.011s]
INFO:__main__:

Indexed depth: {
  "result": "created",
  "_id": "EtJAPJUBWASaLF64BDH_",
  "_index": "multimodal_content"
}

$  python stages/04-stage/rag_crime_analyze.py

INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
INFO:elastic_transport.transport:HEAD http://localhost:9200/multimodal_content [status:200 duration:0.016s]
INFO:__main__:✅ All components initialized successfully
INFO:__main__:🔍 Collecting evidence...
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.110s]
INFO:__main__:✅ Data retrieved for vision: 2 results
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.005s]
INFO:__main__:✅ Data retrieved for audio: 2 results
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.013s]
INFO:__main__:✅ Data retrieved for text: 2 results
INFO:elastic_transport.transport:POST http://localhost:9200/multimodal_content/_search [status:200 duration:0.004s]
INFO:__main__:✅ Data retrieved for depth: 2 results
INFO:__main__:
📝 Generating forensic report...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llm_analyzer:
📋 Forensic Report Generated:
INFO:llm_analyzer:==================================================
INFO:llm_analyzer:**Prime Suspect:** The Joker

**Evidence Supporting Conclusion:**

- **Visual Evidence:**
  - The crime scene photo features playing cards scattered around, which is a known signature of the Joker. The presence of sinister graffiti depicting the Joker laughing adds a psychological element of fear and chaos, aligning with his modus operandi. The similarity score of 0.83 indicates a high likelihood that this scene is directly related to the Joker.
  - A photo of the Joker in an urban night setting, with his distinctive green hair, white face paint, and sinister smile, further corroborates his presence in the vicinity of the crime. The similarity score of 0.69 suggests a moderate to high likelihood of his involvement.

- **Auditory Evidence:**
  - A sinister laugh captured near the crime scene with a similarity score of 1.00 perfectly matches the Joker's known laugh, serving as a strong auditory signature of his presence.
  - The second audio piece, despite its lower similarity score of 0.57, still suggests the Joker's involvement due to the unique characteristics of his voice and laughter.

- **Textual Evidence:**
  - The mysterious note found at the location, with a similarity score of 0.70, likely contains a message or riddle typical of the Joker's communication style, further implicating him in the crime.
  - The description of the Joker in the text evidence matches the visual and auditory evidence, reinforcing the conclusion of his involvement.

- **Depth Evidence:**
  - The depth sensor capture of the suspect with a similarity score of 0.77 suggests a figure matching the Joker's known height and build was present at the crime scene.
  - Although the mysterious note's depth capture has a lower similarity score (0.55), it may indicate the note was left in a hurry or placed in a manner that suggests the Joker's hasty departure from the scene.

**Behavioral Patterns:**

The Joker is known for his love of chaos, use of symbolic markers (like playing cards), and leaving cryptic messages at his crime scenes. His signature is not just the physical evidence he leaves behind but also the psychological impact on the city and its inhabitants. The combination of visual, auditory, and textual clues, along with the depth sensor data, aligns perfectly with the Joker's behavioral patterns and criminal signature.

**Confidence Level:** 95%

**Next Steps:** No further evidence required.

The evidence collected and analyzed from multiple modalities strongly points to the Joker as the prime suspect in the Gotham Central Bank case. The high confidence level is based on the consistency and convergence of evidence across visual, auditory, textual, and depth data, all of which align with the Joker's known characteristics and criminal behavior.
INFO:llm_analyzer:==================================================
INFO:__main__:✅ Forensic report generated successfully
INFO:__main__:
📊 Report Preview:
INFO:__main__:++++++++++++++++++++++++++++++++++++++++++++++++++
INFO:__main__:**Prime Suspect:** The Joker

**Evidence Supporting Conclusion:**

- **Visual Evidence:**
  - The crime scene photo features playing cards scattered around, which is a known signature of the Joker. The presence of sinister graffiti depicting the Joker laughing adds a psychological element of fear and chaos, aligning with his modus operandi. The similarity score of 0.83 indicates a high likelihood that this scene is directly related to the Joker.
  - A photo of the Joker in an urban night setting, with his distinctive green hair, white face paint, and sinister smile, further corroborates his presence in the vicinity of the crime. The similarity score of 0.69 suggests a moderate to high likelihood of his involvement.

- **Auditory Evidence:**
  - A sinister laugh captured near the crime scene with a similarity score of 1.00 perfectly matches the Joker's known laugh, serving as a strong auditory signature of his presence.
  - The second audio piece, despite its lower similarity score of 0.57, still suggests the Joker's involvement due to the unique characteristics of his voice and laughter.

- **Textual Evidence:**
  - The mysterious note found at the location, with a similarity score of 0.70, likely contains a message or riddle typical of the Joker's communication style, further implicating him in the crime.
  - The description of the Joker in the text evidence matches the visual and auditory evidence, reinforcing the conclusion of his involvement.

- **Depth Evidence:**
  - The depth sensor capture of the suspect with a similarity score of 0.77 suggests a figure matching the Joker's known height and build was present at the crime scene.
  - Although the mysterious note's depth capture has a lower similarity score (0.55), it may indicate the note was left in a hurry or placed in a manner that suggests the Joker's hasty departure from the scene.

**Behavioral Patterns:**

The Joker is known for his love of chaos, use of symbolic markers (like playing cards), and leaving cryptic messages at his crime scenes. His signature is not just the physical evidence he leaves behind but also the psychological impact on the city and its inhabitants. The combination of visual, auditory, and textual clues, along with the depth sensor data, aligns perfectly with the Joker's behavioral patterns and criminal signature.

**Confidence Level:** 95%

**Next Steps:** No further evidence required.

The evidence collected and analyzed from multiple modalities strongly points to the Joker as the prime suspect in the Gotham Central Bank case. The high confidence level is based on the consistency and convergence of evidence across visual, auditory, textual, and depth data, all of which align with the Joker's known characteristics and criminal behavior.
INFO:__main__:++++++++++++++++++++++++++++++++++++++++++++++++++

@JessicaGarson
Contributor

JessicaGarson commented Feb 25, 2025

I also got this code working with Python 3.12. I made a change to my Python path and changed the way I was running files in the virtual environment: everything started working once I used python3.12 instead of python.

I also had to install an audio backend:

brew install libsndfile

Then, install the Python package:

pip install soundfile
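
A quick way to confirm the Python package is visible to the interpreter being used, sketched with the standard library only. The helper names are hypothetical, and the check only confirms that the module can be found; it does not verify that the underlying libsndfile library loads.

```python
import importlib.util


def audio_backend_available(module_name: str = "soundfile") -> bool:
    """True if the given Python module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def describe_audio_backend() -> str:
    """Human-readable status line for the soundfile package."""
    if audio_backend_available():
        return "soundfile is installed"
    return (
        "soundfile is missing: run `pip install soundfile` "
        "(and `brew install libsndfile` on macOS)"
    )
```

Running this with the same interpreter as the stage scripts (for example `python3.12`) helps catch the case where the package was installed for a different Python.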

The output I got was:

python3.12 stages/04-stage/rag_crime_analyze.py
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
INFO:elastic_transport.transport:HEAD https://getting-started.es.us-east4.gcp.elastic-cloud.com:443/multimodal_content [status:200 duration:0.153s]
INFO:__main__:✅ All components initialized successfully
INFO:__main__:🔍 Collecting evidence...
INFO:elastic_transport.transport:POST https://getting-started.es.us-east4.gcp.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.148s]
INFO:__main__:✅ Data retrieved for vision: 2 results
INFO:elastic_transport.transport:POST https://getting-started.es.us-east4.gcp.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.153s]
INFO:__main__:✅ Data retrieved for audio: 2 results
INFO:elastic_transport.transport:POST https://getting-started.es.us-east4.gcp.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.028s]
INFO:__main__:✅ Data retrieved for text: 2 results
INFO:elastic_transport.transport:POST https://getting-started.es.us-east4.gcp.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.024s]
INFO:__main__:✅ Data retrieved for depth: 2 results
INFO:__main__:
📝 Generating forensic report...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llm_analyzer:
📋 Forensic Report Generated:
INFO:llm_analyzer:==================================================
INFO:llm_analyzer:**Prime Suspect:** The Joker
**Evidence Supporting Conclusion:**
- **Visual Evidence:**
  - The photos of the crime scene are highly indicative of the Joker's involvement. The presence of playing cards scattered around a dark, rain-soaked alley and the sinister graffiti of the Joker laughing on a brick wall align with the Joker's known penchant for leaving thematic calling cards at his crime scenes. The similarity score of 0.83 for both photos suggests a high degree of confidence in the relevance of these visual clues to the Joker's modus operandi.
- **Auditory Evidence:**
  - A sinister laugh captured near the crime scene, with a similarity score of 1.00, directly points to the Joker. This laugh, being a signature auditory marker of the Joker, further corroborates the visual evidence and strengthens the connection to him.
  - The description of the Joker with green hair, white face paint, and a sinister smile, despite a lower similarity score of 0.57, still serves as an auditory descriptor that matches public and law enforcement records of the Joker's appearance and behavior during criminal activities.
- **Textual Evidence:**
  - Mysterious notes found at the location, with similarity scores of 0.76, likely contain messages or threats that are consistent with the Joker's communication style. The content of these notes, while not detailed here, presumably includes taunts or riddles that the Joker is known for leaving behind.
- **Depth Evidence:**
  - Depth sensor captures of the suspect with a similarity score of 0.77 provide a digital representation that matches the physical characteristics or posture unique to the Joker. This technological evidence supports the visual and auditory evidence by adding a layer of physical presence at the crime scene.
**Behavioral Patterns:**
- The Joker is known for his theatrical crimes, often involving elaborate setups and symbolic messages. The use of playing cards, sinister graffiti, and mysterious notes are all consistent with his desire to instill fear and chaos. His laugh, a hallmark of his presence, further signifies his personal involvement in the crime.
**Confidence Level:** 95%
**Next Steps:** No further evidence required.
Given the multimodal evidence analyzed, including visual, auditory, textual, and depth data, all signs conclusively point to the Joker as the prime suspect in the Gotham Central Bank case. The combination of thematic elements, personal signatures, and behavioral patterns leaves little doubt regarding his involvement. While absolute certainty in forensic science is rare, the evidence at hand provides a high degree of confidence in identifying the Joker as the perpetrator.
INFO:llm_analyzer:==================================================
INFO:__main__:✅ Forensic report generated successfully
INFO:__main__:
📊 Report Preview:
INFO:__main__:++++++++++++++++++++++++++++++++++++++++++++++++++
INFO:__main__:**Prime Suspect:** The Joker
**Evidence Supporting Conclusion:**
- **Visual Evidence:**
  - The photos of the crime scene are highly indicative of the Joker's involvement. The presence of playing cards scattered around a dark, rain-soaked alley and the sinister graffiti of the Joker laughing on a brick wall align with the Joker's known penchant for leaving thematic calling cards at his crime scenes. The similarity score of 0.83 for both photos suggests a high degree of confidence in the relevance of these visual clues to the Joker's modus operandi.
- **Auditory Evidence:**
  - A sinister laugh captured near the crime scene, with a similarity score of 1.00, directly points to the Joker. This laugh, being a signature auditory marker of the Joker, further corroborates the visual evidence and strengthens the connection to him.
  - The description of the Joker with green hair, white face paint, and a sinister smile, despite a lower similarity score of 0.57, still serves as an auditory descriptor that matches public and law enforcement records of the Joker's appearance and behavior during criminal activities.
- **Textual Evidence:**
  - Mysterious notes found at the location, with similarity scores of 0.76, likely contain messages or threats that are consistent with the Joker's communication style. The content of these notes, while not detailed here, presumably includes taunts or riddles that the Joker is known for leaving behind.
- **Depth Evidence:**
  - Depth sensor captures of the suspect with a similarity score of 0.77 provide a digital representation that matches the physical characteristics or posture unique to the Joker. This technological evidence supports the visual and auditory evidence by adding a layer of physical presence at the crime scene.
**Behavioral Patterns:**
- The Joker is known for his theatrical crimes, often involving elaborate setups and symbolic messages. The use of playing cards, sinister graffiti, and mysterious notes are all consistent with his desire to instill fear and chaos. His laugh, a hallmark of his presence, further signifies his personal involvement in the crime.
**Confidence Level:** 95%
**Next Steps:** No further evidence required.
Given the multimodal evidence analyzed, including visual, auditory, textual, and depth data, all signs conclusively point to the Joker as the prime suspect in the Gotham Central Bank case. The combination of thematic elements, personal signatures, and behavioral patterns leaves little doubt regarding his involvement. While absolute certainty in forensic science is rare, the evidence at hand provides a high degree of confidence in identifying the Joker as the perpetrator.
INFO:__main__:++++++++++++++++++++++++++++++++++++++++++++++++++

Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
@codefromthecrypt
Collaborator

I just about have Docker working locally and plan to push a commit to your branch that includes some of the polishing I mentioned. I will have it up tomorrow.

@salgado
Contributor Author

salgado commented Feb 26, 2025

> I just about have Docker working locally and plan to push a commit to your branch that includes some of the polishing I mentioned. I will have it up tomorrow.

@codefromthecrypt @JessicaGarson, thanks again for reviewing and running the code.

@codefromthecrypt, could you post the Dockerfile in a comment here so I can try to replicate and test it today, before you push it?

Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
@codefromthecrypt
Collaborator

@salgado I pushed the commits before my last comment, so you should be able to pull them in and test. You can do whatever you like afterwards as well. I just didn't want to block you.

p.s. formatting was needed to pass the linter (pre-commit run -a), so I did that

Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
@salgado
Contributor Author

salgado commented Feb 26, 2025

@codefromthecrypt I just ran the Docker command, and it worked!!

docker compose run --build --rm search-and-analyze

[+] Building 2.8s (39/39) FINISHED docker:desktop-linux
=> [verify-file-structure internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.36kB 0.0s
=> [index-content internal] load metadata for docker.io/library/python:3.12 2.4s
=> [verify-file-structure auth] library/python:pull token for registry-1.docker.io 0.0s
=> [verify-file-structure internal] load .dockerignore 0.0s
=> => transferring context: 143B 0.0s
=> [index-content 1/8] FROM docker.io/library/python:3.12@sha256:f11c627a0a754fb45a2378790d7666c4aa85720e08c92538e0a4819b9 0.0s
=> [verify-file-structure internal] load build context 0.0s
=> => transferring context: 1.61kB 0.0s
=> CACHED [verify-file-structure 2/8] COPY /requirements.txt . 0.0s
=> CACHED [verify-file-structure 3/8] RUN apt-get update && apt-get install -y --no-install-recommends libgeos 0.0s
=> CACHED [verify-file-structure 4/8] WORKDIR /app 0.0s
=> CACHED [verify-file-structure 5/8] RUN mkdir -p ./data ./src ./stages 0.0s
=> CACHED [verify-file-structure 6/8] COPY ./data ./data 0.0s
=> CACHED [verify-file-structure 7/8] COPY ./src ./src 0.0s
=> CACHED [verify-file-structure 8/8] COPY ./stages ./stages 0.0s
=> [verify-file-structure] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:8ce5f79f882ba176249d56ce10775188f15742e2c341ffa148468be56ddb0a3a 0.0s
=> => naming to docker.io/library/gotham-city-crime-analysis-verify-file-structure 0.0s
=> [verify-file-structure] resolving provenance for metadata file 0.0s
=> [generate-embeddings internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.36kB 0.0s
=> [generate-embeddings internal] load .dockerignore 0.0s
=> => transferring context: 143B 0.0s
=> [generate-embeddings internal] load build context 0.0s
=> => transferring context: 1.61kB 0.0s
=> CACHED [generate-embeddings 2/8] COPY /requirements.txt . 0.0s
=> CACHED [generate-embeddings 3/8] RUN apt-get update && apt-get install -y --no-install-recommends libgeos-d 0.0s
=> CACHED [generate-embeddings 4/8] WORKDIR /app 0.0s
=> CACHED [generate-embeddings 5/8] RUN mkdir -p ./data ./src ./stages 0.0s
=> CACHED [generate-embeddings 6/8] COPY ./data ./data 0.0s
=> CACHED [generate-embeddings 7/8] COPY ./src ./src 0.0s
=> CACHED [generate-embeddings 8/8] COPY ./stages ./stages 0.0s
=> [generate-embeddings] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:d586f172af2581c35807d14046bee1357d694b1485c5bd1d5a5caa55dd03674c 0.0s
=> => naming to docker.io/library/gotham-city-crime-analysis-generate-embeddings 0.0s
=> [generate-embeddings] resolving provenance for metadata file 0.0s
=> [index-content internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.36kB 0.0s
=> [index-content internal] load .dockerignore 0.0s
=> => transferring context: 143B 0.0s
=> [index-content internal] load build context 0.0s
=> => transferring context: 1.61kB 0.0s
=> CACHED [index-content 2/8] COPY /requirements.txt . 0.0s
=> CACHED [index-content 3/8] RUN apt-get update && apt-get install -y --no-install-recommends libgeos-dev 0.0s
=> CACHED [index-content 4/8] WORKDIR /app 0.0s
=> CACHED [index-content 5/8] RUN mkdir -p ./data ./src ./stages 0.0s
=> CACHED [index-content 6/8] COPY ./data ./data 0.0s
=> CACHED [index-content 7/8] COPY ./src ./src 0.0s
=> CACHED [index-content 8/8] COPY ./stages ./stages 0.0s
=> [index-content] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:07b1a4a3372cb6cbc0c87a417c0dc02eeda79ec47da4e591011d2e151ade093e 0.0s
=> => naming to docker.io/library/gotham-city-crime-analysis-index-content 0.0s
=> [index-content] resolving provenance for metadata file 0.0s
[+] Creating 5/5
✔ generate-embeddings Built 0.0s
✔ index-content Built 0.0s
✔ verify-file-structure Built 0.0s
✔ Container verify-file-structure Created 0.0s
✔ Container generate-embeddings Created 0.0s
[+] Running 3/3
✔ Container verify-file-structure Exited 0.7s
✔ Container generate-embeddings Exited 26.6s
✔ Container index-content Started 0.2s
[+] Building 0.8s (14/14) FINISHED docker:desktop-linux
=> [search-and-analyze internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.36kB 0.0s
=> [search-and-analyze internal] load metadata for docker.io/library/python:3.12 0.7s
=> [search-and-analyze internal] load .dockerignore 0.0s
=> => transferring context: 143B 0.0s
=> [search-and-analyze 1/8] FROM docker.io/library/python:3.12@sha256:f11c627a0a754fb45a2378790d7666c4aa85720e08c92538e0a4 0.0s
=> [search-and-analyze internal] load build context 0.0s
=> => transferring context: 1.61kB 0.0s
=> CACHED [search-and-analyze 2/8] COPY /requirements.txt . 0.0s
=> CACHED [search-and-analyze 3/8] RUN apt-get update && apt-get install -y --no-install-recommends libgeos-de 0.0s
=> CACHED [search-and-analyze 4/8] WORKDIR /app 0.0s
=> CACHED [search-and-analyze 5/8] RUN mkdir -p ./data ./src ./stages 0.0s
=> CACHED [search-and-analyze 6/8] COPY ./data ./data 0.0s
=> CACHED [search-and-analyze 7/8] COPY ./src ./src 0.0s
=> CACHED [search-and-analyze 8/8] COPY ./stages ./stages 0.0s
=> [search-and-analyze] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:915e3e4ee9a744f3b22166658fc2e9e6f866977c245c2546acccea43ce6492f3 0.0s
=> => naming to docker.io/library/gotham-city-crime-analysis-search-and-analyze 0.0s
=> [search-and-analyze] resolving provenance for metadata file 0.0s
Downloading ImageBind weights...
100%|████████████████████████████████████████████████████████████████████████████████████████| 4.47G/4.47G [12:47<00:00, 6.26MB/s]
INFO:embedding_generator:Testing model with sample input...
INFO:embedding_generator:🤖 ImageBind model initialized successfully
INFO:elastic_transport.transport:HEAD https://b52aec2b64e745b6a86f613315600704.us-east-2.aws.elastic-cloud.com:443/multimodal_content [status:200 duration:0.646s]
INFO:__main__:✅ All components initialized successfully
INFO:__main__:🔍 Collecting evidence...
INFO:elastic_transport.transport:POST https://b52aec2b64e745b6a86f613315600704.us-east-2.aws.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.686s]
INFO:__main__:✅ Data retrieved for vision: 2 results
INFO:elastic_transport.transport:POST https://b52aec2b64e745b6a86f613315600704.us-east-2.aws.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.379s]
INFO:__main__:✅ Data retrieved for audio: 2 results
INFO:elastic_transport.transport:POST https://b52aec2b64e745b6a86f613315600704.us-east-2.aws.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.357s]
INFO:__main__:✅ Data retrieved for text: 2 results
INFO:elastic_transport.transport:POST https://b52aec2b64e745b6a86f613315600704.us-east-2.aws.elastic-cloud.com:443/multimodal_content/_search [status:200 duration:0.191s]
INFO:__main__:✅ Data retrieved for depth: 2 results
INFO:__main__:
📝 Generating forensic report...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llm_analyzer:
📋 Forensic Report Generated:
INFO:llm_analyzer:==================================================
INFO:llm_analyzer:Prime Suspect: The Joker

Evidence Supporting Conclusion:

  • Visual Evidence: The photos of the crime scene are highly indicative of the Joker's involvement. The presence of playing cards scattered around a dark, rain-soaked alley and the graffiti of the Joker laughing are symbolic markers closely associated with the Joker's modus operandi. The similarity score of 0.83 for both photos suggests a high degree of confidence in the visual match to known Joker-related crime scenes.

  • Auditory Evidence: The sinister laugh captured near the crime scene, with a similarity score of 1.00, directly points to the Joker. This laugh is a distinctive auditory signature of the Joker, known to be heard at the scenes of his crimes and is a form of psychological warfare against his victims and Gotham City at large.

  • Textual Evidence: The mysterious note found at the location, with a similarity score of 0.76, suggests it may contain a message or riddle typical of the Joker's communication style. Although the content of the note is not detailed here, the Joker is known for leaving taunting messages at his crime scenes, which often play into his chaotic and anarchistic themes.

  • Depth Evidence: The depth sensor capture of the suspect, with a similarity score of 0.77, while not as high as the auditory evidence, still suggests a match to the Joker's physical profile. Given the Joker's known presence in Gotham and his history with the Gotham Central Bank, this evidence further supports his identification.

Behavioral Patterns:

The Joker's criminal signature includes the use of playing cards, sinister graffiti, and taunting notes left at his crime scenes. These elements are not only trademarks of his identity but also serve to instill fear and chaos. His motives often revolve around creating anarchy and challenging Batman, rather than financial gain, which aligns with the theatrical and high-profile nature of the crime at the Gotham Central Bank.

Confidence Level: 95%

The evidence collectively points to the Joker with high confidence. The visual, auditory, and textual clues align closely with his known behavioral patterns and criminal signature. The depth sensor capture, while slightly less conclusive, still supports the identification based on physical appearance.

Next Steps: No further evidence required.

The combination of multimodal evidence strongly supports the conclusion that the Joker is the prime suspect. Additional evidence, such as forensic analysis of the playing cards or further examination of the mysterious note for fingerprints or DNA, could provide supplementary confirmation but is not necessary for a confident identification.
INFO:llm_analyzer:==================================================
INFO:main:✅ Forensic report generated successfully
INFO:main:
📊 Report Preview:
INFO:main:++++++++++++++++++++++++++++++++++++++++++++++++++
INFO:main:Prime Suspect: The Joker

Evidence Supporting Conclusion:

  • Visual Evidence: The photos of the crime scene are highly indicative of the Joker's involvement. The presence of playing cards scattered around a dark, rain-soaked alley and the graffiti of the Joker laughing are symbolic markers closely associated with the Joker's modus operandi. The similarity score of 0.83 for both photos suggests a high degree of confidence in the visual match to known Joker-related crime scenes.

  • Auditory Evidence: The sinister laugh captured near the crime scene, with a similarity score of 1.00, directly points to the Joker. This laugh is a distinctive auditory signature of the Joker, known to be heard at the scenes of his crimes and is a form of psychological warfare against his victims and Gotham City at large.

  • Textual Evidence: The mysterious note found at the location, with a similarity score of 0.76, suggests it may contain a message or riddle typical of the Joker's communication style. Although the content of the note is not detailed here, the Joker is known for leaving taunting messages at his crime scenes, which often play into his chaotic and anarchistic themes.

  • Depth Evidence: The depth sensor capture of the suspect, with a similarity score of 0.77, while not as high as the auditory evidence, still suggests a match to the Joker's physical profile. Given the Joker's known presence in Gotham and his history with the Gotham Central Bank, this evidence further supports his identification.

Behavioral Patterns:

The Joker's criminal signature includes the use of playing cards, sinister graffiti, and taunting notes left at his crime scenes. These elements are not only trademarks of his identity but also serve to instill fear and chaos. His motives often revolve around creating anarchy and challenging Batman, rather than financial gain, which aligns with the theatrical and high-profile nature of the crime at the Gotham Central Bank.

Confidence Level: 95%

The evidence collectively points to the Joker with high confidence. The visual, auditory, and textual clues align closely with his known behavioral patterns and criminal signature. The depth sensor capture, while slightly less conclusive, still supports the identification based on physical appearance.

Next Steps: No further evidence required.

The combination of multimodal evidence strongly supports the conclusion that the Joker is the prime suspect. Additional evidence, such as forensic analysis of the playing cards or further examination of the mysterious note for fingerprints or DNA, could provide supplementary confirmation but is not necessary for a confident identification.
INFO:main:++++++++++++++++++++++++++++++++++++++++++++++++++
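
For readers following the log above, here is a minimal sketch of how per-modality evidence and its similarity scores might be folded into a single prompt before the chat-completions call. It is not the code from this PR: the `Evidence` class, the `build_prompt` function, the modality labels, and the prompt wording are all hypothetical. Only the descriptions and scores are taken from the report above.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical container for one retrieved item (not from the PR code)."""
    modality: str     # label chosen for illustration, e.g. "vision" or "audio"
    description: str  # short description of the retrieved item
    score: float      # similarity score as reported in the log


def build_prompt(evidence: list[Evidence]) -> str:
    """Format the evidence, highest similarity first, into one prompt string."""
    ranked = sorted(evidence, key=lambda e: e.score, reverse=True)
    lines = [
        f"- [{e.modality}] {e.description} (similarity: {e.score:.2f})"
        for e in ranked
    ]
    # The instruction text is an assumption, not the prompt used in the PR.
    return (
        "You are a forensic analyst. Based on the evidence below, "
        "name the prime suspect and give a confidence level.\n\n"
        "Evidence:\n" + "\n".join(lines)
    )


# Descriptions and scores copied from the report in the log above.
evidence = [
    Evidence("vision", "Playing cards scattered around a dark, rain-soaked alley", 0.83),
    Evidence("audio", "Sinister laugh captured near the crime scene", 1.00),
    Evidence("text", "Mysterious note found at the location", 0.76),
    Evidence("depth", "Depth sensor capture of the suspect", 0.77),
]
prompt = build_prompt(evidence)
print(prompt)
```

Sorting by score puts the strongest match first, which is one plausible way to let the model weigh the audio match more heavily than the note or the depth capture.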

@codefromthecrypt
Collaborator

P.S. If my Docker stuff is causing more harm than good, please remove the Dockerfile, docker-compose.yml, and .dockerignore, along with the corresponding additions to README.md.

There were some other things I polished so reverting everything I did may be throwing out the baby with the bathwater. In any case, I'm glad the code is working however it is intended to be run. Good job!

@salgado
Contributor Author

salgado commented Feb 27, 2025

@carlyrichmond, @JessicaGarson, and @codefromthecrypt

I believe this blog will provide significantly more value to our audience by presenting two distinct execution methods: Docker and Jupyter Notebook. And we don’t need to discard any of the latest excellent contributions.

Docker provides a production-ready, reproducible environment that ensures consistent execution regardless of the user's setup. This is ideal for deployment scenarios and users who prefer containerized solutions.

On the other hand, the Jupyter Notebook option creates an interactive learning experience, allowing users to experiment with the code step by step, making it much more accessible for educational purposes and quick exploration.

By maintaining both options, we are not duplicating code but rather offering flexibility in how users can engage with the same underlying technology. This approach accommodates both technical users who need reproducibility and learners who benefit from interactivity.

Please check the update I made in the README.md, where I clearly explain both approaches while keeping the core functionality identical. This inclusive approach will help us reach a broader audience – from ML engineers setting up production pipelines to data scientists exploring multimodal techniques for the first time.

@carlyrichmond
Contributor

carlyrichmond commented Feb 27, 2025

@salgado I would recommend picking one option. Having two options makes the example harder to maintain and could cause confusion about which one to pick.

@salgado
Contributor Author

salgado commented Feb 27, 2025

Ok, so I made the adjustments to keep only the Jupyter Notebook.

@carlyrichmond
Contributor

> Ok, so I made the adjustments to keep only the Jupyter Notebook.

Good to know @salgado! Can we confirm that others are able to run the notebook please? 🤞 If not I would add Docker back in.

@JessicaGarson
Contributor

@carlyrichmond @salgado I was able to run it after a few enhancements (adding a few import statements and extra pip installs). I committed the improvements to the file.
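
As a rough sketch of how missing imports like the ones mentioned above could be caught early, a first notebook cell might check which modules resolve before anything else runs. The `missing_modules` helper and the module list are hypothetical. The actual imports and pip installs added in the commit are not listed in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list for illustration; replace it with the notebook's real imports.
required = ["elasticsearch", "openai", "torch"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

Note that a module name is not always the same as its pip package name, so the output is only a hint about what to install.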

@carlyrichmond
Contributor

As discussed with @JessicaGarson, Jess is going to run and check things once more before it's merged. After the merge, @salgado, please raise a PR to update the git clone command. Then, if @JessicaGarson and @justincastilla can quickly check that it still runs, that would be great.

Thanks all! If you need help ping me!

@JessicaGarson JessicaGarson merged commit 21cfcc6 into elastic:main Feb 28, 2025
2 checks passed
4 participants