Gluten-free products discovery

This project collects gluten-free products from various online stores in Lithuania and displays them in a Streamlit app.

Streamlit App

Data collection

Data has been collected from the following websites:

Raw data was collected manually: selected gluten-free products from these websites were saved as txt files. This step is expected to be automated in the future.

Data parsing

Data was parsed with Python scripts, using a separate parser for each website. Each parser defines how the fields are extracted. The script extract_raw_data.py parses all txt files, extracts the required fields, and writes the prepared tables to BigQuery.

Data transformation

Data transformation was performed with dbt. Source tables and base staging models were defined for each website, and the base staging models were then unioned into a single staging model covering all websites. In the application layer, the data was cleaned, data tests were added, categories were mapped using seeds, similar-products logic was implemented, and the final model app_final_products.sql was created.
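The project does this step in dbt (SQL), so the following Python snippet is only a conceptual illustration of two of the operations named above: unioning per-website staging tables and mapping raw categories through a seed-style lookup. The column names (`raw_category`, `category`) and the seed values are made up for the example.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]

# Stand-in for a dbt seed: raw category label -> unified category.
# The entries are invented for illustration.
CATEGORY_SEED: Dict[str, str] = {
    "duona": "Bread",
    "bread & bakery": "Bread",
    "makaronai": "Pasta",
}


def union_staging(*tables: Iterable[Row]) -> List[Row]:
    # Equivalent in spirit to a UNION ALL over the per-website staging models.
    return [dict(row) for table in tables for row in table]


def map_categories(
    rows: Iterable[Row], seed: Dict[str, str], default: str = "Other"
) -> List[Row]:
    # Equivalent in spirit to a LEFT JOIN against the seed with a fallback value.
    mapped = []
    for row in rows:
        key = row.get("raw_category", "").strip().lower()
        mapped.append({**row, "category": seed.get(key, default)})
    return mapped
```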

Data application

Finally, the transformed data is used to build a data application with Streamlit. The application has 3 tabs:

  • Gluten-free products discovery
  • Gluten-free products exploration
  • Gluten-free products visualization

App default page

The Discovery tab is used for discovering products. By default, it lists all available products (only 100 are shown), and filters or a keyword search can narrow down the results. For each product, it shows the website where the product was found, the product name with a clickable URL, the product brand, weight, price, description, ingredients, and nutrition info, along with an option to explore the product in more detail.
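The filtering behaviour described above could look roughly like this. It is a sketch under assumptions, not the app's code: the field names (`website`, `name`, `brand`, `description`), the choice of fields searched by keyword, and the function name are hypothetical, and the Streamlit widgets that would supply the arguments are omitted.

```python
from typing import Dict, List, Optional

Product = Dict[str, str]

# Fields searched by the keyword filter (an assumption for this sketch).
SEARCH_FIELDS = ("name", "brand", "description")


def filter_products(
    products: List[Product],
    keyword: Optional[str] = None,
    website: Optional[str] = None,
    limit: int = 100,
) -> List[Product]:
    needle = keyword.lower() if keyword else None
    results = []
    for product in products:
        # Exact-match filter on the website, if one was selected.
        if website and product.get("website") != website:
            continue
        # Case-insensitive substring search across the text fields.
        if needle:
            haystack = " ".join(product.get(f, "") for f in SEARCH_FIELDS).lower()
            if needle not in haystack:
                continue
        results.append(product)
    # Cap the number of rows shown, mirroring the 100-product limit.
    return results[:limit]
```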

Clicking the Explore in more details button opens the product in the Exploration tab, which shows more details about the product and similar products found on other websites.
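The README does not say how similar products are determined (that logic lives in the dbt models). As one possible approach, here is a sketch that uses Jaccard similarity over product-name tokens and keeps only matches from other websites. The technique, the 0.5 threshold, and all names are assumptions for illustration.

```python
import re
from typing import Dict, List, Set

Product = Dict[str, str]


def tokens(name: str) -> Set[str]:
    # Lowercased word tokens of a product name.
    return set(re.findall(r"\w+", name.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Size of the intersection divided by size of the union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_products(
    target: Product, candidates: List[Product], threshold: float = 0.5
) -> List[Product]:
    target_tokens = tokens(target["name"])
    scored = []
    for candidate in candidates:
        # Only products from other websites count as "similar" here.
        if candidate["website"] == target["website"]:
            continue
        score = jaccard(target_tokens, tokens(candidate["name"]))
        if score >= threshold:
            scored.append((score, candidate))
    # Most similar first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]
```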

The Visualization tab shows bar charts of product counts by website, category, or brand.
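The aggregation behind such bar charts can be sketched with the standard library alone. The field names and the "Unknown" fallback are assumptions, and the charting call itself is left out.

```python
from collections import Counter
from typing import Dict, List, Tuple

Product = Dict[str, str]


def count_by(products: List[Product], field: str) -> List[Tuple[str, int]]:
    # Count products per distinct value of `field`, largest group first.
    # Missing or empty values are grouped under "Unknown".
    counts = Counter(product.get(field) or "Unknown" for product in products)
    return counts.most_common()
```

The resulting list of (label, count) pairs can be passed to whichever charting function the app uses, once per grouping field.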