build: do not import sample data in GitHub PR checks for make_dev (#1…
…1070)

We currently have an action to run "make dev" in GitHub PR checks. This
loads product .sto files and images from static.openfoodfacts.org, and
it can be quite slow.

For example, this check has been running for 3 hours:

https://github.com/openfoodfacts/openfoodfacts-server/actions/runs/12082629626/job/33694707420?pr=11068

Also added a smaller dump (products.random-modulo-100000):

```
off@off:/srv/off/html/exports$ (off) ls -lrt products.random-modulo-100*
-rw-r--r-- 1 off off   552875078 May 20  2024 products.random-modulo-100.tar.gz
-rw-r--r-- 1 off off    59335855 Nov 23 12:55 products.random-modulo-1000.tar.gz
-rw-r--r-- 1 off off 13099413539 Nov 23 14:04 products.random-modulo-1000.images.tar.gz
-rw-r--r-- 1 off off     8866855 Nov 23 14:05 products.random-modulo-1000.jsonl.gz
-rw-r--r-- 1 off off    10660049 Nov 23 14:05 products.random-modulo-1000.mongodbdump.gz
-rw-r--r-- 1 off off     5520665 Nov 29 12:09 products.random-modulo-10000.tar.gz
-rw-r--r-- 1 off off  1257096631 Nov 29 12:12 products.random-modulo-10000.images.tar.gz
-rw-r--r-- 1 off off      903732 Nov 29 12:12 products.random-modulo-10000.jsonl.gz
-rw-r--r-- 1 off off     1082138 Nov 29 12:12 products.random-modulo-10000.mongodbdump.gz
-rw-r--r-- 1 off off      451429 Nov 29 13:44 products.random-modulo-100000.tar.gz
-rw-r--r-- 1 off off   118325376 Nov 29 13:45 products.random-modulo-100000.images.tar.gz
-rw-r--r-- 1 off off       82747 Nov 29 13:45 products.random-modulo-100000.jsonl.gz
-rw-r--r-- 1 off off       99513 Nov 29 13:45 products.random-modulo-100000.mongodbdump.gz
```

---------

Co-authored-by: Alex Garel <alex@garel.org>
stephanegigandet and alexgarel authored Nov 29, 2024
1 parent f77eb82 commit 16f36ca
Showing 5 changed files with 24 additions and 16 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pull_request.yml
@@ -192,7 +192,7 @@ jobs:
echo "export USER_GID=$(id -g)" >> .envrc
- name: Test make dev
run: |
-make DOCKER_LOCAL_DATA="$(pwd)" dev
+make DOCKER_LOCAL_DATA="$(pwd)" SKIP_SAMPLE_IMAGES=1 dev
make status
- name: Test all is running
run: make livecheck || ( tail -n 300 logs/apache2/*error*log; docker compose logs; false )
2 changes: 2 additions & 0 deletions Makefile
@@ -11,6 +11,8 @@ SHELL := $(shell which bash)
ENV_FILE ?= .env
NAME = "ProductOpener"
MOUNT_POINT ?= /mnt
+# in CI, in make dev we want to skip downloading sample images (too slow)
+SKIP_SAMPLE_IMAGES ?= SKIP_SAMPLE_IMAGES
DOCKER_LOCAL_DATA_DEFAULT = /srv/off/docker_data
DOCKER_LOCAL_DATA ?= $(DOCKER_LOCAL_DATA_DEFAULT)
OS := $(shell uname)
15 changes: 10 additions & 5 deletions scripts/gen_feeds_daily.sh
@@ -69,15 +69,20 @@ cd $OFF_SCRIPTS_DIR
./mongodb_dump.sh $OFF_PUBLIC_DATA_DIR $PRODUCT_OPENER_FLAVOR $MONGODB_HOST $PRODUCT_OPENER_FLAVOR_SHORT

# Small products data and images export for Docker dev environments
-# for about 1/10000th of the products contained in production.
-./export_products_data_and_images.pl --sample-mod 10000,0 \
+# for about 1/100000th of the products contained in production.
+./export_products_data_and_images.pl --sample-mod 100000,0 \
+--products-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-100000.tar.gz \
+--images-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-100000.images.tar.gz \
+--jsonl-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-100000.jsonl.gz \
+--mongo-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-100000.mongodbdump.gz
+# On saturday, export modulo 1000 and 10000 for larger sample
+if [ "$(date +%u)" = "6" ]
+then
+./export_products_data_and_images.pl --sample-mod 10000,0 \
--products-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-10000.tar.gz \
--images-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-10000.images.tar.gz \
--jsonl-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-10000.jsonl.gz \
--mongo-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-10000.mongodbdump.gz
-# On saturday, export modulo 1000 for larger sample
-if [ "$(date +%u)" = "6" ]
-then
./export_products_data_and_images.pl --sample-mod 1000,0 \
--products-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-1000.tar.gz \
--images-file $OFF_PUBLIC_EXPORTS_DIR/products.random-modulo-1000.images.tar.gz \
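The export schedule after this change can be summarized with a small sketch. `samples_for_day` is a hypothetical helper written only for illustration and is not part of `gen_feeds_daily.sh`; it assumes the weekday is given in the `date +%u` form used by the script (1 = Monday, 7 = Sunday).

```shell
#!/usr/bin/env bash
# Sketch only: samples_for_day is a hypothetical helper, not part of the repository.
# It lists which --sample-mod values would be exported on a given weekday.
samples_for_day() {
    local day_of_week="$1"   # ISO weekday as printed by `date +%u`
    local samples="100000"   # the smallest sample is exported every day
    if [ "$day_of_week" = "6" ]; then
        # Saturday: also export the two larger samples
        samples="$samples 10000 1000"
    fi
    echo "$samples"
}

# Usage: show what would be exported today
samples_for_day "$(date +%u)"
```

Only the modulo-100000 sample is refreshed daily; the larger samples are rebuilt once a week, presumably because they take longer to export.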
15 changes: 10 additions & 5 deletions scripts/import_sample_data.sh
@@ -7,14 +7,19 @@ cd /tmp
echo "\033[32m------------------ 1/ Retrieve products -----------------\033[0m";
# explicitly specify the wget output file name so that wget does not append .1 if already present
# e.g. if the tar command failed and the script was stopped
-wget -O products.tar.gz https://static.openfoodfacts.org/exports/products.random-modulo-10000.tar.gz 2>&1
+wget -O products.tar.gz https://static.openfoodfacts.org/exports/products.random-modulo-100000.tar.gz 2>&1
tar -xzvf products.tar.gz -C /mnt/podata/products
rm products.tar.gz

-echo "\033[32m------------------ 2/ Retrieve product images -------------------\033[0m";
-wget -O products.images.tar.gz https://static.openfoodfacts.org/exports/products.random-modulo-10000.images.tar.gz 2>&1
-tar -xzvf products.images.tar.gz -C /opt/product-opener/html/images/products/
-rm products.images.tar.gz
+if [[ -z "${SKIP_SAMPLE_IMAGES}" ]]
+then
+echo "\033[32m------------------ 2/ Retrieve product images -------------------\033[0m";
+wget -O products.images.tar.gz https://static.openfoodfacts.org/exports/products.random-modulo-100000.images.tar.gz 2>&1
+tar -xzvf products.images.tar.gz -C /opt/product-opener/html/images/products/
+rm products.images.tar.gz
+else
+echo "\033[32m------------------ SKIPPED product images -------------------\033[0m";
+fi

echo "\033[32m------------------ 3/ Import products -------------------\033[0m";
perl -I/opt/product-opener/lib /opt/product-opener/scripts/update_all_products_from_dir_in_mongodb.pl
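The `-z` test in this script treats any non-empty value of `SKIP_SAMPLE_IMAGES` as "skip", not only `1`. The sketch below illustrates that behavior. `images_step` is a hypothetical helper, not part of `import_sample_data.sh`, and it uses `${SKIP_SAMPLE_IMAGES:-}` instead of `${SKIP_SAMPLE_IMAGES}` so that it also runs under `set -u`. How the variable reaches the script inside the container is not shown in this diff.

```shell
#!/usr/bin/env bash
# Sketch only: images_step is a hypothetical helper mirroring the -z test in the diff.
images_step() {
    # -z is true when the variable is unset or set to the empty string
    if [[ -z "${SKIP_SAMPLE_IMAGES:-}" ]]; then
        echo "download"
    else
        echo "skip"
    fi
}

unset SKIP_SAMPLE_IMAGES
images_step                          # variable unset
SKIP_SAMPLE_IMAGES="" images_step    # set but empty
SKIP_SAMPLE_IMAGES=1 images_step     # the value passed by the PR workflow
SKIP_SAMPLE_IMAGES=0 images_step     # any non-empty value skips, even "0"
```

The first two calls print `download` and the last two print `skip`, so setting the variable to `0` does not re-enable the image download; it has to be unset or empty.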
6 changes: 1 addition & 5 deletions taxonomies/food/categories.txt
@@ -68440,10 +68440,6 @@ tr: Armut
intake24_category_code:en: PEAR
wikidata:en: Q13099586

-< en:Pears
-en: Pears (Guyot)
-fr: Poires Guyot
-
< en:Pears
it: Pera dell'Emilia Romagna
origins:en: en:italy
@@ -68458,7 +68454,7 @@ hr: Konferencijske kruške
nl: Conference peren

< en:Pears
-en: Guyot pears
+en: Guyot pears, Pears (Guyot)
xx: Guyot
fr: Poires Guyot
wikidata:en: Q3033517