docs: reviewing the Ecoscore.pm program and fixing the product_ecoscore.yaml schema (#10875)
============================================================================================

This pull request applies to:

```
lib/ProductOpener/Ecoscore.pm
docs/api/ref/schemas/product_ecoscore.yaml
```

Actually, nothing is modified in `Ecoscore.pm`.

Within the "ecoscore_data/adjustments/origins_of_ingredients", "ecoscore_data/grades" and "ecoscore_data/scores" hierarchies, we find many key-value pairs, where the key is a 2-char code or the string "world". According to `product_ecoscore.yaml`, the 2-char code is a language code and the 5-char string "world" is not allowed. On the other hand, when reading `Ecoscore.pm`, we find the hard-coded string "world" together with a list `@ecoscore_countries_enabled_sorted`, initialised not with language codes but with actual country codes: "uk" instead of "en", and "be" in addition to "nl" and "fr", for example. So the schema is fixed to rename "language_code" to "country_code" and to allow the property "world".

Another point concerns the "ecoscore_data/adjustments/origins_of_ingredients" composite property. In the schema, this property includes a string array "origins_from_origins_field", while the actual data records (and the `Ecoscore.pm` program) include both the string array "origins_from_origins_field" and the string array "origins_from_categories". The pull request fixes this.

Not included in the pull request: the "ecoscore_data/adjustments/packaging" property contains a property "non_recyclable_and_non_biodegradable_materials" and a property "packagings" (plural), which is an array of objects. According to the JSON data files, sometimes (not often) these inner objects include a "non_recyclable_and_non_biodegradable" property (which is different from, albeit related to, "non_recyclable_and_non_biodegradable_materials" at the outer level). According to the YAML schema file, this property does not exist. Should it be added to the schema file?
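
As an illustration of the key convention discussed above (two-letter country codes plus "world"), here is a minimal Python sketch that lists the keys of one hierarchy that do not follow it. The function name `unexpected_keys` and the sample values are hypothetical and are not taken from the repository or from real product data.

```python
import re

# Assumption: a valid key is either the literal "world" or a two-letter
# lowercase country code such as "fr", "be" or "uk".
COUNTRY_KEY = re.compile(r"[a-z]{2}|world")


def unexpected_keys(mapping):
    """Return, sorted, the keys that are neither a two-letter code nor "world"."""
    return sorted(key for key in mapping if not COUNTRY_KEY.fullmatch(key))


# Hypothetical sample shaped like the "ecoscore_data/scores" hierarchy.
sample_scores = {"be": 72, "fr": 75, "uk": 70, "world": 71, "fr_FR": 75}

print(unexpected_keys(sample_scores))
```

A check of this kind only verifies the shape of the keys. It does not tell a country code from a language code, which is why reading `@ecoscore_countries_enabled_sorted` was needed to settle the question.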

Not included in the pull request: some stylistic issues. For example, the following lines

```
$agribalyse{$row_ref->[0]} = {
    code => $row_ref->[0],  # Agribalyse code = Ciqual code
    name_fr => $row_ref->[4],  # Nom du Produit en Français
    name_en => $row_ref->[5],  # LCI Name
    dqr => $row_ref->[6],  # DQR (data quality rating)
    # warning: the AGB file has a hidden H column
    ef_agriculture => $row_ref->[8] + 0,  # Agriculture
    ef_processing => $row_ref->[9] + 0,  # Transformation
    ef_packaging => $row_ref->[10] + 0,  # Emballage
    ef_transportation => $row_ref->[11] + 0,  # Transport
    ef_distribution => $row_ref->[12] + 0,  # Supermarché et distribution
    ef_consumption => $row_ref->[13] + 0,  # Consommation
    ef_total => $row_ref->[14] + 0,  # Total
    co2_agriculture => $row_ref->[15] + 0,  # Agriculture
    co2_processing => $row_ref->[16] + 0,  # Transformation
    co2_packaging => $row_ref->[17] + 0,  # Emballage
    co2_transportation => $row_ref->[18] + 0,  # Transport
    co2_distribution => $row_ref->[19] + 0,  # Supermarché et distribution
    co2_consumption => $row_ref->[20] + 0,  # Consommation
    co2_total => $row_ref->[21] + 0,  # Total
    version => $agribalyse_version
};
```

should be formatted as:

```
$agribalyse{$row_ref->[0]} = {
    code               => $row_ref->[ 0],      # Agribalyse code = Ciqual code
    name_fr            => $row_ref->[ 4],      # Nom du Produit en Français
    name_en            => $row_ref->[ 5],      # LCI Name
    dqr                => $row_ref->[ 6],      # DQR (data quality rating)
    # warning: the AGB file has a hidden H column
    ef_agriculture     => $row_ref->[ 8] + 0,  # Agriculture
    ef_processing      => $row_ref->[ 9] + 0,  # Transformation
    ef_packaging       => $row_ref->[10] + 0,  # Emballage
    ef_transportation  => $row_ref->[11] + 0,  # Transport
    ef_distribution    => $row_ref->[12] + 0,  # Supermarché et distribution
    ef_consumption     => $row_ref->[13] + 0,  # Consommation
    ef_total           => $row_ref->[14] + 0,  # Total
    co2_agriculture    => $row_ref->[15] + 0,  # Agriculture
    co2_processing     => $row_ref->[16] + 0,  # Transformation
    co2_packaging      => $row_ref->[17] + 0,  # Emballage
    co2_transportation => $row_ref->[18] + 0,  # Transport
    co2_distribution   => $row_ref->[19] + 0,  # Supermarché et distribution
    co2_consumption    => $row_ref->[20] + 0,  # Consommation
    co2_total          => $row_ref->[21] + 0,  # Total
    version            => $agribalyse_version
};
```

See Perl Best Practices, page 26, "Vertical Alignment". And while I am browsing PBP, maybe you should indent with spaces instead of tabs (p. 20).

While browsing test data, I have looked at the "transportation_scores" property. Most often, this property contains key-value pairs in which the value is zero. Sometimes, the values are integer numbers, as in products "0052833225082", "0078742102047", "2241447012920", "3270160503070", "3451790834080", "4063500001669", "8033049610109" and "8411945200226". This is compatible with what the YAML schema file says. But product "04083637" has float values (the fractional part is either 0.3333...33 plus a random last digit or 0.6666...66 plus a random last digit). Product "2625078016210" has float transportation scores with a 2-digit fractional part. In an old test file (not in the recent file `openfoodfacts-products.jsonl.gz`), product "5601009974337" had float values which were actually integer values plus a rounding error, such as 12.000000000000002 or 51.00000000000001. Is there a problem with these three products?
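
To make the three cases above easier to tell apart when scanning data, here is a minimal Python sketch that classifies a single "transportation_scores" value. The function names, the tolerance `tol` and the sample dictionary are assumptions made for illustration. Only the two rounding-error values are taken from the text above, and the country keys attached to them are made up.

```python
import math


def classify(value, tol=1e-9):
    """Classify one score as integer, integer plus rounding error, or fractional."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "not a number"
    if isinstance(value, int) or value.is_integer():
        return "integer"
    # Assumption: anything closer than `tol` to a whole number is treated
    # as an integer that picked up floating-point noise.
    if math.isclose(value, round(value), rel_tol=0.0, abs_tol=tol):
        return "integer plus rounding error"
    return "fractional"


def summarize(scores):
    """Map each key of a transportation_scores-like dict to its class."""
    return {key: classify(value) for key, value in scores.items()}


# Hypothetical sample: the keys and most values are illustrative only.
sample = {
    "be": 0,
    "fr": 12.000000000000002,
    "uk": 51.00000000000001,
    "nl": 0.3333333333333333,
    "world": 12.25,
}

for key, label in summarize(sample).items():
    print(key, label)
```

Run over the JSONL export, a classifier of this kind would separate values that only need rounding from values that are genuinely fractional and therefore conflict with an integer-only schema.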