Methodology
How GreenScore calculates CO₂ emissions. Radical transparency is our differentiator.
The 6-source pipeline
Every scoring request runs through two phases:
Phase 1: Fast sources (parallel, <500ms)
AGRIBALYSE via Open Food Facts
Direct LCA match for food products that have a barcode. Highest confidence (0.93-0.99).
Archetype default (AGRIBALYSE category average)
When a barcode isn't available, the AI-classified product archetype maps to an AGRIBALYSE category average. Confidence ~0.72.
Emission factors database (barcode + fuzzy name)
79,800 factors from ADEME, DEFRA, and EPA, matched either by exact barcode (confidence 0.93) or by trigram name search (confidence 0.80).
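A minimal sketch of the two-step lookup, assuming the factors are available as in-memory Python dicts. The document does not name the search engine, so the trigram similarity below is modelled loosely on PostgreSQL's pg_trgm (padded word trigrams, Jaccard index). The `lookup_factor` name, the dict shapes, and the 0.3 similarity floor are all hypothetical. Only the 0.93 and 0.80 confidences come from the text.

```python
import re
from typing import Optional


def trigrams(text: str) -> set[str]:
    """Lowercase, split into alphanumeric words, pad each word, collect 3-char windows."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two leading blanks and one trailing, pg_trgm-style
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index, 0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def lookup_factor(barcode: Optional[str], name: str, by_barcode: dict[str, dict],
                  factors: list[dict], min_similarity: float = 0.3) -> Optional[dict]:
    # 1. Exact barcode hit: confidence 0.93 per the methodology.
    if barcode and barcode in by_barcode:
        return {**by_barcode[barcode], "confidence": 0.93}
    # 2. Fuzzy name search: best trigram match above a hypothetical similarity floor.
    scored = [(trigram_similarity(name, f["name"]), f) for f in factors]
    best_score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    if best is None or best_score < min_similarity:
        return None
    return {**best, "confidence": 0.80}
```

With this tokenisation, "apple juice 1L" shares all 12 trigrams of "Apple juice" out of 15 distinct ones (similarity 0.8), so it resolves by name even without a barcode.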
If any Phase 1 source returns a result with confidence ≥ 0.80, Phase 2 is skipped for a faster response.
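The two-phase flow can be sketched with asyncio as follows. This assumes each source is an async function that takes a product description and returns either `None` or a dict with hypothetical keys `source`, `kg_co2e`, and `confidence`. The 0.80 cut-off is the rule above, and the 2 s budget comes from the Phase 2 heading. Treating that budget as a single timeout for the whole phase, where slow sources are dropped rather than awaited, is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical result shape: {"source": str, "kg_co2e": float, "confidence": float}
Result = dict
Source = Callable[[str], Awaitable[Optional[Result]]]

SKIP_PHASE2_AT = 0.80  # any Phase 1 result at or above this skips Phase 2


async def run_pipeline(product: str, fast: list[Source], extended: list[Source],
                       phase2_timeout: float = 2.0) -> list[Result]:
    # Phase 1: run every fast source concurrently; failed or empty lookups are ignored.
    phase1 = await asyncio.gather(*(src(product) for src in fast), return_exceptions=True)
    results = [r for r in phase1 if isinstance(r, dict)]
    if not extended or any(r["confidence"] >= SKIP_PHASE2_AT for r in results):
        return results

    # Phase 2: run extended sources concurrently under one shared time budget.
    tasks = [asyncio.create_task(src(product)) for src in extended]
    done, pending = await asyncio.wait(tasks, timeout=phase2_timeout)
    for task in pending:
        task.cancel()  # sources still running when the budget expires are dropped
    for task in done:
        if task.exception() is None and isinstance(task.result(), dict):
            results.append(task.result())
    return results
```

Cancelling instead of awaiting the stragglers keeps the response time bounded by the Phase 1 latency plus the Phase 2 budget, at the cost of sometimes scoring with fewer sources.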
Phase 2: Extended sources (parallel, 2s timeout)
Cached mappings + local DB category lookup
Tag-based and subcategory matching against the full factor database. Confidence 0.40-0.80.
Vector search (embedding-based factor retrieval)
The product description is embedded with Gemini and matched against the 79K factor embeddings by cosine similarity. Confidence depends on match quality (0.10-0.75).
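A brute-force sketch of the retrieval step, assuming the query and factor embeddings are already computed and stored as plain float lists. The Gemini embedding call is deliberately left out. The similarity-to-confidence mapping is hypothetical: the document only states the 0.10-0.75 range, not how similarity is converted. At 79K factors, a production system would more likely use a vector index or a single matrix product than this loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of vector norms; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_factors(query: list[float], factor_vectors: dict[str, list[float]],
                  k: int = 3) -> list[tuple[float, str]]:
    """Scan every factor embedding and return (similarity, factor_id) pairs, best first."""
    scored = [(cosine_similarity(query, vec), fid) for fid, vec in factor_vectors.items()]
    scored.sort(reverse=True)
    return scored[:k]


def similarity_to_confidence(sim: float, floor: float = 0.5) -> float:
    """Hypothetical linear map of similarity in (floor, 1] onto the 0.10-0.75 band."""
    if sim <= floor:
        return 0.10
    return round(0.10 + (min(sim, 1.0) - floor) / (1.0 - floor) * 0.65, 2)
```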
Open CEDA (spend-based, 169 countries)
Uses the AI-estimated product price and a BEA sector mapping to derive CO₂ from economic input-output data, after deflating the price to the base year with CPI. Confidence ~0.20.
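The spend-based estimate can be sketched as follows. The nominal price is first converted to base-year money, real price = price × CPI_base / CPI_price_year, and then multiplied by an emission intensity per unit of spend for the mapped sector. The sector names and factor values below are placeholders, not real BEA codes or Open CEDA data. As an illustration, a price of 10 in a year with CPI 120 is worth 10 × 100 / 120 ≈ 8.33 in base-year money (CPI 100). With a hypothetical 0.30 kg CO₂e per unit of spend, that gives 2.5 kg CO₂e.

```python
# Placeholder sector intensities (kg CO2e per base-year currency unit); not Open CEDA values.
SECTOR_FACTORS = {"hypothetical_sector_a": 0.30, "hypothetical_sector_b": 0.45}


def spend_based_co2e(price: float, cpi_price_year: float, cpi_base_year: float,
                     kg_co2e_per_currency_unit: float) -> float:
    """Deflate a nominal price to base-year money, then apply an EEIO sector intensity."""
    if price < 0 or cpi_price_year <= 0:
        raise ValueError("price must be non-negative and CPI must be positive")
    real_price = price * cpi_base_year / cpi_price_year
    return real_price * kg_co2e_per_currency_unit


def estimate_from_spend(price: float, sector: str, cpi_price_year: float,
                        cpi_base_year: float,
                        sector_factors: dict[str, float] = SECTOR_FACTORS) -> float:
    """Look up the (hypothetical) sector intensity and compute the spend-based estimate."""
    return spend_based_co2e(price, cpi_price_year, cpi_base_year, sector_factors[sector])
```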
IDEMAT material decomposition
Uses the AI-classified material composition (e.g. 60% HDPE, 40% paper) and per-material LCA factors. Confidence ~0.15.
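Material decomposition reduces to a weighted sum: product weight × mass share × per-material factor, summed over materials. The sketch assumes the composition percentages are mass fractions and that a product weight is available, possibly AI-estimated. The factor values are placeholders for illustration, not IDEMAT figures.

```python
import math

# Placeholder per-material factors (kg CO2e per kg of material); not IDEMAT values.
EXAMPLE_FACTORS = {"HDPE": 2.0, "paper": 1.0}


def material_co2e(weight_kg: float, composition: dict[str, float],
                  factors_kg_per_kg: dict[str, float]) -> float:
    """Sum of weight x mass share x per-material factor over all listed materials."""
    total_share = sum(composition.values())
    if not math.isclose(total_share, 1.0, abs_tol=1e-6):
        raise ValueError(f"material shares sum to {total_share}, expected 1.0")
    missing = set(composition) - set(factors_kg_per_kg)
    if missing:
        raise KeyError(f"no factor for: {sorted(missing)}")
    return sum(weight_kg * share * factors_kg_per_kg[m] for m, share in composition.items())
```

For a 0.5 kg item that is 60% HDPE and 40% paper, these placeholder factors give 0.5 × 0.6 × 2.0 + 0.5 × 0.4 × 1.0 = 0.8 kg CO₂e.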
Cross-validation
When 3 or more sources produce a result:
- Sources are sorted by confidence, highest first. The top one is the primary source.
- The median of all non-primary source values is computed.
- If the primary source is more than 3x or less than 1/3 of that median, it is rejected as an outlier.
- Of the remaining sources, those within 2x of the median are kept.
- If the top two remaining sources agree within ±30%, a weighted average of the two is used.
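The steps above can be sketched as follows, under several stated assumptions. The 2x filter applies to the non-primary sources, while the primary source is governed only by the 3x outlier test. The "±30%" agreement is measured relative to the higher-confidence value of the pair. The weighted average uses confidences as weights. The result dict shape is hypothetical.

```python
from statistics import median


def cross_validate(results: list[dict]) -> tuple[float, list[dict]]:
    """Return (final kg CO2e, kept sources). Each result has 'kg_co2e' and 'confidence'."""
    if not results:
        raise ValueError("no source produced a result")
    ranked = sorted(results, key=lambda r: r["confidence"], reverse=True)
    if len(ranked) < 3:
        return ranked[0]["kg_co2e"], ranked  # cross-validation needs 3+ sources

    primary, others = ranked[0], ranked[1:]
    med = median([r["kg_co2e"] for r in others])

    # Non-primary sources survive only within a factor of 2 of the median.
    kept = [r for r in others if med / 2 <= r["kg_co2e"] <= 2 * med]
    # The primary source is rejected only if it is beyond 3x (or below 1/3) of the median.
    if med / 3 <= primary["kg_co2e"] <= 3 * med:
        kept.insert(0, primary)  # kept stays ordered by confidence

    if len(kept) >= 2:
        a, b = kept[0], kept[1]
        if abs(a["kg_co2e"] - b["kg_co2e"]) <= 0.30 * a["kg_co2e"]:
            weight = a["confidence"] + b["confidence"]
            value = (a["kg_co2e"] * a["confidence"] + b["kg_co2e"] * b["confidence"]) / weight
            return value, kept
    return kept[0]["kg_co2e"], kept
```

With non-negative values, the list of kept sources cannot end up empty: the larger of the middle values always lies within 2x of the median.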
Dynamic confidence scoring
Confidence is not a fixed number per source. It's dynamically calibrated based on match quality signals:
| Signal | Adjustment |
|---|---|
| Exact barcode match | +0.03 |
| LCA phases available | +0.02 |
| Same-region factor | +0.02 |
| Multiple sources agree | +0.05 |
| Generic category match | -0.10 |
| Cross-region factor | -0.05 |
| Low vector similarity | -0.15 |
| Weight AI-estimated | -0.05 |
Temporal decay also applies: factors more than 5 years old get a 0.85x multiplier, and factors more than 10 years old get 0.70x.
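A sketch of how the adjustments and the age multiplier might combine. The signal keys are hypothetical names for the rows of the adjustment list. Applying the additive adjustments before the multiplicative decay and clamping the result to [0, 1] are assumptions, since the document does not specify the order or bounds. The adjustment values and the 5/10-year decay steps are taken from the text.

```python
# Additive adjustments from the methodology, keyed by hypothetical signal names.
ADJUSTMENTS = {
    "exact_barcode": +0.03,
    "lca_phases": +0.02,
    "same_region": +0.02,
    "sources_agree": +0.05,
    "generic_category": -0.10,
    "cross_region": -0.05,
    "low_vector_similarity": -0.15,
    "weight_ai_estimated": -0.05,
}


def age_multiplier(factor_year: int, current_year: int) -> float:
    """0.85x for factors more than 5 years old, 0.70x for more than 10 years old."""
    age = current_year - factor_year
    if age > 10:
        return 0.70
    if age > 5:
        return 0.85
    return 1.0


def calibrate(base: float, signals: set[str], factor_year: int, current_year: int) -> float:
    """Add signal adjustments, apply temporal decay, clamp to [0, 1] (assumed order)."""
    score = base + sum(ADJUSTMENTS[s] for s in signals)
    score *= age_multiplier(factor_year, current_year)
    return round(min(1.0, max(0.0, score)), 4)
```

For example, a 0.80 base with a cross-region factor from 13 years earlier scores (0.80 - 0.05) × 0.70 = 0.525.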
Eco grade thresholds
| Grade | Food (kg CO₂e/kg) | General (kg CO₂e/kg) | Spend (kg CO₂e/EUR) |
|---|---|---|---|
| A | < 0.9 | < 1.0 | < 0.05 |
| B | 0.9 - 2.0 | 1.0 - 3.0 | 0.05 - 0.2 |
| C | 2.0 - 4.0 | 3.0 - 8.0 | 0.2 - 0.5 |
| D | 4.0 - 8.0 | 8.0 - 20.0 | 0.5 - 1.0 |
| E | > 8.0 | > 20.0 | > 1.0 |
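The grade table maps directly onto a threshold lookup. This sketch assumes each band includes its lower bound and excludes its upper bound, so a value exactly on a boundary gets the worse grade; the table itself leaves boundary values ambiguous. The scale names are hypothetical labels for the three columns.

```python
from bisect import bisect_right

# Upper bounds of grades A-D per scale; anything at or above the last bound is E.
THRESHOLDS = {
    "food": (0.9, 2.0, 4.0, 8.0),       # kg CO2e per kg
    "general": (1.0, 3.0, 8.0, 20.0),   # kg CO2e per kg
    "spend": (0.05, 0.2, 0.5, 1.0),     # kg CO2e per EUR
}
GRADES = "ABCDE"


def eco_grade(intensity: float, scale: str) -> str:
    """Count how many thresholds the intensity reaches and map that count to a grade."""
    return GRADES[bisect_right(THRESHOLDS[scale], intensity)]
```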
Data quality commitment
- All emission factors traceable to their original source, with a URL.
- Factor database updated quarterly (new sources, version bumps).
- Confidence scores reflect actual match quality, not fixed values.
- Uncertainty ranges provided when sources disagree.
- Gemini never estimates CO₂ — it only classifies products.
- Cross-validation rejects outliers automatically.
- Full changelog of factor additions/removals published.