Methodology

How GreenScore calculates CO₂ emissions. Radical transparency is our differentiator.

Key principle: Gemini AI classifies products (name, category, materials). It never estimates CO₂. All emission data comes from verified databases listed below.

The 6-source pipeline

Every scoring request runs through up to two phases:

Phase 1: Fast sources (parallel, <500ms)

S1

AGRIBALYSE via Open Food Facts

Direct LCA match for food products with a barcode. Highest confidence (0.93-0.99).

S1b

Archetype default (AGRIBALYSE category average)

When a barcode isn't available, the AI-classified product archetype maps to an AGRIBALYSE category average. Confidence ~0.72.

S2

Emission factors database (barcode + fuzzy name)

79,800 factors from ADEME, DEFRA, and EPA. Matched by exact barcode (confidence 0.93) or trigram name search (confidence 0.80).
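As an illustration of trigram name search, the sketch below scores two product names by the overlap of their three-character substrings (Jaccard similarity). The storage engine behind S2 is not specified here, so the padding rule, the function names, and the 0.3 cutoff are all assumptions chosen for the example.

```python
def trigrams(text):
    """Return the set of character trigrams of a lowercased, padded string."""
    normalized = " ".join(text.lower().split())
    if not normalized:
        return set()
    padded = "  " + normalized + " "  # padding lets word starts count as trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_name_match(query, factor_names, min_similarity=0.3):
    """Return (name, similarity) for the closest factor name, or None.

    min_similarity is a hypothetical cutoff, not a documented GreenScore value.
    """
    scored = [(name, trigram_similarity(query, name)) for name in factor_names]
    name, score = max(scored, key=lambda pair: pair[1], default=(None, 0.0))
    return (name, score) if score >= min_similarity else None
```

Calling best_name_match("oat milk", ["Oat milk, unsweetened", "Beef, minced"]) selects the oat milk entry, because the two names share most of their leading trigrams.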

If any Phase 1 source has confidence ≥ 0.80, Phase 2 is skipped for faster response.
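The two-phase flow can be sketched with asyncio as follows. This is a minimal illustration, not the production orchestrator: each source is assumed to be an async function that returns a dict with a "confidence" key (or None when it finds nothing), the "<500ms" figure is treated as a hard timeout, and all names are hypothetical.

```python
import asyncio

SKIP_THRESHOLD = 0.80    # from the text: Phase 2 is skipped at confidence >= 0.80
PHASE1_BUDGET_S = 0.5    # assumption: "<500ms" treated as a hard timeout
PHASE2_TIMEOUT_S = 2.0   # from the text: Phase 2 has a 2 s timeout


async def run_phase(sources, product, timeout_s):
    """Run all sources in parallel and keep the results that finish in time."""
    if not sources:
        return []
    tasks = [asyncio.ensure_future(source(product)) for source in sources]
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()  # drop sources that exceeded the time budget
    results = []
    for task in tasks:  # iterate in source order for stable output
        if task in done and not task.cancelled() and task.exception() is None:
            result = task.result()
            if result is not None:
                results.append(result)
    return results


async def score(product, phase1_sources, phase2_sources):
    """Return all source results, skipping Phase 2 when Phase 1 is confident."""
    results = await run_phase(phase1_sources, product, PHASE1_BUDGET_S)
    if any(r["confidence"] >= SKIP_THRESHOLD for r in results):
        return results
    results += await run_phase(phase2_sources, product, PHASE2_TIMEOUT_S)
    return results
```

In this sketch a failing or slow source simply contributes no result, so one unavailable database does not block the response.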

Phase 2: Extended sources (parallel, 2s timeout)

S3

Cached mappings + local DB category lookup

Tag-based and subcategory matching against the full factor database. Confidence 0.40-0.80.

S4

Vector search (embedding-based factor retrieval)

Product description is embedded with Gemini and matched against 79K factor embeddings via cosine similarity. Confidence varies by match quality (0.10-0.75).
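The retrieval step can be illustrated with plain cosine similarity over small vectors. The sketch leaves out the embedding call itself and uses a brute-force scan; a production system with tens of thousands of embeddings would more likely use a vector index. The function names and the toy vectors are assumptions for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat as no match
    return dot / (norm_a * norm_b)


def top_matches(query_vec, factor_vecs, k=3):
    """Return the k best (factor_id, similarity) pairs, best first.

    factor_vecs maps a factor id to its precomputed embedding.
    """
    scored = [
        (factor_id, cosine_similarity(query_vec, vec))
        for factor_id, vec in factor_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
example = top_matches([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2)
```

Here "a" ranks first because it points in the same direction as the query, followed by "c".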

S5

Open CEDA (spend-based, 169 countries)

Uses the AI-estimated product price and a BEA sector mapping to derive CO₂ from economic input-output data, with CPI deflation of the price to the base year. Confidence ~0.20.
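The spend-based arithmetic can be sketched in two steps: deflate the price to the factor's base year, then multiply by the sector's emission factor. The CPI values and the factor below are made-up numbers for illustration, not Open CEDA data, and currency conversion is left out.

```python
def spend_based_co2e(price, cpi_purchase_year, cpi_base_year, factor_kg_per_currency):
    """Estimate kg CO2e from a price using a spend-based emission factor.

    The price is first expressed in base-year money, because spend-based
    factors are defined per unit of currency in a fixed base year.
    """
    if cpi_purchase_year <= 0 or cpi_base_year <= 0:
        raise ValueError("CPI values must be positive")
    deflated_price = price * cpi_base_year / cpi_purchase_year
    return deflated_price * factor_kg_per_currency


# Hypothetical inputs: a 20.00 purchase, CPI 120 today against 100 in the base year,
# and a sector factor of 0.5 kg CO2e per unit of base-year currency.
estimate = spend_based_co2e(20.0, 120.0, 100.0, 0.5)
```

With these inputs the deflated price is about 16.67, giving roughly 8.33 kg CO₂e.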

S6

IDEMAT material decomposition

Uses the AI-classified material composition (e.g. 60% HDPE, 40% paper) and per-material LCA factors. Confidence ~0.15.
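Material decomposition amounts to a weighted sum: each material's share times its per-kilogram factor, scaled by the product weight. The sketch below uses the 60% HDPE / 40% paper split from the example above; the per-material factors are placeholders, not IDEMAT values, and the function name is hypothetical.

```python
def material_co2e(composition, factors_kg_per_kg, product_weight_kg):
    """Estimate kg CO2e from material shares and per-material factors.

    composition maps material name to its mass share (shares must sum to 1).
    factors_kg_per_kg maps material name to kg CO2e per kg of material.
    """
    total_share = sum(composition.values())
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError("material shares must sum to 1.0")
    # A material without a factor raises KeyError rather than being silently ignored.
    per_kg = sum(share * factors_kg_per_kg[name] for name, share in composition.items())
    return per_kg * product_weight_kg


# Placeholder factors for illustration only.
placeholder_factors = {"HDPE": 2.0, "paper": 1.0}
estimate = material_co2e({"HDPE": 0.6, "paper": 0.4}, placeholder_factors, 0.5)
```

With these placeholder factors, a 0.5 kg product comes out at (0.6 × 2.0 + 0.4 × 1.0) × 0.5 = 0.8 kg CO₂e.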

Cross-validation

When 3 or more sources produce a result:

  1. Sources are sorted by confidence (highest first).
  2. The median of all non-primary source values is computed.
  3. If the primary source is >3x or <1/3x the median, it's rejected as an outlier.
  4. Remaining sources within 2x of the median are kept.
  5. If the top 2 sources agree within ±30%, a weighted average is used for higher accuracy.
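The five steps above can be sketched as a single function. Several details are interpretations rather than documented behaviour: the primary source is taken to be the highest-confidence one, "within 2x" is read as between half and double the median, "agree within ±30%" is measured relative to the higher-confidence value, the weighted average is weighted by confidence, and with fewer than three sources the highest-confidence value is returned unchanged.

```python
from statistics import median


def cross_validate(results):
    """Combine source results into one value.

    results is a list of (value_kg_co2e, confidence) tuples with positive values.
    """
    if not results:
        raise ValueError("no source results")
    ranked = sorted(results, key=lambda r: r[1], reverse=True)   # step 1
    if len(ranked) < 3:
        return ranked[0][0]  # assumption: no cross-validation below 3 sources
    primary, others = ranked[0], ranked[1:]
    med = median(value for value, _ in others)                   # step 2
    kept = []
    if med / 3 <= primary[0] <= med * 3:                         # step 3
        kept.append(primary)
    kept += [r for r in others if med / 2 <= r[0] <= med * 2]    # step 4
    if len(kept) >= 2:                                           # step 5
        (v1, c1), (v2, c2) = kept[0], kept[1]
        if abs(v1 - v2) <= 0.30 * v1:
            return (v1 * c1 + v2 * c2) / (c1 + c2)
    return kept[0][0]
```

For example, with results (10.0, 0.9), (1.0, 0.8), and (1.2, 0.5), the median of the non-primary values is 1.1, so the primary value of 10.0 is rejected as an outlier and the two remaining sources are averaged by confidence.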

Dynamic confidence scoring

Confidence is not a fixed number per source. It's dynamically calibrated based on match quality signals:

Exact barcode match       +0.03
LCA phases available      +0.02
Same region factor        +0.02
Multiple sources agree    +0.05
Generic category match    -0.10
Cross-region factor       -0.05
Low vector similarity     -0.15
Weight AI-estimated       -0.05

Temporal decay also applies: factors older than 5 years get a 0.85x multiplier, and factors older than 10 years get 0.70x.
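A minimal sketch of how the adjustments and temporal decay might combine is shown below. The adjustment values and decay multipliers come from this section; the signal key names, the order of operations (adjust first, then decay), and the clamping to the 0 to 1 range are assumptions.

```python
# Adjustment values from the list above; the key names are hypothetical.
ADJUSTMENTS = {
    "exact_barcode_match": 0.03,
    "lca_phases_available": 0.02,
    "same_region_factor": 0.02,
    "multiple_sources_agree": 0.05,
    "generic_category_match": -0.10,
    "cross_region_factor": -0.05,
    "low_vector_similarity": -0.15,
    "weight_ai_estimated": -0.05,
}


def decay_multiplier(factor_age_years):
    """Temporal decay: 0.70x above 10 years, 0.85x above 5 years, else 1.0."""
    if factor_age_years > 10:
        return 0.70
    if factor_age_years > 5:
        return 0.85
    return 1.0


def calibrated_confidence(base, signals, factor_age_years):
    """Apply signal adjustments, then temporal decay, then clamp to [0, 1]."""
    adjusted = base + sum(ADJUSTMENTS[signal] for signal in signals)
    adjusted *= decay_multiplier(factor_age_years)
    return min(1.0, max(0.0, adjusted))
```

Under these assumptions, a base confidence of 0.80 with a generic category match on a 7-year-old factor becomes (0.80 - 0.10) × 0.85 = 0.595.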

Eco grade thresholds

Grade   Food (kg CO₂e/kg)   General (kg CO₂e/kg)   Spend (kg CO₂e/EUR)
A       < 0.9               < 1.0                  < 0.05
B       0.9 - 2.0           1.0 - 3.0              0.05 - 0.2
C       2.0 - 4.0           3.0 - 8.0              0.2 - 0.5
D       4.0 - 8.0           8.0 - 20.0             0.5 - 1.0
E       > 8.0               > 20.0                 > 1.0
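The thresholds can be expressed as a small lookup. The table does not say which grade a value exactly on a shared boundary receives (2.0 appears in both B and C for food, for instance), so this sketch assumes that each range includes its lower bound, except that D also includes its upper bound because E is defined as strictly greater. The scale names are hypothetical.

```python
# Upper bounds of grades A, B, C, D per scale, taken from the table above.
THRESHOLDS = {
    "food": (0.9, 2.0, 4.0, 8.0),       # kg CO2e per kg
    "general": (1.0, 3.0, 8.0, 20.0),   # kg CO2e per kg
    "spend": (0.05, 0.2, 0.5, 1.0),     # kg CO2e per EUR
}


def eco_grade(value, scale):
    """Map an emission intensity to a grade from A to E."""
    a_max, b_max, c_max, d_max = THRESHOLDS[scale]
    if value < a_max:
        return "A"
    if value < b_max:   # assumption: boundary values fall into the worse grade
        return "B"
    if value < c_max:
        return "C"
    if value <= d_max:  # E is "> d_max", so d_max itself is still D
        return "D"
    return "E"
```

For example, eco_grade(5.0, "general") returns "C", and eco_grade(8.0, "food") returns "D".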

Data quality commitment

  1. All emission factors traceable to original source with URL.
  2. Factor database updated quarterly (new sources, version bumps).
  3. Confidence scores reflect actual match quality, not fixed values.
  4. Uncertainty ranges provided when sources disagree.
  5. Gemini never estimates CO₂ — it only classifies products.
  6. Cross-validation rejects outliers automatically.
  7. Full changelog of factor additions/removals published.