12  Subcriterion A2b

Reproducible workflow using Python

This document shows an example calculation of subcriterion A2b using data assets and functions available on Earth Engine (EE) from a notebook running Python.

12.1 Updating the assessment workflow

The original workflow combined two remote sensing products (Friedl et al. 2010; Hansen et al. 2013). The updated workflow is based on a newer version of one data source that spans more years and uses improved algorithms. We have two versions of the workflow: one based on R and local copies of the remote sensing products (see Section 11.1), and one based on Python and Earth Engine for cloud computing (this document).

Differences between original and updated assessment workflows for subcriterion A2b.

Step            2019                                2025 (R)              2025 (Python)
GIS data prep.  GRASS GIS / gdal / R                R / gdal / terra      Earth Engine
Data source     GFC v1.2 (2015) and MODIS LC v5.1   GFC v1.12 (2024)      GFC v1.12 (2024)
Analysis        R / glm                             R / glm / redlistr    Python / Earth Engine
Timeframe       2000-2050                           2000-2050             2000-2050

Part of the code is based on examples from the Earth Engine Guides.

12.2 Setup

Import modules

import ee
import geemap.core as geemap
import pandas as pd
import pyreadr
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

Authenticate to Earth Engine

myproject='ee-jrferrerparis'
ee.Authenticate()
ee.Initialize(project=myproject)

Read inputs

Here we will use three assets; two of them are available in Earth Engine’s data catalog:

gfc=ee.Image('UMD/hansen/global_forest_change_2024_v1_12')
countries=ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017')

The third asset has been uploaded from Ferrer-Paris (2025) into my personal project:

macrogroups=ee.Image('projects/{}/assets/IVC_NS_v7_270m_cea'.format(myproject))

12.3 Prepare the data

First we need to select the target macrogroup. The raster values are macrogroup IDs, so value 563 returns the potential distribution of M563:

MG563 = macrogroups.eq(563)

Then we define regions of interest. We know this macrogroup is present in three countries (Colombia, Venezuela, and Trinidad and Tobago), so we create a region for each of them.

rVEN = countries.filter(
    ee.Filter.inList('country_na', ['Venezuela',])
)
rCOL = countries.filter(
    ee.Filter.inList('country_na', ['Colombia',])
)
rTT = countries.filter(
    ee.Filter.inList('country_na', ['Trinidad & Tobago',])
)

Parameters and functions

We set the scale for the outputs:

params = {'scale': 30, 'maxPixels':1e10}

And calculate the corresponding pixel area in km²:

res_km2 = (params['scale']**2)/1e6
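As a sanity check, the pixel-to-area conversion is plain arithmetic (assuming square 30 m pixels):

```python
# A 30 m x 30 m pixel covers 900 m^2, and 1 km^2 = 1e6 m^2,
# so each pixel represents 0.0009 km^2.
scale_m = 30
pixel_area_km2 = (scale_m ** 2) / 1e6
print(pixel_area_km2)  # 0.0009

# About 1111 pixels make up one square kilometre.
pixels_per_km2 = round(1 / pixel_area_km2)
```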

Now we design our own combined reducer for the spatial analysis:

combinedReducer = ee.Reducer.sum().combine(
  reducer2 = ee.Reducer.count(),
  sharedInputs = True 
).combine(
  reducer2 = ee.Reducer.mean(),
  sharedInputs = True 
)
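For each input band the combined reducer produces keys suffixed `_sum`, `_count` and `_mean`, computed over the unmasked pixels. A local numpy analogue of that computation, with NaN standing in for masked pixels (toy values, not real GFC data):

```python
import numpy as np

band = np.array([10.0, 0.0, 40.0, np.nan, 50.0])  # NaN = masked out
valid = band[~np.isnan(band)]

stats = {
    'treecover_sum': float(valid.sum()),
    'treecover_count': int(valid.size),
    'treecover_mean': float(valid.mean()),
}
print(stats)  # {'treecover_sum': 100.0, 'treecover_count': 4, 'treecover_mean': 25.0}
```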

12.4 Analysis approach: Compute totals

The first approach is to calculate raster totals and do the calculations afterwards. We select the treecover2000, gain and loss bands from GFC and apply the mask of the potential distribution of our target macrogroup.

mcdgTotals = (
    gfc
    .select(['treecover2000','gain','loss'])
    .selfMask()
    .mask(MG563)
)

Computations

We compute summaries of the three bands for each country:

totalsVEN = mcdgTotals.reduceRegion(
    reducer=combinedReducer,
    geometry=rVEN.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()
totalsVEN
{'gain_count': 105206342,
 'gain_mean': 0.0040003273355281,
 'gain_sum': 420784.8392156863,
 'loss_count': 105206342,
 'loss_mean': 0.0586245309569536,
 'loss_sum': 6166573.823529413,
 'treecover2000_count': 105206342,
 'treecover2000_mean': 37.97777763966412,
 'treecover2000_sum': 3994791355.184314}
totalsCOL = mcdgTotals.reduceRegion(
    reducer=combinedReducer,
    geometry=rCOL.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()
totalsTT = mcdgTotals.reduceRegion(
    reducer=combinedReducer,
    geometry=rTT.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()

Aggregate results

totals = pd.DataFrame([totalsVEN,totalsCOL,totalsTT])
totals
gain_count gain_mean gain_sum loss_count loss_mean loss_sum treecover2000_count treecover2000_mean treecover2000_sum
0 105206342 0.004000 420784.839216 105206342 0.058625 6.166574e+06 105206342 37.977778 3.994791e+09
1 54502069 0.014495 789892.454902 54502069 0.103135 5.620141e+06 54502069 35.044587 1.909694e+09
2 233854 0.003176 741.000000 233854 0.045033 1.050737e+04 233854 75.694211 1.766134e+07
totals['eco_extent_2000'] = (totals.treecover2000_sum / 100) * res_km2
totals['eco_extent_2024'] = (
    totals['eco_extent_2000']
    - (totals['loss_sum'] * res_km2)
    + (totals['gain_sum'] * res_km2)
)
totals[['eco_extent_2000','eco_extent_2024']]
   eco_extent_2000  eco_extent_2024
0     35953.122197     30781.912111
1     17187.244420     12840.020729
2       158.952065       150.162332
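The conversion behind `eco_extent_2000` can be checked with toy numbers: `treecover2000_sum` is a sum of percentages, so dividing by 100 yields an equivalent count of fully covered pixels, which the pixel area then converts to km² (the values below are hypothetical):

```python
import pandas as pd

res_km2 = 0.0009  # area of a 30 m pixel in km^2

toy = pd.DataFrame({'treecover2000_sum': [1_000_000.0, 200_000.0]})
# e.g. 1,000,000 % summed over pixels = 10,000 fully covered pixels = 9 km^2
toy['eco_extent_2000'] = (toy['treecover2000_sum'] / 100) * res_km2
print(toy['eco_extent_2000'].round(6).tolist())  # [9.0, 1.8]
```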

12.5 Analysis approach: Raster algebra

The second approach uses raster algebra to calculate a new tree-cover image for 2024 that reflects pixel-by-pixel changes.

mcdg_2024 = (
    gfc
    .select(['treecover2000'])
    .multiply(gfc.select(['loss']).eq(0))
    .add(gfc.select(['gain']).multiply(100))
    .mask(MG563)
)
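The per-pixel rule can be mimicked locally with numpy (toy values; Earth Engine applies the same expression at every pixel): lost pixels are zeroed out, and gained pixels are credited with full (100%) cover.

```python
import numpy as np

cover2000 = np.array([80.0, 60.0, 0.0, 30.0])  # tree cover % in 2000
loss      = np.array([0,    1,    0,   0])     # 1 = tree cover lost
gain      = np.array([0,    0,    1,   0])     # 1 = tree cover gained

cover2024 = cover2000 * (loss == 0) + gain * 100
print(cover2024.tolist())  # [80.0, 0.0, 100.0, 30.0]
```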

Computations

finalVEN = mcdg_2024.reduceRegion(
    reducer=combinedReducer,
    geometry=rVEN.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()
finalCOL = mcdg_2024.reduceRegion(
    reducer=combinedReducer,
    geometry=rCOL.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()
finalTT = mcdg_2024.reduceRegion(
    reducer=combinedReducer,
    geometry=rTT.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()

Aggregate results

finals = pd.DataFrame([finalVEN,finalCOL,finalTT])
finals
treecover2000_count treecover2000_mean treecover2000_sum
0 105206342 34.141011 3.591211e+09
1 54502069 28.821156 1.570559e+09
2 233854 71.958293 1.678966e+07

The ratio between the two estimates gives the proportion of the 2000 extent remaining in each country:

finals['treecover2000_sum'] / totals['treecover2000_sum']
0    0.898973
1    0.822414
2    0.950645
Name: treecover2000_sum, dtype: float64
sum(finals['treecover2000_sum']) / sum(totals['treecover2000_sum'] )
0.8744396077079528
253656.4 / sum(totals['gain_count'])
0.0015859247710416004

12.6 Analysis approach: Cross tabulation

A third approach is to cross-tabulate tree cover loss per year, using the treecover2000 and lossyear bands of the GFC dataset.

mcdgTC = (
    gfc
    .select(['treecover2000','lossyear'])
    .mask(MG563)
)

Computations

tclVEN = mcdgTC.reduceRegion(
    reducer=combinedReducer.group(groupField=1, groupName='lossyear'),
    geometry=rVEN.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()
tclCOL = mcdgTC.reduceRegion(
    reducer=combinedReducer.group(groupField=1, groupName='lossyear'),
    geometry=rCOL.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()
tclTT = mcdgTC.reduceRegion(
    reducer=combinedReducer.group(groupField=1, groupName='lossyear'),
    geometry=rTT.geometry(),
    scale=params['scale'],
    maxPixels=params['maxPixels'],
).getInfo()

Crosstabulation of loss per year

Create a function to prepare the results of the previous computation for the extrapolation step:

def prepareDF(r1, r2):
    # Read the data from the treecover / lossyear crosstabulation:
    df = pd.DataFrame(r1['groups'])
    # Use the totals from the gain layer to correct the area estimates,
    # spreading the total gain evenly over the 24-year period:
    df['gain'] = r2['gain_sum'] / 24
    df.loc[df.lossyear == 0, 'gain'] = 0
    # Area per group: divide the tree cover sum by 100, add the gain
    # portion, and use the pixel area to transform to km2:
    df['tc_area'] = df['area'] = ((df['sum'] / 100) + df['gain']) * res_km2
    # Reorder to account for the area lost between 2001 and 2024
    # and the area remaining in 2025:
    df['order'] = 25 - df['lossyear']
    df['year'] = 1999 + df['lossyear']
    df.loc[df.lossyear == 0, 'order'] = 0
    df.loc[df.lossyear == 0, 'year'] = 2025
    df.sort_values('order', inplace=True)
    # Accumulate the area in reverse order of years:
    df['tc_area'] = np.cumsum(df['area'])
    # And reorder to get a time series of remaining area per year:
    df.sort_values('year', inplace=True)
    return df
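The reordering and reverse accumulation inside the function can be illustrated with a toy table (hypothetical areas: the `lossyear == 0` row holds the area remaining after all losses, the other rows the area lost in each year):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'lossyear': [0, 1, 2, 3],
    'area':     [70.0, 10.0, 15.0, 5.0],  # remaining area, then losses per year
})
toy['order'] = 25 - toy['lossyear']
toy['year'] = 1999 + toy['lossyear']
toy.loc[toy.lossyear == 0, 'order'] = 0
toy.loc[toy.lossyear == 0, 'year'] = 2025
toy.sort_values('order', inplace=True)
# Accumulating from the remaining area back through the yearly losses
# reconstructs the extent standing at the start of each loss year:
toy['tc_area'] = np.cumsum(toy['area'])
toy.sort_values('year', inplace=True)
print(toy[['year', 'tc_area']].to_string(index=False))
```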

Apply this function to the data from Venezuela

resultsVEN = prepareDF(tclVEN,totalsVEN)
resultsVEN.head()
count lossyear mean sum gain tc_area area order year
1 352765 1 73.815910 2.603826e+07 17532.701634 36331.828552 250.123728 24 2000
2 224563 2 74.279556 1.667921e+07 17532.701634 36081.704824 165.892364 23 2001
3 217493 3 80.186956 1.743915e+07 17532.701634 35915.812460 172.731799 22 2002
4 188625 4 74.362437 1.402577e+07 17532.701634 35743.080661 142.011373 21 2003
5 241077 5 75.143395 1.811477e+07 17532.701634 35601.069287 178.812379 20 2004

Apply this function to the data from Colombia

resultsCOL = prepareDF(tclCOL,totalsCOL)

Apply this function to the data from Trinidad and Tobago

resultsTT = prepareDF(tclTT,totalsTT)

12.7 Extrapolation of decline

We fit a quadratic trend with a Poisson GLM (log link) and extrapolate the decline over the 50-year window 2000-2049. First we build the prediction frame:

Xnew=pd.DataFrame(range(2000,2050),columns=['year',]).assign(intercept=1)
Xnew['year_squared'] = Xnew['year']**2
def fitExtModel(df):
    X = (df
         .assign(intercept=1)
         .reset_index(drop=True)
        )
    X['year_squared'] = X['year']**2
    y = X.pop('tc_area')
    model_1 = sm.GLM(
        y,
        X[['intercept', 'year', 'year_squared']],
        family=sm.families.Poisson(),
        )
    return model_1.fit()
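The quadratic log-link fit can be sketched without Earth Engine or statsmodels. Below, ordinary least squares on `log(area)` with numpy (not the IRLS Poisson fit used above, so coefficients will differ slightly) recovers a synthetic 1.2%-per-year decline and extrapolates it to 2049; the data and rate are made up for illustration:

```python
import numpy as np

years = np.arange(2000, 2025)
area = 36000 * np.exp(-0.012 * (years - 2000))  # synthetic 1.2%/yr decline

# Fit log(area) = b0 + b1*t + b2*t^2 by least squares,
# centring the year (t = year - 2000) for numerical stability.
t = (years - 2000).astype(float)
X = np.column_stack([np.ones_like(t), t, t ** 2])
coef, *_ = np.linalg.lstsq(X, np.log(area), rcond=None)

# Extrapolate to 2049 and express the decline relative to 2000.
pred_2049 = np.exp(coef @ np.array([1.0, 49.0, 49.0 ** 2]))
decline_pct = 100 - pred_2049 * 100 / area[0]
print(round(decline_pct, 1))  # ~44.5
```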
extVEN = fitExtModel(resultsVEN)
print(extVEN.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                tc_area   No. Observations:                   25
Model:                            GLM   Df Residuals:                       22
Model Family:                 Poisson   Df Model:                            2
Link Function:                    Log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -159.71
Date:                Mon, 25 Aug 2025   Deviance:                       12.561
Time:                        20:38:29   Pearson chi2:                     12.6
No. Iterations:                     3   Pseudo R-squ. (CS):              1.000
Covariance Type:            nonrobust                                         
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
intercept      -98.7564     90.558     -1.091      0.275    -276.247      78.735
year             0.1140      0.090      1.266      0.205      -0.062       0.290
year_squared -2.968e-05   2.24e-05     -1.327      0.185   -7.35e-05    1.42e-05
================================================================================
prdVEN=extVEN.get_prediction(Xnew[['intercept', 'year', 'year_squared']]).summary_frame()
plt.plot(resultsVEN[['year']], resultsVEN[['tc_area']], 'o')
plt.plot(Xnew[['year']], prdVEN["mean"], '--', label='fit')
plt.ylabel("Area estimated from tree cover %")
plt.xlabel("Year")
plt.fill_between(
    Xnew['year'], prdVEN["mean_ci_lower"], prdVEN["mean_ci_upper"], color="k", alpha=0.1,
    label='plausible range'
);
plt.legend()
plt.show()

The projected decline by 2049, as a percentage of the initial extent (mean and confidence bounds):

100 - (prdVEN.iloc[49][["mean","mean_ci_lower","mean_ci_upper"]].values * 100 / resultsVEN.iloc[0][['tc_area']].values)
array([26.20915397, 30.41213941, 21.75231554])
extCOL = fitExtModel(resultsCOL)
prdCOL=extCOL.get_prediction(Xnew[['intercept', 'year', 'year_squared']]).summary_frame()
100 - (prdCOL.iloc[49][["mean","mean_ci_lower","mean_ci_upper"]].values * 100 / resultsCOL.iloc[0][['tc_area']].values)
array([52.71523956, 56.68077749, 48.38668748])
plt.plot(resultsCOL[['year']], resultsCOL[['tc_area']], 'o')
plt.plot(Xnew[['year']], prdCOL["mean"], '--', label='fit')
plt.ylabel("Area estimated from tree cover %")
plt.xlabel("Year")
plt.fill_between(
    Xnew['year'], prdCOL["mean_ci_lower"], prdCOL["mean_ci_upper"], color="k", alpha=0.1,
    label='plausible range'
);
plt.legend()
plt.show()

extTT = fitExtModel(resultsTT)
prdTT=extTT.get_prediction(Xnew[['intercept', 'year', 'year_squared']]).summary_frame()

For Trinidad and Tobago the extrapolation is highly uncertain; here we show the predicted 2049 area as a percentage of the initial area:

prdTT.iloc[49][["mean","mean_ci_lower","mean_ci_upper"]].values * 100 / resultsTT.iloc[0][['tc_area']].values
array([ 92.5565775 ,  38.95877469, 219.89192697])
resultsTT.iloc[0][['tc_area']].values
array([159.61896455])
plt.plot(resultsTT[['year']], resultsTT[['tc_area']], 'o')
plt.plot(Xnew[['year']], prdTT["mean"], '--', label='fit')
plt.ylabel("Area estimated from tree cover %")
plt.xlabel("Year")
plt.fill_between(
    Xnew['year'], prdTT["mean_ci_lower"], prdTT["mean_ci_upper"], color="k", alpha=0.1,
    label='plausible range'
);
plt.legend()
plt.show()