CAFRI Labs: GNN-AGB 0.0.1

Sam Gordon

Gradient Nearest Neighbor (GNN) is an imputation method due to Janet L. Ohmann and Matthew J. Gregory[2]. The method combines canonical correspondence analysis (CCA) with k-nearest neighbor (kNN) imputation.

CCA derives ordination axes that relate two distinct sets of variables, e.g. ‘species’ and ‘environmental’ variables. In GNN, the two sets are typically ‘geospatial data’ (e.g., topography or LANDSAT imagery) and ‘field data’ (e.g., species abundance).

Once the CCA has been fit, kNN imputation is performed in the CCA (ordination) space rather than in the original predictor variable space.

The original application of GNN imputation was mapping species composition in coastal Oregon[2]. More recently, John J. Battles et al.[1] used GNN imputation to estimate aboveground biomass (AGB) in California at the regional level. Here, we attempt to replicate their implementation in New York State using similar predictor data.

For GNN-AGB 0.0.1, we used canonical correlation analysis to generate the ordination axes, rather than the canonical correspondence analysis historically used for GNN. From here on, ‘CCA’ refers to canonical correlation analysis.
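For readers who want to see what the ordination axes are, here is a minimal NumPy sketch of canonical correlation analysis. It assumes `X` holds the geospatial predictors (the only side available at unsampled pixels) and `Y` the species matrix, one row per plot. The names `canonical_correlation` and `ordination_scores` are hypothetical. The sketch whitens each block and takes an SVD of the whitened cross-covariance. Its canonical correlations should agree with those from `CCA::cc()`, while the coefficients may differ in sign and scaling.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite covariance matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def canonical_correlation(X, Y):
    """Canonical correlation analysis via SVD of the whitened cross-covariance.

    Returns (rho, A, B, x_mean): canonical correlations in descending order,
    X-side and Y-side coefficient matrices, and the X column means needed
    to project new plots or pixels into the ordination space.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, rho, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    A = Wx @ U      # columns: coefficients for the geospatial (X) side
    B = Wy @ Vt.T   # columns: coefficients for the field (Y) side
    return rho, A, B, x_mean


def ordination_scores(X, A, x_mean, n_axes):
    """Project plots (or pixels) onto the first n_axes canonical axes."""
    return (np.asarray(X, dtype=float) - x_mean) @ A[:, :n_axes]


# Synthetic example: one shared gradient z drives a column in each block.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
rho, A, B, x_mean = canonical_correlation(X, Y)
scores = ordination_scores(X, A, x_mean, n_axes=2)
```

Only the X-side coefficients are needed for mapping, because a pixel has geospatial values but no field data.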

Methods

Geospatial Data

Geospatial data were obtained at the locations of 1,977 FIA plots in New York State. The geospatial predictors used by Battles et al. (2018) and by GNN-AGB 0.0.1 are listed below for comparison.

	Battles et al. (2018)	GNN-AGB 0.0.1
Topography	ASPTR: Cosine transformation of aspect	ASPTR
	DEM: Elevation from a digital elevation map (m)	DEM
	PRR: Potential relative radiation (unitless)
	SLPPCT: Slope (%)	SLOPE
	TPI450: Topographic position index	TWI: Topographic wetness index
Climate	ANNPRE: Mean annual precipitation (ln[mm])	PRECIP: Precipitation, 30-year normal (in)
	ANNTMP: Mean annual temperature (\(^\circ\)C)
	AUGMAXT: Mean maximum temperature of August (\(^\circ\)C)	TMAX: Mean maximum annual temperature (\(^\circ\)C)
	DECMINT: Mean minimum temperature of December (\(^\circ\)C)	TMIN: Mean minimum annual temperature (\(^\circ\)C)
	SMRTP: Ratio of mean temperature (\(^\circ\)C) to precipitation (ln[mm]) of May-Sept.
LANDSAT	TC1: Brightness (i.e., axis 1 of the tasseled cap transformation)	TCB
	TC2: Greenness (i.e., axis 2 of the tasseled cap transformation)	TCG
	TC3: Wetness (i.e., axis 3 of the tasseled cap transformation)	TCW
	NBR: Normalized burn ratio (unitless)	NBR
Change	\(\Delta\) TC1: Mean change in TC1 during previous 6 years	\(\Delta\) TCB: Mean change in TCB during previous year
	\(\Delta\) TC2: Mean change in TC2 during previous 6 years	\(\Delta\) TCG: Mean change in TCG during previous year
	\(\Delta\) TC3: Mean change in TC3 during previous 6 years	\(\Delta\) TCW: Mean change in TCW during previous year
	\(\Delta\) NBR: Mean change in NBR during previous 6 years	\(\Delta\) NBR: Mean change in NBR during previous year
Geology/Soils	ROCKDEPTH: Rock depth (cm)
	BD_30: Bulk density of soils, 0-30 cm (g cm\(^{-3}\))
	PERM_30: Permeability of soils, 0-30 cm (m\(^2\))
	PH_30: Mean pH of soils, 0-30 cm
	RVOL_30: Rock volume of soils, 0-30 cm (cm\(^3\))
Location	COASTPROX: Distance to the Pacific Ocean (km)
	LAT: Latitude (\(^\circ\))
	LON: Longitude (\(^\circ\))
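Two of the LANDSAT-derived predictors can be written out directly. NBR is the normalized difference of near-infrared and second shortwave-infrared reflectance. The change predictors are less fully specified in the table. The sketch below assumes ‘mean change’ means the average signed year-over-year difference across the window: six years for Battles et al. and one year for GNN-AGB 0.0.1. The function names are hypothetical.

```python
def nbr(nir, swir2):
    """Normalized burn ratio from near-infrared and second shortwave-infrared reflectance."""
    return (nir - swir2) / (nir + swir2)


def mean_change(series, years):
    """Mean signed year-over-year change over the last `years` steps.

    `series` is an annual sequence (oldest first) of one index, e.g. TCB or NBR.
    ASSUMPTION: 'mean change' = average of consecutive signed differences.
    """
    if years < 1 or len(series) < years + 1:
        raise ValueError("need at least years + 1 annual values")
    window = series[-(years + 1):]
    diffs = [b - a for a, b in zip(window, window[1:])]
    return sum(diffs) / years


print(nbr(0.4, 0.1), mean_change([1.0, 2.0, 4.0], 2), mean_change([1.0, 2.0, 4.0], 1))
```

Under this reading the signed differences telescope, so a six-year mean change equals (last - first) / 6. If the original predictors averaged absolute differences instead, the two definitions would diverge.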

Field Data

In both Battles et al. (2018) and GNN-AGB 0.0.1, tree species matrices were obtained from FIA data. These matrices constitute the ‘field data’ category of predictors.
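The report does not state which abundance measure fills the species matrix (basal area, stem density, and so on). Whatever the measure, the matrix is a pivot of tree-level records into one row per plot and one column per species. The sketch below is minimal and assumes the records arrive as hypothetical `(plot_id, species, value)` tuples.

```python
from collections import defaultdict


def species_matrix(records):
    """Pivot (plot_id, species, value) records into a dense plot x species matrix.

    Values for the same plot and species are summed; absent species get 0.0.
    Returns (plot_ids, species_ids, rows) with rows aligned to plot_ids.
    """
    totals = defaultdict(float)
    plots, species = set(), set()
    for plot_id, sp, value in records:
        totals[(plot_id, sp)] += value
        plots.add(plot_id)
        species.add(sp)
    plot_ids, species_ids = sorted(plots), sorted(species)
    rows = [[totals.get((p, s), 0.0) for s in species_ids] for p in plot_ids]
    return plot_ids, species_ids, rows


records = [("p1", "acer", 2.0), ("p1", "acer", 1.5), ("p1", "quercus", 3.0), ("p2", "quercus", 4.0)]
print(species_matrix(records))
```

Plots with no tallied trees never appear in `records`, so in practice they would need to be added as all-zero rows. Very rare species can also make the species covariance matrix nearly singular, which canonical correlation analysis cannot invert.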

Canonical Correlation Analysis

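Canonical correlation analysis was conducted in R with the CCA::cc() function. Of the sixteen axes generated by CCA::cc(), we retained the first ten based on the significance of the Wilks’ lambda test statistic at \(\alpha = 0.05\). Each row of the table below tests the null hypothesis that the canonical correlations of that axis and all subsequent axes are zero.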

CCA axis	Wilks’ lambda	p-value
1	0.0340950	0.0000000
2	0.0969175	0.0000000
3	0.2485243	0.0000000
4	0.3791973	0.0000000
5	0.4544100	0.0000000
6	0.5435356	0.0000000
7	0.6109630	0.0000000
8	0.6802785	0.0000000
9	0.7413935	0.0000019
10	0.7928830	0.0008046
11	0.8424386	0.0534617
12	0.8909325	0.5819887
13	0.9266623	0.9189674
14	0.9530076	0.9788676
15	0.9710587	0.9692588
16	0.9861286	0.9033297
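The retention rule can be reproduced from the canonical correlations alone. Each Wilks’ lambda in the table is \(\Lambda_k = \prod_{i \ge k} (1 - \rho_i^2)\). The sketch below tests it with Bartlett's chi-square approximation, \(\chi^2 = -\left(n - 1 - \tfrac{p+q+1}{2}\right)\ln \Lambda_k\) with \((p-k+1)(q-k+1)\) degrees of freedom, where \(n\) is the number of plots and \(p\), \(q\) are the numbers of variables in the two sets. The report does not say which approximation produced its p-values (Rao's F approximation is another common choice), so values may differ slightly. The function names are hypothetical. To keep the block dependency-free, the chi-square tail is computed in pure Python.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with df degrees of freedom."""
    a, x = df / 2.0, x / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the regularized lower incomplete gamma P(a, x); return 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10000):
            denom += 1.0
            term *= x / denom
            total += term
            if term < total * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz) for the regularized upper gamma Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def wilks_lambdas(rho):
    """Wilks' lambda for axes k..s: product of (1 - rho_i^2) over i >= k."""
    out, running = [], 1.0
    for r in reversed(rho):
        running *= 1.0 - r * r
        out.append(running)
    return out[::-1]


def wilks_tests(rho, n, p, q):
    """(k, lambda_k, p-value) for H0: canonical correlations k..s are all zero."""
    scale = n - 1 - (p + q + 1) / 2.0
    results = []
    for k, lam in enumerate(wilks_lambdas(rho), start=1):
        stat = -scale * math.log(lam)
        df = (p - k + 1) * (q - k + 1)
        results.append((k, lam, chi2_sf(stat, df)))
    return results


def n_axes_to_retain(rho, n, p, q, alpha=0.05):
    """Keep axes 1..m, stopping at the first non-significant sequential test."""
    m = 0
    for k, _, p_value in wilks_tests(rho, n, p, q):
        if p_value >= alpha:
            break
        m = k
    return m
```

For example, `n_axes_to_retain([0.9, 0.6, 0.05], n=500, p=3, q=3)` keeps two axes, because the test for the third axis alone is not significant.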

GNN

We used a 30% holdout set to evaluate our GNN model. For each FIA plot in the holdout set, AGB was estimated by distance-weighted kNN imputation in the 10-dimensional ordination space derived from the CCA, drawing neighbors from the remaining 70% of plots. We then evaluated the model by comparing the FIA field-based AGB estimates for the holdout plots with the GNN-imputed values. We repeated this process for each value of k from 1 to 100.
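Here is a minimal pure-Python sketch of the holdout procedure just described. The report does not specify the distance weighting or how the holdout plots were chosen. This sketch therefore assumes inverse-distance weights in Euclidean ordination space and a seeded random 70/30 split. `impute_knn` and `rmse_by_k` are hypothetical names. `scores` would hold each plot's ten ordination scores, and `agb` its FIA-based AGB.

```python
import math
import random


def impute_knn(query, ref_scores, ref_agb, k, eps=1e-12):
    """Inverse-distance-weighted mean AGB of the k nearest reference plots."""
    dists = [math.dist(query, ref) for ref in ref_scores]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # eps keeps the weight finite when a reference plot coincides with the query.
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    return sum(w * ref_agb[i] for w, i in zip(weights, nearest)) / sum(weights)


def rmse_by_k(scores, agb, k_values, test_frac=0.3, seed=0):
    """Hold out test_frac of plots, impute their AGB from the rest, return RMSE per k."""
    idx = list(range(len(agb)))
    random.Random(seed).shuffle(idx)
    n_test = round(test_frac * len(idx))
    test, train = idx[:n_test], idx[n_test:]
    ref_scores = [scores[i] for i in train]
    ref_agb = [agb[i] for i in train]
    result = {}
    for k in k_values:
        sq_err = [(impute_knn(scores[i], ref_scores, ref_agb, k) - agb[i]) ** 2 for i in test]
        result[k] = math.sqrt(sum(sq_err) / len(sq_err))
    return result
```

Calling `rmse_by_k(scores, agb, range(1, 101))` mirrors the sweep over k from 1 to 100. Because the split is fixed by `seed`, every k is scored on the same holdout plots.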

Results

GNN-AGB predictions vs. FIA AGB estimates (Mg ha\(^{-1}\))

Test Set Accuracy

NRMSE and R\(^2\) for all values of k from 1 to 100

NRMSE and R\(^2\) for Battles et al. (2018) and GNN-AGB 0.0.1

	Battles: k = 1	Battles: k = 10	GNN-AGB: k = 1	GNN-AGB: k = 10	GNN-AGB: k = 30	GNN-AGB: k = 60
NRMSE	0.765	0.693	0.579	0.425	0.421	0.425
R\(^2\)	0.461	0.557	0.100	0.249	0.277	0.292
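The report does not define NRMSE or say which R\(^2\) is reported. The sketch below uses two common conventions, both assumptions here. NRMSE is taken as RMSE divided by the mean of the FIA-based AGB values. R\(^2\) is taken as the coefficient of determination, \(1 - SS_{res}/SS_{tot}\). Other choices give different numbers, such as normalizing by the range or standard deviation, or squaring the Pearson correlation.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the mean observed value (one common normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


obs = [2.0, 4.0, 6.0]
pred = [3.0, 4.0, 5.0]
print(round(nrmse(obs, pred), 4), r_squared(obs, pred))  # prints 0.2041 0.75
```

Under these conventions, the two metrics normalize the same squared error differently: NRMSE uses the squared mean of observed AGB, and R\(^2\) uses its variance. As a result, they can rank models differently when the evaluation datasets differ in spread.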

References

[1] Battles, J. et al. (2018). Innovations in measuring and managing forest carbon stocks in California. A Report for: California’s Fourth Climate Change Assessment, 99.

[2] Ohmann, J. L., & Gregory, M. J. (2002). Predictive mapping of forest composition and structure with direct gradient analysis and nearest-neighbor imputation in coastal Oregon, USA. Canadian Journal of Forest Research, 32(4), 725-741.
