CAFRI Labs: 1.3.0: Seeing the (Non)Forest (Out, because it's not measured)

Mike Mahoney

Evaluation Results

Last iteration of these models: 2021-08-04

Change Summary

Removed all plots that include any non-forest condition: trees on those plots are not measured for diameter, yet the plots are still assigned an AGB of 0.

Metric	RF (ranger)	GBM (LightGBM)	SVM (kernlab)	Ensemble (model weighted)	Ensemble (RMSE weighted)
RMSE	42.492	46.377	43.049	42.536	42.316
MBE	-0.947	-0.722	-3.544	-0.394	-1.787
R2	0.587	0.534	0.579	0.587	0.591
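
For reference, the three metrics in the table can be computed along the following lines. This is a minimal sketch, not the code behind the report: the function names are hypothetical, the sign convention for MBE (predicted minus observed) and the form of R2 (one minus the ratio of residual to total sum of squares) are assumptions, and the toy vectors are made up purely for illustration.

```r
# Hypothetical helpers; the report does not show how its metrics were computed.
rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))

# Assumed sign convention: positive values mean over-prediction on average.
mbe <- function(obs, pred) mean(pred - obs)

# Assumed definition: 1 - SSE / SST (the squared correlation is another common choice).
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Made-up AGB values (Mg/ha, as the column name agb_mgha suggests), for illustration only.
obs  <- c(100, 150, 200, 120)
pred <- c(110, 140, 190, 130)

metrics <- c(RMSE = rmse(obs, pred), MBE = mbe(obs, pred), R2 = r2(obs, pred))
print(round(metrics, 3))
```

If the report used a different convention (for example observed minus predicted for MBE), only the sign of that row would change.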

AGB Distribution

summary(bind_rows(training, testing)$agb_mgha)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0    86.5   134.9   134.6   173.5   425.0

Bootstrapping Results

Across 1000 bootstrap iterations, our ensemble model had a mean RMSE of 42.571 \(\pm\) 0.555.
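
A bootstrap summary of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the report: it assumes that held-out plots are resampled with replacement and that the figure after \(\pm\) is the standard deviation of the bootstrap RMSEs, neither of which the report confirms. The data are simulated and all object names are hypothetical.

```r
set.seed(123)

# Simulated observed and predicted AGB for 205 hypothetical held-out plots.
n    <- 205
obs  <- runif(n, min = 0, max = 425)
pred <- obs + rnorm(n, mean = 0, sd = 40)

rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))

# Resample plot indices with replacement and recompute RMSE each time.
n_boot <- 1000
boot_rmse <- vapply(seq_len(n_boot), function(i) {
  idx <- sample.int(n, size = n, replace = TRUE)
  rmse(obs[idx], pred[idx])
}, numeric(1))

# Summarise the bootstrap distribution (assumed: mean and standard deviation).
cat(sprintf("mean RMSE %.3f, sd %.3f\n", mean(boot_rmse), sd(boot_rmse)))
```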

RMSE Distribution

Plot Errors

Validation Results

RMSE	Min	Median	Max
rf	36.293	43.174	49.579
lgb	38.917	48.916	56.496
svm	37.342	43.758	66.253
ensemble	36.974	43.749	50.376

R2	Min	Median	Max
rf	0.456	0.584	0.699
lgb	0.372	0.489	0.625
svm	0.122	0.571	0.700
ensemble	0.449	0.573	0.698

Metadata

Ensembles

RMSE-weighted model weights:

      lgb        rf       svm 
0.3090064 0.3408116 0.3501821
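
As an illustration of how weights like these could be applied, the sketch below forms a weighted average of three component predictions using the reported values. That the ensemble is a simple weighted sum is an assumption, as is the inverse-RMSE helper, which shows one common way to derive such weights; the report does not state its formula, and applying the helper to the test-set RMSEs from the results table does not reproduce the weights above exactly. The prediction values are made up.

```r
# Weights as reported above (they sum to 1 up to rounding).
w <- c(lgb = 0.3090064, rf = 0.3408116, svm = 0.3501821)

# Made-up component predictions (Mg/ha) for a single plot.
preds <- c(lgb = 120, rf = 135, svm = 128)

# Assumed combination rule: weighted sum, matching by name so order does not matter.
ensemble_pred <- sum(w[names(preds)] * preds)
print(ensemble_pred)

# Hypothetical helper: inverse-RMSE weights normalised to sum to 1.
# One common scheme, not necessarily the one used for this report.
inverse_rmse_weights <- function(rmse) (1 / rmse) / sum(1 / rmse)
print(inverse_rmse_weights(c(lgb = 46.377, rf = 42.492, svm = 43.049)))
```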

Linear model weights:


Call:
lm(formula = agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

Residuals:
     Min       1Q   Median       3Q      Max 
-138.983  -27.699   -3.649   23.850  198.591 

Coefficients:
                              Estimate   Std. Error t value Pr(>|t|)    
(Intercept)               -30.29818165   3.64841560  -8.304  < 2e-16 ***
rf_pred                     1.10023916   0.08461760  13.002  < 2e-16 ***
lgb_pred                    0.39840667   0.06685331   5.959 2.60e-09 ***
svm_pred                    0.25963604   0.07484690   3.469 0.000524 ***
rf_pred:lgb_pred           -0.00405108   0.00050474  -8.026 1.09e-15 ***
rf_pred:svm_pred           -0.00043423   0.00047167  -0.921 0.357266    
lgb_pred:svm_pred          -0.00119990   0.00060327  -1.989 0.046722 *  
rf_pred:lgb_pred:svm_pred   0.00001379   0.00000199   6.929 4.42e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 42.99 on 13192 degrees of freedom
Multiple R-squared:  0.5866,    Adjusted R-squared:  0.5863 
F-statistic:  2674 on 7 and 13192 DF,  p-value: < 2.2e-16
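
To make the fitted formula concrete, the sketch below evaluates it by hand for one set of component predictions, using the coefficients printed above. Because the formula rf_pred * lgb_pred * svm_pred expands to all main effects plus the two-way and three-way interactions, the ensemble prediction is not a simple weighted average of the components. The function name and the input values are hypothetical; calling predict() on the fitted lm object would normally do the same job.

```r
# Coefficients copied from the summary above.
b <- c(
  intercept  = -30.29818165,
  rf         =   1.10023916,
  lgb        =   0.39840667,
  svm        =   0.25963604,
  rf_lgb     =  -0.00405108,
  rf_svm     =  -0.00043423,
  lgb_svm    =  -0.00119990,
  rf_lgb_svm =   0.00001379
)

# Hypothetical helper: evaluate the full three-way interaction model.
ensemble_lm <- function(rf, lgb, svm) {
  unname(
    b["intercept"] +
      b["rf"] * rf + b["lgb"] * lgb + b["svm"] * svm +
      b["rf_lgb"] * rf * lgb + b["rf_svm"] * rf * svm + b["lgb_svm"] * lgb * svm +
      b["rf_lgb_svm"] * rf * lgb * svm
  )
}

# Made-up inputs: all three component models predict 100 Mg/ha.
print(ensemble_lm(rf = 100, lgb = 100, svm = 100))
```

With all inputs at zero the function returns the intercept, so very low component predictions can yield a negative ensemble value.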

Coverages

16 coverages:
- FEMA_FranklinStLawrence2016, FEMA_FultonSaratogaHerkimerFranklin2017, FEMA_GreatLakes2014, FEMA_OneidaSubbasin2016, NYSGPO_AlleganySteuben2016, NYSGPO_CayugaOswego_2018, NYSGPO_ColumbiaRensselaer2016, NYSGPO_ErieGeneseeLivingston2019, NYSGPO_MadisonOtsego_2015, NYSGPO_Southwest_spring_2017, NYSGPO_SouthwestB_fall_2017, NYSGPO_WarrenWashingtonEssex_2015, USGS_3County2014, USGS_ClintonEssexFranklin2014, USGS_LongIsland2014, USGS_Schoharie2014

\(n\) and \(p\)

648 observations
- 443 training
- 205 testing
81 predictors
- zmean, zmean_c, max, quad_mean, quad_mean_c, cv, cv_c, z_kurt, z_skew, L2, L3, L4, L_cv, L_skew, L_kurt, h10, h20, h30, h40, h50, h60, h70, h80, h90, h95, h99, hvol, cancov, rpc1, d10, d20, d30, d40, d50, d60, d70, d80, d90, precip, tmin, tmax, twi, slope, aspect, elev, tax_code_105, tax_code_112, tax_code_120, tax_code_210, tax_code_240, tax_code_260, tax_code_311, tax_code_312, tax_code_321, tax_code_322, tax_code_323, tax_code_910, tax_code_911, tax_code_912, tax_code_930, tax_code_931, tax_code_932, tax_code_941, tax_code_961, tax_code_1000, tax_category_100, tax_category_200, tax_category_300, tax_category_900, tax_code_2000, lcpri4, lcpri6, lcpri3, lcsec3, lcsec1, lcsec4, lcsec5, lcpri2, lcsec2, lcsec8, lcpri1

Component Models

Hyperparameters were tuned using 5-fold cross-validation.
Final hyperparameters:

Random forest:

$num.trees
[1] 750

$mtry
[1] 28

$min.node.size
[1] 3

$sample.fraction
[1] 0.2

$splitrule
[1] "variance"

$replace
[1] TRUE

$formula
agb_mgha ~ .

LGB:

$nrounds
[1] 1000

$params
$params$learning_rate
[1] 0.1

$params$num_leaves
[1] 6

$params$max_depth
[1] -1

$params$extra_trees
[1] TRUE

$params$min_data_in_leaf
[1] 10

$params$bagging_fraction
[1] 0.3

$params$bagging_freq
[1] 1

$params$feature_fraction
[1] 0.3

$params$min_data_in_bin
[1] 8

$params$lambda_l1
[1] 14

$params$lambda_l2
[1] 0.5

$params$force_col_wise
[1] TRUE

SVM:

$x
agb_mgha ~ zmean + zmean_c + max + quad_mean + quad_mean_c + 
    cv + cv_c + z_kurt + z_skew + L2 + L3 + L4 + L_cv + L_skew + 
    L_kurt + h10 + h20 + h30 + h40 + h50 + h60 + h70 + h80 + 
    h90 + h95 + h99 + hvol + cancov + rpc1 + d10 + d20 + d30 + 
    d40 + d50 + d60 + d70 + d80 + d90 + precip + tmin + tmax + 
    twi + slope + aspect + elev + tax_code_105 + tax_code_112 + 
    tax_code_120 + tax_code_210 + tax_code_240 + tax_code_260 + 
    tax_code_311 + tax_code_312 + tax_code_321 + tax_code_322 + 
    tax_code_323 + tax_code_910 + tax_code_911 + tax_code_912 + 
    tax_code_930 + tax_code_931 + tax_code_932 + tax_code_941 + 
    tax_code_961 + tax_code_1000 + tax_category_100 + tax_category_200 + 
    tax_category_300 + tax_category_900 + tax_code_2000

$kernel
[1] "laplacedot"

$type
[1] "eps-svr"

$kpar
$kpar$sigma
[1] 0.001953125


$C
[1] 21

$epsilon
[1] 0.25
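
For orientation, the Laplacian kernel selected for the SVM has the form \(k(x, y) = \exp(-\sigma \lVert x - y \rVert)\), and the tuned sigma is exactly \(2^{-9}\). The base-R sketch below evaluates that kernel directly for two made-up feature vectors. It illustrates the kernel function only and is not a reimplementation of kernlab; the function name is hypothetical.

```r
# Tuned kernel width from the list above; note that 2^-9 == 0.001953125.
sigma <- 0.001953125

# Hypothetical helper: Laplacian kernel with a Euclidean distance.
laplace_kernel <- function(x, y, sigma) {
  exp(-sigma * sqrt(sum((x - y)^2)))
}

# Made-up feature vectors, for illustration only.
x <- c(10, 20, 30)
y <- c(13, 24, 30)

# The Euclidean distance between x and y is 5, so this equals exp(-5 * sigma).
print(laplace_kernel(x, y, sigma))
```

A sigma this small makes the kernel decay slowly with distance, so even fairly dissimilar plots retain non-trivial similarity.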
