CAFRI Labs: 1.1.0: Overstory Only

Mike Mahoney

Evaluation Results

Last iteration of these models: 2021-05-12

Change Summary

Removing understory plots from AGB values

Metric	RF (ranger)	GBM (LightGBM)	SVM (kernlab)	Ensemble (model weighted)	Ensemble (RMSE weighted)
RMSE	38.031	36.942	36.972	36.617	36.624
MBE	2.594	2.684	-1.681	2.087	1.196
R2	0.765	0.779	0.780	0.782	0.783
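The report does not show how the three metrics in this table were computed. The sketch below uses common definitions: `rmse`, `mbe`, and `r2` are hypothetical helper names, the sign convention for MBE (prediction minus observation) is an assumption, and the data are made up.

```r
# Hypothetical helpers; the report's own metric code is not shown.
rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))

# Mean bias error. Assumed sign convention: positive means over-prediction.
mbe <- function(obs, pred) mean(pred - obs)

# R2 as 1 - SSE/SST. Squared correlation is another common definition.
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

obs  <- c(0, 50, 100, 150)    # made-up observed AGB values
pred <- c(10, 40, 110, 160)   # made-up predictions

metrics <- c(RMSE = rmse(obs, pred), MBE = mbe(obs, pred), R2 = r2(obs, pred))
print(round(metrics, 3))
```

For these made-up values every error has magnitude 10, so the RMSE is 10, the MBE is 5, and R2 is 0.968.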

AGB Distribution

summary(bind_rows(training, testing)$agb_mgha)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   9.107  85.393  91.573 149.091 425.000

Bootstrapping Results

Across 1000 bootstrap iterations, our ensemble model had a mean RMSE of 36.993 \(\pm\) 0.337.
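The report does not say what was resampled or whether the \(\pm\) value is a standard deviation. As a rough sketch of one way to get such a summary, the code below resamples synthetic observation/prediction pairs with replacement and reports the mean and standard deviation of the resulting RMSEs. The data and the `rmse` helper are assumptions made for illustration.

```r
set.seed(1)

n <- 350                          # same size as the testing set in this report
obs  <- runif(n, 0, 400)          # synthetic observations
pred <- obs + rnorm(n, sd = 35)   # synthetic predictions

# Hypothetical helper, not taken from the report.
rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))

boot_rmse <- replicate(1000, {
  idx <- sample.int(n, replace = TRUE)   # resample plots with replacement
  rmse(obs[idx], pred[idx])
})

boot_summary <- c(mean = mean(boot_rmse), sd = sd(boot_rmse))
print(round(boot_summary, 3))
```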

RMSE Distribution

Plot Errors

Validation Results

RMSE	Min	Median	Max
rf	33.746	39.697	43.525
lgb	32.495	39.252	43.268
svm	34.456	40.661	44.947
ensemble	32.649	39.258	43.391

R2	Min	Median	Max
rf	0.690	0.747	0.791
lgb	0.697	0.751	0.804
svm	0.681	0.740	0.782
ensemble	0.699	0.755	0.798

Metadata

Ensembles

RMSE-weighted model weights:

      lgb        rf       svm 
0.3363945 0.3294700 0.3341355
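To illustrate how weights like these could be used, the sketch below combines made-up component predictions as a weighted average. The report does not state how the weights were derived. `inverse_rmse_weights` is a hypothetical helper showing one common scheme (normalised inverse RMSE), not necessarily the one used here.

```r
# Weights copied from the RMSE-weighted output above.
w <- c(lgb = 0.3363945, rf = 0.3294700, svm = 0.3341355)

# Made-up component predictions for three plots.
preds <- cbind(
  lgb = c(90, 120, 15),
  rf  = c(95, 110, 20),
  svm = c(88, 125, 10)
)

# Weighted average per plot. Dividing by sum(w) guards against rounding drift.
ensemble_pred <- as.vector(preds[, names(w)] %*% w) / sum(w)
print(round(ensemble_pred, 2))

# Hypothetical scheme: weights proportional to 1 / RMSE, normalised to sum to 1.
inverse_rmse_weights <- function(rmse) (1 / rmse) / sum(1 / rmse)
```

Because the weights are nearly equal, the ensemble prediction stays close to the plain mean of the three component predictions.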

Linear model weights:


Call:
lm(formula = agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

Residuals:
     Min       1Q   Median       3Q      Max 
-132.679  -21.086   -0.691   13.965  199.124 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -2.761e+00  5.508e-01  -5.013 5.38e-07 ***
rf_pred                   -1.188e-02  7.727e-02  -0.154 0.877824    
lgb_pred                   1.002e+00  8.608e-02  11.640  < 2e-16 ***
svm_pred                   1.741e-01  5.272e-02   3.301 0.000964 ***
rf_pred:lgb_pred          -1.609e-03  3.834e-04  -4.197 2.71e-05 ***
rf_pred:svm_pred           2.474e-03  6.317e-04   3.917 8.98e-05 ***
lgb_pred:svm_pred         -2.820e-03  6.391e-04  -4.412 1.03e-05 ***
rf_pred:lgb_pred:svm_pred  6.795e-06  1.085e-06   6.260 3.93e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 38.95 on 23492 degrees of freedom
Multiple R-squared:  0.7527,    Adjusted R-squared:  0.7526 
F-statistic: 1.021e+04 on 7 and 23492 DF,  p-value: < 2.2e-16
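To make the interaction terms concrete, the sketch below evaluates the fitted surface by hand using the coefficients printed above, then refits the same formula on synthetic data. `stack_predict` is a hypothetical helper, and the synthetic `pred_values` data frame only mimics the column names in the `lm()` call.

```r
# Coefficients copied from the summary above, in the order lm() printed them.
b <- c(
  intercept = -2.761e+00, rf = -1.188e-02, lgb = 1.002e+00, svm = 1.741e-01,
  rf_lgb = -1.609e-03, rf_svm = 2.474e-03, lgb_svm = -2.820e-03,
  rf_lgb_svm = 6.795e-06
)

# Hypothetical helper: main effects plus all two- and three-way interactions.
stack_predict <- function(rf, lgb, svm) {
  unname(
    b["intercept"] + b["rf"] * rf + b["lgb"] * lgb + b["svm"] * svm +
      b["rf_lgb"] * rf * lgb + b["rf_svm"] * rf * svm +
      b["lgb_svm"] * lgb * svm + b["rf_lgb_svm"] * rf * lgb * svm
  )
}

by_hand <- stack_predict(rf = 100, lgb = 100, svm = 100)
print(by_hand)

# Same formula fitted to synthetic data, to show the workflow end to end.
set.seed(42)
n <- 500
agb <- runif(n, 0, 400)
pred_values <- data.frame(
  agb_mgha = agb,
  rf_pred  = agb + rnorm(n, sd = 38),
  lgb_pred = agb + rnorm(n, sd = 37),
  svm_pred = agb + rnorm(n, sd = 37)
)
stack_fit <- lm(agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)
new_plot <- data.frame(rf_pred = 100, lgb_pred = 95, svm_pred = 105)
stack_pred <- predict(stack_fit, newdata = new_plot)
```

With the reported coefficients, three component predictions of 100 give a stacked prediction of about 100.9.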

Coverages

17 coverages:
- FEMA_FranklinStLawrence2016, FEMA_FultonSaratogaHerkimerFranklin2017, FEMA_GreatLakes2014, FEMA_OniedaSubbasin2016, NYSGPO_AlleganySteuben2016, NYSGPO_CayugaOswego_2018, NYSGPO_ColumbiaRensselaer2016, NYSGPO_ErieGeneseeLivingston2019, NYSGPO_MadisonOtsego_2015, NYSGPO_Southwest_spring_2017, NYSGPO_SouthwestB_fall_2017, NYSGPO_WarrenWashingtonEssex_2015, USGS_3County2014, USGS_ClintonEssexFranklin2014, USGS_LongIsland2014, USGS_NorthEast2011, USGS_Schoharie2014

\(n\) and \(p\)

1136 observations
- 786 training
- 350 testing
99 predictors
- n, zmean, zmean_c, max, quad_mean, quad_mean_c, cv, cv_c, z_kurt, z_skew, L2, L3, L4, L_cv, L_skew, L_kurt, h10, h20, h30, h40, h50, h60, h70, h80, h90, h95, h99, hvol, cancov, rpc1, d10, d20, d30, d40, d50, d60, d70, d80, d90, precip, tmin, tmax, twi, slope, aspect, elev, tax_code_105, tax_code_111, tax_code_112, tax_code_113, tax_code_117, tax_code_120, tax_code_190, tax_code_210, tax_code_230, tax_code_240, tax_code_241, tax_code_250, tax_code_260, tax_code_270, tax_code_280, tax_code_311, tax_code_312, tax_code_314, tax_code_320, tax_code_321, tax_code_322, tax_code_416, tax_code_418, tax_code_551, tax_code_552, tax_code_583, tax_code_613, tax_code_670, tax_code_681, tax_code_693, tax_code_710, tax_code_720, tax_code_831, tax_code_852, tax_code_910, tax_code_911, tax_code_912, tax_code_920, tax_code_930, tax_code_931, tax_code_932, tax_code_941, tax_code_961, tax_code_971, tax_code_1000, tax_category_700, tax_category_900, tax_code_108, tax_code_109, tax_code_110, tax_code_114, tax_code_115, tax_code_123

Component Models

Tuning used 5-fold CV
Final hyperparameters:

Random forest:

$num.trees
[1] 1000

$mtry
[1] 30

$min.node.size
[1] 3

$sample.fraction
[1] 0.25

$splitrule
[1] "variance"

$replace
[1] TRUE

$formula
agb_mgha ~ .

LGB:

$learning_rate
[1] 0.05

$nrounds
[1] 100

$num_leaves
[1] 17

$max_depth
[1] -1

$extra_trees
[1] TRUE

$min_data_in_leaf
[1] 10

$bagging_fraction
[1] 0.6

$bagging_freq
[1] 1

$feature_fraction
[1] 0.6

$min_data_in_bin
[1] 3

$lambda_l1
[1] 2

$lambda_l2
[1] 0.5

$force_col_wise
[1] TRUE

SVM:

$x
agb_mgha ~ .

$kernel
[1] "laplacedot"

$type
[1] "eps-svr"

$kpar
$kpar$sigma
[1] 0.00390625


$C
[1] 8

$epsilon
[1] 0.0001220703
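The report states only that tuning used 5-fold CV. As a minimal base-R sketch of that idea, the code below scores a small grid of candidate settings by their mean held-out RMSE. The data are synthetic, and a polynomial `lm()` stands in for the real component models, so the grid of degrees is purely illustrative.

```r
set.seed(7)

n <- 786                                 # same size as the training set above
dat <- data.frame(x = runif(n, 0, 30))   # synthetic predictor
dat$agb_mgha <- 5 * dat$x + 0.2 * dat$x^2 + rnorm(n, sd = 30)

k <- 5
fold <- sample(rep(seq_len(k), length.out = n))   # random, near-equal folds

grid <- 1:4   # candidate polynomial degrees (stand-in hyperparameter)

cv_rmse <- sapply(grid, function(degree) {
  fold_rmse <- sapply(seq_len(k), function(i) {
    train <- dat[fold != i, ]
    held  <- dat[fold == i, ]
    fit   <- lm(agb_mgha ~ poly(x, degree), data = train)
    pred  <- predict(fit, newdata = held)
    sqrt(mean((pred - held$agb_mgha)^2))   # RMSE on the held-out fold
  })
  mean(fold_rmse)                          # average across the 5 folds
})

best_degree <- grid[which.min(cv_rmse)]
print(data.frame(degree = grid, cv_rmse = round(cv_rmse, 3)))
```

The same loop structure applies to any model with tunable settings: swap the `lm()` call for the model of interest and the grid for its candidate hyperparameter values.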

