1.0.1: Big Tune and Then Some

Adding in those missing 29 plots to the same Big Tune recipe.

Mike Mahoney
2021-05-13

Evaluation Results

Last iteration of these models: 2021-02-02

Change Summary

       RF (ranger)  GBM (LightGBM)  SVM (kernlab)  Ensemble (model weighted)  Ensemble (RMSE weighted)
RMSE        37.507          37.052         37.381                     36.861                    36.618
MBE         -1.157          -1.853         -3.273                     -1.556                    -2.059
R2           0.788           0.793          0.789                      0.793                     0.799
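
For reference, these metrics are straightforward to compute from observed and predicted values. A minimal sketch, where the helper names and the example data frame `results` are illustrative assumptions, not objects from this post:

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
# Mean bias error; predicted minus observed is one common sign
# convention, so negative values indicate underprediction.
mbe <- function(obs, pred) mean(pred - obs)
# R2 computed here as 1 - residual SS / total SS; the exact
# definition used in this post may differ.
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# For example:
# rmse(results$agb_mgha, results$rf_pred)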

AGB Distribution

summary(bind_rows(training, testing)$agb_mgha)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   9.386  86.165  92.153 149.404 425.363 

Bootstrapping Results

Across 1000 bootstrap iterations, our ensemble model had a mean RMSE of 36.838 \(\pm\) 0.331.
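
As a sketch of how a bootstrap like this can be run (the `agb_mgha` and `ensemble_pred` columns on `testing` are assumed names for illustration):

set.seed(123)
# Resample the test set with replacement 1000 times and compute
# the ensemble RMSE on each resample.
boot_rmse <- vapply(
  seq_len(1000),
  function(i) {
    idx <- sample(nrow(testing), replace = TRUE)
    sqrt(mean((testing$agb_mgha[idx] - testing$ensemble_pred[idx])^2))
  },
  numeric(1)
)
c(mean = mean(boot_rmse), sd = sd(boot_rmse))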

RMSE Distribution

Plot Errors

Validation Results

RMSE          Min  Median     Max
rf         34.156  40.022  44.767
lgb        34.409  40.412  45.238
svm        34.603  41.073  47.701
ensemble   33.750  39.950  44.774

R2            Min  Median     Max
rf          0.670   0.740   0.796
lgb         0.664   0.734   0.794
svm         0.637   0.723   0.784
ensemble    0.675   0.741   0.800
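
A sketch of how per-fold validation metrics reduce to summaries like these, assuming a long data frame `fold_metrics` with one row per model and fold (all names here are illustrative):

library(dplyr)

# Summarize each model's fold-level rmse and r2 into min/median/max:
fold_metrics %>%
  group_by(model) %>%
  summarize(
    across(c(rmse, r2), list(min = min, median = median, max = max))
  )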

Metadata

Ensembles

      lgb        rf       svm 
0.3396700 0.3457726 0.3145575 
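
These are the RMSE-weighted ensemble weights, which sum to 1. One common way to derive weights like these (shown purely as an assumption, not necessarily the exact scheme used here) is to normalize each model's inverse error score:

# Hypothetical per-model RMSE scores; weights are proportional to
# 1 / RMSE and normalized to sum to 1. This is one common weighting
# scheme, shown as an illustration only.
model_rmse <- c(lgb = 37.1, rf = 37.5, svm = 37.4)
weights <- (1 / model_rmse) / sum(1 / model_rmse)

# The ensemble prediction is then a weighted sum of the component
# model predictions:
# weights["rf"] * rf_pred + weights["lgb"] * lgb_pred + weights["svm"] * svm_pred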

Call:
lm(formula = agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

Residuals:
     Min       1Q   Median       3Q      Max 
-127.608  -22.060   -0.713   15.174  203.474 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -3.306e+00  5.860e-01  -5.642 1.70e-08 ***
rf_pred                    1.530e-01  8.414e-02   1.818  0.06909 .  
lgb_pred                   8.837e-01  8.910e-02   9.918  < 2e-16 ***
svm_pred                   1.740e-01  5.341e-02   3.257  0.00113 ** 
rf_pred:lgb_pred          -1.790e-03  3.709e-04  -4.826 1.40e-06 ***
rf_pred:svm_pred           4.294e-03  6.355e-04   6.758 1.44e-11 ***
lgb_pred:svm_pred         -5.143e-03  6.553e-04  -7.849 4.36e-15 ***
rf_pred:lgb_pred:svm_pred  9.149e-06  1.368e-06   6.688 2.31e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 39.72 on 23492 degrees of freedom
Multiple R-squared:  0.7385,    Adjusted R-squared:  0.7384 
F-statistic:  9476 on 7 and 23492 DF,  p-value: < 2.2e-16
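
The model-weighted ensemble is the lm() fit shown above: observed AGB regressed on the three component-model predictions with all interactions. Refitting and predicting looks like this (the formula and `pred_values` come from the Call above; only `new_data` is an assumed name):

ensemble_fit <- lm(
  agb_mgha ~ rf_pred * lgb_pred * svm_pred,
  data = pred_values
)

# Predicting only requires the three component prediction columns:
ensemble_pred <- predict(ensemble_fit, newdata = new_data)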

Coverages

\(n\) and \(p\)

Component Models

Random forest:

$num.trees
[1] 1000

$mtry
[1] 18

$min.node.size
[1] 7

$sample.fraction
[1] 0.2

$splitrule
[1] "variance"

$replace
[1] TRUE

$formula
agb_mgha ~ .
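
These map directly onto ranger::ranger() arguments. A minimal sketch of the fit, assuming `training` and `testing` are the modeling data frames:

library(ranger)

rf_fit <- ranger(
  agb_mgha ~ .,
  data = training,
  num.trees = 1000,
  mtry = 18,
  min.node.size = 7,
  sample.fraction = 0.2,
  splitrule = "variance",
  replace = TRUE
)

# predict() on a ranger fit returns a list; the predicted values
# live in its $predictions element:
rf_pred <- predict(rf_fit, data = testing)$predictions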

LGB:

$learning_rate
[1] 0.05

$nrounds
[1] 100

$num_leaves
[1] 5

$max_depth
[1] 2

$extra_trees
[1] TRUE

$min_data_in_leaf
[1] 10

$bagging_fraction
[1] 0.3

$bagging_freq
[1] 1

$feature_fraction
[1] 0.4

$min_data_in_bin
[1] 8

$lambda_l1
[1] 5

$lambda_l2
[1] 1

$force_col_wise
[1] TRUE
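
These correspond to a LightGBM parameter list plus the nrounds argument. A sketch with the {lightgbm} R package, assuming the predictors sit in a numeric matrix `train_x` and the response in `train_y` (the regression objective is also an assumption):

library(lightgbm)

lgb_fit <- lightgbm(
  data = train_x,
  label = train_y,
  nrounds = 100,
  params = list(
    objective = "regression",  # assumed; not listed above
    learning_rate = 0.05,
    num_leaves = 5,
    max_depth = 2,
    extra_trees = TRUE,
    min_data_in_leaf = 10,
    bagging_fraction = 0.3,
    bagging_freq = 1,
    feature_fraction = 0.4,
    min_data_in_bin = 8,
    lambda_l1 = 5,
    lambda_l2 = 1,
    force_col_wise = TRUE
  )
)

lgb_pred <- predict(lgb_fit, test_x)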

SVM:

$x
agb_mgha ~ .

$kernel
[1] "laplacedot"

$type
[1] "eps-svr"

$kpar
$kpar$sigma
[1] 0.0078125


$C
[1] 12

$epsilon
[1] 1.525879e-05
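
And these are kernlab::ksvm() arguments for an epsilon-regression SVM with a Laplacian kernel. A sketch, again assuming `training` and `testing` hold the modeling data:

library(kernlab)

svm_fit <- ksvm(
  agb_mgha ~ .,
  data = training,
  type = "eps-svr",
  kernel = "laplacedot",
  kpar = list(sigma = 0.0078125),
  C = 12,
  epsilon = 1.525879e-05
)

svm_pred <- predict(svm_fit, testing)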

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Mahoney (2021, May 13). CAFRI Labs: 1.0.1: Big Tune and Then Some. Retrieved from https://cafri-labs.github.io/acceptable-growing-stock/posts/101-even-bigger-tune/

BibTeX citation

@misc{mahoney2021bigtune,
  author = {Mahoney, Mike},
  title = {CAFRI Labs: 1.0.1: Big Tune and Then Some},
  url = {https://cafri-labs.github.io/acceptable-growing-stock/posts/101-even-bigger-tune/},
  year = {2021}
}