Evaluation Results
Last iteration of these models: 2020-12-29
Change Summary
- Dropped intensity predictors
- The RMSE-weighted ensemble is now correctly weighted by RMSE
  - Prior versions accidentally weighted by MSE
  - The impact on RMSE and R2 is negligible, but using RMSE throughout is clearer
|      | RF (ranger) | GBM (LightGBM) | SVM (kernlab) | Ensemble (model weighted) | Ensemble (RMSE weighted) |
|------|-------------|----------------|---------------|---------------------------|--------------------------|
| RMSE | 38.088      | 34.715         | 36.623        | 35.530                    | 35.880                   |
| MBE  | -1.991      | -1.160         | -3.050        | -1.170                    | -2.064                   |
| R2   | 0.732       | 0.775          | 0.753         | 0.765                     | 0.762                    |
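For reference, the three metrics above could be computed as in the sketch below. The MBE sign convention and the exact R2 definition are assumptions, since the report does not state them.

```r
# Minimal metric sketch. `obs` and `pred` are observed and predicted AGB (agb_mgha).
# MBE sign convention (pred - obs) and the R2 definition (1 - SSE/SST) are assumptions.
rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))
mbe  <- function(obs, pred) mean(pred - obs)
r2   <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Example usage with a fitted model `fit` and a held-out data frame `test`:
# rmse(test$agb_mgha, predict(fit, test))
```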
Bootstrapping Results
Across 1000 bootstrap iterations, our ensemble model had a mean RMSE of 36.6 \(\pm\) 0.422.
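The bootstrap code itself is not shown here; a minimal sketch, assuming held-out observation/prediction pairs are resampled rather than the models being refit each iteration:

```r
# Sketch: bootstrap distribution of ensemble RMSE by resampling (obs, pred) pairs.
# `pred_values` follows the lm() call below; `ens_pred` is a hypothetical column
# of ensemble predictions. The report may instead refit models each iteration.
set.seed(20201229)
boot_rmse <- replicate(1000, {
  idx <- sample(nrow(pred_values), replace = TRUE)
  sqrt(mean((pred_values$agb_mgha[idx] - pred_values$ens_pred[idx])^2))
})
c(mean = mean(boot_rmse), sd = sd(boot_rmse))
```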
[Figures: RMSE distribution; plot errors]
Validation Results
| RMSE     | Min    | Median | Max    |
|----------|--------|--------|--------|
| RF       | 34.382 | 39.488 | 47.609 |
| LGB      | 30.983 | 38.601 | 46.882 |
| SVM      | 32.896 | 38.892 | 45.863 |
| Ensemble | 31.947 | 38.436 | 46.212 |

| R2       | Min   | Median | Max   |
|----------|-------|--------|-------|
| RF       | 0.587 | 0.699  | 0.780 |
| LGB      | 0.593 | 0.710  | 0.806 |
| SVM      | 0.609 | 0.705  | 0.779 |
| Ensemble | 0.610 | 0.715  | 0.796 |
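Summaries like these could be produced from a per-iteration results table along the following lines; the data frame `val_metrics` and its columns are hypothetical names, not from the report.

```r
# Sketch: min/median/max of validation RMSE and R2 per model.
# `val_metrics` (columns: model, rmse, r2) is a hypothetical per-iteration results frame.
aggregate(cbind(rmse, r2) ~ model, data = val_metrics,
          FUN = function(x) c(min = min(x), median = median(x), max = max(x)))
```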
Ensembles
- RMSE-weighted model weights:
  - lgb: 0.3422873
  - rf: 0.3205030
  - svm: 0.3372097
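The weighting code is not shown here; a minimal sketch, assuming the weights are proportional to 1/RMSE and normalized to sum to one (which RMSE set was used, e.g. cross-validation versus test, is not stated):

```r
# Sketch: inverse-RMSE ensemble weights, w_i = (1 / RMSE_i) / sum_j(1 / RMSE_j).
# `component_rmse` is a hypothetical named vector of per-model RMSEs.
component_rmse <- c(lgb = 34.7, rf = 38.1, svm = 36.6)
w <- (1 / component_rmse) / sum(1 / component_rmse)

# RMSE-weighted ensemble prediction from the component predictions in `pred_values`:
ens_pred <- w["rf"]  * pred_values$rf_pred +
            w["lgb"] * pred_values$lgb_pred +
            w["svm"] * pred_values$svm_pred
```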
Call:
lm(formula = agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)
Residuals:
Min 1Q Median 3Q Max
-123.787 -21.382 -1.536 18.213 187.790
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.148e+00 1.034e+00 -2.077 0.037780 *
rf_pred 7.713e-03 7.884e-02 0.098 0.922070
lgb_pred 7.954e-01 9.116e-02 8.725 < 2e-16 ***
svm_pred 2.356e-01 8.937e-02 2.636 0.008388 **
rf_pred:lgb_pred -1.347e-03 7.098e-04 -1.897 0.057802 .
rf_pred:svm_pred 2.500e-03 6.903e-04 3.622 0.000293 ***
lgb_pred:svm_pred -1.308e-03 7.639e-04 -1.712 0.086918 .
rf_pred:lgb_pred:svm_pred 9.097e-07 2.246e-06 0.405 0.685474
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 38.47 on 12992 degrees of freedom
Multiple R-squared: 0.7125, Adjusted R-squared: 0.7124
F-statistic: 4600 on 7 and 12992 DF, p-value: < 2.2e-16
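The "model weighted" ensemble appears to be the linear model summarized above. A sketch of fitting and applying it, assuming `pred_values` holds out-of-fold component predictions alongside the observed agb_mgha:

```r
# Stacked ("model weighted") ensemble: regress observed AGB on the component
# predictions and their interactions, as in the lm() call above.
ens_lm <- lm(agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

# Apply to new component predictions (`new_preds` is a hypothetical data frame
# with rf_pred, lgb_pred, and svm_pred columns).
stacked_pred <- predict(ens_lm, newdata = new_preds)
```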
\(n\) and \(p\)
- 624 observations
- 57 predictors:
  - X, n, zmean, zmean_c, max, quad_mean, quad_mean_c, cv, cv_c, z_kurt, z_skew, L2, L3, L4, L_cv, L_skew, L_kurt, h10, h20, h30, h40, h50, h60, h70, h80, h90, h95, h99, hvol, cancov, rpc1, d10, d20, d30, d40, d50, d60, d70, d80, d90, stems, ca_max, ca_mean, ca_min, ca25, ca50, ca75, ca90, ca95, precip, tmin, tmax, twi, slope, aspect, elev, fold_index
Component Models
- Tuning used 5-fold CV
- Final hyperparameters:
  - RF (ranger): num.trees = 300, mtry = 2, min.node.size = 5, replace = FALSE, splitrule = "variance", sample.fraction = 0.2
  - GBM (LightGBM): formula = agb_mgha ~ ., learning_rate = 0.1, nrounds = 50, num_leaves = 5, extra_trees = FALSE, min_data_in_bin = 13, bagging_fraction = 0.3, feature_fraction = 0.8, lambda_l1 = 9, lambda_l2 = 9, force_col_wise = TRUE
  - SVM (kernlab): x = agb_mgha ~ ., kernel = "laplacedot", type = "eps-bsvr", kpar = list(sigma = 0.015625), C = 4, epsilon = 0.0625
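A sketch of how the component models could be fit with these hyperparameters; the training data frame name `train` and the LightGBM matrix construction are assumptions, since the report's own training wrappers are not shown.

```r
library(ranger)
library(lightgbm)
library(kernlab)

# `train` is a hypothetical data frame holding agb_mgha plus the 57 predictors.

# RF (ranger)
rf_fit <- ranger(agb_mgha ~ ., data = train,
                 num.trees = 300, mtry = 2, min.node.size = 5,
                 replace = FALSE, splitrule = "variance", sample.fraction = 0.2)

# GBM (LightGBM): the R package takes a matrix, so the formula above presumably
# comes from a wrapper; this sketch builds the design matrix directly.
x <- as.matrix(train[, setdiff(names(train), "agb_mgha")])
dtrain <- lgb.Dataset(x, label = train$agb_mgha)
lgb_fit <- lgb.train(
  params = list(objective = "regression", learning_rate = 0.1, num_leaves = 5,
                extra_trees = FALSE, min_data_in_bin = 13,
                bagging_fraction = 0.3, feature_fraction = 0.8,
                lambda_l1 = 9, lambda_l2 = 9, force_col_wise = TRUE),
  data = dtrain, nrounds = 50)

# SVM (kernlab)
svm_fit <- ksvm(agb_mgha ~ ., data = train, type = "eps-bsvr",
                kernel = "laplacedot", kpar = list(sigma = 0.015625),
                C = 4, epsilon = 0.0625)
```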