Evaluation Results
Last iteration of these models: 2020-12-31
Change Summary
- New coverages (@Lucas for details)
- Predictions bounded to [0, Inf) (minimal improvement)
- Tax parcel data used as predictors
- At the moment, only the X00-level categories are used, with 400, 500, 700, and 800 rolled into a collective "other" class due to low representation (see the sketch after this list)
- This grouping is due to time constraints and is likely not optimal.
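A minimal sketch of those two preprocessing steps, assuming the modeling table is called `model_df`, the parcel class column is `tax_category`, and raw model output lives in `raw_pred` (all of these names are hypothetical):

```r
# Collapse the low-representation X00-level parcel classes into "other"
# (column name and level coding are assumptions).
lump <- model_df$tax_category %in% c("400", "500", "700", "800")
model_df$tax_category[lump] <- "other"

# Bound predictions to [0, Inf): negative biomass predictions are truncated at 0.
bounded_pred <- pmax(raw_pred, 0)
```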
| Metric | RF (ranger) | GBM (LightGBM) | SVM (kernlab) | Ensemble (model weighted) | Ensemble (RMSE weighted) |
|--------|-------------|----------------|---------------|---------------------------|--------------------------|
| RMSE   | 38.617      | 38.496         | 38.574        | 37.546                    | 37.798                   |
| MBE    | -1.313      | -0.878         | -5.357        | -2.230                    | -2.547                   |
| R2     | 0.761       | 0.761          | 0.768         | 0.774                     | 0.773                    |
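For reference, the three metrics in the table can be computed as below; the exact formulas used for this report (in particular the sign convention for MBE) are not spelled out here, so treat these definitions as assumptions:

```r
# Root mean squared error, in the same units as AGB (Mg/ha)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Mean bias error; with this sign convention, negative values mean under-prediction on average
mbe <- function(obs, pred) mean(pred - obs)

# Coefficient of determination as 1 - SSE/SST
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
```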
AGB Distribution
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|-------|---------|--------|--------|---------|---------|
| 0.000 | 9.645   | 86.795 | 91.792 | 148.679 | 425.363 |
Bootstrapping Results
Across 1000 bootstrap iterations, our ensemble model had a mean RMSE of 37.97 \(\pm\) 0.355.
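A sketch of how that bootstrap summary could be produced, assuming a data frame `pred_values` holding observed `agb_mgha` and ensemble predictions in `ens_pred` (the prediction column name is an assumption), and taking the \(\pm\) value as the standard deviation across iterations:

```r
set.seed(1)  # arbitrary seed for the sketch

boot_rmse <- replicate(1000, {
  i <- sample(nrow(pred_values), replace = TRUE)  # resample rows with replacement
  sqrt(mean((pred_values$agb_mgha[i] - pred_values$ens_pred[i])^2))
})

mean(boot_rmse)  # reported mean RMSE
sd(boot_rmse)    # reported spread (assumed to be the +/- value)
```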
RMSE Distribution (figure)
Plot Errors (figure)
Validation Results
| RMSE     | Min    | Median | Max    |
|----------|--------|--------|--------|
| RF       | 34.981 | 39.895 | 45.285 |
| LGB      | 36.127 | 40.427 | 46.676 |
| SVM      | 35.101 | 39.619 | 45.833 |
| Ensemble | 35.316 | 39.255 | 44.952 |

| R2       | Min   | Median | Max   |
|----------|-------|--------|-------|
| RF       | 0.684 | 0.740  | 0.792 |
| LGB      | 0.661 | 0.732  | 0.791 |
| SVM      | 0.674 | 0.750  | 0.808 |
| Ensemble | 0.684 | 0.749  | 0.801 |
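One way to build the min/median/max summaries above, assuming the out-of-fold predictions sit in `pred_values` with one column per model (`rf_pred`, `lgb_pred`, `svm_pred`, `ens_pred`; the ensemble column name is an assumption) and that the spread is taken across the `fold_index` groups:

```r
library(dplyr)
library(tidyr)

pred_values %>%
  pivot_longer(c(rf_pred, lgb_pred, svm_pred, ens_pred),
               names_to = "model", values_to = "pred") %>%
  group_by(model, fold_index) %>%
  summarise(
    rmse = sqrt(mean((agb_mgha - pred)^2)),
    r2   = 1 - sum((agb_mgha - pred)^2) / sum((agb_mgha - mean(agb_mgha))^2),
    .groups = "drop"
  ) %>%
  group_by(model) %>%
  summarise(across(c(rmse, r2), list(min = min, median = median, max = max)))
```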
Ensembles
- RMSE-weighted model weights: lgb = 0.3288537, rf = 0.3307084, svm = 0.3404378
- Model-weighted ensemble: a linear model of observed AGB on the three component predictions, summarized below.
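How the RMSE-derived weights are computed is not stated here; one common choice is to weight each model by its inverse RMSE, normalize to sum to 1, and take a weighted average of the component predictions, roughly as in this sketch (the RMSE values fed in are placeholders):

```r
# Placeholder per-model RMSEs; substitute whichever RMSEs the weights are actually based on
cv_rmse <- c(lgb = 40.4, rf = 39.9, svm = 39.6)

# Inverse-RMSE weights, normalized to sum to 1
w <- (1 / cv_rmse) / sum(1 / cv_rmse)

# RMSE-weighted ensemble prediction from the three component predictions
ens_rmse_weighted <- w["rf"]  * pred_values$rf_pred +
                     w["lgb"] * pred_values$lgb_pred +
                     w["svm"] * pred_values$svm_pred
```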
Call:
lm(formula = agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)
Residuals:
Min 1Q Median 3Q Max
-140.574 -20.252 -0.105 12.575 211.587
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.703e-01 5.548e-01 -1.569 0.1167
rf_pred 3.020e-01 6.630e-02 4.556 5.25e-06 ***
lgb_pred -4.926e-03 6.565e-02 -0.075 0.9402
svm_pred 7.300e-01 5.500e-02 13.272 < 2e-16 ***
rf_pred:lgb_pred 7.808e-04 3.958e-04 1.973 0.0485 *
rf_pred:svm_pred -8.177e-04 4.952e-04 -1.651 0.0987 .
lgb_pred:svm_pred -1.133e-04 5.587e-04 -0.203 0.8393
rf_pred:lgb_pred:svm_pred 9.406e-07 1.291e-06 0.729 0.4662
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 39.31 on 23792 degrees of freedom
Multiple R-squared: 0.7489, Adjusted R-squared: 0.7489
F-statistic: 1.014e+04 on 7 and 23792 DF, p-value: < 2.2e-16
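The lm() summary above appears to be the model-weighted ensemble: a linear stack fit on the out-of-fold component predictions. A sketch of refitting and applying it, with the result bounded to [0, Inf) as in the change summary (the new-data values are hypothetical):

```r
# Refit the stacking regression shown in the summary above
stack_fit <- lm(agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

# Apply it to new component predictions and bound the result to [0, Inf)
new_preds <- data.frame(rf_pred = 120, lgb_pred = 115, svm_pred = 118)  # hypothetical values
ens_model_weighted <- pmax(predict(stack_fit, newdata = new_preds), 0)
```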
\(n\) and \(p\)
- 1147 observations
- 65 predictors
- X, n, zmean, zmean_c, max, quad_mean, quad_mean_c, cv, cv_c, z_kurt, z_skew, L2, L3, L4, L_cv, L_skew, L_kurt, h10, h20, h30, h40, h50, h60, h70, h80, h90, h95, h99, hvol, cancov, rpc1, d10, d20, d30, d40, d50, d60, d70, d80, d90, stems, ca_max, ca_mean, ca_min, ca25, ca50, ca75, ca90, ca95, precip, tmin, tmax, twi, slope, aspect, elev, tax_category_100, tax_category_200, tax_category_300, tax_category_600, tax_category_900, tax_category_1000, fold_index, tax_category_2000, tax_category_other
Component Models
- Tuning used 5-fold CV
- Final hyperparameters:
  - RF (ranger): formula = agb_mgha ~ ., num.trees = 750, mtry = 20, min.node.size = 1, sample.fraction = 0.5, splitrule = "maxstat", replace = TRUE
  - GBM (LightGBM): learning_rate = 0.1, nrounds = 50, num_leaves = 5, max_depth = -1, extra_trees = TRUE, min_data_in_leaf = 10, bagging_fraction = 0.3, bagging_freq = 1, feature_fraction = 0.5, min_data_in_bin = 24, lambda_l1 = 0, lambda_l2 = 0.1, force_col_wise = TRUE
  - SVM (kernlab): x = agb_mgha ~ ., type = "nu-svr", kernel = "laplacedot", kpar = list(sigma = 0.001953125), C = 64, epsilon = 0.001953125, nu = 1
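For concreteness, a sketch of fitting the three component models with the hyperparameters listed above, assuming a training data frame `train_df` that contains `agb_mgha` plus the predictors, and calling the ranger, lightgbm, and kernlab interfaces directly (whatever tuning or wrapper framework was actually used is not shown here):

```r
library(ranger)
library(lightgbm)
library(kernlab)

# Random forest (ranger)
rf_fit <- ranger(
  agb_mgha ~ ., data = train_df,
  num.trees = 750, mtry = 20, min.node.size = 1,
  sample.fraction = 0.5, splitrule = "maxstat", replace = TRUE
)

# Gradient boosting (LightGBM); predictors go in as a numeric matrix
x_mat <- as.matrix(train_df[, setdiff(names(train_df), "agb_mgha")])
dtrain <- lgb.Dataset(x_mat, label = train_df$agb_mgha,
                      params = list(min_data_in_bin = 24))
lgb_fit <- lgb.train(
  params = list(
    objective = "regression", learning_rate = 0.1, num_leaves = 5,
    max_depth = -1, extra_trees = TRUE, min_data_in_leaf = 10,
    bagging_fraction = 0.3, bagging_freq = 1, feature_fraction = 0.5,
    lambda_l1 = 0, lambda_l2 = 0.1, force_col_wise = TRUE
  ),
  data = dtrain, nrounds = 50
)

# Support vector regression (kernlab)
svm_fit <- ksvm(
  agb_mgha ~ ., data = train_df,
  type = "nu-svr", kernel = "laplacedot",
  kpar = list(sigma = 0.001953125),
  C = 64, epsilon = 0.001953125, nu = 1
)
```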