Tax Parcel Variables

Mike Mahoney
2021-01-11

Evaluation Results

Last iteration of these models: 2021-01-06

Change Summary

Metric  RF (ranger)  GBM (LightGBM)  SVM (kernlab)  Ensemble (model weighted)  Ensemble (RMSE weighted)
RMSE    37.071       36.717          37.412         36.622                     36.473
MBE     2.594        2.366           -2.827         2.397                      0.699
R2      0.771        0.775           0.769          0.777                      0.778
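
For reference, the three metrics in the table above could be computed along the lines of the sketch below. It assumes MBE is the mean of predicted minus observed and that R2 is the coefficient of determination (1 - SSE/SST); the report does not state either convention. The helper names and the toy vectors are hypothetical.

```r
# Hypothetical helpers; the report's own definitions may differ.
rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))
mbe  <- function(obs, pred) mean(pred - obs)  # assumed sign: predicted minus observed
r2   <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Toy values for illustration only (not data from this report).
obs  <- c(10, 50, 90, 130)
pred <- c(12, 48, 95, 125)

metrics <- c(RMSE = rmse(obs, pred), MBE = mbe(obs, pred), R2 = r2(obs, pred))
print(round(metrics, 3))
```

Under this sign convention a positive MBE means the model overpredicts on average.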

AGB Distribution

summary(bind_rows(training, testing)$agb_mgha)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   9.318  84.914  91.012 148.527 425.363 

Bootstrapping Results

Across 1000 bootstrap iterations, our ensemble model had a mean RMSE of 36.735 \(\pm\) 0.305.
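
A minimal sketch of how a bootstrap estimate like this could be produced: resample rows with replacement, recompute RMSE on each resample, then summarise across iterations. It assumes the reported interval is one standard deviation across iterations, which the report does not state. The data are simulated and all object names are hypothetical.

```r
set.seed(42)

# Simulated observed and predicted values, for illustration only.
n    <- 500
obs  <- runif(n, 0, 400)
pred <- obs + rnorm(n, sd = 37)

rmse <- function(obs, pred) sqrt(mean((pred - obs)^2))

# Resample rows with replacement and recompute RMSE on each resample.
n_boot <- 1000
boot_rmse <- vapply(seq_len(n_boot), function(i) {
  idx <- sample.int(n, size = n, replace = TRUE)
  rmse(obs[idx], pred[idx])
}, numeric(1))

# Assumed summary: mean and one standard deviation across iterations.
boot_summary <- c(mean = mean(boot_rmse), sd = sd(boot_rmse))
print(boot_summary)
```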

RMSE Distribution

Plot Errors

Validation Results

RMSE      Min     Median  Max
rf        36.054  41.252  46.861
lgb       35.752  40.820  45.736
svm       35.485  42.036  46.726
ensemble  35.174  40.940  45.886

R2        Min     Median  Max
rf        0.671   0.729   0.778
lgb       0.683   0.733   0.784
svm       0.664   0.725   0.778
ensemble  0.684   0.736   0.783

Metadata

Ensembles

      lgb        rf       svm 
0.3332251 0.3312431 0.3355317 
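
The three weights above sum to 1 and are close to equal. One common way to obtain weights of this kind is to normalise inverse RMSE; the report does not say how its weights were derived, so the scheme below is an assumption. The RMSE inputs are borrowed from the Change Summary table purely as example values and are not expected to reproduce the printed weights.

```r
# Example RMSE inputs (test-set values from the Change Summary table);
# the report does not say which RMSE values its weights were based on.
rmse <- c(lgb = 36.717, rf = 37.071, svm = 37.412)

# Assumed scheme: weight each model by inverse RMSE, normalised to sum to 1.
w <- (1 / rmse) / sum(1 / rmse)

# Weighted ensemble prediction for hypothetical per-model predictions.
preds <- cbind(lgb = c(80, 120), rf = c(85, 110), svm = c(78, 125))
ensemble <- as.vector(preds %*% w[colnames(preds)])

print(round(w, 4))
print(ensemble)
```

Because the weights are nonnegative and sum to 1, each ensemble prediction stays between the smallest and largest component prediction for that row.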

Call:
lm(formula = agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

Residuals:
    Min      1Q  Median      3Q     Max 
-152.73  -20.34   -0.47   15.15  217.40 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -1.388e+00  5.748e-01  -2.416 0.015715 *  
rf_pred                   -4.833e-01  7.944e-02  -6.083  1.2e-09 ***
lgb_pred                   1.451e+00  8.535e-02  17.002  < 2e-16 ***
svm_pred                   1.560e-01  6.105e-02   2.555 0.010631 *  
rf_pred:lgb_pred          -1.452e-03  4.390e-04  -3.307 0.000946 ***
rf_pred:svm_pred           6.360e-03  6.166e-04  10.315  < 2e-16 ***
lgb_pred:svm_pred         -6.275e-03  6.498e-04  -9.657  < 2e-16 ***
rf_pred:lgb_pred:svm_pred  4.860e-06  1.261e-06   3.855 0.000116 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 40.5 on 23292 degrees of freedom
Multiple R-squared:  0.7354,    Adjusted R-squared:  0.7354 
F-statistic:  9250 on 7 and 23292 DF,  p-value: < 2.2e-16
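
The call above suggests the model-weighted ensemble is a linear model fit on the three component predictions with all interactions. The sketch below refits a model of the same form on simulated data and uses it to combine new component predictions. The simulated `pred_values` is a stand-in and `new_preds` is a hypothetical name; nothing here reproduces the report's coefficients.

```r
set.seed(1)

# Simulated stand-in for pred_values: observed AGB plus three model predictions.
n <- 1000
agb_mgha <- runif(n, 0, 400)
pred_values <- data.frame(
  agb_mgha = agb_mgha,
  rf_pred  = agb_mgha + rnorm(n, sd = 37),
  lgb_pred = agb_mgha + rnorm(n, sd = 36),
  svm_pred = agb_mgha + rnorm(n, sd = 38)
)

# Same formula as in the summary above: main effects plus all interactions.
stack_fit <- lm(agb_mgha ~ rf_pred * lgb_pred * svm_pred, data = pred_values)

# Ensemble prediction for a new set of component-model predictions.
new_preds <- data.frame(rf_pred = 90, lgb_pred = 95, svm_pred = 88)
stack_pred <- predict(stack_fit, newdata = new_preds)
print(stack_pred)
```

With interaction terms in the model, a single main-effect coefficient (such as the negative `rf_pred` estimate above) is not by itself the weight of that component.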

Coverages

\(n\) and \(p\)

Component Models

Random forest:

$num.trees
[1] 2000

$mtry
[1] 33

$min.node.size
[1] 5

$sample.fraction
[1] 0.2

$splitrule
[1] "variance"

$replace
[1] TRUE

$formula
agb_mgha ~ .
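
As a sketch, the hyperparameters listed above map directly onto arguments of `ranger::ranger()`. The data below are simulated with 40 predictors so that `mtry = 33` is valid; `training` here is a hypothetical stand-in, not the report's data.

```r
library(ranger)
set.seed(1)

# Simulated data with 40 predictors (V1..V40), for illustration only.
n <- 300
x <- as.data.frame(matrix(rnorm(n * 40), nrow = n))
training <- data.frame(agb_mgha = 100 + 20 * x$V1 + rnorm(n, sd = 10), x)

rf_fit <- ranger(
  agb_mgha ~ .,
  data            = training,
  num.trees       = 2000,
  mtry            = 33,
  min.node.size   = 5,
  sample.fraction = 0.2,   # each tree sees a 20% resample of the rows
  splitrule       = "variance",
  replace         = TRUE
)

# Predictions come back in the $predictions element.
rf_pred <- predict(rf_fit, data = training)$predictions
```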

LGB:

$learning_rate
[1] 0.05

$nrounds
[1] 100

$num_leaves
[1] 10

$max_depth
[1] -1

$extra_trees
[1] TRUE

$min_data_in_leaf
[1] 10

$bagging_fraction
[1] 0.8

$bagging_freq
[1] 1

$feature_fraction
[1] 0.5

$min_data_in_bin
[1] 21

$lambda_l1
[1] 0.3

$lambda_l2
[1] 0.3

$force_col_wise
[1] TRUE
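
A hedged sketch of how the parameters above could be passed to the lightgbm R package. The `objective = "regression"` setting is an assumption, since the report does not list an objective. `nrounds` is passed as an argument to `lgb.train()` rather than inside `params`, and `min_data_in_bin` is set on the dataset because it controls binning. The data are simulated.

```r
library(lightgbm)
set.seed(1)

# Simulated predictors and response, for illustration only.
n <- 500
x <- matrix(rnorm(n * 10), nrow = n)
y <- 100 + 20 * x[, 1] + rnorm(n, sd = 10)

# min_data_in_bin affects how features are binned, so it is set on the dataset.
dtrain <- lgb.Dataset(data = x, label = y, params = list(min_data_in_bin = 21))

params <- list(
  objective        = "regression",  # assumed; not listed in the report
  learning_rate    = 0.05,
  num_leaves       = 10,
  max_depth        = -1,            # -1 means no depth limit
  extra_trees      = TRUE,
  min_data_in_leaf = 10,
  bagging_fraction = 0.8,
  bagging_freq     = 1,
  feature_fraction = 0.5,
  lambda_l1        = 0.3,
  lambda_l2        = 0.3,
  force_col_wise   = TRUE
)

lgb_fit  <- lgb.train(params = params, data = dtrain, nrounds = 100, verbose = -1)
lgb_pred <- predict(lgb_fit, x)
```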

SVM:

$x
agb_mgha ~ .

$kernel
[1] "laplacedot"

$type
[1] "nu-svr"

$kpar
$kpar$sigma
[1] 0.0078125


$C
[1] 4

$epsilon
[1] 0.00390625

$nu
[1] 1
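
As a sketch, the settings above correspond to a `kernlab::ksvm()` call like the one below, fit here on simulated data with hypothetical column names. Note that `sigma = 0.0078125` is 2^-7, `epsilon = 0.00390625` is 2^-8 and `C = 4` is 2^2, which would be consistent with a power-of-two tuning grid, though the report does not say how the values were chosen.

```r
library(kernlab)
set.seed(1)

# Simulated data, for illustration only.
n <- 200
training <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
training$agb_mgha <- 100 + 20 * training$x1 + rnorm(n, sd = 10)

svm_fit <- ksvm(
  agb_mgha ~ .,
  data    = training,
  kernel  = "laplacedot",
  type    = "nu-svr",
  kpar    = list(sigma = 0.0078125),
  C       = 4,
  epsilon = 0.00390625,
  nu      = 1
)

# predict() returns a one-column matrix; flatten it to a plain vector.
svm_pred <- as.vector(predict(svm_fit, training))
```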