Validation
GeoStats.jl was designed, among other things, to facilitate rigorous comparison of different geostatistical models from the literature. As a user of geostatistics, you may want to try various models on a given data set and pick the one with the best performance. As a researcher in the field, you may want to benchmark your new model against established models.
Errors of geostatistical solvers can be estimated with the cverror function:
GeoStatsValidation.cverror - Function
cverror(model::GeoStatsModel, geotable, method; kwargs...)
Estimate error of model in a given geotable with error estimation method using Interpolate or InterpolateNeighbors depending on the passed kwargs.
cverror(model::StatsLearnModel, geotable, method)
cverror((model, invars => outvars), geotable, method)
Estimate error of model in a given geotable with error estimation method using the Learn transform.
For example, we can perform block cross-validation on a decision tree model using the following code:
using GeoStats
using GeoIO
# load geospatial data
Ω = GeoIO.load("data/agriculture.csv", coords = ("x", "y"))
# 20%/80% split along the (1, -1) direction
Ωₛ, Ωₜ = geosplit(Ω, 0.2, (1.0, -1.0))
# features and label for supervised learning
feats = [:band1,:band2,:band3,:band4]
label = :crop
# learning model
model = DecisionTreeClassifier()
# loss function
loss = MisclassLoss()
# block cross-validation with r = 30.
bcv = BlockValidation(30., loss = Dict(:crop => loss))
# estimate of generalization error
ϵ̂ = cverror((model, feats => label), Ωₛ, bcv)
Dict{Symbol, Float64} with 1 entry:
:crop => 0.233974
We can unhide the labels in the target domain and compute the actual error for comparison:
# train in Ωₛ and predict in Ωₜ
Ω̂ₜ = Ωₜ |> Learn(Ωₛ, model, feats => label)
# actual error of the model
ϵ = mean(loss.(Ωₜ.crop, Ω̂ₜ.crop))
0.23349822615300056
Below is the list of currently implemented validation methods.
Leave-one-out
GeoStatsValidation.LeaveOneOut - Type
LeaveOneOut(; loss=Dict())
Leave-one-out validation. Optionally, specify loss function from LossFunctions.jl for some of the variables.
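The idea behind leave-one-out can be sketched in plain Julia, independently of GeoStats.jl. The helper `loo_error` below is hypothetical and uses a toy "model" that predicts the mean of the training values with a squared loss. It only illustrates the hold-out loop and is not how the package implements the method.

```julia
# Leave-one-out with a toy "model" that predicts the mean of the training values.
# `loo_error` is a hypothetical helper, not part of GeoStats.jl.
function loo_error(y)
    n = length(y)
    errs = map(1:n) do i
        train = [y[j] for j in 1:n if j != i]  # hold out sample i
        pred = sum(train) / length(train)      # "fit" on the rest and predict
        (pred - y[i])^2                        # squared loss on the held-out sample
    end
    sum(errs) / n                              # average loss over all hold-outs
end

y = [1.0, 2.0, 3.0, 4.0]
err = loo_error(y)
```

Each sample is held out exactly once, so the number of model fits equals the number of samples.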
Leave-ball-out
GeoStatsValidation.LeaveBallOut - Type
LeaveBallOut(ball; loss=Dict())
Leave-ball-out (a.k.a. spatial leave-one-out) validation. Optionally, specify loss function from the LossFunctions.jl package for some of the variables.
LeaveBallOut(radius; loss=Dict())
By default, use Euclidean ball of given radius in space.
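To see what holding out a ball means, here is a minimal sketch in plain Julia, assuming 2D coordinates stored as tuples. The helper `ball_train_indices` is hypothetical and is not part of GeoStats.jl. It returns the samples that remain available for training when everything within the given Euclidean radius of sample `i` is excluded.

```julia
# Training indices when the Euclidean ball centered at sample i is held out.
# `ball_train_indices` is a hypothetical helper, not part of GeoStats.jl.
function ball_train_indices(coords, i, radius)
    cx, cy = coords[i]
    # keep only samples strictly outside the ball around sample i
    [j for j in eachindex(coords) if hypot(coords[j][1] - cx, coords[j][2] - cy) > radius]
end

coords = [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0), (3.0, 0.0), (3.2, 0.1)]
train = ball_train_indices(coords, 1, 1.0)
```

Compared with plain leave-one-out, nearby samples that are likely correlated with the held-out sample are also removed from training.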
K-fold
GeoStatsValidation.KFoldValidation - Type
KFoldValidation(k; shuffle=true, loss=Dict())
k-fold cross-validation. Optionally, shuffle the data, and specify loss function from LossFunctions.jl for some of the variables.
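As a rough illustration of the folding step, the sketch below splits sample indices into k folds in plain Julia. The helper `kfolds` and its interleaved assignment of indices are assumptions made for this example, and the package may partition the data differently.

```julia
using Random

# Split indices 1:n into k folds of nearly equal size, optionally shuffling first.
# `kfolds` is a hypothetical helper, not part of GeoStats.jl.
function kfolds(n::Int, k::Int; shuffle::Bool=true, rng=Random.default_rng())
    inds = shuffle ? randperm(rng, n) : collect(1:n)
    [inds[f:k:n] for f in 1:k]  # fold f takes every k-th index starting at position f
end

folds = kfolds(10, 3; shuffle=false)

# each fold is used once for testing; the remaining indices are used for training
train_first = setdiff(1:10, folds[1])
```

With `shuffle=true` the folds are random, which does not account for spatial correlation between neighboring samples.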
Block
GeoStatsValidation.BlockValidation - Type
BlockValidation(sides; loss=Dict())
Cross-validation with blocks of given sides. Optionally, specify loss function from LossFunctions.jl for some of the variables. If only one side is provided, then blocks become cubes.
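The grouping of samples into blocks can be sketched in plain Julia as follows, assuming 2D coordinates and a single side length. The helper `block_ids` is hypothetical. The sketch only shows how samples fall into blocks and does not cover how the package forms training sets around each held-out block.

```julia
# Assign each 2D point to a square block of the given side length.
# `block_ids` is a hypothetical helper, not part of GeoStats.jl.
function block_ids(coords, side)
    [(floor(Int, x / side), floor(Int, y / side)) for (x, y) in coords]
end

coords = [(5.0, 5.0), (12.0, 28.0), (31.0, 2.0), (45.0, 44.0), (59.0, 31.0)]
ids = block_ids(coords, 30.0)

# group sample indices by block; each group is held out in turn
folds = Dict{Tuple{Int,Int},Vector{Int}}()
for (i, id) in enumerate(ids)
    push!(get!(folds, id, Int[]), i)
end
```

Samples that are close in space end up in the same fold, so they are held out together.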
Weighted
GeoStatsValidation.WeightedValidation - Type
WeightedValidation(weighting, folding; lambda=1.0, loss=Dict())
An error estimation method in which samples are weighted with the weighting method and split into folds with the folding method. Weights are raised to the power lambda in [0,1]. Optionally, specify loss function from LossFunctions.jl for some of the variables.
References
- Sugiyama et al. 2006. Importance-weighted cross-validation for covariate shift
- Sugiyama et al. 2007. Covariate shift adaptation by importance weighted cross validation
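The role of lambda can be illustrated with a small sketch in plain Julia. The helper `weighted_error` is hypothetical, and normalizing by the sum of the weights is an assumption made for this example, so the package may aggregate losses differently. With lambda = 0 all weights become 1 and the estimate reduces to the ordinary mean loss, while lambda = 1 uses the weights unchanged.

```julia
# Weighted mean of per-sample losses, with weights flattened by the power lambda.
# `weighted_error` is a hypothetical helper; the normalization is an assumption.
function weighted_error(losses, weights; lambda::Real=1.0)
    0 <= lambda <= 1 || throw(ArgumentError("lambda must lie in [0, 1]"))
    w = weights .^ lambda           # lambda = 0 gives uniform weights
    sum(w .* losses) / sum(w)
end

losses  = [0.0, 1.0, 1.0, 0.0]
weights = [4.0, 1.0, 1.0, 4.0]

e_full = weighted_error(losses, weights; lambda=1.0)  # weights used as given
e_half = weighted_error(losses, weights; lambda=0.5)  # flattened weights
e_none = weighted_error(losses, weights; lambda=0.0)  # plain mean loss
```

Intermediate values of lambda trade off between the unweighted estimate and the fully weighted one.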
Density-ratio
GeoStatsValidation.DensityRatioValidation - Type
DensityRatioValidation(k; [parameters])
Density ratio validation where weights are first obtained with density ratio estimation, and then used in k-fold weighted cross-validation.
Parameters
- shuffle - Shuffle the data before folding (default to true)
- estimator - Density ratio estimator (default to LSIF())
- optlib - Optimization library (default to default_optlib(estimator))
- lambda - Power of density ratios (default to 1.0)
Please see DensityRatioEstimation.jl for a list of supported estimators.
References
- Hoffimann et al. 2020. Geostatistical Learning: Challenges and Opportunities