Validation
Overview
We provide various geostatistical cross-validation methods to compare interpolation models or learning models. These methods are accessible through the cverror function:
GeoStatsValidation.cverror — Function
cverror(model, geotable, method)
Estimate cross-validation error of (geo)statistical model on given geotable with error estimation method.
As an example, consider the block cross-validation error of the following decision tree learning model:
using GeoStats
using GeoIO
# load geospatial data
Ω = GeoIO.load("data/agriculture.csv", coords = ("x", "y"))
# 20%/80% split along the (1, -1) direction
Ωₛ, Ωₜ = geosplit(Ω, 0.2, (1.0, -1.0))
# learning model
model = DecisionTreeClassifier()
# loss function
loss = MisclassLoss()
# block cross-validation with blocks of side 30
bcv = BlockValidation(30., loss = Dict("crop" => loss))
# estimate of generalization error
ϵ̂ = cverror(model, label(Ωₛ, "crop"), bcv)
Dict{Symbol, Float64} with 1 entry:
  :crop => 0.230067
We can unhide the labels in the target domain and compute the actual error for comparison:
# train in Ωₛ and predict in Ωₜ
Ω̂ₜ = Ωₜ |> Learn(label(Ωₛ, "crop"), model=model)
# actual error of the model
ϵ = mean(loss.(Ωₜ.crop, Ω̂ₜ.crop))
0.23491397258448676
Methods
Leave-one-out
GeoStatsValidation.LeaveOneOut — Type
LeaveOneOut(; loss=Dict())
Leave-one-out validation. Optionally, specify a dictionary of loss functions from LossFunctions.jl for some of the variables.
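To make the idea concrete, here is a minimal sketch of leave-one-out error estimation in plain Julia. It does not use the package API: the helper loo_error is hypothetical, and we assume a constant (mean) predictor with squared loss purely for illustration.

```julia
using Statistics

# Hypothetical helper (not part of GeoStatsValidation): leave-one-out error
# of a constant (mean) predictor under squared loss.
function loo_error(y::AbstractVector{<:Real})
    n = length(y)
    losses = map(1:n) do i
        # fit on all samples except the i-th one
        pred = mean(y[j] for j in 1:n if j != i)
        # evaluate the loss on the held-out sample
        (y[i] - pred)^2
    end
    mean(losses)
end

y = [1.0, 2.0, 3.0, 4.0]
err = loo_error(y)
```

Each sample is held out exactly once, so the estimate averages as many losses as there are samples.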
References
- Stone. 1974. [Cross-Validatory Choice and Assessment of Statistical Predictions] (https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1974.tb00994.x)
Leave-ball-out
GeoStatsValidation.LeaveBallOut — Type
LeaveBallOut(ball; loss=Dict())
Leave-ball-out (a.k.a. spatial leave-one-out) validation. Optionally, specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
LeaveBallOut(radius; loss=Dict())
By default, use Euclidean ball of given radius in space.
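The following sketch illustrates how leave-ball-out differs from leave-one-out: every sample inside the ball around the held-out sample is dropped from training. The helper ball_out_folds and the coordinate layout are assumptions made for this example, not the package implementation.

```julia
using LinearAlgebra

# Hypothetical helper (not part of GeoStatsValidation): for each sample i,
# keep for training only the samples farther than `radius` from sample i.
function ball_out_folds(coords::AbstractVector{<:AbstractVector{<:Real}}, radius::Real)
    n = length(coords)
    map(1:n) do i
        # Euclidean distance decides which neighbors are excluded
        train = [j for j in 1:n if norm(coords[j] - coords[i]) > radius]
        (train = train, test = [i])
    end
end

coords = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [5.0, 0.0]]
folds = ball_out_folds(coords, 1.0)
```

With radius 1.0, the first two samples exclude each other from training, which is how the method limits the influence of spatially correlated neighbors.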
References
- Le Rest et al. 2014. [Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation] (https://onlinelibrary.wiley.com/doi/full/10.1111/geb.12161)
K-fold
GeoStatsValidation.KFoldValidation — Type
KFoldValidation(k; shuffle=true, loss=Dict())
k-fold cross-validation. Optionally, shuffle the data, and specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
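As a rough illustration of the folding step, the sketch below partitions sample indices into k folds, optionally after shuffling. The helper kfold_indices is hypothetical, and the interleaved assignment of indices to folds is a choice made for brevity; the package may assign folds differently.

```julia
using Random

# Hypothetical helper (not part of GeoStatsValidation): split the indices
# 1:n into k folds of near-equal size.
function kfold_indices(n::Integer, k::Integer; shuffle::Bool=true, rng=Random.default_rng())
    inds = shuffle ? randperm(rng, n) : collect(1:n)
    # fold f receives every k-th index starting at position f
    [inds[f:k:n] for f in 1:k]
end

folds = kfold_indices(10, 3; shuffle=false)
```

Each fold serves once as the test set while the remaining folds are used for training.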
References
- Geisser, S. 1975. [The predictive sample reuse method with applications] (https://www.jstor.org/stable/2285815)
- Burman, P. 1989. [A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods] (https://www.jstor.org/stable/2336116)
Block
GeoStatsValidation.BlockValidation — Type
BlockValidation(sides; loss=Dict())
Cross-validation with blocks of given sides. Optionally, specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
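The block idea can be sketched in a few lines: samples are grouped by the grid cell that contains them, and whole cells are then held out together. The helper block_folds below is hypothetical and assumes 2D coordinates with square blocks aligned to the origin.

```julia
# Hypothetical helper (not part of GeoStatsValidation): group sample indices
# by the square block of side `side` that contains each 2D coordinate.
function block_folds(coords::AbstractVector{<:Tuple{Real,Real}}, side::Real)
    groups = Dict{NTuple{2,Int},Vector{Int}}()
    for (i, (x, y)) in enumerate(coords)
        # integer block address along each axis
        key = (floor(Int, x / side), floor(Int, y / side))
        push!(get!(groups, key, Int[]), i)
    end
    groups
end

coords = [(5.0, 5.0), (25.0, 10.0), (35.0, 5.0), (40.0, 50.0)]
groups = block_folds(coords, 30.0)
```

Here the first two samples share a block and are therefore always held out together.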
References
- Roberts et al. 2017. [Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure] (https://onlinelibrary.wiley.com/doi/10.1111/ecog.02881)
- Pohjankukka et al. 2017. [Estimating the prediction performance of spatial models via spatial k-fold cross-validation] (https://www.tandfonline.com/doi/full/10.1080/13658816.2017.1346255)
Weighted
GeoStatsValidation.WeightedValidation — Type
WeightedValidation(weighting, folding; lambda=1.0, loss=Dict())
An error estimation method in which samples are weighted with the weighting method and split into folds with the folding method. Weights are raised to the power lambda, which lies in [0,1]. Optionally, specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
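To show the role of lambda, the sketch below computes a weighted error from per-sample losses. The helper weighted_error is hypothetical, and averaging the products of weights and losses follows the usual importance-weighted form; we have not checked that the package aggregates losses in exactly this way.

```julia
using Statistics

# Hypothetical helper (not part of GeoStatsValidation): average of per-sample
# losses, each multiplied by its weight raised to the power lambda.
function weighted_error(losses::AbstractVector{<:Real}, weights::AbstractVector{<:Real}; lambda::Real=1.0)
    0 <= lambda <= 1 || throw(ArgumentError("lambda must lie in [0, 1]"))
    w = weights .^ lambda
    mean(w .* losses)
end

losses = [0.0, 1.0, 1.0, 0.0]
weights = [1.0, 1.0, 4.0, 2.0]

unweighted = weighted_error(losses, weights; lambda=0.0)  # weights ignored
partial    = weighted_error(losses, weights; lambda=0.5)  # weights flattened
full       = weighted_error(losses, weights; lambda=1.0)  # weights used as given
```

Setting lambda to 0 recovers the ordinary unweighted error, while values between 0 and 1 dampen the effect of large weights.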
References
- Sugiyama et al. 2006. Importance-weighted cross-validation for covariate shift
- Sugiyama et al. 2007. Covariate shift adaptation by importance weighted cross validation
Density-ratio
GeoStatsValidation.DensityRatioValidation — Type
DensityRatioValidation(k; [options])
Density ratio validation where weights are first obtained with density ratio estimation, and then used in k-fold weighted cross-validation.
Options
- shuffle - Shuffle the data before folding (default to true)
- estimator - Density ratio estimator (default to LSIF())
- optlib - Optimization library (default to default_optlib(estimator))
- lambda - Power of density ratios (default to 1.0)
- loss - Dictionary with loss functions (default to Dict())
Please see [DensityRatioEstimation.jl] (https://github.com/JuliaEarth/DensityRatioEstimation.jl) for a list of supported estimators.
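To clarify what the weights represent, the sketch below evaluates a density ratio for two known 1D Gaussian densities. This is only an illustration: in practice the densities are unknown and the ratio is estimated from samples, and the names gausspdf and ratio are hypothetical.

```julia
# Gaussian probability density with mean μ and standard deviation σ
gausspdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# Density ratio w(x) = p_target(x) / p_source(x), assuming both densities
# are known Gaussians given as (mean, standard deviation) tuples.
ratio(x; source=(0.0, 1.0), target=(1.0, 1.0)) =
    gausspdf(x, target...) / gausspdf(x, source...)

# source samples near the target mean receive weights above one,
# samples far from it receive weights below one
w_near = ratio(1.5)
w_far  = ratio(-1.0)
```

Samples that look like the target domain are upweighted, so their losses count more in the weighted cross-validation error.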
References
- Hoffimann et al. 2020. [Geostatistical Learning: Challenges and Opportunities] (https://arxiv.org/abs/2102.08791)