Geospatial transforms

We provide a very powerful list of transforms that were designed to work seamlessly with geospatial data. These transforms are implemented in different packages depending on how they interact with geometries.

The list of supported transforms is continuously growing. The following code can be used to print an updated list in any project environment:

# packages to print type tree
using InteractiveUtils
using AbstractTrees
using TransformsBase

# packages with transforms
using GeoStats

# define the tree of types
AbstractTrees.children(T::Type) = subtypes(T)

# print all currently available transforms
AbstractTrees.print_tree(TransformsBase.Transform)
Transform
├─ GeometricTransform
│  ├─ Bridge
│  ├─ CoordinateTransform
│  │  ├─ Affine
│  │  ├─ Rotate
│  │  ├─ Scale
│  │  ├─ Stretch
│  │  └─ Translate
│  ├─ LambdaMuSmoothing
│  ├─ Repair
│  └─ StdCoords
├─ Identity
├─ TableTransform
│  ├─ Aggregate
│  ├─ CookieCutter
│  ├─ Detrend
│  ├─ Downscale
│  ├─ FeatureTransform
│  │  ├─ EigenAnalysis
│  │  ├─ Remainder
│  │  ├─ ColwiseFeatureTransform
│  │  │  ├─ Center
│  │  │  ├─ Coalesce
│  │  │  ├─ LowHigh
│  │  │  ├─ Quantile
│  │  │  └─ ZScore
│  │  └─ StatelessFeatureTransform
│  │     ├─ AbsoluteUnits
│  │     ├─ Assert
│  │     ├─ Closure
│  │     ├─ Coerce
│  │     ├─ ColTable
│  │     ├─ Compose
│  │     ├─ DropConstant
│  │     ├─ DropExtrema
│  │     ├─ DropMissing
│  │     ├─ DropNaN
│  │     ├─ DropUnits
│  │     ├─ Filter
│  │     ├─ Functional
│  │     ├─ Indicator
│  │     ├─ Learn
│  │     ├─ Levels
│  │     ├─ Map
│  │     ├─ OneHot
│  │     ├─ ProjectionPursuit
│  │     ├─ Reject
│  │     ├─ Rename
│  │     ├─ Replace
│  │     ├─ RowTable
│  │     ├─ Sample
│  │     ├─ Satisfies
│  │     ├─ Select
│  │     ├─ Sort
│  │     ├─ StdFeats
│  │     ├─ StdNames
│  │     ├─ LogRatio
│  │     │  ├─ ALR
│  │     │  ├─ CLR
│  │     │  └─ ILR
│  │     └─ Unitify
│  ├─ ClusteringTransform
│  │  ├─ GHC
│  │  ├─ GSC
│  │  └─ SLIC
│  ├─ Interpolate
│  ├─ InterpolateMissing
│  ├─ InterpolateNaN
│  ├─ InterpolateNeighbors
│  ├─ Potrace
│  ├─ Rasterize
│  ├─ Simulate
│  ├─ ParallelTableTransform
│  ├─ Transfer
│  ├─ UniqueCoords
│  └─ Upscale
└─ SequentialTransform

Transforms at the leaves of the tree above should have a docstring with more information on the available options. For example, the documentation of the Select transform is shown below:

TableTransforms.SelectType
Select(col₁, col₂, ..., colₙ)
Select([col₁, col₂, ..., colₙ])
Select((col₁, col₂, ..., colₙ))

The transform that selects columns col₁, col₂, ..., colₙ.

Select(col₁ => newcol₁, col₂ => newcol₂, ..., colₙ => newcolₙ)

Selects the columns col₁, col₂, ..., colₙ and rename them to newcol₁, newcol₂, ..., newcolₙ.

Select(regex)

Selects the columns that match with regex.

Examples

Select(1, 3, 5)
Select([:a, :c, :e])
Select(("a", "c", "e"))
Select(1 => :x, 3 => :y)
Select(:a => :x, :b => :y)
Select("a" => "x", "b" => "y")
Select(r"[ace]")

Transforms of type FeatureTransform operate on the attribute table, whereas transforms of type GeometricTransform operate on the underlying geospatial domain:

TableTransforms.FeatureTransformType
FeatureTransform

A transform that operates on the columns of the table containing features, i.e., simple attributes such as numbers, strings, etc.

Other transforms such as Detrend are defined in terms of both the geospatial domain and the attribute table. All transforms and pipelines implement the following functions:

TransformsBase.isrevertibleFunction
isrevertible(transform)

Tells whether or not the transform is revertible, i.e. supports a revert function. Defaults to false for new transform types.

Transforms can be revertible and yet don't be invertible. Invertibility is a mathematical concept, whereas revertibility is a computational concept.

See also isinvertible.

TransformsBase.isinvertibleFunction
isinvertible(transform)

Tells whether or not the transform is invertible, i.e. whether it implements the inverse function. Defaults to false for new transform types.

Transforms can be invertible in the mathematical sense, i.e., there exists a one-to-one mapping between input and output spaces.

See also inverse, isrevertible.

TransformsBase.applyFunction
newobject, cache = apply(transform, object)

Apply transform on the object. Return the newobject and a cache to revert the transform later.

TransformsBase.revertFunction
object = revert(transform, newobject, cache)

Revert the transform on the newobject using the cache from the corresponding apply call and return the original object. Only defined when the transform isrevertible.

TransformsBase.reapplyFunction
newobject = reapply(transform, object, cache)

Reapply the transform to (a possibly different) object using a cache that was created with a previous apply call. Fallback to apply without using the cache.

Feature transforms

Please check the TableTransforms.jl documentation for an updated list of feature transforms. As an example consider the following features over a Cartesian grid and their statistics:

using DataFrames

# table of features and domain
tab = DataFrame(a=rand(1000), b=randn(1000), c=rand(1000))
dom = CartesianGrid(100, 100)

# georeference table onto domain
Ω = georef(tab, dom)

# describe features
describe(values(Ω))
3×7 DataFrame
Rowvariablemeanminmedianmaxnmissingeltype
SymbolFloat64Float64Float64Float64Int64DataType
1a0.4934360.0003775550.4924920.9997980Float64
2b0.0357089-3.491350.03397223.985030Float64
3c0.4944630.0002836310.4888820.9997810Float64

We can create a pipeline that transforms the features to their normal quantile (or scores):

pipe = Quantile()

Ω̄, cache = apply(pipe, Ω)

describe(values(Ω̄))
3×7 DataFrame
Rowvariablemeanminmedianmaxnmissingeltype
SymbolFloat64Float64Float64Float64Int64DataType
1a0.00309023-3.090230.001253323.090230Float64
2b0.00309023-3.090230.001253323.090230Float64
3c0.00309023-3.090230.001253323.090230Float64

We can then revert the transform given any new geospatial data in the transformed sample space:

Ωₒ = revert(pipe, Ω̄, cache)

describe(values(Ωₒ))
3×7 DataFrame
Rowvariablemeanminmedianmaxnmissingeltype
SymbolFloat64Float64Float64Float64Int64DataType
1a0.4939270.000729110.4926750.9970990Float64
2b0.0390278-3.150990.03582183.77680Float64
3c0.4949550.0004200450.48950.997530Float64

The Learn transform is another important transform from StatsLearnModels.jl:

StatsLearnModels.LearnType
Learn(train, model, incols => outcols)

Fits the statistical learning model using the input columns, selected by incols, and the output columns, selected by outcols, from the train table.

The column selection can be a single column identifier (index or name), a collection of identifiers or a regular expression (regex).

Examples

Learn(train, model, [1, 2, 3] => "d")
Learn(train, model, [:a, :b, :c] => :d)
Learn(train, model, ["a", "b", "c"] => 4)
Learn(train, model, [1, 2, 3] => [:d, :e])
Learn(train, model, r"[abc]" => ["d", "e"])

For more details, consider watching our JuliaCon2021 talk:

Geometric transforms

Please check the Meshes.jl documentation for an updated list of geometric transforms. As an example consider the rotation of geospatial data over a Cartesian grid:

# geospatial domain
Ω = georef((Z=rand(10, 10),))

# apply geometric transform
Ωr = Ω |> Rotate(Angle2d(π/4))

fig = Mke.Figure(size = (800, 400))
viz(fig[1,1], Ω.geometry, color = Ω.Z)
viz(fig[1,2], Ωr.geometry, color = Ωr.Z)
fig

Geostatistical transforms

Bellow is the current list of transforms that operate on both the geometries and features of geospatial data. They are implemented in the GeoStatsBase.jl package.

UniqueCoords

GeoStatsTransforms.UniqueCoordsType
UniqueCoords(var₁ => agg₁, var₂ => agg₂, ..., varₙ => aggₙ)

Retain locations in data with unique coordinates.

Duplicates of a variable varᵢ are aggregated with aggregation function aggᵢ. If an aggregation function is not defined for variable varᵢ, the default aggregation function will be used. Default aggregation function is mean for continuous variables and first otherwise.

Examples

UniqueCoords(1 => last, 2 => maximum)
UniqueCoords(:a => first, :b => minimum)
UniqueCoords("a" => last, "b" => maximum)
# point set with repeated points
X = rand(2, 50)
Ω = georef((Z=rand(100),), [X X])
100×2 GeoTable over 100 PointSet{2,Float64}
Z geometry
Continuous Point2
[NoUnits]
0.482956 (0.227862, 0.669888)
0.215817 (0.815748, 0.557501)
0.879321 (0.747854, 0.0708477)
0.637723 (0.765591, 0.44949)
0.787073 (0.81633, 0.16868)
0.442855 (0.166483, 0.801379)
0.419074 (0.771582, 0.827903)
0.780717 (0.178335, 0.751436)
0.219165 (0.563711, 0.638633)
0.693488 (0.323725, 0.832441)
# discard repeated points
𝒰 = Ω |> UniqueCoords()
50×2 GeoTable over 50 view(::PointSet{2,Float64}, [1, 2, 3, 4, ..., 47, 48, 49, 50])
Z geometry
Continuous Point2
[NoUnits]
0.321102 (0.227862, 0.669888)
0.477058 (0.815748, 0.557501)
0.523582 (0.747854, 0.0708477)
0.455216 (0.765591, 0.44949)
0.507788 (0.81633, 0.16868)
0.262208 (0.166483, 0.801379)
0.368692 (0.771582, 0.827903)
0.461413 (0.178335, 0.751436)
0.134512 (0.563711, 0.638633)
0.465087 (0.323725, 0.832441)

Detrend

GeoStatsTransforms.DetrendType
Detrend(col₁, col₂, ..., colₙ; degree=1)
Detrend([col₁, col₂, ..., colₙ]; degree=1)
Detrend((col₁, col₂, ..., colₙ); degree=1)

The transform that detrends columns col₁, col₂, ..., colₙ with a polynomial of given degree.

Detrend(regex; degree=1)

Detrends the columns that match with regex.

Examples

Detrend(1, 3, 5)
Detrend([:a, :c, :e])
Detrend(("a", "c", "e"))
Detrend(r"[ace]", degree=2)
Detrend(:)

References

# quadratic trend + random noise
r = range(-1, stop=1, length=100)
μ = [x^2 + y^2 for x in r, y in r]
ϵ = 0.1rand(100, 100)
Ω = georef((Z=μ+ϵ,))

# detrend and obtain noise component
𝒩 = Ω |> Detrend(:Z, degree=2)

fig = Mke.Figure(size = (800, 400))
viz(fig[1,1], Ω.geometry, color = Ω.Z)
viz(fig[1,2], 𝒩.geometry, color = 𝒩.Z)
fig

Potrace

GeoStatsTransforms.PotraceType
Potrace(mask; [ϵ])
Potrace(mask, var₁ => agg₁, ..., varₙ => aggₙ; [ϵ])

Trace polygons on 2D image data with Selinger's Potrace algorithm.

The categories stored in column mask are converted into binary masks, which are then traced into multi-polygons. When provided, the option ϵ is forwarded to Selinger's simplification algorithm.

Duplicates of a variable varᵢ are aggregated with aggregation function aggᵢ. If an aggregation function is not defined for variable varᵢ, the default aggregation function will be used. Default aggregation function is mean for continuous variables and first otherwise.

Examples

Potrace(:mask, ϵ=0.1)
Potrace(1, 1 => last, 2 => maximum)
Potrace(:mask, :a => first, :b => minimum)
Potrace("mask", "a" => last, "b" => maximum)

References

# continuous feature
Z = [sin(i/10) + sin(j/10) for i in 1:100, j in 1:100]

# binary mask
M = Z .> 0

# georeference data
Ω = georef((Z=Z, M=M))

# trace polygons using mask
𝒯 = Ω |> Potrace(:M)

fig = Mke.Figure(size = (800, 400))
viz(fig[1,1], Ω.geometry, color = Ω.Z)
viz(fig[1,2], 𝒯.geometry, color = 𝒯.Z)
fig
𝒯.geometry
2 GeometrySet{2,Float64}
├─ Multi(4×PolyArea)
└─ Multi(4×PolyArea)

Rasterize

GeoStatsTransforms.RasterizeType
Rasterize(grid)
Rasterize(grid, var₁ => agg₁, ..., varₙ => aggₙ)

Rasterize geometries within specified grid.

Rasterize(nx, ny)
Rasterize(nx, ny, var₁ => agg₁, ..., varₙ => aggₙ)

Alternatively, use the grid with size nx by ny obtained with discretization of the bounding box.

Duplicates of a variable varᵢ are aggregated with aggregation function aggᵢ. If an aggregation function is not defined for variable varᵢ, the default aggregation function will be used. Default aggregation function is mean for continuous variables and first otherwise.

Examples

grid = CartesianGrid(10, 10)
Rasterize(grid)
Rasterize(10, 10)
Rasterize(grid, 1 => last, 2 => maximum)
Rasterize(10, 10, 1 => last, 2 => maximum)
Rasterize(grid, :a => first, :b => minimum)
Rasterize(10, 10, :a => first, :b => minimum)
Rasterize(grid, "a" => last, "b" => maximum)
Rasterize(10, 10, "a" => last, "b" => maximum)
A = [1, 2, 3, 4, 5]
B = [1.1, 2.2, 3.3, 4.4, 5.5]
p1 = PolyArea((2, 0), (6, 2), (2, 2))
p2 = PolyArea((0, 6), (3, 8), (0, 10))
p3 = PolyArea((3, 6), (9, 6), (9, 9), (6, 9))
p4 = PolyArea((7, 0), (10, 0), (10, 4), (7, 4))
p5 = PolyArea((1, 3), (5, 3), (6, 6), (3, 8), (0, 6))
gt = georef((; A, B), [p1, p2, p3, p4, p5])

nt = gt |> Rasterize(20, 20)

viz(nt.geometry, color = nt.A)

Clustering

Unlike traditional clustering algorithms in machine learning, geostatistical clustering (a.k.a. domaining) algorithms consider both the features and the geospatial coordinates of the data.

Consider the following data as an example:

Ω = georef((Z=[10sin(i/10) + j for i in 1:4:100, j in 1:4:100],))

viz(Ω.geometry, color = Ω.Z)
GeoStatsTransforms.GHCType
GHC(k, λ; kern=:epanechnikov, link=:ward, as=:CLUSTER)

A transform for partitioning geospatial data into k clusters according to a range λ using Geostatistical Hierarchical Clustering (GHC). The larger the range the more connected are nearby samples.

Parameters

  • k - Approximate number of clusters
  • λ - Approximate range of kernel function
  • kern - Kernel function (:uniform, :triangular or :epanechnikov)
  • link - Linkage function (:single, :average, :complete, :ward or :ward_presquared)
  • as - Cluster column name

References

Notes

  • The range parameter controls the sparsity pattern of the pairwise distances, which can greatly affect the computational performance of the GHC algorithm. We recommend choosing a range that is small enough to connect nearby samples. For example, clustering data over a 100x100 Cartesian grid with unit spacing is possible with λ=1.0 or λ=2.0 but the problem starts to become computationally unfeasible around λ=10.0 due to the density of points.
𝒞 = Ω |> GHC(20, 1.0)

viz(𝒞.geometry, color = 𝒞.CLUSTER)
GeoStatsTransforms.GSCType
GSC(k, m; σ=1.0, tol=1e-4, maxiter=10, weights=nothing, as=:CLUSTER)

A transform for partitioning geospatial data into k clusters using Geostatistical Spectral Clustering (GSC).

Parameters

  • k - Desired number of clusters
  • m - Multiplicative factor for adjacent weights
  • σ - Standard deviation for exponential model (default to 1.0)
  • tol - Tolerance of k-means algorithm (default to 1e-4)
  • maxiter - Maximum number of iterations (default to 10)
  • weights - Dictionary with weights for each attribute (default to nothing)
  • as - Cluster column name

References

Notes

  • The algorithm implemented here is slightly different than the algorithm

described in Romary et al. 2015. Instead of setting Wᵢⱼ = 0 when i <-/-> j, we simply magnify the weight by a multiplicative factor Wᵢⱼ *= m when i <–> j. This leads to dense matrices but also better results in practice.

𝒞 = Ω |> GSC(50, 2.0)

viz(𝒞.geometry, color = 𝒞.CLUSTER)
GeoStatsTransforms.SLICType
SLIC(k, m; tol=1e-4, maxiter=10, weights=nothing, as=:CLUSTER)

A transform for clustering geospatial data into approximately k clusters using Simple Linear Iterative Clustering (SLIC). The transform produces clusters of samples that are spatially connected based on a distance dₛ and that, at the same time, are similar in terms of vars with distance dᵥ. The tradeoff is controlled with a hyperparameter parameter m in an additive model dₜ = √(dᵥ² + m²(dₛ/s)²).

Parameters

  • k - Approximate number of clusters
  • m - Hyperparameter of SLIC model
  • tol - Tolerance of k-means algorithm (default to 1e-4)
  • maxiter - Maximum number of iterations (default to 10)
  • weights - Dictionary with weights for each attribute (default to nothing)
  • as - Cluster column name

References

𝒞 = Ω |> SLIC(50, 0.01)

viz(𝒞.geometry, color = 𝒞.CLUSTER)

Interpolate

GeoStatsTransforms.InterpolateType
Interpolate(domain, vars₁ => model₁, ..., varsₙ => modelₙ; [parameters])
Interpolate([g₁, g₂, ..., gₙ], vars₁ => model₁, ..., varsₙ => modelₙ; [parameters])

Interpolate geospatial data on given domain or vector of geometries [g₁, g₂, ..., gₙ], using geostatistical models model₁, ..., modelₙ for variables vars₁, ..., varsₙ.

Interpolate(domain, model=NN(); [parameters])
Interpolate([g₁, g₂, ..., gₙ], model=NN(); [parameters])

Interpolate geospatial data on given domain or vector of geometries [g₁, g₂, ..., gₙ], using geostatistical model for all variables.

Parameters

  • point - Perform interpolation on point support (default to true)
  • prob - Perform probabilistic interpolation (default to false)

See also InterpolateNeighbors, InterpolateMissing, InterpolateNaN.

GeoStatsTransforms.InterpolateNeighborsType
InterpolateNeighbors(domain, vars₁ => model₁, ..., varsₙ => modelₙ; [parameters])
InterpolateNeighbors([g₁, g₂, ..., gₙ], vars₁ => model₁, ..., varsₙ => modelₙ; [parameters])

Interpolate geospatial data on given domain or set of geometries g₁, g₂, ..., gₙ, using geostatistical models model₁, ..., modelₙ for variables vars₁, ..., varsₙ.

InterpolateNeighbors(domain, model=NN(); [parameters])
InterpolateNeighbors([g₁, g₂, ..., gₙ], model=NN(); [parameters])

Interpolate geospatial data on given domain or set of geometries g₁, g₂, ..., gₙ, using geostatistical model for all variables.

Unlike Interpolate, this transform uses neighbor search methods to fit geostatistical models at each interpolation location with a reduced number of measurements.

Parameters

  • minneighbors - Minimum number of neighbors (default to 1)
  • maxneighbors - Maximum number of neighbors (default to 10)
  • neighborhood - Search neighborhood (default to nothing)
  • distance - A distance defined in Distances.jl (default to Euclidean())
  • point - Perform interpolation on point support (default to true)
  • prob - Perform probabilistic interpolation (default to false)

The maxneighbors parameter can be used to perform interpolation with a subset of measurements per prediction location. If maxneighbors is not provided, then all measurements are used.

Two neighborhood search methods are available:

  • If a neighborhood is provided, local prediction is performed by sliding the neighborhood in the domain.

  • If a neighborhood is not provided, the prediction is performed using maxneighbors nearest neighbors according to distance.

See also Interpolate, InterpolateMissing, InterpolateNaN.

table = (; Z=[1.,0.,1.])
coord = [(25.,25.), (50.,75.), (75.,50.)]
geotable = georef(table, coord)

grid = CartesianGrid(100, 100)

model = Kriging(GaussianVariogram(range=35.))

interp = geotable |> Interpolate(grid, model)

viz(interp.geometry, color = interp.Z)

Simulate

GeoStatsTransforms.SimulateType
Simulate(domain, vars₁ => process₁, ..., varsₙ => processₙ; [parameters])
Simulate(domain, nreals, vars₁ => process₁, ..., varsₙ => processₙ; [parameters])
Simulate([g₁, g₂, ..., gₙ], vars₁ => process₁, ..., varsₙ => processₙ; [parameters])
Simulate([g₁, g₂, ..., gₙ], nreals, vars₁ => process₁, ..., varsₙ => processₙ; [parameters])

Simulate nreals realizations of variables varsᵢ with geostatistical process processᵢ over given domain or vector of geometries [g₁, g₂, ..., gₙ].

The parameters are forwarded to the rand method of the geostatistical processes.

CookieCutter

GeoStatsTransforms.CookieCutterType
CookieCutter(domain, parent => process, var₁ => procmap₁, ..., varₙ => procmapₙ; [parameters])
CookieCutter(domain, nreals, parent => process, var₁ => procmap₁, ..., varₙ => procmapₙ; [parameters])

Simulate nreals realizations of variable parent with geostatistical process process, and each child variable varsᵢ with process map procmapᵢ, over given domain.

The process map must be an iterable of pairs of the form: value => process. Each process in the map is related to a value of the parent realization, therefore the values of the child variables will be chosen according to the values of the corresponding parent realization.

The parameters are forwarded to the rand method of the geostatistical processes.

Examples

parent = QuiltingProcess(trainimg, (30, 30))
child0 = GaussianProcess(SphericalVariogram(range=20.0, sill=0.2))
child1 = GaussianProcess(SphericalVariogram(MetricBall((200.0, 20.0))))
transform = CookieCutter(domain, :parent => parent, :child => [0 => child0, 1 => child1])