# Basic workflow

A basic geostatistical workflow often consists of three steps:

1. Manipulation of geospatial data
2. Definition of geostatistical problem
3. Visualization of problem solution

In this section, we walk through these steps to illustrate some of the features of the project. Although we use Plots.jl for visualization, we could have used Makie.jl instead. Please check MeshViz.jl for 3D visualization examples.

## Manipulating data

The workflow starts with the creation of geospatial data, which can be loaded from disk or derived from other Julia variables. For example, given a Julia array (or image), which is not attached to any particular coordinate system:

using Plots

Z = [10sin(i/10) + j for i in 1:100, j in 1:200]

heatmap(Z)

We can georeference the array using the georef function:

using GeoStats

Ω = georef((Z=Z,))
20000 MeshData{2,Float64}
variables (rank 2)
└─Z (Float64)
domain: 100×200 CartesianGrid{2,Float64}

The origin and spacing of samples can be specified with:

georef((Z=Z,), origin=(1.,1.), spacing=(10.,10.))
20000 MeshData{2,Float64}
variables (rank 2)
└─Z (Float64)
domain: 100×200 CartesianGrid{2,Float64}

and different geospatial configurations can be obtained with different methods (see Data).

We plot the geospatial data and note a few differences compared to the plot shown above:

plot(Ω)

First, we note that the image was rotated to match the first index i of the array with the horizontal x axis, and the second index j of the array with the vertical y axis. Second, we note that the image was stretched to reflect the real 100x200 size of the CartesianGrid.

### Tabular access

Our geospatial data implements the Tables.jl interface, which means that they can be accessed as if they were tables with samples in the rows and variables in the columns. In this case, a special column named geometry is created on the fly, row by row, containing Meshes.jl geometries:

using Tables

first(Tables.rows(Ω), 3)
3-element Vector{Any}:
(Z = 1.9983341664682817, geometry = Quadrangle(Point(0.0, 0.0), Point(1.0, 0.0), Point(1.0, 1.0), Point(0.0, 1.0)))
(Z = 2.9866933079506124, geometry = Quadrangle(Point(1.0, 0.0), Point(2.0, 0.0), Point(2.0, 1.0), Point(1.0, 1.0)))
(Z = 3.9552020666133956, geometry = Quadrangle(Point(2.0, 0.0), Point(3.0, 0.0), Point(3.0, 1.0), Point(2.0, 1.0)))

We can design advanced geospatial queries using the Query.jl language:

using Query

t = Ω |> @mutate(geometry = centroid(_.geometry))
Zgeometry
1.99833Point(0.5, 0.5)
2.98669Point(1.5, 0.5)
3.9552Point(2.5, 0.5)
4.89418Point(3.5, 0.5)
5.79426Point(4.5, 0.5)
6.64642Point(5.5, 0.5)
7.44218Point(6.5, 0.5)
8.17356Point(7.5, 0.5)
8.83327Point(8.5, 0.5)
9.41471Point(9.5, 0.5)

... with 19990 more rows.

and convert the resulting table to geospatial data if a geometry column is present:

t |> GeoData
20000 MeshData{2,Float64}
variables (rank 0)
└─Z (Float64)
domain: 20000 PointSet{2,Float64}

For convenience, we can retrieve individual columns as vectors:

Ω.Z
20000-element Vector{Float64}:
1.9983341664682817
2.9866933079506124
3.9552020666133956
4.894183423086505
5.79425538604203
6.6464247339503535
7.44217687237691
8.173560908995228
8.833269096274833
9.414709848078965
⋮
202.22889914100247
201.2445442350706
200.2477542545336
199.24848879538192
198.2567321877702
197.28239373589057
196.33520870748072
195.4246410622468
194.5597888911063

or as an array with the correct shape using the asarray function:

asarray(Ω, :Z)
100×200 Matrix{Float64}:
1.99833    2.99833    3.99833   …  197.998  198.998  199.998  200.998
2.98669    3.98669    4.98669      198.987  199.987  200.987  201.987
3.9552     4.9552     5.9552       199.955  200.955  201.955  202.955
4.89418    5.89418    6.89418      200.894  201.894  202.894  203.894
5.79426    6.79426    7.79426      201.794  202.794  203.794  204.794
6.64642    7.64642    8.64642   …  202.646  203.646  204.646  205.646
7.44218    8.44218    9.44218      203.442  204.442  205.442  206.442
8.17356    9.17356   10.1736       204.174  205.174  206.174  207.174
8.83327    9.83327   10.8333       204.833  205.833  206.833  207.833
9.41471   10.4147    11.4147       205.415  206.415  207.415  208.415
⋮                               ⋱
3.2289     4.2289     5.2289       199.229  200.229  201.229  202.229
2.24454    3.24454    4.24454      198.245  199.245  200.245  201.245
1.24775    2.24775    3.24775      197.248  198.248  199.248  200.248
0.248489   1.24849    2.24849      196.248  197.248  198.248  199.248
-0.743268   0.256732   1.25673   …  195.257  196.257  197.257  198.257
-1.71761   -0.717606   0.282394     194.282  195.282  196.282  197.282
-2.66479   -1.66479   -0.664791     193.335  194.335  195.335  196.335
-3.57536   -2.57536   -1.57536      192.425  193.425  194.425  195.425
-4.44021   -3.44021   -2.44021      191.56   192.56   193.56   194.56

### Tabular transforms

The TableTransforms.jl package can be used to design advanced pipelines with the attribute table:

quant = values(Ω) |> Quantile()

histogram(quant.Z, color=:gray80, label="quantile")

The transformed table can be georeferenced for further geostatistical modeling:

georef(quant, domain(Ω))
20000 MeshData{2,Float64}
variables (rank 2)
└─Z (Float64)
domain: 100×200 CartesianGrid{2,Float64}

These pipelines are revertible meaning that one can transform the data, perform geostatistical modeling, and revert the pipelines to obtain estimates in the original sample space.

### Geospatial views

Geospatial data can be viewed at a subset of locations without unnecessary memory allocations:

Ωᵥ = view(Ω, 1:10*100)

plot(Ωᵥ)

We plot a random view of the grid to emphasize that views do not preserve geospatial regularity:

inds = rand(1:nelements(Ω), 100)

plot(view(Ω, inds))

### Geospatial partitions

Geospatial data can be partitioned with various efficient methods. To demonstrate the operation, we partition our geospatial data view into balls of given radius:

Π = partition(Ωᵥ, BallPartition(5.))

plot(Π)

or, alternatively, into two halfspaces:

Π = partition(Ωᵥ, BisectFractionPartition((1.,1.), 0.5))

plot(Π)