# Geospatial data

## Overview

Given a table or array containing data, we can georeference these objects onto a geospatial domain with the `georef`

function. For a list of available domains, please see Domains.

`GeoStatsBase.georef`

— Function`georef(table, domain)`

Georeference `table`

on geospatial `domain`

.

`georef(table, coords)`

Georeference `table`

on a `PointSet(coords)`

.

`georef(table, coordnames)`

Georeference `table`

using columns `coordnames`

.

`georef(tuple, domain)`

Georeference named `tuple`

on geospatial `domain`

.

`georef(tuple, coords)`

Georefrence named `tuple`

on `PointSet(coords)`

.

`georef(tuple; origin=(0.,0.,...), spacing=(1.,1.,...))`

Georeference named `tuple`

on `CartesianGrid(size(tuple[1]), origin, spacing)`

.

In the opposite direction, the functions `values`

and `domain`

can be used to retrieve the table of attributes and the underlying geospatial domain.

`Base.values`

— Function`values(data, [rank])`

Return the values of `data`

for a given `rank`

as a table.

The rank is a non-negative integer that specifies the parametric dimension of the geometries of interest:

- 0 - points
- 1 - segments
- 2 - triangles, quadrangles, ...
- 3 - tetrahedrons, hexahedrons, ...

If the rank is not specified, it is assumed to be the rank of the elements of the domain.

`Meshes.domain`

— Function`domain(data)`

Return underlying domain of the `data`

.

## Examples

### Tables

Consider a table (e.g. DataFrame) with 25 samples of temperature and precipitation:

```
using DataFrames
table = DataFrame(T=rand(25), P=rand(25))
```

25 rows × 2 columns

T | P | |
---|---|---|

Float64 | Float64 | |

1 | 0.70417 | 0.747171 |

2 | 0.421527 | 0.864585 |

3 | 0.688692 | 0.21695 |

4 | 0.393807 | 0.199692 |

5 | 0.178637 | 0.969165 |

6 | 0.614038 | 0.550512 |

7 | 0.619698 | 0.994584 |

8 | 0.313715 | 0.228739 |

9 | 0.25812 | 0.252486 |

10 | 0.0140286 | 0.899314 |

11 | 0.503665 | 0.695648 |

12 | 0.637951 | 0.625345 |

13 | 0.456498 | 0.977333 |

14 | 0.53485 | 0.0600936 |

15 | 0.966474 | 0.055822 |

16 | 0.722687 | 0.185636 |

17 | 0.2105 | 0.365988 |

18 | 0.0332188 | 0.759673 |

19 | 0.726096 | 0.686092 |

20 | 0.0872569 | 0.882196 |

21 | 0.515131 | 0.39207 |

22 | 0.432542 | 0.739041 |

23 | 0.947375 | 0.799289 |

24 | 0.324698 | 0.495992 |

25 | 0.843534 | 0.142729 |

We can georeference this table based on a given set of coordinates:

```
𝒟 = georef(table, PointSet(rand(2,25)))
plot(𝒟)
```

or alternatively, georeference it on a 5x5 regular grid (5x5 = 25 samples):

```
𝒟 = georef(table, CartesianGrid(5,5))
plot(𝒟)
```

In the first case, the `PointSet`

domain type can be omitted, and GeoStats.jl will understand that the matrix passed as the second argument contains the coordinates of a point set:

`𝒟 = georef(table, rand(2,25))`

```
25 MeshData{2,Float64}
variables (rank 0)
└─P (Float64)
└─T (Float64)
domain: 25 PointSet{2,Float64}
```

Another common pattern in spatial data sets is when the coordinates of the samples are already part of the table as columns. In this case, we can specify the column names as symbols:

```
table = DataFrame(T=rand(25), P=rand(25), X=rand(25), Y=rand(25), Z=rand(25))
𝒟 = georef(table, (:X,:Y,:Z))
plot(𝒟)
```

### Arrays

Consider arrays (e.g. images) with data for various spatial variables. We can georeference these arrays using a named tuple, and GeoStats.jl will understand that the shape of the arrays should be preserved in a Cartesian grid:

```
T, P = rand(5,5), rand(5,5)
𝒟 = georef((T=T, P=P))
plot(𝒟)
```

We can also specify the origin and spacing of the grid using keyword arguments:

```
𝒟₁ = georef((T=T, P=P), origin=(0.,0.), spacing=(1.,1.))
𝒟₂ = georef((T=T, P=P), origin=(10.,10.), spacing=(2.,2.))
plot(𝒟₁)
plot!(𝒟₂)
```

Alternatively, we can interpret the entries of the named tuple as columns in a table:

```
𝒟 = georef((T=T, P=T), rand(2,25))
plot(𝒟)
```

### Shapefiles

The GeoTables.jl package can be used to load geospatial data from various file formats. It also provides utility functions to automatically download maps given the name of any region in the world.

We can load a shapefile as a geospatial table that is compatible with the framework:

```
using GeoTables
zone = GeoTables.load("data/zone.shp")
```

```
4 GeoTable{2,Float64}
variables (rank 2)
└─ACRES (Float64)
└─Hectares (Float64)
└─MACROZONA (String)
└─PERIMETER (Float64)
└─area_m2 (Float64)
domain: 4 GeometrySet{2,Float64}
```

`path = GeoTables.load("data/path.shp")`

```
6 GeoTable{2,Float64}
variables (rank 2)
└─ZONA (String)
domain: 6 GeometrySet{2,Float64}
```

Unlike in previous examples where each row of the table was associated with simple geometries (e.g. `Point`

or `Quadrangle`

), here we have more complicated geometries to consider:

`zone.geometry`

```
4-element Vector{Multi{2, Float64, PolyArea{2, Float64, Chain{2, Float64, Vector{Point2}}}}}:
21 Multi-PolyArea{2,Float64}
1 Multi-PolyArea{2,Float64}
1 Multi-PolyArea{2,Float64}
2 Multi-PolyArea{2,Float64}
```

We can visualize these geometries as the domain of the geospatial data as usual:

```
plot(domain(zone), fill=true, color=:gray)
plot!(domain(path), fill=true, color=:gray90)
```

and most importantly, *nothing special needs to be done*. This geospatial table can be used with any geostatistical workflow.

## Custom data

GeoStats.jl is integrated with the Meshes.jl project. In summary, any type that is a subtype of `Meshes.Data`

and that implements `Meshes.domain`

and `Meshes.values`

is compatible with the framework and can be used in geostatistical workflows.

Please ask in our community channels if you need help implementing custom geospatial data types.