## Spatial Index: Grid Systems

This post is a continuation of Stomping Grounds: Spatial Indexes, but don’t worry if you missed the first part—you’ll still find plenty of new insights right here.

### 3. Geohash

Geohash: Invented in 2008 by Gustavo Niemeyer, encodes a geographic location into a short string of letters and digits. It's a hierarchical spatial data structure that subdivides space into buckets of grid shape using a Z-order curve (Section 2.).

## 3.1. Geohash - Intuition

Earth is round or more accurately, an ellipsoid. Map projection is a set of transformations represent the globe on a plane. In a map projection. Coordinates (latitude and longitude) of locations from the surface of the globe are transformed to coordinates on a plane. And GeoHash Uses Equirectangular projection

Figure 21: Equirectangular projection/ Equidistant Cylindrical Projection

The core of GeoHash is just an clever use of Z-order curves. Split the map-projection (rectangle) into 2 equal rectangles, each identified by unique bit strings.

Figure 22: GeoHash Level 1 - Computation

Observation: the divisions along X and Y axes are interleaved between bit strings. For example: an arbitrary bit string `01110 01011 00000`

, follows:

By futher encoding this to Base32 (`0123456789bcdefghjkmnpqrstuvwxyz`

), we map a unique string to a quadrant in a grid and quadrants that share the same prefix are closer to each other; e.g. `000000`

and `000001`

. By now we know that interleaving trace out a Z-order curve.

Figure 23: GeoHash Level 1 - Z-Order Curve

Higher levels (higher order z-curves) lead to higher precision. The geohash algorithm can be iteratively repeated for higher precision. That's one cool property of geohash, adding more characters increase precision of the location.

Figure 24: GeoHash Level 2

Despite the easy implementation and wide usage of geohash, it inherits the disadvantages of Z-order curves (Section 2.5): weakly preserved latitude-longitude proximity; does not always guarantee that locations that are physically close are also close on the Z-curve.

Adding on to it, is the use of equirectangular projection, where the division of the map into equal subspaces leads to unequal/disproportional surface areas, especially near the poles (northern and southern hemisphere). However, there are alternatives such as Geohash-EAS (Equal-Area Spaces).

## 3.2. Geohash - Implementation

To Convert a geographical location (latitude, longitude) into a concise string of characters and vice versa:

- Convert latitude and longitude to a binary strings.
- Interleave the binary strings of latitude and longitude.
- Geohash: Convert the interleaved binary string into a base32 string.

## 3.2a. Geohash Encoder - Snippet

```
public class GeohashEncoder {
public static String encodeGeohash(double latitude, double longitude, int precision) {
// 1. Convert Lat and Long into a binary string based on the range.
String latBin = convertToBinary(latitude, -90, 90, precision * 5 / 2);
String lonBin = convertToBinary(longitude, -180, 180, precision * 5 / 2);
// 2. Interweave the binary strings.
String interwovenBin = interweave(lonBin, latBin);
// 3. Converts a binary string to a base32 geohash.
String geohash = binaryToBase32(interwovenBin);
return geohash.substring(0, precision);
}
private static String convertToBinary(double value, double min, double max, int precision) {
StringBuilder binaryStr = new StringBuilder();
for (int i = 0; i < precision; i++) {
double mid = (min + max) / 2;
if (value >= mid) {
binaryStr.append('1');
min = mid;
} else {
binaryStr.append('0');
max = mid;
}
}
return binaryStr.toString();
}
private static String interweave(String str1, String str2) {
StringBuilder interwoven = new StringBuilder();
for (int i = 0; i < str1.length(); i++) {
interwoven.append(str1.charAt(i));
interwoven.append(str2.charAt(i));
}
return interwoven.toString();
}
private static String binaryToBase32(String binaryStr) {
String base32Alphabet = "0123456789bcdefghjkmnpqrstuvwxyz";
StringBuilder base32Str = new StringBuilder();
for (int i = 0; i < binaryStr.length(); i += 5) {
String chunk = binaryStr.substring(i, Math.min(i + 5, binaryStr.length()));
int decimalVal = Integer.parseInt(chunk, 2);
base32Str.append(base32Alphabet.charAt(decimalVal));
}
return base32Str.toString();
}
public static void main(String[] args) {
double latitude = 37.7749;
double longitude = -122.4194;
int precision = 5;
String geohash = encodeGeohash(latitude, longitude, precision);
System.out.println("Geohash: " + geohash);
}
}
```

## 3.3. Geohash - Conclusion

Similar to Section 2.7 (Indexing the Z-values); Geohashes convert latitude and longitude into a single, sortable string, simplifying spatial data management. A B-trees or search tree such as GiST/SP-GiST (Generalized Search Tree) index are commonly used for geohash indexing in databases.

Prefix Search: Nearby locations share common geohash prefixes, enabling efficient filtering of locations by performing prefix searches on the geohash column

Neighbor Searches: Generate geohashes for a target location and its neighbors to quickly retrieve nearby points. Which also extends to Area Searches: Calculate geohash ranges that cover a specific area and perform range queries to find all relevant points within that region.

Popular databases such as ClickHouse, MySQL, PostGIS, BigQuery, RedShift and many others offer built-in geohash function. And many variations have been developed, such as the 64-bit Geohash and Hilbert-Geohash

Interactive Geohash Visualization: /geohash

### 4. Google S2

## 4.1. S2 - Intuition

Google's S2 library was released more than 10 years ago and didn't exactly the get the attention it deserved, much later in 2017, Google announced the release of open-source C++ s2geometry library. With the use of Hilbert Curve (Section 2.2) and cube face (spherical) projection instead of geohash's Z-order curve and equirectangular projection; S2 addresses (to an extent) the large jumps (Section 2.5) problem with Z-order curves and disproportional surface areas associated with equirectangular projection.

The core of S2 is the hierarchical decomposition of the sphere into "cells"; done using a Quad-tree, where a quadrant is recursively subdivided into four equal sub-cells and the use of Hilbet Curve goes hand-in-hand - runs across the centers of the quad-tree’s leaf nodes.

## 4.2. S2 - Implementation

The overview of solution is to:

- Enclose sphere in cube
- Project point(s)
`p`

onto the cube - Build a quad-tree/hilbert-curve on each cube face (6 faces)
- Assign ID to the quad-tree cell that contains the projection of point(s)
`p`

Starting with the input co-ordinates, latitude (Degrees: -90° to +90°. Radians: -π/2 to π/2) and longitude (-180° to +180°. Radians: 0 to 2π). And WGS84 is a commmonly standard used in geocentric coordinate system.

### 4.2.1. (Lat, Long) to (X,Y,Z)

Covert `p = (lattitude,longitude) => (x,y,z)`

XYZ co-ordinate system (`x = [-1.0, 1.0], y = [-1.0, 1.0], z = [-1.0, -1.0]`

), based on coordinates on the unit sphere (unit radius), which is similar to Earth-centered, Earth-fixed coordinate system.

Figure 25: (lat, long) to (x, y, z) Transformation with ECEF

Where, `(x, y, z)`

: X-axis at latitude 0°, longitude 0° (equator and prime meridian intersection), Y-axis at latitude 0°, longitude 90° (equator and 90°E meridian intersection), Z-axis at latitude 90° (North Pole), Altitude (`PM`

on Figure 25) = Height to the reference ellipsoid/Sphere (Zero for a Round Planet approximation)

### 4.2.2. (X,Y,Z) to (Face,U,V)

To map `(x,y,z)`

to `(face, u,v)`

, each of the six faces of the cube is projected onto the sphere. The process is similar to UV Mapping: to project 3D model surface into a 2D coordinate space. where `u`

and `v`

denote the axes of the 2D plane. In this case, `U,V`

represent the location of a point on one face of the cube.

The projection can simply be imagined as a unit sphere circumscribed by a cube. And a ray is emitted from the center of the sphere to obtain the projection of the point on the sphere to the 6 faces of the cube, that is, the sphere is projected into a cube.

Figure 26: (lat, long) to (x, y, z) and (x, y, z) to (face, u, v)

The `face`

denotes which of the 6 (0 to 5) cube faces a point on the sphere is mapped onto. Figure 27, shows the 6 faces of the cube (cube mapping) after the projection. For a unit-sphere, for each face, the point `u,v = (0,0)`

represent the center of the face.

Figure 27: Cube Face (Spherical) Projection

The evident problem here is that, the linear projection leads to same-area cells on the cube having different sizes on the sphere (Length and Area Distortion), with the ratio of highest to lowest area of `5.2`

(areas on the cube can be up to 5.2 times longer or shorter than the corresponding distances on the sphere).

## 4.2.2a. S2 FaceXYZ to UV - Snippet

```
public static class Vector3 {
public double x;
public double y;
public double z;
public Vector3(double x, double y, double z) {
this.x = x;
this.y = y;
this.z = z;
}
}
public static int findFace(Vector3 r) {
double absX = Math.abs(r.x);
double absY = Math.abs(r.y);
double absZ = Math.abs(r.z);
if (absX >= absY && absX >= absZ) {
return r.x > 0 ? 0 : 3;
} else if (absY >= absX && absY >= absZ) {
return r.y > 0 ? 1 : 4;
} else {
return r.z > 0 ? 2 : 5;
}
}
public static double[] validFaceXYZToUV(int face, Vector3 r) {
switch (face) {
case 0:
return new double[]{r.y / r.x, r.z / r.x};
case 1:
return new double[]{-r.x / r.y, r.z / r.y};
case 2:
return new double[]{-r.x / r.z, -r.y / r.z};
case 3:
return new double[]{r.z / r.x, r.y / r.x};
case 4:
return new double[]{r.z / r.y, -r.x / r.y};
default:
return new double[]{-r.y / r.z, -r.x / r.z};
}
}
public static void main(String[] args) {
Vector3 r = new Vector3(1.0, 2.0, 3.0);
int face = 0;
double[] uv = validFaceXYZToUV(face, r);
System.out.println("u: " + uv[0] + ", v: " + uv[1]);
}
```

The Cube `Face`

is the largest absolute X,Y,Z component, when component is -ve, back faces are used.

Face and XYZ is mapped to UV by using the other two X, Y, Z components (other than largest component of face) and diving it by the largest component, a value between `[-1, 1]`

. Additionally, some faces of the cube are transposed (-ve) to produce the single continuous hilbert curve on the cube.

### 4.2.3. (Face,U,V) to (Face,S,T)

The ST coordinate system is an extension of UV with an additional non-linear transformation layer to address the (Area Preservation) disproportionate sphere surface-area to cube cell mapping. Without which, cells near the cube face edges would be smaller than those near the cube face centers.

Figure 28: (u, v) to (s, t)

S2 uses Quadratic projection for `(u,v)`

=> `(s,t)`

. Comparing `tan`

and `quadratic`

projections: The tan projection has the least Area/Distance Distortion. However, quadratic projection, which is an approximation of the tan projection - is much faster and almost as good as tangent.

Area Ratio | Cell → Point (µs) | Point → Cell (µs) | |

Linear | 5.20 | 0.087 | 0.085 |

Tangent | 1.41 | 0.299 | 0.258 |

Quadratic | 2.08 | 0.096 | 0.108 |

`Cell → Point`

and `Point → Cell`

represents the transformation from (U, V) to (S, T) coordinates and vice versa.

Figure 29: (face, u, v) to (face, s, t); for face = 0

For the quadratic transformation: Apply a square root transformation; sqrt(1 + 3 * u) and to maintain the uniformity of the grid cells

## 4.2.3a. S2 UV to ST - Snippet

```
public static double uvToST(double u) {
if (u >= 0) {
return 0.5 * Math.sqrt(1 + 3 * u);
} else {
return 1 - 0.5 * Math.sqrt(1 - 3 * u);
}
}
public static void main(String[] args) {
// (u, v) values in the range [-1, 1]
double u1 = 0.5;
double v1 = -0.5;
// Convert (u, v) to (s, t)
double s1 = uvToST(u1);
double t1 = uvToST(v1);
System.out.println("For (u, v) = (" + u1 + ", " + v1 + "):");
System.out.println("s: " + s1);
System.out.println("t: " + t1);
}
```

### 4.2.4. (Face,S,T) to (Face,I,J)

The IJ coordinates are discretized ST coordinates and divides the ST plane into `2`

, i.e. the i and j coordinates in S2 range from ^{30} × 2^{30}`0 to 2`

. And represent the two dimensions of the leaf-cells (lowest-level cells) on a cube face.^{30} - 1

Why 2^{30}? The i and j coordinates are each represented using 30 bits, which is `2`

distinct values for both i and j coordinates (every cm² of the earth), this large range allows precise positioning within each face of the cube (high spatial resolution). The total number of unique cells is ^{30}`6 x (2`

^{30} × 2^{30})

Figure 30: (face, s, t) to (face, i, j); for face = 0

## 4.2.4a. S2 ST to IJ - Snippet

```
public static int stToIj(double s) {
return Math.max(
0, Math.min(1073741824 - 1, (int) Math.round(1073741824 * s))
);
}
```

### 4.2.5. (Face,I,J) to S2 Cell ID

The hierarchical sub-division of each cube face into 4 equal quadrants calls for Hilbert Space-Filling Curve (Section 2.2): to enumerate cells along a Hilbert space-filling curve.

Figure 31: (face, i, j) to Hilbert Curve Position

Hilbert Curve preserves spatial locality, meaning, the values that are close on the cube face/surface, are numerically close in the Hilbert curve position (illustration in Figure 31 - Level 3).

Transformation: The Hilbert curve transforms the IJ coordinate position on the cube face from 2D to 1D and is given by a `60 bit`

integer (`0 to 2`

).^{60}

## 4.2.5a. S2 IJ to S2 Cell ID - Snippet

```
public class S2CellId {
private static final long MAX_LEVEL = 30;
private static final long POS_BITS = 2 * MAX_LEVEL + 1;
private static final long FACE_BITS = 3;
private static final long FACE_MASK = (1L << FACE_BITS) - 1;
private static final long POS_MASK = (1L << POS_BITS) - 1;
public static long faceIjToCellId(int face, int i, int j) {
// Face Encoding
long cellId = ((long) face) << POS_BITS;
// Loop from MAX_LEVEL - 1 down to 0
for (int k = MAX_LEVEL - 1; k >= 0; --k) {
// Hierarchical Position Encoding
int mask = 1 << k;
long bits = (((i & mask) != 0) ? 1 : 0) << 1 | (((j & mask) != 0) ? 1 : 0);
cellId |= bits << (2 * k);
}
return cellId;
}
public static void main(String[] args) {
int face = 2;
int i = 536870912;
int j = 536870912;
long cellId = faceIjToCellId(face, i, j);
System.out.println("S2 Cell ID: " + cellId);
}
}
```

The **S2 Cell ID** is represented by a `64-bit`

integer,

- the left
`3 bits`

are used to represent the cube face`[0-5],`

- the next following
`60 bits`

represents the Hilbert Curve position, - with
`[0-30]`

levels; two bits for every higher order/level, followed by a trailing`1`

bit, which is a marker to identify the level of the cell (by position). - and the last digits are padded with 0s

Figure 32: (face, i, j) to S2 Cell ID

```
fffpppp...pppppppp1 # Level 30 cell ID
fffpppp...pppppp100 # Level 29 cell ID
fffpppp...pppp10000 # Level 28 cell ID
...
...
...
fffpp10...000000000 # Level 1 cell ID
fff1000...000000000 # Level 0 cell ID
```

Notice the position of trailing `1`

and padded `0`

s, correlated to the level.

**S2 Tokens** are a string representation of S2 Cell IDs (uint64), which can be more convenient for storage.

## 4.2.5b. S2 Cell ID to S2 Token - Snippet

```
public static String cellIdToToken(long cellId) {
// The zero token is encoded as 'X' rather than as a zero-length string
if (cellId == 0) {
return "X";
}
// Convert cell ID to a hex string and strip any trailing zeros
String hexString = Long.toHexString(cellId).replaceAll("0*$", "");
return hexString;
}
public static void main(String[] args) {
long cellId = 3383821801271328768L; // Given example value
// Convert S2 Cell ID to S2 Token
String token = cellIdToToken(cellId);
System.out.println("S2 Cell ID: " + cellId);
System.out.println("S2 Token: " + token);
}
```

It's similar to Geohash, however, prefixes from a high-order S2 token does not yield a parent lower-order token, because the trailing 1 bit in S2 cell ID wouldn't be set correctly. Convert S2 Cell ID to an S2 Token by encoding the ID into a base-16 (hexadecimal) string.

## 4.3. S2 - Conclusion

Google's S2 provides spatial indexing by using hierarchical decomposition of the sphere into cells through a combination of Hilbert curves and cube face (spherical) projection. This approach mitigates some of the spatial locality issues present in Z-order curves and offers more balanced surface area representations. S2's use of (face, u, v) coordinates, quadratic projection, and Hilbert space-filling curves ensures efficient and precise spatial indexing.

Closing with a strong pro and a con, S2 offers a high resolution of as low as `0.48 cm²`

cell size (level 30), but the number of cells required to cover a given polygon isn't the best. This makes it a good transition to talk about Uber's H3. The question is, Why Hexagons?

## 3. References

```
6. Christian S. Perone, "Google’s S2, geometry on the sphere, cells and Hilbert curve," in Terra Incognita, 14/08/2015, https://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/. [Accessed: 12-Jun-2024].
7. B. Feifke, "Geospatial Indexing Explained," Ben Feifke, Dec. 2022. [Online]. Available: https://benfeifke.com/posts/geospatial-indexing-explained/. [Accessed: 12-Jun-2024].
8. "S2 Concepts," S2 Geometry Library Documentation, 2024. [Online]. Available: https://docs.s2cell.aliddell.com/en/stable/s2_concepts.html. [Accessed: 13-Jun-2024].
9. "Geospatial Indexing: A Look at Google's S2 Library," CNIter Blog, Mar. 2023. [Online]. Available: https://cniter.github.io/posts/720275bd.html. [Accessed: 13-Jun-2024].
10. "S2 Geometry Library," S2 Geometry, 2024. [Online]. Available: https://s2geometry.io/. [Accessed: 13-Jun-2024].
```

Cite this article as: Adesh Nalpet Adimurthy. (Jun 12, 2024). Spatial Index: Grid Systems. PyBlog. https://www.pyblog.xyz/spatial-index-grid-system