Understanding STAC
The SpatioTemporal Asset Catalog (STAC) specification provides a common structure for describing and cataloging spatiotemporal data products in a uniform, extensible manner. It is composed of four semi-independent components that can be used alone but work best in concert with each other. The four components are as follows:
- STAC Catalog
  - The Catalog is a simple, extensible collection of links that provides an organizational structure for browsing available assets.
  - NOTE: Each STAC API has a root Catalog that presents the API structure.
- STAC Collection
  - The Collection is an extension of the Catalog with additional descriptive fields (extents, licensing, keywords, providers, etc.) that describe the assets contained in the Collection.
- STAC Item
  - The Item is a single, atomic unit that represents a spatiotemporal asset as a GeoJSON feature along with temporal extents and other fields.
  - This can be a single LAS/LAZ file, a GeoTIFF, or any other data product.
- STAC API
  - The API specification (provided via OpenAPI) details the required RESTful endpoints exposed by any STAC-compliant API.
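To make the Item component concrete, here is a minimal sketch of a STAC Item built as a plain Python dictionary. The field layout follows the STAC Item specification (a GeoJSON Feature with `stac_version`, `links`, and `assets`), but the helper name `make_item`, the item ID, the bounding box, the asset URL, and the generic media type are all hypothetical values chosen for illustration.

```python
import json


def make_item(item_id, bbox, datetime_utc, asset_href, media_type):
    """Build a minimal STAC Item (a GeoJSON Feature) as a plain dict."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint as a closed polygon ring derived from the bounding box.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # "datetime" is the one property every Item must carry.
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {"data": {"href": asset_href, "type": media_type}},
    }


# Hypothetical LAZ tile; every value below is a placeholder.
item = make_item(
    "tile-0001",
    [-75.8, 45.3, -75.6, 45.5],
    "2018-02-12T00:00:00Z",
    "https://example.com/lidar/tile-0001.laz",
    "application/octet-stream",
)
print(json.dumps(item, indent=2))
```

A real Item would normally also carry a `collection` field and links back to its parent Collection; those are left out here to keep the sketch short.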
This STAC structure allows data consumers to efficiently search for data within any given spatial extent, temporal range, or a combination of the two. Data consumers can maintain a list of multiple STAC APIs and send the same request to the /search endpoint of each API to retrieve all available data matching a given spatiotemporal query.
While this structure is powerful and enables seamless integration between multiple providers, it does not provide a data organization schema beyond requiring that each Item have a valid spatiotemporal extent. Data providers and consumers can still realize significant value by implementing an organization schema that matches their specific business needs.
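As a rough sketch of that fan-out, the snippet below builds the same GET search URL for several STAC APIs using the standard `bbox`, `datetime`, and `limit` query parameters of the /search endpoint. The API roots, the bounding box, and the helper name `search_urls` are assumptions made for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical STAC API roots; substitute the providers you actually use.
API_ROOTS = ["https://stac.example.com", "https://stac.example.org/"]


def search_urls(api_roots, bbox, start, end, limit=100):
    """Return one GET /search URL per API for the same spatiotemporal query."""
    query = urlencode({
        "bbox": ",".join(str(v) for v in bbox),  # west,south,east,north
        "datetime": f"{start}/{end}",            # closed RFC 3339 interval
        "limit": limit,
    })
    return [f"{root.rstrip('/')}/search?{query}" for root in api_roots]


urls = search_urls(
    API_ROOTS,
    [-75.8, 45.3, -75.6, 45.5],
    "2018-02-12T00:00:00Z",
    "2018-03-18T23:59:59Z",
)
for url in urls:
    print(url)
```

Each URL could then be fetched with any HTTP client and the returned Items merged on the consumer side.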
Indexing LiDAR with STAC
LiDAR point-cloud datasets are acquired and used in myriad business contexts and can yield a significant amount of actionable intelligence when indexed efficiently. Every LiDAR dataset has a spatial extent at a minimum, and most modern scanners also encode a GPS timestamp onto each return within the resultant point cloud. This makes LiDAR an ideal candidate for STAC: ingesting each acquisition project into its own STAC Collection ensures each file can be easily retrieved and analyzed at any time.
As an example, imagine a client who has acquired LiDAR datasets over the same spatial region for several years. Assume the LiDAR had sufficient density to derive a DEM as well as some basic delineation of manmade features (buildings, utility assets, etc.). If the client took the effort to generate a single STAC Collection for each year of acquisition and included each resultant LiDAR file and DEM as a STAC Item, they could then present those Collections through a single STAC API.
Now, say an internal or external analyst is interested in performing change detection analysis on the DEMs across several years within a constrained spatial region-of-interest (ROI). They would compose a single JSON query as follows:
This query tells the STAC API to retrieve all STAC Items between 12 FEB 2018 and 18 MAR 2018 within this spatial region:
This JSON query would be included in the HTTP request body of a single POST request to the STAC API’s /search endpoint (the GET form of /search takes the same parameters as a query string rather than a body). The API would then return a paged GeoJSON FeatureCollection of all STAC Items that intersect the given spatiotemporal query, and the analyst can then perform their work.
The power of STAC is that it enables providers and consumers to agree upon a simple ‘search’ contract so that distributed teams can easily publish and retrieve data without adding development overhead or risk.
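For illustration, a search like this could be assembled as follows. STAC API Item Search accepts a JSON body on POST /search, with `bbox` and `datetime` as the spatial and temporal filters. The date range below comes from the text, while the bounding box, the API root, and the helper name `build_search_request` are placeholders, since the original region is not reproduced here. The request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_search_request(api_root, bbox, start, end, limit=100):
    """Build (but do not send) a POST /search request with a JSON body."""
    body = {
        "bbox": bbox,                  # [west, south, east, north]
        "datetime": f"{start}/{end}",  # closed RFC 3339 interval
        "limit": limit,                # page size requested from the API
    }
    return Request(
        url=f"{api_root.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "https://example.stac.com",        # hypothetical API root
    [-75.8, 45.3, -75.6, 45.5],        # hypothetical region of interest
    "2018-02-12T00:00:00Z",
    "2018-03-18T23:59:59Z",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it. The response's `features` array holds the matching Items, and a link with `rel` set to `next`, when present, points at the following page.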
Combining STAC with H3
In many contexts, clients leverage existing organization schemes (such as discrete global grid systems) to aggregate and organize both their raw data and their derived intelligence. Since STAC doesn’t enforce an organization scheme, clients can seamlessly integrate STAC into the scheme they already use. While the STAC API’s /search endpoint allows consumers to search for data within a given spatiotemporal query, the API also allows consumers to retrieve a specific file directly, using an identifier drawn from their existing organizational scheme.
Take the case of Uber’s H3, which is a hierarchical, hexagonal spatial indexing scheme that exposes functionality to retrieve the unique index of the covering hexagon for any geographic location and resolution. Taking the location (45.424721, -75.695000) at a resolution of 6, there is a corresponding hexagon ID of 862b83b0fffffff.
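A lookup like this might be scripted with the third-party h3 Python package, assuming it is installed. The function that maps a latitude/longitude to a cell is named `latlng_to_cell` in the 4.x releases and `geo_to_h3` in the 3.x releases, so this sketch tries both. The wrapper name `covering_cell` is hypothetical.

```python
try:
    import h3  # third-party package, assumed installed (pip install h3)
except ImportError:
    h3 = None


def covering_cell(lat, lng, resolution):
    """Return the H3 cell ID (hex string) covering a latitude/longitude."""
    if h3 is None:
        raise RuntimeError("the h3 package is required: pip install h3")
    if hasattr(h3, "latlng_to_cell"):
        # Function name used by h3-py 4.x.
        return h3.latlng_to_cell(lat, lng, resolution)
    # Function name used by h3-py 3.x.
    return h3.geo_to_h3(lat, lng, resolution)


if h3 is not None:
    # Location and resolution taken from the example in the text.
    print(covering_cell(45.424721, -75.695000, 6))
```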
That’s great, but does that mean we don’t need STAC to find the covering hexagon?
Technically, we don’t. However, there is another approach. One could ingest the data associated with that hexagon as a STAC Item whose ID matches the hexagon ID, then include all such Items within a Collection named ‘h3’ and add that Collection to the STAC API.
Doing so allows one to directly retrieve the item via a GET request to the following URL:
- example.stac.com/collections/h3/items/862b83b0fffffff
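As a small sketch, the direct-retrieval URL can be derived from a hexagon ID using the standard STAC API path layout of /collections/{collectionId}/items/{itemId}. The `https` scheme and the helper name `item_url` are assumptions; the host, Collection name, and hexagon ID are the ones used in the text.

```python
from urllib.parse import quote


def item_url(api_root, collection_id, item_id):
    """Build the URL of a single Item inside a Collection of a STAC API."""
    return "/".join([
        api_root.rstrip("/"),
        "collections",
        quote(collection_id, safe=""),  # escape any reserved characters
        "items",
        quote(item_id, safe=""),
    ])


# The scheme is assumed; the text gives the host without one.
url = item_url("https://example.stac.com", "h3", "862b83b0fffffff")
print(url)
```

A GET request to that URL would return the Item as a single GeoJSON Feature, with no search step involved.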
This method gives us the best of both H3 and STAC: clients retain direct retrieval by hexagon ID, while consumers who aren’t using that specific organizational scheme can still search for any product covering their ROI at no added cost.