Posts Tagged: big
Just trying to get my head around some of the new big raster processors out there, in addition of course to Google Earth Engine. Bear with me (bare?) while I sort through these. Thanks to raster sleuth Stefania Di Tomasso for the legwork.
1. GeoTrellis (https://geotrellis.io/)
GeoTrellis is a Scala-based raster processing engine and one of the first geospatial libraries built on Spark. It can process big datasets: for regional or statewide datasets, users can interact with geospatial data and see results in real time in an interactive web application. For larger raster datasets (e.g., US NED), GeoTrellis performs fast batch processing, using Akka clustering to distribute data across the cluster. GeoTrellis was designed to solve three core problems, with a focus on raster processing:
- Creating scalable, high performance geoprocessing web services;
- Creating distributed geoprocessing services that can act on large data sets; and
- Parallelizing geoprocessing operations to take full advantage of multi-core architecture.
- GeoTrellis is designed to help a developer create simple, standard REST services that return the results of geoprocessing models.
- GeoTrellis will automatically parallelize and optimize your geoprocessing models where possible.
- In the spirit of the object-functional style of Scala, it is easy to both create new operations and compose new operations with existing operations.
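To make the "split the raster up, process the pieces independently, combine the results" idea concrete, here is a minimal Python sketch. It is not GeoTrellis code and does not use its API; the function names (`split_into_tiles`, `tile_stats`, `raster_mean`) are made up for illustration, and the raster is just a list of rows. Threads are used only to keep the sketch portable; a real engine would spread tiles across processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(raster, tile_rows):
    """Cut a raster (a list of rows) into horizontal strips of tile_rows rows."""
    return [raster[i:i + tile_rows] for i in range(0, len(raster), tile_rows)]


def tile_stats(tile):
    """Partial result for one tile: (sum of cell values, number of cells)."""
    cells = [value for row in tile for value in row]
    return sum(cells), len(cells)


def raster_mean(raster, tile_rows=2, workers=4):
    """Mean of all cells, computed tile by tile and then combined."""
    tiles = split_into_tiles(raster, tile_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each tile is processed independently of the others.
        partials = list(pool.map(tile_stats, tiles))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Hypothetical 4 x 3 raster, for illustration only.
raster = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
mean = raster_mean(raster)
```

Because each tile only reports a sum and a count, the combine step is cheap no matter how many tiles there are, which is what makes this pattern scale.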
2. GeoPySpark: in short, GeoTrellis for the Python community
GeoPySpark provides Python bindings for working with geospatial data on PySpark (the Python API for Spark). Spark is an open-source processing engine originally developed at UC Berkeley in 2009. GeoPySpark makes GeoTrellis (https://geotrellis.io/) accessible to the Python community: Scala is a difficult language, so the developers created this Python library.
Great commentary from Martin Isenburg of LAStools fame on releasing data with false precision. It concerns the new open data release by the Environment Agency in England. So far, LiDAR-derived DTM and DSM rasters have been released for 72% of the entire English territory at horizontal resolutions of 50 cm, 1 m, and 2 m. They can be downloaded here. The rasters are distributed as zipped archives of tiles in textual ASC format (*.asc).
Martin gives us a cautionary tale on how not to release national data. It is not the ASC format that he has problems with, but the vertical precision. He says:
"The vertical resolution ranges from femtometers to attometers. This means that the ASCII numbers that specify the elevation for each grid cell are written down with 15 to 17 digits after the decimal point."
Example heights might be something like: 79.9499969482421875 or 80.23999786376953125. These data should be resolved to about the cm, not attometer, whatever that is. Crazy man!
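One plausible explanation for those long digit strings, offered here as an assumption rather than something confirmed by the data provider, is that centimeter-precision heights were stored as 32-bit floats and then printed with every digit of their exact binary value. The Python sketch below checks this for the two example heights; `as_float32` and `exact_text` are helper names made up for illustration.

```python
import struct
from decimal import Decimal


def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def exact_text(x):
    """Exact decimal expansion of the binary value held in x."""
    return str(Decimal(x))


# Assumed "true" heights, in meters, at centimeter precision.
for height in (79.95, 80.24):
    stored = as_float32(height)
    print(height, "->", exact_text(stored), "->", round(stored, 2))
```

Under that assumption, the extra digits carry no measurement information at all: rounding to two decimals recovers the same values in a fraction of the characters.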
An interesting position piece on the appropriate uses of big data for climate resilience. The author, Amy Luers, points out three opportunities and three risks.
She sums up:
"The big data revolution is upon us. How this will contribute to the resilience of human and natural systems remains to be seen. Ultimately, it will depend on what trade-offs we are willing to make. For example, are we willing to compromise some individual privacy for increased community resilience, or the ecological systems on which they depend?—If so, how much, and under what circumstances?"
Read more from this interesting article here.