Is ChatGPT a Game Changer for GIS?

May 8, 2023

Find out what our GIS experts say about ChatGPT's answers to GIS questions

Banner with GIS and ChatGPT icons

Introduction

ChatGPT has taken the world by storm. Most of the media coverage has been around its ability to write essays, cover letters, and even poetry. But its language model is drawn from the web, which covers a lot more than prose. Its ability to write code in particular has impressed - and even scared - professional software engineers and data scientists.

But domain experts who have tested ChatGPT also caution that using the world wide web as the corpus of knowledge for training ChatGPT is both a blessing and a vulnerability. As we all know, the web is a mixed bag of content: a lot of it is accurate and of good quality, but not everything. It takes humans years of experience to develop the wisdom to sift the wheat from the chaff - how is ChatGPT doing after just a few months out of the oven?

We decided to test it out with our domain knowledge - geospatial data, science, and technology. Like everyone, we're curious about the accuracy of ChatGPT's answers to common GIS questions that have a known answer, and how it deals with questions that don't have a single answer (like which GIS software is “best”). But we were also interested in what types of questions it can handle, and when it essentially gives a non-answer.

To kick the tires on ChatGPT's geospatial domain knowledge, we asked it a series of questions in a few categories that we have a lot of experience with: general GIS, GIS software (i.e., ArcGIS), and spatial programming with Python and R. Our experts then gave each response a score of 1-5 stars, plus a little commentary.

See below for the questions we asked, or jump to the end for our conclusions. The results might surprise you!

General GIS Questions

1. What's the best spatial projection to use for a GIS analysis of a forest?

ChatGPT's answer:

The choice of spatial projection for a GIS analysis of a forest can depend on a variety of factors, such as the location of the forest, the purpose of the analysis, and the data being used. However, some common projections that are often used for forest analyses include:

  1. Universal Transverse Mercator (UTM): UTM projections are often used for smaller areas such as a single forest stand, as they provide accurate measurements for a particular zone, which are then combined to provide an accurate representation of the entire forest.

  2. Lambert Conformal Conic: Lambert Conformal Conic projections are often used for larger areas such as a forest region or a country. They are particularly useful for mapping features that are oriented north-south and are widely used in the United States.

  3. Albers Equal Area Conic: Albers Equal Area Conic projections are also commonly used for forest analysis, particularly for areas that are more east-west in orientation. They are often used for regional and national level forest assessments.

Ultimately, the best projection to use for a specific forest analysis will depend on the objectives of the analysis, the nature of the data being used, and the spatial characteristics of the area being analyzed. It's always a good idea to consult with a GIS professional or a forest analyst to determine the best projection for your specific needs.

IGIS experts score: 4.5 stars

Slightly more emphasis could have gone to scale, and the trade-offs of preserving area, direction, positional accuracy, and cartographic shape; however, this is a very good answer! SH
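
For readers who want to try this on their own data, here is a minimal sketch of ours (not ChatGPT's) of reprojecting a forest boundary into two of the projections mentioned above using geopandas. The file name is hypothetical, and the EPSG codes are our assumptions: 32610 is UTM zone 10N (northern California) and 5070 is the CONUS Albers Equal Area CRS.

import geopandas as gpd

# Hypothetical forest boundary layer
forest = gpd.read_file("forest_boundary.shp")

# Reproject to UTM zone 10N (EPSG:32610) for local, distance-based work
forest_utm = forest.to_crs(epsg=32610)

# Reproject to CONUS Albers Equal Area (EPSG:5070) for area calculations
forest_albers = forest.to_crs(epsg=5070)
print(forest_albers.geometry.area.sum() / 10_000, "hectares")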

 

2. What is the most common format for spatial data?

ChatGPT's answer:

The most common format for spatial data is the Shapefile (.shp) format. It was developed by ESRI (Environmental Systems Research Institute) and has become the de facto standard for spatial data storage and exchange.

A shapefile is a collection of files that stores feature geometry, attributes, and spatial index information. The main files include the .shp file (which stores the feature geometry), the .dbf file (which stores the attribute data), and the .shx file (which stores the spatial index information).

Shapefiles are widely used in GIS (Geographic Information Systems) software for storing and analyzing spatial data, including points, lines, and polygons. They can represent a variety of geographic features, such as roads, rivers, land parcels, and administrative boundaries.

However, with the development of new technologies and data standards, other file formats are becoming more common in the GIS community, such as GeoJSON, KML, and GeoTIFF. The choice of file format will depend on the software and tools being used, the type and complexity of data being analyzed, and the needs of the analysis.

IGIS experts score: 4.5 stars

ChatGPT answers the question accurately; however, I was struck by the oddity of including “GeoTIFF” as a competing or comparable file format to a Shapefile, given that a GeoTIFF is a raster format and a Shapefile is a vector format. This comparison is not technically wrong, but it requires further explanation. SH
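
As a small illustration of the “collection of files” point (our sketch, with a hypothetical file name), geopandas reads the whole collection through the .shp file, and the sidecar files can be listed from the same folder:

from pathlib import Path
import geopandas as gpd

shp = Path("roads.shp")          # hypothetical shapefile

# Reading the .shp pulls in the .dbf attributes and .shx index automatically
roads = gpd.read_file(shp)
print(len(roads), "features of type", roads.geom_type.unique())

# List the sidecar files that make up the shapefile (.shp, .shx, .dbf, .prj, ...)
for part in sorted(shp.parent.glob(shp.stem + ".*")):
    print(part.name)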

3. What is the best format for vector spatial data, and why?

ChatGPT's answer:

The best format for vector spatial data depends on the specific needs of the analysis and the software being used. However, some of the most common and widely used formats for vector spatial data include:

  1. Shapefile (.shp): Shapefile is a widely used format for vector data that is supported by many GIS software packages. It stores geometric and attribute data in separate files and can represent point, line, and polygon features.

  2. GeoJSON (.json): GeoJSON is a format for encoding a variety of geographic data structures using the JSON (JavaScript Object Notation) format. It is often used for web mapping applications and is supported by many web mapping libraries.

  3. KML (.kml): KML (Keyhole Markup Language) is an XML-based format for storing geographic data and is often used for displaying geographic data in Google Earth.

  4. GeoPackage (.gpkg): GeoPackage is an open, standards-based format for storing geospatial data that can store both vector and raster data. It is designed for use on mobile devices and is gaining popularity as a format for sharing geospatial data.

The choice of format will depend on factors such as the purpose of the analysis, the size and complexity of the data, and the software being used. It is always a good idea to consult with experts in the field, such as GIS professionals, to determine the best format for a given analysis.

IGIS experts score: 3.5 stars

ChatGPT does a good job of picking some good (and common) vector data formats; however, in my opinion it doesn't do a great job of explaining why its selected formats are the best, but rather just gives some basic examples of use cases. SH
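
To make the comparison a little more concrete, here is a minimal sketch of ours showing how one layer can be converted between the formats ChatGPT lists, using geopandas; the file and layer names are hypothetical.

import geopandas as gpd

parcels = gpd.read_file("parcels.shp")   # hypothetical source layer

# GeoPackage: a single file, no 10-character field-name limit, stores the CRS
parcels.to_file("parcels.gpkg", layer="parcels", driver="GPKG")

# GeoJSON: plain text and web-friendly; usually reprojected to WGS 84 first
parcels.to_crs(epsg=4326).to_file("parcels.geojson", driver="GeoJSON")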

4. What is the best source of data to make a map of wildfire risk in California?

ChatGPT's answer:

There are several potential sources of data that could be used to create a map of wildfire risk in California, depending on the scale and scope of the analysis. Some potential sources of data are:

California Department of Forestry and Fire Protection (CAL FIRE): CAL FIRE provides a variety of data and resources related to wildfire risk, including maps of high fire hazard severity zones, fire hazard severity zone datasets, and other wildfire-related data.

National Interagency Fire Center (NIFC): NIFC provides a variety of data and resources related to wildfires across the United States, including maps of current wildfires, fire potential maps, and fire weather forecasts.

US Forest Service: The US Forest Service provides a variety of data related to wildfire risk and forest management, including data on forest structure, fuels, and vegetation cover.

United States Geological Survey (USGS): USGS provides various data related to wildfires, including historical wildfire perimeters, vegetation and fuel data, and topographic maps.

NASA: NASA provides satellite imagery and data related to wildfire activity and smoke plumes, which can be used to track the spread and impact of wildfires.

The choice of data source will depend on the specific needs of the analysis and the scale of the study. It's often a good idea to consult with experts in the field, such as fire ecologists or GIS professionals, to determine the best data sources for a given analysis.

IGIS experts score: 4.5 stars

This is a very good answer for primary sources of fire risk data; however, I would also have liked to see mention of secondary data sources, such as ESRI or Living Atlas. SH
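
As a quick illustration of how a couple of these sources might be combined, here is a sketch of ours; the file names and the severity column name are placeholders standing in for data downloaded from CAL FIRE and similar providers.

import geopandas as gpd
import matplotlib.pyplot as plt

# Placeholder files standing in for downloaded hazard zones and fire perimeters
hazard_zones = gpd.read_file("fire_hazard_severity_zones.gpkg")
perimeters = gpd.read_file("historical_fire_perimeters.gpkg")

# Color the hazard zones by a (hypothetical) severity column, overlay perimeters
ax = hazard_zones.plot(column="hazard_class", legend=True, figsize=(8, 10))
perimeters.boundary.plot(ax=ax, color="black", linewidth=0.3)
ax.set_title("Wildfire hazard severity with historical fire perimeters")
plt.show()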

5. Is it possible to classify historic black and white aerial photography to distinguish between grasses, shrubs, and trees?

ChatGPT's answer:

Yes, it is possible to classify historic black and white aerial photography to distinguish between grasses, shrubs, and trees. Aerial photography can provide valuable information about the vegetation cover and composition of an area, and it has been used in many applications, including vegetation mapping and monitoring, land use and land cover mapping, and natural resource management.

There are various methods for classifying aerial photography, such as visual interpretation, unsupervised and supervised classification, object-based image analysis, and machine learning algorithms. These methods rely on the spectral properties of the imagery and the spatial patterns of the features being mapped.

In order to distinguish between grasses, shrubs, and trees, it is important to consider the differences in their spectral reflectance properties. For example, trees typically have higher reflectance values in the near-infrared portion of the spectrum compared to grasses and shrubs, while grasses tend to have higher reflectance values in the visible portion of the spectrum. These spectral differences can be used to develop spectral indices and classification algorithms that can accurately distinguish between different types of vegetation.

However, it's worth noting that the accuracy of the classification will depend on various factors, such as the resolution of the imagery, the quality of the data, and the complexity of the vegetation cover. In addition, manual interpretation of aerial photography can be time-consuming and labor-intensive, and it may require specialized expertise in vegetation identification and classification. Therefore, it's often a good idea to consult with experts in the field, such as remote sensing or GIS professionals, to determine the best approach for a specific analysis.

IGIS experts score: 2.5 stars

This answer would have been good if it were not for the third paragraph, which doesn't really apply to black and white imagery in any practical sense. Near-infrared reflectance and black and white imagery are very much two different things. SH
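
To show what an unsupervised approach might look like on a single-band (panchromatic) scan, here is a minimal sketch of ours using rasterio and scikit-learn; the file name and the choice of three clusters are assumptions, and a real project would add texture features, training data, and an accuracy assessment.

import rasterio
from sklearn.cluster import KMeans

# Hypothetical scanned black-and-white aerial photo (one band of brightness values)
with rasterio.open("historic_bw_photo.tif") as src:
    band = src.read(1).astype("float32")
    profile = src.profile

# Cluster pixel brightness into three classes (roughly grass / shrub / tree,
# which would still need visual checking against the photo)
pixels = band.reshape(-1, 1)
labels = KMeans(n_clusters=3, random_state=0).fit_predict(pixels)
classified = labels.reshape(band.shape).astype("uint8")

# Write the classified raster with the same georeferencing as the input
profile.update(dtype="uint8", count=1, nodata=None)
with rasterio.open("classified.tif", "w", **profile) as dst:
    dst.write(classified, 1)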

 

GIS Software Questions

Where do I go in ArcGIS Pro to split a big polygon into multiple polygons using a road network to define the regions?

ChatGPT's answer:

To split a big polygon into multiple polygons using a road network to define the regions in ArcGIS Pro, you can use the "Split Polygons" tool in the "Editing" tab. Here are the steps to follow:

  1. Open the "Editing" tab in the ArcGIS Pro ribbon.

  2. Click on the "Split" drop-down menu and select "Split Polygons."

  3. In the "Split Polygons" pane, select the polygon layer you want to split in the "Input Features" field.

  4. Select the road network layer you want to use as a splitting feature in the "Split Features" field.

  5. Choose the split method that you want to use. For example, you can split the polygon by the roads that intersect its boundary, or by the roads that fall within the polygon's interior.

  6. Adjust any additional options for the split method you chose.

  7. Click the "Run" button to execute the tool and create the split polygons.

Once the tool is complete, a new layer will be created with the split polygons. You can edit this layer further, if needed, to refine the individual polygon boundaries.

IGIS experts score: 2 stars

I attempted to use the above workflow to split a polygon, and the directions were both inaccurate and not very clear. In my opinion, a better way of tackling this geoprocessing step would have been to use the search tool in the geoprocessing toolbox, as this approach would be much less likely to change with updated versions of ArcGIS Pro. SH
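
Since the menu-driven answer was shaky, here is an alternative, script-based sketch of ours for the same task using geopandas and shapely (not the workflow ChatGPT described, and the file names are hypothetical): the polygon's boundary and the road lines are merged, the combined line work is re-polygonized, and the pieces falling inside the original polygon are kept.

import geopandas as gpd
from shapely.ops import polygonize, unary_union

parcels = gpd.read_file("big_polygon.shp")   # the single large polygon
roads = gpd.read_file("roads.shp")           # the road network to split with

polygon = parcels.geometry.iloc[0]

# Merge the polygon outline with the road lines, then rebuild polygons from
# the noded line work
lines = unary_union([polygon.boundary] + list(roads.geometry))
pieces = [p for p in polygonize(lines) if p.representative_point().within(polygon)]

gpd.GeoDataFrame(geometry=pieces, crs=parcels.crs).to_file("split_polygons.shp")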

 

Programming Questions

Write Python code to convert a multi-layer raster stack to Zarr.

ChatGPT's answer:

Here is some Python code that demonstrates how to convert a multi-layer raster stack to the Zarr format using the rasterio and xarray libraries:

import rasterio
import xarray as xr

# Open the raster stack
with rasterio.open('input.tif') as src:
    # Read the raster data into an xarray dataset
    data = xr.open_rasterio(src)
    # Write the dataset to a Zarr store
    data.to_zarr('output.zarr', mode='w')

The code above opens the raster stack using rasterio and reads it into an xarray dataset. Then it's using the xarray to_zarr function to write the dataset to a Zarr store.

You can also specify the storage backend to use, by default it uses the filesystem backend, but you can use other backends like 'S3', 'GCS' etc.

data.to_zarr('s3://my-bucket/output.zarr', mode='w', storage_options={'anon':True})

It's also possible to use the zarr library directly to write the data, but xarray provides a convenient way to handle the conversion and makes it easy to work with the Zarr data later on.

Please note that the above code is just an example and may require additional modifications to work with your specific dataset and system configuration. Additionally, the example above writes the data to a single zarr store, but you can also write it to multiple stores using the group parameter of the to_zarr method.

IGIS experts score: 5 stars

The provided code works and is easy to follow. I was impressed that it offered an alternative library and anticipated the next question - how to upload to cloud storage (because Zarr is a cloud-optimized format). AL
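
One caveat from our side: xarray.open_rasterio has since been deprecated, so on newer library versions the same conversion might look like the sketch below, assuming the rioxarray and zarr packages are installed (file names as in ChatGPT's example).

import rioxarray

# Read the multi-band GeoTIFF as an xarray DataArray with dimensions (band, y, x)
data = rioxarray.open_rasterio("input.tif")

# Wrap it in a Dataset and write it out as a Zarr store
data.to_dataset(name="bands").to_zarr("output.zarr", mode="w")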

Write an R script to import a polygon layer from a file geodatabase, and plot it using leaflet.

ChatGPT's answer:

Sure, here's an example R script that imports a polygon layer from a file geodatabase and plots it using leaflet:

# Load required libraries
library(sf)
library(leaflet)

# Set the working directory to where the file geodatabase is located
setwd("path/to/geodatabase")

# Read in the polygon layer from the geodatabase
data <- st_read(dsn = "geodatabase.gdb", layer = "polygon_layer")

# Create a leaflet map and add the polygon layer
m <- leaflet() %>%
  addTiles() %>%
  addPolygons(data = data, fillColor = "blue", fillOpacity = 0.6)

# Display the map
m

In this script, you'll need to replace "path/to/geodatabase" with the actual file path where your geodatabase is located, "geodatabase.gdb" with the name of your file geodatabase, and "polygon_layer" with the name of the layer containing the polygon data.

The st_read() function from the sf library is used to read in the polygon layer from the geodatabase. The leaflet() function is used to create a new leaflet map, and the addTiles() function adds the default tile layer. The addPolygons() function is used to add the polygon layer to the map, where the data argument is the data frame containing the polygon data, and the fillColor and fillOpacity arguments are used to set the polygon fill color and opacity. Finally, the m object is displayed to show the leaflet map.

IGIS experts score: 4 stars

The code is correct, and the explanations are helpful. Impressive that it knows how to combine functions from two R packages that are often presented separately. It didn't, however, check the projection of the polygon data to verify that it is in WGS 84, which leaflet requires. AL

 

Conclusions

Overall we found ChatGPT very impressive. We gave it a series of questions on geospatial concepts, data, desktop software, and programming, and in all cases it was able to interpret our questions sensibly. Although it missed some of the details, the responses were always relevant, and in many cases it was able to bridge different domains of knowledge (such as the scale of projections and the scale of forests).

Where it works well: ChatGPT is most impressive in domains like coding that are fairly well-defined and have highly structured ontologies and syntax. It's probably not too surprising that a computer program is most proficient in understanding computer languages. Programming is also probably one of the easier domains for it to train on: answers to common questions on the internet probably converge pretty quickly. We predict it will have the greatest utility for getting-started questions, such as which function to use, and for well-defined use cases. More advanced topics, like debugging or designing the architecture of a program to address a specific use case, are much more complex and will continue to require human expertise and experience synthesizing information.

Other fields it might work well in: Extrapolating this conclusion further, we predict AI could also become very useful in fields like law, medicine, and other highly technical areas where domain knowledge is well developed, highly structured, and requires storage capacity well beyond what the human brain can handle. It will be interesting - and scary - to see how AI intrudes upon fields that have historically been seen as some of the most intellectually demanding and prestigious occupations out there.

AI in education: There are many active discussions about the use of AI in education, as there is no doubt that these tools are going to change education and training. There is considerable worry about how these tools might facilitate exam cheating in higher education, and some instructors have responded by trying to keep AI tools out of the classroom. However, that seems like trying to hold back an ocean with a sand castle. Many teachers are figuring out ways to embrace and incorporate tools like ChatGPT in their teaching. It is clear that these tools can help accelerate the learning curve, but they also require strengthening skills in discernment, critical thinking, an appreciation of diverse perspectives and experiences, and an understanding of the ways in which knowledge serves as a form of power.

Learning how to talk to ChatGPT: We found that to get good answers you have to ask good questions. This is perhaps an extension of the well-known adage ‘garbage in, garbage out' (which also applies to ChatGPT's training data). As we learned from trial and error, if you don't ask a clearly expressed question, using standard terminology and with all the needed parameters, the results are more unpredictable, more general, and probably not helpful. Again this is not surprising given that you're talking to a computer that doesn't know your background and can't read your body language. Indeed, one of the skills we think will need to be taught to use ChatGPT effectively is articulating good questions. Perhaps not coincidentally, learning how to ask better questions is also one of the primary learning outcomes of many K-16 curricula.

Recognizing and reconciling multiple realities and knowledges: Our domain of interest - geospatial science, data, and tools - is fairly cut-and-dried, and the ‘one universal truth' paradigm isn't so far-fetched. That being said, there are still shades of gray, old and new standards that often coexist, scale dependencies, data modeling choices, philosophical differences, power dynamics, etc. To its credit, when there isn't a single answer or the decision has consequences, ChatGPT likes to invoke ‘It is always a good idea to consult with experts in the field…' That often feels like a cop-out, but it's probably the right thing to do.

In other domains, such as many social science fields and the humanities, multiple situated knowledges and interpretations are not an outlier but an intrinsic characteristic of the field. Whether this new generation of AI can recognize and articulate the value in different types of knowledge, time will tell.

What we'd like to see more of: transparency. Transparency is a tall ask for AI. The algorithms are extremely complex and fuse together training data in ways that are hard if not impossible to unpack (even if you know the algorithm!). Not knowing where information comes from is a huge paradigm shift from traditional scholarship, and not just a bit unsettling. In conventional research, transparency is baked in through practices like literature reviews and citations. This is how scholars build upon the work of each other, allowing the reader to trace the lineage of knowledge and evaluate for herself whether a new discovery or idea holds water. ChatGPT, on the other hand, draws from a massive treasure trove of online content; however, the specifics of how sources are chosen and evaluated are complex and known only to an elite few software engineers. To say its sources are anonymous is incorrect - they're just hidden. You don't have to be a George Orwell to think of ways this could produce unintended and undesirable consequences in a big way.

Does ChatGPT level the playing field? Technologies like ChatGPT can level the playing field to some degree, allowing poorly resourced communities to catch up in terms of education and technology. Think of the positive impacts on universities in poor countries. However, once transformative technologies get monetized, they tend to exacerbate existing inequalities. Hopefully the powerful companies that are developing these tools will listen to the smart people inside and outside those companies and find ways to keep this technology in the public domain.

In conclusion, the journey to Cyborg is a slippery slope. Long before ChatGPT, people freaked out about the transformative impacts of search engines, cell phone apps, and even Wikipedia. Is ChatGPT going to be a game changer in the world of GIS? Only time will tell. We will definitely be using it for what it's already good at, and are not too worried about our jobs being taken over by robots - yet!

 

 


By Andy Lyons (Program Coordinator), Sean Hogan, and Maggi Kelly