Review: Python for Geospatial Data Analysis

When I saw the announcement for this book on the O’Reilly website, I knew immediately that I had to get it. My programming background is in C/C++/C#, I’ve only been using Python for a year or two, and I’m still trying to find my way through all the libraries and packages that are available. The description on the O’Reilly website suggested that this book would cover much of what is important to me as someone who deals with geospatial data on a daily basis.

Unfortunately, my conclusion after reading through the book is that it is quite superficial. Most of the examples provided merely plot data, with little actual analysis being shown.

It starts with chapter 1, where map projections are briefly explained – limited to global maps, with no mention of national grids or EPSG codes (which only appear briefly later on). Chapter 2 is an introduction to QGIS. Chapter 3 introduces PyQGIS, the QGIS Python API. Loading and styling data are explained, but there is only one example that actually does any data processing or analysis: selecting cities based on their distance to a river. The section “Addressing the Research Question” poses an interesting question, but never actually does the analysis – it just encourages the reader to look at the maps and find patterns. This chapter also uses the term “uploading” for opening datasets in QGIS, which I find confusing.

Chapter 4 discusses Google Earth Engine and how to use it to display data. This chapter promises a decision-tree classification of Landsat data – but this never happens. Chapter 5 talks about OpenStreetMap and actually shows a few useful examples, such as calculating travel times, and some network parameters. Chapter 6 is about the ArcGIS Python API, and once again focuses mostly on displaying data. As ArcGIS is commercial software, I feel that this chapter will be of little use to many people.

Chapter 7 is about Geopandas. Geopandas is a spatial extension to Pandas, a very powerful Python package for dealing with tabular data. I would have expected it to show at least how to read a CSV file of data, convert it to a Geopandas data frame with geometry, and save it – but no, only loading and plotting of existing datasets is shown. A note states that “A normal distribution is when the mean, mode, and median are all the same value.” As a definition, this is plain wrong. Yes, in a normal distribution the mean, mode, and median are the same value, but the converse does not hold: there are other distributions for which this is the case, but which are not Gaussian and hence not normal.
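To make the point about the normal distribution concrete, here is a minimal sketch using only the Python standard library. The sample values are made up for illustration and are not taken from the book. The sample is symmetric and single-peaked, so its mean, median, and mode coincide, yet it consists of a handful of integers between 1 and 5 and is clearly not normally distributed. Well-known continuous examples with the same property include the Laplace distribution and Student's t distribution.

```python
import statistics

# Hypothetical sample: symmetric and triangular in shape,
# bounded between 1 and 5, and therefore not Gaussian.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# All three measures of central tendency coincide,
# even though the data do not follow a normal distribution.
print(f"mean={mean}, median={median}, mode={mode}")
```

Equal mean, median, and mode is a property that a normal distribution has, not a test that identifies one.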

Chapter 8 is about data cleaning, and briefly shows how to use Geopandas to load shapefiles and create Geodataframes from CSV files – which belongs in chapter 7. Here’s another inaccuracy: it is stated that “GeoPandas is a Cartesian coordinate reference system, which means that each point is defined by a pair of numerical coordinates, such as latitude and longitude in our example.” This is not what a Cartesian coordinate system is, and a latitude/longitude coordinate system is certainly not a Cartesian coordinate system!
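A short sketch shows why latitude/longitude cannot be treated as Cartesian coordinates. It uses only the standard library and assumes a spherical Earth with a mean radius of 6371 km, which is an approximation. The function names are my own and do not come from the book or from Geopandas. In a Cartesian system, two points that are one unit apart are the same distance apart everywhere. With geographic coordinates, one degree of longitude covers roughly 111 km at the equator but only about half of that at 60° north.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_degree_distance(lat1, lon1, lat2, lon2):
    """Euclidean distance computed directly on degrees (the 'Cartesian' mistake)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Two point pairs, each one degree of longitude apart.
naive_equator = naive_degree_distance(0, 0, 0, 1)
naive_60_north = naive_degree_distance(60, 0, 60, 1)
at_equator = haversine_km(0, 0, 0, 1)
at_60_north = haversine_km(60, 0, 60, 1)

print(f"naive: {naive_equator} vs {naive_60_north} (degrees)")
print(f"great circle: {at_equator:.1f} km vs {at_60_north:.1f} km")
```

The naive calculation reports the same separation for both pairs, while the great-circle distances differ by a factor of about two. This is the reason projected coordinate systems, such as national grids, are used when distances or areas matter.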

Chapter 9 is a short introduction to GDAL, a library for working with raster data. Finally, chapter 10 does some real data analysis, using deforestation data. Sorely missing is a chapter about PostGIS, the open source spatial database based upon PostgreSQL.

You can probably tell that I’m disappointed by this book. It feels incomplete and unpolished, and should probably have been called “An Introduction to Geospatial Data Analysis with Python” or something similar instead. As such, it is useful in providing an overview of packages and possibilities for reading, displaying, and analyzing geospatial data with Python. But for actually learning how to do geospatial analysis with Python, you’ll have to look elsewhere.

