GDAL sucks, but it's exceptional, and here's why.
GDAL sucks. There. I said it. We all think it. But! Before we go into why, let's acknowledge one thing: our world would be bleak without the infuriating Geospatial Data Abstraction Library, better known as GDAL. You see, GDAL has an impossible job. GDAL must abstract how programmers work with geospatial data. And it does all of this for free, because GDAL is an open-source software (OSS) project.
What is GDAL? "GDAL is a translator library for raster and vector geospatial data formats that is released under an MIT style Open Source License by the Open Source Geospatial Foundation. As a library, it presents a single raster abstract data model and single vector abstract data model to the calling application for all supported formats." (gdal.org). In short, GDAL translates almost any geospatial data format into a standardized data model that programmers can use.
GDAL has 158 raster drivers and 79 vector drivers that translate hundreds of different geospatial formats. It translates all 200+ data formats into essentially three data models: the Raster Data Model (GDALDataset), the Multidimensional Raster Data Model (GDALGroup), and the Vector Data Model (OGRLayer). More impressively, GDAL will read data files on all sorts of systems: not only the file systems of Windows, Linux, and Mac devices, but also files that are in-memory, compressed (.zip, .tar.gz), encrypted, stored on a network, or in the cloud. To make this all possible, GDAL abstracts all these file systems behind the GDAL Virtual File System. So, any programmer can use GDAL to open almost any geospatial data on almost any platform: Hadoop, OpenStack, Microsoft Azure, Google Cloud, AWS S3, HTTP, FTP, and your computer. GDAL isn't just impressive. GDAL is exceptional.
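To make the Virtual File System idea a bit more concrete, here is a minimal Python sketch. The path prefixes (/vsizip/, /vsicurl/, /vsis3/, and friends) are the ones GDAL documents for its virtual file systems. The helper names vsi_path and open_raster, the bucket name, and the file names are hypothetical and exist only for illustration. The point is that the "where" of a file becomes part of the path string, so the same open call works everywhere.

```python
# Path prefixes documented for GDAL's virtual file systems.
VSI_PREFIXES = {
    "zip": "/vsizip/",     # files inside a .zip archive
    "tar": "/vsitar/",     # files inside a .tar or .tar.gz archive
    "gzip": "/vsigzip/",   # a single gzip-compressed file
    "http": "/vsicurl/",   # files served over HTTP/HTTPS/FTP
    "s3": "/vsis3/",       # AWS S3 buckets
    "gcs": "/vsigs/",      # Google Cloud Storage buckets
    "azure": "/vsiaz/",    # Microsoft Azure Blob Storage
    "memory": "/vsimem/",  # in-memory files
}


def vsi_path(kind, location, inner=None):
    """Build a GDAL virtual file system path (hypothetical helper).

    kind     -- one of the keys in VSI_PREFIXES
    location -- archive path, URL, or bucket/key
    inner    -- optional file name inside an archive
    """
    path = VSI_PREFIXES[kind] + location
    if inner is not None:
        path = path + "/" + inner
    return path


def open_raster(path):
    """Open a raster through GDAL, whatever file system the path names.

    GDAL is imported lazily so that vsi_path works without it installed.
    """
    from osgeo import gdal

    gdal.UseExceptions()  # raise Python exceptions instead of returning None
    return gdal.Open(path)


# Example paths (the bucket, URL, and file names are made up).
cloud_path = vsi_path("s3", "my-bucket/scene.tif")
zipped_path = vsi_path("zip", "data/parcels.zip", "parcels.shp")
web_path = vsi_path("http", "https://example.com/dem.tif")
```

With GDAL installed and credentials configured, open_raster(cloud_path) would be called exactly like open_raster("scene.tif") on a local disk. That sameness is the abstraction at work.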
You don't have to take my word for it. You can see it for yourself. 248 Python packages depend on GDAL directly (GDAL dependents), and another 192 depend on fiona, a Python wrapper for GDAL's vector API. On top of that, 9,372 GitHub repositories list fiona as a dependency (fiona dependents)! There are thousands upon thousands of packages and projects written in Python, R, C, and Java that depend on GDAL. They use GDAL's translation prowess to read and write all types of geospatial data. So, in a way, GDAL supports much of the modern (open source) geospatial infrastructure.
Credit: xkcd; https://xkcd.com/2347
Rather than some random person in Nebraska, we have Frank Warmerdam in Ontario, who has led GDAL development since 1998. And a shout-out to the entire GDAL team (osgeo.org/projects/gdal).
If GDAL is extraordinary, then why does it suck? Because it does so much! It is frustrating to install and configure. In a Windows environment, GDAL almost always needs Administrator privileges. In the 'last mile problem' for geospatial workflows (a topic of another essay), driver issues connecting GDAL to other packages can cause headaches. Data formats don't always fit neatly into the three data models. The list goes on. Because much of our modern geospatial infrastructure depends on GDAL, it can be a chokepoint.
GDAL has the impossible job of translating and standardizing hundreds of geospatial oddities into something useful. It interfaces with dozens of file systems on desktops, servers, and the cloud. All the capabilities. All the exceptions to the rules. All the geospatial features. They become code. The code must work across platforms, services, and data types. When it doesn't, it sucks. We want it to work. When it doesn't, it is a source of madness. And then. We all say GDAL sucks.
Despite our complaints, the GDAL team continues their work. The GDAL-DEV archives show years of emails. Support questions. Complaints about errors and bugs. Requests for new features. Looking at the code repository, GDAL has 380 open issues (gdal issues). More feature requests, enhancements, and bugs. Impressively, they have closed 2,051 issues already. Those are thousands of issues that you and I will never experience. A quick peek at the repository for fiona, a Python wrapper for GDAL, shows 77 open issues and 609 closed issues. So the extraordinary work continues by the 400+ unsung geospatial heroes who contribute code and expertise (gdal contributors). Millions of lines of geospatial code. All of it makes our modern geospatial infrastructure possible.
It has been a long time coming, but I just want to say…
Thanks, Frank!
Thank you, Frank Warmerdam, for your leadership, dedication, and 1.2+ million lines of code. (I don’t believe I have met you in person, but I have appreciated your work from afar.) Thank you, Even Rouault, who has single-handedly written 2.5+ million lines of GDAL code. Thank you to everyone who has written code, given feedback, and made the geospatial profession from globe to gates so much easier. You have made GDAL do the impossible and become extraordinary!
I am donating one dollar for each year that I have been in the geospatial world. So while GDAL may suck when it doesn’t work the way I want it to, I truly appreciate how extraordinary it is. This is my small token of thanks to GDAL.
I started my journey into the geospatial world in 2007, when I transitioned from being a Computer Science Master's student at the University of Iowa to a Ph.D. student at the University of Illinois at Urbana-Champaign. So I am donating $15 to GDAL through NumFOCUS, which supports open source projects like GDAL. It is a small token that I hope many others will follow. My donation is in honor of all the geospatial heroes that make GDAL exceptional.
If you would like to donate to GDAL too, then please click on the link below. You can donate as little as $1 if you are just getting started in the geospatial profession. Or many more dollars if you are a seasoned pro.
In future essays in this newsletter, we will look at how you can use GDAL and the fiona Python package to read and write geospatial data. We might also cover my journey from high-performance computing to geospatial computing. If you have an interesting story, please feel free to share it with me (eshook@gmail.com).