Moving PostGIS databases between machines

PostGIS is really, really fantastic software. I love it.
Paired with GeoDjango to smooth out the api surface and you have lots of power in your hands.

An unfortunate side effect of how PostGIS is implemented (at the time of this writing) is that you end up with absolute filesystem path references in your databases. This makes moving a database between machines a pain.

I often run into the need to move databases between machines as I prefer to mirror the production stack as closely as reasonably possible on my local machine. Making a copy the database on the remote cluster and tunneling is an option but can be slow if you aren’t on a nice pipe.

My solution is essentially a selective restore of the dump file. If we only restore the items that have been added to the database beyond the initial PostGIS items then we can load the dump into another fresh postgis database.

pg_restore has two flags that will be useful:

  • -l: List the contents of the archive. The output of this operation can be used with the -L option to restrict and reorder the items that are restored.
  • -L list-file: Restore elements in list-file only, and in the order they appear in the file.

How handy.

Note: These options are only available if you have a Postgres formatted dump, you can create one of these with the -Fc option to pg_dump. Also be aware that crossing Postgres versions with these can sometimes break.

Here’s a little script I hacked together that will give you the list of items in one Postgres dump but not the other. Given the template_postgis dump file and the dump of your populated database you can generate the list of items that have been added to the populated database beyond the initial PostGIS items. Once you have this then loading the dump should proceed smoothly on any fresh PostGIS database. Awesome.

See help text inline for more info: