GIS Data is an Illicit Drug
GIS in General March 29th, 2006
GIS data is like an illicit drug. You can’t control it. It travels in secret and hides in the dark alleys of your organization. Its effect spreads and enslaves those that use it. In the end it can lead to ruin.
Well, maybe it’s not that bad, but organizing and managing your GIS data is difficult. If you need to maintain canonical datasets, the spread of “temporary” and/or “working” copies is your enemy. The problem seems to be directly related to the size of the organization. It becomes particularly grievous when dealing with enterprise data in a large organization.
Even when controls or policies are in place that mandate the use of only the authoritative copy of a dataset, the nature of GIS practically encourages the duplication of data. Here are some scenarios that lead to the duplication of data:
- Perception that local data is better/faster/more available than that on the server. The answer: make a copy! Oh, and while we’re at it, tweak it a bit.
- Resistance to change in technology. You move all your vector or raster data to a relational database for centralized management and administration and folks still use the client tools to make a local copy in a format they are more comfortable with.
- An error is found in a dataset, someone makes a copy, corrects it, and then uses it in their work rather than pushing the fix back through the organization to update the canonical copy.
- A customization of the data is needed and through evolution of versions, you end up with a dozen copies, typically labeled boundary, boundary_old, boundary_good, boundary_2k, boundary_best, and so on. Which one do you use? Better yet, which one does the new guy that comes on 2 years from now use?
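As a rough illustration of how you might start untangling a pile like boundary, boundary_old, and boundary_best, here is a minimal Python sketch that groups files by a hash of their contents. It assumes the datasets are single files on disk (a multi-file format such as a shapefile would need its parts grouped first), and the function name `group_identical_files` is made up for this example. It only finds byte-for-byte identical copies; it says nothing about which of the tweaked versions is the good one.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_files(root):
    """Return lists of file names under `root` whose contents are byte-identical.

    Hypothetical helper for illustration: files are keyed by the SHA-256
    digest of their bytes, and only groups with two or more members are kept.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    # Sort names inside each group, and the groups themselves, for stable output.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Calling `group_identical_files("some/project/folder")` on a working directory would list the sets of files that are exact duplicates, which are at least safe candidates for consolidation.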
The crux of the problem is that you can’t stop people from creating local datasets. I can hear the GIS freedom fighters howling now. Don’t get me wrong, I firmly believe in doing what it takes to get the job done and I’m not a command and control type. I’ve witnessed this from both sides of the fence and am convinced that in the long run, something has to be done to get control of GIS data without impeding the workflow and ability to complete projects in a timely fashion.
Perhaps it’s just a management problem.

March 29th, 2006 at 8:39 pm
Spot on analysis.
I’m lucky enough to see a push for consolidation in my current job, but here are some more scenarios that lead to data proliferation:
- The original enterprise data model failed to recognise an existing need, or is inflexible in adjusting to new requirements.
- Client software requires update access to the database for making styling changes (is this GIS or Illustrator, folks?).
- The organisation’s GIS users are balkanized both by the organisation and by technology. Data models and databases that are readable by one department need scheduled translations for usage by other departments.
- The update cycle is broken. The maintaining department is unresponsive to error notifications and change requests.
- The complexity of client-side setup for new data models is high, creating a barrier to data model changes.
Honestly, I’d rather say that it’s a management problem so that my manager has to deal with it.
Jason
March 29th, 2006 at 9:58 pm
You hit the nail on the head, Gary. Revisions are necessary since all data needs to be massaged for analysis or cartography. Currently the most practical way to do this is to make a new dataset. You spawn new datasets to make minor changes, to join with attribute tables, to reproject, to test processing steps on smaller clipped regions, to move to another platform or another software package, etc. So you’ve got a dozen datasets on three machines all essentially representing the same thing with cryptic names like eu_bord_clip, eu_bord2, etc. Then you realize you made some minor mistake in the original parent dataset. Ahhhh….
So thinking pie-in-the-sky here, what would be the ideal solution? I was dreaming of a CVS/SVN-like central repository system where each user could check out their own local copy and make changes to the dataset in their software of choice. Those changes could then be checked back in, split into branches, or rolled back to previous versions, with all the metadata managed internally. Using a data abstraction layer, the data could be checked in/out in any format, so users could download and work with whichever format they needed locally. I can’t imagine the logistics of building such a system but it’s always nice to dream.
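To make the check-out/check-in/roll-back idea a little more concrete, here is a toy Python sketch. Everything in it is an assumption made for illustration: the class name `DatasetRepo`, the idea of representing a dataset as a dict of feature id to attributes, and storing a full snapshot per revision. It leaves out branching, merging of conflicting edits, and the format abstraction layer entirely, which is where the real logistics would be.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DatasetRepo:
    """Toy in-memory repository (hypothetical): one full snapshot per check-in."""

    history: list = field(default_factory=list)  # list of (message, features)

    def check_in(self, features, message):
        # Deep-copy so later edits to the caller's working copy
        # cannot silently change what is stored in the history.
        self.history.append((message, copy.deepcopy(features)))
        return len(self.history) - 1  # revision number of this check-in

    def check_out(self, revision=-1):
        # Hand back an independent working copy (latest revision by default).
        return copy.deepcopy(self.history[revision][1])

    def roll_back(self, revision):
        # Roll back by checking in the old snapshot again,
        # so the mistaken revisions stay visible in the history.
        return self.check_in(self.history[revision][1], f"roll back to r{revision}")


repo = DatasetRepo()
r0 = repo.check_in({"fr": {"name": "France"}}, "initial load")

work = repo.check_out()            # local working copy
work["fr"]["name"] = "Frnace"      # an edit that turns out to be a mistake
r1 = repo.check_in(work, "edit names")

r2 = repo.roll_back(r0)            # canonical copy is restored as a new revision
```

After the roll back, `repo.check_out()` returns the original features again, while the history still records the bad edit and who would need to be told about it.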
January 12th, 2009 at 9:47 am
I’m coming in way late here, but I’m doing a grad school project on enterprise GIS data and knowledge management and would love to hear about any successful solutions to these problems.