GIS Data is an Illicit Drug

GIS data is like an illicit drug. You can’t control it. It travels in secret and hides in the dark alleys of your organization. Its effect spreads and enslaves those who use it. In the end it can lead to ruin.

Well, maybe it’s not that bad, but organizing and managing your GIS data is difficult. If you need to maintain canonical datasets, the spread of “temporary” and/or “working” copies is your enemy. The problem seems to grow with the size of the organization, and it becomes particularly grievous when dealing with enterprise data in a large organization.

Even when controls or policies are in place that mandate the use of only the authoritative copy of a dataset, the nature of GIS practically encourages the duplication of data. Here are some scenarios that lead to the duplication of data:

The perception that local data is better/faster/more available than the data on the server. The answer? Make a copy! Oh, and while we’re at it, tweak it a bit.
Resistance to change in technology. You move all your vector and raster data to a relational database for centralized management and administration, yet folks still use the client tools to make a local copy in a format they are more comfortable with.
An error is found in a dataset. Someone makes a copy, corrects it, and then uses it in their work rather than pushing the fix back through the organization to update the canonical copy.
A customization of the data is needed and, through an evolution of versions, you end up with a dozen copies, typically labeled boundary, boundary_old, boundary_good, boundary_2k, boundary_best, and so on. Which one do you use? Better yet, which one does the new guy who comes on two years from now use?

The crux of the problem is that you can’t stop people from creating local datasets. I can hear the GIS freedom fighters howling now. Don’t get me wrong: I firmly believe in doing what it takes to get the job done, and I’m not a command-and-control type. I’ve witnessed this from both sides of the fence and am convinced that, in the long run, something has to be done to get control of GIS data without impeding the workflow and the ability to complete projects in a timely fashion.

Perhaps it’s just a management problem.