Faking a Data Provider with Python

QGIS data providers are written in C++; however, it is possible to simulate a data provider in Python using a memory layer and some code to interface with your data.

Why would you want to do this? Typically you should use the QGIS data providers, but here are some reasons why you may want to give it a go:

There is no QGIS data provider
The generic access available through OGR doesn’t provide all the features you need
You have no desire to write a provider in C++
No one will write a C++ provider for you, for any amount of money

If you go this route you are essentially creating a bridge that connects QGIS and your data store, be it flat file, database, or some other binary format. If Python can “talk” to your data store, you can write a pseudo-provider.

To illustrate the concept, we’ll create a provider for CSV files that allows you to create a layer and have full editing capabilities using QGIS and the Python csv module.

The provider will:

Create a memory layer from a CSV file
Create fields in the layer based on integer, float, or string values in the CSV
Write changes back to the CSV file
Require the CSV file to have an X and Y field
Support only Point geometries

We’ll use the cities.shp file that comes with the PyQGIS Programmer’s Guide to create a CSV using ogr2ogr:

ogr2ogr -lco "GEOMETRY=AS_XY" -f CSV cities.csv ~/Downloads/pyqgis_data/cities.shp

This gives us a CSV file with point geometry fields.

If you don't want to roll your own, you can download the cities.csv file here.

Here’s the header (first row) and a couple of rows from the file:

    X,Y,NAME,COUNTRY,POPULATION,CAPITAL
    33.086040496826172,68.963546752929688,Murmansk,Russia,468000,N
    40.646160125732422,64.520668029785156,Arkhangelsk,Russia,416000,N
    30.453327178955078,59.951889038085938,Saint Petersburg,Russia,5825000,N

Signals and Slots

We’ll look at the code in a moment, but here is the general flow and connection of signals from the layer to slots (methods) in our provider:

These are all the connections we need (signal -> slot) to implement editing of both geometries and attributes for the layer.
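
If the signal -> slot idea is new to you, here is a toy sketch of the pattern that runs without QGIS. The Signal, FakeLayer, and FakeProvider classes are invented for this illustration; in the plugin the signals come from the QGIS layer itself, not from code like this.

```python
class Signal:
    """Toy stand-in for a Qt signal: remembers slots and calls them."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)


class FakeLayer:
    """Hypothetical layer exposing two signals."""

    def __init__(self):
        self.editingStarted = Signal()
        self.editingStopped = Signal()


class FakeProvider:
    """Hypothetical provider whose methods act as slots."""

    def __init__(self, layer):
        self.events = []
        layer.editingStarted.connect(self.editing_started)
        layer.editingStopped.connect(self.editing_stopped)

    def editing_started(self):
        self.events.append("started")

    def editing_stopped(self):
        self.events.append("stopped")


layer = FakeLayer()
provider = FakeProvider(layer)
layer.editingStarted.emit()
layer.editingStopped.emit()
print(provider.events)
```

Each emit call runs whatever methods were connected, which is all the provider relies on: the layer announces an event and our code reacts to it.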

The Plugin Code

The plugin is pretty simple; when the tool is clicked, it opens a dialog allowing you to select a CSV file, then uses the CsvLayer class to read the file, create the memory layer, make connections (signals -> slots), and add it to the QGIS canvas.

The run Method

Here’s the run method from the plugin (csv_provider.py):

  def run(self):
      """Run method that performs all the real work"""
      # show the dialog
      self.dlg.show()
      # Run the dialog event loop
      result = self.dlg.exec_()
      # See if OK was pressed
      if result:
          csv_path = self.dlg.lineEdit.text()
          self.csvlayer = CsvLayer(csv_path)

The dialog (csv_provider_dialog.py) contains an intimidating warning and allows you to select a CSV file to load (lines 4-6).

Line 9 gets the path of the selected file from the dialog and line 10 creates the layer using the selected file. From that point on, you interact with the layer using the regular QGIS editing tools. We need to keep a reference to the layer (self.csvlayer), otherwise all our connections get garbage collected and we lose the “link” to the CSV file.
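
The effect of keeping (or not keeping) a reference can be seen in plain Python. This is only a sketch: Layer and Plugin are hypothetical stand-ins, and a weak reference is used purely to observe whether the object is still alive.

```python
import gc
import weakref


class Layer:
    """Hypothetical stand-in for the CsvLayer object."""


class Plugin:
    def load_without_reference(self):
        layer = Layer()            # only a local name refers to the object
        return weakref.ref(layer)  # a weak reference does not keep it alive

    def load_with_reference(self):
        self.layer = Layer()       # the attribute keeps the object alive
        return weakref.ref(self.layer)


plugin = Plugin()
lost = plugin.load_without_reference()
kept = plugin.load_with_reference()
gc.collect()  # make collection explicit on interpreters without refcounting

print(lost() is None)
print(kept() is not None)
```

The object created in load_without_reference disappears once the method returns, while the one stored on the plugin survives for as long as the plugin does.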

The CsvLayer Class

The CsvLayer class manages the creation, loading, and editing of the CSV file. First let’s look at the methods in csv_layer.py that create the layer from our CSV file.

Creating the Layer from the CSV

The __init__ method examines the header and a sample row to determine the names and field types to be created in the layer. To read the CSV file, we use the csv module.

Here’s the __init__ method:

  def __init__(self, csv_path):
      """ Initialize the layer by reading the CSV file, creating a memory
      layer, and adding records to it """
      # Save the path to the file so we can update it in response to edits
      self.csv_path = csv_path
      self.csv_file = open(csv_path, 'rb')
      self.reader = csv.reader(self.csv_file)
      self.header = self.reader.next()
      logger(str(self.header))
      # Get sample
      sample = self.reader.next()
      self.field_sample = dict(zip(self.header, sample))
      logger("sample %s" % str(self.field_sample))
      field_name_types = {}
      # create dict of fieldname:type
      for key in self.field_sample.keys():
          if self.field_sample[key].isdigit():
              field_type = 'integer'
          else:
              try:
                  float(self.field_sample[key])
                  field_type = 'real'
              except ValueError:
                  field_type = 'string'
          field_name_types[key] = field_type
      logger(str(field_name_types))
      # Build up the URI needed to create memory layer
      self.uri = "Point?crs=epsg:4326"
      for fld in self.header:
          self.uri += '&field={}:{}'.format(fld, field_name_types[fld])

      logger(self.uri)
      # Create the layer
      self.lyr = QgsVectorLayer(self.uri, 'cities.csv', 'memory')
      self.add_records()
      # done with the csv file
      self.csv_file.close()

      # Make connections
      self.lyr.editingStarted.connect(self.editing_started)
      self.lyr.editingStopped.connect(self.editing_stopped)
      self.lyr.committedAttributeValuesChanges.connect(self.attributes_changed)
      self.lyr.committedFeaturesAdded.connect(self.features_added)
      self.lyr.committedFeaturesRemoved.connect(self.features_removed)
      self.lyr.geometryChanged.connect(self.geometry_changed)

      # Add the layer to the map
      QgsMapLayerRegistry.instance().addMapLayer(self.lyr)

Once the CSV file is opened, the header is read in line 8, and a sample of the first row is read in line 11.

Line 12 creates a dict that maps the field names from the header to the corresponding values in the sample row.

Lines 16-25 look at each sample value and determine its type: integer, real, or string.
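
To see how that test behaves on different values, here is the same check pulled out into a standalone function. The name infer_field_type is made up for this sketch; the logic mirrors the isdigit/float test in the listing.

```python
def infer_field_type(value):
    """Classify a CSV string as 'integer', 'real', or 'string'."""
    if value.isdigit():
        return 'integer'
    try:
        float(value)
        return 'real'
    except ValueError:
        return 'string'


for sample in ['468000', '33.086040496826172', 'Murmansk', '-5', '']:
    print(repr(sample), infer_field_type(sample))
```

Note that a negative whole number such as '-5' comes out as real, because isdigit only accepts digits, and an empty value comes out as string. Since only one sample row is examined, a single odd value there decides the type for the whole column.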

Lines 28-30 create the URI needed to create the memory layer, which is done in line 34.
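
Here is what the resulting URI looks like, built by a small helper that does not need QGIS. The helper name build_memory_uri and its crs parameter are assumptions for this sketch; the URI format is the one used in the listing.

```python
def build_memory_uri(header, field_types, crs="epsg:4326"):
    """Build a memory layer URI with one field entry per CSV column."""
    uri = "Point?crs={}".format(crs)
    # Walk the header, not the dict, so fields keep the CSV column order
    for name in header:
        uri += "&field={}:{}".format(name, field_types[name])
    return uri


header = ["X", "Y", "NAME", "POPULATION"]
field_types = {"NAME": "string", "POPULATION": "integer",
               "X": "real", "Y": "real"}
print(build_memory_uri(header, field_types))
```

Iterating over the header rather than the dict matters because the attribute order in the layer has to match the column order of the rows we add later.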

The add_records method is called to read the CSV and add the features in line 35.

Lastly we make the connections needed to support editing of the attribute table and the CSV file in response to the actions in our Signal/Slot diagram (lines 40-45).

Here is the add_records method that reads the CSV and creates a corresponding feature in the newly created memory layer:

def add_records(self):
    """ Add records to the memory layer by reading the CSV file """
    # Return to beginning of csv file
    self.csv_file.seek(0)
    # Skip the header
    self.reader.next()
    self.lyr.startEditing()

    for row in self.reader:
        flds = dict(zip(self.header, row))
        # logger("This row: %s" % flds)
        feature = QgsFeature()
        geometry = QgsGeometry.fromPoint(
            QgsPoint(float(flds['X']), float(flds['Y'])))

        feature.setGeometry(geometry)
        feature.setAttributes(row)
        self.lyr.addFeature(feature, True)
    self.lyr.commitChanges()

Each row in the CSV file is read and both the attributes and geometry for the feature are created and added to the layer.
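
Stripped of the QGIS classes, the row handling boils down to the sketch below. It is written for Python 3 and reads from a string instead of a file; rows_to_records is a hypothetical name, and the check for the X and Y fields is an addition that the listing does not have.

```python
import csv
import io


def rows_to_records(text):
    """Turn CSV text into a list of ((x, y), attributes) tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # The provider requires X and Y columns, so fail early without them
    for required in ("X", "Y"):
        if required not in header:
            raise ValueError("CSV is missing the {} field".format(required))
    records = []
    for row in reader:
        flds = dict(zip(header, row))
        point = (float(flds["X"]), float(flds["Y"]))
        records.append((point, row))
    return records


sample = "X,Y,NAME\n33.08,68.96,Murmansk\n40.64,64.52,Arkhangelsk\n"
for point, attributes in rows_to_records(sample):
    print(point, attributes)
```

In the plugin the point becomes the feature geometry and the row becomes its attributes.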

Managing Changes

With the layer loaded and the connections made, we can edit the CSV using the regular QGIS editing tools. Here is the code for the connections that make this happen:

def editing_started(self):
    """ Connect to the edit buffer so we can capture geometry and attribute
    changes """
    self.lyr.editBuffer().committedAttributeValuesChanges.connect(
        self.attributes_changed)

def editing_stopped(self):
    """ Update the CSV file if changes were committed """
    if self.dirty:
        logger("Updating the CSV")
        features = self.lyr.getFeatures()
        tempfile = NamedTemporaryFile(mode='w', delete=False)
        writer = csv.writer(tempfile, delimiter=',')
        # write the header
        writer.writerow(self.header)
        for feature in features:
            row = []
            for fld in self.header:
                row.append(feature[feature.fieldNameIndex(fld)])
            writer.writerow(row)

        tempfile.close()
        shutil.move(tempfile.name, self.csv_path)

        self.dirty = False

def attributes_changed(self, layer, changes):
    """ Attribute values changed; set the dirty flag """
    if not self.doing_attr_update:
        logger("attributes changed")
        self.dirty = True

def features_added(self, layer, features):
    """ Features added; update the X and Y attributes for each and set the
    dirty flag
    """
    logger("features added")
    for feature in features:
        self.geometry_changed(feature.id(), feature.geometry())
    self.dirty = True

def features_removed(self, layer, feature_ids):
    """ Features removed; set the dirty flag """
    logger("features removed")
    self.dirty = True

def geometry_changed(self, fid, geom):
    """ Geometry for a feature changed; update the X and Y attributes for
    each """
    feature = self.lyr.getFeatures(QgsFeatureRequest(fid)).next()
    pt = geom.asPoint()
    logger("Updating feature {} ({}) X and Y attributes to: {}".format(
        fid, feature['NAME'], pt.toString()))
    self.lyr.changeAttributeValue(fid, feature.fieldNameIndex('X'),
                                  pt.x())
    self.lyr.changeAttributeValue(fid, feature.fieldNameIndex('Y'),
                                  pt.y())

The dirty flag is used to indicate that the CSV file needs to be updated. Since the format doesn’t support random access to individual rows, we rewrite the entire file each time an update is needed. This happens in the editing_stopped method.
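
The rewrite step can be tried on its own with nothing but the standard library. This is a Python 3 sketch with a made-up function name, rewrite_csv; unlike the listing, it creates the temporary file in the same directory as the target, which is a choice made here so the final move stays on one filesystem.

```python
import csv
import os
import shutil
import tempfile


def rewrite_csv(csv_path, header, rows):
    """Write header and rows to a temporary file, then move it over csv_path."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    with tempfile.NamedTemporaryFile(mode='w', newline='', delete=False,
                                     dir=directory, suffix='.tmp') as tmp:
        writer = csv.writer(tmp)
        writer.writerow(header)
        writer.writerows(rows)
    # The file is closed here; replace the old CSV with the new one
    shutil.move(tmp.name, csv_path)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'cities.csv')
rewrite_csv(path, ['X', 'Y', 'NAME'], [[33.08, 68.96, 'Murmansk']])
rewrite_csv(path, ['X', 'Y', 'NAME'], [[40.64, 64.52, 'Arkhangelsk']])
with open(path, newline='') as f:
    print(list(csv.reader(f)))
```

Writing to a temporary file first means a failure part way through the write leaves the original CSV untouched.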

When attributes are changed or features are removed, the only action taken is to set the dirty flag to True.

When there is a geometry change for a feature, it is updated in the attribute table immediately to keep the X and Y values in sync with the feature’s coordinates. This happens in the geometry_changed method.
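
The bookkeeping itself is small enough to sketch without QGIS. EditTracker is a hypothetical class, and it simplifies things by setting the dirty flag directly in geometry_changed, whereas in the plugin the flag is set by the attribute change signal.

```python
class EditTracker:
    """Hypothetical stand-in for the bookkeeping in CsvLayer."""

    def __init__(self, records):
        self.records = records  # maps feature id -> dict of attributes
        self.dirty = False
        self.writes = 0         # counts how often the "file" was rewritten

    def geometry_changed(self, fid, x, y):
        # Keep the X and Y attributes in step with the new coordinates
        self.records[fid]["X"] = x
        self.records[fid]["Y"] = y
        self.dirty = True

    def editing_stopped(self):
        # Rewrite only when something changed since the last write
        if self.dirty:
            self.writes += 1
            self.dirty = False


tracker = EditTracker({1: {"X": 33.08, "Y": 68.96, "NAME": "Murmansk"}})
tracker.editing_stopped()                # nothing changed, nothing written
tracker.geometry_changed(1, 33.1, 69.0)
tracker.editing_stopped()                # one rewrite
print(tracker.records[1], tracker.writes)
```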

You can view the full code for both the CsvLayer class and the plugin on GitHub or by installing the plugin. To install the plugin, you’ll need to make sure you have the Show also experimental plugins checkbox ticked in the Plugin Manager settings. To help you find it, the plugin is listed as CSV Provider in the Plugin Manager.

Looking Further

A few things about this implementation:

It is an example, not a robust implementation
It lacks proper error handling
There is no help file
It could be extended to support other geometry types in CSV
In its current form, it may not work for other CSV files, depending on how they are formatted (take a look at the Add Delimited Text Layer dialog in QGIS to see what I mean)
If you want to enhance this for fun or profit (well, for fun), fork the repository and send me some pull requests
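
As one example of the formatting problem, a file that uses semicolons instead of commas would not load as-is. A possible starting point, not something the plugin does, is to let the csv module guess the delimiter. The function name and the list of candidate delimiters below are assumptions for this sketch.

```python
import csv
import io


def read_with_detected_dialect(text, candidates=";,\t"):
    """Guess the delimiter from the text, then parse it into rows."""
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return dialect.delimiter, list(csv.reader(io.StringIO(text), dialect))


sample = "X;Y;NAME\n33.08;68.96;Murmansk\n40.64;64.52;Arkhangelsk\n"
delimiter, rows = read_with_detected_dialect(sample)
print(delimiter, rows[0])
```

Sniffing is a heuristic and can guess wrong on small or irregular files, so a real implementation would probably still let the user override it.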

A couple of final points:

Going this route can be a lot more work than using an existing data provider
This method can be (and has been) used successfully to interface with a data store for which there is no data provider