viernes, 23 de noviembre de 2012

Spliting a vector layer

A user in the QGIS mailing list asked a couple of days ago about how to split a points layer in n given layers containing a fraction of those points. As this can be solved in many different ways, I thought it would be a good idea for a blog post and I am proposing a couple of alternatives here, of course all of them using SEXTANTE.

You can use any layer you want to test this examples. I will be using the elevations points layer in the Baranja Hill dataset.



Alternative 1

The first way we can split a layer into n new layers is by randomly assigning each feature to a group, with a total of n possible options, and then splitting the layer based on that group. The first step can be easily performed using the great Pyculator plugin, which is already available from SEXTANTE.

Double click on its name and fill the fields in the parameters window as shown below



Here we are creating 10 different groups, but you can enter the number you prefer or need.

There is no guarantee that the number of points will be equal in each group, but since the random values follow an uniform distribution, it should more or less be similar, which is enough for this purpose.

Here is the resulting layer rendered using a classified scheme based on the new field we have created in the layer, which represents the group it belongs to.



Now we can just split the layer using that field, creating a script algorithm to do so.

Here is the code of that algorithm

#Definition of inputs and outputs
#==================================
##[Example scripts]=group
##input=vector
##class_field=field input
##output=output file

#Algorithm body
#==================================
from qgis.core import *
from PyQt4.QtCore import *
from sextante.core.SextanteVectorWriter import SextanteVectorWriter

# "input" contains the location of the selected layer.
# We get the actual object,
layer = sextante.getObjectFromUri(input)
provider = layer.dataProvider()
allAttrs = provider.attributeIndexes()
provider.select( allAttrs )
fields = provider.fields()
writers = {}

# Fields are defined by their names, but QGIS needs the index for the attributes map
class_field_index = provider.fieldNameIndex(class_field)

inFeat = QgsFeature()
outFeat = QgsFeature()
inGeom = QgsGeometry()
nFeat = provider.featureCount()
nElement = 0
writers = {}

while provider.nextFeature(inFeat):
  progress.setPercentage(int((100 * nElement)/nFeat))
  nElement += 1
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  if clazz not in writers:
      outputFile = output + "_" + str(len(writers)) + ".shp"
      writers[clazz] = SextanteVectorWriter(outputFile, None, fields, provider.geometryType(), provider.crs() )
  inGeom = inFeat.geometry()
  outFeat.setGeometry(inGeom)
  outFeat.setAttributeMap(atMap)
  writers[clazz].addFeature(outFeat)

for writer in writers.values():
  sextante.load(writer.fileName)  
  del writer


A few things to comment here:
  • There are no declared outputs. That is because we are not going to generate a vector layer, but several of them, and we do not know in advance the number of new layers to create, since it depends on the input layer and the number of classes it has.
  • We have an input of type file. That is the base filename we use for the layers to create. WE could have used a string input as well, but using a file input will make it easier for the user to browse in the file system and select the filename to use.
  • We are loading layers explicitly. This is not needed (and should not be done) if the output is declared, but in this case we have to do it in case we want the layers to be loaded. Once again, this is a problem of the semantics of the algorithm, which are not well-defined, since the number of outputs is not know at design-time.
  • We cannot use this algorithm in the modeler. The semantic problems make it impossible to add it to a model, since it has no declared output.
Now execute the algorithm an use the points layer with the random field as the file to split, and that random field as the one to use for splitting into classes.


You will get your set of new layers, and they will be loaded into QGIS.

We can make it even easier to split a layer. How can we put this two algorithms into just one, which takes the layer to split and the number of final layers we want to create? Again, a bit of Python scripting can be used to create a new script algorithm, which wraps the two algorithm that are involved in this operation.

I will leave this a as an exercise for the reader ;-)

Alternative 2.

A different way of dividing a later is by tiling it. We can do this using two algorithms: one to create a tiling of polygons and another one to clip the layer using each one of the polygons.

First, let's create an algorithm to do the tiling and create a set of rectangles covering the area of the layer to split. A small change  in the script that we used to create the hex grid of the last post, and we have it. Here is the code:

##[Example scripts]=group
##input=vector
##numpolygons=number 10
##polygons=output vector
from sextante.core.QGisLayers import QGisLayers

input = sextante.getObjectFromUri(input)
centerx = (input.extent().xMinimum() + input.extent().xMaximum()) / 2
centery = (input.extent().yMinimum() + input.extent().yMaximum()) / 2
width = (input.extent().xMaximum() - input.extent().xMinimum())
cellsize = width / numpolygons
height = (input.extent().yMaximum() - input.extent().yMinimum())
sextante.runalg("mmqgisx:creategrid", cellsize, height, width, height, centerx, centery, 1, polygons)


This creates n horizontal tiles, but it can be easily adjusted to create a different design.

Save the algorithm as Create_tiling_from_vector_layer.py, and run it setting a total of 5 polygons to be created.



This is the result you will get.



Now, we can use the Clip algorithm, but instead of using it for the whole layer, we will use it with the run iteratively option for the clipping layer. That will cause the algorithm to be executed once for each polygon, just like we want. Remember that to active the iterative execution you have to click the button with the green icon at the right-hand side of the box where you select the layer.



Here you can see the result, with 5 different layers, each one rendered with a different color.


Alternative 3.

Use the GRASS v.kcv module. Very practical, but not as fun as the examples above :-)

Alternative 4

Use lassplit from LASTools, also available from SEXTANTE. Not a very good option, since you need las files to work with it. The point layer to split should be exported to a las file , then processed, and then the resulting ones should be converted back to shp. This last step can be done with the las2shp tool, used through the SEXTANTE batc processing interface to ease the task.

Note:

I found a couple of bugs in the SEXTANTE Pyculator implementation and in the MMQGISX Create grid algorithm when using rectangles. They are already fixed, but only in the GitHub version, so will need it to run these example

domingo, 11 de noviembre de 2012

Hexagons, trees and more

One thing I like to do (and that I will probably be doing in other entries in this blog) is to find examples of spatial analysis and to replicate them with SEXTANTE. If the original example uses QGIS, a SEXTANTE-based solution not only involves obtaining the same result using the same software, but also having a better way of replicating that analysis with a different dataset or automating it, since it can be integrated into SEXTANTE tools such as the graphical modeler or the batch processing interface.

For today's example, I am going to replicate this post from Anita Graser's blog, one of the best QGIS blogs around, with plenty of interesting examples. I recommend reading the original entry, but basically we are going to work with the Viena Tree Cadastre, create an hexagonal grid and find the number of trees in each hexagon, to get a nice rendering of how green each area of the city is. After that, we will go a little further and add some extra calculations, just to show some additional SEXTANTE elements.

The first thing is to create the grid. The MMQGISX Create Grid algorithm can be used for that, but the extent of the grid has do be defined manually with the coordinates of the center of the grid and its width and height. That might make it difficult to use it later in an automated context. For this reason, we are going to create our own algorithm to generate the grid based on the bounds of a layer. Of course, we will not do it from scratch, but reusing the MMQGISX algorithm. Here is how to do it:

Create a new script algorithm, selecting the Create new script tool in the SEXTANTE toolbox.

Add the following code to the script. Comments explain most of it.

#Definition of inputs and outputs
#==================================
#input=vector
##cellsize=number 1000.0
##grid=output vector

#Algorithm body
#==================================
from sextante.core.QGisLayers import QGisLayers

# "input" contains the location of the selected layer.
# We get the actual object, so we can get its bounds
input = QGisLayers.getObjectFromUri(input)

# And now we calculate the bounds and call the MMQGISX algorithm
centerx = (input.extent().xMinimum() + input.extent().xMaximum()) / 2
centery = (input.extent().yMinimum() + input.extent().yMaximum()) / 2
width = (input.extent().xMaximum() - input.extent().xMinimum())
height = (input.extent().yMaximum() - input.extent().yMinimum())
Sextante.runalg("mmqgisx:creategrid", cellsize, cellsize, width, height, centerx, centery, 3, grid)

This creates a new algorithm that wraps the existing one to construct the grid layer, but makes it easier to enter the bounding box, since it takes it from an already loaded layer.

Executing the algorithm, we will see the following parameters dialog.

We select the tree cadastre layer as input, and a cellsize of 0.0025, and we will get the grid covering the input layer.


Now to count the number of trees in each hexagon, we combine the trees layer and the grid layer, using both of them as inputs in the fTools Count points in polygon algorithm.

The result, with a graduated color ramp based on the attribute containing the number of points, looks like this.


 Empty cells are rendered in white, and since the borders of each hexagon are also white, it seems that there are no cells in certain parts of the covered region. However, they are there, and it would be a good idea to remove them, so as to make the layer size smaller. We can use another MMQGISX algorithm (Select), to create a new layer that contains just the hexagon cells with at least one tree.


Cell borders are represented in black this time, just to show the missing ones.

If we put all that in a model, it will allow us to create the final grid in one single step, just selecting the input layer with elements to count (trees in this case). and the desired hexagon size.


As you can see, there is no problem reusing the algorithm we have created as a part of a model. In fact, we can use a model inside another model, or our script-based algorithm as part of another script, calling it just like we call any of the other algorithms.

Running the model, we should see the parameters dialog shown below.



We can represent not just the number of trees, but the diversity of them, showing how many different species can be found in each cell (the trees layer includes an attribute with the name of the specie)

There is no algorithm to get the number of unique points in a polygons, so we have to find a workaround. The first step for that is to add a new field to the points layer with the code of the cell it belongs to. We can use the SAGA Add polygon attributes to points algorithm, but we need a unique attribute for each cell. The Add autoincremental field algorithm does exactly that.


 Now we can run the Add polygon attributes to points algorithm.


Once the points have the ID of the polygon they belong to, we can make a small script to count the number of different tree species in each class. We will create a new SEXTANTE algorithm with the following script.

#Definition of inputs and outputs
#==================================
##input=vector
##class_field=field input
##value_field=field input
##output=output vector

#Algorithm body
#==================================
from sextante.core.QGisLayers import QGisLayers
from qgis.core import *
from PyQt4.QtCore import *
from sextante.core.SextanteVectorWriter import SextanteVectorWriter

# "input" contains the location of the selected layer.
# We get the actual object, so we can get its bounds
layer = QGisLayers.getObjectFromUri(input)
provider = layer.dataProvider()
allAttrs = provider.attributeIndexes()
provider.select( allAttrs )
fields = provider.fields()
fields[len(fields)] = QgsField("UNIQ_COUNT", QVariant.Int)
writer = SextanteVectorWriter(output, None, fields, provider.geometryType(), provider.crs() )

# Fields are defined by their names, but QGIS needs the index for the attributes map
class_field_index = provider.fieldNameIndex(class_field)
value_field_index = provider.fieldNameIndex(value_field)

inFeat = QgsFeature()
outFeat = QgsFeature()
inGeom = QgsGeometry()
nFeat = provider.featureCount()
nElement = 0
classes = {}

#Iterate over input layer to count unique values in each class
while provider.nextFeature(inFeat):
  #Printing a number to the console to indicate progress
  print int((50 * nElement)/nFeat)
  nElement += 1
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  value = atMap[value_field_index].toString()
  if clazz not in classes:
      classes[clazz] = []
  if value not in classes[clazz]:
      classes[clazz].append(value)      

# Create output vector layer with additional attribute
while provider.nextFeature(inFeat):
  print int((500 * nElement)/nFeat)
  nElement += 1  
  inGeom = inFeat.geometry()
  outFeat.setGeometry( inGeom )
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  outFeat.setAttributeMap( atMap )
  outFeat.addAttribute( len(provider.fields()), QVariant(len(classes[clazz])))
  writer.addFeature( outFeat )
  
del writer

Again, the comments in the source code should explain how it works.

 Running that on the trees layer with the extra field will generate a new points layer with another new field containing the number of  different species (the richness) in the polygon each tree belongs to. To pass that value to the corresponding hexagon cell in the grid layer, we can use the SAGA Point statistics in polygons algorithm. Since all points in a cell will have the same richness value (because it is the richness of the cell they belong to), calculating the average value will yield exactly that richness value.

 Below you can see the result.



The script we have created can be used as well to find out the street with the largest number of different trees. Just run it on the original trees layer (which contains an attribute with the name of the street where the tree is), using the street name as class attribute, and then sort the attributes table of the resulting layer based on the richness attribute.

Tuerkenschanzpark seems to be the one with the largest value (219). Nice place for a walk, I guess ;-)

A note


The process above is a bit cumbersome, although rather useful not just as a didactic example, but also from a practical point of view, since we have created two new algorithms that can be reused. However, an algorithm to count unique points would be a better solution. As I said, there was not one to do that, but I thought it would be a good idea, and it is pretty easy to create one, so I added it to the fTools group. You can now find it in the SEXTANTE toolbox if using the development version, and it will be available in the next SEXTANTE release.

Another note


The measure of richness we have used is rather poor. For better ones, check this wikipedia article on diversity indices, and if you feel adventurous, try to implement one yourself :-)

Bonus 


Another interesting way of representing points when the number of them is high is by aggregating them and replacing a group of points by a bigger one with a size related to the size of the group. This technique is used, for instance, in the GeoTools PointStacker transformation. We can create such a rendering by converting the hexagon cells into circles with a radius depending on the number of trees. First we convert the hexagons into points, by running the fTools Polygon centroids algorithm. The attributes table of the centroids layer is exactly like the one of the original polygon layer, so it will contains the field with the number of points. We can divide the values in groups and create a new value for each group (use the Field pyculator algorithm for that), representing the radius of the circle that represents that class. Now we can buffer each centroid using that radius as buffer distance. Feel free to try it yourself :-)

lunes, 5 de noviembre de 2012

Some hydrological analysis

For this entry, we are going to do some hydrological analysis. Starting with a DEM, we are going to extract a channel network, delineate watersheds and calculate some statistics. We will use mostly algorithms from SAGA, but also from other providers, to complement them. And finally we will show how to use one of the powerful tools of SEXTANTE: the iterative execution of algorithms. Using it, we will calculate a histogram to show how elevation is distributed in each subwatershed, and plot it using R.

Let's start.

The first thing is to load this DEM.




And the first module to execute is the Catchment area (Parallel) one (use the search box in the toolbox to find it). You can use anyone of  the others named Catchment area. They have different algorithms underneath, but the results are the same.

Select the DEM in the Elevation field, and leave the default values for the rest of parameters.


This algorithm calculates many layers, but the Catchment Area one is the only one we will be using.


You can get rid of the other ones if you want.

The rendering of the layer is not very informative. To know why, you can have a look at the histogram and you will see that values are not evenly distributed (there are a few cells with very high value, those corresponding to the channel network). Calculating the logarithm of the catchment area value yields a layer that conveys much more information (you can do it using the raster calculator).



The catchment area (also known as flow accumulation), can be used to set a threshold for channel initiation. This can be done using the Channel network algorithm. Here is how you have to set it up.



Use the original catchment area layer, not the logarithm one. That one was just for rendering purposes.

If you increase the Initiation threshold value, you will get a more sparse channel network. If you decrease it, you will get a denser one. With the proposed value, this is what you get.



The image above shows just the resulting vector layer and the DEM, but there should be also a raster one with the same channel network. That raster one will be, in fact, the one we will be using.

Now, we will use the Watersheds basins algorithmtol delineate the subbasins corresponding to that channel network, using as outlet points all the junctions in it. Here is how you have to set the corresponding parameters dialog.



And this is what you will get.


This is a raster result. You can vectorise it using the Vectorising grid classes algorithm.




Now, let's move into the second part of this tutorial. Let's try to create a set of histograms that show how elevation values are distributed within each subbasin. First, let's see how to do it for a single subbasin. The idea is to have a layer that just represents the elevation within that subbasin and then pass that layer to R so we can calculate the histogram and plot it.

First, let's clip the orginal DEM with the polygon representing a subbasin. There is another SAGA algorithm to do so, and it's called Clip Grid with Polygon. Although we are calling an external application, since we are using SEXTANTE, it will be aware of the selected features in the layer we use as input. SEXTANTE will take care of creating an intermediate layer, so we do not have to worry about it and the integration between QGIS and SAGA is more complete. so, if we select a single subbasin polygon and then call the clipping algorithm, we can clip the DEM to the area covered by that polygon.

Select a polygon,




and call the clipping algorithm with the following parameters:


The element selected in the Input field is, or course, the DEM we want to clip.

You will get something like this.



This layer is ready to be used in R, but first we need an R algorithm that can use it. Add a new R script named  Raster Histogram, with the following code:


##R scripts=group
##layer=raster
##showplots
hist(as.matrix(layer))


This R script is, in fact, included as an example script, but you should try to add it yourself and get some practice. If you know R, you can even try to adapt it to plot the accumulated histogram or the values from an hypsometric curve, more commonly used to describe the relief of a watershed.

If you run the script using the clipped DEM as input, you will get a plot of the elevation histogram within the basin used to create that clipped DEM.

We can link this two last algorithm (clipping with SAGA + plotting with R) creating a simple model, as shown below. It takes a polygon layer and a raster layer, and then does the clipping and the plotting.



The intermediate clipped layer should not be a final result, but the output from the R script has to be marked as final, so you should enter a name in the corresponding box. Enter "Plots" as the name of that final output.
Save the model and name it "clipping + plotting". Now if you double click on the new entry corresponding to that model in the toolbox, you will have the following parameters dialog.


If you enter the vector layer as input, it will use the whole layer, and you will get a single histogram. You can select one of its polygons, and you will get the histogram for that basin. however, what we want is to have all the histograms corresponding to all the polygons. Here is where the iterative execution comes in.

Select the watershed vector layer as input, but click on the button on the right-hand side of the row corresponding to that parameter (the one with the green icon). That will cause SEXTANTE to run the algorithm (the model in this case), n times, one for each of the n features in the selected vector layer. That is exactly what we want.

All the resulting plots will be added to the SEXTANTE results window, as usual.


You can also create a model with the algorithms in the first part of the tutorial. That way, only two steps will be needed to get from the DEM to the set of plots: one for creating the layer with watersheds, another one to do the clipping and plotting.