jueves, 31 de enero de 2013

Dividing in equal area zones

This post is, once again, inspired by a  question by a QGIS user in the QGIS mailing list. The problem is as follows:

Given a DEM and a polygon defining an area to analize, divide the area inside the polygon in subareas of a given surface S. This subareas will be used for defining a harvesting scheme, and since the risk of erosion is to be minimized, they should run along contours. In other word, the problem is similar to creating elevation classes, but instead of being defined by an elevation interval, the interval is based on the total area between class breaks.

This is an interesting problem, because it can be solved in many different ways, and using many different tools from the SEXTANTE toolbox. Here is just one possible solution, which I consider particularly elegant.

I will be using the same DEM used for other previous post, along with a simple polygon created on top of it.

First we clip the DEM with the polygon that represents the area to divide, using the SAGA Clip grid with polygons algorithm

Now we run the SAGA Sort grid to get a grid that has values ranging from 1 to n, n being the total number of cells (excluding no-data ones) in the input layer. The cell with the lowest value in the input layer has a value in the output layer of 1, the cell with the second lowest value gets a value of 2, and so on.

The resulting layer looks very similar to the original DEM. How similar it looks depend on how homogeneously distributed the original values are.

We can now reclassify this layer to get the desired zones. If we want a total area of , let's say, 1000ha (10000000 m2) for each zone, it is easy to calculate the number of cells in each zone, just dividing the zone area by the area of a size cell (the DEM that I used has a cellsize of 25 * 25 = 625m2). In our case, numcells = 10000000 / 625 =  16000.

Note: I have selected a large area for each zone, to get a result where they can be easily visualized, but in this context a much smaller are would be more logical.

We need group of 1600 cells, taken using the ordering defined by the sorted grid we have produced. The most straightforward way to do it is to use a reclassify algorithm. SAGA has one, and you need a table with the values defining each range and the resulting value. In our case, it might take a while to fill it, since there is a large number of group. GRASS has another algorithm, and the ranges are defined in a text file, which might help us, since we can create it more easily using a spreadsheet.

However, let's try to do it without reclassifying, looking for an easier way. Since our classes are regular, we can define them using a mathematical expression. Open the SAGA Grid calculator algorithm,, select the sorted layer as the only input grid (which we will refer to in the formula as a),  and use the following formula: int( (a-1) / 117298 * 8)

Et voilà, this will directly produce the layer we are looking for.

How did that work? Let's have a closer look at the formula.

First, we normalize the layer to have values only in the (0,1) range, by substracting the min value (1) and dividing by the total range covered (117298). Then we multiply by the number of classes we want (8, which was obtained by dividing the total number of cells, 117298, by the size of each class in cells, 16000, and rounding it up), to get values in the range (0,8). Then we apply the int() function to truncate those values and have just integer class values.

How can we now that each class has the same number of pixels as the other ones? Easy. Just go to the properties of the resulting raster layer and compute a histogram. You will see that the histogram has the same  constant y value at all integer x values.

If you are familiar with image analysis, you will easily recognize here a case of image equalization.

Notice that the class index are zero-based (and up to 7, not 8), since we used truncation.

The resulting layer looks similar to the one that would be obtained by just reclassiying the DEM instead of the ordered layer, but the result might differ depending on how the layer values are distributed.

lunes, 28 de enero de 2013

Using selection algorithms

Among the long list of algorithms in SEXTANTE, selection algorithms are a very interesting group. Instead of producing new layers, they just modify the selection that exist in one of them. They cannot be used from the batch processing interface (since it works with layers that are not loaded in QGIS, so there is no selection at all), and using them from the SEXTANTE toolbox is not such an useful tool, but in the modeler and the Python console they can really help us simplify things and automate tasks.

This short post shows how to use them to automate a task that would otherwise involve a large amount of work,  but that can be solved this way with just a few lines. It is inspired by this question recently asked at StackOverflow

The problem is as follows: given a points layer (species) with the presence of different species, and a polygon layer(zones) with vegetation types, generate N new polygon layer, each one with the areas where each specie can be found.

Assuming that the specie represented at each point is stored in an attribute named specie, the following script will solve out problem

import sextante
import os
basefolder = "[put_your_destination_folder_here]"
species = sextante.runalg('qgis:listuniquevalues', 'species', 'specie', None)['UNIQUE_VALUES'].split(";")
for specie in species:
  sextante.runalg('qgis:selectbyattribute', 'species', 'specie', 0, specie)
  sextante.runalg('qgis:selectbylocation', 'zones', 'species', 0)
  filename = os.path.join(basefolder, specie + ".shp")
  sextante.runalg('qgis:saveselectedfeatures', 'zones', filename)

Just two selections, and then we save the resulting selected features to a file whose name is created from the name of the corresponding specie.

This should work on the just-released 1.0.9 version of SEXTANTE. Just make sure you do not enable executing in a separate thread, since it seems to still cause some strange errors when using the console, and it tends to freeze QGIS when a runalg command is put in a loop.

martes, 8 de enero de 2013

Hydrology analysis with TauDEM

To start the year, here is the first external contribution to this blog, a great post about the TauDEM algorithm provider for hydrological analysis. Alexander Bruy, an active QGIS developer, and most likely the most active SEXTANTE developer apart from myself, is responsible for this great entry that I am glad to share here.

Thanks, Alex!


TauDEM (Terrain Analysis Using Digital Elevation Models) is a set of Digital Elevation Model (DEM) tools for the extraction and analysis of hydrologic information from topography as represented by a DEM. This is software developed at Utah State University (USU) for hydrologic digital elevation model analysis and watershed delineation.

TauDEM is distributed as a set of standalone command line executable programs for a Windows and source code for compiling and use on other systems.  SEXTANTE contains only the interface description, so you need to install TauDEM 5.0.6 by yourself and configure SEXTANTE properly.

Let’s start.

First of all load into QGIS the DEM and adjust symbology. It is the same DEM used in the last hydrological analysis post

TauDEM extracts hydrologically useful information from raw digital elevation model data. The main idea is based on the concept of the hydrological flow field being represented by flow from each grid cell to one or more of its neighbors. For this to work the topography should not contain pits, defined as one or more grid cells surrounded completely by cells with higher elevation. The first step in hydrological analysis is should be pit removal. TauDEM uses a fill process to do this, raising the elevation of pits until they drain out. A DEM with pits removed is referred to as hydrologically correct and can be used to calculate flow directions for each grid cell.

So the first tool to run is Pit Remove

Select the DEM in the Elevation Grid field and execute it.

The output DEM is the same, because our input file has no pits. If you are absolutely sure that your input file has no pits, this step can be skipped, otherwise it is safer to run Pit Remove to ensure that all further analysis will be correct.

The next algorithm to run is D8 Flow Direction. 

This takes as input the hydrologically correct elevation grid (which we create at the previous step) and outputs D8 flow direction and slope for each grid cell. The resulting D8 flow direction grid is shown below. 
It uses an encoding of the direction of steepest descent from each grid cell using the numbers 1 to 8, to represent each one of the cells around a given one. D8 is the simplest flow direction model.

NOTE: all TauDEM algorithms have built-in help, so you always can read description of the algorithm, required inputs and produced results.

The next tool to run is D8 Contributing Area. It counts the number of grid cells draining through (out of) each grid cell based on D8 flow directions.

There are optional inputs (outlets and an input weight grid) which we leave empty for now. These inputs are described in the tool help and allow to restrict calculation to the specific area upstream of the designated outlet and to accumulate an input weight field, rather than just counting contributing area as a number of grid cells. There is also an option named Check for edge contamination (set to Yes by default). 

The resulting layer produced by this algorithms looks like this:
A logarithmic scale is better suited to the particular distribution of values in this layer

In the rendering below, pink color is used to show NODATA values, so as to illustrate edge contamination.
The functions above used the D8 flow model that represents flow from each grid cell to one neighbor. TauDEM also provides the D∞ (D-Infinity) flow model that calculates the steepest outwards flow direction using triangular facets centered on each grid cell and distributes flow between neighboring grid cells based on flow direction angles. 

Having flow directions and slope grids we can delineate and analyze stream networks and watersheds. The simplest stream network delineation method uses a threshold on contributing area.

First we need to define a stream using the Stream Definition by Threshold algorithm. 

This tool defines a stream raster grid by applying a threshold to the input. In our case the input is a D8 contributing area grid and a threshold of 100 grid cells has been used.

The result depicts the stream network as a binary grid (but is not logically connected as a network shapefile yet).
The next step is to define the outlet point. This can be done by creating a point shapefile and placing the outlet point. Press the New Shapefile Layer button in the QGIS main window, and the following dialog will appear.

Select Point as shape type, and specify the same CRS as the DEM. TauDEM requires this, because it doesn’t do any spatial reference transformations. Now press OK and save the new file wherever you prefer. A new vector layer will be added to QGIS canvas. Select it and start editing. Zoom to a likely location for an outlet and create a new feature, then save and stop editing.

It is not required for the outlet to be precisely located where there will be a stream as TauDEM has a tool to move outlets to streams that we will used. Alternatively, if the outlet point location is available from some other source (e.g. location information about a stream gauge), it can be created from that information.
Here is the outlet we created manually.

As you can see, it is located outside the stream. So we use the Move Outlets To Stream tool to get outlet in correct place.

IMPORTANT: TauDEM creates shapefile in same CRS as input files, but it doesn’t create .prj file. You need to define projection for output file manually in order to get it in correct place. This can be done from layer context menu using Set Layer CRS item.

Here is our initial outlet (red) and correct one (green). Notice how the outlet has been moved to match the stream.

With the outlet positioned on the stream the stream network upstream of the outlet can be delineated.
We again run the D8 Contributing Area algorithm, but this time specifying an outlet shapefile to evaluate contributing area and effectively identify the watershed upstream of the outlet point (or points for multiple outlets).

The result is the contributing area only for the watershed upstream of the outlet.

The next step is to use the Stream Definition By Threshold algorithm to define streams using a specified contributing area threshold. We choose a threshold of 200 grid cells.

The result is a grid stream network upstream of the outlet
This network is still only represented as a grid. To convert it into vector elements represented using a shapefile, the Stream Reach and Watershed algorithm is used.

This algorithm produces many outputs including watershed grid and stream network in shape format and some text files with additional information. Here we visualize the watershed grid and stream network shapefile.

The attributes of a stream network link contains useful information such as downstream and upstream connectivity, stream order, length, slope, drainage area etc (detailed explanation of fields is available in the tool help).

Rather than do each step separately, we can create model with SEXTANTE modeler to run these algorithms together.