jueves, 31 de enero de 2013

Dividing in equal area zones

This post is, once again, inspired by a  question by a QGIS user in the QGIS mailing list. The problem is as follows:

Given a DEM and a polygon defining an area to analize, divide the area inside the polygon in subareas of a given surface S. This subareas will be used for defining a harvesting scheme, and since the risk of erosion is to be minimized, they should run along contours. In other word, the problem is similar to creating elevation classes, but instead of being defined by an elevation interval, the interval is based on the total area between class breaks.

This is an interesting problem, because it can be solved in many different ways, and using many different tools from the SEXTANTE toolbox. Here is just one possible solution, which I consider particularly elegant.

I will be using the same DEM used for other previous post, along with a simple polygon created on top of it.

First we clip the DEM with the polygon that represents the area to divide, using the SAGA Clip grid with polygons algorithm

Now we run the SAGA Sort grid to get a grid that has values ranging from 1 to n, n being the total number of cells (excluding no-data ones) in the input layer. The cell with the lowest value in the input layer has a value in the output layer of 1, the cell with the second lowest value gets a value of 2, and so on.

The resulting layer looks very similar to the original DEM. How similar it looks depend on how homogeneously distributed the original values are.

We can now reclassify this layer to get the desired zones. If we want a total area of , let's say, 1000ha (10000000 m2) for each zone, it is easy to calculate the number of cells in each zone, just dividing the zone area by the area of a size cell (the DEM that I used has a cellsize of 25 * 25 = 625m2). In our case, numcells = 10000000 / 625 =  16000.

Note: I have selected a large area for each zone, to get a result where they can be easily visualized, but in this context a much smaller are would be more logical.

We need group of 1600 cells, taken using the ordering defined by the sorted grid we have produced. The most straightforward way to do it is to use a reclassify algorithm. SAGA has one, and you need a table with the values defining each range and the resulting value. In our case, it might take a while to fill it, since there is a large number of group. GRASS has another algorithm, and the ranges are defined in a text file, which might help us, since we can create it more easily using a spreadsheet.

However, let's try to do it without reclassifying, looking for an easier way. Since our classes are regular, we can define them using a mathematical expression. Open the SAGA Grid calculator algorithm,, select the sorted layer as the only input grid (which we will refer to in the formula as a),  and use the following formula: int( (a-1) / 117298 * 8)

Et voilà, this will directly produce the layer we are looking for.

How did that work? Let's have a closer look at the formula.

First, we normalize the layer to have values only in the (0,1) range, by substracting the min value (1) and dividing by the total range covered (117298). Then we multiply by the number of classes we want (8, which was obtained by dividing the total number of cells, 117298, by the size of each class in cells, 16000, and rounding it up), to get values in the range (0,8). Then we apply the int() function to truncate those values and have just integer class values.

How can we now that each class has the same number of pixels as the other ones? Easy. Just go to the properties of the resulting raster layer and compute a histogram. You will see that the histogram has the same  constant y value at all integer x values.

If you are familiar with image analysis, you will easily recognize here a case of image equalization.

Notice that the class index are zero-based (and up to 7, not 8), since we used truncation.

The resulting layer looks similar to the one that would be obtained by just reclassiying the DEM instead of the ordered layer, but the result might differ depending on how the layer values are distributed.

1 comentario: