viernes, 14 de diciembre de 2012

Simple vector operations

I was asked yesterday about how to solve a simple analysis problem: given a lines layer with a channel network and a polygons layer representing forest areas, extract those parts of the first one that intersect the second one. The main goal was to easily calculate parameters such as the total length of rivers across forest areas, but apparently there was interest in also keeping the parts outside of the forest areas to do other kinds of analysis. Solving the first question can be easily done with a little bit of SQL, but since the person who asked me was not familiar with of SQL, and she planned to do some other things with the data, I decided to create a small SEXTANTE model instead, which basically separates line parts within the polygons from those outside, while keeping both in the same layer.

I followed these steps:
  • Clipping the lines layer with the polygons layer. 
  • Calculating the difference of the polygons layer and the lines one.
  • Adding a new field called "IN" to both layers.
  • Filling the field with a value of "1" for the layer obtained from the clipping, since it contains the line parts inside the polygons.
  • Filling the field with a value of "0" for the layer obtained from the difference, since it contains the line parts outside the polygons.
  • Merging these last two layers
The model that automates these tasks looks like this.



Considering that there are other similar layers to process, this will probably save same time for future analysis, and using it is really simple, even for inexperienced users.

---

A (slightly narcissistic) note: I later realized that the model can be created without the two Field calculator algorithms, since the SAGA Merge Shapes Layers algorithm adds a field with the name of the input layer from which each feature comes from in the output layer, and also a numerical field with an index representing that input layer. I went to SAGA to check it and know more about the algorithm (that feature seemed to me an interesting addition to a plain merge...). and I was surprised (and happy) when I  found out that I was the author of that SAGA algorithm... According to the algorithm description, I wrote it back in 2004, which I guess explains my lack of memory :-)

viernes, 23 de noviembre de 2012

Spliting a vector layer

A user in the QGIS mailing list asked a couple of days ago about how to split a points layer in n given layers containing a fraction of those points. As this can be solved in many different ways, I thought it would be a good idea for a blog post and I am proposing a couple of alternatives here, of course all of them using SEXTANTE.

You can use any layer you want to test this examples. I will be using the elevations points layer in the Baranja Hill dataset.



Alternative 1

The first way we can split a layer into n new layers is by randomly assigning each feature to a group, with a total of n possible options, and then splitting the layer based on that group. The first step can be easily performed using the great Pyculator plugin, which is already available from SEXTANTE.

Double click on its name and fill the fields in the parameters window as shown below



Here we are creating 10 different groups, but you can enter the number you prefer or need.

There is no guarantee that the number of points will be equal in each group, but since the random values follow an uniform distribution, it should more or less be similar, which is enough for this purpose.

Here is the resulting layer rendered using a classified scheme based on the new field we have created in the layer, which represents the group it belongs to.



Now we can just split the layer using that field, creating a script algorithm to do so.

Here is the code of that algorithm

#Definition of inputs and outputs
#==================================
##[Example scripts]=group
##input=vector
##class_field=field input
##output=output file

#Algorithm body
#==================================
from qgis.core import *
from PyQt4.QtCore import *
from sextante.core.SextanteVectorWriter import SextanteVectorWriter

# "input" contains the location of the selected layer.
# We get the actual object,
layer = sextante.getObjectFromUri(input)
provider = layer.dataProvider()
allAttrs = provider.attributeIndexes()
provider.select( allAttrs )
fields = provider.fields()
writers = {}

# Fields are defined by their names, but QGIS needs the index for the attributes map
class_field_index = provider.fieldNameIndex(class_field)

inFeat = QgsFeature()
outFeat = QgsFeature()
inGeom = QgsGeometry()
nFeat = provider.featureCount()
nElement = 0
writers = {}

while provider.nextFeature(inFeat):
  progress.setPercentage(int((100 * nElement)/nFeat))
  nElement += 1
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  if clazz not in writers:
      outputFile = output + "_" + str(len(writers)) + ".shp"
      writers[clazz] = SextanteVectorWriter(outputFile, None, fields, provider.geometryType(), provider.crs() )
  inGeom = inFeat.geometry()
  outFeat.setGeometry(inGeom)
  outFeat.setAttributeMap(atMap)
  writers[clazz].addFeature(outFeat)

for writer in writers.values():
  sextante.load(writer.fileName)  
  del writer


A few things to comment here:
  • There are no declared outputs. That is because we are not going to generate a vector layer, but several of them, and we do not know in advance the number of new layers to create, since it depends on the input layer and the number of classes it has.
  • We have an input of type file. That is the base filename we use for the layers to create. WE could have used a string input as well, but using a file input will make it easier for the user to browse in the file system and select the filename to use.
  • We are loading layers explicitly. This is not needed (and should not be done) if the output is declared, but in this case we have to do it in case we want the layers to be loaded. Once again, this is a problem of the semantics of the algorithm, which are not well-defined, since the number of outputs is not know at design-time.
  • We cannot use this algorithm in the modeler. The semantic problems make it impossible to add it to a model, since it has no declared output.
Now execute the algorithm an use the points layer with the random field as the file to split, and that random field as the one to use for splitting into classes.


You will get your set of new layers, and they will be loaded into QGIS.

We can make it even easier to split a layer. How can we put this two algorithms into just one, which takes the layer to split and the number of final layers we want to create? Again, a bit of Python scripting can be used to create a new script algorithm, which wraps the two algorithm that are involved in this operation.

I will leave this a as an exercise for the reader ;-)

Alternative 2.

A different way of dividing a later is by tiling it. We can do this using two algorithms: one to create a tiling of polygons and another one to clip the layer using each one of the polygons.

First, let's create an algorithm to do the tiling and create a set of rectangles covering the area of the layer to split. A small change  in the script that we used to create the hex grid of the last post, and we have it. Here is the code:

##[Example scripts]=group
##input=vector
##numpolygons=number 10
##polygons=output vector
from sextante.core.QGisLayers import QGisLayers

input = sextante.getObjectFromUri(input)
centerx = (input.extent().xMinimum() + input.extent().xMaximum()) / 2
centery = (input.extent().yMinimum() + input.extent().yMaximum()) / 2
width = (input.extent().xMaximum() - input.extent().xMinimum())
cellsize = width / numpolygons
height = (input.extent().yMaximum() - input.extent().yMinimum())
sextante.runalg("mmqgisx:creategrid", cellsize, height, width, height, centerx, centery, 1, polygons)


This creates n horizontal tiles, but it can be easily adjusted to create a different design.

Save the algorithm as Create_tiling_from_vector_layer.py, and run it setting a total of 5 polygons to be created.



This is the result you will get.



Now, we can use the Clip algorithm, but instead of using it for the whole layer, we will use it with the run iteratively option for the clipping layer. That will cause the algorithm to be executed once for each polygon, just like we want. Remember that to active the iterative execution you have to click the button with the green icon at the right-hand side of the box where you select the layer.



Here you can see the result, with 5 different layers, each one rendered with a different color.


Alternative 3.

Use the GRASS v.kcv module. Very practical, but not as fun as the examples above :-)

Alternative 4

Use lassplit from LASTools, also available from SEXTANTE. Not a very good option, since you need las files to work with it. The point layer to split should be exported to a las file , then processed, and then the resulting ones should be converted back to shp. This last step can be done with the las2shp tool, used through the SEXTANTE batc processing interface to ease the task.

Note:

I found a couple of bugs in the SEXTANTE Pyculator implementation and in the MMQGISX Create grid algorithm when using rectangles. They are already fixed, but only in the GitHub version, so will need it to run these example

domingo, 11 de noviembre de 2012

Hexagons, trees and more

One thing I like to do (and that I will probably be doing in other entries in this blog) is to find examples of spatial analysis and to replicate them with SEXTANTE. If the original example uses QGIS, a SEXTANTE-based solution not only involves obtaining the same result using the same software, but also having a better way of replicating that analysis with a different dataset or automating it, since it can be integrated into SEXTANTE tools such as the graphical modeler or the batch processing interface.

For today's example, I am going to replicate this post from Anita Graser's blog, one of the best QGIS blogs around, with plenty of interesting examples. I recommend reading the original entry, but basically we are going to work with the Viena Tree Cadastre, create an hexagonal grid and find the number of trees in each hexagon, to get a nice rendering of how green each area of the city is. After that, we will go a little further and add some extra calculations, just to show some additional SEXTANTE elements.

The first thing is to create the grid. The MMQGISX Create Grid algorithm can be used for that, but the extent of the grid has do be defined manually with the coordinates of the center of the grid and its width and height. That might make it difficult to use it later in an automated context. For this reason, we are going to create our own algorithm to generate the grid based on the bounds of a layer. Of course, we will not do it from scratch, but reusing the MMQGISX algorithm. Here is how to do it:

Create a new script algorithm, selecting the Create new script tool in the SEXTANTE toolbox.

Add the following code to the script. Comments explain most of it.

#Definition of inputs and outputs
#==================================
#input=vector
##cellsize=number 1000.0
##grid=output vector

#Algorithm body
#==================================
from sextante.core.QGisLayers import QGisLayers

# "input" contains the location of the selected layer.
# We get the actual object, so we can get its bounds
input = QGisLayers.getObjectFromUri(input)

# And now we calculate the bounds and call the MMQGISX algorithm
centerx = (input.extent().xMinimum() + input.extent().xMaximum()) / 2
centery = (input.extent().yMinimum() + input.extent().yMaximum()) / 2
width = (input.extent().xMaximum() - input.extent().xMinimum())
height = (input.extent().yMaximum() - input.extent().yMinimum())
Sextante.runalg("mmqgisx:creategrid", cellsize, cellsize, width, height, centerx, centery, 3, grid)

This creates a new algorithm that wraps the existing one to construct the grid layer, but makes it easier to enter the bounding box, since it takes it from an already loaded layer.

Executing the algorithm, we will see the following parameters dialog.

We select the tree cadastre layer as input, and a cellsize of 0.0025, and we will get the grid covering the input layer.


Now to count the number of trees in each hexagon, we combine the trees layer and the grid layer, using both of them as inputs in the fTools Count points in polygon algorithm.

The result, with a graduated color ramp based on the attribute containing the number of points, looks like this.


 Empty cells are rendered in white, and since the borders of each hexagon are also white, it seems that there are no cells in certain parts of the covered region. However, they are there, and it would be a good idea to remove them, so as to make the layer size smaller. We can use another MMQGISX algorithm (Select), to create a new layer that contains just the hexagon cells with at least one tree.


Cell borders are represented in black this time, just to show the missing ones.

If we put all that in a model, it will allow us to create the final grid in one single step, just selecting the input layer with elements to count (trees in this case). and the desired hexagon size.


As you can see, there is no problem reusing the algorithm we have created as a part of a model. In fact, we can use a model inside another model, or our script-based algorithm as part of another script, calling it just like we call any of the other algorithms.

Running the model, we should see the parameters dialog shown below.



We can represent not just the number of trees, but the diversity of them, showing how many different species can be found in each cell (the trees layer includes an attribute with the name of the specie)

There is no algorithm to get the number of unique points in a polygons, so we have to find a workaround. The first step for that is to add a new field to the points layer with the code of the cell it belongs to. We can use the SAGA Add polygon attributes to points algorithm, but we need a unique attribute for each cell. The Add autoincremental field algorithm does exactly that.


 Now we can run the Add polygon attributes to points algorithm.


Once the points have the ID of the polygon they belong to, we can make a small script to count the number of different tree species in each class. We will create a new SEXTANTE algorithm with the following script.

#Definition of inputs and outputs
#==================================
##input=vector
##class_field=field input
##value_field=field input
##output=output vector

#Algorithm body
#==================================
from sextante.core.QGisLayers import QGisLayers
from qgis.core import *
from PyQt4.QtCore import *
from sextante.core.SextanteVectorWriter import SextanteVectorWriter

# "input" contains the location of the selected layer.
# We get the actual object, so we can get its bounds
layer = QGisLayers.getObjectFromUri(input)
provider = layer.dataProvider()
allAttrs = provider.attributeIndexes()
provider.select( allAttrs )
fields = provider.fields()
fields[len(fields)] = QgsField("UNIQ_COUNT", QVariant.Int)
writer = SextanteVectorWriter(output, None, fields, provider.geometryType(), provider.crs() )

# Fields are defined by their names, but QGIS needs the index for the attributes map
class_field_index = provider.fieldNameIndex(class_field)
value_field_index = provider.fieldNameIndex(value_field)

inFeat = QgsFeature()
outFeat = QgsFeature()
inGeom = QgsGeometry()
nFeat = provider.featureCount()
nElement = 0
classes = {}

#Iterate over input layer to count unique values in each class
while provider.nextFeature(inFeat):
  #Printing a number to the console to indicate progress
  print int((50 * nElement)/nFeat)
  nElement += 1
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  value = atMap[value_field_index].toString()
  if clazz not in classes:
      classes[clazz] = []
  if value not in classes[clazz]:
      classes[clazz].append(value)      

# Create output vector layer with additional attribute
while provider.nextFeature(inFeat):
  print int((500 * nElement)/nFeat)
  nElement += 1  
  inGeom = inFeat.geometry()
  outFeat.setGeometry( inGeom )
  atMap = inFeat.attributeMap()
  clazz = atMap[class_field_index].toString()
  outFeat.setAttributeMap( atMap )
  outFeat.addAttribute( len(provider.fields()), QVariant(len(classes[clazz])))
  writer.addFeature( outFeat )
  
del writer

Again, the comments in the source code should explain how it works.

 Running that on the trees layer with the extra field will generate a new points layer with another new field containing the number of  different species (the richness) in the polygon each tree belongs to. To pass that value to the corresponding hexagon cell in the grid layer, we can use the SAGA Point statistics in polygons algorithm. Since all points in a cell will have the same richness value (because it is the richness of the cell they belong to), calculating the average value will yield exactly that richness value.

 Below you can see the result.



The script we have created can be used as well to find out the street with the largest number of different trees. Just run it on the original trees layer (which contains an attribute with the name of the street where the tree is), using the street name as class attribute, and then sort the attributes table of the resulting layer based on the richness attribute.

Tuerkenschanzpark seems to be the one with the largest value (219). Nice place for a walk, I guess ;-)

A note


The process above is a bit cumbersome, although rather useful not just as a didactic example, but also from a practical point of view, since we have created two new algorithms that can be reused. However, an algorithm to count unique points would be a better solution. As I said, there was not one to do that, but I thought it would be a good idea, and it is pretty easy to create one, so I added it to the fTools group. You can now find it in the SEXTANTE toolbox if using the development version, and it will be available in the next SEXTANTE release.

Another note


The measure of richness we have used is rather poor. For better ones, check this wikipedia article on diversity indices, and if you feel adventurous, try to implement one yourself :-)

Bonus 


Another interesting way of representing points when the number of them is high is by aggregating them and replacing a group of points by a bigger one with a size related to the size of the group. This technique is used, for instance, in the GeoTools PointStacker transformation. We can create such a rendering by converting the hexagon cells into circles with a radius depending on the number of trees. First we convert the hexagons into points, by running the fTools Polygon centroids algorithm. The attributes table of the centroids layer is exactly like the one of the original polygon layer, so it will contains the field with the number of points. We can divide the values in groups and create a new value for each group (use the Field pyculator algorithm for that), representing the radius of the circle that represents that class. Now we can buffer each centroid using that radius as buffer distance. Feel free to try it yourself :-)

lunes, 5 de noviembre de 2012

Some hydrological analysis

For this entry, we are going to do some hydrological analysis. Starting with a DEM, we are going to extract a channel network, delineate watersheds and calculate some statistics. We will use mostly algorithms from SAGA, but also from other providers, to complement them. And finally we will show how to use one of the powerful tools of SEXTANTE: the iterative execution of algorithms. Using it, we will calculate a histogram to show how elevation is distributed in each subwatershed, and plot it using R.

Let's start.

The first thing is to load this DEM.




And the first module to execute is the Catchment area (Parallel) one (use the search box in the toolbox to find it). You can use anyone of  the others named Catchment area. They have different algorithms underneath, but the results are the same.

Select the DEM in the Elevation field, and leave the default values for the rest of parameters.


This algorithm calculates many layers, but the Catchment Area one is the only one we will be using.


You can get rid of the other ones if you want.

The rendering of the layer is not very informative. To know why, you can have a look at the histogram and you will see that values are not evenly distributed (there are a few cells with very high value, those corresponding to the channel network). Calculating the logarithm of the catchment area value yields a layer that conveys much more information (you can do it using the raster calculator).



The catchment area (also known as flow accumulation), can be used to set a threshold for channel initiation. This can be done using the Channel network algorithm. Here is how you have to set it up.



Use the original catchment area layer, not the logarithm one. That one was just for rendering purposes.

If you increase the Initiation threshold value, you will get a more sparse channel network. If you decrease it, you will get a denser one. With the proposed value, this is what you get.



The image above shows just the resulting vector layer and the DEM, but there should be also a raster one with the same channel network. That raster one will be, in fact, the one we will be using.

Now, we will use the Watersheds basins algorithmtol delineate the subbasins corresponding to that channel network, using as outlet points all the junctions in it. Here is how you have to set the corresponding parameters dialog.



And this is what you will get.


This is a raster result. You can vectorise it using the Vectorising grid classes algorithm.




Now, let's move into the second part of this tutorial. Let's try to create a set of histograms that show how elevation values are distributed within each subbasin. First, let's see how to do it for a single subbasin. The idea is to have a layer that just represents the elevation within that subbasin and then pass that layer to R so we can calculate the histogram and plot it.

First, let's clip the orginal DEM with the polygon representing a subbasin. There is another SAGA algorithm to do so, and it's called Clip Grid with Polygon. Although we are calling an external application, since we are using SEXTANTE, it will be aware of the selected features in the layer we use as input. SEXTANTE will take care of creating an intermediate layer, so we do not have to worry about it and the integration between QGIS and SAGA is more complete. so, if we select a single subbasin polygon and then call the clipping algorithm, we can clip the DEM to the area covered by that polygon.

Select a polygon,




and call the clipping algorithm with the following parameters:


The element selected in the Input field is, or course, the DEM we want to clip.

You will get something like this.



This layer is ready to be used in R, but first we need an R algorithm that can use it. Add a new R script named  Raster Histogram, with the following code:


##R scripts=group
##layer=raster
##showplots
hist(as.matrix(layer))


This R script is, in fact, included as an example script, but you should try to add it yourself and get some practice. If you know R, you can even try to adapt it to plot the accumulated histogram or the values from an hypsometric curve, more commonly used to describe the relief of a watershed.

If you run the script using the clipped DEM as input, you will get a plot of the elevation histogram within the basin used to create that clipped DEM.

We can link this two last algorithm (clipping with SAGA + plotting with R) creating a simple model, as shown below. It takes a polygon layer and a raster layer, and then does the clipping and the plotting.



The intermediate clipped layer should not be a final result, but the output from the R script has to be marked as final, so you should enter a name in the corresponding box. Enter "Plots" as the name of that final output.
Save the model and name it "clipping + plotting". Now if you double click on the new entry corresponding to that model in the toolbox, you will have the following parameters dialog.


If you enter the vector layer as input, it will use the whole layer, and you will get a single histogram. You can select one of its polygons, and you will get the histogram for that basin. however, what we want is to have all the histograms corresponding to all the polygons. Here is where the iterative execution comes in.

Select the watershed vector layer as input, but click on the button on the right-hand side of the row corresponding to that parameter (the one with the green icon). That will cause SEXTANTE to run the algorithm (the model in this case), n times, one for each of the n features in the selected vector layer. That is exactly what we want.

All the resulting plots will be added to the SEXTANTE results window, as usual.


You can also create a model with the algorithms in the first part of the tutorial. That way, only two steps will be needed to get from the DEM to the set of plots: one for creating the layer with watersheds, another one to do the clipping and plotting.



domingo, 21 de octubre de 2012

Analyzing John Snow's cholera dataset with QGIS and SEXTANTE

To start this blog, I have taken the well-known dataset that John Snow used in his groundbreaking work and I have been playing with it in SEXTANTE to get some results. Not really new things to come out (the analysis is pretty obvious and there is no need for sofisticated GIS techniques to end up with good results and conclusions), but I think it is a good way of showing how these spatial problems can be analyzed and solved by using different tools included in SEXTANTE.

The dataset I have used contains shapefiles with cholera deaths and pump locations, and an OSM rendered map in TIFF format. They have been adapted from this dataset . There were a couple of missing pumps and I added some values to the pump layer attributes table. Also, I had to modify the OSM image to be able to process it in SAGA, which does not allow for raster layer with different x and y cellsizes (I have to see how to do that automatically in SEXTANTE...but it doesn't look like an easy task...)

The first thing to start with is calculting the Voronoi diagram (a.k.a. Thyessen polygons) of the pumps layer.
The Voronoi Diagram tool in MMQGIS can be used for that.


Pretty easy, but it will already give us interesting information.


Clearly, most cases are within one of the polygons

To get a more quantitative result, we can count the number of deaths in each polygon. Since each point represents a building where deaths occured, and the number of deaths is stored in an attribute, we cannot just count the points. We need a weighted count, so we will use the Count points in polygon (weighted) tool in the fTools group.


The new field will be called DEATHS, and we use the COUNT field as weighting field. The resulting table clearly reflects that the number of deaths in the polygon corresponding to the first pump is much larger than the other ones.


Another good way of visualizing the dependence of each point in the Cholera_deaths layer with a point in the Pumps layer is to draw a line to the closest one. This can be done with the Distance to closest hub tool in the MMQGIS group, and using the configuration shown next.


The result looks like this:


Although the number of lines is larger in the case of the central pump, do not forget that this does not represent the number of deaths, but the number of locations where cholera cases were found. It is a representative parameter, but it is not considering that some locations might have more cases than other.

A density layer will also give us a very clear view of what is happening. We can create it with the SAGA Kernel density algorithm. Using the Cholera_deaths layer and its COUNT field as weight field, and with a radius of 100, we get something like this.


Combining with the pumps layer, we see that there is one pump clearly in the hotspot where the maximum density of death cases is found.

All these analysis are using euclidean distance, but people did not walk in straight lines to go to the pump to get some water. Let's try to take that into account.

There are many sources of spatial data where we could get vector roads data, but let's imagine that we cannot have other data apart from the OSM image. How could we use it?

First, let's classify the map, so as to have a new map with a given value (1 seems like a good one, we will later see why) in those pixels representing streets, and the no-data value in the rest of them. All non-streetpixels correspond to buildings, and are rendered in a light-orange dark. This is a paletted image, and the index for that color is 204. To check it, you just have to select the information tool, select the image to query and click on any of those orange pixels. The dialog that shows up will tell you the band value.


So what we have to do is to change those pixels with a value of 204 to no-data (you cannot walk through buildings), and the rest of them to 1 (you can walk there). There are several ways of doing that. Using the Reclassify algorithms is one of them. The raster calculator is another one, and we have two of them, one in SAGA and one in GRASS. I will be doing it using the SAGA one.

Here is how the parameters dialog has to be filled.


You have to select the OSMMap layer in the grids field. The formula ifelse(a=204, -99999,1) tells SAGA that if the value in first grid in the Grids field (grids are referred as a, b, c, etc...or g1, g2, g3, etc, and in this case we just have one), the resulting value will be -99999 (no-data), otherwise it will be 1.

(I have to think about a better way of setting no data values, but in the meantime it is a good thing to know that SAGA uses -99999 as the default no-data value, so you can use it directly like that.)

The resulting layer looks like this.


Roads are rendered in grey, buildings are transparent. Rename it to "streets"

You might notice that this is a quick and dirty classification, as it does not consider other colors in the map, like those of labels, but in the center area where the interesting stuff is happening, it represents a very good solution, so there is no need for more refinement.

Let's turn this layer into a new one where no-data vlaues will be preserved, but all street pixels, instead of having a value of 1, will have the walking distance to the closest pump, considering, of course, the streets.  This is going to be a raster analysis, so let's first convert the pumps vector layer into a raster one.

Select the Shapes to Grid tool in SAGA and fill it as shown next.

To get the output extent, you do not have to type it. Click on the button on the right-hand side and select Use layer/canvas extent.


Since we will be using this rasterized layer along with the streets one, select the streets layer and its extent will be automatically added to the text field. You can do the same with the cellsize, but it is not needed, since the cellsize of the streets image (and the original OSMMap layer) is 1, which happens to be the default value (just a coincidence...)

The resulting layer looks mostly transparent, as it is all comprised of no-data values, except for those pixels where a point representing a pump was found. You might not see them, but those pixels are there, do not worry. We will use them in the next step. Rename the layer to "pumps"

Select the SAGA  Accumulated cost (isotropic) algorithm and set the following values.


Execute the algorithm and you will get two new layers, one with the cost, and another with the index of the closest destination point (the closest pump). This last one is really more intersting for us. Changing the styling to use the Pseudocolor color map, and adding the vector Pumps and Cholera_deaths layers, you should get something like this.


There is not much difference when compared with the Voronoi polygons, but it it illustrates a different way of getting to the same result. Other parameters (i.e. steepness of streets) can be incorporated this way, just changing the cost layer (now cost is 1 through all cells).

The cholera_deaths points layer can be extended sampling the closest point raster layer, adding a new attribute with the index of the closest pump. I'll let you find out how to do that with SEXTANTE.

I have a few more ideas, and will probably write some more examples based on this dataset. The results are not going to be much more better than this ones (these are more than enough to "solve" the problem), but will be useful for showing different techniques to use SEXTANTE in QGIS.

If you want to go a bit further, you can try creating a model with all these steps, or even a script in the QGIS console.

------

Note 1: I developed the SAGA cost algorithms quite a few years ago.  I think now that they can be improved, and I'm thinking about doing some improvements with new ideas. This looks to me  like a good excuse to go back to a bit of good old c++ coding :-)

Note 2: I fixed a couple of bugs, and improved some SEXTANTE little things while writing this, so you need  to work with the github version to be able to replicate this.

Note 3: Of course, the street classification can be done as well using the grayscale OSM image.