Skip to content

Envelope Models

October 20, 2010

Our student project for the Open Source Tools for Spatial Ecological Modeling (ost4sem) workshop in Copenhagen in June-July 2010.

http://spatial-ecology.net/doku.php?id=wiki:projects

http://spatial-ecology.net/doku.php?id=wikidk:dk10envelop

Open Source Tools for Spatial Ecological Modelling (ost4sem)
Copenhagen June-July 2010
Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang

Envelope Models

Open Source Tools for Spatial Ecological Modelling (ost4sem)
Copenhagen June-July 2010
Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang

We wanted to download freely available spatial data on climate, land-use and observations of study species, and process them for modeling current and future distributions. We chose to model two butterfly and three plant species.

  • Ubuntu Linux system in a VmWare virtual machine
  • BASH scripting
  • GDAL libraries
  • GRASS
  • R
  • Climate data from WorldClim URL
  • Land use data from EEA Corine URL
  • Country borders (shapefile from ESRI, etc)
  • Eg. GADM database of Global Administrative Areas GADM

We will use presence only data downloaded from GBIF.

Example getting butterfly data from GBIF (wget)

Plant data

Hops

Butterfly data

Red Admiral

  • Step 1 :: Species occurrence data (point, longitude, latitude input)
  • Step 1a: Get species occurrence point data (from GBIF and other sources) with WGET (http://data.gbif.org)
  • Step 1b: Convert XML species data to text with xmlstarlet (http://dave.ecoforge.net/wiki/HowTo/GbifRest)
  • Step 1b: Prepare source point data with AWK
  • Step 1c: Import occurrence data to GRASS

  • Step 2 :: Grid data (raster)
  • Step 2a: Get grid data, climate data, land use, etc
  • Step 2b: Prepare grid data with AWK (?)
  • Step 2c: Import grid data to GRASS

  • Step 3 :: Envelope modeling, randomForest, maxent, glm, gam, etc…(?) with R (GRASS?)
  • Step 3a: calibrate envelope model (area Europe) from the input points (for each species)
  • Step 3b: apply the model (in Sweden) to generate a species distribution prediction

#!/bin/bash
#
# Copenhagen June 2010, ost4sem, unidk2010
#
# Title: Envelope models
# Authors: Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang
# Script: envmod_data.sh
# Description: Species distribution data for envelope modeling
# Step 1 :: Species occurrence data (point, lon, lat input)
#

#######################
# Home directory folder
home_dir="~/ost4sem/project/unidk2010/envmod/"
cd ~/ost4sem/project/unidk2010/envmod/

###################################################
# Install tofrodos for import of windows text files
sudo apt-get install tofrodos

###################################
# Create directory folder structure
# -a file exists, -f file exists, -f file exists and is regular file, -d directory exists
#
for folder_name in input output sh_script grassdb input/species input/worldclim input/corine input/sjv; do
 if [ -d $folder_name ]; then { echo "folder $folder_name exists"; } else { echo "folder $folder_name does NOT exists"; mkdir $folder_name; } fi
done
#
ls -lR
ls -l input

#####################################################
# Species list (we will use one of the options below)
#
# Species list in a variable
species_list="vanessa+atalanta humulus+lupulus scolitantides+orion sedum+acre crambe+maritima"
#
# Species list in a text file
echo "pyrgus+armoricanus" "crambe+maritima" > "input/species/species_list.txt"
cat input/species/species_list.txt
#
# Read species list from text file to variable
species_list=`cat input/species/species_list.txt`
echo $species_list
#
# Test loop of species list
for species in $species_list; do echo -e "Species :: "$species" \n"; done

# download European country iso list
wget -O ./input/isocountry.txt "http://www.maxmind.com/eu_country_list.txt"

######################################################
# Download species occurrence data from GBIF with wget
# http://data.gbif.org/ws/rest/occurrence
# http://wiki.gbif.org/dadiwiki/wikka.php?wakka=DeveloperAPIs
# http://wwwdev.ngb.se/portal/files/cwr/CWR-Portal-Technical-Description.pdf
#
# originregioncode=EEU,NEU,SEU,WEU // EEU-Eastern Europe,NEU-Northern Europe,SEU-Southern Europe,WEU-Western Europe
# originisocountrycode=SE // Occurrences from Sweden
#
# wget -O outputfile; file to print some file information
# lynx http://data.gbif.org/ws/rest/occurrence/count?originisocountrycode=SE&scientificname=pyrgus+armoricanus&georeferencedonly=false
# wget -O ./input/species/test.xml "http://data.gbif.org/ws/rest/occurrence/list?originisocountrycode=SE&scientificname=pyrgus+armoricanus&georeferencedonly=true"
#
# loop species

#########################
# download species for each european country
# xmlstarlet to read XML
# sudo apt-get install xmlstarlet # to install xmlstarlet on Ubuntu
# Convert XML to text with xmlstarlet see: http://dave.ecoforge.net/wiki/HowTo/GbifRest
#
for country in `awk '{if (NR>3) print $1}' ./input/isocountry.txt`; do
  for species in $species_list; do
    echo -e "Species :: "$species"\n";
    wget -O "./input/species/"$species"_"$country".xml" "http://data.gbif.org/ws/rest/occurrence/list?originisocountrycode=$country&scientificname="$species"&georeferencedonly=true&mode=processed"
    # NB! for species occurrences per country above 1 000 hits we need to make paging
    # Check the XML response if there are more than the maximum returned 1 000 records
    # Check if the XML element gbif:summary have the attribute next=1000
    # http://data.gbif.org/ws/rest/occurrence/list?startindex=1000&... (for page two)
    # while loop until no more next page would be a solution
  xmlstarlet sel -T \
    -N tn="http://rs.tdwg.org/ontology/voc/TaxonName#" \
    -N to="http://rs.tdwg.org/ontology/voc/TaxonOccurrence#" \
    -t -m "//to:TaxonOccurrence" \
    -v "concat(.//tn:nameComplete,';',to:decimalLongitude,';',to:decimalLatitude)" \
    -n \
    ./input/species/"$species"_"$country".xml > ./input/species/"$species"_"$country".txt
  done
done

#concatenate species occurrence data from all countries
for species in $species_list; do
  cat ./input/species/"$species"_*.txt > ./input/species/"$species".txt
done

#make nicer species list
for species in $species_list; do
  specname=`echo "$species"`
  cat ./input/species/"$species".txt | tr [,] [.] | awk -v i="$specname" 'BEGIN{FS=";"};{print i, ",", $2,",",$3}' | tr -d [:blank:] | tr "+" " " > ./input/species/temp.txt
  cat ./input/species/temp.txt > ./input/species/"$species".txt
done
rm ./input/species/temp.txt

# make some nice col names for each species distribution file
for spec in $species_list; do
  awk 'BEGIN { print "species,long,lat"}' > ./input/species/temp.txt
  awk '{print $0}' ./input/species/$spec.txt >> ./input/species/temp.txt
  cat ./input/species/temp.txt > ./input/species/$spec.txt
done
rm ./input/species/temp.txt

#
ls -l input/species
head -40 ./input/species/$species.xml
head -10 ./input/species/$species.txt

#########################
# Download WorldClim data
# http://www.worldclim.org/download
# Current climate (there is also future and some past climate)
# wget -P prefix directory save files to
#
for layer in tmin tmax tmean prec bio alt; do # tmin tmax tmean prec bio alt
 for resolution in 10m; do # 30s, 2-5m, 5m, 10m (NB! finer resolution files are LARGE)
  wget -P "./input/worldclim/current/wc_"$resolution "http://r-gis.org/climate/worldclim1_4/grid/cur/"$layer"_"$resolution"_bil.zip"
  sleep 2 # wait 5 seconds between wget sessions to avoid overloading worldclim
 done
done
ls -l input/worldclim/current
#
# Future climate (do we plan to need this...?)
for layer in tmin tmax; do # tmin tmax tmean prec bio alt
 for resolution in 10m; do # 30s, 2-5m, 5m, 10m (NB! finer resolution files are LARGE)
  for model in CCCMA; do # CCCMA, HADCM3, CSIRO
   for scenario in A2a; do # A2a, B2a
    for year in 2020; do # 2020, 2050, 2080
     #http://r-gis.org/climate/worldclim1_4/grid/fut/IPPC3/CCCMA/wc_10m_CCCMA_A2a_2020_tmin.zip
     wget -P ./input/worldclim/future "http://r-gis.org/climate/worldclim1_4/grid/fut/IPPC3/"$model"/wc_"$resolution"_"$model"_"$scenario"_"$year"_"$layer".zip"
     sleep 5 # wait 5 seconds between wget sessions to avoid overloading worldclim
    done
   done
  done
 done
done
ls -l ./input/worldclim/future

#####################################
# Download CORINE Land cover from EEA
# http://www.eea.europa.eu/data-and-maps/data
# Corine Land Cover (clc) 2006 raster data, resolution 250 m (g250_06.zip, ZIP archive 26.76 MB)
# /data-and-maps/data/ga-downloads/SH04UZP80M/corine-land-cover-2006-raster/g250_06.zip
# Legends :: clc_legend.csv (Plain Text  3.88 KB)
# /data-and-maps/data/ga-downloads/SH04UZP80M/corine-land-cover-2006-raster/clc_legend.csv
#
wget -O ./input/corine/clc_g250_06.zip "http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2006-raster/clc-2006-v13-250m/g250_06.zip/at_download/file"

# unzip data
unzip -d ./input/corine/clc ./input/corine/clc_g250_06.zip
ls -l ./input/corine
ls -l ./input/corine/clc

#!/bin/bash
#
# Copenhagen June 2010, ost4sem, unidk2010
#
# Title: Envelope models
# Authors: Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang
# Description: Species distribution from envelope modeling
# Harmonize the projection and grid for the raster data (worldclim and corine)
# Use gdalinfo and gdalwarp for projection
#

#######################
# Home directory folder
home_dir="~/ost4sem/project/unidk2010/envmod"
cd ~/ost4sem/project/unidk2010/envmod

################
# GDAL, gdalinfo
#gdalinfo $home_dir"/input/worldclim/current/wc_5m_10km/alt.bil"
gdalinfo "./input/worldclim/current/wc_5m/alt.bil"
gdalinfo "./input/worldclim/current/wc_5m/bio1.bil"
gdalinfo -stats "./input/worldclim/current/wc_5m_10km_esri/alt/alt"

###############################################
# Issues of importing the wc bil grids in GRASS
#
# ESRI software assumes that the data files (.bil) do not have negative values.
# These values (x) are replaced by (65535 + x); E.g., -10 becomes 65525
# Also the nodata value of -9999 is not recognized. (becomes 65535-999=55537)
# gdal, used by GRASS GIS to import the data, seems to make the same assumption
# 16-bit (2 byte) integer, unsigned representation, values between 0 and 65535
# Signed representation however, possible values range from −32768 to 32767
# Worldclim data is signed, but ESRI and gdal assume grid values are unsigned
# You need to do is to reclass all values > 23767 as value-(66536)
# Eg: > r.mapcalc "tmin_corrected=if(tmin>23767,tmin-66536,tmin)"
# http://pvanb.wordpress.com/2010/02/06/importing-worldclim-climate-bil-datalayers-in-grass-gis/
#
# You can convert from bil to a different format like geotiff
#
# To import Worldclim data, the following line has to be added to each .hdr file:
# PIXELTYPE SIGNEDINT
#
echo -e "PIXELTYPE     SIGNEDINT\r" > ./temp/pixeltype.txt
for hdr_file in `ls ./input/worldclim/current/wc_5m | grep .hdr`; do
 hdr_pt=${hdr_file%.*}"_pt.hdr" ## cat two files together as concatenate cannot output to the same file
 echo "hdr_file :: "$hdr_file" :: hdr_pt :: "$hdr_pt;
 cat ./temp/pixeltype.txt ./input/worldclim/current/wc_5m/$hdr_file > ./input/worldclim/current/wc_5m/$hdr_pt;
 mv ./input/worldclim/current/wc_5m/$hdr_pt ./input/worldclim/current/wc_5m/$hdr_file
# head -3 ./input/worldclim/current/wc_5m/$hdr_file;
done

###########################################
#change datum/projection of corine landcover to WGS84
gdalwarp -t_srs EPSG:4326 g250_06.tif $home/input/corine/g250_06_WGS84.tif
cat clc_legend_qgis.txt #legends

#compress tif-file
gdal_translate -co "COMPRESS=LZW" g250_06_WGS84.tif temp.tif
mv temp.tif g250_06_WGS84.tif

cd ~/ost4sem/project/unidk2010/envmod/
#create Grass GIS DB, Location and Mapset
mkdir Grassdb
cd ~/ost4sem/project/unidk2010/envmod/Grassdb
ln -s  /home/user/ost4sem/project/unidk2010/envmod/input/corine/g250_06_WGS84.tif .

# use script to create a grass location for the project
bash ~/ost4sem/project/unidk2010/envmod/sh_script/create_location.sh g250_06_WGS84.tif europe ~/ost4sem/project/unidk2010/envmod/Grassdb

#start working in GRASS GIS
grass -text /home/user/ost4sem/project/unidk2010/envmod/Grassdb/europe/PERMANENT
g.mapset -c mapset=env_mod_mapset
g.copy rast=g250_06_WGS84@PERMANENT,g250_06_WGS84

#set region
g.region n=72.8 s=33.55 w=-25.1 e=45.9 res=0:05 save=eu --overwrite

#import map from folder
r.in.gdal -o input=~/ost4sem/project/unidk2010/envmod/input/worldclim/current/wc_5m/tmean7.bil output=tmeanJuly --o
#cut extent of map to europe
r.mapcalc tmeanJulyEU = tmeanJuly

#put it in a loop to load all months
for month in *.bil; do ## where the extension is .bil
r.in.gdal -o input=$month output=`basename $month .bil` --o ## open the file, using basename to lose the extension for the grass file
r.mapcalc `basename $month .bil`"EU" = `basename $month .bil` ## cut the map to europe
done

## cut corine data (and give it a better name)
r.mapcalc corineEU = g250_06_WGS84

Due to time restrictions (one week student project) on downloading and processing the large 1 km (30 sec) environment grids from Worldclim, we received another fine resolution (1 km) environment dataset from Stefano Casalegno. This dataset includes processed data from Worldclim, as well as data environment data from other sources. A description of the many different layers are included below as a PNG image.

Predictor variables

Predictor variables

 
#!/bin/bash
#
# Copenhagen June 2010, ost4sem, unidk2010
#
# Title: Envelope models
# Authors: Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang
# Description: Species distribution from envelope modeling
# Use gdalinfo and gdalwarp for projection
#

#######################
# Home directory folder
home_dir="~/ost4sem/project/unidk2010/envmod"
cd ~/ost4sem/project/unidk2010/envmod

###########################################
# use script to create a grass location for the project
inpr="/home/user/ost4sem/project/unidk2010/envmod/input"

# create a to LAEA transformed GEOtiff in the Grassdb folder
gdalwarp   -t_srs EPSG:3035 -s_srs EPSG:4326  $inpr"/pr101europe".tif  ~/ost4sem/project/unidk2010/envmod/Grassdb/pr101europeLAEA.tif

# go to Grassdb folder
cd ~/ost4sem/project/unidk2010/envmod/Grassdb/

# make a new location called euLAEA in the Grassdb from a GEOtiff
# it is important to migrate to the Grassdb folder since the create_location script expects you to and the tiff to be there
bash ~/ost4sem/project/unidk2010/envmod/sh_script/create_location.sh pr101europeLAEA.tif euLAEA ~/ost4sem/project/unidk2010/envmod/Grassdb

# start working in GRASS GIS
grass -text /home/user/ost4sem/project/unidk2010/envmod/Grassdb/euLAEA/PERMANENT
g.mapset -c mapset=env_mod_mapset
# after creating the new mapset it is possible to start GRASS with
# grass -text /home/user/ost4sem/project/unidk2010/envmod/Grassdb/euLAEA/env_mod_mapset

# set new region
g.region n=72.8 s=33.55 w=-25.1 e=45.9 res=0:05 save=eu --overwrite

##################################
# import predictor layers to grass
# http://www.spatial-ecology.net/doku.php?id=wiki:itaveg_respred_table_sh
inpr="/home/user/ost4sem/project/unidk2010/envmod/input"

for  (( dir=1 ; dir<=9 ; dir ++ ))  ; do
dir=1
    gdalwarp   -t_srs EPSG:3035 -s_srs EPSG:3035  $inpr"/PRESENT/pr10"$dir"/hdr.adf" $inpr"/pr10"$dir"europe".tif
    #rm -r $inpr"/PRESENT/pr10"$dir
    r.in.gdal -o -e input=$inpr"/pr10"$dir"europe".tif  output=pr10$dir"europe"  --overwrite
    #rm  $inpr"/pr10"$dir"europe".tif
done

for  (( dir=10 ; dir<=39 ; dir ++ ))  ; do
    gdalwarp   -t_srs EPSG:3035 -s_srs EPSG:3035  $inpr"/PRESENT/pr1"$dir"/hdr.adf" $inpr"/pr1"$dir"europe".tif
    #rm -r $inpr"/PRESENT/pr1"$dir
    r.in.gdal -o -e input=$inpr"/pr1"$dir"europe".tif  output=pr1$dir"europe"  --overwrite
#    r.in.gdal -o input=$inpr"pr1"$dir"europe".tif dir"europe"  --overwrite
done

for  dir in  198 199 200 201 202 203 204 205 250 254 255 257 258 259 262 264 265 292 293 294  ; do
    gdalwarp   -t_srs EPSG:4326 -s_srs EPSG:3035  $inpr"/PRESENT/pr"$dir"/hdr.adf" $inpr"/pr"$dir"europe".tif
    #rm -r $inpr"/PRESENT/pr"$dir
    r.in.gdal -o -e input=$inpr"/pr"$dir"europe".tif  output=pr$dir"europe"  --overwrite
    #r.in.gdal -o input=$inpr"/physic_pr/pr"$dir"/hdr.adf"  output=pr$dir"europe"  --overwrite
done

##################################
# import response layers (selected species) to grass
# we have csv files with coordinates of localities where the species were found

species_list="vanessa+atalanta humulus+lupulus scolitantides+orion sedum+acre crambe+maritima"
inpr="/home/user/ost4sem/project/unidk2010/envmod/input"

# when you want to import csv files into GRASS you have to create a vrt file for each csv file that defines the data structure of the csv file

for spec in $species_list; do
 vrt_file=$inpr/species/$spec.vrt
 echo "<OGRVRTDataSource>" > $vrt_file
 echo -e "\t<OGRVRTLayer name=\"$spec\">" >> $vrt_file
 echo -e "\t\t<SrcDataSource>$spec.csv</SrcDataSource>" >> $vrt_file
 echo -e "\t\t<GeometryType>wkbPoint</GeometryType>" >> $vrt_file
 echo -e "\t\t<LayerSRS>WGS84</LayerSRS>" >> $vrt_file
 echo -e "\t\t<GeometryField encoding=\"PointFromColumns\" x=\"long\" y=\"lat\"/>" >> $vrt_file
 echo -e "\t</OGRVRTLayer>" >> $vrt_file
 echo -e "</OGRVRTDataSource>" >> $vrt_file
 cat $vrt_file
done

for spec in `echo $species_list` ; do
  echo $spec
  rm -r $inpr"/species/"$spec
  cp $inpr/species/$spec.txt $inpr"/species/"$spec.csv
  ogr2ogr  -s_srs EPSG:4326 -t_srs EPSG:3035 $inpr"/species/"$spec   $inpr"/species/"$spec".csv"
  ogr2ogr  -overwrite -s_srs EPSG:4326 -t_srs EPSG:3035 $inpr"/species/"$spec   $inpr"/species/"$spec".vrt"
done

#########
# GRASS #
#########

# start GRASS GIS
grass -text /home/user/ost4sem/project/unidk2010/envmod/Grassdb/euLAEA/env_mod_mapset/

# export $species_list
species_list="vanessa+atalanta humulus+lupulus scolitantides+orion sedum+acre crambe+maritima"
inpr="/home/user/ost4sem/project/unidk2010/envmod/input"

# import species in GRASS
for spec in $species_list; do
  specname=`echo $spec | tr "+" "_"`
  v.in.ogr dsn=$inpr/species/$spec output=$specname --overwrite
done

# as alternative use v.in.ascii
# for spec in $species_list; do
#  specname=`echo $spec | tr "+" "_"`
#  v.in.ascii input=$inpr/species/"$spec".txt output=$specname fs=, skip=1 x=2 y=3 --o
# done

#v.info crambe_maritima

for spec in $species_list; do
  specname=`echo $spec | tr "+" "_"`
  #v.db.select $specname #print attribute table (if you want to see before you delete - in the next line)
  v.db.droptable map=$specname -f
  v.db.addtable map=$specname  columns="pr101 double,pr102 double,pr103 double,pr104 double,pr105 double,pr106 double,pr107 double,pr108 double,pr109 double,pr110 double,pr111 double,pr112 int,pr113 int,pr114 int,pr115 double,pr116 int,pr117 int,pr118 int,pr119 int,pr120 double,pr121 double,pr122 int,pr123 double,pr124 double,pr125 double,pr126 double,pr127 double,pr128 double,pr129 double,pr130 double,pr131 double,pr132 int,pr133 int,pr134 double,pr135 double,pr136 double,pr137 double,pr138 double,pr139 double,pr198 double,pr199 double,pr200 int,pr201 int,pr202 int,pr203 double,pr204 double,pr205 int,pr250 int,pr254 int,pr255 int,pr257 int,pr258 int,pr259 int,pr262 int,pr264 int,pr265 int,pr292 int,pr293 double,pr294 int,xcoor double,ycoor double"
done

# Print column types and names of table linked to vector map
# v.db.connect -c map=vanessa_atalanta

# Print database connection
# v.db.connect -p map=vanessa_atalanta

# create a raster_list.txt file (space separated values) including a list of raster
# file names we want to sample
# once you create the raster_list.txt, read the contents of raster_list.txt files with the "cat" command
#list=`cat /data/ost4sem/studycase/ITA_veg/grassdata/SPATIALDATA/raster_list.txt`
list="pr101europe  pr111europe  pr121europe  pr131europe  pr199europe  pr257europe pr102europe  pr112europe  pr122europe  pr132europe  pr200europe  pr258europe pr103europe  pr113europe  pr123europe  pr133europe  pr201europe  pr259europe pr104europe  pr114europe  pr124europe  pr134europe  pr202europe  pr262europe pr105europe  pr115europe  pr125europe  pr135europe  pr203europe  pr264europe pr106europe  pr116europe  pr126europe  pr136europe  pr204europe  pr265europe pr107europe  pr117europe  pr127europe  pr137europe  pr205europe  pr292europe pr108europe  pr118europe  pr128europe  pr138europe  pr250europe  pr293europe pr109europe  pr119europe  pr129europe  pr139europe  pr254europe  pr294europe pr110europe  pr120europe  pr130europe  pr198europe  pr255europe"
echo $list > $inpr/raster_list.txt

# Uploads predictor raster values at positions of vector points to the table
# In our case we sample for all raster files of the list
# their values in correspondence of the x y points locations corresponding to
# the vector point list for the species occurrence points downloaded from GBIF.
# NB! This step takes LONG time (approx 60 min++)
#
for spec in $species_list; do
  specname=`echo $spec | tr "+" "_"`
  for a in $list ; do
    a=${a/europe/}
    v.what.rast vector=$specname raster=$a"europe" column=$a
  done
done

# Create new vector (points) from database table containing coordinates.
# v.to.db - Load values from vector to database. In uploaded/printed category values '-1' is used for 'no category' and 'null'/'-' if category cannot be found or multiple categories were found.
for spec in $species_list; do
  specname=`echo $spec | tr "+" "_"`
  echo $specname
  #v.to.db -p map=$specname type=point layer=1 option=coor column=xcoor,ycoor # -p print only
  v.to.db  map=$specname type=point layer=1 option=coor column=xcoor,ycoor
done

###################################################
# We need pseudo-absences for the model calibration
# Here we generate 2 000 random points
# The species classification is as present and absent
# The random forest model calibration need examples of both classes
# Pseudo-absences is a substitute for missing absence points
# Many species occurrence datasets include ONLY presence points
# In other words: the places the species was NOT observed, is not recorded
#
#

g.copy input=pr101europe output=land
r.null map=land setnull=0
r.random input=land  n=2000 vector_output=random2000  --overwrite
d.mon start=x0
d.rast land
d.vect random2000

# clean attribute table
v.db.droptable map=random2000 -f

# populate attribute table, give column names
v.db.addtable map=random2000  columns="pr101 double,pr102 double,pr103 double,pr104 double,pr105 double,pr106 double,pr107 double,pr108 double,pr109 double,pr110 double,pr111 double,pr112 int,pr113 int,pr114 int,pr115 double,pr116 int,pr117 int,pr118 int,pr119 int,pr120 double,pr121 double,pr122 int,pr123 double,pr124 double,pr125 double,pr126 double,pr127 double,pr128 double,pr129 double,pr130 double,pr131 double,pr132 int,pr133 int,pr134 double,pr135 double,pr136 double,pr137 double,pr138 double,pr139 double,pr198 double,pr199 double,pr200 int,pr201 int,pr202 int,pr203 double,pr204 double,pr205 int,pr250 int,pr254 int,pr255 int,pr257 int,pr258 int,pr259 int,pr262 int,pr264 int,pr265 int,pr292 int,pr293 double,pr294 int,xcoor double,ycoor double"
#fill columns
for a in $list ; do
  a=${a/europe/}
  v.what.rast vector=random2000 raster=$a"europe" column=$a
done
#add coordinates to table
v.to.db  map=random2000 type=point layer=1 option=coor column=xcoor,ycoor	

########
# RESULT
# /home/user/ost4sem/project/unidk2010/envmod/Grassdb/euLAEA/env_mod_mapset/dbf
# Next step is top open the dbf file with R

Following the study case from the course wiki and the original scripts made by Stefano Casalegno and Guiseppe Amatulli.
See: http://www.spatial-ecology.net/doku.php?id=wiki:forestmod#computational_implementation

#!/bin/bash
#
# Copenhagen June 2010, ost4sem, unidk2010
#
# Title: Envelope models
# Authors: Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang
# Description: Species distribution from envelope modeling
# Table data from GRASS --> R
# Calibrate SDM models in R
# Deploy SDM models to calculate prediction of species distribution in Sweden
#

#######################
# Home directory folder
home_dir="~/ost4sem/project/unidk2010/envmod"
cd ~/ost4sem/project/unidk2010/envmod

#clean up workspace
rm(list=ls())

# load libraries
library(gdata) # used to drep.levels function
library(foreign) # used to read.dbf function
library(randomForest) # Modelling function
library(mda) # confusion
# library (vcd) # getting Kappa (if using Kappa for validation)

#define working directories
home_dir <- "/home/user/ost4sem/project/unidk2010/envmod"
dbf_dir <- "/home/user/ost4sem/project/unidk2010/envmod/Grassdb/euLAEA/env_mod_mapset/dbf"

#create output/R folder
system(paste("mkdir ",home_dir,"/output/R", sep=""))
setwd(paste(home_dir,"/output/R",sep=""))

species_list <- c("vanessa_atalanta", "crambe_maritima", "humulus_lupulus", "scolitantides_orion", "sedum_acre")

# load predictors DataFrame and clean it from the NA values
# define predictor types (numeric factors if needed
vanessa_atalanta <- read.dbf(paste(dbf_dir,"/vanessa_atalanta.dbf",sep=""))
crambe_maritima <- read.dbf(paste(dbf_dir,"/crambe_maritima.dbf",sep=""))
humulus_lupulus <- read.dbf(paste(dbf_dir,"/humulus_lupulus.dbf",sep=""))
scolitantides_orion <- read.dbf(paste(dbf_dir,"/scolitantides_orion.dbf",sep=""))
sedum_acre <- read.dbf(paste(dbf_dir,"/sedum_acre.dbf",sep=""))

write.table(vanessa_atalanta, file="vanessa_atalanta.csv", row.names=TRUE, sep=",")
write.table(crambe_maritima, file="crambe_maritima.csv", row.names=TRUE, sep=",")
write.table(humulus_lupulus, file="humulus_lupulus.csv", row.names=TRUE, sep=",")
write.table(scolitantides_orion, file="scolitantides_orion.csv", row.names=TRUE, sep=",")
write.table(sedum_acre, file="sedum_acre.csv", row.names=TRUE, sep=",")

#remove rows with NA
# some occurrences are outside our region (Europe)
vanessa_atalanta = na.omit(vanessa_atalanta)
crambe_maritima = na.omit(crambe_maritima)
humulus_lupulus = na.omit(humulus_lupulus)
scolitantides_orion = na.omit(scolitantides_orion)
sedum_acre = na.omit(sedum_acre)

# define the factor variables (classes)
vanessa_atalanta$pr205 = factor(vanessa_atalanta$pr205, ordered = FALSE)
vanessa_atalanta$pr250 = factor(vanessa_atalanta$pr250, ordered = FALSE)
vanessa_atalanta$pr254 = factor(vanessa_atalanta$pr254, ordered = TRUE)
vanessa_atalanta$pr255 = factor(vanessa_atalanta$pr255, ordered = TRUE)
vanessa_atalanta$pr257 = factor(vanessa_atalanta$pr257, ordered = TRUE)
vanessa_atalanta$pr258 = factor(vanessa_atalanta$pr258, ordered = TRUE)
vanessa_atalanta$pr259 = factor(vanessa_atalanta$pr259, ordered = FALSE)
vanessa_atalanta$pr262 = factor(vanessa_atalanta$pr262, ordered = FALSE)
vanessa_atalanta$pr264 = factor(vanessa_atalanta$pr264, ordered = FALSE)
vanessa_atalanta$pr265 = factor(vanessa_atalanta$pr265, ordered = FALSE)
vanessa_atalanta$pr292 = factor(vanessa_atalanta$pr292, ordered = FALSE)
#drop unused levels
vanessa_atalanta = drop.levels(vanessa_atalanta)

# define factor variables save way for other species ...
crambe_maritima$pr205 = factor(crambe_maritima$pr205, ordered = FALSE)
crambe_maritima$pr250 = factor(crambe_maritima$pr250, ordered = FALSE)
crambe_maritima$pr254 = factor(crambe_maritima$pr254, ordered = TRUE)
crambe_maritima$pr255 = factor(crambe_maritima$pr255, ordered = TRUE)
crambe_maritima$pr257 = factor(crambe_maritima$pr257, ordered = TRUE)
crambe_maritima$pr258 = factor(crambe_maritima$pr258, ordered = TRUE)
crambe_maritima$pr259 = factor(crambe_maritima$pr259, ordered = FALSE)
crambe_maritima$pr262 = factor(crambe_maritima$pr262, ordered = FALSE)
crambe_maritima$pr264 = factor(crambe_maritima$pr264, ordered = FALSE)
crambe_maritima$pr265 = factor(crambe_maritima$pr265, ordered = FALSE)
crambe_maritima$pr292 = factor(crambe_maritima$pr292, ordered = FALSE)
#drop unused levels
crambe_maritima = drop.levels(crambe_maritima)

#
# the pseudo-absence points
random2000 <- read.dbf(paste(dbf_dir,"/random2000.dbf",sep=""))
write.table(random2000, file="random2000.csv", row.names=TRUE, sep=",")
#
# define factor variables
random2000$pr205 = factor(random2000$pr205, ordered = FALSE)
random2000$pr250 = factor(random2000$pr250, ordered = FALSE)
random2000$pr254 = factor(random2000$pr254, ordered = TRUE)
random2000$pr255 = factor(random2000$pr255, ordered = TRUE)
random2000$pr257 = factor(random2000$pr257, ordered = TRUE)
random2000$pr258 = factor(random2000$pr258, ordered = TRUE)
random2000$pr259 = factor(random2000$pr259, ordered = FALSE)
random2000$pr262 = factor(random2000$pr262, ordered = FALSE)
random2000$pr264 = factor(random2000$pr264, ordered = FALSE)
random2000$pr265 = factor(random2000$pr265, ordered = FALSE)
random2000$pr292 = factor(random2000$pr292, ordered = FALSE)
#drop unused levels
random2000 = drop.levels(random2000)

#
#save all vars in workspace
save.image(file="workspace.RData")
# load("~/ost4sem/studycase/ITA_veg/R_script/results/RF47")

# set the presence 1 and absence as 0
vanessa_atalanta$pa <- 1
random2000$pa <- 0

# combine species occurrences with the random pseudo absences
pred <- rbind(vanessa_atalanta, random2000)
pred$pa=as.factor(pred$pa)

# subset to 47 predictor variables
# to make the model calibration fasetr for optimizing operations
pred47 = subset(pred, select = c(pa, pr101,pr102,pr103,pr104,pr105,pr106,pr107,pr108,pr109,pr110,pr111,pr112,pr113,pr114,pr115,pr116,pr117,pr118,pr119,pr120,pr121,pr122,pr123,pr124,pr125,pr126,pr127,pr128,pr129,pr131,pr133,pr134,pr135,pr136,pr137,pr138,pr139,pr198,pr199,pr200,pr201,pr202,pr203,pr204,pr205,pr259,pr262))

# remove records with NA values
pred = na.omit(pred)
pred = drop.levels(pred)
pred47 = na.omit(pred47)
pred47 = drop.levels(pred47)

###########################
### MODEL BUILDING ########
###########################

## run a randomForest model using 47 predictors
RF47 = randomForest (pred47$pa ~ pr101 + pr102 + pr103 + pr104 + pr105 + pr106 + pr107 + pr108 + pr109 + pr110 + pr111 + pr112 + pr113 + pr114 + pr115 + pr116 + pr117 + pr118 + pr119 + pr120 + pr121 + pr122 + pr123 + pr124 + pr125 + pr126 + pr127 + pr128 + pr129 + pr131 + pr133 + pr134 + pr135 + pr136 + pr137 + pr138 + pr139 + pr198 + pr199 + pr200 + pr201 + pr202 + pr203 + pr204 + pr205 + pr259 + pr262, data = pred47, ntree=1500 ) 

save(RF47,file="RF47")
# load("RF47")
RF47$confusion

## run a randomForest model using 11 predictors (the most important variables, GINI index)
pred11 = subset(pred47, select = c(pa,pr109,pr139,pr136,pr138,pr128,pr111,pr127,pr104,pr123,pr126,pr101))
RF11 = randomForest (pred11$pa ~ pr109+pr139+pr136+pr138+pr128+pr111+pr127+pr104+pr123+pr126+pr101, data = pred11, ntree=1500 )
save(RF11,file="RF11")
# load("RF11")
RF11$confusion
#!/bin/bash
#
# Copenhagen June 2010, ost4sem, unidk2010
#
# Title: Envelope models
# Authors: Alistair Auffret, Dag Endresen, Torben Wittwer, Zhenlin Yang
# Description: Species distribution from envelope modeling
# Table data from GRASS --> R
# Calibrate SDM models in R
# Deploy SDM models to calculate prediction of species distribution in Sweden
#

library(rgdal)
library(maptools)
library(grid)
library(randomForest) 

print(" ------------- NOW CREATE predictors grid stack  -------------")
home_dir <- "/home/user/ost4sem/project/unidk2010/envmod"
# set the output/R folder
setwd(paste(home_dir,"/output/R",sep=""))
### Imoprt model RF
load("RF11")

gc() # garbage collect to clean internal memory
path.rasters = as.character("/home/user/ost4sem/project/unidk2010/envmod/input")
path.results = as.character("/home/user/ost4sem/project/unidk2010/envmod/output/map/")

# tile = c(20*0:114)
# for(t in 1:57){

tile = c(20*0:3000)

for(t in 75:125) {
  datagrids = readGDAL(paste(path.rasters,"/pr101europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))
  datagrids$pr101 = datagrids$band1
  datagrids$pr123 = readGDAL(paste(path.rasters,"/pr123europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  gc()
  datagrids$pr104 = readGDAL(paste(path.rasters,"/pr104europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  datagrids$pr127 = readGDAL(paste(path.rasters,"/pr127europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  datagrids$pr111 = readGDAL(paste(path.rasters,"/pr111europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  gc()
  datagrids$pr109 = readGDAL(paste(path.rasters,"/pr109europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  datagrids$pr139 = readGDAL(paste(path.rasters,"/pr139europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  datagrids$pr136 = readGDAL(paste(path.rasters,"/pr136europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  gc()
  datagrids$pr138 = readGDAL(paste(path.rasters,"/pr138europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  datagrids$pr128 = readGDAL(paste(path.rasters,"/pr128europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  datagrids$pr126 = readGDAL(paste(path.rasters,"/pr126europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  gc()
  # datagrids$pr259 = readGDAL(paste(path.rasters,"/pr259europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))$band1
  # datagrids$pr259 = factor(datagrids$pr259, ordered=FALSE)

  cat(" ---- 12 grids inported into datagrids file - now drawing the map------\n")

  ## predict function is used to calculate pixel values and index accordind to the random forest model and original rasters inported
  pred.mapRF = predict (RF11 , newdata=datagrids, type="response")
  cat("---- class predicted ----\n")

  pred.mapRFindex = predict (RF11 , newdata=datagrids, type="vote")
  cat("---- class index recorded ----\n")

  cat("---- map values for Forest Type calculated and added to predict map vector ----\n")
  ## NOW create an empty tile with geographical coordinates to pixel values = 20
  data2map = readGDAL(paste(path.rasters,"/pr104europe.tif",sep=""), offset=c(0,tile[t]), region.dim=c(2100,20))
  cat("---- geocoordinates inported ----\n")

  data2map$band1 = 20
  cat("---- no data value is now = 15 in this tile[t] ----\n")

  ## NOW adding predicted values to the empty tile[t]
  data2map$band1[as.numeric(dimnames(pred.mapRFindex)[[1]])] = as.numeric(pred.mapRF)
  cat("---- class values added to tile[t] ----\n")

  ## NOW creating an asciii TILE file for modelled species
  write.asciigrid(data2map, paste(path.results,t,"tile_vanessa_scan.asc",sep=""))

  cat("-----------------------------------------Tile", t,"created--------------------------\n")
  ## cleaning up the memory
  rm(data2map)
  rm(datagrids)
  rm(pred.mapRF)
  rm(pred.mapRFindex)
  gc()
}

system("cd /home/user/ost4sem/project/unidk2010/envmod/output/map/")
system("gdalwarp  -srcnodata \"-9999\"  -dstnodata \"-9999\" -of GTiff -ot Int16 -wt Int16     *.asc   vanessa_scand.tif")
#system("rm *tile_scand.asc")

RESULTS and DISCUSSION

Insert a map and related table or graphics if available
Discuss the biological or geographical significance of results
The most important predictors, randomForest model for Vanessa atatlanta
Distribution of Vanessa atalanta with randomForest
Distribution of Vanessa atalanta with MaxEnt
Screen-shot of the click and run MaxEnt Java application



upload.wikimedia.org_wikipedia_commons_thumb_9_98_history_of_west_australia-the_end.jpg_800px-history_of_west_australia-the_end.jpg 



 

The psychic octopus Paul has shown an impressive prediction rate for the football games these last weeks! An alternative spatial ecology modeling approach would thus be to fill small cells in a physical grid with octopus food and submerge the frame into the aquarium. Paul would then eat octopus food from the small grid cells where the species habitat suitability is the highest…
Work in progress… We do not yet have the first prediction grid from Paul, but expect that Paul will no less than revolutionize the spatial ecology modeling…



2 Comments leave one →
  1. October 26, 2010 13:26

    BREAKING NEWS:
    A tragedy for modern species distribution modeling: Paul the Octopus (January 2008 – 26 October 2010) is dead, http://en.wikipedia.org/wiki/Paul_the_Octopus

Trackbacks

  1. GBIF occurrence data access and download « LifeBITS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: