
Persistent identifiers

July 13, 2016

The Norwegian GBIF node has chosen to use UUIDs (universally unique identifiers) prefixed by a PURL as the preferred Darwin Core occurrenceID format. The PURL prefix makes the PID resolvable (using the linked-open-data-friendly HTTP protocol). Resolvable means that a data user (or a machine) encountering the PID can look up useful information about the thing that is identified. The PURL is redirected to a resolver service operated by the Norwegian GBIF node.

As part of the data publishing process, the GBIF node monitors and scans all Norwegian datasets published in GBIF by Norwegian institutions, and establishes the resolver service for each herbarium specimen or observation. The UUIDs can easily be generated locally by the data publishing institutions (without any central coordination required), and the resolver service itself is established during data publication as a service from the Norwegian GBIF node.

Our resolver service provides by default an HTML information page and accepts HTTP content negotiation to deliver other, machine-readable formats such as JSON-LD, RDF, N3/Turtle, comma-separated values, or tab-delimited text. (You may also simply append an extension such as “.json” to the PID string to preview the result you would get from content negotiation.)
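As a sketch, the two equivalent ways to request JSON could look like this (the resolver base URL below is a placeholder, not the actual service address):

```shell
# Placeholder resolver address and an example occurrenceID from this blog.
base="http://example.org/resolver"
pid="${base}/660a49ac-0b24-11e6-81d1-0b38fd7a862e"

# 1) HTTP content negotiation via the Accept header:
request1="curl -L -H 'Accept: application/json' ${pid}"
# 2) or simply append an extension to the PID string:
request2="curl -L ${pid}.json"

echo "$request1"
echo "$request2"
```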

The PURL configuration could, for example, be updated to redirect to a resolver service at GBIF instead, should GBIF choose to establish one.

Why we chose UUIDs

The UUID can easily be generated locally by the herbarium or by the researcher producing or managing the dataset. Robust UUIDs can even be generated offline.

The UUID itself provides a globally unique identifier that does not depend on the PURL prefix. This will allow us to migrate more easily to other resolver solutions if that is required or requested in the future. In fact, we expose the UUID in URN format as the “pure” form of the identifier.
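For example, generating a fresh random UUID and its URN form takes one line on most systems (uuidgen ships with util-linux and Mac OS X; python3 works as a fallback):

```shell
# Generate a random (version 4) UUID locally - no central coordination needed.
uuid=$(uuidgen 2>/dev/null || python3 -c 'import uuid; print(uuid.uuid4())')
echo "$uuid"
echo "urn:uuid:${uuid}"   # the "pure" URN form of the identifier
```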


We encourage other data aggregation services building resolver services (and other services) to reuse the “pure” UUID form (without the PURL prefix).

Other Darwin Core identifier terms

We also use the very same PURL + UUID format for some of the other identifier terms in Darwin Core, most notably eventID and taxonID. Some of the first data publishing institutions in Norway have used the PURL + UUID format for eventID, and we are now expanding the resolver service to handle these PIDs as well. The same PURL + UUID format can, as it is, be used for any of the Darwin Core identifier terms. The main challenge is that the Darwin Core archive format is very denormalized, and the Darwin Core terms are not always unambiguously described as belonging to a particular class (type of thing, e.g. occurrence, organism, taxon, event, location). So it is not always easy to know exactly which attributes in a Darwin Core archive describe the identified thing.

Whenever possible, we encourage the use of external systems such as GeoNames or GRBio to describe things. We always recommend that the data publishing institution report the GeoNames PID as dwc:locationID, and the GRBio PID as dwc:institutionID or dwc:collectionID – and not generate any separate new UUIDs here!
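A hypothetical Darwin Core record fragment illustrating the recommendation (all identifier values below are made-up examples, not real registered PIDs):

```shell
# Hypothetical Darwin Core fragment: external PIDs are reused where they
# exist; a locally generated UUID is used only for the occurrence itself.
record='dwc:occurrenceID  = urn:uuid:660a49ac-0b24-11e6-81d1-0b38fd7a862e
dwc:locationID    = http://sws.geonames.org/3143244/
dwc:institutionID = http://grbio.org/institution/EXAMPLE'
echo "$record"
```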




Key challenges

  • Many of the original source datasets indexed by GBIF are regularly updated and re-indexed by the GBIF portal. Without stable and persistent identifiers, information on the same herbarium specimen (or species observation) is sometimes included more than once, leading to duplicated information – duplicated in the sense of more than one (unlinked) data record for the same Real World entity.
  • Without stable and persistent identifiers for herbarium specimens (and species observations), it is difficult to link the same data record across different re-indexing cycles of the GBIF portal. When a previously indexed data record is not re-identified in a new version of a given dataset, the record is deleted from the portal, and the link to previous versions of this data record is lost.
  • A composite identifier (called the Darwin Core triplet), based on a combination of the metadata attributes for institution code (dwc:institutionCode), collection code (dwc:collectionCode), and the local specimen identifier (dwc:catalogNumber), is often used as the specimen identifier in GBIF. However, all three metadata attributes can (and sometimes do) change.
  • What could a best practice guideline for identifier resolution look like? Is it useful to define and agree on a (set of) common and well-defined response formats? Is it useful to provide recommendations for a set of metadata profiles with a clearly defined set of metadata attributes? Or would more general principles and more open recommendations be more likely to stand the test of time and remain relevant as new information infrastructure technologies emerge?
  • There are challenges, pros and cons in reusing object identifiers and metadata attribute terms declared by others, without full control of how these objects and terms are maintained. Objects and concepts declared for one particular purpose will often not exactly match the needs of another. How do we optimally reuse each other's OWL ontologies, metadata vocabularies and data object models?
  • Should identifiers identify the Real World physical objects – the entities that the collection curators and users of the information care about – or should identifiers be assigned to database records? Real World entities do not have a signature byte sequence, and deciding when an object is considered to be the same thing will rely on interpretation.
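The fragility of the Darwin Core triplet can be sketched like this (the codes below are made-up examples):

```shell
# The "Darwin Core triplet" concatenates three mutable metadata attributes:
institutionCode="NHMO"; collectionCode="V"; catalogNumber="123456"
triplet="urn:catalog:${institutionCode}:${collectionCode}:${catalogNumber}"
echo "$triplet"   # changes whenever any of the three attributes change

# A UUID-based occurrenceID carries no such metadata and survives renames:
echo "urn:uuid:660a49ac-0b24-11e6-81d1-0b38fd7a862e"
```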

European GBIF nodes meeting bioblitz

April 29, 2016

The European GBIF nodes 2016 meeting was organized in Lisbon. At the excursion after the EU Nodes meeting, we ran a small bioblitz as a project in iNaturalist.

At the bioblitz, Wouter Koch added a picture of some butterfly larvae on Meco beach (geonames:10342405) – and was asked if he knew the name of the host plant the larvae were feeding on.

Butterfly larva reported by Wouter Koch:
dwc:occurrenceID = urn:uuid:660a49ac-0b24-11e6-81d1-0b38fd7a862e


Large white butterfly (Pieris brassicae) observed by Wouter Koch, 22 April 2016, CC-BY.

I added my picture of the same larvae, and of the host plant, and asked the iNaturalist community for help with the host plant species name.

My observation of the same butterfly larva:
dwc:occurrenceID = urn:uuid:4e6241a0-0b20-11e6-a837-0800200c9a66

My observation of the host plant:
dwc:occurrenceID = urn:uuid:57103a50-0bbb-11e6-a837-0800200c9a66


European searocket (Cakile maritima) observed by Dag Endresen, 22 April 2016, CC-BY.

Wouter and I have tried to create and use the same dwc:eventID and dwc:organismID to tie all three observations together, as well as annotating the species interaction between the host plant and the butterfly larvae (using iNaturalist terms).

Same dwc:eventID for butterfly and host plant:

Butterfly dwc:organismID = urn:uuid:3ac2f3a0-0b21-11e6-a837-0800200c9a66
Host plant dwc:organismID = urn:uuid:57103a50-0bbb-11e6-a837-0800200c9a66

It would be cool if we could harvest such iNaturalist observation annotations into the data records presented by the GBIF portal. My preferred solution would be for the GBIF portal to aggregate all information known about events identified by their dwc:eventID across any dataset published in GBIF, and likewise aggregate any known information about an organism identified by a dwc:organismID.

I have also explored linking some of my iNaturalist observations from the excursion to the same picture on Flickr, and machine-tagging the image directly in Flickr.


Downloading occurrence data using the GBIF REST API

September 11, 2015


Several R packages (such as rgbif and dismo) facilitate download of GBIF-mediated occurrence records. However, these download GBIF data by paging through the results (maximum 200 000 records at a time), which can take a very long time for species with many occurrence records.


The asynchronous download, as described on the GBIF portal API documentation page, provides a much faster and much more reliable download option for large sets of records. Here, Markus describes an approach where you write your filter condition in a JSON file and issue a curl request to post the download request to the GBIF servers.
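That two-step approach can be sketched roughly like this (credentials are placeholders; the endpoint shown is the GBIF occurrence download API at the time of writing):

```shell
# Write the filter condition (a download "predicate") to a JSON file ...
cat > query.json <<'EOF'
{"creator": "_YOUR_GBIF_USER_NAME_",
 "notification_address": ["_YOUR_EMAIL_"],
 "predicate": {"type": "equals", "key": "TAXON_KEY", "value": "5289784"}}
EOF

# ... then POST it to the download API (command shown without executing):
echo "curl --user _YOUR_GBIF_USER_NAME_:_YOUR_GBIF_PASSWORD_ \
  -H 'Content-Type: application/json' -X POST -d @query.json \
  http://api.gbif.org/v1/occurrence/download/request"
```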

I have followed these instructions using bash, the Unix command line shell, on my MacBook. I assume it is possible to do something similar in MS DOS, but I would much rather suggest installing a bash prompt, or using another programming environment such as Python, PHP, Perl or Ruby, if perhaps you are stuck on a Windows computer 😉

I noticed that keeping the query filter condition in a separate JSON file (as described in the GBIF API documentation) did not allow me to easily loop through a long list of species names to issue separate asynchronous download requests. So instead, I wanted to make a bash function where I could give only the species name (actually the speciesKey, or in fact the taxonKey) as the input parameter (I used R to find the speciesKey).

I believe the genusKey will work just as well in the script. You should be able to give familyKey, genusKey, speciesKey etc. as the input taxonKey – but I did not actually test this.

# Provide your GBIF API user name, password and your email
# (replace the placeholders in the code below).
# Copy the function to memory and paste into a bash command line prompt.
function gbifapi {
  curl -i --user _YOUR_GBIF_USER_NAME_:_YOUR_GBIF_PASSWORD_ \
    -H "Content-Type: application/json" -H "Accept: application/json" \
    -X POST -d "{\"creator\":\"_YOUR_GBIF_USER_NAME_\", \"notification_address\": [\"_YOUR_EMAIL_\"], \"predicate\": {\"type\":\"and\", \"predicates\": [{\"type\":\"equals\",\"key\":\"HAS_COORDINATE\",\"value\":\"true\"}, {\"type\":\"equals\", \"key\":\"TAXON_KEY\", \"value\":\"$1\"}] }}" \
    http://api.gbif.org/v1/occurrence/download/request \
    >> log_gbifapi.txt
  echo -e "\r\n$1 $2\r\n\r\n----------------\r\n\r\n" >> log_gbifapi.txt
}

You will notice from the code that I log all the download requests in a log-file (log_gbifapi.txt) in the current directory of the bash command line prompt.

To call the bash function, I created a list in a spreadsheet with the name of the function (gbifapi) in the first column, the respective speciesKey values (found using R) in the second column, and the species name in the third column (for no other reason than providing a human-readable label for myself).

# Copy to memory and paste into a bash command line prompt
# (or run as a script)
gbifapi 4140730 "Aciachne acicularis"
gbifapi 4140704 "Aciachne flagellifera"
gbifapi 5289784 "Aegilops comosa"
gbifapi 4138203 "Aegilops mutica"
# ... etc

I noticed that pasting the full list of 300 species API calls into my bash command line prompt caused some kind of time-out error. So I split the list into segments of some 30 species at a time, and allowed these download requests to be placed at the GBIF server before submitting the next segment. When this was done, all my species download requests were queued at the GBIF servers for asynchronous download. Some species completed in a few minutes, and those with more numerous occurrences could take up to an hour. After one day, all 300 species download requests were completed.

All completed download files are listed on your user profile page at the GBIF portal, and you may simply pick them up there.

The log file log_gbifapi.txt captures the response from the GBIF API for each respective species download request. This response includes the downloadKey for each data file – however, not as a clean attribute parameter, so some regular-expression text cleaning is needed. I did not yet complete making this step into a script…

With the cleaned list of download keys, you could issue a set of e.g. wget commands to collect your download files from the GBIF server.
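A rough sketch of that clean-up step (the log line below is a fabricated example; download keys follow the pattern seven digits, a dash, fifteen digits, e.g. 0001234-160118175350007):

```shell
# Fabricated sample of an API response line as captured in the log file:
printf 'occurrence/download/request/0001234-160118175350007\n' > sample_log.txt

# Extract anything that looks like a downloadKey (7 digits, dash, 15 digits):
grep -Eo '[0-9]{7}-[0-9]{15}' sample_log.txt | sort -u > download_keys.txt

# Turn each key into a wget command to fetch the finished archive:
while read -r key; do
  echo "wget http://api.gbif.org/v1/occurrence/download/request/${key}.zip"
done < download_keys.txt
```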


Scott Chamberlain, Karthik Ram, Vijay Barve and Dan McGlinn (2015). rgbif: Interface to the Global Biodiversity Information Facility ‘API’.

Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith (2015). dismo: Species Distribution Modeling.

REcology (2012). GBIF biodiversity data from R, more functions.

Natural history citizen science crowdsourcing

June 1, 2014

Are you interested in natural history? Help us to capture label information from images of specimens from the Norwegian natural history collections in Oslo.

New crowdsourcing portal for natural history collections! Help us to record information on museum specimens from the collections of the Natural History Museum at the University of Oslo! The new transcription portal, developed by GBIF-Norway, is launched today at the 200-year jubilee party for the NHM-UiO botanical garden. The presentation of the portal takes place at 17:30 in the auditorium of Lids hus (Botanical museum) at the Tøyen campus and Botanical garden. Christian Svindseth (GBIF-Norway, NHM-UiO) developed the computer code for the new portal. Visit the portal (Figure 1).


Figure 1: The Natural History Museum at the University of Oslo (NHM-UiO) and GBIF-Norway present a new citizen science platform to capture label information from photographs of specimens from natural history collections in Norway.

Digitization of natural history collections
The collections at the Natural History Museum in Oslo include an estimated total of more than 6 million specimens (Mehlum et al., 2011), and are estimated to hold more than 65% of the specimens in natural history museums in Norway. The digitization of the Norwegian natural history collections has high priority and has reached a level of more than 50% of the specimens recorded in an electronic database system. This is a high proportion compared to other large natural history collections worldwide, but the estimated effort to complete the appropriate registration of all remaining specimens is daunting. In 2013 the Natural History Museum in Oslo started a large-scale digitization activity in which specimens are photographed and only the bare minimum information of the scientific name and the country where the specimen was collected is registered.

Primary biodiversity information
Large-scale imaging of the specimens in the Norwegian natural history collections is prioritized and has started. However, only a bare minimum of the label information, such as scientific name (sometimes only genus) and collecting country, will be captured in this project. Capturing additional information such as the collecting location (where), collecting date (when) and the verified current scientific name (what) will substantially increase the scientific value of these data records. The data on where, when and what define the so-called primary biodiversity information, recognized as the minimum information requirement for scientific research. Species distribution modelling is one of the important research tools for understanding the ecology of species, and depends on available primary biodiversity information (where, when and what).

Why participate and contribute to citizen science transcription
* Discovery of biodiversity information: Transcription of label information and electronic registration into online databases greatly improve the discoverability of museum specimens for the purpose of scientific research and other public use.
* Education: Students from high school level to graduate and post-graduate level can engage with the photographs of the museum specimens and take part in a first class learning experience in interaction with this resource of primary biodiversity information.
* Scientific research: Scientists who study natural history need ready access to primary biodiversity information made available from museums and their online databases. Using the transcription portal, they can take a direct part in making the primary biodiversity information they need for their own research available, by transcribing the labels for the species groups and/or countries that they study.
* Public good, open and free online biodiversity information: The information that we gather from the transcription portal will flow into the museum specimen database and be published to open and free data portals such as the Global Biodiversity Information Facility (GBIF), the Norwegian Species Map Service (Artskart) and the Encyclopedia of Life (EOL). This valuable information for documenting historic biodiversity patterns is thus not only preserved for future generations, but also made available for ongoing research using up-to-date, modern web technologies.

Lichen herbarium, Hildur Krog collection from eastern Africa
The first specimen collection loaded to the new citizen science transcription portal is the lichens collected by the Norwegian biologist Hildur Krog and others in East Africa. This collection is part of the lichen herbarium and includes more than 2 500 specimens (Figure 2). Professor Hildur Krog was originally introduced to limnology as a student of professor Eilif Dahl (1916-1993). Eilif and Hildur pioneered the work on chemical methods for identification of lichen species. Hildur was appointed curator of the lichen herbarium at the Botanical Museum of the University of Oslo in 1971. Between 1972 and 1996, Hildur Krog and T.D.V. Swinscow systematically explored the lichen genera of East Africa for the development of the flora “Macrolichens of East Africa”. With this citizen science portal, we are asking for volunteers to assist us with transcribing the label information from the herbarium specimens collected during these expeditions to East Africa. The imaging of this collection was done in late 2013 and early 2014 by Silje Larsen Rekdal and Even Stensrud, under the coordination of lichen curator Einar Timdal and Siri Rui (NHM-UiO), and with funding from GBIF-Norway (Figure 3).


Figure 2: Collecting sites for the East African macrolichens collected by Hildur Krog et al. and included to the citizen science transcription portal.

Figure 3: Curator Einar Timdal digitizes specimens from the lichen type herbarium at the Natural History Museum in Oslo, January 2013. Photo: Dag Endresen (NHM-UiO) CC-BY-4.0.


Mycological herbarium at NHM Oslo
The Mycological herbarium includes approximately 300 000 specimens, of which approximately two-thirds are electronically registered in the database with label information captured. NHM-Oslo has started a large-scale activity to photograph the specimens of the collections under the coordination of Dr. Eirik Rindal, and the Mycological herbarium is one of the first collections to be photographed. During one week (in September 2013) the staff at the museum digitized around 6000 specimens from the Mycological herbarium (Figure 4). We plan to explore the new citizen science transcription portal as a tool to capture the label information for the remaining specimens not yet appropriately registered in the database. After gaining the first experiences with transcription of the lichen collection, we plan to load the first approximately 40 000 specimen images from the Mycological herbarium – and later add more collections and sets of specimen images incrementally, following the progress of the digitization activity.

Figure 4: Digitization of specimens from the mycological herbarium, 6 September 2013. Photo: Dag Endresen (NHM-UiO) CC-BY-4.0.


When is a specimen transcription complete?
Each specimen image is transcribed by at least three volunteers, and the recorded information from each volunteer is compared. If all three transcriptions provide the same information, the specimen transcription is flagged as complete. If the three transcriptions differ, the specimen image is flagged as incomplete and presented for review by new volunteers until there is at least 50% agreement (on each information input box). Collection curators and museum staff will review the results as they come in, before the information is included in the collection database and published to the Norwegian Artskart portal and the global GBIF portal.
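As a toy sketch of that agreement rule (the transcription strings below are invented examples):

```shell
# Three volunteer transcriptions of the same label field:
t1="Tanzania, Usambara Mts, 1974"
t2="Tanzania, Usambara Mts, 1974"
t3="Tanzania, Usambara Mountains, 1974"

# Count how many transcriptions share the most common value:
agree=$(printf '%s\n%s\n%s\n' "$t1" "$t2" "$t3" | sort | uniq -c | sort -rn \
        | head -n 1 | awk '{print $1}')

if   [ "$agree" -eq 3 ]; then status="complete"     # all three identical
elif [ "$agree" -ge 2 ]; then status="majority"     # at least 50% agreement
else                          status="incomplete"   # all different: re-review
fi
echo "$status"
```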


Label information should be transcribed verbatim
We ask our volunteer citizen scientists to transcribe the specimen label information verbatim, as close to the information printed or written on the specimen label as possible. The citizen scientists are not encouraged to make their own interpretations or corrections. We recognize that this recommendation could be a lost opportunity to collect citizen science curation and correction of the specimen database. We are working on a solution for citizen scientists to provide such interpretations and inferred additional information from the same interface. Such a specimen annotation service could allow citizen scientists to provide a wide variety of inferred information about the museum specimens, including e.g. georeferencing with geographic coordinates, or links to other systems such as sequence data deposited in GenBank or BoL, or traits in EOL.


Challenge: How do we approach citizen scientists' interpretations of label text? How do we add annotations when the volunteers find or infer more information from sources other than the specimen label?

Collaborator: Notes from Nature
The Notes from Nature portal provided the primary source of inspiration for the new crowdsourcing portal at NHM-Oslo. We are grateful for very valuable feedback and assistance from the Notes from Nature team, in particular director Michael Denslow at the National Ecological Observatory Network (NEON) and professor Robert Guralnick at the Museum of Natural History, University of Colorado at Boulder. Notes from Nature provides a citizen science platform to capture label information from photographs of specimens from natural history collections (Hill et al., 2012; Franzoni and Sauermann, 2013). The Notes from Nature software platform was developed and is maintained by Zooniverse and Vizzuality in collaboration with university museums and networks in Florida (SERNEC), California (CalBug) and Colorado (UCMNH), and the bird collection of the Natural History Museum in London (NHMUK). Notes from Nature is open source software, with the source code freely available on GitHub.

Links to some similar citizen science transcription portals
Many natural history collections are now starting to establish similar transcription portals. One of the first of these crowdsourcing portals was Herbaria@home from the Botanical Society of Britain & Ireland, launched around 2006. The Atlas of Living Australia (ALA) Volunteer Portal provides a crowdsourcing platform for transcription of Australian collections of natural history specimens. The National Museum of Natural History (MNHN) in Paris provides a transcription portal for the collections in France. The Smithsonian National Museum of Natural History (NMNH) provides a Transcription Center with another excellent crowdsourcing portal.



Franzoni C, and Sauermann H (2013). Crowd science: The organization of scientific research in open collaborative projects, Research Policy, Available online 14 August 2013, ISSN 0048-7333, doi:10.1016/j.respol.2013.07.005.

Hill A, Guralnick R, Smith A, Sallans A, Gillespie R, Denslow M, Gross J, Murrell Z, Conyers T, Oboyski P, Ball J, Thomer A, Prys-Jones R, de la Torre J, Kociolek P, and Fortson L (2012). The notes from nature tool for unlocking biodiversity records from museum records through citizen science. ZooKeys 209: 219-233. doi:10.3897/zookeys.209.3472

Mehlum F, Lønnve J, and Rindal E (2011). Samlingsforvaltning ved NHM – strategier og planer [Collection management at NHM – strategies and plans]. Versjon 30. juni 2011. Naturhistorisk museum, Universitetet i Oslo. Rapport nr. 18, pp. 1-89. ISBN: 978-82-7970-030-2.

Pensoft Publishers (2012). No specimen left behind: Mass digitization of natural history collections [special issue]. Editors: Blagoderov V and Smith V. ZooKeys 209: 1-267. ISBN: 9789546426451.

Links (2014) “Vil ha di hjelp til å registrere gamle plantar” [“Wants your help to register old plants”]

Natural History Museum in Oslo (2014) “Vil ha di hjelp til samlingsregistrering” [“Wants your help with collection registration”]


Convert coordinates between spatial reference systems (SRS)

November 22, 2013

When calibrating the prediction model (species distribution model) in Maxent, both types of input spatial data – the samples/localities (the dependent, response variable) and the environment layers (independent, explanatory, predictor variables) – have to be described using the very same spatial reference system (SRS). For spatial data with national coverage in Norway, it is common to use the Universal Transverse Mercator (UTM) zone 33N with the WGS84 datum (EPSG:32633). The environmental layers provided for the BIO4115/BIO9115 master/PhD course at the Natural History Museum of the University of Oslo are in the UTM33N format.

Artskart provides species occurrence locality coordinates in both WGS84 UTM 33N (epsg:32633) and the standard WGS84 decimal degrees (epsg:4326) (often called “WGS84 latlong”).

ScientificName Longitude Latitude UTMsone UTMost UTMnord
Achillea millefolium 9.589478 62.597128 33 222412 6952352
Achillea millefolium 6.691447 62.545205 33 73481 6962430


The GBIF portal provides species occurrence locality coordinates as standard WGS84 decimal degrees (EPSG:4326). The GBIF portal is a global portal and does not provide the respective national coordinate systems. Few countries other than Norway use the UTM33N SRS. If you want to use species occurrence data downloaded from the GBIF portal together with environmental raster layers in UTM33N (or other SRSs), you will need to convert either the occurrence coordinates or the raster layers to a common SRS.

GDAL/OGR library software

The Geospatial Data Abstraction Library (GDAL/OGR) is a translator library for geospatial raster and vector data formats. The GDAL/OGR library is available for Windows, Mac OS X, Linux and other UNIX-like operating systems. On a Mac OS X system you may install the frameworks from KyngChaos. On a Windows system you may want to install the FWTools package, which includes GDAL/OGR.

Convert GBIF species occurrence data to UTM33N using GDAL

The GDAL/OGR library can be used from the command line or from a GIS. I will describe using GDAL from the command line. On a Windows system with FWTools, open the “FWTools shell” in a DOS window (Figure 1). On UNIX-like systems (such as Mac OS X or Linux), open a standard terminal window (Figure 2) (Applications >> Utilities >> Terminal).

Copy your coordinate tuples to a plain text file, with each coordinate tuple on a new line and the coordinates separated by a space. GDAL expects the coordinate tuple to be ordered as longitude, latitude (x, y) (easting, northing). So remember to put the x-coordinate (longitude or easting) in the first column, and the y-coordinate (latitude or northing) in the second column. Then use gdaltransform to convert your coordinate tuples.

Figure 1: Using GDAL with FWTools on a Windows system.


Figure 2: Using GDAL in a terminal window on a MacOsX system.


Convert the coordinate tuple from decimal degrees to UTM33N

WGS84_DD.txt (longitude latitude coordinates in a text file):
9.327703 58.872074
10.55039 59.95672

$ gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 < "WGS84_DD.txt"
173154.636743861 6539674.3145872 0
251611.627711431 6654946.66561605 0

This GDAL output can be interpreted as:
UTM 33 173154mE 6539674mN
UTM 33 251611mE 6654946mN


To save the gdaltransform output directly into a new text file, you may redirect the standard output using a “>” in the command:

$ gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 < "WGS84_DD.txt" > "UTM33N.txt"

Retrieve the UTM coordinates from text file named UTM33N.txt (easting northing).


New line as CR and/or LF

Notice that gdaltransform originates from UNIX-like systems and expects unix-style new lines given as LF (“line feed”, “\n”, 0x0A). Mac-formatted text files use CR (“carriage return”, “\r”, 0x0D) to separate lines, while Windows systems use both CR + LF (“\r\n”, 0x0D0A). If you have your coordinates in a Mac-formatted text file, you will need to replace the line endings from CR to LF. Here is an example of how to do this using perl (from the command line).

perl -pi -e 's/\r/\n/g' WGS84_DD.txt # Convert CR (Mac) to LF (unix)

The “-p” flag makes perl operate on each line of the input text file; the “-i” flag makes perl operate directly on the file itself (instead of creating a new file); “-e” allows a complete one-liner perl command to be run on the command line (not in a script); 's/\r/\n/g' is a regular expression where “s” is for substitute (replace) and “g” is for global replacement of all matches. Between the slashes: /pattern-to-be-replaced/replacement-to-be-inserted/.

perl -pi -e 's/\r\n?/\n/g' WGS84_DD.txt # Convert both CR (Mac) and CR+LF (Windows) to LF (unix)
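If you prefer not to use perl, the same clean-up can be done with the standard tr utility (a minimal sketch; the sample file below is created only for demonstration):

```shell
# Make a small file with Mac-style (CR) line endings for demonstration:
printf '9.327703 58.872074\r10.55039 59.95672\r' > WGS84_mac.txt

# Convert every CR to LF (the unix newline that gdaltransform expects):
tr '\r' '\n' < WGS84_mac.txt > WGS84_unix.txt
```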


Online coordinate conversion tools

Coordinates can also be converted using an online conversion tool. Most of these provide only the conversion of one single coordinate tuple at a time.

The online coordinate conversion tool from MyGeodata is the most useful online converter I could find. Keep the default input coordinate system (WGS84 (SRID=4326)) and set the output coordinate system to “EPSG:32633”. The tool accepts a list of input coordinate tuples in a wide variety of formats (including reverse latitude-longitude (y-x) ordering, by ticking the bottom box “Switch XY”).

The World Coordinate Converter provides a nice interface for converting one coordinate tuple at a time. You may click in the map or type in the coordinate tuple to be converted. Source SRS: “GPS (WGS84) (deg)”. UTM33N is not included in the list of target SRSs by default and needs to be added: click on the green plus-sign button at the bottom right, and type “EPSG:32633” into box number 2.

The conversion tool from EarthPoint provides an option for batch conversion of multiple coordinate tuples in a spreadsheet – however, you will need to register for a user account for this tool to take more than 5 coordinate tuples at a time.


These guidelines were prepared for the BIO4115/BIO9115 master/PhD course at the Natural History Museum of the University of Oslo by Dag Endresen (GBIF-Norway), 22 November 2013.

GBIF occurrence data access and download

October 15, 2012

These are some notes for a student training course on species distribution modelling (BIO4115 and BIO9115) from October to December 2012 at the Natural History Museum, University of Oslo (NHM-UiO).

What is the Global Biodiversity Information Facility?

GBIF enables free and open access to biodiversity data online. We’re an international government-initiated and funded initiative focused on making biodiversity data available to all and anyone, for scientific research, conservation and sustainable development.

Darwin Core: What? Where? When?

Using the GBIF data portal:

Using Artskart, Artsdatabanken:

Using the REST web-service:

Examples for beet:

Examples for dragon head:

You may also want to use an R package to download GBIF presence data:


# load the dismo package and get GBIF data with the gbif() function:
library(dismo)
betavulgaris <- gbif("Beta", "vulgaris", geo = TRUE)
dragonhead <- gbif("Dracocephalum", "ruyschiana", geo = TRUE)
sugarkelp <- gbif("Saccharina", "latissima", geo = TRUE)

You may want to use R to plot a map with a preview of the point data:

# plot occurrences on a world map (wrld_simpl from the maptools package):
library(maptools)
data(wrld_simpl)
plot(wrld_simpl, col = "light yellow", axes = TRUE)
points(betavulgaris$lon, betavulgaris$lat, col = "red", cex = 0.5)
text(-140, -50, "Beet")
# -- alternative for dragonhead:
points(dragonhead$lon, dragonhead$lat, col = "red", cex = 0.5)
text(-140, -50, "Dragonhead")


You may also use the OpenModeller software to access and download GBIF occurrence data. OpenModeller provides a common platform for a number of prediction modelling algorithms – including Maxent.
Lat-long to UTM 33

The environmental predictor variables we will use in this course are in UTM grid 33, so we need to convert the occurrence point data from the lat-long format to UTM 33. The following web page can do this.

See also:

Darwin Core extension for germplasm genebanks

July 16, 2012

Darwin Core (DwC) defines a standard set of terms to describe the primary biodiversity data. Primary biodiversity data are data records derived from direct observation of species occurrences in nature or describing specimens in biological collections. The Darwin Core terms can be seen as an extension to the standard Dublin Core metadata terms. The new Darwin Core extension for genebanks declares the additional terms required for describing genebank datasets, and is based on established standards from the plant genetic resources community. The Global Biodiversity Information Facility (GBIF) provides an information infrastructure for biodiversity data including a suite of software tools for data publishing, distributed data access, and the capture of biodiversity data. The Darwin Core extension for genebanks is a key component that provides access for the genebanks and the plant genetic resources community to the GBIF informatics infrastructure including the new toolkits for data exchange. This paper provides one of the first examples and guidelines for how to create extensions to the Darwin Core standard.
This paper completes the publication of the four manuscripts included in my PhD thesis from last year. The manuscript went through a major revision: many parts were removed and new parts added in this final, published version of the paper. The work described relates to the evaluation of the GBIF Integrated Publishing Toolkit (IPT) by the European genebank community during 2010. The Darwin Core extension for germplasm genebanks was a prerequisite for sharing minimum accession-level data for genebank datasets using the IPT. Our experiences from designing the DwC germplasm extension are presented in this paper as an example and use case for how to design other DwC extensions. The goal is to provide a useful guideline for other thematic groups to study.