CITIZEN SCIENCE AND VOLUNTEERED GEOGRAPHIC INFORMATION: CAN THESE HELP IN BIODIVERSITY STUDIES?
Caspian Terns at Iona Beach
Department of Geography, University of British Columbia
Volunteered geographic information (VGI), a georeferenced type of citizen science, is a growing area of information gathering. The term was coined by geographer Michael F. Goodchild, who, in exploring the world of user-generated content on the web, noted that "a remarkable phenomenon ... has become evident in recent months: the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geographic information, a function that for centuries has been reserved to official agencies. They are largely untrained and their actions are almost always voluntary, and the results may or may not be accurate. But, collectively, they represent a dramatic innovation that will certainly have profound impacts on geographic information systems (GIS) and more generally on the discipline of geography and its relationship to the general public. I term this volunteered geographic information (VGI), a special case of the more general Web phenomenon of user-generated content ..." (Goodchild 2008).
User-generated content, and its subset, volunteered geographic information, covers a span of geographically-based initiatives and abilities, including such novel applications as Wikipedia, Wikimapia, Flickr, OpenStreetMap, and the overall concept of web mashups (overlaying disparate data and existing maps to produce a new map). As Goodchild (2008) states: "These are just a few examples of a phenomenon that has taken the world of geographic information by storm and has the potential to redefine the traditional roles of mapping agencies and companies."
Headwaters of the Bear River Valley. Photo by Rob Field
In biodiversity studies, volunteered geographic information (VGI) has similarly taken the world by storm, fueled by the growing ability to interact with the web, to display information, and georeference that information. The new explosive world of VGI is inextricably tied to the evolution of the web and associated web-based technologies. As a result of this new web power, there are armies of volunteers now mobilized to collect data on biodiversity. VGI in biodiversity studies involves an array of data gathering, from the compilation of data on species occurrences to information on species abundances--all collected by a volunteer cohort.
While the term VGI is relatively new, there is a long-established tradition of volunteers contributing geographic information on species occurrences, population numbers, and trends. Data gathered by volunteers has been prominent for plants (plant specimen collecting) and for birds (Audubon Christmas Bird Counts and breeding bird surveys). What is different now, however, is recognition of the critical role that VGI can play in documenting biodiversity changes through mapping and atlassing, and the sharp increase in the number of VGI projects over the last decade. VGI is an important tool for monitoring biodiversity in the face of increasing species extinctions, and can provide substantial support for biodiversity research. The emerging questions now become: how reliable is VGI, and how will researchers use it? Can VGI help us monitor species status, population health, and distributions? How does technology aid or enable VGI and, therefore, our knowledge base? Ultimately, is volunteered information a valid component of biodiversity research?
Cartoon by Berry Wijdeven
What can we learn from the past?
The answer to this is multifaceted. A lot depends upon the accuracy of the information provided and how useful it will be in the future. Accuracy in geographic information is important, and while mapping capabilities have shifted dramatically with the advent of online mapping tools and GPS-enabled smart phones, sound data is still at the heart of any mapping. As Goodchild (2008) points out, heavily relied upon mapping sites such as Google Earth have a well-known error component--they are not 100% accurate. Professional GIScientists recognize this, but the public may not, and use of such technology comes with built-in error issues. However, while these issues have to be addressed, VGI marches forward, gaining momentum and use. What does this mean for volunteer-gathered data?
The error and accuracy issues surrounding VGI mean that strong emphasis has to be placed on the accuracy of the initial source data. Obtaining locationally accurate information and documenting it allows researchers to revisit the original data records. This is why historical records held in museums and herbaria are widely accepted by researchers. By looking at how VGI was collected and stored in the past, at what efforts are being made today to automatically georeference that information, and at how it has been used, we can gain some insights into what is needed so that newly collected VGI will be as useful tomorrow as historical information is today.
Museum collections provide good insight into this. There are hundreds of millions of records (collections of plants and animals) stored in herbaria and museums in countries around the world. Most of those records represent the voluntary efforts of hundreds of individuals over the years. Today there are many efforts underway to take those records and enter the information into electronic databases, with the aim of better understanding biodiversity and the changes that are occurring. Although many collections were made more than 100 years ago, they provide valuable data for researchers. Why? What is it about them that makes them a valid data source for researchers?
There are several reasons why historically collected VGI is useful to researchers:
- Collections come with geographic information that allows the researcher to locate the record on an interactive map;
- Collections were preserved and the attributes of the record can be entered into a database;
- Positional accuracy is limited (collection labels often simply name the nearest town or post office), but it is generally acceptable to the researcher within the stated uncertainty limits;
- Temporal accuracy is high (i.e. the date of collection was recorded);
- Attribute accuracy is high: because the specimen is preserved, researchers can confirm the record if necessary, a key factor in record acceptance;
- Semantic accuracy is high, since any changes to the scientific name are documented on the collection itself.
Future Data Collection
Each of these (attribute accuracy, positional accuracy, temporal accuracy, and semantic accuracy) is critical to the reliability and usability of data. If data collected today meets these key requirements, then long-term validity and usefulness will be built in.
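As a rough illustration of how the four accuracy dimensions might be checked at data entry, here is a minimal Python sketch. The record layout, the field names, and the `accuracy_gaps` helper are all hypothetical and are not drawn from any of the projects discussed here; the sketch only tests whether each dimension has been documented, not whether the values are correct.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OccurrenceRecord:
    """One volunteered species observation (all field names are hypothetical)."""
    scientific_name: str             # semantic: the name being applied
    latitude: float                  # positional: decimal degrees
    longitude: float                 # positional: decimal degrees
    uncertainty_m: Optional[float]   # positional: radius of uncertainty, in metres
    observed_on: Optional[date]      # temporal: date of observation
    voucher: Optional[str]           # attribute: specimen or photo that allows re-checking
    name_source: Optional[str]       # semantic: checklist or authority behind the name


def accuracy_gaps(record: OccurrenceRecord) -> list[str]:
    """Return the accuracy dimensions that the record fails to document."""
    gaps = []
    # Attribute: without a voucher, nobody can confirm the identification later.
    if record.voucher is None:
        gaps.append("attribute")
    # Positional: coordinates must be plausible and carry a stated uncertainty.
    coords_ok = -90 <= record.latitude <= 90 and -180 <= record.longitude <= 180
    if not coords_ok or record.uncertainty_m is None:
        gaps.append("positional")
    # Temporal: the observation date must be recorded.
    if record.observed_on is None:
        gaps.append("temporal")
    # Semantic: the name must be present and traceable to a naming source.
    if not record.scientific_name.strip() or record.name_source is None:
        gaps.append("semantic")
    return gaps


# Made-up example record, for illustration only.
example = OccurrenceRecord(
    scientific_name="Hydroprogne caspia",
    latitude=49.2,
    longitude=-123.2,
    uncertainty_m=100.0,
    observed_on=date(2009, 3, 1),
    voucher="photo-0001",
    name_source="example checklist",
)
print(accuracy_gaps(example))
```

A record that documents all four dimensions yields an empty list; anything else names the dimensions a reviewer would need to follow up on.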
Because data accuracy is an overriding factor in the usability of VGI, a strong data vetting mechanism is needed. Some current VGI projects have strongly stressed this need for accuracy, and, indeed, eBird--which is one of the largest VGI projects currently underway--states that: "A database is only as good as its weakest record. If even a few records can be deemed questionable, then the entire data set can be labeled as such. With that in mind, we should all strive to keep the eBird data as clean as possible. You can do your part by being conservative in the field and meticulous in your data entry, and we can do ours by building better connections between the eBird community and scientists" (eBird 2009).
Overall, the validity of VGI depends on how the data will be used. This is where the geographic extent of the data comes into play. For regional and national studies, precise spatial accuracy is probably not an issue. However, for local studies, and for studies that develop predictive models, both spatial and attribute accuracy need to be high. Valid VGI projects need to assess how their data will be used and what the acceptable accuracy levels will be.
On-going VGI Initiatives--Their Validity
Several important VGI initiatives are ongoing today and provide critical information that may be used by researchers. These have high 'credibility' amongst researchers because they address the need for accuracy and reliability, as outlined above. These include:
- eBird
eBird was initiated in 2002. It is amassing one of the largest and fastest growing biodiversity data resources in existence. For example, in 2006, participants reported more than 4.3 million bird observations across North America. The system administrators encourage participation by enabling users to create their own portal and thereby maintain their own records. Since the records are all self-supplied, the uncertainty for the spatial and temporal components is unknown, but the coordinators do attempt to confirm the attributes (birds reported) and double-check all outliers (spatial, temporal and attribute). The eBird coordinators state that: "ultimately we want to have a far-reaching database so that you can go back and look at trends across a wide geographic range, even on a 100-year time scale", thereby approaching the temporal coverage of existing museum and herbarium records.
- Audubon Christmas Bird Count
Initiated in 1900, this project is based on visual identification of species, with data typically collected by many teams of individuals and then collated and coordinated by volunteer regional coordinators. Geographic precision is relatively low (a count circle 15 miles, or about 24 km, in diameter is used), as is the temporal precision (all birds seen within one calendar day). However, the attribute accuracy is relatively high (at least from an 'observed/not observed' perspective). Given the consistent approach that has been used over the years, the records are comparable from year to year and, therefore, have been used by many scientific researchers.
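To make the count-circle idea concrete, the following Python sketch tests whether an observation point falls inside a circle of a given diameter around a centre point. It is an illustration only, not the project's own tooling: the function names and example coordinates are hypothetical, the diameter is passed in by the caller, and distance is computed with the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_count_circle(center, point, diameter_km):
    """True if `point` lies within the circle of `diameter_km` around `center`.

    `center` and `point` are (latitude, longitude) tuples in decimal degrees.
    """
    return haversine_km(*center, *point) <= diameter_km / 2


# Hypothetical circle centre and observations, for illustration only.
center = (49.2, -123.2)
diameter_km = 24.0  # illustrative value; use the diameter defined by the count protocol
print(in_count_circle(center, (49.21, -123.2), diameter_km))  # about 1 km from the centre
print(in_count_circle(center, (50.2, -123.2), diameter_km))   # about 111 km from the centre
```

Because every sighting is reported only at the level of the circle, a check like this is all the positional resolution the data can support, which is why the geographic precision is described as relatively low.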
- North American Breeding Bird Survey
The Breeding Bird Survey is an international project started in 1966 to track the status and trends of North American bird populations. It is based mainly on sound identification, and results are typically collected by one or two individuals. Each survey route is 24.5 miles long with stops at 0.5-mile intervals. At each stop, a 3-minute point count is conducted. During the count, every bird seen or heard within a 0.25-mile radius is recorded. Surveys start one-half hour before local sunrise and take about 5 hours to complete. Over 4100 survey routes are located across the continental U.S. and Canada. While the results are of medium spatial accuracy, the attributes are highly reliable since the individuals who participate know bird songs. The results are comparable across years and have been cited in many research papers.
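The route figures above imply some simple arithmetic, sketched here in Python. The function name is hypothetical, and the sketch assumes that stops fall at both ends of the route and that the count circles do not overlap (which would hold exactly only on a straight route), so the sampled area is best read as an upper bound.

```python
import math


def route_summary(route_miles=24.5, spacing_miles=0.5, count_minutes=3, radius_miles=0.25):
    """Derive stop count, counting time, and sampled area for one survey route."""
    # Assumes a stop at mile 0 and at the final mile, hence the +1.
    stops = round(route_miles / spacing_miles) + 1
    counting_minutes = stops * count_minutes
    # Area of one count circle, multiplied across all stops (no overlap assumed).
    area_sq_miles = stops * math.pi * radius_miles ** 2
    return {
        "stops": stops,
        "counting_minutes": counting_minutes,
        "area_sq_miles": area_sq_miles,
    }


summary = route_summary()
print(summary["stops"])                    # 50 stops
print(summary["counting_minutes"] / 60)    # 2.5 hours of point counts
print(round(summary["area_sq_miles"], 2))  # 9.82 square miles sampled at most
```

Under these assumptions, about half of the roughly 5-hour survey is spent counting and the rest moving between stops.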
- E-Flora BC: The Atlas of the Plants of British Columbia
E-Flora BC (and its sister project E-Fauna BC) is a regional biogeographic atlas project that also serves as a one-stop shop for biogeographic and ecological information on all plant, lichen and fungi species in BC. Atlas pages include mapping, illustrations, species descriptions and ecological information. The first atlas pages for vascular plants went public in 2004, and since then atlas pages have been added for fungi, and many species of lichens, bryophytes and algae. E-Flora BC is inherently a VGI project because the atlas maps present specimen-based distribution information collected by botanists (often volunteers) that is geo-referenced and based upon verified plant collections. However, E-Flora has now entered an additional realm of VGI through its new photo record mapping component, where photo records that are submitted with precise location information can be mapped. The value of these records depends heavily on the accuracy of the identification of the species in the photo, and E-Flora works with experts to review ID accuracy on a regular basis.
New VGI Initiatives--Engaging Volunteers
Because volunteer initiatives are based on developing a cohort of keen individuals with enough experience and skill in their area of interest to make significant contributions, it is important to ensure that the VGI system they participate in meets certain requirements. Volunteers must remain engaged, and the product of their work should be readily seen. Some key factors in retaining volunteer contributions include the following:
- Contributing to the system should be easy; this is one of the most important aspects of volunteer data collection. Complex systems that require too many steps will turn volunteers off and reduce the number of contributions.
- Contributors want to be able to view the results of their contributions, especially maps. While interactive GIS maps are the best way to map data, they are not necessarily the most user friendly. It is important to develop maps that are readily viewable and do not require familiarity with GIS mapping icons, for example. The means by which the public interacts with the maps should be as intuitive as possible without the need for extensive tutorials to view the data.
- Web sites and public interfaces should be current. If the information provided on a project web site is not current, then users and participants will drift away.
Today, even as species loss is accelerating, many species ranges are as yet undocumented and new species continue to be described (recent species finds in Papua New Guinea are one example), underscoring the critical need for data collection in biodiversity studies. Moreover, there is a critical need to know more about species abundance and population trends. While scientists drive many discoveries, as well as data gathering and biodiversity analyses, determining the scale of the changes that are occurring in natural populations is beyond the capacity of many researchers. There is a growing need for volunteers to become the eyes and ears of researchers who are investigating changes to biodiversity, and this need can be accommodated if the key components of accuracy, reliability and durability can be met.
Cornell Lab of Ornithology. 2008. eBird. Accessed March 2008.
Goodchild, Michael F. 2008. Citizens as Sensors: the World of Volunteered Geography. Accessed March 2009.