Developing a vocabulary and ontology for modeling insect natural history data
This article was contributed by Brian Stucky, Florida Museum of Natural History.
Figure 1. A specimen of the cicada Hadoa duryi, available on the iDigBio portal.
Which predators might help control a newly emerging, invasive pest insect? If a particular plant species becomes locally extinct, what consequences will that have for the local ecosystem? How do the organisms in a particular habitat interact with one another? The answers to these and countless other research questions across a wide range of disciplines all depend on detailed information about what organisms in natural environments do: how they behave, where they live, what they eat, and so on. Collectively, these kinds of information are known as natural history information. In terrestrial ecosystems, no other group of multicellular organisms is more abundant or more diverse than insects, so natural history information about insects is of outsize importance for answering many kinds of biological questions.
Even though researchers have been gathering insect natural history (NH) data for hundreds of years, and despite the importance of NH data as a fundamental source of biological information, actually using insect NH information with modern research methods is often prohibitively difficult. There are a variety of reasons for this, but a key reason is that insect NH information is often highly heterogeneous. For example, suppose we are interested in NH information about the cicada Hadoa duryi (Figure 1). We might find information about what predators eat it (did you know that turtles sometimes catch and eat cicadas?!), what plants it is associated with, where it lays its eggs, what times of day it is most active, what time of year the adult cicadas emerge from underground, or any number of other things. We would also discover that NH information comes in many different formats, including published literature, field notebooks, and labels on museum specimens (Figure 2). Given this great diversity in information, compiling insect NH information into general-use databases has proven to be a major challenge.
Fortunately, the field of knowledge representation, a subdiscipline within computer science, has developed powerful tools that can help us solve this problem. One key theoretical idea from knowledge representation research is a computational structure called an ontology. An ontology provides a formal framework for describing, organizing, and working with complex information. Ontologies work by providing rich, precisely defined vocabularies for describing information, and they also provide formal logic statements that allow computers to integrate complex information into a single data resource. Thus, a well-designed, comprehensive ontology for insect natural history information would represent a major step toward making insect NH data more easily accessible and usable for scientific research.
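To make the idea concrete, here is a minimal sketch in Python of the two ingredients described above: a shared vocabulary and simple logical rules that let a computer combine facts from different sources. Everything in it is an assumption made for illustration. The class names, the predicates (instance_of, eats), the individual "turtle_1", and the facts themselves are invented and are not terms or data from our project. Real ontologies are normally written in a dedicated language such as OWL rather than in plain Python.

```python
# Hypothetical class hierarchy (child -> parent). Illustrative terms only.
SUBCLASS_OF = {
    "Cicada": "Insect",
    "Insect": "Animal",
    "Turtle": "Reptile",
    "Reptile": "Animal",
}

# Invented facts, as if taken from two differently formatted sources and
# normalized to (subject, predicate, object) triples in one vocabulary.
TRIPLES = [
    ("Hadoa duryi", "instance_of", "Cicada"),  # e.g. from a specimen label
    ("turtle_1", "instance_of", "Turtle"),     # e.g. from a field notebook
    ("turtle_1", "eats", "Hadoa duryi"),       # e.g. from a published note
]


def ancestors(cls):
    """Return cls followed by every class it is (transitively) a subclass of."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def is_a(entity, cls):
    """True if entity is an instance of cls, directly or via subclass links."""
    return any(
        cls in ancestors(obj)
        for subj, pred, obj in TRIPLES
        if subj == entity and pred == "instance_of"
    )


def eaten_by(prey_class, predator_class):
    """List (predator, prey) pairs whose members belong to the given classes."""
    return [
        (subj, obj)
        for subj, pred, obj in TRIPLES
        if pred == "eats" and is_a(subj, predator_class) and is_a(obj, prey_class)
    ]


# No triple says that a reptile eats an insect, but the subclass rules let
# the computer work that out from the facts it does have.
print(eaten_by("Insect", "Reptile"))
```

The point of the sketch is the last query: because both sources use the same vocabulary and the hierarchy is stated formally, a question phrased in general terms (insects, reptiles) can be answered from facts recorded in specific terms (a cicada, a turtle).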
Figure 2. The labels for the specimen in Figure 1, which include natural history information.
In a recent paper (https://doi.org/10.3897/BDJ.7.e33303), I, along with colleagues from eight other institutions, report on a new effort to develop just such an ontology for insect NH information. We launched the project with a 3-day, iDigBio-sponsored workshop in the summer of 2018, held at the University of Florida. In our paper, we report several key preliminary results that are essential for building a functional ontology. In this article, I will highlight three of these initial results.
First, we spent many hours before, during, and after the workshop compiling insect NH information from a variety of sources, but we especially focused on data from museum specimen labels, with emphasis on specimens that have been digitized and published on iDigBio’s data portal (e.g., Figures 1 and 2). Our goal was to develop a database of example insect NH data that provided comprehensive coverage (or as near to that as possible) of the kinds of NH information found on insect specimen labels. Good example data is essential for ontology development, and to our knowledge, no such information resource had previously been developed for insect NH information. To encourage broad use of our database of example data, we have made it freely available and published it in five different formats.
Second, we analyzed our example data to determine the scope and key conceptual areas required for an ontology of insect NH information. Insect NH information encompasses much more than just information about the insects themselves: it also covers such things as weather and climate, collecting and laboratory techniques, and geographic information. We identified 10 key conceptual areas that would be sufficient for describing virtually all natural history information from insect specimens, and we also identified existing ontological resources that provide at least partial coverage of some of those conceptual areas.
Third, we analyzed the potential users and use cases for an ontology of insect NH information, and we developed a set of competency questions for the ontology. Competency questions are precisely written questions that can be used to test whether an ontology provides the functionality required for a particular user and use case, and they are an important aid for ontology development. We determined that the potential user base of an ontology of insect NH information is very broad, ranging from traditional scientific disciplines such as entomology and ecology, to applied fields such as conservation biology and agriculture, to education and public outreach. Given the broad user base, we developed a large set of competency questions, and we linked each competency question and example datum to one or more use cases to allow for easy cross-reference among these resources.
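As a rough illustration of how competency questions, example data, and use cases can be cross-referenced, consider the following Python sketch. It is a simplification built on assumptions: the field names, the placeholder records ("insect A", "plant B"), the question wording, and the use case labels are all hypothetical and do not come from our published resources. In practice, a competency question is tested against the ontology itself, typically as a formal query, rather than by checking which fields a record contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompetencyQuestion:
    text: str               # the question, written precisely
    required_fields: tuple  # data fields needed to answer it (hypothetical)
    use_cases: tuple        # use cases the question is linked to


# Placeholder example data; values are invented, not real observations.
EXAMPLE_DATA = [
    {"id": "ex1", "taxon": "insect A", "associated_plant": "plant B",
     "use_cases": ("ecology",)},
    {"id": "ex2", "taxon": "insect A", "activity_time": "dusk",
     "use_cases": ("education",)},
]

QUESTIONS = [
    CompetencyQuestion(
        "Which plants is a given insect associated with?",
        ("taxon", "associated_plant"),
        ("ecology", "agriculture"),
    ),
    CompetencyQuestion(
        "Which predators eat a given insect?",
        ("taxon", "predator"),
        ("agriculture",),
    ),
]


def supporting_records(question, records):
    """Return ids of example records that contain every field the question needs."""
    return [
        rec["id"]
        for rec in records
        if all(field in rec for field in question.required_fields)
    ]


def questions_for_use_case(use_case):
    """Cross-reference: return the text of every question linked to a use case."""
    return [q.text for q in QUESTIONS if use_case in q.use_cases]


for q in QUESTIONS:
    ids = supporting_records(q, EXAMPLE_DATA)
    status = "covered by " + ", ".join(ids) if ids else "no example data yet"
    print(f"{q.text} -> {status}")
```

Used this way, a question with no supporting records flags a gap, either in the example data or in what the vocabulary can currently express, which is the kind of feedback that makes competency questions useful during development.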
Most ontology developers agree about at least one thing: building robust, broadly useful ontologies for complex information is hard. Although numerous challenges remain (some of which we describe in our paper), we are optimistic that with the work described here, we have laid a solid foundation for developing a rich ontology for insect NH information. All of the results summarized above are freely available, and we encourage anyone with interest to have a look and to get in touch with us if they’d like to learn more or contribute. Finally, we want to acknowledge the importance of specimen digitization efforts, and iDigBio in particular, not only for materially supporting efforts such as ours but also for making information about millions of insect specimens available online. Without these data resources, developing, testing, and using an ontology for insect NH information would undoubtedly be far more difficult!