by Peter Robinson and Sebastian Bauer
I finally got around to reading the excellent 'Introduction to Bio-Ontologies' by Peter Robinson and Sebastian Bauer from Charité, Berlin, which has been sitting on my desk for a few months. Peter Robinson is the person behind the Human Phenotype Ontology, which has complemented the OMIM description of genetic diseases with computable definitions that allow computational reasoning, and which has been used to infer similarity between mouse and human phenotypes (see, e.g., Köhler S, Schulz MH, Krawitz P, Bauer S, Dölken S, Ott CE, Mundlos C, Horn D, Mundlos S, Robinson PN. Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies. The American Journal of Human Genetics [Volume 85, Issue 4, 457-464]).
Introduction to Bio-Ontologies is styled as a textbook for students, complete with exercises at the end of every chapter. The book is divided into four parts. The first part of the book gives definitions for ontology and bio-ontologies, gives an overview of the basic logic and statistics that are required in later chapters, and presents the structure of the most common ontology languages OBO and OWL. The second part of the book surveys and gives a short introduction to several of the most common bio-ontologies including the Gene Ontology, several anatomy ontologies, Chemical Entities of Biological Interest, Sequence Ontology, Mammalian Phenotype Ontology, and Human Phenotype Ontology. It also devotes some pages to upper-level ontologies, such as the Basic Formal Ontology and the Relation Ontology. The third part of the book goes into detail on the technicalities of the graph-based algorithms underlying ontology-based overrepresentation analysis, semantic similarity analysis, and Bayesian network inferencing. The fourth and final part of the book covers logic-based ontology reasoning, the formal semantics of semantic web languages RDF and RDFS, OWL inference rules and a key inference algorithm (of the sort that underlies OWL reasoners such as HermiT), the SPARQL query language, and the state of the art for querying OWL ontologies.
The book is extremely impressive in its broad range and depth of discussion, tackling the series of issues surrounding bio-ontologies, logic, content, statistics, and applications, in a satisfyingly deep and detailed fashion. Without a doubt, it is a must read for any student or professional who needs to develop and use algorithms to analyze and interpret data annotated with ontologies. It will also serve as a handy introduction to the bio-ontology domain for core computer science researchers intending to further develop the underlying reasoning technology, and for statisticians who need to improve methods for analysis of bioinformatics or medical data.
However, something to bear in mind is that a lot of the material is extremely technical and dense, and an entirely non-technical reader might find it difficult to separate the core introductory elements relating to bio-ontology usage from the technical material needed to go on and develop further algorithms around bio-ontology analysis. Much of the introductory non-technical information is in there, but it is sometimes strangely placed and hidden behind pages of dense, impenetrable technical content. Therefore, to complement the book, I would suggest the following roadmap through the text, which starts with a non-technical beginner's introduction to bio-ontologies before going into the rest of the details that are covered in brilliant depth:
- Chapter 1 provides an introduction to the meaning of the word "ontology" and how it is used in biology and biomedicine.
- Chapter 5 contains an introduction to the Gene Ontology.
- Pages 145--147 give definitions for the core relationships used across multiple bio-ontologies.
- The remainder of Chapter 6 introduces core terminology from upper-level ontologies.
- Pick and choose ontologies of interest from Chapter 7 that surveys existing bio-ontologies.
- Getting a bit more technical but still relatively introductory, Chapter 4 provides an introduction to the ontology languages, OBO, RDF (for data description on the semantic web) and OWL. Protégé, one of the main ontology editors, also gets a quick introductory overview.
- And on to the rest of the book for those with interest in the details of the logic, algorithms and applications.
Basic Formal Ontology in plain English
Following on from my last post, here is an attempt to give an introduction to Basic Formal Ontology (BFO) that conveys the "bare minimum" basics needed for adoption, but avoids overly technical or philosophical language. Disclaimer: this post contains my personal (mis)understanding only, and in no way reflects the position of the BFO creators and developers. Nor do they in any way endorse what follows.
1. What is an upper level ontology such as BFO, and why is it needed?
Ontologies are being developed for many different domains within the life sciences: gene function and biological process (GO), chemistry (ChEBI), proteins (PR), anatomy of various different organisms, diseases, phenotypes, and so on. Each of these different ontologies is being developed by different groups with different institutional and community sources of funding, data and support. They are different artifacts and they evolve differently -- with different people maintaining them, striving to meet different use cases and objectives. And there is nothing amiss with all of that.
Increasingly, however, use cases are arising that need to use more than one of these ontologies together to achieve some goal. Perhaps the objective is to look at patterns in 'omics data across different levels of description, or to search for candidate genes for a range of different conditions in different model organisms that are phenotypically related. When you need to use more than one ontology within the same pipeline, you might end up in the situation that the two ontologies a) partially overlap, especially in their upper level / root structure, without explicit mappings clarifying the intended relationship between them, b) use similar sounding relationships without clarifying whether the relationships are intended to mean the same thing, and/or c) reimplement or redefine classes that are defined differently elsewhere.
In short, it helps if you can agree on some simple structures and relationships that can be reused across multiple ontologies, and avoid having to think about such general classes over and over again in different ontology building efforts.
2. Understanding BFO's main content classes
The primary distinction in BFO is between continuants and occurrents. This has to do with how entities relate to time, one of the most fundamental questions to ask when deciding what sort of thing something is when you are trying to place it in an ontology in an appropriate hierarchical position. Continuants continue to exist self-identically over a period of time, while occurrents are always unfolding and changing as time progresses until they are finished, and thereafter they are a part of history. For example, the geographic region considered to be a part of France is a continuant, as it has continued to exist while remaining the very same geographical region over a long period of time. However, World War I was an occurrent, as it unfolded and changed between the years of 1914 and 1918. As a clue that WWI was an occurrent and not a continuant, consider the fact that while it was progressing, it was impossible to say what its full description would be -- until it was finished and became part of history. On the other hand, it is possible at any moment that France exists to describe the geographic region it encompasses (given access to the right sort of information).
Most continuants are material entities, just like you and me. Material entities are those that are composed of matter. In fact, you and I are special sorts of material entities -- we are the sort of material entity that holds itself together one way or another and moves around as a unit. Such material entities are called objects. That's a nice simple word getting given a more precise definition. Many of the sorts of entities that scientists investigate are objects in this sense -- cells, organisms, atoms, molecules, trees, flowers, strands of DNA are all objects. But in some cases, scientists are interested not in single objects but in groups of objects that are usefully investigated together -- such as entire populations or ecosystems, or indeed the aggregate of female football players in the European Union. These are classified as object aggregates in BFO.
There is another special sort of continuant that is very important for many scientific ontologies: the sort of continuant that depends on other entities for its very existence. Dependence here isn't in the loose sense that a child, say, depends on his or her biological parents for his or her existence. I call this loose because we can imagine a future in which technology would have advanced far enough that it would be possible to genetically and environmentally synthesise an entire organism without any parents in the picture at all. The sort of dependence that is meant here is much stricter than that. It means that it isn't even possible to imagine the existence of these sorts of entities without the things they depend on. A prime example is colour. Can you imagine a colour that isn't the colour of something or other? Colour is a dependent entity, since it always depends on the entity that it is the colour of. In fact, colour is a quality, an example of the sort of dependent entity that is fully evident at all times that it exists. (It is evident even in the dark, because it is always possible to see it if you just shine a light on it.) There are other sorts of dependent entities that are not fully evident until the entity that bears them comes into the right sort of circumstances to trigger something to happen. These are dispositions. For example, the fragility and breakability of glass are dispositions, since they exist whether or not the glass is ever broken, but are only manifested in case the glass hits a hard surface with a certain velocity, or is vibrated with the right sort of frequency. Human beings have all sorts of different complex dispositions to act or feel in certain ways. I, for example, have the disposition to feel rather giddy and dizzy when I see a sheer drop of some distance near me (vertigo).
Some dispositions are extremely important for their bearers because they have resulted in their bearer being selected or designed, either naturally or artificially. These are called functions. The function of a screwdriver is to screw screws into things. The function of a screw is to get screwed into things, and to thereafter resist falling out or being dislodged. The function of DNA is to self-reproduce, and to be transcribed into all sorts of wonderful biological molecules, that themselves have many different functions in an incredibly diverse myriad of life-sustaining processes.
Speaking of processes, that brings us neatly back to the other branch of BFO, occurrents. Processes are the paradigm example of occurrents. Processes extend over time and have participants. The German army participated in the process of waging war during World War I, as did the French army. DNA molecules participate in processes of transcription. Chemical molecules participate in chemical reactions, which in turn give rise to further molecules that participate in further reactions and so on in complex pathway chains. Processes, such as reactions, might occur faster or slower. Structural properties of processes, such as their rates or the number of cycles, are called process profiles in BFO. A very special sort of process relates continuants to their entire life histories -- such as Einstein, and Einstein's life. These are called histories in BFO. You and I do not yet have complete histories since our lives are still ongoing. Our histories are occurrents that are still unfolding in time. So, too, is the development of BFO a history that is still unfolding in time. So if you see problems or have questions, send an email to the BFO-discuss mailing list with your concerns or suggestions.
The image below gives the hierarchy of these different entities within BFO. Remember that this is just a subset of the full BFO ontology -- for all the nitty gritty details, go to the source.
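For readers who prefer something they can poke at, the classes described in this section can also be written down as a small child-to-parent table. This is only a rough sketch of the subset discussed in this post, not the official BFO hierarchy: intermediate BFO classes are left out, the placement of 'process profile' under 'process' is my assumption (the text above does not say where it sits), and the names `PARENT`, `ancestors` and `is_a` are made up for illustration.

```python
# Simplified child -> parent table for the BFO classes mentioned in this post.
# Intermediate classes of the full BFO hierarchy are deliberately omitted.
PARENT = {
    "continuant": "entity",
    "occurrent": "entity",
    "material entity": "continuant",
    "dependent continuant": "continuant",
    "object": "material entity",
    "object aggregate": "material entity",
    "quality": "dependent continuant",
    "disposition": "dependent continuant",
    "function": "disposition",
    "process": "occurrent",
    "process profile": "process",  # assumed placement, see lead-in
    "history": "process",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_a(cls, other):
    """True if cls is the same class as other, or a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls)


if __name__ == "__main__":
    print(" -> ".join(["function"] + ancestors("function")))
    print(is_a("history", "occurrent"))
    print(is_a("quality", "occurrent"))
```

Walking up from 'function' passes through 'disposition', 'dependent continuant' and 'continuant' before reaching 'entity', which is exactly the path traced in the prose above.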
3. Understanding BFO's relationships
Ontologies, in the most general sense of the word, contain entities and relationships. The 'is a' relationship is the special relationship that describes the hierarchical relationship of entities within the ontology, and this is called 'subClassOf' in OWL.
Parthood is another very important way that things can be related to each other. BFO recognises two different sorts of parthood -- parthood between continuants, such as the parthood between my toes and my foot, and parthood between occurrents, such as that between my childhood and my life. The former is called 'part of continuant', while the latter is called 'part of occurrent'. Note that when continuants have parts, it is necessary to specify at what time they have such parts. An apple is a part of an apple tree at some point in its history, but when it falls to the ground it is no longer part of the tree. For this reason, continuant parthood is a time-indexed relationship in BFO. In the OWL version of BFO, there are two versions of the relationship available -- one specifying parthood at all times, and the other specifying parthood at some time. The apple is 'part of continuant at some time' the tree, while a cell membrane is 'part of continuant at all times' a cell, since it wouldn't very well be a cell membrane if it wasn't.
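The difference between the two time-indexed versions can be made concrete with a toy model. The sketch below is my own illustration, not how BFO or any OWL reasoner implements it: time is reduced to integer time points, parthood facts are plain (part, whole, time) tuples, and the function names are hypothetical. 'At all times' is read as 'at all times at which the part exists', which is my understanding of the intended meaning.

```python
def holds_at(facts, part, whole, t):
    """True if `part` is recorded as a part of `whole` at time point t."""
    return (part, whole, t) in facts


def part_of_at_some_time(facts, part, whole, lifetime):
    """Parthood holds at one or more time points in the part's lifetime."""
    return any(holds_at(facts, part, whole, t) for t in lifetime)


def part_of_at_all_times(facts, part, whole, lifetime):
    """Parthood holds at every time point in the part's lifetime.

    Note: for an empty lifetime this is trivially True.
    """
    return all(holds_at(facts, part, whole, t) for t in lifetime)


# Toy data: both the apple and the membrane exist at time points 1..5.
lifetime = range(1, 6)
facts = set()
# The apple hangs on the tree at times 1..3, then falls to the ground.
facts |= {("apple", "tree", t) for t in range(1, 4)}
# The cell membrane is part of the cell for as long as it exists.
facts |= {("cell membrane", "cell", t) for t in lifetime}

if __name__ == "__main__":
    print(part_of_at_some_time(facts, "apple", "tree", lifetime))
    print(part_of_at_all_times(facts, "apple", "tree", lifetime))
    print(part_of_at_all_times(facts, "cell membrane", "cell", lifetime))
```

The apple satisfies only the 'at some time' version, while the cell membrane satisfies both, since anything that holds at all times of a non-empty lifetime also holds at some time.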
The relationship between dependent continuants and the entities that they are dependent on is 'inheres in', its converse is 'bearer of'. That is, the apple is the bearer of the red colour, while the red colour inheres in the apple. As all relationships between continuants are time-indexed, so too are bearer of and inheres in. Fragility inheres in glass at all times, but this particular colour red doesn't inhere in the apple at all times.
Between objects and processes, the primary relationship is that of participation. Dispositions (such as fragility) have a special relationship to the processes in which they are manifested -- that of realization. So the fragility of the glass is manifested in the breaking process that occurs when the glass hits the cold tiled floor after my elbow accidentally pushes it off the table.
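To see how these relationships fit together, here is a minimal sketch that stores the examples from this section as subject-relation-object triples and fills in converse statements automatically. The only converse pair taken from the text is 'inheres in' / 'bearer of'; the labels 'participates in' and 'realized in' are shorthand I chose for participation and realization, the helper `with_converses` is hypothetical, and time-indexing is ignored to keep things short.

```python
# Converse pair named in the text: 'inheres in' <-> 'bearer of'.
INVERSE = {
    "inheres in": "bearer of",
    "bearer of": "inheres in",
}


def with_converses(triples):
    """Return the triples plus the converse of every triple that has one."""
    result = set(triples)
    for subject, relation, obj in triples:
        if relation in INVERSE:
            result.add((obj, INVERSE[relation], subject))
    return result


# Examples from this section; relation labels other than the
# 'inheres in' / 'bearer of' pair are illustrative assumptions.
stated = {
    ("red colour", "inheres in", "apple"),
    ("fragility", "inheres in", "glass"),
    ("glass", "participates in", "breaking"),
    ("fragility", "realized in", "breaking"),
}

if __name__ == "__main__":
    for triple in sorted(with_converses(stated)):
        print(triple)
```

Stating only that the red colour inheres in the apple is enough for the sketch to add that the apple is the bearer of the red colour; relations without a listed converse are passed through unchanged.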
4. Using just what you need from BFO
If you, like most domain ontology developers, don't want to show your users technical terminology such as 'dependent continuant,' or force them to navigate down four levels of BFO class hierarchy before they see your domain-specific terminology, NO PROBLEM. You can hide or minimise the impact of your BFO ontology mapping as much as you (and your users) would like. The most important thing about your mapping to BFO is that it is available for ontologists and expert users. If you don't want your average domain-expert users to see your BFO mapping at all, you can maintain the mapping in a separate ontology file thanks to OWL's axiom-level modularity. Or you can extract just those classes that you want using the MIREOT mechanism as implemented by OntoFox (or even manually). An extracted SLIM of this sort is especially useful in the case of the object properties (relationships), for which the latest release of BFO in OWL contained more than 50!
BFO is already available in OWL and in FOL, and mappings already exist for several popular domain ontologies such as the Gene Ontology and ChEBI. And, as I hope I have illustrated, mapping to BFO needn't be a big headache.
ECCB Bio-Ontologies tutorial, Basel, 9 September 2012
On the 9th September, Barry Smith and I gave a bio-ontology tutorial in advance of the ECCB conference held in Basel, Switzerland. We had a modest but comfortable group of around 20 registrants, which filled our allocated tutorial room very nicely, with row after row of eager interested faces waiting to learn everything there was to know about BFO, relations, bio-ontologies for chemical biology, and Protégé.
Our schedule might have been slightly ambitious for a one-day course, including as it did comprehensive theoretical background as well as a practical hands-on component complete with group discussion, editing ontologies using Protégé, and reuse of terms and relations from existing bio-ontologies. But I think that we did reasonably well in covering most of the material we planned. I didn't get to give quite the showcase of third-party applications powered by ontologies that I would have liked, and Barry didn't get to go into quite as much detail on the new improved aspects of BFO 2.0 as he would have liked. But by and large, I think that we covered the basics, and I had a few positive comments afterwards to confirm as much.
All through the day I found myself seeing our bio-ontology domain through the eyes of those who are new to it. (Thanks to the great participants for their comments, discussions, questions and feedback!) This was an interestingly different perspective for me. I've sort of gotten used to seeing bio-ontologies from inside the community, and I've forgotten what it was like at the outset. Besides, I came to bio-ontology from computer science, so my entry point was different from that of the bioinformaticians and research scientists who were our audience in Basel. And the tools have moved on since the days of my entry into the community, as the technology has matured and ontologies have become more mainstream. It's a great community. I thoroughly enjoy being part of it. Teaching bio-ontologies is something I'm keen on doing, inter alia, precisely so that I can extend this great community further to include many more nice people. But as the day progressed, some unique challenges for bio-ontology education started to become increasingly apparent to me.
Firstly, we have a language problem. A really, really bad language problem. Don't believe me? OK, what is the main vocabulary through which we need to teach bio-ontology to beginners? Starting with BFO, we have entities, which can be continuants or occurrents or realizable or [...], and then of course there are relationships between these. We learn from OBO that these things are actually terms and relations, and suddenly we have to add all sorts of additional metadata such as xrefs and broader or narrower synonyms. But then someone suggests we should move to the increased power and interoperability offered by OWL, and then we suddenly need to work with classes, object properties, restrictions and individuals. And to create a relationship between two OWL classes, which incidentally are always Things except when there is something wrong with them and they become Nothings, although, hang on, Nothing is also a type of Thing... but I digress. In order to create a relationship between two of these classes, we need to create a 'sub class of' a restriction (existential or universal) on an object property. Better still, to actually make full use of that increased power that comes with OWL, you have to add logical definitions which take the form of 'equivalent classes' rather than subclasses. Don't worry, though, these are almost the same as the cross-products in OBO.
Does that make perfect sense, everyone?
The point I am trying to make is that to improve ontology education for scientists, to get more out of short training courses in the time available to all of us in our busy schedules, we need to settle on one non-contradictory vocabulary for the full range of subject matter that we want beginners to the domain to become competent in, and then create introductory training courses using that vocabulary. And it would be best if that vocabulary could be intuitive. Yes, we still need the niche terminology for the experts, and that's OK. But what we sorely lack is a reasonable springboard for getting non-experts initiated into the domain gently. And it's up to us to come up with that springboard. The tools that we expect beginners to use need to reflect that straightforward, intuitive vocabulary and still allow people to interact with the power of the underlying formalisms -- while hiding the complexity.
Secondly, and relatedly, (some of) our tools are (in at least some ways) hopelessly unusable. By normal human beings, that is. And quite likely even by moderately abnormal human beings. OK, what I'm getting at here is you have to really love ontologies or be some kind of psychopath if you are still getting kicks out of hunting down the entity rendering preferences somewhere hidden in the menu options in order to enable the display of actual names rather than incomprehensible strings of numbers when trying to view standard scientific ontologies such as the GO in your tool. Yes, I know that this particularly notorious 'feature' of Protégé was recently fixed in the 4.2-beta release, and I am very glad for the hard work and effort the Protégé team have put into iterative improvements in usability. I don't want to list specific examples of what I think are unusable aspects of our tools here, as I know that the developers of these tools do really great work with very little by way of resources. All of these tools are powerful and flexible and do a great job for many people, including myself. I just want to make the very specific point that ontology training isn't going to get any easier until we find the time and money from somewhere to invest into the development of more usable tools targeting the scientists who are our greatest potential customers. Scientists don't have enough time to master a bewildering array of contradictory and opaque vocabulary before they can learn how to use a new tool, and they don't have enough time to learn to get around all the quirks of an unusable tool interface.
Agreeing on a core, easy-to-understand language with which to explain the bio-ontology domain to novices is essential. Creating simple, easy-to-teach views on powerful resources such as BFO and Protégé, and making limited lists of 'beginner-friendly' relationships from which to select for re-use in most domain applications, just to ease people in, will enable wider-reaching, more successful ontology education. The in-depth details will always be waiting to be discovered when they are needed.