Challenges for adopting the next generation of bio-ontologies into bioinformatics applications
A few years ago (say, around 2008), the bio-ontology community faced several challenges, of which my partial and biased recollection yields:
- Advertising ontologies as an approach to standardisation (adoption by biocuration efforts)
- Getting the community to buy into some basic principles for ontology structuring and development, normalization, standard ID policies, delineation of domain content between projects so as to avoid duplication, etc.
- Interconverting OBO and OWL
- Working with tool developers to produce scalable reasoners able to cope with biology-scale and growing ontology sizes.
Many of those challenges were quietly resolved in the intervening years, in large part thanks to concerted efforts by members of the Gene Ontology Consortium and the OBO Foundry. Suddenly (or so it seems to me), almost every biological database is using one or more ontologies for annotation, and this is enabling useful downstream applications such as enrichment analyses and semantic similarity. (For an introduction to several of the applications of bio-ontologies, see Part 3 of Peter Robinson's book on the topic.) In fact, ontology use for standardisation in database curation has become so ubiquitous that I think we might have accidentally forgotten to hold a party to celebrate the success of the method. (For those interested in having a celebration drink, come along to Graz in the last week of July for some solid ontology celebrations...)
But success and technological evolution are bringing a new set of challenges. What are the primary challenges that we need to focus on at the moment? (This is my personal list -- please leave a comment if you think I've got it wrong or missed something important.)
- Visualisation of large and complex ontologies
- Usability of tools for working with OWL
- Working with multiple ontologies
- Training specifically targeting biologists
- Community integration and strengthening
1. Visualisation
Ontologies have historically often been visualized as graphs or trees. Their inheritance hierarchy transforms itself straightforwardly into a tree browser for on-screen database navigation, and the directed relationships make neat graph diagrams. This is the visualisation strategy used in many of the existing visualization platforms such as BioPortal, AmiGO, QuickGO and the Ontology Lookup Service. However, as bio-ontologies grow in size and use more kinds of relationships, these visualizations are becoming too large to be useful, going the way of the "dreaded hairball" of interactomics network analysis.
The following picture is a screenshot of the wonderful OLSVis tool overloaded by a particularly densely interconnected entity from the Fly Brain anatomy ontology.
Doesn't it seem that we have a problem of too much information? A pressing challenge is targeted and flexible information hiding, so that the display is restricted to a highly relevant subset of the available information.
Another challenge for ontology visualization is developing a standard visualization for the different sorts of semantics available in OWL 2. How should a value restriction be illustrated differently from an existential quantification? How should data properties look as compared to object properties? What should owl:Thing and owl:Nothing actually look like, and (worst of all) how do we display complex anonymous nested class expressions? I'm hoping to see answers to all of these questions presented at ontology and informatics conferences soon.
2. Usability of OWL tools
It's official: OBO-Edit has been stabilised at its present release version, and new enhancements will no longer be developed. It simply doesn't make financial sense to continue to maintain a parallel universe in OBO as the rest of the world converges around the OWL language and the associated Protégé tool. Unfortunately, the OBO-Edit interface had been highly optimised to be useful and usable by the bio-ontology curators that were its audience. So far, Protégé has not been able to deliver quite the metadata features and usability that OBO-Edit had. I hope that someone is working on this problem -- and I'm looking forward to seeing the result!
3. Working with multiple ontologies
Many bio-ontologies are no longer being developed as monolithic stand-alone entities that include all the terms needed for annotation of anything, anywhere. A collaborative and decentralised approach has yielded multiple domain-specific reference ontologies supported by cross-ontology bridging "cross products". This means that it is increasingly insufficient to work with one ontology at a time, and ontology developers and users are instead facing the challenge of working with a patchwork of modules from various ontologies that are related through bridging relationships. Tool support for this scenario is only slowly catching up with the need.
4. Training specifically targeting biologists
There is still a need for the development and delivery of comprehensive bio-ontology training packages focusing on all of the different aspects of bio-ontology theory and applications. The diversity that such training courses need to accommodate is staggering: they range from symbolic mathematical logic and computer science to biology, and they need to describe existing domain-delineated ontologies and showcase GUI-based, online and library tools. These training packages need to target ontology developers and users, that is, mostly biologists and bioinformaticians. Thus far, there have been isolated courses offered (one was held fairly recently at the EBI by Chris Mungall and David Osumi-Sutherland), but these have not reached nearly enough potential users and there is a lot of room for extension of the activity. Barry Smith and I will be presenting a bio-ontology tutorial alongside ECCB in Basel later this year, on 9 September, and the materials for that will be made available subsequently.
5. Community integration and strengthening
Without wanting to point to any specific event or exchange, let's just say that the communities of developers and users of ontologies, as well as the theoreticians who contribute to improving ontology principles and the computer scientists who make and improve the great underlying technology, have not always seen eye to eye. There has been a fair amount of griping in every direction, and perhaps that is a natural and normal part of an emerging scientific community when thrown together across historic disciplinary boundaries. However, as bio-ontology is interdisciplinary at heart, there is no way around working together. More community-wide bridging efforts are needed in order to resolve entrenched differences and hostilities and to secure productive broad-spectrum collaborations.
Earlier this week, I briefly presented on these topics to the Evolutionary Bioinformatics group of the Swiss Institute of Bioinformatics, where I have recently become an embedded bioinformatician (that's just a sort of official collaboration, not a new job).
Finally, working with bio-ontologies in Java is as easy as it should be
I'm a big fan of the OWL API. I've been using it for years to do useful things with OWL ontologies in Java. It's a comprehensive and powerful reference library for working with all aspects of the OWL language, and the repository also includes implementations for several experimental language extensions, such as the description graphs that we used to represent chemicals in our 2010 OWLED paper.
The problem is, the library is comprehensive, powerful, beautifully engineered... and has been inexplicably reported to be extremely difficult for software beginners or part-time-scripters-who-really-just-need-to-write-some-code-as-a-small-part-of-a-bigger-more-interesting-project to pick up.
The OWLTools project is designed to come to the rescue for Java programmers working with standard features of bio-ontologies in both OWL and OBO. Under the lead of Chris Mungall from the Gene Ontology Consortium, it has been developed as a volunteer effort bringing together convenience methods developed in the context of several bio-ontology engineering efforts. The result provides a comprehensive suite of simple, easy-to-use functions and classes that do useful things with ontologies in very few, easily readable, lines of code.
In what follows I will describe some of the features. Disclaimer: I'm not a developer of this library, just a fan -- it's probably best to go to the authors if you have questions. The purpose of this post is to highlight how neat and useful the OWLTools library is, not to bash the OWL API for being more complex: behind the scenes, it's still the OWL API doing all the work.
1. Loading ontologies, retrieving their content
To load an ontology that is in either OBO or OWL format, it takes only a few short lines of code (for brevity, the extracts below omit essential Java stuff such as imports, class declarations, exception handling and so on):
    String ontologyIRI = "http://www.myontology.online/ontology.owl";
    ParserWrapper pw = new ParserWrapper();
    OWLGraphWrapper g = pw.parseToOWLGraph(ontologyIRI);
    OWLOntology ont = g.getSourceOntology();
The OWLGraphWrapper class provides a wealth of convenience and simple access methods that make coding with OWL ontologies much, much easier.
You can extract standard bio-ontology metadata (as is normally captured in OBO ontologies and in ontologies using the oboInOWL metadata standards) using simple convenience methods:
    // c is an OWLObject (for example an OWLClass) taken from the loaded ontology
    String label = g.getLabel(c);
    String def = g.getDef(c);
    List<ISynonym> syns = g.getOBOSynonyms(c);
By the way, the effectiveness of this sort of utility across a heterogeneous group of ontologies rests on the standardisation of metadata and ID formats across the suite of bio-ontologies, a hard-won goal achieved by OBO Foundry efforts.
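To make the point about standardised IDs concrete, here is a minimal sketch of why a shared ID policy helps. It assumes that term IRIs follow the OBO Foundry PURL pattern (http://purl.obolibrary.org/obo/ followed by the ID space, an underscore and the local identifier). The class OboIdSketch and its methods are hypothetical helpers of my own, not part of OWLTools or the OWL API.

```java
import java.util.Optional;

/** Hypothetical helper (not part of OWLTools): maps OBO-style PURLs to short IDs. */
final class OboIdSketch {
    // Assumption: the ontology follows the OBO Foundry PURL convention.
    private static final String OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/";

    /** Returns e.g. "GO:0008150" for ".../obo/GO_0008150", or empty if the IRI does not fit the pattern. */
    static Optional<String> toOboId(String iri) {
        if (iri == null || !iri.startsWith(OBO_PURL_PREFIX)) {
            return Optional.empty();
        }
        String local = iri.substring(OBO_PURL_PREFIX.length());
        int sep = local.lastIndexOf('_');
        // Require a non-empty ID space before the underscore and a non-empty identifier after it.
        if (sep <= 0 || sep == local.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(local.substring(0, sep) + ":" + local.substring(sep + 1));
    }

    /** Returns the ID space (the part before the colon) of an OBO-style ID. */
    static String idSpace(String oboId) {
        int colon = oboId.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Not an OBO-style ID: " + oboId);
        }
        return oboId.substring(0, colon);
    }

    public static void main(String[] args) {
        String iri = "http://purl.obolibrary.org/obo/GO_0008150";
        toOboId(iri).ifPresent(id -> System.out.println(id + " is in ID space " + idSpace(id)));
    }
}
```

Because every ontology that follows the convention can be handled by the same few lines, generic utilities do not need per-ontology special cases. IRIs that do not fit the pattern simply yield an empty result here.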
There are also convenience classes for rendering the ontology as an image and for printing it neatly to the console for debugging purposes.
2. Working with multiple ontologies at the same time
A challenge in using OWL for engineering ontologies according to the OBO Foundry recommended practice of ontology modularity has been the need to compose developing ontologies from multiple imported modules of other ontologies and bridges to higher-level and upper-level ontologies. Working with the relevant composite ontologies programmatically has been difficult when using the straight OWL API; OWLTools provides a few utilities that make this easier, such as the Multiple Ontologies Concatenation tool Mooncat:
    g.addSupportOntologiesFromImportsClosure();
    Mooncat mooncat = new Mooncat(g);
    mooncat.mergeOntologies();
After this step, you can search across all of the terms in the full set of ontologies as if they were a single OWLOntology and accompanying OWLGraphWrapper object -- but the different ID spaces are preserved.
Another handy utility available for working with multiple ontologies at the same time is computing the transitive closure.
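For readers who have not met the idea before, a transitive closure over parent relationships can be sketched in plain Java as below. This is only an illustration of the concept, not how OWLTools implements it: the class ClosureSketch, the map-of-lists graph representation and the EX1/EX2 term IDs are all made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration: ancestors of a term over a merged, multi-ontology parent graph. */
final class ClosureSketch {
    /** Returns every term reachable from start by repeatedly following parent edges. */
    static Set<String> ancestors(Map<String, List<String>> parents, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(parents.getOrDefault(start, List.of()));
        while (!todo.isEmpty()) {
            String term = todo.pop();
            // Only expand a term the first time it is reached; this also guards against cycles.
            if (seen.add(term)) {
                todo.addAll(parents.getOrDefault(term, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up IDs in two ID spaces, linked by a bridging edge from EX1 to EX2.
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("EX1:0001", List.of("EX1:0002"));
        parents.put("EX1:0002", List.of("EX2:0001"));
        parents.put("EX2:0001", List.of("EX2:0002"));
        System.out.println(ancestors(parents, "EX1:0001"));
    }
}
```

The closure crosses the bridge between the two ID spaces without any special handling, which is the practical benefit of merging the ontologies first.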
3. Analysis method implementations, such as semantic similarity
The library has implementations for various semantic distance measures and different calculations for semantic similarity. For example (taken from the test suite):
    OWLGraphWrapper wrapper = getOntologyWrapper("lcstest3.owl");
    CombinedJaccardConjunctiveSetSimilarity sa = new CombinedJaccardConjunctiveSetSimilarity();
    OWLObject a = wrapper.getOWLObject("http://example.org#axon_terminals_ca2");
    OWLObject b = wrapper.getOWLObject("http://example.org#axon_terminals_ca3");
    SimEngine se = new SimEngine(wrapper);
    sa.calculate(se, a, b);
    sa.print();
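If the name of that class looks intimidating, the core idea behind Jaccard-style measures is simple: compare the sets of ancestors of two terms. The sketch below shows the plain Jaccard index only. It is not the algorithm of CombinedJaccardConjunctiveSetSimilarity, which I assume is more elaborate, and the class JaccardSketch and the ancestor names in it are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the plain Jaccard index over two ancestor sets. */
final class JaccardSketch {
    /** Size of the intersection divided by size of the union; defined here as 0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Invented ancestor sets (each including the term itself) for two sibling-like terms.
        Set<String> ca2 = Set.of("axon_terminals_ca2", "axon_terminal", "ca2_part", "neuron_part");
        Set<String> ca3 = Set.of("axon_terminals_ca3", "axon_terminal", "ca3_part", "neuron_part");
        System.out.println("Jaccard similarity: " + jaccard(ca2, ca3));
    }
}
```

In practice the ancestor sets would come from the reasoner or the graph closure rather than being typed in by hand.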
You can also, in one line of code, extract minimal OWL-EL modules for super-fast reasoning:
    OWLGraphWrapper gEL = InferenceBuilder.enforceEL(g);
And what about performing a diff between two versions of an ontology:
    OWLGraphWrapper baseLine = pw.parseToOWLGraph("regulation_xp-baseline.obo");
    OWLGraphWrapper change = pw.parseToOWLGraph("regulation_xp_addon.obo");
    ReasonerDiff diff = ReasonerDiff.createReasonerDiff(baseLine, change, InferenceBuilder.REASONER_HERMIT);
    List<OWLAxiom> newAxioms = diff.getNewAxioms();
    List<OWLAxiom> rmAxioms = diff.getRemovedInferredAxioms();
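Conceptually, the last step of such a diff is just two set differences over the axioms of each version. The sketch below shows only that step, with axioms stood in for by plain strings. The reasoning that ReasonerDiff performs beforehand is not reproduced, and the class AxiomDiffSketch and the example axioms are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration: the set-difference step of an ontology diff. */
final class AxiomDiffSketch {
    /** Axioms present in 'after' but absent from 'before', in sorted order. */
    static Set<String> added(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return result;
    }

    /** Axioms present in 'before' but absent from 'after', in sorted order. */
    static Set<String> removed(Set<String> before, Set<String> after) {
        return added(after, before);
    }

    public static void main(String[] args) {
        // Invented axioms written as strings; real code would compare OWLAxiom objects.
        Set<String> baseline = Set.of("SubClassOf(A B)", "SubClassOf(B C)");
        Set<String> changed = Set.of("SubClassOf(A B)", "SubClassOf(A D)");
        System.out.println("New:     " + added(baseline, changed));
        System.out.println("Removed: " + removed(baseline, changed));
    }
}
```

The value of doing this on inferred rather than asserted axioms is that a small edit can add or remove many entailments, and those are exactly the changes that are hard to spot by eye.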
4. And much more!
This short introduction was just a taster to whet your appetite. The library contains much more than the simple features described above, including script and web services support. Best of all, it's free and open source. Get started with OWLTools by checking out the source code from the Google Code project, and then look at the examples in the JUnit tests. Thanks to the developers of the project and especially to the Berkeley Bioinformatics Open Source Projects team -- keep up the good work!