Challenges for adopting the next generation of bio-ontologies into bioinformatics applications
A few years ago (say, around 2008), the bio-ontology community faced several challenges, of which my partial and biased recollection yields:
- Advertising ontologies as an approach to standardisation (adoption by biocuration efforts)
- Getting the community to buy into some basic principles for ontology structuring and development, normalization, standard ID policies, delineation of domain content between projects so as to avoid duplication, etc.
- Interconverting OBO and OWL
- Working with tools developers to produce scalable reasoners able to cope with biology-scale and growing ontology sizes.
Many of those challenges were quietly resolved in the intervening years, in large part thanks to concerted efforts by members of the Gene Ontology Consortium and the OBO Foundry. Suddenly (or so it seems to me), almost every biological database is using one or more ontologies for annotation, and this is enabling useful downstream applications such as enrichment analyses and semantic similarity. (For an introduction to several of the applications of bio-ontologies, see Part 3 of Peter Robinson's book on the topic.) In fact, ontology use for standardisation in database curation has become so ubiquitous that I think we might have accidentally forgotten to hold a party to celebrate the success of the method. (For those interested in having a celebration drink, come along to Graz in the last week of July for some solid ontology celebrations...)
But success and technological evolution are bringing a new set of challenges. What are the primary challenges that we need to focus on at the moment? (This is my personal list -- please leave a comment if you think I've got it wrong or missed something important.)
- Visualization of large and complex ontologies
- Usability of tools for working with OWL
- Working with multiple ontologies
- Training specifically targeting biologists
- Community integration and strengthening
1. Visualization
Ontologies have historically often been visualized as graphs or trees. Their inheritance hierarchy maps straightforwardly onto a tree browser for on-screen database navigation, and the directed relationships make neat graph diagrams. This is the strategy used in many of the existing visualization platforms, such as BioPortal, AmiGO, QuickGO and the Ontology Lookup Service. However, as bio-ontologies grow in size and use more kinds of relationships, these visualizations are becoming too large to be useful, going the way of the "dreaded hairball" of interactomics network analysis.
The following picture is a screenshot of the wonderful OLSVis tool overloaded by a particularly densely interconnected entity from the Fly Brain anatomy ontology.
Doesn't it seem that we have a problem of too much information? A pressing challenge is targeted and flexible information hiding, so that what is displayed is restricted to a highly relevant subset of the available information.
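To make "information hiding" a little more concrete, here is a minimal sketch of one possible approach: start from a focus term and keep only the edges reachable within a few hops along a chosen set of relationship types. The term names, the relation names and the function `focused_subgraph` are all hypothetical, invented for illustration; this is not how OLSVis or any other existing tool works.

```python
from collections import deque

# Hypothetical toy edges (subject, relation, object); not from a real ontology.
EDGES = [
    ("neuron_A", "is_a", "neuron"),
    ("neuron_A", "part_of", "brain_region_X"),
    ("neuron_A", "synapsed_to", "neuron_B"),
    ("neuron_B", "is_a", "neuron"),
    ("neuron", "is_a", "cell"),
    ("brain_region_X", "part_of", "brain"),
]


def focused_subgraph(edges, focus, relations, max_depth):
    """Return the edges reachable from `focus` within `max_depth` hops,
    following only edges whose relation is in `relations`."""
    # Index outgoing edges, dropping every relation type we want to hide.
    outgoing = {}
    for subject, relation, obj in edges:
        if relation in relations:
            outgoing.setdefault(subject, []).append((relation, obj))

    kept = []
    seen = {focus}
    queue = deque([(focus, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested radius
        for relation, obj in outgoing.get(node, []):
            kept.append((node, relation, obj))
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return kept


if __name__ == "__main__":
    # Show only the is_a ancestry of neuron_A, two levels up.
    for edge in focused_subgraph(EDGES, "neuron_A", {"is_a"}, 2):
        print(edge)
```

Even a filter this crude shows the two knobs a viewer would need to expose: which relationship types to draw and how far from the focus term to expand.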
Another challenge for ontology visualization is developing a standard visualization for the different sorts of semantics available in OWL 2. How should a value restriction be illustrated differently from an existential quantification? How should data properties look as compared to object properties? What should owl:Thing and owl:Nothing actually look like, and (worst of all) how do we display complex anonymous nested class expressions? I'm hoping to see answers to all of these questions being presented at ontology and informatics conferences soon.
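For comparison, the textual state of the art is something like the Manchester syntax, which uses the keywords `some` and `only` to tell existential from universal restrictions and relies on brackets for nesting. The sketch below is a toy renderer in that spirit, assuming that "value restriction" above means the universal (`only`) restriction. The classes `Named`, `Some`, `Only` and `And` and the function `render` are hypothetical helpers rather than part of any OWL library, and the output is only Manchester-like, not guaranteed to be valid Manchester syntax.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Named:
    """A named class."""
    name: str


@dataclass(frozen=True)
class Some:
    """Existential restriction: prop some filler."""
    prop: str
    filler: "Expr"


@dataclass(frozen=True)
class Only:
    """Universal (value) restriction: prop only filler."""
    prop: str
    filler: "Expr"


@dataclass(frozen=True)
class And:
    """Intersection of two or more class expressions."""
    operands: Tuple["Expr", ...]


Expr = Union[Named, Some, Only, And]


def render(expr):
    """Render a class expression as a bracketed, Manchester-like string."""
    if isinstance(expr, Named):
        return expr.name
    if isinstance(expr, Some):
        return f"({expr.prop} some {render(expr.filler)})"
    if isinstance(expr, Only):
        return f"({expr.prop} only {render(expr.filler)})"
    if isinstance(expr, And):
        return "(" + " and ".join(render(op) for op in expr.operands) + ")"
    raise TypeError(f"unsupported expression: {expr!r}")


if __name__ == "__main__":
    # Hypothetical class and property names, chosen only to show nesting.
    nested = And((Named("Neuron"),
                  Some("part_of", Only("has_part", Named("Cell")))))
    print(render(nested))
```

The brackets pile up after only two levels of nesting, which is exactly why a graphical convention for anonymous class expressions would be welcome.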
2. Usability of OWL tools
It's official: OBO-Edit has been stabilised at its present release version, and new enhancements will no longer be developed. It simply doesn't make financial sense to continue to maintain a parallel universe in OBO as the rest of the world converges around the OWL language and the associated Protégé tool. Unfortunately, the OBO-Edit interface had been highly optimised to be useful and usable by the bio-ontology curators who were its audience. So far, Protégé has not quite been able to deliver the metadata features and usability that OBO-Edit had. I hope that someone is working on this problem -- and I'm looking forward to seeing the result!
3. Working with multiple ontologies
Many bio-ontologies are no longer being developed as monolithic stand-alone entities that include all the terms needed for annotation of anything, anywhere. A collaborative and decentralised approach has yielded multiple domain-specific reference ontologies supported by cross-ontology bridging "cross products". This means that it is increasingly insufficient to work with a single ontology at a time, and ontology developers and users are instead facing the challenge of working with a patchwork of modules from various ontologies that are related through bridging relationships. Tool support for this scenario is only slowly catching up with the need.
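As a small illustration of what "patchwork" means in practice, the sketch below takes a mixed bag of relationships and works out which ontologies each module depends on through bridging relationships. It assumes the common convention that a term ID carries its ontology as a prefix before a colon. The prefixes `ONTA`, `ONTB` and `ONTC`, the relation names and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical edges (subject, relation, object) across three made-up ontologies.
EDGES = [
    ("ONTA:0000001", "is_a", "ONTA:0000002"),
    ("ONTA:0000001", "occurs_in", "ONTB:0000010"),
    ("ONTA:0000002", "has_participant", "ONTC:0000100"),
    ("ONTB:0000010", "part_of", "ONTB:0000011"),
    ("ONTB:0000011", "composed_of", "ONTC:0000101"),
]


def ontology_of(term_id):
    """Return the ontology prefix of an ID such as 'ONTA:0000001'."""
    return term_id.split(":", 1)[0]


def split_bridges(edges):
    """Separate edges that stay inside one ontology from bridging edges."""
    internal, bridging = [], []
    for edge in edges:
        subject, _relation, obj = edge
        same = ontology_of(subject) == ontology_of(obj)
        (internal if same else bridging).append(edge)
    return internal, bridging


def module_dependencies(edges):
    """Map each ontology to the sorted list of ontologies it points into."""
    deps = defaultdict(set)
    for subject, _relation, obj in split_bridges(edges)[1]:
        deps[ontology_of(subject)].add(ontology_of(obj))
    return {source: sorted(targets) for source, targets in deps.items()}


if __name__ == "__main__":
    print(module_dependencies(EDGES))
```

A tool that supports multi-ontology work needs at least this much bookkeeping before it can decide which external modules to load alongside the ontology being edited or browsed.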
4. Training specifically targeting biologists
There is still a need for the development and delivery of comprehensive bio-ontology training packages focusing on all of the different aspects of bio-ontology theory and applications. The diversity that such training courses need to accommodate is staggering: from symbolic mathematical logic and computer science to biology, and from describing existing domain-delineated ontologies to showcasing GUI-based, online and library tools. These training packages need to target ontology developers and users, that is, mostly biologists and bioinformaticians. Thus far, there have been isolated courses offered (one was held fairly recently at the EBI by Chris Mungall and David Osumi-Sutherland), but these have not reached nearly enough potential users and there is a lot of room for extension of the activity. Barry Smith and I will be presenting a bio-ontology tutorial alongside ECCB in Basel later this year, on the 9th of September, and the materials for that will be made available subsequently.
5. Community integration and strengthening
Without wanting to point to any specific event or exchange, let's just say that the communities of developers and users of ontologies, as well as the theoreticians who contribute to improving ontology principles and the computer scientists who make and improve the great underlying technology, have not always seen eye to eye. There has been a fair amount of griping in every direction, and perhaps that is a natural and normal part of an emerging scientific community when thrown together across historic disciplinary boundaries. However, as bio-ontology is interdisciplinary at heart, there is no way around working together. More community-wide bridging efforts are needed in order to resolve entrenched differences and hostilities and to secure productive broad-spectrum collaborations.
Earlier this week, I briefly presented on these topics to the Evolutionary Bioinformatics group of the Swiss Institute of Bioinformatics, where I have recently become an embedded bioinformatician (that's just a sort of official collaboration, not a new job).