
Applications of Bio-Ontologies in Large-Scale Data-Driven Science: A Practical Introduction

ECCB 2012 Tutorial, September 9th 2012, Basel, Switzerland

Prerequisites

Please remember to install Protégé 4.2 on your laptop and bring it along to the tutorial with you.

Schedule and Materials

Time Session detail  
09:00 Introduction and Principles
During this session, we will introduce ontology research and development and its application in bioinformatics and computational biology. We will outline the principles of good ontology development in support of multiple biologically relevant applications, such as standardization and classification. To reason over and draw inferences from data annotated with an ontology, the relationships must be carefully defined; otherwise, data entry is unreliable and results are unpredictable. We will use case studies to illustrate situations to be avoided and the subtleties of intended meaning underlying different relationship types. The primary emphasis will be on illustrating key issues through examples, which will guide the ensuing discussions. We will also introduce the OBO Foundry project, its aims and its methodology.
Slides
10:00 Basic Formal Ontology
In this session, we will outline the purpose of upper-level ontologies in ensuring interoperability and quality in ontology development. We will introduce Basic Formal Ontology (BFO), which serves as a foundational upper-level ontology supporting interoperability in over 100 projects in the biomedical domain. The latest version of the ontology (2.0) includes significant improvements over the earlier version (1.1), which we will describe in detail.
10:30 Coffee break
11:00 Relation Ontology
We will continue the previous session by introducing the Relation Ontology (RO) and illustrating with practical examples how BFO and RO enable interoperability.
11:30 Bio-Ontologies and their Applications in Chemical Biology
In this session, we will provide an in-depth overview of several of the biological ontologies currently available, focusing on those relevant to chemical biology. These include the Chemical Entities of Biological Interest ontology (ChEBI), the Gene Ontology (GO), the Protein Ontology (PRO) and the Ontology for Biomedical Investigations (OBI). For each ontology, we will cover its motivation and scope, the relationships it uses and the logical inferences it supports, and we will survey data sets to which it has been applied. We will further illustrate how the ontologies are being used with practical examples from recent research in text mining, systems biology modelling and metabolic network reconstruction.
Slides
12:30 Lunch  
13:30 Constructing and Using Interoperable Ontologies with Protégé
In this hands-on session, we will introduce the Protégé editor and the Web Ontology Language (OWL) for ontology editing. This will include creating and editing ontologies, annotating them with metadata such as synonyms, definitions, evidence codes and comments, and creating appropriate relationships between entities. We will also introduce some of Protégé's more advanced features, including automatic classification using reasoners such as Pellet and HermiT, and flexible querying with Description Logic (DL) queries. In preparation for the practical sessions that follow, we will show how to edit multiple ontologies at the same time and how to extract modules from existing ontologies for re-use with the MIREOT (Minimum Information to Reference an External Ontology Term) approach, as implemented in the OntoFox tool.
Slides
14:30 Ontology Construction Practical
During this practical session, students will learn by doing, participating in the construction of an ontology. They will be divided into groups, and each group will be assigned an example selected from challenging, emerging areas of biology in which standards have not yet matured. Each group will then develop an ontology describing its area, drawing on existing ontologies where appropriate. The session will be highly interactive, as each group will need to discuss the pros and cons of every term and relationship it adds.
Practical task description
15:00 Coffee break  
15:30 Ontology Construction Practical, continued
We will continue the practical session.
 
16:00 Ontology Harmonization and Discussion
Students will have generated different ontologies in the previous session. In this final session, we will bring these ontologies under a common root so that they can be used together for reasoning. The task will illustrate the utility of ontology-based reasoning across integrated, interoperable ontology modules, highlight the need to harmonize different representations, and emphasise practical principles for ensuring interoperability. Through discussion, students will gain insight into methods for reaching consensus in the community development of shared ontologies.
 
17:00 Close of workshop  
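The Introduction and Relation Ontology sessions stress that relationships must be carefully defined before data can be reasoned over. The following minimal sketch, in plain Python with no ontology library, shows why: `part_of` is flagged as transitive (as it is declared in RO), while `regulates` is not (GO treats regulates as non-transitive), so only the former is chained. The terms, triples and the `infer` helper are hypothetical illustrations, and this naive forward-chaining loop is not how an OWL reasoner works internally.

```python
# Hypothetical relation declarations: part_of is transitive (as in RO);
# regulates is treated as non-transitive (as in GO).
TRANSITIVE = {"part_of": True, "regulates": False}

# Hypothetical asserted triples: (subject, relation, object).
ASSERTED = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "part_of", "cell"),
    ("process_A", "regulates", "process_B"),
    ("process_B", "regulates", "process_C"),
]


def infer(asserted, transitive):
    """Return the asserted triples plus all triples entailed by transitivity.

    Naive forward chaining: compose r(a, b) and r(b, c) into r(a, c) for
    every relation r flagged as transitive, until nothing new appears.
    """
    facts = set(asserted)
    changed = True
    while changed:
        changed = False
        for a, r, b in list(facts):
            if not transitive.get(r, False):
                continue  # non-transitive relations are never chained
            for b2, r2, c in list(facts):
                if r2 == r and b2 == b and (a, r, c) not in facts:
                    facts.add((a, r, c))
                    changed = True
    return facts


inferred = infer(ASSERTED, TRANSITIVE)
print(("nucleolus", "part_of", "cell") in inferred)         # True
print(("process_A", "regulates", "process_C") in inferred)  # False
```

Flagging `regulates` as transitive would silently produce the unwarranted triple `process_A regulates process_C`, which is exactly the kind of unpredictable result that poorly defined relationships lead to.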

This tutorial will be offered as a pre-ECCB tutorial on 9 September 2012. The instructors are Barry Smith and Janna Hastings.

Introduction

Modern experimental techniques in biology are generating data at unprecedented rates, but analysis and interpretation still lag behind, and data can still rarely be combined or reused by independent groups. Ontological frameworks such as the Gene Ontology (GO) have been used for more than a decade to provide a shared language for communicating biological information; this shared language promotes the integration of biological knowledge and so helps relieve the analysis and integration bottleneck. Controlled vocabularies modelled on the GO have now been created in a wide variety of fields, and these ontologies support navigation through very large volumes of data in ways that allow complex hypotheses to be formulated and tested. Ontologies also help researchers meet funding-agency mandates for data reusability, by making it possible to describe research results in ways that allow users to discover them.
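One way ontologies support navigation through large data sets is by propagating annotations up the hierarchy: a gene annotated to a specific term is implicitly annotated to all of that term's ancestors (GO's "true path rule"). The sketch below assumes a small, hypothetical is_a fragment resembling GO process terms and made-up gene identifiers; the helper names `ancestors` and `propagate` are illustrative and not part of any GO tool.

```python
from collections import defaultdict

# Hypothetical is_a fragment loosely resembling GO: child -> set of parents.
PARENTS = {
    "glucose catabolic process": {"hexose catabolic process"},
    "hexose catabolic process": {"monosaccharide catabolic process"},
    "monosaccharide catabolic process": {"carbohydrate catabolic process"},
}

# Hypothetical annotations: gene -> most specific terms assigned to it.
ANNOTATIONS = {
    "gene_1": {"glucose catabolic process"},
    "gene_2": {"hexose catabolic process"},
    "gene_3": {"carbohydrate catabolic process"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Map each term to the genes annotated to it or to any of its descendants."""
    genes_by_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                genes_by_term[t].add(gene)
    return dict(genes_by_term)


index = propagate(ANNOTATIONS, PARENTS)
print(sorted(index["monosaccharide catabolic process"]))  # ['gene_1', 'gene_2']
print(sorted(index["carbohydrate catabolic process"]))    # ['gene_1', 'gene_2', 'gene_3']
```

Querying the broader term returns all three genes, which is what lets a user move between coarse and fine levels of detail without re-annotating the data.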


In recent years, powerful new editing and reasoning tools have made it much easier to develop and use ontologies, for example within the framework of the Semantic Web and Linked Open Data. This tutorial will present the use of these tools in a non-technical, biologically relevant way. Unfortunately, many recently created ontologies have been developed in isolation, for specific local purposes, with little attention to their applicability to multiple disparate data sets; as a result, many of them do not in fact serve large-scale, cross-community needs. Ontologies will truly benefit biological data integration and organisation only if common best practices are employed in a way that ensures semantic interoperability. The most important prerequisites are that the ontologies are non-overlapping, that they are accepted and used by the broader scientific community, and that they are well structured, logically rigorous and terminologically consistent.
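The requirement that ontologies be non-overlapping can be partly checked mechanically. A minimal sketch, assuming each ontology is reduced to a list of term labels and synonyms: normalize the labels and report any that occur in more than one ontology. The ontology names and terms are hypothetical, and a label collision is only a heuristic signal, since identical labels can denote different entities and different labels can denote the same one.

```python
import re
from collections import defaultdict


def normalize(label):
    """Lower-case a label and drop punctuation so trivial variants compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", label.lower()))


def find_overlaps(ontologies):
    """Return normalized labels that occur in more than one ontology.

    `ontologies` maps an ontology name to the labels and synonyms of its terms.
    """
    owners = defaultdict(set)
    for name, labels in ontologies.items():
        for label in labels:
            owners[normalize(label)].add(name)
    return {label: sorted(names) for label, names in owners.items() if len(names) > 1}


# Hypothetical term lists from three independently developed ontologies.
ONTOLOGIES = {
    "onto_A": ["Glucose", "ATP binding"],
    "onto_B": ["glucose", "kinase activity"],
    "onto_C": ["ATP-binding"],
}

print(find_overlaps(ONTOLOGIES))
# {'glucose': ['onto_A', 'onto_B'], 'atp binding': ['onto_A', 'onto_C']}
```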


This tutorial will focus on practical strategies for achieving these objectives. We will introduce the Open Biomedical Ontologies (OBO) Foundry project, which has played a pivotal role in coordinating and standardizing ontology development in the biomedical domain, and Basic Formal Ontology (BFO), which enables multiple biomedical ontologies to interoperate by providing a shared upper level. The practical components of the tutorial will involve ontology construction, modular re-use of parts of existing ontologies to reduce redundancy, and discussion aimed at reaching community agreement on ontology structure. For illustration, we will draw on examples from the chemical biology domain, including existing bio-ontologies relevant to chemistry, screening experiments and targets.
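MIREOT re-uses an external term by recording a minimal set of information: the identifier of the source term, the identifier of its source ontology and the superclass under which it is placed in the target ontology. The sketch below covers only the extraction-and-placement step, on a hypothetical, ChEBI-like fragment with labels standing in for IRIs. It also copies the intermediate terms between the chosen term and a chosen top term, loosely similar to the intermediate-term options some extraction tools offer, but it is not OntoFox's actual algorithm. The target class `my:small molecule` and the function names are hypothetical.

```python
# Hypothetical is_a fragment of a source ontology (loosely ChEBI-like),
# simplified to one parent per term.
SOURCE_PARENTS = {
    "ethanol": "primary alcohol",
    "primary alcohol": "alcohol",
    "alcohol": "organic hydroxy compound",
}


def extract_branch(source_parents, term, top):
    """Return the chain of source terms from `term` up to and including `top`."""
    path = [term]
    while path[-1] != top:
        parent = source_parents.get(path[-1])
        if parent is None or parent in path:  # ran out of parents, or hit a cycle
            raise ValueError(f"{top!r} is not an ancestor of {term!r}")
        path.append(parent)
    return path


def mireot_import(source_parents, term, top, target_superclass):
    """Return child -> parent edges to add to the target ontology.

    Only the re-used branch is copied; its top term is attached under a class
    chosen from the target ontology instead of its original source parent.
    """
    path = extract_branch(source_parents, term, top)
    edges = dict(zip(path, path[1:]))
    edges[path[-1]] = target_superclass
    return edges


module = mireot_import(SOURCE_PARENTS, "ethanol", "alcohol", "my:small molecule")
print(module)
# {'ethanol': 'primary alcohol', 'primary alcohol': 'alcohol', 'alcohol': 'my:small molecule'}
```

Importing only this branch, rather than the whole source ontology, keeps the target ontology small while still pointing back to the authoritative source terms.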
 

Expected Goals

This tutorial will provide instruction in the main areas of ontology development that support computational biology research, with a special focus on chemical biology. The specific learning outcomes for students will be:

  • Understanding the background to existing bio-ontology development and application efforts;
  • Becoming familiar with what is currently available in various biological domains and with the most important success stories and common reasons for failure;
  • Gaining practical experience with modern ontology development tools such as the Protégé ontology editor and associated automated reasoners;
  • Developing an appreciation of the tried-and-tested principles required to build a robust formalization of an ontology, one that maximizes the benefits for search, integration and reasoning over the data the ontology is used to annotate.
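Automated reasoners such as HermiT and Pellet, used from within Protégé, report unsatisfiable classes: classes that cannot have any instances because the axioms about them are contradictory. A common cause is placing a class under two disjoint upper-level categories, such as BFO's continuant and occurrent. The toy check below detects only this one pattern, using explicit is_a links and declared disjoint pairs; the domain terms, including the deliberately mis-modelled `binding event`, are hypothetical, and a real Description Logic reasoner does far more.

```python
# Small upper-level fragment following BFO 2.0 (continuant and occurrent are
# disjoint), plus hypothetical domain terms: child -> set of parents.
PARENTS = {
    "continuant": {"entity"},
    "occurrent": {"entity"},
    "independent continuant": {"continuant"},
    "material entity": {"independent continuant"},
    "process": {"occurrent"},
    "protein": {"material entity"},
    "phosphorylation": {"process"},
    # Deliberate modelling error: an event asserted to be a kind of protein.
    "binding event": {"process", "protein"},
}
DISJOINT = [("continuant", "occurrent")]


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(parents, disjoint):
    """Return classes that fall under both members of a declared disjoint pair."""
    problems = {}
    for term in parents:
        lineage = {term} | ancestors(term, parents)
        for a, b in disjoint:
            if a in lineage and b in lineage:
                problems[term] = (a, b)
    return problems


print(unsatisfiable_classes(PARENTS, DISJOINT))
# {'binding event': ('continuant', 'occurrent')}
```

Anchoring domain terms under a shared upper level is what makes this kind of error detectable at all: without the continuant/occurrent distinction, nothing would contradict calling an event a protein.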

The overall goal of this tutorial is to foster communication and the adoption of best practices within the community, to encourage cooperative development, and to ensure that biological ontologies are built on principles sound enough for them to be reasoned over within an open, cumulatively growing framework.

Participants are expected to be generally familiar with bioinformatics and to have experience using at least one biological database. Some familiarity with the Gene Ontology will be useful but is not required. No prior experience with tools such as Protégé is required, as participants will acquire this skill during the tutorial.