... or, what makes an ontology an ontology?
Every now and then, thankfully not that frequently, somebody asks me "What's the point of ChEBI, when MeSH already does chemical classification?"
MeSH does indeed provide an organised controlled vocabulary that serves a similar function to ChEBI with respect to the organisation of text-based literature. In fact, you can now compare ChEBI and MeSH classification side-by-side in the newly released PubChem classification component of the PubChem chemical knowledge base, along with the chemical classification from KEGG and others. MeSH (Medical Subject Headings) is used to index and organise literature in MEDLINE. It has a dedicated chemical section in the Supplementary Concept Records, which included about 300,000 chemical terms in 2010. More information about chemicals in MeSH can be found in Stefan Schulz's presentation to the 2010 ChEBI User Group workshop on the topic of the representation of chemicals in large medical vocabularies.
That all sounds quite good, actually. So what's the down side?
Simply put, MeSH is not an ontology. It's not even a chemical database. It's a thesaurus. And while it is well optimised for the application scenario for which it was designed -- indexing of primary literature -- it suffers from several limitations with respect to more general use cases for chemical knowledge representation.
1) MeSH is name-based rather than structure-based. This makes it largely inaccessible to cheminformatics applications and hinders its use for integrating chemical entities that appear in different chemical or biological databases with different naming conventions. ChEBI maintains chemical structures internally in MOL format and exports the popular cheminformatics formats SMILES and InChI. It also maintains cross-references to identifiers such as CAS and Reaxys where we can find those as open data. ChEBI is thus structure-searchable, and indeed enables chemical structure searching across all databases that use ChEBI IDs for chemical annotation, such as the IntAct protein interaction database.
2) MeSH uses only a small number of relationships, such as is-a or part-of, which are not formally (logically) defined and are in some cases used ambiguously: for example, the same classification relationship links chemicals to their pharmacological actions, to their classification parents and to their parts. That means the relationships cannot serve as a basis for automated reasoning with logic-based tools, as is needed in the context of the Semantic Web. ChEBI uses the is-a and has-part relationships strictly, links chemicals to their pharmacological actions with a has-role relationship, and uses several structural relationships to link biologically relevant, structurally related chemicals, such as conjugate acids and bases, and tautomers.
3) MeSH does not provide stable unique identifiers for chemicals and chemical classes that can be used to annotate chemicals in the context of biological databases. ChEBI does, following the OBO Foundry ID policy.
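To make points 1) and 2) a little more concrete, here is a minimal Python sketch. It is not how ChEBI or MeSH is implemented. The `EX:` identifiers, the class names in the comments and the helper functions `ancestors` and `join_by_structure` are all made up for illustration. It shows two things: keeping is-a separate from has-role means a simple transitive walk over is-a returns only genuine classification parents, and keying records on a structure identifier such as InChI lets two sources be joined even when they use different names.

```python
# Toy data only: the EX: identifiers and class names are hypothetical and
# are NOT real ChEBI or MeSH records.
IS_A = {
    "EX:0003": {"EX:0002"},  # e.g. a specific acid is-a some acid class
    "EX:0002": {"EX:0001"},  # that class is-a a broader chemical class
}
HAS_ROLE = {
    "EX:0003": {"EX:R001"},  # e.g. has-role some pharmacological role
}


def ancestors(term, is_a=IS_A):
    """Return every class reachable from term via is-a (transitive closure)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def join_by_structure(records_a, records_b):
    """Pair records from two sources that share an InChI, ignoring names."""
    by_inchi = {record["inchi"]: record for record in records_a}
    return [
        (by_inchi[record["inchi"]]["name"], record["name"])
        for record in records_b
        if record["inchi"] in by_inchi
    ]


source_a = [{"name": "water", "inchi": "InChI=1S/H2O/h1H2"}]
source_b = [
    {"name": "oxidane", "inchi": "InChI=1S/H2O/h1H2"},
    {"name": "methane", "inchi": "InChI=1S/CH4/h1H4"},
]

# Roles never leak into the classification hierarchy.
print(ancestors("EX:0003"))
# "water" and "oxidane" match because they share a structure, not a name.
print(join_by_structure(source_a, source_b))
```

If roles were stored under the same relationship as classification parents, as in the ambiguous usage described in point 2), the same walk would happily report a pharmacological role as a chemical class, which is exactly the kind of inference a reasoner cannot be trusted with.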
Given all of the above, ChEBI enables a much broader range of applications in bioinformatics, metabolomics and chemical biology than MeSH does. Yes, ChEBI is smaller. But we're slowly but surely catching up.