ABSTRACT:

BACKGROUND: Two line notations for chemical structures have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to generate a canonical SMILES string.

RESULTS: I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochemistry into account. When tested on the 1.1 million compounds in the ChEMBL database and a 1 million compound subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. With Universal SMILES, 99.79% of the ChEMBL database and 99.77% of the PubChem subset were canonicalised successfully.

CONCLUSIONS: The InChI canonicalisation algorithm can successfully be used as the basis for a common standard for canonical SMILES. While challenges remain, such as the development of a standard aromaticity model for SMILES, the ability to create the same SMILES using different toolkits will mean that, for the first time, it will be possible to easily compare the chemical models used by different toolkits.
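The core idea above, letting InChI's canonical labels fix the order in which a SMILES string is written, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it ignores stereochemistry, aromaticity, charges, isotopes, explicit hydrogens and the InChI normalisations. It assumes the canonical labels come from the `/N:` layer of an InChI AuxInfo string, read so that the i-th entry is the original number of canonical atom i, and it writes the SMILES by a depth-first traversal that starts at the lowest-ranked atom and visits neighbours in rank order. The function names `parse_aux_n_layer` and `canonical_smiles`, the input format (a dict of organic-subset element symbols plus `(atom, atom, bond order)` triples) and the traversal rules are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Bond-order symbols for the writer; single bonds are left implicit.
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def parse_aux_n_layer(auxinfo: str) -> Dict[int, int]:
    """Return {original atom number: canonical rank} from an AuxInfo /N: layer.

    Assumption: the layer lists original atom numbers in canonical order,
    and there is a single component (no ';' separators).
    """
    for layer in auxinfo.split("/"):
        if layer.startswith("N:"):
            body = layer[2:]
            if ";" in body:
                raise ValueError("multi-component AuxInfo not handled in this sketch")
            originals = [int(x) for x in body.split(",")]
            return {orig: i for i, orig in enumerate(originals, start=1)}
    raise ValueError("no /N: layer found")


def ring_label(n: int) -> str:
    # Ring-closure numbers above 9 use the two-digit '%nn' form.
    return str(n) if n < 10 else f"%{n}"


def canonical_smiles(symbols: Dict[int, str],
                     bonds: List[Tuple[int, int, int]],
                     rank: Dict[int, int]) -> str:
    """Write a SMILES whose traversal depends only on `rank`.

    Sketch only: no stereo, charges, isotopes, aromaticity or explicit H.
    """
    adj: Dict[int, Dict[int, int]] = {a: {} for a in symbols}
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order

    children: Dict[int, List[int]] = {a: [] for a in symbols}
    ring_tokens: Dict[int, List[str]] = {a: [] for a in symbols}
    seen = set()
    closed = set()
    next_ring = 1

    def dfs(atom: int, parent: int) -> None:
        # parent == 0 means "no parent" (atom numbers start at 1).
        nonlocal next_ring
        seen.add(atom)
        # Neighbours are visited in canonical-rank order, so the tree is fixed.
        for nbr in sorted(adj[atom], key=lambda n: rank[n]):
            if nbr == parent:
                continue
            if nbr in seen:
                key = frozenset((atom, nbr))
                if key not in closed:
                    # Back edge: nbr was written earlier, so it opens the ring.
                    closed.add(key)
                    label = ring_label(next_ring)
                    next_ring += 1
                    ring_tokens[nbr].append(BOND_SYMBOL[adj[atom][nbr]] + label)
                    ring_tokens[atom].append(label)
            else:
                children[atom].append(nbr)
                dfs(nbr, atom)

    def write(atom: int) -> str:
        out = symbols[atom] + "".join(ring_tokens[atom])
        kids = children[atom]
        for i, kid in enumerate(kids):
            branch = BOND_SYMBOL[adj[atom][kid]] + write(kid)
            # Every child except the last is written as a parenthesised branch.
            out += branch if i == len(kids) - 1 else f"({branch})"
        return out

    parts = []
    # Each connected component starts at its lowest-ranked atom.
    for start in sorted(symbols, key=lambda a: rank[a]):
        if start not in seen:
            dfs(start, 0)
            parts.append(write(start))
    return ".".join(parts)
```

For example, ethanol entered in the order O, C, C with a hypothetical AuxInfo of `AuxInfo=1/0/N:3,2,1` gives `canonical_smiles({1: "O", 2: "C", 3: "C"}, [(1, 2, 1), (2, 3, 1)], parse_aux_n_layer("AuxInfo=1/0/N:3,2,1"))` → `"CCO"`, the same string produced from the input order C, C, O with `N:1,2,3`. Because the output depends only on the ranks, any two inputs that receive consistent canonical labels yield the same string, which is the property the abstract builds on.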

Posts

Earlier this year I wrote up some Chemical Toolkit Rosetta examples of using the CDK in Scala (github/cdk/cdk-scala-examples). When I was writing this it sprang to mind that it would be cool to (ab)use one feature for interoperability between cheminformatics...
In the previous post I outlined the changes to SMILES parsing in the Chemistry Development Kit (CDK). The original plan was to have several posts detailing the changes but in the end it was more practical to put this in a single release note document (available:...
Early on Wednesday I presented my recent paper on Universal SMILES at the New Orleans ACS. This is a canonical SMILES string that uses the InChI canonical labels. Usually I tell the audience that the slides will be made available, but this time there was someone...
I'll be presenting at the Spring ACS National Meeting in New Orleans in just over a week. The last ACS I was at was three years ago so I'm looking forward to catching up with what's been going on, and meeting up with some familiar faces. I've got three talks...
I believe that the Open chemistry community will wish to move towards InChI as the definitive approach for all canonicalisation in their codes. We have found that "unique SMILES" is not precisely defined and there is no accepted reference implementation that...
Take a look at our ten most accessed papers for September. Among the most popular articles last month was an article in Journal of Cheminformatics covering the Open Molecule Generator software. The program represents the first general-purpose open source structure...
I’m a great fan of SMILES notation (simplified molecular-input line-entry system) as a compact means of storing chemical structures, and whilst there are many tools for creating SMILES strings, they often give different (but acceptable) results. Various...