SQL Functions for Smarts Matching

Function and SQL examples

matches
Select count(smiles) from nci.structure where matches(smiles,'c1ccccc1C(=O)NC');

Select count(smiles) from nci.structure where matches('c1ccccc1C(=O)NC',smiles);
-- example 2 --

Select count(smiles) from nci.structure where
bit_contains(gfp, fp('c1ccccc1C(=O)NC')) and matches(smiles,'c1ccccc1C(=O)NC');
-- example 3 --

count_matches
Select count(smiles) from nci.structure where count_matches(smiles,'[N,n,O,o]') > 5;

Update nci.structure set hetero_count = count_matches(smiles,'[!C!c]');

list_matches
Select list_matches(smiles,'NSN')   from nci.structure where matches(smiles,'NSN');

Select list_matches(smiles,'NSN',0) from nci.structure where matches(smiles,'NSN');

All of the above functions are installed into a SCHEMA named gnova. You can call them by their schema-qualified names, for example gnova.matches, or you can add the SCHEMA gnova to your search_path and call them by their unqualified names, for example matches.
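
The two ways of calling the functions can be sketched as follows. This is a minimal illustration that assumes a PostgreSQL session and the nci.structure table used in the examples above; including public in the search path is an assumption about a typical setup, not a requirement stated here.

```sql
-- Option 1: schema-qualified name; works whatever the current search_path is.
SELECT count(smiles)
FROM nci.structure
WHERE gnova.matches(smiles, 'c1ccccc1C(=O)NC');

-- Option 2: add gnova to the search path for this session,
-- then call the function by its unqualified name.
-- (Listing "public" as well is an assumption; adjust to your own schemas.)
SET search_path TO gnova, public;

SELECT count(smiles)
FROM nci.structure
WHERE matches(smiles, 'c1ccccc1C(=O)NC');
```

SET search_path affects only the current session, so the schema-qualified form is the safer choice in scripts that may run under different settings.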

matches(text Smiles, text Smarts) returns boolean

This function takes a Smiles string and a Smarts string and returns true if the Smarts matches the Smiles; otherwise it returns false. Any valid Smarts string can be specified. When the Smarts is actually a Smiles, this is the classic “search by substructure” that identifies structures containing the second-argument Smiles, without regard to hydrogen atoms.

Hints:

count_matches(text Smiles, text Smarts) returns integer

This function takes a Smiles string and a Smarts string and returns the number of times the Smarts matches the Smiles. It returns 0 when the Smarts does not match the Smiles.

Hints: You could flag structures having more than 5 N and O atoms (à la Lipinski's rules). You could find all structures with between two and five amide bonds. See the description of the tpsa function for another use of this function.
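
The amide-bond hint could look something like the query below. It is only a sketch: the Smarts 'C(=O)N' is a deliberately simple amide pattern chosen for illustration (it also hits ureas and carbamates), and how overlapping or symmetry-equivalent matches are counted is not stated here, so check the counts against a few structures you know before relying on them.

```sql
-- Structures with between two and five amide bonds.
-- 'C(=O)N' is an illustrative pattern, not one prescribed by this documentation.
SELECT smiles,
       count_matches(smiles, 'C(=O)N') AS amide_count
FROM nci.structure
WHERE count_matches(smiles, 'C(=O)N') BETWEEN 2 AND 5;
```

If the count is needed repeatedly, storing it in a column with an Update, as in the hetero_count example above, avoids recomputing the match for every query.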

list_matches(text Smiles, text Smarts) returns integer[]

This function takes a Smiles string and a Smarts string and returns a list of the atom numbers in the Smiles which match the Smarts. The result is an array of integers, each between 1 and the number of atoms in the Smiles. It returns null if there are no matches. It considers only the first match. See below for a more general function.

Hints: Color atoms which match using MarvinView and its atom_set feature.

list_matches(text Smiles, text Smarts, integer imatch) returns integer[]

This function takes a Smiles string and a Smarts string and returns a list of the atom numbers in the Smiles which match the Smarts. The result is an array of integers, each between 1 and the number of atoms in the Smiles. The third argument, imatch, specifies which match to consider when multiple matches occur. If imatch is 0 (or negative), then all matches are returned as a 2-dimensional array of size Nmatches x Natoms.

Hints: Color atoms which match using MarvinView and its atom_set feature.
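
As a rough sketch of working with the all-matches form, the query below returns the 2-dimensional array together with its first dimension, which by the description above should be Nmatches. It assumes a PostgreSQL version that provides array_length (8.4 or later); on older servers array_upper(..., 1) would be the usual substitute.

```sql
-- All matches of 'NSN' per structure, plus the number of matches.
-- The matches() filter keeps only rows where list_matches has something to return.
SELECT smiles,
       list_matches(smiles, 'NSN', 0)                  AS all_matches,
       array_length(list_matches(smiles, 'NSN', 0), 1) AS n_matches
FROM nci.structure
WHERE matches(smiles, 'NSN');
```

Each row of all_matches holds the atom numbers for one match, so a client that colors atoms can either highlight one row at a time or merge the rows into a single set.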