Function | SQL example |
---|---|
matches | Select count(smiles) from nci.structure where matches(smiles,'c1ccccc1C(=O)NC'); |
matches | Select count(smiles) from nci.structure where matches('c1ccccc1C(=O)NC',smiles); -- example 2 -- |
matches | Select count(smiles) from nci.structure where bit_contains(gfp, fp('c1ccccc1C(=O)NC')) and matches(smiles,'c1ccccc1C(=O)NC'); -- example 3 -- |
count_matches | Select count(smiles) from nci.structure where count_matches(smiles,'[N,n,O,o]') > 5; |
count_matches | Update nci.structure set hetero_count = count_matches(smiles,'[!C!c]'); |
list_matches | Select list_matches(smiles,'NSN') from nci.structure where matches(smiles,'NSN'); |
list_matches | Select list_matches(smiles,'NSN',0) from nci.structure where matches(smiles,'NSN'); |
matches(text Smiles, text Smarts) returns boolean
This function takes a Smiles string and a Smarts string and returns true if the Smarts matches the Smiles; otherwise it returns false. Any valid Smarts string can be specified. When the Smarts is actually a Smiles, this is the classic “search by substructure” that identifies structures containing the second-argument Smiles, without regard to hydrogen atoms.
Hints:
- Switching the arguments, as in example 2 above, gives a "search for substructures" that identifies structures contained within the Smiles now passed as the first argument, without regard to hydrogen atoms.
- The speed of this type of search can be increased by creating a fingerprint column and using it as in example 3 above. Since the fp function requires a Smiles and not a Smarts, this type of speedup cannot be accomplished when you need to match by Smarts.
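As an illustration of matching by Smarts rather than by Smiles, here is a minimal sketch that reuses the nci.structure table from the examples above. The pattern is only an illustrative choice (a three-connected carbonyl carbon bonded to a non-ring nitrogen). Because it is a Smarts and not a Smiles, the fingerprint prefilter of example 3 is left out.

```sql
-- Count structures containing a carbonyl carbon bonded to a non-ring nitrogen.
-- '[CX3](=O)[N;!R]' is a Smarts, not a Smiles, so it cannot be passed to fp()
-- and the query relies on matches() alone.
Select count(smiles)
  from nci.structure
 where matches(smiles, '[CX3](=O)[N;!R]');
```

On a large table this form may scan every row, which is the cost of giving up the fingerprint test described in the hint above.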
count_matches(text Smiles, text Smarts) returns integer
This function takes a Smiles string and a Smarts string and returns the number of times the Smarts matches the Smiles. It returns 0 when the Smarts does not match the Smiles.
Hints: You could flag structures having more than 5 N and O atoms (à la Lipinski's rules). You could find all structures with between two and five amide bonds. See the description of the tpsa function for a cool use of this function.
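The amide-bond hint might be written as follows. This is a sketch under two assumptions: that 'C(=O)N' is an acceptable amide pattern for your purposes, and that count_matches counts each amide once. The column alias amide_count is hypothetical.

```sql
-- Structures with between two and five amide bonds (inclusive).
-- 'C(=O)N' is an assumed, deliberately simple pattern; it also hits ureas
-- and carbamates, so tighten the Smarts if that distinction matters.
Select smiles, count_matches(smiles, 'C(=O)N') as amide_count
  from nci.structure
 where count_matches(smiles, 'C(=O)N') between 2 and 5;
```

If the same count is needed repeatedly, storing it in a column with an Update, as in the hetero_count example in the table above, avoids recomputing it for every query.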
list_matches(text Smiles, text Smarts) returns integer[]
This function takes a Smiles string and a Smarts string and returns a list of the atom numbers in the Smiles which match the Smarts. It returns an array of integers between 1 and the number of atoms in the Smiles. It returns null if there are no matches. It considers only the first match. See below for a more general function.
Hints: Color atoms which match using MarvinView and its atom_set feature.
list_matches(text Smiles, text Smarts, integer imatch) returns integer[]
This function takes a Smiles string and a Smarts string and returns a list of the atom numbers in the Smiles which match the Smarts. It returns an array of integers between 1 and the number of atoms in the Smiles. The third argument, imatch, specifies which match to consider when multiple matches occur. If imatch is 0 (or negative), then all matches are returned, as a 2-dimensional array of size Nmatches x Natoms.
Hints: Color atoms which match using MarvinView and its atom_set feature.
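The imatch argument could be used along the following lines. This is a sketch with several assumptions: that the database is PostgreSQL (suggested by the integer[] return type), that imatch numbers matches starting at 1, and that count_matches and list_matches enumerate matches the same way. The aliases second_match and n_matches are hypothetical.

```sql
-- Atom numbers of the second match only (assumes imatch starts at 1).
Select smiles, list_matches(smiles, 'NSN', 2) as second_match
  from nci.structure
 where count_matches(smiles, 'NSN') >= 2;

-- All matches as a 2-dimensional array. array_upper(..., 1) is the
-- PostgreSQL function that returns the upper bound of the first
-- dimension, which here is Nmatches.
Select smiles, array_upper(list_matches(smiles, 'NSN', 0), 1) as n_matches
  from nci.structure
 where matches(smiles, 'NSN');
```

Because 'NSN' is symmetric, the same three atoms may be reported more than once if the implementation does not reduce matches to unique atom sets, so it is worth checking the result on a known structure first.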