SQL: Tables can be joined with other tables to compute useful information. For example:
How many structures are matched by each atom/fragment type?
Update xlogp.simplex set train_freq = (select count(smiles) from xlogp.training_set
where matches(smiles,smarts) group by smarts);
Which structures contain atoms not matched by any atom type?
Select smiles from training_set, atom_types group by smiles
having count_matches(smiles,'*') <> sum(count_matches(smiles,smarts));
- No smiles found, as expected
Which atom types are in my new training set that were not in the original training set?
Select smarts from epa.hanschleo1985_atom_types
Except
Select smarts from xlogp.atom_types;
| smarts |
|---|
| [$([Br+0v1H0D1](N))] |
| [$([C+0v4H0D2](#N)(N))] |
| [$([c+0v4H0D3](:c)(:c)(P))] |
| [$([c+0v4H0D3](-c)(:n)(:o))] |
| [$([c+0v4H0D3](:c)(:n)(:s))] |
| [$([c+0v4H0D3](:c)(:s)(N))] |
| [$([c+0v4H0D3](-n)(:c)(:n))] |
| [$([C+0v4H0D3](=N)(C)(S))] |
| [$([C+0v4H0D3](=N)(N)(c))] |
| [$([c+0v4H0D3](-n)(:n)(:n))] |
| [$([C+0v4H0D3](=N)(N)(O))] |
| [$([C+0v4H0D3](=N)(N)(S))] |
| [$([c+0v4H0D3](:n)(:o)(C))] |
| [$([c+0v4H0D3](:n)(:s)(C))] |
| [$([C+0v4H0D3](=O)(C)([Se]))] |
| [$([C+0v4H0D3](=O)(N)(S))] |
| [$([C+0v4H0D3](=O)(O)(S))] |
| [$([c+0v4H0D3](:o)(:s)(=O))] |
| [$([C+0v4H0D3](=S)(N)(S))] |
| [$([C+0v4H0D4](C)(C)(C)(P))] |
| [$([C+0v4H0D4](C)(C)(N)(O))] |
| [$([C+0v4H1D2](=N)(C))] |
| [$([C+0v4H1D3](C)(Cl)(c))] |
| [$([C+0v4H1D3](C)(Cl)(F))] |
| [$([C+0v4H1D3](C)(O)(S))] |
| [$([C+0v4H1D3](F)(F)(O))] |
| [$([C+0v4H1D3](N)(c)(c))] |
| [$([C+0v4H1D3](N)(O)(c))] |
| [$([C+0v4H1D3](N)(S)(c))] |
| [$([C+0v4H1D3](O)(O)(c))] |
| [$([C+0v4H2D2](N)(O))] |
| [$([I+0v1H0D1](N))] |
| [$([K+0v1H0D1](O))] |
| [$([n+0v3H0D3](:c)(:c)(:c))] |
| [$([N+0v3H0D3](C)(S)(c))] |
| [$([n+0v3H0D3](:n)(:n)(C))] |
| [$([N+0v3H0D3](S)(S)(c))] |
| [$([N+0v3H1D2](N)(S))] |
| [$([N+0v3H1D2](S)(S))] |
| [$([N+0v4H0D3](=O)(C)(C))] |
| [$([n+0v5H0D3](:c)(:c)(=O))] |
| [$([N+0v5H0D4](=C)(Br)(C)(c))] |
| [$([N+0v5H0D4](=C)(Br)(C)(C))] |
| [$([N+0v5H0D4](=C)(C)(C)(I))] |
| [$([N+0v5H0D5](Br)(C)(C)(C)(c))] |
| [$([N+0v5H0D5](Br)(C)(C)(C)(C))] |
| [$([N+0v5H0D5](C)(C)(C)(C)(I))] |
| [$([N+0v5H0D5](C)(C)(C)(C)(O))] |
| [$([Na+0v1H0D1](O))] |
| [$([O+0v2H0D1](=n))] |
| [$([O+0v2H0D2](C)([Na]))] |
| [$([O+0v2H0D2]([Na])(c))] |
| [$([O+0v2H0D2](N)(c))] |
| [$([o+0v2H0D2](:n)(:n))] |
| [$([O+0v2H0D2](S)(c))] |
| [$([O+0v2H0D2](S)([K]))] |
| [$([O+0v2H0D2](S)([Na]))] |
| [$([P+0v5H0D4](=O)(N)(N)(O))] |
| [$([P+0v5H0D4](=O)(O)(O)(c))] |
| [$([S+0v2H0D2](N)(N))] |
| [$([S+0v6H0D4](=O)(=O)(C)(C))] |
| [$([S+0v6H0D4](=O)(=O)(O)(c))] |
| [$([S+0v6H0D6](F)(F)(F)(F)(F)(c))] |
| [$([Se+0v1H0D1](C))] |