Water solubility

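Deriving complex atom types yields 196 atom types with a frequency greater than 2. Each atom type is a SMARTS pattern that gives the atom's element, formal charge (+0), total valence (v), hydrogen count (H) and degree (D), followed by each bonded neighbor in parentheses.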
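A minimal Python sketch of the derivation step. It assumes each atom is described by its element symbol, formal charge, total valence, hydrogen count and a list of (bond, neighbor) pairs. The string layout, including the ASCII sort order of the neighbor branches and the explicit `-` for a single bond between aromatic atoms, is inferred from the table entries and is not taken from the original tooling. `atom_type` and `frequent_types` are hypothetical helpers, and counting frequency per atom occurrence (rather than per molecule) is also an assumption.

```python
from collections import Counter

# Hypothetical helpers: the layout below is inferred from the table entries,
# not taken from the original gNova/CHORD tooling.

def atom_type(symbol, charge, valence, h_count, neighbors):
    """Build an atom-type SMARTS string such as '[C+0v4H2D2](C)(O)'.

    neighbors is a list of (bond, neighbor_symbol) pairs; bond is '' for an
    ordinary single bond, '-' for a single bond between aromatic atoms,
    '=' for double, '#' for triple and ':' for aromatic.
    """
    sign = "-" if charge < 0 else "+"
    core = f"[{symbol}{sign}{abs(charge)}v{valence}H{h_count}D{len(neighbors)}]"
    # Neighbor branches appear in plain ASCII sort order in the table.
    branches = sorted(f"({bond}{nbr})" for bond, nbr in neighbors)
    return core + "".join(branches)


def frequent_types(molecules, min_freq=2):
    """Count atom types over a training set, keeping those seen more than min_freq times.

    Each molecule is a list of atom tuples accepted by atom_type; counting
    atom occurrences (rather than molecules) is an assumption.
    """
    counts = Counter(atom_type(*atom) for mol in molecules for atom in mol)
    return {t: n for t, n in counts.items() if n > min_freq}


# Ethanol, CCO: CH3, CH2 bonded to O, and OH
ethanol = [
    ("C", 0, 4, 3, [("", "C")]),
    ("C", 0, 4, 2, [("", "C"), ("", "O")]),
    ("O", 0, 2, 1, [("", "C")]),
]
print([atom_type(*a) for a in ethanol])
# ['[C+0v4H3D1](C)', '[C+0v4H2D2](C)(O)', '[O+0v2H1D1](C)']
```

The three strings printed for ethanol match rows of the table, which is a quick check that the inferred layout is plausible.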

solubility_atom_types
smarts train_freq coefficient error
[O+0v2H0D1](=C) 710 0.025 0.132
[c+0v4H1D2](:c)(:c) 573 -0.150 0.021
[O+0v2H1D1](C) 492 0.236 0.068
[C+0v4H3D1](C) 451 -0.066 0.041
[C+0v4H2D2](C)(C) 373 -0.201 0.018
[C+0v4H0D3](=O)(C)(O) 366 -0.207 0.145
[c+0v4H0D3](:c)(:c)(C) 358 -0.291 0.070
[C+0v4H2D2](C)(O) 249 -0.094 0.061
[O+0v2H0D2](C)(C) 229 0.146 0.090
[c+0v4H0D3](:c)(:c)(N) 201 0.019 0.112
[c+0v4H0D3](:c)(:c)(O) 174 0.296 0.184
[n+0v3H0D2](:c)(:c) 159 0.014 0.113
[C+0v4H0D3](=O)(C)(N) 151 0.009 0.146
[N+0v3H1D2](C)(C) 150 -0.050 0.110
[N+0v3H2D1](C) 139 -0.128 0.098
[C+0v4H0D3](=O)(O)(c) 126 -0.336 0.152
[C+0v4H2D2](C)(N) 124 -0.011 0.062
[C+0v4H1D3](C)(C)(O) 122 -0.189 0.083
[C+0v4H1D3](C)(C)(N) 115 0.105 0.104
[O+0v2H0D1](=S) 114 -0.054 0.084
[O+0v2H1D1](c) 112 -0.571 0.195
[c+0v4H0D3](:c)(:c)(:n) 106 -0.360 0.159
[N+0v3H2D1](c) 105 -0.387 0.130
[c+0v4H1D2](:c)(:n) 104 0.043 0.102
[c+0v4H0D3](:c)(:c)(S) 102 -1.035 0.147
[O+0v2H0D1](=c) 100 -0.003 0.224
[C+0v4H3D1](c) 97 0.037 0.118
[C+0v4H1D3](C)(C)(C) 92 0.007 0.076
[O+0v2H0D1](=N) 86 -0.139 0.047
[C+0v4H1D2](=C)(C) 79 -0.226 0.078
[c+0v4H0D3](:c)(:n)(:n) 78 -0.543 0.293
[C+0v4H3D1](O) 76 0.034 0.103
[n+0v3H1D2](:c)(:c) 73 -0.365 0.181
[O+0v2H0D2](C)(c) 68 -0.632 0.198
[C+0v4H2D2](C)(c) 66 0.128 0.130
[N+0v5H0D3](=O)(=O)(c) 63 -0.445 0.167
[c+0v4H0D3](:c)(:c)(:c) 59 -0.489 0.078
[c+0v4H1D2](:n)(:n) 58 0.001 0.198
[c+0v4H0D3](:c)(:n)(=O) 57 -0.137 0.293
[C+0v4H0D3](=O)(N)(N) 57 -0.194 0.201
[N+0v3H1D2](C)(c) 57 -0.349 0.158
[N+0v3H0D3](C)(C)(C) 52 -0.044 0.179
[Cl+0v1H0D1](c) 51 0.082 0.264
[C+0v4H2D2](C)(S) 49 -0.314 0.091
[S+0v6H0D4](=O)(=O)(N)(c) 47 0.850 0.274
[c+0v4H0D3](:n)(:n)(=O) 45 -0.314 0.331
[c+0v4H0D3](:c)(:n)(C) 43 -0.022 0.148
[o+0v2H0D2](:c)(:c) 42 -0.625 0.205
[C+0v4H0D4](C)(C)(C)(C) 40 -0.053 0.135
[C+0v4H3D1](N) 40 0.275 0.114
[c+0v4H0D3](:c)(:c)(Cl) 39 -0.520 0.277
[C+0v4H0D3](=C)(C)(C) 37 -0.257 0.128
[c+0v4H0D3](:c)(:n)(N) 36 -0.224 0.205
[C+0v4H0D3](=O)(C)(C) 36 -0.269 0.173
[Cl+0v1H0D1](C) 36 -0.162 0.092
[O+0v2H1D1](S) 34 0.786 0.386
[S+0v6H0D4](=O)(=O)(O)(c) 34 0.883 0.398
[C+0v4H0D3](=O)(N)(c) 33 0.192 0.201
[C+0v4H0D4](C)(C)(C)(O) 33 0.115 0.147
[n+0v3H0D3](:c)(:c)(C) 33 0.109 0.264
[C+0v4H3D1](n) 32 0.055 0.258
[N+0v3H1D2](S)(c) 31 -0.856 0.297
[C+0v4H0D3](=O)(C)(c) 30 -0.384 0.214
[N+0v3H1D2](C)(N) 29 -0.176 0.137
[O+0v2H0D1](=P) 29 0.371 0.219
[C+0v4H1D3](C)(O)(O) 28    
[C+0v4H2D2](O)(c) 28    
[N+0v3H0D1](#C) 27    
[S+0v2H0D2](C)(C) 27    
[c+0v4H0D3](:n)(:n)(N) 26    
[Br+0v1H0D1](c) 24    
[c+0v4H0D3](:c)(:o)(C) 24    
[c+0v4H0D3](:c)(:c)(Br) 22    
[C+0v4H2D1](=C) 22    
[N+0v3H0D2](=C)(N) 21    
[O+0v2H0D2](C)(P) 20    
[S+0v2H0D1](=C) 20    
[C+0v4H0D2](#N)(C) 19    
[Br+0v1H0D1](C) 18    
[c+0v4H0D3](:c)(:o)(=O) 18    
[C+0v4H1D3](C)(C)(c) 18    
[s+0v2H0D2](:c)(:c) 18    
[c+0v4H0D3](:c)(:c)(I) 16    
[C+0v4H2D2](C)(Cl) 16    
[O+0v2H1D1](N) 16    
[N+0v3H0D3](C)(C)(c) 15    
[C+0v4H0D3](=N)(C)(C) 14    
[c+0v4H0D3](:n)(:n)(C) 14    
[I+0v1H0D1](c) 14    
[n+0v3H0D2](:c)(:n) 14    
[N+0v3H1D1](=C) 14    
[c+0v4H0D3](:c)(:c)(=O) 13    
[C+0v4H0D3](=O)(N)(O) 13    
[C+0v4H1D2](=O)(C) 13    
[C+0v4H1D3](C)(O)(c) 13    
[N+0v3H0D3](C)(C)(N) 13    
[N+0v3H2D1](N) 13    
[c+0v4H0D3](:c)(:c)(:o) 12    
[C+0v4H0D3](=N)(N)(N) 12    
[c+0v4H1D2](:c)(:o) 12    
[C+0v4H2D2](N)(c) 12    
[C+0v4H2D2](O)(O) 12    
[C+0v4H3D1](S) 12    
[C+0v4H0D4](C)(C)(C)(N) 11    
[C+0v4H0D4](C)(C)(O)(O) 11    
[C+0v4H1D2](=C)(c) 11    
[N+0v3H0D2](=C)(O) 11    
[N+0v3H1D2](c)(c) 11    
[S+0v2H0D2](c)(c) 11    
[S+0v2H0D2](C)(S) 11    
[S+0v2H1D1](C) 11    
[c+0v4H0D3](-c)(:c)(:c) 10    
[c+0v4H0D3](:c)(:o)(N) 10    
[C+0v4H1D2](=N)(c) 10    
[N+0v5H0D3](=O)(=O)(C) 10    
[P+0v5H0D4](=O)(O)(O)(O) 10    
[S+0v6H0D4](=O)(=O)(C)(C) 10    
[C+0v4H0D3](=C)(C)(O) 9    
[C+0v4H0D3](=O)(c)(c) 9    
[C+0v4H0D3](=S)(N)(N) 9    
[c+0v4H1D2](:c)(:s) 9    
[O+0v2H1D1](P) 9    
[c+0v4H0D3](-n)(:c)(:c) 8    
[c+0v4H0D3](:n)(:s)(N) 8    
[C+0v4H1D2](=N)(C) 8    
[C+0v4H1D2](=O)(c) 8    
[C+0v4H1D3](C)(C)(S) 8    
[F+0v1H0D1](C) 8    
[N+0v3H2D1](S) 8    
[c+0v4H0D3](:c)(:c)(F) 7    
[c+0v4H0D3](:c)(:c)(P) 7    
[c+0v4H0D3](:n)(:n)(Cl) 7    
[C+0v4H2D2](C)(n) 7    
[C+0v4H2D2](C)(P) 7    
[F+0v1H0D1](c) 7    
[N+0v3H0D2](=N)(c) 7    
[n+0v3H0D3](-c)(:c)(:n) 7    
[N+0v5H0D3](=O)(=O)(N) 7    
[P+0v5H0D4](=O)(C)(O)(O) 7    
[c+0v4H0D3](:c)(:s)(C) 6    
[C+0v4H0D4](C)(C)(O)(c) 6    
[C+0v4H1D3](Br)(C)(C) 6    
[C+0v4H1D3](C)(c)(c) 6    
[C+0v4H1D3](C)(N)(c) 6    
[N+0v3H1D2](C)(S) 6    
[S+0v2H0D2](C)(c) 6    
[S+0v6H0D4](=O)(=O)(c)(c) 6    
[c+0v4H0D3](-c)(:c)(:n) 5    
[C+0v4H0D3](=C)(C)(N) 5    
[c+0v4H0D3](:c)(:c)(:s) 5    
[c+0v4H0D3](:c)(:n)(S) 5    
[C+0v4H2D2](c)(c) 5    
[C+0v4H2D2](C)([Se]) 5    
[C+0v4H2D2](N)(N) 5    
[N+0v3H0D2](=C)(C) 5    
[N+0v3H1D2](N)(c) 5    
[O+0v2H0D2](C)(S) 5    
[S+0v2H0D1](=c) 5    
[S+0v2H0D1](=P) 5    
[Se+0v2H0D2](C)([Se]) 5    
[c+0v4H0D3](:c)(:n)(Cl) 4    
[c+0v4H0D3](-c)(:n)(:n) 4    
[C+0v4H0D3](=S)(N)(S) 4    
[C+0v4H0D4](C)(C)(C)(c) 4    
[C+0v4H0D4](C)(Cl)(Cl)(Cl) 4    
[C+0v4H0D4](O)(c)(c)(c) 4    
[C+0v4H1D2](=O)(N) 4    
[C+0v4H1D3](C)(C)(Cl) 4    
[n+0v3H0D2](:c)(:o) 4    
[N+0v3H0D3](C)(C)(P) 4    
[n+0v3H0D3](:c)(:n)(C) 4    
[n+0v3H1D2](:c)(:n) 4    
[N+0v5H0D3](=O)(=O)(O) 4    
[O+0v2H0D2](C)(N) 4    
[O+0v2H0D2](C)(O) 4    
[O+0v2H0D2](P)(c) 4    
[P+0v5H0D4](=O)(O)(O)(c) 4    
[S+0v4H0D3](=O)(C)(C) 4    
[C+0v4H0D2](=N)(=S) 3    
[C+0v4H0D2](#N)(S) 3    
[C+0v4H0D3](=C)(C)(Cl) 3    
[c+0v4H0D3](:c)(:n)(O) 3    
[C+0v4H0D3](=N)(N)(c) 3    
[c+0v4H0D3](:n)(:n)(O) 3    
[c+0v4H0D3](:n)(:n)(=S) 3    
[c+0v4H0D3](:n)(:s)(C) 3    
[C+0v4H0D4](C)(F)(F)(F) 3    
[C+0v4H0D4](C)(N)(N)(N) 3    
[C+0v4H0D4](F)(F)(F)(c) 3    
[C+0v4H1D2](=C)(O) 3    
[C+0v4H1D3](C)(Cl)(Cl) 3    
[C+0v4H1D3](C)(N)(O) 3    
[C+0v4H2D2](N)(O) 3    
[C+0v4H2D2](S)(c) 3    
[Cl+0v1H0D1](N) 3    
[O+0v2H0D1](=n) 3    
[o+0v2H0D2](:c)(:n) 3    
[P+0v5H0D4](=S)(O)(O)(S) 3    
[Se+0v2H0D2](C)(C) 3    
[C+0v4H0D2](#N)(c) 2    
[C+0v4H0D3](=C)(Br)(C) 2    
[c+0v4H0D3](:c)(:n)(=N) 2    
[c+0v4H0D3](:c)(:o)(Br) 2    
[C+0v4H0D3](=N)(C)(N) 2    
[C+0v4H0D3](=O)(N)(S) 2    
[C+0v4H0D4](Br)(Br)(Br)(C) 2    
[C+0v4H0D4](Br)(C)(C)(C) 2    
[C+0v4H0D4](C)(C)(c)(c) 2    
[C+0v4H0D4](C)(C)(C)(S) 2    
[C+0v4H0D4](C)(C)(N)(c) 2    
[c+0v4H1D2](:n)(:s) 2    
[C+0v4H1D3](Br)(Br)(C) 2    
[C+0v4H1D3](C)(C)([Se]) 2    
[C+0v4H1D3](C)(N)(N) 2    
[C+0v4H1D3](C)(N)(S) 2    
[C+0v4H1D3](C)(O)(n) 2    
[C+0v4H2D2](Br)(C) 2    
[C+0v4H2D2](C)([Hg]) 2    
[C+0v4H2D2](c)(n) 2    
[C+0v4H2D2](S)(S) 2    
[C+0v4H3D1]([Si]) 2    
[N+0v3H0D2](=C)(c) 2    
[N+0v3H0D2](=C)(S) 2    
[n+0v3H0D2](:n)(:s) 2    
[N+0v3H0D2](=O)(N) 2    
[N+0v3H0D3](C)(c)(c) 2    
[N+0v3H0D3](C)(N)(c) 2    
[N+0v3H1D2](C)(Cl) 2    
[n+0v5H0D3](:c)(:c)(=O) 2    
[O+0v2H0D1](=I) 2    
[O+0v2H0D2](c)(c) 2    
[O+0v2H1D1]([Si]) 2    
[P+0v5H0D4](=O)(C)(C)(O) 2    
[s+0v2H0D2](:c)(:n) 2    
[S+0v2H1D1](c) 2    
[S+0v4H0D3](=O)(C)(S) 2    
[S+0v6H0D4](=O)(=O)(C)(c) 2    
[S+0v6H0D4](=O)(=O)(C)(N) 2    
[S+0v6H0D4](=O)(=O)(C)(O) 2    
[Si+0v4H0D4](C)(C)(O)(O) 2    
[C+0v4H0D2](#C)(c) 1    
[C+0v4H0D2](=C)(=C) 1    
[C+0v4H0D2](#C)(C) 1    
[C+0v4H0D2](#N)([Hg]) 1    
[C+0v4H0D2](#N)(I) 1    
[C+0v4H0D2](#N)(N) 1    
[C+0v4H0D3](=C)(Br)(Br) 1    
[C+0v4H0D3](=C)(c)(c) 1    
[C+0v4H0D3](=C)(C)(c) 1    
[C+0v4H0D3](=C)(Cl)(Cl) 1    
[C+0v4H0D3](=C)(Cl)(S) 1    
[c+0v4H0D3](-c)(:c)(:o) 1    
[C+0v4H0D3](=C)(N)(c) 1    
[c+0v4H0D3](:c)(:n)(=S) 1    
[c+0v4H0D3](:c)(:o)(Cl) 1    
[c+0v4H0D3](:c)(:s)(Cl) 1    
[C+0v4H0D3](=N)(C)(c) 1    
[C+0v4H0D3](=N)(C)(O) 1    
[c+0v4H0D3](:n)(:n)(:n) 1    
[c+0v4H0D3](:n)(:n)(S) 1    
[C+0v4H0D3](=N)(N)(S) 1    
[c+0v4H0D3](:n)(:o)(C) 1    
[c+0v4H0D3](:n)(:o)(S) 1    
[c+0v4H0D3](:n)(:s)(=N) 1    
[c+0v4H0D3](:n)(:s)(=S) 1    
[C+0v4H0D3](=O)(C)(S) 1    
[C+0v4H0D3](=O)(O)(O) 1    
[C+0v4H0D3](=S)(C)(N) 1    
[C+0v4H0D3](=S)(N)(O) 1    
[C+0v4H0D3](=S)(O)(S) 1    
[C+0v4H0D3](=S)(S)(S) 1    
[C+0v4H0D4](Br)(N)(N)(N) 1    
[C+0v4H0D4](C)(c)(c)(c) 1    
[C+0v4H0D4](C)(C)(S)(S) 1    
[C+0v4H0D4](Cl)(Cl)(Cl)(P) 1    
[C+0v4H0D4](C)(O)(c)(c) 1    
[C+0v4H1D1](#C) 1    
[C+0v4H1D2](=C)(Br) 1    
[C+0v4H1D2](=C)(Cl) 1    
[C+0v4H1D2](=C)(S) 1    
[C+0v4H1D2](=N)(N) 1    
[C+0v4H1D2](=O)(O) 1    
[C+0v4H1D3](Br)(C)(S) 1    
[C+0v4H1D3](Br)(F)(F) 1    
[C+0v4H1D3](C)(Cl)(O) 1    
[C+0v4H1D3](C)(Cl)(S) 1    
[C+0v4H1D3](C)(F)(F) 1    
[C+0v4H1D3](C)(S)(c) 1    
[C+0v4H1D3](C)(S)(S) 1    
[C+0v4H1D3](N)(N)(c) 1    
[C+0v4H1D3](O)(c)(c) 1    
[C+0v4H1D3](S)(S)(S) 1    
[C+0v4H2D2](Cl)(n) 1    
[C+0v4H2D2](Cl)(N) 1    
[C+0v4H3D1](P) 1    
[C+0v4H3D1]([Se]) 1    
[Cl+0v1H0D1](S) 1    
[Hg+0v2H0D2](C)(C) 1    
[Hg+0v2H0D2](C)(S) 1    
[I+0v1H0D1](C) 1    
[I+0v3H0D2](=O)(c) 1    
[I+0v5H0D3](=O)(=O)(c) 1    
[N+0v3H0D2](=c)(N) 1    
[N+0v3H0D2](=c)(O) 1    
[N+0v3H0D2](=c)(S) 1    
[N+0v3H0D2](=N)(C) 1    
[N+0v3H0D2](=N)(N) 1    
[n+0v3H0D2](:[se])(:c) 1    
[n+0v3H0D3](:c)(:c)(:c) 1    
[N+0v3H0D3](C)(C)(S) 1    
[N+0v3H0D3](Cl)(Cl)(S) 1    
[n+0v3H0D3](-c)(:n)(:n) 1    
[N+0v3H0D3](C)(N)(O) 1    
[N+0v3H0D3](C)(O)(c) 1    
[N+0v3H1D2](C)(O) 1    
[n+0v3H1D2](:n)(:n) 1    
[N+0v3H1D2](N)(P) 1    
[N+0v3H1D2](O)(c) 1    
[N+0v3H1D2](S)(S) 1    
[N+0v3H2D1](P) 1    
[n+0v5H0D3](:c)(:o)(=O) 1    
[N+0v5H0D4](=O)(C)(C)(C) 1    
[o+0v2H0D2](:n)(:n) 1    
[O+0v2H0D2](S)(c) 1    
[O+0v2H0D2]([Si])([Si]) 1    
[O+0v2H1D1](O) 1    
[P+0v5H0D4](=O)(N)(N)(c) 1    
[P+0v5H0D4](=O)(N)(N)(N) 1    
[P+0v5H0D4](=O)(N)(N)(O) 1    
[P+0v5H0D4](=O)(N)(O)(O) 1    
[P+0v5H0D4](=S)(N)(N)(c) 1    
[P+0v5H0D4](=S)(N)(N)(N) 1    
[P+0v5H1D3](=O)(O)(c) 1    
[P+0v5H1D3](=O)(O)(O) 1    
[S+0v2H0D2](C)([Hg]) 1    
[S+0v2H0D2](C)(P) 1    
[S+0v2H0D2](P)(S) 1    
[S+0v2H0D2](S)(S) 1    
[S+0v2H1D1](P) 1    
[S+0v4H0D3](=O)(O)(O) 1    
[S+0v6H0D4](=O)(=O)(C)(Cl) 1    
[S+0v6H0D4](=O)(=O)(C)(S) 1    
[S+0v6H0D4](=O)(=O)(N)(O) 1    
[S+0v6H0D4](=O)(=O)(O)(O) 1    
[se+0v2H0D2](:n)(:n) 1    
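In a linear model of this kind, each coefficient is the change in the predicted value per occurrence of that atom type, so a compound's prediction is the fitted intercept plus the sum of count × coefficient. The sketch below uses four coefficients copied from the table. The intercept is not reported in this section, so only the atom-type contribution is computed, and types printed without a coefficient raise an error instead of being silently treated as zero. `contribution` is a hypothetical helper.

```python
# A few coefficients copied from the solubility_atom_types table above.
COEFFICIENTS = {
    "[C+0v4H3D1](C)": -0.066,
    "[C+0v4H2D2](C)(C)": -0.201,
    "[C+0v4H2D2](C)(O)": -0.094,
    "[O+0v2H1D1](C)": 0.236,
}


def contribution(type_counts, coefficients=COEFFICIENTS):
    """Return sum(count * coefficient); the fitted intercept is NOT included.

    type_counts maps atom-type SMARTS to the number of matches in one compound.
    Hypothetical helper for illustration.
    """
    missing = [t for t in type_counts if t not in coefficients]
    if missing:
        # Rows printed without a coefficient cannot be used.
        raise KeyError(f"no coefficient listed for {missing}")
    return sum(n * coefficients[t] for t, n in type_counts.items())


# Ethanol (CCO): one of each of the three types below
ethanol = {"[C+0v4H3D1](C)": 1, "[C+0v4H2D2](C)(O)": 1, "[O+0v2H1D1](C)": 1}
print(round(contribution(ethanol), 3))  # 0.076
```

Going from ethanol to 1-propanol adds one [C+0v4H2D2](C)(C), which changes the contribution by -0.201.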
 

Coefficient computation

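  • EPA test set of 1200 compounds from Beilstein, measured at various temperatures (some unreported)
  • gNova CHORD count_match(smiles, smarts) function, used to count the matches of each atom-type SMARTS in each compound's SMILES
  • R statistical program's linear models function (lm), used to correlate the counts with experimental water solubility (log S)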

Residual standard error: 0.9012 on 1003 degrees of freedom
Multiple R-Squared: 0.6326, Adjusted R-squared: 0.5608
F-statistic: 8.811 on 196 and 1003 DF, p-value: < 2.2e-16
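The reported statistics are mutually consistent with 1200 compounds and 196 atom-type regressors plus an intercept. Recomputing the residual degrees of freedom, adjusted R-squared and F-statistic from the printed Multiple R-squared reproduces the printed values:

```python
n, p = 1200, 196          # compounds and atom-type regressors, from the text above
r2 = 0.6326               # reported Multiple R-squared

df = n - p - 1                               # residual degrees of freedom
adj_r2 = 1 - (1 - r2) * (n - 1) / df         # adjusted R-squared
f_stat = (r2 / p) / ((1 - r2) / df)          # overall F-statistic

print(df)                 # 1003
print(round(adj_r2, 4))   # 0.5608
print(round(f_stat, 3))   # 8.811
```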