glogP complex atom types + xlogP training set

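Derive complex atom types from each atom's element, aromaticity, formal charge, valence, hydrogen count and degree, together with the element and bond order of every neighbor. Yields 305 types. 72 types occur in only one structure in the training set, 185 in 10 or fewer structures. Use the remaining 120 types, each found in at least 11 structures; rows in the table without a coefficient were not fitted.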
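The type strings in the table appear to follow one pattern: a bracketed atom primitive (symbol, charge, valence `v`, hydrogen count `H`, degree `D`) followed by one parenthesized branch per neighbor. A minimal Python sketch of that pattern is given here. The function name, the argument layout, the ASCII sort of neighbors and the rule for writing `-` only between two aromatic atoms are assumptions inferred from the table rows, not a description of the original tool.

```python
# Hypothetical helper: builds a type string shaped like the rows in atom_types.
# Bond symbols follow SMARTS; a single bond is left implicit unless both atoms
# are aromatic, where an implicit bond would also match an aromatic bond.
BOND_SYMBOL = {"double": "=", "triple": "#", "aromatic": ":"}


def complex_atom_type(symbol, charge, valence, hcount, neighbors):
    """neighbors: list of (bond_kind, neighbor_symbol) for heavy-atom neighbors."""
    atom_is_aromatic = symbol[0].islower()
    branches = []
    for bond_kind, nbr in neighbors:
        if bond_kind == "single":
            explicit = atom_is_aromatic and nbr[0].islower()
            bond = "-" if explicit else ""
        else:
            bond = BOND_SYMBOL[bond_kind]
        branches.append(bond + nbr)
    # Assumption: neighbors are ordered by plain ASCII sort of "bond + symbol",
    # which is consistent with the ordering seen in the table.
    branches.sort()
    head = f"[{symbol}{charge:+d}v{valence}H{hcount}D{len(neighbors)}]"
    return head + "".join(f"({b})" for b in branches)


# Aromatic CH flanked by two aromatic carbons
print(complex_atom_type("c", 0, 4, 1, [("aromatic", "c"), ("aromatic", "c")]))
# Carboxyl/ester carbon: C(=O)(C)(O)
print(complex_atom_type("C", 0, 4, 0,
                        [("single", "O"), ("double", "O"), ("single", "C")]))
```

The two calls reproduce the strings used for the first and the thirteenth table rows, which is only a check on the formatting, not on how the neighbors were originally perceived.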

atom_types
smarts train_freq coefficient error
[c+0v4H1D2](:c)(:c) 1385 0.324 0.010
[O+0v2H0D1](=C) 787 -0.338 0.110
[c+0v4H0D3](:c)(:c)(C) 734 0.136 0.045
[C+0v4H3D1](C) 556 0.554 0.023
[c+0v4H0D3](:c)(:c)(O) 509 -0.104 0.081
[c+0v4H0D3](:c)(:c)(N) 456 -0.003 0.064
[O+0v2H1D1](C) 411 -0.478 0.075
[C+0v4H2D2](C)(C) 378 0.445 0.012
[O+0v2H0D2](C)(c) 341 0.210 0.107
[C+0v4H2D2](C)(O) 308 0.014 0.074
[n+0v3H0D2](:c)(:c) 308 -0.226 0.120
[c+0v4H1D2](:c)(:n) 254 -0.097 0.065
[C+0v4H0D3](=O)(C)(O) 238 -0.075 0.128
[O+0v2H1D1](c) 200 -0.030 0.087
[O+0v2H0D2](C)(C) 199 -0.155 0.140
[C+0v4H2D2](C)(N) 185 -0.212 0.050
[Cl+0v1H0D1](c) 184 0.578 0.372
[C+0v4H3D1](c) 181 0.571 0.054
[N+0v3H1D2](C)(C) 181 -0.230 0.099
[c+0v4H0D3](:c)(:c)(Cl) 176 0.364 0.376
[N+0v3H2D1](c) 176 -0.610 0.075
[C+0v4H2D2](C)(c) 175 0.362 0.054
[C+0v4H3D1](N) 175 -0.054 0.057
[O+0v2H0D1](=N) 175 0.144 0.173
[c+0v4H0D3](:c)(:c)(:n) 168 0.128 0.069
[C+0v4H3D1](O) 156 0.161 0.081
[N+0v3H2D1](C) 154 -0.782 0.064
[C+0v4H0D3](=O)(C)(N) 145 -0.318 0.125
[c+0v4H0D3](:c)(:c)(:c) 139 0.288 0.023
[C+0v4H1D3](C)(C)(O) 117 -0.066 0.083
[N+0v5H0D3](=O)(=O)(c) 114 -0.072 0.342
[C+0v4H0D3](=O)(N)(O) 111 0.132 0.151
[C+0v4H0D3](=O)(O)(c) 111 0.862 0.131
[c+0v4H0D3](:c)(:c)(S) 110 -0.323 0.125
[N+0v3H1D2](C)(c) 108 -0.274 0.087
[n+0v3H1D2](:c)(:c) 95 0.025 0.129
[F+0v1H0D1](C) 89 0.377 0.044
[O+0v2H0D1](=S) 86 0.028 0.094
[C+0v4H1D2](=C)(C) 85 0.339 0.040
[N+0v3H0D3](C)(C)(C) 81 0.020 0.144
[N+0v3H0D1](#C) 79 -0.838 0.300
[O+0v2H0D1](=c) 78 -0.846 0.177
[Br+0v1H0D1](c) 65 0.721 0.141
[C+0v4H1D3](C)(C)(C) 64 0.011 0.046
[S+0v6H0D4](=O)(=O)(N)(c) 63 -0.178 0.193
[C+0v4H1D3](C)(C)(N) 61 -0.567 0.077
[n+0v3H0D3](:c)(:c)(C) 61 0.121 0.199
[c+0v4H0D3](:c)(:c)(Br) 60 0.417 0.147
[C+0v4H0D3](=O)(N)(c) 60 0.036 0.133
[c+0v4H0D3](:c)(:n)(=O) 58 0.048 0.210
[C+0v4H0D3](=O)(N)(N) 58 0.393 0.163
[c+0v4H0D3](:c)(:n)(C) 57 -0.372 0.091
[Cl+0v1H0D1](C) 55 0.596 0.037
[c+0v4H0D3](:c)(:n)(N) 53 0.389 0.118
[C+0v4H0D3](=O)(C)(c) 52 -0.067 0.118
[C+0v4H0D4](F)(F)(F)(c) 51 -0.005 0.153
[F+0v1H0D1](c) 49 -0.096 0.225
[c+0v4H1D2](:n)(:n) 47 -0.112 0.149
[C+0v4H0D2](#N)(C) 46 0.416 0.324
[C+0v4H0D3](=O)(C)(C) 46 -0.695 0.130
[c+0v4H0D3](:c)(:c)(F) 43 0.556 0.229
[C+0v4H1D2](=C)(c) 41 0.548 0.091
[C+0v4H1D3](C)(C)(c) 41 0.337 0.089
[C+0v4H2D1](=C) 41 0.437 0.069
[C+0v4H1D3](C)(O)(n) 40 0.085 0.204
[c+0v4H0D3](-c)(:c)(:c) 39 0.275 0.042
[C+0v4H0D3](=N)(C)(C) 39 0.662 0.371
[c+0v4H0D3](:n)(:n)(=O) 39 -0.007 0.249
[c+0v4H0D3](:c)(:n)(:n) 38 0.302 0.211
[C+0v4H0D4](C)(C)(C)(C) 38 0.093 0.093
[N+0v3H1D2](N)(c) 38 -0.238 0.244
[n+0v3H0D2](:c)(:n) 37 -0.235 0.105
[N+0v3H0D2](=C)(N) 37 1.613 0.470
[I+0v1H0D1](c) 36 0.407 0.360
[N+0v3H0D3](C)(C)(c) 35 0.359 0.132
[S+0v2H0D1](=C) 35 -0.183 0.159
[c+0v4H0D3](:c)(:c)(I) 34 1.038 0.369
[N+0v3H0D2](=O)(N) 33 -0.279 0.379
[N+0v3H2D1](S) 33 -0.825 0.205
[N+0v3H0D2](=C)(c) 32 -0.152 0.212
[C+0v4H0D2](#N)(c) 31 0.685 0.310
[C+0v4H3D1](S) 31 -0.444 0.117
[N+0v3H0D3](C)(C)(N) 31 0.033 0.359
[c+0v4H0D3](:n)(:n)(C) 29 -0.443 0.169
[c+0v4H0D3](:n)(:n)(N) 28 0.491 0.161
[C+0v4H2D2](O)(c) 28 -0.144 0.122
[S+0v2H0D2](C)(c) 28 1.462 0.176
[N+0v5H0D3](=O)(=O)(C) 27 -0.536 0.372
[O+0v2H0D2](C)(P) 26 0.092 0.116
[C+0v4H0D3](=C)(C)(C) 24 -0.040 0.105
[C+0v4H2D2](C)(S) 24 -0.408 0.106
[C+0v4H0D2](=N)(=S) 23 1.768 0.256
[Br+0v1H0D1](C) 22 0.976 0.108
[O+0v2H0D1](=P) 20 -1.184 0.165
[N+0v3H1D2](C)(S) 19 -0.348 0.255
[N+0v3H1D2](S)(c) 19 -0.616 0.198
[C+0v4H0D4](C)(C)(C)(c) 18 0.043 0.126
[C+0v4H1D2](=C)(N) 18 -0.062 0.181
[C+0v4H2D2](C)(Cl) 18 0.153 0.090
[S+0v2H0D2](C)(C) 17 1.487 0.182
[C+0v4H0D4](C)(F)(F)(F) 16 -0.204 0.170
[C+0v4H1D3](C)(O)(O) 16 -0.194 0.244
[C+0v4H3D1](n) 16 0.070 0.175
[N+0v3H0D2](=C)(C) 16 -0.184 0.159
[C+0v4H0D4](C)(C)(C)(O) 14 -0.509 0.144
[C+0v4H2D2](N)(c) 13 -0.017 0.145
[n+0v3H0D2](:n)(:n) 13 -0.482 0.132
[N+0v3H0D3](C)(c)(c) 13 0.712 0.226
[C+0v4H1D3](C)(C)(Cl) 12 0.034 0.048
[C+0v4H1D3](C)(O)(c) 12 0.114 0.169
[C+0v4H2D2](Br)(C) 12 -0.052 0.175
[s+0v2H0D2](:c)(:c) 12 0.925 0.150
[S+0v2H0D2](c)(c) 12 1.544 0.325
[c+0v4H0D3](:c)(:n)(Cl) 11 -0.082 0.374
[C+0v4H0D3](=N)(C)(N) 11 -0.382 0.242
[C+0v4H1D2](=O)(c) 11 0.169 0.180
[C+0v4H1D2](=O)(N) 11 0.006 0.190
[n+0v3H1D2](:c)(:n) 11 0.694 0.171
[O+0v2H0D2](P)(c) 11 0.549 0.229
[S+0v2H0D2](C)(P) 11 1.208 0.199
[C+0v4H1D2](=O)(C) 10    
[n+0v3H0D3](:c)(:c)(N) 10    
[N+0v3H2D1](n) 10    
[O+0v2H1D1](N) 10    
[c+0v4H0D3](:c)(:n)(O) 9    
[C+0v4H2D2](c)(c) 9    
[C+0v4H2D2](C)(n) 9    
[n+0v3H0D3](:c)(:n)(C) 9    
[N+0v3H1D2](c)(c) 9    
[o+0v2H0D2](:c)(:c) 9    
[S+0v2H0D1](=P) 9    
[S+0v6H0D4](=O)(=O)(C)(c) 9    
[c+0v4H0D3](:n)(:n)(S) 8    
[C+0v4H0D4](F)(F)(F)(S) 8    
[N+0v3H1D2](C)(N) 8    
[c+0v4H0D3](-c)(:c)(:n) 7    
[c+0v4H0D3](-n)(:c)(:c) 7    
[C+0v4H0D3](=N)(c)(c) 7    
[C+0v4H0D4](C)(Cl)(Cl)(Cl) 7    
[c+0v4H1D2](:c)(:s) 7    
[O+0v2H0D2](c)(c) 7    
[c+0v4H0D3](:c)(:c)(:o) 6    
[C+0v4H0D3](=O)(c)(c) 6    
[C+0v4H0D3](=S)(N)(N) 6    
[C+0v4H0D4](C)(C)(C)(N) 6    
[N+0v3H2D1](N) 6    
[P+0v5H0D4](=O)(O)(O)(O) 6    
[C+0v4H0D3](=C)(C)(N) 5    
[c+0v4H0D3](:c)(:c)(=O) 5    
[c+0v4H0D3](:c)(:n)(Br) 5    
[C+0v4H0D4](C)(C)(C)(S) 5    
[C+0v4H1D3](C)(N)(S) 5    
[C+0v4H2D2](O)(O) 5    
[n+0v3H0D3](-c)(:c)(:n) 5    
[P+0v5H0D4](=S)(O)(O)(O) 5    
[C+0v4H0D2](#C)(C) 4    
[c+0v4H0D3](:c)(:n)(F) 4    
[C+0v4H0D3](=S)(C)(N) 4    
[C+0v4H1D1](#C) 4    
[c+0v4H1D2](:c)(:o) 4    
[C+0v4H1D3](C)(c)(c) 4    
[C+0v4H1D3](C)(N)(c) 4    
[C+0v4H1D3](O)(c)(c) 4    
[C+0v4H2D2](C)(F) 4    
[I+0v1H0D1](C) 4    
[N+0v3H0D2](=C)(O) 4    
[N+0v3H0D2](=O)(c) 4    
[N+0v3H0D3](C)(C)(S) 4    
[N+0v3H1D2](C)(O) 4    
[N+0v3H1D2](C)(P) 4    
[P+0v5H0D4](=S)(O)(O)(S) 4    
[C+0v4H0D2](#N)(S) 3    
[C+0v4H0D3](=C)(c)(c) 3    
[C+0v4H0D3](=C)(C)(c) 3    
[C+0v4H0D3](=C)(Cl)(Cl) 3    
[C+0v4H0D3](=C)(C)(O) 3    
[c+0v4H0D3](:c)(:c)(:s) 3    
[c+0v4H0D3](:n)(:s)(N) 3    
[C+0v4H1D2](=N)(c) 3    
[C+0v4H1D2](=N)(N) 3    
[C+0v4H1D2](=O)(O) 3    
[C+0v4H1D3](Br)(C)(C) 3    
[C+0v4H1D3](C)(Cl)(Cl) 3    
[C+0v4H1D3](C)(C)(n) 3    
[C+0v4H1D3](C)(N)(O) 3    
[C+0v4H2D2](C)(I) 3    
[C+0v4H3D1](P) 3    
[n+0v3H0D2](:c)(:o) 3    
[N+0v3H0D3](C)(C)(P) 3    
[o+0v2H0D2](:c)(:n) 3    
[P+0v5H0D4](=O)(N)(O)(S) 3    
[S+0v2H1D1](C) 3    
[S+0v6H0D4](=O)(=O)(C)(N) 3    
[C+0v4H0D2](#C)(c) 2    
[C+0v4H0D3](=C)(C)(Cl) 2    
[c+0v4H0D3](-c)(:c)(:o) 2    
[c+0v4H0D3](-c)(:c)(:s) 2    
[C+0v4H0D3](=C)(C)(S) 2    
[c+0v4H0D3](:c)(:n)(I) 2    
[c+0v4H0D3](-c)(:n)(:n) 2    
[c+0v4H0D3](:c)(:n)(S) 2    
[c+0v4H0D3](:c)(:o)(N) 2    
[c+0v4H0D3](:c)(:o)(=O) 2    
[C+0v4H0D3](=N)(C)(c) 2    
[c+0v4H0D3](:n)(:n)(Cl) 2    
[c+0v4H0D3](:n)(:n)(F) 2    
[c+0v4H0D3](:n)(:n)(O) 2    
[c+0v4H0D3](:n)(:n)(=S) 2    
[C+0v4H0D4](C)(C)(c)(c) 2    
[C+0v4H0D4](C)(C)(F)(F) 2    
[C+0v4H0D4](C)(C)(O)(c) 2    
[C+0v4H0D4](C)(F)(F)(O) 2    
[C+0v4H0D4](Cl)(Cl)(Cl)(c) 2    
[C+0v4H0D4](F)(F)(F)(O) 2    
[C+0v4H1D2](=C)(O) 2    
[c+0v4H1D2](:n)(:s) 2    
[C+0v4H1D3](C)(F)(F) 2    
[C+0v4H1D3](C)(N)(N) 2    
[C+0v4H2D2](N)(N) 2    
[F+0v1H0D1](S) 2    
[N+0v3H0D2](=N)(c) 2    
[N+0v3H0D2](=N)(C) 2    
[N+0v3H0D3](C)(N)(c) 2    
[P+0v5H0D4](=O)(C)(O)(O) 2    
[P+0v5H0D4](=O)(C)(O)(S) 2    
[P+0v5H0D4](=O)(N)(N)(N) 2    
[P+0v5H0D4](=O)(N)(O)(O) 2    
[S+0v2H0D1](=c) 2    
[S+0v2H1D1](c) 2    
[S+0v4H0D3](=O)(c)(c) 2    
[S+0v4H0D3](=O)(C)(c) 2    
[S+0v6H0D4](=O)(=O)(c)(c) 2    
[S+0v6H0D4](=O)(=O)(F)(c) 2    
[C+0v4H0D3](=C)(F)(F) 1    
[c+0v4H0D3](-c)(:n)(:s) 1    
[c+0v4H0D3](:c)(:o)(C) 1    
[c+0v4H0D3](:c)(:s)(C) 1    
[C+0v4H0D3](=N)(Br)(N) 1    
[c+0v4H0D3](:n)(:n)(Br) 1    
[C+0v4H0D3](=N)(N)(N) 1    
[C+0v4H0D3](=N)(O)(c) 1    
[c+0v4H0D3](:n)(:o)(N) 1    
[c+0v4H0D3](:n)(:s)(=O) 1    
[c+0v4H0D3](:n)(:s)(O) 1    
[c+0v4H0D3](:n)(:s)(S) 1    
[C+0v4H0D3](=O)(C)(S) 1    
[C+0v4H0D3](=S)(C)(O) 1    
[C+0v4H0D3](=S)(N)(c) 1    
[C+0v4H0D4](Br)(Br)(Br)(C) 1    
[C+0v4H0D4](Br)(C)(C)(C) 1    
[C+0v4H0D4](C)(Cl)(F)(F) 1    
[C+0v4H0D4](C)(C)(N)(c) 1    
[C+0v4H0D4](C)(C)(N)(N) 1    
[C+0v4H0D4](C)(C)(O)(O) 1    
[C+0v4H0D4](Cl)(Cl)(Cl)(Cl) 1    
[C+0v4H0D4](Cl)(Cl)(Cl)(S) 1    
[C+0v4H0D4](C)(N)(c)(c) 1    
[C+0v4H1D2](=C)(Br) 1    
[C+0v4H1D2](=C)(Cl) 1    
[C+0v4H1D2](=O)(n) 1    
[C+0v4H1D3](Br)(C)(Cl) 1    
[C+0v4H1D3](Br)(N)(N) 1    
[C+0v4H1D3](c)(c)(c) 1    
[C+0v4H1D3](C)(C)(S) 1    
[C+0v4H1D3](Cl)(Cl)(Cl) 1    
[C+0v4H1D3](C)(O)(P) 1    
[C+0v4H1D3](F)(F)(c) 1    
[C+0v4H2D2](Br)(c) 1    
[C+0v4H2D2](Cl)(c) 1    
[C+0v4H2D2](Cl)(Cl) 1    
[C+0v4H2D2](C)(P) 1    
[C+0v4H2D2](F)(F) 1    
[C+0v4H2D2](S)(c) 1    
[C+0v4H2D2](S)(n) 1    
[C+0v4H2D2](S)(S) 1    
[C+0v4H3D1](Br) 1    
[C+0v4H3D1](Cl) 1    
[C+0v4H3D1](F) 1    
[C+0v4H3D1](I) 1    
[Cl+0v1H0D1](N) 1    
[N+0v3H0D2](=C)(Cl) 1    
[n+0v3H0D2](:c)(:s) 1    
[N+0v3H0D2](=N)(N) 1    
[n+0v3H0D3](-c)(:c)(:c) 1    
[N+0v3H0D3](c)(c)(c) 1    
[n+0v3H0D3](:c)(:c)(O) 1    
[N+0v3H0D3](C)(C)(O) 1    
[n+0v3H0D3](-c)(:n)(:n) 1    
[N+0v3H0D3](N)(c)(c) 1    
[N+0v3H1D1](=C) 1    
[N+0v3H1D2](O)(c) 1    
[N+0v3H2D1](P) 1    
[N+0v5H0D3](=O)(=O)(O) 1    
[O+0v2H0D2](C)(N) 1    
[O+0v2H0D2](C)(S) 1    
[O+0v2H1D1](n) 1    
[P+0v5H0D4](=O)(C)(N)(N) 1    
[P+0v5H0D4](=O)(O)(O)(S) 1    
[P+0v5H0D4](=O)(O)(S)(S) 1    
[S+0v2H0D2](C)(N) 1    
[S+0v2H0D2](C)(S) 1    
[s+0v2H0D2](:n)(:n) 1    
[S+0v4H0D3](=O)(C)(C) 1    
[S+0v6H0D4](=O)(=O)(C)(O) 1    
[S+0v6H0D4](=O)(=O)(N)(N) 1    
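The table is whitespace-delimited, and the rarer types carry only a SMARTS string and a frequency. A small Python sketch of reading it and separating fitted from unfitted types follows. The names `AtomType` and `parse_atom_types` are hypothetical, and `SAMPLE` holds just five rows copied from the table rather than the full data.

```python
from typing import NamedTuple, Optional


class AtomType(NamedTuple):
    smarts: str
    train_freq: int
    coefficient: Optional[float]  # None when the type was not fitted
    error: Optional[float]


def parse_atom_types(text):
    """Parse 'smarts train_freq [coefficient error]' rows, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        coef = float(fields[2]) if len(fields) > 2 else None
        err = float(fields[3]) if len(fields) > 3 else None
        rows.append(AtomType(fields[0], int(fields[1]), coef, err))
    return rows


# Five rows copied from the table above, for illustration only.
SAMPLE = """\
[c+0v4H1D2](:c)(:c) 1385 0.324 0.010
[S+0v2H0D2](C)(P) 11 1.208 0.199
[C+0v4H1D2](=O)(C) 10
[o+0v2H0D2](:c)(:c) 9
[C+0v4H0D3](=C)(F)(F) 1
"""

rows = parse_atom_types(SAMPLE)
fitted = [r for r in rows if r.coefficient is not None]
rare = [r for r in rows if r.train_freq <= 10]
singletons = [r for r in rows if r.train_freq == 1]
print(len(rows), len(fitted), len(rare), len(singletons))
```

Run over the full table, the same three filters would give the counts of fitted, rare and single-structure types.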
 

Coefficient computation

  • xlogP training set of 1850 compounds
  • gNova CHORD count_match(smiles, smarts) function to count the occurrences of each atom type in each compound
  • R linear model function (lm) to regress experimental logP on the counts
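The steps listed above amount to an ordinary least-squares fit of logP against per-compound type counts. A minimal pure-Python sketch of that idea is shown here, under several assumptions: the count rows are invented stand-ins for what count_match(smiles, smarts) would return, the logP values are synthesized from three coefficients in the table so the fit can be checked, and no intercept column is included (whether the original fit used one is not stated). `fit_least_squares` is a hypothetical name; the original work used R, not this code.

```python
def fit_least_squares(rows, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("count matrix is rank deficient")
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for k in range(p):
            if k != col:
                factor = a[k][col]
                a[k] = [vk - factor * vc for vk, vc in zip(a[k], a[col])]
    return [a[i][p] for i in range(p)]


# Columns: counts of [c+0v4H1D2](:c)(:c), [C+0v4H3D1](C), [O+0v2H1D1](C).
# The count rows are invented for illustration, not taken from real compounds.
counts = [[6, 0, 0], [0, 2, 0], [0, 1, 1], [5, 1, 1], [4, 2, 0]]
table_coefficients = [0.324, 0.554, -0.478]
# Synthetic "experimental" logP built from the table coefficients
logp = [sum(c * b for c, b in zip(row, table_coefficients)) for row in counts]

fitted = fit_least_squares(counts, logp)
print([round(b, 3) for b in fitted])
```

Because the synthetic logP values contain no noise, the fit should return the three coefficients it was built from; real data would leave residuals like those summarized in the R output.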

Residual standard error: 0.4724 on 1732 degrees of freedom
Multiple R-Squared: 0.9087,     Adjusted R-squared: 0.9023 
F-statistic: 143.6 on 120 and 1732 DF,  p-value: < 2.2e-16
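The summary figures can be cross-checked against each other using only the reported R-squared and the two degrees of freedom. The sketch below uses the standard formulas for adjusted R-squared and the overall F-statistic; the function names are hypothetical. Note that 1732 residual plus 120 model degrees of freedom implies 1852 observations (1853 if an intercept was fitted), slightly more than the rounded figure of 1850 compounds.

```python
def adjusted_r2(r2, df_model, df_resid):
    # Total degrees of freedom = model + residual
    # (n - 1 with an intercept, n without one).
    return 1.0 - (1.0 - r2) * (df_model + df_resid) / df_resid


def f_statistic(r2, df_model, df_resid):
    # Ratio of explained to unexplained variance, each per degree of freedom
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


R2, DF_MODEL, DF_RESID = 0.9087, 120, 1732  # values from the R output above

adj = adjusted_r2(R2, DF_MODEL, DF_RESID)
f_value = f_statistic(R2, DF_MODEL, DF_RESID)
print(f"adjusted R-squared ~ {adj:.4f}, F ~ {f_value:.2f}")
```

Both values land within rounding distance of the reported 0.9023 and 143.6, which is what one would expect when the input R-squared is itself rounded to four digits.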