SQL Functions for Bit Strings

Function SQL example
bit_set Select bit_set(0::bit(20),5);

Update nci.structure set myfp = bit_set(myfp, 7) where matches(smiles,'[N,n]');
nbits_set Select nbits_set(vfp)::real/length(vfp) from nci.structure limit 50;
isbit_set Select count(smiles) from nci.structure where isbit_set(myfp, 7);
bit_contains Select count(smiles) from nci.structure where bit_contains(vfp, fp('c1ccccc1C(=O)NC',2048,64,25));
bit_density Select count(smiles) from nci.structure where bit_density(gfp) > 0.5;
contains Select count(smiles) from nci.structure where contains(gfp, fp('c1ccccc1C(=O)NC'));
orsum Select orsum(bit_set(0::bit(166), bit)) from fragment.public166keys
where matches('c1ccccc1C(=O)NC',smarts);
All the above functions are installed into a SCHEMA named gnova. They can be accessed as, for example gnova.bit_set. Or, you can set your search_path to include the SCHEMA gnova and access them by their unqualified names, for example bit_set.

Bit strings in PostgreSQL can be thought of as a character string of 0's or 1's, although they are actually stored efficiently as bits in a string of bytes. The external representation, as seen when a bit string is selected, is a character string of 0's and 1's. Bit #1 appears on the left. There are some built-in functions for handling bit strings, as well as &(AND) and |(OR), and ~(NOT) operators. The functions additional here are provided to help handle bit strings typically used for fingerprints.

bit_set(bit Astring, integer ibit) returns bit

This function reads a bit string Astring and returns another bit string with the ibit'th bit set. It does not modify Astring. Bit #1 is the first bit at the left of the string.

nbits_set(bit Astring) returns integer

This function returns the number of bits in Astring which are set to 1.

isbit_set(bit Astring, integer ibit) returns boolean

This function returns true if the ibit'th bit in Astring is set to 1; otherwise it returns false. Bit #1 is the first bit at the left of the string.

bit_contains(bit Astring, bit Bstring) returns boolean

This function returns true if all the bits in Bstring that are set to 1 are also set to 1 in Astring; otherwise it returns false. If the two strings are of unequal size, a folded version of the longer string is compared to the shorter. Folding is possible only if both strings' size is a power of two. If folding is not possible, an error is reported.

bit_density(bit Astring) returns real

This function returns the ratio of the number of bits in Astring that are set to 1, to the size of the string. This will always be between 0 and 1.

contains(bit Astring, bit Bstring) returns boolean

This function returns true if all the bits in Bstring that are set to 1 are also set to 1 in Astring; otherwise it returns false. The two strings must be of equal size.

orsum(bit Astring) returns bit

This is an aggregate function, like sum or count, that operates on an aggregate of values. It combines each bit string of the aggregate by ORing corresponding bits together. Each bit string of the aggregate must have the same size. It is used in the fragment.public166keys function to compute a fingerprint. Consider the following two SQL statements:

select       bit_set(0::bit(166),bit)  from fragment.public166keys where matches('c1ccccc1C(=O)NC',smarts);
select orsum(bit_set(0::bit(166),bit)) from fragment.public166keys where matches('c1ccccc1C(=O)NC',smarts);

The first statement returns14 rows, because there are 14 smarts in the table fragment.public166keys that match c1ccccc1C(=O)NC. Each row is a fingerprint part, of size 166 because of 0:bit(166), but having only one bit set, as given by the value in the bit column of fragment.public166keys. The second statement returns one row which is the OR of each of the 14 bit strings - the full fragment/key fingerprint for c1ccccc1C(=O)NC.