2-D searching Tutorial

By Scott Lusher and G. Schaftenaar



2D searching is used to find compounds in a database that are similar in molecular features to one or more known active molecules. Each compound is assigned a 2D fingerprint. A fingerprint is a set of bits, where each bit indicates the absence or presence of a molecular feature. To determine how similar two compounds are based on their fingerprints, the Tanimoto coefficient is often used: the number of bits set in both fingerprints divided by the number of bits set in at least one of them. Below you will find an example of two compounds and their fingerprints and the calculation of the Tanimoto coefficient:
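
For readers who want to check their hand calculations, the following is a minimal Python sketch of the Tanimoto coefficient, T = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The two 10-bit fingerprints are made up for illustration and are not the ones from the tutorial's figures; the function name `tanimoto` and the choice to return 0.0 for two empty fingerprints are assumptions.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient of two fingerprints given as strings of '0' and '1'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")  # bits set in the first fingerprint
    b = fp_b.count("1")  # bits set in the second fingerprint
    # bits set in both fingerprints
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == "1" and y == "1")
    if a + b - c == 0:
        return 0.0  # assumed convention when no bit is set in either fingerprint
    return c / (a + b - c)


# Hypothetical fingerprints, for illustration only.
fp_1 = "1101001010"  # 5 bits set
fp_2 = "1001011010"  # 5 bits set, 4 of them shared with fp_1

print(f"Tanimoto coefficient: {tanimoto(fp_1, fp_2):.3f}")  # 4 / (5 + 5 - 4)
```

A value of 1 means the fingerprints are identical and a value of 0 means they share no features.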



Below you will find two new compounds. Calculate the Tanimoto coefficient for this pair of molecules.



Below you will find two known active compounds for the estrogen receptor: raloxifene (1) and tamoxifen (2). Calculate the Tanimoto coefficient for this pair of molecules. Are these two compounds very similar?




Set up the working environment

If necessary, check the X-windows start-up page for detailed instructions on how to set up the X-windows environment and to access the CMBI's main Unix machine, cheminf.cmbi.kun.nl. Then, from the Unix shell (command prompt):




How many compounds in the database of 500 compounds (485 randomly selected, 15 SERMs (Selective Estrogen Receptor Modulators)) are similar to tamoxifen, a known active for the ERα receptor?

Read in tamoxifen:

File >> Read >> tamoxifen.mol2

Read in the database of 500 compounds:

File >> Molecular Spreadsheet >> Open >> 500.tbl

Be sure that the Format is SYBYL Table.

Now let's do the 2D similarity search:

Unity >> Unity Search

Unity 2D Search window

In the SYBYL textport you will find the number of compounds in the database that were found to be similar to tamoxifen; this should be 4 compounds.
Now have a look at those similar compounds:

Unity >> Load Results >> From a Search >> take the last entry.

A new spreadsheet called UNITY_SIMSEARCH_2D will pop up.
Have a look at the compounds in the spreadsheet:

Results Unity 2D Search window

In the spreadsheet: select a row and then click Show RowSel.
You will find these compounds are all very similar.
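
Conceptually, the similarity search carried out above can be pictured as a threshold filter over the fingerprints in the database. The Python sketch below is not the Unity implementation. It assumes that the Minimum Similarity setting is a Tanimoto coefficient expressed as a percentage, which the tutorial does not state explicitly. The compound names, the fingerprints (given as sets of "on" bit positions) and the function names are all hypothetical.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by bits 'on' in either set."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def similarity_search(query: set, database: dict, min_similarity: float) -> list:
    """Return (name, score) pairs scoring at least min_similarity (0-100), best first."""
    hits = []
    for name, fingerprint in database.items():
        score = 100.0 * tanimoto(query, fingerprint)  # percentage scale (assumed)
        if score >= min_similarity:
            hits.append((name, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical query and database, for illustration only.
query = {1, 4, 7, 9, 12}
database = {
    "cmpd_A": {1, 4, 7, 9, 12, 15},  # 5 shared bits, 6 in the union
    "cmpd_B": {1, 4, 9, 20},         # 3 shared bits, 6 in the union
    "cmpd_C": {2, 3, 5},             # no shared bits
}

for cutoff in (45, 55):
    names = [name for name, _ in similarity_search(query, database, cutoff)]
    print(f"Minimum Similarity {cutoff}: {names}")
```

Lowering the cutoff can only add hits, so a lower Minimum Similarity trades more retrieved compounds against a higher risk of false positives.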




Now repeat the same procedure but change the Minimum Similarity to 45. You will now find 16 compounds matched using these similarity criteria.
Have a look at these compounds too (see above). How many of these hits are known drugs and how many are likely to be false positives? (The answers are 10 and 6, respectively.)

If you repeat the same procedure with Minimum Similarity set to 55, you will find two thirds of the known drugs in the database (10 of the 15) and no false positives. In this case the optimal similarity cutoff is apparently 55.
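
The comparison of the two cutoffs can be made explicit by computing the fraction of hits that are known drugs (precision) and the fraction of the 15 known drugs that are retrieved (recall). The Python sketch below uses only the counts reported in this tutorial; the terms precision and recall and the helper name `evaluate` are introduced here for illustration.

```python
TOTAL_ACTIVES = 15  # known SERMs in the 500-compound database

# (Minimum Similarity, total hits, hits that are known drugs), as reported above
results = [
    (45, 16, 10),
    (55, 10, 10),
]


def evaluate(total_hits: int, true_hits: int, total_actives: int = TOTAL_ACTIVES):
    """Return (false positives, precision, recall) for one search result."""
    false_positives = total_hits - true_hits
    precision = true_hits / total_hits if total_hits else 0.0
    recall = true_hits / total_actives
    return false_positives, precision, recall


for cutoff, total_hits, true_hits in results:
    false_positives, precision, recall = evaluate(total_hits, true_hits)
    print(
        f"Minimum Similarity {cutoff}: {false_positives} false positives, "
        f"precision {precision:.3f}, recall {recall:.3f}"
    )
```

Both cutoffs retrieve the same 10 known drugs, so raising the cutoff from 45 to 55 removes the false positives without losing any true hits.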