By Scott Lusher and G. Schaftenaar
1. Introduction to 2D-similarity searches
2D-searching is used to find compounds in a database that are similar
in molecular features to one or more known active molecules. Each compound is
assigned a 2D-fingerprint. A fingerprint is a set of bits, where each bit
indicates the absence or presence of a molecular feature. To determine
how similar two compounds are based on their fingerprints, the Tanimoto
coefficient is often used. Below you will find an example of two compounds
and their fingerprints and the calculation of the Tanimoto coefficient:
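
As a minimal sketch of the calculation: if a is the number of bits set in the first fingerprint, b the number set in the second, and c the number set in both, the Tanimoto coefficient is c / (a + b - c). The fingerprints below are hypothetical 8-bit examples written as sets of "on" bit positions; they are not the fingerprints of any compound in this practical, and the function name tanimoto is only illustrative.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(fp_a), set(fp_b)
    common = len(a & b)                  # c: bits set in both fingerprints
    union = len(a) + len(b) - common     # a + b - c: bits set in either one
    # Two empty fingerprints share nothing; report 0.0 rather than divide by zero.
    return common / union if union else 0.0


# Hypothetical fingerprints (assumed for illustration only).
fp_1 = {0, 2, 3, 5, 7}   # 5 bits set
fp_2 = {0, 3, 4, 5}      # 4 bits set, 3 of them shared with fp_1

print(tanimoto(fp_1, fp_2))   # 3 / (5 + 4 - 3) = 0.5
```

A value of 1.0 means the two fingerprints are identical, and 0.0 means they have no bits in common.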

Below you will find two new compounds. Calculate the Tanimoto coefficient
for this pair of molecules.

Below you will find two known active compounds for the estrogen receptor;
Raloxifene(1) and Tamoxifen(2).
Calculate the Tanimoto coefficient for this pair of molecules.
Are these two compounds very similar?

If necessary, check the X-windows
start-up page for detailed instructions on how to set up the
X-windows environment and to access the CMBI's main Unix machine,
cheminf.cmbi.kun.nl. Then, from the Unix shell (command prompt):
- Change directory to data/bioinf4/2D by typing
cd data/bioinf4/2D
- And call Sybyl by typing
sybyl
2. Find compounds in a database similar to a known active
How many compounds in the database of 500 compounds (485 randomly selected
compounds and 15 SERMs, Selective Estrogen Receptor Modulators)
are similar to tamoxifen, a known active for the ERα
receptor?
Read in tamoxifen:
File >> Read >> tamoxifen.mol2
Read in the database of 500 compounds:
File >> Molecular Spreadsheet >> Open >> 500.tbl
Be sure that the Format is SYBYL Table.
Now let's do the 2D similarity search:
Unity >> Unity Search

- Check that Search Query in Molecular Area is selected.
- Change Query Type to 2D Similarity.
- Change Query Options from Default to Specify,
next click Options.
- Under Minimum Similarity, select value is and fill
in the number 75.0.
- Select Spreadsheet by checking the checkbox next to it.
The 500 compound database will automatically be selected.
- Uncheck Search Selected Rows
- Finally click the OK button.
In the Sybyl textport you will find the number of compounds in the database
that are found to be similar to tamoxifen; this should be 4 compounds.
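
Conceptually, a search of this kind compares the query fingerprint with every fingerprint in the database and keeps the compounds that reach the minimum similarity. The sketch below illustrates that idea only; it is not how Unity is implemented. It assumes the Minimum Similarity value is a Tanimoto coefficient expressed as a percentage, and the function similarity_search, the compound names and the fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0


def similarity_search(query_fp, database, min_similarity):
    """Return (name, similarity %) for entries at or above min_similarity, best first."""
    hits = []
    for name, fp in database.items():
        score = 100.0 * tanimoto(query_fp, fp)   # percentage, as entered in the dialog
        if score >= min_similarity:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical query and three-compound database (assumed for illustration only).
query = {1, 2, 3, 4}
database = {
    "cmpd_A": {1, 2, 3, 4},      # identical to the query
    "cmpd_B": {1, 2, 3, 4, 5},   # 4 shared bits out of 5 in the union
    "cmpd_C": {1, 2, 6, 7},      # 2 shared bits out of 6 in the union
}

hits = similarity_search(query, database, 75.0)
print([name for name, _ in hits])   # ['cmpd_A', 'cmpd_B']
```

Lowering the threshold lets more compounds through, which is what the next section explores.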
Now have a look at those similar compounds:
Unity >> Load Results >> From a Search >> take the last entry.
A new spreadsheet called UNITY_SIMSEARCH_2D will pop up.
Have a look at the compounds in the spreadsheet:

In the spreadsheet: select a row and then click Show RowSel.
You will find these compounds are all very similar.
3. Optimizing the similarity cutoff
Now repeat the same procedure but change the Minimum Similarity to
45. You will now find 16 compounds matched using these similarity
criteria.
Have a look at these compounds too (see above). How many of these hits
are known drugs and how many are likely to be false positives?
(The answer is 10 and 6, respectively.)
If you repeat the same procedure with Minimum Similarity set to 55,
you will find two thirds of the known drugs in the database (10 of 15) and no false positives.
In this case the optimal similarity cutoff is apparently 55.
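
The trade-off between known actives recovered and false positives can be tabulated for several cutoffs at once. The sketch below assumes each compound has a similarity percentage to the query and a flag saying whether it is a known active. The scores and the function name evaluate_cutoff are hypothetical and are not taken from the 500-compound database.

```python
def evaluate_cutoff(scored, cutoff):
    """Count known actives and false positives among compounds at or above cutoff.

    scored: list of (similarity_percent, is_known_active) pairs.
    """
    selected = [is_active for similarity, is_active in scored if similarity >= cutoff]
    true_positives = sum(selected)                    # True counts as 1
    false_positives = len(selected) - true_positives
    return true_positives, false_positives


# Hypothetical similarity scores (assumed for illustration only).
scored = [
    (92.0, True),
    (78.0, True),
    (61.0, True),
    (52.0, False),
    (48.0, False),
    (40.0, True),
]

for cutoff in (75.0, 55.0, 45.0):
    tp, fp = evaluate_cutoff(scored, cutoff)
    print(f"cutoff {cutoff}: {tp} known actives, {fp} false positives")
```

With these made-up scores, a cutoff of 55 retrieves more actives than 75 without admitting any false positives, while 45 adds only false positives. This mirrors the reasoning used above to pick the cutoff.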