PROTEIN-BASED VIRTUAL SCREENING OF A 3D CHEMICAL DATABASE

Prepared first by

Dr. Didier Rognan
Adapted by Dr. G. Schaftenaar


Aim : Virtual screening of a chemical database to identify potential hits


Target : Estrogen receptor (ERa)

A number of X-ray structures of the ligand binding domain of ER-a are available (at least two containing antagonists: raloxifene, and 4-hydroxy-tamoxifen).

Database : 32 small molecular weight molecules

- 25 random molecules

(from the National Cancer Institute database)

- 6 known ERa-antagonists

(raloxifene, 4-OH-tamoxifen, ICI-164384, nafoxidene, LY-326315, EM-343)

- dihydroxy-tamoxifen

(a ligand designed in analogy to 4-OH-tamoxifen and estradiol, the natural ligand)

Method : Flexible docking of the 32 ligands using the SYBYL/FlexX interface.

For more details on FlexX, see :

Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. J. Mol. Biol. 1996, 261, 470-89.
Rarey, M.; Kramer, B.; Lengauer, T. Proteins 1999, 34, 17-28.

Briefly, FlexX is a flexible docking tool that uses an incremental construction algorithm: it first
places a base fragment in the active site and then adds the peripheral fragments one at a time,
according to the most favorable torsion angles (intramolecular energy) and protein-ligand
interactions (intermolecular energy).

Analysis :

- Docking accuracy (root-mean square deviations of the FlexX pose from the X-ray solution)

- Ranking accuracy, according to six different scoring functions (FlexX, Gold, Pmf, Dock,
Chemscore, Score)

- Possible use of consensus lists

- Hit rate in the top 5 ligands according to single or consensus scoring

Fig.1 Known ERa antagonists in the database



If you are working from a Windows PC, first read how to set up an X-Windows session with our main Unix machine cmbi6.cmbi.kun.nl.
From the UNIX shell, change directory to data/bioinf4/docking by typing

> cd data/bioinf4/docking

then type

> sybyl

The SYBYL menu appears.



File >> Read >>

A Read File window appears (Fig.2). It opens in the default directory, i.e. the directory from which
SYBYL has been started.

Fig.2 : Read File window

Select 3ert.mol2 in the upper right menu and click ok.

The X-ray structure of the ligand binding domain of the ERa receptor in complex with
4-hydroxy-tamoxifen is loaded on the screen. Please note that no hydrogen atoms are present.
Carbon atoms of the ligand are colored in green. Red crosses indicate the positions of
crystallographic water molecules.



Extracting the coordinates of the active site is a prerequisite to any virtual screening, in order to
reduce the number of docking solutions for each ligand.

Define the active site as the collection of amino acids lying within 6.5 Å of any ligand atom.

Build/Edit >> Merge >>

The Atom Expression window appears on the screen (Fig.3)

1. Move from Atoms to Monomers in the upper left menu

2. In the Substructures menu, browse the proposed list, select OHT600 and click ok

3. Click the Sets menu, activate the Sphere option, give 6.5 as radius, and confirm by clicking ok

4. Click ok


Fig.3 Atom Expression window

Choose m2 as the molecule area into which the selected atoms are to be merged, and click ok.
The active site (in red) is superimposed on the whole protein. To display the active site alone, select
the Display Area option in the left panel menu (Fig.4) and undisplay the full protein by disabling
the On/Off option (D1 row) of the Mol Display submenu. Quit this window by clicking the Q button.

Fig.4 Left panel menu

The active site (in red) with the bound ligand (in green) is now visible.
Give a name to the active site :

Build/Edit >> Modify >> Molecule >> Name >> m2 >> active_site. Click ok.

Remove the ligand and save the coordinates of the active site : Build/Edit >>Delete >>Atom
A new Atom Expression window is displayed (Fig.5)

1. Unselect m1 (by clicking the row), select m2 as the molecule area to work with.

2. Browse the Substructures menu and select OHT600. Confirm the selection by clicking ok

3. Click ok to close the window

Fig.5. Atom Expression window

The ligand has been removed. Now, we save the coordinates of the active site.

File >> Save As

A Save Molecule window is displayed (Fig. 6)

1. Unselect m1 (by clicking the row), select m2 as the molecule area to work with.

2. Choose the PDB format for saving coordinates

3. Give a name to the file to save : active_site.pdb

4. Click on save

Fig.6 Save Molecule window

Display the full protein again (recall Fig.4) and delete all molecules from the screen

Build/Edit >> Zap(Delete) Molecule >> All >> ok
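
The active-site definition used above (all amino acids with at least one atom within 6.5 Å of any ligand atom) can also be sketched outside SYBYL. The following Python fragment is only an illustration of that geometric criterion, not what SYBYL does internally. It assumes standard fixed-column PDB ATOM/HETATM records; the helper pdb_line and the coordinates in the usage example are hypothetical and serve only to build a small test input.

```python
import math

def pdb_line(record, serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a fixed-column PDB ATOM/HETATM line for testing."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")

def parse_pdb_atoms(lines):
    """Read residue name, chain, residue number and coordinates from PDB lines."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

def residues_near_ligand(atoms, ligand_name, ligand_seq, radius=6.5):
    """Return residues having at least one atom within `radius` of any ligand atom."""
    def is_ligand(atom):
        return atom["res_name"] == ligand_name and atom["res_seq"] == ligand_seq

    ligand = [a for a in atoms if is_ligand(a)]
    selected = set()
    for atom in atoms:
        if is_ligand(atom):
            continue  # the ligand itself is not part of the active site
        if any(math.dist(atom["xyz"], lig["xyz"]) <= radius for lig in ligand):
            selected.add((atom["chain"], atom["res_name"], atom["res_seq"]))
    return sorted(selected, key=lambda r: (r[0], r[2]))

# Usage with made-up coordinates (not taken from 3ert):
demo = [
    pdb_line("HETATM", 1, "C1", "OHT", "A", 600, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "CA", "GLU", "A", 353, 3.0, 0.0, 0.0),
    pdb_line("ATOM", 3, "CA", "LEU", "A", 100, 10.0, 0.0, 0.0),
]
print(residues_near_ligand(parse_pdb_atoms(demo), "OHT", 600))
```

Selecting whole residues rather than single atoms mirrors the Monomers setting chosen in the Atom Expression window.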




Using FlexX, we will dock a small database of 32 compounds (25 random molecules, 6 known
antagonists and dihydroxy-tamoxifen) into the ERa active site. For each ligand, a single low-energy
conformation has been previously generated by converting 2D into 3D coordinates using CORINA
(Gasteiger et al. Tetrahedron Comp. Method. 1990, 3, 537-547).

Tools >> Docking Suite>> Dock Ligands

The FlexX window is displayed after a few seconds (Fig.7)



Fig.7. FlexX window

Now create a receptor description file (rdf) :

Click on the Create... button, give a name for the file (e.g. er) and confirm by clicking ok. A new
« Create rdf file » window appears (Fig. 8)

1. PDB Filename: click the ... button, select 3ert.pdb in the « Files » window. Confirm the selection by clicking ok

2. Active-Site File: click the ... button, select active_site.pdb in the « Files » window. Confirm the selection by clicking ok

3. Click OK to exit the window


Fig.8 Create RDF File window

After a few seconds, we come back to the previous FlexX window (Fig.7). Please note that the name
given to the RDF file is now selected (Fig.9)

1. Ligands from: select the database file type (mol2).

2. Click the ... button, select database.mol2 in the « Files » window. Confirm the selection by clicking ok.

3. Assign formal charges

4. Activate the « FlexX Details » window and change the default Maximum Number of Poses per
Ligand from 30 to 1. This means that only the top solution will be saved for each ligand. Confirm
by clicking ok.

5. Activate the Netbatch mode

6. Step 6 would submit the job. To save time (docking 32 ligands requires ca. 1 h of CPU time), we
assume that the job has been submitted and we will directly analyze the results.

Thus, do not press ok but cancel!




Fig.9 FlexX window

Exit the FlexX window by selecting Cancel
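
Before a docking job is submitted, it can be useful to check that the ligand database really contains the expected 32 entries. The sketch below is a minimal illustration that relies only on the Tripos mol2 convention that each molecule starts with a @<TRIPOS>MOLECULE record whose next line is the molecule name. The function name and the two-molecule example text are hypothetical; the real database.mol2 is not reproduced here.

```python
def mol2_molecule_names(lines):
    """Collect the name line that follows each @<TRIPOS>MOLECULE record."""
    names = []
    expect_name = False
    for line in lines:
        if expect_name:
            names.append(line.strip())
            expect_name = False
        elif line.strip() == "@<TRIPOS>MOLECULE":
            expect_name = True
    return names

# Usage on a tiny made-up multi-mol2 fragment (atom records omitted):
example = """@<TRIPOS>MOLECULE
ligand_one
@<TRIPOS>ATOM
@<TRIPOS>MOLECULE
ligand_two
@<TRIPOS>ATOM
"""
names = mol2_molecule_names(example.splitlines())
print(len(names), names)
```

For the real file, one would pass the lines of database.mol2 instead and compare the count with 32.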



Tools >> Docking Suite>> Analyze results
The FlexX answer browser window is displayed (Fig. 10)

a) FlexX score

Click the ... button, select dbflexx as jobfile in the « Sub-Directories » window. Confirm the
selection by clicking ok. The window is updated (Fig. 11) and the list of docked ligands with their
binding energy score (FlexX score) is given.

Fig.10 FlexX answer browser window

Fig.11 Updated FlexX answer browser window

- For all but one of the 32 ligands, a docking solution has been found. Clicking the
Show Failed Ligands ... button displays the one ligand that FlexX failed to dock in the ERa active
site. Close the window.
- Clicking the Scores icon ranks the 31 docked ligands according to the FlexX score (in kJ/mol);
the icon toggles between ascending and descending order (Table 1).

Table 1. FlexX Drugscore ranking of the 31 docked ligands


Rank Ligand FlexX Score (kJ/mol)
1 NSC__147505 -81.9
2 RALOXIFENE -72.0
3 4_hydroxy_tamoxifen -66.4
4 EM_343 -66.2
5 ICI_164384 -63.3
6 dihydroxy_tamoxifen -59.9
7 LY_326315 -57.1
8 NSC__506431 -49.9
9 NAFOXIDENE -48.5
10 NSC__152522 -46.0
11 NSC__131754 -44.5
12 NSC__88579 -41.3
13 NSC__74751 -39.0
14 NSC__46215 -38.4
15 NSC__2 -38.3
16 NSC__240424 -38.1
17 NSC__102240 -36.9
18 NSC__618129 -36.9
19 NSC__658337 -34.9
20 NSC__679529 -34.5
21 NSC__208922 -34.3
22 NSC__346517 -34.1
23 NSC__176927 -32.6
24 NSC__163127 -28.7
25 NSC__60047 -28.5
26 NSC__118161 -28.1
27 NSC__636713 -25.5
28 NSC__34379 -23.9
29 NSC__276435 -23.5
30 NSC__382147 -18.4
31 NSC__703010 -16.3

Please note that 6 out of the 6 known ERa antagonists are among the top 9 positions.
Dihydroxy_tamoxifen is ranked 6th. Nafoxidene scores one place below NSC__506431. Nafoxidene has a protected hydroxyl and needs to be metabolised before it becomes active, so in this pro-drug state it is not expected to rank very high. We will see in a few minutes whether other scoring functions improve on this result.
Close the FlexX answer browser window.
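
The hit rate mentioned in the Analysis section can be worked out directly from Table 1. The short Python sketch below uses the first ten entries of the table and counts only the 6 known antagonists as hits; treating dihydroxy_tamoxifen as a non-hit is an assumption made here for the illustration, and the function names are hypothetical.

```python
# First ten ligands of Table 1, best FlexX score first.
RANKED = [
    "NSC__147505", "RALOXIFENE", "4_hydroxy_tamoxifen", "EM_343", "ICI_164384",
    "dihydroxy_tamoxifen", "LY_326315", "NSC__506431", "NAFOXIDENE", "NSC__152522",
]

# The six known ERa antagonists present in the database.
KNOWN = {
    "RALOXIFENE", "4_hydroxy_tamoxifen", "ICI_164384",
    "NAFOXIDENE", "LY_326315", "EM_343",
}

def known_ranks(ranked, known):
    """1-based positions of the known ligands in a ranked list."""
    return [i for i, name in enumerate(ranked, start=1) if name in known]

def hit_rate(ranked, known, top_n):
    """Fraction of the top_n ranked ligands that are known ligands."""
    top = ranked[:top_n]
    return sum(name in known for name in top) / len(top)

print(known_ranks(RANKED, KNOWN))  # [2, 3, 4, 5, 7, 9]
print(hit_rate(RANKED, KNOWN, 5))  # 0.8
```

With the FlexX score alone, 4 of the top 5 ligands are known antagonists, i.e. a hit rate of 80 % in the top 5.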

b) Docking accuracy

For two ligands (raloxifene, 4-OH-tamoxifen), a protein-ligand X-ray structure is available.
We can then compare the FlexX pose with the X-ray solution.

Load the protein active site:

File >> Read >> active_site.pdb >> No (Do not center the molecule on the screen)

Load the ERa-bound X-ray structure of 4-OH-tamoxifen:

File >> Read >> 4_oh_tamoxifen_xray_pdb >> No

Load the FlexX solution for 4-OH-tamoxifen:

File >> Read >>
Select dbflexx, then 4_hydroxy_tamoxifen.mdb in the upper « Sub-directories » menu and then 4-hydroxy-tamoxifen_001.mol2 in the right « Files » menu.

You can color the ligands by:

View >> Color >> Atoms.. >> Select Molecular Area >> All >> OK >> Choose a Color

Look at analogies and differences between the two poses.

Answer the following questions for the above-described compounds :

- Is the FlexX conformation similar to the protein-bound X-ray structure ?
- Is the FlexX orientation in agreement with the X-ray solution ?
- Could the FlexX poses be used for lead optimization ?
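
To put a number on the visual comparison, the docking accuracy listed in the Analysis section is usually expressed as the root-mean-square deviation (RMSD) between the docked pose and the X-ray pose. The sketch below is a minimal illustration under two assumptions: both coordinate lists contain the same atoms in the same order, and both are already in the same receptor frame, so no superposition is performed. The coordinates in the usage example are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates, in Å."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Usage with made-up coordinates: the second pose is shifted by 1 Å along x.
xray_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in xray_pose]
print(rmsd(xray_pose, docked_pose))
```

A threshold of about 2 Å is a commonly used convention for calling a pose correctly docked; it is not prescribed by this exercise.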

c) Re-scoring all hits

The 31 hits docked by FlexX will be rescored by 4 other scoring functions:

Dock (Ewing et al., J. Comput. Chem. 1997, 18, 1175-1189)
Gold (Jones et al., J. Mol. Biol. 1997, 267, 727-748)
Pmf (Muegge et al., J. Med. Chem. 1999, 42, 379-384)
Chemscore (Eldridge et al., J. Comput.-Aided Mol. Des. 1997, 11, 425-445)

Dock and Gold use a force-field energy decomposition for calculating interaction energies whereas
Chemscore and FlexX belong to the category of empirical free energy scoring
functions (energy decomposition into various scores to which a coefficient has been assigned).
Pmf uses a statistical potential of mean force.

- Rescoring with the CScore™ module of Sybyl

The CScore™ option computes the FlexX (F_score), Dock (D_score), Gold (G_score), PMF
(P_score) and Chemscore (C_score) scores from a table in which the FlexX docked conformations
have previously been saved.
Delete all molecules from the screen:

Build/Edit >> Zap (Delete) Molecule

Load the table:

File >> Molecular Spreadsheet >> New >> Database >>

Select hits.mdb in the right window where all available databases are listed.
Confirm by clicking the Open button.

A new spreadsheet entitled HITS is displayed on the screen (Fig. 11)



Fig.11 Sybyl Molecular Spreadsheet

The 31 molecules docked by FlexX are listed here in alphabetical order. To run CScore,
type at the SYBYL shell (!! Warning: these commands are executed in the bottom shell):

Sybyl>> cscore [ENTER]
Associated receptor mol2 file: protein.mol2
Row expression : * [ENTER]


5 scores (F_score, D_score, G_score, P_score, C_score) have been computed for each row (Fig. 12).



Fig.12. 5 new scores for the 31 potential ligands

The CScore value is a consensus score (from 0 to 5) indicating how often each compound belongs to
the top scorers of the individual lists (5 : always, 4 : 4 out of 5 lists, etc.). Using this simple consensus scoring,
4 out of the 6 known ligands would have been selected
with a cscore value of 4 or more. Note that dihydroxytamoxifen, with a cscore of 4, would be classified as a new potential drug.
Note that the other 2 known active compounds and also some of the random ligands have a cscore of 3 and are therefore promising drug candidates.
One of these random ligands is NSC_152522 (Fig. 13).



Fig. 13. NSC_152522
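
The idea behind the consensus score can be illustrated with a simplified vote count: a compound gets one vote for every scoring function that places it among its top scorers. The sketch below is not the CScore implementation. It assumes that "top scorer" means "among the top_n best of a list" and that a lower (more negative) value is better for every function; the threshold SYBYL actually applies is not stated in this exercise. Ligand names and score values are made up.

```python
def consensus_scores(score_table, top_n):
    """Count, per ligand, in how many score lists it is among the top_n best.

    score_table maps a scoring-function name to {ligand: score};
    lower scores are assumed to be better for every function.
    """
    votes = {}
    for scores in score_table.values():
        for ligand in scores:
            votes.setdefault(ligand, 0)
        best = sorted(scores, key=scores.get)[:top_n]
        for ligand in best:
            votes[ligand] += 1
    return votes

# Usage with made-up scores for three hypothetical ligands:
table = {
    "F_score": {"lig_a": -30.0, "lig_b": -20.0, "lig_c": -10.0},
    "D_score": {"lig_a": -100.0, "lig_b": -150.0, "lig_c": -50.0},
    "P_score": {"lig_a": -40.0, "lig_b": -10.0, "lig_c": -60.0},
}
print(consensus_scores(table, top_n=2))  # {'lig_a': 3, 'lig_b': 2, 'lig_c': 1}
```

A ligand that scores well with every function (lig_a here) collects the maximum number of votes, which is the behaviour the cscore column summarizes.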

To look at the proposed docked conformation, select the compound in the spreadsheet (click the
corresponding row) and display the molecule on the screen:

File (Spreadsheet menu) >> Put Rows into Molecule Areas

The docked conformation of NSC_152522 is displayed. Load the protein and
4-oh-tamoxifen_xray as references (recall earlier). The two phenolic groups are H-bonded to the
Glu353 and Thr347 side chains and fit the corresponding two moieties of 4-OH-tamoxifen very
well. The 2 aromatic rings are also well superimposed. Thus, this new compound might be a true hit.
You can label the protein residues by View >> Label >> Substructure >> All


Look at the order of each list :
View (Spreadsheet menu) >> Sort >>

Select a column primary input (e.g. PMF_SCORE)
Select the rank order : Ascending

The table is updated according to the PMF_score ranking (Fig. 15). The PMF_score is the score corresponding (but NOT equal) to the Drugscore we used to generate the docking poses.



Fig. 15 Table listed according to PMF_score
(original FlexX Drugscore scores, recall Table 1)

Look at the performance of the five scoring functions (ranking of known ligands).


Are different scores correlated ?
To see the possible correlations between different scores, plot one score versus another one.

Graph (Spreadsheet menu) >> Scatter
X axis : any of the 7 scores
Y axis : any of the 7 scores
Z axis : omit
Color : uniform
Press the Create button.

For most of the possible 2D plots, 4 out of the 5 known ligands are well separated from the random
pool. By selecting the Pick Points option in the Spreadsheet, you can click any point and identify
the corresponding hit as well as its structure (Fig.16)



Fig. 16 Scatter plot
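
Besides looking at scatter plots, the agreement between two scoring functions can be summarized by a rank correlation coefficient. The sketch below computes Spearman's coefficient (the Pearson correlation of the ranks, with tied values sharing their average rank) using only the Python standard library. It is offered as one possible way to quantify the question asked above, not as part of the SYBYL workflow, and the two score lists in the usage example are made up.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) upwards; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Usage with made-up scores for four hypothetical ligands:
score_one = [-80.0, -60.0, -40.0, -20.0]
score_two = [-300.0, -250.0, -100.0, -120.0]
print(round(spearman(score_one, score_two), 2))  # 0.8
```

A value close to 1 means that the two functions rank the ligands in nearly the same order; a value close to 0 means that their rankings are largely unrelated.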

Delete all molecules and backgrounds:

Build/Edit >> Zap (Delete) Molecule
View >> Delete All Backgrounds
File (Spreadsheet menu) >> Close >> No

Exit sybyl:

Sybyl > exit [ENTER]

!! Warning: this command is executed in the bottom blue shell



FlexX can accurately dock ligands in the binding site of the ERa receptor and discriminate 6 out
of 6 known ligands from a random pool of small molecular-weight molecules.