Computational Medicinal Chemistry Course: Sequence Retrieval & Alignment

Expression experiments yield the following nucleotide sequence associated with an overexpressed gene:

gcgaagagggtgccaggctttgtggatttg

Perform a blast search to identify the gene associated with this sequence.

You can find Blast at the following URL:

http://srs.bioasp.sara.nl:4081/cgi-bin/blast.cgi

Select the EMBL database to search against. Exclude the patent section from the embl. Can you uniquely identify the corresponding gene ?

Now search with SRS in the swissprot database for this gene.

You will find SRS at the following URL:

http://services.bioasp.nl/srs71/

Next we will make a multiple alignment of the proteins from the two isoforms In order to do this, search for esr1 and esr2 in the ID field of Swissprot. Write out the names of the found sequences to a file (esr.fof) . This file is a so-called file of filenames (FOF). Remove the names of the yeast and sheep sequences.
(Since multiple alignment is a global alignment all sequences must be roughly of the same length).

Now we will need software from the GCG package on our main Unix machine. If you are working from a Windows PC, first read how to setup a X-windows Session with our main Unix machine cmbi1.cmbi.kun.nl.

We will use the GCG program suite to extract these sequences from the database and convert them to fasta format:

cammsastartup
tofasta @esr.fof

Name the output file esr.seq.

We create the multiple alignment with the clustalX program:

clustalx

Click FILE->Load Sequences and select the esr.seq file.

Click Alignment->Do Complete Alignment


NOTE: Now we are going to create a model via the homology modelling server. Click Here to go there.

For the die-hards out there you can try to create a model your self with the Whatif software. Mind you, we can only allow a few of you to do this, because it is to heavy a load on the main computer.


Modelling the ERα receptor

We are now going to model to ERα receptor. To do this we need a protein sequence of the ERα receptor, we will take the sequence for the human estrogen SWISSPROT:ESR1_HUMAN. Next we need a template 3D structure with high homology to the ERα receptor. Take the ESR1_HUMAN sequence and blast it against the Protein DataBank (PDB). Discard the ERα hits and choose one of the remaining structures as a template. A good template has:

- high homology with the model sequence
- is complete in the region of the Ligand Binding Domain
- is a high quality structure (resolution)

For the purpose of this course, also discard dimeric structures.
1QKM seems a good choice (resolution 1.8, intact between H11 and H12, and a monomer (so no editing needed)).
But on close inspection, it turns out the backbone is interrupted.
It turns out that 1L2J is the better template. However some manual editing is required; you need to make it a monomer (use the first chain).

Some drawback of other related crystal structures:

1QKN --> there is a part of the sequence missing between H11 and H12

1HJ1 --> no H12

1NDE --> resolution 3.00

We will use the Whatif program to create the model. To create a model Whatif requires:

- The pdb file of the template structure
- Its corresponding sequence in PIR format.
- The model sequence in PIR format.

To obtain sequence of the template structure we can use Whatif as well:

whatif
getmol 1l2j.pdb
pirpsq
souppir
all
1L2J.pir
To create the model sequence in PIR format, we use the GCG program topir. Assuming you have used the 1l2j structure as a template, we only need the ligand binding domain of the sequence:

cammsastartup
topir swissprot:esr1_human 
Begin: 311
End: *
seq.pir
Now we need to align the template and the model structure, lets use clustalx for this purpose.

clustalx
Click FILE->Load Sequences and select the 1l2j.pir file.
Click FILE->Add Sequences and select the seq.pir file.
Click Alignment->Do Complete Alignment

Click FILE->Save Sequences as, select NBRF/PIR and choose a filename and select OK. (See pir format).

The alignment should look something like this:
1L2J            DALSPEQLVLTLLEAEPPHVLISR-------TEASMMMSLTKLADKELVH
seq             ---TADQMVSALLDAEPP-ILYSEYDPTRPFSEASMMGLLTNLADRELVH
                   :.:*:* :**:**** :* *.       :*****  **:***:****

1L2J            MISWAKKIPGFVELSLFDQVRLLESCWMEVLMMGLMWRSIDHPGKLIFAP
seq             MINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAP
                **.***::****:*:*.***:***..*:*:**:**:***::*****:***

1L2J            DLVLDRDEGKCVEGILEIFDMLLATTSRFRELKLQHKEYLCVKAMILLNS
seq             NLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNS
                :*:***::******::*********:**** ::** :*::*:*::*****

1L2J            -----LVTATQDADSSRKLAHLLNAVTDALVWVIAKSGISSQQQSMRLAN
seq             GVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQ
                     * :: :. :.. :: ::*: :**:*: ::**:*:: ***  ***:

1L2J            LLMLLSHVRHASNKGMEHLLNMKCKNVVPVYDLLLEMLNAHVLR-
seq             LLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHA
                **::***:** ******** .********:********:** *: 

Now we need to incorporate the gaps as found in the alignment of sequences 1l2j.pir and seq.pir.

Now we are ready to create the model.

whatif
getmol 1l2j.pdb
getpir
bldpir
seq.pir
all
# choose the slow version of the modelbuilder
makemol
all

Click here for the final model structure model.pdb.

     A superposition of the model and an ERa structure (3ERT). 
     Both 1L2J homologue and 3ERT have H12 in the antagonist position, altough
     not in exactly the same way.
     Note also gaps between helix 1 and 3 and a gap at the end of helix 8.