| Computational Medicinal Chemistry Course: Sequence Retrieval & Alignment |
|---|
Expression experiments yield the following nucleotide sequence associated with an overexpressed gene:
gcgaagagggtgccaggctttgtggatttg Perform a blast search to identify the gene associated with this sequence.You can find Blast at the following URL:
http://srs.bioasp.sara.nl:4081/cgi-bin/blast.cgi Select the EMBL database to search against. Exclude the patent section from the embl. Can you uniquely identify the corresponding gene ? Now search with SRS in the swissprot database for this gene. You will find SRS at the following URL: http://services.bioasp.nl/srs71/ Next we will make a multiple alignment of the proteins from the two isoforms In order to do this, search for esr1 and esr2 in the ID field of Swissprot. Write out the names of the found sequences to a file (esr.fof) . This file is a so-called file of filenames (FOF). Remove the names of the yeast and sheep sequences. (Since multiple alignment is a global alignment all sequences must be roughly of the same length).Now we will need software from the GCG package on our main Unix machine. If you are working from a Windows PC, first read how to setup a X-windows Session with our main Unix machine cmbi1.cmbi.kun.nl.
We will use the GCG program suite to extract these sequences from the database and convert them to fasta format:
cammsastartup tofasta @esr.fofName the output file esr.seq. We create the multiple alignment with the clustalX program: clustalx Click FILE->Load Sequences and select the esr.seq file. Click Alignment->Do Complete Alignment
Now we are going to create a model via the homology modelling server. Click Here to go there.
For the die-hards out there you can try to create a model your self with the Whatif software. Mind you, we can only allow a few of you to do this, because it is to heavy a load on the main computer.
We are now going to model to ERα receptor. To do this we need a protein sequence of the ERα receptor, we will take the sequence for the human estrogen SWISSPROT:ESR1_HUMAN. Next we need a template 3D structure with high homology to the ERα receptor. Take the ESR1_HUMAN sequence and blast it against the Protein DataBank (PDB). Discard the ERα hits and choose one of the remaining structures as a template. A good template has:
For the purpose of this course, also discard dimeric structures.
1QKM seems a good choice (resolution 1.8, intact between H11 and H12, and a monomer (so no editing needed)).
But on close inspection, it turns out
the backbone is interrupted.
It turns out that 1L2J is the better template. However some manual editing
is required; you need to make it a monomer (use the first chain).
Some drawback of other related crystal structures:
1QKN --> there is a part of the sequence missing between H11 and H12
1HJ1 --> no H121NDE --> resolution 3.00
We will use the Whatif program to create the model. To create a model Whatif requires:
To obtain sequence of the template structure we can use Whatif as well:
whatif getmol 1l2j.pdb pirpsq souppir all 1L2J.pirTo create the model sequence in PIR format, we use the GCG program topir. Assuming you have used the 1l2j structure as a template, we only need the ligand binding domain of the sequence:
cammsastartup topir swissprot:esr1_human Begin: 311 End: * seq.pirNow we need to align the template and the model structure, lets use clustalx for this purpose.
clustalx Click FILE->Load Sequences and select the 1l2j.pir file. Click FILE->Add Sequences and select the seq.pir file. Click Alignment->Do Complete Alignment Click FILE->Save Sequences as, select NBRF/PIR and choose a filename and select OK. (See pir format).The alignment should look something like this:
1L2J DALSPEQLVLTLLEAEPPHVLISR-------TEASMMMSLTKLADKELVH
seq ---TADQMVSALLDAEPP-ILYSEYDPTRPFSEASMMGLLTNLADRELVH
:.:*:* :**:**** :* *. :***** **:***:****
1L2J MISWAKKIPGFVELSLFDQVRLLESCWMEVLMMGLMWRSIDHPGKLIFAP
seq MINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAP
**.***::****:*:*.***:***..*:*:**:**:***::*****:***
1L2J DLVLDRDEGKCVEGILEIFDMLLATTSRFRELKLQHKEYLCVKAMILLNS
seq NLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNS
:*:***::******::*********:**** ::** :*::*:*::*****
1L2J -----LVTATQDADSSRKLAHLLNAVTDALVWVIAKSGISSQQQSMRLAN
seq GVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQ
* :: :. :.. :: ::*: :**:*: ::**:*:: *** ***:
1L2J LLMLLSHVRHASNKGMEHLLNMKCKNVVPVYDLLLEMLNAHVLR-
seq LLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHA
**::***:** ******** .********:********:** *:
Now we need to incorporate the gaps as found in the alignment of sequences 1l2j.pir and seq.pir.
Now we are ready to create the model.
whatif getmol 1l2j.pdb getpir bldpir seq.pir all # choose the slow version of the modelbuilder makemol all
Click here for the final model structure model.pdb.