Frequently Asked Questions



These are a few questions and answers about PDB_REDO. New questions will be added when necessary. Mail us yours.


I'm missing entry 9xyz. Why didn't you add it?

New entries are added regularly but it takes some time. If you are in a hurry, please e-mail us. There are a number of reasons why certain entries cannot be made at all. Our error annotation server WHY_NOT will tell you why.

What if there is no experimental data?

We cannot REDO without experimental data. If you can convince the structure depositor to submit the experimental data to the PDB, we will add an optimised structure model.

Do all structures improve when run through PDB_REDO?

Fortunately not. There are a lot of high-quality structure models in the PDB that cannot be improved by means of automatic optimisation. Unfortunately, there are also a few structure models that are beyond repair. In any case, our script is set up so that the 'conservatively optimised' structure model is never worse than the original PDB entry. In a limited number of cases the fully optimised structure model is much worse than the original. Be sure to check the R-factors and the validation scores when you use a PDB_REDO entry.

Why is there a difference between the recalculated R(-free) and the value from the PDB header?

Deviations of a few percent are quite common. Here are a few common reasons:

Since PDB_REDO version 1.8 we follow this rule: If the R-factor from the header cannot be reproduced (with a tolerance of 0.10 or 10%), the optimisation is aborted and no PDB_REDO entry is made. Every once in a while we check these problematic structures by hand.

What is a Z-score?

A Z-score expresses how many standard deviations a certain value deviates from the mean. So if the average is 5 with a standard deviation of 2, then a value of 7 will have a Z-score of 1 and 3 will have a Z-score of -1, and 6 a Z-score of 0.5. Z-scores allow you to compare deviations from the mean with different standard deviations by putting everything on one scale. This is convenient for model validation. Example: two atom bonds (A and B) are both 0.05Å longer than normal. How bad is that? And is it equally bad for both bonds or not? Z-scores give us the answer. Say bond A has a standard deviation of 0.01 and bond B a standard deviation of 0.05. Then bond A has a Z-score of 5 and bond B a Z-score of 1. So bond A is much longer than normal (that's bad), whereas bond B is somewhat longer, but still okay. Wikipedia has more to say on Z-scores and their relation with p-values and the like.
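
As a small illustration, the examples above can be reproduced in a few lines of Python. This is only a sketch; the function name z_score is ours and is not part of PDB_REDO.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (value - mean) / sd


# Mean 5, standard deviation 2 (the numbers used in the text).
print(z_score(7, 5, 2))   # 1.0
print(z_score(3, 5, 2))   # -1.0
print(z_score(6, 5, 2))   # 0.5

# Two bonds that are both 0.05 Å too long, with different standard deviations.
print(f"{z_score(0.05, 0.0, 0.01):.1f}")  # bond A: 5.0
print(f"{z_score(0.05, 0.0, 0.05):.1f}")  # bond B: 1.0
```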

What is the 'unbiased' R-free?

We can calculate an expected R-free/R ratio for unbiased refinement using an adapted version of Tickle et al. (Acta Cryst (1998), D54, 547-557). Multiplying R with this ratio gives us an expected 'unbiased' R-free. This value should be close to (or at least not much lower than) your R-free value. We use the related Z(R-free) at several points in the PDB_REDO procedure to monitor the refinement results.

What is Z(R-free) or the R-free Z-score?

We can calculate an expected R-free/R ratio for unbiased refinement using an adapted version of Tickle et al. (Acta Cryst (1998), D54, 547-557). Multiplying R with this ratio gives us an expected R-free value. Based on Tickle et al. (Acta Cryst (2000), D56, 442-450) we can also calculate the R-free uncertainty σR-free. So we can now express the difference between the expected R-free and the calculated R-free in units of σR-free: a Z-score.
Ideally this score should be close to zero. Positive values indicate that there may be room for improvement of the structure, that convergence was not yet reached in the refinement, or that R-free was extremely biased (e.g. when a new R-free set was selected, or the wrong set deposited). Negative values indicate a problem with the structure model caused by specific errors or by overrefinement.
Please note that the Z-score may be unreliable for low resolution (2.8Å or lower) structure models or when the R-free set is very small.
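
The arithmetic can be sketched as follows. The expected R-free/R ratio and σR-free are taken as given inputs here (the formulas from Tickle et al. are not reproduced), the numbers are made up, and the sign convention (expected minus calculated) is our assumption, chosen so that an R-free that is higher than expected gives a negative score, as described above.

```python
def rfree_z_score(r, rfree, expected_ratio, sigma_rfree):
    """Difference between expected and calculated R-free in units of sigma(R-free).

    Assumption: Z = (expected R-free - calculated R-free) / sigma(R-free).
    """
    if sigma_rfree <= 0:
        raise ValueError("sigma(R-free) must be positive")
    expected_rfree = r * expected_ratio  # the 'unbiased' R-free
    return (expected_rfree - rfree) / sigma_rfree


# Hypothetical values: R = 0.20, R-free = 0.25, expected ratio = 1.2, sigma = 0.005.
# The expected R-free is 0.24, so the calculated R-free is 2 sigma too high.
print(f"{rfree_z_score(0.20, 0.25, 1.2, 0.005):.1f}")  # -2.0
```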

Why do the entries have such different version numbers?

The PDB_REDO pipeline is continuously updated and new entries are redone with the latest PDB_REDO version (a changelog can be found in the PDB_REDO software). Existing entries will be updated eventually, but because an enormous amount of CPU time is required, this may take a while. If you need a series of structure models parsed with the same version of PDB_REDO, just ask.

Why do I get 'NA' for σR-free and the R-free Z-score after re-refinement for entry 9xyz?

This means that the values should not be used because the re-refinement did not improve the structure model. The values calculated from the data may be severely biased when there is something wrong with the R-free set. The values after re-refinement can only be used safely if the structure has changed, that is, if the structure was refined to convergence.

Why are the WHAT_CHECK scores (slightly) different before and after re-refinement when re-refinement did not 'change' the structure?

The validation is done on the structure that comes out of Refmac. If the re-refinement did not work, this is the model obtained after PDB_REDO tried to reproduce the original refinement results. This model may be slightly different from the original. For instance, some atoms are removed. The structure may also have been subjected to rigid-body refinement or 'pure' TLS refinement if the original refinement could not be reproduced in the first attempt. In any case, the WHAT_CHECK scores are correct for the structure you download from PDB_REDO.

Why are all HETATMs converted to ATOMs?

This used to be a problem with older versions of Refmac. New PDB_REDO entries (version 5 and onwards) should not suffer from this problem. MSE (seleno-methionine) residues are an exception. Due to compatibility issues, these residues are still called ATOM instead of HETATM.

Why are the Z-values in the CRYST1 card missing?

Let's just call this an undocumented feature. We will try to solve this properly at some point. If you really need the Z-value, please ask me to write a work-around.

What happened to entry 9xyz? PDB_REDO completely destroyed the structure model.

This is the result of a bug in PDB_REDO or a problem that we do not yet catch. We do check for entries like these, but we missed this one. You should complain about it.

What is in the TLS group definitions that are tested?

If the input PDB file contains a TLS description, it is extracted and saved as the definition '9xyz.tls'. PDB_REDO also generates a simple definition 'REDO.tls', which has one TLS group per macromolecular chain. User-supplied TLS group definitions are named systematically in the order in which they are supplied: 'in01.tls' to 'in99.tls'.

What is the difference between the 're-refined' and the 'final' model?

The 're-refined' model comes after the conservative phase of PDB_REDO. The atom coordinates and B-factors are refined in reciprocal space with optimised refinement settings. The 'final' model is produced from the 're-refined' model by first rebuilding side-chains, performing peptide flips, etc. and then some more refinement.

What is 'paired refinement'?

Paired refinement is a way to find a suitable resolution cut-off for your X-ray data. To see whether including higher resolution, but weaker, data actually helps, we perform a pair of refinements: one with the original resolution cut-off and one with a higher resolution cut-off. If the higher resolution data helps, the higher cut-off is accepted. To be able to make a fair comparison, R-factors and the like for both refinements are calculated using only the data with the lower resolution cut-off.
Paired refinement in PDB_REDO is based on the work of Karplus and Diederichs (Science 2012; 336:1030-1033) with some modifications: we use not only R-free but also the weighted R-free, the free likelihood, and the free correlation coefficient to see whether adding higher resolution data helps and we use resolution extension steps that have an equal number of reflections, rather than equal steps in Å.
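
To illustrate what resolution extension steps with an equal number of reflections look like, here is a minimal sketch. It is not the PDB_REDO implementation: the function name, the input (a plain list of reflection d-spacings in Å) and the example numbers are all hypothetical.

```python
def resolution_steps(d_spacings, original_cutoff, n_steps):
    """Return n_steps new cut-offs (in Å) that each add an equal share of reflections.

    Only reflections beyond the original cut-off (smaller d-spacing) are divided.
    """
    # Extra reflections, ordered from low to high resolution.
    extra = sorted((d for d in d_spacings if d < original_cutoff), reverse=True)
    if n_steps < 1 or n_steps > len(extra):
        raise ValueError("need between 1 and len(extra) steps")
    cutoffs = []
    for k in range(1, n_steps + 1):
        last = k * len(extra) // n_steps - 1  # index of the last reflection in step k
        cutoffs.append(extra[last])
    return cutoffs


# Hypothetical data: original cut-off at 2.0 Å, six weaker reflections beyond it.
d = [3.0, 2.5, 1.95, 1.9, 1.88, 1.85, 1.83, 1.8]
steps = resolution_steps(d, original_cutoff=2.0, n_steps=3)
print(steps)  # [1.9, 1.85, 1.8]
```

Each step in this example adds two reflections, even though the steps are not equally wide in Å.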

What is the 'Hamilton R ratio test'?

The Hamilton R ratio test is a means to see whether adding a lot of extra parameters to your refinement leads to a genuine improvement of the model rather than a cosmetic improvement resulting from the addition of parameters. The idea was introduced by Hamilton (Acta Cryst. 1965; 18, 502–510), but never caught on in macromolecular crystallography. Merritt recently reintroduced the test and provided some means to deal with the practical limitations of the test (Acta Cryst. 2012; D68:468-477). In PDB_REDO we use the Hamilton R ratio test to select the most suitable B-factor model (anisotropic, isotropic, or one B-factor for all atoms) and also to check whether a complex TLS model with many groups is genuinely better than a simple model (e.g. with one group per protein chain). The implementation of the test is discussed in Acta Cryst. 2012; D68:484-496 (reprint).

What are rmsZ-scores?

RmsZ-scores are the Z-score equivalent of rmsd values. They are a more suitable measure of bond length or angle deviations. A bond angle rmsd of 1.0 degrees may be good or bad depending on the types of angles you are looking at (for a protein, this depends on the sequence). Strictly speaking, you should not use the rmsd for non-equivalent things such as different bonds or angles at all. We actively discourage the use of rmsd values for bond lengths and angles.
RmsZ values do not suffer from the same problems as rmsd values because every deviation is on a common scale, i.e. the number of standard deviations from the mean or 'ideal' value. That way you can compare different bond lengths or angles within a model, between models and even between different proteins. Another advantage is that rmsZ values give you a usable refinement target: if the deviations have a normal distribution, the rmsZ values will be 1.000. Note that the geometric target values in bond length and angle restraints used in refinement come from small molecules rather than protein structures, so in effect your rmsZ should always be lower than 1.000. Beyond this constraint, any value should be considered reasonable if the model is properly refined.
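
The difference between rmsd and rmsZ can be shown with the two bonds from the Z-score example. This is a sketch only; the ideal bond lengths below are made-up numbers, and real target values and standard deviations come from the restraint library.

```python
import math


def rmsd(observed, ideal):
    """Root-mean-square deviation from the ideal values."""
    return math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(observed))


def rms_z(observed, ideal, sigma):
    """Root-mean-square of the per-item Z-scores."""
    return math.sqrt(
        sum(((o - i) / s) ** 2 for o, i, s in zip(observed, ideal, sigma)) / len(observed)
    )


ideal = [1.50, 1.30]      # hypothetical ideal bond lengths in Å
sigma = [0.01, 0.05]      # standard deviations of bond A and bond B
observed = [1.55, 1.35]   # both bonds are 0.05 Å too long

print(f"{rmsd(observed, ideal):.3f}")          # 0.050
print(f"{rms_z(observed, ideal, sigma):.3f}")  # 3.606
```

The rmsd treats both deviations as equal, whereas the rmsZ is dominated by bond A, which is 5 standard deviations off.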

What is the (weighted) bump severity?

Two atoms cannot occupy the same space at the same time, but in structure models they sometimes do. This is called an atomic bump or a clash. Not every bump is equally bad. When atoms overlap more, the bump is worse. For the weighted bump severity, we sum up the squares of the atomic overlaps. This ensures that one severe bump (which is likely a real model building error) has a much higher impact than several minor bumps (which are typically the result of poor refinement settings). To allow comparison of different structures, the weighted bump severity is divided by the number of atoms.
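
A minimal sketch of this calculation, assuming the overlaps are given as a list of distances in Å and that no further scaling is applied (the text above does not specify either, so treat both as assumptions):

```python
def weighted_bump_severity(overlaps, n_atoms):
    """Sum of squared atomic overlaps, divided by the number of atoms."""
    if n_atoms <= 0:
        raise ValueError("the number of atoms must be positive")
    return sum(o * o for o in overlaps if o > 0) / n_atoms


# Same total overlap (0.8 Å) in a hypothetical model of 1000 atoms.
one_severe = weighted_bump_severity([0.8], n_atoms=1000)
four_minor = weighted_bump_severity([0.2, 0.2, 0.2, 0.2], n_atoms=1000)
print(one_severe > four_minor)  # True
```

Because the overlaps are squared, the single severe bump scores four times higher than the four minor bumps together.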

What is the free correlation coefficient?

The free correlation coefficient (also referred to as CC-free) is an alternative to R-free to compare the fit of the structure model with the X-ray data. It has useful statistical properties that allow you to more reliably test whether a change in model-to-data fit is significant. Comparison between different protein structures is also more reliable. The free correlation coefficient is calculated from the same set of reflections as R-free.

What is the Gibbs folding energy?

There is an energy difference between the folded and the unfolded state of a protein due to different hydrogen bonding, hydrophobic contacts, salt bridges, etc. This difference is called the Gibbs folding energy. We use the program FoldX to estimate this for the structure model in the current state. The absolute values for the Gibbs folding energy estimated by FoldX are not claimed to be exact, but the differences between two models of the same protein are accurate enough to test whether one model is a more energetically favourable description of the true structure than the other.

During the re-refinement, why doesn't PDB_REDO pick the restraint weight that gives the lowest R-free?

The goal of refinement is to obtain the best model; this is not necessarily the model with the lowest R-free. The selection of the best restraint weight is performed by the program picker and follows these steps:

  1. Calculate the maximum R-free for each weight:
    1. The maximum R-free is the unbiased R-free + 2.6 times σR-free.
    2. The maximum R-free is set to the lower of
      • The maximum R-free
      • R + 0.06
      This is done to make sure the R-factor gap doesn't get ridiculously large.
    3. The maximum R-free is set to the higher of
      • The maximum R-free
      • R * original_R-free/original_R
      This is done to be lenient if the original R-factor ratio was very high.
  2. Reject all weights that give bond length or bond angle rmsZ values
    • greater than 1.0 if the original value was less than 1.0
    • greater than the original value if that was greater than 1.0
  3. Reject all weights for which R-free is greater than the maximum R-free.
  4. Reject all weights for which R-free is greater than the original R-free.
  5. If the R-factor gap is greater than 0.02, then reject all weights for which the R-factor gap is doubled compared to the original gap.
  6. From the remaining weights, select the weight that gives the lowest free likelihood and R-free. In cases where these are two different weights, take the one that gives the best R-free Z-score.

Note that the method is biased towards tighter restraints. This bias can be strengthened when picker is run in 'significant' mode: looser restraints are only accepted when they lead to a drop in free likelihood of 5 points and a drop in R-free of 0.5 times σR-free.
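
The numbered steps above can be sketched in Python as follows. This is not the picker source code: the class, field names and example numbers are hypothetical. Step 5 is implemented as one possible reading (the gap of the candidate weight is tested), 'best' R-free Z-score is assumed to mean closest to zero, and returning None when every weight is rejected is our own choice. The 'significant' mode is not covered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Trial:
    """Refinement statistics for one restraint weight (hypothetical field names)."""
    weight: float
    r: float
    rfree: float
    unbiased_rfree: float
    sigma_rfree: float
    bond_rmsz: float
    angle_rmsz: float
    free_likelihood: float
    rfree_z: float


def max_rfree(t: Trial, orig_r: float, orig_rfree: float) -> float:
    limit = t.unbiased_rfree + 2.6 * t.sigma_rfree   # step 1.1
    limit = min(limit, t.r + 0.06)                   # step 1.2
    return max(limit, t.r * orig_rfree / orig_r)     # step 1.3


def pick_weight(trials: Sequence[Trial], orig_r: float, orig_rfree: float,
                orig_bond_rmsz: float, orig_angle_rmsz: float) -> Optional[Trial]:
    orig_gap = orig_rfree - orig_r
    kept = []
    for t in trials:
        # Step 2: rmsZ may not exceed 1.0, or the original value if that was higher.
        if t.bond_rmsz > max(1.0, orig_bond_rmsz):
            continue
        if t.angle_rmsz > max(1.0, orig_angle_rmsz):
            continue
        # Step 3: R-free may not exceed the maximum R-free.
        if t.rfree > max_rfree(t, orig_r, orig_rfree):
            continue
        # Step 4: R-free may not exceed the original R-free.
        if t.rfree > orig_rfree:
            continue
        # Step 5 (assumed reading): reject large gaps that more than doubled.
        gap = t.rfree - t.r
        if gap > 0.02 and gap > 2 * orig_gap:
            continue
        kept.append(t)
    if not kept:
        return None  # assumption: the steps above do not say what happens here
    # Step 6: lowest free likelihood and lowest R-free; tie-break on the Z-score.
    best_ll = min(kept, key=lambda t: t.free_likelihood)
    best_rf = min(kept, key=lambda t: t.rfree)
    if best_ll is best_rf:
        return best_ll
    return min((best_ll, best_rf), key=lambda t: abs(t.rfree_z))


# Made-up example: original model with R = 0.20, R-free = 0.25.
trials = [
    Trial(0.1, 0.19, 0.230, 0.225, 0.005, 0.5, 0.60, 1000.0, -1.0),
    Trial(0.5, 0.17, 0.228, 0.205, 0.005, 0.9, 0.95, 990.0, -4.6),
    Trial(1.0, 0.16, 0.235, 0.195, 0.005, 1.3, 1.10, 985.0, -8.0),
]
chosen = pick_weight(trials, 0.20, 0.25, 0.8, 0.9)
print(chosen.weight if chosen else None)  # 0.1
```

In this example the weight 0.5 gives the lowest R-free, but it is rejected in step 3 because its R-free is higher than its maximum R-free; the weight 1.0 is rejected in step 2.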

What are the Coot scripts (9xyz_final.scm and 9xyz_final.py) and how do I use them?

The Coot script files allow you to visually inspect the changes PDB_REDO made to the structure model in the free crystallographic program Coot. The scripts show a dialog with buttons that take you to the locations of changed rotamers, hydrogen bond flips, peptide flips, deleted waters, etc. You can load the scripts via the Coot menu: Calculate > Run Script.... Depending on which Coot version you have, you should use either the Scheme script (9xyz_final.scm) or the Python script (9xyz_final.py). Generally, Linux users can use the Scheme script and Windows users the Python script. OSX users may need to try both.

What is the Refmac command file (9xyz.refmac) and how do I use it?

PDB_REDO optimises many parameters to get the most out of refinement with Refmac. After running PDB_REDO you may want to change a few more things in your model before completing it. Rather than running the complete PDB_REDO pipeline again, you can also do a quick refinement in Refmac with all the refinement settings previously optimised by PDB_REDO. The Refmac command file (9xyz.refmac) contains all the keywords to apply these settings (and override the default settings). You can use the command file in two ways:

  1. In the graphical CCP4 interface, you can add the file as Refmac keyword file when you set up your refinement
  2. On the command line or in scripts you can load the command file with the keyword @9xyz.refmac

Which YASARA version do I need for PDB_REDO?

YASARA is a molecular-graphics, -modeling and -simulation program for Windows, Linux, OS X, and Android that covers most of your structural bioinformatics needs. It offers user-friendly ways to do homology modeling, molecular dynamics, drug docking and many other calculations and experiments. In PDB_REDO we use it to analyse the changes to the structure model and to validate ligands and their binding sites.
YASARA comes in several versions (or tiers) with increasingly large feature sets. The lowest tier, YASARA View, is free; the other tiers have a licence fee (with excellent value for money). This is what you get in PDB_REDO with different YASARA tiers: