Overview

Release Notes

v1.11 series come with new and improved sequence, structure, and dynamics analysis features. See release notes for details.

How to Cite

Bakan A, Meireles LM, Bahar I ProDy: Protein Dynamics Inferred from Theory and Experiments
Bioinformatics 2011 27(11):1575-1577.

Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I Evol and ProDy for Bridging Protein Sequence Evolution and Structural Dynamics
Bioinformatics 2014 30(18):2681-2683.

Overview¶

The first step in signature dynamics analysis is to collect a set of related protein structures and build a PDBEnsemble. This can be achieved by multiple routes: a query search of the PDB using blastPDB() or searchDali(), extraction of PDB IDs from the Pfam or CATH database, or input of a pre-defined list.

We demonstrate the Dali method here in the first part of the tutorial. The usage of Pfam and CATH methods are described in the database tutorial (under construction) and the function blastPDB() is described in the Structure Analysis Tutorial.

We apply these methods to the PBP-I domains, a group of protein structures originally found in bacteria for transport of solutes across the periplasmic space and later seen in various eukaryotic receptors including ionotropic and metabotropic glutamate receptors. We use the N-terminal domain of AMPA receptor subunit GluA2 (gene name GRIA2) as a query.

First, make necessary imports from ProDy and Matplotlib packages if you haven’t already.

In [1]: from prody import *

In [2]: from pylab import *

In [3]: ion()

Prepare ensemble (using Dali)¶

First we use the function searchDali() to search the PDB, which returns a DaliRecord object that contains a list of PDB IDs and their corresponding mappings to the reference structure.

In [4]: dali_rec = searchDali('3H5V','A')

In [5]: dali_rec
Out[5]: <prody.database.dali.DaliRecord at 0x7fb4fdb6fe50>

If DALI times out then, you can use the following code to fetch the data afterwards.

In [6]: while not dali_rec.isSuccess:
   ...:    dali_rec.fetch()
   ...: 

Next, we get the lists of PDB IDs and mappings from dali_rec, parse the pdb_ids to get a list of AtomGroup instances:

In [7]: pdb_ids = dali_rec.filter(cutoff_len=0.7, cutoff_rmsd=1.0, cutoff_Z=10)

In [8]: mappings = dali_rec.getMappings()

In [9]: ags = parsePDB(pdb_ids, subset='ca')

In [10]: len(ags)
Out[10]: 482

Then we provide ags together with mappings to buildPDBEnsemble(). We set the keyword argument seqid=20 to account for the low sequence identity between some of the structures.

In [11]: dali_ens = buildPDBEnsemble(ags, mapping=mappings, seqid=20)

In [12]: dali_ens
Out[12]: <PDBEnsemble: Unknown (69 conformations; 376 atoms)>

Finally we save the ensemble for later processing:

In [13]: saveEnsemble(dali_ens, 'PBP-I')
Out[13]: 'PBP-I.ens.npz'

Mode ensemble¶

For this analysis we’ll build a ModeEnsemble by calculating normal modes for each member of the PDBEnsemble. You can load a PDB ensemble at this stage if you already have one. We demonstrate this for the one we just saved.

In [14]: dali_ens = loadEnsemble('PBP-I.ens.npz')

Then we calculated GNM modes for each member of the ensemble. There are options to select the model (GNM by default) and the way of considering non-aligned residues by setting the trim option (default is reduceModel(), which treats them as environment).

In [15]: gnms = calcEnsembleENMs(dali_ens, model='GNM', trim='reduce')

In [16]: gnms
Out[16]: <ModeEnsemble: 69 modesets (20 modes, 376 atoms)>

We can also save the mode ensemble as follows:

In [17]: saveModeEnsemble(gnms, 'PBP-I')
Out[17]: 'PBP-I.modeens.npz'

We can load in a mode ensemble at this point as follows:

In [18]: gnms = loadModeEnsemble('PBP-I.modeens.npz')

Signature dynamics¶

Signatures are calculated as the mean and standard deviation of various properties such as mode shapes and mean square fluctations.

For example, we can show the average and standard deviation of the shape of the first mode (second index 0). The first index of the mode ensemble is over conformations.

In [19]: showSignatureMode(gnms[:, 0]);

We can also show such results for properties involving multiple modes such as the mean square fluctuations from the first 5 modes or the cross-correlations from the first 20.

In [20]: showSignatureSqFlucts(gnms[:, :5]);

In [21]: showSignatureCrossCorr(gnms[:, :20]);

../../_images/signdy_dali_cross-corr.png

We can also look at distributions over values across different members of the ensemble such as inverse eigenvalue. We can show a bar above this with individual members labelled like [JK15].

We can select particular members to highlight with arrows by putting their names and labels in a dictionary:

In [22]: highlights = {'3h5vA_ca': 'GluA2','3o21C_ca': 'GluA3',
   ....:               '3h6gA_ca': 'GluK2', '3olzA_ca': 'GluK3',
   ....:               '5kc8A_ca': 'GluD2'}
   ....: 

We plot the variance bar for the first five modes (showing a function of the inverse eigenvalues related to the resultant relative size of motion) above the inverse eigenvalue distributions for each of those modes. To arrange the plots like this, we use the :class:~matplotlib.gridspec.GridSpec method of Matplotlib.

In [23]: from matplotlib.gridspec import GridSpec

In [24]: gs = GridSpec(ncols=1, nrows=2, height_ratios=[1, 10], hspace=0.15)

In [25]: subplot(gs[0]);

In [26]: showVarianceBar(gnms[:, :5], fraction=True, highlights=highlights);

In [27]: xlabel('');

In [28]: subplot(gs[1]);

In [29]: showSignatureVariances(gnms[:, :5], fraction=True, bins=80, alpha=0.7);

In [30]: xlabel('Fraction of inverse eigenvalue');

../../_images/signdy_dali_variance_mode1-5.png

Spectral overlap and distance¶

Spectral overlap, also known as covariance overlap as defined in [BH02], measures the overlap between two covariance matrices, or the overlap of a subset of the modes (a mode spectrum).

In [31]: so_matrix = calcEnsembleSpectralOverlaps(gnms[:, :1])

In [32]: showMatrix(so_matrix);

../../_images/signdy_dali_so_matrix_mode1.png

We can then calculate a tree from its arccosine, which converts the overlaps to distances:

In [33]: labels = gnms.getLabels()

In [34]: so_tree = calcTree(names=labels,
   ....:                    distance_matrix=arccos(so_matrix),
   ....:                    method='upgma')
   ....: 

This tree can be displaced using the :func:.showTree function. The default format is ASCII text but we can change it to plt to get a figure:

In [35]: showTree(so_tree, 'plt');

../../_images/signdy_dali_so_tree_mode1.png

We can reorder the spectral overlap matrix using the tree as follows:

In [36]: reordered_so, new_so_indices = reorderMatrix(names=labels,
   ....:                                              matrix=so_matrix,
   ....:                                              tree=so_tree)
   ....: 

Both PDBEnsemble and ModeEnsemble objects can be reordered based on the new indices:

In [37]: reordered_ens = dali_ens[new_so_indices]

In [38]: reordered_gnms = gnms[new_so_indices, :]

Comparing with sequence and structural distances¶

The sequence distance is given by the (normalized) Hamming distance, which is calculated by subtracting the percentage identity (fraction) from 1, and the structural distance is the RMSD. We can also calculate and show the matrices and trees for these from the PDB ensemble.

First we calculate the sequence distance matrix:

In [39]: seqid_matrix = buildSeqidMatrix(dali_ens.getMSA())

In [40]: seqdist_matrix = 1. - seqid_matrix

We can visualize the matrix using showMatrix():

In [41]: showMatrix(seqdist_matrix);

../../_images/signdy_dali_seqdist_matrix.png

We can also construct a tree based on this distance matrix:

In [42]: seqdist_tree = calcTree(names=labels,
   ....:                         distance_matrix=seqdist_matrix,
   ....:                         method='upgma')
   ....: 

Similarily, once we obtain the RMSD matrix using PDBEnsemble.getRMSDs(), we can calculate the structure-based tree:

In [43]: rmsd_matrix = dali_ens.getRMSDs(pairwise=True)

In [44]: showMatrix(rmsd_matrix);

In [45]: rmsd_tree = calcTree(names=labels,
   ....:                      distance_matrix=rmsd_matrix,
   ....:                      method='upgma')
   ....: 

../../_images/signdy_dali_rmsd_matrix.png

We could plot the three trees one by one. Or, it could be of interest to put all three trees constructed based on different distance metrics side by side and compare them:

In [46]: showTree(seqdist_tree, format='plt')

In [47]: title('Sequence')
Out[47]: Text(0.5,1,'Sequence')

In [48]: showTree(rmsd_tree, format='plt')

In [49]: title('Structure')
Out[49]: Text(0.5,1,'Structure')

In [50]: showTree(so_tree, format='plt')

In [51]: title('Dynamics')
Out[51]: Text(0.5,1,'Dynamics')

This analysis is quite sensitive to how many modes are used. As the number of modes approaches the full number, the dynamic distance order approaches the RMSD order. With smaller numbers, we see finer distinctions. This is particularly clear in the current case where we used just one mode.

[JK15]

Krieger J, Bahar I, Greger IH. Structure, Dynamics, and Allosteric Potential of Ionotropic Glutamate Receptor N-Terminal Domains. Biophys. J. 2015 109(6):1136-48