ProDy Tutorial: Gaussian Network Models

GNM - Minimal Theory

The potential energy of a Gaussian Network Model is defined as the summation of harmonic potentials over all unique $(i,j)$-pairs and is a function of only the square of inter-residue distance vector, $\Delta \mathbf{R}_{ij} = \mathbf{R}_i - \mathbf{R}_j^0$. It can be given in terms of the Kirchhoff matrix for inter-residue contacts, $\mathbf{\Gamma}$,

$$ V^{GNM} = \frac{\gamma}{2} \Delta \mathbf{R}^T \left( \mathbf{\Gamma} \otimes \mathbf{I}_{3\times 3} \right) \Delta \mathbf{R} \,, $$

where $\mathbf{\Gamma}$ is an $(N \times N)$-matrix and is defined as:

$$ \mathbf{\Gamma}_{ij} = \left\{\begin{matrix} -1 & \text{if } i \ne j \text{ and }R_{ij} \le R_{cut} \,, \\ 0 & \text{if } i \ne j \text{ and }R_{ij} > R_{cut} \,, \\ -\sum_{j,j \ne i}^{N} \mathbf{\Gamma}_{ij} & \text{if } i = j \,. \end{matrix}\right. $$

GNM Module

First, let's import all the required packages!

Tip: how to access the documentation

Parse the PDB structure for adenylate kinase.

This PDB file contains 2 chains. If we only want to use $C^{\alpha}$ atoms from chain A, then we can select it as:

Computing GNM normal modes

See also the online tutorial.

First, let's instantiate an object of GNM class.

Now, we can build the Kirchhoff matrix by the method buildKirchhoff.

What are the parameters of this method?

Let's build Kirchhoff matrix based on the coordinate of $C^{\alpha}$-atoms already selected.

We can get the Kirchhoff matrix by the following method:

Check that every row sum is equal to 0 (since it is a Kirchhoff matrix!):

Degrees of residues:

Calculate normal modes

We can calculate normal modes from the Kirchhoff matrix by the method calcModes. Many parameters in this function can be adjusted, e.g. the number of modes: calcModes( ) doc

The algebraic multiplicity of zero eigenvalues determines separate connected components of your graph.

If it is equal to one, then your graph does not have a disconnected piece!

Get eigenvalues and eigenvectors:

NB: The eigenvector corresponding to 0 eigenvalue is always constant.

How about the variances:

NB: A more compact way to compute GNM is to use the function calcGNM( ) doc

How to plot results

The contact map can be visualized with the command:

... while the shape of normal modes can be visualized by typing:

We can show possible hinge sites on that plot as well.

Furthermore, you can also access hinge sites identified from multiple modes (e.g. 2 modes) by:

Many other visualization tools can be found online at the page Plotting Functions.

GNM covariance and (normalized/orientational) cross-correlation matrices

Let's get the covariance matrix first, $C_{ij} \propto \sum_{i=1}^{N-1} \lambda_i^{-1} \mathbf{u}_i \mathbf{u}_i^T$:

Alternative ways to compute the covariance and (normalized/orientational) cross-correlation matrix, $C_{ij}^{(n)} = C_{ij} \big/ \left(C_{ii} C_{jj} \right)^{1/2}$:

We can visualize the normalized/orientational cross-correlations between residue fluctuations with the command:

The residues' mean square fluctuations (MSF) are found on the diagonal of the covariance matrix and describe the residue mobility or flexibility.

Moreover, you can specify the number of modes to compute the MSFs to see either their indivudual or cumulative contributions visually as well!

Comparison of MSFs with experimental B-factors (Debye-Waller factor)

We can also see how well the computed MSF correlates with the experimental B-factors (Debye-Waller) extracted from the PDB file. It is better to use all modes for this aim.

Let's calculate the effective force constant $\gamma$ of GNM springs in $k_B T \mathring{A}^{-2}$:

Let's evaluate Pearson's correlation coefficient between experimental B-factors and GNM predicted MSFs:

The values above are equal to those that DynOmics server gives!

Visualize modes and MSFs on protein's 3D structure

Mean square fluctuations computed by GNM modes can also be stored in the tempFactor (B-factors) column of a PDB file, and can be visualized in VMD by coloring the protein using B-factors option.

Let's see how we can color-code the protein using its $C^\alpha$-atoms with the mean square fluctuations based on the first 10 GNM modes.

You can run VMD from jupyter notebook (if it is in your PATH variables) using an exclamation mark to tell the jupyter notebook that it is a command run by the operating system not by the python kernel.

Caveat - flexible loops/termini

The unfolded C-terminal (residues 71-76) of ubiquitin is particularly flexible and its contribution to total residue fluctuations is dominant. Therefore, we need to exclude it from the analysis.

Let's look at the effect of those residues on the collectivity of a particular mode, whose degree basically reflects the number of atoms which are significantly affected by that mode. The collectivity is defined as the exponential of the information entropy of that mode: calcCollectivity

We can easily compute the collectivity of a set of modes at once!