Ensemble Analysis

This example compares experimental structural data, analyzed using Principal Component Analysis (PCA), with the theoretical data predicted by the Anisotropic Network Model (ANM).

First make the necessary imports:

Retrieve dataset

One way to retrieve data is to run an NCBI BLAST search against the PDB with the function blastPDB.

To do this, we first need a sequence, and one way to obtain it is from the PDB:

We just want one sequence, so we take the sequence of chain A from the PDB file:

Once we have this sequence, we can pass it to the function blastPDB. We also provide a filename to save the output so that we don't need to run the search again. To reduce demand on the NCBI web server, we have provided this file, so please do not run this command.

# blast_record = blastPDB(p38_sequence)
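The idea of saving a search result so it never has to be requested twice can be sketched with the standard library alone. This is a minimal illustration, not ProDy's own mechanism: `load_or_compute`, `fake_search`, and the file name are hypothetical, and `fake_search` merely stands in for a remote query.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path, computing and saving it if absent."""
    if os.path.isfile(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(cache_path, "wb") as handle:
        pickle.dump(result, handle)
    return result


# Hypothetical stand-in for an expensive remote search; it records each real call.
calls = []


def fake_search():
    calls.append(1)
    return {"hits": ["placeholder"]}


cache_file = os.path.join(tempfile.mkdtemp(), "search_record.pkl")
first = load_or_compute(cache_file, fake_search)   # runs the search and saves it
second = load_or_compute(cache_file, fake_search)  # served from the saved file
```

The second call reads the saved file instead of repeating the query, which is the behaviour we want when the real query goes to a shared web server.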

We can get hits from this record using certain parameters to filter them and extract a list of PDB IDs from them.
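As a rough sketch of what such filtering amounts to, the snippet below keeps only the hits that pass two thresholds and returns their IDs. The hit table, its IDs, and its numbers are invented for illustration, and the two criteria (sequence identity and alignment coverage) are assumptions about which parameters are used.

```python
# Hypothetical hit table: keys are identifiers, values are alignment statistics.
# All IDs and numbers here are made up for illustration.
hits = {
    "hit_a": {"percent_identity": 100.0, "percent_overlap": 98.0},
    "hit_b": {"percent_identity": 95.0, "percent_overlap": 60.0},
    "hit_c": {"percent_identity": 72.0, "percent_overlap": 99.0},
    "hit_d": {"percent_identity": 91.0, "percent_overlap": 85.0},
}


def filter_hits(hit_table, min_identity, min_overlap):
    """Return the sorted IDs of hits meeting both thresholds."""
    return sorted(
        hit_id
        for hit_id, stats in hit_table.items()
        if stats["percent_identity"] >= min_identity
        and stats["percent_overlap"] >= min_overlap
    )


selected_ids = filter_hits(hits, min_identity=90.0, min_overlap=70.0)
```

Raising the identity threshold narrows the dataset to close homologues, while the coverage threshold removes hits that match only a fragment of the query.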

Next, we will use the parsePDB function to import each one of the structures corresponding to these IDs.

Before doing that, we will make a folder to put them in (if it doesn't already exist) and configure ProDy to use that folder with the function pathPDBFolder.
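Creating the folder only when it is missing can be done with the standard library. The sketch below uses a temporary base directory and a hypothetical folder name, `pdbs`, and does not call ProDy itself.

```python
import os
import tempfile

# A temporary base directory keeps this sketch from touching the working directory.
base_dir = tempfile.mkdtemp()
pdb_dir = os.path.join(base_dir, "pdbs")  # hypothetical folder name

# exist_ok=True makes the call safe to repeat: no error if the folder is already there.
os.makedirs(pdb_dir, exist_ok=True)
os.makedirs(pdb_dir, exist_ok=True)
```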

After parsing the structures from the PDB, we can use pathPDBFolder again to reset the default download folder back to our current directory:

Set reference chain

Next, we make a selection to use as the reference for ensemble building:

We extract chain A by indexing, which gives us a Chain object that is easier to work with.

Ensemble Preparation

We will prepare a PDBEnsemble by mapping each structure against the reference chain and adding a coordinate set corresponding to the mapped atoms. We first make sure that our list of PDB structures (pdbs) includes the ref_chain.

Ensemble Dynamics

Now we will examine the structural dynamics of this ensemble using two different methods.

1. Principal Component Analysis (PCA)

PCA is a method that identifies the components that account for the greatest amount of variability in your dataset, i.e. the ensemble.

The components/modes of variation are sorted such that the first modes contribute the greatest fractional variance, which we can show as follows:

The first modes with the highest fractional variance are called the principal components (PCs).
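To make the idea of fractional variance concrete, here is a minimal NumPy sketch of PCA on a synthetic "ensemble". It is not ProDy's implementation: the coordinates are random numbers built from two known directions of motion plus noise, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_conformers, n_atoms = 20, 5

# Synthetic ensemble: a mean structure plus motion along two known directions.
mean_coords = rng.normal(size=3 * n_atoms)
direction_a = rng.normal(size=3 * n_atoms)
direction_b = rng.normal(size=3 * n_atoms)
amplitudes_a = rng.normal(scale=3.0, size=(n_conformers, 1))
amplitudes_b = rng.normal(scale=1.0, size=(n_conformers, 1))
noise = rng.normal(scale=0.05, size=(n_conformers, 3 * n_atoms))
ensemble = mean_coords + amplitudes_a * direction_a + amplitudes_b * direction_b + noise

# Covariance of the deviations from the mean conformation.
deviations = ensemble - ensemble.mean(axis=0)
covariance = deviations.T @ deviations / n_conformers

# Eigenvectors are the modes; eigenvalues are the variance along each mode.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]  # largest variance first
variances = eigenvalues[order]
modes = eigenvectors[:, order]

fractional_variance = variances / variances.sum()
```

Because the synthetic motion was built from two directions, the first two entries of `fractional_variance` carry almost all of the variance, which is the signature of a small number of dominant principal components.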

2. Anisotropic Network Model (ANM) Normal Mode Analysis (NMA)

The ANM identifies the most impactful (slowest) modes in the dynamics of a single structure, which we can compare to the principal components from PCA.
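For readers who want to see what the model computes, the following NumPy sketch builds an ANM-style Hessian for a toy set of nodes and extracts its slowest non-trivial modes. It assumes the standard elastic-network formulation with a uniform spring constant and a distance cutoff (15 Å here, an assumed value); the helix-like coordinates are idealised and do not come from a real structure.

```python
import numpy as np


def build_anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an elastic network with uniform springs."""
    n_nodes = coords.shape[0]
    hessian = np.zeros((3 * n_nodes, 3 * n_nodes))
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            separation = coords[j] - coords[i]
            dist_sq = separation @ separation
            if dist_sq > cutoff ** 2:
                continue  # no spring between distant nodes
            block = -gamma * np.outer(separation, separation) / dist_sq
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, sj] = block
            hessian[sj, si] = block
            hessian[si, si] -= block  # diagonal blocks are minus the row sums
            hessian[sj, sj] -= block
    return hessian


# Toy nodes on an idealised helix (illustrative only, not a real structure).
indices = np.arange(10)
angles = np.deg2rad(100.0) * indices
coords = np.column_stack(
    [2.3 * np.cos(angles), 2.3 * np.sin(angles), 1.5 * indices]
)

hessian = build_anm_hessian(coords)
eigenvalues, eigenvectors = np.linalg.eigh(hessian)  # ascending eigenvalues

# Rigid-body translations and rotations cost no energy: six zero eigenvalues.
n_zero = int(np.sum(eigenvalues < 1e-6))
slow_modes = eigenvectors[:, n_zero:n_zero + 3]  # three slowest non-trivial modes
```

The smallest non-zero eigenvalues correspond to the softest, slowest motions, which is why these modes are the ones compared against the principal components.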

Analysis of PCA and ANM modes

Collectivity of modes

One property that we can calculate and compare is the collectivity, which describes the extent to which a mode collectively recruits large portions of the structure. We see that most of the first modes from both calculations are highly collective.
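A small NumPy sketch can show what collectivity measures. It assumes the common information-entropy definition, in which the squared displacement of each atom is treated as a probability; the function name and the two toy modes are illustrative, and masses are ignored.

```python
import numpy as np


def collectivity(mode_vector):
    """Entropy-based collectivity of a 3N mode vector (unweighted by mass)."""
    per_atom = np.sum(np.reshape(mode_vector, (-1, 3)) ** 2, axis=1)
    weights = per_atom / per_atom.sum()      # normalise so the weights sum to 1
    nonzero = weights[weights > 0]           # treat 0 * log(0) as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy) / len(weights))


n_atoms = 8

# Every atom moves by the same amount: maximally collective.
uniform_mode = np.tile([1.0, 0.0, 0.0], n_atoms)

# Only the first atom moves: minimally collective.
localized_mode = np.zeros(3 * n_atoms)
localized_mode[0] = 1.0

uniform_value = collectivity(uniform_mode)
localized_value = collectivity(localized_mode)
```

Under this definition the value ranges from 1/N, when a single atom carries all the motion, up to 1, when all N atoms move equally.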

PCA - ANM overlap

We can also look at how well the modes produced from each method correlate with each other using the overlap (correlation cosine).

The overlap table shows how each mode from one method overlaps with each mode from the other. Some modes overlap very well with one other mode, while others overlap with multiple modes to a lesser extent.

We can also look at the overlap between one mode and all others as follows. The cumulative overlap is the square root of the sum of squared overlaps.
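The two quantities described above can be written down in a few lines of NumPy. This sketch uses random orthonormal vectors in place of real PCA and ANM modes, and the function names are illustrative rather than ProDy's.

```python
import numpy as np


def overlap_table(modes_a, modes_b):
    """Correlation cosine between every column of modes_a and of modes_b."""
    unit_a = modes_a / np.linalg.norm(modes_a, axis=0)
    unit_b = modes_b / np.linalg.norm(modes_b, axis=0)
    return unit_a.T @ unit_b


def cumulative_overlap(mode, modes):
    """Square root of the sum of squared overlaps of one mode with a set."""
    overlaps = overlap_table(mode[:, None], modes)
    return float(np.sqrt(np.sum(overlaps ** 2)))


rng = np.random.default_rng(1)

# Two random orthonormal bases stand in for the two sets of modes.
basis_a, _ = np.linalg.qr(rng.normal(size=(12, 12)))
basis_b, _ = np.linalg.qr(rng.normal(size=(12, 12)))

table = overlap_table(basis_a[:, :3], basis_b[:, :3])        # 3 x 3 table
partial = cumulative_overlap(basis_a[:, 0], basis_b[:, :3])  # first 3 modes only
complete = cumulative_overlap(basis_a[:, 0], basis_b)        # full basis
```

Against a complete orthonormal basis the cumulative overlap reaches 1, so a value close to 1 from only a few modes means those few modes already capture most of the motion.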

Square Fluctuations

In order to see where in the protein these important motions occur, we can visualize the square fluctuations of the principal components and/or slow ANM modes as follows. The function showScaledSqFlucts allows us to scale the square fluctuations from each set of modes to have the same overall size for easier comparison.

We can apply this to individual modes, such as overlapping mode 1 (index 0) from each method, or multiple modes such as the first 3.
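The calculation behind such a plot can be sketched in NumPy. Here each mode contributes its per-atom squared displacement weighted by its variance (the eigenvalue for PCA, the inverse eigenvalue for ANM), and the second profile is rescaled so that both have the same mean. Matching the means is one simple scaling choice assumed for this sketch, and the modes and eigenvalues are made-up numbers.

```python
import numpy as np


def square_fluctuations(modes, weights):
    """Per-atom square fluctuations from (3N, k) modes and k variances."""
    n_modes = modes.shape[1]
    per_atom = np.sum(modes.reshape(-1, 3, n_modes) ** 2, axis=1)  # shape (N, k)
    return per_atom @ weights


def scale_to_reference(reference, other):
    """Rescale `other` so that its mean matches the mean of `reference`."""
    return other * reference.mean() / other.mean()


rng = np.random.default_rng(2)
n_atoms = 6

# Random orthonormal columns stand in for three modes from each method.
pca_like_modes, _ = np.linalg.qr(rng.normal(size=(3 * n_atoms, 3)))
anm_like_modes, _ = np.linalg.qr(rng.normal(size=(3 * n_atoms, 3)))
pca_variances = np.array([4.0, 2.0, 1.0])    # made-up PCA eigenvalues
anm_eigenvalues = np.array([0.1, 0.2, 0.4])  # made-up ANM eigenvalues

pca_sqf = square_fluctuations(pca_like_modes, pca_variances)
anm_sqf = square_fluctuations(anm_like_modes, 1.0 / anm_eigenvalues)
anm_sqf_scaled = scale_to_reference(pca_sqf, anm_sqf)

# A single mode is just the one-column special case.
first_mode_sqf = square_fluctuations(pca_like_modes[:, :1], pca_variances[:1])
```

Scaling matters because PCA variances and inverse ANM eigenvalues are in different units, so only the shapes of the two profiles are directly comparable.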

Cross Correlations

We can also see how correlated the motion of each residue is with that of every other residue. We see similar patterns for the two methods, especially when using a large number of modes.
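As a sketch of the underlying calculation, cross-correlations can be obtained from a set of modes by building the covariance matrix, taking the trace of each 3 x 3 residue-pair block, and normalising by the diagonal. The modes and weights below are random placeholders rather than results from a real structure, and the function name is illustrative.

```python
import numpy as np


def cross_correlations(modes, weights):
    """Normalised residue-residue correlations from (3N, k) modes and k variances."""
    n_atoms = modes.shape[0] // 3
    covariance = (modes * weights) @ modes.T  # 3N x 3N covariance matrix
    blocks = covariance.reshape(n_atoms, 3, n_atoms, 3)
    dot_products = np.trace(blocks, axis1=1, axis2=3)  # N x N displacement dot products
    norms = np.sqrt(np.diag(dot_products))
    return dot_products / np.outer(norms, norms)


rng = np.random.default_rng(3)
n_atoms = 6
toy_modes, _ = np.linalg.qr(rng.normal(size=(3 * n_atoms, 4)))
toy_variances = np.array([5.0, 3.0, 2.0, 1.0])  # made-up variances

correlations = cross_correlations(toy_modes, toy_variances)
```

Each entry lies between -1 (fully anti-correlated motion) and 1 (fully correlated motion), and the diagonal is 1 because every residue is perfectly correlated with itself.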