Prody Basics

This tutorial aim to teach the basic class structure and the functions in Prody.

First we need to import required packages.

In [1]:
from prody import *
from pylab import *
%matplotlib inline
#confProDy(auto_show=False)
//anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')

Structural Analysis

Measure geometric properties

ProDy comes with many functions that can be used to calculate structural properties and compare structures. Let's parse a structure:

In [2]:
p38 = parsePDB('1p38')
@> PDB file is found in working directory (1p38.pdb.gz).
@> 2962 atoms and 1 coordinate set(s) were parsed in 0.12s.
In [3]:
showProtein(p38)
Out[3]:
<mpl_toolkits.mplot3d.axes3d.Axes3D at 0x10dab1250>

In measure module, you can find various functions for calculations for structural properties. For example, you can calculate the phi angle of 10th residue or the radius of gyration of protein.

In [7]:
calcPhi(p38[10,]).round(3)
Out[7]:
-115.535
In [8]:
calcGyradius(p38).round(3)
Out[8]:
22.058

Compare and align structures

You can also compare different structures using some of the methods in proteins module. Let’s parse another p38 MAP kinase structure.

In [9]:
bound = parsePDB('1zz2')
@> PDB file is found in working directory (1zz2.pdb.gz).
@> 2872 atoms and 1 coordinate set(s) were parsed in 0.07s.

You can find similar chains in structure 1p38 and 1zz2 using matchChains() function

In [10]:
apo_chA, bnd_chA, seqid, overlap = matchChains(p38, bound)[0]
apo_chA
@> Checking AtomGroup 1p38: 1 chains are identified
@> Checking AtomGroup 1zz2: 1 chains are identified
@> Trying to match chains based on residue numbers and names:
@>   Comparing Chain A from 1p38 (len=351) and Chain A from 1zz2 (len=337):
@> 	Match: 337 residues match with 99% sequence identity and 96% overlap.
Out[10]:
<AtomMap: Chain A from 1p38 -> Chain A from 1zz2 from 1p38 (337 atoms)>
In [11]:
bnd_chA
Out[11]:
<AtomMap: Chain A from 1zz2 -> Chain A from 1p38 from 1zz2 (337 atoms)>
In [14]:
seqid
Out[14]:
99.40652818991099
In [15]:
overlap
Out[15]:
96
In [16]:
calcRMSD(bnd_chA, apo_chA)
Out[16]:
72.930230869465859
In [17]:
bnd_chA, transformation = superpose(bnd_chA, apo_chA)
calcRMSD(bnd_chA, apo_chA)
Out[17]:
1.86280149086955
In [18]:
showProtein(p38);
showProtein(bound);

Dynamics Analysis

In this section, we will show how to perform quick GNM, ANM and PCA analysis using a solution structure of Ubiquitin.

GNM Analysis

We will parse the PDB structure for ubiquitin.

In [19]:
ubi = parsePDB('1aar')
ubi
@> PDB file is found in working directory (1aar.pdb.gz).
@> 1218 atoms and 1 coordinate set(s) were parsed in 0.03s.
Out[19]:
<AtomGroup: 1aar (1218 atoms)>

We need to select the alpha-carbon atoms.

In [20]:
calphas = ubi.select('calpha and chain A and resnum<71')
calphas
Out[20]:
<Selection: 'calpha and chain A and resnum<71' from 1aar (70 atoms)>

We will build GNM instance and calculate normal modes.

In [24]:
gnm = GNM('Ubiquitin') # Instantiate GNM instance
In [25]:
gnm.buildKirchhoff(calphas) # Build Kirchoff matrix based on the C-alpha coordinates. 
@> Kirchhoff was built in 0.01s.
In [26]:
showContactMap(gnm)
Out[26]:
<matplotlib.image.AxesImage at 0x111938a50>
In [27]:
gnm.calcModes() # calculate eigenvalues and eigenvectors of Kirchhoff matrix
@> 20 modes were calculated in 0.01s.

We can get the eigenvalues, eigenvectors and covariance matrix.

In [28]:
gnm.getEigvals().round(3)
Out[28]:
array([  2.502,   2.812,   4.366,   5.05 ,   7.184,   7.65 ,   7.877,
         9.08 ,   9.713,  10.132,  10.502,  10.644,  10.888,  11.157,
        11.285,  11.632,  11.78 ,  11.936,  12.006,  12.218])
In [30]:
gnm.getEigvecs().round(3)
Out[30]:
array([[-0.064, -0.131, -0.245, ..., -0.256,  0.538, -0.   ],
       [-0.073, -0.085, -0.19 , ...,  0.006, -0.069,  0.032],
       [-0.076, -0.043, -0.135, ...,  0.017, -0.047,  0.018],
       ..., 
       [-0.092,  0.064,  0.105, ...,  0.032, -0.042,  0.006],
       [-0.07 ,  0.099,  0.054, ...,  0.031,  0.024, -0.014],
       [-0.081,  0.135,  0.124, ...,  0.013, -0.04 , -0.018]])
In [32]:
showMode(gnm[0])
Out[32]:
[<matplotlib.lines.Line2D at 0x111cf9f10>]
In [33]:
showSqFlucts(gnm[0])
Out[33]:
[<matplotlib.lines.Line2D at 0x111dbddd0>]
In [34]:
gnm.getCovariance().round(2)
Out[34]:
array([[ 0.08,  0.02,  0.01, ..., -0.01, -0.01, -0.01],
       [ 0.02,  0.02,  0.01, ..., -0.  , -0.  , -0.01],
       [ 0.01,  0.01,  0.01, ...,  0.  , -0.  , -0.  ],
       ..., 
       [-0.01, -0.  ,  0.  , ...,  0.01,  0.01,  0.01],
       [-0.01, -0.  , -0.  , ...,  0.01,  0.01,  0.02],
       [-0.01, -0.01, -0.  , ...,  0.01,  0.02,  0.05]])
In [35]:
showCrossCorr(gnm)
Out[35]:
(<matplotlib.image.AxesImage at 0x1124d7350>,
 <matplotlib.colorbar.Colorbar at 0x1125650d0>)
In [36]:
writeNMD('ubi_gnm.nmd',gnm,calphas)
Out[36]:
'ubi_gnm.nmd'

ANM Calculations

Anisotropic network model (ANM) analysis can be performed in the following way:

In [37]:
ubi = parsePDB('2k39', subset='calpha')
@> PDB file is found in working directory (2k39.pdb.gz).
@> 76 atoms and 116 coordinate set(s) were parsed in 1.21s.
In [38]:
ubi_selection = ubi.select('resnum < 71')
In [39]:
anm = ANM('ubi') # instantiate ANM object
In [40]:
anm.buildHessian(ubi_selection) # build Hessian matrix for selected atoms
@> Hessian was built in 0.06s.
In [41]:
anm.calcModes() # calculate normal modes
@> 20 modes were calculated in 0.08s.

Individual Mode instances can be accessed by indexing the ANM instance:

In [42]:
slowest_mode = anm[0]
print( slowest_mode )
Mode 1 from ANM ubi
In [43]:
print( slowest_mode.getEigval().round(3) )
1.714

Let’s confirm that normal modes are orthogonal to each other:

In [44]:
(anm[0] * anm[1]).round(10)
Out[44]:
0.0
In [45]:
(anm[0] * anm[2]).round(10)
Out[45]:
0.0

You can calculate cross correlation and plot them.

In [62]:
cross_corr = calcCrossCorr(anm[0])
In [61]:
showCrossCorr(anm[0])
Out[61]:
(<matplotlib.image.AxesImage at 0x1127cced0>,
 <matplotlib.colorbar.Colorbar at 0x11264ccd0>)

Collective modes are significant.

In [50]:
calcCollectivity(anm[0])
Out[50]:
0.050864903317453324
In [52]:
calcCollectivity(anm[1])
Out[52]:
0.36877772410536919
In [63]:
showCrossCorr(anm[1])
Out[63]:
(<matplotlib.image.AxesImage at 0x112093510>,
 <matplotlib.colorbar.Colorbar at 0x112261b90>)
In [64]:
writeNMD('ubi_anm.nmd',anm, ubi_selection)
Out[64]:
'ubi_anm.nmd'

PCA Calculations

Let’s perform principal component analysis (PCA) of an ensemble of NMR models, such as 2k39. First, we prepare the ensemble:

In [65]:
ubi_ensemble = Ensemble(ubi_selection)
In [66]:
ubi_ensemble.iterpose()
@> Starting iterative superposition:
@> Step #1: RMSD difference = 5.4124e-01
@> Step #2: RMSD difference = 2.1728e-04
@> Step #3: RMSD difference = 2.3341e-07
@> Iterative superposition completed in 0.22s.

Then, we perform the PCA calculations:

In [67]:
pca = PCA('Ubiquitin')
In [68]:
pca.buildCovariance(ubi_ensemble)
@> Covariance is calculated using 116 coordinate sets.
@> Covariance matrix calculated in 0.007146s.
In [72]:
pca.calcModes()
@> 20 modes were calculated in 0.01s.
In [74]:
calcCollectivity(pca[0])
Out[74]:
0.57332262588273264

This analysis provides us with a description of the dominant changes in the structural ensemble. Let’s see the fraction of variance for top ranking 4 PCs:

In [70]:
for mode in pca[:4]:
    print(calcFractVariance(mode).round(2))
0.13
0.09
0.08
0.07
In [71]:
writeNMD('ubi_pca.nmd',pca,ubi_ensemble)
Out[71]:
'ubi_pca.nmd'

Comparative Analysis

ProDy comes with many built-in functions to facilitate a comparative analysis of experimental and theoretical data. For example, using printOverlapTable() function you can see the agreement between experimental (PCA) modes and theoretical (ANM) modes calculated above:

In [75]:
printOverlapTable(pca[:4], anm[:4])
Overlap Table
                            ANM ubi
                     #1     #2     #3     #4
PCA Ubiquitin #1   -0.21  +0.30  -0.17  -0.47
PCA Ubiquitin #2   +0.01  +0.72  +0.08  +0.05
PCA Ubiquitin #3   +0.31  +0.11  +0.18  +0.19
PCA Ubiquitin #4   +0.11  -0.02  -0.17  -0.39

Output above shows that PCA mode 2 and ANM mode 2 for ubiquitin show the highest overlap (cosine-correlation).

In [76]:
showOverlapTable(pca[:4], anm[:4]);