Sequence-Structure Comparison

The part shows how to compare sequence conservation properties with structural mobility obtained from Gaussian network model (GNM) calculations.

In [1]: from prody import *

In [2]: from matplotlib.pylab import *

In [3]: ion()  # turn interactive mode on

Entropy Calculation

First, we retrieve MSA for protein for protein family PF00074:

In [4]: fetchPfamMSA('PF00074')
Out[4]: 'PF00074_full.sth'

We parse the MSA file:

In [5]: msa = parseMSA('PF00074_full.sth')

Then, we refine it using refineMSA() based on the sequence of RNAS1_BOVIN:

In [6]: msa_refine = refineMSA(msa, label='RNAS1_BOVIN', rowocc=0.8, seqid=0.98)

We calculate the entropy for the refined MSA:

In [7]: entropy = calcShannonEntropy(msa_refine)

Mobility Calculation

Next, we obtain residue fluctuations or mobility for protein member of the above family. We will use chain B of 2W5I.

In [8]: pdb = parsePDB('2W5I', chain='B')

In [9]: chB_ca = pdb.select('protein and name CA and resid 1 to 119')

We perform GNM as follows:

In [10]: gnm = GNM('2W5I')

In [11]: gnm.buildKirchhoff(chB_ca)

In [12]: gnm.calcModes(n_modes=None)  # calculate all modes

Now, let’s obtain residue mobility using slowest mode, slowest 8 modes, and all modes:

In [13]: mobility_1 = calcSqFlucts(gnm[0])

In [14]: mobility_1to8 = calcSqFlucts(gnm[:8])

In [15]: mobility_all = calcSqFlucts(gnm[:])

See Gaussian Network Model (GNM) for details.

Comparison of mobility and conservation

We use the above data to compare structural mobility and degree of conservation. We can calculate a correlation coefficient between the two quantities:

In [16]: result = corrcoef(mobility_all, entropy)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-90fa48f40022> in <module>()
----> 1 result = corrcoef(mobility_all, entropy)

/home/cihank/.local/lib/python2.7/site-packages/numpy/lib/function_base.pyc in corrcoef(x, y, rowvar, bias, ddof)
   2143         warnings.warn('bias and ddof have no affect and are deprecated',
   2144                       DeprecationWarning)
-> 2145     c = cov(x, y, rowvar)
   2146     try:
   2147         d = diag(c)

/home/cihank/.local/lib/python2.7/site-packages/numpy/lib/function_base.pyc in cov(m, y, rowvar, bias, ddof, fweights, aweights)
   2022         if rowvar == 0 and y.shape[0] != 1:
   2023             y = y.T
-> 2024         X = np.vstack((X, y))
   2025 
   2026     if ddof is None:

/home/cihank/.local/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
    228 
    229     """
--> 230     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    231 
    232 def hstack(tup):

ValueError: all the input array dimensions except for the concatenation axis must match exactly

In [17]: result.round(3)[0,1]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-17-db90e9c8dcf3> in <module>()
----> 1 result.round(3)[0,1]

NameError: name 'result' is not defined

We can plot the two curves simultaneously to visualize the correlation. We have to scale the values of mobility to display them in the same plot.

Plotting

In [18]: indices = range(1,120)

In [19]: bar(indices, entropy, width=1.2, color='grey', hold='True');

In [20]: xlim(min(indices)-1, max(indices)+1);

In [21]: plot(indices, mobility_all*(max(entropy)/mean(mobility_all)), color='b',
   ....: linewidth=2);
   ....: 
../../_images/entropy_mobility.png

Writing PDB files

We can also write PDB with b-factor column replaced by entropy and mobility values respectively. We can then load the PDB structure in VMD or PyMol to see the distribution of entropy and mobility on the structure.

In [22]: selprot = pdb.select('protein and resid 1 to 121')

In [23]: resindex = selprot.getResindices()

In [24]: index = unique(resindex)

In [25]: count = 0

In [26]: entropy_prot = []

In [27]: mobility_prot = []

In [28]: for ind in index:
   ....:     while(count < len(resindex)):
   ....:         if(ind == resindex[count]):
   ....:             entropy_prot.append(entropy[ind])
   ....:             mobility_prot.append(mobility_all[ind]*100)
   ....:         count = count + 1
   ....: 

In [29]: selprot.setBetas(entropy_prot)

In [30]: writePDB('2W5I_entropy.pdb', selprot)
Out[30]: '2W5I_entropy.pdb'

In [31]: selprot.setBetas(mobility_prot)

In [32]: writePDB('2W5I_mobility.pdb', selprot)
Out[32]: '2W5I_mobility.pdb'