# Sequence-Structure Comparison¶

The part shows how to compare sequence conservation properties with structural mobility obtained from Gaussian network model (GNM) calculations.

```
In [1]: from prody import *
In [2]: from matplotlib.pylab import *
In [3]: ion() # turn interactive mode on
```

## Entropy Calculation¶

First, we retrieve MSA for protein for protein family PF00074:

```
In [4]: fetchPfamMSA('PF00074')
Out[4]: 'PF00074_full.sth'
```

We parse the MSA file:

```
In [5]: msa = parseMSA('PF00074_full.sth')
```

Then, we refine it using `refineMSA()`

based on the sequence of
RNAS1_BOVIN:

```
In [6]: msa_refine = refineMSA(msa, label='RNAS1_BOVIN', seqid=0.98)
```

We calculate the entropy for the refined MSA:

```
In [7]: entropy = calcShannonEntropy(msa_refine)
```

## Mobility Calculation¶

Next, we obtain residue fluctuations or mobility for protein member of the above family. We will use chain B of 2W5I.

```
In [8]: pdb = parsePDB('2W5I', chain='B')
In [9]: chB_ca = pdb.select('protein and name CA and resid 1 to 121')
```

We perform GNM as follows:

```
In [10]: gnm = GNM('2W5I')
In [11]: gnm.buildKirchhoff(chB_ca)
In [12]: gnm.calcModes(n_modes=None) # calculate all modes
```

Now, let’s obtain residue mobility using the slowest mode, the slowest 8 modes, and all modes:

```
In [13]: mobility_1 = calcSqFlucts(gnm[0])
In [14]: mobility_1to8 = calcSqFlucts(gnm[:8])
In [15]: mobility_all = calcSqFlucts(gnm[:])
```

See Gaussian Network Model (GNM) for details.

## Comparison of mobility and conservation¶

We use the above data to compare structural mobility and degree of conservation. We can calculate a correlation coefficient between the two quantities:

```
In [16]: result = corrcoef(mobility_all, entropy)
In [17]: result.round(3)[0,1]
Out[17]: 0.403
```

We can plot the two curves simultaneously to visualize the correlation. We have to scale the values of mobility to display them in the same plot.

### Plotting¶

```
In [18]: indices = range(1,122)
In [19]: bar(indices, entropy, width=1.2, color='grey');
In [20]: xlim(min(indices)-1, max(indices)+1);
In [21]: plot(indices, mobility_all*(max(entropy)/max(mobility_all)), color='b',
....: linewidth=2);
....:
```

## Writing PDB files¶

We can also write PDB with b-factor column replaced by entropy and mobility values respectively. We can then load the PDB structure in VMD or PyMol to see the distribution of entropy and mobility on the structure.

```
In [22]: selprot = pdb.select('protein and resid 1 to 121')
In [23]: resindex = selprot.getResindices()
In [24]: entropy_prot = [entropy[ind] for ind in resindex]
In [25]: mobility_prot = [mobility_all[ind]*10 for ind in resindex]
In [26]: selprot.setBetas(entropy_prot)
In [27]: writePDB('2W5I_entropy.pdb', selprot)
Out[27]: '2W5I_entropy.pdb'
In [28]: selprot.setBetas(mobility_prot)
In [29]: writePDB('2W5I_mobility.pdb', selprot)
Out[29]: '2W5I_mobility.pdb'
```

We can see on the structure just as we could in the bar graph that there is some correlation with highly conserved (low entropy) regions having low mobility and high entropy regions have higher mobility.