ProDy Basics

This tutorial aims to teach basic data structures and functions in ProDy.

First, we need to import required packages:

These import commands load numpy, matplotlib, and ProDy into memory. confProDy is used to modify ProDy's default behaviors. Here we turn off auto_show so that plots can be drawn in the same figure, and we turn on auto_secondary so that secondary structure information is parsed whenever we load a PDB file into ProDy. See here for a complete list of behaviors that can be changed by this function. The function only needs to be called once, and ProDy will remember the settings.

Tip: how to access the documentation

Loading PDB files and visualization

ProDy comes with many functions that can be used to fetch data from the Protein Data Bank.

The standard way to do this is with parsePDB, which will download a PDB file if needed and load it into a variable of a special data type called an AtomGroup:

To visualize the structure, we do the following:

If you would like to display the 3D structure using other packages or your own code, you can get the 3D coordinates via the getCoords method, which returns a NumPy ndarray:

We can also visualize the contact map as follows:
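To illustrate what a contact map encodes, here is a minimal NumPy sketch that builds one directly from an array of 3D coordinates such as the one returned by getCoords. It is not ProDy's own implementation; the function name `contact_map`, the 10 Å cutoff, and the toy coordinates are assumptions made for this example.

```python
import numpy as np

def contact_map(coords, cutoff=10.0):
    """Return a boolean N x N matrix that is True where two points lie within cutoff."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors via broadcasting: shape (N, N, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of points: shape (N, N)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist <= cutoff

# Toy coordinates (in angstroms) standing in for a real (N, 3) coordinate array
toy_coords = np.array([[0.0, 0.0, 0.0],
                       [3.8, 0.0, 0.0],
                       [20.0, 0.0, 0.0]])
cmap = contact_map(toy_coords, cutoff=10.0)
```

The resulting matrix is symmetric, and its diagonal is always True because every point is at distance zero from itself.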

An AtomGroup is essentially a collection of protein atoms. Each atom can be indexed in the following way:

This will give you the $11^{th}$ atom from p38, because Python indexing starts from 0. We can also examine the spatial location of this atom by querying its coordinates, which we can also use to highlight the atom in a plot.

We could select a chain, e.g. chain A, of the protein by indexing using its identifier, as follows:

In many cases, it is more convenient to examine the structure with residue numbers, and AtomGroup supports indexing with a chain ID and a residue number:

This will give you the residue with residue number 10, which is an arginine in p38. Please note the difference between this and indexing with a bare integer, which returns an atom by its position rather than a residue by its number.
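The difference between a positional atom index and a residue number can be sketched in plain Python. The records below are toy data, and `residue_atoms` is a hypothetical helper written for this illustration; neither is part of ProDy.

```python
# Toy atom records: (chain ID, residue number, residue name, atom name).
# Residue numbering deliberately starts at 4, since PDB numbering need not start at 1.
atoms = [
    ("A", 4, "GLU", "N"),
    ("A", 4, "GLU", "CA"),
    ("A", 5, "ARG", "N"),
    ("A", 5, "ARG", "CA"),
    ("B", 4, "SER", "N"),
]

# Positional index: the third atom has index 2 because counting starts at 0
third_atom = atoms[2]

def residue_atoms(records, chain, resnum):
    """Collect every atom whose chain ID and residue number both match."""
    return [rec for rec in records if rec[0] == chain and rec[1] == resnum]

# Residue lookup: all atoms of residue number 4 in chain A
res_a4 = residue_atoms(atoms, "A", 4)
```

Here index 2 points at an atom of residue 5, while the lookup for residue number 4 returns two atoms, so the two kinds of number are not interchangeable.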

Note that some ProDy objects may not support indexing using a chain identifier or a residue number. In such cases, we can first obtain a hierarchical view of the object:

We can then index the HierView with a chain identifier and residue number, which is always supported:

Retrieving data from an AtomGroup

Many properties of the protein can be acquired by functions named like "getxxx". For instance, we can obtain the B-factors by:

In this way, we can obtain the B-factor for every single atom. However, in some cases, we only need to know the B-factors of alpha-carbons. We have a shortcut for this:
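Conceptually, restricting per-atom values to alpha-carbons amounts to filtering by atom name. The sketch below shows this in plain Python with made-up toy values; it does not use ProDy, and the variable names are ours.

```python
# Toy per-atom data for two residues: atom names and matching B-factors
names = ["N", "CA", "C", "O", "N", "CA", "C", "O"]
betas = [20.1, 18.5, 19.0, 22.3, 25.4, 24.0, 24.8, 27.9]

# Keep only the B-factors whose atom is an alpha-carbon (name "CA")
ca_betas = [b for name, b in zip(names, betas) if name == "CA"]
```

The filtered list has one value per residue, which is what makes it convenient for per-residue plots.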

If we would like to use the residue numbers from the PDB file, instead of the indices, as the x-axis of the plot, it is much more convenient to use the ProDy plotting function showAtomicLines.

We can also obtain the secondary structure information as an array:

To make it easier to read, we can convert the array into a string using Python's built-in string method, join:
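As a small illustration of join, the following uses a toy list of one-letter secondary structure codes in place of the real array; the codes and variable names are made up for this example.

```python
# Toy secondary structure codes, one per residue
secstr = ["C", "C", "H", "H", "H", "H", "C", "E", "E", "E", "C"]

# join is called on the separator string; an empty separator concatenates the codes
secstr_string = "".join(secstr)

# Once we have a string, ordinary string methods apply, e.g. counting helical residues
n_helix = secstr_string.count("H")
```

Note that join is called on the separator, not on the sequence being joined.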

C is for coil, H for alpha helix, I for pi helix, G for 3-10 helix, and E for beta strand (sheet).

To get a complete list of "get" functions, you can type p38.get<TAB>. We provide a cell for doing this here:

The measure module contains various additional functions for calculating structural properties. For example, you can calculate the phi angle of the 11th residue:

Note that the residue at the N-terminus or C-terminus does not have a Phi or Psi angle, respectively.
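To make this note concrete: the phi angle of residue i is the dihedral defined by the atoms C(i-1), N(i), CA(i), and C(i), so the first residue has no preceding C atom to complete it (and, symmetrically, the last residue lacks the following N atom needed for psi). Below is a minimal NumPy sketch of a dihedral calculation from four points. The function name `dihedral` and the toy coordinates are ours, and this is not ProDy's implementation.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Return the signed dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1              # from the second point back to the first
    b1 = p2 - p1              # central bond, the rotation axis
    b2 = p3 - p2              # from the third point to the fourth
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Toy geometries: the fourth point is rotated about the z-axis relative to the first
cis = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1])
right_angle = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
trans = dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
```

For phi, the four points would be the coordinates of C(i-1), N(i), CA(i), and C(i), in that order.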

If we calculate the Phi and Psi angles for every non-terminal residue, we can obtain a Ramachandran plot for a protein. An example Ramachandran plot for human PCNA is shown below:

Three favored regions are shown in red -- upper left: beta sheet; center left: alpha helix; center right: left-handed helix. Each blue data point corresponds to the two dihedrals of a residue. We will reproduce this plot for ubiquitin (only the points).

In the above code, we use an exception handler to exclude the terminal residues from the calculation.
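The exception-handling pattern described here can be sketched without ProDy. In the example below, `toy_phi` and `toy_psi` are hypothetical stand-ins that raise ValueError for a terminal residue, and the angle table is made-up toy data. The exception type raised by a real calculation is an assumption here and should be checked against the library.

```python
# Toy (phi, psi) table in degrees; None marks an angle that is undefined at a terminus
toy_angles = [(None, 140.0), (-60.0, -45.0), (-120.0, 130.0), (-65.0, None)]

def toy_phi(angles, i):
    """Hypothetical phi lookup: fails for the first residue."""
    if i == 0:
        raise ValueError("the first residue has no phi angle")
    return angles[i][0]

def toy_psi(angles, i):
    """Hypothetical psi lookup: fails for the last residue."""
    if i == len(angles) - 1:
        raise ValueError("the last residue has no psi angle")
    return angles[i][1]

points = []
for i in range(len(toy_angles)):
    try:
        phi = toy_phi(toy_angles, i)
        psi = toy_psi(toy_angles, i)
    except ValueError:
        # Terminal residue: skip it instead of stopping the whole loop
        continue
    points.append((phi, psi))
```

Only the two interior residues contribute a point, which is the behavior we want for a Ramachandran plot.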

Selection

In theory, you could retrieve any set of atoms by indexing the AtomGroup, but doing so would be cumbersome. To make this more convenient, ProDy provides a VMD-like syntax for selecting atoms. Here are a few common selection strings. For a more complete tutorial on selection, please see here.

We could also perform some simple selections right when the structure is being parsed. For example, we can specify that we would like to obtain only alpha-carbons of chain A of p38 as follows:

We could find the chain A using selection (as an alternative to the indexing method shown above):

Selection also works for finding a single residue or multiple residues:

We can also select a range of residues as follows:

If we have data associated with the full length of the protein, we can slice the data using sliceAtomicData:
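The idea behind slicing full-length data to a subset of residues can be sketched in plain Python. This is only a conceptual illustration and not how sliceAtomicData is implemented; `slice_by_resnum`, the inclusive range, and the toy values are assumptions made for this example.

```python
# Toy per-residue data: residue numbers 4..13 and one value per residue
resnums = list(range(4, 14))
values = [float(i) for i in range(10)]

def slice_by_resnum(resnums, values, first, last):
    """Keep the values whose residue number lies in [first, last], inclusive."""
    return [v for r, v in zip(resnums, values) if first <= r <= last]

# Values for residues 6, 7, and 8
subset = slice_by_resnum(resnums, values, 6, 8)
```

The key requirement is that the data and the residue numbers are aligned element by element, so that selecting residues also selects the matching data.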

We can visualize the data of this range using showAtomicLines:

Or highlight the subset in the plot of the whole protein:

Selection also allows us to extract particular amino acid types:

Again, combined with sliceAtomicData and showAtomicLines, we can highlight these residues in the plot of the whole protein:

Compare and align structures

You can also compare different structures using some of the methods in the proteins module. Let’s parse another p38 MAP kinase structure.

You can find similar chains in structures 1p38 and 1zz2 using the matchChains function:

In Python, a tuple (or any other iterable object) can be unpacked as follows:
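As a generic illustration of unpacking, the tuple below has four items, like the match discussed in this section. All of its values are placeholders chosen for the example, not real output.

```python
# A toy 4-tuple: two mappings, a sequence identity, and a coverage (placeholder values)
match = ("mapping_1", "mapping_2", 99.0, 96.0)

# One name per item; the number of names must equal the number of items
first, second, seqid, overlap = match

# Star-unpacking collects the remaining items into a list
head, *rest = match
```

If the number of names on the left does not match the number of items, Python raises a ValueError, which is a quick check that the tuple has the shape we expect.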

The first two terms are the mapping of the proteins to each other,

the third term is the sequence identity,

and the fourth term is the sequence coverage or overlap:

If we calculate RMSD right now, we will obtain the value for the unsuperposed proteins:

After superposition, the RMSD will be much lower:
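To show why superposition lowers the RMSD, here is a minimal NumPy sketch using the Kabsch algorithm on toy coordinates. It is an illustration of least-squares rigid-body fitting in general and not ProDy's implementation; the function names `rmsd` and `kabsch_superpose` and all coordinates are assumptions made for this example.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two (N, 3) arrays of paired points."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def kabsch_superpose(mobile, target):
    """Return mobile after the rigid-body move that best fits it onto target."""
    mob_center = mobile.mean(axis=0)
    tar_center = target.mean(axis=0)
    P = mobile - mob_center              # centered mobile points
    Q = target - tar_center              # centered target points
    H = P.T @ Q                          # 3 x 3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (R @ P.T).T + tar_center

# Toy target structure, and a copy rotated 90 degrees about z and then translated
target = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0],
                   [0.0, 1.5, 1.0],
                   [2.0, 0.5, 2.5]])
theta = np.pi / 2
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
mobile = target @ rot_z.T + np.array([5.0, -3.0, 2.0])

rmsd_before = rmsd(mobile, target)
rmsd_after = rmsd(kabsch_superpose(mobile, target), target)
```

Because the toy copy differs from the target only by a rigid-body move, the RMSD after fitting drops to essentially zero. For two real structures, the remaining RMSD reflects genuine conformational differences.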

We can also visualize the superposition of the full proteins, because the transformation is applied to the entire structure:

Advanced Visualization

Using matplotlib, we only obtained a very simple linear representation of proteins. ProDy also supports a more sophisticated way of visualizing proteins in 3D via py3Dmol:

The limitation is that py3Dmol only works in an IPython notebook. You can always write out the protein to a PDB file and visualize it in an external program: