Atom Selections

This part gives more information on properties of AtomGroup objects. We start by making the necessary imports. Note that every documentation page contains them so that the code within each page can be executed independently. You can skip them if you have already run them in your Python session.

In [1]: from prody import *

In [2]: from pylab import *

In [3]: ion()

Atom Selections

AtomGroup instances have a plain view of atoms for efficiency, but they are coupled with a powerful atom selection engine. You can get well-defined atom subsets by passing simple keywords, or make rather sophisticated selections using composite statements. Selection keywords and grammar are very similar to those found in VMD. Some examples are shown here:

Keyword selections

Now, we parse a structure. This could be any structure, one that you know well from your research, for example.

In [4]: structure = parsePDB('1p38')

In [5]: protein = structure.select('protein')

In [6]: protein
Out[6]: <Selection: 'protein' from 1p38 (2833 atoms)>

Using the "protein" keyword, we selected 2833 atoms out of 2962 atoms. The Atomic.select() method returned a Selection instance. Note that all get and set methods defined for AtomGroup objects are also defined for Selection objects. For example:

In [7]: protein.getResnames()
Out[7]: array(['GLU', 'GLU', 'GLU', ..., 'ASP', 'ASP', 'ASP'], dtype='|S6')

Select by name/type

We can select backbone atoms by passing atom names following the "name" keyword:

In [8]: backbone = structure.select('protein and name N CA C O')

In [9]: backbone
Out[9]: <Selection: 'protein and name N CA C O' from 1p38 (1404 atoms)>

Alternatively, we can use "backbone" to make the same selection:

In [10]: backbone = structure.select('backbone')

We select acidic and basic residues by using residue names with the "resname" keyword:

In [11]: charged = structure.select('resname ARG LYS HIS ASP GLU')

In [12]: charged
Out[12]: <Selection: 'resname ARG LYS HIS ASP GLU' from 1p38 (906 atoms)>

Alternatively, we can use the predefined keywords “acidic” and “basic”:

In [13]: charged = structure.select('acidic or basic')

In [14]: charged
Out[14]: <Selection: 'acidic or basic' from 1p38 (906 atoms)>

In [15]: set(charged.getResnames())
Out[15]: {'ARG', 'ASP', 'GLU', 'HIS', 'LYS'}
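
To make the keyword shorthand concrete, here is a minimal pure-Python sketch of how a residue keyword can be thought of as a predefined set of residue names. The ACIDIC and BASIC tables, the toy resnames list, and the select_resnames helper are hypothetical names used only for illustration; this is not how ProDy implements keywords internally.

```python
# Hypothetical keyword tables; the members mirror the residue set shown above.
ACIDIC = {"ASP", "GLU"}
BASIC = {"ARG", "LYS", "HIS"}

# Toy per-atom residue names (made-up data, not read from 1p38).
resnames = ["GLU", "GLU", "ALA", "LYS", "GLY", "ASP", "HIS", "VAL"]


def select_resnames(resnames, allowed):
    """Return the indices of atoms whose residue name is in `allowed`."""
    return [i for i, name in enumerate(resnames) if name in allowed]


# 'acidic or basic' corresponds to the union of the two name sets.
by_keyword = select_resnames(resnames, ACIDIC | BASIC)

# 'resname ARG LYS HIS ASP GLU' lists the same names explicitly.
by_name = select_resnames(resnames, {"ARG", "LYS", "HIS", "ASP", "GLU"})
```

Both calls pick the same atoms, which is why the two selections above report the same atom count.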

Composite selections

Let’s try a more sophisticated selection. We first calculate the geometric center of the protein atoms using the calcCenter() function. Then, we select the Cα and Cβ atoms of residues that have at least one atom within 10 Å of the geometric center.

In [16]: center = calcCenter(protein).round(3)

In [17]: center
Out[17]: array([ 1.005, 17.533, 40.052])

In [18]: sel = structure.select('protein and name CA CB and same residue as '
   ....:                        '((x-1)**2 + (y-17.5)**2 + (z-40.0)**2)**0.5 < 10')
   ....: 

In [19]: sel
Out[19]: <Selection: 'protein and nam...)**2)**0.5 < 10' from 1p38 (66 atoms)>

Alternatively, this selection could be done as follows:

In [20]: sel = structure.select('protein and name CA CB and same residue as '
   ....:                        'within 10 of center', center=center)
   ....: 

In [21]: sel
Out[21]: <Selection: 'index 576 579 5... 1687 1707 1710' from 1p38 (66 atoms)>
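
The logic behind "same residue as within 10 of center" can be sketched in plain Python. The toy atom list, coordinates, and the helper names geometric_center and select_ca_cb_near below are assumptions made for illustration; ProDy's selection engine does this for you and this sketch is not its implementation.

```python
import math

# Toy atoms: (residue number, atom name, (x, y, z)). Coordinates are made up.
atoms = [
    (1, "N", (0.0, 0.0, 0.0)),
    (1, "CA", (1.5, 0.0, 0.0)),
    (1, "CB", (2.0, 1.4, 0.0)),
    (2, "N", (9.0, 0.0, 0.0)),
    (2, "CA", (10.5, 0.0, 0.0)),
    (2, "CB", (11.0, 1.4, 0.0)),
    (3, "CA", (30.0, 0.0, 0.0)),
    (3, "CB", (31.0, 1.4, 0.0)),
]


def geometric_center(coords):
    """Unweighted mean of a sequence of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def select_ca_cb_near(atoms, point, cutoff):
    """CA/CB atoms of residues with at least one atom within `cutoff` of `point`."""
    # Step 1 ('same residue as within ...'): collect residues that have
    # any atom closer than the cutoff.
    near = {res for res, _, xyz in atoms if math.dist(xyz, point) < cutoff}
    # Step 2 ('name CA CB'): keep only CA and CB atoms of those residues.
    return [(res, name) for res, name, _ in atoms
            if res in near and name in ("CA", "CB")]


# A fixed reference point keeps the example easy to follow by hand.
point = (1.0, 0.0, 0.0)
selected = select_ca_cb_near(atoms, point, 10.0)
```

Note that the CB of residue 2 lies slightly more than 10 units from the reference point, yet it is selected because another atom of the same residue is within the cutoff. This is the effect of "same residue as" in the selection string.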

Selections simplified

In interactive sessions, an alternative to typing .select('protein') or .select('backbone') is to use the dot operator:

In [22]: protein = structure.protein

In [23]: protein
Out[23]: <Selection: 'protein' from 1p38 (2833 atoms)>

You can use the dot operator multiple times:

In [24]: bb = structure.protein.backbone

In [25]: bb
Out[25]: <Selection: '(backbone) and (protein)' from 1p38 (1404 atoms)>

This may go on and on:

In [26]: ala_ca = structure.protein.backbone.resname_ALA.calpha

In [27]: ala_ca
Out[27]: <Selection: '(calpha) and ((...and (protein)))' from 1p38 (26 atoms)>
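
One way such attribute chaining could be built in Python is with the __getattr__ hook, which is called when normal attribute lookup fails. The ToySelection class below is a hypothetical stand-in that only tracks selection strings; it is a sketch of the idea, not ProDy's actual code.

```python
class ToySelection:
    """Hypothetical stand-in that records a selection string and nothing else."""

    def __init__(self, selstr=None):
        self.selstr = selstr  # None plays the role of the whole structure

    def __getattr__(self, name):
        # Only reached for names that are not regular attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        # Underscores stand in for spaces: resname_ALA -> 'resname ALA'.
        new = name.replace("_", " ")
        if self.selstr is None:
            return ToySelection(new)
        # Each step narrows the previous selection with 'and'.
        return ToySelection("({}) and ({})".format(new, self.selstr))


structure = ToySelection()
bb = structure.protein.backbone
ala_ca = structure.protein.backbone.resname_ALA.calpha
```

In this sketch, bb.selstr is '(backbone) and (protein)', and every further attribute wraps the previous string in another "and" clause, so long chains produce deeply nested selection strings.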

More examples

There is much more to what you can do with this flexible and fast atom selection engine, without the need for writing nested loops with comparisons or changing the source code. See the following pages:

Operations on Selections

Selection objects can be used with bitwise operators:

Union

Let’s select β-carbon atoms for non-GLY amino acid residues, and α-carbons for GLYs in two steps:

In [28]: betas = structure.select('name CB and protein')

In [29]: len(betas)
Out[29]: 336

In [30]: gly_alphas = structure.select('name CA and resname GLY')

In [31]: len(gly_alphas)
Out[31]: 15

The above shows that the p38 structure contains 15 GLY residues.

These two selections can be combined as follows:

In [32]: betas_gly_alphas = betas | gly_alphas

In [33]: betas_gly_alphas
Out[33]: <Selection: '(name CB and pr...nd resname GLY)' from 1p38 (351 atoms)>

In [34]: len(betas_gly_alphas)
Out[34]: 351

The selection string for the union of selections becomes:

In [35]: betas_gly_alphas.getSelstr()
Out[35]: '(name CB and protein) or (name CA and resname GLY)'

Note that it is also possible to obtain the same selection directly with the selection string '(name CB and protein) or (name CA and resname GLY)'.

Intersection

It is just as easy to get the intersection of two selections. Let’s find charged and medium-sized residues in a protein:

In [36]: charged = structure.select('charged')

In [37]: charged
Out[37]: <Selection: 'charged' from 1p38 (906 atoms)>

In [38]: medium = structure.select('medium')

In [39]: medium
Out[39]: <Selection: 'medium' from 1p38 (751 atoms)>

In [40]: medium_charged = medium & charged

In [41]: medium_charged
Out[41]: <Selection: '(medium) and (charged)' from 1p38 (216 atoms)>

In [42]: medium_charged.getSelstr()
Out[42]: '(medium) and (charged)'

Let’s see which amino acids are considered charged and medium:

In [43]: set(medium_charged.getResnames())
Out[43]: {'ASP'}

What about amino acids that are medium or charged?

In [44]: set((medium | charged).getResnames())
Out[44]: {'ARG', 'ASN', 'ASP', 'CYS', 'GLU', 'HIS', 'LYS', 'PRO', 'THR', 'VAL'}

Inversion

It is also possible to invert a selection:

In [45]: only_protein = structure.select('protein')

In [46]: only_protein
Out[46]: <Selection: 'protein' from 1p38 (2833 atoms)>

In [47]: only_non_protein = ~only_protein

In [48]: only_non_protein
Out[48]: <Selection: 'not (protein)' from 1p38 (129 atoms)>

In [49]: water = structure.select('water')

In [50]: water
Out[50]: <Selection: 'water' from 1p38 (129 atoms)>

The above shows that 1p38 does not contain any non-water hetero atoms.
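
The union, intersection, and inversion operators behave like set operations on atom indices. The sketch below models this with a hypothetical IndexSelection class built on Python's operator hooks (__or__, __and__, __invert__). The class, its attributes, and the toy indices are assumptions for illustration; only the selection string formats are taken from the outputs shown above.

```python
class IndexSelection:
    """Hypothetical selection: a set of atom indices plus a selection string."""

    def __init__(self, n_atoms, indices, selstr):
        self.n_atoms = n_atoms            # size of the parent structure
        self.indices = frozenset(indices)
        self.selstr = selstr

    def __len__(self):
        return len(self.indices)

    def __or__(self, other):              # union: a | b
        return IndexSelection(self.n_atoms, self.indices | other.indices,
                              "({}) or ({})".format(self.selstr, other.selstr))

    def __and__(self, other):             # intersection: a & b
        return IndexSelection(self.n_atoms, self.indices & other.indices,
                              "({}) and ({})".format(self.selstr, other.selstr))

    def __invert__(self):                 # inversion: ~a
        rest = frozenset(range(self.n_atoms)) - self.indices
        return IndexSelection(self.n_atoms, rest,
                              "not ({})".format(self.selstr))


# Toy structure with 10 atoms: indices 0-6 are protein, 7-9 are water.
protein = IndexSelection(10, range(0, 7), "protein")
water = IndexSelection(10, range(7, 10), "water")
charged = IndexSelection(10, [0, 1, 2, 3], "charged")
medium = IndexSelection(10, [2, 3, 4], "medium")

medium_charged = medium & charged
medium_or_charged = medium | charged
non_protein = ~protein
```

Here non_protein holds exactly the water indices, which mirrors how the matching atom counts above show that the inverted protein selection and the water selection cover the same atoms.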

Addition

Another operation defined on Selection objects (and on other AtomPointer-derived classes) is addition.

This may be useful if you want to output the atoms of an AtomGroup in a specific order. As a simple case, let’s write the atoms of 1p38 with the water atoms first:

In [51]: protein = structure.select('protein')

In [52]: water = structure.select('water')

In [53]: water_protein = water + protein

In [54]: writePDB('1p38_water_protein.pdb', water_protein)
Out[54]: '1p38_water_protein.pdb'

In the resulting file, the water atoms will precede the protein atoms.
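
The difference between addition and union can be sketched with plain index lists. This assumes that a union returns atoms in their original structure order while addition keeps the order of its operands; the toy atom names and indices are made up for illustration.

```python
# Toy atom table: index -> atom name (made-up data).
atom_names = {0: "N", 1: "CA", 2: "C", 3: "O", 4: "OW1", 5: "OW2"}

protein_idx = [0, 1, 2, 3]
water_idx = [4, 5]

# Addition modeled as concatenation: operand order is preserved.
water_protein = water_idx + protein_idx

# Union modeled as a set operation: the result does not depend on operand order.
union = sorted(set(water_idx) | set(protein_idx))

# Names in the order they would be written out after addition.
ordered_names = [atom_names[i] for i in water_protein]
```

With this model, swapping the operands of the addition changes the output order, whereas swapping the operands of the union does not.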

Membership

Selections also allow membership test operations:

In [55]: backbone = structure.select('backbone')

In [56]: calpha = structure.select('calpha')

Is calpha a subset of backbone?

In [57]: calpha in backbone
Out[57]: True

Or, is water in protein selection?

In [58]: water in protein
Out[58]: False

Other tests include:

In [59]: protein in structure
Out[59]: True

In [60]: backbone in structure
Out[60]: True

In [61]: structure in structure
Out[61]: True

In [62]: calpha in calpha
Out[62]: True

Equality

You can also check the equality of selections. Comparison will return True if both selections refer to the same atoms.

In [63]: calpha = structure.select('protein and name CA')

In [64]: calpha2 = structure.select('calpha')

In [65]: calpha == calpha2
Out[65]: True
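
Membership and equality can both be read as statements about sets of atom indices: "a in b" asks whether a is a subset of b, and "a == b" asks whether both refer to the same atoms. The AtomSet class below is a hypothetical model of that behavior using __contains__ and __eq__, with made-up indices; it is a sketch, not ProDy's implementation.

```python
class AtomSet:
    """Hypothetical selection modeled as an immutable set of atom indices."""

    def __init__(self, indices):
        self.indices = frozenset(indices)

    def __contains__(self, other):
        # 'other in self' is True when every atom of other is also in self.
        return other.indices <= self.indices

    def __eq__(self, other):
        # Equal when both refer to exactly the same atoms.
        return isinstance(other, AtomSet) and self.indices == other.indices

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self.indices)


# Toy structure: atoms 0-7 are protein, 8-9 are water (made-up indices).
structure = AtomSet(range(10))
protein = AtomSet(range(8))
backbone = AtomSet([0, 1, 2, 3, 4, 5])
calpha = AtomSet([1, 4])
calpha2 = AtomSet([4, 1])  # same atoms, built differently
water = AtomSet([8, 9])

calpha_in_backbone = calpha in backbone
water_in_protein = water in protein
same_atoms = calpha == calpha2
```

Because a set is always a subset of itself, this model also reproduces the results "structure in structure" and "calpha in calpha" shown above.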