# -*- coding: utf-8 -*-
"""This module defines a class for selecting subsets of atoms. You can read
this page in interactive sessions using ``help(select)``.

ProDy offers a fast and powerful atom selection class, :class:`.Select`.
Selection features, grammar, and keywords are similar to those of VMD.
Small differences, which are described below, should not affect most
practical uses of atom selections. With the added flexibility of Python, the
ProDy selection engine can also be used to identify intermolecular contacts.
You may see this and other usage examples in :ref:`contacts` and
:ref:`selection-operations`.

First, we import everything from ProDy and parse a protein-DNA-ligand
complex structure:

.. ipython:: python

   from prody import *
   p = parsePDB('3mht')

:func:`.parsePDB` returns an :class:`.AtomGroup` instance, ``p`` in this case,
that stores all atomic data in the file. We can count different types of
atoms using :ref:`flags` and the :meth:`~.AtomGroup.numAtoms` method as
follows:

.. ipython:: python

   p.numAtoms('protein')
   p.numAtoms('nucleic')
   p.numAtoms('hetero')
   p.numAtoms('water')

The last two counts suggest that the ligand has 26 atoms, i.e. the number of
:term:`hetero` atoms less the number of :term:`water` atoms.

Atom flags
-------------------------------------------------------------------------------

We select subsets of atoms using the :meth:`.AtomGroup.select` method.
All :ref:`flags` can be input arguments to this method as follows:

.. ipython:: python

   p.select('protein')
   p.select('water')

This operation returns a :class:`.Selection` instance, which can be passed
to functions that accept an *atoms* argument.

Logical operators
-------------------------------------------------------------------------------

Flags can be combined using ``'and'`` and ``'or'`` operators:

.. ipython:: python

   p.select('protein and water')

``'protein and water'`` did not select :term:`protein` and :term:`water`
atoms, because no atom is flagged as a protein and a water atom at the same
time.

.. note::

   **Interpreting selection strings**

   You may think of a selection string, such as ``'protein and water'``, as
   being evaluated on a per-atom basis: an atom is selected if it satisfies
   the given criterion. To select both water and protein atoms, the ``'or'``
   logical operator should be used instead. A protein or a water atom would
   satisfy the ``'protein or water'`` criterion.

.. ipython:: python

   p.select('protein or water')

We can also use the ``'not'`` operator to negate an atom flag. For example,
the following selection will only select ligand atoms:

.. ipython:: python

   p.select('not water and hetero')

If you omit the ``'and'`` operator, you will get the same result:

.. ipython:: python

   p.select('not water hetero')

.. note::

   The **default operator** between two flags, or other selection tokens
   that will be discussed later, is ``'and'``. For example,
   ``'not water hetero'`` is equivalent to ``'not water and hetero'``.
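
The per-atom reading of selection strings can be illustrated without ProDy.
The following is a minimal sketch in plain Python, not the ProDy
implementation; the atom records and the ``pick`` helper are made up for
illustration.

```python
# Hypothetical atom records; each flag is a per-atom boolean.
atoms = [
    {'name': 'CA', 'protein': True, 'water': False, 'hetero': False},
    {'name': 'O', 'protein': False, 'water': True, 'hetero': True},
    {'name': 'C1', 'protein': False, 'water': False, 'hetero': True},
]


def pick(criterion):
    # Return names of atoms for which the criterion is true.
    return [atom['name'] for atom in atoms if criterion(atom)]


# 'protein and water': no single atom carries both flags
both = pick(lambda a: a['protein'] and a['water'])
# 'protein or water': an atom with either flag is selected
either = pick(lambda a: a['protein'] or a['water'])
# 'not water hetero' is read as 'not water and hetero'
ligand = pick(lambda a: not a['water'] and a['hetero'])

print(both, either, ligand)
```

An empty result in this sketch corresponds to the case where
``'protein and water'`` matches no atoms.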

We can select Cα atoms of acidic residues by omitting the default logical
operator as follows:

.. ipython:: python

   sel = p.select('acidic calpha')
   sel
   set(sel.getResnames())

Quick selections
-------------------------------------------------------------------------------

For simple selections, such as those shown above, the following may be
preferable to the :meth:`~.AtomGroup.select` method:

.. ipython:: python

   p.acidic_calpha

The result is the same as using ``p.select('acidic calpha')``. The
underscore, ``_``, is treated as whitespace. The limitation of this approach
is that special characters cannot be used.

Atom data fields
-------------------------------------------------------------------------------

In addition to :ref:`flags`, :ref:`fields` can be used in atom selections
when combined with some values. For example, we can select Cα and Cβ atoms
of alanine residues as follows:

.. ipython:: python

   p.select('resname ALA name CA CB')

Note that we omitted the default ``'and'`` operator.

.. note::

   **Whitespace** or an **empty string** can be specified using ``'_'``.
   Atoms with empty string data fields, such as those with no chain
   identifiers or alternate location identifiers, can be selected using
   an underscore.

.. ipython:: python

   p.select('chain _')  # chain identifiers of all atoms are specified in 3mht
   p.select('altloc _')  # altloc identifiers for all atoms are empty

Numeric data fields can also be used to make selections:

.. ipython:: python

   p.select('ca resnum 1 2 3 4')

Residues with insertion codes are a special case. Residue numbers and
insertion codes can be specified together as follows:

* ``'resnum 5'`` selects residue 5 (all insertion codes)
* ``'resnum 5A'`` selects residue 5 with insertion code A
* ``'resnum 5_'`` selects residue 5 with no insertion code
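
The three forms listed above can be sketched in plain Python as follows. This
is an illustration of the stated rules only, not ProDy code; ``parse_resnum``
and ``matches`` are hypothetical helper names.

```python
import re


def parse_resnum(token):
    # Split a token such as '5', '5A', or '5_' into (number, icode).
    # icode is None when any insertion code is accepted and '' when the
    # residue must have no insertion code.
    match = re.fullmatch('(-?[0-9]+)([A-Za-z_]?)', token)
    if match is None:
        raise ValueError('invalid residue number: {0!r}'.format(token))
    number, icode = int(match.group(1)), match.group(2)
    if icode == '':
        return number, None   # 'resnum 5': all insertion codes
    if icode == '_':
        return number, ''     # 'resnum 5_': no insertion code
    return number, icode      # 'resnum 5A': insertion code A


def matches(token, resnum, icode):
    # Test a residue (number and insertion code) against a token.
    number, wanted = parse_resnum(token)
    return resnum == number and (wanted is None or wanted == icode)


print(matches('5', 5, 'A'), matches('5A', 5, ''), matches('5_', 5, ''))
```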

Number ranges
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A range of numbers can be specified using ``'to'`` or Python-style slicing
with ``':'``:

.. ipython:: python

   p.select('ca resnum 1to4')
   p.select('ca resnum 1:4')
   p.select('ca resnum 1:4:2')

.. note::

   **Number ranges** specify continuous intervals:

   * ``'to'`` is inclusive at both ends, e.g. ``'resnum 1 to 4'`` means
     ``'1 <= resnum <= 4'``
   * ``':'`` is left inclusive, e.g. ``'resnum 1:4'`` means
     ``'1 <= resnum < 4'``

   Consecutive use of ``':'``, however, specifies a discrete range of
   numbers, e.g. ``'resnum 1:4:2'`` means ``'resnum 1 3'``.
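
The three range forms can be compared side by side with a small sketch in
plain Python. It follows the rules stated in the note above and is not the
ProDy parser; ``in_range`` is a hypothetical helper.

```python
def in_range(value, spec):
    # Test a number against a range such as '1 to 4', '1:4', or '1:4:2'.
    if 'to' in spec:
        start, stop = (float(s) for s in spec.split('to'))
        return start <= value <= stop        # both ends inclusive
    parts = [float(s) for s in spec.split(':')]
    if len(parts) == 2:
        start, stop = parts
        return start <= value < stop         # right end excluded
    start, stop, step = parts
    # discrete values start, start + step, ... that are below stop
    return start <= value < stop and (value - start) % step == 0


inclusive = [n for n in range(6) if in_range(n, '1 to 4')]
half_open = [n for n in range(6) if in_range(n, '1:4')]
stepped = [n for n in range(6) if in_range(n, '1:4:2')]
print(inclusive, half_open, stepped)
```

Because the first two forms are continuous intervals, a non-integer value
such as 2.5 also satisfies ``'1:4'`` in this sketch.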

Special characters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following characters can be specified when using :ref:`fields` for atom
selections::

  abcdefghijklmnopqrstuvwxyz
  ABCDEFGHIJKLMNOPQRSTUVWXYZ
  0123456789
  ~@#$.:;_',

For example, ``"name C' N` O~ C$ C#"`` is a valid selection string.

.. note::

   **Special characters** (``~!@#$%^&*()-_=+[{}]\|;:,<>./?()'"``) must be
   escaped by enclosing them in grave accent characters (backticks).

Negative numbers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Negative numbers and number ranges must also be escaped using grave accent
characters, since the negative sign ``'-'`` is considered a special character
unless it indicates a subtraction operation (see below).

.. ipython:: python

   p.select('x `-25 to 25`')
   p.select('x `-22.542`')

Omitting the grave accent characters will cause a :exc:`.SelectionError`.

Regular expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, you can specify regular expressions to select atoms based on
data fields of string type. The following will select residues whose names
start with the capital letter A:

.. ipython:: python

   sel = p.select('resname "A.*"')
   set(sel.getResnames())

.. note::

   **Regular expressions** can be specified using double quotes, ``"..."``.
   For more information on regular expressions see :mod:`re`.
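
The effect of such a pattern can be previewed with the :mod:`re` module
alone. This is a minimal sketch with made-up residue names; it assumes that
the pattern is applied from the start of each value, as ``re.match`` does,
which agrees with the description above but is not taken from the ProDy
source.

```python
import re

resnames = ['ALA', 'ARG', 'GLY', 'ASP', 'SAH']   # made-up residue names
pattern = re.compile('A.*')

# assumption: matching is anchored at the start of each value
selected = sorted(name for name in set(resnames) if pattern.match(name))
print(selected)
```

Note that ``'SAH'`` contains an ``A`` but is not selected under this
assumption, since its name does not start with one.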

Numerical comparisons
-------------------------------------------------------------------------------

:ref:`fields` with numeric types can be used as operands in numerical
comparisons:

.. ipython:: python

   p.select('x < 0')
   p.select('occupancy = 1')

==========  =================================
Comparison  Description
==========  =================================
<           less than
>           greater than
<=          less than or equal
>=          greater than or equal
==          equal
=           equal
!=          not equal
==========  =================================

It is also possible to chain comparison statements as follows:

.. ipython:: python

   p.select('-10 <= x < 0')

This would be the same as the following selection:

.. ipython:: python

   p.select('-10 <= x and x < 0') == p.select('-10 <= x < 0')
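
Chained comparisons in selection strings read the same way as chained
comparisons in Python itself. A small sketch with made-up x coordinates,
independent of ProDy:

```python
xs = [-12.5, -10.0, -3.2, 0.0, 4.1]    # made-up x coordinates

# '-10 <= x < 0'
chained = [x for x in xs if -10 <= x < 0]
# '-10 <= x and x < 0'
explicit = [x for x in xs if -10 <= x and x < 0]

print(chained == explicit, chained)
```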

Furthermore, numerical comparisons may involve the following operations:

=========  ==================================
Operation  Description
=========  ==================================
x ** y     x to the power y
x ^ y      x to the power y
x * y      x times y
x / y      x divided by y
x // y     x divided by y (floor division)
x % y      x modulo y
x + y      x plus y
x - y      x minus y
=========  ==================================

These operations must be used with a numerical comparison, e.g.

.. ipython:: python

   p.select('x ** 2 < 10')
   p.select('x ** 2 ** 2 < 10')

Finally, the following functions can be used in numerical comparisons:

========  ===================================
Function  Description
========  ===================================
abs(x)    absolute value of x
acos(x)   arccos of x
asin(x)   arcsin of x
atan(x)   arctan of x
ceil(x)   smallest integer not less than x
cos(x)    cosine of x
cosh(x)   hyperbolic cosine of x
floor(x)  largest integer not greater than x
exp(x)    e to the power x
log(x)    natural logarithm of x
log10(x)  base 10 logarithm of x
sin(x)    sine of x
sinh(x)   hyperbolic sine of x
sq(x)     square of x
sqrt(x)   square-root of x
tan(x)    tangent of x
tanh(x)   hyperbolic tangent of x
========  ===================================

.. ipython:: python

   p.select('sqrt(sq(x) + sq(y) + sq(z)) < 100')  # within 100 Å of origin
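
The expression in the last selection is the distance of an atom from the
origin. As a plain Python sketch with made-up coordinates (``sq`` here is a
local helper that mirrors the selection function of the same name):

```python
from math import sqrt


def sq(value):
    # square of a number, as in the selection function sq(x)
    return value ** 2


coords = [(3.0, 4.0, 0.0), (60.0, 60.0, 60.0), (0.0, 0.0, 99.0)]

# 'sqrt(sq(x) + sq(y) + sq(z)) < 100', evaluated atom by atom
near = [(x, y, z) for x, y, z in coords
        if sqrt(sq(x) + sq(y) + sq(z)) < 100]
print(near)
```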

Distance based selections
-------------------------------------------------------------------------------

Atoms within a user specified distance (Å) from a set of user specified atoms
can be selected using the ``'within . of .'`` keyword, e.g.
``'within 5 of water'`` selects atoms that are within 5 Å of water molecules.
This selection includes the water atoms as well.

You can avoid selecting the specified atoms using ``'exwithin . of .'``,
e.g. ``'exwithin 5 of water'`` will not select water molecules and is
equivalent to ``'within 5 of water and not water'``.

.. ipython:: python

   p.select('exwithin 5 of water') == p.select('not water within 5 of water')
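
The difference between the two keywords can be sketched with a brute-force
distance search in plain Python. The atoms below are made up, the helper
names are hypothetical, and treating the cutoff as inclusive is an
assumption; ProDy itself imports a ``KDTree`` class for this purpose.

```python
from math import dist   # available in Python 3.8 and later

# made-up atoms: (name, kind, coordinates)
atoms = [
    ('OW', 'water', (0.0, 0.0, 0.0)),
    ('CA', 'protein', (3.0, 0.0, 0.0)),
    ('CB', 'protein', (9.0, 0.0, 0.0)),
]


def within(radius, kind):
    # atoms within radius of any atom of the given kind, those atoms included
    refs = [xyz for _, k, xyz in atoms if k == kind]
    return [atom for atom in atoms
            if any(dist(atom[2], ref) <= radius for ref in refs)]


def exwithin(radius, kind):
    # same as within, but the reference atoms themselves are excluded
    return [atom for atom in within(radius, kind) if atom[1] != kind]


within_names = [atom[0] for atom in within(5, 'water')]
exwithin_names = [atom[0] for atom in exwithin(5, 'water')]
print(within_names, exwithin_names)
```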

Sequence selections
-------------------------------------------------------------------------------

One-letter amino acid sequences can be used to make atom selections.
``'sequence SAR'`` will select **SER-ALA-ARG** residues in a chain. Note
that the selection does not consider connectivity within a chain. Regular
expressions can also be used to make selections: ``'sequence "MI.*DKQ"'``
will select the **MET-ILE-(XXX)n-ASP-LYS-GLN** pattern, if present.

.. ipython:: python

   sel = p.select('ca sequence "MI.*DKQ"')
   sel
   sel.getResnames()
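
A sequence pattern is matched against the one-letter sequence of a chain,
and matching positions are mapped back to residues. The following is a
minimal sketch of that idea with a made-up chain and a partial three-letter
to one-letter table; it is not the ProDy implementation.

```python
import re

# partial three-letter to one-letter map, enough for this example
AA1 = {'ALA': 'A', 'MET': 'M', 'ILE': 'I', 'GLU': 'E',
       'ASP': 'D', 'LYS': 'K', 'GLN': 'Q'}

resnames = ['ALA', 'MET', 'ILE', 'GLU', 'ASP', 'LYS', 'GLN', 'ALA']
sequence = ''.join(AA1[name] for name in resnames)

# collect residue positions covered by each match of the pattern
positions = []
for match in re.finditer('MI.*DKQ', sequence):
    positions.extend(range(match.start(), match.end()))

matched = [resnames[i] for i in positions]
print(sequence, matched)
```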

Expanding selections
-------------------------------------------------------------------------------

A selection can be expanded to include the atoms in the same *residue*,
*chain*, or *segment* using ``same .. as ..``, e.g.
``'same residue as exwithin 4 of water'`` will select residues that have
at least one atom within 4 Å of any water molecule.

.. ipython:: python

   p.select('same residue as exwithin 4 of water')

Additionally, a selection may be expanded to the immediately bonded atoms
using ``bonded [n] to ...``, e.g. ``bonded 1 to calpha`` will select atoms
bonded to Cα atoms. For this to work, bonds must be set by the user
using the :meth:`.AtomGroup.setBonds` or :meth:`.AtomGroup.inferBonds` method.
It is also possible to select bonded atoms while excluding the originating
atoms using ``exbonded [n] to ...``. The number ``'[n]'`` indicates the number
of bonds to consider from the originating selection.
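
Expansion over *n* bonds amounts to a breadth-first walk on the bond graph.
Below is a minimal sketch in plain Python with a made-up chain of five atoms;
``bonded`` and ``exbonded`` are illustrative helpers here, not ProDy
functions.

```python
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]   # made-up linear chain of 5 atoms


def bonded(n, start):
    # atoms reachable from start within n bonds, start atoms included
    neighbors = {}
    for i, j in bonds:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    selected = set(start)
    frontier = set(start)
    for _ in range(n):
        # step one bond outwards, skipping atoms that are already selected
        frontier = {k for i in frontier
                    for k in neighbors.get(i, ())} - selected
        selected |= frontier
    return selected


def exbonded(n, start):
    # same as bonded, but the originating atoms are excluded
    return bonded(n, start) - set(start)


print(bonded(1, {2}), exbonded(1, {2}), bonded(2, {0}))
```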

Selection macros
-------------------------------------------------------------------------------

ProDy allows you to define a macro for any valid selection string. The
following functions manipulate selection macros:

* :func:`defSelectionMacro`
* :func:`delSelectionMacro`
* :func:`getSelectionMacro`
* :func:`isSelectionMacro`

.. ipython:: python

   defSelectionMacro('alanine', 'resname ALA')
   p.select('alanine') == p.select('resname ALA')

You can also use this macro as follows:

.. ipython:: python

   p.alanine

Macros are stored permanently in the ProDy configuration file. You can
delete them if you wish as follows:

.. ipython:: python

   delSelectionMacro('alanine')

Keyword arguments
-------------------------------------------------------------------------------

The :meth:`~.Select.select` method also accepts keyword arguments that can
simplify some selections. Consider the following case where you want to
select some protein atoms that are close to the center of the protein:

.. ipython:: python

   protein = p.protein
   calcCenter(protein).round(2)
   sel1 = protein.select('sqrt(sq(x--21.17) + sq(y-35.86) + sq(z-79.97)) < 5')
   sel1

Instead, you could pass a keyword argument and use the keyword in the
selection string:

.. ipython:: python

   sel2 = protein.select('within 5 of center', center=calcCenter(protein))
   sel2
   sel1 == sel2

Note that the selection string for *sel2* lists indices of atoms. This
substitution is performed automatically to ensure reproducibility of the
selection without the keyword *center*.

Keywords cannot be reserved words (see :func:`.listReservedWords`) and must
consist of alphanumeric characters only."""
import sys
from re import compile as re_compile
try:
    from collections.abc import Iterable
except ImportError:  # Python 2
    from collections import Iterable
import numpy as np
from numpy import array, ndarray, ones, zeros, arange
from numpy import invert, unique, concatenate, all, any
from numpy import logical_and, logical_or, floor, ceil, where
import pyparsing as pp
from prody import LOGGER, SETTINGS, PY2K
from .atomic import Atomic
from .fields import ATOMIC_FIELDS
from .flags import PLANTERS as FLAG_PLANTERS
from .atomgroup import AtomGroup
from .chain import Chain, getSequence
from .atomic import AAMAP
from .pointer import AtomPointer
from .selection import Selection
from .segment import Segment
from .atommap import AtomMap
from prody.utilities import rangeString
from prody.kdtree import KDTree
if PY2K:
range = xrange
DEBUG = 0
NUMB = 0 # Select instance will not really evaluate string for the atoms
TIMER = 0
def debug(sel, loc, *args):

    if DEBUG:
        print('')
        if args:
            print(args[0], args[1:])
        print(repr(sel))
        print(' ' * (loc + 1) + '^')
__all__ = ['Select', 'SelectionError', 'SelectionWarning',
'defSelectionMacro', 'delSelectionMacro', 'getSelectionMacro',
'isSelectionMacro']
ATOMGROUP = None
MACROS = SETTINGS.get('selection_macros', {})
MACROS_REGEX = None
def isSelectionMacro(word):
    """Returns **True** if *word* is a user defined selection macro."""

    try:
        return word in MACROS
    except TypeError:  # an unhashable *word* cannot be a macro name
        return False
def defSelectionMacro(name, selstr):
    """Define selection macro *selstr* with name *name*.  Both *name* and
    *selstr* must be strings.  An existing keyword cannot be used as a macro
    name.  If a macro with the given *name* exists, it will be overwritten.

    .. ipython:: python

       defSelectionMacro('cbeta', 'name CB and protein')"""

    if not isinstance(name, str) or not isinstance(selstr, str):
        raise TypeError('both name and selstr must be strings')
    elif isReserved(name):
        raise ValueError('{0} is a reserved word and cannot be used as a '
                         'macro name'.format(repr(name)))
    elif not (name.isalpha() and name.islower()):
        raise ValueError('macro names must be all lower case letters, {0} '
                         'is not a valid macro name'.format(repr(name)))

    LOGGER.info('Testing validity of selection string.')
    try:
        ATOMGROUP.select(selstr)
    except SelectionError:
        LOGGER.warn('{0} is not a valid selection string, macro {1} is not '
                    'defined.'.format(repr(selstr), repr(name)))
    else:
        LOGGER.info("Macro {0} is defined as {1}."
                    .format(repr(name), repr(selstr)))
        MACROS[name] = selstr
        SETTINGS['selection_macros'] = MACROS
        SETTINGS.save()
def delSelectionMacro(name):
    """Delete the macro *name*.

    .. ipython:: python

       delSelectionMacro('cbeta')"""

    try:
        MACROS.pop(name)
    except (KeyError, TypeError):
        LOGGER.warn("Macro {0} is not found.".format(repr(name)))
    else:
        if MACROS_REGEX is not None:
            MACROS_REGEX.pop(name, None)
        LOGGER.info("Macro {0} is deleted.".format(repr(name)))
        SETTINGS['selection_macros'] = MACROS
        SETTINGS.save()
def getSelectionMacro(name=None):
    """Returns the definition of the macro *name*.  If *name* is not given,
    returns a copy of the selection macros dictionary."""

    if name is None:
        return MACROS.copy()
    try:
        return MACROS[name]
    except KeyError:
        LOGGER.info("{0} is not a user defined macro name."
                    .format(repr(name)))
def replaceMacros(selstr):
    """Returns *selstr* with user defined macros replaced by their
    parenthesized definitions."""

    global MACROS_REGEX
    if MACROS_REGEX is None:
        MACROS_REGEX = {}
    selstr = ' ' + selstr + ' '
    if MACROS:
        for name, macro in MACROS.items():  # PY3K: OK
            # lookarounds leave the delimiters unconsumed, so neighboring
            # occurrences of a macro are all matched
            regex = MACROS_REGEX.setdefault(name,
                re_compile('(?<=[( )])' + name + '(?=[( )])'))
            # substituting in one pass avoids stale match offsets after the
            # string grows; a function replacement keeps *macro* verbatim
            selstr = regex.sub(lambda match: '(' + macro + ')', selstr)
    return selstr[1:-1]
def checkSelstr(selstr, what, error=ValueError):
    """Check whether *selstr* satisfies a selected condition.  For now, only
    the presence of coordinate/distance based selections is checked.  If
    *error* is a subclass of :exc:`Exception`, an exception will be raised,
    otherwise **True** or **False** will be returned."""

    selstr = selstr.replace('(', ' ( ')
    selstr = selstr.replace(')', ' ) ')

    if what in set(['dist']):
        for item in selstr.split():
            if item in XYZDIST:
                if isinstance(error, type) and issubclass(error, Exception):
                    raise error('invalid selection {0}, coordinate '
                                'based selections are not accepted'
                                .format(repr(selstr)))
                else:
                    return False
    return True
class SelectionError(Exception):

    """Exception raised when there are errors in the selection string."""

    def __init__(self, sel, loc=0, msg='', tkns=None):

        if tkns:
            for tkn in tkns:
                tkn = str(tkn)
                if sel.count(tkn, loc) == 1:
                    loc = sel.index(tkn, loc)

        msg = ("An invalid selection string is encountered:\n{0}\n"
               .format(repr(sel)) +
               ' ' * (loc + 1) + '^ ' + msg)
        Exception.__init__(self, msg)
class SelectionWarning(Warning):

    """A class used for issuing warning messages when potential typos are
    detected in a selection string.  Warnings are issued to ``sys.stderr``
    via the ProDy package logger.  Use :func:`.confProDy` to turn selection
    warnings *on* or *off*, e.g. ``confProDy(selection_warning=False)``."""

    def __init__(self, sel='', loc=0, msg='', tkns=None):

        if SETTINGS.get('selection_warning', True):

            if tkns:
                for tkn in tkns:
                    tkn = str(tkn)
                    if sel.count(tkn, loc) == 1:
                        loc = sel.index(tkn, loc)

            msg = ('Selection string contains typo(s):\n'
                   '{0}\n '.format(repr(sel)) +
                   ' ' * loc + '^ ' + msg)
            LOGGER.warn(msg)
FIELDS_SYNONYMS = {'chid': 'chain',
'fragment': 'fragindex',
'resid': 'resnum',
'secstr': 'secondary',
'segname': 'segment'}
XYZ2INDEX = {'x': 0, 'y': 1, 'z': 2}
FUNCTIONS = {
'sqrt' : np.sqrt,
'sq' : lambda num: np.power(num, 2),
'abs' : np.abs,
'floor' : np.floor,
'ceil' : np.ceil,
'sin' : np.sin,
'cos' : np.cos,
'tan' : np.tan,
'asin' : np.arcsin,
'acos' : np.arccos,
'atan' : np.arctan,
'sinh' : np.sinh,
'cosh' : np.cosh,
'tanh' : np.tanh,
'exp' : np.exp,
'log' : np.log,
'log10' : np.log10,
}
OPERATORS = {
'+' : np.add,
'-' : np.subtract,
'*' : np.multiply,
'/' : np.divide,
'%' : np.remainder,
'>' : np.greater,
'<' : np.less,
'>=' : np.greater_equal,
'<=' : np.less_equal,
'=' : np.equal,
'==' : np.equal,
'!=' : np.not_equal,
}
SAMEAS_MAP = {'residue': 'resindex', 'chain': 'chindex',
'segment': 'segindex', 'fragment': 'fragindex'}
# maybe a value to a field/data label or field label
XYZ = set(['x', 'y', 'z'])
XYZDIST = set(['x', 'y', 'z', 'within', 'exwithin'])
OR = pp.Keyword('or')
AND = pp.Keyword('and')
WORD = pp.Word(pp.alphanums + '''~@#$.:;_'`,''')
_ = list(FUNCTIONS)
FUNCNAMES = set(_)
kwfunc = pp.Keyword(_[0])
FUNCNAMES_OPLIST = kwfunc
FUNCNAMES_EXPR = ~kwfunc
for func in _[1:]:
kwfunc = pp.Keyword(func)
FUNCNAMES_OPLIST = FUNCNAMES_OPLIST | kwfunc
FUNCNAMES_EXPR += ~kwfunc
RE_SCHARS = re_compile(r'`[\w\W]*?`')
PP_SCHARS = pp.Regex(RE_SCHARS)
def specialCharsParseAction(sel, loc, token):
token = token[0][1:-1]
if not token:
raise SelectionError(sel, loc, '`` is invalid, no special characters')
if ':' in token or 'to' in token:
try:
token = PP_NRANGE.parseString(token)[0]
except pp.ParseException:
pass
return token
PP_SCHARS.setParseAction(specialCharsParseAction)
RE_REGEXP = re_compile(r'"[\w\W]*"')
PP_REGEXP = pp.Regex(RE_REGEXP.pattern)
def regularExpParseAction(sel, loc, token):
token = token[0]
if token == '""':
raise SelectionError(sel, loc, '"" is invalid, no regular expression')
try:
regexp = re_compile(token[1:-1])
except:
raise SelectionError(sel, loc, 'failed to compile regular '
'expression {0}'.format(repr(token)))
else:
return regexp
PP_REGEXP.setParseAction(regularExpParseAction)
_ = r'[-+]?\d+(\.\d*)?([eE]\d+)?'
RE_NRANGE = re_compile(_ + r'\ *(to|:)\ *' + _)
PP_NRANGE = pp.Group(pp.Regex(RE_NRANGE.pattern) +
                     pp.Optional(pp.Regex(r'(\ *:\ *' + _ + ')')))
def rangeParseAction(sel, loc, tokens):
tokens = tokens[0]
debug(sel, loc, '_nrange', tokens)
token = tokens[0]
sep = ':' if ':' in token else 'to'
first, last = token.split(sep)
try:
start = int(first)
except ValueError:
start = float(first)
try:
stop = int(last)
except ValueError:
stop = float(last)
if start > stop:
raise SelectionError(sel, loc, 'range start value ({0}) is greater '
'than stop value ({1})'
.format(repr(start), repr(stop)))
elif start == stop:
if sep == ':':
raise SelectionError(sel, loc, 'range start value ({0}) is greater '
'than or equal to stop value ({1})'
.format(repr(start), repr(stop)))
else:
return first
if sep == 'to':
comp = '<='
elif len(tokens) == 1:
comp = '<'
else:
try:
comp = int(tokens[1][1:])
except ValueError:
comp = float(tokens[1][1:])
return 'range', start, stop, comp
PP_NRANGE.setParseAction(rangeParseAction)
UNARY = set(['not', 'bonded', 'exbonded', 'within', 'exwithin', 'same'])
class Select(object):
"""Select subsets of atoms based on a selection string.
See :mod:`~.atomic.select` module documentation for selection grammar
and examples. This class makes use of pyparsing_ module."""
def __init__(self):
self._ag = None
self._atoms = None
self._indices = None
self._n_atoms = None
self._selstr = None
self._coords = None
self._kwargs = None
# set True when selection string alone cannot reproduce the selection
self._ss2idx = False
self._data = dict()
self._replace = False
self._parsers = {}
self._evalmap = {'resnum': self._resnum, 'resid': self._resnum,
'serial': self._serial, 'index': self._index,
'x': self._generic, 'y': self._generic, 'z': self._generic,
'chid': self._generic, 'secstr': self._generic,
'fragment': self._generic, 'fragindex': self._generic,
'segment': self._generic, 'sequence': self._sequence, }
def _reset(self):
self._ag = None
self._atoms = None
self._indices = None
self._n_atoms = None
self._coords = None
self._data.clear()
def _evalAtoms(self, atoms):
self._atoms = atoms
try:
self._ag = atoms.getAtomGroup()
except AttributeError:
self._ag = atoms
self._indices = None
else:
self._indices = atoms._getIndices()
if len(self._indices) == 1:
try:
index = atoms.getIndex()
except AttributeError:
pass
else:
self._atoms = Selection(self._ag, array([index]),
'index ' + str(index), atoms.getACSIndex())
    def select(self, atoms, selstr, **kwargs):
        """Returns a :class:`.Selection` of atoms matching *selstr*, or
        **None**, if the selection string does not match any atoms.

        :arg atoms: atoms to be evaluated
        :type atoms: :class:`.Atomic`

        :arg selstr: selection string
        :type selstr: str

        Note that, if *atoms* is an :class:`.AtomMap` instance, an
        :class:`.AtomMap` is returned, instead of a :class:`.Selection`.

        .. note::

           A special case for making atom selections is passing an
           :class:`.AtomMap` instance as the *atoms* argument.  Dummy
           atoms will not be included in the result, but the order
           of atoms will be preserved."""
self._ss2idx = False
self._replace = False
self._selstr = selstr
indices = self.getIndices(atoms, selstr, **kwargs)
self._kwargs = None
if len(indices) == 0:
return None
if not isinstance(atoms, AtomGroup):
indices = self._indices[indices]
ag = self._ag
try:
dummies = atoms.numDummies()
except AttributeError:
if self._ss2idx:
selstr = 'index {0}'.format(rangeString(indices))
else:
if self._replace:
for key, value in kwargs.items(): # PY3K: OK
if value in ag and key in selstr:
if value == ag:
ss = 'all'
else:
ss = value.getSelstr()
selstr = selstr.replace(key, '(' + ss + ')')
if isinstance(atoms, AtomPointer):
selstr = '({0}) and ({1})'.format(selstr,
atoms.getSelstr())
return Selection(ag, indices, selstr, atoms.getACSIndex(),
unique=True)
else:
return AtomMap(ag, indices, atoms.getACSIndex(), dummies=dummies,
title='Selection {0} from '.format(repr(selstr)) + str(atoms))
    def getIndices(self, atoms, selstr, **kwargs):
"""Returns indices of atoms matching *selstr*. Indices correspond to
the order in *atoms* argument. If *atoms* is a subset of atoms, they
should not be used for indexing the corresponding :class:`.AtomGroup`
instance."""
ss = selstr.strip()
if (len(ss.split()) == 1 and ss.isalnum() and ss not in MACROS):
self._evalAtoms(atoms)
if ss == 'none':
return array([])
elif ss == 'all':
return arange(atoms.numAtoms())
elif atoms.isFlagLabel(ss):
return atoms._getFlags(ss).nonzero()[0]
elif atoms.isDataLabel(ss) or ss in self._evalmap:
raise SelectionError(selstr, 0, 'must be followed by values',
[ss])
else:
raise SelectionError(selstr, 0, 'is not a valid selection '
'string', [ss])
else:
torf = self.getBoolArray(atoms, selstr, **kwargs)
return torf.nonzero()[0]
    def getBoolArray(self, atoms, selstr, **kwargs):
"""Returns a boolean array with **True** values for *atoms* matching
*selstr*. The length of the boolean :class:`numpy.ndarray` will be
equal to the length of *atoms* argument."""
if not isinstance(atoms, Atomic):
raise TypeError('atoms must be an Atomic instance, not {0}'
.format(type(atoms)))
self._reset()
for key in kwargs:
if not key.isalnum():
raise TypeError('{0} is not a valid keyword argument, '
'keywords must be all alpha numeric '
'characters'.format(repr(key)))
if isReserved(key):
loc = selstr.find(key)
if loc > -1:
raise SelectionError(selstr, loc, '{0} is a reserved '
'word and cannot be used as a keyword argument'
.format(repr(key)))
self._n_atoms = atoms.numAtoms()
self._selstr = selstr
self._kwargs = kwargs
if DEBUG: print('getBoolArray', selstr)
self._evalAtoms(atoms)
selstr = selstr.strip()
if (len(selstr.split()) == 1 and selstr.isalnum() and
selstr not in MACROS):
if selstr == 'none':
return zeros(atoms.numAtoms(), bool)
elif selstr == 'all':
return ones(atoms.numAtoms(), bool)
elif atoms.isFlagLabel(selstr):
return atoms.getFlags(selstr)
elif atoms.isDataLabel(selstr):
raise SelectionError(selstr, 0, 'must be followed by values')
else:
raise SelectionError(selstr, 0, 'is not a valid selection or '
'user data label')
selstr = replaceMacros(selstr)
try:
parser = self._getParser(selstr)
tokens = parser(selstr, parseAll=True)
except pp.ParseException as err:
self._parsers.pop(self._parser, None)
pass
which = selstr.rfind(' ', 0, err.column)
if which > -1:
if selstr[which + 1] == '(':
msg = ('an arithmetic, comparison, or logical operator '
'must precede the opening parenthesis')
elif selstr[which - 1] == ')':
msg = ('an arithmetic, comparison, or logical operator '
'must follow the closing parenthesis')
else:
msg = 'parsing failed here'
else:
msg = 'parsing failed here'
raise SelectionError(selstr, err.column, msg + '\n' + str(err))
else:
if DEBUG: print('_evalSelstr', tokens)
torf = tokens[0]
if not isinstance(torf, ndarray):
if DEBUG: print(torf)
raise SelectionError(selstr)
elif torf.dtype != bool:
if DEBUG:
print('_select torf.dtype', torf.dtype, isinstance(torf.dtype,
bool))
raise SelectionError(selstr)
if DEBUG:
print('_select', torf)
return torf
def _getParser(self, selstr):
"""Returns an efficient parser that can handle *selstr*."""
alnum = selstr
alpha = selstr
for ch in selstr:
if not ch.isalnum(): alnum = alnum.replace(ch, ' ')
if not ch.isalpha(): alpha = alpha.replace(ch, ' ')
items = set(alnum.split())
chars = set(selstr)
funcs = 4 if items.intersection(FUNCNAMES) else 0
opers = 2 if chars.intersection(OPERATORS) else 0
logic = 1 if 'or' in items or '(' in chars else 0
schars = 8 if '`' in chars and RE_SCHARS.search(selstr) else 0
regexp = 16 if '"' in chars and RE_REGEXP.search(selstr) else 0
nrange = 32 if ((':' in chars or ' to ' in alpha) and
RE_NRANGE.search(selstr)) else 0
self._parser = key = (logic + opers + funcs,
logic + funcs + schars + regexp + nrange)
if key == (0, 0):
return self._noParser
try:
return self._parsers[key][0].parseString
except KeyError:
pass
word = ~AND + ~OR
oplist = []
if funcs:
oplist.append((FUNCNAMES_OPLIST, 1, pp.opAssoc.RIGHT, self._func))
# following causes 20% slow down
#word += FUNCNAMES_EXPR
if funcs or opers:
oplist.extend([
(pp.oneOf('+ -'), 1, pp.opAssoc.RIGHT, self._sign),
(pp.oneOf('** ^'), 2, pp.opAssoc.LEFT, self._pow),
(pp.oneOf('* / %'), 2, pp.opAssoc.LEFT, self._binop),
(pp.oneOf('+ -'), 2, pp.opAssoc.LEFT, self._binop),
(pp.oneOf('< > <= >= == = !='), 2, pp.opAssoc.LEFT,
self._comp)])
oplist.extend([
(pp.Optional(AND), 2, pp.opAssoc.LEFT, self._and),
(OR, 2, pp.opAssoc.LEFT, self._or)])
word += WORD
expr = word
if schars: expr = PP_SCHARS | expr
if regexp: expr = PP_REGEXP | expr
if nrange: expr = PP_NRANGE | expr
parser = pp.operatorPrecedence(expr, oplist)
parser.setParseAction(self._default)
parser.leaveWhitespace()
parser.enablePackrat()
self._parsers[key] = parser, expr, oplist
return parser.parseString
def _noParser(self, selstr, parseAll=True):
debug(selstr, 0, ['_noParser'])
return [self._default(selstr, 0, selstr.split())]
def _getZeros(self, subset=None):
"""Returns a bool array with zero elements."""
if subset is None:
return zeros(self._atoms.numAtoms(), bool)
else:
return zeros(len(subset), bool)
def _default(self, sel, loc, tokens):
debug(sel, loc, '_default', tokens)
if NUMB: return
if len(tokens) == 1:
torf, err = self._eval(sel, loc, tokens)
else:
torf, err = self._and2(sel, loc, tokens)
if err: raise err
return torf
def _eval(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_eval', tokens)
if NUMB: return
#if isinstance(tokens, ndarray):
# return tokens
if len(tokens) == 1:
token = tokens[0]
try:
dtype = token.dtype
except AttributeError:
pass
else:
return token, False
if token == 'none':
return zeros(self._n_atoms if subset is None else len(subset),
bool), False
elif token == 'all':
return ones(self._n_atoms if subset is None else len(subset),
bool), False
elif self._atoms.isFlagLabel(token):
return self._getFlags(token, subset), False
elif self._atoms.isDataLabel(token):
data, err = self._getData(sel, loc, token)
if subset is None:
return data, False
else:
return None, SelectionError(sel, loc, 'subset??')
return data[subset], False
try:
arg = self._kwargs[token]
except KeyError:
try:
return float(token), False
except ValueError:
data, err = self._getData(sel, loc, token)
if data is not None:
return data, False
if token in ATOMIC_FIELDS:
return None, SelectionError(sel, loc, '{0} data is '
'not found'.format(repr(token)), [token])
else:
return None, SelectionError(sel, loc, '{0} could '
'not be evaluated'.format(repr(token)), [token])
else:
if arg in self._ag:
try:
dummies = arg.numDummies()
except AttributeError:
try:
indices = arg._getIndices()
except AttributeError:
indices = arange(self._ag.numAtoms())
else:
if dummies:
indices = arg._getIndices()[arg.getFlags('mapped')]
else:
indices = arg._getIndices()
torf = zeros(self._ag.numAtoms(), bool)
torf[indices] = True
if self._indices is not None:
torf = torf[self._indices]
self._replace = True
return torf, False
return token, False
else:
return self._evalmap.get(tokens[0], self._generic)(
sel, loc, tokens, subset)
def _getFlags(self, label, subset):
if subset is None:
# get a copy to avoid alterations
return self._atoms.getFlags(label)
else:
return self._atoms._getFlags(label)[subset]
def _or(self, sel, loc, tokens):
debug(sel, loc, '_or', tokens)
tokens = tokens[0]
if NUMB: return
flags = []
torfs = []
evals = []
atoms = self._atoms
isFlagLabel = atoms.isFlagLabel
isDataLabel = atoms.isDataLabel
for token in tokens:
# check whether token is an array to avoid array == str comparison
try:
dtype = token.dtype
except AttributeError:
if token == 'or':
continue
elif isFlagLabel(token):
flags.append(token)
elif isDataLabel(token) or token in self._evalmap:
evals.append([])
evals[-1].append(token)
else:
try:
evals[-1].append(token)
except IndexError:
raise SelectionError(sel, loc)
else:
if dtype == bool:
torfs.append(token)
else:
try:
evals[-1].append(token)
except IndexError:
raise SelectionError(sel, loc)
torf = None
if torfs:
torf = torfs.pop(0)
while torfs:
ss = where(torf == 0)[0]
if len(ss) == 0: return torf
torf[ss] = torfs.pop(0)[ss]
if flags:
if torf is None:
torf = atoms.getFlags(flags.pop(0))
while flags:
ss = where(torf == 0)[0]
if len(ss) == 0: return torf
torf[ss] = atoms._getFlags(flags.pop(0))[ss]
if evals:
if torf is None:
tokens = evals.pop(0)
first = str(tokens[0])
torf, err = self._eval(sel, loc, tokens)
if err: raise err
try:
dtype = torf.dtype
except AttributeError:
raise SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
else:
if dtype != bool:
raise SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
while evals:
ss = where(torf == 0)[0]
if len(ss) == 0: return torf
tokens = evals.pop(0)
first = str(tokens[0])
arr, err = self._eval(sel, loc, tokens, subset=ss)
if err: raise err
try:
dtype = arr.dtype
except AttributeError:
raise SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
else:
if dtype != bool:
raise SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
torf[ss] = arr
return torf
def _and(self, sel, loc, tokens):
debug(sel, loc, '_and', tokens)
tokens = tokens[0]
torf, err = self._and2(sel, loc, tokens)
if err: raise err
return torf
def _and2(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_and2', tokens)
if NUMB: return
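# only string tokens can be the `and` operator; checking the type first
# also avoids array == str comparisons (a str is itself Iterable, so
# converting it to a list would never compare equal to 'and')
firsttoken, lasttoken = tokens[0], tokens[-1]
if ((isinstance(firsttoken, str) and firsttoken == 'and') or
(isinstance(lasttoken, str) and lasttoken == 'and')):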
return None, SelectionError(sel, loc, '{0} operator must be '
'surrounded with arguments'.format(repr('and')), [tokens[0]])
flags = []
torfs = []
evals = []
unary = []
atoms = self._atoms
isFlagLabel = atoms.isFlagLabel
isDataLabel = atoms.isDataLabel
append = None
wasand = False
while tokens:
# check whether token is an array to avoid array == str comparison
token = tokens.pop(0)
try:
dtype = token.dtype
except AttributeError:
if token == 'and':
if wasand:
return None, SelectionError(sel, loc, 'incorrect use '
'of `and` operator, expected {0}'
.format(repr('and ... and')), ['and', 'and'])
append = None
wasand = True
continue
elif isFlagLabel(token):
flags.append(token)
append = None
elif (isDataLabel(token) or token in self._evalmap or
token in ATOMIC_FIELDS):
# not evals, must start a new list
# last evals list must have more than one values
#
if not evals:
evals.append([])
append = evals[-1].append
elif len(evals[-1]) == 1:
if token in XYZ:
pass
else:
return None, SelectionError(sel, loc, '{0} '
'must be followed by values'
.format(evals[-1][0]), [evals[-1][0]])
elif token in XYZ:
if (not tokens or tokens[0] in self._evalmap or
tokens[0] in ATOMIC_FIELDS or
isDataLabel(tokens[0]) or tokens[0] == 'and'):
pass
else:
try:
float(tokens[0])
except ValueError as err:
pass
else:
evals.append([])
append = evals[-1].append
elif not tokens:
return None, SelectionError(sel, loc, '{0} '
'must be followed by values'
.format(token), [token])
else:
evals.append([])
append = evals[-1].append
append(token)
elif token in UNARY:
unary.append([])
append = unary[-1].append
if token == 'not':
append((token,))
elif token == 'same':
if len(tokens) < 3 or tokens[1] != 'as':
return None, SelectionError(sel, loc, 'incorrect '
'use of `same as` statement, expected {0}'
.format('same entity as ...'), [token])
append((token, tokens.pop(0), tokens.pop(0)))
elif token.endswith('within'):
if len(tokens) < 3 or tokens[1] != 'of':
return None, SelectionError(sel, loc, 'incorrect '
'use of `within` statement, expected {0}'
.format('[ex]within x.y of ...'), [token])
append((token, tokens.pop(0), tokens.pop(0)))
elif token.endswith('bonded'):
token2 = tokens.pop(0)
if len(tokens) < (1 + int(token2 == 'to')):
return None, SelectionError(sel, loc, 'incorrect '
'use of `bonded` statement, expected {0}'
.format('[ex]bonded [n] to ...'), [token2])
if token2 == 'to':
append((token, 'to'))
else:
append((token, token2, tokens.pop(0)))
anyargs = False
while tokens:
next = tokens[0]
try:
dtype = next.dtype
except AttributeError:
append(tokens.pop(0))
if next == 'and' or isFlagLabel(next):
break
if isDataLabel(next) or next in self._evalmap:
if anyargs:
break
else:
anyargs = True
else:
append(tokens.pop(0))
break
else:
try:
append(token)
except TypeError:
return None, SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(token)), [token])
else:
if dtype == bool:
torfs.append(token)
else:
return None, SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(token)), [token])
wasand = False
torf = None
if torfs:
torf = torfs.pop(0)
while torfs:
ss = torf.nonzero()[0]
if len(ss) == 0: return torf, False
torf[ss] = torfs.pop(0)[ss]
if flags:
if torf is None:
torf = atoms.getFlags(flags.pop(0))
while flags:
ss = torf.nonzero()[0]
if len(ss) == 0: return torf, False
torf[ss] = atoms._getFlags(flags.pop(0))[ss]
if unary:
if torf is None:
torf, err = self._unary(sel, loc, unary.pop(0))
if err: return None, err
while unary:
ss = torf.nonzero()[0]
if len(ss) == 0: return torf, False
arr, err = self._unary(sel, loc, unary.pop(0))
if err: return None, err
torf[ss] = arr[ss]
if evals:
if torf is None:
tokens = evals.pop(0)
first = str(tokens[0])
torf, err = self._eval(sel, loc, tokens, subset=subset)
if err: return None, err
try:
dtype = torf.dtype
except AttributeError:
return None, SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
else:
if dtype != bool:
return None, SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
while evals:
ss = torf.nonzero()[0]
if len(ss) == 0: return torf, False
tokens = evals.pop(0)
first = str(tokens[0])
arr, err = self._eval(sel, loc, tokens, subset=ss)
if err: return None, err
try:
dtype = arr.dtype
except AttributeError:
return None, SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
else:
if dtype != bool:
return None, SelectionError(sel, loc, 'a problem '
'occurred when evaluating token {0}'
.format(repr(first)), [first])
torf[ss] = arr
# ?? check torf.shape/ndim
if subset is None:
return torf, False
else:
return torf[subset], False
def _unary(self, sel, loc, tokens):
debug(sel, loc, '_unary', tokens)
if NUMB: return
what = tokens[0]
which = tokens[1:]
if not which:
return None, SelectionError(sel, loc, '{0} must be followed by '
'a selection argument'.format(repr(' '.join(what))), what)
if len(which) == 1:
which, err = self._eval(sel, loc, which)
else:
which, err = self._and2(sel, loc, which)
if err: raise err
tokens = [what, which]
if what[0] == 'not':
return self._not(sel, loc, tokens)
elif what[0] == 'same':
return self._sameas(sel, loc, tokens)
elif what[-1] == 'to':
return self._bondedto(sel, loc, tokens)
else:
return self._within(sel, loc, tokens)
def _not(self, sel, loc, tokens):
"""Negate selection."""
debug(sel, loc, '_not', tokens)
label, torf = tokens
return invert(torf, torf), False
def _within(self, sel, loc, tokens):
"""Perform distance based selection."""
debug(sel, loc, '_within', tokens)
label, which = tokens
within = label[1]
label = ' '.join(label)
try:
within = float(within)
except Exception as err:
return None, SelectionError(sel, loc, 'could not convert {0} in '
'{1} to float ({2})'.format(within, repr(label), str(err)),
[label, within])
exclude = label.startswith('ex')
other = False
try:
dtype = which.dtype
except AttributeError:
if which in self._kwargs:
coords = self._kwargs[which]
try:
ndim, shape = coords.ndim, coords.shape
except AttributeError:
try:
coords = coords._getCoords()
except AttributeError:
try:
coords = coords.getCoords()
except AttributeError:
return None, SelectionError(sel, loc,
'{0} must be a coordinate array or have '
'`getCoords` method'.format(repr(which)),
[label, which])
if coords is None:
return None, SelectionError(sel, loc,
'coordinates are not set for {0} ({1})'
.format(repr(which), repr(self._kwargs[which])),
[label, which])
else:
ndim, shape = coords.ndim, coords.shape
if ndim == 1 and shape[0] == 3:
coords = array([coords])
elif not (ndim == 2 and shape[1] == 3):
return None, SelectionError(sel, loc,
'{0} must be a coordinate array or have '
'`getCoords` method'.format(repr(which)),
[label, which])
exclude = False
self._ss2idx = True
which = arange(len(coords))
other = True
else:
if dtype == bool:
which = which.nonzero()[0]
coords = self._getCoords()
if coords is None:
return None, SelectionError(sel, loc, 'coordinates are '
'not set')
else:
return None, SelectionError(sel, loc, 'not understood')
if other or len(which) < 20:
kdtree = self._atoms._getKDTree()
get_indices = kdtree.getIndices
search = kdtree.search
get_count = kdtree.getCount
torf = zeros(self._ag.numAtoms(), bool)
for index in which:
search(within, coords[index])
if get_count():
torf[get_indices()] = True
if self._indices is not None:
torf = torf[self._indices]
if exclude:
torf[which] = False
else:
n_atoms = self._atoms.numAtoms()
torf = ones(n_atoms, bool)
torf[which] = False
check = torf.nonzero()[0]
torf = zeros(n_atoms, bool)
cxyz = coords[check]
kdtree = KDTree(coords[which])
search = kdtree.search
get_count = kdtree.getCount
select = []
append = select.append
for i, xyz in enumerate(cxyz):
search(within, xyz)
if get_count():
append(i)
torf[check[select]] = True
if not exclude:
torf[which] = True
return torf, False
def _sameas(self, sel, loc, tokens):
"""Evaluate ``'same entity as ...'`` expression."""
debug(sel, loc, '_sameas', tokens)
label, which = tokens
what = label[1]
label = ' '.join(label)
index = SAMEAS_MAP.get(what)
if index is None:
return None, SelectionError(sel, loc, 'entity in "same ... as" '
'must be one of "chain", "residue", "segment", or "fragment",'
' not {0}'.format(repr(what)), [label])
indices, err = self._getData(sel, loc, index)
if err: return None, err
iset = set(indices[which])
torf = array([i in iset for i in indices], bool)
return torf, False
def _bondedto(self, sel, loc, tokens):
"""Expand selection to immediately bonded atoms."""
debug(sel, loc, '_bondedto', tokens)
label, torf = tokens
token = label[1]
label = ' '.join(label)
if token == 'to':
repeat = 1
else:
try:
repeat = int(token)
except (TypeError, ValueError):
return None, SelectionError(sel, loc, '{0} in {1} could not '
'be converted to an integer'.format(token, repr(label)),
[label])
else:
if float(token) != repeat:
SelectionWarning(sel, loc, 'number in {0} should be an '
'integer'.format(repr(label)), [label])
if repeat <= 0:
SelectionWarning(sel, loc, 'number in {0} should be a '
'positive integer'.format(repr(label)), [label])
return zeros(self._atoms.numAtoms(), bool), False
bmap = self._ag._bmap
if bmap is None:
return None, SelectionError(sel, loc, 'bonds are not set',
[label])
which = torf.nonzero()[0]
if not len(which):
return torf, False
indices = self._indices
if indices is not None:
bmap = bmap[indices]
n_atoms = self._ag.numAtoms()
for i in range(repeat):
torf = zeros(n_atoms, bool)
bonded = unique(bmap[which])
if bonded[0] == -1:
torf[bonded[1:]] = True
else:
torf[bonded] = True
if indices is not None:
torf = torf[indices]
if label.startswith('ex'):
torf[which] = False
else:
torf[which] = True
if i + 1 < repeat:
which = torf.nonzero()[0]
return torf, False
def _getNumeric(self, sel, loc, arg, copy=False):
"""Returns numeric data or a number."""
debug(sel, loc, '_getNumeric', arg)
# arg may be an array, a string, or a regular expression
try:
dtype, ndim = arg.dtype, arg.ndim
except AttributeError:
pass
else:
# i don't expect that a string array may show up
if dtype == bool:
return None, SelectionError(sel, loc, 'operands and function '
'arguments must be numbers or numeric data labels')
else:
return arg, False
# no regular expressions
try:
pattern = arg.pattern
except AttributeError:
pass
else:
return None, SelectionError(sel, sel.index(pattern, loc),
'operands and function arguments cannot be regular expressions')
if arg in XYZ2INDEX:
coords = self._getCoords() # how about atoms._getCoords() ?
if coords is None:
return None, SelectionError(sel, loc,
'coordinates are not set')
else:
if copy:
return coords[:, XYZ2INDEX[arg]].copy(), False
else:
return coords[:, XYZ2INDEX[arg]], False
arg = FIELDS_SYNONYMS.get(arg, arg)
try:
if copy:
data = self._atoms.getData(arg)
else:
data = self._atoms._getData(arg)
except Exception as err:
return None, SelectionError(sel, loc, 'following exception '
'occurred when evaluating {0}: {1}'
.format(repr(arg), str(err)))
if data is not None:
if data.dtype.char in 'US':
return None, SelectionError(sel, loc, '{0} is not a numeric '
'data label'.format(repr(arg)))
else:
return data, False
if arg == 'index':
try:
if copy:
return self._atoms.getIndices(), False
else:
return self._atoms._getIndices(), False
except AttributeError:
return arange(self._atoms.numAtoms()), False
try:
return float(arg), False
except Exception as err:
return None, SelectionError(sel, loc, '{0} is not a number or a '
'numeric data label'
.format(repr(arg)))
def _comp(self, sel, loc, tokens):
"""Perform comparison."""
tokens = tokens[0]
debug(sel, loc, '_comp', tokens)
if NUMB: return
if len(tokens) >= 3 and len(tokens) % 2 != 1:
raise SelectionError(sel, loc,
'invalid number of operators and operands')
left, err = self._getNumeric(sel, loc, tokens.pop(0))
if err: raise err
torf = None
while tokens:
try:
binop = OPERATORS[tokens.pop(0)]
except KeyError:
raise SelectionError(sel, loc, 'invalid operator encountered')
right, err = self._getNumeric(sel, loc, tokens.pop(0))
if err: raise err
if torf is None:
torf = binop(left, right)
else:
logical_and(binop(left, right), torf, torf)
left = right
# check whether atomic data was contained in comparison
# i.e. len(atoms) == len(torf)
try:
ndim, shape = torf.ndim, torf.shape
except AttributeError:
raise SelectionError(sel, loc,
'comparison must contain atomic data')
else:
if ndim != 1 or shape[0] != self._atoms.numAtoms():
raise SelectionError(sel, loc,
'comparison must contain atomic data')
else:
return torf
def _binop(self, sel, loc, tokens):
"""Perform binary operation."""
tokens = tokens[0]
debug(sel, loc, '_binop', tokens)
if NUMB: return
if len(tokens) >= 3 and len(tokens) % 2 != 1:
raise SelectionError(sel, loc, 'invalid number of items')
left, err = self._getNumeric(sel, loc, tokens.pop(0), copy=True)
if err: raise err
while tokens:
binop = tokens.pop(0)
if binop not in OPERATORS:
raise SelectionError(sel, loc, 'invalid operator encountered')
right, err = self._getNumeric(sel, loc, tokens.pop(0))
if err: raise err
debug(sel, loc, binop, left, right)
if binop == '/' and any(right == 0.0):
raise SelectionError(sel, loc, 'zero division error')
binop = OPERATORS[binop]
try:
ndim = left.ndim
except AttributeError:
left = binop(left, right)
else:
# ndim must not be zero for in place operation
if ndim:
binop(left, right, left)
else:
left = binop(left, right)
return left
def _pow(self, sel, loc, tokens):
"""Perform power operation. Expected operands are numbers
and numeric atom attributes."""
tokens = tokens[0]
debug(sel, loc, '_pow', tokens)
if NUMB: return
base, err = self._getNumeric(sel, loc, tokens.pop(0))
if err: raise err
power, err = self._getNumeric(sel, loc, tokens.pop())
if err: raise err
if tokens.pop() not in ('^', '**'):
raise SelectionError(sel, loc, 'invalid power operator')
while tokens:
number, err = self._getNumeric(sel, loc, tokens.pop())
if err: raise err
power = number ** power
if tokens.pop() not in ('^', '**'):
raise SelectionError(sel, loc, 'invalid power operator')
return base ** power
def _sign(self, sel, loc, tokens):
"""Change the sign of a selection argument."""
tokens = tokens[0]
debug(sel, loc, '_sign', tokens)
if NUMB: return
if len(tokens) != 2:
raise SelectionError(sel, loc,
'sign operators (+/-) must be followed '
'by single keyword, e.g. "-x", "-beta"')
arg, err = self._getNumeric(sel, loc, tokens[1])
if err: raise err
if tokens[0] == '-':
arg = -arg
return arg
def _func(self, sel, loc, tokens):
"""Evaluate functions used in selection strings."""
tokens = list(tokens[0])
debug(sel, loc, '_func', tokens)
if NUMB: return
if len(tokens) != 2:
raise SelectionError(sel, loc, '{0} accepts a single numeric '
'argument, e.g. {0}(x)'.format(repr(tokens[0])))
arg, err = self._getNumeric(sel, loc, tokens[1])
if err: raise err
debug(sel, loc, tokens[0], arg)
return FUNCTIONS[tokens[0]](arg)
def _generic(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_generic', tokens)
label = tokens.pop(0)
data, err = self._getData(sel, loc, label)
if err: return None, err
if subset is not None:
data = data[subset]
subset = None
dtype = data.dtype
type_ = dtype.type
isstr = dtype.char == 'S' or dtype.char == 'U'
if isstr:
maxlen = int(dtype.str[2:])
torf = None
regexp = []
values = []
ranges = []
valset = True
for token in tokens:
# check for regular expressions which are only compatible with
# string data type
try:
token.pattern
except AttributeError:
pass
else:
if not isstr:
ptrn = '"{0}"'.format(token.pattern)
SelectionWarning(sel, loc, '{0} is a regular '
'expression and is not evaluated for {1}'
.format(ptrn, repr(label)), [label, ptrn])
else:
regexp.append(token)
continue
# check for ranges
if token[0] == 'range':
if isstr:
SelectionWarning(sel, loc, 'number ranges '
'are not evaluated with data type of {0}'
.format(repr(label)), [label, token[1]])
else:
if token[-1] in OPERATORS:
ranges.append(token)
else:
nrange = arange(*token[1:])
# if dtypes are not the same, don't use set method
if nrange.dtype != dtype: valset = False
values.extend(nrange)
continue
if isstr:
if token == '_':
values.append('')
values.append(' ')
else:
if len(token) > maxlen:
SelectionWarning(sel, loc, '{0} is longer than the '
'maximum characters for data field {1}'
.format(repr(token), repr(label)), [label, token])
values.append(token)
else:
try:
value = type_(token)
except Exception as err:
SelectionWarning(sel, loc, '{0} could not be '
'converted to type of {1} ({2})'
.format(repr(token), repr(label), str(err)),
[label, token])
continue
try:
val2 = float(token)
except (TypeError, ValueError):
pass
else:
if val2 != value:
SelectionWarning(sel, loc, '{0} has different '
'values when converted to a float and to type of '
'{1}'.format(repr(token), repr(label)),
[label, token])
values.append(value)
if values:
# use the set-based lookup only if values and data array have the same dtype
if valset and len(values) > 10:
valset = set(values)
torf = array([val in valset for val in data], bool)
else:
torf = data == values.pop(0)
for val in values:
subset = (torf == False).nonzero()[0]
if len(subset) == 0: return torf, False
torf[subset] = data[subset] == val
if ranges:
while ranges:
_, start, stop, comp = ranges.pop(0)
if torf is None:
torf = start <= data
else:
subset = (torf == False).nonzero()[0]
if len(subset) == 0: return torf, False
subdata = data[subset]
torf[subset] = start <= subdata
if subset is None:
torf = logical_and(torf, OPERATORS[comp](data, stop), torf)
else:
torf[subset] = logical_and(torf[subset],
OPERATORS[comp](subdata, stop))
if regexp:
for re in regexp:
if torf is None:
torf = array([re.match(val) is not None
for val in data], bool)
else:
subset = (torf == False).nonzero()[0]
if len(subset) == 0: return torf, False
torf[subset] = [re.match(val) is not None
for val in data[subset]]
if torf is None:
torf = self._getZeros(subset)
return torf, False
def _index(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_index', tokens)
label = tokens.pop(0)
torf = zeros(self._ag.numAtoms(), bool)
for token in tokens:
try:
remainder = token % 1.
except TypeError:
pass
else:
return None, SelectionError(sel, loc, 'unexpected numeric token '
'encountered when evaluating {0}'.format(repr(label)), [label])
if remainder == 0:
try:
torf[token] = True
except IndexError:
pass
else:
SelectionWarning(sel, loc, '{0} must be followed by '
'integers and/or number ranges'.format(repr(label)),
[label, token])
continue
try:
token.pattern
except AttributeError:
pass
else:
ptrn = '"{0}"'.format(token.pattern)
SelectionWarning(sel, loc, '{0} is a regular '
'expression and is not evaluated for {1}'
.format(ptrn, repr(label)), [label, ptrn])
continue
if token[0] == 'range':
_, start, stop, step = token
if step in OPERATORS:
if step == '<=':
stop += 1
step = 1
if start % 1.0 != 0 or stop % 1.0 != 0 or step % 1.0 != 0:
SelectionWarning(sel, loc, '{0} number ranges should be '
'specified by integers'.format(repr(label)),
[label, str(start)])
torf[start:stop:step] = True
continue
try:
val = float(token)
except (TypeError, ValueError):
return None, SelectionError(sel, loc, '{0} must be '
'followed by integers and/or number ranges'
.format(repr(label)), [label, token])
else:
if val % 1.0 == 0:
try:
torf[int(val)] = True
except IndexError:
pass
else:
SelectionWarning(sel, loc, '{0} must be followed by '
'integers and/or number ranges'.format(repr(label)),
[label, token])
try:
indices = self._atoms._getIndices()
except AttributeError:
pass
else:
torf = torf[indices]
if subset is not None:
torf = torf[subset]
return torf, False
def _serial(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_serial', tokens)
label = tokens.pop(0)
sn2i = self._ag._getSN2I()
if sn2i is None:
return None, SelectionError(sel, loc, 'serial numbers are not set',
['serial'])
torf = zeros(len(sn2i), bool)
for token in tokens:
try:
remainder = token % 1.
except TypeError:
pass
else:
return None, SelectionError(sel, loc, 'unexpected numeric token '
'encountered when evaluating {0}'.format(repr(label)), [label])
if remainder == 0:
try:
torf[token] = True
except IndexError:
pass
else:
SelectionWarning(sel, loc, '{0} must be followed by '
'integers and/or number ranges'.format(repr(label)),
[label, token])
continue
try:
pattern = token.pattern
except AttributeError:
pass
else:
ptrn = '"{0}"'.format(pattern)
SelectionWarning(sel, loc, '{0} is a regular '
'expression and is not evaluated for {1}'
.format(ptrn, repr(label)), [label, ptrn])
continue
if token[0] == 'range':
_, start, stop, step = token
if step in OPERATORS:
if step == '<=':
stop += 1
step = 1
if start % 1.0 != 0 or stop % 1.0 != 0 or step % 1.0 != 0:
SelectionWarning(sel, loc, '{0} number ranges should be '
'specified by integers'.format(repr(label)),
[label, str(start)])
torf[start:stop:step] = True
continue
try:
val = float(token)
except (TypeError, ValueError):
return None, SelectionError(sel, loc, '{0} must be '
'followed by integers and/or number ranges'
.format(repr(label)), [label, token])
else:
if val % 1.0 == 0:
try:
torf[int(val)] = True
except IndexError:
pass
else:
SelectionWarning(sel, loc, '{0} must be followed by '
'integers and/or number ranges'.format(repr(label)),
[label, token])
indices = sn2i[torf]
indices = indices[where(indices != -1)[0]]
if len(indices) == 0:
return self._getZeros(subset), False
torf = zeros(self._ag.numAtoms(), bool)
torf[indices] = True
try:
indices = self._atoms._getIndices()
except AttributeError:
pass
else:
torf = torf[indices]
if subset is not None:
torf = torf[subset]
return torf, False
def _resnum(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_resnum', tokens)
label = tokens.pop(0)
resnums, err = self._getData(sel, loc, 'resnum')
if err: return None, err
wicode = set([])
values = ['resnum']
for token in tokens:
try:
pattern = token.pattern
except AttributeError:
pass
else:
ptrn = '"{0}"'.format(pattern)
SelectionWarning(sel, loc, '{0} is a regular expression and '
'its use with {1} is not recommended'
.format(ptrn, repr(label)), [label, ptrn])
continue
if token[0] == 'range':
values.append(token)
continue
try:
value = float(token)
except (TypeError, ValueError):
icode = token[-1]
value = token[:-1]
try:
value = int(value)
except (TypeError, ValueError):
SelectionWarning(sel, loc, '{0} must be followed by '
'integers, number ranges, or integer and insertion '
'code combinations, e.g. {1}'
.format(repr(label), repr('10A')),
[label, token])
else:
wicode.add((value, '' if icode == '_' else icode))
else:
if value != int(value):
SelectionWarning(sel, loc, '{0} must be followed by '
'integers and/or number ranges'.format(repr(label)),
[label, token])
values.append(token)
torf = None
if len(values) > 1:
torf, _ = self._generic(sel, loc, values, subset)
if wicode:
icode, err = self._getData(sel, loc, 'icode')
if err: return None, err
if subset is None:
rnic = zip(resnums, icode) # PY3K: OK
else:
rnic = zip(resnums[subset], icode[subset]) # PY3K: OK
if torf is None:
torf = array([val in wicode for val in rnic], bool)
else:
torf = logical_or(torf, array([val in wicode for val in rnic],
bool), torf)
return torf, False
def _sequence(self, sel, loc, tokens, subset=None):
debug(sel, loc, '_sequence', tokens)
label = tokens.pop(0)
regexp = []
for token in tokens:
try:
token.pattern
except AttributeError:
if not token.isalpha() or not token.isupper():
SelectionWarning(sel, loc, '{0} does not look like a '
'valid sequence'.format(repr(token)),
[label, token])
try:
token = re_compile(token)
except Exception as err:
SelectionWarning(sel, loc, '{0} could not be compiled '
'as a regular expression for sequence evaluation'
.format(repr(token)), [label, token])
else:
regexp.append(token)
else:
regexp.append(token)
if not regexp:
return self._getZeros(subset), False
calpha = self._atoms.calpha
if calpha is None:
return self._getZeros(subset), False
matches = []
for chain in iter(HierView(calpha)):
sequence = chain.getSequence()
indices = chain._getIndices()
for re in regexp:
for match in re.finditer(sequence):
matches.extend(indices[match.start():match.end()])
if matches:
torf = zeros(self._ag.numAtoms(), bool)
torf[matches] = True
if self._indices is not None:
torf = torf[self._indices]
torf = self._sameas(sel, loc, [('same', 'residue', 'as'), torf])[0]
if subset is None:
return torf, False
else:
return torf[subset], False
else:
return self._getZeros(subset), False
def _getData(self, sel, loc, keyword):
"""Returns atomic data."""
data = self._data.get(keyword)
if data is not None:
return data, False
try:
idx = XYZ2INDEX[keyword]
except KeyError:
pass
else:
data = self._getCoords()
if data is not None:
data = data[:, idx]
self._data[keyword] = data
return data, False
if keyword == 'index':
try:
data = self._atoms._getIndices()
except AttributeError:
data = arange(self._atoms.numAtoms())
self._data['index'] = data
return data, False
field = ATOMIC_FIELDS.get(FIELDS_SYNONYMS.get(keyword, keyword))
if field is None:
data = self._atoms._getData(keyword)
if data is None:
return None, SelectionError(sel, loc, '{0} is not a valid '
'data label'.format(repr(keyword)))
elif not (isinstance(data, ndarray) and data.ndim == 1):
return None, SelectionError(sel, loc, '{0} is not a 1d '
'array'.format(repr(keyword)))
else:
try:
data = getattr(self._atoms, '_get' + field.meth_pl)()
except Exception as err:
return None, SelectionError(sel, loc, str(err))
if data is None:
return None, SelectionError(sel, loc, '{0} is not set'
.format(repr(keyword)))
self._data[keyword] = data
return data, False
def _getCoords(self):
"""Returns coordinates of atoms."""
if self._coords is None:
self._coords = self._atoms._getCoords()
return self._coords