MSA File

This module defines functions and classes for parsing, manipulating, and analyzing multiple sequence alignments.

class MSAFile(msa, mode='r', format=None, aligned=True, **kwargs)

Handle MSA files in FASTA, SELEX, CLUSTAL and Stockholm formats.

msa may be a filename or a stream. Multiple sequence alignments can be read from or written in FASTA (.fasta), Stockholm (.sth), CLUSTAL (.aln), or SELEX (.slx) format. When the filename has one of these extensions, the format argument is not needed. If aligned is True, unaligned sequences in the file or stream will raise an IOError exception. filter, a function that returns a boolean, can be used to filter sequences; see MSAFile.setFilter() for details. slice can be used to slice sequences and is applied after filtering; see MSAFile.setSlice() for details.
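
The extension rule and the aligned check described above can be pictured with a small standalone sketch. It does not use ProDy: guess_format and check_aligned are hypothetical helpers written only for illustration, and the extension table contains just the four extensions listed above.

```python
import os

# Extension-to-format table taken from the description above.
EXTENSION_FORMATS = {
    ".fasta": "FASTA",
    ".sth": "Stockholm",
    ".aln": "CLUSTAL",
    ".slx": "SELEX",
}


def guess_format(filename):
    """Return the format implied by the extension, or None if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_FORMATS.get(ext)


def check_aligned(sequences):
    """Raise IOError if the sequences do not all have the same length."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise IOError("sequences are not aligned")


fmt = guess_format("family.sth")       # extension is enough, no format needed
unknown = guess_format("family.txt")   # a format would have to be given
check_aligned(["ACDE", "AC-E"])        # equal lengths, so no exception
```

When the extension is not one of the four above, the sketch returns None, which corresponds to the case where the format argument has to be supplied explicitly.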

close()

Close the file. This method will not affect a stream.

getFilename()

Returns filename, or None if instance is handling a stream.

getFilter()

Returns function used for filtering sequences.

getFormat()

Returns file format.

getSlice()

Returns object used to slice sequences.

getTitle()

Returns title of the instance.

isAligned()

Returns True if MSA is aligned.

reset()

Returns to the beginning of the file.

setFilter(filter, filter_full=False)

Set function used for filtering sequences. By default, filter is applied to the split sequence label. If filter_full is True, filter is applied to the full label.
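
The difference between the two modes can be sketched without ProDy. The helpers split_label and select_labels below are hypothetical, the labels are made up, and the "NAME/start-end" layout is an assumption. The arguments ProDy actually passes to filter are not stated here, so this sketch passes only the label.

```python
def split_label(label):
    """Drop a trailing '/start-end' range (simplified: cuts at the first '/')."""
    return label.split("/")[0]


def select_labels(labels, keep, filter_full=False):
    """Return the labels for which keep() is true.

    keep() sees the split label unless filter_full is True.
    """
    selected = []
    for label in labels:
        target = label if filter_full else split_label(label)
        if keep(target):
            selected.append(label)
    return selected


labels = ["PROT_HUMAN/10-50", "PROT_MOUSE/12-52", "OTHER_HUMAN/1-40"]

# Default mode: the filter only sees the name, not the residue range.
humans = select_labels(labels, lambda name: name.endswith("_HUMAN"))

# Full-label mode: the filter can also inspect the residue range.
starts_at_10 = select_labels(
    labels, lambda full: "/10-" in full, filter_full=True
)
```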

setSlice(slice)

Set object used to slice sequences, which may be a slice object or a list of numbers.
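
As a rough illustration of the two accepted kinds of object, the following standalone sketch applies either a slice or a list of column indices to one sequence string. apply_slice is a hypothetical helper, not part of ProDy, and zero-based indexing is an assumption.

```python
def apply_slice(sequence, selector):
    """Return the columns of sequence picked by a slice or a list of indices."""
    if isinstance(selector, slice):
        return sequence[selector]
    return "".join(sequence[i] for i in selector)


sequence = "ACDEFGHIKL"
first_five = apply_slice(sequence, slice(0, 5))  # contiguous columns 0..4
picked = apply_slice(sequence, [0, 2, 4])        # individual columns
```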

setTitle(title)

Set title of the instance.

write(seq)

Write seq, a Sequence instance, into the MSA file.

closed

True for closed file.

format

Format of the MSA file.

splitSeqLabel(label)

Returns label, starting residue number, and ending residue number parsed from sequence label.
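
A minimal sketch of this kind of parsing, assuming labels follow a "NAME/start-end" layout. split_label_sketch is a hypothetical stand-in, not ProDy's implementation, and returning None for a missing range is this sketch's own choice.

```python
import re

# Assumed label layout: NAME/start-end
_RANGE = re.compile(r"^(?P<label>.+)/(?P<start>\d+)-(?P<end>\d+)$")


def split_label_sketch(label):
    """Return (label, start, end); start and end are None without a range."""
    match = _RANGE.match(label)
    if match is None:
        return label, None, None
    return (
        match.group("label"),
        int(match.group("start")),
        int(match.group("end")),
    )


parsed = split_label_sketch("PROT_HUMAN/10-50")
plain = split_label_sketch("PROT_HUMAN")
```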

parseMSA(filename, **kwargs)

Returns an MSA instance that stores the multiple sequence alignment and sequence labels parsed from filename, a file in Stockholm, SELEX, CLUSTAL, PIR, or FASTA format, which may be compressed. Uncompressed MSA files are parsed using C code in a fraction of the time it would take to parse compressed files in Python.

writeMSA(filename, msa, **kwargs)

Returns filename containing msa, an MSA or MSAFile instance, in the specified format, which can be SELEX, Stockholm, or FASTA. If compressed is True or filename ends with .gz, a compressed file will be written. MSA instances will be written into uncompressed files using a C function.

CLUSTAL and PIR format files can also be written, using Python functions.
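
The compression rule can be illustrated with a standalone FASTA writer built on the standard gzip module. write_fasta_sketch is hypothetical and does not call ProDy. Appending ".gz" when compressed is True is an assumption of this sketch, not documented behavior.

```python
import gzip
import os
import tempfile


def write_fasta_sketch(filename, records, compressed=False):
    """Write (label, sequence) pairs as FASTA and return the filename written."""
    # Assumption: a compressed file gets a .gz suffix if it lacks one.
    if compressed and not filename.endswith(".gz"):
        filename += ".gz"
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "wt") as handle:
        for label, sequence in records:
            handle.write(">{}\n{}\n".format(label, sequence))
    return filename


records = [("seq1/1-4", "ACDE"), ("seq2/1-4", "AC-E")]
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "example.fasta")
    path = write_fasta_sketch(target, records, compressed=True)
    # Read the compressed file back to confirm the round trip.
    with gzip.open(path, "rt") as handle:
        round_trip = handle.read()
```

Like the function described above, the sketch returns the name of the file it actually wrote, so the caller can see whether a compressed file was produced.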