Straw HiC API

Straw module

Straw enables programmatic access to .hic files. A .hic file stores the contact matrices from a Hi-C experiment and the normalization and expected vectors, along with metadata in the header.

The main function, straw, takes the normalization type, the filename or URL, chromosome 1 (with an optional range), chromosome 2 (with an optional range), whether the bins are fragment- or base-pair-delimited, and the bin size.

It then reads the header, follows the pointers to the desired matrix and normalization vectors, and stores the results as [x, y, count].

Usage: straw <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr1>[:x1:x2] <chr2>[:y1:y2] <BP/FRAG> <binsize>

Example:

    >>> import straw
    >>> result = straw.straw('NONE', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
    >>> for i in range(len(result[0])):
    ...     print("{0} {1} {2}".format(result[0][i], result[1][i], result[2][i]))

See https://github.com/theaidenlab/straw/wiki/Python for more documentation

getBlockNumbersForRegionFromBinPosition(regionIndices, blockBinCount, blockColumnCount, intra)

Gets the block numbers needed for a specific region; used when a range to extract is passed in as a parameter.

Args:
    regionIndices (array): Array of ints giving the range
    blockBinCount (int): The block bin count of the matrix
    blockColumnCount (int): The block column count of the matrix
    intra: Flag indicating whether this is an intrachromosomal matrix
Returns:
    blockSet (set): A set of blocks to print
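The block selection can be sketched as below. This is a minimal illustration, not the module's own code: it assumes blocks are numbered row-major as row * blockColumnCount + column, and that regionIndices holds [x-start, x-end, y-start, y-end] as bin indices. The name block_numbers_for_region is hypothetical.

```python
def block_numbers_for_region(region_indices, block_bin_count, block_column_count, intra):
    """Return the set of block numbers that overlap a rectangular bin range.

    Assumes region_indices = [x_start, x_end, y_start, y_end] in bin units and
    row-major block numbering (row * block_column_count + column).
    """
    # Convert bin bounds to block columns (x) and block rows (y); the end bound
    # is widened by one bin so an inclusive end is never cut off.
    col1 = region_indices[0] // block_bin_count
    col2 = (region_indices[1] + 1) // block_bin_count
    row1 = region_indices[2] // block_bin_count
    row2 = (region_indices[3] + 1) // block_bin_count

    blocks = set()
    for r in range(row1, row2 + 1):
        for c in range(col1, col2 + 1):
            blocks.add(r * block_column_count + c)

    if intra:
        # An intrachromosomal matrix stores only one triangle, so the
        # transposed region may live in the mirrored blocks.
        for r in range(col1, col2 + 1):
            for c in range(row1, row2 + 1):
                blocks.add(r * block_column_count + c)
    return blocks


print(sorted(block_numbers_for_region([0, 9, 25, 34], 10, 5, intra=True)))
# [2, 3, 7, 8, 10, 11, 15, 16]
```

With intra=False the same call yields only {10, 11, 15, 16}; the mirrored blocks {2, 3, 7, 8} are added because the region crosses the diagonal's transpose.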

printme(norm, infile, chr1loc, chr2loc, unit, binsize, outfile)

Reads a .hic file, extracts the given contact matrix, and prints it to a text file.

Args:
    norm (str): Normalization type, one of VC, KR, VC_SQRT, or NONE
    infile (str): File name or URL of the .hic file
    chr1loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
    chr2loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
    unit (str): One of BP or FRAG
    binsize (int): Resolution, e.g. 25000 for 25 kb
    outfile (str or stream): Name or stream of the text file to write to
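Since printme takes the same arguments as straw plus outfile, its last step amounts to writing the extracted records out. A sketch of that write step follows, assuming one tab-separated row/column/count line per record (the exact output format of printme is not specified here) and accepting either a file name or an open stream, as outfile does. write_contacts is a hypothetical name.

```python
import io
from typing import Sequence, TextIO, Union


def _write_rows(xs: Sequence[int], ys: Sequence[int], counts: Sequence[float], fh: TextIO) -> None:
    # One "row<TAB>column<TAB>count" line per contact record (assumed layout).
    for x, y, c in zip(xs, ys, counts):
        fh.write(f"{x}\t{y}\t{c}\n")


def write_contacts(xs, ys, counts, outfile: Union[str, TextIO]) -> None:
    """Write parallel row/column/count lists to a file name or an open text stream."""
    if isinstance(outfile, str):
        with open(outfile, "w") as fh:
            _write_rows(xs, ys, counts, fh)
    else:
        _write_rows(xs, ys, counts, outfile)


buf = io.StringIO()
write_contacts([0, 0], [0, 1000000], [12.0, 3.5], buf)
print(buf.getvalue(), end="")
```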

readBlock(req, size)

Reads a block: reads the compressed bytes, decompresses them, and stores the results in an array. Presumes the file pointer is in the correct position.

Args:
    req (file): File to read from; presumes the file pointer is in the correct position
    size (int): How many bytes to read
Returns:
    Array containing the row, column, and count data for this block
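The read/decompress/decode sequence can be sketched as below. This assumes the legacy (pre-version-7) record layout: a zlib-compressed payload holding a little-endian int32 record count followed by (int32 binX, int32 binY, float32 count) triples. Files of version 7 and later use a different, more compact record encoding that this sketch does not handle. read_legacy_block is a hypothetical name.

```python
import io
import struct
import zlib


def read_legacy_block(req, size):
    """Decompress one block and decode its records.

    Assumes the legacy (pre-version-7) layout: a little-endian int32 record
    count followed by (int32 binX, int32 binY, float32 count) triples.
    """
    raw = zlib.decompress(req.read(size))
    (n_records,) = struct.unpack_from("<i", raw, 0)
    records = []
    for i in range(n_records):
        # Each record is 12 bytes and starts right after the 4-byte count.
        bin_x, bin_y, count = struct.unpack_from("<iif", raw, 4 + 12 * i)
        records.append((bin_x, bin_y, count))
    return records


# Build a tiny two-record block in memory and read it back.
payload = struct.pack("<i", 2) + struct.pack("<iif", 0, 1, 5.0) + struct.pack("<iif", 3, 4, 2.5)
compressed = zlib.compress(payload)
records = read_legacy_block(io.BytesIO(compressed), len(compressed))
print(records)  # [(0, 1, 5.0), (3, 4, 2.5)]
```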

readFooter(req, c1, c2, norm, unit, resolution)

Reads the footer, which contains all the expected and normalization vectors. Presumes the file pointer is in the correct position.

Args:
    req (file): File to read from; presumes the file pointer is in the correct position
    c1 (int): Index of chromosome 1, as returned by readHeader
    c2 (int): Index of chromosome 2, as returned by readHeader
    norm (str): Normalization type, one of NONE, VC, KR, VC_SQRT
    unit (str): One of BP or FRAG
    resolution (int): Bin size
Returns:
    list: File position of the matrix, position and size of the chr1 normalization vector, and position and size of the chr2 normalization vector
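The first part of the footer, the master index that locates the matrix, can be sketched as below. The layout is an assumption for version 7/8 files: an int32 byte count, an int32 entry count, then entries made of a NUL-terminated key of the form "c1_c2" (chromosome indices), an int64 file position, and an int32 size. The expected-value and normalization-vector indexes that follow are not parsed here. read_cstr and find_matrix_position are hypothetical names.

```python
import io
import struct


def read_cstr(req):
    """Read a NUL-terminated UTF-8 string from a binary stream."""
    buf = bytearray()
    while True:
        b = req.read(1)
        if not b or b == b"\x00":
            return buf.decode("utf-8")
        buf += b


def find_matrix_position(req, c1, c2):
    """Scan the master index for the "c1_c2" key; return (position, size) or None."""
    req.read(4)  # int32 total byte count of the footer; not needed here
    key = f"{c1}_{c2}"
    (n_entries,) = struct.unpack("<i", req.read(4))
    found = None
    # Read every entry (not just up to the match) so the stream ends up
    # positioned after the master index, where the vector indexes begin.
    for _ in range(n_entries):
        name = read_cstr(req)
        position, size = struct.unpack("<qi", req.read(12))
        if name == key:
            found = (position, size)
    return found


footer = (
    struct.pack("<ii", 0, 2)
    + b"0_0\x00" + struct.pack("<qi", 100, 50)
    + b"0_1\x00" + struct.pack("<qi", 200, 60)
)
print(find_matrix_position(io.BytesIO(footer), 0, 1))  # (200, 60)
```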

readHeader(req, chr1, chr2, posilist)

Reads the header.

Args:
    req (file): File to read from
    chr1 (str): Chromosome 1
    chr2 (str): Chromosome 2
    posilist (list): Output ranges [c1pos1, c1pos2, c2pos1, c2pos2], where:
        c1pos1 (int, optional): Starting range of chromosome 1 output
        c1pos2 (int, optional): Stopping range of chromosome 1 output
        c2pos1 (int, optional): Starting range of chromosome 2 output
        c2pos2 (int, optional): Stopping range of chromosome 2 output
Returns:
    list: Master index, chromosome 1 index, chromosome 2 index
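A sketch of the header walk, assuming the version 7/8 layout: a NUL-terminated "HIC" magic string, an int32 version, an int64 master-index position, a genome identifier, key/value attribute strings, and a chromosome dictionary of (name, int32 length) pairs whose order gives each chromosome's index. read_header and read_cstr are hypothetical names; this sketch returns chromosome lengths instead of updating a posilist.

```python
import io
import struct


def read_cstr(req):
    """Read a NUL-terminated UTF-8 string from a binary stream."""
    buf = bytearray()
    while True:
        b = req.read(1)
        if not b or b == b"\x00":
            return buf.decode("utf-8")
        buf += b


def read_header(req, chr1, chr2):
    """Return (master_index_position, chr1_index, chr2_index, chromosome_lengths)."""
    if read_cstr(req) != "HIC":
        raise ValueError("missing HIC magic string")
    (version,) = struct.unpack("<i", req.read(4))
    if version >= 9:
        raise NotImplementedError("this sketch assumes the version 7/8 header layout")
    (master,) = struct.unpack("<q", req.read(8))
    read_cstr(req)  # genome identifier
    (n_attributes,) = struct.unpack("<i", req.read(4))
    for _ in range(n_attributes):
        read_cstr(req)  # attribute key
        read_cstr(req)  # attribute value
    (n_chrs,) = struct.unpack("<i", req.read(4))
    chr1_index = chr2_index = None
    lengths = {}
    for i in range(n_chrs):
        name = read_cstr(req)
        (length,) = struct.unpack("<i", req.read(4))
        lengths[name] = length
        if name == chr1:
            chr1_index = i
        if name == chr2:
            chr2_index = i
    if chr1_index is None or chr2_index is None:
        raise ValueError("chromosome not found in header")
    return master, chr1_index, chr2_index, lengths


header = (
    b"HIC\x00" + struct.pack("<iq", 8, 12345) + b"hg19\x00"
    + struct.pack("<i", 1) + b"software\x00demo\x00"
    + struct.pack("<i", 2) + b"1\x00" + struct.pack("<i", 1000) + b"X\x00" + struct.pack("<i", 500)
)
print(read_header(io.BytesIO(header), "X", "1"))
```

Returning the lengths lets a caller default an unspecified range to the whole chromosome.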

readMatrix(req, unit, binsize)

Reads the matrix, that is, finds the appropriate pointers to the block data and stores them. It reads through the zoom-data headers to find the matrix at the requested unit and resolution. Presumes the file pointer is in the correct position.

Args:
    req (file): File to read from; presumes the file pointer is in the correct position
    unit (str): Unit to search for (BP or FRAG)
    binsize (int): Resolution to search for
Returns:
    List containing the block bin count and block column count of the matrix
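The resolution search amounts to a linear scan for the first zoom header whose unit and bin size both match. The simplified sketch below works over a hypothetical in-memory list of already-parsed headers; in the file itself each header is presumably followed by its block index, so a reader must parse the records in order rather than jump between them. ZoomHeader and select_resolution are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class ZoomHeader(NamedTuple):
    """Hypothetical in-memory summary of one resolution's zoom-data header."""
    unit: str
    bin_size: int
    block_bin_count: int
    block_column_count: int


def select_resolution(zooms: Sequence[ZoomHeader], unit: str, bin_size: int) -> Optional[Tuple[int, int]]:
    """Return (block_bin_count, block_column_count) for the first unit/bin_size match."""
    for z in zooms:
        if z.unit == unit and z.bin_size == bin_size:
            return z.block_bin_count, z.block_column_count
    return None  # requested resolution is not stored in the file


zooms = [
    ZoomHeader("BP", 2500000, 100, 10),
    ZoomHeader("BP", 1000000, 250, 4),
    ZoomHeader("FRAG", 500, 300, 5),
]
print(select_resolution(zooms, "BP", 1000000))  # (250, 4)
```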

readMatrixZoomData(req, myunit, mybinsize)

Reads the matrix zoom data, which gives the list of pointers to the data blocks. Presumes the file pointer is in the correct position.

Args:
    req (file): File to read from; presumes the file pointer is in the correct position
    myunit (str): Unit (BP or FRAG) to search for
    mybinsize (int): Resolution to search for
Returns:
    List containing a boolean indicating whether the appropriate matrix was found and, if so, the block bin count and block column count
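A sketch of parsing one zoom-data record, assuming this version 7/8 layout: a NUL-terminated unit, an int32 zoom index, four float32 summary statistics, int32 binSize, blockBinCount and blockColumnCount, an int32 block count, then (int32 block number, int64 file position, int32 size) entries. The key design point is that the block entries are always consumed, even for a non-matching resolution, so the stream lands on the next record. read_zoom_data and read_cstr are hypothetical names.

```python
import io
import struct


def read_cstr(req):
    """Read a NUL-terminated UTF-8 string from a binary stream."""
    buf = bytearray()
    while True:
        b = req.read(1)
        if not b or b == b"\x00":
            return buf.decode("utf-8")
        buf += b


def read_zoom_data(req, my_unit, my_bin_size):
    """Parse one zoom-data record; return (found, block_bin_count, block_column_count, block_map)."""
    unit = read_cstr(req)
    req.read(20)  # int32 zoom index + four float32 summary statistics (unused)
    bin_size, block_bin_count, block_column_count = struct.unpack("<3i", req.read(12))
    found = unit == my_unit and bin_size == my_bin_size
    (n_blocks,) = struct.unpack("<i", req.read(4))
    block_map = {}
    for _ in range(n_blocks):
        # Always read the entry so the stream advances past this record.
        number, position, size = struct.unpack("<iqi", req.read(16))
        if found:
            block_map[number] = (position, size)
    if not found:
        return False, -1, -1, {}
    return True, block_bin_count, block_column_count, block_map


zoom = (
    b"BP\x00" + struct.pack("<i4f", 0, 1.0, 0.0, 0.0, 0.0)
    + struct.pack("<3i", 1000000, 250, 4) + struct.pack("<i", 2)
    + struct.pack("<iqi", 0, 4096, 300) + struct.pack("<iqi", 5, 4396, 120)
)
print(read_zoom_data(io.BytesIO(zoom), "BP", 1000000))
```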

readNormalizationVector(req)

Reads the normalization vector from the file; presumes the file pointer is in the correct position.

Args:
    req (file): File to read from; presumes the file pointer is in the correct position
Returns:
    Array of normalization values
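A minimal sketch, assuming the vector is stored as a little-endian int32 length followed by that many float64 values (other file versions may use different widths). read_normalization_vector is a hypothetical snake_case stand-in, not the module's function.

```python
import io
import struct


def read_normalization_vector(req):
    """Read an int32 length followed by that many little-endian float64 values."""
    (n_values,) = struct.unpack("<i", req.read(4))
    return list(struct.unpack(f"<{n_values}d", req.read(8 * n_values)))


data = struct.pack("<i", 3) + struct.pack("<3d", 1.0, 0.5, 2.0)
print(read_normalization_vector(io.BytesIO(data)))  # [1.0, 0.5, 2.0]
```

Unpacking all values with a single format string avoids one read call per value.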

straw(norm, infile, chr1loc, chr2loc, unit, binsize)

This is the main workhorse method of the module. It reads a .hic file, extracts the given contact matrix, and stores it in an array in sparse upper triangular format: row, column, (normalized) count.

Args:
    norm (str): Normalization type, one of VC, KR, VC_SQRT, or NONE
    infile (str): File name or URL of the .hic file
    chr1loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
    chr2loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
    unit (str): One of BP or FRAG
    binsize (int): Resolution, e.g. 25000 for 25 kb
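The sketch below shows two operations on this sparse upper-triangular output, under stated assumptions: the row and column values are bin start positions (multiples of binsize, as with BP units), and a normalized count is the raw count divided by the product of the two bins' normalization values, the usual convention for Hi-C balancing vectors. straw already returns normalized counts when norm is not NONE; normalize and to_dense_symmetric are hypothetical helpers for working with raw output.

```python
from typing import List, Sequence


def normalize(xs: Sequence[int], ys: Sequence[int], counts: Sequence[float],
              bin_size: int, norm1: Sequence[float], norm2: Sequence[float]) -> List[float]:
    """Divide each raw count by the product of its row and column normalization factors."""
    return [c / (norm1[x // bin_size] * norm2[y // bin_size]) for x, y, c in zip(xs, ys, counts)]


def to_dense_symmetric(xs, ys, counts, bin_size, n_bins):
    """Expand sparse upper-triangular records into a full symmetric n_bins x n_bins matrix."""
    dense = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y, c in zip(xs, ys, counts):
        i, j = x // bin_size, y // bin_size
        dense[i][j] = c
        dense[j][i] = c  # mirror into the lower triangle, which is not stored
    return dense


xs, ys, raw = [0, 0, 1000000], [0, 1000000, 1000000], [4.0, 2.0, 8.0]
print(normalize(xs, ys, raw, 1000000, [2.0, 4.0], [2.0, 4.0]))  # [1.0, 0.25, 0.5]
print(to_dense_symmetric(xs, ys, raw, 1000000, 2))  # [[4.0, 2.0], [2.0, 8.0]]
```

Mirroring only makes sense for intrachromosomal queries; an interchromosomal matrix is generally not square. Note that balancing vectors such as KR can contain NaN entries, which propagate through the division.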