Straw HiC API
Straw module
Straw enables programmatic access to .hic files. .hic files store the contact matrices from Hi-C experiments and the normalization and expected vectors, along with meta-data in the header.
The main function, straw, takes in the normalization, the filename or URL, chromosome1 (and optional range), chromosome2 (and optional range), whether the bins desired are fragment or base pair delimited, and bin size.
It then reads the header, follows the various pointers to the desired matrix and normalization vector, and stores the result as [x, y, count].
Usage: straw <NONE/VC/VC_SQRT/KR> <hicFile(s)> <chr1>[:x1:x2] <chr2>[:y1:y2] <BP/FRAG> <binsize>
Example:
>>> import straw
>>> result = straw.straw('NONE', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
>>> for i in range(len(result[0])):
...     print("{0} {1} {2}".format(result[0][i], result[1][i], result[2][i]))
See https://github.com/theaidenlab/straw/wiki/Python for more documentation
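The example above indexes three parallel lists. As a minimal sketch of how that output might be regrouped, the snippet below works on a hand-written sample rather than a real .hic file; the helpers `to_records` and `total_by_row` and the sample values are hypothetical and not part of the straw module.

```python
def to_records(result):
    """Turn straw-style parallel lists [xs, ys, counts] into (x, y, count) tuples."""
    xs, ys, counts = result
    if not (len(xs) == len(ys) == len(counts)):
        raise ValueError("x, y and count lists must have the same length")
    return list(zip(xs, ys, counts))


def total_by_row(result):
    """Sum the counts for each distinct x value."""
    totals = {}
    for x, _y, count in to_records(result):
        totals[x] = totals.get(x, 0.0) + count
    return totals


# Hypothetical sample standing in for the value returned by straw.straw(...)
sample = [[0, 0, 1000000], [0, 1000000, 1000000], [12.0, 3.0, 7.0]]
records = to_records(sample)
totals = total_by_row(sample)
```

Working with tuples keeps each contact together, which tends to be easier to filter or sort than three separate lists.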
-
getBlockNumbersForRegionFromBinPosition(regionIndices, blockBinCount, blockColumnCount, intra)
Gets the block numbers we will need for a specific region; used when the range to extract is sent in as a parameter
- Args:
- regionIndices (array): Array of ints giving range
- blockBinCount (int): The block bin count of the matrix
- blockColumnCount (int): The block column count of the matrix
- intra: Flag indicating if this is an intrachromosomal matrix
- Returns:
- blockSet (set): A set of blocks to print
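One way to picture the block lookup is sketched below. It is an illustration, not the module's code, and it rests on assumptions that the text above does not state: the region is given as [x start, x end, y start, y end] in inclusive bin indices, and blocks are numbered row by row as row * blockColumnCount + column. The function name `block_numbers_for_region` is hypothetical.

```python
def block_numbers_for_region(region_indices, block_bin_count, block_column_count, intra):
    """Return the set of block numbers that overlap a region given in bins.

    Assumes region_indices = [x_start, x_end, y_start, y_end] (inclusive bins)
    and row-major block numbering.
    """
    col1 = region_indices[0] // block_bin_count
    col2 = region_indices[1] // block_bin_count
    row1 = region_indices[2] // block_bin_count
    row2 = region_indices[3] // block_bin_count

    blocks = set()
    for r in range(row1, row2 + 1):
        for c in range(col1, col2 + 1):
            blocks.add(r * block_column_count + c)

    if intra:
        # Assumption: for an intrachromosomal matrix the transposed blocks
        # are collected as well, since only one triangle is stored.
        for r in range(col1, col2 + 1):
            for c in range(row1, row2 + 1):
                blocks.add(r * block_column_count + c)
    return blocks


inter_blocks = block_numbers_for_region([5, 15, 25, 25], 10, 4, False)
intra_blocks = block_numbers_for_region([5, 15, 25, 25], 10, 4, True)
```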
-
printme(norm, infile, chr1loc, chr2loc, unit, binsize, outfile)
Reads a .hic file and extracts and prints the given contact matrix to a text file
- Args:
- norm (str): Normalization type, one of VC, KR, VC_SQRT, or NONE
- infile (str): File name or URL of .hic file
- chr1loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
- chr2loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
- unit (str): One of BP or FRAG
- binsize (int): Resolution, e.g. 25000 for 25K
- outfile (str): Name or stream of text file to write to
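Since outfile may be either a name or a stream, a writer has to handle both cases. The sketch below shows one possible way to do that; `write_contacts` is a hypothetical helper, and the tab-separated line format is an assumption rather than something documented here.

```python
import io


def write_contacts(records, outfile):
    """Write (x, y, count) records, one per line, to a path or an open text stream."""
    def _write(stream):
        for x, y, count in records:
            # Assumed output format: tab-separated x, y, count
            stream.write("{0}\t{1}\t{2}\n".format(x, y, count))

    if hasattr(outfile, "write"):
        # Already a stream: the caller owns it, so it is not closed here.
        _write(outfile)
    else:
        # Otherwise treat the argument as a file name.
        with open(outfile, "w") as handle:
            _write(handle)


buffer = io.StringIO()
write_contacts([(0, 0, 12.0), (0, 1000000, 3.5)], buffer)
text = buffer.getvalue()
```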
-
readBlock(req, size)
Reads the block - reads the compressed bytes, decompresses, and stores results in array. Presumes file pointer is in correct position.
- Args:
- req (file): File to read from. Presumes file pointer is in correct position
- size (int): How many bytes to read
- Returns:
- array containing row, column, count data for this block
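The read, decompress, and unpack sequence can be illustrated with a toy block. The layout used below (an int32 record count followed by int32 row, int32 column, float32 count records, little-endian, zlib-compressed) is a simplified assumption made for this sketch and is not the actual .hic block format; `decode_block` is a hypothetical name.

```python
import struct
import zlib

RECORD_FORMAT = "<iif"  # assumed record: int32 row, int32 column, float32 count
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def decode_block(compressed):
    """Decompress a toy block and unpack it into (row, column, count) tuples."""
    raw = zlib.decompress(compressed)
    (n_records,) = struct.unpack_from("<i", raw, 0)
    records = []
    offset = 4  # skip the int32 record count
    for _ in range(n_records):
        records.append(struct.unpack_from(RECORD_FORMAT, raw, offset))
        offset += RECORD_SIZE
    return records


# Build a toy block in memory, then decode it again.
payload = struct.pack("<i", 2)
payload += struct.pack(RECORD_FORMAT, 0, 0, 12.0)
payload += struct.pack(RECORD_FORMAT, 0, 3, 2.5)
decoded = decode_block(zlib.compress(payload))
```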
-
readFooter
Reads the footer, which contains all the expected and normalization vectors. Presumes file pointer is in correct position.
- Args:
- req (file): File to read from; presumes file pointer is in correct position
- chr1 (str): Chromosome 1
- chr2 (str): Chromosome 2
- norm (str): Normalization type, one of NONE, VC, KR, VC_SQRT
- unit (str): One of BP or FRAG
- resolution (int): Bin size
- Returns:
- list: File position of matrix, position+size chr1 normalization vector, position+size chr2 normalization vector
-
readHeader(req, chr1, chr2, posilist)
Reads the header
- Args:
- req (file): File to read from
- chr1 (str): Chromosome 1
- chr2 (str): Chromosome 2
- c1pos1 (int, optional): Starting range of chromosome1 output
- c1pos2 (int, optional): Stopping range of chromosome1 output
- c2pos1 (int, optional): Starting range of chromosome2 output
- c2pos2 (int, optional): Stopping range of chromosome2 output
- Returns:
- list: master index, chromosome1 index, chromosome2 index
-
readMatrix(req, unit, binsize)
Reads the matrix - that is, finds the appropriate pointers to block data and stores them. Needs to read through headers of zoom data to find appropriate matrix. Presumes file pointer is in correct position.
- Args:
- req (file): File to read from; presumes file pointer is in correct position
- unit (str): Unit to search for (BP or FRAG)
- binsize (int): Resolution to search for
- Returns:
- list containing block bin count and block column count of matrix
-
readMatrixZoomData(req, myunit, mybinsize)
Reads the Matrix Zoom Data, which gives pointer list for blocks for the data. Presumes file pointer is in correct position
- Args:
- req (file): File to read from; presumes file pointer is in correct position
- myunit (str): Unit (BP or FRAG) we're searching for
- mybinsize (int): Resolution we're searching for
- Returns:
- list containing boolean indicating if we found appropriate matrix, and if so, the counts for the bins and columns
-
readNormalizationVector(req)
Reads the normalization vector from the file; presumes file pointer is in correct position
- Args:
- req (file): File to read from; presumes file pointer is in correct position
- Returns:
- Array of normalization values
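As a rough sketch of reading such a vector from a positioned stream, the snippet below assumes a layout of an int32 length followed by that many little-endian float64 values; that layout is an assumption for illustration and may not match every .hic version. The final helper shows one common way a normalized count can be derived, dividing by the product of the two bins' normalization values, which is likewise an assumption here. Both function names are hypothetical.

```python
import io
import struct


def read_float64_vector(stream):
    """Read an int32 length, then that many little-endian float64 values."""
    (n_values,) = struct.unpack("<i", stream.read(4))
    data = stream.read(8 * n_values)
    if len(data) != 8 * n_values:
        raise ValueError("stream ended before the vector was complete")
    return list(struct.unpack("<{0}d".format(n_values), data))


def normalized_count(count, norm_x, norm_y):
    """Assumed normalization: raw count divided by the product of both bin factors."""
    return count / (norm_x * norm_y)


# Toy stream standing in for a file whose pointer is already at the vector.
toy = io.BytesIO(struct.pack("<i", 3) + struct.pack("<3d", 1.0, 0.5, 2.0))
vector = read_float64_vector(toy)
value = normalized_count(10.0, vector[1], vector[2])
```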
-
straw(norm, infile, chr1loc, chr2loc, unit, binsize)
This is the main workhorse method of the module. Reads a .hic file and extracts the given contact matrix. Stores in an array in sparse upper triangular format: row, column, (normalized) count
- Args:
- norm (str): Normalization type, one of VC, KR, VC_SQRT, or NONE
- infile (str): File name or URL of .hic file
- chr1loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
- chr2loc (str): Chromosome name and (optionally) range, e.g. "1" or "1:10000:25000"
- unit (str): One of BP or FRAG
- binsize (int): Resolution, e.g. 25000 for 25K
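The chromosome location strings accept either a bare name or a name with a colon-separated range. A minimal sketch of parsing that form is shown below; `parse_location` is a hypothetical helper written for illustration, not a function of the straw module.

```python
def parse_location(loc):
    """Split 'chr' or 'chr:start:end' into (chromosome, start, end).

    start and end are None when no range is given.
    """
    parts = loc.split(":")
    if len(parts) == 1:
        return parts[0], None, None
    if len(parts) == 3:
        start, end = int(parts[1]), int(parts[2])
        if start > end:
            raise ValueError("start must not exceed end: " + loc)
        return parts[0], start, end
    raise ValueError("expected 'chr' or 'chr:start:end', got: " + loc)


whole = parse_location("X")
ranged = parse_location("1:10000:25000")
```

Rejecting a two-part string such as "1:5" early gives a clearer error than failing later while seeking in the file.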