Merge pull request #44 from deeptools/develop
Develop
joachimwolff authored Jul 20, 2021
2 parents c44f5ed + 9ec576d commit a945716
Showing 9 changed files with 130 additions and 88 deletions.
53 changes: 0 additions & 53 deletions .travis.yml

This file was deleted.

13 changes: 7 additions & 6 deletions README.rst
@@ -1,33 +1,34 @@
HiCMatrix
===========

This library implements the central class of HiCExplorer to manage Hi-C interaction matrices. It is separated from the main project to enable to use of Hi-C matrices
in other project without the dependency to HiCExplorer. Moreover, it enables us to use the already separated pyGenomeTracks (former hicPlotTADs) to be used in HiCExplorer
This library implements the central class of HiCExplorer to manage Hi-C interaction matrices. It is separated from the main project to enable Hi-C matrices
in other projects without the dependency on HiCExplorer. Moreover, it enables us to use the already separated pyGenomeTracks (former hicPlotTADs) in HiCExplorer
because mutual dependencies are resolved.

With version 8 we dropped the support for Python 2.
With version 8, we dropped the support for Python 2.

Version 14 introduced the official support for scool file format, used by scHiCExplorer since version 5: https://github.com/joachimwolff/scHiCExplorer and https://schicexplorer.readthedocs.io/en/latest/.

Read support
-------------

- h5
- cool
- cool / mcool / scool
- hicpro
- homer

Write support
--------------

- h5
- cool
- cool / mcool
- scool
- homer
- ginteractions
- hicpro

Citation:
^^^^^^^^^

Joachim Wolff, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, Björn A Grüning.
**Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization, Nucleic Acids Research**, gkaa220, https://doi.org/10.1093/nar/gkaa220
**Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization, Nucleic Acids Research**, Volume 48, Issue W1, 02 July 2020, Pages W177–W184, https://doi.org/10.1093/nar/gkaa220
80 changes: 80 additions & 0 deletions azure-pipelines.yml
@@ -0,0 +1,80 @@
pr:
  autoCancel: true

jobs:

- job: 'Linux'
  timeoutInMinutes: 0
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    matrix:
      Python36:
        python.version: '3.6'
      Python37:
        python.version: '3.7'
      Python38:
        python.version: '3.8'

  steps:
  - bash: |
      echo "##vso[task.prependpath]$CONDA/bin"
      hash -r
    displayName: Add conda to PATH
  - bash: |
      conda config --set always_yes yes --set changeps1 no
      conda info -a
      conda create -n hicmatrix --yes -c conda-forge -c bioconda python=$(python.version) --file requirements.txt
      source activate hicmatrix
      conda install --yes -c conda-forge -c bioconda pytest flake8 pytest-xdist pytest-forked
      conda install --yes -c conda-forge -c bioconda nose
      conda install --yes pathlib
      conda install --yes -c defaults -c conda-forge -c bioconda configparser
      python setup.py install
    displayName: installing dependencies
  - script: |
      source activate hicmatrix
      flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F401,F403,E402,F999,F405,E712
    displayName: linting
  - script: |
      source activate hicmatrix
      py.test hicmatrix/test/ --capture=sys
    displayName: pytest
- job: 'OSX'
  timeoutInMinutes: 0
  pool:
    vmImage: 'macOS-10.14'
  strategy:
    matrix:
      Python36:
        python.version: '3.6'
      Python37:
        python.version: '3.7'
      Python38:
        python.version: '3.8'

  steps:
  - bash: |
      echo "##vso[task.prependpath]$CONDA/bin"
      hash -r
    displayName: Add conda to PATH
  - bash: |
      conda config --set always_yes yes --set changeps1 no
      conda info -a
      conda create -n hicmatrix --yes -c conda-forge -c bioconda python=$(python.version) --file requirements.txt
      source activate hicmatrix
      conda install --yes -c conda-forge -c bioconda pytest flake8 pytest-xdist pytest-forked
      conda install --yes -c conda-forge -c bioconda nose
      conda install --yes pathlib
      conda install --yes -c defaults -c conda-forge -c bioconda configparser
      python setup.py install
    displayName: installing dependencies
  - script: |
      source activate hicmatrix
      flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F401,F403,E402,F999,F405,E712
    displayName: linting
  - script: |
      source activate hicmatrix
      py.test hicmatrix/test/ --capture=sys
    displayName: pytest
15 changes: 9 additions & 6 deletions hicmatrix/HiCMatrix.py
@@ -13,6 +13,7 @@
from intervaltree import IntervalTree, Interval
import cooler
import time
from collections import Counter

from .utilities import toBytes
from .utilities import toString
@@ -169,7 +170,7 @@ def getBinSize(self):
return self.bin_size
# If there are more bins, the diff will be compared
# to the median of the differences between starts
median = int(np.median(np.diff(start)))
median = int(np.median(np.concatenate([np.diff([start for chro, start, end, extra in self.cut_intervals if chro == cur_chrom]) for cur_chrom, nb in Counter(chrom).items() if nb > 1])))

# check if the bin size is
# homogeneous
@@ -334,7 +335,7 @@ def fit_cut_intervals(cut_intervals):
return cut_intervals
chrom, start, end, extra = zip(*cut_intervals)

median = int(np.median(np.diff(start)))
median = int(np.median(np.concatenate([np.diff([start for chro, start, end, extra in cut_intervals if chro == cur_chrom]) for cur_chrom, nb in Counter(chrom).items() if nb > 1])))
diff = np.array(end) - np.array(start)
# check if the bin size is homogeneous
if len(np.flatnonzero(diff != median)) > (len(diff) * 0.01):
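
Reviewer note: both hunks above replace a single `np.diff(start)` over all bins with differences taken within each chromosome. The following is a minimal, dependency-free sketch of that idea, not the library's code: the helper name `per_chromosome_median_bin_size` and the toy `cut_intervals` are hypothetical, and `statistics.median` stands in for `np.median`. It shows one case where the two approaches disagree, namely a genome with several single-bin contigs.

```python
from collections import defaultdict
from statistics import median


def per_chromosome_median_bin_size(cut_intervals):
    """Median distance between consecutive bin starts, taken within each chromosome."""
    starts_by_chrom = defaultdict(list)
    for chrom, start, _end, _extra in cut_intervals:
        starts_by_chrom[chrom].append(start)

    diffs = []
    for starts in starts_by_chrom.values():
        # A chromosome with a single bin contributes no differences.
        diffs.extend(b - a for a, b in zip(starts, starts[1:]))
    if not diffs:
        raise ValueError("need at least one chromosome with two or more bins")
    return int(median(diffs))


# Hypothetical toy data: one chromosome with three bins, four single-bin contigs.
cut_intervals = [
    ("chr1", 0, 10, 1.0), ("chr1", 10, 20, 1.0), ("chr1", 20, 30, 1.0),
    ("scaffold1", 0, 7, 1.0), ("scaffold2", 0, 4, 1.0),
    ("scaffold3", 0, 9, 1.0), ("scaffold4", 0, 3, 1.0),
]

# Differences across the whole start column also span chromosome boundaries.
all_starts = [start for _chrom, start, _end, _extra in cut_intervals]
naive_bin_size = int(median(b - a for a, b in zip(all_starts, all_starts[1:])))

bin_size = per_chromosome_median_bin_size(cut_intervals)
```

With this input the boundary-crossing differences pull the naive median to 0, while the per-chromosome version returns the expected bin size of 10.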
@@ -354,7 +355,7 @@ def snap_nearest_multiple(start_x, m):
def convert_to_zscore_matrix(self, maxdepth=None, perchr=False):
return self.convert_to_obs_exp_matrix(maxdepth=maxdepth, zscore=True, perchr=perchr)

def convert_to_obs_exp_matrix(self, maxdepth=None, zscore=False, perchr=False):
def convert_to_obs_exp_matrix(self, maxdepth=None, zscore=False, perchr=False, pSkipTriu=False):
"""
Converts a corrected counts matrix into a
obs / expected matrix or z-scores fast.
@@ -395,10 +396,12 @@ def convert_to_obs_exp_matrix(self, maxdepth=None, zscore=False, perchr=False):
# max_depth_in_bis
# (this is done by subtracting a second sparse matrix
# that contains only the upper matrix that wants to be removed.
self.matrix = triu(self.matrix, k=0, format='csr') - \
triu(self.matrix, k=max_depth_in_bins, format='csr')
if not pSkipTriu:
self.matrix = triu(self.matrix, k=0, format='csr') - \
triu(self.matrix, k=max_depth_in_bins, format='csr')
else:
self.matrix = triu(self.matrix, k=0, format='csr')
if not pSkipTriu:
self.matrix = triu(self.matrix, k=0, format='csr')

self.matrix.eliminate_zeros()
depth = None
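
Reviewer note: the code comment in this hunk describes keeping only the diagonals below `max_depth_in_bins` by subtracting one upper-triangular matrix from another. Below is a small sketch of that subtraction on a toy matrix, assuming SciPy's `scipy.sparse.triu`; the 5x5 input and the depth of 2 are made up for illustration, and the sketch does not reproduce the `pSkipTriu` branch.

```python
import numpy as np
from scipy.sparse import csr_matrix, triu

# Hypothetical 5x5 matrix with all entries non-zero (values 1..25).
dense = np.arange(1, 26).reshape(5, 5)
matrix = csr_matrix(dense)
max_depth_in_bins = 2

# triu(k=0) keeps the main diagonal and everything above it;
# triu(k=max_depth_in_bins) keeps only diagonals at offset >= max_depth_in_bins.
# Their difference is the band of diagonals 0 .. max_depth_in_bins - 1.
band = triu(matrix, k=0, format='csr') - triu(matrix, k=max_depth_in_bins, format='csr')
band.eliminate_zeros()

band_dense = band.toarray()
```

Here `band_dense` retains the main diagonal and the first off-diagonal; everything below the diagonal or at offset 2 and beyond is zero.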
2 changes: 2 additions & 0 deletions hicmatrix/__init__.py
@@ -1,3 +1,5 @@
import logging
logging.basicConfig(level=logging.INFO)
# logging.basicConfig(level=logging.DEBUG)

logging.getLogger('cooler').setLevel(logging.WARNING)
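
Reviewer note: the two added lines raise the threshold of the third-party `cooler` logger without touching the package's own loggers. A short standard-library sketch of the effect follows; the child logger name `cooler.create` is only an example of a name below `cooler` in the logger hierarchy.

```python
import logging

logging.basicConfig(level=logging.INFO)
# Silence INFO and DEBUG records from the 'cooler' logger and its children.
logging.getLogger('cooler').setLevel(logging.WARNING)

cooler_log = logging.getLogger('cooler')
# Child loggers without their own level inherit the effective level of 'cooler'.
child_log = logging.getLogger('cooler.create')

cooler_info_enabled = cooler_log.isEnabledFor(logging.INFO)
cooler_warning_enabled = cooler_log.isEnabledFor(logging.WARNING)
child_level = child_log.getEffectiveLevel()
```

Because the level is set on the named logger rather than on the root, other loggers keep whatever level `basicConfig` or the host application configured.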
2 changes: 1 addition & 1 deletion hicmatrix/_version.py
@@ -1,2 +1,2 @@
__version__ = '15'
__version__ = '16'
# Version number differs from HiCExplorer!
32 changes: 17 additions & 15 deletions hicmatrix/lib/cool.py
@@ -53,11 +53,12 @@ def load(self):
log.warning('No matrix is initialized')
try:
cooler_file = cooler.Cooler(self.matrixFileName)
if 'metadata' in cooler_file.info:
self.hic_metadata = cooler_file.info['metadata']
else:
self.hic_metadata = None
self.cool_info = deepcopy(cooler_file.info)
# if 'metadata' in cooler_file.info:
self.hic_metadata = cooler_file.info
# else:
# self.hic_metadata = None
# self.cool_info = deepcopy(cooler_file.info)
# log.debug('self.hic_metadata {}'.format(self.hic_metadata))
except Exception as e:
log.warning("Could not open cooler file. Maybe the path is wrong or the given node is not available.")
log.warning('The following file was tried to open: {}'.format(self.matrixFileName))
@@ -256,10 +257,13 @@ def load(self):
nan_bins = None

distance_counts = None
# log.debug('self.hic_metadata {}'.format(self.hic_metadata))

return matrix, cut_intervals, nan_bins, distance_counts, correction_factors

def create_cooler_input(self, pSymmetric=True, pApplyCorrection=True):
log.debug('self.hic_metadata 34{}'.format(self.hic_metadata))

self.matrix.eliminate_zeros()

if self.nan_bins is not None and len(self.nan_bins) > 0 and self.fileWasH5:
@@ -296,6 +300,7 @@ def create_cooler_input(self, pSymmetric=True, pApplyCorrection=True):
# instead of handling this before.
bins_data_frame = pd.DataFrame(self.cut_intervals, columns=['chrom', 'start', 'end', 'interactions']).drop('interactions', axis=1)
dtype_pixel = {'bin1_id': np.int32, 'bin2_id': np.int32, 'count': np.int32}
log.debug('foo')
if self.correction_factors is not None and pApplyCorrection:
dtype_pixel['weight'] = np.float32

@@ -313,6 +318,7 @@ def create_cooler_input(self, pSymmetric=True, pApplyCorrection=True):
self.correctionOperator = '*'
log.debug('inverted correction factors')
weight = convertNansToOnes(np.array(self.correction_factors).flatten())
log.debug('weight {}'.format(weight))
bins_data_frame = bins_data_frame.assign(weight=weight)

log.debug("Reverting correction factors on matrix...")
@@ -340,7 +346,7 @@ def create_cooler_input(self, pSymmetric=True, pApplyCorrection=True):
dtype_pixel['weight'] = np.float32
weight = convertNansToOnes(np.array(self.correction_factors).flatten())
bins_data_frame = bins_data_frame.assign(weight=weight)

log.debug('weight 2: {}'.format(weight))
instances, features = self.matrix.nonzero()

matrix_data_frame = pd.DataFrame(instances, columns=['bin1_id'], dtype=np.int32)
@@ -386,27 +392,21 @@ def create_cooler_input(self, pSymmetric=True, pApplyCorrection=True):

info['tool-url'] = str('https://github.com/deeptools/HiCMatrix')

# info['nchroms'] = int(bins_data_frame['chrom'][:].nunique())
# info['chromosomes'] = list(bins_data_frame['chrom'][:].unique())
# info['nnz'] = np.string_(str(self.matrix.nnz * 2))
# info['min-value'] = np.string_(str(matrix_data_frame['count'].min()))
# info['max-value'] = np.string_(str(matrix_data_frame['count'].max()))
# info['sum-elements'] = int(matrix_data_frame['count'].sum())

if self.hic_metadata is not None and 'matrix-generated-by' in self.hic_metadata:
info['matrix-generated-by'] = str(self.hic_metadata['matrix-generated-by'])
del self.hic_metadata['matrix-generated-by']
if self.hic_metadata is not None and 'matrix-generated-by-url' in self.hic_metadata:
info['matrix-generated-by-url'] = str(self.hic_metadata['matrix-generated-by-url'])
del self.hic_metadata['matrix-generated-by-url']
log.debug('self.hic_metadata {}'.format(self.hic_metadata))
if self.hic_metadata is not None and 'genome-assembly' in self.hic_metadata:
info['genome-assembly'] = str(self.hic_metadata['genome-assembly'])
del self.hic_metadata['genome-assembly']

return bins_data_frame, matrix_data_frame, dtype_pixel, info

def save(self, pFileName, pSymmetric=True, pApplyCorrection=True):
log.debug('Save in cool format')
log.debug('Save in cool format11112323')

bins_data_frame, matrix_data_frame, dtype_pixel, info = self.create_cooler_input(pSymmetric=pSymmetric, pApplyCorrection=pApplyCorrection)
local_temp_dir = os.path.dirname(os.path.realpath(pFileName))
@@ -416,9 +416,11 @@ def save(self, pFileName, pSymmetric=True, pApplyCorrection=True):
mode=self.appendData,
dtypes=dtype_pixel,
ordered=True,
metadata=self.hic_metadata,
metadata=info,

temp_dir=local_temp_dir)

log.debug('info {}'.format(info))
if self.appendData == 'w':
fileName = pFileName.split('::')[0]
with h5py.File(fileName, 'r+') as h5file:
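
Reviewer note: `create_cooler_input` copies three keys from `self.hic_metadata` into `info` with three near-identical `if` blocks, and `save` now passes `info` as the metadata. The same copy-and-remove step could be expressed as a loop. The sketch below is only an illustration of that pattern on plain dicts: the helper `promote_metadata_keys` is hypothetical and not part of HiCMatrix, and the sample values are made up.

```python
def promote_metadata_keys(hic_metadata, info, keys):
    """Move selected keys from hic_metadata into info, stringifying the values."""
    if hic_metadata is None:
        return info
    for key in keys:
        if key in hic_metadata:
            # pop() copies the value and removes the key in one step,
            # mirroring the assignment followed by `del` in the diff.
            info[key] = str(hic_metadata.pop(key))
    return info


# Hypothetical sample metadata.
hic_metadata = {'matrix-generated-by': 'ExampleTool-1.0', 'genome-assembly': 'hg38', 'nbins': 3}
info = {'tool-url': 'https://github.com/deeptools/HiCMatrix'}

info = promote_metadata_keys(
    hic_metadata, info,
    ('matrix-generated-by', 'matrix-generated-by-url', 'genome-assembly'),
)
```

Note that the helper, like the code in the diff, mutates the metadata dict it is given; keys that are absent are skipped silently.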
13 changes: 13 additions & 0 deletions hicmatrix/lib/hicpro.py
@@ -36,3 +36,16 @@ def load(self):
        distance_counts = None
        correction_factors = None
        return matrix, cut_intervals, nan_bins, distance_counts, correction_factors

    def save(self, pFilename, pSymmetric=None, pApplyCorrection=None):
        self.matrix.eliminate_zeros()
        instances, features = self.matrix.nonzero()
        data = self.matrix.data

        with open(pFilename, 'w') as matrix_file:
            for x, y, value in zip(instances, features, data):
                matrix_file.write(str(int(x + 1)) + '\t' + str(int(y + 1)) + '\t' + str(value) + '\n')

        with open(self.bedFile, 'w') as bed_file:
            for i, interval in enumerate(self.cut_intervals):
                bed_file.write('\t'.join(map(str, interval[:3])) + '\t' + str(i + 1) + '\n')
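
Reviewer note: the new `save` writes two text files, a matrix file with 1-based `bin1<TAB>bin2<TAB>value` rows and a BED file with `chrom<TAB>start<TAB>end<TAB>bin_id` rows. The sketch below reproduces that layout with the standard library only, so the output can be inspected without SciPy. The function `write_hicpro` and its inputs are hypothetical; plain `(row, col, value)` triples stand in for the sparse matrix.

```python
import io


def write_hicpro(triples, cut_intervals, matrix_out, bed_out):
    """Write 0-based (row, col, value) triples and bins as 1-based tab-separated text."""
    for row, col, value in triples:
        # Bin ids in the matrix file start at 1, as in the diff above.
        matrix_out.write(f"{row + 1}\t{col + 1}\t{value}\n")
    for i, (chrom, start, end, *_rest) in enumerate(cut_intervals):
        # Fourth BED column is the 1-based bin id matching the matrix file.
        bed_out.write(f"{chrom}\t{start}\t{end}\t{i + 1}\n")


# Hypothetical toy input: two bins on chr1 and two non-zero contacts.
triples = [(0, 0, 5), (0, 1, 2)]
cut_intervals = [("chr1", 0, 10, 1.0), ("chr1", 10, 20, 1.0)]

matrix_buffer, bed_buffer = io.StringIO(), io.StringIO()
write_hicpro(triples, cut_intervals, matrix_buffer, bed_buffer)
matrix_text = matrix_buffer.getvalue()
bed_text = bed_buffer.getvalue()
```

Writing to file-like objects instead of paths keeps the sketch easy to test; the method in the diff opens `pFilename` and `self.bedFile` directly.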
8 changes: 1 addition & 7 deletions hicmatrix/lib/matrixFileHandler.py
@@ -18,33 +18,27 @@ def __init__(self, pFileType='cool', pMatrixFile=None, pChrnameList=None,
if pFileType == 'hicpro':
self.matrixFile = self.class_(pMatrixFile=pMatrixFile, pBedFile=pBedFileHicPro)
else:
log.debug('23')
self.matrixFile = self.class_(pMatrixFile=pMatrixFile)
log.debug('22 self.matrixFile.matrixFileName {}'.format(self.matrixFile.matrixFileName))
if pFileType == 'cool':
self.matrixFile.chrnameList = pChrnameList
if pCorrectionFactorTable is not None:
self.matrixFile.correctionFactorTable = pCorrectionFactorTable
if pCorrectionOperator is not None:
self.matrixFile.correctionOperator = pCorrectionOperator
if pEnforceInteger is not None:
log.debug('pEnforceInteger {}'.format(pEnforceInteger))
self.matrixFile.enforceInteger = pEnforceInteger
if pAppend is not None:
self.matrixFile.appendData = pAppend
if pFileWasH5 is not None:
self.matrixFile.fileWasH5 = pFileWasH5
log.debug('pApplyCorrectionCoolerLoad {}'.format(pApplyCorrectionCoolerLoad))
if pApplyCorrectionCoolerLoad is not None:
self.matrixFile.applyCorrectionLoad = pApplyCorrectionCoolerLoad
if pHiCInfo is not None:
self.hic_metadata = pHiCInfo
log.debug('pHic2CoolVersion : {}'.format(pHic2CoolVersion))
self.matrixFile.hic_metadata = pHiCInfo
if pHic2CoolVersion is not None:
self.matrixFile.hic2cool_version = pHic2CoolVersion
if pDistance is not None:
self.matrixFile.distance = pDistance
log.debug('self.distance {}'.format(self.matrixFile.distance))
if pMatrixFormat is not None:
self.matrixFile.matrixFormat = pMatrixFormat
if pLoadMatrixOnly is not None: