scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking 
Shichen Fan1, Dachang Dang2, Yusen Ye1, Shao-Wu Zhang2, Lin Gao1,*, Shihua Zhang3,4,5,6,*
1School of Computer Science and Technology, Xidian University, Xi’an 710071, China
2School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
3NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
4School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
5Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
6Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
*Correspondence to: Lin Gao, Email: lgao@mail.xidian.edu.cn; Shihua Zhang, Email: zsh@amss.ac.cn
J Mol Cell Biol, Volume 15, Issue 1, January 2023, mjad003,  https://doi.org/10.1093/jmcb/mjad003
Keywords: single-cell Hi-C data, simulator, high-fidelity, single-cell Hi-C clustering, distance-stratified sampling, benchmarking

Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing costs impede the generation of biological Hi-C data with the high sequencing depths and multiple replicates needed for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome data sparseness, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim generates high-fidelity data by comparing single-cell clustering performance and the detection of higher-order chromosomal structures on simulated data with those on raw data. Furthermore, scHi-CSim allows the sequencing depth and the number of simulated replicates to be adjusted flexibly. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
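
To make the idea of distance-stratified sampling concrete, the following is a minimal sketch and not the scHi-CSim implementation. It assumes that neighboring cells have already been merged by summing their binned contact matrices, treats each diagonal of the pooled matrix as one genomic-distance stratum, and draws a chosen number of contacts with multinomial sampling. The function name `stratified_sample`, the toy Poisson data, and the bin-level resolution are all illustrative assumptions; the restriction-fragment step described above is omitted.

```python
import numpy as np

def stratified_sample(pooled, n_contacts, rng):
    """Draw n_contacts from an upper-triangular pooled contact matrix,
    one diagonal (genomic-distance stratum) at a time.
    Hypothetical helper for illustration only."""
    n = pooled.shape[0]
    # Total pooled counts in each distance stratum (diagonal offset d).
    strata_totals = np.array(
        [np.trace(pooled, offset=d) for d in range(n)], dtype=float
    )
    # Split the target depth across strata in proportion to their totals.
    per_stratum = rng.multinomial(n_contacts, strata_totals / strata_totals.sum())
    sim = np.zeros_like(pooled, dtype=np.int64)
    for d, k in enumerate(per_stratum):
        if k == 0:
            continue
        # Within a stratum, sample bin pairs in proportion to pooled counts.
        diag = np.diagonal(pooled, offset=d).astype(float)
        draws = rng.multinomial(k, diag / diag.sum())
        idx = np.arange(n - d)
        sim[idx, idx + d] = draws
    return sim

# Toy data (assumption): 5 sparse "neighboring cells" over 20 bins.
rng = np.random.default_rng(0)
cells = rng.poisson(0.2, size=(5, 20, 20))
pooled = np.triu(cells.sum(axis=0))  # merge cells, keep the upper triangle
sim = stratified_sample(pooled, 100, rng)
```

Changing `n_contacts` mimics a change in sequencing depth, and calling the function repeatedly with the same pooled matrix yields independent simulated replicates.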