206 - How to run inferCNV on 10X data

Sep 15, 2020

Written by Liu Xiaoze, 2020.9.15

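inferCNV needs three input files

They are also described on the official wiki: https://github.com/broadinstitute/inferCNV/wiki/File-Definitions#sample-annotation-file

Raw count matrix: this can come from Smart-seq2 or from 10X data, but it is best to remove low-quality cells first. Other data types have not been tested by the inferCNV authors.
Cell annotations: one row per cell, matching the column names of the count matrix. The file has two columns: the first is the cell ID (the same IDs used as column names in the count matrix) and the second is the cell type. It has no header row, for example: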
```
MGH54_P2_C12    Microglia/Macrophage
MGH36_P6_F03    Microglia/Macrophage
MGH54_P16_F12   Oligodendrocytes (non-malignant)
MGH54_P12_C10   Oligodendrocytes (non-malignant)
MGH36_P1_B02    malignant_MGH36
MGH36_P1_H10    malignant_MGH36
```
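If there are many normal cells of several types (for example immune cells and normal fibroblasts), you can label each type separately, or give them all the single label "normal", in which case they are all treated as one group. Labeling them more finely is generally the better choice.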
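The annotation file described above can be produced with a few lines of Python. This is a minimal sketch, not part of inferCNV: the function name `write_annotations` and the in-memory inputs are hypothetical, and it assumes you already have a mapping from cell ID to cell type and the list of column names of the count matrix.

```python
import csv
import io


def write_annotations(cell_types, matrix_columns, out):
    """Write a two-column, tab-delimited annotation table without a header.

    cell_types: dict mapping cell ID -> cell type label
    matrix_columns: cell IDs in the order they appear in the count matrix
    out: any writable text file object
    """
    missing = [cell for cell in matrix_columns if cell not in cell_types]
    if missing:
        raise ValueError(f"cells without an annotation: {missing}")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for cell in matrix_columns:
        writer.writerow([cell, cell_types[cell]])


# Illustrative inputs taken from the example rows above.
cell_types = {
    "MGH54_P2_C12": "Microglia/Macrophage",
    "MGH54_P16_F12": "Oligodendrocytes (non-malignant)",
    "MGH36_P1_B02": "malignant_MGH36",
}
matrix_columns = ["MGH54_P2_C12", "MGH54_P16_F12", "MGH36_P1_B02"]

buffer = io.StringIO()
write_annotations(cell_types, matrix_columns, buffer)
annotation_text = buffer.getvalue()
print(annotation_text, end="")
```

To write a real file, pass `open("cellAnnotations.txt", "w", newline="")` instead of the `StringIO` buffer.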

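Gene positions: each gene name corresponds to a row name of the count matrix. The file is tab-delimited and records the chromosome, start, and end position of each gene. Gene names must not be duplicated. For example: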

WASH7P  chr1    14363   29806
LINC00115       chr1    761586  762902
NOC2L   chr1    879584  894689
MIR200A chr1    1103243 1103332
SDF4    chr1    1152288 1167411
UBE2J2  chr1    1189289 1209265
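The format rules above (four tab-separated fields, unique gene names, names matching the matrix rows) are easy to check before running inferCNV. The following is a rough sketch under those assumptions; `check_gene_positions` is a hypothetical helper, not an inferCNV function.

```python
def check_gene_positions(lines, matrix_genes):
    """Validate gene position lines and report matrix genes without a position.

    lines: iterable of text lines in the form gene<TAB>chr<TAB>start<TAB>end
    matrix_genes: row names of the count matrix
    Returns the matrix genes that have no entry in the position file.
    """
    seen = set()
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 tab-separated fields, got {len(fields)}"
            )
        gene, _chrom, start, end = fields
        if gene in seen:
            raise ValueError(f"line {number}: duplicated gene name {gene}")
        # int() raises ValueError when a coordinate is not a whole number.
        int(start)
        int(end)
        seen.add(gene)
    return [gene for gene in matrix_genes if gene not in seen]


# Illustrative input: three of the example rows above, with real tabs.
example_lines = [
    "WASH7P\tchr1\t14363\t29806\n",
    "LINC00115\tchr1\t761586\t762902\n",
    "NOC2L\tchr1\t879584\t894689\n",
]
missing_genes = check_gene_positions(example_lines, ["WASH7P", "NOC2L", "SDF4"])
print(missing_genes)
```

With a real file, pass an open file handle, for example `check_gene_positions(open("gene_pos.txt"), genes)`.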

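How do you get the gene positions?

There are two options:

Use the official file: https://data.broadinstitute.org/Trinity/CTAT/cnv/ (note that this version is somewhat old, from 2016)

Generate it yourself:

First download the conversion script: https://github.com/broadinstitute/infercnv/blob/master/scripts/gtf_to_position_file.py

Then download the latest GTF file (GENCODE release 35 at the time of writing):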

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_35/gencode.v35.annotation.gtf.gz

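Finally, decompress the GTF and extract the gene positions with the script downloaded above

$ gunzip gencode.v35.annotation.gtf.gz
$ python gtf_to_position_file.py --attribute_name "gene_name" gencode.v35.annotation.gtf gene_pos.txt

$ head -n5 gene_pos.txt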
DDX11L1	chr1	11869	14409
WASH7P	chr1	14404	29570
MIR6859-1	chr1	17369	17436
MIR1302-2HG	chr1	29554	31109
MIR1302-2	chr1	30366	30503
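To make the conversion less of a black box, here is a minimal sketch of the idea: read the `gene` records of a GTF and keep the name, chromosome, start, and end. It is not the official `gtf_to_position_file.py` and may differ from it in details; the function name `gtf_gene_positions`, the choice to keep only the first record per gene name, and the abbreviated attribute strings in the example are assumptions made for illustration.

```python
import re


def gtf_gene_positions(gtf_lines, attribute_name="gene_name"):
    """Collect (chromosome, start, end) per gene from GTF 'gene' records.

    Only the first record seen for each name is kept, so the resulting
    names are unique, as the gene position file requires.
    """
    pattern = re.compile(r"\b" + re.escape(attribute_name) + r' "([^"]+)"')
    positions = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # ignore transcripts, exons, and malformed lines
        match = pattern.search(fields[8])
        if match is None:
            continue
        name = match.group(1)
        if name not in positions:
            positions[name] = (fields[0], int(fields[3]), int(fields[4]))
    return positions


# Two abbreviated GTF records (attribute column shortened for readability).
example_gtf = [
    "##description: example\n",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_name "DDX11L1";\n',
    'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_name "DDX11L1";\n',
    'chr1\tHAVANA\tgene\t14404\t29570\t.\t-\t.\tgene_id "ENSG00000227232.5"; gene_name "WASH7P";\n',
]
positions = gtf_gene_positions(example_gtf)
for gene, (chrom, start, end) in positions.items():
    print(gene, chrom, start, end, sep="\t")
```

The two genes printed here match the first two rows of the `gene_pos.txt` output shown above.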

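Finally, run inferCNV

See: https://github.com/broadinstitute/inferCNV/wiki/Running-InferCNV

Step 1: create the `inferCNV` object

The key parameter here is ref_group_names, which lists the annotation labels to treat as normal reference cells. If you have no reference cells, set ref_group_names=NULL, in which case inferCNV uses the average signal across all cells as the baseline.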

library(infercnv)

# ref_group_names must match labels in the second column of the annotation file
infercnv_obj = CreateInfercnvObject(raw_counts_matrix="singleCell.counts.matrix",
                                    annotations_file="cellAnnotations.txt",
                                    delim="\t",
                                    gene_order_file="gene_ordering_file.txt",
                                    ref_group_names=c("normal"))
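Because ref_group_names has to match the labels in the annotation file, a quick pre-flight check can save a failed run. The sketch below is a hypothetical helper (`missing_ref_groups` is not an inferCNV function) and assumes the two-column, tab-delimited annotation format described earlier.

```python
def missing_ref_groups(annotation_lines, ref_group_names):
    """Return the reference group names that never appear as a cell type label."""
    labels = set()
    for line in annotation_lines:
        if not line.strip():
            continue  # tolerate blank lines
        _cell_id, label = line.rstrip("\n").split("\t")
        labels.add(label)
    return [name for name in ref_group_names if name not in labels]


# Illustrative annotation rows in the format shown earlier.
annotation_lines = [
    "MGH54_P2_C12\tMicroglia/Macrophage\n",
    "MGH54_P16_F12\tOligodendrocytes (non-malignant)\n",
    "MGH36_P1_B02\tmalignant_MGH36\n",
]

# "normal" is not a label in this file, so it is reported as missing.
not_found = missing_ref_groups(annotation_lines, ["normal"])
print(not_found)

# Using the actual normal labels finds nothing missing.
all_found = missing_ref_groups(
    annotation_lines,
    ["Microglia/Macrophage", "Oligodendrocytes (non-malignant)"],
)
print(all_found)
```

In other words, `ref_group_names=c("normal")` only makes sense if the normal cells were labeled "normal" in the annotation file; otherwise pass the individual normal cell type labels.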

Step 2: run the analysis

The key parameter here is cutoff. The recommendation is cutoff=0.1 for 10X data and cutoff=1 for Smart-seq2 data.

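infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,              # 0.1 for 10X, 1 for Smart-seq2
                             out_dir="output_dir",    # created automatically if it does not exist
                             cluster_by_groups=TRUE,  # cluster cells within each annotation group
                             denoise=TRUE,
                             HMM=TRUE)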
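To give a feel for what cutoff does, here is a rough illustration, assuming (per the inferCNV documentation) that cutoff is a minimum average read count per gene among the reference cells. This is not inferCNV's own code: the function name, the toy counts, and the use of `>=` at the boundary are assumptions.

```python
def genes_passing_cutoff(counts, reference_cells, cutoff):
    """Keep genes whose mean count over the reference cells reaches the cutoff.

    counts: dict mapping gene -> dict mapping cell ID -> raw count
    reference_cells: IDs of the cells used as the normal reference
    """
    kept = []
    for gene, per_cell in counts.items():
        mean_count = sum(per_cell[cell] for cell in reference_cells) / len(reference_cells)
        if mean_count >= cutoff:
            kept.append(gene)
    return kept


# Toy counts for three hypothetical genes across four reference cells.
counts = {
    "GENE_A": {"c1": 0, "c2": 1, "c3": 0, "c4": 0},  # mean 0.25
    "GENE_B": {"c1": 0, "c2": 0, "c3": 0, "c4": 0},  # mean 0
    "GENE_C": {"c1": 2, "c2": 3, "c3": 1, "c4": 2},  # mean 2
}
reference_cells = ["c1", "c2", "c3", "c4"]

kept_10x = genes_passing_cutoff(counts, reference_cells, cutoff=0.1)
kept_smartseq2 = genes_passing_cutoff(counts, reference_cells, cutoff=1)
print(kept_10x)
print(kept_smartseq2)
```

Droplet-based 10X counts are much sparser per cell than Smart-seq2 counts, which is presumably why the lower cutoff is recommended: with cutoff=1, a sparsely detected gene like GENE_A would be discarded.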

Yunze Liu

Bioinformatics Sharer
