
Serverless Genomic Data Analysis with AGS: mutect2 Tumor Sample Analysis

Serverless Genomic Data Analysis with AGS

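By calling AGS remote jobs, you can run a range of secondary analyses on genomic data and batch-process large volumes of data without requesting or holding any cloud computing resources. Accelerated, low-cost processing is currently supported for human whole-genome sequencing (WGS), whole-exome sequencing (WES), read alignment, metagenome alignment, and somatic and germline variant discovery. For detailed usage, see the AGS service documentation.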
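Because the jobs run remotely, batch processing can be as simple as a shell loop over input files. The sketch below is illustrative only: it assumes tumor-only mode, reuses the region, bucket, SLA and reference values from the mutect2 examples later in this article, and the helper names `vcf_for_bam` and `submit_batch` are hypothetical. With `DRY_RUN=1` it only prints the commands it would submit.

```shell
#!/usr/bin/env bash
# Sketch: submit one tumor-only mutect2 job per BAM object key.
# Assumptions: region, bucket, SLA and reference are copied from this
# article's examples; vcf_for_bam and submit_batch are hypothetical helpers.
# Set DRY_RUN=1 to print each command instead of submitting it.

# Map bam/<sample>.bam to vcf/<sample>.vcf, the naming used in the examples.
vcf_for_bam() {
  echo "vcf/$(basename "$1" .bam).vcf"
}

submit_batch() {
  local bam
  for bam in "$@"; do
    local -a cmd=(ags remote run mutect2
      --region cn-shenzhen
      --bucket my-test-shenzhen
      --input-bam-tumor "$bam"
      --output-vcf "$(vcf_for_bam "$bam")"
      --service s
      --reference hg19)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, `submit_batch bam/sample1.bam bam/sample2.bam` (hypothetical object keys) submits two independent jobs writing `vcf/sample1.vcf` and `vcf/sample2.vcf`.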


ags remote run --help
run aliyun custom process

Usage:
  ags remote run [flags]
  ags remote run [command]

Available Commands:
  hc          HaplotypeCaller job for haplotypes
  mapping     mapping job, it is equal to a combination of "bwa aln, bwa sampe, samtools sort, gatk MarkDuplicates "
  mutect2     mutect2 job for somatic variant
  rna-mapping mapping job for virus
  wgs         end to end job of mapping and HaplotypeCaller for WGS, WES and so on

Analyzing tumor samples with AGS

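AGS runs mutect2 jobs to detect somatic short variants, which include single-nucleotide variants (SNVs) and insertions and deletions (indels). This article describes how to analyze tumor samples with AGS.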

Background

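AGS mutect2 supports two modes, each suited to a typical scenario:

  • Tumor-normal mode: germline variants found in the matched normal sample are skipped while the tumor sample is analyzed.
  • Tumor-only mode: the alignment data of a single tumor sample is analyzed on its own.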

AGS mutect2 detects somatic variants in the same way as GATK 4.1.3 but runs 30 to 80 times faster: variant detection on 90 Gbase of aligned data completes within 10 minutes.
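As a rough worked example of these figures, take $t_{\text{AGS}} = 10$ min as the runtime and assume the speedup $S$ of 30 to 80 times was measured on this same 90 Gbase dataset (neither point is stated explicitly, so treat the results as approximate upper bounds). The implied throughput and the equivalent GATK 4.1.3 runtime $t_{\text{GATK}}$ would be:

```latex
\begin{aligned}
\text{throughput} &\approx \frac{90\ \text{Gbase}}{10\ \text{min}} = 9\ \text{Gbase/min} = 150\ \text{Mbase/s} \\
t_{\text{GATK}} &\approx S \cdot t_{\text{AGS}}:\quad
30 \times 10\ \text{min} = 300\ \text{min} = 5\ \text{h}, \qquad
80 \times 10\ \text{min} = 800\ \text{min} \approx 13.3\ \text{h}
\end{aligned}
```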

Analyzing samples in tumor-normal mode

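With a matched normal sample as the baseline, Mutect2 calls only somatic variants. Based on the evidence provided (for example, in the matched normal), it skips variants that are clearly present in the germline, so that no computing resources are spent on germline events.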

Usage:

# --region            OSS region, e.g. cn-shenzhen, cn-beijing
# --bucket            OSS bucket name
# --input-bam-tumor   tumor sample BAM file
# --input-bam-normal  (optional) matched normal sample BAM file
# --bed               (optional) target region BED file
# --output-vcf        output VCF file name
# --service           SLA: n (normal) | s (silver) | g (gold) | p (platinum)
# --reference         hg19 (hs37d5: GRCh37/hg19 with decoy contigs; UCSC hg19 is not supported),
#                     hg38 (GRCh38/hg38 with decoy contigs), or a reference path on OSS
ags remote run mutect2 \
--region cn-shenzhen \
--bucket my-test-shenzhen \
--input-bam-tumor bam/HKU2_160660.bam \
--input-bam-normal bam/MGISEQ_NA12878_RG_HG38.bam \
--bed bed/performance.blocks.exp.bed \
--output-vcf vcf/HKU2_160660.vcf \
--service s \
--reference <hg19|hg38|reference path on OSS>
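A small wrapper can assemble this command from the documented flags and reject an SLA code other than n, s, g or p before anything is submitted. This is a hypothetical helper (`run_mutect2` and the `DRY_RUN` switch are not part of AGS); it adds `--input-bam-normal` and `--bed` only when values are given, so the same function covers both tumor-normal and tumor-only mode.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented `ags remote run mutect2` flags.
# Arguments: region bucket tumor_bam output_vcf service reference [normal_bam] [bed]
# Set DRY_RUN=1 to print the command instead of submitting it.
run_mutect2() {
  local region="$1" bucket="$2" tumor="$3" vcf="$4" service="$5" reference="$6"
  local normal="${7:-}" bed="${8:-}"

  # --service only accepts the four SLA codes listed in the usage above.
  case "$service" in
    n|s|g|p) ;;
    *) echo "invalid --service '$service' (expected n, s, g or p)" >&2; return 2 ;;
  esac

  local -a cmd=(ags remote run mutect2
    --region "$region" --bucket "$bucket"
    --input-bam-tumor "$tumor" --output-vcf "$vcf"
    --service "$service" --reference "$reference")

  # Optional flags: a matched normal selects tumor-normal mode,
  # a BED file supplies the optional target regions.
  if [ -n "$normal" ]; then cmd+=(--input-bam-normal "$normal"); fi
  if [ -n "$bed" ]; then cmd+=(--bed "$bed"); fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_mutect2 cn-shenzhen my-test-shenzhen bam/HKU2_160660.bam vcf/HKU2_160660.vcf s hg19 bam/MGISEQ_NA12878_RG.bam` prints a command equivalent to the tumor-normal example that follows, with the flags in a different order.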

Example:
ags remote run mutect2 \
--region cn-shenzhen \
--bucket my-test-shenzhen \
--input-bam-tumor bam/HKU2_160660.bam \
--input-bam-normal  bam/MGISEQ_NA12878_RG.bam \
--output-vcf vcf/HKU2_160660.vcf \
--service "s"  \
--reference hg19
INFO[0001] {"JobName":"mutect2-gpu-vp7d9"}
INFO[0001] Job submit succeed
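The job name printed at submission is what `ags remote get` expects. A sketch for capturing it in a script, assuming the log format shown above stays stable; `job_name_from_log` is a hypothetical helper, and merging stderr into the pipe is only a precaution in case the INFO lines are not written to stdout.

```shell
#!/usr/bin/env bash
# Sketch: pull the JobName out of the submit log line
#   INFO[0001] {"JobName":"mutect2-gpu-vp7d9"}
# so it can be passed to `ags remote get`. job_name_from_log is hypothetical.
job_name_from_log() {
  # Print only the captured JobName value; ignore all other log lines.
  sed -n 's/.*"JobName":"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (2>&1 assumes the INFO lines may go to stderr):
#   job=$(ags remote run mutect2 ... 2>&1 | job_name_from_log)
#   ags remote get "$job" --show
```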

ags remote get mutect2-gpu-vp7d9 --show
+-------------------+------------------+---------+-------------------------------+---------------+-------------+-------------+
|     JOB NAME      |  JOB NAMESPACE   | STATUS  |          CREATE TIME          |   DURATION    | TOTAL READS | TOTAL BASES |
+-------------------+------------------+---------+-------------------------------+---------------+-------------+-------------+
| mutect2-gpu-vp7d9 | XXXXXXXXX | Running | 2020-04-10 16:02:39 +0800 CST | 36.311883677s |           0 |           0 |
+-------------------+------------------+---------+-------------------------------+---------------+-------------+-------------+


+--------------------------+---------------------------+
|        JOB DETAIL        |                           |
+--------------------------+---------------------------+
| mutect2_reference_group  |                           |
| mutect2_oss_region       | cn-shenzhen               |
| mutect2_bucket_name      | my-test-shenzhen          |
| mutect2_output_vcf_name  | vcf/HKU2_160660.vcf       |
| mutect2_reference_file   | hg19                      |
| mutect2_input_bam_tumor  | bam/HKU2_160660.bam       |
| mutect2_input_bam_normal | bam/MGISEQ_NA12878_RG.bam |
| mutect2_input_bed        |                           |
| mutect2_service          | s                         |
+--------------------------+---------------------------+
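While a job runs, its STATUS column reads `Running`; the tumor-only example below shows a finished job as `Succeeded`. A sketch that reads one column of this summary table by header name (the two tables have different column sets) and polls until the job finishes. `job_field` and `wait_for_job` are hypothetical helpers, the 60-second interval is arbitrary, and `Failed` is an assumed status name that does not appear in this article.

```shell
#!/usr/bin/env bash
# Sketch: read the summary table printed by `ags remote get <job> --show`
# and print one column of the first job row, looked up by header name.
job_field() {
  awk -F'|' -v want="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Header row: remember which column holds the wanted field.
    /JOB NAME/ && !col {
      for (i = 2; i < NF; i++) if (trim($i) == want) col = i
      next
    }
    # First data row after the header (separator lines start with "+").
    col && /^\|/ { print trim($col); exit }
  '
}

# Hypothetical polling loop: "Succeeded" ends it, the assumed "Failed"
# status aborts, and any other status (e.g. "Running") keeps waiting.
wait_for_job() {
  local job="$1" status
  while true; do
    status=$(ags remote get "$job" --show | job_field STATUS)
    case "$status" in
      Succeeded) return 0 ;;
      Failed) echo "job $job failed" >&2; return 1 ;;
      *) sleep 60 ;;
    esac
  done
}
```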

Analyzing samples in tumor-only mode

This mode analyzes a single sample of one type, for example a tumor sample or a normal sample.

Usage:

# --region            OSS region, e.g. cn-shenzhen, cn-beijing
# --bucket            OSS bucket name
# --input-bam-tumor   BAM file of the single sample (tumor or normal)
# --output-vcf        output VCF file name
# --service           SLA: n (normal) | s (silver) | g (gold) | p (platinum)
# --reference         hg19 (hs37d5: GRCh37/hg19 with decoy contigs; UCSC hg19 is not supported),
#                     hg38 (GRCh38/hg38 with decoy contigs), or a reference path on OSS
ags remote run mutect2 \
--region cn-shenzhen \
--bucket my-test-shenzhen \
--input-bam-tumor bam/HKU2_160660.bam \
--output-vcf vcf/HKU2_160660.vcf \
--service s \
--reference <hg19|hg38|reference path on OSS>

Example:

ags remote run mutect2 \
--region cn-shenzhen \
--bucket my-test-shenzhen \
--input-bam-tumor bam/HKU2_160660.bam \
--output-vcf vcf/HKU2_160660.all.vcf \
--service "s"  \
--reference hg19
INFO[0001] {"JobName":"mutect2-gpu-6tc8s"}
INFO[0001] Job submit succeed

ags remote get mutect2-gpu-6tc8s --show
+-------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
|     JOB NAME      |  JOB NAMESPACE   |  STATUS   |          CREATE TIME          | DURATION |          FINISH TIME          | TOTAL READS | TOTAL BASES |
+-------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+
| mutect2-gpu-6tc8s | XXXXXXXXXX | Succeeded | 2020-04-10 15:51:59 +0800 CST | 4m12s    | 2020-04-10 15:56:11 +0800 CST |           0 |           0 |
+-------------------+------------------+-----------+-------------------------------+----------+-------------------------------+-------------+-------------+


+--------------------------+-------------------------+
|        JOB DETAIL        |                         |
+--------------------------+-------------------------+
| mutect2_oss_region       | cn-shenzhen             |
| mutect2_input_bam_tumor  | bam/HKU2_160660.bam     |
| mutect2_input_bam_normal |                         |
| mutect2_input_bed        |                         |
| mutect2_output_vcf_name  | vcf/HKU2_160660.all.vcf |
| mutect2_bucket_name      | my-test-shenzhen        |
| mutect2_reference_file   | hg19                    |
| mutect2_reference_group  |                         |
| mutect2_service          | s                       |
+--------------------------+-------------------------+
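For a finished job, DURATION equals FINISH TIME minus CREATE TIME (15:51:59 to 15:56:11 is 4m12s above). A sketch for recomputing it, for example to log run times across many jobs. It assumes GNU `date` (which supports `-d`) and strips the trailing ` CST` abbreviation, relying on the numeric `+0800` offset; `elapsed_seconds` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: seconds between the CREATE TIME and FINISH TIME columns.
# Assumes GNU date; the " CST" suffix is removed so only the numeric
# +0800 offset is used for parsing.
elapsed_seconds() {
  local start="${1% CST}" end="${2% CST}"
  echo $(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
}
```

With the timestamps of the job above, `elapsed_seconds "2020-04-10 15:51:59 +0800 CST" "2020-04-10 15:56:11 +0800 CST"` prints `252`, i.e. 4m12s.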
