VCF format
Last updated: 2025-03-14
Requirements
- bi-allelic VCF/BCF files following the VCF specification
- single-sample VCF/BCF files are recommended as input for the SV calling workflow. Although multi-sample VCFs may not raise errors (except for the genotype command), they are not fully tested and may cause unexpected results.
- The input/output VCF format (i.e., vcf, vcf.gz, bcf) is detected automatically. However, a temporary uncompressed VCF file will be generated if the output is vcf.gz or bcf.
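The first two requirements can be checked before running the workflow. The following is a minimal sketch, not part of the tool itself: the function name check_vcf_requirements is hypothetical, and it assumes an uncompressed, tab-delimited VCF text stream. It relies only on two facts from the VCF specification: multiple ALT alleles are comma-separated, and sample columns start at column 10 of the #CHROM header line.

```python
import io


def check_vcf_requirements(handle):
    """Return a list of problems found in an uncompressed VCF text stream.

    Hypothetical helper for illustration; it is not a harmonisv command.
    """
    problems = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 1-8 are fixed, column 9 is FORMAT, the rest are samples.
            n_samples = max(len(fields) - 9, 0)
            if n_samples > 1:
                problems.append(
                    f"header lists {n_samples} samples; "
                    "single-sample input is recommended"
                )
            continue
        alt = fields[4]
        if "," in alt:
            # More than one ALT allele means the record is not bi-allelic.
            problems.append(f"line {line_number}: multi-allelic ALT '{alt}'")
    return problems


example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG002\n"
    "chr1\t100\tsv1\tA\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=50\tGT\t0/1\n"
    "chr1\t200\tsv2\tA\tAT,ATT\t.\tPASS\tSVTYPE=INS\tGT\t1/2\n"
)
problems = check_vcf_requirements(io.StringIO(example))
for problem in problems:
    print(problem)
```

In this example only the second record is reported, because its ALT column lists two alleles. Compressed (vcf.gz) or binary (bcf) input would need to be decoded first, which this sketch does not attempt.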
Variant ID
Unique variant IDs are used to index variants in the SV calling workflow. To generate unique IDs, use the harmonize command. For example, harmonize --rename-id --id-prefix HG002.minimap2.sniffles will generate unique IDs like HG002.minimap2.sniffles.INS.1.
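To illustrate the shape of such IDs, here is a minimal sketch that builds identifiers of the form prefix.SVTYPE.counter. The function make_unique_ids is hypothetical, and the per-type counter starting at 1 is an assumption inferred from the example ID above; the numbering actually used by the harmonize command may differ.

```python
from collections import defaultdict


def make_unique_ids(sv_types, id_prefix):
    """Build IDs like '<prefix>.<SVTYPE>.<n>' for a sequence of SV types.

    Assumption: one counter per SV type, starting at 1.
    """
    counters = defaultdict(int)
    ids = []
    for sv_type in sv_types:
        counters[sv_type] += 1
        ids.append(f"{id_prefix}.{sv_type}.{counters[sv_type]}")
    return ids


ids = make_unique_ids(["INS", "DEL", "INS"], "HG002.minimap2.sniffles")
for variant_id in ids:
    print(variant_id)
```

Including the sample, aligner, and caller in the prefix keeps IDs distinct when call sets from several methods are later combined.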
INFO tags
The following INFO tags may be used or generated in the SV calling workflow. Not every tag is required by every command. Please refer to the documentation of each command for more details.
INFO tags generated by upstream SV callers:
- SVTYPE: SV type defined in the VCF specification
- SVLEN: SV length defined in the VCF specification
- AC: allele count of alternate alleles
- DP: sequencing depth of the SV
- RE: number of reads supporting the SV
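As a quick way to see which of these tags a record carries, the INFO column can be split into key/value pairs. This is a minimal sketch based on the VCF specification's INFO syntax (semicolon-separated entries, optional =value, a lone dot for missing); parse_info is a hypothetical helper and the example values are made up for illustration.

```python
def parse_info(info_field):
    """Parse a VCF INFO column into a dict.

    Values are kept as strings; flag entries without '=' map to True.
    """
    info = {}
    if info_field == ".":
        return info  # '.' means the INFO column is empty
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, separator, value = entry.partition("=")
        info[key] = value if separator else True
    return info


# Made-up record for illustration only.
info = parse_info("SVTYPE=DEL;SVLEN=-320;DP=30;RE=12;PRECISE")
missing = [tag for tag in ("SVTYPE", "SVLEN", "AC", "DP", "RE") if tag not in info]
print(info)
print("missing tags:", missing)
```

Here only AC is reported as missing. Whether a missing tag matters depends on the command being run.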
INFO tags generated by harmonisv:
represent command:
- REPRESENT_SV: ID of the original SV that was selected as the representative SV
- ID_LIST: list of SVs merged into the representative SV
genotype command:
- VAF: variant allele fraction (VAF = RE / DP)
- SUPP_METHOD: number of methods supporting the SV (method = aligner + caller)
- SUPP_METHOD_FORCE: number of force-calling methods supporting the SV
- SUPP_CALLER: number of callers supporting the SV, counted as the maximum number of callers that use the same aligner. For example, if minimap2+sniffles, minimap2+svim, and NGMLR+cuteSV support the SV, SUPP_CALLER=2.
- SUPP_CALLER_FORCE: number of force-calling callers supporting the SV
- MEAN_VAF: average VAF across all methods
- STD_VAF: standard deviation of VAF across all methods
- MEAN_VAF_CALL: average VAF of the methods supporting the SV (VAF > threshold)
- STD_VAF_CALL: standard deviation of VAF of the methods supporting the SV
- MAX_RE: maximum number of reads supporting the SV
- TAG_ALIGNER_CALLER: the INFO/TAG value obtained from an individual method, e.g., VAF_MINIMAP2_SNIFFLES is the VAF value of the SV called using the method minimap2+sniffles.
filter command:
- RF_SCORE: random forest quality score of the SV (ranging from 0 to 1)
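The genotype tags listed above can be illustrated with a small worked example. This is a sketch under stated assumptions, not the tool's implementation: summarize_methods and its vaf_threshold parameter are hypothetical, the population standard deviation is an arbitrary choice (the text does not say which estimator is used), a depth of zero is treated as VAF 0, and the force-calling tags are left out. The read counts are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_methods(methods, vaf_threshold=0.0):
    """Derive genotype-style summary tags from per-method RE and DP values.

    methods: non-empty list of dicts with keys 'aligner', 'caller', 're', 'dp'.
    A method supports the SV when its VAF exceeds vaf_threshold (assumed rule).
    """
    per_method = {}
    vafs = []
    supporting_vafs = []
    callers_by_aligner = defaultdict(set)
    for m in methods:
        vaf = m["re"] / m["dp"] if m["dp"] > 0 else 0.0  # VAF = RE / DP
        vafs.append(vaf)
        # Per-method tag named TAG_ALIGNER_CALLER, e.g. VAF_MINIMAP2_SNIFFLES.
        per_method[f"VAF_{m['aligner'].upper()}_{m['caller'].upper()}"] = vaf
        if vaf > vaf_threshold:
            supporting_vafs.append(vaf)
            callers_by_aligner[m["aligner"]].add(m["caller"])
    summary = {
        "SUPP_METHOD": len(supporting_vafs),
        # Largest number of supporting callers that share one aligner.
        "SUPP_CALLER": max(
            (len(callers) for callers in callers_by_aligner.values()), default=0
        ),
        "MEAN_VAF": mean(vafs),
        "STD_VAF": pstdev(vafs),  # assumption: population standard deviation
        "MAX_RE": max(m["re"] for m in methods),
    }
    if supporting_vafs:
        summary["MEAN_VAF_CALL"] = mean(supporting_vafs)
        summary["STD_VAF_CALL"] = pstdev(supporting_vafs)
    summary.update(per_method)
    return summary


# Made-up counts; the first three methods mirror the SUPP_CALLER example.
methods = [
    {"aligner": "minimap2", "caller": "sniffles", "re": 12, "dp": 30},
    {"aligner": "minimap2", "caller": "svim", "re": 9, "dp": 30},
    {"aligner": "NGMLR", "caller": "cuteSV", "re": 15, "dp": 30},
    {"aligner": "NGMLR", "caller": "sniffles", "re": 0, "dp": 30},
]
summary = summarize_methods(methods)
for tag, value in summary.items():
    print(tag, value)
```

With these inputs three methods support the SV, and SUPP_CALLER is 2 because minimap2 is shared by two supporting callers while NGMLR has only one.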