VCF format

Last updated: 2025-03-14

Requirements

  • Bi-allelic VCF/BCF files that follow the VCF specification.
  • Single-sample VCF/BCF files are recommended as input for the SV calling workflow. Multi-sample VCFs may not raise errors (except for the genotype command), but they are not fully tested and may produce unexpected results.
  • The input/output VCF format (i.e., vcf, vcf.gz, or bcf) is detected automatically. However, a temporary uncompressed VCF file is generated when the output is vcf.gz or bcf.
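
The three formats listed above can be told apart from the file name alone. The sketch below is only an illustration of that idea: it assumes detection is based on the file name suffix, which this page does not confirm, and guess_vcf_format is a hypothetical helper, not part of harmonisv.

```python
from pathlib import Path


def guess_vcf_format(filename: str) -> str:
    """Return 'vcf', 'vcf.gz', or 'bcf' based on the file name suffix.

    Assumption: the format is inferred from the suffix only. The real tool
    may inspect the file content instead.
    """
    name = Path(filename).name.lower()
    # Check the longest suffix first so that 'x.vcf.gz' is not misread.
    for suffix in ("vcf.gz", "vcf", "bcf"):
        if name.endswith("." + suffix):
            return suffix
    raise ValueError(f"unrecognized VCF/BCF file name: {filename}")


if __name__ == "__main__":
    for example in ("HG002.vcf", "HG002.vcf.gz", "HG002.bcf"):
        print(example, "->", guess_vcf_format(example))
```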

Variant ID

Unique variant IDs are used to index variants in the SV calling workflow. To generate unique IDs, use the harmonize command. For example, harmonize --rename-id --id-prefix HG002.minimap2.sniffles generates unique IDs such as HG002.minimap2.sniffles.INS.1.
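
To make the ID pattern concrete, here is a minimal Python sketch that builds IDs of the form prefix.SVTYPE.number. It assumes the trailing number is a running counter kept separately for each SV type; the page only shows a single example ID, so the actual numbering scheme may differ. make_unique_ids is a hypothetical helper, not a harmonisv function.

```python
from collections import defaultdict


def make_unique_ids(svtypes, prefix):
    """Build one ID per variant as '<prefix>.<SVTYPE>.<counter>'.

    Assumption: the counter restarts at 1 for each SV type.
    """
    counters = defaultdict(int)
    ids = []
    for svtype in svtypes:
        counters[svtype] += 1
        ids.append(f"{prefix}.{svtype}.{counters[svtype]}")
    return ids


if __name__ == "__main__":
    for variant_id in make_unique_ids(["INS", "DEL", "INS"], "HG002.minimap2.sniffles"):
        print(variant_id)
```

Encoding the sample, aligner, and caller in the prefix keeps IDs distinct when call sets from several methods are later combined.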

INFO tags

The following INFO tags may be used or generated in the SV calling workflow. Not every tag listed here is required by every command; refer to the documentation of each command for details.

INFO tags generated by upstream SV callers:

  • SVTYPE: SV types defined in the VCF specification
  • SVLEN: SV length defined in the VCF specification
  • AC: allele count of alternate alleles
  • DP: sequencing depth of the SV
  • RE: number of reads supporting the SV
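
As an illustration of how these tags appear in a record, the sketch below parses a VCF INFO column into a dictionary. It relies only on the INFO syntax from the VCF specification (semicolon-separated key=value entries, with flags carrying no value). The example string and its values are invented for demonstration, and parse_info is a hypothetical helper.

```python
def parse_info(info_field: str) -> dict:
    """Parse a VCF INFO column into a dict.

    Entries of the form KEY=VALUE map to strings; flag entries without
    '=' map to True. A missing INFO column ('.') yields an empty dict.
    """
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


if __name__ == "__main__":
    # Invented example record, not taken from real data.
    info = parse_info("SVTYPE=DEL;SVLEN=-312;DP=40;RE=18;PRECISE")
    print(info["SVTYPE"], int(info["SVLEN"]), int(info["DP"]), int(info["RE"]))
```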

INFO tags generated by harmonisv:

  • represent command:
    • REPRESENT_SV: original ID of the SV that was selected as the representative SV
    • ID_LIST: list of SVs merged into the representative SV
  • genotype command:
    • VAF: variant allele fraction (VAF = RE / DP)
    • SUPP_METHOD: number of methods supporting the SV (method = aligner + caller)
    • SUPP_METHOD_FORCE: number of force-calling methods supporting the SV
    • SUPP_CALLER: number of callers supporting the SV, counted as the maximum number of callers that share the same aligner. For example, if minimap2 + sniffles, minimap2 + svim, and NGMLR + cuteSV support the SV, SUPP_CALLER=2.
    • SUPP_CALLER_FORCE: number of force-calling callers supporting the SV
    • MEAN_VAF: mean VAF across all methods
    • STD_VAF: standard deviation of VAF across all methods
    • MEAN_VAF_CALL: mean VAF across the methods that support the SV (VAF > threshold)
    • STD_VAF_CALL: standard deviation of VAF across the methods that support the SV
    • MAX_RE: maximum number of reads supporting the SV
    • TAG_ALIGNER_CALLER: the INFO/TAG value obtained from an individual method, e.g., VAF_MINIMAP2_SNIFFLES is the VAF of the SV called with the method minimap2 + sniffles.
  • filter command:
    • RF_SCORE: random forest quality score of the SV (ranging from 0 to 1)
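
The genotype tags above can be illustrated with a short Python sketch. It follows the definitions on this page (VAF = RE / DP, a method supports the SV when its VAF exceeds a threshold, SUPP_CALLER is the largest number of supporting callers sharing one aligner). Several details are assumptions: the threshold value, the use of the population standard deviation, taking MAX_RE across all methods, and the upper-casing of aligner and caller names in per-method tags. summarize_methods is a hypothetical helper and is not how harmonisv is necessarily implemented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def vaf(re_count: int, depth: int) -> float:
    """VAF = RE / DP; returns 0.0 when the depth is zero."""
    return re_count / depth if depth > 0 else 0.0


def summarize_methods(methods: dict, vaf_threshold: float) -> dict:
    """Summarize per-method evidence for one SV.

    methods maps (aligner, caller) -> (RE, DP) and must not be empty.
    vaf_threshold is a hypothetical cutoff chosen by the caller.
    """
    vafs = {m: vaf(re_count, dp) for m, (re_count, dp) in methods.items()}
    supporting = {m: v for m, v in vafs.items() if v > vaf_threshold}

    # Group supporting callers by aligner, then take the largest group.
    callers_by_aligner = defaultdict(set)
    for aligner, caller in supporting:
        callers_by_aligner[aligner].add(caller)

    all_vafs = list(vafs.values())
    info = {
        "SUPP_METHOD": len(supporting),
        "SUPP_CALLER": max((len(c) for c in callers_by_aligner.values()), default=0),
        "MEAN_VAF": mean(all_vafs),
        "STD_VAF": pstdev(all_vafs),  # assumption: population standard deviation
        "MAX_RE": max(re_count for re_count, _ in methods.values()),
    }
    if supporting:
        call_vafs = list(supporting.values())
        info["MEAN_VAF_CALL"] = mean(call_vafs)
        info["STD_VAF_CALL"] = pstdev(call_vafs)
    # Per-method tags follow the TAG_ALIGNER_CALLER naming pattern.
    for (aligner, caller), value in vafs.items():
        info[f"VAF_{aligner.upper()}_{caller.upper()}"] = value
    return info


if __name__ == "__main__":
    # Invented read counts for the three methods named in the SUPP_CALLER example.
    example = {
        ("minimap2", "sniffles"): (10, 20),
        ("minimap2", "svim"): (8, 20),
        ("NGMLR", "cuteSV"): (12, 20),
    }
    for tag, value in summarize_methods(example, vaf_threshold=0.2).items():
        print(tag, value)
```

With these invented counts all three methods pass the threshold, two of them share the minimap2 aligner, and SUPP_CALLER is therefore 2, matching the example given for that tag.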