harmonize-header
Harmonize VCF headers
Last updated: 2025-03-14
Input requirmenets
- VCF: VCF/BCF files following the VCF specification.
Output
- VCF header: harmonized VCF header containing all header records of input VCFs
Usage
harmonisv harmonize-header [options] -i <input_vcf> -o <output_header>
Examples
In this example, the VCF headers from cuteSV, sniffles, and svim are harmonized. If one header occurs in more than one VCF, the priority is: sniffles > cuteSV > svim (based on the input order).
harmonisv harmonize-header \
-i HG002.minimap2.cuteSV.vcf,HG002.minimap2.sniffles.vcf,HG002.minimap2.svim.vcf \
-o harmonized_header.txt \
-r HG002.minimap2.sniffles.vcf # reference VCF has the highest priority
Input:
HG002.minimap2.cuteSV.vcf:
##fileformat=VCFv4.2
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variant">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NULL
HG002.minimap2.sniffles.vcf:
##fileformat=VCFv4.2
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variation">
##INFO=<ID=STDEV_POS,Number=1,Type=Float,Description="Standard deviation of structural variation start position">
##INFO=<ID=STDEV_LEN,Number=1,Type=Float,Description="Standard deviation of structural variation length">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
HG002.minimap2.svim.vcf:
##fileformat=VCFv4.2
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=STD_SPAN,Number=1,Type=Float,Description="Standard deviation in span of merged SV signatures">
##INFO=<ID=STD_POS,Number=1,Type=Float,Description="Standard deviation in position of merged SV signatures">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
Output:
PRECISE
, IMPRECISE
, SVLEN
, SVTYPE
are defined in multiple VCFs, and the output use the definition in sniffles as it has the highest priority. STDEV_POS
, STDEV_LEN
, STD_SPAN
, STD_POS
are defined in single VCF and are appended to the output header.
##fileformat=VCFv4.2
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Structural variation with precise breakpoints">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Structural variation with imprecise breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variation">
##INFO=<ID=STDEV_POS,Number=1,Type=Float,Description="Standard deviation of structural variation start position">
##INFO=<ID=STDEV_LEN,Number=1,Type=Float,Description="Standard deviation of structural variation length">
##INFO=<ID=STD_SPAN,Number=1,Type=Float,Description="Standard deviation in span of merged SV signatures">
##INFO=<ID=STD_POS,Number=1,Type=Float,Description="Standard deviation in position of merged SV signatures">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
Arguments
Input/Output arguments:
- -i, --invcf VCF
- Comma-separated list of input VCF files. Duplicate headers will use the first one, including SAMPLE header. For multi-sample VCF, please make sure all input VCFs have the same SAMPLE order.
- -f, --file-list FILE_LIST
- File containing a list of input VCF files, one VCF per line. VCFs from both -i and -f will be used.
- -o, --output OUTPUT
- Output VCF header file
optional arguments:
- -r, --ref-vcf VCF
- Reference VCF file, highest priority for duplicated headers