VariantMetaCaller

VariantMetaCaller automatically integrates variant calling pipelines into a better performing overall model that also predicts accurate variant probabilities.

The source code can be downloaded here.

The artificial sequencing data sets supporting the results are available at ftp://phoenix.mit.bme.hu:49021/VariantMetaCaller/fastq

Software usage

    Usage: genotypePrioritizer [OPTIONS] -t <tool>

    Where <tool> can be (default: 'prioritize'):

      prioritize
        options:
          -v <title> <model> <vcfFilePath> [-v ...] : Title, name of model in configuration file and vcf file path.
          -c <configurationFilePath> : Path for configuration file.
          -o <outputFilePath> : Write output to this file instead of standard output.
          --type <both|snp|indel> : Restrict output to variant type. Default: both
          --threshold-positive <n> : Minimum number of callers supporting a variant to consider as
                positive training instance for SVM. Default: number of input vcf files
          --threshold-negative <n> : Maximum number of callers supporting a variant to consider as
                negative training instance for SVM. Default: 1
          --svmParams <C values> <Gamma values> <cv> : SVM parameters.
                <C values> : Min,Max,By .OR. Min,Max,By,ByFine
                      Performs grid search over C from 2^Min to 2^Max by step size of 2^By.
                      If ByFine is also given, performs a finer grid search on the
                      neighbourhood of the best parameters. Default: '-5,15,2'
                <Gamma values> : Min,Max,By .OR. Min,Max,By,ByFine
                      Performs grid search over Gamma parameter of the RBF kernel from
                      2^Min to 2^Max by step size of 2^By. If ByFine is also given,
                      performs a finer grid search on the neighbourhood of the best
                      parameters. Default: '-15,3,2'
                <cv> : Cross validation fold size. Default: 5
          --minQ <num> : Minimum Quality threshold for a variant to consider in input files. Default: 1.0
          --useFilter : Filter out variants from input files where FILTER is not PASS or empty.
                Default: don't filter
          --allInfo : Output all INFO fields from input files. Default: don't write out
        advanced options:
          --libSVM <param> <value> : libSVM parameter and value.
                Available parameters:
                  nu (for one-class SVM);
                  e (epsilon);
                  h (shrinking heuristics);
                  m (cache size in MB);
                  w-1, w1 (weight of parameter C for negative and positive training samples,
                      for two-class SVM, both for SNPs and indels);
                  snp:w-1, snp:w1, indel:w-1, indel:w1 (weight of C separately for SNPs and indels)
                See details in libSVM documentation.
          --createContourPlots : Create contour plots of SVM grid search accuracy for all input files.
                Default: don't create such plots
          --computeFisherScores : Compute Fisher score (importance of features) for all input files.
                Default: don't compute
          --featureImportance : Perform feature importance calculations. High computational cost!
                Default: don't perform

      genotypeconcordance
        options:
          -comp <vcfFilePath> : Comparison dataset.
          -truth <vcfFilePath> : Truth dataset.
          -o <prefix> : Output file prefix. Default: standard output.
          --minQ <num> : Minimum Quality threshold for a variant to consider in input files. Default: 0.0
          --useFilter : Filter out variants from input files where FILTER is not PASS or empty.
                Default: don't filter

    General options :
      --verbosity <n> : Verbosity level. (0: Errors and warnings; *1*: Status messages; 2: More; 3: Every litle detail.)
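To make the '--svmParams' grid notation concrete, the following bash sketch lists the values a coarse 'Min,Max,By' specification would visit. It is not part of VariantMetaCaller: the helper name expand_grid is hypothetical, and the sketch assumes that the exponent advances by By on each step (so consecutive values differ by a factor of 2^By), which is how the usage text appears to read. The optional ByFine field is ignored here.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with VariantMetaCaller.
# Prints the coarse grid for a 'Min,Max,By' (or 'Min,Max,By,ByFine') spec,
# ASSUMING the exponent goes from Min to Max in increments of By.
expand_grid() {
    local spec="$1" min max by fine e
    # Split the comma-separated spec; 'fine' absorbs an optional ByFine field.
    IFS=',' read -r min max by fine <<< "$spec"
    for ((e = min; e <= max; e += by)); do
        printf '2^%d\n' "$e"
    done
}

echo "C grid (default '-5,15,2'):"
expand_grid '-5,15,2'

echo "Gamma grid (default '-15,3,2'):"
expand_grid '-15,3,2'
```

Under this reading, the defaults give 11 candidate values for C and 10 for Gamma, so the coarse search evaluates 110 parameter pairs, each with the chosen number of cross-validation folds.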

For example, the parameters for running VariantMetaCaller on four VCF files created by HaplotypeCaller, UnifiedGenotyper, FreeBayes and SAMtools can be:

    genotypePrioritizer -c data/definitions.model \
        -v HC HaplotypeCaller haplotypecaller.vcf \
        -v samtools samtools samtools.vcf \
        -v freebayes freebayes freebayes.vcf \
        -v UG UnifiedGenotyper unifiedgenotyper.vcf > result.vcf

Explanation:

Each '-v' parameter defines one input VCF file as a '<title> <model> <vcfFilePath>' triplet. The first item is the title of the input file (used in the resulting output file), the second is the name of the model in the configuration file that applies to this input, and the third is the path of the input file.

The '-c' parameter sets the configuration (definition) file, which specifies the annotations and the necessary data transformations for each input variant caller. A sample definition file, defining the basic annotations of the above four variant callers, can be downloaded from here.

The program writes its output to standard output, so '> result.vcf' is used to redirect the resulting VCF into the file result.vcf.
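When the number of input callers grows, it can be convenient to assemble the '-v' triplets from a list. The bash sketch below does this as a dry run and uses the '-o' option from the usage text in place of shell redirection. The array name inputs and the function build_args are illustrative only; the file names are the ones from the example above, and the script merely prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# Illustrative dry run: assemble a genotypePrioritizer command line.
# Each entry holds one "<title> <model> <vcfFilePath>" triplet.
inputs=(
    "HC HaplotypeCaller haplotypecaller.vcf"
    "samtools samtools samtools.vcf"
    "freebayes freebayes freebayes.vcf"
    "UG UnifiedGenotyper unifiedgenotyper.vcf"
)

build_args() {
    local entry title model vcf
    # -c: configuration (definition) file, as in the example above.
    args=(-c data/definitions.model)
    for entry in "${inputs[@]}"; do
        read -r title model vcf <<< "$entry"
        args+=(-v "$title" "$model" "$vcf")
    done
    # -o: write to a file instead of standard output (see the usage text).
    args+=(-o result.vcf)
}

build_args

# Print the command instead of running it.
printf '%s ' genotypePrioritizer "${args[@]}"
echo

# To execute it, assuming genotypePrioritizer is on your PATH:
#   genotypePrioritizer "${args[@]}"
```

Keeping the arguments in an array preserves each item as a separate word, so paths containing spaces would still be passed correctly once the entries are stored in a form that does not rely on whitespace splitting.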