Structured Review

Illumina Inc nextseq sequencing data
NextSeq sequencing data, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 99/100, based on 12 PubMed citations.
https://www.bioz.com/result/nextseq sequencing data/product/Illumina Inc
nextseq sequencing data - by Bioz Stars, 2022-11

Images

1) Product Images from "Diversity and substrate-specificity of green algae and other micro-eukaryotes colonizing amphibian clutches in Germany, revealed by DNA metabarcoding"

Article Title: Diversity and substrate-specificity of green algae and other micro-eukaryotes colonizing amphibian clutches in Germany, revealed by DNA metabarcoding

Journal: Die Naturwissenschaften

doi: 10.1007/s00114-021-01734-0

Figure Legend Snippet: Relative abundance of reads for indicator OTUs in the pooled Rana dalmatina clutch samples across embryo developmental stage categories (a–b), and relationships between R. dalmatina clutch-associated operational taxonomic unit (OTU) community dissimilarity and temporal distance for each sampling site (c–d). OTUs are annotated to order level for 18S data (a) and to genus level for rbcL indicator OTUs (b). Higher-level taxa in plot b indicate that the blastn match was lower than 95%, so the OTU was not annotated to genus level.

Techniques Used: Sampling
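
The legend above refers to two simple computations: the relative abundance of reads per OTU, and a 95% blastn match cutoff below which an OTU is not annotated to genus level. A minimal Python sketch of both follows. The function names, the input layout, and the choice to treat a match of exactly 95% as sufficient for genus-level annotation are assumptions made for illustration; they are not taken from the cited article.

```python
def relative_abundance(read_counts):
    """Convert raw read counts per OTU into fractions of the sample total."""
    total = sum(read_counts.values())
    if total == 0:
        # No reads at all: report zero for every OTU instead of dividing by zero.
        return {otu: 0.0 for otu in read_counts}
    return {otu: n / total for otu, n in read_counts.items()}


def annotation_rank(percent_identity, genus_cutoff=95.0):
    """Choose the annotation level from the best blastn match (in percent).

    Assumption: a match below the cutoff is reported only at a higher
    taxonomic level, as the legend describes; the handling of a match
    exactly at the cutoff is a guess.
    """
    return "genus" if percent_identity >= genus_cutoff else "higher-level taxon"


# Hypothetical counts for two OTUs in one pooled clutch sample.
counts = {"OTU_1": 25, "OTU_2": 75}
fractions = relative_abundance(counts)
```

With the hypothetical counts above, `fractions["OTU_1"]` is 0.25 and `fractions["OTU_2"]` is 0.75.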

Figure Legend Snippet: Relative abundance of rbcL (a) and 18S (d) reads from taxa associated with clutch samples (relative abundance of reads from taxa associated with other substrates in Online Resource 4, Fig. S1). Bar plots for the most abundant (by sequence abundance) indicator rbcL (b, c) and 18S (e, f) operational taxonomic units (OTUs) detected in clutch samples (from Table 1). The y-axes of plots a and c represent sequence counts, while these counts have been log-transformed in plots b and d to better highlight the distribution of these OTUs in leaves, sediment, and water samples. For easier interpretation of the graph, the main target taxon (Oophila) is marked with x in the respective bars. Percentages in plots b and e represent the relative abundance of sequences for the corresponding taxon in the clutch samples.

Techniques Used: Sequencing, Transformation Assay
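
The legend above mentions that sequence counts were log-transformed so that low-abundance substrates remain visible next to high-abundance ones. The sketch below shows one common way to do this in Python. The base-10 logarithm and the pseudocount of 1 (added so that zero counts stay defined) are assumptions; the legend does not say which transform the authors used.

```python
import math


def log_transform(counts, pseudocount=1):
    """Return log10(count + pseudocount) for each sequence count.

    The pseudocount is an assumption: it keeps samples with zero reads
    on the plot, since log10(0) is undefined.
    """
    return [math.log10(c + pseudocount) for c in counts]


# Hypothetical counts for one OTU in four substrates
# (clutch, leaves, sediment, water).
transformed = log_transform([9999, 99, 9, 0])
```

On the transformed scale the four hypothetical values are spread roughly evenly, whereas on the raw scale the clutch count would dwarf the other three.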

2) Product Images from "BBCAnalyzer: a visual approach to facilitate variant calling"

Article Title: BBCAnalyzer: a visual approach to facilitate variant calling

Journal: BMC Bioinformatics

doi: 10.1186/s12859-017-1549-4

Figure Legend Snippet: Exemplary output file from real patient data generated by Illumina NextSeq. Relative number of reads at seven positions analyzed for sample “Example_Illumina”. Reference bases are plotted on the negative y-axis; detected bases in the mapped reads are plotted on the positive y-axis (5% threshold marked). Likely SNV at chr1:115,258,747 (reference C, ∼70% of the reads with high-quality C and ∼30% of the reads with high-quality T). No variant at chr2:25,467,204 (reference G, ∼100% of the reads with high-quality G). Unlikely SNV at chr2:198,267,280 (reference C, ∼95% of the reads with low-quality C, ∼5% of the reads with low-quality A). Likely deletion at chr4:106,157,106 (reference A, ∼75% of the reads with high-quality A, ∼25% of the reads with deleted A). Known homozygous SNP at chr17:7,579,472 (reference G, polymorphism C displayed as additional reference base, ∼100% of the reads with high-quality C). Possible insertion of a “G”, but unlikely deletion at chr20:31,022,442 (reference G, ∼97% of the reads with high-quality G, ∼3% of the reads with deleted G, ∼30% of the reads with inserted high-quality G). Likely SNV at chr21:44,514,777 (reference T, ∼65% of the reads with high-quality T, ∼35% of the reads with high-quality G).

Techniques Used: Generated, Variant Assay
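
The legend above reads each position as a set of per-base read fractions compared against a marked 5% threshold. The Python sketch below illustrates that comparison only. It is not BBCAnalyzer's implementation: the function names and input layout are hypothetical, and it ignores base quality, insertions, and deletions, all of which the legend shows to matter when judging a call.

```python
def base_fractions(base_counts):
    """Fraction of mapped reads supporting each base at one position."""
    total = sum(base_counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in base_counts.items()}


def candidate_alt_bases(base_counts, reference_base, threshold=0.05):
    """List non-reference bases whose read fraction reaches the threshold.

    The default mirrors the 5% line marked in the plot; treating it as a
    hard filter is an assumption made for this illustration.
    """
    fractions = base_fractions(base_counts)
    return sorted(
        base
        for base, fraction in fractions.items()
        if base != reference_base and fraction >= threshold
    )


# Hypothetical read counts shaped like two of the positions in the legend.
likely_snv = candidate_alt_bases({"C": 70, "T": 30}, reference_base="C")
no_variant = candidate_alt_bases({"G": 100}, reference_base="G")
```

Here `likely_snv` is `["T"]` and `no_variant` is an empty list. A real review would still inspect base quality before accepting the first call.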

3) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
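
The supporting information above mentions R scripts that pick the best GLM by forward selection based on AIC. Those scripts are not reproduced here. As a rough illustration of the idea only, the Python sketch below shows a greedy forward selection driven by AIC, where AIC = 2k - 2 ln(L) for k fitted parameters and maximized likelihood L. The parameter names, the log-likelihood gains, and the additive toy scoring function are all hypothetical stand-ins for actually fitting a model.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(candidates, score):
    """Greedy forward selection.

    Starting from the empty model, repeatedly add the candidate that
    lowers the score the most; stop when no addition lowers it.
    `score` maps a list of selected names to a number (here, an AIC).
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial_best, choice = min((score(selected + [c]), c) for c in remaining)
        if trial_best >= best:
            break  # no remaining candidate improves the model
        selected.append(choice)
        remaining.remove(choice)
        best = trial_best
    return selected, best


# Hypothetical log-likelihood gain contributed by each parameter.
# A real analysis would refit the GLM for every trial subset instead.
GAINS = {"depth": 10.0, "qual": 3.0, "noise": 0.5}


def toy_score(subset):
    log_likelihood = -50.0 + sum(GAINS[name] for name in subset)
    return aic(log_likelihood, n_params=len(subset) + 1)  # +1 for the intercept


chosen, chosen_aic = forward_select(GAINS, toy_score)
```

In this toy setting the procedure keeps `depth` and `qual` and rejects `noise`, because adding `noise` costs more in the 2k penalty than it gains in likelihood.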

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. η ^ i _ S N V _ 454 η ^ i _ S N V _ I o n T η ^ i _ S N V _ I l l u m i n a . Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.
Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. η ^ i _ S N V _ 454 η ^ i _ S N V _ I o n T η ^ i _ S N V _ I l l u m i n a . Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

5) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

6) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

7) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.
Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

8) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$ $\hat{\eta}_{i\_SNV\_IonT}$ $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$ $\hat{\eta}_{i\_Indel\_IonT}$ $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC in case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green) considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region in the case of 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$ $\hat{\eta}_{i\_SNV\_IonT}$ $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$ $\hat{\eta}_{i\_SNV\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called- and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$ $\hat{\eta}_{i\_SNV\_IonT}$ $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$ $\hat{\eta}_{i\_Indel\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Similar Products

    90
    Illumina Inc nextseq 500 pre mrna sequencing data
    Nextseq 500 Pre Mrna Sequencing Data, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/nextseq 500 pre mrna sequencing data/product/Illumina Inc
    Average 90 stars, based on 1 article review
    Price from $9.99 to $1999.99
    nextseq 500 pre mrna sequencing data - by Bioz Stars, 2022-11
    90/100 stars

    88
    Illumina Inc nextseq 500 sequencing platform
    Nextseq 500 Sequencing Platform, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 88/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/nextseq 500 sequencing platform/product/Illumina Inc
    Average 88 stars, based on 1 article review
    Price from $9.99 to $1999.99
    nextseq 500 sequencing platform - by Bioz Stars, 2022-11
    88/100 stars

    90
    Illumina Inc sequencing data from illumina
    Sequencing Data From Illumina, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/sequencing data from illumina/product/Illumina Inc
    Average 90 stars, based on 1 article review
    Price from $9.99 to $1999.99
    sequencing data from illumina - by Bioz Stars, 2022-11
    90/100 stars

    Image Search Results