Structured Review

Illumina Inc: Illumina NextSeq
Illumina NextSeq, supplied by Illumina Inc, is used in various techniques. Bioz Stars score: 99/100, based on 40 PubMed citations.
https://www.bioz.com/result/illumina nextseq/product/Illumina Inc
illumina nextseq - by Bioz Stars, 2022-09

Images

1) Product Images from "Model-driven generation of artificial yeast promoters"

Article Title: Model-driven generation of artificial yeast promoters

Journal: bioRxiv

doi: 10.1101/748616

FACS-seq experimental strategy and dataset overview. A: Schematic of tested libraries ( above ), indicating regions held constant in promoter design ( grey boxes ); schematic of two-color reporter device used to characterize promoter activity ( below ). “RAP1”, “GCR1”, “ZEV”: transcription factor binding sites; “TATA”: TATA box motif; “TSS”: transcription start site motif. B: Schematic of FACS-seq approach for high-throughput promoter activity characterization, in which next-generation sequencing (NGS)-derived histograms of sequence counts in FACS bins generated by sorting a library on promoter activity are used to derive promoter activity for each sequence in a library. C: Histogram of promoter activities (log10 ratio of mean GFP to mCherry intensity, in arbitrary units) in the final P GPD library. Only sequences for which at least 10 NextSeq reads were counted in each replicate were used in this analysis. D: Density scatter plot of induced and uninduced promoter activities measured in the final P ZEV library. Only sequences for which at least 20 NextSeq reads were counted in each replicate were used in this analysis.

Techniques Used: FACS, Activity Assay, Binding Assay, High Throughput Screening Assay, Next-Generation Sequencing, Derivative Assay, Sequencing, Generated
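Panel B derives each sequence's activity from its NGS read-count histogram across FACS bins, and panel C keeps only sequences with at least 10 NextSeq reads in each replicate. The Python sketch below illustrates that idea. It assumes each sort bin can be summarized by one representative log10(GFP/mCherry) value and that a sequence's activity is the read-weighted mean of those values. The bin values, the estimator, and all function names are hypothetical, so this should not be read as the authors' pipeline.

```python
from typing import Dict, List, Optional


def passes_read_filter(replicate_counts: List[List[int]], min_reads: int = 10) -> bool:
    """True if the sequence has at least `min_reads` reads in every replicate."""
    return all(sum(bins) >= min_reads for bins in replicate_counts)


def estimate_activity(bin_counts: List[int], bin_log_activity: List[float]) -> Optional[float]:
    """Read-weighted mean of per-bin log10 activity values (hypothetical estimator)."""
    total = sum(bin_counts)
    if total == 0:
        return None
    return sum(c * a for c, a in zip(bin_counts, bin_log_activity)) / total


def library_activities(
    library: Dict[str, List[List[int]]],
    bin_log_activity: List[float],
    min_reads: int = 10,
) -> Dict[str, float]:
    """Average the per-replicate estimates for each sequence that passes the read filter."""
    result: Dict[str, float] = {}
    for seq_id, replicates in library.items():
        if not passes_read_filter(replicates, min_reads):
            continue
        estimates = [estimate_activity(r, bin_log_activity) for r in replicates]
        if any(e is None for e in estimates):
            continue
        result[seq_id] = sum(estimates) / len(estimates)
    return result


# Hypothetical example: 4 sort bins, 2 replicates per sequence.
bins = [-1.0, -0.5, 0.0, 0.5]  # assumed log10(GFP/mCherry) value for each bin
lib = {
    "seqA": [[0, 2, 8, 0], [0, 4, 6, 0]],
    "seqB": [[1, 1, 1, 1], [3, 0, 0, 0]],  # fewer than 10 reads per replicate -> dropped
}
print(library_activities(lib, bins))
```

Averaging across replicates only after both pass the read filter mirrors the "in each replicate" wording of the legend. A production estimator would also need to correct for per-bin sequencing depth and for the fraction of cells sorted into each bin.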

Neural networks trained on P GPD and P ZEV data accurately predict promoter activity. Only sequences for which at least 10 NextSeq reads were counted in each replicate were used in analyses of P GPD data; only sequences for which at least 20 NextSeq reads were counted in each replicate were used in analyses of P ZEV data. A: Model loss curve for P GPD training; dashed line indicates epoch selected by early stopping for the final model. B: Predicted promoter activities versus FACS-seq measurements for held-out test data in the P GPD dataset. C: Model loss curve for P ZEV training; dashed line indicates epoch selected by early stopping for the final model. D: Predicted promoter activities in the uninduced condition versus FACS-seq measurements for held-out test data in the P ZEV dataset. E: Predicted promoter activities in the induced condition versus FACS-seq measurements for held-out test data in the P ZEV dataset. F: Predicted activation ratios (ratio of predicted induced and uninduced promoter activities) versus FACS-seq-derived activation ratios for held-out test data in the P ZEV dataset.

Techniques Used: Activity Assay, FACS, Activation Assay, Derivative Assay
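Panels A and C mark the epoch selected by early stopping, and panel F compares activation ratios, where each ratio is the predicted induced activity divided by the predicted uninduced activity. The sketch below shows both steps. It assumes a common patience-based early-stopping rule on validation loss and linear-scale activities. The patience value and function names are illustrative and are not taken from the paper. If activities are stored as log10 values, the ratio becomes a difference.

```python
from typing import List, Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 5) -> int:
    """Return the 0-based epoch with the lowest validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop scanning; later epochs are never considered
    return best_epoch


def activation_ratios(induced: Sequence[float], uninduced: Sequence[float]) -> List[float]:
    """Per-promoter activation ratio: induced activity / uninduced activity."""
    return [i / u for i, u in zip(induced, uninduced)]


losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.9, 0.5]  # hypothetical validation losses
print(early_stopping_epoch(losses, patience=3))
print(activation_ratios([4.0, 10.0], [2.0, 5.0]))
```

With `patience=3`, the scan stops after epoch 5 and never reaches the lower loss at epoch 6. This is the trade-off the dashed line in a loss curve reflects.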

2) Product Images from "Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants"

Article Title: Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants

Journal: bioRxiv

doi: 10.1101/2021.03.10.434828

Additional tiled primers improve read coverage and allow identification of minority variants: A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome, sequenced on an Illumina MiSeq, using either the original primers as in Fig 2 (v1, blue) or an additional 326 tiled primers (v3, pink). Tiled primers are indicated at the bottom of the plot by short blue (v1) or pink (v3) lines. B) The rates of mismatching nucleotides in mapped NGS reads across the SARS-CoV-2 genome for isolate WRECVA_000508, before trimming the tiled primers from forward/‘R1’ reads and without PCR deduplication. C) The same mismatch rates after data quality processing to remove PCR duplicates and primer-derived nucleotides from the reads, revealing 3 minority variants in this sample with frequencies > 2%.

Techniques Used: Next-Generation Sequencing, Polymerase Chain Reaction, Derivative Assay
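Panels B and C plot per-position mismatch rates, and panel C reports minority variants above 2%. The sketch below covers that final calling step. It assumes base counts per reference position have already been tallied from deduplicated, primer-trimmed alignments. The pileup layout, the upper frequency cap (meaning "below the consensus base"), the minimum depth, and the function names are assumptions. This is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def depth_at(counts: Dict[str, int]) -> int:
    """Total A/C/G/T reads at one position (Ns and deletions ignored)."""
    return sum(counts.get(b, 0) for b in BASES)


def mismatch_rate(ref_base: str, counts: Dict[str, int]) -> float:
    """Fraction of reads at one position whose base differs from the reference."""
    depth = depth_at(counts)
    if depth == 0:
        return 0.0
    return (depth - counts.get(ref_base, 0)) / depth


def minority_variants(
    reference: str,
    pileup: List[Dict[str, int]],
    min_freq: float = 0.02,  # the > 2% threshold from panel C
    max_freq: float = 0.5,   # assumption: "minority" means below the consensus base
    min_depth: int = 100,    # assumption: ignore poorly covered positions
) -> List[Tuple[int, str, str, float]]:
    """Return (1-based position, ref base, alt base, frequency) for each minority variant."""
    calls = []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup), start=1):
        depth = depth_at(counts)
        if depth < min_depth:
            continue
        for alt in BASES:
            if alt == ref_base:
                continue
            freq = counts.get(alt, 0) / depth
            if min_freq < freq < max_freq:
                calls.append((pos, ref_base, alt, freq))
    return calls


# Hypothetical pileup for a 3-nt reference after deduplication and primer trimming.
reference = "ACG"
pileup = [{"A": 990, "G": 10}, {"C": 950, "T": 50}, {"G": 1000}]
print([round(mismatch_rate(r, c), 3) for r, c in zip(reference, pileup)])
print(minority_variants(reference, pileup))
```

The comparison between panels B and C shows why the deduplication and primer-trimming steps come first. Primer-derived bases always match the primer sequence, and PCR duplicates inflate the apparent support for early errors, so either artifact would bias these rates.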

Schematic of Tiled-ClickSeq and computational pipeline: A) Schematic of the SARS-CoV-2 genome with two examples of sub-genomic mRNAs. B) Paired-primer approaches typically generate short amplicons flanked by upstream and downstream primers that are PCR amplified in non-overlapping pools. C) Tiled-ClickSeq uses a single pool of primers at the reverse-transcription step, with the upstream site generated by stochastic termination with azido-nucleotides. D) 3’-azido-blocked single-stranded cDNA fragments are ‘click-ligated’ to hexynyl-functionalized Illumina i5 sequencing adaptors using copper-catalyzed azide-alkyne cycloaddition (CuAAC). Triazole-linked ssDNA is PCR amplified to generate a final cDNA library. E) Structure of the final cDNA, indicating the i5 and i7 adaptors, the 12N unique molecular identifier (UMI), the expected location of the triazole linkage, and the origins of the cDNA in the reads, including the tiled-primer-derived DNA, which is captured using paired-end sequencing. F) Hypothetical read coverage over a viral genome (red), yielding overlapping ‘saw-tooth’ patterns of sequencing coverage. Longer fragments with more extensive overlap can be obtained using lower AzNTP:dNTP ratios. G) Final cDNA libraries are analyzed and size-selected by gel electrophoresis (2% agarose gel). Duplicate libraries synthesized from 8, 80 and 800 ng of input SARS-CoV-2 RNA are shown. H) Flowchart of the data processing and bioinformatic pipeline. Input data are in blue, output data in green, and scripts/processes in purple.

Techniques Used: Polymerase Chain Reaction, Amplification, Generated, Sequencing, cDNA Library Assay, Derivative Assay, Nucleic Acid Electrophoresis, Agarose Gel Electrophoresis, Synthesized
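Panel F states that lowering the AzNTP:dNTP ratio yields longer fragments. The Python sketch below gives a back-of-the-envelope version of that relationship. It assumes that each nucleotide addition incorporates an azido-terminator with probability p = r/(1 + r), where r is the AzNTP:dNTP ratio. This amounts to assuming equal incorporation efficiency for both nucleotide types, which real reverse transcriptases do not guarantee. Under this assumption, fragment length follows a geometric distribution with mean 1/p = 1 + 1/r. The ratios shown are illustrative, not the protocol's values.

```python
import random
from statistics import mean


def termination_probability(az_to_dntp_ratio: float) -> float:
    """Chance that a given addition is an azido-terminator, assuming AzNTPs and dNTPs
    are incorporated with equal efficiency (a simplifying assumption)."""
    return az_to_dntp_ratio / (1.0 + az_to_dntp_ratio)


def expected_fragment_length(az_to_dntp_ratio: float) -> float:
    """Mean of the geometric length distribution, terminating nucleotide included: 1/p."""
    return 1.0 / termination_probability(az_to_dntp_ratio)


def simulate_fragment_length(az_to_dntp_ratio: float, rng: random.Random, max_len: int = 30000) -> int:
    """Extend one cDNA a nucleotide at a time until a terminator is added (capped at max_len)."""
    p = termination_probability(az_to_dntp_ratio)
    for length in range(1, max_len + 1):
        if rng.random() < p:
            return length
    return max_len


rng = random.Random(0)
for ratio in (1 / 35, 1 / 100):  # illustrative ratios, not the protocol's values
    simulated = mean(simulate_fragment_length(ratio, rng) for _ in range(2000))
    print(f"AzNTP:dNTP = 1:{round(1 / ratio)}  expected {expected_fragment_length(ratio):.1f} nt  "
          f"simulated {simulated:.1f} nt")
```

Even under this simplified model, the mean fragment length scales roughly as the inverse of the AzNTP:dNTP ratio. This matches the qualitative statement in panel F.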

Read coverage over the SARS-CoV-2 genome using Tiled-ClickSeq: A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome when sequencing on an Illumina MiSeq (orange) or on an Oxford Nanopore Technologies MinION device (blue). A ‘saw-tooth’ pattern of coverage is observed, with ‘teeth’ upstream of the tiled primers, which are indicated at the bottom of the plot by short black lines. B) Zoomed-in read coverage of nts 1–2400 of the SARS-CoV-2 genome, with Illumina MiSeq coverage from five individual primers coloured to illustrate how downstream amplicons overlap the primer-binding sites of upstream tiled primers (Blue: coverage from primer 1; Orange: primer 2; Green: primer 3; Red: primer 4; Purple: primer 5).

Techniques Used: Sequencing, Binding Assay
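The ‘saw-tooth’ shape in panel A comes from the tiled primers. Each one primes cDNA that extends upstream for a random length, so coverage peaks just upstream of each primer and decays with distance from it. The toy Python model below reproduces that shape. It assumes geometric fragment lengths and evenly spaced primers on a short hypothetical genome. The primer positions, mean length, and read counts are invented for illustration.

```python
import random
from typing import List


def sawtooth_coverage(
    genome_length: int,
    primer_positions: List[int],
    reads_per_primer: int,
    mean_length: float,
    seed: int = 0,
) -> List[int]:
    """Toy coverage model: each read starts at a primer and extends toward position 0
    for a geometrically distributed number of nucleotides (hypothetical model)."""
    rng = random.Random(seed)
    p = 1.0 / mean_length
    coverage = [0] * genome_length
    for primer in primer_positions:
        for _ in range(reads_per_primer):
            length = 1
            while rng.random() >= p:  # keep extending until termination
                length += 1
            start = max(0, primer - length)
            for pos in range(start, primer):
                coverage[pos] += 1
    return coverage


# Hypothetical 2.4 kb region with a primer every 400 nt.
cov = sawtooth_coverage(2400, [400, 800, 1200, 1600, 2000, 2400], reads_per_primer=200, mean_length=150)
for pos in (100, 250, 399, 500, 799):
    print(pos, cov[pos])
```

Positions immediately upstream of a primer (such as 399 or 799) are covered by every read from that primer. Positions farther upstream are reached only by the longer fragments, which produces the falling edge of each ‘tooth’.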

3) Product Images from "fastp: an ultra-fast all-in-one FASTQ preprocessor"

Article Title: fastp: an ultra-fast all-in-one FASTQ preprocessor

Journal: Bioinformatics

doi: 10.1093/bioinformatics/bty560

The base content ratio curves generated by fastp for one Illumina NextSeq FASTQ file, (a) before and (b) after fastp preprocessing. In (a), the G curve is abnormal and the G and C curves are separated; in (b), the G/C separation is eliminated.

Techniques Used: Generated
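NextSeq uses two-colour chemistry, in which a cycle with no signal is called as G. This is one known source of spurious poly-G read tails and of an inflated G curve at late cycles. fastp can trim such tails itself through its `--trim_poly_g` option. The Python sketch below only illustrates the underlying ideas: a per-cycle base-content curve, like the one plotted in (a) and (b), and a naive trailing-G trimmer. The minimum run length of 10 is an assumption, and this is not fastp's actual algorithm, which tolerates mismatches within the tail.

```python
from typing import Dict, List


def base_content_by_cycle(reads: List[str]) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T/N at each sequencing cycle across a set of reads."""
    max_len = max((len(r) for r in reads), default=0)
    curves = []
    for cycle in range(max_len):
        bases = [r[cycle] for r in reads if len(r) > cycle]
        curves.append({b: bases.count(b) / len(bases) for b in "ACGTN"})
    return curves


def trim_poly_g(read: str, min_len: int = 10) -> str:
    """Remove a trailing run of G if it is at least `min_len` bases long (exact matches only)."""
    stripped = read.rstrip("G")
    if len(read) - len(stripped) >= min_len:
        return stripped
    return read


# Hypothetical reads: two carry a 12-nt poly-G tail, one does not.
reads = ["ACGTACGT" + "G" * 12, "TTGCAACG" + "G" * 12, "ACGTTGCAACGTTGCAACGT"]
before = base_content_by_cycle(reads)
after = base_content_by_cycle([trim_poly_g(r) for r in reads])
print("G fraction at last cycle before/after trimming:", before[-1]["G"], after[-1]["G"])
```

In this toy set, the last cycle is dominated by G before trimming. After trimming, only the untrimmed 20-nt read still reaches that cycle, so the G curve no longer reflects the tail artifact.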

4) Product Images from "Highly Multiplexed Single-Cell Full-Length cDNA Sequencing of human immune cells with 10X Genomics and R2C2"

Article Title: Highly Multiplexed Single-Cell Full-Length cDNA Sequencing of human immune cells with 10X Genomics and R2C2

Journal: bioRxiv

doi: 10.1101/2020.01.10.902361

R2C2 and Illumina datasets independently cluster into B cells, T cells, and Monocytes. Gene expression profiles were determined independently for each cell in the R2C2 and Illumina datasets. The Seurat package was then used to cluster cells based on their gene expression profiles. The cells in the R2C2 (A) and Illumina (B) datasets both clustered into 3 groups which, based on marker gene expression (C and D), could be identified as B cells, T cells, and Monocytes. The color gradient (C and D) encodes ln(fold change), where the fold change compares that cluster’s expression to the rest of the data. Data for replicate 1 are shown.

Techniques Used: Expressing, Marker
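The colour gradient in panels C and D encodes ln(fold change), which compares a cluster's expression with that of the rest of the data. The sketch below computes that quantity for a single gene. It assumes the fold change is the ratio of mean expression inside the cluster to mean expression in all other cells, with a pseudocount of 1 to avoid division by zero. Seurat's own fold-change calculation differs in detail, so treat this as an illustration only. The gene values below are made up.

```python
import math
from typing import Dict, Sequence


def ln_fold_change(
    expression: Sequence[float],
    labels: Sequence[str],
    cluster: str,
    pseudocount: float = 1.0,
) -> float:
    """ln((mean in cluster + pseudocount) / (mean in other cells + pseudocount)) for one gene."""
    inside = [x for x, lab in zip(expression, labels) if lab == cluster]
    outside = [x for x, lab in zip(expression, labels) if lab != cluster]
    if not inside or not outside:
        raise ValueError("cluster must contain some, but not all, cells")
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return math.log((mean_in + pseudocount) / (mean_out + pseudocount))


def marker_table(genes: Dict[str, Sequence[float]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """ln fold change of every gene in every cluster, i.e. one heat-map cell per entry."""
    return {
        gene: {c: ln_fold_change(values, labels, c) for c in sorted(set(labels))}
        for gene, values in genes.items()
    }


# Hypothetical counts for 6 cells in 3 clusters.
labels = ["B", "B", "T", "T", "Mono", "Mono"]
genes = {"CD79A": [9, 7, 0, 0, 0, 0], "CD3E": [0, 0, 8, 6, 0, 0]}
for gene, row in marker_table(genes, labels).items():
    print(gene, {c: round(v, 2) for c, v in row.items()})
```

A positive value marks a gene enriched in that cluster, which is how marker genes allow the clusters to be labelled as B cells, T cells, or Monocytes.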

R2C2 reads sequence 10X full-length cDNA transcripts. Genome Browser shots of ACTB. Genome annotation is shown on top, and Illumina reads (center) and R2C2 reads (bottom) aligning to the locus are shown below. Both Illumina and R2C2 read alignments were randomly subsampled to 60 reads. The directionality of features is indicated by color (“top strand” = blue, “bottom strand” = yellow). Data for replicate 1 are shown.

Techniques Used: Sequencing

Data Generation and Characteristics. A) Thousands of peripheral blood mononuclear cells (PBMCs) were processed using the 10X Genomics Chromium Single Cell 3’ Gene Expression Solution. The resulting full-length cDNA was either fragmented for Illumina sequencing or processed using the R2C2 workflow. B) After read processing and demultiplexing, the unique molecular identifiers (UMIs) associated with each cellular index (cell) in R2C2 (top) and Illumina (center) datasets are shown as histograms. Cells are ranked by the number of UMIs and colored based on their rank in the R2C2 dataset. Red lines indicate cellular identifiers found in Illumina but not R2C2 data. At the bottom, the UMIs shared between cellular identifiers in Illumina and R2C2 datasets or unique to each dataset are shown as stacked histograms. Cells are ranked by the number of shared UMIs. Data for replicate 1 are shown.

Techniques Used: Expressing, Sequencing
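Panel B ranks cells by UMI count. It also splits each cell's UMIs into those shared between the R2C2 and Illumina datasets and those unique to one dataset, and it flags cellular barcodes found only in Illumina. The sketch below reproduces that bookkeeping. It assumes each dataset has already been reduced to a mapping from cellular barcode to its set of UMI sequences. The data layout and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def shared_umi_table(
    r2c2: Dict[str, Set[str]],
    illumina: Dict[str, Set[str]],
) -> List[Tuple[str, int, int, int]]:
    """Per cell barcode: (barcode, shared UMIs, R2C2-only UMIs, Illumina-only UMIs),
    sorted by the number of shared UMIs (highest first, ties by barcode)."""
    rows = []
    for barcode in set(r2c2) | set(illumina):
        a = r2c2.get(barcode, set())
        b = illumina.get(barcode, set())
        rows.append((barcode, len(a & b), len(a - b), len(b - a)))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def illumina_only_barcodes(r2c2: Dict[str, Set[str]], illumina: Dict[str, Set[str]]) -> Set[str]:
    """Cellular barcodes present in the Illumina data but absent from the R2C2 data."""
    return set(illumina) - set(r2c2)


# Hypothetical barcodes and UMIs.
r2c2 = {"AAAC": {"u1", "u2", "u3"}, "GGGT": {"u9"}}
illumina = {"AAAC": {"u1", "u2", "u4"}, "TTTA": {"u7"}}
for row in shared_umi_table(r2c2, illumina):
    print(row)
print(illumina_only_barcodes(r2c2, illumina))
```

Using sets makes each shared/unique split a single intersection or difference per cell. This assumes UMIs have already been error-corrected, so that identical molecules carry identical UMI strings in both datasets.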

Swarm plots of gene expression correlation between R2C2 and Illumina. The median Pearson correlation for each swarm is shown in red. From left to right: (All cell types, same) cells were matched based on their cellular barcode from R2C2 and Illumina. (All cell types, different) R2C2 cells were correlated to a random cell in the Illumina data. The next three swarms were subsampled to 85 points because there are 89 B-Cells. (T-Cells, same) Random T-Cells were correlated between R2C2 and Illumina data. (T-Cells, different) Random R2C2 T-Cells were correlated with random Illumina non-T-Cells. (B-Cells, same) Random B-Cells were correlated between R2C2 and Illumina. (B-Cells, different) Random R2C2 B-Cells were correlated with random Illumina non-B-Cells. (Monocytes, same) Random Monocytes were correlated between R2C2 and Illumina. (Monocytes, different) Random R2C2 Monocytes were correlated with random Illumina non-Monocytes.

Techniques Used: Expressing
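The first two swarms compare two sets of Pearson correlations. The first set pairs each R2C2 cell with the Illumina cell that has the same barcode. The second pairs each R2C2 cell with a randomly chosen different Illumina cell. Each swarm is summarized by its median. The sketch below implements that "same" versus "different" comparison. It assumes both datasets are dictionaries that map a cellular barcode to an expression vector over a shared gene order, and that no vector is constant. The random-partner scheme and the function names are illustrative choices, not necessarily the authors'.

```python
import math
import random
from statistics import median
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def matched_vs_random(
    r2c2: Dict[str, List[float]],
    illumina: Dict[str, List[float]],
    seed: int = 0,
) -> Tuple[float, float]:
    """Median correlation for same-barcode pairs and for pairs with a random other barcode."""
    barcodes = sorted(set(r2c2) & set(illumina))
    if len(barcodes) < 2:
        raise ValueError("need at least two shared barcodes")
    rng = random.Random(seed)
    same = [pearson(r2c2[b], illumina[b]) for b in barcodes]
    different = []
    for b in barcodes:
        other = rng.choice([c for c in barcodes if c != b])
        different.append(pearson(r2c2[b], illumina[other]))
    return median(same), median(different)


# Hypothetical 4-gene profiles; the Illumina profiles are the R2C2 ones scaled by 2.
r2c2 = {"c1": [1.0, 2.0, 3.0, 4.0], "c2": [4.0, 3.0, 2.0, 1.0], "c3": [1.0, 3.0, 2.0, 4.0]}
illumina = {k: [2 * v for v in vec] for k, vec in r2c2.items()}
print(matched_vs_random(r2c2, illumina))
```

The cell-type-specific swarms follow the same pattern, except that the random partner is drawn from cells of a different cell type.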

t-SNE plots with additional marker genes for replicates 1 and 2. As for Figure 2, plots are based on gene expression data calculated by featureCounts and Seurat. Plots for replicate 1 and replicate 2 are shown on the left and right, respectively. Top left: replicate 1 cell type clusters for R2C2 and Illumina. Bottom left: replicate 1 expression heat maps for various marker genes, where the two columns on the left are for R2C2 and the two on the right are for Illumina. Top right: replicate 2 cell type clusters for R2C2 and Illumina. Bottom right: replicate 2 expression heat maps for various marker genes, where the two columns on the left are for R2C2 and the two on the right are for Illumina. Additional marker genes taken from [14]. The color gradient encodes ln(fold change), where the fold change compares that cluster’s expression to the rest of the data.

Techniques Used: Marker, Expressing

5) Product Images from "Classifying cells with Scasat, a single-cell ATAC-seq analysis tool"

Article Title: Classifying cells with Scasat, a single-cell ATAC-seq analysis tool

Journal: Nucleic Acids Research

doi: 10.1093/nar/gky950

Experimental design for disentangling different cell types from a complex mixture. Three cell types (HET1A, OE19 and OE33) are mixed in equal proportions. Samples are then taken from this mixture into two independent batches (Batch-1 and Batch-2) by running them on two C1 integrated fluidic circuits (IFCs). The captured cells are then sequenced on a NextSeq.

Techniques Used:

6) Product Images from "Long-read sequencing reveals the structural complexity of genomic integration of HBV DNA in hepatocellular carcinoma"

Article Title: Long-read sequencing reveals the structural complexity of genomic integration of HBV DNA in hepatocellular carcinoma

Journal: NPJ Genomic Medicine

doi: 10.1038/s41525-021-00245-1

Locations of representative HBV-human chimeric reads in the HBV and human genomes. (a, c, e, g) HBV: 1900, chr5: 1295069, detected on the Nanopore, PacBio, Illumina (DNA) and Illumina (mRNA) platforms. (b, d, f, h) HBV: 2373, chr19: 2983018, detected on the Nanopore, PacBio, Illumina (DNA) and Illumina (mRNA) platforms.

Techniques Used:

7) Product Images from "Single-cell isoform analysis in human immune cells"

Article Title: Single-cell isoform analysis in human immune cells

Journal: Genome Biology

doi: 10.1186/s13059-022-02615-z

Data generation and characteristics. A Thousands of peripheral blood mononuclear cells (PBMCs) were processed using the 10X Genomics Chromium Single Cell 3′ Gene Expression Solution. The resulting full-length cDNA was either fragmented for Illumina sequencing or processed using the R2C2 workflow. B After read processing and demultiplexing, the unique molecular identifiers (UMIs) associated with each cellular index (cell) in R2C2 (top) and Illumina (center) datasets are shown as histograms. Cells are ranked by the number of UMIs and colored based on their rank in the R2C2 dataset. Red lines indicate cellular identifiers found in Illumina but not R2C2 data. At the bottom, the UMIs shared between cellular identifiers in Illumina and R2C2 datasets or unique to each dataset are shown as stacked histograms. Cells are ranked by the number of shared UMIs. Data for replicate 1 are shown.

Techniques Used: Expressing, Sequencing

R2C2 and Illumina datasets independently cluster into B cells, T cells, and monocytes. Gene expression profiles were determined independently for each cell in the R2C2 and Illumina datasets. The Seurat package was then used to cluster cells based on their gene expression profiles. The cells in the R2C2 (A) and Illumina (B) datasets both clustered into 3 groups which, based on marker gene expression (C and D), could be identified as B cells, T cells, and monocytes. The color gradient (C and D) encodes ln(fold change), where the fold change compares that cluster’s expression to the rest of the data. Data for replicate 1 are shown.

Techniques Used: Expressing, Marker

8) Product Images from "Highly Multiplexed Single-Cell Full-Length cDNA Sequencing of human immune cells with 10X Genomics and R2C2"

Article Title: Highly Multiplexed Single-Cell Full-Length cDNA Sequencing of human immune cells with 10X Genomics and R2C2

Journal: bioRxiv

doi: 10.1101/2020.01.10.902361

R2C2 and Illumina datasets independently cluster into B cells, T cells, and Monocytes. Gene expression profiles were determined independently for each cell in R2C2 and Illumina datasets. The Seurat package was then used to cluster cells based on the gene expression profiles. The cells in R2C2 (A) and Illumina (B) datasets both clustered into 3 groups which, based on marker gene expression (C and D) could be identified as B cells, T cells, and Monocytes. The color gradient (C and D) encodes ln(fold change), where the fold change is comparing that cluster’s expression to the rest of the data. Data for replicate 1 are shown.
Figure Legend Snippet: R2C2 and Illumina datasets independently cluster into B cells, T cells, and Monocytes. Gene expression profiles were determined independently for each cell in R2C2 and Illumina datasets. The Seurat package was then used to cluster cells based on the gene expression profiles. The cells in R2C2 (A) and Illumina (B) datasets both clustered into 3 groups which, based on marker gene expression (C and D) could be identified as B cells, T cells, and Monocytes. The color gradient (C and D) encodes ln(fold change), where the fold change is comparing that cluster’s expression to the rest of the data. Data for replicate 1 are shown.

Techniques Used: Expressing, Marker

R2C2 reads sequence 10X full-length cDNA transcripts. Genome Browser shots of ACTB. Genome annotation is shown on top and Illumina reads (center) R2C2 reads (bottom) aligning to the locus are shown below. Both Illumina and R2C2 read alignments were randomly subsampled to 60 reads. The directionality of features is indicated by color (“top strand”=blue, “bottom strand”=yellow). Data for replicate 1 are shown.
Figure Legend Snippet: R2C2 reads sequence 10X full-length cDNA transcripts. Genome Browser shots of ACTB. Genome annotation is shown on top and Illumina reads (center) R2C2 reads (bottom) aligning to the locus are shown below. Both Illumina and R2C2 read alignments were randomly subsampled to 60 reads. The directionality of features is indicated by color (“top strand”=blue, “bottom strand”=yellow). Data for replicate 1 are shown.

Techniques Used: Sequencing

Data Generation and Characteristics. A) Thousands of peripheral blood mononuclear cells (PBMCs) were processed using the 10X Genomics Chromium Single Cell 3’ Gene Expression Solution. The resulting full-length cDNA was either fragmented for Illumina sequencing or processed using the R2C2 workflow. B) After read processing and demultiplexing, the unique molecular identifiers (UMIs) associated with each cellular index (cell) in R2C2 (top) and Illumina (center) datasets are shown as histograms. Cells are ranked by the number of UMIs and colored based on their rank in the R2C2 dataset. Red lines indicate cellular identifiers found in Illumina but not R2C2 data. At the bottom, the UMIs shared between cellular identifiers in Illumina and R2C2 datasets or unique to each dataset are shown as stacked histograms. Cells are ranked by the number of shared UMIs. Data for replicate 1 are shown.
Figure Legend Snippet: Data Generation and Characteristics. A) Thousands of peripheral blood mononuclear cells (PBMCs) were processed using the 10X Genomics Chromium Single Cell 3’ Gene Expression Solution. The resulting full-length cDNA was either fragmented for Illumina sequencing or processed using the R2C2 workflow. B) After read processing and demultiplexing, the unique molecular identifiers (UMIs) associated with each cellular index (cell) in R2C2 (top) and Illumina (center) datasets are shown as histograms. Cells are ranked by the number of UMIs and colored based on their rank in the R2C2 dataset. Red lines indicate cellular identifiers found in Illumina but not R2C2 data. At the bottom, the UMIs shared between cellular identifiers in Illumina and R2C2 datasets or unique to each dataset are shown as stacked histograms. Cells are ranked by the number of shared UMIs. Data for replicate 1 are shown.

Techniques Used: Expressing, Sequencing
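Panel B of the legend above counts, per cell barcode, the UMIs shared between the two datasets and those unique to each. A minimal Python sketch of that bookkeeping follows, assuming reads have already been demultiplexed into `(cell_barcode, umi)` pairs; the function names are hypothetical and this is not the authors' pipeline.

```python
from collections import defaultdict


def umis_per_cell(records):
    """Map each cell barcode to the set of UMIs observed for it.

    records: iterable of (cell_barcode, umi) pairs, one per demultiplexed read.
    """
    cells = defaultdict(set)
    for cell, umi in records:
        cells[cell].add(umi)
    return cells


def compare_cells(r2c2, illumina):
    """Count shared and dataset-specific UMIs for every barcode seen in either set.

    Returns (barcode, shared, r2c2_only, illumina_only) rows ranked by the
    number of shared UMIs (highest first, ties broken by barcode).
    """
    rows = []
    for cell in set(r2c2) | set(illumina):
        a = r2c2.get(cell, set())
        b = illumina.get(cell, set())
        rows.append((cell, len(a & b), len(a - b), len(b - a)))
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


if __name__ == "__main__":
    r2c2 = umis_per_cell([("AAA", "u1"), ("AAA", "u2"), ("CCC", "u3")])
    ill = umis_per_cell([("AAA", "u1"), ("AAA", "u4"), ("GGG", "u5")])
    for row in compare_cells(r2c2, ill):
        print(row)
```

A barcode present in only one dataset (like `GGG` in the toy input) gets zero shared UMIs, which corresponds to the identifiers the legend marks as found in Illumina but not R2C2 data.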

Swarm plots of gene expression correlation between R2C2 and Illumina. The median Pearson correlation for each swarm is shown in red. From left to right: (All cell types, same) cells were matched based on their cellular barcodes in the R2C2 and Illumina data. (All cell types, different) R2C2 cells were correlated with a random cell in the Illumina data. The next three swarms were subsampled to 85 points because there are 89 B-Cells. (T-Cells, same) Random T-Cells were correlated between the R2C2 and Illumina data. (T-Cells, different) Random R2C2 T-Cells were correlated with random Illumina non-T-Cells. (B-Cells, same) Random B-Cells were correlated between R2C2 and Illumina. (B-Cells, different) Random R2C2 B-Cells were correlated with random Illumina non-B-Cells. (Monocytes, same) Random Monocytes were correlated between R2C2 and Illumina. (Monocytes, different) Random R2C2 Monocytes were correlated with random Illumina non-Monocytes.

Techniques Used: Expressing
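The first two swarms in the legend above contrast barcode-matched cell pairs with randomly paired cells. A minimal Python sketch of that comparison follows, assuming each cell is already summarized as an expression vector over the same ordered gene list; the function names and toy data are hypothetical, and the cell-type-restricted swarms would additionally filter the candidate pairs by cluster label.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def matched_vs_random(r2c2, illumina, seed=0):
    """Median correlation for barcode-matched cells vs. randomly paired cells.

    r2c2, illumina: dicts mapping cell barcode -> expression vector over the
    same ordered gene list. At least two shared barcodes are required.
    """
    shared = sorted(set(r2c2) & set(illumina))
    rng = random.Random(seed)
    same = [pearson(r2c2[c], illumina[c]) for c in shared]
    different = []
    for c in shared:
        # Pair each R2C2 cell with a random Illumina cell other than itself.
        other = rng.choice([d for d in shared if d != c])
        different.append(pearson(r2c2[c], illumina[other]))
    return statistics.median(same), statistics.median(different)


if __name__ == "__main__":
    cells = {"c1": [1, 2, 3, 4], "c2": [4, 3, 2, 1], "c3": [1, 3, 2, 4]}
    same_med, diff_med = matched_vs_random(cells, dict(cells))
    print(round(same_med, 6), diff_med < same_med)
```

If matched pairs correlate more strongly than random pairs, the "same" medians sit above the "different" ones, which is the contrast the swarm plots display.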

t-SNE plots with additional marker genes for replicates 1 and 2. As in Figure 2, plots are based on gene expression data as calculated by featureCounts and Seurat. Plots for replicate 1 and replicate 2 are shown on the left and right, respectively. Top left: replicate 1 cell type clusters for R2C2 and Illumina. Bottom left: replicate 1 expression heat maps for various marker genes, where the two left columns are for R2C2 and the two right columns are for Illumina. Top right: replicate 2 cell type clusters for R2C2 and Illumina. Bottom right: replicate 2 expression heat maps for various marker genes, where the two left columns are for R2C2 and the two right columns are for Illumina. Additional marker genes were taken from [14]. The color gradient encodes ln(fold change), where the fold change compares that cluster’s expression with the rest of the data.

Techniques Used: Marker, Expressing
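The heat maps in the legend above encode ln(fold change) of a cluster's expression relative to the rest of the data. A simplified Python sketch of that quantity follows, comparing mean expression inside and outside one cluster; the pseudocount and the use of plain arithmetic means are assumptions, and Seurat's own fold-change calculation may differ in detail.

```python
import math
import statistics


def ln_fold_change(expr, labels, gene, cluster, pseudocount=1.0):
    """Natural-log fold change of one gene: cluster mean vs. all other cells.

    expr: list of per-cell dicts {gene: normalized expression};
    labels: cluster label for each cell, in the same order.
    Both groups must be non-empty. The pseudocount (which avoids taking the
    log of zero) is an illustrative assumption.
    """
    inside = [cell.get(gene, 0.0) for cell, lab in zip(expr, labels) if lab == cluster]
    outside = [cell.get(gene, 0.0) for cell, lab in zip(expr, labels) if lab != cluster]
    return math.log((statistics.fmean(inside) + pseudocount)
                    / (statistics.fmean(outside) + pseudocount))


if __name__ == "__main__":
    # Hypothetical toy data: a T-cell marker detected only in the "T" cluster.
    expr = [{"CD3E": 9}, {"CD3E": 9}, {"CD3E": 0}, {}]
    labels = ["T", "T", "B", "Mono"]
    print(ln_fold_change(expr, labels, "CD3E", "T"))
```

Positive values mean the gene is higher in the cluster than elsewhere; a gene absent from a cell's dict is treated as zero expression.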

9) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
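The legend above reports, for each parameter, the number of models containing it and its relative variable importance (RVI). A common definition in multimodel inference sums the Akaike weights of all models that contain the parameter; the sketch below assumes that definition. Whether the paper uses exactly this form, and how it normalizes RVI, is not stated here, and the parameter names in the example are hypothetical.

```python
import math


def akaike_weights(aics):
    """Akaike weights w_m = exp(-delta_m / 2) / sum_j exp(-delta_j / 2)."""
    best = min(aics)
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]


def relative_variable_importance(models):
    """Sum of Akaike weights over the models that contain each parameter.

    models: non-empty list of (set of parameter names, AIC) pairs.
    Returns ({parameter: RVI}, {parameter: number of models containing it}).
    """
    weights = akaike_weights([aic for _, aic in models])
    rvi, counts = {}, {}
    for (params, _), w in zip(models, weights):
        for p in params:
            rvi[p] = rvi.get(p, 0.0) + w
            counts[p] = counts.get(p, 0) + 1
    return rvi, counts


if __name__ == "__main__":
    # Hypothetical candidate models with equal AIC.
    models = [({"depth"}, 100.0), ({"depth", "strand_bias"}, 100.0)]
    print(relative_variable_importance(models))
```

With weights summing to one, a parameter present in every candidate model reaches the maximum RVI of 1.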

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
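Several supporting figures listed above summarize, per gene, the median coverage across samples and the number of samples with 0x coverage. A minimal Python sketch of those two summaries follows, assuming a per-sample coverage value for each gene has already been computed upstream (for example, mean depth over the gene's bases in the intersecting target region); the gene names in the example are hypothetical.

```python
import statistics


def coverage_summary(coverage):
    """Per gene: median coverage across samples and number of samples at 0x.

    coverage: dict gene -> list of per-sample coverage values for that gene.
    """
    return {
        gene: (statistics.median(values), sum(1 for v in values if v == 0))
        for gene, values in coverage.items()
    }


if __name__ == "__main__":
    print(coverage_summary({"TP53": [0, 10, 20], "KRAS": [5, 0, 0]}))
```

Running the same summary separately on the 454, Ion Torrent and Illumina data would yield the three series that the figures plot in black, red and green.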

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
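The last item in the legend above relates a predicted probability $\hat{p}_i$ to the actual probability $p_i$ of an SNV being a true positive. One common way to estimate the actual probability is the observed fraction of validated true positives among calls that fall into the same bin of predicted probability; the sketch below assumes that equal-width binning scheme, which is not necessarily how the paper computes $p_i$.

```python
def calibration_table(pred, truth, n_bins=10):
    """Compare predicted probabilities with observed true-positive fractions.

    pred: predicted probabilities in [0, 1]; truth: 1 for a validated true
    positive, 0 otherwise. Returns (mean predicted p, observed fraction,
    number of calls) for every non-empty equal-width bin, lowest bin first.
    """
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(pred, truth):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, t))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(t for _, t in b) / len(b)
            rows.append((mean_pred, observed, len(b)))
    return rows


if __name__ == "__main__":
    for row in calibration_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 0], n_bins=2):
        print(row)
```

For a well-calibrated model, the mean predicted probability and the observed fraction in each row should be close.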

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

10) Product Images from "mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species"

Article Title: mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species

Journal: bioRxiv

doi: 10.1101/329045

Workflow to create the plasmid models for Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli. A) For E. faecium, 60 Illumina-sequenced strains were selected for ONT sequencing, and Unicycler was used to extend the number of complete genomes available for this species. For E. coli and K. pneumoniae, we downloaded complete genomes with their associated plasmids from the NCBI Assembly (Entrez) database. B) For E. coli and K. pneumoniae, we simulated reads at 50X coverage with no sequencing errors using wgsim. C) Simulated and non-simulated Illumina reads were assembled de novo using SPAdes. D) We mapped short-read contigs against the complete genome sequences to build a reliable dataset of short-read contigs labeled as plasmid- or chromosome-derived. E) For each bacterial species, five machine-learning classifiers were trained (10-fold cross-validation) and compared using species-specific training and test sets. F) SVM models were implemented in mlplasmids and used to predict plasmid- and chromosome-derived sequences in isolates for which only short-read WGS data are available. The complete workflow is available from https://gitlab.com/sirarredondo/analysis_mlplasmids

Techniques Used: Plasmid Preparation, Sequencing, Derivative Assay
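Step E trains and compares classifiers with 10-fold cross-validation. As a minimal sketch of that evaluation scheme only (not the mlplasmids code, which the legend says uses SVM models), the Python below deals contigs into k disjoint folds and averages the held-out accuracy. The single feature value per contig and the nearest-centroid classifier are hypothetical stand-ins for the real features and models.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid(train_x, train_y, test_x):
    """Toy classifier: give each test value the label of the closest class mean."""
    centroids = {}
    for label in set(train_y):
        vals = [x for x, t in zip(train_x, train_y) if t == label]
        centroids[label] = sum(vals) / len(vals)
    return [min(centroids, key=lambda lab: abs(x - centroids[lab])) for x in test_x]


def cross_validate(xs, ys, classify, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return the mean accuracy."""
    accuracies = []
    for test in k_fold_indices(len(ys), k, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        preds = classify([xs[i] for i in train], [ys[i] for i in train],
                         [xs[i] for i in test])
        correct = sum(p == ys[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical contigs with one feature each, labeled by mapping as in step D.
xs = [0.70 + 0.01 * i for i in range(20)] + [0.10 + 0.01 * i for i in range(20)]
ys = ["plasmid"] * 20 + ["chromosome"] * 20
mean_acc = cross_validate(xs, ys, nearest_centroid, k=10)
```

Because every contig is held out exactly once, each prediction is made by a model that never saw that contig during training.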

11) Product Images from "Single-cell isoform analysis in human immune cells"

Article Title: Single-cell isoform analysis in human immune cells

Journal: Genome Biology

doi: 10.1186/s13059-022-02615-z

Data generation and characteristics. A Thousands of peripheral blood mononuclear cells (PBMCs) were processed using the 10X Genomics Chromium Single Cell 3′ Gene Expression Solution. The resulting full-length cDNA was either fragmented for Illumina sequencing or processed using the R2C2 workflow. B After read processing and demultiplexing, the numbers of unique molecular identifiers (UMIs) associated with each cellular index (cell) in the R2C2 (top) and Illumina (center) datasets are shown as histograms. Cells are ranked by the number of UMIs and colored by their rank in the R2C2 dataset. Red lines indicate cellular identifiers found in the Illumina but not the R2C2 data. At the bottom, the UMIs shared between cellular identifiers in the Illumina and R2C2 datasets, or unique to either dataset, are shown as stacked histograms. Cells are ranked by the number of shared UMIs. Data for replicate 1 are shown.

Techniques Used: Expressing, Sequencing
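Panel B compares, per cellular barcode, which UMIs appear in both datasets and which are unique to one of them. A minimal Python sketch of that bookkeeping, assuming each dataset has already been reduced to a mapping from cell barcode to a set of UMI sequences (the barcodes and UMIs below are made up), could look like this:

```python
from typing import Dict, Set, Tuple


def umi_overlap(r2c2: Dict[str, Set[str]],
                illumina: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int, int]]:
    """Per cell barcode, count (shared, R2C2-only, Illumina-only) UMIs."""
    result = {}
    for cell in r2c2.keys() | illumina.keys():
        a = r2c2.get(cell, set())
        b = illumina.get(cell, set())
        result[cell] = (len(a & b), len(a - b), len(b - a))
    return result


def rank_by_shared(overlap: Dict[str, Tuple[int, int, int]]):
    """Order cells by number of shared UMIs, highest first."""
    return sorted(overlap, key=lambda cell: overlap[cell][0], reverse=True)


# Hypothetical barcodes and UMIs.
r2c2 = {"AAACCTG": {"UMI1", "UMI2", "UMI3"}, "TTTGGCA": {"UMI9"}}
illumina = {"AAACCTG": {"UMI2", "UMI3", "UMI4"}, "CCCAGTT": {"UMI7"}}
overlap = umi_overlap(r2c2, illumina)
order = rank_by_shared(overlap)
# Barcodes seen in Illumina data only (the red lines in panel B).
cells_missing_in_r2c2 = sorted(illumina.keys() - r2c2.keys())
```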

R2C2 and Illumina datasets independently cluster into B cells, T cells, and monocytes. Gene expression profiles were determined independently for each cell in the R2C2 and Illumina datasets. The Seurat package was then used to cluster cells based on their gene expression profiles. The cells in the R2C2 (A) and Illumina (B) datasets both clustered into 3 groups, which, based on marker gene expression (C and D), could be identified as B cells, T cells, and monocytes. The color gradient (C and D) encodes ln(fold change), where the fold change compares that cluster's expression to the rest of the data. Data for replicate 1 are shown.

Techniques Used: Expressing, Marker
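The color gradient in panels C and D encodes ln(fold change) of one cluster against all other cells. Seurat computes its own fold changes internally; the Python below is only a minimal sketch of the plain definition, ln(mean expression in the cluster / mean expression in the remaining cells), with a hypothetical pseudocount of 1 to avoid division by zero. The expression values and cluster labels are invented.

```python
import math
from typing import Sequence


def ln_fold_change(expr: Sequence[float], labels: Sequence[str], cluster: str,
                   pseudocount: float = 1.0) -> float:
    """ln((mean in cluster + pc) / (mean in all other cells + pc)) for one gene."""
    inside = [e for e, lab in zip(expr, labels) if lab == cluster]
    outside = [e for e, lab in zip(expr, labels) if lab != cluster]
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return math.log((mean_in + pseudocount) / (mean_out + pseudocount))


labels = ["B", "B", "T", "T", "Mono", "Mono"]
ms4a1 = [9.0, 7.0, 0.0, 1.0, 0.0, 1.0]  # hypothetical counts for a B-cell marker gene
lfc_b = ln_fold_change(ms4a1, labels, "B")  # positive: enriched in the B-cell cluster
lfc_t = ln_fold_change(ms4a1, labels, "T")  # negative: depleted in the T-cell cluster
```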

12) Product Images from "The Absence of Retroelement Activity Is Characteristic for Childhood Acute Leukemias and Adult Acute Lymphoblastic Leukemia"

Article Title: The Absence of Retroelement Activity Is Characteristic for Childhood Acute Leukemias and Adult Acute Lymphoblastic Leukemia

Journal: International Journal of Molecular Sciences

doi: 10.3390/ijms23031756

Principle of the method. gDNA was digested with either TaqI + FspBI (for L1) or FspBI + Csp6I (for Alu) and ligated to a stem-loop adapter (pink). Retroelement-specific primers (yellow and green arrows) were used for selective amplification of the 3′ L1 or 5′ Alu flanking sequences (flanks). Indexing PCR introduced the sample barcodes and the oligonucleotides required for Illumina sequencing (i5 and i7). dL1: part of an L1 element; dAlu: part of an Alu element; UMI: Unique Molecular Identifier.

Techniques Used: Amplification, Polymerase Chain Reaction, Sequencing
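The first step of the method cuts genomic DNA with a pair of restriction enzymes. A minimal Python sketch of an in-silico double digest of the top strand follows. The recognition sequences and cut positions (TaqI T^CGA, FspBI C^TAG, Csp6I G^TAC) are supplied from general enzyme knowledge, not from the legend, and should be checked against the supplier's documentation. The toy sequences are invented.

```python
from typing import Dict, List, Tuple

# Assumed recognition site and top-strand cut offset per enzyme (verify before use).
# All three sites are palindromic, so scanning the top strand finds every site.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "TaqI": ("TCGA", 1),   # T^CGA
    "FspBI": ("CTAG", 1),  # C^TAG
    "Csp6I": ("GTAC", 1),  # G^TAC
}


def cut_positions(seq: str, enzymes: List[str]) -> List[int]:
    """Return sorted top-strand cut positions for all listed enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def digest(seq: str, enzymes: List[str]) -> List[str]:
    """Split the top strand into fragments at every cut position."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


fragments_l1 = digest("AAATCGAAACTAGAAA", ["TaqI", "FspBI"])    # L1 enzyme pair
fragments_alu = digest("GGTACCCTAGTT", ["FspBI", "Csp6I"])       # Alu enzyme pair
```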

13) Product Images from "A chromosome-level, fully phased genome assembly of the oat crown rust fungus Puccinia coronata f. sp. avenae: a resource to enable comparative genomics in the cereal rusts"

Article Title: A chromosome-level, fully phased genome assembly of the oat crown rust fungus Puccinia coronata f. sp. avenae: a resource to enable comparative genomics in the cereal rusts

Journal: G3: Genes|Genomes|Genetics

doi: 10.1093/g3journal/jkac149

Flowchart illustrating the key steps in the haplotype phasing and chromosome-level assembly of Pca 203. The NuclearPhaser pipeline (Duan et al. 2022) was used to create a fully phased, chromosome-level assembly. First, NuclearPhaser constructs a high-confidence subset of the 2 haplotypes that are expected to reside in separate nuclei and identifies potential phase swaps in the 2 preliminary haplotype sets. A high proportion of trans Hi-C reads should map within the A haplotype, so positions where this proportion drops are flagged as suspected phase swaps. We inspected these potential phase-switch breakpoints using Illumina read mappings. As shown in the figure, a shift to high-coverage multimapping reads indicates high similarity between regions across haplotypes, which may have caused a phase swap. After correcting phase swaps, the NuclearPhaser pipeline was run again on the updated genome. Lastly, the 2 haplotypes were scaffolded separately into chromosomes with Hi-C data.

Techniques Used: Construct, Hi-C
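The legend flags positions where the proportion of trans Hi-C reads mapping within haplotype A drops. As a minimal sketch of that idea only (not NuclearPhaser's implementation), the Python below takes hypothetical per-bin read counts along a contig, computes the proportion per bin, and merges consecutive bins below a hypothetical threshold into candidate breakpoint intervals for manual inspection, for example with Illumina read mappings as described in the legend. The 0.8 threshold and 10 kb bin size are assumptions.

```python
from typing import List, Tuple


def flag_low_proportion(in_a: List[int], total: List[int], threshold: float = 0.8,
                        bin_size: int = 10_000) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of runs of bins with proportion < threshold.

    Bins with no reads (total == 0) are never flagged, since they carry no signal.
    """
    intervals = []
    run_start = None
    for i, (a, t) in enumerate(zip(in_a, total)):
        low = t > 0 and a / t < threshold
        if low and run_start is None:
            run_start = i
        elif not low and run_start is not None:
            intervals.append((run_start * bin_size, i * bin_size))
            run_start = None
    if run_start is not None:  # close a run that reaches the end of the contig
        intervals.append((run_start * bin_size, len(total) * bin_size))
    return intervals


# Hypothetical counts for six 10 kb bins; the proportion dips in bins 2 and 3.
in_a = [95, 92, 40, 35, 90, 94]
total = [100, 100, 100, 100, 100, 100]
suspects = flag_low_proportion(in_a, total)
```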

14) Product Images from "Quantifying molecular bias in DNA data storage"

Article Title: Quantifying molecular bias in DNA data storage

Journal: Nature Communications

doi: 10.1038/s41467-020-16958-3

Dilution-PCR experiment. a The experimental workflow. A master DNA pool was diluted to different average copy numbers as indicated in the drawing. Each dilution sample was PCR-amplified and sequenced on an Illumina NextSeq instrument, and the results were sampled at 200x coverage. b A computational model for the dilution-PCR experiments. The synthesis pool model used $N_{seq} = 7{,}373$ sequences with normally distributed copy numbers of mean $\bar{n}_{syn} = 10^8$ and standard deviation $\sigma = 3.2 \times 10^7$. The c.v. of the synthesis pool in this simulation ($\sigma / \bar{n}_{syn} = 0.32$) was set equal to the c.v. of our ready-to-sequence pool sequenced at mean coverage 17. The dilution process was simulated by random sampling with a mean copy number $\bar{n}_0$ ranging from 8 to 113. PCR was simulated as a binomial process with a probability of successful amplification $P = 0.95$ over 18 PCR cycles.
The simulated sequencing result was obtained by random sampling with an average coverage $\bar{n}_r = 200$. c Simulated post-PCR sequencing coverage histogram of each dilution-PCR sample. The initial (pre-PCR) average copy number of each histogram is shown in the legend, ranging from 8 to 113. Coverage counts are normalized to display a probability density. A Gaussian estimated density curve is added as an outline of each histogram to aid visualization. d Sequencing coverage c.v. of the post-PCR mix versus average copy number in the pre-PCR mix. The model prediction (green) agrees well with the experimental data (blue), with $R^2 = 0.71$. Error bars on the experimental data indicate the standard error of the mean from triplicate experiments. Error bars on the model outputs indicate the standard error of the mean from 100 repeated simulations. Source data are available in the Source data file.

Techniques Used: Polymerase Chain Reaction, Amplification, Standard Deviation, Sequencing, Sampling
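Panel b describes the simulation in enough detail to sketch it. The Python below is a minimal re-implementation under stated assumptions, not the authors' code. Synthesis abundances are drawn as relative weights from a normal distribution with c.v. 0.32 (clipped at zero). Dilution is Poisson sampling around the mean copy number n0 times each weight. Each PCR cycle adds a Binomial(n, P) number of copies, drawn exactly for small n and by a normal approximation above a hypothetical cutoff of 1,000 molecules, for speed. Sequencing is multinomial sampling at the chosen mean coverage. The function returns the coverage c.v. that panel d plots.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def binomial(n: int, p: float, rng: random.Random, exact_cutoff: int = 1000) -> int:
    """Binomial(n, p) draw: exact Bernoulli sum for small n, normal approximation above."""
    if n <= exact_cutoff:
        return sum(rng.random() < p for _ in range(n))
    draw = round(rng.gauss(n * p, math.sqrt(n * p * (1 - p))))
    return min(n, max(0, draw))


def simulate_cv(n0: float, n_seq: int = 7373, cv_syn: float = 0.32, p_amp: float = 0.95,
                cycles: int = 18, coverage: int = 200, seed: int = 0) -> float:
    """Return the c.v. of simulated post-PCR sequencing coverage across sequences."""
    rng = random.Random(seed)
    # Synthesis pool: relative abundance of each sequence (mean 1, c.v. cv_syn).
    weights = [max(0.0, rng.gauss(1.0, cv_syn)) for _ in range(n_seq)]
    # Dilution: Poisson sampling of molecules around n0 copies per sequence.
    copies = [poisson(n0 * w, rng) for w in weights]
    # PCR: in each cycle, each molecule is copied with probability p_amp.
    for _ in range(cycles):
        copies = [n + binomial(n, p_amp, rng) for n in copies]
    # Sequencing: draw n_seq * coverage reads in proportion to post-PCR copies.
    reads = rng.choices(range(n_seq), weights=copies, k=n_seq * coverage)
    counts = [0] * n_seq
    for i in reads:
        counts[i] += 1
    return statistics.pstdev(counts) / statistics.mean(counts)


# Reduced pool size (300 sequences) to keep the run short.
cv_low = simulate_cv(8, n_seq=300, seed=1)
cv_high = simulate_cv(113, n_seq=300, seed=2)
```

With these assumptions, the Poisson dilution noise (relative variance about 1/n0) dominates at low copy numbers, which is why the coverage c.v. should fall as n0 rises.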

15) Product Images from "BBCAnalyzer: a visual approach to facilitate variant calling"

Article Title: BBCAnalyzer: a visual approach to facilitate variant calling

Journal: BMC Bioinformatics

doi: 10.1186/s12859-017-1549-4

Exemplary output file from real patient data generated by Illumina NextSeq. Relative number of reads at seven positions analyzed for sample “Example_Illumina”. Reference bases are plotted on the negative y-axis; bases detected in the mapped reads are plotted on the positive y-axis (5% threshold marked). Likely SNV at chr1:115,258,747 (reference C, ∼70% of the reads with high-quality C and ∼30% with high-quality T). No variant at chr2:25,467,204 (reference G, ∼100% of the reads with high-quality G). Unlikely SNV at chr2:198,267,280 (reference C, ∼95% of the reads with low-quality C, ∼5% with low-quality A). Likely deletion at chr4:106,157,106 (reference A, ∼75% of the reads with high-quality A, ∼25% with deleted A). Known homozygous SNP at chr17:7,579,472 (reference G, polymorphism C displayed as an additional reference base, ∼100% of the reads with high-quality C). Possible insertion of a “G”, but unlikely deletion, at chr20:31,022,442 (reference G, ∼97% of the reads with high-quality G, ∼3% with deleted G, ∼30% with inserted high-quality G). Likely SNV at chr21:44,514,777 (reference T, ∼65% of the reads with high-quality T, ∼35% with high-quality G).

Techniques Used: Generated, Variant Assay
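The legend judges each position by comparing the fraction of reads supporting each detected base, split by base quality, with a 5% threshold. The Python below is a minimal sketch of that bookkeeping under assumed inputs (per-base read counts keyed by base and quality class). It is not BBCAnalyzer's R interface: the function name, the input layout and the rule that only high-quality support counts toward a candidate call are hypothetical. Insertions, which are counted on top of the base at a position, are left out.

```python
from typing import Dict, List, Tuple


def candidate_variants(counts: Dict[Tuple[str, str], int], reference: str,
                       threshold: float = 0.05) -> List[Tuple[str, float]]:
    """Non-reference bases whose high-quality read fraction reaches the threshold.

    counts maps (base, quality_class) -> read count, with quality_class in
    {"high", "low"}; "-" stands for a deleted base.
    """
    total = sum(counts.values())
    hits = []
    for (base, quality), n in sorted(counts.items()):
        fraction = n / total
        if base != reference and quality == "high" and fraction >= threshold:
            hits.append((base, round(fraction, 3)))
    return hits


# Read fractions mirroring three positions in the legend.
chr1_hits = candidate_variants({("C", "high"): 70, ("T", "high"): 30}, "C")  # likely SNV
chr2_hits = candidate_variants({("C", "low"): 95, ("A", "low"): 5}, "C")     # low quality only
chr4_hits = candidate_variants({("A", "high"): 75, ("-", "high"): 25}, "A")  # likely deletion
```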

16) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
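The supporting tables report, for each parameter, how many models contain it and its normalized relative variable importance (RVI). A common definition of RVI sums the Akaike weights of all candidate models that include a variable. The Python below sketches that definition on the assumption that it matches the paper's usage; the paper's normalization step is not given in this legend and is not reproduced. The model AIC values and parameter names are invented.

```python
import math
from typing import Dict, FrozenSet, List, Tuple


def akaike_weights(aics: List[float]) -> List[float]:
    """w_m = exp(-delta_m / 2) / sum_j exp(-delta_j / 2), with delta_m = AIC_m - min AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def variable_importance(
        models: List[Tuple[FrozenSet[str], float]]) -> Dict[str, Tuple[int, float]]:
    """For each variable: (number of models containing it, summed Akaike weight)."""
    weights = akaike_weights([aic for _, aic in models])
    summary: Dict[str, Tuple[int, float]] = {}
    for (variables, _), w in zip(models, weights):
        for v in variables:
            count, rvi = summary.get(v, (0, 0.0))
            summary[v] = (count + 1, rvi + w)
    return summary


# Hypothetical candidate models as (parameter set, AIC).
models = [
    (frozenset({"base_quality", "strand_bias"}), 100.0),
    (frozenset({"base_quality"}), 102.0),
    (frozenset({"strand_bias"}), 110.0),
]
importance = variable_importance(models)
```

A variable present in every well-supported model gets an RVI close to 1, while one that appears only in poorly supported models stays near 0.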

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).
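The coverage panels report, per platform, the median coverage of each gene and the number of samples with 0x coverage. A minimal Python sketch of that summary, assuming one coverage value per gene per sample has already been computed from the alignments; the gene names and values are hypothetical. It would be run once per platform (454, Ion Torrent, Illumina) and per data set.

```python
from statistics import median


def coverage_summary(coverage):
    """coverage: {gene: [coverage in sample 1, sample 2, ...]}.

    Returns {gene: (median coverage across samples, number of samples at 0x)}.
    """
    return {
        gene: (median(values), sum(1 for v in values if v == 0))
        for gene, values in coverage.items()
    }


# Hypothetical per-sample coverage values for two genes on one platform.
example = {"GENE_A": [0, 35.0, 40.0, 52.0], "GENE_B": [0, 0, 12.0]}
summary = coverage_summary(example)
```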

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.
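One panel compares the predicted probability $\hat{p}_i$ from the GLM with the actual probability $p_i$ that an SNV is a true positive. How the actual probability was estimated is not stated here; a common approach, assumed in this Python sketch, is to bin calls by $\hat{p}_i$ and compare each bin's mean prediction with the observed fraction of validated true positives. The bin count and example data are hypothetical.

```python
def calibration_table(predicted, is_true, n_bins=5):
    """Bin calls by predicted probability p_hat and report, for each non-empty bin,
    (mean p_hat, observed fraction of true positives, number of calls)."""
    bins = [[] for _ in range(n_bins)]
    for p_hat, truth in zip(predicted, is_true):
        # p_hat == 1.0 is placed in the last bin instead of overflowing.
        idx = min(int(p_hat * n_bins), n_bins - 1)
        bins[idx].append((p_hat, truth))
    table = []
    for members in bins:
        if not members:
            continue
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        observed = sum(1 for _, t in members if t) / n
        table.append((mean_pred, observed, n))
    return table


# Hypothetical predictions and validation outcomes for five SNV calls.
table = calibration_table([0.1, 0.15, 0.9, 0.95, 1.0], [False, False, True, True, False])
```

A well-calibrated model gives rows where the first two columns are close.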

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.
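The R scripts pick the GLM that best separates true- from false-positive calls by forward selection on AIC. The Python sketch below shows only the search loop: `aic_of` stands in for fitting a logistic GLM on a given parameter set and returning its AIC, and the toy AIC table and parameter names are invented. In R, `step()` with `direction = "forward"` performs a comparable search, though the page does not say which function the authors used.

```python
def forward_select(candidates, aic_of):
    """Greedy forward selection on AIC.

    Starts from the intercept-only model (empty parameter set), adds whichever
    remaining parameter gives the lowest AIC, and stops once no addition lowers
    the AIC. aic_of(frozenset) must fit the GLM on that parameter set and return
    its AIC.
    """
    selected = frozenset()
    best_aic = aic_of(selected)
    remaining = set(candidates)
    while remaining:
        # Score every one-parameter extension; ties are broken by parameter name.
        aic, choice = min((aic_of(selected | {c}), c) for c in remaining)
        if aic >= best_aic:
            break
        selected = selected | {choice}
        best_aic = aic
        remaining.discard(choice)
    return selected, best_aic


# Hypothetical AIC values standing in for fitted logistic GLMs.
TOY_AIC = {
    frozenset(): 120.0,
    frozenset({"depth"}): 100.0,
    frozenset({"qual"}): 110.0,
    frozenset({"strand_bias"}): 119.0,
    frozenset({"depth", "qual"}): 95.0,
    frozenset({"depth", "strand_bias"}): 101.0,
    frozenset({"depth", "qual", "strand_bias"}): 96.0,
}
selected, aic = forward_select(["depth", "qual", "strand_bias"], TOY_AIC.__getitem__)
```

Here the loop adds `depth`, then `qual`, and stops because adding `strand_bias` would raise the AIC.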

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

17) Product Images from "Adapterama IV: Sequence Capture of Dual-digest RADseq Libraries with Identifiable Duplicates (RADcap)) Trade-offs and utility of alternative RADseq methods: Reply to"

Article Title: Adapterama IV: Sequence Capture of Dual-digest RADseq Libraries with Identifiable Duplicates (RADcap)) Trade-offs and utility of alternative RADseq methods: Reply to

Journal: bioRxiv

doi: 10.1101/044651

Sequencing reads that can be obtained from full-length 3RAD library molecules with iTru5-8N sequence tags. The top double-stranded molecule shows a 3RAD library molecule prepared as described in the text. The color scheme follows that of Glenn et al. (2016a; 2016b; 2016c) and Figure 2. The horizontal arrows above the text indicate positions on baits. The horizontal arrows beneath the library molecule indicate Illumina sequencing primers (binding to the complementary strand of the library molecules). The tip of each arrowhead indicates the 3’ end of the primer and the direction of elongation for sequencing. Four sequencing reads are shown for each prepared library molecule, with one read for each index and each strand of the genomic DNA, including internal indexes. Reads are numbered 1 to 4 (in magenta) from top to bottom. The arrow immediately 3’ of each primer indicates the data obtained from that primer.
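Because RADseq reads start and end at fixed restriction sites, independent template molecules from the same locus give identical read sequences; the degenerate 8N tag, read as part of an index read, is what lets PCR duplicates be told apart. A minimal Python sketch, assuming each read pair has already been assigned to a sample and carries its 8N tag. Treating pairs with the same sample, tag and read sequences as duplicates is a simplification (sequencing errors would make true duplicates look distinct), and the example reads are hypothetical.

```python
def remove_pcr_duplicates(read_pairs):
    """read_pairs: iterable of (sample_id, tag_8n, read1_seq, read2_seq) tuples.

    Keeps the first pair seen for each (sample, 8N tag, read 1, read 2) key; later
    pairs with the same key are treated as PCR copies of one template molecule.
    """
    seen = set()
    unique = []
    for sample, tag, r1, r2 in read_pairs:
        key = (sample, tag, r1, r2)
        if key in seen:
            continue
        seen.add(key)
        unique.append(key)
    return unique


# Hypothetical read pairs: the second is a PCR duplicate of the first; the third
# comes from the same locus but a different template (different 8N tag).
pairs = [
    ("s1", "ACGTACGT", "AAACCC", "TTTGGG"),
    ("s1", "ACGTACGT", "AAACCC", "TTTGGG"),
    ("s1", "GGGGCCCC", "AAACCC", "TTTGGG"),
    ("s2", "ACGTACGT", "AAACCC", "TTTGGG"),
]
unique = remove_pcr_duplicates(pairs)
```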

Techniques Used: Sequencing, Binding Assay

The components of the library molecule added in different steps of the protocol, and the sequence of the ends of the molecules. Genomic DNA is digested with enzymes that leave enzyme-specific sticky ends, to which we ligate adapters. The left-hand adapter consists of four bases that bind to the XbaI restriction site overhang (dark red), a sample-specific internal sequence tag used to identify the sample (orange), and a Read 1 sequencing primer that is partially single-stranded to facilitate annealing of the iTru5 primer (purple). The right-hand adapter is a Y-yoke adapter composed of the four bases that bind to the EcoRI restriction site overhang (dark green), a sample-specific internal sequence tag (tan), and the Read 2 sequencing primer (red). During the single-cycle PCR, the iTru5 primer is added to the library: the partial library is denatured, and the primer anneals to the Read 1 sequencing primer overhang (purple) and extends, thereby adding the degenerate barcode with 8 N bases (green) and the P5 primer (maroon), which anneals to the Illumina flowcell. After the reaction is cleaned up, a limited-cycle PCR is performed to add the iTru7 primer, which comprises the Read 2 sequencing primer (red) that anneals to the single-stranded adapter added earlier, a sample-specific barcode (blue), and the P7 primer (light green), which anneals to the Illumina flowcell.
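The internal sequence tag in the left-hand adapter identifies the sample inline. Assuming read 1 reads through that tag and then the XbaI-derived remnant before reaching genomic sequence, a minimal Python sketch of inline demultiplexing could look like the following. The tag sequences, sample names and the remnant string are placeholders, not the published adapter sequences; only exact matches are accepted, and the tags are assumed to be mutually unambiguous.

```python
def assign_inline_tag(read1, tag_to_sample, remnant):
    """Return (sample, genomic_sequence) if read 1 starts with a known internal tag
    immediately followed by the expected restriction-site remnant, else None.

    Exact matching only: no mismatches are tolerated in the tag or the remnant.
    """
    for tag, sample in tag_to_sample.items():
        prefix = tag + remnant
        if read1.startswith(prefix):
            return sample, read1[len(prefix):]
    return None


# Placeholder tags and remnant (hypothetical values, for illustration only).
TAGS = {"CCGAATG": "sample_1", "TTAGGCA": "sample_2"}
REMNANT = "CTAGA"
hit = assign_inline_tag("CCGAATGCTAGAGGATTACA", TAGS, REMNANT)
miss = assign_inline_tag("TTAGGCAGAATTCAAA", TAGS, REMNANT)
```

Reads returning `None` (here, a tag followed by an unexpected remnant) would typically be set aside rather than forced into a sample.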

Techniques Used: Sequencing, Polymerase Chain Reaction
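Under the read structure described above, each Read 1 starts with the sample-specific internal tag, followed by the XbaI-cut end of the genomic fragment. The Python sketch below shows inline-tag demultiplexing under that reading. The tag sequences and sample names are hypothetical. It also assumes the tag is immediately followed by the XbaI remnant `CTAGA` (XbaI cuts T^CTAGA); check that assumption against the actual adapter design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical inline tags; the legend does not give the real sequences.
INLINE_TAGS = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}

# Assumed remnant of the XbaI site (T^CTAGA) at the start of the insert.
XBAI_REMNANT = "CTAGA"


def demultiplex(reads: Iterable[str],
                tags: Dict[str, str] = INLINE_TAGS,
                remnant: str = XBAI_REMNANT) -> Dict[Optional[str], List[str]]:
    """Assign Read 1 sequences to samples by their leading inline tag.

    The tag is trimmed off. Reads with an unknown tag, or whose tag is not
    followed by the restriction remnant, are collected under the key None.
    """
    bins: Dict[Optional[str], List[str]] = defaultdict(list)
    for read in reads:
        for tag, sample in tags.items():
            insert = read[len(tag):]
            if read.startswith(tag) and insert.startswith(remnant):
                bins[sample].append(insert)
                break
        else:  # no tag matched this read
            bins[None].append(read)
    return dict(bins)


print(demultiplex(["ACGTACCTAGAGGTT", "TGCATGCTAGATTAC", "ACGTACGGGGG"]))
```

The sketch assumes equal-length tags that are not prefixes of one another. Tags of unequal length would need a longest-match rule.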

18) Product Images from "Characterization and engineering of Streptomyces griseofuscus DSM 40191 as a potential host for heterologous expression of biosynthetic gene clusters"

Article Title: Characterization and engineering of Streptomyces griseofuscus DSM 40191 as a potential host for heterologous expression of biosynthetic gene clusters

Journal: Scientific Reports

doi: 10.1038/s41598-021-97571-2

Genome-wide off-target evaluation of CRISPR-BEST-mediated mutations in the strain preserved and sequenced after the introduction of a stop codon (p057_0D; 2). This strain is compared with the same strain after 20 consecutive passages in liquid culture (p057_20D; 3). Mutations predicted for the WT Illumina dataset (1), which was used to produce the reference, can be considered technical noise. This figure was made using the online diagramming tool draw.io ( https://app.diagrams.net/ ).

Techniques Used: Genome Wide, CRISPR
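The comparison in this figure can be expressed as set arithmetic on variant calls:

* Calls already present in the wild-type dataset (1) are treated as technical noise.
* Calls in p057_0D (2) that survive this filter are candidate editing-associated mutations.
* Calls that appear only after passaging (3) are candidates that arose during the 20 transfers.

The Python sketch below assumes variants are represented as hypothetical (contig, position, ref, alt) tuples. It is not the authors' CLC Genomics Workbench workflow.

```python
from typing import Dict, Iterable, List, Tuple

# (contig, 1-based position, ref allele, alt allele); representation assumed.
Variant = Tuple[str, int, str, str]


def classify_mutations(wt_calls: Iterable[Variant],
                       edited_calls: Iterable[Variant],
                       passaged_calls: Iterable[Variant]) -> Dict[str, List[Variant]]:
    """Split calls into noise-filtered post-editing and post-passaging sets."""
    noise = set(wt_calls)                  # calls in the WT dataset = technical noise
    edited = set(edited_calls) - noise     # e.g. p057_0D after noise removal
    passaged = set(passaged_calls) - noise # e.g. p057_20D after noise removal
    return {
        "after_editing": sorted(edited),
        "after_passaging": sorted(passaged - edited),  # new during passaging
    }


wt = [("chr", 100, "A", "G")]
p0 = [("chr", 100, "A", "G"), ("chr", 500, "C", "T")]
p20 = [("chr", 100, "A", "G"), ("chr", 500, "C", "T"), ("chr", 900, "G", "A")]
print(classify_mutations(wt, p0, p20))
```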

Overview of S. griseofuscus strains and mutations. The positions of genomic inverted repeats are highlighted in blue, BGCs in black and transposases in green. The positions of the identified mutations are highlighted in red. The mutations detected in the wild-type Illumina dataset can be considered technical noise. The position of the stop codon introduced by CRISPR-cBEST is indicated with a black triangle. Strains p057_0D and p057_20D come from the long-term cultivation experiment. In that experiment, strain S. griseofuscus IHEP81_06602 (p057_0D), generated with CRISPR-cBEST and carrying an introduced stop codon in BGC 30, was transferred 20 consecutive times in liquid ISP2 medium without selective pressure, generating strain p057_20D. The alignment was created with CLC Genomics Workbench 12.0.3 https://digitalinsights.qiagen.com/ and visualised with Adobe Illustrator 23.0.6 https://www.adobe.com/products/illustrator.html .

Techniques Used: CRISPR, Generated

19) Product Images from "COV-ID: A LAMP sequencing approach for high-throughput co-detection of SARS-CoV-2 and influenza virus in human saliva"

Article Title: COV-ID: A LAMP sequencing approach for high-throughput co-detection of SARS-CoV-2 and influenza virus in human saliva

Journal: medRxiv

doi: 10.1101/2021.04.23.21255523

Barcoding and PCR amplification of RT-LAMP products. (A) Overview of COV-ID. Saliva is collected and inactivated prior to RT-LAMP, which is performed with up to 96 sample-specific barcoded primers. LAMP reactions are pooled and further amplified via PCR to introduce Illumina adapter sequences and pool-level dual indexes. A single thermal cycler can amplify 96 or 384 such pools. The resulting "super-pool" can be sequenced overnight to detect multiple amplicons from 9,216 or 36,864 individual patient samples (the numbers of reads in parentheses assume an output of ∼450M reads from a NextSeq 500). (B) Schematic of the RT-LAMP step (step I) of COV-ID. Selected numbered intermediates of the RT-LAMP reaction are shown to illustrate how the LAMP barcode (yellow) and the P5 and P7 homology sequences (blue and pink, respectively) are introduced into the final LAMP product. Once the dumbbell intermediate is generated, the reaction proceeds through rapid primed and self-primed extensions to form a mixture of DNA amplicons containing sequences for PCR amplification. A more detailed version of the LAMP phase of COV-ID, including specific sequences, is illustrated in Fig. S1. (C) Conventional RT-LAMP primers (solid lines) or primers modified for COV-ID (dotted lines) were used for RT-LAMP of SARS-CoV-2. The number of inactivated SARS-CoV-2 virions per µL is indicated in the color legend. (D) Schematic of the PCR step (step II) of COV-ID. Following RT-LAMP, up to 96 reactions are pooled and purified. Illumina libraries are then generated directly by PCR with dual-indexed P5 and P7 adapters in preparation for sequencing. (E) COV-ID primers targeting ACTB mRNA were used for RT-LAMP with HeLa total RNA. LAMP products were diluted 1:100, amplified via PCR and resolved on a 2% agarose gel.

Techniques Used: Polymerase Chain Reaction, Amplification, Introduce, Modification, Purification, Generated, Sequencing, Agarose Gel Electrophoresis
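The sample counts in panel A follow from combinatorial indexing: 96 LAMP barcodes per pool, multiplied by 96 or 384 dual-indexed pools. The Python sketch below reproduces that arithmetic. It also computes the mean number of reads per sample from ∼450M reads, assuming reads are split evenly across samples (real runs will not be perfectly even). The `sample_index` mapping from (pool, LAMP barcode) to a flat sample number is a hypothetical convention for illustration.

```python
LAMP_BARCODES = 96            # barcoded RT-LAMP primer sets per pool (legend)
NEXTSEQ_READS = 450_000_000   # ~450M reads from a NextSeq 500 (legend)


def run_capacity(n_pools, barcodes=LAMP_BARCODES, total_reads=NEXTSEQ_READS):
    """Return (number of samples, mean reads per sample) for one super-pool."""
    n_samples = n_pools * barcodes
    return n_samples, total_reads / n_samples


def sample_index(pool, lamp_barcode, n_pools, barcodes=LAMP_BARCODES):
    """Hypothetical flat sample number for a (pool, LAMP barcode) pair."""
    if not (0 <= pool < n_pools and 0 <= lamp_barcode < barcodes):
        raise ValueError("pool or barcode index out of range")
    return pool * barcodes + lamp_barcode


for pools in (96, 384):
    samples, reads = run_capacity(pools)
    print(f"{pools} pools -> {samples:,} samples, ~{reads:,.0f} reads each")
```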

20) Product Images from "Supplementation of a lacto-fermented rapeseed-seaweed blend promotes gut microbial- and gut immune-modulation in weaner piglets"

Article Title: Supplementation of a lacto-fermented rapeseed-seaweed blend promotes gut microbial- and gut immune-modulation in weaner piglets

Journal: bioRxiv

doi: 10.1101/2020.09.22.308106

Relative abundance of prokaryotes based on 16S rRNA gene amplicon sequencing using either the Illumina or the Oxford Nanopore Technologies platform. n = 9, 8 and 10 for the basal diet with 0%, 2.5% and 5% added FRS (fermented rapeseed-Saccharina latissima-Ascophyllum nodosum), respectively.

Techniques Used: Amplification, Sequencing
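Relative abundance, as plotted here, is each taxon's read count divided by the sample's total read count. The Python sketch below shows that calculation. The taxon names and counts in the usage line are made up, and real pipelines typically work on ASV or species-level tables rather than a plain dict. The Illumina V3 and ONT V1-V8 tables would be converted separately.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert one sample's {taxon: read count} table to proportions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


print(relative_abundance({"Prevotella": 30, "Mitsuokella": 10, "Other": 60}))
```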

Prevotella stercorea and Mitsuokella spp. differed in gut abundance between feeding-regime groups. This was seen both with Illumina sequencing of the V3 region of the 16S rRNA gene (A, B) and with Oxford Nanopore sequencing of the V1-V8 region of the 16S rRNA gene (C, D). Bar plots show mean values with SEM error bars. n = 9, 8 and 10 for the basal diet with 0%, 2.5% and 5% added FRS (fermented rapeseed-Saccharina latissima-Ascophyllum nodosum), respectively. The labels *, ** and *** represent p

Techniques Used: Sequencing, Nanopore Sequencing
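Each bar is a group mean, and each error bar is the standard error of the mean, SEM = s / √n. Here s is the sample standard deviation and n the group size (9, 8 or 10 per the legend). The Python sketch below computes both values. It does not reproduce the significance test behind the asterisks.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) using the sample SD (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations for an SEM")
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(n)
```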

Alpha and beta diversity analysis of 16S rRNA gene amplicon sequencing data.

* (A) Observed ASVs and Shannon index based on the rarefied ASV table from Illumina sequencing of the V3 region (Illumina V3).
* (B, C) Observed features and Shannon index based on the species-level summarized table from Illumina sequencing of the V3 region (B) and from Oxford Nanopore sequencing of the V1-V8 region (ONT V1-V8) (C). The mean value for each group is marked with a bold line.
* (D-F) PCoA plots of binary Jaccard and Bray-Curtis distance metrics based on the rarefied ASV table (Illumina V3) (D), the species-level summarized table (Illumina V3) (E) and the species-level summarized table (ONT V1-V8) (F). The ellipses show the 80% confidence region assuming a multivariate t-distribution.

n = 9, 8 and 10 for the basal diet with 0%, 2.5% and 5% added FRS (fermented rapeseed-Saccharina latissima-Ascophyllum nodosum), respectively. For the pairwise Wilcoxon test on alpha diversity, the labels * and ** represent p

Techniques Used: Amplification, Sequencing, Nanopore Sequencing
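The alpha and beta diversity metrics named in this legend have standard definitions:

* Observed features: the count of non-zero features.
* Shannon index: H = -Σ p_i ln p_i.
* Bray-Curtis dissimilarity: Σ|u_i - v_i| / Σ(u_i + v_i).
* Binary Jaccard distance: 1 - |A ∩ B| / |A ∪ B| on presence/absence.

The Python sketch below implements these on plain count vectors. The log base for Shannon differs between tools, and this version uses the natural log. Rarefaction, PCoA and the t-distribution ellipses are not shown.

```python
import math
from typing import Sequence


def observed_features(counts: Sequence[int]) -> int:
    """Richness: number of features (ASVs or species) with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over non-zero features."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def binary_jaccard(u: Sequence[float], v: Sequence[float]) -> float:
    """Jaccard distance on presence/absence: 1 - |A & B| / |A | B|."""
    a = {i for i, c in enumerate(u) if c > 0}
    b = {i for i, c in enumerate(v) if c > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0
```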

21) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
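In GLM notation, η̂_i is usually the fitted linear predictor. For a binomial GLM with a logit link, the estimated probability that call i is a true positive is p̂_i = 1 / (1 + exp(-η̂_i)), and a call is kept if p̂_i exceeds a threshold. The legend does not state the link function, coefficients or threshold. The Python sketch below therefore assumes a logit link, uses hypothetical parameter names and coefficients, and applies a placeholder threshold of 0.5.

```python
import math
from typing import Dict, Tuple


def linear_predictor(intercept: float, coefs: Dict[str, float],
                     params: Dict[str, float]) -> float:
    """eta_i = beta_0 + sum_j beta_j * x_ij over the selected parameters."""
    return intercept + sum(beta * params[name] for name, beta in coefs.items())


def inverse_logit(eta: float) -> float:
    """Numerically stable logistic function p = 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def classify_call(intercept: float, coefs: Dict[str, float],
                  params: Dict[str, float], threshold: float = 0.5) -> Tuple[float, bool]:
    """Return (estimated probability of a true positive, keep-call flag)."""
    p_hat = inverse_logit(linear_predictor(intercept, coefs, params))
    return p_hat, p_hat >= threshold


# Hypothetical coefficients and parameter names, for illustration only.
coefs = {"depth": 0.05, "strand_bias": -1.2}
print(classify_call(-2.0, coefs, {"depth": 80, "strand_bias": 0.3}))
```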

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
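Forward selection based on AIC (AIC = 2k - 2 ln L) starts from the intercept-only model. It greedily adds whichever parameter lowers the AIC most and stops when no addition helps. The Python sketch below implements only that search loop. In practice, `aic_of` would refit the binomial GLM on each candidate parameter set. The `toy_aic` scorer and its parameter names are hypothetical stand-ins, not the paper's 18 SNV or 22 indel parameters.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(candidates: Sequence[str],
                   aic_of: Callable[[FrozenSet[str]], float]
                   ) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the parameter that lowers AIC
    the most, and stop as soon as no remaining parameter lowers it."""
    selected: List[str] = []
    best_aic = aic_of(frozenset())          # intercept-only model
    remaining = list(candidates)
    while remaining:
        scored = [(aic_of(frozenset(selected + [c])), c) for c in remaining]
        aic, best = min(scored, key=lambda pair: pair[0])
        if aic >= best_aic:
            break
        selected.append(best)
        remaining.remove(best)
        best_aic = aic
    return selected, best_aic


# Toy scorer standing in for "fit the GLM on this subset and return its AIC":
# each parameter adds a fixed, made-up log-likelihood gain.
GAINS = {"depth": 30.0, "qual": 10.0, "noise": 0.5}


def toy_aic(subset: FrozenSet[str]) -> float:
    log_lik = -100.0 + sum(GAINS[p] for p in subset)
    k = len(subset) + 1                      # parameters incl. intercept
    return 2 * k - 2 * log_lik


print(forward_select(list(GAINS), toy_aic))
```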

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter-selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
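The coverage panels summarise, for each gene in the intersecting target region, the median coverage and the number of samples with 0x coverage. The Python sketch below shows one plausible way to compute both from a per-sample, per-gene depth table. The depths in the example are hypothetical, and treating a gene missing from a sample as 0x is an assumption.

```python
import statistics
from typing import Dict, Tuple

# {sample: {gene: mean read depth}}; genes absent for a sample count as 0x.
Coverage = Dict[str, Dict[str, float]]


def coverage_summary(coverage: Coverage) -> Dict[str, Tuple[float, int]]:
    """Per gene: (median depth across samples, number of samples at 0x)."""
    genes = sorted({g for per_sample in coverage.values() for g in per_sample})
    summary = {}
    for gene in genes:
        depths = [per_sample.get(gene, 0.0) for per_sample in coverage.values()]
        n_zero = sum(1 for d in depths if d == 0)
        summary[gene] = (statistics.median(depths), n_zero)
    return summary


example = {
    "s1": {"TP53": 120.0, "KRAS": 0.0},
    "s2": {"TP53": 80.0, "KRAS": 15.0},
    "s3": {"TP53": 100.0},
}
print(coverage_summary(example))
```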

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter-selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
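A common way to compare estimated probabilities p̂_i with the actual probability p_i that an SNV call is a true positive is a calibration table. Calls are binned by p̂_i, and within each bin the mean p̂_i is compared with the observed fraction of validated true positives. The legend does not say how p_i was estimated, so the equal-width binning in the Python sketch below is an assumption.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float, int]


def calibration_table(p_hat: Sequence[float], is_true: Sequence[bool],
                      n_bins: int = 10) -> List[Row]:
    """Bin calls by predicted probability and compare, per non-empty bin,
    the mean predicted probability with the observed true-positive fraction.

    Rows are (bin_low, bin_high, mean_p_hat, observed_fraction, n_calls).
    """
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, truth in zip(p_hat, is_true):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in the top bin
        bins[idx].append((p, truth))
    rows: List[Row] = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        observed = sum(1 for _, t in members if t) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, mean_p, observed, len(members)))
    return rows
```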

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
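The legend above refers to an R script that calculates the relative variable importance (RVI) of each parameter, and to tables giving the number of models that contain each parameter. The script itself is not reproduced on this page. A common definition of RVI (Burnham and Anderson) is the sum of the Akaike weights of all candidate models that include a parameter. The minimal Python sketch below assumes that definition and computes both the RVI and the model count per parameter. It is not the authors' code, the function names are hypothetical, and the normalization step mentioned in the legend is omitted because it is not described here.

```python
import math
from typing import Dict, FrozenSet, List, Tuple


def akaike_weights(aics: List[float]) -> List[float]:
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aics)
    # Relative likelihood of each model: exp(-(AIC_m - AIC_min) / 2)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def relative_variable_importance(
    models: List[Tuple[FrozenSet[str], float]]
) -> Tuple[Dict[str, float], Dict[str, int]]:
    """For each parameter, sum the Akaike weights of the models containing it
    and count how many models contain it.

    models: list of (parameters included in the model, AIC of the fitted model).
    """
    weights = akaike_weights([aic for _, aic in models])
    rvi: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for (params, _), w in zip(models, weights):
        for name in params:
            rvi[name] = rvi.get(name, 0.0) + w
            counts[name] = counts.get(name, 0) + 1
    return rvi, counts
```

For example, two models with equal AIC, one containing only a hypothetical parameter `a` and one containing `a` and `b`, give `a` an RVI of 1.0 and `b` an RVI of 0.5.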

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
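The legend also lists R scripts that choose the best GLM separating true from false positive calls by forward selection based on AIC. As an illustration only, the sketch below implements generic AIC-driven forward selection for a logistic (binomial, logit-link) GLM in plain Python. It starts from an intercept-only model, repeatedly adds the predictor that lowers AIC the most, and stops when no remaining predictor lowers it. The logit link, the Newton-Raphson fit, and all names (`logistic_aic`, `forward_select`) are assumptions made for this sketch, not the authors' R implementation; in R, `glm()` together with `step()` is the usual route.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_aic(X: Sequence[Sequence[float]], y: Sequence[int], cols: List[int], iters: int = 100) -> float:
    """Fit a logistic GLM (intercept + selected columns) by Newton-Raphson and return its AIC."""
    rows = [[1.0] + [float(x[j]) for j in cols] for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [_sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score vector X^T (y - p) and observed information X^T W X.
        grad = [sum((yi - pi) * r[a] for r, yi, pi in zip(rows, y, p)) for a in range(k)]
        hess = [[sum(pi * (1.0 - pi) * r[a] * r[c] for r, pi in zip(rows, p)) for c in range(k)] for a in range(k)]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    # Log-likelihood sum(y*z - log(1 + e^z)), written to avoid overflow.
    loglik = 0.0
    for r, yi in zip(rows, y):
        z = sum(b * v for b, v in zip(beta, r))
        loglik += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return 2.0 * k - 2.0 * loglik


def forward_select(X: Sequence[Sequence[float]], y: Sequence[int], names: List[str]) -> Tuple[List[str], float]:
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    selected: List[int] = []
    best_aic = logistic_aic(X, y, selected)
    remaining = list(range(len(names)))
    while remaining:
        aic, j = min((logistic_aic(X, y, selected + [c]), c) for c in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best_aic
```

This unguarded Newton-Raphson fit can fail on perfectly separable data, where the maximum-likelihood coefficients diverge, so real call sets with such features would need a penalized or otherwise guarded fit.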

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

22) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ of an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
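The last panel in this legend compares the model-estimated probability $\hat{p}_i$ with the actual probability $p_i$ that an SNV call is a true positive. One common way to obtain an empirical "actual" probability is to bin calls by $\hat{p}_i$ and compute the observed true-positive fraction in each bin. The sketch below assumes equal-width bins; the function name `calibration_table` and the example values are hypothetical and are not taken from the article's R scripts.

```python
from typing import List, Tuple


def calibration_table(
    predicted: List[float], is_true_positive: List[bool], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Bin calls by predicted probability p_hat (equal-width bins on [0, 1]) and
    return (mean p_hat, observed true-positive fraction, number of calls) for
    each non-empty bin. The observed fraction is an empirical estimate of p."""
    if len(predicted) != len(is_true_positive):
        raise ValueError("predicted and is_true_positive must have equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, tp in zip(predicted, is_true_positive):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # p == 1.0 is placed in the last bin instead of an extra one
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, tp))
    table = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(1 for _, tp in members if tp) / len(members)
        table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical example: five calls with model probabilities and validation labels
table = calibration_table([0.05, 0.1, 0.92, 0.95, 0.97],
                          [False, False, True, True, False], n_bins=5)
for mean_pred, observed, n in table:
    print(f"p_hat={mean_pred:.3f}  observed={observed:.3f}  n={n}")
```

A well-calibrated model places these points close to the diagonal; with few calls per bin the observed fractions are noisy, so wider bins trade resolution for stability.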

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
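The legend lists R scripts that choose the best GLM separating true from false positive calls by forward selection based on AIC. The pure-Python sketch below illustrates that general technique for a binomial GLM with logit link (fitted by Newton-Raphson), using AIC $= 2k - 2\log L$. It is not the authors' R code; the feature names `quality` and `noise` and the data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_loglik(X: List[List[float]], y: List[int], iters: int = 30) -> float:
    """Fit a binomial GLM with logit link by Newton-Raphson and return the
    maximized log-likelihood. Each row of X must start with an intercept 1.0."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        ll += math.log(p) if yi else math.log(1.0 - p)
    return ll


def forward_select_aic(features: Dict[str, Sequence[float]], y: List[int]) -> List[str]:
    """Greedy forward selection: add the feature that lowers AIC = 2k - 2 log L
    the most; stop when no remaining feature lowers it."""
    def aic(names: List[str]) -> float:
        X = [[1.0] + [float(features[f][i]) for f in names] for i in range(len(y))]
        return 2 * (len(names) + 1) - 2 * logistic_loglik(X, y)

    selected: List[str] = []
    best = aic(selected)
    remaining = list(features)
    while remaining:
        scores = {f: aic(selected + [f]) for f in remaining}
        cand = min(scores, key=scores.get)
        if scores[cand] >= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected


# Hypothetical call parameters: one informative, one uninformative
features = {
    "quality": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "noise":   [2, 5, 1, 3, 4, 1, 3, 5, 2, 4],
}
is_true_positive = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(forward_select_aic(features, is_true_positive))
```

Newton-Raphson diverges when a feature perfectly separates the classes, so a production version would need a convergence check or regularization; in R, `glm(..., family = binomial)` combined with `step()` covers the same ground.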

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
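The legend reports, per parameter, the number of models containing it and a normalized relative variable importance (RVI). A widely used definition of RVI (Burnham and Anderson) sums the Akaike weights $w_m = e^{-\Delta_m/2} / \sum_k e^{-\Delta_k/2}$, with $\Delta_m = \mathrm{AIC}_m - \min \mathrm{AIC}$, over all candidate models that contain the parameter. The sketch below assumes that definition and normalizes by the largest RVI; the article's own normalization may differ, and the parameter names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Tuple


def akaike_weights(model_aic: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """w_m = exp(-delta_m / 2) / sum_k exp(-delta_k / 2), delta_m = AIC_m - min AIC."""
    best = min(model_aic.values())
    raw = {m: math.exp(-(aic - best) / 2.0) for m, aic in model_aic.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


def relative_variable_importance(
    model_aic: Dict[FrozenSet[str], float]
) -> Dict[str, Tuple[int, float, float]]:
    """Return parameter -> (number of models containing it, RVI, RVI / max RVI)."""
    weights = akaike_weights(model_aic)
    params = set().union(*model_aic)  # every parameter seen in any model
    rvi = {p: sum(w for m, w in weights.items() if p in m) for p in params}
    top = max(rvi.values())  # normalization by the maximum is an assumption
    return {p: (sum(1 for m in model_aic if p in m), rvi[p], rvi[p] / top)
            for p in params}


# Hypothetical candidate models (parameter sets) and their AIC values
models = {
    frozenset({"depth"}): 10.0,
    frozenset({"depth", "strand_bias"}): 10.0,
    frozenset({"strand_bias"}): 20.0,
}
for name, (n_models, score, normalized) in sorted(relative_variable_importance(models).items()):
    print(f"{name}: models={n_models} RVI={score:.4f} normalized={normalized:.4f}")
```

Because Akaike weights sum to one, each RVI lies in $[0, 1]$, and a parameter that appears only in poorly supported models receives an RVI near zero even if it occurs in many of them.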

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

23) Product Images from "Infection and transmission of ancestral SARS-CoV-2 and its alpha variant in pregnant white-tailed deer"

Article Title: Infection and transmission of ancestral SARS-CoV-2 and its alpha variant in pregnant white-tailed deer

Journal: bioRxiv

doi: 10.1101/2021.08.15.456341

Next-generation sequencing of swabs collected from SARS-CoV-2 co-infected white-tailed deer. cDNA products of SARS-CoV-2 RNA extracted from nasal swabs were sequenced on the Illumina NextSeq platform to evaluate the in vivo competition between the ancestral lineage A WA1 (SARS-CoV-2/human/USA/WA1/2020) and the alpha VOC B.1.1.7 (SARS-CoV-2/human/USA/CA-5574/2020) strains.

Techniques Used: Next-Generation Sequencing, Infection, In Vivo
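The legend says only that NextSeq reads were used to evaluate in vivo competition between lineage A WA1 and alpha B.1.1.7. One straightforward way to quantify such competition is the fraction of reads carrying the alpha allele at sites where the two genomes differ. The sketch below assumes per-site allele counts have already been extracted from a pileup; the marker positions and alleles in the example are placeholders, not real lineage-defining mutations.

```python
from typing import Dict, Tuple


def alpha_fraction(
    allele_counts: Dict[int, Dict[str, int]],
    markers: Dict[int, Tuple[str, str]],
) -> Tuple[Dict[int, float], float]:
    """markers: position -> (ancestral allele, alpha allele).
    allele_counts: position -> {base: read count} from a pileup.
    Returns the alpha-allele fraction per informative site and the fraction
    pooled over all informative sites. Reads with other bases are ignored."""
    per_site: Dict[int, float] = {}
    anc_total = alpha_total = 0
    for pos, (anc, alt) in markers.items():
        counts = allele_counts.get(pos, {})
        a, b = counts.get(anc, 0), counts.get(alt, 0)
        if a + b == 0:
            continue  # no informative reads at this marker
        per_site[pos] = b / (a + b)
        anc_total += a
        alpha_total += b
    informative = anc_total + alpha_total
    pooled = alpha_total / informative if informative else float("nan")
    return per_site, pooled


# Placeholder markers and counts for one swab (hypothetical values)
markers = {100: ("A", "T"), 200: ("C", "T"), 300: ("G", "A")}
counts = {100: {"A": 30, "T": 70}, 200: {"C": 20, "T": 80, "G": 1}}
per_site, pooled = alpha_fraction(counts, markers)
print(per_site, round(pooled, 3))
```

Reporting per-site fractions alongside the pooled value helps spot markers with amplicon dropout or sequencing artifacts, which would show up as outliers.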

24) Product Images from "Model-driven generation of artificial yeast promoters"

Article Title: Model-driven generation of artificial yeast promoters

Journal: bioRxiv

doi: 10.1101/748616

FACS-seq experimental strategy and dataset overview. A: Schematic of tested libraries (above), indicating regions held constant in promoter design (grey boxes); schematic of the two-color reporter device used to characterize promoter activity (below). “RAP1”, “GCR1”, “ZEV”: transcription factor binding sites; “TATA”: TATA box motif; “TSS”: transcription start site motif. B: Schematic of the FACS-seq approach for high-throughput promoter activity characterization, in which next-generation sequencing (NGS)-derived histograms of sequence counts in FACS bins, generated by sorting a library on promoter activity, are used to derive promoter activity for each sequence in a library. C: Histogram of promoter activities (log10 ratio of mean GFP to mCherry intensity, in arbitrary units) in the final $P_{\mathrm{GPD}}$ library. Only sequences for which at least 10 NextSeq reads were counted in each replicate were used in this analysis. D: Density scatter plot of induced and uninduced promoter activities measured in the final $P_{\mathrm{ZEV}}$ library. Only sequences for which at least 20 NextSeq reads were counted in each replicate were used in this analysis.

Techniques Used: FACS, Activity Assay, Binding Assay, High Throughput Screening Assay, Next-Generation Sequencing, Derivative Assay, Sequencing, Generated
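Panel B describes deriving each sequence's promoter activity from its NGS read histogram across FACS bins, and panels C and D apply a per-replicate read-count cutoff (10 reads for $P_{\mathrm{GPD}}$, 20 for $P_{\mathrm{ZEV}}$). The sketch below shows one common Sort-seq style estimator, not necessarily the one used in the article: reads per bin are divided by the bin's total reads and scaled by the fraction of cells sorted into that bin, and the activity is the weighted mean of a representative log10(GFP/mCherry) value per bin. All names and numbers are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (reads for one sequence per bin, total reads per bin,
#  fraction of sorted cells per bin, representative log10(GFP/mCherry) per bin)
Replicate = Tuple[Sequence[int], Sequence[int], Sequence[float], Sequence[float]]


def bin_weighted_activity(counts, bin_totals, bin_cells, bin_values) -> Optional[float]:
    """Weight each bin by the sequence's read share in that bin times the share
    of cells sorted into it, then return the weighted mean bin value."""
    weights = [c / t * f if t else 0.0
               for c, t, f in zip(counts, bin_totals, bin_cells)]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * v for w, v in zip(weights, bin_values)) / total


def promoter_activity(replicates: List[Replicate], min_reads: int = 10) -> Optional[float]:
    """Average the per-replicate estimates, keeping a sequence only if every
    replicate has at least `min_reads` reads in total."""
    if not replicates:
        return None
    estimates = []
    for counts, totals, cells, values in replicates:
        if sum(counts) < min_reads:
            return None  # fails the read-count filter in this replicate
        est = bin_weighted_activity(counts, totals, cells, values)
        if est is None:
            return None
        estimates.append(est)
    return sum(estimates) / len(estimates)


# Hypothetical 4-bin sort with equal cell fractions and equal sequencing depth
bins = [-1.0, 0.0, 1.0, 2.0]
rep1 = ([0, 10, 10, 0], [100] * 4, [0.25] * 4, bins)
rep2 = ([0, 0, 12, 12], [100] * 4, [0.25] * 4, bins)
rep_low = ([0, 5, 4, 0], [100] * 4, [0.25] * 4, bins)
print(promoter_activity([rep1, rep2]))     # passes the filter
print(promoter_activity([rep1, rep_low]))  # None: second replicate has 9 reads
```

Normalizing by bin read totals and cell fractions corrects for bins being sequenced to different depths than their share of sorted cells would imply.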

Neural networks trained on $P_{\mathrm{GPD}}$ and $P_{\mathrm{ZEV}}$ data accurately predict promoter activity. Only sequences for which at least 10 NextSeq reads were counted in each replicate were used in analyses of $P_{\mathrm{GPD}}$ data; only sequences for which at least 20 NextSeq reads were counted in each replicate were used in analyses of $P_{\mathrm{ZEV}}$ data. A: Model loss curve for $P_{\mathrm{GPD}}$ training; dashed line indicates the epoch selected by early stopping for the final model. B: Predicted promoter activities versus FACS-seq measurements for held-out test data in the $P_{\mathrm{GPD}}$ dataset. C: Model loss curve for $P_{\mathrm{ZEV}}$ training; dashed line indicates the epoch selected by early stopping for the final model. D: Predicted promoter activities in the uninduced condition versus FACS-seq measurements for held-out test data in the $P_{\mathrm{ZEV}}$ dataset. E: Predicted promoter activities in the induced condition versus FACS-seq measurements for held-out test data in the $P_{\mathrm{ZEV}}$ dataset. F: Predicted activation ratios (ratio of predicted induced and uninduced promoter activities) versus FACS-seq-derived activation ratios for held-out test data in the $P_{\mathrm{ZEV}}$ dataset.

Techniques Used: Activity Assay, FACS, Activation Assay, Derivative Assay
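Panels A and C mark the epoch chosen by early stopping, and panel F uses activation ratios of induced to uninduced activity. The sketch below shows patience-based early stopping on a validation-loss curve and, assuming activities are on the log10 scale used in the FACS-seq legend, converts two log10 activities into a linear activation ratio. The patience value and the loss values are hypothetical; the article's training settings are not given here.

```python
from typing import List


def early_stopping_epoch(val_losses: List[float], patience: int = 5) -> int:
    """Return the 0-based epoch with the lowest validation loss, stopping the scan
    once `patience` consecutive epochs pass without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # training would have been halted here
    return best_epoch


def activation_ratio(log10_induced: float, log10_uninduced: float) -> float:
    """Induced / uninduced activity, given both activities as log10 values."""
    return 10 ** (log10_induced - log10_uninduced)


losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.5, 0.9]
print(early_stopping_epoch(losses, patience=2))  # stops before the later minimum
print(early_stopping_epoch(losses, patience=3))  # waits long enough to reach it
print(activation_ratio(1.5, 0.5))
```

The two calls illustrate the trade-off in choosing patience: a small value halts sooner but can miss a later, lower validation loss.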

25) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
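The coverage panels listed in this article's supporting-information legends report, per platform, the median coverage of the genes in the intersecting target region and the number of samples with 0x coverage. The sketch below computes both summaries from a per-sample, per-gene depth table. The input layout, the gene names, and the rule that a gene missing from a sample counts as 0x are assumptions for illustration.

```python
from statistics import median
from typing import Dict, Tuple


def coverage_summary(depth: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[float, int]]:
    """depth[sample][gene] = mean read depth of a gene in one sample.
    Returns gene -> (median depth across samples, number of samples at 0x).
    A gene missing from a sample's table is treated as 0x (assumption)."""
    genes = set()
    for per_gene in depth.values():
        genes.update(per_gene)
    summary = {}
    for gene in sorted(genes):
        values = [depth[s].get(gene, 0.0) for s in depth]
        summary[gene] = (median(values), sum(1 for v in values if v == 0))
    return summary


# Hypothetical depth table for three samples from one platform
example = {
    "s1": {"GENE_A": 100, "GENE_B": 0},
    "s2": {"GENE_A": 50, "GENE_B": 10},
    "s3": {"GENE_A": 0},
}
for gene, (med, zero_samples) in coverage_summary(example).items():
    print(f"{gene}: median={med} samples_at_0x={zero_samples}")
```

Running the same summary separately on the 454, Ion Torrent and Illumina tables gives the per-platform series that the legends plot in black, red and green.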

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis—second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green), considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green), considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
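The coverage panels listed in the legend above report, per platform, the median coverage of the genes in the intersecting target region and the number of samples with 0x coverage. A minimal Python sketch of how such per-gene summaries could be computed, assuming the mean coverage of each gene in each sample has already been extracted; the input layout, the gene names and `coverage_summary` are hypothetical and not taken from the paper's supporting files:

```python
from statistics import median


def coverage_summary(coverage):
    """Summarize per-gene coverage across samples.

    coverage: {gene: [mean coverage of that gene in each sample]}
    (hypothetical input layout). Returns {gene: (median coverage,
    number of samples with 0x coverage)}.
    """
    summary = {}
    for gene, values in coverage.items():
        zero_samples = sum(1 for v in values if v == 0)
        summary[gene] = (median(values), zero_samples)
    return summary


# Hypothetical toy data: three samples sequenced on one platform
example = {
    "GENE_A": [120.0, 98.5, 0.0],
    "GENE_B": [45.0, 60.0, 52.0],
}
print(coverage_summary(example))
```

Applying the function once per platform would give the three sets of values that the legend color-codes as 454, Ion Torrent and Illumina.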

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{Indel}\_454}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{Illumina}}$. Development of the AIC in the case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green), considering the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green), considering the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
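The legend fragment above compares the estimated probability $\hat{p}_i$ with the actual probability $p_i$ that an SNV is a true positive. When $p_i$ is not known directly, one common way to approximate it (an assumption here, not necessarily the authors' approach) is the observed fraction of validated true positives among calls with similar $\hat{p}_i$. A minimal Python sketch of such a calibration table with equal-width bins; `calibration_table` and the toy values are hypothetical:

```python
def calibration_table(p_hat, is_true, n_bins=5):
    """Compare estimated probabilities with observed true-positive rates.

    p_hat: estimated probability that each call is a true positive.
    is_true: 1 if the call was validated as a true positive, else 0.
    Calls are grouped into equal-width probability bins (an assumption);
    empty bins are skipped. Each row is
    (bin_low, bin_high, mean p_hat, observed fraction true, n_calls).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(p_hat, is_true):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((idx / n_bins, (idx + 1) / n_bins, mean_p, observed, len(members)))
    return rows


# Hypothetical toy calls
for row in calibration_table([0.1, 0.15, 0.9, 0.95], [0, 0, 1, 1], n_bins=2):
    print(row)
```

For a well-calibrated model, the mean $\hat{p}_i$ in each row should lie close to the observed fraction of true positives.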

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
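The legend above mentions R scripts that determine the best GLM separating true from false positive calls using forward selection based on AIC. The Python sketch below illustrates that general technique (a binomial GLM with logit link fitted by Newton-Raphson, plus greedy forward selection that adds the feature lowering AIC the most and stops when none does). It is not the authors' R code; the feature names, `forward_select` and the toy data are hypothetical, and the fit assumes the classes are not perfectly separable:

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=25):
    """Fit a binomial GLM (logit link) by Newton-Raphson; return the log-likelihood.

    rows: design-matrix rows whose first column is 1 (intercept).
    Assumes no perfect separation, so the maximum-likelihood estimate exists.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    loglik = 0.0
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        loglik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return loglik


def forward_select(features, y):
    """Greedy forward selection by AIC, starting from the intercept-only model.

    features: {name: [value per call]}; y: 1 = true positive, 0 = false positive.
    Returns (selected names in order of entry, AIC of the final model).
    """
    n = len(y)

    def aic(names):
        rows = [[1.0] + [features[name][i] for name in names] for i in range(n)]
        return 2 * (len(names) + 1) - 2 * fit_logistic(rows, y)

    selected, best_aic = [], aic([])
    while True:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        scores = {f: aic(selected + [f]) for f in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best_aic:
            break  # no remaining feature lowers the AIC
        selected.append(candidate)
        best_aic = scores[candidate]
    return selected, best_aic


# Hypothetical toy data: "qual" is informative, "noise" is not
y = [1] * 8 + [0] * 8
features = {
    "noise": [1, -1, 1, -1, 1, -1, 1, -1] * 2,
    "qual": [2, 1, 0.5, 3, -0.5, 1.5, 2.5, 0, -2, -1, 0.5, -3, 1, -1.5, -0.5, -2.5],
}
selected, final_aic = forward_select(features, y)
print(selected, round(final_aic, 2))
```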

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
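The legend above refers to the number of models containing each parameter and to its relative variable importance (RVI). Under the common Akaike-weight definition (assumed here; the normalization used in the paper is not reproduced), a parameter's RVI is the sum of the Akaike weights $w_m = \exp(-\Delta_m/2) / \sum_k \exp(-\Delta_k/2)$ of all models that contain it, where $\Delta_m$ is the AIC difference between model $m$ and the best model. A minimal Python sketch with hypothetical parameter names and made-up AIC values:

```python
import math


def relative_variable_importance(models):
    """Count models per parameter and sum their Akaike weights.

    models: list of (set of parameter names, AIC) for a candidate model set.
    Returns {parameter: (number of models containing it, RVI)}, where RVI is
    the sum of the Akaike weights of the models that contain the parameter.
    """
    best = min(aic for _, aic in models)
    raw = [math.exp(-(aic - best) / 2) for _, aic in models]
    total = sum(raw)
    result = {}
    for (params, _), r in zip(models, raw):
        weight = r / total  # Akaike weight of this model
        for p in params:
            count, rvi = result.get(p, (0, 0.0))
            result[p] = (count + 1, rvi + weight)
    return result


# Hypothetical candidate models with made-up AIC values
models = [({"qual", "depth"}, 100.0), ({"qual"}, 102.0), ({"depth"}, 110.0)]
for name, (count, rvi) in sorted(relative_variable_importance(models).items()):
    print(name, count, round(rvi, 3))
```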

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
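The legend above lists alignment statistics for data aligned with BWA mem and TMAP. For real BAM/SAM files a tool such as `samtools flagstat` reports such numbers; the minimal Python sketch below only illustrates how a mapped fraction can be derived from the SAM FLAG field (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The records and `alignment_stats` are hypothetical, and for paired-end data each mate is counted as a separate record:

```python
def alignment_stats(sam_lines):
    """Count primary alignment records and how many of them are mapped.

    Uses SAM FLAG bits: 0x4 = segment unmapped, 0x100 = secondary alignment,
    0x800 = supplementary alignment. Header lines (starting with '@') are skipped.
    Returns (primary records, mapped primary records, mapped fraction).
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # count each read once, via its primary record
        total += 1
        if not flag & 0x4:
            mapped += 1
    return total, mapped, (mapped / total if total else 0.0)


# Hypothetical SAM records (only the QNAME and FLAG columns matter here)
records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\t*",
    "r3\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\t*",
    "r1\t2048\tchr3\t10\t60\t4M\t*\t0\t0\tACGT\t*",
]
print(alignment_stats(records))
```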

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{Indel}\_454}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
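The legend above mentions base pairs in the target region, base pairs in exons within it, and an intersecting target region shared by 454, Ion Torrent and Illumina. A minimal Python sketch of these interval operations on a single chromosome, assuming BED-style half-open [start, end) coordinates; the coordinates and helper names are hypothetical:

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def intersect(a, b):
    """Intersect two merged, sorted interval lists with a two-pointer sweep."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total_bp(intervals):
    """Total number of base pairs covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical targets on a single chromosome, BED-style coordinates
targets = {
    "454": [(100, 200), (150, 300)],
    "IonTorrent": [(120, 250)],
    "Illumina": [(0, 180), (200, 260)],
}
exons = [(150, 210)]
common = merge(targets["454"])
for name in ("IonTorrent", "Illumina"):
    common = intersect(common, merge(targets[name]))
print(common, total_bp(common), total_bp(intersect(common, merge(exons))))
```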

26) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{Indel}\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
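The legend above lists a supporting section on the generalized linear model and a threshold. How the threshold was chosen is not stated in this legend; as one common option (an assumption, not the paper's documented criterion), the sketch below picks the cut-off on the estimated probability $\hat{p}_i$ that maximizes Youden's J (sensitivity + specificity - 1). `best_threshold` and the toy data are hypothetical:

```python
def best_threshold(p_hat, is_true):
    """Return (threshold, Youden's J) over the observed p_hat values.

    Calls with p_hat >= threshold are classified as true positives.
    Requires at least one true and one false positive in is_true.
    """
    pos = sum(is_true)
    neg = len(is_true) - pos
    best = (None, -1.0)
    for t in sorted(set(p_hat)):
        tp = sum(1 for p, y in zip(p_hat, is_true) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(p_hat, is_true) if p < t and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best[1]:  # keep the lowest threshold among ties
            best = (t, j)
    return best


# Hypothetical estimated probabilities and validation labels
print(best_threshold([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1]))
```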

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_\mathrm{SNV}\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
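The legend lists R scripts that determine the best GLM separating true-positive from false-positive calls by forward selection based on AIC. Those scripts are not reproduced here. The Python sketch below only illustrates the general technique, assuming a logistic (binomial) GLM fitted by Newton-Raphson with NumPy and a greedy search that, at each step, adds the parameter that lowers AIC the most and stops when no addition helps. The parameter names in the usage example (allele_fraction, noise) and the simulated labels are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-8):
    """Fit a binomial GLM with logit link by Newton-Raphson.

    X: (n, p) design matrix including an intercept column; y: 0/1 labels.
    Returns the coefficient vector and the maximized log-likelihood.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        w = mu * (1.0 - mu)                     # IRLS weights
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * w[:, None])           # Fisher information
        step = np.linalg.solve(hess + 1e-9 * np.eye(p), grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu)))
    return beta, loglik


def aic(X, y):
    """AIC = 2k - 2 log L, with k = number of coefficients."""
    _, loglik = fit_logistic(X, y)
    return 2 * X.shape[1] - 2 * loglik


def forward_select(features, y):
    """Greedy forward selection by AIC.

    features: dict mapping a parameter name to a length-n array.
    Returns the selected names (in order of inclusion) and the final AIC.
    """
    n = len(y)
    design = np.ones((n, 1))  # start from the intercept-only model
    best_aic = aic(design, y)
    remaining = dict(features)
    selected = []
    while remaining:
        # AIC of every model that adds exactly one remaining parameter
        scores = {name: aic(np.column_stack([design, col]), y)
                  for name, col in remaining.items()}
        name = min(scores, key=scores.get)
        if scores[name] >= best_aic:  # no candidate improves AIC: stop
            break
        best_aic = scores[name]
        design = np.column_stack([design, remaining.pop(name)])
        selected.append(name)
    return selected, best_aic


if __name__ == "__main__":
    # Simulated data: only the hypothetical allele_fraction parameter
    # influences whether a call is a true positive.
    rng = np.random.default_rng(0)
    n = 500
    allele_fraction = rng.normal(size=n)
    noise = rng.normal(size=n)
    p_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * allele_fraction)))
    y = (rng.random(n) < p_true).astype(float)
    print(forward_select({"allele_fraction": allele_fraction, "noise": noise}, y))
```

Greedy forward selection evaluates far fewer models than an exhaustive search, at the cost of possibly missing the AIC-optimal subset when parameters only help in combination.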

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis (second approach). TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC in the case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
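The coverage panels described above report, per gene in the intersecting target region, a median coverage and the number of samples with 0x coverage. A minimal sketch of such a summary follows. It assumes per-base depths are already available for each sample and gene in a hypothetical nested-dict layout, that a gene's coverage in one sample is its mean per-base depth, and that "0x" means every base of the gene has depth 0; the study's exact definitions may differ.

```python
from statistics import mean, median


def coverage_summary(depths):
    """Summarize per-gene coverage across samples.

    depths: {sample: {gene: [per-base depth, ...]}} (hypothetical layout).
    Returns {gene: (median over samples of the gene's mean depth,
                    number of samples in which every base has depth 0)}.
    """
    genes = sorted({gene for per_gene in depths.values() for gene in per_gene})
    summary = {}
    for gene in genes:
        per_sample = [per_gene[gene] for per_gene in depths.values() if gene in per_gene]
        mean_depths = [mean(d) for d in per_sample]
        zero_samples = sum(1 for d in per_sample if max(d) == 0)
        summary[gene] = (median(mean_depths), zero_samples)
    return summary


if __name__ == "__main__":
    # Toy depths for two hypothetical genes in three samples
    toy = {
        "s1": {"geneA": [10, 20, 30], "geneB": [0, 0]},
        "s2": {"geneA": [0, 0, 0], "geneB": [5, 5]},
        "s3": {"geneA": [4, 4, 4], "geneB": [0, 0]},
    }
    print(coverage_summary(toy))
```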

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis (second approach). TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC in the case of an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
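The last item in the legend above compares an estimated probability $\hat{p}_i$ with the actual probability $p_i$ of an SNV being a true positive. One common way to make such a comparison, shown here as an assumption rather than the authors' procedure, is to bin calls by $\hat{p}_i$ and use the fraction of validated true positives in each bin as an estimate of the actual probability. The function name and the bin count are illustrative.

```python
def calibration_table(p_hat, is_true, n_bins=10):
    """Compare estimated probabilities with observed true-positive rates.

    p_hat: estimated probabilities in [0, 1]; is_true: 1 for a validated
    true positive, 0 for a false positive. Returns one row per non-empty bin:
    (bin index, number of calls, mean p_hat, observed fraction true).
    """
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(p_hat, is_true):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[k].append((p, t))
    table = []
    for k, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(t for _, t in members) / len(members)
        table.append((k, len(members), mean_pred, observed))
    return table


if __name__ == "__main__":
    # Toy calls: two low-probability false positives, two high-probability true positives
    p_hat = [0.1, 0.15, 0.9, 0.95]
    is_true = [0, 0, 1, 1]
    for row in calibration_table(p_hat, is_true, n_bins=2):
        print(row)
```

For a well-calibrated model, the mean $\hat{p}_i$ and the observed fraction in each row should be close.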

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis (second approach). TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
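The legends report a normalized relative variable importance (RVI) per parameter, but the snippet does not define it. In the model-averaging literature, the RVI of a parameter is commonly computed as the sum of the Akaike weights of all candidate models that contain it. The sketch below assumes that definition and additionally assumes normalization by dividing by the largest RVI, which may differ from the normalization used in the study. The candidate models and AIC values in the usage example are hypothetical.

```python
import math


def relative_variable_importance(models):
    """models: list of (set of parameter names, AIC) for the candidate GLMs.

    Returns, per parameter: number of models containing it, RVI (sum of the
    Akaike weights of those models) and RVI divided by the largest RVI.
    """
    best = min(a for _, a in models)
    # Akaike weights: exp(-delta_AIC / 2), normalized to sum to 1
    raw = [math.exp(-0.5 * (a - best)) for _, a in models]
    total = sum(raw)
    weights = [r / total for r in raw]
    params = sorted(set().union(*(s for s, _ in models)))
    counts = {p: sum(1 for s, _ in models if p in s) for p in params}
    rvi = {p: sum(w for (s, _), w in zip(models, weights) if p in s) for p in params}
    top = max(rvi.values())
    normalized = {p: v / top for p, v in rvi.items()}
    return counts, rvi, normalized


if __name__ == "__main__":
    # Hypothetical candidate models with their AIC values
    models = [({"a"}, 10.0), ({"a", "b"}, 10.0), ({"b"}, 1000.0)]
    print(relative_variable_importance(models))
```

Subtracting the best AIC before exponentiating keeps the weights numerically stable when AIC values are large.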

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis (second approach). TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis (second approach). TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis (second approach). TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

27) Product Images from "fastp: an ultra-fast all-in-one FASTQ preprocessor"

Article Title: fastp: an ultra-fast all-in-one FASTQ preprocessor

Journal: Bioinformatics

doi: 10.1093/bioinformatics/bty560

The base content ratio curves generated by fastp for one Illumina NextSeq FASTQ file: (a) before fastp preprocessing and (b) after fastp preprocessing. As depicted in (a), the G curve is abnormal and the G and C curves are separated. In (b), the G/C separation problem is eliminated.

Techniques Used: Generated
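A base content ratio curve plots, for each read position, the fraction of reads carrying A, T, C, G or N. The sketch below computes these curves for a list of reads and also includes a simple poly-G tail trimmer. The trimmer is there because NextSeq's two-color chemistry reports a dark (no-signal) cycle as G, which is a likely source of G-rich read ends behind an abnormal G curve. fastp offers poly-G trimming for such data, but its implementation (which, for example, tolerates mismatches) is not reproduced here. The function names and the min_len threshold are illustrative assumptions.

```python
def base_content_curves(reads, bases="ATCGN"):
    """Fraction of reads carrying each base at every position (0-based).

    Positions beyond a read's end are ignored, so shorter reads do not
    dilute the later positions.
    """
    max_len = max(len(r) for r in reads)
    curves = {b: [0.0] * max_len for b in bases}
    for pos in range(max_len):
        column = [r[pos] for r in reads if len(r) > pos]
        for b in bases:
            curves[b][pos] = column.count(b) / len(column)
    return curves


def trim_poly_g(read, min_len=10):
    """Remove a trailing run of G's if it is at least min_len bases long."""
    tail = len(read) - len(read.rstrip("G"))
    return read[: len(read) - tail] if tail >= min_len else read


if __name__ == "__main__":
    # Toy reads ending in G runs of different lengths
    reads = ["ACGTGGGGGG", "ACGTAGGGGG", "ACGTACGGGG"]
    before = base_content_curves(reads)
    after = base_content_curves([trim_poly_g(r, min_len=5) for r in reads])
    print("G before:", [round(x, 2) for x in before["G"]])
    print("G after: ", [round(x, 2) for x in after["G"]])
```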

28) Product Images from "Deep genome annotation of the opportunistic human pathogen Streptococcus pneumoniae D39"

Article Title: Deep genome annotation of the opportunistic human pathogen Streptococcus pneumoniae D39

Journal: bioRxiv

doi: 10.1101/283663

Detection of small RNA features. A. Size distributions of the entire sequencing library (left) and of fragments ending in the terminator region of pepC (right), as determined from paired-end sequencing. The negative control for sRNA detection, pepC, is much longer (1.3 kbp) than the typical fragment length in Illumina sequencing (100-350 bp). The plot is based on all sequenced molecules whose 3’ end falls inside the downstream terminator region. The position of the 5’ end of each of these molecules is determined by random fragmentation during library preparation; therefore, their size distribution is expected to be comparable to that of the entire library (left plot and faint grey in right plot). B-D. Top of each panel: RNA-seq coverage plots, calculated from paired-end sequenced fragments in the unprocessed control library (see Supplementary Methods). Bottom of each panel: size distributions (bin sizes 10 and 1) of fragments ending in the indicated terminator regions. B. Detection of ssrS (left) and joint ssrS/tRNA-Lys1 (right) transcripts. Because of the high abundance of ssrS, coverage is shown on a log scale in the right panel. Size distributions reveal 5’-processed and unprocessed ssrS (left) and full-length ssrS/tRNA-Lys1 transcripts (right). C. Detection of an sRNA antisense to the 3’ region of mutR1. D. Detection of a T-box RNA switch structure upstream of pheS.

Techniques Used: Sequencing, Negative Control, RNA Sequencing Assay
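The size-distribution approach in panel A can be sketched as follows, assuming each sequenced fragment has already been reconstructed from a read pair as a hypothetical (leftmost, rightmost, strand) tuple with 1-based inclusive coordinates. A fragment's 3’ end is its rightmost base on the plus strand and its leftmost base on the minus strand; fragments whose 3’ end falls in the terminator region are binned by length. A discrete small RNA should produce a sharp peak, whereas a long transcript such as pepC should yield a distribution resembling that of the whole library.

```python
from collections import Counter


def size_histogram(fragments, region_start, region_end, strand, bin_size=10):
    """Histogram of lengths of fragments whose 3' end lies in a region.

    fragments: iterable of (leftmost, rightmost, strand) with 1-based,
    inclusive genome coordinates (hypothetical input format).
    Returns {bin start: count}, sorted by bin.
    """
    counts = Counter()
    for left, right, s in fragments:
        if s != strand:
            continue
        three_prime = right if s == "+" else left  # 3' end depends on strand
        if region_start <= three_prime <= region_end:
            size = right - left + 1
            counts[(size // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy fragments; the terminator region is assumed to span 195-205 (+ strand)
    fragments = [(100, 199, "+"), (150, 199, "+"), (120, 205, "+"),
                 (100, 300, "-"), (10, 50, "+")]
    print(size_histogram(fragments, 195, 205, "+"))
    print(size_histogram(fragments, 195, 205, "+", bin_size=1))
```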

Data analysis pipeline used for genome assembly and annotation. Left. DNA level: the genome sequence of D39V was determined by SMRT sequencing, supported by previously published Illumina data (10, 25). Automated annotation by the RAST (13) and PGAP (4) annotation pipelines was followed by curation based on information from the literature and a variety of databases and bioinformatic tools. Right. RNA level: Cappable-seq (7) was used to identify transcription start sites. In parallel, putative transcript ends were identified by combining reverse reads from paired-end, stranded sequencing of the control sample (i.e. not 5’-enriched). Terminators were annotated where such putative transcript ends overlapped with stem loops predicted by TransTermHP (22). Finally, local fragment size enrichment in the paired-end sequencing data was used to identify putative small RNA features. α D39 derivative (bgaA::PssbB-luc; GEO accessions GSE54199 and GSE69729). β The first 1 kbp of the genome file was duplicated at its end to allow mapping across the start/end boundary of the FASTA sequence. γ Analysis was performed only with sequencing pairs that map uniquely to the genome.
Figure Legend Snippet: Data analysis pipeline used for genome assembly and annotation. Left, DNA level: the genome sequence of D39V was determined by SMRT sequencing, supported by previously published Illumina data (10, 25). Automated annotation by the RAST (13) and PGAP (4) annotation pipelines was followed by curation based on information from the literature and a variety of databases and bioinformatic tools. Right, RNA level: Cappable-seq (7) was used to identify transcription start sites. In parallel, putative transcript ends were identified by combining reverse reads from paired-end, stranded sequencing of the control sample (i.e. not 5’-enriched). Terminators were annotated where such putative transcript ends overlapped with stem loops predicted by TransTermHP (22). Finally, local enrichment of fragment sizes in the paired-end sequencing data was used to identify putative small RNA features. α D39 derivative (bgaA::PssbB-luc; GEO accessions GSE54199 and GSE69729). β The first 1 kbp of the genome file was duplicated at the end to allow mapping across the FASTA boundary. γ Analysis was performed only with sequencing pairs that map uniquely to the genome.

Techniques Used: Sequencing, RAST Test
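Footnote β describes a common workaround for circular bacterial chromosomes: reads spanning the origin cannot align to a linear FASTA unless the start of the sequence is repeated at the end. A minimal Python sketch, assuming the genome is a single string and positions are 1-based; the helper names are hypothetical, and hits in the padded tail would be folded back onto the original coordinates with the modulo step shown here.

```python
def pad_circular_genome(seq: str, pad: int = 1000) -> str:
    """Append a copy of the first `pad` bases so reads spanning the origin can align."""
    if not 0 <= pad <= len(seq):
        raise ValueError("pad must be between 0 and the genome length")
    return seq + seq[:pad]


def to_genome_coordinate(pos: int, genome_length: int) -> int:
    """Fold a 1-based position on the padded sequence back onto the original genome."""
    return (pos - 1) % genome_length + 1


def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """Format a single FASTA record with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

With the default `pad=1000`, this mirrors the 1 kbp duplication in the footnote; reads longer than the pad would still fail to span the boundary.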

29) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
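The supporting R scripts choose the GLM that separates true from false positive calls by forward selection on AIC. The sketch below illustrates that idea in Python with NumPy rather than R; it assumes a logistic (binomial, logit-link) GLM fitted by Newton-Raphson, greedy one-predictor-at-a-time addition starting from an intercept-only model, and synthetic data. None of these details is taken from the paper's scripts.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Fit a binomial GLM with logit link by Newton-Raphson (equivalent to IRLS).

    X must already contain an intercept column. Returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)                       # binomial variance = IRLS weights
        grad = X.T @ (y - p)                    # score vector
        hess = X.T @ (X * w[:, None])           # Fisher information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # Bernoulli log-likelihood written stably: sum(y*eta - log(1 + e^eta))
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik


def forward_select_aic(X, y, names):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    n, n_features = X.shape
    design = np.ones((n, 1))                    # intercept-only starting model
    _, ll = fit_logistic(design, y)
    best_aic = aic(ll, 1)
    selected, remaining = [], list(range(n_features))
    while remaining:
        trials = []
        for j in remaining:
            candidate = np.column_stack([design, X[:, j]])
            _, ll_j = fit_logistic(candidate, y)
            trials.append((aic(ll_j, candidate.shape[1]), j))
        cand_aic, j_best = min(trials)
        if cand_aic >= best_aic:                # no remaining predictor improves AIC
            break
        best_aic = cand_aic
        selected.append(j_best)
        remaining.remove(j_best)
        design = np.column_stack([design, X[:, j_best]])
    return [names[j] for j in selected], best_aic


# Synthetic example: only x0 drives the probability of a call being a true positive.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
p_demo = 1.0 / (1.0 + np.exp(-(2.0 * X_demo[:, 0] - 0.5)))
y_demo = (rng.random(500) < p_demo).astype(float)
chosen, best = forward_select_aic(X_demo, y_demo, ["x0", "x1", "x2"])
```

Greedy selection is cheap but can miss combinations of predictors that only help jointly; that trade-off is inherent to forward selection, not specific to this sketch.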

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
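Relative variable importance is commonly computed from Akaike weights: each candidate model $m$ gets $w_m \propto \exp(-\Delta_m/2)$ with $\Delta_m = \mathrm{AIC}_m - \min_k \mathrm{AIC}_k$, and a parameter's RVI is the summed weight of the models that contain it. Assuming the scripts follow this standard multimodel-inference definition (the normalization they apply is not described here, so it is left out), a Python sketch that also returns the number of models containing each parameter:

```python
import math


def akaike_weights(aics):
    """Normalized Akaike weights w_m = exp(-delta_m / 2) / sum_k exp(-delta_k / 2)."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def relative_variable_importance(models, aics):
    """models: list of sets of parameter names, aligned with `aics`.

    Returns {parameter: (number of models containing it, summed Akaike weight)}.
    """
    weights = akaike_weights(aics)
    result = {}
    for param in sorted(set().union(*models)):
        n_models = sum(1 for m in models if param in m)
        rvi = sum(w for m, w in zip(models, weights) if param in m)
        result[param] = (n_models, rvi)
    return result
```

Because the weights sum to 1, an RVI close to 1 means the parameter appears in nearly all well-supported models.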

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
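The coverage panels summarize per-gene read depth across samples (median coverage, and the number of samples with 0x coverage). A minimal Python sketch, assuming per-base depths are already available for each sample and gene, and treating a gene as having 0x coverage in a sample only when every one of its bases has depth 0; both the input layout and that definition are assumptions, not the paper's.

```python
from statistics import median


def gene_coverage_summary(samples):
    """samples: list of dicts mapping gene -> list of per-base read depths for one sample.

    Returns gene -> (median over samples of the per-sample median depth,
                     number of samples in which every base of the gene has depth 0).
    """
    genes = sorted(set().union(*(s.keys() for s in samples)))
    summary = {}
    for gene in genes:
        depths = [s[gene] for s in samples if gene in s]
        per_sample_median = [median(d) for d in depths]
        zero_samples = sum(1 for d in depths if all(x == 0 for x in d))
        summary[gene] = (median(per_sample_median), zero_samples)
    return summary
```

Restricting `samples` to genes in the intersecting target region would give one bar per gene and platform, as in the panels described above.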

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
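The final item compares the predicted probability $\hat{p}_i$ with the actual probability $p_i$ that an SNV is a true positive. One common way to make such a comparison is a binned calibration table. The Python sketch below assumes binary truth labels from validation (1 = true positive) and uses equal-width bins and a hypothetical 0.5 classification threshold; neither choice comes from the paper.

```python
def calibration_table(pred, truth, n_bins=10):
    """Bin predicted probabilities and compare each bin's mean prediction with the
    observed fraction of true positives (truth: 1 = true positive, 0 = false positive).

    Returns rows of (bin_low, bin_high, n_calls, mean_predicted, observed_fraction).
    """
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(pred, truth):
        k = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes into the last bin
        bins[k].append((p, t))
    rows = []
    for k, members in enumerate(bins):
        if not members:
            continue                            # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(t for _, t in members) / len(members)
        rows.append((k / n_bins, (k + 1) / n_bins, len(members), mean_pred, observed))
    return rows


def classify(pred, threshold=0.5):
    """Call a variant a true positive when its predicted probability reaches the threshold."""
    return [p >= threshold for p in pred]
```

For a well-calibrated GLM, `mean_predicted` and `observed_fraction` should be close in every populated bin.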

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Figure Legend Snippet: Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, bp in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA mem. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA mem. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
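The supporting-information list includes R scripts that choose "the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC". The Python sketch below illustrates that idea only. It is not the authors' R code. It fits a binomial GLM with a logit link by Newton-Raphson. It then greedily adds whichever candidate parameter lowers AIC the most and stops when no addition helps. The logit link, the stopping rule and the feature names in the usage note are assumptions made for illustration.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=50):
    """Binomial GLM with logit link, fitted by Newton-Raphson; X rows start with 1.0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu)
        hess = [[0.0] * p for _ in range(p)]  # X^T W X with W = mu (1 - mu)
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def aic(X, y, beta):
    """AIC = 2k - 2 log L for the fitted logistic model."""
    ll = 0.0
    for row, yi in zip(X, y):
        mu = sigmoid(sum(b * x for b, x in zip(beta, row)))
        mu = min(max(mu, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return 2 * len(beta) - 2 * ll


def forward_select(features, y):
    """Greedily add the feature that lowers AIC most; stop when no addition helps."""
    def score(names):
        X = [[1.0] + [features[n][i] for n in names] for i in range(len(y))]
        return aic(X, y, fit_logistic(X, y))

    selected, best = [], score([])  # start from the intercept-only model
    while True:
        candidates = {n: score(selected + [n]) for n in features if n not in selected}
        if not candidates:
            break
        name = min(candidates, key=candidates.get)
        if candidates[name] >= best:
            break
        selected.append(name)
        best = candidates[name]
    return selected, best
```

Usage: `forward_select({"qual": [...], "noise": [...]}, labels)`, where the hypothetical `labels` are 1 for validated true-positive calls and 0 otherwise. The call returns the chosen parameter names and the AIC of the final model.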

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
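For each parameter, the legends report the number of models that contain it and its normalized relative variable importance (RVI). The legend does not define RVI, so we assume a common definition here. RVI is the sum of the Akaike weights $w_m = \exp(-\Delta_m/2) / \sum_k \exp(-\Delta_k/2)$ over all models $m$ that contain the parameter, where $\Delta_m$ is the AIC difference between model $m$ and the best model. The sketch computes the count and this unnormalized RVI. The paper's normalization step is not reproduced.

```python
import math


def akaike_weights(aics):
    """Akaike weights: exp(-delta/2), normalized to sum to 1 across candidate models."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def relative_variable_importance(models, aics):
    """models: list of sets of parameter names, one set per fitted model.

    Returns {parameter: (number of models containing it, summed Akaike weight)}.
    """
    weights = akaike_weights(aics)
    result = {}
    for p in sorted(set().union(*models)):
        containing = [w for m, w in zip(models, weights) if p in m]
        result[p] = (len(containing), sum(containing))
    return result
```

Because the weights sum to 1, a parameter that appears in every well-supported model gets an RVI close to 1. A parameter that appears only in poorly fitting models gets an RVI close to 0.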

30) Product Images from "Applicability of Neighborhood and Building Scale Wastewater-Based Genomic Epidemiology to Track the SARS-CoV-2 Pandemic and other Pathogens"

Article Title: Applicability of Neighborhood and Building Scale Wastewater-Based Genomic Epidemiology to Track the SARS-CoV-2 Pandemic and other Pathogens

Journal: medRxiv

doi: 10.1101/2021.02.18.21251939

Processing flowchart for total RNA-Seq data. RNA-Seq data were obtained from runs on an Illumina NextSeq instrument. Files were demultiplexed, quality checked and trimmed. Sequences were assembled into contigs using SPAdes and then compared to known reference sequences in the RefSeq database using BLASTN or DIAMOND (BLASTX mode) to identify pathogens in the wastewater samples.

Techniques Used: RNA Sequencing Assay
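The flowchart's first step demultiplexes the NextSeq output by sample index. The sketch below illustrates the idea under assumed conditions. Each read arrives as a hypothetical (index sequence, read sequence) pair, and up to one mismatch to a sample index is tolerated. Reads that match no sample, or more than one, go to an "undetermined" bin. In practice, Illumina's conversion software performs this step. The legend does not say which tool or mismatch tolerance was used.

```python
def hamming(a, b):
    """Mismatch count between two index sequences (length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Assign (index, sequence) pairs to samples; unmatched or ambiguous reads are undetermined."""
    bins = {name: [] for name in sample_indexes}
    bins["undetermined"] = []
    for index_seq, seq in reads:
        hits = [name for name, idx in sample_indexes.items()
                if hamming(index_seq, idx) <= max_mismatches]
        target = hits[0] if len(hits) == 1 else "undetermined"
        bins[target].append(seq)
    return bins
```

Routing ambiguous reads to "undetermined", rather than guessing a sample, keeps index hopping or sequencing errors from leaking reads between samples.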

31) Product Images from "fastp: an ultra-fast all-in-one FASTQ preprocessor"

Article Title: fastp: an ultra-fast all-in-one FASTQ preprocessor

Journal: Bioinformatics

doi: 10.1093/bioinformatics/bty560

The base content ratio curves generated by fastp for one Illumina NextSeq FASTQ file (a) before and (b) after fastp preprocessing. As depicted in (a), the G curve is abnormal and the G and C curves are separated. In (b), the G/C separation is eliminated.

Techniques Used: Generated
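The legend does not say what causes the abnormal G curve. One well-known source in NextSeq data is the two-color chemistry. A cycle with no signal is called as G, so reads can end in artificial poly-G tails, and fastp offers poly-G tail trimming for this reason. The sketch below computes per-cycle base content curves like those in the figure. It also applies a deliberately simplified poly-G trimmer that removes only an exact trailing G run of at least `min_len` bases, with no mismatches allowed. The threshold of 10 is an assumption, and fastp's own implementation differs.

```python
def base_content_curves(reads):
    """Per-cycle fraction of A, T, C and G across reads (other symbols count in the denominator)."""
    length = max(len(r) for r in reads)
    curves = {b: [0.0] * length for b in "ATCG"}
    for pos in range(length):
        column = [r[pos] for r in reads if len(r) > pos]
        for b in "ATCG":
            curves[b][pos] = column.count(b) / len(column)
    return curves


def trim_poly_g(read, min_len=10):
    """Remove a trailing run of G of at least min_len bases; shorter runs are kept."""
    stripped = read.rstrip("G")
    return stripped if len(read) - len(stripped) >= min_len else read
```

A real trimmer has to tolerate sequencing errors inside the tail. This version also strips a genuine G that immediately precedes the artificial run, which is acceptable for illustration but not for production use.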

32) Product Images from "A comprehensive approach for genome-wide efficiency profiling of DNA modifying enzymes"

Article Title: A comprehensive approach for genome-wide efficiency profiling of DNA modifying enzymes

Journal: Cell Reports Methods

doi: 10.1016/j.crmeth.2022.100187

GwEEP pipeline overview. (A) Laboratory pipeline: (1) genomic DNA is digested by endonucleases, followed by (2) A-tailing catalyzed by Klenow fragment (exo−). (3) A-tailed DNA molecules are ligated to sequencing adapters and hairpin linkers, and (4) hairpin-ligated molecules are subsequently enriched. (5) Half of the library is used for bisulfite (BS) treatment, the other half for oxidative bisulfite (oxBS) treatment. (6) After amplification and indexing by PCR, the libraries are sequenced on an Illumina platform in paired-end mode with reads of at least 100 bp (created with BioRender.com). (B) Computational processing: Illumina raw data are converted into base calls (FASTQ) and trimmed of adapter and hairpin linker sequences. Bisulfite reads from the same molecule are paired to restore the genomic sequence for efficient mapping. Subsequently, the double-strand information is annotated and stored in DSI (double-strand information) files. The hidden Markov model (HMM) then derives 5mC and 5hmC distributions, as well as the efficiencies of Dnmts and Tets, which are stored in the IGV file format. Both DSI and IGV files can be visualized using the IGV genome browser (created with BioRender.com).

Techniques Used: Sequencing, Ligation, Amplification, Polymerase Chain Reaction
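Step (B) pairs the bisulfite reads from the two strands of one hairpin-linked molecule to restore the genomic sequence. Below is a minimal sketch of that idea under simplifying assumptions that are ours, not GwEEP's. We assume the top-strand and bottom-strand reads are already identified, have equal length and contain no indels. The bottom read is reverse-complemented so that both reads cover the same positions. A T/C pair then marks a converted top-strand C, and a G/A pair marks a converted bottom-strand C. In the call string, `M` flags a top-strand C that resisted conversion (5mC or 5hmC in BS data), `u` a converted, unmodified C, and `.` a non-C position.

```python
# (top-strand base, reverse-complemented bottom-strand base) -> genomic base
PAIR_TO_GENOMIC = {
    ("A", "A"): "A",
    ("T", "T"): "T",
    ("C", "C"): "C",  # top-strand C protected from conversion
    ("T", "C"): "C",  # top-strand C converted to T
    ("G", "G"): "G",  # bottom-strand C protected from conversion
    ("G", "A"): "G",  # bottom-strand C converted to T, read as A after reverse complement
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def restore_genomic(top_read, bottom_read):
    """Return (genomic sequence, top-strand C status) for one hairpin read pair."""
    bottom = revcomp(bottom_read)
    genomic, calls = [], []
    for t, b in zip(top_read, bottom):
        base = PAIR_TO_GENOMIC.get((t, b), "N")  # inconsistent pairs become N
        genomic.append(base)
        calls.append(("M" if t == "C" else "u") if base == "C" else ".")
    return "".join(genomic), "".join(calls)
```

Comparing the `M` calls between the BS and oxBS halves of the library would then separate 5mC from 5hmC. That comparison, and the HMM-based efficiency estimation, are beyond this sketch.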

33) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
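Several panels summarize coverage over the target region shared by the three platforms. They report the median coverage per gene and the number of samples with 0x coverage. Below is a minimal sketch of both steps. It assumes each platform's target region is given as sorted, non-overlapping, half-open intervals on one chromosome. It also assumes per-sample depths for each gene have already been computed. The gene names in the test data are placeholders.

```python
from statistics import median


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def coverage_summary(coverage):
    """coverage: {gene: [depth per sample]} -> {gene: (median depth, samples at 0x)}."""
    return {gene: (median(depths), sum(1 for d in depths if d == 0))
            for gene, depths in coverage.items()}
```

For three platforms, `functools.reduce(intersect, [targets_454, targets_ion, targets_illumina])` yields the intersecting region. The three variable names are hypothetical.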

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ of an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region and in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

34) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC for an alternative, RVI-based parameter selection method. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_Indel\_454}$, $\hat{\eta}_{i\_Indel\_IonT}$, $\hat{\eta}_{i\_Indel\_Illumina}$. Development of the AIC for an alternative, RVI-based parameter selection method. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ for an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, base pairs in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i\_SNV\_454}$, $\hat{\eta}_{i\_SNV\_IonT}$, $\hat{\eta}_{i\_SNV\_Illumina}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
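The legend above lists R scripts that choose the best GLM for separating true from false positive calls by forward selection based on AIC. The following is a minimal Python sketch of greedy forward selection, assuming the model fit is supplied as a callable that returns the AIC for a given parameter set. `forward_select_aic`, `toy_aic` and the parameter names `p1` to `p4` are hypothetical illustrations, not the authors' scripts, and the toy log-likelihood gains are made up for demonstration only.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select_aic(
    candidates: Sequence[str],
    fit_aic: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float, List[Tuple[str, float]]]:
    """Greedy forward selection by AIC.

    Starts from the intercept-only model, adds the single parameter that lowers
    the AIC most, and stops as soon as no addition lowers it further.
    """
    selected: FrozenSet[str] = frozenset()
    best_aic = fit_aic(selected)  # AIC of the intercept-only model
    path: List[Tuple[str, float]] = []
    remaining = list(candidates)
    while remaining:
        # Fit every model that extends the current one by exactly one parameter.
        trials = [(fit_aic(selected | {c}), c) for c in remaining]
        trial_aic, trial_param = min(trials)
        if trial_aic >= best_aic:
            break  # no single addition improves the AIC
        selected = selected | {trial_param}
        best_aic = trial_aic
        path.append((trial_param, trial_aic))
        remaining.remove(trial_param)
    return selected, best_aic, path


# Hypothetical stand-in for fitting a GLM: each parameter adds a fixed gain to
# the log-likelihood, and AIC = 2k - 2 log L with k = parameters + intercept.
TOY_GAINS = {"p1": 30.0, "p2": 12.0, "p3": 1.0, "p4": 0.5}


def toy_aic(params: FrozenSet[str]) -> float:
    log_lik = -100.0 + sum(TOY_GAINS[p] for p in params)
    k = len(params) + 1
    return 2 * k - 2 * log_lik


selected, aic, path = forward_select_aic(["p1", "p2", "p3", "p4"], toy_aic)
print(sorted(selected), aic)  # ['p1', 'p2'] 122.0
```

With these toy values the selection adds `p1` and then `p2` and stops, because adding `p3` leaves the AIC at 122 instead of lowering it. In practice `fit_aic` would wrap an actual binomial GLM fit.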

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
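The legend above also refers to the number of models containing each parameter and to its relative variable importance (RVI). A minimal sketch follows, assuming the usual multimodel-inference definition, in which the RVI of a parameter is the sum of the Akaike weights of all candidate models that contain it. The normalization step mentioned in the legend is not specified there and is not reproduced; the model sets and AIC values are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Tuple


def akaike_weights(aics: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Akaike weight of each model: exp(-delta/2), normalized to sum to 1."""
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2.0) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def relative_variable_importance(
    aics: Dict[FrozenSet[str], float],
) -> Dict[str, Tuple[int, float]]:
    """Per parameter: (number of models containing it, summed Akaike weight)."""
    weights = akaike_weights(aics)
    params = sorted(set().union(*aics))
    result: Dict[str, Tuple[int, float]] = {}
    for p in params:
        containing = [m for m in aics if p in m]
        result[p] = (len(containing), sum(weights[m] for m in containing))
    return result


# Hypothetical candidate models (parameter sets) and their AIC values.
example_aics = {
    frozenset(): 10.0,
    frozenset({"p1"}): 4.0,
    frozenset({"p2"}): 10.0,
    frozenset({"p1", "p2"}): 4.0,
}
for name, (n_models, rvi) in relative_variable_importance(example_aics).items():
    print(name, n_models, round(rvi, 3))
```

Here both parameters appear in two models, but `p1` sits in the low-AIC models and therefore receives most of the weight.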

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$). Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{Indel}\_454}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{IonT}}$).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
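The legend above includes a section titled "Generalized linear model and threshold". For a binary outcome such as true versus false positive call, a binomial GLM with a logit link is the usual choice. The sketch below assumes that link and shows how a fitted model turns a call's parameter values into a predicted probability and a thresholded decision. The coefficients, the feature names `x1`/`x2` and the 0.5 threshold are hypothetical placeholders, not values from the study.

```python
import math
from typing import Dict


def predicted_probability(
    intercept: float, coefs: Dict[str, float], features: Dict[str, float]
) -> float:
    """Inverse logit of the linear predictor eta = b0 + sum_j b_j * x_j."""
    eta = intercept + sum(b * features[name] for name, b in coefs.items())
    # Numerically stable logistic function (avoids overflow for large |eta|).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def is_true_positive(
    intercept: float,
    coefs: Dict[str, float],
    features: Dict[str, float],
    threshold: float = 0.5,
) -> bool:
    """Call a variant a true positive when its predicted probability reaches the threshold."""
    return predicted_probability(intercept, coefs, features) >= threshold


# Hypothetical fitted coefficients and one hypothetical call's parameter values.
coefs = {"x1": 1.0, "x2": -0.5}
call = {"x1": 3.0, "x2": 2.0}
print(predicted_probability(-2.0, coefs, call), is_true_positive(-2.0, coefs, call))
```

Where the threshold is placed trades sensitivity against specificity; the sketch leaves it as a parameter.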

35) Product Images from "Feasibility of neighborhood and building scale wastewater-based genomic epidemiology for pathogen surveillance"

Article Title: Feasibility of neighborhood and building scale wastewater-based genomic epidemiology for pathogen surveillance

Journal: The Science of the Total Environment

doi: 10.1016/j.scitotenv.2021.147829

Processing flowchart for Total RNA-Seq data. RNA-Seq data were obtained from runs on an Illumina NextSeq instrument. Files were demultiplexed, quality-checked, and trimmed. Sequences were assembled into contigs using SPAdes, and the contigs were then compared with known reference sequences in the RefSeq database using BlastN or DIAMOND BlastX to identify pathogens in the wastewater samples.

Techniques Used: RNA Sequencing Assay
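The legend above does not name the trimming tool. As an illustration only, the sketch below implements BWA-style 3' quality trimming, the algorithm cutadapt also describes for its quality-trimming option. Phred+33 encoding and the cutoff of 20 are assumptions, and `quality_trim_index` and `trim_read` are hypothetical helper names.

```python
from typing import Tuple


def quality_trim_index(qualities: str, cutoff: int = 20, base: int = 33) -> int:
    """Return the index at which to cut the 3' end (keep read[:index]).

    BWA-style algorithm: scanning from the 3' end, add (cutoff - q) to a
    running sum, stop when the sum becomes negative, and cut where it peaked.
    """
    running = 0
    peak = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        q = ord(qualities[i]) - base  # Phred score encoded as an ASCII character
        running += cutoff - q
        if running < 0:
            break
        if running > peak:
            peak = running
            cut = i
    return cut


def trim_read(seq: str, qualities: str, cutoff: int = 20) -> Tuple[str, str]:
    """Trim low-quality bases from the 3' end of one read."""
    end = quality_trim_index(qualities, cutoff)
    return seq[:end], qualities[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(trim_read("ACGTACGT", "IIIII###"))  # ('ACGTA', 'IIIII')
```

Unlike cutting at the first base below the cutoff, the running sum tolerates an isolated low-quality base inside an otherwise good 3' stretch.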

36) Product Images from "Sex-specific transcriptomic and epitranscriptomic signatures of PTSD-like fear acquisition"

Article Title: Sex-specific transcriptomic and epitranscriptomic signatures of PTSD-like fear acquisition

Journal: iScience

doi: 10.1016/j.isci.2022.104861

Sex-specific transcriptional profiles are associated with the acquisition of PTSD-like fear memories in the amygdala. (A) Adult male and female mice underwent cued-fear conditioning (or the control condition), after which they were tested for cued-fear expression 24 h later; another group was sacrificed within 15 min of fear conditioning to obtain the amygdala. (B) RNA from the amygdala of adult male and female mice from both experimental groups (control or cued-fear) was then pooled (3–4 individuals per pool; 12 individuals per condition) for indirect (Illumina) and direct (Oxford Nanopore) sequencing. (C) Face validity of the cued-fear conditioning paradigm was verified in male and female mice, which showed strong expression of conditioned fear responses. ANOVA, ∗∗∗p

Techniques Used: Mouse Assay, Expressing, Nanopore Sequencing

37) Product Images from "GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data"

Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

Journal: PLoS ONE

doi: 10.1371/journal.pone.0171983

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$). Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{Indel}\_454}$).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{SNV}\_454}$).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$). Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{Indel}\_454}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{Illumina}}$). Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the re-sequencing data set for 454 (black), Ion Torrent (red) and Illumina (green). Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the test data set for 454 (black), Ion Torrent (red) and Illumina (green). Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
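The legend above reports the median coverage of the genes in the intersecting target region and the number of samples with 0x coverage. A minimal sketch of both summaries follows, assuming per-sample, per-gene coverage values are already available (for example, exported from a depth-of-coverage tool). The dictionary layout, sample IDs and gene names are hypothetical, and a gene missing from a sample's record is treated as 0x.

```python
from statistics import median
from typing import Dict, Tuple


def summarize_gene_coverage(
    coverage: Dict[str, Dict[str, float]],
) -> Dict[str, Tuple[float, int]]:
    """coverage[sample][gene] -> per-gene summary across samples.

    Returns gene -> (median coverage, number of samples with 0x coverage).
    A gene absent from a sample's record is counted as 0x for that sample.
    """
    genes = sorted({g for per_sample in coverage.values() for g in per_sample})
    summary: Dict[str, Tuple[float, int]] = {}
    for gene in genes:
        values = [per_sample.get(gene, 0.0) for per_sample in coverage.values()]
        summary[gene] = (median(values), sum(1 for v in values if v == 0))
    return summary


# Hypothetical coverage values for three samples on one platform.
example = {
    "sample1": {"geneA": 120.0, "geneB": 0.0},
    "sample2": {"geneA": 80.0, "geneB": 35.0},
    "sample3": {"geneA": 100.0},
}
print(summarize_gene_coverage(example))
```

Running the same summary separately for the 454, Ion Torrent and Illumina data would give one value per gene and platform, matching the per-platform comparison described in the legend.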

Supporting information: Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their Ensembl transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{SNV}\_454}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{SNV}\_\mathrm{Illumina}}$). Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i\_\mathrm{Indel}\_454}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{IonT}}$, $\hat{\eta}_{i\_\mathrm{Indel}\_\mathrm{Illumina}}$). Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region in the re-sequencing data set for 454 (black), Ion Torrent (red) and Illumina (green). Number of samples in the re-sequencing data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region in the test data set for 454 (black), Ion Torrent (red) and Illumina (green). Number of samples in the test data set with 0x coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ of an SNV being a true positive.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection
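The supporting files above mention relative variable importance (RVI) computed across GLMs ranked by AIC. As a rough illustration only, the sketch below assumes the common Burnham-Anderson definition, in which a parameter's RVI is the sum of the Akaike weights of every candidate model containing it; the original R scripts may compute or normalize RVI differently. The model list and the GATK-style parameter names (QD, FS, MQ) are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# A candidate model is represented as (set of parameter names, AIC value).
Model = Tuple[FrozenSet[str], float]


def akaike_weights(aics: List[float]) -> List[float]:
    """Akaike weights w_m = exp(-delta_m / 2) / sum_r exp(-delta_r / 2), delta_m = AIC_m - min AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def rvi_and_counts(models: List[Model]) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Return (RVI per parameter, number of models containing each parameter)."""
    weights = akaike_weights([aic for _, aic in models])
    rvi: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for (params, _), w in zip(models, weights):
        for p in params:
            rvi[p] = rvi.get(p, 0.0) + w
            counts[p] = counts.get(p, 0) + 1
    return rvi, counts


if __name__ == "__main__":
    # Hypothetical candidate models for separating true- from false-positive SNV calls.
    models: List[Model] = [
        (frozenset({"QD", "FS"}), 100.0),
        (frozenset({"QD"}), 102.0),
        (frozenset({"FS", "MQ"}), 110.0),
    ]
    rvi, counts = rvi_and_counts(models)
    for name in sorted(rvi):
        print(f"{name}: RVI={rvi[name]:.3f}, models={counts[name]}")
```

Because the weights sum to 1, an RVI close to 1 means the parameter appears in nearly all of the well-supported models.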

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$

Techniques Used: Sequencing, Variant Assay, Filtration, Selection

38) Product Images from "Methylation status of genes escaping from X-chromosome inactivation in patients with X-chromosome rearrangements"

Article Title: Methylation status of genes escaping from X-chromosome inactivation in patients with X-chromosome rearrangements

Journal: Clinical Epigenetics

doi: 10.1186/s13148-021-01121-6

Methylation status at the promoter regions of 32 escape genes in the four patients, examined by array-based methylation analysis using the Illumina Infinium Human Methylation EPIC BeadChip kit. Heat map of β values in the four patients: probes with β values below 0.25 are shown in gray, and those with higher methylation (β above 0.25) are shown in pink. The names of the escape genes are shown to the right of the heat map; the three genes with abnormal methylation in Patient 1 are labeled in red. The number of probes in each gene is represented by that gene's height in the heat map (the heights corresponding to one and two probes are exemplified at the lower right of the figure).

Techniques Used: Methylation
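The heat-map coloring in the methylation legend amounts to thresholding each probe's β value at 0.25. A minimal sketch follows; it assumes the widely used Infinium definition β = M / (M + U + 100), where M and U are the methylated and unmethylated signal intensities, and the probe intensities in the example are invented. The legend does not say how a β of exactly 0.25 is colored, so treating it as gray is an arbitrary choice here.

```python
from typing import Dict, Tuple

BETA_OFFSET = 100.0  # offset commonly added to the denominator for Infinium arrays (assumption)
THRESHOLD = 0.25     # cut-off between low (gray) and high (pink) methylation in the legend


def beta_value(methylated: float, unmethylated: float, offset: float = BETA_OFFSET) -> float:
    """beta = M / (M + U + offset); negative intensities are clamped to zero first."""
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


def heatmap_color(beta: float, threshold: float = THRESHOLD) -> str:
    """Pink for beta above the threshold, gray otherwise (beta == threshold counts as gray here)."""
    return "pink" if beta > threshold else "gray"


def color_probes(probes: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map each probe ID to its heat-map color from (M, U) intensities."""
    return {pid: heatmap_color(beta_value(m, u)) for pid, (m, u) in probes.items()}


if __name__ == "__main__":
    # Hypothetical (methylated, unmethylated) intensities for two probes.
    example = {"probe_1": (300.0, 2700.0), "probe_2": (2000.0, 1000.0)}
    print(color_probes(example))
```

The offset keeps β stable for probes with very low total signal, at the cost of pulling such β values toward zero.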

39) Product Images from "Genome-wide analysis in Escherichia coli unravels a high level of genetic homoplasy associated with cefotaxime resistance"

Article Title: Genome-wide analysis in Escherichia coli unravels a high level of genetic homoplasy associated with cefotaxime resistance

Journal: Microbial Genomics

doi: 10.1099/mgen.0.000556

Schematic of the workflow used to perform the homoplasy-based association analysis. Starting from the top: (a) de novo assembly of the NextSeq/MiSeq reads and (b) hybrid assembly of the reference chromosome ampC_069. On the left, (c) alignment of the promoter/attenuator region. In the middle, (d) the coreSNP analysis for the phylogeny used in (e) the homoplasy analysis, combined with (f) the fullSNP data on the right, which were also used for (g) the statistics (Fisher's exact test and FDR) relating cefotaxime (CTX) resistance to SNP positions. (h) Inference of recombination events using Gubbins.

Techniques Used:
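The association step in the E. coli workflow, Fisher's exact test with an FDR correction, can be sketched for each SNP position as a 2 × 2 table of SNP presence against CTX resistance. The sketch below is illustrative only: it computes the two-sided p-value directly from the hypergeometric distribution and assumes the Benjamini-Hochberg procedure as the FDR method, which the legend does not specify. The table layout and counts are hypothetical.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical tables: rows = SNP present/absent, columns = CTX-resistant/susceptible.
    tables = [(5, 0, 0, 5), (3, 1, 1, 3), (2, 2, 2, 2)]
    pvals = [fisher_exact_two_sided(*t) for t in tables]
    print(pvals, benjamini_hochberg(pvals))
```

In a genome-wide scan one table is built per SNP position, so the FDR correction is what keeps the many simultaneous tests from producing a flood of spurious associations.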

40) Product Images from "Assessment of Viral Targeted Sequence Capture Using Nanopore Sequencing Directly from Clinical Samples"

Article Title: Assessment of Viral Targeted Sequence Capture Using Nanopore Sequencing Directly from Clinical Samples

Journal: Viruses

doi: 10.3390/v12121358

Sample composition with and without targeted sequence capture (TSC) on the ONT MinION and with TSC on the Illumina NextSeq. Abbreviations: M, MinION; MV, MinION with ViroCap; NV, NextSeq with ViroCap.

Techniques Used: ViroCap

Similar Products

    90
    Illumina Inc nextseq 500 instrument
    TGIRT-seq of ribodepleted fragmented UHR RNA with ERCC spike-ins using the NTT and NTC adapters. TGIRT-seq libraries were prepared in triplicate for each adapter and sequenced on an Illumina NextSeq 500 to obtain 58-105 million 75-nt paired-end reads, which were mapped to a human reference genome (Ensembl GRCh38) modified to include additional rRNA repeats (Materials and Methods and Supplemental Table S1). The data were used to generate stacked bar graphs showing the percentages of: (A) read pairs that mapped concordantly to the annotated orientation of different categories of genomic features; (B) small ncRNA reads that mapped to different classes of small ncRNAs; (C) protein-coding gene reads that mapped to the sense or antisense strand; (D) bases in protein-coding gene reads that mapped to coding sequences (CDS), introns, 5’- and 3’-untranslated regions (UTRs), and intergenic regions. Dataset names are indicated below the graphs. (E) Sequence biases at the 5’- and 3’-ends of RNA fragments in combined technical replicates of datasets obtained by TGIRT-seq of fragmented UHR RNAs with either the NTC or NTT adapters. Mapped reads from fragmented human reference RNAs using NTC (datasets NTC-F1 to -F3) or NTT (datasets NTT-F1 to -F3) adapters were combined to calculate the nucleotide frequencies at 14 positions at the 5’- and 3’-ends of the RNA fragments (positions +1 to +14 at the 5’ end of read 1 and −1 to −14 at the 5’ end of read 2, respectively).
    Nextseq 500 Instrument, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/nextseq 500 instrument/product/Illumina Inc
    Average 90 stars, based on 1 article review
    Price from $9.99 to $1999.99
    nextseq 500 instrument - by Bioz Stars, 2022-09
    90/100 stars

    88
    Illumina Inc illumina nextseq
    FACS-seq experimental strategy and dataset overview. A: Schematic of tested libraries (above), indicating regions held constant in promoter design (grey boxes); schematic of the two-color reporter device used to characterize promoter activity (below). “RAP1”, “GCR1”, “ZEV”: transcription factor binding sites; “TATA”: TATA box motif; “TSS”: transcription start site motif. B: Schematic of the FACS-seq approach for high-throughput promoter activity characterization, in which next-generation sequencing (NGS)-derived histograms of sequence counts across FACS bins, generated by sorting a library on promoter activity, are used to derive the promoter activity of each sequence in the library. C: Histogram of promoter activities (log10 ratio of mean GFP to mCherry intensity, in arbitrary units) in the final $P_{\mathrm{GPD}}$ library. Only sequences with at least 10 NextSeq reads in each replicate were used in this analysis. D: Density scatter plot of induced and uninduced promoter activities measured in the final $P_{\mathrm{ZEV}}$ library. Only sequences with at least 20 NextSeq reads in each replicate were used in this analysis.
    Illumina Nextseq, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 88/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/illumina nextseq/product/Illumina Inc
    Average 88 stars, based on 1 article review
    Price from $9.99 to $1999.99
    illumina nextseq - by Bioz Stars, 2022-09
    88/100 stars

    99
    Illumina Inc nextseq sequencing data
    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true from false positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true from false positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data. $\hat{\eta}_{i,\mathrm{Indel},454}$
    Nextseq Sequencing Data, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 99/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/nextseq sequencing data/product/Illumina Inc
    Average 99 stars, based on 1 article review
    Price from $9.99 to $1999.99
    nextseq sequencing data - by Bioz Stars, 2022-09
    99/100 stars

    99
    Illumina Inc nextseq 500 platform
    Schematic circular representation of the complete genome sequence of KPC‐2‐producing A. hydrophila GSH8‐2. Short-read paired‐end whole‐genome sequencing was performed on an Illumina NextSeq 500 platform with a 300‐cycle NextSeq 500 Reagent Kit v2 (2 × 150‐mer). The complete genome sequences of the strains were determined using a PacBio Sequel sequencer for long‐read sequencing [Sequel SMRT Cell 1M v2 (4/tray); Sequel sequencing kit v2.1; insert size, approximately 10 kb]. De novo assembly was performed using Canu version 1.4 (Koren et al., 2017), minimap version 0.2‐r124 (Li, 2016), racon version 1.1.0 (Vaser et al., 2017) and circlator version 1.5.3 (Hunt et al., 2015). Error correction of tentative complete circular sequences was performed using Pilon version 1.18 with Illumina short reads (Walker et al., 2014). Annotation was performed in Prokka version 1.11 (Seemann, 2014), InterPro v49.0 (Finn et al., 2017) and NCBI‐BLASTP/BLASTX. Circular representations of complete genomic sequences were visualized using the GView server (Petkau et al., 2010). AMR genes were identified by homology searching against the ResFinder database (Zankari et al., 2012). The class 1 integron was assigned in the INTEGRALL database (http://integrall.bio.ua.pt/) (Moura et al., 2009). Visualization of comparative plasmid ORF organization was performed using Easyfig (Sullivan et al., 2011). For representation of chromosomal DNA, from the inside: slot 1, GC skew; slot 2, GC content; slot 3, ORFs; slot 4, rRNA/tRNA; slots 5–7, BLASTatlas conserved gene analysis indicating three related strains (see also Supporting Information Fig. S1); slot 8, prophage; slot 9, notable ARGs or ARG‐related genes (transposase, ARGs and reductases). In representations of circular plasmids, notable ORFs are highlighted in the indicated colours.
    Nextseq 500 Platform, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 99/100, based on 1 PubMed citation. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/nextseq 500 platform/product/Illumina Inc
    Average 99 stars, based on 1 article review
    Price from $9.99 to $1999.99
    nextseq 500 platform - by Bioz Stars, 2022-09
    99/100 stars

    Image Search Results


    TGIRT-seq of ribodepleted, fragmented UHR RNA with ERCC spike-ins using the NTT and NTC adapters. TGIRT-seq libraries were prepared in triplicate for each adapter and sequenced on an Illumina NextSeq 500 to obtain 58–105 million 75-nt paired-end reads. The reads were mapped to a human reference genome (Ensembl GRCh38) modified to include additional rRNA repeats (Materials and Methods and Supplemental Table S1). The data were used to generate stacked bar graphs showing the percentages of: (A) read pairs that mapped concordantly to the annotated orientation of different categories of genomic features; (B) small ncRNA reads that mapped to different classes of small ncRNAs; (C) protein-coding gene reads that mapped to the sense or antisense strand; (D) bases in protein-coding gene reads that mapped to coding sequences (CDS), introns, 5’- and 3’-untranslated regions (UTRs), and intergenic regions. The name of each dataset is indicated below. (E) Sequence biases at the 5’- and 3’-ends of RNA fragments in the combined technical replicates of datasets obtained by TGIRT-seq of fragmented UHR RNA with either the NTC or NTT adapters. Mapped reads from fragmented human reference RNA obtained with the NTC (datasets NTC-F1 to -F3) or NTT (datasets NTT-F1 to -F3) adapters were combined to calculate nucleotide frequencies at 14 positions at the 5’- and 3’-ends of the RNA fragments (positions +1 to +14 from the 5’ end of read 1 and −1 to −14 from the 5’ end of read 2, respectively).

    Journal: bioRxiv

    Article Title: Improved TGIRT-seq methods for comprehensive transcriptome profiling with decreased adapter dimer formation and bias correction

    doi: 10.1101/474031

    Article Snippet: The PCR products were cleaned up by using Agencourt AMPure XP beads (1.4X volume; Beckman Coulter), and sequenced on an Illumina NextSeq 500 instrument to obtain 2 x 75-nt paired-end reads.

    Techniques: Modification, Sequencing
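
    Panel E reports nucleotide frequencies at 14 positions from the read ends. The sketch below computes that kind of positional base profile. It is an illustration, not the authors' pipeline. It assumes reads are supplied as plain strings. Handling read orientation is left to the caller and is also an assumption. For example, one option is to reverse-complement the first 14 nt of read 2 (hypothetical helper `revcomp`) to express the 3’-end positions −14 to −1 in RNA sense.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Complement table for DNA reads (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA read."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def positional_frequencies(seqs: Iterable[str], n: int = 14) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T at each of the first n positions across seqs.

    Reads shorter than n only contribute to the positions they cover;
    non-ACGT characters (e.g. N) are excluded from the denominator.
    """
    counts = [Counter() for _ in range(n)]
    for s in seqs:
        for i, base in enumerate(s.upper()[:n]):
            counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return freqs
```

    A 5’-end profile would then be `positional_frequencies(read1_seqs)`, where `read1_seqs` is a hypothetical list of read-1 sequences.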

    FACS-seq experimental strategy and dataset overview. A: Schematic of the tested libraries (above), indicating regions held constant in promoter design (grey boxes), and of the two-color reporter device used to characterize promoter activity (below). “RAP1”, “GCR1”, “ZEV”: transcription factor binding sites; “TATA”: TATA box motif; “TSS”: transcription start site motif. B: Schematic of the FACS-seq approach for high-throughput characterization of promoter activity. A library is sorted into FACS bins by promoter activity, and next-generation sequencing (NGS)-derived histograms of each sequence's counts across the bins are used to derive its promoter activity. C: Histogram of promoter activities (log10 ratio of mean GFP to mCherry intensity, in arbitrary units) in the final P_GPD library. Only sequences with at least 10 NextSeq reads in each replicate were included in this analysis. D: Density scatter plot of induced versus uninduced promoter activities measured in the final P_ZEV library. Only sequences with at least 20 NextSeq reads in each replicate were included in this analysis.

    Journal: bioRxiv

    Article Title: Model-driven generation of artificial yeast promoters

    doi: 10.1101/748616

    Article Snippet: In some experiments, the sample was additionally sequenced on an Illumina NextSeq by the Biohub, using 1×75 unpaired reads.

    Techniques: FACS, Activity Assay, Binding Assay, High Throughput Screening Assay, Next-Generation Sequencing, Derivative Assay, Sequencing, Generated
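
    Panel B describes turning per-bin sequence counts into a single promoter activity, and panels C and D apply minimum read-count filters. The sketch below illustrates both steps. It is one simple estimator, not necessarily the authors' procedure. The estimator first converts each bin's read share into an estimated cell count, using the number of cells sorted into that bin. It then averages the bins' activity values (for example, the mean log10 GFP/mCherry ratio per bin), weighted by those estimated counts. All function and parameter names are hypothetical.

```python
from typing import Sequence


def passes_read_filter(replicate_counts: Sequence[Sequence[int]], min_reads: int) -> bool:
    """True if the sequence has at least min_reads reads (summed over bins)
    in every replicate, e.g. min_reads=10 for P_GPD or 20 for P_ZEV."""
    return all(sum(rep) >= min_reads for rep in replicate_counts)


def estimate_activity(seq_counts: Sequence[int],
                      bin_total_reads: Sequence[int],
                      bin_cells: Sequence[float],
                      bin_centers: Sequence[float]) -> float:
    """Weighted mean of per-bin activity values for one sequence.

    seq_counts[b]      reads of this sequence in bin b
    bin_total_reads[b] all reads sequenced from bin b
    bin_cells[b]       cells sorted into bin b
    bin_centers[b]     activity value assigned to bin b (hypothetical)
    """
    # Estimated cells of this sequence in each bin: its share of the bin's
    # reads times the number of cells sorted into that bin.
    weights = [
        (n / total) * cells if total > 0 else 0.0
        for n, total, cells in zip(seq_counts, bin_total_reads, bin_cells)
    ]
    w_sum = sum(weights)
    if w_sum == 0:
        raise ValueError("sequence has no reads in any bin")
    return sum(w * x for w, x in zip(weights, bin_centers)) / w_sum
```

    Rescaling by reads per bin matters because bins are rarely sequenced to depths proportional to the number of cells they received.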

    Neural networks trained on P_GPD and P_ZEV data accurately predict promoter activity. Only sequences with at least 10 NextSeq reads in each replicate were used in analyses of P_GPD data; only sequences with at least 20 NextSeq reads in each replicate were used in analyses of P_ZEV data. A: Model loss curve for P_GPD training; the dashed line indicates the epoch selected by early stopping for the final model. B: Predicted promoter activities versus FACS-seq measurements for held-out test data in the P_GPD dataset. C: Model loss curve for P_ZEV training; the dashed line indicates the epoch selected by early stopping for the final model. D: Predicted promoter activities in the uninduced condition versus FACS-seq measurements for held-out test data in the P_ZEV dataset. E: Predicted promoter activities in the induced condition versus FACS-seq measurements for held-out test data in the P_ZEV dataset. F: Predicted activation ratios (ratio of predicted induced to uninduced promoter activity) versus FACS-seq-derived activation ratios for held-out test data in the P_ZEV dataset.

    Journal: bioRxiv

    Article Title: Model-driven generation of artificial yeast promoters

    doi: 10.1101/748616

    Article Snippet: In some experiments, the sample was additionally sequenced on an Illumina NextSeq by the Biohub, using 1×75 unpaired reads.

    Techniques: Activity Assay, FACS, Activation Assay, Derivative Assay
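
    Panels A and C mark an epoch chosen by early stopping, and panel F uses an induced/uninduced activation ratio. The sketch below illustrates both ideas under stated assumptions. The patience-based stopping rule is a common variant, but the authors' exact rule is not given here. The conversion of activities back to a linear ratio assumes they are log10 values, as in the FACS-seq histograms.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int) -> int:
    """Return the 0-based epoch with the lowest validation loss seen before
    `patience` consecutive epochs fail to improve on the best so far."""
    if not val_losses:
        raise ValueError("empty loss curve")
    best_epoch, best_loss, waited = 0, val_losses[0], 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < best_loss:
            best_epoch, best_loss, waited = epoch, val_losses[epoch], 0
        else:
            waited += 1
            if waited >= patience:
                break  # training would stop here; later epochs are never seen
    return best_epoch


def activation_ratio(log10_induced: float, log10_uninduced: float) -> float:
    """Induced/uninduced activity ratio, assuming activities are log10 values."""
    return 10 ** (log10_induced - log10_uninduced)
```

    With `patience=3`, a late improvement that arrives after three non-improving epochs is ignored. That is the trade-off early stopping makes to limit overfitting.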

    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM for separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM for separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$

    Journal: PLoS ONE

    Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

    doi: 10.1371/journal.pone.0171983

    Article Snippet: For the 454 and Illumina NextSeq sequencing data, the alignment algorithm was invoked using BWA-MEM.

    Techniques: Sequencing, Variant Assay, Filtration, Selection
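
    The supporting files include R scripts that choose the best GLM by forward selection based on AIC (AIC = 2k − 2 ln L). The sketch below shows the greedy search itself in Python. It is not a reproduction of the authors' R scripts. Model fitting is abstracted behind a hypothetical callable `aic_of`, which should fit the GLM on a given parameter set and return its AIC. In R, `stats::step()` with `direction = "forward"` performs a comparable search.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(candidates: Iterable[str],
                   aic_of: Callable[[FrozenSet[str]], float]) -> Tuple[List[str], float]:
    """Greedy forward selection by AIC.

    Starts from the intercept-only model (empty parameter set), repeatedly adds
    the candidate whose inclusion gives the lowest AIC, and stops as soon as no
    addition lowers AIC further. Ties are broken alphabetically by name.
    """
    remaining = set(candidates)
    selected: List[str] = []
    best_aic = aic_of(frozenset())
    while remaining:
        scored = [(aic_of(frozenset(selected) | {c}), c) for c in sorted(remaining)]
        aic, choice = min(scored)
        if aic >= best_aic:
            break  # no remaining parameter improves the model
        selected.append(choice)
        remaining.remove(choice)
        best_aic = aic
    return selected, best_aic
```

    Greedy forward selection is cheap but path-dependent: it can miss parameter combinations that only help jointly.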

    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM for separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM for separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$

    Journal: PLoS ONE

    Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

    doi: 10.1371/journal.pone.0171983

    Article Snippet: For the 454 and Illumina NextSeq sequencing data, the alignment algorithm was invoked using BWA-MEM.

    Techniques: Sequencing, Variant Assay, Filtration, Selection
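
    These supporting files report, for each parameter, the number of models containing it and a normalized relative variable importance (RVI). A widely used definition of RVI sums the Akaike weights of all models that include the parameter. Each model's Akaike weight is w_m = exp(−Δ_m/2) / Σ_k exp(−Δ_k/2), where Δ_m = AIC_m − min AIC. The sketch below assumes that definition. The authors' normalization step is not described here, so the sketch returns only the raw counts and summed weights.

```python
import math
from typing import Dict, FrozenSet, Sequence, Tuple


def relative_variable_importance(
        models: Sequence[Tuple[FrozenSet[str], float]]) -> Dict[str, Tuple[int, float]]:
    """Map each parameter to (number of models containing it, summed Akaike weight).

    models: (parameter set, AIC) pairs for a candidate set of fitted models.
    """
    best = min(aic for _, aic in models)
    # Unnormalized Akaike weights exp(-delta/2), with delta relative to the best AIC.
    raw = [math.exp(-(aic - best) / 2) for _, aic in models]
    total = sum(raw)
    out: Dict[str, Tuple[int, float]] = {}
    for (params, _), r in zip(models, raw):
        w = r / total
        for p in params:
            n, s = out.get(p, (0, 0.0))
            out[p] = (n + 1, s + w)
    return out
```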

    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM for separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM for separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0× coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0× coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0× coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green).

    Journal: PLoS ONE

    Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

    doi: 10.1371/journal.pone.0171983

    Article Snippet: For the 454 and Illumina NextSeq sequencing data, the alignment algorithm was invoked using BWA-MEM.

    Techniques: Sequencing, Variant Assay, Filtration, Selection
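
    Several of these supporting figures summarize coverage per gene as a median across samples, together with the number of samples at 0× coverage. The sketch below computes both summaries. The input layout is a hypothetical assumption: a per-gene mapping of sample name to mean depth. Every gene is assumed to have at least one sample.

```python
from statistics import median
from typing import Dict, Tuple


def coverage_summary(cov: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[float, int]]:
    """Summarize coverage per gene across samples.

    cov[gene][sample] is the mean depth of that gene in that sample
    (hypothetical layout). Returns gene -> (median depth across samples,
    number of samples with zero coverage).
    """
    out: Dict[str, Tuple[float, int]] = {}
    for gene, per_sample in cov.items():
        depths = list(per_sample.values())
        out[gene] = (median(depths), sum(1 for d in depths if d == 0))
    return out
```

    Reporting the zero-coverage count alongside the median is useful because a healthy median can hide a few samples in which a gene was not covered at all.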

    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM for separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM for separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{Illumina}}$. Development of the AIC for an alternative parameter selection method based on RVI. Number of samples in the comparison data set with 0× coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the re-sequencing data set. Number of samples in the re-sequencing data set with 0× coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). Median coverage of the bases of the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green) in the test data set. Number of samples in the test data set with 0× coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green). $\hat{p}_i$ and the actual probability $p_i$ of an SNV being a true positive.

    Journal: PLoS ONE

    Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

    doi: 10.1371/journal.pone.0171983

    Article Snippet: For the 454 and Illumina NextSeq sequencing data, the alignment algorithm was invoked using BWA-MEM.

    Techniques: Sequencing, Variant Assay, Filtration, Selection
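
    The last item compares the predicted probability $\hat{p}_i$ of an SNV being a true positive with the actual probability $p_i$. For a binomial GLM, $\hat{p}_i$ would be the fitted probability for call i. The sketch below shows one common way to make such a comparison: bin calls by predicted probability and compare each bin's mean prediction with its observed fraction of true positives. The equal-width binning and the parameter names are assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple


def calibration_bins(p_hat: Sequence[float], is_true: Sequence[bool],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Group calls into equal-width bins of predicted probability p_hat and
    return (mean p_hat, observed fraction of true positives, count) for
    each non-empty bin, in increasing order of p_hat."""
    if len(p_hat) != len(is_true):
        raise ValueError("length mismatch")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, t in zip(p_hat, is_true):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, t))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            frac = sum(1 for _, t in b if t) / len(b)
            out.append((mean_p, frac, len(b)))
    return out
```

    For a well-calibrated model, the points (mean $\hat{p}$, observed fraction) lie close to the diagonal.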

    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and their determined parameters in the TCGA training subset. File containing the called SNVs and their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM for separating true-positive from false-positive SNV calls using forward selection based on AIC. R script determining the best GLM for separating true-positive from false-positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons within the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data: $\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.

    Journal: PLoS ONE

    Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

    doi: 10.1371/journal.pone.0171983

    Article Snippet: For the 454 and Illumina NextSeq sequencing data, reads were aligned using the BWA-MEM algorithm.

    Techniques: Sequencing, Variant Assay, Filtration, Selection
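
The supporting R scripts listed above select the GLM that best separates true‐positive from false‐positive calls by forward selection on AIC. The authors' scripts are not reproduced here; what follows is only a minimal Python sketch of the same general technique, assuming a binary logistic GLM, an intercept‐only starting model, and a stopping rule of "stop when no remaining candidate lowers AIC". The feature names `f1`–`f3` and the synthetic data are hypothetical and are not the paper's 18 SNV parameters.

```python
import numpy as np


def fit_logistic(design, y, max_iter=100, tol=1e-8):
    """Fit a logistic GLM by Newton-Raphson (IRLS); return (coefficients, log-likelihood)."""
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        w = p * (1.0 - p)
        gradient = design.T @ (y - p)  # gradient of the log-likelihood
        # X^T W X is the negative Hessian (Fisher information); a tiny ridge avoids singularity.
        info = design.T @ (design * w[:, None]) + 1e-10 * np.eye(design.shape[1])
        step = np.linalg.solve(info, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(design @ beta))), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def aic(X, y, columns):
    """AIC = 2k - 2 log L for a logistic GLM with an intercept plus the given columns of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    _, loglik = fit_logistic(design, y)
    return 2 * design.shape[1] - 2 * loglik


def forward_select(X, y, names):
    """Greedily add the predictor that lowers AIC the most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(X, y, selected)
    while remaining:
        score, j = min((aic(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))        # three hypothetical per-call features
    eta = 2.0 * X[:, 0] - 1.0 * X[:, 1]  # "f3" carries no signal by construction
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-eta))).astype(float)  # 1 = true positive
    print(forward_select(X, y, ["f1", "f2", "f3"]))
```

Greedy forward selection is cheap but can miss combinations that only help jointly; R's `step()` with `direction = "forward"` implements a comparable search, though whether the authors used it is not stated here.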

    Supporting information. Sequencing information. Read alignment information. Variant calling and annotation. Filtration information. Parameter determination. Validation information. Generalized linear model and threshold. Data analysis: second approach. TCGA sample analysis. File providing detailed information on the called and missed variants. File containing the called SNVs and information on their determined parameters in the TCGA training subset. File containing the called SNVs and information on their determined parameters in the TCGA test subset. R script determining the 18 parameters characterizing SNVs and the 22 parameters characterizing indels called by GATK. R script calculating the relative variable importance (RVI) and determining information for normalization. R script determining the best GLM separating true‐positive from false‐positive SNV calls using forward selection based on AIC. R script determining the best GLM separating true‐positive from false‐positive indel calls using forward selection based on AIC. List of the genes, exons and their ENSEMBL transcript IDs that were targeted by Roche 454, Ion Torrent PGM and Illumina NextSeq. Base pairs (bp) in the target region, in exons in the target region, and number of genes covered by 454, Ion Torrent and Illumina. Alignment statistics for the 454 data aligned with BWA-MEM. Alignment statistics for the Ion Torrent data aligned with TMAP. Alignment statistics for the Illumina NextSeq data aligned with BWA-MEM. Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i,\mathrm{SNV},454}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{IonT}}$, $\hat{\eta}_{i,\mathrm{SNV},\mathrm{Illumina}}$). Number of models containing a parameter and normalized relative variable importance (RVI) for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data ($\hat{\eta}_{i,\mathrm{Indel},454}$, $\hat{\eta}_{i,\mathrm{Indel},\mathrm{IonT}}$).

    Journal: PLoS ONE

    Article Title: GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data

    doi: 10.1371/journal.pone.0171983

    Article Snippet: For the 454 and Illumina NextSeq sequencing data, reads were aligned using the BWA-MEM algorithm.

    Techniques: Sequencing, Variant Assay, Filtration, Selection
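
The legend above reports, for each parameter, the number of models containing it and a normalized relative variable importance (RVI). In the usual AIC‐based definition, the RVI of a parameter is the sum of the Akaike weights of all candidate models that contain it. Whether the paper uses exactly this definition is not stated here, and its normalization step is not described; dividing by the largest RVI is an assumption. The parameter names `p1`–`p3` and the AIC values below are hypothetical.

```python
import math


def akaike_weights(aics):
    """w_m = exp(-d_m / 2) / sum_k exp(-d_k / 2), where d_m = AIC_m - min(AIC)."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def relative_variable_importance(models):
    """models: list of (set of parameter names, AIC). Returns (RVI, model counts) per parameter."""
    weights = akaike_weights([a for _, a in models])
    params = sorted(set().union(*(m for m, _ in models)))
    rvi = {p: sum(w for (m, _), w in zip(models, weights) if p in m) for p in params}
    counts = {p: sum(1 for m, _ in models if p in m) for p in params}
    return rvi, counts


def normalize(rvi):
    """Scale so the most important parameter scores 1 (assumed normalization)."""
    top = max(rvi.values())
    return {p: v / top for p, v in rvi.items()}


models = [({"p1", "p2"}, 100.0), ({"p1"}, 102.0), ({"p2", "p3"}, 110.0)]  # hypothetical
rvi, counts = relative_variable_importance(models)
print(counts, normalize(rvi))
```

A parameter that appears in many models can still have a low RVI if those models all have high AIC, which is why the count and the RVI are reported side by side.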

    Schematic circular representation of the complete genome sequence of KPC‐2‐producing A. hydrophila GSH8‐2. Short‐read paired‐end whole‐genome sequencing was performed on an Illumina NextSeq 500 platform with a 300‐cycle NextSeq 500 Reagent Kit v2 (2 × 150‐mer). The complete genome sequences of the strains were determined by long‐read sequencing on a PacBio Sequel sequencer (Sequel SMRT Cell 1M v2, 4/tray; Sequel Sequencing Kit v2.1; insert size approximately 10 kb). De novo assembly was performed using Canu version 1.4 (Koren et al., 2017), minimap version 0.2‐r124 (Li, 2016), racon version 1.1.0 (Vaser et al., 2017) and Circlator version 1.5.3 (Hunt et al., 2015). Error correction of the tentative complete circular sequences was performed using Pilon version 1.18 with Illumina short reads (Walker et al., 2014). Annotation was performed using Prokka version 1.11 (Seemann, 2014), InterPro v49.0 (Finn et al., 2017) and NCBI BLASTP/BLASTX. Circular representations of the complete genomic sequences were visualized using the GView server (Petkau et al., 2010). AMR genes were identified by homology searches against the ResFinder database (Zankari et al., 2012). The class 1 integron was assigned in the INTEGRALL database (http://integrall.bio.ua.pt/; Moura et al., 2009). Comparative organization of plasmid ORFs was visualized using Easyfig (Sullivan et al., 2011). Chromosomal DNA tracks, from the inside: slot 1, GC skew; slot 2, GC content; slot 3, ORFs; slot 4, rRNA/tRNA; slots 5–7, BLASTatlas conserved gene analysis against three related strains (see also Supporting Information Fig. S1); slot 8, prophage; slot 9, notable ARGs or ARG‐related genes (transposases, ARGs and reductases). In the circular plasmid representations, notable ORFs are highlighted in the indicated colours.

    Journal: Environmental Microbiology Reports

    Article Title: Potential KPC‐2 carbapenemase reservoir of environmental Aeromonas hydrophila and Aeromonas caviae isolates from the effluent of an urban wastewater treatment plant in Japan

    doi: 10.1111/1758-2229.12772

    Article Snippet: These isolates were then subjected to whole‐genome sequencing (WGS) as described previously (Sekizuka et al., ), using an Illumina NextSeq 500 platform with a 300‐cycle NextSeq 500 Reagent Kit v2 (2 × 150‐mer).

    Techniques: Sequencing, Genomic Sequencing, Plasmid Preparation
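
The 2 × 150‐mer paired‐end reads above serve both for WGS and for Pilon polishing of the long‐read assembly. As a rough planning aid, not something reported in the article, the expected mean short‐read depth follows the Lander–Waterman relation C = N·L/G (N reads of length L over a genome of size G), and under its Poisson assumption the expected fraction of bases left uncovered is e^(−C). The read‐pair count and the 5 Mb genome size in the example are illustrative assumptions, not values from this study.

```python
import math


def mean_depth(read_pairs, read_length, genome_size):
    """Expected mean depth C = N * L / G, counting both mates of each pair as reads."""
    return 2 * read_pairs * read_length / genome_size


def fraction_uncovered(depth):
    """Poisson (Lander-Waterman) expectation of the fraction of bases with zero reads."""
    return math.exp(-depth)


# Illustrative numbers only: 1,000,000 read pairs of 2 x 150 bp over an assumed 5 Mb genome.
depth = mean_depth(1_000_000, 150, 5_000_000)
print(depth, fraction_uncovered(depth))
```

This idealized estimate ignores duplicates, adapter trimming, unmapped reads and GC bias, so realized depth on a NextSeq run is typically uneven and somewhat lower.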