boxchart Search Results


MathWorks Inc boxchart
Boxchart, supplied by MathWorks Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation.
https://www.bioz.com/result/boxchart/product/MathWorks Inc
boxchart - by Bioz Stars, 2026-04

MathWorks Inc boxchart function
Boxchart Function, supplied by MathWorks Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation.
https://www.bioz.com/result/boxchart function/product/MathWorks Inc

MathWorks Inc built-ins boxplot or boxchart
Built Ins Boxplot Or Boxchart, supplied by MathWorks Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation.
https://www.bioz.com/result/built-ins boxplot or boxchart/product/MathWorks Inc

MathWorks Inc box plot
Box Plot, supplied by MathWorks Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation.
https://www.bioz.com/result/box plot/product/MathWorks Inc

MathWorks Inc matlab r2020b
Matlab R2020b, supplied by MathWorks Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation.
https://www.bioz.com/result/matlab r2020b/product/MathWorks Inc

MathWorks Inc boxchart command
Boxchart Command, supplied by MathWorks Inc, used in various techniques. Bioz Stars score: 90/100, based on 1 PubMed citation.
https://www.bioz.com/result/boxchart command/product/MathWorks Inc

Image Search Results


a, A diagram of the PER-seq method. b, Normalized mutation frequency across all samples, shown on a log10 scale, with respect to the required number of linear copies (each with a unique linear-copy identifier). The mutation frequencies were normalized by the average mutation frequency in molecules with at least three linear copies in each sample. c, The observed versus expected frequencies of plasmids with artificially introduced mutations spiked in predefined ratios. Each dot represents one artificial mutant in one sample. Pearson correlation coefficient R and P values are shown. d,e, Error spectra of individual base changes for Klenow-EXO− (d) and KAPA-U+ (e) measured by PER-seq (after background subtraction and normalization for trinucleotides in the ROI, as in all figures; ). n = 3 replicates each. The green lines represent the range of previously measured base change error frequencies of Klenow-EXO− (ref. ). f, The average error frequency for Klenow-EXO− and KAPA-U+ measured by PER-seq. P values determined by two-sided t-test and the ratio of medians are shown. n = 3 replicates each. g,h, Strand-specific error signatures of Klenow-EXO− (g) and KAPA-U+ (h), computed as error (nucleotide misincorporation) spectra with respect to the template 5′ and 3′ neighboring bases (that is, the template trinucleotide), measured by PER-seq and averaged across three replicates. For example, T:dG denotes the misincorporation of guanine opposite thymine on the template strand. Boxplots are plotted with the MATLAB function boxchart. n.m.f., normalized mutation frequency; m.f., mutation frequencies.

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Article Snippet: Boxplots are plotted with the MATLAB function boxchart ( ).

Techniques: Mutagenesis

a, The average cell-free PER-POLE-P286R error signature measured by PER-seq and scaled as a probability density function (PDF) to sum to one. All CpGs in the template DNA were methylated. b, The average spectrum of mutations in 17 patients with cancer with a combination of a pathogenic mutation in the POLE proofreading domain and a defect in the MMR pathway (POLEd and MMRd cancers), normalized for trinucleotide frequency and scaled as a PDF in the same way as in a. c, A distribution of the cosine similarity between mutational spectra of human cancer samples to the PER-POLE-P286R error signature shown in a (both scaled as a PDF). The red boxplot shows cosine similarity values for POLEd and MMRd cancers, and the gray boxplot shows cosine similarity values for all other cancers. P value determined by two-sided, two-sample Mann–Whitney U test. d, A reconstruction of the PER-POLE-P286R error signature by SBS mutational signatures of the COSMIC-V3 database, using non-negative least square regression. The linear coefficients for each of the four SBS signatures are shown in gray. The last graph in d shows the reconstructed vector (computed as a linear combination of the four SBS signatures) and the resulting cosine similarity to the original PER-POLE-P286R error signature. Boxplots are plotted with the MATLAB function boxchart.

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Techniques: Methylation, Mutagenesis, MANN-WHITNEY, Plasmid Preparation
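
The legend for this figure compares mutational spectra by cosine similarity after scaling each spectrum to sum to one. A minimal Python sketch of that comparison follows, assuming each spectrum is a plain list of non-negative frequencies in the same category order. The function names and the toy values are illustrative assumptions and are not taken from the article, which reports using MATLAB.

```python
import math


def scale_to_pdf(spectrum):
    """Rescale a spectrum so its entries sum to one (hypothetical helper)."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum sums to zero and cannot be rescaled")
    return [value / total for value in spectrum]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of categories")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy spectra with four categories (made-up numbers, not data from the paper).
in_vitro = [4.0, 1.0, 0.5, 2.5]
in_vivo = [8.0, 2.0, 1.0, 5.0]  # same shape as in_vitro, twice the magnitude

similarity = cosine_similarity(scale_to_pdf(in_vitro), scale_to_pdf(in_vivo))
print(f"cosine similarity: {similarity:.3f}")
```

Cosine similarity depends only on the shape of the two vectors, so scaling to sum to one does not change its value; the scaling mainly makes the spectra directly comparable when plotted side by side.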

a, A heatmap and hierarchical clustering on a pairwise cosine similarity matrix between PER-POLE-P286R and PER-POLE-EXO- samples. The cosine similarity is computed on the strand-specific error spectra (that is, each with 192 error types) after background subtraction and trinucleotide frequency normalization. The hierarchical clustering is computed using the MATLAB functions linkage, optimalleaforder and dendrogram with default parameters. b,c, Error/mutational spectra rescaled within each of the six nucleotide substitutions (divided by the sum of all bars of the same color). In other words, this visualization shows the relative mutation frequencies within each nucleotide substitution group. b, The average in vitro POLE-P286R (‘PER-POLE-P286R’) error spectrum measured by PER-seq, after subtraction of assay-specific background, normalized for trinucleotide frequency and scaled as a probability density function in each of the six substitution types. c, The average in vivo spectrum of mutations in 17 human cancers with a combination of a pathogenic mutation in the POLE proofreading domain and a defect in the mismatch repair pathway (POLEd and MMRd cancers), normalized for trinucleotide frequency and scaled as a probability density function in each of the six substitution types. The numbers below the profile plot in c denote the cosine similarity values between b and c computed for each of the six substitution types. Interestingly, all six substitution classes exhibit a relatively high cosine similarity, with a minimum of 0.8 in T>A and a maximum of 0.97 in T>G (mainly TpT>GpT). The overall cosine similarity on the rescaled profiles is 0.9. d, A reconstruction of the PER-POLE-P286R error signature by SBS mutational signatures of the COSMIC-V2 database, using non-negative least square regression. The linear coefficients for each of the four SBS signatures are shown in gray. The last panel shows the reconstructed vector (computed as a linear combination of the four SBS signatures) and the resulting cosine similarity to the original PER-POLE-P286R error signature. e, CpG>TpG mutation frequency in CpGs binned by their 5mC levels, measured by bisulfite sequencing in a matched tissue of origin. Each dot represents a value in one sample and one 5mC bin (N: 17 for POLEd and MMRd, 66 for POLEd, 329 for MMRd, 3181 for PROF). Spearman correlation coefficient and two-sided P-value are shown on top. Boxplots are plotted with the MATLAB function boxchart.

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Techniques: Mutagenesis, In Vitro, In Vivo, Plasmid Preparation, Methylation Sequencing

a, The average error frequency for the three polymerases (wild-type (WT), exonuclease-deficient (EXO−) and P286R mutant) measured by PER-seq. P values determined by paired two-sided t-test and the ratio of medians are shown. All CpGs in the template DNA were methylated. n = 4 replicates each. b, A diagram of the most common misincorporations by Pol ε. The top strand represents the DNA template, and the bottom strand is filled by Pol ε. The red boxes represent the base that is incorrectly incorporated by Pol ε. c–e, Strand-specific error signatures of P286R (c), EXO− (d) and wild-type (e) polymerases, computed as error (nucleotide misincorporation) spectra with respect to the template 5′ and 3′ neighboring bases, measured by PER-seq and averaged across four samples. f, Average mutation frequency observed in WGS data of POLEd and MMRd human cancers in the leading (dark blue) and lagging (orange) replication strand templates, normalized for trinucleotides in the two strands. Boxplots are plotted with the MATLAB function boxchart.

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Techniques: Mutagenesis, Methylation

a, Average mutational spectra in POLEd and MMRd, POLEd (and MMRp), MMRd (and POLEp) and PROF (=POLEp and MMRp) human cancer samples. b, Distribution of frequency of CpG>TpG mutations (dark red, per CpG) compared to other mutation types (gray, average frequency of the other 92 mutation types, normalized for trinucleotide occurrences) in these four groups of cancer samples. P values determined by two-sided sign test are shown; P values rounded to 0 if P < 5 × 10^−324. c, A log2 transformation of the ratio of CpG>TpG mutation frequency in the leading and lagging strands. High values represent enrichment on the leading-strand template. P values determined by two-sided sign test are shown. d, CpG>TpG mutation frequency in CpGs binned by their 5mC levels, measured by bisulfite sequencing in a matched tissue of origin. The data points in each boxplot represent samples in each group (n as in b). e, Percentage of samples with CpG>TpG mutation frequency higher on the leading strand than the lagging strand, stratified by cancer tissue (columns) and sequence context (rows), with the first row representing all CpGs grouped together. Red values represent higher CpG>TpG frequency on the leading-strand template, and blue values represent higher CpG>TpG frequency on the lagging strand template. To allow comparison of WES and WGS data, analyses in a–e were restricted to exonic regions only. To make the comparisons tissue adjusted, PROF graphs in a–d are restricted to the tissue types that contain POLEd and/or MMRd samples (colon/rectum, gastric, uterus and brain); all tissue types are shown in e. Boxplots are plotted with the MATLAB function boxchart.

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Techniques: Mutagenesis, IF-P, Transformation Assay, Methylation Sequencing, Sequencing, Comparison
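
Panel c of the legend above describes a log2 transformation of the leading-to-lagging ratio, assessed with a two-sided sign test. The Python sketch below shows one common exact formulation of that test (a binomial test on the counts of positive and negative values, with zeros dropped). The article does not state how its test was implemented, so the function names, the tie handling and the toy numbers here are all assumptions made for illustration.

```python
import math


def log2_strand_ratio(leading, lagging):
    """log2 of the leading/lagging frequency ratio; positive means leading-strand enrichment."""
    return math.log2(leading / lagging)


def sign_test_two_sided(values):
    """Exact two-sided sign test of whether the median of `values` differs from zero.

    Zeros are dropped (an assumption; other tie conventions exist).
    """
    positives = sum(1 for v in values if v > 0)
    negatives = sum(1 for v in values if v < 0)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy (leading, lagging) mutation frequencies for five samples (made-up numbers).
pairs = [(3.0, 1.5), (4.0, 1.0), (2.0, 1.0), (8.0, 2.0), (6.0, 3.0)]
ratios = [log2_strand_ratio(lead, lag) for lead, lag in pairs]
p_value = sign_test_two_sided(ratios)
print(ratios, p_value)
```

With only five samples the smallest attainable two-sided P value is 2 × (1/2)^5 = 0.0625, which illustrates why the sign test needs many samples to reach very small P values.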

a, Average mutational spectra in POLEd and MMRd, POLEd (and MMRp), MMRd (and POLEp) and PROF (=POLEp and MMRp) human cancer samples. b, Distribution of frequency of CpG>TpG mutations (dark red, per CpG) compared to other mutation types (gray, average frequency of the other 92 mutation types, normalized for trinucleotide occurrences) in these four groups of cancer samples. The gray text below the boxplots shows ‘N’: the number of samples, ‘higher in CpGs’: the percentage of samples with higher CpG>TpG mutation frequency compared to the frequency of other mutation types and ‘P’: two-sided sign test P-value comparison between the CpG>TpG vs. other mutation frequencies. c, A log2 transformation of the ratio of CpG>TpG mutation frequency in the leading and lagging strands. High values represent enrichment on the leading-strand template. Two-sided sign test P-value is shown in each group. d, CpG>TpG mutation frequency in CpGs binned by their 5mC levels, measured by bisulfite sequencing in a matched tissue of origin. The data points in each boxplot represent samples in each group (N as in b). Two-sided sign test P-value is used to compare CpG>TpG frequency between the first and the last bin. e, The heatmap color and text represent the percentage of samples with CpG>TpG mutation frequency higher on the leading strand compared to the lagging strand, stratified by cancer tissue (columns) and sequence context (rows), with the first row representing all CpGs grouped together. Red values represent higher CpG>TpG frequency on the leading-strand template, and blue values represent higher CpG>TpG frequency on the lagging strand template. To make the comparisons tissue adjusted, PROF panels in a–d are restricted to the tissue types that contain POLEd and/or MMRd samples (colon/rectum, gastric, uterus and brain). e shows all tissue types. Boxplots are plotted with the MATLAB function boxchart.

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Figure Legend Snippet: a, Average mutational spectra in POLEd and MMRd, POLEd (and MMRp), MMRd (and POLEp) and PROF (=POLEp and MMRp) human cancer samples. b, Distribution of frequency of CpG>TpG mutations (dark red, per CpG) compared to other mutation types (gray, average frequency of the other 92 mutation types, normalized for trinucleotide occurrences) in these four groups of cancer samples. The gray text below the boxplots shows ‘N’: the number of samples, ‘higher in CpGs’: the percentage of samples with higher CpG>TpG mutation frequency compared to the frequency of other mutation types and ‘P’: two-sided sign test P-value comparison between the CpG>TpG vs. other mutation frequencies. c, A log2 transformation of the ratio of CpG>TpG mutation frequency in the leading and lagging strands. High values represent enrichment on the leading-strand template. Two-sided sign test P-value is shown in each group. d, CpG>TpG mutation frequency in CpGs binned by their 5mC levels, measured by bisulfite sequencing in a matched tissue of origin. The data points in each boxplot represent samples in each group (N as in b). Two-sided sign test P-value is used to compare CpG>TpG frequency between the first and the last bin. e, The heatmap color and text represent the percentage of samples with CpG>TpG mutation frequency higher on the leading strand compared to the lagging strand, stratified by cancer tissue (columns) and sequence context (rows), with the first row representing all CpGs grouped together. Red values represent higher CpG>TpG frequency on the leading-strand template, and blue values represent higher CpG>TpG frequency on the lagging-strand template. To make the comparisons tissue adjusted, PROF panels in a–d are restricted to the tissue types that contain POLEd and/or MMRd samples (colon/rectum, gastric, uterus and brain). e shows all tissue types. Boxplots are plotted with the MATLAB function boxchart.

Article Snippet: Boxplots are plotted with the MATLAB function boxchart.

Techniques: Mutagenesis, Comparison, Transformation Assay, Methylation Sequencing, Sequencing
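
The strand-asymmetry comparison in panel c of the legend above (a log2 ratio of leading to lagging CpG>TpG frequency, tested against zero with a two-sided sign test) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the sample values are hypothetical, and the exact binomial form of the sign test is an assumption about how the test was computed.

```python
from math import comb, log2


def log2_strand_ratio(leading, lagging):
    """Per-sample log2(leading / lagging); positive means leading-strand enrichment."""
    return [log2(a / b) for a, b in zip(leading, lagging)]


def sign_test_two_sided(values, mu0=0.0):
    """Exact two-sided sign test of the hypothesis that the median equals mu0."""
    pos = sum(v > mu0 for v in values)
    neg = sum(v < mu0 for v in values)
    n = pos + neg  # ties with mu0 carry no sign information and are dropped
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Probability of k or fewer "minority" signs under Binomial(n, 0.5), doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-sample mutation frequencies, for illustration only.
leading = [4.0, 2.0, 8.0, 2.0, 4.0, 2.0]
lagging = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
ratios = log2_strand_ratio(leading, lagging)
p_value = sign_test_two_sided(ratios)
```

With all six hypothetical ratios above zero, the two-sided P value is 2 × (1/2)^6.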

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Figure Legend Snippet: a, A reconstruction of the mutational profile of the P286R mutation in mES cells by SBS mutational signatures of the COSMIC-V3 database, using non-negative least squares regression. The linear coefficients for each of the four SBS signatures are shown in gray. The last graph in a shows the reconstructed vector (computed as a linear combination of the four SBS signatures) and the resulting cosine similarity to the original mES cell P286R mutational profile. b, Normalized mutational profile from WGS of mES cell POLE-P286R clones after 2 months of mutation accumulation and single-cell bottlenecking, averaged across two samples. c, CpG>TpG mutation frequency in the mES cell clones (WT versus P286R) in lowly (<20%) and highly (>80%) methylated CpGs, determined from whole-genome bisulfite sequencing of E14 mES cells (GEO GSM4818066). d, CpG>TpG mutation frequency in the mES cell clones in the lagging and leading strand, estimated from mouse replication timing data. e, Normalized mutational profile from tumor WES from CRISPR–Cas9 knock-in germline POLE-P286R or S459F mouse models, averaged across 34 samples. f, CpG>TpG mutation frequency in the mouse tumors (P286R versus S459F versus S459F/−) in lowly (<20%) and highly (>80%) methylated CpGs, determined from whole-genome bisulfite sequencing of mouse thymus (ENCODE ENCFF850HBL). g, CpG>TpG mutation frequency in the mouse tumors in the lagging and leading strand. Boxplots are plotted with the MATLAB function boxchart. P values were determined by two-sided sign test.

Article Snippet: Boxplots are plotted with the MATLAB function boxchart.

Techniques: Mutagenesis, Plasmid Preparation, Clone Assay, Methylation, Methylation Sequencing, CRISPR, Knock-In
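
The last graph of panel a in the legend above (a reconstructed vector formed as a linear combination of signatures, scored by cosine similarity against the observed profile) can be illustrated with a small Python sketch. It does not perform the non-negative least squares fit itself; the coefficients are taken as given. The three-channel vectors are hypothetical stand-ins for real 96-channel SBS profiles, and all names are illustrative.

```python
from math import sqrt


def reconstruct(signatures, coefficients):
    """Linear combination of signature vectors, one coefficient per signature."""
    n_channels = len(signatures[0])
    return [
        sum(c * sig[i] for c, sig in zip(coefficients, signatures))
        for i in range(n_channels)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: two "signatures" over three mutation channels.
signatures = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
coefficients = [2.0, 1.0]          # assumed to come from a prior non-negative fit
observed = [2.0, 1.0, 0.5]         # hypothetical observed profile

reconstructed = reconstruct(signatures, coefficients)
similarity = cosine_similarity(reconstructed, observed)
```

A similarity close to 1 means the chosen signatures explain most of the observed profile's shape; the third channel here is deliberately left unexplained, so the value falls slightly below 1.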

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Figure Legend Snippet: a, PER-EXTRACT-seq error signature of filling gapped plasmids in nuclear extracts from cells with POLEP286R. The error signature is computed as error (nucleotide misincorporation) spectra with respect to the template 5′ and 3′ neighboring bases (that is, the template trinucleotide), measured by PER-EXTRACT-seq and averaged across available samples: 5 samples from nuclear extracts from the mESC clones with POLEP286R mutation, and 4 samples from nuclear extracts from HCC2998 cell line (that naturally harbors a POLEP286R/+ mutation). b–d, PER-EXTRACT-seq measured C>T (C:dA) error rate with respect to the modification state and cytosine sequence contexts: CpG and CpH (all other C contexts). Every dot represents average error frequency in the given context in one sample. Samples with all CpGs methylated by the M.SssI DNA methyltransferase are shown with the plus sign in the bottom row. The color of the boxplots highlights whether the template cytosine is methylated (5mC, dark red) or unmodified (C, light blue) in the given sample and sequence context. Note that M.SssI presence does not change modification state in CpH due to its selectivity to CpGs. A two-sided paired t-test was used to compare the values between the groups, and the ratio of the medians is shown below the significant P-values. The values from PER-EXTRACT-seq for filling in HCC2998 (b), mESC POLEP286R (c) and mESC WT (d) nuclear extracts are shown. e, PER-EXTRACT-seq error signature of incubating the control ungapped plasmids in nuclear extracts from cells with POLEP286R, averaged across available samples: 5 samples from nuclear extracts from the mESC clones with POLEP286R mutation, and 4 samples from nuclear extracts from HCC2998 cell line. f–h, PER-EXTRACT-seq measured C>T (C:dA) error rate in the control ungapped plasmids. A two-sided paired t-test was used to compare the values between the groups, and the ratio of the medians is shown below the significant P-values. Boxplots are plotted with the MATLAB function boxchart.

Article Snippet: Boxplots are plotted with the MATLAB function boxchart.

Techniques: Clone Assay, Mutagenesis, Modification, Sequencing, Methylation, Control

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Figure Legend Snippet: a, A comparison of the PER-seq measured CpG>TpG error rate in 5mC per single round of replication (purple color) versus previously published estimates of in vitro spontaneous deamination rate of 5mC in double-stranded DNA at 37 °C (5.8 × 10^−13 per second) (blue color). The x axis shows the estimated length of incubation at 37 °C that would generate the same number of CpG>TpG errors as a single round of replication by Pol ε (WT, exonuclease-deficient or P286R). The y axis shows the resulting frequency of 5mCpG>TpG errors. b, c, CpG>TpG mutations are depleted in MMR-active (early replicating (b) or H3K36me3-enriched (c)) regions in MMRp but not/less so in MMRd WGS samples. The y axis shows a log2-transformed ratio of CpG>TpG mutation frequency in early/late (b) and inside/outside H3K36me3-marked (c) regions. Two-sided sign test P values (shown below each boxplot) were used to evaluate whether the values differ from zero. P values comparing samples (shown above each boxplot) were determined by two-sided t-test with an uneven variance. d–f, The PER-seq measured C>T (C:dA) error rate with respect to the modification state and cytosine sequence contexts: CpG, dcm (CCAGG and CCTGG) and CpH (all other C contexts). Every dot represents the average error frequency in the given context in one sample. Samples with all CpGs methylated by the M.SssI DNA methyltransferase are shown with the plus sign in the bottom row. The color of the boxplots highlights whether the cytosine is methylated (5mC, dark red) or unmodified (C, teal) in the given sample and sequence context. Note that M.SssI presence does not change modification state in CpH or dcm contexts due to its selectivity to CpGs. A paired two-sided t-test was used to compare the values between the groups, and the ratio of the medians is shown below the significant P values. Boxplots are plotted with the MATLAB function boxchart.

Article Snippet: Boxplots are plotted with the MATLAB function boxchart.

Techniques: Comparison, In Vitro, Incubation, Transformation Assay, Mutagenesis, Modification, Sequencing, Methylation
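
The x axis of panel a in the legend above converts a per-replication error rate into an equivalent incubation time at the quoted deamination rate of 5.8 × 10^−13 per second. A rough Python sketch of that conversion follows, assuming deamination events accumulate linearly with time (a reasonable approximation while rate × time is far below 1). The function names and the example error rate are hypothetical and are not values from the article.

```python
DEAMINATION_RATE_PER_SECOND = 5.8e-13  # rate quoted in the legend, 37 °C, dsDNA
SECONDS_PER_DAY = 86_400


def equivalent_incubation_seconds(error_rate_per_replication,
                                  rate_per_second=DEAMINATION_RATE_PER_SECOND):
    """Incubation time giving the same expected error frequency per 5mCpG.

    Assumes a linear regime: expected frequency after t seconds is rate * t.
    """
    return error_rate_per_replication / rate_per_second


def seconds_to_days(seconds):
    return seconds / SECONDS_PER_DAY


# Hypothetical error rate per 5mCpG per replication round, for illustration only.
example_error_rate = 1e-6
example_days = seconds_to_days(equivalent_incubation_seconds(example_error_rate))
```

Under these assumptions, a higher polymerase error rate maps to a proportionally longer equivalent incubation time.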

Journal: Nature Genetics

Article Title: Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides

doi: 10.1038/s41588-024-01945-x

Figure Legend Snippet: a, Diagram of the four strands sequenced in PER-seq (PD, PT, D, T) and how their values are used to determine the true-positive polymerase error rate in the daughter strand after background subtraction (dark red) by subtracting background (blue) from the raw mutation frequency in the daughter strand (yellow). The background then consists of two components: potential gapping damage (green) that could have happened to the template strand when single-stranded and before/while being filled, and a general background (purple) estimated by the raw mutation frequency in the parental daughter (PD) strand. Finally, the gapping damage is estimated as the difference between the template (T; darker blue) and parental template (PT; dark orange) strands. Of note, only fully filled molecules can undergo successful restriction digest and downstream library preparation for both the template and daughter strands, and therefore unfilled plasmids do not confound the results. In other words, by ‘template’ we mean the template strand of the ROI after filling by the respective polymerases. b, The CpG>TpG mutation frequency for all the values described in a. N = 4 replicates each. Boxplots are plotted with the MATLAB function boxchart.

Article Snippet: Boxplots are plotted with the MATLAB function boxchart.

Techniques: Mutagenesis
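
The background subtraction described in panel a of the legend above can be written out as simple arithmetic. The Python sketch below follows the legend's wording (gapping damage = T − PT, general background = PD, corrected rate = D − background). The function name, argument names and example frequencies are hypothetical, and whether negative results are clipped to zero is not stated in the legend, so no clipping is applied here.

```python
def background_subtracted_error_rate(daughter, template,
                                     parental_daughter, parental_template):
    """Polymerase error rate in the daughter strand after background subtraction.

    All arguments are raw mutation frequencies for the four sequenced strands
    (D, T, PD, PT in the legend's notation).
    """
    gapping_damage = template - parental_template   # T - PT
    general_background = parental_daughter          # PD
    background = gapping_damage + general_background
    return daughter - background                    # D - background


# Hypothetical raw CpG>TpG frequencies, for illustration only.
corrected = background_subtracted_error_rate(
    daughter=10e-6,
    template=3e-6,
    parental_daughter=1e-6,
    parental_template=2e-6,
)
```

With these example inputs the background is 2e-6 (1e-6 gapping damage plus 1e-6 general background), which leaves a corrected rate of 8e-6.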