contigs Search Results


  • 94
    Thermo Fisher contig express
    Contig Express, supplied by Thermo Fisher, used in various techniques. Bioz Stars score: 94/100, based on 320 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contig express/product/Thermo Fisher
    contig express - by Bioz Stars, 2021-01

    91
    Pacific Biosciences pacbio contigs
    Cumulative number of assembled nucleotides in contigs of different minimum lengths for ( a ) Link_ADI, ( b ) unClos_1, and ( c ) unFirm_1. Each line corresponds to a different sample (Link_ADI or eCI, where noted), sequencing method (HiSeq [HS] or PacBio [PB]), different assembly method (co-assembly across samples Link_ADI and eCI, hybrid using mapped reads from HiSeq and PacBio, or hybrid using contigs from HiSeq and PacBio), or assembly program (CAP3, IDBA_UD, MIRA, or SOAPdenovo).
    Pacbio Contigs, supplied by Pacific Biosciences, used in various techniques. Bioz Stars score: 91/100, based on 495 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/pacbio contigs/product/Pacific Biosciences

    92
    Pacific Biosciences contigs
    The statistics of PacBio long reads derived from TGS and corrected by LSC using Hiseq sequencing data. ( a ) PacBio corrected long reads supported by the Hiseq data with varied coverage. ( b ) Numbers distribution of PacBio long reads before and after correction using LSC. ( c ) Number of PacBio corrected long reads before and after duplication-removal using our developed methods. ( d ) The number distribution of PacBio corrected long reads and Hiseq assembled contigs at various lengths.
    Contigs, supplied by Pacific Biosciences, used in various techniques. Bioz Stars score: 92/100, based on 1346 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Pacific Biosciences

    92
    Unigene contigs
    Details of the selection process for field pea contigs. K - Kaspa transcriptome and P - Parafield transcriptome
    Contigs, supplied by Unigene, used in various techniques. Bioz Stars score: 92/100, based on 707 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Unigene

    90
    Oxford Nanopore contigs
    Assembly of the B73-Ab10 genome. a Whole-genome view. For each chromosome, the top to bottom tracks are gene density, Cinful-Zeon retrotransposon density, Gypsy superfamily retrotransposon density in 10 Kb sliding windows, repeat location (knob180 in blue, TR-1 in red, 45S rDNA in teal, CentC in magenta), and the distribution of gapless contigs. CENH3 ChIP-seq peaks identifying centromeres are marked by orange rectangles. The inset shows the centromere on chromosome 3, TR-1-rich knob on chromosome 4, and knob180-rich knob on chromosome 7. The five most common retroelement families are shown for each panel, along with centromeric retrotransposons (CRM) for the centromere. CENH3 enrichment in chromosome 3 is displayed in a heatmap. b The impact of assembly merging over a CentC-rich region on chromosome 9. Seven contigs (orange, above) from the PacBio assembly were originally misassembled, as can be seen in the alignment to the Bionano map (connecting lines show matching sites). CentC tracts and gaps are annotated. Assembly merging corrected the output, leaving an 11-Kb gap that was filled with nanopore reads. c Sequence alignment between normal chromosome 10 from B73 (N10) (140–152 Mb) and Ab10 (140–195 Mb) from B73-Ab10. Annotation is as in a , with Kindr genes marked with black bars in the top track. Links show homologous regions larger than 500 bp
    Contigs, supplied by Oxford Nanopore, used in various techniques. Bioz Stars score: 90/100, based on 81 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Oxford Nanopore

    92
    Celera contigs
    Each pair of plots give an overview of the comparisons of the quality of the assemblies across assemblers for E. coli and yeast datasets. a b : Histograms with error bars plotted between % of 2D reads and N50_value of an assembly show the variation in N50 value of an assembly among different assembler algorithms and how it varies with respect to the data size. c d : Histograms with error bars plotted between % of 2D reads and number of contigs generated from an assembly, shows how the number of contigs generated vary with respect to the mean contig length for each respective assembler algorithm across various bins of respective datasets. e f : Histograms showing the percentage of 2D reads employed on X-axis versus the average length of the contigs obtained using each algorithm. g h : Histograms showing the sum of the lengths of all the contigs generated by an assembler as a function of the percentage of the total reads employed in the assembly. In each set of plots, left panel corresponds to E. coli dataset while the plots in the right panel correspond to the Yeast dataset. In all the plots labeled numeric values on histograms indicate corresponding values of the metric in respective color representing each tool
    Contigs, supplied by Celera, used in various techniques. Bioz Stars score: 92/100, based on 401 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Celera

    92
    Illumina Inc contigs
    BlobPlot of the genome assembly before removing contamination. Each circle is a contig proportionally scaled by contig length and coloured by taxonomic annotation based on BLAST similarity search results. Contigs are positioned based on the GC content (X-axis) and the coverage of PacBio reads (Y-axis). There are some contigs of Proteobacteria origin at high GC and variable coverage indicating possible contamination. These contigs were removed from the assembly.
    Contigs, supplied by Illumina Inc, used in various techniques. Bioz Stars score: 92/100, based on 3564 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Illumina Inc

    92
    Bioedit Company contigs
    BlobPlot of the genome assembly before removing contamination. Each circle is a contig proportionally scaled by contig length and coloured by taxonomic annotation based on BLAST similarity search results. Contigs are positioned based on the GC content (X-axis) and the coverage of PacBio reads (Y-axis). There are some contigs of Proteobacteria origin at high GC and variable coverage indicating possible contamination. These contigs were removed from the assembly.
    Contigs, supplied by Bioedit Company, used in various techniques. Bioz Stars score: 92/100, based on 896 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Bioedit Company

    89
    Celera chimeric contigs
    Correlation analysis from assemblies. Hierarchical clustering given by the Spearman correlation matrix of the viral-bacterial (left) and the viral (right) assemblies (A) . The gradient indicates the strength and direction of the correlations. Blue squares represent significant negative correlations and red squares positive correlations ( P value ≤ 0.05). Green clusters are based on accurate short-length and low-assembled contigs while purple clusters are based on highly-assembled long-chimeric contigs. Ctgs: number of contigs, ChCtgs: percentage of chimeric contigs, ChCtgs.V.B: percentage of viral-bacterial chimeric contigs, GR: genomes recovered, IM: median of percentage of contig identity against its original genome, LC: Largest contig, N50, RA: percentage of reads assembled, ROG: percentage of reads assembled on their original genomes, RVB: percentage of reads within a viral-bacterial hit. Principal Component Analysis for the different assemblies in both metagenomes (B) : The two principal components that better explain variation between assemblies are shown for the viral-bacterial (left) and the viral (right) assemblies. The length of the red vectors represents the effect of each component on the stats. The distance between them indicates their correlation. Circles represent two clusters formed due to the close correlations in the matrices in panel A . C: Celera, M: Minimo, N: Newbler, Optimal: optimal assembly.
    Chimeric Contigs, supplied by Celera, used in various techniques. Bioz Stars score: 89/100, based on 30 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/chimeric contigs/product/Celera

    92
    Thermo Fisher contigs
    Correlation analysis from assemblies. Hierarchical clustering given by the Spearman correlation matrix of the viral-bacterial (left) and the viral (right) assemblies (A) . The gradient indicates the strength and direction of the correlations. Blue squares represent significant negative correlations and red squares positive correlations ( P value ≤ 0.05). Green clusters are based on accurate short-length and low-assembled contigs while purple clusters are based on highly-assembled long-chimeric contigs. Ctgs: number of contigs, ChCtgs: percentage of chimeric contigs, ChCtgs.V.B: percentage of viral-bacterial chimeric contigs, GR: genomes recovered, IM: median of percentage of contig identity against its original genome, LC: Largest contig, N50, RA: percentage of reads assembled, ROG: percentage of reads assembled on their original genomes, RVB: percentage of reads within a viral-bacterial hit. Principal Component Analysis for the different assemblies in both metagenomes (B) : The two principal components that better explain variation between assemblies are shown for the viral-bacterial (left) and the viral (right) assemblies. The length of the red vectors represents the effect of each component on the stats. The distance between them indicates their correlation. Circles represent two clusters formed due to the close correlations in the matrices in panel A . C: Celera, M: Minimo, N: Newbler, Optimal: optimal assembly.
    Contigs, supplied by Thermo Fisher, used in various techniques. Bioz Stars score: 92/100, based on 1254 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Thermo Fisher

    92
    Biotechnology Information contig
    Correlation analysis from assemblies. Hierarchical clustering given by the Spearman correlation matrix of the viral-bacterial (left) and the viral (right) assemblies (A) . The gradient indicates the strength and direction of the correlations. Blue squares represent significant negative correlations and red squares positive correlations ( P value ≤ 0.05). Green clusters are based on accurate short-length and low-assembled contigs while purple clusters are based on highly-assembled long-chimeric contigs. Ctgs: number of contigs, ChCtgs: percentage of chimeric contigs, ChCtgs.V.B: percentage of viral-bacterial chimeric contigs, GR: genomes recovered, IM: median of percentage of contig identity against its original genome, LC: Largest contig, N50, RA: percentage of reads assembled, ROG: percentage of reads assembled on their original genomes, RVB: percentage of reads within a viral-bacterial hit. Principal Component Analysis for the different assemblies in both metagenomes (B) : The two principal components that better explain variation between assemblies are shown for the viral-bacterial (left) and the viral (right) assemblies. The length of the red vectors represents the effect of each component on the stats. The distance between them indicates their correlation. Circles represent two clusters formed due to the close correlations in the matrices in panel A . C: Celera, M: Minimo, N: Newbler, Optimal: optimal assembly.
    Contig, supplied by Biotechnology Information, used in various techniques. Bioz Stars score: 92/100, based on 118 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contig/product/Biotechnology Information

    90
    BioNano Genomics bionano contigs
    Assembly and validation of Drosophila miranda genome. A . Overview of assembly pipeline. The steps include assembly of male PacBio reads followed by scaffolding using Hi-C, and extensive QC using BioNano reads and BAC clone sequencing followed by gene and repeat annotation. B. Hi-C linkage density map. Chromatin interaction maps allow recovery of entire chromosome arms. Note that the Y-linked contigs were scaffolded separately from X-linked and autosomal contigs. Unlinked regions with many contacts indicate repetitive regions. C. Comparison of current (Dmir2.0) versus old (Dmir1.0) D . miranda assembly. Note that the Y/neo-Y was not assembled in Dmir1.0, and the dot plot indicates homology between our neo-Y assembly and the neo-X. Other repeat-rich regions, such as the large pericentromeric block on AD, are also missing from D.mir1.0. D. BAC clone mapping for assembly verification. BAC clones are color coded according to how many genomic regions they map to in our assembly; green lines indicate stitch points of scaffolds based on Hi-C contacts, and the black line gives the local repeat content along the genome. Three hundred sixty-one sequenced BAC clones (97%) map contiguously and uniquely to our genome assembly. BAC, bacterial artificial chromosome; F, female; M, male; QC, quality control; Repeat %, local repeat content.
    Bionano Contigs, supplied by BioNano Genomics, used in various techniques. Bioz Stars score: 90/100, based on 116 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/bionano contigs/product/BioNano Genomics

    92
    Gene Codes Inc contig assembly
    Assembly and validation of Drosophila miranda genome. A . Overview of assembly pipeline. The steps include assembly of male PacBio reads followed by scaffolding using Hi-C, and extensive QC using BioNano reads and BAC clone sequencing followed by gene and repeat annotation. B. Hi-C linkage density map. Chromatin interaction maps allow recovery of entire chromosome arms. Note that the Y-linked contigs were scaffolded separately from X-linked and autosomal contigs. Unlinked regions with many contacts indicate repetitive regions. C. Comparison of current (Dmir2.0) versus old (Dmir1.0) D . miranda assembly. Note that the Y/neo-Y was not assembled in Dmir1.0, and the dot plot indicates homology between our neo-Y assembly and the neo-X. Other repeat-rich regions, such as the large pericentromeric block on AD, are also missing from D.mir1.0. D. BAC clone mapping for assembly verification. BAC clones are color coded according to how many genomic regions they map to in our assembly; green lines indicate stitch points of scaffolds based on Hi-C contacts, and the black line gives the local repeat content along the genome. Three hundred sixty-one sequenced BAC clones (97%) map contiguously and uniquely to our genome assembly. BAC, bacterial artificial chromosome; F, female; M, male; QC, quality control; Repeat %, local repeat content.
    Contig Assembly, supplied by Gene Codes Inc, used in various techniques. Bioz Stars score: 92/100, based on 182 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contig assembly/product/Gene Codes Inc

    92
    CodonCode contig assembly
    Bayesian phylogenetic inference of CTV genomes and genome fragments. Unrooted, consensus phylogenetic trees were obtained from 2,000,000 generations of the Markov chain Monte Carlo simulation in Bayesian analysis using a general time-reversal model of nucleotide substitution [33] . The number above each branch indicates the Bayesian posterior probability. The scale bars represent 0.1 expected substitutions per site. Branch lengths are proportional to evolutionary distance. Sequences were aligned using ClustalX [47] and subsequently manually aligned prior to the Bayesian phylogenetic analysis. A, Known CTV genomes and CTV genomes assembled from resequencing analysis of FS2-2 (highlighted orange). The suffix at the end of fs2_2 distinguishes multiple genotypes in the isolate and also indicates the anchor sequence from which the consensus contig was generated by the Phrap program. B, the 5′ proximal 1 kb, and C, p33-coding region of CTV genomes obtained by direct sequencing of RT-PCR clones. In both B and C, Bayesian posterior probability and clones with identical sequences were omitted for clarity. Recombinant sequences are highlighted in green.
    Contig Assembly, supplied by CodonCode, used in various techniques. Bioz Stars score: 92/100, based on 211 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contig assembly/product/CodonCode

    92
    Gene Codes Inc contig
    Distribution of contig sequences after BLASTx analysis. (A) Percentage of sequences related to the main categories of existing viruses: vertebrate (blue), plant/fungal (green), invertebrate (brown), protozoan (yellow) viruses and bacteriophages (gray), and unassigned viral sequences (no data available concerning the taxonomic family, indicated in red). The total number of viral contigs is indicated below each pie chart. (B) Percentage of sequences related to the most abundant viral families, indicated by the same color range for each of the main viral categories as in (A) : blue = vertebrate, green = plant/fungal, brown = invertebrate, yellow = protozoan, gray = phage. Viral families accounting for less than 2% of all sequences are pooled in the “other” category (in purple), and read sequences with no available data regarding the taxonomic family are considered to be unassigned (in red).
    Contig, supplied by Gene Codes Inc, used in various techniques. Bioz Stars score: 92/100, based on 115 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contig/product/Gene Codes Inc

    92
    Solexa contigs
    Integrated pipeline for de novo assembly of novel genome sequencing. The scheme is filtering data to remove low-quality and shot-read initial assemblies using variable software and compare to contigs, hybrid contigs using MIRA assembler, and contig ordering using SSPACE software to scaffold construction.
    Contigs, supplied by Solexa, used in various techniques. Bioz Stars score: 92/100, based on 79 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contigs/product/Solexa

    88
    Bioedit Company cap contig assembly program
    Integrated pipeline for de novo assembly of novel genome sequencing. The scheme is filtering data to remove low-quality and shot-read initial assemblies using variable software and compare to contigs, hybrid contigs using MIRA assembler, and contig ordering using SSPACE software to scaffold construction.
    Cap Contig Assembly Program, supplied by Bioedit Company, used in various techniques. Bioz Stars score: 88/100, based on 57 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/cap contig assembly program/product/Bioedit Company

    92
    CLC Bio contig
    A schematic representation of de novo assembling and coverage estimate of the 15,263 bp mitogenome of Bactericera cockerelli using MiSeq data. In the top Contig line, the blue and red arrows represent the forward and reversed primers (BC-mito-F/BC-mito-R) used to verify the circularity of B . cockerelli mitogenome by PCR. Numbers are nucleotides in bp. In the Coverage section, the pink area represents nucleotide coverage with the highest of 16,468 X in nad 6 and the lowest of 2,074 X in CR. In the Reads section, ten top read assemblings from MiSeq data were representatively shown with blue as pair reads, red as forward reads only and green as reversed reads only.
    Contig, supplied by CLC Bio, used in various techniques. Bioz Stars score: 92/100, based on 129 PubMed citations. ZERO BIAS - scores, article reviews, protocol conditions and more
    https://www.bioz.com/result/contig/product/CLC Bio

    Image Search Results


    Cumulative number of assembled nucleotides in contigs of different minimum lengths for ( a ) Link_ADI, ( b ) unClos_1, and ( c ) unFirm_1. Each line corresponds to a different sample (Link_ADI or eCI, where noted), sequencing method (HiSeq [HS] or PacBio [PB]), different assembly method (co-assembly across samples Link_ADI and eCI, hybrid using mapped reads from HiSeq and PacBio, or hybrid using contigs from HiSeq and PacBio), or assembly program (CAP3, IDBA_UD, MIRA, or SOAPdenovo).

    Journal: Scientific Reports

    Article Title: Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

    doi: 10.1038/srep25373

    Article Snippet: Using contig clustering and marker gene analysis of our PacBio contigs (because they are on average longer and contain greater marker gene representation including SSU rDNA fragments), we were able to generate phylotype-specific training data for the two most abundant organisms (unClos_1 and unFirm_1).

    Techniques: Sequencing
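
The cumulative-length curve described in the figure legend above can be derived from nothing more than a list of contig lengths. The following is a minimal Python sketch under that assumption, not the method used in the cited article; the function name `cumulative_bases_by_min_length`, the thresholds, and the example lengths are all hypothetical.

```python
from typing import Dict, Iterable, Sequence


def cumulative_bases_by_min_length(
    contig_lengths: Iterable[int], thresholds: Sequence[int]
) -> Dict[int, int]:
    """For each minimum length, sum the bases in contigs at least that long."""
    lengths = list(contig_lengths)
    return {t: sum(n for n in lengths if n >= t) for t in thresholds}


# Hypothetical contig lengths in bp (illustrative only, not data from the article).
example_lengths = [500, 1200, 3000, 8000, 15000]
curve = cumulative_bases_by_min_length(example_lengths, [0, 1000, 5000, 10000])

for min_len, total in curve.items():
    print(f"min length {min_len:>6} bp: {total} assembled bp")
```

Plotting one such curve per assembly would give a comparison of the kind the legend describes, with flatter curves indicating that more of the assembly sits in long contigs.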

    Visualization of GC %, coverage and size of assembled contigs generated from PacBio CCS ( a , b ) and HiSeq data ( c , d ) from a biogas reactor microbiome (Link_ADI). Contigs are coloured based on taxonomic binning that was performed using PhyloPythiaS+ under default settings ( a , c ) and after including custom phylotype-specific training data ( b , d ). Contig lengths are indicated by circle sizes. PacBio CCS contigs that contain marker genes and were used as training data for phylotype unClos_1 and unFirm_1 are outlined in black. For the purposes of clarity, only HiSeq contigs greater than 5 kb are represented ( c , d ).

    Journal: Scientific Reports

    Article Title: Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

    doi: 10.1038/srep25373

    Article Snippet: Using contig clustering and marker gene analysis of our PacBio contigs (because they are on average longer and contain greater marker gene representation including SSU rDNA fragments), we were able to generate phylotype-specific training data for the two most abundant organisms (unClos_1 and unFirm_1).

    Techniques: Generated, Marker
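
GC content, one of the quantities plotted in the legend above, is the share of G and C bases among the unambiguous bases of a contig. A minimal sketch, assuming contigs are available as plain DNA strings; `gc_percent` and the example sequences are hypothetical, and the article does not state how its values were computed.

```python
def gc_percent(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        # No unambiguous bases (for example, a run of N): report zero.
        return 0.0
    return 100.0 * gc / total


# Hypothetical contigs (illustrative only).
contigs = {"contig_1": "ATGCGC", "contig_2": "AATTNNGC"}

# Per-contig (length, GC %) pairs, the two sequence-derived axes of such a plot.
summary = {name: (len(seq), gc_percent(seq)) for name, seq in contigs.items()}
```

Coverage, the third quantity in the plot, depends on read mapping and is left out of this sketch.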

    Selected taxonomic bins generated via PhyloPythiaS+ binning using default settings with and without use of custom training data. Circle size indicates relative bin size; for complete binning information see Table S3 . The proportion of total DNA binned in the major phyla ( A ) represented in the Link_ADI microbiome was similar for both PacBio CCS and HiSeq contigs regardless of the use of training data. However, use of training data enhanced the recovery of unClos_1 and unFirm_1 ( B ) in both the PacBio and HiSeq assemblies. Differences between the sequencing methods were also evident at a species level where some abundant species assembled and binned better with PacBio ( Thermacetogenium phaeum , unClos_1, and unFirm_1), whereas others produced better results with HiSeq data ( Syntrophomonas wolfei and Methanosarcina barkeri ).

    Journal: Scientific Reports

    Article Title: Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

    doi: 10.1038/srep25373

    Article Snippet: Using contig clustering and marker gene analysis of our PacBio contigs (because they are on average longer and contain greater marker gene representation including SSU rDNA fragments), we were able to generate phylotype-specific training data for the two most abundant organisms (unClos_1 and unFirm_1).

    Techniques: Generated, Sequencing, Produced

    Visualization of GC %, coverage and size of assembled contigs generated from eCI HiSeq data. Sample eCI originated from a lab-scale enrichment grown on cellulose that was inoculated from Link_ADI. Contig lengths are indicated by circle sizes. Contigs are coloured based on phylogenetic binning that was performed using PhyloPythiaS+ under default settings ( a ) and PacBio-derived custom phylotype-specific training data ( b ). For the purposes of clarity, only HiSeq contigs greater than 5 kb are represented.

    Journal: Scientific Reports

    Article Title: Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

    doi: 10.1038/srep25373

    Article Snippet: Using contig clustering and marker gene analysis of our PacBio contigs (because they are on average longer and contain greater marker gene representation including SSU rDNA fragments), we were able to generate phylotype-specific training data for the two most abundant organisms (unClos_1 and unFirm_1).

    Techniques: Generated, Derivative Assay

    The statistics of PacBio long reads derived from TGS and corrected by LSC using Hiseq sequencing data. ( a ) PacBio corrected long reads supported by the Hiseq data with varied coverage. ( b ) Numbers distribution of PacBio long reads before and after correction using LSC. ( c ) Number of PacBio corrected long reads before and after duplication-removal using our developed methods. ( d ) The number distribution of PacBio corrected long reads and Hiseq assembled contigs at various lengths.

    Journal: Scientific Reports

    Article Title: Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome

    doi: 10.1038/srep43793

    Article Snippet: By contrast, use of the traditional RNA-Seq analysis method revealed that the longest assembled contigs mapping to the PacBio corrected reads may, or may not, truly be present in the final reference isoform library following clustering analysis of the assembled transcripts.

    Techniques: Derivative Assay, Sequencing

    The statistics of assembly contigs derived from SGS. ( a ) The number of assembled contigs by Trinity assembler using low threshold or default parameters. ( b ) Average and N50 length of contigs under the two different assembly parameters. ( c ) Distribution of the number of contigs at various lengths.

    Journal: Scientific Reports

    Article Title: Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome

    doi: 10.1038/srep43793

    Article Snippet: By contrast, use of the traditional RNA-Seq analysis method revealed that the longest assembled contigs mapping to the PacBio corrected reads may, or may not, truly be present in the final reference isoform library following clustering analysis of the assembled transcripts.

    Techniques: Derivative Assay
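    The caption above reports average and N50 contig lengths. As a minimal sketch of how those two summary statistics are commonly computed from a list of contig lengths (the function names `n50` and `mean_length` are illustrative and are not taken from the cited article or from Trinity):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("no contig lengths supplied")


def mean_length(lengths):
    """Return the arithmetic mean contig length."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no contig lengths supplied")
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    example = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical contig lengths
    print("N50:", n50(example))
    print("mean:", mean_length(example))
```

    Because N50 weights contigs by their length, it can differ substantially from the mean when an assembly contains many short contigs.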

    Extensive identification of gene alternative splicing patterns at a global level using the HySeMaFi method. ( a ) Clustering of the SGS assembled contigs (genes) mapped by the PacBio corrected long reads. ( b ) The statistics of the gene numbers with various alternative splicing forms. ( c,d ) Cases selected to verify the effectiveness of detecting the isoforms (alternatively spliced molecules) using the HySeMaFi method, as supported by the MiSeq data (see arrow).

    Journal: Scientific Reports

    Article Title: Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome

    doi: 10.1038/srep43793

    Article Snippet: By contrast, use of the traditional RNA-Seq analysis method revealed that the longest assembled contigs mapping to the PacBio corrected reads may, or may not, truly be present in the final reference isoform library following clustering analysis of the assembled transcripts.

    The statistics of mapping between PacBio corrected long reads of TGS and de novo assembled contigs of SGS. ( a ) The mapping between PacBio corrected long reads using LSC method and the assembled contigs using low threshold parameters at various identity rates. ( b ) The mapping between PacBio corrected long reads using ICE and Quiver software, and the assembled contigs using low threshold or default parameters. ( c ) The mapping between PacBio corrected long reads with a more than 99% corrected rate, and the assembled contigs using low threshold or default parameters. ( d ) Pie chart of mapped or unmapped analysis to the contigs as assembled in total from SGS by Trinity software with a 99% level mapping threshold. ( e ) The mapping between the existing annotated isoform sets and the contigs datasets assembled from SGS by de novo assembly in Arabidopsis . ( f ) The exon number difference between the existing annotated isoform sets and the contigs datasets assembled from SGS in Arabidopsis in many mapping.

    Journal: Scientific Reports

    Article Title: Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome

    doi: 10.1038/srep43793

    Article Snippet: By contrast, use of the traditional RNA-Seq analysis method revealed that the longest assembled contigs mapping to the PacBio corrected reads may, or may not, truly be present in the final reference isoform library following clustering analysis of the assembled transcripts.

    Techniques: Software

    Characterization of the root, flower, stem and leaf transcriptome, and illustrating different expressions of genes specifically elevated or depressed in roots by TGS using hybrid sequencing and map finding. ( a ) The higher expression of isoforms specifically in roots; ( b ) Heat map shows the expression of 639 genes; ( c ) The lower expression of genes specifically in roots; ( d ) Heat map shows the expression of 869 genes; ( e,f ) The existing isoforms were detected to be differentially expressed in the tested samples, and the mapped contigs had been removed in the clustering analysis in the traditional RNA-Seq analysis; ( g,h ) The existing isoforms were detected to be differentially expressed in the tested samples and the mapped contigs had also been found to be differentially expressed.

    Journal: Scientific Reports

    Article Title: Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome

    doi: 10.1038/srep43793

    Article Snippet: By contrast, use of the traditional RNA-Seq analysis method revealed that the longest assembled contigs mapping to the PacBio corrected reads may, or may not, truly be present in the final reference isoform library following clustering analysis of the assembled transcripts.

    Techniques: Sequencing, Expressing, RNA Sequencing Assay

    Alignment of the curated PacBio contigs to the AgamP4 PEST reference [ 21 ]. Alignments are colored by the primary PEST reference chromosome to which they align but are placed in the panel and Y offset to which the contig as a whole aligns best. Contig ends are denoted by horizontal lines in the assembly and vertical lines in PEST. However, there are many Ns in PEST not annotated as contig breaks so the percent Ns per megabase of PEST is overlaid (scale on the right Y axis). There are no Ns in the PacBio assembly.

    Journal: Genes

    Article Title: A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing

    doi: 10.3390/genes10010062

    Article Snippet: For example, a single contig from the new PacBio assembly expanded a tandem repeat region on chromosome 2L that in PEST was collapsed, while also filling in many Ns (gaps) in PEST, and also spanning a break between PEST scaffolds set to 10,000 Ns ( ).

    Alignment of X pericentromeric contigs to PEST, highlighting likely order and orientation issues in the PEST assembly that are resolved by a single PacBio contig (22F).

    Journal: Genes

    Article Title: A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing

    doi: 10.3390/genes10010062

    Article Snippet: For example, a single contig from the new PacBio assembly expanded a tandem repeat region on chromosome 2L that in PEST was collapsed, while also filling in many Ns (gaps) in PEST, and also spanning a break between PEST scaffolds set to 10,000 Ns ( ).

    Example of a compressed repeat in PEST that has been expanded by the PacBio assembly. Dotted vertical lines represent a gap in the PEST assembly (10,000 Ns) between scaffolds, which is now spanned by the single PacBio contig. Coverage plot of the PacBio subreads aligned to PEST (bottom) highlights the region where excess coverage indicates a collapsed repeat in PEST, in contrast the coverage of PacBio subreads aligned to the PacBio contig (left) is more uniform.

    Journal: Genes

    Article Title: A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing

    doi: 10.3390/genes10010062

    Article Snippet: For example, a single contig from the new PacBio assembly expanded a tandem repeat region on chromosome 2L that in PEST was collapsed, while also filling in many Ns (gaps) in PEST, and also spanning a break between PEST scaffolds set to 10,000 Ns ( ).

    Illumina single-end reads mapped against PacBio contigs. a Copy number variation along the 4 PacBio large contigs, as determined from Illumina coverage. Copy numbers (ordinates) were normalized to 1 (for haploids). b Copy number variation along the 4 PacBio large contigs, as determined by remapping of putative plasmid reads identified by plasmidSPAdes. Copy numbers (ordinates) were normalized to 1 (for haploids)

    Journal: Standards in Genomic Sciences

    Article Title: Short genome report of cellulose-producing commensal Escherichia coli 1094

    doi: 10.1186/s40793-018-0316-0

    Article Snippet: The resulting overlapping sequences were easily identified between the beginning and the end of each large contigs, suggesting that all four PacBio large contigs are circular.

    Techniques: Plasmid Preparation
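    The caption above describes copy numbers derived from read coverage and normalized to 1. The article's exact procedure is not given here, so the following is only a sketch under one common assumption: each window's coverage is divided by the median coverage across windows, so that single-copy regions sit near 1. The function name `normalise_coverage` and the example values are hypothetical.

```python
from statistics import median


def normalise_coverage(window_coverage):
    """Scale per-window coverage so that the median window equals 1.

    Assumption: the median window represents single-copy sequence.
    """
    values = list(window_coverage)
    if not values:
        raise ValueError("no coverage values supplied")
    baseline = median(values)
    if baseline == 0:
        raise ValueError("median coverage is zero; cannot normalise")
    return [value / baseline for value in values]


if __name__ == "__main__":
    # Hypothetical mean coverage in five consecutive windows of a contig.
    coverage = [10, 10, 20, 10, 30]
    print(normalise_coverage(coverage))
```

    Under this assumption, windows with values near 2 or 3 would suggest duplicated or multi-copy regions, such as plasmid-derived sequence.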

    Details of the selection process for field pea contigs. K - Kaspa transcriptome and P - Parafield transcriptome

    Journal: BMC Genomics

    Article Title: De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

    doi: 10.1186/s12864-015-1815-7

    Article Snippet: The multiple contigs that were assembled into unigene clusters may represent transcription variants, allelic variants, closely related paralogous sequences, misassembled transcripts, or transcripts that were fragmented due to low coverage [ ].

    Techniques: Selection

    Expression patterns in different tissue samples: ( a ) Number of contigs expressed in each tissue sample; ( b ) Percentage of shared and specific expression profiles of contigs expressed in Kaspa; ( c ) Percentage of shared and specific expression profiles of contigs expressed in Parafield; ( d ) Number of tissue-specific contigs. *For Parafield, stipule and leaflet tissue-derived read counts were merged, while Kaspa contributed only stipule tissue-derived reads

    Journal: BMC Genomics

    Article Title: De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

    doi: 10.1186/s12864-015-1815-7

    Article Snippet: The multiple contigs that were assembled into unigene clusters may represent transcription variants, allelic variants, closely related paralogous sequences, misassembled transcripts, or transcripts that were fragmented due to low coverage [ ].

    Techniques: Expressing, Derivative Assay

    Length distribution of contigs from the ( a ) Kaspa-specific assembly, and ( b ) Parafield-specific assembly

    Journal: BMC Genomics

    Article Title: De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

    doi: 10.1186/s12864-015-1815-7

    Article Snippet: The multiple contigs that were assembled into unigene clusters may represent transcription variants, allelic variants, closely related paralogous sequences, misassembled transcripts, or transcripts that were fragmented due to low coverage [ ].

    The distribution of field pea contigs against genes encoding enzymes involved in nitrogen metabolism pathways. This is a global nitrogen metabolism pathway map in which a red colour indicates genes identified in data from the present study, all of the known nitrogen metabolism genes in legumes having been identified

    Journal: BMC Genomics

    Article Title: De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

    doi: 10.1186/s12864-015-1815-7

    Article Snippet: The multiple contigs that were assembled into unigene clusters may represent transcription variants, allelic variants, closely related paralogous sequences, misassembled transcripts, or transcripts that were fragmented due to low coverage [ ].

    Sequence conservation of field pea contigs in comparison to sequences from other species ( a ) Percentage of sequence similarity of field pea contigs with nr, nt databases and sequences from other plant species; ( b ) Venn diagram summarising the distribution of BLASTN matches between the Kaspa transcriptome and sequences from three other legume genomes; ( c ) Venn diagram summarising the distribution of BLASTN matches between the Parafield transcriptome and sequences from three other legume genomes. Numbers within the Venn diagram indicate the number of sequences sharing similarity using BLASTN and the numbers within the parenthesis indicate the percentage of matches in terms of total numbers

    Journal: BMC Genomics

    Article Title: De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

    doi: 10.1186/s12864-015-1815-7

    Article Snippet: The multiple contigs that were assembled into unigene clusters may represent transcription variants, allelic variants, closely related paralogous sequences, misassembled transcripts, or transcripts that were fragmented due to low coverage [ ].

    Techniques: Sequencing

    Unrooted phylogram of selected GolS genes. Sequence alignments were done using ClustalW and the tree was constructed with the neighbor-joining method using the program MEGA6. The numbers on the tree branches indicate bootstrap results from neighbour-joining analyses of the branches. The sequences were obtained through NCBI: Ajuga reptans (CAB51533.1 – AjGolS1; CAB51534.1 – AjGolS2), Arabidopsis thaliana (NP_1822401.1 – AtGolS1; NP_176053.1 – AtGolS2; NP_172406.1 – AtGolS3; NP_176250.1 – AtGolS4; NP_197768.1 – AtGolS5; NP_567741.2 – AtGolS6; NP_176248.1 – AtGolS7), Brassica napus (ACJ15472.1 – BnGolS1), Capsicum annuum (ABQ44212.1 – CpaGolS), Coffea arabica (ADM92588.1 – CaGolS1; ADM92590.1 – CaGolS2; ADM92589.1 – CaGolS3), Coffea canephora (CcGolS1 – Contig 7664; Mondego et al. , 2011 ), Cucumis melo (AAL78687.1 – CmGolS1; AAL78686.1 – CmGolS2), Glycine max (AAM96867.1 – GmGolS), Gossypium hirsutum (AFG26331.1 – GhGolS1), Oryza sativa (Os.2677.1S1_at – OsGolS), Populus alba x Populus grandidentata (AEN74905.1 – PaxgGolS1; AEN74906.1 – PaxGolS2), Salvia miltiorrhiza (ACT34765.1 – SmGolS1; AEQ54920.1 – SmGolS2; AEQ54921.1 – SmGolS3), Verbascum phoeniceum (ABQ12640.1 – VpGolS1; ABQ12641.1 – VpGolS2) and Zea mays (AAQ07248.1 – ZmGolS1; AAQ07249.1 – ZmGolS2; AAQ07250.1 – ZmGolS3).

    Journal: Genetics and Molecular Biology

    Article Title: Galactinol synthase transcriptional profile in two genotypes of Coffea canephora with contrasting tolerance to drought

    doi: 10.1590/S1415-475738220140171

    Article Snippet: In silico analysis Searches for GolS sequences in the HarvEST:Coffea v.0.16 platform ( http://harvest.ucr.edu/ ) yielded only one assembled contig with full-length cDNA (Unigene 2798).

    Techniques: Sequencing, Construct

    Alignment of galactinol synthase amino acid sequences from Coffea canephora (CcGolS1 – Contig 7664) and Coffea arabica (ADM92588.1 – CaGolS1). The alignment was done using CLC Main Workbench v.5.0 software, with ClustalW default parameters. Asterisks indicate differences in amino acid residues (shaded in grey) at positions 139 and 180. The conserved glycosyltransferase domain is indicated by a black line above the amino acids (reviewed by Zhou et al. , 2012 ) and the C-terminal hydrophobic pentapeptide APSAA is boxed.

    Journal: Genetics and Molecular Biology

    Article Title: Galactinol synthase transcriptional profile in two genotypes of Coffea canephora with contrasting tolerance to drought

    doi: 10.1590/S1415-475738220140171

    Article Snippet: In silico analysis Searches for GolS sequences in the HarvEST:Coffea v.0.16 platform ( http://harvest.ucr.edu/ ) yielded only one assembled contig with full-length cDNA (Unigene 2798).

    Techniques: Software

    The results of a local BLAST between the 40,227 Unigene contigs allow the estimation of the reconstruction rate of 35,400 de novo contigs at the end of the assembly (iteration I6)

    Journal: BMC Research Notes

    Article Title: De novo construction of a “Gene-space” for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources

    doi: 10.1186/s13104-016-1903-z

    Article Snippet: The “filtration Step.1” was performed by mapping paired-end reads against the reference sequence file provided (either Unigene contigs for the 1st iteration or the extended sequence from the previous iteration).

    Assembly of the B73-Ab10 genome. a Whole-genome view. For each chromosome, the top to bottom tracks are gene density, Cinful-Zeon retrotransposon density, Gypsy superfamily retrotransposon density in 10 Kb sliding windows, repeat location (knob180 in blue, TR-1 in red, 45S rDNA in teal, CentC in magenta), and the distribution of gapless contigs. CENH3 ChIP-seq peaks identifying centromeres are marked by orange rectangles. The inset shows the centromere on chromosome 3, TR-1-rich knob on chromosome 4, and knob180-rich knob on chromosome 7. The five most common retroelement families are shown for each panel, along with centromeric retrotransposons (CRM) for the centromere. CENH3 enrichment in chromosome 3 is displayed in a heatmap. b The impact of assembly merging over a CentC-rich region on chromosome 9. Seven contigs (orange, above) from the PacBio assembly were originally misassembled, as can be seen in the alignment to the Bionano map (connecting lines show matching sites). CentC tracts and gaps are annotated. Assembly merging corrected the output, leaving an 11-Kb gap that was filled with nanopore reads. c Sequence alignment between normal chromosome 10 from B73 (N10) (140–152 Mb) and Ab10 (140–195 Mb) from B73-Ab10. Annotation is as in a , with Kindr genes marked with black bars in the top track. Links show homologous regions larger than 500 bp

    Journal: Genome Biology

    Article Title: Gapless assembly of maize chromosomes using long-read technologies

    doi: 10.1186/s13059-020-02029-9

    Article Snippet: Gaps that were complemented by Nanopore contigs were identified as gaps present in the PacBio assembly but absent in the final assembly.

    Techniques: Chromatin Immunoprecipitation, Sequencing

    Each pair of plots gives an overview of the comparisons of the quality of the assemblies across assemblers for E. coli and yeast datasets. a b : Histograms with error bars plotted between % of 2D reads and N50 value of an assembly show the variation in N50 value of an assembly among different assembler algorithms and how it varies with respect to the data size. c d : Histograms with error bars plotted between % of 2D reads and number of contigs generated from an assembly show how the number of contigs generated varies with respect to the mean contig length for each respective assembler algorithm across various bins of respective datasets. e f : Histograms showing the percentage of 2D reads employed on the X-axis versus the average length of the contigs obtained using each algorithm. g h : Histograms showing the sum of the lengths of all the contigs generated by an assembler as a function of the percentage of the total reads employed in the assembly. In each set of plots, the left panel corresponds to the E. coli dataset while the plots in the right panel correspond to the yeast dataset. In all the plots, labeled numeric values on histograms indicate corresponding values of the metric in the respective color representing each tool

    Journal: BMC Genomics

    Article Title: Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

    doi: 10.1186/s12864-016-2895-8

    Article Snippet: When the percentage of alignment was compared between the assemblers, the contigs generated by Celera and ABySS for the E. coli 2D read data showed 100 % alignment to the reference genome while the alignment percentage of the contigs generated by Velvet and SSAKE was found to be 80 % and 0 % respectively (see Fig. ).

    Techniques: Generated, Labeling

    Each pair of plots shows the accuracy of the assembly generated by various assembler algorithms for E. coli (Panels A and C) and yeast (Panels B and D) datasets. a b : Line graphs plotted between % of 2D reads and the % of genome covered, showing the extent of genome assembled by each assembler algorithm. c d : Line graphs between the % of 2D reads and % of alignment showing the confidence level of the contigs being assembled by various assembler algorithms

    Journal: BMC Genomics

    Article Title: Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches

    doi: 10.1186/s12864-016-2895-8

    Article Snippet: When the percentage of alignment was compared between the assemblers, the contigs generated by Celera and ABySS for the E. coli 2D read data showed 100 % alignment to the reference genome while the alignment percentage of the contigs generated by Velvet and SSAKE was found to be 80 % and 0 % respectively (see Fig. ).

    Techniques: Generated

    BlobPlot of the genome assembly before removing contamination. Each circle is a contig proportionally scaled by contig length and coloured by taxonomic annotation based on BLAST similarity search results. Contigs are positioned based on the GC content (X-axis) and the coverage of PacBio reads (Y-axis). There are some contigs of Proteobacteria origin at high GC and variable coverage indicating possible contamination. These contigs were removed from the assembly.

    Journal: Scientific Data

    Article Title: Genome assembly and annotation of Meloidogyne enterolobii, an emerging parthenogenetic root-knot nematode

    doi: 10.1038/s41597-020-00666-0

    Article Snippet: Contigs that had coverage only in one Illumina library, GC percentage outside of the range of the estimated M. enterolobii GC content, and affiliation to different taxa, were considered possible contaminants.

    Techniques:
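
    The contaminant criteria in the article snippet above (coverage in only one library, GC percentage outside the expected range, affiliation to a different taxon) can be expressed as a simple filter. This is a sketch under stated assumptions: the snippet is read as requiring all three criteria together, and the `Contig` record, the function names, the GC range and the taxon labels are hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record; field names are illustrative assumptions."""
    name: str
    sequence: str
    library_coverage: dict  # library name -> mean read depth
    taxon: str              # taxonomic label from a similarity search


def gc_percent(sequence):
    """GC percentage over called bases (A, C, G, T), ignoring N and gaps."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def is_possible_contaminant(contig, gc_range, expected_taxon):
    """Flag a contig only when all three criteria hold (an assumption)."""
    covered = [lib for lib, depth in contig.library_coverage.items() if depth > 0]
    single_library = len(covered) == 1

    low, high = gc_range
    gc_outlier = not (low <= gc_percent(contig.sequence) <= high)

    foreign_taxon = contig.taxon != expected_taxon
    return single_library and gc_outlier and foreign_taxon


if __name__ == "__main__":
    # Placeholder values only; the GC range is not the published estimate.
    suspect = Contig("c1", "GGCCGGCCGA", {"libA": 12.0, "libB": 0.0}, "Proteobacteria")
    print(is_possible_contaminant(suspect, gc_range=(25.0, 35.0), expected_taxon="Nematoda"))
```

    Switching the final `and` to `or` would give the looser reading in which any single criterion flags a contig.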

    Correlation analysis from assemblies. Hierarchical clustering given by the Spearman correlation matrix of the viral-bacterial (left) and the viral (right) assemblies (A) . The gradient indicates the strength and direction of the correlations. Blue squares represent significant negative correlations and red squares positive correlations ( P value ≤ 0.05). Green clusters are based on accurate short-length and low-assembled contigs while purple clusters are based on highly-assembled long-chimeric contigs. Ctgs: number of contigs, ChCtgs: percentage of chimeric contigs, ChCtgs.V.B: percentage of viral-bacterial chimeric contigs, GR: genomes recovered, IM: median of percentage of contig identity against its original genome, LC: Largest contig, N50, RA: percentage of reads assembled, ROG: percentage of reads assembled on their original genomes, RVB: percentage of reads within a viral-bacterial hit. Principal Component Analysis for the different assemblies in both metagenomes (B) : The two principal components that better explain variation between assemblies are shown for the viral-bacterial (left) and the viral (right) assemblies. The length of the red vectors represents the effect of each component on the stats. The distance between them indicates their correlation. Circles represent two clusters formed due to the close correlations in the matrices in panel A . C: Celera, M: Minimo, N: Newbler, Optimal: optimal assembly.

    Journal: BMC Genomics

    Article Title: Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut

    doi: 10.1186/1471-2164-15-37

    Article Snippet: The first one is characterized by their low percentage of chimeric contigs, high prevalence of the reads within a viral-bacterial hit and short contigs while the second is defined by their high percentage of reads assembled, as well as long and chimeric contigs.

    Techniques:

    Lowest Common Ancestor in chimeric contigs. The LCA of each chimeric contig is represented as a fraction of the total number of chimeric contigs on every viral and viral-bacterial metagenome assembly.

    Journal: BMC Genomics

    Article Title: Comparison of different assembly and annotation tools on analysis of simulated viral metagenomic communities in the gut

    doi: 10.1186/1471-2164-15-37

    Article Snippet: The first one is characterized by their low percentage of chimeric contigs, high prevalence of the reads within a viral-bacterial hit and short contigs while the second is defined by their high percentage of reads assembled, as well as long and chimeric contigs.

    Techniques:
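
    The legend above reports the lowest common ancestor (LCA) of each chimeric contig as a fraction of all chimeric contigs. One way to compute such a summary is sketched below, assuming each contig comes with the root-to-leaf lineages of its hits; the function names and the example lineages are illustrative assumptions, not the method of the cited article.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages.

    Falls back to "root" when the lineages diverge at the first rank.
    """
    if not lineages:
        raise ValueError("at least one lineage is required")

    lca = "root"
    # zip stops at the shortest lineage, so only shared depths are compared.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def lca_fractions(contig_lineages):
    """Map each LCA taxon to its fraction of the contigs supplied."""
    counts = Counter(
        lowest_common_ancestor(lineages) for lineages in contig_lineages.values()
    )
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Illustrative lineages only.
    example = {
        "contig_1": [["Viruses", "Caudovirales", "Siphoviridae"],
                     ["Viruses", "Caudovirales", "Myoviridae"]],
        "contig_2": [["Viruses", "Caudovirales", "Siphoviridae"],
                     ["Bacteria", "Firmicutes", "Bacilli"]],
    }
    print(lca_fractions(example))
```

    A contig whose hits span viruses and bacteria resolves to "root" here, which is how a viral-bacterial chimera would appear in such a tally.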

    Assembly and validation of the Drosophila miranda genome. A. Overview of the assembly pipeline. The steps include assembly of male PacBio reads followed by scaffolding using Hi-C, and extensive QC using BioNano reads and BAC clone sequencing followed by gene and repeat annotation. B. Hi-C linkage density map. Chromatin interaction maps allow recovery of entire chromosome arms. Note that the Y-linked contigs were scaffolded separately from X-linked and autosomal contigs. Unlinked regions with many contacts indicate repetitive regions. C. Comparison of the current (Dmir2.0) versus the old (Dmir1.0) D. miranda assembly. Note that the Y/neo-Y was not assembled in Dmir1.0, and the dot plot indicates homology between our neo-Y assembly and the neo-X. Other repeat-rich regions, such as the large pericentromeric block on AD, are also missing from Dmir1.0. D. BAC clone mapping for assembly verification. BAC clones are color coded according to how many genomic regions they map to in our assembly; green lines indicate stitch points of scaffolds based on Hi-C contacts, and the black line gives the local repeat content along the genome. Three hundred sixty-one sequenced BAC clones (97%) map contiguously and uniquely to our genome assembly. BAC, bacterial artificial chromosome; F, female; M, male; QC, quality control; Repeat %, local repeat content.

    Journal: PLoS Biology

    Article Title: De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture

    doi: 10.1371/journal.pbio.2006348

    Article Snippet: HybridScaffold was then used to produce hybrid maps from the BioNano contigs and the genomic scaffolds from our scaffolded PacBio assembly, and IrysView was used to visualize alignments of the BioNano contigs and genomic scaffolds to the hybrid ones. shows coverage of hybrid scaffolds by BioNano contigs and NGS contigs (genomic scaffolds).

    Techniques: Scaffolding, Hi-C, BAC Assay, Sequencing, Blocking Assay, Clone Assay

    Bayesian phylogenetic inference of CTV genomes and genome fragments. Unrooted consensus phylogenetic trees were obtained from 2,000,000 generations of the Markov chain Monte Carlo simulation in Bayesian analysis using a general time-reversible model of nucleotide substitution [33]. The number above each branch indicates the Bayesian posterior probability. The scale bars represent 0.1 expected substitutions per site. Branch lengths are proportional to evolutionary distance. Sequences were aligned using ClustalX [47] and subsequently manually aligned prior to the Bayesian phylogenetic analysis. A, Known CTV genomes and CTV genomes assembled from resequencing analysis of FS2-2 (highlighted in orange). The suffix at the end of fs2_2 distinguishes multiple genotypes in the isolate and also indicates the anchor sequence from which the consensus contig was generated by the Phrap program. B, the 5′ proximal 1 kb, and C, the p33-coding region of CTV genomes obtained by direct sequencing of RT-PCR clones. In both B and C, Bayesian posterior probabilities and clones with identical sequences were omitted for clarity. Recombinant sequences are highlighted in green.

    Journal: PLoS ONE

    Article Title: Persistent Infection and Promiscuous Recombination of Multiple Genotypes of an RNA Virus within a Single Host Generate Extensive Diversity

    doi: 10.1371/journal.pone.0000917

    Article Snippet: These fragments and the quality scores associated with each base call were used in contig assembly by the Phrap program as implemented in the CodonCode Aligner program.

    Techniques: Sequencing, Generated, Reverse Transcription Polymerase Chain Reaction, Clone Assay, Recombinant

    Distribution of contig sequences after BLASTx analysis. (A) Percentage of sequences related to the main categories of existing viruses: vertebrate (blue), plant/fungal (green), invertebrate (brown), protozoan (yellow) viruses and bacteriophages (gray), and unassigned viral sequences (no data available concerning the taxonomic family, indicated in red). The total number of viral contigs is indicated below each pie chart. (B) Percentage of sequences related to the most abundant viral families, indicated by the same color range for each of the main viral categories as in (A) : blue = vertebrate, green = plant/fungal, brown = invertebrate, yellow = protozoan, gray = phage. Viral families accounting for less than 2% of all sequences are pooled in the “other” category (in purple), and read sequences with no available data regarding the taxonomic family are considered to be unassigned (in red).

    Journal: PLoS ONE

    Article Title: A Preliminary Study of Viral Metagenomics of French Bat Species in Contact with Humans: Identification of New Mammalian Viruses

    doi: 10.1371/journal.pone.0087194

    Article Snippet: Larger viral contigs were obtained and their position in the genome determined, by pooling contig and read sequences for a given viral family for each sample, and assembling them with Sequencher 5.0 software (Gene Codes Corporation).

    Techniques:

    Integrated pipeline for de novo assembly in novel genome sequencing. The scheme comprises filtering the data to remove low-quality and short reads, producing initial assemblies with various software and comparing the resulting contigs, building hybrid contigs with the MIRA assembler, and ordering contigs with SSPACE software for scaffold construction.

    Journal: Genomics & Informatics

    Article Title: Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

    doi: 10.5808/GI.2012.10.1.1

    Article Snippet: Even though numerous contigs have been assembled from Illumina/Solexa data for eukaryotic genomes, few draft assembled genome sequences have been reported, except for the giant panda genome, whose assembled contigs (2.25 Gb) cover approximately 94% of the expected whole genome.

    Techniques: Sequencing, Software

    A schematic representation of the de novo assembly and coverage estimate of the 15,263 bp mitogenome of Bactericera cockerelli using MiSeq data. In the top Contig line, the blue and red arrows represent the forward and reverse primers (BC-mito-F/BC-mito-R) used to verify the circularity of the B. cockerelli mitogenome by PCR. Numbers are nucleotides in bp. In the Coverage section, the pink area represents nucleotide coverage, with the highest value of 16,468 X in nad6 and the lowest of 2,074 X in CR. In the Reads section, the ten top read assemblies from the MiSeq data are shown as representatives, with paired reads in blue, forward-only reads in red and reverse-only reads in green.

    Journal: PLoS ONE

    Article Title: The Complete Mitochondrial Genome Sequence of Bactericera cockerelli and Comparison with Three Other Psylloidea Species

    doi: 10.1371/journal.pone.0155318

    Article Snippet: Coverage was calculated by mapping to the contig using paired reads of the MiSeq data by CLC Genomics Workbench 7.5 (CLC Bio, Denmark), with the following parameters: mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 0.8, and similarity fraction = 0.9.

    Techniques: Polymerase Chain Reaction
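
    The highest and lowest coverage values quoted in the legend above are per-base depths derived from mapped reads. The sketch below shows one generic way to obtain such depths from alignment intervals; it does not reproduce CLC Genomics Workbench or its cost parameters, it treats the contig as linear, and the function name and interval format (0-based, half-open) are assumptions.

```python
def per_base_coverage(contig_length, alignments):
    """Return read depth at every position of a linear contig.

    `alignments` is an iterable of (start, end) pairs, 0-based and
    half-open, giving where each read maps on the contig.
    """
    # Difference array: +1 where a read starts, -1 just past its end.
    delta = [0] * (contig_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= contig_length:
            raise ValueError(f"alignment ({start}, {end}) is outside the contig")
        delta[start] += 1
        delta[end] -= 1

    coverage = []
    depth = 0
    # A running sum of the differences gives the depth at each base.
    for change in delta[:contig_length]:
        depth += change
        coverage.append(depth)
    return coverage


if __name__ == "__main__":
    # Illustrative intervals only, not MiSeq data.
    depths = per_base_coverage(10, [(0, 5), (3, 8), (4, 10)])
    print(depths, max(depths), min(depths))
```

    The difference-array approach costs one pass over the reads plus one pass over the contig, which keeps it practical at depths in the thousands.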