2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 1999
Ecogenomics and taxonomy of Cyanobacteria phylum

Juline M. Walter, Felipe H. Coutinho, Bas E. Dutilh, Jean Swings, Fabiano Thompson, and Cristiane C. Thompson (2017), "Ecogenomics and taxonomy of Cyanobacteria phylum", provisionally accepted in Frontiers in Microbiology (preprint in PeerJ Preprints). doi: 10.3389/fmicb.2017.02132.

Cyanobacteria are major contributors to global biogeochemical cycles. The genetic diversity among Cyanobacteria enables them to thrive across many habitats, although only a few studies have analysed the association of phylogenomic clades with specific environmental niches. In this study, we adopted an ecogenomics strategy with the aim of delineating ecological niche preferences of Cyanobacteria and integrating them into the genomic taxonomy of these bacteria. First, an appropriate phylogenomic framework was established using a set of genomic taxonomy signatures (including a tree based on conserved gene sequences, genome-to-genome distance, and average amino acid identity) to analyse ninety-nine publicly available cyanobacterial genomes. Next, the relative abundances of these genomes were determined throughout diverse global marine and freshwater ecosystems, using metagenomic data sets. The whole-genome-based taxonomy of the ninety-nine genomes allowed us to identify 57 cyanobacterial genera (of which 28 are new) and 87 species (of which 32 are new). The ecogenomic analysis allowed the distinction of three major ecological groups of Cyanobacteria (named i. Low Temperature; ii. Low Temperature Copiotroph; and iii. High Temperature Oligotroph) that were coherently linked to the genomic taxonomy. This work establishes a new taxonomic framework for Cyanobacteria in the light of genomic taxonomy and ecogenomic approaches.

Temporal dynamics of uncultured viruses: a new dimension in viral diversity

Ksenia Arkhipova, Timofey Skvortsov, John P. Quinn, John W. McGrath, Christopher C.R. Allen, Bas E. Dutilh, Yvonne McElarney, and Leonid A. Kulakov (2017) "Temporal dynamics of uncultured viruses: a new dimension in viral diversity", The ISME Journal. Pubmed, doi: 10.1038/ismej.2017.157.

Recent work has vastly expanded the known viral genomic sequence space, but the seasonal dynamics of viral populations at the genome level remain unexplored. Here we followed the viral community in a freshwater lake for 1 year using genome-resolved viral metagenomics, combined with detailed analyses of the viral community structure, associated bacterial populations and environmental variables. We reconstructed 8950 complete and partial viral genomes, the majority of which were not persistent in the lake throughout the year, but instead continuously succeeded each other. Temporal analysis of 732 viral genus-level clusters demonstrated that one-fifth were undetectable at specific periods of the year. Based on host predictions for a subset of reconstructed viral genomes, we for the first time reveal three distinct patterns of host–pathogen dynamics, where the viruses may peak before, during or after the peak in their host’s abundance, providing new possibilities for modelling of their interactions. Time series metagenomics opens up a new dimension in viral profiling, which is essential to understand the full scale of viral diversity and evolution, and the ecological roles of these important factors in the global ecosystem.
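The three timing patterns mentioned in this abstract (virus peaking before, during or after its host) can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names and the monthly abundance values below are made up for illustration, and the comparison simply looks at which sampling point holds each series' maximum.

```python
def peak_index(series):
    """Index of the first maximum in an abundance time series."""
    return max(range(len(series)), key=series.__getitem__)


def classify_timing(virus, host):
    """Label a virus-host pair by where the virus peak falls relative to the host peak."""
    lag = peak_index(virus) - peak_index(host)
    if lag < 0:
        return "before"
    if lag > 0:
        return "after"
    return "during"


# Hypothetical monthly relative abundances over one year (toy values, not from the paper).
host = [1, 2, 5, 9, 4, 2, 1, 1, 1, 1, 1, 1]
virus_early = [1, 6, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1]
virus_sync = [1, 1, 3, 8, 3, 1, 1, 1, 1, 1, 1, 1]
virus_late = [1, 1, 1, 2, 4, 7, 3, 1, 1, 1, 1, 1]

for name, virus in [("early", virus_early), ("sync", virus_sync), ("late", virus_late)]:
    print(name, classify_timing(virus, host))
```

A real analysis would need to account for sampling noise and multiple peaks, for example with lagged correlations rather than a single maximum.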

Preprint: Induction of differentiation and metabolic reprogramming in human hepatoma cells by adult human serum

Rineke H. Steenbergen, Martin Oti, Rob ter Horst, Wilson Tat, Chris Neufeldt, Alexandr Belovodskiy, Tiing Tiing Chua, Woo Jung Cho, Michael Joyce, Bas E. Dutilh, and D. Lorne Tyrrell (2017), "Induction of differentiation and metabolic reprogramming in human hepatoma cells by adult human serum", bioRxiv, doi: 10.1101/180968.

Tissue culture medium routinely contains fetal bovine serum (FBS). Here we show that culturing human hepatoma cells in their native, adult serum (human serum, HS) results in the restoration of key morphological and metabolic features of normal liver cells. When moved to HS, these cells show differential transcription of 22-32% of the genes, stop proliferating, and assume a hepatocyte-like morphology. Metabolic analysis shows that the Warburg-like metabolic profile, typical for FBS-cultured cells, is replaced by a diverse metabolic profile consistent with in vivo hepatocytes. We demonstrate the formation of large lipid and glycogen stores, increased glycogenesis, increased β-oxidation, increased ketogenesis, and decreased glycolysis. Finally, organ-specific functions are restored, including xenobiotics degradation and secretion of bile, very low density lipoprotein, and albumin. Thus, organ-specific functions are not necessarily lost in cell cultures, but might be merely suppressed in FBS. Together, we show that cells that are representative of normal physiology can be produced from cancer cells simply by replacing FBS with HS in culture media. The effect of serum is often overlooked in cell culture. We provide a detailed study of the changes that occur, provide insight into some of the serum components that may play a role in the establishment of the different phenotypes, and discuss how these findings might be beneficial to a variety of research fields.

Editorial: Virus discovery by metagenomics: the (im)possibilities

Bas E. Dutilh, Alejandro Reyes, Richard J. Hall, and Katrine L. Whiteson (2017) "Editorial: Virus discovery by metagenomics: the (im)possibilities", Frontiers in Microbiology. Pubmed, doi: 10.3389/fmicb.2017.01710.

This Frontiers in Virology Research Topic showcases how metagenomic and bioinformatic approaches have been combined to discover, classify and characterize novel viruses. Since the late 1800s, the discovery of new viruses was a gradual process. Viruses were described one by one using a suite of techniques such as (electron) microscopy and viral culture. Investigators were usually interested in a disease state within an organism, and expeditions in viral ecology were rare. The advent of metagenomics using high-throughput sequencing has revolutionized not only the rate of virus discovery, but also the nature of the discoveries. For example, the viral ecology and etiology of many human diseases are being characterized, non-pathogenic viral commensals are ubiquitous, and the description of environmental viromes is making progress.

Preservation of bacterial DNA in 10-year-old guaiac FOBT cards and FIT tubes

Matheus C.F. Albuquerque, Yasmijn van Herwaarden, Guus A.M. Kortman, Bas E. Dutilh, Tanya Bisseling, and Annemarie Boleij (2017) "Preservation of bacterial DNA in 10-year-old guaiac FOBT cards and FIT tubes", Journal of Clinical Pathology. Pubmed, doi: 10.1136/jclinpath-2017-204592.

With great interest we read the article of Taylor et al in the Journal of Clinical Pathology regarding the use of guaiac faecal occult blood test (gFOBT) cards for microbiome studies. gFOBT cards were found to be an easy-to-use option for stool collection and yielded results comparable to fresh stool, even when cards were stored for up to 3 years at ambient temperature before DNA extraction. We would like to share our experience that even after 10 years of storage, gFOBT cards and faecal immunochemical test (FIT) tubes can be used to study the microbiome.

Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans

Felipe H. Coutinho, Cynthia B. Silveira, Gustavo B. Gregoracci, Cristiane C. Thompson, Robert A. Edwards, Corina P.D. Brussaard, Bas E. Dutilh*, and Fabiano L. Thompson* (2017) "Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans", Nature Communications 8: 15955. Pubmed, doi: 10.1038/ncomms15955. *Authors contributed equally.

Marine viruses are key drivers of host diversity, population dynamics and biogeochemical cycling and contribute to the daily flux of billions of tons of organic matter. Despite recent advancements in metagenomics, much of their biodiversity remains uncharacterized. Here we report a data set of 27,346 marine virome contigs that includes 44 complete genomes. These outnumber all currently known phage genomes in marine habitats and include members of previously uncharacterized lineages. We designed a new method for host prediction based on co-occurrence associations that reveals these viruses infect dominant members of the marine microbiome such as Prochlorococcus and Pelagibacter. A negative association between host abundance and the virus-to-host ratio supports the recently proposed Piggyback-the-Winner model of reduced phage lysis at higher host densities. An analysis of the abundance patterns of viruses throughout the oceans revealed how marine viral communities adapt to various seasonal, temperature and photic regimes according to targeted hosts and the diversity of auxiliary metabolic genes.
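The negative association between host abundance and virus-to-host ratio described here can be sketched as a rank correlation. The following Python example is only an illustration under stated assumptions: the abundance values are hypothetical, the helper names are invented for this sketch, and Spearman correlation is one plausible choice of statistic rather than the method confirmed by the paper.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances per sample (toy values, not from the paper).
host_abundance = [1e4, 5e4, 1e5, 5e5, 1e6]
virus_abundance = [2e5, 6e5, 8e5, 2e6, 3e6]

# Virus-to-host ratio per sample.
vhr = [v / h for v, h in zip(virus_abundance, host_abundance)]
rho = spearman(host_abundance, vhr)
print("Spearman rho between host abundance and VHR:", round(rho, 3))
```

In this toy data both virus and host counts rise, but the ratio falls as hosts become denser, which is the qualitative pattern the Piggyback-the-Winner model predicts.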

Characterization and temperature dependence of arctic Micromonas polaris viruses

Douwe S. Maat*, Tristan Biggs*, Claire Evans, Judith D.L. van Bleijswijk, Nicole N. van der Wel, Bas E. Dutilh, and Corina P.D. Brussaard (2017) "Characterization and temperature dependence of arctic Micromonas polaris viruses", Viruses 9: 134. Pubmed, doi: 10.3390/v9060134. *Authors contributed equally.

Global climate change-induced warming of the Arctic seas is predicted to shift the phytoplankton community towards dominance of smaller-sized species. Yet, little is known about their viral mortality agents despite the ecological importance of viruses regulating phytoplankton host dynamics and diversity. Here we report the isolation and basic characterization of four prasinoviruses infectious to the common Arctic picophytoplankter Micromonas. We furthermore assessed how temperature influenced viral infectivity and production. Phylogenetic analysis indicated that the putative double-stranded DNA (dsDNA) Micromonas polaris viruses (MpoVs) are prasinoviruses (Phycodnaviridae) of approximately 120 nm in particle size. One MpoV showed intrinsic differences to the other three viruses, i.e., larger genome size (205 ± 2 vs. 191 ± 3 Kb), broader host range, and longer latent period (39 vs. 18 h). Temperature increase shortened the latent periods (up to 50%), increased the burst size (up to 40%), and affected viral infectivity. However, the variability in response to temperature was high for the different viruses and host strains assessed, likely affecting the Arctic picoeukaryote community structure both in the short term (seasonal cycles) and long term (global warming).

Insights of phage-host interaction in hypersaline ecosystem through metagenomics analyses

Amir Mohaghegh Motlagh, Ananda S. Bhattacharjee, Felipe Hernandes Coutinho, Bas E. Dutilh, Sherwood R. Casjens, and Ramesh K. Goel (2017) "Insights of phage-host interaction in hypersaline ecosystem through metagenomics analyses", Frontiers in Microbiology 8: 352. Pubmed, doi: 10.3389/fmicb.2017.00352.

Bacteriophages, as the most abundant biological entities on Earth, place significant predation pressure on their hosts. This pressure plays a critical role in the evolution, diversity, and abundance of bacteria. In addition, phages modulate the genetic diversity of prokaryotic communities through the transfer of auxiliary metabolic genes. Various studies have been conducted in diverse ecosystems to understand phage-host interactions and their effects on prokaryote metabolism and community composition. However, hypersaline environments remain among the least studied ecosystems and the interaction between the phages and prokaryotes in these habitats is poorly understood. This study begins to fill this knowledge gap by analyzing bacteriophage-host interactions in the Great Salt Lake, the largest prehistoric hypersaline lake in the Western Hemisphere. Our metagenomics analyses allowed us to comprehensively identify the bacterial and phage communities with Proteobacteria, Firmicutes, and Bacteroidetes as the most dominant bacterial species and Siphoviridae, Myoviridae, and Podoviridae as the most dominant viral families found in the metagenomic sequences. We also characterized interactions between the phage and prokaryotic communities of Great Salt Lake and determined how these interactions possibly influence the community diversity, structure, and biogeochemical cycles. In addition, the presence of prophages and their interaction with the prokaryotic host were studied, showing the possibility of prophage induction and subsequent infection of the prokaryotic community present in the Great Salt Lake environment under different environmental stress factors. We found that the carbon cycle was the nutrient cycling pathway most susceptible to prophage induction in the presence of environmental stresses. This study gives an enhanced snapshot of phage and prokaryote abundance and diversity as well as their interactions in a complex hypersaline ecosystem, which can pave the way for further research.

Principles and Trends in Genomics and Computational Biology

The course "Principles and Trends in Genomics and Computational Biology" is a first collaborative e-learning project involving the Oswaldo Cruz Foundation (Brazil) and the Institut Pasteur (France). The project was conceived by Fiocruz and Pasteur researchers Carolina Mizuno, Sara Cuadros, Fabiano Pais and Victor Pylro.

Recent advances in science are leading to a revision and reorientation of methods, allowing old and current issues to be addressed from a new perspective. Next-generation sequencing, metagenomics, metatranscriptomics and all other “omics” are permitting a comparative analysis of biological systems, generating a large quantity of data and findings. Despite this progress, these technologies have developed faster than our ability to analyze these large amounts of data. To overcome this problem, the course will enable students to gain a working knowledge of various facets of molecular and computational biology, including genome structure and organization and an introduction to Linux, as well as more advanced topics such as transcriptomics and proteomics.
The course will comprise four modules, offered independently. Each module will contain several lessons and will last one week. The modules will consist of text, video classes and monitored activities.
In addition, a final module with state-of-the-art talks related to the course theme will be used to inspire students looking for new, challenging projects in the area.

Draft genome of Scalindua rubra, obtained from the interface above the Discovery Deep Brine in the Red Sea, sheds light on potential salt adaptation strategies in anammox bacteria

Daan R. Speth, Ilias Lagkouvardos, Yong Wang, Pei-Yuan Qian, Bas E. Dutilh, and Mike S. M. Jetten (2017) "Draft genome of Scalindua rubra, obtained from the interface above the Discovery Deep Brine in the Red Sea, sheds light on potential salt adaptation strategies in anammox bacteria", Microbial Ecology: 10.1007/s00248-017-0929-7. Pubmed, doi: 10.1007/s00248-017-0929-7.

Several recent studies have indicated that members of the phylum Planctomycetes are abundantly present at the brine-seawater interface (BSI) above multiple brine pools in the Red Sea. Planctomycetes include bacteria capable of anaerobic ammonium oxidation (anammox). Here, we investigated the possibility of anammox at BSI sites using metagenomic shotgun sequencing of DNA obtained from the BSI above the Discovery Deep brine pool. Analysis of sequencing reads matching the 16S rRNA and hzsA genes confirmed presence of anammox bacteria of the genus Scalindua. Phylogenetic analysis of the 16S rRNA gene indicated that this Scalindua sp. belongs to a distinct group, separate from the anammox bacteria in the seawater column, that contains mostly sequences retrieved from high-salt environments. Using coverage- and composition-based binning, we extracted and assembled the draft genome of the dominant anammox bacterium. Comparative genomic analysis indicated that this Scalindua species uses compatible solutes for osmoadaptation, in contrast to other marine anammox bacteria that likely use a salt-in strategy. We propose the name Candidatus Scalindua rubra for this novel species, alluding to its discovery in the Red Sea.

Taxonomy of prokaryotic viruses: 2016 update from the ICTV bacterial and archaeal viruses subcommittee

Evelien M. Adriaenssens, Mart Krupovic, Petar Knezevic, Hans-Wolfgang Ackermann, Jakub Barylski, J. Rodney Brister, Martha R. C. Clokie, Siobain Duffy, Bas E. Dutilh, Robert A. Edwards, Francois Enault, Ho Bin Jang, Jochen Klumpp, Andrew M. Kropinski, Rob Lavigne, Minna M. Poranen, David Prangishvili, Janis Rumnieks, Matthew B. Sullivan, Johannes Wittmann, Hanna M. Oksanen, Annika Gillis, Jens H. Kuhn (2016), "Taxonomy of prokaryotic viruses: 2016 update from the ICTV bacterial and archaeal viruses subcommittee", Archives of Virology. Pubmed, PDF, doi: 10.1007/s00705-016-3173-4.

Ultrastructure and viral metagenome of bacteriophages from an anaerobic methane oxidizing Methylomirabilis bioreactor enrichment culture

Lavinia Gambelli, Geert Cremers, Rob Mesman, Simon Guerrero, Bas E. Dutilh, Mike S. Jetten, Huub J. Op den Camp, and Laura van Niftrik (2016), "Ultrastructure and viral metagenome of bacteriophages from an anaerobic methane oxidizing Methylomirabilis bioreactor enrichment culture", Frontiers in Microbiology 7: 1740. Pubmed, doi: 10.3389/fmicb.2016.01740.

With its capacity for anaerobic methane oxidation and denitrification, the bacterium Methylomirabilis oxyfera plays an important role in natural ecosystems. Its unique physiology can be exploited for more sustainable wastewater treatment technologies. However, operational stability of full-scale bioreactors can experience setbacks due to, for example, bacteriophage blooms. By shaping microbial communities through mortality, horizontal gene transfer and metabolic reprogramming, bacteriophages are important players in most ecosystems. Here, we analysed an infected Methylomirabilis sp. bioreactor enrichment culture using (advanced) electron microscopy, viral metagenomics and bioinformatics. Electron micrographs revealed four different viral morphotypes, one of which was observed to infect Methylomirabilis cells. The infected cells contained densely packed ~55 nm icosahedral bacteriophage particles with a putative internal membrane. Various stages of virion assembly were observed. Moreover, during the bacteriophage replication, the host cytoplasmic membrane appeared extremely patchy, which suggests that the bacteriophages may use host bacterial lipids to build their own putative internal membrane. The viral metagenome contained 1.87 million base pairs of assembled viral sequences, from which five putative complete viral genomes were assembled and manually annotated. Using bioinformatics analyses, we could not identify which viral genome belonged to the Methylomirabilis-infecting bacteriophage, in part because the obtained viral genome sequences were novel and unique to this reactor system. Taken together, these results show that new bacteriophages can be detected in anaerobic cultivation systems and that the effect of bacteriophages on the microbial community in these systems is a topic for further study.

Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses

Simon Roux, Jennifer R. Brum, Bas E. Dutilh, Shinichi Sunagawa, Melissa B. Duhaime, Alexander Loy, Bonnie T. Poulos, Natalie Solonenko, Elena Lara, Julie Poulain, Stéphane Pesant, Stefanie Kandels-Lewis, Céline Dimier, Marc Picheral, Sarah Searson, Corinne Cruaud, Adriana Alberti, Carlos M. Duarte, Josep M. Gasol, Dolors Vaqué, Peer Bork, Silvia G. Acinas, Patrick Wincker, and Matthew B. Sullivan for Tara Oceans Coordinators (2016), "Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses", Nature 537: 689–693. Pubmed, doi: 10.1038/nature19366.

Ocean microbes drive biogeochemical cycling on a global scale. However, this cycling is constrained by viruses that affect community composition, metabolic activity, and evolutionary trajectories. Owing to challenges with the sampling and cultivation of viruses, genome-level viral diversity remains poorly described and grossly understudied, with less than 1% of observed surface-ocean viruses known. Here we assemble complete genomes and large genomic fragments from both surface- and deep-ocean viruses sampled during the Tara Oceans and Malaspina research expeditions, and analyse the resulting ‘global ocean virome’ dataset to present a global map of abundant, double-stranded DNA viruses complete with genomic and ecological contexts. A total of 15,222 epipelagic and mesopelagic viral populations were identified, comprising 867 viral clusters (defined as approximately genus-level groups). This roughly triples the number of known ocean viral populations and doubles the number of candidate bacterial and archaeal virus genera, providing a near-complete sampling of epipelagic communities at both the population and viral-cluster level. We found that 38 of the 867 viral clusters were locally or globally abundant, together accounting for nearly half of the viral populations in any global ocean virome sample. While two-thirds of these clusters represent newly described viruses lacking any cultivated representative, most could be computationally linked to dominant, ecologically relevant microbial hosts. Moreover, we identified 243 viral-encoded auxiliary metabolic genes, of which only 95 were previously known. Deeper analyses of four of these auxiliary metabolic genes (dsrC, soxYZ, P-II (also known as glnB) and amoC) revealed that abundant viruses may directly manipulate sulfur and nitrogen cycling throughout the epipelagic ocean. This viral catalog and functional analyses provide a necessary foundation for the meaningful integration of viruses into ecosystem models where they act as key players in nutrient cycling and trophic networks.

News on Universiteit Utrecht (Nederlands/English), Ohio State University, University of Michigan, Universität Wien (Deutsch), Division of Microbial Ecology, Moore Foundation, Apa Science (Deutsch), BioPortfolio, Focus.it (Italiano), Krone.at (Deutsch), Kurier (Deutsch), Laboratory Equipment, Natural Science News, Nature World News, NRC (Nederlands), Phys.org, Science Daily, The Science Explorer, Science Orf (Deutsch), The Scientist, Seeker, Der Standard (Deutsch), Tendencias21 (Español), Vista al Mar (Español).

Preprint: Bottom-up ecology of the human microbiome: from metagenomes to metabolomes

Daniel R. Garza, Marcel C. Van Verk, Martijn A. Huynen, and Bas E. Dutilh (2016), "Bottom-up ecology of the human microbiome: from metagenomes to metabolomes", bioRxiv, doi: 10.1101/060673.

The environmental metabolome is a dominant and essential factor shaping microbial communities. Thus, we hypothesized that metagenomic datasets could reveal the quantitative metabolic status of a given sample. Using a newly developed bottom-up ecology algorithm, we predicted high-resolution metabolomes of hundreds of metagenomic datasets from the human microbiome, revealing body-site specific metabolomes consistent with known metabolomics data, and suggesting that common cosmetics ingredients are some of the major metabolites shaping the human skin microbiome.

Proposal of fifteen new species of Parasynechococcus based on genomic, physiological and ecological features

Felipe H. Coutinho, Bas E. Dutilh, Cristiane C. Thompson, and Fabiano L. Thompson (2016), "Proposal of fifteen new species of Parasynechococcus based on genomic, physiological and ecological features", Archives of Microbiology 198: 973-986. Pubmed, doi: 10.1007/s00203-016-1256-y.

Members of the recently proposed genus Parasynechococcus (Cyanobacteria) are extremely abundant throughout the global ocean and contribute significantly to global primary productivity. However, the taxonomy of these organisms remains poorly characterized. The aim of this study was to propose a new taxonomic framework for Parasynechococcus based on a genomic taxonomy approach that incorporates genomic, physiological and ecological data. Through in silico DNA–DNA hybridization, average amino acid identity, dinucleotide signatures and phylogenetic reconstruction, a total of 15 species of Parasynechococcus could be delineated. Each species was then described on the basis of their gene content, light and nutrient utilization strategies, geographical distribution patterns throughout the oceans and response to environmental parameters.

Bioinformatics for studying environmental microorganisms

Adriana M. Fróes and Bas E. Dutilh (2016), "Bioinformatics for studying environmental microorganisms". In: Molecular Diversity of Environmental Prokaryotes. Eds. Thiago B. Rodrigues and Amaro E. Trindade Silva. CRC Press.

Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

Daan R. Speth, Michiel H. in ’t Zandt, Simon Guerrero-Cruz, Bas E. Dutilh*, and Mike S.M. Jetten* (2016), "Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system", Nature Communications 7: 11172. Pubmed, doi: 10.1038/ncomms11172. *Authors contributed equally. News on Bionieuws (Dutch), Water Online, Radboud University (English/Dutch).

Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

Preprint: FOCUS2: agile and sensitive classification of metagenomics data using a reduced database

Genivaldo Silva, Bas Dutilh, Robert Edwards (2016), "FOCUS2: agile and sensitive classification of metagenomics data using a reduced database", bioRxiv, doi: 10.1101/046425.

Summary: Metagenomics approaches rely on identifying the presence of organisms in the microbial community from a set of unknown DNA sequences. Sequence classification has valuable applications in multiple important areas of medical and environmental research. Here we introduce FOCUS2, an update of the previously published computational method FOCUS. FOCUS2 was tested with 10 simulated and 543 real metagenomes demonstrating that the program is more sensitive, faster, and more computationally efficient than existing methods. Availability: The Python implementation is freely available at https://edwards.sdsu.edu/FOCUS2.

Computational pan-genomics: status, promises and challenges

The Computational Pan-Genomics Consortium: Tobias Marschall, Manja Marz, Thomas Abeel, Louis Dijkstra, Bas E. Dutilh, Ali Ghaffaari, Paul Kersey, Wigard P. Kloosterman, Veli Mäkinen, Adam M. Novak, Benedict Paten, David Porubsky, Eric Rivals, Can Alkan, Jasmijn A. Baaijens, Paul I. W. De Bakker, Valentina Boeva, Raoul J. P. Bonnal, Francesca Chiaromonte, Rayan Chikhi, Francesca D. Ciccarelli, Robin Cijvat, Erwin Datema, Cornelia M. Van Duijn, Evan E. Eichler, Corinna Ernst, Eleazar Eskin, Erik Garrison, Mohammed El-Kebir, Gunnar W. Klau, Jan O. Korbel, Eric-Wubbo Lameijer, Benjamin Langmead, Marcel Martin, Paul Medvedev, John C. Mu, Pieter Neerincx, Klaasjan Ouwens, Pierre Peterlongo, Nadia Pisanti, Sven Rahmann, Ben Raphael, Knut Reinert, Dick de Ridder, Jeroen de Ridder, Matthias Schlesner, Ole Schulz-Trieglaff, Ashley D. Sanders, Siavash Sheikhizadeh, Carl Shneider, Sandra Smit, Daniel Valenzuela, Jiayin Wang, Lodewyk Wessels, Ying Zhang, Victor Guryev, Fabio Vandin, Kai Ye, and Alexander Schönhuth (2016), "Computational pan-genomics: status, promises and challenges", Briefings in Bioinformatics, doi: 10.1093/bib/bbw089.

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

Microbial metabolism shifts towards an adverse profile with supplementary iron in the TIM-2 in vitro model of the human colon

Guus A.M. Kortman, Bas E. Dutilh, Annet J.H. Maathuis, Udo F. Engelke, Jos Boekhorst, Kevin P. Keegan, Fiona G.G. Nielsen, Jason Betley, Jacqueline C. Weir, Zoya Kingsbury, Leo A.J. Kluijtmans, Dorine W. Swinkels, Koen Venema, and Harold Tjalsma (2016), "Microbial metabolism shifts towards an adverse profile with supplementary iron in the TIM-2 in vitro model of the human colon", Frontiers in Microbiology 6: 1481. Pubmed, PDF, doi: 10.3389/fmicb.2015.01481.

Oral iron administration in African children can increase the risk for infections. However, it remains unclear to what extent supplementary iron affects the intestinal microbiome. We here explored the impact of iron preparations on microbial growth and metabolism in TNO's well-controlled in vitro model of the large intestine (TIM-2). The model was inoculated with a human microbiota, without supplementary iron, or with 50 or 250 µmol/L ferrous sulfate, 50 or 250 µmol/L ferric citrate, or 50 µmol/L hemin. High resolution responses of the microbiota were examined by 16S rDNA pyrosequencing, microarray analysis, and metagenomic sequencing. The metabolome was assessed by fatty acid quantification, gas chromatography-mass spectrometry (GC-MS) and 1H-NMR spectroscopy. Cultured intestinal epithelial Caco-2 cells were used to assess fecal water toxicity. Microbiome analysis showed, among other changes, that supplementary iron induced decreased levels of Bifidobacteriaceae and Lactobacillaceae, while it caused higher levels of Roseburia and Prevotella. Metagenomic analyses showed an enrichment of microbial motility-chemotaxis systems, while the metabolome markedly changed from a saccharolytic to a proteolytic profile in response to iron. Branched chain fatty acids and ammonia levels increased significantly, in particular with ferrous sulfate. Importantly, the metabolite-containing effluent from iron-rich conditions showed increased cytotoxicity to Caco-2 cells. Our explorations indicate that in the absence of host influences, iron induces a more hostile environment characterized by a reduction of microbes that are generally beneficial, and increased levels of bacterial metabolites that can impair the barrier function of a cultured intestinal epithelial monolayer.

Taxonomy of prokaryotic viruses: update from the ICTV bacterial and archaeal viruses subcommittee

Mart Krupovic, Bas E. Dutilh, Evelien M. Adriaenssens, Johannes Wittmann, Finn K. Vogensen, Mathew B. Sullivan, Janis Rumnieks, David Prangishvili, Rob Lavigne, Andrew M. Kropinski, Jochen Klumpp, Annika Gillis, Francois Enault, Rob A. Edwards, Siobain Duffy, Martha R.C. Clokie, Jakub Barylski, Hans-Wolfgang Ackermann, and Jens H. Kuhn (2016), "Taxonomy of prokaryotic viruses: update from the ICTV bacterial and archaeal viruses subcommittee", Archives of Virology 161: 1095-1099. Pubmed, PDF, doi: 10.1007/s00705-015-2728-0. Prophage Blog.

Rob Lavigne, Takashi Yamada, Johannes Wittmann, Finn K. Vogensen, Mathew B. Sullivan, Janis Rumnieks, David Prangishvili, Jens H. Kuhn, Mart Krupovic, Andrew M. Kropinski, Jochen Klumpp, Annika Gillis, Francois Enault, Rob A. Edwards, Bas E. Dutilh, Siobain Duffy, Martha R.C. Clokie, Jakub Barylski, Hans-Wolfgang Ackermann, Evelien M. Adriaenssens (2015), "The Taxonomy of Bacterial & Archaeal Viruses: An Update from the International Committee on Taxonomy of Viruses", Evergreen Phage Meeting 2015, Evergreen, Washington, USA.

Computational approaches to predict bacteriophage-host relationships

Robert A. Edwards, Katelyn McNair, Karoline Faust, Jeroen Raes, and Bas E. Dutilh (2016), "Computational approaches to predict bacteriophage-host relationships", FEMS Microbiology Reviews fuv048, 40: 258–272. Pubmed, doi: 10.1093/femsre/fuv048. Editor's Choice.

Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus-host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage-host signals. Sequence homology approaches are the most effective at identifying known phage-host pairs. Compositional and abundance-based methods contain significant signal for phage-host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage-host relationships, with potential relevance for medical and industrial applications.

Dispersion of the HIV-1 epidemic in men who have sex with men in The Netherlands: a combined mathematical model and phylogenetic analysis

Daniela Bezemer, Anne Cori, Oliver Ratmann, Ard van Sighem, Hillegonda S. Hermanides, Bas E. Dutilh, Luuk Gras, Nuno Rodrigues Faria, Rob van den Hengel, Ashley J. Duits, Peter Reiss, Frank de Wolf, Christophe Fraser, and ATHENA observational cohort (2015), "Dispersion of the HIV-1 epidemic in men who have sex with men in The Netherlands: a combined mathematical model and phylogenetic analysis", PLoS Medicine 12: e1001898. Pubmed, PDF, doi: 10.1371/journal.pmed.1001898. News on NRC Medical News Today, Nederlands Dagblad, Seksoa, HIV Monitoring, Internazionale (Italiano), MedicalXpress, Science20.

Background The HIV-1 subtype B epidemic amongst men who have sex with men (MSM) is resurgent in many countries despite the widespread use of effective combination antiretroviral therapy (cART). In this combined mathematical and phylogenetic study of observational data, we aimed to find out the extent to which the resurgent epidemic is the result of newly introduced strains or of growth of already circulating strains. Methods and Findings As of November 2011, the ATHENA observational HIV cohort of all patients in care in the Netherlands since 1996 included HIV-1 subtype B polymerase sequences from 5,852 patients. Patients who were diagnosed between 1981 and 1995 were included in the cohort if they were still alive in 1996. The ten most similar sequences to each ATHENA sequence were selected from the Los Alamos HIV Sequence Database, and a phylogenetic tree was created of a total of 8,320 sequences. Large transmission clusters that included >=10 ATHENA sequences were selected, with a local support value >=0.9 and median pairwise patristic distance below the fifth percentile of distances in the whole tree. Time-varying reproduction numbers of the large MSM-majority clusters were estimated through mathematical modeling. We identified 106 large transmission clusters, including 3,061 (52%) ATHENA and 652 Los Alamos sequences. Half of the HIV sequences from MSM registered in the cohort in the Netherlands (2,128 of 4,288) were included in 91 large MSM-majority clusters. Strikingly, at least 54 (59%) of these 91 MSM-majority clusters were already circulating before 1996, when cART was introduced, and have persisted to the present. Overall, 1,226 (35%) of the 3,460 diagnoses among MSM since 1996 were found in these 54 long-standing clusters. The reproduction numbers of all large MSM-majority clusters were around the epidemic threshold value of one over the whole study period. 
A tendency towards higher numbers was visible in recent years, especially in the more recently introduced clusters. The mean age of MSM at diagnosis increased by 0.45 years/year within clusters, but new clusters appeared with lower mean age. Major strengths of this study are the high proportion of HIV-positive MSM with a sequence in this study and the combined application of phylogenetic and modeling approaches. Main limitations are the assumption that the sampled population is representative of the overall HIV-positive population and the assumption that the diagnosis interval distribution is similar between clusters. Conclusions The resurgent HIV epidemic amongst MSM in the Netherlands is driven by several large, persistent, self-sustaining, and, in many cases, growing sub-epidemics shifting towards new generations of MSM. Many of the sub-epidemics have been present since the early epidemic, to which new sub-epidemics are being added.
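
The cluster selection criteria in the abstract above (at least 10 ATHENA sequences, local support of at least 0.9, and a median pairwise patristic distance below the fifth percentile of all distances in the tree) can be sketched as a simple filter. This is only an illustration, not the authors' pipeline: the dictionary keys, the function name, and the nearest-rank percentile definition are assumptions made for this example.

```python
import math
from statistics import median


def select_large_clusters(clusters, all_distances,
                          min_size=10, min_support=0.9, pct=5):
    """Return names of clusters that pass all three criteria.

    clusters: list of dicts with hypothetical keys 'name', 'n_athena',
    'support' and 'distances' (pairwise patristic distances in the cluster).
    all_distances: pairwise patristic distances across the whole tree.
    """
    ordered = sorted(all_distances)
    # Nearest-rank percentile (an assumption; other definitions exist).
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    threshold = ordered[rank - 1]
    return [
        c["name"]
        for c in clusters
        if c["n_athena"] >= min_size
        and c["support"] >= min_support
        and median(c["distances"]) < threshold
    ]


# Toy data, invented for illustration only.
tree_distances = [i / 100 for i in range(1, 101)]
toy_clusters = [
    {"name": "A", "n_athena": 12, "support": 0.95, "distances": [0.01, 0.02, 0.03]},
    {"name": "B", "n_athena": 8, "support": 0.95, "distances": [0.01, 0.02, 0.03]},
    {"name": "C", "n_athena": 15, "support": 0.80, "distances": [0.01, 0.02, 0.03]},
    {"name": "D", "n_athena": 20, "support": 0.99, "distances": [0.20, 0.30]},
]
selected = select_large_clusters(toy_clusters, tree_distances)
```

In the toy data only cluster A passes: B is too small, C has too little support, and D is too divergent.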

SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data

Genivaldo Gueiros Z. Silva, Kevin T. Green, Bas E. Dutilh, and Robert A. Edwards (2016), "SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data", Bioinformatics 32: 354-361. Pubmed, doi: 10.1093/bioinformatics/btv584.

Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1,000 times faster than other tools. Availability and implementation: SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS.

Sequence specificity between interacting and non-interacting homologs identifies interface residues - a homodimer and monomer use case

Qingzhen Hou, Bas E. Dutilh, Martijn A. Huynen, Jaap Heringa, and K. Anton Feenstra (2015), "Sequence specificity between interacting and non-interacting homologs identifies interface residues - a homodimer and monomer use case", BMC Bioinformatics 16: 325. Pubmed, PDF, doi: 10.1186/s12859-015-0758-y.

Background Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. Results We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9 % recall and up to 25.1 % precision. Conclusions To the best of our knowledge, this is the first large scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
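
The idea of scoring each alignment position by the compositional difference between the homodimer and monomer subgroups can be illustrated with a small sketch. The abstract does not state which specificity measure was used, so the total variation distance between per-column residue frequencies below is an assumption chosen for simplicity, as are the function names and the toy sequences.

```python
from collections import Counter


def column_specificity(group_a, group_b):
    """Score each alignment column by how differently two subgroups use it.

    group_a, group_b: lists of equal-length aligned sequences.
    Returns one score per column in [0, 1]; 1 means no shared residues.
    """
    scores = []
    for i in range(len(group_a[0])):
        freq_a = Counter(seq[i] for seq in group_a)
        freq_b = Counter(seq[i] for seq in group_b)
        residues = set(freq_a) | set(freq_b)
        # Total variation distance between the two residue distributions.
        tvd = 0.5 * sum(
            abs(freq_a[r] / len(group_a) - freq_b[r] / len(group_b))
            for r in residues
        )
        scores.append(tvd)
    return scores


def precision_recall(predicted, actual):
    """Precision and recall of predicted positions against known ones."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


# Toy alignments, invented for illustration only.
dimers = ["ACDE", "ACDE", "ACDF"]
monomers = ["AGDE", "AGDE", "AGDF"]
scores = column_specificity(dimers, monomers)
```

Here only the second column differs between the two subgroups, so it is the only position that would be flagged as a candidate interface site.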

Metagenomic and metaproteomic analyses of Accumulibacter phosphatis enriched floccular and granular biofilm

Jeremy J. Barr, Bas E. Dutilh, Connor T. Skennerton, Toshikazu Fukushima, Marcus L. Hastie, Jeffrey J. Gorman, Gene W. Tyson, and Philip L. Bond (2015), "Metagenomic and metaproteomic analyses of Accumulibacter phosphatis enriched floccular and granular biofilm", Environmental Microbiology 18: 273–287. Pubmed, doi: 10.1111/1462-2920.13019.

Biofilms are ubiquitous in nature, forming diverse adherent microbial communities that perform a plethora of functions. Here we operated two laboratory-scale sequencing batch reactors enriched with Candidatus Accumulibacter phosphatis (Accumulibacter) performing enhanced biological phosphorus removal (EBPR). Reactors formed two distinct biofilms, one floccular biofilm, consisting of small, loose, microbial aggregates, and one granular biofilm, forming larger, dense, spherical aggregates. Using metagenomic and metaproteomic methods we investigated the proteomic differences between these two biofilm communities, identifying a total of 2,022 unique proteins. To understand biofilm differences, we compared protein abundances that were statistically enriched in both biofilm states. Floccular biofilms were enriched with pathogenic secretion systems suggesting a highly competitive microbial community. Comparatively, granular biofilms revealed a high stress environment with evidence of nutrient starvation, phage predation pressure, and increased extracellular polymeric substance (EPS) and cell lysis. Granular biofilms were enriched in outer membrane transport proteins to scavenge the extracellular milieu for amino acids and other metabolites, likely released through cell lysis, to supplement metabolic pathways. This study provides the first detailed proteomic comparison between Accumulibacter-enriched floccular and granular biofilm communities, proposes a conceptual model for the granule biofilm, and offers novel insights into granule biofilm formation and stability.

From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems

Daniel R. Garza and Bas E. Dutilh (2015), "From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems", Cellular and Molecular Life Sciences 72: 4287-4308. PDF, Pubmed, doi: 10.1007/s00018-015-2004-1.

Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.

Niche distribution and influence of environmental parameters in marine microbial communities: a systematic review

Felipe H. Coutinho, Pedro M. Meirelles, Ana Paula B. Moreira, Rodolfo P. Paranhos, Bas E. Dutilh, and Fabiano L. Thompson (2015), "Niche distribution and influence of environmental parameters in marine microbial communities: a systematic review", PeerJ 3: e1008. PDF, Pubmed.

Associations between microorganisms occur extensively throughout Earth's oceans. Understanding how microbial communities are assembled and how the presence or absence of species is related to that of others are central goals of microbial ecology. Here, we investigate co-occurrence associations between marine prokaryotes by combining 180 new and publicly available metagenomic datasets from different oceans in a large-scale meta-analysis. A co-occurrence network was created by calculating correlation scores between the abundances of microorganisms in metagenomes. A total of 1,906 correlations amongst 297 organisms were detected, segregating them into 11 major groups that occupy distinct ecological niches. Additionally, by analyzing the oceanographic parameters measured for a selected number of sampling sites, we characterized the influence of environmental variables over each of these 11 groups. Clustering organisms into groups of taxa that have similar ecology, allowed the detection of several significant correlations that could not be observed for the taxa individually.
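
A co-occurrence network of the kind described above can be sketched by correlating abundance profiles across metagenomes and keeping strongly correlated pairs as edges. The abstract only mentions "correlation scores", so the choice of Pearson correlation, the 0.8 cut-off, and all names below are assumptions for illustration, not the method used in the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def cooccurrence_edges(abundances, threshold=0.8):
    """Return (taxon_a, taxon_b, r) for pairs with |r| >= threshold.

    abundances: dict mapping taxon name to its abundance in each metagenome.
    """
    edges = []
    for a, b in combinations(sorted(abundances), 2):
        r = pearson(abundances[a], abundances[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Toy abundance table (four metagenomes), invented for illustration only.
toy = {
    "taxonA": [1, 2, 3, 4],
    "taxonB": [2, 4, 6, 8],
    "taxonC": [4, 3, 2, 1],
    "taxonD": [1, 3, 1, 3],
}
edges = cooccurrence_edges(toy)
```

Positive edges link taxa that tend to occur together and negative edges link taxa that tend to exclude each other; grouping the connected taxa is a separate clustering step not shown here.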

Illuminating 'dark matter': unknown bacteriophages and their role in our intestinal ecosystem

Bas E. Dutilh (2015), "Illuminating 'dark matter': unknown bacteriophages and their role in our intestinal ecosystem", Vidi award, NWO.

This Vidi award from the Netherlands Organization for Scientific Research (NWO) enables me to do 5 years of independent research. I will discover new human gut-associated bacteriophages and investigate their role in structuring the gut microbiome.

Genomic comparison of the closely-related Salmonella enterica serovars Enteritidis, Dublin and Gallinarum

T. David Matthews, Robert Schmieder, Genivaldo G. Z. Silva, Julia Busch, Noriko Cassman, Bas E. Dutilh, Dawn Green, Brian Matlock, Brian Heffernan, Gary J. Olsen, Leigh Farris Hanna, Dieter M. Schifferli, Stanley Maloy, Elizabeth A. Dinsdale, and Robert A. Edwards (2015), "Genomic comparison of the closely-related Salmonella enterica serovars Enteritidis, Dublin and Gallinarum", PLoS ONE 10: e0126883, PDF, Pubmed.

The Salmonella enterica serovars Enteritidis, Dublin, and Gallinarum are closely related but differ in virulence and host range. To identify the genetic elements responsible for these differences and to better understand how these serovars are evolving, we sequenced the genomes of Enteritidis strain LK5 and Dublin strain SARB12 and compared these genomes to the publicly available Enteritidis P125109, Dublin CT 02021853 and Dublin SD3246 genome sequences. We also compared the publicly available Gallinarum genome sequences from biotype Gallinarum 287/91 and Pullorum RKS5078. Using bioinformatic approaches, we identified single nucleotide polymorphisms, insertions, deletions, and differences in prophage and pseudogene content between strains belonging to the same serovar. Through our analysis we also identified several prophage cargo genes and pseudogenes that affect virulence and may contribute to a host-specific, systemic lifestyle. These results strongly argue that the Enteritidis, Dublin and Gallinarum serovars of Salmonella enterica evolve by acquiring new genes through horizontal gene transfer, followed by the formation of pseudogenes. The loss of genes necessary for a gastrointestinal lifestyle ultimately leads to a systemic lifestyle and niche exclusion in the host-specific serovars.

Copper tolerance and distribution of epibiotic bacteria associated with giant kelp Macrocystis pyrifera in southern California

Julia Busch, Juliana R. Nascimento, Ana Carolina Magalhães, Bas E. Dutilh, and Elizabeth Dinsdale (2015), "Copper tolerance and distribution of epibiotic bacteria associated with giant kelp Macrocystis pyrifera in southern California", Ecotoxicology 24: 1131-1140, PDF, Pubmed.

Kelp forests in southern California are important ecosystems that provide habitat and nutrition to a multitude of species. Macrocystis pyrifera and other brown algae that dominate kelp forests produce negatively charged polysaccharides on the cell surface, which have the ability to accumulate transition metals such as copper. Kelp forests near areas with high levels of boating and other industrial activities are exposed to increased amounts of these metals, leading to increased concentrations on the algal surface. The increased concentration of transition metals creates a harsh environment for colonizing microbes, altering community structure. The impact of altered bacterial populations in the kelp forest has unknown consequences that could be harmful to the health of the ecosystem. In this study we describe the community of microorganisms associated with M. pyrifera, using a culture based approach, and their increasing tolerance to the transition metal, copper, across a gradient of human activity in southern California. The results support the hypothesis that M. pyrifera forms a distinct marine microhabitat and selects for species of bacteria that are rarer in the water column, and that copper-resistant isolates are selected for in locations with elevated exposure to transition metals associated with human activity.

Immunoglobulin rearrangement analysis from multiple lesions in the same patient using Next Generation Sequencing

Silke Appenzeller, Christian Gilissen, Jos Rijntjes, Bastiaan B.J. Tops, Annemiek Kastner-van Raaij, Konnie M. Hebeda, Loes Nissen, Bas E. Dutilh, J. Han J.M. van Krieken, and Patricia J.T.A. Groenen (2015), "Immunoglobulin rearrangement analysis from multiple lesions in the same patient using Next Generation Sequencing", Histopathology 67: 843–858. Pubmed, doi: 10.1111/his.12714.

Background For patients who have multiple lymphomas with discordant pathology, it is relevant to determine whether there is one disseminated lymphoma or two unrelated lymphomas. Patients with disseminated, clonally related lymphomas, are usually treated with the most powerful drugs available, while patients with unrelated (primary) lymphomas mostly receive standard first-line therapies. Methods We have used next generation sequencing on the Ion Torrent Personal Genome Machine to characterize the immunoglobulin heavy gene V-D-J rearrangements in two diagnostic tissue samples, including formalin-fixed and paraffin-embedded tissue, of two patients with iatrogenic immunodeficiency-associated Epstein-Barr virus lymphoproliferative disorder, with ulcerative colitis as underlying disease. Results The immunoglobulin rearrangement sequences obtained by next generation sequencing revealed undoubtedly clonally related lesions in two tissue biopsies that were taken over time in the first patient, which is concordant with disseminated lymphoma. The other patient showed two clonally unrelated lesions, which is incompatible with clonal dissemination. This information was not inferred from evaluation of the heavy and light chain rearrangements by fragment analysis, which is currently the gold standard. Conclusion Our study demonstrates the diagnostic application of next generation sequencing of immunoglobulin rearrangement assessment in pathology for clinical decision making in patients with several simultaneous or subsequent lymphoproliferations.

Beyond research: a primer for considerations on using viral metagenomics in the field and clinic

Richard J. Hall, Jenny L. Draper, Fiona G.G. Nielsen, and Bas E. Dutilh (2015), "Beyond research: a primer for considerations on using viral metagenomics in the field and clinic", Frontiers in Microbiology 6: 224, doi: 10.3389/fmicb.2015.00224. Pubmed, PDF. News on DNA Digest, UBC.

Powered by recent advances in next-generation sequencing technologies, metagenomics has already unveiled vast microbial biodiversity in a range of environments, and is increasingly being applied in clinics for difficult-to-diagnose cases. It can be tempting to suggest that metagenomics could be used as a "universal test" for all pathogens without the need to conduct lengthy serial testing using specific assays. While this is an exciting prospect, there are issues that need to be addressed before metagenomic methods can be applied with rigour as a diagnostic tool, including the potential for incidental findings, unforeseen consequences for trade and regulatory authorities, privacy and cultural issues, data sharing, and appropriate reporting of results to end-users. These issues will require consideration and discussion across a range of disciplines, including scientists, ethicists, clinicians, diagnosticians, health practitioners, and ultimately the public. Here, we provide a primer for consideration on some of these issues.

Draft genome sequence of anammox bacterium "Candidatus Scalindua brodae", obtained using differential coverage binning of sequencing data from two reactor enrichments

Daan R. Speth, Lina Russ, Boran Kartal, Huub J.M. Op den Camp, Bas E. Dutilh, and Mike S.M. Jetten (2015), "Draft genome sequence of anammox bacterium "Candidatus Scalindua brodae", obtained using differential coverage binning of sequencing data from two reactor enrichments", Genome Announcements 3: e01415-14, doi: 10.1128/genomeA.01415-14. Pubmed, PDF.

We present the draft genome of anammox bacterium "Candidatus Scalindua brodae", which at 282 contigs is a major improvement over the highly fragmented genome assembly of related species "Ca. Scalindua profunda" (1,580 contigs) which was previously published.

Microbial taxonomy in the post-genomic era: Rebuilding from scratch?

Cristiane C. Thompson, Gilda R. Amaral, Robert A. Edwards, Martin F. Polz, Bas E. Dutilh, David W. Ussery, Erko Stackebrandt, Jean Swings, and Fabiano L. Thompson (2015), "Microbial taxonomy in the post-genomic era: Rebuilding from scratch?", Archives of Microbiology 197: 359-370, doi: 10.1007/s00203-014-1071-2. Pubmed, PDF.

Prokaryotic taxonomy should provide adequate descriptions of prokaryotic diversity in ecological, clinical and industrial environments. Its cornerstone, the prokaryote species has been re-evaluated twice (Stackebrandt et al., 2002; Gevers et al., 2005). It is time to revisit polyphasic taxonomy (Vandamme & Peeters, 2014), its principles and its practice, including its underlying pragmatic species concept. Ultimately, we will be able to realize the old dream of our predecessors and build a genomic prokaryotic taxonomy with genome sequences as gold standards.

Metagenomic ventures into outer sequence space

Bas E. Dutilh (2014), "Metagenomic ventures into outer sequence space", Bacteriophage 4: e979664. DOI, PDF.

Sequencing DNA or RNA directly from the environment often results in many sequencing reads that have no homologs in the database. These are referred to as "unknowns", and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter". However, unknowns also exist because metagenomic datasets are not optimally mined. There is a pressure on researchers to publish and move on, and the unknown sequences are often left for what they are, and conclusions drawn based on reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, it remains an open question, what is the actual size of biological sequence space? The de novo assembly of shotgun metagenomes is the most powerful tool to address this question.

A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes

Bas E. Dutilh, Noriko Cassman, Katelyn McNair, Savannah E. Sanchez, Genivaldo G.Z. Silva, Lance Boling, Jeremy J. Barr, Daan R. Speth, Victor Seguritan, Ramy K. Aziz, Ben Felts, Elizabeth A. Dinsdale, John L. Mokili and Robert A. Edwards (2014), "A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes", Nature Communications 5: 4498. Pubmed. Ranked in the 99th percentile of all tracked articles based on the Altmetric score.

Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the majority of published human faecal metagenomes, which we refer to as crAssphage. Its ~97 kbp genome is six times more abundant in publicly available metagenomes than all other known phages together; it comprises up to 90% and 22% of all reads in virus-like particle (VLP)-derived metagenomes and total community metagenomes, respectively; and it totals 1.68% of all human faecal metagenomic sequencing reads in the public databases. The majority of crAssphage-encoded proteins match no known sequences in the database, which is why it was not detected before. Using a new co-occurrence profiling approach, we predict a Bacteroides host for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.

News about crAssphage: National Geographic Not Exactly Rocket Science, NPR Goats and Soda, Forbes, Science Friday (radio interview), BBC, BBC Mundo (Español), Nature Research Highlights. New Scientist 1, New Scientist 2, Scientific American, EOS (Nederlands), Twitter (#), Huffington Post, IFL Science!, NRC (Nederlands), nu.nl (Nederlands), Beta Sandwich (podcast minute 36), Austrian Tribune 1, Austrian Tribune 2, El Balad (Arabic), BayouBuzz, Bild (Deutsch), Bio-Medicine, BioPharma, Business Insider, Celebrity Cafe, China Topix, CN Blogs (Chinese), Cosmos, Cotidianul (Român), Counsel & Heal, Le Daily (Français), Daily Digest News, Daily Mail, Design & Trend, Diabetes.co.uk, Diario Panorama, 43.2 The Drop, DZ Online, East County Magazine, EBioTrade, Edwards lab, Einheit11, EurekAlert, Financial Express, Fox News, Franzcalvo, Frequency, GB Times, Gezondheid.nl (Nederlands), GGSOKU (Japanese), Il Giornale di Montesilvano (Italiano), Gizmodo, Google News, Google Trending terms, Guardian Liberty Voice, Health Canal, Health Hub, Healthline, Health Site, HNGN, HRN (Español), IANS Live, International Business Times 1, International Business Times 2, io9, Klaipėda (Lietuvos), KNAU, Knibmus, Kopalnia Wiedzy (Polski), KPCC, Kyle's Daily Bulletin, Lacto Bacto, Libertad Digital (Español), Live Science, Lockerdome, A man with a PhD, MaxiSciences (Français), Mejor Vendedor, Mendoza online (Español), Medical Daily, MedicalFacts.nl (Nederlands), MedicalXpress, medONLINE.at (Deutsch), Medportal.ru (русский), Meteoweb (Italiano), Microbiome Digest, Mizo News, Le Monde (Français), Mother Nature Network, MSFN, Muzic4you, Muy Interesante (Español), Nationale Zorggids (Nederlands), Nature Middle East, Nature World News, Newhub, News.de (Deutsch), NewsFactor, News Locker, News Tonight Africa, NineMSN, Noticias de la Ciencia (Español), La Nouvelle Tribune (Français), OnMed.gr (ελληνικα), Outbreak News Today, Pakistan Today, Le Point (Français), Prescripteurs (Français), Prevention, 
Radboudumc, R&D Magazine, Readable, Rectofossal Ambiguity, Reddit (AMA), Red Orbit, Regator, RIMLS, Salon, Science 2.0, Science Alert, Science Codex, Science Daily, Science Newsline, Science Sifter, Science World Report 1, Science World Report 2, The Scientist, Sci-News, Scinexx, SDSU, Slashdot, A Smaller Flea, Softpedia, Sohu (Chinese), Der Spiegel (Deutsch), Spire Healthcare, Der Standard (Deutsch), TECHsme.sk (Slovenská), Tech Times, Tendencias (Español), Tendencias 21 (Español), The Times of India, Times of San Diego, Tinyletter, το χϖνι (ελληνικα), UBC, UK Wired, Vesti (русский), Virus Doctors blog, Voxxi, 90.9 wbur, Die Welt, WGBH News, Yahoo! News, Z News; and a Wikipedia page.

Preprint: FORMAL: A model to identify organisms present in metagenomes using Monte Carlo simulation

Genivaldo G.Z. Silva, Bas E. Dutilh, and Robert A. Edwards (2014), "FORMAL: A model to identify organisms present in metagenomes using Monte Carlo simulation". bioRXiv, doi: 10.1101/010801.

One of the major goals in metagenomics is to identify organisms present in the microbial community from a huge set of unknown DNA sequences. This profiling has valuable applications in multiple important areas of medical research such as disease diagnostics. Nevertheless, it is not a simple task, and many approaches that have been developed are slow and depend on the read length of the DNA sequences. Here we introduce an innovative and agile approach which uses k-mers and Monte Carlo simulation to profile and report abundant organisms present in metagenomic samples and their relative abundance without sequence length dependencies. The program was tested with simulated metagenomes, and the results show that our approach predicts the organisms in microbial communities and their relative abundance.

Microbial community diversity and physical-chemical features of the Southwestern Atlantic Ocean

Nelson Alves Junior, Pedro Milet Meirelles, Eidy de Oliveira Santos, Bas Dutilh, Genivaldo G.Z. Silva, Rodolfo Paranhos, Anderson S. Cabral, Carlos Rezende, Tetsuya Iida, Rodrigo L. de Moura, Ricardo Henrique Kruger, Renato C. Pereira, Rogério Valle, Tomoo Sawabe, Cristiane Thompson, and Fabiano Thompson (2014), "Microbial community diversity and physical-chemical features of the Southwestern Atlantic Ocean", Archives of Microbiology 197: 165-179, doi: 10.1007/s00203-014-1035-6. Pubmed, PDF.

Microbial oceanography studies have demonstrated the central role of microbes in the functioning and nutrient cycling of the global ocean. Most of these earlier studies, including those in the Southwestern Atlantic Ocean (SAO), focused on surface seawater and benthic organisms (e.g., coral reefs and sponges). This is the first metagenomic study of the SAO. The SAO harbors a great microbial diversity and marine life (e.g., coral reefs and rhodolith beds). The aim of this study was to characterize the microbial community diversity of the SAO along the depth continuum and different water masses by means of metagenomic, physical-chemical and biological analyses. The microbial community abundance and diversity appear to be strongly influenced by the temperature, dissolved organic carbon, and depth, and three groups were defined [1. surface waters; 2. sub-superficial chlorophyll maximum (SCM) (48-82 m) and 3. deep waters (236-1,200 m)] according to the microbial composition. The microbial communities of deep water masses [South Atlantic Central water, Antarctic Intermediate water and Upper Circumpolar Deep water] are highly similar. Of the 421,418 predicted genes for SAO metagenomes, 36.7 % had no homologous hits against 17,451,486 sequences from the North Atlantic, South Atlantic, North Pacific, South Pacific and Indian Oceans. Of these unique genes from the SAO, only 6.64 % had hits against the NCBI non-redundant protein database. SAO microbial communities share genes with the global ocean in at least 70 cellular functions; however, more than a third of predicted SAO genes represent a unique gene pool in the global ocean. This study was the first attempt to characterize the taxonomic and functional community diversity of different water masses in the SAO and compare it with the microbial community diversity of the global ocean, and the SAO had a significant portion of endemic gene diversity.
Microbial communities of deep water masses (236-1,200 m) are highly similar, suggesting that these water masses have very similar microbiological attributes, despite the common knowledge that water masses determine prokaryotic community and are barriers to microbial dispersal. The present study also shows that SCM is a clearly differentiated layer within Tropical waters with higher abundance of phototrophic microbes and microbial diversity.

Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions

Bas E. Dutilh, Cristiane C. Thompson, Ana C.P. Vicente, Michel A. Marin, Clarence Lee, Genivaldo G.Z. Silva, Robert Schmieder, Bruno G.N. Andrade, Luciane Chimetto, Daniel Cuevas, Daniel Garza, Iruka N. Okeke, A. Oladipo Aboderin, Jessica Spangler, Tristen Ross, Elizabeth A. Dinsdale, Fabiano L. Thompson, Timothy T. Harkins and Robert A. Edwards (2014), "Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions", BMC Genomics 15: 654. Pubmed, PDF. Featured in Biome.

Background: Vibrio cholerae is a globally dispersed pathogen that has evolved with humans for centuries, but also includes non-pathogenic environmental strains. Here, we identify the genomic variability underlying this remarkable persistence across the three major niche dimensions space, time, and habitat. Results: Taking an innovative approach of genome-wide association applicable to microbial genomes (GWAS-M), we classify 274 complete V. cholerae genomes by niche, including 39 newly sequenced for this study with the Ion Torrent DNA-sequencing platform. Niche metadata were collected for each strain and analyzed together with comprehensive annotations of genetic and genomic attributes, including point mutations (single-nucleotide polymorphisms, SNPs), protein families, functions and prophages. Conclusions: Our analysis revealed that genomic variations, in particular mobile functions including phages, prophages, transposable elements, and plasmids underlie the metadata structuring in each of the three niche dimensions. This underscores the role of phages and mobile elements as the most rapidly evolving elements in bacterial genomes, creating local endemicity (space), leading to temporal divergence (time), and allowing the invasion of new habitats. Together, we take a data-driven approach for comparative functional genomics that exploits high-volume genome sequencing and annotation, in conjunction with novel statistical and machine learning analyses to identify connections between genotype and phenotype on a genome-wide scale.

Bas E. Dutilh (2012), "Gene repertoires responsible for persistence of Vibrio cholerae across niche dimensions by Ion Torrent sequencing", talk at Advances in Genome Biology and Technology Conference (AGBT 2012), Marco Island, Florida, USA.

Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands research expedition

Yan Wei Lim, Daniel Cuevas, Genivaldo G.Z. Silva, Kristen Aguinaldo, Elizabeth Dinsdale, Andreas Haas, Mark Hatay, Savannah Sanchez, Linda Wegley-Kelly, Bas E. Dutilh, Timothy Harkins, Clarence Lee, Warren Tom, Stuart Sandin, Jennifer E. Smith, Brian Zgliczynski, Mark J.A. Vermeij, Forest Rohwer and Robert A. Edwards (2014), "Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands research expedition", PeerJ 2: e520. PDF. Selected for PeerJ Picks 2015.

Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty-six marine microbial genomes and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorus source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines.

News highlights: EurekAlert, GenomeWeb, Genetic Engineering and Biotechnology News, International Business Times, Nature World News, Ocean Portal, PeerJ Blog, PeerJ Video, Science Daily, Tech Times, Times of San Diego.

Cell wall modifications during conidial maturation of the human pathogenic fungus Pseudallescheria boydii

Sarah Ghamrawi, Gilles Rénier, Patrick Saulnier, Stéphane Cuenot, Agata Zykwinska, Bas E. Dutilh, Christopher Thornton, Sébastien Faure and Jean-Philippe Bouchara (2014), "Cell wall modifications during conidial maturation of the human pathogenic fungus Pseudallescheria boydii". PLoS ONE 9: e100290. Pubmed, PDF.

Progress in extending the life expectancy of cystic fibrosis (CF) patients remains jeopardized by the increasing incidence of fungal respiratory infections. Pseudallescheria boydii (P. boydii), an emerging pathogen of humans, is a filamentous fungus frequently isolated from the respiratory secretions of CF patients. It is commonly believed that infection by this fungus occurs through inhalation of airborne conidia, but the mechanisms allowing the adherence of Pseudallescheria to the host epithelial cells and its escape from the host immune defenses remain largely unknown. Given that the cell wall orchestrates all these processes, we were interested in studying its dynamic changes in conidia as a function of the age of cultures. We found that the surface hydrophobicity and electronegative charge of conidia increased with the age of culture. Melanin, which can influence the cell surface properties, was extracted from conidia and estimated using UV-visible spectrophotometry. Cells were also directly examined and compared using electron paramagnetic resonance (EPR), which determines the production of free radicals. Consistent with the increased amount of melanin, the EPR signal intensity decreased, suggesting polymerization of melanin. These results were confirmed by flow cytometry after studying the effect of melanin polymerization on the surface accessibility of mannose-containing glycoconjugates to fluorescent concanavalin A. In the absence of melanin, conidia showed a marked increase in fluorescence intensity as the age of culture increased. Using atomic force microscopy, we were unable to find rodlet-forming hydrophobins, molecules that can also affect conidial surface properties. In conclusion, the changes in surface properties and biochemical composition of the conidial wall with the age of culture highlight the process of conidial maturation.
Mannose-containing glycoconjugates, which are involved in immune recognition, are progressively masked by polymerization of melanin, an antioxidant that is commonly thought to allow fungal escape from the host immune defenses.

FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares

Genivaldo G.Z. Silva, Daniel A. Cuevas, Bas E. Dutilh and Robert A. Edwards (2014), "FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares", PeerJ 2: e425. Pubmed, PDF.

One of the major goals in metagenomics is to identify the organisms present in a microbial community from unannotated shotgun sequencing reads. Taxonomic profiling has valuable applications in biological and medical research, including disease diagnostics. Most currently available approaches do not scale well with increasing data volumes, which is important because both the number and lengths of the reads provided by sequencing platforms keep increasing. Here we introduce FOCUS, an agile composition based approach using non-negative least squares (NNLS) to report the organisms present in metagenomic samples and profile their abundances. FOCUS was tested with simulated and real metagenomes, and the results show that our approach accurately predicts the organisms present in microbial communities. FOCUS was implemented in Python. The source code and web server are freely available at http://edwards.sdsu.edu/FOCUS.
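
The composition-based NNLS idea in the abstract above can be illustrated with a minimal sketch. This is not the FOCUS source code: the function names (kmer_profile, nnls, relative_abundance), the choice of k = 2, and the projected-gradient solver are all assumptions made for illustration. The sketch treats a sample's k-mer profile as a non-negative mixture of reference profiles and solves for the mixture weights.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies of seq, in a fixed lexicographic k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # words containing N or other symbols are skipped
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def nnls(columns, target, steps=5000):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent.

    columns[j] is column j of A (one reference profile); target is b.
    This simple solver is an illustrative stand-in, not the FOCUS solver.
    """
    n = len(columns)
    # Step size 1 / trace(A^T A) is never larger than 1 / (largest eigenvalue).
    rate = 1.0 / (sum(v * v for col in columns for v in col) or 1.0)
    x = [0.0] * n
    for _ in range(steps):
        resid = [sum(x[j] * columns[j][i] for j in range(n)) - t
                 for i, t in enumerate(target)]
        grad = [sum(r * v for r, v in zip(resid, col)) for col in columns]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xj - rate * g) for xj, g in zip(x, grad)]
    return x


def relative_abundance(reference_profiles, sample_profile):
    """Non-negative mixture weights, rescaled to sum to 1."""
    weights = nnls(reference_profiles, sample_profile)
    total = sum(weights) or 1.0
    return [w / total for w in weights]
```

Calling relative_abundance with one profile per reference genome and the profile of the pooled sample reads returns one estimated fraction per reference. In practice a library NNLS routine would replace the hand-written solver.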

Pharmacomicrobiomics: the impact of human microbiome variations on systems pharmacology and personalized therapeutics

Marwa ElRakaiby, Bas E. Dutilh, Mariam R. Rizkallah, Annemarie Boleij, Jason N. Cole and Ramy K. Aziz (2014), "Pharmacomicrobiomics: the impact of human microbiome variations on systems pharmacology and personalized therapeutics", OMICS: A Journal of Integrative Biology 18: 402-414. Pubmed, PDF. Listed first in Mary Ann Liebert's Most Read Articles.

The Human Microbiome Project (HMP) is a global initiative undertaken to identify and characterize the collection of human-associated microorganisms at multiple anatomic sites (skin, mouth, nose, colon, vagina), and to determine how intraindividual and interindividual alterations in the microbiome influence human health, immunity, and different disease states. In this review article, we summarize the key findings and applications of the HMP that may impact pharmacology and personalized therapeutics. We propose a microbiome cloud model, reflecting the temporal and spatial uncertainty of defining an individual's microbiome composition, with examples of how intraindividual variations (such as age and mode of delivery) shape the microbiome structure. Additionally, we discuss how this microbiome cloud concept hinders the definition of a core human microbiome and the classification of individuals according to their biome types. Detailed examples are presented on microbiome changes related to colorectal cancer, antibiotic administration, and pharmacomicrobiomics, or drug-microbiome interactions, highlighting how an improved understanding of the human microbiome, and alterations thereof, may lead to the development of novel therapeutic agents, the modification of antibiotic policies and implementation, and improved health outcomes. Finally, the prospects of a collaborative computational microbiome research initiative in Africa are discussed.

Colorectal cancer associated microbiota

Harold Tjalsma, Bas E. Dutilh, Annemarie Boleij, and Julian R. Marchesi (2014), "Colorectal cancer associated microbiota". In: Encyclopedia of Metagenomics III: Human Metagenomics. Eds. Sarah Highlander and Karen E. Nelson. Springer Reference.

Colorectal cancer (CRC) is one of the big killers in developed societies. More than one million new CRC cases are diagnosed and >600,000 patients die from this disease each year, making it the fourth most common cancer-associated cause of death. The genetic framework for this disease is formulated by the "adenoma-carcinoma sequence" based on the occurrence of driver mutations in crypt stem cells that render them immortal, and passenger mutations that accumulate as the tumor expands but which do not contribute directly to disease progression. Despite the fact that dietary and environmental factors (Western lifestyle), genetic background, and ethnicity have been associated with CRC risk, the exact molecular events that cause CRC driver mutations remain elusive. Important triggers may be derived from the dense and complex bacterial community of the gut that resides in close contact with the colonic mucosa and developing tumors. Recent clinical studies and experimental models have directly or indirectly linked the intestinal microbiota, or specific members thereof, to CRC progression.

Screening metatranscriptomes for toxin genes as functional drivers of human colorectal cancer

Bas E. Dutilh, Lennart Backus, Sacha A.F.T. van Hijum and Harold Tjalsma (2013), "Screening metatranscriptomes for toxin genes as functional drivers of human colorectal cancer", Best Practice & Research: Clinical Gastroenterology 27: 85-99. Pubmed, PDF.

The colonic mucosa is in constant physical interaction with a dense and complex bacterial community that comprises health-promoting and pathogenic microbes. Here, we highlight important clinical studies and experimental models that have linked the intestinal microbiota to the development of colorectal cancer (CRC). Moreover, we use recently published metatranscriptome sequencing data to test whether potentially carcinogenic toxin genes exhibit higher expression levels in human CRC tissue compared to adjacent non-malignant mucosa. Our analyses show a large variation in expression of toxin(-related) genes from different species. Surprisingly, Enterobacterial toxins were among the highest expressed, while Enterobacteria were not among the most abundant species in these samples. Although we can differentiate on- and off-tumor sites based on toxin reads, the read depth profiles are quite similar and show only limited coverage of the toxin genes. Thus, extended metagenomic studies are needed to obtain a high-resolution picture of host-pathogen interactions during human CRC.

Explaining microbial phenotypes on a genomic scale: GWAS for microbes

Bas E. Dutilh, Lennart Backus, Robert A. Edwards, Michiel Wels, Jumamurat R. Bayjanov and Sacha A.F.T. van Hijum (2013), "Explaining microbial phenotypes on a genomic scale: GWAS for microbes", Briefings in Functional Genomics 12: 366-380. Pubmed, PDF.

There is an increasing availability of complete or draft genome sequences for microbial organisms. These data form a potentially valuable resource for genotype-phenotype association and gene function prediction, provided that phenotypes are consistently annotated for all the sequenced strains. In this review, we address the requirements for successful gene-trait matching. We outline a basic protocol for microbial functional genomics, including genome assembly, annotation of genotypes (including SNPs, orthologous groups and prophages), data pre-processing, genotype-phenotype association, visualization and interpretation of results. The methodologies for association described herein can be applied to other data types, opening up possibilities to analyze transcriptome-phenotype associations, and correlate microbial population structure or activity, as measured by metagenomics, to environmental parameters.

Bas E. Dutilh (2013), "Genome-wide association studies for microbial genomes", talk at Conference on Predicting Cell Metabolism and Phenotypes, Menlo Park, California, USA.

Combining de novo and reference-guided assembly with Scaffold_builder

Genivaldo G.Z. Silva, Bas E. Dutilh, T. David Matthews, Keri Elkins, Robert Schmieder, Elizabeth A. Dinsdale and Robert A. Edwards (2013), "Combining de novo and reference-guided assembly with Scaffold_builder". Source Code for Biology and Medicine 8: 23. Pubmed, PDF.

Genome sequencing has become routine; however, genome assembly still remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds - super contigs of sequences joined by N-bases - based on the homology to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes from Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291 and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded.
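
The notion of a scaffold as contigs joined by N-bases in reference order can be sketched as follows. This is an illustrative toy, not the Scaffold_builder implementation: the function name build_scaffold, the min_gap parameter, and the input format (a reference start position per contig, assumed to come from an external aligner) are hypothetical.

```python
def build_scaffold(placements, min_gap=1):
    """Join contigs into one scaffold, ordered by their position on a reference.

    placements: list of (ref_start, contig_sequence) tuples, where ref_start is
    the 0-based reference coordinate at which the contig aligns. The aligned
    length is assumed to equal the contig length. Gaps between consecutive
    contigs are filled with N; overlapping or adjacent contigs are separated
    by min_gap Ns instead of being merged.
    """
    pieces = []
    ref_end = None  # furthest reference coordinate covered so far
    for ref_start, seq in sorted(placements):
        if ref_end is not None:
            gap = max(min_gap, ref_start - ref_end)
            pieces.append("N" * gap)
        pieces.append(seq)
        end = ref_start + len(seq)
        ref_end = end if ref_end is None else max(ref_end, end)
    return "".join(pieces)
```

A real tool would also have to handle reverse-complemented contigs, contigs that do not map to the reference, and merging of overlapping contigs; none of that is attempted here.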

Genivaldo G. Z. Silva, Bas E. Dutilh, T. David Matthews, Keri Elkins, Elizabeth A. Dinsdale and Robert A. Edwards (2012), "Scaffold-builder for combining de novo and Reference-guided assembly", poster P48 at 10th Annual Rocky Mountain Bioinformatics Conference 2012, Snowmass, Colorado, USA.

Identification of a novel human papillomavirus by metagenomic analysis of samples from patients with febrile respiratory illness

John L. Mokili, Bas E. Dutilh, Yan Wei Lim, Bradley S. Schneider, Travis Taylor, Matthew R. Haynes, David Metzgar, Christopher A. Myers, Patrick J. Blair, Bahador Nosrat, Nathan D. Wolfe and Forest Rohwer (2013), "Identification of a novel human papillomavirus by metagenomic analysis of samples from patients with febrile respiratory illness", PLoS ONE 8: e58404. Pubmed, PDF.

As part of a virus discovery investigation using a metagenomic approach, a highly divergent novel Human papillomavirus type was identified in pooled convenience nasal/oropharyngeal swab samples collected from patients with febrile respiratory illness. Phylogenetic analysis of the whole genome and the L1 gene reveals that the new HPV identified in this study clusters with previously described gamma papillomaviruses, sharing only 61.1% (whole genome) and 63.1% (L1) sequence identity with its closest relative in the Papillomavirus episteme (PAVE) database. This new virus was named HPV_SD2 pending official classification. The complete genome of HPV_SD2 is 7,299 bp long (36.3% G/C) and contains 7 open reading frames (L2, L1, E6, E7, E1, E2 and E4) and a non-coding long control region (LCR) between L1 and E6. The metagenomic procedures, coupled with the bioinformatic methods described herein, are well suited to detect small circular genomes such as those of human papillomaviruses.

Reference-independent comparative metagenomics using cross-assembly: crAss

Bas E. Dutilh, Robert Schmieder, Jim Nulton, Ben Felts, Peter Salamon, Robert A. Edwards and John L. Mokili (2012), "Reference-independent comparative metagenomics using cross-assembly: crAss", Bioinformatics 28: 3225-3231. Pubmed, PDF.

Motivation: Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known micro-organisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic analysis by creating a cross-assembly of all reads, i.e. a single assembly of reads from different samples. Comparative metagenomics studies the inter-relationships between metagenomes from different samples. Using an assembly algorithm is a fast and intuitive way to link (partially) homologous reads without requiring a database of reference sequences. Results: Here, we introduce crAss, a novel bioinformatic tool that enables fast, simple analysis of cross-assembly files, yielding distances between all metagenomic sample pairs and an insightful image displaying the similarities. Availability and Implementation: crAss is available as a web server at http://edwards.sdsu.edu/crass/ and the Perl source code can be downloaded to run as a stand-alone, command line tool.
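
To make the cross-assembly idea concrete, the sketch below derives pairwise sample distances from a table recording how many reads of each sample ended up in each contig. The distance used here, a Jaccard distance over contigs, is a deliberately simple stand-in and is not one of the distance formulas implemented in crAss; the function name and input layout are likewise assumptions.

```python
from itertools import combinations


def cross_assembly_distances(contig_reads):
    """Pairwise distances between samples from a cross-assembly.

    contig_reads: dict mapping contig id -> dict mapping sample name -> number
    of reads from that sample assembled into the contig.
    Returns a dict mapping (sample_a, sample_b) -> Jaccard distance, i.e.
    1 - (contigs with reads from both samples) / (contigs with reads from either).
    """
    # For each sample, collect the set of contigs that contain its reads.
    contigs_of = {}
    for contig, counts in contig_reads.items():
        for sample, n in counts.items():
            if n > 0:
                contigs_of.setdefault(sample, set()).add(contig)

    distances = {}
    for a, b in combinations(sorted(contigs_of), 2):
        shared = contigs_of[a] & contigs_of[b]
        either = contigs_of[a] | contigs_of[b]
        distances[(a, b)] = 1.0 - len(shared) / len(either)
    return distances
```

Because the contigs are built from the reads themselves, reads without any database match still contribute to the distances, which is the point made in the abstract.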

Bas E. Dutilh (2013), "Comparative metagenomics by cross-assembly", talk at 8th Benelux Bioinformatics Conference 2013, Brussels, Belgium.

Bas E. Dutilh, Robert Schmieder, Jim Nulton, Ben Felts, Peter Salamon, Robert A. Edwards and John L. Mokili (2013), "Comparative metagenomics by cross-assembly", 8th Benelux Bioinformatics Conference 2013. Brussels, Belgium.

Discovery of a hapE mutation that causes azole resistance in Aspergillus fumigatus through whole genome sequencing and sexual crossing

Simone M.T. Camps*, Bas E. Dutilh*, Maiken C. Arendrup, Antonius J.M.M. Rijs, Eveline Snelders, Martijn A. Huynen, Paul E. Verweij and Willem J.G. Melchers (2012), "Discovery of a hapE mutation that causes azole resistance in Aspergillus fumigatus through whole genome sequencing and sexual crossing", PLoS ONE 7: e50034. Pubmed, PDF. *Authors contributed equally. F1000 Must Read.

Azole compounds are the primary therapy for patients with diseases caused by Aspergillus fumigatus. However, prolonged treatment may cause resistance to develop, which is associated with treatment failure. The azole target cyp51A is a hotspot for mutations that confer phenotypic resistance, but in an increasing number of resistant isolates the underlying mechanism remains unknown. Here, we report the discovery of a novel resistance mechanism, caused by a mutation in the CCAAT-binding transcription factor complex subunit HapE. From one patient, four A. fumigatus isolates were serially collected. The last two isolates developed an azole resistant phenotype during prolonged azole therapy. Because the resistant isolates contained a wild type cyp51A gene and the isolates were isogenic, the complete genomes of the last susceptible isolate and the first resistant isolate (taken 17 weeks apart) were sequenced using Illumina technology to identify the resistance conferring mutation. By comparing the genome sequences to each other as well as to two A. fumigatus reference genomes, several potential non-synonymous mutations in protein-coding regions were identified, six of which could be confirmed by PCR and Sanger sequencing. Subsequent sexual crossing experiments showed that resistant progeny always contained a P88L substitution in HapE, while the presence of the other five mutations did not correlate with resistance in the progeny. Cloning the mutated hapE gene into the azole susceptible akuBKU80 strain showed that the HapE P88L mutation by itself could confer the resistant phenotype. This is the first time that whole genome sequencing and sexual crossing strategies have been used to find the genetic basis of a trait of interest in A. fumigatus. The discovery may help understand alternate pathways for azole resistance in A. fumigatus with implications for the molecular diagnosis of resistance and drug discovery.

Oxygen minimum zones harbor novel viral communities with low diversity

Noriko Cassman*, Alejandra Prieto-Davó*, Kevin Walsh, Genivaldo G. Z. Silva, Florent Angly, Sajia Akhter, Katie Barott, Julia Busch, Tracey McDole, J. Matthew Haggerty, Dana Willner, Gadiel Alarcón, Osvaldo Ulloa, Edward F. DeLong, Bas E. Dutilh, Forest Rohwer and Elizabeth A. Dinsdale (2012), "Oxygen minimum zones harbor novel viral communities with low diversity", Environmental Microbiology 14: 3043-3065. Pubmed, PDF. *Authors contributed equally.

Oxygen minimum zones (OMZs) are oceanographic features that affect ocean productivity and biodiversity, and contribute to ocean nitrogen loss and greenhouse gas emissions. Here we describe the viral communities associated with the Eastern Tropical South Pacific (ETSP) OMZ off Iquique, Chile for the first time through abundance estimates and viral metagenomic analysis. The viral to microbial ratio (VMR) in the ETSP OMZ fluctuated in the oxycline and declined in the anoxic core to below one on several occasions. The number of viral genotypes (unique genomes as defined by sequence assembly) ranged from 2040 at the surface to 98 in the oxycline, which is the lowest viral diversity recorded to date in the ocean. Within the ETSP OMZ viromes, only 4.95 % of genotypes were shared between surface and anoxic core viromes using reciprocal BLASTn sequence comparison. ETSP virome comparison with surface marine viromes (Sargasso Sea, Gulf of Mexico, Kingman Reef, Chesapeake Bay) revealed a dissimilarity of ETSP OMZ viruses to those from other oceanic regions. From the 1.4 million non-redundant DNA sequences sampled within the altered oxygen conditions of the ETSP OMZ, more than 97.8 % were novel. Of the average 3.2 % of sequences that showed similarity to the SEED non-redundant database, phage sequences dominated the surface viromes, eukaryotic virus sequences dominated the oxycline viromes, and phage sequences dominated the anoxic core viromes. The viral community of the ETSP OMZ was characterized by fluctuations in abundance, taxa and diversity across the oxygen gradient. The ecological significance of these changes was difficult to predict; however, it appears that the reduction in oxygen coincides with an increased shedding of eukaryotic viruses in the oxycline, and a shift to unique viral genotypes in the anoxic core.

Genomes, metagenomes, and microbiomes: a new biology for a new millennium

Bas E. Dutilh and Ramy K. Aziz (2012), "Genomes, metagenomes, and microbiomes: a new biology for a new millennium", New Life Sciences: Linking Science to Society, BioVision Alexandria 2012 Proceedings: 105-116. PDF (the whole book is also available).

Taxonomic and functional microbial signatures of the endemic marine sponge Arenosclera brasiliensis

Amaro E. Trindade-Silva, Cintia Rua, Genivaldo G. Z. Silva, Bas E. Dutilh, Ana Paula B. Moreira, Robert A. Edwards, Eduardo Hajdu, Gisele Lobo-Hajdu, Ana Tereza Vasconcelos, Roberto G. S. Berlinck, and Fabiano L. Thompson (2012), "Taxonomic and functional microbial signatures of the endemic marine sponge Arenosclera brasiliensis", PLoS ONE 7: e39905. Pubmed, PDF.

The endemic marine sponge Arenosclera brasiliensis (Porifera, Demospongiae, Haplosclerida) is a known source of secondary metabolites such as arenosclerins A-C. In the present study, we established the composition of the A. brasiliensis microbiome and the metabolic pathways associated with this community. We used 454 shotgun pyrosequencing to generate approximately 640,000 high-quality sponge-derived sequences (~150 Mb). Clustering analysis including sponge, seawater and twenty-three other metagenomes derived from marine animal microbiomes shows that A. brasiliensis contains a specific microbiome. Fourteen bacterial phyla (including Proteobacteria, Cyanobacteria, Actinobacteria, Bacteroidetes, Firmicutes and Chloroflexi) were consistently found in the A. brasiliensis metagenomes. The A. brasiliensis microbiome is enriched for Betaproteobacteria (e.g., Burkholderia) and Gammaproteobacteria (e.g., Pseudomonas and Alteromonas) compared with the surrounding planktonic microbial communities. Functional analysis based on Rapid Annotation using Subsystem Technology (RAST) indicated that the A. brasiliensis microbiome is enriched for sequences associated with membrane transport and one-carbon metabolism. In addition, there was an overrepresentation of sequences associated with aerobic and anaerobic metabolism as well as the synthesis and degradation of secondary metabolites. This study represents the first analysis of sponge-associated microbial communities via shotgun pyrosequencing, a strategy commonly applied in similar analyses in other marine invertebrate hosts, such as corals and algae. We demonstrate that A. brasiliensis has a unique microbiome that is distinct from that of the surrounding planktonic microbes and from other marine organisms, indicating a species-specific microbiome.

A bacterial driver-passenger model for colorectal cancer: beyond the usual suspects

Harold Tjalsma, Annemarie Boleij, Julian R. Marchesi and Bas E. Dutilh (2012), "A Bacterial Driver-Passenger Model for Colorectal Cancer: Beyond the Usual Suspects", Nature Reviews Microbiology 10: 575-582. Pubmed, PDF.

Cancer has long been considered a genetic disease. However, accumulating evidence supports the involvement of infectious agents in the development of cancer, especially in those organs that are continuously exposed to microorganisms, such as the large intestine. Recent next-generation sequencing studies of the intestinal microbiota now offer an unprecedented view of the aetiology of sporadic colorectal cancer and have revealed that the microbiota associated with colorectal cancer contains bacterial species that differ in their temporal associations with developing tumours. Here, we propose a bacterial driver-passenger model for microbial involvement in the development of colorectal cancer and suggest that this model be incorporated into the genetic paradigm of cancer progression.

Bacterial responses to a simulated colon tumor microenvironment

Annemarie Boleij, Bas E Dutilh, Guus Kortman, Rian Roelofs, Coby M. Laarakkers, Udo F. Engelke and Harold Tjalsma (2012), "Bacterial Responses to a Simulated Colon Tumor Microenvironment", Molecular and Cellular Proteomics 11: 851-862. Pubmed, PDF.

One of the few bacteria that have been consistently linked to colorectal cancer (CRC) is the opportunistic pathogen Streptococcus gallolyticus. S. gallolyticus infections are generally regarded as an indicator for colonic malignancy, while the carriage rate of this bacterium in the healthy large intestine is relatively low. We speculated that the physiological changes accompanying the development of CRC might favor the colonization of this bacterium. To investigate whether colon tumor cells can support the survival of S. gallolyticus, S. gallolyticus was grown in spent medium of malignant colonocytes to simulate the altered metabolic conditions in the CRC microenvironment. These in vitro simulations indicated that S. gallolyticus had a significant growth advantage in these spent media, which was not observed for other intestinal bacteria. Under these conditions, bacterial responses were profiled by proteome analysis and metabolic shifts were analyzed by 1H-NMR-spectroscopy. In silico pathway analysis of the differentially expressed proteins and metabolite analysis indicated that this advantage resulted from the increased utilization of glucose, glucose derivatives and alanine. Together, these data suggest that tumor cell metabolites facilitate the survival of S. gallolyticus, favoring its local outgrowth and providing a possible explanation for the specific association of S. gallolyticus with colonic malignancy.

Genome-wide study of the defective sucrose fermenter strain of Vibrio cholerae from the Latin American cholera epidemic

Daniel R. Garza, Cristiane C. Thompson, Edvaldo C.B. Loureiro, Bas E. Dutilh, Davi T. Inada, Edivaldo C. Sousa Jr, Jedson F. Cardoso, Márcio R.T. Nunes, Clayton Pereira Silva de Lima, Rodrigo V.D. Silvestre, Keley N.B. Nunes, Elisabeth C.O. Santos, Robert A. Edwards, Ana C.P. Vicente and Lena L. Canto de Sá Morais (2012), "Genome-wide study of the defective sucrose fermenter strain of Vibrio cholerae from the Latin American cholera epidemic", PLoS ONE 7: e37283. Pubmed, PDF.

The 7th cholera pandemic reached Latin America in 1991, spreading from Peru to virtually all Latin American countries. During the late epidemic period, a strain that failed to ferment sucrose dominated cholera outbreaks in the Northern Brazilian Amazon region. In order to understand the genomic characteristics and the determinants of this altered sucrose fermenting phenotype, the genome of the strain IEC224 was sequenced. This paper reports a broad genomic study of this strain, showing its correlation with the major epidemic lineage. The potentially mobile genomic regions are shown to possess GC content deviation, and harbor the main V. cholerae virulence genes. A novel bioinformatic approach was applied in order to identify the putative functions of hypothetical proteins, and was compared with the automatic annotation by RAST. The genome of a large bacteriophage was found to be integrated into the alanine aminopeptidase gene of IEC224. The presence of this phage is shown to be a common characteristic of the El Tor strains from the Latin American epidemic, as well as its putative ancestor from Angola. The defective sucrose fermenting phenotype is shown to be due to a single nucleotide insertion in the V. cholerae sucrose-specific transportation gene. This frame-shift mutation truncated a membrane protein, altering its structural pore-like conformation. Further, the identification of a common bacteriophage reinforces both the monophyletic and African-Origin hypotheses for the main causative agent of the 1991 Latin America cholera epidemics.

The metagenome of the marine anammox bacterium "Candidatus Scalindua profunda" illustrates the versatility of this globally important nitrogen cycle bacterium

Jack van de Vossenberg, Dagmar Woebken, Wouter J. Maalcke, Hans J.C.T. Wessels, Bas E. Dutilh, Boran Kartal, Eva M. Janssen-Megens, Guus Roeselers, Jia Yan, Daan Speth, Jolein Gloerich, Wim Geerts, Erwin van der Biezen, Wendy Pluk, Kees-Jan Françoijs, Lina Russ, Phyllis Lam, Stefanie A. Malfatti, Susannah Green Tringe, Suzanne C.M. Haaijer, Huub J.M. Op den Camp, Henk G. Stunnenberg, Rudi Amann, Marcel M.M. Kuypers and Mike S.M. Jetten (2012), "The metagenome of the marine anammox bacterium "Candidatus Scalindua profunda" illustrates the versatility of this globally important nitrogen cycle bacterium", Environmental Microbiology 15: 1275-1289. Pubmed, PDF.

Anaerobic ammonium oxidizing (anammox) bacteria are responsible for a significant portion of the loss of fixed nitrogen from the oceans, making them important players in the global nitrogen cycle. To date, marine anammox bacteria found in marine water columns and sediments worldwide belong almost exclusively to the "Candidatus Scalindua" species, but the molecular basis of their metabolism and competitive fitness is presently unknown. We applied community sequencing of a marine anammox enrichment culture dominated by "Candidatus Scalindua profunda" to construct a genome assembly, which was subsequently used to analyze the most abundant gene transcripts and proteins. In the S. profunda assembly, 4756 genes were annotated, and only about half of them showed the highest identity to the only other anammox bacterium of which a metagenome assembly had been constructed so far, the fresh water "Candidatus Kuenenia stuttgartiensis". In total, 2016 genes of S. profunda could not be matched to the K. stuttgartiensis metagenome assembly at all, and a similar number of genes in K. stuttgartiensis could not be found in S. profunda. Most of these genes did not have a known function but 98 expressed genes could be attributed to oligopeptide transport, amino acid metabolism, use of organic acids, and electron transport. On the basis of the S. profunda metagenome, and environmental metagenome data, we observed pronounced differences in the gene organization and expression of important anammox enzymes, such as hydrazine synthase (HzsAB), nitrite reductase (NirS), and inorganic nitrogen transport proteins. Adaptations of Scalindua to the substrate limitation of the ocean may include highly expressed ammonium, nitrite and oligopeptide transport systems and pathways for the transport, oxidation, and assimilation of small organic compounds that may allow a more versatile lifestyle contributing to the competitive fitness of Scalindua in the marine realm.

Genome sequence of the ethanol tolerant Lactobacillus vini strains LMG23202T and JP7.8.9

Brigida T. Luckwu de Lucena, Genivaldo G.Z. Silva, Billy Manoel dos Santos, Graciela M. Dias, Gilda R. Amaral, Ana P. Moreira, Marcos A. de Morais Júnior, Bas E. Dutilh, Robert A. Edwards, Valdir Balbino, Cristiane C. Thompson and Fabiano L. Thompson (2012), "Genome sequence of the ethanol tolerant Lactobacillus vini strains LMG23202T and JP7.8.9", Journal of Bacteriology 194: 3018. Pubmed, PDF.

We report on the genome sequences of Lactobacillus vini LMG 23202T (DSM 20605) (isolated from fermenting grape musts in Spain) and the industrial strain L. vini JP7.8.9 (isolated from a bioethanol plant in northeast Brazil). All contigs were assembled using gsAssembler, and genes were predicted and annotated using Rapid Annotation using Subsystem Technology (RAST). The identified genome sequence of LMG 23202T had 2,201,333 bp, 37.6% G+C, and 1,833 genes, whereas the identified genome sequence of JP7.8.9 had 2,301,037 bp, 37.8% G+C, and 1,739 genes. The gene repertoire of the species L. vini offers promising opportunities for biotechnological applications.

Metagenomics and future perspectives in virus discovery

John L. Mokili, Forest Rohwer and Bas E. Dutilh (2012), "Metagenomics and future perspectives in virus discovery", Current Opinion in Virology 2: 63-77. Pubmed, DOI. Listed #17 in Elsevier Virology's most downloaded articles.

Monitoring the emergence and re-emergence of viral diseases with the goal of containing the spread of viral agents requires both adequate preparedness and quick response. Identifying the causative agent of a new epidemic is one of the most important steps for effective response to disease outbreaks. Traditionally, virus discovery required propagation of the virus in cell culture, a proven technique responsible for the identification of the vast majority of viruses known to date. However, many viruses cannot be easily propagated in cell culture, thus limiting our knowledge of viruses. Viral metagenomic analyses of environmental samples suggest that the field of virology has explored less than 1% of the extant viral diversity. In the last decade, the culture-independent and sequence-independent metagenomic approach has permitted the discovery of many viruses in a wide range of samples. Phylogenetically, some of these viruses are distantly related to previously discovered viruses. In addition, 60-99% of the sequences generated in different viral metagenomic studies are not homologous to known viruses. In this review, we discuss the advances in the area of viral metagenomics during the last decade and their relevance to virus discovery, clinical microbiology and public health. We discuss the potential of metagenomics for characterization of the normal viral population in a healthy community and identification of viruses that could pose a threat to humans through zoonosis. In addition, we propose a new model of the Koch's postulates named the 'Metagenomic Koch's Postulates'. Unlike the original Koch's postulates and the Molecular Koch's postulates as formulated by Falkow, the metagenomic Koch's postulates focus on the identification of metagenomic traits in disease cases. These are metagenomic traits that can be traced after healthy individuals have been exposed to the source of the suspected pathogen.

Towards the human colorectal cancer microbiome

Julian R. Marchesi, Bas E. Dutilh, Neil Hall, Wilbert H. M. Peters, Rian Roelofs, Annemarie Boleij and Harold Tjalsma (2011), "Towards the Human Colorectal Cancer Microbiome", PLoS ONE 6: e20447. Pubmed, PDF. F1000 Recommended.

Multiple factors drive the progression from healthy mucosa towards sporadic colorectal carcinomas and accumulating evidence associates intestinal bacteria with disease initiation and progression. Therefore, the aim of this study was to provide a first high-resolution map of colonic dysbiosis that is associated with human colorectal cancer (CRC). To this purpose, the microbiomes colonizing colon tumor tissue and adjacent non-malignant mucosa were compared by deep rRNA sequencing. The results revealed striking differences in microbial colonization patterns between these two sites. Although inter-individual colonization in CRC patients was variable, tumors consistently formed a niche for Coriobacteria and other proposed probiotic bacterial species, while potentially pathogenic Enterobacteria were underrepresented in tumor tissue. As the intestinal microbiota is generally stable during adult life, these findings suggest that CRC-associated physiological and metabolic changes recruit tumor-foraging commensal-like bacteria. These microbes thus have an apparent competitive advantage in the tumor microenvironment and thereby seem to replace pathogenic bacteria that may be implicated in CRC etiology. This first glimpse of the CRC microbiome provides an important step towards full understanding of the dynamic interplay between intestinal microbial ecology and sporadic CRC, which may provide important leads towards novel microbiome-related diagnostic tools and therapeutic interventions.

Bas E. Dutilh (2012), "Challenges to analyze and model microbial communities", invited talk at BioVisionAlexandria 2012 "New Life Sciences: Linking Science to Society". Alexandria, Egypt.

Bas E. Dutilh, Julian R Marchesi, Annemarie Boleij and Harold Tjalsma (2012), "Towards the human colorectal cancer microbiome", poster at MetaHIT International Human Microbiome Congress, Paris, France.

Pyrosequencing of 16S rRNA gene amplicons to study the microbiota in the gastrointestinal tract of carp (Cyprinus carpio L.)

Maartje A.H.J. van Kessel, Bas E. Dutilh, Kornelia Neveling, Michael P. Kwint, Joris A. Veltman, Gert Flik, Mike S.M. Jetten, Peter H.M. Klaren and Huub J.M. Op den Camp (2011), "Pyrosequencing of 16S rRNA gene amplicons to study the microbiota in the gastrointestinal tract of carp (Cyprinus carpio L.)", AMB Express 1: 41. Pubmed, PDF.

The microbes in the gastrointestinal (GI) tract are of high importance for the health of the host. In this study, Roche 454 pyrosequencing was applied to a pooled set of different 16S rRNA gene amplicons obtained from GI content of common carp (Cyprinus carpio) to make an inventory of the diversity of the microbiota in the GI tract. Compared to other studies, our culture-independent investigation reveals an impressive diversity of the microbial flora of the carp GI tract. The major group of obtained sequences belonged to the phylum Fusobacteria. Bacteroidetes, Planctomycetes and Gammaproteobacteria were other well represented groups of micro-organisms. Verrucomicrobiae, Clostridia and Bacilli (the latter two belonging to the phylum Firmicutes) had fewer representatives among the analyzed sequences. Many of these bacteria might be of high physiological relevance for carp as these groups have been implicated in vitamin production, nitrogen cycling and (cellulose) fermentation.

Genome sequence of the human pathogen Vibrio cholerae Amazonia

Cristiane C. Thompson, Michel Abanto Marin, Graciela Maria Dias, Bas E. Dutilh, Rob Edwards, Tetsuya Iida and Fabiano L. Thompson (2011), "Genome sequence of the human pathogen Vibrio cholerae Amazonia", Journal of Bacteriology 193: 5877-5878. Pubmed, PDF.

Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cholera-like diarrhea cases in at least two countries, Brazil and Ghana. It belongs to a distinct profile by MLSA. The genomic analysis revealed that it contains the Vibrio pathogenicity island-2 and a set of genes related to pathogenesis and fitness, such as the type VI secretion system, that are present in choleragenic V. cholerae strains.

FACIL: fast and accurate genetic code inference and logo

Bas E. Dutilh, Rasa Jurgelenaite, Radek Szklarczyk, Sacha A.F.T. van Hijum, Harry R. Harhangi, Markus Schmid, Bart de Wild, Kees-Jan Françoijs, Hendrik G. Stunnenberg, Marc Strous, Mike S.M. Jetten, Huub J.M. Op den Camp and Martijn A. Huynen (2011), "FACIL: fast and accurate genetic code inference and logo", Bioinformatics 27: 1929-1933. Pubmed, PDF.

Motivation: The intensification of environmental DNA sequencing will increasingly unveil uncharacterized species with potential alternative genetic codes. A total of 0.65% of the DNA sequences currently in Genbank encode their proteins with a variant genetic code, and these exceptions occur in many unrelated taxa. Results: We introduce FACIL, a fast and reliable tool to evaluate nucleic acid sequences for their genetic code that detects alternative codes even in species distantly related to known organisms. To illustrate this, we apply FACIL to a set of mitochondrial genomic contigs of Globobulimina pseudospinescens. This foraminifer does not have any sequenced close relatives in the databases, yet we infer its alternative genetic code with high confidence values. Results are intuitively visualized in a Genetic Code Logo. Availability and Implementation: FACIL is available as a web-based service at http://www.cmbi.ru.nl/FACIL/ and as a stand-alone program.

Bas E. Dutilh (2011), "FACIL: fast and accurate genetic code inference and logo", talk at San Diego Microbiology Group All Day Meeting 2011, San Diego, California, USA.

Bas E. Dutilh, Rasa Jurgelenaite, Radek Szklarczyk, Sacha A.F.T. van Hijum, Harry R. Harhangi, Markus Schmid, Bart de Wild, Kees-Jan Françoijs, Hendrik G. Stunnenberg, Marc Strous, Mike S.M. Jetten, Huub J.M. Op den Camp and Martijn A. Huynen (2011), "FACIL: fast and accurate genetic code inference and logo", poster at San Diego Microbiology Group All Day Meeting 2011, San Diego, California, USA.

Ultra-deep pyrosequencing of pmoA amplicons confirms prevalence of Methylomonas and Methylocystis in Sphagnum mosses from a Dutch peat bog

Nardy Kip, Bas E. Dutilh, Yao Pan, Levente Bodrossy, Kornelia Neveling, Michael P. Kwint, Mike S.M. Jetten and Huub J.M. Op den Camp (2011), "Ultra-deep pyrosequencing of pmoA amplicons confirms prevalence of Methylomonas and Methylocystis in Sphagnum mosses from a Dutch peat bog", Environmental Microbiology Reports 3: no. doi: 10.1111/j.1758-2229.2011.00260.x. PDF.

Sphagnum peatlands are important ecosystems in the methane cycle. Methanotrophs in these ecosystems have been shown to reduce methane emissions and provide additional carbon to Sphagnum mosses. However, little is known about the diversity and identity of the methanotrophs present in and on Sphagnum mosses in these peatlands. In this study, we applied a pmoA microarray and high-throughput 454 pyrosequencing to pmoA PCR products obtained from total DNA from Sphagnum mosses from a Dutch peat bog to investigate the presence of methanotrophs and to compare the two different methods. Both techniques showed comparable results and revealed an abundance of Methylomonas and Methylocystis species in the Sphagnum mosses. The advantage of the microarray analysis is that it is fast and cost-effective, especially when many samples have to be screened. Pyrosequencing is superior in providing pmoA sequences of many unknown or uncultivated methanotrophs present in the Sphagnum mosses and, thus, provided much more detailed and quantitative insight into the microbial diversity.

Mass spectrometry analysis of hepcidin peptides in experimental mouse models

Harold Tjalsma*, Coby M.M. Laarakkers*, Rachel P.S. van Swelm, Milan Theurl, Igor Theurl, Erwin H. Kemna, Yuri E.M. van der Burgt, Hanka Venselaar, Bas E. Dutilh, Frans G.M. Russel, Günter Weiss, Rosalinde Masereeuw, Robert E. Fleming, Dorine W. Swinkels (2011), "Mass Spectrometry Analysis of Hepcidin Peptides in Experimental Mouse Models", PLoS ONE 6: e16762. Pubmed, PDF. *Authors contributed equally.

Background: The mouse is a valuable model for unravelling the role of hepcidin in iron homeostasis. Here, we aimed to assess mouse hepcidin-1 (Hep-1) and -2 (Hep-2) peptide levels in serum and urine by a novel mass spectrometry (MS)-based approach. Methods: We used time-of-flight (TOF) MS to determine Hep-1 and -2 levels and Fourier transform ion cyclotron resonance (FTICR) and tandem-MS for hepcidin identifications. The method was biologically validated by hepcidin assessment in: i) 3 mouse strains (C57Bl/6; DBA/2 and BALB/c) upon stimulation with intravenous iron and LPS, ii) homozygous Hfe knock out, homozygous transferrin receptor 2 (Y245X) mutated mice and double affected mice, and iii) mice treated with a sublethal hepatotoxic dose of paracetamol. Results: Hep-1 detection was restricted to serum, while Hep-2 was only found in urine and consisted of several isoforms. Elevations in serum Hep-1 and urine Hep-2 upon intravenous iron or LPS were only moderate and varied considerably between mouse strains. Serum Hep-1 was decreased in all three hemochromatosis models and lowest in the double affected mouse. Serum Hep-1 levels correlated with liver hepcidin-1 gene expression, while acute liver damage by paracetamol depleted Hep-1 from serum. Furthermore, serum Hep-1 appeared to be an excellent indicator of splenic iron accumulation. Conclusion: Hep-1 and Hep-2 peptide responses in experimental mouse models agree with the known biology of hepcidin mRNA regulators, and their measurement can now be implemented in iron-related experimental mouse models to provide novel insights in post-transcriptional regulation, hepcidin function, and kinetics.

The organellar genome and metabolic potential of the hydrogen-producing mitochondrion of Nyctotherus ovalis

Rob M. de Graaf*, Guenola Ricard*, Theo A. van Alen, Isabel Duarte, Bas E. Dutilh, Carola Burgtorf, Jan W.P. Kuiper, Georg W.M. van der Staay, Aloysius G.M. Tielens, Martijn A. Huynen and Johannes H.P. Hackstein (2011), "The organellar genome and metabolic potential of the hydrogen-producing mitochondrion of Nyctotherus ovalis", Molecular Biology and Evolution 28: 2379-2391. Pubmed, PDF. *Authors contributed equally.

It is generally accepted that hydrogenosomes (hydrogen-producing organelles) evolved from a mitochondrial ancestor. However, until recently, only indirect evidence for this hypothesis was available. Here we present the almost complete genome of the hydrogen-producing mitochondrion of the anaerobic ciliate Nyctotherus ovalis and show that, except for the notable absence of genes encoding electron-transport chain components of Complexes III, IV and V, it has a gene content similar to the mitochondrial genomes of aerobic ciliates. Analysis of the genome of the hydrogen-producing mitochondrion, in combination with that of more than 9,000 gDNA and cDNA sequences, allows a preliminary reconstruction of the organellar metabolism. The sequence data indicate that N. ovalis possesses hydrogen-producing mitochondria that have a truncated, two step (Complex I and II) electron-transport chain that uses fumarate as electron acceptor. In addition, components of an extensive protein network for the metabolism of amino-acids, defense against oxidative stress, mitochondrial protein synthesis, mitochondrial protein import and processing, and transport of metabolites across the mitochondrial membrane were identified. Genes for MPV17 and ACN9, two hypothetical proteins linked to mitochondrial disease in humans, were also found. The inferred metabolism is remarkably similar to the organellar metabolism of the phylogenetically distant anaerobic Stramenopile Blastocystis. Notably, the Blastocystis organelle and that of the related flagellate Proteromonas lacertae also lack genes encoding components of Complexes III, IV and V. Thus, our data show that the hydrogenosomes of N. ovalis are highly specialized, hydrogen-producing mitochondria.

Genome wide screening in human growth plates during puberty in one patient suggests a role for RUNX2 in epiphyseal maturation

Joyce Emons, Bas E. Dutilh, Eva Decker, Heide Pirzer, Carsten Sticht, Norbert Gretz, Gudrun Rappold, Ewen R. Cameron, James C. Neil, Gary S. Stein, Andre J. van Wijnen, Jan Maarten Wit, Janine N. Post, Marcel Karperien (2011), "Genome wide screening in human growth plates during puberty in one patient suggests a role for RUNX2 in epiphyseal maturation", Journal of Endocrinology 209: 245-254. Pubmed, PDF.

In late puberty, estrogen decelerates bone growth by stimulating growth plate maturation. Here, we studied the mechanism of estrogen action using two pubertal growth plate specimens of one girl at Tanner stage B2 and Tanner stage B3. Histological analysis showed that progression of puberty coincided with characteristic morphological changes: a decrease in total growth plate height (p=0.002) and in the height of the individual zones (p<0.001), and an increase in intercolumnar space (p<0.001). Microarray analysis of the specimens identified 394 genes (72% upregulated, 28% downregulated) that changed with the progression of puberty. Overall changes in gene expression were small (average 1.38-fold upregulated and 1.36-fold downregulated genes). The 394 genes mapped to 13 significantly changing pathways (p<0.05) associated with growth plate maturation (e.g., extracellular matrix, cell cycle and cell death). We next scanned the upstream promoter regions of the 394 genes for the presence of evolutionarily conserved binding sites for transcription factors implicated in growth plate maturation such as Estrogen Receptor, Androgen Receptor, Elk1, Stat5b, CREB and RUNX2. High quality motif sites for RUNX2 (87 genes), Elk1 (43 genes) and Stat5b (31 genes), but not estrogen receptor, were evolutionarily conserved, indicating their functional relevance across primates. Moreover, we show that some of these sites are direct target genes of these transcription factors as shown by ChIP assays.

Genome-wide profiling of p63 DNA-binding sites identifies an element that regulates gene expression during limb development in the 7q21 SHFM1 locus

Evelyn N. Kouwenhoven*, Simon J. van Heeringen*, Juan J. Tena*, Martin Oti, Bas E. Dutilh, M. Eva Alonso, Elisa de la Calle-Mustienes, Leonie Smeenk, Tuula Rinne, Lilian Parsaulian, Emine Bolat, Rasa Jurgelenaite, Martijn A. Huynen, Alexander Hoischen, Joris A. Veltman, Han G. Brunner, Tony Roscioli, Emily Oates, Meredith Wilson, Miguel Manzanares, José Luis Gómez-Skarmeta, Hendrik G. Stunnenberg, Marion Lohrum, Hans van Bokhoven and Huiqing Zhou (2010), "Genome-wide profiling of p63 DNA-binding sites identifies an element that regulates gene expression during limb development in the 7q21 SHFM1 locus", PLoS Genetics 6: e1001065. Pubmed, PDF. *Authors contributed equally.

Heterozygous mutations in p63 are associated with split hand/foot malformations (SHFM), orofacial clefting and ectodermal abnormalities. Elucidation of the p63 gene network that includes target genes and regulatory elements may reveal new genes for other malformation disorders. We performed genome-wide DNA-binding profiling by chromatin immunoprecipitation (ChIP) followed by deep sequencing (ChIP-seq) in primary human keratinocytes, and identified potential target genes and regulatory elements controlled by p63. We show that p63 binds to an enhancer element in the SHFM1 locus on chromosome 7q and that this element controls expression of DLX6 and possibly DLX5, both of which are important for limb development. A unique microdeletion including this enhancer element but not the DLX5/DLX6 genes was identified in a patient with SHFM. Our study strongly indicates disruption of a non-coding cis-regulatory element located more than 250 kb from the DLX5/DLX6 genes as a novel disease mechanism in SHFM1. These data provide a proof-of-concept that the catalogue of p63 binding sites identified in this study may be of relevance to the studies of SHFM and other congenital malformations that resemble the p63-associated phenotypes.

Deconstructing the super-organism

Bas E. Dutilh (2010). "Deconstructing the super-organism: detecting metabolic differentiation by compartmentalizing metagenomes", Veni award, NWO.

This Veni award from the Netherlands Organization for Scientific Research (NWO) enables me to do 3 years of independent research. I will interpret the functionality of metagenomes at the level of individual micro-organisms. The award has been highlighted by Gezondheidskrant.nl, Medicalfacts.nl.

Nitrite-driven anaerobic methane oxidation by oxygenic bacteria

Katharina F. Ettwig*, Margaret K. Butler*, Denis Le Paslier, Eric Pelletier, Sophie Mangenot, Marcel M.M. Kuypers, Frank Schreiber, Bas E. Dutilh, Johannes Zedelius, Dirk de Beer, Jolein Gloerich, Hans J.C.T. Wessels, Theo A. van Alen, Francisca Luesken, Ming L. Wu, Katinka T. van de Pas-Schoonen, Huub J.M. Op den Camp, Eva M. Janssen-Megens, Kees-Jan Francoijs, Henk Stunnenberg, Jean Weissenbach, Mike S.M. Jetten and Marc Strous (2010), "Nitrite-driven anaerobic methane oxidation by oxygenic bacteria", Nature 464: 543-548. Pubmed, PDF, F1000 Exceptional. *Authors contributed equally.

Only three biological pathways are known to produce oxygen: photosynthesis, chlorate respiration and the detoxification of reactive oxygen species. Here we present evidence for a fourth pathway, possibly of considerable geochemical and evolutionary importance. The pathway was discovered after metagenomic sequencing of an enrichment culture that couples anaerobic oxidation of methane with the reduction of nitrite to dinitrogen. The complete genome of the dominant bacterium, named 'Candidatus Methylomirabilis oxyfera', was assembled. This apparently anaerobic, denitrifying bacterium encoded, transcribed and expressed the well-established aerobic pathway for methane oxidation, whereas it lacked known genes for dinitrogen production. Subsequent isotopic labelling indicated that 'M. oxyfera' bypassed the denitrification intermediate nitrous oxide by the conversion of two nitric oxide molecules to dinitrogen and oxygen, which was used to oxidize methane. These results extend our understanding of hydrocarbon degradation under anoxic conditions and explain the biochemical mechanism of a poorly understood freshwater methane sink. Because nitrogen oxides were already present on early Earth, our finding opens up the possibility that oxygen was available to microbial metabolism before the evolution of oxygenic photosynthesis.

The mitochondrial genomes of the ciliates Euplotes minuta and Euplotes crassus

Rob M. de Graaf, Theo A. van Alen, Bas E. Dutilh, Jan W.P. Kuiper, Hanneke J.A.A. van Zoggel, Minh Bao Huynh, Hans-Dieter Görtz, Martijn A. Huynen and Johannes H.P. Hackstein (2009), "The mitochondrial genomes of the ciliates Euplotes minuta and Euplotes crassus", BMC Genomics 10: 514. PDF, Pubmed.

Background: There are thousands of very diverse ciliate species, of which only a handful of mitochondrial genomes have been studied so far. These genomes are rather similar because the ciliates analysed (Tetrahymena spp. and Paramecium aurelia) are closely related. Here we study the mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus. These ciliates are only distantly related to Tetrahymena spp. and Paramecium aurelia, but more closely related to Nyctotherus ovalis, which possesses a hydrogenosomal (mitochondrial) genome. Results: The linear mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus were sequenced and compared with the mitochondrial genomes of several Tetrahymena species, Paramecium aurelia and the partially sequenced mitochondrial genome of the anaerobic ciliate Nyctotherus ovalis. This study reports new features such as long 5' gene extensions of several mitochondrial genes, extremely long cox1 and cox2 open reading frames and a large repeat in the middle of the linear mitochondrial genome. The repeat separates the open reading frames into two blocks, each having a single direction of transcription, from the repeat towards the ends of the chromosome. Although the Euplotes mitochondrial gene content is almost identical to Paramecium and Tetrahymena, the order of the genes is completely different. In contrast, the 33273 bp (excluding the repeat region) piece of the mitochondrial genome that has been sequenced in both Euplotes species exhibits no difference in gene order. Unexpectedly, many of the mitochondrial genes of E. minuta encoding ribosomal proteins possess N-terminal extensions that are similar to mitochondrial targeting signals. Conclusions: The mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus are rather different from the previously studied genomes. Many genes are extended in size compared to mitochondrial genes from other sources.

Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly

Bas E. Dutilh, Martijn A. Huynen and Marc Strous (2009), "Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly", Bioinformatics 25: 2878-2881, PDF, Pubmed.

Motivation: Most microbial species cannot be cultured in the lab. Metagenomic sequencing may still yield a complete genome if the sequenced community is enriched and the sequencing coverage is high. However, the complexity in a natural population may cause the enrichment culture to contain multiple related strains. This diversity can confound existing strict assembly programs and lead to a fragmented assembly, which is unnecessary if we have a related reference genome available that can function as a scaffold. Results: Here, we map short metagenomic sequencing reads from a population of strains to a related reference genome, and compose a genome that captures the consensus of the population's sequences. We show that by iteration of the mapping and assembly procedure, the coverage increases while the similarity with the reference genome decreases. This indicates that the assembly becomes less dependent on the reference genome and approaches the consensus genome of the multi-strain population.
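As a rough illustration of the map-then-rebuild iteration described in this abstract, here is a toy Python sketch. It is not the published implementation: the exhaustive Hamming-distance mapper, the per-position majority vote, all function names and the tiny example sequences are assumptions made purely for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, ref, max_mismatch):
    """Return the first best ungapped position of read on ref, or None if
    every placement has more than max_mismatch mismatches."""
    best_pos, best_mm = None, max_mismatch + 1
    for pos in range(len(ref) - len(read) + 1):
        mm = hamming(read, ref[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def consensus_round(reads, ref, max_mismatch):
    """Map all reads once and rebuild the sequence by majority vote.
    Positions without coverage keep the current reference base."""
    columns = [Counter() for _ in ref]
    mapped = 0
    for read in reads:
        pos = map_read(read, ref, max_mismatch)
        if pos is None:
            continue
        mapped += 1
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    new_ref = "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(columns, ref)
    )
    return new_ref, mapped


def iterate_consensus(reads, ref, max_mismatch=1, max_rounds=10):
    """Repeat mapping and consensus calling until the sequence stops changing.
    Returns the final sequence and the number of mapped reads per round."""
    history = []
    for _ in range(max_rounds):
        new_ref, mapped = consensus_round(reads, ref, max_mismatch)
        history.append(mapped)
        if new_ref == ref:
            break
        ref = new_ref
    return ref, history


# Hypothetical toy data: the reads come from a "population" sequence that
# differs from the reference at two positions.
reference = "GATTAGAGATCGAT"
reads = ["GATTAC", "ACAGCT", "CTCGAT"]
final_sequence, mapped_per_round = iterate_consensus(reads, reference)
```

In this toy case the middle read spans both differences and fails to map in the first round; once the flanking reads have corrected the reference, it maps in the second round, so the number of mapped reads rises from 2 to 3 while the sequence moves away from the original reference.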

Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly

Bas E. Dutilh, Martijn A. Huynen, Jolein Gloerich and Marc Strous (2011), "Iterative Read Mapping and Assembly Allows the Use of a More Distant Reference in Metagenome Assembly". In: Handbook of Molecular Microbial Ecology I: Metagenomics and Complementary Approaches. Ed. Frans J. de Bruijn. Wiley-Blackwell.

Most microbial species cannot be cultured in the laboratory. Metagenomic sequencing may still yield a complete genome if the sequenced community is enriched and the sequencing coverage is high. However, the complexity in a natural population may cause the enrichment culture to contain multiple related strains. Moreover, it is not uncommon that these strains represent a quasispecies that is relatively distantly related to the closest available reference genome. These matters can confound existing strict assembly programs and lead to a fragmented assembly, which is unnecessary if we have a related reference genome available that can function as a scaffold, and if we use this scaffold loosely. We show that by iteratively mapping short metagenomic sequencing reads from a population of strains to a related reference genome, we can create a genome that captures the consensus of the population's sequences. Iteration allows us to map more of the reads, leading to a higher coverage and depth of the assembled consensus genome. At the same time, the similarity with the reference genome decreases. This indicates that the assembly becomes less dependent on the reference genome and approaches the consensus genome of the multi-strain population. Thus, by exploiting the homology offered by a reference genome in combination with permissive, iterative read mapping, we get a better view of both the consensus genome sequence of the quasispecies present in the sample and of the sequence diversity between the strains.

Bas E. Dutilh (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", talk at Bio-IT World Conference and Expo 2010, Hannover, Germany.

Bas E. Dutilh (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", talk at Genomics Automation Europe 2010, Dublin, Ireland.

Bas E. Dutilh (2010), "Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly", talk at NBIC Conference 2010, Lunteren, The Netherlands.

Bas E. Dutilh, Martijn A. Huynen and Marc Strous (2009), "Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly", talk at Next Generation Sequencing and Algorithms for Short Read Analysis SIG, ISMB/ECCB 2009, Stockholm, Sweden.

Bas E. Dutilh, Martijn A. Huynen, Jolein Gloerich and Marc Strous (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", poster at ECCB 2010, Ghent, Belgium.

Bas E. Dutilh, Martijn A. Huynen, Jolein Gloerich and Marc Strous (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", poster at NBIC Conference 2010, Lunteren, The Netherlands.

Asymmetric relationships between proteins shape genome evolution

Richard A. Notebaart*, Philip R. Kensche*, Martijn A. Huynen and Bas E. Dutilh (2009), "Asymmetric relationships between proteins shape genome evolution", Genome Biology 10: R19. Pubmed, PDF. *Authors contributed equally.

Background The relationships between proteins are often asymmetric: one protein (A) depends for its function on another protein (B), but the second protein does not depend on the first. For example, in regulatory interactions, the regulator's function depends on the availability of its target, but the target can often function without the regulator. Other examples are metabolic networks, in which there are multiple pathways that converge into one central pathway. The enzymes in the converging pathways depend on the enzymes in the central pathway, but the enzymes in the latter do not depend on any specific enzyme in the converging pathways. Asymmetric relations are analogous to the "if->then" logical relation where A implies B, but B does not imply A (A->B). Results We show that the majority of relationships between enzymes in metabolic flux models of metabolism in Escherichia coli and Saccharomyces cerevisiae are asymmetric. We show furthermore that these asymmetric relationships are reflected in the expression of the genes encoding those enzymes, the effect of gene knockouts and the evolution of genomes. From the asymmetric relative dependency, one would expect that the gene that is relatively independent (B), can occur without the other, dependent gene (A), but not the reverse. Indeed, when only one gene of an A->B pair is expressed, is essential, is present in a genome, is gained in evolutionary history without the other, or is present after a loss of one of the two, it tends to be the independent gene (B). This bias is strongest for genes encoding proteins whose asymmetric relationship is evolutionarily conserved. Conclusions The asymmetric relations between proteins that arise from the system properties of metabolic networks affect gene expression, the relative effect of gene knockouts and genome evolution in a predictable manner.
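As a minimal sketch of the genomic side of this argument, the snippet below classifies a gene pair from presence/absence data and counts how often each gene occurs without the other. The genome names and gene sets are hypothetical, and the strict subset test is a simplification: the paper reports a statistical bias, and it derives the asymmetric pairs from metabolic flux models rather than from presence patterns.

```python
def dependency(genomes_with_a, genomes_with_b):
    """Classify a gene pair by the genomes in which each gene occurs."""
    a, b = set(genomes_with_a), set(genomes_with_b)
    if a == b:
        return "A<->B"      # always found together
    if a < b:
        return "A->B"       # A never occurs without B
    if b < a:
        return "B->A"       # B never occurs without A
    return "unrelated"


def lone_occurrences(genomes_with_a, genomes_with_b):
    """Count genomes that contain only A and genomes that contain only B."""
    a, b = set(genomes_with_a), set(genomes_with_b)
    return len(a - b), len(b - a)


# Hypothetical data: a dependent gene (A) and the gene it depends on (B).
dependent_gene = {"genome1", "genome2"}
independent_gene = {"genome1", "genome2", "genome3", "genome4"}

print(dependency(dependent_gene, independent_gene))
print(lone_occurrences(dependent_gene, independent_gene))
```

With real data one would compare the two counts returned by `lone_occurrences` across many A->B pairs, since the expectation is only that the independent gene occurs alone more often, not that the dependent gene never does.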

Richard A. Notebaart*, Philip R. Kensche*, Martijn A. Huynen and Bas E. Dutilh (2009), "Asymmetric relationships between proteins shape genome evolution", talk by R.A. Notebaart at NBIC Conference 2009, Lunteren, The Netherlands. *These authors contributed equally.

Richard A. Notebaart*, Philip R. Kensche*, Martijn A. Huynen and Bas E. Dutilh (2009), "Asymmetric relationships between proteins shape genome evolution", poster at ISMB/ECCB 2009, Stockholm, Sweden.
*These authors contributed equally.

Philip R. Kensche*, Richard A. Notebaart*, Martijn A. Huynen and Bas E. Dutilh (2008), "Asymmetric relationships between proteins shape genome evolution", poster at Benelux Bioinformatics Conference 2008, Maastricht, The Netherlands.
*These authors contributed equally.

Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns

Guénola Ricard, Rob M. de Graaf, Bas E. Dutilh, Isabel Duarte, Theo A. van Alen, Angela H.A.M. van Hoek, Brigitte Boxma, Georg W.M. van der Staay, Seung Yeo Moon van der Staay, Wei-Jen Chang, Laura F. Landweber, Johannes H.P. Hackstein and Martijn A. Huynen (2008), "Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns", BMC Genomics 9: 587. Pubmed, PDF.

Background Nyctotherus ovalis is a single-celled eukaryote that has hydrogen-producing mitochondria and lives in the hindgut of cockroaches. Like all members of the ciliate taxon, it has two types of nuclei, a micronucleus and a macronucleus. N. ovalis generates its macronuclear chromosomes by forming polytene chromosomes that subsequently develop into macronuclear chromosomes by DNA elimination and rearrangement. Results We examined the structure of these gene-sized macronuclear chromosomes in N. ovalis. We determined the telomeres, subtelomeric regions, UTRs, coding regions and introns by sequencing a large set of macronuclear DNA sequences (4,242) and cDNAs (5,484) and comparing them with each other. The telomeres consist of repeats CCC(AAAACCCC)n, similar to those in spirotrichous ciliates such as Euplotes, Sterkiella (Oxytricha) and Stylonychia. Per sequenced chromosome we found evidence for either a single protein-coding gene, a single tRNA, or the complete ribosomal RNA cluster. Hence the chromosomes appear to encode single transcripts. In the short subtelomeric regions we identified a few over-represented motifs that could be involved in gene regulation, but there is no consensus polyadenylation site. The introns are short (21-29 nucleotides), and a significant fraction (1/3) of the tiny introns is conserved in the distantly related ciliate Paramecium tetraurelia. As has been observed in P. tetraurelia, the N. ovalis introns tend to contain in-frame stop codons or have a length that is not divisible by three. This pattern causes premature termination of mRNA translation in the event of intron retention, and potentially degradation of unspliced mRNAs by the nonsense-mediated mRNA decay pathway. Conclusions The combination of short leaders, tiny introns and single genes leads to very minimal macronuclear chromosomes. The smallest we identified contained only 150 nucleotides.

Signature genes as a phylogenomic tool

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", Molecular Biology and Evolution 25: 1659-1667. Pubmed, PDF.

Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss, and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition.
We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that ~92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. Summarising, signature genes can complement traditional sequence based methods in addressing taxonomic questions.
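The idea of a signature gene (present throughout a clade, absent outside it) can be sketched in a few lines. This is an illustration under assumptions, not the published procedure: the genome names, orthologous group identifiers, the `min_fraction` threshold and both function names are hypothetical.

```python
def signature_genes(gene_content, clade, min_fraction=1.0):
    """Genes found only inside the clade and in enough of its genomes.

    gene_content maps genome name -> set of orthologous groups (OGs);
    clade is the set of genome names that belong to the clade.
    """
    inside = [genes for genome, genes in gene_content.items() if genome in clade]
    outside = [genes for genome, genes in gene_content.items() if genome not in clade]
    seen_outside = set().union(*outside)
    candidates = set().union(*inside) - seen_outside
    return {
        og for og in candidates
        if sum(og in genes for genes in inside) / len(inside) >= min_fraction
    }


def characterize(sample_ogs, signatures):
    """Count, per clade, how many of its signature genes occur in a sample."""
    return {clade: len(sample_ogs & ogs) for clade, ogs in signatures.items()}


# Hypothetical gene content of four genomes in two clades.
gene_content = {
    "genomeA1": {"og1", "og2", "og5"},
    "genomeA2": {"og1", "og2", "og6"},
    "genomeB1": {"og1", "og3", "og5"},
    "genomeB2": {"og1", "og3"},
}
clades = {
    "cladeA": {"genomeA1", "genomeA2"},
    "cladeB": {"genomeB1", "genomeB2"},
}

signatures = {name: signature_genes(gene_content, members)
              for name, members in clades.items()}
hits = characterize({"og2", "og7"}, signatures)
print(signatures, hits)
```

Lowering `min_fraction` below 1.0 would tolerate gene loss in some clade members, which is one way to trade specificity for the number of signature genes found.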

Signature, a web server for taxonomic characterization of sequence samples using signature genes

Bas E. Dutilh, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature, a web server for taxonomic characterization of sequence samples using signature genes", Nucleic Acids Res, 36 (Web Server Issue): W470-W474. Pubmed, PDF.

Signature genes are genes that are unique to a taxonomic clade and are common within it. They contain a wealth of information about clade-specific processes and hold a strong evolutionary signal that can be used to phylogenetically characterize a set of sequences, such as a metagenomics sample. As signature genes are based on gene content, they provide a means to assess the taxonomic origin of a sequence sample that is complementary to sequence-based analyses. Here, we introduce Signature (http://www.cmbi.ru.nl/signature), a web server that identifies the signature genes in a set of query sequences, and therewith phylogenetically characterizes it. The server produces a list of taxonomic clades that share signature genes with the set of query sequences, along with an insightful image of the tree of life, in which the clades are color coded based on the number of signature genes present. This allows the user to quickly see from which part(s) of the taxonomy the query sequences likely originate.

Signature genes are genes with a common ancestor that are specific for a clade in the Tree of Life, and can be used to address phylogenetic or functional questions. Signature allows you to find out whether your sequence is a signature for any clade, and places the signature OGs in the context of the tree of life. Your initial input is first assigned to orthologous groups (OGs), or you can choose to skip the OG assignment step and enter OG identifiers directly. The distribution of these OGs is assessed in a default or custom tree of life, and finally Signature outputs all the clades that share signature OGs with your query.

The Bioscience Technology article Characterizing The Tree Of Life includes an interview with me about Signature.

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", talk at Benelux Bioinformatics Conference 2008, Maastricht, The Netherlands.

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", poster at ISMB/ECCB 2009, Stockholm, Sweden.

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", poster at Society for Bioinformatics in Northern Europe Conference 2008, Warszawa, Poland.
Selected for presentation.

Conservation of divergent transcription in fungi

Philip R. Kensche, Martin Oti, Bas E. Dutilh and Martijn A. Huynen (2008), "Conservation of divergent transcription in fungi", Trends in Genetics 24: 207-211. PubMed, PDF.

The comparison of fully sequenced genomes enables the study of selective constraints that determine genome organisation. We show that, in fungi, adjacent divergently transcribed (<-->) genes are more conserved in orientation than convergent (-><-) or co-oriented (->->) gene pairs. Furthermore, the length of time over which the divergent orientation of two genes is conserved correlates with the degree of their co-expression and with the likelihood of them being functionally related. The functional interactions of the proteins encoded by the conserved divergent gene pairs indicate a potential for protein function prediction in eukaryotes.
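The three orientation classes can be made concrete with a small sketch that labels neighbouring genes from their strands. The gene names, coordinates and the tuple layout are hypothetical; the conservation analysis across fungal genomes is not reproduced here.

```python
def orientation(left_strand, right_strand):
    """Classify two neighbouring genes, given in chromosomal order."""
    if left_strand == right_strand:
        return "co-oriented"
    if (left_strand, right_strand) == ("-", "+"):
        return "divergent"   # transcribed away from each other
    return "convergent"      # transcribed towards each other


def adjacent_pairs(genes):
    """Label every pair of adjacent genes on one chromosome.

    genes is a list of (name, start, strand) tuples in any order.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    return [
        (left[0], right[0], orientation(left[2], right[2]))
        for left, right in zip(ordered, ordered[1:])
    ]


# Hypothetical annotation of four genes on one chromosome.
genes = [
    ("geneC", 5200, "-"),
    ("geneA", 100, "-"),
    ("geneB", 1500, "+"),
    ("geneD", 9000, "-"),
]
for pair in adjacent_pairs(genes):
    print(pair)
```

Divergent pairs share the region upstream of both genes, which is where a shared promoter would lie, so this class is the natural candidate for co-regulation.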

Philip R. Kensche, Martin Oti, Bas E. Dutilh and Martijn A. Huynen (2008), "Conservation of Divergent Transcription in Fungi", poster at Society for Bioinformatics in Northern Europe Conference 2008, Warszawa, Poland.
Selected for presentation.

Philip R. Kensche, Martin Oti, Bas E. Dutilh and Martijn A. Huynen (2007), "Conservation of Gene Orientation in Fungi", poster at ESF-EMBO Symposium "Comparative Genomics of Eukaryotic Microorganisms: Eukaryotic Genome Evolution", Sant Feliu de Guixols, Spain.

Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution

Philip R. Kensche, Vera van Noort, Bas E. Dutilh and Martijn A. Huynen (2008), "Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution", Journal of the Royal Society Interface 5: 151-170. PubMed, PDF.

The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a number of proteins, including ones of medical relevance, have thus been predicted and subsequently confirmed experimentally. Additionally, various approaches to measure the similarity of phylogenetic profiles and to account for the phylogenetic bias in the data have been proposed. We review the successful applications of phylogenetic profiling and analyse the performance of various profile similarity measures with a set of one microsporidial and 25 fungal genomes. In the fungi, phylogenetic profiling yields high-confidence predictions for the highest and only the highest scoring gene pairs illustrating both the power and the limitations of the approach. Both practical examples and theoretical considerations suggest that in order to get a reliable and specific picture of a protein's function, results from phylogenetic profiling have to be combined with other sources of evidence.
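As a rough illustration of phylogenetic profiling, the sketch below compares binary presence/absence profiles and ranks candidate partners for a query gene. The Jaccard index and Hamming distance are chosen here only as simple examples of profile similarity measures; the abstract does not name the measures that were evaluated, and the gene names and profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Shared presences divided by genomes where either gene is present."""
    both = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    return both / either if either else 0.0


def hamming(profile_a, profile_b):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def rank_partners(query, profiles):
    """Sort all other genes by decreasing Jaccard similarity to the query."""
    scored = [
        (jaccard(profiles[query], profile), name)
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scored, reverse=True)


# Hypothetical profiles over six genomes (1 = gene present).
profiles = {
    "geneX": [1, 1, 0, 0, 1, 0],
    "geneY": [1, 1, 0, 0, 1, 0],
    "geneZ": [1, 0, 0, 0, 1, 1],
    "geneW": [0, 0, 1, 1, 0, 1],
}
print(rank_partners("geneX", profiles))
```

Neither measure corrects for the phylogenetic relatedness of the genomes, which is the bias the abstract mentions; closely related genomes would inflate the score of any pair of widely conserved genes.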

Extracting the evolutionary signal from genomes

Bas E. Dutilh (October 15th 2007), "Extracting the evolutionary signal from genomes", Ph.D. thesis. PDF. Email me to receive a printed copy.

Several methods that depend on the availability of complete genomes are developed to analyze aspects of evolution. While I consistently find a phylogenetic signal using many approaches, a question of growing concern is how these evolutionary relationships should be interpreted. Since Darwin's idea about the tree-like structure of evolution, the dogma has been that evolution is mainly a vertical process, but recently, Doolittle pointed out that especially for prokaryotes, a tree may be insufficient to capture the complex evolutionary paths leading to the current-day genomes. While a tree may fall short as a representation of the evolutionary relationships between genomes, I think that describing a species as its entire genome blurs your vision. To characterize a species, I would look at its core, and disregard the noisy genes that obscure its evolutionary history (chapters "Genome trees and the nature of genome evolution", "The Consistent Phylogenetic Signal in Genome Trees Revealed by Reducing the Impact of Noise" and "Assessment of phylogenomic and orthology approaches for phylogenetic inference"). To identify these cores at many different levels throughout the tree of life, I use the hundreds of complete genomes that have become available. In the chapter "Signature genes as a phylogenomic tool", I find signature genes for every clade, and show that these can be used for the taxonomic characterization of a sequenced sample, for example an environmental sample. Another type of data that has become available on a large scale is gene expression data. To be able to compare the functional context of genes in distantly related species, we developed the expression context, which relies on the completeness of the genome sequences and on the availability of genome-wide expression experiments (chapter "A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation").

Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate

Nicole A. Datson, Maarten C. Morsink, Srebrena Atanasova, Victor W. Armstrong, Hans Zischler, Christina Schlumbohm, Bas E. Dutilh, Martijn A. Huynen, Brigitte Waegele, Andreas Ruepp, E. Ronald de Kloet and Eberhard Fuchs (2007), "Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate" BMC Genomics 8: 190. PubMed, PDF.

Background The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is becoming increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest for the marmoset as an animal model, the molecular tools for genetic analysis are extremely limited. Results Here we report the development of the first marmoset-specific oligonucleotide microarray (EUMAMA) containing probe sets targeting 1541 different marmoset transcripts expressed in hippocampus. These 1541 transcripts represent a wide variety of different functional gene classes. Hybridisation of the marmoset microarray with labelled RNA from hippocampus, cortex and a panel of 7 different peripheral tissues resulted in high detection rates of 85% in the neuronal tissues and on average 70% in the non-neuronal tissues. The expression profiles of the 2 neuronal tissues, hippocampus and cortex, were highly similar, as indicated by a correlation coefficient of 0.96. Several transcripts with a tissue-specific pattern of expression were identified. Besides the marmoset microarray we have generated 3215 ESTs derived from marmoset hippocampus, which have been annotated and submitted to GenBank [GenBank: EF214838 - EF215447, EH380242 - EH382846]. Conclusions We have generated the first marmoset-specific DNA microarray and demonstrated its use to characterise large-scale gene expression profiles of hippocampus but also of other neuronal and non-neuronal tissues. In addition, we have generated a large collection of ESTs of marmoset origin, which are now available in the public domain. These new tools will facilitate molecular genetic research into this non-human primate animal model.

Assessment of phylogenomic and orthology approaches for phylogenetic inference

Bas E. Dutilh, Vera van Noort, René T.J.M. van der Heijden, Teun Boekhout, Berend Snel and Martijn A. Huynen (2007), "Assessment of phylogenomic and orthology approaches for phylogenetic inference", Bioinformatics 23: 815-824. PubMed, PDF.

Motivation: Phylogenomics integrates the vast amount of phylogenetic information contained in complete genome sequences, and is rapidly becoming the standard for inferring reliable species phylogenies. There are however fundamental differences between the ways in which phylogenomic approaches like gene content, superalignment, superdistance and supertree integrate the phylogenetic information from separate orthologous groups. Furthermore, they all depend on the method by which the orthologous groups are initially determined. Here, we systematically compare these four phylogenomic approaches, in parallel with three approaches for large-scale orthology determination: pairwise orthology, cluster orthology and tree-based orthology. Results: Including various phylogenetic methods, we apply a total of 54 fully automated phylogenomic procedures to the Fungi, the eukaryotic clade with the largest number of sequenced genomes, for which we retrieved a gold standard phylogeny from the literature. Phylogenomic trees based on gene content show, relative to the other methods, a bias in the tree topology that parallels convergence in life style among the species compared, indicating convergence in gene content. Conclusions: Complete genomes are no guarantee of good, or even consistent phylogenies. However, the large amounts of data in genomes enable us to carefully select the data most suitable for phylogenomic inference. In terms of performance, the superalignment approach, combined with restrictive orthology, is the most successful in recovering a fungal phylogeny that agrees with current taxonomic views, and allows us to obtain a high resolution phylogeny. We provide solid support for what has grown to be common practice in phylogenomics during its advance in recent years.

Bas E. Dutilh, Vera van Noort, René T.J.M. van der Heijden, Teun Boekhout, Berend Snel and Martijn A. Huynen (2007), "Assessment of phylogenomic and orthology approaches for phylogenetic inference", talk at ESF-EMBO Symposium "Comparative Genomics of Eukaryotic Microorganisms: Eukaryotic Genome Evolution", Sant Feliu de Guixols, Spain.

Bas E. Dutilh and Martijn A. Huynen (2006), "Superalignment and supertree are the best phylogenomic approaches", talk at International Conference in Phylogenomics, Sainte Adèle, Quebec, Canada. In: Conference Program Phylogenomics Conference, p. 20.

Bas E. Dutilh, Vera van Noort, René T.J.M. van der Heijden, Martijn A. Huynen and Berend Snel (2006), "A comprehensive comparison of phylogenomics and orthology methods applied to the Fungi", poster at International Conference in Phylogenomics, Sainte Adèle, Quebec, Canada. In: Conference Program Phylogenomics Conference, p. 40.

Deciphering the evolution and metabolism of an anammox bacterium from a community genome

Marc Strous, Eric Pelletier, Sophie Mangenot, Thomas Rattei, Angelika Lehner, Michael W. Taylor, Matthias Horn, Holger Daims, Delphine Bartol-Mavel, Patrick Wincker, Valérie Barbe, Nuria Fonknechten, David Vallenet, Béatrice Segurens, Chantal Schenowitz-Truong, Claudine Médigue, Astrid Collingro, Berend Snel, Bas E. Dutilh, Huub J. M. Op den Camp, Chris van der Drift, Irina Cirpus, Katinka T. van de Pas-Schoonen, Harry R. Harhangi, Laura van Niftrik, Markus Schmid, Jan Keltjens, Jack van de Vossenberg, Boran Kartal, Harald Meier, Dmitrij Frishman, Martijn A. Huynen, Hans-Werner Mewes, Jean Weissenbach, Mike S. M. Jetten, Michael Wagner and Denis Le Paslier (2006), "Deciphering the evolution and metabolism of an anammox bacterium from a community genome", Nature 440: 790-794. PubMed, PDF, F1000 Exceptional.

Anaerobic ammonium oxidation (anammox) has become a main focus in oceanography and wastewater treatment. It is also the nitrogen cycle's major remaining biochemical enigma. Among its features, the occurrence of hydrazine as a free intermediate of catabolism, the biosynthesis of ladderane lipids and the role of cytoplasm differentiation are unique in biology. Here we use environmental genomics (the reconstruction of genomic data directly from the environment) to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community. The genome data illuminate the evolutionary history of the Planctomycetes and allow us to expose the genetic blueprint of the organism's special properties. Most significantly, we identified candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism, and discovered unexpected metabolic versatility.

Horizontal gene transfer from Bacteria to rumen Ciliates indicates adaptation to their anaerobic, carbohydrates-rich environment

Guenola Ricard, Neil R. McEwan, Bas E. Dutilh, Jean-Pierre Jouany, Didier Macheboeuf, Makoto Mitsumori, Freda M. McIntosh, Tadeusz Michalowski, Takafumi Nagamine, Nancy Nelson, Charles J. Newbold, Eli Nsabimana, Akio Takenaka, Nadine A. Thomas, Kazunari Ushida, Johannes H.P. Hackstein and Martijn A. Huynen (2006), "Horizontal gene transfer from Bacteria to rumen Ciliates indicates adaptation to their anaerobic, carbohydrates-rich environment", BMC Genomics 7: 22. PubMed, PDF, F1000 Recommended, ISI.

Background The horizontal transfer of expressed genes from Bacteria into Ciliates that live in close contact with each other in the rumen (the foregut of ruminants) was studied using ciliate Expressed Sequence Tags (ESTs). About 4000 ESTs were sequenced from the two main types of rumen Ciliates: Entodiniomorphs (Entodinium simplex, Entodinium caudatum, Eudiplodinium maggii, Metadinium medium, Diploplastron affine, Polyplastron multivesiculatum and Epidinium ecaudatum) and Vestibuliferida, previously called Holotrichs (Isotricha prostoma, Isotricha intestinalis and Dasytricha ruminantium). Results A comparison of the sequences with the completely sequenced genomes of Eukaryotes and Prokaryotes, followed by large scale construction and analysis of phylogenies, identified 148 ciliate genes that specifically cluster with genes from the Bacteria. Of these genes, 34 cluster with genes from the Firmicutes, a phylum of Bacteria that is well represented in the rumen. The phylogenetic clustering with bacterial genes, coupled with the absence of close relatives of these genes in the Ciliate Tetrahymena thermophila, indicates that they have recently been acquired via Horizontal Gene Transfer (HGT). Conclusions Among the HGT candidates, we found an overrepresentation (>75%) of genes involved in metabolism, specifically in the catabolism of complex carbohydrates (>45%), a rich food source in the rumen. We propose that the acquisition of these genes has facilitated the Ciliates' colonization of the rumen and provides evidence for the role of HGT in the adaptation to new niches.

A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation

Bas E. Dutilh, Martijn A. Huynen and Berend Snel (2006), "A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation", BMC Genomics 7: 10 (highly accessed). PubMed, PDF, ISI.

Background The massive scale of microarray derived gene expression data allows for a global view of cellular function. Thus far, comparative studies of gene expression between species have been based on the level of expression of the gene across corresponding tissues, or on the co-expression of the gene with another gene. Results To compare gene expression between distant species on a global scale, we introduce the "expression context". The expression context of a gene is based on the co-expression with all other genes that have unambiguous counterparts in both genomes. Employing this new measure, we show 1) that the expression context is largely conserved between orthologs, and 2) that sequence identity shows little correlation with expression context conservation after gene duplication and speciation. Conclusions This means that the degree of sequence identity has a limited predictive quality for differential expression context conservation between orthologs, and thus presumably also for other facets of gene function.
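The expression context can be sketched as follows: for each gene, compute its co-expression with a fixed set of reference genes that have one-to-one counterparts in the other species, then compare the two resulting vectors. This is a simplified reading of the definition above. Pearson correlation is assumed as the co-expression measure, and all gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def expression_context(gene, expression, reference_genes):
    """Co-expression of one gene with each reference gene, in fixed order."""
    return [pearson(expression[gene], expression[ref]) for ref in reference_genes]


def context_conservation(gene1, expr1, gene2, expr2, reference_pairs):
    """Correlate the contexts of two genes from different species.

    reference_pairs lists (species-1 gene, species-2 gene) counterparts,
    so both context vectors share the same axes.
    """
    refs1 = [a for a, _ in reference_pairs]
    refs2 = [b for _, b in reference_pairs]
    return pearson(expression_context(gene1, expr1, refs1),
                   expression_context(gene2, expr2, refs2))


# Hypothetical data; the two species have different numbers of conditions.
species1 = {"q": [2, 4, 6, 8], "r1": [1, 2, 3, 4],
            "r2": [4, 3, 2, 1], "r3": [1, 3, 2, 4]}
species2 = {"q_ortholog": [5, 6, 7], "q_divergent": [7, 6, 5],
            "s1": [1, 2, 3], "s2": [3, 2, 1], "s3": [1, 3, 2]}
reference_pairs = [("r1", "s1"), ("r2", "s2"), ("r3", "s3")]

conserved = context_conservation("q", species1, "q_ortholog", species2, reference_pairs)
diverged = context_conservation("q", species1, "q_divergent", species2, reference_pairs)
print(round(conserved, 3), round(diverged, 3))
```

Because the comparison runs over co-expression vectors rather than raw expression levels, the two species do not need matching tissues or conditions, which is what makes the measure usable between distant species.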

Genome trees and the nature of genome evolution

Berend Snel, Martijn A. Huynen and Bas E. Dutilh (2005), "Genome trees and the nature of genome evolution", Annual Review of Microbiology 59: 191-209. PubMed.

Genome trees are a means to capture the overwhelming amount of phylogenetic information that is present in genomes. Different formalisms have been introduced to reconstruct genome trees on the basis of various aspects of the genome. On the basis of these aspects, we separate genome trees into five classes: (a) alignment-free trees based on statistic properties of the genome, (b) gene content trees based on the presence and absence of genes, (c) trees based on chromosomal gene order, (d) trees based on average sequence similarity, and (e) phylogenomics-based genome trees. Despite their recent development, genome tree methods have already had some impact on the phylogenetic classification of bacterial species. However, their main impact so far has been on our understanding of the nature of genome evolution and the role of horizontal gene transfer therein. An ideal genome tree method should be capable of using all gene families, including those containing paralogs, in a phylogenomics framework capitalizing on existing methods in conventional phylogenetic reconstruction. We expect such sophisticated methods to help us resolve the branching order between the main bacterial phyla.

The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise

Bas E. Dutilh, Martijn A. Huynen, William J. Bruno and Berend Snel (2004), "The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise", Journal of Molecular Evolution 58: 527-539. PubMed, PDF, F1000 Must Read.

With the sequencing of complete genomes we have the most complete molecular data for the reconstruction of the phylogeny of life. For example, instead of using the sequence similarity between proteins, as is classically done for the reconstruction of phylogenies, we can now use the number of shared genes between genomes. The goal of this project is to construct phylogenies using these and other types of "complete genome data" in new ways, combining as much of the genomic information as possible. Aside from being interesting in themselves, these genome trees are of extreme importance to detect and filter out various types of phylogenetic bias in genomic data sets, and therewith improve our methods for the prediction of protein interactions.

Bas E. Dutilh, Martijn A. Huynen, William J. Bruno and Berend Snel (2003), "The consistent signal in genome trees revealed by reducing the impact of noise", poster at ECCB, Paris, France.

Bas E. Dutilh, Martijn A. Huynen, William J. Bruno and Berend Snel (2003), "The consistent signal in genome trees revealed by reducing the impact of noise", poster at 6th Annual Conference on Computational Genomics, Boston, USA.

Decline in excision circles requires homeostatic renewal or homeostatic death of naive T cells

Bas E. Dutilh and Rob J. de Boer (2003), "Decline in excision circles requires homeostatic renewal or homeostatic death of naive T cells", Journal of Theoretical Biology 224: 351-358. PubMed, PDF.

When the TCR is formed in the thymus, fragments of DNA are excised from the T cell progenitor chromosome. These TCR rearrangement excision circles (TRECs) are stable, are not replicated in cell division and are therefore most frequent in naive T cells that have recently left the thymus. During life, the average TREC content of peripheral naive T cells decreases between one and two orders of magnitude in humans. It is generally believed that the age-dependent decrease in the production of naive T cells by the thymus is sufficient to explain the decrease in the TREC content. Here, we demonstrate that this decrease in thymic production is required, but it is not sufficient to explain the TREC data. Only if the decrease in thymic output is compensated by homeostasis can one explain the decrease in the TREC content. The homeostatic response can take two forms: when the total number of naive T cells declines, there could be an increase in the renewal rate or an increase of the average cellular lifespan.
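The argument can be explored numerically with a small simulation. The two-equation model below is an assumption made for illustration and may differ from the equations in the paper: naive T cells N are produced by the thymus at rate sigma(t), divide at rate rho and die at rate delta, while the total number of TRECs T is fed only by thymic output (c TRECs per emigrant) and lost by cell death. All parameter values, the exponential decline of thymic output and the form of the density-dependent renewal function are hypothetical, and only the homeostatic renewal scenario is shown.

```python
import math


def simulate(renewal_rate, years=80.0, burn_in=50.0, dt=0.01,
             sigma0=1.0, decline=0.05, death=1.0, c=1.0):
    """Euler integration of dN/dt = sigma + rho*N - death*N and
    dT/dt = c*sigma - death*T, with sigma = sigma0*exp(-decline*age).

    A burn-in with constant thymic output brings the system to steady
    state first. Returns the average TREC content T/N at age 0 and at
    the final age.
    """
    n, trecs = 1.0, c
    steps_burn = round(burn_in / dt)
    steps_life = round(years / dt)
    a_start = None
    for step in range(steps_burn + steps_life):
        age = (step - steps_burn) * dt
        if step == steps_burn:
            a_start = trecs / n
        sigma = sigma0 * math.exp(-decline * max(age, 0.0))
        rho = renewal_rate(n)
        n, trecs = (n + (sigma + rho * n - death * n) * dt,
                    trecs + (c * sigma - death * trecs) * dt)
    return a_start, trecs / n


def constant_renewal(n):
    """No homeostasis: the division rate ignores the pool size."""
    return 0.4


def homeostatic_renewal(n):
    """Homeostasis: cells divide faster when the pool shrinks."""
    return 1.0 / (1.0 + n)


for label, renewal in [("no homeostasis", constant_renewal),
                       ("homeostatic renewal", homeostatic_renewal)]:
    start, end = simulate(renewal)
    print(label, round(end / start, 3))
```

Under these assumptions the average TREC content changes little over 80 years when the renewal rate is constant, even though thymic output falls more than 50-fold, whereas it drops several-fold when renewal increases as the naive pool shrinks. The size of the drop depends on the illustrative parameters and is not fitted to data.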

Rob J. de Boer, Bas E. Dutilh, Mette D. Hazenberg and Frank M. Miedema (2000), "Mathematical models are required for the interpretation of T cell receptor excision circle data", poster at Joint Annual Meeting of Immunology of DGfI and NVvI, Düsseldorf, Germany. Abstract.

Bas E. Dutilh, Rob J. de Boer, Mette D. Hazenberg and Frank M. Miedema (2000), "Decline in excision circles proves homeostasis of naive T-cells", poster at Joint Annual Meeting of Immunology of DGfI and NVvI, Düsseldorf, Germany.

Reconstruction of Pyrococcus central carbohydrate metabolism

Bas E. Dutilh (2002), "Pyrobase: integrated database of Pyrococcus genes": www.cmbi.ru.nl/pyrobase.

With the sequencing of complete genomes we can predict the metabolic pathways in a species. Some of these species are of particular importance, either for medical or for industrial reasons. Pyrococcus is a genus that belongs to the Archaea, one of the three branches in the evolution of cellular life, and the one about which the least is known. Pyrococcus is a hyperthermophile that lives at about 90°C, and has a large reductive potential. This makes the organism interesting for the production of certain fine chemicals (alcohols and aldehydes) from carboxylic acids.
The goal of this project is to detect the enzymes that are involved in the carbohydrate metabolism (e.g. glycolysis, citric acid cycle, fatty acid metabolism) of the Pyrococci. Some of them have already been annotated in the genome, but with the methods that are being developed in our group we have proposed new candidates. In this partly EU-funded project, we work in collaboration with the Bacterial Genetics Group of John van der Oost at Wageningen University, where specific predictions can be tested.

Pyrobase is the integrated database of Pyrococcus genes made as part of this project. The genomes of Pyrococcus abyssi, Pyrococcus furiosus and Pyrococcus horikoshii, the only three Pyrococcus species with completely sequenced genomes, were screened for Pfam protein families. The genes were assigned to clusters of orthologous groups of proteins from the COG database, and genomically linked orthologous groups were identified with STRING. Buttons are included to submit a protein sequence directly to a STRING genomic context search or a SMART domain architecture search.

Augustinus R. Uria, Ronnie Machielsen, Bas E. Dutilh, Martijn A. Huynen and John van der Oost (2006), "Alcohol dehydrogenases from marine hyperthermophilic microorganisms and their importance to the pharmaceutical industry", presented at "International seminar and workshop on marine biodiversity and their potential for developing bio-pharmaceutical industry", May 2006, Jakarta. PDF.

M.P. Machielsen, Corné H. Verhees, Bas E. Dutilh, Martijn A. Huynen, Willem M. de Vos and John van der Oost (2002), "Distribution of alcohol dehydrogenases in Pyrococcus furiosus", poster at Extremophiles 2002, Napoli, Italy. In: Extremophiles 2002. Proceedings of the 4th international congress on extremophiles (Rossi, M., Bartolucci, S., Ciaramella, M. and Moracci, M., Eds.), p. 229. Napoli.
Computational genomics for protein function and pathway prediction

Berend Snel, Toni Gabaldón, Vera van Noort, Bas E. Dutilh and Martijn A. Huynen (2002), "Computational genomics for protein function and pathway prediction", poster at KNCV Symposium "Bioinformatics, the best of both worlds", Wageningen, The Netherlands.

Initiatives on sustainable development in the food sector worldwide

Bas E. Dutilh, Chris E. Dutilh and Willem H.M.M. van Laarhoven (2001), "Initiatives on Sustainable Development in the Food Sector Worldwide", Foundation for Sustainability in the Food Chain (DuVo). PDF.

In 1995, fifteen companies active in the food chain in the Netherlands initiated the Foundation for a Sustainable Food Chain (DuVo). The first projects carried out by DuVo were related to the identification of major environmental impacts in the food chain. Subsequently, the focus changed to the identification of options for improvement along the production chain and to the development of an infrastructure that could contain and provide such information. In 1998, DuVo formulated a new strategy, which is composed of the following elements:

  • A dialogue with relevant stakeholders, aimed at establishing a common definition for the concept 'sustainable food chain'. In that process, measurable criteria can be developed to manage and monitor an improvement process;
  • Development of knowledge, aimed at providing factual information which can improve the content of the dialogue;
  • Open exchange of knowledge to enable as many parties as possible to share the insights which have been acquired.
Since 1999, DuVo has organised an annual Dialogue Meeting, bringing together a broad range of stakeholders to inspire one another and exchange ideas. Also since 1999, it has issued a yearly booklet reporting on its activities: "Sustainability in the Food Chain" (1999), "Beginning of a Dialogue" (2000), and "Sustainability in Perspective" (2001). An English translation of the summary of each booklet has been made.

DuVo realised that its initiatives might inspire others, and thus hopes to inform a wider international audience about its activities. For the same reason, DuVo decided to investigate whether similar initiatives exist elsewhere in the world. This report is the outcome of that investigation.
Gene networks from microarray data

Bas E. Dutilh and Paulien Hogeweg (1999), "Gene Networks from Microarray Data", report Binf.1999.11.01, Bioinformatics, Utrecht University: www.cmbi.ru.nl/~dutilh/genenets. Thesis site, PDF.

Since the development of the microarray technique in 1995, there has been an enormous increase in gene expression data from several organisms. Based on the view of gene systems as a logical network of nodes that influence each other's expression levels, scientists dream of being able to reconstruct the precise gene interaction network from the expression data obtained with this large-scale arraying technique. Computer science shows that inference of a logical regulatory network is possible solely from sets of expression data, and mathematicians are working on the question of how much data is minimally required for reverse engineering.

Meanwhile, experimental biologists are experiencing problems in the field. The number of experiments needed before network reconstruction can be attempted is far greater than is generally feasible in "wet" laboratories, so data compression algorithms are applied to reduce the number of nodes considered. This is, however, an extremely coarse representation of the intricate interconnections that exist between single genes. The resulting network of only a handful of nodes is therefore usually only sufficient to describe the experiments performed, and lacks any predictive power.

In this literature thesis, I attempt to give an update on the state of the art in computerised network reconstruction techniques, and explicitly relate this to actual biological gene networks. I will go into the model formalisms used to describe genetic networks, and explain their specific advantages and disadvantages. A separate chapter is dedicated to several experimental results obtained in the research of genetic networks, and finally, a short discussion and some hypothesising are added.

Evolution of viral strain structure through host immune response

Bas E. Dutilh and Paulien Hogeweg (1999), "Evolution of viral strain structure through host immune response", talk at TMBM'99, Amsterdam, The Netherlands. In: abstracts of TMBM'99, Amsterdam: 160-161. PDF.

Recently, it was shown that host immune responses can form a strong selective pressure on the antigenic strain structure of pathogen populations. In an ODE model described by Gupta et al. [1], the evolutionary dynamics of infective agents (each viral strain is defined as a specific combination of several alleles at a number of epitopes) can lead to discrete strain structures, called discordant sets. Discordant sets consist of viruses which have no antigens in common, and together fill up the complete antigen space with their genotypes. These sets of infective agents inhibit the spreading of related pathogens in a host population, by making the hosts resistant to all antigens in the world.
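
The definition of a discordant set can be written as a short check. The sketch below is one reading of the description above, not code from the study: a strain is assumed to be a tuple with one allele per epitope locus, "no antigens in common" is taken to mean that no two strains share an allele at any locus, and "filling the antigen space" is taken to mean that every allele at every locus occurs in some strain. The function name and the toy alleles are hypothetical.

```python
from itertools import combinations


def is_discordant_set(strains, alleles_per_locus):
    """Check whether a collection of strains forms a discordant set.

    strains: list of tuples, one allele per epitope locus (assumed encoding).
    alleles_per_locus: list of the alleles that exist at each locus.
    """
    # Condition 1: no two strains share an allele at any locus.
    for first, second in combinations(strains, 2):
        if any(a == b for a, b in zip(first, second)):
            return False
    # Condition 2: together the strains use every allele at every locus.
    for locus, alleles in enumerate(alleles_per_locus):
        if {strain[locus] for strain in strains} != set(alleles):
            return False
    return True


# Hypothetical toy system: two loci with two alleles each.
toy_alleles = [("a", "b"), ("x", "y")]
discordant_example = is_discordant_set([("a", "x"), ("b", "y")], toy_alleles)
overlapping_example = is_discordant_set([("a", "x"), ("a", "y")], toy_alleles)
```

In the toy system, the strains (a, x) and (b, y) form a discordant set, whereas (a, x) and (a, y) do not, because they share allele a at the first locus.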

We further examine potential emergent strain structures due to host immune responses, in a spatially explicit model including a population of immunologically reactive hosts. The hosts, which are individually implemented in a cellular automata machine, can each carry their own virus, and are each resistant to a specific combination of antigens. Thus, we are able to study many different assumptions on immune systems in one simple model. In the present study, for example, a host that is infected with several viruses can collect resistances against all their antigens. When it encounters a new infection, it mounts an immunity that is proportional to the number of the infectious agent's antigens to which it has acquired immunity.
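
The host immunity rule described above can be sketched as follows. This is an illustration under assumptions, not the implementation used in the study: a strain is assumed to be a tuple with one allele per epitope locus, a host's memory is a set of (locus, allele) pairs, and "proportional" is implemented as the fraction of the incoming strain's antigens that the host already recognises. The function names are hypothetical.

```python
def record_infection(host_memory, strain):
    """Return the host memory extended with all antigens of the given strain."""
    return host_memory | set(enumerate(strain))


def immunity_against(host_memory, strain):
    """Fraction of the strain's antigens the host is already immune to."""
    antigens = set(enumerate(strain))
    return len(antigens & host_memory) / len(antigens)


# Hypothetical example: a naive host is infected by strain (a, x).
memory = record_infection(frozenset(), ("a", "x"))
partial_immunity = immunity_against(memory, ("a", "y"))
no_immunity = immunity_against(memory, ("b", "y"))
```

Under this rule, a strain that shares no antigens with any previous infection meets no immunity, which is consistent with the selection for minimal antigenic overlap described in the abstract.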

Surprisingly, we discovered that spatial pattern formation in the cellular automata machine is necessary for generation of discordant strain structure such as that found in the ODE model of Gupta et al. [1]. In the case of a mean field approximation (randomly reshuffling the cellular automata every timestep), agglomerative clustering techniques reveal strain structure in larger sets of viruses. There is a clear selection for minimization of the encountered immunities, and for each virus, this is optimized in a set with symmetric and minimum amounts of antigenic overlaps. Though these conditions are satisfied in any collection of discordant sets (including the complete viral population), the cofluctuating sets never contain discordants. The observed strain structures allow for larger virus populations, causing more infections during the lifetime of a host than an equal number of viruses organized in discordant sets would.

We conclude that host immune response can structure viral populations into discrete sets, which are not invadable by new mutants. Moreover, we see that spatial pattern formation, which leads to discordant sets of viruses, protects the hosts and reduces the numbers of viruses that survive.

[1] S. Gupta, N. Ferguson & R. Anderson: "Chaos, Persistence, and Evolution of Strain Structure in Antigenically Diverse Infectious Agents", Science, 280: 912-915, 1998.