2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 1999
Metagenomics and future perspectives in virus discovery

John L Mokili, Forest Rohwer and Bas E. Dutilh (2012), "Metagenomics and future perspectives in virus discovery", Current Opinion in Virology. 2: 1-15. DOI.

Monitoring the emergence and re-emergence of viral diseases with the goal of containing the spread of viral agents requires both adequate preparedness and quick response. Identifying the causative agent of a new epidemic is one of the most important steps for effective response to disease outbreaks. Traditionally, virus discovery required propagation of the virus in cell culture, a proven technique responsible for the identification of the vast majority of viruses known to date. However, many viruses cannot be easily propagated in cell culture, thus limiting our knowledge of viruses. Viral metagenomic analyses of environmental samples suggest that the field of virology has explored less than 1% of the extant viral diversity. In the last decade, the culture-independent and sequence-independent metagenomic approach has permitted the discovery of many viruses in a wide range of samples. Phylogenetically, some of these viruses are distantly related to previously discovered viruses. In addition, 60-99% of the sequences generated in different viral metagenomic studies are not homologous to known viruses. In this review, we discuss the advances in the area of viral metagenomics during the last decade and their relevance to virus discovery, clinical microbiology and public health. We discuss the potential of metagenomics for characterization of the normal viral population in a healthy community and identification of viruses that could pose a threat to humans through zoonosis. In addition, we propose a new model of Koch's postulates named the 'Metagenomic Koch's Postulates'. Unlike the original Koch's postulates and the Molecular Koch's postulates as formulated by Falkow, the metagenomic Koch's postulates focus on the identification of metagenomic traits in disease cases. These metagenomic traits can be traced after healthy individuals have been exposed to the source of the suspected pathogen.

Pyrosequencing of 16S rRNA gene amplicons to study the microbiota in the gastrointestinal tract of carp (Cyprinus carpio L.)

Maartje A.H.J. van Kessel, Bas E. Dutilh, Kornelia Neveling, Michael P. Kwint, Joris A. Veltman, Gert Flik, Mike S.M. Jetten, Peter H.M. Klaren and Huub J.M. Op den Camp (2011), "Pyrosequencing of 16S rRNA gene amplicons to study the microbiota in the gastrointestinal tract of carp (Cyprinus carpio L.)", AMB Express 1: 41. AMB, Pubmed, PDF.

The microbes in the gastrointestinal (GI) tract are of high importance for the health of the host. In this study, Roche 454 pyrosequencing was applied to a pooled set of different 16S rRNA gene amplicons obtained from GI content of common carp (Cyprinus carpio) to make an inventory of the diversity of the microbiota in the GI tract. Compared to other studies, our culture-independent investigation reveals an impressive diversity of the microbial flora of the carp GI tract. The major group of obtained sequences belonged to the phylum Fusobacteria. Bacteroidetes, Planctomycetes and Gammaproteobacteria were other well represented groups of micro-organisms. Verrucomicrobiae, Clostridia and Bacilli (the latter two belonging to the phylum Firmicutes) had fewer representatives among the analyzed sequences. Many of these bacteria might be of high physiological relevance for carp as these groups have been implicated in vitamin production, nitrogen cycling and (cellulose) fermentation.

Genome sequence of the human pathogen Vibrio cholerae Amazonia

Cristiane C. Thompson, Michel Abanto Marin, Graciela Maria Dias, Bas E. Dutilh, Rob Edwards, Tetsuya Iida, Fabiano L. Thompson (2011), "Genome sequence of the human pathogen Vibrio cholerae Amazonia", Journal of Bacteriology 193: 5877-5878. Pubmed, PDF.

Vibrio cholerae O1 Amazonia is a pathogen that was isolated from cases of cholera-like diarrhea in at least two countries, Brazil and Ghana. It has a distinct profile by MLSA. Genomic analysis revealed that it contains the Vibrio pathogenicity island-2 and a set of genes related to pathogenesis and fitness, such as the type VI secretion system, that are present in choleragenic V. cholerae strains.

FACIL: fast and accurate genetic code inference and logo

Bas E. Dutilh, Rasa Jurgelenaite, Radek Szklarczyk, Sacha A.F.T. van Hijum, Harry R. Harhangi, Markus Schmid, Bart de Wild, Kees-Jan Françoijs, Hendrik G. Stunnenberg, Marc Strous, Mike S.M. Jetten, Huub J.M. Op den Camp and Martijn A. Huynen (2011), "FACIL: fast and accurate genetic code inference and logo", Bioinformatics 27: 1929-1933. Pubmed, PDF.

Motivation: The intensification of environmental DNA sequencing will increasingly unveil uncharacterized species with potential alternative genetic codes. A total of 0.65% of the DNA sequences currently in Genbank encode their proteins with a variant genetic code, and these exceptions occur in many unrelated taxa. Results: We introduce FACIL, a fast and reliable tool to evaluate nucleic acid sequences for their genetic code that detects alternative codes even in species distantly related to known organisms. To illustrate this, we apply FACIL to a set of mitochondrial genomic contigs of Globobulimina pseudospinescens. This foraminifer does not have any sequenced close relatives in the databases, yet we infer its alternative genetic code with high confidence values. Results are intuitively visualized in a Genetic Code Logo. Availability and Implementation: FACIL is available as a web-based service at http://www.cmbi.ru.nl/FACIL/ and as a stand-alone program.

Bas E. Dutilh (2011), "FACIL: fast and accurate genetic code inference and logo", talk at San Diego Microbiology Group All Day Meeting 2011, San Diego, California, USA.

Bas E. Dutilh, Rasa Jurgelenaite, Radek Szklarczyk, Sacha A.F.T. van Hijum, Harry R. Harhangi, Markus Schmid, Bart de Wild, Kees-Jan Françoijs, Hendrik G. Stunnenberg, Marc Strous, Mike S.M. Jetten, Huub J.M. Op den Camp and Martijn A. Huynen (2011), "FACIL: fast and accurate genetic code inference and logo", poster at San Diego Microbiology Group All Day Meeting 2011, San Diego, California, USA.

Towards the human colorectal cancer microbiome

Julian R. Marchesi, Bas E. Dutilh, Neil Hall, Wilbert H. M. Peters, Rian Roelofs, Annemarie Boleij, Harold Tjalsma (2011), "Towards the Human Colorectal Cancer Microbiome", PLoS ONE 6: e20447. PLoS, Pubmed, PDF, F1000 Recommended.

Multiple factors drive the progression from healthy mucosa towards sporadic colorectal carcinomas and accumulating evidence associates intestinal bacteria with disease initiation and progression. Therefore, the aim of this study was to provide a first high-resolution map of colonic dysbiosis that is associated with human colorectal cancer (CRC). To this purpose, the microbiomes colonizing colon tumor tissue and adjacent non-malignant mucosa were compared by deep rRNA sequencing. The results revealed striking differences in microbial colonization patterns between these two sites. Although inter-individual colonization in CRC patients was variable, tumors consistently formed a niche for Coriobacteria and other proposed probiotic bacterial species, while potentially pathogenic Enterobacteria were underrepresented in tumor tissue. As the intestinal microbiota is generally stable during adult life, these findings suggest that CRC-associated physiological and metabolic changes recruit tumor-foraging commensal-like bacteria. These microbes thus have an apparent competitive advantage in the tumor microenvironment and thereby seem to replace pathogenic bacteria that may be implicated in CRC etiology. This first glimpse of the CRC microbiome provides an important step towards full understanding of the dynamic interplay between intestinal microbial ecology and sporadic CRC, which may provide important leads towards novel microbiome-related diagnostic tools and therapeutic interventions.

Ultra-deep pyrosequencing of pmoA amplicons confirms prevalence of Methylomonas and Methylocystis in Sphagnum mosses from a Dutch peat bog

Nardy Kip, Bas E. Dutilh, Yao Pan, Levente Bodrossy, Kornelia Neveling, Michael P. Kwint, Mike S.M. Jetten and Huub J.M. Op den Camp (2011), "Ultra-deep pyrosequencing of pmoA amplicons confirms prevalence of Methylomonas and Methylocystis in Sphagnum mosses from a Dutch peat bog", Environmental Microbiology Reports 3: no. doi: 10.1111/j.1758-2229.2011.00260.x. EMIR, PDF.

Sphagnum peatlands are important ecosystems in the methane cycle. Methanotrophs in these ecosystems have been shown to reduce methane emissions and provide additional carbon to Sphagnum mosses. However, little is known about the diversity and identity of the methanotrophs present in and on Sphagnum mosses in these peatlands. In this study, we applied a pmoA microarray and high-throughput 454 pyrosequencing to pmoA PCR products obtained from total DNA from Sphagnum mosses from a Dutch peat bog to investigate the presence of methanotrophs and to compare the two different methods. Both techniques showed comparable results and revealed an abundance of Methylomonas and Methylocystis species in the Sphagnum mosses. The advantage of the microarray analysis is that it is fast and cost-effective, especially when many samples have to be screened. Pyrosequencing is superior in providing pmoA sequences of many unknown or uncultivated methanotrophs present in the Sphagnum mosses and, thus, provided much more detailed and quantitative insight into the microbial diversity.

Mass spectrometry analysis of hepcidin peptides in experimental mouse models

Harold Tjalsma*, Coby M.M. Laarakkers*, Rachel P.S. van Swelm, Milan Theurl, Igor Theurl, Erwin H. Kemna, Yuri E.M. van der Burgt, Hanka Venselaar, Bas E. Dutilh, Frans G.M. Russel, Günter Weiss, Rosalinde Masereeuw, Robert E. Fleming, Dorine W. Swinkels (2011), "Mass Spectrometry Analysis of Hepcidin Peptides in Experimental Mouse Models", PLoS ONE 6: e16762. PLoS, Pubmed, PDF. *Authors contributed equally.

Background: The mouse is a valuable model for unravelling the role of hepcidin in iron homeostasis. Here, we aimed to assess mouse hepcidin-1 (Hep-1) and -2 (Hep-2) peptide levels in serum and urine by a novel mass spectrometry (MS)-based approach. Methods: We used time-of-flight (TOF) MS to determine Hep-1 and -2 levels and Fourier transform ion cyclotron resonance (FTICR) and tandem-MS for hepcidin identifications. The method was biologically validated by hepcidin assessment in: i) 3 mouse strains (C57Bl/6, DBA/2 and BALB/c) upon stimulation with intravenous iron and LPS, ii) homozygous Hfe knockout mice, homozygous transferrin receptor 2 (Y245X) mutated mice and double affected mice, and iii) mice treated with a sublethal hepatotoxic dose of paracetamol. Results: Hep-1 detection was restricted to serum, while Hep-2 was only found in urine and consisted of several isoforms. Elevations in serum Hep-1 and urine Hep-2 upon intravenous iron or LPS were only moderate and varied considerably between mouse strains. Serum Hep-1 was decreased in all three hemochromatosis models and lowest in the double affected mouse. Serum Hep-1 levels correlated with liver hepcidin-1 gene expression, while acute liver damage by paracetamol depleted Hep-1 from serum. Furthermore, serum Hep-1 appeared to be an excellent indicator of splenic iron accumulation. Conclusion: Hep-1 and Hep-2 peptide responses in experimental mouse models agree with the known biology of hepcidin mRNA regulators, and their measurement can now be implemented in iron-related experimental mouse models to provide novel insights into post-transcriptional regulation, hepcidin function, and kinetics.

The organellar genome and metabolic potential of the hydrogen-producing mitochondrion of Nyctotherus ovalis

Rob M. de Graaf*, Guenola Ricard*, Theo A. van Alen, Isabel Duarte, Bas E. Dutilh, Carola Burgtorf, Jan W.P. Kuiper, Georg W.M. van der Staay, Aloysius G.M. Tielens, Martijn A. Huynen and Johannes H.P. Hackstein (2011), "The organellar genome and metabolic potential of the hydrogen-producing mitochondrion of Nyctotherus ovalis", Molecular Biology and Evolution 28: 2379-2391. Pubmed, PDF. *Authors contributed equally.

It is generally accepted that hydrogenosomes (hydrogen-producing organelles) evolved from a mitochondrial ancestor. However, until recently, only indirect evidence for this hypothesis was available. Here we present the almost complete genome of the hydrogen-producing mitochondrion of the anaerobic ciliate Nyctotherus ovalis and show that, except for the notable absence of genes encoding electron-transport chain components of Complexes III, IV and V, it has a gene content similar to the mitochondrial genomes of aerobic ciliates. Analysis of the genome of the hydrogen-producing mitochondrion, in combination with that of more than 9,000 gDNA and cDNA sequences, allows a preliminary reconstruction of the organellar metabolism. The sequence data indicate that N. ovalis possesses hydrogen-producing mitochondria that have a truncated, two-step (Complex I and II) electron-transport chain that uses fumarate as electron acceptor. In addition, components of an extensive protein network for the metabolism of amino acids, defense against oxidative stress, mitochondrial protein synthesis, mitochondrial protein import and processing, and transport of metabolites across the mitochondrial membrane were identified. Genes for MPV17 and ACN9, two hypothetical proteins linked to mitochondrial disease in humans, were also found. The inferred metabolism is remarkably similar to the organellar metabolism of the phylogenetically distant anaerobic Stramenopile Blastocystis. Notably, the Blastocystis organelle and that of the related flagellate Proteromonas lacertae also lack genes encoding components of Complexes III, IV and V. Thus, our data show that the hydrogenosomes of N. ovalis are highly specialized, hydrogen-producing mitochondria.

Genome wide screening in human growth plates during puberty in one patient suggests a role for RUNX2 in epiphyseal maturation

Joyce Emons, Bas E. Dutilh, Eva Decker, Heide Pirzer, Carsten Sticht, Norbert Gretz, Gudrun Rappold, Ewen R. Cameron, James C. Neil, Gary S. Stein, Andre J. van Wijnen, Jan Maarten Wit, Janine N. Post, Marcel Karperien (2011), "Genome wide screening in human growth plates during puberty in one patient suggests a role for RUNX2 in epiphyseal maturation", Journal of Endocrinology 209: 245-254. Pubmed, PDF.

In late puberty, estrogen decelerates bone growth by stimulating growth plate maturation. Here, we studied the mechanism of estrogen action using two pubertal growth plate specimens of one girl at Tanner stage B2 and Tanner stage B3. Histological analysis showed that progression of puberty coincided with characteristic morphological changes: a decrease in total growth plate height (p=0.002) and in the height of the individual zones (p<0.001), and an increase in intercolumnar space (p<0.001). Microarray analysis of the specimens identified 394 genes (72% upregulated, 28% downregulated) that changed with the progression of puberty. Overall changes in gene expression were small (on average 1.38-fold for upregulated and 1.36-fold for downregulated genes). The 394 genes mapped to 13 significantly changing pathways (p<0.05) associated with growth plate maturation (e.g., extracellular matrix, cell cycle and cell death). We next scanned the upstream promoter regions of the 394 genes for the presence of evolutionarily conserved binding sites for transcription factors implicated in growth plate maturation such as Estrogen Receptor, Androgen Receptor, Elk1, Stat5b, CREB and RUNX2. High-quality motif sites for RUNX2 (87 genes), Elk1 (43 genes) and Stat5b (31 genes), but not estrogen receptor, were evolutionarily conserved, indicating their functional relevance across primates. Moreover, ChIP assays show that some of the genes containing these sites are direct targets of these transcription factors.

Genome-wide profiling of p63 DNA-binding sites identifies an element that regulates gene expression during limb development in the 7q21 SHFM1 locus

Evelyn N. Kouwenhoven*, Simon J. van Heeringen*, Juan J. Tena*, Martin Oti, Bas E. Dutilh, M. Eva Alonso, Elisa de la Calle-Mustienes, Leonie Smeenk, Tuula Rinne, Lilian Parsaulian, Emine Bolat, Rasa Jurgelenaite, Martijn A. Huynen, Alexander Hoischen, Joris A. Veltman, Han G. Brunner, Tony Roscioli, Emily Oates, Meredith Wilson, Miguel Manzanares, José Luis Gómez-Skarmeta, Hendrik G. Stunnenberg, Marion Lohrum, Hans van Bokhoven and Huiqing Zhou (2010), "Genome-wide profiling of p63 DNA-binding sites identifies an element that regulates gene expression during limb development in the 7q21 SHFM1 locus", PLoS Genetics 6: e1001065. PLoS, Pubmed, PDF. *Authors contributed equally.

Heterozygous mutations in p63 are associated with split hand/foot malformations (SHFM), orofacial clefting and ectodermal abnormalities. Elucidation of the p63 gene network that includes target genes and regulatory elements may reveal new genes for other malformation disorders. We performed genome-wide DNA-binding profiling by chromatin immunoprecipitation (ChIP) followed by deep sequencing (ChIP-seq) in primary human keratinocytes, and identified potential target genes and regulatory elements controlled by p63. We show that p63 binds to an enhancer element in the SHFM1 locus on chromosome 7q and that this element controls expression of DLX6 and possibly DLX5, both of which are important for limb development. A unique microdeletion including this enhancer element but not the DLX5/DLX6 genes was identified in a patient with SHFM. Our study strongly indicates disruption of a non-coding cis-regulatory element located more than 250 kb from the DLX5/DLX6 genes as a novel disease mechanism in SHFM1. These data provide a proof-of-concept that the catalogue of p63 binding sites identified in this study may be of relevance to the studies of SHFM and other congenital malformations that resemble the p63-associated phenotypes.

Deconstructing the super-organism

Bas E. Dutilh (2010). "Deconstructing the super-organism: detecting metabolic differentiation by compartmentalizing metagenomes", Veni award, NWO.

This award from the Netherlands Organisation for Scientific Research (NWO) enables me to do 3 years of independent research. I will interpret the functionality of metagenomes at the level of individual micro-organisms. The award has been highlighted by several news sources, including Gezondheidskrant.nl and Medicalfacts.nl.

Nitrite-driven anaerobic methane oxidation by oxygenic bacteria

Katharina F. Ettwig*, Margaret K. Butler*, Denis Le Paslier, Eric Pelletier, Sophie Mangenot, Marcel M.M. Kuypers, Frank Schreiber, Bas E. Dutilh, Johannes Zedelius, Dirk de Beer, Jolein Gloerich, Hans J.C.T. Wessels, Theo A. van Alen, Francisca Luesken, Ming L. Wu, Katinka T. van de Pas-Schoonen, Huub J.M. Op den Camp, Eva M. Janssen-Megens, Kees-Jan Francoijs, Henk Stunnenberg, Jean Weissenbach, Mike S.M. Jetten and Marc Strous (2010), "Nitrite-driven anaerobic methane oxidation by oxygenic bacteria", Nature 464: 543-548. Pubmed, PDF, F1000 Exceptional. *Authors contributed equally.

Only three biological pathways are known to produce oxygen: photosynthesis, chlorate respiration and the detoxification of reactive oxygen species. Here we present evidence for a fourth pathway, possibly of considerable geochemical and evolutionary importance. The pathway was discovered after metagenomic sequencing of an enrichment culture that couples anaerobic oxidation of methane with the reduction of nitrite to dinitrogen. The complete genome of the dominant bacterium, named 'Candidatus Methylomirabilis oxyfera', was assembled. This apparently anaerobic, denitrifying bacterium encoded, transcribed and expressed the well-established aerobic pathway for methane oxidation, whereas it lacked known genes for dinitrogen production. Subsequent isotopic labelling indicated that 'M. oxyfera' bypassed the denitrification intermediate nitrous oxide by the conversion of two nitric oxide molecules to dinitrogen and oxygen, which was used to oxidize methane. These results extend our understanding of hydrocarbon degradation under anoxic conditions and explain the biochemical mechanism of a poorly understood freshwater methane sink. Because nitrogen oxides were already present on early Earth, our finding opens up the possibility that oxygen was available to microbial metabolism before the evolution of oxygenic photosynthesis.

The mitochondrial genomes of the ciliates Euplotes minuta and Euplotes crassus

Rob M. de Graaf, Theo A. van Alen, Bas E. Dutilh, Jan W.P. Kuiper, Hanneke J.A.A. van Zoggel, Minh Bao Huynh, Hans-Dieter Görtz, Martijn A. Huynen and Johannes H.P. Hackstein (2009), "The mitochondrial genomes of the ciliates Euplotes minuta and Euplotes crassus", BMC Genomics 10: 514. BMC, PDF, Pubmed.

Background: There are thousands of very diverse ciliate species, of which only a handful of mitochondrial genomes have been studied so far. These genomes are rather similar because the ciliates analysed (Tetrahymena spp. and Paramecium aurelia) are closely related. Here we study the mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus. These ciliates are only distantly related to Tetrahymena spp. and Paramecium aurelia, but more closely related to Nyctotherus ovalis, which possesses a hydrogenosomal (mitochondrial) genome. Results: The linear mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus were sequenced and compared with the mitochondrial genomes of several Tetrahymena species, Paramecium aurelia and the partially sequenced mitochondrial genome of the anaerobic ciliate Nyctotherus ovalis. This study reports new features such as long 5' gene extensions of several mitochondrial genes, extremely long cox1 and cox2 open reading frames and a large repeat in the middle of the linear mitochondrial genome. The repeat separates the open reading frames into two blocks, each having a single direction of transcription, from the repeat towards the ends of the chromosome. Although the Euplotes mitochondrial gene content is almost identical to that of Paramecium and Tetrahymena, the order of the genes is completely different. In contrast, the 33273 bp (excluding the repeat region) piece of the mitochondrial genome that has been sequenced in both Euplotes species exhibits no difference in gene order. Unexpectedly, many of the mitochondrial genes of E. minuta encoding ribosomal proteins possess N-terminal extensions that are similar to mitochondrial targeting signals. Conclusions: The mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus are rather different from the previously studied genomes. Many genes are extended in size compared to mitochondrial genes from other sources.

Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly

Bas E. Dutilh, Martijn A. Huynen and Marc Strous (2009), "Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly", Bioinformatics 25: 2878-2881, PDF, Pubmed.

Motivation: Most microbial species cannot be cultured in the lab. Metagenomic sequencing may still yield a complete genome if the sequenced community is enriched and the sequencing coverage is high. However, the complexity in a natural population may cause the enrichment culture to contain multiple related strains. This diversity can confound existing strict assembly programs and lead to a fragmented assembly, which is unnecessary if we have a related reference genome available that can function as a scaffold. Results: Here, we map short metagenomic sequencing reads from a population of strains to a related reference genome, and compose a genome that captures the consensus of the population's sequences. We show that by iteration of the mapping and assembly procedure, the coverage increases while the similarity with the reference genome decreases. This indicates that the assembly becomes less dependent on the reference genome and approaches the consensus genome of the multi-strain population.

Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly

Bas E. Dutilh, Martijn A. Huynen, Jolein Gloerich and Marc Strous (2011), "Iterative Read Mapping and Assembly Allows the Use of a More Distant Reference in Metagenome Assembly", In: Handbook of Molecular Microbial Ecology I: Metagenomics and Complementary Approaches. Ed. Frans J. de Bruijn. Wiley-Blackwell.

Most microbial species cannot be cultured in the laboratory. Metagenomic sequencing may still yield a complete genome if the sequenced community is enriched and the sequencing coverage is high. However, the complexity in a natural population may cause the enrichment culture to contain multiple related strains. Moreover, it is not uncommon that these strains represent a quasispecies that is relatively distantly related to the closest available reference genome. These matters can confound existing strict assembly programs and lead to a fragmented assembly, which is unnecessary if we have a related reference genome available that can function as a scaffold, and if we use this scaffold loosely. We show that by iteratively mapping short metagenomic sequencing reads from a population of strains to a related reference genome, we can create a genome that captures the consensus of the population's sequences. Iteration allows us to map more of the reads, leading to a higher coverage and depth of the assembled consensus genome. At the same time, the similarity with the reference genome decreases. This indicates that the assembly becomes less dependent on the reference genome and approaches the consensus genome of the multi-strain population. Thus, by exploiting the homology offered by a reference genome in combination with permissive, iterative read mapping, we get a better view of both the consensus genome sequence of the quasispecies present in the sample and of the sequence diversity between the strains.

Bas E. Dutilh (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", talk at Bio-IT World Conference and Expo 2010, Hannover, Germany.

Bas E. Dutilh (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", talk at Genomics Automation Europe 2010, Dublin, Ireland.

Bas E. Dutilh (2010), "Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly", talk at NBIC Conference 2010, Lunteren, The Netherlands.

Bas E. Dutilh, Martijn A. Huynen and Marc Strous (2009), "Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly", talk at Next Generation Sequencing and Algorithms for Short Read Analysis SIG, ISMB/ECCB 2009, Stockholm, Sweden.

Bas E. Dutilh, Martijn A. Huynen, Jolein Gloerich and Marc Strous (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", poster at ECCB 2010, Ghent, Belgium.

Bas E. Dutilh, Martijn A. Huynen, Jolein Gloerich and Marc Strous (2010), "Iterative read mapping and assembly allows the use of a more distant reference in metagenome assembly", poster at NBIC Conference 2010, Lunteren, The Netherlands.

Asymmetric relationships between proteins shape genome evolution

Richard A. Notebaart*, Philip R. Kensche*, Martijn A. Huynen and Bas E. Dutilh (2009), "Asymmetric relationships between proteins shape genome evolution", Genome Biology 10: R19. Pubmed, Genome Biology, PDF. *Authors contributed equally.

Background: The relationships between proteins are often asymmetric: one protein (A) depends for its function on another protein (B), but the second protein does not depend on the first. For example, in regulatory interactions, the regulator's function depends on the availability of its target, but the target can often function without the regulator. Other examples are metabolic networks, in which there are multiple pathways that converge into one central pathway. The enzymes in the converging pathways depend on the enzymes in the central pathway, but the enzymes in the latter do not depend on any specific enzyme in the converging pathways. Asymmetric relations are analogous to the "if->then" logical relation where A implies B, but B does not imply A (A->B). Results: We show that the majority of relationships between enzymes in metabolic flux models of metabolism in Escherichia coli and Saccharomyces cerevisiae are asymmetric. We show furthermore that these asymmetric relationships are reflected in the expression of the genes encoding those enzymes, the effect of gene knockouts and the evolution of genomes. From the asymmetric relative dependency, one would expect that the gene that is relatively independent (B), can occur without the other, dependent gene (A), but not the reverse. Indeed, when only one gene of an A->B pair is expressed, is essential, is present in a genome, is gained in evolutionary history without the other, or is present after a loss of one of the two, it tends to be the independent gene (B). This bias is strongest for genes encoding proteins whose asymmetric relationship is evolutionarily conserved. Conclusions: The asymmetric relations between proteins that arise from the system properties of metabolic networks affect gene expression, the relative effect of gene knockouts and genome evolution in a predictable manner.
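
The A->B expectation described above can be illustrated with a small sketch. It is not the analysis from the paper: the function names, the gene labels and the toy genomes are all hypothetical, and only presence/absence across genomes is considered. For a dependent gene A and an independent gene B, genomes that carry only one of the two should tend to carry B.

```python
from typing import Dict, Set


def presence_counts(genomes: Dict[str, Set[str]], a: str, b: str) -> Dict[str, int]:
    """Count genomes by which genes of the pair (a, b) they contain."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for genes in genomes.values():
        has_a, has_b = a in genes, b in genes
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["only_a"] += 1
        elif has_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


def favours_a_implies_b(counts: Dict[str, int]) -> bool:
    """True if single occurrences are biased towards the independent gene b."""
    return counts["only_b"] > counts["only_a"]


# Hypothetical toy data: gene "A" depends on gene "B".
toy_genomes = {
    "genome1": {"A", "B"},
    "genome2": {"B"},
    "genome3": {"B"},
    "genome4": set(),
    "genome5": {"A"},
}
toy_counts = presence_counts(toy_genomes, "A", "B")
```

In this toy set, two genomes contain only B and one contains only A, so the bias points in the direction expected for A->B. A real analysis would need a statistical test over many gene pairs rather than a simple comparison of counts.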

Richard A. Notebaart*, Philip R. Kensche*, Martijn A. Huynen and Bas E. Dutilh (2009), "Asymmetric relationships between proteins shape genome evolution", talk by R.A. Notebaart at NBIC Conference 2009, Lunteren, The Netherlands. *These authors contributed equally.

Richard A. Notebaart*, Philip R. Kensche*, Martijn A. Huynen and Bas E. Dutilh (2009), "Asymmetric relationships between proteins shape genome evolution", poster at ISMB/ECCB 2009, Stockholm, Sweden. *These authors contributed equally.

Philip R. Kensche*, Richard A. Notebaart*, Martijn A. Huynen and Bas E. Dutilh (2008), "Asymmetric relationships between proteins shape genome evolution", poster at Benelux Bioinformatics Conference 2008, Maastricht, The Netherlands. *These authors contributed equally.

Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns

Guénola Ricard, Rob M. de Graaf, Bas E. Dutilh, Isabel Duarte, Theo A. van Alen, Angela H.A.M. van Hoek, Brigitte Boxma, Georg W.M. van der Staay, Seung Yeo Moon van der Staay, Wei-Jen Chang, Laura F. Landweber, Johannes H.P. Hackstein and Martijn A. Huynen (2008), "Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns", BMC Genomics 9: 587. Pubmed, BMC, PDF.

Background: Nyctotherus ovalis is a single-celled eukaryote that has hydrogen-producing mitochondria and lives in the hindgut of cockroaches. Like all members of the ciliate taxon, it has two types of nuclei, a micronucleus and a macronucleus. N. ovalis generates its macronuclear chromosomes by forming polytene chromosomes that subsequently develop into macronuclear chromosomes by DNA elimination and rearrangement. Results: We examined the structure of these gene-sized macronuclear chromosomes in N. ovalis. We determined the telomeres, subtelomeric regions, UTRs, coding regions and introns by sequencing a large set of macronuclear DNA sequences (4,242) and cDNAs (5,484) and comparing them with each other. The telomeres consist of repeats CCC(AAAACCCC)n, similar to those in spirotrichous ciliates such as Euplotes, Sterkiella (Oxytricha) and Stylonychia. Per sequenced chromosome we found evidence for either a single protein-coding gene, a single tRNA, or the complete ribosomal RNA cluster. Hence the chromosomes appear to encode single transcripts. In the short subtelomeric regions we identified a few over-represented motifs that could be involved in gene regulation, but there is no consensus polyadenylation site. The introns are short (21-29 nucleotides), and a significant fraction (1/3) of the tiny introns is conserved in the distantly related ciliate Paramecium tetraurelia. As has been observed in P. tetraurelia, the N. ovalis introns tend to contain in-frame stop codons or have a length that is not divisible by three. This pattern causes premature termination of mRNA translation in the event of intron retention, and potentially degradation of unspliced mRNAs by the nonsense-mediated mRNA decay pathway. Conclusions: The combination of short leaders, tiny introns and single genes leads to very minimal macronuclear chromosomes. The smallest we identified contained only 150 nucleotides.
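
The intron property described above (an in-frame stop codon, or a length that is not a multiple of three) can be checked with a few lines of code. The sketch below is only an illustration: the function name, the `phase` parameter and the example sequences are hypothetical, and the default stop codon set is the standard genetic code, which is an assumption and should be replaced for organisms that use a variant code.

```python
from typing import FrozenSet

# Assumption: standard genetic code stop codons; pass another set for variant codes.
STANDARD_STOPS: FrozenSet[str] = frozenset({"TAA", "TAG", "TGA"})


def retention_disrupts_translation(
    intron: str,
    phase: int = 0,
    stop_codons: FrozenSet[str] = STANDARD_STOPS,
) -> bool:
    """Return True if a retained intron would shift the frame or add a stop codon.

    phase is the number of bases of a split codon that lie upstream of the intron
    (0, 1 or 2). Codons that span the exon-intron boundary are not examined.
    """
    seq = intron.upper()
    if len(seq) % 3 != 0:
        return True  # retention shifts the downstream reading frame
    first_codon_start = (3 - phase) % 3
    for i in range(first_codon_start, len(seq) - 2, 3):
        if seq[i:i + 3] in stop_codons:
            return True  # in-frame stop codon inside the intron
    return False
```

For example, the made-up sequence `GTATAA` contains the in-frame codon `TAA` at phase 0, while `GTAAG` is flagged simply because its length is five.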

Signature genes as a phylogenomic tool

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", Molecular Biology and Evolution 25: 1659-1667. Pubmed, PDF.

Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss, and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition.
We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that ~92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. Summarising, signature genes can complement traditional sequence based methods in addressing taxonomic questions.

Signature, a web server for taxonomic characterization of sequence samples using signature genes

Bas E. Dutilh, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature, a web server for taxonomic characterization of sequence samples using signature genes", Nucleic Acids Res, 36 (Web Server Issue): W470-W474. Pubmed, PDF.

Signature genes are genes that are unique to a taxonomic clade and are common within it. They contain a wealth of information about clade-specific processes and hold a strong evolutionary signal that can be used to phylogenetically characterize a set of sequences, such as a metagenomics sample. As signature genes are based on gene content, they provide a means to assess the taxonomic origin of a sequence sample that is complementary to sequence-based analyses. Here, we introduce Signature (http://www.cmbi.ru.nl/signature), a web server that identifies the signature genes in a set of query sequences, and therewith phylogenetically characterizes it. The server produces a list of taxonomic clades that share signature genes with the set of query sequences, along with an insightful image of the tree of life, in which the clades are color coded based on the number of signature genes present. This allows the user to quickly see from which part(s) of the taxonomy the query sequences likely originate.

Signature genes are genes with a common ancestor that are specific for a clade in the Tree of Life, and that can be used to address phylogenetic or functional questions. Signature allows you to find out whether your sequence is a signature for any clade, and places the signature OGs in the context of the tree of life. Your initial input is first assigned to orthologous groups (OGs), or you can choose to skip the OG assignment step and enter OG identifiers directly. The distribution of these OGs is assessed in a default or custom tree of life, and finally Signature outputs all the clades that share signature OGs with your query.
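The idea of a signature OG (found only inside a clade, and in most of its members) can be illustrated with a small sketch. This is not the code behind the Signature server. The function names, the data layout and the `min_fraction` threshold are hypothetical choices made for illustration only.

```python
def find_signature_ogs(og_to_genomes, clade, min_fraction=0.8):
    """Return the OGs that occur only in genomes of `clade` and in at least
    `min_fraction` of them.

    og_to_genomes: dict mapping OG identifier -> set of genome identifiers
    clade: iterable of genome identifiers that make up the clade
    min_fraction: hypothetical cutoff for "common within the clade"
    """
    clade = set(clade)
    signatures = []
    for og, genomes in og_to_genomes.items():
        genomes = set(genomes)
        if not genomes or not genomes <= clade:
            continue  # absent everywhere, or present outside the clade
        if len(genomes) / len(clade) >= min_fraction:
            signatures.append(og)
    return sorted(signatures)


def clades_sharing_signatures(query_ogs, clade_signatures):
    """Count, per clade, how many of its signature OGs occur in the query."""
    query = set(query_ogs)
    hits = {}
    for clade_name, signatures in clade_signatures.items():
        shared = len(query & set(signatures))
        if shared:
            hits[clade_name] = shared
    return hits


if __name__ == "__main__":
    toy = {"OG1": {"A", "B", "C"}, "OG2": {"A", "D"}, "OG3": {"A"}, "OG4": {"A", "B"}}
    sigs = find_signature_ogs(toy, {"A", "B", "C"}, min_fraction=0.6)
    print(sigs)
    print(clades_sharing_signatures(["OG1", "OG9"], {"toy_clade": sigs}))
```

In this toy input, OG2 is rejected because it also occurs in genome D, and OG3 because it is too rare within the clade.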

The Bioscience Technology article Characterizing The Tree Of Life includes an interview with me about Signature.

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", talk at Benelux Bioinformatics Conference 2008, Maastricht, The Netherlands.

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", poster at ISMB/ECCB 2009, Stockholm, Sweden.

Bas E. Dutilh, Berend Snel, Thijs J.G. Ettema, Ying He, Maarten L. Hekkelman and Martijn A. Huynen (2008), "Signature genes as a phylogenomic tool", poster at Society for Bioinformatics in Northern Europe Conference 2008, Warszawa, Poland.
Selected for presentation.

Conservation of divergent transcription in fungi

Philip R. Kensche, Martin Oti, Bas E. Dutilh and Martijn A. Huynen (2008), "Conservation of divergent transcription in fungi", Trends in Genetics 24: 207-211. PubMed, PDF.

The comparison of fully sequenced genomes enables the study of selective constraints that determine genome organisation. We show that, in fungi, adjacent divergently transcribed (<-->) genes are more conserved in orientation than convergent (-><-) or co-oriented (->->) gene pairs. Furthermore, the time divergent orientation of two genes is conserved correlates with the degree of their co-expression and with the likelihood of them being functionally related. The functional interactions of the proteins encoded by the conserved divergent gene pairs indicate a potential for protein function prediction in eukaryotes.
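The three orientation classes named in this abstract follow directly from the strands of two neighbouring genes. A minimal sketch, assuming each gene is given as a hypothetical `(start, strand)` tuple on a single chromosome; it only classifies pairs and does not reproduce the conservation analysis of the paper.

```python
from collections import Counter


def classify_pair(left_strand, right_strand):
    """Classify two adjacent genes by the strands of the left and right gene."""
    strands = {"+", "-"}
    if left_strand not in strands or right_strand not in strands:
        raise ValueError("strand must be '+' or '-'")
    if left_strand == right_strand:
        return "co-oriented"          # ->-> or <-<-
    # Left gene on the minus strand and right gene on the plus strand: <-->
    return "divergent" if left_strand == "-" else "convergent"


def count_orientations(genes):
    """Count orientation classes over all adjacent pairs on one chromosome.

    genes: list of (start, strand) tuples (assumed layout).
    """
    ordered = sorted(genes)
    return Counter(
        classify_pair(left[1], right[1])
        for left, right in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    toy_genes = [(100, "-"), (500, "+"), (900, "+"), (1500, "-")]
    print(count_orientations(toy_genes))
```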

Philip R. Kensche, Martin Oti, Bas E. Dutilh and Martijn A. Huynen (2008), "Conservation of Divergent Transcription in Fungi", poster at Society for Bioinformatics in Northern Europe Conference 2008, Warszawa, Poland.
Selected for presentation.

Philip R. Kensche, Martin Oti, Bas E. Dutilh and Martijn A. Huynen (2007), "Conservation of Gene Orientation in Fungi", poster at ESF-EMBO Symposium "Comparative Genomics of Eukaryotic Microorganisms: Eukaryotic Genome Evolution", Sant Feliu de Guixols, Spain.

Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution

Philip R. Kensche, Vera van Noort, Bas E. Dutilh and Martijn A. Huynen (2008), "Practical and theoretical advances in predicting the function of a protein from its phylogenetic distribution", Journal of the Royal Society Interface 5: 151-170. PubMed, PDF.

The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a number of proteins, including ones of medical relevance, have thus been predicted and subsequently confirmed experimentally. Additionally, various approaches to measure the similarity of phylogenetic profiles and to account for the phylogenetic bias in the data have been proposed. We review the successful applications of phylogenetic profiling and analyse the performance of various profile similarity measures with a set of one microsporidial and 25 fungal genomes. In the fungi, phylogenetic profiling yields high-confidence predictions for the highest and only the highest scoring gene pairs illustrating both the power and the limitations of the approach. Both practical examples and theoretical considerations suggest that in order to get a reliable and specific picture of a protein's function, results from phylogenetic profiling have to be combined with other sources of evidence.
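As an illustration of phylogenetic profiling, the sketch below scores gene pairs by the similarity of their presence/absence profiles across genomes. The Jaccard index is used here only as one simple example of a profile similarity measure; it is not claimed to be one of the measures evaluated in the review, and all names are hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Return (score, gene, gene) tuples for all gene pairs, best first."""
    scored = [
        (jaccard(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


if __name__ == "__main__":
    toy_profiles = {
        "geneA": [1, 1, 0, 1],
        "geneB": [1, 1, 0, 1],
        "geneC": [0, 0, 1, 0],
    }
    for score, g, h in rank_pairs(toy_profiles):
        print(g, h, score)
```

Only the top of such a ranking would be treated as a prediction, in line with the observation above that high confidence is limited to the highest scoring pairs.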

Extracting the evolutionary signal from genomes

Bas E. Dutilh (2007), "Extracting the evolutionary signal from genomes", Ph.D. thesis. PDF. Email me to receive a printed copy.

This thesis develops several methods to analyze aspects of evolution that depend on the availability of complete genomes. While I consistently find a phylogenetic signal using many approaches, a question of growing concern is how these evolutionary relationships should be interpreted. Since Darwin's idea about the tree-like structure of evolution, the dogma has been that evolution is mainly a vertical process, but recently, Doolittle pointed out that especially for prokaryotes, a tree may be insufficient to capture the complex evolutionary paths leading to the current-day genomes. While a tree may fall short as a representation of the evolutionary relationships between genomes, I think that describing a species as its entire genome blurs your vision. To characterize a species, I would look at its core, and disregard the noisy genes that obscure its evolutionary history (chapters "Genome trees and the nature of genome evolution", "The Consistent Phylogenetic Signal in Genome Trees Revealed by Reducing the Impact of Noise" and "Assessment of phylogenomic and orthology approaches for phylogenetic inference"). To identify these cores at many different levels throughout the tree of life, I use the hundreds of complete genomes that have become available. In the chapter "Signature genes as a phylogenomic tool", I find signature genes for every clade, and show that these can be used for the taxonomic characterization of a sequenced sample, for example an environmental sample. Another type of data that has become available on a large scale is gene expression data. To be able to compare the functional context of genes in distantly related species, we developed the expression context, which relies on the completeness of the genome sequences and on the availability of genome-wide expression experiments (chapter "A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation").

Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate

Nicole A. Datson, Maarten C. Morsink, Srebrena Atanasova, Victor W. Armstrong, Hans Zischler, Christina Schlumbohm, Bas E. Dutilh, Martijn A. Huynen, Brigitte Waegele, Andreas Ruepp, E. Ronald de Kloet and Eberhard Fuchs (2007), "Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate" BMC Genomics 8: 190. PubMed, BMC, PDF.

Background The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is becoming increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest for the marmoset as an animal model, the molecular tools for genetic analysis are extremely limited. Results Here we report the development of the first marmoset-specific oligonucleotide microarray (EUMAMA) containing probe sets targeting 1541 different marmoset transcripts expressed in hippocampus. These 1541 transcripts represent a wide variety of different functional gene classes. Hybridisation of the marmoset microarray with labelled RNA from hippocampus, cortex and a panel of 7 different peripheral tissues resulted in high detection rates of 85% in the neuronal tissues and on average 70% in the non-neuronal tissues. The expression profiles of the 2 neuronal tissues, hippocampus and cortex, were highly similar, as indicated by a correlation coefficient of 0.96. Several transcripts with a tissue-specific pattern of expression were identified. Besides the marmoset microarray we have generated 3215 ESTs derived from marmoset hippocampus, which have been annotated and submitted to GenBank [GenBank: EF214838 - EF215447, EH380242 - EH382846]. Conclusions We have generated the first marmoset-specific DNA microarray and demonstrated its use to characterise large-scale gene expression profiles of hippocampus but also of other neuronal and non-neuronal tissues. In addition, we have generated a large collection of ESTs of marmoset origin, which are now available in the public domain. These new tools will facilitate molecular genetic research into this non-human primate animal model.

Assessment of phylogenomic and orthology approaches for phylogenetic inference

Bas E. Dutilh, Vera van Noort, René T.J.M. van der Heijden, Teun Boekhout, Berend Snel and Martijn A. Huynen (2007), "Assessment of phylogenomic and orthology approaches for phylogenetic inference", Bioinformatics 23: 815-824. PubMed, PDF.

Motivation: Phylogenomics integrates the vast amount of phylogenetic information contained in complete genome sequences, and is rapidly becoming the standard for inferring reliable species phylogenies. There are however fundamental differences between the ways in which phylogenomic approaches like gene content, superalignment, superdistance and supertree integrate the phylogenetic information from separate orthologous groups. Furthermore, they all depend on the method by which the orthologous groups are initially determined. Here, we systematically compare these four phylogenomic approaches, in parallel with three approaches for large-scale orthology determination: pairwise orthology, cluster orthology and tree-based orthology. Results: Including various phylogenetic methods, we apply a total of 54 fully automated phylogenomic procedures to the Fungi, the eukaryotic clade with the largest number of sequenced genomes, for which we retrieved a golden standard phylogeny from the literature. Phylogenomic trees based on gene content show, relative to the other methods, a bias in the tree topology that parallels convergence in life style among the species compared, indicating convergence in gene content. Conclusions: Complete genomes are no warrant for good, or even consistent phylogenies. However, the large amounts of data in genomes enable us to carefully select the data most suitable for phylogenomic inference. In terms of performance, the superalignment approach, combined with restrictive orthology, is the most successful in recovering a fungal phylogeny that agrees with current taxonomic views, and allows us to obtain a high resolution phylogeny. We provide solid support for what has grown to be common practice in phylogenomics during its advance in recent years.

Bas E. Dutilh, Vera van Noort, René T.J.M. van der Heijden, Teun Boekhout, Berend Snel and Martijn A. Huynen (2007), "Assessment of phylogenomic and orthology approaches for phylogenetic inference", talk at ESF-EMBO Symposium "Comparative Genomics of Eukaryotic Microorganisms: Eukaryotic Genome Evolution", Sant Feliu de Guixols, Spain.

Bas E. Dutilh and Martijn A. Huynen (2006), "Superalignment and supertree are the best phylogenomic approaches", talk at International Conference in Phylogenomics, Sainte Adèle, Quebec, Canada. In: Conference Program Phylogenomics Conference, p. 20.

Bas E. Dutilh, Vera van Noort, René T.J.M. van der Heijden, Martijn A. Huynen and Berend Snel (2006), "A comprehensive comparison of phylogenomics and orthology methods applied to the Fungi", poster at International Conference in Phylogenomics, Sainte Adèle, Quebec, Canada. In: Conference Program Phylogenomics Conference, p. 40.

Deciphering the evolution and metabolism of an anammox bacterium from a community genome

Marc Strous, Eric Pelletier, Sophie Mangenot, Thomas Rattei, Angelika Lehner, Michael W. Taylor, Matthias Horn, Holger Daims, Delphine Bartol-Mavel, Patrick Wincker, Valérie Barbe, Nuria Fonknechten, David Vallenet, Béatrice Segurens, Chantal Schenowitz-Truong, Claudine Médigue, Astrid Collingro, Berend Snel, Bas E. Dutilh, Huub J. M. Op den Camp, Chris van der Drift, Irina Cirpus, Katinka T. van de Pas-Schoonen, Harry R. Harhangi, Laura van Niftrik, Markus Schmid, Jan Keltjens, Jack van de Vossenberg, Boran Kartal, Harald Meier, Dmitrij Frishman, Martijn A. Huynen, Hans-Werner Mewes, Jean Weissenbach, Mike S. M. Jetten, Michael Wagner and Denis Le Paslier (2006), "Deciphering the evolution and metabolism of an anammox bacterium from a community genome", Nature 440: 790-794. PubMed, PDF, F1000 Exceptional.

Anaerobic ammonium oxidation (anammox) has become a main focus in oceanography and wastewater treatment. It is also the nitrogen cycle's major remaining biochemical enigma. Among its features, the occurrence of hydrazine as a free intermediate of catabolism, the biosynthesis of ladderane lipids and the role of cytoplasm differentiation are unique in biology. Here we use environmental genomics (the reconstruction of genomic data directly from the environment) to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community. The genome data illuminate the evolutionary history of the Planctomycetes and allow us to expose the genetic blueprint of the organism's special properties. Most significantly, we identified candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism, and discovered unexpected metabolic versatility.

Horizontal gene transfer from Bacteria to rumen Ciliates indicates adaptation to their anaerobic carbohydrates rich environment

Guenola Ricard, Neil R. McEwan, Bas E. Dutilh, Jean-Pierre Jouany, Didier Macheboeuf, Makoto Mitsumori, Freda M. McIntosh, Tadeusz Michalowski, Takafumi Nagamine, Nancy Nelson, Charles J. Newbold, Eli Nsabimana, Akio Takenaka, Nadine A. Thomas, Kazunari Ushida, Johannes H.P. Hackstein and Martijn A. Huynen (2006), "Horizontal Gene Transfer from Bacteria to rumen Ciliates indicates adaptation to their anaerobic carbohydrates rich environment", BMC Genomics 7: 22. PubMed, BMC, PDF, F1000 Recommended, ISI.

Background The horizontal transfer of expressed genes from Bacteria into Ciliates which live in close contact with each other in the rumen (the foregut of ruminants) was studied using ciliate Expressed Sequence Tags (ESTs). About 4000 ESTs were sequenced from the two main types of rumen Ciliates: Entodiniomorphs (Entodinium simplex, Entodinium caudatum, Eudiplodinium maggii, Metadinium medium, Diploplastron affine, Polyplastron multivesiculatum and Epidinium ecaudatum) and Vestibuliferida, previously called Holotrichs (Isotricha prostoma, Isotricha intestinalis and Dasytricha ruminantium). Results A comparison of the sequences with the completely sequenced genomes of Eukaryotes and Prokaryotes, followed by large scale construction and analysis of phylogenies, identified 148 ciliate genes that specifically cluster with genes from the Bacteria. Of these genes, 34 cluster with genes from the Firmicutes, a phylum of Bacteria that is well represented in the rumen. The phylogenetic clustering with bacterial genes, coupled with the absence of close relatives of these genes in the Ciliate Tetrahymena thermophila, indicates that they have recently been acquired via Horizontal Gene Transfer (HGT). Conclusions Among the HGT candidates, we found an overrepresentation (>75%) of genes involved in metabolism, specifically in the catabolism of complex carbohydrates (>45%), a rich food source in the rumen. We propose that the acquisition of these genes has facilitated the Ciliates' colonization of the rumen and provides evidence for the role of HGT in the adaptation to new niches.

A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation

Bas E. Dutilh, Martijn A. Huynen and Berend Snel (2006), "A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation", BMC Genomics 7: 10. PubMed, BMC (highly accessed), PDF, ISI.

Background The massive scale of microarray derived gene expression data allows for a global view of cellular function. Thus far, comparative studies of gene expression between species have been based on the level of expression of the gene across corresponding tissues, or on the co-expression of the gene with another gene. Results To compare gene expression between distant species on a global scale, we introduce the "expression context". The expression context of a gene is based on the co-expression with all other genes that have unambiguous counterparts in both genomes. Employing this new measure, we show 1) that the expression context is largely conserved between orthologs, and 2) that sequence identity shows little correlation with expression context conservation after gene duplication and speciation. Conclusions This means that the degree of sequence identity has a limited predictive quality for differential expression context conservation between orthologs, and thus presumably also for other facets of gene function.
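The expression context can be pictured as a vector of co-expression values between one gene and a shared set of reference genes; comparing two such vectors gives a measure of context conservation between orthologs. The sketch below is a simplified illustration under assumptions, not the published method: co-expression is taken to be the Pearson correlation, the reference set is a list of hypothetical ortholog pairs, and all function and variable names are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return cov / sqrt(var_x * var_y)


def context_conservation(gene_a, gene_b, expr_a, expr_b, reference_pairs):
    """Correlate the expression contexts of two orthologs.

    expr_a, expr_b: dict gene -> expression profile within each species
    reference_pairs: list of (gene_in_a, gene_in_b) unambiguous counterparts
    """
    context_a, context_b = [], []
    for ref_a, ref_b in reference_pairs:
        if ref_a == gene_a or ref_b == gene_b:
            continue  # skip the gene's own entry
        context_a.append(pearson(expr_a[gene_a], expr_a[ref_a]))
        context_b.append(pearson(expr_b[gene_b], expr_b[ref_b]))
    return pearson(context_a, context_b)


if __name__ == "__main__":
    expr_a = {"a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [4, 3, 2, 1], "a4": [1, 3, 2, 4]}
    expr_b = {"b1": [10, 20, 30, 40], "b2": [1, 2, 3, 4], "b3": [8, 6, 4, 2], "b4": [2, 6, 4, 8]}
    refs = [("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
    print(context_conservation("a1", "b1", expr_a, expr_b, refs))
```

Note that the two species need not share experimental conditions: each context is computed within one species, and only the resulting vectors are compared.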

Genome trees and the nature of genome evolution

Berend Snel, Martijn A. Huynen and Bas E. Dutilh (2005), "Genome trees and the nature of genome evolution", Annual Review of Microbiology 59: 191-209. PubMed, Annual Reviews.

Genome trees are a means to capture the overwhelming amount of phylogenetic information that is present in genomes. Different formalisms have been introduced to reconstruct genome trees on the basis of various aspects of the genome. On the basis of these aspects, we separate genome trees into five classes: (a) alignment-free trees based on statistic properties of the genome, (b) gene content trees based on the presence and absence of genes, (c) trees based on chromosomal gene order, (d) trees based on average sequence similarity, and (e) phylogenomics-based genome trees. Despite their recent development, genome tree methods have already had some impact on the phylogenetic classification of bacterial species. However, their main impact so far has been on our understanding of the nature of genome evolution and the role of horizontal gene transfer therein. An ideal genome tree method should be capable of using all gene families, including those containing paralogs, in a phylogenomics framework capitalizing on existing methods in conventional phylogenetic reconstruction. We expect such sophisticated methods to help us resolve the branching order between the main bacterial phyla.

The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise

Bas E. Dutilh, Martijn A. Huynen, William J. Bruno and Berend Snel (2004), "The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise", Journal of Molecular Evolution 58: 527-539. PubMed, PDF, F1000 Must Read.

With the sequencing of complete genomes we have the most complete molecular data for the reconstruction of the phylogeny of life. For example, instead of using the sequence similarity between proteins, as is classically done for the reconstruction of phylogenies, we can now use the number of shared genes between genomes. The goal of this project is to construct phylogenies using these and other types of "complete genome data" in new ways, combining as much of the genomic information as possible. Aside from being interesting in themselves, these genome trees are of extreme importance to detect and filter out various types of phylogenetic bias in genomic data sets, and therewith improve our methods for the prediction of protein interactions.
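Using the number of shared genes between genomes as phylogenetic data amounts to turning gene content into a pairwise distance. A minimal sketch, assuming each genome is represented as a set of orthologous group identifiers; normalising by the smaller genome is one possible choice made here for illustration and is not taken from the text above.

```python
def gene_content_distance(genes_a, genes_b):
    """Distance based on shared gene content, normalised by the smaller genome.

    Assumption: genomes are given as collections of orthologous group IDs.
    """
    a, b = set(genes_a), set(genes_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("both genomes must contain at least one gene")
    return 1.0 - len(a & b) / smaller


def distance_matrix(genomes):
    """Return {(name1, name2): distance} for all unordered genome pairs."""
    names = sorted(genomes)
    return {
        (m, n): gene_content_distance(genomes[m], genomes[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


if __name__ == "__main__":
    toy = {"g1": {1, 2, 3, 4}, "g2": {3, 4, 5}, "g3": {6, 7}}
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

A matrix like this could be passed to any distance-based tree building method, such as neighbour joining, to obtain a genome tree.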

Bas E. Dutilh, Martijn A. Huynen, William J. Bruno and Berend Snel (2003), "The consistent signal in genome trees revealed by reducing the impact of noise", poster at ECCB, Paris, France.

Bas E. Dutilh, Martijn A. Huynen, William J. Bruno and Berend Snel (2003), "The consistent signal in genome trees revealed by reducing the impact of noise", poster at 6th Annual Conference on Computational Genomics, Boston, USA.

Decline in excision circles requires homeostatic renewal or homeostatic death of naive T cells

Bas E. Dutilh and Rob J. de Boer (2003), "Decline in excision circles requires homeostatic renewal or homeostatic death of naive T cells", Journal of Theoretical Biology 224: 351-358. PubMed, PDF.

When the TCR is formed in the thymus, fragments of DNA are excised from the T cell progenitor chromosome. These TCR rearrangement excision circles (TRECs) are stable, are not replicated in cell division and are therefore most frequent in naive T cells that have recently left the thymus. During life, the average TREC content of peripheral naive T cells decreases between one and two orders of magnitude in humans. It is generally believed that the age-dependent decrease in the production of naive T cells by the thymus is sufficient to explain the decrease in the TREC content. Here, we demonstrate that this decrease in thymic production is required, but it is not sufficient to explain the TREC data. Only if the decrease in thymic output is compensated by homeostasis can one explain the decrease in the TREC content. The homeostatic response can take two forms: when the total number of naive T cells declines, there could be an increase in the renewal rate or an increase of the average cellular lifespan.
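The reasoning in this abstract can be made concrete with a minimal model. The equations below are a sketch under assumptions and are not necessarily the exact model of the paper: N is the number of naive T cells, T the total number of TRECs, A = T/N the average TREC content, σ(t) the thymic output, c the TREC content of a recent thymic emigrant, ρ the renewal (division) rate and δ the death rate. All symbols are introduced here for illustration.

```latex
\begin{aligned}
\frac{dN}{dt} &= \sigma(t) + \rho N - \delta N, \\
\frac{dT}{dt} &= c\,\sigma(t) - \delta T, \\
\frac{dA}{dt} &= \frac{1}{N}\frac{dT}{dt} - \frac{T}{N^{2}}\frac{dN}{dt}
              = \frac{\sigma(t)}{N}\,(c - A) - \rho A .
\end{aligned}
```

If thymic output changes slowly and ρ < δ are constants, the population stays close to the quasi steady state N ≈ σ/(δ − ρ), and setting dA/dt = 0 gives:

```latex
A \approx c\left(1 - \frac{\rho}{\delta}\right)
```

In this sketch the average TREC content does not depend on σ at all, so a declining thymic output by itself leaves A unchanged. A can only fall if the ratio ρ/δ rises as N declines, that is, if the renewal rate increases or the death rate decreases (a longer cellular lifespan), which corresponds to the two forms of homeostatic response named in the abstract.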

Rob J. de Boer, Bas E. Dutilh, Mette D. Hazenberg and Frank M. Miedema (2000), "Mathematical models are required for the interpretation of T cell receptor excision circle data", poster at Joint Annual Meeting of Immunology of DGfI and NVvI, Düsseldorf, Germany. Abstract.

Bas E. Dutilh, Rob J. de Boer, Mette D. Hazenberg and Frank M. Miedema (2000), "Decline in excision circles proves homeostasis of naive T-cells", poster at Joint Annual Meeting of Immunology of DGfI and NVvI, Düsseldorf, Germany.

Reconstruction of Pyrococcus central carbohydrate metabolism

Bas E. Dutilh (2002), "Pyrobase: integrated database of Pyrococcus genes": www.cmbi.ru.nl/pyrobase.

With the sequencing of complete genomes we can predict the metabolic pathways in a species. Some of these species are of particular importance, either for medical or for industrial reasons. Pyrococcus is a genus that belongs to the Archaea, one of the three branches in the evolution of cellular life, and the one about which the least is known. Pyrococcus is a hyperthermophile that lives at about 90°C, and has a large reductive potential. This makes the organism interesting for the production of certain fine chemicals (alcohols and aldehydes) from carboxylic acids.
The goal of this project is to detect the enzymes that are involved in the carbohydrate metabolism (e.g. glycolysis, citric acid cycle, fatty acid metabolism) of the Pyrococci. Some of them have already been annotated in the genome, but with the methods that are being developed in our group we have proposed new candidates. In this partly EU funded project, we work in collaboration with the Bacterial Genetics Group of John van der Oost at Wageningen University, where specific predictions can be tested.

Pyrobase is the integrated database of Pyrococcus genes made as part of this project. The genomes of Pyrococcus abyssi, Pyrococcus furiosus and Pyrococcus horikoshii, the only three Pyrococcus species with completely sequenced genomes, were screened for Pfam protein families. The genes were assigned to clusters of orthologous groups of proteins from the COG database, and genomically linked orthologous groups were identified with STRING. Buttons are included to directly submit a protein sequence to a STRING genomic context search or a SMART domain architecture search.

Augustinus R. Uria, Ronnie Machielsen, Bas E. Dutilh, Martijn A. Huynen and John van der Oost (2006), "Alcohol dehydrogenases from marine hyperthermophilic microorganisms and their importance to the pharmaceutical industry", presented at "International seminar and workshop on marine biodiversity and their potential for developing bio-pharmaceutical industry", May 2006, Jakarta. PDF.

M.P. Machielsen, Corné H. Verhees, Bas E. Dutilh, Martijn A. Huynen, Willem M. de Vos and John van der Oost (2002), "Distribution of alcohol dehydrogenases in Pyrococcus furiosus", poster at Extremophiles 2002, Napoli, Italy. In: Extremophiles 2002. Proceedings of the 4th international congress on extremophiles (Rossi, M., Bartolucci, S., Ciaramella, M. and Moracci, M., Eds.), p. 229. Napoli.
Computational genomics for protein function and pathway prediction

Berend Snel, Toni Gabaldón, Vera van Noort, Bas E. Dutilh and Martijn A. Huynen (2002), "Computational genomics for protein function and pathway prediction", poster at KNCV Symposium "Bioinformatics, the best of both worlds", Wageningen, The Netherlands.

Initiatives on sustainable development in the food sector worldwide

Bas E. Dutilh, Chris E. Dutilh and Willem H.M.M. van Laarhoven (2001), "Initiatives on Sustainable Development in the Food Sector Worldwide", Foundation for Sustainability in the Food Chain (DuVo). PDF.

In 1995, fifteen companies active in the food chain in The Netherlands initiated the Foundation for a Sustainable Food Chain (DuVo). The first projects carried out by DuVo were related to the identification of major environmental impacts in the food chain. Subsequently, the focus changed to the identification of options for improvement along the production chain and to the development of an infrastructure that could contain and provide such information. In 1998, DuVo formulated a new strategy, which is composed of the following elements:

  • A dialogue with relevant stakeholders, aimed at establishing a common definition for the concept 'sustainable food chain'. In that process, measurable criteria can be developed to manage and monitor an improvement process;
  • Development of knowledge, aimed at providing factual information which can improve the content of the dialogue;
  • Open exchange of knowledge to enable as many parties as possible to share the insights which have been acquired.
Since 1999, DuVo has organised an annual Dialogue Meeting, bringing together a broad range of stakeholders to inspire one another and exchange ideas. Also since 1999, it has issued a yearly booklet reporting on its activities: "Sustainability in the Food Chain" (1999), "Beginning of a Dialogue" (2000), and "Sustainability in Perspective" (2001). An English translation of the summary has been made for each booklet.

DuVo realised that its initiatives might inspire others, and thus hopes to inform a wider international audience about its activities. For the same reason, DuVo decided to investigate whether similar initiatives exist elsewhere in the world. This report is the outcome of that investigation.
Gene networks from microarray data

Bas E. Dutilh and Paulien Hogeweg (1999), "Gene Networks from Microarray Data", report Binf.1999.11.01, Bioinformatics, Utrecht University: www.cmbi.ru.nl/~dutilh/genenets. Thesis site, PDF.

Since the development of the microarray technique in 1995, there has been an enormous increase in gene expression data from several organisms. Based on the view of gene systems as a logical network of nodes that influence each other's expression levels, scientists dream of being able to reconstruct the precise gene interaction network from the expression data obtained with this large scale arraying technique. Computer science shows that inference of a logical regulatory network is possible solely from sets of expression data, and mathematicians are working on the question of the minimum amount of data that is necessary for reverse engineering.

Meanwhile, experimental biologists are experiencing problems in the field. The number of experiments that are necessary before attempting network reconstruction is a lot more than is generally possible in "wet" laboratories, so data compression algorithms are applied to reduce the number of nodes considered. This is however an extremely coarse representation of the intricate interconnections that exist between single genes. The resulting network of only a handful of nodes is therefore usually only sufficient to describe the experiments performed, while any possible predicting properties are absent.

In this literature thesis, I attempt to give an update on the state of the art in computerised network reconstruction techniques, and explicitly relate this to actual biological gene networks. I will go into the model formalisms used to describe genetic networks, and explain their specific advantages and disadvantages. A separate chapter is dedicated to several experimental results obtained in the research of genetic networks, and finally, a short discussion and some hypothesising are added.

Evolution of viral strain structure through host immune response

Bas E. Dutilh and Paulien Hogeweg (1999), "Evolution of viral strain structure through host immune response", talk at TMBM'99, Amsterdam, The Netherlands. In: abstracts of TMBM'99, Amsterdam: 160-161. PDF.

Recently, it was shown that host immune responses can form a strong selective pressure on the antigenic strain structure of pathogen populations. In an ODE model described by Gupta et al. [1], the evolutionary dynamics of infective agents (each viral strain is defined as a specific combination of several alleles at a number of epitopes) can lead to discrete strain structures, called discordant sets. Discordant sets consist of viruses which have no antigens in common, and together fill up the complete antigen space with their genotypes. These sets of infective agents inhibit the spreading of related pathogens in a host population, by making the hosts resistant to all antigens in the world.
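The definition of a discordant set given above can be made concrete with a small check. This is a minimal sketch, assuming each strain is encoded as a tuple with one allele index per epitope and that every epitope has the same number of alleles; the names `shares_antigen` and `is_discordant_set` are illustrative and do not come from the abstract or from Gupta et al.

```python
from itertools import combinations

def shares_antigen(a, b):
    """True if two strains carry the same allele at any epitope."""
    return any(x == y for x, y in zip(a, b))

def is_discordant_set(strains, n_alleles):
    """Check the two properties of a discordant set described in the text.

    1. No two strains have an antigen in common.
    2. Together the strains use every allele at every epitope,
       i.e. they fill up the antigen space.
    """
    strains = list(strains)
    if not strains:
        return False
    no_overlap = all(
        not shares_antigen(a, b) for a, b in combinations(strains, 2)
    )
    n_epitopes = len(strains[0])
    covers_space = all(
        {s[i] for s in strains} == set(range(n_alleles))
        for i in range(n_epitopes)
    )
    return no_overlap and covers_space
```

With two epitopes and two alleles each, the four possible strains split into two discordant sets, {(0, 0), (1, 1)} and {(0, 1), (1, 0)}, whereas (0, 0) and (0, 1) overlap at the first epitope and so cannot belong to the same set.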

We further examine potential emergent strain structures due to host immune responses, in a spatially explicit model including a population of immunologically reactive hosts. The hosts, which are individually implemented in a cellular automata machine, can each carry their own virus, and are each resistant to a specific combination of antigens. Thus, we are able to study many different assumptions on immune systems in one simple model. In the present study, for example, a host that is infected with several viruses can collect resistances against all their antigens. Upon encountering a new infection, it mounts an immunity that is proportional to the number of the infectious agent's antigens to which it has already gathered immunity.
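The immunity rule described above can be sketched for a single host as follows. This is a minimal sketch with assumed details: antigens are encoded as (epitope, allele) pairs, and the proportionality constant is chosen so that immunity runs from 0 to 1. The function names `immunity` and `remember` are hypothetical, and the spatial cellular automaton itself is not modelled here.

```python
def antigens_of(virus):
    """Encode a virus (tuple of alleles) as a set of (epitope, allele) pairs."""
    return set(enumerate(virus))

def immunity(host_memory, virus):
    """Fraction of the virus's antigens the host is already resistant to.

    Assumption: proportionality constant 1 / (number of epitopes).
    """
    antigens = antigens_of(virus)
    return len(antigens & host_memory) / len(antigens)

def remember(host_memory, virus):
    """Return the host's memory after it collects this virus's antigens."""
    return host_memory | antigens_of(virus)

# A naive host is infected by strain (0, 1) and keeps its antigens.
memory = remember(set(), (0, 1))
```

After this infection, the host is fully immune to (0, 1), half immune to (0, 0), which shares one antigen with it, and not immune at all to the discordant strain (1, 0).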

Surprisingly, we discovered that spatial pattern formation in the cellular automata machine is necessary for the generation of discordant strain structure such as that found in the ODE model of Gupta et al. [1]. In the case of a mean field approximation (randomly reshuffling the cellular automata every timestep), agglomerative clustering techniques reveal strain structure in larger sets of viruses. There is clear selection for minimization of the encountered immunities, and for each virus this is optimized in a set with symmetric and minimal antigenic overlap. Though these conditions are satisfied in any collection of discordant sets (including the complete viral population), the cofluctuating sets never contain discordants. The observed strain structures allow for larger virus populations, causing more infections during the lifetime of a host than an equal number of viruses organized in discordant sets would.

We conclude that host immune response can structure viral populations into discrete sets, which cannot be invaded by new mutants. Moreover, we see that spatial pattern formation, which leads to discordant sets of viruses, protects the hosts and reduces the number of viruses that survive.

[1] S. Gupta, N. Ferguson & R. Anderson: "Chaos, Persistence, and Evolution of Strain Structure in Antigenically Diverse Infectious Agents", Science, 280: 912-915, 1998.