Genomics and regulatory networks of Baltic Sea bacterioplankton communities

The Baltic Sea suffers from environmental problems such as eutrophication and overfishing, which manifest, for example, in massive cyanobacterial blooms in summer and in oxygen-depleted bottom waters. In all marine environments, including the Baltic Sea, microbes are essential for the fluxes of nutrients and energy and are integral components of the food web. Despite their importance for the ecosystem, little is known about Baltic Sea microbial communities and how they are regulated. In collaboration with Jarone Pinhassi’s lab at Linnaeus University, we sample the surface water twice weekly at an offshore station east of Öland, the Linnaeus Microbial Observatory (LMO); the sampling started in 2011 and is still ongoing. Samples from this station form the basis for two of our main projects concerning the Baltic Sea.

Genomic reconstruction of dominant Baltic Sea bacterioplankton

Although the sea contains thousands of different microbial species, their abundance distributions are highly skewed, and a limited number of species make up most of the cells (20 species constitute >50% of the cells in surface water). Sequencing the genomes of the more abundant populations will be important for understanding not only their specific roles in the ecosystem but also the ecosystem as a whole. We are performing shotgun metagenomics on a large number of the seasonal LMO samples, and overlapping short sequence reads are assembled into longer genome fragments (contigs). However, because of repeated sequences within and across genomes, among other things, assembly yields fragmented genomes. A remaining challenge is then to group the contigs that stem from the same genome. Our program CONCOCT, which we recently developed with Christopher Quince and others (Alneberg & Bjarnason et al 2014), lets us group contigs into genomes with high accuracy, provided that we have data from multiple samples. With this method and our extensive dataset, we aim to present a catalogue of the most abundant microbial genomes of the Baltic Sea. As a proof of concept, we used a similar approach to reconstruct the genome of one of the most abundant bacteria in the Baltic Sea (Herlemann et al mBio 2013).
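To make the skew concrete, here is a minimal Python sketch that counts how many of the most abundant species are needed to exceed half of all cells. The cell counts are invented for illustration, and the function name `species_for_majority` is hypothetical, not part of any of our tools.

```python
def species_for_majority(counts, fraction=0.5):
    """Return how many of the most abundant species together exceed `fraction` of all cells."""
    total = sum(counts)
    cumulative = 0
    # Walk down the rank-abundance curve, from most to least abundant species.
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        cumulative += count
        if cumulative > fraction * total:
            return rank
    return len(counts)


# Hypothetical cell counts: a few dominant species followed by a long tail of rare ones.
counts = [400, 250, 120, 60, 30] + [1] * 140
print(species_for_majority(counts))  # 2
```

In this toy community of 145 species, the two most abundant ones already account for 650 of 1,000 cells.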

Reconstructing a regulatory network of Baltic Sea plankton assemblages

By sequencing PCR-amplified taxonomic marker genes, we can accurately measure the relative abundances of thousands of different bacterioplankton and microscopic eukaryotes in the seawater samples. For each species, we will obtain a temporal profile spanning the years that the sampling has been ongoing. In addition to the sequences, we will have environmental data on temperature, nutrient levels, etc. From these data, the aim is to deduce the primary control mechanisms of the major plankton groups, such as bloom-forming cyanobacteria. Preliminary data indicate that the different blooming events (the spring bloom of eukaryotic phytoplankton and the summer bloom of cyanobacteria) induce major state shifts in the bacterioplankton assemblages. Algorithms from systems biology will be used to reconstruct a regulatory network of the ecosystem, consisting of environment-species and species-species interactions. We also plan to use single-cell analysis as a complementary approach to deduce species-species interactions.
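The specific systems-biology algorithms are not named here, so purely as an illustration, the sketch below builds the simplest kind of association network: an edge between two time series (species or environmental variables) whenever the absolute value of their Spearman rank correlation reaches a threshold. The profiles, the 0.7 threshold and all function names are hypothetical. Pairwise correlation cannot separate direct from indirect interactions, which is precisely what more elaborate network-inference methods try to do.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_network(profiles, threshold=0.7):
    """Edges (a, b, rho) for every pair of profiles with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical time series over eight sampling dates.
profiles = {
    "temperature":   [2, 4, 8, 14, 18, 16, 10, 5],
    "cyanobacteria": [1, 1, 3, 20, 60, 40, 8, 2],
    "diatoms":       [5, 80, 30, 4, 2, 3, 6, 10],
}
for edge in correlation_network(profiles):
    print(edge)
```

In this toy data, cyanobacteria track temperature closely (positive edge), while the spring-blooming diatoms are negatively associated with both.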

Exploring microbial signatures for water quality assessment

Since microbial communities respond rapidly to changes in environmental conditions, in both species composition and gene expression, they have great potential as biomarkers of pollutants and other aspects of ecosystem health. DNA sequencing technology is developing extremely fast, and within a few years portable devices for sequencing directly in the field are likely to be available.

We are partners in the EU-funded project BLUEPRINT (Biological lenses using gene prints), which aims to use meta-omics to monitor the health status of Baltic Sea water. Based on metagenomic and/or metatranscriptomic data from hundreds of well-characterized samples, we will train algorithms to recognize different types of environmental conditions (such as early signs of oxygen depletion) from the frequencies of taxonomic marker genes and/or of genes coding for specific functions or metabolic pathways. We are responsible for the bioinformatics work package within this project. We are also participating in the EU-funded AFISmon project, which aims to develop an autonomous sampling device that collects and stores water samples for metagenomic and metatranscriptomic analysis.

Together with Stockholm Water, we are exploring the potential of metagenomic and taxonomic marker gene sequencing for tracking wastewater contamination in Stockholm's stormwater system. Such contamination can result from misconnections or leakages in the water systems and is highly problematic, since wastewater may then flow untreated into receiving water bodies. The goal is to find a faster and more precise alternative to the current method, which is based on culturing the indicator species E. coli on selective media.
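A minimal sketch of the classification idea in BLUEPRINT, assuming a nearest-centroid classifier as a deliberately simple stand-in for whatever algorithms the project ends up training: each training sample is a vector of marker-gene counts with a known condition label, counts are converted to relative frequencies, and a new sample receives the label of the closest class mean. The labels, counts and function names are invented for illustration.

```python
import math


def normalise(counts):
    """Convert raw gene counts to relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def train_centroids(samples):
    """samples: list of (label, counts). Returns label -> mean relative-frequency profile."""
    grouped = {}
    for label, counts in samples:
        grouped.setdefault(label, []).append(normalise(counts))
    return {
        label: [sum(column) / len(profiles) for column in zip(*profiles)]
        for label, profiles in grouped.items()
    }


def classify(centroids, counts):
    """Assign the label whose centroid is closest (Euclidean distance) to the sample."""
    profile = normalise(counts)
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Hypothetical counts of three marker genes in labelled training samples.
training = [
    ("normal", [70, 20, 10]),
    ("normal", [65, 25, 10]),
    ("low_oxygen", [20, 30, 50]),
    ("low_oxygen", [15, 25, 60]),
]
centroids = train_centroids(training)
print(classify(centroids, [30, 28, 42]))  # low_oxygen
```

Working on relative frequencies rather than raw counts keeps samples with different sequencing depths comparable.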

Next generation plankton monitoring

Unicellular plankton form the basis of marine food webs, but they also include harmful members such as fish-killing species and producers of toxins that accumulate in shellfish. Quantitative assessment of unicellular plankton is therefore an important component of environmental monitoring programs. So far, such monitoring has largely been based on microscopic counting of primarily phytoplankton and microzooplankton. This is a time-consuming process that requires highly skilled taxonomists, and pico- and nanoplankton, the most abundant planktonic fraction, often cannot be identified by this method. There is hence a need for a faster, more robust and more precise way of quantifying plankton species in environmental samples. Together with SMHI, we are conducting a project in which we evaluate and optimize methodology for quantifying planktonic taxa in marine water samples by high-throughput DNA sequencing of taxonomic marker genes. Samples collected at stations in the Baltic Sea, Kattegat and Skagerrak will be quantified both by microscope counting and by sequencing. The long-term goal is to incorporate the sequencing-based methodology into monitoring programs as a complement to traditional methods.
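One simple way to compare the two quantification methods on the same sample is sketched below, assuming each method yields an abundance per taxon: convert both to relative abundances and compute the Bray-Curtis dissimilarity between them (0 means identical profiles, 1 means no shared taxa). The numbers and function names are hypothetical. Read proportions are not cell proportions, since marker-gene copy numbers differ between taxa, which is one of the biases such an evaluation has to take into account.

```python
def relative(abundances):
    """Scale a taxon -> abundance mapping so the values sum to 1."""
    total = sum(abundances.values())
    return {taxon: value / total for taxon, value in abundances.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> relative-abundance mappings."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


# Hypothetical data for one sample: cells counted under the microscope
# versus marker-gene reads assigned to each taxon.
microscopy = {"Skeletonema": 600, "Dinophysis": 400}
sequencing = {"Skeletonema": 5000, "Dinophysis": 3000, "Micromonas": 2000}

print(round(bray_curtis(relative(microscopy), relative(sequencing)), 3))  # 0.2
```

In this invented example, part of the dissimilarity comes from the picoplankton genus Micromonas, which appears only in the sequencing data.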

Taxonomic binning of metagenomic contigs

Metagenomic sequencing can successfully produce genome fragments (contigs) from complex microbial communities through assembly, but since assembly does not reconstruct entire genomes, the contigs need to be binned, i.e. grouped by genome of origin. Traditionally, unsupervised metagenomic binning programs have used sequence composition, exploiting k-mer frequency biases between species, and/or coverage differences within a sample. Together with Christopher Quince and others, we developed the software CONCOCT, Clustering cONtigs on COverage and ComposiTion (Alneberg & Bjarnason et al 2014). CONCOCT combines sequence composition and coverage, the two sources of information commonly used in binning, but also extends how coverage is used: by using the pattern of coverage across multiple samples instead of a single sample, it obtains a stronger signal for clustering. Because it estimates the parameters of a Gaussian mixture model with Bayesian statistics, CONCOCT can automatically cluster contigs into genomes without the number of clusters having to be fixed beforehand.
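The sketch below illustrates, under simplifying assumptions, how composition and multi-sample coverage can be combined into one feature vector per contig: canonical tetranucleotide frequencies (a k-mer and its reverse complement counted together, with a pseudocount of 1) concatenated with the log-transformed mean coverage of the contig in each sample. It is loosely modeled on the idea behind CONCOCT, not its actual code; the real tool applies further normalization and dimensionality reduction before fitting its Gaussian mixture model, and all function names here are hypothetical.

```python
import math
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k=4):
    """Sorted canonical k-mers: each k-mer is merged with its reverse complement."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(kmer, reverse_complement(kmer)) for kmer in all_kmers})


def composition_profile(seq, k=4):
    """Canonical k-mer frequencies, with a pseudocount of 1 so no entry is zero."""
    kmers = canonical_kmers(k)
    counts = dict.fromkeys(kmers, 1)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            counts[min(kmer, reverse_complement(kmer))] += 1
    total = sum(counts.values())
    return [counts[kmer] / total for kmer in kmers]


def coverage_profile(mean_coverages, pseudocount=1.0):
    """Log-transformed mean coverage of one contig in each sample."""
    return [math.log(c + pseudocount) for c in mean_coverages]


def contig_features(seq, mean_coverages):
    """Coverage pattern across samples followed by sequence composition."""
    return coverage_profile(mean_coverages) + composition_profile(seq)


# Hypothetical contig covered in three samples.
features = contig_features("ACGTTGCAAGGCTTACGATCGA", [12.0, 0.5, 30.2])
print(len(features))  # 3 coverage values + 136 composition values = 139
```

With k = 4 there are 136 canonical k-mers; every additional sample adds one coverage dimension, which is where the multi-sample signal enters the clustering.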

Human Gut Microbiome

Our bodies house more bacterial cells than human cells, and most of these bacteria are located in the lower intestine. This microbiome helps the body extract energy from otherwise indigestible nutrient sources such as complex polysaccharides. The microbiome also interacts with the immune system, and associations between microbiota composition and various immune disorders have started to be unveiled. Together with Lars Engstrand's group at the Karolinska Institute and Maria Jenmalm's group at Linköping University, we use taxonomic marker gene sequencing to investigate how the human gut microbiome develops in infants and how it is affected by factors such as mode of delivery. By following the children over several years, we can relate microbiome diversity in infancy to health status later in life. We have shown that infants delivered by Caesarean section have lower gut bacterial diversity early in life (Jakobsson et al 2014), and that low diversity in infancy is associated with atopic eczema in infancy (Abrahamsson et al 2012) and with asthma development at seven years of age (Abrahamsson et al 2014).
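Gut microbiota diversity is often summarized with the Shannon index, H' = -sum(p_i * ln p_i), where p_i is the relative abundance of taxon i. The paragraph above does not specify which diversity measure was used, so this is only a minimal sketch of one common choice; the counts are invented.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i); taxa with zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


even = [25, 25, 25, 25]   # four taxa, equally abundant
skewed = [85, 5, 5, 5]    # the same four taxa, one dominant

print(round(shannon_diversity(even), 3))  # 1.386, i.e. ln(4)
print(shannon_diversity(skewed) < shannon_diversity(even))  # True
```

The index rises with both the number of taxa and the evenness of their abundances, so a community dominated by a few taxa scores low even if many taxa are present.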

Metagenomics-driven enzyme discovery

The great functional diversity of microbes (they can be found growing on virtually any substrate) opens up possibilities for discovering biotechnologically relevant enzymes in their genomes. Biomass from Swedish forests has great potential for second-generation biofuel production as well as for bio-refinery products. To ferment the biomass into e.g. ethanol, the lignocellulose first needs to be degraded to release the sugars. This can be achieved with cocktails of carbohydrate-active enzymes (CAZymes), but the efficiencies of existing cocktails are rather low, and they are not tailored to Swedish forest biomass. In collaboration with Henrik Aspeborg’s group at the Division of Industrial Biotechnology, KTH, and researchers at SLU, we have initiated a project aimed at finding and characterising novel CAZymes in the moose rumen microbiome. Since branches and shoots of trees make up a substantial fraction of the moose diet, the moose rumen likely harbors a microbiome capable of degrading lignocellulose from Swedish trees. We have found a large number of novel carbohydrate-active enzymes in the bacterial genomes. These will be further characterised in silico, and together with Aspeborg’s group, selected novel genes will be expressed in the lab and their specificities and modes of action investigated.

Primer Design

By sequencing PCR-amplified taxonomic marker genes, we can accurately profile the microbial diversity of many samples in parallel. However, to minimize biases it is important to use primers that bind as non-selectively as possible. We have developed the program DegePrime (Hugerth et al - 1), which, given a sequence alignment, finds for each position the primer that matches as many sequences as possible without exceeding a user-specified maximum degeneracy. We used it to develop the primer pair 341F-805R, which another group showed to be the best primer pair for bacterial diversity studies out of 512 pairs tested. We also used DegePrime to greatly improve the coverage of a popular primer pair for assessing bacterial and archaeal diversity (515F-805R) and to develop a new primer pair for 18S rRNA gene surveys (574F-1123R) (Hugerth et al - 2).
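The design objective described above, matching as many sequences as possible under a cap on degeneracy, can be illustrated with a simple greedy heuristic for a single alignment window. This is a hypothetical sketch and not DegePrime's actual algorithm: it starts from the most common oligonucleotide and repeatedly adds the one base, at one position, that most increases coverage, as long as the degeneracy (the product of the number of allowed bases per position) stays within the limit. The oligos are invented, gaps are ignored, and a full tool would repeat this for every position of the alignment.

```python
from itertools import product

# IUPAC code for each set of allowed nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degeneracy(positions):
    """Number of distinct oligos encoded: product of allowed-base counts per position."""
    result = 1
    for allowed in positions:
        result *= len(allowed)
    return result


def coverage(positions, oligos):
    """Number of target oligos matched exactly by the degenerate primer."""
    return sum(all(base in allowed for base, allowed in zip(oligo, positions)) for oligo in oligos)


def greedy_degenerate_primer(oligos, max_degeneracy):
    """Greedily widen the most common oligo while degeneracy <= max_degeneracy."""
    start = max(sorted(set(oligos)), key=oligos.count)  # sorted() makes ties deterministic
    positions = [{base} for base in start]
    while True:
        best, best_cov = None, coverage(positions, oligos)
        for i, base in product(range(len(positions)), "ACGT"):
            if base in positions[i]:
                continue
            candidate = [set(p) for p in positions]
            candidate[i].add(base)
            if degeneracy(candidate) > max_degeneracy:
                continue
            cov = coverage(candidate, oligos)
            if cov > best_cov:
                best, best_cov = candidate, cov
        if best is None:  # no allowed expansion improves coverage
            return "".join(IUPAC[frozenset(p)] for p in positions), best_cov
        positions = best


# Hypothetical oligos taken from one window of an alignment.
oligos = ["ACGTAC"] * 5 + ["ACGTGC"] * 3 + ["ATGTAC"] * 2 + ["GGGGGG"]
print(greedy_degenerate_primer(oligos, max_degeneracy=4))  # ('AYGTRC', 10)
```

With a maximum degeneracy of 4, the example yields the primer AYGTRC, which matches 10 of the 11 oligos; lowering the cap to 2 leaves only one degenerate position (ACGTRC, 8 matches), showing the trade-off between degeneracy and coverage.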

Check out our projects on GitHub, where we share our bioinformatic software and lab protocols.