CFB 2.0 - vision for the new funding period

Friday 15 Jan 21

Bernhard Palsson, CEO of The Novo Nordisk Foundation Center for Biosustainability, shares his thoughts on what the Center will focus on during the next five-year funding period.

In the fall of 2018, we had already started identifying key research areas that would change the future of our field. It became evident that big data was going to be one of the fundamental drivers of change in the life sciences. Biology is morphing into an information science at an accelerating rate, driven by high-powered computation and the generation and analysis of large datasets. This is the fundamental change that forms the basis of CFB 2.0.

Transitioning to more data-driven science will require a cultural change, because the education of life scientists has historically focused on a narrow range of topics, such as individual enzymes, specific cellular processes, and the resolution of specific genetic questions. We now need to view the whole genome as an integrated entity. A growing fraction of biology's future belongs to genome-scale science, which means taking a broad, functional overview of everything that happens in the cell at the same time. This integrative viewpoint will create synergies among the different branches of research that the CFB covers: systems biology, metabolic engineering, molecular biology, computational biology, chemical analytics, genomics, biochemistry, and bioprocessing.

Data has traditionally been viewed as project-specific: you design a strain, delete a gene, or formulate a hypothesis, and then generate your data accordingly. The data tends to be treated as personal, and investigators may claim ownership even if the work was funded by somebody else and is part of a larger effort. There has also been a culture that often lacked transparency regarding the methods used. Today, all of that is changing. Data science operates on the FAIR principles: data should be findable, accessible, interoperable, and reusable. We are now seeing databases become larger and more complex, and yet more and more useful.
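
As a rough sketch of what FAIR practice can look like in code, consider a minimal dataset record; the field names and the register_dataset helper are illustrative assumptions, not an existing CFB system:

```python
# A minimal sketch of FAIR-style dataset metadata. Field names and the
# register_dataset helper are illustrative, not an actual CFB API.

from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    dataset_id: str     # findable: a stable, unique identifier
    access_url: str     # accessible: where the data can be retrieved
    data_format: str    # interoperable: a standard, documented format
    license: str        # reusable: explicit terms of reuse
    keywords: list = field(default_factory=list)  # findable: search terms

catalogue: dict[str, DatasetRecord] = {}

def register_dataset(record: DatasetRecord) -> None:
    """Add a dataset to a shared catalogue so others can find and reuse it."""
    catalogue[record.dataset_id] = record

register_dataset(DatasetRecord(
    dataset_id="cfb-expr-0001",
    access_url="https://example.org/datasets/cfb-expr-0001",
    data_format="CSV (genes x samples, TPM)",
    license="CC-BY-4.0",
    keywords=["expression profiling", "E. coli"],
))
```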

As an example, if ten projects generate expression profiling data and all of it is put together in an accessible database, a data scientist can browse through it and find valuable information, since those ten project-specific datasets are now interoperable. The data will no longer only address the questions asked in the ten specific research papers, but will also bring forward completely new results. The whole is greater than the sum of the parts. Every week at the CFB, we find answers to questions that were not asked when the individual datasets assembled into a database were generated.
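
As a minimal sketch of that pooling step (the file names, column layout, and consistency threshold below are hypothetical assumptions, not an actual CFB pipeline):

```python
# Pooling project-specific expression tables lets us ask questions that no
# individual project posed. File names and columns are assumptions.

import pandas as pd

# Each project ships a tidy table with gene, condition, expression columns.
project_files = [f"project_{i}_expression.csv" for i in range(1, 11)]
pooled = pd.concat(
    (pd.read_csv(f).assign(project=f) for f in project_files),
    ignore_index=True,
)

# A cross-project question no single paper asked: which genes behave
# consistently across all ten projects? (The threshold is arbitrary.)
consistent = (
    pooled.groupby("gene")["expression"]
    .agg(["mean", "std", "count"])
    .query("count >= 10 and std < 0.5")
)
print(consistent.head())
```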

The concept of generating big datasets and integrating all the data in one place is a paradigm shift. For example, three years ago the Center decided to build a Danish Streptomyces strain collection. We then visualized all the locations where the soil bacteria were found on a map of Denmark. We will continue to add information to this strain collection. For instance, we will analyze it with the antiSMASH platform and obtain a geographical distribution of gene clusters. We plan to characterize these strains in many different ways and continue to build up a database containing disparate data types. Such a database will become a resource to query for many new applications.
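
As a small, hypothetical illustration of how such disparate data types might attach to the same strain identifiers (the strain IDs, coordinates, and cluster counts are invented; antiSMASH itself would be run separately and its output merged in):

```python
# Disparate data types accumulating around each strain in the collection.
# All values here are invented for illustration.

import pandas as pd

# Isolation metadata: where each strain was collected.
strains = pd.DataFrame([
    {"strain": "DK-001", "lat": 55.68, "lon": 12.57, "region": "Zealand"},
    {"strain": "DK-002", "lat": 56.16, "lon": 10.20, "region": "Jutland"},
])

# Gene-cluster annotations derived from antiSMASH output, joined on strain ID.
bgc = pd.DataFrame([
    {"strain": "DK-001", "bgc_type": "NRPS", "count": 4},
    {"strain": "DK-002", "bgc_type": "PKS", "count": 2},
])

collection = strains.merge(bgc, on="strain", how="left")
# Geographical distribution of gene-cluster types, as described above.
print(collection.groupby(["region", "bgc_type"])["count"].sum())
```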

As a new application area for the Center, we have established the microbial foods program, where we will take a similar approach. The people working in this program are collecting an initial set of 100-150 Lactobacillus strains, all of which will have their genomes sequenced. This will produce an initial sequencing database for Lactobacillus. Then we will add more data types to the database. There is surprisingly little published data on the genomics of these organisms in Denmark, even though companies like Chr. Hansen would most likely find it both interesting and important to have access to such a database.
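
One possible shape for such a growing database, sketched here with SQLite purely for illustration (the table and column names are assumptions, not an existing CFB schema):

```python
# A sketch of the initial sequencing database: new data types attach to the
# same strain identifier over time. Schema and values are assumptions.

import sqlite3

con = sqlite3.connect("lactobacillus.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS strain (
    strain_id TEXT PRIMARY KEY,
    species   TEXT,
    source    TEXT              -- e.g. dairy, plant, or gut isolate
);
CREATE TABLE IF NOT EXISTS genome (
    strain_id TEXT REFERENCES strain(strain_id),
    assembly_accession TEXT,
    sequenced_on TEXT
);
-- Later data types (phenotypes, fermentation profiles, ...) become new
-- tables keyed on the same strain_id.
""")
con.execute("INSERT OR IGNORE INTO strain VALUES (?, ?, ?)",
            ("LB-0001", "Lactobacillus sp.", "plant isolate"))
con.commit()
```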

When it comes to sustainable chemicals, the Center has mostly focused on E. coli, for which we have dense and growing databases; some of the data analytics methods we use at the CFB were developed on those data. Thus, the data analytics methods already exist, and we will deploy them to study the data coming from the other species that the CFB will be working with. One of the issues everyone struggles with is the large number of genes with unknown functions found in freshly sequenced genomes. We would like to know all the enzymes encoded in a given strain, all the transporters (which determine what comes in and what goes out of the cell), and all the transcriptional regulators that control these processes. Ultimately, we would like a broad knowledge base describing all these important factors for all the organisms we work with. The IT team will develop knowledge graphs so that we can trace relationships between the different pieces of knowledge available for our target organisms.
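
A minimal sketch of what such a graph could look like, using networkx and a few illustrative E. coli tryptophan-utilization entities (a production system would sit on a dedicated graph database):

```python
# A toy knowledge graph with typed edges linking genes, enzymes,
# transporters, and regulators. Entities are illustrative examples.

import networkx as nx

kg = nx.MultiDiGraph()
kg.add_edge("gene:tnaA", "enzyme:TnaA", relation="encodes")
kg.add_edge("enzyme:TnaA", "reaction:tryptophanase", relation="catalyzes")
kg.add_edge("transporter:TnaB", "metabolite:L-tryptophan", relation="imports")
kg.add_edge("regulator:CRP", "gene:tnaA", relation="activates")

# Trace relationships: everything one hop away from a gene of interest.
for _, target, data in kg.out_edges("gene:tnaA", data=True):
    print(f"gene:tnaA -[{data['relation']}]-> {target}")
for source, _, data in kg.in_edges("gene:tnaA", data=True):
    print(f"{source} -[{data['relation']}]-> gene:tnaA")
```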

These are just a few examples of the basic paradigm of generating big datasets and integrating all the data in one place. In five years' time, as we contemplate large-scale synthetic biology, we might see the de novo design of genomes. I think it is becoming clear to most leading technical universities that they need to prepare for the formation of new Departments of Genome Engineering, and I project that by 2030, such departments will exist at the leading engineering schools.
