View: The biological world
Life support/medical technology

Working Group: Bioinformatics


Description of the group:

"Bioinformatics is a multidisciplinary research area at the interface between molecular biology and informatics, with contributions from areas like mathematics, statistics, physics, chemistry and medicine. The area is essential to modern molecular biology, and in particular to the relatively new field of systems biology where large, complex biological systems are mapped and simulated. The working group is looking into how bioinformatics may affect future research, in particular in the area of medical diagnosis and treatment. It is looking at what demands this may put on future information systems, including databases, data mining and artificial intelligence. It is also trying to identify the supporting technologies and methods that have to be available to realise future potential, for example from mathematics and statistics."


Position paper
 

Bioinformatics towards 2020

 

Finn Drabløs1, Mikael Hammer2, Astrid Lægreid3, Jon Olav Hauglid4, Bjørn K. Alsberg5

 

1Department of Cancer Research and Molecular Medicine, NTNU, N-7489 Trondheim, Norway. Email finn.drablos@medisin.ntnu.no. 2Department of Electronics and Telecommunications, NTNU, N-7491 Trondheim, Norway. Email mikael.hammer@tele.ntnu.no. 3Department of Cancer Research and Molecular Medicine, NTNU, N-7489 Trondheim, Norway. Email astrid.lagreid@medisin.ntnu.no. 4Department of Computer and Information Science, NTNU, N-7491 Trondheim, Norway. Email jon.olav.hauglid@idi.ntnu.no. 5Department of Chemistry, NTNU, N-7491 Trondheim, Norway. Email bjorn.alsberg@chem.ntnu.no.

 

Abstract. Future developments in bioinformatics and related research areas of relevance to informatics are discussed. The main trend is large-scale integration of data from a variety of sources, with focus on simulation and analysis of complete complex systems (“systems biology”) rather than individual parts. This requires a close integration of complementary techniques, including informatics. It is also important to have specialists in e.g. informatics, mathematics, statistics, physics, chemistry and biology with good interdisciplinary communication skills, working in multidisciplinary environments. Research and development associated with health surveys and biobanks is presented as a multidisciplinary area of particular relevance to NTNU.

 

1 Introduction

Bioinformatics is a multidisciplinary research area at the interface between informatics and biology, with additional input from e.g. statistics and mathematics. Traditionally it has focused on molecular biology, in particular sequence analysis. This includes in silico gene finding, sequence alignment, structure prediction etc. In parallel to bioinformatics several related and partially overlapping research areas have evolved. Computational biology focuses more on computations and simulations associated with biological macromolecules and biological systems, including simulation of cellular processes. Mathematical biology analyses e.g. population dynamics in complex ecological systems, whereas physical biology studies physical processes and phenomena found in biological systems. Nanobiotechnology combines nanomaterials and nanotechniques with biotechnology and biological molecules, using the best from both worlds, whereas nanomedicine uses nanotechnology for medical purposes. Medical informatics handles and retrieves medical data, e.g. electronic medical records or medical publications. Medical statistics and statistical genetics use statistical methods to analyse medical data, looking e.g. for correlation between genetic variation and susceptibility to specific diseases.

However, there is an increasing need to look beyond these traditional (and artificial) boundaries, and gradually these partly separate fields are interacting, merging and exchanging ideas, methods and approaches [1]. In particular there is a strong trend towards looking at large, complex biological systems as a whole, rather than as a collection of specialised sub-topics. This approach is often called “systems biology” [2-4], and represents integration of informatics, biology, physics, mathematics and statistics. It also integrates several technological disciplines, as a prerequisite for getting the necessary experimental data.

We believe that this tendency will continue, with increasingly close integration of relevant disciplines in order to analyse and understand even more complex biological systems. The disciplines will probably approach a symbiotic relationship, where there is a beneficial mutual dependency between complementary disciplines. Even though large biological data sets are collected at all levels of bioinformatical research, much non-genetic information is still missing, and this must be collected and included in the bioinformatical modelling in the future in order to arrive at new biological insights. However, most needs and requirements, e.g. with respect to informatics, will be defined by the biological and medical problems we are trying to solve.

 

2 Major trends

Within the main trend towards closer integration of sub-disciplines, several sub-trends can be identified.

 

Automation / robotisation

A high degree of automation, including an extensive use of robots for sample preparation and handling, has been a prerequisite for genomic research, in particular in areas like genome sequencing and microarray analysis. This process will continue. In the future most routine tasks in the laboratory will be handled by robots. It is also very likely that robots and computers will be involved in designing and controlling experiments as well, in the sense that they run and optimise experiments that are designed (by the computer) to answer specific questions. Proof-of-principle design of robot-controlled experiments has been published [5].

Nanoscale measurement techniques

Living tissue is a very heterogeneous system. In order to really understand such systems it is necessary to do measurements on a very small scale, e.g. on individual cells, or even sub-cellular compartments. The only way to achieve this in relatively intact cells is by nanoscale measurement devices. This will give very detailed information, and will be an important research area. Advanced simulations are needed to develop these measurement techniques. Such methods can produce very large numbers of measurements, comparable in principle to an online, continuous microarray experiment. Advanced methods are therefore also needed for data processing and analysis.

A related area is sample analysis using microfluidic devices ("lab-on-a-chip"), where extremely small sample volumes may be needed. This means that very large amounts of experimental data can be generated from small samples, both in research and in e.g. medical diagnosis.

 

Artificial intelligence

Research is not only experiments and data analysis; it is also a creative process leading to novel questions and inventive experiments to answer these questions. It is difficult to predict to what extent artificial intelligence, computers and robots will take over the creative process. It is certainly true that computers are better at handling large data sets and complex relationships, and can find novel patterns in such data. Intelligent computers will therefore be increasingly important for stimulating the creative process in the human researcher. In particular artificial intelligence with true creativity would be a very useful tool, as it would be able to analyse more complex data sets than the human brain. However, improved methods for data analysis, presentation and visualisation can make very complex data accessible also to human interpretation. Data mining, i.e. sorting through data to identify patterns and establish relationships, is already important, and it will be essential to develop improved and novel methods.

 

Data integration

Advanced data analysis as described above has to be based on data from many different sources. For systems biology efficient integration of very different types of data is therefore a prerequisite. Such data may be sequence data for genes and proteins, measurement data from various types of instruments, multi-category answers from health survey questionnaires, description of symptoms and various tests from medical records, observations from publications, image data from different types of imaging experiments etc. These data need to be stored and accessed efficiently, and tools are needed to compare, classify and run statistical tests. This will be a challenging task, where standards and ontologies are needed. However, it is also important to be able to include old data, e.g. from publications. These data are at present not standardised or annotated in accordance with established ontologies. Therefore advanced data processing will be needed in combination with construction (and continuous re-construction) of ontologies. Data modelling, in order to get a good understanding of complex data structures, can be useful in this context.
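
As a minimal sketch of what ontology-based integration could look like in practice, the Python example below translates records from two sources into observations annotated with shared terms. The source names, field names and ontology terms are hypothetical and chosen only for illustration; a real system would use an established ontology and handle units, provenance and conflicting values.

```python
from dataclasses import dataclass

# Hypothetical mapping from source-specific field names to shared ontology terms.
FIELD_TO_TERM = {
    "survey": {"bp_sys": "systolic_blood_pressure", "wt": "body_weight"},
    "journal": {"SystolicBP": "systolic_blood_pressure", "Weight_kg": "body_weight"},
}


@dataclass(frozen=True)
class Observation:
    subject_id: str
    term: str      # shared ontology term
    value: float
    source: str    # which data source the value came from


def normalise(source, subject_id, record):
    """Translate one source record into ontology-annotated observations."""
    mapping = FIELD_TO_TERM[source]
    observations = []
    for field, value in record.items():
        term = mapping.get(field)
        if term is None:
            continue  # unmapped fields are skipped here; a real system would log them
        observations.append(Observation(subject_id, term, float(value), source))
    return observations


def integrate(batches):
    """Group observations from all sources by (subject, ontology term)."""
    merged = {}
    for source, subject_id, record in batches:
        for obs in normalise(source, subject_id, record):
            merged.setdefault((obs.subject_id, obs.term), []).append(obs)
    return merged


merged = integrate([
    ("survey", "P001", {"bp_sys": 132, "wt": 81.5, "q17": 3}),
    ("journal", "P001", {"SystolicBP": 128}),
])
for (subject, term), observations in sorted(merged.items()):
    print(subject, term, [(o.source, o.value) for o in observations])
```

Once both sources use the same terms, values for the same subject and property can be compared directly, which is the step that unannotated legacy data cannot support without further processing.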

 

Multi-level simulations and computation

Biological systems are complex at several levels. In principle we want to simulate systems by starting at a quantum-mechanical level and going all the way through to full ecosystems. This requires simulation at different levels of theory. Full simulation of e.g. enzymatic processes requires a quantum-mechanical treatment, whereas e.g. transport processes and mechanical properties of cellular structures need various levels of approximation and abstraction. Ideally we want to treat the different elements of such complex systems at the level of theory that actually is sufficient to reproduce experimental data. However, even this can be extremely challenging.

There are also other types of computations that are relevant in this context. One example is phylogenetic estimates, where one tries to model previous evolution based on current-day data. Another example is medical statistics. New experiment types, larger population studies and research on more complex, multifactorial diseases mean new challenges for medical statistics. In general, statistical methods will be essential for most future research in systems biology.
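
To make the role of medical statistics concrete, the sketch below shows one of the simplest analyses in statistical genetics: an allelic case-control test for a single genetic variant, using the odds ratio and Pearson's chi-square statistic for a 2x2 table. The function name and the allele counts are invented for illustration, and the test is a textbook method rather than one prescribed here; multifactorial diseases need considerably more elaborate models.

```python
import math


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio, chi-square statistic and p-value for a 2x2 allele count table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    if min(a + b, c + d, a + c, b + d) == 0 or b * c == 0:
        raise ValueError("table has an empty margin or an undefined odds ratio")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical counts: alternative/reference alleles in cases and in controls.
odds_ratio, chi2, p_value = allele_association(60, 140, 40, 160)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

When many variants are tested in the same study, the resulting p-values need correction for multiple testing, which is one reason large population studies are statistically demanding.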

 

Visualisation

Visualisation is already important in bioinformatics, in particular in the area of structural biology where advanced visualisation of macromolecular structures is essential. This is reflected in the price of many commercial tools for protein structure modelling and visualisation, where the price tag may be more than USD 15,000 even for a single-computer installation. A detailed understanding of individual 3D structural properties may be essential in e.g. a drug design project, and virtual reality tools may therefore be used in such projects. However, such visualisation will become increasingly important for other types of data as well. Examples are visualisation of cellular 3D structures based on stacked slices from imaging experiments, flexible visualisation techniques for analysing very high-dimensional data, or techniques for highlighting correlations in complex, multi-property data sets. Here on-line, real-time manipulation of the visualisation, even for very large and complex data structures, will be essential.

 

3 Grand challenges

Since bioinformatics by definition is intertwined with biology and medicine, it is important to look at some major challenges in biology and medicine that most likely will be important focus areas for the coming decade(s) [6]. They represent biological questions, but with important methodological challenges where informatics, mathematics, statistics, physics, chemistry etc. become essential.

  • Comprehensively identify the structural and functional components encoded in the human genome. Elucidate the organisation of genetic networks and protein pathways and establish how they contribute to cellular and organismal phenotypes.
    – This requires efficient methods for identifying DNA/protein interactions, mapping of regulatory networks, identification of alternative gene products, mapping of protein products, characterisation of DNA folding and 3D structure etc.
  • Develop a detailed understanding of the heritable variation in the human genome. Understand evolutionary variation across species and the mechanisms underlying it.
    – This requires efficient methods for characterisation of SNPs (single-nucleotide polymorphisms) and possibly very high-speed genome sequencing [7].
  • Develop robust strategies for identifying the genetic contributions to disease and drug response. Develop genome-based approaches to prediction of disease susceptibility and drug response, early detection of illness, and molecular taxonomy of disease states.
    – This requires improved approaches towards health surveys and biobanks, and improved methods in medical statistics and statistical genetics for handling such experiments.
  • Use new understanding of genes and pathways to develop powerful new therapeutic approaches to disease.
    – This requires advanced modelling of regulatory and metabolic pathways, experimental and theoretical structure prediction, improved in silico drug design techniques and better strategies for drug synthesis [8]. This will be an important aspect of personalised medicine, where treatment is optimised towards the genetic and metabolic profile of each individual [9].

Based on these and similar problems, some immediate challenges for informatics and computational biology have been formulated [6].

  • New approaches to problem solving, in particular from data on physical properties (e.g. protein folding, protein-protein interactions, genetic variation).
  • Reusable software modules.
  • Robust methods for elucidating the effects of non-genetic factors on health and disease.
  • New ontologies for description of complex data types.
  • Improved database technologies and knowledge management systems, in particular for integration of many different data types across disciplines.

4 Training

What kind of training do students need in order to meet these challenges? Bioinformatics is a multi-disciplinary field, and two different strategies can be envisioned. In the “traditional” approach we train students to become experts in key areas like mathematics, informatics, statistics etc. In the “futuristic” approach we train students to become multi-disciplinary experts, so that e.g. bioinformatics students have a relatively strong background in both informatics and biology. This is the ideal situation. However, the “futuristic” approach is probably unrealistic. With the level of knowledge needed to become an expert, most students cannot realistically become true experts in more than one area, and to become competitive we need highly qualified students in key areas. This means that we probably have to use the “traditional” approach as a basis.

On the other hand, it is essential that these students learn multi-disciplinary communication, e.g. an informatics student must learn how to discuss a biological problem with a biologist and see how it can be solved using informatics tools. So the “traditional” approach has to evolve into a more “forward-looking” approach with more focus on multi-disciplinary collaboration. This means that some basic training in relevant areas is needed, e.g. molecular biology for informatics students, or informatics and data analysis for biotechnology students. The basis for this must be established using relevant cross-disciplinary courses, like “Molecular biology for technologists” (an existing course at NTNU). However, the most efficient training process will most likely take place in truly multi-disciplinary research groups, collaborating on actual research projects. Giving students the opportunity to do projects and thesis work in such research groups will actively stimulate cross-disciplinary communication. This means that such research groups have to be available, and they must have the necessary resources to take care of these students. However, the benefit will be large. Such research groups will have a very good potential for really novel research, and the students get excellent training. This will be a win-win situation for all parties, where multi-disciplinary collaboration becomes a symbiotic and creative process of mutual benefit.

 

5 Implementing the future – Health surveys and biobanks

If we want to turn these visions into reality it may be useful to identify multidisciplinary research areas that are able to integrate a large number of activities, and where research data from these areas will be of national and international importance. One such area that is of particular relevance to NTNU is health surveys and biobanks [10].

There are two major types of biobanks. Population-based biobanks are normally associated with corresponding health surveys. A specific population (e.g. the full population of a selected region) is invited to participate. The participants are normally examined (e.g. height, weight, blood pressure, bone mass etc.), they fill in questionnaires about e.g. life style, and a blood sample may be collected and stored for future use. Based on these data as well as links to "end-point" registers (cancer, stroke etc.) and analysis performed on the biological material, novel knowledge can be generated about important diseases like asthma, cancer and diabetes. In particular for complex, multi-factorial diseases such large population-based surveys are most likely a prerequisite for new knowledge.

Patient-based biobanks have a slightly different focus. These are based on biological material collected from patients, normally as part of the diagnostic process at the hospital. This material is interesting in its own right, as it can be used for better understanding of specific diseases, in particular at a genetic and molecular level. However, it becomes even more interesting when it can be linked to a population-based survey, where it can be used to confirm and elaborate data generated from the population-based material. This may include e.g. looking for genetic differences between cancer subtypes that show subtle differences in the population-based material. Such differences may have significant implications for successful diagnosis and treatment.

NTNU and St. Olav, the University Hospital in Trondheim, are responsible for two very important biobanks, both supported by the national FUGE (functional genomics) project. The HUNT health survey and biobank is an almost complete screening of the population in the county of Nord-Trøndelag. It has been carried out twice: in HUNT 1 (1984-86) only health survey data were collected, whereas in HUNT 2 (1995-97) blood samples were collected as well. A new survey, HUNT 3, is planned for 2006-08. HUNT is unusual on an international scale because a stable population is followed over a relatively long time. It is also important that it can be linked to the second large biobank in the region, the patient-based biobank at the University Hospital. This makes it possible to do very thorough investigations based on this material.

However, full utilisation of this potential requires a co-ordinated effort from many different disciplines.

  • Robotics and low-temperature storage technologies. Biological material in biobanks is often stored in liquid nitrogen for increased stability. The storage system should include automatic computer-controlled sample retrieval for analysis. Robotics is also important for sample handling e.g. between different instruments for sample preparation and analysis.
  • Database solutions, data integrity and security. All data, including health survey data and data from analysis of biological material, have to be stored in a database, and FUGE has already allocated some funding to a pre-project on databases for biobanks. Such data have to be verified with respect to integrity (that data and relationships are correct), and they have to be stored according to requirements for such data (access control and encryption). Data integrity has to include data from e.g. laboratory information management systems (LIMS), like batch numbers for chemicals used during analysis. This is essential for tracking of data affected by experimental errors.
  • Analytical methods. The novelty of the research will at least partly depend upon which analytical methods are available. Methods that can give new information on e.g. the regulatory state of cells in a biological sample will lead to unique results. Nanotechnology and advanced electronics will probably be important for the development of new analytical methods.
  • Electronic patient journals (EPJ). Easy access to patient data will make it much easier to link patient status (e.g. diagnosis) to health survey data. The patient status can probably be given a more realistic description compared to a typical end-point diagnosis, in particular with respect to how it develops over time. NTNU has a national centre for research on EPJ systems.
  • Real-time wireless data collection. Future health surveys may follow a subset of the population even closer than today, using a wireless system to monitor health status continuously for extended time periods. This will give more realistic data on health status. It may also be relevant to use similar systems to monitor environmental effects, like exposure to pollution or noise.
  • Statistics. The statistical analysis is a key step in any research project based on health surveys and biobank material. In particular for the analysis of complex multifactorial diseases improved approaches are needed.
  • Bioinformatics and systems biology. In order to actually explain how e.g. a given genetic difference may lead to increased stroke risk, and hopefully how that risk can be reduced, bioinformatics and systems biology will be essential. Statistical analysis of the health survey and biobank material may identify the crucial differences that are associated e.g. with an increased risk, but bioinformatics and systems biology are needed in order to understand what this actually means on a molecular and cellular level.
  • Intellectual property rights, technology transfer and commercialisation. Such projects will most likely lead to novel ideas for better diagnostic methods or treatment, including diagnostic kits, drugs etc. The technology transfer office (TTO) at NTNU will be an important partner in commercialisation based on results from biobank research.
  • Ethics. Health-related research can easily lead to ethical questions that have to be handled properly as part of the project. This is well established in medical research, in particular through regional ethical committees. However, this is an important research area in itself, where NTNU is active.
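
The database and data integrity requirements in the list above can be illustrated with a small Python sketch using the standard `sqlite3` module. It stores the reagent batch number from a LIMS alongside each analysis result, so that every result touched by a faulty batch can be traced back to the sample and participant. The table layout, identifiers and batch numbers are hypothetical, and access control and encryption are outside the scope of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # reject analyses of unknown samples
conn.executescript("""
CREATE TABLE sample (
    sample_id      TEXT PRIMARY KEY,
    participant_id TEXT NOT NULL
);
CREATE TABLE analysis (
    analysis_id   INTEGER PRIMARY KEY,
    sample_id     TEXT NOT NULL REFERENCES sample(sample_id),
    reagent_batch TEXT NOT NULL,   -- batch number recorded by the LIMS
    result        REAL NOT NULL
);
""")

# Hypothetical example data: two samples, three analyses, two reagent batches.
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [("S1", "P001"), ("S2", "P002")])
conn.executemany("INSERT INTO analysis VALUES (?, ?, ?, ?)",
                 [(1, "S1", "B-17", 4.2),
                  (2, "S1", "B-18", 4.0),
                  (3, "S2", "B-17", 5.1)])
conn.commit()


def affected_results(connection, batch):
    """Return (participant, sample, analysis) for every result using a batch."""
    rows = connection.execute(
        """
        SELECT s.participant_id, a.sample_id, a.analysis_id
        FROM analysis AS a
        JOIN sample AS s ON s.sample_id = a.sample_id
        WHERE a.reagent_batch = ?
        ORDER BY a.analysis_id
        """,
        (batch,),
    )
    return rows.fetchall()


affected = affected_results(conn, "B-17")
print(affected)
```

The foreign key keeps relationships between samples and analyses consistent, and the parameterised query avoids building SQL from untrusted input; both are small instances of the integrity requirements described above.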

It is clear that biobanks and health surveys have a considerable potential as an integrated multidisciplinary research project that can serve as a platform for active collaboration between different disciplines. There may be other areas with similar potential. However, the biobank and health survey area has the advantage of being a well-established ongoing activity with considerable potential for expansion and novel initiatives. NTNU is currently doing a foresight analysis on biobanks.


Acknowledgements

We want to thank the programme committee for Programme for Bioinformatics at NTNU for valuable input. In addition to F.D., A.L. and B.K.A. the committee members are Kjell Bratbergsengen, Arne Halaas, Atle Bones, Catharina Davies and Mette Langaas from NTNU, Per Magnus from the Norwegian Institute of Public Health and Erik Must from Fondsfinans ASA. In particular we want to thank Mette Langaas for her contributions to this paper. We also want to thank the NTNU biobank foresight group for important input.

References

[1]     V. Maojo, C.A. Kulikowski, Bioinformatics and medical informatics: collaborations on the road to genomic medicine?, J Am Med Inform Assoc 10 (2003) 515-522.

[2]     L. Hood, Systems biology: integrating technology, biology, and computation, Mech Ageing Dev 124 (2003) 9-16.

[3]     L. Hood, D. Galas, The digital code of DNA, Nature 421 (2003) 444-448.

[4]     E. Werner, In silico multicellular systems biology and minimal genomes, Drug Discov Today 8 (2003) 1121-1127.

[5]     R.D. King, K.E. Whelan, F.M. Jones, P.G. Reiser, C.H. Bryant, S.H. Muggleton, D.B. Kell, S.G. Oliver, Functional genomic hypothesis generation and experimentation by a robot scientist, Nature 427 (2004) 247-252.

[6]     F.S. Collins, E.D. Green, A.E. Guttmacher, M.S. Guyer, A vision for the future of genomics research, Nature 422 (2003) 835-847.

[7]     J. Kling, Ultrafast DNA sequencing, Nat Biotechnol 21 (2003) 1425-1427.

[8]     E. Davidov, J. Holland, E. Marple, S. Naylor, Advancing drug discovery through systems biology, Drug Discov Today 8 (2003) 175-183.

[9]     R. Molidor, A. Sturn, M. Maurer, Z. Trajanoski, New trends in bioinformatics: from genome sequence to personalized medicine, Exp Gerontol 38 (2003) 1031-1036.

[10]    J. Kaiser, Biobanks. Population databases boom, from Iceland to the U.S., Science 298 (2002) 1158-1161.


 



Members of the working group:
Associate Professor Finn Drabløs, Department of Cancer Research and Molecular Medicine, MTFS
finn.drablos@medisin.ntnu.no
Professor Astrid Lægreid, Department of Cancer Research and Molecular Medicine, DMF
astrid.lagreid@medisin.ntnu.no
Professor Bjørn Alsberg, Department of Chemistry, NT
bjorn.alsberg@chem.ntnu.no
University lecturer Jon Olav Hauglid, Department of Computer and Information Science, IME
jon.olav.hauglid@idi.ntnu.no
Post Doc Mikael Hammer, Department of Electronics and Telecommunications
mikael.hammer@tele.ntnu.no