Description of the group: "Bioinformatics is a multidisciplinary research area in the interface between molecular biology and informatics, with contributions from areas like mathematics, statistics, physics, chemistry and medicine. The area is essential to modern molecular biology, and in particular for the relatively new field of systems biology where large, complex biological systems are mapped and simulated. The working group is looking into how bioinformatics may affect future research, in particular in the area of medical diagnosis and treatment. It is looking at what demands this may put on future information systems, including databases, data mining and artificial intelligence. It is also trying to identify the supporting technologies and methods that have to be available to realise future potentials, for example from mathematics and statistics."
Bioinformatics towards 2020
Finn Drabløs¹, Mikael Hammer², Astrid Lægreid³, Jon Olav Hauglid⁴, Bjørn K. Alsberg⁵
¹Department of Cancer Research and Molecular Medicine, NTNU, N-7489 Trondheim, Norway. Email finn.drablos@medisin.ntnu.no.
²Department of Electronics and Telecommunications, NTNU, N-7491 Trondheim, Norway. Email mikael.hammer@tele.ntnu.no.
³Department of Cancer Research and Molecular Medicine, NTNU, N-7489 Trondheim, Norway. Email astrid.lagreid@medisin.ntnu.no.
⁴Department of Computer and Information Science, NTNU, N-7491 Trondheim, Norway. Email jon.olav.hauglid@idi.ntnu.no.
⁵Department of Chemistry, NTNU, N-7491 Trondheim, Norway. Email bjorn.alsberg@chem.ntnu.no.
Abstract. Future developments in bioinformatics and related research areas of relevance to informatics are discussed. The main trend is large-scale integration of data from a variety of sources, with a focus on simulation and analysis of complete complex systems (“systems biology”) rather than individual parts. This requires close integration of complementary techniques, including informatics. It is also important to have specialists in e.g. informatics, mathematics, statistics, physics, chemistry and biology with good interdisciplinary communication skills, working in multidisciplinary environments. Research and development associated with health surveys and biobanks is presented as a multidisciplinary area of particular relevance to NTNU.
1 Introduction
Bioinformatics is a multidisciplinary research area at the interface between informatics and biology, with additional input from e.g. statistics and mathematics. Traditionally it has focused on molecular biology, in particular sequence analysis. This includes in silico gene finding, sequence alignment, structure prediction etc. In parallel to bioinformatics, several related and partially overlapping research areas have evolved. Computational biology focuses more on computations and simulations associated with biological macromolecules and biological systems, including simulation of cellular processes. Mathematical biology analyses e.g. population dynamics in complex ecological systems, whereas physical biology studies physical processes and phenomena found in biological systems. Nanobiotechnology combines nanomaterials and nanotechniques with biotechnology and biological molecules, using the best from both worlds, whereas nanomedicine uses nanotechnology for medical purposes. Medical informatics handles and retrieves medical data, e.g. electronic medical records or medical publications. Medical statistics and statistical genetics use statistical methods to analyse medical data, looking e.g. for correlation between genetic variation and susceptibility to specific diseases.
However, there is an increasing need to look beyond these traditional (and artificial) boundaries, and these partly separate fields are gradually interacting, merging and exchanging ideas, methods and approaches [1]. In particular, there is a strong trend towards looking at large, complex biological systems as a whole, rather than as a collection of specialised sub-topics. This approach is often called “systems biology” [2-4], and represents an integration of informatics, biology, physics, mathematics and statistics. It also integrates several technological disciplines, as a prerequisite for obtaining the necessary experimental data.
We believe that this tendency will continue, with increasingly close integration of relevant disciplines in order to analyse and understand even more complex biological systems. The disciplines will probably approach a symbiotic relationship, where there is a beneficial mutual dependency between complementary disciplines. Even though large biological data sets are collected at all levels of bioinformatical research, much non-genetic information is still missing, and this must be collected and included in the bioinformatical modelling in the future in order to arrive at new biological insights. However, most needs and requirements, e.g. with respect to informatics, will be defined by the biological and medical problems we are trying to solve.
2 Major trends
Within the main trend towards closer integration of sub-disciplines, several sub-trends can be identified.
Automation / robotisation
A high degree of automation, including extensive use of robots for sample preparation and handling, has been a prerequisite for genomic research, in particular in areas like genome sequencing and microarray analysis. This process will continue, and in the future most routine laboratory tasks will be handled by robots. It is also very likely that robots and computers will be involved in designing and controlling experiments, in the sense that they run and optimise experiments that the computer itself has designed to answer specific questions. A proof-of-principle study of robot-designed and robot-controlled experiments has been published [5].
Nanoscale measurement techniques
Living tissue is a very heterogeneous system. To really understand such systems it is necessary to make measurements on a very small scale, e.g. on individual cells or even sub-cellular compartments. In relatively intact cells this can only be achieved with nanoscale measurement devices. This will give very detailed information and will be an important research area. Advanced simulations are needed to develop these measurement techniques. Such methods can produce large amounts of measurements, in principle similar to an online, continuous microarray experiment, so advanced methods are also needed for data processing and analysis. A related area is sample analysis using microfluidic devices ("lab-on-a-chip"), where only extremely small sample volumes are needed. This means that very large amounts of experimental data can be generated from small samples, both in research and in e.g. medical diagnosis.
Artificial intelligence
Research is not only experiments and data analysis; it is also a creative process leading to novel questions and inventive experiments to answer these questions. It is difficult to predict to what extent artificial intelligence, computers and robots will take over the creative process. Computers are certainly better at handling large data sets and complex relationships, and can find novel patterns in such data. Intelligent computers will therefore be increasingly important for stimulating the creative process in the human researcher. Artificial intelligence with true creativity would be a particularly useful tool, as it would be able to analyse more complex data sets than the human brain. However, improved methods for data analysis, presentation and visualisation can also make very complex data accessible to human interpretation. Data mining, i.e. sorting through data to identify patterns and establish relationships, is already important, and it will be essential to develop improved and novel methods.
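As a minimal illustration of the kind of pattern finding meant by data mining, the Python sketch below screens all pairs of measured variables (for example expression profiles of genes across samples) and reports the pairs whose Pearson correlation exceeds a threshold. The function names `pearson` and `rank_relationships`, the threshold of 0.8 and the variable names in the usage example are hypothetical choices for this sketch, not methods from the paper; real data mining would also have to handle noise, missing values and multiple testing.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable carries no correlation signal
    return cov / (sx * sy)


def rank_relationships(data, threshold=0.8):
    """Return (name_a, name_b, r) for all pairs with |r| >= threshold, strongest first.

    `data` maps a (hypothetical) variable name to its measurements, one value per sample.
    """
    hits = []
    for a, b in itertools.combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            hits.append((a, b, r))
    hits.sort(key=lambda t: abs(t[2]), reverse=True)
    return hits


if __name__ == "__main__":
    # Hypothetical expression profiles over five samples.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
        "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
        "geneD": [1.0, 3.0, 2.0, 3.0, 1.0],
    }
    for a, b, r in rank_relationships(profiles):
        print(f"{a} ~ {b}: r = {r:+.3f}")
```

With these hypothetical profiles, where `geneC` decreases exactly as `geneA` increases and `geneD` is unrelated to the others, the `geneA`/`geneC` pair is reported first (r close to -1) and no pair involving `geneD` passes the threshold.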
Data integration
Advanced data analysis as described above has to be based on data from many different sources, so efficient integration of very different types of data is a prerequisite for systems biology. Such data may be sequence data for genes and proteins, measurement data from various types of instruments, multi-category answers from health survey questionnaires, descriptions of symptoms and test results from medical records, observations from publications, image data from different types of imaging experiments etc. These data need to be stored and accessed efficiently, and tools are needed to compare and classify them and to run statistical tests. This will be a challenging task, where standards and ontologies are needed. It is also important to be able to include older data, e.g. from publications, which at present are not standardised or annotated according to established ontologies. Advanced data processing will therefore be needed in combination with the construction (and continuous re-construction) of ontologies. Data modelling, aimed at a good understanding of complex data structures, can be useful in this context.
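The Python sketch below illustrates two of the steps named above under strongly simplifying assumptions: free-text terms from different sources are mapped to a common controlled vocabulary (a stand-in for a real ontology), and records are merged on a shared participant identifier. The vocabulary `SYNONYMS`, the record layout (`id`, `terms`) and the functions `normalise_term` and `integrate` are all hypothetical. Terms that cannot be mapped are kept in an `unmapped` list rather than discarded, since such terms are exactly the input needed for the continuous re-construction of ontologies.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: synonyms used by different sources are
# mapped to one preferred term. A real system would use a curated ontology.
SYNONYMS = {
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
    "htn": "hypertension",
    "sugar diabetes": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
    "diabetes mellitus": "diabetes mellitus",
}


def normalise_term(raw):
    """Map a free-text term to its preferred vocabulary term, or None if unknown."""
    return SYNONYMS.get(raw.strip().lower())


def integrate(*sources):
    """Merge records from several sources keyed on a shared participant id.

    Each source is a (source_name, records) pair; each record is a dict with
    an 'id' and a 'terms' list. Returns a dict mapping each id to
    {"terms": set, "sources": set, "unmapped": list}.
    """
    merged = defaultdict(lambda: {"terms": set(), "sources": set(), "unmapped": []})
    for source_name, records in sources:
        for rec in records:
            entry = merged[rec["id"]]
            entry["sources"].add(source_name)
            for raw in rec.get("terms", []):
                term = normalise_term(raw)
                if term is None:
                    # Keep unknown terms for manual curation / vocabulary updates.
                    entry["unmapped"].append(raw)
                else:
                    entry["terms"].add(term)
    return dict(merged)


if __name__ == "__main__":
    survey = [{"id": "P1", "terms": ["High blood pressure"]},
              {"id": "P2", "terms": ["sugar diabetes"]}]
    hospital = [{"id": "P1", "terms": ["HTN", "migraine"]}]
    for pid, entry in sorted(integrate(("survey", survey), ("hospital", hospital)).items()):
        print(pid, entry)
```

In this example the survey answer "High blood pressure" and the hospital abbreviation "HTN" collapse to the same term for participant P1, while "migraine" is flagged as unmapped. In practice the shared identifier would normally be pseudonymised, and the mapping would come from a curated ontology rather than a hand-written dictionary.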
Multi-level simulations and computation
Biological systems are complex at several levels. In principle we want to simulate systems starting at a quantum-mechanical level and going all the way up to full ecosystems. This requires simulation at different levels of theory. Full simulation of e.g. enzymatic processes requires a quantum-mechanical treatment, whereas e.g. transport processes and mechanical properties of cellular structures need various levels of approximation and abstraction. Ideally we want to treat each element of such complex systems at a level of theory that is just sufficient to reproduce experimental data. However, even this can be extremely challenging. Other types of computation are also relevant in this context. One example is phylogenetic estimation, where one tries to model past evolution based on present-day data. Another example is medical statistics: new experiment types, larger population studies and research on more complex, multifactorial diseases mean new challenges for medical statistics. In general, statistical methods will be essential for most future research in systems biology.
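As a worked example of the kind of medical statistics referred to here, the following Python sketch tests for association between a genetic variant and a disease in a case-control design. It takes a 2x2 table of hypothetical allele counts in cases and controls and computes the odds ratio with an approximate 95% confidence interval (Woolf's log method) and the Pearson chi-square statistic with one degree of freedom. The function name `allele_association` and the example counts are assumptions for illustration; real association studies also need adjustment for covariates and correction for multiple testing.

```python
import math


def allele_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio, Woolf 95% CI and Pearson chi-square (1 df, no continuity
    correction) for a 2x2 table of allele counts in cases vs. controls.

    Returns (odds_ratio, (ci_low, ci_high), chi2, p_value).
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact test")

    # Odds of carrying the risk allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)

    # Woolf's method: log(OR) is approximately normal with this standard error.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square statistic for a 2x2 table.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # With 1 df, chi2 = Z^2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (ci_low, ci_high), chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 60 risk / 40 other alleles in cases, 40 / 60 in controls.
    or_, (lo, hi), chi2, p = allele_association(60, 40, 40, 60)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), chi2 = {chi2:.2f}, p = {p:.4f}")
```

For these hypothetical counts the odds ratio is (60 × 60)/(40 × 40) = 2.25, the chi-square statistic is 200 × 2000²/100⁴ = 8.0, and the p-value is erfc(2), about 0.0047.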
Visualisation
Visualisation is already important in bioinformatics, in particular in structural biology, where advanced visualisation of macromolecular structures is essential. This is reflected in the price of many commercial tools for protein structure modelling and visualisation, which may exceed USD 15,000 even for a single-computer installation. A detailed understanding of individual 3D structural properties may be essential in e.g. a drug design project, and virtual reality tools may therefore be used in such projects. Such visualisation will also become increasingly important for other types of data. Examples are visualisation of cellular 3D structures based on stacked slices from imaging experiments, flexible visualisation techniques for analysing very high-dimensional data, and techniques for highlighting correlations in complex, multi-property data sets. Here online, real-time manipulation of the visualisation, even for very large and complex data structures, will be essential.
3 Grand challenges
Since bioinformatics is by definition intertwined with biology and medicine, it is important to look at some major challenges in biology and medicine that are likely to be important focus areas for the coming decade(s) [6]. They are biological questions, but with important methodological challenges where informatics, mathematics, statistics, physics, chemistry etc. become essential.
Based on these and similar problems, some immediate challenges for informatics and computational biology have been formulated [6].
4 Training
What kind of training do students need in order to meet these challenges? Bioinformatics is a multi-disciplinary field, and two different strategies can be envisioned. In the “traditional” approach we train students to become experts in key areas like mathematics, informatics, statistics etc. In the “futuristic” approach we train students to become multi-disciplinary experts, so that e.g. bioinformatics students have a relatively strong background in both informatics and biology. This would be the ideal situation, but the “futuristic” approach is probably unrealistic. Given the level of knowledge needed to become an expert, most students will not be able to become true experts in more than one area, and to be competitive we need highly qualified students in key areas. We therefore probably have to use the “traditional” approach as a basis. On the other hand, it is essential that these students learn multi-disciplinary communication; e.g. an informatics student must learn how to discuss a biological problem with a biologist and see how it can be solved using informatics tools. The “traditional” approach therefore has to evolve into a more “forward-looking” approach with more focus on multi-disciplinary collaboration. This requires some basic training in relevant areas, e.g. molecular biology for informatics students, or informatics and data analysis for biotechnology students. The basis for this must be established through relevant cross-disciplinary courses, like “Molecular biology for technologists” (an existing course at NTNU). However, the most efficient training will most likely take place in truly multi-disciplinary research groups, collaborating on actual research projects. Giving students the opportunity to do projects and thesis work in such research groups will actively stimulate cross-disciplinary communication.
This means that such research groups have to be available, and they must have the necessary resources to take care of these students. The benefit, however, will be large. Such research groups will have a very good potential for truly novel research, and the students get excellent training. This will be a win-win situation for all parties, where multi-disciplinary collaboration becomes a symbiotic and creative process of mutual benefit.
5 Implementing the future – Health surveys and biobanks
If we want to turn these visions into reality, it may be useful to identify multidisciplinary research areas that are able to integrate a large number of activities, and where research data will be of national and international importance. One such area of particular relevance to NTNU is health surveys and biobanks [10]. There are two major types of biobanks.
Population-based biobanks are normally associated with corresponding health surveys. A specific population (e.g. the full population of a selected region) is invited to participate. Participants are normally examined (e.g. height, weight, blood pressure, bone mass etc.), they fill in questionnaires about e.g. lifestyle, and a blood sample may be collected and stored for future use. Based on these data, links to "end-point" registers (cancer, stroke etc.) and analyses of the biological material, novel knowledge can be generated about important diseases like asthma, cancer and diabetes. For complex, multi-factorial diseases in particular, such large population-based surveys are most likely a prerequisite for new knowledge.
Patient-based biobanks have a slightly different focus. They are based on biological material collected from patients, normally as part of the diagnostic process at the hospital. This material is interesting in its own right, as it can be used to better understand specific diseases, in particular at a genetic and molecular level. However, it becomes even more interesting when it can be linked to a population-based survey, where it can be used to confirm and elaborate findings from the population-based material. This may include e.g. looking for genetic differences between cancer subtypes that show subtle differences in the population-based material. Such differences may have significant implications for successful diagnosis and treatment.
NTNU and St. Olav, the University Hospital in Trondheim, are responsible for two very important biobanks, both supported by the national FUGE (functional genomics) project. The HUNT health survey and biobank is an almost complete screening of the population in the county of Nord-Trøndelag. It has been carried out twice: in HUNT 1 (1984-86) only health survey data were collected, whereas in HUNT 2 (1995-97) blood samples were also collected. A new survey, HUNT 3, is planned for 2006-08. HUNT is unusual on an international scale because a stable population is followed over a relatively long time. It is also important that it can be linked to the second large biobank in the region, the patient-based biobank at the University Hospital. This makes very thorough investigations based on this material possible. However, full utilisation of this potential requires a coordinated effort from many different disciplines.
It is clear that biobanks and health surveys have considerable potential as an integrated multidisciplinary research project that can serve as a platform for active collaboration between different disciplines. There may be other areas with similar potential. However, the biobank and health survey area has the advantage of being a well-established, ongoing activity with considerable potential for expansion and novel initiatives. NTNU is currently carrying out a foresight analysis on biobanks.
Acknowledgements
We want to thank the programme committee for the Programme for Bioinformatics at NTNU for valuable input. In addition to F.D., A.L. and B.K.A., the committee members are Kjell Bratbergsengen, Arne Halaas, Atle Bones, Catharina Davies and Mette Langaas from NTNU, Per Magnus from the Norwegian Institute of Public Health and Erik Must from Fondsfinans ASA. In particular we want to thank Mette Langaas for her contributions to this paper. We also want to thank the NTNU biobank foresight group for important input.
References
[1] V. Maojo, C.A. Kulikowski, Bioinformatics and medical informatics: collaborations on the road to genomic medicine?, J Am Med Inform Assoc 10 (2003) 515-522.
[2] L. Hood, Systems biology: integrating technology, biology, and computation, Mech Ageing Dev 124 (2003) 9-16.
[3] L. Hood, D. Galas, The digital code of DNA, Nature 421 (2003) 444-448.
[4] E. Werner, In silico multicellular systems biology and minimal genomes, Drug Discov Today 8 (2003) 1121-1127.
[5] R.D. King, K.E. Whelan, F.M. Jones, P.G. Reiser, C.H. Bryant, S.H. Muggleton, D.B. Kell, S.G. Oliver, Functional genomic hypothesis generation and experimentation by a robot scientist, Nature 427 (2004) 247-252.
[6] F.S. Collins, E.D. Green, A.E. Guttmacher, M.S. Guyer, A vision for the future of genomics research, Nature 422 (2003) 835-847.
[7] J. Kling, Ultrafast DNA sequencing, Nat Biotechnol 21 (2003) 1425-1427.
[8] E. Davidov, J. Holland, E. Marple, S. Naylor, Advancing drug discovery through systems biology, Drug Discov Today 8 (2003) 175-183.
[9] R. Molidor, A. Sturn, M. Maurer, Z. Trajanoski, New trends in bioinformatics: from genome sequence to personalized medicine, Exp Gerontol 38 (2003) 1031-1036.
[10] J. Kaiser, Biobanks. Population databases boom, from Iceland to the U.S., Science 298 (2002) 1158-1161.