Can I sell my DNA sequence?

DNA analysis as a service

In record time and with the best technology in each case, LION bioscience AG wants to determine the sequence of the building blocks in the genetic material of cells and, for example, use the genetic information to make predictions about the function of proteins. This is useful for the pharmaceutical industry in order to be able to develop more targeted drugs. The newly founded company is one of the first four projects to benefit from the funding from the “BioRegio” pot. A cooperation partner on the part of the industry is a prerequisite for funding. Magnus von Knebel-Doeberitz reports on the service concept of the company located in the Heidelberg Technology Park.

The power of living nature comes from its flexibility to keep creating something new and better adapted: “Just as the rain does not fall in order to make the grain in the field grow, neither does it fall in order to spoil the grain in the barn. So it is with the parts in nature. The teeth, for example, sharp and pointed at the front in order to cut up the food and broad and strong at the back in order to grind it well, were not made for the sake of that purpose, but arose, as it were, incidentally. Nevertheless, it seems to us that they were created for the sake of a purpose. It is probably the same with the other things in nature. The suitable arises because it just arises, but only what possesses it survives; everything else perishes.” Aristotle recognized this principle in his “Physicae auscultationes”. Every organism has successfully adapted to its living conditions. Over the millions of years of the evolution of life, natural selection has produced forms of life that are optimally adapted to the most varied living conditions. This applies, for example, to microorganisms in the deep sea that feed on petroleum, to thermostable microbes that live in geysers at extremely high temperatures, and to all other beings that live under less extreme conditions. Charles Darwin described this process in detail in his book on the origin of species.

A decisive selection advantage of humans has always been their wealth of ideas to use natural processes for their struggle for survival. This has been particularly evident in the past 200 years. Due to the technical advances, especially in the fields of physics, chemistry and engineering, which were able to develop freely in the Age of Enlightenment, the lives of people on earth have fundamentally changed. Much work that used to require great effort is now performed by machines. Nevertheless, it is clear to all engineers and scientists: As clever as the designers' thoughts have been up to now, as complicated as the machines may be, they are nowhere near the potential that nature has realized in living organisms. This becomes particularly clear when you compare the efficiency of the best machines with that of biochemical processes, such as glycolysis. The fact that nature has much better tricks to do justice to the tasks on earth also applies to medicine. No drug is as efficient as nature's self-healing powers. The enormous success of the vaccination strategies, which exploit precisely these self-healing powers, is good evidence for this postulate.

Basic biological research over the past 40 years has gained significant knowledge about information and functional processes in living organisms, which has had an enormous influence on our understanding of nature. Nucleic acids were recognized as the fundamental control molecules of life, the carriers of genetic information. They define the composition and structure of proteins, which in turn are composed of amino acids and represent the actual functional molecules of life. The nucleic acid DNA is made up of four different nucleotides (containing the bases adenine, guanine, cytosine and thymine) that are linked to one another in a linear fashion. The genetic information content of the nucleic acids lies in the fact that a sequence of three consecutive nucleotides codes for a specific amino acid. The nucleic acid sequence thus encodes the order of the amino acids that are assembled to form a protein, and thereby determines the function and properties of the protein. In addition to the coding sequences on the DNA, there are also non-coding regions that exercise regulatory functions and can thus determine, for example, which DNA segments in a particular cell are to be translated into proteins and when. The coding and non-coding sequences that together direct the synthesis of a protein are called a gene. The totality of all genes in an organism forms the genome. These findings made it possible to deal technically with genetic information and thus brought about a fundamental change in the understanding of biology, a paradigm shift, the consequences of which can hardly be assessed today. For the first time in the history of science, it has become possible to read the information of life by analyzing the sequence of the genetic information units (nucleic acid sequencing). Theoretical structure predictions can then be used to draw conclusions about the function of the protein encoded by any given piece of genetic information. The same is true for the regulatory sequences. This led to the development of a completely new scientific discipline, bioinformatics.
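The triplet principle described above can be illustrated with a few lines of Python. This is only a minimal sketch: the codon table is a deliberately small excerpt of the standard genetic code, and the example sequence and the function name `translate` are invented for illustration.

```python
# Small excerpt of the standard genetic code (DNA codons -> one-letter amino acids).
# Only the codons needed for the example are listed; this is NOT the full table.
CODON_TABLE = {
    "ATG": "M",  # methionine, the usual start codon
    "GCT": "A",  # alanine
    "AAA": "K",  # lysine
    "GGC": "G",  # glycine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
    "TAG": "*",  # stop
    "TGA": "*",  # stop
}


def translate(dna: str) -> str:
    """Read a coding DNA sequence three nucleotides at a time.

    Codons missing from the partial table are reported as 'X'.
    Translation ends at the first stop codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 18-nucleotide sequence: five codons followed by a stop codon.
print(translate("ATGGCTAAAGGCTGGTAA"))  # prints MAKGW
```

Changing a single nucleotide in such a sequence can change the encoded amino acid, which is why reading the sequence accurately matters for predicting protein function.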

The information that determines all functional processes in nature, the nucleic acid sequence, is suddenly available for further technical handling of the genetic information. It can be isolated from one organism in the form of certain nucleic acids and transferred to the genome of another. For example, it is possible to selectively modify bacteria, yeasts or higher organisms in such a way that they produce large quantities of a certain human hormone very efficiently. On the basis of the sequence data, genetic information can also be changed in a targeted manner and modified genetic information for scientific or technical application can be created using structure and function predictions. The principles of nature, especially the unmatched high efficiency of metabolic processes, can be used as the basis of the new "bioengineering". It is possible to change the genes that code for certain technically interesting enzymes in such a way that the efficiency of the enzymes is increased or adapted to certain tasks. For example, the enzymatic processes that contribute to the generation of energy from crude oil in deep-sea bacteria can be used to dispose of crude oil in an environmentally friendly manner after tanker accidents. Many other examples from the most diverse areas of technology could be named here. In their entirety, the new knowledge about how to deal with genetic information has led to the biotechnological revolution in which we ourselves move ceaselessly and faster and faster - without noticing it every day.

The handling of genetic information is also becoming more and more important in medicine. Like all properties of living organisms, many pathological manifestations of life are defined by the genetic information of the nucleic acid sequences. This includes such complex pathological events as cancer, high blood pressure and metabolic disorders. On the basis of the sequence analysis of the pathologically altered genetic information, it is possible in certain cases to make predictions about a disease that is already present or may occur later, as well as about a possible pharmacological treatment strategy. Occasionally, preventive operations are even carried out in order to avoid diseases that would otherwise occur with certainty. The area of preventive medicine will become enormously important as a result of the new technology.

The sequence analysis of genetic information is a central and fundamental basis for all areas of biotechnological research and application. The importance attached to DNA sequence analysis worldwide is also expressed in the current efforts to sequence the genomes of entire organisms, including humans, in toto.

In order to read the sequence of nucleic acids and thus their information content as efficiently as possible, different methods were developed - originally chemical DNA sequencing by Walter Gilbert and Allan Maxam, and enzymatic DNA sequencing by Fred Sanger. The “Sanger method”, also known as “dideoxy sequencing”, is still the most common today. DNA analysis on this basis allowed the analysis process to be automated very quickly. The European Molecular Biology Laboratory (EMBL), and especially the working group around Wilhelm Ansorge, has become one of the leading institutions in Europe, and indeed worldwide, dealing with the automation of DNA sequencing.

More than 95 percent of DNA sequence analysis to date has taken place in scientific or clinical laboratories, in which small to medium-sized sequence analyses were carried out. The “ALF-Automat” and the “ALF-Express”, both of which were developed and patented by Wilhelm Ansorge's working group and are sold by the Pharmacia company, are designed for this number of samples. The sales of ALF machines have skyrocketed in the last two years, which underlines the increasing need for automatic DNA analysis.

However, the demand for sequencing capacities far beyond the previous level is also becoming increasingly important. If the genomes of organisms are to be analyzed in toto, large fragments of DNA have to be read in order to obtain the maximum amount of genetic information. Since all life processes are defined by the expression of genetic information, the need to analyze the expressed genetic information, for example under pathological conditions, is becoming increasingly important. New sequencing machines are required for this, which far exceed the capacities of the previous ones. Here too, Wilhelm Ansorge's working group has developed a new machine, "ARAKIS", which can routinely sequence 40 clones with run lengths of over 1500 bp in parallel. By using two different lasers (two dye technique), the capacity of the machines can be doubled. It is now possible to read 100 kb and more per day on a machine without any problems. Compared to previous methods, this is a capacity that allows the efficient solution of complex sequencing problems in all areas of DNA sequence analysis.
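The capacity figures quoted above can be checked with simple arithmetic. The sketch below uses only numbers from this article (40 clones, 1500 bp read length, two dyes, 100 kb per day, and the roughly three billion base pairs of the human genome mentioned later). The function names are invented for illustration, and the genome estimate assumes, purely for simplicity, that every base is read exactly once, which real projects do not do.

```python
def bases_per_run(clones: int, read_length: int, dyes: int = 1) -> int:
    """Bases read in one run: parallel clones x read length x number of dyes."""
    return clones * read_length * dyes


def machine_days(genome_size: int, bases_per_day: int) -> int:
    """Days one machine needs for a single pass over a genome (rounded up)."""
    return -(-genome_size // bases_per_day)  # ceiling division


print(bases_per_run(40, 1500))           # 60000  (one dye)
print(bases_per_run(40, 1500, dyes=2))   # 120000 (two-dye technique)

# Simplifying assumption: each base is read once, with no overlap or repeats.
print(machine_days(3_000_000_000, 100_000))  # 30000 machine-days
```

Even under this optimistic assumption, a single machine would be occupied for decades, which illustrates why whole-genome projects depend on many machines running in parallel.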

The systems for DNA sequence analysis developed at EMBL have repeatedly given decisive impulses to technological development over the past few years and have led to commercially successful products. The strength of the technology lies in the successful interaction of all system components: the hardware in the form of the automatic sequencing devices with a throughput of up to 120 kilobases per run and reading lengths of up to 1500 bases per sample with an accuracy of over 99 percent; the flexible biochemistry, which allows samples of different origins to be analyzed precisely; and the software, which enables the fastest possible processing - from data acquisition to automatic sequence determination to preparation for more detailed sequence analysis and database searches (GeneSkipper software). Because the EMBL sequencing system can work with two laser detection systems simultaneously, the sequence information of both DNA strands can be determined in a single step using the so-called “doublex” method with the help of two differently colored primer molecules. This not only significantly lowers costs, but also significantly increases accuracy, which is of crucial importance in the analysis of clinical material in the molecular diagnosis of hereditary diseases.
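Why reading both strands improves accuracy can be sketched in a few lines: the two strands are complementary, so the read from the second strand, once reverse-complemented, should match the read from the first, and any disagreement flags a position worth re-examining. The following Python sketch is only an illustration of that idea, not the EMBL software; the function names and the example reads are invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def cross_check(forward_read: str, reverse_read: str) -> list:
    """Return the positions (0-based, on the forward read) where the strands disagree."""
    expected = reverse_complement(reverse_read)
    if len(expected) != len(forward_read):
        raise ValueError("this sketch assumes both reads cover the same region")
    return [
        i
        for i, (a, b) in enumerate(zip(forward_read.upper(), expected))
        if a != b
    ]


# Hypothetical reads of the same 10-base region from both strands.
print(cross_check("ATGGCTAAAG", "CTTTAGCCAT"))  # [] : both strands agree
print(cross_check("ATGGCTAAAG", "CTTTAGCGAT"))  # [2] : one position disagrees
```

In a diagnostic setting, a flagged position would be checked again before a result is reported.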

By using special long glass plates, EMBL technology allows up to 1500 bases to be determined in one reaction (that is, up to 3000 bases when working on both DNA strands at the same time). The technology is therefore ideally suited for efficient “full-length cDNA sequencing”, in which, for example, in the “human genome project” it is of interest to determine the coding regions of all human genes. Currently, the existing cDNA libraries, which are systematically sequenced in several genome centers and are also of great pharmacological interest, have an average size of 1500 to 1700 bases and could be read in a single experiment using EMBL technology.

In parallel to the further developments of the automatic DNA sequencing machines, significant progress has also been made at EMBL in the field of sequencing biochemistry. New enzymes with which sequence reactions can be carried out more effectively and which above all allow the direct sequencing of amplified PCR products (thermocyclic sequencing) have also been developed in recent years and made available for routine use.

Medicine also benefits from the improved analysis technology. By 2003, the entire human genome with its roughly three billion base pairs is expected to have been read. Once this information is available for comparative analyses, it will be possible to trace individual diseases back to specific changes in certain genes. Today this is already possible for a small but rapidly growing number of diseases, such as certain metabolic disorders, some hereditary cancers and certain cardiovascular diseases. The more sequence information is available, the more interactions of individual variants of the genetic information can be examined with regard to their predisposing role for certain diseases. The analysis of these complex interactions, and the possibilities derived from it to recognize disease predispositions at an early stage and to specifically prevent the outbreak of diseases, will have a lasting impact on medical practice in the future. This is already a reality for individual diseases, for example for familial adenomatous polyposis coli. Patients in these families develop malignant tumors of the colon at a young age. Until now, the tumors were often recognized too late. Today, the affected family members are therefore regularly advised to undergo extensive preventive examinations (colonoscopies) in order to discover the impending tumor disease in good time. From a statistical point of view, however, only every second family member inherits the predisposition for the disease, so that the preventive check-ups are not necessary at all for half of the people. These family members could thus be spared the stressful regular colonoscopies.

An affected carrier can be identified through the DNA sequence analysis of certain gene segments, without having to carry out invasive examinations. A special preventive program can then be proposed to the patient in order to identify and treat a tumor in good time. The unaffected person can be relieved, by a simple test, of the concern that they might be affected. With the help of EMBL's DNA sequencing technology, the genetic information relevant to the disease is identified in routine screening of the patients and their families in the section for molecular diagnostics and therapy at the Heidelberg University Surgical Clinic.

Sequence analysis as the basis of biotechnology

The example of hereditary colon cancer shows how quickly DNA analysis can become routine in diseases. As a result, the need for sequencing is increasing and goes beyond the scope of a scientific-clinical laboratory. DNA analysis in the context of diagnosing disease predispositions naturally raises a wide variety of ethical questions that must not be influenced by the forces of the free market or commercial interests. The more intensive discussion about the molecular genetic diagnosis of inherited disease predispositions will make a major contribution to the clarification of hitherto unresolved ethical questions in this field. A technically first-class implementation of DNA sequence diagnostics, which requires appropriate DNA sequencing capacities, should be located in a commercially working laboratory, since only such laboratories have sufficient long-term experience and technical prerequisites for these analyzes. Nevertheless, nucleic acid analysis should only be carried out there in direct cooperation with the responsible human geneticists or other professionally qualified doctors. To this end, clear protocols must be drawn up and submitted to the relevant specialist committees. A commercial laboratory must ensure that the molecular diagnosis of hereditary disease predispositions may only be carried out with the consent of the affected patient at the instigation of the responsible medical profession and exclusively in the interests of affected patients and their families. The DNA sequence analysis laboratory is to be seen here as a competent provider of nucleic acid analysis and not as an institution that influences the indication for carrying out corresponding tests.

In the future, technologies that make it possible to depict more complex relationships in gene expression will also be of increasing interest. For example, the question may be how, in certain clinical pictures or stages, a large number of genes are regulated in relation to one another, and how certain genes are expressed more strongly, more weakly or no longer at all. If, in the ideal case, all genes of an organism are known, the aim will be to make quantitative statements about the expression rate of different “transcripts” in a comparison - pathological versus healthy. With the help of the recently developed SAGE technology (Serial Analysis of Gene Expression), it is possible, starting from the total mRNA, to quantitatively record thousands of “transcripts” in a very short time using a clever PCR cloning approach in combination with an efficient sequencing technology, and thus to draw conclusions about changes in the expression profile. Here, too, the combination of cloning technology with “high-throughput” EMBL sequencing technology is proving to be successful and very promising.
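The quantitative comparison described here comes down to counting: each sequenced tag stands for one transcript, and the share of a tag among all tags of a sample estimates how strongly the corresponding gene is expressed. The Python sketch below shows only this counting step under simplified assumptions; the tag sequences, sample sizes and the function name `tag_frequencies` are invented, and a real analysis would also need statistics to judge whether a difference is significant.

```python
from collections import Counter


def tag_frequencies(tags: list) -> dict:
    """Return the share of each tag among all tags sequenced from one sample."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


def compare_profiles(healthy_tags: list, diseased_tags: list) -> dict:
    """Map each tag to its (healthy share, diseased share); absent tags count as 0."""
    healthy = tag_frequencies(healthy_tags)
    diseased = tag_frequencies(diseased_tags)
    return {
        tag: (healthy.get(tag, 0.0), diseased.get(tag, 0.0))
        for tag in sorted(set(healthy) | set(diseased))
    }


# Hypothetical tag lists from a healthy and a pathological tissue sample.
healthy_sample = ["AAACGGT", "AAACGGT", "GGGTCCA", "GGGTCCA"]
diseased_sample = ["AAACGGT", "AAACGGT", "AAACGGT", "GGGTCCA"]

for tag, (h, d) in compare_profiles(healthy_sample, diseased_sample).items():
    print(tag, h, d)
# AAACGGT 0.5 0.75
# GGGTCCA 0.5 0.25
```

In this invented example, the first transcript is more strongly expressed in the pathological sample and the second one less so.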

The more genetic information is read and the more becomes known about the function of the sequences and the proteins derived from them, the more conclusions can be drawn about basic biological functional principles. This leads to a new scientific discipline, bioinformatics. The evaluation of the DNA sequence data of baker's yeast, which is regarded as a model organism in biotechnology, has proven to be a prime example of the efficiency and success of bioinformatic analyses. Such analyses will very quickly have far-reaching consequences for the diagnosis, treatment and healing of human diseases (for example for Alzheimer's disease, see article on page 37).

The genome of baker's yeast has been deciphered almost completely. In order to make statements about the function of its more than 6000 genes, the sequence of each yeast gene is compared with all reference sequences stored in the sequence databases distributed around the world. If a sufficiently similar sequence is found there whose function is already known, conclusions can be drawn about the function or the information content of the yeast sequence examined. So far, such analyses have been carried out interactively on the computer by experts; for the complex yeast genome they would have taken many months on modern workstations. Such large amounts of sequence data require high-performance and highly automated tools for two reasons: on the one hand, out of the sheer necessity of maximum computing power for the use of the fastest algorithms and heuristics, and on the other hand, as an indispensable knowledge support for experts. The rapidly growing databases of gene and protein families from genome projects now far exceed the knowledge that an individual can acquire in a lifetime.
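The principle of inferring function from similarity can be sketched as follows. The example compares a query protein sequence with a tiny reference "database" by counting shared three-letter fragments. This is a deliberately crude stand-in chosen for brevity, not the alignment methods used in the project; all sequences, annotations, the threshold and the function names are invented for illustration.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of all overlapping fragments of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Share of fragments the two sequences have in common (Jaccard index)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def best_hit(query: str, database: dict, threshold: float = 0.5):
    """Return (annotation, score) of the most similar reference, or None if too dissimilar."""
    name, score = max(
        ((name, similarity(query, seq)) for name, seq in database.items()),
        key=lambda item: item[1],
    )
    return (name, score) if score >= threshold else None


# Hypothetical reference sequences with known function.
reference_db = {
    "kinase-like": "MAKGWLLKRT",
    "transporter-like": "MSSPPQQTTV",
}

print(best_hit("MAKGWLLKRS", reference_db))  # hit: 'kinase-like', score 7/9
print(best_hit("WWWWWW", reference_db))      # None: no sufficiently similar reference
```

A hit above the threshold suggests, but does not prove, that the query shares the function of the reference, which is why the article speaks of conclusions that "can be drawn" rather than certainties.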

The functional analysis of the yeast genome was carried out at the European technology and production center of Silicon Graphics in Cortaillod (Switzerland) in just three days and delivered over twelve gigabytes of result data. The drastic time savings were made possible by the parallel use of a network of four supercomputers. This pioneering project for further genetic research was carried out jointly by scientists from EMBL (Heidelberg), the European Bioinformatics Institute (Cambridge) and the European Chemistry Technology Center of Silicon Graphics (Basel). The use of a software package from EMBL was also of central importance for the success of the project. It enables automatic sequence analysis and functional predictions derived from it. It contains an expert system with rule-based knowledge that decides which sequences should best be examined with which of the available analysis programs for homologies and sequence properties. The results are then automatically annotated for the respective sequences. The success of the project is unique. Of the more than 6000 gene sequences of the yeast genome, the function of the encoded protein could be predicted and its three-dimensional structure visualized for around nine percent. For a further 59 percent, the function could be predicted without the three-dimensional structure. A likely function could be predicted for about seven percent.

Although no direct function could be predicted for about 14 percent, homology could be predicted. For only 11 percent, neither function nor homology could be predicted.
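The percentages quoted above can be put into perspective with a short calculation. The sketch below checks that the five categories add up to 100 percent and converts them into approximate gene counts. The total of 6000 genes is an assumption made for round numbers (the article says "more than 6000"), and the category labels and function name are paraphrased or invented for illustration.

```python
# Percentages as stated in the article; the labels are paraphrased.
PREDICTION_SHARES = {
    "function predicted, 3D structure visualized": 9,
    "function predicted, no 3D structure": 59,
    "likely function predicted": 7,
    "homology only, no direct function": 14,
    "neither function nor homology": 11,
}


def approximate_counts(total_genes: int, shares: dict) -> dict:
    """Convert percentage shares into rounded gene counts."""
    return {label: round(total_genes * pct / 100) for label, pct in shares.items()}


print(sum(PREDICTION_SHARES.values()))  # 100

# Assumption: exactly 6000 genes, used only to obtain round figures.
for label, count in approximate_counts(6000, PREDICTION_SHARES).items():
    print(f"{label}: about {count} genes")
```

Under this assumption, some information (function, likely function or homology) was obtained for about 89 percent of the genes, that is, for more than 5000 of them.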

Further developments based on the software strategy that made the analysis of the baker's yeast genome possible are being made available to LION bioscience by the co-founders and developers of the program, Dr. Reinhard Schneider and Dr. Georg Casari. Their work is being developed further for the specific applications of LION AG in close cooperation with Dr. Peer Bork, who works at EMBL as a visiting scientist of the Max Delbrück Center Berlin. With bioinformatics, LION will have a world-leading biomathematical consulting and analysis service at its disposal.

But who needs genetic information for what purpose? Where are the customers of a company specializing in nucleic acid analysis? The development of molecular biology has made it possible to change, implement or apply the information read in any desired way. Targeted predictions through gene design enable the synthesis of “made-to-measure proteins” that could be used for any imaginable use. The technology has enormous potential for scientific and industrial applications, especially in the chemical and pharmaceutical industries. The possible applications range from detergents to pharmaceuticals, from environmental technology to the computer industry (construction of so-called protein or DNA chips).

Scientific institutions as well as industrial companies from all areas of technology, chemistry, pharmacy and medicine will increasingly have a need for DNA sequence analysis and evaluation in order to be able to compete successfully in their respective fields in the future. There is a special need for the use of the nucleic acid sequence information for the modification of organisms that fulfill certain desired services, for example in the field of environmental technology or energy generation, for the production of improved products in the chemical industry, for example improved enzyme activities for detergents. There is also a need for the analysis of the gene expression profiles of living organisms under certain conditions and the identification of new genes that are relevant for certain biological functions, as well as predictions of the pharmacological influence of modified proteins in certain diseases, clinical diagnoses based on nucleic acids, genetic vaccination methods and gene therapy.

To date, sequence analyses have essentially been carried out in academic laboratories. The enormously increased demand for sequencing services for all biotechnological and biomedical applications, however, by far exceeds the capacity that academic institutions are able to provide. This results in a strong demand for a service provider who offers this service comprehensively at the highest scientific and technological level. In the USA, numerous companies have been founded in the last five years, mostly out of university research institutions, which, with different focuses, address the need for nucleic acid analysis that rose sharply some time ago. This development has not yet been reflected to the same extent in Europe. Most of the tasks of nucleic acid analysis are still carried out by university and non-university public research institutions. These considerations led to the establishment of LION bioscience (Laboratories on the Investigation of Nucleotide Sequences).

The corporate concept of LION bioscience is based on offering comprehensive nucleic acid sequence analysis and interpretation with the latest and most efficient technology, in particular as a routine service for institutions from science or industry; for projects for the total analysis of entire genomes, for example as a contract service for research institutions and industry; to support the diagnosis of inherited and other diseases as a service for clients from the health system; for the analysis of genetic expression profiles of organisms under physiological and pathophysiological conditions; as a comprehensive biomathematical information and advisory service; for the creation of reagents for the analysis of the expression and function of genetic information, for example prepared probes or monoclonal antibodies.

In addition, a nucleic acid processing service is installed as an upstream service, and sequence evaluation and interpretation as part of bioinformatics is offered as a downstream service. The individual corporate goals are to be supplemented by a comprehensive and scientifically competent advisory program for LION customers.

The company was founded in March 1997 by Prof. Wilhelm Ansorge, Dr. Friedrich von Bohlen, Dr. Peer Bork, Dr. Georg Casari, Prof. Magnus von Knebel-Doeberitz, Dr. Reinhard Schneider and Dr. Hartmut Voss. Dr. Friedrich von Bohlen has taken over the management as a member of the board. In addition to the bodies prescribed by law, a scientific advisory board is planned to advise and support the company in professional, technical and methodological terms. The company's headquarters are in Heidelberg, in itself a center of molecular biosciences in Europe. From the outset, we plan to work closely with universities and other institutes that are active in the field of nucleic acid sequence analysis. A cooperation agreement has already been concluded with EMBL. Similar cooperation agreements are also planned with the DKFZ and the University of Heidelberg.

Author:
Prof. Dr. Magnus von Knebel-Doeberitz
Section for Molecular Diagnostics and Therapy, Surgical University Clinic, Im Neuenheimer Feld 110, 69120 Heidelberg,
Telephone (06221) 56 28 76