Understanding the information contained in genome sequences was a major challenge of the early 21st century. The genome is the entire DNA content of a cell, including all the genes and all of the intergenic regions. Prokaryotic genomes are small, with very little space between genes. The yeast genome contains about 6000 genes and has a much more compact organization than the human genome.
Every organism has a genome, which contains the biological information needed to construct and maintain a living organism. Most genomes, including those of all cellular life forms, are made of DNA (deoxyribonucleic acid), but a few viruses have RNA (ribonucleic acid) genomes. DNA and RNA are polymeric molecules made up of linear, unbranched chains of monomeric subunits called nucleotides.
The biological information contained in a genome is encoded in the nucleotide sequence of its DNA or RNA molecules and is divided into discrete units called genes. The information in a gene is read by proteins that attach to the genome at the appropriate positions and initiate a series of biochemical reactions referred to as gene expression.
For organisms with DNA genomes, this process was originally looked on as comprising the following two stages:
- Transcription: production of an RNA copy of the gene.
- Translation: synthesis of a protein whose amino acid sequence is determined, via the genetic code, by the nucleotide sequence of the RNA transcript.
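The two-stage flow can be illustrated with a toy Python model. This is a minimal sketch, not a model of real cellular machinery: it assumes the standard genetic code, treats the input as the coding (sense) strand so that transcription is just replacing T with U, starts translation at the first AUG and stops at the first stop codon, and ignores introns, regulation and RNA processing. The function names `transcribe` and `translate` and the example gene are hypothetical.

```python
import itertools

# Standard genetic code. Codons are enumerated with the first base varying
# slowest (UUU, UUC, UUA, UUG, UCU, ...) and paired with one-letter amino
# acid codes; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Stage 1 (simplified): return the RNA copy of a gene's coding strand.

    Assumption: the RNA has the same sequence as the coding strand, with
    uracil (U) in place of thymine (T). In the cell, RNA polymerase reads
    the complementary template strand to produce this sequence.
    """
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Stage 2 (simplified): read codons from the first AUG to the first stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein in this toy model
    protein = []
    # Step through complete triplets only; a trailing partial codon is ignored.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # a stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


gene = "ttATGGCTTGGAAATAAcg"  # hypothetical toy gene with flanking bases
mrna = transcribe(gene)
print(mrna)             # UUAUGGCUUGGAAAUAACG
print(translate(mrna))  # MAWK
```

Here the RNA transcript is read in triplets (AUG, GCU, UGG, AAA) giving methionine, alanine, tryptophan and lysine, and the following UAA stop codon terminates the chain.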
This is still an accurate description of gene expression in simple organisms such as bacteria, but it gives an incomplete picture of the events involved in the conversion of genomic information in more complex organisms.[113]