FIRST things first: what is a genome? The website of the US National Center for Biotechnology Information has a simple explanation.
It says, “Life is specified by genomes. Every organism, including humans, has a genome that contains all of the biological information needed to build and maintain a living example of that organism. The biological information contained in a genome is encoded in its deoxyribonucleic acid (DNA) and is divided into discrete units called genes.”
The Sime Darby factsheet tells us that DNA is built from four nucleotide bases: adenine (A), thymine (T), cytosine (C) and guanine (G). The order in which these bases appear determines the individual hereditary characteristics of an organism.
Nucleotides are molecules that, when joined together, form RNA (ribonucleic acid) and DNA. They are, in effect, chemical building blocks. Genome sequencing is the process of determining the exact order of the nucleotide bases.
The oil palm genome is estimated to have 1.7 billion base pairs. The Sime Darby project assembled 1.594 billion base pairs, or about 93.8% of that estimate.
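The coverage figure is simply the assembled base pairs divided by the estimated genome size. The short Python sketch below reproduces that arithmetic from the two numbers quoted above; the variable and function names are illustrative only and do not come from the Sime Darby project.

```python
# Figures quoted in the article; names are illustrative assumptions.
ESTIMATED_GENOME_BP = 1.7e9   # estimated oil palm genome size, in base pairs
ASSEMBLED_BP = 1.594e9        # base pairs assembled by the project


def coverage_percent(assembled_bp, estimated_bp):
    """Return the assembled share of the estimated genome, as a percentage."""
    return 100.0 * assembled_bp / estimated_bp


coverage = coverage_percent(ASSEMBLED_BP, ESTIMATED_GENOME_BP)
print(f"{coverage:.1f}%")  # prints 93.8%
```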
The US National Science Foundation website explains that to sequence the maize genome, scientists collect and purify DNA from maize plants in the laboratory. The purified DNA is “chopped up” to produce fragments small enough to analyse.
A sequencing machine determines the actual order of about 1,000 DNA bases at a time. By analysing the sequence data with sophisticated computer programs, the fragments can be aligned by overlapping their ends. Repeated sequences throughout the genome make it difficult to match up the correct pieces. When the project is completed, researchers will know the sequence of all 2.5 billion DNA bases in the maize genome.
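The idea of aligning fragments by overlapping their ends can be illustrated with a toy example. The Python sketch below uses a simple greedy approach: it repeatedly joins the two fragments whose ends overlap the most. This is an assumption made for illustration, not the method used by the maize or oil palm projects, and the function names and the tiny made-up fragments are hypothetical. Real assembly software must also cope with sequencing errors and with the repeated sequences mentioned above, which this sketch does not attempt.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest end of `left` that matches the start of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Toy greedy assembly: keep merging the pair with the largest overlap."""
    reads = list(fragments)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining fragments overlap enough to join
        # Join the two fragments, keeping the shared bases only once.
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up fragments of the made-up sequence ATGCGTACCTGA.
print(greedy_assemble(["ACCTGA", "ATGCGT", "CGTACC"]))  # ['ATGCGTACCTGA']
```

When the same stretch of bases occurs in many places, several fragments can overlap equally well and a greedy join may pick the wrong partner, which is one way to see why repeats make assembly hard.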
The sequencing of the oil palm genome is a similar process, but is said to be more complicated because of the high proportion (over 60%) of repeated sequences.