Autocorrect for your DNA
Information Theory and the Genetic Code
New research at NSU has revealed the information content associated with each letter of DNA. This work may improve our understanding of how the genetic code can resist the effects of mutations that may cause cancer or inherited diseases. The same genetic code is used by almost all living organisms to translate three-letter “words,” or codons, of DNA into amino acids, which are strung together to form proteins. Assistant Professor Louis Nemzer, a biophysicist at the Halmos College of Natural Sciences and Oceanography, used methods from the field of information theory to calculate the “Shannon entropy” of each letter of DNA, depending on the type of base (A, T, G, or C) and its position in the codon.
Although many people have never heard of Claude Shannon, his pioneering work at Bell Labs on measuring the maximum amount of information contained in messages is still crucial today for digital communication technologies, including text messaging, WiFi, and mobile data transmission. So why did Shannon choose to call his measure of information “entropy,” a word more associated with the physics of an ideal gas? “There are very close connections between thermodynamics and information theory,” said Dr. Nemzer. “Entropy in physics really just measures how much information about a system you are missing.”
By using the equations originally developed for thermodynamics and adapted by information theory, he calculated how “determinative” each letter, or nucleotide, is for the properties of the amino acid it codes for. Changing the properties of even a single amino acid too much may cause the entire protein that contains it to lose its ability to function, with potentially negative health outcomes. Fortunately, the genetic code has a kind of built-in “autocorrect” feature: most single-letter mutations produce either the same amino acid as the original or a chemically similar one. This helps make the genetic code more robust to errors.
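The entropy calculation and the “autocorrect” effect described above can be illustrated with a small Python sketch. It is not the method used in the published papers: it assumes that all 64 codons are equally likely, and it tracks only the identity of the amino acid rather than its chemical properties. The function names (`entropy`, `information_gain`, `synonymous_fraction`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from itertools import product
from math import log2

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_entropy(position, base):
    """Entropy of the amino acid once the base at one codon position is known.

    Assumption: every codon is equally likely.
    """
    return entropy(
        aa for codon, aa in CODON_TABLE.items() if codon[position] == base
    )


def information_gain(position):
    """Average number of bits about the amino acid revealed by one position."""
    total = entropy(CODON_TABLE.values())
    remaining = sum(conditional_entropy(position, b) for b in BASES) / len(BASES)
    return total - remaining


def synonymous_fraction():
    """Fraction of single-letter changes that leave the amino acid unchanged."""
    same = 0
    changes = 0
    for codon, aa in CODON_TABLE.items():
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a mutation
                mutant = codon[:position] + base + codon[position + 1:]
                changes += 1
                same += CODON_TABLE[mutant] == aa
    return same / changes


if __name__ == "__main__":
    for position in range(3):
        print(f"codon position {position + 1}: "
              f"{information_gain(position):.3f} bits")
    print(f"synonymous single-letter changes: {synonymous_fraction():.1%}")
```

Under these simplifying assumptions, the third codon position reveals the least about the amino acid, which is one way to see why many mutations at that position are harmless. A fuller treatment would weight codons by how often they are actually used and would compare amino acids by chemical similarity rather than by identity alone.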
The new research, just published in a pair of related papers in the Journal of Theoretical Biology and BioSystems, quantifies how important each letter is to the final properties of the amino acid. The research also found that the genetic code takes advantage of the fact that not all mutations are equally likely: the mutations in DNA that would cause the most severe changes to proteins are less frequent and more easily repaired. Dr. Nemzer hopes to use the knowledge gained from this research to improve our understanding of the risk factors for cancer and genetic disorders, as well as to trace the evolution of different genes across species.
Louis R. Nemzer. Shannon information entropy in the canonical genetic code.
Journal of Theoretical Biology 415 158–170 (2017) DOI: 10.1016/j.jtbi.2016.12.010
Louis R. Nemzer. A binary representation of the genetic code.
BioSystems 155 10–19 (2017) DOI: 10.1016/j.biosystems.2017.03.001