
The Gene by Siddhartha Mukherjee

A very informative read from which you will learn a lot, but the book is very long and I felt it could have been shorter in some areas. Mukherjee clearly knows a great deal about genetics and is a brilliant guy. He covers a very wide range of topics about the gene, including its history, its future, how it was discovered, the role it plays in our cells, the role it played in his own family, and more. I would highly recommend the book: it is probably the best book on the gene, and the author's very good writing style makes a difficult topic easy to read, although it is still a little challenging and you will need a dictionary at times. I wouldn't recommend it to someone who just wants to understand the basics of genetics, though, because the time investment wouldn't be worth it and there are much more time-efficient ways to do that.


This book is the story of the birth, growth, and future of one of the most powerful and dangerous ideas in the history of science: the “gene,” the fundamental unit of heredity, and the basic unit of all biological information.

Genes reside on chromosomes – long, filamentous structures buried within cells that contain tens of thousands of genes linked together in chains. Humans have forty-six such chromosomes in total – twenty-three from one parent and twenty-three from another. The entire set of genetic instructions carried by an organism is termed a genome (think of the genome as the encyclopedia of all genes, with footnotes, annotations, instructions, and references). The human genome contains between about twenty-one and twenty-three thousand genes that provide the master instructions to build, repair, and maintain humans.

Individuals within a species are constantly competing for scarce resources. When these resources form a critical bottleneck – during a famine, for instance – a variant better adapted for an environment is “naturally selected.” The best adapted – the “fittest” – survive (the phrase survival of the fittest was borrowed from the Malthusian economist Herbert Spencer). These survivors then reproduce to make more of their kind, thereby driving evolutionary change within a species.

Flocks of finches fed on fruit until their population exploded. A bleak season came upon the island – a rotting monsoon or a parched summer – and fruit supplies dwindled drastically. Somewhere in the vast flock, a variant was born with a grotesque beak capable of cracking seeds. As famine raged through the finch world, this gross-beaked variant survived by feeding on hard seeds. It reproduced, and new species of finch began to appear. The freak became the norm. As new Malthusian limits were imposed – diseases, famines, parasites – new breeds gained a stronghold, and the populations shifted again. Freaks became norms, and norms became extinct. Monster by monster, evolution advanced.

A child was an ancestral composite, but a supremely simple one: one-half from the mother, one-half from the father. Each parent contributed a set of instructions, which were decoded to create a child.

A gene was defined by what a gene does: it was a carrier of hereditary information.

The set of genetic instructions is called the genotype.

Theodosius Dobzhansky collects fruit flies and discovers that a fly species named Drosophila pseudoobscura has multiple gene variants. He takes two opposite gene variants of this species and puts equal numbers of each into a cold carton and a room-temperature carton for four months. One gene variant flourishes in the room-temperature carton and the opposite variant flourishes in the cold carton, supporting Darwin’s theory of natural selection.

Genotype + environment + triggers + chance = phenotype

This was the missing link in Darwin’s logic: reproductive incompatibility, ultimately derived from genetic incompatibility, drove the origin of novel species.

How could heat-killed bacterial debris – no more than a lukewarm soup of microbial chemicals – have transmitted a genetic trait to a live bacterium by mere contact? The simplest explanation, then, was that genetic information had passed between the two strains in a chemical form. During “transformation,” the gene that governed virulence – producing the smooth coat versus the rough coat – had somehow slipped out of the bacteria into the chemical soup, then out of that soup into live bacteria and become incorporated into the genome of the live bacterium. Genes could, in other words, be transmitted between two organisms without any form of reproduction.

Sugars provided energy. Fats stored it. Proteins enabled chemical reactions, speeding and controlling the pace of biochemical processes, thereby acting as the switchboards of the biological world.

Wilkins shows Watson an X-ray photograph of DNA, and Watson figures out the structure of DNA.

Each strand of DNA, recall, is a long sequence of “bases” – A, T, G, and C. The bases are linked together by the sugar-phosphate backbone. The backbone twists on the outside, forming a spiral. The bases face in, like treads in a circular staircase. The opposite strand contains the opposing bases: A matched with T and G matched with C. Thus, both strands contain the same information – except in a complementary sense: each is a “reflection,” or echo, of the other (the more appropriate analogy is a yin-and-yang structure). Molecular forces between the A:T and G:C pairs lock the two strands together, as in a zipper. A double helix of DNA can thus be envisioned as a code written with four alphabets – ATGCCCTACGGCCCATCG… – forever entwined with its mirror-image code.
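To make the pairing rule concrete, here is a minimal Python sketch that builds the partner strand for the example sequence quoted above. The names PAIRING and complement are my own, chosen for illustration; they are not from the book.

```python
# Base-pairing rule from the passage: A matches T, and G matches C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRING[base] for base in strand)


sequence = "ATGCCCTACGGCCCATCG"  # example sequence quoted in the passage
partner = complement(sequence)

print(sequence)
print(partner)

# Each strand is an "echo" of the other: pairing the partner
# again recovers the original strand.
print(complement(partner) == sequence)
```

Because the mapping is its own inverse, either strand is enough to rebuild the other, which is the sense in which both strands carry the same information. In a real double helix the partner strand also runs in the opposite direction, a detail this sketch leaves out.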

DNA --> RNA --> Protein or, at a conceptual level: Gene --> Message --> Function

From bacteria to elephants – from red-eyed flies to blue-blooded princes – biological information flowed through living systems in a systematic, archetypal manner: DNA provided instructions to build RNA. RNA provided instructions to build proteins. Proteins ultimately enabled structure and function – bringing genes to life.
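The flow from gene to message to function can be sketched in a few lines of Python. This is a toy under stated assumptions: the codon table lists only three of the 64 codons in the standard genetic code, the example gene is invented, and the function names transcribe and translate are mine.

```python
# Deliberately tiny codon table (RNA codon -> amino acid).
# The full genetic code has 64 codons; only three are listed here.
CODON_TABLE = {"ACU": "Thr", "CAU": "His", "GGU": "Gly"}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the message mirrors the coding strand, with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(rna: str) -> list[str]:
    """RNA -> protein: read the message three bases at a time."""
    usable = len(rna) - len(rna) % 3  # ignore any trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


gene = "ACTCATGGT"            # hypothetical nine-base gene
message = transcribe(gene)    # the RNA message
protein = translate(message)  # the chain of amino acids

print(gene, "->", message, "->", "-".join(protein))
```

A codon missing from the toy table raises a KeyError, which is a reasonable failure mode for a sketch this small.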

Even though every cell contains the same set of genes – an identical genome – the selective activation or repression of particular subsets of genes allows an individual cell to respond to its environment. The genome was an active blueprint – capable of deploying selected parts of its code at different times and in different circumstances. Proteins act as regulatory sensors, or master switches, in this process – turning on and turning off genes, or even combinations of genes, in a coordinated manner. Like the master score of a bewitchingly complex symphonic work, the genome contains the instructions for the development and maintenance of organisms. But the genomic “score” is inert without proteins. Proteins actualize this information – by activating or repressing genes (some of these regulatory proteins are also called transcription factors).
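One way to picture "same genome, different active genes" is a small lookup of switches. The sketch below is purely illustrative: the gene names, the factor names, and the rule that a gene is on only when all of its activators and none of its repressors are present are my assumptions, not anything stated in the book.

```python
# Hypothetical genome: every cell carries all three genes.
# Each gene lists the regulatory proteins that switch it on or off.
GENOME = {
    "gene_a": {"activators": {"factor_1"}, "repressors": set()},
    "gene_b": {"activators": {"factor_2"}, "repressors": {"factor_1"}},
    "gene_c": {"activators": {"factor_1", "factor_2"}, "repressors": set()},
}


def active_genes(factors_present: set[str]) -> set[str]:
    """Assumed rule: a gene is on when every activator is present
    and no repressor is present."""
    switched_on = set()
    for name, rules in GENOME.items():
        has_activators = rules["activators"] <= factors_present
        is_repressed = bool(rules["repressors"] & factors_present)
        if has_activators and not is_repressed:
            switched_on.add(name)
    return switched_on


# Same genome, three different sets of regulatory proteins.
print(sorted(active_genes({"factor_1"})))
print(sorted(active_genes({"factor_2"})))
print(sorted(active_genes({"factor_1", "factor_2"})))
```

The genome dictionary never changes between calls; only the set of proteins does, and that alone changes which genes are expressed.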

Units of hereditary information, encoded in DNA and packaged on chromosomes, are transmitted through sperm and egg into an embryo, and from the embryo to every living cell in an organism’s body. These units encode messages to build proteins – and the messages and proteins, in turn, enable the form and function of a living organism.

That humans and worms have about the same number of genes – around twenty thousand – and yet that only one of these two organisms is capable of painting the ceiling of the Sistine Chapel suggests that the number of genes is largely unimportant to the physiological complexity of the organism. “It is not what you have,” as a certain Brazilian samba instructor once told me, “it is what you do with it.”

Human physiology, by analogy, is the developmental consequence of certain genes intersecting with other genes in the right sequence, in the right space. A gene is one line in a recipe that specifies an organism. The human genome is the recipe that specifies a human.

In a conceptual sense, every virus is a professional gene carrier. Viruses have a simple structure: they are often no more than a set of genes wrapped inside a coat – a “piece of bad news wrapped in a protein coat.” When a virus enters a cell, it sheds its coat, and begins to use the cell as a factory to copy its genes, and manufactures new coats, resulting in millions of new viruses budding out of the cell. Viruses have thus distilled their life cycle to its bare essentials. They live to infect and reproduce; they infect and reproduce to live.

Genetics, like any language, is built out of basic structural elements – alphabet, vocabulary, syntax, and grammar. The “alphabet” of genes has only four letters: the four bases of DNA – A, C, G and T. The “vocabulary” consists of the triplet code: three bases of DNA are read together to encode one amino acid in a protein; ACT encodes Threonine, CAT encodes Histidine, GGT encodes Glycine, and so forth. A protein is the “sentence” encoded by a gene, using alphabets strung together in a chain (ACT-CAT-GGT encodes Threonine-Histidine-Glycine). And the regulation of genes, as Monod and Jacob had discovered, creates a context for these words and sentences to generate meaning. The regulatory sequences appended to a gene – i.e. signals to turn a gene on or off at certain times and in certain cells – can be imagined as the internal grammar of the genome.
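The triplet "vocabulary" is easy to try out. The sketch below uses the three DNA codons named in the passage and adds one more, GCT (alanine in the standard genetic code), to show how a single changed letter changes the word. The function names are my own.

```python
# DNA codons from the passage, plus GCT (alanine) for contrast.
DNA_CODONS = {
    "ACT": "Threonine",
    "CAT": "Histidine",
    "GGT": "Glycine",
    "GCT": "Alanine",
}


def read_codons(dna: str) -> list[str]:
    """Split a DNA string into three-letter words, ignoring hyphens."""
    letters = dna.replace("-", "")
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


def spell_protein(dna: str) -> str:
    """Look up each codon and join the amino acids into a 'sentence'."""
    return "-".join(DNA_CODONS[codon] for codon in read_codons(dna))


print(spell_protein("ACT-CAT-GGT"))
# One letter changed in the last codon (GGT -> GCT):
print(spell_protein("ACT-CAT-GCT"))
```

The first call spells Threonine-Histidine-Glycine; the second ends in Alanine instead, which is why a one-letter copying error can alter a protein.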

It is the impulse of science to try to understand nature, and the impulse of technology to try to manipulate it. Recombinant DNA had pushed genetics from the realm of science into the realm of technology. Genes were not abstractions anymore. They could be liberated from the genomes of organisms where they had been trapped for millennia, shuttled between species, amplified, purified, extended, shortened, altered, remixed, mutated, mixed, matched, cut, pasted, edited; they were infinitely malleable to human interventions. Genes were no longer just the subjects of study, but the instruments of study. There is an illuminated moment in the development of a child when she grasps the recursiveness of language: just as thoughts can be used to generate words, she realizes, words can be used to generate thoughts. Recombinant DNA had made the language of genetics recursive.

Nearly every drug works by binding to its target and enabling or disabling it – turning molecular switches on or off. To be useful, a drug must bind to its switches – but to only a selected set of switches; an indiscriminate drug is no different from a poison. Most molecules can barely achieve this level of discrimination – but proteins have been designed explicitly for this purpose. Proteins, recall, are the hubs of the biological world. They are the enablers and the disablers, the machinators, the regulators, the gatekeepers, the operators, of cellular reactions. They are the switches that most drugs seek to turn on and off.

Normal cells could acquire these cancer-causing mutations through four mechanisms. The mutations could be caused by environmental insults, such as tobacco smoke, ultraviolet light, or X-rays – agents that attack DNA and change its chemical structure. Mutations could arise from spontaneous errors during cell division (every time DNA is replicated in a cell, there’s a minor chance that the copying process generates an error – an A switched to a T, G, or C, say). Mutant cancer genes could be inherited from parents, thereby causing hereditary cancer syndromes such as retinoblastoma and breast cancer that coursed through families. Or the genes could be carried into the cells via viruses, the professional gene carriers of the microbial world.

The discovery of ten thousand new proteins, with more than ten thousand new functions, would have amply justified the novelty of the project – yet the most surprising feature of the worm genome was not protein-encoding genes, but the number of genes that made RNA messages, but no protein. These genes – called “noncoding” (because they do not encode proteins) – were scattered through the genome, but they clustered on certain chromosomes. There were hundreds of them, perhaps thousands. Some noncoding genes were of known function: the ribosome, the giant intracellular machine that makes proteins, contains specialized RNA molecules that assist in the manufacture of proteins. Other noncoding genes were eventually found to encode small RNAs – called micro-RNAs – which regulate genes with incredible specificity. But many of these genes were mysterious and ill defined.

When Mendel discovered the “gene” in 1865, he knew it only as an abstract phenomenon: a discrete determinant, transmitted intact across generations, that specified a single visible property or phenotype, such as flower color or seed texture in peas. Morgan and Muller deepened this understanding by demonstrating that genes were physical – material – structures carried on chromosomes. Avery advanced this understanding of genes by identifying the chemical form of that material: genetic information was carried in DNA. Watson, Crick, Wilkins, and Franklin solved its molecular structure as a double helix, with two paired, complementary strands.

Comparisons between human, worm, and fly genes revealed several provocative patterns. Of the 289 human genes known to be involved with a disease, 177 genes – more than 60% – had a related counterpart in the fly. There were no genes for sickle-cell anemia or hemophilia – flies do not have red blood cells or form clots – but genes involved in colon cancer, breast cancer, Tay-Sachs disease, muscular dystrophy, cystic fibrosis, Alzheimer’s disease, Parkinson’s disease, and diabetes, or close counterparts of those genes, were present. Although separated by four legs, two wings, and several million years of evolution, flies and humans shared core pathways and genetic networks.

Published as a book with a standard-size font, [the human genome] would contain just four letters… AGCTTGCAGGGG… and so on, stretching, inscrutably, page upon page, for over 1.5 million pages – sixty-six times the size of the Encyclopedia Britannica.

[The human genome] is divided into twenty-three pairs of chromosomes – forty-six in all – in most cells in the body. All other apes, including gorillas, chimpanzees, and orangutans, have twenty-four pairs. At some point in hominid evolution, two medium-size chromosomes in some ancestral ape fused to form one. The human genome departed cordially from the ape genome several million years ago, acquiring new mutations and variations over time. We lost a chromosome, but gained a thumb.

[The human genome] encodes about 20,687 genes in total – only 1,796 more than worms, 12,000 fewer than corn, and 25,000 fewer genes than rice or wheat. The difference between “human” and “breakfast cereal” is not a matter of gene numbers, but of the sophistication of gene networks. It is not what we have; it is how we use it.
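The comparisons in that passage imply approximate gene counts for the other organisms. A quick Python check of the arithmetic, taking the quoted differences at face value (the corn and cereal figures are clearly rounded, so the results are rough):

```python
HUMAN_GENES = 20_687

# Difference relative to humans, as quoted in the passage.
# Negative means the organism has fewer genes than humans.
DIFFERENCE_FROM_HUMAN = {
    "worm": -1_796,
    "corn": 12_000,
    "rice or wheat": 25_000,
}

implied_counts = {
    organism: HUMAN_GENES + difference
    for organism, difference in DIFFERENCE_FROM_HUMAN.items()
}

for organism, count in implied_counts.items():
    print(f"{organism}: about {count:,} genes")
```

By these figures a worm has roughly 18,900 genes and rice or wheat roughly 45,700, which is the point of the passage: gene count alone does not track complexity.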

Mutations in mitochondrial genes are passed intact across generations, and they accumulate over time without crossing over, making the mitochondrial genome an ideal genetic timekeeper… Living humans are endowed with the evolutionary history of our species in our genomes. It is as if we permanently carry a photograph of each of our ancestors in our wallets.

Calculating backward, the age of humans was estimated to be about two hundred thousand years – a minor blip, a tick tock, in the scale of evolution.

Modern humans appear to have emerged exclusively from a rather narrow slice of earth, somewhere in sub-Saharan Africa, about one hundred to two hundred thousand years ago, and then migrated northward and eastward to populate the Middle East, Europe, Asia and the Americas.

Consider the genesis of a single-celled embryo produced by the fertilization of an egg by a sperm. The genetic material of this embryo comes from two sources: paternal genes (from sperm) and maternal genes (from eggs). But the cellular material of the embryo comes exclusively from the egg; the sperm is no more than a glorified delivery vehicle for male DNA – a genome equipped with a hyperactive tail. Aside from proteins, ribosomes, nutrients, and membranes, the egg also supplies the embryo with specialized structures called mitochondria. These mitochondria are the energy-producing factories of the cell; they are so anatomically discrete and so specialized in their function that cell biologists call them “organelles” – i.e., mini-organs resident within cells.

A number of studies have tried to quantify the level of genetic diversity of the human genome. The most recent estimates suggest that the vast proportion of genetic diversity (85 to 90 percent) occurs within so-called races (i.e., within Asians or Africans) and only a minor proportion (7 percent) between racial groups. Some genes certainly vary sharply between racial or ethnic groups – sickle-cell anemia is an Afro-Caribbean and Indian disease, and Tay-Sachs disease has a much higher frequency in Ashkenazi Jews – but for the most part, the genetic diversity within any racial group dominates the diversity between racial groups – not marginally, but by an enormous amount. This degree of intraracial variability makes “race” a poor surrogate for nearly any feature: in a genetic sense, an African man from Nigeria is so “different” from another man from Namibia that it makes little sense to lump them into the same category.

You can use a genome to predict where X or Y came from. But, knowing where A or B came from, you can predict little about the person’s genome. Or: every genome carries a signature of an individual’s ancestry – but an individual’s racial ancestry predicts little about the person’s genome. You can sequence DNA from an African-American man and conclude that his ancestors came from Sierra Leone or Nigeria. But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man. The geneticist goes home happy, the racist returns empty-handed.

If you superpose poverty, hunger, and illness on a child, then these variables dominate the influence on IQ. Genes that control IQ only become significant if you remove these limitations. It is easy to demonstrate an analogous effect in a lab: if you raise two plant strains – one tall and one short – in undernourished circumstances, then both plants grow short regardless of intrinsic genetic drive. In contrast, when nutrients are no longer limiting, the tall plant grows to its full height. Whether genes or environment – nature or nurture – dominates in influence depends on context. When environments are constraining, they exert a disproportionate influence. When the constraints are removed, genes become ascendant.
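The plant example can be written as a one-line model: the height a plant reaches is capped by whichever limit is lower, its genetic potential or what its environment allows. This is my own simplification for illustration, and every number below is hypothetical.

```python
def grown_height(genetic_potential_cm: float, nutrient_ceiling_cm: float) -> float:
    """Assumed model: growth stops at the lower of the two limits."""
    return min(genetic_potential_cm, nutrient_ceiling_cm)


# Hypothetical strains and environments (heights in centimetres).
STRAINS = {"tall strain": 100.0, "short strain": 60.0}
ENVIRONMENTS = {"undernourished": 40.0, "well fed": float("inf")}

for environment, ceiling in ENVIRONMENTS.items():
    heights = {
        strain: grown_height(potential, ceiling)
        for strain, potential in STRAINS.items()
    }
    print(environment, heights)
```

In the undernourished case both strains stop at the same height, so the environment explains everything; once the ceiling is lifted, the whole difference between the plants is genetic.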

When a gene variant reduces an organism’s fitness in a particular environment – a hairless man in Antarctica – we call the phenomenon genetic illness. When the same variant increases fitness in a different environment, we call the organism genetically enhanced. The synthesis of evolutionary biology and genetics reminds us that these judgements are meaningless: enhancement or illness are words that measure the fitness of a particular genotype to a particular environment; if you alter the environment, the words can even reverse their meanings. “When nobody read,” the psychologist Alison Gopnik writes, “dyslexia wasn’t a problem. When most people had to hunt, a minor genetic variation in your ability to focus attention was hardly a problem, and may even have been an advantage [enabling a hunter to maintain his focus on multiple and simultaneous targets, for instance]. When most people have to make it through high school, the same variation can become a life-altering disease.”

When a distinct, heritable biological feature, such as a genetic illness (e.g., sickle-cell anemia), is the ascendant concern, then examining the genome to identify the locus of that feature makes absolute sense. The narrower the definition of the heritable feature or the trait, the more likely we will find a genetic locus for that trait, and the more likely that the trait will segregate within some human subpopulation (Ashkenazi Jews in the case of Tay-Sachs disease, or Afro-Caribbeans for sickle-cell anemia). There’s a reason that marathon running, for instance, is becoming a genetic sport: runners from Kenya and Ethiopia, a narrow eastern wedge of one continent, dominate the race not just because of talent and training, but also because the marathon is a narrowly defined test for a certain form of extreme fortitude. Genes that enable this fortitude (e.g., particular combinations of gene variants that produce distinct forms of anatomy, physiology, and metabolism) will be naturally selected.

Stevens’s work was corroborated by that of her close collaborator, the cell biologist Edmund Wilson, who simplified Stevens’s terminology, calling the male chromosome Y, and the female X. In chromosomal terms, male cells were XY, and females were XX. The egg contains a single X chromosome, Wilson reasoned. When a sperm carrying a Y chromosome fertilizes an egg, it results in an XY combination, and maleness is determined. When a sperm carrying an X chromosome meets a female egg, the result is XX, which determines femaleness. Sex was not determined by right or left testicles, but by a similarly random process – by the nature of the genetic payload of the first sperm to reach and fertilize an egg.
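Wilson's reasoning amounts to listing the possible pairings of egg and sperm. A minimal Python sketch of that enumeration, with variable and function names of my own choosing:

```python
from itertools import product

EGG_CHROMOSOMES = ["X"]         # per the passage, the egg always carries an X
SPERM_CHROMOSOMES = ["X", "Y"]  # a sperm carries either an X or a Y


def chromosomal_sex(pair: str) -> str:
    """XY is male and XX is female, as described in the passage."""
    return "male" if "Y" in pair else "female"


outcomes = {
    egg + sperm: chromosomal_sex(egg + sperm)
    for egg, sperm in product(EGG_CHROMOSOMES, SPERM_CHROMOSOMES)
}

print(outcomes)
```

Only the sperm's contribution varies, so it alone decides which of the two outcomes occurs.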

What we call gender, then, is an elaborate genetic and developmental cascade, with SRY at the tip of the hierarchy, and modifiers, integrators, instigators, and interpreters below. This geno-developmental cascade specifies gender identity. To return to an earlier analogy, genes are single lines in a recipe that specifies gender. The SRY gene is the first line in the recipe: “Start with four cups of flour.” If you fail to start with the flour, you will certainly not bake anything close to a cake. But infinite variations fan out of that first line – from the crusty baguette of a French bakery to the eggy mooncakes of Chinatown.

Why do identical twins raised in identical homes and families end up with different lives and become such different beings? Why do identical genomes become manifest in such dissimilar personhoods, with nonidentical temperaments, personalities, fates, and choices? … What causes the difference? Forty-three studies, performed over two decades, have revealed a powerful and consistent answer: “unsystematic, idiosyncratic, serendipitous events.” Illnesses. Accidents. Traumas. Triggers. A missed train; a lost key; a suspended thought. Fluctuations in molecules that cause fluctuations in genes, resulting in slight alterations in forms. Rounding a bend in Venice and falling into a canal. Falling in love. Randomness. Chance.

“Why are twins different?” we had asked earlier. Well, because idiosyncratic events are recorded through idiosyncratic marks in their bodies. But “recorded” in what manner? Not in the actual sequence of genes: if you sequence the genomes of a pair of identical twins every decade for fifty years, you get the same sequence over and over again. But if you sequence the epigenomes of a pair of twins over the course of several decades, you find substantial differences: the pattern of methyl groups attached to the genomes of blood cells or neurons is virtually identical between the twins at the start of the experiment, begins to diverge slowly over the first decade, and becomes substantially different over fifty years. Chance events – injuries, infections, infatuations; the haunting trill of that particular nocturne; the smell of that particular madeleine in Paris – impinge on one twin and not the other. Regulatory proteins turn genes “on” and “off” in response to these events, and epigenetic marks are gradually layered above genes. How these epigenetic marks functionally impact the activity of genes remains to be determined – but some experiments suggest that these marks, in conjunction with transcription factors, can help orchestrate the activity of genes.

Following the reports of the T cell trials at the NIH, gene therapists envisaged novel cures for genetic diseases such as cystic fibrosis and Huntington’s disease. Since genes could be delivered into virtually any cell, any cellular disease was a candidate for gene therapy: heart disease, mental illness, cancer.

The anti-determinists want to say that DNA is a little side-show, but every disease that’s with us is caused by DNA. And [every disease] can be fixed by DNA. – George Church

The future of human genetics would be built on two fundamental elements. The first was “genetic diagnosis” – the idea that genes could be used to predict or determine illness, identity, choice, and destiny. The second was “genetic alteration” – that genes could be changed to change the future of diseases, choice, and destiny.

Computational algorithms could determine the probability of the development of heart disease or asthma or sexual orientation and assign a level of relative risk for various fates to each genome. The genome will thus be read not in absolutes, but in likelihoods – like a report card that does not contain grades but probabilities, or a resume that does not list past experiences but future propensities. It will become a manual of previvorship.
