CUHK
News Centre
CUHK research team uses AI to solve evolutionary puzzle
Aerobic bacteria evolved oxygen respiration 2.7 billion years ago, far earlier than the Great Oxidation Event
A research team from the School of Life Sciences at The Chinese University of Hong Kong (CUHK) has combined machine learning with evolutionary analysis to trace when bacteria first evolved the ability to respire oxygen. Their study, which analysed over 80,000 bacterial genomes, finds that aerobic (oxygen-respiring) bacteria emerged roughly 2.7 billion years ago. This predates the Great Oxidation Event (GOE), the first major rise of oxygen in Earth’s atmosphere, by 200 to 400 million years. The results indicate that microbial life adapted to local oxygen sources long before the planet’s atmosphere transformed, contradicting the widely held belief that the ability to respire oxygen evolved only after atmospheric oxygen became abundant. The findings were published in the prestigious journal Proceedings of the National Academy of Sciences of the United States of America.
Determining when life began using oxygen has been a major challenge for scientists. Previous methods relied on detecting genes directly involved with oxygen, but often failed because of the incomplete genomic data typical of environmental bacteria. These “environmental genomes” are frequently pieced together from complex microbial communities and are naturally fragmented, making the traditional detection method unreliable.
To solve this, the research team led by Professor Luo Haiwei from the Simon F.S. Li Marine Science Laboratory and the School of Life Sciences at CUHK, in collaboration with CUHK’s Faculty of Medicine and the University of St Andrews, United Kingdom, developed a machine learning model that bypasses the need to find genes directly related to oxygen. Instead, the model was trained to identify aerobic bacteria based on a minimal set of 40 genes, most of which are involved in fundamental cellular processes such as energy metabolism and stress response rather than in handling oxygen directly. The new model remains accurate even when applied to fragmented environmental genomes.
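The idea of predicting a trait from a small panel of marker genes can be illustrated with a toy example. The following is a minimal sketch only, not the team's model: the scoring scheme (a presence-only log-odds score), the function names train_log_odds and predict_aerobe, the four-gene panel and the training data are all hypothetical and chosen for illustration. Each genome is represented as the set of marker genes observed in it, and only observed genes contribute to the score, so a fragment that is missing some markers can still be classified from the ones that remain.

```python
import math

def train_log_odds(genomes, labels, n_genes):
    """Return one weight per marker gene: the Laplace-smoothed log-odds of
    observing that gene in aerobes versus anaerobes (hypothetical scheme)."""
    aerobes = [g for g, is_aerobe in zip(genomes, labels) if is_aerobe]
    anaerobes = [g for g, is_aerobe in zip(genomes, labels) if not is_aerobe]
    weights = []
    for gene in range(n_genes):
        p_aer = (sum(gene in g for g in aerobes) + 1) / (len(aerobes) + 2)
        p_ana = (sum(gene in g for g in anaerobes) + 1) / (len(anaerobes) + 2)
        weights.append(math.log(p_aer / p_ana))
    return weights

def predict_aerobe(genome, weights):
    """Sum the weights of the genes that were observed; unobserved genes
    add nothing, so missing data weakens the score without flipping it."""
    return sum(weights[gene] for gene in genome) > 0

# Invented training data: each genome is the set of marker genes found in it.
train_genomes = [{0, 1}, {0, 1, 3}, {0}, {2, 3}, {2}, {1, 2, 3}]
train_labels = [True, True, True, False, False, False]
weights = train_log_odds(train_genomes, train_labels, n_genes=4)

complete_genome = {0, 1, 3}   # all observed markers of a hypothetical aerobe
fragment = {1}                # the same genome with most markers missing
print(predict_aerobe(complete_genome, weights))
print(predict_aerobe(fragment, weights))
```

In this sketch, robustness to fragmentation comes from ignoring unobserved genes rather than treating them as absent. How the published model achieves its robustness is not described here.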
The team applied the model to a vast genome database and mapped the results onto the bacterial family tree. They then used molecular clock dating, a technique that estimates when species diverged based on the rate at which genetic material changes, to trace the evolutionary history of oxygen respiration in bacteria. “Our molecular clock dating shows a clear signal: aerobic bacteria existed in localised environments long before the GOE,” said Professor Luo. “This means that for hundreds of millions of years, these early oxygen-respiring microbes were living in ‘oases’ of oxygen, likely produced by early oxygen-releasing bacteria or by biologically independent processes, while the rest of the planet remained largely oxygen-free.”
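The principle behind molecular clock dating can be sketched in its simplest textbook form, the strict clock, in which two lineages accumulate changes at the same constant rate, so the time since they diverged is the genetic distance between them divided by twice the rate. The function name divergence_time and the numbers below are illustrative assumptions, not values from the study, and real analyses typically allow rates to vary across the tree and calibrate them against independent evidence.

```python
def divergence_time(distance, rate):
    """Strict molecular clock estimate of time since two lineages split.

    distance: substitutions per site separating the two lineages
    rate: substitutions per site per million years along one lineage
    Returns the divergence time in millions of years.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both lineages have been changing since the split, hence the factor 2.
    return distance / (2.0 * rate)

# Invented inputs for illustration only.
example_distance = 0.54   # substitutions per site
example_rate = 1e-4       # substitutions per site per million years
age_my = divergence_time(example_distance, example_rate)
print(f"Estimated divergence: about {age_my:.0f} million years ago")
```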
The study indicates that, after their initial appearance, aerobic bacteria remained relatively rare until their numbers expanded dramatically during the GOE. A second major expansion occurred much later, about 800-550 million years ago, aligning with another significant period of rising atmospheric oxygen levels, known as the Neoproterozoic Oxygenation Event (NOE).
This research demonstrates how machine learning can predict traits in modern microbes and how, when combined with evolutionary tree analysis, these predictions can reliably infer the characteristics of ancient ancestral life. The team’s work also sets a precedent for using machine learning as a tool to uncover deep-time evolutionary events, paving the way for more studies in molecular evolution.
The full research article, Non-canonical Genetic Markers Resolve the Pre-GOE Emergence of Aerobic Bacteria in Earth’s History, is available here: https://www.pnas.org/doi/10.1073/pnas.2515709123
Figure 1. Evolutionary timeline of bacterial oxygen respiration.
This circular family tree illustrates when different groups of bacteria first appeared and whether they could respire oxygen (aerobes) or not (anaerobes). The inner colour ring indicates the predicted oxygen requirement, while the outer ring shows major bacterial groups. Time is shown in billions of years ago (Ga), running from the centre (the distant past) outward (towards the present). Pie charts at branching points represent the inferred presence of aerobic lineages.


