
The minimum energy required to build a cell – Scientific Reports

Establishing a standardised methodology for calculating the minimum energetic requirements of cellular biosynthesis, one that defines the most efficient metabolic pathways across diverse cell types, can provide insights into the energetic constraints of life in different environments and guide research in astrobiology, cellular biology, and biotechnology. This work offers a comprehensive, data-driven approach to elucidating these minimum requirements. The model calculates organism- and environment-specific energy requirements, extending other approaches that rely on model organisms, highly specific applications, or generalisations across microbial communities. We have shown that, per gram of dry weight, mammalian cells, S. cerevisiae, E. coli, and even the ‘minimal cell’ JCVI-syn3A have similar minimum energetic costs of biosynthesis.

It is important to note that the “energy required for cellular synthesis” and the “minimum energy necessary for cellular synthesis” should be interpreted differently. The former refers to the typical energy expended in practice, accounting for specific metabolic pathways, environmental influences and biological inefficiencies. The latter, this study’s focus, represents optimal conditions, yielding the lowest possible energy required to build a cell. While the in-practice energy reflects typical cellular operations and can be influenced by possible metabolic heterotrophic inefficiencies or the availability of partially constructed carbon sources, the minimum energy is a foundational, agnostic reference for highly efficient metabolisms. One method of quantifying the efficiency of metabolism is to calculate a Gibbs energy dissipation rate (in kJ (g cells)\(^{-1}\) h\(^{-1}\)). This parameterises the energy that is not utilised in metabolism and is lost as entropy, heat, or through other inefficiencies. It varies with growth rate and appears to plateau at high growth rates40. As Synercell is integrated into microbial growth models, it may be used in the future to examine how much dissipation is caused by the difference between the two synthesis energies described above.

Historical research on biosynthesis has predominantly centred on cell maintenance energy, biomass synthesis energy derived from metabolic models41, ATP requirements for specific synthesis pathways5, energy dynamics within chemolithotrophic communities6, or generalised biomass synthesis from inorganic precursors based on fixed stoichiometries7. The approach presented here instead provides insights into a lower thermodynamic floor for the energy required to build a cell, which could shed light on the ultimate biophysical limit of efficiency for microbial growth. It uses a wider variety of omics data than the approaches listed above, and these data play a vital role in lending specificity and variability to the biomolecules under consideration. The variability in composition and size given by the input sequences is adjusted to fix the reactant and product concentration pools. Thus, the model can be deployed for any well-sequenced species, yielding cell-specific biosynthesis energy requirements for application in biogeosciences, cellular biology, biotechnology, or astrobiology.

Our minimum energy necessary for cellular synthesis can be tentatively compared with other estimates that used the various approaches above. Some comparisons for E. coli are listed in Table 2. The synthesis energies computed in this work for E. coli are significantly lower than these other estimates. This is due to the methodological differences between the studies, and to our goal in this work of finding a fundamental thermodynamic minimum. The largest difference is between this work and the estimates from Lynch and Marinov33,34. For each cell component, these estimates are \(\sim\)20–40 times larger than our predictions. This is most likely because the Lynch and Marinov estimates include smaller building blocks than Synercell and because that model is tied to empirical data. For example, the majority of the Lynch and Marinov33 synthesis energy ATP cost is associated with building nucleotides, with polymerisation a minor contributor (on the order of a percent)33, whereas Synercell focuses on building the helix structure. This likely accounts for one portion of the discrepancy, with the remainder associated with alternative inefficiencies such as the residual energy loss described above. McCollom and Amend6 suggest that the actual observed energy expended on growth processes is approximately an order of magnitude larger than the thermodynamic cost of synthesising the constituent building blocks, and their column in Table 2 only characterises that process, not polymerisation. If the McCollom and Amend6 column reflects building block synthesis and Synercell represents the polymerisation cost, then the remainder between the sum of these and the Lynch and Marinov33,34 column may represent the cellular inefficiency in biomass production in nature. Higgins and Cockell4 calculated that, for proteins at \(\approx\)25 \(^{\circ}\)C, the synthesis of amino acids from organic precursors and the cost of polymerisation are approximately 700 and 500 J (g proteins)\(^{-1}\) respectively. However, these costs begin to diverge with increasing temperature, and amino acid synthesis becomes more energy intensive (\(\approx\)4 times more expensive at 100 \(^{\circ}\)C)4. The Synercell proteome polymerisation estimate for E. coli at 25 \(^{\circ}\)C is 372 J (g proteins)\(^{-1}\), in broad agreement with Higgins and Cockell (\(\approx\)500 J (g proteins)\(^{-1}\))4 and Amend et al. (347 J (g proteins)\(^{-1}\))8. To our knowledge, this is the first application of a GCA to DNA, RNA, and phospholipid polymerisation, so it is difficult to verify these results against other studies. Similar relationships are observed between the different components and the other studies noted above and in Table 2.

In Table 2, synthesis energies are also presented in mmol of ATP per gram of cells in order to compare with other empirically validated results33,34. However, the ATP energy yield is influenced by internal cell concentrations and physicochemical parameters like temperature and pressure, which can vary significantly among different organisms20 and even within the same organism under different growth states. Consequently, while comparing ATP costs is a prevalent approach in the literature, it does not always provide a straightforward comparison because of these variable internal and environmental factors. Our analysis, therefore, treats these ATP cost estimates as part of a broader, context-dependent framework rather than as absolute values for direct comparison. As such, conversion between the units of the studies in Table 2 may account for some of the discrepancy in synthesis energy values.
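To make the unit-conversion caveat concrete, the following is a minimal sketch, not the procedure used in this work, of how the same synthesis energy maps to different ATP equivalents depending on the assumed Gibbs energy of ATP hydrolysis. The function names, the two concentration sets, and the reuse of the textbook biochemical standard value of −30.5 kJ mol\(^{-1}\) are illustrative assumptions; only the 372 J (g proteins)\(^{-1}\) figure comes from the text.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def atp_hydrolysis_dg(temp_k, atp_m, adp_m, pi_m, dg0_kj_per_mol=-30.5):
    """Gibbs energy of ATP -> ADP + Pi in kJ/mol at given concentrations (mol/L).

    ASSUMPTION: dg0_kj_per_mol is the textbook biochemical standard value
    (about 25 degC, pH 7). Reusing it at other temperatures is a simplification.
    Activities are approximated by concentrations.
    """
    q = (adp_m * pi_m) / atp_m  # reaction quotient
    return dg0_kj_per_mol + R_KJ * temp_k * math.log(q)


def joules_to_mmol_atp(energy_j_per_g, dg_atp_kj_per_mol):
    """Express an energy in J/g as mmol ATP/g (1 kJ/mol equals 1 J/mmol)."""
    return energy_j_per_g / abs(dg_atp_kj_per_mol)


# E. coli proteome polymerisation estimate quoted in the text, J (g proteins)^-1
energy = 372.0

# HYPOTHETICAL intracellular states (mol/L), chosen only for illustration
states = {
    "ATP-rich": (5e-3, 0.5e-3, 10e-3),
    "ATP-poor": (1e-3, 1e-3, 10e-3),
}

results = {}
for label, (atp, adp, pi) in states.items():
    dg = atp_hydrolysis_dg(298.15, atp, adp, pi)
    results[label] = (dg, joules_to_mmol_atp(energy, dg))
    print(f"{label}: dG_ATP = {dg:.1f} kJ/mol, cost = {results[label][1]:.2f} mmol ATP/g")
```

Under these assumed states the same energy corresponds to noticeably different ATP counts, which is why the ATP columns are best read as context-dependent rather than absolute.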

Our approach to estimating the minimal energy requirements for cell synthesis is an alternative to using biomass compositions derived from flux balance analysis (FBA). FBA, a well-recognised method for studying metabolic networks, often involves challenges in accurately capturing the stoichiometry of biomass reactions, a point highlighted by recent studies42,43. These challenges stem from the difficulty in obtaining detailed experimental data for all major biomass components, compounded by the variability and complexity of metabolic networks. To sidestep the inherent uncertainty in the biomass reaction stoichiometry used in FBA models, we instead introduce variability in size and composition by reading input sequences. While FBA models are critical for understanding cellular metabolism, they often focus on the growth-associated maintenance (GAM) demand of ATP, making it hard to isolate the minimum energy necessary to synthesise these components. Reported values for E. coli cell synthesis calculated with FBA include 23 mmol (g cells)\(^{-1}\)44, 59.81 mmol (g cells)\(^{-1}\)41, 53.81 mmol (g cells)\(^{-1}\)45 and 75.38 mmol (g cells)\(^{-1}\)46, which are larger than our estimates. As above, this difference likely reflects cellular inefficiencies and complexities such as GAM and energy dissipation. In contrast, Synercell aims to provide an energetic baseline while staying flexible to any cell with known genome and proteome sequences. This approach is particularly advantageous for analysing cells with less characterised metabolic networks, where detailed experimental data for biomass composition are unavailable.

Results for the JCVI-syn3A cell model can also be compared to some other calculations, albeit in a more limited way than for E. coli. JCVI-syn3A is an interesting case study because it was engineered to function as a ‘minimal cell’24. This makes it an ideal example to probe the fundamental minimal energy necessary to synthesise a cell. On a per-cell basis, the minimum synthesis energy of JCVI-syn3A was the lowest amongst our sample of four, but it was also the smallest cell, so that result alone is of limited significance. On a per-gram basis, all four organisms examined in this work have a similar minimum synthesis cost, and any differences are likely caused primarily by differences in internal cell composition, and secondarily by genome and proteome complexity. Breuer et al.24 provide some estimates of the ATP requirement to synthesise JCVI-syn3A DNA, RNA, and proteins (0.24, 0.14, and 21.2 mmol ATP respectively), but those are based on E. coli-like synthesis costs, so direct applicability to this organism is limited. Synercell results were generated with the JCVI-syn3A internal metabolite composition, and genome and proteome sequences.

In this work, we have expanded the scope of existing methodologies for peptide synthesis19,20 to include the energy calculations for a cell’s DNA, RNA, and lipid content. This was not possible for carbohydrates owing to the extensive diversity in carbohydrate structures, their varied functional roles across different organisms, the lack of a standardised structural description47 and the limited availability of thermodynamic data. The vast heterogeneity in carbohydrates implies that no single structure can adequately represent all cell types. Instead, we adopted an average value approach for the carbohydrate content, as has been previously done for all non-proteome components4,8,19. This approach allowed us to integrate carbohydrates into our whole-cell calculations, ensuring a more comprehensive and representative model, albeit with an acknowledgement of the simplifications necessitated by the complexity of carbohydrate diversity.

Furthermore, our model employs POPC as a representative phospholipid to approximate the energetic costs associated with membrane synthesis. While POPC is a prevalent component in many cell types, this simplification limits how fully the model captures the energetic nuances of synthesising more complex cell membranes. Cell membranes comprise a rich mixture of lipid species and proteins, and in this model the protein part is calculated by the proteome algorithm. Approximately 30% of proteins in a cell are in the membrane48. To approximate the membrane value more closely, we therefore take the energetic cost of this protein component to be approximately 30% of the proteome’s value, i.e. 58.57 J/(g cells) or 1.76\(\times 10^{-11}\) J/cell for E. coli. The lipid bilayer cost is 69.95 J/(g cells) or 2.10\(\times 10^{-11}\) J/cell, giving a total of 128.53 J/(g cells) or 3.86\(\times 10^{-11}\) J/cell for an E. coli membrane (values from Table 2). Table 2 also summarises similar estimates of the cell constituents of E. coli from other studies using slightly different methods and chemical environments.
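The membrane arithmetic above can be reproduced in a few lines. This is only a bookkeeping sketch using the rounded values quoted in the text; the variable names are hypothetical, and the implied whole-proteome cost and per-cell dry mass are back-calculated here rather than taken from the source.

```python
# Values quoted in the text for E. coli (rounded as reported)
MEMBRANE_PROTEIN_FRACTION = 0.30      # share of proteins located in the membrane
membrane_protein_j_per_g = 58.57      # J/(g cells), membrane protein component
membrane_protein_j_per_cell = 1.76e-11
lipid_j_per_g = 69.95                 # J/(g cells), POPC bilayer
lipid_j_per_cell = 2.10e-11

# Total membrane cost = membrane protein component + lipid bilayer
total_j_per_g = membrane_protein_j_per_g + lipid_j_per_g
total_j_per_cell = membrane_protein_j_per_cell + lipid_j_per_cell

# Back-calculated quantities (NOT stated in the source, derived for illustration)
implied_proteome_j_per_g = membrane_protein_j_per_g / MEMBRANE_PROTEIN_FRACTION
implied_cell_mass_g = lipid_j_per_cell / lipid_j_per_g

print(f"membrane total: {total_j_per_g:.2f} J/(g cells), {total_j_per_cell:.2e} J/cell")
print(f"implied proteome cost: {implied_proteome_j_per_g:.1f} J/(g cells)")
print(f"implied dry mass per cell: {implied_cell_mass_g:.2e} g")
```

Summing the rounded inputs reproduces the reported total to within rounding in the last digit, since the source total was presumably computed from unrounded values.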

Our model stands out due to its adaptability. It can be refined with additional thermodynamic and omics data, allowing for species-specific energy estimates. However, since the model requires biomolecule sequences as input, it can only perform sequence-based calculations for DNA, RNA and proteins. Despite previous efforts to sequence phospholipids and carbohydrates47,49, there is still a lack of standardised methodologies and data for these biomolecules. Therefore, our model includes only one ‘hand-made’ generic model for each of these biomolecule types. Consequently, since accurately calculating the minimum energy needed to synthesise a cell requires more thermodynamic information for phospholipids and carbohydrates, we provide an open-source tool for different applications that can be updated as data become available.

In the development of this model, we evaluated two key aspects: (1) the accuracy of the GCA in constructing a biomolecule’s \(\Delta G_f^{\circ}\), and (2) the use of different \(\Delta G_f^{\circ}\) standards for the building blocks (Table 4). First, we built the nucleotides in two ways: phosphate + deoxyribose + adenine (block method 1) and phosphate + deoxyadenosine (block method 2) (Supplementary Fig. S1). We obtained similar results when comparing the \(\Delta G_r\) obtained with the different methodologies and the standard nucleotide’s \(\Delta G_f^{\circ}\) from the SUPCRT slop07 database50. Furthermore, we tested this method for chemical bonds and validated the results with experimental data (Supplementary Fig. S2), indicating this is a reliable method. Secondly, we examined thermodynamic and biological standards \(\Delta G_f^{\circ}\) for the building blocks18 to ensure consistency in results. Each standard estimated the same \(\Delta G_r\), likely due to the lack of H\(^+\) in the overall reaction and our assumption that ionic strength is close to zero20. Furthermore, although the only physicochemical parameter considered here was temperature, the models could be corrected for chemical differences by considering the corresponding change in cellular content, if any. Our models show a broad floor when compared to results from other studies which examine a variety of chemical environments4,5,6,19.
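One way to see why the two standards can give the same \(\Delta G_r\) is sketched below. It assumes the conventional Legendre-transform definition of the biochemical (transformed) standard state at zero ionic strength with \(\Delta G_f^{\circ}(\mathrm{H^+}) = 0\); the symbols \(\nu_i\) (signed stoichiometric coefficient) and \(N_{\mathrm{H},i}\) (number of hydrogen atoms in species \(i\)) are introduced here for illustration and may not match the notation used elsewhere in this work.

```latex
\begin{align}
\Delta G_r^{\circ} &= \sum_i \nu_i \, \Delta G_{f,i}^{\circ} \\
\Delta G_{f,i}^{\prime\circ} &= \Delta G_{f,i}^{\circ}
  + N_{\mathrm{H},i} \, RT \ln(10) \, \mathrm{pH} \\
\Delta G_r^{\prime\circ} &= \sum_i \nu_i \, \Delta G_{f,i}^{\prime\circ}
  = \Delta G_r^{\circ} + RT \ln(10) \, \mathrm{pH} \sum_i \nu_i N_{\mathrm{H},i}
\end{align}
```

If H\(^+\) neither appears as a reactant nor as a product, hydrogen balance over the listed species requires \(\sum_i \nu_i N_{\mathrm{H},i} = 0\), so the pH-dependent term vanishes and \(\Delta G_r^{\prime\circ} = \Delta G_r^{\circ}\). At non-zero ionic strength an additional charge-dependent correction would enter, which is why the near-zero ionic strength assumption matters for this equality.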

In the future, our model could benefit from integrating more variables. For example, variations in internal pH among organisms can influence cellular composition stability51,52. Environmental shifts can also affect energy consumption in biomacromolecule synthesis. Higgins and Cockell4 showed that rising temperatures make amino acid synthesis a leading energy expense in protein formation. Additionally, McCollom and Amend6 found that anaerobic conditions are more conducive to building block synthesis than aerobic ones due to specific oxidation states. In anaerobic settings, the altered oxidation state affects the concentrations of crucial dissolved compounds, influencing biomolecule synthesis. Moreover, we have utilised cell compositions from diverse sources to approximate an average value, aiming to represent a broad spectrum of growth phases. While this approach provides a broad overview, future studies may benefit from analysing cell composition in specific growth phases to assess dynamic changes in energetic costs and maintenance requirements. This would enhance the granularity of our analysis and allow us to examine how changes in the absolute internal cell composition (both reactants and products) affect the overall energetic cost of cell synthesis.

When the model is deployed for analyses of microbial communities in situ, local instantaneous geochemical data should be leveraged to correct internal cell concentrations and their effect on the present biosynthesis calculations, unlocking faster and more robust biomass turnover calculations than are currently possible20. Typically, the biological data which serve as input parameters for microbial models are inferred from culture-based studies, which are controlled and well-defined but may be time-consuming to perform. The model presented here depends only on the omics data of a given organism and its internal composition, so only the latter needs to be updated using insights into the local geochemistry to generate site-specific energetic requirements of biomass synthesis. This could additionally be extended to analyses of habitability and growth through deep time, and to model how the energy requirement changes with the environment53.
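The effect of updating internal concentrations can be illustrated with the general relation \(\Delta G_r = \Delta G_r^{\circ} + RT \ln Q\). The sketch below is a generic thermodynamic calculation, not Synercell's code: the function name, the placeholder reaction A + B → AB + H2O, its standard Gibbs energy, and both concentration sets are hypothetical, and \(\Delta G_r^{\circ}\) is treated as independent of temperature for simplicity.

```python
import math

R = 8.314462618  # gas constant in J mol^-1 K^-1


def reaction_gibbs(dg0_j_per_mol, temp_k, reactants, products):
    """Return dG_r = dG0 + R*T*ln(Q) in J/mol.

    reactants and products are lists of (activity, stoichiometric coefficient).
    Activities are approximated by molar concentrations (water activity = 1).
    """
    ln_q = sum(nu * math.log(a) for a, nu in products)
    ln_q -= sum(nu * math.log(a) for a, nu in reactants)
    return dg0_j_per_mol + R * temp_k * ln_q


# HYPOTHETICAL condensation reaction A + B -> AB + H2O
DG0 = 10_000.0  # J/mol, placeholder value, not from the source
T = 298.15      # K

# Culture-derived internal concentrations (hypothetical)
culture = reaction_gibbs(DG0, T,
                         reactants=[(1e-3, 1), (1e-3, 1)],
                         products=[(1e-4, 1), (1.0, 1)])

# Site-corrected concentrations with scarcer monomers (hypothetical)
site = reaction_gibbs(DG0, T,
                      reactants=[(1e-4, 1), (1e-4, 1)],
                      products=[(1e-4, 1), (1.0, 1)])

print(f"culture: {culture:.0f} J/mol, site-corrected: {site:.0f} J/mol")
```

In this toy case, lowering the monomer pools tenfold raises the energy needed per bond formed, which is the kind of site-specific shift that local geochemical data would feed into the calculation.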

This study’s primary goal was to determine the minimal energy necessary to assemble a cell, a key metric for understanding the basal energy requirements essential for life. The energetic requirement of biomass synthesis is a critical component of bioenergetic habitability models4 and a controlling parameter in estimates of biomass turnover, which are pertinent to biosignature production and, by extension, to constraining the feasibility of life detection on other worlds20. Additionally, our findings have significant implications for biotechnology, offering a pathway to optimise energy efficiency in microbial production systems and synthetic biology applications. By establishing a benchmark for the minimum energy needed to construct cellular biomass, our model, Synercell, is a tool for identifying and enhancing energy-efficient pathways in various biotechnological processes54,55,56,57. The potential integration of Synercell with other predictive models (e.g., amide bond synthesis58) can enhance the accuracy of bioenergetic predictions across diverse environmental conditions.

In conclusion, this study introduces a comprehensive, data-driven model to understand the minimum energy requirements for cellular biosynthesis. It is a valuable tool for cellular biology, biotechnology, biogeosciences, and astrobiology and can be incorporated into other models. We anticipate its flexibility will encourage further research and data collection, particularly for thermodynamic data related to organisms other than those studied here, and their constituent biomolecules. Ultimately, our research contributes to understanding the energy constraints of life and the factors influencing the fundamental thermodynamic minimum energy requirements for cell construction. This understanding is crucial for exploring life’s boundaries in extreme environments, optimising biotechnological processes, and probing the potential for life beyond Earth.