Abstract
AI protein design software is used to explore the world of ancient protein structures, and very small, primordial amino acid libraries are found to produce a wide variety of key folds. Data from asteroid analyses and protobiotic chemistry experiments are used to constrain several primitive amino acid libraries of varying sizes. Protein design software is then used to construct sequences from these amino acids that could emulate a broad range of key protein folds, including metabolic (all enzymes of the reverse tricarboxylic acid cycle), redox, electron-transfer and ribosomal folds, among others. AlphaFold2 is employed to predict 3D structures from these sequences, which are compared to their native counterparts using the template modelling score (TM-score). A library of only 6 amino acids (those of highest measured abundance on asteroid Bennu) can reproduce all folds in the test set. Two libraries of just 7 amino acids, constrained by the Miller-Urey experiment and Murchison meteorite data, and one library of 8 amino acids, constrained by another Miller-Urey experiment, are also able to produce all folds considered. It is also demonstrated that a 6 amino acid alphabet (the 5 most abundant on Bennu, supplemented by cysteine that could have been supplied by atmospheric haze chemistry) can yield a ferredoxin-like protein with a plausible Fe-S binding geometry. Such broad protein folding with very limited amino acid libraries has significant implications for the origins of life, synthetic biology and medical applications.
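The idea of constraining sequence design to a small amino acid library can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `omitted_residues` and `uses_only` are hypothetical, and the 6-letter library shown is a placeholder chosen for the example, not the Bennu-derived set reported in the paper. Given a library, the sketch lists the canonical residues that a design tool would need to exclude and checks that a designed sequence stays within the library.

```python
# One-letter codes of the 20 canonical amino acids, in alphabetical order.
CANONICAL = "ACDEFGHIKLMNPQRSTVWY"


def omitted_residues(library: str) -> str:
    """Return the canonical residues that are NOT in the library."""
    allowed = set(library.upper())
    unknown = allowed - set(CANONICAL)
    if unknown:
        raise ValueError(f"non-canonical residue codes: {sorted(unknown)}")
    return "".join(aa for aa in CANONICAL if aa not in allowed)


def uses_only(sequence: str, library: str) -> bool:
    """Check that every residue of the sequence belongs to the library."""
    return set(sequence.upper()) <= set(library.upper())


# Placeholder library for illustration only (an assumption, not the paper's set).
example_library = "ADEGSV"
print(omitted_residues(example_library))         # residues a design tool should exclude
print(uses_only("GAVDEGSAV", example_library))   # True: all residues are in the library
print(uses_only("GAKV", example_library))        # False: K is outside the library
```

A string such as the one returned by `omitted_residues` is the kind of input an exclusion option in a sequence design tool could take; whether the study restricted its libraries in exactly this way is an assumption.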
Data availability
Data related to all proteins analysed in this work are available at https://doi.org/10.5281/zenodo.18521378. Source data are provided with this paper.
Code availability
Protein sequence design was performed using ProteinMPNN (ref. 40), which is distributed under the MIT License and is openly available at https://github.com/dauparas/ProteinMPNN. The Colab notebook used in this study is available at https://colab.research.google.com/github/dauparas/ProteinMPNN/blob/main/colab_notebooks/quickdemo.ipynb. The original license and copyright information provided by the authors were retained. Protein structure prediction was performed using AlphaFold2 and ESMFold as implemented in ColabFold (ref. 39), which is distributed under the Apache License 2.0 and is openly available at https://github.com/sokrypton/ColabFold. The specific notebooks used are available at https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb and https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/ESMFold.ipynb. No modifications were made to the core prediction algorithms. Protein structural similarity metrics were computed using TM-align (ref. 45), which is freely available for academic use at https://zhanggroup.org/TM-align/.
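For readers unfamiliar with the similarity metric, the following Python sketch shows how a TM-score is evaluated once two structures have been superposed and their residues paired. It uses the standard definition from Zhang and Skolnick: each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2), the sum is divided by the target length L, and d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The function names are hypothetical, and the sketch scores a single given superposition; TM-align additionally searches for the alignment and superposition that maximise the score, which is not reproduced here.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is undefined for target_length <= 15 in this sketch")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        # Very short chains need special handling that this sketch omits.
        raise ValueError("target too short for the plain d0 formula")
    return d0


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the reference (native) structure.
    Unaligned target residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect match over the full target gives 1.0; half coverage gives 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```

Because the sum is normalised by the target length rather than the number of aligned pairs, partial coverage of the native fold lowers the score even when the aligned region matches closely.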
References
Barge, L. M., Flores, E., Baum, M. M., VanderVelde, D. G. & Russell, M. J. Redox and pH gradients drive amino acid synthesis in iron oxyhydroxide mineral systems. Proc. Natl. Acad. Sci. 116, 4828–4833 (2019).
Kauffman, S. A. At home in the universe: The search for laws of self-organization and complexity (Oxford University Press, USA, 1995).
Goldford, J. E., Hartman, H., Smith, T. F. & Segrè, D. Remnants of an ancient metabolism without phosphate. Cell 168, 1126–1134 (2017).
Moody, E. R. et al. The nature of the last universal common ancestor and its impact on the early Earth system. Nat. Ecol. Evol. 8, 1654–1666 (2024).
Woese, C. The universal ancestor. Proc. Natl. Acad. Sci. 95, 6854–6859 (1998).
Bartlett, S. & Wong, M. L. Emergence, construction, or unlikely? Navigating the space of questions regarding life’s origins. Conflicting Models for the Origin of Life 53–64 (2023).
Bonfio, C. et al. Prebiotic iron–sulfur peptide catalysts generate a pH gradient across model membranes of late protocells. Nat. Catal. 1, 616–623 (2018).
Efremov, R. G. & Sazanov, L. A. Respiratory complex I: ‘steam engine’ of the cell? Curr. Opin. Struct. Biol. 21, 532–540 (2011).
Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 75, 126001 (2012).
Yoshida, M., Muneyuki, E. & Hisabori, T. ATP synthase - a marvellous rotary engine of the cell. Nat. Rev. Mol. Cell Biol. 2, 669–677 (2001).
Lane, N. Transformer: The Deep Chemistry of Life and Death (W. W. Norton, 2022). https://books.google.com/books?id=Uv5KEAAAQBAJ.
Smith, E. & Morowitz, H. The Origin and Nature of Life on Earth: The Emergence of the Fourth Geosphere (Cambridge University Press, 2016). https://books.google.com/books?id=vi-8CwAAQBAJ.
Muchowska, K. B. et al. Metals promote sequences of the reverse Krebs cycle. Nat. Ecol. Evol. 1, 1716–1721 (2017).
Deng, M., Yu, J. & Blackmond, D. G. Symmetry breaking and chiral amplification in prebiotic ligation reactions. Nature 626, 1019–1024 (2024).
Hein, J. E. & Blackmond, D. G. On the origin of single chirality of amino acids and sugars in biogenesis. Acc. Chem. Res. 45, 2045–2054 (2012).
Ozturk, S. F. & Sasselov, D. D. On the origins of life’s homochirality: Inducing enantiomeric excess with spin-polarized electrons. Proc. Natl. Acad. Sci. 119, e2204765119 (2022).
Ozturk, S. F. & Sasselov, D. D. Life’s homochirality: Across a prebiotic network. Proc. Natl. Acad. Sci. 122, e2505126122 (2025).
Adamala, K. P. et al. Confronting risks of mirror life. Science 386, 1351–1353 (2024).
Wong, M. L., Christensen, M. & Bartlett, S. Rethinking “prebiotic chemistry”. Perspect. Earth Space Scientists 6, e2025CN000275 (2025).
Glavin, D. P. et al. Abundant ammonia and nitrogen-rich soluble organic matter in samples from asteroid (101955) Bennu. Nat. Astron. 1–12 (2025).
Parker, E. T. et al. Primordial synthesis of amines and amino acids in a 1958 Miller H2S-rich spark discharge experiment. Proc. Natl. Acad. Sci. 108, 5526–5531 (2011).
Johnson, A. P. et al. The Miller volcanic spark discharge experiment. Science 322, 404–404 (2008).
Reed, N. W. et al. An archean atmosphere rich in sulfur biomolecules. Proc. Natl. Acad. Sci. 122, e2516779122 (2025).
Makarov, M. et al. Early selection of the amino acid alphabet was adaptively shaped by biophysical constraints of foldability. J. Am. Chem. Soc. 145, 5320–5329 (2023).
Makarov, M. et al. Prebiotically plausible peptides can self-assemble into β-rich nanostructures. bioRxiv https://www.biorxiv.org/content/early/2025/11/10/2025.11.09.687475 (2025).
Walter, K. U., Vamvaca, K. & Hilvert, D. An active enzyme constructed from a 9-amino acid alphabet. J. Biol. Chem. 280, 37742–37746 (2005).
Timm, J. et al. Design of a minimal di-nickel hydrogenase peptide. Sci. Adv. 9, eabq1990 (2023).
Gibney, B. R., Mulholland, S. E., Rabanal, F. & Dutton, P. L. Ferredoxin and ferredoxin–heme maquettes. Proc. Natl. Acad. Sci. 93, 15041–15046 (1996).
Riddle, D. S. et al. Functional rapidly folding proteins from simplified amino acid sequences. Nat. Struct. Biol. 4, 805–809 (1997).
Tretyachenko, V. et al. Modern and prebiotic amino acids support distinct structural profiles in proteins. Open Biol. 12, 220040 (2022).
Solis, A. D. Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds. BMC Evolut. Biol. 19, 158 (2019).
Murphy, L. R., Wallqvist, A. & Levy, R. M. Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein Eng. 13, 149–152 (2000).
Longo, L. M., Lee, J. & Blaber, M. Simplified protein design biased for prebiotic amino acids yields a foldable, halophilic protein. Proc. Natl. Acad. Sci. 110, 2135–2139 (2013).
Devkota, K. et al. Miniaturizing, modifying, and magnifying nature’s proteins with Raygun. bioRxiv https://www.biorxiv.org/content/early/2025/03/17/2024.08.13.607858 (2025).
Farías-Rico, J. A. & Mourra-Díaz, C. M. A short tale of the origin of proteins and ribosome evolution. Microorganisms 10, 2115 (2022).
Giacobelli, V. G. et al. Ancient amino acid sets enable stable protein folds. bioRxiv https://www.biorxiv.org/content/early/2025/10/29/2025.10.29.685319.full.pdf (2025).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Bada, J. L. New insights into prebiotic chemistry from Stanley Miller’s spark discharge experiments. Chem. Soc. Rev. 42, 2186–2196 (2013).
Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).
Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins: Struct., Funct., Bioinforma. 57, 702–710 (2004).
Bernstein, M. P., Dworkin, J. P., Sandford, S. A., Cooper, G. W. & Allamandola, L. J. Racemic amino acids from the ultraviolet photolysis of interstellar ice analogues. Nature 416, 401–403 (2002).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Acknowledgements
SB gratefully acknowledges funding support from the Caltech Center for Evolutionary Science, grant no. CES.FY2025. JEG gratefully acknowledges support from NASA’s Interdisciplinary Consortia for Astrobiology Research, grant no. 80NSSC23K1357. We dedicate this paper to the memory of Professor Yuk L. Yung (1946-2026), who passed away on March 16, 2026.
Author information
Contributions
The basic concept of this work was conceived by SB, JG, JY and YLY. SB carried out the protein design and analysis stages, with additional assistance from AG and DP. SB wrote the text and produced the figures. KH and WF provided essential guidance and support.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bartlett, S., Gupta, A., Phung, D. et al. A highly limited amino acid library from asteroid Bennu yields wide-ranging protein folds. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71509-6
DOI: https://doi.org/10.1038/s41467-026-71509-6


