ZHOU LABORATORY 
Protein Drug Design and Molecular Diagnostics

RESEARCH

Main Research Areas and Directions: We integrate artificial intelligence-driven structural bioinformatics with high-throughput and directed-evolution biotechnologies to better understand the relationships among the sequence, structure, function, and phenotype of proteins and RNA, with the goal of precise prediction and design. We are a multidisciplinary team with expertise in AI computation, molecular biology, cell biology, synthetic chemistry, and software and hardware engineering. Some of our research directions are as follows:


1. Protein Structure/Function Prediction

Predicting protein structure from sequence has long been a core challenge in molecular biology. Over time, the field has evolved from template-based modeling and fragment assembly, through de novo prediction, to end-to-end learning. We have contributed to this field by developing SPARKS (2004) [1] and SPARKS X (2011) [2] for template-based modeling, and the statistical potential DFIRE (2002) [3] for energy-guided structure optimization. In addition, we pioneered the prediction of protein dihedral angles as continuous values and used them for fragment-free de novo structure prediction (2009) [4]. This work helped pave the way for end-to-end learning-based structure prediction, a key ingredient in the high-accuracy predictions of AlphaFold2. We are currently working on the problem that the accuracy of AlphaFold2 depends heavily on the availability of natural homologous sequences.
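As a small illustration of the quantity being predicted (not of the prediction method itself), the sketch below computes a signed dihedral angle from four atom positions using the standard atan2 formula. The function name `dihedral_degrees` and the toy coordinates are assumptions made for this example only.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    Hypothetical helper for illustration. For a backbone phi angle the
    four points would be C(i-1), N(i), CA(i), C(i); for psi they would
    be N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a planar cis arrangement and a planar trans arrangement.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```

Because the angle is periodic, a predictor that outputs continuous angles typically has to handle the wrap-around at ±180°, for example by predicting the sine and cosine of the angle rather than the angle directly.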

[1] H. Zhou and Y. Zhou, "Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition", Proteins, 55, 1005-1013 (2004).

[2] Y. Yang, E. Faraggi, H. Zhao, and Y. Zhou, "Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of the query and corresponding native properties of templates", Bioinformatics, 27, 2076-2082 (2011).

[3] H. Zhou and Y. Zhou, "Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction", Protein Science, 11, 2714-2726 (2002).

[4] E. Faraggi, Y. Yang, S. Zhang, and Y. Zhou, "Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction", Structure, 17, 1515-1527 (2009).


2. Protein Design

Because the number of possible protein sequences is vast (about \(10^{130}\) for a protein of only 100 amino acids), the structural and functional space of proteins remains a largely untapped resource. Historically, energy-based methods dominated the design of novel proteins, but their limited overall success rate hindered widespread application. We pioneered an artificial intelligence-based protein design method, SPIN (2014) [5], which does not rely on energy functions, and subsequently improved it using deep learning (SPIN2, 2018) [6]. Artificial intelligence-based protein design has since become mainstream, and success rates continue to improve [7]. Our goal is to raise the success rate of functional design by combining computation with automated experimental evolution, with a particular focus on discovering protein drugs.
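The \(10^{130}\) figure follows from counting sequences over the 20 standard amino acids: a chain of length 100 has \(20^{100}\) possibilities. The short sketch below checks the order of magnitude; the function name `sequence_space_log10` is an assumption made for this example.

```python
import math


def sequence_space_log10(length, alphabet_size=20):
    """Base-10 logarithm of the number of sequences of a given length.

    Assumes every position can independently be any of `alphabet_size`
    residues (20 standard amino acids by default).
    """
    return length * math.log10(alphabet_size)


# Order of magnitude for a 100-residue protein.
exponent = sequence_space_log10(100)

# Python integers are arbitrary precision, so the exact count is available too.
exact_count = 20 ** 100
digits = len(str(exact_count))
```

The exponent grows linearly with chain length, so exhaustive enumeration is out of reach even for short proteins, which is why design methods must search or learn over this space rather than enumerate it.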

[5] Z. Li, Y. Yang, E. Faraggi, J. Zhan, and Y. Zhou, "Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles", Proteins, 82, 2565-2573 (2014).

[6] J. O’Connell, Z. Li, J. Hanson, R. Heffernan, J. Lyons, K. Paliwal, A. Dehzangi, Y. Yang, and Y. Zhou, "SPIN2: Predicting sequence profiles from protein structures using deep neural networks", Proteins, 86, 629-633 (2018).

[7] X Zhang, H Yin, F Ling, J Zhan, Y Zhou. SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network. bioRxiv; 2023. DOI: 10.1101/2023.07.07.548080.


3. RNA Structure and Function Research

Non-coding RNA is currently a major focus of basic biological research. A main reason is that protein-coding messenger RNA (mRNA) accounts for only about 1.5% of the human genome, whereas non-coding RNA accounts for about 75%. So far, functions have been established for only a very small number of non-coding RNAs, yet these are involved in almost all biological processes and play key roles in many diseases, including cancer [8]. For the vast majority of non-coding RNAs we know very little, mainly because structural information is lacking: structure determines function, and without a structure there are few clues for deciphering function. Our contributions to RNA structure prediction include the first fully automated homology search pipeline, RNAcmap (2021) [9]; the first end-to-end deep learning method for predicting RNA secondary structure (2019) [10]; machine learning prediction of functional long non-coding RNAs [11]; a method for inferring RNA secondary structure from deep mutational scanning [12]; and BRiQ, a statistical energy function developed specifically for RNA that can accurately refine near-native RNA structures (2021) [13]. Using the BRiQ energy function, Alchemy-RNA2 won first place in the RNA structure prediction category of CASP15 in 2022 [14]. We continue to develop this work toward the ultimate goal of going from sequence to structure, from structure to function, and on to functional design.

[8] B. Zhou, B. Ji, K. Liu, G. Hu, F. Wang, Q. Chen, R. Yu, P. Huang, J. Ren, C. Guo, H. Zhao, H. Zhang, D. Zhao, Z. Li, Q. Zeng, J. Yu, Y. Bian, Z. Cao, S. Xu, Y. Yang, Y. Zhou*, and J. Wang*, EVLncRNAs 2.0: an updated database of manually curated functional long non-coding RNAs validated by low-throughput experiments, Nucleic Acids Research (Database Issue) 49, D86–D91 (2021).

[9] T. Zhang, J. Singh, T. Litfin, J. Zhan, K. Paliwal, and Y. Zhou, "RNAcmap: A fully automatic pipeline for predicting contact maps of RNAs by evolutionary coupling analysis", Bioinformatics, 37, 3494–3500 (2021).

[10] J. Singh, J. Hanson, K. Paliwal, and Y. Zhou, RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning, Nature Communications 10, 5407 (2019).

[11] B. Zhou, Y. Yang, J. Zhan, X. Dou, J. Wang*, and Y. Zhou*, Predicting functional long non-coding RNAs validated by low throughput experiments, RNA Biology, 16: 1555-1564 (2019).

[12] Z. Zhang, P. Xiong, T. Zhang, J. Wang, J. Zhan, and Y. Zhou, Accurate inference of the full base-pairing structure of RNA by deep mutational scanning and covariation-induced deviation of activity, Nucleic Acids Research, 48:1451-1465 (2020).

[13] P. Xiong, R. Wu, J. Zhan* and Y. Zhou*, Pairing a high-resolution statistical potential with a nucleobase-centric sampling algorithm for improving RNA model refinement, Nature Communications 12, 2777 (2021).

[14] K. Chen, Y. Zhou, S. Wang, and P. Xiong, "RNA tertiary structure modeling with BRiQ potential in CASP15", Proteins (CASP 15 Special Issue), in press (2023).


4. Biopharmaceuticals and Nanobody Design and Development

The roughly one-in-a-thousand differences between individual human genomes lead to significant variation in immune capability, susceptibility to particular diseases, and response to drugs. These differences call for precise diagnostic methods and for personalized, highly targeted preventive vaccines and therapeutic drugs. Mining biological and medical big data with artificial intelligence and deep learning is essential for achieving precision medicine as soon as possible. New drug development and precision medicine are among the key priorities of China's 13th Five-Year National Strategic Emerging Industry Development Plan, and biopharmaceuticals (peptides, RNA, proteins, antibodies, etc.) are attracting growing interest because of their minimal side effects and strong targeting capability. At present, both small-molecule drugs and biopharmaceuticals face serious drug resistance in antiviral, antibacterial, antifungal, and anticancer applications. The main reason is that clinically used drugs bind to the surface of their target structure, where natural mutations can lead to resistance. After years of exploration, Zhou Yaoqi's research group can now computationally predict self-inhibiting peptides that disrupt the target structure, and has successfully applied this approach to discover new antimicrobial peptides to which resistance is difficult to develop [15]. In addition, protein design is increasingly becoming a new tool for designing biopharmaceuticals, including nanobodies [16]. The group's combination of computation and experiment is intended to accelerate the discovery and application of new drugs.

[15] J. Zhan, H. Jia, E. A. Semchenko, Y. Bian, A. M. Zhou, Z. Li, Y. Yang, J. Wang, S. Sarkar, M. Totsika, H. Blanchard, F. E.-C. Jen, Q. Ye, T. Haselhorst, M. P. Jennings, K. L. Seib, and Y. Zhou, Self-derived structure-disrupting peptides targeting methionine aminopeptidase in pathogenic bacteria; a new strategy to generate antimicrobial peptides, FASEB J., 33: 2095–2104 (2019).

[16] Z. Li, Y. Yang, J. Zhan, L. Dai and Y. Zhou, Energy Functions in De Novo Protein Design: Current Challenges and Future Prospects, Ann. Rev. Biophysics 42, 315-335 (2013).


5. Biomarker Detection and Instrument Development

Rapid, cost-effective, highly sensitive, and accurate detection of biomarkers remains a global challenge, and the market for research and medical instruments is dominated by a few international companies. Zhou Yaoqi's research group has experience in designing high-sensitivity sensors [17], has for many years been responsible for software development, data analysis, and AI algorithm projects, and is currently leading the development of a new generation of immunoblot imaging systems.

[17] S. Xu, J. Zhan, B. Man, S. Jiang, W. Yue, S. Gao, C. Guo, H. Liu, Z. Li, J. Wang, and Y. Zhou, Real-time reliable determination of binding kinetics of DNA hybridization using a multi-channel graphene biosensor, Nature Communications 8, 14902 (2017).