Our Scientific Research

At Illuminating Minds, our passion for science and innovation extends beyond the classroom. Below, you'll find our most recent articles: 

"ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation"


Kyro, GW; Morgunov, A; Brent, RI; Batista, VS. Journal of Chemical Information and Modeling. 2023 [Submitted].



Abstract: The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. It is therefore of tremendous interest to develop methodologies that enhance the abilities and applicability of these powerful tools. In this work, we present a novel and efficient semi-supervised active learning methodology that allows for the fine-tuning of a generative model with respect to an objective function by strategically operating within a constructed representation of the sample space. In the context of targeted molecular generation, we demonstrate the ability to fine-tune a GPT-based molecular generator with respect to an attractive interaction-based scoring function by strategically operating within a chemical space proxy, thereby maximizing attractive interactions between the generated molecules and a protein target. Importantly, our approach does not require the individual evaluation of all data points that are used for fine-tuning, enabling the incorporation of computationally expensive metrics. We are hopeful that the inherent generality of this methodology ensures that it will remain applicable as this exciting field evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.

"HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction"


Kyro, GW; Brent, RI; Batista, VS. Journal of Chemical Information and Modeling. 2023, 63, 7, 1947-1960.


Full paper: 10.1021/acs.jcim.3c00251


Abstract: Applying deep learning concepts from image detection and graph theory has greatly advanced protein–ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two graph convolutional networks utilizing attention-based aggregation of node features. HAC-Net (Hybrid Attention-Based Convolutional Neural Network) obtains state-of-the-art results on the PDBbind v.2016 core set, the most widely recognized benchmark in the field. We extensively assess the generalizability of our model using multiple train-test splits, each of which maximizes differences between either protein structures, protein sequences, or ligand extended-connectivity fingerprints of complexes in the training and test sets. Furthermore, we perform 10-fold cross-validation with a similarity cutoff between SMILES strings of ligands in the training and test sets and also evaluate the performance of HAC-Net on lower-quality data. We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction. All of our software is available as an open-source repository at https://github.com/gregory-kyro/HAC-Net/, and the HACNet Python package is available through PyPI.

"Quantum Convolutional Neural Networks for Multi-Channel Supervised Learning"


Smaldone, AM; Kyro, GW; Batista, VS. Quantum Machine Intelligence. 2023 [Accepted].


Full paper: 10.48550/arXiv.2305.18961


Abstract: As the rapidly evolving field of machine learning continues to produce incredibly useful tools and models, the potential for quantum computing to provide speed up for machine learning algorithms is becoming increasingly desirable. In particular, quantum circuits in place of classical convolutional filters for image detection-based tasks are being investigated for the ability to exploit quantum advantage. However, these attempts, referred to as quantum convolutional neural networks (QCNNs), lack the ability to efficiently process data with multiple channels and therefore are limited to relatively simple inputs. In this work, we present a variety of hardware-adaptable quantum circuit ansatzes for use as convolutional kernels, and demonstrate that the quantum neural networks we report outperform existing QCNNs on classification tasks involving multi-channel data. We envision that the ability of these implementations to effectively learn interchannel information will allow quantum machine learning methods to operate with more complex data.

"MDiGest: A Python Package for Describing Allostery from Molecular Dynamics Simulations"


Maschietto, F; Allen, B; Kyro, GW; Batista, VS. Journal of Chemical Physics. 2023, 158, 215103. 


Full paper: 10.1063/5.0140453


Abstract: Many biological processes are regulated by allosteric mechanisms that communicate with distant sites in the protein responsible for functionality. The binding of a small molecule at an allosteric site typically induces conformational changes that propagate through the protein along allosteric pathways regulating enzymatic activity. Elucidating those communication pathways from allosteric sites to orthosteric sites is, therefore, essential to gain insights into biochemical processes. Targeting the allosteric pathways by mutagenesis can allow the engineering of proteins with desired functions. Furthermore, binding small molecule modulators along the allosteric pathways is a viable approach to target reactions using allosteric inhibitors/activators with temporal and spatial selectivity. Methods based on network theory can elucidate protein communication networks through the analysis of pairwise correlations observed in molecular dynamics (MD) simulations using molecular descriptors that serve as proxies for allosteric information. Typically, single atomic descriptors such as α-carbon displacements are used as proxies for allosteric information. Therefore, allosteric networks are based on correlations revealed by that descriptor. Here, we introduce a Python software package that provides a comprehensive toolkit for studying allostery from MD simulations of biochemical systems. MDiGest offers the ability to describe protein dynamics by combining different approaches, such as correlations of atomic displacements or dihedral angles, as well as a novel approach based on the correlation of Kabsch–Sander electrostatic couplings. MDiGest allows for comparisons of networks and community structures that capture physical information relevant to allostery. Multiple complementary tools for studying essential dynamics include principal component analysis, root mean square fluctuation, as well as secondary structure-based analyses.

"The Landscape of Computational Approaches for Artificial Photosynthesis"


Yang, KR; Kyro, GW; Batista, VS. Nature Computational Science. 2023, 3, 504-513. 


Full paper: 10.1038/s43588-023-00450-1


Abstract: Artificial photosynthesis is an attractive strategy for converting solar energy into fuels, largely because the Earth receives enough solar energy in one hour to meet humanity’s energy needs for an entire year. However, developing devices for artificial photosynthesis remains difficult and requires computational approaches to guide and assist the interpretation of experiments. In this Perspective, we discuss current and future computational approaches, as well as the challenges of designing and characterizing molecular assemblies that absorb solar light, transfer electrons between interfaces, and catalyze water-splitting and fuel-forming reactions.

"Mapping N- to C-terminal Allosteric Coupling Through Disruption of the Putative CD74 Activation Site in D-Dopachrome Tautomerase"


Chen, E; Widjaja, V; Kyro, GW; Allen, B; Das, P; Bhandari, V; Lolis, EJ; Batista, VS; Lisi, GP. Journal of Biological Chemistry. 2023, 299, 6, 104729.


Full paper: 10.1016/j.jbc.2023.104729


Abstract: The macrophage migration inhibitory factor (MIF) protein family consists of MIF and D-dopachrome tautomerase (also known as MIF-2). These homologs share 34% sequence identity while maintaining nearly indistinguishable tertiary and quaternary structure, which is likely a major contributor to their overlapping functions, including the binding and activation of the cluster of differentiation 74 (CD74) receptor to mediate inflammation. Previously, we investigated a novel allosteric site, Tyr99, that modulated N-terminal catalytic activity in MIF through a “pathway” of dynamically coupled residues. In a comparative study, we revealed an analogous allosteric pathway in MIF-2 despite its unique primary sequence. Disruptions of the MIF and MIF-2 N termini also diminished CD74 activation at the C terminus, though the receptor activation site is not fully defined in MIF-2. In this study, we use site-directed mutagenesis, NMR spectroscopy, molecular simulations, in vitro and in vivo biochemistry to explore the putative CD74 activation region of MIF-2 based on homology to MIF. We also confirm its reciprocal structural coupling to the MIF-2 allosteric site and N-terminal enzymatic site. Thus, we provide further insight into the CD74 activation site of MIF-2 and its allosteric coupling for immunoregulation.

"Turning Up the Heat Mimics Allosteric Signaling in Imidazole-Glycerol Phosphate Synthase"


Maschietto, F; Morzan, U; Tofoleanu, F; Gheeraert, A; Chaudhuri, A; Kyro, GW; Nekrasov, P; Brooks, B; Loria, JP; Rivalta, I; Batista, VS. Nature Communications. 2023, 14, 2239.


Full paper: 10.1038/s41467-023-37956-1


Abstract: Allosteric drugs have the potential to revolutionize biomedicine due to their enhanced selectivity and protection against overdosage. However, we need to better understand allosteric mechanisms in order to fully harness their potential in drug discovery. In this study, molecular dynamics simulations and nuclear magnetic resonance spectroscopy are used to investigate how increases in temperature affect allostery in imidazole glycerol phosphate synthase. Results demonstrate that temperature increase triggers a cascade of local amino acid-to-amino acid dynamics that remarkably resembles the allosteric activation that takes place upon effector binding. The differences in the allosteric response elicited by temperature increase as opposed to effector binding are conditional to the alterations of collective motions induced by either mode of activation. This work provides an atomistic picture of temperature-dependent allostery, which could be harnessed to more precisely control enzyme function.

"Electrostatic Networks for Characterization of Allosteric Pathways in Cas9 Apo, RNA- and DNA-Bound Forms"


Maschietto, F; Kyro, GW; Allen, B; Batista, VS. Biophysical Journal. 2023, 122 (3).


Full paper: 10.1016/j.bpj.2022.11.389


Abstract: Allostery is a fundamental process by which biological macromolecules transmit the effect of a local perturbation at one site to a distal, functional site, allowing for regulation of activity. The long-range coupling between residues that gives rise to allostery in a protein is built up from short-range electrostatic and hydrophobic interactions. These are arguably the largest determinants of protein structure and are essential regulators of protein function. We introduce an effective coulombic electrostatic coupling network obtained from the analysis of molecular dynamics simulations of Cas9 in its apo, DNA- and RNA-bound forms. We characterize key electrostatic events that determine its functional activity and targeting precision. We demonstrate the locality of the electrostatic-interaction network over other connectivity matrices as validated through direct comparisons to NMR measurements. We define an electrostatic-based centrality metric that allows us to pinpoint relevant donor-acceptor pairs that promote charge displacements that modulate the cross-interaction between the PAM-interacting region and catalytic domains. We determine key amino acid residues central to the network, allowing us to identify a circular allosteric pathway that channels perturbations from the PAM-interacting domain to the HNH and RuvCII domains, and then back to the PAM-contacting region. The connectivity around HNH is important for controlling the directionality of signal transfer from and towards the PAM-interacting domain. The effective coulombic electrostatic coup

"Twisting and Swiveling Domain Motions in Cas9 to Recognize Target DNA Duplexes, Make Double-Strand Breaks, and Release Cleaved Duplexes"


Wang, J; Arantes, PR; Ahsan, M; Sinha, S; Kyro, GW; Maschietto, F; Allen, B; Skeens, E; Lisi, GP; Batista, VS; Palermo, G. Frontiers in Molecular Biosciences. 2023, 9.


Full paper: 10.3389/fmolb.2022.1072733


Abstract: The CRISPR-associated protein 9 (Cas9) has been engineered as a precise gene editing tool to make double-strand breaks. CRISPR-associated protein 9 binds the folded guide RNA (gRNA) that serves as a binding scaffold to guide it to the target DNA duplex via a RecA-like strand-displacement mechanism but without ATP binding or hydrolysis. The target search begins with the protospacer adjacent motif or PAM-interacting domain, recognizing it at the major groove of the duplex and melting its downstream duplex where an RNA-DNA heteroduplex is formed at nanomolar affinity. The rate-limiting step is the formation of an R-loop structure where the HNH domain inserts between the target heteroduplex and the displaced non-target DNA strand. Once the R-loop structure is formed, the non-target strand is rapidly cleaved by RuvC and ejected from the active site. This event is immediately followed by cleavage of the target DNA strand by the HNH domain and product release. Within CRISPR-associated protein 9, the HNH domain is inserted into the RuvC domain near the RuvC active site via two linker loops that provide allosteric communication between the two active sites. Due to the high flexibility of these loops and active sites, biophysical techniques have been instrumental in characterizing the dynamics and mechanism of the CRISPR-associated protein 9 nucleases, aiding structural studies in the visualization of the complete active sites and relevant linker structures. Here, we review biochemical, structural, and biophysical studies on the underlying mechanism with emphasis on how CRISPR-associated protein 9 selects the target DNA duplex and rejects non-target sequences.

"Structural Basis for Reduced Dynamics of Three Engineered HNH Endonuclease Lys-to-Ala Mutants for the Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Associated 9 (CRISPR/Cas9) Enzyme"


Wang, J; Skeens, E; Arantes, P; Maschietto, F; Allen, B; Kyro, GW; Lisi, GP; Palermo, G; Batista, VS. Biochemistry. 2022, 61 (9), 785-794.


Full paper: 10.1021/acs.biochem.2c00127


Abstract: Many bacteria possess type-II immunity against invading phages or plasmids known as the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated 9 (Cas9) system to detect and degrade the foreign DNA sequences. The Cas9 protein has two endonucleases responsible for double-strand breaks (the HNH domain for cleaving the target strand of DNA duplexes and RuvC domain for the nontarget strand, respectively) and a single-guide RNA-binding domain where the RNA and target DNA strands are base-paired. Three engineered single Lys-to-Ala HNH mutants (K810A, K848A, and K855A) exhibit an enhanced substrate specificity for cleavage of the target DNA strand. We report in this study that in the wild-type (wt) enzyme, D835, Y836, and D837 within the Y836-containing loop (comprising E827-D837) adjacent to the catalytic site have uncharacterizable broadened ¹H-¹⁵N nuclear magnetic resonance (NMR) features, whereas remaining residues in the loop have different extents of broadened NMR spectra. We find that this loop in the wt enzyme exhibits three distinct conformations over the duration of the molecular dynamics simulations, whereas the three Lys-to-Ala mutants retain only one conformation. The versatility of multiple alternate conformations of this loop in the wt enzyme could help to recruit noncognate DNA substrates into the HNH active site for cleavage, thereby reducing its substrate specificity relative to the three mutants. Our study provides further experimental and computational evidence that Lys-to-Ala substitutions reduce dynamics of proteins and thus increase their stability.

"Photophysics of Rhenium(I) Polypyridyl-Based Complexes and Their Employment as Highly Sensitive Anion Sensors"


Kyro, GW; Lees, AJ. 2021.


Full paper: 10.13140/RG.2.2.29980.56962


Abstract: Anion sensing has been gaining a tremendous amount of attention over the last 25 years because of the important roles that anions play in both biological and chemical systems. Anion sensors present great potential for application in a wide array of fields, offering new methods for bioanalytical applications in living organisms, as well as the ability to extract chemical pollutants from the environment. Rhenium(I) complexes containing amide subunits have been reported in the literature to interact very strongly with anions through charge-assisted amide hydrogen bonding, displaying binding affinities for anions as high as Ka ~ 10^6 M^-1 in CH2Cl2 solution. To better understand the electronic properties of these fascinating systems, a series of rhenium(I) polypyridyl-based complexes with a unique recognition site have been synthesized and tested with anions. The sensors display binding affinities for anions as high as Ka ~ 10^6 M^-1 in CH2Cl2 solution, with the strongest interactions observed for fluoride, cyanide, and iodide. The complexes recognize anions between the amide protons and the central pyridine but exhibit a unique binding mode with iodide ions not previously recognized for these systems.