Many drugs work by inhibiting protein enzymes associated with a particular disease. Unfortunately, the same drugs can inhibit protein enzymes unrelated to the disease, resulting in harmful side effects. One potential solution is to better identify structural features that determine a protein enzyme's function.

Now, a team headed by a computational biologist at the University of Maryland School of Medicine (UMSOM) has developed a suite of computer programs that sift through structural and genomic sequencing data to identify the features that distinguish one enzyme from similar enzymes. This research has the potential to significantly accelerate drug discovery, allowing scientists to develop more effective drugs more quickly.

"This new approach allows proteins to be analyzed at a much deeper, more specific level," says Andrew F. Neuwald, PhD, Professor of Biochemistry & Molecular Biology, a senior scientist at the Institute for Genome Sciences (IGS) at UMSOM, and the lead author of the paper describing the new method. "This method provides clues regarding sequence and structural features responsible for a protein's specific biological function."

The paper was published this week in the journal eLife. Dr. Neuwald collaborated on the work with L. Aravind, PhD, and Stephen F. Altschul, PhD, two senior investigators at the National Center for Biotechnology Information at the National Institutes of Health.

In the paper, the investigators used this approach to identify the key features of various enzymes: N-acetyltransferases, P-loop GTPases, RNA helicases, synaptojanin-superfamily phosphatases and nucleases, and thymine/uracil DNA glycosylases. The results revealed striking and previously overlooked structural features likely associated with each protein's function. These findings could point researchers toward new ways of designing drugs that have fewer unintended, harmful side effects.

The two main programs are BPPS (Bayesian Partitioning with Pattern Selection) and SIPRIS (Structurally Interacting Pattern Residues' Inferred Significance). The programs and source code are freely available and require only a minimal knowledge of Linux, making this approach widely accessible to other researchers. The approach will also be useful for protein engineering and for understanding the molecular basis of many human diseases.

The three researchers each brought something different to the work. Dr. Neuwald, who has worked on protein analysis for years, has a varied background, with experience in molecular biology, computer science and Bayesian statistics. Dr. Aravind is a well-known computational biologist with a broad knowledge of protein structure and function. Dr. Altschul, whose formal training is in mathematics, was the first author on two landmark publications describing the popular sequence database search programs BLAST and PSI-BLAST.

Neuwald AF, Aravind L, Altschul SF.
Inferring joint sequence-structural determinants of protein functional specificity.
eLife. 2018 Jan 16;7. pii: e29880. doi: 10.7554/eLife.29880.