University of Hawaii, Manoa Research

2004-Present


Introduction

Extremophilic microorganisms have been found in almost every environment imaginable, ranging from hypersaline lakes to acidic volcanic hot springs to deep-ocean thermal vents. In all of these environments, the same basic molecular components are used to build biomacromolecules such as nucleic acids and proteins (Figure 1). How proteins adapt to extreme environments is a question that many researchers have examined over the past two decades, and the work has followed two main lines of inquiry. In one, whole proteomes are surveyed for changes in overall amino acid content in proteins from different species. While this method allows a broad data set to be examined, it does not provide deep insight into molecular-level adaptations. The alternative is the study of protein crystal structures. Here, similar proteins from mesophilic and extremophilic organisms are isolated, purified, and crystallized. This allows an in-depth analysis of environmentally related structural differences, but the process is so involved that examining even a small set of molecules takes an enormous amount of effort. To bridge this gap, we are working on a project that uses protein secondary structure prediction routines to identify the α-helical content of proteins and then analyzes structural changes in these α-helices as a function of the environment. Through this project, we hope to establish general trends in α-helical composition that will help explain how proteins adapt to different environments. To accomplish this goal, we have set out a number of milestones that we are currently working toward.

Figure 1. Illustration demonstrating the search for environmental relationships in protein structure. Images of natural areas are from the National Park Service web pages.


Project Progress

Our long-term goal for the bioinformatics side of this project is to develop a new tool that analyzes primary sequence information to determine environment-specific adaptations of proteins to extreme environments. However, since raw amino acid content gives very little clue as to how proteins adapt to extreme environments, we needed a way to convert primary sequence information into reliable and useful observations on how protein stability is achieved in extreme environments. To this end, we decided to explore protein α-helix composition as a function of environment. There are several reasons for this decision: (1) α-helices are a major secondary structural element, comprising approximately half of the amino acids in a given protein, (2) α-helices can be readily studied in synthetic model peptides that can be used to probe structure-stability relationships, (3) studies have shown that structure-stability results from α-helical peptides can be used to understand the stability of α-helices in proteins, (4) α-helices are often formed in the first phase of protein folding pathways, and therefore their ability to form in a given environment is a critical step in many folding mechanisms, and (5) a number of computer programs exist that can accurately predict α-helical regions from primary protein sequences. To investigate environment-composition relationships in α-helices, we are looking both at amino acid content and at distributions of amino acid motifs that indicate intra-helical interactions expected to stabilize α-helices.

Stabilizing intra-helical motifs occur when two amino acids whose side-chain functional groups can take part in a non-covalent bonding interaction are placed at appropriate positions in an α-helix so that this interaction can occur. From the canonical list of 20 amino acids, it is possible to conceive of 11 different modes of non-covalent bonding interactions (Figure 2). Spacing and orientation are also important considerations. In terms of spacing, amino acids separated by two or three intervening residues on an α-helix have their side chains on the same helical face and are therefore in positions where the side chains can interact (Figure 3a). These are known as (i,i+3) and (i,i+4) spacings, respectively. Another aspect of motif geometry is that the α-helix has a macrodipole that results from the hydrogen bonds between the backbone amide groups. The macrodipole is aligned such that the positive end is at the N-terminus and the negative end is at the C-terminus. Since most non-covalent bonding interactions also have dipole moments, the alignment of these dipoles relative to the macrodipole is also important (Figure 3b). Given all of these aspects of intra-helical interactions, there are 800 motifs that can be generated from the 20 natural amino acids; the types of motifs identified in Figure 2 cover over half of them.

Figure 2. Illustration of interactions between amino acid side chains that give rise to the 11 different types of motifs.

Figure 3. Illustrations of (a) a “helix-wheel” diagram demonstrating how spacing affects side-chain orientation and (b) parallel and antiparallel alignments of interaction dipole moments with the helix macrodipole moment.
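The spacing argument and the motif count above can be checked with a few lines of Python. This is only a sketch under stated assumptions: it uses the ideal α-helix geometry of 3.6 residues per turn (100 degrees per residue), and it assumes the figure of 800 comes from counting every ordered pair of the 20 amino acids (the order captures orientation along the helix) at each of the two spacings. The function names are illustrative and are not taken from our software.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues, one-letter codes
SPACINGS = (3, 4)                      # the (i,i+3) and (i,i+4) spacings
DEGREES_PER_RESIDUE = 100.0            # ideal alpha-helix: 3.6 residues per turn


def angular_offset(spacing):
    """Smallest angle (degrees) around the helix axis between two side
    chains that are `spacing` residues apart."""
    angle = (spacing * DEGREES_PER_RESIDUE) % 360.0
    return min(angle, 360.0 - angle)


def all_motifs():
    """Every ordered residue pair at each spacing, as (first, second, spacing).
    Assumption: order matters because it sets the orientation of the pair
    relative to the helix macrodipole."""
    return [(a, b, s)
            for a, b in product(AMINO_ACIDS, repeat=2)
            for s in SPACINGS]


if __name__ == "__main__":
    for s in range(1, 6):
        print(f"(i,i+{s}): side chains {angular_offset(s):.0f} degrees apart")
    print(len(all_motifs()), "possible motifs")
```

Under these assumptions the (i,i+3) and (i,i+4) pairs sit 60 and 40 degrees apart, closer than any other spacing up to i+5, which is why they share a helical face.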

Initial results have been promising. In our first study, we looked at the distribution of intra-helical salt bridge motifs in the α-helices of proteins from mesophilic and thermophilic organisms. As expected, we found that the usage of salt bridge motifs increased with increasing optimal growth temperature (OGT) (Figure 4a). Considering composition, spacing, and alignment, there are 24 possible motifs that can lead to salt bridge formation. Of these, only a few were found to occur in large numbers (Figure 4b).

Figure 4. (a) Graph showing the increase of salt bridge usage with increasing OGT and (b) plot showing the distributions of individual motifs for each species studied.
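To make the count of 24 salt bridge motifs concrete, here is a minimal sketch that enumerates them and counts their occurrences in a helix sequence. It assumes the acidic residues are Asp (D) and Glu (E) and the basic residues are Lys (K), Arg (R), and His (H); that choice reproduces 2 x 3 pairs x 2 orientations x 2 spacings = 24, but it is an assumption made for this example. The function names and the test peptide are hypothetical.

```python
ACIDIC = "DE"    # assumption: Asp and Glu
BASIC = "KRH"    # assumption: Lys, Arg, and His
SPACINGS = (3, 4)


def salt_bridge_motifs():
    """All (first, second, spacing) combinations of one acidic and one basic
    residue, in both orientations, at the (i,i+3) and (i,i+4) spacings."""
    motifs = []
    for acid in ACIDIC:
        for base in BASIC:
            for spacing in SPACINGS:
                motifs.append((acid, base, spacing))  # acid nearer the N-terminus
                motifs.append((base, acid, spacing))  # base nearer the N-terminus
    return motifs


def count_motifs(helix, motifs):
    """Count how often each motif occurs in a single helix sequence."""
    counts = {motif: 0 for motif in motifs}
    for first, second, spacing in motifs:
        for i in range(len(helix) - spacing):
            if helix[i] == first and helix[i + spacing] == second:
                counts[(first, second, spacing)] += 1
    return counts


if __name__ == "__main__":
    motifs = salt_bridge_motifs()
    print(len(motifs), "salt bridge motifs")
    counts = count_motifs("EAAKEAAAR", motifs)  # hypothetical helix
    print({m: n for m, n in counts.items() if n})
```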

Excited by these results, we have now turned to the study of whole genomes of organisms from other extreme environments. This is made possible by protein secondary structure prediction programs; specifically, we use the JPred and JNet programs. These programs use various methods to predict secondary structure from primary sequences, and we have interfaced them with a motif search engine and a statistical package that examines amino acid and motif content. Our program is named the Salt Bridge Statistical Analyzer (SBSA) and it can be downloaded here.
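The step that connects a secondary structure prediction to the motif search can be sketched as follows. This is not the SBSA code; it is a minimal illustration that assumes the prediction is available as a string with one character per residue, where 'H' marks a predicted helical residue. The function name, the min_length cutoff, and the example sequence are all hypothetical.

```python
import re


def helical_segments(sequence, prediction, min_length=5):
    """Return the stretches of `sequence` predicted to be helical.

    Assumptions: `prediction` has one character per residue and uses 'H'
    for helix; `min_length` is an illustrative cutoff for discarding very
    short predicted helices.
    """
    if len(sequence) != len(prediction):
        raise ValueError("sequence and prediction must be the same length")
    segments = []
    for match in re.finditer(r"H+", prediction):
        if match.end() - match.start() >= min_length:
            segments.append(sequence[match.start():match.end()])
    return segments


if __name__ == "__main__":
    seq = "MSAEELLKKGGPLEELRKAG"    # hypothetical protein fragment
    pred = "--HHHHHHH----HHHHH--"   # hypothetical per-residue prediction
    print(helical_segments(seq, pred))
```

Each returned segment can then be handed to a motif counter, so that only residues inside predicted helices contribute to the statistics.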

We initially compared SBSA results to the salt bridge study above by using SBSA to analyze the same set of proteins for which we had crystal structure information. We found that most of the results SBSA provided were very similar to those from the crystal structure studies, indicating that SBSA could be used to provide reliable motif information starting from primary sequences. Using this program, we next analyzed the genomes of 35 mesophilic and extremophilic (psychrophilic, thermophilic, hyperthermophilic, acidothermophilic, and halophilic) organisms for amino acid and motif distributions. Considering only those motifs expected to give rise to known intra-helical interactions, there was a large degree of diversity in which type of organism preferred which motifs. For example, while salt bridges were used often in thermophiles and hyperthermophiles, they were used less often in acidothermophiles and very infrequently in halophiles. Acidothermophiles preferred motifs with aromatic groups, while psychrophiles preferred hydrogen-bonding motifs. An even more complex picture emerged when considering the relative distribution of each of the different motifs. From this work, we realized that each class of organism exhibited a unique distribution of these attributes and that these distributions may serve as proteomic fingerprints of proteins from a given environment.

One difficulty we ran into is that simultaneously comparing motif distributions among large groups of organisms quickly became confusing to keep track of, so we needed a simple approach for examining these relationships. A related area of microbial bioinformatics research addresses a similarly complex question concerning evolutionary relationships among organisms. To study these, phylogenetic trees have been developed. These trees are based on sequence alignments of either proteins or nucleic acids and are built from distance matrices that measure how similar one sequence is to another. Inspired by this, we developed a similar system that produces Envirogenic Trees. An envirogenic tree is a cladogram that shows structural relationships between given sets of proteins. The basis of tree construction is a distance matrix in which each entry is a simple Euclidean distance (Djk) between species j and k, calculated as:

or

Once a Djk value has been calculated for every pair of species in a given dataset and a distance matrix has been assembled, a tree is built using UPGMA pairing in the Phylip package and then drawn. For example, in the group of 35 organisms we are currently studying, most of the species are grouped with others that are similar (e.g. mesophiles with mesophiles, thermophiles with thermophiles, etc.), although we are still working on optimizing the construction process (Figure 5).

Figure 5. Example of an Envirogenic Tree constructed from motif propensity.
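The distance matrix step can be sketched in Python. The sketch assumes each species is described by a profile of numeric attributes (for example, the propensity of each motif), so that the Euclidean distance takes the usual form Djk = sqrt(sum over i of (x_ij - x_ik)^2), where x_ij is the value of attribute i in species j. The exact attributes and any weighting we use are not specified here, and the species names and numbers below are made up. The output is written in the square distance matrix layout that PHYLIP programs read (number of taxa on the first line, then a name padded to 10 characters followed by the distances), which can then be passed to a UPGMA tree builder.

```python
import math


def euclidean_distance(profile_j, profile_k):
    """Euclidean distance between two attribute profiles (dicts mapping an
    attribute name to a value). Missing attributes are treated as 0.0."""
    attributes = set(profile_j) | set(profile_k)
    return math.sqrt(sum(
        (profile_j.get(a, 0.0) - profile_k.get(a, 0.0)) ** 2
        for a in attributes))


def distance_matrix(profiles):
    """Return (names, matrix) with matrix[j][k] = Djk for every species pair."""
    names = sorted(profiles)
    matrix = [[euclidean_distance(profiles[j], profiles[k]) for k in names]
              for j in names]
    return names, matrix


def to_phylip(names, matrix):
    """Format a square distance matrix in PHYLIP style."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10}" + "".join(f"{d:9.4f}" for d in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical motif propensities for three made-up species.
    profiles = {
        "mesophile": {"EK3": 1.0, "EK4": 1.2, "EE4": 0.9},
        "thermo": {"EK3": 1.6, "EK4": 1.9, "EE4": 0.8},
        "halophile": {"EK3": 0.7, "EK4": 0.8, "EE4": 1.7},
    }
    names, matrix = distance_matrix(profiles)
    print(to_phylip(names, matrix))
```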


Current and Future Work

Currently, I am working on tying up several of the initial aspects of this project, writing papers, and at the same time looking for jobs. After the paperwork is done, I plan on getting back to this project and working with some of the computer science people to develop some new programs. The first will be one to streamline the tree construction process, which will enable a more thorough exploration of structural relationships using this approach. While this program is being developed, I will build a large library of whole-proteome SBSA files from as many genomes as I can download, and then the real fun will begin.

Another thing I am involved in is a recently established collaboration with Hector Ayala del Rio of the University of Puerto Rico. Dr. del Rio studies the genomes of psychrophilic organisms, and we met at the recent AbSciCon conference in Washington, D.C., where we discussed a possible collaboration. This eventually came to fruition, and I am currently using my tools on two genomes sequenced in his laboratory.

Another thing I am getting started on is model peptide, protein, and actual environmental studies. In the case of models, I have ordered some custom peptides that have motifs that produce repulsive electrostatic interactions, and I intend to study how these affect the stability of α-helices under different conditions. The two sets of peptides I acquired have motifs in which pairs of either glutamic acid or histidine residues are spaced at the (i,i+3), (i,i+4), and (i,i+5) positions in the middle of the helix (Figure 6a). These peptides were chosen because we observed that halophiles use these types of motifs with an unusually high propensity. This is somewhat counterintuitive, since this type of motif might be expected to destabilize an α-helix, an effect opposite to that of a salt bridge motif. Our working theory is that these motifs are able to bind divalent metal ions, such as Mg2+ or Ca2+, and in so doing form an “ion bridge” (Figure 6b) that in turn stabilizes the helix. We will therefore test this premise by looking at peptide helicity as a function of pH and ionic content. Another study along these lines that I am planning is to examine the effect of specific salt bridges on protein stability. Here, I hope to work with George Bachand at the Center for Integrated Nanotechnology at Sandia National Labs to study directed mutations of the motor protein kinesin aimed at increasing protein stability by adding or changing specific salt bridges. Finally, Mark Brown (another UHNAI fellow) and I have been talking recently about using SBSA and Envirogenic Trees to study the microbial mats he is investigating as part of the research in the Domachie group.

Figure 6. (a) Sequences of peptides that will be used to study ionic bridges and (b) illustration of an ionic bridge.
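Why pH matters for these peptides can be illustrated with the Henderson-Hasselbalch relationship, which gives the fraction of a side chain that is deprotonated at a given pH. The sketch below is a rough estimate only: the pKa values are approximate textbook numbers for free amino acid side chains (about 4.25 for glutamic acid and about 6.0 for histidine), the real values inside a helical peptide will differ, and the function names are illustrative.

```python
# Approximate side-chain pKa values (assumption: free amino acid values;
# the pKa inside a peptide depends on its local environment).
APPROX_SIDE_CHAIN_PKA = {"E": 4.25, "H": 6.0}


def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of the group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def side_chain_charge(residue, pH):
    """Average side-chain charge for Glu ('E') or His ('H') at a given pH."""
    f = fraction_deprotonated(pH, APPROX_SIDE_CHAIN_PKA[residue])
    if residue == "E":
        return -f        # deprotonated carboxylate carries a -1 charge
    if residue == "H":
        return 1.0 - f   # protonated imidazolium carries a +1 charge
    raise ValueError("only 'E' and 'H' are handled in this sketch")


if __name__ == "__main__":
    for pH in (3.0, 5.0, 7.0, 9.0):
        print(f"pH {pH}: Glu {side_chain_charge('E', pH):+.2f}, "
              f"His {side_chain_charge('H', pH):+.2f}")
```

On this estimate, a glutamic acid pair is almost fully charged (and mutually repulsive) near neutral pH, while a histidine pair is mostly neutral there and becomes charged only under acidic conditions, so the two peptide sets would be expected to respond to pH in opposite directions.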


On The Side: A Month at Sea

One of the really cool experiences I had at UH was the chance to participate in an oceanographic research cruise led by James Cowen, an Oceanography professor at UH. Heck, we even got in the paper (Honolulu Star-Bulletin) six months after we got back.


Bioinformatic Toolbox


Papers resulting from this work


Websites of interest


This page was created on 7 July 2006 and last updated on 12 July 2006


For more information, email me at andy@kiskadden.net