Cory L. Strope

E-mail: cstrope AT cse DOT unl DOT edu
Office: Bioinformatics lab, N169 Beadle Center
Curriculum Vitae

I am currently a Computer Science Ph.D. candidate specializing in Bioinformatics at the University of Nebraska-Lincoln. I work in the Bioinformatics Lab.

Research Interests

Sequence Simulation:

Strope CL, Scott SD, Moriyama EN. 2007. indel-Seq-Gen: a new protein family simulator incorporating domains, motifs, and indels. Mol. Biol. Evol. 24:640-649. (preprint PDF file, Supplementary Files)

Abstract: Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain-shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible, and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, e.g., multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their "true" multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also presented an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.
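To make the idea of tracking indel events and reporting a "true" alignment concrete, here is a minimal Python sketch. It is not iSG's algorithm: it evolves one sequence along a single branch, uses single-residue indels, and draws replacement residues uniformly. The function name evolve_with_indels and the rate parameters sub_rate, ins_rate, and del_rate are hypothetical names chosen for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve_with_indels(root, sub_rate=0.1, ins_rate=0.02, del_rate=0.02, rng=None):
    """Evolve `root` along one branch (toy model, NOT the iSG method).

    Returns the gapped (ancestor, descendant) rows of the true alignment.
    All rates are illustrative per-site probabilities.
    """
    if rng is None:
        rng = random.Random()
    anc_row, desc_row = [], []
    for residue in root:
        # Insertion before this site: the ancestor row receives a gap.
        if rng.random() < ins_rate:
            anc_row.append("-")
            desc_row.append(rng.choice(AMINO_ACIDS))
        anc_row.append(residue)
        r = rng.random()
        if r < del_rate:
            # Deletion: the descendant row receives a gap.
            desc_row.append("-")
        elif r < del_rate + sub_rate:
            # Substitution: pick a different residue uniformly at random.
            others = [a for a in AMINO_ACIDS if a != residue]
            desc_row.append(rng.choice(others))
        else:
            desc_row.append(residue)
    return "".join(anc_row), "".join(desc_row)


# Because every event is recorded as it happens, the alignment is known
# exactly and does not have to be inferred afterwards.
ancestor, descendant = evolve_with_indels("MKTAYIAKQRQISFVKSHFSRQ", rng=random.Random(1))
print(ancestor)
print(descendant)
```

The two printed rows always have equal length, and removing the gaps from the first row recovers the root sequence. An alignment program run on the ungapped sequences could then be scored against these rows.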

Indel Informativeness: Insertions and deletions, although believed to be rare occurrences in the evolutionary history of protein sequences, are ignored in nearly all protein sequence analyses. Much has been made of indels as "missing" information in protein sequence sets [9], but with the large genomic and proteomic experiments under way today, protein datasets are large enough to pinpoint some indel events [5]. There has been some research in this area (see [4,6,7,8,10]), but mainstream applications continue to ignore indel information. Specifically:

Subcellular Localization: An important step in the post-genomics era is the functional characterization of protein sequences. Characterizing a protein sequence provides hints about the protein's function in the cell, as well as about other proteins with which it may interact. One way to characterize protein sequences is to determine the subcellular localization of each protein. Eukaryotic cells have many organelles, each performing specific functions that contribute to the overall function of the cell. Following the central dogma of molecular biology, under which genetic information is expressed as proteins that carry out cellular functions, predicting the localization of proteins within these subcellular compartments will greatly assist in their functional annotation. As such, computational prediction of subcellular localization has been an active area of research.


Presentations


LaTeX and gnuplot help

Journals

I have been keeping track of many interesting journals related to biology, computer science, and bioinformatics. Note that many of these journals are available to subscribers only; for these, using a computer in the .unl.edu domain is often a good idea. Finally, it is always a good idea to search using HighWire Press (the first link). Its advanced search feature lets you search on any subject you are interested in and sort the hits by date. There is also a very nice feature called the TopicMap, which lets you navigate a tree of subjects to find journals with relevant information on your preferred subject. I highly recommend that you play around on their site to find all of the interesting things you can do!

Et Cetera

Bibliography

1
Grassly, N., Adachi, J., Rambaut, A. (1997) PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees, Bioinformatics, 13, 559-560.

2
Benner, S., Cohen, M., Gonnet, G. (1993) Empirical and structural models for insertions and deletions in the divergent evolution of proteins, J. Mol. Biol., 229, 1065-1082.

3
Chang, M.S.S., Benner, S.A. (2004) Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments, J. Mol. Biol., 341, 617-631.

4
Giribet, G., Wheeler, W.C. (1999) On Gaps, Mol. Phyl. Evol., 13, 132-143.

5
Gupta, R.S. (2006) Molecular signatures (unique proteins and conserved indels) that are specific for the epsilon proteobacteria (Campylobacterales), BMC Genomics, 7, 167.

6
Mitchison, G.J. (1999) A probabilistic treatment of phylogeny and sequence alignment, J. Mol. Evol., 49, 11-22.

7
Simmons, M.P., Ochoterena, H. (2000) Gaps as characters in sequence-based phylogenetic analyses, Syst. Biol., 49, 369-381.

8
Simmons, M.P., Ochoterena, H., Carr, T.G. (2001) Incorporation, relative homoplasy, and effect of gap characters in sequence-based phylogenetic analyses, Syst. Biol., 50, 454-462.

9
Waddell, P.J. (2005) Measuring the fit of sequence data to phylogenetic model: allowing for missing data, Mol. Biol. Evol., 22, 395-401.

10
Young, N.D., Healy, J. (2003) GapCoder automates the use of indel characters in phylogenetic analysis, BMC Bioinformatics, 4.


cory strope 2007-03-04