Cory L. Strope
E-mail: cstrope AT cse DOT unl DOT edu
Office: Bioinformatics lab, N169 Beadle Center
Curriculum Vitae
I am currently a Computer Science Ph.D. student, with a specialization in Bioinformatics, at the
University of Nebraska - Lincoln.
I work as a graduate research assistant in the Bioinformatics Lab under Dr. Etsuko N. Moriyama.
Some of the projects (and abstracts) that I am working on include:
- Sequence Simulation: Simulating sequence evolution is essential when we evaluate various molecular
evolutionary methods, e.g., multiple alignment and phylogenetic reconstruction. Currently available simulation
methods, however, consider only substitution events throughout evolution. They do not incorporate more dynamic
evolutionary events, such as duplication, recombination, insertion, and deletion. Such dynamic changes need to be taken
into account when we develop multiple alignment and phylogenetic methods that can reconstruct highly diverged sequence
evolution. In order to fill this gap, we developed a tool called indel-PSeq-Gen that simulates the evolution of protein
sequences that undergo amino acid substitutions, insertions, and deletions.
Other dynamic events, such as
recombination and duplication, are considered for future work in this project.
- Phylogenetic and Domain Networks: As more and more genomic data is gathered, the so-called ``tree of
life'' appears to be not so much a tree as it is a network, or even a ``ring of life''. There is evidence of
many mechanisms of gene transfer, most notably in bacteria. These events have created a new area in the field of
phylogenetics, the reconstruction of phylogenetic networks. There have been many new algorithms created to reconstruct
these networks, using methods such as quartets (SplitsTree) or through the optimization of a loss function through the
creation of extra branches (Reticulation Networks). However, a common problem with these approaches is that they add
too many network branches, perhaps making a much more connected network than actually exists. The root of the problem
may be that the current methods create a phylogenetic tree first, and then try to create network branches on this
phylogenetic tree. Once the simulation method described above is complete, we can carry out a more thorough analysis of
existing phylogenetic reconstruction methods and begin developing another reconstruction method.
- Machine Learning: Pattern recognition and classification have been interests of mine since I began
graduate school. Currently, I am joining a team under Dr. Stephen D. Scott that is building an SVM kernel from the
pairwise distances between protein sequences in order to classify sets of proteins.
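As a rough illustration of the kind of simulation described under Sequence Simulation above, here is a minimal Python sketch that evolves a protein sequence through substitutions, insertions, and deletions. It is not indel-PSeq-Gen and does not reproduce its model: the function name evolve, the uniform choice of residues, and the rates p_sub, p_ins, and p_del are all hypothetical values picked for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def evolve(sequence, steps, p_sub=0.05, p_ins=0.01, p_del=0.01, rng=None):
    """Apply `steps` rounds of random substitutions, insertions, and deletions.

    The rates are hypothetical per-site, per-step probabilities chosen for
    illustration; they are not empirical values from any published model.
    """
    rng = rng or random.Random()
    seq = list(sequence)
    for _ in range(steps):
        evolved = []
        for residue in seq:
            roll = rng.random()
            if roll < p_del:
                continue  # deletion: drop this residue
            if roll < p_del + p_sub:
                # substitution: uniform draw (may redraw the same residue)
                residue = rng.choice(AMINO_ACIDS)
            evolved.append(residue)
            if rng.random() < p_ins:
                # insertion: add one random residue after this site
                evolved.append(rng.choice(AMINO_ACIDS))
        seq = evolved
    return "".join(seq)


rng = random.Random(42)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(60))
descendant = evolve(ancestor, steps=10, rng=rng)
print(len(ancestor), ancestor)
print(len(descendant), descendant)
```

Because the ancestor and every event are known, the descendant can be handed to an alignment or phylogenetic method and the result checked against the true history, which is the point of simulating in the first place.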
gnuplot
- A website for gnuplot help
- To create a graph quickly, use the command
( echo set term postscript default eps ; echo plot \"FILE1\", \"FILE2\", \"FILE3\", \"FILE4\" ) | gnuplot > plot.eps
where each FILEx is a file containing the points to be plotted, and plot.eps is the output file.
- Another way to create a plot is to run the command
gnuplot gnu > output.eps
where gnu is the file:
set term postscript eps 20
set xlabel "Branch length \(PAM units\)" font "Helvetica,24"
set ylabel "Probability of an indel" font "Helvetica,24"
plot 0.0224-0.0219*exp(-0.01168*x) notitle, "E_indels_" notitle
This produces the file output.eps.
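The curve in the script above, 0.0224-0.0219*exp(-0.01168*x), can also be checked numerically before plotting. The short Python sketch below evaluates it at a few branch lengths; reading x as a branch length in PAM units and the result as the probability of an indel is an assumption based on the axis labels in the script, and the function name indel_probability is hypothetical.

```python
import math


def indel_probability(branch_length):
    """Evaluate the curve from the gnuplot script: 0.0224 - 0.0219*exp(-0.01168*x).

    Interpreting the input as PAM units and the output as a probability
    is an assumption taken from the plot's axis labels.
    """
    return 0.0224 - 0.0219 * math.exp(-0.01168 * branch_length)


# Tabulate the curve at a few illustrative branch lengths.
for pam in (0, 50, 100, 250, 500):
    print(f"{pam:4d} PAM -> {indel_probability(pam):.4f}")
```

The curve starts at 0.0005 for a branch length of zero and levels off at 0.0224 as the branch length grows.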
During my Ph.D. and Master's work at UNL, I have given many presentations on different areas of my work and interests.
Here is a list of these presentations (reverse chronological order, most recent first):
- Randomized Online Algorithms: This is the
presentation that I gave over a two-week span for the Randomized Algorithms seminar.
- Simulating Families of ``Twilight Zone'' proteins:
Protein identification using only sequence information is a difficult task, and for any method that attempts to perform
it, there must be a method that can validate the results. This is mostly done through the use of simulated sequences,
so that the evolution of the protein sequences is known. Before my method, however, no simulation
package could create realistic twilight-zone proteins. This presentation discusses the method I used to
create such a package.
- Moriyama Lab 3: Gaps in alignments are often treated as
missing information. However, gaps in alignments represent substantial evolutionary events, such as insertion or
deletion events. This type of information should be usable to infer evolutionary history among sets of proteins. In
this presentation, I present several methods for using these events in phylogenetic reconstruction, drawing on
papers [4,5,6,7,8].
- Bioinformatics Seminar Presentation on
indel-PSeq-Gen: indel-PSeq-Gen is a sequence generator that uses
PSeq-Gen [1] to simulate amino acid substitutions and adds its own simulation of insertion and deletion
events (indels), based on the empirical evidence given in Chang and Benner [3] and Benner et
al. [2].
- Moriyama Lab 2: A paper presentation. The paper
discussed the selective advantage for the survival of a horizontally transferred gene, based on the codon usage index
of the gene versus that of the organism receiving it.
- Moriyama Lab 1: This is a basic
presentation on simulation studies, beginning with PSeq-Gen (a protein sequence generator), discussing the algorithm
behind converting protein scoring matrices into transition matrices, then moving on to three types of phylogenetic tree
reconstruction: Neighbor-Joining, Maximum Parsimony, and Maximum Likelihood.
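The idea from the Moriyama Lab 3 presentation above, that alignment gaps carry evolutionary information rather than being missing data, can be sketched in Python as a simple presence/absence coding. This is a simplified illustration only: it is not the exact scheme of any of the cited papers, and the function names gap_intervals and code_gaps and the toy alignment are hypothetical.

```python
def gap_intervals(aligned):
    """Return the half-open (start, end) interval of each run of '-' in a row."""
    intervals, start = [], None
    for i, ch in enumerate(aligned):
        if ch == "-" and start is None:
            start = i  # a gap run begins here
        elif ch != "-" and start is not None:
            intervals.append((start, i))  # the gap run ended just before i
            start = None
    if start is not None:
        intervals.append((start, len(aligned)))  # gap runs to the end of the row
    return intervals


def code_gaps(alignment):
    """Code each distinct gap (same start and end) as one binary character."""
    per_row = {name: set(gap_intervals(row)) for name, row in alignment.items()}
    characters = sorted(set().union(*per_row.values()))
    matrix = {
        name: "".join("1" if gap in gaps else "0" for gap in characters)
        for name, gaps in per_row.items()
    }
    return characters, matrix


# Toy alignment, invented for illustration.
alignment = {
    "seqA": "MKT--LLVA",
    "seqB": "MKTAALLVA",
    "seqC": "MKT--L--A",
}
characters, matrix = code_gaps(alignment)
print(characters)  # [(3, 5), (6, 8)]
for name, row in matrix.items():
    print(name, row)  # seqA 10, seqB 00, seqC 11
```

The resulting 0/1 columns can be appended to the sequence data as extra characters, so that a shared gap counts as evidence of shared ancestry.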
LaTeX
LaTeX is a high-quality typesetting system,
with features designed for the production of technical and scientific documentation. LaTeX is
the de facto standard for the communication and publication of scientific documents.
Learning LaTeX can be daunting in the beginning; however, after a little practice, it is almost
second nature. There is a lot of online documentation to help new users. A list of guides can
be found here.
Other helpful items:
- unlthesis.cls: The document class that defines the typesetting of a thesis or
dissertation for the University of Nebraska - Lincoln.
- algorithm2e.sty: A style file that helps typeset algorithms. It is also helpful to
have the user's guide.
- threeparttable.sty: Helpful for creating a table with a small caption
and a large legend (especially useful when a table of contents is created). It also allows footnotes to be created in
a table.
- ccaption.sty: This package allows the user to make the ``Figure'' and
``Table'' caption labels boldfaced.
- bibunits.sty, natbib.sty, and multibib.sty
are three packages used in conjunction with BibTeX for special
formatting purposes. They are most often called for in journal submission guidelines, such as those of Bioinformatics.
- The LaTeX2e manual. A very helpful guide.
I have been keeping track of many interesting journals, related to biology, computer science, and bioinformatics. One
item of note is that many of these journals are for subscribers only. For these journals, using a computer under the
.unl.edu domain is often a good idea. Finally, it is always a very good idea to perform a search using HighWire Press
(the first link). The advanced search feature allows you to search any subject you are interested in (with a feature
allowing you to sort the hits by date); there is also a very nice feature called the TopicMap, which allows you to
navigate a tree of subjects to find journals that have relevant information on your preferred subject. I highly
recommend that you play around on their site to find all of the interesting things you can do!
[1] Grassly, N., Adachi, J., Rambaut, A. (1997) PSeq-Gen: an application for the Monte Carlo simulation of
protein sequence evolution along phylogenetic trees, Bioinformatics, 13, 559-560.
[2] Benner, S., Cohen, M., Gonnet, G. (1993) Empirical and structural models for insertions and deletions
in the divergent evolution of proteins, J. Mol. Biol., 229, 1065-1082.
[3] Chang, M.S.S., Benner, S.A. (2004) Empirical analysis of protein insertions and deletions determining
parameters for the correct placement of gaps in protein sequence alignments, J. Mol. Biol., 341, 617-631.
[4] Giribet, G., Wheeler, W.C. (1999) On gaps, Mol. Phyl. Evol., 13, 132-143.
[5] Mitchison, G.J. (1999) A probabilistic treatment of phylogeny and sequence alignment, J.
Mol. Evol., 49, 11-22.
[6] Simmons, M.P., Ochoterena, H. (2000) Gaps as characters in sequence-based phylogenetic analyses,
Syst. Biol., 49, 369-381.
[7] Simmons, M.P., Ochoterena, H., Carr, T.G. (2001) Incorporation, relative homoplasy, and effect of
gap characters in sequence-based phylogenetic analyses, Syst. Biol., 50, 454-462.
[8] Young, N.D., Healy, J. (2003) GapCoder automates the use of indel characters in phylogenetic
analysis, BMC Bioinformatics, 4.
cory strope
2005-07-06