Melissa Cline, Recent Projects
cline@cse.ucsc.edu
Predicting reliable positions in protein sequence alignments
Objective: Develop methods to predict accurate positions in alignments of
target sequences to template sequence families.
- Built libraries of alignments of proteins with low sequence
similarity and high structural similarity. Assembled heterogeneous data
on each alignment position; stored in RDB relational databases.
- Trained neural networks to predict alignment position accuracy.
Optimized input feature sets through empirical sensitivity analysis.
- Applied neural networks, near-optimal alignment information,
column scores, and secondary structure information to predict alignment
position accuracy. Empirically selected thresholds for removing positions.
Applied all predictors to previously unseen validation alignments.
Results: The best predictors removed 73% of the substantially misaligned
positions and 60% of the over-aligned positions, while retaining 85% of the
accurate positions.
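The removal and retention rates reported above depend on the threshold applied to each predictor's score. A minimal sketch of that bookkeeping, assuming each alignment position has a reliability score (higher means more reliable) and a known accurate/misaligned label; the function name, scores, and labels are hypothetical and are not taken from the project's code:

```python
def filter_rates(scores, labels, threshold):
    """Return (fraction of misaligned positions removed,
               fraction of accurate positions retained).

    scores: predicted reliability per alignment position (assumed: higher = better)
    labels: True where the position is actually accurate
    Positions scoring below `threshold` are removed.
    """
    accurate = [s for s, ok in zip(scores, labels) if ok]
    misaligned = [s for s, ok in zip(scores, labels) if not ok]
    removed = sum(s < threshold for s in misaligned) / len(misaligned)
    retained = sum(s >= threshold for s in accurate) / len(accurate)
    return removed, retained


# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
labels = [True, True, False, True, False, False]
removed, retained = filter_rates(scores, labels, threshold=0.5)
```

Sweeping `threshold` over a training set and checking the chosen value on held-out alignments is one common way to select such a cutoff empirically.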
Fold recognition method development and sequence analysis
Objective: Work in a team developing automated HMM-based protein structure
prediction methods and preparing predictions for submission to the CASP
protein structure prediction contests.
- Scripted automated methods for building a library of HMM-based
sequence alignments, interpreting and extracting data from SWISSPROT, PDB,
FSSP, and HSSP databases.
- Optimized remote homology alignment with SAM HMM suite, including
selection of sequence weighting scheme, template family alignment, alignment
algorithm, and SAM alignment parameters.
- Analyzed structure predictions, identifying and removing
suspect alignment positions, and verifying predictions against structure
and function of the predicted folds.
Results: Participated in CASP2, CASP3, and CASP4. In each contest, the method
was judged one of the world's best for fold recognition.
Neural network development in MATLAB
- Extended the neural network toolbox by incorporating maximum
likelihood and cross-entropy error functions. Incorporated adaptive learning
rate, on-line learning, and sensitivity analysis.
- Instructed Artificial Intelligence students on MATLAB neural
network development. Supervised experiments comparing learning algorithms
and error functions.
- Successfully trained neural networks to predict alignment
position reliability, amino acid contact likelihood, and likelihood of
a stable or functional mutant in protein mutagenesis experiments.
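For readers unfamiliar with the cross-entropy error function mentioned above, here is a minimal sketch in Python (not the MATLAB toolbox extension itself), assuming a single sigmoid output unit and binary targets. Under that assumption, minimizing cross-entropy is equivalent to maximizing the Bernoulli likelihood of the targets, and the error gradient at the output simplifies to `output - target`. All function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic activation for a single output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(targets, outputs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, y in zip(targets, outputs):
        y = min(max(y, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(y) + (1 - t) * math.log(1 - y)
    return total / len(targets)


def output_delta(target, output):
    """Gradient of the cross-entropy with respect to the sigmoid's input."""
    return output - target


# A network that outputs 0.5 for every example has an error of ln(2).
baseline_error = cross_entropy([1, 0], [0.5, 0.5])
```

Compared with squared error, this gradient does not vanish when the sigmoid saturates on a wrong answer, which is one common reason for preferring cross-entropy in classification tasks.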
Analysis of pairwise contact potentials
Objective: Analyze the information content of pairwise amino acid contacts:
the types of amino acids that tend to be close together in protein structures.
- Tested sets of pairwise contacts for statistical independence,
using chi-squared and G-tests.
- Analyzed the dependence of pairwise mutual information on
sample size.
- Developed method to estimate the expected mutual information
of two quantities, given sample size and assuming pairwise independence.
- Analyzed classes of contacts to determine the relation between
contact information and factors including burial, secondary structure,
and amino acid properties.
Results: Most observed contact information results from hydrophobic forces or
interactions between a few specific types of amino acids. Information is
richest in long-range contacts and in beta-sheets.
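The link between the G-test, mutual information, and sample size in this project can be illustrated with a short sketch. It is a generic textbook illustration, not the estimation method developed here. It assumes contact counts are arranged in a contingency table and uses two standard facts: the G statistic equals 2N times the mutual information in nats, and under independence the expected mutual information is approximately (r-1)(c-1)/(2N) for an r-by-c table of N observations. Function names and counts are hypothetical.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) of a contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                mi += (count / n) * math.log(count * n / (row_tot[i] * col_tot[j]))
    return mi


def g_statistic(table):
    """G-test statistic; equals 2 * N * mutual information (nats)."""
    n = sum(sum(row) for row in table)
    return 2.0 * n * mutual_information(table)


def expected_mi_if_independent(n_rows, n_cols, n):
    """First-order approximation of the spurious MI seen in a finite sample."""
    return (n_rows - 1) * (n_cols - 1) / (2.0 * n)


# Hypothetical 2x2 table of contact counts, for illustration only.
counts = [[30, 10], [10, 30]]
mi = mutual_information(counts)
g = g_statistic(counts)
# A 20x20 amino acid table with 1000 observed contacts.
baseline = expected_mi_if_independent(20, 20, 1000)
```

The baseline shrinks in proportion to 1/N, so mutual information measured on small samples is inflated even when the underlying quantities are independent.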
Development of an alignment quality measure
Objective: Develop a quality measure suitable for alignment method
optimization: a single number incorporating penalties for misalignment,
aligning too much, and aligning too little.
- Designed and implemented the shift score, a measure
for comparing predicted and structural alignments.
- Downloaded, reformatted, and scored CASP2 fold recognition
predictions, comparing shift score to CASP2 alignment assessment measures.
Results: The shift score correlates very well with accepted measures; when
they disagree, the shift score seems better. Applied the shift score to
alignment optimization and to predicting accurate alignment positions.