CEPCEB Members
Stefano Lonardi
Assistant Professor
Department of Computer Science and Engineering
320 Surge
University of California
Riverside, CA 92521
Phone: (951) 827-2203
Fax: (951) 827-4643
Areas of Expertise
- Computational Molecular Biology
- Data Compression
- Data Mining
- Information Hiding
Background
I received a Laurea degree cum laude in Computer Science from the
CS Department of the University
of Pisa, Italy. While I was pursuing a doctoral degree in Computer Science
and Electronic Engineering at the Department
of EE and CS, University of Padua, Italy,
I joined the graduate program of the Computer
Science Department of Purdue University.
In the summer of 1999, I worked for Celera Genomics, Rockville,
MD. I contributed to the code for the Celera Assembler that assembled the genome
of Drosophila melanogaster, and then made history by assembling the genome
of Homo sapiens. I joined the Department of Computer Science and Engineering,
University of California, Riverside in the summer of 2001. My research interests
focus on the computational challenges of molecular biology and, in particular,
the statistical analysis of sequences. An unprecedented wealth of data is being
generated by genome sequencing projects and other efforts to determine the structures
and functions of biological molecules. The demands and opportunities for interpreting
these data are increasing at the same rate as biological databases are expanding.
At the current pace at which biological information becomes available in the form
of new DNA sequences, proteins, 3D structures, gene expression profiles, protein
interactions, etc., the ability to automatically analyze, cluster, classify, and
annotate it is becoming an increasingly critical need. Ever faster
and more sophisticated algorithms have to be sought, while the scientific exploration
of the computational problems is still in its infancy. Nonetheless, a basic repertoire
of techniques for searching, matching, comparing, and analyzing some discrete
structures -- such as strings, arrays, trees, automata, etc. -- has been developed
over the past decades, in response to the needs coming from a variety of applications.
My research has focused mostly on the design of algorithms
to find, exploit and analyze regularities and repetitions in texts. In the domain
of data compression, these repetitive structures are regarded as redundancies
to be removed. Conversely, in the context of learning, classification and
pattern discovery, repeated patterns are unveiled as carriers of information and
structure. With biological data increasingly and massively accumulated, a
critical issue is to design strategies to limit and filter what the analysis would
return. The limited bandwidth of our perceptions will otherwise become an insurmountable
bottleneck in the analysis pipeline. We have to develop effective synthetic descriptions
of our data, generate succinct characterizations of the problem at hand, and enhance
prominent features of the analysis. Hints could come from data compression and
scientific visualization, as well as from information retrieval and machine learning.
Understanding and characterizing combinatorial structures and properties can also
help to reduce the amount of data at the outset. An orthogonal research direction
also addressing these needs would be to study sampling techniques and devise algorithms
to compute an "approximate analysis" of the data. In addition to sequencing
projects, we can also expect a massive amount of data on gene expression to be
generated in the near future, mainly by DNA microarray technology. Databases
and tools that can organize and visualize these data in useful ways need to be
developed. Appropriate statistical and mathematical models are required for analysis
and comparison of expression and functional data. One of the steps of these studies
uses pattern discovery tools to find promoters and enhancers of transcription.
Research Interests
Computational molecular biology, data mining, data
compression, statistical analysis of sequences, information hiding, advanced data
structures and algorithms, scientific visualization
Current Projects
Verbumculus
is a suite of software tools for the detection of over- or under-represented
words that might appear as consecutive substrings in nucleotide and amino acid
sequences. The inner core of Verbumculus rests on subtly interwoven properties
of statistics, pattern matching and combinatorics on words. These properties enable
us to limit drastically and a priori the set of over- or under-represented
candidate words of all lengths in a given sequence, thereby rendering it more
feasible both to detect and visualize such words in a fast and practically useful
way. (joint work with A. Apostolico, M.E. Bock)
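To make the notion of an over- or under-represented word concrete, here is a minimal Python sketch. It is not the Verbumculus algorithm: it scores only words of one fixed length, uses brute-force counting, and assumes an i.i.d. symbol model with a binomial variance that ignores word self-overlap. The function name `word_zscores` is hypothetical.

```python
from collections import Counter
from math import prod, sqrt

def word_zscores(seq, k):
    """Score each observed length-k word of seq against an i.i.d. symbol model.

    Assumption (not from the source): symbols are independent, and the variance
    is the plain binomial one, which ignores overlaps between occurrences.
    """
    n = len(seq)
    positions = n - k + 1              # number of windows of length k
    if positions <= 0:
        return {}
    freq = {c: cnt / n for c, cnt in Counter(seq).items()}
    counts = Counter(seq[i:i + k] for i in range(positions))
    scores = {}
    for word, observed in counts.items():
        p = prod(freq[c] for c in word)        # probability of the word at one position
        expected = positions * p
        variance = positions * p * (1 - p)
        scores[word] = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return scores

scores = word_zscores("AAAAACCCCC", 2)
# Positive scores suggest over-representation, negative ones under-representation.
```

Only words that occur at least once are scored here, so words that are absent altogether (the extreme case of under-representation) would need to be enumerated separately.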

DNA
microarray technology is a powerful method to monitor the activity of thousands
of genes simultaneously in a cell. An impressive amount of data is being
collected in "wet labs" all over the world. A large microarray instrument can
easily analyze and record in a day the expression profiles of, say, tens of thousands
of genes. The output is typically in the tens of MBytes, and can easily reach
the hundreds of MBytes of raw data. It is believed to be crucial to keep the raw
data in some permanent storage medium. Repeating the experiment is usually not
viable because of the high costs associated with this new technology. Some form
of data compression is therefore required. (joint work with Yu Luo)

Greedy
off-line textual substitution refers to the following approach to lossless
compression. Given a long textstring x, a substring w is identified
such that replacing all instances of w in x except one by a suitable
pair of pointers yields the highest possible contraction of x; the process
is then repeated on the contracted textstring, until substrings capable of producing
contractions can no longer be found. (joint work with A. Apostolico)
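The greedy loop described above can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the authors' implementation: the pointer cost `POINTER_COST`, the cap `max_len`, and the use of a single private-use Unicode character as a stand-in for a pair of pointers are all hypothetical choices, and the input is assumed not to contain private-use characters.

```python
POINTER_COST = 2  # assumed cost of one pointer pair, in symbols (hypothetical)

def best_substring(text, max_len=20):
    """Return (gain, w) for the substring whose replacement saves the most symbols."""
    best = (0, "")
    seen = set()
    for length in range(POINTER_COST + 1, min(max_len, len(text) // 2) + 1):
        for i in range(len(text) - length + 1):
            w = text[i:i + length]
            if w in seen:
                continue
            seen.add(w)
            occurrences = text.count(w)  # non-overlapping, scanned left to right
            gain = (occurrences - 1) * (length - POINTER_COST)
            if gain > best[0]:
                best = (gain, w)
    return best

def greedy_offline_compress(text):
    """Repeatedly replace all but the first occurrence of the best substring."""
    rules = []  # rules[j] is the substring that marker j stands for
    while True:
        gain, w = best_substring(text)
        if gain <= 0:
            break
        marker = chr(0xE000 + len(rules))   # private-use char stands in for the pointers
        head, _, tail = text.partition(w)   # keep the first occurrence verbatim
        text = head + w + tail.replace(w, marker)
        rules.append(w)
    return text, rules

def expand(text, rules):
    """Undo the substitutions, latest rule first."""
    for j in reversed(range(len(rules))):
        text = text.replace(chr(0xE000 + j), rules[j])
    return text

original = "abracadabra abracadabra abracadabra"
compressed, rules = greedy_offline_compress(original)
assert expand(compressed, rules) == original
```

The exhaustive search over substrings is quadratic or worse in the text length, so this sketch is only suitable for short inputs; a practical implementation would need an index structure to find the best substring efficiently.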

Pattern
Matching Pointers is a webpage containing an extensive collection of resources
on combinatorial pattern matching. Lists of researchers' home pages, books, software, and
journals are constantly updated. Currently, the page receives about two hundred
hits per day, and it is ranked first (or second, depending on the day) in the
category "Computer > Algorithms" by Google.

gzipS
is a version of gzip that can hide within the text enough information to
warrant the authenticity of the compressed document. The design is based on cryptographically secure
pseudo-random generators, so that the hidden data cannot be retrieved
in a reasonable amount of time by an attacker unless the secret key is known.
The embedding is totally transparent: since the file can still be decompressed by the
original LZ-77 algorithm, to a "casual eavesdropper" the augmented compressed
file appears perfectly normal. Preliminary experiments also show that the degradation
in compression due to the embedding is almost negligible. (joint work with
M.J. Atallah)

Source
coding and channel coding are two opposing forces that present significant challenges
in error-resilient adaptive lossless data compression. Source coding tries to
decorrelate the input sequence as much as possible by removing redundant information,
while channel coding introduces additional correlation by adding information
in order to protect against errors. Due to their devastating effects, errors in
adaptive data compression have been a long-standing open problem. As a result,
the non-resilience of adaptive data compression has hindered its use in many applications.
However, joint source-channel coding has emerged as a possible solution to the
problem. We have developed a novel joint source-channel coding algorithm capable of correcting
errors in the popular Lempel-Ziv'77 scheme without losing any practical compression
power. (joint work with Wojciech Szpankowski)

HarvEST
EST analysis (joint work with Tim Close, Tao Jiang, Steve Wanamaker, Jie Zheng)
Current Laboratory Personnel
- Qiaofeng Yang, Ph.D. student, Genetics program
- Yu Luo, Ph.D. student, Computer Science
- Jie Zheng, Ph.D. student, Computer Science
- Kun Yan, M.S. student, Computer Science
Selected Recent Publications (Bibliography page)