Wednesday, June 29 • 2:30pm - 3:30pm
Statistical assessment of the similarity of amino-acid sequences

Poster #1

One class of data considered to support the equivalence of a generic drug to its reference listed drug is the comparison of amino-acid chain distributions. Amino-acid sequences with specified molar-ratio characteristics are used to explore novel approaches for comparing these distributions. Different similarity measures, such as the Tanimoto distance, can be used to produce a matrix of pairwise similarities between the sequences, and these measures will be compared on their performance. Furthermore, we seek the important characteristics (features) that produce a meaningful separation of the sequences into clusters. This can be accomplished using weighted sampling, K-means, and self-organizing maps (SOM). Clustering can also be explored by building probability profiles for sequences of a fixed length. In all these cases, a population of thousands of peptide chains from a single simulation resulted in hundreds of thousands of residue sequences. Cleaning and organizing these equal-length sequences and identifying patterns in them is computationally intensive, and is carried out using string-detection functions such as ‘str_detect’ from the R package ‘stringr’.
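As an illustration only, here is a minimal Python sketch of a Tanimoto similarity matrix, although the work described above is carried out in R. It assumes, hypothetically, that each sequence is represented by the set of its overlapping substrings of length k (k-mers); the abstract does not say which features feed the similarity measure. For sets, the Tanimoto similarity is |A ∩ B| / |A ∪ B| (the same quantity as the Jaccard index), and the Tanimoto distance is one minus that value. The function names and toy sequences are hypothetical.

```python
from __future__ import annotations


def kmer_set(seq: str, k: int) -> frozenset[str]:
    """Set of all overlapping substrings of length k (a hypothetical feature choice)."""
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto similarity of two sets: |A & B| / |A | B| (1.0 if both are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(seqs: list[str], k: int = 2) -> list[list[float]]:
    """Symmetric matrix of pairwise Tanimoto similarities between sequences."""
    sets = [kmer_set(s, k) for s in seqs]
    n = len(sets)
    mat = [[1.0] * n for _ in range(n)]  # diagonal: each sequence matches itself
    for i in range(n):
        for j in range(i + 1, n):  # fill the upper triangle and mirror it below
            mat[i][j] = mat[j][i] = tanimoto(sets[i], sets[j])
    return mat


if __name__ == "__main__":
    toy = ["GEKY", "GEKA", "YAKE"]  # toy sequences, not study data
    for row in similarity_matrix(toy, k=2):
        print(row)  # a Tanimoto distance would be 1 - each entry
```

With k = 2, "GEKY" and "GEKA" share 2 of their 4 distinct k-mers (similarity 0.5), while neither shares any k-mer with "YAKE" (similarity 0.0). The all-pairs loop is quadratic in the number of sequences, so at the scale of hundreds of thousands of sequences it would need to be replaced by a vectorized or sparse approach.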

When circumstances require cleaving the amino-acid sequences at a certain residue, efficient code is needed to investigate the distributions of the cleaved sequences and of their molecular weights. The cleavage and sequencing of data sets of this size are handled efficiently by the ‘rstring’ and ‘Biostrings’ R packages and by storage-container functions such as ‘AAStringSet’. These functions also make it easier to build empirical probability distributions of all unique amino-acid sequences of a specified length.
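The cleavage step and the empirical distribution can be sketched in plain Python as follows. This is not a reproduction of the ‘Biostrings’ workflow: it assumes, hypothetically, that cleavage occurs immediately after each occurrence of a single residue, and it interprets "sequences of a specified length" as overlapping substrings of length k. The function names and toy chains are illustrative only.

```python
from __future__ import annotations

from collections import Counter


def cleave_after(seq: str, residue: str) -> list[str]:
    """Split seq after every occurrence of `residue` (hypothetical cleavage rule)."""
    fragments: list[str] = []
    start = 0
    for i, aa in enumerate(seq):
        if aa == residue:
            fragments.append(seq[start:i + 1])  # fragment ends with the cleavage residue
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # trailing fragment without the residue
    return fragments


def kmer_distribution(seqs: list[str], k: int) -> dict[str, float]:
    """Empirical probability of each distinct length-k substring across all seqs."""
    counts: Counter[str] = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no sequence is long enough to contain a k-mer
    return {kmer: c / total for kmer, c in counts.items()}


if __name__ == "__main__":
    chains = ["GEKYAKG", "YKEAK"]  # toy chains, not study data
    fragments = [f for c in chains for f in cleave_after(c, "K")]
    print(fragments)  # ['GEK', 'YAK', 'G', 'YK', 'EAK']
    print(kmer_distribution(fragments, k=2))
```

Counting with a hash-based `Counter` needs a single pass over each fragment. Computing the molecular weights of the fragments, also mentioned above, would additionally require a table of residue masses, which this sketch leaves out.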

The performance of the different metrics will be assessed, and all approaches will be discussed in the context of using amino-acid sequence similarity to demonstrate bioequivalence between a complex-molecule drug and its generic version. The search for computationally efficient ways of handling such data sets will also be addressed.


Elena Rantou

Mathematical Statistician, OB, Office of Translational Sciences, CDER, FDA
Elena Rantou received her Ph.D. in Statistics from American University, Washington, DC. She has worked in academia and as a statistical consultant for many years. She joined FDA/CDER in 2013. Her work focuses on the review of generic drugs, as well as research on determining the bioequivalence of locally acting drugs, classification and data mining for data-fraud detection, and the similarity of pharmacokinetic profiles.

Wednesday June 29, 2016 2:30pm - 3:30pm
Sponsor Pavilion, 326 Galvez Street, Stanford, CA 94305-6105