Probability and Statistics 2018 Spring

Spring 2018
Time & Location: All talks are on Wednesdays in Gibson Hall 126 at 3:00 PM unless otherwise noted.
Organizer: Scott McKinley

February 28
Support points – a new way to compact big data
Simon Mak | Georgia Tech University
Abstract:
This talk presents a new method for compacting large datasets (or, in the infinite-dimensional setting, distributions) into a smaller, representative point set called support points (SPs). In an era where data is plentiful but analysis is often expensive, the proposed data reduction technique can be used to efficiently tackle many challenging big data problems in engineering, statistics and machine learning. Using a popular distance-based statistical energy measure introduced in Székely and Rizzo (2004), SPs can be viewed as minimum-energy points under the potential field induced by big data. As such, these point sets enjoy several nice theoretical properties on distributional convergence, integration performance and functional approximation. One key advantage of SPs is that they allow for an efficient and parallelizable reduction of big data via difference-of-convex programming. The talk concludes with several real-world applications of SPs: (a) compacting Markov chain Monte Carlo (MCMC) sample chains in Bayesian computation, (b) propagating uncertainty in expensive simulations, and (c) efficient kernel learning with big data.
Location:  Stanley Thomas 316
Time: 2:00 PM
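
As a rough illustration of the energy measure mentioned in the abstract, the sketch below computes the sample energy distance between a small candidate point set and a larger dataset. It is not the speaker's support points algorithm: it only evaluates the criterion that such a point set would aim to minimize, and it makes no attempt at the difference-of-convex optimization. The function names, the synthetic Gaussian data and the two candidate point sets are all assumptions made for this example.

```python
import numpy as np

def pairwise_mean_distance(a, b):
    """Mean Euclidean distance over all pairs (one row of a, one row of b)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()

def energy_distance(points, data):
    """Sample energy distance between two point clouds (rows are points).

    Computes 2*E||X - Y|| - E||X - X'|| - E||Y - Y'|| with empirical means,
    where X ranges over `points` and Y over `data`.
    """
    return (2.0 * pairwise_mean_distance(points, data)
            - pairwise_mean_distance(points, points)
            - pairwise_mean_distance(data, data))

# Hypothetical data: 500 draws from a 2-D standard normal.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))

# Two hypothetical 20-point candidates: a plain subsample, and the same
# subsample shifted away from the bulk of the data.
subsample = data[:20]
shifted = subsample + 3.0

print("energy(subsample, data):", energy_distance(subsample, data))
print("energy(shifted, data):  ", energy_distance(shifted, data))
```

A point set that represents the data well gives a small energy distance, so the shifted candidate scores worse than the plain subsample. Support points, as described in the abstract, would be the point set that makes this quantity as small as possible.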

April 11
Multivariate Self-similarity: Multiscale eigenstructures for the estimation of Hurst exponents - Application to Internet traffic monitoring
Patrice Abry | CNRS and ENS-Lyon
Abstract:
Scale invariance has become a ubiquitous paradigm, widely used to model temporal dynamics in real-world data. Self-similar processes, and particularly their Gaussian instance, fractional Brownian motion, constitute the most common stochastic models used to account for scale invariance. However, most applications of self-similarity have so far remained univariate, while data collected in real-world applications most often come naturally as multivariate. Recently, Operator Fractional Brownian Motion (OfBm) has been proposed in the literature as the reference model for multivariate self-similarity. Yet it has remained barely used because of the lack of available identification procedures for the joint estimation of the parameters entering its definition. The present contribution achieves a first major step in the full identification of M-variate OfBm by proposing a procedure to estimate the vector of M Hurst exponents underlying its temporal dynamics. The proposed estimation procedure relies on the theoretical study of the multiscale eigenstructure of the wavelet spectrum of OfBm. The proposed estimator is shown theoretically to be consistent and asymptotically normal, and to be efficient in practice. Monte Carlo simulations conducted on numerous independent copies of synthetic OfBm enable us to assess the actual estimation performance of the proposed procedure in an M-variate setting and for finite-size data.