Spring 2025
Time & Location: All talks are on Wednesdays in TBA at TBA PM unless otherwise noted.
Organizers: Xiang Ji, Michelle Lacey and Yuwei Bao
February 12
Title: Multidimensional Empirical Wavelet Transform
Charles-Gérard Lucas - San Diego State University
Abstract: The empirical wavelet transform, inspired by empirical mode decomposition, is an adaptive time-frequency representation that extracts the different modes of a signal or image by designing a bank of adaptive wavelet filters. The data robustness of this transform has made it the subject of intense development and a growing number of applications over the past decade. However, to date, it has mainly been studied theoretically for signals, and its extension to images is limited to a specific wavelet kernel. This presentation will focus on a multidimensional extension of this transform formulated from a wavelet kernel. Theoretical and numerical properties of this formulation will be particularly detailed. Its interest for texture segmentation will also be highlighted.
Time: 1:00 pm
Location: Norman Mayer Building 104
March 24
Title: Random-Walk Debiased Inference for Contextual Ranking Model with Application in Large Language Model Evaluation
Yichi Zhang - Indiana University Bloomington
Abstract: We propose a debiased inference framework to infer the ranking structure in the contextual Bradley-Terry-Luce (BTL) model. We first adopt a nonparametric maximum likelihood estimation method using ReLU neural networks to estimate unknown preference functions in the model. For the inference of pairwise ranking, we introduce a novel random-walk debiased estimator that efficiently aggregates all accessible estimating scores. In particular, under mild conditions, our debiased estimator yields a tractable distribution, and achieves the semiparametric efficiency bound asymptotically. We further extend our method by incorporating multiplier bootstrap techniques for the uniform inference of ranking structures, and adapting it to accommodate the distributional shift of contextual variables. We provide thorough numerical studies to validate the statistical properties of our method, and showcase its applicability in evaluating large language models based on human preferences under different contexts.
Time: 4:00 pm
Location: Norman Mayer Building 101
March 24
Title: From Myth to Truth: An Introduction to Statisticians’ Role In Drug Development
Cindy Lu and Xinyu Cong - AstraZeneca
Abstract: Our presentation explores the evolving role of statisticians in the pharmaceutical industry, particularly within drug development. It introduces the various stages of the drug development process, from pre-clinical trials through Phase IV post-market, highlighting statisticians' critical roles during those processes. The presentation aims to dispel common myths about the statistical profession in pharma, encouraging more talented graduates to devote their careers to the mission of bringing innovative treatments to patients.
Time: 4:00 pm
Location: Norman Mayer Building 101
April 2
Title: Dealing with discordance in the Tree of Life
Matthew Hahn - Indiana University Bloomington
Abstract: Phylogenetics is concerned with uncovering the relationships among organisms (the “Tree of Life”), and statistical and computational research has made many important contributions to achieving this goal. In this talk, I discuss a major overall challenge facing the field as DNA sequencing efforts have become central to this work: many individual genes have tree topologies that do not match the tree describing relationships among species. Such gene-tree discordance poses many new difficulties for inferring the Tree of Life. Here, I present three approaches for dealing with discordance: 1) a deep-learning method for inferring gene-tree topologies; 2) a quartet summary approach that combines many different gene-tree topologies to construct an accurate species tree, even in the presence of duplication and loss; and 3) a probabilistic approach to reconstructing the history of different traits on a species tree in the presence of discordance. These three problems (and their solutions) represent just a fraction of the challenges now facing the field of phylogenetics.
Time: 4:00 pm
Location: Dinwiddie Hall 102
April 23
Title: MCMC Importance Sampling via Moreau-Yosida Envelopes
Eric Chi - Rice University Host: (Xiang Ji)
Abstract: Markov chain Monte Carlo (MCMC) is the workhorse computational algorithm employed for inference in Bayesian statistics. Gradient-based MCMC algorithms are known to yield faster-converging Markov chains. In modern parsimonious models, the use of non-differentiable priors is fairly standard, yielding non-differentiable posteriors. Without differentiability, gradient-based MCMC algorithms cannot be employed effectively. Recently proposed proximal MCMC approaches, however, can partially remedy this limitation. These approaches employ the Moreau-Yosida (MY) envelope to smooth the non-differentiable prior, enabling sampling from an approximation to the target posterior. In this work, we leverage properties of the MY envelope to construct an importance sampling paradigm to correct for this approximation error. We establish asymptotic normality of the importance sampling estimators, with an explicit expression for the asymptotic variance, which we use to derive a practical metric of sampling efficiency. Numerical studies show that the proposed scheme can yield lower-variance estimators compared to existing proximal MCMC alternatives.
Time: 4:00 pm
Location: Dinwiddie Hall 102
April 30
Title: Composite likelihood approaches to phylogenetic inference under the multispecies coalescent
Laura Kubatko - Ohio State University Host: (Xiang Ji)
Abstract: Species-level phylogenetic inference under the multispecies coalescent model remains challenging in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. Algebraic approaches intended to establish identifiability of species tree parameters have suggested computationally efficient inference procedures that have been widely used by empiricists and that have good theoretical properties, such as statistical consistency. However, such approaches are less powerful than approaches based on the full likelihood. Methods based on composite likelihood are a compromise between these two approaches that enable computationally efficient inference while maximizing use of the available sequence data. In this talk, I’ll describe the relationship between these two approaches, highlighting the strengths and weaknesses of each and providing directions for future work.
Time: 4:00 pm
Location: Dinwiddie Hall 102