In this talk, I will share experiences and insights from the NSF Algorithms for Threat Detection
(ATD) Challenges in 2021, 2022, and 2023, where my team earned first place by developing
innovative approaches to high-dimensional data analysis for large-scale spatial and temporal
traffic flow and GDELT datasets. Building on these achievements, I will introduce recent
advances in high-dimensional low-rank regression, distance-covariance (dCOV) Sufficient
Dimension Reduction (SDR), and Bayesian variable selection for mixed-type multivariate
responses.
A central focus of the presentation is a two-step posterior sampling algorithm with a fast
variable screening step designed to tackle the challenges posed by ultra-high-dimensional data.
The screening step reduces dimensionality quickly while retaining the relevant variables. I
will present the algorithm's theoretical properties, including screening consistency, posterior
convergence, contraction rates, and consistency, which support its reliability in practical
applications. These methods apply to a wide range of problems, from anomaly detection to
neuroimaging, by providing scalable and statistically grounded techniques for analyzing
complex, high-dimensional data, as in the classification of multiple carcinoma types from gene
expression data.
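
To make the screen-then-sample idea concrete, here is a minimal sketch. It is not the algorithm presented in the talk. It assumes a single continuous response, uses marginal-correlation screening as a stand-in for the fast screening step, and uses a conjugate Gaussian prior with known noise variance as a stand-in for the posterior sampler. All function names, parameters (d, sigma2, tau2), and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def screen_top_d(X, y, d):
    """Step 1 (assumed stand-in): keep the d predictors with the largest
    absolute marginal correlation with the response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(score)[-d:])


def sample_posterior(X, y, n_draws, sigma2=1.0, tau2=10.0, rng=None):
    """Step 2 (assumed stand-in): draw coefficients from the Gaussian
    posterior under the prior beta ~ N(0, tau2 * I) with known noise
    variance sigma2. This is a conjugate model, not a variable-selection prior."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # symmetrize against round-off
    mean = cov @ (X.T @ y) / sigma2
    return rng.multivariate_normal(mean, cov, size=n_draws)


def demo(seed=0):
    """Simulate n << p data with three active predictors, screen, then sample."""
    rng = np.random.default_rng(seed)
    n, p, d = 200, 2000, 20
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[[3, 50, 700]] = [3.0, -2.5, 2.0]  # hypothetical true signal
    y = X @ beta + rng.standard_normal(n)
    kept = screen_top_d(X, y, d)
    draws = sample_posterior(X[:, kept], y, n_draws=500, rng=rng)
    return kept, draws


if __name__ == "__main__":
    kept, draws = demo()
    print("retained predictors:", kept)
    print("posterior means:", draws.mean(axis=0).round(2))
```

In this sketch the sampler only ever sees the d retained columns, so its cost depends on d rather than on the original dimension p. That is the practical reason for screening before sampling.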