ACO Defense of Research Proposal of Mengqi Lou, August 14, 2023
Title: Sharp predictions for complex iterative algorithms on non-convex optimization problems with random data.
Date: August 14th, 2023
Time: 2:00pm ET
Location: Groseclose 118
Advisor: Dr. Ashwin Pananjady, School of Industrial and Systems Engineering, School of Electrical and Computer Engineering, Georgia Institute of Technology
Committee: Dr. Cheng Mao, School of Mathematics, Georgia Institute of Technology
Dr. Arkadi Nemirovski, School of Industrial and Systems Engineering, Georgia Institute of Technology
Dr. Vladimir Koltchinskii, School of Mathematics, Georgia Institute of Technology
Abstract: Worst-case efficiency theory for nonconvex problems paints a pessimistic picture for optimization algorithms. However, natural iterative algorithms are routinely applied to nonconvex problems in many modern data-science applications in which the optimization problem is random, and several of these algorithms converge quite quickly to accurate solutions. At the same time, the observed convergence behavior of these algorithms is often delicate: whether convergence occurs at all, and how fast, depends critically on how well the algorithm is initialized. This wide range of possible behavior motivates the need for a sharp, average-case theory of efficiency for iterative nonconvex optimization in settings with random data. With this broad goal in mind, we study two families of algorithms for matrix estimation.
Specifically, we consider the problem of estimating the factors of a rank-1 matrix with i.i.d. Gaussian, rank-1 measurements that are nonlinearly transformed and corrupted by noise. First, we study the convergence properties of a natural alternating update rule for this nonconvex optimization problem. We show sharp convergence guarantees from a random initialization by deriving a deterministic recursion that is accurate even for high-dimensional problems. Our sharp, non-asymptotic analysis exposes several other fine-grained properties of this problem, including how the nonlinearity and noise level affect convergence behavior. Second, we study a stochastic algorithm for this problem, specifically a type of mini-batched prox-linear iteration. Once again, we provide deterministic predictions of the error of the algorithm along with non-asymptotic fluctuation guarantees around this prediction for the random iterates. We demonstrate how to use our deterministic predictions to perform hyperparameter tuning (e.g., step-size and batch-size selection) without running the method.
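To give a concrete sense of the alternating update rule in question, the sketch below runs an alternating least-squares iteration from a random initialization on a simplified instance of the measurement model with the identity nonlinearity, y_i = (a_i^T u*)(b_i^T v*) + noise. All variable names, dimensions, and parameter values here are illustrative assumptions, not taken from the proposal, and the sketch omits the general nonlinearity the work actually analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 400                       # dimension and number of measurements (illustrative)

# Planted rank-1 factors, normalized for convenience
u_star = rng.normal(size=d); u_star /= np.linalg.norm(u_star)
v_star = rng.normal(size=d); v_star /= np.linalg.norm(v_star)

# i.i.d. Gaussian rank-1 measurements y_i = (a_i^T u*)(b_i^T v*) + noise
A = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))
sigma = 0.01
y = (A @ u_star) * (B @ v_star) + sigma * rng.normal(size=n)

# Random initialization, as in the setting studied
u = rng.normal(size=d)
v = rng.normal(size=d)

for _ in range(50):
    # Holding v fixed, each measurement is linear in u: y_i ~ (b_i^T v) a_i^T u,
    # so the update is an ordinary least-squares solve.
    u, *_ = np.linalg.lstsq(A * (B @ v)[:, None], y, rcond=None)
    # Symmetric least-squares update for v with u fixed.
    v, *_ = np.linalg.lstsq(B * (A @ u)[:, None], y, rcond=None)

# Compare at the level of the rank-1 matrix, which removes the inherent
# scaling ambiguity (u, v) -> (c u, v / c).
err = np.linalg.norm(np.outer(u, v) - np.outer(u_star, v_star))
```

On a well-conditioned instance like this one, the alternating iterates drive the matrix estimation error down to the noise floor; the deterministic recursions developed in the proposal aim to predict this trajectory sharply, including its dependence on the nonlinearity and the noise level.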