Learning Dynamics with Language Models and Physics
Michael W. Mahoney
ICSI, LBNL, and Department of Statistics, UC Berkeley
Dynamics and time are ubiquitous throughout science and engineering (and, of course, the world). Yet modern machine learning tends to struggle with learning dynamics. Much of machine learning theory focuses on static tasks such as classification and regression; and the most motivating and highest-profile machine learning applications, such as image classification and textual sentiment analysis, are not obviously dynamical. They are instead handled with convolutional models (in computer vision) and sequence-to-sequence models (in natural language processing). All that being said, dealing with tasks that have (directly or indirectly) a dynamical component is an increasingly important area in machine learning and scientific machine learning. I will review several recent lines of work illustrating how intuitive ideas sometimes fail, and non-intuitive ideas sometimes succeed, when using machine learning to learn dynamics. First, I will discuss how popular machine learning pipelines can learn discrete but not meaningfully continuous dynamics. Second, I will describe how large language models provide high-quality embeddings that perform well for time series forecasting and related tasks. Such results hold (in different ways) for a broad range of widely-used tasks as well as niche state-of-the-art forecasting tasks; the likely reason is that these language models learn good embeddings over multiple “time” scales, due to the sequence-to-sequence structure of their training. Third, I will show how combining convolutional and long expressive memory ideas in just the right way can lead to improved performance in challenging seismology and related scientific machine learning applications. Taken together, these lines of work demonstrate some of the subtleties, as well as some of the potential, of using language models and physics knowledge to machine learn dynamics.
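As an illustrative aside (not the specific pipeline presented in the talk), the second line of work above can be sketched, under strong simplifying assumptions, by using a frozen pretrained language model as a feature extractor for one-step-ahead forecasting. The choice of GPT-2, the comma-separated text serialization, the window length, and the ridge-regression readout below are hypothetical illustration details, not details from the talk.

# Hypothetical sketch: frozen language-model embeddings as features for time series forecasting.
# Assumes the Hugging Face transformers library, PyTorch, scikit-learn, and numpy are installed.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def embed_window(window):
    # Serialize a numeric window as text and take the hidden state of the
    # final token as its embedding (mean pooling is another common choice).
    text = ", ".join(f"{x:.3f}" for x in window)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, d_model)
    return hidden[0, -1].numpy()

# Toy data: a noisy sine wave; forecast the next value from a sliding window.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 20.0, 200)) + 0.1 * rng.standard_normal(200)
w = 16
X_train = np.stack([embed_window(series[t - w:t]) for t in range(w, 160)])
y_train = series[w:160]
X_test = np.stack([embed_window(series[t - w:t]) for t in range(160, 200)])
y_test = series[160:200]

# A simple linear readout trained on top of the frozen embeddings.
readout = Ridge(alpha=1.0).fit(X_train, y_train)
print("test MSE:", float(np.mean((readout.predict(X_test) - y_test) ** 2)))

The point of the sketch is only the structure: a pretrained language model supplies embeddings of the serialized series, and a lightweight readout does the forecasting; the actual models, tasks, and results are those described in the talk and its associated papers.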
Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also an Amazon Scholar as well as head of the Machine Learning and Analytics Group at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, scalable stochastic optimization, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics-informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he was on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he co-organized the Simons Institute’s fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute’s (PCMI) 2016 Summer Session on The Mathematics of Data, he ran the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he was the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.