
Earthquake Data Provide Solid Footing for AI Foundation Science Model

RICHLAND, Wash.—There’s been a seismic shift in science, with scientists developing new AI tools and applying AI to just about any question that can be asked.

Researchers are now putting actual seismic waves to work, using data from the world’s largest repository of earthquake data to develop “SeisModal,” an AI foundation model designed to explore big questions about science. The effort, known as Steel Thread, involves researchers from five national laboratories operated by the U.S. Department of Energy.

Foundation models form the cornerstone of the AI landscape and are a must-have tool for researchers. They’re built using a broad set of data and form a foundation of knowledge and reasoning that can be adapted to many specific purposes. Current large language models are good examples, providing a knowledge base of text and code that can be built upon to explore many types of questions.

While many powerful foundation models have been created by industry and others, few are built from inception with the science of nuclear nonproliferation as the focus. That’s the goal of Steel Thread.

“We’re creating a foundation model with broad capability that can be applied to multiple problems in science with minimal retraining for each application,” said Karl Pazdernik, a chief data scientist at Pacific Northwest National Laboratory who is the science lead of the Steel Thread team. He discussed the effort in an invited talk at the annual Joint Statistical Meetings in Nashville last summer.

SeisModal brings together data about many characteristics of earthquakes into one foundation model that can be adapted to broader scientific questions. (Animation by Sara Levine | Pacific Northwest National Laboratory)

The Steel Thread project is funded by the National Nuclear Security Administration’s Office of Defense Nuclear Nonproliferation Research and Development. The project includes scientists from Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory (Chengping Chai) and Sandia National Laboratories (Lisa Linville), as well as PNNL.

The lead architects of SeisModal are PNNL scientists Sai Munikoti and Ian Stewart.

For any AI model, the size, quality and diversity of the dataset are key: The larger and more diverse the body of high-quality data, the better the chance that the model will be accurate across many tasks.

Earthquakes release massive amounts of energy that travels through the Earth as seismic waves, offering a source of information that could be relevant to the discrimination of underground events. So, the Steel Thread team is drawing on a dataset maintained by the National Earthquake Information Center. The database includes information about more than 16,000 seismic events and meets several important criteria: It’s publicly available, its data is of high quality and it covers thousands of events.

An important feature of SeisModal is that it’s what researchers call “multimodal”: The model can incorporate and make sense of many types of data. For earthquakes, that includes the event’s intensity, location and timing; details about its waveform; text; and imagery such as photos or video.

The model integrates all those streams of information, creating a comprehensive picture of each event and providing a basis to study new events. Even when a few details are missing, a robust multimodal model can oftentimes draw firm conclusions from the data that is available. The goal of Steel Thread researchers is to create a model that can analyze a broad set of scientific data relevant for nuclear nonproliferation.

“Creating an AI foundation model whose goal is to understand scientific concepts can be a big lift, but it can have many applications beyond seismology,” said Pazdernik.

“Since we want our models to be rooted in science, a major focus of our project is also to make sure that any model that we build is trustworthy. To evaluate its trustworthiness, we need to understand the training data, be sure of its origin, and describe the security and usability of the model. SeisModal provides an excellent example of training on transparent data to build a trustworthy model for science,” added Pazdernik.

A strength of SeisModal is its capacity to analyze a “time series,” a sequence of data points recorded over time, such as the tremors from an earthquake or the electrical signals of a heartbeat.

“SeisModal can reason over complex time series data such as seismic waveforms, which is an advance over many current large language models,” said Stewart. “The ability to detect these signals and other uncommon data types opens the door to a wider variety of scientific analysis methods that were previously unavailable.”

