SEMINARS

Principal Seminar of the Department of Probability Theory, Moscow State University
March 26, 2025, 16:45, Moscow, MSU Main Building, room 12-24




[Dynamic Data Selection in Large Model Training]

Bingyi Jing

Southern University of Science and Technology

Abstract: Training large models typically requires internet-scale data. Data quality is crucial to model performance, which makes selecting high-quality samples from such vast datasets a critical problem. To address this, we have redesigned the lifecycle of data during training from the ground up, starting with the underlying training framework. However, applying dynamic data filtering at scale within current large-model training systems raises numerous challenges. This talk explores how to tackle these challenges.

Language of the talk: English


© Steklov Mathematical Institute of RAS, 2025