
Zap. Nauchn. Sem. POMI, 2024 Volume 540, Pages 82–112 (Mi znsl7545)

Feature engineering pipeline optimisation in AutoML workflow using large language models

I. L. Iov, N. O. Nikitin

ITMO University, St. Petersburg, Russia

Abstract: One important way to achieve more efficient automated machine learning is to apply meta-optimisation to all stages of pipeline design. In this work, we aim to use large language models in the feature engineering stage as both optimisers and domain-knowledge experts. We encode the feature engineering pipeline in natural language as a sequence of atomic operations. Black-box optimisation is implemented by requesting a feature engineering pipeline from the LLM using a prompt consisting of predefined instructions, a dataset description, and previously evaluated pipelines. To increase the time efficiency and stability of optimisation, we implement a population-based algorithm that produces a set of pipelines with each LLM response instead of a single one. Multi-step optimisation is also attempted to provide the LLM with additional domain knowledge. To analyse the performance of the proposed approach, we conduct a set of experiments on open datasets. Random search has been chosen as a baseline for the optimisation task. We find that while straightforward results obtained with the gpt-3.5-turbo model are close to the baseline at the same time cost, population-based pipeline generation outperforms the baseline and the other approaches. Our results confirm that the proposed approach can increase the overall performance of machine learning models at the same optimisation time cost and with fewer tokens needed to obtain the result.
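The optimisation loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the operation names, the prompt wording, the scoring function, and the mock LLM call (standing in for a real chat-completion request to a model such as gpt-3.5-turbo) are all hypothetical. It shows the key ingredients the abstract names: a prompt built from fixed instructions, a dataset description, and previously evaluated pipelines, and a population of candidate pipelines returned per response.

```python
import random

# Hypothetical atomic feature-engineering operations (illustrative only).
OPERATIONS = ["impute_mean", "scale_standard", "log_transform",
              "one_hot_encode", "select_k_best", "pca"]


def build_prompt(dataset_description, history):
    """Assemble the prompt: predefined instructions + dataset
    description + previously evaluated pipelines with their scores."""
    lines = ["You are an AutoML assistant. Propose feature engineering "
             "pipelines as sequences of atomic operations."]
    lines.append(f"Dataset: {dataset_description}")
    for pipeline, score in history:
        lines.append(f"Tried: {' -> '.join(pipeline)} | score={score:.3f}")
    return "\n".join(lines)


def mock_llm(prompt, population_size=4, rng=random):
    """Stand-in for a chat-completion call. Returns a population of
    candidate pipelines per response, as in the population-based variant
    (a real system would parse pipelines out of the model's text reply)."""
    return [rng.sample(OPERATIONS, k=rng.randint(2, 4))
            for _ in range(population_size)]


def evaluate(pipeline):
    """Toy surrogate for cross-validated model quality on the dataset."""
    bonus = 0.1 if "scale_standard" in pipeline else 0.0
    return round(0.5 + bonus + 0.05 * len(set(pipeline)), 3)


def optimise(dataset_description, iterations=3, seed=0):
    """Black-box loop: prompt the LLM with the evaluation history,
    score every returned pipeline, and keep the best one found."""
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        prompt = build_prompt(dataset_description, history)
        for pipeline in mock_llm(prompt, rng=rng):
            history.append((pipeline, evaluate(pipeline)))
    return max(history, key=lambda item: item[1])


best_pipeline, best_score = optimise("tabular, 20 numeric features, binary target")
print(best_pipeline, best_score)
```

Returning several pipelines per response amortises the per-request token cost across the population, which is consistent with the abstract's claim of fewer tokens needed to reach a given result.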

Key words and phrases: AutoML, large language models, feature engineering, black-box optimisation.

Received: 15.11.2024

Language: English



© Steklov Math. Inst. of RAS, 2025