
Zap. Nauchn. Sem. POMI, 2024 Volume 540, Pages 276–350 (Mi znsl7555)

Large language models for source code generation and editing

V. M. Lomshakov, S. I. Nikolenko^{a,b}

a St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences
b St. Petersburg National Research University of Information Technologies, Mechanics and Optics

Abstract: In recent years, large language models (LLMs) have significantly transformed approaches to the automation of software development, providing powerful tools for code generation, correction, and optimization. In this survey, we examine methods for adapting LLMs to programming tasks, including reinforcement learning from human feedback (RLHF), instruction tuning, parameter-efficient fine-tuning (PEFT), and effective prompting strategies. We review modern approaches to fine-tuning and applying LLMs, discuss their advantages and limitations, and consider relevant datasets for code generation and correction tasks together with the corresponding evaluation metrics. Additionally, we describe state-of-the-art open-weight models for working with source code.
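Among the adaptation methods the abstract lists, PEFT can be illustrated concisely. A minimal sketch of one popular PEFT technique, low-rank adaptation (LoRA): the frozen base weight W is augmented with a trainable low-rank update B·A. The matrices and function names below are illustrative toy examples, not taken from any specific library or from the surveyed paper.

```python
# Sketch of the LoRA idea behind parameter-efficient fine-tuning (PEFT):
# instead of updating a full weight matrix W (d_out x d_in), train only a
# low-rank update delta_W = B @ A with rank r << min(d_out, d_in).

def matmul(X, Y):
    """Naive matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = (W + (alpha / r) * B @ A) x, where r is the adapter rank."""
    r = len(A)                    # A is r x d_in, B is d_out x r
    delta = matmul(B, A)          # low-rank weight update
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    return [
        sum((W[i][j] + scale * delta[i][j]) * x[j] for j in range(d_in))
        for i in range(d_out)
    ]

# Frozen base weight (2x3) plus a rank-1 adapter: only A (1x3) and B (2x1)
# are trainable -- 5 parameters instead of 6 here, and the savings grow
# dramatically for realistic layer sizes.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # r x d_in
B = [[0.5], [0.5]]      # d_out x r
x = [1.0, 2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=1.0)
print(y)  # [4.0, 5.0]
```

During training only A and B receive gradients; at deployment the update B·A can be merged into W, so inference cost is unchanged.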

Key words and phrases: large language models, source code generation, reinforcement learning, instruction tuning, LLM-based agents.

Received: 15.11.2024



© Steklov Math. Inst. of RAS, 2025