
Computer Research and Modeling, 2024 Volume 16, Issue 7, Pages 1703–1713 (Mi crm1243)

SPECIAL ISSUE

Generating database schema from requirement specification based on natural language processing and large language model

N. Salem^a, Kh. Al-Tarawneh^a, A. Hudaib^a, H. Salem^b, A. Tareef^c, H. Salloum^b, M. Mazzara^b

a King Abdullah II School for Information Technology, University of Jordan, Amman, Jordan
b Innopolis University, 1 Universitetskaya st., Innopolis, 420500, Russia
c Faculty of Information Technology, Mutah University, Karak, Jordan

Abstract: A Large Language Model (LLM) is an advanced artificial intelligence algorithm that uses deep learning methods and extensive datasets to process, understand, and generate human-like text. These models can perform a variety of tasks, such as summarization, content creation, translation, and predictive text generation, which makes them highly versatile in applications involving natural language understanding. Generative AI, often associated with LLMs, focuses specifically on creating new content, particularly text, by leveraging the capabilities of these models. Developers can use LLMs to automate complex processes, such as extracting relevant information from system requirement documents and translating it into a structured database schema. This capability has the potential to streamline the database design phase, saving significant time and effort while ensuring that the resulting schema aligns closely with the given requirements. By integrating LLM technology with Natural Language Processing (NLP) techniques, the efficiency and accuracy of generating database schemas from textual requirement specifications can be significantly enhanced. The proposed tool uses these capabilities to read system requirement specifications, which may be provided as text descriptions or as Entity-Relationship Diagrams (ERDs). It then analyzes the input and automatically generates a relational database schema in the form of SQL commands. This approach eliminates much of the manual effort involved in database design, reduces human error, and shortens development timelines. The aim of this work is to provide a tool that can be invaluable to software developers, database architects, and organizations seeking to optimize their workflows and seamlessly align technical deliverables with business requirements.
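The abstract describes the final stage of the pipeline, turning extracted requirements into SQL commands, without detailing its internals. As a minimal illustration only, and assuming the NLP/LLM stage has already extracted entities, attributes, and one-to-many relationships into a simple intermediate structure, the Python sketch below shows how such a structure could be rendered as `CREATE TABLE` statements. The names `Attribute`, `Entity`, `Relationship`, and `generate_schema` are hypothetical and introduced here for illustration; they are not the authors' tool or API, and the extraction step itself is not shown.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    sql_type: str
    primary_key: bool = False
    nullable: bool = True


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    """One-to-many: each row of `many` references exactly one row of `one`."""
    one: str
    many: str


def primary_key(entity: Entity) -> Attribute:
    pks = [a for a in entity.attributes if a.primary_key]
    if len(pks) != 1:
        raise ValueError(f"{entity.name} needs exactly one primary-key attribute")
    return pks[0]


def generate_schema(entities, relationships) -> str:
    """Render hypothetical extracted entities/relationships as SQL DDL.

    Assumes `entities` is ordered parents-first, so every referenced
    table is created before the table that points to it.
    """
    by_name = {e.name: e for e in entities}
    statements = []
    for entity in entities:
        columns, constraints = [], []
        for attr in entity.attributes:
            col = f"  {attr.name} {attr.sql_type}"
            if attr.primary_key:
                col += " PRIMARY KEY"
            elif not attr.nullable:
                col += " NOT NULL"
            columns.append(col)
        # The "many" side of a one-to-many relationship gets a foreign-key column.
        for rel in relationships:
            if rel.many == entity.name:
                parent = by_name[rel.one]
                pk = primary_key(parent)
                fk_col = f"{parent.name.lower()}_{pk.name}"
                columns.append(f"  {fk_col} {pk.sql_type} NOT NULL")
                constraints.append(
                    f"  FOREIGN KEY ({fk_col}) REFERENCES {parent.name}({pk.name})"
                )
        body = ",\n".join(columns + constraints)
        statements.append(f"CREATE TABLE {entity.name} (\n{body}\n);")
    return "\n\n".join(statements)


# Hypothetical output of the extraction stage for the requirement
# "Each customer has a name and can place many orders, each with an order date."
entities = [
    Entity("Customer", [Attribute("id", "INTEGER", primary_key=True),
                        Attribute("name", "TEXT", nullable=False)]),
    Entity("Orders", [Attribute("id", "INTEGER", primary_key=True),
                      Attribute("placed_on", "DATE")]),
]
relationships = [Relationship(one="Customer", many="Orders")]

ddl = generate_schema(entities, relationships)
print(ddl)

# Sanity check: the generated DDL should execute in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

Placing the foreign-key column on the "many" side is the standard relational mapping of a one-to-many relationship. Many-to-many relationships would need a junction table, which this sketch does not handle.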

Keywords: large language model, natural language processing, entity-relationship diagrams, SQL

UDC: 004.42

Received: 26.10.2024
Revised: 21.11.2024
Accepted: 25.11.2024

Language: English

DOI: 10.20537/2076-7633-2025-16-7-1703-1713



© Steklov Math. Inst. of RAS, 2025