Abstract:
Malware detection is essential in cybersecurity, yet its accuracy is
often compromised by class imbalance and limited labeled data. This study
leverages Conditional Generative Adversarial Networks (cGANs) to generate
synthetic malware samples, addressing these challenges by augmenting the
minority class.
The cGAN model generates realistic malware samples conditioned on class
labels, balancing the dataset without altering the benign class. We apply
this approach to the CICMalDroid2020 dataset and train a LightGBM model on
the augmented data, which improves detection accuracy, particularly for
underrepresented malware classes.
The results demonstrate the efficacy of cGANs as a robust data augmentation
tool, enhancing the performance and reliability of machine learning-based
malware detection systems.
Key words and phrases: malware detection, Generative Adversarial Networks,
machine learning, cybersecurity, data augmentation
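
The balancing step summarized in the abstract can be illustrated with a minimal sketch. It only computes how many synthetic samples a conditional generator would be asked to produce per class, and it does not implement the cGAN or the LightGBM classifier. The function name `synthetic_quota`, the toy label list, and the choice to raise each malware class to the size of the largest class are assumptions made for illustration; the abstract does not state the exact balancing target.

```python
from collections import Counter


def synthetic_quota(labels, keep_fixed=("benign",)):
    """Return, per class, the number of synthetic samples needed to reach
    the size of the largest class. Classes in `keep_fixed` are never
    augmented (assumption: the benign class is left unchanged)."""
    counts = Counter(labels)
    target = max(counts.values())  # assumed balancing target: largest class
    return {
        cls: target - n
        for cls, n in counts.items()
        if cls not in keep_fixed and n < target
    }


# Toy, made-up labels; these are NOT the CICMalDroid2020 class counts.
toy_labels = ["benign"] * 6 + ["adware"] * 4 + ["banking"] * 2 + ["sms"] * 6
quota = synthetic_quota(toy_labels)
print(quota)
```

In a full pipeline, each entry of `quota` would be the number of samples requested from the generator for that class label, and the generated rows would be appended to the training split only, so that evaluation still uses real data.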