Research Project · KMIT 2025

MalCL

Continual Malware Classification with GAN-Based Generative Replay

A continual-learning approach that uses conditional GANs to combat catastrophic forgetting in malware classification, achieving 72% mean accuracy across 100 malware families.

Mean Accuracy: 72%

Malware Classes: 100

CL Tasks: 11

Above Target: +17.5pp

Project Overview

Malware threats evolve continuously, requiring detection models to learn new malware families while retaining previously acquired knowledge. Traditional machine learning models suffer from catastrophic forgetting: they forget older malware patterns when trained on new data.

MalCL addresses this challenge using conditional GANs to generate synthetic samples of previously learned malware families. These generated samples are combined with new malware data to train a CNN classifier, improving malware classification while reducing the need to store large historical datasets.

📈

Continual Learning

11 sequential tasks with dynamic classifier head expansion

💾

Reduced Forgetting

28.6% forgetting rate compared to 84.3% without replay

📊

EMBER Dataset

100 malware families with 1,000 samples per class
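Dynamic classifier head expansion can be pictured as appending output rows to the final layer whenever a task introduces new families, while leaving the existing rows untouched. The following is a minimal sketch in plain Python, not the project's implementation: `expand_head`, the list-of-lists weight layout, the feature width of 8, and the class counts are all hypothetical.

```python
import random

def expand_head(weights, biases, n_new, in_features, rng):
    """Return (weights, biases) with n_new extra output rows appended.

    Existing rows are reused unchanged, so logits for old classes
    are unaffected by the expansion itself.
    """
    new_rows = [
        [rng.gauss(0.0, 0.01) for _ in range(in_features)]
        for _ in range(n_new)
    ]
    return weights + new_rows, biases + [0.0] * n_new

rng = random.Random(0)
in_features = 8  # hypothetical width of the penultimate layer

# Task 1 introduces 3 classes (illustrative numbers only).
weights, biases = expand_head([], [], 3, in_features, rng)
old_rows = [list(row) for row in weights]

# Task 2 introduces 2 more classes: the head grows from 3 to 5 outputs.
weights, biases = expand_head(weights, biases, 2, in_features, rng)
print(len(weights), len(biases))
```

In a real network the same idea applies to the final linear layer: allocate a larger layer and copy the old weights into its first rows before training on the new task.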

Metric | Paper Method | Our cGAN | Improvement
Mean Accuracy | 41.2% | 72.0% | +30.8pp
Forgetting Rate | 67.1% | 28.6% | -38.5pp
Replay Strategy | Unconditional GAN + L1_C_Mean | Conditional GAN | Simplified

Catastrophic Forgetting in Malware Classification

Malware evolves continuously. When classifiers learn new malware families, they forget previously learned ones, a phenomenon called catastrophic forgetting. Replay-based methods generate synthetic samples of old classes to counteract this.

🔓

01. New malware families emerge

Threat actors continuously develop new malware variants. A classifier must learn these new families without retraining on the full historical corpus.

💥

02. Sequential learning degrades old knowledge

Training on new classes overwrites the model weights that encoded old class boundaries; this is catastrophic forgetting.

🔄

03. Generative replay mitigates forgetting

A GAN generates synthetic samples of old classes. These are replayed during new task training, preserving prior knowledge.

⚠️

04. Class ambiguity remains in unconditional replay

Without class conditioning, generated samples may belong to any class. Assigning incorrect labels actively harms classification.

⚠️ Without Generative Replay

Task 1: 91%
Task 2: 85%
Task 3: 72%
Task 4: 56%
Task 5: 38%
Task 6: 20%
Task 7: 8%

⚠️ Accuracy collapses from 91% at Task 1 to 8% at Task 7 (7 of the 11 tasks shown)

✓ With Generative Replay

Task 1: 92%
Task 2: 89%
Task 3: 88%
Task 4: 85%
Task 5: 82%
Task 6: 79%
Task 7: 76%

✓ Accuracy stays between 76% and 92% across the tasks shown

⚑

Key Finding: Class Ambiguity

The root cause of poor performance with unconditional GANs: unlabeled samples are assigned class labels post-hoc via centroid proximity. At 100 classes, centroids overlap significantly, causing mislabeled replay samples that actively teach wrong associations. Conditional GANs solve this by generating class-labeled samples at generation time.
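To see why post-hoc labeling becomes fragile, here is a small sketch of nearest-centroid labeling with L1 distance. It assumes the labeling step reduces to picking the closest class centroid; the function names and the toy 2-D centroids are illustrative and not taken from the project.

```python
def l1_distance(a, b):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_centroid_label(sample, centroids):
    """Assign the label of the closest centroid (post-hoc labeling)."""
    return min(centroids, key=lambda label: l1_distance(sample, centroids[label]))

# Toy centroids: classes 0 and 1 sit close together, class 2 is far away.
centroids = {
    0: [0.0, 0.0],
    1: [0.2, 0.1],
    2: [5.0, 5.0],
}

# Suppose this sample was meant to resemble class 0 but landed slightly off.
sample_intended_for_class_0 = [0.15, 0.1]

assigned = nearest_centroid_label(sample_intended_for_class_0, centroids)
print(assigned)  # prints 1: the sample is replayed with the wrong label
```

With a conditional generator the class label is an input rather than something inferred afterwards, so this lookup step, and the mislabeling it can cause, does not arise.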

System Architecture

Noise z ~ N(0,1) + Class c (one-hot)
↓
cGAN Generator: generates synthetic feature vectors
↓
Synthetic x̃ (generated samples) + Real Data (Task T, new samples)
↓
CNN Classifier: mixed-batch training on old + new malware families
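The last two stages of the pipeline, generating replay samples for old classes and mixing them with real data from the current task, can be sketched as follows. This is a hedged illustration only: `stub_generator` is a stand-in for a trained cGAN generator, and `build_mixed_batch`, the feature width of 4, and the class IDs are hypothetical.

```python
import random

def stub_generator(class_id, dim, rng):
    """Stand-in for a trained cGAN generator (hypothetical).

    Returns one synthetic feature vector for the requested class.
    """
    return [rng.gauss(float(class_id), 1.0) for _ in range(dim)]

def build_mixed_batch(new_samples, old_classes, replay_per_class, dim, rng):
    """Combine real (features, label) pairs with labeled synthetic replay."""
    replay = [
        (stub_generator(c, dim, rng), c)
        for c in old_classes
        for _ in range(replay_per_class)
    ]
    batch = list(new_samples) + replay
    rng.shuffle(batch)  # interleave old and new classes
    return batch

rng = random.Random(0)
dim = 4
# Six real samples of a new class (label 2) from the current task.
new_samples = [([0.0] * dim, 2) for _ in range(6)]

batch = build_mixed_batch(
    new_samples, old_classes=[0, 1], replay_per_class=3, dim=dim, rng=rng
)
print(len(batch))  # 12
```

Because the generator is conditioned on the class, every replayed vector arrives with its label attached, which is what lets the classifier train on old and new families in the same batch.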

Generator (G)

Description

Generates synthetic malware feature vectors that resemble real samples from a given class.

Inputs

  • Noise vector z ~ N(0,1)
  • Class label c (one-hot, cGAN only)

Outputs

  • Synthetic feature vector x̃ ∈ ℝ²⁵⁶

Implementation Details

3-layer MLP (256 → 512 → 256). BatchNorm + LeakyReLU. Class embedding concatenated to noise in cGAN variant.
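The conditioning step described above, concatenating a class encoding to the noise vector, can be sketched in plain Python. This only builds the generator's input; it does not implement the MLP. The noise width of 16 is a hypothetical value, since the source does not state the noise dimension, and `generator_input` is an illustrative name.

```python
import random

def generator_input(class_id, num_classes, noise_dim, rng):
    """Build the conditional generator input [z ; one_hot(c)]."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]  # z ~ N(0, 1)
    c = [1.0 if i == class_id else 0.0 for i in range(num_classes)]
    return z + c

rng = random.Random(0)
noise_dim = 16     # hypothetical; not given in the source
num_classes = 100  # from the source

g_in = generator_input(class_id=42, num_classes=num_classes,
                       noise_dim=noise_dim, rng=rng)
print(len(g_in))  # 116
```

The same trick applies on the discriminator side, where the class encoding is appended to the feature vector instead of the noise.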

Feature Dimension: 256
Malware Classes: 100
Samples/Class: 1,000
Total Tasks: 11

Experimental Results

All experiments were evaluated on 100 malware classes across 11 sequential continual learning tasks.
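The page gives the totals (100 classes, 11 tasks) but not how classes are divided among tasks. One split that matches those totals is 50 classes in the first task followed by 10 tasks of 5 classes each. That split is an assumption made here for illustration, as are the function name and parameters.

```python
def make_task_splits(class_ids, first_task_size, step):
    """Partition class IDs into one large initial task plus equal increments."""
    tasks = [class_ids[:first_task_size]]
    for start in range(first_task_size, len(class_ids), step):
        tasks.append(class_ids[start:start + step])
    return tasks

# Assumed schedule: 50 initial classes, then 5 new classes per task.
tasks = make_task_splits(list(range(100)), first_task_size=50, step=5)

print(len(tasks))               # 11
print([len(t) for t in tasks])  # [50, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```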

Mean Accuracy by Method

No Replay: 12.4%
Paper GAN: 41.2%
L1-CMean + FML: 48.7%
cGAN Hybrid: 72%

🎯 Paper Target: 54.5%

Forgetting Rate Comparison

No Replay: 84.3%
Paper GAN: 67.1%
L1-CMean + FML: 60.7%
cGAN Hybrid: 28.6%

✓ cGAN achieves 2.3× lower forgetting than the Paper GAN (67.1% vs 28.6%)
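The headline deltas can be reproduced from the reported figures with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones listed on this page.

```python
# Reported figures (percent).
paper_acc, cgan_acc, target_acc = 41.2, 72.0, 54.5
paper_forget, cgan_forget = 67.1, 28.6

acc_gain_pp = round(cgan_acc - paper_acc, 1)             # vs reproduced Paper GAN
forget_change_pp = round(cgan_forget - paper_forget, 1)  # negative is better
above_target_pp = round(cgan_acc - target_acc, 1)        # vs paper target
forget_ratio = round(paper_forget / cgan_forget, 1)      # "x lower forgetting"

print(acc_gain_pp)       # 30.8
print(forget_change_pp)  # -38.5
print(above_target_pp)   # 17.5
print(forget_ratio)      # 2.3
```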

BEST ACCURACY: 72.0% (cGAN Hybrid)
ABOVE TARGET: +17.5pp (vs Paper, 54.5%)
LOWEST FORGETTING: 28.6% (cGAN Method)
IMPROVEMENT OVER BASELINE: 2.4× (Accuracy Gain)
Attribute | Paper (AAAI 2025) | Our cGAN Method
Generator Input | Noise z only | Noise z + class c
Discriminator Input | Feature x only | Feature x + class c
Replay Strategy | Unconditional → filter | Class-conditioned
Filtering Requirement | L1_C_Mean required | None required
Class Ambiguity | High (100 classes) | None
Scalability | Degrades at scale | Scales well
Mean Accuracy | 41.2% (our repro) | 72.0%
Forgetting Rate | 67.1% | 28.6%

Engineering Challenges & Solutions

Hardware

GPU Out-of-Memory

Problem:

Large discriminator exceeded 15GB VRAM on Colab T4

Solution:

Reduced discriminator dimensions + gradient checkpointing

✓ Enabled training on 100 classes

Training

Prohibitively Slow Training

Problem:

Full EMBER dataset made each task epoch take 45+ minutes

Solution:

Stratified subsampling: 1,000 samples per class per task

✓ Reduced per-task time from 45 min to ~8 min
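Stratified subsampling of this kind can be sketched with the standard library alone. This is an illustrative version, not the project's code: `stratified_subsample`, the fixed seed, and the toy label list are assumptions, and classes with fewer than `per_class` samples are simply kept whole.

```python
import random
from collections import Counter, defaultdict

def stratified_subsample(labels, per_class, seed=0):
    """Return sorted indices with at most per_class samples from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    chosen = []
    for y in sorted(by_class):
        idxs = by_class[y]
        k = min(per_class, len(idxs))  # keep small classes intact
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)

# Toy labels: class 0 has 10 samples, class 1 has 3, class 2 has 7.
labels = [0] * 10 + [1] * 3 + [2] * 7
picked = stratified_subsample(labels, per_class=5)

counts = Counter(labels[i] for i in picked)
print(sorted(counts.items()))  # [(0, 5), (1, 3), (2, 5)]
```

For the setting described here, `per_class` would be 1,000.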

Research

Class Ambiguity in Replay

Problem:

Unconditional GAN generates unlabeled samples; L1_C_Mean fails at 100 classes

Solution:

Designed conditional GAN with class embedding concatenation

✓ Mean accuracy improved from 41.2% to 72.0%

Training

GAN Training Instability

Problem:

Standard GAN loss caused mode collapse at >50 classes

Solution:

Switched to WGAN-GP + spectral normalization

✓ Consistent convergence across all 11 tasks
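For reference, the standard WGAN-GP objective, written here in a class-conditional form, is shown below. The conditioning on c is an assumption based on the discriminator taking feature x plus class c; the penalty weight λ is not stated on this page (the original WGAN-GP paper uses λ = 10).

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{\tilde{x} \sim P_g}\!\left[D(\tilde{x}, c)\right]
 - \mathbb{E}_{x \sim P_r}\!\left[D(x, c)\right]
 + \lambda\, \mathbb{E}_{\hat{x}}\!\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}, c) \rVert_2 - 1\right)^2\right], \\
\hat{x} &= \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U[0, 1], \\
\mathcal{L}_G &= -\,\mathbb{E}_{z \sim \mathcal{N}(0, I)}\!\left[D(G(z, c), c)\right].
\end{aligned}
```

The gradient penalty term pushes the critic's gradient norm toward 1 on points interpolated between real and generated samples, which is the usual reason this loss is less prone to mode collapse than the standard GAN loss.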

Research Team

Major Project · Keshav Memorial Institute of Technology (KMIT) · 2025

DY

D. Yashita

Researcher

24BD1A056B

SS

Shesha Sai

Researcher

24BD1A057J

VB

Vasuki Bothkurwar

Researcher

24BD1A057W

AV

Advaith Vuppula

Researcher

24BD1A05K2

TJ

Tanishq Jain

Researcher

24BD1A051T

FB

Dr. Badrinath

Faculty Mentor

KMIT

Development Timeline

7-phase SDLC from initial planning through final evaluation

01

Planning

Week 1

Requirements gathering and project scope definition

02

Design

Week 2–3

System architecture and algorithm design

03

Implementation

Week 3–5

cGAN and classifier implementation

04

Testing & Baseline

Week 5–6

Paper reproduction and baseline evaluation

05

Root Cause Analysis

Week 6–7

Identified class ambiguity issue

06

cGAN Extension

Week 7–9

Conditional GAN design and implementation

07

Final Evaluation

Week 9–10

Complete evaluation and results analysis

INSTITUTION

KMIT

Keshav Memorial Institute of Technology, Hyderabad

BASE PAPER

AAAI 2025

MalCL: Leveraging GAN-Based Generative Replay

CONTRIBUTION

cGAN Replay

Conditional GAN for improved continual learning
