Fine-Tuning Drug Discovery LLMs: 5 Hours, 30 Commits, AMD GPU Struggles
This is the story of building drug discovery AI models over 5 intense hours: 30+ GitHub commits and a hard lesson in why even the best AI coding assistants struggle with AMD GPUs.
🎯 The Goal
Build text classification models that predict drug approval likelihood from SMILES molecular strings. Not a chatbot - a specialized binary classifier for pharma R&D.
🖥️ The Setup
- Local: RTX 3050 6GB - for ChemBERTa training
- Cloud: AMD MI300X 192GB - for large model training
- AI Assistant: Google Antigravity + Claude Opus 4.5
📊 Local Training (RTX 3050)
Started with ChemBERTa - a chemistry-specialized BERT model. With only 6GB VRAM, I used gradient checkpointing and small batch sizes. Training worked smoothly on NVIDIA - the CUDA ecosystem is mature and well-supported.
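Here's roughly what that setup looked like. This is a minimal sketch, not my actual training script - the checkpoint name, hyperparameters, and the two-molecule toy dataset are all illustrative:

```python
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"  # illustrative ChemBERTa checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.gradient_checkpointing_enable()  # recompute activations: slower, but fits 6GB

# Toy stand-in for the real SMILES dataset (label 1 = approved)
train_ds = Dataset.from_dict({
    "text": ["CC(=O)OC1=CC=CC=C1C(=O)O", "C1=CC=CC=C1"],  # aspirin, benzene
    "label": [1, 0],
}).map(lambda b: tokenizer(b["text"], truncation=True, max_length=128,
                           padding="max_length"), batched=True)

args = TrainingArguments(
    output_dir="chemberta-drug-approval",
    per_device_train_batch_size=8,   # small batches to fit 6GB VRAM
    gradient_accumulation_steps=4,   # effective batch size of 32
    fp16=True,                       # mixed precision halves activation memory
    num_train_epochs=3,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```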
☁️ Moving to Cloud: AMD MI300X
For larger models, I needed serious GPU power. Why AMD? They offer GPU credits through their developer program - thanks to AMD for the support that made this project possible!
With 192GB of HBM3 memory on the MI300X, the plan was to train GPT-OSS-120B or Llama-3.1-70B for better accuracy.
📊 Model Memory Requirements
| Model | Parameters | Min VRAM | Status |
|---|---|---|---|
| ChemBERTa | 85M | 4GB | ✅ Works on RTX 3050 |
| Qwen 2.5 14B | 14B | 35GB | ✅ Works on MI300X |
| Llama 3.1 70B | 70B | 140GB | ❌ Training crashed |
| GPT-OSS 120B | 120B | 180GB | ❌ OOM even with 4-bit |
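The Min VRAM figures track the usual back-of-envelope rule: parameter count times bytes per parameter, weights only. A quick sanity check (estimates, not measurements - real training adds activations, gradients, and optimizer state on top):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only estimate; treat results as lower bounds for training."""
    return params_billions * bytes_per_param

for name, p in [("ChemBERTa", 0.085), ("Qwen 2.5 14B", 14.0),
                ("Llama 3.1 70B", 70.0), ("GPT-OSS 120B", 120.0)]:
    bf16 = weight_memory_gb(p, 2.0)   # bf16/fp16: 2 bytes per parameter
    nf4 = weight_memory_gb(p, 0.5)    # 4-bit: ~0.5 bytes per parameter
    print(f"{name:>14}: ~{bf16:6.1f} GB bf16, ~{nf4:5.1f} GB 4-bit")
```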
💥 The AMD GPU Challenge
This is where things got interesting. Even Claude Opus 4.5 - arguably the best code generation model - struggled to produce working code for AMD ROCm.
Issues encountered:
- Memory allocation errors despite having 192GB VRAM
- Device placement conflicts with HuggingFace Trainer
- Quantization libraries (bitsandbytes) behaving differently on ROCm
- Model loading timeouts and CUDA-specific code paths (see the backend check below)
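One gotcha worth spelling out: on ROCm builds of PyTorch, HIP masquerades as the "cuda" device, so a naive torch.cuda.is_available() check passes on the MI300X and CUDA-specific paths get taken anyway. A small branch check like this (my own sketch, not a library API) helps:

```python
import torch

def gpu_backend() -> str:
    # ROCm builds of PyTorch expose torch.version.hip; it is None on
    # genuine CUDA builds. torch.cuda.is_available() is True on the
    # MI300X too, because ROCm presents itself through the cuda API.
    if torch.version.hip is not None:
        return "rocm"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(gpu_backend())
```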
🔄 The Model Journey: 120B → 14B
The original plan was GPT-OSS-120B. Reality hit hard:
- 120B: Out of memory even with 4-bit quantization
- 70B: Loaded but training crashed
- 14B (Qwen 2.5): Finally worked, with 4-bit NF4 quantization (config sketched below)
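The config that finally fit looked roughly like this. The exact knobs and the base checkpoint name are approximations, and note that bitsandbytes support on ROCm is newer and version-sensitive, which was part of the pain above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B",        # assumed base checkpoint
    quantization_config=bnb,
    device_map="auto",         # let accelerate place weights on the MI300X
)
```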
🔧 Key Fixes Required
- Custom ModelWithClassifier wrapper - base models needed a classification head bolted on
- DeviceMapTrainer - a custom Trainer that skips device movement for device_map models
- NaN handling - models intermittently produced NaN logits, scrubbed with torch.nan_to_num()
- Format detection - the evaluate script had to distinguish HF checkpoints from raw PyTorch ones (all four fixes sketched below)
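Sketches of all four fixes, reconstructed from memory rather than copied from the repo - treat the names and details as approximate:

```python
import torch
import torch.nn as nn
from pathlib import Path
from transformers import Trainer

class ModelWithClassifier(nn.Module):
    """Fix 1: bolt a binary classification head onto a base causal LM."""
    def __init__(self, base_model, num_labels: int = 2):
        super().__init__()
        self.base = base_model
        self.head = nn.Linear(base_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        out = self.base(input_ids=input_ids, attention_mask=attention_mask,
                        output_hidden_states=True)
        hidden = out.hidden_states[-1]                    # (batch, seq, dim)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
        # Fix 3: runs produced occasional NaN logits; scrub them
        logits = torch.nan_to_num(self.head(pooled))
        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(logits, labels)
        return {"loss": loss, "logits": logits}

class DeviceMapTrainer(Trainer):
    """Fix 2: a device_map="auto" model is already sharded, so don't let
    Trainer try to .to() it onto a single device."""
    def _move_model_to_device(self, model, device):
        pass  # intentional no-op

def checkpoint_format(path: str) -> str:
    """Fix 4: HF checkpoints ship a config.json; raw torch.save() ones don't."""
    return "hf" if (Path(path) / "config.json").exists() else "pytorch"
```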
📈 Final Models on HuggingFace
| Model | GPU | HuggingFace |
|---|---|---|
| ChemBERTa | RTX 3050 (Local) | kprsnt/drug-discovery-chemberta |
| Qwen 2.5 14B | AMD MI300X (Cloud) | kprsnt/drug-discovery-qwen-14b |
💡 Key Takeaways
- AMD GPUs need more AI tooling love. NVIDIA's ecosystem is years ahead.
- Even the best AI (Opus 4.5) isn't optimized for AMD. Most of its training data is CUDA-focused.
- 30 commits in 5 hours - iterative debugging is essential for new hardware.
- Start smaller. 14B worked where 120B failed.
- Antigravity is amazing - the agentic workflow made rapid iteration possible.
🙏 Credits & Acknowledgments
- AMD - the developer GPU credits that made MI300X access possible
- Google Antigravity - Agentic AI coding workflow
- Claude Opus 4.5 - Code generation (despite AMD struggles!)
- HuggingFace - Model hosting and Transformers library
🔮 Future Plans
These are text classification models. Next step: train a chat model that can explain drug predictions and answer pharma questions.