Fine-Tuning Drug Discovery LLMs: 5 Hours, 30 Commits, AMD GPU Struggles
This is the story of building drug discovery AI models over 5 intense hours: 30+ GitHub commits and a hard lesson in why even the best AI coding assistants struggle with AMD GPUs.
🎯 The Goal
Build text classification models that predict drug approval likelihood from SMILES molecular strings. Not a chatbot - a specialized binary classifier for pharma R&D.
🖥️ The Setup
- Local: RTX 3050 6GB - for ChemBERTa training
- Cloud: AMD MI300X 192GB - for large model training
- AI Assistant: Google Antigravity + Claude Opus 4.5
📊 Local Training (RTX 3050)
Started with ChemBERTa - a chemistry-specialized BERT model. With only 6GB VRAM, I used gradient checkpointing and small batch sizes. Training worked smoothly on NVIDIA - the CUDA ecosystem is mature and well-supported.
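Here's roughly what that setup looked like. This is a minimal sketch, not my actual training script - the checkpoint name, hyperparameters, and the two-molecule toy dataset are all illustrative:

```python
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"  # illustrative ChemBERTa checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.gradient_checkpointing_enable()  # recompute activations: slower, but fits 6GB

# Toy stand-in for the real SMILES dataset (label 1 = approved)
train_ds = Dataset.from_dict({
    "text": ["CC(=O)OC1=CC=CC=C1C(=O)O", "C1=CC=CC=C1"],  # aspirin, benzene
    "label": [1, 0],
}).map(lambda b: tokenizer(b["text"], truncation=True, max_length=128,
                           padding="max_length"), batched=True)

args = TrainingArguments(
    output_dir="chemberta-drug-approval",
    per_device_train_batch_size=8,   # small batches to fit 6GB VRAM
    gradient_accumulation_steps=4,   # effective batch size of 32
    fp16=True,                       # mixed precision halves activation memory
    num_train_epochs=3,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```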
☁️ Moving to Cloud: AMD MI300X
For larger models, I needed serious GPU power. Why AMD? They offer GPU credits through their developer program - thanks to AMD for the support that made this project possible!
With 192GB of HBM3 memory on the MI300X, the plan was to train GPT-OSS-120B or Llama-3.1-70B for better accuracy.
📊 Model Memory Requirements
| Model | Parameters | Min VRAM | Status |
|---|---|---|---|
| ChemBERTa | 85M | 4GB | ✅ Works on RTX 3050 |
| Qwen 2.5 14B | 14B | 35GB | ✅ Works on MI300X |
| Llama 3.1 70B | 70B | 140GB | ❌ Training crashed |
| GPT-OSS 120B | 120B | 180GB | ❌ OOM even with 4-bit |
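The Min VRAM figures track the usual back-of-envelope rule: parameter count times bytes per parameter, weights only. A quick sanity check (estimates, not measurements - real training adds activations, gradients, and optimizer state on top):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only estimate; treat results as lower bounds for training."""
    return params_billions * bytes_per_param

for name, p in [("ChemBERTa", 0.085), ("Qwen 2.5 14B", 14.0),
                ("Llama 3.1 70B", 70.0), ("GPT-OSS 120B", 120.0)]:
    bf16 = weight_memory_gb(p, 2.0)   # bf16/fp16: 2 bytes per parameter
    nf4 = weight_memory_gb(p, 0.5)    # 4-bit: ~0.5 bytes per parameter
    print(f"{name:>14}: ~{bf16:6.1f} GB bf16, ~{nf4:5.1f} GB 4-bit")
```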
💥 The AMD GPU Challenge
This is where things got interesting. Even Claude Opus 4.5 - arguably the best code generation model - struggled to produce working code for AMD ROCm.
Issues encountered:
- Memory allocation errors despite having 192GB VRAM
- Device placement conflicts with HuggingFace Trainer
- Quantization libraries (bitsandbytes) behaving differently on ROCm
- Model loading timeouts and CUDA-specific code paths (see the backend check below)
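One gotcha worth spelling out: on ROCm builds of PyTorch, HIP masquerades as the "cuda" device, so a naive torch.cuda.is_available() check passes on the MI300X and CUDA-specific paths get taken anyway. A small branch check like this (my own sketch, not a library API) helps:

```python
import torch

def gpu_backend() -> str:
    # ROCm builds of PyTorch expose torch.version.hip; it is None on
    # genuine CUDA builds. torch.cuda.is_available() is True on the
    # MI300X too, because ROCm presents itself through the cuda API.
    if torch.version.hip is not None:
        return "rocm"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(gpu_backend())
```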
🔄 The Model Journey: 120B → 14B
The original plan was GPT-OSS-120B. Reality hit hard:
- 120B: Out of memory even with 4-bit quantization
- 70B: Loaded but training crashed
- 14B (Qwen 2.5): Finally worked, with 4-bit NF4 quantization (config sketched below)
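The config that finally fit looked roughly like this. The exact knobs and the base checkpoint name are approximations, and note that bitsandbytes support on ROCm is newer and version-sensitive, which was part of the pain above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B",        # assumed base checkpoint
    quantization_config=bnb,
    device_map="auto",         # let accelerate place weights on the MI300X
)
```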
🔧 Key Fixes Required
- Custom ModelWithClassifier wrapper - base models needed a classification head bolted on
- DeviceMapTrainer - a custom Trainer that skips device movement for device_map models
- NaN handling - models intermittently produced NaN logits, scrubbed with torch.nan_to_num()
- Format detection - the evaluate script had to distinguish HF checkpoints from raw PyTorch ones (all four fixes sketched below)
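Sketches of all four fixes, reconstructed from memory rather than copied from the repo - treat the names and details as approximate:

```python
import torch
import torch.nn as nn
from pathlib import Path
from transformers import Trainer

class ModelWithClassifier(nn.Module):
    """Fix 1: bolt a binary classification head onto a base causal LM."""
    def __init__(self, base_model, num_labels: int = 2):
        super().__init__()
        self.base = base_model
        self.head = nn.Linear(base_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        out = self.base(input_ids=input_ids, attention_mask=attention_mask,
                        output_hidden_states=True)
        hidden = out.hidden_states[-1]                    # (batch, seq, dim)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
        # Fix 3: runs produced occasional NaN logits; scrub them
        logits = torch.nan_to_num(self.head(pooled))
        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(logits, labels)
        return {"loss": loss, "logits": logits}

class DeviceMapTrainer(Trainer):
    """Fix 2: a device_map="auto" model is already sharded, so don't let
    Trainer try to .to() it onto a single device."""
    def _move_model_to_device(self, model, device):
        pass  # intentional no-op

def checkpoint_format(path: str) -> str:
    """Fix 4: HF checkpoints ship a config.json; raw torch.save() ones don't."""
    return "hf" if (Path(path) / "config.json").exists() else "pytorch"
```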
📈 Final Models on HuggingFace
| Model | GPU | HuggingFace |
|---|---|---|
| ChemBERTa | RTX 3050 (Local) | kprsnt/drug-discovery-chemberta |
| Qwen 2.5 14B | AMD MI300X (Cloud) | kprsnt/drug-discovery-qwen-14b |
💡 Key Takeaways
- AMD GPUs need more AI tooling love. NVIDIA's ecosystem is years ahead.
- Even the best AI (Opus 4.5) isn't optimized for AMD. Most of its training data is CUDA-focused.
- 30 commits in 5 hours - iterative debugging is essential for new hardware.
- Start smaller. 14B worked where 120B failed.
- Antigravity is amazing - the agentic workflow made rapid iteration possible.
🙏 Credits & Acknowledgments
- AMD - the developer GPU credits that made MI300X access possible
- Google Antigravity - Agentic AI coding workflow
- Claude Opus 4.5 - Code generation (despite AMD struggles!)
- HuggingFace - Model hosting and Transformers library
🔮 Future Plans
These are text classification models. Next step: train a chat model that can explain drug predictions and answer pharma questions.