Self-Hosting LLMs on Google Cloud Run
GCP
Ollama
Docker
Want your own ChatGPT-like interface without sending data to third parties? Here's how I deployed Ollama with Open WebUI on Google Cloud Run.
Architecture
The setup uses Cloud Run to run and autoscale the containers, a Cloud Storage bucket to persist downloaded model weights across instances, and Artifact Registry to store the container images.
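Concretely, the plumbing looks something like the sketch below. PROJECT_ID, the llm-images repository, the llm-models bucket, and the region are all placeholders you'd substitute for your own. The Cloud Storage volume mount needs Cloud Run's second-generation execution environment, and since Cloud Run only deploys images from Artifact Registry (or the legacy Container Registry), the Open WebUI image gets mirrored there first.

```bash
# One-time: create an Artifact Registry repo for the images.
gcloud artifacts repositories create llm-images \
  --repository-format=docker --location=us-central1

# Build and push a custom image based on ollama/ollama (Dockerfile not shown).
docker build -t us-central1-docker.pkg.dev/PROJECT_ID/llm-images/ollama:latest .
docker push us-central1-docker.pkg.dev/PROJECT_ID/llm-images/ollama:latest

# Deploy Ollama, mounting a GCS bucket over its model directory so
# downloaded weights survive instance restarts.
gcloud run deploy ollama \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/llm-images/ollama:latest \
  --region=us-central1 \
  --memory=16Gi --cpu=4 --port=11434 \
  --execution-environment=gen2 \
  --add-volume=name=models,type=cloud-storage,bucket=llm-models \
  --add-volume-mount=volume=models,mount-path=/root/.ollama

# Mirror Open WebUI into Artifact Registry, then deploy it as a second
# service. OLLAMA_URL stands in for the URL printed by the deploy above;
# service-to-service auth between the two is left out of this sketch.
docker pull ghcr.io/open-webui/open-webui:main
docker tag ghcr.io/open-webui/open-webui:main \
  us-central1-docker.pkg.dev/PROJECT_ID/llm-images/open-webui:main
docker push us-central1-docker.pkg.dev/PROJECT_ID/llm-images/open-webui:main

gcloud run deploy open-webui \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/llm-images/open-webui:main \
  --region=us-central1 \
  --memory=2Gi --port=8080 \
  --allow-unauthenticated \
  --set-env-vars=OLLAMA_BASE_URL=OLLAMA_URL
```

One caveat worth knowing: GCS volumes are backed by Cloud Storage FUSE, so reading multi-gigabyte weight files through them is slower than local disk. The mount trades repeated cold-start downloads for persistence, not for speed.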
Why Cloud Run?
- Pay only when in use (scale to zero; see the sketch after this list)
- Automatic HTTPS and domain mapping
- Easy updates with container deployments
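The first two points each map to a single command. A sketch, reusing the services from above (chat.example.com is a placeholder, and domain mappings are still a preview feature in some regions):

```bash
# Scale-to-zero is the default (min-instances=0); stating it explicitly
# documents the intent, and max-instances caps cost under a traffic spike.
gcloud run services update ollama \
  --region=us-central1 \
  --min-instances=0 --max-instances=3

# Map a custom domain to the UI; Cloud Run provisions and renews the
# TLS certificate automatically.
gcloud run domain-mappings create \
  --service=open-webui --domain=chat.example.com --region=us-central1
```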
Challenges
The main challenge was model loading time. Cold starts can take 30+ seconds for large models, since a fresh instance has to download or load the weights before it can answer a single prompt. I mitigated this by using smaller models (Gemma 2B) for quick responses and caching frequently used sessions.
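Pairing the persistent volume with a startup script keeps even that first download a one-time cost. A sketch of what the container's entrypoint could look like (entrypoint.sh is a hypothetical name; gemma:2b is Ollama's tag for the 2B Gemma model):

```bash
#!/usr/bin/env bash
# entrypoint.sh (hypothetical) -- start Ollama, then ensure the small
# default model is present. With /root/.ollama on the GCS volume, the
# pull only ever happens on the very first cold start.
set -eu

# Bind to all interfaces so Cloud Run's front end can reach the server.
export OLLAMA_HOST=0.0.0.0:11434

ollama serve &
SERVER_PID=$!

# Wait until the server answers before issuing any commands.
until ollama list > /dev/null 2>&1; do
  sleep 1
done

# Pull the small default model only if it isn't already on the volume.
ollama list | grep -q "gemma:2b" || ollama pull gemma:2b

wait "$SERVER_PID"
```

If the remaining cold start is still too long, setting --min-instances=1 removes it entirely, at the price of paying for an idle instance around the clock.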