Self-Hosting LLMs on Google Cloud Run
GCP
Ollama
Docker
Want your own ChatGPT-like interface without sending data to third parties? Here's how I deployed Ollama with Open WebUI on Google Cloud Run.
Architecture
The setup uses Cloud Run to run and autoscale the containers, a Cloud Storage bucket to persist downloaded model weights across instances, and Artifact Registry to store the container images.
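Concretely, the plumbing looks something like the sketch below. PROJECT_ID, the llm-images repository, the llm-models bucket, and the region are all placeholders you'd substitute for your own. The Cloud Storage volume mount needs Cloud Run's second-generation execution environment, and since Cloud Run only deploys images from Artifact Registry (or the legacy Container Registry), the Open WebUI image gets mirrored there first.

```bash
# One-time: create an Artifact Registry repo for the images.
gcloud artifacts repositories create llm-images \
  --repository-format=docker --location=us-central1

# Build and push a custom image based on ollama/ollama (Dockerfile not shown).
docker build -t us-central1-docker.pkg.dev/PROJECT_ID/llm-images/ollama:latest .
docker push us-central1-docker.pkg.dev/PROJECT_ID/llm-images/ollama:latest

# Deploy Ollama, mounting a GCS bucket over its model directory so
# downloaded weights survive instance restarts.
gcloud run deploy ollama \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/llm-images/ollama:latest \
  --region=us-central1 \
  --memory=16Gi --cpu=4 --port=11434 \
  --execution-environment=gen2 \
  --add-volume=name=models,type=cloud-storage,bucket=llm-models \
  --add-volume-mount=volume=models,mount-path=/root/.ollama

# Mirror Open WebUI into Artifact Registry, then deploy it as a second
# service. OLLAMA_URL stands in for the URL printed by the deploy above;
# service-to-service auth between the two is left out of this sketch.
docker pull ghcr.io/open-webui/open-webui:main
docker tag ghcr.io/open-webui/open-webui:main \
  us-central1-docker.pkg.dev/PROJECT_ID/llm-images/open-webui:main
docker push us-central1-docker.pkg.dev/PROJECT_ID/llm-images/open-webui:main

gcloud run deploy open-webui \
  --image=us-central1-docker.pkg.dev/PROJECT_ID/llm-images/open-webui:main \
  --region=us-central1 \
  --memory=2Gi --port=8080 \
  --allow-unauthenticated \
  --set-env-vars=OLLAMA_BASE_URL=OLLAMA_URL
```

One caveat worth knowing: GCS volumes are backed by Cloud Storage FUSE, so reading multi-gigabyte weight files through them is slower than local disk. The mount trades repeated cold-start downloads for persistence, not for speed.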
Why Cloud Run?
- Pay only when in use (scale to zero; see the sketch after this list)
- Automatic HTTPS and domain mapping
- Easy updates with container deployments
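The first two points each map to a single command. A sketch, reusing the services from above (chat.example.com is a placeholder, and domain mappings are still a preview feature in some regions):

```bash
# Scale-to-zero is the default (min-instances=0); stating it explicitly
# documents the intent, and max-instances caps cost under a traffic spike.
gcloud run services update ollama \
  --region=us-central1 \
  --min-instances=0 --max-instances=3

# Map a custom domain to the UI; Cloud Run provisions and renews the
# TLS certificate automatically.
gcloud run domain-mappings create \
  --service=open-webui --domain=chat.example.com --region=us-central1
```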
Challenges
The main challenge was model loading time. Cold starts can take 30+ seconds for large models, since a fresh instance has to download or load the weights before it can answer a single prompt. I mitigated this by using smaller models (Gemma 2B) for quick responses and caching frequently used sessions.
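Pairing the persistent volume with a startup script keeps even that first download a one-time cost. A sketch of what the container's entrypoint could look like (entrypoint.sh is a hypothetical name; gemma:2b is Ollama's tag for the 2B Gemma model):

```bash
#!/usr/bin/env bash
# entrypoint.sh (hypothetical) -- start Ollama, then ensure the small
# default model is present. With /root/.ollama on the GCS volume, the
# pull only ever happens on the very first cold start.
set -eu

# Bind to all interfaces so Cloud Run's front end can reach the server.
export OLLAMA_HOST=0.0.0.0:11434

ollama serve &
SERVER_PID=$!

# Wait until the server answers before issuing any commands.
until ollama list > /dev/null 2>&1; do
  sleep 1
done

# Pull the small default model only if it isn't already on the volume.
ollama list | grep -q "gemma:2b" || ollama pull gemma:2b

wait "$SERVER_PID"
```

If the remaining cold start is still too long, setting --min-instances=1 removes it entirely, at the price of paying for an idle instance around the clock.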