October 2025

Self-Hosting LLMs on Google Cloud Run

Tags: GCP, Ollama, Docker

Want your own ChatGPT-like interface without sending data to third parties? Here's how I deployed Ollama with Open WebUI on Google Cloud Run.

Architecture

The setup uses Cloud Run to run and autoscale the containers, Cloud Storage to persist downloaded model weights between instances, and Artifact Registry to host the container images.
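Roughly, the deploy looks like this. The repo name, bucket name, region, and resource sizes below are illustrative placeholders, not my exact configuration:

  # Build the Ollama image and push it to Artifact Registry
  gcloud builds submit \
    --tag us-central1-docker.pkg.dev/$PROJECT_ID/llm/ollama:v1

  # Run it on Cloud Run, mounting a Cloud Storage bucket over
  # Ollama's model directory so downloaded weights persist
  gcloud run deploy ollama \
    --image us-central1-docker.pkg.dev/$PROJECT_ID/llm/ollama:v1 \
    --region us-central1 \
    --port 11434 \
    --memory 8Gi --cpu 4 \
    --add-volume name=models,type=cloud-storage,bucket=llm-models \
    --add-volume-mount volume=models,mount-path=/root/.ollama

  # Open WebUI is a second service pointed at the first;
  # $OLLAMA_SERVICE_URL is the https URL printed by the deploy above
  gcloud run deploy open-webui \
    --image ghcr.io/open-webui/open-webui:main \
    --region us-central1 \
    --port 8080 \
    --set-env-vars OLLAMA_BASE_URL=$OLLAMA_SERVICE_URL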

Why Cloud Run?

  • Pay only when in use (scale to zero)
  • Automatic HTTPS and domain mapping
  • Easy updates with container deployments (all three are sketched below)
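Each point maps to a single command. The service names, tag, and domain here are placeholders:

  # Scale to zero when idle, cap burst capacity
  gcloud run services update ollama \
    --region us-central1 \
    --min-instances 0 --max-instances 3

  # An update is just a redeploy with a new image tag;
  # Cloud Run rolls it out as a new revision
  gcloud run deploy ollama \
    --image us-central1-docker.pkg.dev/$PROJECT_ID/llm/ollama:v2 \
    --region us-central1

  # Map a custom domain; HTTPS certificates are provisioned for you
  # (domain mappings are still a beta gcloud surface)
  gcloud beta run domain-mappings create \
    --service open-webui --domain chat.example.com \
    --region us-central1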

Challenges

The main challenge was model loading time: cold starts can take 30+ seconds for large models, because a fresh instance has to pull weights into memory before it can serve anything. I worked around this by defaulting to a smaller model (Gemma 2B) for quick responses and caching frequently used sessions.
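Ollama can also be told to load a model before the first real request arrives. This warm-up call is a sketch against Ollama's documented /api/generate behavior: a request with no prompt loads the model, and keep_alive of -1 keeps it resident:

  # Run once per instance, e.g. from a startup script:
  # a generate call with no prompt loads gemma:2b into memory,
  # and keep_alive -1 tells Ollama not to unload it afterwards
  curl -s http://localhost:11434/api/generate \
    -d '{"model": "gemma:2b", "keep_alive": -1}'

If a small idle bill is acceptable, setting --min-instances 1 on the Ollama service removes the cold start entirely, at the cost of giving up scale to zero.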


Prashanth Kumar Kadasi

Data Analyst & AI Developer