Most cloud comparison articles recycle the same vague advice: "AWS has the most services, Azure integrates with Microsoft, GCP is good for data." That is not useful when you are a startup founder staring at three pricing calculators trying to figure out where your runway goes furthest.
This post is a technical breakdown based on actual workloads, real pricing, and architecture decisions we have made across multiple startup engagements. We will cover compute, databases, serverless, Kubernetes, AI/ML, networking costs, compliance, and developer experience — then walk through a detailed case study with a real cost comparison.
Every price cited here is approximate and based on publicly available 2026 pricing for us-east-1 (AWS), East US (Azure), and us-central1 (GCP). Prices change. Always verify against the current pricing pages before making decisions.
1. Compute: EC2/Fargate vs Azure VMs/Container Apps vs GCE/Cloud Run
On-Demand Virtual Machines
For a general-purpose instance with 4 vCPUs and 16 GB RAM:
| Provider | Instance Type | On-Demand $/hr | Monthly (730 hrs) |
|---|---|---|---|
| AWS | m7i.xlarge | $0.192 | ~$140 |
| Azure | Standard_D4s_v5 | $0.192 | ~$140 |
| GCP | e2-standard-4 | $0.134 | ~$98 |
GCP's E2 instances consistently come in 20–30% cheaper than equivalent AWS and Azure general-purpose VMs. The catch: E2 achieves this through dynamic resource management, and its smallest shapes (e2-micro through e2-medium) share physical cores, similar to AWS's burstable T-series. For sustained CPU workloads, compare against GCP's N2 or C3 series instead, which price closer to AWS and Azure.
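As a sanity check, the monthly figures in the table are just hourly rate × 730 hours. A small helper makes the comparison explicit (rates are the approximate on-demand prices from the table above, not authoritative — verify against current pricing pages):

```python
# Approximate on-demand hourly rates from the table above.
RATES = {
    "aws_m7i_xlarge": 0.192,
    "azure_d4s_v5": 0.192,
    "gcp_e2_standard_4": 0.134,
}

HOURS_PER_MONTH = 730  # the conventional 730-hour billing month

def monthly_cost(hourly_rate: float) -> float:
    """On-demand monthly cost for one always-on instance."""
    return hourly_rate * HOURS_PER_MONTH

for name, rate in RATES.items():
    print(f"{name}: ${monthly_cost(rate):.0f}/month")

# GCP E2 discount relative to the AWS/Azure rate
savings = 1 - RATES["gcp_e2_standard_4"] / RATES["aws_m7i_xlarge"]
print(f"E2 discount: {savings:.0%}")
```

At these rates the E2 discount works out to roughly 30%, the top of the range quoted above.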
Spot and Preemptible Instances
All three providers offer discounted compute that can be reclaimed:
- AWS Spot Instances: Up to 90% discount. Prices fluctuate based on supply and demand. You get a 2-minute warning before termination. Spot pricing varies by instance type and AZ — some instance types see frequent interruptions.
- Azure Spot VMs: Similar discount range (up to 90%). You set a max price, and the VM is evicted when the market price exceeds it or Azure needs capacity. 30-second eviction notice.
- GCP Spot VMs (replaced Preemptible): Up to 91% discount. Fixed discount — no bidding. VMs can be reclaimed at any time with a 30-second warning. The maximum lifetime has been removed (Preemptible had a 24-hour cap; Spot VMs do not).
For batch processing and stateless workloads, GCP Spot VMs are the simplest to reason about since the discount is fixed rather than market-driven.
Container-Native Compute
This is where things get interesting for startups that want to skip VM management:
- AWS Fargate: Serverless containers for ECS/EKS. Priced per vCPU-hour ($0.04048) and per GB-hour ($0.004445). No cluster management, but you pay a premium over EC2.
- Azure Container Apps: Built on Kubernetes (KEDA + Envoy). The Consumption plan charges per vCPU-second ($0.000024) and per GB-second ($0.000003). Includes scale-to-zero.
- GCP Cloud Run: Fully managed serverless containers. Per vCPU-second ($0.000024) and per GB-second ($0.0000025). Scale-to-zero included. Supports request-based and instance-based billing.
For request-driven microservices with variable traffic, Cloud Run and Azure Container Apps are materially cheaper than Fargate because they scale to zero; a Fargate service keeps at least one task running (and billing) even when traffic drops to nothing.
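To make the scale-to-zero difference concrete, here is a rough sketch comparing an always-on Fargate task against request-based Cloud Run billing. The service shape (0.25 vCPU / 0.5 GB) and the 10% active-time fraction are hypothetical; the per-unit rates are the ones quoted in the bullets above, and Cloud Run's small per-request fee is ignored:

```python
# Per-unit rates quoted above — verify against current pricing pages.
FARGATE_VCPU_HR, FARGATE_GB_HR = 0.04048, 0.004445
RUN_VCPU_S, RUN_GB_S = 0.000024, 0.0000025

VCPU, MEM_GB = 0.25, 0.5   # hypothetical small service
HOURS = 730
ACTIVE_FRACTION = 0.10      # fraction of the month actually serving requests

# Fargate bills the task for every hour it exists, busy or idle.
fargate = HOURS * (VCPU * FARGATE_VCPU_HR + MEM_GB * FARGATE_GB_HR)

# Request-based Cloud Run bills only while requests are being processed.
active_s = HOURS * 3600 * ACTIVE_FRACTION
cloud_run = active_s * (VCPU * RUN_VCPU_S + MEM_GB * RUN_GB_S)

print(f"Fargate (always-on):      ${fargate:.2f}/month")
print(f"Cloud Run (scale-to-zero): ${cloud_run:.2f}/month")
```

Under these assumptions the idle-heavy service costs several times more on Fargate, which is exactly the pattern that shows up in the case study later in this post.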
Quick Cloud Run deployment:
# Build and deploy a container to Cloud Run
gcloud run deploy my-service \
--source . \
--region us-central1 \
--allow-unauthenticated \
--min-instances 0 \
--max-instances 10 \
--memory 512Mi \
--cpu 1
2. Database Services
Managed PostgreSQL
Most startups should start with PostgreSQL. Here is what each provider charges for a managed instance (4 vCPUs, 16 GB RAM, 200 GB storage):
| Provider | Service | Monthly Estimate |
|---|---|---|
| AWS | RDS PostgreSQL (db.m7g.xlarge) | ~$280 + $23 storage |
| Azure | Azure Database for PostgreSQL Flexible (D4s_v3) | ~$260 + $23 storage |
| GCP | Cloud SQL PostgreSQL (db-custom-4-16384) | ~$245 + $34 storage |
The base compute costs are within 15% of each other. Storage pricing differs — AWS and Azure charge ~$0.115/GB/month for general purpose SSD, GCP charges ~$0.170/GB/month. For small databases this is noise; at multi-TB scale it matters.
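The "at multi-TB scale it matters" claim is easy to quantify with the per-GB rates just quoted:

```python
# Approximate per-GB/month SSD storage rates quoted above.
AWS_AZURE_GB = 0.115
GCP_GB = 0.170

def storage_delta(size_gb: int) -> float:
    """Extra monthly cost of GCP Cloud SQL storage vs AWS/Azure at a given size."""
    return size_gb * (GCP_GB - AWS_AZURE_GB)

for size in (200, 2_000, 10_000):  # 200 GB, 2 TB, 10 TB
    print(f"{size:>6} GB: GCP costs ~${storage_delta(size):,.0f}/month more")
```

At 200 GB the gap is about $11/month; at 10 TB it is over $500/month, which is when storage pricing starts to influence provider choice.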
Beyond Standard PostgreSQL
This is where the providers diverge significantly:
AWS Aurora PostgreSQL: Drop-in PostgreSQL compatible with a custom storage engine. 3x throughput over standard PostgreSQL (AWS's claim — real-world gains vary by workload). Storage auto-scales. Aurora Serverless v2 scales compute in 0.5 ACU increments, useful for variable workloads. Pricing starts at $0.12/ACU-hour.
Azure Cosmos DB: Globally distributed, multi-model database. Not a PostgreSQL replacement — it is a different paradigm. Priced in Request Units (RU/s). 400 RU/s (minimum) costs ~$23/month. Gets expensive fast at scale: 10,000 RU/s costs ~$580/month. Azure Cosmos DB for PostgreSQL (GA since late 2022) is a separate, Citus-based offering: genuine distributed PostgreSQL under the Cosmos DB brand, not the RU-based core engine.
GCP AlloyDB: PostgreSQL compatible, claims 4x throughput over standard PostgreSQL and 100x faster analytical queries through a columnar engine. Priced at ~$0.1386/vCPU-hour for primary instances. Compelling for mixed OLTP/OLAP workloads where you would otherwise need a separate analytics database.
GCP Spanner: Globally consistent, horizontally scalable relational database. Starts at $0.90/node-hour (~$657/month for one node). Only consider this if you genuinely need global strong consistency at scale — for most startups, it is overkill.
Decision Framework
- Default choice: Managed PostgreSQL on any provider. Portable, well-understood, enormous ecosystem.
- Need auto-scaling compute: Aurora Serverless v2 (AWS) or AlloyDB with read pool autoscaling (GCP).
- Multi-region strong consistency: Spanner (GCP) or Cosmos DB (Azure). Both are expensive.
- Mixed OLTP + analytics: AlloyDB (GCP) avoids needing a separate data warehouse for moderate analytical queries.
3. Serverless Functions
Lambda vs Azure Functions vs Cloud Functions
| Metric | AWS Lambda | Azure Functions | GCP Cloud Functions |
|---|---|---|---|
| Price per 1M invocations | $0.20 | $0.20 | $0.40 |
| Price per GB-second | $0.0000166667 | $0.000016 | $0.0000025 (gen2) |
| Free tier (monthly) | 1M requests, 400K GB-s | 1M requests, 400K GB-s | 2M requests, 400K GB-s |
| Max execution time | 15 min | 10 min (Consumption) | 60 min (HTTP), 9 min (event) |
| Max memory | 10,240 MB | 1,536 MB (Consumption) | 32,768 MB (gen2) |
| Cold start (Node.js) | 100–300ms | 200–500ms | 100–400ms |
| Cold start (Java) | 1–5s | 2–8s | 1–6s |
Cold start numbers are indicative — they depend on package size, runtime, VPC configuration, and whether provisioned concurrency is enabled. AWS Lambda with SnapStart (Java) brings cold starts down to ~200ms. Azure Functions on the Premium plan avoids most cold starts through pre-warmed instances, but costs significantly more.
For cost at scale, consider a workload doing 50M invocations/month with an average 256 MB memory and 200ms duration:
AWS Lambda:
Invocations: 50M × $0.20/1M = $10
Compute: 50M × 0.256 GB × 0.2s = 2,560,000 GB-s × $0.0000166667 = $42.67
Total: ~$53/month
Azure Functions:
Invocations: 50M × $0.20/1M = $10
Compute: 2,560,000 GB-s × $0.000016 = $40.96
Total: ~$51/month
GCP Cloud Functions (gen2):
Invocations: 50M × $0.40/1M = $20
Memory: 50M × 0.256 GB × 0.2s = 2,560,000 GB-s × $0.0000025 = $6.40
vCPU: 50M × 1 vCPU × 0.2s = 10,000,000 vCPU-s × $0.0000240 = $240
Total: ~$266/month
GCP Cloud Functions gen2 is significantly more expensive for this workload because it bills vCPU and memory separately, and a full vCPU is allocated even for a small 256 MB function. The comparison flips for memory-heavy functions: GCP's memory rate is roughly 6–7x cheaper per GB-second than Lambda's bundled rate, so a high-memory function with modest CPU needs can come out cheaper on GCP.
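The Lambda and Azure Functions arithmetic above generalizes to a small cost model for providers that bill a single bundled GB-second rate (rates as quoted in the table; free tiers ignored for simplicity):

```python
def bundled_serverless_cost(invocations: int, mem_gb: float, dur_s: float,
                            per_million: float, per_gb_s: float) -> float:
    """Monthly cost for providers that bill one bundled GB-second rate
    (AWS Lambda, Azure Functions). Free tiers are ignored."""
    request_cost = invocations / 1_000_000 * per_million
    compute_cost = invocations * mem_gb * dur_s * per_gb_s
    return request_cost + compute_cost

# The workload from the text: 50M invocations/month, 256 MB, 200 ms.
aws = bundled_serverless_cost(50_000_000, 0.256, 0.2, 0.20, 0.0000166667)
azure = bundled_serverless_cost(50_000_000, 0.256, 0.2, 0.20, 0.000016)
print(f"AWS Lambda: ${aws:.2f}   Azure Functions: ${azure:.2f}")
```

Plugging in your own invocation count, memory size, and duration is usually more informative than any calculator default.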
4. Startup Credit Programs
This is often the single biggest factor in a startup's initial cloud choice.
| Program | Credits | Duration | Qualification |
|---|---|---|---|
| AWS Activate | Up to $100K | 1–2 years | Must be associated with an approved accelerator, incubator, or VC. Self-funded startups get the Founders tier ($1K credits). |
| Microsoft for Startups (Founders Hub) | Up to $150K | 1 year | Open application. No VC requirement. Tiered — most startups start at $1K–$5K and gain access to more by hitting milestones. |
| Google for Startups Cloud | Up to $200K | 2 years | Must be affiliated with a partner VC, accelerator, or apply directly. Also includes $2,500 in Firebase and Google Maps credits. |
The fine print that matters:
- AWS Activate's $100K tier is accessible primarily through their network of approved accelerators and VCs. If you are not backed by a recognized fund, expect the $1K Founders tier. Some startups qualify for $10K–$25K through regional programs.
- Azure's Founders Hub is the most accessible — any startup can apply regardless of funding status. The $150K figure is the maximum after reaching all tiers, which requires demonstrating product traction.
- Google's program gives the highest maximum credits but qualification is stricter. The $200K tier typically requires Series A or later with an approved VC partner. Seed-stage startups through the general application usually receive $2K–$10K.
Practical advice: Apply to all three. Use credits on the provider whose services fit your architecture, not the other way around. Building your entire stack to match a credit program is a form of lock-in.
5. Managed Kubernetes: EKS vs AKS vs GKE
| Feature | EKS | AKS | GKE |
|---|---|---|---|
| Control plane cost | $0.10/hr (~$73/month) | Free (standard tier) | $0.10/hr (~$73/month); one zonal or Autopilot cluster covered by the free tier |
| Node auto-provisioning | Karpenter | Cluster Autoscaler; Karpenter-based node autoprovisioning | GKE Autopilot (built-in) |
| Max pods per node | 110 (default VPC CNI) | 250 | 110 (standard), 256 (GKE Dataplane V2) |
| Managed node updates | Managed node groups | Auto-upgrade (default) | Auto-upgrade + surge upgrades |
| Service mesh | App Mesh (deprecated) → use Istio | Open Service Mesh (deprecated) → use Istio | Anthos Service Mesh (managed Istio) |
| GPU support | Yes (P4, P5 instances) | Yes (NC, ND series) | Yes (T4, A100, H100 via node pools) |
The cost difference is real. EKS and GKE each charge ~$73/month per cluster for the control plane before you run a single pod (GKE's free tier waives it for one zonal or Autopilot cluster); AKS's free tier does not charge for the control plane at all. For a startup running 3 small services, a fixed $73/month per cluster is non-trivial.
GKE Autopilot deserves specific mention: it manages nodes entirely, billing per pod resource request rather than per node. For small, variable workloads this removes the bin-packing problem and can reduce costs significantly:
# Create a GKE Autopilot cluster
gcloud container clusters create-auto my-cluster \
--region us-central1 \
--release-channel regular
# Compare: Create an EKS cluster (requires eksctl)
eksctl create cluster \
--name my-cluster \
--region us-east-1 \
--nodegroup-name workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 1 \
--nodes-max 5
For startups with fewer than 10 services, consider whether you need Kubernetes at all. Cloud Run, Azure Container Apps, or Fargate handle most microservice architectures without cluster management overhead.
6. AI/ML Services
Model Training
| Provider | Service | GPU Instance (per hr) | Managed Training Job |
|---|---|---|---|
| AWS | SageMaker | $3.825 (ml.g5.xlarge, 1x A10G) | ~$4.59/hr (20% markup) |
| Azure | Azure ML | $3.67 (NC6s_v3, 1x V100) | Compute cost only (no markup) |
| GCP | Vertex AI | $3.22 (n1-standard-8 + 1x T4) | ~$3.86/hr (20% markup) |
SageMaker and Vertex AI charge a markup over raw compute for managed training jobs. Azure ML does not — you pay the underlying compute cost. However, Azure ML's experiment tracking and pipeline tooling require more manual configuration.
Inference
For serving a custom image classification model (batch + real-time):
- SageMaker Inference: Real-time endpoints start at $0.0576/hr (ml.t2.medium). Serverless inference is available but has cold starts. Multi-model endpoints reduce costs when serving multiple models.
- Azure ML Online Endpoints: Priced at VM compute cost. Supports managed online endpoints with autoscaling. No markup over VM pricing.
- Vertex AI Prediction: $0.0350/hr (n1-standard-2) for online prediction. Batch prediction charged per node-hour. Supports autoscaling to zero on custom containers.
For startups doing inference, Vertex AI's scale-to-zero on custom prediction containers is a meaningful cost saver during development when traffic is sporadic.
Pre-built APIs
For common ML tasks (vision, NLP, translation), all three offer pre-built APIs priced per request:
# Example: GCP Vision API — classify an image
gcloud ml vision detect-labels gs://my-bucket/image.jpg
# Example: AWS Rekognition — detect labels
aws rekognition detect-labels \
--image '{"S3Object":{"Bucket":"my-bucket","Name":"image.jpg"}}' \
--max-labels 10
Pricing is comparable across providers for pre-built APIs (~$1–$1.50 per 1,000 images for label detection). The differentiator is accuracy for your specific use case — run a benchmark on your actual data before committing.
7. Networking Costs: The Hidden Budget Killer
Data egress is where cloud bills quietly balloon. Ingress is free on all three providers. Egress pricing:
| Tier | AWS | Azure | GCP |
|---|---|---|---|
| First 1 GB/month | Free | Free | Free |
| 1–10 TB/month | $0.09/GB | $0.087/GB | $0.12/GB |
| 10–50 TB/month | $0.085/GB | $0.083/GB | $0.11/GB |
| 50–150 TB/month | $0.07/GB | $0.07/GB | $0.08/GB |
Inter-AZ traffic (within the same region, across availability zones):
| Provider | Cost |
|---|---|
| AWS | $0.01/GB each direction ($0.02 round-trip) |
| Azure | Free (within the same region) |
| GCP | $0.01/GB |
This matters for Kubernetes clusters and distributed databases. A chatty microservice architecture on AWS across 3 AZs can generate significant inter-AZ charges. Azure's free intra-region traffic is a genuine advantage for architectures with high internal data movement.
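To see how this adds up, here is a rough model of cross-AZ chatter. The traffic volume and the assumption that two-thirds of calls cross an AZ boundary (even spread across 3 AZs) are hypothetical; the per-GB rates come from the table above:

```python
# Hypothetical: microservices exchange 20 TB/month internally, spread
# evenly across 3 AZs, so ~2/3 of calls cross an AZ boundary.
INTERNAL_GB = 20_000
CROSS_AZ_FRACTION = 2 / 3
cross_az_gb = INTERNAL_GB * CROSS_AZ_FRACTION

aws_cost = cross_az_gb * 0.02    # $0.01/GB in each direction
azure_cost = cross_az_gb * 0.0   # free within a region
gcp_cost = cross_az_gb * 0.01

print(f"AWS: ${aws_cost:,.0f}   Azure: ${azure_cost:,.0f}   GCP: ${gcp_cost:,.0f}")
```

Under these assumptions the same internal traffic costs a few hundred dollars a month on AWS and nothing on Azure — invisible in any pricing calculator, very visible on the bill.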
Example: A startup pushing 5 TB/month of egress (API responses, CDN origin pulls, backup replication):
AWS: (1 GB free) + (4,999 GB × $0.09) = ~$450/month
Azure: (1 GB free) + (4,999 GB × $0.087) = ~$435/month
GCP: (1 GB free) + (4,999 GB × $0.12) = ~$600/month
At high egress volumes, GCP is noticeably more expensive. If your startup serves large media files or high-traffic APIs, factor egress heavily in your cost model. Consider using a CDN (CloudFront, Azure CDN, Cloud CDN) — CDN egress is typically 30–50% cheaper than direct compute egress.
Google introduced Premium and Standard network tiers. Standard tier egress (which routes via public internet rather than Google's backbone) is priced at $0.085/GB for the first 10 TB — competitive with AWS and Azure, but with potentially higher latency.
8. Developer Experience
CLI Tools
AWS CLI (aws): Comprehensive but verbose. Consistent aws <service> <action> syntax. JSON output by default. Autocomplete available but not enabled by default. Configuration via ~/.aws/credentials and profiles.
# List running EC2 instances — requires JMESPath for useful output
aws ec2 describe-instances \
--filters "Name=instance-state-name,Values=running" \
--query "Reservations[].Instances[].[InstanceId,InstanceType,State.Name]" \
--output table
Azure CLI (az): Readable command structure. Good interactive mode (az interactive). Outputs JSON by default but supports table, TSV, YAML. Login flow can be frustrating with multi-tenant Azure AD.
# List running VMs — cleaner default output than AWS
az vm list --show-details \
--query "[?powerState=='VM running'].[name, resourceGroup, hardwareProfile.vmSize]" \
--output table
Google Cloud CLI (gcloud): The most opinionated CLI. Project and region context reduces repetitive flags. gcloud init onboarding is the smoothest of the three. Interactive SSH, SCP, and log tailing built-in.
# List running instances — context-aware, less boilerplate
gcloud compute instances list --filter="status=RUNNING"
Console Quality
- AWS Console: Feature-rich but cluttered. Finding services requires the search bar — the categorized menu is overwhelming. Individual service consoles vary wildly in quality (the S3 console is great; the IAM policy editor is painful).
- Azure Portal: The most visually polished. Resource groups provide logical organization. The portal occasionally surfaces stale data — always verify with the CLI. Cost Management integration in the portal is excellent.
- GCP Console: Clean and fast. The project-scoped model keeps things organized. The integrated Cloud Shell (a browser-based terminal with gcloud pre-configured) is genuinely useful for quick tasks.
Documentation and SDK Quality
AWS documentation is the most comprehensive by volume but inconsistent in quality. Some pages have not been updated in years. Azure docs are well-structured with clear "quickstart → tutorial → how-to → reference" progression. GCP documentation is the most concise and usually includes working code samples in multiple languages in the same page.
SDK quality across all three is mature. AWS SDK for Python (boto3) and JavaScript (v3) are excellent. Azure SDKs went through a major quality improvement in 2023 (unified @azure/ packages). GCP client libraries are idiomatic and well-typed.
9. Compliance Certifications
| Certification | AWS | Azure | GCP |
|---|---|---|---|
| SOC 2 Type II | Yes (all services) | Yes (all services) | Yes (all services) |
| HIPAA | Yes (requires BAA) | Yes (requires BAA) | Yes (requires BAA) |
| PCI-DSS | Yes (Level 1) | Yes (Level 1) | Yes (Level 1) |
| GDPR | Yes (DPA available) | Yes (DPA included in terms) | Yes (DPA included in terms) |
| ISO 27001 | Yes | Yes | Yes |
| FedRAMP High | Yes (GovCloud) | Yes (Azure Government) | Yes (Assured Workloads) |
HIPAA specifics: All three require you to sign a Business Associate Agreement (BAA). The BAA does not cover all services — only designated "HIPAA-eligible" services are covered.
- AWS: ~160 HIPAA-eligible services. Broadest coverage. BAA via AWS Artifact.
- Azure: ~130 HIPAA-eligible services. BAA is part of the Online Services Terms.
- GCP: ~120 HIPAA-eligible services. BAA via the Google Cloud console.
For healthtech startups, verify that every service in your architecture is on the provider's HIPAA-eligible list before signing the BAA. A common gotcha: managed Kafka is HIPAA-eligible on AWS (MSK) and Azure (Event Hubs), but GCP's managed Kafka offering reached HIPAA eligibility only recently.
10. Pricing Calculators Are Unreliable
Every provider has a pricing calculator. None of them are accurate for real workloads. Here is why:
- They miss cross-service costs: Data transfer between services (e.g., Lambda reading from S3, DynamoDB streams triggering Lambda) generates charges that are easy to overlook.
- They assume steady-state: Real traffic is bursty. Autoscaling means your actual compute usage does not match your calculator estimate.
- They hide support costs: AWS Business Support is 10% of your monthly bill (minimum $100/month). Azure and GCP have similar tiers. This is not in the calculator by default.
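The support-cost point is worth budgeting for explicitly. A minimal sketch, using the simplified 10%/min-$100 rule quoted above (actual AWS Business Support pricing tapers at higher spend — check the support pricing page):

```python
def aws_business_support(monthly_bill: float) -> float:
    """Simplified AWS Business Support estimate: 10% of the bill with a
    $100/month minimum. (Real pricing tapers: 10% of the first $10K,
    then lower percentages on higher tiers.)"""
    return max(100.0, 0.10 * monthly_bill)

for bill in (500, 2_000, 15_000):
    total = bill + aws_business_support(bill)
    print(f"${bill:,} bill -> ${total:,.0f} with Business Support")
```

Notice that a $500/month startup bill still carries the $100 minimum — a 20% effective surcharge that no pricing calculator shows by default.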
What to Do Instead
Run parallel workloads for 7–30 days on each provider and compare actual bills. This is the only reliable method.
Use billing APIs to track costs programmatically:
# AWS — Get cost breakdown for last 30 days
aws ce get-cost-and-usage \
--time-period Start=2025-01-01,End=2025-01-31 \
--granularity MONTHLY \
--metrics "BlendedCost" \
--group-by Type=DIMENSION,Key=SERVICE \
--output json
# GCP — Export billing to BigQuery, then query
bq query --use_legacy_sql=false '
SELECT
service.description AS service,
ROUND(SUM(cost), 2) AS total_cost
FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX`
WHERE invoice.month = "202501"
GROUP BY service
ORDER BY total_cost DESC
LIMIT 20
'
# Azure — Get cost breakdown using Cost Management API
az costmanagement query \
--type ActualCost \
--timeframe MonthToDate \
--dataset-aggregation '{"totalCost":{"name":"Cost","function":"Sum"}}' \
--dataset-grouping name=ServiceName type=Dimension \
--scope "/subscriptions/<subscription-id>"
Set billing alerts at 50%, 75%, and 90% of your budget on day one:
# GCP budget alert
gcloud billing budgets create \
--billing-account=XXXXXX-XXXXXX-XXXXXX \
--display-name="Monthly Budget" \
--budget-amount=2000 \
--threshold-rule=percent=0.5 \
--threshold-rule=percent=0.75 \
--threshold-rule=percent=0.9
11. Lock-in Analysis
Not all managed services are created equal in terms of portability. Here is a lock-in risk assessment:
Low Lock-in (Portable)
- Managed PostgreSQL/MySQL: RDS, Azure Database, Cloud SQL. Standard SQL engines. Migration is a pg_dump and pg_restore away.
- Object storage: S3, Azure Blob, GCS. The S3 API is the de facto standard — MinIO provides an S3-compatible layer. Most tools support all three.
- Kubernetes: EKS, AKS, GKE. Workloads are portable via standard Kubernetes manifests. Cluster-level configs (IAM, networking) need rework.
- Container registries: ECR, ACR, Artifact Registry. OCI-compliant images work everywhere.
Medium Lock-in
- Serverless functions: Lambda, Azure Functions, Cloud Functions. The function code is portable; the event bindings, IAM, and deployment tooling are not.
- Managed Kafka: MSK, Azure Event Hubs (Kafka protocol), GCP Managed Kafka. The Kafka protocol is standard, but operational configs differ.
- CDN: CloudFront, Azure CDN, Cloud CDN. Configuration and edge function runtimes (CloudFront Functions, Azure Edge Workers) are provider-specific.
High Lock-in (Proprietary)
- DynamoDB (AWS): No wire-compatible alternative. ScyllaDB offers a DynamoDB-compatible API, but it is not a drop-in replacement for complex access patterns. Migration requires data modeling changes.
- Cosmos DB (Azure): Multi-model, globally distributed. The closest equivalent on other providers requires assembling multiple services.
- Spanner (GCP): Globally consistent relational database with horizontal scaling. No equivalent exists on AWS or Azure. CockroachDB is the closest open-source alternative.
- BigQuery (GCP): Serverless analytics warehouse with a unique pricing model (per-query). AWS Athena is conceptually similar but architecturally different.
- Aurora (AWS): PostgreSQL-compatible wire protocol but a proprietary storage engine. You can migrate data out, but you lose the performance characteristics.
Mitigation strategy: Use infrastructure-as-code (Terraform, Pulumi) with provider-agnostic abstractions where possible. For databases, prefer PostgreSQL or MySQL-compatible services unless a proprietary service offers a capability you genuinely cannot replicate.
# Terraform — provider-agnostic PostgreSQL pattern
# Swap the resource block to migrate between providers
# AWS
resource "aws_db_instance" "postgres" {
  engine            = "postgres"
  engine_version    = "16.1"
  instance_class    = "db.t4g.medium"
  allocated_storage = 200
  # identifier, master credentials, and networking are required in
  # practice but omitted here for brevity
}
# GCP equivalent
resource "google_sql_database_instance" "postgres" {
  database_version = "POSTGRES_16"
  settings {
    tier      = "db-custom-4-16384"
    disk_size = 200
  }
}
Case Study: Healthtech Startup Cloud Evaluation
Context
A Series A healthtech startup — $2M ARR, 15 engineers, HIPAA compliance required — needed to choose a primary cloud provider. Their workload:
- 500K API calls/day (REST, with peaks at 2x during morning hours US Eastern)
- PostgreSQL database with 200 GB of data (mostly patient records and appointment metadata)
- 3 containerized microservices (API gateway, scheduling service, notification service)
- ML inference pipeline processing ~8,000 medical images/day for classification
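Translating those workload numbers into back-of-envelope capacity figures (the 15-minute batch cadence for the ML pipeline comes from the cost analysis later in this section):

```python
# Back-of-envelope sizing for the case-study workload.
CALLS_PER_DAY = 500_000
PEAK_MULTIPLIER = 2            # morning peak, US Eastern
IMAGES_PER_DAY = 8_000
BATCHES_PER_DAY = 24 * 4       # ML batches run every 15 minutes

avg_rps = CALLS_PER_DAY / 86_400
peak_rps = avg_rps * PEAK_MULTIPLIER
images_per_batch = IMAGES_PER_DAY / BATCHES_PER_DAY

print(f"avg ~{avg_rps:.1f} rps, peak ~{peak_rps:.1f} rps, "
      f"~{images_per_batch:.0f} images per 15-min batch")
```

Roughly 6 requests per second on average and 12 at peak — small enough that scale-to-zero platforms and batch-friendly inference dominate the cost picture, as the results below show.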
The Stripe Systems engineering team ran a 30-day parallel evaluation, deploying the identical workload across all three providers using Terraform. Here is what we found.
Infrastructure Setup
We used the same Terraform modules with provider-specific resource definitions. The core Terraform structure:
# modules/workload/main.tf — shared workload definition
variable "provider_name" {}
variable "db_connection_string" {}
variable "container_image" {}
variable "ml_model_path" {}
# Provider-specific implementations in:
# environments/aws/main.tf
# environments/azure/main.tf
# environments/gcp/main.tf
For the container workload, we chose each provider's managed container platform (no Kubernetes — unnecessary for 3 services):
# GCP Cloud Run deployment (example for the API gateway)
gcloud run deploy api-gateway \
--image us-central1-docker.pkg.dev/healthco-eval/services/api-gw:v1.2 \
--region us-central1 \
--memory 1Gi \
--cpu 2 \
--min-instances 1 \
--max-instances 20 \
--set-env-vars "DB_HOST=$DB_IP,ML_ENDPOINT=$ML_URL" \
--vpc-connector healthco-connector \
--ingress internal-and-cloud-load-balancing
# AWS Fargate task definition registration
aws ecs register-task-definition \
--family api-gateway \
--network-mode awsvpc \
--requires-compatibilities FARGATE \
--cpu "1024" \
--memory "2048" \
--container-definitions '[{
"name": "api-gw",
"image": "123456789.dkr.ecr.us-east-1.amazonaws.com/api-gw:v1.2",
"portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
"environment": [
{"name": "DB_HOST", "value": "'$DB_HOST'"},
{"name": "ML_ENDPOINT", "value": "'$ML_URL'"}
]
}]'
30-Day Cost Results
After running identical workloads for 30 days, here are the actual bills:
| Service Category | AWS | Azure | GCP |
|---|---|---|---|
| Compute (containers) | $624 (Fargate) | $410 (Container Apps) | $318 (Cloud Run) |
| Database (PostgreSQL, 200GB) | $303 (RDS) | $283 (Flexible Server) | $279 (Cloud SQL) |
| ML Inference (image classification) | $520 (SageMaker endpoint) | $445 (Azure ML endpoint) | $290 (Vertex AI w/ scale-to-zero) |
| Object Storage (model artifacts + images) | $12 (S3) | $11 (Blob) | $10 (GCS) |
| Load Balancer | $22 (ALB) | $18 (App Gateway basic) | $0 (included with Cloud Run) |
| Data Egress (~800 GB) | $72 | $70 | $96 |
| Analytics (query pipeline logs) | $85 (Athena + S3) | $110 (Synapse serverless) | $42 (BigQuery on-demand) |
| Monitoring & Logging | $45 (CloudWatch) | $55 (Monitor) | $38 (Cloud Logging) |
| Total Monthly | $1,683 | $1,402 | $1,073 |
Cost Analysis
The three biggest differentiators:
- Compute (containers): Cloud Run's scale-to-zero and per-request billing saved ~$300/month over Fargate. The API gateway and notification service had periods of near-zero traffic (nights, weekends). Fargate kept minimum tasks running; Cloud Run scaled to zero during idle periods. Azure Container Apps also scaled to zero and came in second.
- ML Inference: The ML pipeline processed images in batches (every 15 minutes). Vertex AI's custom prediction container with scale-to-zero meant we only paid for GPU time during actual inference. SageMaker's real-time endpoint ran continuously. Azure ML sat in between — autoscaling was available, but the minimum instance count was 1.
- Analytics: BigQuery's on-demand pricing ($6.25/TB queried) was ideal for ad hoc analysis of pipeline logs and API metrics. We queried ~6 TB over the month. Athena charged similarly per query but required managing data in S3 in specific formats. Azure Synapse serverless had higher per-TB costs.
Decision Matrix
Scored 1–5 (5 = best for this specific workload):
| Criteria | Weight | AWS | Azure | GCP |
|---|---|---|---|---|
| Monthly cost | 25% | 2 | 3 | 5 |
| Scale-to-zero (containers) | 15% | 2 | 4 | 5 |
| HIPAA compliance tooling | 15% | 5 | 4 | 4 |
| ML inference flexibility | 15% | 4 | 3 | 5 |
| Analytics (serverless SQL) | 10% | 3 | 3 | 5 |
| PostgreSQL compatibility | 10% | 5 | 4 | 4 |
| Startup credits available | 10% | 3 | 4 | 5 |
| Weighted Score | — | 3.25 | 3.50 | 4.75 |
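The weighted totals in the matrix can be reproduced directly from the weights and per-criterion scores above:

```python
# Weights and 1-5 scores from the decision matrix, in row order:
# cost, scale-to-zero, HIPAA, ML flexibility, analytics, PostgreSQL, credits.
weights = [0.25, 0.15, 0.15, 0.15, 0.10, 0.10, 0.10]
scores = {
    "AWS":   [2, 2, 5, 4, 3, 5, 3],
    "Azure": [3, 4, 4, 3, 3, 4, 4],
    "GCP":   [5, 5, 4, 5, 5, 4, 5],
}

def weighted(score_row):
    """Weighted sum of a provider's scores, rounded to 2 decimals."""
    return round(sum(w * s for w, s in zip(weights, score_row)), 2)

for provider, row in scores.items():
    print(provider, weighted(row))
```

Keeping the matrix as code makes it trivial to re-score when your weights change — a startup that later needs always-on inference would bump that weight and may get a different winner.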
Why GCP Won
For this specific workload, GCP won on cost and ML flexibility. The combination of Cloud Run (scale-to-zero containers), Vertex AI (scale-to-zero inference), and BigQuery (serverless analytics) created a stack where the startup paid almost exclusively for actual usage rather than reserved capacity.
The startup also qualified for $100K in Google for Startups Cloud credits through their accelerator, which effectively made the first 12+ months free on GCP at their current spend rate.
What would have changed the outcome:
- If the workload required always-on inference endpoints (e.g., real-time video analysis), SageMaker's multi-model endpoints might have been more cost-effective.
- If the team was deeply embedded in the Microsoft ecosystem (Active Directory, Teams integrations, Power BI), Azure's integration advantages would outweigh the cost difference.
- If the startup needed the broadest service catalog (e.g., IoT, specific managed databases, or niche ML services), AWS's 200+ services provide options that GCP and Azure do not match.
Post-Migration Validation
After choosing GCP, we monitored the production workload for 60 days. Actual production costs averaged $1,120/month — within 5% of the evaluation period, confirming the evaluation methodology was sound.
# Post-migration cost monitoring — BigQuery billing export query
bq query --use_legacy_sql=false '
SELECT
service.description,
ROUND(SUM(cost), 2) AS monthly_cost,
ROUND(SUM(cost) / 30, 2) AS daily_avg
FROM `healthco-prod.billing.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP("2025-02-01")
AND usage_start_time < TIMESTAMP("2025-03-01")
GROUP BY service.description
HAVING monthly_cost > 1
ORDER BY monthly_cost DESC
'
Conclusion
There is no universally correct cloud provider. The right choice depends on your specific workload characteristics, team expertise, compliance requirements, and cost sensitivity.
For startups optimizing for cost on variable, request-driven workloads: GCP's scale-to-zero ecosystem (Cloud Run + Vertex AI + BigQuery) is hard to beat.
For startups needing the broadest service catalog and largest talent pool: AWS remains the default safe choice.
For startups in the Microsoft ecosystem or needing free intra-region networking: Azure offers real technical and cost advantages.
Run the evaluation. Measure actual costs. Make the decision based on data, not marketing materials.