Mastering Kubernetes the Right Way · DAY 00 / 35

Day 0 — Spin Up a Real Kubernetes Cluster in One Command

Before you debug a single Pending pod, you need a real cluster. Here is how to get one on AWS, GCP, or Azure with a single command — and the Terraform that makes it boring.

Koti Vellanki · 19 Mar 2026 · 8 min read
kubernetes · eks · aks · gke · terraform · setup

You cannot debug a cluster you do not have.

Every post in this series ("Mastering Kubernetes the Right Way") assumes you can run kubectl apply -f issue.yaml, watch something break, and run kubectl apply -f fix.yaml to put it back. That assumption falls apart on the very first day if you do not already have a cluster — and "spin up a Kubernetes cluster" is the kind of task that has eaten entire afternoons of mine.

This is Day 0. By the end of this post you will have a working, real, cloud-managed Kubernetes cluster on AWS, GCP, or Azure. One command. Real Terraform underneath. No magic.

What we are building

A managed Kubernetes cluster — your choice of AWS EKS, GCP GKE, or Azure AKS. Each one comes with:

  • Network: a VPC (or VNet), public + private subnets, NAT gateways
  • Control plane: managed by the cloud provider, you do not babysit it
  • Worker nodes: a managed node group sized for dev
  • kubeconfig: written to your laptop so kubectl Just Works
  • Tear-down: one command, leaves nothing behind, leaves no bill behind

The whole thing comes from one repo: vellankikoti/resources-creator. Terraform modules per cloud, a thin shell wrapper, and one script that hides all of it behind a flag.

Prerequisites (15 minutes, once)

You need four command-line tools — Terraform, kubectl, Helm, and the CLI for your cloud (one of AWS / Google Cloud / Azure) — plus a logged-in cloud account. Pick your operating system:

The Homebrew way. Works on Apple Silicon and Intel Macs. Tested on macOS 14 Sonoma and 15 Sequoia. Run each step on its own — if one fails, you can re-run only that step instead of starting over.

  1. Install Homebrew

    skip if you already have it

    If running brew --version shows nothing, install Homebrew first:

    bash
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    ✓ Success looks like: brew --version prints something like Homebrew 4.4.x.
    ⚠ Heads up: On Apple Silicon, the installer drops Homebrew at /opt/homebrew/bin. The installer prints two eval commands at the end — copy them into your ~/.zshrc and run them, or new shells will not find brew.
  2. Install the core tools

    Terraform, kubectl, Helm

    These three tools work for all three clouds.

    bash
    brew install terraform kubectl helm
    ✓ Success looks like: terraform version shows v1.10.x or newer, kubectl version --client shows v1.31.x or newer, helm version shows v3.16.x or newer.
    [Screenshot] terraform --version — a clean install prints the binary version plus a provider list.
    [Screenshot] kubectl version --client — a client-only check, so you don't need a cluster to verify the install.
    [Screenshot] helm version — short and sweet; one line is all you need.
  3. Install your cloud CLI

    pick exactly one

    Pick the one for the cluster you want to spin up. You only need one — installing all three is wasteful.

    bash
    # AWS
    brew install awscli
    bash
    # GCP
    brew install --cask google-cloud-sdk
    bash
    # Azure
    brew install azure-cli
  4. Authenticate with your cloud

    needs to match your CLI choice
    bash
    # AWS — paste your access key, secret, and default region (e.g. us-east-1)
    aws configure
    [Screenshot] aws sts get-caller-identity — the caller-identity JSON is your proof the CLI is wired up. The Account / UserId / Arn values shown are placeholders.
    bash
    # GCP — opens a browser to log in, then writes credentials Terraform can read
    gcloud init
    gcloud auth application-default login
    bash
    # Azure — opens a browser; in headless setups use `az login --use-device-code`
    az login
    [Screenshot] az account show — the active subscription is your sign the session is live. Tenant / subscription / email values shown are placeholders.
    ✓ Success looks like: aws sts get-caller-identity prints your account ID, OR gcloud config get-value project prints your project, OR az account show prints your subscription.
    ◆ Tip: Run which terraform to verify it points inside /opt/homebrew/bin on Apple Silicon. If it points to /usr/local/bin, you have the old Intel Homebrew shadow-installed and your CLIs may load the wrong architecture — fix it now or you'll fight it later.
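
That PATH-ordering check can be scripted. Below is a small, hypothetical helper (my own, not part of the repo) that reports whether /opt/homebrew/bin wins over /usr/local/bin in a given PATH string:

```bash
#!/bin/sh
# Hypothetical helper (not from the repo): returns 0 when /opt/homebrew/bin
# comes before /usr/local/bin in the given PATH string, so ARM binaries win.
brew_path_ok() {
  arm=$(printf '%s' "$1" | tr ':' '\n' | grep -n '^/opt/homebrew/bin$' | head -1 | cut -d: -f1)
  intel=$(printf '%s' "$1" | tr ':' '\n' | grep -n '^/usr/local/bin$' | head -1 | cut -d: -f1)
  [ -z "$intel" ] && return 0   # no Intel Homebrew dir at all: nothing to shadow
  [ -z "$arm" ] && return 1     # Intel dir present but ARM dir missing: wrong
  [ "$arm" -lt "$intel" ]       # ARM dir must appear first
}

# Example: brew_path_ok "$PATH" || echo "fix your PATH ordering"
```

Run it against your real $PATH; if it fails, add the Homebrew shellenv line to your ~/.zshrc as described in step 1.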

Verify everything is alive

Once installed, run these on every OS — same commands, same expected output:

bash
terraform version                # should print v1.10.x or newer
kubectl version --client         # should print Client Version: v1.31.x or newer
helm version                     # should print v3.16.x or newer
# And ONE of these, depending on your cloud:
aws sts get-caller-identity      # AWS — should print your account ID
gcloud config get-value project  # GCP — should print your project ID
az account show                  # Azure — should print your subscription

If any of those fail, fix them now. Spending two hours debugging a broken kubectl because your AWS CLI was the wrong version is the most demoralizing way to start a Kubernetes journey.
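
If you would rather script that checklist, here is a minimal preflight sketch (my own, not from the repo) that reports any missing binary before you touch Terraform:

```bash
#!/bin/sh
# Minimal preflight sketch (not from the repo): check that each required
# binary is on PATH and report what is missing. Pass your cloud CLI as the
# fourth tool, e.g.: preflight terraform kubectl helm aws
preflight() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok      $tool"
    else
      echo "MISSING $tool"
      missing=1
    fi
  done
  return "$missing"
}

# preflight terraform kubectl helm aws
```

A nonzero exit code means at least one tool is missing, so you can wire this into any setup script with `preflight ... || exit 1`.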

Common setup issues (the ones that actually waste your morning)

These are the failures I see beginners hit again and again. Worth scanning even if your install looked clean.

1. terraform: command not found after a fresh install. The binary installed, but your shell does not see it because the PATH change has not been reloaded. Fix: open a new terminal window. On Linux you can also source ~/.bashrc; on Windows (with Chocolatey) run refreshenv. If it still fails, run which terraform (or where terraform on Windows) — if it returns nothing, the install put the binary somewhere unexpected.

2. aws sts get-caller-identity returns "Unable to locate credentials". You installed the CLI but never ran aws configure, OR you have an AWS profile mismatch. Either run aws configure to set the default profile, or set export AWS_PROFILE=<your-profile> in your shell. SSO users need aws sso login instead.

3. kubectl shows "Unable to connect to the server: x509: certificate signed by unknown authority". Your kubeconfig is pointing at a stale or mismatched cluster. Check kubectl config current-context and switch with kubectl config use-context <name>. After running create-cluster.sh later in this post, the script writes a fresh context and selects it automatically — so this error is almost always from a leftover context.

4. gcloud auth login works but Terraform says "could not find default credentials". There are TWO gcloud auth flows. gcloud auth login authenticates the CLI for gcloud commands. Terraform needs gcloud auth application-default login for its provider. Run that second command and Terraform will find the credentials.

5. az login opens a browser tab that never returns. Common in headless Linux/WSL2. Use az login --use-device-code instead — it gives you a code to paste into a browser on any device.

6. WSL2 cannot reach the cloud (timeouts on aws/gcloud/az). Usually a corporate VPN that does not propagate routes into WSL2. Workaround: configure WSL2 networking mirrored mode in ~/.wslconfig with networkingMode=mirrored and restart WSL (wsl --shutdown).

7. helm: command not found even though install succeeded. On macOS Apple Silicon, this happens when Homebrew installed Helm in /opt/homebrew/bin but your shell is still using the old Intel /usr/local/bin path. Fix: add eval "$(/opt/homebrew/bin/brew shellenv)" to your ~/.zshrc and reload.

8. Terraform errors out with "Error: Failed to install provider" on a corporate network. Your firewall is blocking releases.hashicorp.com. Either get the proxy whitelisted, or set TF_REGISTRY_DISCOVERY_RETRY=5 and try a network at home first.

Clone the cluster factory

bash
git clone https://github.com/vellankikoti/resources-creator.git
cd resources-creator/cluster-creator

Inside, you will find:

plaintext
cluster-creator/
├── scripts/create-cluster.sh   # The one command
├── terraform/
│   ├── eks/                    # AWS EKS module
│   ├── gke/                    # GCP GKE module
│   ├── aks/                    # Azure AKS module
│   └── modules/                # Shared building blocks
├── environments/
│   ├── dev.env
│   ├── qa.env
│   ├── staging.env
│   └── prod.env
└── addons/                     # Optional cluster addons

The Terraform per cloud is the real config. The script just wires it up so you do not have to think about state files, backend configuration, or the order of terraform init / plan / apply.
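
To make that less magical, here is a rough sketch of the wrapper's shape: my reconstruction from the description above, not the actual create-cluster.sh. The flag names match the ones used in this post; the module paths are assumptions mirroring the repo layout.

```bash
#!/bin/sh
# Hypothetical reconstruction of the wrapper's core logic; NOT the real
# create-cluster.sh. Maps --cloud to a Terraform module directory, then runs
# init / plan / apply in order. DRY_RUN=1 only prints the commands.
create_cluster() {
  cloud=""; name=""; env=""; region=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --cloud)  cloud="$2";  shift 2 ;;
      --name)   name="$2";   shift 2 ;;
      --env)    env="$2";    shift 2 ;;
      --region) region="$2"; shift 2 ;;
      *) echo "unknown flag: $1" >&2; return 2 ;;
    esac
  done
  case "$cloud" in
    aws)   dir="terraform/eks" ;;   # assumed module paths
    gcp)   dir="terraform/gke" ;;
    azure) dir="terraform/aks" ;;
    *) echo "unsupported cloud: $cloud" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "terraform -chdir=$dir init"
    echo "terraform -chdir=$dir plan -out=tfplan"
    echo "terraform -chdir=$dir apply tfplan"
  else
    terraform -chdir="$dir" init &&
      terraform -chdir="$dir" plan -out=tfplan &&
      terraform -chdir="$dir" apply tfplan
  fi
}
```

The real script also handles backends, kubeconfig, and validation; the point here is just that "one command" is ordinary flag parsing plus init/plan/apply in a fixed order.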

One command. Pick your cloud.

AWS (EKS)

bash
./scripts/create-cluster.sh \
  --cloud aws \
  --name kotidev \
  --env dev \
  --region us-east-1

What this creates: VPC, 3 private + 3 public subnets, one NAT gateway (single for non-prod, multi-NAT for prod), an EKS managed control plane, and a managed node group. Cost target for dev: under $5/day if you tear it down at end of day.

[Screenshot] ./create-cluster.sh --cloud aws — Terraform plans the VPC, subnets, IAM, EKS control plane, and node group, then applies. Expect 8–14 minutes.

GCP (GKE)

bash
./scripts/create-cluster.sh \
  --cloud gcp \
  --name kotidev \
  --env dev \
  --region us-central1

What this creates: VPC, regional subnet, GKE cluster (Standard mode, not Autopilot — we want to see what's happening), a default node pool. Same cost shape as EKS.

Azure (AKS)

bash
./scripts/create-cluster.sh \
  --cloud azure \
  --name kotidev \
  --env dev \
  --region eastus

What this creates: Resource group, VNet, subnet, AKS cluster with Azure CNI, default system node pool. Same cost shape as the others.

What happens when you run it

The script does roughly this:

  1. Validates your cloud CLI is logged in and the region is real.
  2. Initializes Terraform with the correct backend and module path for your cloud.
  3. Plans the resource graph and prints what it is about to create.
  4. Applies the plan (asks for confirmation in dev, auto-confirms in qa).
  5. Writes kubeconfig to ~/.kube/config so kubectl is immediately wired up.
  6. Validates the cluster by running kubectl get nodes and waiting for them to be Ready.

The whole thing takes 8–14 minutes depending on the cloud. AWS is the slowest because EKS control plane provisioning is genuinely glacial. GKE is the fastest. AKS sits in between.
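
Step 6 boils down to "retry until the nodes report Ready, then give up". On a real cluster, `kubectl wait --for=condition=Ready nodes --all --timeout=300s` does this in one line; as a cluster-free illustration, the same pattern as a generic retry helper:

```bash
#!/bin/sh
# Generic retry helper illustrating the "wait for nodes Ready" step.
# Usage: retry_until <attempts> <delay_seconds> <command...>
retry_until() {
  attempts="$1"; delay="$2"; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0          # command succeeded: we are done waiting
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                    # gave up after <attempts> tries
}

# On a real cluster (an assumption about the script's behavior, per step 6):
# retry_until 30 10 sh -c '! kubectl get nodes --no-headers | grep -q NotReady'
```

The same helper is handy later in the series whenever you need to poll for a condition instead of sleeping a fixed amount and hoping.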

Verify

bash
kubectl get nodes
# NAME              STATUS   ROLES    AGE   VERSION
# ip-10-0-1-23.ec2  Ready    <none>   2m    v1.30.4
# ip-10-0-2-44.ec2  Ready    <none>   2m    v1.30.4
kubectl get pods -A
# Should show kube-system pods running cleanly.
[Screenshot] kubectl get nodes and kubectl get pods -A against a freshly provisioned EKS cluster — nodes Ready, kube-system pods Running; the kubeconfig is wired up and you are ready for Day 1.

If the nodes are NotReady, give it 60 seconds. If they are still NotReady, run kubectl describe node <name> and look at the Conditions block. The story is usually right there.

Tear it down before bed

bash
./scripts/destroy-cluster.sh \
  --cloud aws \
  --name kotidev \
  --env dev \
  --region us-east-1

This is the most important command in the whole post. Always run it before you stop working for the day. A forgotten EKS cluster running over the weekend is a $50 bill nobody enjoys.

What you have now

A real, cloud-managed Kubernetes cluster, provisioned by real Terraform, that you can throw away and recreate in under fifteen minutes. From here, the rest of the series (Days 1 through 35) assumes this cluster exists. Every post starts with kubectl apply -f issue.yaml, and every fix lands cleanly because the cluster is real.

Day 1 lands tomorrow: the simplest possible failure mode, and the one I see most often in beginner clusters — a pod that refuses to start because the command is wrong.

See you on Day 1.
