koti.dev
Mastering Kubernetes the Right Way · DAY 32 / 35

SELinux Denied Your Kubernetes Pod. kubectl Has No Idea.

Profile mismatches, denied syscalls, and the audit2allow loop you should never run blind.

Koti Vellanki · 20 Apr 2026 · 4 min read
kubernetes · security

Same image, same manifest, same namespace. Staging runs fine. Prod throws open: permission denied on a file that clearly exists, clearly has the right ownership, clearly passed ls -la with 0644. id inside the container shows the right UID. stat shows the file. And still the app crashes on startup. I spend two hours thinking it is a Kubernetes bug. It is not a Kubernetes bug. It is SELinux enforcing a policy on the prod nodes that staging did not have enabled, and the container's SELinux context does not match the file's SELinux context. Kubernetes has no idea any of this is happening. kubectl describe pod says Running. The kernel says nope.

The scenario

DAY 32 · SECURITY · SELINUX / APPARMOR

DAC bits say 777. SELinux still says no.

A pod writes to /var/lib/data mounted from a hostPath. The host is in SELinux enforcing mode. The container process runs as container_t; the host directory is labelled default_t, not container_file_t. The kernel LSM hook fires after the DAC check passes and returns EACCES. The app logs 'permission denied' on a path that looks completely open.

Figure (32 / 35): SELinux denies the pod's write. A pod running as uid 1000 mounts a hostPath at /var/lib/data and calls write(). (1) The write() reaches the kernel. (2) The DAC check passes (uid 1000 owns the path, mode 0755), but the SELinux LSM hook compares the process label container_t with the host directory label default_t, finds no allow rule, and returns EACCES (errno 13). (3) The AVC denial is logged in /var/log/audit/audit.log. The pod status shows Running; the app log shows permission denied.
1. The pod looks fine from Kubernetes' perspective

kubectl describe pod says Running. ls -la inside the container shows the mount with correct mode bits. id shows uid 1000. The DAC layer is satisfied. SELinux enforces a second, independent permission layer that Kubernetes has no visibility into.

2. Two checks run, in order: DAC then MAC

The kernel runs the Discretionary Access Control (DAC) check first. It passes: uid 1000 owns the directory and has write and execute on it. The LSM hook fires next: selinux_inode_permission compares the process label to the inode label. No allow rule lets container_t write to default_t, so the kernel logs an AVC denial and returns EACCES.

3. Fix the label, not the mode bits

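Run ausearch -m avc -ts recent on the node to find the exact denial. The fix is to give the host directory a label that containers are allowed to use, for example chcon -Rt container_file_t /var/lib/data on the host. Do not put container_file_t in securityContext.seLinuxOptions.type: that field sets the label of the container process, and container_file_t is a file type. Never run audit2allow blind; read what you are permitting first.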

EACCES (13) — permission denied; man errno(3) · pod.spec.securityContext.seLinuxOptions — kubectl explain pod.spec.securityContext.seLinuxOptions · kind v0.22.0, Kubernetes 1.30.0, kernel with SELinux enforcing

The repo has a reproducible version with a mismatched SELinux level.

bash
git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
cd troubleshoot-kubernetes-like-a-pro/scenarios/selinux-apparmor-policy-violation
ls

The scenario directory contains description.md, issue.yaml, and fix.yaml. The issue manifest sets seLinuxOptions.level: "s0:c0,c1" on the pod, simulating a container that gets labelled with a category set the node's policy does not allow it to use.

Reproduce the issue

bash
kubectl apply -f issue.yaml
kubectl get pod selinux-apparmor-issue-pod
plaintext
NAME                         READY   STATUS    RESTARTS   AGE
selinux-apparmor-issue-pod   1/1     Running   0          3m

And the logs:

bash
kubectl logs selinux-apparmor-issue-pod
plaintext
open /data/config.json: permission denied

Kubernetes reports the pod as healthy. The app reports that a file it owns is unreadable. Both are telling the truth from where they sit.

Debug the hard way

First, check the SELinux context the pod spec actually asks for.

bash
kubectl get pod selinux-apparmor-issue-pod -o jsonpath='{.spec.securityContext}'
plaintext
{"seLinuxOptions":{"level":"s0:c0,c1"}}

Now exec in and check the runtime view.

bash
kubectl exec selinux-apparmor-issue-pod -- id -Z
plaintext
system_u:system_r:container_t:s0:c0,c1

And check the file the app is trying to open.

bash
kubectl exec selinux-apparmor-issue-pod -- ls -lZ /data/config.json
plaintext
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0:c0,c0 42 Apr 19 22:14 /data/config.json

There it is. Process context is s0:c0,c1. File context is s0:c0,c0. DAC permissions (rw-r--r--) would let the process read the file. MAC permissions (SELinux) say no, because the categories do not match. To see the denial in the kernel log I would normally ssh to the node and tail journalctl.

bash
ssh node-7 'journalctl -k | grep AVC | tail -3'
plaintext
audit: type=1400 audit(...): avc: denied { read } for pid=12345 comm="busybox" name="config.json" scontext=system_u:system_r:container_t:s0:c0,c1 tcontext=system_u:object_r:container_file_t:s0:c0,c0 tclass=file permissive=0

There is the AVC denial in black and white. scontext is the subject (the process). tcontext is the target (the file). The mismatch is the c0,c1 vs c0,c0 category set. The kernel denied the read and returned EACCES, which the container saw as plain old permission denied.
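When there are dozens of denials instead of one, reading them by eye gets old fast. Here is a minimal sketch that pulls the four fields I actually look at out of a single AVC line. The function name avc_fields is my own invention for illustration, and it assumes the one-line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: extract the interesting fields from one AVC denial line.
# Assumes the single-line "avc: denied { ... } ... scontext=... tcontext=... tclass=..." layout.
avc_fields() {
  line=$1
  # The denied permission set sits between the braces: "denied { read }".
  perms=$(printf '%s\n' "$line" | sed -n 's/.*denied *{ *\([^}]*[^} ]\) *}.*/\1/p')
  # scontext is the subject (the process); tcontext is the target (the file).
  scontext=$(printf '%s\n' "$line" | sed -n 's/.*scontext=\([^ ]*\).*/\1/p')
  tcontext=$(printf '%s\n' "$line" | sed -n 's/.*tcontext=\([^ ]*\).*/\1/p')
  tclass=$(printf '%s\n' "$line" | sed -n 's/.*tclass=\([^ ]*\).*/\1/p')
  printf 'perms=%s\nsubject=%s\ntarget=%s\nclass=%s\n' \
    "$perms" "$scontext" "$tcontext" "$tclass"
}

# Usage: feed it one line at a time, for example the denial from this scenario.
avc_fields 'audit: type=1400 audit(...): avc: denied { read } for pid=12345 comm="busybox" name="config.json" scontext=system_u:system_r:container_t:s0:c0,c1 tcontext=system_u:object_r:container_file_t:s0:c0,c0 tclass=file permissive=0'
```

With subject and target side by side on their own lines, the category difference is much harder to miss than in the raw audit record.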

Why this happens

SELinux and AppArmor are Linux Security Modules that sit below the regular Unix permission model. Even if DAC says "this UID can read this file," the LSM gets a second vote and can refuse. In Kubernetes land, this matters because the container runtime labels your container with an SELinux context, and the files on the node (hostPath, emptyDir, local PVs) have their own context. If they do not match, the kernel refuses the syscall and there is no error anywhere that Kubernetes-level tooling will show you.

The way this usually bites teams in prod is an environment skew. Staging nodes have SELinux in permissive mode. Prod nodes have it in enforcing. The staging cluster silently logs denials and keeps going. The prod cluster hard-fails. Your manifest is identical. Your logs disagree. The second common cause is a pod that sets seLinuxOptions explicitly, maybe copied from a blog post, and the level it asks for is not one the node's policy knows about.
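Environment skew is easy to check for once you have each node's mode in hand. The sketch below assumes you have already collected one "node mode" line per node by running getenforce on each of them (how you collect that is up to you); the function name selinux_skew and the node names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarise SELinux modes across nodes and flag skew.
# Reads "<node> <mode>" lines on stdin, where <mode> is getenforce output
# (Enforcing, Permissive, or Disabled). Returns 1 if nodes disagree.
selinux_skew() {
  awk '
    { mode = tolower($2); count[mode]++; nodes[mode] = nodes[mode] " " $1 }
    END {
      # Report in a fixed order so the output is stable between runs.
      split("enforcing permissive disabled", order, " ")
      seen = 0
      for (i = 1; i <= 3; i++) {
        m = order[i]
        if (count[m] > 0) {
          seen++
          printf "%s (%d):%s\n", m, count[m], nodes[m]
        }
      }
      if (seen > 1) {
        print "skew: nodes disagree on SELinux mode"
        exit 1
      }
    }
  '
}

# Usage (node names are made up):
#   printf 'stg-1 Permissive\nprod-1 Enforcing\n' | selinux_skew
```

The non-zero exit status on skew makes it easy to drop into a pre-deploy check, so the disagreement shows up before the manifest does.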

The fix is usually to adjust the pod's context to match the files it needs, or to let the runtime assign the default context and not override it at all. The hard version is writing a custom SELinux policy with audit2allow, and that is a loop you should never run blindly, because audit2allow -a will gladly generate a rule that allows the exact denial you just saw plus a dozen things you did not see and did not want to allow. Always read the generated TE file before loading it.
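One way to make "read the TE file" less of a chore is to filter out the rules you expected and look only at what is left. This is a rough sketch, assuming the generated file uses single-line allow rules of the form allow SOURCE TARGET:CLASS PERMS; the function name te_unexpected is hypothetical, and anything that is not a single-line allow rule is ignored, so it does not replace reading the file.

```shell
#!/bin/sh
# Hypothetical review helper: print every allow rule in a .te file whose
# source type or target type is NOT the pair you meant to permit.
# Usage: te_unexpected FILE SOURCE_TYPE TARGET_TYPE
te_unexpected() {
  te_file=$1
  src=$2
  tgt=$3
  awk -v src="$src" -v tgt="$tgt" '
    $1 == "allow" {
      # Field 3 looks like "container_file_t:file"; keep the type part.
      split($3, target, ":")
      if ($2 != src || target[1] != tgt) print
    }
  ' "$te_file"
}

# Example: show rules that go beyond container_t -> container_file_t.
#   te_unexpected mymodule.te container_t container_file_t
```

If this prints anything at all, the module grants more than the denial you were chasing, and each extra line deserves a deliberate yes or no.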

The fix

bash
kubectl delete pod selinux-apparmor-issue-pod
kubectl apply -f fix.yaml

The diff is a single category.

diff
 securityContext:
   seLinuxOptions:
-    level: "s0:c0,c1"
+    level: "s0:c0,c0"

Verify the process and file contexts now match.

bash
kubectl exec selinux-apparmor-fixed-pod -- id -Z
plaintext
system_u:system_r:container_t:s0:c0,c0
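Comparing two contexts by eye works for one pod. For more than that, here is a small sketch that compares just the level part of a process context and a file context. The helper names are my own, and it is a plain string comparison: a difference is a hint about where to look, not a statement of what the node's policy will decide.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the scenario repo.

# An SELinux context is user:role:type:level, and the level itself can
# contain colons (s0:c0,c1), so the level is field 4 through the end.
ctx_level() { printf '%s\n' "$1" | cut -d: -f4-; }

# Compare the level of a process context with the level of a file context.
# Prints the result; returns 1 when the two level strings differ.
compare_levels() {
  proc_level=$(ctx_level "$1")
  file_level=$(ctx_level "$2")
  if [ "$proc_level" = "$file_level" ]; then
    printf 'levels match: %s\n' "$proc_level"
  else
    printf 'levels differ: process=%s file=%s\n' "$proc_level" "$file_level"
    return 1
  fi
}

# Contexts pasted from the fixed pod's `id -Z` and the earlier `ls -lZ` output.
compare_levels 'system_u:system_r:container_t:s0:c0,c0' \
               'system_u:object_r:container_file_t:s0:c0,c0'
```

The process and file types are left out of the comparison on purpose, since a process domain and a file type are never expected to be the same string.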

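In real life, unless you know exactly what the policy on the node looks like, do not set seLinuxOptions at all. Let the runtime pick. For volumes that need a specific context, set seLinuxOptions in the pod-level securityContext and let the runtime relabel the volume to match, which is the job the :Z mount option does in Docker and Podman. hostPath volumes are not relabelled, so label those on the node. For AppArmor, use securityContext.appArmorProfile on Kubernetes 1.30 and later, or the container.apparmor.security.beta.kubernetes.io/<container> annotation on older clusters, and point it at a profile that actually exists on every node.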
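If you inherit a pile of manifests, it helps to know which ones override the SELinux context before prod tells you. The following is a rough sketch that lists the keys set under seLinuxOptions in block-style YAML. The function name selinux_overrides is hypothetical, and it deliberately does not handle flow-style mappings such as {level: ...} or a seLinuxOptions key that starts a list item.

```shell
#!/bin/sh
# Hypothetical audit helper: list explicit seLinuxOptions settings in
# block-style YAML manifests as "file:line: key: value".
selinux_overrides() {
  awk '
    # Number of leading spaces or tabs on a line.
    function indent_of(s,   t) { t = s; sub(/^[ \t]+/, "", t); return length(s) - length(t) }

    FNR == 1 { inblock = 0 }                      # reset at the start of each file

    /^[ \t]*seLinuxOptions:[ \t]*$/ {             # remember how deep the key sits
      base = indent_of($0); inblock = 1; next
    }

    inblock && NF > 0 {
      if (indent_of($0) > base) {                 # still inside the mapping
        line = $0; sub(/^[ \t]+/, "", line)
        printf "%s:%d: %s\n", FILENAME, FNR, line
      } else {
        inblock = 0                               # dedent ends the mapping
      }
    }
  ' "$@"
}

# Usage: selinux_overrides issue.yaml fix.yaml
```

Every line it prints is a place where a manifest is second-guessing the runtime default, which is exactly the list worth reviewing against the node's policy.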

The lesson

  1. "Permission denied" on a file that looks readable almost always means an LSM is saying no below DAC. Check id -Z and ls -Z before you check ownership.
  2. Do not set seLinuxOptions unless you know the node's policy. The runtime default is almost always right.
  3. audit2allow -a generates rules for every denial it sees, including ones you did not want to allow. Read the TE file every time.

Day 32 of 35 — tomorrow we go one layer deeper, into the CRI itself, where the pod sandbox fails before your image even gets a chance to pull.
