I Built a Tool That Tells You If Your Kubernetes Deploy Is About to Cause an Incident

Every engineer who has deployed to Kubernetes has had this moment: you’re staring at a YAML diff, trying to figure out if that change to spec.template.spec.containers[0].image is fine or if it’s going to take down production at 2 a.m.

A raw git diff doesn’t help. It shows you which lines of text changed, not what those changes mean.

So I built kdiff, a CLI that compares two Kubernetes manifests and tells you not just what changed, but also whether any of those changes are risky.

The problem with existing tools

kubectl diff compares your local manifest against what’s live in the cluster. That’s useful, but it requires cluster access, and it doesn’t flag risky patterns. It just shows you the diff and leaves the interpretation to you.

helm diff is similar. It’s great for seeing what a Helm upgrade will do, but its output is just raw, unanalyzed text.

What I wanted was something more like terraform plan. When Terraform shows you a plan, it doesn’t just say “these resources will change.” It also marks which changes are destructive, so the risk is clear before you apply anything.

Kubernetes doesn’t have a tool like that, so kdiff is my attempt to fill that gap.

What it does

You give it two YAML files, one before and one after, and it tells you:

  • Which resources were added, removed, or modified
  • For modified resources, exactly which fields changed and what the values were before and after
  • A risk analysis flagging anything operationally dangerous

Here’s what the output looks like on a real example:

[Animated demo: kdiff output on a real example (kdiff.gif)]


Four HIGH flags in one diff. This is a deploy that would have caused problems.

It also supports --output json for CI pipelines and can take a directory as input if you have multiple manifest files.

How it’s built

The tool is written in Go. Its architecture is a pipeline with five stages, each handling a single job and passing its output to the next.

```
Input Files
   ↓
Loader    — reads YAML, splits multi-document files on ---
   ↓
Parser    — turns raw YAML into typed Resource structs
   ↓
Differ    — compares before/after, produces a list of changes
   ↓
Risk      — inspects changes for dangerous patterns
   ↓
Renderer  — formats output for terminal or JSON
```

Each stage is in its own package under internal/. They only interact through shared types, which makes each one independently testable.
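For illustration, here’s roughly what the loader’s splitting step could look like. This is a simplified sketch, not kdiff’s actual loader: splitDocuments is a hypothetical name, and it only treats a line containing nothing but --- as a separator. A real YAML parser also handles cases this ignores, such as --- appearing inside a block scalar.

```go
package main

import "strings"

// splitDocuments splits a multi-document YAML stream on separator lines
// ("---" on its own line) and drops documents that are empty or whitespace.
// Simplified sketch: it does not understand block scalars or "--- value"
// forms, which a real YAML parser would handle.
func splitDocuments(data string) []string {
	var docs []string
	var current []string
	flush := func() {
		doc := strings.Join(current, "\n")
		if strings.TrimSpace(doc) != "" {
			docs = append(docs, doc)
		}
		current = current[:0]
	}
	for _, line := range strings.Split(data, "\n") {
		if strings.TrimRight(line, " \t\r") == "---" {
			flush()
			continue
		}
		current = append(current, line)
	}
	flush()
	return docs
}
```

Dropping empty documents matters in practice: files often start with --- or end with a trailing separator, and neither should produce a phantom resource.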

Let me walk through the parts that were actually interesting to build.

The diffing problem

The naive approach to diffing YAML is to serialize both documents back to strings and run a text diff. That tells you which lines changed, but not which field changed or what it means.

A better approach is to walk both documents as maps and record every path where values differ. That’s what kdiff does. A field change looks like this:

spec.template.spec.containers.api.image   api:v1.2.3 → api:latest

That path is meaningful. You know exactly where in the manifest this changed.
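A minimal sketch of that walk, assuming both documents are decoded into map[string]any. FieldChange and Path are hypothetical names here; the Path methods mirror the f.Path.Equals and f.Path.String calls in the rule code later in this post, but this is not kdiff’s actual code. Lists are compared as whole values in this sketch.

```go
package main

import (
	"reflect"
	"sort"
	"strings"
)

// Path is a field location such as spec.template.spec.containers.
type Path []string

// String renders the path in dotted form, e.g. "spec.replicas".
func (p Path) String() string { return strings.Join(p, ".") }

// Equals reports whether the path matches the given segments exactly.
func (p Path) Equals(segs ...string) bool {
	if len(p) != len(segs) {
		return false
	}
	for i := range p {
		if p[i] != segs[i] {
			return false
		}
	}
	return true
}

// FieldChange records one differing leaf. Before is nil when the field is
// absent (or explicitly null) on the before side; likewise for After.
type FieldChange struct {
	Path          Path
	Before, After any
}

// walk recursively compares two decoded YAML values and appends a
// FieldChange for every path where they differ.
func walk(path Path, before, after any, out *[]FieldChange) {
	bm, bIsMap := before.(map[string]any)
	am, aIsMap := after.(map[string]any)
	if bIsMap && aIsMap {
		// Visit the union of keys in sorted order for deterministic output.
		keys := make(map[string]struct{})
		for k := range bm {
			keys[k] = struct{}{}
		}
		for k := range am {
			keys[k] = struct{}{}
		}
		sorted := make([]string, 0, len(keys))
		for k := range keys {
			sorted = append(sorted, k)
		}
		sort.Strings(sorted)
		for _, k := range sorted {
			// Copy the path so sibling calls never share a backing array.
			child := append(append(Path{}, path...), k)
			walk(child, bm[k], am[k], out)
		}
		return
	}
	// Leaves and lists: reflect.DeepEqual is safe for slices, unlike ==.
	if !reflect.DeepEqual(before, after) {
		*out = append(*out, FieldChange{Path: path, Before: before, After: after})
	}
}
```

Copying the path before each recursive call is the subtle part: appending to a shared slice would let one branch overwrite another branch’s segments.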

The tricky part is handling Kubernetes lists. A Deployment’s containers field is a YAML array. If you compare arrays by index, you get containers[0].image changed. That’s somewhat useful, but not ideal: if you reorder the list or insert a container ahead of an existing one, every shifted element looks changed, and index-based diffing reports many false changes.

A better approach is to compare lists by the name field that most Kubernetes list items have. Containers, environment variables, and volume mounts all have names. So instead of containers[0], you get containers.api, which is keyed by the container’s actual name.

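```go
// indexByName keys list items by their "name" field. Items that are not
// maps, or that lack a non-empty string name, are skipped, so callers should
// only use it after checking that every item is named.
func indexByName(items []any) map[string]any {
	result := make(map[string]any, len(items))
	for _, item := range items {
		m, ok := item.(map[string]any)
		if !ok {
			continue
		}
		name, ok := m["name"].(string)
		if ok && name != "" {
			result[name] = m
		}
	}
	return result
}
```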

If every item in a list has a name field, compare by name. If the list is mixed or unnamed, treat the whole list as a single value and report one change at the list path. This avoids silent data loss and prevents errors on unexpected shapes.
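Here’s a sketch of that decision, reusing indexByName. The helpers allNamed, keyedLists, and changedNames are hypothetical, and one extra assumption is mine: names must also be unique, because indexByName would otherwise silently keep only the last item with a given name.

```go
package main

import (
	"reflect"
	"sort"
)

// indexByName keys list items by their "name" field (same as above).
func indexByName(items []any) map[string]any {
	result := make(map[string]any, len(items))
	for _, item := range items {
		m, ok := item.(map[string]any)
		if !ok {
			continue
		}
		if name, ok := m["name"].(string); ok && name != "" {
			result[name] = m
		}
	}
	return result
}

// allNamed reports whether every item is a map with a non-empty, unique
// "name" string, the precondition for comparing a list by name.
func allNamed(items []any) bool {
	seen := make(map[string]bool, len(items))
	for _, item := range items {
		m, ok := item.(map[string]any)
		if !ok {
			return false
		}
		name, ok := m["name"].(string)
		if !ok || name == "" || seen[name] {
			return false
		}
		seen[name] = true
	}
	return true
}

// keyedLists converts two lists into name-keyed maps when every item on
// both sides is named. ok=false tells the caller to fall back to comparing
// the whole list as one value at the list's own path.
func keyedLists(before, after []any) (b, a map[string]any, ok bool) {
	if !allNamed(before) || !allNamed(after) {
		return nil, nil, false
	}
	return indexByName(before), indexByName(after), true
}

// changedNames lists names whose items were added, removed, or modified.
func changedNames(b, a map[string]any) []string {
	var names []string
	for name, item := range b {
		if other, ok := a[name]; !ok || !reflect.DeepEqual(item, other) {
			names = append(names, name)
		}
	}
	for name := range a {
		if _, ok := b[name]; !ok {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}
```

With name keys, reordering containers produces no changes at all, while a real image bump shows up under containers.api rather than an index.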

The namespace problem

Kubernetes has a subtlety that catches people out: if you don’t specify metadata.namespace in a manifest, the resource lands in the default namespace when applied (unless your kubectl context points at a different one).

This means a manifest with namespace: default and one with no namespace at all describe the same resource. If you compare them without handling this, you get a false Removed and Added for the same resource, which is both incorrect and noisy.

kdiff normalizes this. When matching resources for comparison, it treats a missing namespace as default for namespaced resource kinds. Cluster-scoped resources like Node and ClusterRole don’t have namespaces at all, so kdiff doesn’t fill in default for them.
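A sketch of that matching key, under assumptions of my own: keyFor and ResourceKey are hypothetical names, the key uses only kind, namespace, and name, and clusterScoped is a deliberately incomplete list. A real tool would need a fuller list or discovery data from the API server.

```go
package main

// clusterScoped is an illustrative, incomplete set of cluster-scoped kinds.
var clusterScoped = map[string]bool{
	"Namespace":                true,
	"Node":                     true,
	"ClusterRole":              true,
	"ClusterRoleBinding":       true,
	"PersistentVolume":         true,
	"StorageClass":             true,
	"CustomResourceDefinition": true,
}

// ResourceKey identifies a resource when matching before/after manifests.
type ResourceKey struct {
	Kind, Namespace, Name string
}

// keyFor fills in "default" for namespaced kinds that omit
// metadata.namespace, and ignores the namespace for cluster-scoped kinds.
func keyFor(kind, namespace, name string) ResourceKey {
	switch {
	case clusterScoped[kind]:
		namespace = ""
	case namespace == "":
		namespace = "default"
	}
	return ResourceKey{Kind: kind, Namespace: namespace, Name: name}
}
```

Because ResourceKey is a comparable struct, it can be used directly as a map key to pair up resources from the two manifests.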

This also affects the body diff. If the before manifest omits metadata.namespace and the after manifest includes namespace: default, the raw YAML bodies differ, but semantically they’re identical. So before comparing the body, kdiff removes metadata.namespace from both sides. The field is already part of the resource’s identity, so including it in the body diff would produce false positives.
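The testing section below mentions a function called bodyForDiff. Here’s one way it could work; the implementation is my assumption, not kdiff’s code. It copies the top level and the metadata map so the caller’s decoded manifest isn’t mutated.

```go
package main

// bodyForDiff returns a copy of a decoded manifest with metadata.namespace
// removed, so an omitted namespace and an explicit "default" don't show up
// as a body change. Only the top level and metadata are copied; the input
// map is not modified.
func bodyForDiff(obj map[string]any) map[string]any {
	out := make(map[string]any, len(obj))
	for k, v := range obj {
		out[k] = v
	}
	if meta, ok := obj["metadata"].(map[string]any); ok {
		metaCopy := make(map[string]any, len(meta))
		for k, v := range meta {
			if k != "namespace" {
				metaCopy[k] = v
			}
		}
		out["metadata"] = metaCopy
	}
	return out
}
```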

It’s a small detail, but it matters if you want output that people can actually trust.

The risk model

The risk analyzer is the part that makes kdiff more than just a diff viewer.

Each risk rule is a function with the same signature:

```go
type rule func(diff.Change) []RiskFlag
```

Rules are collected in a slice and checked in order against every change. To add a new rule, you just write one function and add it to the slice. There are no switch statements to change and no existing logic to update.

```go
var rules = []rule{
	ruleResourceDeleted,
	ruleReplicaDecreased,
	ruleImageUnpinned,
	ruleImageChanged,
	ruleEnvVarRemoved,
	ruleResourceLimitsChanged,
	ruleProbeRemoved,
	ruleStrategyTypeChanged,
	ruleVolumeMountRemoved,
	ruleTerminationGracePeriodDecreased,
}
```
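To show how the slice gets applied, here’s a self-contained sketch. Change, RiskFlag, Severity, ruleResourceDeleted, and analyze are simplified stand-ins I’ve made up for illustration; kdiff’s real types carry more fields, and the severity shown for resource deletion is an assumption.

```go
package main

// Severity levels, simplified for this sketch.
type Severity int

const (
	Medium Severity = iota
	High
)

// Change and RiskFlag are minimal stand-ins for kdiff's real types.
type Change struct {
	Kind, Name string
	Deleted    bool
}

type RiskFlag struct {
	Severity Severity
	Rule     string
	Message  string
}

type rule func(Change) []RiskFlag

// ruleResourceDeleted is an illustrative rule: flag any deleted resource.
func ruleResourceDeleted(c Change) []RiskFlag {
	if !c.Deleted {
		return nil
	}
	return []RiskFlag{{
		Severity: High,
		Rule:     "resource-deleted",
		Message:  c.Kind + "/" + c.Name + " will be deleted",
	}}
}

var rules = []rule{ruleResourceDeleted}

// analyze runs every rule against every change, in slice order, and
// collects whatever flags they return.
func analyze(changes []Change) []RiskFlag {
	var flags []RiskFlag
	for _, c := range changes {
		for _, r := range rules {
			flags = append(flags, r(c)...)
		}
	}
	return flags
}
```

Rules that don’t apply return nil, and appending a nil slice is a no-op, so analyze needs no special cases.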

Each rule is independently testable. Here’s the replica rule:

```go
func ruleReplicaDecreased(c diff.Change) []RiskFlag {
	if c.Type != diff.Modified {
		return nil
	}
	for _, f := range c.Fields {
		if !f.Path.Equals("spec", "replicas") {
			continue
		}
		before, bOK := toInt64(f.Before)
		after, aOK := toInt64(f.After)
		if bOK && aOK && after < before {
			return []RiskFlag{{
				Severity: High,
				Rule:     "replica-decreased",
				Message:  fmt.Sprintf("replicas decreased from %d to %d", before, after),
				Path:     f.Path.String(),
			}}
		}
	}
	return nil
}
```

Each rule has a single concern. The replica rule only triggers when replicas go down; increasing replicas is fine. Replicas on a resource that’s being deleted are the resource-deleted rule’s job, which is why this rule only looks at modified resources.

The image rules are worth calling out specifically because there are two of them that often fire together:

  • image-unpinned triggers when the new image uses :latest or has no tag at all. It does not trigger for digest-pinned images (image@sha256:...) because a digest is content-addressed and is immutable by definition.
  • image-changed triggers when the image changes to any new value, with MEDIUM severity. It’s not dangerous by itself, but it’s one of the most common sources of production incidents during deploys. You want to be aware of it.

A single image change from api:v1.2.3 to api:latest fires both rules. You see that the image changed, and you see that the new reference is not reproducible.
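A sketch of both checks, with hypothetical names (isUnpinned, imageFlags) and my own parsing choices rather than kdiff’s: any reference containing @ counts as digest-pinned, and a tag is only looked for after the last /, so a registry port like localhost:5000/api isn’t mistaken for a tag.

```go
package main

import "strings"

// isUnpinned reports whether an image reference has no tag or uses :latest.
// Digest references (name@sha256:...) count as pinned.
func isUnpinned(image string) bool {
	if strings.Contains(image, "@") {
		return false // content-addressed digest
	}
	// Only look for ":" after the last "/", so a registry port such as
	// localhost:5000/api is not mistaken for a tag.
	lastSegment := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(lastSegment, ":")
	if i < 0 {
		return true // no tag at all
	}
	return lastSegment[i+1:] == "latest"
}

// imageFlags returns the rule names that would fire for one image field,
// in the same order as the rules slice.
func imageFlags(before, after string) []string {
	if before == after {
		return nil // no image change, nothing for these rules to inspect
	}
	var fired []string
	if isUnpinned(after) {
		fired = append(fired, "image-unpinned")
	}
	return append(fired, "image-changed")
}
```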

Handling malformed input

Real YAML is messy. The tool needs to handle it without panicking.

Two cases worth mentioning:

Non-string image fields can be tricky. For example, image: [list] is valid YAML. Comparing two interface values that both hold []any with == panics at runtime in Go. The image rules check that both before and after values are strings before comparing them. If they’re not, the rule simply skips them.
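Here’s a small demonstration of why that check is needed, plus the guard itself. equalPanics and imageStrings are hypothetical helpers for illustration; the sink variable just makes sure the comparison is actually evaluated.

```go
package main

// sink keeps the comparison result observable so it is actually evaluated.
var sink bool

// equalPanics reports whether comparing x == y panics at runtime. Interface
// comparison panics when both dynamic types match and are not comparable,
// as with two []any values decoded from YAML lists.
func equalPanics(x, y any) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sink = x == y
	return false
}

// imageStrings returns both image values only if both are strings; ok is
// false otherwise, and the caller skips the rule instead of comparing.
func imageStrings(before, after any) (string, string, bool) {
	b, bOK := before.(string)
	a, aOK := after.(string)
	return b, a, bOK && aOK
}
```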

Non-integer replica counts are another case. For example, replicas: "three" is parsed as a string. The replica rule uses a toInt64 helper that handles int, int64, and float64, the three types the YAML parser can produce for a numeric field, and returns a boolean reporting whether the conversion worked. If either side isn’t numeric, the rule doesn’t trigger.

```go
func toInt64(v any) (int64, bool) {
	switch n := v.(type) {
	case int:
		return int64(n), true
	case int64:
		return n, true
	case float64:
		return int64(n), true
	}
	return 0, false
}
```

Small helpers like this are what make the difference between a tool that only works on your test data and one that works in the real world.
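One caveat with the float64 case: int64(n) truncates, so replicas: 2.5 would quietly be read as 2. If you’d rather treat non-integral or out-of-range values as non-numeric, a stricter variant is a small change. This is a suggestion on my part, not how kdiff currently behaves, and toInt64Strict is a hypothetical name.

```go
package main

import "math"

// toInt64Strict converts the numeric types a YAML decoder typically yields,
// but rejects floats that are not whole numbers, NaN, or outside int64 range.
func toInt64Strict(v any) (int64, bool) {
	switch n := v.(type) {
	case int:
		return int64(n), true
	case int64:
		return n, true
	case float64:
		// NaN fails n != math.Trunc(n); infinities fail the range checks.
		if n != math.Trunc(n) || n < math.MinInt64 || n >= math.MaxInt64 {
			return 0, false
		}
		return int64(n), true
	}
	return 0, false
}
```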

Testing

The test suite has around 90 tests. Most of them test failure modes, not just happy paths.

A few I’m particularly glad I wrote:

TestCheckDuplicates_OmittedNamespaceMatchesExplicitDefault checks that a file containing two resources that are actually the same (one with namespace: default, one without) is caught as a duplicate instead of silently overwriting one with the other.

TestDiffCmd_NonStringImageField_DoesNotCrash sends a non-string image value through the full pipeline and checks that the tool produces valid JSON instead of panicking.

TestCompare_OmittedNamespaceDoesNotProduceModified caught an actual bug. The resource identity matching was correct, but the body diff was still producing a false Modified because metadata.namespace was present in one body and missing in the other. This test is the reason bodyForDiff exists.

Tests that catch real bugs during development are more valuable than tests written after the fact.

Install and try it

```
go install github.com/am-miracle/kdiff@latest
kdiff diff before.yaml after.yaml
kdiff diff before/ after/
kdiff diff before.yaml after.yaml --output json
```

The repo is at github.com/am-miracle/kdiff. The README documents every risk rule, what triggers it, and why it’s flagged.

What’s next

There are a few things on the list for future versions:

  • Recursive directory traversal (currently, it only reads the top level of a directory)
  • Helm and Kustomize rendering support: render first, then pipe to kdiff
  • Custom risk rule definitions via a config file
  • A GitHub Action

For now, the v1 scope is intentional. A focused tool that does one thing well is more useful than a sprawling tool that does many things halfway.

If you deploy to Kubernetes and you’ve ever stared at a YAML diff trying to figure out if it’s safe, give it a try. I’d be curious what risk patterns you think are missing.