10 Docker Production Mistakes That Are Costing Your Team Time and Money

By INI8 Labs · 2026-06-05 · 10 min read

Production doesn't fail loudly with Docker. It fails quietly. A container restarts. A pod flaps. CPU spikes just enough to make your API feel slow. Support tickets accumulate before alerts fire. And suddenly everyone's staring at their Docker setup like it betrayed them.

It didn't. Most Docker production failures are self-inflicted: the result of patterns that "worked fine in development" and then fell apart under real traffic, real attack surfaces, and real operational complexity. A 2024 CNCF survey found that 37% of container orchestration failures were traced to version drift alone.


Mistake 1: Using the latest Tag in Production

The latest tag is mutable. Today's nginx:latest is not tomorrow's. When you deploy with latest, you cannot reproduce your deployment, cannot reliably roll back, and have no guarantee that the image running in production is the same one that passed your tests.

The fix:

# Wrong
FROM python:latest

# Right — pin to specific version
FROM python:3.11.9-slim

# Most stable — pin with digest
FROM python:3.11.9-slim@sha256:abc123...

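Pin every image in every Dockerfile to a specific version. Semantic version tags are the minimum acceptable standard, but a registry can still re-publish a version tag (for example, after a base OS patch); only a digest pin guarantees that you pull exactly the same image every time.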
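
To make the audit concrete, here is a minimal sketch of a check you could run in CI. It assumes simple, single-line FROM instructions, and the function name find_unpinned_images is hypothetical, not part of any Docker tooling.

```python
import re

# A full digest pin looks like image@sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def find_unpinned_images(dockerfile_text):
    """Return (line_number, image, reason) for FROM lines without a fixed tag."""
    problems = []
    stage_names = set()
    for lineno, raw in enumerate(dockerfile_text.splitlines(), start=1):
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM builder" refers to an earlier stage, not a registry image.
        is_local = image == "scratch" or image in stage_names
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
        if is_local or DIGEST_RE.search(image):
            continue
        # The tag follows the ":" in the last path segment, so a registry
        # port such as localhost:5000/app is not mistaken for a tag.
        last_segment = image.split("@")[0].rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else None
        if tag is None:
            problems.append((lineno, image, "no tag (defaults to latest)"))
        elif tag == "latest":
            problems.append((lineno, image, "uses the mutable latest tag"))
    return problems
```

Calling find_unpinned_images(open("Dockerfile").read()) and failing the build when the list is non-empty is one way to wire it in. The check only flags missing or latest tags; it does not verify that a version tag is specific enough.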


Mistake 2: Running Containers as Root

By default, Docker containers run as root. This means a compromised container has root-level access to the container filesystem and potential escalation paths to the host.

The fix:

FROM node:18-slim
WORKDIR /app
COPY . .
RUN useradd -m -u 1001 appuser && chown -R appuser /app
USER appuser
CMD ["node", "server.js"]

For Kubernetes deployments, enforce this at the pod level with securityContext: runAsNonRoot: true in your pod spec.
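
A rough way to catch this before deployment is to look at the last USER instruction in the final build stage. The sketch below assumes that a stage without a USER instruction runs as root, which is true for most official base images but not all; the function names are hypothetical.

```python
def final_stage_user(dockerfile_text):
    """Return the last USER value in the final build stage, or None if unset."""
    user = None
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if not parts:
            continue
        keyword = parts[0].upper()  # Dockerfile instructions are case-insensitive
        if keyword == "FROM":
            user = None  # a new stage starts from its base image's user
        elif keyword == "USER" and len(parts) >= 2:
            user = parts[1]
    return user


def runs_as_root(dockerfile_text):
    """Guess whether the final image runs as root.

    Assumption: a missing USER instruction means root. A base image that
    already sets a non-root user would be reported incorrectly.
    """
    user = final_stage_user(dockerfile_text)
    if user is None:
        return True
    name = user.split(":", 1)[0]  # USER accepts user[:group]
    return name in ("root", "0")
```

This is a static guess, not proof. Enforcing runAsNonRoot in the pod spec remains the stronger control because it is checked at container start.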


Mistake 3: Not Setting Resource Limits

Containers without resource limits are a denial-of-service waiting to happen. A single misbehaving container can starve other containers on the same node.

The fix:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Set both requests (guaranteed allocation) and limits (maximum consumption) for every container.
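
The quantity strings are easy to get wrong, so a small validator can help during review. This is a sketch that handles only the common suffixes (Ki/Mi/Gi/Ti, k/M/G/T, and m for millicores); check_resources is a hypothetical helper that takes the resources mapping as a plain dict.

```python
from decimal import Decimal

_MEMORY_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,   # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,      # decimal suffixes
}


def parse_memory(quantity):
    """Convert a memory quantity such as "256Mi" to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # plain number of bytes


def parse_cpu(quantity):
    """Convert a CPU quantity such as "250m" or "0.5" to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def check_resources(resources):
    """Return a list of problems found in a container's resources mapping."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for key, parse in (("memory", parse_memory), ("cpu", parse_cpu)):
        if key not in requests:
            problems.append(f"missing requests.{key}")
        if key not in limits:
            problems.append(f"missing limits.{key}")
        if key in requests and key in limits:
            if parse(requests[key]) > parse(limits[key]):
                problems.append(f"requests.{key} exceeds limits.{key}")
    return problems
```

For the manifest above, check_resources returns an empty list: 256Mi is below 512Mi and 250m is below 500m.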


Mistake 4: Bloated Images That Slow Deployments and Increase Attack Surface

A Flask application that ends up as a 1GB+ image because it includes gcc, vim, curl, and the full build toolchain is a mistake that compounds. Large images slow down CI/CD pipelines and carry a dramatically larger attack surface.

The fix — multi-stage builds:

# Stage 1: Build
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

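# Stage 2: Runtime (no build tools)
FROM python:3.11-slim
WORKDIR /app
RUN useradd -m -u 1001 appuser
# Copy packages into the non-root user's home; /root is not readable by other users
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
ENV PATH=/home/appuser/.local/bin:$PATH
COPY . .
USER appuser
CMD ["python", "app.py"]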

Multi-stage builds typically reduce image size by 60–80%.


Mistake 5: Secrets in Environment Variables (or Worse, in Images)

Hardcoded secrets in Dockerfiles are baked into image layers and persist in image history. Environment variables are still visible in process listings and container inspect output.

The fix:

env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: password

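Use HashiCorp Vault, AWS Secrets Manager, or Kubernetes Secrets with encryption at rest enabled for all credentials. Note that secretKeyRef keeps the password out of the image and the manifest, but the container still receives it as an environment variable; mounting the secret as a file avoids that exposure.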
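
On the application side, reading a file-mounted secret takes only a few lines. The sketch below assumes secrets are mounted under /run/secrets (the Docker Swarm convention; in Kubernetes the mount path is whatever your volumeMount specifies), and read_secret is a hypothetical helper name.

```python
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets"):
    """Read a secret from a mounted file, e.g. /run/secrets/db_password.

    The default directory is an assumption; pass the path your
    orchestrator actually mounts.
    """
    path = Path(secrets_dir) / name
    try:
        # Secret files often end with a newline added by the tooling.
        return path.read_text(encoding="utf-8").rstrip("\n")
    except FileNotFoundError:
        raise RuntimeError(f"secret {name!r} not found in {secrets_dir}") from None
```

Failing loudly when the file is missing is deliberate: a service that silently starts with an empty password is harder to debug than one that refuses to start.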


Mistake 6: Ignoring .dockerignore

Every file in your build context gets sent to the Docker daemon during the build process. A missing .dockerignore means your .git directory, your local node_modules, and your .env files all get sent along, and a COPY . . will bake them into the image.

The fix — a minimal .dockerignore:

.git
.gitignore
.env
*.env
node_modules
__pycache__
*.pyc
.pytest_cache
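
If you want to enforce this across repositories, a literal comparison is usually enough to catch a missing or empty file. The sketch below does not evaluate glob patterns or negations, so treat it as a coarse check; the default list of required entries is an assumption you should adapt.

```python
def missing_ignore_entries(dockerignore_text,
                           required=(".git", ".env", "node_modules")):
    """Return the required entries that do not appear literally in the file.

    Glob patterns (for example *.env) and "!" exceptions are not evaluated.
    """
    entries = set()
    for raw in dockerignore_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and negation rules.
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        entries.add(line.rstrip("/"))
    return [name for name in required if name not in entries]
```

An empty result means every required name is listed; anything else is a candidate for a failing CI check.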

Mistake 7: No Health Checks

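A container that's running is not necessarily a container that's healthy. Without a health check, neither Docker nor Kubernetes can distinguish a container that's processing requests from one whose process is still alive but hung or deadlocked.

The fix:

# Dockerfile: used by Docker itself; Kubernetes ignores HEALTHCHECK and relies on probes
# (requires curl to be installed in the image)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Kubernetes container spec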
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
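
The probes only help if the application actually serves those paths. Here is a minimal standard-library sketch of /health and /ready endpoints; the ready flag and the idea of setting it after startup work completes are assumptions about how your service initialises, and a real service would normally expose these routes through its existing web framework.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this once startup work (hypothetical: DB connections, cache warm-up) is done.
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer requests.
            self._reply(200, {"status": "ok"})
        elif self.path == "/ready":
            # Readiness: only accept traffic after initialisation.
            if ready.is_set():
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "starting"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the application log


def make_server(port=8080):
    """Create the probe server; call serve_forever() on the result."""
    return ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
```

Keeping liveness and readiness separate matters: a failing liveness probe restarts the container, while a failing readiness probe only removes it from service endpoints until it recovers.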

Mistake 8: Inefficient Layer Ordering Destroying Build Cache

Docker builds layers from top to bottom and caches each layer. The most common mistake: copying your entire source code before installing dependencies.

Wrong:

COPY . .
RUN pip install -r requirements.txt

Right:

COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

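With this ordering, the pip install layer is rebuilt only when requirements.txt changes, not on every source edit. Optimised layer ordering reduces CI build times by 50–80% for most applications.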
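
The effect is easier to see with a toy model. The sketch below is a simplification of Docker's real cache logic: it assumes each layer's cache key depends on its parent's key, the instruction text, and a stand-in for the files the instruction reads. All names and the requirements contents are hypothetical.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per step; each key also covers every earlier step.

    `steps` is a list of (instruction, content) pairs, where content stands
    in for the files the instruction reads (empty for RUN).
    """
    keys = []
    parent = ""
    for instruction, content in steps:
        material = f"{parent}|{instruction}|{content}".encode("utf-8")
        parent = hashlib.sha256(material).hexdigest()
        keys.append(parent)
    return keys


def cached_steps(previous, current):
    """Count the leading steps of `current` that can reuse `previous`."""
    hits = 0
    for old, new in zip(layer_keys(previous), layer_keys(current)):
        if old != new:
            break  # everything after the first miss must be rebuilt
        hits += 1
    return hits


REQS = "flask==3.0.0"  # hypothetical requirements.txt contents


def wrong_order(source):
    return [("COPY . .", REQS + source),
            ("RUN pip install -r requirements.txt", "")]


def right_order(source):
    return [("COPY requirements.txt .", REQS),
            ("RUN pip install -r requirements.txt", ""),
            ("COPY . .", REQS + source)]


# A source-only change from "v1" to "v2":
print(cached_steps(wrong_order("v1"), wrong_order("v2")))  # 0
print(cached_steps(right_order("v1"), right_order("v2")))  # 2
```

In the wrong ordering a one-line source change misses the cache at the first step, so pip install runs again. In the right ordering the first two steps are reused and only the final COPY is rebuilt.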


Mistake 9: Never Updating Base Images

Container images are not static artifacts. The base image you pinned six months ago now has known CVEs. Container security is an ongoing practice.

The fix:

  • Schedule monthly (minimum) base image rebuilds
  • Integrate Trivy or Snyk into your CI pipeline to fail builds on critical CVEs
  • Use tools like Renovate Bot to automate dependency update pull requests
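
To back the monthly schedule with a check, you can compare an image's creation timestamp against a maximum age. The sketch below assumes you obtain the timestamp yourself (for example from the Created field of docker image inspect) and pass it in as an RFC 3339 string; rebuild_due and the 30-day default are illustrative choices.

```python
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})"
)


def parse_created(timestamp):
    """Parse a timestamp such as "2024-01-15T10:30:00.123456789Z".

    Fractional seconds are dropped; they do not matter at day granularity.
    """
    match = _TIMESTAMP_RE.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {timestamp!r}")
    seconds, offset = match.groups()
    if offset == "Z":
        offset = "+00:00"
    return datetime.fromisoformat(seconds + offset)


def rebuild_due(created, now=None, max_age_days=30):
    """Return True when the image is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    return now - parse_created(created) > timedelta(days=max_age_days)
```

Age is only a proxy. A fresh image can still carry critical CVEs, which is why the scanner gate in the list above is the primary control and this check is a backstop.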

Mistake 10: Treating Container Logs as Ephemeral

Container logs written to local files inside the container are lost when the container restarts. The logs from a crashed container — the logs you need most — are gone before you can investigate.

The fix: Write all application logs to stdout/stderr. Docker captures these automatically. In Kubernetes, use a log aggregation stack (Fluentd/Fluent Bit → Elasticsearch or CloudWatch) that ships logs off-node before the pod terminates.

import logging
import sys

logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

A Production-Ready Dockerfile: Combining All the Fixes

# Stage 1: Build
FROM python:3.11.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Stage 2: Runtime
FROM python:3.11.9-slim
WORKDIR /app
RUN useradd -m -u 1001 appuser
# The install location must be readable by appuser; /root is not
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
ENV PATH=/home/appuser/.local/bin:$PATH
COPY --chown=appuser:appuser . .
USER appuser
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1
EXPOSE 8080
CMD ["python", "app.py"]

Actionable Takeaways

  • Audit your current Dockerfiles for each of the 10 mistakes — most production Docker setups exhibit at least 4–5
  • Pin image tags immediately, starting with your highest-traffic services
  • Add a .dockerignore to every repository that uses Docker
  • Implement Trivy or Snyk as a blocking CI gate for critical CVEs in container images
  • Move all secrets to a vault — eliminate environment variable credential passing
  • Set resource limits on every container running in Kubernetes
  • Schedule monthly base image rebuild cycles and automate dependency update PRs

FAQ

Why is running containers as root a security risk? A container running as root that is compromised gives an attacker root-level access to the container filesystem and potential privilege escalation paths to the underlying host.

What is a Docker multi-stage build? A multi-stage build uses multiple FROM instructions in a single Dockerfile to separate build-time dependencies from runtime dependencies. The final image only contains what's needed to run the application, typically reducing image size by 60–80%.

What is .dockerignore and why does it matter? .dockerignore specifies files and directories to exclude from the Docker build context. Without it, your entire working directory — including .git, .env files, test data, and local caches — gets sent to the Docker daemon on every build.

How often should you rebuild Docker base images? At minimum monthly, and immediately when a critical CVE is published for your base image distribution.

Why does layer ordering matter in Dockerfiles? Docker caches each layer. If a layer changes, all subsequent layers are invalidated and must be rebuilt. Placing frequently-changing instructions after rarely-changing ones means dependency installation is cached across most builds.


INI8 Labs builds production-grade DevOps infrastructure including container security, Kubernetes platform engineering, and CI/CD pipeline design.