What is Secure by Design in software development?

Secure by Design is an approach where security is built into the architecture and development process from the start – rather than added as a layer after the fact. It covers practices like least-privilege access, secure defaults, fail-secure error handling, and continuous automated security testing.

How do I apply the principle of least privilege to service accounts?

Scope each service account to a single job. Replace wildcard IAM policies with explicit action verbs, lock resource ARNs to the specific services that account needs to reach, and replace long-lived static credentials with short-lived tokens issued on demand. Most cloud providers offer access analysis tooling that shows which permissions an account is actually using versus what it’s been granted.

What does fail-secure error handling mean in practice?

A fail-secure system defaults to denying access when something goes wrong – rather than allowing it. In code, this means every authorization check must return an explicit “yes” to grant access; a timeout, exception, or missing response is treated as a “no.” Error responses to clients should be generic; sensitive details like stack traces go to the internal logging pipeline only.

How do I add automated security scanning to a CI/CD pipeline?

Start with one scanner rather than several at once. Dependency scanning – using tools like Grype, Trivy, or Dependabot – is the lowest-friction starting point because CVE findings are verifiable facts, not heuristic guesses. Set a severity threshold your team actually enforces (fail on Critical and High, surface Medium as warnings), and assign a clear owner for triaging results.

What are secure defaults, and why do they matter?

Frameworks and platforms optimize for getting started quickly, which means they ship with permissive settings. Secure defaults means hardening the configuration your team starts from – strict CORS policies, secure cookie flags, non-root container users, read-only filesystems – so every project inherits a secure baseline without anyone having to remember to apply it manually.

June 09, 2026

Last updated: June 15, 2026

Author: Max Spanier

Secure by Design

8 min read

Five Secure by Design Practices You Can Implement Today

Most developers don't write insecure code on purpose – the guardrails just weren't there. This post covers five Secure by Design practices you can apply in your current sprint. No reorganization, no security team initiative required.


Most developers don’t write insecure code on purpose. They write it because the defaults weren’t secure, the error handling was an afterthought, or the service account inherited permissions from a template nobody revisited. In most cases we’ve seen, the issue wasn’t that someone was careless. The guardrails just weren’t there.

This post covers five practices you can apply in your current sprint. No reorganization required, no dependency on a security team initiative. Each one targets a recurring source of vulnerabilities that we see across projects.

Practice 1: Scope Your Service Accounts Down to the Minimum

Service accounts tend to start with broad permissions and never lose them. Someone sets up a deployment pipeline, copies an IAM policy from a working example, and moves on. Six months later, that account still has write access to production databases it never touches. Default policies in cloud platforms are often broader than any single workload needs, and accounts shared across teams make it impossible to trace who did what.

One account, one job. If a service account serves multiple pipelines, split it. When one gets compromised, the blast radius stays small.
Replace static credentials with short-lived tokens. Long-lived API keys stored in environment variables are easy to set up and easy to forget. Switch to workload identity federation or equivalent mechanisms that issue temporary credentials on demand.
Use just-in-time elevation instead of permanent admin roles. If a CI/CD step needs elevated privileges for a specific task, grant them for that task and revoke them automatically afterward.
Audit what’s actually being used. Most cloud providers offer access analysis tooling that shows which permissions an account exercises versus what it’s been granted. The gap is your attack surface.
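The audit in the last point comes down to a set difference: granted permissions minus the ones actually exercised. Below is a minimal JavaScript sketch. It assumes you have exported the granted action patterns and the actions observed for the account, for example from your provider's access analysis tooling. The function names and input shapes are hypothetical, and the matcher only models IAM's * and ? wildcards in action names.

```javascript
// Hypothetical audit helpers: compare granted IAM action patterns with observed usage.

// Translate an IAM action pattern into a RegExp: '*' matches any run of characters, '?' exactly one.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
  const translated = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${translated}$`, 'i'); // IAM action names are case-insensitive
}

// Grants that no observed action ever matched: pure attack surface.
function unusedGrants(granted, used) {
  return granted.filter((pattern) => {
    const re = patternToRegExp(pattern);
    return !used.some((action) => re.test(action));
  });
}

// For each wildcard grant, the concrete actions actually observed under it:
// a starting point for replacing "lambda:*" with explicit verbs.
function explicitReplacements(granted, used) {
  const replacements = {};
  for (const pattern of granted.filter((p) => /[*?]/.test(p))) {
    const re = patternToRegExp(pattern);
    replacements[pattern] = used.filter((action) => re.test(action)).sort();
  }
  return replacements;
}

// Example: a wildcard policy versus what a deployment pipeline actually called
const grantedActions = ['s3:*', 'lambda:*', 'iam:*'];
const observedActions = ['lambda:UpdateFunctionCode', 'lambda:GetFunction', 's3:GetObject'];

console.log(unusedGrants(grantedActions, observedActions)); // [ 'iam:*' ]
console.log(explicitReplacements(grantedActions, observedActions));
```

In this example, iam:* was never exercised and can go entirely. The second call shows that lambda:* can shrink to lambda:GetFunction and lambda:UpdateFunctionCode, and s3:* to s3:GetObject.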

The IAM policy is where the copy-paste problem shows up most clearly. Here’s the policy for the same deployment service account written two ways: the version that usually ships, and the version that should. Both cover the same task of updating a specific Lambda function family and reading deployment artifacts from S3. The difference is how much trust each one extends to get there.

Before: Open Version

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "lambda:*",
        "iam:*"
      ],
      "Resource": "*"
    }
  ]
}

After: Scoped Properly

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DeployPaymentServiceLambdas",
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:GetFunction",
        "lambda:PublishVersion"
      ],
      "Resource": "arn:aws:lambda:eu-central-1:123456789012:function:payment-service-*"
    },
    {
      "Sid": "ReadDeploymentArtifacts",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::deploy-artifacts-prod/payment-service/*",
      "Condition": {
        "StringEquals": {
          "aws:SourceVpc": "vpc-0abc1234def567890"
        }
      }
    }
  ]
}

The scoped version replaces wildcard actions with explicit verbs, such as lambda:UpdateFunctionCode and s3:GetObject. The account can do only what the pipeline requires, and the iam:* grant disappears entirely. Resource ARNs are locked to a specific service family rather than applying across the account. A condition block on the S3 statement adds a second constraint: the request must arrive through a VPC endpoint in a known VPC, so a stolen credential can’t be used to pull artifacts from outside that boundary. Statements are split and labeled with Sid identifiers. That makes the policy easier to review and lets an engineer map each action recorded in CloudTrail back to the deployment step it belongs to.
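A check like this can also run before merge, so a wildcard policy never reaches the account in the first place. Below is a hypothetical JavaScript helper, a minimal sketch that only catches the obvious patterns from the “Before” policy: a bare "*" action, service-wide actions like lambda:*, and Resource "*".

```javascript
// Hypothetical pre-merge lint: flag obviously over-broad Allow statements in an IAM policy.
// Catches "*" and service-wide actions such as "lambda:*", plus Resource "*".
// Scoped wildcards like "function:payment-service-*" are deliberately not flagged.
function findBroadGrants(policy) {
  const statements = [].concat(policy.Statement || []); // Statement may be an object or an array
  const findings = [];
  statements.forEach((stmt, index) => {
    if (stmt.Effect !== 'Allow') return; // only Allow statements widen access
    const label = stmt.Sid || `Statement[${index}]`;
    for (const action of [].concat(stmt.Action || [])) {
      if (action === '*' || /^[a-z0-9-]+:\*$/i.test(action)) {
        findings.push(`${label}: wildcard action "${action}"`);
      }
    }
    if ([].concat(stmt.Resource || []).includes('*')) {
      findings.push(`${label}: Resource "*"`);
    }
  });
  return findings;
}

// The "Before" policy, as a JavaScript object
const openPolicy = {
  Version: '2012-10-17',
  Statement: [{ Effect: 'Allow', Action: ['s3:*', 'lambda:*', 'iam:*'], Resource: '*' }],
};

findBroadGrants(openPolicy).forEach((finding) => console.log(finding));
// Statement[0]: wildcard action "s3:*"
// Statement[0]: wildcard action "lambda:*"
// Statement[0]: wildcard action "iam:*"
// Statement[0]: Resource "*"
```

Run against the “After” policy, it returns an empty list. In CI, a non-empty result would set a non-zero exit code. This sketch doesn't replace your cloud provider's own policy validation, which understands far more of the policy language.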

Practice 2: Make Your Error Handling Fail-Secure, Not Fail-Open

A service goes down, and the load balancer routes traffic to a fallback that skips authentication. An unhandled exception dumps a full stack trace into the API response, complete with database connection strings. Firewalls default to “allow” when the rule engine crashes; auth services return “authorized” when they can’t reach the identity provider. The common thread: the system hit an unexpected state, and nobody had designed the failure path. Each of these turns a temporary outage into a security incident.

Default to deny, not allow. Every access decision should require an explicit “yes” from your auth logic. If the check fails, times out, or throws an exception, the answer is “no.”
Catch exceptions before they leak information. Wrap external-facing error responses in generic messages and route the details to your logging pipeline. Stack traces, internal paths, and tokens in API responses or error messages are free reconnaissance for attackers.
Lock down your fallback configurations. If your application can’t read its config, it should start in a minimal, locked state rather than falling back to permissive defaults. Missing values should disable functionality, not enable it.
Test your failure paths. Kill a dependency in staging and watch how your service reacts. If you’ve never done that, the answer is probably “not well.”
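The “default to deny” rule is easiest to enforce in one small wrapper around every authorization call. Below is a minimal JavaScript sketch. It assumes a hypothetical async checkPolicy function that asks your auth service or policy engine for a decision; the 200 ms default timeout is an arbitrary placeholder. The second half doubles as a cheap failure-path drill. It swaps in dependencies that crash, hang, or return something other than an explicit true, and each one must come back as a denial.

```javascript
// Minimal fail-secure authorization wrapper.
// checkPolicy is a hypothetical function that asks your auth service or policy engine for a decision.
async function isAllowed(checkPolicy, request, timeoutMs = 200) {
  let timer;
  // A dependency that never answers is treated as a "no"
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    const decision = await Promise.race([checkPolicy(request), timedOut]);
    // Only an explicit boolean true grants access; "yes", 1, or {} do not
    return decision === true;
  } catch (err) {
    // Exceptions deny access; details stay in internal logs
    console.error(`authorization check failed: ${err && err.message}`);
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Failure-path drill: simulate the dependency misbehaving, as you would by killing it in staging
const dependencies = {
  healthy: async () => true,
  crashing: async () => { throw new Error('ECONNREFUSED'); },
  hanging: () => new Promise(() => {}), // never resolves
  sloppy: async () => 'yes',            // truthy, but not an explicit true
};

(async () => {
  for (const [name, checkPolicy] of Object.entries(dependencies)) {
    const allowed = await isAllowed(checkPolicy, { user: 'u1', action: 'read' }, 50);
    console.log(`${name}: ${allowed ? 'allow' : 'deny'}`);
  }
})();
```

Only the healthy dependency yields “allow”. The crash, the hang, and the truthy-but-not-true answer all resolve to “deny”, which is the behavior you want to confirm before a real outage tests it for you.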

Here’s what fail-secure error handling looks like in an Express.js application: first a common handler that leaks internals, then a version designed for production.

Before: Default that leaks internals

app.use((err, req, res, next) => { 
  res.status(500).send(err.stack); 
});

After: Generic response to client, full details in logs

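const logger = require('./logger');          // your structured logger (assumed pino-style: logger.error(obj, msg))
const { randomUUID } = require('crypto');

// Express treats a four-argument middleware as an error handler
app.use((err, req, res, next) => {
  // If the response is already streaming, let Express's default handler close the connection
  if (res.headersSent) return next(err);

  const correlationId = req.correlationId || randomUUID();
  // Trust only 4xx/5xx codes from the error object; anything else becomes a 500
  const code = err.statusCode || err.status;
  const status = Number.isInteger(code) && code >= 400 && code < 600 ? code : 500;

  // Everything sensitive goes to the log pipeline, never to the client
  logger.error({
    correlationId,
    method: req.method,
    path: req.path,
    userId: req.user?.id,
    ip: req.ip,                               // respects Express's "trust proxy" setting
    err: { name: err.name, message: err.message, stack: err.stack }
  }, 'Unhandled exception');

  // 5xx: hide details. 4xx: only expose a message the error marked safe (http-errors sets expose)
  const message = status >= 500
    ? 'Internal Server Error'
    : (err.expose === true && err.message) || 'Bad Request';

  res.status(status).json({ error: message, correlationId });
});

The revised handler never sends sensitive details to the client. Stack traces, internal paths, and request metadata go to the structured logging pipeline instead, where your engineers can find them and attackers can’t. The client receives a generic message and a correlation ID: enough for a support conversation, useless for reconnaissance. The 5xx path returns only ‘Internal Server Error’. 4xx responses expose a message only when the error explicitly marks it as safe through its expose flag (the convention used by the http-errors package), and fall back to ‘Bad Request’ otherwise. Status codes outside the 4xx/5xx range are coerced to 500, so a malformed error object can’t turn a failure into a success response. If no upstream middleware has already set req.correlationId, the ID comes from crypto.randomUUID(), so it carries no information about internal state.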

Practice 3: Validate Every Request, Not Just the First One

A user logs in, gets a session token, and from that point on, every API call sails through because the system checked once and decided to trust. Thirty minutes later, the user’s role has changed, but the token is still valid and the requests keep going through with outdated permissions. This is one of the quieter vulnerabilities because it doesn’t look like a bug. The auth system works. The problem is that authorization was treated as a one-time event. Cached permissions that never refresh, long-lived tokens, internal endpoints that skip authorization because “only our services call them”: in a microservices architecture, these gaps compound fast.

Route authorization through a single enforcement point. One place that makes access decisions, applied consistently across all services. Scattering auth logic across individual endpoints is how checks get forgotten.
Keep tokens and sessions short-lived. Issue tokens with short expiry windows and force revalidation at meaningful intervals. If a user’s permissions change mid-session, the next request should reflect that.
Treat internal APIs the same as external ones. “This endpoint is only called by our own services” is not a security boundary. Service-to-service calls need auth checks too.
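The three points above can be combined in one place. Below is a minimal JavaScript sketch using only Node’s crypto module. It has a single authorize function as the enforcement point, tokens that expire after five minutes, and a fresh role lookup on every request, so a permission change applies to the very next call. The token format, key handling, role store, and permission names are all hypothetical. A hand-rolled HMAC token is for illustration only; a real system would use an established token standard and a vetted library.

```javascript
const crypto = require('crypto');

// Hypothetical values for illustration: load the key from a secret manager in practice
const SIGNING_KEY = crypto.randomBytes(32);
const TOKEN_TTL_SECONDS = 300; // short-lived: five minutes

// Hypothetical stand-ins for your user store and role model
const currentRoles = new Map([['alice', 'editor']]);
const ROLE_PERMISSIONS = { editor: ['article:read', 'article:write'], viewer: ['article:read'] };

const nowSeconds = () => Math.floor(Date.now() / 1000);
const sign = (data) => crypto.createHmac('sha256', SIGNING_KEY).update(data).digest('base64url');

function issueToken(userId, now = nowSeconds()) {
  const claims = { sub: userId, exp: now + TOKEN_TTL_SECONDS };
  const payload = Buffer.from(JSON.stringify(claims)).toString('base64url');
  return `${payload}.${sign(payload)}`;
}

function verifyToken(token, now = nowSeconds()) {
  const parts = String(token).split('.');
  if (parts.length !== 2) return null;
  const [payload, signature] = parts;
  const expected = Buffer.from(sign(payload));
  const actual = Buffer.from(signature);
  // Constant-time comparison; timingSafeEqual requires equal lengths
  if (expected.length !== actual.length || !crypto.timingSafeEqual(expected, actual)) return null;
  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  return claims.exp > now ? claims : null; // expired tokens are rejected
}

// The single enforcement point: every service calls this on every request
function authorize(token, permission, now = nowSeconds()) {
  const claims = verifyToken(token, now);
  if (!claims) return false;                       // invalid or expired: deny
  const role = currentRoles.get(claims.sub);       // fresh lookup, not the role cached at login
  const granted = ROLE_PERMISSIONS[role] || [];    // unknown user or role: no permissions
  return granted.includes(permission);
}

const token = issueToken('alice');
console.log(authorize(token, 'article:write')); // true
currentRoles.set('alice', 'viewer');            // role downgraded mid-session
console.log(authorize(token, 'article:write')); // false: same token, next request
console.log(authorize(token, 'article:read'));  // true
```

The token itself never changes. The downgrade takes effect because authorization reads the current role on each request, not because the token was revoked. The short expiry bounds how long a leaked token stays usable.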

Practice 4: Set Secure Defaults – Don’t Rely on Configuration

A developer scaffolds a new service, pushes it to staging, and it works. Nobody questions the defaults. Months later, a security review reveals CORS set to allow all origins, session cookies without the Secure flag, and the admin panel reachable without authentication. Frameworks and cloud platforms optimize for getting started quickly, which means shipping with permissive settings. The assumption is that teams will harden things before production. In practice, the config that worked in development is the config that ships.

Harden your project templates. If your team uses starter repos, Helm charts, or Terraform modules, lock them down: strict CORS, secure cookie flags (e.g., Secure, HttpOnly, SameSite), TLS required, admin interfaces disabled. Every project that inherits those templates starts secure without anyone having to remember.
Make risky features opt-in, not opt-out. Debug modes, verbose logging, permissive firewall rules: these should require a deliberate change to enable. If turning something dangerous on is as easy as not turning it off, someone will ship it by accident.
Fail on missing configuration. If an environment variable is missing, the application should refuse to start rather than silently applying a permissive fallback. A loud startup failure beats a quietly running service with open permissions.
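The “fail on missing configuration” and “opt-in, not opt-out” rules fit in a few lines of startup code. Below is a minimal JavaScript sketch. The environment variable names and the DEBUG_MODE flag are hypothetical, and the cookie options follow the shape that Express’s res.cookie accepts.

```javascript
// Minimal startup config loader: required values must exist, risky features are opt-in.
// The variable names (DATABASE_URL, SESSION_SECRET, ALLOWED_ORIGINS, DEBUG_MODE) are hypothetical.
const REQUIRED = ['DATABASE_URL', 'SESSION_SECRET', 'ALLOWED_ORIGINS'];

function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((name) => typeof env[name] !== 'string' || env[name].trim() === '');
  if (missing.length > 0) {
    // Refuse to start rather than fall back to a permissive default
    throw new Error(`Missing required configuration: ${missing.join(', ')}`);
  }

  // Strict CORS: an explicit allowlist, never "*"
  const allowedOrigins = env.ALLOWED_ORIGINS.split(',').map((o) => o.trim()).filter(Boolean);
  if (allowedOrigins.length === 0 || allowedOrigins.includes('*')) {
    throw new Error('ALLOWED_ORIGINS must list explicit origins, not "*"');
  }

  return Object.freeze({
    databaseUrl: env.DATABASE_URL,
    sessionSecret: env.SESSION_SECRET,
    allowedOrigins,
    // Opt-in: only the exact string "true" enables debug mode; blanks and typos leave it off
    debug: env.DEBUG_MODE === 'true',
    // Cookie flags are fixed secure defaults (shaped like Express res.cookie options), not settings
    cookie: Object.freeze({ secure: true, httpOnly: true, sameSite: 'strict' }),
  });
}

// Startup with an incomplete environment: fail loudly instead of running half-configured
try {
  loadConfig({ DATABASE_URL: 'postgres://db.internal/app', SESSION_SECRET: 'example-only' });
} catch (err) {
  console.error(err.message); // Missing required configuration: ALLOWED_ORIGINS
  // In a real entry point you would stop here, e.g. process.exit(1)
}
```

There is no code path that turns a missing value into something permissive. A blank ALLOWED_ORIGINS stops startup instead of becoming “allow all”, and a misspelled DEBUG_MODE leaves debugging off.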

A Docker Compose file is a practical place to see this gap. Both configurations run the same service; what changes is how much of the host the container can reach.

Before: What frameworks generate by default

services: 
  api: 
    image: my-api:latest 
    ports: 
      - "3000:3000" 
    environment: 
      - NODE_ENV=production

After: Hardened Version

services: 
  api: 
    image: my-api:1.4.2@sha256:abc123def456...   # pinned digest, not :latest 
    ports: 
      - "127.0.0.1:3000:3000"                    # bind to loopback, expose via reverse proxy 
    environment: 
      - NODE_ENV=production 
    user: "10001:10001"                          # non-root UID/GID 
    read_only: true                              # immutable root filesystem 
    tmpfs: 
      - /tmp:size=64M,mode=1777                  # writable scratch space only where needed 
    cap_drop: 
      - ALL                                      # drop all Linux capabilities, add back only what's needed 
    networks: 
      - backend 
    deploy: 
      resources: 
        limits: 
          memory: 512M 
          cpus: '0.5' 
    healthcheck: 
      test: ["CMD", "node", "healthcheck.js"] 
      interval: 30s 
      timeout: 5s 
      retries: 3 
    restart: unless-stopped 

networks: 
  backend: 
    driver: bridge 
    internal: true                               # no direct egress; outbound via explicit proxy

Three settings in the hardened version are likely to break applications on first run. read_only: true makes every write outside a tmpfs mount fail, so check for log directories, temp files, and PID files before enabling it. cap_drop: ALL removes every Linux capability, which breaks applications that depend on one (for example, binding to port 80 via CAP_NET_BIND_SERVICE); add back only the specific capability you need with cap_add. internal: true cuts all direct outbound traffic from the container. If the service makes legitimate calls to external APIs, route them through an explicit proxy, or remove the setting if that isn’t an option.

Beyond those caveats: :latest is replaced with a pinned version and SHA256 digest, which prevents silent upstream image swaps from introducing changes you didn’t review. The port binding is restricted to 127.0.0.1, so external traffic has to come through a reverse proxy rather than directly to the container. The container runs as a non-root user – UID/GID 10001 – which limits what a compromised process can do on the host. Memory and CPU limits are set explicitly, which bounds the blast radius of a resource exhaustion attack or a crypto-mining payload that makes it through.

Practice 5: Automate One Security Check in Your Pipeline Today

You already run linters and tests before merging code. Adding a security check follows the same principle: catch problems automatically, before they reach production. Yet in many teams, security validation still happens manually, if at all, and often only in the weeks before a release. The other failure mode is overcorrection: three scanners at once, a flood of false positives, and within a month the team clicks “ignore” on everything. Start with one check. Get the noise level under control, then expand.

Dependency scanning is the lowest-friction starting point. Tools like Dependabot or OWASP Dependency-Check flag known vulnerabilities with minimal configuration. A CVE in a dependency is a verifiable fact, not a heuristic guess, so the signal-to-noise ratio is better than that of most SAST tools.
Add a secret scanner alongside your dependency checks. Tools like gitleaks or TruffleHog scan commits for API keys, tokens, and hardcoded secrets before they reach a shared branch.
Set a severity threshold your team actually respects. Break the build on critical and high, surface medium as warnings, suppress low. Adjust based on what your team acts on. If everything blocks, developers will route around it.
Assign ownership for findings, not just the tool. A scanner without someone responsible for triaging its output is a scanner that gets ignored. Decide upfront who reviews, who decides fix-or-suppress, and how quickly.
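The threshold itself is a small mapping you can keep under version control next to your pipeline config and adjust as you learn what your team acts on. Below is a minimal JavaScript sketch. It assumes findings have already been normalized into a hypothetical { id, severity } shape; the severity labels mirror common scanner output, and the finding IDs are placeholders.

```javascript
// Minimal severity policy: fail on Critical/High, warn on Medium, suppress Low.
// The finding shape { id, severity } is hypothetical; normalize your scanner's output into it first.
const SEVERITY_POLICY = new Map([
  ['critical', 'fail'],
  ['high', 'fail'],
  ['medium', 'warn'],
  ['low', 'suppress'],
  ['negligible', 'suppress'],
]);

function triage(findings) {
  const result = { fail: [], warn: [], suppress: [] };
  for (const finding of findings) {
    // Unknown or missing severities are surfaced as warnings, never silently dropped
    const action = SEVERITY_POLICY.get(String(finding.severity).toLowerCase()) ?? 'warn';
    result[action].push(finding.id);
  }
  return result;
}

// Placeholder findings for illustration
const scanResults = [
  { id: 'EXAMPLE-1', severity: 'Critical' },
  { id: 'EXAMPLE-2', severity: 'Medium' },
  { id: 'EXAMPLE-3', severity: 'Low' },
  { id: 'EXAMPLE-4', severity: 'Unknown' },
];

const { fail, warn } = triage(scanResults);
warn.forEach((id) => console.log(`warning: ${id}`)); // EXAMPLE-2, EXAMPLE-4
if (fail.length > 0) {
  console.error(`blocking findings: ${fail.join(', ')}`); // EXAMPLE-1
  // In CI, set a non-zero exit code here, e.g. process.exitCode = 1
}
```

Treating unknown severities as warnings is a design choice. It keeps unclassified findings visible without blocking the build; make them blocking if your team would rather triage them before merge.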

Here’s a minimal GitHub Actions workflow using Grype – one scanner, one severity threshold, and the results in a place where someone is actually accountable for them.

GitHub Actions: Grype

name: security-scan

on:
  pull_request:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'        # weekly, Mondays 06:00 UTC - catches CVEs disclosed after last commit
  workflow_dispatch:             # allow manual triggering

jobs:
  dependency-scan:
    name: Dependency vulnerability scan
    runs-on: ubuntu-latest
    timeout-minutes: 10          # prevent hung jobs from blocking the queue
    permissions:
      contents: read
      security-events: write     # required to upload SARIF
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Scan repository with Grype
        id: scan
        uses: anchore/scan-action@v4   # pinned major version; pin a full commit SHA for stricter supply-chain control
        with:
          path: "."
          severity-cutoff: high        # fail build on HIGH or CRITICAL findings
          fail-build: true
          output-format: sarif
          only-fixed: true             # skip vulns with no available fix - noise reduction

      - name: Upload findings to GitHub Security tab
        if: always()                   # upload even when the previous step fails the build
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: ${{ steps.scan.outputs.sarif }}
          category: grype              # disambiguates findings if you add more scanners later

The workflow runs on every pull request, every push to main, and on a weekly schedule. The schedule matters because CVEs are disclosed continuously, not just when code changes. severity-cutoff: high means HIGH and CRITICAL findings fail the build; everything below surfaces as informational without blocking the pipeline. The only-fixed: true setting filters out vulnerabilities with no available patch, which removes a significant source of alert fatigue. Findings are exported as SARIF and uploaded to the GitHub Security tab. That gives them a UI, a triage workflow, and a single place where their owner can work through them, instead of disappearing into CI logs that nobody reviews after the build passes.

From First Practice to Secure by Design Habit

Five practices, none of which require a reorganization: scope your service accounts, fail secure, validate every request, harden your defaults, add one automated check.

If you’ve implemented one and it worked, pick the next. That’s how security practice builds: through incremental changes that compound over time. The hard part isn’t any individual practice. It’s building the habit of treating security as part of the work rather than as a separate task.


Which of these practices is your team already applying?
