8 Silent Failures in 2 Workflow Files. Zero Warnings

A customer placed an order in production. The confirmation page loaded.

But the cook and bartender dashboards showed nothing. The order existed in the database: payment_status: pending

That was the problem.

The dashboards filter on payment_status = 'paid'. An order that never gets confirmed, never shows up.

No alert. No error. No food. Then the customer waits.

Production incident flow

✓

Customer places orderConfirmation page loads successfully

●

payment_status = pendingOrder created in database but never confirmed

Dashboard receives nothingFilters on payment_status = 'paid'

Kitchen sees no orderOrder never made. Customer waits.

☉

Root cause discoveredDatabase schema not applied in production

What actually happened

Staging worked. Production didn't.

Same code. Same Docker images. Different outcome.

The difference: staging had the correct database schema, but production didn't.

The root cause

The deploy workflow had this line:

deploy.yml

cat ~/[app]/init-scripts/schema.sql | docker exec -i [app]-postgres-production \
  psql -U [app]_user -d [app] || echo 'Schema may already exist'

That last part is the lie:

the problem

|| echo 'Schema may already exist'

If the file is missing → the pipeline prints a message and continues.

If the SQL has errors → same thing.

If psql can't connect → same thing.

The build stays green but the schema is not applied. And nobody knows.

I wrote a deploy step that skipped verification and moved on. I never checked if the schema applied. The app never checked if its tables existed. The health check returned HTTP 200 because the server booted, not because it worked.

What the pipeline looked like

↑

Commit

✓

CI Passed

green

⇩

Deploy

green

⚠

Production Broken

broken

The audit

I grepped both workflow files for || true, || echo, and unguarded curl calls.

Eight hits. Two broke production.

Risk	Pattern	Impact
High	schema.sql \|\| echo (staging)	DB schema not applied, app broken
High	schema.sql \|\| echo (production)	DB schema not applied, app broken
Medium	ssh-keygen \|\| echo (production)	Bad key undetected, deploy fails later
Low	grep \|\| true (staging)	Expected behavior, no risk
Low	grep \|\| true (production x2)	Expected behavior, no risk
Low	curl without -f flag (staging)	Cache not purged, stale content
Low	curl without -f flag (production)	Cache not purged, stale content

Five were harmless. One caused confusion. Two broke production.

The fix

Three layers. Each catches what the others miss.

Stop on first failure

Added set -euo pipefail to every SSH block:

deploy.yml

ssh user@server "
  set -euo pipefail
  docker login ghcr.io ...
  docker compose pull
  docker compose up -d
"

Flag	Effect
`-e`	Exit immediately if any command fails
`-u`	Treat unset variables as errors
`-o pipefail`	Fail if any command in the pipe fails

Without pipefail, a pipeline like cat file | psql only checks the exit code of psql. If cat fails because the file is missing, the pipe still reports success. psql receives empty input and exits 0.

With pipefail, the pipe fails if either side fails.

This alone would have caught the schema bug.

Replace silence with validation

Replace every || echo with proper error handling.

✕ Silent failure

• || echo
• || true
• Ignored errors
• Green pipeline

✓ Controlled failure

• Validation
• Logs
• Warnings
• Explicit errors

Schema validation — before:

before

cat schema.sql | docker exec -i postgres psql ... || echo 'Schema may already exist'

After:

after

if [ ! -f ~/[app]/init-scripts/schema.sql ]; then
  echo 'ERROR: schema.sql not found! Did SCP step fail?'
  exit 1
fi
echo 'Applying database schema...'
cat ~/[app]/init-scripts/schema.sql | docker exec -i postgres psql ...
echo 'Schema applied successfully'

SSH key validation — before:

before

ssh-keygen -l -f ~/.ssh/deploy_key || echo "Key validation failed"

After:

after

if ! ssh-keygen -l -f ~/.ssh/deploy_key; then
  echo "ERROR: SSH key validation failed!"
  echo "Check PRODUCTION_SSH_KEY secret format"
  exit 1
fi
echo "SSH key validated successfully"

Validate before deploying

Added a validation step before any file gets to the server:

deploy.yml — pre-deploy validation

- name: Validate required files exist
  run: |
    echo "Validating deployment files..."
    if [ ! -f infrastructure/docker/docker-compose.production.yml ]; then
      echo "ERROR: docker-compose.production.yml not found"
      exit 1
    fi
    if [ ! -d infrastructure/docker/init-scripts ]; then
      echo "ERROR: init-scripts directory not found"
      exit 1
    fi
    if [ ! -f infrastructure/docker/init-scripts/schema.sql ]; then
      echo "ERROR: schema.sql not found"
      exit 1
    fi
    echo "All required files present"

Catches missing files on the build runner, where the error message is obvious.

Pipeline after fix

↑

Commit

✓

Validation

files checked

⚗

Schema Check

verified

⇩

Deployment

deployed

✓

Production Verified

healthy

Tradeoffs

Not every || true is wrong. grep exits 1 when it finds zero matches. That's correct behavior, not an error. If your .env file has no STRIPE_ lines, grep -v '^STRIPE_' .env exits 1 and set -e kills the build.

The fix is a comment, not removal:

intentional || true

# || true is intentional: grep exits 1 when no matches found;
# this is expected if .env has no STRIPE_ lines
grep -v '^STRIPE_' .env > .env.tmp || true

The Cloudflare cache purge was a different case. After making it fail loudly, it blocked every deploy. The API token didn't have the right permissions.

A cache issue was killing deploys that had nothing to do with the application.

So I switched to continue-on-error: true with a warning:

deploy.yml — non-blocking step

- name: Purge Cloudflare Cache
  continue-on-error: true  # TODO: fix API token permissions
  run: |
    if ! curl -sf ... purge_cache; then
      echo "WARNING: Cloudflare cache purge failed (non-blocking)"
    else
      echo "Cloudflare cache purged successfully"
    fi

There's a spectrum:

Silent failure

|| echo

Loud failure

exit 1

Right answer

continue-on-error
+ WARNING log
+ TODO comment

Silence looks like success.

Failure looks like a problem.

The right answer tells the truth.

Takeaway

Add set -euo pipefail to every shell block. Audit every || true and || echo.

For each one, ask: if this fails, should the build continue?

If yes, document it.

If no, let it fail.

The rule

Validate files before deploying them. If you allow a failure, log a warning. If a deploy step doesn't verify its own success, you're guessing it worked. Five commits fixed that.

Originally published on Medium. Read the original article →

Was this useful, or did I get something wrong? Email me. I read every reply.

8 Silent Failures in 2 Workflow Files. Zero Warnings.

What actually happened

The root cause

The audit

The fix

Stop on first failure

Replace silence with validation

Validate before deploying

Tradeoffs

Takeaway

Shipping code that silently breaks?

8 Silent Failures in 2 Workflow Files. Zero Warnings.

What actually happened

The root cause

The audit

The fix

Stop on first failure

Replace silence with validation

Validate before deploying

Tradeoffs

Takeaway

Shipping code that silently breaks?

Related articles

The Risk You Carry Until You Can't

How I Think About Technical Debt

dbt: Data Pipelines Done Right