💃 Blast Radius 🕺

The Most Expensive Number You're Not Measuring

Good morning, Salesforce Nerds! 👋 

Every automation is a bet. When it works, nobody notices. When it fails, the only question is how far the damage travels.

Blast radius is the distance between a single failure and everyone affected by it.

In a complex Salesforce org, that distance can be one record or twelve downstream systems. The architecture decides which.

The economics are brutal. A single uncontained failure can easily cost more than a year of building containment. Think of the kind that corrupts ten thousand records, double-fires payments, or freezes case routing for three hours.

Yet most orgs invest in features and treat containment as overhead. That's backwards. Containment is the feature.

Let's design for it. 👇️

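TABLE OF CONTENTS

Wall Off the Damage
Make Retries Safe
Stop the Bleeding Fast
You Can't Fix Blind
Audit Your Blast Zones
Soul Food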

WALL OFF THE DAMAGE

DRAW LINES, DEFEND THEM

Ships use bulkheads to keep a single hull breach from sinking the whole vessel. Your org needs the same discipline. 🚢

Every business domain should live behind its own automation boundary. Cases own Cases. Opportunities own Opportunities. One team owns the logic, one handler runs the work, and cross-domain calls travel through async hops rather than direct method invocations.

Platform Events are the bulkhead of choice for cross-domain integration. They decouple producers from consumers. When the consumer fails, the producer keeps running. The failure stays in one domain instead of spilling across the org. ✅
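Here is a sketch of the bulkhead idea. It is written in plain Java rather than Apex so it runs outside an org. The `EventChannel` class is hypothetical and models only the property that matters. Publishing enqueues the event and returns, and each subscriber runs inside its own failure boundary. On the platform itself, a Platform Event subscriber such as an Apex trigger runs in its own transaction, separate from the publisher.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a Platform Event channel: publishing only
// enqueues, so the producer never runs consumer code in its own call stack.
class EventChannel<E> {
    private final Queue<E> pending = new ArrayDeque<>();
    private final List<Consumer<E>> subscribers = new ArrayList<>();
    private final List<String> failures = new ArrayList<>();

    void subscribe(Consumer<E> subscriber) {
        subscribers.add(subscriber);
    }

    // Producer side: returns immediately, regardless of subscriber health.
    void publish(E event) {
        pending.add(event);
    }

    // Delivery side: each subscriber runs in its own try/catch, so one failing
    // domain is recorded and skipped instead of sinking the others.
    void deliverAll() {
        while (!pending.isEmpty()) {
            E event = pending.poll();
            for (Consumer<E> s : subscribers) {
                try {
                    s.accept(event);
                } catch (RuntimeException ex) {
                    failures.add(event + ": " + ex.getMessage());
                }
            }
        }
    }

    List<String> failures() {
        return failures;
    }
}
```

Suppose one subscriber fails, such as a billing consumer whose API is down. The failure lands in `failures()`, every other subscriber still receives the event, and the producer's `publish` call never sees the exception.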

The governance piece outweighs the code. Ownership zones must be named, documented, and enforced in code review.

Without that discipline, the cleanest technical isolation rots the first time someone "just adds one quick trigger" across a boundary. 🚧

A practical test: if your Opportunity trigger directly updates Account, Contact, Quote, and an external billing system in one transaction, you have no bulkheads.

You have one giant failure surface waiting for its moment. 💥
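The practical test above can also run as an automated review check. Below is a minimal Java sketch. It assumes you can inventory which objects and systems each automation writes synchronously; that inventory and the `BulkheadCheck` name are hypothetical. Every synchronous write outside the automation's owning domain counts as a boundary crossing that should become an async hop.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical review check: flags every target an automation writes
// synchronously that belongs to another domain's bulkhead.
final class BulkheadCheck {
    private BulkheadCheck() {}

    // ownership maps an object or system name to its owning domain,
    // e.g. "Quote" -> "Quotes". Targets missing from the map count as UNOWNED.
    static Set<String> crossings(String automationDomain,
                                 List<String> syncWrites,
                                 Map<String, String> ownership) {
        Set<String> violations = new TreeSet<>();
        for (String target : syncWrites) {
            String owner = ownership.getOrDefault(target, "UNOWNED");
            if (!owner.equals(automationDomain)) {
                violations.add(target + " (owned by " + owner + ")");
            }
        }
        return violations;
    }
}
```

Take the Opportunity trigger described above. Mapping Opportunity to an Opportunities domain and placing Account, Contact, Quote, and the billing system elsewhere yields four crossings. Targets with no recorded owner are flagged as `UNOWNED`, which doubles as a governance gap.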

MAKE RETRIES SAFE

RUN TWICE, BREAK NOTHING

If your automation cannot execute twice safely, it cannot recover from any failure. Period. 🔁

Idempotency means that running the same input a second time leaves the org in exactly the state the first run produced. No duplicate records. No double-charged customers. No cascading cleanup tickets. 🛡️

The technical patterns are well-worn. Use External IDs for upserts. Stamp every async request and every inbound integration call with a deduplication key. Check state before mutating it. ✅

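// Reject duplicate work before it propagates (requestIds: incoming Set<String> of dedup keys)
Set<String> seen = new Set<String>();
for (Job__c j : [SELECT Request_Id__c FROM Job__c
                 WHERE Request_Id__c IN :requestIds]) {
    seen.add(j.Request_Id__c);
}
// Process only what's actually new; a unique External ID on Request_Id__c backstops races
List<Job__c> fresh = new List<Job__c>();
for (String reqId : requestIds) {
    if (!seen.contains(reqId)) {
        fresh.add(new Job__c(Request_Id__c = reqId));
    }
}
insert fresh;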
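The same guarantees can be modeled as a plain-Java sketch. It is in-memory and not thread-safe, and all names are hypothetical. A ledger of dedup keys turns a replayed request into a no-op. Writes are keyed by an external ID, so a new request for the same invoice updates the existing charge instead of duplicating it. In an org, the ledger would be a persisted field marked as a unique External ID, so concurrent transactions cannot both slip past the check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an upsert keyed on an external ID plus a ledger of
// dedup keys, so replaying the same request leaves state unchanged.
class IdempotentPayments {
    private final Map<String, Long> chargesByExternalId = new HashMap<>();
    private final Set<String> processedRequestIds = new HashSet<>();
    private long totalCharged = 0;

    /** Returns true if the request did new work, false if it was a replay. */
    boolean handle(String requestId, String externalId, long amountCents) {
        // Dedup key: a retried request is acknowledged but does nothing new.
        if (!processedRequestIds.add(requestId)) {
            return false;
        }
        // Upsert by external ID: replace any existing charge instead of adding one.
        Long previous = chargesByExternalId.put(externalId, amountCents);
        totalCharged += amountCents - (previous == null ? 0 : previous);
        return true;
    }

    long totalCharged() { return totalCharged; }

    int chargeCount() { return chargesByExternalId.size(); }
}
```

Calling `handle("req-1", "INV-42", 5000)` twice charges once. A later `req-2` for the same `INV-42` updates that charge rather than creating a second one.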

The governance angle is the contract. Every consumer of your Platform Events, APIs, or async work needs your idempotency guarantees in writing.

Version them when they change. Treat them like SLAs. 📜

STOP THE BLEEDING FAST

EVERY AUTOMATION NEEDS BRAKES

When automation goes feral at 2 AM, you need a switch that does not require a deployment. ⛔

Custom Metadata Types are the standard pattern. Build an Automation_Control__mdt record set.

Every Flow, every Apex handler, every Queueable checks it before doing real work. Flip a checkbox, and the dangerous code path goes dormant in seconds.
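Here is a minimal Java sketch of the guard clause, with a `Map` standing in for `Automation_Control__mdt`. The `KillSwitch` and `CaseRoutingHandler` names are hypothetical. In Apex, the lookup would read the metadata record instead, for example through the custom metadata type's `getInstance(developerName)` method and a checkbox field of your choosing.

```java
import java.util.Map;

// Hypothetical stand-in for Automation_Control__mdt: automation name -> enabled flag.
// In Apex, this lookup would read the custom metadata record instead of a Map.
class KillSwitch {
    private final Map<String, Boolean> controls;

    KillSwitch(Map<String, Boolean> controls) {
        this.controls = controls;
    }

    // Fail closed: an automation with no control record stays off.
    boolean isEnabled(String automation) {
        return controls.getOrDefault(automation, false);
    }
}

class CaseRoutingHandler {
    private final KillSwitch killSwitch;
    int routed = 0;

    CaseRoutingHandler(KillSwitch killSwitch) {
        this.killSwitch = killSwitch;
    }

    void run(int caseCount) {
        // Guard clause first: check the switch before doing any real work.
        if (!killSwitch.isEnabled("CaseRouting")) {
            return;
        }
        routed += caseCount;
    }
}
```

This sketch fails closed, so an automation with no control record stays off. That is the safer default for new code paths. Some teams deliberately fail open for business-critical automations so that a missing record cannot halt them. Either choice can work, but pick one and write it down.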

Circuit breakers are the smarter cousin. They trip themselves when error rates spike and self-heal once conditions clear.

Wrap external callouts in this pattern and stop letting one slow API torch your transaction limits. 🔌
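Below is a minimal circuit breaker in Java, offered as a sketch rather than a drop-in. It uses consecutive failures as a simple stand-in for an error-rate threshold, and it injects the clock so the behavior is testable. After `failureThreshold` consecutive failures, it opens and returns the fallback without calling out. Once `cooldownMillis` has passed, it lets one trial call through and closes again if that call succeeds. In Apex, the breaker's state would need to live somewhere that survives the transaction, such as Platform Cache or a custom object. This sketch does not model that part.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal circuit breaker sketch with an injectable millisecond clock.
// Thresholds are hypothetical: trip after N consecutive failures, retry after a cooldown.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long cooldownMillis;
    private final LongSupplier clock;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long cooldownMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> callout, T fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < cooldownMillis) {
                return fallback;      // fail fast: don't spend limits on a sick API
            }
            state = State.HALF_OPEN;  // cooldown over: allow one trial call
        }
        try {
            T result = callout.get();
            state = State.CLOSED;     // success self-heals the breaker
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException ex) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;   // trip, or re-trip after a failed trial call
                openedAt = clock.getAsLong();
            }
            return fallback;
        }
    }
}
```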

Governance decides who flips switches and when. Document the runbook. Name the on-call rotation. Define the trip criteria and the reset criteria.

Without authority and process, the kill switch becomes a button nobody dares press. 🛑

YOU CAN'T FIX BLIND

SEE IT BEFORE THEY DO

Most orgs learn about automation failures from users. That is a containment failure stacked on top of an automation failure. 👀

Structured logging is the floor. Stamp every transaction with a correlation ID so a single Case event can be traced from inbound API to outbound integration to downstream cleanup. Pipe logs to whatever observability platform your enterprise already pays for: Splunk, Datadog, New Relic, or Sumo Logic.
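Here is a plain-Java sketch of correlation-ID logging; the `CorrelatedLog` class is hypothetical. The ID is minted once at the edge and then carried explicitly into each downstream hop, for example as a field on the Platform Event or Queueable, rather than regenerated. Each line is emitted as flat JSON so the observability platform can index on `correlationId`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical structured logger: every line carries the same correlation ID,
// so one Case can be followed across every hop in the observability platform.
class CorrelatedLog {
    private final String correlationId;
    private final List<String> sink;

    // Downstream hops construct with the ID they received, never a fresh one.
    CorrelatedLog(String correlationId, List<String> sink) {
        this.correlationId = correlationId;
        this.sink = sink;
    }

    // Start a new trace at the edge, e.g. the inbound API call.
    static CorrelatedLog begin(List<String> sink) {
        return new CorrelatedLog(UUID.randomUUID().toString(), sink);
    }

    String correlationId() { return correlationId; }

    // Emit a flat JSON object; this sketch assumes values need no escaping.
    void info(String stage, String message) {
        sink.add(String.format(
            "{\"correlationId\":\"%s\",\"stage\":\"%s\",\"message\":\"%s\"}",
            correlationId, stage, message));
    }
}
```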

Platform Events double as excellent audit rails. Emit a domain event for every meaningful state change.

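Persist those events through a subscriber, and you have a queryable history of what your org actually did, not what you assumed it did. The event bus itself retains messages only for a limited window (72 hours for high-volume platform events). Replay therefore covers recent history, not the archive. 🔍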

Track three signals at minimum: error rate, latency, and throughput. When any of them shifts more than two standard deviations from baseline, page someone.
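The two-standard-deviation rule is a z-score check against a baseline window. Below is a minimal Java sketch. The `BaselineAlert` name and the choice of population standard deviation are assumptions. The sketch pages when the current value sits more than 2σ from the baseline mean.

```java
// Two-standard-deviation check against a baseline window (population std dev).
final class BaselineAlert {
    private BaselineAlert() {}

    static boolean shouldPage(double[] baseline, double current) {
        if (baseline.length == 0) {
            throw new IllegalArgumentException("baseline window must not be empty");
        }
        double mean = 0;
        for (double v : baseline) mean += v;
        mean /= baseline.length;

        double variance = 0;
        for (double v : baseline) variance += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(variance / baseline.length);

        // Flat baseline: any change at all is a shift worth a look.
        if (stdDev == 0) return current != mean;
        // Two-sided: spikes and drops both count.
        return Math.abs(current - mean) > 2 * stdDev;
    }
}
```

Take a baseline of 4, 6, 5, 5, 4, 6 errors per minute, which gives a mean of 5 and σ ≈ 0.82. A reading of 9 pages, and a reading of 6 does not. The check is two-sided on purpose, because a throughput drop can mean an automation went silent. For roughly normal noise, about 5% of samples land beyond 2σ by chance. Evaluate over a sensible window, or require a couple of consecutive breaches before paging.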

Stop discovering failures through Slack escalations from angry users.

Governance ties it together. Someone owns the dashboard. Someone reviews the SLOs. Misses get discussed in a recurring forum. Telemetry without accountability is expensive noise. 📊

AUDIT YOUR BLAST ZONES

CONTAINMENT IS THE FEATURE

Pick your most business-critical automation. Walk it through four questions before the next release goes out:

  1. If this fails, what else fails with it? 🤔

  2. Can it run twice without breaking anything? 🤔

  3. How do we stop it without a deployment? 🤔

  4. How would we know it broke before a user told us? 🤔

If you cannot answer all four with confidence, you do not have an automation problem. You have a blast radius problem.

Find it. Fix it. Before the 2 AM page finds you first. 🚨

SOUL FOOD

Today’s Principle

"Everything fails, all the time."

Werner Vogels

