Against Alert-Only Automation


Every Security Operations Center (SOC) I walk into has the same two things. An automation platform, and a backlog larger than it was last quarter. Most security leaders are willing to discuss the second one. Few are willing to admit the relationship between the two.

The industry decided "alert-only automation" counted as automation, and the backlog has been pointing at the contradiction for a decade.

The pitch that aged badly

The case for alert-only automation has been consistent since Security Orchestration, Automation, and Response (SOAR) shipped. Detect faster, triage faster, route faster, get the right alert to the right human faster. The implicit promise was that the team would have to take fewer manual actions, because the system would narrow the action surface to what mattered. Modern Extended Detection and Response (XDR) platforms, Security Information and Event Management (SIEM) consolidation plays, and most of the "AI-powered SOC" generation rest on the same premise. Better signal in, fewer decisions out.

That premise looked correct in the slide decks, but it has not held up in production.

What alert-only automation actually does at enterprise scale is re-categorize work. The analyst hours that used to go into investigation now go into response. The response is still manual. The platform generates a higher-fidelity ticket; a human takes the action. The SOC got faster at producing tickets the team could not close.

What the SOAR program revealed

The SOAR generation is the cleanest case study, because the pitch and the outcome are well documented now. The promise was that playbooks would handle the routine 80 percent of incidents so analysts could focus on the 20 percent that mattered. In practice, the playbook catalog became a maintenance burden. Every playbook ages with the environment. A team reorganization, a renamed Active Directory group, a new cloud account, an Identity Provider migration, and the playbook fails silently. The on-call analyst finds out at 3 a.m. that the runbook for the credential exposure scenario stopped firing six weeks ago, and the only evidence is in a Splunk query nobody had a reason to run.

I have sat through the post-incident reviews on many of these. The pattern is identical. The team did not know the automation had stopped working until something broke. The runbook was treated as durable infrastructure when it was actually a code asset with no owner.

That is the first cost of alert-only automation. The system creates the appearance of work being done. The reality is that the work is being deferred to a runbook that nobody is paying to maintain.
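One way to catch the silent failure described above is a staleness check on the playbooks themselves. The Python below is a minimal sketch, assuming the platform can export a last-successful-run timestamp per playbook. The `stale_playbooks` function, the playbook names, and the 14-day threshold are hypothetical illustrations, not any vendor's API.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def stale_playbooks(
    last_success: Dict[str, Optional[datetime]],
    now: datetime,
    max_silence: timedelta,
) -> List[str]:
    """Return names of playbooks with no successful run inside max_silence.

    last_success maps a playbook name to its most recent successful run,
    or None if it has never completed. The mapping is an assumed export
    from whatever automation platform is in use.
    """
    stale = []
    for name, finished_at in last_success.items():
        if finished_at is None or now - finished_at > max_silence:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2025, 3, 1)
    runs = {
        "credential_exposure": datetime(2025, 1, 15),  # silent for over six weeks
        "phishing_triage": datetime(2025, 2, 27),
        "oauth_review": None,                          # never completed
    }
    print(stale_playbooks(runs, now, timedelta(days=14)))
```

Silence alone does not prove a playbook is broken, since the trigger may simply not have occurred. A check like this is most useful when paired with a periodic synthetic test event and a named owner who receives the result.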

The three costs nobody puts on the slide

Analyst attention is a finite resource. Alert-only automation spends it on triage decisions that do not change outcomes. A higher-fidelity alert that arrives at 11:42 PM still has to be looked at by a person. 

The backlog grows in the place automation does not touch. Detection accelerates. Remediation does not. The dormant identities, expired certificates, overprivileged service accounts, and unreviewed Open Authorization (OAuth) integrations sitting on the board's risk register do not move, because no SOAR playbook was written to actually disable a service account safely across three identity systems with stakeholder approval. They were written to open a ticket.

Trust erodes one near-miss at a time. SOAR programs die from accumulated near-misses nobody wants to defend in front of an audit committee. Single incidents are rarely the cause. The first time a playbook fires on the wrong account and disables a service that runs production payroll, the team adds approval gates everywhere. The second time, the automation gets turned off "for a quarter, to be revisited."

What automation should mean

The next category of automation in security is execution. The system does the work, end to end. It identifies who actually owns the resource right now, gathers the cross-system context required to act safely, asks the owner for confirmation with that context in hand, makes the change across all the systems that need to change, and logs every step. Human approval lands at the consequential gates, where it belongs, ahead of irreversible actions.
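The shape of that loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `find_owner`, `gather_context`, `ask_owner`, and `apply_change` are hypothetical callbacks standing in for whatever identity, messaging, and directory systems a given environment actually has.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One proposed change, e.g. disabling a dormant account (hypothetical model)."""
    target: str
    systems: List[str]  # every system that has to change together


@dataclass
class ExecutionRun:
    action: Action
    approved: bool = False
    log: List[str] = field(default_factory=list)


def execute(
    action: Action,
    find_owner: Callable[[str], str],
    gather_context: Callable[[str], Dict[str, str]],
    ask_owner: Callable[[str, Action, Dict[str, str]], bool],
    apply_change: Callable[[str, str], None],
) -> ExecutionRun:
    """Run one action end to end and record every step in the run log."""
    run = ExecutionRun(action)

    owner = find_owner(action.target)
    run.log.append(f"owner resolved: {owner}")

    context = gather_context(action.target)
    run.log.append(f"context gathered: {sorted(context)}")

    # The human gate: the owner approves the action with the context in hand.
    if not ask_owner(owner, action, context):
        run.log.append("owner declined; no changes made")
        return run
    run.approved = True
    run.log.append("owner approved")

    # Only after approval does the change go out to every affected system.
    for system in action.systems:
        apply_change(system, action.target)
        run.log.append(f"changed {action.target} in {system}")

    run.log.append("done")
    return run
```

The design choice worth noticing is where the human sits. The person is asked one question, with context attached, before anything changes; they are not asked to assemble the context or carry out the change themselves.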

This is the difference the alert-only generation skipped. Humans should be approving actions, not deciding which alerts get a runbook attached. The runbook era assumed humans were always in the work. The agentic era assumes humans are in the approval. Those are different operating models, and they produce different backlog curves.

For the security buyers reading this: when the next vendor says "AI-powered automation," ask which kind they mean, triage or execution. If the demo ends with a Jira ticket appearing in someone else's queue, the answer is still triage.

What to do Monday morning

Stop measuring the SOC by alerts processed. The metric is corrosive. It rewards activity over outcomes, and it turns the budget conversation into a debate about which dashboard is fastest.

Start measuring the hygiene backlog. Accounts dormant for more than 90 days. Certificates renewed before expiration. External OAuth apps reviewed within 30 days of grant. Privileged service accounts with no business owner of record. These are metrics a Chief Financial Officer (CFO) and a board can read without translation. They are also the metrics that move when execution-grade automation lands, and stay flat when an organization keeps buying faster triage.
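Two of those metrics are simple enough to compute from an account inventory export. The Python below is a minimal sketch, assuming a hypothetical `Account` record with a last-used date, a privilege flag, and an owner field. Real inventories will have different field names, and "dormant" is read here as "not used in more than 90 days."

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Account:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    last_used: date
    privileged: bool = False
    owner: Optional[str] = None


def dormant_accounts(
    accounts: List[Account], today: date, max_idle_days: int = 90
) -> List[Account]:
    """Accounts whose last use is more than max_idle_days before today."""
    cutoff = today - timedelta(days=max_idle_days)
    return [a for a in accounts if a.last_used < cutoff]


def unowned_privileged(accounts: List[Account]) -> List[Account]:
    """Privileged accounts with no business owner of record."""
    return [a for a in accounts if a.privileged and not a.owner]


if __name__ == "__main__":
    today = date(2025, 1, 1)
    inventory = [
        Account("svc-backup", date(2024, 9, 1), privileged=True),
        Account("svc-deploy", date(2024, 12, 1), privileged=True, owner="platform"),
        Account("jdoe", date(2024, 6, 15)),
    ]
    print("dormant:", [a.name for a in dormant_accounts(inventory, today)])
    print("unowned privileged:", [a.name for a in unowned_privileged(inventory)])
```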

Pick one use case where the action is reversible and start there. The first use case is a trust-building exercise. Its job is to make the next four shippable, not to win the budget conversation on its own. Dormant account cleanup is the cleanest starting point I have seen, because the blast radius is bounded, the ownership pattern is well understood, and a mistake is recoverable inside one business day.

Then measure backlog reduction, not action volume. "We took 1,000 actions" is a vanity number. "The remediation backlog dropped 40 percent in 90 days" is a board-readable result. If a program cannot produce the second number, it is doing triage with a different name.
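The arithmetic behind that second number is deliberately simple. A sketch in Python, with a hypothetical function name and made-up counts chosen only to illustrate the calculation:

```python
def backlog_reduction_pct(open_at_start: int, open_at_end: int) -> float:
    """Percent change in open remediation items over a period.

    Positive means the backlog shrank. Negative means it grew, no matter
    how many actions were taken in between.
    """
    if open_at_start <= 0:
        raise ValueError("starting backlog must be positive")
    return 100.0 * (open_at_start - open_at_end) / open_at_start


if __name__ == "__main__":
    # Illustrative numbers only: 500 open items at day 0, 300 at day 90.
    print(backlog_reduction_pct(500, 300))
    # A busy quarter that still lost ground: 500 open items became 550.
    print(backlog_reduction_pct(500, 550))
```

Action volume does not appear anywhere in the formula. A program can take 1,000 actions and still report a negative number if new items arrive faster than old ones close.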

A closing line for the operators in the room

The SOC of 2030 will be measured by work finished, not by alerts processed. The teams that internalize that shift now are the ones that will still have headcount when the budget conversation arrives.
