Incident Response 9 min read

Cloud Incident Response Runbook: The First 30 Minutes

The concrete playbook for the first 30 minutes of a cloud security incident: IAM isolation, evidence preservation via EBS snapshot and CloudTrail export, and blast radius analysis with Athena queries.

CloudDefender Team

Cloud incident response is not like traditional IR. The blast radius can expand to dozens of regions and hundreds of services in minutes. Evidence can be overwritten or auto-deleted. Attackers who have established persistence through secondary IAM users or Lambda backdoors will survive initial containment if you don’t check for them. And every decision you make — or fail to make — in the first 30 minutes shapes whether you end up with a contained incident or a prolonged breach investigation.

The decisions that matter most in those first 30 minutes are sequencing decisions. Do you isolate the compromised principal first, or preserve evidence first? Do you stop the affected EC2 instance, or snapshot it first? The right answers depend on the scenario, but the decision framework must be worked out before the incident, not during it.

Incident Response Timeline: First 30 Minutes
1. Preserve (0–10 min): export CloudTrail to forensic S3; take EBS snapshots (while running); enable VPC Flow Logs now.
2. Contain (10–20 min): attach deny-all inline IAM policy; revoke active STS sessions; disable console access.
3. Analyze (20–30 min): run the Athena blast radius query; check AssumeRole pivots; check for persistence mechanisms.
Decision gate: active exfiltration observed? If yes, isolate first (skip phase 1). Otherwise, preserve first.
The three-phase first-30-minutes playbook. The default sequence is Preserve → Contain → Analyze. The single exception: if you observe active data exfiltration at scale, flip to Contain first and accept incomplete forensics.

The Isolation vs. Evidence Tradeoff

Every incident responder faces this dilemma: isolating the compromised credential immediately stops the attacker’s current access — but also destroys the window for observing what they do next. Watching and waiting risks additional data exfiltration and additional persistence.

The right sequencing for most scenarios: preserve first, then isolate. Starting evidence preservation takes about 5 minutes if you're prepared. CloudTrail log export and EBS snapshot creation are fast to initiate and non-disruptive, and a snapshot captures the volume as of the moment it starts, even while the copy finishes in the background. Isolating an IAM principal is a single API call that takes seconds. The 5-minute window between starting preservation and isolating is an acceptable risk in almost every scenario. The exception is active data exfiltration at scale: in that case, isolate first and accept incomplete forensics.

Phase 1: Evidence Preservation (Minutes 0–10)

Export CloudTrail logs to a dedicated forensic S3 bucket. This bucket should exist before any incident — create it with S3 Object Lock in compliance mode so nothing written to it can be deleted or modified during the retention period.

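# Enabling Object Lock at creation also turns on versioning, which the
# put-object-lock-configuration call below requires.
aws s3api create-bucket \
  --bucket forensic-evidence-[ACCOUNT_ID] \
  --region us-east-1 \
  --object-lock-enabled-for-bucket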
aws s3api put-object-lock-configuration \
  --bucket forensic-evidence-[ACCOUNT_ID] \
  --object-lock-configuration \
  '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":365}}}'
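The commands above create the bucket but don't copy anything into it. CloudTrail writes its log files under a predictable key layout in the trail's bucket, AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/ (beneath any prefix configured on the trail), so you can list the day prefixes covering the incident window up front and copy them across. The sketch below is a minimal helper that assumes this single-account layout; organization trails add the organization ID after AWSLogs/. The function name, account ID, and regions are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def cloudtrail_day_prefixes(account_id, regions, start, end, trail_prefix=""):
    """List the S3 key prefixes that hold CloudTrail log files for [start, end].

    Assumes the single-account layout
    [trail_prefix/]AWSLogs/<account_id>/CloudTrail/<region>/YYYY/MM/DD/
    (organization trails insert the organization ID after AWSLogs/).
    """
    if start > end:
        raise ValueError("start must not be after end")
    base = f"{trail_prefix.strip('/')}/" if trail_prefix.strip("/") else ""
    # CloudTrail folder dates are UTC, so normalize both ends first.
    day = start.astimezone(timezone.utc).date()
    last_day = end.astimezone(timezone.utc).date()
    prefixes = []
    while day <= last_day:
        for region in regions:
            prefixes.append(
                f"{base}AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"
            )
        day += timedelta(days=1)
    return prefixes


if __name__ == "__main__":
    # Hypothetical window: the 24 hours before the alert fired.
    detected = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
    window_start = detected - timedelta(hours=24)
    for prefix in cloudtrail_day_prefixes(
        "111122223333", ["us-east-1", "eu-west-1"], window_start, detected
    ):
        print(prefix)
```

You can then copy each prefix with aws s3 sync, for example from s3://<trail-bucket>/<prefix> to s3://forensic-evidence-[ACCOUNT_ID]/cloudtrail/<prefix>, where <trail-bucket> is a placeholder for your trail's bucket. Delivery lags the events by several minutes, so padding the window slightly is cheap insurance.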

Take EBS snapshots of every volume attached to the affected EC2 instances while they're still running. Don't stop an instance first: memory artifacts such as running processes, network connections, and in-memory keys are lost when it stops, and a snapshot does not capture them. The snapshot records the disk state as of the moment it is initiated:

aws ec2 create-snapshot \
  --volume-id vol-xxxxxxxxxxxxxxxxx \
  --description "IR-$(date +%Y%m%d-%H%M%S)-compromised-instance" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=incident,Value=active}]'
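The command above snapshots a single volume, but an instance often has several attached, and each needs its own snapshot. A minimal sketch, assuming you already have the instance's volume IDs, builds one create-snapshot command per volume with the same flags. It stamps the description in UTC and appends the volume ID so the snapshots are easy to tell apart. The helper name and volume IDs are hypothetical.

```python
import shlex
from datetime import datetime, timezone


def snapshot_commands(volume_ids, incident_tag="active", now=None):
    """Build one `aws ec2 create-snapshot` argument list per attached volume."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    commands = []
    for volume_id in volume_ids:
        commands.append([
            "aws", "ec2", "create-snapshot",
            "--volume-id", volume_id,
            # Same naming scheme as the manual command, plus the volume ID.
            "--description", f"IR-{stamp}-compromised-instance-{volume_id}",
            "--tag-specifications",
            f"ResourceType=snapshot,Tags=[{{Key=incident,Value={incident_tag}}}]",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical volume IDs; print shell-quoted commands for review.
    for argv in snapshot_commands(["vol-0aaaaaaaaaaaaaaaa", "vol-0bbbbbbbbbbbbbbbb"]):
        print(shlex.join(argv))
```

Printing the commands for review before running them keeps a human in the loop. To capture all volumes in one call, use aws ec2 create-snapshots --instance-specification InstanceId=<instance-id>, which creates crash-consistent snapshots of every attached volume.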

Enable VPC Flow Logs if not already active. They won’t capture historical traffic, but they’ll capture any ongoing lateral movement or data exfiltration that occurs during your investigation window:

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-xxxxxxxxxxxxxxxxx \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::forensic-evidence-[ACCOUNT_ID]/flow-logs/
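Once records start arriving, you can spot exfiltration quickly by totaling the bytes sent to addresses outside your network. The sketch below assumes flow logs in the default (version 2) record format, as produced by the command above, and skips the header line if present. The function name is hypothetical. "External" here simply means any address that Python's ipaddress module does not classify as private.

```python
import ipaddress
from collections import Counter

# Field order of the default (version 2) VPC Flow Logs record format.
DEFAULT_FIELDS = (
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
)


def top_external_destinations(lines, limit=5):
    """Sum accepted bytes per non-private destination address, largest first."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(DEFAULT_FIELDS) or parts[0] == "version":
            continue  # header line, or a record in a custom format
        record = dict(zip(DEFAULT_FIELDS, parts))
        if record["log-status"] != "OK" or record["action"] != "ACCEPT":
            continue  # NODATA/SKIPDATA rows ("-" fields) and rejected traffic
        if ipaddress.ip_address(record["dstaddr"]).is_private:
            continue  # addresses Python classifies as private (RFC 1918 and similar)
        totals[record["dstaddr"]] += int(record["bytes"])
    return totals.most_common(limit)
```

Flow log files in S3 are gzip-compressed, so pass the lines from gzip.open(path, "rt"). A long-lived connection shows up as several records across capture windows, which is why the bytes are summed per address rather than read from a single record.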

Phase 2: IAM Containment (Minutes 10–20)

The fastest, most reversible containment action is attaching an explicit deny-all inline policy. Unlike deleting the access key (irreversible), an inline policy typically takes effect within seconds (IAM changes are eventually consistent) and can be removed once you've rotated to clean credentials:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "IRContainment",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*"
    }
  ]
}

Save the policy as deny-all.json, then attach it to the compromised user:

aws iam put-user-policy \
  --user-name compromised-user \
  --policy-name IRContainment \
  --policy-document file://deny-all.json
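The deny-all policy works regardless of what the user's other policies allow because an explicit Deny in any applicable policy overrides every Allow. The toy evaluator below illustrates only that precedence rule. It is a deliberately simplified model, not AWS's actual evaluation logic: it ignores conditions, NotAction, SCPs, permission boundaries, and resource-based policies. For a real check, use the IAM policy simulator. The function and constant names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _action_matches(patterns, action):
    # IAM action names are case-insensitive; "*" and "?" are wildcards.
    # (fnmatch also treats "[...]" as a character class, which IAM does not.)
    return any(fnmatchcase(action.lower(), p.lower()) for p in _as_list(patterns))


def _resource_matches(patterns, resource):
    # ARN matching is case-sensitive.
    return any(fnmatchcase(resource, p) for p in _as_list(patterns))


def evaluate(policies, action, resource):
    """Toy identity-policy evaluation: explicit Deny > Allow > implicit deny."""
    allowed = False
    for policy in policies:
        statements = policy["Statement"]
        if isinstance(statements, dict):
            statements = [statements]
        for stmt in statements:
            if not (_action_matches(stmt["Action"], action)
                    and _resource_matches(stmt["Resource"], resource)):
                continue
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"  # any matching Deny ends evaluation
            allowed = True
    return "Allow" if allowed else "ImplicitDeny"


ADMIN = {"Version": "2012-10-17",
         "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
IR_CONTAINMENT = {"Version": "2012-10-17",
                  "Statement": [{"Sid": "IRContainment", "Effect": "Deny",
                                 "Action": "*", "Resource": "*"}]}

if __name__ == "__main__":
    print(evaluate([ADMIN], "s3:GetObject", "arn:aws:s3:::bucket/key"))  # Allow
    print(evaluate([ADMIN, IR_CONTAINMENT], "s3:GetObject",
                   "arn:aws:s3:::bucket/key"))  # ExplicitDeny
```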

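Revoke all active STS sessions. For roles, you do this with an inline policy rather than an STS API call. Attach a policy to the role that denies every action under a DateLessThan condition on aws:TokenIssueTime, with the timestamp set to the current UTC time (the example below uses a placeholder date). This is the mechanism behind the IAM console's "Revoke active sessions" button, which attaches an equivalent inline policy named AWSRevokeOlderSessions. Requests made with any credentials issued before the cutoff are then denied, even though role session credentials can otherwise stay valid for up to 12 hours. Sessions issued after the cutoff are unaffected: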

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "DateLessThan": {
        "aws:TokenIssueTime": "2026-06-01T12:00:00Z"
      }
    }
  }]
}
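The cutoff has to be the moment you revoke, so generating the policy is less error-prone than editing the timestamp by hand. The sketch below builds the same policy stamped with the current UTC time. For testing, it also mirrors the DateLessThan comparison locally. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def revoke_older_sessions_policy(cutoff=None):
    """Build a deny-all policy for sessions whose tokens were issued before cutoff."""
    cutoff = (cutoff or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "DateLessThan": {"aws:TokenIssueTime": cutoff.strftime(TIME_FORMAT)}
            },
        }],
    }


def is_revoked(policy, token_issue_time):
    """Local check mirroring the DateLessThan test for one session's issue time."""
    cutoff_text = policy["Statement"][0]["Condition"]["DateLessThan"]["aws:TokenIssueTime"]
    cutoff = datetime.strptime(cutoff_text, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return token_issue_time < cutoff


if __name__ == "__main__":
    # Save this output as revoke-sessions.json before attaching it.
    print(json.dumps(revoke_older_sessions_policy(), indent=2))
```

Save the printed JSON as revoke-sessions.json and attach it with aws iam put-role-policy --role-name <compromised-role> --policy-name AWSRevokeOlderSessions --policy-document file://revoke-sessions.json, where <compromised-role> is a placeholder. The timestamp is truncated to whole seconds, so a session issued earlier within the same second as the cutoff would not be denied.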

Disable console access by deleting the login profile for the compromised IAM user:

aws iam delete-login-profile --user-name compromised-user

Phase 3: Blast Radius Analysis (Minutes 20–30)

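With the principal contained, the key question is: what did they actually do? Query your S3-backed trail with Athena. The queries below assume a table named cloudtrail_logs built from the standard CloudTrail table definition, in which eventtime is an ISO 8601 string. CloudTrail Lake uses a different schema, so adjust column names if you query an event data store instead: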

SELECT
  eventtime,
  eventsource,
  eventname,
  awsregion,
  sourceipaddress,
  useragent,
  requestparameters,
  errorcode
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%compromised-user%'
  AND eventtime > to_iso8601(current_timestamp - interval '24' hour)
ORDER BY eventtime ASC;
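Persistence is what lets an attacker survive containment, whether through secondary IAM users or Lambda backdoors. The sketch below scans raw CloudTrail records, as found in the log files copied to the forensic bucket, for a small set of IAM and Lambda calls. That event list is an illustrative assumption rather than an exhaustive catalogue, and other services offer persistence routes too. The helper names are hypothetical.

```python
import gzip
import json

# Illustrative, non-exhaustive: IAM calls often used to keep access after
# the original credential is contained.
IAM_PERSISTENCE = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile", "UpdateLoginProfile",
    "AttachUserPolicy", "PutUserPolicy", "AddUserToGroup",
    "CreateRole", "AttachRolePolicy", "PutRolePolicy", "UpdateAssumeRolePolicy",
}
# Lambda event names can carry an API-version suffix (e.g. CreateFunction20150331),
# so these are matched as prefixes.
LAMBDA_PREFIXES = ("CreateFunction", "UpdateFunctionCode", "AddPermission")


def load_records(path):
    """Read the Records array from a CloudTrail log file (.json or .json.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return json.load(fh)["Records"]


def flag_persistence(records):
    """Return (eventTime, eventSource, eventName, errorCode) tuples, oldest first."""
    hits = []
    for r in records:
        source, name = r.get("eventSource"), r.get("eventName", "")
        suspicious = (
            (source == "iam.amazonaws.com" and name in IAM_PERSISTENCE)
            or (source == "lambda.amazonaws.com" and name.startswith(LAMBDA_PREFIXES))
        )
        if suspicious:
            # Keep failed attempts too: an AccessDenied still shows intent.
            hits.append((r.get("eventTime", ""), source, name, r.get("errorCode")))
    return sorted(hits, key=lambda hit: hit[0])
```

For example, flag_persistence(load_records(path)) on one downloaded log file lists the suspect calls oldest first. An errorCode such as AccessDenied marks an attempt that failed.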

Look specifically for four classes of actions:

A specific query to identify cross-account role pivots:

SELECT requestparameters
FROM cloudtrail_logs
WHERE eventname = 'AssumeRole'
  AND useridentity.arn LIKE '%compromised-user%'
  AND eventtime > to_iso8601(current_timestamp - interval '24' hour);

Any AssumeRole call whose roleArn contains an account ID other than your own indicates a cross-account pivot, and means you need to contact the owner of that account.
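The roleArn check is easy to automate. The sketch below takes the requestparameters values returned by the AssumeRole query, which are JSON strings in the standard Athena CloudTrail table, and returns the role ARNs that belong to any other account. The helper names are hypothetical. The ARN parsing assumes the usual arn:partition:iam::account-id:role/name layout.

```python
import json


def role_account(role_arn):
    """Return the account ID field of an IAM ARN (arn:partition:iam::ACCOUNT:...)."""
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {role_arn!r}")
    return parts[4]


def cross_account_pivots(request_parameters, home_account):
    """Return sorted role ARNs from AssumeRole requestparameters outside home_account."""
    foreign = set()
    for raw in request_parameters:
        if not raw:
            continue  # requestparameters can be null, e.g. on some failed calls
        role_arn = json.loads(raw).get("roleArn")
        if role_arn and role_account(role_arn) != home_account:
            foreign.add(role_arn)
    return sorted(foreign)
```

Each ARN this returns names an account whose owner you need to contact, even if that account belongs to your own organization.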

Post-Incident Recovery Checklist

The organizations that handle cloud incidents well practiced this playbook before they needed it — they ran tabletop exercises, tested the Athena queries, confirmed the forensic bucket had Object Lock, and verified that the containment policy snippet actually worked. Don’t build this runbook during the incident.


CloudDefender includes a one-click incident response workflow that preserves evidence, isolates the compromised principal, and runs blast radius analysis automatically — reducing the first-30-minutes playbook from manual steps to a supervised automated response.
