Cloud Incident Response Runbook: The First 30 Minutes

Cloud incident response is not like traditional IR. The blast radius can expand to dozens of regions and hundreds of services in minutes. Evidence can be overwritten or auto-deleted. Attackers who have established persistence through secondary IAM users or Lambda backdoors will survive initial containment if you don’t check for them. And every decision you make — or fail to make — in the first 30 minutes shapes whether you end up with a contained incident or a prolonged breach investigation.

The decisions that matter most in those first 30 minutes are sequencing decisions. Do you isolate the compromised principal first, or preserve evidence first? Do you stop the affected EC2 instance, or snapshot it first? The right answers depend on the scenario, but the decision framework must be worked out before the incident, not during it.

The three-phase first-30-minutes playbook follows a default sequence: Preserve → Contain → Analyze. There is a single exception: if you observe active data exfiltration at scale, flip to Contain first and accept incomplete forensics.

The Isolation vs. Evidence Tradeoff

Every incident responder faces this dilemma: isolating the compromised credential immediately stops the attacker’s current access — but also destroys the window for observing what they do next. Watching and waiting risks additional data exfiltration and additional persistence.

The right sequencing for most scenarios: preserve first, then isolate. Evidence preservation takes 5 minutes if you’re prepared. CloudTrail log export and EBS snapshot creation are fast, non-disruptive operations. Isolating an IAM principal is a single API call that takes seconds. The 5-minute window between preservation start and isolation is an acceptable risk in almost every scenario — unless you’re watching active data exfiltration at scale, in which case isolate first and accept incomplete forensics.

Phase 1: Evidence Preservation (Minutes 0–10)

Export CloudTrail logs to a dedicated forensic S3 bucket. This bucket should exist before any incident. Create it with Object Lock enabled at creation time (which also turns on versioning), then set a default retention rule in compliance mode so nothing written to it can be deleted or modified during the retention period.

aws s3api create-bucket --bucket forensic-evidence-[ACCOUNT_ID] --region us-east-1 --object-lock-enabled-for-bucket
aws s3api put-object-lock-configuration \
  --bucket forensic-evidence-[ACCOUNT_ID] \
  --object-lock-configuration \
  '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":365}}}'
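
Since the bucket only helps if the retention rule is really in place, a pre-incident check is worth scripting. The following is a minimal sketch, not a definitive implementation: the function names `object_lock_mode` and `forensic_bucket_ready` are hypothetical, and the bucket name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: confirm the forensic bucket enforces COMPLIANCE-mode Object Lock.
# Function names are illustrative assumptions, not part of any AWS tooling.

# Prints the default retention mode that S3 reports for the bucket.
object_lock_mode() {
  aws s3api get-object-lock-configuration \
    --bucket "$1" \
    --query 'ObjectLockConfiguration.Rule.DefaultRetention.Mode' \
    --output text
}

# Succeeds only when the default retention mode is COMPLIANCE.
forensic_bucket_ready() {
  mode=$(object_lock_mode "$1") || return 1
  if [ "$mode" = "COMPLIANCE" ]; then
    echo "OK: $1 uses COMPLIANCE retention"
  else
    echo "NOT READY: $1 reports mode '$mode'" >&2
    return 1
  fi
}

# Usage (makes a real API call):
#   forensic_bucket_ready "forensic-evidence-123456789012"
```

Running a check like this during tabletop exercises catches a bucket that was created without Object Lock or left in governance mode.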

Take EBS snapshots of affected EC2 instances while they are still running. A snapshot captures disk state only, so do not stop the instance to take it: memory artifacts (running processes, network connections, in-memory keys) are lost the moment the instance stops. The snapshot preserves the disk as it was at the time of compromise:

aws ec2 create-snapshot \
  --volume-id vol-xxxxxxxxxxxxxxxxx \
  --description "IR-$(date +%Y%m%d-%H%M%S)-compromised-instance" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=incident,Value=active}]'
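
An instance often has more than one volume attached, so the single-volume command above may need to run in a loop. The sketch below assumes you know the instance ID; the function names and the `source-instance` tag key are illustrative choices, not an established convention.

```shell
#!/bin/sh
# Sketch: snapshot every EBS volume attached to one instance, without stopping it.

# Lists the attached volume IDs, one per line.
attached_volumes() {
  aws ec2 describe-volumes \
    --filters "Name=attachment.instance-id,Values=$1" \
    --query 'Volumes[].VolumeId' \
    --output text | tr '\t' '\n'
}

# Creates one snapshot per attached volume and prints each snapshot ID.
snapshot_instance() {
  instance_id=$1
  stamp=$(date -u +%Y%m%d-%H%M%S)
  for vol in $(attached_volumes "$instance_id"); do
    aws ec2 create-snapshot \
      --volume-id "$vol" \
      --description "IR-$stamp-$instance_id-$vol" \
      --tag-specifications "ResourceType=snapshot,Tags=[{Key=incident,Value=active},{Key=source-instance,Value=$instance_id}]" \
      --query 'SnapshotId' --output text
  done
}

# Usage (makes real API calls):
#   snapshot_instance i-xxxxxxxxxxxxxxxxx
```

Tagging each snapshot with its source instance makes it easier to reassemble the full disk set later.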

Enable VPC Flow Logs if not already active. They won’t capture historical traffic, but they’ll capture any ongoing lateral movement or data exfiltration that occurs during your investigation window:

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-xxxxxxxxxxxxxxxxx \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::forensic-evidence-[ACCOUNT_ID]/flow-logs/
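
To honor the "if not already active" condition, you can look for an existing flow log before creating one. This is a rough sketch under the assumption that any flow log on the VPC counts as active; `existing_flow_logs` and `ensure_flow_logs` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: create VPC Flow Logs only when the VPC has none yet.

# Prints IDs of flow logs already attached to the VPC (empty if none).
existing_flow_logs() {
  aws ec2 describe-flow-logs \
    --filter "Name=resource-id,Values=$1" \
    --query 'FlowLogs[].FlowLogId' \
    --output text
}

# Arguments: VPC ID, then the S3 destination ARN for the logs.
ensure_flow_logs() {
  vpc_id=$1
  dest_arn=$2
  found=$(existing_flow_logs "$vpc_id")
  if [ -n "$found" ] && [ "$found" != "None" ]; then
    echo "flow logs already active on $vpc_id: $found"
    return 0
  fi
  aws ec2 create-flow-logs \
    --resource-type VPC \
    --resource-ids "$vpc_id" \
    --traffic-type ALL \
    --log-destination-type s3 \
    --log-destination "$dest_arn"
}

# Usage (makes real API calls):
#   ensure_flow_logs vpc-xxxxxxxxxxxxxxxxx "arn:aws:s3:::forensic-evidence-[ACCOUNT_ID]/flow-logs/"
```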

Phase 2: IAM Containment (Minutes 10–20)

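The fastest, most reversible containment action is attaching an explicit deny-all inline policy. An explicit deny overrides every allow, so it blocks the principal regardless of what its other policies grant. Unlike deleting the access key, which is irreversible, the inline policy can be removed once you’ve rotated to clean credentials, and it takes effect within seconds. Save the following as deny-all.json: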

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "IRContainment",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*"
    }
  ]
}

aws iam put-user-policy \
  --user-name compromised-user \
  --policy-name IRContainment \
  --policy-document file://deny-all.json
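
Because reversibility is the point of this approach, it can help to script the rollback alongside the containment step. A minimal sketch follows; the function names are hypothetical, and it assumes the same policy name, IRContainment, used above.

```shell
#!/bin/sh
# Sketch: attach the deny-all policy and remove it again after recovery.

POLICY_NAME="IRContainment"

# Writes the deny-all policy document to the given path.
write_deny_all_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "IRContainment", "Effect": "Deny", "Action": "*", "Resource": "*"}
  ]
}
EOF
}

# Contains the user by attaching the inline deny-all policy.
contain_user() {
  write_deny_all_policy deny-all.json
  aws iam put-user-policy \
    --user-name "$1" \
    --policy-name "$POLICY_NAME" \
    --policy-document file://deny-all.json
}

# Lifts containment once clean credentials are in place.
release_user() {
  aws iam delete-user-policy --user-name "$1" --policy-name "$POLICY_NAME"
}

# Usage (makes real API calls):
#   contain_user compromised-user
#   release_user compromised-user
```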

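Revoke all active STS sessions. STS has no RevokeSession API call; for roles, the IAM console’s "Revoke active sessions" action works by attaching an inline policy that denies everything when aws:TokenIssueTime is earlier than a cutoff, and you can attach the same kind of policy yourself. Set the cutoff to the current time. This invalidates every session issued before the cutoff, including sessions whose tokens would otherwise stay valid for up to 12 hours. Because aws:TokenIssueTime exists only on temporary credentials, this policy does not affect an IAM user’s long-term access keys: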

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "DateLessThan": {
        "aws:TokenIssueTime": "2026-06-01T12:00:00Z"
      }
    }
  }]
}
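
The timestamp in the policy above is a fixed example; in a real incident the cutoff should be the moment you run the command. The sketch below generates the policy with the current UTC time and attaches it to a role. The policy name IRRevokeOlderSessions and both function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: deny every role session whose token was issued before "now".

# Emits the deny policy for tokens issued before the given UTC instant.
revoke_policy_json() {
  cutoff=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {
      "DateLessThan": {"aws:TokenIssueTime": "$cutoff"}
    }
  }]
}
EOF
}

# Attaches the policy to the role, using the current time as the cutoff.
revoke_role_sessions() {
  role=$1
  cutoff=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  aws iam put-role-policy \
    --role-name "$role" \
    --policy-name IRRevokeOlderSessions \
    --policy-document "$(revoke_policy_json "$cutoff")"
}

# Usage (makes a real API call):
#   revoke_role_sessions compromised-role
```

Sessions created after the cutoff are unaffected, so legitimate workloads can re-assume the role once the original credential source has been secured.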

Disable console access by deleting the login profile for the compromised IAM user:

aws iam delete-login-profile --user-name compromised-user

Phase 3: Blast Radius Analysis (Minutes 20–30)

With the principal contained, the key question is: what did they actually do? Query CloudTrail with Athena against the table defined over your S3-backed trail; if you use CloudTrail Lake instead, run the equivalent query against your event data store:

SELECT
  eventtime,
  eventsource,
  eventname,
  awsregion,
  sourceipaddress,
  useragent,
  requestparameters,
  errorcode
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%compromised-user%'
  AND eventtime > to_iso8601(current_timestamp - interval '24' hour)
ORDER BY eventtime ASC;

Look specifically for four classes of actions:

sts:AssumeRole — did they pivot to other roles in this account or cross-account?
s3:GetObject and s3:ListBucket — what data did they access?
iam:CreateUser, iam:CreateAccessKey, iam:CreateLoginProfile — did they establish persistence?
ec2:RunInstances, lambda:CreateFunction — did they deploy resources for ongoing access?

Note that object-level S3 calls such as GetObject are CloudTrail data events: they show up in this query only if data event logging was enabled on the trail before the incident.

A specific query to identify cross-account role pivots:

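SELECT
  eventtime,
  -- requestparameters is a JSON string in the standard Athena CloudTrail table
  json_extract_scalar(requestparameters, '$.roleArn') AS role_arn,
  errorcode
FROM cloudtrail_logs
WHERE eventname = 'AssumeRole'
  AND useridentity.arn LIKE '%compromised-user%'
  AND eventtime > to_iso8601(current_timestamp - interval '24' hour)
ORDER BY eventtime ASC;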

Any AssumeRole whose roleArn contains an account ID other than your own indicates a cross-account pivot, and means you need to contact the owner of that account.
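
Comparing account IDs by eye is error-prone under pressure. As a rough sketch, the helper below reads role ARNs (for example, exported from the query results) on standard input and flags those outside your own account. The function names and the output format are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: flag role ARNs that belong to an account other than the home account.

# Extracts the account ID, the fifth colon-separated field of an ARN.
arn_account() {
  printf '%s\n' "$1" | cut -d: -f5
}

# Argument: home account ID. Reads one role ARN per line on stdin.
cross_account_roles() {
  home=$1
  while IFS= read -r arn; do
    [ -n "$arn" ] || continue
    acct=$(arn_account "$arn")
    if [ "$acct" != "$home" ]; then
      echo "CROSS-ACCOUNT $acct $arn"
    fi
  done
}

# Usage:
#   home=$(aws sts get-caller-identity --query Account --output text)
#   cross_account_roles "$home" < assumed-role-arns.txt
```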

Post-Incident Recovery Checklist

Export CloudTrail logs to forensic S3 bucket with Object Lock
EBS snapshot of all affected instances (while running)
Enable VPC flow logs if not already active
Attach deny-all inline IAM policy to compromised principal
Revoke all active sessions (STS token revocation policy)
Disable console access for compromised user
Run Athena blast radius query for last 24 hours
Check for sts:AssumeRole pivots to other roles or accounts
Check for new IAM users, access keys, or login profiles created
Check for new EC2 instances, Lambda functions, or ECS tasks
Review S3 bucket policy changes during compromise window
Generate IAM credential report: aws iam generate-credential-report
Rotate all access keys for compromised principal and assumed roles
Scope breach notification obligations from S3 access logs
Document full timeline for post-incident review
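
For the credential report item, the report is a CSV whose third column is user_creation_time, which makes it a quick way to spot IAM users created during the compromise window. The sketch below is one possible approach: the function names are hypothetical, `base64 -d` assumes GNU coreutils or a compatible tool, and the report may take a few seconds to become available after it is requested.

```shell
#!/bin/sh
# Sketch: list IAM users created at or after a cutoff, from the credential report.

# Requests a fresh report and prints it as CSV.
# get-credential-report can fail if the report is still being generated; retry if so.
fetch_credential_report() {
  aws iam generate-credential-report >/dev/null
  aws iam get-credential-report --query Content --output text | base64 -d
}

# Argument: ISO 8601 cutoff. Reads the report CSV on stdin, prints user names.
# Timestamps in the same format compare correctly as plain strings.
users_created_since() {
  awk -F, -v cutoff="$1" 'NR > 1 && $3 >= cutoff { print $1 }'
}

# Usage (makes real API calls):
#   fetch_credential_report | users_created_since 2026-06-01T00:00:00
```

Any user this surfaces that you cannot tie to a known change should be treated as possible attacker persistence.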

The organizations that handle cloud incidents well practiced this playbook before they needed it — they ran tabletop exercises, tested the Athena queries, confirmed the forensic bucket had Object Lock, and verified that the containment policy snippet actually worked. Don’t build this runbook during the incident.

CloudDefender includes a one-click incident response workflow that preserves evidence, isolates the compromised principal, and runs blast radius analysis automatically — reducing the first-30-minutes playbook from manual steps to a supervised automated response.
