Zero Trust in AWS: Architecture Patterns That Actually Work

Most AWS environments are built on an implicit assumption: anything inside the VPC is trusted. Security groups control which resources can talk to each other within the VPC, and the VPC perimeter — the combination of NACLs, route tables, and internet gateways — is treated as the boundary between trusted and untrusted. This is the perimeter security model, and it has a fundamental flaw: the perimeter is much easier to breach than organizations realize, and once it’s breached, the attacker has lateral movement opportunities everywhere the perimeter was assumed to be the protection.

Figure: Traditional VPC-perimeter model vs. Zero Trust. In Zero Trust, every AWS API call is verified by IAM condition keys regardless of network origin.

The VPC-as-Perimeter Fallacy

A VPC provides network-level isolation. Security groups and NACLs control which IP addresses and ports can communicate with which resources. This is valuable, but it does not protect against a compromised EC2 instance, a compromised developer workstation with VPN access, or an attacker who holds stolen IAM credentials and has a network path to your resources.

When an EC2 instance is compromised (through an application vulnerability, a supply chain attack, or a malicious container image), the attacker inherits everything the instance profile allows. If that instance profile grants s3:GetObject on "Resource": "*", the attacker can read every object in every bucket in your account that no bucket policy or KMS key policy blocks. Security groups didn’t help because the attack originates from within the VPC, from a legitimate AWS identity. The attacker was already inside the perimeter.

Zero Trust replaces the implicit-trust model with a simple principle: no request is trusted by default, regardless of where it originates. Every API call to every AWS service must present verifiable identity, and the resource decides whether to honor that identity based on explicit conditions — not based on whether the call came from “inside the VPC.”

Principle 1 — Verify Explicitly

In Zero Trust, identity verification is not optional and not based on network location. In AWS, this means replacing security group rules as the primary access control mechanism with IAM roles and IAM condition keys.

The shift is conceptual: instead of asking “can this IP address reach this resource?” the system asks “is this authenticated IAM principal authorized to perform this action on this resource?” A DynamoDB table shouldn’t accept writes from any EC2 instance with network access — it should accept writes from EC2 instances carrying a specific IAM role, and ideally only when that role is further constrained by a session policy for that specific workflow.

Session policies are a powerful tool here. When a service assumes an IAM role via sts:AssumeRole, it can pass a session policy that further restricts permissions for that session. The effective permissions are the intersection of the role’s identity policy and the session policy, so a session policy can only narrow access, never widen it. A Lambda function that needs to read from one bucket can assume a role that has broad S3 access while passing a session policy that restricts the session to GetObject on a single bucket ARN for that execution. If those session credentials leak, the blast radius is contained to that one bucket.
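Here is a minimal sketch of that idea in Python. It only builds the arguments that would be handed to STS AssumeRole. The helper name, role ARN, and bucket name are hypothetical placeholders, and the real call is left as a comment so the block runs without AWS access.

```python
import json

def scoped_assume_role_params(role_arn, bucket, session_name, duration_seconds=900):
    """Build AssumeRole arguments whose session policy allows only
    s3:GetObject on objects in a single bucket (hypothetical helper)."""
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,  # 900 seconds is the STS minimum
        "Policy": json.dumps(session_policy),  # STS expects the policy as a JSON string
    }

params = scoped_assume_role_params(
    "arn:aws:iam::111122223333:role/example-reader",  # placeholder role ARN
    "example-invoices-bucket",                        # placeholder bucket name
    "invoice-export",
)
# With boto3 installed and credentials available, the call would look like:
#   creds = boto3.client("sts").assume_role(**params)["Credentials"]
print(params["Policy"])
```

The returned credentials can do no more than the session policy and the role’s own policy both allow, which is why the role itself can stay broader than any single workflow needs.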

Principle 2 — Use Least Privilege Access

Least privilege in Zero Trust isn’t just about having the right policies on IAM roles; it’s about making privilege contextual and time-bounded. A role with s3:GetObject on all buckets is far more dangerous than a session token that has s3:GetObject scoped to one bucket and expires in 15 minutes.

AWS Systems Manager Session Manager illustrates this well. Instead of opening port 22 on EC2 instances for SSH access (a network-level privilege that anything with a route to the instance, including another compromised instance, could exploit), Session Manager provides authenticated, session-based access that requires specific IAM permissions (ssm:StartSession) and, when session logging is configured, records session activity to CloudWatch Logs or S3. The session itself is scoped to a single instance, subject to configurable timeouts, and auditable. No inbound port on the instance is ever opened.

The same pattern applies to database access via IAM authentication for Aurora and RDS. Rather than a long-lived database password stored in an environment variable, an application whose IAM role has the rds-db:connect permission generates a short-lived authentication token, signed with its own IAM credentials, and presents it in place of a password. The token is valid for 15 minutes, the application generates a new one for its next connection, and no persistent credential exists to steal.
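As a rough sketch, the identity policy behind this pattern can be scoped to one database user on one database. The helper name, account ID, resource ID, and user name below are made-up placeholders, not values from any real environment.

```python
import json

def rds_iam_connect_policy(region, account_id, db_resource_id, db_user):
    """Identity policy allowing IAM-authenticated connections as a single
    database user on a single DB instance or cluster (hypothetical helper)."""
    resource = (
        f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_user}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "rds-db:connect", "Resource": resource}
        ],
    }

policy = rds_iam_connect_policy(
    "us-east-1", "111122223333", "db-EXAMPLERESOURCEID", "app_reader"
)
print(json.dumps(policy, indent=2))
# With boto3 installed, the token itself would come from something like:
#   boto3.client("rds").generate_db_auth_token(
#       DBHostname=host, Port=5432, DBUsername="app_reader")
```

Scoping the resource to a specific database user means a compromised role can connect only as that user, with whatever grants the database gives it.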

Principle 3 — Assume Breach

The most operationally transformative principle in Zero Trust is designing with the assumption that the perimeter has already been compromised. If you assume that an attacker is already inside your VPC with valid credentials, what controls would contain them? That question reshapes how you architect resource access.

VPC endpoints with endpoint policies are the primary mechanism for containing blast radius at the network layer under the assume-breach model. A VPC endpoint for S3 routes traffic from your VPC to S3 without traversing the public internet. More importantly, it lets you attach an endpoint policy that restricts which S3 buckets can be accessed via that endpoint. The two controls cover different directions. A bucket policy that denies any request not arriving through the endpoint stops stolen credentials from reading your buckets from outside the VPC. The endpoint policy stops a compromised instance, even one with broad S3 permissions, from copying data to an attacker-controlled external bucket. The second guarantee holds only when the endpoint is the instance’s sole path to S3: if the subnet also has a NAT gateway or internet gateway route, some S3 traffic (for example, requests to buckets in other Regions) can leave that way, and the endpoint policy never applies to it.
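A small sketch of what such an endpoint policy might look like, generated in Python from an allow-list of bucket names. The helper name, bucket names, and the specific action list are illustrative assumptions; a real policy would list whatever actions your workloads need.

```python
import json

def s3_endpoint_policy(allowed_buckets):
    """Endpoint policy that lets traffic through the endpoint reach only
    the listed buckets (hypothetical helper; action list is an assumption)."""
    resources = []
    for bucket in allowed_buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }

policy = s3_endpoint_policy(["example-app-data", "example-app-logs"])
print(json.dumps(policy, indent=2))
```

Anything not on the list is unreachable through the endpoint. Some AWS services fetch from AWS-owned buckets over the same endpoint, so the allow-list may need to include those as well.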

Interface VPC endpoints for SSM, Secrets Manager, and KMS are equally critical. Without them, your EC2 instances and VPC-attached Lambda functions reach these services through their public endpoints, which means the traffic leaves your VPC through a NAT or internet gateway, depends on an outbound internet route that an attacker can also use, and is not subject to any endpoint policy. With interface endpoints, the traffic stays within AWS’s private network and can be further controlled by endpoint policies that restrict which principals can use the endpoint and which resources they can access.

Concrete Patterns: IAM Condition Keys as Zero Trust Controls

IAM condition keys are the enforcement mechanism for Zero Trust in AWS. Several deserve specific attention.

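aws:SourceVpc and aws:SourceVpce in S3 bucket policies enforce that requests must come from a specific VPC or VPC endpoint. A bucket policy with a Deny statement that applies StringNotEquals to aws:SourceVpce (with a value such as vpce-xxxxxxxx) means any request that doesn’t come through the designated VPC endpoint is denied, including requests from a principal with s3:GetObject permission who is calling from outside the VPC. This is how you keep credentials from being used against the bucket from outside your network, even when they belong to a valid assumed-role session. Because the Deny below covers every principal and every S3 action, it also blocks console access and AWS services that read the bucket on your behalf, so add exceptions deliberately before applying it.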

{
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::my-sensitive-bucket",
    "arn:aws:s3:::my-sensitive-bucket/*"
  ],
  "Condition": {
    "StringNotEquals": {
      "aws:SourceVpce": "vpce-0a1b2c3d4e5f"
    }
  }
}

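aws:CalledVia restricts service-to-service calls to approved intermediaries. The key lists the AWS services that made the request on the principal’s behalf, and it is absent when the principal calls the service directly. A DynamoDB table policy with a Deny statement that applies whenever aws:CalledVia does not include cloudformation.amazonaws.com therefore rejects direct DynamoDB calls from any principal, even administrators, while allowing the same actions when CloudFormation performs them using the caller’s credentials. (The key is not populated when CloudFormation acts through a service role, so that case needs its own exception.) This prevents attackers with IAM credentials from modifying infrastructure state outside your approved change management process.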
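One way this could be expressed, sketched in Python. The helper name, table ARN, and action list are hypothetical. The choice of ForAllValues:StringNotEquals is my assumption about how to say "no value of aws:CalledVia is the approved service, or the key is missing", so test it in a sandbox account before relying on it.

```python
import json

def deny_unless_called_via(table_arn, service, actions):
    """Resource policy that denies the given actions unless the request
    was made through `service` (hypothetical helper).

    ForAllValues:StringNotEquals is true when every aws:CalledVia value
    differs from `service`, and also when the key is absent, which is the
    case for a direct call.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessCalledViaApprovedService",
                "Effect": "Deny",
                "Principal": "*",
                "Action": actions,
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringNotEquals": {"aws:CalledVia": [service]}
                },
            }
        ],
    }

policy = deny_unless_called_via(
    "arn:aws:dynamodb:us-east-1:111122223333:table/example-table",  # placeholder
    "cloudformation.amazonaws.com",
    ["dynamodb:UpdateTable", "dynamodb:DeleteTable"],  # illustrative action list
)
print(json.dumps(policy, indent=2))
```

Limiting the Deny to table-management actions keeps the application’s normal reads and writes unaffected while still forcing schema and capacity changes through CloudFormation.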

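aws:PrincipalOrgID restricts resource access to principals whose accounts belong to your AWS Organization. Applied to S3 bucket policies, KMS key policies, and SQS queue policies, it ensures that a principal from any outside account, such as an attacker using credentials from their own AWS account, is denied even when the policy is otherwise permissive. The caller’s credentials may be perfectly valid, but the resource’s policy requires an org membership that an external principal cannot satisfy. Note what the key does not cover: credentials stolen from one of your own principals still carry your organization ID wherever they are used, so pair aws:PrincipalOrgID with network conditions such as aws:SourceVpce to handle that case.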
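A sketch of a bucket-policy statement built around aws:PrincipalOrgID, again generated in Python. The helper name, bucket name, and organization ID are placeholders. The extra aws:PrincipalIsAWSService condition is an assumption on my part, added so that AWS services writing to the bucket are not caught by the Deny.

```python
import json

def deny_outside_org(bucket, org_id):
    """Bucket-policy statement denying any principal whose account is not
    in the given AWS Organization (hypothetical helper)."""
    return {
        "Sid": "DenyPrincipalsOutsideOrg",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {
            # Negated operator: also matches when the key is absent.
            "StringNotEquals": {"aws:PrincipalOrgID": org_id},
            # Assumption: exempt calls made by AWS service principals.
            "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
        },
    }

statement = deny_outside_org("my-sensitive-bucket", "o-exampleorgid")
print(json.dumps(statement, indent=2))
```

Both conditions must hold for the Deny to apply, so a request is rejected only when it comes from a non-service principal outside the organization.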

Common Pitfall: Bypassing VPC Endpoints

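One of the most common failures in Zero Trust AWS implementations is traffic that never touches the VPC endpoint, often because a developer hardcoded a public API endpoint. How a bypass happens depends on the endpoint type. With an S3 gateway endpoint, routing is decided by the subnet’s route table, so same-Region S3 calls use the endpoint by default, but calls from a subnet whose route table lacks the endpoint route, or calls to a bucket in another Region, leave through the NAT or internet gateway instead. With an interface endpoint that does not have private DNS enabled, a hardcoded hostname such as https://s3.us-east-1.amazonaws.com resolves to public addresses, and the SDK calls the public endpoint. Either way, the VPC endpoint policy is never evaluated.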

Detect this with CloudTrail: S3 events whose record has no top-level vpcEndpointId field arrived via the public endpoint. Object-level calls such as GetObject and PutObject are data events, which a trail logs only when you enable them, so turn on S3 data events for the buckets you care about. Then create an EventBridge rule or Athena query that identifies S3 API calls from EC2 instance profiles that lack vpcEndpointId. If your Zero Trust controls require endpoint usage, these calls represent violations worth investigating.
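The same check can be sketched in plain Python over already-parsed CloudTrail records. It assumes vpcEndpointId appears as a top-level field of the record, and it uses a heuristic of my own to spot instance-profile sessions: the assumed-role session name is the instance ID, so it starts with "i-". The function name and sample records are invented for illustration.

```python
def s3_calls_outside_endpoint(records):
    """Return (eventName, principal ARN, source IP) for S3 calls made by
    EC2 instance-profile sessions that did not use a VPC endpoint."""
    findings = []
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        if record.get("vpcEndpointId"):
            continue  # request arrived through a VPC endpoint
        identity = record.get("userIdentity", {})
        arn = identity.get("arn", "")
        session_name = arn.rsplit("/", 1)[-1]
        # Heuristic: instance-profile sessions are named after the instance ID.
        if identity.get("type") == "AssumedRole" and session_name.startswith("i-"):
            findings.append(
                (record.get("eventName"), arn, record.get("sourceIPAddress"))
            )
    return findings

# Two invented records: one public-endpoint call, one through an endpoint.
sample_records = [
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "sourceIPAddress": "203.0.113.10",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::111122223333:assumed-role/app-role/i-0abc1234def567890",
        },
    },
    {
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "sourceIPAddress": "10.0.1.25",
        "vpcEndpointId": "vpce-0a1b2c3d4e5f",
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::111122223333:assumed-role/app-role/i-0abc1234def567890",
        },
    },
]

for finding in s3_calls_outside_endpoint(sample_records):
    print(finding)
```

Only the first sample record is reported, since the second carries a vpcEndpointId. In practice the records would come from the trail’s log files or an Athena result set rather than an inline list.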

An SCP can enforce endpoint usage at the organization level: deny s3:* actions where aws:SourceVpce is null and the principal is a role used by EC2 instance profiles (matched with an aws:PrincipalArn pattern, which assumes those roles follow a naming convention). Applications must then route through VPC endpoints or receive access denied errors, which gives developers a strong reason to comply.
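A possible shape for that SCP, built in Python. The helper name and the "ec2-*" role naming pattern are hypothetical; the pattern only works if your instance-profile roles actually follow such a convention.

```python
import json

def scp_require_vpce_for_ec2_roles(role_name_pattern):
    """SCP denying S3 actions for matching roles when the request did not
    arrive through a VPC endpoint (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyS3OutsideVpcEndpoint",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": "*",
                # Both conditions must match for the Deny to apply.
                "Condition": {
                    "Null": {"aws:SourceVpce": "true"},
                    "ArnLike": {
                        "aws:PrincipalArn": f"arn:aws:iam::*:role/{role_name_pattern}"
                    },
                },
            }
        ],
    }

scp = scp_require_vpce_for_ec2_roles("ec2-*")  # assumed naming convention
print(json.dumps(scp, indent=2))
```

SCPs have no Principal element, which is why the role match lives in the condition block. Rolling this out in a test organizational unit first avoids breaking workloads that still depend on the public endpoint.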

From Architecture to Enforcement

Zero Trust is not a product you buy; it’s an architecture you implement incrementally. Start with the highest-risk data paths: what data in your AWS environment would be most damaging if exfiltrated? Apply VPC endpoint policies and bucket policies with aws:SourceVpce conditions there first. Add session policies to the Lambda functions and EC2 roles that access that data. Enable IAM Access Analyzer to surface any externally accessible resources. Enable S3 Block Public Access at the account level and use an SCP to stop anyone from turning it off, so new public buckets cannot be created.

The goal is not to make every API call require a VPN — it’s to make the authorization decision independent of network location. An IAM principal with the right permissions, calling via an approved VPC endpoint, on a resource with a policy that requires that endpoint: this is a request that no amount of network compromise can fake. That’s the standard Zero Trust in AWS is designed to reach.