If I were setting up a new AWS Organization today, I would use something like the following design. This is based on prior experience, combined with newer AWS features that I would like to see deployed more widely.

I am deliberately avoiding tools like AWS Control Tower, as they can be somewhat inflexible. A purpose-built Terraform environment will likely adapt better as an organization grows.

Management Account

The Management (or Central) account is the AWS account that controls your AWS Organization.

This account holds absolute power over all other accounts in its Organization, so it is critical that you secure it to the maximum extent possible. There is no reasonable way to limit the access this account has to your other accounts, and you should always assume it can gain root access to all of them.

Whenever possible, implement the following policy for this account:

  • Absolutely no code execution is permitted within the account
  • Access is strictly minimized
  • Any AWS features/systems which can be delegated must be delegated
    • AWS Billing/CUR data should be exported to S3 in another account, and processed remotely
  • The default answer to any request for access must be “no”

Logging Account

Second only to the management account is your Logging account. This account will collect, store, and preserve logs from all other systems, such as AWS CloudTrail.

If S3 Object Lock can be enabled on the log buckets, then the security controls in this account can be somewhat relaxed. However, if Object Lock is not possible or practical, then a policy equivalent to the management account's should be implemented.
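
As a minimal sketch of what "Object Lock enabled for logging" could mean in practice, the snippet below builds a default-retention configuration in the shape that the S3 PutObjectLockConfiguration API accepts. The 365-day retention period and the function name are assumptions for illustration, not recommendations from this design.

```python
import json


def object_lock_configuration(days: int, mode: str = "COMPLIANCE") -> dict:
    """Build a default-retention Object Lock configuration for a log bucket."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unknown Object Lock mode: {mode}")
    if days < 1:
        raise ValueError("retention must be at least one day")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


# Assumption: one year of retention; pick a value that matches your own policy.
LOG_RETENTION_DAYS = 365
config = object_lock_configuration(LOG_RETENTION_DAYS)
print(json.dumps(config, indent=2))
```

In compliance mode, a locked object version cannot be deleted or overwritten by any user until its retention period ends, which is why the other controls in this account can be relaxed once it is in place.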

Central Infra

Central Infra contains the key infrastructure that all other accounts depend on for high-privilege systems, such as Atlantis.

This account should be used for elements such as:

  • CodeBuild for the Atlantis ECR image
  • ECR storage for the Atlantis container image, shared to all other accounts in the org
  • Domain registration
  • Route 53 zones for the above domains

While this account is used for some execution and code building, it should build the absolute minimum required to enable all other AWS accounts to function. General purpose execution and code building should use a separate account.

Among the domains in this account should be at least one “service domain” used for AWS/backend infrastructure. When possible, this domain should be relatively short, easy to type and remember, and distinct from any production domains. For example, Google uses 1e100.net for portions of its infrastructure. As a bonus, this domain can double as a canary token: searching Google and GitHub for it can reveal internal material that has been shared where it should not be.

Central Networking

Optional: I haven’t yet used this pattern at scale, but I plan to in any future AWS account build out.

Contains “the one true VPC”, whose subnets will be shared out to all other accounts using AWS Resource Access Manager (RAM). If you use this pattern, consider using an SCP to prevent VPC creation in all other accounts, and delete their default VPCs.
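
The SCP mentioned above could look roughly like the following sketch, which builds the policy document as a Python dict. The account ID is a hypothetical placeholder, and exempting the networking account through a condition is one assumption among several workable options (another is to simply not attach the SCP to that account).

```python
import json

# Hypothetical placeholder for the Central Networking account ID.
NETWORKING_ACCOUNT_ID = "111111111111"


def deny_vpc_creation_scp(networking_account_id: str) -> dict:
    """Deny VPC creation everywhere except the Central Networking account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyVpcCreationOutsideNetworking",
                "Effect": "Deny",
                # CreateDefaultVpc is included so a deleted default VPC stays deleted.
                "Action": ["ec2:CreateVpc", "ec2:CreateDefaultVpc"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:PrincipalAccount": networking_account_id
                    }
                },
            }
        ],
    }


scp = deny_vpc_creation_scp(NETWORKING_ACCOUNT_ID)
print(json.dumps(scp, indent=2))
```

Note that an SCP only blocks future creation. Existing default VPCs still have to be deleted separately, for example during account bootstrapping.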

By using one global VPC, you gain several advantages:

  • Prevent any need for distributed IPAM or CIDR management
  • Prevent any CIDR overlaps among teams/VPCs
  • Greatly simplify network interconnects
  • Greatly simplify private network access from Tailscale/VPNs
  • Greatly simplify DNS logging and private zone attachment
  • Remove or reduce the need for duplicate high-cost infra, such as NAT Gateways

Though there are some potential disadvantages:

  • IP allocation must still be done with care to prevent unreasonable CIDR usage
  • Subnet naming must either be AZ-agnostic or use the physical AZ IDs
    • A subnet named “infra-usw2a” will only make sense in the source account, as AZ names (such as us-west-2a) map to different physical AZs in different accounts
  • This pattern is still somewhat rare so engineer experience is limited
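
To make the first two points concrete, here is a small sketch that carves a VPC CIDR into equal subnets and names each one after a physical AZ ID instead of an account-specific AZ name. The CIDR, the tier names, and the /20 subnet size are assumptions chosen for illustration.

```python
import ipaddress

# Assumptions for illustration: VPC range, subnet tiers, and subnet size.
VPC_CIDR = "10.0.0.0/16"
TIERS = ["infra", "general"]
AZ_IDS = ["usw2-az1", "usw2-az2", "usw2-az3"]  # AZ IDs are stable across accounts


def plan_subnets(vpc_cidr: str, tiers: list, az_ids: list, new_prefix: int = 20) -> dict:
    """Assign one non-overlapping subnet per (tier, AZ ID) pair, in order."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for tier in tiers:
        for az_id in az_ids:
            try:
                block = next(blocks)
            except StopIteration:
                raise ValueError(f"{vpc_cidr} is too small for this plan") from None
            plan[f"{tier}-{az_id}"] = str(block)
    return plan


plan = plan_subnets(VPC_CIDR, TIERS, AZ_IDS)
for name, cidr in plan.items():
    print(f"{name}: {cidr}")
```

Because all subnets come from one allocator over one range, overlaps cannot occur, and a name like “infra-usw2-az1” means the same thing in every account that the subnet is shared with.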

Variations:

  • Consider hosting a small set of VPCs rather than one, such as VPCs for:
    • General purpose systems
    • Atlantis, with one subnet per account
    • Sensitive infrastructure
    • Vendors

While “one true VPC” is the most cost-effective option, consider leaving room in your design to support a small set of VPCs in the future.

Generic Account Unit

A generic account unit (GAU) is the template that you will stamp out a hundred times.

Before creating a GAU, the following details should be collected and recorded:

  • Name: A unique name within your AWS Organization
    • Used for the IAM account alias, subdomain, and Terraform folders/identifiers
  • Owner
  • Purpose
  • Sensitivity and/or compliance details
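
One way to record these details is a small validated structure that also derives the identifiers built from the name. This is only a sketch: the service domain, the folder layout, and the class and field names are hypothetical, and the name rule is a deliberately conservative assumption (lowercase letters, digits, and single hyphens, 3 to 63 characters) meant to be safe for both a DNS label and an account alias.

```python
import re
from dataclasses import dataclass

# Hypothetical service domain owned by the Central Infra account.
SERVICE_DOMAIN = "example.net"

# Conservative assumption: 3-63 chars, lowercase alphanumerics and hyphens,
# no leading or trailing hyphen.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


@dataclass(frozen=True)
class AccountRequest:
    name: str
    owner: str
    purpose: str
    sensitivity: str

    def __post_init__(self):
        if not _NAME_RE.match(self.name) or "--" in self.name:
            raise ValueError(f"invalid account name: {self.name!r}")

    @property
    def subdomain(self) -> str:
        return f"{self.name}.{SERVICE_DOMAIN}"

    @property
    def terraform_dir(self) -> str:
        # Hypothetical repository layout.
        return f"accounts/{self.name}"


request = AccountRequest("data-eng", "data team", "analytics pipelines", "internal")
print(request.subdomain, request.terraform_dir)
```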

The template will contain at least the following, each detailed in the sections below:

  • A subdomain of the Central Infra service domain
    • The Central Infra account should delegate the zone to this account using an NS record
    • All services within the account should default to using this subdomain
  • Atlantis
  • Monitoring roles/tools
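
The NS delegation in the first item can be sketched as a Route 53 change batch, in the shape that the ChangeResourceRecordSets API accepts. The subdomain and name server values below are hypothetical placeholders. In practice the name servers come from the hosted zone created in the new account.

```python
import json


def delegation_change_batch(subdomain: str, name_servers: list, ttl: int = 300) -> dict:
    """Build the parent-zone change that delegates a subdomain via an NS record."""
    if not name_servers:
        raise ValueError("at least one name server is required")
    return {
        "Comment": f"Delegate {subdomain} to its own account",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": subdomain,
                    "Type": "NS",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ns} for ns in name_servers],
                },
            }
        ],
    }


# Hypothetical values for illustration only.
batch = delegation_change_batch(
    "data-eng.example.net",
    ["ns-1.example-dns.com", "ns-2.example-dns.net"],
)
print(json.dumps(batch, indent=2))
```

The change is applied to the service domain's zone in Central Infra, while the delegated zone itself lives in the new account.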

To enable this “GAU”, some form of bootstrapping process will be required, usually in the form of a script that does something like:

  • Create a new account in the Organization, from the Management account
  • Create a subdomain in the Central Infra account
  • Create other infra required to represent the new account, such as a GitHub repo (for Terraform) and the like
  • Step into the account and create the initial role/access required for Terraform to start
  • Run Terraform in the account to create the Atlantis infrastructure
  • “Hand over” control of Terraform to Atlantis
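
The shape of such a script might look like the sketch below: an ordered list of steps that stops at the first failure and reports how far it got. Every step here is a no-op placeholder, and the function and variable names are hypothetical. Real steps would call AWS, GitHub, and Terraform.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], None]]


def run_bootstrap(account_name: str, steps: List[Step]) -> List[str]:
    """Run each step in order; stop and report progress on the first failure."""
    completed: List[str] = []
    for description, action in steps:
        try:
            action(account_name)
        except Exception as exc:
            raise RuntimeError(
                f"bootstrap of {account_name} failed at '{description}' "
                f"after completing {len(completed)} step(s)"
            ) from exc
        completed.append(description)
    return completed


def placeholder(account_name: str) -> None:
    """Stand-in for a real step; does nothing."""


STEPS: List[Step] = [
    ("create account in the Organization", placeholder),
    ("create subdomain in Central Infra", placeholder),
    ("create supporting infra such as the GitHub repo", placeholder),
    ("create the initial Terraform role in the account", placeholder),
    ("run Terraform to create Atlantis", placeholder),
    ("hand Terraform over to Atlantis", placeholder),
]

for done in run_bootstrap("data-eng", STEPS):
    print("done:", done)
```

Keeping the steps as data makes the order explicit and leaves room to add resume or dry-run behavior later.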

Atlantis

Atlantis should be contained within the account that it manages. In this way each account is responsible for itself and a compromise of any account is easily contained. In addition, special purpose or low-security accounts are easily isolated from one another.

The S3/KMS/IAM components should be locked down such that their use is only possible from the ECS container under normal circumstances. Under special circumstances a root user can follow a take-over procedure to “take back” control of Terraform.

Atlantis infrastructure can take many forms, but I recommend something like:

  • IAM role for Atlantis, accessible only to ECS
  • VPC (or subnet shared by RAM) containing
    • ECS Fargate container running Atlantis image built in Central Infra
    • An ALB with two routes, fronting the Fargate container
      • /events for GitHub webhooks to access, open to the internet
      • Catch all, filtered to the internal network and perhaps only to the VPN subnet
  • KMS key accessible only to Atlantis
  • An S3 bucket for Terraform state
    • Encrypted by the Atlantis KMS key
    • Versioning enabled
    • Possibly with object lock
  • DynamoDB table for Terraform locks
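
Two pieces of the list above lend themselves to a short sketch: the trust policy that lets only ECS tasks assume the Atlantis role, and a bucket policy that denies state access to every other principal. The bucket name, role ARN, and function names are hypothetical, and a deny-all-but-one-role policy is one possible way to implement the lockdown, not the only one.

```python
import json


def ecs_task_trust_policy() -> dict:
    """Trust policy allowing only ECS tasks to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def state_bucket_policy(bucket: str, atlantis_role_arn: str) -> dict:
    """Deny all S3 access to the state bucket unless the caller is the Atlantis role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllButAtlantis",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": atlantis_role_arn}
                },
            }
        ],
    }


# Hypothetical names for illustration only.
ROLE_ARN = "arn:aws:iam::222222222222:role/atlantis"
bucket_policy = state_bucket_policy("data-eng-terraform-state", ROLE_ARN)
trust_policy = ecs_task_trust_policy()
print(json.dumps(bucket_policy, indent=2))
```

With a policy like this in place, a take-over procedure would need to change or remove the bucket policy before anyone else can read the state, so that step should be written down and tested in advance.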

Monitoring Roles

The specific form of monitoring roles will likely vary depending on the tools you deploy over time.

Regardless, you should probably create at least one IAM role, with the SecurityAudit managed policy attached, which can be assumed by your management account. This role can be used later for any auditing and tooling.
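
A minimal sketch of such a role's trust policy follows, paired with the ARN of the AWS-managed SecurityAudit policy. The trusted account ID is a hypothetical placeholder, and trusting the whole account (rather than a single role inside it) is an assumption you may want to tighten.

```python
import json
import re

# ARN of the AWS-managed SecurityAudit policy to attach to the role.
SECURITY_AUDIT_POLICY_ARN = "arn:aws:iam::aws:policy/SecurityAudit"


def audit_role_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals in one account assume the audit role."""
    if not re.fullmatch(r"\d{12}", trusted_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical ID of the account that is allowed to assume the role.
audit_trust = audit_role_trust_policy("333333333333")
print(json.dumps(audit_trust, indent=2))
```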