Security Groups vs. Network ACLs: Choosing Your AWS VPC Firewall

When designing a secure Virtual Private Cloud (VPC) environment in Amazon Web Services (AWS), administrators rely on multiple layers of control to manage network traffic. The two foundational components for filtering traffic at the network level are Security Groups (SGs) and Network Access Control Lists (NACLs).

They look similar in the console, but they fail in very different ways. Security groups are usually where you describe application intent. Network ACLs are where you enforce broad subnet boundaries or emergency denies.

The Role of Firewalls in AWS VPC

AWS provides network security at two primary levels within a VPC:

Instance Level (Security Groups): Acts as a stateful firewall for specific EC2 instances or other resources with network interfaces (like RDS databases or Elastic Load Balancers). It controls traffic to and from the network interface.
Subnet Level (Network ACLs): Acts as a stateless firewall for entire subnets, controlling traffic flow entering or leaving the subnet boundary.

Deep Dive into Security Groups (SGs)

Security Groups function as the primary, fine-grained firewall for individual resources. They are stateful and filter by protocol, port, and source or destination.

Key Characteristics of Security Groups

Feature	Description	Implications for Use
Scope	Applies directly to the Elastic Network Interface (ENI) of an instance.	Controls traffic flow to and from the instance itself.
Statefulness	Stateful. If an inbound request is allowed, the corresponding return traffic (outbound response) is automatically allowed, regardless of outbound rules. Likewise, responses to allowed outbound requests are allowed back in, regardless of inbound rules.	Simplifies configuration; you only need rules for the direction in which traffic is initiated.
Rule Type	Allow only. Explicit deny rules are not possible. Traffic that doesn't match an explicit `ALLOW` rule is implicitly denied.	Focuses on defining what is permitted.
Evaluation	Rules are not numbered. All rules are evaluated together, and traffic is allowed if any single rule permits it; otherwise the implicit deny applies.	Ordering does not matter; all rules are treated equally.
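The evaluation model in this table can be sketched in a few lines of Python. This is a simplified model, not AWS's implementation: it covers only TCP/UDP rules with IPv4 source ranges, and the `SgRule` type and `sg_allows` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class SgRule:
    """Hypothetical, simplified inbound security group rule (allow only)."""
    protocol: str   # "tcp" or "udp" in this sketch
    from_port: int
    to_port: int
    cidr: str       # IPv4 source range


def sg_allows(rules, protocol, port, src_ip):
    """Allow if ANY rule matches; the order of the rules is irrelevant.

    There is no deny rule type: if nothing matches, the implicit deny applies.
    """
    addr = ip_address(src_ip)
    return any(
        r.protocol == protocol
        and r.from_port <= port <= r.to_port
        and addr in ip_network(r.cidr)
        for r in rules
    )


rules = [
    SgRule("tcp", 443, 443, "0.0.0.0/0"),
    SgRule("tcp", 22, 22, "203.0.113.0/24"),
]

# Reversing the rule list never changes a decision.
for ordering in (rules, list(reversed(rules))):
    print(sg_allows(ordering, "tcp", 22, "203.0.113.7"),
          sg_allows(ordering, "tcp", 22, "198.51.100.9"))
```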

Security Group Configuration Example

To allow SSH access (port 22) to an EC2 instance, you only need an inbound rule. The outbound rule for the SSH response is handled automatically by the stateful nature of the SG.

Type	Protocol	Port Range	Source	Description
Inbound	TCP	22	Specific admin CIDR (e.g., 203.0.113.10/32); avoid 0.0.0.0/0	Allow SSH access
Outbound	All	All	0.0.0.0/0	(Default: Allows all traffic, but this can be restricted if needed)

```text
# Conceptual representation of a stateful flow
User (Source IP) --> [Inbound SG Rule: ALLOW 22] --> EC2 Instance
EC2 Instance (Response) --> [Implicit State Tracked] --> User (Response received)
```
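The stateful flow can also be modeled as a table of tracked connections. The sketch below is a deliberately simplified illustration built around a hypothetical `StatefulFirewall` class: it has inbound rules only and no outbound rules at all, so the response is allowed purely because the request was tracked. Real security group tracking has details, such as idle timeouts, that this model ignores.

```python
class StatefulFirewall:
    """Hypothetical model of security group connection tracking.

    A flow is a tuple (src_ip, src_port, dst_ip, dst_port).
    """

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)
        self.tracked = set()

    def inbound(self, flow):
        """Evaluate a new inbound request against the inbound rules."""
        if flow[3] in self.inbound_ports:  # flow[3] is the destination port
            self.tracked.add(flow)         # remember the connection
            return True
        return False

    def outbound_response(self, flow):
        """Allow a response only if it reverses a tracked inbound flow."""
        src_ip, src_port, dst_ip, dst_port = flow
        return (dst_ip, dst_port, src_ip, src_port) in self.tracked


fw = StatefulFirewall(inbound_ports=[22])
request = ("203.0.113.7", 50512, "10.0.1.10", 22)
print(fw.inbound(request))                                             # True
print(fw.outbound_response(("10.0.1.10", 22, "203.0.113.7", 50512)))   # True
print(fw.outbound_response(("10.0.1.10", 22, "198.51.100.9", 40000)))  # False
```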

Best Practice Tip: Always define Security Group rules using the principle of least privilege. Whenever possible, restrict source IP ranges instead of allowing 0.0.0.0/0.
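Applying this tip with the AWS SDK for Python (boto3) might look like the sketch below. `authorize_security_group_ingress` and the `IpPermissions` structure belong to the boto3 EC2 client; the helper names, group ID, and admin CIDR are hypothetical. The client is passed in as a parameter so the guard logic can be exercised without AWS credentials, and only IPv4 ranges are handled.

```python
from ipaddress import ip_network


def ssh_permission(admin_cidr):
    """Build one IpPermissions entry for SSH limited to a single source range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": admin_cidr, "Description": "Allow SSH from admin"}],
    }


def allow_admin_ssh(ec2_client, group_id, admin_cidr):
    """Add the SSH rule, refusing a range that covers the whole internet.

    ec2_client is assumed to be boto3.client("ec2").
    """
    network = ip_network(admin_cidr)  # raises ValueError on malformed input
    if network.version != 4:
        raise ValueError("IPv6 ranges go in Ipv6Ranges; not handled in this sketch")
    if network.prefixlen == 0:
        raise ValueError("refusing to open SSH to 0.0.0.0/0")
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[ssh_permission(str(network))],
    )
```

A call such as `allow_admin_ssh(boto3.client("ec2"), group_id, "203.0.113.10/32")` adds the rule; passing `0.0.0.0/0` fails before any API call is made.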

Deep Dive into Network ACLs (NACLs)

Network ACLs provide a second layer of defense, acting as a stateless filter at the subnet boundary. They are powerful for network segmentation and broad denial policies.

Key Characteristics of Network ACLs

Feature	Description	Implications for Use
Scope	Applies to an entire VPC subnet. A subnet can only be associated with one NACL at a time.	Controls all traffic entering or leaving the subnet, affecting all instances within it.
Statefulness	Stateless. Both inbound requests and the corresponding outbound responses must be explicitly allowed.	Requires careful configuration for return traffic (ephemeral ports).
Rule Type	Allow and Deny. You can explicitly define rules to permit or block traffic.	Excellent for blocking known malicious IPs or denying specific protocols subnet-wide.
Evaluation	Rules are numbered (1 to 32766) and evaluated sequentially, starting from the lowest number. The first matching rule is applied immediately.	Rule ordering is critical. The implicit deny rule (the last rule processed) denies everything that has not been explicitly allowed.
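First-match evaluation is the key behavioral difference from security groups. A minimal sketch, again a simplified model with hypothetical names (`NaclRule`, `nacl_decision`) that handles one direction, TCP/UDP, and IPv4 only:

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    """Hypothetical, simplified NACL entry for one direction."""
    number: int     # 1-32766; the lowest number is evaluated first
    action: str     # "allow" or "deny"
    protocol: str   # "tcp", "udp", or "all"
    from_port: int
    to_port: int
    cidr: str


def nacl_decision(rules, protocol, port, ip):
    """Return (action, rule number) for the first matching rule."""
    addr = ip_address(ip)
    for r in sorted(rules, key=lambda rule: rule.number):
        if (r.protocol in ("all", protocol)
                and r.from_port <= port <= r.to_port
                and addr in ip_network(r.cidr)):
            return r.action, r.number  # first match wins; later rules are never consulted
    return "deny", "*"  # nothing matched: the default * rule denies


inbound = [
    NaclRule(100, "allow", "tcp", 80, 80, "0.0.0.0/0"),
    NaclRule(90, "deny", "all", 0, 65535, "198.51.100.0/24"),
]
print(nacl_decision(inbound, "tcp", 80, "198.51.100.20"))  # ('deny', 90)
print(nacl_decision(inbound, "tcp", 80, "203.0.113.5"))    # ('allow', 100)
print(nacl_decision(inbound, "tcp", 22, "203.0.113.5"))    # ('deny', '*')
```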

Handling Stateless Traffic (Ephemeral Ports)

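Because NACLs are stateless, you must account for the ephemeral ports used by clients connecting to your servers. When a client initiates a connection, it uses a well-known destination port (e.g., 80 for HTTP) and a high-numbered source port chosen from its ephemeral range. That range depends on the client's operating system (commonly 32768-60999 on Linux and 49152-65535 on Windows), so AWS documentation suggests opening 1024-65535 to cover the different clients that may connect.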
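The reason for the wide range is easy to check mechanically. The sketch below compares two candidate NACL port ranges against the commonly cited OS defaults; those defaults are tunable on each host, so treat them as assumptions, and `covers` is a hypothetical helper.

```python
# Commonly cited OS defaults for client ephemeral ports. These are tunable
# per host, so treat them as assumptions and verify against your own clients.
CLIENT_RANGES = {
    "linux": (32768, 60999),
    "windows": (49152, 65535),
}


def covers(rule_range, client_range):
    """True if a NACL port range fully contains a client's ephemeral range."""
    return rule_range[0] <= client_range[0] and client_range[1] <= rule_range[1]


for rule_range in [(1024, 65535), (32768, 60999)]:
    result = {name: covers(rule_range, rng) for name, rng in CLIENT_RANGES.items()}
    print(rule_range, result)
# (1024, 65535) {'linux': True, 'windows': True}
# (32768, 60999) {'linux': True, 'windows': False}
```

A rule sized for Linux clients alone would silently break responses to Windows clients, which is why the broader range is the safer default for public-facing subnets.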

To allow web traffic (HTTP) into a subnet, you need two rules:

Inbound Rule: Allows traffic on the destination port (e.g., 80).
Outbound Rule: Allows return traffic to the client's ephemeral port, which is the destination port of the response.

Rule #	Direction	Protocol	Port Range	Source/Destination	Rule Action
100	Inbound	TCP	80	0.0.0.0/0	ALLOW (Web traffic in)
110	Outbound	TCP	1024-65535	0.0.0.0/0	ALLOW (Web response out - Ephemeral ports)
*	Inbound and Outbound	All	All	0.0.0.0/0	DENY (Default rule in each direction; processed last and cannot be removed)
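With boto3, the two numbered rules in this table could be created roughly as follows. `create_network_acl_entry` and its `NetworkAclId`, `RuleNumber`, `Protocol`, `RuleAction`, `Egress`, `CidrBlock`, and `PortRange` parameters belong to the EC2 client; the NACL ID and helper names are hypothetical. Inbound and outbound entries are distinguished by `Egress`, and the `*` rule already exists in both directions.

```python
TCP = "6"  # NACL entries take the IP protocol number as a string


def web_nacl_entries(nacl_id):
    """Keyword arguments for the inbound HTTP rule and its outbound return rule."""
    return [
        {  # Rule 100, inbound: HTTP requests into the subnet
            "NetworkAclId": nacl_id, "RuleNumber": 100, "Protocol": TCP,
            "RuleAction": "allow", "Egress": False,
            "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 80, "To": 80},
        },
        {  # Rule 110, outbound: responses to the clients' ephemeral ports
            "NetworkAclId": nacl_id, "RuleNumber": 110, "Protocol": TCP,
            "RuleAction": "allow", "Egress": True,
            "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 1024, "To": 65535},
        },
    ]


def apply_entries(ec2_client, entries):
    """ec2_client is assumed to be boto3.client("ec2")."""
    for entry in entries:
        ec2_client.create_network_acl_entry(**entry)
```

Keeping both entries in one function makes it harder to ship the inbound rule without its return path.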

Warning: If you miss the corresponding outbound rule for ephemeral ports in a NACL, the traffic will reach the instance (due to the inbound rule) but the response will be dropped at the subnet boundary, leading to connection timeouts.

Comparison Summary: SG vs. NACL

The following table summarizes the crucial differences between the two firewall types:

Feature	Security Group (SG)	Network ACL (NACL)
Applicability Scope	Instance/ENI Level	Subnet Level
State	Stateful	Stateless
Rule Types	Allow Only	Allow and Deny
Rule Evaluation	All rules evaluated, no specific order.	Rules evaluated sequentially by number (lowest first); first match wins.
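Default Behavior	A new SG has no inbound rules (all inbound denied) and one rule allowing all outbound. The VPC's default SG also allows inbound traffic from its own members.	The default NACL allows all inbound/outbound. A new custom NACL denies all inbound/outbound until you add rules.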
Effect on Traffic	Only applies rules if traffic is destined for or originating from an associated resource.	Filters traffic passing the subnet boundary, affecting all resources in the subnet.

Choosing the Right Firewall: Scenarios and Best Practices

Successful VPC security relies on using SGs and NACLs together in a layered approach (Defense in Depth).

When to Prioritize Security Groups

Security Groups should be the primary tool for filtering network access due to their stateful nature and ability to reference other SGs, simplifying application configuration.

Fine-Grained Application Control: Use SGs to define exactly which ports and protocols are required for a specific application (e.g., only allowing traffic on port 3306 from the web server SG to the database SG).
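Internal Communication: Manage security for traffic between instances within the same subnet or across subnets (e.g., ensuring a load balancer can talk to its target groups). Traffic between instances in the same subnet never crosses the subnet boundary, so NACLs do not evaluate it; security groups are the only filter there.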
Ease of Management: Since they are stateful, SGs require fewer rules and are less error-prone than managing ephemeral ports with NACLs.
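Referencing a security group instead of an IP range uses `UserIdGroupPairs` inside the same boto3 `IpPermissions` structure. A sketch with a hypothetical helper name, assuming MySQL on port 3306 as in the example above:

```python
def allow_from_group(ec2_client, target_group_id, source_group_id, port):
    """Let members of source_group_id reach target_group_id on one TCP port.

    ec2_client is assumed to be boto3.client("ec2").
    """
    ec2_client.authorize_security_group_ingress(
        GroupId=target_group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # The source is a group, not a CIDR: any network interface that
            # carries source_group_id qualifies, whatever its IP address.
            "UserIdGroupPairs": [{"GroupId": source_group_id,
                                  "Description": "Traffic from web tier"}],
        }],
    )
```

For example, `allow_from_group(boto3.client("ec2"), db_sg_id, web_sg_id, 3306)`, where the two IDs are your own database and web server groups.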

When to Implement Network ACLs

NACLs are best used for setting broad, network-wide boundaries and segmentation policies.

Broad Denial Policies: Use explicit DENY rules numbered below your ALLOW rules (for example, a DENY at rule 90 ahead of an ALLOW at rule 100) to block specific malicious IP addresses or ranges across an entire subnet before the traffic reaches any instance.
Subnet Segmentation: Enforce strict boundaries between layers of your architecture (e.g., a database subnet NACL that allows inbound traffic only from the application subnets' CIDR ranges and denies everything else, regardless of how a security group might be configured).
Compliance Requirements: Certain compliance standards may mandate subnet-level filtering, making NACLs essential.
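Stateless Filtering: Because NACLs do not track connections, a NACL change applies immediately to packets of flows that are already established. Removing a security group rule, by contrast, might not interrupt connections that are already being tracked. When traffic must be cut off immediately, for example during an incident, a NACL deny is the more reliable tool.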
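A subnet-wide block for a known bad range only works if the deny sorts ahead of the allows. The sketch below, with a hypothetical helper name, builds keyword arguments for boto3's `create_network_acl_entry` and picks a rule number below every existing inbound rule. `Protocol` set to `"-1"` means all protocols, so no port range is needed.

```python
def emergency_deny_entry(nacl_id, bad_cidr, existing_inbound_numbers):
    """Inbound deny for bad_cidr, numbered ahead of all existing inbound rules."""
    lowest = min(existing_inbound_numbers, default=100)
    if lowest <= 1:
        raise ValueError("rule 1 is taken; renumber before adding a deny ahead of it")
    number = max(1, lowest - 10)  # keep a gap below existing rules when possible
    return {
        "NetworkAclId": nacl_id,
        "RuleNumber": number,
        "Protocol": "-1",      # all protocols
        "RuleAction": "deny",
        "Egress": False,       # inbound
        "CidrBlock": bad_cidr,
    }
```

The returned dictionary can be passed as `ec2.create_network_acl_entry(**entry)`. With allows at 100 and 110, the deny lands at 90.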

Mistakes that cause outages

The most common NACL outage is forgetting return traffic. Someone allows inbound TCP 443 to a public subnet but has no outbound rule covering the clients' ephemeral ports. The load balancer or instance receives the SYN, but the response is dropped on the way out. From the client side it looks like a timeout, and from the instance side the service may look perfectly healthy.
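A pre-deployment check for this failure can compare a subnet's inbound allows with its outbound allows. The sketch below is a simplification with hypothetical names: it ignores CIDRs and deny rules and only asks whether the outbound allow ranges, taken together, cover the clients' ephemeral range (1024-65535 by default).

```python
def range_covered(target, ranges):
    """True if the union of inclusive (lo, hi) ranges covers target entirely."""
    lo, hi = target
    next_needed = lo
    for start, end in sorted(ranges):
        if start > next_needed:
            break  # there is a gap before this range
        next_needed = max(next_needed, end + 1)
        if next_needed > hi:
            return True
    return next_needed > hi


def ports_at_risk(inbound_allow_ports, outbound_allow_ranges, client_range=(1024, 65535)):
    """Inbound ports whose responses could be dropped on the way out."""
    if range_covered(client_range, outbound_allow_ranges):
        return []
    return sorted(inbound_allow_ports)


print(ports_at_risk([443], [(443, 443)]))                     # [443]
print(ports_at_risk([443], [(1024, 32767), (32768, 65535)]))  # []
```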

Another mistake is using NACLs for per-application policy. If a subnet contains web, worker, and admin instances, one NACL applies to all of them. A rule added for one workload can unexpectedly expose or break another workload in the same subnet. If you need different network behavior, use different security groups, and consider separate subnets only when there is a real boundary to enforce.

Rule numbering also deserves care. Leave gaps such as 100, 110, 120 instead of 1, 2, 3 so you can insert emergency rules later. Remember that the first match wins. A deny at rule 90 will beat an allow at rule 100, even if the allow looks more specific to the person reading the console quickly.
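Inserting an emergency rule then comes down to finding a free number between two existing ones. A hypothetical helper, shown only to make the value of gaps concrete:

```python
def number_before(existing, target):
    """Pick an unused rule number that is evaluated just ahead of `target`."""
    lower = max((n for n in existing if n < target), default=0)
    if target - lower < 2:
        raise ValueError(f"no free number between {lower} and {target}; renumber first")
    return (lower + target) // 2


print(number_before([100, 110, 120], 110))  # 105
```

With rules at 100, 110, and 120 there is room at 105. With rules packed at 1, 2, and 3, `number_before([1, 2, 3], 2)` raises, because nothing fits between 1 and 2 and the whole list would have to be renumbered during the incident.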

For security groups, the common mistake is broad source ranges. 0.0.0.0/0 on 443 for a public load balancer may be normal. The same source on SSH, RDP, Redis, PostgreSQL, or an internal admin API is usually a problem. Prefer security group references inside the VPC and narrow CIDRs for operator access.
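A quick audit for this mistake can run over the `SecurityGroups` list returned by boto3's `describe_security_groups` (for many groups, page through the results). The port list mirrors the examples in the text and is an assumption to extend for your own services; only TCP and all-traffic (`-1`) rules are checked, and the helper name is hypothetical.

```python
SENSITIVE_TCP_PORTS = {22: "SSH", 3389: "RDP", 5432: "PostgreSQL", 6379: "Redis"}
WORLD = {"0.0.0.0/0", "::/0"}


def world_open_findings(security_groups):
    """Return (group_id, service, cidr) for sensitive ports open to everyone."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            if proto == "-1":
                ports = list(SENSITIVE_TCP_PORTS)  # all traffic includes every port
            elif proto in ("tcp", "6"):
                lo, hi = perm["FromPort"], perm["ToPort"]
                ports = [p for p in SENSITIVE_TCP_PORTS if lo <= p <= hi]
            else:
                continue  # UDP, ICMP, etc. are out of scope for this sketch
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            for port in ports:
                for cidr in cidrs:
                    if cidr in WORLD:
                        findings.append((sg["GroupId"], SENSITIVE_TCP_PORTS[port], cidr))
    return findings
```

A public load balancer group with 443 open to `0.0.0.0/0` produces no finding, while the same source on port 22 does.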

When you inherit an existing VPC, export the rules and group them by intent: public entry points, app-to-app traffic, data stores, administration, and emergency denies. Rules without a clear owner or reason are where stale exposure usually lives.
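One way to start that export is to flatten every rule from `describe_security_groups` and bucket it by an ownership tag. The `Intent` tag key is a hypothetical convention, not something AWS defines; groups without it land in an `UNOWNED` bucket, which is where the review should start.

```python
from collections import defaultdict


def rules_by_intent(security_groups, tag_key="Intent"):
    """Map intent tag value -> list of (group_id, direction, protocol, from, to)."""
    buckets = defaultdict(list)
    for sg in security_groups:
        tags = {t["Key"]: t["Value"] for t in sg.get("Tags", [])}
        intent = tags.get(tag_key, "UNOWNED")
        for direction, key in (("ingress", "IpPermissions"), ("egress", "IpPermissionsEgress")):
            for perm in sg.get(key, []):
                buckets[intent].append((
                    sg["GroupId"], direction, perm.get("IpProtocol"),
                    perm.get("FromPort"), perm.get("ToPort"),
                ))
    return dict(buckets)
```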

The Defense-in-Depth Approach

In a typical, well-designed VPC, traffic flows must pass through both a NACL and a Security Group. If either security control denies the traffic, the packet is dropped.

Inbound Flow: Traffic enters the subnet -> NACL checks rules -> Traffic reaches instance ENI -> Security Group checks rules -> Traffic reaches the application.
Outbound Flow: Application generates response -> Security Group (Stateful check passed) -> Traffic leaves instance ENI -> NACL checks rules -> Traffic leaves the subnet.
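The two flows can be combined into one round-trip check. The sketch is a simplified model with a hypothetical function name: rules are reduced to port sets and ranges, and the security group never blocks the response because the request was tracked.

```python
def round_trip(nacl_in_ports, sg_in_ports, nacl_out_ranges, server_port, client_port):
    """Return where a request/response pair is dropped, or 'delivered'."""
    steps = [
        ("inbound NACL", server_port in nacl_in_ports),
        ("inbound security group", server_port in sg_in_ports),
        # Response path: the SG passes it via connection tracking, but the
        # stateless NACL must explicitly allow the client's ephemeral port.
        ("outbound NACL", any(lo <= client_port <= hi for lo, hi in nacl_out_ranges)),
    ]
    for layer, allowed in steps:
        if not allowed:
            return f"dropped at {layer}"
    return "delivered"


print(round_trip({443}, {443}, [(1024, 65535)], 443, 51000))  # delivered
print(round_trip({443}, {443}, [(443, 443)], 443, 51000))     # dropped at outbound NACL
print(round_trip({443}, set(), [(1024, 65535)], 443, 51000))  # dropped at inbound security group
```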

By leveraging the NACL for coarse segmentation and denial rules, and the SG for precise, stateful, application-level permissions, you maximize security effectiveness while maintaining configuration simplicity.

A practical design pattern

For most application VPCs, start with security groups. Give the load balancer a public-facing security group, give the application instances a security group that only accepts traffic from the load balancer security group, and give the database a security group that only accepts traffic from the application security group on the database port. That model follows the app dependency graph and survives IP changes.
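The chain can be expressed as data: each group lists the source groups (or, for internet clients, a CIDR) and ports it accepts. The group names and the application port 8080 are hypothetical, and PostgreSQL on 5432 stands in for "the database port". The check looks only at which groups an instance carries, never at its IP address, which is why the model survives IP changes.

```python
# Hypothetical ingress rules: target SG -> set of (source SG or CIDR, port).
INGRESS = {
    "lb-sg": {("0.0.0.0/0", 443)},
    "app-sg": {("lb-sg", 8080)},
    "db-sg": {("app-sg", 5432)},
}


def may_connect(source_groups, target_group, port):
    """True if an instance carrying any of source_groups may reach target_group."""
    allowed = INGRESS.get(target_group, set())
    return any((group, port) in allowed for group in source_groups)


print(may_connect({"app-sg"}, "db-sg", 5432))  # True: app tier to database
print(may_connect({"lb-sg"}, "db-sg", 5432))   # False: the load balancer cannot skip a tier
```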

Use NACLs more sparingly. A good NACL use case is a subnet-level deny for a known bad CIDR, a hard boundary around a database subnet, or a compliance rule that must apply before traffic reaches any ENI in the subnet. NACLs become painful when teams try to mirror every application rule there. The stateless return-port rules are easy to get wrong, and one low-numbered deny can break a whole subnet.

When a connection times out, check both layers in the packet path. For inbound internet traffic to an EC2 instance in a public subnet, the request must pass the inbound NACL rule and the inbound security group rule. The response is allowed out by security group connection tracking, must pass the outbound NACL rule, and needs the subnet's route table to send internet-bound traffic to an internet gateway. If SGs look correct but clients still hang, the NACL ephemeral-port rule is often the missing piece.

The cleanest mental model is this: security groups say which resources may talk to which other resources; NACLs say what the subnet will never allow. Keep those jobs separate and the design stays easier to audit.