Best Practices for Handling and Requesting AWS Service Limit Increases

Monitor AWS service quotas, plan capacity early, and submit clear quota increase requests before throttling affects production.

AWS service quotas protect services from runaway usage, but they can also stop your scaling plan at the worst moment. If your team does not watch quotas before a launch or traffic spike, you can hit throttling, failed deployments, or capacity errors even when your application code is healthy.

Use the Service Quotas console, CloudWatch, and clear business justification to manage limits as part of normal capacity planning.

Understanding AWS Service Quotas

Before you request an increase, understand what kind of limit you are dealing with. AWS quotas typically cap resources (e.g., running EC2 vCPUs), throughput (e.g., IOPS), or API requests per second (RPS).

Soft Limits vs. Hard Limits

Most quotas fall into one of two categories:

  • Soft Limits (Adjustable Quotas): These are the vast majority of quotas. They are default values that AWS sets for new accounts and can generally be increased by submitting a request to AWS Support, provided there is sufficient business justification.
  • Hard Limits (Non-Adjustable Quotas): These limits are dictated by service design, safety, or infrastructure constraints. They generally cannot be increased, so you need an architectural workaround.

Tip: Always check the AWS Service Quotas console first. It shows whether each quota is adjustable, and for adjustable quotas it is the preferred way to submit requests.
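The adjustable/non-adjustable split can also be checked in code. The sketch below assumes quota records shaped like those returned by the Service Quotas API, where each record carries a QuotaName and a boolean Adjustable field. The sample records and quota names are hypothetical; real ones would come from a call such as list_service_quotas.

```python
from typing import Iterable


def split_by_adjustability(quotas: Iterable[dict]) -> tuple[list[str], list[str]]:
    """Return (adjustable_names, fixed_names) from Service Quotas style records."""
    adjustable, fixed = [], []
    for quota in quotas:
        # Treat a missing flag as non-adjustable so nothing is assumed raisable.
        if quota.get("Adjustable", False):
            adjustable.append(quota["QuotaName"])
        else:
            fixed.append(quota["QuotaName"])
    return adjustable, fixed


# Hypothetical sample records used only to make the sketch runnable offline.
sample = [
    {"QuotaName": "Example adjustable quota", "Adjustable": True, "Value": 5.0},
    {"QuotaName": "Example fixed quota", "Adjustable": False, "Value": 100.0},
]

adjustable, fixed = split_by_adjustability(sample)
print("Can request an increase:", adjustable)
print("Needs an architectural workaround:", fixed)
```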

Common Limits That Require Attention

In highly scalable environments, the following limits are often the first to be reached and should be monitored closely:

  1. EC2 On-Demand vCPU Limits: On-Demand quotas are measured in running vCPUs, not instance counts, and apply per instance family group (e.g., Standard instances) in a Region.
  2. EBS Volume Storage: Limits on the cumulative provisioned storage per volume type in a Region, whether or not the volumes are attached.
  3. VPC Resources: Limits on the number of VPCs, Internet Gateways, NAT Gateways, and Elastic IPs (EIPs).
  4. API Throttling Limits: Requests Per Second (RPS) limits for services like S3, DynamoDB, or Lambda invocation rates.
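Because the EC2 On-Demand quota counts vCPUs rather than instances, a small fleet of large instances can use more quota than a large fleet of small ones. The following sketch shows the arithmetic. The vCPU table covers only three instance types, and the fleet composition is a hypothetical example.

```python
# vCPU counts for a few instance types; extend this table for your own fleet.
VCPUS_PER_INSTANCE = {"m5.large": 2, "m5.xlarge": 4, "c5.2xlarge": 8}


def vcpus_in_use(running: dict[str, int]) -> int:
    """Sum vCPUs across a mapping of instance type -> running instance count."""
    return sum(VCPUS_PER_INSTANCE[itype] * count for itype, count in running.items())


# Hypothetical fleet: 18 instances in total.
fleet = {"m5.large": 10, "m5.xlarge": 5, "c5.2xlarge": 3}

total_vcpus = vcpus_in_use(fleet)
print(f"{sum(fleet.values())} instances consume {total_vcpus} vCPUs of quota")
```

Compare the result against the vCPU quota for the matching instance family group, not against a count of instances.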

Proactive Monitoring and Anticipation

Reacting to throttling is expensive and disruptive. The goal is to anticipate limit breaches long before they impact production.

1. Utilizing the Service Quotas Console

The AWS Service Quotas console is the central place to view current quotas and, for supported quotas, track utilization across many services, so you do not have to check each service console separately.

Actionable Step: Regularly audit quotas for services critical to your application, such as Lambda, EC2, RDS, VPC, and DynamoDB. Investigate any quota that is steadily climbing or already near your alert threshold.
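A regular audit can be reduced to a simple comparison of usage against quota. The sketch below is one way to flag quotas at or above an alert threshold. The quota labels and numbers are hypothetical, and in practice the usage and limit values would come from CloudWatch and Service Quotas.

```python
def near_limit(
    usage: dict[str, float],
    quotas: dict[str, float],
    threshold: float = 0.80,
) -> list[tuple[str, float]]:
    """Return (quota_name, utilization) pairs at or above the threshold, highest first."""
    flagged = []
    for name, limit in quotas.items():
        if limit <= 0:
            continue  # Skip quotas with no usable limit value.
        utilization = usage.get(name, 0.0) / limit
        if utilization >= threshold:
            flagged.append((name, round(utilization, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical snapshot of current usage and limits.
usage = {"lambda-concurrency": 850, "ec2-standard-vcpus": 64, "vpc-elastic-ips": 5}
quotas = {"lambda-concurrency": 1000, "ec2-standard-vcpus": 256, "vpc-elastic-ips": 5}

for name, utilization in near_limit(usage, quotas):
    print(f"{name}: {utilization:.0%} of quota used")
```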

2. Implementing CloudWatch Alarms

For critical limits, set automated alarms that notify your team when usage approaches a dangerous threshold.

Many usage metrics (such as EC2 vCPU usage and Lambda concurrency) are published to CloudWatch. For quotas that are integrated with Service Quotas, you can create alarms from the Service Quotas console, typically at 80% utilization.

# Example: Setting an 80% utilization alarm for Lambda Concurrent Executions
# (Often configured via the Service Quotas console integration or CloudFormation)

AlarmName: LambdaConcurrencyWarning
MetricName: ConcurrentExecutions
Namespace: AWS/Lambda
Statistic: Maximum
Period: 300
Threshold: 800  # 80% of the current limit; this example assumes a limit of 1,000
ComparisonOperator: GreaterThanThreshold
EvaluationPeriods: 2
TreatMissingData: notBreaching
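The threshold in the alarm above depends on your current limit, so it helps to compute it rather than hard-code it. The sketch below builds the same alarm settings as a dictionary whose keys match the parameters of the CloudWatch put_metric_alarm API. The function name and the limit of 1,000 are assumptions for illustration, and no AWS call is made here.

```python
import math


def concurrency_alarm_params(current_limit: int, fraction: float = 0.80) -> dict:
    """Build alarm settings that fire at a fraction of the current concurrency limit."""
    return {
        "AlarmName": "LambdaConcurrencyWarning",
        "MetricName": "ConcurrentExecutions",
        "Namespace": "AWS/Lambda",
        "Statistic": "Maximum",
        "Period": 300,
        # Round down so the alarm never fires later than the intended fraction.
        "Threshold": float(math.floor(current_limit * fraction)),
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 2,
        "TreatMissingData": "notBreaching",
    }


params = concurrency_alarm_params(current_limit=1000)
print(params["Threshold"])

# With boto3 installed and credentials configured, the settings could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling the function again with the new limit after an approved increase keeps the alarm aligned with the quota.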

3. Forecasting and Planning

Align quota management with development milestones and marketing campaigns. If a major scaling event or product launch is scheduled, calculate the maximum required capacity and submit the increase request well in advance. Some requests complete quickly; others need human review or extra justification.
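As a rough illustration of the planning arithmetic, the sketch below adds headroom to a projected peak and works backward from the launch date to a latest submission date. The 25% headroom and 14-day lead time are assumptions chosen for the example, not AWS guidance; pick values that match your own risk tolerance and past approval times.

```python
import math
from datetime import date, timedelta


def required_quota(projected_peak: int, headroom: float = 0.25) -> int:
    """Projected peak plus a safety margin, rounded up to a whole unit."""
    return math.ceil(projected_peak * (1 + headroom))


def latest_submit_date(needed_by: date, lead_days: int = 14) -> date:
    """Latest date to submit the request, leaving time for review and follow-up."""
    return needed_by - timedelta(days=lead_days)


# Hypothetical launch: 600 vCPUs at peak, capacity needed by 2024-11-01.
target = required_quota(600)
deadline = latest_submit_date(date(2024, 11, 1))
print(f"Request a quota of {target} no later than {deadline.isoformat()}")
```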

The Efficient Service Limit Increase Request Procedure

AWS prefers that limit increase requests be submitted through the Service Quotas console, as this automates routing and accelerates the approval process.

Step 1: Submitting via Service Quotas Console (Recommended)

  1. Navigate to the AWS Service Quotas console.
  2. Search for the specific service (e.g., 'Amazon EC2').
  3. Click on the relevant quota (e.g., 'Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances').
  4. Click the Request increase button.
  5. Confirm the console is set to the Region where you need the increase, then enter the new quota value.
  6. Provide a detailed justification (see Step 3).

If the quota is not listed in the Service Quotas console, you must submit the request through the traditional AWS Support Center under the 'Service Limit Increase' case type.
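The console steps also have an API equivalent. The sketch below wraps the Service Quotas get_service_quota and request_service_quota_increase operations with two sanity checks before submitting. The helper name, the quota code "L-EXAMPLE", and the FakeQuotasClient stand-in are hypothetical; the stand-in exists only so the example runs without credentials. In real use you would pass boto3.client("service-quotas") for the target Region.

```python
def request_increase(client, service_code: str, quota_code: str, desired: float) -> dict:
    """Submit a quota increase after checking the quota is adjustable and the value is higher."""
    quota = client.get_service_quota(ServiceCode=service_code, QuotaCode=quota_code)["Quota"]
    if not quota["Adjustable"]:
        raise ValueError("Quota is not adjustable; consider an architectural workaround.")
    if desired <= quota["Value"]:
        raise ValueError(f"Desired value {desired} does not exceed current value {quota['Value']}.")
    response = client.request_service_quota_increase(
        ServiceCode=service_code, QuotaCode=quota_code, DesiredValue=desired
    )
    return response["RequestedQuota"]


class FakeQuotasClient:
    """Offline stand-in that mimics the two client methods used above."""

    def get_service_quota(self, ServiceCode, QuotaCode):
        return {"Quota": {"Value": 256.0, "Adjustable": True}}

    def request_service_quota_increase(self, ServiceCode, QuotaCode, DesiredValue):
        return {"RequestedQuota": {"Status": "PENDING", "DesiredValue": DesiredValue}}


result = request_increase(FakeQuotasClient(), "ec2", "L-EXAMPLE", 750)
print(result["Status"], result["DesiredValue"])
```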

Step 2: Key Information to Include in the Request

To prevent back-and-forth communication with AWS Support, ensure your request is comprehensive:

  • AWS Region: Specify the exact Region (e.g., us-east-1) where the increase is needed.
  • Specific Limit Name: Provide the precise name of the quota (e.g., number of running Fargate tasks).
  • Current Limit: (Optional, but helpful) Confirm the existing limit you are hitting.
  • Requested New Limit: State the exact final number you require (e.g., increase from 100 to 500).
  • Business Justification: This is the most crucial element.

Step 3: Crafting a Strong Business Justification

AWS engineers require concrete evidence that the requested limit is necessary, sustainable, and accurate. Vague requests are often delayed or denied.

Do not use: "We need more resources for testing."

Do use: "We require 500 additional vCPUs, for a total of 750, in eu-west-1 to support a new ECS Fargate workload. Load testing shows peak demand of 100 concurrent tasks during launch traffic. We need the capacity available before the scheduled release window."

Justification Component    Example Detail
Use Case                   New application launch, client onboarding, seasonal promotion, database migration.
Calculation Basis          Load test results, projected traffic growth (RPS), number of users, concurrency requirements.
Timeline                   When the capacity is needed (e.g., "Need capacity operational by 2024-11-01").
Duration                   Whether this is a permanent increase or a temporary spike.
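One way to make sure every request carries these components is to collect them in a small structure and render the justification text from it. The sketch below is a hypothetical template, not an AWS-defined format; the class, field names, and sample values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class QuotaRequest:
    region: str
    quota_name: str
    current_limit: int
    requested_limit: int
    use_case: str
    calculation_basis: str
    needed_by: str
    duration: str

    def justification(self) -> str:
        """Render the request details as a single justification paragraph."""
        return (
            f"Requesting an increase of '{self.quota_name}' in {self.region} "
            f"from {self.current_limit} to {self.requested_limit}. "
            f"Use case: {self.use_case}. "
            f"Calculation basis: {self.calculation_basis}. "
            f"Needed by: {self.needed_by}. "
            f"Duration: {self.duration}."
        )


# Hypothetical request built from the components listed above.
request = QuotaRequest(
    region="us-east-1",
    quota_name="number of running Fargate tasks",
    current_limit=100,
    requested_limit=500,
    use_case="new application launch",
    calculation_basis="load test results and projected traffic growth",
    needed_by="2024-11-01",
    duration="permanent",
)
text = request.justification()
print(text)
```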

Advanced Best Practices and Handling Denial

Architectural Strategies to Avoid Limits

Sometimes increasing a limit is the right approach, but a bottleneck often points to an architectural inefficiency. Consider these mitigation techniques before requesting extremely large increases:

  1. Implement Exponential Backoff and Jitter: Use this pattern for retrying failed API calls (especially relevant for S3 or DynamoDB limits) to prevent overwhelming the service and minimize throttling impact.
  2. Optimize Batching: Consolidate multiple individual API calls into single batch operations where supported (e.g., DynamoDB BatchWriteItem).
  3. Utilize Caching: Implement ElastiCache or CloudFront to reduce the number of requests hitting backend services, decreasing the probability of hitting RPS limits.
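The first two techniques can be sketched in a few lines. The example below retries a throttled call with exponential backoff and full jitter, and splits work into batches of 25, the maximum number of items DynamoDB BatchWriteItem accepts per call. ThrottledError, the helper names, and the delay values are assumptions for illustration; AWS SDKs also ship their own configurable retry behavior, which is usually the better starting point.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a service throttling error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Retry operation() on throttling, sleeping a random time up to an exponential cap."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # Full jitter spreads out competing retries.


def chunks(items: list, size: int = 25) -> list[list]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Demo: an operation that is throttled twice before succeeding.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThrottledError()
    return "ok"


# The sleep function is replaced with a no-op so the demo finishes instantly.
outcome = call_with_backoff(flaky_operation, sleep=lambda seconds: None)
batches = chunks(list(range(60)))
print(outcome, calls["count"], [len(batch) for batch in batches])
```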

Handling Rejected Requests

If AWS rejects or significantly lowers your requested limit, it usually means the justification was insufficient, or the request exceeded safety parameters.

Action Plan for Rejection:

  • Do not re-submit immediately. Review the denial reason provided by AWS Support.
  • Refine the Justification: Provide more specific data points, internal metrics, and a clearer calculation methodology.
  • Contact Support Directly: If the issue is urgent or complex, respond to the support case asking for an explanation and offering to schedule a call to review the architectural requirements.

Post-Increase Review

After a limit is increased, update your CloudWatch alarms to reflect the new 80% threshold. The increase is not the end of the process: continued monitoring keeps you from hitting the new limit unexpectedly.

Takeaway

Quota management is part of production capacity planning. Track the quotas your architecture depends on, alert before you run out of room, and request increases with the same evidence you would use in a scaling review: current usage, expected peak, Region, timeline, and how you calculated the number.