Debugging AWS Lambda: Common Invocation Errors and How to Fix Them
Master the art of debugging AWS Lambda functions. This comprehensive guide details the most common invocation failures, ranging from IAM permission issues and VPC connectivity problems to resource constraints like memory exhaustion and function timeouts. Learn how to leverage CloudWatch logs effectively and apply practical, actionable fixes—including optimizing configurations, managing dependencies, and correcting execution roles—to ensure reliable and consistent serverless function performance.
AWS Lambda invocation errors usually come from one of three places: the caller cannot invoke the function, Lambda cannot start the runtime, or your code starts and then fails. The fastest fix is to identify which stage failed before changing memory, timeout, IAM policies, or VPC settings.
Start with CloudWatch Logs, then confirm permissions, handler settings, dependencies, and networking in that order.
Establishing the Debugging Baseline: CloudWatch Logs
Before changing the function, check the log group /aws/lambda/YourFunctionName. A normal Lambda execution usually includes these platform log lines:
- START: Indicates the beginning of execution.
- END: Indicates the completion of execution.
- REPORT: Provides summary metrics (Duration, Billed Duration, Memory Size, Max Memory Used, plus Init Duration on cold starts and X-Ray trace details when tracing is enabled).
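To compare these metrics across invocations, it can help to pull them out of the REPORT line programmatically. The following is a minimal sketch, assuming the usual tab-separated REPORT layout; the function name `parse_report_line` and the sample line (including its request ID and numbers) are made up for illustration.

```python
import re

# Matches "Name: value unit" pairs as they usually appear in a REPORT line.
_FIELD = re.compile(
    r"(Duration|Billed Duration|Memory Size|Max Memory Used|Init Duration)"
    r":\s*([\d.]+)\s*(ms|MB)"
)


def parse_report_line(line: str) -> dict:
    """Extract numeric metrics from a Lambda REPORT log line."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    metrics = {}
    for name, value, unit in _FIELD.findall(line):
        key = name.lower().replace(" ", "_") + "_" + unit.lower()
        metrics[key] = float(value)
    return metrics


# Hypothetical sample line, not taken from a real function.
sample = (
    "REPORT RequestId: 8f5e2c1a-0000-0000-0000-000000000000\t"
    "Duration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 121 MB\tInit Duration: 210.40 ms"
)
metrics = parse_report_line(sample)
```

Here `metrics["max_memory_used_mb"]` and `metrics["memory_size_mb"]` can be compared directly, and `init_duration_ms` is only present for cold starts.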
If the function never starts, you may not see application logs. In that case, check the invoking service, Lambda console test result, CloudTrail events, and the function's resource-based policy.
Resolving Permission and Access Errors
Permission errors are arguably the most common cause of Lambda invocation failure. They typically fall into two categories: the function's execution role lacks permission to call the AWS services it needs, or the invoking entity lacks permission to call the function.
Execution Role (IAM Role) Failures
Every Lambda function has an IAM execution role that Lambda assumes when it runs your code. If this role is misconfigured, the function cannot interact with the AWS services it depends on.
Common Missing Permissions:
| Service Access Needed | Required IAM Policy Actions |
|---|---|
| Logging to CloudWatch | logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents |
| VPC Connectivity | ec2:CreateNetworkInterface, ec2:DeleteNetworkInterface, ec2:DescribeNetworkInterfaces |
| Reading S3/DynamoDB | s3:GetObject, dynamodb:GetItem, etc. |
Fix:
- Navigate to the Lambda function configuration in the AWS Console.
- Check the "Permissions" tab and review the attached IAM role policy.
- Ensure the role has the basic AWS managed policy AWSLambdaBasicExecutionRole or that its custom policy includes the necessary CloudWatch actions.
- Add only the service permissions your code actually needs, such as s3:GetObject for a specific bucket prefix.
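A quick offline check of a policy document can confirm whether the logging actions are present before you redeploy. The sketch below is a deliberately simplified check, not an IAM evaluator: it only looks at Allow statements and ignores Deny statements, Resource, Condition, and NotAction. The helper name `missing_actions` and the example policy are hypothetical.

```python
from fnmatch import fnmatchcase

REQUIRED_LOG_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def _as_list(value):
    return value if isinstance(value, list) else [value]


def missing_actions(policy_document: dict, required: list) -> list:
    """Return the required actions that no Allow statement mentions.

    Simplification: ignores Deny, Resource, Condition and NotAction.
    """
    allowed_patterns = []
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") == "Allow":
            allowed_patterns.extend(_as_list(statement.get("Action", [])))
    return [
        action
        for action in required
        # IAM action names are case-insensitive and may use * and ? wildcards.
        if not any(fnmatchcase(action.lower(), p.lower()) for p in allowed_patterns)
    ]


# Hypothetical policy that forgot logs:CreateLogGroup.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": "s3:Get*",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}
missing = missing_actions(policy, REQUIRED_LOG_ACTIONS)
```

An empty result only means an Allow statement names the action; the Resource scope still needs a manual review.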
Resource-Based Policy Errors (Invocation Permissions)
If your Lambda is invoked by another service (like S3, API Gateway, SNS, or a cross-account invocation), that service needs explicit permission to call your function.
Symptom: The service (e.g., S3) attempts to trigger the Lambda, but nothing appears in the CloudWatch logs, and the service reports an error.
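Fix: Use the add-permission CLI command or the equivalent console setting to grant invocation rights. For S3 triggers, also consider adding --source-account, because a bucket ARN does not include an account ID. For example, allowing an S3 bucket to invoke the function: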
aws lambda add-permission \
  --function-name my-processing-function \
  --statement-id S3InvokePermission \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-trigger-bucket
For cross-account invocations, check both sides: the caller needs IAM permission to call lambda:InvokeFunction, and the Lambda function needs a resource-based policy allowing that caller.
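To confirm what the resource-based policy currently allows, you can inspect the policy JSON returned by `aws lambda get-policy` (its Policy field is a JSON string). The following sketch, assuming that policy text has already been extracted, lists who may call lambda:InvokeFunction. The function name `invoke_principals`, the account ID, and the Region in the sample are placeholders, and the check only matches the exact action name, not wildcards such as lambda:*.

```python
import json


def invoke_principals(policy_json: str) -> list:
    """List (principal, source_arn) pairs allowed to call lambda:InvokeFunction."""
    policy = json.loads(policy_json)
    results = []
    for statement in policy.get("Statement", []):
        actions = statement.get("Action", [])
        actions = actions if isinstance(actions, list) else [actions]
        if statement.get("Effect") != "Allow" or "lambda:InvokeFunction" not in actions:
            continue
        principal = statement.get("Principal")
        if isinstance(principal, dict):
            # e.g. {"Service": "s3.amazonaws.com"} or {"AWS": "<account or role ARN>"}
            principal = next(iter(principal.values()))
        source_arn = None
        for operator_block in statement.get("Condition", {}).values():
            for key, value in operator_block.items():
                if key.lower() == "aws:sourcearn":
                    source_arn = value
        results.append((principal, source_arn))
    return results


# Placeholder policy shaped like the result of the add-permission example.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "S3InvokePermission",
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-processing-function",
        "Condition": {"ArnLike": {"AWS:SourceArn": "arn:aws:s3:::my-trigger-bucket"}},
    }],
})
allowed = invoke_principals(sample_policy)
```

If the expected service or bucket does not appear in the result, the trigger will be rejected before your code runs, which matches the "nothing in CloudWatch" symptom.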
Configuration and Resource Constraint Errors
These errors relate to the defined runtime environment settings and resource limits imposed on the function.
Function Timeout Errors
A function timeout is a common failure, indicating that the execution exceeded the maximum allotted time. Lambda will forcibly terminate the execution and log a Task timed out error.
Diagnosis:
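- Check the REPORT line in CloudWatch logs and compare Duration with the configured timeout. A timed-out invocation reports a duration at or just above the limit.
- If the function times out on work that should take only a fraction of the limit (e.g., a task that normally finishes in 5 seconds hitting a 30-second limit), the bottleneck is likely initialization or connectivity (e.g., waiting on a DNS lookup or an unreachable endpoint).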
Fixes:
- Increase Timeout: If the task is inherently long-running (e.g., large data processing), increase the timeout (up to 15 minutes).
- Optimize Code/Dependencies: If the task is slow, profile the code to identify bottlenecks. Ensure any external calls have reasonable timeouts defined within the code.
- Handle Cold Starts: Large initialization processes can contribute to timeouts. Use Lambda provisioned concurrency if cold starts are critical.
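One way to keep external calls from consuming the whole invocation is to derive each call's timeout from the time the invocation has left. The sketch below assumes the Python runtime, whose context object exposes `get_remaining_time_in_millis()`. The helper name `call_timeout_seconds`, the one-second reserve, the five-second cap, and the `FakeContext` class are illustrative choices for local testing, not Lambda defaults.

```python
def call_timeout_seconds(context, reserve_ms=1000, cap_seconds=5.0):
    """Timeout for the next external call, leaving reserve_ms to log and return."""
    remaining_ms = context.get_remaining_time_in_millis() - reserve_ms
    if remaining_ms <= 0:
        raise TimeoutError("not enough time left to start another external call")
    return min(cap_seconds, remaining_ms / 1000.0)


class FakeContext:
    """Stand-in for the Lambda context object, for local runs only."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


plenty_of_time = call_timeout_seconds(FakeContext(30000))  # capped at 5.0
nearly_done = call_timeout_seconds(FakeContext(3500))      # 2.5 seconds left to spend
```

Inside a handler, the returned value would be passed as the `timeout` argument of whatever HTTP or database client the function uses, so a stalled dependency raises an error you can log instead of a bare Task timed out.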
Memory Exhaustion Errors
If your function needs more memory than it has been allocated, the runtime crashes or is killed. Depending on the runtime, the logs show an OutOfMemoryError, a Runtime.ExitError with a "signal: killed" message, or a similar error.
Diagnosis: Review the Max Memory Used metric in the CloudWatch REPORT line. If this value is consistently close to or equal to the configured Memory Size, you may have a memory leak or insufficient resources.
Fix: Increase the memory allocation and retest. Lambda allocates more CPU as you allocate more memory, so higher memory can sometimes reduce duration enough to offset some of the cost. Measure your own function instead of assuming it will be cheaper.
AWS Lambda Power Tuning can help compare memory settings for a specific workload.
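The trade-off can be checked with simple arithmetic, because the duration part of a Lambda bill scales with memory in GB multiplied by billed seconds. The sketch below compares two settings using made-up measurements. It leaves out per-request charges and actual prices, so treat it as a relative comparison only.

```python
def gb_seconds(memory_mb: int, billed_ms: float) -> float:
    """Compute consumed by one invocation: memory in GB times billed seconds."""
    return (memory_mb / 1024) * (billed_ms / 1000)


# Hypothetical measurements for the same workload at two memory settings.
low = gb_seconds(512, 800)     # 0.5 GB * 0.80 s
high = gb_seconds(1024, 350)   # 1.0 GB * 0.35 s
cheaper = "1024 MB" if high < low else "512 MB"
```

With these assumed numbers the larger setting uses fewer GB-seconds, but if the duration only dropped to 500 ms the smaller setting would win, which is why measuring matters.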
Handler Misconfiguration (Runtime.HandlerNotFound)
This occurs when Lambda cannot locate the entry point defined in the function configuration.
Symptom: Error: Runtime.HandlerNotFound or similar startup failure.
Fix: Verify the Handler field in the function settings matches the structure: [file_name].[function_name]. For example, a Python function defined in my_code.py with the entry function lambda_handler must have the handler set to my_code.lambda_handler.
For Node.js, handler names follow the module and exported function, such as index.handler for an exported handler function in index.js.
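For Python functions, you can approximate the lookup locally before deploying. This is a rough sketch of the idea, not the runtime's actual loader: it splits the handler string at the last dot, imports the module, and checks that the named attribute is callable. The function name `resolve_handler` is hypothetical.

```python
import importlib


def resolve_handler(handler: str):
    """Import the module part of 'module.function' and return the function."""
    module_name, _, function_name = handler.rpartition(".")
    if not module_name or not function_name:
        raise ValueError(f"handler {handler!r} is not in module.function form")
    # Raises ModuleNotFoundError when the file is missing or misnamed.
    module = importlib.import_module(module_name)
    function = getattr(module, function_name, None)
    if not callable(function):
        raise AttributeError(f"{module_name!r} has no callable named {function_name!r}")
    return function


# Demonstration with a standard-library module instead of real function code.
dumps = resolve_handler("json.dumps")
```

Run from the root of the deployment package, `resolve_handler("my_code.lambda_handler")` should succeed if the Handler field is set correctly for the example above.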
Networking and VPC Connectivity Issues
When a Lambda function is configured to run inside a Virtual Private Cloud (VPC), it gains access to private resources but loses public internet access by default.
Missing Internet Access
If your Lambda is in a VPC and needs to connect to external services, it needs a route to the internet through a NAT gateway or another approved egress path. Putting the function in a public subnet does not give it a public IP address.
Symptom: HTTP connection failures, timeouts when accessing public endpoints.
Fixes:
- Verify the function is attached to private subnets intended for Lambda workloads.
- Ensure these private subnets have a route table entry directing outbound internet traffic (0.0.0.0/0) to a NAT Gateway.
- If the Lambda only needs to access AWS services privately, consider VPC endpoints such as gateway endpoints for S3 and DynamoDB or interface endpoints for supported services.
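The route table check can be scripted against the Routes list that `aws ec2 describe-route-tables` returns for a subnet's route table. The sketch below classifies where the default route points. The function name `default_route_target`, the return labels, and the sample IDs are illustrative.

```python
def default_route_target(routes: list) -> str:
    """Classify where 0.0.0.0/0 goes for one route table's Routes list."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State", "active") != "active":
            return "blackhole"
        if route.get("NatGatewayId"):
            return "nat-gateway"
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"
        return "other"
    return "none"


# Hypothetical route tables with made-up resource IDs.
private_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0", "State": "active"},
]
public_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0", "State": "active"},
]
isolated_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
]
```

For a Lambda subnet, "nat-gateway" is the result that provides internet egress; "internet-gateway" indicates a public subnet, which does not help because the function has no public IP address, and "none" means the function can only reach private or endpoint-backed destinations.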
Security Group and ACL Restrictions
Your function can start successfully but hang when its security group, target security group, network ACL, or route table blocks the connection.
Fix: Allow outbound traffic from the Lambda security group to the target port, and allow inbound traffic on the target security group from the Lambda security group. For example, a Lambda function connecting to PostgreSQL needs outbound TCP 5432 from Lambda and inbound TCP 5432 on the database side.
If the execution role lacks the required EC2 network-interface permissions for VPC access, Lambda can fail while preparing the VPC networking needed to run the function.
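A short TCP probe run from inside the function can help tell a blocked path from a reachable but closed port. The following is a minimal sketch with a hypothetical helper name, `probe_tcp`. As a rule of thumb rather than a guarantee, a timeout suggests packets are being dropped (security group, network ACL, or missing route), while a refusal suggests the network path works but nothing is listening.

```python
import socket


def probe_tcp(host: str, port: int, timeout_seconds: float = 3.0) -> str:
    """Try one TCP connection and describe how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "open"
    except socket.gaierror:
        return "dns-failure"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as error:
        return f"error: {error}"
```

Calling `probe_tcp("db.internal.example", 5432)` (a placeholder hostname) early in the handler and logging the result keeps the probe well inside the function timeout and records which failure mode you are dealing with.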
Deployment and Runtime Misconfigurations
These issues relate to how the code bundle is structured or the runtime environment chosen.
Dependency and Package Errors
If your code relies on external libraries that were not correctly bundled or installed for the specific runtime environment, the function will fail during initialization.
Symptom: Runtime exceptions like module not found, cannot import name, or No such file or directory (especially common in Python or Node.js).
Fixes:
- Local vs. Lambda Environment: Ensure you build dependencies on an environment matching the Lambda runtime (e.g., use pip install -t . for Python to place dependencies correctly).
- Use Lambda Layers: Package larger, stable dependencies into Lambda Layers to reduce the size of the main deployment package and improve deployment speed.
- Check Path: Verify that your runtime configuration correctly points to the location of the installed dependencies.
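Before uploading a Python package, you can check from the package root that each dependency is importable. The sketch below uses only the standard library; the function name `missing_modules` and the module names in the example are hypothetical.

```python
import importlib.util


def missing_modules(names: list) -> list:
    """Return the module names that cannot be found on the current sys.path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted name is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


# "json" ships with Python; the second name is intentionally nonexistent.
not_found = missing_modules(["json", "no_such_pkg_for_lambda_demo"])
```

This only proves a module can be located on the machine where the check runs; packages with compiled extensions still need to be built for the Lambda runtime's operating system and architecture.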
Deployment Package Size and Format
Lambda has deployment package size limits, and those limits differ depending on whether you upload a .zip file directly, upload through Amazon S3, use layers, or deploy a container image. Check the current Lambda quotas for your packaging method before restructuring a large function.
Symptom: Deployment fails with a size error, or a large package contributes to slower cold starts.
Fixes:
- Pruning: Remove unnecessary files, documentation, and development dependencies.
- Layers: Move static assets or large dependencies to Lambda Layers.
- Container Images: For very large applications, consider deploying the function as a container image from Amazon ECR.
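When pruning, it helps to know which parts of the build directory are largest. The following sketch measures the unzipped size of a directory and ranks its top-level entries. The function names are hypothetical, and no quota is hard-coded: compare the result with the current limit for your packaging method.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Total size of regular files under root, skipping symlinks."""
    total = 0
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def largest_top_level_entries(root: str, count: int = 5) -> list:
    """Return (name, bytes) for the biggest direct children of root."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = directory_size_bytes(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]
```

Running `largest_top_level_entries("build")` on a build folder (the folder name is a placeholder) usually shows quickly whether one dependency dominates and is a candidate for a layer.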
Event and Payload Problems
Some invocation failures come from the event itself:
- Malformed JSON: Console tests and CLI invocations require valid JSON payloads.
- Unexpected event shape: An S3 event, API Gateway event, and EventBridge event do not have the same fields.
- Async retry behavior: Asynchronous invokes can retry after failures and may send failed events to a destination or dead-letter queue if configured.
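A small guard at the top of a test script can catch the first two problems before the payload reaches your code. The sketch below validates the JSON and guesses the event source from a few well-known fields. The function name `classify_event` and the return labels are illustrative, and the field checks are heuristics rather than a complete schema.

```python
import json


def classify_event(payload: str) -> str:
    """Guess the event source from a JSON payload; raises ValueError on bad JSON."""
    event = json.loads(payload)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(event, dict):
        return "unknown"
    records = event.get("Records")
    if isinstance(records, list) and records and isinstance(records[0], dict):
        # S3 records use "eventSource"; SNS records use "EventSource".
        source = records[0].get("eventSource") or records[0].get("EventSource")
        if source:
            return source  # e.g. "aws:s3" or "aws:sns"
    if "detail-type" in event and "source" in event:
        return "eventbridge"
    if "httpMethod" in event and "requestContext" in event:
        return "api-gateway-rest"
    if event.get("version") == "2.0" and "requestContext" in event:
        return "api-gateway-http"
    return "unknown"
```

A handler that expects an S3 event can then fail fast with a clear message when `classify_event` returns something else, instead of raising a KeyError deep inside the business logic.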
For a direct CLI test, capture the response and logs:
aws lambda invoke \
  --function-name my-function \
  --payload '{"ping": true}' \
  --cli-binary-format raw-in-base64-out \
  response.json
The --cli-binary-format raw-in-base64-out option is commonly needed with AWS CLI v2 when passing raw JSON directly on the command line.
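The command prints a small JSON summary and writes the function's return value to response.json. A synchronous invoke can return StatusCode 200 even when the code raised an exception, so the FunctionError field is the one to check. The sketch below, with a hypothetical helper name and made-up sample values, shows one way to interpret the two pieces once they are loaded as dictionaries.

```python
def summarize_invoke(cli_output: dict, response_payload: dict) -> str:
    """Describe one synchronous invoke from the CLI output and response.json."""
    status = cli_output.get("StatusCode")
    function_error = cli_output.get("FunctionError")
    if function_error:
        # On function errors the payload carries the error details.
        error_type = response_payload.get("errorType", "unknown error type")
        message = response_payload.get("errorMessage", "")
        return f"function error ({function_error}): {error_type}: {message}"
    if status == 200:
        return "ok"
    return f"unexpected status {status}"


# Made-up examples of a successful and a failed invocation.
success = summarize_invoke({"StatusCode": 200, "ExecutedVersion": "$LATEST"}, {"pong": True})
failure = summarize_invoke(
    {"StatusCode": 200, "FunctionError": "Unhandled", "ExecutedVersion": "$LATEST"},
    {"errorMessage": "division by zero", "errorType": "ZeroDivisionError"},
)
```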
Summary of Troubleshooting Steps
When encountering an invocation error, follow this systematic approach:
- Check CloudWatch First: Look for immediate errors logged by the Lambda service before the START line.
- Verify IAM Role: Ensure the function’s execution role has all required permissions (logging, VPC, and service access).
- Review Configuration: Check the Handler name, Memory setting, and Timeout limit.
- Analyze VPC Settings: If using a VPC, verify the security groups, subnet mappings, and route tables (especially for NAT Gateway access).
- Examine Dependencies: Confirm that all necessary libraries are correctly packaged and accessible by the runtime.
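The checklist above can be turned into a first-pass triage of a log message. The sketch below matches a few error strings mentioned in this guide or commonly seen in Lambda logs; the rule list, the advice text, and the function name `triage` are illustrative and far from exhaustive.

```python
# (substring to look for, where to look next); order matters for overlapping text.
TRIAGE_RULES = [
    ("Task timed out", "timeout: compare Duration with the configured limit"),
    ("Runtime.HandlerNotFound", "handler: check the Handler setting"),
    ("Runtime.ImportModuleError", "dependencies: check packaging and layers"),
    ("Runtime.ExitError", "runtime exit: check Max Memory Used against Memory Size"),
    ("AccessDenied", "permissions: check the execution role or resource policy"),
]


def triage(log_message: str) -> str:
    """Map a log message to the section of the checklist to try first."""
    for needle, advice in TRIAGE_RULES:
        if needle in log_message:
            return advice
    return "unclassified: read the full stack trace"
```

For example, `triage("Task timed out after 30.00 seconds")` points at the timeout checks, while a message with none of the known strings falls through to manual inspection.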
Once you know whether the failure happened before invocation, during runtime startup, or inside your code, the fix becomes much narrower. Check logs first, prove the active IAM identity and resource policy, then adjust handler, package, timeout, memory, and VPC settings based on the specific error you see.