AWS Lambda
- Synchronous Invocations
- Result is returned right away
- Client needs to handle errors (debugging, retries, exponential backoff, etc.)
- User-invoked
- Elastic Load Balancing (Application Load Balancer)
- Amazon API Gateway
- Amazon CloudFront (Lambda@Edge)
- Amazon S3 Batch Operations
- Service-invoked
- Amazon Cognito
- AWS Step Functions
- Others
- Amazon Lex
- Amazon Alexa
- Amazon Kinesis Data Firehose
- Asynchronous Invocations
- S3, SNS, CloudWatch Events (i.e., EventBridge)
- Invocations are placed in an internal event queue
- On errors Lambda makes up to 3 attempts in total: initial run, 1-minute wait, 1st retry, 2-minute wait, 2nd (final) retry; retried runs show up as duplicate log entries in CloudWatch Logs
- Make sure the processing is idempotent, since a retry can run the same event again
- Can define a DLQ (dead-letter queue), either SNS or SQS, for failed processing (needs the correct IAM permissions)
- Invoked by
- Amazon Simple Storage Service (S3), with S3 Events Notifications
- Amazon Simple Notification Service (SNS)
- Amazon CloudWatch Events / EventBridge
- AWS CodeCommit (CodeCommit Trigger : new branch, new tag, new push)
- AWS CodePipeline (invoke a Lambda function during the pipeline, Lambda must callback)
- Amazon CloudWatch Logs (log processing)
- Amazon Simple Email Service
- AWS CloudFormation
- AWS Config
- AWS IoT
- AWS IoT Events
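Because asynchronous invocations are retried, a handler may receive the same event more than once. The following is a minimal sketch of an idempotent handler. It assumes a hypothetical `id` field on the event and uses an in-memory set as a stand-in for a durable store (a real function would need something shared across instances, such as a database table).

```python
# Sketch only: the "id" field and the in-memory store are assumptions.
_processed_ids = set()  # stand-in for a durable store shared across instances


def handler(event, context=None):
    """Process an event at most once, even if Lambda retries the invocation."""
    event_id = event["id"]  # hypothetical unique identifier on the event
    if event_id in _processed_ids:
        # A retry of an event that already succeeded: do nothing.
        return {"status": "skipped", "id": event_id}
    # ... the real side effect would happen here ...
    _processed_ids.add(event_id)
    return {"status": "processed", "id": event_id}
```

Calling the handler twice with the same event performs the work once and skips the second call.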
- To expose Lambda as an HTTP(S) endpoint
- ALB: register the Lambda function in a target group; query string parameters and headers are passed to Lambda as key/value pairs
- Supports multi-value headers: when enabled, multiple values with the same key are converted into arrays before being passed to Lambda
- API Gateway
- Event Source Mapping (Lambda polls the source and invokes the function synchronously)
- Streams
- Kinesis Data Streams or DynamoDB Streams
- One Lambda invocation per stream shard
- Processed items aren’t removed from the stream
- Can process multiple batches in parallel (up to 10 batches per shard)
- By default, if your function returns an error, the entire batch is reprocessed until the function succeeds, or the items in the batch expire
- Queue
- SQS (standard) queue & SQS FIFO queue
- Long Polling with batch size (1-10 messages)
- The DLQ has to be set on the SQS queue, not on Lambda
- Items are deleted from the queue once successfully processed by Lambda
- For an SQS (standard) queue, Lambda adds 60 more instances per minute to scale up, up to 1000 batches processed simultaneously
- For an SQS FIFO queue, Lambda scales to the number of active message groups (defined by GroupID), and messages under the same GroupID are processed in order
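For the queue case above, Lambda hands the function a batch of messages under `event["Records"]`. The following is a minimal sketch of a handler that parses each message body as JSON. The sample event is made up for local testing and contains only the fields the code reads.

```python
import json


def handler(event, context=None):
    """Parse every SQS message in the batch and return (messageId, payload) pairs."""
    results = []
    for record in event["Records"]:
        payload = json.loads(record["body"])  # assumes producers send JSON bodies
        results.append((record["messageId"], payload))
    # Returning normally signals success, after which the batch is removed from the queue.
    return results


# Made-up sample batch, for local testing only.
sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"order": 42})},
        {"messageId": "m-2", "body": json.dumps({"order": 43})},
    ]
}
```

If the handler raises instead, the messages become visible again after the visibility timeout and are retried, which is why the DLQ is configured on the queue.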
- Event Object – data from the invoking source, prepared for the application code
- JSON, contains information from the invoking service (e.g., EventBridge, custom, …)
- Lambda runtime converts the event to an object (e.g., dict type in Python)
- Example: input arguments, invoking service arguments, …
- Context Object – details about the invocation and the Lambda function itself
- Provides methods and properties that provide information about the invocation, function, and runtime environment
- Passed to your function by Lambda at runtime
- Example: aws_request_id, function_name, memory_limit_in_mb, …
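A minimal sketch of a handler that reads both objects. The `name` input field and the `fake_context` values are hypothetical and exist only so the example runs locally. In Lambda, the runtime supplies the real context object.

```python
from types import SimpleNamespace


def handler(event, context):
    """Combine data from the event (a dict) with metadata from the context object."""
    name = event.get("name", "world")  # "name" is a hypothetical input field
    return {
        "message": f"hello {name}",
        "request_id": context.aws_request_id,
        "function": context.function_name,
        "memory_mb": context.memory_limit_in_mb,
    }


# Local stand-in for the context object; all values are made up.
fake_context = SimpleNamespace(
    aws_request_id="local-test-id",
    function_name="demo-function",
    memory_limit_in_mb=128,
)
result = handler({"name": "Lambda"}, fake_context)
```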
- Destinations
- Asynchronous invocations – can define destinations for both successful and failed events:
- Amazon SQS
- Amazon SNS
- AWS Lambda
- Amazon EventBridge bus
- Event Source Mapping – only for discarded event batches, sent to:
- Amazon SQS
- Amazon SNS
- Lambda Execution Role (IAM Role), to grant the Lambda function permissions to AWS services/resources
- AWSLambdaBasicExecutionRole: upload logs to CloudWatch
- AWSLambdaKinesisExecutionRole: read from Kinesis
- AWSLambdaDynamoDBExecutionRole: read from DynamoDB Streams
- AWSLambdaSQSQueueExecutionRole: read from SQS
- AWSLambdaVPCAccessExecutionRole: deploy Lambda function in VPC
- AWSXRayDaemonWriteAccess: upload trace data to X-Ray
- Lambda Resource-based Policy, to allow other accounts and AWS services to call Lambda functions
- Lambda Environment Variables, as key/value pairs in "String" form
- Can use X-Ray for tracing by enabling “Active Tracing” in the configuration, with the IAM execution role policy AWSXRayDaemonWriteAccess
- _X_AMZN_TRACE_ID: contains the tracing header
- AWS_XRAY_CONTEXT_MISSING: by default, LOG_ERROR
- AWS_XRAY_DAEMON_ADDRESS: the X-Ray Daemon IP_ADDRESS:PORT
- Edge functions attached on CloudFront with
- CloudFront Functions: lightweight functions written in JavaScript that can scale to millions of requests per second
- change viewer requests (after CloudFront received) and viewer responses (before forwarding to clients)
- managed in CloudFront
- Lambda@Edge
- Lambda functions written in Node.js or Python; supports only up to 1K requests per second
- change CloudFront requests and responses:
- Viewer Request: after CloudFront receives a request from a viewer
- Origin Request: before CloudFront forwards the request to the origin
- Origin Response: after CloudFront receives the response from the origin
- Viewer Response: before CloudFront forwards the response to the viewer
- Authoring in one AWS Region (us-east-1)
- To allow Lambda to access VPC resources (RDS, ElastiCache, internal ELB, etc.), Lambda creates an Elastic Network Interface (ENI)
- with VPC ID, subnets, and security groups
- using AWSLambdaVPCAccessExecutionRole permission
- No internet access by default; to reach the internet, deploy the function in a private subnet that routes through a NAT Gateway or NAT instance
- without NAT, can access AWS resources via VPC Endpoints
- Lambda Function Configuration and Performance
- RAM: 128MB to 10GB
- More vCPU is assigned as RAM grows; above 1,792MB a function gets more than one vCPU, and the code needs multi-threading to benefit
- Timeout: 3 seconds (default), up to 900 seconds
- Execution Context is a temporary runtime environment that initializes any external dependencies
- Reused across invocations on the same instance, which boosts performance (e.g., keeping DB/HTTP connections open)
- Includes the /tmp directory, with up to 10GB of temporary storage; generate KMS data keys to encrypt its content if needed
- for permanent objects, using S3
- Lambda Layers
- Custom runtimes (e.g., C++ and Rust)
- Package dependencies separately (externalised) for re-use across functions
- File System Mounting
- EFS with EFS Access Points, if the Lambda function is deployed in the same VPC
- Watch out for the EFS connection (and burst) limits, as each Lambda instance uses its own non-shared connection
- Concurrency and Throttling
- Max 1000 concurrent executions as default
- Can set “reserved concurrency” at the function level; invocations beyond the limit are throttled
- Synchronous invoke: returns ThrottleError (429)
- Asynchronous invoke: retried automatically, then sent to the DLQ
- Cold Start – the first request handled by a new instance is slower
- Provisioned Concurrency allocates instances before the function is invoked, so no cold start occurs
- Upload the zip (code + dependency libraries) straight to Lambda if it is less than 50MB, otherwise to S3 first
- Using CloudFormation
- inline – use Code.ZipFile property; could not include dependencies
- S3 – with S3Bucket + S3Key + S3ObjectVersion; however, when the code (zip file) is updated, at least one of these three properties must also change, otherwise CloudFormation will not update the function
- Lambda Container Images with a max size of 10GB in ECR; the image must implement the Lambda Runtime API
- Versions – default is $LATEST; each published version is immutable and has its own ARN
- Alias – a mutable pointer to a Lambda version; aliases enable canary deployments by assigning weights between versions; they are commonly used to represent stages and also have their own ARNs
- Function URL, for public access on internet with unique URL
- Can target $LATEST or an alias, not a specific version
- Throttled by Reserved Concurrency
- Secured by a Resource-based Policy, with Cross-Origin Resource Sharing (CORS) configured for calls from other domains
- AuthType NONE: allows public, unauthenticated access, but the Resource-based Policy must still grant public access
- AuthType AWS_IAM
- CodeGuru Profiler can gain insight of runtime performance
- Java and Python
- AmazonCodeGuruProfilerAgentAccess
Step Functions
- Workflow in JSON
- Start with SDK, API Gateway call, or EventBridge
- Task: Invoke 1 AWS service or Run 1 Activity
- States: Choice, Fail-or-Succeed, Pass, Wait, Map, Parallel
- Error handling should be defined in the state machine, not in the task code; use Retry and Catch, whose entries are evaluated from top to bottom with the first matching entry applied (like an “OR” chain, not run sequentially)
- Wait for Task token: append .waitForTaskToken to the Resource; the execution pauses until it receives a SendTaskSuccess or SendTaskFailure API call (PUSH mechanism)
- Activity Task: an Activity Worker on EC2/Lambda/etc. polls with the GetActivityTask API call and reports back with SendTaskSuccess or SendTaskFailure (PULL mechanism); use SendTaskHeartbeat with HeartbeatSeconds to keep the task alive
- Standard vs Express (asynchronous and synchronous)
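To illustrate the Retry and Catch handling mentioned above, here is a minimal sketch of a state machine definition built as a Python dict and serialized to JSON. The state names, the Lambda ARN (account, region, function name), and the retry numbers are all made up for illustration.

```python
import json

# Sketch of an Amazon States Language definition; all names and ARNs are placeholders.
state_machine = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            # Retriers are checked top to bottom; the first match is applied.
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # Catchers run once retries are exhausted or no retrier matched.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}],
            "End": True,
        },
        "HandleFailure": {
            "Type": "Fail",
            "Error": "OrderFailed",
            "Cause": "Retries exhausted or unhandled error",
        },
    },
}

definition_json = json.dumps(state_machine, indent=2)
```

Keeping the retry policy in the definition means the task code can simply raise on failure and let the workflow decide what happens next.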
AWS Serverless Application Model (SAM)
- Configured via JSON/YAML, compiled to a CloudFormation stack
- use CodeDeploy for Lambda function
- Traffic Shifting (from OLD ver to New ver)
- Linear: grow traffic every N minutes until 100%
- Canary: try X percent then 100%
- AllAtOnce: immediate
- Pre- and Post-traffic hooks for testing during traffic shifting
- rollback by AWS CloudWatch Alarm
- AppSpec.yml
- Name
- Alias
- CurrentVersion
- TargetVersion
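The Linear and Canary traffic-shifting strategies above come down to simple arithmetic. The following sketch computes, under assumed step sizes and intervals, what share of traffic the new version receives at each point in time. The function names are hypothetical and are not part of SAM or CodeDeploy.

```python
def linear_schedule(step_percent, interval_minutes):
    """Return (minute, percent_on_new_version) pairs for a linear shift."""
    schedule = []
    percent, minute = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


def canary_schedule(canary_percent, wait_minutes):
    """Send canary_percent of traffic first, then everything after wait_minutes."""
    return [(0, canary_percent), (wait_minutes, 100)]


linear = linear_schedule(step_percent=25, interval_minutes=10)
canary = canary_schedule(canary_percent=10, wait_minutes=5)
```

With these inputs the linear shift reaches 100% at minute 30 in four steps, while the canary sends 10% for five minutes and then switches fully.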
- run Lambda, API Gateway, DynamoDB locally
- Lambda start/invoke
- API Gateway
- AWS Events (sample payloads for event resources)
- SAM Recipe
- Transform Header – template
- Write Code
- Package and Deploy – into S3 Bucket
- SAM Accelerate (sam sync) – reduces deployment latency
- update existing SAM template
- Using the "--code" option, syncs code changes without updating infrastructure (calls service APIs directly and bypasses CloudFormation)
- SAM Policy Templates
- apply permissions to Lambda Functions
- SAM Multiple Environments, using “samconfig.toml”
Amazon Athena
- Serverless
- Analysis on S3
- Using SQL, with Presto engine
- Support JSON, CSV, ORC, Avro and Parquet
- with Amazon QuickSight for reporting/dashboards
- Performance Improve
- Use columnar data to scan less (cost-saving)
- File format: Apache ORC or Apache Parquet
- Using Glue for data conversion
- Compression for smaller retrievals
- Partition datasets (using year/month/date in S3 path/folders)
- Large files (> 128MB) to lower overhead
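As a sketch of the partitioning advice above, the helper below builds a date-based S3 key prefix. The `key=value` folder naming is an assumption (a common Hive-style convention), and the `logs` prefix is made up.

```python
from datetime import date


def partition_prefix(table_prefix, day):
    """Build a year/month/day partition prefix for objects written on a given date."""
    return f"{table_prefix}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


prefix = partition_prefix("logs", date(2024, 1, 5))
```

A query that filters on year, month, and day then only needs to scan the objects under the matching prefix.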
- Federated Query
- any data sources
- Using AWS Data Source Connectors (Lambda functions)
- results stored in Amazon S3
Amazon API Gateway
- Endpoints
- Edge-Optimized (default):
- For global clients, the requests are routed through the CloudFront Edge locations (improves latency)
- The API Gateway still lives in only one region
- Regional:
- For clients within the same region
- Could manually combine with CloudFront (more control over the caching strategies and the distribution)
- Private:
- Can only be accessed from your VPC using an interface VPC endpoint (ENI)
- Use a resource policy to define access
- Security
- IAM Role – Authentication; IAM Policy & Resource Policy – Authorization
- Cognito User Pool – Authentication; API Gateway Methods (or a Custom Authorizer) – Authorization
- Custom Authorizer / Lambda Authorizer (external identity provider) – Authentication; Lambda function – Authorization
- Custom Domain Name HTTPS security through integration with AWS Certificate Manager (ACM)
- for Edge-Optimized endpoint, then the certificate must be in us-east-1
- for Regional endpoint, the certificate must be in the API Gateway region
- Must setup CNAME or A-alias record in Route 53
- Deployment Stages/Environments
- Stage variables are like environment variables for API Gateway, passed to the "context" object in AWS Lambda, with format: ${stageVariables.variableName}
- Canary deployments: choose the % of traffic sent to the canary channel; suitable for Blue/Green deployment
- Integration Types
- MOCK: API Gateway returns a response directly
- HTTP / AWS (Lambda & AWS Services): for example, call SQS. Setup data mapping using mapping templates for the request & response
- Mapping templates can be used to modify request / responses
- Rename / Modify query string parameters
- Modify body content
- Add headers
- Uses Velocity Template Language (VTL): for loops, if, etc.
- Filter output results (remove unnecessary data)
- Content-Type can be set to application/json or application/xml
- AWS_PROXY (Lambda Proxy)
- incoming request from the client is the input to Lambda
- No mapping template; headers, query string parameters, etc. are passed as arguments
- HTTP_PROXY
- No mapping template
- Possibility to add HTTP Headers if need be (ex: API key)
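With the AWS_PROXY integration described above, the function reads the request from the event and must return a response in the shape API Gateway expects. The following is a minimal sketch. The `name` query parameter is a hypothetical input.

```python
import json


def handler(event, context=None):
    """Lambda proxy handler: read a query parameter and return a proxy-format response."""
    params = event.get("queryStringParameters") or {}  # None when no query string is sent
    name = params.get("name", "world")  # "name" is a hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),  # body must be a string
    }


response = handler({"queryStringParameters": {"name": "api"}})
```

Returning something outside this shape (for example a bare dict as the body) is a typical cause of the 502 errors seen with proxy integrations.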
- Swagger / Open API import to quickly define APIs
- OpenAPI specs can be written in YAML or JSON
- Using OpenAPI we can generate SDK for our applications
- Request Validation
- Returns a 400-error response to the caller if validation failed
- Cache API responses
- Default TTL (time to live) is 300 seconds, ranging from 0-3600s
- Caches are defined per stage, possible to override cache settings per method
- Clients can invalidate the cache with header: Cache-Control: max-age=0
- Usage Plan, using API Keys to identify clients and meter access
- API Keys are alphanumeric string values
- Throttling limits
- Quota limits are the overall maximum number of requests
- Callers must supply an assigned API key in the x-api-key header in requests
- CloudWatch Metrics
- CacheHitCount & CacheMissCount
- Count
- IntegrationLatency: The time between when API Gateway relays a request to the backend and when it receives a response from the backend.
- Latency: The time between when API Gateway receives a request from a client and when it returns a response to the client. The latency includes the integration latency and other API Gateway overhead.
- Throttling limits
- Account Limit, at 10000 rps across all API
- Can also set stage limits, method limits, or define Usage Plans to throttle per customer
- Like Lambda concurrency, one overloaded API, if not limited, can cause the other APIs to be throttled
- Errors
- 4xx means Client errors
- 400: Bad Request
- 403: Access Denied, WAF filtered
- 429: Quota exceeded, Throttle (Too Many Requests, aka retriable error)
- 5xx means Server errors
- 502: Bad Gateway Exception, usually for an incompatible output returned from a Lambda proxy integration backend and occasionally for out-of-order invocations due to heavy loads.
- 503: Service Unavailable Exception
- 504: Integration Failure, e.g. Endpoint Request Timed-out Exception; API Gateway requests time out after a maximum of 29 seconds
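Since a 429 is a retriable error, clients typically retry with exponential backoff. The following is a minimal sketch with jitter. `Throttled` is a hypothetical exception standing in for a 429 response, and the delay values are arbitrary.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical stand-in for receiving an HTTP 429 from the API."""


def call_with_backoff(call, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry `call` on Throttled, doubling the delay ceiling after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

The `sleep` parameter is injectable so the logic can be exercised without real waiting.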
- CORS must be enabled when you receive API calls from another domain.
- The response to the OPTIONS pre-flight request must contain the following headers:
- Access-Control-Allow-Methods
- Access-Control-Allow-Headers
- Access-Control-Allow-Origin
- REST (apigateway) vs HTTP (apigatewayv2)
- https://dev.to/tinystacks/api-gateway-rest-vs-http-api-what-are-the-differences-2nj
Feature | HTTP API | REST API |
Core Protocol | HTTP | HTTP + REST Principles |
Features | Basic (e.g., limited authentication) | Rich (API keys, request validation, private endpoints) |
Cost | Generally lower | Generally higher |
Performance | Often faster | Can have slightly lower performance |
Flexibility | Less flexible | Highly flexible and scalable |
Architectural Style | Not strictly bound | Adheres to REST principles (statelessness, client-server) |
Purpose | Simpler applications, internal use, rapid development | Public-facing APIs, microservices, complex integrations |
Canary Deployments | Not supported | Supported |
Programmatic Model | Simplified | Can be more complex |
Endpoint Types | Limited (e.g., regional) | Supports various types |
Security Options | Fewer options | More options (authentication, authorization, encryption) |
Deployments | Automatic deployments | Manual or more involved |
Authorizers | HTTP API | REST API |
AWS Lambda | V | V |
IAM | V | V |
Resource Policies | | V |
Amazon Cognito | V | V |
Native OpenID Connect / OAuth 2.0 / JWT | V | |
- WebSocket API
- Server can push information to the client (wss://abcdef.execute-api.us-west-1.amazonaws.com/dev/@connections/connectionId)
- POST: Server send message to the connected Client
- GET: get connection status
- DELETE: disconnect with Client
- This enables stateful application use cases
- WebSocket APIs are often used in real-time applications such as chat applications, collaboration platforms, multiplayer games, and financial trading platforms.
- Works with AWS Services (Lambda, DynamoDB) or HTTP endpoints
- Routing
- https://docs.aws.amazon.com/apigateway/latest/developerguide/websocket-api-develop-routes.html
- Incoming JSON messages are routed to different backend
- If no routes => sent to $default route
- You define a route selection expression to select the field of the JSON message to route on
- Sample expression: $request.body.action
- The result is evaluated against the route keys available in your API Gateway
- The route is then connected to the backend you've set up through API Gateway
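The routing steps above can be mimicked locally. The following sketch imitates a route selection expression of `$request.body.action`: it reads the `action` field of an incoming JSON message and falls back to `$default` when nothing matches. The route keys are made up, and this is only an approximation of the behaviour, not API Gateway's implementation.

```python
import json


def select_route(message, route_keys, field="action"):
    """Pick the route key named by message[field], or "$default" if none matches."""
    try:
        value = json.loads(message).get(field)
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: nothing to route on.
        value = None
    if isinstance(value, str) and value in route_keys:
        return value
    return "$default"


routes = {"join", "leave", "sendMessage"}  # hypothetical route keys
```

For example, `{"action": "join"}` selects the `join` route, while an unknown action or a non-JSON message ends up on `$default`.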
Amazon SQS
- Pull-based queue model
- Ideal for solutions that must be durable and loosely coupled
- Max message size is 256 KB and max retention time is 14 days; a message persists in SQS until a consumer deletes it
- When a consumer picks a message from the queue, the message stays in the queue but is invisible until the job is processed. If the visibility timeout (default: 30s) is over (i.e., the job is not processed in time), the message reappears in the queue for another consumer to take.
- Dead Letter Queue (DLQ)
- Maximum receives (maxReceiveCount) is the number of times a message can be received from the source queue before it is moved to the DLQ
- DLQ of a FIFO queue must also be a FIFO queue
- DLQ of a Standard queue must also be a Standard queue
- Use “redrive” to move DLQ messages back to the source queue for reprocessing
- Delay Queue, from 0s (default) to 15mins
- Short polling vs. Long polling = how long a receive request waits for messages before returning
- Short polling is the default. When you poll SQS, it doesn't wait for messages to be available in the queue to respond. It checks a subset of servers for messages and may respond that nothing is available yet.
- Long polling waits (from 1-20s) for messages to arrive in the queue before responding, so it uses fewer total requests and reduces cost.
- SQS Extended Client (Java Library) for large message (stored in S3 bucket)
- API calls
- CreateQueue (MessageRetentionPeriod), DeleteQueue
- PurgeQueue: delete all the messages in queue
- SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
- MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
- ReceiveMessageWaitTimeSeconds: Long Polling
- ChangeMessageVisibility: change the message visibility timeout
- Batch APIs for SendMessage, DeleteMessage, ChangeMessageVisibility helps decrease costs
- Standard vs. FIFO: FIFO is very rigorous whereas Standard is best-effort. The trade-off is that Standard has unlimited throughput of transactions per sec.
- FIFO with Message Group ID
- Messages that share a common Message Group ID will be in order within the group
- Each Group ID can have a different consumer (parallel processing!)
- FIFO De-duplication interval is 5 minutes
- Content-based deduplication: will do a SHA-256 hash of the message body
- Explicitly provide a Message Deduplication ID
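To make the content-based deduplication above concrete, here is a minimal local sketch. It hashes the message body with SHA-256 and rejects a repeat seen within the 5-minute interval. The `DedupWindow` class is hypothetical and only approximates the behaviour. It is not how SQS is implemented.

```python
import hashlib


def content_dedup_id(body):
    """SHA-256 hex digest of the message body, used as the deduplication ID."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class DedupWindow:
    """Hypothetical in-memory model of a 5-minute deduplication interval."""

    INTERVAL_SECONDS = 300

    def __init__(self):
        self._first_seen = {}  # dedup ID -> time the message was accepted

    def accept(self, body, now):
        """Return True if the message is accepted, False if it is a duplicate."""
        key = content_dedup_id(body)
        seen_at = self._first_seen.get(key)
        if seen_at is not None and now - seen_at < self.INTERVAL_SECONDS:
            return False
        self._first_seen[key] = now
        return True
```

Passing an explicit Message Deduplication ID instead would replace `content_dedup_id(body)` with the caller-supplied value.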
- SQS doesn’t prioritize items in the queue. If you need to prioritize use multiple queues, one for each priority type
- ——
- To use industry standards with Apache ActiveMQ, use an Amazon MQ instead of SQS (this is similar to using EKS instead of ECS, the industry-standard version of containers rather than the Amazon proprietary version)
DynamoDB
Amazon S3
Amazon SNS
- Pub/Sub model (Publish-Subscribe messaging)
- fully managed messaging service for pushing async notifications, especially used for broadcasting to multiple services
Amazon Kinesis
- Real-time Streaming model
- For use cases that require ingestion of real-time data (e.g., IoT sensor data)
- A Kinesis data stream is made up of shards, which hold data records, each with a sequence number. Each record carries a partition key (e.g., a device ID), which determines the shard the record lands in.
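The mapping from partition key to shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and picks the shard whose hash key range contains it. The sketch assumes the shards split the hash space evenly, which is a simplification, and the device names are made up.

```python
import hashlib


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    range_size = 2 ** 128 // shard_count
    # Clamp so the top of the hash space falls in the last shard.
    return min(hash_value // range_size, shard_count - 1)


shard = shard_for_key("device-1", shard_count=4)
```

Because the hash is deterministic, all records from the same device go to the same shard, which is what preserves their ordering.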