06. Serverless

AWS Lambda

  • Synchronous Invocations
    • Result is returned right away
    • Client needs to handle errors (debug, retries, exponential backoff, etc.)
    • User Invoked
      • Elastic Load Balancing (Application Load Balancer)
      • Amazon API Gateway
      • Amazon CloudFront (Lambda@Edge)
      • Amazon S3 Batch
    • Service Invoked
      • Amazon Cognito
      • AWS Step Functions
    • Others
      • Amazon Lex
      • Amazon Alexa
      • Amazon Kinesis Data Firehose
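Because the client owns error handling for synchronous invocations, a retry helper with exponential backoff is a common pattern. The following is a minimal sketch, not an AWS API: `call_with_backoff`, its parameters, and the `flaky_invoke` stand-in are all hypothetical names, and the injected `sleep` exists only so the example runs without waiting.

```python
import random
import time


def call_with_backoff(invoke, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call invoke() until it succeeds or max_attempts is reached.

    Waits up to base_delay * 2**attempt seconds (capped at max_delay),
    with full jitter, between attempts. All names here are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # jitter spreads out retry bursts


# Usage: a stand-in for a synchronous invocation that fails twice.
attempts = {"count": 0}


def flaky_invoke():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated throttle (HTTP 429)")
    return {"statusCode": 200}


waits = []  # records the requested delays instead of sleeping
result = call_with_backoff(flaky_invoke, sleep=waits.append)
```

In real code the `except` clause would usually be narrowed to retriable errors (throttling, timeouts) so that permanent failures are not retried.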
  • Asynchronous Invocations
    • S3, SNS, CloudWatch Events (ie EventBridge)
    • Put the invocations into Events Queue
    • Retries on errors, 3 attempts in total: initial run – 1 min wait – 1st retry – 2 min wait – 2nd (final) retry; retried attempts show up as duplicate log entries in CloudWatch Logs
    • Make sure the processing is idempotent, since a retry can run the same event again
    • Can define a DLQ (dead-letter queue) – SNS or SQS – for failed processing (needs correct IAM permissions)
    • Invoked by
      • Amazon Simple Storage Service (S3), with S3 Events Notifications
      • Amazon Simple Notification Service (SNS)
      • Amazon CloudWatch Events / EventBridge
      • AWS CodeCommit (CodeCommit Trigger : new branch, new tag, new push)
      • AWS CodePipeline (invoke a Lambda function during the pipeline, Lambda must callback)
      • Amazon CloudWatch Logs (log processing)
      • Amazon Simple Email Service
      • AWS CloudFormation
      • AWS Config
      • AWS IoT
      • AWS IoT Events
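Since asynchronous invocations are retried, the handler may see the same event more than once. A minimal sketch of an idempotent handler follows. It assumes the event carries a unique `id` field (EventBridge events do) and uses an in-memory set purely as a stand-in for a durable store such as a DynamoDB table; `side_effects` is a hypothetical placeholder for the real work.

```python
# In-memory stand-in for a durable store (e.g. a DynamoDB table); a real
# function needs storage that survives across execution environments.
_processed_ids = set()
side_effects = []  # hypothetical record of the non-repeatable work


def handler(event, context=None):
    """Process an event at most once, keyed by a unique event id."""
    event_id = event["id"]  # assumed field name
    if event_id in _processed_ids:
        return {"status": "duplicate", "id": event_id}
    side_effects.append(event["detail"])  # the work that must not repeat
    _processed_ids.add(event_id)
    return {"status": "processed", "id": event_id}


first = handler({"id": "evt-1", "detail": {"order": 42}})
retry = handler({"id": "evt-1", "detail": {"order": 42}})  # simulated retry
```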
  • To expose Lambda as an HTTP(S) endpoint
    • ALB, with the Lambda function registered in a target group; query string parameters and headers are passed to Lambda as key/value pairs
      • supports multi-value headers: when enabled, multiple values with the same key are converted into arrays for Lambda
    • API Gateway
  • Event Source Mapping (Lambda polls the source and invokes the function synchronously)
    • Streams
      • Kinesis Data Streams or DynamoDB Streams
      • One Lambda invocation per stream shard
      • Processed items aren’t removed from the stream
      • process multiple batches in parallel (up to 10 batches per shard)
      • By default, if your function returns an error, the entire batch is reprocessed until the function succeeds, or the items in the batch expire
    • Queue
      • SQS (standard) queue & SQS FIFO queue
      • Long Polling with batch size (1-10 messages)
      • the DLQ has to be set on the SQS queue, not on Lambda
      • items are deleted from the queue once successfully processed by Lambda
      • For SQS (standard) queue Lambda adds 60 more instances per minute to scale up, up to 1000 batches
      • For SQS FIFO queue Lambda scales to the number of active message groups (defined in GroupID), and messages under same GroupID would be processed in order
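For SQS event source mappings, a handler can report only the messages that failed so that the rest of the batch is deleted. The sketch below assumes the `ReportBatchItemFailures` response type is enabled on the event source mapping; `process` and the `order_id` field are hypothetical business logic.

```python
import json


def process(body):
    """Hypothetical business logic; rejects payloads without an 'order_id'."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context=None):
    """Report only the failed messages so successful ones are deleted.

    Assumes ReportBatchItemFailures is enabled on the event source mapping;
    without it, raising an error returns the whole batch to the queue.
    """
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"order_id": 1})},
        {"messageId": "m-2", "body": json.dumps({"note": "no id"})},
    ]
}
response = handler(sample_event)
```

Messages that keep failing are eventually moved to the DLQ configured on the queue itself.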
  • Event Object – data from the invoking source, prepared for the application code
    • JSON, contains information from the invoking service (e.g., EventBridge, custom, …)
    • Lambda runtime converts the event to an object (e.g., dict type in Python)
    • Example: input arguments, invoking service arguments, …
  • Context Object – details about the invocation and the Lambda function itself
    • Provides methods and properties that provide information about the invocation, function, and runtime environment
    • Passed to your function by Lambda at runtime
    • Example: aws_request_id, function_name, memory_limit_in_mb, …
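A small sketch of how a Python handler reads both objects. `FakeContext` is a local stand-in written for this example (the real context object is supplied by the Lambda runtime, and its value types may differ); the attribute and method names mirror the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Local stand-in exposing a few attributes of the Python context object."""
    aws_request_id: str = "local-request-id"
    function_name: str = "demo-function"
    memory_limit_in_mb: str = "128"

    def get_remaining_time_in_millis(self):
        return 3000  # fixed value; the real method counts down to the timeout


def handler(event, context):
    # `event` arrives as a dict parsed from the invoking service's JSON.
    name = event.get("name", "world")
    return {
        "message": f"hello {name}",
        "request_id": context.aws_request_id,
        "function": context.function_name,
        "time_left_ms": context.get_remaining_time_in_millis(),
    }


out = handler({"name": "lambda"}, FakeContext())
```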
  • Destinations
    • Asynchronous invocations – can define destinations for successful and failed events:
      • Amazon SQS
      • Amazon SNS
      • AWS Lambda
      • Amazon EventBridge bus
    • Event Source mapping – only for discarded event batches, send to
      • Amazon SQS
      • Amazon SNS
  • Lambda Execution Role (IAM Role), to grant the Lambda function permissions to AWS services/resources
    • AWSLambdaBasicExecutionRole – Upload logs to CloudWatch
    • AWSLambdaKinesisExecutionRole – Read from Kinesis
    • AWSLambdaDynamoDBExecutionRole – Read from DynamoDB Streams
    • AWSLambdaSQSQueueExecutionRole – Read from SQS
    • AWSLambdaVPCAccessExecutionRole – Deploy Lambda function in VPC
    • AWSXRayDaemonWriteAccess – Upload trace data to X-Ray
  • Lambda Resource-based Policy, to allow other AWS services and accounts to invoke Lambda functions
  • Lambda Environment Variables, as key/value pairs in “String” form
  • Can use X-Ray for tracing, by enabling “Active Tracing” in the configuration, with the IAM Execution Role (AWSXRayDaemonWriteAccess)
    • _X_AMZN_TRACE_ID: contains the tracing header
    • AWS_XRAY_CONTEXT_MISSING: by default, LOG_ERROR
    • AWS_XRAY_DAEMON_ADDRESS: the X-Ray Daemon IP_ADDRESS:PORT
  • Edge functions attached to CloudFront, as either
    • CloudFront Functions, lightweight functions written in JavaScript that scale to millions of requests per second
      • change viewer requests (after CloudFront receives them) and viewer responses (before forwarding to clients)
      • managed in CloudFront
    • Lambda@Edge
      • Lambda functions written in NodeJS or Python, scaling to thousands of requests per second
      • change CloudFront requests and responses:
        • Viewer Request – after CloudFront receives a request from a viewer
        • Origin Request – before CloudFront forwards the request to the origin
        • Origin Response – after CloudFront receives the response from the origin
        • Viewer Response – before CloudFront forwards the response to the viewer
      • Authored in one AWS Region (us-east-1), then replicated by CloudFront to its locations
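As an illustration of the Origin Response trigger, here is a minimal Lambda@Edge-style handler in Python that adds a security header before CloudFront caches the response. The header value and the trimmed-down `sample_event` are assumptions for the example; a real event carries many more fields.

```python
def handler(event, context=None):
    """Origin-response trigger: add a security header to the response."""
    response = event["Records"][0]["cf"]["response"]
    # CloudFront headers: lowercase name -> list of {"key", "value"} dicts.
    response["headers"]["strict-transport-security"] = [
        {"key": "Strict-Transport-Security", "value": "max-age=63072000"}
    ]
    return response


# Trimmed-down event for local testing only.
sample_event = {
    "Records": [{"cf": {"response": {"status": "200", "headers": {}}}}]
}
edge_response = handler(sample_event)
```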
  • to allow Lambda to access VPC resources (RDS, ElastiCache, internal ELB, etc.), Lambda creates an Elastic Network Interface (ENI) in the VPC
    • with VPC ID, subnets, and security groups
    • using AWSLambdaVPCAccessExecutionRole permission
    • no internet access by default; the function must be deployed in a private subnet that routes through a NAT Gateway or NAT instance
    • without NAT, can access AWS resources via VPC Endpoints
  • Lambda Function Configuration and Performance
    • RAM: 128MB to 10GB
    • more vCPU is assigned as RAM grows; above 1,769MB (one full vCPU) the function gets more than one vCPU, and the code needs multi-threading to benefit
    • Timeout is 3 (default) to 900 seconds
    • Execution Context is a temporary runtime environment that initializes any external dependencies
      • re-use across invocations boosts performance (e.g. DB/HTTP connections opened outside the handler)
      • includes the /tmp directory, with up to 10GB of temporary storage; generate KMS Data Keys to encrypt its content if needed
      • for permanent objects, use S3
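The execution-context reuse described above usually comes down to where initialization code lives. In this sketch, `create_connection` is a hypothetical expensive setup step; because it is called at module level, it runs once per execution environment rather than once per invocation.

```python
import time

init_calls = {"count": 0}


def create_connection():
    """Hypothetical expensive setup (e.g. opening a DB or HTTP connection)."""
    init_calls["count"] += 1
    return {"connected_at": time.time()}


# Runs once per execution environment (cold start), not once per invocation.
connection = create_connection()


def handler(event, context=None):
    # Reuses the module-level connection on every warm invocation.
    return {"connected_at": connection["connected_at"], "echo": event}


first = handler({"n": 1})
second = handler({"n": 2})
```

Moving the `create_connection()` call inside `handler` would repeat the setup cost on every request.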
  • Lambda Layers
    • Custom Runtimes Library (C++ & Rust)
    • split/package dependencies as externalised for re-use
  • File System Mounting
    • EFS with EFS Access Points, if Lambda is deployed within the same VPC
    • Watch out for the EFS connection (and burst) limits, as each Lambda instance uses its own non-shared connection
  • Concurrency and Throttling
    • Max 1000 concurrent executions by default
    • Can set “reserved concurrency” at function level; invocations beyond the limit are throttled
      • Synchronous Invoke: ThrottleError – 429
      • Asynchronous Invoke: retry and then go to DLQ
    • Cold Start – processing of the first request on a new instance is slower
    • Provisioned Concurrency allocates instances before the function is invoked, so no cold start occurs
  • Upload the zip (code + dependency libraries) straight to Lambda if less than 50MB, else to S3 first
  • Using CloudFormation
    • inline – use the Code.ZipFile property; cannot include dependencies
    • S3 – with S3Bucket + S3Key + S3ObjectVersion; to update the Lambda function, at least one of these three properties must change along with the code (zip file), otherwise CloudFormation does not detect the update
  • Lambda Container Images, up to 10GB, stored in ECR; the image must implement the Lambda Runtime API
  • Versions – default is $LATEST; each published version is immutable and has its own ARN
  • Alias – a mutable pointer to a Lambda version, with its own ARN; aliases enable canary deployment through weighted routing and are commonly used as stages
  • Function URL, a dedicated unique URL for the function, reachable over the public internet
    • Can point to $LATEST or an Alias, not to a Version
    • Throttle by Reserved Concurrency
    • Secured by Resource-based Policy or Cross-Origin Resource Sharing (CORS)
      • AuthType NONE: allows public, unauthenticated access, but a Resource-based Policy must still grant public access
      • AuthType AWS_IAM
  • CodeGuru Profiler can gain insight of runtime performance
    • Java and Python
    • AmazonCodeGuruProfilerAgentAccess

Step Functions

  • Workflow in JSON
  • Start with SDK, API Gateway call, or EventBridge
  • Task: Invoke 1 AWS service or Run 1 Activity
  • States: Choice, Fail-or-Succeed, Pass, Wait, Map, Parallel
  • Error handling should live in the state machine, not in the task code; Retry and Catch entries are evaluated from top to bottom and only the first matching one applies (“OR”, not sequential)
  • Wait for Task token: append .waitForTaskToken to the Resource; the execution pauses until it receives a SendTaskSuccess or SendTaskFailure API call (PUSH mechanism)
  • Activity Task: an Activity Worker on EC2/Lambda/etc. polls with the GetActivityTask API call and sends the response with a SendTaskSuccess or SendTaskFailure API call (PULL mechanism), using SendTaskHeartbeat + HeartbeatSeconds to stay alive
  • Standard vs Express (asynchronous and synchronous)
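To make the Retry and Catch notes concrete, here is a sketch of a state machine definition built as a Python dict and serialized to JSON. The state names, error name, and Lambda ARN are hypothetical; the field names (`Retry`, `Catch`, `ErrorEquals`, `IntervalSeconds`, `MaxAttempts`, `BackoffRate`) follow the Amazon States Language. The helper shows how the wait before each retry grows with `BackoffRate`.

```python
import json

# Hypothetical state names and Lambda ARN, for illustration only.
definition = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {"Type": "Fail", "Error": "OrderFailed"},
    },
}


def retry_delays(retrier):
    """Seconds waited before each retry: IntervalSeconds * BackoffRate**n."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** n
        for n in range(retrier["MaxAttempts"])
    ]


asl_json = json.dumps(definition, indent=2)
delays = retry_delays(definition["States"]["ProcessOrder"]["Retry"][0])
```

Once the three retries are exhausted, the `Catch` entry routes the execution to `NotifyFailure` instead of failing the whole workflow abruptly.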

AWS Serverless Application Model (SAM)

  • configure via JSON/YAML, compiled to a CloudFormation stack
  • use CodeDeploy for Lambda function
    • Traffic Shifting (from OLD ver to New ver)
      • Linear: grow traffic every N minutes until 100%
      • Canary: try X percent then 100%
      • AllAtOnce: immediate
    • Pre- and Post-traffic hooks for testing during traffic shifting
    • rollback triggered by AWS CloudWatch Alarm
    • AppSpec.yml
      • Name
      • Alias
      • CurrentVersion
      • TargetVersion
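The difference between Linear and Canary shifting is easiest to see as a schedule. This is a plain simulation, not a CodeDeploy API: the function names and the percentages are made up for the example, and each tuple is (minutes since start, percent of traffic on the new version).

```python
def linear_schedule(step_percent, interval_minutes):
    """Linear: add step_percent of traffic every interval until 100%."""
    schedule, percent, minute = [], 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


def canary_schedule(canary_percent, wait_minutes):
    """Canary: a small slice first, then all remaining traffic at once."""
    return [(0, canary_percent), (wait_minutes, 100)]


linear = linear_schedule(step_percent=25, interval_minutes=10)
canary = canary_schedule(canary_percent=10, wait_minutes=5)
```

AllAtOnce corresponds to a single step, `[(0, 100)]`.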
  • run Lambda, API Gateway, DynamoDB locally
    • Lambda start/invoke
    • API Gateway
    • AWS Events (sample payloads for event resources)
  • SAM Recipe
    • Transform Header – template
    • Write Code
    • Package and Deploy – into S3 Bucket
  • SAM Accelerate (sam sync) – reduce latency
    • update existing SAM template
    • using the “--code” option, syncs code changes only, without updating infrastructure (calls service APIs and bypasses CloudFormation)
  • SAM Policy Templates
    • apply permissions to Lambda Functions
  • SAM Multiple Environments, using “samconfig.toml”

Amazon Athena

  • Serverless
  • Analysis on S3
  • Using SQL, with Presto engine
  • Support JSON, CSV, ORC, Avro and Parquet
  • with Amazon QuickSight for reporting/dashboards
  • Performance Improvement
    • Use columnar data, to scan less data (cost-saving)
    • Data format: ORC or Apache Parquet
    • Using Glue for data conversion
    • Compression for smaller retrievals
    • Partition datasets (using year/month/date in S3 path/folders)
    • Large files (> 128MB) to lower overhead
  • Federated Query
    • any data sources
    • using AWS Data Source Connectors (Lambda functions)
    • results stored in Amazon S3
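Partitioning by date comes down to how objects are laid out in S3. The helper below is a sketch that builds a partition prefix; the Hive-style `key=value` folder naming and the bucket name are assumptions for the example, not something the notes above prescribe.

```python
from datetime import date


def partition_prefix(table_prefix, day):
    """Build a Hive-style partition prefix: .../year=YYYY/month=MM/day=DD/."""
    return (
        f"{table_prefix.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket and table location.
prefix = partition_prefix("s3://example-bucket/logs", date(2024, 3, 7))
```

A query that filters on year, month, and day then only has to scan the objects under the matching prefixes.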

Amazon API Gateway

  • Endpoints
    • Edge-Optimized (default):
      • For global clients, the requests are routed through the CloudFront Edge locations (improves latency)
      • The API Gateway still lives in only one region
    • Regional:
      • For clients within the same region
      • Could manually combine with CloudFront (more control over the caching strategies and the distribution)
    • Private:
      • Can only be accessed from your VPC using an interface VPC endpoint (ENI)
      • Use a resource policy to define access
  • Security
    • IAM Role – Authentication; IAM Policy & Resource Policy – Authorization
    • Cognito User Pool – Authentication; API Gateway Methods – Authorization
    • Custom Authorizer / Lambda Authorizer (External) – Authentication; Lambda function – Authorization
    • Custom Domain Name HTTPS security through integration with AWS Certificate Manager (ACM)
      • for an Edge-Optimized endpoint, the certificate must be in us-east-1
      • for Regional endpoint, the certificate must be in the API Gateway region
      • Must setup CNAME or A-alias record in Route 53
  • Deployment Stages/Environments
    • Stage variables are like environment variables for API Gateway; referenced with the format ${stageVariables.variableName} and passed on to AWS Lambda (as stageVariables in the event for a Lambda proxy integration)
    • Canary deployments: choose the % of traffic the canary channel receives; suitable for Blue/Green Deployment
  • Integration Types
    • MOCK: API Gateway returns a response directly
    • HTTP / AWS (Lambda & AWS Services): for example, call SQS. Setup data mapping using mapping templates for the request & response
      • Mapping templates can be used to modify request / responses
      • Rename / Modify query string parameters
      • Modify body content
      • Add headers
      • Uses Velocity Template Language (VTL): for loop, if, etc.
      • Filter output results (remove unnecessary data)
      • Content-Type can be set to application/json or application/xml
    • AWS_PROXY (Lambda Proxy)
      • incoming request from the client is the input to Lambda
      • No mapping template; headers, query string parameters, etc. are passed as arguments
    • HTTP_PROXY
      • No mapping template
      • Possibility to add HTTP Headers if need be (ex: API key)
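With AWS_PROXY there is no mapping template, so the function reads the raw request from the event and must build the HTTP response itself. A minimal sketch, assuming a REST API proxy event; the `name` query parameter and the `env` stage variable are hypothetical.

```python
import json


def handler(event, context=None):
    """Lambda proxy integration: the raw HTTP request arrives in `event`."""
    params = event.get("queryStringParameters") or {}
    stage_vars = event.get("stageVariables") or {}
    name = params.get("name", "world")
    # The function itself must return the HTTP response shape.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(
            {"greeting": f"hello {name}", "env": stage_vars.get("env", "unknown")}
        ),
    }


proxy_response = handler(
    {"queryStringParameters": {"name": "api"}, "stageVariables": {"env": "dev"}}
)
```

Returning anything that does not match this shape (for example a bare string) is a typical cause of a 502 from API Gateway.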
  • Swagger / Open API import to quickly define APIs
    • OpenAPI specs can be written in YAML or JSON
    • Using OpenAPI we can generate SDK for our applications
    • Request Validation
      • Returns a 400-error response to the caller if validation failed
  • Cache API responses
    • Default TTL (time to live) is 300 seconds, ranging from 0-3600s
    • Caches are defined per stage, possible to override cache settings per method
    • Clients can invalidate the cache with header: Cache-Control: max-age=0
  • Usage Plan, using API Keys to identify clients and meter access
    • API Keys are alphanumeric string values
    • Throttling limits
    • Quota limit is the overall maximum number of requests
    • Callers must supply an assigned API key in the x-api-key header in requests
  • CloudWatch Metrics
    • CacheHitCount & CacheMissCount
    • Count
    • IntegrationLatency: The time between when API Gateway relays a request to the backend and when it receives a response from the backend.
    • Latency: The time between when API Gateway receives a request from a client and when it returns a response to the client. The latency includes the integration latency and other API Gateway overhead.
  • Throttling limits
    • Account Limit, at 10000 rps across all APIs
    • Also can set Stage limits, Method limits, or define Usage Plans to throttle per customer
    • as with Lambda concurrency, one overloaded API, if not limited, can cause the other APIs to be throttled
  • Errors
    • 4xx means Client errors
      • 400: Bad Request
      • 403: Access Denied, WAF filtered
      • 429: Quota exceeded, Throttle (Too Many Requests, aka retriable error)
    • 5xx means Server errors
      • 502: Bad Gateway Exception, usually for an incompatible output returned from a Lambda proxy integration backend and occasionally for out-of-order invocations due to heavy loads.
      • 503: Service Unavailable Exception
      • 504: Integration Failure – e.g. Endpoint Request Timed-out Exception; API Gateway requests time out after a 29-second maximum
  • CORS must be enabled when you receive API calls from another domain.
    • The response to the OPTIONS pre-flight request must contain the following headers:
      • Access-Control-Allow-Methods
      • Access-Control-Allow-Headers
      • Access-Control-Allow-Origin
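When a Lambda proxy integration sits behind the API, the function has to return the CORS headers itself. The sketch below answers the pre-flight with the three headers listed above; the allowed origin, methods, and headers are hypothetical values, and `httpMethod` is the field a REST API proxy event uses for the verb.

```python
ALLOWED_ORIGIN = "https://www.example.com"  # hypothetical front-end origin


def cors_headers():
    return {
        "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
        "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
    }


def handler(event, context=None):
    if event.get("httpMethod") == "OPTIONS":
        # Pre-flight: no body, only the CORS headers.
        return {"statusCode": 204, "headers": cors_headers(), "body": ""}
    # Actual request: the browser still checks Access-Control-Allow-Origin.
    return {"statusCode": 200, "headers": cors_headers(), "body": "ok"}


preflight = handler({"httpMethod": "OPTIONS"})
```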
  • REST (apigateway) vs HTTP (apigatewayv2)
    • https://dev.to/tinystacks/api-gateway-rest-vs-http-api-what-are-the-differences-2nj
| Feature | HTTP API | REST API |
| --- | --- | --- |
| Core Protocol | HTTP | HTTP + REST Principles |
| Features | Basic (e.g., limited authentication) | Rich (API keys, request validation, private endpoints) |
| Cost | Generally lower | Generally higher |
| Performance | Often faster | Can have slightly lower performance |
| Flexibility | Less flexible | Highly flexible and scalable |
| Architectural Style | Not strictly bound | Adheres to REST principles (statelessness, client-server) |
| Purpose | Simpler applications, internal use, rapid development | Public-facing APIs, microservices, complex integrations |
| Canary Deployments | Not supported | Supported |
| Programmatic Model | Simplified | Can be more complex |
| Endpoint Types | Limited (e.g., regional) | Supports various types |
| Security Options | Fewer options | More options (authentication, authorization, encryption) |
| Deployments | Automatic deployments | Manual or more involved |

| Authorizers | HTTP API | REST API |
| --- | --- | --- |
| AWS Lambda | V | V |
| IAM | V | V |
| Resource Policies | - | V |
| Amazon Cognito | V | V |
| Native OpenID Connect / OAuth 2.0 / JWT | V | - |
  • WebSocket API
    • Server can push information to the client through the @connections callback URL (https://abcdef.execute-api.us-west-1.amazonaws.com/dev/@connections/connectionId)
      • POST: Server sends a message to the connected Client
      • GET: get connection status
      • DELETE: disconnect the Client
    • This enables stateful application use cases
    • WebSocket APIs are often used in real-time applications such as chat applications, collaboration platforms, multiplayer games, and financial trading platforms.
    • Works with AWS Services (Lambda, DynamoDB) or HTTP endpoints
    • Routing
      • https://docs.aws.amazon.com/apigateway/latest/developerguide/websocket-api-develop-routes.html
      • Incoming JSON messages are routed to different backends
      • If no routes => sent to $default route
      • You define a route selection expression to select the JSON field to route on
      • Sample expression: $request.body.action
      • The result is evaluated against the route keys available in your API Gateway
      • The route is then connected to the backend you’ve set up through API Gateway
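The routing steps above can be mimicked locally. This sketch is a simulation of the selection logic, not API Gateway code: `ROUTE_KEYS` holds hypothetical custom routes, and `select_route` reads the `action` field the way the sample expression `$request.body.action` would, falling back to `$default`.

```python
import json

ROUTE_KEYS = {"join", "sendMessage"}  # hypothetical custom routes


def select_route(message, selection_path=("action",)):
    """Mimic $request.body.action: read the field, fall back to $default."""
    try:
        value = json.loads(message)
        for key in selection_path:
            value = value[key]
    except (ValueError, KeyError, TypeError):
        return "$default"  # non-JSON message or missing field
    if isinstance(value, str) and value in ROUTE_KEYS:
        return value
    return "$default"  # no matching route key


routes = [
    select_route('{"action": "join", "room": "a"}'),
    select_route('{"action": "unknown"}'),
    select_route("not json"),
]
```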

Amazon SQS

  • Queue model as pull-based
  • ideal for solutions that must be durable and loosely coupled
  • Max message size is 256KB, and max retention time is 14 days; a message is persisted in SQS until a consumer deletes it
  • When a consumer picks a message from the queue, the message stays in the queue but is invisible until the job is processed. If the visibility timeout (default: 30s) is over (i.e., the job is not processed in time), then the message reappears in the queue for another consumer to take.
  • Dead Letter Queue (DLQ)
    • Maximum receives (maxReceiveCount) is the threshold: a message received from the source queue more than that many times without being deleted is moved to the DLQ
    • DLQ of a FIFO queue must also be a FIFO queue
    • DLQ of a Standard queue must also be a Standard queue
    • use “redrive” to move DLQ messages back to the source queue for re-processing
  • Delay Queue, from 0s (default) to 15mins
  • Short polling vs. Long polling = how long a receive request waits for messages before returning
    • Short polling is the default. When you poll the SQS, it doesn’t wait for messages to be available in the queue to respond. It checks a subset of servers for messages and may respond that nothing is available yet.
    • Long polling waits (with extra time, from 1-20s) for messages to be in the queue before responding, so it uses fewer total requests and reduces cost.
  • SQS Extended Client (Java Library) for large message (stored in S3 bucket)
  • API calls
    • CreateQueue (MessageRetentionPeriod), DeleteQueue
    • PurgeQueue: delete all the messages in queue
    • SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
    • MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
    • ReceiveMessageWaitTimeSeconds: Long Polling
    • ChangeMessageVisibility: change the message timeout
    • Batch APIs for SendMessage, DeleteMessage, ChangeMessageVisibility helps decrease costs
  • Standard vs. FIFO: FIFO guarantees ordering and exactly-once processing, whereas Standard offers best-effort ordering with at-least-once delivery. The trade-off is that Standard has nearly unlimited throughput (transactions per second).
    • FIFO with Message Group ID
      • Messages that share a common Message Group ID will be in order within the group
      • Each Group ID can have a different consumer (parallel processing!)
    • FIFO De-duplication interval is 5 minutes
      • Content-based deduplication: will do a SHA-256 hash of the message body
      • Explicitly provide a Message Deduplication ID
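A toy model of the deduplication behaviour described above, assuming content-based deduplication when no explicit ID is given. `FifoDeduplicator` is a hypothetical class written for this example and runs entirely in memory; `now` is passed in as seconds so the 5-minute interval can be exercised without waiting.

```python
import hashlib


class FifoDeduplicator:
    """Toy model of FIFO deduplication within the 5-minute interval."""

    INTERVAL_SECONDS = 300

    def __init__(self):
        self._seen = {}  # deduplication id -> time first accepted

    def accept(self, body, now, dedup_id=None):
        # Content-based deduplication: SHA-256 hash of the message body.
        dedup_id = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(dedup_id)
        if first_seen is not None and now - first_seen < self.INTERVAL_SECONDS:
            return False  # duplicate inside the interval: not delivered again
        self._seen[dedup_id] = now
        return True


queue = FifoDeduplicator()
results = [
    queue.accept("order-1", now=0),
    queue.accept("order-1", now=120),  # same body, 2 minutes later
    queue.accept("order-1", now=301),  # interval has passed
]
```

Passing an explicit `dedup_id` lets two messages with identical bodies both be accepted.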
  • SQS doesn’t prioritize items in the queue. If you need to prioritize use multiple queues, one for each priority type
  • To use industry standards with Apache ActiveMQ, use an Amazon MQ instead of SQS (this is similar to using EKS instead of ECS, the industry-standard version of containers rather than the Amazon proprietary version)

DynamoDB

Amazon S3

Amazon SNS

  • Pub/Sub model (Publish-Subscribe messaging)
  • fully managed messaging service for pushing async notifications, especially used for broadcasting to multiple services

Amazon Kinesis

  • Real-time Streaming model
  • for use cases that require ingestion of real-time data (e.g. IoT sensor data)
  • A Kinesis data stream is made up of shards, which are made up of data records, each with a sequence number. You map devices to partition keys, which group data by shard.
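How a partition key ends up on a shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit hash key and assigns the record to the shard whose hash key range contains it; this example assumes the ranges are split evenly across shards, and the `device-N` keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


# The same key always lands on the same shard, which preserves per-device order.
same_device = {shard_for("device-17", 4) for _ in range(3)}
all_shards = {shard_for(f"device-{i}", 4) for i in range(200)}
```

A single "hot" partition key therefore cannot be spread over several shards; only distinct keys distribute the load.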

AWS Kinesis Data Firehose

Aurora Serverless