AWS Command Line Interface (CLI) & Software Development Kit (SDK)
both authenticate with access keys, which are generated in the AWS Management Console
the CLI gives direct access to the public APIs of AWS services
using MFA with the CLI requires creating a temporary session
run the STS GetSessionToken API call: aws sts get-session-token --serial-number arn-of-the-mfa-device --token-code code-from-token --duration-seconds 3600
CLI Credentials Provider Chain
Command line options – --region, --output, and --profile
Environment variables – AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
Instance profile credentials – for EC2 Instance Profiles
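A minimal sketch of how an MFA session feeds the environment-variable step of the chain, assuming the JSON printed by the get-session-token call has already been parsed into a dict. The helper name session_env and the sample values are hypothetical placeholders.

```python
import json

def session_env(response: dict) -> dict:
    """Map a GetSessionToken response to the environment variables the CLI reads."""
    creds = response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }

# Placeholder values only; a real response comes from the STS call.
sample = json.loads("""
{"Credentials": {"AccessKeyId": "ASIAEXAMPLE",
                 "SecretAccessKey": "secretEXAMPLE",
                 "SessionToken": "tokenEXAMPLE",
                 "Expiration": "2030-01-01T00:00:00Z"}}
""")

# Print shell export lines that could be pasted into a terminal session.
for name, value in session_env(sample).items():
    print(f"export {name}={value}")
```

The session token must be exported together with the temporary key pair; the key pair alone is rejected.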
SDK is a set of libraries for programming, exposed as language-specific APIs
the AWS CLI is built on the Python SDK (botocore, the core library underneath boto3)
“us-east-1” is chosen by default if no region is specified
SDK Credential Provider Chain
Java system properties – aws.accessKeyId and aws.secretKey
Environment variables – AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
The default credential profiles file – e.g. ~/.aws/credentials, shared by many SDKs
Amazon ECS container credentials – for ECS containers
Instance profile credentials – used on EC2 instances
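The chain above works on a "first provider that returns credentials wins" basis. The following is only an illustrative sketch of that idea, not how any SDK is actually implemented: env_provider, static_provider, and resolve are hypothetical names, and the profile-file and instance-profile lookups are replaced by static stand-ins.

```python
from typing import Callable, List, Mapping, Optional

Credentials = dict
Provider = Callable[[], Optional[Credentials]]

def env_provider(env: Mapping[str, str]) -> Provider:
    """Provider that reads the two access-key environment variables."""
    def load() -> Optional[Credentials]:
        key = env.get("AWS_ACCESS_KEY_ID")
        secret = env.get("AWS_SECRET_ACCESS_KEY")
        if key and secret:
            return {"source": "environment", "access_key": key, "secret_key": secret}
        return None
    return load

def static_provider(source: str, creds: Optional[Credentials]) -> Provider:
    """Hypothetical stand-in for a file, container, or instance-profile lookup."""
    def load() -> Optional[Credentials]:
        return None if creds is None else {"source": source, **creds}
    return load

def resolve(chain: List[Provider]) -> Credentials:
    """Return credentials from the first provider in the chain that has any."""
    for provider in chain:
        creds = provider()
        if creds is not None:
            return creds
    raise LookupError("no provider in the chain returned credentials")

profile = {"access_key": "AKIAPROFILE", "secret_key": "profile-secret"}
role = {"access_key": "ASIAROLE", "secret_key": "role-secret"}

def build_chain(env: Mapping[str, str]) -> List[Provider]:
    # Order matters: earlier providers shadow later ones.
    return [
        env_provider(env),
        static_provider("profile file", profile),
        static_provider("instance profile", role),
    ]

print(resolve(build_chain({}))["source"])
print(resolve(build_chain({"AWS_ACCESS_KEY_ID": "AKIAENV",
                           "AWS_SECRET_ACCESS_KEY": "env-secret"}))["source"])
```

A practical consequence of the ordering: access keys left in environment variables on an EC2 instance take precedence over the instance profile.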
Cloud Development Kit (CDK)
CloudFormation uses JSON/YAML, whereas CDK uses JavaScript/TypeScript, Python, Java, or .NET
Contains higher-level components called constructs
encapsulate everything for final CloudFormation stack creation
AWS Construct Library or Construct Hub
Layer 1 (L1): CloudFormation (CFN) resources, prefixed with “Cfn”; all resource properties must be configured explicitly
Layer 2 (L2): intent-based API resources, with defaults and boilerplate; they also expose helper methods
Layer 3 (L3): aka Patterns, representing multiple related resources (for example, API Gateway + Lambda, or Fargate cluster + Application Load Balancer)
The code is compiled (synthesized) into a CloudFormation template
Benefits Lambda & ECS/EKS workloads, because infrastructure and application runtime code are implemented together
SAM focuses on serverless and is good for Lambda, but supports only JSON/YAML
Bootstrapping: the process of provisioning resources for CDK before deploying into an AWS environment (Account + Region)
CDKToolkit (CloudFormation stack), with S3 Bucket – store files and IAM Roles
Error: “Policy contains a statement with one or more invalid principals”, caused by missing IAM Roles when a new environment has not been bootstrapped
Unit tests, using the CDK Assertions module with Jest (JavaScript) or Pytest (Python)
Fine-grained Assertions (common): check a certain property of a certain resource
Snapshot Test: test against baseline template
AWS CloudFormation
provisions infrastructure using a text-based (JSON/YAML) template (uploaded to S3) that describes exactly which resources are provisioned and their settings.
manages the template history similar to how code is managed in source control
Deleting the stack also removes every resource it created
CAPABILITY_AUTO_EXPAND, required for macros and nested stacks
InsufficientCapabilitiesException, thrown when the template needs a capability that was not acknowledged
DeletionPolicy
Delete (won't work on an S3 bucket that is not empty)
Retain
Snapshot: creates a final snapshot before the resource is deleted
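As a hedged illustration of where DeletionPolicy sits in a template, the sketch below builds a small JSON template in Python. DeletionPolicy is a resource attribute next to Type and Properties, not a property itself. The logical IDs LogsBucket and DataVolume and the property values are made up for the example.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        # Hypothetical bucket that should survive stack deletion.
        "LogsBucket": {
            "Type": "AWS::S3::Bucket",
            "DeletionPolicy": "Retain",
        },
        # Hypothetical volume that gets a final snapshot before removal.
        "DataVolume": {
            "Type": "AWS::EC2::Volume",
            "DeletionPolicy": "Snapshot",
            "Properties": {"Size": 10, "AvailabilityZone": "us-east-1a"},
        },
    },
}

print(json.dumps(template, indent=2))
```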
Stack policy is a JSON document that defines which resource(s) are protected from changes during a stack update; once a policy is set, all resources are protected by default, so an explicit ALLOW is needed for the resources you want to update
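A small sketch of what such a stack policy document can look like, built as a Python dict and printed as JSON. The logical ID ProductionDatabase is a hypothetical example; the first statement allows updates in general and the second protects that one resource.

```python
import json

stack_policy = {
    "Statement": [
        # Explicit allow: without it, every resource would stay protected.
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        # Explicit deny for the one resource that must not be touched.
        {
            "Effect": "Deny",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "LogicalResourceId/ProductionDatabase",
        },
    ]
}

print(json.dumps(stack_policy, indent=2))
```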
Termination Protection prevents accidental deletion of a stack
Custom Resources
custom provisioning logic running via Lambda, for example, emptying an S3 bucket
AWS::CloudFormation::CustomResource or Custom::MyCustomResourceTypeName
Properties with ServiceToken (Lambda function or SNS topic, in the same region) and optional input data
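The Lambda function behind a custom resource has to report back to CloudFormation by sending a JSON document to the pre-signed ResponseURL found in the incoming event. The sketch below only builds that document and does not perform the HTTP PUT; build_response is a hypothetical helper name and the event values are placeholders.

```python
import json

def build_response(event: dict, status: str, physical_id: str, data: dict) -> str:
    """Build the JSON body CloudFormation expects back from a custom resource."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError("status must be SUCCESS or FAILED")
    return json.dumps({
        "Status": status,
        "Reason": "See the function's CloudWatch Logs for details",
        "PhysicalResourceId": physical_id,
        # These three values are echoed back unchanged from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    })

# Placeholder event; a real one also carries ResponseURL and ResourceProperties.
event = {
    "RequestType": "Delete",
    "StackId": "example-stack-id",
    "RequestId": "example-request-id",
    "LogicalResourceId": "EmptyBucketOnDelete",
}

print(build_response(event, "SUCCESS", "empty-bucket-helper", {"ObjectsRemoved": 0}))
```

If the function never sends this response, the stack operation hangs until CloudFormation times out, so a FAILED response should be sent from the error path as well.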
StackSets are used to manage stacks across accounts/regions with a single CloudFormation template.
A stack instance is simply a reference to a stack in a target account within a region.
(Python) Helper scripts
cfn-init – Use to retrieve and interpret resource metadata, install packages, create files, and start services.
cfn-signal – Use to signal with a CreationPolicy or WaitCondition, so you can synchronize other resources in the stack when the prerequisite resource or application is ready.
cfn-get-metadata – Use to retrieve metadata for a resource or path to a specific key.
cfn-hup – Use to check for updates to metadata and execute custom hooks when changes are detected.
AWS Serverless Application Model (SAM)
configured via JSON/YAML, compiled to a CloudFormation stack
use CodeDeploy for Lambda function
Traffic Shifting (from OLD ver to New ver)
Linear: grow traffic every N minutes until 100%
Canary: try X percent then 100%
AllAtOnce: immediate
Pre- and post-traffic hooks for testing during traffic shifting
rollback triggered by an AWS CloudWatch Alarm
AppSpec.yml
Name
Alias
CurrentVersion
TargetVersion
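To make the three traffic-shifting modes concrete, here is a rough sketch that computes how much traffic the new version receives over time. It is a simplified model for intuition only, not CodeDeploy's implementation; the function names and the (minute, percent) representation are assumptions.

```python
from typing import List, Tuple

Schedule = List[Tuple[int, int]]  # (minute, percent of traffic on the new version)

def linear(step_percent: int, interval_minutes: int) -> Schedule:
    """Shift the same share of traffic at every interval until 100%."""
    if step_percent <= 0 or interval_minutes <= 0:
        raise ValueError("step and interval must be positive")
    schedule, percent, minute = [], 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule

def canary(first_percent: int, wait_minutes: int) -> Schedule:
    """Send a small share first, then everything after the wait."""
    return [(0, first_percent), (wait_minutes, 100)]

def all_at_once() -> Schedule:
    """Move all traffic immediately."""
    return [(0, 100)]

print(linear(10, 1))
print(canary(10, 5))
print(all_at_once())
```

The gaps between steps are the windows in which a CloudWatch alarm can trigger a rollback before all traffic has moved.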
run Lambda, API Gateway, DynamoDB locally
Lambda start/invoke
API Gateway
AWS Events (sample payloads for event resources)
SAM Recipe
Transform Header – template
Write Code
Package and Deploy – into S3 Bucket
SAM commands
sam init – creating a new SAM project
sam build – resolve dependencies and construct deployment artifacts for all functions and layers in the SAM template.
sam package – prepares the serverless application for deployment by zipping artifacts, uploading them to S3, and generating a CloudFormation template with references to the uploaded artifacts in S3. But, it doesn’t deploy the application.
sam deploy – zips your code artifacts, uploads them to Amazon S3, and produces a packaged AWS SAM template file that it uses to deploy your application
for nested applications, need “CAPABILITY_AUTO_EXPAND” option
Compared with “aws cloudformation deploy” – deploy a CloudFormation stack, it expects that your artifacts are already packaged and uploaded to S3.
sam publish – publishes an AWS SAM application to the AWS Serverless Application Repository
sam sync – syncs local changes to an application that is already deployed
aka SAM Accelerate; reduces deployment latency for rapid development and testing
using the “--code” option updates code only, without updating infrastructure (calls service APIs directly and bypasses CloudFormation)
SAM Policy Templates
apply permissions to Lambda Functions
SAM Multiple Environments, using “samconfig.toml”
AWS Elastic Beanstalk
provision infrastructure using a text-based template that describes exactly what resources are provisioned and their settings
compiled to a CloudFormation stack (Elastic Beanstalk provisions its resources through CloudFormation)
Components
Application
Application Version
Environment
Web Server Tier and Worker Tier
Deployment Method
All at once, has downtime
Rolling: running under capacity, no additional costs
Rolling with additional batches: compared to Rolling, this keeps the application running at full capacity (i.e. temporarily creates more instances)
Immutable: create new instances in a new ASG, then swap; zero downtime
(Blue Green: new environment then swap, using Route53 with weighted policies)
Traffic Splitting: canary testing
Lifecycle
at most 1000 Application versions; use a Lifecycle Policy to phase out old ones, based on Time (age) or Space (count)
has option to retain source bundles on S3
EB Extensions
YAML/JSON, with “.config” extension as file name
update defaults with “option_settings”
place under the “.ebextensions/” folder under root of source code
resources managed by .ebextensions would be deleted if the environment goes away
EB Clone helps set up an environment with the exact same “configuration”
Load Balancer type and configure
RDS configure, but no data
Environment variables
EB Migration
Once an EB environment is created, the Elastic Load Balancer (ELB) type cannot be changed
Create another environment with new ELB, then using Route53 update or CNAME swap
Decouple RDS with EB, for PROD
== CONTAINERS ==
Amazon Elastic Container Service (ECS)
Container management service for Docker containers (ECS Task)
Highly scalable / high performance, lets you run applications on an EC2 cluster
Amazon Elastic Container Registry (ECR) is a private repository for Docker images; the public version is Amazon ECR Public Gallery; backed by Amazon S3, access controlled through IAM
ECS Launch Types
Fargate Launch Type is serverless, managed by AWS
EC2 Launch Type gives you direct access to the instances, but you have to manage them, with ECS Agent
ECS Agent would use EC2 Instance Profile
ECS Tasks each use their own ECS Task Role, which is defined in the task definition
Mount EFS for ECS tasks, which can ensure all tasks in any AZ will share the same data; in comparison, S3 cannot be mounted as File System
ECS Task definition is metadata in JSON, up to 10 containers in one file
Image name
Port Binding for Container and Host
on EC2 Launch type, if only the container port is defined, the ALB uses Dynamic Host Port Mapping, so the EC2 instance’s Security Group must allow any port from the ALB security group
each task has its unique private IP on Fargate Launch, so only define the container port
Memory and CPU required
Environment variables (hardcoded, SSM Parameter Store, Secrets Manager, or files stored in S3)
Networking
IAM Role (One IAM Role per Task Definition)
Logging configuration (CloudWatch)
Data Volume to share data among multiple containers (Applications and Metrics/Logs, aka sidecar)
EC2 Launch Type – using EC2 instance storage
Fargate Launch Type – using ephemeral storage (20-200 GB), data deleted when containers demolished
ECS Task Placement strategy & Task Placement constraints – Only for EC2 Launch Type
find instances meet CPU/Memory/Port requirements
find those satisfy task placement constraints
distinctInstance – place each task on different container instance
memberOf – using Cluster Query Language, placing on certain instances (like t2.*)
find those satisfy task placement strategies
Binpack – places tasks on the instance with the least available CPU or Memory, which saves cost by minimizing the number of instances in use
Random
Spread (can be AZ or instance ID)
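The difference between the Binpack and Spread strategies can be sketched with a toy model. This is an illustration under simplifying assumptions (memory is the only resource, and the Instance class and function names are hypothetical), not the ECS scheduler itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Instance:
    instance_id: str
    az: str
    free_memory: int  # MiB still available on the container instance

def _candidates(instances: List[Instance], task_memory: int) -> List[Instance]:
    """Keep only the instances that can still fit the task."""
    fitting = [i for i in instances if i.free_memory >= task_memory]
    if not fitting:
        raise LookupError("no instance has enough free memory")
    return fitting

def binpack(instances: List[Instance], task_memory: int) -> Instance:
    """Pick the fitting instance with the least free memory (pack tightly)."""
    return min(_candidates(instances, task_memory), key=lambda i: i.free_memory)

def spread_by_az(instances: List[Instance], task_memory: int,
                 tasks_per_az: Counter) -> Instance:
    """Pick a fitting instance in the AZ that currently runs the fewest tasks."""
    return min(_candidates(instances, task_memory), key=lambda i: tasks_per_az[i.az])

fleet = [
    Instance("i-a", "us-east-1a", 512),
    Instance("i-b", "us-east-1b", 2048),
    Instance("i-c", "us-east-1a", 1024),
]

print(binpack(fleet, 600).instance_id)
print(spread_by_az(fleet, 256, Counter({"us-east-1a": 3, "us-east-1b": 1})).instance_id)
```

Binpack leaves the large instance empty so it could be scaled in, while spread favors availability by balancing tasks across zones.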
ECS service scaling does not use EC2 Auto Scaling; instead, it uses AWS Application Auto Scaling, based on
Average CPU Utilization
Average Memory Utilization – Scale on RAM
ALB Request Count Per Target
AWS Application Auto Scaling policy can be
Target Tracking – scale to keep a specific CloudWatch metric at a target value
Step Scaling – based on a specified CloudWatch Alarm
Scheduled Scaling
Under EC2 Launch Type, the underlying EC2 instances can be auto-scaled by
Auto Scaling Group Scaling – use EC2 ASG to check instance loadings (CPU, Memory, etc.)
ECS Cluster Capacity Provider, paired with ASG
AWS Copilot is the CLI tool for running apps on App Runner, ECS and Fargate; integrates with CodePipeline for deployment
Amazon Elastic Kubernetes Service (EKS)
EC2 Launch to deploy worker nodes; Fargate for serverless
Kubernetes is cloud-agnostic
Node Types
Managed Node Groups
AWS handles EC2 instances with ASG managed by EKS
On-Demand or Spot instances
Self-Managed Nodes
Create and manage EC2 instances yourself, with a self-defined ASG
On-Demand or Spot instances
AWS Fargate
Can specify StorageClass manifest on EKS cluster, leverage a Container Storage Interface (CSI) compliant driver
Amazon EBS (EC2)
Amazon EFS (EC2, Fargate)
Amazon FSx for Lustre (EC2)
Amazon FSx for NetApp ONTAP (EC2)
== MONITORING ==
AWS CloudWatch
Metrics: Collect and track key metrics for every AWS service
namespace
dimension is an attribute of a metric (instance id, environment, …)
timestamps
(EC2) Memory usage is a custom metric, pushed using the PutMetricData API
StorageResolution can be 60 seconds (Standard) or 1 second (High Resolution, readable at periods of 1/5/10/30 sec)
Custom metric data points are accepted with timestamps up to 2 weeks in the past and up to 2 hours in the future
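A minimal sketch of a PutMetricData payload for the memory metric, including a check on the accepted timestamp window. The namespace Custom/EC2, the metric name, and the helper memory_metric are assumptions made for illustration; the dict keys are intended to match the keyword arguments of boto3's put_metric_data, which is not called here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def memory_metric(instance_id: str, used_percent: float,
                  timestamp: Optional[datetime] = None,
                  now: Optional[datetime] = None) -> dict:
    """Build PutMetricData arguments for one memory-utilization data point."""
    now = now or datetime.now(timezone.utc)
    timestamp = timestamp or now
    # Reject timestamps outside the window: 2 weeks back, 2 hours ahead.
    if timestamp < now - timedelta(weeks=2) or timestamp > now + timedelta(hours=2):
        raise ValueError("timestamp outside the accepted window")
    return {
        "Namespace": "Custom/EC2",  # hypothetical custom namespace
        "MetricData": [{
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Timestamp": timestamp,
            "Value": used_percent,
            "Unit": "Percent",
            "StorageResolution": 60,  # 60 = standard, 1 = high resolution
        }],
    }

payload = memory_metric("i-0123456789abcdef0", 42.5)
print(payload["Namespace"], payload["MetricData"][0]["Value"])
```

With boto3 installed and credentials configured, the payload could be sent as cloudwatch.put_metric_data(**payload).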
Logs: Collect, monitor, analyze and store log files
Group – application (to encrypt with KMS keys, need to use CloudWatch Logs API)
stream – instances / log files / containers
export
Amazon S3, may take up to 12 hours, with the CreateExportTask API
Using a Logs Subscription with a Subscription Filter to stream real-time events to Kinesis Data Streams, Kinesis Data Firehose, or AWS Lambda
By default, no logs from EC2 machine to CloudWatch
CloudWatch Logs Agent – only push logs
CloudWatch Unified Agent – push logs + collect metrics (extra RAM, Process, Swap) + centralized by SSM Parameter Store
Metric Filters can trigger alarms; they are not retroactive, so they only apply to log events published after the filter is created
Alarms: React in real-time to metrics / events
based on a single metric; Composite Alarms are monitoring on multiple other alarms
Targets
EC2
EC2 ASG
Amazon SNS
Synthetics Canary: monitor your APIs, URLs, Websites, …
Events, now called Amazon EventBridge
Schedule – cron job
Event Pattern – rules to react/trigger services
Event Bus, a router that receives events and delivers them to zero or more destinations, or targets.
(AWS) default, Partner, Custom
Schema – the structure template for event (json)
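How an Event Pattern selects events can be sketched with a deliberately simplified matcher: every field named in the pattern must be present in the event, and its value must be one of the listed values. Real EventBridge patterns support more operators (prefix, numeric ranges, and so on); the matches function below is a hypothetical illustration covering exact matches only.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified exact-match check of an event against an event pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True

pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}

print(matches(pattern, event))
```

Fields that the pattern does not mention, such as instance-id here, are ignored, which is why a short pattern can match a much larger event.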
CloudWatch Evidently
validate/serve new features to specified % of users only
Launches (= feature flags) and Experiments (= A/B testing), and Overrides (specific variants assigned to specific user-id)
evaluation events stored in CloudWatch Logs or S3
AWS X-Ray
Troubleshooting application performance and errors as “centralized service map visualization”
Request tracking across distributed systems
Focus on Latency, Errors and Fault analysis
Compatible
AWS Lambda
Elastic Beanstalk
ECS
ELB
API Gateway
EC2 Instances or any application server (even on premise)
Enable by
AWS X-Ray SDK
Install X-Ray daemon (low-level UDP packet interceptor on the OS)
Enable X-Ray AWS Integration (IAM Role with proper permission)
Instrumentation means measuring the product’s performance, diagnosing errors, and writing trace information
Segments: each application / service will send them
Subsegments: if you need more details in your segment, especially for DynamoDB.
Trace: segments collected together to form an end-to-end trace
Sampling: decrease the amount of requests sent to X-Ray, reduce cost
(default) 1st request each second (aka reservoir: 1), and then 5% of additional requests (aka rate: 0.05)
Annotations: Key Value pairs used to index traces and use with filters
Metadata: Key Value pairs, not indexed, not used for searching
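The default sampling rule (reservoir of 1, rate of 0.05) can be turned into a quick back-of-the-envelope estimate of how many requests get traced per second. This is an expected-value calculation only, not the sampler's actual algorithm, and expected_sampled is a hypothetical helper name.

```python
def expected_sampled(requests_per_second: int, reservoir: int = 1,
                     rate: float = 0.05) -> float:
    """Expected number of traced requests in one second under a sampling rule."""
    if requests_per_second < 0:
        raise ValueError("requests_per_second must not be negative")
    # The reservoir is always traced first, up to the traffic actually seen.
    guaranteed = min(requests_per_second, reservoir)
    # The fixed rate applies only to requests beyond the reservoir.
    return guaranteed + rate * (requests_per_second - guaranteed)

for load in (1, 101, 1001):
    print(load, expected_sampled(load))
```

At low traffic nearly every request is traced, while at high traffic the traced share approaches the fixed rate, which is what keeps cost bounded.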
X-Ray APIs Policy (used by X-Ray daemon)
AWSXrayWriteOnlyAccess
PutTraceSegments
PutTelemetryRecords
GetSamplingRules
GetSamplingTargets
GetSamplingStatisticSummaries
AWSXrayReadOnlyAccess
GetServiceGraph
BatchGetTraces
GetTraceSummaries
GetTraceGraph
AWS CloudTrail
Audit API calls made by users / services / AWS console, under AWS account(s)
history of events / API calls
Console
SDK
CLI
AWS services
useful to find AWS resources changes (when and how)
export logs to CloudWatch Logs or S3
A trail can be applied to All Regions (default) or a single Region.
Event Type
Management Events (default enabled): management operations on resources, can separate as Read and Write
Data Events (default disabled): resource activities
Insights Events: to detect unusual activities (by analyze “Write” events)
Events are retained for 90 days. They can be exported to S3 and analyzed further with Athena.
== CICD ==
AWS CodePipeline
automating pipeline from code to deployments, as visual workflow
Consists of stages:
Each stage can have sequential actions and/or parallel actions
Manual approval can be defined at any stage
each stage can create artifacts, stored in S3 bucket
Use AWS EventBridge for troubleshooting; use CloudTrail to audit AWS API calls
AWS CodeBuild
building and testing our code
Build instructions live in buildspec.yml, stored at the root of the code
env
variables – plaintext variables
parameter-store – variables stored in SSM Parameter Store
secrets-manager – variables stored in AWS Secrets Manager
phases
install – install dependencies you may need for your build
pre_build – final commands to execute before build
build – actual build commands
post_build – finishing touches (e.g., zip output)
artifacts – what to upload to S3 (encrypted with KMS)
cache – files to cache (usually dependencies) to S3 for future build speedup
AWS CodeDeploy
Deploy new applications versions
EC2 Instances, On-premises servers
needs the CodeDeploy Agent on the target instance, with permission to access S3
In-place Deployment (compatible with existing ASG)
AllAtOnce: most downtime
HalfAtATime: reduced capacity by 50%
OneAtATime: slowest, lowest availability impact
Custom: define your %
or Blue/Green Deployment (new ASG created)
must be using an ELB
Lambda functions (integrated into SAM)
with only Blue/Green deployment
traffic would be redirected by
Linear
Canary
AllAtOnce
ECS Services
with only Blue/Green deployment
needs Application Load Balancer (ALB) to control traffic
Linear
Canary
AllAtOnce
not Elastic Beanstalk
Automated Rollback capability
Use appspec.yml to define actions
Rollback = redeploy the last known good revision as a new deployment (with a new deployment ID)
AWS CodeArtifact
store, publish, and share software packages (aka code dependencies)
AWS CodeGuru
uses Machine Learning for automated code reviews and application performance recommendations
MaxStackDepth – the max depth of chain on method call
MemoryUsageLimitPercent
MinimumTimeForReportingInMilliseconds
ReportingIntervalInMilliseconds
SamplingIntervalInMilliseconds
Disaster Recovery (DR)
DR approaches
Backup and restore = lowest cost, just create backups
Pilot Light = small part of core services that is running and syncing data or documents
Warm Standby = scaled down version of a fully functional environment that is actively running
Multi-site = on-prem and in AWS in an active-active configuration
For disaster recovery in a different region, create an AMI from your EC2 instance and copy it into a 2nd region.
AWS Global Accelerator
increases availability and performance
can be expensive
runs over AWS global network
directs traffic to optimal endpoints across multiple regions
By default, provides you with 2 static IP addresses that are anycast from the AWS edge network. You can bring your own IPv4 address range (/24) rather than using newly created addresses.