08. Architecture

== CODE IMPLEMENTATION ==

AWS Command Line Interface (CLI) & Software Development Kit (SDK)

  • both are authenticated with access keys, generated from the AWS Management Console
  • the CLI gives direct access to the public APIs of AWS services
  • to use MFA with the CLI, create a temporary session
    • run the STS GetSessionToken API call
      aws sts get-session-token --serial-number arn-of-the-mfa-device --token-code code-from-token --duration-seconds 3600
  • CLI Credentials Provider Chain
    • Command line options – --region, --output, and --profile
    • Environment variables – AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
    • CLI credentials file – ~/.aws/credentials (written by aws configure)
    • CLI configuration file – ~/.aws/config (written by aws configure)
    • Container credentials – for ECS tasks
    • Instance profile credentials – for EC2 Instance Profiles
  • the SDK is a set of libraries for programming, exposed as language-specific APIs
  • the AWS CLI is written in Python and built on botocore, the same core library used by the Python SDK (boto3)
  • “us-east-1” would be chosen by default, if no region is specified
  • SDK Credential Provider Chain
    • Java system properties – aws.accessKeyId and aws.secretKey
    • Environment variables –
      AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    • The default credential profiles file – e.g. at ~/.aws/credentials, shared by many SDKs
    • Amazon ECS container credentials – for ECS containers
    • Instance profile credentials – used on EC2 instances
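
The provider chains above share one idea: try each source in order and stop at the first one that yields credentials. A minimal sketch of that precedence rule in plain Python follows. It is a simplified model, not the real CLI or SDK code; the function names and the pre-parsed profiles dict are assumptions made for illustration, and only two of the listed sources are modelled.

```python
from typing import Callable, Dict, Mapping, Optional, Sequence

Credentials = Dict[str, Optional[str]]


def from_environment(env: Mapping[str, str]) -> Optional[Credentials]:
    """Environment variables: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY."""
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not (access_key and secret_key):
        return None
    return {
        "access_key": access_key,
        "secret_key": secret_key,
        "session_token": env.get("AWS_SESSION_TOKEN"),  # set for temporary (MFA) sessions
        "source": "environment",
    }


def from_shared_file(profiles: Mapping[str, Mapping[str, str]],
                     profile: str = "default") -> Optional[Credentials]:
    """Shared credentials file; assumed to be already parsed into a dict here."""
    entry = profiles.get(profile, {})
    access_key = entry.get("aws_access_key_id")
    secret_key = entry.get("aws_secret_access_key")
    if not (access_key and secret_key):
        return None
    return {
        "access_key": access_key,
        "secret_key": secret_key,
        "session_token": entry.get("aws_session_token"),
        "source": "shared-credentials-file",
    }


def resolve(providers: Sequence[Callable[[], Optional[Credentials]]]) -> Optional[Credentials]:
    """Return credentials from the first provider that has any, else None."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    return None
```

Calling resolve with the environment provider listed first means environment variables win over the shared file whenever both are set, which mirrors the ordering in the lists above.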

AWS CloudFormation

  • provisions infrastructure from text-based (JSON/YAML) templates (uploaded to S3) that describe exactly which resources are provisioned and their settings
  • manages the template history similar to how code is managed in source control
  • Deleting the stack also removes every resource it created
  • Components
    • AWSTemplateFormatVersion
    • Description
    • Resources (mandatory) – aws resources
    • Parameters – dynamic inputs (AllowedValues/NoEcho), referenced with !Ref
    • Mappings – static variables
    • Outputs – values describing what has been created; exported outputs can be used in other stacks with !ImportValue
    • Conditions
    • (Reference / Functions) as Helper
  • 2 methods of updating a stack
    1. direct update – CloudFormation immediately deploys your changes
    2. change sets – preview your changes first, then decide if you want to deploy
  • Intrinsic Functions
    • Ref (!Ref): returns the value of a parameter, or the physical ID when it references an AWS resource
    • Fn::GetAtt (!GetAtt)
    • Fn::FindInMap (!FindInMap) [MapName, TopLevelKey, SecondLevelKey]
    • Fn::ImportValue (!ImportValue)
    • Fn::Base64 (!Base64): convert string to Base64; heavily used in UserData
    • Conditions (Fn::If, Fn::Not, Fn::Equals, Fn::And, Fn::Or, etc)
  • Capabilities
    • CAPABILITY_NAMED_IAM (for IAM resources with custom names), CAPABILITY_IAM
    • CAPABILITY_AUTO_EXPAND, for Macros and Nested stacks
    • InsufficientCapabilitiesException is thrown if a required capability is not acknowledged
  • DeletionPolicy
    • Delete (won't work on an S3 bucket that is not empty)
    • Retain
    • Snapshot: create a final snapshot before the resource is deleted
  • Stack policy is a JSON document that defines which resource(s) are protected from changes during stack updates; once a policy is set, all resources are protected by default, so an explicit ALLOW is needed for the resources you want to update
  • Termination Protection prevents accidental deletion of the stack
  • Custom Resources
    • custom functions running via Lambda, for example, emptying an S3 bucket
    • AWS::CloudFormation::CustomResource or Custom::MyCustomResourceTypeName
    • Properties with ServiceToken (Lambda function or SNS topic, in the same region) and optional input data
  • StackSets are used for cross-account/cross-region stack management
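
To make the template components and intrinsic functions above concrete, here is a small sketch that builds a template as a Python dict and serializes it to JSON. All logical names (EnvName, AmiId, InstanceSizeByEnv, AppServer, AppBucket) and the export name are hypothetical, chosen only for illustration; the long-form keys ("Ref", "Fn::FindInMap", "Fn::Equals") are the JSON equivalents of the !Ref style short forms.

```python
import json


def build_template() -> dict:
    """Return an illustrative CloudFormation template as a plain dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative only: one EC2 instance and one S3 bucket.",
        # Parameters: dynamic inputs supplied at stack creation time
        "Parameters": {
            "EnvName": {"Type": "String", "AllowedValues": ["dev", "prod"], "Default": "dev"},
            "AmiId": {"Type": "AWS::EC2::Image::Id"},
        },
        # Mappings: static lookup table, read with Fn::FindInMap
        "Mappings": {
            "InstanceSizeByEnv": {
                "dev": {"InstanceType": "t3.micro"},
                "prod": {"InstanceType": "t3.large"},
            }
        },
        "Conditions": {"IsProd": {"Fn::Equals": [{"Ref": "EnvName"}, "prod"]}},
        # Resources: the only mandatory section
        "Resources": {
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "AmiId"},
                    "InstanceType": {
                        "Fn::FindInMap": ["InstanceSizeByEnv", {"Ref": "EnvName"}, "InstanceType"]
                    },
                },
            },
            "AppBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain"},
        },
        # Outputs: exported values can be read by other stacks via Fn::ImportValue
        "Outputs": {
            "BucketName": {
                "Value": {"Ref": "AppBucket"},
                "Export": {"Name": "app-bucket-name"},
            }
        },
    }


def to_json(template: dict) -> str:
    """Serialize the template so it can be saved as a .json template file."""
    return json.dumps(template, indent=2)
```

The output of to_json(build_template()) is the kind of document that would be uploaded as a template; whether it deploys as-is depends on the account and region, so treat it as a structural example rather than a tested stack.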

Cloud Development Kit (CDK)

  • CloudFormation uses JSON/YAML, while CDK uses JavaScript/TypeScript, Python, Java, .NET
  • Contains higher-level components called constructs
    • encapsulate everything needed for the final CloudFormation stack creation
    • AWS Construct Library or Construct Hub
      • Layer 1 (L1): CloudFormation (CFN) resources, prefixed with “Cfn”; all resource properties must be configured explicitly
      • Layer 2 (L2): intent-based API resources, with defaults and boilerplate; they also expose helper methods
      • Layer 3 (L3): aka Patterns, each representing multiple related resources (for example, API Gateway + Lambda, or Fargate cluster + Application Load Balancer)
  • The code is synthesized (compiled) into a CloudFormation template
  • Well suited to Lambda & ECS/EKS, since infrastructure and application runtime code can be implemented together
  • SAM focuses on serverless, good for Lambda, but only JSON/YAML
  • Bootstrapping: the process of provisioning CDK's own resources in an AWS environment (Account+Region) before deploying into it
    • CDKToolkit (CloudFormation stack), with S3 Bucket – store files and IAM Roles
    • Error: “Policy contains a statement with one or more invalid principal”, raised when a new environment has not been bootstrapped, so its IAM Roles do not exist yet
  • Unit testing, using the CDK assertions module with Jest (JavaScript) or Pytest (Python)
    • Fine-grained Assertions (common): check a certain property of a certain resource
    • Snapshot Test: test the synthesized template against a stored baseline template
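
The difference between the two test styles can be shown without the CDK itself. The sketch below works on a synthesized template represented as a plain dict; the helper names are hand-rolled stand-ins written for this note, not the CDK assertions API.

```python
from typing import Any


def is_subset(expected: Any, actual: Any) -> bool:
    """True if every key/value in `expected` also appears in `actual` (recursively)."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key])
            for key, value in expected.items()
        )
    return expected == actual


def has_resource_with_properties(template: dict, resource_type: str, properties: dict) -> bool:
    """Fine-grained assertion: some resource of this type has at least these properties."""
    return any(
        resource.get("Type") == resource_type
        and is_subset(properties, resource.get("Properties", {}))
        for resource in template.get("Resources", {}).values()
    )


def matches_snapshot(template: dict, baseline: dict) -> bool:
    """Snapshot test: the whole template must equal the stored baseline."""
    return template == baseline
```

A fine-grained check keeps passing when unrelated properties change, while a snapshot check fails on any difference at all, which is why the former is the common choice for day-to-day tests.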

AWS Elastic Beanstalk

  • provision infrastructure using a text-based template that describes exactly what resources are provisioned and their settings
  • environments are provisioned as CloudFormation stacks under the hood
  • Components
    • Application
    • Application Version
    • Environment
      • Web Server Tier and Worker Tier
  • Deployment Method
    • All at once, has downtime
    • Rolling: running under capacity, no additional costs
    • Rolling with additional batches: compared to Rolling, this keeps the application running at full capacity (i.e. temporarily creates more instances)
    • Immutable: create new instances in a new ASG, then swap; zero downtime
    • (Blue Green: new environment then swap, using Route53 with weighted policies)
    • Traffic Splitting: canary testing
  • Lifecycle
    • at most 1000 Application versions; use a Lifecycle Policy to phase out old ones, based on Time (age) or Space (count)
    • has option to retain source bundles on S3
  • EB Extensions
    • YAML/JSON, with “.config” extension as file name
    • update defaults with “option_settings”
    • placed in the “.ebextensions/” folder at the root of the source code
    • resources managed by .ebextensions would be deleted if the environment goes away
  • EB Clone helps set up a new environment with exactly the same “configuration”
    • Load Balancer type and configuration
    • RDS configuration, but no data
    • Environment variables
  • EB Migration
    • Once an EB environment is created, the Elastic Load Balancer (ELB) type cannot be changed
      • Create another environment with the new ELB, then switch over with a Route53 update or CNAME swap
    • Decouple RDS from EB for PROD
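
The capacity difference between Rolling and Rolling with additional batches is simple arithmetic. The sketch below is a simplified model written for this note (the function name and the assumption that each batch is fully out of service while it updates are illustrative, not Elastic Beanstalk internals).

```python
from typing import List


def serving_capacity(total: int, batch_size: int, additional_batch: bool = False) -> List[int]:
    """Instances able to serve traffic while each batch is being updated.

    total            -- instances in the environment before the deployment
    batch_size       -- instances taken out of service per batch
    additional_batch -- True models "Rolling with additional batches":
                        batch_size extra instances are launched first
    """
    if not 0 < batch_size <= total:
        raise ValueError("batch_size must be between 1 and total")
    extra = batch_size if additional_batch else 0
    capacity = []
    remaining = total  # original instances still waiting to be updated
    while remaining > 0:
        out_of_service = min(batch_size, remaining)
        capacity.append(total + extra - out_of_service)
        remaining -= out_of_service
    return capacity
```

For four instances and a batch size of two, plain Rolling serves with two instances during each batch, while the additional batch keeps four in service throughout at the cost of temporarily running extra instances.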

== CONTAINERS ==

Amazon Elastic Container Service (ECS)

  • Container management service for Docker containers (ECS Task)
  • Highly scalable / high performance, lets you run applications on an EC2 cluster
  • Amazon Elastic Container Registry (ECR) is a private repository for Docker images (the public counterpart is the Amazon ECR Public Gallery); backed by Amazon S3, with access controlled through IAM
  • ECS Launch Types
    1. Fargate Launch Type is serverless, managed by AWS
    2. EC2 Launch Type gives you direct access to the instances, but you have to manage them, with ECS Agent
      • ECS Agent would use EC2 Instance Profile
      • ECS Tasks each use their own ECS Task Role, which is defined in the task definition
  • Mount EFS for ECS tasks, which can ensure all tasks in any AZ will share the same data; in comparison, S3 cannot be mounted as a file system
  • ECS Task definition is metadata in JSON, up to 10 containers in one file
    • Image name
    • Port Binding for Container and Host
      • on EC2 Launch Type, if only the container port is defined, the ALB uses Dynamic Host Port Mapping; the EC2 instance’s Security Group must then allow any port from the ALB’s security group
      • on Fargate Launch Type, each task has its own private IP, so only the container port is defined
    • Memory and CPU required
    • Environment variables (Hardcoded, SSM Parameter Store, Secrets Manager, or files stored in S3)
    • Networking
    • IAM Role (One IAM Role per Task Definition)
    • Logging configuration (CloudWatch)
    • Data Volume to share data among multiple containers (Applications and Metrics/Logs, aka sidecar)
      • EC2 Launch Type – using EC2 instance storage
      • Fargate Launch Type – using ephemeral storage (20-200 GB); data is deleted when the containers are destroyed
  • ECS Task Placement strategy & Task Placement constraints – Only for EC2 Launch Type
    1. find instances that meet the CPU/Memory/Port requirements
    2. find those that satisfy the task placement constraints
      • distinctInstance – place each task on different container instance
      • memberOf – using Cluster Query Language, placing on certain instances (like t2.*)
    3. find those that satisfy the task placement strategies
      • Binpack – saves cost by placing tasks on the instance with the least available CPU or Memory, which minimizes the number of instances in use
      • Random
      • Spread (can be AZ or instance ID)
  • ECS does not use EC2 Auto Scaling, instead, uses the AWS Application Auto Scaling based on
    • Average CPU Utilization
    • Average Memory Utilization – Scale on RAM
    • ALB Request Count Per Target
  • AWS Application Auto Scaling policy can be
    • Target Tracking – scale to keep a specific CloudWatch metric at a target value
    • Step Scaling – based on a specified CloudWatch Alarm
    • Scheduled Scaling
  • Under EC2 Launch Type, the EC2 instances themselves can be auto-scaled by
    • Auto Scaling Group Scaling – scale the EC2 ASG on instance load (CPU, Memory, etc.)
    • ECS Cluster Capacity Provider, paired with ASG
  • AWS Copilot is the CLI tool for running apps on App Runner, ECS and Fargate; it works with CodePipeline for deployment
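
The three placement steps described above (resource fit, constraints, strategy) can be sketched as a filter followed by a selection. This is a simplified model in plain Python, not the ECS scheduler: the Instance fields, the type-prefix constraint standing in for a memberOf expression, and the choice of memory as the binpack dimension are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    instance_id: str
    az: str
    instance_type: str
    free_cpu: int      # CPU units still available
    free_memory: int   # MiB still available
    running_tasks: int = 0


def candidates(instances: List[Instance], cpu: int, memory: int,
               type_prefix: Optional[str] = None) -> List[Instance]:
    """Steps 1 and 2: keep instances that fit the task and satisfy the constraint."""
    fit = [i for i in instances if i.free_cpu >= cpu and i.free_memory >= memory]
    if type_prefix is not None:  # stand-in for a memberOf constraint such as t2.*
        fit = [i for i in fit if i.instance_type.startswith(type_prefix)]
    return fit


def pick_binpack(cands: List[Instance]) -> Instance:
    """Step 3, binpack on memory: choose the instance with the least free memory."""
    return min(cands, key=lambda i: i.free_memory)


def pick_spread_by_az(cands: List[Instance], all_instances: List[Instance]) -> Instance:
    """Step 3, spread on AZ: choose a candidate in the AZ running the fewest tasks."""
    tasks_per_az: Dict[str, int] = {}
    for i in all_instances:
        tasks_per_az[i.az] = tasks_per_az.get(i.az, 0) + i.running_tasks
    return min(cands, key=lambda i: tasks_per_az[i.az])
```

Binpack fills already-busy instances first so that idle ones can be scaled in, whereas spread trades some cost for availability by evening tasks out across AZs.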

Amazon Elastic Kubernetes Service (EKS)

  • EC2 Launch Type to deploy worker nodes; Fargate for serverless
  • Kubernetes is cloud-agnostic
  • Node Types
    • Managed Node Groups
      • AWS handles EC2 instances with ASG managed by EKS
      • On-Demand or Spot instances
    • Self-Managed Nodes
      • You create and manage the EC2 instances yourself, with a self-defined ASG
      • On-Demand or Spot instances
    • AWS Fargate
  • Can specify StorageClass manifest on EKS cluster, leverage a Container Storage Interface (CSI) compliant driver
    • Amazon EBS (EC2)
    • Amazon EFS (EC2, Fargate)
    • Amazon FSx for Lustre (EC2)
    • Amazon FSx for NetApp ONTAP (EC2)

== MONITORING ==

AWS CloudWatch

  • Metrics: Collect and track key metrics for every AWS service
    • namespace
    • a dimension is an attribute of a metric (instance id, environment, …)
    • timestamps
    • (EC2) Memory usage is a custom metric, using API PutMetricData
    • StorageResolution can be 1 min (Standard) or 1/5/10/30 sec (High Resolution)
    • Custom metric data points are accepted with timestamps from up to 2 weeks in the past to up to 2 hours in the future
  • Logs: Collect, monitor, analyze and store log files
    • Group – application (to encrypt with KMS keys, need to use CloudWatch Logs API)
    • stream – instances / log files / containers
    • export
      • Amazon S3, may take up to 12 hours, with API CreateExportTask
      • Using a Logs Subscription to export real-time events to Kinesis Data Streams, Kinesis Data Firehose, AWS Lambda, with a Subscription Filter
        • Cross-Account Subscription (Subscription Filter -> Subscription Destination)
    • Live Tail – for realtime tail watch
    • By default, no logs from EC2 machine to CloudWatch
      • CloudWatch Logs Agent – only push logs
      • CloudWatch Unified Agent – push logs + collect metrics (extra RAM, Process, Swap) + configuration centralized in SSM Parameter Store
    • Metric Filters can trigger alarms; they are not retroactive, so they only apply to log events published after the filter is created
  • Alarms: React in real-time to metrics / events
    • based on a single metric; Composite Alarms are monitoring on multiple other alarms
    • Targets
      • EC2
      • EC2 ASG
      • Amazon SNS
  • Synthetics Canary: monitor your APIs, URLs, Websites, …
  • Events, now called Amazon EventBridge
    • Schedule – cron job
    • Event Pattern – rules to react/trigger services
    • Event Bus, a router that receives events and delivers them to zero or more destinations, or targets.
      • (AWS) default, Partner, Custom
    • Schema – the structure template for event (json)
  • CloudWatch Evidently
    • validate/serve new features to specified % of users only
    • Launches (= feature flags) and Experiments (= A/B testing), and Overrides (specific variants assigned to specific user-id)
    • evaluation events stored in CloudWatch Logs or S3
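
An EventBridge event pattern is itself a JSON object: each field lists the values it accepts, and nested objects are matched field by field. The sketch below implements only that basic exact-match behaviour in plain Python; it is a simplified illustration written for this note, not the service's matcher, and it ignores content filters such as prefix or numeric matching. The sample field values are illustrative.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if `event` satisfies every field listed in `pattern`.

    Pattern leaves are lists of accepted values; nested dicts are matched
    recursively. Fields of the event that the pattern omits are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:  # expected is a list of accepted values
            return False
    return True


# Hypothetical rule: react when an EC2 instance stops or terminates.
STOPPED_INSTANCES = {
    "source": ["aws.ec2"],
    "detail": {"state": ["stopped", "terminated"]},
}
```

An event from source aws.ec2 whose detail.state is "stopped" matches this pattern; the same event with state "running" does not, so the rule's targets would not be invoked.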

AWS X-Ray

  • Troubleshooting application performance and errors as “centralized service map visualization”
  • Request tracking across distributed systems
  • Focus on Latency, Errors and Fault analysis
  • Compatible
    • AWS Lambda
    • Elastic Beanstalk
    • ECS
    • ELB
    • API Gateway
    • EC2 Instances or any application server (even on premise)
  • Enable by
    • AWS X-Ray SDK
    • Install X-Ray daemon (low-level UDP packet interceptor on OS)
    • Enable X-Ray AWS Integration (IAM Role with proper permission)
  • Instrumentation means measuring a product’s performance, diagnosing errors, and writing trace information
    • Segments: each application / service will send them
    • Subsegments: if you need more details in your segment
    • Trace: segments collected together to form an end-to-end trace
    • Sampling: decrease the amount of requests sent to X-Ray, reduce cost
      • (default) 1st request each second (aka reservoir: 1), and then 5% of additional requests (aka rate: 0.05)
    • Annotations: Key Value pairs used to index traces and use with filters
    • Metadata: Key Value pairs, not indexed, not used for searching
  • X-Ray APIs Policy (used by X-Ray daemon)
    • AWSXrayWriteOnlyAccess
      • PutTraceSegments
      • PutTelemetryRecords
      • GetSamplingRules
      • GetSamplingTargets
      • GetSamplingStatisticSummaries
    • AWSXrayReadOnlyAccess
      • GetServiceGraph
      • BatchGetTraces
      • GetTraceSummaries
      • GetTraceGraph
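
The default sampling rule (reservoir of 1 per second, then a 5% rate) can be modelled in a few lines. This is a sketch of the idea only, not the X-Ray SDK's sampler; the class name and the injectable random source are assumptions made so the behaviour is easy to test.

```python
import random
from typing import Callable, Optional


class ReservoirSampler:
    """Sample the first `reservoir` requests each second, then a fixed `rate` of the rest."""

    def __init__(self, reservoir: int = 1, rate: float = 0.05,
                 rng: Callable[[], float] = random.random) -> None:
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng                      # returns a float in [0, 1)
        self._second: Optional[int] = None  # the one-second window being counted
        self._taken = 0                     # reservoir slots used in that window

    def should_sample(self, now_seconds: float) -> bool:
        second = int(now_seconds)
        if second != self._second:          # a new second starts: refill the reservoir
            self._second = second
            self._taken = 0
        if self._taken < self.reservoir:
            self._taken += 1
            return True
        return self.rng() < self.rate       # beyond the reservoir: probabilistic
```

With the defaults, at least one request per second is always traced, and heavier traffic adds roughly one trace per twenty further requests, which keeps cost bounded while still covering quiet periods.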

AWS CloudTrail

  • Audit API calls made by users / services / AWS console, under AWS account(s)
    • history of events / API calls
      • Console
      • SDK
      • CLI
      • AWS services
    • useful to find AWS resources changes (when and how)
  • export logs to CloudWatch Logs or S3
  • A trail can be applied to All Regions (default) or a single Region.
  • Event Type
    • Management Events (default enabled): management operations on resources, can separate as Read and Write
    • Data Events (default disabled): resource activities
    • Insights Events: to detect unusual activities (by analyzing “Write” events)
  • Events are retained for 90 days. They can be exported to S3 and analyzed further with Athena.

== CICD ==

AWS CodePipeline

  • automating pipeline from code to deployments, as visual workflow
  • Consists of stages:
    • Each stage can have sequential actions and/or parallel actions
    • Manual approval can be defined at any stage
    • each stage can create artifacts, stored in S3 bucket
  • Use Amazon EventBridge for troubleshooting, and CloudTrail to audit AWS API calls

AWS CodeBuild

  • building and testing our code
  • Build instructions live in buildspec.yml, stored at the root of the source code
  • env
    • variables – plaintext variables
    • parameter-store – variables stored in SSM Parameter Store
    • secrets-manager – variables stored in AWS Secrets Manager
  • phases
    • install – install dependencies you may need for your build
    • pre_build – final commands to execute before build
    • build – actual build commands
    • post_build – finishing touches (e.g., zip output)
  • artifacts – what to upload to S3 (encrypted with KMS)
  • cache – files to cache (usually dependencies) to S3 for future build speedup

AWS CodeDeploy

  • Deploy new applications versions
    • EC2 Instances, On-premises servers
      • needs the CodeDeploy Agent on the target instance, with permission to access S3
      • In-place Deployment (compatible with existing ASG)
        • AllAtOnce: most downtime
        • HalfAtATime: reduced capacity by 50%
        • OneAtATime: slowest, lowest availability impact
        • Custom: define your %
      • or Blue/Green Deployment (new ASG created)
        • must be using an ELB
    • Lambda functions (integrated into SAM)
      • traffic would be redirected by
        • Linear
        • Canary
        • AllAtOnce
    • ECS Services
      • with only Blue/Green deployment
      • needs Application Load Balancer (ALB) to control traffic
        • Linear
        • Canary
        • AllAtOnce
    • not Elastic Beanstalk
  • Automated Rollback capability
  • Use appspec.yml to define actions
  • Rollback = redeploy the last known good revision as a new deployment (with a new deployment ID)
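
The three traffic-shifting modes listed for Lambda and ECS differ only in how the share of traffic on the new version grows over time. The sketch below expresses each as a schedule of (minute, percent on new version) pairs. The function names and parameters are illustrative assumptions, not CodeDeploy configuration names.

```python
from typing import List, Tuple

Schedule = List[Tuple[int, int]]  # (minutes since start, % of traffic on new version)


def linear(step_percent: int, interval_minutes: int) -> Schedule:
    """Shift traffic in equal increments at a fixed interval until 100%."""
    if step_percent <= 0 or interval_minutes <= 0:
        raise ValueError("step_percent and interval_minutes must be positive")
    schedule: Schedule = []
    percent, minute = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


def canary(first_percent: int, wait_minutes: int) -> Schedule:
    """Shift a small share first, wait, then shift everything else at once."""
    return [(0, first_percent), (wait_minutes, 100)]


def all_at_once() -> Schedule:
    """Shift all traffic immediately."""
    return [(0, 100)]
```

For example, linear(10, 1) reaches full traffic after ten steps, while canary(10, 5) sends 10% to the new version, leaves time to watch alarms, and then moves the remaining 90% in one step.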

AWS CodeArtifact

  • store, publish, and share software packages (aka code dependencies)

AWS CodeGuru

  • uses Machine Learning for automated code reviews and application performance recommendations
  • CodeGuru Reviewer – identifies critical issues, security vulnerabilities
  • CodeGuru Profiler
  • CodeGuru Agent
    • MaxStackDepth – the max depth of chain on method call
    • MemoryUsageLimitPercent
    • MinimumTimeForReportingInMilliseconds
    • ReportingIntervalInMilliseconds
    • SamplingIntervalInMilliseconds


Disaster Recovery (DR)

  • DR approaches
    • Backup and restore = lowest cost, just create backups
    • Pilot Light = small part of core services that is running and syncing data or documents
    • Warm Standby = scaled down version of a fully functional environment that is actively running
    • Multi-site = on-prem and in AWS in an active-active configuration
  • For disaster recovery in a different region, create an AMI from your EC2 instance and copy it into a 2nd region.

AWS Global Accelerator

  • increases availability and performance
  • can be expensive
  • runs over AWS global network 
  • directs traffic to optimal endpoints across multiple regions
  • By default, provides you with 2 static IP addresses that are anycast from the AWS edge network. Alternatively, you can bring your own IPv4 address range (/24) instead of using new addresses.