10. Miscellanies

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

  • Managed Apache Kafka (an alternative to Amazon Kinesis Data Streams)
  • Fully managed, with data stored on EBS volumes for as long as needed (longer than 1 year)
  • Or Amazon MSK Serverless, which removes capacity planning
  • Differences from Amazon Kinesis Data Streams
    • Message size is not capped at 1 MB: 1 MB is only the default and can be raised (Data Streams limits records to 1 MB by default)
    • Topics with Partitions in Kafka = Streams with Shards in Data Streams
    • MSK can only add Partitions to a topic, but Data Streams can split and merge Shards
    • Plaintext or in-flight TLS encryption; Data Streams supports only in-flight TLS
    • Kinesis provides “at-least-once” message delivery -> duplicate records might occur
    • MSK (Kafka) can provide “exactly-once” semantics with idempotent producers and transactions; with weaker settings (e.g. acks=0) messages can be lost
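
The shard point above can be illustrated with a small sketch. Kinesis Data Streams maps a record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash key range contains it. The helper names and the even division of the key space are assumptions for illustration only, not an AWS API.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # Kinesis hash keys are 128-bit integers


def even_shard_ranges(shard_count):
    """Divide the hash key space into contiguous, roughly equal ranges."""
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the space is fully covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the range that contains MD5(partition_key)."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("ranges do not cover the hash key space")


def split_range(ranges, index):
    """Model a shard split: replace one range with its two halves."""
    start, end = ranges[index]
    mid = (start + end) // 2
    return ranges[:index] + [(start, mid), (mid + 1, end)] + ranges[index + 1:]
```

Splitting a range this way is what lets Data Streams add capacity to one hot shard without touching the others, whereas adding a Kafka partition changes the key-to-partition mapping for the whole topic.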

| | Kinesis Data Streams | Firehose | Managed Service for Apache Flink |
|---|---|---|---|
| Usage | Capture stream log and event data, run real-time analytics, and build event-driven applications | Load data streams into AWS data stores | Analyze data streams with Managed Service for Apache Flink Studio and Apache Flink |
| Data sources | Mobile apps, application logs, web clickstream/social, IoT sensors, connected products, smart buildings | Connected devices such as consumer appliances, embedded sensors, TV set-top boxes, clickstream data, application logs | Analyze streaming data from Kinesis Data Streams, Amazon MSK, Amazon MQ, custom connectors |
| Stream ingestion | AWS SDKs, Kinesis Producer Library, AWS Mobile SDKs, Kinesis Agent, AWS IoT, Amazon CloudWatch Events, Amazon DynamoDB, AWS DMS | AWS SDKs, Kinesis Producer Library, Kinesis Data Streams, Kinesis Agent, AWS IoT, Amazon CloudWatch Events | Analyze streaming data from Kinesis Data Streams, Amazon MSK, Amazon MQ, custom connectors |

Amazon OpenSearch Service (previously “Amazon Elasticsearch Service”)

  • Can search on “any” field, even with partial matches.
  • Two cluster modes, “managed” and “serverless”.
  • Does not natively support SQL (it can be enabled via a plugin).
  • OpenSearch Dashboards for visualisation.

Amazon SES

  • Simple Email Service, for sending marketing and transactional e-mails (similar to Marketo or Constant Contact)

Scalability

  • the ability of an application / system to handle greater loads by adapting.
    • Vertical: increase the instance size
    • Horizontal, also called “elasticity”: increase the number of instances

High Availability

  • running your application / system in at least 2 data centers (== Availability Zones)

Server Name Indication (SNI)

  • allows multiple SSL certificates to be loaded onto one web server (to serve multiple websites); on AWS it works only with ALB, NLB, and CloudFront (not CLB)

API Rate Limits

  • DescribeInstances API for EC2 has a limit of 100 calls per second
  • GetObject on S3 has a limit of 5500 GET per second per prefix
  • For Intermittent Errors: implement Exponential Backoff
  • For Consistent Errors: request an API throttling limit increase

Service Quotas (Service Limits)

  • Running On-Demand Standard Instances: 1152 vCPU
  • You can request a service limit increase by opening a ticket
  • You can request a service quota increase by using the Service Quotas API

Exponential Backoff

  • Use it when you get a ThrottlingException intermittently
  • Retry only on 5xx server errors and throttling, not on 4xx client errors
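
A minimal sketch of the retry pattern described above, assuming a hypothetical `ThrottlingError` exception that stands in for throttling and 5xx responses. The function name, default delays, and the choice of "full jitter" (sleeping a random time up to the exponential cap) are illustrative assumptions. AWS SDKs ship their own retry logic, so this is for understanding rather than a replacement.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling or 5xx server error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying on ThrottlingError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # The cap doubles on every attempt: base, 2*base, 4*base, ...
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: pick a random wait in [0, cap] to spread out clients.
            sleep(random.uniform(0, cap))
```

Any other exception (for example one representing a 4xx client error) propagates immediately, which matches the rule that client errors should not be retried.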

Signing AWS API requests

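  • uses Signature Version 4 (SigV4) to sign requests with your credentials: the access key ID is sent with the request, while the secret key is only used to compute the signature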
    • HTTP Header option (signature in Authorization header)
    • Query String option, ex: S3 pre-signed URLs (signature in X-Amz-Signature)
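
As a rough sketch of what SigV4 does with the secret key: the key is never sent, but is used to derive a signing key through a chain of HMAC-SHA256 operations over the date, region, and service. The function names below are illustrative, and building the canonical request and string to sign is deliberately left out. In practice the AWS SDKs and CLI do all of this for you.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """HMAC-SHA256 of a text message under a bytes key, returned as bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key; date_stamp is YYYYMMDD, e.g. '20240101'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key, string_to_sign):
    """Return the hex signature placed in the header or query string."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Because the signing key is scoped to one date, region, and service, a leaked signature cannot be reused elsewhere.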

AWS Fault Injection Simulator

  • a managed service that is commonly used in chaos engineering, and not for application development. It enables you to perform fault injection experiments on your AWS workloads to improve the performance and resiliency of your applications

AWS Batch

  • used to efficiently run hundreds of thousands of batch computing jobs in AWS; mostly used for ML model training, simulation, and analysis at any scale

Amazon Simple Workflow Service (SWF)

  • is for executing tasks. Helps developers build, run, and scale background jobs; but it does not provide serverless orchestration to multiple AWS resources

Amazon Pinpoint

  • allows you to engage with your customers across multiple messaging channels.
  • primarily used to send push notifications, emails, SMS text messages, and voice messages.

AWS CloudShell

  • a browser-based, pre-authenticated shell in the AWS Management Console, used for managing AWS resources from a terminal

AWS IoT Greengrass

  • enable connected devices to run AWS Lambda functions, execute predictions based on machine learning models, keep device data in sync, and communicate with other devices securely even without an Internet connection.

Amazon DynamoDB point-in-time recovery (PITR)

  • provides automatic continuous backups of your DynamoDB table data. Point-in-time recovery (PITR) backups are fully managed by DynamoDB and provide up to 35 days of recovery points at a per second granularity.

Disaster Recovery (DR)

  • about preparing for and recovering from a disaster
  • RPO: Recovery Point Objective, about Data Loss
  • RTO: Recovery Time Objective, about Downtime
  • DR approaches
    • Backup and restore
      • lowest cost, just create backups
    • Pilot Light
      • small part of core services that is running and syncing data or documents
      • Useful for the critical core (pilot light)
      • Faster than Backup and Restore as critical systems are already up
    • Warm Standby
      • scaled down version of a fully functional environment that is actively running
      • Upon disaster, we can scale to production load
    • Hot Site / Multi Site
      • on-prem and in AWS in an active-active configuration
      • Very low RTO (minutes or seconds) – very expensive
      • Full Production Scale is running AWS and On Premise
  • For disaster recovery in a different region, create an AMI from your EC2 instance and copy it into a 2nd region.
  • Tips
    • Backup
      • EBS Snapshots, RDS automated backups / Snapshots, etc…
      • Regular pushes to S3 / S3 IA / Glacier, Lifecycle Policy, Cross Region Replication
      • From On-Premise: Snowball or Storage Gateway
    • High Availability
      • Use Route53 to migrate DNS over from Region to Region
      • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
      • Site to Site VPN as a recovery from Direct Connect
    • Replication
      • RDS Replication (Cross Region), AWS Aurora + Global Databases
      • Database replication from on-premise to RDS
      • Storage Gateway
    • Automation
      • CloudFormation / Elastic Beanstalk to re-create a whole new environment
      • Recover / Reboot EC2 instances with CloudWatch alarm actions when an alarm fires
      • AWS Lambda functions for customized automations
    • Chaos
      • Netflix has a “Simian Army” that randomly terminates EC2 instances
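
RPO and RTO from the list above can be made concrete with simple timestamp arithmetic: the data loss is the gap between the last usable backup and the disaster, and the downtime is the gap between the disaster and service restoration. The function names and the example timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo_rto(last_backup, disaster, service_restored):
    """Return (data loss window, downtime) for one incident."""
    data_loss = disaster - last_backup        # compare against the RPO
    downtime = service_restored - disaster    # compare against the RTO
    return data_loss, downtime


def meets_objectives(data_loss, downtime, rpo, rto):
    """True only if both the data loss and the downtime are within target."""
    return data_loss <= rpo and downtime <= rto


# Hypothetical incident: backup at 00:00, disaster at 03:30, restored at 04:15.
loss, down = achieved_rpo_rto(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 3, 30),
    datetime(2024, 1, 1, 4, 15),
)
```

Here the incident loses 3.5 hours of data and causes 45 minutes of downtime, so it would meet a 4-hour RPO and a 1-hour RTO but not a 1-hour RPO.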

AWS Global Accelerator

  • improves the availability and performance of your network traffic by utilizing the AWS global infrastructure instead of the public Internet.
  • can be expensive
  • runs over AWS global network 
  • directs traffic to optimal endpoints across multiple regions
  • By default, provides you with 2 static IP addresses that are anycast from the AWS edge network. You can bring your own existing IPv4 range (/24) instead of using newly allocated addresses.

AWS Cost Management

  • helps you gain visibility into your cloud spending, identify cost-saving opportunities, and make informed decisions about resource allocation and optimization
  • Steps of Cost Allocation
    • STEP ONE: create user-defined tags with key-value pairs that reflect attributes such as project names or departments to ensure proper categorization of resources
    • STEP TWO: apply these tags to the relevant resources to enable tracking
    • STEP THREE: enable the cost allocation tags in the Billing console
    • STEP FOUR (afterwards): configure tag-based cost and usage reports (AWS Cost Allocation Reports) for detailed analysis in Cost Explorer
  • Steps of Budgeting
    • Set up automated alerts based on budget thresholds for expenses
  • AWS Budgets is typically used for setting cost and usage limits and receiving alerts; more about monitoring and controlling costs rather than detailed tracking and reporting
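
The idea behind tag-based cost allocation and budget thresholds can be sketched offline. The line-item structure (`cost` and `tags` keys), the function names, and the threshold values below are assumptions for illustration; they do not mirror the Cost Explorer or AWS Budgets APIs.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum costs per value of one cost allocation tag."""
    totals = defaultdict(float)
    for item in line_items:
        # Resources that were never tagged end up in an "untagged" bucket.
        tag_value = item.get("tags", {}).get(tag_key, "untagged")
        totals[tag_value] += item["cost"]
    return dict(totals)


def exceeded_thresholds(actual, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that the actual spend has reached."""
    return [t for t in thresholds if actual >= budget * t]


# Hypothetical line items for one billing period.
items = [
    {"cost": 10.0, "tags": {"project": "alpha"}},
    {"cost": 5.0, "tags": {"project": "alpha"}},
    {"cost": 70.0, "tags": {"project": "beta"}},
    {"cost": 2.0},
]
totals = cost_by_tag(items, "project")
```

The "untagged" bucket shows why steps one and two matter: spend on resources without the tag cannot be attributed to any project.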

Amazon Managed Service for Prometheus (AMP)

  • a fully managed, Prometheus-compatible monitoring and alerting service that allows users to monitor containerized applications at scale on AWS. It simplifies the process of collecting, storing, and querying Prometheus metrics by handling the underlying infrastructure and scaling needs. AMP is compatible with the open-source Prometheus project and supports the Prometheus Query Language (PromQL). 
  • works with container clusters that run on Amazon Elastic Kubernetes Service and self-managed Kubernetes environments.
  • AMP offers agentless observability, meaning users don’t need to manage any agents within their clusters
  • AMP utilizes Amazon S3 for reliable and scalable long-term storage of metrics. 

| Serverless Component | Max Timeout | Comments |
|---|---|---|
| API Gateway | 50 milliseconds – 29 seconds | Configurable |
| Lambda Function | 900 seconds (15 minutes) | Also limited to 1,000 concurrent executions. If not handled, can lead to throttling issues. |
| DynamoDB Streams | 40,000 write capacity units per table | |
| S3 | No timeout by default, can be configured to 5-10 seconds | Unlimited objects per bucket |

Regions

  • a physical location around the world where we cluster data centers. We call each group of logical data centers an Availability Zone.
  • Each AWS Region consists of a minimum of three, isolated, and physically separate AZs within a geographic area.
  • Each AZ has independent power, cooling, and physical security and is connected via redundant, ultra-low-latency networks. AWS customers focused on high availability can design their applications to run in multiple AZs to achieve even greater fault-tolerance.

Availability Zones

  • An Availability Zone (AZ) is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region.
  • AZs give customers the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center.
  • All AZs in an AWS Region are interconnected with high-bandwidth, low-latency networking, over fully redundant, dedicated metro fiber providing high-throughput, low-latency networking between AZs.
  • All traffic between AZs is encrypted.
  • If an application is partitioned across AZs, companies are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more.
  • AZs are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other.


  • AWS App Mesh = for application networking for microservices applications
  • AWS Resource Access Manager = share resources (e.g. Transit Gateways, VPC subnets) with other AWS accounts
  • AWS Server Migration Service (SMS) = for migrating virtual machines
  • AWS CodeStar = quickly develop, build and deploy applications on AWS
  • AWS Import/Export = send HDDs with data to AWS and they import the data into S3
  • Amazon Neptune: managed graph database
  • Amazon AppStream: application streaming service
  • Amazon Elastic Transcoder: convert video and audio files into versions that play on phones, tablets and PCs
  • CloudSearch: search engine for your site
  • Amazon Lightsail: easy alternative to setting up a VPC. Product set includes virtual servers (instances), MySQL DBs, HA storage and load balancers
  • AWS IoT Core: connected devices interact securely with cloud applications