== CODE IMPLEMENTATION ==
AWS Command Line Interface (CLI) & Software Development Kit (SDK)
- both protected by access keys, generated from AWS Management Console
- the CLI gives direct access to the public APIs of AWS services
- to use MFA with the CLI, you need a temporary session
- run the STS GetSessionToken API call
aws sts get-session-token --serial-number arn-of-the-mfa-device --token-code code-from-token --duration-seconds 3600
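The command above prints temporary credentials as JSON. These are usually exported as the environment variables the CLI reads. Below is a minimal stdlib sketch of that mapping. The sample JSON values are hypothetical placeholders. Only the `Credentials` field names come from the STS response.

```python
import json

# Hypothetical sample of `aws sts get-session-token ... --output json` output.
SAMPLE_OUTPUT = """
{
  "Credentials": {
    "AccessKeyId": "ASIAEXAMPLEKEYID",
    "SecretAccessKey": "exampleSecretAccessKey",
    "SessionToken": "exampleSessionToken",
    "Expiration": "2030-01-01T00:00:00+00:00"
  }
}
"""


def session_env(sts_output: str) -> dict:
    """Map the temporary credentials to the env vars the CLI/SDK read."""
    creds = json.loads(sts_output)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        # Temporary key pairs are rejected without their session token.
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


def export_lines(env: dict) -> str:
    """Render `export NAME=value` lines for a POSIX shell."""
    return "\n".join(f"export {name}={value}" for name, value in env.items())


if __name__ == "__main__":
    print(export_lines(session_env(SAMPLE_OUTPUT)))
```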
- CLI Credentials Provider Chain
- Command line options: --region, --output, and --profile
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
- CLI credentials file (written by aws configure): ~/.aws/credentials
- CLI configuration file (written by aws configure): ~/.aws/config
- Container credentials: for ECS tasks
- Instance profile credentials: for EC2 Instance Profiles
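The chain is first-match-wins: each source is tried in order, and the first one that yields credentials is used. This is a minimal model of that idea under simplifying assumptions. Only the environment-variable step is implemented, and the other steps are hypothetical stubs. The real CLI also consults further sources (for example SSO and `credential_process` profiles) that are not modeled here.

```python
from typing import Callable, Dict, List, Mapping, Optional, Tuple

Credentials = Dict[str, Optional[str]]
Provider = Callable[[], Optional[Credentials]]


def from_environment(environ: Mapping[str, str]) -> Provider:
    """Environment-variable step of the chain."""
    def provide() -> Optional[Credentials]:
        key = environ.get("AWS_ACCESS_KEY_ID")
        secret = environ.get("AWS_SECRET_ACCESS_KEY")
        if key and secret:
            return {"key": key, "secret": secret,
                    "token": environ.get("AWS_SESSION_TOKEN")}
        return None
    return provide


def static(creds: Optional[Credentials]) -> Provider:
    """Hypothetical stand-in for steps not modeled (files, container, instance profile)."""
    return lambda: creds


def resolve(chain: List[Tuple[str, Provider]]) -> Tuple[str, Credentials]:
    """Walk the chain in order; the first provider returning credentials wins."""
    for name, provider in chain:
        creds = provider()
        if creds:
            return name, creds
    raise LookupError("Unable to locate credentials")


if __name__ == "__main__":
    chain = [
        ("environment", from_environment({})),  # nothing exported
        ("credentials file", static({"key": "AKIAFILE", "secret": "s", "token": None})),
        ("instance profile", static({"key": "ASIAROLE", "secret": "s", "token": "t"})),
    ]
    print(resolve(chain)[0])  # credentials file
```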
- the SDK is a set of language-specific libraries (APIs) for programmatic access to AWS
- the AWS CLI is built on botocore, the same low-level Python library the Python SDK (boto3) is built on
- “us-east-1” is chosen by default if no region is specified
- SDK Credential Provider Chain
- Java system properties: aws.accessKeyId and aws.secretKey
- Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- The default credential profiles file: e.g. at ~/.aws/credentials, shared by many SDKs
- Amazon ECS container credentials: for ECS containers
- Instance profile credentials: used on EC2 instances
- pagination parameters in AWS CLI commands can help solve time-out issues when processing a large number of resource items
- --page-size
- The CLI still retrieves the entire list, but it makes a greater number of service API calls in the background and retrieves fewer items with each request. This increases the probability that individual calls will succeed without timing out.
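The trade-off can be modeled in a few lines: a smaller page size means more, smaller calls, and the same complete result. `fetch_page` is a hypothetical stand-in for one service API call. This does not reproduce how the CLI actually paginates internally.

```python
from typing import List, Sequence, Tuple


def fetch_page(items: Sequence[int], start: int, page_size: int) -> List[int]:
    """Hypothetical stand-in for ONE service API call returning up to page_size items."""
    return list(items[start:start + page_size])


def list_everything(items: Sequence[int], page_size: int) -> Tuple[List[int], int]:
    """Keep calling until every item is retrieved; return (items, number of calls)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    collected: List[int] = []
    calls = 0
    while len(collected) < len(items):
        collected.extend(fetch_page(items, len(collected), page_size))
        calls += 1
    return collected, calls


if __name__ == "__main__":
    resources = list(range(2500))
    for size in (1000, 100):
        result, calls = list_everything(resources, size)
        print(f"page size {size}: {len(result)} items in {calls} calls")
```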
AWS CloudFormation
- provision infrastructure using text-based (JSON/YAML) templates (uploaded to S3) that describe exactly which resources are provisioned and their settings.
- manages the template history similar to how code is managed in source control
- Benefits
- Infrastructure as code (IaC)
- Cost
- Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
- Productivity
- Ability to destroy and re-create an infrastructure on the cloud on the fly
- Automated generation of diagrams for your templates!
- Declarative programming (no need to figure out ordering and orchestration)
- Separation of concern: create many stacks for many apps, and many layers.
- Deleting the stack also removes each resource it created
- Components
- AWSTemplateFormatVersion
- Description
- Resources (mandatory) – aws resources
- Parameters – dynamic inputs (AllowedValues/NoEcho) !Ref
- Pseudo Parameters
- AWS::AccountId
- AWS::Region
- AWS::StackId
- AWS::StackName
- AWS::NotificationARNs
- AWS::NoValue (doesn't return a value)
- Mappings – static variables !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
- Outputs – references to what has been created; consumed in other stacks with !ImportValue
- declares optional output values that can be imported into other stacks (if you export them first)!
- Conditions
- References and Intrinsic Functions as helpers
- Transform – used for serverless applications, especially AWS SAM (Transform: AWS::Serverless-2016-10-31)
- 2 methods of updating a stack
- direct update – CloudFormation immediately deploys your changes
- change sets – preview your changes first, then decide if you want to deploy
- Intrinsic Functions
- Ref (!Ref)
- Parameters: returns the value of the parameter
- Resources: returns the physical ID of the underlying resource (e.g., EC2 ID)
- Fn::GetAtt (!GetAtt)
- returns the value of an attribute of a resource you create (e.g., an EC2 instance's AvailabilityZone)
- Fn::FindInMap (!FindInMap) [MapName, TopLevelKey, SecondLevelKey]
- Fn::ImportValue (!ImportValue)
- Import values that are exported in other stacks
- Fn::Base64 (!Base64)
- convert string to Base64; heavily used in EC2 instance’s UserData property
- user data script log is in /var/log/cloud-init-output.log
- Conditions (Fn::If, Fn::Not, Fn::Equals, Fn::And, Fn::Or, etc.)
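The template below, written as a Python dict, combines the intrinsic functions above into one small example; CloudFormation accepts the same structure in JSON. The resource names, AMI ID, and instance types are hypothetical placeholders. `Fn::Sub` is used for the export name even though it is not listed in these notes. `find_in_map` is a local helper that only imitates the `Fn::FindInMap` lookup. The `base64` call shows what `Fn::Base64` produces from the UserData script.

```python
import base64
import json

USER_DATA = "#!/bin/bash\nyum install -y httpd\nsystemctl start httpd\n"

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Hypothetical single-instance stack",
    "Parameters": {
        "EnvType": {"Type": "String", "AllowedValues": ["dev", "prod"], "Default": "dev"},
    },
    "Mappings": {
        "SizeByEnv": {"dev": {"InstanceType": "t3.micro"}, "prod": {"InstanceType": "t3.large"}},
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-0123456789abcdef0",  # placeholder AMI ID
                "InstanceType": {"Fn::FindInMap": ["SizeByEnv", {"Ref": "EnvType"}, "InstanceType"]},
                "UserData": {"Fn::Base64": USER_DATA},
            },
        },
    },
    "Outputs": {
        "WebServerId": {
            "Value": {"Ref": "WebServer"},  # Ref on a resource -> physical ID (instance ID)
            "Export": {"Name": {"Fn::Sub": "${AWS::StackName}-WebServerId"}},
        },
        "WebServerAz": {"Value": {"Fn::GetAtt": ["WebServer", "AvailabilityZone"]}},
    },
}


def find_in_map(mappings: dict, map_name: str, top_key: str, second_key: str) -> str:
    """Local imitation of Fn::FindInMap: a two-level dictionary lookup."""
    return mappings[map_name][top_key][second_key]


if __name__ == "__main__":
    print(json.dumps(template, indent=2))
    # What Fn::Base64 hands to EC2 as UserData:
    print(base64.b64encode(USER_DATA.encode()).decode())
    print(find_in_map(template["Mappings"], "SizeByEnv", "prod", "InstanceType"))  # t3.large
```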
- Rollbacks
- Stack Creation Fails:
- Default: everything rolls back (gets deleted). We can look at the log
- Option to disable rollback and troubleshoot what happened
- Stack Update Fails:
- The stack automatically rolls back to the previous known working state
- Ability to see in the log what happened and error messages
- Rollback failure? Fix resources manually, then issue the ContinueUpdateRollback API from the Console
- or from the CLI with the continue-update-rollback command
- Service Role
- IAM role that allows CloudFormation to create/update/delete stack resources on your behalf
- User must have iam:PassRole permissions
- Capabilities
- CAPABILITY_NAMED_IAM & CAPABILITY_IAM
- Necessary to enable when your CloudFormation template is creating or updating IAM resources (IAM User, Role, Group, Policy, Access Keys, Instance Profile…); CAPABILITY_NAMED_IAM is needed when those resources have custom names
- CAPABILITY_AUTO_EXPAND, for Macros and Nested Stacks
- Necessary when your CloudFormation template includes Macros or Nested Stacks (stacks within stacks) to perform dynamic transformations
- InsufficientCapabilitiesException
- Exception thrown by CloudFormation if the capabilities haven't been acknowledged when deploying a template (security measure)
- DeletionPolicy
- Default: Delete (won't work for an S3 bucket that is not empty)
- Retain
- Snapshot: create a final snapshot before deletion
- works with storage-related resources (EBS volumes, ElastiCache, RDS, Redshift, Neptune, etc.)
- A stack policy is a JSON document that defines which resources can be updated during a stack update; once a stack policy is set, all resources are protected by default, so an explicit Allow is needed for the resources you want to update
- Termination Protection prevents accidental deletion of the stack
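Here is a stack policy, plus a deliberately simplified evaluator for the rule above: once a policy exists, everything is denied by default, and an explicit Deny beats any Allow. The logical ID `ProductionDatabase` is hypothetical. The evaluator ignores the finer-grained actions (`Update:Modify`, `Update:Replace`, `Update:Delete`) and any conditions that real stack policies support.

```python
import fnmatch

# Allow updates to everything except the (hypothetical) production database.
STACK_POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {"Effect": "Deny", "Action": "Update:*", "Principal": "*",
         "Resource": "LogicalResourceId/ProductionDatabase"},
    ]
}


def update_allowed(policy: dict, logical_id: str) -> bool:
    """Simplified evaluation: default deny, explicit Deny overrides any Allow."""
    target = f"LogicalResourceId/{logical_id}"
    allowed = False
    for statement in policy["Statement"]:
        resources = statement["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        if any(fnmatch.fnmatchcase(target, pattern) for pattern in resources):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


if __name__ == "__main__":
    for resource in ("WebServer", "ProductionDatabase"):
        print(resource, update_allowed(STACK_POLICY, resource))
```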
- Custom Resources
- custom functions running via Lambda, for example, to empty an S3 bucket
- AWS::CloudFormation::CustomResource or Custom::MyCustomResourceTypeName
- Properties: ServiceToken (a Lambda function or SNS topic, in the same region) and optional input data
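A Lambda-backed custom resource must report its result with an HTTP PUT of a JSON body to the pre-signed `ResponseURL` in the request event. If it never responds, the stack operation waits until it times out. The sketch below builds that body with the stdlib. The handler logic and names are hypothetical, and the empty `Content-Type` header mirrors what the AWS `cfn-response` helper sends.

```python
import json
import urllib.request


def build_response(event: dict, status: str, physical_id: str,
                   reason: str = "", data=None) -> dict:
    """Body CloudFormation expects at the pre-signed event['ResponseURL']."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError("status must be SUCCESS or FAILED")
    return {
        "Status": status,
        "Reason": reason or "See the function's CloudWatch Logs",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event: dict, body: dict) -> int:
    """PUT the JSON body to the pre-signed S3 URL."""
    payload = json.dumps(body).encode()
    request = urllib.request.Request(
        event["ResponseURL"], data=payload, method="PUT",
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handler(event, context):
    """Hypothetical handler: e.g. empty a bucket on Delete, then report back."""
    physical_id = event.get("PhysicalResourceId", f"{event['LogicalResourceId']}-resource")
    try:
        if event["RequestType"] == "Delete":
            pass  # placeholder: empty the bucket named in event["ResourceProperties"]
        body = build_response(event, "SUCCESS", physical_id)
    except Exception as exc:  # failures must still be reported, or the stack waits
        body = build_response(event, "FAILED", physical_id, reason=str(exc))
    send_response(event, body)
```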
- Dynamic References
- Reference external values stored in Systems Manager Parameter Store and Secrets Manager within CloudFormation templates
- CloudFormation retrieves the value of the specified reference during create/update/delete operations
- Using “{{resolve:service-name:reference-key}}”
- ssm – for plaintext values stored in SSM Parameter Store
- ssm-secure – for secure strings stored in SSM Parameter Store
- secretsmanager – for secret values stored in Secrets Manager
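These small helpers build the dynamic-reference strings in the formats listed above. The parameter and secret names are hypothetical placeholders. For Secrets Manager, only the `secret-id` and optional `SecretString:json-key` parts are covered; the version-stage and version-id suffixes are left out.

```python
from typing import Optional


def ssm_ref(name: str, version: Optional[int] = None, secure: bool = False) -> str:
    """{{resolve:ssm:name[:version]}} or {{resolve:ssm-secure:name[:version]}}."""
    service = "ssm-secure" if secure else "ssm"
    suffix = f":{version}" if version is not None else ""
    return "{{resolve:" + service + ":" + name + suffix + "}}"


def secret_ref(secret_id: str, json_key: Optional[str] = None) -> str:
    """{{resolve:secretsmanager:secret-id[:SecretString:json-key]}}."""
    parts = [secret_id] + (["SecretString", json_key] if json_key else [])
    return "{{resolve:secretsmanager:" + ":".join(parts) + "}}"


# Hypothetical RDS properties using the references (names are placeholders).
db_properties = {
    "MasterUsername": secret_ref("MyRdsSecret", "username"),
    "MasterUserPassword": secret_ref("MyRdsSecret", "password"),
    "DBInstanceClass": ssm_ref("/demo/db/instance-class"),
}

if __name__ == "__main__":
    for key, value in db_properties.items():
        print(f"{key}: {value}")
```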
- (Python) Helper scripts
- cfn-init – Use to retrieve and interpret resource metadata, install packages, create files, and start services.
- AWS::CloudFormation::Init
- A config contains the following and is executed in that order
- Packages: used to download and install pre-packaged apps and components on Linux/Windows (ex. MySQL, PHP, etc.)
- Groups: define user groups
- Users: define users, and which group they belong to
- Sources: download files and archives and place them on the EC2 instance
- Files: create files on the EC2 instance, using inline or can be pulled from a URL
- Commands: run a series of commands
- Services: launch a list of sysvinit services
- cfn-signal – Use to signal with a WaitCondition, so you can synchronize other resources in the stack when the prerequisite resource or application is ready
- WaitCondition
- Block the template until it receives a signal from cfn-signal
- We attach a CreationPolicy (also works on EC2, ASG)
- We can define a Count > 1 (in case you need more than 1 signal)
- Troubleshooting: Verify that the instance has a connection to the Internet.
- cfn-get-metadata – Use to retrieve metadata for a resource or path to a specific key.
- cfn-hup – Use to check for updates to metadata every 15 minutes (default) and execute custom hooks when changes are detected.
- Troubleshooting
- DELETE_FAILED
- Some resources must be emptied before deleting, such as S3 buckets
- Use Custom Resources with Lambda functions to automate some actions
- Security Groups cannot be deleted until all EC2 instances in the group are gone
- Think about using DeletionPolicy=Retain to skip deletions
- UPDATE_ROLLBACK_FAILED
- Can be caused by resources changed outside of CloudFormation, insufficient permissions, an Auto Scaling Group that doesn't receive enough signals…
- Manually fix the error and then ContinueUpdateRollback
- Nested Stacks
- Nested stacks are stacks as part of other stacks
- Nested stacks are considered best practice
- To update a nested stack, always update the parent (root stack)
- Nested stacks can have nested stacks themselves!
- DependsOn
- Specify that the creation of a specific resource follows another
- When added to a resource, that resource is created only after the creation of the resource specified in the DependsOn attribute
- Applied automatically when using !Ref and !GetAtt
- StackSets are used for cross-account/cross-region stack management with a single CloudFormation template.
- A stack instance is simply a reference to a stack in a target account within a region.
- When you update a stackset, all associated stack instances are updated throughout all accounts and regions
- Permission Models
- Self-managed Permissions
- Create the IAM roles (with an established trust relationship) in both administrator and target accounts
- Deploy to any target account in which you have permissions to create IAM role
- Service-managed Permissions
- Deploy to accounts managed by AWS Organizations
- StackSets create the IAM roles on your behalf (enable trusted access with AWS Organizations)
- Must enable all features in AWS Organizations
- Ability to deploy to accounts added to your organization in the future (Automatic Deployments)
- Troubleshooting
- A stack operation failed, and the stack instance status is OUTDATED.
- Insufficient permissions in a target account for creating resources that are specified in your template.
- The template could be trying to create global resources that must be unique but aren’t, such as S3 buckets
- The administrator account does not have a trust relationship with the target account
- Reached a limit or a quota in the target account (too many resources)
- Drift
- Performs drift detection on the stack associated with each stack instance in the StackSet
- If the current state of a resource in a stack varies from the expected state:
- The stack is considered drifted
- And the stack instance that the stack is associated with is considered drifted
- And the StackSet is considered drifted
- Drift detection identifies unmanaged changes (outside CloudFormation)
- Changes made through CloudFormation to a stack directly (not at the StackSet level) aren't considered drifted
- You can stop drift detection on a StackSet
- ChangeSets
- When you update a stack, you need to know what changes will happen before applying them, for greater confidence
- ChangeSets won't say if the update will be successful
- For Nested Stacks, you see the changes across all stacks
- AWS Service Catalog
- a quick self-service portal to launch a set of authorized products pre-defined by admins
- Create and manage catalogs of IT services that are approved on AWS
- The “products” are CloudFormation templates
- Ex: Virtual machine images, Servers, Software, Databases, Regions, IP address ranges
- CloudFormation helps ensure consistency, and standardization by Admins
- They are assigned to Portfolios (teams)
- Teams are presented a self-service portal where they can launch the products
- All the deployed products are centrally managed deployed services
- Helps with governance, compliance, and consistency
- Can give user access to launching products without requiring deep AWS knowledge
- Integrations with āself-service portalsā such as ServiceNow
- StackSet Constraints
- Accounts
- Regions
- Permissions
- Launch Constraints
- IAM Role assigned to a Product which allows a user to launch, update, or terminate a product with minimal IAM permissions
- Example: the end user has access only to Service Catalog; all other permissions required are attached to the Launch Constraint IAM Role
- IAM Role must have the following permissions:
- CloudFormation (Full Access)
- AWS Services in the CloudFormation template
- S3 Bucket which contains the CloudFormation template (Read Access)
- Local artifacts declared in CodeUri property
- The aws cloudformation package command packages the local artifacts (local paths) that your AWS CloudFormation template references.
- After you package your template's artifacts, run the aws cloudformation deploy command to deploy the returned template.

AWS Serverless Application Model (SAM)
- configured via YAML, compiled to a CloudFormation stack
- uses CodeDeploy to deploy Lambda functions
- Traffic Shifting (from OLD ver to New ver)
- Linear: grow traffic every N minutes until 100%
- Canary: try X percent then 100%
- AllAtOnce: immediate
- Pre and Post traffic hooks to validate the deployment (before the traffic shift starts and after it ends)
- rollback triggered by CloudWatch Alarms
- AppSpec.yml
- Name
- Alias
- CurrentVersion
- TargetVersion
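The shifting strategies correspond to SAM `DeploymentPreference` types such as `Canary10Percent5Minutes` and `Linear10PercentEvery1Minute`. This small model computes how much traffic the new version gets over time. It ignores hooks and alarm-triggered rollback. The parsing helper is illustrative and not part of any AWS library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftConfig:
    kind: str          # "Canary", "Linear" or "AllAtOnce"
    percent: int = 100
    interval: int = 0  # minutes


def parse(name: str) -> ShiftConfig:
    """Parse names such as Canary10Percent5Minutes or Linear10PercentEvery1Minute."""
    if name == "AllAtOnce":
        return ShiftConfig("AllAtOnce")
    m = re.fullmatch(r"Canary(\d+)Percent(\d+)Minutes?", name)
    if m:
        return ShiftConfig("Canary", int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"Linear(\d+)PercentEvery(\d+)Minutes?", name)
    if m:
        return ShiftConfig("Linear", int(m.group(1)), int(m.group(2)))
    raise ValueError(f"unknown deployment preference: {name}")


def new_version_weight(cfg: ShiftConfig, minute: int) -> int:
    """Percent of traffic on the new version `minute` minutes after the shift starts."""
    if cfg.kind == "AllAtOnce":
        return 100
    if cfg.kind == "Canary":  # one small step, then everything
        return cfg.percent if minute < cfg.interval else 100
    steps = minute // cfg.interval + 1  # Linear: one more increment per interval
    return min(100, steps * cfg.percent)


if __name__ == "__main__":
    for name in ("Canary10Percent5Minutes", "Linear10PercentEvery1Minute"):
        cfg = parse(name)
        print(name, [new_version_weight(cfg, m) for m in range(0, 11, 2)])
```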

- run Lambda, API Gateway, DynamoDB locally
- Lambda start/invoke
- API Gateway
- AWS Events (sample payloads for event resources)
- SAM Recipe
- Transform header – marks the template as a SAM template (Transform: AWS::Serverless-2016-10-31)
- Write Code
- AWS::Serverless::Function
- AWS::Serverless::Api
- AWS::Serverless::SimpleTable
- Package and Deploy – into S3 Bucket
- Quickly sync local changes to AWS Lambda (SAM Accelerate): sam sync --watch
- SAM commands
- sam init – creating a new SAM project
- sam build – resolve dependencies and construct deployment artifacts for all functions and layers in the SAM template.
- sam package – prepares the serverless application for deployment by zipping artifacts, uploading them to S3, and generating a CloudFormation template with references to the uploaded artifacts in S3. But it doesn't deploy the application.
- sam deploy – zips your code artifacts, uploads them to Amazon S3, and produces a packaged AWS SAM template file that it uses to deploy your application
- for nested applications, the “CAPABILITY_AUTO_EXPAND” capability is needed
- Compared with “aws cloudformation deploy”, which deploys a CloudFormation stack and expects your artifacts to be already packaged and uploaded to S3.
- sam publish – publishes an AWS SAM application to the AWS Serverless Application Repository
- sam sync – syncs local changes to an already deployed SAM application
- as SAM Accelerate, reduces latency on deployments for rapid development testing
- with the “--code” option, updates only code without updating infrastructure (uses service APIs and bypasses CloudFormation)
- sam local
- can specify a named profile from your AWS CLI configuration using the --profile parameter with the sam local invoke command
- run aws configure with the --profile option to set the credentials for a named profile
- AWS SAM template file
- AWS::Serverless::Application – for nested application
- AWS::Serverless::Function – configuration information for creating a Lambda function
- AWS::Serverless::LayerVersion – creates a Lambda layer version (LayerVersion) that contains library or runtime code that's needed by a Lambda function
- AWS::Serverless::Api – describes an API Gateway resource. It's useful for advanced use cases where you want full control and flexibility when you configure your APIs. Mostly as part of event sources of “AWS::Serverless::Function”
- SAM Policy Templates
- apply permissions to Lambda Functions
- SAM Multiple Environments, using “samconfig.toml”


Cloud Development Kit (CDK)
- CloudFormation uses JSON/YAML, but CDK uses JavaScript/TypeScript, Python, Java, .NET
- Contains higher-level components called constructs
- encapsulate everything for final CloudFormation stack creation
- AWS Construct Library or Construct Hub
- Layer 1 (L1): CloudFormation(CFN) resources, prefix with “Cfn”, and all resource properties needed to be explicitly configured
- Layer 2 (L2): intent-based API resources, with defaults and boilerplate, also can use methods
- Layer 3 (L3): aka Patterns, represents as multiple related resources (for example, API Gateway + Lambda, or Fargate cluster + Application Load Balancer)
- The code is compiled (synthesized) into a CloudFormation template
- Benefit for Lambda & ECS/EKS: infrastructure and application runtime code can be implemented together
- SAM focuses on serverless and is good for Lambda, but only supports JSON/YAML
- Bootstrapping: the process of provisioning before deploying AWS environment (Account+Region)
- CDKToolkit (CloudFormation stack), with S3 Bucket – store files and IAM Roles
- Error: “Policy contains a statement with one or more invalid principals”, because the new environment has not been bootstrapped (its IAM Roles are missing)
- Unit testing, using the CDK Assertions module with Jest (JavaScript) or Pytest (Python)
- Fine-grained Assertions (common): check a certain property of a certain resource
- Snapshot Test: test against baseline template
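The CDK assertions module does this for real (for example `Template.has_resource_properties` in Python). The stdlib-only sketch below just imitates the two ideas on a synthesized template held as a dict, so it runs without CDK installed. The function names and the sample template are hypothetical.

```python
from typing import Any, Dict


def object_like(expected: Any, actual: Any) -> bool:
    """Every key in `expected` must exist in `actual` with a matching value."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and object_like(value, actual[key])
            for key, value in expected.items()
        )
    return expected == actual


def has_resource_properties(template: Dict[str, Any], resource_type: str,
                            props: Dict[str, Any]) -> bool:
    """Fine-grained assertion: some resource of this type contains these properties."""
    return any(
        resource.get("Type") == resource_type
        and object_like(props, resource.get("Properties", {}))
        for resource in template.get("Resources", {}).values()
    )


def matches_snapshot(template: Dict[str, Any], baseline: Dict[str, Any]) -> bool:
    """Snapshot test: the whole synthesized template must equal the stored baseline."""
    return template == baseline


# Hypothetical synthesized template (trimmed down for illustration).
SYNTHESIZED = {
    "Resources": {
        "HandlerFn": {
            "Type": "AWS::Lambda::Function",
            "Properties": {"Runtime": "python3.12", "Handler": "index.handler", "MemorySize": 256},
        }
    }
}

if __name__ == "__main__":
    print(has_resource_properties(SYNTHESIZED, "AWS::Lambda::Function", {"MemorySize": 256}))
```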


AWS Elastic Beanstalk
- a developer-centric service for deploying applications on AWS, which provisions and reuses components such as
- Amazon EC2 Instance
- Amazon CloudWatch
- ELB & ASG
- AWS S3
- RDS, DynamoDB
- Amazon SNS
- provisions these resources through a CloudFormation stack under the hood
- Managed service
- Automatically handles capacity provisioning, load balancing, scaling, application health monitoring, instance configuration…
- Just the application code is the responsibility of the developer
- Components
- Application
- collection of Elastic Beanstalk components (environments, versions, configurationsā¦)
- Application Version
- an iteration of your application code
- Environment
- Collection of AWS resources running an application version (only one application version at a time)
- Web Server Tier and Worker Tier
- If your application performs tasks that are long to complete, offload these tasks to a dedicated worker environment
- Elastic Beanstalk worker environments simplify the process by managing the Amazon SQS queue (with support of DLQ) and running a daemon process on each instance that reads from the queue
- define periodic tasks in a file cron.yaml

- Deployment Method
- All at once, has downtime
- Rolling: running under capacity, no additional costs
- Rolling with additional batches: compared to Rolling, this keeps the application running at full capacity (i.e., temporarily creates more instances)
- Immutable: create new instances in a new ASG, then swap; zero downtime
- Blue Green: new environment then swap
- Not a “direct feature” of Elastic Beanstalk
- using Route53 with weighted policies
- Traffic Splitting: canary testing
- If there's a deployment failure, this triggers an automated rollback (very quick)
- using ALB with weighted policies
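This toy model shows the capacity difference between Rolling and Rolling with additional batches. It counts in-service instances while each batch is updated. It assumes all instances are healthy and ignores health checks and warm-up time. It illustrates the idea and is not Elastic Beanstalk's actual algorithm.

```python
from typing import List


def batch_sizes(instances: int, batch: int) -> List[int]:
    """Split the fleet into deployment batches (the last one may be smaller)."""
    if batch <= 0:
        raise ValueError("batch size must be positive")
    sizes, remaining = [], instances
    while remaining > 0:
        size = min(batch, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def rolling(instances: int, batch: int) -> List[int]:
    """In-service count while each batch is updated in place: that batch is out."""
    return [instances - size for size in batch_sizes(instances, batch)]


def rolling_with_additional_batch(instances: int, batch: int) -> List[int]:
    """An extra batch (already on the new version) is launched first to cover the gap."""
    return [instances + batch - size for size in batch_sizes(instances, batch)]


if __name__ == "__main__":
    print(rolling(4, 2))                        # [2, 2] -> half capacity during deploy
    print(rolling_with_additional_batch(4, 2))  # [4, 4] -> full capacity throughout
```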


Method | Impact of failed deployment | Deploy time (relative, 1 = fastest) | Zero downtime | No DNS change | Rollback process | Code deployed to |
---|---|---|---|---|---|---|
All at once | Downtime | 1 | No | Yes | Manual redeploy | Existing instances |
Rolling | Single batch out of service; any successful batches before failure running new application version | 2 | Yes | Yes | Manual redeploy | Existing instances |
Rolling with an additional batch | Minimal if first batch fails; otherwise, similar to Rolling | 3 | Yes | Yes | Manual redeploy | New and existing instances |
Immutable | Minimal | 4 | Yes | Yes | Terminate new instances | New instances |
Traffic splitting | Percentage of client traffic routed to new version temporarily impacted | 4 | Yes | Yes | Reroute traffic and terminate new instances | New instances |
Blue/green | Minimal | 4 | Yes | No | Swap URL | New instances |
- Lifecycle
- max 1000 application versions; use a Lifecycle Policy to phase out old versions, based on time or space
- has option to retain source bundles on S3
- EB Extensions
- YAML/JSON, with “.config” extension as file name
- update defaults with “option_settings”
- place under the “.ebextensions/” folder under root of source code
- resources managed by .ebextensions would be deleted if the environment goes away
- EB Clone helps set up a new environment with exactly the same configuration
- Load Balancer type and configure
- RDS configure, but no data
- Environment variables
- EB Migration
- Once an environment is created, its Elastic Load Balancer (ELB) type cannot be changed
- Create another environment with the new ELB type, then use a Route53 update or a CNAME swap
- Decouple RDS from EB for PROD
- extra configuration files can be added to the source bundle
- cron.yaml – schedule tasks
- env.yaml – configure the environment name, solution stack, and environment links
- Dockerrun.aws.json – multi-container Docker environments that are hosted in Elastic Beanstalk
- Notifications
- Create Rules in EventBridge to act to the following events:
- Environment Operations Status: create, update, terminate (start, success, fail)
- Other Resources Status: ASG, ELB, EC2 Instance (created, deleted)
- Managed Updates Status: started, failed
- Environment Health Status

AWS Amplify
- create mobile and web applications (aka Elastic Beanstalk for mobile and web applications)
- Authentication (Cognito), Storage (AppSync + DynamoDB), API (REST, GraphQL), CI/CD, PubSub, Analytics, AI/ML Predictions, Monitoring…
- Connect your source code from GitHub, AWS CodeCommit, Bitbucket, GitLab, or upload directly
- End-to-End (E2E) test, using Cypress


AWS AppConfig
- deploy dynamic configuration change without code deployment; validate with JSON Schema or Lambda Function
- provides the functionality to manage feature flags, a powerful technique that allows developers to test and control new features in live environments
AWS AppSync
- extends Cognito Sync (user data, like app preferences or game state), also allowing multiple users to synchronize and collaborate in real time on shared data.
- managed GraphQL service, combining multiple data sources
- retrieves data in real time over WebSocket or MQTT over WebSocket
- for mobile apps: local data access and data sync
- Security: API_KEY, AWS_IAM, OPENID_CONNECT, AMAZON_COGNITO_USER_POOLS

AWS Systems Manager
- focused on management and operations of AWS resources (EC2 and On-Premise), such as automation, patching, and configuration
- with SSM Agent installed on nodes
== MONITORING ==
AWS CloudWatch
- Metrics: Collect and track key metrics for every AWS service
- namespace (specify a namespace for each data point, as new metric)
- dimension is an attribute (instance id, environment, …)
- timestamps
- for EC2 memory
- CloudWatch does not monitor the memory, swap, and disk space utilization of your instances. If you need to track these metrics, you can install a CloudWatch agent in your EC2 instances.
- (EC2) Memory usage is a custom metric, using API PutMetricData
- for Lambda function
- The ConcurrentExecutions metric in Amazon CloudWatch explicitly measures the number of instances of a Lambda function that are running at the same time.
- StorageResolution: 60 sec (Standard) or 1 sec (High Resolution); high-resolution metrics can be retrieved with periods of 1/5/10/30 sec
- Custom metric data points can have timestamps up to 2 weeks in the past and up to 2 hours in the future
- detailed monitoring just shortens the period to 1 minute; no extra fields
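The sketch below builds one custom memory data point in the `MetricData` shape that the `PutMetricData` API takes and checks it against the timestamp window above. The metric name, namespace, and instance ID are hypothetical. The real boto3 call appears only in a comment, so the sketch runs without AWS access.

```python
from datetime import datetime, timedelta, timezone

PAST_LIMIT = timedelta(weeks=2)
FUTURE_LIMIT = timedelta(hours=2)


def timestamp_accepted(ts: datetime, now: datetime) -> bool:
    """Custom data points: up to 2 weeks in the past, up to 2 hours in the future."""
    return now - PAST_LIMIT <= ts <= now + FUTURE_LIMIT


def memory_datum(instance_id: str, used_percent: float, ts: datetime,
                 high_resolution: bool = False) -> dict:
    """One MetricData entry; the metric name is a hypothetical choice."""
    return {
        "MetricName": "MemoryUsedPercent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": ts,
        "Value": used_percent,
        "Unit": "Percent",
        "StorageResolution": 1 if high_resolution else 60,
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    datum = memory_datum("i-0123456789abcdef0", 72.5, now, high_resolution=True)
    print(timestamp_accepted(datum["Timestamp"], now), datum)
    # With AWS access, something like:
    # boto3.client("cloudwatch").put_metric_data(Namespace="Custom/EC2", MetricData=[datum])
```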

- Logs: Collect, monitor, analyze and store log files
- Group – application (to encrypt with KMS keys, you need to use the CloudWatch Logs API)
- stream – instances / log files / containers
- export
- Amazon S3: may take up to 12 hours, with the CreateExportTask API
- Using Logs Subscriptions to export real-time events to Kinesis Data Streams, Kinesis Data Firehose, AWS Lambda, with a Subscription Filter
- Cross-Account Subscription (Subscription Filter -> Subscription Destination)
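When a subscription filter targets Lambda, the log events arrive in `event["awslogs"]["data"]` as gzip-compressed, base64-encoded JSON. Below is a stdlib sketch of decoding that payload. `make_test_event` and the log group and stream names are hypothetical and exist only to exercise the decoder locally.

```python
import base64
import gzip
import json
from typing import List


def decode_awslogs(event: dict) -> dict:
    """Undo the transport encoding: base64 -> gzip -> JSON."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context) -> List[str]:
    payload = decode_awslogs(event)
    if payload["messageType"] == "CONTROL_MESSAGE":
        return []  # reachability check, carries no application logs
    return [log_event["message"] for log_event in payload["logEvents"]]


def make_test_event(messages: List[str]) -> dict:
    """Hypothetical local fixture that encodes a payload the same way."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",
        "logGroup": "/demo/app",
        "logStream": "instance-1",
        "subscriptionFilters": ["errors-only"],
        "logEvents": [
            {"id": str(i), "timestamp": 1700000000000 + i, "message": message}
            for i, message in enumerate(messages)
        ],
    }
    data = base64.b64encode(gzip.compress(json.dumps(payload).encode())).decode()
    return {"awslogs": {"data": data}}


if __name__ == "__main__":
    print(handler(make_test_event(["ERROR disk full", "ERROR retry failed"]), None))
```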
- Live Tail – for realtime tail watch
- By default, no logs from EC2 machine to CloudWatch
- CloudWatch Logs Agent – only push logs
- CloudWatch Unified Agent – push logs + collect metrics (extra RAM, Process, Swap) + centralized by SSM Parameter Store
- Metric Filters can trigger alarms; they only apply to log events ingested after the filter is created (no retroactive filtering of history)
- With “aws logs associate-kms-key”, enable (AWS KMS) encryption for an existing log group, eliminating the need to recreate the log group or manually encrypt logs before submission
- Logs Insights
- facilitate in-depth analysis of log data
- enables users to run queries on log data collected from various AWS services and applications in real-time
- Alarms: Re-act in real-time to metrics / events
- based on a single metric; Composite Alarms are monitoring on multiple other alarms
- Targets
- EC2
- EC2 ASG
- Amazon SNS
- Settings
- Period is the length of time to evaluate the metric or expression to create each individual data point for an alarm. It is expressed in seconds. If you choose one minute as the period, there is one datapoint every minute.
- Evaluation Period is the number of the most recent periods, or data points, to evaluate when determining alarm state.
- Datapoints to Alarm is the number of data points within the evaluation period that must be breaching to cause the alarm to go to the ALARM state. The breaching data points do not have to be consecutive, they just must all be within the last number of data points equal to Evaluation Period.
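The "M out of N" rule (Datapoints to Alarm out of Evaluation Periods) fits in a short sketch. It makes these simplifying assumptions: a `GreaterThanThreshold` comparison, missing data (`None`) counted as not breaching, and no INSUFFICIENT_DATA state.

```python
from typing import Optional, Sequence


def alarm_state(datapoints: Sequence[Optional[float]], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """ALARM if at least M of the last N datapoints breach (need not be consecutive)."""
    window = list(datapoints)[-evaluation_periods:]
    breaching = sum(1 for value in window if value is not None and value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    cpu = [50, 95, 40, 92, 91]  # one datapoint per period, oldest first
    print(alarm_state(cpu, threshold=90, evaluation_periods=5, datapoints_to_alarm=3))
```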

- Synthetics Canary: monitor your APIs, URLs, Websites, …
- Events, now called Amazon EventBridge
- Schedule – cron job
- Event Pattern – rules to react/trigger services
- Event Bus, a router that receives events and delivers them to zero or more destinations, or targets.
- (AWS) default, Partner, Custom
- Schema – the structure template for event (json)
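Event patterns are matched field by field. Each value array means "any of these", and nested objects are matched recursively. The minimal matcher below covers only exact-value matching; prefix, numeric, and anything-but filters are not modeled. The EC2 state-change rule is just an example.

```python
from typing import Any, Dict

# Example rule: react when an EC2 instance stops or terminates.
RULE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Exact-value matching only (no prefix/numeric/anything-but filters)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):  # nested object: recurse
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:  # array: the event value must be one of the listed values
            values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in values):
                return False
    return True


if __name__ == "__main__":
    event = {"source": "aws.ec2", "detail-type": "EC2 Instance State-change Notification",
             "detail": {"instance-id": "i-0abc", "state": "stopped"}}
    print(matches(RULE_PATTERN, event))
```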
- CloudWatch Evidently
- validate/serve new features to specified % of users only
- Launches (= feature flags) and Experiments (= A/B testing), and Overrides (specific variants assigned to specific user-id)
- evaluation events stored in CloudWatch Logs or S3
AWS X-Ray
- Troubleshooting (not monitoring) application performance and errors as “centralized service map visualization”
- Request tracking across distributed systems
- Focus on Latency, Errors and Fault analysis
- Compatible
- AWS Lambda
- Elastic Beanstalk
- ECS
- ELB
- API Gateway
- EC2 Instances or any application server (even on premise)
- But X-Ray cannot track the memory and swap usage of an instance; only the CloudWatch Agent can.
- Enable by
- AWS X-Ray SDK (on applications)
- Install the X-Ray daemon (a low-level UDP listener on the OS) (on EC2 or ECS)
- a software application that listens for traffic on UDP port 2000, gathers raw segment data, and relays it to the AWS X-Ray API.
- for EC2, X-Ray daemon can be installed via user-data script
- for ECS, create a Docker image that runs the X-Ray daemon, upload it to a Docker image repository, and then deploy it to your Amazon ECS cluster
- Lambda runs the daemon automatically any time a function is invoked for a sampled request
- Enable X-Ray AWS Integration (IAM Role with proper permission) (on AWS services)
- for ElasticBeanstalk: to enable the X-Ray daemon by including the xray-daemon.config configuration file in the .ebextensions directory of your source code.
- Instrumentation means measuring a product's performance, diagnosing errors, and writing trace information
- AWS X-Ray receives data from services as segments. X-Ray then groups segments that have a common request into traces. X-Ray processes the traces to generate a service graph that provides a visual representation of your application.
- segments/subsegments -> traces -> service graph
- Segments: each application / service will send them
- Subsegments: if you need more details in your segment, e.g., for calls to DynamoDB.
- Trace: segments collected together to form an end-to-end trace
- A trace segment is just a JSON representation of a request that your application serves.
- Sampling: decrease the amount of requests sent to X-Ray, reduce cost
- (default) 1st request each second (aka reservoir: 1), and then 5% of additional requests (aka rate: 0.05)
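The default rule can be modeled locally. A reservoir traces the first request each second, and a fixed rate applies to the requests after that. This is a simplified single-host model. In the real service, sampling rules and reservoir quotas are managed centrally and shared across hosts.

```python
import random
from typing import Optional


class Sampler:
    """Per second: the first `reservoir` requests are traced, then `rate` of the rest."""

    def __init__(self, reservoir: int = 1, rate: float = 0.05,
                 rng: Optional[random.Random] = None):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng or random.Random()
        self.current_second: Optional[int] = None
        self.used = 0

    def should_sample(self, timestamp: float) -> bool:
        second = int(timestamp)
        if second != self.current_second:  # new second: refill the reservoir
            self.current_second, self.used = second, 0
        if self.used < self.reservoir:
            self.used += 1
            return True
        return self.rng.random() < self.rate


if __name__ == "__main__":
    sampler = Sampler(rng=random.Random(7))
    # 1000 requests spread over 10 seconds (100 per second).
    traced = sum(sampler.should_sample(i / 100) for i in range(1000))
    # 10 from the reservoir plus roughly 5% of the remaining 990
    print(f"traced {traced} of 1000 requests")
```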
- Annotations: Key Value pairs used to index traces (for search) and use with filters
- Metadata: “EXTRA” Key Value pairs, not indexed, not used for searching
- A subset of segment fields are indexed by X-Ray for use with filter expressions. You can search for segments associated with specific information in the X-Ray console or by using the GetTraceSummaries API.
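This sketch makes the segment, annotation, and metadata terms concrete. It builds a segment document and sends it to the daemon as a UDP datagram on port 2000, carrying a JSON header line followed by the segment JSON. The segment name and annotation keys are hypothetical. In practice the X-Ray SDK does all of this for you.

```python
import json
import os
import socket
import time
from typing import Optional

DAEMON_ADDRESS = ("127.0.0.1", 2000)  # daemon's default UDP listener
HEADER = '{"format": "json", "version": 1}\n'


def new_trace_id(epoch_seconds: Optional[float] = None) -> str:
    """Trace ID: version 1, 8 hex digits of the start time, 24 random hex digits."""
    start = int(time.time() if epoch_seconds is None else epoch_seconds)
    return f"1-{start:08x}-{os.urandom(12).hex()}"


def make_segment(name: str, start: float, end: float,
                 annotations: Optional[dict] = None,
                 metadata: Optional[dict] = None) -> dict:
    return {
        "name": name,
        "id": os.urandom(8).hex(),                # 64-bit segment ID, 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
        "annotations": annotations or {},         # indexed: usable in filter expressions
        "metadata": {"default": metadata or {}},  # not indexed, grouped by namespace
    }


def send_segment(segment: dict, address=DAEMON_ADDRESS) -> None:
    """Fire-and-forget UDP datagram: header line, then the segment JSON."""
    payload = (HEADER + json.dumps(segment)).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)


if __name__ == "__main__":
    now = time.time()
    segment = make_segment("checkout-service", now, now + 0.25,
                           annotations={"customer_tier": "gold"},
                           metadata={"cart_size": 3})
    print(json.dumps(segment, indent=2))
```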

- X-Ray APIs Policy
- AWSXrayWriteOnlyAccess
- PutTraceSegments
- PutTelemetryRecords
- GetSamplingRules
- GetSamplingTargets
- GetSamplingStatisticSummaries
- AWSXrayReadOnlyAccess – grant console access
- GetServiceGraph
- BatchGetTraces
- GetTraceSummaries
- GetTraceGraph
- AWSXRayDaemonWriteAccess
- AWSXrayFullAccess – Read + Write + configure encryption key settings and sampling rules
- APIs
- GetTraceSummaries – trace summaries, as a list of trace IDs of the application (also with annotations)
- BatchGetTraces – full traces, retrieve the list of traces (ie activity events)
- GetGroup – retrieves the group resource details.
- GetServiceGraph – shows which services process the incoming requests, including the downstream services that they call as a result.
- If a load balancer or other intermediary forwards a request to your application, X-Ray takes the client IP from the X-Forwarded-For header in the request instead of from the source IP in the IP packet.
Amazon Managed Grafana
- a fully managed and secure data visualization service that you can use to instantly query, correlate, and visualize operational metrics, logs, and traces

Use case | What is it optimized for? | Monitoring and observability services |
---|---|---|
Monitoring and alerting | These services are optimized to provide real-time visibility, proactive issue detection, resource optimization, and efficient incident response, contributing to overall application and infrastructure health. | – Amazon CloudWatch – Amazon CloudWatch Logs – Amazon EventBridge |
Application performance monitoring | These services provide comprehensive insights into application behavior, offer tools for identifying and resolving performance bottlenecks, aid in efficient troubleshooting, and contribute to delivering modern user experiences across distributed and web applications. | – Amazon CloudWatch Application Signals – Amazon Managed Service for Prometheus – AWS X-Ray – Amazon CloudWatch Synthetics |
Infrastructure observability | These services provide a holistic view of your cloud resources, helping you make more informed decisions about resource utilization, performance optimization, and cost-efficiency. | – Amazon CloudWatch Metrics – Amazon CloudWatch Container Insights |
Logging and analysis | These services help you efficiently manage and analyze log data, troubleshoot, detect anomalies, support security, meet compliance requirements, and get actionable insights into your applications and infrastructure. | – Amazon CloudWatch Logs Insights – Amazon CloudWatch Logs Anomaly Detection – Amazon Managed Grafana – Amazon OpenSearch Service – Amazon Kinesis Data Streams |
Security and compliance monitoring | Optimized to provide a robust security framework, enabling proactive threat detection, continuous monitoring, compliance tracking, and audit capabilities to help safeguard your AWS resources and maintain a secure and compliant environment. | – Amazon GuardDuty – AWS Config – AWS CloudTrail |
Network monitoring | These services provide visibility into network traffic, enhance security by detecting and preventing threats, enable efficient network traffic management, and support incident response activities. | – Amazon CloudWatch Network Monitor – Amazon CloudWatch Internet Monitor – Amazon VPC Flow Logs – AWS Network Firewall |
Distributed tracing | These services provide a comprehensive view of the interactions and dependencies within your distributed applications. They enable you to diagnose performance bottlenecks, optimize application performance, and support the smooth functioning of complex systems by offering insights into how different parts of your application communicate and interact. | – AWS Distro for OpenTelemetry – AWS X-Ray – Amazon CloudWatch Application Signals (Preview) |
Hybrid and multicloud observability | Maintain reliable operations, provide modern digital experiences for your customers, and get help to meet service level objectives and performance commitments. | – Amazon CloudWatch (hybrid and multicloud support) |