== OPERATIONS ==
Automate key machine learning tasks and use no-code or low-code solutions
- SageMaker Canvas
- No-code machine learning for business analysts
- Capabilities for tasks such as data preparation, feature engineering, algorithm selection, training and tuning, inference, and more
- Upload CSV data (CSV only for now), select a column to predict, build the model, and make predictions
- Can also join datasets
- Classification or regression
- Automatic data cleaning
- Missing values
- Outliers
- Duplicates
- Share models & datasets with SageMaker Studio
- The Finer Points
- Local file uploading must be configured “by your IT administrator.”
- Set up an S3 bucket with appropriate CORS permissions
- Can integrate with Okta SSO
- Canvas lives within a SageMaker Domain that must be manually updated
- Import from Redshift can be set up
- Time series forecasting must be enabled via IAM
- Can run within a VPC
- SageMaker Autopilot (has been integrated into Canvas)
- Automates:
- Algorithm selection
- Data preprocessing
- Model tuning
- All infrastructure
- It does all the trial & error for you
- More broadly this is called AutoML
- Workflow
- Load data from S3 for training
- Select your target column for prediction
- Automatic model creation
- Model notebook is available for visibility & control
- Model leaderboard
- Ranked list of recommended models
- You can pick one
- Deploy & monitor the model, refine via notebook if needed
- Can add in human guidance
- With or without code, in SageMaker Studio or via the AWS SDKs (see the sketch after this list)
- Problem types:
- Binary classification
- Multiclass classification
- Regression
- Algorithm Types:
- Linear Learner
- XGBoost
- Deep Learning (MLPs)
- Ensemble mode
- Data must be tabular CSV or Parquet
- Training Modes ???
- HPO (Hyperparameter optimization)
- Ensembling
- Auto
- Explainability ???
- Integrates with SageMaker Clarify
- Transparency on how models arrive at predictions
- Feature attribution
- Uses SHAP Baselines / Shapley Values
- Research from cooperative game theory
- Assigns each feature an importance value for a given prediction
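A minimal sketch of launching an Autopilot job in code, assuming the SageMaker Python SDK's AutoML class; the bucket paths, target column, and role are placeholders:

```python
# Hypothetical sketch: an Autopilot (AutoML) job via the SageMaker Python SDK.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes you run inside SageMaker

automl = AutoML(
    role=role,
    target_attribute_name="churn",                 # the column Autopilot should predict
    max_candidates=10,                             # cap the trial-and-error budget
    output_path="s3://my-bucket/autopilot-output/",
    sagemaker_session=session,
)

# Training data must be tabular CSV or Parquet in S3; Autopilot infers the problem type.
automl.fit(inputs="s3://my-bucket/autopilot-input/train.csv", wait=True, logs=True)

# Inspect the leaderboard's best candidate, then deploy it to a real-time endpoint.
best = automl.best_candidate()
print(best["CandidateName"], best["FinalAutoMLJobObjectiveMetric"])
predictor = automl.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```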
- SageMaker JumpStart
- One-click models and algorithms from model zoos
- Over 150 open source models in NLP, object detection, image classification, etc.
- Also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning with SageMaker AI (deployment sketch below)
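A minimal sketch of deploying a catalog model in code, assuming the SageMaker Python SDK's JumpStartModel class; the model ID and instance type are example values:

```python
# Hypothetical sketch: deploying a pre-trained JumpStart model to an endpoint.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")  # example catalog ID
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# The invocation payload format is model-specific; check the model's example notebook.
# Clean up the endpoint when you are done to avoid ongoing charges.
predictor.delete_endpoint()
```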
Inference Model Deployment
- getting predictions, or inferences, from your trained machine learning models
- Approaches
- Deploy a machine learning model in a low-code or no-code environment
- deploy pre-trained models using Amazon SageMaker JumpStart through the Amazon SageMaker Studio interface
- Use code to deploy machine learning models with more flexibility and control
- Deploy your own models with customized settings using the ModelBuilder class in the SageMaker AI Python SDK, which provides fine-grained control over settings such as instance types, network isolation, and resource allocation (see the sketch after this list)
- Deploy machine learning models at scale
- use the AWS SDK for Python (Boto3) and AWS CloudFormation along with your desired Infrastructure as Code (IaC) and CI/CD tools
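A minimal sketch of the ModelBuilder path, assuming the ModelBuilder/SchemaBuilder interfaces from recent SageMaker Python SDK versions; `my_trained_model`, the role ARN, and the sample payloads are placeholders:

```python
# Hypothetical sketch: deploying your own model with ModelBuilder.
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# SchemaBuilder infers request/response (de)serialization from example payloads.
schema = SchemaBuilder(sample_input=[[0.5, 1.2, 3.4]], sample_output=[0.87])

builder = ModelBuilder(
    model=my_trained_model,   # e.g. an in-memory framework model you trained elsewhere
    schema_builder=schema,
    role_arn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

model = builder.build()       # packages the model and selects a serving container
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")
```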
- Deploying a model to an endpoint
- Real-time inference
- ideal for inference workloads where you have interactive, low latency requirements.
- Deploy models with Amazon SageMaker Serverless Inference
- without configuring or managing any of the underlying infrastructure
- ideal for workloads which have idle periods between traffic spurts and can tolerate cold starts.
- Asynchronous inference
- queues incoming requests and processes them asynchronously
- ideal for requests with large payload sizes (up to 1GB), long processing times (up to one hour), and near real-time latency requirements
- Cost optimization
- Model performance optimization with SageMaker Neo.
- Automatically optimizes models to run on target hardware such as AWS Inferentia chips
- Automatic scaling of Amazon SageMaker AI models.
- Use autoscaling to dynamically adjust the compute resources for your endpoints based on incoming traffic patterns, which helps you optimize costs by only paying for the resources you’re using at a given time.
- Inference Modes
- Real-time inference
- Inference workloads where you have real-time, interactive, low-latency requirements (invocation sketch after this list)
- (On-demand) Serverless inference
- workloads which have idle periods between traffic spurts and can tolerate cold starts.
- Asynchronous inference
- queues incoming requests and processes them asynchronously
- requests with large payload sizes (up to 1GB), long processing times (up to one hour), and near real-time latency requirements
- Batch transform
- Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
- Get inferences from large datasets.
- Run inference when you don’t need a persistent endpoint.
- Associate input records with inferences to help with the interpretation of results.
- suitable for long-term monitoring and trend analysis
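A minimal sketch of invoking a real-time endpoint with boto3; the endpoint name and CSV payload are placeholders for your own model:

```python
# Minimal sketch: calling a deployed real-time endpoint via the SageMaker runtime.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-realtime-endpoint",
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",   # one record in the format the model container expects
)

prediction = response["Body"].read().decode("utf-8")
print(prediction)
```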
- Deployment Safeguards
- Deployment Guardrails
- For asynchronous or real-time inference endpoints
- Controls shifting traffic to new models (configuration sketch after this list)
- “Blue/Green Deployments”
- All at once: shift everything, monitor, terminate blue fleet
- Canary: shift a small portion of traffic and monitor
- Linear: Shift traffic in linearly spaced steps
- Auto-rollbacks
- Shadow Tests
- Compare performance of shadow variant to production
- You monitor in SageMaker console and decide when to promote it
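A minimal sketch of a blue/green canary rollout with auto-rollback, assuming the DeploymentConfig structure of the boto3 UpdateEndpoint call; endpoint, config, and alarm names are placeholders:

```python
# Hypothetical sketch: canary traffic shifting with a CloudWatch-alarm auto-rollback.
import boto3

sm = boto3.client("sagemaker")

sm.update_endpoint(
    EndpointName="my-realtime-endpoint",
    EndpointConfigName="my-endpoint-config-v2",   # the new ("green") fleet
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",                                   # or LINEAR / ALL_AT_ONCE
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                "WaitIntervalInSeconds": 600,                       # monitor before the full shift
            },
            "TerminationWaitInSeconds": 300,                        # keep the blue fleet briefly
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "my-endpoint-error-rate-alarm"}]
        },
    },
)
```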
| | Use case 1 | Use case 2 | Use case 3 |
|---|---|---|---|
| SageMaker AI feature | Use JumpStart in Studio to accelerate your foundational model deployment. | Deploy models using ModelBuilder from the SageMaker Python SDK. | Deploy and manage models at scale with AWS CloudFormation. |
| Description | Use the Studio UI to deploy pre-trained models from a catalog to pre-configured inference endpoints. This option is ideal for citizen data scientists, or for anyone who wants to deploy a model without configuring complex settings. | Use the ModelBuilder class from the Amazon SageMaker AI Python SDK to deploy your own model and configure deployment settings. This option is ideal for experienced data scientists, or for anyone who has their own model to deploy and requires fine-grained control. | Use AWS CloudFormation and Infrastructure as Code (IaC) for programmatic control and automation for deploying and managing SageMaker AI models. This option is ideal for advanced users who require consistent and repeatable deployments. |
| Optimized for | Fast and streamlined deployments of popular open source models | Deploying your own models | Ongoing management of models in production |
| Considerations | Lack of customization for container settings and specific application needs | No UI; requires that you're comfortable developing and maintaining Python code | Requires infrastructure management and organizational resources, and familiarity with the AWS SDK for Python (Boto3) or with AWS CloudFormation templates |
| Recommended environment | A SageMaker AI domain | A Python development environment configured with your AWS credentials and the SageMaker Python SDK installed, or a SageMaker AI IDE such as SageMaker JupyterLab | The AWS CLI, a local development environment, and Infrastructure as Code (IaC) and CI/CD tools |

== IMPLEMENTATIONS ==
SageMaker and Docker Containers
- All models in SageMaker are hosted in Docker containers
- Docker containers are created from images
- Images are built from a Dockerfile
- Images are saved in a repository
- Amazon Elastic Container Registry (ECR)
- Amazon SageMaker Containers
- Library for making containers compatible with SageMaker
- Add RUN pip install sagemaker-containers to your Dockerfile
- Environment variables (usage sketch after this list)
- SAGEMAKER_PROGRAM
- Run a script inside /opt/ml/code
- SAGEMAKER_TRAINING_MODULE
- SAGEMAKER_SERVICE_MODULE
- SM_MODEL_DIR
- SM_CHANNELS / SM_CHANNEL_*
- SM_HPS / SM_HP_*
- SM_USER_ARGS
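A minimal sketch of a script-mode training entry point reading these environment variables; the channel name and the artifact written at the end are placeholders:

```python
# Minimal sketch: reading SageMaker-provided environment variables inside a training script.
import json
import os

model_dir = os.environ["SM_MODEL_DIR"]             # where to save model artifacts
train_dir = os.environ["SM_CHANNEL_TRAIN"]         # local path of the "train" channel
channels = json.loads(os.environ["SM_CHANNELS"])   # all configured channel names
hyperparams = json.loads(os.environ["SM_HPS"])     # hyperparameters passed to the job

print(f"Training data in {train_dir}, channels: {channels}, hyperparameters: {hyperparams}")

# ... train the model, then write artifacts under model_dir so SageMaker uploads them to S3
with open(os.path.join(model_dir, "model.txt"), "w") as f:
    f.write("placeholder artifact")
```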
- Production Variants
- test out multiple models on live traffic using Production Variants
- Variant Weights tell SageMaker how to distribute traffic among them (see the sketch after this list)
- So, you could roll out a new iteration of your model at say 10% variant weight
- Once you’re confident in its performance, ramp it up to 100%
- Lets you do A/B tests and validate performance in real-world settings
- Offline validation isn’t always useful
- Shadow Variants
- Deployment Guardrails
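A minimal sketch of an endpoint config that splits traffic 90/10 between two model versions via variant weights; the model, config, and variant names are placeholders:

```python
# Hypothetical sketch: two production variants with a 90/10 traffic split.
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config-ab",
    ProductionVariants=[
        {
            "VariantName": "current-model",
            "ModelName": "my-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,   # ~90% of traffic
        },
        {
            "VariantName": "challenger-model",
            "ModelName": "my-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,   # ~10% of traffic while you evaluate it
        },
    ],
)
```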


SageMaker on the Edge
- SageMaker Neo
- Train once, run anywhere
- Edge devices
- ARM, Intel, Nvidia processors
- Optimizes code for specific devices (compilation-job sketch after this list)
- Tensorflow, MXNet, PyTorch, ONNX, XGBoost, DarkNet, Keras
- Consists of a compiler and a runtime
- Neo + AWS IoT Greengrass
- Neo-compiled models can be deployed to an HTTPS endpoint
- Hosted on C5, M5, M4, P3, or P2 instances
- Must be same instance type used for compilation
- OR! You can deploy to IoT Greengrass
- This is how you get the model to an actual edge device
- Inference at the edge with local data, using model trained in the cloud
- Uses Lambda inference applications
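A minimal sketch of a Neo compilation job via boto3; the S3 paths, framework, input shape, and target device are placeholders:

```python
# Hypothetical sketch: compiling a trained model with SageMaker Neo.
import boto3

sm = boto3.client("sagemaker")

sm.create_compilation_job(
    CompilationJobName="my-neo-compilation-job",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    InputConfig={
        "S3Uri": "s3://my-bucket/models/model.tar.gz",
        "DataInputConfig": '{"data": [1, 3, 224, 224]}',   # input shape the model expects
        "Framework": "MXNET",
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/compiled-models/",
        "TargetDevice": "jetson_nano",                      # or an instance family like ml_c5
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```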
Managing SageMaker Resources
- In general, algorithms that rely on deep learning will benefit from GPU instances (P3, g4dn) for training
- Inference is usually less demanding and you can often get away with compute instances there (C5)
- Can use EC2 Spot instances for training
- Save up to 90% over on-demand instances
- Spot instances can be interrupted!
- Use checkpoints to S3 so training can resume (see the sketch below)
- Can increase training time as you need to wait for spot instances to become available
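A minimal sketch of managed Spot training with S3 checkpointing so an interrupted job can resume; the image URI, role, and S3 paths are placeholders:

```python
# Hypothetical sketch: Spot training with checkpoints using the SageMaker Python SDK.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,         # up to ~90% cheaper than on-demand
    max_run=3600,                    # max training time in seconds
    max_wait=7200,                   # max total time, including waiting for Spot capacity
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",   # training resumes from here if interrupted
    output_path="s3://my-bucket/output/",
)

estimator.fit({"train": "s3://my-bucket/train/"})
```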
- Elastic Inference (EI) / Amazon SageMaker Inference
- attach just the right amount of GPU-powered acceleration to any Amazon EC2 and Amazon SageMaker instance
- Accelerates deep learning inference
- At fraction of cost of using a GPU instance for inference
- EI accelerators may be added alongside a CPU instance
- ml.eia1.medium / large / xlarge
- EI accelerators may also be applied to notebooks
- Works with Tensorflow, PyTorch, and MXNet pre-built containers
- ONNX may be used to export models to MXNet
- Works with custom containers built with EI-enabled Tensorflow, PyTorch, or MXNet
- Works with Image Classification and Object Detection built-in algorithms
- Automatic Scaling
- Set up a scaling policy to define target metrics, min/max capacity, cooldown periods (see the sketch below)
- Works with CloudWatch
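A minimal sketch of target-tracking autoscaling for an endpoint variant through the Application Auto Scaling API; the endpoint/variant names and capacity values are placeholders:

```python
# Hypothetical sketch: autoscaling a SageMaker endpoint variant on invocations per instance.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-realtime-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="my-invocations-scaling-policy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,   # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```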
- Serverless Inference
- Specify your container, memory requirement, and concurrency requirements (deployment sketch after this list)
- Underlying capacity is automatically provisioned and scaled
- Good for infrequent or unpredictable traffic; will scale down to zero when there are no requests
- Charged based on usage
- Monitor via CloudWatch
- ModelSetupTime, Invocations, MemoryUtilization
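A minimal sketch of deploying an existing Model object to a serverless endpoint; the memory size and concurrency values are placeholders, and `model` is assumed to be a sagemaker.model.Model created earlier:

```python
# Hypothetical sketch: serverless inference deployment via the SageMaker Python SDK.
from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,   # memory allocated to each invocation environment
    max_concurrency=5,        # max concurrent invocations before throttling
)

predictor = model.deploy(serverless_inference_config=serverless_config)
```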
- Amazon SageMaker Inference Recommender
- How it works:
- Register your model to the model registry
- Benchmark different endpoint configurations
- Collect & visualize metrics to decide on instance types
- Existing models from zoos may have benchmarks already
- Instance Recommendations
- Runs load tests on recommended instance types
- Takes about 45 minutes
- Endpoint Recommendations
- Custom load test
- You specify instances, traffic patterns, latency requirements, throughput requirements
- Takes about 2 hours
- Availability Zones
- automatically attempts to distribute instances across availability zones
- Deploy multiple instances for each production endpoint
- Configure VPCs with at least two subnets, each in a different AZ

MLOps with SageMaker
- Integrates SageMaker with Kubernetes-based ML infrastructure
- Amazon SageMaker Operators for Kubernetes
- Components for Kubeflow Pipelines
- Enables hybrid ML workflows (on-prem + cloud)
- Enables integration of existing ML platforms built on Kubernetes / Kubeflow
- SageMaker Projects
- SageMaker Studio’s native MLOps solution with CI/CD
- Build images
- Prep data, feature engineering
- Train models
- Evaluate models
- Deploy models
- Monitor & update models
- Uses code repositories for building & deploying ML solutions
- Uses SageMaker Pipelines to define the steps (see the sketch after this list)
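A minimal sketch of a one-step SageMaker Pipeline that runs a training job, assuming the sagemaker.workflow classes; `my_estimator`, the S3 path, and the role ARN are placeholders:

```python
# Hypothetical sketch: defining, registering, and starting a SageMaker Pipeline.
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

train_step = TrainingStep(
    name="TrainModel",
    estimator=my_estimator,   # e.g. an XGBoost Estimator defined elsewhere
    inputs={"train": TrainingInput(s3_data="s3://my-bucket/prepared/train/")},
)

pipeline = Pipeline(name="my-mlops-pipeline", steps=[train_step])
pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerExecutionRole")  # create or update
execution = pipeline.start()                                                       # run it
```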





Inference Pipelines ???
- Linear sequence of 2-15 containers (see the sketch after this list)
- Any combination of pre-trained built-in algorithms or your own algorithms in Docker containers
- Combine pre-processing, predictions, post-processing
- Spark ML and scikit-learn containers OK
- Spark ML can be run with Glue or EMR
- Serialized into MLeap format
- Can handle both real-time inference and batch transforms
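A minimal sketch of chaining a preprocessing container and a model container into one inference pipeline endpoint; the two model objects and names are placeholders:

```python
# Hypothetical sketch: an inference pipeline built from multiple model containers.
from sagemaker.pipeline import PipelineModel

pipeline_model = PipelineModel(
    name="my-inference-pipeline",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    models=[sparkml_preprocessing_model, xgb_model],   # containers executed in order per request
)

# The same PipelineModel can back a real-time endpoint or a batch transform job.
predictor = pipeline_model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")
```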




== SECURITY ==
SageMaker Security
- Use Identity and Access Management (IAM)
- User permissions for:
- CreateTrainingJob
- CreateModel
- CreateEndpointConfig
- CreateTransformJob
- CreateHyperParameterTuningJob
- CreateNotebookInstance
- UpdateNotebookInstance
- Predefined policies:
- AmazonSageMakerReadOnly
- AmazonSageMakerFullAccess
- AdministratorAccess
- DataScientist
- Set up user accounts with only the permissions they need
- Use MFA
- Use SSL/TLS when connecting to anything
- Use CloudTrail to log API and user activity
- Use encryption
- AWS Key Management Service (KMS)
- Accepted by notebooks and all SageMaker jobs
- Training, tuning, batch transform, endpoints
- Notebooks and everything under /opt/ml/ and /tmp can be encrypted with a KMS key
- S3
- Can use encrypted S3 buckets for training data and hosting models
- S3 can also use KMS
- Protecting Data in Transit
- All traffic supports TLS / SSL
- IAM roles are assigned to SageMaker to give it permissions to access resources
- Inter-node training communication may be optionally encrypted
- Can increase training time and cost with deep learning
- AKA inter-container traffic encryption
- Enabled via console or API when setting up a training or tuning job
- VPC
- Training jobs run in a Virtual Private Cloud (VPC)
- You can use a private VPC for even more security (configuration sketch after this list)
- You’ll need to set up S3 VPC endpoints
- Custom endpoint policies and S3 bucket policies can keep this secure
- Notebooks are Internet-enabled by default
- If disabled, your VPC needs
- an interface endpoint (PrivateLink) and allow outbound connections (to other AWS services, like S3, AWS Comprehend), for training and hosting to work
- or NAT Gateway, and allow outbound connections (to Internet), for training and hosting to work
- Training and Inference Containers are also Internet-enabled by default
- Network isolation is an option, but this also prevents S3 access
- Enable the SageMaker parameter EnableNetworkIsolation so the containers have no outbound Internet access
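A minimal sketch of the security-related options on a training job (KMS keys, VPC subnets/security groups, inter-container encryption, network isolation); all ARNs, IDs, and S3 paths are placeholders:

```python
# Hypothetical sketch: security settings on a SageMaker training job.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=2,
    instance_type="ml.p3.2xlarge",
    output_path="s3://my-encrypted-bucket/output/",
    output_kms_key="arn:aws:kms:us-east-1:123456789012:key/<key-id>",   # encrypt model artifacts
    volume_kms_key="arn:aws:kms:us-east-1:123456789012:key/<key-id>",   # encrypt attached volumes
    subnets=["subnet-aaaa1111", "subnet-bbbb2222"],                     # private VPC, two AZs
    security_group_ids=["sg-0123456789abcdef0"],
    encrypt_inter_container_traffic=True,   # inter-node encryption (adds time/cost for deep learning)
    enable_network_isolation=True,          # containers make no outbound network calls
)

estimator.fit({"train": "s3://my-encrypted-bucket/train/"})
```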
- Logging and Monitoring
- CloudWatch can log, monitor and alarm on:
- Invocations and latency of endpoints
- Health of instance nodes (CPU, memory, etc)
- Ground Truth (active workers, how much they are doing)
- CloudTrail records actions from users, roles, and services within SageMaker
- Log files delivered to S3 for auditing





== COST OPTIMISATION ==
AWS Cost Management
- Operations for Cost Allocation Tracking/Analysis
- STEP ONE: create user-defined tags with key-value pairs that reflect attributes such as project names or departments to ensure proper categorization of resources
- STEP TWO: apply these tags to the relevant resources to enable tracking (see the sketch below)
- STEP THREE: enable the cost allocation tags in the Billing console
- STEP FOUR: configure tag-based cost and usage reports (AWS Cost Allocation Reports) for detailed analysis in Cost Explorer
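A minimal sketch of step two for a SageMaker resource, tagging an endpoint with boto3; the ARN and tag values are placeholders:

```python
# Hypothetical sketch: tagging a SageMaker endpoint for cost allocation tracking.
import boto3

sm = boto3.client("sagemaker")

sm.add_tags(
    ResourceArn="arn:aws:sagemaker:us-east-1:123456789012:endpoint/my-realtime-endpoint",
    Tags=[
        {"Key": "project", "Value": "churn-prediction"},
        {"Key": "department", "Value": "data-science"},
    ],
)
# The tag keys must still be activated as cost allocation tags in the Billing console
# before they show up in Cost Explorer and the cost and usage reports.
```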