02. VPC & Network – Yannick Forge

VPC, SUBNETS, NETWORKING

A VPC is a virtual network in which you deploy resources (Region level).
A subnet is a range of IP addresses that partitions your VPC (Availability Zone level).
A public subnet is accessible from the internet; a private subnet is only accessible from within the same VPC.
Use route tables to determine where network traffic from your subnet or gateway is directed.
A gateway connects your VPC to another network. For example, use an internet gateway to connect your VPC to the internet.
Use a VPC endpoint to connect to AWS services privately, without an internet gateway or NAT device (i.e. traffic stays on a private network instead of crossing the public internet). Endpoints are for the VPC to connect out, not for the VPC to accept connections from outside.
Use a VPC peering connection to route traffic between the resources in two VPCs. Conditions: the CIDR blocks (IP address ranges) must not overlap, and peering is not transitive.
Use a Transit Gateway, which acts as a central hub, to route traffic between your VPCs, VPN connections, and AWS Direct Connect connections.
Connect your VPCs to your on-premises networks using AWS Virtual Private Network (AWS VPN).
Use VPC Flow Logs to analyse and debug traffic issues. They capture information about the IP traffic going to and from network interfaces in your VPC.
- Can be created at three levels: VPC, subnet, or Elastic Network Interface (ENI)
- Also capture network information from AWS-managed interfaces (ELB, ElastiCache, RDS, Aurora, etc.)
- Flow log data can be published to Amazon S3, Kinesis Data Firehose, and CloudWatch Logs
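The no-overlap condition for VPC peering can be checked before a peering connection is requested. The following is a minimal sketch using Python's standard ipaddress module; the function name can_peer and the CIDR values are made up for illustration.

```python
import ipaddress

def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Return True when two VPC CIDR blocks do not overlap (a peering prerequisite)."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return not net_a.overlaps(net_b)

# Illustrative CIDR blocks, not taken from any real account
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # disjoint ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/20"))  # second range sits inside the first
```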

Subnets

A VPC is housed within a Region, and each subnet lives in exactly one AZ (an AZ can hold several subnets). Therefore, for high availability, you need at least 2 subnets in your VPC so that you can span 2 AZs.
When you create a new subnet, it is automatically associated with the main route table.
Example VPC/subnet configurations recommended by AWS
- VPC with single public subnet: e.g. for single-tier, public-facing web app such as a blog or a simple website
- VPC with public and private subnets: e.g. for multi-tier web apps where the web servers are in the public subnet and the DBs in the private subnet
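One way to plan the "at least 2 subnets across 2 AZs" layout is to carve the VPC CIDR into equal blocks, one per AZ. The sketch below uses only the standard ipaddress module. The AZ names, the /16 VPC and the /24 subnet size are assumptions for illustration. It also accounts for the 5 addresses AWS reserves in every subnet (the first four and the last).

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one in each subnet

def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Assign one equally sized subnet block to each AZ, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of consecutive blocks
    plan = []
    for az, block in zip(azs, blocks):
        plan.append({
            "az": az,
            "cidr": str(block),
            "usable_ips": block.num_addresses - AWS_RESERVED_PER_SUBNET,
        })
    return plan

# Hypothetical AZ names
for entry in plan_subnets("10.0.0.0/16", ["eu-west-1a", "eu-west-1b"]):
    print(entry)
```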

Public Subnet vs. Private Subnet

Public Subnets
- has a route table that routes to an Internet Gateway (IGW) (note the Internet Gateway is attached to the VPC, not directly to the subnet)
- EC2 instances launched in a Public Subnet can be auto-assigned a public IP address (when the subnet's auto-assign setting is enabled) or be given an Elastic IP
- Security groups and network ACLs on the Public Subnet must allow SSH traffic (port 22) if you need to administer instances over SSH
Private Subnets
- outbound traffic is routed to a NAT device. The NAT device is installed in the Public Subnet and connected to an Internet Gateway for outbound access to the internet.
  - NAT Gateway vs. NAT Instance: a NAT Gateway is managed for you by AWS and highly available, whereas a NAT Instance (self-managed) is a lot more manual work but can double as a bastion host / jump box
  - a NAT Gateway enables outbound internet access from the private subnet while blocking inbound connections initiated from the internet
- EC2 instances don’t have a public IP or Elastic IP
- You have to use a bastion host (“jump box”) to access instances in the Private Subnet over SSH (port 22)
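In routing terms, the difference between the two subnet types is where the default route (0.0.0.0/0) points. The sketch below models a route table as a dictionary and picks the most specific matching route, which is how route tables choose between overlapping entries. The target names igw-example and nat-example are placeholders, not real resource IDs.

```python
import ipaddress

# Hypothetical route tables: destination CIDR -> target
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}

def resolve_target(routes, destination_ip):
    """Return the target of the most specific route matching the destination, or None."""
    dest = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches)[1]  # longest prefix wins

print(resolve_target(PUBLIC_ROUTES, "10.0.3.7"))       # stays inside the VPC
print(resolve_target(PUBLIC_ROUTES, "198.51.100.10"))  # leaves via the internet gateway
print(resolve_target(PRIVATE_ROUTES, "198.51.100.10")) # leaves via the NAT device
```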

Security Groups vs. Network ACL (NACL)

Security Groups work at the instance level; a Network ACL (a firewall that controls traffic to and from a subnet) works at the subnet level and applies to all instances within that subnet
Security Group rules can reference IP addresses and other security groups; Network ACL rules reference IP addresses only
Security Groups have only ALLOW rules; Network ACLs have ALLOW and DENY rules
Security Groups are stateful (return traffic is always allowed); Network ACLs are stateless (both inbound and return traffic must be explicitly allowed)
Security Groups evaluate all rules together; Network ACLs process rules in order, lowest rule number first, and stop at the first match
Neither can block traffic by country
Default security groups have an inbound allow rule for traffic from within the group, whereas custom security groups don’t allow any inbound traffic by default. All outbound traffic is allowed by default in both cases.
So a new custom Security Group starts with an outbound rule that allows all traffic to all IPs and with no inbound rules, which means inbound traffic is denied.
NACLs function at the subnet level with separate allow/deny rules for inbound and for outbound traffic. Because they are stateless, every packet is checked against the rules each time. They don’t apply to traffic within the subnet, only to traffic entering or leaving it.
A VPC automatically comes with a default NACL which allows all inbound/outbound traffic. A custom NACL denies all inbound/outbound traffic by default.
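The two evaluation models can be contrasted with a small simulation. This is a sketch only: the Rule class and both functions are hypothetical helpers, rules match on source IP and a single port, and statefulness is not modelled.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Rule:
    cidr: str
    port: int
    action: str = "allow"  # security group rules are always "allow"
    number: int = 0        # rule number, only meaningful for NACLs

def _matches(rule, src_ip, port):
    return rule.port == port and ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.cidr)

def security_group_allows(rules, src_ip, port):
    """All rules are considered together: any match allows, no match is an implicit deny."""
    return any(_matches(rule, src_ip, port) for rule in rules)

def nacl_allows(rules, src_ip, port):
    """Rules are checked in ascending rule number; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if _matches(rule, src_ip, port):
            return rule.action == "allow"
    return False  # nothing matched: denied

nacl = [
    Rule("203.0.113.0/24", 22, action="deny", number=100),
    Rule("0.0.0.0/0", 22, action="allow", number=200),
]
print(nacl_allows(nacl, "203.0.113.5", 22))   # hits the deny rule 100 first
print(nacl_allows(nacl, "198.51.100.9", 22))  # falls through to allow rule 200
```

Swapping the two rule numbers would let the 203.0.113.0/24 range through, which is why rule ordering matters for NACLs but not for Security Groups.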

VPC Endpoints

Interface endpoints privately connect your VPC to AWS services, services hosted by other AWS accounts, and supported AWS Marketplace services as if they were in your VPC
- Powered by AWS PrivateLink: privately access services hosted on the AWS network without exposing your VPC to the public internet
- Available for many AWS services (API Gateway, CloudFormation, CloudWatch, S3, etc.)
- Traffic does not go over the internet; no need to use an internet gateway, NAT device, DX connection, or VPN
- An interface endpoint is an ENI with a private IP address, in the subnet that you specify, directing traffic to the service that you specify. It uses DNS to direct traffic to the service and is protected by a Security Group.
Gateway endpoints direct traffic to S3 or DynamoDB only, using private IP addresses.
- Do not use AWS PrivateLink
- You route traffic from your VPC to the gateway endpoint using route tables. Protected by VPC endpoint policies rather than Security Groups.

VPN Connection

Site-to-Site VPN: connects an on-premises network to AWS through an encrypted tunnel over the public internet. Quick to set up (within minutes). An AWS managed Site-to-Site VPN connection runs between a Customer Gateway on the customer side and a Virtual Private Gateway (VGW, sometimes abbreviated VPG, also called a VPN gateway) that you create at the edge of your VPC.
Direct Connect (DX): needs physical connections, so it takes weeks or months to establish. Traffic goes over a private network.

Amazon Route 53

Domain Name System (DNS)
- Domain Registrar
- DNS Records
- Zone File: contains DNS records
- Name Server: resolves DNS queries (Authoritative or Non-Authoritative)
- Top Level Domain (TLD)
- Second Level Domain (SLD)
A highly available, scalable, fully managed and Authoritative DNS
- Authoritative = the customer (you) can update the DNS records
The only AWS service which provides 100% availability SLA
Health Checks
- HTTP Health Checks are only for public resources
- Health Check => Automated DNS Failover
  - Pass with 2xx/3xx status codes
  - Can also be set up to pass or fail based on text found in the first 5120 bytes of the response
- Health Checks are integrated with CloudWatch metrics
- Calculated Health Checks combine the results of several health checks into one
  - up to 256 Child Health Checks
- Health checkers are outside the VPC
  - can’t access private endpoints (private VPC or on-premises resource)
  - You can create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that checks the alarm itself
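The pass criteria above (2xx/3xx status, optional text match within the first 5120 bytes) can be expressed as a small predicate. This is an illustrative sketch, not the health checker's actual implementation; the function name and arguments are hypothetical.

```python
from typing import Optional

MAX_SEARCH_BYTES = 5120  # only the start of the response body is searched

def is_healthy(status_code: int, body: bytes, search_string: Optional[str] = None) -> bool:
    """Pass on a 2xx/3xx status and, if requested, a text match near the start of the body."""
    if not 200 <= status_code < 400:
        return False
    if search_string is None:
        return True
    return search_string.encode() in body[:MAX_SEARCH_BYTES]

print(is_healthy(200, b"status: READY", "READY"))
print(is_healthy(200, b"x" * 6000 + b"READY", "READY"))  # text appears too late in the body
print(is_healthy(503, b"status: READY", "READY"))
```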
Routing Policies
- does not route any traffic, it only responds to the DNS queries
- Simple
  - Can specify multiple values in the same record
  - if multiple values are returned, a random one is chosen by the client
  - no Health Check
- Weighted
  - Weights don’t need to sum up to 100
  - A record with weight 0 stops receiving traffic; but if all records have weight 0, traffic is spread equally across all of them
- Latency based
  - Latency is based on traffic between users and AWS Regions
- Geolocation
  - by location of the user [User GeoLocation]
  - Should create a “Default” record (in case there’s no match on location)
- Geoproximity (using Route 53 Traffic Flow feature)
  - by proximity of the resources and users [Resource GeoLocation]
  - Ability to shift more traffic to resources based on the defined bias
- Failover
- IP-based Routing
  - based on clients’ IP addresses
  - a list of CIDRs for your clients
- Multi-Value
  - Can be associated with Health Checks (return only values for healthy resources)
  - Up to 8 healthy records are returned for each Multi-Value query
  - Multi-Value is not a substitute for having an ELB
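For the Weighted policy, each record's share of DNS responses is its weight divided by the sum of all weights, which is why the weights need not total 100. A minimal sketch of that arithmetic, including the all-zero case described above (the record names are made up):

```python
def traffic_share(weights):
    """Map each record name to the fraction of DNS responses it should receive."""
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # every weight is 0: all records are returned with equal probability
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}

print(traffic_share({"blue": 3, "green": 1}))   # weights do not need to sum to 100
print(traffic_share({"blue": 0, "green": 5}))   # weight 0 takes a record out of rotation
print(traffic_share({"blue": 0, "green": 0}))   # all zero: equal split
```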
Traffic flow
- Visual editor to manage complex routing decision trees
- Configurations can be saved as Traffic Flow Policy
  - Can be applied to different Route 53 Hosted Zones (different domain names)
- Supports versioning
Configurations
- active/passive: in case of failure, return backup resource. Requires failover policy. Manual intervention can be required to then cause a fail-back to the active site.
- active/active: return >1 resource. Requires latency policy, weighted policy, or some other policy besides failover. In the case of failover, returns only the healthy resource
- combination: multiple policies are combined into a tree for more complex DNS failover

Routing Records

Best practice is to use DNS names/URLs whenever possible rather than IP addresses. Some exceptions include pointing ELBs directly to the IP address of a peered VPC, or an on-prem resource linked via DX or VPN connection.
Alias records provide a Route 53–specific extension to DNS functionality.
- They let you route traffic to selected AWS resources: ELBs, APIs, CloudFront distributions, S3 buckets, Elastic Beanstalk, VPC interface endpoints, etc.
- Unlike a CNAME record, they also let you route traffic from one record in a hosted zone (usually the zone apex / naked domain name, such as “example.com”) to another record (e.g. “www.example.com”)
- When Route 53 receives a DNS query for an alias record, it responds with 1 or more IP addresses that the record maps to
- Works for ROOT DOMAIN (e.g. mydomain.com) and NON ROOT DOMAIN
- Alias Record is always of type A/AAAA for AWS resources (IPv4 / IPv6)
- You can’t set the TTL
- You cannot set an ALIAS record for an EC2 DNS name
CNAME records (canonical name records) redirect DNS queries to any DNS record. For example, you can create a CNAME record that redirects queries from acme.example.com to zenith.example.com or acme.example.org.
- Points a hostname to any other hostname
- You don’t need to use Route 53.
- Unlike Alias records, they can’t be used for resolving apex domain names
- ONLY FOR NON ROOT DOMAIN (e.g. something.mydomain.com)
PTR records = reverse lookup where you map an IP address to a DNS name

CloudFront

CloudFront is a CDN (Content Delivery Network) that distributes files from an origin.
The origin
- S3 bucket
  - Secured with Origin Access Control (OAC)
  - Can also work as an ingress (to upload files to S3)
- Custom Origin (HTTP)
  - public Application Load Balancer (ALB)
  - public EC2 instance
  - S3 static website
  - Any other HTTP backend
CloudFront vs S3 Cross Region Replication
- CloudFront caches content at edge locations for a TTL
- CloudFront is global
- With S3 Cross Region Replication, files are updated in near real-time and the replicas are read-only
- S3 Cross Region Replication is good for low-latency access in a few regions
Cache
- stored in Edge locations
- Cache key: by default, the “Domain” + the “resource portion of the URL”
- Use CloudFront Cache Policies to add custom data to the Cache Key, and to control the TTL (0s – 1yr)
  - HTTP Header
  - Cookies
  - Query String
- A CloudFront Origin Request Policy defines which values are forwarded in requests to the Origin (without being included in the Cache Key)
  - HTTP Header; CloudFront Headers and custom Headers can be appended
  - Cookies
  - Query String
- Aim for a high Cache Hit Ratio to minimise direct traffic to origins
- Use CreateInvalidation to manually expire cached objects (i.e. CloudFront Invalidation)
- The default Cache Behaviour is the last to be evaluated, and its path pattern always matches everything (*)
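How a cache policy changes the cache key can be illustrated with a small function. This is a sketch of the idea only: the key format, the function name and the example domain are assumptions, and CloudFront's internal key representation is not documented here.

```python
from urllib.parse import urlsplit, parse_qsl

def cache_key(url, headers=None, cookies=None,
              key_headers=(), key_cookies=(), key_query=()):
    """Build a cache key from host + path, plus any values a cache policy adds."""
    parts = urlsplit(url)
    headers = {name.lower(): value for name, value in (headers or {}).items()}
    cookies = cookies or {}
    query = dict(parse_qsl(parts.query))

    key = [parts.netloc.lower(), parts.path or "/"]
    key += [f"h:{h.lower()}={headers.get(h.lower(), '')}" for h in sorted(key_headers)]
    key += [f"c:{c}={cookies.get(c, '')}" for c in sorted(key_cookies)]
    key += [f"q:{q}={query.get(q, '')}" for q in sorted(key_query)]
    return "|".join(key)

# Hypothetical distribution domain; only "size" is part of the key
a = cache_key("https://d111.example.net/img/cat.png?size=large&utm=abc", key_query=("size",))
b = cache_key("https://d111.example.net/img/cat.png?size=large&utm=xyz", key_query=("size",))
c = cache_key("https://d111.example.net/img/cat.png?size=small&utm=abc", key_query=("size",))
print(a == b)  # utm is not in the key, so both requests share one cached object
print(a == c)  # size is in the key, so these are cached separately
```

Keeping the key small (fewer headers, cookies and query strings) is what raises the cache hit ratio.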
CloudFront Signed URL/Cookies, for limited sharing, with a policy covering
- URL expiration
- IP ranges allowed to access the content
- Trusted signers
- To control unauthorized access to content such as photos and to manage data transfer costs, the most effective method is a CloudFront web distribution with signed URLs or signed cookies.
  - Removing public read access from the S3 bucket and using S3 pre-signed URLs with expiry dates does not scale: it would require generating pre-signed URLs for potentially thousands or millions of objects.
  - Blocking the IP addresses of the offending websites with a Network ACL is ineffective, because a quick change of IP address bypasses it.
- 1 CloudFront Signed URL gives access to only 1 file, but 1 CloudFront Signed Cookie can give access to multiple files.
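The expiration and IP-range conditions above live in a JSON policy statement. The sketch below builds a CloudFront custom policy document with the standard json module. It only constructs the policy; signing it with the private key of a trusted signer is deliberately left out. The URL and CIDR are placeholders.

```python
import json

def build_custom_policy(resource_url, expires_epoch, source_cidr=None):
    """Return a compact CloudFront custom policy JSON string (unsigned)."""
    condition = {"DateLessThan": {"AWS:EpochTime": int(expires_epoch)}}
    if source_cidr:
        condition["IpAddress"] = {"AWS:SourceIp": source_cidr}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    # whitespace is stripped because the policy is signed byte-for-byte
    return json.dumps(policy, separators=(",", ":"))

# Placeholder values: a wildcard resource lets one policy cover many files
print(build_custom_policy("https://d111.example.net/photos/*", 1767225600, "203.0.113.0/24"))
```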
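Reduce cost by choosing a Price Class
- Price Class All – all edge locations
- Price Class 200 – excludes the most expensive regions (Oceania and South America)
- Price Class 100 – only the least expensive regions (mainly North America + Europe)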
Multiple Origins: route to different origins based on path pattern
Origin Group: for failover and high availability, with one Primary and one Secondary origin
Field Level Encryption: adds an extra layer of security on top of HTTPS, using asymmetric encryption
HTTPS enforcement
- Viewers <> CF: set the “Viewer Protocol Policy” to “Redirect HTTP to HTTPS” or “HTTPS Only”
- CF <> origins: using AWS Certificate Manager (ACM) certificates (or imported 3rd party SSL certificates)
CloudFront Real Time Logs: can send request logs to Kinesis Data Streams
- Sampling Rate, decide the percentage to be recorded
- Specific field
- Specific Cache Behaviour (path patterns)
Lambda@Edge is a feature of CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency
Can be configured to load an error page (“content not found”) for operationally simple error handling
Geo restriction (whitelist/blacklist access to content by country, e.g. due to copyright restrictions)
Headers available to CloudFront Functions (which are similar to Cloudflare Workers)
- CloudFront-Viewer-Country: geo country
- CloudFront-Viewer-Address: ip address

Feature	CloudFront Functions	Lambda@Edge
Programming languages	JavaScript (ECMAScript 5.1 compliant)	Node.js and Python
Event sources	Viewer request; Viewer response	Viewer request; Viewer response; Origin request; Origin response
Supports Amazon CloudFront KeyValueStore	Yes (CloudFront KeyValueStore only supports JavaScript runtime 2.0)	No
Scale	10,000,000 requests per second or more	Up to 10,000 requests per second per Region
Function duration	Submillisecond	Up to 5 seconds (viewer request and viewer response); up to 30 seconds (origin request and origin response)
Maximum function memory size	2 MB	128 MB (viewer request and viewer response); 10,240 MB (10 GB) (origin request and origin response)
Maximum size of the function code and included libraries	10 KB	50 MB (viewer request and viewer response); 50 MB (origin request and origin response)
Network access	No	Yes
File system access	No	Yes
Access to the request body	No	Yes
Access to geolocation and device data	Yes	No (viewer request and viewer response); Yes (origin request and origin response)
Can build and test entirely within CloudFront	Yes	No
Function logging and metrics	Yes	Yes
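
As the geolocation row indicates, a Lambda@Edge function on the origin request event can read the viewer's country. Below is a minimal Python handler sketch. It assumes the distribution is configured to forward the CloudFront-Viewer-Country header to the origin; the blocked country code "XX" and the response text are placeholders.

```python
BLOCKED_COUNTRIES = {"XX"}  # placeholder country code for illustration

def handler(event, context):
    """Origin-request handler: reject blocked countries, pass everything else through."""
    request = event["Records"][0]["cf"]["request"]
    # Lambda@Edge exposes headers as lowercase keys mapping to lists of {key, value}
    country_header = request["headers"].get("cloudfront-viewer-country")
    country = country_header[0]["value"] if country_header else None

    if country in BLOCKED_COUNTRIES:
        # returning a response object short-circuits the request
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "body": "Content is not available in your country.",
        }
    return request  # returning the request forwards it to the origin
```

For a plain allow/deny list by country, the built-in geo restriction feature is simpler; a function is only worth it when the decision needs custom logic.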

Using CloudFront so that the viewer <> origin path is SSL/TLS encrypted end to end
1. A viewer submits an HTTPS request to CloudFront.
2. If the object is in the CloudFront edge cache, CloudFront encrypts the response and returns it to the viewer, and the viewer decrypts it.
3. If the object is not in the CloudFront cache, CloudFront performs SSL/TLS negotiation with your origin
4. Your origin decrypts the request, encrypts the requested object, and returns the object to CloudFront.
5. CloudFront decrypts the response, re-encrypts it, and forwards the object to the viewer. CloudFront also saves the object in the edge cache so that the object is available the next time it’s requested.
6. The viewer decrypts the response.

Elastic Load Balancers

ELBs send traffic to AWS and on-prem resources. Unlike Route 53, they use resource IP addresses and you don’t get to specify policies such as a weighted policy. VPC flow logs show traffic going to/from an ELB
A health check is a way to check the status of the instances behind an ELB, usually a port with a route (like /health)
A Classic Load Balancer (CLB) operates using TCP, SSL, HTTP and HTTPS.
An Application Load Balancer (ALB) makes routing decisions at the application layer aka Layer 7 (HTTP/HTTPS & WebSocket)
- supports path-based routing and host-based routing (i.e. based on the content of the request in the host field), even QueryStrings and Headers
- can route requests to one or more ports on each ECS container instance in a cluster, also the port mapping as Dynamic Host Port Mapping
- support Lambda Functions as target
- support private IPs (on-prem resources)
- support redirects (from HTTP to HTTPS)
- supports authentication via OIDC-compliant IdPs (OpenID Connect) such as Google and Facebook, through an integration with Cognito (requires an HTTPS listener, port 443)
- periodically sends health check requests to its targets to check their status, and routes only to healthy targets
- access logs can be enabled and pushed to S3. They log info on requester, IP, request type, etc.
- client information is passed to targets in inserted HTTP headers: X-Forwarded-For (IP), X-Forwarded-Port (Port), and X-Forwarded-Proto (Proto)
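On the target side, the X-Forwarded-* headers have to be parsed to recover the client details. The sketch below assumes the ALB's default behaviour of appending the address it saw to the right-hand end of any existing X-Forwarded-For value, so the last entry is the one the load balancer added and earlier entries are client-supplied. The function name is hypothetical.

```python
def forwarded_client(headers):
    """Extract client IP, port and protocol from X-Forwarded-* headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    chain = [ip.strip() for ip in lowered.get("x-forwarded-for", "").split(",") if ip.strip()]
    return {
        "ip": chain[-1] if chain else None,  # entry appended by the load balancer
        "port": lowered.get("x-forwarded-port"),
        "proto": lowered.get("x-forwarded-proto"),
    }

print(forwarded_client({
    "X-Forwarded-For": "192.0.2.44, 203.0.113.7",
    "X-Forwarded-Port": "443",
    "X-Forwarded-Proto": "https",
}))
```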

A Network Load Balancer (NLB) makes routing decisions at the transport layer aka Layer 4 (TCP, TLS & UDP). It can handle millions of requests per second with extremely low latency. It doesn’t support path-based routing or host-based routing the way ALB does.
- One static IP per AZ, and support assigning Elastic IP (good for whitelisting specific IP)
- The target groups can be EC2, private IPs, and ALB
- The Health Check supports TCP, HTTP, and HTTPS protocols
Gateway Load Balancer (GWLB) works at Layer 3 (IP packets). It is mostly used in front of Firewalls, Intrusion Detection and Prevention Systems, Deep Packet Inspection Systems, and payload manipulation appliances; it uses the GENEVE protocol on port 6081
- Transparent Network Gateway – single entry/exit for external traffic
- Load Balancer
- Target groups can be EC2 and private IPs
Sticky Sessions (Session Affinity): the same client is always redirected to the same instance behind a load balancer
- Available for CLB, ALB, and NLB
- Two types of cookies
  - Application-based Cookies, either the default AWSALBAPP or a custom name (which must not be AWSALB, AWSALBAPP, or AWSALBTG)
  - Duration-based Cookies, named AWSALB for ALB and AWSELB for CLB
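With Cross-Zone Load Balancing, each load balancer node distributes traffic evenly across all registered instances in all AZs; without it, each node distributes traffic only across the instances in its own AZ
- Enabled by default for ALB, but disabled by default for NLB, GWLB and CLB
- No inter-AZ data transfer charges for ALB and CLB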
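The effect of cross-zone load balancing is easiest to see with unevenly sized AZs. The sketch below computes the share of total traffic each instance receives, under the assumption that client traffic is split evenly between the load balancer nodes of the AZs that have instances. AZ names and instance counts are made up.

```python
def per_instance_share(instances_per_az, cross_zone):
    """Return, per AZ, the fraction of total traffic that one instance in that AZ receives."""
    active = {az: count for az, count in instances_per_az.items() if count > 0}
    if not active:
        return {}
    if cross_zone:
        total_instances = sum(active.values())
        return {az: 1 / total_instances for az in active}
    share_per_az = 1 / len(active)  # each AZ's node gets an equal slice of traffic
    return {az: share_per_az / count for az, count in active.items()}

fleet = {"az-a": 2, "az-b": 8}
print(per_instance_share(fleet, cross_zone=True))   # every instance gets the same share
print(per_instance_share(fleet, cross_zone=False))  # instances in the small AZ are hit harder
```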
SSL certificate support
- only 1 per CLB
- multiple on ALB and NLB, also with Server Name Indication (SNI)
Connection Draining for CLB, Deregistration Delay for ALB/NLB
- Time to wait for “in-flight requests” completion while the instance is de-registering or unhealthy
- Stops sending new requests to the EC2 instance which is de-registering
- the value can be 0 (draining disabled) or between 1 and 3600 seconds; the default is 300 seconds
- Set to a low value if your requests are short

Auto-Scaling Group (ASG)

In high-availability contexts you use an Auto-Scaling Group (ASG) to automatically launch and terminate instances, and an Elastic Load Balancer (ELB) to distribute traffic among the instances
- specify which subnets the ASG should launch instances into
- attach Target Groups to the ASG
An ASG is defined by a Launch Template plus min/max/initial capacity and scale-in (decrease) / scale-out (increase) policies. Attributes include
- AMI + InstanceType
- EC2 User Data
- EBS Volumes
- Security Groups
- SSH Key Pair
- IAM Roles for EC2
- Network + Subnets
- Load Balancer
ASG scaling policies
- Dynamic
  - Target Tracking – keeps a chosen metric (predefined or custom) near a target value by adding/removing instances
  - Simple / Step Scaling
- Scheduled – based on known usage patterns
- Predictive – continuously forecast load and schedule scaling ahead
Metrics to watch: CPUUtilization, RequestCountPerTarget, Average Network In / Out, and custom metrics
Cooldown period (default 300 seconds) – after a scaling activity, the ASG waits before launching or terminating more instances; reducing the cooldown period will more quickly terminate unneeded instances, reducing costs
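A target tracking policy aims to keep a metric near a target value. The exact algorithm is not described in these notes, so the sketch below uses a simple proportional estimate as a stated assumption: scale capacity by the ratio of the observed metric to the target, then clamp to the group's min/max. The function name and numbers are illustrative.

```python
import math

def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Proportional estimate of the capacity needed to bring the metric back to target."""
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, proposed))

# 4 instances at 80% average CPU with a 40% target: roughly double the fleet
print(desired_capacity(4, 80, 40, min_size=1, max_size=10))
# load drops to 20%: scale in, but never below the minimum
print(desired_capacity(4, 20, 40, min_size=2, max_size=10))
```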
Instance Refresh: recreates all instances using an updated Launch Template
- minimum healthy percentage
- warm-up time (how long until the new instance is ready to use)

AWS Direct Connect (DX) Gateway

You can use Direct Connect (DX) to connect an on-prem data centre to one or multiple VPCs
DX can take > 1 month to set up
For resilience, add a 2nd DX connection. As this takes time to set up and is costly, in the short term consider also adding an IPSec VPN connection (with the same BGP prefix) for resiliency.
You must create one of the following virtual interfaces to begin using DX:
- Private virtual interface (private VIF): access a VPC using private IP addresses
- Public virtual interface (public VIF): access all AWS public services using public IP addresses
- Transit virtual interface (transit VIF): access one or more VPC Transit Gateways associated with DX gateways, within a Region.
A hosted virtual interface (hosted VIF) allows another AWS account to access your DX
Use AWS DataSync to copy large amounts of data from on-prem to S3, EFS, FSx, NFS shares, SMB shares, or AWS Snowcone (the transfer can run over Direct Connect). For copying databases, use DMS.

AWS Global Accelerator

A service for improving the availability and performance of applications by routing user traffic through the AWS global network infrastructure.
Primarily used for accelerating traffic to specific application endpoints like EC2 instances or Elastic Load Balancers, not for accelerating data transfers to Amazon S3

(So what’s the difference among Direct Connect, Transit Gateway, and Global Accelerator??)

3 Types of Network Adapters

ENI (Elastic Network Interface) – basic type
ENA (Elastic Network Adapter) – for enhanced networking, high bandwidth and low latency
EFA (Elastic Fabric Adapter) – for high performance computing

AWS Services Calling into a VPC

To enable AWS serverless services such as Lambda to access resources inside your private VPC, you provide it with VPC-specific info such as your subnet IDs and security group IDs.
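
For Lambda, the VPC-specific info is passed as a VpcConfig structure with SubnetIds and SecurityGroupIds. The helper below only builds that structure. The IDs are placeholders, and the check that both lists are non-empty is an assumption of this sketch rather than a statement of the service's validation rules.

```python
def lambda_vpc_config(subnet_ids, security_group_ids):
    """Build the VpcConfig structure used when attaching a Lambda function to a VPC."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("at least one subnet ID and one security group ID are expected")
    return {
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }

# Placeholder IDs; use subnets in more than one AZ for availability
config = lambda_vpc_config(["subnet-aaa111", "subnet-bbb222"], ["sg-ccc333"])
print(config)
# With boto3 installed, this would be passed as VpcConfig=config to the Lambda
# client's create_function or update_function_configuration call.
```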

AWS Transit Gateway

Central Hub connecting on-prem networks and VPCs.
- Reduces operational complexity as you can easily add more VPCs, VPN capacity, Direct Connect gateways, without complex routing tables.
- Provides additional features over-and-above VPC peering
A transit virtual interface is used to access VPC Transit Gateways
Pattern for connecting 1 DX to multiple VPCs in the same Region is to associate the DX with a transit gateway
- on-prem -> DX -> DX location -> transit virtual interface -> transit gateway association -> Transit Gateway -> multiple VPCs