S3
Storage Tiers
- S3 Standard - General Purpose
- S3 Standard - Infrequent Access (IA)
- S3 One Zone - Infrequent Access (IA)
- S3 Glacier
- Standard & Glacier: span at least 3 AZs
- S3 Intelligent Tiering
- Auto Tiering
Lifecycle Rules
- Transition actions (move between tiers)
- Expiration actions (delete)
Versioning
- Enabled at the bucket level
Cross Region Replication
- Must enable versioning
- Can be in different accounts
- Asynchronous
- Need proper IAM permissions
Etag (Entity Tag)
- Simple upload
- Etag = MD5
- Multi-part upload
- Etag is calculated by the algorithm
- Use Etag to ensure integrity
Performance
- Key prefix
- 3500 RPS for PUT and 5500 RPS for GET
- Use random, do not use date
- Multi-part upload
- Faster when object > 5 GB
- Decrease time to retry
- CloudFront to cache S3 objects
- S3 Transfer Acceleration (use edge location)
- If using SSE-KMS, limits of KMS is 100 -1000 downloads / uploads per second
Encryption
SSE-S3 | SSE-KMS | SSE-C | CSE | |
---|---|---|---|---|
Encryption key is managed by | S3 | KMS | Customer | Customer |
x-amz-server-side-encryption header | aes256 | aws:kms | ${encryption key} | - |
HTTPS (SSL / TLS) | Optional | Optional | Mandatory | Optional |
Can be set as default encryption | Yes | Yes | No | No |
Security
S3 CORS (Cross-Origin Resource Sharing)
- Enable CORS if you want to request data from website in different origin
- Can control the websites that can request your objects in S3
S3 Access Logs
- Log all events that access to S3 bucket
Policies and ACL
- IAM policies
- Bucket Policies
- Bucket ACL (Bucket Policies are better)
- Object ACL
Others
- VPC Endpoint
- MFA can be required in versioned buckets when delete objects
- Presigned URL
Glacier
- Retrieval options
- Expedited (1 to 5 minutes)
- Standard (3 to 5 hours)
- Bulk (5 to 12 hours)
- Vault Access Policies
- Vault Lock Policies
- Immutable policies
DynamoDB
- Distributed NoSQL database
- DynamoDB
- tables
- primary key (Unique and diverse)
- Partition key only
- Partition kay and sort key
- items (Max size is 400 KB)
- attributes
- String, Number, Binary, Boolean, Null
- String Set, Number Set, Binary Set
- List, Map
- attributes
- primary key (Unique and diverse)
- tables
Provisioned Throughput
- Read and write capacity units must be provisioned
- Option to setup auto-scaling of throughput
- Throughput can be exceeded temporarily using burst credit
- If capacity is exceeded, you will get ProvisionedThroughputExceededExceptions
- It is advised to do an exponential back-off retry
Write Capacity Units (WCU)
- 1 WCU = 1 write per second for an item up to 1 KB
- WCU = (number of objects per second) * ceil(max size of object / 1 KB)
Strongly Consistent Read vs. Eventually Consistent Read
- DynamoDB is distributed
- Uses Eventually Consistent Read by default
- Strongly Consistent Read
- Will get correct data if read just after write
- Eventually Consistent Read
- Maybe get outdated data if read just after write because of replication
Read Capacity Units (RCU)
- 1 RCU = 1 strongly consistent read or 2 eventually consistent reads per second for an item up to 4 KB
- Strongly Consistent Read
- RCU = (number of objects per second) * ceil(max size of object / 4 KB)
- Eventually Consistent Read
- RCU = (number of objects per second / 2) * ceil(max size of object / 4 KB)
ProvisionedThroughputExceededExceptions
- Occurred when RCU or WCU is exceeded
- Reasons
- Hot keys or partitions
- Too large items
- Solutions
- Use exponential back-off in SDK
- Distribute partition keys
- If it is RCU issue, you can use DynamoDB Accelerator (DAX)
Partition
- Each partition
- Max of 3000 RCU and 1000 WCU
- Max of 10 GB
- The number of partition
- Capacity partitions = ceil((RCU / 3000) + (WCU / 1000))
- Size partitions = ceil(Size / 10 GB)
- Partitions = max(Capacity partitions, Size partitions)
- WCU and RCU are spread evenly between partition
APIs
- PutItem
- Creat data or full replace
- UpdateItem
- partial update
- ConditionalWrites
- Put / update only if conditions are respected
- No performance impact
- DeleteItem
- Delete an individual row
- Can perform a conditional delete
- DeleteTable
- Delete a whole table
- BatchWriteItem
- Up to 25 PutItem and / or DeleteItem in one call
- Up to 16 MB and up to 400 KB per item
- Lower latency by reducing API calls
- Batching is done in parallel
- Batching also supports exponential back-off retry
- GetItem
- Read based on Primary key
- Partition key
- Partition key + sort key
- Eventually consistent read by default (option to use strong consistent read)
- Can only filter data by Projection Expression
- Read based on Primary key
- BatchGetItem
- Up to 100 items
- Up to 16 MB
- Batching is done in parallel
Query
- Based on
- Partition key = <condition> (demand)
- Sort key [=, <, <=, >, >=, Between, Begin] <condition> (optional)
- Limit <number of items>
- Client side filtering by Filter Expression
- Returns
- Up to 1 MB of data or use pagination
Scan
- Scan entire table
- Returns up to 1 MB of data or use pagination
- Can reduce consumption of RCU by
- using limit
- pausing
- Filter data use Projection Expression or Filter Expression
- parallel scans
- Can scan multiple partitions at the same time
- Increases throughput but also RCU consumed
Indexes
- Another partition key + optional sort key
Local Secondary Index (LSI)
- Up to five LSI per table
- LSI must be defined at creation time
Global Secondary Index (GSL)
- Create a new ‘view’ by projecting attributes
- Index only (KEYS_ONLY)
- Specify extra attributes (INCLUDE)
- All attributes (ALL)
- Need define independent RCU / WCU
- Can add or modify GSI at any time
DAX (DynamoDB Accelerator)
- Cache for DynamoDB
- Writs also go through DAX
- 5 minutes TTL by default
- Up to 10 nodes in one cluster
- Multi-AZ
Streams
- Trigger other services when changes in DynamoDB
- AWS Lambda
- Kinesis Adapter
- Has 24 hours of data retention
- Can use batch (up to 1000 rows or 6 MB)
Kinesis Adapter
- In KCL library
- Directly consume from DynamoDB Streams
TTL (Time to Live)
- Automatically delete an item after an expiry datetime
- TTL is defined in a column (by setting expiry datetime)
- Expired item will be deleted within 48 hours
- GSI / LSI will also be deleted
- You can recover items using DynamoDB Streams
Other Features
- Point in time restore
- Global Tables (Multi region)
- Support DMS
- Can be launched locally for development
ElastiCache
- In-memory database (Redis or Memcached)
Redis
- In-memory key-value store
- Persistence (cache survive reboots)
Memcached
- In-memory object store
- Not persistence