
2026-05-25

Aurora Serverless v2 in Review: An Architectural Deep Dive

What Aurora Serverless v2 actually is under the hood: the shared Aurora storage layer, the ACU-driven compute layer, the Caspian heat-management substrate, scale-to-zero mechanics, and mixed-mode clusters.

Aurora Serverless v2 is often introduced as if it were a separate database product, distinct from provisioned Aurora. That framing leads engineers to wrong assumptions: that read replicas autoscale by count, that scaling events drop connections, that every scale-up warms a cold cache. None of those hold. A fourth assumption, that serverless implies a cold-start path, has been partially true since November 2024: scale-to-zero does introduce one.

A note on the name

In April 2026 AWS renamed “Aurora Serverless v2” to “Aurora serverless.” The AWS Database Blog carries the banner on its scale-to-zero launch post: “April, 2026: Aurora Serverless v2 has been renamed Aurora serverless. No action required.” The legacy v1 product still exists as a separate, deprecating offering. Most readers still search for “Aurora Serverless v2,” and the documentation URL slugs themselves continue to say v2 throughout, so the same term carries through below. Wherever “Serverless v2” appears, it refers to the product that AWS now calls simply “Aurora serverless.”

Architecturally, the rename clarifies what was always true: there is one Aurora, and “serverless” is a billing and operating mode layered on it. Aurora Serverless v2 is a thin scaling daemon and a billing wrapper around the same Aurora compute and storage that has powered provisioned clusters since 2014, with scale-to-zero as the one mechanism that meaningfully strains that framing.

The storage layer

Every conversation about Aurora performance, durability, or scaling that does not start with the storage layer ends up confused. The storage layer is what makes Aurora Aurora; the compute layer is, in many ways, comparatively simple. Aurora Serverless v2 inherits this layer untouched. The Serverless v2 documentation states the equivalence explicitly: “Aurora serverless storage has the same reliability and durability characteristics as described in Amazon Aurora storage. That’s because the storage for Aurora DB clusters works the same whether the compute capacity uses Aurora serverless or provisioned.”

Six copies across three availability zones

An Aurora cluster volume stores six copies of all data, spread across three availability zones, regardless of how many DB instances exist in the cluster. The replication factor is independent of compute count. A single-instance cluster gets the same six-way replication as a fifteen-instance cluster. The volume itself is logically a shared device addressable from all compute instances in the cluster; it auto-resizes from 10 GiB up to 128 TiB on demand, and up to 256 TiB on recent Aurora PostgreSQL (17.5+, 16.9+, 15.13+) and Aurora MySQL 3.10+ versions following the July 2025 ceiling lift.

This 6/3 layout enables the quorum semantics formalized in Verbitski et al.’s SIGMOD 2018 paper: writes require acknowledgement from four of the six copies, and reads require three of six. The asymmetry is deliberate. With two copies in each of three AZs, losing an entire AZ leaves four copies, which is still a write quorum, so writes continue. Losing an entire AZ plus one additional copy in another AZ leaves three copies, which is still a read quorum, so data remains readable and repairable, but writes stall until a fourth copy is restored. Storage is sharded into 10 GiB “protection groups,” each replicated six ways and repaired independently, which means recovery operations work at protection-group granularity rather than per-volume.
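The quorum arithmetic is small enough to check directly. The sketch below assumes two copies per AZ and uses illustrative function names; it models only the copy counts, not Aurora's actual membership protocol.

```python
# Quorum parameters as described in the text: 6 copies, 4 to write, 3 to read.
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT
WRITE_QUORUM = 4
READ_QUORUM = 3


def surviving_copies(failed_azs=0, extra_failed_copies=0):
    """Copies still reachable after losing whole AZs plus individual copies."""
    lost = failed_azs * COPIES_PER_AZ + extra_failed_copies
    return max(TOTAL_COPIES - lost, 0)


def can_write(copies):
    return copies >= WRITE_QUORUM


def can_read(copies):
    return copies >= READ_QUORUM


# Standard quorum rules: any two write sets overlap, and every read set
# overlaps every write set.
assert 2 * WRITE_QUORUM > TOTAL_COPIES
assert READ_QUORUM + WRITE_QUORUM > TOTAL_COPIES
```

Calling `surviving_copies(failed_azs=1)` gives four copies, enough for both quorums, while `surviving_copies(failed_azs=1, extra_failed_copies=1)` gives three, enough for reads only.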

The log is the database

The most consequential design choice in Aurora is that only redo log records cross the network from compute to storage, not full data pages. When a transaction commits on a compute instance, the instance sends log records to all six storage nodes for the affected protection groups. Storage nodes apply log records to materialize pages on demand; the compute instance is not on the page-write path. The SIGMOD 2017 Aurora paper introduces this with the slogan “the log is the database” and quantifies the reduction in network traffic relative to traditional replication, which sends pages.

Three consequences follow directly. First, compute can be elastic because state lives below it; a scaling event does not have to move pages, only refresh the compute tier’s view of recent log positions. Second, durability is decoupled from instance lifetime entirely; a compute instance can crash mid-transaction and recovery proceeds from the log on storage. Third, read replicas read from the same shared volume rather than from the writer over a replication channel; replica lag is propagation of buffer-pool invalidations through the cluster, not transactional replay.
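A toy model makes the "log is the database" idea concrete. Everything here is hypothetical (`RedoRecord`, `StorageNode`, the key/value page shape); it is a sketch of the principle that pages are derived from the log on demand, not a description of Aurora's storage format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RedoRecord:
    lsn: int       # log sequence number, increasing over time
    page_id: int   # page the change applies to
    key: str
    value: str


@dataclass
class StorageNode:
    """Toy storage node: stores only log records, never whole pages."""
    log: list = field(default_factory=list)

    def append(self, record):
        # Compute ships redo records; this is the only write path.
        self.log.append(record)

    def materialize(self, page_id, as_of_lsn):
        # Build the page on demand by replaying its records in LSN order.
        page = {}
        for rec in sorted(self.log, key=lambda r: r.lsn):
            if rec.page_id == page_id and rec.lsn <= as_of_lsn:
                page[rec.key] = rec.value
        return page
```

Because the page is a pure function of the log up to a given LSN, a compute instance in this model holds no state that storage cannot reconstruct, which is the property the three consequences above rely on.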

Why this matters for Serverless v2

All three properties are preconditions for Serverless v2’s behavior. In-place compute scaling is possible because storage is shared and durable below. Scale-to-zero is possible because durability survives the absence of any running compute instance. Mixed-mode clusters work because every reader and the writer all attach to the same cluster volume regardless of whether the instance is provisioned or serverless. The Serverless v2 docs are doing real work when they say the storage layer is unchanged: that single fact is what makes the scaling daemon a small addition rather than a new database engine.

The compute layer

The compute layer of Aurora Serverless v2 runs on the same Aurora compute infrastructure as provisioned, only with elastic capacity. The unit of capacity is the Aurora Capacity Unit, ACU. AWS defines 1 ACU as approximately 2 GiB of RAM with proportional CPU and networking, calibrated to be similar to a comparable Aurora provisioned instance. Capacity is configured as a range with a minimum and maximum, expressed in 0.5 ACU increments. The current range is 0 ACU to 256 ACU; the 256 ACU ceiling was announced on October 3, 2024, and the 0 ACU floor was added with the November 2024 scale-to-zero launch.
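The capacity rules above reduce to a few lines of arithmetic. The following is a minimal sketch using only the numbers stated in the text (0 to 256 ACU, 0.5 ACU steps, roughly 2 GiB per ACU); the function names are illustrative, and the service may enforce additional constraints not modelled here.

```python
GIB_PER_ACU = 2.0     # approximate, per the AWS definition quoted above
ACU_STEP = 0.5
ACU_FLOOR = 0.0
ACU_CEILING = 256.0


def validate_acu_range(min_acu, max_acu):
    """Check a (min, max) capacity range against the documented limits."""
    for name, value in (("min_acu", min_acu), ("max_acu", max_acu)):
        if not ACU_FLOOR <= value <= ACU_CEILING:
            raise ValueError(f"{name}={value} is outside 0..256 ACU")
        # Multiples of 0.5 are exactly representable as floats.
        if not float(value / ACU_STEP).is_integer():
            raise ValueError(f"{name}={value} is not a multiple of 0.5 ACU")
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return float(min_acu), float(max_acu)


def approx_memory_gib(acu):
    """Rough memory available at a given ACU value."""
    return acu * GIB_PER_ACU
```

For example, a range of 0.5 to 16 ACU spans roughly 1 GiB to 32 GiB of memory.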

The buffer pool (innodb_buffer_pool_size on MySQL, shared_buffers on PostgreSQL) scales with the current ACU value. As the instance scales up, the buffer pool grows; as it scales down, the buffer pool shrinks. This is unusual: most databases treat the buffer pool as a fixed allocation at boot, and resizing it is an offline operation. Aurora’s compute layer is engineered for in-place resizing of buffer pool memory. That single technical fact is what lets the scaling daemon be effective at all.

By contrast, max_connections is fixed at boot and derived from the maximum ACU value configured for the cluster. Scaling down does not reduce max_connections, so established connections are not dropped when the instance shrinks. This is the right default for a database compute layer. The implication: a cluster with a maximum of 4 ACU configures max_connections at the value appropriate for a 4 ACU instance, regardless of the capacity it is currently running at.

The Caspian substrate

The internal name for the heat-management substrate that schedules Aurora compute across physical hosts is Caspian. AWS docs do not use this name; the canonical public reference is Marc Brooker’s Resource Management in Aurora Serverless, where heat management at oversubscribed hosts is discussed in detail. Caspian’s job is to pack many Aurora compute instances onto shared hosts and move them between hosts when one instance’s demand grows faster than its neighbors can yield resources. From the cluster’s perspective this is invisible: a scaling event grows the ACU allocation, and either the current host has the capacity or Caspian relocates the instance to one that does.

The phrase to avoid is “this is a warm pool of pre-provisioned instances.” That was the Serverless v1 model and it is not how v2 works. There is no pool of warm spare instances waiting to be assigned. There is a fleet of Aurora compute hosts, each running many tenant instances, with a substrate that reallocates resources within and across those hosts as load changes. The reason scale-up looks fast is not that an instance is grabbed from a queue; it is that the existing instance is granted more memory and CPU on its current host, or moved to a host with headroom.

The scaling daemon

The scaling daemon is the component that observes load and adjusts ACU allocation. Its inputs are CPU, memory, and network utilization, which AWS docs collectively call “the load.” Its output is a target ACU value, bounded by the configured minimum and maximum. The triggers are not numerically published; the docs say scaling occurs when capacity is constrained on any of the three dimensions, or when Aurora detects performance issues it can resolve by scaling. Each writer and each reader instance scales independently.

Scaling is in-place. The docs do not promise zero connection drops in every case, but the typical scaling event does not involve failover, does not restart the engine, and does not drop established sessions. The buffer pool size adjusts as the ACU value changes. The engine does not unload and reload pages; the underlying memory allocation grows or shrinks, and the engine’s view of available cache space follows.

AWS does not publish a numeric scale-latency SLA. The docs use the word “rapidly” and describe scaling as occurring at second-granularity, taking incremental steps rather than a single jump. This is the right place to calibrate expectations: a 1 ACU to 32 ACU transition is not instantaneous because the daemon climbs in increments, even though each individual step completes quickly.

Aurora Auto Scaling: unsupported by design

The legacy Aurora Auto Scaling feature, which adds new reader replicas based on CPU utilization, is explicitly not supported on Serverless v2. The Requirements documentation states: “Aurora Auto Scaling isn’t supported. This type of scaling adds new readers to handle additional read-intensive workload, based on CPU usage. However, scaling based on CPU usage isn’t meaningful for Aurora serverless.”

This is the docs team confirming the architectural model in writing. The scaling unit in Serverless v2 is the instance, not the replica count. Each reader grows itself when it needs more capacity. Adding new readers is still a deliberate operator action, taken when read concurrency exceeds what a single instance can serve regardless of size, but it is not an autoscaling primitive. Engineers who expect a Serverless v2 cluster to auto-add readers under load are reasoning by analogy from the provisioned model, and the docs explicitly close that door.

The reason CPU-based replica-adding is “not meaningful” for Serverless v2 is that each Serverless v2 reader already responds to CPU pressure by growing its ACU allocation. Adding a new reader would compound the scaling response with a slower, coarser-grained signal. The right way to read this limitation is not as a missing feature but as a statement that the v2 architecture has internalized what v1 and provisioned Aurora handled with an external controller.

Scale-to-zero: the one architectural seam

Scale-to-zero launched in November 2024. Setting the minimum ACU to 0 enables an auto-pause behavior controlled by the SecondsUntilAutoPause parameter on the --serverless-v2-scaling-configuration flag, which is configurable from 300 seconds to 86,400 seconds. When the configured idle interval elapses with no qualifying activity, the instance pauses; on the next connection attempt, it resumes.
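As a sketch of what that configuration looks like from code, the helper below builds the scaling configuration and passes it to boto3's `modify_db_cluster`. It assumes a boto3 release recent enough to accept `SecondsUntilAutoPause`; the helper names and the cluster identifier passed in are hypothetical.

```python
AUTO_PAUSE_MIN_SECONDS = 300
AUTO_PAUSE_MAX_SECONDS = 86_400


def build_scaling_config(max_acu, seconds_until_auto_pause=300):
    """Scaling configuration with a 0 ACU floor, which enables auto-pause."""
    if not AUTO_PAUSE_MIN_SECONDS <= seconds_until_auto_pause <= AUTO_PAUSE_MAX_SECONDS:
        raise ValueError("SecondsUntilAutoPause must be between 300 and 86400")
    return {
        "MinCapacity": 0.0,
        "MaxCapacity": float(max_acu),
        "SecondsUntilAutoPause": int(seconds_until_auto_pause),
    }


def enable_auto_pause(cluster_id, max_acu, seconds_until_auto_pause=300):
    """Apply the configuration; needs AWS credentials and a real cluster."""
    import boto3  # imported lazily so the builder works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ServerlessV2ScalingConfiguration=build_scaling_config(
            max_acu, seconds_until_auto_pause
        ),
        ApplyImmediately=True,
    )
```

Keeping the validation in a pure function means the bounds can be checked in tests or CI without touching an AWS account.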

This is the one place where the “thin scaling daemon” framing strains. The original Aurora design assumes long-lived compute instances; pausing and resuming is not in that picture. AWS engineered around this with a deeper sleep state and a measured resume path, but the seam is visible from the application side as latency on the first request after a pause.

Engine version floor

Scale-to-zero requires recent engine versions: Aurora PostgreSQL at minimum 16.3, 15.7, 14.12, or 13.15; Aurora MySQL at minimum version 3.08.0. Older engine versions on the same parameter family do not get scale-to-zero. This is the kind of requirement that is easy to miss when evaluating Serverless v2 against an existing fleet running, for example, Aurora PostgreSQL 15.4: the cluster will run on Serverless v2, but the auto-pause feature will not engage.
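A fleet audit for this requirement can be a simple version comparison. The sketch below encodes the floors listed above; it assumes plain dotted Aurora version numbers such as "15.4" or "3.08.0" as input, and it treats PostgreSQL major versions newer than 16 as eligible, which is an assumption rather than something the list above states.

```python
# Minimum versions per PostgreSQL major, as listed in the text.
PG_MINIMUMS = {13: (13, 15), 14: (14, 12), 15: (15, 7), 16: (16, 3)}
MYSQL_MINIMUM = (3, 8, 0)


def parse_version(text):
    """Turn '3.08.0' into (3, 8, 0) for tuple comparison."""
    return tuple(int(part) for part in text.split("."))


def supports_auto_pause(engine, version):
    parsed = parse_version(version)
    if engine == "aurora-postgresql":
        major = parsed[0]
        if major > max(PG_MINIMUMS):
            return True  # assumption: newer majors launched with support
        floor = PG_MINIMUMS.get(major)
        return floor is not None and parsed >= floor
    if engine == "aurora-mysql":
        return parsed >= MYSQL_MINIMUM
    raise ValueError(f"unknown engine: {engine}")
```

With this check, the Aurora PostgreSQL 15.4 example from the paragraph above comes back as not eligible, while 15.7 does.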

Conditions that prevent auto-pause

The auto-pause documentation lists conditions that block a pause. The architecturally interesting ones include:

  • Any user-initiated connection open. Administrative health-check connections do not count.
  • Patching or upgrades in progress.
  • PostgreSQL logical replication or MySQL binlog replication enabled. The writer and any failover-tier-0 or tier-1 readers will not pause; only readers with failover priority 2 or higher can pause in this configuration.
  • An associated RDS Proxy in certain configurations.

The replication clause is worth highlighting. A cluster doing logical replication to another system, whether for analytics, change data capture, or cross-region propagation, cannot have its writer pause. This is correct: if the writer pauses, the replication stream stops. It also means that a common production configuration, where logical replication feeds a downstream data platform, is structurally incompatible with writer-side auto-pause.
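The blocking conditions can be expressed as a small predicate. This is an illustrative model only: `InstanceState` and its fields are hypothetical, and the RDS Proxy condition is left out because the text does not specify which configurations apply.

```python
from dataclasses import dataclass


@dataclass
class InstanceState:
    is_writer: bool
    failover_tier: int            # 0 is the highest failover priority
    user_connections: int         # excludes administrative health checks
    maintenance_in_progress: bool # patching or upgrade under way
    replication_enabled: bool     # logical replication or binlog replication


def can_auto_pause(state):
    """Return True if none of the modelled blocking conditions applies."""
    if state.user_connections > 0:
        return False
    if state.maintenance_in_progress:
        return False
    if state.replication_enabled and (state.is_writer or state.failover_tier < 2):
        # With replication on, only readers at tier 2 or higher may pause.
        return False
    return True
```

The replication branch is the one that rules out writer-side auto-pause for clusters feeding a downstream data platform.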

Resume mechanics

Typical resume latency from a recent pause is approximately 15 seconds. After a pause longer than 24 hours, Aurora moves the instance into a deeper sleep state, and resume latency rises to roughly 30 seconds or more. The documentation recommends setting client connectTimeout to at least 15 seconds, or at least 30 seconds for clusters that may be in long-paused states. This is the most visible application-side effect of scale-to-zero, and the one that most directly informs whether the feature fits a workload: a synchronous request path with no tolerance for a 15-second cold-start cannot use scale-to-zero on its primary database.
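On the client side, the resume window is usually handled with a generous connect timeout plus a bounded retry. The sketch below is driver-agnostic: `connect` is a hypothetical callable that takes a timeout in seconds and returns a connection, and the exception types to retry on are an assumption that should be replaced with whatever the actual driver raises.

```python
import time


def connect_with_resume_retry(
    connect,
    attempts=3,
    connect_timeout=30,      # covers the long-paused case described above
    backoff_seconds=2.0,
    retry_on=(OSError,),     # TimeoutError and ConnectionError are subclasses
    sleep=time.sleep,
):
    """Call connect(timeout), retrying while the instance may be resuming."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(connect_timeout)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Linear backoff between attempts; none after the last one.
                sleep(backoff_seconds * (attempt + 1))
    raise last_error
```

The `sleep` parameter exists so the backoff can be stubbed out in tests; in production the default `time.sleep` applies.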

Resume starts cold

The auto-pause documentation contains a load-bearing sentence: “a relatively small capacity and scales up from there. This starting capacity applies even if the DB instance had some higher capacity immediately before it was automatically paused.” In other words, resume does not restore the pre-pause capacity. It restores at a low ACU value and lets the scaling daemon climb back.

This sentence rewrites the buffer-pool warmth story. During normal in-place up-and-down scaling, the buffer pool changes size with the ACU allocation, but the entries already cached at the smaller size are retained where possible: scale-up adds memory, it does not clear cache. After auto-pause, the instance starts cold: the buffer pool is empty, the plan cache is cold, and the first queries pay full page-fetch and parse costs. For workloads with large hot working sets, the practical cost of scale-to-zero is therefore not the 15-second resume but the minutes of degraded latency while the cache rewarms.

This is the architectural seam. Scale-to-zero is fine for low-traffic clusters where the cache cost is small, for development clusters where latency does not matter, and for read replicas that can absorb a slow first query. It is not a free toggle on a production primary serving latency-sensitive traffic.

Mixed-mode clusters

A single Aurora cluster can mix Serverless v2 instances and provisioned instances in any combination. Because the storage layer is shared and the compute layer attaches to the same cluster volume regardless of mode, switching individual instances between modes is an instance-level operation, not a cluster-level one.

Two shapes recur in production:

The first is a provisioned writer with Serverless v2 readers. This fits workloads with steady, predictable write rates and variable read loads, or write workloads at scale beyond what the Serverless v2 ceiling currently absorbs. The provisioned writer gives stable, predictable write throughput; the Serverless v2 readers absorb the variability of read fan-out without operator intervention.

The second is a Serverless v2 writer with provisioned readers. This fits workloads with bursty writes and a steady, well-understood read profile. The Serverless v2 writer scales with the write burst; the provisioned readers provide stable, predictable cost and capacity for the read side. These two shapes are the explicit recommendations on the auto-pause page, presented as the canonical layouts for development/test systems and high-availability production systems respectively.
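In API terms, the mode of an instance is just its instance class: Serverless v2 instances use the class `db.serverless`, and provisioned instances use a sized class. The sketch below builds keyword arguments suitable for boto3's `create_db_instance` for the first shape; the identifiers, the `db.r6g.large` class, and the promotion tiers are placeholder choices for illustration, not recommendations.

```python
def instance_spec(cluster_id, name, serverless,
                  provisioned_class="db.r6g.large", promotion_tier=1):
    """Keyword arguments for one instance in a mixed-mode cluster."""
    return {
        "DBClusterIdentifier": cluster_id,
        "DBInstanceIdentifier": name,
        "Engine": "aurora-postgresql",
        "DBInstanceClass": "db.serverless" if serverless else provisioned_class,
        "PromotionTier": promotion_tier,
    }


def provisioned_writer_serverless_readers(cluster_id, reader_count):
    """First shape from the text: one provisioned instance, N serverless readers."""
    specs = [instance_spec(cluster_id, f"{cluster_id}-writer",
                           serverless=False, promotion_tier=0)]
    for i in range(reader_count):
        specs.append(instance_spec(cluster_id, f"{cluster_id}-reader-{i + 1}",
                                   serverless=True, promotion_tier=2))
    return specs


# Applying the specs would look like this (requires AWS credentials):
#   rds = boto3.client("rds")
#   for spec in provisioned_writer_serverless_readers("demo", 2):
#       rds.create_db_instance(**spec)
```

The second shape is the same construction with the `serverless` flags swapped.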

A practical note on upgrade paths: some Aurora engine version upgrades from older minor versions require starting with a provisioned writer and converting to Serverless v2 afterward. This is documented as part of the version-by-version upgrade prerequisites, and the right place to check is the requirements page rather than the Serverless v2 introduction. Teams planning a major-version upgrade alongside a Serverless v2 conversion should not assume both can happen in one step.

The layer stack at a glance

The diagram below captures the four layers that compose Aurora Serverless v2, listed from the foundation upward; each layer depends on the one before it.

  • Storage layer: 6 copies, 3 AZs, log-as-database, shared volume
  • Aurora compute: ACU-sized instances on Caspian-managed hosts
  • Scaling daemon: CPU + memory + network -> target ACU, in place
  • Billing surface: ACU-seconds, including auto-pause window

The diagram is intentionally read bottom-up in dependency terms. The storage layer is the foundation that the rest builds on. The compute layer is the Aurora engine running at some ACU allocation on a Caspian-managed host. The scaling daemon is a controller that observes load and adjusts the ACU allocation. The billing surface is what the customer sees on the invoice. Each layer is thin compared to the storage layer beneath it.

Architectural limits worth naming

Several Serverless v2 limitations are best understood as expressions of the underlying architecture rather than as arbitrary gaps. The Requirements documentation names them.

Database Activity Streams (DAS) is not supported on Serverless v2. DAS streams database activity to Kinesis for compliance and audit use cases on provisioned Aurora. It is not available on the serverless compute layer. Teams with compliance requirements that mandate DAS cannot run those workloads on Serverless v2 today.

Cluster cache management for Aurora PostgreSQL is disabled. The apg_ccm_enabled parameter, which speeds recovery after a failover by keeping a designated reader’s buffer cache warm with the contents of the writer’s cache, has no effect on Serverless v2 instances. The feature exists for fixed-size provisioned readers where the cache contents are stable; on Serverless v2 the cache contents change with the ACU value, and pre-warming is not meaningful.

Aurora Auto Scaling, as discussed above, is not supported and not meaningful. The reader count is operator-managed; the per-reader capacity is automatic.

Regional availability is not uniform. The supported regions and engines page is the authoritative reference; recent additions (notably AWS GovCloud and some newer commercial regions) lag the announcement date of features by months. Teams operating in less-common regions should consult the regions matrix directly before assuming scale-to-zero or the 256 ACU ceiling is available locally.

Engine version requirements continue to evolve. Aurora MySQL 8.0 is supported with parameter family aurora-mysql8.0. Aurora PostgreSQL 13 through 17 are supported with families aurora-postgresql13 through aurora-postgresql17. PostgreSQL 17 support on Serverless v2 is the current state and is more recent than many older write-ups suggest; this is a fast-moving fact and worth verifying against the regions page rather than secondary sources.

Closing

Aurora Serverless v2 is best understood as a scaling daemon and a billing wrapper over the same Aurora compute and storage that runs provisioned clusters. The storage layer is unchanged. The compute layer is the Aurora engine at an elastic ACU allocation, sitting on Caspian-managed hosts. The scaling daemon watches CPU, memory, and network on each writer and each reader independently, and resizes in place without failover. Mixed-mode clusters work because every instance, serverless or provisioned, attaches to the same shared volume. The architectural seam is scale-to-zero: it works as advertised, but resume starts cold both at the ACU level and at the cache level. That combination should drive the enable-or-not decision on any given cluster.

The recommendation that follows from this model is to choose Serverless v2 when the workload is variable enough that the operational savings of automatic right-sizing exceed the cost of running on shared, oversubscribed compute. Reach for provisioned Aurora when the workload is steady, latency-sensitive at the cache level, or requires features like DAS that the serverless layer does not yet support. Scale-to-zero is the lever to reach for on low-traffic and development clusters, not on production primaries serving synchronous user traffic.
