AWS Kappa Architecture: A Deep Dive

Most of the Kappa architecture discussion focuses on using Apache tools (e.g. Kafka and Samza), but since my specialty is AWS serverless, I wanted to see how it might be built using the tools I’m familiar with.

I’m relatively new to the also relatively new “Kappa” architecture–first described by Jay Kreps in 2014 and then in Martin Kleppmann’s fascinating talk (highly recommended if you’re new as well)–but the summary of it in my head is: an append-only log is the canonical store, and you build “materialized views” that are read by whatever consumes them.

Putting that into a diagram might look like this:

One moment, rendering graph...

graph LR
    write(Data Writes) --> log
    log(Append-Only Log) --> mv1(View 1)
    log --> mv2(View 2)
    mv1 --> Consumer1
    mv1 --> Consumer2
    mv2 --> Consumer1
    mv2 --> Consumer2

Append-Only Log #

The first thing we need is an append-only database that can trigger changes in “views”, which we’ll think about next.

Since this is the core piece of the kappa architecture, we want something that can scale really fast and really big, that has mult-zone failover, with automatic backups, and the ability to read the whole database easily when creating new materialized views.

To me, this absolutely screams for DynamoDB + Streams, especially with the (sort of recent) ability to export a point-in-time table to S3:

One moment, rendering graph...

graph LR
    Writes --> db[DynamoDB as the log] --> |Stream Trigger| Processor

The writes can come from about anywhere, and of course a Lambda can be triggered from SQS, SNS, etc. and write to DynamoDB.

You can even write from an API Gateway directly to DynamoDB using VTL, saving the cost of a Lambda execution:

One moment, rendering graph...

graph LR
    w(API Gateway) --> v(VTL transform) --> db(DynamoDB)

Different business requirements will influence your decision, but for the example architecture in this article, we’ll write to SQS to trigger a Lambda. One advantage here is that you can configure the SQS-triggered-Lambda as the only service with write permissions, setting clear write boundaries:

One moment, rendering graph...

graph LR
    write(SQS:SendMessage) --> lambda
    lambda(Lambda) --> db
    db(DynamoDB as log)

You can have that Lambda do batch processing, to make it faster and cheaper, and you have the option of a strict FIFO (first-in first-out) queue, if needed. I’ve used this SQS->Lambda design in other architectures, and I know it can handle a very substantial write load.

Creating Views #

With the help of DynamoDB Streams, we can now register a processor on the stream to construct a materialized view. In AWS serverless architecture, that’s a simple Lambda integration:

One moment, rendering graph...

graph LR
    db(DynamoDB) --> |Stream| l(Lambda)

Depending on how many views you have, and how complex they are, you could create them in the single Lambda.

One moment, rendering graph...

graph LR
    db(DynamoDB) --> |Stream| l(Lambda) --> d(Views DB)

Alternately, you can use a fan-out design to trigger additional Lambda executions from the first Lambda. There are several options here, including Kinesis (for fat pipes), SQS (great if you need lots of control), SNS (cheapest per-message), and more.

SNS messages are really cheap and fast, and since multiple Lambdas can subscribe to the same SNS topic, one approach that decouples the processor from the fan-out uses SNS to publish a message to a single topic.

One moment, rendering graph...

graph LR
    db(DynamoDB) --> |Stream| l(Lambda) --> sns(SNS Topic)
    sns --> Lambda1 --> View1
    sns --> Lambda2 --> View2
    sns --> l2(Lambda...n) --> v(View...n)

If you need more control, you might have the Stream-triggered Lambda publish to multiple SNS topics.

One moment, rendering graph...

graph LR
    db(DynamoDB) --> |Stream| l(Lambda)
    l --> sns1(Topic 1)
    l --> sns2(Topic 2)
    sns1 --> Lambda1 --> View1
    sns2 --> Lambda2 --> View2

As an example, suppose one view only cared about “player” record types, and the second view only cared about “game” record types. If 90% of your log stream events were for “game” record types, using a single SNS topic would mean the Lambda building the “player” view would execute 90% of the time for no reason, which could be expensive.

Storing Views #

My understanding of Kappa architecture is that each “view” holds its own state, so for example the log might hold “players” and “games” together, but the materialized view for “players” would only hold that type, and likely would run through some transformations as well.

If you store that view in something like Redis, without persistence, it would mean that you would need a long-running service and, any time that service restarted, you would need to do a full table scan of the log.

My database of choice is DynamoDB, so that’s where I would store the view. In theory, and following best practices, we could use the same table that the append-only log is in, but that would mean balancing permissions and potential circular view triggers since the log table is configured to use Streams.

Instead, I recommend using a new DynamoDB table for storing views. It would be easy enough to create a new table for each view, which in turn would let you build unique indexes for each view. However, single-table design has many benefits, so I recommend pursuing that route, probably each view having its own primary key.

One moment, rendering graph...

graph LR
    db1(DynamoDB log) --> |Stream| l(Lambda)
    l --> sns1(Topic 1)
    l --> sns2(Topic 2)
    l --> snsn(Topic...n)
    sns1 --> Lambda1 --> db2
    sns2 --> Lambda2 --> db2
    snsn --> ln(Lambda...n) --> db2
    db2(DynamoDB views)

From there, your application can read from the DynamoDB “views” table using well designed queries.

If you want true reactive architecture, you could also set up a separate Streams on the “views” table that would trigger a Lambda to publish changes to subscribers. For example, with the “game” material view table, when a game updates you might want to push the updates over WebSockets to a mobile app.

One moment, rendering graph...

graph LR
    db1(DynamoDB log) --> |Stream| l(Lambda)
    l --> sns1(Topic 1)
    l --> sns2(Topic 2)
    l --> snsn(Topic...n)
    sns1 --> Lambda1 --> db2
    sns2 --> Lambda2 --> db2
    snsn --> ln(Lambda...n) --> db2
    db2(DynamoDB views) --> |Stream| l2(Lambda)
    l2 --> |WebSocket| s(Subscribers)

One reason to deviate from a single-table design on the “views” table is if you have some data types that receive frequent updates, but for which you do not need to execute a Lambda when the update happens. Since you can only enable DynamoDB Streams per table, a Lambda will execute for every change on the table, and this might be wasteful.

You can likely get by with only two view tables, one with Streams enabled and one without:

One moment, rendering graph...

graph LR
    db1(DynamoDB log) --> |Stream| l(Lambda)
    l --> sns1(Topic 1)
    l --> sns2(Topic 2)
    l --> sns3(Topic 3)
    sns1 --> Lambda1 --> db2
    sns2 --> Lambda2 --> db2
    sns3 --> Lambda3 --> db3
    db2(DynamoDB views 1) --> |Stream| l2(Lambda)
    db3(DynamoDB views 2)
    l2 --> |WebSocket| s(Subscribers)

Further optimization is possible by creating additional tables based on frequency of updates. You’d need to balance that optimization against the reasons why single-table design is considered best practice.

Quick Review #

With the architecture design above, using the SQS queue as the initial input to the append-only log, the full flow is:

Write to the SQS queue, which
Causes a Lambda execution (SQS -> Lambda integration), which
Writes to the append-only DynamoDB Streams database, which
Causes a Lambda execution (Streams -> Lambda integration), which
Publishes to n SNS topics, which
Causes n Lambda executions (SNS -> Lambda integration), which
Writes to m DynamoDB tables, k which have Streams, and of those
Causes k Lambda executions (Streams -> Lambda integration)

One moment, rendering graph...

graph
    write(SQS:SendMessage) --> l1
    l1(Lambda) --> db1
    db1(DynamoDB +Streams as append-only log) --> l2
    l2(Lambda) --> sns
    sns(n SNS topics) --> l3
    l3(Lambda) --> dbn
    dbn(m DynamoDB views tables, some +Streams) --> l4
    l4(Lambda)

Cost Modeling #

In any good architecture design, it’s a good idea to model what the costs would be. This design is entirely triggered by SQS writes, so we’ll frame the cost in terms of dollars per write.

To make this model clearer, we’ll place some final design decisions on how we split SNS topics and view tables. For this model, we’ll say we have 2 views tables, one with Streams enabled and one without. We have 5 SNS topics, 2 of which write to the view table with Streams, and the other 3 write to the other table.

SQS Costs #

We’ll make these assumptions about the SQS usage:

We don’t need to care about order, so it’s not a FIFO queue.
Writes are not likely to be batched.
Since reads use the Lambda integration, they will be batched to 10 messages per read.
Message size is small enough that we can always read all 10 (maximum total payload of 256 KB).
Fewer than 100 billion requests per month.

For every 10 SQS writes we’ll have one read, so with SQS costs at 0.4$ per million requests we have 0.4 + (0.4/10) or 0.44$ per million writes.

Lambda Costs #

We have two kinds of Lambda executions: 1) acting as a message forwarder, e.g. SQS to DynamoDB, and 2) the materialized view creator.

We’ll make these assumptions about the first kind:

The application code is light enough that we can use the 128MB instance.
The execution time should be in the sub-second range, we’ll estimate 500ms.
For the DynamoDB Streams integration, we can use batching at 10 records per execution.
When a record is written to the append-only table, it causes 2 SNS messages to be published, one that writes to the view table with Streams enabled, and one that writes to the view table with Streams not enabled.

The materialized view creator has the business logic, so it may need to query other data sources or external services to compose the view. Ideally this would not be the case, but it frequently happens, so we’ll make these assumptions about the second kind:

The application code requires the 512MB instance.
The execution time on average is 1500ms.

Now we can list the Lambda executions relative to the SQS writes:

Write to append-only log: 1 for every 10 SQS writes
DynamoDB Streams to SNS: 1 for every 10 SQS writes (each SQS write is a DynamoDB write)
SNS topic to DynamoDB (the bigger Lambda): n for every 1 SQS write (each log entry causes an SNS message)
DynamoDB Streams to WebSocket: k times n for every 1 SQS write

The total executions would be 1 + 1 + k*n for every 1 SQS write for the smaller Lambda, and n for every 1 SQS write for the bigger Lambda.

For our example, since a write causes 2 SNS messages, one of which writes to the view table with Streams, n is 2 and k is 1, so:

Cheap Lambda: 1 + 1 + 1 * 1 is 3 executions, so 3 * 500ms or 1500ms of execution time, so 1500 * 0.0000000021$ * 1,000,000 writes or 3.15$ per million SQS writes.
Expensive Lambda: 2 * 1500ms or 3000ms of execution time, so 3000 * 0.0000000083 * 1,000,000 writes or 24.9$ per million SQS writes.

Additionally, Lambdas cost 0.2$ per million requests. For every 1 SQS write, we have 3 + 2 so 1$ per million SQS writes.

Total cost: 3.15 + 24.9 + 1 or 29.05$ per million SQS writes.

SNS messages here are only used to trigger a Lambda execution, and the pricing page says:

No charge for deliveries to Lambda. Standard Lambda pricing applies. Data transfer charges apply between Amazon SNS and Lambda.

Since we only need to estimate data transfer charges, for our model we’ll assume that each SQS write is about 10KB, which is 10 GB per million SQS writes.

Data transfer costs start to become cheaper the more data you transfer, but for our cost model we’ll pick the most expensive, at 0.09$ per GB, which means 0.9$ per million SQS writes.

DynamoDB Costs #

Since we’re designing this as a reactive serverless architecture, we’ll pick on-demand pricing mode, but note that if your traffic is predictable, provisioned capacity can be much cheaper.

In this design, we have three tables, two of which have Streams enabled. In our example, every SQS message ends up causing a write to all three tables. In practice this is very unlikely: an SQS message will cause a write to the append-only log table, and typically to only one other view table.

From the pricing page:

DynamoDB charges one write request unit for each write (up to 1 KB)

Earlier we assumed 10KB per SQS write, so each write would be 10 RCU, which is currently priced at 1.25$ per million RCU, putting DynamoDB write costs at 10.25$ per million SQS writes.

Storage is free for up to 25 GB, but then costs 0.25$ per GB. At 10KB per SQS write, data storage is 10 GB per million SQS writes, or 1.25$ per million SQS writes.

We’re dividing the data between the two data tables, so our cost overall would be 1 + 0.5 + 0.5 or 2x those total costs.

The cost of DynamoDB Streams is based on reads, and since we can batch them up to 10 that’s one Streams RCU per ten:

Each GetRecords API call is billed as a streams read request unit and returns up to 1 MB of data from DynamoDB Streams.

Since half the messages are written to the view table with Streams, we have 1.5x Streams RCUs per SQS write. The first 2.5 million Streams RCUs per month are free, and after that are 0.2$ per million, which gives us 1.5 * 0.2 or 0.3$ per million SQS writes.

Total price then is 10.25*2 + 1.25*2 + 0.3 or 23.3$ per million SQS writes.

Total Costs #

Per million 10KB SQS writes, we have:

SQS: 0.44$
Lambda: 29.05$
SNS: 0.9$
DynamoDB: 23.3$

For a total cost of 53.69$ per million 10KB SQS writes.

Some things to note:

If your SQS writes are significantly smaller, you’ll likely fall into the free tier for some things, and overall costs will also be significantly smaller.
If you have fewer than about half a million SQS writes per month, you’ll fall into the free tier for some things. (Fan-out causes duplication so it’s about half the free tier, which is about a million for the services that matter in our architecture.)

Note that this does not cover the read cost, e.g. users reading from the materialized views. One of the strengths of materialized views is that they can be highly optimized for request type, so you should make sure to take that into account when estimating read costs.

Okay, but… Why? #

Compare the above architecture with a simple HTTP REST design:

One moment, rendering graph...

graph LR
    r(HTTP Request) --> api(API Gateway)
    api --> l(Lambda)
    l --> db(DynamoDB)

Versus:

One moment, rendering graph...

graph LR
    db1(DynamoDB log) --> |Stream| l(Lambda)
    l --> sns1(Topic 1)
    l --> sns2(Topic 2)
    l --> snsn(Topic...n)
    sns1 --> Lambda1 --> db2
    sns2 --> Lambda2 --> db2
    snsn --> ln(Lambda...n) --> db2
    db2(DynamoDB views) --> |Stream| l2(Lambda)
    l2 --> |WebSocket| s(Subscribers)

The question is: why add all that complexity?

The simple answer is: if your business use case needs it.

When I consider the different architectures, a couple things are easily noteable:

1: The simple HTTP design lacks WebSocket support, so already you’ll need to add components to match functionality:

One moment, rendering graph...

graph LR
    r(HTTP Request) --> api(API Gateway)
    api --> l(Lambda)
    l --> db(DynamoDB)
    l --> |WebSocket| s(Subscribers)

2: For AWS API Gateway, Lambda timeout is 30 seconds as a hard limit, so any long running process will need some queue anyway:

One moment, rendering graph...

graph LR
    r(HTTP Request) --> api(API Gateway)
    api --> l(Lambda)
    l --> db(DynamoDB)
    l --> |WebSocket| s(Subscribers)
    r2(Long Running Request) --> api2(API Gateway)
    api2(Lambda) --> sqs(SQS)
    sqs --> l2(Long Running Lambda)
    l2 --> q(Then...?)

I’ve personally designed and built those sorts of architecture many times, and to be honest it can work fine. There are a handful of gotchas and foot-guns, but it’s totally reasonable for many use cases.

However… the Kappa architecture does present some compelling values (covered in more detail in Martin Kleppmann’s talk which you should totally go watch):

Every write is to an append-only log stream, so you can go back and look at how data changed over time.
All writes are already asynchronous and long/short running requests are all the same, simplifying some design issues.
The write table can be optimized for writing, and the view tables can be optimized for reading, instead of tradeoffs to make a single table suitable for both reading and writing.
It’s very easy to add new materialized views, or in other words it’s easy to add new reads that are highly optimized, without needing to rethink the database.
The flows of data seem like they would be much easier to follow and debug, when things inevitable go wrong.

Final Thoughts #

I didn’t expect this to be such a deep dive into architecture design, but I hope it was helpful to you.

To be clear, I haven’t built anything with this full Kappa-esque architecture, this was just an exercise in designing a system that followed the Kappa architecture principals as I understand them.

If you found this to be useful, or if you have comments or corrections on the design or cost model methodology, be sure to let me know!

Tobias Davis