Using Cloudflare Durable Objects

2025-04-03

In the previous post I explored using Cloudflare KV for managing blog content. While I didn’t think the eventually consistent model would be a problem, it turned out that list and get were not consistent with each other, which led to errors for a full minute after new content was published. It did feel familiar: I mentioned in that post that I had seen framework adapters for Cloudflare Pages use KV for storing content, and I remember similar buggy behaviour following each deploy.

Let’s explore using Durable Objects instead…


Durable Objects vs D1

I tried implementing this in both D1 and Durable Objects. D1 has the following advantages:

  • Point in time rollback
  • Web UI for viewing content
  • Remote access to the DB via HTTP (for things like Drizzle Kit/Studio)

Each of these is nice to have, but none is impossible to work around. We lose all of that using the (beta) SQLite in Durable Objects. But we gain:

  • No manual steps required to deploy a new DO - just add a couple of lines to your wrangler config (sketched below)
  • Sharding for multi-tenant architecture is built-in
  • JS is guaranteed to execute alongside the database

Each of these is a huge win in my book. I’ll touch on each of them below.
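
For the first point, the “couple of lines” is roughly the following - a wrangler.toml sketch, with the binding and class names taken from the example below:

# wrangler.toml
[[durable_objects.bindings]]
name = "DurableDatabase"
class_name = "DurableDatabase"

# Declares the class as SQLite-backed, enabling the embedded database
[[migrations]]
tag = "v1"
new_sqlite_classes = ["DurableDatabase"]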

Leakproof Abstraction

A Durable Object looks something like this:

/// <reference types="@cloudflare/workers-types" />
import { DurableObject } from "cloudflare:workers";
import { drizzle, DrizzleSqliteDODatabase } from "drizzle-orm/durable-sqlite";
import { desc, eq } from "drizzle-orm";
import { posts } from "./schema";
import { migrate } from "./migrations";

// Helper for get(): expect exactly one row back
const takeUniqueOrThrow = <T>(values: T[]): T => {
  if (values.length !== 1) throw new Error("Found non-unique or missing value");
  return values[0];
};

export default class DurableDatabase extends DurableObject {
  storage: DurableObjectStorage;
  db: DrizzleSqliteDODatabase;

  static getDefault(env: Env) {
    const id = env.DurableDatabase.idFromName("default");
    const stub = env.DurableDatabase.get(id);
    return stub;
  }

  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    this.storage = ctx.storage;
    this.db = drizzle(this.storage, { logger: false });
    ctx.blockConcurrencyWhile(async () => {
      await migrate(ctx.storage.sql);
    });
  }

  async insert(post: typeof posts.$inferInsert) {
    await this.db.insert(posts).values(post);
  }

  async update(post: typeof posts.$inferInsert) {
    await this.db
      .update(posts)
      .set(post)
      .where(eq(posts.slug, post.slug))
      .execute();
  }

  async list(showAll: boolean = false) {
    if (showAll) {
      return this.db.select().from(posts).orderBy(desc(posts.date)).all();
    } else {
      return this.db
        .select()
        .from(posts)
        .where(eq(posts.status, "published"))
        .orderBy(desc(posts.date))
        .all();
    }
  }

  async get(slug: string) {
    const post = this.db
      .select()
      .from(posts)
      .where(eq(posts.slug, slug))
      .then(takeUniqueOrThrow);
    return post;
  }
}

All instance methods of the class are guaranteed to run alongside the SQLite database:

When a Durable Object uses SQLite, SQLite is invoked as a library. This means the database code runs not just on the same machine as the DO, not just in the same process, but in the very same thread. Latency is effectively zero, because there is no communication barrier between the application and SQLite. A query can complete in microseconds. - link.

There is just no way to write code that makes a high-latency query to the database. This gives a leakproof abstraction. All database access code lives inside the class. Whether you use Drizzle ORM like I did, or write raw SQL, nothing outside of the DO needs to know.
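
Calling into the class from a Worker is just as contained. Here’s a minimal sketch, assuming the DurableDatabase binding from the wrangler config above, a generated Env type, and that the class lives in ./durable-database:

import DurableDatabase from "./durable-database";

// The DO class must also be exported from the Worker entrypoint
export { DurableDatabase };

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stub = DurableDatabase.getDefault(env);
    const published = await stub.list(); // a plain RPC call into the DO
    return Response.json(published);
  },
} satisfies ExportedHandler<Env>;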

In progress - more to come soon

Managing content in Cloudflare KV

2025-03-27

The easiest way of managing content for a developer blog is probably just Markdown files living in the repo. Frameworks like Astro come with support for this built-in, but it’s trivial to do in any framework using Vite for the build process, with the built-in glob import:

const posts = import.meta.glob('./posts/*.md');

Which generates something that you can iterate over:

{
  './posts/post1.md': () => import('./posts/post1.md'),
  './posts/post2.md': () => import('./posts/post2.md'),
  // ...
}
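
For an index page you iterate those entries. A quick sketch - the exact module shape depends on which Markdown plugin you use:

const posts = import.meta.glob("./posts/*.md");

for (const [path, loader] of Object.entries(posts)) {
  const mod = await loader(); // each value is a lazy dynamic import
  console.log(path, mod);
}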

I’m exploring an approach on Cloudflare, and trying to avoid any build step (except for wrangler’s built-in esbuild). It’s easy to include Markdown files using rules and then import them directly in your worker (see the sketch below), but it’s not easy to list all the files for the index.
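
For reference, the rules approach looks roughly like this in wrangler.toml, after which every matching file becomes importable as a string:

# wrangler.toml - treat Markdown files as importable text modules
rules = [
  { type = "Text", globs = ["**/*.md"], fallthrough = true }
]

# In the worker: import post from "./posts/post1.md" gives the raw Markdown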

So instead of storing Markdown in the repo, I’m experimenting with using Cloudflare KV. I’m pretty sure many of the framework adapters for Workers used KV to store content before Cloudflare Pages and then Workers Assets came along, so it seems like a pretty standard option for that kind of thing.


Of course, since I can no longer just edit Markdown files locally, I’ll need to build a simple admin site to edit them. This is where something like Django admin shines, but LLMs make it easy to generate that sort of thing, and they will only get better. I’ll work on building this manually, so that I can use it as a reference for AI tools in the future.

KV API

Here are our types:

type Post = {
  slug: string;
  content: string;
  metadata: PostMetadata;
};

type PostMetadata = {
  title: string;
  status: "draft" | "unlisted" | "published";
  date: string;
};

For creating and updating, we create two different functions so that we don’t accidentally overwrite a previous post by reusing a slug:

  async addPost(post: Post): Promise<Post> {
    if (await this.kv.get(post.slug)) {
      throw new Error("Post slug already exists");
    }
    await this.kv.put(post.slug, post.content, { metadata: post.metadata });
    return post;
  }

  async updatePost(post: Post): Promise<Post> {
    if (await this.kv.get(post.slug)) {
      await this.kv.put(post.slug, post.content, { metadata: post.metadata });
      return post;
    } else {
      throw new Error("Post not found");
    }
  }

Then we just need a way of getting and listing posts:

  async getPost(slug: string): Promise<Post> {
    const kvResult = await this.kv.getWithMetadata(slug);
    // getWithMetadata never returns null itself; a missing key shows up as a null value
    if (!kvResult.value) {
      throw new Error("Post content not found");
    }
    const metadata = kvResult.metadata as PostMetadata | null;
    if (!metadata) {
      throw new Error("Post metadata not found");
    }

    const post = {
      slug: slug,
      metadata: metadata,
      content: kvResult.value,
    };
    return post;
  }

  async listPosts(): Promise<{ slug: string; metadata: PostMetadata }[]> {
    const { keys } = await this.kv.list();
    const posts = keys.flatMap((key) => {
      const metadata = key.metadata as PostMetadata | null;
      if (!metadata) {
        console.error(`No metadata found for ${key.name}`);
        return [];
      }
      return [
        {
          slug: key.name,
          metadata: metadata,
        },
      ];
    });
    // Sort by date, newest first
    posts.sort((a, b) => b.metadata.date.localeCompare(a.metadata.date));
    return posts;
  }

That should be all we need until we want to introduce tagging and searching.

Eventual Consistency

One gotcha with KV is the eventually consistent model. Will that cause problems? Let’s look at some concrete cases:

put("key1", "new content")

Followed by calling get several times might result in:

get("key1")
    -> "old content"
get("key1")
    -> "old content"
get("key1")
    -> "new content"

So it might take a few seconds (or up to 60) for everyone to see the new content. Not a big deal. But what about this:

list()
    -> ["key1"]

put("key2", "more new content")

list()
    -> ["key1", "key2"]

get("key2")
    -> Error("Post content not found")
get("key2")
    -> Error("Post content not found")
get("key2")
    -> "more new content"

Perhaps unexpectedly, list and get return inconsistent results. It’s easy to imagine this causing a bug where you click on a link to view the post, but it errors out. But is that theoretical and rare, or pretty common in practice?

Well, I tried it, and it happens every time: calls to the list API return updated data long before calls to the get API. This means that when a post is published to the blog, it appears on the home page list almost immediately, but for a full minute the link returns a 404 page. A possible solution would be to enforce an order and a delay between the different statuses:

  status: "draft" | "unlisted" | "published";

If we change from “draft” to “unlisted”, the page becomes available via direct link but is not listed on the index. We then bump it to “published” a minute later. This could be automated via Workflows (sketched below).
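
Here’s a sketch of how that Workflow might look. The PublishWorkflow class, its params, and the setStatus helper are all hypothetical:

import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";

type PublishParams = { slug: string };

// Hypothetical helper that rewrites the post's status metadata in KV
declare function setStatus(
  env: Env,
  slug: string,
  status: "unlisted" | "published"
): Promise<void>;

export class PublishWorkflow extends WorkflowEntrypoint<Env, PublishParams> {
  async run(event: WorkflowEvent<PublishParams>, step: WorkflowStep) {
    // Make the post reachable by direct link first
    await step.do("set unlisted", async () => {
      await setStatus(this.env, event.payload.slug, "unlisted");
    });

    // Give get() a minute to catch up before the post shows up in list()
    await step.sleep("wait for propagation", "1 minute");

    await step.do("set published", async () => {
      await setStatus(this.env, event.payload.slug, "published");
    });
  }
}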

There are also some problems with the admin UI. Eventual consistency might be fine for a blog, but it’s not suitable for the editing experience. If the save button triggers a page reload, you will be shown an older version of the content, and have to keep refreshing for a full minute before you see your changes. I have a feeling that Durable Objects will be the answer here - since the admin site doesn’t need fast global access, it can write to a DO, which can then store the content in KV for the main site to read.

More to come on these solutions in Part 2.

Landscaping

2025-03-01

It’s been a minute since I’ve stepped back to survey the landscape for web development frameworks in 2025. I took some time this week to do so. Note that I am really delving into Cloudflare at the moment, so I’m testing each of these out on Cloudflare Workers with the new static assets feature, instead of the now-deprecated Pages.


SvelteKit

This was, and still is, my choice for the easiest to teach full-stack framework.

The magic is in all the right places for a CS grad with little or no web development experience. I do like JSX, but it is an abstraction over HTML, and you don’t want to be living in that abstraction if you don’t yet know HTML.

Recent changes such as the (very magic-literal) runes and routing with `+page.ts` files push the balance toward convention-over-configuration a little far for my liking, and it all feels very strict and arbitrary.

Plus, I could never remember the syntax for writing a loop in Svelte (spoiler: it’s not that hard - see below).
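
For the record, it’s the each block. A minimal sketch, assuming a posts array with slug and title fields:

{#each posts as post (post.slug)}
  <li>{post.title}</li>
{/each}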

Deploys to Cloudflare easily.

NextJS

It was a refreshing change to go back to the explicitness of React and NextJS after the magic of SvelteKit. React Server Components are really nice, and take the idea of single-file components to the next level by enabling you to write server logic right inside a component, hidden from the outside.

I could see myself building an entire application in React Server Components, with pretty heavy use of HTMX and a little bit of vanilla JS. But if you’re doing that, then you’re introducing a lot of framework, not to mention cognitive overhead, just to build a predominantly server-rendered app.

Deploys to Cloudflare easily; however, there’s no easy way to deploy the application and any Durable Objects at the same time. You need a separate worker for that. See this issue. Note that SvelteKit has the same problem.

Hono

A strikingly simple framework, with JSX built in. There’s no page-based routing or anything like that built in, and although HonoX is tackling that, it’s trivial to wire up your routes to JSX components manually (see the sketch below). That’s my preferred approach. It’s basically convention-free, but it doesn’t feel like a footgun.
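
Wiring a route to a component looks roughly like this - a sketch, assuming hono/jsx is set as the jsxImportSource and the post slugs are placeholders:

import { Hono } from "hono";
import type { FC } from "hono/jsx";

// A plain JSX component - no framework conventions required
const PostList: FC<{ slugs: string[] }> = ({ slugs }) => (
  <ul>
    {slugs.map((slug) => (
      <li>
        <a href={`/posts/${slug}`}>{slug}</a>
      </li>
    ))}
  </ul>
);

const app = new Hono();

// Wire the route to the JSX component by hand
app.get("/", (c) => c.html(<PostList slugs={["hello-world", "second-post"]} />));

export default app;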

Runs perfectly on Cloudflare, without even an external build step (e.g. Vite). There’s actually a big advantage to this: wrangler dev just works with no build step or configuration, so you get a full local development experience, with even Durable Objects and Workflows working locally.

You’re directly writing the script that the worker runs, so there are no issues deploying DOs like there are with NextJS and SvelteKit.

I’m going to keep delving into this. I’m working on a little Starter Kit to document usage of all the Cloudflare APIs and to serve as a reference for my own conventions.

D1 does not have read replication (yet)

2025-02-24

D1 is Cloudflare’s main relational database offering. But a year after GA, it still does not have read replication. They are promising it…


D1’s read replication will automatically deploy read replicas as needed to get data closer to your users: and without you having to spin up, manage scaling, or run into consistency (replication lag) issues. 2024-04-01

Automatic read replication: our new storage subsystem is built with replication in mind, and we’re working on ensuring our replication layer is both fast & reliable before we roll it out to developers…when we enable global read replication, you won’t have to pay extra for it, nor will replication multiply your storage consumption…We think built-in, automatic replication is important… 2023-05-19

Unfortunately these promises have been misinterpreted in some cases. From the Prisma docs:

Cloudflare’s principles of geographic distribution and bringing compute and data closer to application users, D1 supports automatic read-replication. It dynamically manages the number of database instances and locations of read-only replicas based on how many queries a database is getting, and from where. For write-operations, queries travel to a single primary instance in order to propagate the changes to all read-replicas and ensure data consistency. - Prisma docs

I love the Syntax podcast, but in today’s episode it sounds like they have misread this too.

With Cloudflare’s Developer Week next week (hopefully - the 2024 schedule is still up), here’s hoping it is finally released.

Update 10th April 2025 - it is finally here and it is really, really cool.

Python Dependency Management

2024-07-05

Python package management has long been a struggle. In January 2017, the Pipenv project was started, and by 2018 it had become the officially recommended package manager.

It brought a fantastic npm-like experience to Python, with easy configuration via a TOML Pipfile and a straightforward CLI. But then it went dead, not seeing any releases between November 2018 and April 2020. People moved on to Poetry, and I ended up using Conda quite a bit, especially when numpy/scipy was required.

While Pipenv has seen regular releases since April 2020, I’m seeing more people just use the built-in pip + venv tools. Here is how to do that…


From the Python docs.

Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

Install dependencies

python3 -m pip install --upgrade requests
python3 -m pip install -r requirements.txt
python3 -m pip freeze

Deactivate environment

deactivate

Example Blog Post

2024-01-30

This is an example blog post written in MDX format. You can use all the standard Markdown features plus JSX components.

This is the preview section that will show on the homepage. Everything before the page break will be used as the preview text.


Code Highlighting

Here’s some TypeScript code with syntax highlighting:

function greet(name: string): string {
  return `Hello, ${name}!`;
}

console.log(greet("World"));

Images

Images can be included with standard Markdown syntax:

![Alt text](path/to/image.png)

More Content

This section would appear after the “Read More” link in the post preview. You can add as much content as needed here.

Subsections

You can organize your content with subsections and include:

  1. Numbered lists
  2. Code blocks
  3. Bold and italic text
  4. And much more!

The MDX format also supports importing and using React/Astro components if needed.

Base64 Conversion

2023-01-04

Here is how to convert a string to and from base64 from the terminal. Note that echo appends a trailing newline, which gets encoded too (the Cg== at the end).

echo "abcdef" | base64
YWJjZGVmCg==

echo YWJjZGVmCg== | base64 -d
abcdef

AWS re:Invent 2022

2022-12-28

After a couple of years of cancelled bookings due to COVID-19, this year was the first time I’ve been able to attend AWS re:Invent. What a conference. Unlike any other event I’ve ever seen, the scale of this thing is wild, even for Vegas. Everyone warned me about travelling through LAX on the busiest travel day of the year. But overall, things were pretty smooth. After over 24 hours of flights and layovers, the trip from Adelaide->Sydney->Los Angeles->Las Vegas ended in a wonderful Thanksgiving lunch with friends I hadn’t seen since 2019.

The conference was incredible. But here are five lessons for next time…


1. It’s pretty spread out

The conference is spread out over multiple properties, ranging from a five minute walk to a shuttle bus ride. Staying within walking distance of the exhibit hall makes it easy to attend the morning keynotes. I stayed at the Palazzo which was probably the nicest hotel I’ve stayed at in Vegas. Bonus: AWS provide good rates if you get in early and book when you purchase a conference pass.

2. It’s not a big deal if you can’t reserve a seat for a talk

There are plenty of walk-up seats available for every talk. When I didn’t have a reserved seat, I got there 30-60 minutes early and never missed out.

3. Not everything is in the name

We use AWS Copilot heavily and were pretty disappointed that searching the program only turned up a couple of talks on the topic. But after going to a talk on ECS and seeing the presenters use Copilot the whole time, despite it not appearing in the description, we realised we may have missed a lot of valuable sessions!

4. re:Play is a party

re:Play is the big party on Thursday night. It’s not a repeat viewing of previous sessions, not a board game night, and definitely not a movie night. Probably take it easy on Wednesday night in anticipation of this.

5. Leave Friday afternoon or Saturday

The exhibit hall closes on Thursday afternoon, the big party is Thursday night, and the talks wind up by lunchtime on Friday. All of this makes a Friday afternoon or Saturday morning exit the way to go.

Delete local and remote git tag

2022-05-23

I can never remember how to delete a remote tag in git. Here is how to do it:

First delete local tag:

git tag -d 12345

Then delete remote tag:

git push origin :refs/tags/12345
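
There’s also the --delete flag, which does the same thing and is easier to remember:

git push --delete origin 12345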

RQ with SQS

2022-04-14

RQ is a great library for building a simple decoupled worker queue, which can invoke arbitrary functions from your code base.

As the name implies, it requires a Redis service. If you’re deploying on AWS, you might already be familiar with SQS and prefer to use that instead. Inspired by RQ, here’s how we do it with no dependencies whatsoever…


Considerations

This comes with all the same caveats as RQ, in that it is very tightly coupled, and absolutely not a substitute for a loosely-coupled pub-sub system (i.e. SNS->SQS).

There is also no priority management. You could build that, but it’s probably worth reaching for something like Celery at that point.

So what is it useful for? Really just one thing: executing an arbitrary function, with guaranteed at-least-once execution, plus retries and timeout.

Architecture

We’re on AWS, so we may as well use Lambda. In that situation, we would have the following:

1. Main application

A Lambda function running Flask, hooked up to API gateway.

If you can’t deal with the cold starts or can’t deploy to Lambda for some other reason, you could just as easily run the Flask application in ECS or App Runner etc.

2. Background worker

A Lambda function handling SQS events.

The key point is that both the main application and the background worker share the same code base. Ideally for simplicity, you deploy the same Docker image to both.

Implementation

First, we need to define our function to queue the message to SQS:

import json
from os import environ

import boto3


def enqueue(func_name, *args, **kwargs):
    payload = {"func_name": func_name, "args": args, "kwargs": kwargs}
    boto3.client("sqs").send_message(
        QueueUrl=environ["WORKER_QUEUE_URL"],
        DelaySeconds=0,
        MessageBody=json.dumps(payload),
    )

Then we can call it anywhere we like, passing in the fully qualified path name of the function we want to call, plus any arguments:

enqueue("app.emailer.send_email", "person@gmail.com", "Hello", "Here is the body")

In this case, the function name we pass in is a simple function to send an email, looking like this:

def send_email(email_address, subject, body):
    # Send email via SMTP etc.
    ...

Create a new file where the handler function will live. This is heavily inspired by RQ:

import importlib
import json
import logging
import threading
from os import environ


TIMEOUT_WARNING = int(environ["TIMEOUT_WARNING"])


def import_function(name):
    name_bits = name.split(".")
    module_name = ".".join(name_bits[:-1])
    module = importlib.import_module(module_name)
    function_name = name_bits[-1]
    return getattr(module, function_name)


class TimeoutThread(threading.Thread):
    """A thread that sleeps for TIMEOUT_WARNING seconds and logs a warning
    if it has not been stopped before then."""

    def __init__(self, lambda_payload):
        threading.Thread.__init__(self)
        self.lambda_payload = lambda_payload
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def run(self):
        self._stop_event.wait(TIMEOUT_WARNING)

        if self._stop_event.is_set():
            return

        # If we get to this point we need to log a timeout warning
        logging.error(
            "Lambda timeout warning triggered for %s",
            self.lambda_payload.get("func_name"),
        )

def handler(event, context):
    if len(event["Records"]) > 1:
        raise ValueError("Lambda should only process one event at a time")
    record = event["Records"][0]
    message = json.loads(record["body"])

    timeout_thread = TimeoutThread(message)
    timeout_thread.start()

    func = import_function(message["func_name"])
    args = message["args"]
    kwargs = message["kwargs"]
    func(*args, **kwargs)
    timeout_thread.stop()

Or you may want to wrap it in an app context:

from flask_client import create_app
...
app = create_app()
with app.app_context():
    func(*args, **kwargs)

The whole timeout thing is really only necessary if you’re having trouble getting error messages relating to timeouts. If the Lambda timeout is triggered, the function just shuts down. But an internal timeout running inside Lambda gives you the chance to add some logging or send the error to Sentry.

When deploying, you will want to set the timeout warning to less than the Lambda timeout. For example, if the Lambda timeout is 900 seconds, set the timeout warning to 880 seconds.

Deploy

Remember, all of this lives inside the same code base. We build just one Docker image, but execute it in different ways.

Running the Flask application is no different to usual. Probably just follow the instructions here.

To run the background worker, we need to configure Lambda to run the same Docker image, but using awslambdaric. That’s the AWS Lambda Python Runtime Interface Client, and it allows correct parsing of the SQS payload and running of the handler function.

After installing awslambdaric, we configure the Lambda function’s container image with the following overrides (shown here in Terraform’s image_config syntax; the Dockerfile equivalents are ENTRYPOINT and CMD):

entry_point = ["python", "-m", "awslambdaric"]
command     = ["lambda.handler"]