Tobias Davis

Designing better software systems.

Cloudflare Worker and MongoDB

MongoDB is a great NoSQL database that I recommend for startups and small businesses, but it has traditionally been difficult to use in serverless architectures, since MongoDB uses persistent connections.

If you are using the managed MongoDB Atlas offering, you can now make use of the Data API (in “Preview” as of 2022-01-25; update 2023-02-01: out of preview/beta), which is a simple HTTP interface to the database that exposes the core functions: aggregate, deleteOne, deleteMany, find, findOne, insertOne, insertMany, replaceOne, updateOne, and updateMany.

It’s a really straightforward HTTP request, which means it’s potentially great for use in Cloudflare Workers using fetch, because you don’t even need an SDK.

The Demo

Interacting with the Data API is really just an HTTP call, and in Cloudflare Workers that means using fetch, so we end up with something like this, in a single src/index.js file.

Note: the source code is also available.

First, we handle the Worker request using the nice new export format:

// using the nice new export format
export default {
  async fetch(request, env) {
    try {
      return await handleRequest(request, env)
    } catch (e) {
      // report failures as a server error instead of a 200
      return new Response(e.message, { status: 500 })
    }
  },
}

Above that we’ll define the handleRequest function. For this demo, we’ll make a simple call to find a set of documents, using the pathname as a query parameter. (In real life, you’d likely have an authentication middleware and some sort of router to extract parameters safely.)

async function handleRequest(request, env) {
  // for this demo we want requests to "/smith" to
  // turn into `{ filter: { name: 'smith' } }` so we
  // extract the pathname
  const url = new URL(request.url)
  const name = url.pathname.replace(/^\//, '')
  const { documents } = await dataApiFind(env, {
    filter: { name }
  })
  return new Response(JSON.stringify(documents), {
    status: 200,
    headers: { 'Content-Type': 'application/json' }
  })
}
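
The note above about extracting parameters safely could be sketched roughly like this. The helper name `extractName` and the allowed-character rule are assumptions made for illustration; they are not part of the original demo, and a real application would pick its own validation rules.

```javascript
// Hypothetical helper, not part of the original demo.
// Returns the decoded name from a URL like "https://host/Smith",
// or null when the path does not look like a single plain name.
function extractName(requestUrl) {
  const { pathname } = new URL(requestUrl)
  // "/Smith" -> "Smith"
  const raw = pathname.replace(/^\//, '')
  let name
  try {
    name = decodeURIComponent(raw)
  } catch (e) {
    // malformed percent-encoding such as "%E0%A4%A"
    return null
  }
  // assumed rule for this sketch: 1-64 letters, digits, spaces,
  // underscores, or hyphens (so extra path segments are rejected)
  return /^[A-Za-z0-9 _-]{1,64}$/.test(name) ? name : null
}
```

Inside `handleRequest`, a `null` result could be turned into a `400` response before any call to the Data API is made.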

And above that we’ll define the dataApiFind function. For this demo we’ll leave all the boilerplate in it, but in practice you’d probably use a library instead.

(Things like the MongoDB Data API key should be stored as secret/encrypted environment variables and bound to your Worker. You then access those with env.BINDING_NAME.)

const REGION = 'data' // or pin to a region with e.g. 'us-east-1'
const URL_PREFIX = `https://${REGION}.mongodb-api.com/app`
const URL_SUFFIX = 'endpoint/data/beta/action'
const ACTION = 'find' // or 'insertOne' etc.

async function dataApiFind(env, { filter }) {
  // the API key is managed via the MongoDB Atlas dashboard
  const apiKey = env.MONGODB_API_KEY
  // the API ID is assigned by MongoDB Atlas on setup
  const apiId = env.MONGODB_API_ID
  const url = `${URL_PREFIX}/${apiId}/${URL_SUFFIX}/${ACTION}`
  const response = await fetch(url, {
    // the Data API expects a POST with a JSON body
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Access-Control-Request-Headers': '*',
      'api-key': apiKey,
    },
    body: JSON.stringify({
      // the cluster name from Atlas, e.g. 'Cluster0'
      dataSource: env.MONGODB_DATA_SOURCE,
      // the database name, e.g. 'myApp'
      database: env.MONGODB_DATABASE,
      // the collection name, e.g. 'tasks'
      collection: env.MONGODB_COLLECTION,
      filter,
    })
  })
  if (!response.ok) {
    throw new Error(`Data API request failed with status ${response.status}`)
  }
  // parse the body so the caller can destructure `documents`
  return response.json()
}
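
Since the `ACTION` comment above mentions other actions such as `insertOne`, here is a minimal sketch of how the same boilerplate might be generalized so the action is a parameter. The names `buildDataApiRequest` and `dataApiAction`, and the injectable `fetchImpl` argument, are assumptions for illustration only; the URL shape and `env` binding names are copied from the demo above.

```javascript
// Hypothetical generalization of the demo; names are illustrative.
const DATA_API_BASE = 'https://data.mongodb-api.com/app'

// Builds the URL and fetch options for any Data API action.
function buildDataApiRequest(env, action, params) {
  const url = `${DATA_API_BASE}/${env.MONGODB_API_ID}/endpoint/data/beta/action/${action}`
  const init = {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Access-Control-Request-Headers': '*',
      'api-key': env.MONGODB_API_KEY,
    },
    body: JSON.stringify({
      dataSource: env.MONGODB_DATA_SOURCE,
      database: env.MONGODB_DATABASE,
      collection: env.MONGODB_COLLECTION,
      // action-specific fields, e.g. `filter` or `document`
      ...params,
    }),
  }
  return { url, init }
}

// Sends the request and returns the parsed JSON body.
// `fetchImpl` is injectable so the helper can be tested without a network.
async function dataApiAction(env, action, params, fetchImpl = fetch) {
  const { url, init } = buildDataApiRequest(env, action, params)
  const response = await fetchImpl(url, init)
  if (!response.ok) {
    throw new Error(`Data API ${action} failed with status ${response.status}`)
  }
  return response.json()
}
```

With this sketch, a find would be `await dataApiAction(env, 'find', { filter: { name } })` and an insert would be `await dataApiAction(env, 'insertOne', { document: { name: 'Smith', type: 'apple' } })`.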

And that’s really all there is to it!

Make an HTTP request to your Worker:

GET /Smith HTTP/1.1
Accept: application/json

The response should look like this, depending on what data is in your database of course:

HTTP/1.1 200 OK
Content-Type: application/json

[
  {
    "_id": "001",
    "name": "Smith",
    "type": "apple",
    "details": "Green and sour."
  },
  {
    "_id": "002",
    "name": "Smith",
    "type": "worker",
    "details": "Works with metals."
  }
]

Small Plug

I made a library to make it a little nicer. It’s basically just a wrapper on top of fetch, but it also handles some inconsistencies in the current Data API responses.

Other Notes

Limitations

At the time of writing, you couldn’t use the Data API with the new MongoDB Atlas Serverless offering (also still in Beta), so you don’t actually get what I would call a true “serverless” experience: you end up having to pay per hour for the database instance you use.

Update 2023-02-01: although I haven’t had a chance to test it in production under varying loads, initial testing of the new MongoDB Atlas Serverless offering, now out of beta, is very promising. I would start there for any new projects.

It’s also technically possible to use MongoDB Atlas -> Realm App to get a similar experience, but that isn’t as well documented as the Data API, so I’m not too keen on that approach. (In fact, to make it work in the past I had to reverse-engineer the MongoDB SDK, since I couldn’t get the bundler to work when I tried to use the SDK directly. If any MongoDB devs want to reach out, I’d love to talk about how to make that DX better!)

Connections

I’m still unclear on how many “connections” the Data API consumes, and when. For example: after launching a demo Worker and reloading the page a few dozen times for testing, the MongoDB Atlas dashboard showed 30 connections to the sandbox database. If your website has very high read traffic, and especially since the Data API is still in Preview, I would be cautious about using the Data API until their dev team publishes some sort of SLA guarantee or at least benchmark numbers.

Network Delay

For this demo I was short on time, so I wasn’t as rigorous as I would have liked. I deployed a MongoDB shared database to us-east-1 and made several dozen requests, timing the millisecond delay caused by the fetch request using a simple Date.now() counter. (Note: for security purposes, Cloudflare doesn’t guarantee this as a perfect method of timing, but it should get us close enough.) The size of the data returned was measured using a simple JSON.stringify(body).length (this is rudimentary but sufficient for estimations) and was in the 2.7kB range.
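
The measurement described above could be sketched roughly as follows. The helper name `timedJson` is an assumption for illustration, and unlike a timer wrapped tightly around `fetch` alone, this version also includes the time spent reading and parsing the response body.

```javascript
// Hypothetical helper for the measurement described above.
// `doFetch` is any function returning a promise of a fetch-style
// response, e.g. () => fetch(url, options).
async function timedJson(doFetch) {
  const started = Date.now()
  const response = await doFetch()
  const body = await response.json()
  return {
    body,
    // wall-clock milliseconds, including reading and parsing the body
    elapsedMs: Date.now() - started,
    // character count of the re-serialized JSON, a rough size estimate
    approxSize: JSON.stringify(body).length,
  }
}
```

Logging `elapsedMs` and `approxSize` for each request would give the raw numbers for an estimate like the one in this section.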

On average, the fetch request to MongoDB’s Data API added about 100-150ms. About 15% of requests took less than 80ms, and about 30% took more than 200ms. My intuitive guess is that there’s additional time involved when connection pools need to be handled.
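
Figures like the ones above can be derived from a list of recorded timings with a small summary function. This is only a sketch: the name `summarizeLatencies` is hypothetical, the 80ms and 200ms thresholds are taken from the paragraph above, and the sample values in the usage note are made up for illustration rather than being the measurements from this test.

```javascript
// Hypothetical summary helper; thresholds default to the ones
// mentioned in the text above (80ms and 200ms).
function summarizeLatencies(samplesMs, fastMs = 80, slowMs = 200) {
  if (samplesMs.length === 0) {
    throw new Error('no samples to summarize')
  }
  const total = samplesMs.reduce((sum, ms) => sum + ms, 0)
  // fraction of samples matching a predicate
  const share = (predicate) =>
    samplesMs.filter(predicate).length / samplesMs.length
  return {
    count: samplesMs.length,
    meanMs: total / samplesMs.length,
    fastShare: share((ms) => ms < fastMs),
    slowShare: share((ms) => ms > slowMs),
  }
}
```

For example, `summarizeLatencies([50, 100, 150, 250])` reports a mean of 137.5ms, with a quarter of the samples under 80ms and a quarter over 200ms.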

That’s a larger delay than I would want for my applications, but I was using the M0 Sandbox version, which has “Low network performance” according to the configuration page. In my experience, any of the paid dedicated tiers will give a significant network speed boost, thereby (presumably) reducing the network latency between MongoDB and Cloudflare.

Update 2023-02-01: I’ve now used the Data API (not the “serverless” version) in production for a client app for over a year, and I have zero reservations about it for low to moderate usage applications, if you are using a paid dedicated tier. Network latency is typically on the order of 50ms, and I have had zero problems with connection pool saturation.

Wondering if MongoDB is the right choice for you but don’t want a pushy salesperson trying to convince you? Send me a message; I’d love to help you make the right choice!


Your thoughts and feedback are always welcome! 🙇‍♂️