In this post, I want to briefly describe four ways of getting real-time data from a MongoDB database. If you've ever needed to do that, you are probably familiar with some of them, and you know that it is generally not an easy task.
1. Hooks in application’s database layer
This is a very simple approach and there is not much to explain. We simply emit events when specific actions, like an insert or an update, take place. These events are emitted in an abstraction layer over the native driver. This layer may look like this:
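const RepositoryFactory = (mongo, emitter) => ({
  async addDocument({
    collection,
    document,
  }) {
    emitter.emit(`${collection}.beforeInsert`, document);
    // insertOne() resolves with a result object, not with the document itself.
    const result = await mongo.collection(collection).insertOne(document);
    const insertedDocument = { ...document, _id: result.insertedId };
    emitter.emit(`${collection}.afterInsert`, insertedDocument);
    return insertedDocument;
  },
});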
The main advantages of this approach are its simplicity and good performance. The disadvantage is that any update that does not go through the abstraction layer (a manual edit in the shell, a session middleware store, etc.) will not be caught and emitted. This means that we are likely to miss some possibly important updates. A good example of an actual implementation of this approach is the popular object modeling library Mongoose and its middleware feature.
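To make the idea a bit more concrete, here is a minimal sketch of how such a layer could be wired to Node's built-in EventEmitter. The `fakeMongo` object is a hypothetical in-memory stand-in for the driver's database object, used only so the snippet runs without a server; the `users` collection and the listener names are made up for illustration too.

```javascript
const { EventEmitter } = require('events');

// Same shape as the repository above, repeated so the sketch is self-contained.
const RepositoryFactory = (mongo, emitter) => ({
  async addDocument({ collection, document }) {
    emitter.emit(`${collection}.beforeInsert`, document);
    const result = await mongo.collection(collection).insertOne(document);
    const insertedDocument = { ...document, _id: result.insertedId };
    emitter.emit(`${collection}.afterInsert`, insertedDocument);
    return insertedDocument;
  },
});

// Hypothetical stand-in for the driver (assumption, illustration only).
const fakeMongo = {
  collection: () => ({
    insertOne: async () => ({ insertedId: 1 }),
  }),
};

const emitter = new EventEmitter();
const seen = [];
emitter.on('users.beforeInsert', doc => seen.push(['before', doc.name]));
emitter.on('users.afterInsert', doc => seen.push(['after', doc.name, doc._id]));

const repository = RepositoryFactory(fakeMongo, emitter);
const done = repository.addDocument({
  collection: 'users',
  document: { name: 'Ann' },
});
```

Note that the listeners only fire for writes that go through `repository`. A write made directly with the driver or the shell would produce no event at all, which is exactly the weakness described above.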
2. Poll and diff query results
This is a slightly more advanced approach, which was used in Meteor up to version 0.7. As the name suggests, we first query the collection and fetch the results, store them somewhere, fetch them again some time later, and find the differences between the two sets of documents. We can repeat this at a fixed interval, such as every minute, and this way get near real-time updates. The main disadvantages of this approach are its poor performance and resource usage. If the collection is small and the query is simple, this is not a problem. But once we need to work with bigger result sets, both space and time consumption may become an issue.
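The fetch, store, and compare loop can be sketched roughly as below. This is not how Meteor implements it. It is just an illustration that assumes documents are identified by `_id` and compares them with `JSON.stringify`, which is sensitive to key order. The names `diffSnapshots`, `pollAndDiff`, and `fetchDocuments` are made up for this sketch.

```javascript
// Compare two snapshots of a query result, keyed by _id.
function diffSnapshots(previous, current) {
  const key = doc => String(doc._id);
  const prevById = new Map(previous.map(doc => [key(doc), doc]));
  const currById = new Map(current.map(doc => [key(doc), doc]));

  const added = current.filter(doc => !prevById.has(key(doc)));
  const removed = previous.filter(doc => !currById.has(key(doc)));
  const changed = current.filter(doc => {
    const before = prevById.get(key(doc));
    // Simplification: a structural comparison via JSON serialization.
    return before !== undefined
      && JSON.stringify(before) !== JSON.stringify(doc);
  });

  return { added, removed, changed };
}

// Run fetchDocuments() every intervalMs and report what changed.
// fetchDocuments is any function returning a promise of an array,
// e.g. () => collection.find(query).toArray().
function pollAndDiff(fetchDocuments, onDiff, intervalMs) {
  let previous = []; // the first poll reports every document as added
  const tick = async () => {
    const current = await fetchDocuments();
    onDiff(diffSnapshots(previous, current));
    previous = current;
  };
  return setInterval(tick, intervalMs); // pass to clearInterval() to stop
}
```

Both snapshots have to be kept in memory and every document is compared on every tick, which is where the space and time costs mentioned above come from.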
3. oplog tailing
The oplog is a special capped collection in which all operations that modify data in the database are recorded. It is only available when the MongoDB process is part of a replica set. Combined with tailable cursors, it lets a client subscribe to all changes made to the database. This is a more advanced technique, used for example in Meteor 0.7+. I won't describe it here in more detail, but if you are curious how it works, here are some useful links: oplog observe driver & tailing MongoDB oplog sharded clusters
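For a rough idea of what tailing looks like with the Node.js driver, here is a small sketch. It assumes a connected `MongoClient` and a BSON `Timestamp` to start from are passed in. `describeOplogEntry` and `tailOplog` are names invented for this example, and a real implementation would also need to handle reconnects and cursor errors.

```javascript
// Oplog entries mark the operation type with a one-letter code.
const OPERATIONS = { i: 'insert', u: 'update', d: 'delete' };

// Translate a raw oplog entry into a small, readable event.
function describeOplogEntry(entry) {
  const type = OPERATIONS[entry.op];
  if (!type) return null; // skip no-ops ('n'), commands ('c'), etc.

  const dot = entry.ns.indexOf('.'); // ns has the form "database.collection"
  return {
    type,
    database: entry.ns.slice(0, dot),
    collection: entry.ns.slice(dot + 1),
    // For updates, o2 holds the selector and o the modification.
    documentId: type === 'update' ? entry.o2._id : entry.o._id,
  };
}

// Tail the replica set oplog, starting after startTimestamp.
function tailOplog(client, startTimestamp, onEvent) {
  const cursor = client
    .db('local')
    .collection('oplog.rs')
    .find(
      { ts: { $gt: startTimestamp } },
      { tailable: true, awaitData: true } // keep the cursor open for new entries
    );

  cursor.stream().on('data', entry => {
    const event = describeOplogEntry(entry);
    if (event) onEvent(event);
  });
  return cursor;
}
```

Even in this reduced form, the application has to understand the internal oplog entry format and filter by namespace itself, which is a large part of what makes this technique harder to use than the next one.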
4. Change Streams
Change Streams are a new feature in MongoDB 3.6. They let us subscribe to changes in a collection using the native driver. Just like the oplog, Change Streams are available only in the replica set configuration. This approach is as powerful as oplog tailing, but at the same time it is much less complicated. A simple example in Node.js using the 3.0.0 MongoDB driver:
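const { MongoClient } = require('mongodb');

MongoClient
  .connect('mongodb://localhost:27017/test')
  .then(client => {
    // Since driver 3.0, connect() resolves with a MongoClient, not a Db.
    const db = client.db('test');
    // Pass through only myField of the full document (plus the event _id).
    const pipeline = [{
      $project: {
        'fullDocument.myField': 1,
      },
    }];
    const changeStream = db.collection('test').watch(pipeline);
    changeStream.on('change', change => {
      console.log(change);
    });
  });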
As you can see, we can even use some aggregation framework stages to shape the Change Streams output (for example, to receive only insert operations). If you want to know more about this feature, read the official docs: Change Streams
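Receiving only insert operations, as mentioned above, comes down to a `$match` stage on the `operationType` field of the change event. A minimal sketch, where `watchInserts` and `onInsert` are hypothetical names and `collection` is assumed to be a driver collection obtained as in the previous example:

```javascript
// Subscribe to inserts only and hand each new document to onInsert.
function watchInserts(collection, onInsert) {
  const pipeline = [{
    $match: {
      operationType: 'insert',
    },
  }];
  const changeStream = collection.watch(pipeline);
  changeStream.on('change', change => {
    // Insert events carry the new document in fullDocument.
    onInsert(change.fullDocument);
  });
  return changeStream; // call close() on it to unsubscribe
}
```

A possible usage would be `watchInserts(db.collection('test'), doc => console.log(doc))`. The filtering happens on the server, so events for updates and deletes are never sent to the application.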
Summary
These are the four most common ways to deal with real-time updates from MongoDB databases. I didn't cover sharded cluster configurations, but as you probably expect, things get more complicated there. If you are working with the latest MongoDB 3.6, you should use Change Streams, which have built-in mechanisms to work well in sharded cluster deployments.
Sources:
oplog observe driver
tailing MongoDB oplog sharded clusters
Change Streams
Author: Tymoteusz Dzienniak