MongoDB Consistent Backups

Who uses MongoDB?

According to an online resource*, 37,658 companies use MongoDB. It is most often used by companies with 10-50 employees and 1M-10M dollars in revenue, and its market share is about 5.0%. Many financial services companies use MongoDB as well.

Data is always critical, whether it lives in an RDBMS, a NoSQL database, or flat files.

An AWS EBS volume snapshot is a block-level, incremental snapshot, and it is one of the best and most efficient ways to protect data in an EC2 environment. You may have read plenty of posts and whitepapers about the advantages of snapshot technology. However, a volume snapshot is always crash-consistent, not application-consistent. The concern with snapshot technology is that a backup may turn out not to be restorable, or that the restored data may be corrupted.

The MongoDB documentation says that the WiredTiger storage engine uses checkpoints to provide a consistent view of data on disk, which allows MongoDB to recover from the last checkpoint. If MongoDB exits unexpectedly between checkpoints, journaling is required to recover the writes that occurred after the last checkpoint.
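To make the relationship between checkpoints and the journal concrete, here is a deliberately simplified toy model in Python. It is not how WiredTiger is implemented; the class ToyStore and its methods are hypothetical names used only to illustrate the idea that recovery starts from the last checkpoint and then replays the journal.

```python
class ToyStore:
    """Toy model only; WiredTiger's real on-disk format is far more involved."""

    def __init__(self):
        self.memory = {}      # live state, lost when the process crashes
        self.checkpoint = {}  # last consistent image written to disk
        self.journal = []     # writes recorded since the last checkpoint

    def write(self, key, value):
        # Record the write in the journal first, then apply it in memory.
        self.journal.append((key, value))
        self.memory[key] = value

    def take_checkpoint(self):
        # Persist a consistent image and start a fresh journal.
        self.checkpoint = dict(self.memory)
        self.journal.clear()

    def recover(self, use_journal=True):
        # Start from the last checkpoint, then replay journaled writes.
        state = dict(self.checkpoint)
        if use_journal:
            for key, value in self.journal:
                state[key] = value
        return state


store = ToyStore()
store.write("a", 1)
store.take_checkpoint()
store.write("b", 2)  # written after the checkpoint, then the process "crashes"

without_journal = store.recover(use_journal=False)
with_journal = store.recover()
```

In this sketch, recovering without the journal loses the write made after the checkpoint, while recovering with the journal brings it back.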

This is similar to a tail-log backup in Microsoft SQL Server: after restoring the full backup and leaving the database in the restoring (NORECOVERY) state, an admin can recover up to the point of failure by applying the transaction log records written since the last transaction log backup.

In a nutshell, if journaling is turned on and the journal resides on the same volume as the MongoDB data files, no additional effort is needed to get a backup that MongoDB can recover consistently.
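Whether the journal shares a volume with the data files can be checked before relying on a plain snapshot. The following is a minimal sketch, assuming the journal lives in the journal subdirectory of the MongoDB data directory; the function name on_same_volume is hypothetical. It compares filesystem device IDs, so a filesystem that spans several EBS volumes through LVM or RAID would still report a match and needs separate attention.

```python
import os


def on_same_volume(db_path, journal_path=None):
    """Return True when the data directory and journal share one filesystem."""
    if journal_path is None:
        # Assumption: the journal sits in the "journal" subdirectory of dbPath.
        journal_path = os.path.join(db_path, "journal")
    # os.stat follows symlinks, so a journal directory symlinked to another
    # device reports that device's ID rather than the parent's.
    data_device = os.stat(db_path).st_dev
    journal_device = os.stat(journal_path).st_dev
    return data_device == journal_device
```

For example, on_same_volume("/var/lib/mongodb") would be called with the dbPath value from your own mongod configuration; the path shown here is only an illustration.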

No mechanism is bullet-proof, and a production database sometimes spans multiple volumes, so it is better to flush and lock the database with db.fsyncLock() before taking the snapshot and to unlock it with db.fsyncUnlock() afterwards.
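The flush, lock, snapshot, and unlock sequence can be sketched in Python as follows. This is a rough outline under stated assumptions, not a production script: the helper names fsync_locked and snapshot_volumes are hypothetical, mongo_client is expected to behave like a pymongo MongoClient, and ec2_client like a boto3 EC2 client. The fsync command with lock=True and the fsyncUnlock command are the server commands behind the shell helpers db.fsyncLock() and db.fsyncUnlock().

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(mongo_client):
    """Flush pending writes to disk and block new writes until the block exits."""
    mongo_client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if a snapshot call raised an error.
        mongo_client.admin.command("fsyncUnlock")


def snapshot_volumes(mongo_client, ec2_client, volume_ids):
    """Start a snapshot of every volume while writes are blocked."""
    snapshot_ids = []
    with fsync_locked(mongo_client):
        for volume_id in volume_ids:
            response = ec2_client.create_snapshot(
                VolumeId=volume_id,
                Description="MongoDB backup taken under fsyncLock",
            )
            snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

With real clients this might be called as snapshot_volumes(pymongo.MongoClient(...), boto3.client("ec2"), [...]), with your own connection string and volume IDs filled in. The context manager is used so that the database is unlocked even when a snapshot request fails, since a lock left in place would block all writes.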

In an upcoming blog post, we will discuss in more detail how to create application-consistent backups of MongoDB running on an EC2 instance using pre and post scripts with Nimesa Cloud Data Protection for AWS.