Scaling MongoDB in EC2

Background

 

MongoDB has a nice guide for getting started with running MongoDB in EC2.

This article presents some additional enhancements for increasing performance while lowering storage costs in EC2. A subsequent installment presents techniques for deploying replica sets and auto-scaling nodes.

AWS Storage Performance

 

Many sources, including the MongoDB guide linked above, recommend using EBS-backed volumes: “For production systems we recommend using EBS-optimized EC2 instances.” EBS-backed volumes do provide the convenience of data persistence across instance stops and restarts. However, since EBS volumes are network-attached, they come with a performance penalty compared to EC2’s ephemeral (aka “instance”) storage.

EC2 Instance Storage by Instance Type

By default, EC2 instances (excluding smaller or specialized instance types) come with ephemeral storage. Size ranges from 160GB (for an m1.small instance), to 840GB (for m1.large), to several terabytes (for larger instances).

These ephemeral stores are space reserved on physical storage attached to the underlying host, made available at boot time and persistent for the life of the machine instance.

Because instance storage is accessed over the system bus of the host running the virtual machine, rather than over the network as EBS volumes are, disk performance is higher. This I/O performance is critical to the performance of NoSQL datastores, as documented in MongoDB Write Performance.

For this reason, when using EBS volumes, the usual recommendation is to use Provisioned IOPS EBS volumes, which ensure a specific level of I/O performance on the networked volume used by MongoDB. Provisioned IOPS storage, however, brings additional costs in the form of both monthly per-GB and per-IOPS fees.

Storage performance was measured using hdparm on three separate EC2 instances, one running MongoDB on a default EBS volume, one on a 200 IOPS provisioned EBS volume and one running against an instance store.

EBS default volume ~= 100 IOPS or ~35MB/s

% hdparm -t /dev/xvdf1
Timing buffered disk reads: 106 MB in 3.01 seconds = 35.26 MB/sec

EBS 200 IOPS ~= 86MB/s

% hdparm -t /dev/xvdf1
Timing buffered disk reads: 258 MB in 3.00 seconds = 85.98 MB/sec

EC2 Instance Store ~= 131MB/s or ~300 IOPS

% hdparm -t /dev/xvdg
Timing buffered disk reads: 394 MB in 3.01 seconds = 131.07 MB/sec

Default EBS volume performance is 100 IOPS. Default instance store performance is ~300 IOPS. EBS volume performance above the 100 IOPS level can be guaranteed by purchasing IOPS provisioned storage at additional cost.
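The three hdparm readings above can be compared side by side with a short script. This is only a sketch: the function names and labels are made up for illustration, and the input lines are the ones quoted in this article. Note that `hdparm -t` reports sequential buffered read throughput in MB/sec, so the comparison here is of throughput, not of IOPS.

```python
import re

# Matches the summary line printed by `hdparm -t`; whitespace is kept
# flexible because hdparm pads its numbers.
HDPARM_LINE = re.compile(
    r"Timing buffered disk reads:\s*(\d+)\s*MB in\s*([\d.]+)\s*seconds"
    r"\s*=\s*([\d.]+)\s*MB/sec"
)

# The three readings quoted in this article (labels are illustrative).
READINGS = {
    "EBS default": "Timing buffered disk reads: 106 MB in 3.01 seconds = 35.26 MB/sec",
    "EBS 200 IOPS": "Timing buffered disk reads: 258 MB in 3.00 seconds = 85.98 MB/sec",
    "EC2 instance store": "Timing buffered disk reads: 394 MB in 3.01 seconds = 131.07 MB/sec",
}


def reported_mb_per_sec(line):
    """Return the MB/sec figure that hdparm printed on this line."""
    match = HDPARM_LINE.search(line)
    if match is None:
        raise ValueError(f"not an hdparm -t result line: {line!r}")
    return float(match.group(3))


def relative_to(baseline_label, readings):
    """Express every reading as a multiple of the baseline's throughput."""
    baseline = reported_mb_per_sec(readings[baseline_label])
    return {
        label: reported_mb_per_sec(line) / baseline
        for label, line in readings.items()
    }


if __name__ == "__main__":
    for label, ratio in relative_to("EBS default", READINGS).items():
        print(f"{label}: {ratio:.2f}x the default EBS volume")
```

On these numbers the instance store reads a little under four times as fast as the default EBS volume, and roughly one and a half times as fast as the 200 IOPS volume.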

AWS EBS IOPS vs. Instance Store Cost

 

Here’s a comparison of the cost of EBS vs. instance storage in EC2, for 400GB of underlying storage for use as a MongoDB store.

AWS/EC2 EBS and Provisioned IOPS monthly costs

The current rate for Provisioned IOPS volumes on EC2 is $0.125 per GB-month of provisioned storage plus $0.10 per provisioned IOPS-month. For a 400GB EBS volume provisioned at 200 IOPS, this comes to $70 per month in storage costs for every node in a replica set:

400GB * $0.125 / GB-month + 200 IOPS * $0.10 / IOPS-month = $50 + $20 = $70 / month / node

Total for a replica set = $70 / month * number of nodes
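The same arithmetic can be written as a small helper, which makes it easy to try other volume sizes or IOPS levels. This is a sketch using the rates quoted in this article, which will have changed since; the function names and the three-node replica set in the usage example are assumptions for illustration only.

```python
# Rates quoted in this article, in USD. AWS pricing changes over time,
# so treat these as placeholders rather than current prices.
STORAGE_RATE_PER_GB_MONTH = 0.125
IOPS_RATE_PER_IOPS_MONTH = 0.10


def ebs_piops_monthly_cost(size_gb, provisioned_iops):
    """Monthly cost in USD of one Provisioned IOPS EBS volume."""
    storage_cost = size_gb * STORAGE_RATE_PER_GB_MONTH
    iops_cost = provisioned_iops * IOPS_RATE_PER_IOPS_MONTH
    return storage_cost + iops_cost


def replica_set_monthly_cost(size_gb, provisioned_iops, nodes):
    """Monthly cost in USD when every replica set node has one such volume."""
    return nodes * ebs_piops_monthly_cost(size_gb, provisioned_iops)


if __name__ == "__main__":
    per_node = ebs_piops_monthly_cost(400, 200)
    print(f"per node: ${per_node:.2f}/month")
    # A three-node replica set is an assumed example size.
    total = replica_set_monthly_cost(400, 200, nodes=3)
    print(f"3-node replica set: ${total:.2f}/month")
```

Instance storage is included in the instance price, so the equivalent figure for it is $0 per node.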

Cost and Performance

 

Combining the data above for both cost and performance, for 400GB of database disk storage:

MongoDB on EBS w/ 200 IOPS performance: $70/mo/node

MongoDB on Instance store w/ ~300 IOPS performance: $0/mo/node (via ephemeral storage)

Replica Sets

 

MongoDB supports several different mechanisms for replication, including master/slave, replica pairs and replica sets. As of MongoDB v1.8, the preferred mechanism is replica sets.

In the next installment of this article, I’ll talk about EC2 deployment of MongoDB replica sets and how the cost benefits and high performance of instance storage can be realized while mitigating the risk presented by ephemeral storage. I’ll also talk about how replica set nodes can be quickly deployed and initialized in order to achieve a level of auto-scaling and fault tolerance in a production environment, with nodes deployed in sufficient number and across EC2 availability zones to ensure high availability.

 
