September 2015 – Robert Yokota

The following post originally appeared in the Yammer Engineering blog on September 10, 2014.

In a previous post, I demonstrated some good results for HBase using an automated test framework called Jepsen. In fact, they may have seemed too good. HBase is designed for strong consistency, yet it also seemed to exhibit extraordinary availability during a network partition. How was this possible? The explanation is that HBase clients retry operations when they fail. This can be seen more clearly in the sample run below:

During the network partition, no requests are successful; after the partition is healed, requests are able to succeed, and the request latencies slowly decrease.
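To make the retry behavior concrete, here is a minimal sketch in Python. It is a toy model, not the HBase client: `PartitionedServer`, `put_with_retry`, the fixed `pause`, and the `deadline` are all hypothetical names and values chosen for illustration, and time is simulated rather than measured.

```python
class PartitionedServer:
    """Hypothetical stand-in for a server that is unreachable until the partition heals."""

    def __init__(self, heals_at):
        self.heals_at = heals_at  # simulated time at which the partition heals
        self.rows = set()

    def put(self, row, now):
        if now < self.heals_at:
            raise ConnectionError("partitioned")
        self.rows.add(row)


def put_with_retry(server, row, start, pause=1.0, deadline=60.0):
    """Retry a write until it succeeds or the deadline passes.

    Returns the simulated latency, or None if the client gave up.
    The fixed pause and the deadline are assumptions of this sketch.
    """
    now = start
    while now - start <= deadline:
        try:
            server.put(row, now)
            return now - start
        except ConnectionError:
            now += pause  # wait, then try again
    return None
```

In this model, a write issued at time 0 against a partition that heals at time 10 comes back with a latency of 10, while one issued at time 8 comes back with a latency of 2. No request succeeds during the partition, and the requests that waited longest report the highest latencies once it heals. If the partition outlasts the deadline, the call returns `None` and the write is never applied.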

Here is a longer network partition showing much greater latencies:

In fact, if the network partition is long enough, the HBase client will start to report timeouts:

Timed out waiting for some tasks to complete!

In such cases, not all requests will be successfully processed. Here is a typical result:

0 unrecoverable timeouts
Collecting results.
Writes completed in 500.038 seconds

5000 total
4974 acknowledged
4974 survivors
all 4974 acked writes out of 5000 succeeded. :-)

An even longer network partition can lead the Jepsen test to run out of memory:

java.lang.OutOfMemoryError: unable to create new native thread

So HBase exhibits behavior typical of a CP system: when facing a network partition, it cannot achieve both consistency and availability. Sorry if I gave the impression that it could.

The following post originally appeared in the Yammer Engineering blog on September 10, 2014.

The Yammer architecture has evolved from a monolithic Rails application to a micro-service based architecture. Our services share many commonalities, including the use of Dropwizard and Metrics. However, as with many micro-service based architectures, each micro-service may have very different needs in terms of persistence. This has led to the adoption of polyglot persistence here at Yammer.

As such, engineers at Yammer are very interested in understanding and evaluating the differences between various databases. One particularly enlightening exposition has been the Call Me Maybe series by Aphyr. This series demonstrates how various databases behave in the presence of network partitions caused by an automated test framework called Jepsen(*). Some of the databases it covers are Postgres, Redis, MongoDB, Riak, Cassandra, and FoundationDB. I have often wondered how HBase would behave under Jepsen. Let’s try augmenting Jepsen to find out.

Jepsen actually consists of two parts. The first part is a provisioning framework written using salticid. This provisions a five-node Ubuntu cluster, where the nodes are referred to as n1, n2, n3, n4, and n5. It can then install and set up the desired database. The second part is a set of runtime tests written using Clojure. (Aphyr also has a great tutorial on Clojure called Clojure from the ground up.)

As I don’t have much experience with HBase, I decided to forgo the salticid customizations and simply use Cloudera Manager to set up a five-node HBase cluster on EC2. I used CDH 5.1.2, which bundles hbase-0.98.1+cdh5.1.2+70. I then modified the /etc/hosts file on each of the nodes to add entries for n1 through n5 (with n1 being the ZooKeeper master). With that step done, it was time to write the actual tests, which are available here.

The first test I wrote, hbase-app, is based on one of the Postgres tests. It simply adds a single row for each number. While it is running, Jepsen uses iptables to simulate a network partition within the cluster. Let’s see how it behaves.

lein run hbase -n 2000
...
0 unrecoverable timeouts
Collecting results.
Writes completed in 200.047 seconds

2000 total
2000 acknowledged
2000 survivors
All 2000 writes succeeded. :-D

All 2000 writes succeeded and there was no data loss. So far, so good.
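For readers unfamiliar with the report format, the numbers can be thought of as a comparison between three sets. The sketch below is my own reading of the report, not code from Jepsen: the function name `summarize` and the assumption that "survivors" means acknowledged writes found in the final read are both hypothetical.

```python
def summarize(attempted, acknowledged, final_read):
    """Compare what the clients were told against what the database holds at the end."""
    attempted = set(attempted)
    acknowledged = set(acknowledged)
    final_read = set(final_read)

    survivors = acknowledged & final_read  # acked writes that are still present
    lost = acknowledged - final_read       # acked writes that vanished: data loss

    return {
        "total": len(attempted),
        "acknowledged": len(acknowledged),
        "survivors": len(survivors),
        "lost": sorted(lost),
    }
```

Under this reading, a run is clean when `lost` is empty, that is, when every write the client was told had succeeded is still there after the partition heals.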

The second test I wrote, hbase-append-app, is based on one of the FoundationDB tests. It repeatedly writes to the same cell by attempting to append to a list stored as a blob while a network partition occurs. The test uses the checkAndPut call, which allows for atomic read-modify-write operations.

lein run hbase-append -n 2000
...
0 unrecoverable timeouts
Collecting results.
Writes completed in 200.05 seconds

2000 total
282 acknowledged
282 survivors
all 282 acked writes out of 2000 succeeded. :-)

This time not all writes succeeded, since all clients are attempting to write to the same cell, and the chance is high that another client will write to the cell between the read and write of a given client’s read-modify-write operation. However, no data loss occurred, as all 282 successful writes are apparent in the final result.
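The reason a write can fail here is easier to see in a small model. The following sketch is an in-memory stand-in written in Python, not the HBase client API: the `Cell` class and `append_once` helper are hypothetical, and the absence of a retry after a failed check is an assumption made to mirror the low acknowledgement count above.

```python
import threading


class Cell:
    """In-memory stand-in for a single cell holding a list (stored as a tuple)."""

    def __init__(self, value=()):
        self._value = tuple(value)
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def check_and_put(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value != expected:
                return False  # someone else wrote in between
            self._value = tuple(new)
            return True


def append_once(cell, element):
    """One read-modify-write attempt with no retry (an assumption of this sketch)."""
    current = cell.get()
    return cell.check_and_put(current, current + (element,))
```

If a second client reads the cell, and a first client appends before the second client's `check_and_put` runs, the second client's expected value is stale and its write is rejected rather than silently overwriting the first. That is why only some writes are acknowledged, and why every acknowledged write still shows up in the final list.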

The third test I wrote, hbase-isolation-app, is based on one of the Cassandra tests. It modifies two cells in the same row while a network partition occurs, to test if row updates are atomic.

lein run hbase-isolation -n 2000
...
0 unrecoverable timeouts
Collecting results.
()
Writes completed in 200.043 seconds

2000 total
2000 acknowledged
2000 survivors
All 2000 writes succeeded. :-D

Nice. Row updates are atomic as HBase claims. However, the above test modifies two cells in the same column family. Let’s try modifying two cells in different column families, but in the same row. From what I understand of HBase, cells in different column families are stored in different HBase “stores”. I ran a fourth test, hbase-isolation-multiple-cf-app, to see if it made a difference.

lein run hbase-isolation-multiple-cf -n 2000
...
0 unrecoverable timeouts
Collecting results.
()
Writes completed in 200.036 seconds

2000 total
2000 acknowledged
2000 survivors
All 2000 writes succeeded. :-D

Move along, nothing to see here.
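The property checked by the two isolation tests can be sketched as follows. This is a toy model in Python, not HBase itself: the `Row` class, the single lock guarding the whole row, and the column names `cf1:a` and `cf2:b` are assumptions made for illustration, and the post does not describe how HBase implements the guarantee internally.

```python
import threading


class Row:
    """In-memory stand-in for one row whose cells are updated together."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def put(self, cells):
        with self._lock:  # all cells in one put become visible together
            self._cells.update(cells)

    def get(self):
        with self._lock:
            return dict(self._cells)


def find_torn_reads(n_writes=500, n_reads=500):
    """Write the same value to two cells repeatedly; collect reads where they differ."""
    row = Row()
    row.put({"cf1:a": 0, "cf2:b": 0})
    torn = []

    def writer():
        for i in range(1, n_writes + 1):
            row.put({"cf1:a": i, "cf2:b": i})

    def reader():
        for _ in range(n_reads):
            snapshot = row.get()
            if snapshot["cf1:a"] != snapshot["cf2:b"]:
                torn.append(snapshot)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return torn
```

If row updates are atomic, a reader never observes one cell updated and the other not, so the list of torn reads stays empty regardless of which column families the two cells belong to.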

Finally, HBase claims to have atomic counters. In Dynamo-based systems such as Cassandra and Riak, a counter needs to be a CRDT to behave properly. Let’s see how HBase counters behave. I used a fifth test, hbase-counter-app, that is also based on one of the Cassandra tests.

lein run hbase-counter -n 2000
...
0 unrecoverable timeouts
Collecting results.
Writes completed in 200.045 seconds

2000 total
2000 acknowledged
2000 survivors
All 2000 writes succeeded. :-D

No counter increments were lost.
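As a rough illustration of why a counter served from a single place does not need to be a CRDT, consider the sketch below. It is a Python toy, not the HBase counter API: the `Counter` class and `run_increments` helper are hypothetical, and the lock stands in for whatever serialization the server performs.

```python
import threading


class Counter:
    """In-memory stand-in for a counter whose increments are applied one at a time."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, amount=1):
        with self._lock:  # read, add, and write back as one step
            self._value += amount
            return self._value

    def get(self):
        with self._lock:
            return self._value


def run_increments(n_clients=5, per_client=400):
    """Have several clients increment the same counter concurrently."""
    counter = Counter()

    def client():
        for _ in range(per_client):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.get()
```

Because every increment passes through the same serialized step, the final value equals the number of acknowledged increments. In a Dynamo-style system, several replicas can accept increments independently, which is where a merge-friendly structure such as a CRDT becomes necessary.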

HBase performed well in all five of the above tests. The claims in its documentation concerning atomic row updates and counter operations held up. I’m looking forward to learning more about HBase, which so far appears to be a very solid database.

Update: see the addendum here.

(*) Aphyr is now working on Jepsen 2. This blog refers to the original Jepsen.
