HGraphDB: HBase as a TinkerPop Graph Database

The use of graph databases is common among social networking companies. A social network can easily be represented as a graph model, so a graph database is a natural fit. For instance, Facebook has a graph database called Tao, Twitter has FlockDB, and Pinterest has Zen. At Yammer, an enterprise social network, we rely on HBase for much of our messaging infrastructure, so I decided to see if HBase could also be used for some graph modelling and analysis.

Below I put together a wish list of what I wanted to see in a graph database.

  • It should be implemented directly on top of HBase.
  • It should support the TinkerPop 3 API.
  • It should allow the user to supply IDs for both vertices and edges.
  • It should allow user-supplied IDs to be either strings or numbers.
  • It should allow property values to be of arbitrary type, including maps, arrays, and serializable objects.
  • It should support indexing vertices by label and property.
  • It should support indexing edges by label and property, specific to a given vertex.
  • It should support range queries and pagination with both vertex indices and edge indices.

I did not find a graph database that met all of the above criteria. For instance, Titan is a graph database that supports the TinkerPop API, but it is not implemented directly on HBase. Rather, it is implemented on top of an abstraction layer that can be integrated with HBase, Cassandra, or Berkeley DB as its underlying store. Also, Titan does not support user-supplied IDs. S2Graph is a graph database that is implemented directly on HBase, and it supports both user-supplied IDs and indices on edges, but it does not yet support the TinkerPop API nor does it support indices on vertices.

This led me to create HGraphDB, a TinkerPop 3 layer for HBase. It provides support for all of the above bullet points. Feel free to try it out if you are interested in using HBase as a graph database.

HGraphDB: HBase as a TinkerPop Graph Database

2 thoughts on “HGraphDB: HBase as a TinkerPop Graph Database

  1. Great stuff! Apart from not supporting user-provided IDs, was there anything else that you didn’t find fit enough in Titan? You mention that it’s an abstraction layer rather than being built directly on top of hbase. But is that a problem? Are performance differences noticeable?

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s