HGraphDB is a client framework for HBase that provides a TinkerPop Graph API. HGraphDB also integrates with Apache Giraph, a graph compute engine that Facebook has shown to be massively scalable. In this post we will show how to convert a sample Giraph computation that works with text files so that it works with HGraphDB instead.

In the Giraph quick start, the `SimpleShortestPathsComputation` is used to show how to run a Giraph computation against a graph contained in a file as a JSON representation. Here are the contents of the JSON file:

    [0,0,[[1,1],[3,3]]]
    [1,0,[[0,1],[2,2],[3,1]]]
    [2,0,[[1,2],[4,4]]]
    [3,0,[[0,3],[1,1],[4,4]]]
    [4,0,[[3,4],[2,4]]]

Each line above has the format `[fromVertexId, vertexValue, [[toVertexId, edgeValue],...]]`, where the `edgeValue` is the weight or cost of the edge that will be used for the path computation.
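To make the line format concrete, here is a minimal sketch that pulls one such line apart using only the Java standard library. The `AdjacencyLine` class and its field names are hypothetical and exist only for this illustration; Giraph's own input format does its parsing internally, and this is not that code. The Long/Double/Float types are taken from the name of the input format used in the quick start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Giraph or HGraphDB.
final class AdjacencyLine {
    // Matches the leading "[fromVertexId, vertexValue," part of a line.
    private static final Pattern HEAD =
        Pattern.compile("^\\[\\s*(\\d+)\\s*,\\s*([-+0-9.eE]+)\\s*,");
    // Matches one "[toVertexId, edgeValue]" pair.
    private static final Pattern EDGE =
        Pattern.compile("\\[\\s*(\\d+)\\s*,\\s*([-+0-9.eE]+)\\s*\\]");

    final long vertexId;
    final double vertexValue;
    final List<Long> targetIds = new ArrayList<>();
    final List<Float> edgeValues = new ArrayList<>();

    private AdjacencyLine(long vertexId, double vertexValue) {
        this.vertexId = vertexId;
        this.vertexValue = vertexValue;
    }

    static AdjacencyLine parse(String line) {
        String text = line.trim();
        Matcher head = HEAD.matcher(text);
        if (!head.find()) {
            throw new IllegalArgumentException("Not an adjacency line: " + line);
        }
        AdjacencyLine result = new AdjacencyLine(
            Long.parseLong(head.group(1)), Double.parseDouble(head.group(2)));
        // Only look for edge pairs after the vertex id and vertex value.
        Matcher edge = EDGE.matcher(text);
        edge.region(head.end(), text.length());
        while (edge.find()) {
            result.targetIds.add(Long.parseLong(edge.group(1)));
            result.edgeValues.add(Float.parseFloat(edge.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        AdjacencyLine v1 = parse("[1,0,[[0,1],[2,2],[3,1]]]");
        for (int i = 0; i < v1.targetIds.size(); i++) {
            System.out.println(v1.vertexId + " -> " + v1.targetIds.get(i)
                + " (weight " + v1.edgeValues.get(i) + ")");
        }
    }
}
```

For the second line of the file, this yields vertex 1 with edges to vertices 0, 2, and 3 with weights 1, 2, and 1.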

To run the example in the Giraph quick start, the following command line is used:

    hadoop jar giraph-examples-1.3.0-SNAPSHOT-for-hadoop-2.5.1-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner \
        org.apache.giraph.examples.SimpleShortestPathsComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/ryokota/input/tiny_graph.txt \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/ryokota/output/shortestpaths \
        -w 1 -ca giraph.SplitMasterWorker=false

The results of the job will appear in a file under the output path (`/user/ryokota/output/shortestpaths`), with the following contents:

    0 1.0
    1 0.0
    2 2.0
    3 1.0
    4 5.0

Now let’s leave that example and consider the exact same graph stored in HGraphDB. The graph above can be created in HGraphDB using the following statements.

    Vertex v0 = graph.addVertex(T.id, 0);
    Vertex v1 = graph.addVertex(T.id, 1);
    Vertex v2 = graph.addVertex(T.id, 2);
    Vertex v3 = graph.addVertex(T.id, 3);
    Vertex v4 = graph.addVertex(T.id, 4);
    v0.addEdge("e", v1, "weight", 1);
    v0.addEdge("e", v3, "weight", 3);
    v1.addEdge("e", v0, "weight", 1);
    v1.addEdge("e", v2, "weight", 2);
    v1.addEdge("e", v3, "weight", 1);
    v2.addEdge("e", v1, "weight", 2);
    v2.addEdge("e", v4, "weight", 4);
    v3.addEdge("e", v0, "weight", 3);
    v3.addEdge("e", v1, "weight", 1);
    v3.addEdge("e", v4, "weight", 4);
    v4.addEdge("e", v3, "weight", 4);
    v4.addEdge("e", v2, "weight", 4);

There is also a class called `HBaseBulkLoader` that can be used for more efficient creation of larger graphs.

In place of the JSON input format above, HGraphDB provides two input formats, `HBaseVertexInputFormat` and `HBaseEdgeInputFormat`, which read from the vertices table and the edges table in HBase, respectively. To use these formats, the Giraph computation needs to be changed slightly. Here is the original `SimpleShortestPathsComputation`:

    public class SimpleShortestPathsComputation extends
        BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
      ...
      @Override
      public void compute(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
          Iterable<DoubleWritable> messages) throws IOException {
        // Every vertex starts at "infinity" in the first superstep.
        if (getSuperstep() == 0) {
          vertex.setValue(new DoubleWritable(Double.MAX_VALUE));
        }
        // The best distance seen so far is the minimum over all incoming messages.
        double minDist = isSource(vertex) ? 0d : Double.MAX_VALUE;
        for (DoubleWritable message : messages) {
          minDist = Math.min(minDist, message.get());
        }
        // On improvement, record the distance and propagate it along outgoing edges.
        if (minDist < vertex.getValue().get()) {
          vertex.setValue(new DoubleWritable(minDist));
          for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
            double distance = minDist + edge.getValue().get();
            sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
          }
        }
        vertex.voteToHalt();
      }
    }
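To see how this vertex-centric logic produces the distances in the output file, here is a small plain-Java simulation of the supersteps on the sample graph. It is only a sketch: the `ShortestPathsSimulation` class and its methods are hypothetical, it uses no Giraph classes, and it assumes the source is vertex 1, which is consistent with the `0.0` reported for vertex 1 above. It also assumes every edge target is itself a vertex in the graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation; not Giraph code.
final class ShortestPathsSimulation {
    // {from, to, weight} triples mirroring the sample graph.
    static final double[][] TINY_GRAPH_EDGES = {
        {0, 1, 1}, {0, 3, 3},
        {1, 0, 1}, {1, 2, 2}, {1, 3, 1},
        {2, 1, 2}, {2, 4, 4},
        {3, 0, 3}, {3, 1, 1}, {3, 4, 4},
        {4, 3, 4}, {4, 2, 4},
    };

    static Map<Long, Map<Long, Double>> tinyGraph() {
        Map<Long, Map<Long, Double>> graph = new TreeMap<>();
        for (double[] e : TINY_GRAPH_EDGES) {
            graph.computeIfAbsent((long) e[0], k -> new TreeMap<>()).put((long) e[1], e[2]);
        }
        return graph;
    }

    static Map<Long, Double> run(Map<Long, Map<Long, Double>> graph, long sourceId) {
        Map<Long, Double> values = new TreeMap<>();
        Map<Long, List<Double>> inbox = new HashMap<>();
        // Superstep 0: every vertex starts at "infinity" and runs once with no messages.
        for (Long id : graph.keySet()) {
            values.put(id, Double.MAX_VALUE);
            inbox.put(id, new ArrayList<>());
        }
        // Each loop iteration is one superstep; it ends when no messages are in flight.
        while (!inbox.isEmpty()) {
            Map<Long, List<Double>> outbox = new HashMap<>();
            for (Map.Entry<Long, List<Double>> delivery : inbox.entrySet()) {
                long id = delivery.getKey();
                double minDist = (id == sourceId) ? 0d : Double.MAX_VALUE;
                for (double message : delivery.getValue()) {
                    minDist = Math.min(minDist, message);
                }
                if (minDist < values.get(id)) {
                    values.put(id, minDist);
                    for (Map.Entry<Long, Double> edge : graph.get(id).entrySet()) {
                        outbox.computeIfAbsent(edge.getKey(), k -> new ArrayList<>())
                            .add(minDist + edge.getValue());
                    }
                }
                // Equivalent of voteToHalt(): the vertex runs again only if it gets a message.
            }
            inbox = outbox;
        }
        return values;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, Double> entry : run(tinyGraph(), 1L).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Running the sketch reproduces the distances from the output file. Vertex 4, for example, ends at 5.0 because the path through vertex 3 (1 + 4) beats the path through vertex 2 (2 + 4).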

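And here is the version for HGraphDB. The main changes are the base class, the types in the `compute` signature, and the way the vertex value and the edge weight are accessed.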

    public class SimpleShortestPathsComputation extends
        HBaseComputation<Long, DoubleWritable, FloatWritable, DoubleWritable> {
      ...
      @Override
      public void compute(
          Vertex<ObjectWritable<Long>, VertexValueWritable<DoubleWritable>,
              EdgeValueWritable<FloatWritable>> vertex,
          Iterable<DoubleWritable> messages) throws IOException {
        // The Giraph vertex value is a wrapper; the distance lives inside it.
        VertexValueWritable<DoubleWritable> vertexValue = vertex.getValue();
        if (getSuperstep() == 0) {
          vertexValue.setValue(new DoubleWritable(Double.MAX_VALUE));
        }
        double minDist = isSource(vertex) ? 0d : Double.MAX_VALUE;
        for (DoubleWritable message : messages) {
          minDist = Math.min(minDist, message.get());
        }
        if (minDist < vertexValue.getValue().get()) {
          vertexValue.setValue(new DoubleWritable(minDist));
          for (Edge<ObjectWritable<Long>, EdgeValueWritable<FloatWritable>> edge
              : vertex.getEdges()) {
            // The weight is read from the "weight" property of the underlying HBase edge.
            double distance = minDist
                + ((Number) edge.getValue().getEdge().property("weight").value()).doubleValue();
            sendMessage(edge.getTargetVertexId(), new DoubleWritable(distance));
          }
        }
        vertex.voteToHalt();
      }
    }

The major difference is that when using `HBaseVertexInputFormat`, the “value” of a Giraph vertex is an instance of type `VertexValueWritable`, which consists of an `HBaseVertex` and a `Writable` value. Likewise, when using `HBaseEdgeInputFormat`, the “value” of a Giraph edge is an instance of type `EdgeValueWritable`, which consists of an `HBaseEdge` and a `Writable` value. The instances of `HBaseVertex` and `HBaseEdge` should be considered read-only and should only be used to obtain IDs and property values.
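One detail worth noting when reading property values this way: the HGraphDB version of the computation casts the `weight` property to `Number` before calling `doubleValue()`. The sample graph was created with Java `int` literals as weights, so the stored value is not necessarily a `Double`. The sketch below illustrates the idea with a plain `Map` standing in for the element's properties; the `EdgeWeightDemo` class is hypothetical and uses no HGraphDB or TinkerPop classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for reading a numeric property; not HGraphDB code.
final class EdgeWeightDemo {
    // Property values come back as Object, so the numeric type is only known at runtime.
    static double weightOf(Map<String, Object> edgeProperties) {
        return ((Number) edgeProperties.get("weight")).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<>();

        properties.put("weight", 4);      // autoboxed to Integer
        System.out.println(weightOf(properties));

        properties.put("weight", 2.5f);   // autoboxed to Float
        System.out.println(weightOf(properties));

        try {
            // A direct cast to Double fails for any other numeric type.
            Double direct = (Double) properties.get("weight");
            System.out.println(direct);
        } catch (ClassCastException e) {
            System.out.println("direct cast to Double failed");
        }
    }
}
```

Going through `Number` keeps the computation independent of whether the weights were written as integers, floats, or doubles.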

Running the above Giraph computation against HBase is similar to running the original example. Note that we also have to customize `IdWithValueTextOutputFormat` to work properly with `VertexValueWritable`.

    ./hadoop jar hgraphdb-0.4.4-SNAPSHOT-test-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner \
        io.hgraphdb.giraph.examples.SimpleShortestPathsComputation \
        -vif io.hgraphdb.giraph.HBaseVertexInputFormat \
        -eif io.hgraphdb.giraph.HBaseEdgeInputFormat \
        -vof io.hgraphdb.giraph.examples.IdWithValueTextOutputFormat \
        -op /user/ryokota/output/shortestpaths \
        -w 1 -ca giraph.SplitMasterWorker=false \
        -ca hbase.zookeeper.quorum=127.0.0.1 \
        -ca zookeeper.znode.parent=/hbase-unsecure \
        -ca gremlin.hbase.namespace=testgraph \
        -ca hbase.mapreduce.edgetable=testgraph:edges \
        -ca hbase.mapreduce.vertextable=testgraph:vertices

As an alternative to using a text-based output format such as `IdWithValueTextOutputFormat`, HGraphDB provides two abstract output formats, `HBaseVertexOutputFormat` and `HBaseEdgeOutputFormat`, that can be used to modify the graph after a Giraph computation. For example, the shortest path result for each vertex could be set as a property on the vertex by extending `HBaseVertexOutputFormat` and implementing the method

    public abstract void writeVertex(HBaseBulkLoader writer, HBaseVertex vertex, Writable value);

As you can see, HGraphDB extends the functionality in Apache Giraph by making it quite easy to both read and write graphs stored in HBase when performing sophisticated graph analytics.