I previously showed how to build a relational database using Kafka. This time I’ll show how to build a graph database using Kafka. Just as with KarelDB, at the heart of our graph database will be the embedded key-value store, KCache.
Kafka as a Graph Database
The graph database that I’m most familiar with is HGraphDB, a graph database that uses HBase as its backend. More specifically, it uses the HBase client API, which allows it to integrate with not only HBase, but also any other data store that implements the HBase client API, such as Google BigTable. This leads to an idea. Rather than trying to build a new graph database around KCache entirely from scratch, we can try to wrap KCache with the HBase client API.
HBase is an example of a wide column store, also known as an extensible record store. Like its predecessor BigTable, it allows any number of column values to be associated with a key, without requiring a schema. For this reason, a wide column store can also be seen as a two-dimensional key-value store.1
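To make the "two-dimensional key-value store" idea concrete, here is a minimal sketch in plain Java. It is only an illustration and not how HBase or KStore is implemented: the class name `WideColumnSketch` and its methods are hypothetical, and the store is just a sorted map of row keys to sorted maps of columns.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: a wide column store modeled as a map of maps.
// Not taken from HBase or KStore.
class WideColumnSketch {
    // First dimension: row key. Second dimension: column name.
    // Both dimensions are kept sorted, and no schema is required.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Stores a value under (row, column), creating the row on first use.
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Returns the value at (row, column), or null if it is absent.
    String get(String row, String column) {
        NavigableMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Returns all columns stored for a row (empty if the row does not exist).
    Map<String, String> getRow(String row) {
        NavigableMap<String, String> columns = rows.get(row);
        return columns == null ? new TreeMap<>() : columns;
    }

    public static void main(String[] args) {
        WideColumnSketch table = new WideColumnSketch();
        // Different rows may carry entirely different sets of columns.
        table.put("row1", "cf:a", "value1");
        table.put("row1", "cf:b", "value2");
        table.put("row2", "cf:z", "value3");
        System.out.println(table.getRow("row1"));
        System.out.println(table.get("row2", "cf:z"));
    }
}
```

A real wide column store adds column families, timestamps, and persistence on top of this, but the addressing model of (row key, column) to value is the same.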
I’ve implemented KStore as a wide column store (or extensible record store) abstraction for Kafka that relies on KCache under the covers. KStore implements the HBase client API, so it can be used wherever the HBase client API is supported.
Let’s try to use KStore with HGraphDB. After installing and starting the Gremlin console, we install KStore and HGraphDB.
    $ ./bin/gremlin.sh

             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities
    plugin activated: tinkerpop.tinkergraph
    gremlin> :install org.apache.hbase hbase-client 2.2.1
    gremlin> :install org.apache.hbase hbase-common 2.2.1
    gremlin> :install org.apache.hadoop hadoop-common 3.1.2
    gremlin> :install io.kstore kstore 0.1.0
    gremlin> :install io.hgraphdb hgraphdb 3.0.0
    gremlin> :plugin use io.hgraphdb
After we restart the Gremlin console, we configure HGraphDB with the KStore connection class and the Kafka bootstrap servers.2 We can then issue Gremlin commands against Kafka.
    $ ./bin/gremlin.sh

             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.utilities
    plugin activated: io.hgraphdb
    plugin activated: tinkerpop.tinkergraph
    gremlin> cfg = new HBaseGraphConfiguration()\
    ......1> .set("hbase.client.connection.impl", "io.kstore.KafkaStoreConnection")\
    ......2> .set("kafkacache.bootstrap.servers", "localhost:9092")
    ==>io.hgraphdb.HBaseGraphConfiguration@41b0ae4c
    gremlin> graph = new HBaseGraph(cfg)
    ==>hbasegraph[hbasegraph]
    gremlin> g = graph.traversal()
    ==>graphtraversalsource[hbasegraph[hbasegraph], standard]
    gremlin> v1 = g.addV('person').property('name','marko').next()
    ==>v[0371a1db-8768-4910-94e3-7516fc65dab3]
    gremlin> v2 = g.addV('person').property('name','stephen').next()
    ==>v[3bbc9ce3-24d3-41cf-bc4b-3d95dbac6589]
    gremlin> g.V(v1).addE('knows').to(v2).property('weight',2).iterate()
It works! HBaseGraph is now using Kafka as its storage backend.
Kafka as a Document Database
Now that we have a wide column store abstraction for Kafka in the form of KStore, let’s see what else we can do with it. Another database that uses the HBase client API is HDocDB, a document database for HBase. To use KStore with HDocDB, first we need to set hbase.client.connection.impl in our hbase-site.xml as follows.
    <configuration>
      <property>
        <name>hbase.client.connection.impl</name>
        <value>io.kstore.KafkaStoreConnection</value>
      </property>
      <property>
        <name>kafkacache.bootstrap.servers</name>
        <value>localhost:9092</value>
      </property>
    </configuration>
Now we can issue MongoDB-like commands against Kafka, using HDocDB.3
    $ jrunscript -cp <hbase-conf-dir>:target/hdocdb-1.0.1.jar:../kstore/target/kstore-0.1.0.jar -f target/classes/shell/hdocdb.js -f -
    nashorn> db.mycoll.insert( { _id: "jdoe", first_name: "John", last_name: "Doe" } )
    nashorn> var doc = db.mycoll.find( { last_name: "Doe" } )[0]
    nashorn> print(doc)
    {"_id":"jdoe","first_name":"John","last_name":"Doe"}
    nashorn> db.mycoll.update( { last_name: "Doe" }, { $set: { first_name: "Jim" } } )
    nashorn> var doc = db.mycoll.find( { last_name: "Doe" } )[0]
    nashorn> print(doc)
    {"_id":"jdoe","first_name":"Jim","last_name":"Doe"}
Pretty cool, right?
Kafka as a Wide Column Store
Of course, there is no requirement to wrap KStore with another layer in order to use it. KStore can be used directly as a wide column store abstraction on top of Kafka. I’ve integrated KStore with the HBase Shell so that one can work directly with KStore from the command line.
    $ ./kstore-shell.sh localhost:9092
    hbase(main):001:0> create 'test', 'cf'
    Created table test
    Took 0.2328 seconds
    => Hbase::Table - test
    hbase(main):003:0* list
    TABLE
    test
    1 row(s)
    Took 0.0192 seconds
    => ["test"]
    hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
    Took 0.1284 seconds
    hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
    Took 0.0113 seconds
    hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
    Took 0.0096 seconds
    hbase(main):007:0> scan 'test'
    ROW      COLUMN+CELL
     row1    column=cf:a, timestamp=1578763986780, value=value1
     row2    column=cf:b, timestamp=1578763992567, value=value2
     row3    column=cf:c, timestamp=1578763996677, value=value3
    3 row(s)
    Took 0.0233 seconds
    hbase(main):008:0> get 'test', 'row1'
    COLUMN   CELL
     cf:a    timestamp=1578763986780, value=value1
    1 row(s)
    Took 0.0106 seconds
    hbase(main):009:0>
There’s no limit to the type of fun one can have with KStore. 🙂
Back to Graphs
Getting back to graphs, another popular graph database is JanusGraph, which is interesting because it has a pluggable storage layer. Some of the storage backends that it supports through this layer are HBase, Cassandra, and BerkeleyDB.
Of course, KStore can be used in place of HBase when configuring JanusGraph. Again, it’s simply a matter of configuring the KStore connection class in the JanusGraph configuration.
    storage.hbase.ext.hbase.client.connection.impl: io.kstore.KafkaStoreConnection
    storage.hbase.ext.kafkacache.bootstrap.servers: localhost:9092
However, we can do better when integrating JanusGraph with Kafka. JanusGraph can be integrated with any storage backend that supports a wide column store abstraction. When integrating with key-value stores such as BerkeleyDB, JanusGraph provides its own adapter for mapping a key-value store to a wide column store. Thus we can simply provide KCache to JanusGraph as a key-value store, and it will perform the mapping to a wide column store abstraction for us automatically.
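One common way to map an ordered key-value store to a wide column store is to fold the row key and the column name into a single composite key, then read a whole row with a range scan. The sketch below illustrates that idea in plain Java. It is not JanusGraph's adapter code: the class `KeyValueToWideColumnSketch`, the `\u0000` separator, and the use of a `TreeMap` as a stand-in for the key-value store are all assumptions made for illustration, and row keys are assumed never to contain the separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: exposes a sorted key-value map as a wide column store.
// Not taken from JanusGraph or KCache.
class KeyValueToWideColumnSketch {
    // Assumed separator; row keys must not contain this character.
    private static final char SEP = '\u0000';

    // Stand-in for an ordered key-value store.
    private final NavigableMap<String, String> kv = new TreeMap<>();

    // Builds the single key under which a (row, column) cell is stored.
    private static String compositeKey(String row, String column) {
        return row + SEP + column;
    }

    void put(String row, String column, String value) {
        kv.put(compositeKey(row, column), value);
    }

    String get(String row, String column) {
        return kv.get(compositeKey(row, column));
    }

    // Reads every column of a row with one range scan over the sorted keys.
    Map<String, String> getRow(String row) {
        String from = row + SEP;                 // smallest possible key for this row
        String to = row + (char) (SEP + 1);      // first key past this row's range
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : kv.subMap(from, true, to, false).entrySet()) {
            // Strip the "row + separator" prefix to recover the column name.
            columns.put(e.getKey().substring(from.length()), e.getValue());
        }
        return columns;
    }

    public static void main(String[] args) {
        KeyValueToWideColumnSketch store = new KeyValueToWideColumnSketch();
        store.put("row1", "cf:a", "value1");
        store.put("row1", "cf:b", "value2");
        store.put("row10", "cf:a", "other");
        // Only the cells of "row1" are returned; "row10" falls outside the scanned range.
        System.out.println(store.getRow("row1"));
    }
}
```

Because all cells of a row are adjacent in key order, the underlying store only needs sorted keys and range scans to serve the wide column abstraction.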
I’ve implemented a new storage plugin for JanusGraph called janusgraph-kafka that does exactly this. Let’s try it out. After following the instructions here, we can start the Gremlin console.
    $ ./bin/gremlin.sh

             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.tinkergraph
    plugin activated: tinkerpop.hadoop
    plugin activated: tinkerpop.spark
    plugin activated: tinkerpop.utilities
    plugin activated: janusgraph.imports
    gremlin> graph = JanusGraphFactory.open('conf/janusgraph-kafka.properties')
    ==>standardjanusgraph[io.kcache.janusgraph.diskstorage.kafka.KafkaStoreManager:[127.0.0.1]]
    gremlin> g = graph.traversal()
    ==>graphtraversalsource[standardjanusgraph[io.kcache.janusgraph.diskstorage.kafka.KafkaStoreManager:[127.0.0.1]], standard]
    gremlin> v1 = g.addV('person').property('name','marko').next()
    ==>v[4320]
    gremlin> v2 = g.addV('person').property('name','stephen').next()
    ==>v[4104]
    gremlin> g.V(v1).addE('knows').to(v2).property('weight',2).iterate()
Works like a charm.
Summary
In this and the previous post, I’ve shown how Kafka can be used as:
- a key-value store, using KCache
- a wide column store, using KStore
- a relational database, using KarelDB
- a document database, using HDocDB and KStore
- a graph database, using HGraphDB and KStore, as well as JanusGraph and KCache
I guess I could have titled this post “Building a Graph Database, Document Database, and Wide Column Store Using Kafka”, although that’s a bit long. In any case, hopefully I’ve shown that Kafka is a lot more versatile than most people realize.