Understanding Protobuf Compatibility

In a recent article, I presented rules for evolving JSON Schemas in a backward, forward, and fully compatible manner. The backward compatibility rules for Protocol Buffers (Protobuf) are much simpler, and most of them are outlined in the section “Updating A Message Type” in the Protobuf Language Guide.

The Confluent Schema Registry uses these rules to check for compatibility when evolving a Protobuf schema. However, unlike some other tools, the Confluent Schema Registry does not require that a removed field be marked as reserved when evolving a schema in a backward compatible manner. The reserved keyword can be used to prevent incompatible changes in future versions of a schema. Instead of requiring the reserved keyword, the Confluent Schema Registry achieves compatibility across multiple versions by allowing compatibility checks to be performed transitively. The reserved keyword is most useful for those tools that do not perform transitive compatibility checks. Using it when removing fields is still considered a best practice, however.

For the Protobuf oneof construct, there are four additional compatibility issues that are mentioned in the Protobuf Language Guide.  Since the text of the language guide is a bit terse, for the rest of this article I’ll discuss each of these four issues further. First, let me reproduce the text below.

  1. Be careful when adding or removing oneof fields. If checking the value of a oneof returns None/NOT_SET, it could mean that the oneof has not been set or it has been set to a field in a different version of the oneof. There is no way to tell the difference, since there’s no way to know if an unknown field on the wire is a member of the oneof.
  2. Move fields into or out of a oneof: You may lose some of your information (some fields will be cleared) after the message is serialized and parsed. However, you can safely move a single field into a new oneof and may be able to move multiple fields if it is known that only one is ever set.
  3. Delete a oneof field and add it back: This may clear your currently set oneof field after the message is serialized and parsed.
  4. Split or merge oneof: This has similar issues to moving regular fields.

Each of the above issues can cause information to be lost. For that reason, these are considered backward compatibility issues.

Adding or removing oneof fields

Let’s look at an example of when a oneof field is removed.  Let’s say we have the following Protobuf message with fields f1 and f2 in a oneof.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
    string f2 = 2;
  }
}

Later we decide to remove the field f2.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
  }
}

If an old binary with the first version of SampleMessage creates a message and sets f2 in test_oneof, then a new binary with the second version of SampleMessage will see test_oneof as unset. The value of f2 will be contained in the unknown fields of the message, but there’s no way to tell which, if any, of the unknown fields were previously in test_oneof. The information that is lost is whether the oneof was really set or not.
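The ambiguity can be reproduced with a toy model of the wire format. The sketch below is not the Protobuf library: it assumes a serialized message is simply a list of (field number, value) pairs, and the helper `parse` and the constants `V1_ONEOF` and `V2_ONEOF` are hypothetical names introduced for illustration.

```python
# Toy model (an assumption, not the real Protobuf runtime): a serialized
# message is a list of (field_number, value) pairs, and a schema version is
# described by the set of field numbers that belong to test_oneof.

def parse(wire, oneof_fields):
    """Return (oneof case or None, unknown fields) as seen by one schema version."""
    oneof_case = None
    unknown = []
    for number, value in wire:
        if number in oneof_fields:
            oneof_case = (number, value)     # last known member on the wire wins
        else:
            unknown.append((number, value))  # retained, but not tied to any oneof
    return oneof_case, unknown

V1_ONEOF = {1, 2}  # f1 = 1, f2 = 2
V2_ONEOF = {1}     # f2 has been removed

wire_with_f2 = [(2, "hello")]  # old binary set f2
empty_wire = []                # old binary set nothing

case_a, unknown_a = parse(wire_with_f2, V2_ONEOF)
case_b, unknown_b = parse(empty_wire, V2_ONEOF)

print(case_a, unknown_a)  # None [(2, 'hello')]
print(case_b, unknown_b)  # None []
```

In both cases the new binary reports the oneof as unset. The unknown field in the first case carries no marker saying it once belonged to test_oneof.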

This behavior may be unexpected when comparing it to how Protobuf handles removing a value from an enum. Any unrecognized enum values encountered on the wire are retained by the binary, and a field using the enum will still appear as set (to an unrecognized value) if the binary encounters a value that is no longer enumerated in the enum.

The information loss from removing a oneof field can cascade and cause other information to be lost, depending on the scenario. For example, suppose the new binary reads the message with f2 set by the old binary and modifies it by setting f1. When the old binary reads the modified message, it may see f2 as still set! Setting f1 in the new binary does not clear the unknown field for f2, because for the new binary f2 is not associated with test_oneof. If the old binary then deserializes f2 after f1, it clears f1 when it reads f2, so the value of f1 is lost for the old binary.

Furthermore, when reading the modified message, a binary in a different programming language may read f1 after f2, thus clearing the value of f2. In this case, the value for f2 will be lost. The actual serialization order is implementation-specific and subject to change. Most implementations serialize fields in ascending order of their field numbers, but this is not guaranteed, especially in the presence of unknown fields. Also, inconsistencies in how various programming languages handle unknown fields prevent a canonical serialization order from being defined. The Protobuf Encoding documentation states the following:

  • By default, repeated invocations of serialization methods on the same protocol buffer message instance may not return the same byte output; i.e. the default serialization is not deterministic.
  • Deterministic serialization only guarantees the same byte output for a particular binary. The byte output may change across different versions of the binary.

So even a later version of the binary in the same programming language may see different results.
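The dependence on serialization order can be replayed in a small simulation. This is a sketch under the assumption that a message on the wire is a list of (field number, value) pairs and that the last oneof member read wins. The function `read_oneof` and the two orderings are hypothetical and do not correspond to any particular Protobuf implementation.

```python
# Toy model: the new binary holds f1 = "new" as a known field and still
# carries f2 = "old" as an unknown field. How the two are interleaved when
# the message is written back is implementation-specific.

def read_oneof(wire, oneof_fields):
    """Return the oneof case seen by a reader: the last member on the wire wins."""
    case = None
    for number, value in wire:
        if number in oneof_fields:
            case = (number, value)
    return case

V1_ONEOF = {1, 2}        # the old binary knows both f1 and f2

known = [(1, "new")]     # set by the new binary
unknown = [(2, "old")]   # preserved from the original message

known_first = known + unknown    # one possible serialization order
unknown_first = unknown + known  # another possible order

print(read_oneof(known_first, V1_ONEOF))    # (2, 'old')
print(read_oneof(unknown_first, V1_ONEOF))  # (1, 'new')
```

The same logical message yields a different oneof case for the old binary depending only on the order in which the fields were written.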

For these reasons, removing a field from a oneof is considered a backward incompatible change. Likewise, adding a field to a oneof is considered a forward incompatible change (since a forward compatibility check is just a backward compatibility check with the schemas switched).

Moving fields into or out of a oneof

Consider the following Protobuf message with f1 in a oneof and f2 and f3 outside of the oneof.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
  }
  string f2 = 2;
  string f3 = 3;
}

Later we decide to move both f2 and f3 into test_oneof.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
    string f2 = 2;
    string f3 = 3;
  }
}

If an old binary with the first version of SampleMessage creates a message and sets f1, f2, and f3, then when a new binary with the second version of SampleMessage reads the message from the wire, it will clear two of the fields (depending on serialization order) and leave only one field, say f3, with its value. Thus the values of the other two fields will be lost.

As mentioned, the order in which fields are serialized is implementation-specific. For the fields in the oneof, only the value of the last field read from the wire is retained. Therefore, an additional problem is that a different field in the oneof may appear as set when using a different implementation.
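To make the effect of ordering concrete, the following sketch enumerates every possible wire order for the three fields and parses each with a toy version of the second schema. It assumes the simplified "last member read wins" model described above. `parse_v2` is a hypothetical helper, not a Protobuf API.

```python
from itertools import permutations

def parse_v2(wire):
    """Parse with version 2, where fields 1, 2 and 3 all belong to test_oneof."""
    names = {1: "f1", 2: "f2", 3: "f3"}
    message = {}
    for number, value in wire:
        message.clear()                 # setting a oneof member clears the others
        message[names[number]] = value
    return message

# The old binary set all three fields; only the wire order varies.
fields = [(1, "a"), (2, "b"), (3, "c")]
survivors = {order: parse_v2(order) for order in permutations(fields)}

for order, message in survivors.items():
    print([number for number, _ in order], "->", message)
```

Every ordering leaves exactly one field set, and the surviving field is whichever one happened to be written last.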

For these reasons, moving fields into a oneof is considered a backward incompatible change, unless the oneof is new and only a single field is moved into it. Moving fields out of a oneof is also a backward incompatible change, since this has the effect of removing the fields from the oneof.

Deleting a oneof field and adding it back

In this scenario there are three versions involved. The first version has f1 and f2 in a oneof.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
    string f2 = 2;
  }
}

In the second version, we remove the field f2 from the oneof.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
  }
}

Later we add the field f2 back to the oneof.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
    string f2 = 2;
  }
}

Suppose an old binary with the first version of SampleMessage creates a message and sets f2 in test_oneof. Next, a binary with the second version reads the message and sets f1 in test_oneof. Finally, a later binary with the third version reads the modified message; it may see f2 as set, with the value of f1 lost. This is similar to the scenario described above that involved only two binaries when removing a field from the oneof. The difference is that here the modified message is read by a third, later binary rather than by the original binary.

Splitting or merging a oneof

Consider a Protobuf message with two oneof constructs.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
  }
  oneof test_oneof2 {
    string f2 = 2;
  }
}

Later we decide to merge the oneof constructs.

message SampleMessage {
  oneof test_oneof {
    string f1 = 1;
    string f2 = 2;
  }
}

If an old binary with the first version of SampleMessage creates a message and sets both f1 and f2, then when a new binary with the second version of SampleMessage reads the message, it will see only one of the fields as set, with the value of the other field being lost. Again, which field is set and which is lost depends on the implementation. This issue is similar to that described above involving moving existing fields into a oneof.

Summary

In Protobuf, evolving schemas with oneof constructs can be problematic for a couple of reasons:

  1. Information may be lost, such as the values of fields that were moved into a oneof, and whether a oneof was set or not. The information loss can cascade and result in further loss, such as when an unknown field reappears in a oneof, causing the previous value of the oneof to be cleared.
  2. A schema change can unfortunately cause multiple fields which have been set to be pulled into the same oneof. If this happens, all of the fields except one will be cleared. Since the order in which fields are serialized to the wire is implementation-specific and subject to change, especially in the presence of unknown fields, the field that is retained may be different between implementations.

For these reasons, Confluent Schema Registry implements two backward compatibility checks for the oneof construct:

  1. Removing a field from a oneof is a backward incompatible change.
  2. Moving existing fields into a oneof is a backward incompatible change, unless a single field is moved into a new oneof.
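As an illustration only, the two checks can be sketched over a simplified schema description. This is not Schema Registry's implementation: the dictionary layout, the function `check_backward`, and the assumption that oneofs are matched by name across versions are all hypothetical choices made for this example.

```python
# Each schema version is described by the set of all field numbers in the
# message ("fields") and a mapping from oneof name to member field numbers.

def check_backward(old, new):
    """Return a list of backward incompatibilities between two schema versions."""
    errors = []
    # Check 1: a field removed from (or moved out of) an existing oneof.
    for name, old_members in old["oneofs"].items():
        new_members = new["oneofs"].get(name, set())
        for number in sorted(old_members - new_members):
            errors.append(f"field {number} removed from oneof {name}")
    # Check 2: existing fields moved into a oneof, except for a single field
    # moved into a oneof that did not exist before.
    for name, new_members in new["oneofs"].items():
        old_members = old["oneofs"].get(name)
        is_new_oneof = old_members is None
        moved = sorted(
            n for n in new_members
            if n in old["fields"] and (is_new_oneof or n not in old_members)
        )
        if moved and not (is_new_oneof and len(moved) == 1):
            errors.append(f"existing fields {moved} moved into oneof {name}")
    return errors

V1 = {"fields": {1, 2}, "oneofs": {"test_oneof": {1, 2}}}
V2 = {"fields": {1}, "oneofs": {"test_oneof": {1}}}
print(check_backward(V1, V2))  # removing f2 from the oneof is flagged
print(check_backward(V2, V1))  # adding f2 to the oneof passes the backward check

BEFORE = {"fields": {1, 2, 3}, "oneofs": {"test_oneof": {1}}}
AFTER = {"fields": {1, 2, 3}, "oneofs": {"test_oneof": {1, 2, 3}}}
print(check_backward(BEFORE, AFTER))  # moving f2 and f3 into the oneof is flagged
```

A forward compatibility check in this model would simply call `check_backward` with the two versions swapped.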

The Enterprise is Made of Events, Not Things

“The world is made of events, not things.” — Carlo Rovelli

“Every company is becoming software.” — Jay Kreps

In The Order of Time, the physicist Carlo Rovelli argues that the theory of relativity compels us to view the world not as made of things or entities, but as events or processes. In the world of technology, Jay Kreps argues that the core processes that a company executes are increasingly being captured in software, and these processes consume and produce the business events that drive the company. From the viewpoint of both Rovelli and Kreps, one can view a company as made of events or processes.

The dichotomy between events and things (with state) has been noted by many. Martin Kleppmann captured it elegantly in his book Designing Data-Intensive Applications as follows:

One technique often used in Domain-Driven Design (DDD) is event sourcing, which derives application state from immutable business events. Event sourcing often involves arbitrary logic to derive the state. A more formal model would use a finite-state machine (FSM) to derive the state.

In a software architecture involving a network of FSMs, each FSM can perform state transitions when receiving events, produce events for other FSMs to consume, and persist some internal data. With FSMs, one can implement several other models, such as the following:

  • Simple CRUD entities.  This is the most common use-case for event sourcing.
  • Function as a Service (FaaS).  In this case, each FSM only has one state, and a single transition to and from that state, during which it performs the function.
  • Actor model.  In the actor model, actors receive and send events, but are otherwise passive when no events occur.
  • Intelligent agents (IA).  Intelligent agents are similar to actors in that they receive and send events, but are generally viewed as continuously active in order to achieve some goal.

In the rest of this article I’ll show how to implement a network of intelligent agents using Kafka Streams and finite-state machines.

The implementation consists of two parts:

  1. An FSM implementation that sits atop Kafka Streams, called a KMachine.  A KMachine definition consists of a set of states, a set of state transitions, some internal data, and a set of functions that can be attached to state transitions.  The entire KMachine definition can be expressed in YAML, in which the functions are written as JavaScript.
  2. A REST-based web application that can be used to create and manage both KMachine definitions and instances. A KMachine instance is created for each unique key in the input stream.

To demonstrate how KMachine can be used to implement a network of intelligent agents, I’ve borrowed an example from “Programming Game AI By Example,” by Mat Buckland.  In this example, two intelligent agents inhabit a gaming environment that represents a miners’ town in the Wild West.  One agent is a gold miner, and the other agent is the miner’s wife.

As a preview, here is sample output of the interaction between the miner and his wife:

Miner Bob: Walkin' to the goldmine
Miner Bob: Pickin' up a nugget
Elsa: Makin' the bed
Miner Bob: Pickin' up a nugget
Elsa: Makin' the bed
Miner Bob: Pickin' up a nugget
Miner Bob: Ah'm leavin' the goldmine with mah pockets full o' sweet gold
Miner Bob: Goin' to the bank. Yes siree
Miner Bob: Depositing gold. Total savings now: 3
Miner Bob: Leavin' the bank
Miner Bob: Walkin' to the goldmine
Miner Bob: Pickin' up a nugget
Elsa: Washin' the dishes
Miner Bob: Pickin' up a nugget
Elsa: Moppin' the floor
Miner Bob: Pickin' up a nugget
Miner Bob: Ah'm leavin' the goldmine with mah pockets full o' sweet gold
Miner Bob: Goin' to the bank. Yes siree
Miner Bob: Depositing gold. Total savings now: 6
Miner Bob: WooHoo! Rich enough for now. Back home to mah li'lle lady
Miner Bob: Leavin' the bank
Miner Bob: Walkin' home
Elsa: Hi honey. Let me make you some of mah fine country stew
Miner Bob: ZZZZ... 
Elsa: Putting the stew in the oven
Elsa: Fussin' over food
Miner Bob: ZZZZ... 
Elsa: Fussin' over food
Elsa: Puttin' the stew on the table
Elsa: StewReady! Lets eat
Miner Bob: All mah fatigue has drained away. Time to find more gold!
Elsa: Time to do some more housework!
Miner Bob: Walkin' to the goldmine

Both the miner and his wife are implemented as separate KMachine definitions. Here is the KMachine definition that represents the miner:

name: miner
input: miner
init: goHomeAndSleepTilRested
states:
  - name: enterMineAndDigForNugget
    onEntry: enterMineAction
    onExit: exitMineAction
  - name: visitBankAndDepositGold
    onEntry: enterBankAction
    onExit: exitBankAction
  - name: goHomeAndSleepTilRested
    onEntry: enterHomeAction
    onExit: exitHomeAction
  - name: quenchThirst
    onEntry: enterSaloonAction
    onExit: exitSaloonAction
  - name: eatStew
    onEntry: startEatingAction
    onExit: finishEatingAction
transitions:
  - type: stayInMine
    from: enterMineAndDigForNugget
    to:
    guard:
    onTransition: stayInMineAction
  - type: visitBank
    from: enterMineAndDigForNugget
    to: visitBankAndDepositGold
    guard:
    onTransition:
  - type: quenchThirst
    from: enterMineAndDigForNugget
    to: quenchThirst
    guard:
    onTransition:
  - type: goHome
    from: visitBankAndDepositGold
    to: goHomeAndSleepTilRested
    guard:
    onTransition:
  - type: enterMine
    from: visitBankAndDepositGold
    to: enterMineAndDigForNugget
    guard:
    onTransition:
  - type: enterMine
    from: goHomeAndSleepTilRested
    to: enterMineAndDigForNugget
    guard:
    onTransition:
  - type: enterMine
    from: quenchThirst
    to: enterMineAndDigForNugget
    guard:
    onTransition:
  - type: stayHome
    from: goHomeAndSleepTilRested
    to:
    guard:
    onTransition: stayHomeAction
  - type: stewReady
    from: goHomeAndSleepTilRested
    to: eatStew
    guard:
    onTransition: imComingAction
  - type: finishEating
    from: eatStew
    to: goHomeAndSleepTilRested
    guard:
    onTransition:
data:
  location: shack
  goldCarried: 0
  moneyInBank: 0
  thirst: 0
  fatigue: 0
functions:
  enterMineAction: >-
    (ctx, key, value, data) => {
      if (data.location != 'goldMine') {
        console.log("Miner " + key + ": Walkin' to the goldmine");
        data.location = 'goldMine';
      }
      ctx.sendMessage(ctx.topic(), key, { type: 'stayInMine' }, 0);
    }
  stayInMineAction: >-
    (ctx, key, value, data) => {
      data.goldCarried++;
      data.fatigue++;
      console.log("Miner " + key + ": Pickin' up a nugget");
      if (data.goldCarried >= 3) {
        ctx.sendMessage(ctx.topic(), key, { type: 'visitBank' }, 0);
      } else if (data.thirst >= 5) {
        ctx.sendMessage(ctx.topic(), key, { type: 'quenchThirst' }, 0);
      } else {
        ctx.sendMessage(ctx.topic(), key, { type: 'stayInMine' }, 1000);
      }
    }
  exitMineAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Ah'm leavin' the goldmine with mah pockets full o' sweet gold");
    }
  enterBankAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Goin' to the bank. Yes siree");
      data.location = 'bank';
      data.moneyInBank += data.goldCarried;
      data.goldCarried = 0;
      console.log("Miner " + key + ": Depositing gold. Total savings now: " + data.moneyInBank);
      if (data.moneyInBank >= 5) {
        console.log("Miner " + key + ": WooHoo! Rich enough for now. Back home to mah li'lle lady");
        ctx.sendMessage(ctx.topic(), key, { type: 'goHome' }, 0);
      } else {
        ctx.sendMessage(ctx.topic(), key, { type: 'enterMine' }, 0);
      }
    }
  exitBankAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Leavin' the bank");
    }
  enterHomeAction: >-
    (ctx, key, value, data) => {
      if (data.location != 'shack') {
        console.log("Miner " + key + ": Walkin' home");
        data.location = 'shack';
        if (data.wife) {
          ctx.sendMessage('miners_wife', data.wife, { type: 'hiHoneyImHome' }, 0);
        }
      }
      ctx.sendMessage(ctx.topic(), key, { type: 'stayHome' }, 0);
    }
  stayHomeAction: >-
    (ctx, key, value, data) => {
      if (value.wife) {
        data.wife = value.wife;
      }
      if (data.fatigue < 5) {
        console.log("Miner " + key + ": All mah fatigue has drained away. Time to find more gold!");
        data.location = 'shack';
        ctx.sendMessage(ctx.topic(), key, { type: 'enterMine' }, 0);
      } else {
        data.fatigue--;
        console.log("Miner " + key + ": ZZZZ... ");
        ctx.sendMessage(ctx.topic(), key, { type: 'stayHome' }, 1000);
      }
    }
  exitHomeAction: >-
    (ctx, key, value, data) => {
    }
  enterSaloonAction: >-
    (ctx, key, value, data) => {
      if (data.moneyInBank >= 2) {
        data.thirst = 0;
        data.moneyInBank -= 2;
        console.log("Miner " + key + ": That's mighty fine sippin liquer");
      }
      ctx.sendMessage(ctx.topic(), key, { type: 'enterMine' }, 0);
    }
  exitSaloonAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Leavin' the saloon, feelin' good");
    }
  imComingAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Okay Hun, ahm a comin'!");
    }
  startEatingAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Smells Reaaal goood Elsa!");
      console.log("Miner " + key + ": Tastes real good too!");
      ctx.sendMessage(ctx.topic(), key, { type: 'finishEating' }, 0);
    }
  finishEatingAction: >-
    (ctx, key, value, data) => {
      console.log("Miner " + key + ": Thankya li'lle lady. Ah better get back to whatever ah wuz doin'");
    }
 

The state transition diagram for the miner, generated using the DOT graph language, is shown below.

Next is the KMachine definition that represents the miner’s wife:

name: minersWife
input: miners_wife
init: doHouseWork
states:
  - name: doHouseWork
    onEntry: startHouseWorkAction
    onExit:
  - name: visitBathroom
    onEntry: enterBathroomAction
    onExit: exitBathroomAction
  - name: cookStew
    onEntry: startCookingAction
    onExit: finishCookingAction
transitions:
  - type: continueHouseWork
    from: doHouseWork
    to:
    guard:
    onTransition: continueHouseWorkAction
  - type: natureCalls
    from: doHouseWork
    to: visitBathroom
    guard:
    onTransition:
  - type: natureCalls
    from: cookStew
    to: visitBathroom
    guard:
    onTransition:
  - type: continuePrevious
    from: visitBathroom
    to: revertToPreviousState
    toType: Function
    guard:
    onTransition:
  - type: hiHoneyImHome
    from: doHouseWork
    to: cookStew
    guard:
    onTransition: hiHoneyAction
  - type: hiHoneyImHome
    from: visitBathroom
    to: cookStew
    guard:
    onTransition: hiHoneyAction
  - type: continueCooking
    from: cookStew
    to:
    guard:
    onTransition: continueCookingAction
  - type: stewReady
    from: cookStew
    to: doHouseWork
    guard:
    onTransition: letsEatAction
data:
  location: shack
  cooking: false
functions:
  startHouseWorkAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": Time to do some more housework!");
      ctx.sendMessage(ctx.topic(), key, { type: 'continueHouseWork' }, 0);
    }
  continueHouseWorkAction: >-
    (ctx, key, value, data) => {
      if (value.husband) {
        data.husband = value.husband;
      }
      switch (Math.floor(Math.random() * 3)) {
        case 0:
          console.log(key + ": Moppin' the floor");
          break;
        case 1:
          console.log(key + ": Washin' the dishes");
          break;
        case 2:
          console.log(key + ": Makin' the bed");
          break;
      }
      if (Math.random() < 0.1) {
        ctx.sendMessage(ctx.topic(), key, { type: 'natureCalls' }, 0);
      } else {
        ctx.sendMessage(ctx.topic(), key, { type: 'continueHouseWork' }, 1000);
      }
    }
  enterBathroomAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": Walkin' to the can. Need to powda mah pretty li'lle nose");
      console.log(key + ": Ahhhhhh! Sweet relief!");
      ctx.sendMessage(ctx.topic(), key, { type: 'continuePrevious' }, 0);
    }
  exitBathroomAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": Leavin' the Jon");
    }
  revertToPreviousState: >-
    (ctx, key, value, data) => {
      return data.cooking ? 'cookStew' : 'doHouseWork'
    }
  hiHoneyAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": Hi honey. Let me make you some of mah fine country stew");
    }
  startCookingAction: >-
    (ctx, key, value, data) => {
      if (!data.cooking) {
        console.log(key + ": Putting the stew in the oven");
        ctx.sendMessage(ctx.topic(), key, { type: 'stewReady' }, 2000);
        data.cooking = true;
      }
      ctx.sendMessage(ctx.topic(), key, { type: 'continueCooking' }, 0);
    }
  continueCookingAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": Fussin' over food");
      if (Math.random() < 0.1) {
        ctx.sendMessage(ctx.topic(), key, { type: 'natureCalls' }, 0);
      } else {
        ctx.sendMessage(ctx.topic(), key, { type: 'continueCooking' }, 1000);
      }
    }
  finishCookingAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": Puttin' the stew on the table");
    }
  letsEatAction: >-
    (ctx, key, value, data) => {
      console.log(key + ": StewReady! Lets eat");
      if (data.husband) {
        ctx.sendMessage('miner', data.husband, { type: 'stewReady' }, 0);
      }
      data.cooking = false;
    }
 

The state transition diagram for the miner’s wife is shown below.

Each KMachine definition, for both the miner and his wife, is entirely contained in the YAML above. A KMachine definition describes an FSM as follows:

  • name – The id of the definition for the FSM.
  • input – The input topic.
  • init – The initial state of the FSM.
  • states – The states of the FSM.  Each state can have the following:
    • name – The name of the state.
    • onEntry – The function to invoke on entry to the state.
    • onExit – The function to invoke on exit of the state.
  • transitions – The state transitions.  Each transition can have the following:
    • type – The event type that triggers the transition.
    • from – The source state.
    • to – The destination, which can be
      • The name of the destination state.
      • The function to determine the destination state.
      • null, which represents an internal state transition: the state does not change and the onEntry and onExit functions are not invoked, but the guard and onTransition functions are.
    • toType – Either “State” or “Function”.
    • guard – A boolean function to indicate whether the transition should occur.
    • onTransition – The function to invoke on transition.
  • data – A set of key-value pairs that represent the internal data for the FSM.
  • functions – A set of JavaScript functions that can be attached to states and transitions.  Each function takes the following parameters:
    • ctx – A context object that provides the following methods:
      • topic() – The input topic.
      • sendMessage() – Sends a message to a topic.
    • key – The key of the event, as a JSON message.
    • value – The value of the event, as a JSON message.  The value is expected to have a property named “type” to trigger state transitions.
    • data – The local data, which can be mutated.
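The semantics described above can be sketched as a small in-memory interpreter. This is not the KMachine implementation: it is a minimal Python sketch with no Kafka involved. The `Machine` class, its `send` and `step` methods, and the `(machine, event, data)` function signature are assumptions made for illustration. Message delays and the "Function" toType are omitted, and the initial state's onEntry is not invoked. The order of calls (onExit, then onTransition, then onEntry) is inferred from the sample output shown earlier.

```python
from collections import deque

class Machine:
    """Toy FSM interpreter over a definition shaped like the YAML above."""

    def __init__(self, definition, functions):
        self.states = {s["name"]: s for s in definition["states"]}
        self.transitions = definition["transitions"]
        self.functions = functions
        self.data = dict(definition["data"])
        self.state = definition["init"]
        self.queue = deque()

    def send(self, event):
        self.queue.append(event)  # stands in for ctx.sendMessage, without delays

    def _call(self, name, event):
        if name:
            return self.functions[name](self, event, self.data)

    def step(self):
        """Process one queued event; return True if a transition fired."""
        event = self.queue.popleft()
        for t in self.transitions:
            if t["type"] != event["type"] or t["from"] != self.state:
                continue
            if t.get("guard") and not self._call(t["guard"], event):
                continue
            target = t.get("to")
            if target is None:
                self._call(t.get("onTransition"), event)  # internal transition
            else:
                self._call(self.states[self.state].get("onExit"), event)
                self._call(t.get("onTransition"), event)
                self.state = target
                self._call(self.states[target].get("onEntry"), event)
            return True
        return False

log = []

def dig(machine, event, data):
    data["gold"] += 1
    log.append("dig")
    machine.send({"type": "visitBank" if data["gold"] >= 3 else "dig"})

def exit_mine(machine, event, data):
    log.append("leave mine")

def enter_bank(machine, event, data):
    data["bank"] += data["gold"]
    data["gold"] = 0
    log.append("deposit")

definition = {
    "init": "mine",
    "states": [
        {"name": "mine", "onExit": "exitMine"},
        {"name": "bank", "onEntry": "enterBank"},
    ],
    "transitions": [
        {"type": "dig", "from": "mine", "to": None, "onTransition": "dig"},
        {"type": "visitBank", "from": "mine", "to": "bank"},
    ],
    "data": {"gold": 0, "bank": 0},
}

machine = Machine(definition, {"dig": dig, "exitMine": exit_mine, "enterBank": enter_bank})
machine.send({"type": "dig"})
while machine.queue:
    machine.step()

print(machine.state, machine.data, log)
```

The "dig" transition has a null destination, so it runs only its onTransition function, while "visitBank" changes state and triggers the exit and entry functions.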

To see the miner and his wife in action, you’ll first need to start a local instance of Kafka. Then create two topics, one for the miner and one for his wife.

./bin/kafka-topics --create --topic miner --bootstrap-server localhost:9092
./bin/kafka-topics --create --topic miners_wife --bootstrap-server localhost:9092
 

Next, clone the project at https://github.com/rayokota/kmachines and start up the web application.

git clone https://github.com/rayokota/kmachines.git
cd kmachines
mvn clean install -DskipTests
mvn -pl kmachines-rest-app compile quarkus:dev -Dquarkus.http.port=8081
 

In a separate window, create the KMachine definitions for the miner and his wife.

cd kmachines
curl -X POST -H "Content-Type: text/yaml" --data-binary @kmachines-rest-app/src/test/resources/miner_messaging.yml "http://localhost:8081/kmachines"
curl -X POST -H "Content-Type: text/yaml" --data-binary @kmachines-rest-app/src/test/resources/miners_wife_messaging.yml "http://localhost:8081/kmachines"
 

Now produce an event to create a KMachine instance for a miner named “Bob”, and another event to create a KMachine instance for his wife named “Elsa”. This can be done with kafkacat, for example. Events are represented as a pair of JSON messages for the key and value. The key corresponds to a unique KMachine instance, while the value is the message used to trigger state transitions, which must have a “type” property to indicate the event type. In the command below, a dot (.) is used to separate the key from the value (using the -K option of kafkacat).

echo '"Bob".{ "type": "stayHome", "wife": "Elsa" }' | kafkacat -b localhost:9092 -K . -P -t miner
echo '"Elsa".{ "type": "continueHouseWork", "husband": "Bob" }' | kafkacat -b localhost:9092 -K . -P -t miners_wife
 

You should see the miner and his wife interacting as above. You can query the state of the FSM for the miner or the wife at any time.

curl -X POST -H "Content-Type: application/json" http://localhost:8081/kmachines/miner/state --data '"Bob"'
curl -X POST -H "Content-Type: application/json" http://localhost:8081/kmachines/minersWife/state --data '"Elsa"'
 

To stop the agents, run the following commands.

curl -X DELETE "http://localhost:8081/kmachines/minersWife" 
curl -X DELETE "http://localhost:8081/kmachines/miner" 
 

That’s it!


Understanding JSON Schema Compatibility

Confluent Schema Registry provides a centralized repository for an organization’s schemas and a version history of the schemas as they evolve over time.  The first format supported by Schema Registry was Avro. Avro was developed with schema evolution in mind, and its specification clearly states the rules for backward compatibility, where a schema used to read an Avro record may be different from the schema used to write it.

In addition to Avro, today Schema Registry supports both Protobuf and JSON Schema. JSON Schema does not explicitly define compatibility rules, so in this article I will explain some nuances of how compatibility works for JSON Schema.

Grammar-Based and Rule-Based Schema Languages

In general, schema languages can be either grammar-based or rule-based [1]. Grammar-based languages are used to specify the structure of a document instance. Both Avro and Protobuf are grammar-based schema languages. Rule-based languages typically specify a set of boolean constraints that the document must satisfy.

JSON Schema combines aspects of both a grammar-based language and a rule-based one. Its rule-based nature can be seen by its use of conjunction (allOf), disjunction (oneOf), negation (not), and conditional logic (if/then/else). The elements of these boolean operators tend to be grammar-based constraints, such as constraining the type of a property.

Open and Closed Content Models

A JSON Schema can be represented as a JSON object or a boolean. In the case of a boolean, the value true will match any valid JSON document, whereas the value false will match no documents. The value true is a synonym for {}, which is a JSON schema (represented as an empty JSON object) with no constraints. Likewise, the value false is a synonym for { "not": {} }.

By default, a JSON schema provides an open content model. For example, the following JSON schema constrains the properties “foo” and “bar” to be of type “string”, but allows any additional properties of arbitrary type.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  }
}

The above schema would accept the following JSON document, containing a property named “zap” that does not appear in the schema:

{ 
  "foo": "hello",
  "bar": "world",
  "zap": 123
}

In order to specify a closed content model, in which additional properties such as “zap” would not be accepted, the schema can be specified with “additionalProperties” as false.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "additionalProperties": false
}
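The difference between the open and closed content models can be checked with a tiny validator. The sketch below is not a JSON Schema library: it is a hypothetical helper, `is_valid`, that understands only boolean schemas, "type", "properties", and "additionalProperties", which is enough to exercise the two schemas above.

```python
# Minimal validator for a small subset of JSON Schema (illustration only).
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

def is_valid(schema, doc):
    if isinstance(schema, bool):       # true matches everything, false nothing
        return schema
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](doc):
        return False
    if isinstance(doc, dict):
        props = schema.get("properties", {})
        additional = schema.get("additionalProperties", True)  # open by default
        for name, value in doc.items():
            sub = props[name] if name in props else additional
            if not is_valid(sub, value):
                return False
    return True

OPEN = {
    "type": "object",
    "properties": {"foo": {"type": "string"}, "bar": {"type": "string"}},
}
CLOSED = dict(OPEN, additionalProperties=False)

doc = {"foo": "hello", "bar": "world", "zap": 123}
print(is_valid(OPEN, doc))    # True
print(is_valid(CLOSED, doc))  # False
```

Because "additionalProperties" may itself be a schema in this sketch, the same function also covers a value such as { "type": "string" } in place of true or false.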

Backward, Forward, and Full Compatibility

In terms of schema evolution, there are three types of compatibility [2]:

  1. Backward compatibility – all documents that conform to the previous version of the schema are also valid according to the new version
  2. Forward compatibility – all documents that conform to the new version are also valid according to the previous version of the schema
  3. Full compatibility – the previous version of the schema and the new version are both backward compatible and forward compatible

Call the open content model above Schema 1 and the closed content model Schema 2. Schema 1 is backward compatible with Schema 2, which implies that Schema 2 is forward compatible with Schema 1. This is because any document that conforms to Schema 2 will also conform to Schema 1. Since the default value of “additionalProperties” is true, Schema 1 is equivalent to

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "additionalProperties": true
}

Note that the schema true is backward compatible with false. In fact

  1. The schema true (or {}) is backward compatible with all schemas.
  2. The only schema backward compatible with true is true.
  3. The schema false (or { "not": {} }) is forward compatible with all schemas.
  4. The only schema forward compatible with false is false.
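These definitions amount to set inclusion between the documents each schema accepts. The sketch below tests that inclusion over a finite sample of documents, so it can refute compatibility but cannot prove it. The predicates standing in for schemas and the function names are hypothetical, chosen only for this illustration.

```python
def backward_compatible(old_accepts, new_accepts, samples):
    """True if every sample accepted by the old schema is accepted by the new one."""
    return all(new_accepts(doc) for doc in samples if old_accepts(doc))

def forward_compatible(old_accepts, new_accepts, samples):
    """A forward check is a backward check with the schemas switched."""
    return backward_compatible(new_accepts, old_accepts, samples)

# Hand-written stand-ins for the open and closed content models shown earlier.
def open_model(doc):
    return isinstance(doc, dict) and all(
        isinstance(doc[k], str) for k in ("foo", "bar") if k in doc
    )

def closed_model(doc):
    return open_model(doc) and set(doc) <= {"foo", "bar"}

accept_all = lambda doc: True   # the schema true
reject_all = lambda doc: False  # the schema false

samples = [
    {},
    {"foo": "hello"},
    {"foo": "hello", "bar": "world"},
    {"foo": "hello", "zap": 123},
    {"foo": 1},
]

print(backward_compatible(closed_model, open_model, samples))  # True
print(backward_compatible(open_model, closed_model, samples))  # False
print(backward_compatible(open_model, accept_all, samples))    # True
print(forward_compatible(open_model, reject_all, samples))     # True
```

The last two lines mirror the observations about true and false: a schema that accepts everything can read any older data, and a schema that accepts nothing vacuously produces only data that any older schema can read.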

Partially Open Content Models

You may want to allow additional unspecified properties, but only of a specific type. In these scenarios, you can use a partially open content model. One way to specify a partially open content model is to specify a schema other than true or false for “additionalProperties”.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "additionalProperties": { "type": "string" }
}

The above schema would accept a document containing a string value for “zap”:

{ 
  "foo": "hello",
  "bar": "world",
  "zap": "champ"
}

but not a document containing an integer value for “zap”:

{ 
  "foo": "hello",
  "bar": "world",
  "zap": 123
}
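To make the behavior above concrete, here is a minimal sketch of a validator in Python. It is not a real JSON Schema implementation: it only understands "properties", "additionalProperties", boolean schemas, and the types "string" and "integer", and the function names are my own.

```python
def type_ok(value, schema):
    """Check a value against a boolean schema or a {"type": ...} schema."""
    if schema is True:
        return True
    if schema is False:
        return False
    expected = schema["type"]
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unsupported type: {expected}")


def validate(doc, schema):
    """Validate an object using only `properties` and `additionalProperties`."""
    props = schema.get("properties", {})
    additional = schema.get("additionalProperties", True)  # open by default
    for name, value in doc.items():
        # Declared properties use their own schema; all others fall back
        # to the schema given for additionalProperties.
        if not type_ok(value, props.get(name, additional)):
            return False
    return True


schema_4 = {
    "type": "object",
    "properties": {"foo": {"type": "string"}, "bar": {"type": "string"}},
    "additionalProperties": {"type": "string"},
}

print(validate({"foo": "hello", "bar": "world", "zap": "champ"}, schema_4))  # True
print(validate({"foo": "hello", "bar": "world", "zap": 123}, schema_4))      # False
```

Replacing the value of "additionalProperties" with True or False in this sketch gives the open and closed content models, respectively.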

Later one could explicitly specify “zap” as a property with type “string”:

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" },
    "zap": { "type": "string" }
  },
  "additionalProperties": { "type": "string" }
}

Schema 5 is backward compatible with Schema 4.

One could even accept other types for “zap”, using a oneOf for example.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" },
    "zap": { 
      "oneOf": [ { "type": "string" }, { "type": "integer" } ] 
    }
  },
  "additionalProperties": { "type": "string" }
}

Schema 6 is also backward compatible with Schema 4.

Another type of partially open content model is one that constrains the additional properties with a regular expression for matching the property name, using a patternProperties construct.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "patternProperties": {
    "^s_": { "type": "string" }
  },
  "additionalProperties": false
}

The above schema allows properties other than “foo” and “bar” to appear, as long as the property name starts with “s_” and the type is “string”.
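As a rough sketch of how a validator decides which subschema applies to a given property name, consider the following Python function. The function name is hypothetical, and the sketch assumes the usual JSON Schema behavior in which patterns are unanchored and “additionalProperties” applies only to names matched by neither “properties” nor “patternProperties”.

```python
import re


def applicable_schemas(name, schema):
    """Return the subschemas that a property with this name must satisfy."""
    matched = []
    if name in schema.get("properties", {}):
        matched.append(schema["properties"][name])
    for pattern, subschema in schema.get("patternProperties", {}).items():
        # Patterns are not implicitly anchored, hence search() and not match()
        if re.search(pattern, name):
            matched.append(subschema)
    if not matched:
        # Only names matched by nothing else fall through to additionalProperties
        matched.append(schema.get("additionalProperties", True))
    return matched


schema_7 = {
    "type": "object",
    "properties": {"foo": {"type": "string"}, "bar": {"type": "string"}},
    "patternProperties": {"^s_": {"type": "string"}},
    "additionalProperties": False,
}

print(applicable_schemas("foo", schema_7))    # [{'type': 'string'}]
print(applicable_schemas("s_zap", schema_7))  # [{'type': 'string'}]
print(applicable_schemas("zap", schema_7))    # [False]
```

A name such as “zap” falls through to “additionalProperties”, which is false here, so a document containing it would be rejected.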

Understanding Full Compatibility

When evolving a schema in a backward compatible manner, it’s easy to add properties to a closed content model, or to remove properties from an open content model. In general, there are two rules to follow to evolve a schema in a backward compatible manner:

  1. When adding a property in a backward compatible manner, the schema of the property being added must be backward compatible with the schema of “additionalProperties” in the previous version of the schema.
  2. When removing a property in a backward compatible manner, the schema of “additionalProperties” in the new version of the schema must be backward compatible with the schema of the property being removed.

The rules for forward compatibility are similar.

  1. When adding a property in a forward compatible manner, the schema of the property being added must be forward compatible with the schema of “additionalProperties” in the previous version of the schema.
  2. When removing a property in a forward compatible manner, the schema of “additionalProperties” in the new version of the schema must be forward compatible with the schema of the property being removed.
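The rules above can be sketched in Python over a deliberately tiny schema language. This is an illustration under stated assumptions, not how any real compatibility checker works: a schema is either a boolean, a single {"type": ...}, or a oneOf whose branches have disjoint types, and “number” is left out because it overlaps with “integer”. All names are my own.

```python
ALL_TYPES = frozenset({"string", "integer", "boolean", "null", "object", "array"})


def accepted_types(schema):
    """Map a schema in the tiny language to the set of JSON types it accepts."""
    if schema is True or schema == {}:
        return ALL_TYPES
    if schema is False:
        return frozenset()
    if "oneOf" in schema:
        # Assumes the branches are disjoint, so oneOf behaves like a union
        return frozenset().union(*(accepted_types(s) for s in schema["oneOf"]))
    return frozenset({schema["type"]})


def backward_compatible(new, old):
    """Everything valid under `old` is also valid under `new`."""
    return accepted_types(old) <= accepted_types(new)


def forward_compatible(new, old):
    """Everything valid under `new` is also valid under `old`."""
    return accepted_types(new) <= accepted_types(old)


def fully_compatible(new, old):
    return backward_compatible(new, old) and forward_compatible(new, old)


string = {"type": "string"}

# Adding a property: compare its schema with the old additionalProperties.
print(backward_compatible(string, False))  # True: closed model, any property can be added
print(backward_compatible(string, True))   # False: open model, only `true` can be added
print(forward_compatible(string, True))    # True: open model, any property can be added

# Removing a property: compare the new additionalProperties with its schema.
print(backward_compatible(True, string))   # True: open model, any property can be removed
print(fully_compatible(string, string))    # True
```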

For example, to add a property to an open content model, such as Schema 3, in a backward compatible manner, one can add it with type true, since true is the only schema that is backward compatible with true, as previously mentioned.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" },
    "zap": true
  },
  "additionalProperties": true
}

The property “zap” has been added, but it’s been specified with type true, which means that it can match any valid JSON. Schema 3 and Schema 8 are also fully compatible, since they both accept the same set of documents.

This leads to a way to evolve a closed content model, such as Schema 2, in a fully compatible manner, by adding a property of type false.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" },
    "zap": false
  },
  "additionalProperties": false
}

Admittedly, Schema 9 is not very interesting, because in this case the property “zap” matches nothing.

The rules for full compatibility can now be stated as follows.

  1. When adding a property in a fully compatible manner, the schema of the property being added must be fully compatible with the schema of “additionalProperties” in the previous version of the schema.
  2. When removing a property in a fully compatible manner, the schema of “additionalProperties” in the new version of the schema must be fully compatible with the schema of the property being removed.

Using Partially Open Content Models for Full Compatibility

The previous examples of full compatibility are of limited use, since they only allow new properties to match anything using true, in the case of an open content model, or to match nothing using false, in the case of a closed content model. To achieve full compatibility in a meaningful manner, one can use a partially open content model, such as Schema 4, which I repeat below.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "additionalProperties": { "type": "string" }
}

Schema 4 allows one to add and remove properties of type “string” in a fully compatible manner. What if you want to add properties of either type “string” or “integer”? You could specify additionalProperties with a oneOf, as in the following schema:

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "additionalProperties": { 
    "oneOf": [ { "type": "string" }, { "type": "integer" } ] 
  }
}

But with the above schema, every fully compatible schema that adds a new property would have to specify the type of the property as a oneOf as well:

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" },
    "zap": { 
      "oneOf": [ { "type": "string" }, { "type": "integer" } ] 
    }
  },
  "additionalProperties": { 
    "oneOf": [ { "type": "string" }, { "type": "integer" } ] 
  }
}

An alternative would be to use patternProperties. The rules in the previous section regarding adding and removing properties do not apply when using patternProperties.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" }
  },
  "patternProperties": {
    "^s_": { "type": "string" },
    "^i_": { "type": "integer" }
  },
  "additionalProperties": false
}

With Schema 12, one can add properties of type “string” that start with “s_”, or properties of type “integer” that start with “i_”, in a fully compatible manner, as shown below with “s_zap” and “i_zap”.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "string" },
    "s_zap": { "type": "string" },
    "i_zap": { "type": "integer" }
  },
  "patternProperties": {
    "^s_": { "type": "string" },
    "^i_": { "type": "integer" }
  },
  "additionalProperties": false
}

Achieving full compatibility in a meaningful way is possible, but requires some up-front planning, possibly with the use of patternProperties.

Summary

JSON Schema is unique when compared to other schema languages like Avro and Protobuf in that it has aspects of both a rule-based language and a grammar-based language. A better understanding of open, closed, and partially-open content models can help you when evolving schemas in a backward, forward, or fully compatible manner.

My 10 Favorite Computer Science Books

I’ve seen other lists of favorite books over the last few months, so I thought I’d jump into the fray.  If I were stuck on an island (or in my room during a pandemic), here are the ten computer science books that I would want by my side.  These ten books also provide a great overview of the entire field of computer science.

Structure and Interpretation of Computer Programs

I was fortunate enough to take the course for which this book was developed when I was an undergraduate at MIT.   Since then this book has become a classic.  If you can invest the time, the text has the potential to expand your mind and make you a better programmer.

Artificial Intelligence: A Modern Approach

When I was at MIT, the textbook for the Artificial Intelligence class was written by Professor Winston.  While that was a great book, since then the above textbook by Russell and Norvig has become the bible for AI, and for good reason.  Not only is the text comprehensive, but it is extremely clear and easy to read.

The Language of Machines

I met Professor Floyd while working toward a Master’s degree at Stanford.  He taught an advanced class on automata and computability that later led to the above textbook.  The text is unique in that it introduces a unified model of computation that ties together seemingly disparate concepts.  The class was also memorable in that Professor Floyd was one of the kindest, gentlest teachers I have ever known.  Unfortunately he passed away in 2001.

Abstraction and Specification in Program Development

Professor Liskov is famous for the Liskov Substitution Principle, as well as being a Turing Award winner.  Professor Guttag was my undergraduate advisor, and later became head of the Electrical Engineering and Computer Science department at MIT.  While this is probably the least known book on my list, its influence is greater than is recognized. Although the book uses the CLU programming language to convey its ideas, the ideas were carried over to a subsequent textbook, Program Development in Java.   The above textbook also influenced Introduction to Computation and Programming in Python, which is the foundational text for today’s MIT undergraduate program in computer science.

Compilers: Principles, Techniques, and Tools

The above book is often simply referred to as the Dragon Book.  It remains the bible for compiler theory.  Professor Lam taught an advanced compiler course that I took at Stanford.

The Design and Implementation of the FreeBSD OS

The above textbook is not only comprehensive, but easy to read.  It gives an in-depth view of an influential operating system that continues to be relevant today.

Introduction to Algorithms

When I was at MIT, the authors were just starting to develop this text.  From the chapter notes that were provided during class, I could already tell that a great textbook was taking shape.  Of course, this book is now considered the bible of its field.

TCP/IP Illustrated, Volume 1

This is probably the best book from which to learn about modern day networking.  The level of depth is unmatched and the text continues to reward those who return to it again and again.  W. Richard Stevens passed away in 1999, but fortunately his books continue to be revised for today’s readers.

Transaction Processing: Concepts and Techniques

The authors introduced the ACID (atomicity, consistency, isolation, durability) properties of database transactions, and this book is a comprehensive summary of their work.  The database community lost a true giant when Jim Gray was lost at sea in 2007, while on a short trip to scatter his mother’s ashes near San Francisco.

Concrete Mathematics: A Foundation for Computer Science

This text started as an expanded treatment of the “Mathematical Preliminaries” chapter of The Art of Computer Programming.  While I have not yet read The Art of Computer Programming, I was fortunate enough to take the Stanford course that used the above book, taught by Professor Knuth.  The book provides a more leisurely introduction to the mathematical analysis of algorithms, in a manner that is both challenging and fun.

 

Keta: A Metadata Store Backed By Apache Kafka

Recently I added the ability for KCache to be configured with different types of backing caches. KCache, by providing an ordered key-value abstraction for a compacted topic in Kafka, can be used as a foundation for a highly-available service for storing metadata, similar to ZooKeeper, etcd, and Consul.

For such a metadata store, I wanted the following features:

  • Ordered key-value model
  • APIs available through gRPC
  • Transactional support using multiversion concurrency control (MVCC)
  • High availability through leader election and failover
  • Ability to be notified of changes to key-value tuples
  • Ability to expire key-value tuples

It turns out that etcd already has these features, so I decided to use the same gRPC APIs and data model as in etcd v3.  In addition, a comprehensive set of Jepsen tests has been built for etcd, so by using the same APIs, I could make use of the same Jepsen tests.

The resulting system is called Keta1.

Hello, Keta

By adopting the etcd v3 APIs, Keta can be used by any client that supports these APIs. Etcd clients are available in Go, Java, Python, JavaScript, Ruby, C++, Erlang, and .NET.  In addition, Keta can be used with the etcdctl command line client that ships with etcd.

To get started with Keta, download a release, unpack it, and then modify config/keta.properties to point to an existing Kafka broker.  Then run the following:

$ bin/keta-start config/keta.properties
 

Next download etcd as described here. At a separate terminal, start etcdctl:

$ etcdctl put mykey "this is awesome"
$ etcdctl get mykey
 

The etcd APIs have a concise way of expressing transactions.

$ etcdctl put user1 bad
$ etcdctl txn --interactive

compares:
value("user1") = "bad"      

success requests (get, put, delete):
del user1  

failure requests (get, put, delete):
put user1 good
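
The semantics of the transaction above can be modeled with a few lines of Python. This is only a toy model over an in-memory dict, with names of my own choosing. It ignores revisions, MVCC, and everything else that makes a real transaction atomic across clients, and it supports only equality comparisons on values.

```python
def txn(store, compares, success, failure):
    """Run `success` ops if every comparison holds, otherwise `failure` ops."""
    ok = all(store.get(key) == expected for key, expected in compares)
    for op, key, *args in (success if ok else failure):
        if op == "put":
            store[key] = args[0]
        elif op == "del":
            store.pop(key, None)
    return ok


store = {"user1": "bad"}

# compares: value("user1") = "bad"; success: del user1; failure: put user1 good
succeeded = txn(
    store,
    compares=[("user1", "bad")],
    success=[("del", "user1")],
    failure=[("put", "user1", "good")],
)
print(succeeded, store)  # True {}
```

Running the same transaction a second time takes the failure branch, since “user1” no longer has the value “bad”, and leaves the key set to “good”.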
 

To expire key-value tuples, use a lease.

$ etcdctl lease grant 300
# lease 2be7547fbc6a5afa granted with TTL(300s)

$ etcdctl put sample value --lease=2be7547fbc6a5afa
$ etcdctl get sample

$ etcdctl lease keep-alive 2be7547fbc6a5afa
$ etcdctl lease revoke 2be7547fbc6a5afa
# or after 300 seconds
$ etcdctl get sample
 

To receive change notifications, use a watch.

$ etcdctl watch stock --prefix

Then at a separate terminal, enter the following:

$ etcdctl put stock1 10
$ etcdctl put stock2 20

If you prefer a GUI, you can use etcdmanager when working with Keta.

Leader Election using the Magical Rebalance Protocol

Keta achieves high availability by allowing any number of Keta instances to be run as a cluster. One instance is chosen as the leader, and all other instances act as followers. The followers will forward both reads and writes to the leader. If the leader dies, another leader is chosen.

Leader election is accomplished by using the rebalance protocol of Kafka, which is the same protocol that is used to assign topic-partitions to consumers in a consumer group.

Jepsen-Driven Development

As mentioned, one nice aspect of using the etcd APIs is that a set of Jepsen tests is already available. This allowed me to use Jepsen-Driven Development (JDD) when developing Keta, which is like Test-Driven Development (TDD), but on steroids.

Jepsen is an awesome framework written in Clojure for testing (or more like breaking) distributed systems. It comes with an in-depth tutorial for writing new Jepsen tests.

I was able to modify the existing Jepsen tests for etcd by having the tests install and start Keta instead of etcd. The client code in the test, which uses the native Java client for etcd, remained untouched. The modified tests can be found here.

I was able to successfully run three types of etcd tests:

  1. A set test, which uses a compare-and-set transaction to concurrently read a set of integers from a single key and append a value to that set.   This test is designed to measure stale reads.
  2. An append test, which uses transactions to concurrently read and append to lists of unique integers.  This test is designed to verify strict serializability.
  3. A register test, which concurrently performs randomized reads, writes, and compare-and-set operations over single keys.  This test is designed to verify linearizability.

Jepsen has a built-in component called nemesis that can inject faults into the system during test runs. For the etcd tests, nemesis was used to kill the leader and to create network partitions.

One challenge of running Keta with Jepsen is that leader election using the rebalance protocol can take several seconds, whereas leader election in etcd, which uses Raft, only takes a second or less. This means that the number of unsuccessful requests is higher in the tests when using Keta than when using etcd, but this is to be expected.

In any case, Keta passes all of the above tests.2

# Run the Jepsen set test
$ lein run test --concurrency 2n --workload set --nemesis kill,partition
...
Everything looks good! ヽ(‘ー`)ノ

# or sometimes
...
Errors occurred during analysis, but no anomalies found. ಠ~ಠ
 

What’s Not in the Box

Keta only provides a subset of the functionality of etcd and so is not a substitute for it. In particular, it is missing

  • A lock service for clients
  • An election service for clients
  • An immutable version history for key-value tuples
  • Membership reconfiguration

For example, Keta only keeps the latest value for a specific key, and not its prior history as is available with etcd.  However, if you’re interested in a highly-available transactional metadata store backed by Apache Kafka that provides most of the features of etcd, please give Keta a try.

Using KCache with a Persistent Cache

KCache is a library that provides an ordered key-value store (OKVS) abstraction for a compacted topic in Kafka. As an OKVS, it can be used to treat Kafka as a multi-model database, allowing Kafka to represent graphs, documents, and relational data.

Initially KCache stored data from the compacted topic in an in-memory cache. In newer releases, KCache can be configured to use a persistent cache that stores data to disk. This allows KCache to handle larger data sets, and also improves startup times. The persistent cache is itself an embedded OKVS, and can be configured to be one of the following implementations:

  • Berkeley DB JE
  • LMDB
  • MapDB
  • RocksDB

Here is a quick comparison of the different embedded OKVS libraries before I go into more detail.

Embedded OKVS   Data Structure   Language   Transactions   Secondary Indexes   License
BDB JE          B+ tree          Java       Yes            Yes                 Apache
LMDB            B+ tree          C          Yes            No                  BSD-like
MapDB           B+ tree          Java       Yes            No                  Apache
RocksDB         LSM tree         C++        Yes            No                  Apache

Below are additional details on the various libraries.  I also add some historical notes that I find interesting.

Berkeley DB JE

Berkeley DB JE is the Java Edition of the Berkeley DB library.  It is similar to, but not compatible with, the C edition that predates it.  The C edition is simply referred to as Berkeley DB.

Berkeley DB grew out of efforts at the University of California, Berkeley as part of BSD to replace the popular dbm library that existed in AT&T Unix, due to patent issues.  It was first released in 1991.

Berkeley DB JE is the core library in Oracle NoSQL, which extends the capabilities of Berkeley DB JE to a sharded, clustered environment.  Berkeley DB JE supports transactions, and is unique in that it also supports secondary indexes.  It has additional advanced features such as replication and hot backups.  Internally it uses a B+ tree to store data.

LMDB

LMDB, short for Lightning Memory-Mapped Database, is another OKVS that uses the B+ tree data structure.  It was initially designed in 2009 to replace Berkeley DB in the OpenLDAP project.

LMDB supports transactions but not secondary indexes.  LMDB uses copy-on-write semantics, which allow it to avoid using a transaction log.

MapDB

MapDB is a pure Java implementation of an OKVS.  It evolved from a project started in 2001 called JDBM, which was meant to be a pure Java implementation of the dbm library in AT&T Unix. MapDB provides several collection APIs, including maps, sets, lists, and queues.

MapDB uses a B+ tree data structure and supports transactions, but not secondary indexes.  MapDB also supports snapshots and incremental backups.

RocksDB

RocksDB was created by Facebook in 2012 as a fork of LevelDB.  LevelDB is a library created by Google in 2011 based on ideas from BigTable, the inspiration for HBase.  Both BigTable and HBase can be viewed as distributed OKVSs.

Unlike the OKVSs mentioned above, RocksDB uses an LSM tree to store data.  It supports different compaction styles for merging SST files.  It adds many features that do not exist in LevelDB, including column families, transactions, backups, and checkpoints.  RocksDB is written in C++.

Selecting a Persistent Cache

When selecting a persistent cache for KCache, the first consideration is whether your application is read-heavy vs write-heavy.  In general, an OKVS based on a B+ tree is faster for reads, while one based on an LSM tree is faster for writes.  There’s a good discussion of the pros and cons of B+ trees and LSM trees in Chapter 3 of Designing Data-Intensive Applications, by Martin Kleppmann.

For further performance comparisons, the LMDB project has some good benchmarks here, although they don’t include Berkeley DB JE.  I’ve ported the LMDB benchmarks for KCache and included Berkeley DB JE, so that you can try the benchmarks for yourself on your platform of choice.

Putting Several Event Types in the Same Topic – Revisited

The following post originally appeared in the Confluent blog on July 8, 2020.

In the article Should You Put Several Event Types in the Same Kafka Topic?, Martin Kleppmann discusses when to combine several event types in the same topic and introduces new subject name strategies for determining how Confluent Schema Registry should be used when producing events to an Apache Kafka® topic.

Schema Registry now supports schema references in Confluent Platform 5.5, and this blog post presents an alternative means of putting several event types in the same topic using schema references, discussing the advantages and disadvantages of this approach.

Constructs and constraints

Apache Kafka, which is an event streaming platform, can also act as a system of record or a datastore, as seen with ksqlDB. Datastores are composed of constructs and constraints. For example, in a relational database, the constructs are tables and rows, while the constraints include primary key constraints and referential integrity constraints. Kafka does not impose constraints on the structure of data, leaving that role to Confluent Schema Registry. Below are some constructs when using both Kafka and Schema Registry:

  • Message: a data item that is made up of a key (optional) and value
  • Topic: a collection of messages, where ordering is maintained for those messages with the same key (via underlying partitions)
  • Schema (or event type): a description of how data should be structured
  • Subject: a named, ordered history of schema versions

The following are some constraints that are maintained when using both Kafka and Schema Registry:

  • Schema-message constraints: A schema constrains the structure of the message. The key and value are typically associated with different schemas. The association between a schema and the key or value is embedded in the serialized form of the key or value.
  • Subject-schema constraints: A subject constrains the ordered history of schema versions, also known as the evolution of the schema. This constraint is called a compatibility level. The compatibility level is stored in Schema Registry along with the history of schema versions.
  • Subject-topic constraints: When using the default TopicNameStrategy, a subject can constrain the collection of messages in a topic. The association between the subject and the topic is by convention, where the subject name is {topic}-key for the message key and {topic}-value for the message value.

Using Apache Avro™ unions before schema references

As mentioned, the default subject name strategy, TopicNameStrategy, uses the topic name to determine the subject to be used for schema lookups, which helps to enforce subject-topic constraints. The newer subject-name strategies, RecordNameStrategy and TopicRecordNameStrategy, use the record name (along with the topic name for the latter strategy) to determine the subject to be used for schema lookups. Before these newer subject-name strategies were introduced, there were two options for storing multiple event types in the same topic:

  • Disable subject-schema constraints by setting the compatibility level of a subject to NONE and allowing any schema to be saved in the subject, regardless of compatibility
  • Use an Avro union

The second option of using an Avro union was preferred, but still had the following issues:

  • The resulting Avro union could become unwieldy
  • It was difficult to independently evolve the event types contained within the Avro union

By using either RecordNameStrategy or TopicRecordNameStrategy, you retain subject-schema constraints, eliminate the need for an Avro union, and gain the ability to evolve types independently. However, you lose subject-topic constraints, as now there is no constraint on the event types that can be stored in the topic, which means the set of event types in the topic can grow unbounded.
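
As a rough illustration of how the three strategies differ, the following Python functions derive a subject name from a topic and a record name. The function names and signatures are my own; the real strategies are Java classes configured on the serializer. The topic name “all-types” is just an example.

```python
def topic_name_strategy(topic, record_name, is_key=False):
    """Subject depends only on the topic: {topic}-key or {topic}-value."""
    return f"{topic}-{'key' if is_key else 'value'}"


def record_name_strategy(topic, record_name, is_key=False):
    """Subject depends only on the fully qualified record name."""
    return record_name


def topic_record_name_strategy(topic, record_name, is_key=False):
    """Subject combines the topic and the record name."""
    return f"{topic}-{record_name}"


record = "io.confluent.examples.avro.Customer"
print(topic_name_strategy("all-types", record))
# all-types-value
print(record_name_strategy("all-types", record))
# io.confluent.examples.avro.Customer
print(topic_record_name_strategy("all-types", record))
# all-types-io.confluent.examples.avro.Customer
```

With the first strategy, every event type written to the topic must live in the same subject, which is what gives rise to the subject-topic constraint. With the other two, each event type gets its own subject, so nothing ties the set of event types to the topic.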

Using Avro unions with schema references

Introduced in Confluent Platform 5.5, a schema reference consists of:

  • A reference name: part of the schema that refers to an entirely separate schema
  • A subject and version: used to identify and look up the referenced schema

When registering a schema to Schema Registry, an optional set of references can be specified, such as this Avro union containing reference names:

[
  "io.confluent.examples.avro.Customer",
  "io.confluent.examples.avro.Product",
  "io.confluent.examples.avro.Order"
]

When registering this schema to Schema Registry, an array of reference versions is also sent, which might look like the following:

[
  { 
    "name": "io.confluent.examples.avro.Customer",
    "subject": "customer",
    "version": 1
  },
  {
    "name": "io.confluent.examples.avro.Product",
    "subject": "product",
    "version": 1
  },
  {
    "name": "io.confluent.examples.avro.Order",
    "subject": "order",
    "version": 1
  }
]

As you can see, the Avro union is no longer unwieldy. It is just a list of event types that will be sent to a topic. The event types can evolve independently, similar to when using RecordNameStrategy and TopicRecordNameStrategy. Plus, you regain subject-topic constraints, which were missing when using the newer subject name strategies.
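
For those registering through the REST API, here is a sketch in Python of how the union and its references might be combined into a single request body. It only builds the payload and does not send it. The helper name is my own, and I am assuming the body would be posted to the versions endpoint of whatever subject holds the union, for example a subject named after the topic.

```python
import json

union_schema = [
    "io.confluent.examples.avro.Customer",
    "io.confluent.examples.avro.Product",
    "io.confluent.examples.avro.Order",
]

references = [
    {"name": "io.confluent.examples.avro.Customer", "subject": "customer", "version": 1},
    {"name": "io.confluent.examples.avro.Product", "subject": "product", "version": 1},
    {"name": "io.confluent.examples.avro.Order", "subject": "order", "version": 1},
]


def build_registration_request(schema, references, schema_type="AVRO"):
    """Build the JSON body for registering a schema along with its references."""
    return {
        # The schema itself is sent as a string, not as nested JSON
        "schema": json.dumps(schema),
        "schemaType": schema_type,
        "references": references,
    }


body = build_registration_request(union_schema, references)
print(json.dumps(body, indent=2))
```

Each reference name must match a name used inside the schema, which is how Schema Registry knows which subject and version to consult when resolving it.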

However, in order to take advantage of these newfound gains, you need to configure your serializers a little differently. This has to do with the fact that when an Avro object is serialized, the schema associated with the object is not the Avro union, but just the event type contained within the union. When the Avro serializer is given the Avro object, it will either try to register the event type as a newer schema version than the union (if auto.register.schemas is true), or try to find the event type in the subject (if auto.register.schemas is false), which will fail. Instead, you want the Avro serializer to use the Avro union for serialization and not the event type. In order to accomplish this, set these two configuration properties on the Avro serializer:

  • auto.register.schemas=false
  • use.latest.version=true

Setting auto.register.schemas to false disables automatic registration of the event type, so that it does not override the union as the latest schema in the subject. Setting use.latest.version to true causes the Avro serializer to look up the latest schema version in the subject (which will be the union) and use that for serialization; otherwise, if set to false, the serializer will look for the event type in the subject and fail to find it.

Using JSON Schema and Protobuf with schema references

Now that Confluent Platform supports both JSON Schema and Protobuf, both RecordNameStrategy and TopicRecordNameStrategy can be used with these newer schema formats as well. In the case of JSON Schema, the equivalent of the name of the Avro record is the title of the JSON object. In the case of Protobuf, the equivalent is the name of the Protobuf message.

Also like Avro, instead of using the newer subject-name strategies to combine multiple event types in the same topic, you can use unions. The Avro union from the previous section can also be modeled in JSON Schema, where it is referred to as a "oneOf":

{
  "oneOf": [
     { "$ref": "Customer.schema.json" },
     { "$ref": "Product.schema.json" },
     { "$ref": "Order.schema.json" }
  ]
}

In the above schema, the array of reference versions that would be sent might look like this:

[
  { 
    "name": "Customer.schema.json",
    "subject": "customer",
    "version": 1
  },
  {
    "name": "Product.schema.json",
    "subject": "product",
    "version": 1
  },
  {
    "name": "Order.schema.json",
    "subject": "order",
    "version": 1
  }
]

As with Avro, automatic registration of JSON schemas that contain a top-level oneof won’t work, so you should configure the JSON Schema serializer in the same manner as the Avro serializer, with auto.register.schemas set to false and use.latest.version set to true, as described in the previous section.

In Protobuf, top-level oneofs are not permitted, so you need to wrap the oneof in a message:

syntax = "proto3";

package io.confluent.examples.proto;

import "Customer.proto";
import "Product.proto";
import "Order.proto";

message AllTypes {
    oneof oneof_type {
        Customer customer = 1;
        Product product = 2;
        Order order = 3;
    }
}

Here are the corresponding reference versions that could be sent with the above schema:

[
  { 
    "name": "Customer.proto",
    "subject": "customer",
    "version": 1
  },
  {
    "name": "Product.proto",
    "subject": "product",
    "version": 1
  },
  {
    "name": "Order.proto",
    "subject": "order",
    "version": 1
  }
]

One advantage of wrapping the oneof with a message is that automatic registration of the top-level schema will work properly. In the case of Protobuf, all referenced schemas will also be auto registered, recursively.

You can do something similar with Avro by wrapping the union with an Avro record:

{
 "type": "record",
 "namespace": "io.confluent.examples.avro",
 "name": "AllTypes",
 "fields": [
   {
     "name": "oneof_type",
     "type": [
       "io.confluent.examples.avro.Customer",
       "io.confluent.examples.avro.Product",
       "io.confluent.examples.avro.Order"
     ]
   }
 ]
}

This extra level of indirection allows automatic registration of the top-level Avro schema to work properly. However, unlike Protobuf, with Avro, the referenced schemas still need to be registered manually beforehand, as the Avro object does not have the necessary information to allow referenced schemas to be automatically registered.

Wrapping a oneof with a JSON object won’t work with JSON Schema, since a POJO being serialized to JSON doesn’t have the requisite metadata. Instead, optionally annotate the POJO with a @Schema annotation to provide the complete top-level JSON Schema to be used for both automatic registration and serialization. As with Avro, and unlike Protobuf, referenced schemas need to be registered manually beforehand.

Getting started with schema references

Schema references are a means of modularizing a schema and its dependencies. While this article shows how to use them with unions, they can be used more generally to model the following:

  • Nested records in Avro
  • import statements in Protobuf
  • $ref statements in JSON Schema

As mentioned in the previous section, if you’re using Protobuf, the Protobuf serializer can automatically register the top-level schema and all referenced schemas, recursively, when given a Protobuf object. This is not possible with the Avro and JSON Schema serializers. With those schema formats, you must first manually register the referenced schemas and then the top-level schema. Manual registration can be accomplished with the REST APIs or with the Schema Registry Maven Plugin.

As an example of using the Schema Registry Maven Plugin, below are schemas specified for the subjects named all-types-value, customer, and product in a Maven POM.

<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>${confluent.version}</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://127.0.0.1:8081</param>
    </schemaRegistryUrls>
    <subjects>
      <all-types-value>src/main/avro/AllTypes.avsc</all-types-value>
      <customer>src/main/avro/Customer.avsc</customer>
      <product>src/main/avro/Product.avsc</product>
    </subjects>
    <schemaTypes>
      <all-types-value>AVRO</all-types-value>
      <customer>AVRO</customer>
      <product>AVRO</product>
    </schemaTypes>
    <references>
      <all-types-value>
        <reference>
          <name>io.confluent.examples.avro.Customer</name>
          <subject>customer</subject>
        </reference>
        <reference>
          <name>io.confluent.examples.avro.Product</name>
          <subject>product</subject>
        </reference>
      </all-types-value>
    </references>
  </configuration>
  <goals>
    <goal>register</goal>
  </goals>
</plugin>

Each reference can specify a name, subject, and version. If the version is omitted, as with the example above, and the referenced schema is also being registered at the same time, the referenced schema’s version will be used; otherwise, the latest version of the schema in the subject will be used.
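For readers who register through the REST API instead of the Maven plugin, the same name, subject, and version triple appears in the request body. The following is a minimal sketch that only builds the body for a POST to /subjects/all-types-value/versions and does not send it. The helper name registration_payload is hypothetical, and the version number 1 is an assumption that each referenced subject has exactly one registered version.

```python
import json


def registration_payload(schema, schema_type="AVRO", references=()):
    """Build a JSON body for POST /subjects/{subject}/versions (hypothetical helper)."""
    # Schema Registry expects the schema itself as a JSON-encoded string.
    body = {"schema": json.dumps(schema), "schemaType": schema_type}
    if references:
        body["references"] = [
            {"name": name, "subject": subject, "version": version}
            for name, subject, version in references
        ]
    return body


# The top-level union from AllTypes.avsc.
all_types = [
    "io.confluent.examples.avro.Customer",
    "io.confluent.examples.avro.Product",
]

# Version 1 is assumed here; use whatever version the referenced subject holds.
payload = registration_payload(
    all_types,
    references=[
        ("io.confluent.examples.avro.Customer", "customer", 1),
        ("io.confluent.examples.avro.Product", "product", 1),
    ],
)
print(json.dumps(payload, indent=2))
```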

Here is the content of AllTypes.avsc, which is a simple union:

[
    "io.confluent.examples.avro.Customer",
    "io.confluent.examples.avro.Product"
]

Here is Customer.avsc, which contains a Customer record:

{
 "type": "record",
 "namespace": "io.confluent.examples.avro",
 "name": "Customer",

 "fields": [
     {"name": "customer_id", "type": "int"},
     {"name": "customer_name", "type": "string"},
     {"name": "customer_email", "type": "string"},
     {"name": "customer_address", "type": "string"}
 ]
}

And here is Product.avsc, which contains a Product record:

{
 "type": "record",
 "namespace": "io.confluent.examples.avro",
 "name": "Product",

 "fields": [
     {"name": "product_id", "type": "int"},
     {"name": "product_name", "type": "string"},
     {"name": "product_price", "type": "double"}
 ]
}

Next, register the schemas above using the following command:

mvn schema-registry:register

The above command will register referenced schemas before registering the schemas that depend on them. The output of the command will contain the ID of each schema that is registered. You can use the schema ID of the top-level schema with the console producer when producing data.
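Registering referenced schemas before the schemas that depend on them amounts to a topological ordering of the subjects. The sketch below illustrates that ordering with Python's standard graphlib module (Python 3.9 or later). It is not the plugin's actual implementation, and the function name registration_order is hypothetical.

```python
from graphlib import TopologicalSorter

# subject -> subjects it references, mirroring the <references> element above
references = {
    "all-types-value": {"customer", "product"},
    "customer": set(),
    "product": set(),
}


def registration_order(refs):
    """Return the subjects ordered so that every dependency comes first."""
    return list(TopologicalSorter(refs).static_order())


order = registration_order(references)
print(order)  # "all-types-value" is always last
```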

Next, use the console tools to try it out. First, start the Avro console consumer. Note that you should specify the topic name as all-types since the corresponding subject is all-types-value according to TopicNameStrategy.

./bin/kafka-avro-console-consumer --topic all-types --bootstrap-server localhost:9092
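The relationship between the topic and the subject under TopicNameStrategy is a simple suffix rule. As a small sketch (the function names are hypothetical, not part of any Confluent client):

```python
def topic_name_strategy(topic, is_key=False):
    """Derive the subject name for a topic the way TopicNameStrategy does."""
    return f"{topic}-{'key' if is_key else 'value'}"


def topic_for_value_subject(subject):
    """Recover the topic name from a value subject such as 'all-types-value'."""
    suffix = "-value"
    if not subject.endswith(suffix):
        raise ValueError(f"{subject!r} is not a value subject")
    return subject[: -len(suffix)]


print(topic_name_strategy("all-types"))
print(topic_for_value_subject("all-types-value"))
```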

In a separate console, start the Avro console producer. Pass the ID of the top-level schema as the value of value.schema.id.

./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic all-types --property value.schema.id={id} --property auto.register=false --property use.latest.version=true

At the same command line as the producer, input the data below, which represent two different event types. The data should be wrapped with a JSON object that specifies the event type. This is how the Avro console producer expects data for unions to be represented in JSON.

{ "io.confluent.examples.avro.Product": { "product_id": 1, "product_name" : "rice", "product_price" : 100.00 } }
{ "io.confluent.examples.avro.Customer": { "customer_id": 100, "customer_name": "acme", "customer_email": "acme@google.com", "customer_address": "1 Main St" } }

The data will appear at the consumer. Congratulations, you’ve successfully sent two different event types to a topic! And unlike the newer subject name strategies, the union will prevent event types other than Product and Customer from being produced to the same topic, since the producer is configured with the default TopicNameStrategy.
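The JSON wrapping used for the console producer input above can be generated programmatically. This is a minimal sketch with a hypothetical helper, encode_union_value. The membership check only mimics the constraint. In a real pipeline the rejection comes from the Avro serializer and Schema Registry, not from client-side code like this.

```python
import json

# Branches of the union defined in AllTypes.avsc.
UNION_BRANCHES = {
    "io.confluent.examples.avro.Customer",
    "io.confluent.examples.avro.Product",
}


def encode_union_value(full_name, fields):
    """Wrap a record in a single-key JSON object naming its union branch."""
    if full_name not in UNION_BRANCHES:
        raise ValueError(f"{full_name} is not a branch of the union")
    return json.dumps({full_name: fields})


line = encode_union_value(
    "io.confluent.examples.avro.Product",
    {"product_id": 1, "product_name": "rice", "product_price": 100.00},
)
print(line)
```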

Summary

Now there are two modular ways to store several event types in the same topic, both of which allow event types to evolve independently. The first, using the newer subject-name strategies, is straightforward but drops subject-topic constraints. The second, using unions (or oneofs) and schema references, maintains subject-topic constraints but adds further structure and drops automatic registration of schemas in the case of a top-level union or oneof.

If you’re interested in querying topics that combine multiple event types with ksqlDB, the second method, using a union (or oneof) is the only option. By maintaining subject-topic constraints, the method of using a union (or oneof) allows ksqlDB to deal with a bounded set of event types as defined by the union, instead of a potentially unbounded set. Modeling a union (also known as a sum type) by a relational table is a solved problem, and equivalent functionality will most likely land in ksqlDB in the future.


Playing Chess with Confluent Schema Registry

Previously, the Confluent Schema Registry only allowed you to manage Avro schemas. With Confluent Platform 5.5, the schema management within Schema Registry has been made pluggable, so that custom schema types can be added. In addition, schema plugins have been developed for both Protobuf and JSON Schema.

Now Schema Registry has two main extension points:

  1. REST Extensions
  2. Schema Plugins

The schema management within Schema Registry is really just a versioned history mechanism, with specific rules for how versions can evolve. To demonstrate both of the above extension points, I’ll show how Confluent Schema Registry can be turned into a full-fledged chess engine.

A Schema Plugin for Chess

A game of chess is also a versioned history. In this case, it is a history of chess moves. The rules of chess determine whether a move can be applied to a given version of the game.

To represent a version of a game of chess, I’ll use Portable Game Notation (PGN), a format in which moves are described using algebraic notation.

However, when registering a new version, we won’t require that the client send the entire board position represented as PGN. Instead, the client will only need to send the latest move. When the schema plugin receives the latest move, it will retrieve the current version of the game, check if the move is compatible with the current board position, and only then apply the move. The new board position will be saved in PGN format as the current version.

So far, this would allow the client to switch between making moves for white and making moves for black. To turn Schema Registry into a chess engine, after the schema plugin applies a valid move from the client, it will generate a move for the opposing color and apply that move as well.

In order to take back a move, the client just needs to delete the latest version of the game, and then make a new move. The new move will be applied to the latest version of the game that is not deleted.

Finally, in order to allow the client to play a game of chess with the black pieces, the client will send a special move of the form {player as black}. This is a valid comment in PGN format. When this special move is received, the schema plugin will simply generate a move for white and save that as the first version.
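The versioning behavior described above (check a move against the latest non-deleted version, append it, and take back a move by deleting the latest version) can be sketched independently of chess. The class below is a toy model, not the plugin's code. The is_legal callable stands in for the real rules of chess and is purely hypothetical, and the engine's reply move is left out.

```python
class GameHistory:
    """Toy versioned history: each version stores the full move list so far."""

    def __init__(self, is_legal):
        self._is_legal = is_legal  # callable(moves_so_far, move) -> bool
        self._versions = []        # list of [moves tuple, deleted flag]

    def latest(self):
        """Return the moves of the latest version that has not been deleted."""
        for moves, deleted in reversed(self._versions):
            if not deleted:
                return moves
        return ()

    def register(self, move):
        """Check the move against the latest version, then add a new version."""
        current = self.latest()
        if not self._is_legal(current, move):
            raise ValueError(f"move {move!r} is not compatible")
        self._versions.append([current + (move,), False])
        return len(self._versions)  # version number of the new entry

    def delete_latest(self):
        """Take back a move by marking the latest live version as deleted."""
        for entry in reversed(self._versions):
            if not entry[1]:
                entry[1] = True
                return
        raise LookupError("no version to delete")


# Placeholder rule: accept any non-empty move (a real plugin would apply chess rules).
history = GameHistory(is_legal=lambda moves, move: move != "")
history.register("d4")
history.register("c4")
history.delete_latest()              # take back c4
version = history.register("Nf3")    # applied on top of the version holding d4
print(version, history.latest())
```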

Let’s try it out. Assuming that the chess schema plugin has been built and placed on the CLASSPATH for Schema Registry, the following properties need to be added to schema-registry.properties:

schema.providers=io.yokota.schemaregistry.chess.schema.ChessSchemaProvider
resource.static.locations=static
resource.extension.class=io.yokota.schemaregistry.chess.SchemaRegistryChessResourceExtension
  

The above properties not only configure the chess schema plugin, but also the chess resource extension that will be used in the next section. Once Schema Registry is up, you can verify that the chess schema plugin was registered.

$ curl http://localhost:8081/schemas/types
["CHESS","JSON","PROTOBUF","AVRO"]

Let’s make the move d4.

$ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "d4", "schemaType": "CHESS"}'  \
  http://localhost:8081/subjects/newgame/versions
{"id":1}

Schema Registry returns the ID of the new version. Let’s examine what the version actually looks like.

$ curl http://localhost:8081/subjects/newgame/versions/latest
{
  "subject": "newgame",
  "version": 1,
  "id": 1,
  "schemaType": "CHESS",
  "schema": "1.d4 d5"
}

Schema Registry replied with the move d5.

We can continue playing chess with Schema Registry in this fashion, but of course it isn’t the best user experience. Let’s see if a REST extension will help.

A REST Extension for Chess

A single-page application (SPA) with an actual chess board as the interface would provide a much better experience. Therefore, I’ve created a REST Extension that wraps a Vue.js SPA for playing chess. When the user makes a move on the chess board, the SPA sends the move to Schema Registry, retrieves the new board position, determines the last move played by the computer opponent, and makes that move on the board as well.
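Determining the computer's last move from the stored version comes down to reading the final token of the PGN move text. The actual SPA is written with Vue.js. The sketch below only illustrates the idea, with a hypothetical function last_move that handles move lists like the "1.d4 d5" shown earlier, and ignores game results and other PGN features.

```python
import re


def last_move(pgn):
    """Return the final move in a PGN move list such as '1.d4 d5 2.c4'."""
    # Drop {comments}, then move numbers like "1." or "1...".
    without_comments = re.sub(r"\{[^}]*\}", " ", pgn)
    tokens = re.sub(r"\d+\.+", " ", without_comments).split()
    return tokens[-1] if tokens else None


print(last_move("1.d4 d5"))
```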

With the REST extension configured as described in the previous section, you can navigate to http://localhost:8081/index.html to see the chess engine UI presented by the REST extension. When playing a chess game, the game history will appear below the chess board, showing how the board position evolves over time.

Here is an example of the REST extension in action.

As you can see, schema plugins in conjunction with REST extensions can provide a powerful combination. Hopefully, you are now inspired to customize Confluent Schema Registry in new and creative ways. Have fun!


Building A Graph Database Using Kafka

I previously showed how to build a relational database using Kafka. This time I’ll show how to build a graph database using Kafka. Just as with KarelDB, at the heart of our graph database will be the embedded key-value store, KCache.

Kafka as a Graph Database

The graph database that I’m most familiar with is HGraphDB, a graph database that uses HBase as its backend. More specifically, it uses the HBase client API, which allows it to integrate with not only HBase, but also any other data store that implements the HBase client API, such as Google BigTable. This leads to an idea. Rather than trying to build a new graph database around KCache entirely from scratch, we can try to wrap KCache with the HBase client API.

HBase is an example of a wide column store, also known as an extensible record store. Like its predecessor BigTable, it allows any number of column values to be associated with a key, without requiring a schema. For this reason, a wide column store can also be seen as a two-dimensional key-value store.
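To make the "two-dimensional key-value store" view concrete, here is a minimal in-memory sketch. The class WideColumnTable is hypothetical and unrelated to the HBase client API. It only shows that a value is addressed by a row key plus a column key, and that different rows can carry different columns.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide column table: row key -> {column key -> value}, no schema."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row, column, value):
        self._rows[row][column] = value

    def get(self, row):
        """Return a copy of all columns stored for a row."""
        return dict(self._rows.get(row, {}))

    def flatten(self):
        """View the table as a plain key-value store keyed by (row, column)."""
        return {
            (row, column): value
            for row, columns in self._rows.items()
            for column, value in columns.items()
        }


table = WideColumnTable()
table.put("row1", "cf:a", "value1")
table.put("row2", "cf:b", "value2")
print(table.get("row1"))
print(table.flatten())
```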

I’ve implemented KStore as a wide column store (or extensible record store) abstraction for Kafka that relies on KCache under the covers. KStore implements the HBase client API, so it can be used wherever the HBase client API is supported.

Let’s try to use KStore with HGraphDB. After installing and starting the Gremlin console, we install KStore and HGraphDB.

$ ./bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph

gremlin> :install org.apache.hbase hbase-client 2.2.1
gremlin> :install org.apache.hbase hbase-common 2.2.1
gremlin> :install org.apache.hadoop hadoop-common 3.1.2
gremlin> :install io.kstore kstore 0.1.0
gremlin> :install io.hgraphdb hgraphdb 3.0.0
gremlin> :plugin use io.hgraphdb
 

After we restart the Gremlin console, we configure HGraphDB with the KStore connection class and the Kafka bootstrap servers. We can then issue Gremlin commands against Kafka.

$ ./bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: io.hgraphdb
plugin activated: tinkerpop.tinkergraph

gremlin> cfg = new HBaseGraphConfiguration()\
......1> .set("hbase.client.connection.impl", "io.kstore.KafkaStoreConnection")\
......2> .set("kafkacache.bootstrap.servers", "localhost:9092")
==>io.hgraphdb.HBaseGraphConfiguration@41b0ae4c

gremlin> graph = new HBaseGraph(cfg)
==>hbasegraph[hbasegraph]

gremlin> g = graph.traversal()
==>graphtraversalsource[hbasegraph[hbasegraph], standard]

gremlin> v1 = g.addV('person').property('name','marko').next()
==>v[0371a1db-8768-4910-94e3-7516fc65dab3]

gremlin> v2 = g.addV('person').property('name','stephen').next()
==>v[3bbc9ce3-24d3-41cf-bc4b-3d95dbac6589]

gremlin> g.V(v1).addE('knows').to(v2).property('weight',2).iterate()
  

It works! HBaseGraph is now using Kafka as its storage backend.

Kafka as a Document Database

Now that we have a wide column store abstraction for Kafka in the form of KStore, let’s see what else we can do with it. Another database that uses the HBase client API is HDocDB, a document database for HBase. To use KStore with HDocDB, first we need to set hbase.client.connection.impl in our hbase-site.xml as follows.

<configuration>
    <property>
        <name>hbase.client.connection.impl</name>
        <value>io.kstore.KafkaStoreConnection</value>
    </property>
    <property>
        <name>kafkacache.bootstrap.servers</name>
        <value>localhost:9092</value>
    </property>
</configuration>

Now we can issue MongoDB-like commands against Kafka, using HDocDB.

$ jrunscript -cp <hbase-conf-dir>:target/hdocdb-1.0.1.jar:../kstore/target/kstore-0.1.0.jar -f target/classes/shell/hdocdb.js -f -

nashorn> db.mycoll.insert( { _id: "jdoe", first_name: "John", last_name: "Doe" } )

nashorn> var doc = db.mycoll.find( { last_name: "Doe" } )[0]

nashorn> print(doc)
{"_id":"jdoe","first_name":"John","last_name":"Doe"}

nashorn> db.mycoll.update( { last_name: "Doe" }, { $set: { first_name: "Jim" } } )

nashorn> var doc = db.mycoll.find( { last_name: "Doe" } )[0]

nashorn> print(doc)
{"_id":"jdoe","first_name":"Jim","last_name":"Doe"}
  

Pretty cool, right?

Kafka as a Wide Column Store

Of course, there is no requirement to wrap KStore with another layer in order to use it. KStore can be used directly as a wide column store abstraction on top of Kafka. I’ve integrated KStore with the HBase Shell so that one can work directly with KStore from the command line.

$ ./kstore-shell.sh localhost:9092

hbase(main):001:0> create 'test', 'cf'
Created table test
Took 0.2328 seconds
=> Hbase::Table - test

hbase(main):003:0* list
TABLE
test
1 row(s)
Took 0.0192 seconds
=> ["test"]

hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.1284 seconds

hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0113 seconds

hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
Took 0.0096 seconds

hbase(main):007:0> scan 'test'
ROW                                COLUMN+CELL
 row1                              column=cf:a, timestamp=1578763986780, value=value1
 row2                              column=cf:b, timestamp=1578763992567, value=value2
 row3                              column=cf:c, timestamp=1578763996677, value=value3
3 row(s)
Took 0.0233 seconds

hbase(main):008:0> get 'test', 'row1'
COLUMN                             CELL
 cf:a                              timestamp=1578763986780, value=value1
1 row(s)
Took 0.0106 seconds

hbase(main):009:0>

There’s no limit to the type of fun one can have with KStore. 🙂

Back to Graphs

Getting back to graphs, another popular graph database is JanusGraph, which is interesting because it has a pluggable storage layer. Some of the storage backends that it supports through this layer are HBase, Cassandra, and BerkeleyDB.

Of course, KStore can be used in place of HBase when configuring JanusGraph. Again, it’s simply a matter of configuring the KStore connection class in the JanusGraph configuration.

storage.hbase.ext.hbase.client.connection.impl: io.kstore.KafkaStoreConnection
storage.hbase.ext.kafkacache.bootstrap.servers: localhost:9092

However, we can do better when integrating JanusGraph with Kafka. JanusGraph can be integrated with any storage backend that supports a wide column store abstraction. When integrating with key-value stores such as BerkeleyDB, JanusGraph provides its own adapter for mapping a key-value store to a wide column store. Thus we can simply provide KCache to JanusGraph as a key-value store, and it will perform the mapping to a wide column store abstraction for us automatically.
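The general idea of mapping an ordered key-value store to a wide column store is to concatenate the row key and column key into one flat key, and to read a row with a range scan over its prefix. The sketch below illustrates only that idea. It is not JanusGraph's adapter, both classes are hypothetical, and it assumes row keys never contain a zero byte.

```python
import bisect


class OrderedKeyValueStore:
    """In-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # kept sorted
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


class WideColumnAdapter:
    """Expose rows and columns on top of flat keys: row + SEP + column."""

    SEP = b"\x00"  # assumption: row keys never contain a zero byte

    def __init__(self, store):
        self._store = store

    def put(self, row, column, value):
        self._store.put(row + self.SEP + column, value)

    def get_row(self, row):
        prefix = row + self.SEP
        end = row + b"\x01"  # first key past everything starting with prefix
        return {k[len(prefix):]: v for k, v in self._store.scan(prefix, end)}


adapter = WideColumnAdapter(OrderedKeyValueStore())
adapter.put(b"row1", b"a", b"1")
adapter.put(b"row10", b"a", b"x")
adapter.put(b"row1", b"b", b"2")
print(adapter.get_row(b"row1"))
```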

I’ve implemented a new storage plugin for JanusGraph called janusgraph-kafka that does exactly this. Let’s try it out. After following the instructions here, we can start the Gremlin console.

$ ./bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports

gremlin>  graph = JanusGraphFactory.open('conf/janusgraph-kafka.properties')
==>standardjanusgraph[io.kcache.janusgraph.diskstorage.kafka.KafkaStoreManager:[127.0.0.1]]

gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[io.kcache.janusgraph.diskstorage.kafka.KafkaStoreManager:[127.0.0.1]], standard]

gremlin> v1 = g.addV('person').property('name','marko').next()
==>v[4320]

gremlin> v2 = g.addV('person').property('name','stephen').next()
==>v[4104]

gremlin> g.V(v1).addE('knows').to(v2).property('weight',2).iterate()
  

Works like a charm.

Summary

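In this and the previous post, I’ve shown how Kafka can be used as

  • A graph database, using HGraphDB or JanusGraph
  • A document database, using HDocDB
  • A wide column store, using KStore
  • A relational database, using KarelDB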

I guess I could have titled this post “Building a Graph Database, Document Database, and Wide Column Store Using Kafka”, although that’s a bit long. In any case, hopefully I’ve shown that Kafka is a lot more versatile than most people realize.



Building A Relational Database Using Kafka

In a previous post, I showed how Kafka can be used as the persistent storage for an embedded key-value store, called KCache. Once you have a key-value store, it can be used as the basis for other models such as documents, graphs, and even SQL. For example, CockroachDB is a SQL layer built on top of the RocksDB key-value store and YugaByteDB is both a document and SQL layer built on top of RocksDB. Other databases such as FoundationDB claim to be multi-model, because they support several types of models at once, using the key-value store as a foundation.

In this post I will show how KCache can be extended to implement a fully functional relational database, called KarelDB. In addition, I will show how a database architecture can today be assembled from existing open-source components, much like how web frameworks like Dropwizard came into being by assembling components such as a web server (Jetty), a RESTful API framework (Jersey), a JSON serialization framework (Jackson), and an object-relational mapping layer (JDBI or Hibernate).

Hello, KarelDB

Before I drill into the components that comprise KarelDB, first let me show you how to quickly get it up and running. To get started, download a release, unpack it, and then modify config/kareldb.properties to point to an existing Kafka broker. Then run the following:

$ bin/kareldb-start config/kareldb.properties

While KarelDB is still running, at a separate terminal, enter the following command to start up sqlline, a command-line utility for accessing JDBC databases.

$ bin/sqlline
sqlline version 1.8.0

sqlline> !connect jdbc:avatica:remote:url=http://localhost:8765 admin admin

sqlline> create table books (id int, name varchar, author varchar);
No rows affected (0.114 seconds)

sqlline> insert into books values (1, 'The Trial', 'Franz Kafka');
1 row affected (0.576 seconds)

sqlline> select * from books;
+----+-----------+-------------+
| ID |   NAME    |   AUTHOR    |
+----+-----------+-------------+
| 1  | The Trial | Franz Kafka |
+----+-----------+-------------+
1 row selected (0.133 seconds)

KarelDB is now at your service.

Kafka for Persistence

At the heart of KarelDB is KCache, an embedded key-value store that is backed by Kafka. Many components use Kafka as a simple key-value store, including Kafka Connect and Confluent Schema Registry. KCache not only generalizes this functionality, but also provides a simple Map-based API for ease of use. In addition, KCache can use different implementations for the embedded key-value store that is backed by Kafka.

In the case of KarelDB, by default KCache is configured as a RocksDB cache that is backed by Kafka. This allows KarelDB to support larger datasets and faster startup times. KCache can also be configured to use an in-memory cache instead of RocksDB if desired.

Avro for Serialization and Schema Evolution

Kafka has pretty much adopted Apache Avro as its de facto data format, and for good reason.  Not only does Avro provide a compact binary format, but it has excellent support for schema evolution.  Such support is why the Confluent Schema Registry has chosen Avro as the first format for which it provides schema management.

KarelDB uses Avro to both define relations (tables), and serialize the data for those relations.  By using Avro, KarelDB gets schema evolution for free when executing an ALTER TABLE command.

sqlline> !connect jdbc:avatica:remote:url=http://localhost:8765 admin admin 

sqlline> create table customers (id int, name varchar);
No rows affected (1.311 seconds)

sqlline> alter table customers add address varchar not null;
Error: Error -1 (00000) : 
Error while executing SQL "alter table customers add address varchar not null": 
org.apache.avro.SchemaValidationException: Unable to read schema:
{
  "type" : "record",
  "name" : "CUSTOMERS",
  "fields" : [ {
    "name" : "ID",
    "type" : "int",
    "sql.key.index" : 0
  }, {
    "name" : "NAME",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}
using schema:
{
  "type" : "record",
  "name" : "CUSTOMERS",
  "fields" : [ {
    "name" : "ID",
    "type" : "int",
    "sql.key.index" : 0
  }, {
    "name" : "NAME",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "ADDRESS",
    "type" : "string"
  } ]
}

sqlline> alter table customers add address varchar null;
No rows affected (0.024 seconds)

As you can see above, when we first try to add a column with a NOT NULL constraint, Avro rejects the schema change, because adding a new field with a NOT NULL constraint would cause deserialization to fail for older records that don’t have that field. When we instead add the same column with a NULL constraint, the ALTER TABLE command succeeds.

Because Avro handles deserialization, a field added without a NOT NULL constraint is populated with its default when older records are read, or with null if the field is optional. This is all automatically handled by the underlying Avro framework.
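The behavior in the ALTER TABLE session above follows from how a reader schema is applied to older data. The sketch below is a simplified imitation of that resolution step, not Avro's implementation. The function resolve is hypothetical, and the field definitions are abbreviated from the schemas printed in the error message.

```python
def resolve(record, reader_fields):
    """Project a stored record onto the reader's fields, filling in defaults."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            # This is the case that makes the NOT NULL column incompatible.
            raise ValueError(f"no value and no default for field {name}")
    return resolved


old_row = {"ID": 1, "NAME": "acme"}  # written before ADDRESS existed

base_fields = [
    {"name": "ID", "type": "int"},
    {"name": "NAME", "type": ["null", "string"], "default": None},
]
nullable_address = base_fields + [
    {"name": "ADDRESS", "type": ["null", "string"], "default": None}
]
required_address = base_fields + [{"name": "ADDRESS", "type": "string"}]

print(resolve(old_row, nullable_address))
```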

Another important aspect of Avro is that it defines a standard sort order for data, as well as a comparison function that operates directly on the binary-encoded data, without first deserializing it. This allows KarelDB to efficiently handle key range queries, for example.
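To see why a schema-aware comparison is needed, note that Avro encodes ints with zig-zag and variable-length coding, so comparing the raw bytes lexicographically does not give numeric order. The sketch below is a simplified illustration for keys made only of int fields, not Avro's own comparator. The function names are hypothetical. It walks both buffers field by field and stops at the first difference, without building record objects.

```python
def encode_int(n):
    """Encode an int the way Avro does: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes get small codes
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_int(buf, pos):
    """Decode one int starting at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def compare_keys(a, b, field_count):
    """Compare two encoded keys of field_count int fields, field by field."""
    pa = pb = 0
    for _ in range(field_count):
        va, pa = read_int(a, pa)
        vb, pb = read_int(b, pb)
        if va != vb:
            return -1 if va < vb else 1
    return 0


# Raw byte order is misleading: -3 encodes to a larger byte than 2.
print(encode_int(-3) > encode_int(2))
print(compare_keys(encode_int(-3), encode_int(2), 1))
```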

Calcite for SQL

Apache Calcite is a SQL framework that handles query parsing, optimization, and execution, but leaves out the data store. Calcite allows for relational expressions to be pushed down to the data store for more efficient processing. Otherwise, Calcite can process the query using a built-in enumerable calling convention, which allows the data store to be represented as a set of tuples that can be accessed through an iterator interface. An embedded key-value store is a perfect representation for such a set of tuples, so KarelDB will handle key lookups and key range filtering (using Avro’s sort order support) but otherwise defer query processing to Calcite’s enumerable convention. One nice aspect of the Calcite project is that it continues to develop optimizations for the enumerable convention, which will automatically benefit KarelDB moving forward.

Calcite supports ANSI-compliant SQL, including some newer functions such as JSON_VALUE and JSON_QUERY.

sqlline> create table authors (id int, json varchar);
No rows affected (0.132 seconds)

sqlline> insert into authors 
       > values (1, '{"name":"Franz Kafka", "book":"The Trial"}');
1 row affected (0.086 seconds)

sqlline> insert into authors 
       > values (2, '{"name":"Karel Capek", "book":"R.U.R."}');
1 row affected (0.036 seconds)

sqlline> select json_value(json, 'lax $.name') as author from authors;
+-------------+
|   AUTHOR    |
+-------------+
| Franz Kafka |
| Karel Capek |
+-------------+
2 rows selected (0.027 seconds)

Omid for Transactions and MVCC

Although Apache Omid was originally designed to work with HBase, it is a general framework for supporting transactions on a key-value store. In addition, Omid uses the underlying key-value store to persist metadata concerning transactions. This makes it especially easy to integrate Omid with an existing key-value store such as KCache.

Omid actually requires a few features from the key-value store, namely multi-versioned data and atomic compare-and-set capability. KarelDB layers these features atop KCache so that it can take advantage of Omid’s support for transaction management. Omid utilizes these features of the key-value store in order to provide snapshot isolation using multi-version concurrency control (MVCC). MVCC is a common technique used to implement snapshot isolation in other relational databases, such as Oracle and PostgreSQL.

Below we can see an example of how rolling back a transaction will restore the state of the database before the transaction began.

sqlline> !autocommit off

sqlline> select * from books;
+----+-----------+-------------+
| ID |   NAME    |   AUTHOR    |
+----+-----------+-------------+
| 1  | The Trial | Franz Kafka |
+----+-----------+-------------+
1 row selected (0.045 seconds)

sqlline> update books set name ='The Castle' where id = 1;
1 row affected (0.346 seconds)

sqlline> select * from books;
+----+------------+-------------+
| ID |    NAME    |   AUTHOR    |
+----+------------+-------------+
| 1  | The Castle | Franz Kafka |
+----+------------+-------------+
1 row selected (0.038 seconds)

sqlline> !rollback
Rollback complete (0.059 seconds)

sqlline> select * from books;
+----+-----------+-------------+
| ID |   NAME    |   AUTHOR    |
+----+-----------+-------------+
| 1  | The Trial | Franz Kafka |
+----+-----------+-------------+
1 row selected (0.032 seconds)

Transactions can of course span multiple rows and multiple tables.
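The rollback session above can be mimicked with a toy multi-version store. The sketch below is not Omid and makes several simplifying assumptions: a single in-process counter stands in for the timestamp service, writes are buffered in the transaction until commit, and the conflict check is a plain comparison instead of an atomic compare-and-set. All class names are hypothetical.

```python
import itertools


class MvccStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self._clock = itertools.count(1)  # stand-in for a timestamp service
        self._versions = {}

    def begin(self):
        return Transaction(self, next(self._clock))


class Transaction:
    def __init__(self, store, start_ts):
        self._store = store
        self._start_ts = start_ts
        self._writes = {}  # uncommitted writes, visible only to this transaction

    def get(self, key):
        if key in self._writes:
            return self._writes[key]
        # Snapshot read: latest version committed at or before our start.
        visible = [
            value
            for ts, value in self._store._versions.get(key, [])
            if ts <= self._start_ts
        ]
        return visible[-1] if visible else None

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        commit_ts = next(self._store._clock)
        for key in self._writes:
            versions = self._store._versions.get(key, [])
            # Abort if someone else committed this key after we started.
            if versions and versions[-1][0] > self._start_ts:
                self._writes.clear()
                raise RuntimeError(f"write-write conflict on {key!r}")
        for key, value in self._writes.items():
            self._store._versions.setdefault(key, []).append((commit_ts, value))
        self._writes.clear()

    def rollback(self):
        self._writes.clear()


store = MvccStore()
setup = store.begin()
setup.put(1, "The Trial")
setup.commit()

tx = store.begin()
tx.put(1, "The Castle")
inside = tx.get(1)               # the transaction sees its own write
outside = store.begin().get(1)   # other transactions still see the old value
tx.rollback()
after = store.begin().get(1)
print(inside, outside, after)
```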

Avatica for JDBC

KarelDB can actually be run in two modes, as an embedded database or as a server. In the case of a server, KarelDB uses Apache Avatica to provide RPC protocol support. Avatica provides both a server framework that wraps KarelDB, as well as a JDBC driver that can communicate with the server using Avatica RPC.

One advantage of using Kafka is that multiple servers can all “tail” the same set of topics. This allows multiple KarelDB servers to run as a cluster, with no single point of failure. In this case, one of the servers will be elected as the leader while the others will be followers (or replicas). When a follower receives a JDBC request, it will use the Avatica JDBC driver to forward the JDBC request to the leader. If the leader fails, one of the followers will be elected as a new leader.

Database by Components

Today, open-source libraries have achieved what component-based software development was hoping to do many years ago. With open-source libraries, complex systems such as relational databases can be assembled by integrating a few well-designed components, each of which specializes in one thing that it does particularly well.

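Above I’ve shown how KarelDB is an assemblage of several existing open-source components:

  • Apache Kafka (through KCache) for persistence
  • Apache Avro for serialization and schema evolution
  • Apache Calcite for SQL
  • Apache Omid for transactions and MVCC
  • Apache Avatica for JDBC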

Currently, KarelDB is designed as a single-node database, which can be replicated, but it is not a distributed database. Also, KarelDB is a plain-old relational database, and does not handle stream processing. For a distributed, stream-relational database, please consider using KSQL instead, which is production-proven.

KarelDB is still in its early stages, but give it a try if you’re interested in using Kafka to back your plain-old relational data.
