When using HBase, it is often desirable to encrypt data in transit between an HBase client and an HBase server. This might be the case, for example, when storing PII (Personally Identifiable Information) in HBase, or when running HBase in a multi-tenant cloud environment.
Transport encryption is often enabled by configuring HBase to use SASL with GSSAPI/Kerberos to provide data confidentiality and integrity on a per-connection basis. However, the default implementation of GSSAPI/Kerberos does not seem to make use of AES-NI hardware acceleration. In our testing, we have seen up to a 50% increase in P75 latency for some of our HBase applications when using GSSAPI/Kerberos encryption compared with no encryption.
One workaround is to bypass the encryption used by SASL and use an encryption library that can support AES-NI acceleration. This effort has already been completed for HDFS (HDFS-6606) and is in progress for Hadoop (HADOOP-10768). Based on some of this earlier work, similar changes can be made for HBase.
Conceptually, the fix for HADOOP-10768 works as follows. If the Hadoop client has been configured to negotiate a cipher suite in place of the one negotiated by SASL, the following actions take place:
- The client will send the server a set of cipher suites that it supports.
- The server will negotiate a mutually acceptable cipher suite.
- At the end of the SASL handshake, the server will generate a pair of encryption keys using the cipher suite and send them to the client via the secure SASL channel.
- The generated encryption keys, instead of the SASL layer, will be used to encrypt all subsequent traffic between the client and server.
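The server-side steps above can be sketched roughly as follows. This is only an illustration, not the HADOOP-10768 code: the class name `CipherNegotiationSketch`, the comma-separated suite list, the "first client preference wins" rule, and the 16-byte sizes are all assumptions made for the example.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the negotiation steps; not the HADOOP-10768 code. */
final class CipherNegotiationSketch {

    /** Server side: choose the first suite in the client's list that the server also supports. */
    static Optional<String> negotiate(String clientSuites, List<String> serverSuites) {
        return Arrays.stream(clientSuites.split(","))   // assumption: comma-separated list
                .map(String::trim)
                .filter(serverSuites::contains)
                .findFirst();
    }

    /** Server side: generate two random byte arrays, one per traffic direction. */
    static byte[][] generatePair(int lengthInBytes) {
        SecureRandom random = new SecureRandom();
        byte[][] pair = new byte[2][lengthInBytes];
        random.nextBytes(pair[0]);
        random.nextBytes(pair[1]);
        return pair;
    }

    public static void main(String[] args) {
        List<String> serverSuites = Arrays.asList("AES/CTR/NoPadding");
        Optional<String> suite = negotiate("AES/CTR/NoPadding", serverSuites);
        byte[][] keys = generatePair(16); // assumption: 16-byte (AES-128) keys
        System.out.println("negotiated=" + suite.orElse("none")
                + " keyBytes=" + keys[0].length);
    }
}
```

In the real exchange the generated keys would travel to the client over the already-established SASL channel, which is what keeps them confidential.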
I had originally hoped that the work for HADOOP-10768 would port easily to the HBase codebase. However, although some of the HBase code for SASL support seems to have originated from the corresponding Hadoop code, the two have since diverged. For example, when performing the SASL handshake, the Hadoop client and server use protocol buffers to wrap the SASL state and SASL token, whereas the HBase client and server do not use protocol buffers when passing this data.
Instead, in HBase, during the SASL handshake the client sends
- The integer length of the SASL token
- The bytes of the SASL token
whereas the server sends
- An integer which is either 0 for success or 1 for failure
- In the case of success,
  - The integer length of the SASL token
  - The bytes of the SASL token
- In the case of failure,
  - A string representing the class of the Exception
  - A string representing an error message
There is one exception to the above scheme: if the server sends the special integer SWITCH_TO_SIMPLE_AUTH (represented as -88) in place of the length of the SASL token, the rest of the message is ignored and the client falls back to simple authentication instead of completing the SASL handshake.
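To make the framing concrete, here is a minimal sketch of the messages described above using plain `java.io` streams. It is not HBase's actual code: the class and method names are hypothetical, and the use of `writeUTF`/`readUTF` for the two failure strings is an assumption, since the text does not say how strings are encoded on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the handshake framing; not HBase's actual classes. */
final class SaslHandshakeFramingSketch {
    static final int SUCCESS = 0;
    static final int FAILURE = 1;
    static final int SWITCH_TO_SIMPLE_AUTH = -88;

    private interface Body { void writeTo(DataOutputStream out) throws IOException; }

    private static byte[] frame(Body body) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            body.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Client to server: token length, then token bytes. */
    static byte[] clientMessage(byte[] token) {
        return frame(out -> { out.writeInt(token.length); out.write(token); });
    }

    /** Server to client on success: status 0, token length, token bytes. */
    static byte[] serverSuccess(byte[] token) {
        return frame(out -> { out.writeInt(SUCCESS); out.writeInt(token.length); out.write(token); });
    }

    /** Server to client: the -88 marker takes the place of the token length. */
    static byte[] serverSwitchToSimpleAuth() {
        return frame(out -> { out.writeInt(SUCCESS); out.writeInt(SWITCH_TO_SIMPLE_AUTH); });
    }

    /** Server to client on failure: status 1, exception class, error message. */
    static byte[] serverFailure(String exceptionClass, String message) {
        // Assumption: strings are encoded with writeUTF for this sketch.
        return frame(out -> { out.writeInt(FAILURE); out.writeUTF(exceptionClass); out.writeUTF(message); });
    }

    /** Client: returns the server's token, or null when told to fall back to simple auth. */
    static byte[] readServerReply(byte[] reply) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply))) {
            if (in.readInt() != SUCCESS) {
                String exceptionClass = in.readUTF();
                throw new IllegalStateException(exceptionClass + ": " + in.readUTF());
            }
            int length = in.readInt();
            if (length == SWITCH_TO_SIMPLE_AUTH) {
                return null; // rest of the message is ignored
            }
            byte[] token = new byte[length];
            in.readFully(token);
            return token;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] token = readServerReply(serverSuccess(new byte[] {1, 2, 3}));
        System.out.println("token bytes read: " + token.length);
    }
}
```

Because a real token length is never negative, a negative marker such as -88 can share the same integer slot without ambiguity.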
In order to adapt the fix for HADOOP-10768 for HBase, I decided to use another special integer called USE_NEGOTIATED_CIPHER (represented as -89) for messages related to cipher suite negotiation between client and server. If the client is configured to negotiate a cipher suite, then at the beginning of the SASL handshake, in place of a message containing only the length and bytes of a SASL token, it will send a message of the form
- USE_NEGOTIATED_CIPHER (-89)
- A string representing the acceptable cipher suites
- The integer length of the SASL token
- The bytes of the SASL token
And at the end of the SASL handshake, the server will send one additional message of the form
- A zero for success
- USE_NEGOTIATED_CIPHER (-89)
- A string representing the negotiated cipher suite
- A pair of encryption keys
- A pair of initialization vectors
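The two new messages could be framed along these lines. Again this is a sketch under assumptions rather than the HBASE-16633 patch itself: the class names are hypothetical, strings are written with `writeUTF`, and each key and initialization vector is written as a length-prefixed byte array, none of which is specified in the text above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the cipher negotiation messages; not the actual patch. */
final class NegotiatedCipherFramingSketch {
    static final int SUCCESS = 0;
    static final int USE_NEGOTIATED_CIPHER = -89;

    /** What the server learns from the client's first message. */
    static final class ClientHello {
        final String cipherSuites; // null when the client did not ask for negotiation
        final byte[] token;
        ClientHello(String cipherSuites, byte[] token) {
            this.cipherSuites = cipherSuites;
            this.token = token;
        }
    }

    /** Client: first message when cipher negotiation is configured. */
    static byte[] clientFirstMessage(String cipherSuites, byte[] token) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(USE_NEGOTIATED_CIPHER);
            out.writeUTF(cipherSuites); // assumption: writeUTF string encoding
            out.writeInt(token.length);
            out.write(token);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Server: the leading integer is either the -89 marker or a plain token length. */
    static ClientHello readClientFirstMessage(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            int length = in.readInt();
            String suites = null;
            if (length == USE_NEGOTIATED_CIPHER) {
                suites = in.readUTF();
                length = in.readInt();
            }
            byte[] token = new byte[length];
            in.readFully(token);
            return new ClientHello(suites, token);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Server: extra message sent once the SASL handshake has completed. */
    static byte[] serverFinalMessage(String suite, byte[][] keys, byte[][] ivs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(SUCCESS);
            out.writeInt(USE_NEGOTIATED_CIPHER);
            out.writeUTF(suite);
            // Assumption: each key and IV is sent as length + bytes.
            for (byte[] part : new byte[][] {keys[0], keys[1], ivs[0], ivs[1]}) {
                out.writeInt(part.length);
                out.write(part);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = clientFirstMessage("AES/CTR/NoPadding", new byte[] {4, 5});
        System.out.println("client offered: " + readClientFirstMessage(first).cipherSuites);
    }
}
```

A server that reads the leading integer this way can still accept clients that are not configured for negotiation, since their first integer is an ordinary token length.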
We can turn on DEBUG logging for HBase to see what the client and server SASL negotiation normally looks like, without the custom cipher negotiation. Here is the client:
Creating SASL GSSAPI client. Server's Kerberos principal name is XXXX
Have sent token of size 688 from initSASLContext.
Will read input token of size 108 for processing by initSASLContext
Will send token of size 0 from initSASLContext.
Will read input token of size 32 for processing by initSASLContext
Will send token of size 32 from initSASLContext.
SASL client context established. Negotiated QoP: auth-conf
And here is the server:
Kerberos principal name is XXXX
Created SASL server with mechanism = GSSAPI
Have read input token of size 688 for processing by saslServer.evaluateResponse()
Will send token of size 108 from saslServer.
Have read input token of size 0 for processing by saslServer.evaluateResponse()
Will send token of size 32 from saslServer.
Have read input token of size 32 for processing by saslServer.evaluateResponse()
SASL server GSSAPI callback: setting canonicalized client ID: XXXX
SASL server context established. Authenticated client: XXXX (auth:SIMPLE). Negotiated QoP is auth-conf
To enable custom cipher negotiation, we set the following HBase configuration parameters for both the client and server (in addition to the properties to enable Kerberos):
<property>
  <name>hbase.rpc.security.crypto.cipher.suites</name>
  <value>AES/CTR/NoPadding</value>
</property>
<property>
  <name>hbase.rpc.protection</name>
  <value>privacy</value>
</property>
With the above configuration, here is the client (the new actions are the two messages that mention ciphers):
Creating SASL GSSAPI client. Server's Kerberos principal name is XXXX
Will send client ciphers: AES/CTR/NoPadding
Have sent token of size 651 from initSASLContext.
Will read input token of size 110 for processing by initSASLContext
Will send token of size 0 from initSASLContext.
Will read input token of size 65 for processing by initSASLContext
Will send token of size 65 from initSASLContext.
Client using cipher suite AES/CTR/NoPadding with server
SASL client context established. Negotiated QoP: auth-conf
And here is the server when using custom cipher negotiation (the new actions are the two messages that mention ciphers):
Have read client ciphers: AES/CTR/NoPadding
Kerberos principal name is XXXX
Created SASL server with mechanism = GSSAPI
Have read input token of size 651 for processing by saslServer.evaluateResponse()
Will send token of size 110 from saslServer.
Have read input token of size 0 for processing by saslServer.evaluateResponse()
Will send token of size 65 from saslServer.
Have read input token of size 65 for processing by saslServer.evaluateResponse()
SASL server GSSAPI callback: setting canonicalized client ID: XXXX
Server using cipher suite AES/CTR/NoPadding with client
SASL server context established. Authenticated client: XXXX (auth:SIMPLE). Negotiated QoP is auth-conf
Once the cipher suite negotiation is complete, both the client and server will have created an instance of SaslCryptoCodec to perform the encryption. The client will call SaslCryptoCodec.wrap()/unwrap() instead of SaslClient.wrap()/unwrap(), while the server will call SaslCryptoCodec.wrap()/unwrap() instead of SaslServer.wrap()/unwrap(). This is the same technique as used in HADOOP-10768.
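As a rough illustration of what such a codec does, here is a minimal wrap/unwrap sketch built on the JDK's `javax.crypto` API with AES/CTR/NoPadding. The class `CryptoCodecSketch` is hypothetical and is not the SaslCryptoCodec from the patch: it assumes one key and initialization vector per direction, and it covers confidentiality only, with no integrity check, which a real codec would also need.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stand-in for a negotiated-cipher codec; confidentiality only. */
final class CryptoCodecSketch {
    private final Cipher encryptor; // protects outgoing traffic
    private final Cipher decryptor; // reads incoming traffic

    CryptoCodecSketch(byte[] outKey, byte[] outIv, byte[] inKey, byte[] inIv) {
        this.encryptor = newCipher(Cipher.ENCRYPT_MODE, outKey, outIv);
        this.decryptor = newCipher(Cipher.DECRYPT_MODE, inKey, inIv);
    }

    private static Cipher newCipher(int mode, byte[] key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot initialize AES/CTR cipher", e);
        }
    }

    /** Encrypts outgoing bytes; the cipher keeps its counter state across calls. */
    byte[] wrap(byte[] data, int offset, int len) {
        byte[] out = encryptor.update(data, offset, len);
        return out == null ? new byte[0] : out;
    }

    /** Decrypts incoming bytes in the order they were wrapped by the peer. */
    byte[] unwrap(byte[] data, int offset, int len) {
        byte[] out = decryptor.update(data, offset, len);
        return out == null ? new byte[0] : out;
    }

    public static void main(String[] args) {
        byte[] k1 = new byte[16], iv1 = new byte[16]; // demo values only, never use fixed keys
        byte[] k2 = new byte[16], iv2 = new byte[16];
        k2[0] = 1;
        iv2[0] = 1;
        // The client's outgoing key/IV pair is the server's incoming pair, and vice versa.
        CryptoCodecSketch client = new CryptoCodecSketch(k1, iv1, k2, iv2);
        CryptoCodecSketch server = new CryptoCodecSketch(k2, iv2, k1, iv1);
        byte[] request = "get row1".getBytes(StandardCharsets.UTF_8);
        byte[] wire = client.wrap(request, 0, request.length);
        byte[] seen = server.unwrap(wire, 0, wire.length);
        System.out.println(new String(seen, StandardCharsets.UTF_8));
    }
}
```

Using a separate key and initialization vector for each direction means the two sides never encrypt with the same keystream, which matters for a counter mode such as CTR.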
With the above code deployed to our production servers, we can compare the latencies of the different encryption modes for one of our HBase applications. (To run clients in different modes, we also patched our HBase servers with the fix for HBASE-14865.) Below we show the P50, P75, and P95 latencies over a 12-hour period. The top line is an HBase client configured with GSSAPI/Kerberos encryption (higher is worse), the middle line is an HBase client configured with accelerated encryption, and the bottom line is an HBase client configured with no encryption.
Also, here is the user CPU time for the three differently configured HBase clients (GSSAPI/Kerberos encryption, accelerated encryption, no encryption).
We can see that accelerated encryption provides a significant performance improvement over GSSAPI/Kerberos encryption. The changes I made to HBase in order to support accelerated encryption are available at HBASE-16633.