[SOLVED] Upgrading Apache Kafka 2.7 to Java 11 Changes authenticationID sent to ZooKeeper Enabling Only 1 Kafka Broker to r/w znodes

The title of this post is a bit of mouthful and requires a bit more explanation.

I am running a pure open-source version of Kafka (currently running 2.7) and am using SASL/GSSAPI connections between all of the brokers and ZooKeeper. Currently, the whole system, including ZooKeeper, is running Java 8 and it is long-overdue to be upgraded to Java 11.

Upgrading Kafka to Java 11 causes the server to send an incorrect authenticationID String to ZooKeeper which results in the ACLs on the znodes being set to the hostname of the first Kafka server that connects to ZooKeeper. This results in only one of the Kafka hosts being able to r/w the znodes.

Prior to upgrading the following are the logs from one of the ZooKeeper nodes:

2021-05-17 14:47:35,345 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.19.65.22:41698
2021-05-17 14:47:35,357 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.19.65.22:41698
2021-05-17 14:47:35,367 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x1797ac46a0f0000 with negotiated timeout 18000 for client /172.19.65.22:41698
2021-05-17 14:47:35,470 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET;  authorizationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET.
2021-05-17 14:47:35,474 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: kafka
2021-05-17 14:47:35,474 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@964] - adding SASL authorization for authorizationID: kafka

The following is the corresponding logs from the Kafka host making the connection:

[2021-05-17 14:47:37,512] INFO Successfully authenticated client: authenticationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET; authorizationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET. (org.apache.kafka.common.secur
ity.authenticator.SaslServerCallbackHandler)

The most important part of this is that ZooKeeper sees the authenticationID as kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET. ZooKeeper is configured with the following options so that the host and realm will be removed from the authenticationID. This will leave the service name, ‘kafka‘, and result in all of the Kakfa znodes with kafka as the owner of the node. This then enables all of the hosts in the Kafka cluster r/w access to those znodes. Following are the configurations in ZooKeeper that enable this behavior.

kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true

The resulting znodes ACLs are

[zk: localhost:2181(CONNECTED) 5] getAcl /brokers
'world,'anyone
: r
'sasl,'kafka
: cdrwa

System Configurations

Following are all of the configurations for both the ZooKeeper and Kafka cluster:

ZooKeeper

zoo.cfg

maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/data_dir
dataLogDir=/zk/data_log_dir
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
clientPort=2181
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true

server.1=zk2-01.dv.quasar.nadops.net:2888:3888
server.2=zk2-02.dv.quasar.nadops.net:2888:3888
server.3=zk2-03.dv.quasar.nadops.net:2888:3888

jaas.conf

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/zookeeper/conf/vmwqsrdvz201.dv.quasar.nadops.net.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/vmwqsrdvz201.dv.quasar.nadops.net@EXAMPLE.NET";
};

Kafka

broker-jaas.conf

KafkaServer {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/usr/local/kafka/config/vmwqsrdvk01.dv.quasar.nadops.net.keytab"
  storeKey=true
  principal="kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET";
};

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/usr/local/kafka/config/vmwqsrdvk01.dv.quasar.nadops.net.keytab"
  storeKey=true
  serviceName="zookeeper"
  principal="kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET";
};

server.properties

Only the relevant configs are included below.

listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,BROKER_LISTENER:SASL_SSL

inter.broker.listener.name=BROKER_LISTENER
listeners=SASL_SSL://:9092,BROKER_LISTENER://:9093

ssl.keystore.location=/usr/local/kafka/config/kafka01.dv.quasar.nadops.net-keystore.jks
ssl.keystore.password=***
ssl.key.password=***
ssl.truststore.location=/usr/local/kafka/config/kafka01.dv.quasar.nadops.net-truststore.jks
ssl.truststore.password=***
sasl.kerberos.service.name=kafka
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.enabled.mechanisms=GSSAPI
zookeeper.set.acl=true
super.users=User:kafka
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
allow.everyone.if.no.acl.found=false

Upgrading Kafka to Java 11

Once Kafka (leaving ZooKeeper on Java 8) is upgraded to Java 11 Kafka no longer sends the expected authorizationID String to ZooKeeper.

Following are the ZooKeeper logs of the same authentication:

2021-05-18 14:31:01,627 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.19.65.23:53374
2021-05-18 14:31:01,631 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.19.65.23:53374
2021-05-18 14:31:01,636 [myid:3] - INFO  [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x3797fd27e300004 with negotiated timeout 18000 for client /172.19.65.23:53374
2021-05-18 14:31:01,728 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=vmwqsrdvk02@EXAMPLE.NET;  authorizationID=kafka/vmwqsrdvk02.dv.quasar.nadops.net@EXAMPLE.NET.
2021-05-18 14:31:01,728 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: vmwqsrdvk02
2021-05-18 14:31:01,728 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@964] - adding SASL authorization for authorizationID: vmwqsrdvk02

Following are the logs from Kafka that results in an exception being thrown because it is unable to write to the ZooKeeper znode:

[2021-05-18 14:31:01,410] INFO Initiating client connection, connectString=vmwqsrdvz201.dv.quasar.nadops.net:2181,vmwqsrdvz202.dv.quasar.nadops.net:2181,vmwqsrdvz203.dv.quasar.nadops.net:2181 sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@39b43d60 (org.apache.zookeeper.ZooKeeper)
[2021-05-18 14:31:01,414] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-05-18 14:31:01,419] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,421] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-05-18 14:31:01,556] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,573] INFO TGT refresh thread started. (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,577] INFO Client will use GSSAPI as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2021-05-18 14:31:01,604] INFO TGT valid starting at:        Tue May 18 14:31:01 UTC 2021 (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,604] INFO TGT expires:                  Wed May 19 00:31:01 UTC 2021 (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,605] INFO TGT refresh sleeping until: Tue May 18 22:32:58 UTC 2021 (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,606] INFO Opening socket connection to server vmwqsrdvz203.dv.quasar.nadops.net/172.19.65.21:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,614] INFO Socket connection established, initiating session, client: /172.19.65.23:53374, server: vmwqsrdvz203.dv.quasar.nadops.net/172.19.65.21:2181 (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,622] INFO Session establishment complete on server vmwqsrdvz203.dv.quasar.nadops.net/172.19.65.21:2181, sessionid = 0x3797fd27e300004, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,626] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-05-18 14:31:01,731] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /brokers/ids
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:564)
        at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1662)
        at kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1560)
        at kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1(KafkaZkClient.scala:1552)
        at kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1$adapted(KafkaZkClient.scala:1552)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at kafka.zk.KafkaZkClient.createTopLevelPaths(KafkaZkClient.scala:1552)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:467)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:233)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
        at kafka.Kafka$.main(Kafka.scala:82)
        at kafka.Kafka.main(Kafka.scala)
[2021-05-18 14:31:01,733] INFO shutting down (kafka.server.KafkaServer)

The authorizationID now provided by Kafka is now vmwqsrdvk02@EXAMPLE.NET, only the hostname and realm, and no longer the String that ZooKeeper expects. The resulting, derived authorizationID is vmwqsrdvk02 which means that only that Kafka host has r/w to any of the znodes which it creates.

Solution

After exhausting search options and finding multiple pages indicating that I have everything configured correctly, even for JDK 11, I posted to the #apache-kafka channel on chat.freenode.net and got a hint. Evidently, there are some JDKs that have bugs in their Kerberos implementation.

I was running java-11-openjdk-11.0.8.10-1.el7.x86_64 from CentOS 7 Updates. I double-checked and there was an updated JDK available.

After updating to the latest Java 11 JDK everything worked as expected. If the particular JDK that you are using is still not working try AdoptOpenJDK.

Leave a Reply