Kafka Streams is not detecting renewed Kerberos ticket after initial ticket's expiry - java

I've found some similar questions, but they're not quite the same situation as this.
I have a Kafka Streams application which authenticates with brokers using Kerberos ticket details found within a Credential Cache.
The application works fine until the original ticket's expiry is reached, at which point I get the following error.
04:21:45.630 [kafka-producer-network-thread | sample-app-StreamThread-1-producer] ERROR org.apache.kafka.clients.NetworkClient - [Producer clientId=sample-app-StreamThread-1-producer] Connection to node 2 (<Hostname>/<ipAddress>:<Port>) failed authentication due to: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) occurred when evaluating SASL token received from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.
Now, that would all seem expected, but my ticket is renewed every 2 hours by another system, and yet the Kafka Streams application isn't detecting that the ticket has been renewed. Querying the ticket cache with 'klist' shows that there is a valid ticket at the time the error occurs:
Ticket cache: FILE:/var/ABC/SYSTEM_ACCOUNT/cc/krb5cc_12345
Default principal: 12345#EXCHAD.ABC123.com
Valid starting       Expires              Service principal
04/02/20 02:28:02    04/02/20 12:28:02    krbtgt/EXCHAD.ABC123.com#EXCHAD.ABC123.com
        renew until 04/08/20 08:28:04
Oddly, I can bounce my application and it works again, but only until the current ticket's expiry is reached, roughly 10 hours later.
Why isn't Kafka Streams looking for the latest ticket? Is this potentially a bug within Kafka Streams itself? I can't find any other settings related to this beyond the initial JAAS configuration:
com.sun.security.auth.module.Krb5LoginModule required
refreshKrb5Config=true
useKeyTab=false
useTicketCache=true
renewTGT=true
doNotPrompt=true
ticketCache="/var/ABC/SYSTEM_ACCOUNT/cc/krb5cc_12345"
principal="12345#EXCHAD.ABC123.com"
I'm using Java 8 and Kafka Streams 2.4.0.
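For reference, a minimal sketch of how a JAAS entry like the one above can be passed to a Kafka Streams application through its client properties; the security.protocol and sasl.kerberos.service.name values below are assumptions for illustration, since they aren't shown above:
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
    refreshKrb5Config=true \
    useKeyTab=false \
    useTicketCache=true \
    renewTGT=true \
    doNotPrompt=true \
    ticketCache="/var/ABC/SYSTEM_ACCOUNT/cc/krb5cc_12345" \
    principal="12345#EXCHAD.ABC123.com";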
As always, any help or guidance would be greatly appreciated.
Thanks!

Related

How to set up OIDC connection to Keycloak in Quarkus on Kubernetes

Has anybody succeeded in setting up an OIDC connection to Keycloak in a Quarkus app deployed in a Kubernetes cluster?
Could you clarify how the connection-delay (and other related parameters) work?
(Here is the documentation I tried to follow)
In our environment (Quarkus 1.13.3.Final, Keycloak 12.0.4) we have the following config:
quarkus.oidc.connection-delay: 6M
quarkus.oidc.connection-timeout: 30S
quarkus.oidc.tenant-id: testTenant-01
These messages appear in the pod's log when it starts:
2021-07-26 14:44:22,523 INFO [main] [OidcRecorder.java:264] - Connecting to IDP for up to 180 times every 2 seconds
2021-07-26 14:44:24,142 DEBUG [vert.x-eventloop-thread-1] [OidcRecorder.java:115] - 'testTenant-01' tenant initialization has failed: 'OpenId Connect Provider configuration metadata is not configured and can not be discovered'. Access to resources protected by this tenant will fail with HTTP 401.
(... following log comes later as the pod is running ...)
2021-07-27 06:11:54,261 DEBUG [vert.x-eventloop-thread-0] [DefaultTenantConfigResolver.java:112] - Tenant 'null' is not initialized
2021-07-27 06:11:54,262 ERROR [vert.x-eventloop-thread-0] [QuarkusErrorHandler.java:101] - HTTP Request to /q/health/live failed, error id: 89f83d1d-894c-4fed-9995-0d42d60cec17-2: io.quarkus.oidc.OIDCException: Tenant configuration has not been resolved
    at io.quarkus.oidc.runtime.OidcAuthenticationMechanism.resolve(OidcAuthenticationMechanism.java:61)
    at io.quarkus.oidc.runtime.OidcAuthenticationMechanism.authenticate(OidcAuthenticationMechanism.java:40)
    at io.quarkus.oidc.runtime.OidcAuthenticationMechanism_ClientProxy.authenticate(OidcAuthenticationMechanism_ClientProxy.zig:189)
    at io.quarkus.vertx.http.runtime.security.HttpAuthenticator.attemptAuthentication(HttpAuthenticator.java:100)
    at io.quarkus.vertx.http.runtime.security.HttpAuthenticator_ClientProxy.attemptAuthentication(HttpAuthenticator_ClientProxy.zig:157)
    at io.quarkus.vertx.http.runtime.security.HttpSecurityRecorder$2.handle(HttpSecurityRecorder.java:101)
    at io.quarkus.vertx.http.runtime.security.HttpSecurityRecorder$2.handle(HttpSecurityRecorder.java:51)
    at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1038)
Questions:
Is there any way to find out which metadata is missing?
Can I somehow change the 2-second period between connection attempts?
Is there any relation between connection-delay and connection-timeout?
It failed after roughly 2 seconds - does that mean it failed immediately on the first attempt, or did it get through all 180 attempts that quickly?
Does DefaultTenantConfigResolver get the tenant from a different source than OidcRecorder does during initialization, i.e. should the tenant be configured in multiple places?
I finally made it work. The problem was caused by an incorrect auth-server-url, which is not clear at all from the log messages. The working configuration:
quarkus.oidc.client-id: my-app
quarkus.oidc.enabled: true
quarkus.oidc.connection-delay: 6M
quarkus.oidc.connection-timeout: 30S
quarkus.oidc.tenant-id: testTenant-01
quarkus.oidc.auth-server-url: ${keycloak.url}/auth/realms/${quarkus.oidc.tenant-id}
The URL format is emphasized in the Quarkus docs: "Note if you work with Keycloak OIDC server, make sure the base URL is in the following format: https://host:port/auth/realms/{realm} where {realm} has to be replaced by the name of the Keycloak realm."
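For illustration only, assuming keycloak.url resolves to https://keycloak.example.com (the real host isn't shown above), the config expands to:
quarkus.oidc.auth-server-url: https://keycloak.example.com/auth/realms/testTenant-01
so the last path segment must be the Keycloak realm name, which this setup reuses as the tenant-id.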

Kafka Cluster showing continuous logs "INFO [SocketServer] Failed authentication (SSL handshake failed) (org.apache.kafka.common.network.Selector)"

I have a 3-node Kafka cluster in a Windows environment. I recently added security to this existing cluster using the SASL_SSL mechanism.
Here is the server.properties security configuration on each node:
authroizer.class.name=kafka.security.auth.SimpleAclAuthorizer
security.inter.broker.protocol=SASL_SSL
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2
ssl.endpoint.identification.algorithm=
ssl.truststore.location=kafka-truststore.jks
ssl.truststore.password=******
ssl.keystore.location=kafka.keystore.jks
ssl.keystore.password=******
ssl.key.password=******
Everything is working fine: I am able to store and retrieve messages, and Kafka Streams applications connect properly. But since yesterday I have been getting continuous logs on all three nodes like:
INFO [SocketServer brokerId=2] Failed authentication with host.docker.internal/ip (SSL handshake failed) (org.apache.kafka.common.network.Selector)
As the log says, the broker with id 2 is refusing the SSL handshake from the other brokers, i.e. 1 and 3.
I have verified the JKS certificates and they are all valid.
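For example, one way to inspect the keystore/truststore contents and the certificate chain a broker actually presents (the hostname and port below are placeholders):
keytool -list -v -keystore kafka.keystore.jks
keytool -list -v -keystore kafka-truststore.jks
openssl s_client -connect broker2.example.com:9093 -showcerts
Comparing the chain presented by broker 2 against the CA entries in the truststores of brokers 1 and 3 should show whether its certificate is actually trusted by its peers.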
Does anyone know the reason for such logs?

Apache Pulsar - Authorization failed on topic - Don't have permission to administrate resources on this tenant

I'm getting this exception:
org.apache.pulsar.client.api.PulsarClientException$AuthorizationException: Authorization failed example_ingest_producer on topic persistent://myTenant/myNamespace/myTopicName with error Don't have permission to administrate resources on this tenant
when trying to connect to Pulsar from our client application. I'm running Pulsar 2.4.2.
I confirmed that I'm connecting to the correct endpoint (pulsar+ssl://pulsar-ms-tls.mydomain.com:6651), and we're using SSL+TLS.
What could be causing this problem?
This error occurs because you're not using the correct token to connect, or because the role associated with your token lacks sufficient permission. Ensure that you are using the correct token and that its role has the permissions required for you to connect.
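As a sketch, the tenant and namespace names below come from the error message above, while my-producer-role stands in for whatever role your token was issued for; existing permissions can be inspected, and produce permission granted, with pulsar-admin:
bin/pulsar-admin namespaces permissions myTenant/myNamespace
bin/pulsar-admin namespaces grant-permission myTenant/myNamespace --role my-producer-role --actions produce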

Kafka ACL - LEADER_NOT_AVAILABLE

I have an issue producing messages to a Kafka topic (named secure.topic) secured with ACL.
My Groovy-based producer throws this error:
Error while fetching metadata with correlation id 9 : {secure.topic=LEADER_NOT_AVAILABLE}
Some notes about the configuration:
1 Kafka server, version 2.11_1.0.0 (both server and Java client libs)
topic ACL is set to All (also tested with --producer) and the user is the full name specified in the certificate
client auth enabled using self generated certificates
Additional server config:
security.inter.broker.protocol = SSL
ssl.client.auth = required
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
If I remove the authorizer.class.name property, then my client can produce messages (so, no problem with SSL and certificates).
Also, the kafka-authorizer.log produces the following message:
[2018-01-25 11:57:02,779] INFO Principal = User:CN= User,OU=XXX,O=XXX,L=XXX,ST=Unknown,C=X is Denied Operation = ClusterAction from host = 127.0.0.1 on resource = Cluster:kafka-cluster (kafka.authorizer.logger)
Any idea what can cause the LEADER_NOT_AVAILABLE error when enabling ACL?
From the authorizer logs, it looks like the Authorizer denied ClusterAction on the Cluster resource.
If you check your topic status (for example using kafka-topics.sh), I'd expect to see it without a Leader (-1).
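For example, assuming a single local ZooKeeper on the default port:
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic secure.topic
would list the partition with Leader: -1 while the brokers are blocked from their inter-broker actions.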
When you enable authorization, it is applied to all Kafka API messages reaching your cluster, including inter-broker messages like StopReplica, LeaderAndIsr, ControlledShutdown, etc. So it looks like you only added ACLs for your client but forgot the ACLs required for the brokers themselves to function.
So you need to at least add an ACL granting ClusterAction on the Cluster resource for your broker's principals. IIRC that's the only required ACL for inter-broker messages.
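A minimal sketch, assuming a local ZooKeeper; the principal here is the one shown being denied ClusterAction in the authorizer log above (in general it should be each broker's certificate DN):
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal "User:CN= User,OU=XXX,O=XXX,L=XXX,ST=Unknown,C=X" --operation ClusterAction --cluster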
Following that, your cluster should be able to correctly elect a leader for the partition enabling your client to produce.

Kerberos error: GSSHeader did not find the right tag

I'm trying to make a Kerberos-authenticated connection to a SOAP service WSDL URL.
I'm able to establish the connection successfully and make service calls.
After I start my server, I'm able to make a successful service call at least once.
However, after a few requests (one or more), I suddenly get an invalid-token error.
Once I get the error, future calls do not work and the error persists.
If I restart my server, a service call works again at least once, and the above cycle continues.
I'm unable to figure out why the token suddenly becomes invalid after having worked earlier, and why restarting the server makes it valid again.
Here is the error stacktrace:
Caused by: GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag)
at sun.security.jgss.GSSHeader.<init>(GSSHeader.java:97)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:237)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
Without seeing the Base64 value or a hex dump, I assume that the client is sending an NTLM type 1 token, and Java does not support NTLM.
I could not find the root cause of why the token becomes invalid, but here is how I got around the issue.
Authentication worked the first time after I restarted my server, when the bean was loaded into the context again. So I changed the scope of my Spring bean to prototype, so that a new proxy bean is created each time.
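A minimal sketch of that workaround; the class and bean names here are hypothetical stand-ins, since the actual ones aren't shown:
import org.springframework.beans.factory.config.ConfigurableBeanFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Scope;

@Configuration
public class SoapClientConfig {

    // Placeholder for the real Kerberos-authenticated SOAP client proxy.
    public static class SoapClientProxy {
    }

    // Prototype scope: a new proxy (and therefore a fresh security context) is
    // created on each injection/lookup instead of reusing a single instance,
    // which is the workaround described above.
    @Bean
    @Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
    public SoapClientProxy soapClientProxy() {
        return new SoapClientProxy();
    }
}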
