Writing to a Kafka follower node - Java

Is it at all possible to write to a Kafka follower node? I ask because we sometimes encounter the following situation: a host containing a broker that is the leader for some partitions becomes inaccessible from the producer host while actually remaining up (i.e. a network issue between the producer and the leader host), so a new leader is not elected.
To create a contrived example, one can block a host or a port using a firewall. In my contrived example, I have:
h0: 7 brokers running on ports 9092 - 9098
h1: 3 brokers running on ports 9092 - 9094
h2: 3 brokers running on ports 9092 - 9094
h3: 3 brokers running on ports 9092 - 9094
I blocked outgoing port 9092, and, as expected, approximately 25% of the messages (the 4 of 16 brokers behind that port) fail to be written and error out.
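For reference, blocking the port for such a test can be done with something like the following iptables rule on the producer host (a sketch; the chain and flags depend on your setup):

# Drop outgoing TCP traffic to port 9092 (run as root on the producer host)
iptables -A OUTPUT -p tcp --dport 9092 -j DROP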
In the real world, I have seen a host being unreachable for ~5 minutes from the producer host.
Is there any way to ensure that the message gets written to the kafka cluster?

It is not possible to produce messages to a follower. From the Kafka protocol documentation:
These requests to publish or fetch data must be sent to the broker that is currently acting as the leader for a given partition. This condition is enforced by the broker, so a request for a particular partition sent to the wrong broker will result in the NotLeaderForPartition error code (described below).
You can come up with imaginative solutions, like setting up a completely independent Kafka cluster, producing there if the main one is inaccessible, and having MirrorMaker clone the data from the secondary Kafka cluster to the main one; or forwarding the data to some other host that has connectivity to the Kafka cluster, if your use case really requires it. But most options seem a bit convoluted and costly...
Maybe it is better to just buffer the data and wait for connectivity to come back, or to investigate and invest in improving the network so there is more redundancy and network partitions between hosts and the Kafka cluster become harder to hit.
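A minimal sketch of the buffering idea, assuming a plain Java KafkaProducer; the buffer size, String key/value types, and retry scheme are illustrative assumptions, not a production design:

import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BufferingProducer {
    // Bounded in-memory buffer; 100,000 is an arbitrary illustrative cap
    private final BlockingQueue<ProducerRecord<String, String>> buffer =
            new LinkedBlockingQueue<>(100_000);
    private final KafkaProducer<String, String> producer;

    public BufferingProducer(Properties props) {
        this.producer = new KafkaProducer<>(props);
    }

    public void send(ProducerRecord<String, String> record) {
        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                // Leader unreachable (or any other send failure): keep the
                // record for a later retry. offer() silently drops the record
                // if the buffer is full; adjust that policy to your needs.
                buffer.offer(record);
            }
        });
    }

    // Call periodically (e.g. from a scheduled task) once connectivity is
    // expected to be back, to retry the buffered records.
    public void retryBuffered() {
        ProducerRecord<String, String> record;
        while ((record = buffer.poll()) != null) {
            send(record);
        }
    }
}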

Related

Kafka consumer group not rebalancing when increasing partitions

I have a situation where in my dev environment, my Kafka consumer groups will rebalance and distribute partitions to consumer instances just fine after increasing the partition count of a subscribed topic.
However, when we deploy our product into its Kubernetes environment, we aren't seeing the consumer groups rebalance after increasing the partition count of the topic. Kafka recognizes the increase, which can be seen in the server logs or by describing the topic from the command line. However, the consumer groups won't rebalance and pick up the new partitions. From my local testing, Kafka respects metadata.max.age.ms (default 5 minutes), but in Kubernetes the group never rebalances.
I don't know if it affects anything but we're using static membership.
The consumers are written in Java and use the standard Kafka Java library. No messages are flowing through Kafka, and adding messages doesn't help. I don't see anything special in the server or consumer configurations that differs from my dev environment. Is anyone aware of any configurations that may affect this behavior?
Update:
The issue was only occurring for a new topic. At first, the consumer application was starting before the producer application (which is intended to create the topic), so the consumer was auto-creating the topic. In this scenario, the topic defaulted to 1 partition. When the producer application started, it updated the partition count per its configuration. After that update, we never saw a rebalance.
Next, we tried disabling consumer auto topic creation to address this. This prevented the consumer application from auto-creating the topic on subscription, yet even after the topic was created by the producer app, the consumer group never rebalanced, so the consumer application would sit idle.
According to the documentation I've found, and to testing in my dev environment, both of these situations should trigger a rebalance, but for whatever reason we don't see that happen in our deployments. My temporary workaround is just to ensure that the topic is already created prior to allowing my consumers to subscribe. I don't like it, but it works for now. I suspect that the different behavior is due to my dev environment running a single Kafka broker vs. the Kubernetes deployments running a cluster, but that's just a guess.
Kafka clients refresh topic metadata only every 5 minutes by default (metadata.max.age.ms), so they will not detect partition changes immediately, as you've noticed. The deployment method of your app shouldn't matter, as long as network requests properly reach the broker.
Also, check your partition assignment strategy to see if it's using sticky assignment. This will depend on what version of the client you're using, as the defaults changed around 2.7, I think.
No messages are flowing through Kafka
If there's no data on the new partitions, there's no real need to rebalance to consume from them.
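For reference, here is what the two settings mentioned above look like in consumer code; the broker address, group id, and the choice of CooperativeStickyAssignor are illustrative, not recommendations:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // assumed address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");             // assumed group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// Refresh topic metadata more often than the 5-minute default so new
// partitions are discovered sooner (milliseconds; illustrative value).
props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "60000");
// Make the assignment strategy explicit instead of relying on the
// version-dependent default.
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        CooperativeStickyAssignor.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);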

How to set up Kafka across multiple DCs

Scenario: Currently we have a primary cluster, with a producer and consumer that are working as expected. We have to implement a secondary Kafka DR cluster in another data center. I have a couple of ideas, but I am not sure how to proceed.
Question: How do we automate the producer switch-over from the primary cluster to the secondary cluster if the primary cluster/broker goes down?
Any sample code will be helpful.
You can use a load balancer in front of your producer. The load balancer can switch to the secondary cluster if the brokers in the primary aren't available.
You can also implement the failover without a load balancer. In that case, you have to implement exception handling in your code that triggers a reconnect yourself, so the producer is still able to ingest (see the sketch after this answer).
Consumers can subscribe to super topics (several topics at once); this can be done with a regular expression.
For the HA scenario you need 2 Kafka clusters and MirrorMaker 2.0. The failover happens on the client side (producer/consumer).
To your question: if you have 3 brokers and min.insync.replicas is set to 2, a maximum of 1 broker can fail. If 2 brokers fail, high availability is no longer guaranteed. This means you can build exception handling that intercepts the "not enough replicas" error and, on that basis, carries out a reconnect.
If you use the scenario with the load balancer, make sure that the load balancer is configured in passthrough mode. In this way, the amount of code can be reduced.
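A rough sketch of the client-side failover idea with a plain Java producer; the bootstrap addresses, the synchronous send, and retrying on any send failure are all assumptions made for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FailoverProducer {
    private static final String PRIMARY = "primary-broker:9092";     // assumed address
    private static final String SECONDARY = "secondary-broker:9092"; // assumed address

    private final Properties props; // must already contain serializers etc.
    private KafkaProducer<String, String> producer;

    public FailoverProducer(Properties baseProps) {
        this.props = baseProps;
        this.producer = connect(PRIMARY);
    }

    private KafkaProducer<String, String> connect(String bootstrapServers) {
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        return new KafkaProducer<>(props);
    }

    public void send(ProducerRecord<String, String> record) {
        try {
            // Synchronous send keeps the sketch simple; "not enough replicas"
            // and timeout errors surface here wrapped in an ExecutionException.
            producer.send(record).get();
        } catch (Exception primaryFailure) {
            // Primary looks unhealthy: close it and retry once on the DR cluster.
            producer.close();
            producer = connect(SECONDARY);
            try {
                producer.send(record).get();
            } catch (Exception secondaryFailure) {
                throw new RuntimeException("Send failed on both clusters", secondaryFailure);
            }
        }
    }
}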

org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata for Kafka Cluster using jaas SASL config authentication

I am trying to deploy a Google Cloud Dataflow pipeline which reads from a Kafka cluster, processes its records, and then writes the results to BigQuery. However, I keep encountering the following exception when attempting to deploy:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata for Kafka Cluster
The Kafka cluster requires the use of a JAAS configuration for authentication, and I use the code below to set the properties required for the KafkaIO.read Apache Beam method:
// Kafka properties
Map<String, Object> kafkaProperties = new HashMap<String, Object>() {{
    put("request.timeout.ms", 900000);
    put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
    put(SaslConfigs.SASL_MECHANISM, "SCRAM-SHA-512");
    put(SaslConfigs.SASL_JAAS_CONFIG, "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"USERNAME\" password=\"PASSWORD\";");
    put(CommonClientConfigs.GROUP_ID_CONFIG, GROUP_ID);
}};
// Build & execute pipeline
pipeline
    .apply(
        "ReadFromKafka",
        KafkaIO.<Long, String>read()
            .withBootstrapServers(properties.getProperty("kafka.servers"))
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withTopic(properties.getProperty("kafka.topic"))
            .withConsumerConfigUpdates(kafkaProperties))
The Dataflow pipeline is to be deployed with public IPs disabled, but there is an established VPN tunnel from our Google Cloud VPC network to the Kafka cluster, the required routes for the private IPs on both sides are configured, and their IPs are whitelisted. I am able to ping and connect to the socket of the Kafka server from a Compute Engine VM in the same subnetwork as the Dataflow job to be deployed.
I was thinking that there is an issue with the configuration, but I am not able to figure out whether I am missing an additional field or one of the existing ones is misconfigured. Does anyone know how I can diagnose the problem further, since the exception thrown does not really pinpoint the issue? Any help would be greatly appreciated.
Edit:
I am now able to deploy the Dataflow job successfully; however, it appears that the read is not functioning correctly. After viewing the logs to check for errors in the Dataflow job, I can see that after discovering the group coordinator for the Kafka topic, there are no other log statements before a warning saying that closing the idle reader timed out:
Close timed out with 1 pending requests to coordinator, terminating client connections
followed by an uncaught exception with the root cause being:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition test-partition could be determined
There is then an error stating:
Execution of work for P0 for key 0000000000000001 failed. Will retry locally.
Could this maybe be an issue with the key definition, since the Kafka topics actually do not have keys in the messages? When I view the topic in Kafka Tool, the only columns observed in the data are Offset, Message, and Timestamp.
Based on the last comment, I assume the issue lies more with the network stack than with any missing configuration in the Dataflow pipeline itself, i.e. with how the Dataflow job runners connect to the Kafka brokers.
Basically, when you use the public IP address pool for Dataflow workers, you have the simplest way to reach an external Kafka cluster, with no extra configuration to apply on either side, as you don't need to set up a VPC network between the parties and do the routine network work to get all the routes working.
However, Cloud VPN brings more complications: implementing the VPC network on both sides and further adjusting the VPN gateway, forwarding rules, and the addressing pool for this VPC. On the other hand, from the Dataflow runtime perspective, you don't need to assign public IP addresses to the Dataflow runners, and you doubtlessly reduce the price.
The problem that you've mentioned lies primarily on the Kafka cluster side. Since Apache Kafka is a distributed system, it has a core principle: when a producer/consumer executes, it requests metadata about which broker is the leader for a partition and receives the endpoints available for that partition; the client then uses those endpoints to connect to the particular broker. As far as I understand, in your case the connection to the leader is going through the listener bound to the external network interface, as configured in the broker's server.properties.
Therefore, you might consider creating a separate listener (if it doesn't exist) in listeners, bound to the cloud VPC network interface, and, if necessary, setting advertised.listeners so that the metadata going back to the client contains addresses the client can use to connect to the particular broker.
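As an illustration, such a broker configuration might look like the fragment below; the listener names, ports, hostname, and VPC IP are assumptions for the sketch:

# server.properties: bind one listener per network interface
listeners=INTERNAL://0.0.0.0:9092,VPC://0.0.0.0:9093
# Addresses returned to clients in metadata; clients coming in over the
# VPN must receive an address they can actually route to
advertised.listeners=INTERNAL://broker1.internal:9092,VPC://10.8.0.5:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,VPC:SASL_PLAINTEXT
inter.broker.listener.name=INTERNAL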

Broker per node model?

I have gone through a couple of Kafka tutorials on Google, like this one.
Based on them, I have some questions in the context of Kafka:
1. What is a broker?
Per my understanding, each Kafka instance hosting topics (zero or more partitions) is a broker.
2. One broker per node?
I believe that in a practical clustering scenario, each node will ideally have one Kafka instance, where each instance holds two partitions:
a. one partition working as leader
b. another partition working as follower for a partition on another node.
Is this correct?
1) Correct. A broker is an instance of the Kafka server software, which runs in a Java virtual machine.
2) Incorrect. A node is really the same thing as a broker. If you have three Kafka brokers running as a single cluster (for scalability and reliability), then it's said that you have a 3-node Kafka cluster. Each node is the leader for some partitions and the backup (replica) for others.
However, there are other kinds of nodes besides Kafka broker nodes. Kafka uses ZooKeeper, so you might have three or five ZooKeeper nodes as well; a cluster of ZooKeepers is often called an ensemble.
In later versions of Kafka there are now different types of nodes, so it's also normal to say there are 3 broker nodes, 5 ZooKeeper nodes, 2 Kafka Connect nodes, and a 10-node (or instance) Kafka Streams application.
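To see this leader/replica spread concretely, you can create a topic with several partitions and replication factor 3 on a 3-node cluster and then describe it; with a recent Kafka version the commands look roughly like this (topic name and broker address are assumptions):

kafka-topics.sh --create --topic demo --partitions 6 --replication-factor 3 --bootstrap-server broker1:9092
kafka-topics.sh --describe --topic demo --bootstrap-server broker1:9092

The describe output lists, per partition, which broker is the leader and which brokers hold the replicas.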
Each Kafka instance hosting zero or more topics is called a broker.
Each node can host multiple brokers, but in a production environment it makes sense to run one broker per node. Each broker typically hosts multiple topics/partitions, though; having only two partitions per Kafka broker would be a waste of resources.
I hope this helps.

Kafka cluster ZooKeeper failure handling

I am going to implement a Kafka cluster consisting of 3 machines, one for ZooKeeper and the other 2 as brokers. I have about 6 consumer machines and about a hundred producers.
Now, if one of the brokers fails, data loss is avoided thanks to the replication feature. But what if ZooKeeper fails and the same machine cannot be started? I have several questions:
I noticed that even after the ZooKeeper failure, producers continued to push messages to the designated broker, but the messages could no longer be retrieved by consumers, because the consumers got unregistered. So in this case, is the data lost permanently?
How do I change the ZooKeeper IP in the broker config at run time? Will the brokers have to be shut down to change the ZooKeeper IP?
Even if a new ZooKeeper machine is somehow brought into the cluster, would the previous data be lost?
Running only one instance of ZooKeeper is not fault-tolerant, and the behavior cannot be predicted. According to the HBase reference guide, you should set up an ensemble with at least 3 servers.
Have a look at the official documentation page: ZooKeeper clustered setup.
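For illustration, a minimal three-server ensemble looks like the fragment below (in addition to the usual tickTime/dataDir settings in zoo.cfg; the hostnames are assumptions), and each broker then lists all ensemble members so it survives the loss of any single ZooKeeper. Note that zookeeper.connect is read at broker startup, so changing it requires a broker restart:

# zoo.cfg (the same on all three ZooKeeper hosts)
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

# server.properties on each Kafka broker
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181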
