I was able to inspect a particular topic for its partitions:
public void addPartitionIfNotExists(int partitionId) {
    Map<String, TopicDescription> games = kafkaAdmin.describeTopics("games");
    TopicDescription gamesTopicDescription = games.get("games");
    List<TopicPartitionInfo> partitionsInfo = gamesTopicDescription.partitions();
    boolean partitionIdExists = partitionsInfo.stream()
            .anyMatch(partitionInfo -> partitionInfo.partition() == partitionId);
    if (!partitionIdExists) {
        // missing part
    }
}
But I haven't been able to add a new partition to an already existing topic at runtime. I don't know if that is even possible.
See KafkaAdminOperations Javadocs for more info:
/**
 * Create topics if they don't exist or increase the number of partitions if needed.
 * @param topics the topics.
 */
void createOrModifyTopics(NewTopic... topics);
Not sure about your logic around partitionIdExists, though, since a partition in a Kafka topic is just an index number. So, if partition 3 exists, that doesn't mean partitions 1 or 2 are missing. Therefore the NewTopic API is as simple as numPartitions. Nothing more.
Technically, what you are asking for is covered by that createOrModifyTopics(), and that's it: you don't need to check the topic's partitions yourself.
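For example, a minimal sketch of how the missing part could collapse into a single call, assuming the kafkaAdmin bean from your question (the class and method names here are purely illustrative). Since partition ids are zero-based, requesting partitionId + 1 partitions is enough:

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.kafka.config.TopicBuilder;
import org.springframework.kafka.core.KafkaAdminOperations;

public class GamesTopicAdmin {

    private final KafkaAdminOperations kafkaAdmin;

    public GamesTopicAdmin(KafkaAdminOperations kafkaAdmin) {
        this.kafkaAdmin = kafkaAdmin;
    }

    public void ensurePartitionExists(int partitionId) {
        // Partition ids are zero-based, so partition `partitionId` exists
        // once the topic has at least partitionId + 1 partitions.
        NewTopic games = TopicBuilder.name("games")
                .partitions(partitionId + 1)
                .build();
        // Creates the topic if it is missing, or increases the partition count
        // if the existing topic has fewer partitions; no manual check needed.
        kafkaAdmin.createOrModifyTopics(games);
    }
}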
I am trying to use rosjava to get a list of all running ROS nodes. Does anyone know how I could accomplish that? I am new to rosjava, and unfortunately the documentation is not very helpful.
Currently, it is not possible to get the nodes directly. But you could try to use MasterClient.getSystemState, which returns the state of the ROS graph as understood by the master. It contains all topics in the system state. You could iterate over the publishers and subscribers of these topics to get all nodes.
Here is an untested snippet which should allow you to get the topics with their publishers and subscribers.
MasterClient masterClient = new MasterClient(masterUri);
Response<SystemState> systemState = masterClient.getSystemState(GraphName.of("WHATEVER"));
Collection<TopicSystemState> topicList = systemState.getResult().getTopics();

for (TopicSystemState topic : topicList) {
    Set<String> publishers = topic.getPublishers();
    for (String publisher : publishers) {
        System.out.println(publisher);
    }
    Set<String> subscribers = topic.getSubscribers();
    for (String subscriber : subscribers) {
        System.out.println(subscriber);
    }
}
After getting all topics, you could collect all nodes by iterating the available publishers and subscribers for each topic.
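For example, a small untested continuation of the snippet above that collects the unique node names into a java.util.HashSet:

Set<String> nodeNames = new HashSet<>();
for (TopicSystemState topic : topicList) {
    // Publishers and subscribers are reported per topic as node names,
    // so the union over all topics yields every node known to the master.
    nodeNames.addAll(topic.getPublishers());
    nodeNames.addAll(topic.getSubscribers());
}
System.out.println("Known nodes: " + nodeNames);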
An exception occurred when upgrading to Spring Boot 2.3.0. The exception is as follows:
java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'kafka_consumer_fetch_manager_records_consumed_total' containing tag keys [client_id, kafka_version, product, spring_id, topic]. The meter you are attempting to register has keys [client_id, kafka_version, product, spring_id].
at io.micrometer.prometheus.PrometheusMeterRegistry.lambda$applyToCollector$17(PrometheusMeterRegistry.java:429)
at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1932)
at io.micrometer.prometheus.PrometheusMeterRegistry.applyToCollector(PrometheusMeterRegistry.java:413)
at io.micrometer.prometheus.PrometheusMeterRegistry.newFunctionCounter(PrometheusMeterRegistry.java:247)
at io.micrometer.core.instrument.MeterRegistry$More.lambda$counter$1(MeterRegistry.java:884)
at io.micrometer.core.instrument.MeterRegistry.lambda$registerMeterIfNecessary$5(MeterRegistry.java:559)
at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:612)
at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:566)
at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:559)
at io.micrometer.core.instrument.MeterRegistry.access$600(MeterRegistry.java:76)
at io.micrometer.core.instrument.MeterRegistry$More.counter(MeterRegistry.java:884)
at io.micrometer.core.instrument.FunctionCounter$Builder.register(FunctionCounter.java:122)
at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.registerCounter(KafkaMetrics.java:189)
at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.bindMeter(KafkaMetrics.java:174)
at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.lambda$checkAndBindMetrics$1(KafkaMetrics.java:161)
at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
at java.base/java.util.Collections$UnmodifiableMap.forEach(Collections.java:1505)
at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.checkAndBindMetrics(KafkaMetrics.java:137)
at io.micrometer.core.instrument.binder.kafka.KafkaMetrics.bindTo(KafkaMetrics.java:93)
at io.micrometer.core.instrument.binder.kafka.KafkaClientMetrics.bindTo(KafkaClientMetrics.java:39)
at org.springframework.kafka.core.MicrometerConsumerListener.consumerAdded(MicrometerConsumerListener.java:74)
at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:301)
at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createKafkaConsumer(DefaultKafkaConsumerFactory.java:242)
at org.springframework.kafka.core.DefaultKafkaConsumerFactory.createConsumer(DefaultKafkaConsumerFactory.java:212)
at org.springframework.kafka.core.ConsumerFactory.createConsumer(ConsumerFactory.java:67)
at org.springframework.kafka.core.ConsumerFactory.createConsumer(ConsumerFactory.java:54)
at org.springframework.kafka.core.ConsumerFactory.createConsumer(ConsumerFactory.java:43)
This exception occurs when I attempt to create a consumer through ConsumerFactory.createConsumer.
There is another consumer in the app which is created through spring-kafka by annotating a method with @KafkaListener(topics = TOPICS, groupId = GROUP_ID).
In io.micrometer.core.instrument.binder.kafka.KafkaMetrics, lines 146-147, I read:
//Kafka has metrics with lower number of tags (e.g. with/without topic or partition tag)
//Remove meters with lower number of tags
which means that the new metric will be discarded because it lacks the topic tag.
Are there any reasons why the different ways of creating consumers cause a deviation in tags? If so, is it possible to append the topic tag to the metric created through ConsumerFactory.createConsumer?
After some debugging, we found this:
I'll look around some more, but it seems that when a consumer is started (@KafkaListener) it also adds some metrics with the topic it's assigned to?
Just a hypothesis so far.
Another example with less stack: the topic seems to get registered when the scheduled task started in KafkaMetrics.bindTo runs -> scheduler.scheduleAtFixedRate(() -> checkAndBindMetrics(registry), getRefreshIntervalInMillis(), getRefreshIntervalInMillis(), TimeUnit.MILLISECONDS);
This particular piece makes sense to implement in the application rather than XML because it is a constant across the entire cluster, not localized to a single job.
From dissecting the XSD, it looks to me like the XML for int-kafka:outbound-channel-adapter constructs a KafkaProducerMessageHandler.
There is no visible way to set the channel, the topic, or most of the other attributes.
Note to potential downvoters - (rant on) I have been RTFM'ing for a week and am more confused than when I started. My choice of language has graduated from adjectives through adverbs, and I'm starting to borrow words from other languages. The answer may be in there. But if it is, it is not locatable by mere mortals. (rant off)
XML configuration:
<int-kafka:outbound-channel-adapter id="kafkaOutboundChannelAdapter"
                                    kafka-template="kafkaTemplate"
                                    auto-startup="false"
                                    channel="outbound-staging"
                                    topic="foo"
                                    sync="false"
                                    message-key-expression="'bar'"
                                    send-failure-channel="failures"
                                    send-success-channel="successes"
                                    partition-id-expression="2">
</int-kafka:outbound-channel-adapter>
If so, then I would expect the Java config to look something like this:
@Bean
public KafkaProducerMessageHandler kafkaOutboundChannelAdapter() {
    KafkaProducerMessageHandler result = new KafkaProducerMessageHandler(kafkaTemplate());
    result.set?????(); // WTH?? No methods for most of the attributes?!!!
    return result;
}
EDIT: Additional information about the high level problem being solved
As a part of a larger project, I am trying to implement the textbook example from https://docs.spring.io/spring-batch/4.0.x/reference/html/spring-batch-integration.html#remote-partitioning , with Kafka backing instead of JMS backing.
I believe the final integration flow should be something like this:
partitionHandler -> messagingTemplate -> outbound-requests (DirectChannel) -> outbound-staging (KafkaProducerMessageHandler) -> kafka
kafka -> executionContainer (KafkaMessageListenerContainer) -> inboundKafkaRequests (KafkaMessageDrivenChannelAdapter) -> inbound-requests (DirectChannel) -> serviceActivator (StepExecutionRequestHandler)
serviceActivator (StepExecutionRequestHandler) -> reply-staging (KafkaProducerMessageHandler) -> kafka
kafka -> replyContainer (KafkaMessageListenerContainer) -> inboundKafkaReplies (KafkaMessageDrivenChannelAdapter) -> inbound-replies (DirectChannel) -> partitionhandler
Not sure what you mean by them being missing, but this is what I see in the source code of KafkaProducerMessageHandler:
public void setTopicExpression(Expression topicExpression) {
    this.topicExpression = topicExpression;
}

public void setMessageKeyExpression(Expression messageKeyExpression) {
    this.messageKeyExpression = messageKeyExpression;
}

public void setPartitionIdExpression(Expression partitionIdExpression) {
    this.partitionIdExpression = partitionIdExpression;
}

/**
 * Specify a SpEL expression to evaluate a timestamp that will be added in the Kafka record.
 * The resulting value should be a {@link Long} type representing epoch time in milliseconds.
 * @param timestampExpression the {@link Expression} for the timestamp of the send operation.
 * @since 2.3
 */
public void setTimestampExpression(Expression timestampExpression) {
    this.timestampExpression = timestampExpression;
}
and so on.
You also have access to the superclass setters, for example setSync() for the sync attribute in your XML variant.
The input-channel is not a MessageHandler responsibility. It belongs to the endpoint and can be configured via @ServiceActivator alongside that @Bean (see the sketch after the links below).
See more info in the Core Spring Integration Reference Manual: https://docs.spring.io/spring-integration/reference/html/#annotations_on_beans
Also, there is a very important chapter at the beginning: https://docs.spring.io/spring-integration/reference/html/#programming-tips
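As a rough, untested sketch of how the XML above could map onto Java config: the topic, key, and partition values are copied from your XML; LiteralExpression/ValueExpression and the surrounding configuration class are just one way to express them, and the remaining XML attributes (such as the success/failure channels) would go through the corresponding setters on the handler or its superclasses.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.expression.common.LiteralExpression;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.expression.ValueExpression;
import org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class KafkaOutboundConfig {

    // The handler is the body of the XML adapter; the channel wiring comes from
    // @ServiceActivator, which turns this @Bean into a consuming endpoint.
    @Bean
    @ServiceActivator(inputChannel = "outbound-staging")
    public KafkaProducerMessageHandler<String, String> kafkaOutboundChannelAdapter(
            KafkaTemplate<String, String> kafkaTemplate) {

        KafkaProducerMessageHandler<String, String> handler =
                new KafkaProducerMessageHandler<>(kafkaTemplate);
        handler.setTopicExpression(new LiteralExpression("foo"));
        handler.setMessageKeyExpression(new LiteralExpression("bar"));
        handler.setPartitionIdExpression(new ValueExpression<>(2));
        handler.setSync(false);
        return handler;
    }
}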
In addition, it might be better to consider using the Java DSL instead of direct MessageHandler usage:
Kafka
        .outboundChannelAdapter(producerFactory)
        .sync(true)
        .messageKey(m -> m
                .getHeaders()
                .get(IntegrationMessageHeaderAccessor.SEQUENCE_NUMBER))
        .headerMapper(mapper())
        .partitionId(m -> 0)
        .topicExpression("headers[kafka_topic] ?: '" + topic + "'")
        .configureKafkaTemplate(t -> t.id("kafkaTemplate:" + topic))
        .get();
See more info about Java DSL in the mentioned Spring Integration Docs: https://docs.spring.io/spring-integration/reference/html/#java-dsl
I have a topology that does two KStream-KStream joins. The problem I'm facing is that when I try to unit test it with the TopologyTestDriver, sending a couple of ConsumerRecords with pipeInput and then reading with readOutput, it seems not to be working.
I'm thinking this might be because the joins use the internal RocksDB state stores of an actual Kafka Streams run, which we don't use in the tests.
So I've been looking around for a solution to this but can't find any.
Note: This method of testing works perfectly fine when the KStream-KStream joins are removed.
I have a topology that does two KStream-KStream joins. The problem I'm facing is that when I try to unit test it with the TopologyTestDriver, sending a couple of ConsumerRecords with pipeInput and then reading with readOutput, it seems not to be working.
By design, but unfortunately in your case, the TopologyTestDriver isn't a 100% accurate model of how the Kafka Streams engine works at runtime. Notably, there are some differences in the processing order of new, incoming events.
This can indeed cause problems when trying to test, for example, certain joins because these operations depend on a certain processing order (e.g., in a stream-table join, the table should already have an entry for key 'alice' before a stream-side event for 'alice' arrives, otherwise the join output for the stream-side 'alice' will not include any table-side data).
So I've been looking around for a solution to this but can't find any.
What I suggest is to use tests that spin up an embedded Kafka cluster, and then run your tests against that cluster using the "real" Kafka Streams engine (i.e., not the TopologyTestDriver). Effectively, this means you are changing your tests from unit tests to integration/system tests: your test will launch a full-fledged Kafka Streams topology that talks to the embedded Kafka cluster that runs on the same machine as your test.
See the integration tests for Kafka Streams in the Apache Kafka project, where EmbeddedKafkaCluster and IntegrationTestUtils are the centerpieces of the tooling. A concrete test example for joins is StreamTableJoinIntegrationTest (there are a few join-related integration tests) with its parent AbstractJoinIntegrationTest. (For what it's worth, there are further integration test examples at https://github.com/confluentinc/kafka-streams-examples#examples-integration-tests, which includes tests that also cover Confluent Schema Registry when using Apache Avro as your data format, etc.)
However, unless I am mistaken, the integration tests and their tooling are not included in the test utilities artifact of Kafka Streams (i.e., org.apache.kafka:kafka-streams-test-utils). So you'd have to do some copy-pasting into your own code base.
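A rough, untested sketch of that approach, assuming you have copied EmbeddedKafkaCluster (and, if needed, IntegrationTestUtils) into your test sources; the topic names are placeholders and buildMyJoinTopology() stands in for the topology containing your two joins:

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.junit.ClassRule;
import org.junit.Test;

public class KStreamJoinIntegrationTest {

    // EmbeddedKafkaCluster copied from the Apache Kafka test sources; one broker is enough here.
    @ClassRule
    public static final EmbeddedKafkaCluster CLUSTER = new EmbeddedKafkaCluster(1);

    @Test
    public void shouldProduceJoinedOutput() throws Exception {
        CLUSTER.createTopic("left-input");
        CLUSTER.createTopic("right-input");
        CLUSTER.createTopic("join-output");

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-integration-test");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());

        KafkaStreams streams = new KafkaStreams(buildMyJoinTopology(), config);
        streams.start();
        try {
            // Produce test records to the input topics and read the output topic here,
            // e.g. with the IntegrationTestUtils helpers mentioned above, then assert.
        } finally {
            streams.close();
        }
    }

    private Topology buildMyJoinTopology() {
        // Placeholder: build and return the topology with your two KStream-KStream joins.
        throw new UnsupportedOperationException("plug in your real topology");
    }
}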
Have you had a look at the Kafka Streams unit tests [1]? It's about piping in the data and checking the end result with a mock processor.
For example for the following stream join:
stream1 = builder.stream(topic1, consumed);
stream2 = builder.stream(topic2, consumed);

joined = stream1.outerJoin(
        stream2,
        MockValueJoiner.TOSTRING_JOINER,
        JoinWindows.of(ofMillis(100)),
        StreamJoined.with(Serdes.Integer(), Serdes.String(), Serdes.String()));
joined.process(supplier);
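For reference, here is a hedged sketch of the setup such a test needs before it can pipe any input; the application id and dummy bootstrap servers are arbitrary, and the serializers are assumed to match the Integer/String serdes of the join above:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kstream-kstream-join-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
    TestInputTopic<Integer, String> inputTopic1 =
            driver.createInputTopic(topic1, new IntegerSerializer(), new StringSerializer());
    TestInputTopic<Integer, String> inputTopic2 =
            driver.createInputTopic(topic2, new IntegerSerializer(), new StringSerializer());
    // ... pipe records into inputTopic1 / inputTopic2 as shown below
}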
You can then start piping input items into the first or second topic and check, with each successive piping of input, what the processor sees:
// push two items to the primary stream; the other window is empty
// w1 = {}
// w2 = {}
// --> w1 = { 0:A0, 1:A1 }
//     w2 = {}
for (int i = 0; i < 2; i++) {
    inputTopic1.pipeInput(expectedKeys[i], "A" + expectedKeys[i]);
}
processor.checkAndClearProcessResult(EMPTY);

// push two items to the other stream; this should produce two items
// w1 = { 0:A0, 1:A1 }
// w2 = {}
// --> w1 = { 0:A0, 1:A1 }
//     w2 = { 0:a0, 1:a1 }
for (int i = 0; i < 2; i++) {
    inputTopic2.pipeInput(expectedKeys[i], "a" + expectedKeys[i]);
}
processor.checkAndClearProcessResult(new KeyValueTimestamp<>(0, "A0+a0", 0),
        new KeyValueTimestamp<>(1, "A1+a1", 0));
I hope this helps.
References:
[1] https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/kstream/internals/KStreamKStreamJoinTest.java#L279
I'm trying to get to grips with Spark Streaming but I'm having difficulty. Despite reading the documentation and analysing the examples, I wish to do something more than a word count on a text file/stream/Kafka queue, which is about the only thing the docs let you understand.
I wish to listen to an incoming Kafka message stream, group the messages by key, and then process them. The code below is a simplified version of the process: get the stream of messages from Kafka, reduce by key to group the messages by message key, then process them.
JavaPairDStream<String, byte[]> groupByKeyList = kafkaStream.reduceByKey((bytes, bytes2) -> bytes);

groupByKeyList.foreachRDD(rdd -> {
    List<MyThing> myThingsList = new ArrayList<>();
    MyCalculationCode myCalc = new MyCalculationCode();

    rdd.foreachPartition(partition -> {
        while (partition.hasNext()) {
            Tuple2<String, byte[]> keyAndMessage = partition.next();
            MyThing aSingleMyThing = MyThing.parseFrom(keyAndMessage._2); // parse from protobuf format
            myThingsList.add(aSingleMyThing);
        }
    });

    List<MyResult> results = myCalc.doTheStuff(myThingsList);
    // other code here to write results to file
});
When debugging, I see that inside the while (partition.hasNext()) loop the myThingsList has a different memory address than the List<MyThing> myThingsList declared in the outer foreachRDD.
When List<MyResult> results = myCalc.doTheStuff(myThingsList); is called, there are no results, because myThingsList is a different instance of the List.
I'd like a solution to this problem, but would prefer a reference to documentation to help me understand why this is not working (as anticipated) and how I can solve it for myself (I don't mean a link to the single page of Spark documentation, but rather a section/paragraph or, preferably, a link to the JavaDoc, not Scala examples with non-functional commented code).
The reason you're seeing different list addresses is that Spark doesn't execute foreachPartition locally on the driver; it has to serialize the function and send it over to the executor handling the processing of that partition. You have to remember that although working with the code feels like everything runs in a single location, the calculation is actually distributed.
The first problem I see with your code has to do with your reduceByKey, which takes two byte arrays and returns the first. Is that really what you want to do? That means you're effectively dropping parts of the data; perhaps you're looking for combineByKey, which will allow you to return a JavaPairDStream<String, List<byte[]>>.
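A rough sketch of what that combineByKey might look like, assuming kafkaStream is the JavaPairDStream<String, byte[]> from your code; the HashPartitioner (org.apache.spark.HashPartitioner) and its partition count are arbitrary placeholders:

JavaPairDStream<String, List<byte[]>> groupedByKey = kafkaStream.combineByKey(
        // createCombiner: start a new list for the first value seen for a key
        value -> {
            List<byte[]> list = new ArrayList<>();
            list.add(value);
            return list;
        },
        // mergeValue: append further values for the same key within a partition
        (list, value) -> {
            list.add(value);
            return list;
        },
        // mergeCombiners: merge lists built on different partitions
        (left, right) -> {
            left.addAll(right);
            return left;
        },
        new HashPartitioner(4)); // partition count chosen arbitrarily for the example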
Regarding the parsing of your protobuf, it looks to me like you don't want foreachRDD; you need an additional map to parse the data:
kafkaStream
        .combineByKey(/* implement logic */)
        .flatMap(x -> x._2)
        .map(proto -> MyThing.parseFrom(proto))
        .map(myThing -> myCalc.doStuff(myThing))
        .foreachRDD(/* After all the processing, do stuff with result */);