I'm a former ActiveMQ user learning Kafka, and I have a question.
With ActiveMQ you can do this:
Submit 100 messages into a queue
Wait however long you want
Consume those 100 messages from that queue, with a guaranteed single consumer per message.
I tried to do the same thing in Kafka:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;
public class KafkaTest {
private static final Logger LOG = LoggerFactory.getLogger(KafkaTest.class);
public static final String MY_GROUP_ID = "my-group-id";
public static final String TOPIC = "topic";
KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"));
@Before
public void before() {
kafka.start();
}
@After
public void after() {
kafka.close();
}
@Test
public void testPipes() throws ExecutionException, InterruptedException {
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
consumerProps.put("group.id", MY_GROUP_ID);
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
ExecutorService es = Executors.newCachedThreadPool();
Future<?> consumerFuture = es.submit(() -> {
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
consumer.subscribe(Collections.singletonList(TOPIC));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record : records) {
LOG.info("Thread: {}, Topic: {}, Partition: {}, Offset: {}, key: {}, value: {}", Thread.currentThread().getName(), record.topic(), record.partition(), record.offset(), record.key(), record.value().toUpperCase());
}
}
} catch (Exception e) {
LOG.error("Consumer error", e);
}
});
Thread.sleep(10000); // NOTICE! If you remove this, the consumer will not receive the messages, because it won't be subscribed yet by the time the messages come rolling in.
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
Future<?> producerFuture = es.submit(() -> {
try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
int counter = 0;
while (counter < 100) {
System.out.println("Sent " + counter);
String msg = "Message " + counter;
producer.send(new ProducerRecord<>(TOPIC, msg));
counter++;
}
} catch (Exception e) {
LOG.error("Failed to send message by the producer", e);
}
});
producerFuture.get();
consumerFuture.get();
}
}
This example does not work unless you start the consumer, wait for it to start, and only then run the producer.
Can anyone show me how to alter my example program so that the messages wait in the topic until they are consumed?
In your consumer config, you need to add auto.offset.reset=earliest, or call seekToBeginning after subscribing.
Otherwise, the consumer starts reading from the end of the topic. In other words, if you start the consumer after the producer, it will skip all the existing data and only read what arrives afterwards.
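For example, here is a minimal sketch against the test above (Option 1 only changes the consumer configuration; Option 2 additionally needs imports for java.util.Collection, org.apache.kafka.clients.consumer.ConsumerRebalanceListener and org.apache.kafka.common.TopicPartition):
// Option 1: start from the earliest available offset whenever the group has no committed offset yet.
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// Option 2: explicitly rewind to the beginning once partitions are assigned.
consumer.subscribe(Collections.singletonList(TOPIC), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        consumer.seekToBeginning(partitions);
    }
});
With either change, the Thread.sleep(10000) before starting the producer should no longer be necessary.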
I am attempting to build a Kafka Streams app that takes in records from an input topic containing a simple JSON payload (id and timestamp included; the key is a simple 3-digit string, and no schema is required). For the output topic I wish to produce only the records that have been abandoned for 30 minutes or more (session window). Based on this link, I have begun to develop a Kafka Streams app:
package io.confluent.developer;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SessionWindows;
import java.io.FileInputStream;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class SessionWindow {
private final DateTimeFormatter timeFormatter = DateTimeFormatter.ofLocalizedTime(FormatStyle.LONG)
.withLocale(Locale.US)
.withZone(ZoneId.systemDefault());
public Topology buildTopology(Properties allProps) {
final StreamsBuilder builder = new StreamsBuilder();
final String inputTopic = allProps.getProperty("input.topic.name");
final String outputTopic = allProps.getProperty("output.topic.name");
builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()))
.groupByKey()
.windowedBy(SessionWindows.ofInactivityGapAndGrace(Duration.ofMinutes(5), Duration.ofSeconds(10)))
.count()
.toStream()
.map((windowedKey, count) -> {
String start = timeFormatter.format(windowedKey.window().startTime());
String end = timeFormatter.format(windowedKey.window().endTime());
String sessionInfo = String.format("Session info started: %s ended: %s with count %s", start, end, count);
return KeyValue.pair(windowedKey.key(), sessionInfo);
})
.to(outputTopic, Produced.with(Serdes.String(), Serdes.String()));
return builder.build();
}
public Properties loadEnvProperties(String fileName) throws IOException {
Properties allProps = new Properties();
FileInputStream input = new FileInputStream(fileName);
allProps.load(input);
input.close();
return allProps;
}
public static void main(String[] args) throws Exception {
if (args.length < 1) {
throw new IllegalArgumentException("This program takes one argument: the path to an environment configuration file.");
}
SessionWindow tw = new SessionWindow();
Properties allProps = tw.loadEnvProperties(args[0]);
allProps.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
allProps.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, ClickEventTimestampExtractor.class);
Topology topology = tw.buildTopology(allProps);
ClicksDataGenerator dataGenerator = new ClicksDataGenerator(allProps);
dataGenerator.generate();
final KafkaStreams streams = new KafkaStreams(topology, allProps);
final CountDownLatch latch = new CountDownLatch(1);
// Attach shutdown handler to catch Control-C.
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close(Duration.ofSeconds(5));
latch.countDown();
}
});
try {
streams.cleanUp();
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
static class ClicksDataGenerator {
final Properties properties;
public ClicksDataGenerator(final Properties properties) {
this.properties = properties;
}
public void generate() {
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
}
}
}
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
public class ClickEventTimestampExtractor implements TimestampExtractor {
@Override
public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
System.out.println(record.value());
return record.getTimestamp();
}
}
I am having issues with the following:
Getting the code to compile - I keep getting this error (I am new to Java, so please bear with me). What is the correct way to call getTimestamp?
error: cannot find symbol
return record.getTimestamp();
^
symbol: method getTimestamp()
location: variable record of type ConsumerRecord<Object,Object>
1 error
I am also not sure if the timestamp extractor will work for this particular scenario. I read here that 'The timestamp extractor can only give you one timestamp'. Does that mean that if there are multiple messages with different keys this won't work? Some clarification or examples would help.
Thanks!
I'm trying to figure out how to include custom headers in the Spring Message<?> used in Spring Cloud Stream with the Kafka binder. My goal is to include some custom header data that is added in one producer (function) class, passed to Kafka, and then consumed by another class in a different service (with the custom header data).
I feel like I am missing something, because I can get it to work using the TestChannelBinder, e.g.
import lombok.extern.slf4j.Slf4j;
import org.springframework.messaging.Message;
import org.springframework.stereotype.Component;
import java.util.function.Function;
@Component
@Slf4j
public class BaseStream implements Function<Message<String>, String> {
@Override
public String apply(Message<String> transactionMessage) {
log.debug("Converted Message: {} ", transactionMessage);
return transactionMessage.getPayload();
}
}
Test class with Test Binder:
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.cloud.stream.binder.test.InputDestination;
import org.springframework.cloud.stream.binder.test.OutputDestination;
import org.springframework.cloud.stream.binder.test.TestChannelBinderConfiguration;
import org.springframework.context.annotation.Import;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.kafka.test.context.EmbeddedKafka;
import org.springframework.test.context.TestPropertySource;
@SpringBootTest
@TestPropertySource("classpath:testStream.properties")
@Import(TestChannelBinderConfiguration.class)
public class TestForStream {
@Autowired
InputDestination inputDestination;
@Autowired
OutputDestination outputDestination;
@Test
void contextLoads() {
inputDestination.send(MessageBuilder
.withPayload("Test Payload")
.setHeader("customHeader", "headerSpecificData")
.build());
}
}
testStream.properties
spring.cloud.function.definition=baseStream
spring.cloud.stream.bindings.baseStream-in-0.destination=test-in
spring.cloud.stream.bindings.baseStream-out-0.destination=test-out
spring.cloud.stream.bindings.baseStream-in-0.group=test-group-base
Log when running:
Converted Message: GenericMessage [payload=Test Payload, headers={id=5c6d1082-c084-0b25-4afc-b5d97bf537f9, customHeader=headerSpecificData, contentType=application/json, timestamp=1639398696800, target-protocol=kafka}]
Which is what I am looking to do. But when I try to test it with the Kafka binder, it seems to include the Message<String> object in the payload as a JSON string, which I thought would be parsed into the requested input of the function BaseStream.
I'm just wondering if someone could see where I'm going wrong with my testing, as I have tried various things to get this to work, and seeing as it works with the test binder I would assume it works for the Kafka binder.
Test Class for Kafka Binder Test:
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;
import org.springframework.kafka.test.EmbeddedKafkaBroker;
import org.springframework.kafka.test.context.EmbeddedKafka;
import org.springframework.kafka.test.utils.KafkaTestUtils;
import org.springframework.test.context.TestPropertySource;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
@EmbeddedKafka(partitions = 1, brokerProperties = { "listeners=PLAINTEXT://localhost:9092", "port=9092"})
@SpringBootTest
@TestPropertySource("classpath:testStream.properties")
public class TestForStream {
public static CountDownLatch latch = new CountDownLatch(1);
@Autowired
public EmbeddedKafkaBroker broker;
@Test
void contextLoads() {
sleep(5); // Included this because it takes some time to initialize
sendMessage("test-in", MessageBuilder
.withPayload("Test Payload")
.setHeader("customHeader", "headerSpecificData")
.build());
}
public <T> ProducerFactory<String, T> createProducerFactory() {
Map<String, Object> configs = new HashMap<>(KafkaTestUtils.producerProps(broker));
configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
//Is JsonSerializer correct for a message?
return new DefaultKafkaProducerFactory<>(configs);
}
public <T> void sendMessage(String topic, T listObj) {
try {
KafkaTemplate<String, T> kafkaTemplate = new KafkaTemplate<>(createProducerFactory());
kafkaTemplate.send(new ProducerRecord<>(topic, listObj));
}catch (Exception e){
e.printStackTrace();
}
}
public void sleep(long time){
try {
latch.await(time, TimeUnit.SECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
Log of kafka binder test for message:
Converted Message: GenericMessage [payload={"payload":"Test Payload","headers":{"customHeader":"headerSpecificData","id":"d540a3ca-28db-b137-fc86-c25cc4b7eb8b","timestamp":1639399810476}}, headers={deliveryAttempt=1, kafka_timestampType=CREATE_TIME, kafka_receivedTopic=test-in, target-protocol=kafka, kafka_offset=0, scst_nativeHeadersPresent=true, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer#79580279, id=1cf2d382-df29-2672-4180-07da99e58244, kafka_receivedPartitionId=0, kafka_receivedTimestamp=1639399810526, contentType=application/json, __TypeId__=[B#24c79350, kafka_groupId=test-group-base, timestamp=1639399810651}]
So here the whole Message has been included in the payload, and the Kafka headers are included in the message headers, as expected.
I have tried spring.cloud.stream.kafka.binder.headers and headerMode to see if they would change anything but to no avail.
Edit:
Using springCloudVersion = 2020.0.3
I was using:
public <T> void sendMessage(String topic, T listObj) {
try {
KafkaTemplate<String, T> kafkaTemplate = new KafkaTemplate<>(createProducerFactory());
kafkaTemplate.send(new ProducerRecord<>(topic, listObj));
}catch (Exception e){
e.printStackTrace();
}
}
to send the message, which was putting the whole Message object in as the record value.
What I should have been using:
public void sendMessage(String topic, Message<?> listObj) {
try {
KafkaTemplate<String, Message<?>> kafkaTemplate = new KafkaTemplate<>(createProducerFactory());
kafkaTemplate.setDefaultTopic(topic);
kafkaTemplate.send(listObj);
}catch (Exception e){
e.printStackTrace();
}
}
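The difference, as far as I can tell, is that kafkaTemplate.send(new ProducerRecord<>(topic, listObj)) treats whatever object you pass as the record value verbatim, so the entire Message<?> was being JSON-serialized into the payload, whereas kafkaTemplate.send(Message<?>) runs the message through the template's message converter, which maps the payload to the record value and the Spring message headers to native Kafka record headers, which is what the binder then exposes on the consuming side.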
I am trying out Kafka Streams. I am reading messages from one topic, doing groupByKey, and then counting the groups. But the problem is that the message counts come out as unreadable "boxes".
If I run the console consumer, these come out as empty strings.
This is the WordCount code I wrote:
package streams;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Arrays;
import java.util.Properties;
public class WordCount {
public static void main(String[] args) {
Properties properties = new Properties();
properties.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "streams-demo-2");
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class.getName());
properties.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class.getName());
// topology
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("temp-in");
KStream<String, Long> fil = input.flatMapValues(val -> Arrays.asList(val.split(" "))) // making stream of text line to stream of words
.selectKey((k, v) -> v) // changing the key
.groupByKey().count().toStream(); // getting count after groupBy
fil.to("temp-out");
KafkaStreams streams = new KafkaStreams(builder.build(), properties);
streams.start();
System.out.println(streams.toString());
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
}
This is the output I am getting in the consumer (it is on the right side of the image).
I tried casting the Long to long again to see if it works, but it's not working.
I am attaching the consumer code too, in case it helps.
package tutorial;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
public class Consumer {
public static void main(String[] args) {
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// Once the consumer starts running it keeps running even after we stop in console
// We should create new consumer to read from earliest because the previous one had already consumed until certain offset
// when we run the same consumer in two consoles kafka detects it and re balances
// In this case the consoles split the partitions they consume forming a consumer group
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "consumer-application-1"); // -> consumer id
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // -> From when consumer gets data
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singleton("temp-out"));
while (true) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record: consumerRecords) {
System.out.println(record.key() + " " + record.value());
System.out.println(record.partition() + " " + record.offset());
}
}
}
}
Any help is appreciated. Thanks in advance.
The message value you're writing with Kafka Streams is a Long, and you're consuming it as a String.
If you make the following changes to your Consumer class, you'll be able to see the count printed correctly to stdout:
// Change this from StringDeserializer to LongDeserializer (org.apache.kafka.common.serialization.LongDeserializer).
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
...
// The value you're consuming here is a Long, not a String.
KafkaConsumer<String, Long> consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singleton("temp-out"));
while (true) {
ConsumerRecords<String, Long> consumerRecords = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, Long> record : consumerRecords) {
System.out.println(record.key() + " " + record.value());
System.out.println(record.partition() + " " + record.offset());
}
}
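Optionally (a small sketch, not required for the fix above), you can also make the value serde explicit on the Streams side, so it is obvious from the topology that temp-out carries Long values; this assumes an extra import of org.apache.kafka.streams.kstream.Produced in the WordCount class:
fil.to("temp-out", Produced.with(Serdes.String(), Serdes.Long()));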
We recently started to use Kafka, and I am writing a Kafka consumer application using the native Kafka Java consumer API.
However, most of the examples I have seen use a while loop and then call the poll method on a consumer object inside the loop, like below:
while (true) {
final ConsumerRecords<Long, String> consumerRecords =
consumer.poll(1000);
if (consumerRecords.count()==0) {
noRecordsCount++;
if (noRecordsCount > giveUp) break;
else continue;
}
consumerRecords.forEach(record -> {
System.out.printf("Consumer Record:(%d, %s, %d, %d)\n",
record.key(), record.value(),
record.partition(), record.offset());
});
consumer.commitAsync();
}
I am just looking for a better way of doing this without a loop, using the native Java consumer API. I know that with Spring Kafka you don't need to write those loops. What about the native API? Any good approach or best practice?
I tried using a scheduler and the code is working.
package com.kafka;
import java.text.ParseException;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Properties;
import java.util.TimerTask;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
public class ScheduledTask extends TimerTask {
Date now; // to display current time
public void run( ) {
now = new Date();
System.out.println("Time is :" + now);
String AlarmString=null;
Properties props = new Properties();
props.put("bootstrap.servers", "10.*.*.*:9092");
props.put("group.id", "grp-1");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("RAW_MH_RAN_SAM"));
ConsumerRecords<String, String> records = consumer.poll(1000);
//ConsumerRecords<String, String> records =consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records)
{
System.out.println("Consumer:=============== partition Id= " + record.partition() + " offset = " + record.offset() + " value = " + record.value() + "=================");
if (AlarmString==null && !(record.value().toString().contains("PR ALARM:")) ){
AlarmString=record.value();
}
else{
if( !(record.value().toString().contains("PR ALARM:")) )
{
//System.out.println("record.value() :::"+ record.value() );
AlarmString=AlarmString+","+record.value();
}
}
}
if (consumer != null) {
System.out.println("Closing Connection");
consumer.close();
}
}
}
//Simple consumer
package com.kafka;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.google.gson.Gson;
import java.text.ParseException;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.Timer;
public class SimpleConsumer{
public static void main(String[] args)
{
Timer time = new Timer();
ScheduledTask st = new ScheduledTask(); // Instantiate the ScheduledTask class
time.schedule(st, 0, 60000); // Schedule the task to run repeatedly, every 60 seconds
}
}
Polling continuously is "just" how the Kafka consumer works; it is how the underlying Kafka protocol works.
Each action is always initiated by the clients (pull model), not by the brokers (push model), which in the case of consuming messages translates into polling.
Using the Java Kafka consumer API means having a loop, using a scheduler, or whatever technology you have in Java for executing code continuously; you have to deal with it.
Other frameworks like Spring or SmallRye Reactive Messaging just do that for you. They hide the poll loop from your application, but in the end there is always a loop ... it's how Kafka works.
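If you just want the loop out of sight in plain Java, one common approach (a minimal sketch, assuming a single long-lived consumer per topic rather than one created on every scheduler run; the class and method names here are made up for illustration) is to keep the poll loop inside a small runner that owns its own thread and supports a clean shutdown via wakeup():
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
public class ConsumerRunner {
    private final KafkaConsumer<String, String> consumer;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    public ConsumerRunner(Properties props) {
        this.consumer = new KafkaConsumer<>(props);
    }
    // Starts the poll loop on a background thread and hands every batch of records to the callback.
    public void start(String topic, java.util.function.Consumer<ConsumerRecords<String, String>> handler) {
        executor.submit(() -> {
            try {
                consumer.subscribe(Collections.singletonList(topic));
                while (true) {
                    handler.accept(consumer.poll(Duration.ofMillis(500)));
                }
            } catch (WakeupException e) {
                // Expected: thrown when stop() calls wakeup() while poll() is blocking.
            } finally {
                consumer.close();
            }
        });
    }
    // Safe to call from any thread; wakeup() is the consumer's thread-safe way to break out of poll().
    public void stop() {
        consumer.wakeup();
        executor.shutdown();
    }
}
The rest of the application then only registers a callback, e.g. new ConsumerRunner(props).start("RAW_MH_RAN_SAM", records -> records.forEach(r -> System.out.println(r.value()))); the loop still exists, it is just encapsulated.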
I am trying to read a 100k-record file and send it to a Kafka topic. Here is my Kafka code, which sends data that I then read with kafka-console-consumer. When I send the data, I receive it like this:
java.util.stream.ReferencePipeline$Head#e9e54c2
Here is a single sample record of the data I am sending:
173|172686|548247079|837113012|0x548247079f|7|173|172686a|0|173|2059 22143|0|173|1|173|172686|||0|||7|0||7|||7|172686|allowAllServices|?20161231:22143|548247079||0|173||172686|5:2266490827:DCCInter;20160905152146;2784
Any suggestions on how to receive the data in the form I showed above? Thanks.
Code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.stream.Stream;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
@SuppressWarnings("unused")
public class HundredKRecords {
private static String sCurrentLine;
public static void main(String args[]) throws InterruptedException, ExecutionException{
String fileName = "/Users/sreeeedupuganti/Downloads/octfwriter.txt";
//read file into stream, try-with-resources
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
kafka(stream.toString());
} catch (IOException e) {
e.printStackTrace();
}
}
public static void kafka(String stream) {
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class","kafka.producer.DefaultPartitioner");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
producer.send(new KeyedMessage<String, String>("test",stream));
producer.close();
}
}
The problem is in the line kafka(stream.toString());
The Java Stream class doesn't override the toString method. By default it returns getClass().getName() + '#' + Integer.toHexString(hashCode()). That's exactly what you receive.
In order to send the whole file to Kafka as one message, you have to manually convert it to one String (or byte array) yourself.
Please note that Kafka has a limit on the maximum message size.
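A rough sketch of both options, reusing fileName and the kafka(String) helper from the question above (it keeps the old kafka.javaapi producer as-is; note that the per-line variant would create a new producer per call with the existing helper, so for real use you would create the producer once and reuse it):
// Option 1: send the whole file as a single message (watch the broker's maximum message size).
String wholeFile = new String(Files.readAllBytes(Paths.get(fileName)));
kafka(wholeFile);
// Option 2: send one message per line instead of one giant message.
try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
    lines.forEach(line -> kafka(line));
}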