I am not getting any data from the queue using a Kafka direct stream. I put a System.out.println() in my code, but that statement never runs, which means I am not getting any data from the topic.
I am pretty sure data is available in the queue, yet nothing shows up in the console.
I don't see any errors in the console either.
Can anyone please suggest something?
Here is my Java code:
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount11").setMaster("local[*]");
sparkConf.set("spark.streaming.concurrentJobs", "3");
// Create the context with a 3-second batch interval
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(3000));
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "x.xx.xxx.xxx:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", true);
Collection<String> topics = Arrays.asList("topicName");
final JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(jssc,
LocationStrategies.PreferConsistent(),
ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
JavaPairDStream<String, String> lines = stream
.mapToPair(new PairFunction<ConsumerRecord<String, String>, String, String>() {
@Override
public Tuple2<String, String> call(ConsumerRecord<String, String> record) {
return new Tuple2<>(record.key(), record.value());
}
});
lines.print();
// System.out.println(lines.count());
lines.foreachRDD(rdd -> {
rdd.values().foreachPartition(p -> {
while (p.hasNext()) {
System.out.println("Value of Kafka queue" + p.next());
}
});
});
I was able to print the strings fetched from the Kafka queue using the direct Kafka stream.
Here is my code:
import java.util.HashMap;
import java.util.HashSet;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Calendar;
import java.util.Collection;
import java.util.Currency;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;
import scala.Tuple2;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;
import org.json.JSONObject;
import org.omg.CORBA.Current;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.Durations;
public final class KafkaConsumerDirectStream {
public static void main(String[] args) throws Exception {
try {
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount11").setMaster("local[*]");
sparkConf.set("spark.streaming.concurrentJobs", "30");
// Create the context with a 200 ms batch interval
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(200));
Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "x.xx.xxx.xxx:9091");
Set<String> topics = new HashSet<>();
topics.add("PartWithTopic02Queue");
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class,
String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topics);
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
lines.foreachRDD(rdd -> {
if (rdd.count() > 0) {
List<String> strArray = rdd.collect();
// Print string here
}
});
jssc.start();
jssc.awaitTermination();
} catch (Exception e) {
e.printStackTrace();
}
}
}
@Vimal Here is a link to a working version of creating direct streams in Scala.
I believe that after reviewing the Scala version you should be able to convert it easily.
Please make sure you are not reading only the latest offsets in Kafka; otherwise the stream might not pick up messages that were produced before it started.
I'm a former ActiveMQ user learning Kafka, and I have a question.
With ActiveMQ you can do this:
Submit 100 messages into a queue
Wait however long you want
Consume those 100 messages from that queue. Guaranteed single consumer of the message.
I tried to do the same thing in Kafka:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;
public class KafkaTest {
private static final Logger LOG = LoggerFactory.getLogger(KafkaTest.class);
public static final String MY_GROUP_ID = "my-group-id";
public static final String TOPIC = "topic";
KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"));
@Before
public void before() {
kafka.start();
}
@After
public void after() {
kafka.close();
}
@Test
public void testPipes() throws ExecutionException, InterruptedException {
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
consumerProps.put("group.id", MY_GROUP_ID);
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
ExecutorService es = Executors.newCachedThreadPool();
Future consumerFuture = es.submit(() -> {
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
consumer.subscribe(Collections.singletonList(TOPIC));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record : records) {
LOG.info("Thread: {}, Topic: {}, Partition: {}, Offset: {}, key: {}, value: {}", Thread.currentThread().getName(), record.topic(), record.partition(), record.offset(), record.key(), record.value().toUpperCase());
}
}
} catch (Exception e) {
LOG.error("Consumer error", e);
}
});
Thread.sleep(10000); // NOTICE! if you remove this, the consumer will not receive the messages. because the consumer won't be registered yet before the messages come rolling on in.
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
Future producerFuture = es.submit(() -> {
try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
int counter = 0;
while (counter <= 100) {
System.out.println("Sent " + counter);
String msg = "Message " + counter;
producer.send(new ProducerRecord<>(TOPIC, msg));
counter++;
}
} catch (Exception e) {
LOG.error("Failed to send message by the producer", e);
}
});
producerFuture.get();
consumerFuture.get();
}
}
This example does not work unless you start the consumer, wait for it to start, and then run the producer.
Can anyone show me how to alter my example program so that the messages wait to be consumed?
In your consumer config, you need to add auto.offset.reset=earliest or call seekToBeginning after subscribing.
Otherwise, it starts to read from the end of the topic. In other words, if you start the consumer after the producer, it'll begin to read after all the existing data.
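As a minimal sketch of the first option, the consumer properties from the question could be built as shown below. The class name, method name, broker address, and group id are placeholders, and plain string keys are used so the snippet needs only the JDK. Note that auto.offset.reset only applies when the consumer group has no committed offset yet.

```java
import java.util.Properties;

public class EarliestOffsetConfig {
    // Builds consumer properties that start from the beginning of the topic
    // when the group has no committed offset yet.
    // The class and method names are hypothetical, for illustration only.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Without this setting the consumer defaults to "latest" and skips existing records.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group id.
        Properties props = consumerProps("localhost:9092", "my-group-id");
        System.out.println(props.getProperty("auto.offset.reset")); // prints "earliest"
    }
}
```

The resulting Properties object can be passed to the KafkaConsumer constructor in place of consumerProps in the test above.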
This is producer config.
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import java.util.HashMap;
import java.util.Map;
@Configuration
public class KafkaProducerConfig {
@Value("${spring.kafka.bootstrap-servers}")
private String bootStrapServers;
public Map<String, Object> producerConfig() {
HashMap<String, Object> props = new HashMap<>();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServers);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "20971520");
return props;
}
@Bean
public ProducerFactory<String, String> producerFactory() {
return new DefaultKafkaProducerFactory<String, String>(producerConfig());
}
@Bean
public KafkaTemplate<String, String> kafkaTemplate(ProducerFactory<String, String> producerFactory) {
return new KafkaTemplate<String, String>(producerFactory);
}
}
This is consumer config
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import java.util.HashMap;
import java.util.Map;
@Configuration
public class KafkaConsumerConfig {
@Value("${spring.kafka.bootstrap-servers}")
private String bootStrapServers;
public Map<String, Object> consumerConfig() {
HashMap<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServers);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
return props;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<String, String>(consumerConfig());
}
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> factory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
return factory;
}
}
This is kafka template.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
@Service
public class KafkaSender {
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
String kafkaTopic = "testTopic";
public void send() {
byte[] array = null;
try {
array = Files.readAllBytes(Paths.get("Test.webm"));
String kafkaTopic = "testTopic";
String encoded = java.util.Base64.getEncoder().encodeToString(array);
kafkaTemplate.send(kafkaTopic, encoded);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
This is listener.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
import java.io.FileOutputStream;
import java.io.IOException;
import java.lang.reflect.Array;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
@Component
public class Listener {
@KafkaListener(topics = "testTopic", groupId = "foo")
public void listenGroupFoo(String message){
byte[] decoded = java.util.Base64.getDecoder().decode(message);
try {
FileOutputStream out;
out = new FileOutputStream("video1.mp4");
out.write(decoded);
out.close();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
Currently I am sending the whole byte array, but because of Kafka's size limitation I cannot send larger files, e.g. 1 GB.
Please let me know how to implement this so that I can send the video byte by byte from the producer, collect the bytes at the consumer, and combine them into a single array.
send byte by byte of a video from producer
Literally? Don't use StringSerializer. You'd loop over the array and use ByteArraySerializer
byte[] array = Files.readAllBytes(Paths.get("Test.webm"));
String kafkaTopic = "testTopic";
for (byte b : array) {
kafkaTemplate.send(kafkaTopic, new byte[] {b});
}
But
You can only ever produce one file into the same topic at a time - multiple producers will have mixed file bytes
You must modify Kafka producer properties to use transactions, no retries, and only one in flight request max. Otherwise, bytes get dropped, duplicated, or reordered.
Your topic can only have one partition. Otherwise, bytes get reordered
Now, you could chunk the file into larger byte slices, but then re-ordering matters even more.
As far as the consumer goes, there's no straightforward way to know which byte is the end of the file/stream, so you'd need some condition in the listener/poll loop to detect it.
Ultimately, Kafka is not designed for file transfers or A/V streaming, and the largest reasonable record size would only be a few MB.
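If you do chunk the file into larger slices, one way to deal with reordering and end-of-file detection is to attach an index and a total count to every chunk. The following is only a sketch under that assumption: the Chunk class and its fields are hypothetical and not part of any Kafka API, and the code covers splitting and reassembly only, not the producer or listener wiring.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class FileChunker {
    // One slice of the file plus the metadata needed to reassemble it.
    // Hypothetical helper type; "total" lets the consumer detect the last chunk.
    static final class Chunk {
        final int index;
        final int total;
        final byte[] data;

        Chunk(int index, int total, byte[] data) {
            this.index = index;
            this.total = total;
            this.data = data;
        }
    }

    // Splits the file into slices of at most chunkSize bytes.
    static List<Chunk> split(byte[] file, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int total = (file.length + chunkSize - 1) / chunkSize;
        List<Chunk> chunks = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, file.length);
            chunks.add(new Chunk(i, total, Arrays.copyOfRange(file, from, to)));
        }
        return chunks;
    }

    // Reassembles chunks that may have arrived in any order.
    // Assumes every chunk is present exactly once.
    static byte[] join(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingInt((Chunk c) -> c.index));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : sorted) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[10];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        List<Chunk> chunks = split(file, 4);
        System.out.println(chunks.size()); // prints 3
        Collections.reverse(chunks); // simulate out-of-order arrival
        System.out.println(Arrays.equals(file, join(chunks))); // prints true
    }
}
```

Each chunk would still need to be serialized together with its index and total (for example in record headers or a small prefix), and the caveats above about duplicates and mixed files from multiple producers still apply.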
I am attempting to build a Kafka Streams app that reads records from an input topic. Each record is a simple JSON payload (including an id and a timestamp), the key is a simple 3-digit string, and no schema is required. To the output topic I want to produce only the records that have been abandoned for 30 minutes or more (session window). Based on this link, I have begun to develop a Kafka Streams app:
package io.confluent.developer;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SessionWindows;
import java.io.FileInputStream;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class SessionWindow {
private final DateTimeFormatter timeFormatter = DateTimeFormatter.ofLocalizedTime(FormatStyle.LONG)
.withLocale(Locale.US)
.withZone(ZoneId.systemDefault());
public Topology buildTopology(Properties allProps) {
final StreamsBuilder builder = new StreamsBuilder();
final String inputTopic = allProps.getProperty("input.topic.name");
final String outputTopic = allProps.getProperty("output.topic.name");
builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()))
.groupByKey()
.windowedBy(SessionWindows.ofInactivityGapAndGrace(Duration.ofMinutes(5), Duration.ofSeconds(10)))
.count()
.toStream()
.map((windowedKey, count) -> {
String start = timeFormatter.format(windowedKey.window().startTime());
String end = timeFormatter.format(windowedKey.window().endTime());
String sessionInfo = String.format("Session info started: %s ended: %s with count %s", start, end, count);
return KeyValue.pair(windowedKey.key(), sessionInfo);
})
.to(outputTopic, Produced.with(Serdes.String(), Serdes.String()));
return builder.build();
}
public Properties loadEnvProperties(String fileName) throws IOException {
Properties allProps = new Properties();
FileInputStream input = new FileInputStream(fileName);
allProps.load(input);
input.close();
return allProps;
}
public static void main(String[] args) throws Exception {
if (args.length < 1) {
throw new IllegalArgumentException("This program takes one argument: the path to an environment configuration file.");
}
SessionWindow tw = new SessionWindow();
Properties allProps = tw.loadEnvProperties(args[0]);
allProps.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
allProps.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, ClickEventTimestampExtractor.class);
Topology topology = tw.buildTopology(allProps);
ClicksDataGenerator dataGenerator = new ClicksDataGenerator(allProps);
dataGenerator.generate();
final KafkaStreams streams = new KafkaStreams(topology, allProps);
final CountDownLatch latch = new CountDownLatch(1);
// Attach shutdown handler to catch Control-C.
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close(Duration.ofSeconds(5));
latch.countDown();
}
});
try {
streams.cleanUp();
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
static class ClicksDataGenerator {
final Properties properties;
public ClicksDataGenerator(final Properties properties) {
this.properties = properties;
}
public void generate() {
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
}
}
}
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
public class ClickEventTimestampExtractor implements TimestampExtractor {
@Override
public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
System.out.println(record.value());
return record.getTimestamp();
}
}
I am having issues with the following:
Getting the code to compile. I keep getting this error (I am new to Java, so please bear with me). What is the correct way to call getTimestamp?
error: cannot find symbol
return record.getTimestamp();
^
symbol: method getTimestamp()
location: variable record of type ConsumerRecord<Object,Object>
1 error
I am not sure whether the timestamp extractor will work for this particular scenario. I read here that 'The Timestamp extractor can only give you one timestamp'. Does that mean this won't work if there are multiple messages with different keys? Some clarification or examples would help.
Thanks!
Please help me diagnose the error message "Failed to connect to service endpoint:". That is the complete error message. Kind of looks like it can't find the endpoint, but as you can see below, I do supply the endpoint with the ".withEndpointConfiguration" method.
Here is my code:
package xyz.bombchu;
import java.util.HashMap;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
public class LambdaFunctionHandler implements RequestHandler<Object, String> {
DynamoDB ddb;
@Override
public String handleRequest(Object input, Context context) {
Regions REGION = Regions.AP_SOUTHEAST_2;
HashMap<String, AttributeValue> item_values =
new HashMap<String, AttributeValue>();
String relativeTime = "02000001";
item_values.put("dateTime", new AttributeValue().withN(relativeTime));
item_values.put("cID", new AttributeValue("TEST"));
AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard()
.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("dynamodb.ap-southeast-2.amazonaws.com", "ap-southeast-2"))
.withCredentials(new InstanceProfileCredentialsProvider())
.withClientConfiguration(new ClientConfiguration())
.build();
try {
ddb.putItem("myTableTest", item_values);
} catch (Exception e) {
System.err.println(e.getMessage());
System.exit(1);
}
}
}
I am trying to read a file of 100k records and send it to a Kafka topic. Here is my Kafka code, which sends the data to the Kafka console consumer. When I send the data, what I receive looks like this:
java.util.stream.ReferencePipeline$Head@e9e54c2
Here is a sample of a single record that I am sending:
173|172686|548247079|837113012|0x548247079f|7|173|172686a|0|173|2059 22143|0|173|1|173|172686|||0|||7|0||7|||7|172686|allowAllServices|?20161231:22143|548247079||0|173||172686|5:2266490827:DCCInter;20160905152146;2784
Any suggestions on how to receive the data shown above? Thanks.
Code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.stream.Stream;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
@SuppressWarnings("unused")
public class HundredKRecords {
private static String sCurrentLine;
public static void main(String args[]) throws InterruptedException, ExecutionException{
String fileName = "/Users/sreeeedupuganti/Downloads/octfwriter.txt";
//read file into stream, try-with-resources
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
kafka(stream.toString());
} catch (IOException e) {
e.printStackTrace();
}
}
public static void kafka(String stream) {
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class","kafka.producer.DefaultPartitioner");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
producer.send(new KeyedMessage<String, String>("test",stream));
producer.close();
}
}
The problem is in the line kafka(stream.toString());
The Java Stream class doesn't override the toString method. By default it returns getClass().getName() + '@' + Integer.toHexString(hashCode()). That's exactly what you receive.
To receive the whole file in Kafka, you have to manually convert it to one String (or an array of bytes).
Please note that Kafka has a limit on message size.
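A minimal sketch of the manual conversion, using only the JDK: the lines are collected into a single String instead of calling toString() on the stream. The class name, helper name, and temporary sample file are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileToString {
    // Reads every line of the file and joins them into one String.
    // Hypothetical helper, not taken from the question's code.
    static String readWholeFile(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary sample file standing in for the real input file.
        Path tmp = Files.createTempFile("records", ".txt");
        Files.write(tmp, Arrays.asList("173|172686|a", "173|172687|b"), StandardCharsets.UTF_8);
        String payload = readWholeFile(tmp);
        System.out.println(payload); // prints the two records, one per line
        Files.delete(tmp);
    }
}
```

The resulting String is what would be passed to the kafka(String) method from the question. Given the message size limit, sending each line as its own message may be the safer choice for a file of this size.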