Spark Streaming Kafka messages not consumed (Java)

I want to receive messages from a topic in Kafka (broker v 0.10.2.1) using Spark (1.6.2) Streaming.
I'm using the receiver-based approach. The code is as follows:
public static void main(String[] args) throws Exception
{
    SparkConf sparkConf = new SparkConf().setAppName("SimpleStreamingApp");
    JavaStreamingContext javaStreamingContext = new JavaStreamingContext(sparkConf, new Duration(5000));

    Map<String, Integer> topicMap = new HashMap<>();
    topicMap.put("myTopic", 1);

    String zkQuorum = "host1:port1,host2:port2,host3:port3";

    Map<String, String> kafkaParamsMap = new HashMap<>();
    kafkaParamsMap.put("bootstraps.server", zkQuorum);
    kafkaParamsMap.put("metadata.broker.list", zkQuorum);
    kafkaParamsMap.put("zookeeper.connect", zkQuorum);
    kafkaParamsMap.put("group.id", "group_name");
    kafkaParamsMap.put("security.protocol", "SASL_PLAINTEXT");
    kafkaParamsMap.put("security.mechanism", "GSSAPI");
    kafkaParamsMap.put("ssl.kerberos.service.name", "kafka");
    kafkaParamsMap.put("key.deserializer", "kafka.serializer.StringDecoder");
    kafkaParamsMap.put("value.deserializer", "kafka.serializer.DefaultDecoder");

    JavaPairReceiverInputDStream<byte[], byte[]> stream = KafkaUtils.createStream(javaStreamingContext,
            byte[].class, byte[].class,
            DefaultDecoder.class, DefaultDecoder.class,
            kafkaParamsMap,
            topicMap,
            StorageLevel.MEMORY_ONLY());

    VoidFunction<JavaPairRDD<byte[], byte[]>> voidFunc = new VoidFunction<JavaPairRDD<byte[], byte[]>>()
    {
        public void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception
        {
            List<Tuple2<byte[], byte[]>> all = rdd.collect();
            System.out.println("size of rdd: " + all.size());
        }
    };

    stream.forEach(voidFunc);

    javaStreamingContext.start();
    javaStreamingContext.awaitTermination();
}
Access to Kafka is Kerberized. When I launch the application with:
spark-submit --verbose \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  --files jaas.conf,privKey.der \
  --principal <accountName> \
  --keytab <path to keytab file> \
  --master yarn \
  --jars <comma separated path to all jars> \
  --class <fully qualified java main class> \
  <path to jar file containing main class>
Kafka's VerifiableProperties class logs warning messages for the properties included in the kafkaParamsMap:
INFO KafkaReceiver: connecting to zookeeper: <the correct zookeeper quorum provided in kafkaParams map>
VerifiableProperties: Property auto.offset.reset is overridden to largest
VerifiableProperties: Property enable.auto.commit is not valid.
VerifiableProperties: Property sasl.kerberos.service.name is not valid
VerifiableProperties: Property key.deserializer is not valid
...
VerifiableProperties: Property zookeeper.connect is overridden to ....
I think that because these properties are not accepted, the stream processing might be affected.
**When I launch in cluster mode (--master yarn), these warning messages don't appear.**
Later, I see the following logs repeated every 5 seconds, as configured:
INFO BlockRDD: Removing RDD 4 from persistence list
INFO KafkaInputDStream: Removing blocks of RDD BlockRDD[4] at createStream at ...
INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
INFO ... INFO BlockManager: Removing RDD 4
However, I don't see any actual message getting printed on the console.
Question: Why is my code not printing any actual messages?
My gradle dependencies are:
compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.6.2'
compile group: 'org.apache.spark', name: 'spark-streaming_2.10', version: '1.6.2'
compile group: 'org.apache.spark', name: 'spark-streaming-kafka_2.10', version: '1.6.2'

stream is a JavaPairReceiverInputDStream. Convert it into a DStream and use foreachRDD to print the messages consumed from Kafka.
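A minimal sketch of that change, keeping the rest of the asker's setup as-is; the UTF-8 decoding of the byte[] values is only an assumption about the payload:

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

// replaces stream.forEach(voidFunc)
stream.foreachRDD(new VoidFunction<JavaPairRDD<byte[], byte[]>>() {
    @Override
    public void call(JavaPairRDD<byte[], byte[]> rdd) {
        List<Tuple2<byte[], byte[]>> all = rdd.collect();
        System.out.println("batch size: " + all.size());
        for (Tuple2<byte[], byte[]> record : all) {
            // assumes the producer wrote UTF-8 text; adjust the decoding to the real payload
            System.out.println("value: " + new String(record._2(), StandardCharsets.UTF_8));
        }
    }
});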

Spark 1.6.2 does not support Kafka 0.10; it only supports Kafka 0.8. For Kafka 0.10 you should use Spark 2.
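If you do move to Spark 2.x, a rough sketch of an equivalent consumer with the spark-streaming-kafka-0-10 integration might look like the following. The broker list, topic and group id are placeholders, and the Kerberos/JAAS setup from the spark-submit command still applies; treat this as a starting point rather than a drop-in replacement:

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class DirectStreamSketch {
    public static void main(String[] args) throws Exception {
        JavaStreamingContext jssc = new JavaStreamingContext(
                new SparkConf().setAppName("SimpleStreamingApp"), new Duration(5000));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder broker list
        kafkaParams.put("group.id", "group_name");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("security.protocol", "SASL_PLAINTEXT");
        kafkaParams.put("sasl.kerberos.service.name", "kafka");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Collections.singletonList("myTopic"), kafkaParams));

        stream.foreachRDD(rdd -> {
            // extract the values before collecting; ConsumerRecord itself is not serializable
            List<String> values = rdd.map(r -> r.value()).collect();
            System.out.println("batch size: " + values.size()); // printed on the driver
            values.forEach(System.out::println);
        });

        jssc.start();
        jssc.awaitTermination();
    }
}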

Related

Kafka Connect - No converter present due to unexpected object type: java.lang.Double

I have a Kafka Streams application which does some calculations based on values from a topic, using sliding windows. I read that the best practice for persisting the data is to push the values to another topic and use Kafka Connect to get the data from that topic and save it in the database.
I downloaded the confluent/kafka-connect image and extended it by installing the MongoDB Kafka connector using this Dockerfile:
FROM confluentinc/cp-kafka-connect:7.0.3
COPY target/components/packages/mongodb-kafka-connect-mongodb-1.7.0.zip /tmp/mongodb-kafka-connect-mongodb-1.7.0.zip
RUN confluent-hub install --no-prompt /tmp/mongodb-kafka-connect-mongodb-1.7.0.zip
I sent a request with the connector config to Kafka Connect:
curl -X PUT http://localhost:8083/connectors/sink-mongodb-users/config -H "Content-Type: application/json" -d '{
  "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
  "tasks.max": "1",
  "topics": "MOVINGAVG",
  "connection.uri": "mongodb://mongo:27017",
  "database": "Temperature",
  "collection": "MovingAverage",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable": true,
  "value.converter": "org.apache.kafka.connect.converters.DoubleConverter",
  "value.converter.schemas.enable": true
}'
The Kafka Streams app that produces the records:
StreamsBuilder streamsBuilder = new StreamsBuilder();
KStream<String, Double> kafkaStreams = streamsBuilder.stream(topic, Consumed.with(Serdes.String(), Serdes.Double()));
Duration timeDifference = Duration.ofSeconds(30);

KTable table = kafkaStreams.groupByKey()
        .windowedBy(SlidingWindows.ofTimeDifferenceWithNoGrace(timeDifference))
        .aggregate(
                () -> generateTuple(logger), // initializer
                (key, value, aggregate) -> tempAggregator(key, value, aggregate, logger))
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .mapValues((ValueMapper<AggregationClass, Object>) tuple2 -> tuple2.getAverage());

table.toStream().peek((k, v) -> System.out.println("Value:" + v)).to(targetTopic, Produced.valueSerde(Serdes.Double()));

KafkaStreams streams = new KafkaStreams(streamsBuilder.build(), properties);
streams.cleanUp();
streams.start();
The messages are being put on the topic and the Mongo connector is reading them; however, an exception is thrown:
Caused by: org.apache.kafka.connect.errors.DataException: Could not convert value `2.1474836E7` into a BsonDocument.
at com.mongodb.kafka.connect.sink.converter.LazyBsonDocument.getUnwrapped(LazyBsonDocument.java:169)
at com.mongodb.kafka.connect.sink.converter.LazyBsonDocument.containsKey(LazyBsonDocument.java:83)
at com.mongodb.kafka.connect.sink.processor.DocumentIdAdder.shouldAppend(DocumentIdAdder.java:68)
at com.mongodb.kafka.connect.sink.processor.DocumentIdAdder.lambda$process$0(DocumentIdAdder.java:51)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at com.mongodb.kafka.connect.sink.processor.DocumentIdAdder.process(DocumentIdAdder.java:49)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModel$1(MongoProcessedSinkRecordData.java:90)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModel$2(MongoProcessedSinkRecordData.java:90)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.tryProcess(MongoProcessedSinkRecordData.java:105)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.buildWriteModel(MongoProcessedSinkRecordData.java:85)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.createWriteModel(MongoProcessedSinkRecordData.java:81)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.<init>(MongoProcessedSinkRecordData.java:51)
at com.mongodb.kafka.connect.sink.MongoSinkRecordProcessor.orderedGroupByTopicAndNamespace(MongoSinkRecordProcessor.java:45)
at com.mongodb.kafka.connect.sink.StartedMongoSinkTask.put(StartedMongoSinkTask.java:75)
at com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:90)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:601)
... 10 more
Caused by: org.apache.kafka.connect.errors.DataException: No converter present due to unexpected object type: java.lang.Double
at com.mongodb.kafka.connect.sink.converter.SinkConverter.getRecordConverter(SinkConverter.java:92)
at com.mongodb.kafka.connect.sink.converter.SinkConverter.lambda$convert$1(SinkConverter.java:60)
at com.mongodb.kafka.connect.sink.converter.LazyBsonDocument.getUnwrapped(LazyBsonDocument.java:166)
... 27 more
Why would such an exception be thrown?
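No answer was posted for this one, but one hypothetical workaround (an assumption, not something confirmed in the thread) is to make the record value document-shaped before it reaches the sink: the stack trace suggests the MongoDB sink can build a BsonDocument from a Struct, Map, or JSON string, but not from a bare Double. A sketch of wrapping the average in a small JSON string inside the Streams topology, paired with org.apache.kafka.connect.storage.StringConverter as the value.converter on the connector; the "average" field name is made up for illustration:

// Hypothetical sketch: wrap the Double in a tiny JSON object so the sink can parse it.
table.toStream()
        .mapValues(avg -> "{\"average\": " + avg + "}")          // Double -> JSON string
        .to(targetTopic, Produced.valueSerde(Serdes.String()));  // write it as a plain string
// ...and on the connector config:
// "value.converter": "org.apache.kafka.connect.storage.StringConverter"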

Kafka producer ERROR using "security.protocol"

I have a Kerberized cluster with 2 Kafka brokers and 3 Zookeeper nodes.
I have a local Java application (using kafka-client 2.4.0) that must produce and read messages from topics on the Kafka brokers.
As a VM argument I pass:
-Djava.security.auth.login.config=/Users/mypath/kafka_client_jaas.conf
I also have a local Schema Registry (Confluent) connected to the cluster.
To create a producer I set these options:
Properties producerProps = new Properties();
producerProps.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
producerProps.put("sasl.kerberos.service.name", "kafka");
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.KAFKA_BROKERS);
producerProps.put(ProducerConfig.CLIENT_ID_CONFIG, user);
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
producerProps.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
I get this error:
java.io.IOException: Configuration Error:
row 5: expected [option key], found [null]
at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:137)
at sun.security.provider.ConfigFile.<init>(ConfigFile.java:102)
The problem is caused by
producerProps.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
Without this line, the application compiles but I receive:
org.apache.kafka.common.errors.TimeoutException: Topic TEST not present in metadata after 60000 ms.
What should I do?
P.S. The kafka_client_jaas.conf should be right, since I also use it with the Schema Registry and it works fine.
kafka_client_jaas.conf:
KafkaClient { com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/Users/path/kafka.service.keytab" \
principal="kafka/kafka-broker01.domain.xx#DOMAIN.XX"; };

Create Kafka Producer to send each message from the list

I have Kafka and ZooKeeper running in the docker-machine.
I need to send messages to Kafka using Spring Boot.
List of messages:
[[{"id":"0x804f","timestamp":1551684977690}],
[{"id":"1234","timestamp":155168497800}],
[{"id":"39339e82-6bd6-4ab6-9672-21d0df4d34eb","timestamp":1551684977690}],
[{"id":"a3173ca5-4cc4-408b-a058-879a298d6081","timestamp":155168497800}]]
This is what I tried as a sample:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
public class Producer {

    private Properties properties = new Properties();
    String topicName = "tslistsbc";

    public Producer() {
        String bootstrapServer = "docker-machineIP:9092";
        String keySerializer = StringSerializer.class.getName();
        String valueSerializer = StringSerializer.class.getName();
        String producerId = "simpleProducer";
        int retries = 2;

        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, keySerializer);
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, valueSerializer);
        properties.put(ProducerConfig.CLIENT_ID_CONFIG, producerId);
        properties.put(ProducerConfig.RETRIES_CONFIG, retries);

        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);

        String value = "sample list";
        ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topicName, "1", value);
        kafkaProducer.send(producerRecord);
        kafkaProducer.close();
    }
}
Docker Image:
These containers are running in the docker machine
zookeeper:
  build: ../components/zookeeper
  image: xxxx:${ZOOKEEPER}
  container_name: zookeeper
  ports:
    - 2181:2181
  restart: unless-stopped
kafka:
  build: ../components/kafka
  image: xxx:${EMD_KAFKA}
  container_name: image-kafka
  environment:
    KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
    KAFKA_CREATE_TOPICS: "tslist:1:1,topic:1:1"
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_MESSAGE_MAX_BYTES: "15728640"
  ports:
    - 9092:9092
  depends_on:
    - zookeeper
  restart: unless-stopped
Error Message
SLF4J: Failed toString() invocation on an object of type [org.apache.kafka.clients.NodeApiVersions]
Reported exception:
java.lang.NullPointerException
at org.apache.kafka.clients.NodeApiVersions.apiVersionToText(NodeApiVersions.java:167)
It's not working; the message is not being sent.
Since you are trying to access one of the Docker Compose containers from outside the services started by Docker Compose (for instance by running your service in your IDE), you need to add the Docker container name to your system's hosts file.
On Linux/Mac the hosts file is at /etc/hosts, and on Windows it's at c:\windows\system32\drivers\etc\hosts. According to the error you are getting, your hosts file should have an entry like the following:
127.0.0.1 image-kafka
Regarding the exception
SLF4J: Failed toString() invocation on an object of type
[org.apache.kafka.clients.NodeApiVersions]
Reported exception:
java.lang.NullPointerException
at org.apache.kafka.clients.NodeApiVersions.apiVersionToText(NodeApiVersions.java:167)
it is due to a mismatch between the Kafka server version and the Kafka client version in use (check the answer here).
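Separately from the hosts-file fix, a small hedged sketch (reusing the asker's kafkaProducer and producerRecord variables) of sending with a callback and flushing, so that connection or serialization problems are printed instead of the send failing silently:

kafkaProducer.send(producerRecord, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // surfaces e.g. TimeoutException or broker connectivity issues
    } else {
        System.out.printf("sent to %s-%d@%d%n", metadata.topic(), metadata.partition(), metadata.offset());
    }
});
kafkaProducer.flush(); // block until buffered records are actually sent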

Kafka Streams error "TaskAssignmentException: unable to decode subscription data: version=4"

During a deployment where only the Kafka Streams version was changed from 1.1.1 to 2.x.x (without changing application.id), we got exceptions on the app nodes running the older Kafka Streams version and, as a result, Kafka Streams changed its state to error and closed, while the app node with the new Kafka Streams version consumed messages fine.
If we upgrade from 1.1.1 to 2.0.0, we get the error unable to decode subscription data: version=3; if from 1.1.1 to 2.3.0, unable to decode subscription data: version=4.
This can be really painful during a canary deployment: e.g. we have 3 app nodes with the previous Kafka Streams version, and when we add one more node with the new version, all 3 existing nodes end up in the error state. Error stack trace:
TaskAssignmentException: unable to decode subscription data: version=4
at org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.decode(SubscriptionInfo.java:128)
at org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:297)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:358)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:520)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:93)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:472)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:455)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:822)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:802)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:563)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:390)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:293)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:364)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:831)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:788)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:719)
The issue is reproducible on both Kafka broker versions 1.1.0 and 2.1.1, even with a simple Kafka Streams DSL example:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
props.put("default.value.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
props.put("application.id", "xxx");
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.<String, String>stream("source")
.mapValues(value -> value + value)
.to("destination");
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), props);
Is this a bug in Kafka Streams? Is there any workaround to prevent such a failure?
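This related question has no answer here, but Kafka Streams does document an upgrade.from setting for exactly this kind of rolling upgrade; treat the exact value and procedure as something to verify against the upgrade guide for your target version. A hedged sketch of what the 2.x instances would set during the first rolling bounce:

// Sketch only: tell the new instances which (oldest) version is still in the group,
// so they keep sending the old subscription format during the first rolling bounce.
props.put("upgrade.from", "1.1");
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), props);
// Once every instance runs the new version, remove upgrade.from and do a second rolling bounce.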

Error connecting Spark local cluster

I am trying to run the following code on my local Mac, where a Spark cluster with a master and workers is running:
public void run(String inputFilePath) {
    String master = "spark://192.168.1.199:7077";

    SparkConf conf = new SparkConf()
            .setAppName(WordCountTask.class.getName())
            .setMaster(master);
    JavaSparkContext context = new JavaSparkContext(conf);

    context.textFile(inputFilePath)
            .flatMap(text -> Arrays.asList(text.split(" ")).iterator())
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b)
            .foreach(result -> LOGGER.info(
                    String.format("Word [%s] count [%d].", result._1(), result._2)));
}
However, I get the following exception both in the master console:
Error while invoking RpcHandler#receive() on RPC id
5655526795459682754 java.io.EOFException
and in the program console:
18/07/01 22:35:19 WARN StandaloneAppClient$ClientEndpoint: Failed to
connect to master 192.168.1.199:7077 org.apache.spark.SparkException:
Exception thrown in awaitResult
This runs well when I set the master as "local[*]" as given in this example.
I have seen examples where the jar is submitted with the spark-submit command, but I am trying to run it programmatically.
Just realised the Spark version on the master/workers was different from the one in the project's POM file. Bumping the version in the pom.xml to match the Spark cluster made it work.
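For anyone hitting the same thing, a small hedged sanity check (reusing the asker's conf and LOGGER) is to log the driver-side Spark version and compare it with the version shown on the master's web UI:

JavaSparkContext context = new JavaSparkContext(conf);
// JavaSparkContext.version() returns the Spark version the driver was built against
LOGGER.info("Driver Spark version: " + context.version());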
