I need to unit test a Kafka application while avoiding third-party libraries.
My problem right now is that I would like to clear all the topics between tests but I don't know how.
This is my temporary solution: commit every message produced after each test and put all test consumers in the same consumer group.
override protected def afterEach(): Unit = {
  val cleanerConsumer = newConsumer(Seq.empty)
  val topics = cleanerConsumer.listTopics()
  println("cleaning topics")
  cleanerConsumer.subscribe(topics.keySet())
  cleanerConsumer.poll(100)
  cleanerConsumer.commitSync()
  cleanerConsumer.close()
}
This doesn't work, though, and I don't know why.
For example, when I create a new consumer inside a test, messages still contains the messages produced in the previous test:
val consumerProbe = newConsumer(SMSGatewayTopic)
val messages = consumerProbe.poll(1000)
How can I solve this?
You can also embed Kafka/ZooKeeper instances in your test sources, to have more control over these isolated services.
trait Kafka { self: ZooKeeper =>
Kafka.start()
}
object Kafka {
import org.apache.hadoop.fs.FileUtil
import kafka.server.KafkaServer
@volatile private var started = false
lazy val logDir = java.nio.file.Files.createTempDirectory("kafka-log").toFile
lazy val kafkaServer: KafkaServer = {
val config = com.typesafe.config.ConfigFactory.
load(this.getClass.getClassLoader)
val (host, port) = {
val (h, p) = config.getString("kafka.servers").span(_ != ':')
h -> p.drop(1).toInt
}
val serverConf = new kafka.server.KafkaConfig({
val props = new java.util.Properties()
props.put("port", port.toString)
props.put("broker.id", port.toString)
props.put("log.dir", logDir.getAbsolutePath)
props.put(
"zookeeper.connect",
s"localhost:${config getInt "test.zookeeper.port"}"
)
props
})
new KafkaServer(serverConf)
}
def start(): Unit = if (!started) {
try {
kafkaServer.startup()
started = true
} catch {
case err: Throwable =>
println(s"fails to start Kafka: ${err.getMessage}")
throw err
}
}
def stop(): Unit = try {
if (started) kafkaServer.shutdown()
} finally {
FileUtil.fullyDelete(logDir)
}
}
trait ZooKeeper {
ZooKeeper.start()
}
object ZooKeeper {
import java.nio.file.Files
import java.net.InetSocketAddress
import org.apache.hadoop.fs.FileUtil
import org.apache.zookeeper.server.ZooKeeperServer
import org.apache.zookeeper.server.ServerCnxnFactory
@volatile private var started = false
lazy val logDir = Files.createTempDirectory("zk-log").toFile
lazy val snapshotDir = Files.createTempDirectory("zk-snapshots").toFile
lazy val (zkServer, zkFactory) = {
val srv = new ZooKeeperServer(
snapshotDir, logDir, 500
)
val config = com.typesafe.config.ConfigFactory.
load(this.getClass.getClassLoader)
val port = config.getInt("test.zookeeper.port")
srv -> ServerCnxnFactory.createFactory(
new InetSocketAddress("localhost", port), 1024
)
}
def start(): Unit = if (!zkServer.isRunning) {
try {
zkFactory.startup(zkServer)
started = true
while (!zkServer.isRunning) {
Thread.sleep(500)
}
} catch {
case err: Throwable =>
println(s"fails to start ZooKeeper: ${err.getMessage}")
throw err
}
}
def stop(): Unit = try {
if (started) zkFactory.shutdown()
} finally {
try { FileUtil.fullyDelete(logDir) } catch { case _: Throwable => () }
FileUtil.fullyDelete(snapshotDir)
}
}
The test classes can extend Kafka with ZooKeeper to ensure these services are available.
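For example, a minimal sketch of such a test class (my assumption, using ScalaTest 3; mixing in ZooKeeper before Kafka satisfies Kafka's self-type, so ZooKeeper is started first):
import org.scalatest.flatspec.AnyFlatSpec
class EmbeddedKafkaSpec extends AnyFlatSpec with ZooKeeper with Kafka {
  "the embedded services" should "be started by the trait constructors" in {
    // ZooKeeper.start() and Kafka.start() have both run by the time this body executes,
    // so producers and consumers in the test can connect to the configured ports.
    succeed
  }
}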
If the test JVM is not forked, Tests.Cleanup in the SBT setting testOptions in Test can be used to stop the embedded services after testing, as sketched below.
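A rough build.sbt sketch (an assumption, not from the original answer): since build.sbt cannot reference test sources directly, the cleanup loads the Kafka and ZooKeeper objects reflectively through the test class loader (adjust the class names if the objects live in a package):
testOptions in Test += Tests.Cleanup { loader =>
  // Invoke Kafka.stop() and ZooKeeper.stop() on the singleton objects from the test class loader.
  Seq("Kafka$", "ZooKeeper$").foreach { name =>
    val clazz = loader.loadClass(name)
    val module = clazz.getField("MODULE$").get(null)
    clazz.getMethod("stop").invoke(module)
  }
}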
I would suggest you simply recreate all topics before your tests. For example, this is the way the Kafka tests create and delete topics:
Kafka repository on GitHub
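If you prefer to stay on the public client API instead of Kafka's internal test utilities, here is a rough sketch using the AdminClient from kafka-clients (assuming Kafka 0.11+ and a broker on localhost:9092; note that deleteTopics fails if a topic does not exist yet and that deletion completes asynchronously on the broker, so a robust helper may need to retry the creation):
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}
def recreateTopics(topics: Seq[String]): Unit = {
  val props = new Properties()
  props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  val admin = AdminClient.create(props)
  try {
    admin.deleteTopics(topics.asJava).all().get()               // drop the old topics
    val fresh = topics.map(t => new NewTopic(t, 1, 1.toShort))  // 1 partition, replication factor 1
    admin.createTopics(fresh.asJava).all().get()                // recreate them empty
  } finally admin.close()
}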
Related
Context: I found a few tutorials explaining how to consume multiple endpoints from Kotlin at the same time, but they are based on Android, and in my case it is a backend application. I have some experience using CompletableFuture, but I assume I should use coroutines since it is Kotlin and there is no Spring dependency.
Following some suggestions, I reached this:
@Singleton
class PersonEndpoint()
{
@Inject
lateinit var employeClient: EmployeClient
override suspend fun getPersonDetails(request: PersonRequest): PersonResponse {
var combinedResult: String
GlobalScope.launch {
val resultA: String
val resultB: String
val employeesA = async{ employeClient.getEmployeesA()}
val employeesB = async{ employeClient.getEmployeesB()}
try{
combinedResult = employeesA.await() + employeesB.await()
print(combinedResult)
} catch (ex: Exception) {
ex.printStackTrace()
}
// ISSUE 1
// If I try to add a return here it is not allowed.
// I understand it is working as designed: GlobalScope is running in a different thread.
}
// ISSUE 2
// If I try to return combinedResult here, combinedResult isn't initialized.
// I understand it is working as designed: GlobalScope is running in a different thread, and I can
// debug and see that the return here executes earlier than employeesA.await() + employeesB.await().
}
So, how can I execute combinedResult = employeesA.await() + employeesB.await() before returning to the client?
*** Edited after Denis' answer
@Singleton
class CustomerEndpoint() {
fun serve(): Collection<Int> {
return runBlocking {
listOf(
async { getItemDouble(1) },
async { getItemTriple(1) }
).map { it.await() }
}
}
suspend fun getItemDouble(i: Int): Int {
delay(1000)
return i * 2
}
suspend fun getItemTriple(i: Int): Int {
delay(1000)
return i * 3
}
override suspend fun getPersonDetails(request: PersonRequest): PersonResponse {
val result = serve()
println("Got result $result")
...
}
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis
fun main() {
val durationMs = measureTimeMillis {
val result = serve()
println("Got result $result")
}
println("The processing is done in $durationMs ms")
}
fun serve(): Collection<Int> {
return runBlocking {
(1..2).map {
async {
getItem(it)
}
}.map { it.await() }
}
}
suspend fun getItem(i: Int): Int {
delay(1000) // Emulate item retrieval work
return i * 2
}
Note that here we have two nested calls - getItem(1) and getItem(2). We can see that they are executed in parallel as overall running time is ~1 second.
Edited on August 5th, 2021
private suspend fun myMethod(): List<Any> {
return runBlocking {
listOf(
async { method1() },
async { method2() }
).map { it.await() }
}
}
method1 and method2 are methods calling different endpoints.
I am working on akka with Kafka and writing test cases for my Kafka Consumer.
I used Embedded Kafka for unit testing.
When I try to run my test case everything goes fine, but at the end the following exception occurs:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://testActor/system/StreamSupervisor-0/flow-0-1-mapAsyncUnordered#-2130769]] after [1000 ms]. Message of type [akka.stream.impl.fusing.ActorGraphInterpreter$Snapshot$] was sent by [Actor[akka://testActor/system/StreamSupervisor-0#-867168141]]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:675)
at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:696)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:202)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:875)
at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:113)
at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:107)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:873)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:334)
at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:285)
at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:289)
at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:241)
at java.lang.Thread.run(Thread.java:748)
Here is my code.
My Test method is:
@Test
public void publishMessage() {
final TestKit probe = new TestKit(system);
final Config config = system.settings().config().getConfig("akka.kafka.producer");
ActorRef childMaker = probe.getTestActor();
final ProducerSettings<String, String> producerSettings =
ProducerSettings.create(config, new StringSerializer(), new StringSerializer())
.withBootstrapServers(bootstrapServers);
Source.range(1, 10)
.map(Object::toString)
.map(value -> new ProducerRecord<>(topic, 0, "key1", value))
.runWith(Producer.plainSink(producerSettings), materializer);
new EventFilter(Logging.Info.class, system)
.occurrences(1)
.matches("Starting up Consumer:")
.matches("Consumer Started:")
.intercept(() -> TestActorRef
.create(system, KafkaConsumerPlainExternalSource.props(new RequestRegisterConsumer(system,
config, bootstrapServers, groupId, topic, (byte) 0, childMaker))));
}
My KafkaConsumer Class looks like:
public class KafkaConsumerPlainExternalSource extends AbstractLoggingActor {
private static RequestRegisterConsumer consumerConf;
static Props props(RequestRegisterConsumer consumerConf) {
return Props.create(KafkaConsumerPlainExternalSource.class, consumerConf);
}
public KafkaConsumerPlainExternalSource(RequestRegisterConsumer consumerConf) {
KafkaConsumerPlainExternalSource.consumerConf = consumerConf;
}
@Override
public Receive createReceive() {
return receiveBuilder().build();
}
@Override
public void preStart() {
log().info("Starting up Consumer: " + self().path().toString());
//Update
akka.kafka.javadsl.Consumer.plainExternalSource(consumer, Subscriptions
.assignment(new TopicPartition(consumerConf.getTopic(), consumerConf.getTopicPartition())))
.mapAsync(10, Consumer :: consume)
.to(Sink.ignore())
.run(ActorMaterializer.create(consumerConf.getActorSystem()));
log().info("Consumer Started: " + self().path().toString());
}
}
My application.conf file is:
akka {
  loggers = [akka.testkit.TestEventListener]
  test {
    timefactor = 1.0
    filter-leeway = 10s
    calling-thread-dispatcher {
      type = akka.testkit.CallingThreadDispatcherConfigurator
    }
  }
  kafka.producer {
    // producer conf
  }
}
When I add a sleep of 10 seconds at the end of my test case, the test runs fine. I could not find the root cause of this exception.
I'm using the GraphDSL of Akka Streams to create a DSL for my test framework, but now that I'm looking at how it works, I don't think it fits well.
My concern is that when I put an assert(false) in one of the flows, instead of propagating the error to the test, it gets stuck.
I don't know if I'm doing something wrong.
My DSL implementation looks like:
def given(message: String, musVersion: MUSVersion = ONE) = Source.single(new message(message, musVersion))
def When(sentence: String) = Flow[message].map(message => {
try {
HttpClient.request(message._1, message._2)
} catch {
case e: Exception => {
HttpResponse[String](e.getMessage, 500, Map())
}
}
})
def Then(sentence: String) = Sink.foreach[HttpResponse[String]](response => {
assert(false)
thenAction(sentence, response)
println(s"######## $x")
})
Like I said, my test gets stuck instead of marking the test as failed because of the assert.
Here my Test code:
class TestDSL extends MainDSL {
private def generateKey(): String = s"""${UUID.randomUUID().toString}"""
implicit val config = this.getRequestConfig("cassandra")
val message: String = Messages.message(path = "cassandra", key = "Cassandra " + generateKey())
info("This test has as requirement create and find an Entity using cassandra connector")
feature("First DSL") {
scenario(s"This is a prove of concept of the DSL") {
RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
given(message) ~> When("I make a request") ~> Then("The return code='200'") ~> AndThen("The payload is not empty")
ClosedShape
}).run()
}
}
}
Any idea what's wrong?
Regards.
I'm trying to make distributed pub-sub work across different cluster systems, but it's not working whatever I try.
All I'm trying to do is create a simple example where:
1) I create a topic, say "content".
2) One node, say in JVM A, creates the topic, subscribes to it, and has a publisher that publishes to it.
3) On a different node, say JVM B on a different port, I create a subscriber.
4) When I send a message to the topic from JVM A, I want the subscriber on JVM B to receive it too, as it is subscribed to the same topic.
Any help would be greatly appreciated, as would a simple working example of distributed pub-sub with subscribers and publishers in different cluster systems on different ports, in Java.
Here is the code for app1 and its config file.
public class App1{
public static void main(String[] args) {
System.setProperty("akka.remote.netty.tcp.port", "2551");
ActorSystem clusterSystem = ActorSystem.create("ClusterSystem");
ClusterClientReceptionist clusterClientReceptionist1 = ClusterClientReceptionist.get(clusterSystem);
ActorRef subcriber1=clusterSystem.actorOf(Props.create(Subscriber.class), "subscriber1");
clusterClientReceptionist1.registerSubscriber("content", subcriber1);
ActorRef publisher1=clusterSystem.actorOf(Props.create(Publisher.class), "publisher1");
clusterClientReceptionist1.registerSubscriber("content", publisher1);
publisher1.tell("testMessage1", ActorRef.noSender());
}
}
app1.conf:
akka {
loggers = ["akka.event.slf4j.Slf4jLogger"]
loglevel = "DEBUG"
stdout-loglevel = "DEBUG"
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
actor {
provider = "akka.cluster.ClusterActorRefProvider"
}
remote {
log-remote-lifecycle-events = off
enabled-transports = ["akka.remote.netty.tcp"]
netty.tcp {
hostname = "127.0.0.1"
port = 2551
}
}
cluster {
seed-nodes = [
"akka.tcp://ClusterSystem#127.0.0.1:2551"
]
auto-down-unreachable-after = 10s
}
akka.extensions = ["akka.cluster.pubsub.DistributedPubSub",
"akka.contrib.pattern.ClusterReceptionistExtension"]
akka.cluster.pub-sub {
name = distributedPubSubMediator
role = ""
routing-logic = random
gossip-interval = 1s
removed-time-to-live = 120s
max-delta-elements = 3000
use-dispatcher = ""
}
akka.cluster.client.receptionist {
name = receptionist
role = ""
number-of-contacts = 3
response-tunnel-receive-timeout = 30s
use-dispatcher = ""
heartbeat-interval = 2s
acceptable-heartbeat-pause = 13s
failure-detection-interval = 2s
}
}
Code for app2 and its config file:
public class App
{
public static Set<ActorPath> initialContacts() {
return new HashSet<ActorPath>(Arrays.asList(
ActorPaths.fromString("akka.tcp://ClusterSystem#127.0.0.1:2551/system/receptionist")));
}
public static void main( String[] args ) {
System.setProperty("akka.remote.netty.tcp.port", "2553");
ActorSystem clusterSystem = ActorSystem.create("ClusterSystem2");
ClusterClientReceptionist clusterClientReceptionist2 = ClusterClientReceptionist.get(clusterSystem);
final ActorRef clusterClient = clusterSystem.actorOf(ClusterClient.props(ClusterClientSettings.create(
clusterSystem).withInitialContacts(initialContacts())), "client");
ActorRef subcriber2=clusterSystem.actorOf(Props.create(Subscriber.class), "subscriber2");
clusterClientReceptionist2.registerSubscriber("content", subcriber2);
ActorRef publisher2=clusterSystem.actorOf(Props.create(Publisher.class), "publisher2");
publisher2.tell("testMessage2", ActorRef.noSender());
clusterClient.tell(new ClusterClient.Send("/user/publisher1", "hello", true), null);
}
}
app2.conf:
akka {
loggers = ["akka.event.slf4j.Slf4jLogger"]
loglevel = "DEBUG"
stdout-loglevel = "DEBUG"
logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
actor {
provider = "akka.cluster.ClusterActorRefProvider"
}
remote {
log-remote-lifecycle-events = off
enabled-transports = ["akka.remote.netty.tcp"]
netty.tcp {
hostname = "127.0.0.1"
port = 2553
}
}
cluster {
seed-nodes = [
"akka.tcp://ClusterSystem#127.0.0.1:2553"
]
auto-down-unreachable-after = 10s
}
akka.extensions = ["akka.cluster.pubsub.DistributedPubSub",
"akka.contrib.pattern.ClusterReceptionistExtension"]
akka.cluster.pub-sub {
name = distributedPubSubMediator
role = ""
routing-logic = random
gossip-interval = 1s
removed-time-to-live = 120s
max-delta-elements = 3000
use-dispatcher = ""
}
akka.cluster.client.receptionist {
name = receptionist
role = ""
number-of-contacts = 3
response-tunnel-receive-timeout = 30s
use-dispatcher = ""
heartbeat-interval = 2s
acceptable-heartbeat-pause = 13s
failure-detection-interval = 2s
}
}
The Publisher and Subscriber classes are the same for both applications and are given below.
Publisher:
public class Publisher extends UntypedActor {
private final ActorRef mediator =
DistributedPubSub.get(getContext().system()).mediator();
@Override
public void onReceive(Object msg) throws Exception {
if (msg instanceof String) {
mediator.tell(new DistributedPubSubMediator.Publish("events", msg), getSelf());
} else {
unhandled(msg);
}
}
}
Subscriber:
public class Subscriber extends UntypedActor {
private final LoggingAdapter log = Logging.getLogger(getContext().system(), this);
public Subscriber(){
ActorRef mediator = DistributedPubSub.get(getContext().system()).mediator();
mediator.tell(new DistributedPubSubMediator.Subscribe("events", getSelf()), getSelf());
}
public void onReceive(Object msg) throws Throwable {
if (msg instanceof String) {
log.info("Got: {}", msg);
} else if (msg instanceof DistributedPubSubMediator.SubscribeAck) {
log.info("subscribing");
} else {
unhandled(msg);
}
}
}
I got this error in the receiver-side app while running both apps (dead letters encountered):
[ClusterSystem-akka.actor.default-dispatcher-21] INFO akka.actor.RepointableActorRef - Message [java.lang.String] from Actor[akka://ClusterSystem/system/receptionist/akka.tcp%3A%2F%2FClusterSystem2%40127.0.0.1%3A2553%2FdeadLetters#188707926] to Actor[akka://ClusterSystem/system/distributedPubSubMediator#1119990682] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
and in the sender-side app the log shows that the message was sent successfully:
[ClusterSystem2-akka.actor.default-dispatcher-22] DEBUG akka.cluster.client.ClusterClient - Sending buffered messages to receptionist
Using the ClusterClient in that way does not really make sense and does not have anything to do with distributed pub-sub. As both your nodes are part of the cluster, you can just use the distributed pub-sub API directly.
Here is a simple main, including config, that creates a two-node cluster using your exact Publisher and Subscriber actors and works as expected:
public static void main(String[] args) throws Exception {
final Config config = ConfigFactory.parseString(
"akka.actor.provider=cluster\n" +
"akka.remote.netty.tcp.port=2551\n" +
"akka.cluster.seed-nodes = [ \"akka.tcp://ClusterSystem#127.0.0.1:2551\"]\n");
ActorSystem node1 = ActorSystem.create("ClusterSystem", config);
ActorSystem node2 = ActorSystem.create("ClusterSystem",
ConfigFactory.parseString("akka.remote.netty.tcp.port=2552")
.withFallback(config));
// wait a bit for the cluster to form
Thread.sleep(3000);
ActorRef subscriber = node1.actorOf(
Props.create(Subscriber.class),
"subscriber");
ActorRef publisher = node2.actorOf(
Props.create(Publisher.class),
"publisher");
// wait a bit for the subscription to be gossiped
Thread.sleep(3000);
publisher.tell("testMessage1", ActorRef.noSender());
}
Note that distributed pub-sub does not give any guarantees of delivery, so if you send a message before the mediators have gotten in contact with each other, the message will simply be lost (hence the Thread.sleep statements, which are of course not something you should do in actual code).
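As a rough alternative to the first sleep (my own Scala sketch, not part of the original answer), you can block until each node has joined the cluster with Cluster.registerOnMemberUp; this only covers cluster formation, the subscription still needs a moment to be gossiped between the mediators:
import java.util.concurrent.CountDownLatch
import akka.actor.ActorSystem
import akka.cluster.Cluster
def awaitMemberUp(system: ActorSystem): Unit = {
  val joined = new CountDownLatch(1)
  Cluster(system).registerOnMemberUp(joined.countDown())  // callback fires once this node is Up
  joined.await()
}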
I think the issue is that your actor systems have different names, ClusterSystem and ClusterSystem2. At least I was having the same issue because I had two different services in the cluster, but I had named the system in each service with a different name.
How can I identify the topic name from a message in Kafka?
String[] topics = { "test", "test1", "test2" };
for (String t : topics) {
topicMap.put(t, new Integer(3));
}
SparkConf conf = new SparkConf().setAppName("KafkaReceiver")
        .set("spark.streaming.receiver.writeAheadLog.enable", "false")
        .setMaster("local[4]")
        .set("spark.cassandra.connection.host", "localhost");
final JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(
1000));
/* Receive Kafka streaming inputs */
JavaPairReceiverInputDStream<String, String> messages = KafkaUtils
.createStream(jssc, "localhost:2181", "test-group",
topicMap);
JavaDStream<MessageAndMetadata> data =
messages.map(new Function<Tuple2<String, String>, MessageAndMetadata>()
{
public MessageAndMetadata call(Tuple2<String, String> message)
{
System.out.println("message ="+message._2);
return null;
}
}
);
I can fetch messages from the Kafka producer. But since the consumer is now consuming from three topics, I need to identify the topic name for each message.
As of Spark 1.5.0, the official documentation encourages using the no-receiver/direct approach, which graduated from experimental status in 1.5.0.
This new direct API allows you to easily obtain a message and its metadata, apart from other good things.
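For illustration, a rough Scala sketch of the direct approach (the question's code is Java; I am assuming a plain StreamingContext called ssc and brokers on localhost:9092): the direct stream's RDD partitions map one-to-one to Kafka topic-partitions, so the offset ranges expose each record's topic name:
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val topics = Set("test", "test1", "test2")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)
val withTopic = stream.transform { rdd =>
  // Each RDD partition of the direct stream corresponds to exactly one Kafka topic-partition.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.mapPartitionsWithIndex { (i, records) =>
    val topic = offsetRanges(i).topic
    records.map { case (key, value) => (topic, key, value) }
  }
}
withTopic.print()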
With the receiver-based approach, unfortunately, this is not straightforward, as KafkaReceiver and ReliableKafkaReceiver in Spark's source code only store MessageAndMetadata.key and message.
There are two open tickets related to this issue in Spark's JIRA:
https://issues.apache.org/jira/browse/SPARK-3146
https://issues.apache.org/jira/browse/SPARK-4960
which have been open for a while.
A dirty copy/paste/modify of Spark's source code to solve your issue:
package org.apache.spark.streaming.kafka
import java.lang.{Integer => JInt}
import java.util.{Map => JMap, Properties}
import kafka.consumer.{KafkaStream, Consumer, ConsumerConfig, ConsumerConnector}
import kafka.serializer.{Decoder, StringDecoder}
import kafka.utils.VerifiableProperties
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.api.java.{JavaReceiverInputDStream, JavaStreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.util.WriteAheadLogUtils
import org.apache.spark.util.ThreadUtils
import scala.collection.JavaConverters._
import scala.collection.Map
import scala.reflect._
object MoreKafkaUtils {
def createStream(
jssc: JavaStreamingContext,
zkQuorum: String,
groupId: String,
topics: JMap[String, JInt],
storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
): JavaReceiverInputDStream[(String, String, String)] = {
val kafkaParams = Map[String, String](
"zookeeper.connect" -> zkQuorum, "group.id" -> groupId,
"zookeeper.connection.timeout.ms" -> "10000")
val walEnabled = WriteAheadLogUtils.enableReceiverLog(jssc.ssc.conf)
new KafkaInputDStreamWithTopic[String, String, StringDecoder, StringDecoder](jssc.ssc, kafkaParams, topics.asScala.mapValues(_.intValue()), walEnabled, storageLevel)
}
}
private[streaming]
class KafkaInputDStreamWithTopic[
K: ClassTag,
V: ClassTag,
U <: Decoder[_] : ClassTag,
T <: Decoder[_] : ClassTag](
@transient ssc_ : StreamingContext,
kafkaParams: Map[String, String],
topics: Map[String, Int],
useReliableReceiver: Boolean,
storageLevel: StorageLevel
) extends ReceiverInputDStream[(K, V, String)](ssc_) with Logging {
def getReceiver(): Receiver[(K, V, String)] = {
if (!useReliableReceiver) {
new KafkaReceiverWithTopic[K, V, U, T](kafkaParams, topics, storageLevel)
} else {
new ReliableKafkaReceiverWithTopic[K, V, U, T](kafkaParams, topics, storageLevel)
}
}
}
private[streaming]
class KafkaReceiverWithTopic[
K: ClassTag,
V: ClassTag,
U <: Decoder[_] : ClassTag,
T <: Decoder[_] : ClassTag](
kafkaParams: Map[String, String],
topics: Map[String, Int],
storageLevel: StorageLevel
) extends Receiver[(K, V, String)](storageLevel) with Logging {
// Connection to Kafka
var consumerConnector: ConsumerConnector = null
def onStop() {
if (consumerConnector != null) {
consumerConnector.shutdown()
consumerConnector = null
}
}
def onStart() {
logInfo("Starting Kafka Consumer Stream with group: " + kafkaParams("group.id"))
// Kafka connection properties
val props = new Properties()
kafkaParams.foreach(param => props.put(param._1, param._2))
val zkConnect = kafkaParams("zookeeper.connect")
// Create the connection to the cluster
logInfo("Connecting to Zookeeper: " + zkConnect)
val consumerConfig = new ConsumerConfig(props)
consumerConnector = Consumer.create(consumerConfig)
logInfo("Connected to " + zkConnect)
val keyDecoder = classTag[U].runtimeClass.getConstructor(classOf[VerifiableProperties])
.newInstance(consumerConfig.props)
.asInstanceOf[Decoder[K]]
val valueDecoder = classTag[T].runtimeClass.getConstructor(classOf[VerifiableProperties])
.newInstance(consumerConfig.props)
.asInstanceOf[Decoder[V]]
// Create threads for each topic/message Stream we are listening
val topicMessageStreams = consumerConnector.createMessageStreams(
topics, keyDecoder, valueDecoder)
val executorPool =
ThreadUtils.newDaemonFixedThreadPool(topics.values.sum, "KafkaMessageHandler")
try {
// Start the messages handler for each partition
topicMessageStreams.values.foreach { streams =>
streams.foreach { stream => executorPool.submit(new MessageHandler(stream)) }
}
} finally {
executorPool.shutdown() // Just causes threads to terminate after work is done
}
}
// Handles Kafka messages
private class MessageHandler(stream: KafkaStream[K, V])
extends Runnable {
def run() {
logInfo("Starting MessageHandler.")
try {
val streamIterator = stream.iterator()
while (streamIterator.hasNext()) {
val msgAndMetadata = streamIterator.next()
store((msgAndMetadata.key, msgAndMetadata.message, msgAndMetadata.topic))
}
} catch {
case e: Throwable => reportError("Error handling message; exiting", e)
}
}
}
}
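To round this off, a short usage sketch (my own assumption, not part of the original answer), reusing the jssc and topicMap from the question: the modified stream now carries (key, message, topic) tuples, so the topic name travels with each record.
val data = MoreKafkaUtils.createStream(jssc, "localhost:2181", "test-group", topicMap)
data.dstream.foreachRDD { rdd =>
  rdd.foreach { case (key, message, topic) =>
    println(s"topic=$topic key=$key message=$message")  // topic name is now part of each record
  }
}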