I am attempting to build a Kafka Streams app that consumes records from an input topic whose values are simple JSON payloads (an id and a timestamp are included; the key is a simple 3-digit string, and no schema is required). For the output topic I want to produce only the records that have been abandoned for 30 minutes or more (a session window). Based on this link, I have begun developing a Kafka Streams app:
package io.confluent.developer;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SessionWindows;
import java.io.FileInputStream;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
public class SessionWindow {
private final DateTimeFormatter timeFormatter = DateTimeFormatter.ofLocalizedTime(FormatStyle.LONG)
.withLocale(Locale.US)
.withZone(ZoneId.systemDefault());
public Topology buildTopology(Properties allProps) {
final StreamsBuilder builder = new StreamsBuilder();
final String inputTopic = allProps.getProperty("input.topic.name");
final String outputTopic = allProps.getProperty("output.topic.name");
builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()))
.groupByKey()
.windowedBy(SessionWindows.ofInactivityGapAndGrace(Duration.ofMinutes(5), Duration.ofSeconds(10)))
.count()
.toStream()
.map((windowedKey, count) -> {
String start = timeFormatter.format(windowedKey.window().startTime());
String end = timeFormatter.format(windowedKey.window().endTime());
String sessionInfo = String.format("Session info started: %s ended: %s with count %s", start, end, count);
return KeyValue.pair(windowedKey.key(), sessionInfo);
})
.to(outputTopic, Produced.with(Serdes.String(), Serdes.String()));
return builder.build();
}
public Properties loadEnvProperties(String fileName) throws IOException {
Properties allProps = new Properties();
FileInputStream input = new FileInputStream(fileName);
allProps.load(input);
input.close();
return allProps;
}
public static void main(String[] args) throws Exception {
if (args.length < 1) {
throw new IllegalArgumentException("This program takes one argument: the path to an environment configuration file.");
}
SessionWindow tw = new SessionWindow();
Properties allProps = tw.loadEnvProperties(args[0]);
allProps.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
allProps.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, ClickEventTimestampExtractor.class);
Topology topology = tw.buildTopology(allProps);
ClicksDataGenerator dataGenerator = new ClicksDataGenerator(allProps);
dataGenerator.generate();
final KafkaStreams streams = new KafkaStreams(topology, allProps);
final CountDownLatch latch = new CountDownLatch(1);
// Attach shutdown handler to catch Control-C.
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close(Duration.ofSeconds(5));
latch.countDown();
}
});
try {
streams.cleanUp();
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
static class ClicksDataGenerator {
final Properties properties;
public ClicksDataGenerator(final Properties properties) {
this.properties = properties;
}
public void generate() {
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
}
}
}
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
public class ClickEventTimestampExtractor implements TimestampExtractor {
@Override
public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
System.out.println(record.value());
return record.getTimestamp();
}
}
I am having issues with the following:
Getting the code to compile - I keep getting the error below (I am new to Java, so please bear with me). What is the correct way to call getTimestamp?
error: cannot find symbol
return record.getTimestamp();
^
symbol: method getTimestamp()
location: variable record of type ConsumerRecord<Object,Object>
1 error
Not sure if the timestamp extractor will work for this particular scenario. I read here that 'The Timestamp extractor can only give you one timestamp'. Does that mean that if there are multiple messages with different keys this won't work? Some clarification or examples would help.
Thanks!
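For reference, a minimal sketch of the extractor with the compile error addressed: ConsumerRecord exposes the record's timestamp via timestamp(); there is no getTimestamp(). The extractor is also invoked once per record, so every message, whatever its key, supplies its own timestamp. If the goal is the timestamp embedded in the JSON value rather than the Kafka record timestamp, the value string would have to be parsed instead.
package io.confluent.developer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
public class ClickEventTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        // Kafka record timestamp; called for every record, so each message
        // contributes its own timestamp to the session windows.
        return record.timestamp();
    }
}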
Related
I'm a former legacy ActiveMQ user learning Kafka, and I have a question.
With ActiveMQ you can do this:
Submit 100 messages into a queue
Wait however long you want
Consume those 100 messages from that queue. Guaranteed single consumer of the message.
In Kafka I try to do the same thing:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;
public class KafkaTest {
private static final Logger LOG = LoggerFactory.getLogger(KafkaTest.class);
public static final String MY_GROUP_ID = "my-group-id";
public static final String TOPIC = "topic";
KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.2.1"));
@Before
public void before() {
kafka.start();
}
@After
public void after() {
kafka.close();
}
@Test
public void testPipes() throws ExecutionException, InterruptedException {
Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
consumerProps.put("group.id", MY_GROUP_ID);
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
ExecutorService es = Executors.newCachedThreadPool();
Future consumerFuture = es.submit(() -> {
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
consumer.subscribe(Collections.singletonList(TOPIC));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> record : records) {
LOG.info("Thread: {}, Topic: {}, Partition: {}, Offset: {}, key: {}, value: {}", Thread.currentThread().getName(), record.topic(), record.partition(), record.offset(), record.key(), record.value().toUpperCase());
}
}
} catch (Exception e) {
LOG.error("Consumer error", e);
}
});
Thread.sleep(10000); // NOTICE! if you remove this, the consumer will not receive the messages. because the consumer won't be registered yet before the messages come rolling on in.
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka.getBootstrapServers());
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
Future producerFuture = es.submit(() -> {
try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
int counter = 0;
while (counter <= 100) {
System.out.println("Sent " + counter);
String msg = "Message " + counter;
producer.send(new ProducerRecord<>(TOPIC, msg));
counter++;
}
} catch (Exception e) {
LOG.error("Failed to send message by the producer", e);
}
});
producerFuture.get();
consumerFuture.get();
}
}
This example does not work if you do not start the consumer, wait for it to start, and then run the producer.
Can anyone show me how to alter my example program so that the messages wait in the topic until they are consumed?
In your consumer config, you need to add auto.offset.reset=earliest or call seekToBeginning after subscribing.
Otherwise, it starts to read from the end of the topic. In other words, if you start the consumer after the producer, it'll begin to read after all the existing data.
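For example, a minimal change to the consumer config in the test above (seekToBeginning is the alternative, but it needs partitions to be assigned first, i.e. after an initial poll()):
// With no committed offsets for this group, start from the beginning of the topic
// instead of the end, so records produced before the consumer joined are still read.
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// Or, after subscribe() and a first poll() has assigned partitions:
// consumer.seekToBeginning(consumer.assignment());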
I'm getting an error with StreamIdentifier when trying to use MultiStreamTracker in a kinesis consumer application.
java.lang.IllegalArgumentException: Unable to deserialize StreamIdentifier from first-stream-name
What is causing this error? I can't find a good example of using the tracker with kinesis.
The stream name works when using a consumer with a single stream so I'm not sure what is happening. It looks like the consumer is trying to parse the accountId and streamCreationEpoch. But when I create the identifiers I am using the singleStreamInstance method. Is the stream name required to have these values? They appear to be optional from the code.
This test is part of a complete example on github.
package kinesis.localstack.example;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.common.InitialPositionInStream;
import software.amazon.kinesis.common.InitialPositionInStreamExtended;
import software.amazon.kinesis.common.KinesisClientUtil;
import software.amazon.kinesis.common.StreamConfig;
import software.amazon.kinesis.common.StreamIdentifier;
import software.amazon.kinesis.coordinator.Scheduler;
import software.amazon.kinesis.exceptions.InvalidStateException;
import software.amazon.kinesis.exceptions.ShutdownException;
import software.amazon.kinesis.lifecycle.events.InitializationInput;
import software.amazon.kinesis.lifecycle.events.LeaseLostInput;
import software.amazon.kinesis.lifecycle.events.ProcessRecordsInput;
import software.amazon.kinesis.lifecycle.events.ShardEndedInput;
import software.amazon.kinesis.lifecycle.events.ShutdownRequestedInput;
import software.amazon.kinesis.processor.FormerStreamsLeasesDeletionStrategy;
import software.amazon.kinesis.processor.FormerStreamsLeasesDeletionStrategy.NoLeaseDeletionStrategy;
import software.amazon.kinesis.processor.MultiStreamTracker;
import software.amazon.kinesis.processor.ShardRecordProcessor;
import software.amazon.kinesis.processor.ShardRecordProcessorFactory;
import software.amazon.kinesis.retrieval.KinesisClientRecord;
import software.amazon.kinesis.retrieval.polling.PollingConfig;
import static java.util.stream.Collectors.toList;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;
import static org.testcontainers.containers.localstack.LocalStackContainer.Service.CLOUDWATCH;
import static org.testcontainers.containers.localstack.LocalStackContainer.Service.DYNAMODB;
import static org.testcontainers.containers.localstack.LocalStackContainer.Service.KINESIS;
import static software.amazon.kinesis.common.InitialPositionInStream.TRIM_HORIZON;
import static software.amazon.kinesis.common.StreamIdentifier.singleStreamInstance;
@Testcontainers
public class KinesisMultiStreamTest {
static class TestProcessorFactory implements ShardRecordProcessorFactory {
private final TestKinesisRecordService service;
public TestProcessorFactory(TestKinesisRecordService service) {
this.service = service;
}
@Override
public ShardRecordProcessor shardRecordProcessor() {
throw new UnsupportedOperationException("must have streamIdentifier");
}
public ShardRecordProcessor shardRecordProcessor(StreamIdentifier streamIdentifier) {
return new TestRecordProcessor(service, streamIdentifier);
}
}
static class TestRecordProcessor implements ShardRecordProcessor {
public final TestKinesisRecordService service;
public final StreamIdentifier streamIdentifier;
public TestRecordProcessor(TestKinesisRecordService service, StreamIdentifier streamIdentifier) {
this.service = service;
this.streamIdentifier = streamIdentifier;
}
@Override
public void initialize(InitializationInput initializationInput) {
}
@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
service.addRecord(streamIdentifier, processRecordsInput);
}
@Override
public void leaseLost(LeaseLostInput leaseLostInput) {
}
@Override
public void shardEnded(ShardEndedInput shardEndedInput) {
try {
shardEndedInput.checkpointer().checkpoint();
} catch (Exception e) {
throw new IllegalStateException(e);
}
}
@Override
public void shutdownRequested(ShutdownRequestedInput shutdownRequestedInput) {
}
}
static class TestKinesisRecordService {
private List<ProcessRecordsInput> firstStreamRecords = Collections.synchronizedList(new ArrayList<>());
private List<ProcessRecordsInput> secondStreamRecords = Collections.synchronizedList(new ArrayList<>());
public void addRecord(StreamIdentifier streamIdentifier, ProcessRecordsInput processRecordsInput) {
if(streamIdentifier.streamName().contains(firstStreamName)) {
firstStreamRecords.add(processRecordsInput);
} else if(streamIdentifier.streamName().contains(secondStreamName)) {
secondStreamRecords.add(processRecordsInput);
} else {
throw new IllegalStateException("no list for stream " + streamIdentifier);
}
}
public List<ProcessRecordsInput> getFirstStreamRecords() {
return Collections.unmodifiableList(firstStreamRecords);
}
public List<ProcessRecordsInput> getSecondStreamRecords() {
return Collections.unmodifiableList(secondStreamRecords);
}
}
public static final String firstStreamName = "first-stream-name";
public static final String secondStreamName = "second-stream-name";
public static final String partitionKey = "partition-key";
DockerImageName localstackImage = DockerImageName.parse("localstack/localstack:latest");
@Container
public LocalStackContainer localstack = new LocalStackContainer(localstackImage)
.withServices(KINESIS, CLOUDWATCH)
.withEnv("KINESIS_INITIALIZE_STREAMS", firstStreamName + ":1," + secondStreamName + ":1");
public Scheduler scheduler;
public TestKinesisRecordService service = new TestKinesisRecordService();
public KinesisProducer producer;
@BeforeEach
void setup() {
KinesisAsyncClient kinesisClient = KinesisClientUtil.createKinesisAsyncClient(
KinesisAsyncClient.builder().endpointOverride(localstack.getEndpointOverride(KINESIS)).region(Region.of(localstack.getRegion()))
);
DynamoDbAsyncClient dynamoClient = DynamoDbAsyncClient.builder().region(Region.of(localstack.getRegion())).endpointOverride(localstack.getEndpointOverride(DYNAMODB)).build();
CloudWatchAsyncClient cloudWatchClient = CloudWatchAsyncClient.builder().region(Region.of(localstack.getRegion())).endpointOverride(localstack.getEndpointOverride(CLOUDWATCH)).build();
MultiStreamTracker tracker = new MultiStreamTracker() {
private List<StreamConfig> configs = List.of(
new StreamConfig(singleStreamInstance(firstStreamName), InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON)),
new StreamConfig(singleStreamInstance(secondStreamName), InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON)));
@Override
public List<StreamConfig> streamConfigList() {
return configs;
}
@Override
public FormerStreamsLeasesDeletionStrategy formerStreamsLeasesDeletionStrategy() {
return new NoLeaseDeletionStrategy();
}
};
ConfigsBuilder configsBuilder = new ConfigsBuilder(tracker, "KinesisPratTest", kinesisClient, dynamoClient, cloudWatchClient, UUID.randomUUID().toString(), new TestProcessorFactory(service));
scheduler = new Scheduler(
configsBuilder.checkpointConfig(),
configsBuilder.coordinatorConfig(),
configsBuilder.leaseManagementConfig(),
configsBuilder.lifecycleConfig(),
configsBuilder.metricsConfig(),
configsBuilder.processorConfig().callProcessRecordsEvenForEmptyRecordList(false),
configsBuilder.retrievalConfig()
);
new Thread(scheduler).start();
producer = producer();
}
@AfterEach
public void teardown() throws ExecutionException, InterruptedException, TimeoutException {
producer.destroy();
Future<Boolean> gracefulShutdownFuture = scheduler.startGracefulShutdown();
gracefulShutdownFuture.get(60, TimeUnit.SECONDS);
}
public KinesisProducer producer() {
var configuration = new KinesisProducerConfiguration()
.setVerifyCertificate(false)
.setCredentialsProvider(localstack.getDefaultCredentialsProvider())
.setMetricsCredentialsProvider(localstack.getDefaultCredentialsProvider())
.setRegion(localstack.getRegion())
.setCloudwatchEndpoint(localstack.getEndpointOverride(CLOUDWATCH).getHost())
.setCloudwatchPort(localstack.getEndpointOverride(CLOUDWATCH).getPort())
.setKinesisEndpoint(localstack.getEndpointOverride(KINESIS).getHost())
.setKinesisPort(localstack.getEndpointOverride(KINESIS).getPort());
return new KinesisProducer(configuration);
}
@Test
void testFirstStream() {
String expected = "Hello";
producer.addUserRecord(firstStreamName, partitionKey, ByteBuffer.wrap(expected.getBytes(StandardCharsets.UTF_8)));
var result = await().timeout(600, TimeUnit.SECONDS)
.until(() -> service.getFirstStreamRecords().stream()
.flatMap(r -> r.records().stream())
.map(KinesisClientRecord::data)
.map(r -> StandardCharsets.UTF_8.decode(r).toString())
.collect(toList()), records -> records.size() > 0);
assertThat(result).anyMatch(r -> r.equals(expected));
}
@Test
void testSecondStream() {
String expected = "Hello";
producer.addUserRecord(secondStreamName, partitionKey, ByteBuffer.wrap(expected.getBytes(StandardCharsets.UTF_8)));
var result = await().timeout(600, TimeUnit.SECONDS)
.until(() -> service.getSecondStreamRecords().stream()
.flatMap(r -> r.records().stream())
.map(KinesisClientRecord::data)
.map(r -> StandardCharsets.UTF_8.decode(r).toString())
.collect(toList()), records -> records.size() > 0);
assertThat(result).anyMatch(r -> r.equals(expected));
}
}
Here is the error I am getting.
[Thread-9] ERROR software.amazon.kinesis.coordinator.Scheduler - Worker.run caught exception, sleeping for 1000 milli seconds!
java.lang.IllegalArgumentException: Unable to deserialize StreamIdentifier from first-stream-name
at software.amazon.kinesis.common.StreamIdentifier.multiStreamInstance(StreamIdentifier.java:75)
at software.amazon.kinesis.coordinator.Scheduler.getStreamIdentifier(Scheduler.java:1001)
at software.amazon.kinesis.coordinator.Scheduler.buildConsumer(Scheduler.java:917)
at software.amazon.kinesis.coordinator.Scheduler.createOrGetShardConsumer(Scheduler.java:899)
at software.amazon.kinesis.coordinator.Scheduler.runProcessLoop(Scheduler.java:419)
at software.amazon.kinesis.coordinator.Scheduler.run(Scheduler.java:330)
at java.base/java.lang.Thread.run(Thread.java:829)
According to the documentation:
The serialized stream identifier should be of the following format: account-id:StreamName:streamCreationTimestamp
So your code should look like this:
private List<StreamConfig> configs = List.of(
new StreamConfig(multiStreamInstance("111111111:multiStreamTest-1:12345"), InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON)),
new StreamConfig(multiStreamInstance("111111111:multiStreamTest-2:12389"), InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON)));
Note: this will also change the leaseKey format to account-id:StreamName:streamCreationTimestamp:ShardId
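Applied to the stream names from the question's test, that would look something like the sketch below; the account id and creation epochs here are placeholders and would have to match the real account and the streams' actual creation timestamps:
// requires the static import software.amazon.kinesis.common.StreamIdentifier.multiStreamInstance
private List<StreamConfig> configs = List.of(
    new StreamConfig(
        multiStreamInstance("000000000000:" + firstStreamName + ":1659000000"),
        InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON)),
    new StreamConfig(
        multiStreamInstance("000000000000:" + secondStreamName + ":1659000001"),
        InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON)));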
I'm trying to confirm the value of an HTTP response header with the Spring 5 WebClient, but only if the web call responds with an HTTP 200 status code. In this use case, if authentication is not successful, the API call returns an HTTP 401 without the response header present. The code below functionally works, but it makes the web call twice (because I'm blocking twice). Short of blocking on the HTTP response header only, and putting a try/catch around the NPE when the header isn't present, is there any "cleaner" way to do this?
import java.net.URI;
import java.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.WebApplicationType;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.ClientRequest;
import org.springframework.web.reactive.function.client.ClientResponse;
import org.springframework.web.reactive.function.client.ExchangeFunction;
import org.springframework.web.reactive.function.client.ExchangeFunctions;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
@SpringBootApplication
public class ContentCheckerApplication {
private static final Logger LOGGER = LoggerFactory.getLogger(ContentCheckerApplication.class);
private ExchangeFunction exchange = ExchangeFunctions.create(new ReactorClientHttpConnector());
public static void main(String[] args) {
SpringApplication app = new SpringApplication(ContentCheckerApplication.class);
// prevent SpringBoot from starting a web server
app.setWebApplicationType(WebApplicationType.NONE);
app.run(args);
}
@Bean
public CommandLineRunner myCommandLineRunner() {
return args -> {
// Our reactive code will be declared here
LinkedMultiValueMap<String, String> formData = new LinkedMultiValueMap<String, String>();
formData.add("username", args[2]);
formData.add("password", args[3]);
ClientRequest request = ClientRequest.method(HttpMethod.POST, new URI(args[0]+"/api/token"))
.body(BodyInserters.fromFormData(formData)).build();
Mono<ClientResponse> mresponse = exchange.exchange(request);
Mono<String> mnewToken = mresponse.map(response -> response.headers().asHttpHeaders().getFirst("WSToken"));
LOGGER.info("Blocking for status code...");
HttpStatus statusCode = mresponse.block(Duration.ofMillis(1500)).statusCode();
LOGGER.info("Got status code!");
if (statusCode.value() == 200) {
String newToken = mnewToken.block(Duration.ofMillis(1500));
LOGGER.info("Auth token is: " + newToken);
} else {
LOGGER.info("Unable to authenticate successfully! Status code: "+statusCode.value());
}
};
}
}
Thanks to comments from @M. Deinum guiding me, I now have the following code, which works.
import java.util.concurrent.TimeUnit;
import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.WebApplicationType;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.http.HttpStatus;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.ExchangeFunction;
import org.springframework.web.reactive.function.client.ExchangeFunctions;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
@SpringBootApplication
public class ContentCheckerApplication {
private static final Logger LOGGER = LoggerFactory.getLogger(ContentCheckerApplication.class);
private ExchangeFunction exchange = ExchangeFunctions.create(new ReactorClientHttpConnector());
public static void main(String[] args) {
SpringApplication app = new SpringApplication(ContentCheckerApplication.class);
// prevent SpringBoot from starting a web server
app.setWebApplicationType(WebApplicationType.NONE);
app.run(args);
}
@Bean
public CommandLineRunner myCommandLineRunner() {
return args -> {
// Change some Netty defaults
ReactorClientHttpConnector connector = new ReactorClientHttpConnector(
options -> options.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 2000)
.compression(true)
.afterNettyContextInit(ctx -> {
ctx.addHandlerLast(new ReadTimeoutHandler(1500, TimeUnit.MILLISECONDS));
}));
LinkedMultiValueMap<String, String> formData = new LinkedMultiValueMap<String, String>();
formData.add("username", args[2]);
formData.add("password", args[3]);
WebClient webClient = WebClient.builder().clientConnector(connector).build();
Mono<String> tokenResult = webClient.post()
.uri( args[0] + "/api/token" )
.body( BodyInserters.fromFormData(formData))
.exchange()
.onErrorMap(ContentCheckerApplication::handleAuthTokenError)
.map(response -> {
if (HttpStatus.OK.equals(response.statusCode())) {
return response.headers().asHttpHeaders().getFirst("WSToken");
} else {
return "";
}
});
LOGGER.info("Subscribing for the result and then going to sleep");
tokenResult.subscribe(ContentCheckerApplication::handleAuthTokenResponse);
Thread.sleep(3600000);
};
}
private static Throwable handleAuthTokenError(Throwable e) {
LOGGER.error("Exception caught trying to process authentication token. ",e);
ContentCheckerApplication.handleAuthTokenResponse("");
return null;
}
private static void handleAuthTokenResponse(String newToken) {
LOGGER.info("Got status code!");
if (!newToken.isEmpty()) {
LOGGER.info("Auth token is: " + newToken);
} else {
LOGGER.info("Unable to authenticate successfully!");
}
System.exit(0);
}
}
I am not getting any data from the queue using a Kafka direct stream. In my code I put a System.out.println() statement, and it never runs, which means I am not getting any data from that topic.
I am pretty sure data is available in the queue, but nothing shows up in the console.
I don't see any error in the console either.
Can anyone please suggest something?
Here is my Java code,
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount11").setMaster("local[*]");
sparkConf.set("spark.streaming.concurrentJobs", "3");
// Create the context with a 3-second batch size
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(3000));
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "x.xx.xxx.xxx:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", true);
Collection<String> topics = Arrays.asList("topicName");
final JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(jssc,
LocationStrategies.PreferConsistent(),
ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
JavaPairDStream<String, String> lines = stream
.mapToPair(new PairFunction<ConsumerRecord<String, String>, String, String>() {
@Override
public Tuple2<String, String> call(ConsumerRecord<String, String> record) {
return new Tuple2<>(record.key(), record.value());
}
});
lines.print();
// System.out.println(lines.count());
lines.foreachRDD(rdd -> {
rdd.values().foreachPartition(p -> {
while (p.hasNext()) {
System.out.println("Value of Kafka queue" + p.next());
}
});
});
I am able to print the strings fetched from the Kafka queue using a direct Kafka stream.
Here is my code:
import java.util.HashMap;
import java.util.HashSet;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Calendar;
import java.util.Collection;
import java.util.Currency;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;
import scala.Tuple2;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;
import org.json.JSONObject;
import org.omg.CORBA.Current;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.Durations;
public final class KafkaConsumerDirectStream {
public static void main(String[] args) throws Exception {
try {
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount11").setMaster("local[*]");
sparkConf.set("spark.streaming.concurrentJobs", "30");
// Create the context with a 200 ms batch size
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(200));
Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "x.xx.xxx.xxx:9091");
Set<String> topics = new HashSet<>();
topics.add("PartWithTopic02Queue");
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(jssc, String.class,
String.class, StringDecoder.class, StringDecoder.class, kafkaParams, topics);
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
lines.foreachRDD(rdd -> {
if (rdd.count() > 0) {
List<String> strArray = rdd.collect();
// Print string here
}
});
jssc.start();
jssc.awaitTermination();
} catch (Exception e) {
e.printStackTrace();
}
}
}
@Vimal Here is a link to the working version of creating direct streams in Scala.
I believe after reviewing it in Scala, you can convert it easily.
Please make sure that you turn off reading only the latest offsets in Kafka; otherwise it might not pick up any data that was published before the stream started.
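If that is the issue, a minimal tweak to the kafkaParams from the question (assuming the test data was produced before the streaming job started) would be:
// "latest" (used in the question) skips everything already in the topic;
// "earliest" starts from the beginning when there is no committed offset.
kafkaParams.put("auto.offset.reset", "earliest");
// While testing, disabling auto-commit lets re-runs re-read the same data.
kafkaParams.put("enable.auto.commit", false);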
I am trying to read a 100k-record file and send it to a Kafka topic. Here is my Kafka code, which sends data to kafka-console-consumer. When I send the data, I receive it like this:
java.util.stream.ReferencePipeline$Head@e9e54c2
Here is a sample single record of the data I am sending:
173|172686|548247079|837113012|0x548247079f|7|173|172686a|0|173|2059 22143|0|173|1|173|172686|||0|||7|0||7|||7|172686|allowAllServices|?20161231:22143|548247079||0|173||172686|5:2266490827:DCCInter;20160905152146;2784
Any suggestion on how to get the data that I showed above? Thanks.
Code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.stream.Stream;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
@SuppressWarnings("unused")
public class HundredKRecords {
private static String sCurrentLine;
public static void main(String args[]) throws InterruptedException, ExecutionException{
String fileName = "/Users/sreeeedupuganti/Downloads/octfwriter.txt";
//read file into stream, try-with-resources
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
kafka(stream.toString());
} catch (IOException e) {
e.printStackTrace();
}
}
public static void kafka(String stream) {
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class","kafka.producer.DefaultPartitioner");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
producer.send(new KeyedMessage<String, String>("test",stream));
producer.close();
}
}
The problem is in the line kafka(stream.toString());
The Java Stream class doesn't override the toString method. By default it returns getClass().getName() + '@' + Integer.toHexString(hashCode()). That's exactly what you receive.
In order to send the whole file to Kafka, you have to manually convert it to one String (an array of bytes).
Please note that Kafka has a limit on message size.
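A minimal sketch of that fix against the main method in the question, keeping the existing kafka(String) helper and the same file path (java.util.stream.Collectors also needs to be imported):
// Collect the lines into one String and send that, instead of the Stream's toString().
// Sending each line as its own record is another option if the joined payload would
// exceed the broker's maximum message size.
String fileName = "/Users/sreeeedupuganti/Downloads/octfwriter.txt";
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    String payload = stream.collect(Collectors.joining("\n"));
    kafka(payload);
} catch (IOException e) {
    e.printStackTrace();
}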