I want to read from multiple topics, so I declared them in the YAML file as a comma-separated list, but I am getting the error below:
java.lang.IllegalStateException: Topic(s) [ topic-1 , topic-2, topic-3, topic-4, topic-5, topic-6, topic-7] is/are not present and missingTopicsFatal is true
spring:
  kafka:
    topics:
      tp: topic-1 , topic-2, topic-3, topic-4, topic-5, topic-6, topic-7
@KafkaListener(topics = "#{'${spring.kafka.topics.tp}'.split(',')}",
               concurrency = "190",
               clientIdPrefix = "client1",
               groupId = "group1")
public void listenData(final ConsumerRecord<Object, Object> inputEvent) throws Exception {
    handleMessage(inputEvent);
}
If I declare all the topics directly inside the @KafkaListener annotation, it works fine.
Remove the spaces:
tp: topic-1,topic-2,topic-3,topic-4,topic-5,topic-6,topic-7
Or use a split regex that tolerates the spaces:
.split(' *, *')
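For example, applied to the listener from the question, the space-tolerant variant would look roughly like this (a sketch that reuses the question's property and listener names):

@KafkaListener(topics = "#{'${spring.kafka.topics.tp}'.split(' *, *')}",
               concurrency = "190",
               clientIdPrefix = "client1",
               groupId = "group1")
public void listenData(final ConsumerRecord<Object, Object> inputEvent) throws Exception {
    handleMessage(inputEvent);
}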
I have three components: REST, Cassandra, and Kafka, and I am using Apache Camel. When a request arrives, I want to add a record to Cassandra, then publish that record to Kafka, and finally generate the REST response. Maybe pipeline is not a good solution for me, because the Cassandra part is InOnly and has no out exchange!
I wrote this route:
rest().path("/addData")
.consumes("text/plain")
.produces("application/json")
.post()
.to("direct:requestData");
from("direct:requestData")
.pipeline("direct:init",
"cql://localhost/myTable?cql=" + CQL,
"direct:addToKafka"
)
.process(exchange -> {
var currentBody = (List<?>) exchange.getIn().getBody();
var body = new Data((String) currentBody.get(0), (Long) currentBody.get(1), (String) currentBody.get(2));
exchange.getIn.setBody(body.toJsonString());
});
from("direct:init")
.process(exchange -> {
var currentBody = exchange.getIn().getBody();
var body = Arrays.asList(generateId(), generateTimeStamp, currentBody);
exchange.getIn().setBody(body);
});
from("direct:addToKafka")
.process(
// do sth to add kafka
);
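For reference, a minimal sketch of what direct:addToKafka could look like with the Camel Kafka component; the topic name and broker address are hypothetical, and the body is stashed and restored so the response processor in direct:requestData still sees what it expects:

from("direct:addToKafka")
    .process(exchange -> {
        // stash the current body so it can be restored after the Kafka send
        exchange.setProperty("originalBody", exchange.getIn().getBody());
    })
    .to("kafka:myTopic?brokers=localhost:9092") // hypothetical topic and broker
    .process(exchange -> {
        // restore the stashed body for the response-building processor
        exchange.getIn().setBody(exchange.getProperty("originalBody"));
    });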
I tried things such as setting patternExtention to InOut for Cassandra, and finally understood that this is impossible, because patternExtention is used for consumers, and I am using Cassandra in the route as a producer.
I am creating a Spring Boot app and struggling to understand why my GlobalKTable is not updating.
As far as I understand, the global table is supposed to update automatically when the source topic is updated. This is not the case for me.
I did notice the global table becomes populated with new data after I manually delete the state store folder.
I also noted the following error output when the Spring Boot app is launched:
2021-11-27 23:09:18.232 ERROR 17592 --- [ main] o.a.k.s.p.internals.StateDirectory : Failed to change permissions for the directory d:\kafkastreamsdb
2021-11-27 23:09:18.233 ERROR 17592 --- [ main] o.a.k.s.p.internals.StateDirectory : Failed to change permissions for the directory d:\kafkastreamsdb\Kafka-streams
It seems to me that the reason I only see all the current data in the GlobalKTable after deleting the state store folder is that the stream is not writing to the state store while it is running, but recreates the state store from the source topic after the deletion?
So, the issue here is that when I try to use the global table as a look-up in the join below, the enriched stream returns NULL for the table values. However, when I delete the state store folder and restart Spring Boot, the enriched stream does return the table values.
Just to clarify, new events are continuously sent to the source topic, but this data is only visible in the table after deletion.
Here is my code:
@Service
public class TopologyBuilder2 {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Register FakeAddress stream
        KStream<String, FakeAddress> streamFakeAddress =
                builder.stream("FakeAddress", Consumed.with(Serdes.String(), JsonSerdes.FakeAddress()));

        GlobalKTable<String, Greetings> globalGreetingsTable = builder.globalTable(
                "Greetings"
                , Consumed.with(Serdes.String(), JsonSerdes.Greetings())
                , Materialized.<String, Greetings, KeyValueStore<Bytes, byte[]>>as(
                        "GREETINGS" /* table/store name */)
                        .withKeySerde(Serdes.String()) /* key serde */
                        .withValueSerde(JsonSerdes.Greetings()) /* value serde */);

        // LEFT Key mapper
        KeyValueMapper<String, FakeAddress, String> keyMapperFakeAddress =
                (leftkey, fakeAddress) -> {
                    // System.out.println(String.valueOf(fakeAddress.getCountry()));
                    return String.valueOf(fakeAddress.getCountry());
                };

        // Value joiner
        ValueJoiner<FakeAddress, Greetings, EnrichedCountryGreeting> valueJoinerFakeAddressAndGreetings =
                (fakeAddress, greetings) -> new EnrichedCountryGreeting(fakeAddress, greetings);

        KStream<String, EnrichedCountryGreeting> enrichedStream =
                streamFakeAddress.join(globalGreetingsTable, keyMapperFakeAddress, valueJoinerFakeAddressAndGreetings);

        enrichedStream.print(Printed.<String, EnrichedCountryGreeting>toSysOut().withLabel("Stream-enrichedStream: "));

        return builder.build();
    }
}
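The permission errors point at the state directory itself. One thing worth checking (an assumption, not something confirmed in the post) is whether pointing state.dir at a directory the application user fully owns changes the behaviour; with plain Kafka Streams configuration, for illustration, that would look roughly like this (application id and broker address are hypothetical):

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "greetings-enricher"); // hypothetical application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // hypothetical broker address
// point the state stores at a location the app can create and change permissions on
props.put(StreamsConfig.STATE_DIR_CONFIG, "C:/kafka-streams-state");

KafkaStreams streams = new KafkaStreams(TopologyBuilder2.build(), props);
streams.start();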
I'm attempting to reproduce this example. My topology is:
@Bean("myTopo")
public KStream<Object, Object> getTopo(@Qualifier("myKConfig") StreamsBuilder builder) {
    var stream = builder.stream("my-events");
    stream.groupByKey()
          .windowedBy(TimeWindows.of(Duration.ofMinutes(2)))
          .count()
          .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
          .toStream()
          .foreach((k, v) -> {
              System.out.println("k + v = " + k + " --- " + v);
          });
    return stream;
}
I've set the default serdes and the windowed serde inner classes in the config:
...
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, JsonSerde.class);
...
props.put(JsonDeserializer.VALUE_DEFAULT_TYPE, JsonNode.class);
props.put(StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS, JsonSerde.class);
var config = new KafkaStreamsConfiguration(props);
return new StreamsBuilderFactoryBean(config);
The error I get is
org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor.
Do the Processor's input types match the deserialized types?
Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
Make sure the Processor can accept the deserialized input of type key:
org.apache.kafka.streams.kstream.Windowed,
and value:
org.apache.kafka.streams.kstream.internals.Change.
with underlying cause
java.lang.ClassCastException: class org.apache.kafka.streams.kstream.Windowed
cannot be cast to class java.lang.String (org.apache.kafka.streams.kstream.Windowed is in unnamed module of loader 'app';
java.lang.String is in module java.base of loader 'bootstrap')
I see count() returns KTable<Windowed<Object>, Long>. So it looks like the problem is it wants a Windowed<String> serde for the key. Apparently, DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS isn't sufficient.
How do I create and set this?
I think I ran into this bug:
https://issues.apache.org/jira/browse/KAFKA-9259
I added a Materialized to the count() method
var store = Stores.persistentTimestampedWindowStore(
"some-state-store",
Duration.ofMinutes(5),
Duration.ofMinutes(2),
false);
var materialized = Materialized
.<String, Long>as(store)
.withKeySerde(Serdes.String());
Now the code runs without exception.
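For completeness, a sketch of how that Materialized plugs into the topology; this assumes the stream is consumed with explicit serdes so the key type is String (the topic name mirrors the question, the Consumed serdes are otherwise assumptions):

KStream<String, JsonNode> events =
        builder.stream("my-events", Consumed.with(Serdes.String(), new JsonSerde<>(JsonNode.class)));

events.groupByKey()
      .windowedBy(TimeWindows.of(Duration.ofMinutes(2)))
      .count(materialized) // the Materialized built above supplies the String key serde and the window store
      .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
      .toStream()
      .foreach((k, v) -> System.out.println("k + v = " + k + " --- " + v));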
I'm trying to load all incoming parquet files from an S3 bucket and process them with Delta Lake, but I'm getting an exception.
val df = spark.readStream().parquet("s3a://$bucketName/")
df.select("unit") //filter data!
.writeStream()
.format("delta")
.outputMode("append")
.option("checkpointLocation", checkpointFolder)
.start(bucketProcessed) //output goes in another bucket
.awaitTermination()
It throws an exception, because "unit" is ambiguous.
I've tried debugging it. For some reason, it finds "unit" twice.
What is going on here? Could it be an encoding issue?
edit:
This is how I create the spark session:
val spark = SparkSession.builder()
.appName("streaming")
.master("local")
.config("spark.hadoop.fs.s3a.endpoint", endpoint)
.config("spark.hadoop.fs.s3a.access.key", accessKey)
.config("spark.hadoop.fs.s3a.secret.key", secretKey)
.config("spark.hadoop.fs.s3a.path.style.access", true)
.config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", 2)
.config("spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored", true)
.config("spark.sql.caseSensitive", true)
.config("spark.sql.streaming.schemaInference", true)
.config("spark.sql.parquet.mergeSchema", true)
.orCreate
edit2:
output from df.printSchema()
2020-10-21 13:15:33,962 [main] WARN org.apache.spark.sql.execution.datasources.DataSource - Found duplicate column(s) in the data schema and the partition schema: `unit`;
root
|-- unit: string (nullable = true)
|-- unit: string (nullable = true)
Reading the same data like this...
val df = spark.readStream().parquet("s3a://$bucketName/*")
...solves the issue. For whatever reason. I would love to know why... :(
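A possible explanation (an assumption based on the warning text, not something confirmed in the post): the bucket root likely contains partition-style directories such as unit=xyz/, so partition discovery adds a second unit column on top of the one stored inside the files, while globbing one level down makes each of those directories a base path and skips the inference. A sketch in Java syntax, with my-bucket standing in for the real bucket name:

// reading the bucket root: Spark infers "unit" as a partition column from the
// directory names, and it collides with the "unit" column inside the parquet files
Dataset<Row> withDuplicate = spark.readStream().parquet("s3a://my-bucket/");

// globbing one level down: each unit=... directory becomes a base path, so no
// partition column is inferred and only the column from the files remains
Dataset<Row> withoutDuplicate = spark.readStream().parquet("s3a://my-bucket/*");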
I'm working with Spring Cloud Stream and wanted to fiddle with KStreams/KTables a little.
I'm looking for the methodology of going from a standard Kafka topic to turn it into a stream.
I've done this in KSQL, but I'm trying to figure out if there is a way to have Spring Boot handle this. The best I can find is examples where both the @Input and @Output channels are already KStreams, but I think that is not what I want.
Kafka Setup
Inside of Spring Boot I'm doing the following:
My data comes on: force-entities-topic
I then "clean" the data removing the [UTC] tag from the time message and re-publish on:
force-entities-topic-clean
From there I was hoping to take the output of that and build both a KStream and KTable keyed on the platformUID field.
Input data
So the data I'm working with is:
{
  "platformUID": "UID",
  "type": "TLPS",
  "state": "PLATFORM_INITIALIZED",
  "fuelremaining": 5.9722E+24,
  "latitude": 39,
  "longitude": -115,
  "altitude": 0,
  "time": "2018-07-18T00:00:00Z[UTC]"
}
KSQL
I can run these KSQL commands to create what I need. (Here I'm reading time in as a string as opposed to actual time which I'm doing in the java/kotlin implementation)
CREATE STREAM force_no_key (
platformUID string,
type string,
state string,
fuelremaining DOUBLE,
latitude DOUBLE,
longitude DOUBLE,
altitude DOUBLE
) with (
kafka_topic='force-entities-topic',
value_format='json');
From there I make another stream (because I couldn't get it to read the key correctly)
CREATE STREAM force_with_key
WITH (KAFKA_TOPIC='blue_force_with_key') AS
select PLATFORMUID as UID, LATITUDE as lat, LONGITUDE as LON, ALTITUDE as ALT, state, type
FROM force_no_key
PARTITION BY UID;
And from this point
CREATE TABLE FORCE_TABLE
( UID VARCHAR,
LAT DOUBLE,
LON DOUBLE,
ALT DOUBLE
) WITH (KAFKA_TOPIC = 'force_with_key',
VALUE_FORMAT='JSON',
KEY = 'UID');
Java Style!
Where I think I'm running into trouble is here. I define my binding interface:
interface ForceStreams {

    companion object {
        // From the settings file we configure it with the value of-force-in
        const val DIRTY_INPUT = "dirty-force-in"
        const val CLEANED_OUTPUT = "clean-force-out"
        const val CLEANED_INPUT = "clean-force-in"
        const val STREAM_OUT = "stream-out"
        const val TABLE_OUT = "table-out" // value matches the table-out binding in application.yml
    }

    @Input(DIRTY_INPUT)
    fun initialInput(): MessageChannel

    @Output(CLEANED_OUTPUT)
    fun cleanOutput(): SubscribableChannel

    @Input(CLEANED_INPUT)
    fun cleanInput(): MessageChannel

    @Output(STREAM_OUT)
    fun cleanedBlueForceMessage(): KStream<String, ForceEntity>

    @Output(TABLE_OUT)
    fun tableOutput(): KTable<String, ForceEntity>
}
And then I do the cleaning with this block:
@StreamListener(ForceStreams.DIRTY_INPUT)
@SendTo(ForceStreams.CLEANED_OUTPUT)
fun forceTimeCleaner(@Payload message: String): ForceEntity {
    var inputMap: Map<String, Any> = objectMapper.readValue(message)
    var map = inputMap.toMutableMap()

    map["type"] = map["type"].toString().replace("-", "_")
    map["time"] = map["time"].toString().replace("[UTC]", "")

    val json = objectMapper.writeValueAsString(map)
    val fe: ForceEntity = objectMapper.readValue(json, ForceEntity::class.java)

    return fe
}
But here I'm going from a MessageChannel to a SubscribableChannel.
What I'm unsure how to do is go from the SubscribableChannel to either a KStream<String, ForceEntity> or a KTable<String, ForceEntity>.
Any help would be appreciated - thanks
Edit - application.yml
server:
  port: 8888

spring:
  application:
    name: Blue-Force-Table
  kafka:
    bootstrap-servers: # This seems to be for the KStreams the other config is for normal streams
      - localhost:19092
  cloud:
    stream:
      defaultBinder: kafka
      kafka:
        binder:
          brokers: localhost:19092
      bindings:
        dirty-force-in:
          destination: force-entities-topic
          contentType: application/json
        clean-force-in:
          destination: force-entities-topic-clean
          contentType: application/json
        clean-force-out:
          destination: force-entities-topic-clean
          contentType: application/json
        stream-out:
          destination: force_stream
          contentType: application/json
        table-out:
          destination: force_table
          contentType: application/json
I guess the follow-on question is: is this even possible? Can you mix and match binders within a single function?
In the first StreamListener, you are receiving data through the DIRTY_INPUT binding and writing through the CLEANED_OUTPUT binding. Then you need another StreamListener where you receive that data as a KStream, do the processing, and write the output.
First processor:
@StreamListener(ForceStreams.DIRTY_INPUT)
@SendTo(ForceStreams.CLEANED_OUTPUT)
fun forceTimeCleaner(@Payload message: String): ForceEntity {
    ....
Change the following to a KStream binding.
@Input(CLEANED_INPUT)
fun cleanInput(): MessageChannel
to
@Input(CLEANED_INPUT)
fun cleanInput(): KStream<String, ForceEntity>
Second processor:
@StreamListener(CLEANED_INPUT)
@SendTo(STREAM_OUT)
public KStream<String, ForceEntity> process(
        KStream<String, ForceEntity> forceEntityStream) {

    return forceEntityStream
            ........
            .toStream();
}
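For illustration, the elided body might look something like this; the selectKey on platformUID and the latest-wins reduce are assumptions about the intended table semantics, and forceEntitySerde is a hypothetical serde for ForceEntity:

@StreamListener(CLEANED_INPUT)
@SendTo(STREAM_OUT)
public KStream<String, ForceEntity> process(KStream<String, ForceEntity> forceEntityStream) {
    return forceEntityStream
            // re-key each record on the platformUID field (getter name is an assumption)
            .selectKey((key, entity) -> entity.getPlatformUID())
            // forceEntitySerde is a hypothetical JSON serde for ForceEntity
            .groupByKey(Grouped.with(Serdes.String(), forceEntitySerde))
            // keep the latest entity per platformUID, i.e. a KTable-style view of the stream
            .reduce((previous, latest) -> latest)
            .toStream();
}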
Currently, the Kafka Streams binder in Spring Cloud Stream does not support writing the data out as a KTable. Only KStream objects are allowed on the output (KTable binding is allowed on the input). If that is a hard requirement, you need to look into Spring Kafka where you can go lower level and do that kind of outbound operations.
Hope that helps.
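If the KTable output really is a hard requirement, the lower-level route alluded to above is plain Kafka Streams (for example via Spring Kafka's StreamsBuilder support). A rough sketch, assuming the cleaned topic is already keyed by platformUID and forceEntitySerde is again a hypothetical serde:

StreamsBuilder builder = new StreamsBuilder();

// materialize the cleaned topic as a table; records must already be keyed by platformUID,
// otherwise re-key them with a stream -> selectKey -> to(...) step first
KTable<String, ForceEntity> forceTable = builder.table(
        "force-entities-topic-clean",
        Consumed.with(Serdes.String(), forceEntitySerde),
        Materialized.as("force-table-store")); // store name is an assumption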