Infinispan equivalent of IMap.values(Predicate) - java

Is there an equivalent of Hazelcast's IMap.values(Predicate) for Infinispan? And possibly a non-blocking (async) one?
Thanks.

It depends on what you're trying to do. Infinispan extends Java's Stream functionality, so you can use the Stream interface to get the filtered values.
Examples
// filter by key
cache.values().stream()
        .filterKeys(/* set with keys */)
        .forEach(/* do something with the value */); // or collect()

// filter by key and value
cache.getAdvancedCache().cacheEntrySet().stream()
        .filter(entry -> /* check key and value using entry.getKey() or entry.getValue() */)
        .map(StreamMarshalling.entryToValueFunction()) // extract the value only
        .forEach(/* do something with the value */); // or collect()
See the Infinispan documentation about streams for more details.
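For example, a rough equivalent of IMap.values(Predicate) could look like the sketch below. The Cache<String, Person> named cache and the Person type are assumptions; on a clustered cache the lambda and collector must be marshallable so they can be shipped to the other nodes, and the details depend on the Infinispan version.

// collect all values matching a predicate, analogous to IMap.values(Predicate)
List<Person> adults = cache.values().stream()
        .filter(p -> p.getAge() >= 18)
        .collect(Collectors.toList());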

java streams: accumulated collector

Currently, this is my code:
Iterable<Practitioner> referencedPractitioners = this.practitionerRepository.findAllById(
        Optional.ofNullable(patient.getPractitioners())
                .map(List::stream)
                .orElse(Stream.of())
                .map(Reference::getIdPart)
                .collect(Collectors.toList())
);
As you can see, I'm using this.practitionerRepository.findAllById(Iterable<String> ids) in order to fetch all entities with a single call to the database.
I was trying to change it to this:
Optional.ofNullable(patient)
        .map(org.hl7.fhir.r4.model.Patient::getPractitioners)
        .map(List::stream)
        .orElse(Stream.of())
        .map(Reference::getIdPart)
        .collect(????????);
How could I use this.practitionerRepository.findAllById(Iterable<String> ids) in a custom collector passed to the collect method?
Remember I need to get all entities at once. I can't get them one by one.
You can use the specialized collector Collectors.collectingAndThen(Collector<T,A,R> downstream, Function<R,RR> finisher) for that:
Make a list of IDs using the Collectors.toList() collector, and then
pass the method reference practitionerRepository::findAllById as the finisher to convert the List<String> into an Iterable<Practitioner>.
Example:
Iterable<Practitioner> referencedPractitioners = Optional.ofNullable(patient)
        .map(Patient::getPractitioners)
        .map(List::stream)
        .orElseGet(Stream::of)
        .map(Reference::getIdPart)
        .collect(Collectors.collectingAndThen(Collectors.toList(), practitionerRepository::findAllById));
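For comparison, a minimal sketch of the same lookup without the specialized collector (assuming the same practitionerRepository and FHIR model types as above); collectingAndThen simply folds the finishing call into the collect step:

// collect the IDs first, then make the single repository call explicitly
List<String> ids = Optional.ofNullable(patient)
        .map(Patient::getPractitioners)
        .map(List::stream)
        .orElseGet(Stream::of)
        .map(Reference::getIdPart)
        .collect(Collectors.toList());
Iterable<Practitioner> referencedPractitioners = practitionerRepository.findAllById(ids);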

Kafka Streams: Windowing by field in record json value

I need to aggregate a stream that is a join of two other streams. To do this, I specify a window of 1 day, but I need to use the value stored in the message's JSON as the timestamp. Is it possible to specify your own timestamp for the stream?
// Record of stream1: {"a_id": 1, "b_id": 2}
// Record of stream2: {"b_id": 2, "timestamp": ...}
KStream<Long, JsonNode> aStream = builder
        .stream(aTopic, Consumed.with(Serdes.String(), jsonSerde))
        .selectKey((k, v) -> v.get("b_id").asLong());
KStream<Long, JsonNode> bStream = builder
        .stream(bTopic, Consumed.with(Serdes.String(), jsonSerde))
        .selectKey((k, v) -> v.get("b_id").asLong());
aStream.join(bStream, (JsonNode v1, JsonNode v2) ->
                JsonUtils.addFieldIntoJsonNode(v1, v2.get("timestamp"), "timestamp"),
        JoinWindows.of(Duration.ofHours(1)),
        StreamJoined.with(Serdes.Long(), jsonSerde, jsonSerde))
        .{some aggregation with windowing by that "timestamp" field}
I tried to use a timestamp extractor, but I can only specify it when reading a stream, which does not fit here, because the join window would then be different in the two streams.
What can be done in this case?
You can write your own Processor or Transformer and utilise the ProcessorContext within. If your Kafka Streams version is sufficiently recent, you should find the method ProcessorContext.<K,V> forward(K key, V value, To to). The To class allows specification of the timestamp to be used. The simplest call would be To.all().withTimestamp(123456789L).
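A minimal sketch of that approach (the class name JsonTimestampTransformer is hypothetical; it assumes a Kafka Streams version where ProcessorContext.forward(key, value, To) is available and the joined JsonNode value carries the "timestamp" field as in the question):

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.To;

// Re-emits every joined record with the "timestamp" JSON field as the record timestamp,
// so a downstream windowed aggregation windows by that field.
public class JsonTimestampTransformer implements Transformer<Long, JsonNode, KeyValue<Long, JsonNode>> {

    private ProcessorContext context;

    @Override
    public void init(final ProcessorContext context) {
        this.context = context;
    }

    @Override
    public KeyValue<Long, JsonNode> transform(final Long key, final JsonNode value) {
        final long ts = value.get("timestamp").asLong();
        context.forward(key, value, To.all().withTimestamp(ts)); // forward with the extracted timestamp
        return null; // everything has already been forwarded explicitly
    }

    @Override
    public void close() {
    }
}

It would be plugged in between the join and the aggregation, e.g. joinedStream.transform(JsonTimestampTransformer::new).groupByKey()... .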
You can use a custom timestamp extractor that you can either set globally for all input topics via default.timestamp.extractor config or pass on a per-topic basis via Consumed.with(...).withTimestampExtractor(...).
Cf https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#default-timestamp-extractor
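A minimal sketch of such an extractor (the class name JsonFieldTimestampExtractor is hypothetical; note that in the question only the records of bTopic carry the "timestamp" field, so the extractor needs a fallback for the other topic):

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Uses the "timestamp" field of the JSON value as the event time,
// falling back to the record's own timestamp when the field is missing.
public class JsonFieldTimestampExtractor implements TimestampExtractor {

    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        final Object value = record.value();
        if (value instanceof JsonNode && ((JsonNode) value).has("timestamp")) {
            return ((JsonNode) value).get("timestamp").asLong();
        }
        return record.timestamp();
    }
}

It would be registered per topic, e.g. Consumed.with(Serdes.String(), jsonSerde).withTimestampExtractor(new JsonFieldTimestampExtractor()).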

Kafka Stream windowedBy aggregate Materialized withRetention Out Of Memory

I have a KStream<String,Event> which should be windowed and aggregated, but this results in an out-of-memory error:
java.lang.OutOfMemoryError: Java heap space
The KStream DSL is as follows:
TimeWindows timeWindows = TimeWindows.of(Duration.ofDays(1)).advanceBy(Duration.ofMillis(1));
Initializer<History> historyInitializer = History::new;
Aggregator<String, Event, History> historyAggregator = (key, value, aggregate) -> {
    aggregate.key = value.uuid;
    aggregate.addHistoryEventWindow(value);
    return aggregate;
};
KTable<String, History> historyWindowed = eventStreamRaw
        .filter((key, value) -> value != null)
        .groupByKey(Grouped.with(Serdes.String(), this.eventSerde))
        // segment our messages into 1-day windows
        .windowedBy(timeWindows)
        .aggregate(historyInitializer, historyAggregator, Named.as("name"),
                Materialized.with(Serdes.String(), this.historySerde))
        .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))
        .groupBy(
                (key, value) -> new KeyValue<String, History>(
                        value.key + "|+|" + key.window().start() + "|+|" + key.window().end(), value),
                Grouped.with(Serdes.String(), this.historySerde))
        .aggregate(History::new, (key, value, aggValue) -> value, (key, value, aggValue) -> value,
                Materialized.with(Serdes.String(), this.historySerde));
Reading some articles (for example Kafka Streams Window By & RocksDB Tuning), I noticed that I may have to configure the "Materialized" store with a retention of "1 day + 1 millisecond".
But trying to add that doesn't work for me:
final Materialized<String, History, WindowStore<Bytes, byte[]>> store =
        Materialized.<String, History, WindowStore<Bytes, byte[]>>as("eventstore")
                .withKeySerde(Serdes.String())
                .withValueSerde(this.historySerde)
                .withRetention(Duration.ofDays(1).plus(Duration.ofMillis(1)));

KTable<String, History> historyWindowed = eventStreamRaw
        ...
        .aggregate(historyInitializer, historyAggregator, Named.as("name"), store)
The Java compiler throws the following error:
The method
aggregate(Initializer<VR>, Aggregator<? super String,? super Event,VR>, Named, Materialized<String,VR,WindowStore<Bytes,byte[]>>)
in the type TimeWindowedKStream<String,Event> is not applicable for the arguments
(Initializer<History>, Aggregator<String,Event,History>, Named, Materialized<String,History,WindowStore<Bytes,byte[]>>)
To be honest, I don't get it. The parameters are correct; the VR type is 'History'.
So, do you know what I'm missing?
The idea of this windowedBy KTable is to have a state which holds all events for one "thing" for one day. Let's say a new alert is produced: I want to attach all events of a "thing" for one day to the alert. I would then do a leftJoin from the KStream Alert to the KTable History. Would that be the best way to add historical data to a Kafka event? Is there a way to just "look up" the last x days of the KStream Events? I've checked the KStream Alert-KStream Event leftJoin, but that would produce an output for every new KStream Event, so from my point of view that would not be practical.
Many thanks for your help. I hope it's just a simple one. Highly appreciated!
Looking at the following post, Kafka Streams App - count and sum aggregate, I realized I had imported the wrong "Bytes" class. So, be sure to import org.apache.kafka.common.utils.Bytes.
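For reference, with the correct import the declaration from the question should compile; a sketch, assuming the History type and historySerde from above:

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes; // the correct Bytes class
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.WindowStore;

final Materialized<String, History, WindowStore<Bytes, byte[]>> store =
        Materialized.<String, History, WindowStore<Bytes, byte[]>>as("eventstore")
                .withKeySerde(Serdes.String())
                .withValueSerde(historySerde)
                .withRetention(Duration.ofDays(1).plus(Duration.ofMillis(1)));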
But, maybe you have a better idea to enrich a Kafka message from one stream with historical data from another stream related by a (foreign) key.
Thanks guys.

Kafka GroupTable tests generating extra messages when using ProcessorTopologyTestDriver

I've written a stream that takes in messages and sends out a table of the keys that have appeared. If something appears, it will show a count of 1. This is a simplified version of my production code in order to demonstrate the bug. In a live run, a message is sent out for each message received.
However, when I run it in a unit test using ProcessorTopologyTestDriver, I get a different behavior. If a key that has already been seen before is received, I get an extra message.
If I send messages with keys "key1", then "key2", then "key1", I get the following output.
key1 - 1
key2 - 1
key1 - 0
key1 - 1
For some reason, it decrements the value before adding it back in. This only happens when using ProcessorTopologyTestDriver. Is this expected? Is there a workaround? Or is this a bug?
Here's my topology:
final StreamsBuilder builder = new StreamsBuilder();
KGroupedTable<String, String> groupedTable = builder
        .table(applicationConfig.sourceTopic(), Consumed.with(Serdes.String(), Serdes.String()))
        .groupBy((key, value) -> KeyValue.pair(key, value), Serialized.with(Serdes.String(), Serdes.String()));
KTable<String, Long> countTable = groupedTable.count();
KStream<String, Long> countTableAsStream = countTable.toStream();
countTableAsStream.to(applicationConfig.outputTopic(), Produced.with(Serdes.String(), Serdes.Long()));
Here's my unit test code:
TopologyWithGroupedTable top = new TopologyWithGroupedTable(appConfig, map);
Topology topology = top.get();
ProcessorTopologyTestDriver driver = new ProcessorTopologyTestDriver(config, topology);
driver.process(inputTopic, "key1", "theval", Serdes.String().serializer(), Serdes.String().serializer());
driver.process(inputTopic, "key2", "theval", Serdes.String().serializer(), Serdes.String().serializer());
driver.process(inputTopic, "key1", "theval", Serdes.String().serializer(), Serdes.String().serializer());
ProducerRecord<String, Long> outputRecord = driver.readOutput(outputTopic, keyDeserializer, valueDeserializer);
assertEquals("key1", outputRecord.key());
assertEquals(Long.valueOf(1L), outputRecord.value());
outputRecord = driver.readOutput(outputTopic, keyDeserializer, valueDeserializer);
assertEquals("key2", outputRecord.key());
assertEquals(Long.valueOf(1L), outputRecord.value());
outputRecord = driver.readOutput(outputTopic, keyDeserializer, valueDeserializer);
assertEquals("key1", outputRecord.key());
assertEquals(Long.valueOf(1L), outputRecord.value()); //this fails, I get 0. If I pull another message, it shows key1 with a count of 1
Here's a repo of the full code:
https://bitbucket.org/nsinha/testtopologywithgroupedtable/src/master/
Stream topology: https://bitbucket.org/nsinha/testtopologywithgroupedtable/src/master/src/main/java/com/nick/kstreams/TopologyWithGroupedTable.java
Test code: https://bitbucket.org/nsinha/testtopologywithgroupedtable/src/master/src/test/java/com/nick/kstreams/TopologyWithGroupedTableTests.java
It's not a bug, but behavior by design (cf. the explanation below).
The difference in behavior is due to KTable state store caching (cf. https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html). When you run the unit test, the cache is flushed after each record, while in your production run, this is not the case. If you disable caching in your production run, I assume that it behaves the same as in your unit test.
Side remark: ProcessorTopologyTestDriver is an internal class and not part of public API. Thus, there is no compatibility guarantee. You should use the official unit-test packages instead: https://docs.confluent.io/current/streams/developer-guide/test-streams.html
Why do you see two records:
In your code, you are using a KTable#groupBy(), and in your specific use case you don't change the key. However, in general, the key might be changed (depending on the value of the input KTable). Thus, if the input KTable is changed, the downstream aggregation needs to remove/subtract the old key-value pair from the aggregation result and add the new key-value pair to the aggregation result. In general, the keys of the old and new pair are different, so it's required to generate two records, because the subtraction and addition could happen on different instances as different keys might be hashed differently. Does this make sense?
Thus, for each update of the input KTable, two updates to the result KTable, usually on two different key-value pairs, need to be computed. For your specific case, in which the key does not change, Kafka Streams does the same thing (there is no check/optimization for this case to "merge" both operations into one if the key is actually the same).
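As a rough sketch of the officially supported test setup (it assumes a Kafka Streams version that provides TestInputTopic/TestOutputTopic, and reuses appConfig, map, inputTopic, and outputTopic from the question's test code):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "grouped-table-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

Topology topology = new TopologyWithGroupedTable(appConfig, map).get();
try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
    TestInputTopic<String, String> input = driver.createInputTopic(
            inputTopic, Serdes.String().serializer(), Serdes.String().serializer());
    TestOutputTopic<String, Long> output = driver.createOutputTopic(
            outputTopic, Serdes.String().deserializer(), Serdes.Long().deserializer());

    input.pipeInput("key1", "theval");
    input.pipeInput("key2", "theval");
    input.pipeInput("key1", "theval");

    // depending on caching, expect the same four updates seen with ProcessorTopologyTestDriver:
    // key1 -> 1, key2 -> 1, key1 -> 0, key1 -> 1
    System.out.println(output.readKeyValuesToList());
}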

getting a hashmap in R using rJava

I have a plain hashmap with numeric values and would like to retrieve its content, ideally in a list (but that can be worked out).
Can it be done?
Try this:
library(rJava)
.jinit()
# create a hash map
hm <- .jnew("java/util/HashMap")
# using .jrcall instead of .jcall, since .jrcall uses reflection to get types
.jrcall(hm, "put", "one", "1")
.jrcall(hm, "put", "two", "2")
.jrcall(hm, "put", "three", "3")
# convert to an R list
keySet <- .jrcall(hm, "keySet")
an_iter <- .jrcall(keySet, "iterator")
aList <- list()
while (.jrcall(an_iter, "hasNext")) {
    key <- .jrcall(an_iter, "next")
    aList[[key]] <- .jrcall(hm, "get", key)
}
Note that using .jrcall is less efficient than .jcall. But for the life of me, I cannot get the method signature right with .jcall. I wonder if it has something to do with the lack of generics.
I have never done this myself, but there is an example in the rJava documentation of creating and working with a HashMap using the with function:
HashMap <- J("java.util.HashMap")
with( HashMap, new( SimpleEntry, "key", "value" ) )
with( HashMap, SimpleEntry )
