kafka streams hopping windowed aggregation causing multiple windows at timestamp zero - java

Kafka Streams DSL windowed aggregation causing multiple windows.
@StreamListener("input")
public void process(KStream<String, Data> dataKStream) {
    JsonSerde<DataAggregator> dataJsonSerde =
            new JsonSerde<>(DataAggregator.class);
    dataKStream
            .groupByKey()
            // 1-minute windows that advance every 30 seconds
            .windowedBy(TimeWindows.of(60000).advanceBy(30000))
            .aggregate(
                    DataAggregator::new,
                    (key, data, aggregator) -> aggregator.add(data),
                    Materialized.with(Serdes.String(), dataJsonSerde)
            );
}
DataAggregator.java
public class DataAggregator {
    private List<String> dataList = new ArrayList<>();

    public DataAggregator add(Data data) {
        dataList.add(data.getId());
        System.out.println(dataList);
        return this;
    }

    public List<String> getDataList() {
        return dataList;
    }
}
I group the input data by key, apply a 1-minute window with a 30-second hop, and in the aggregator I just collect the data and print it.
I expected one window at the beginning and a second window after 30 seconds. The actual output is different: two windows are created right from the beginning.
Expected:
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6] // till 30 seconds only one window
[6] // new window after 30 seconds
[1, 2, 3, 4, 5, 6, 7]
[6, 7]
[1, 2, 3, 4, 5, 6, 7, 8]
[6, 7, 8]
Actual output:
[1]
[1]
[1, 2]
[1, 2]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6] // duplicate window even before 30 seconds
[6] // new window after 30 seconds and 1 window from earlier will be dropped
[1, 2, 3, 4, 5, 6, 7]
[6, 7]
Since I'm creating a hopping window with a 30-second advance in a 1-minute window, I believe there should initially be only one window, with another one created after 30 seconds.
Can someone please tell me whether the actual output is the expected behavior, or whether I am missing something?
NOTE: I receive input data every 4 seconds; the expected/actual output above is just for illustration.
From Kafka Documentation:
Hopping time windows are aligned to the epoch, with the lower interval
bound being inclusive and the upper bound being exclusive. “Aligned to
the epoch” means that the first window starts at timestamp zero. For
example, hopping windows with a size of 5000ms and an advance interval
(“hop”) of 3000ms have predictable window boundaries
[0;5000),[3000;8000),... — and not [1000;6000),[4000;9000),... or even
something “random” like [1452;6452),[4452;9452),....

Because your windows overlap, you get multiple windows per timestamp. For your particular window configuration, you always get 2 windows (in milliseconds):
[0,60000) [60000,120000) [120000,180000) ...
[30000,90000) [90000,150000) ...
You cannot change this behavior; however, you could apply a filter() on the result (i.e., aggregate(...).filter(...)) to drop the windows you are not interested in.
Furthermore, by default the record event-time is used by Kafka Streams. There is a WallclockTimestampExtractor but it's only used if you set it explicitly. Cf. https://docs.confluent.io/current/streams/developer-guide/config-streams.html#default-timestamp-extractor
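To make the alignment concrete, here is a minimal plain-Java sketch of the arithmetic behind epoch-aligned hopping windows. It is not Kafka Streams code: the class and method names (HoppingWindowSketch, windowStartsFor) are made up for illustration. It assumes windows start at multiples of the advance interval, as the documentation quoted above describes. Because real event timestamps are epoch milliseconds and therefore far from zero, every record falls into two overlapping windows from the very first record on.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of epoch-aligned hopping windows; not Kafka Streams code.
class HoppingWindowSketch {

    // Returns the start timestamps of every window [start, start + sizeMs)
    // that contains timestampMs, for windows that start at multiples of
    // advanceMs (that is, aligned to timestamp zero).
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Lower bound for the first aligned start whose window covers the timestamp.
        long earliest = Math.max(0, timestampMs - sizeMs + advanceMs);
        long start = (earliest / advanceMs) * advanceMs;
        while (start <= timestampMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        // A record at t = 95,000 ms with a 60 s window and a 30 s advance
        // belongs to [60000,120000) and [90000,150000).
        System.out.println(windowStartsFor(95_000L, 60_000L, 30_000L)); // [60000, 90000]
    }
}
```

Only a timestamp smaller than the advance interval (less than 30000 here) would land in a single window, which is the situation the question implicitly assumed.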

Related

How does a comparator work internally?

It may sound trivial to you, but I am having a hard time visualizing the comparator / Arrays.sort. How can we sort a full array using only 2 arguments? How does it work internally?
For example: input [5,3,2,6,8,10,1], output [1,2,3,5,6,8,10]
Which algorithm does it use internally? Which 2 objects does it compare first (5 compared to 3?), and then what are the next two objects (5 compared to 2, or 3 compared to 2)?
public static void main(String[] args) {
    Integer[] tring = new Integer[]{5,3,2,6,8,10,1};
    lol(tring);
    for(int i=0;i<tring.length;i++){
        System.out.println(tring[i]);
    }
}

public static void lol(Integer[] args) {
    Arrays.sort(args,(h1,h2)->h1-h2);
}
You can visualize the process like this.
Integer[] tring = new Integer[] {5, 3, 2, 6, 8, 10, 1};
Comparator<Integer> comparator = (a, b) -> {
    System.out.println(Arrays.toString(tring) + " comparing " + a + " and " + b);
    return a.compareTo(b);
};
Arrays.sort(tring, comparator);
System.out.println(Arrays.toString(tring));
result:
[5, 3, 2, 6, 8, 10, 1] comparing 3 and 5
[5, 3, 2, 6, 8, 10, 1] comparing 2 and 3
[5, 3, 2, 6, 8, 10, 1] comparing 6 and 2
[2, 3, 5, 6, 8, 10, 1] comparing 6 and 3
[2, 3, 5, 6, 8, 10, 1] comparing 6 and 5
[2, 3, 5, 6, 8, 10, 1] comparing 8 and 5
[2, 3, 5, 6, 8, 10, 1] comparing 8 and 6
[2, 3, 5, 6, 8, 10, 1] comparing 10 and 5
[2, 3, 5, 6, 8, 10, 1] comparing 10 and 8
[2, 3, 5, 6, 8, 10, 1] comparing 1 and 6
[2, 3, 5, 6, 8, 10, 1] comparing 1 and 3
[2, 3, 5, 6, 8, 10, 1] comparing 1 and 2
[1, 2, 3, 5, 6, 8, 10]
Arrays.sort with a comparator uses a sorting algorithm called TimSort.
I personally do not feel qualified to explain the TimSort algorithm, but I'm sure you can find plenty of explanations on Google.
For the second part of your question: the sort calls the comparator with two arguments at a time to determine what order any two given values should be in.
So, for example, if you wanted to sort [6,4], the sort would call your function a-b with 6 and 4 and get 2; because 2 is positive, the sort knows that 6 needs to go after 4, which results in [4,6].
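To see how repeated two-argument comparisons are enough to order a whole array, here is a minimal sketch of a plain insertion sort driven by a Comparator. It is a teaching example, not the JDK's TimSort, and the names ComparatorSortSketch and insertionSort are hypothetical. It uses Integer::compare rather than h1 - h2, because the subtraction can overflow for values of large magnitude.

```java
import java.util.Arrays;
import java.util.Comparator;

// Teaching sketch: a simple insertion sort, not the JDK's TimSort.
class ComparatorSortSketch {

    static <T> void insertionSort(T[] items, Comparator<? super T> comparator) {
        for (int i = 1; i < items.length; i++) {
            T current = items[i];
            int j = i - 1;
            // Each loop test asks the comparator about exactly two values.
            // A positive result means items[j] belongs after current,
            // so items[j] is shifted one slot to the right.
            while (j >= 0 && comparator.compare(items[j], current) > 0) {
                items[j + 1] = items[j];
                j--;
            }
            items[j + 1] = current;
        }
    }

    public static void main(String[] args) {
        Integer[] values = {5, 3, 2, 6, 8, 10, 1};
        insertionSort(values, Integer::compare);
        System.out.println(Arrays.toString(values)); // [1, 2, 3, 5, 6, 8, 10]
    }
}
```

The sorting algorithm decides which pairs to compare and in what order; the comparator only ever answers the question "which of these two comes first?".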

How to partition list with current and next object in every iteration using java streams?

I would like to get the next two objects from an ArrayList every iteration. I know we can achieve this using traditional for loop but just wondering if there are any other ways to do it.
For example, if I have the following list,
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6);
My output should be
[1, 2]
[2, 3]
[3, 4]
[4, 5]
[5, 6]
Any help would be much appreciated.
By using IntStream you can iterate over the List by index and collect the results into a List:
List<List<Integer>> res = IntStream.range(0, list.size()-1)
        .mapToObj(i->Arrays.asList(list.get(i),list.get(i+1)))
        .collect(Collectors.toList());
Output :
[[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
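The same index-based idea extends to windows of any size. The following is a minimal sketch under that assumption; SlidingWindowSketch and sliding are hypothetical names, and it needs Java 10 or later because of List.copyOf (which also rejects null elements).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SlidingWindowSketch {

    // Returns every run of `size` consecutive elements, or an empty list
    // when the input is shorter than `size`.
    static <T> List<List<T>> sliding(List<T> list, int size) {
        if (size <= 0 || size > list.size()) {
            return List.of();
        }
        return IntStream.rangeClosed(0, list.size() - size)
                // Copy each subList view so the result does not depend on the input list.
                .mapToObj(i -> List.copyOf(list.subList(i, i + size)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sliding(list, 2)); // [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
    }
}
```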

Java ArrayList.removeAll()

I want to write a section of code that takes a list of lists, splits it into sublists of 9, and removes numbers from all the lists in each of the sublists. However, when my code runs, it removes numbers from all the lists, not just from the section taken from the original list.
for (int startingIndex = 0; startingIndex <= 8; startingIndex++) {
    int initialIndex = startingIndex * 9;
    ArrayList<ArrayList<String>> gridRow = new ArrayList<ArrayList<String>>();
    gridRow.addAll((posabilityGrid.subList(initialIndex, initialIndex+9)));
    System.out.println("gridrow - " + gridRow);
    ArrayList<String> numbers = new ArrayList<String>();
    for (ArrayList<String> posability : gridRow) {
        if (posability.size() == 1) {
            numbers.add(posability.get(0));
        }
    }
    System.out.println("numbers - " + numbers);
    for (ArrayList<String> posability : gridRow) {
        posability.removeAll(numbers);
    }
    System.out.println("newgrid - " + gridRow);
}
edit:
When the starting index first equals 0:
grid row - [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [4], [3], [1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [2], [1, 2, 3, 4, 5, 6, 7, 8, 9], [9]]
numbers - [4, 3, 2, 9]
it then correctly prints out:
newgrid - [[1, 5, 6, 7, 8], [1, 5, 6, 7, 8], [], [], [1, 5, 6, 7, 8], [1, 5, 6, 7, 8], [], [1, 5, 6, 7, 8], []]
However, when starting index equals 1 at the start:
gridrow - [[1, 5, 6, 7, 8], [1, 5, 6, 7, 8], [5], [1, 5, 6, 7, 8], [1, 5, 6, 7, 8], [9], [1, 5, 6, 7, 8], [1, 5, 6, 7, 8], [1]]
instead of the expected
[[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [5], [1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [1]]
This section of the list has had the previous numbers removed from it for some reason, when it should stay the same so that only the new set of numbers is subtracted.
2nd edit:
I've added the line
numbers.clear();
but I still have the same problem.
I've printed out the numbers list and checked that it is cleared each time, but the main list seems to be changed on the first "posability.removeAll(numbers);"
Edit 3:
I've solved it now. The problem was with the ArrayList and the sublists. Once I changed the code so that a new deep copy of the ArrayList is created, rather than just referencing the old one, the code works great.
List<List<String>> posabilityGridClone = posabilityGrid.stream().map(it -> new ArrayList(it)).collect(Collectors.toList());
gridRow.addAll((Collection<? extends ArrayList<String>>) (posabilityGridClone.subList(initialIndex, initialIndex+9)));
Add numbers.clear() as the last line of your main loop. Your numbers list persists between iterations, and that is the problem, if I understand correctly what you expect to get.
EDIT
Sorry, I didn't see where numbers was declared at first; I thought it was created outside the loop.
Your problem is actually in this line:
gridRow.addAll((posabilityGrid.subList(initialIndex, initialIndex+9)));
When you create a sublist, you have two problems here:
1) A sublist is just a view of the same list (removing an element from the sublist affects the original list).
2) The elements of the list are references to other lists, so when you run removeAll you actually remove the elements from the original inner lists.
What you need is to make a deep copy of your list of lists and use it instead of the original one.
List<List<String>> posabilityGridClone = posabilityGrid.stream().map(it -> new ArrayList(it)).collect(Collectors.toList());
gridRow.addAll((posabilityGridClone.subList(initialIndex, initialIndex+9)));
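The difference between copying only the outer list and copying the inner lists as well can be reproduced in isolation. Below is a minimal sketch with hypothetical names (DeepCopySketch, deepCopy) and a two-element grid standing in for posabilityGrid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeepCopySketch {

    // Copies the outer list and every inner list, so edits made through
    // the copy never reach the original.
    static List<List<String>> deepCopy(List<? extends List<String>> grid) {
        List<List<String>> copy = new ArrayList<>();
        for (List<String> cell : grid) {
            copy.add(new ArrayList<>(cell));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<List<String>> grid = new ArrayList<>();
        grid.add(new ArrayList<>(Arrays.asList("1", "2", "3")));
        grid.add(new ArrayList<>(Arrays.asList("2")));

        // Shallow copy: a new outer list holding the same inner list objects.
        List<List<String>> shallow = new ArrayList<>(grid.subList(0, 2));
        shallow.get(0).remove("2");
        System.out.println(grid.get(0)); // [1, 3] (the original changed too)

        // Deep copy: the inner lists are independent objects.
        List<List<String>> deep = deepCopy(grid);
        deep.get(0).remove("3");
        System.out.println(grid.get(0)); // [1, 3] (the original is untouched)
    }
}
```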

Get N last objects emitted by observable in RxJava2

I have an Observable which emits some numbers, and I simply want to take the last N elements.
I have the following code (I'm using RxKotlin, which is simply a wrapper on RxJava):
val list = listOf(1,2,3,4,5,6,7,8,9,10)
Observable.fromIterable(list)
    .buffer(3, 1)
    .lastOrError()
    .subscribe{value -> println(value)}
Unfortunately the result is [10]. When I looked more closely at what the buffer operator emits, I saw this:
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
[5, 6, 7]
[6, 7, 8]
[7, 8, 9]
[8, 9, 10]
[9, 10]
[10]
Is there a way to get the last "full" buffer, i.e. [8, 9, 10]?
In RxJava, many operators have a name that matches the common-language expression of the same operation: take + last N -> takeLast(int n):
Observable.range(1, 10)
    .takeLast(3)
    .toList() // <-- in case you want it as a list
    .subscribe(System.out::println);
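Conceptually, taking the last N items only requires remembering the N most recent values while the source is consumed. The following plain-Java sketch illustrates that idea with a bounded deque; it is not RxJava's implementation, and the names LastNSketch and lastN are hypothetical. Note that ArrayDeque does not accept null elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class LastNSketch {

    // Scans the source once and keeps at most n items in memory.
    static <T> List<T> lastN(Iterable<T> source, int n) {
        Deque<T> window = new ArrayDeque<>();
        if (n <= 0) {
            return new ArrayList<>();
        }
        for (T item : source) {
            if (window.size() == n) {
                window.removeFirst(); // drop the oldest item
            }
            window.addLast(item);
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(lastN(list, 3)); // [8, 9, 10]
    }
}
```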

Capture Barcode input into one single String

I've been trying to work this out but can't find a solution; I'm sure it's something simple that I'm doing wrong. I'm trying to capture the input into one single String. I tried adding the keys to a list and converting them to a string, but to no avail.
This is my basic code:
@Override
public void keyPressed(KeyEvent e) {
    if(e.getKeyCode() >=48 && e.getKeyCode() <=57){
        String myString = Character.toString(e.getKeyChar());
        keys.add(myString);
    }
    System.out.println(keys);
}
});
When doing this my output is :
[4, 2]
[4, 2, 2]
[4, 2, 2, 1]
[4, 2, 2, 1, 1]
[4, 2, 2, 1, 1, 4]
[4, 2, 2, 1, 1, 4, 7]
[4, 2, 2, 1, 1, 4, 7, 1]
[4, 2, 2, 1, 1, 4, 7, 1]
[4, 2, 2, 1, 1, 4, 7, 1]
[4, 2, 2, 1, 1, 4, 7, 1]
The last few entries are the correct barcode, but I can't extract them using:
String barcode = keys.get(keys.size() - 1);
When I print the barcode I get
4
2
2
1
1
4
7
1
1
1
1
which is wrong, as there are extra numbers and it isn't one single string without spaces.
It looks like keys is a List. When you print it, Java prints the result of its toString() method (something like [4, 2, 2, 1, 1, 4, 7]), whereas keys.get(n) returns only a single element of the List.
Try joining the elements instead (keys.toString() would keep the brackets and commas):
String barcode = String.join("", keys);
By the way, you have some extra numbers because keyPressed runs on every key press and keeps appending new values to your keys List.
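One way to end up with a single String per scan is to accumulate digits in a StringBuilder and emit the result when a terminator arrives. The sketch below is only an illustration: it assumes the scanner sends a newline (Enter) after each barcode, which depends on the scanner's configuration, and the names BarcodeBufferSketch and onChar are hypothetical. In a key listener, the character would come from e.getKeyChar().

```java
class BarcodeBufferSketch {

    private final StringBuilder digits = new StringBuilder();

    // Feeds one typed character. Returns the finished barcode when the
    // assumed terminator ('\n') arrives, otherwise null.
    String onChar(char c) {
        if (c >= '0' && c <= '9') {
            digits.append(c);
            return null;
        }
        if (c == '\n' && digits.length() > 0) {
            String barcode = digits.toString();
            digits.setLength(0); // reset for the next scan
            return barcode;
        }
        return null; // ignore any other key
    }

    public static void main(String[] args) {
        BarcodeBufferSketch buffer = new BarcodeBufferSketch();
        for (char c : "42211471\n".toCharArray()) {
            String barcode = buffer.onChar(c);
            if (barcode != null) {
                System.out.println(barcode); // 42211471
            }
        }
    }
}
```

Resetting the buffer after each scan also avoids the leftover digits from earlier key presses that the question observed.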
