How to drop fields from avro data in kafka streams - java

I have Avro data with fields userid, email, orderid, totalcost, address .....
I want to drop some fields from the data in Kafka Streams.
I have tried this to drop both the email and orderid fields:
avrodata.peek((key, value) -> System.out.println("incoming " + value))
        .filterNot((key, value) -> value == value.get("email"))
        .filterNot((key, value) -> value == value.get("orderid"))
        .peek((key, value) -> System.out.println("processed " + value));
and also this, using the getters from the class generated by the Avro Maven plugin:
avrodata.peek((key, value) -> System.out.println("incoming " + value))
        .filterNot((key, value) -> value == value.getEmail())
        .filterNot((key, value) -> value == value.getOrderid())
        .peek((key, value) -> System.out.println("processed " + value));
but the second peek shows that no filtering is happening. I am also not sure whether filterNot is the right approach.
I also tried using mapValues, but I can't figure out how to map multiple fields.
Thanks.

You don't "drop fields" with filter; you drop whole records.
In order to drop fields, you need to use mapValues and return a new object without those fields.
e.g.
stream.mapValues((value) -> {
    // copy only the fields you want to keep; the dropped fields are simply never set
    return FooBuilder
            .withUserid(value.get("userid"))
            .withTotalcost(value.get("totalcost"))
            .withAddress(value.get("address"))
            .build();
});
Avro shouldn't matter, other than you'll need a new schema for the "reduced" object, or you'll have to make the unset fields nullable.
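To make the idea concrete, here is a minimal, Kafka-free sketch of the same principle, using hypothetical Order/ReducedOrder records in place of the Avro-generated classes: the "reduced" type simply never copies the dropped fields. The Kafka Streams call would just be stream.mapValues(ReducedOrder::from).

```java
public class DropFieldsSketch {

    // Hypothetical stand-in for the full Avro-generated class
    record Order(String userid, String email, String orderid, double totalcost, String address) {}

    // Hypothetical stand-in for the class generated from the reduced schema
    record ReducedOrder(String userid, double totalcost, String address) {
        static ReducedOrder from(Order o) {
            // keep only the fields we want; email and orderid are simply not copied
            return new ReducedOrder(o.userid(), o.totalcost(), o.address());
        }
    }

    public static void main(String[] args) {
        Order full = new Order("u1", "a@b.com", "o-9", 42.0, "Paris");
        ReducedOrder reduced = ReducedOrder.from(full);
        System.out.println(reduced);
    }
}
```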


Given a map of type Map<K, List<V>>, how to update value within List<V> using Java streams?

I am receiving a response from some API, which is of this type:
public class Response {
private String role;
private String permission;
// Setters and getters
}
value of Role could be something like "Admin", "User" etc and value of permission is something like "R-4", "C-44" etc. "R-" indicates Region and "C-" indicates country.
So initially I constructed a Map<String, Set<String>> by doing:
Map<String, Set<String>> initialMap = responseList.stream().collect(
    Collectors.groupingBy(Response::getRole, HashMap::new,
        Collectors.mapping(Response::getPermission, Collectors.toSet()))
);
But for my application I want the map to be Map<String, Set<Long>>, mapping each role to the country IDs associated with it. Basically, I want to remove the "R-" and "C-" prefixes from the values of the map. When a value is like "R-4", I want to strip "R-", take the id (4 in this example), pass it to the database to get the List<Long> of countries in that region, and add those to the map value. When a value is like "C-44", I want to strip "C-" and add that id to the map value directly.
One approach would be to manually iterate over each entry of initialMap, then over its Set<String> value, converting each String to a Long. Is there a better way to do this using streams? Can I construct the Map<String, Set<Long>> directly from my initial responseList?
It seems that the main issue here is not with parsing the id of region/country but rather with combining List<Long> containing multiple country IDs per regionId retrieved from some DB/repository, and a single country ID from the permission C-###.
This may be resolved using Collectors.flatMapping available in Java 9+. Also, a separate function streamOfCountryIDs needs to be implemented to map the response's permission into a Stream<Long> for this collector:
Map<String, Set<Long>> converted = responses.stream()
    .collect(Collectors.groupingBy(
        Response::getRole,
        Collectors.flatMapping(MyClass::streamOfCountryIDs, Collectors.toSet())
    ));
System.out.println(converted);

// MyClass
private static Stream<Long> streamOfCountryIDs(Response response) {
    String permission = response.getPermission().toUpperCase();
    if (permission.startsWith("R-")) {
        return countryRepo.getCountriesByRegion(Long.parseLong(permission.substring(2)))
                          .stream();
    } else if (permission.startsWith("C-")) {
        return Stream.of(Long.parseLong(permission.substring(2)));
    }
    // log bad permission and return an empty stream, or throw an exception if needed
    System.out.println("Bad permission: '" + permission + "'");
    return Stream.empty();
}
You can try this:
Map<String, Set<Long>> initialMap = responseList.stream().collect(
    Collectors.groupingBy(Response::getRole, HashMap::new,
        Collectors.mapping((Response res) -> Long.parseLong(res.getModifiedPermission()), Collectors.toSet()))
);
with getModifiedPermission() being a function which removes the prefix from Permission.
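For reference, here is a self-contained sketch of the Collectors.flatMapping approach (Java 9+), with a hard-coded region-to-countries map standing in for the database lookup; the Response record, byRole method, and sample data are all hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RolePermissions {

    record Response(String role, String permission) {}

    // Stand-in for the DB lookup: region id -> country ids (hypothetical data)
    static final Map<Long, List<Long>> REGION_COUNTRIES = Map.of(4L, List.of(10L, 11L));

    // "R-4" expands to all countries in region 4; "C-44" is the single country 44
    static Stream<Long> countryIds(Response r) {
        String p = r.permission().toUpperCase();
        if (p.startsWith("R-")) {
            return REGION_COUNTRIES.getOrDefault(Long.parseLong(p.substring(2)), List.of()).stream();
        } else if (p.startsWith("C-")) {
            return Stream.of(Long.parseLong(p.substring(2)));
        }
        return Stream.empty(); // unknown prefix: skip
    }

    static Map<String, Set<Long>> byRole(List<Response> responses) {
        return responses.stream().collect(Collectors.groupingBy(
                Response::role,
                Collectors.flatMapping(RolePermissions::countryIds, Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Response> in = List.of(
                new Response("Admin", "R-4"),
                new Response("Admin", "C-44"),
                new Response("User", "C-7"));
        System.out.println(byRole(in)); // Admin maps to {10, 11, 44}, User to {7} (set order unspecified)
    }
}
```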

Collect key values from array to map without duplicates

My app gets a string from a web service. It looks like this:
name=Raul&city=Paris&id=167136
I want to get a map from this string:
{name=Raul, city=Paris, id=167136}
Code:
Arrays.stream(input.split("&"))
        .map(sub -> sub.split("="))
        .collect(Collectors.toMap(string -> string[0], string -> string[1]));
It's okay and works in most cases, but the app can get a string with duplicate keys, like this:
name=Raul&city=Paris&id=167136&city=Oslo
The app will crash with the following uncaught exception:
Exception in thread "main" java.lang.IllegalStateException: Duplicate key city (attempted merging values Paris and Oslo)
I tried to change the collect method:
.collect(Collectors.toMap(tokens -> tokens[0], tokens -> tokens[1]), (r, strings) -> strings[0]);
But the compiler says no:
Cannot resolve method 'collect(java.util.stream.Collector<T,capture<?>,java.util.Map<K,U>>, <lambda expression>)'
And Array type expected; found: 'T'
I guess it's because I have an array. How do I fix it?
You are misunderstanding the final argument of toMap (the merge operator). When it finds a duplicate key, it hands the current value in the map and the new value with the same key to the merge operator, which produces the single value to store.
For example, if you want to just store the first value found then use (s1, s2) -> s1. If you want to comma separate them, use (s1, s2) -> s1 + ", " + s2.
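A minimal, runnable sketch of the merge function in action (the QueryParams class name is just for illustration):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryParams {

    // Parse "k=v&k2=v2" into a map, keeping the FIRST value for duplicate keys
    static Map<String, String> parse(String input) {
        return Arrays.stream(input.split("&"))
                .map(pair -> pair.split("=", 2))
                .collect(Collectors.toMap(
                        kv -> kv[0],
                        kv -> kv[1],
                        (first, second) -> first)); // merge function: resolves duplicates
    }

    public static void main(String[] args) {
        // city appears twice; the merge function keeps "Paris"
        System.out.println(parse("name=Raul&city=Paris&id=167136&city=Oslo"));
    }
}
```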
If you want to collect the values of duplicated keys together, grouped by key (since the app can get a string with duplicate keys), then instead of Collectors.toMap() you can use Collectors.groupingBy with a custom collector (Collector.of(...)):
String input = "name=Raul&city=Paris&city=Berlin&id=167136&id=03&id=505";
Map<String, Set<String>> result = Arrays.stream(input.split("&"))
        .map(splitedString -> splitedString.split("="))
        .filter(keyValuePair -> keyValuePair.length == 2)
        .collect(
            Collectors.groupingBy(array -> array[0], Collector.of(
                HashSet::new, (set, array) -> set.add(array[1]),
                (left, right) -> {
                    if (left.size() < right.size()) {
                        right.addAll(left);
                        return right;
                    } else {
                        left.addAll(right);
                        return left;
                    }
                }, Collector.Characteristics.UNORDERED)
            )
        );
This way you'll get:
result => size = 3
"city" -> size = 2 ["Berlin", "Paris"]
"name" -> size = 1 ["Raul"]
"id" -> size = 3 ["167136","03","505"]
You can achieve a similar result using Kotlin collections:
val res = message
    .split("&")
    .map {
        val entry = it.split("=")
        Pair(entry[0], entry[1])
    }
println(res)
println(res.toMap()) // distinct by key
The result is
[(name, Raul), (city, Paris), (id, 167136), (city, Oslo)]
{name=Raul, city=Oslo, id=167136}

Preserve order in Java stream with collect

I am running into an issue where I need to preserve order in an operation performed on a list of strings, using the collect method from the Java Streams API.
public List<String> countOccurrences(ArrayList<String> messages) {
    List<String> stackedMessages = new LinkedList<>();
    HashMap<String, Integer> messageOccurrences =
        messages.stream()
                .collect(groupingBy(Function.identity(), summingInt(e -> 1)));
    messageOccurrences.forEach((key, value) -> {
        String appendString = value == 1 ? "" : " (" + value + " times)";
        stackedMessages.add(key + appendString);
    });
    return stackedMessages;
}
The problem with the above code is that if I process a list such as ["blah", "blah", "yep"], it returns ["yep", "blah (2 times)"] where I need it to return ["blah (2 times)", "yep"].
I already looked at this post and was led to believe that if I stream an already ordered data structure then order would be preserved: How to ensure order of processing in java8 streams?
I'm thinking I need to change groupingBy to toMap and as of right now I am reading through that documentation. Anybody who is well versed in the subject matter already please offer some pointers.
UPDATE:
Thanks to the user #Aominè, this is the correct way to do it using groupingBy
.collect(groupingBy(Function.identity(),LinkedHashMap::new, summingInt(e->1)))
You'll need to collect into a LinkedHashMap to get the expected result. Also, you don't have to do a separate forEach after the groupingBy: just create a stream from the entrySet, map, and then collect to a list.
return messages.stream()
        .collect(groupingBy(Function.identity(), LinkedHashMap::new, summingInt(e -> 1)))
        .entrySet()
        .stream()
        .map(e -> e.getKey() + (e.getValue() == 1 ? "" : " (" + e.getValue() + " times)"))
        .collect(toList());
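Putting it together, a self-contained sketch of the whole method (class and method names are illustrative); LinkedHashMap keeps keys in first-encounter order, so the output order matches the input:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StackMessages {

    static List<String> countOccurrences(List<String> messages) {
        return messages.stream()
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new,            // preserves first-encounter order
                        Collectors.summingInt(e -> 1)))
                .entrySet().stream()
                .map(e -> e.getKey() + (e.getValue() == 1 ? "" : " (" + e.getValue() + " times)"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences(List.of("blah", "blah", "yep")));
        // [blah (2 times), yep]
    }
}
```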

Filtering between topics

I have 1,000 records in a topic. I am trying to filter records from the input topic to an output topic based on salary.
For example: I want the records of people whose salary is higher than 30,000.
I am trying to use Kafka Streams in Java for this.
The records are in text format (comma separated), for example:
first_name, last_name, email, gender, ip_address, country, salary
Redacted,Tranfield,user#example.com,Female,45.25.XXX.XXX,Russia,$12345.01
Redacted,Merck,user#example.com,Male,236.224.XXX.XXX,Belarus,$54321.96
Redacted,Kopisch,user#example.com,Male,61.36.XXX.XXX,Morocco,$12345.05
Redacted,Edds,user#example.com,Male,6.87.XXX.XXX,Poland,$54321.72
Redacted,Alston,user#example.com,Female,56.146.XXX.XXX,Indonesia,$12345.16
...
This is my code:
public class StreamsStartApp {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-starter-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        StreamsBuilder builder = new StreamsBuilder();
        // Stream from Kafka topic
        KStream<Long, Long> newInput = builder.stream("word-count-input");
        KStream<Long, Long> usersAndColours = newInput
                // step 1 - we ensure that a comma is here as we will split on it
                .filter((key, value) -> value.contains(","))
                // step 2 - we select a key that will be the user id
                .selectKey((key, value) -> value.split(",")[6]);
                // step 3 - got stuck here.
                // .filter(key -> key.value[6] > 30000)
                // .selectKey((new1, value1) -> value1.split(",")[3])
                // .filter((key, value) -> key.greater(10));
                // .filter((key, value) -> key > 10);
                // .filter(key -> key.getkey().intValue() > 10);
        usersAndColours.to("new-output");
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
In the code above, at step 1, I split the sample data on ','.
In step 2 I selected one field, the salary field, as the key.
Now in step 3 I am trying to filter the data using the salary field.
I tried some approaches, shown commented out, but nothing worked.
Any ideas will help.
First, both your key and value serdes are Strings, not Longs, so KStream<Long, Long> is not correct.
And value.split(",")[6] is just a String, not a Double (or a Long, since there are decimal values).
You need to remove the $ from that column and parse the string to a Double; then you can filter on it. Also, it's not key.value[6], because your key is not an object with a value field.
And you should probably make the email the key, not the salary, if you even need a key at all.
Realistically, you can do this in one line (split across two here for readability):
newInput.filter((key, value) -> value.contains(",") &&
        Double.parseDouble(value.split(",")[6].replace("$", "")) > 30000);
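The parsing step on its own, as a runnable sketch without the Kafka machinery (SalaryFilter and salaryAbove are hypothetical names):

```java
public class SalaryFilter {

    // Parse the 7th CSV column (e.g. "$54321.96") into a double and compare it to a threshold
    static boolean salaryAbove(String csvLine, double threshold) {
        String[] cols = csvLine.split(",");
        if (cols.length < 7) return false;  // malformed line: drop it
        double salary = Double.parseDouble(cols[6].trim().replace("$", ""));
        return salary > threshold;
    }

    public static void main(String[] args) {
        String line = "Redacted,Merck,user#example.com,Male,236.224.0.1,Belarus,$54321.96";
        System.out.println(salaryAbove(line, 30000)); // true
    }
}
```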

transform xml string by lambda java8

I have a String containing XML. I'm trying to transform the String using a regexp:
public String replaceValueByTag(final String source, String tag, String value) {
    return replaceFirst(source, "(?<=<" + tag + ">).*?(?=</" + tag + ">)", value);
}
then I create a map of tag to new value:
Map<String, String> params = TAGS.stream().collect(toMap(tag -> tag, tag -> substringByTag(request, tag)));
and use the map to replace values in the XML:
public String getConfirm(String request) {
    String[] answer = {template};
    Map<String, String> params = TAGS.stream().collect(toMap(tag -> tag, tag -> substringByTag(request, tag)));
    params.entrySet().forEach(entry -> answer[0] = replaceValueByTag(answer[0], entry.getKey(), entry.getValue()));
    return answer[0];
}
How can I write the lambda expression without saving the intermediate result in an array (i.e. take a String, transform it using the map, and return a String)?
You can use reduce to apply all the elements of the Stream of map entries to your template String.
I'm not sure, though, what the combiner should look like (i.e. how to combine two partially transformed Strings into one that contains all the transformations), but if a sequential Stream is sufficient, you don't need a real combiner:
String result =
    params.entrySet()
          .stream()
          .reduce(template,
                  (t, e) -> replaceValueByTag(t, e.getKey(), e.getValue()),
                  (s1, s2) -> s1); // dummy combiner
Instead of using an intermediate map, you could apply the terminal operation directly; I'll use the .reduce() operation like #Eran suggested:
String result = TAGS.stream()
        .reduce(
            template,
            (tmpl, tag) -> replaceValueByTag(tmpl, tag, substringByTag(request, tag)),
            (left, right) -> left // TODO: combine them
        );
This way you won't have as much overhead.
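For completeness, a self-contained sketch of the reduce approach with a dummy combiner (TagReplacer and fillTemplate are illustrative names; replaceValueByTag mirrors the method from the question, using String.replaceFirst directly):

```java
import java.util.Map;

public class TagReplacer {

    // Replace the text between <tag> and </tag> with the given value
    static String replaceValueByTag(String source, String tag, String value) {
        return source.replaceFirst("(?<=<" + tag + ">).*?(?=</" + tag + ">)", value);
    }

    // Fold every (tag, value) entry into the template via reduce
    static String fillTemplate(String template, Map<String, String> params) {
        return params.entrySet().stream()
                .reduce(template,
                        (t, e) -> replaceValueByTag(t, e.getKey(), e.getValue()),
                        (s1, s2) -> s1); // dummy combiner, fine for sequential streams
    }

    public static void main(String[] args) {
        String template = "<name>?</name><city>?</city>";
        System.out.println(fillTemplate(template, Map.of("name", "Raul", "city", "Paris")));
        // <name>Raul</name><city>Paris</city>
    }
}
```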
