Efficiently joining text in nested lists - java

Suppose I have a text represented as a collection of lines of words. I want to join words in a line with a space, and join lines with a newline:
class Word {
String value;
}
public static String toString(List <List <Word>> lines) {
return lines.stream().map(
l -> l.stream().map(w -> w.value).collect(Collectors.joining(" "))
).collect(Collectors.joining("\n"));
}
This works fine, but I end up creating an intermediate String object for each line. Is there a nice concise way of doing the same without the overhead?

String s = List.of(
List.of(new Word("a"), new Word("b")),
List.of(new Word("c"), new Word("d")),
List.of(new Word("e"), new Word("f")))
.stream()
.collect(Collector.of(
() -> new StringJoiner(""),
(sj, list) -> {
list.forEach(x -> sj.add(x.getValue()).add(" "));
sj.add("\n");
},
StringJoiner::merge,
StringJoiner::toString));
EDIT
I can think of this, but I can't tell whether you would accept the extra verbosity in exchange for not creating that String:
.stream()
.collect(Collector.of(
() -> new StringJoiner(""),
(sj, list) -> {
int i;
for (i = 0; i < list.size() - 1; ++i) {
sj.add(list.get(i).getValue()).add(" ");
}
sj.add(list.get(i).getValue());
sj.add("\n");
},
StringJoiner::merge,
x -> {
String ss = x.toString();
return ss.substring(0, ss.length() - 1);
}));

You can use
public static String toString(List<List<Word>> lines) {
return lines.stream()
.map(l -> l.stream()
.map(w -> w.value)
.collect(() -> new StringJoiner(" "),
StringJoiner::add,
StringJoiner::merge))
.collect(() -> new StringJoiner("\n"),
StringJoiner::merge,
StringJoiner::merge).toString();
}
The inner collect basically does what Collectors.joining(" ") does, but omits the final StringJoiner.toString() step.
Then, the outer collect differs from an ordinary Collectors.joining("\n") in that it accepts StringJoiner as an input and combines them using merge. This relies on a documented behavior:
If the other StringJoiner is using a different delimiter, then elements from the other StringJoiner are concatenated with that delimiter and the result is appended to this StringJoiner as a single element.
This is done internally on the StringBuilder/character data level without creating a String instance while retaining the intended semantics.
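As a quick check of the approach above, here is a minimal, self-contained sketch. It assumes the words are plain String values instead of the question's Word class, and the class name JoinerMergeSketch is made up for illustration.

```java
import java.util.List;
import java.util.StringJoiner;

class JoinerMergeSketch {
    // Joins words with a space and lines with a newline.
    // The inner collect yields a StringJoiner per line; the outer collect
    // merges those joiners instead of converting each line to a String first.
    static String join(List<List<String>> lines) {
        return lines.stream()
            .map(line -> line.stream()
                .collect(() -> new StringJoiner(" "),
                         StringJoiner::add,
                         StringJoiner::merge))
            .collect(() -> new StringJoiner("\n"),
                     StringJoiner::merge,
                     StringJoiner::merge)
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of(List.of("a", "b"), List.of("c", "d"))));
    }
}
```

One difference from Collectors.joining that may matter: according to the StringJoiner.merge documentation, merging an empty joiner has no effect, so an empty inner list would be skipped rather than produce an empty line.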

Related

Any way to make this stream more efficient?

How would one optimise this so that the original list is updated, instead of adding values to a new ArrayList?
String filterval = filter.toLowerCase();
ArrayList<String> filtredArr = new ArrayList<String>();
listArray.forEach(
valueText -> {
String val = valueText.toLowerCase();
if (val.startsWith(filterval) || val.contains(filterval))
filtredArr.add(valueText);
else {
Arrays.stream(valueText.split(" ")).forEach(
singleWord -> {
String word = singleWord.toLowerCase();
if(word.startsWith(filterval) || word.contains(filterval))
filtredArr.add(valueText);
}
);
}
});
When using streams, it is best not to modify the source of the stream while the latter iterates over the former. (See this for instance.)
Regarding readability and idiomatic use of stream operations, your code is almost indistinguishable from a plain old for loop (and I would advise you to change it to that if you only use streams to do a forEach). However, you could modify it to use a chain of shorter and more "atomic" stream operations, as in the following examples.
To get the list of strings in which at least one word contains filterval:
List<String> filtered = listArray.stream()
.filter(str -> str.toLowerCase().contains(filterval))
.collect(Collectors.toList());
To get the list of strings in which at least one word starts with filterval:
List<String> filtered =
listArray.stream()
.filter(str -> Arrays.stream(str.split(" "))
.map(String::toLowerCase)
.anyMatch(word -> word.startsWith(filterval)))
.collect(Collectors.toList());
To get the list of words (in any of the strings) that contain the filter value:
List<String> filteredWords = listArray.stream()
.map(String::toLowerCase)
.flatMap(str -> Arrays.stream(str.split(" ")))
.filter(word -> word.contains(filterval))
.collect(Collectors.toList());
(I'm assuming listArray, which you don't show in your code snippet, is a List<String>.)
Notes
The condition val.startsWith(filterval) || val.contains(filterval) is completely equivalent to val.contains(filterval). The reason is, if a string starts with filterval, it must also mean that it contains it; one implies the other.
There's no need to compute the lowercase versions of single words since you've already lowercased the whole string (so any words within it will also be lowercase).
Instead of treating single words and space-separated strings separately, we can apply split to all of them and then "concatenate" the sequences of words by using flatMap.
Minimal, complete, verifiable example
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
String filterval = "ba";
List<String> listArray = List.of("foo", "BAR", "spam EGGS ham bAt", "xxbazz");
List<String> filtered1 = listArray.stream()
.filter(str -> str.toLowerCase().contains(filterval))
.collect(Collectors.toList());
List<String> filtered2 =
listArray.stream()
.filter(str -> Arrays.stream(str.split(" "))
.map(String::toLowerCase)
.anyMatch(word -> word.startsWith(filterval)))
.collect(Collectors.toList());
List<String> filtered3 = listArray.stream()
.map(String::toLowerCase)
.flatMap(str -> Arrays.stream(str.split(" ")))
.filter(word -> word.contains(filterval))
.collect(Collectors.toList());
System.out.println(Arrays.toString(filtered1.toArray()));
System.out.println(Arrays.toString(filtered2.toArray()));
System.out.println(Arrays.toString(filtered3.toArray()));
}
}
Output:
[BAR, spam EGGS ham bAt, xxbazz]
[BAR, spam EGGS ham bAt]
[bar, bat, xxbazz]
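If the goal really is to update the original list rather than build a new one, one option not covered above is Collection.removeIf, which avoids modifying a list while a stream over it is running. The sketch below assumes listArray is a mutable List<String>; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class InPlaceFilterSketch {
    // Removes, in place, every entry that does not contain the filter
    // (compared case-insensitively).
    static void keepMatching(List<String> values, String filter) {
        String needle = filter.toLowerCase(Locale.ROOT);
        values.removeIf(v -> !v.toLowerCase(Locale.ROOT).contains(needle));
    }

    public static void main(String[] args) {
        // ArrayList copy, because List.of(...) is immutable.
        List<String> listArray =
            new ArrayList<>(List.of("foo", "BAR", "spam EGGS ham bAt", "xxbazz"));
        keepMatching(listArray, "ba");
        System.out.println(listArray); // [BAR, spam EGGS ham bAt, xxbazz]
    }
}
```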

Sorting sets alphabetically, letters in sets separated by commas

public static void main(String[] args) throws IOException
{
HashSet set = new HashSet<String>();
set.add("{}");
set.add("{a}");
set.add("{b}");
set.add("{a, b}");
set.add("{a, c}");
sortedSet(set);
}
public static void sortedSet(HashSet set)
{
List<String> setList = new ArrayList<String>(set);
List<String> orderedByAlpha = new ArrayList<String>(set);
//sort by alphabetical order
orderedByAlpha = (List<String>) setList.stream()
.sorted((s1, s2) -> s1.compareToIgnoreCase(s2))
.collect(Collectors.toList());
System.out.println(orderedByAlpha);
}
I am trying to sort alphabetically but the output I get is this :
[{a, b}, {a, c}, {a}, {b}, {}]
but it should be:
[{a}, {a, b}, {a, c}, {b}, {}]
Your output doesn't match your code. You are showing 2D array lists, but you're converting to a 1D ArrayList, which doesn't make sense.
public static void main(String[] args)
{
test(Arrays.asList("a", "d", "f", "a", "b"));
}
static void test(List<String> setList)
{
List<String> out = setList.stream().sorted((a, b) -> a.compareToIgnoreCase(b)).collect(Collectors.toList());
System.out.println(out);
}
This is properly sorting 1D arrays, so you're correct there.
You'll probably need to implement your own comparator to compare the 2D array lists to sort them.
Instead of having the source as a List<String>, I'd recommend you have it as a List<Set<String>>, e.g.
List<Set<String>> setList = new ArrayList<>();
setList.add(new HashSet<>(Arrays.asList("a","b")));
setList.add(new HashSet<>(Arrays.asList("a","c")));
setList.add(new HashSet<>(Collections.singletonList("a")));
setList.add(new HashSet<>(Collections.singletonList("b")));
setList.add(new HashSet<>());
Then apply the following comparator along with the mapping operation to yield the expected result:
List<String> result =
setList.stream()
.sorted(Comparator.comparing((Function<Set<String>, Boolean>) Set::isEmpty)
.thenComparing(s -> String.join("", s),
String.CASE_INSENSITIVE_ORDER))
.map(Object::toString)
.collect(Collectors.toList());
and this prints:
[[a], [a, b], [a, c], [b], []]
Note that the result is currently a list of strings, where each string is the string representation of a given set. If, however, you want the result to be a List<Set<String>>, simply remove the map operation above.
Edit:
Managed to get a solution working based on your initial idea....
So, first, you need a completely new comparator instead of just (s1, s2) -> s1.compareToIgnoreCase(s2) as it will not suffice.
Given the input:
Set<String> set = new HashSet<>();
set.add("{}");
set.add("{a}");
set.add("{b}");
set.add("{a, b}");
set.add("{a, c}");
and the following stream pipeline:
List<String> result = set.stream()
.map(s -> s.replaceAll("[^A-Za-z]+", ""))
.sorted(Comparator.comparing(String::isEmpty)
.thenComparing(String.CASE_INSENSITIVE_ORDER))
.map(s -> Arrays.stream(s.split(""))
.collect(Collectors.joining(", ", "{", "}")))
.collect(Collectors.toList());
Then we would have a result of:
[{a}, {a, b}, {a, c}, {b}, {}]
Well, as @Aomine and @Holger noted already, you need a custom comparator.
But IMHO their solutions look over-engineered. You don't need any costly operations like split and substring:
String.substring creates a new String object and calls System.arraycopy() under the hood
String.split is even more costly. It iterates over your string and calls String.substring multiple times. Moreover it creates an ArrayList to store all the substrings. If the number of substrings is big enough then your ArrayList will need to expand its capacity (perhaps not only once) causing another call of System.arraycopy().
For your simple case I would slightly modify the code of built-in String.compareTo method:
Comparator<String> customComparator =
(s1, s2) -> {
int len1 = s1.length();
int len2 = s2.length();
if (len1 == 2) return 1;
if (len2 == 2) return -1;
int lim = Math.min(len1, len2) - 1;
for (int k = 1; k < lim; k++) {
char c1 = s1.charAt(k);
char c2 = s2.charAt(k);
if (c1 != c2) {
return c1 - c2;
}
}
return len1 - len2;
};
It will compare the strings with complexity O(n), where n is the length of the shorter string. At the same time it will neither create any new objects nor perform any array copying.
The same comparator can be implemented using Stream API:
Comparator<String> customComparatorUsingStreams =
(s1, s2) -> {
if (s1.length() == 2) return 1;
if (s2.length() == 2) return -1;
return IntStream.range(1, Math.min(s1.length(), s2.length()) - 1)
.map(i -> s1.charAt(i) - s2.charAt(i))
.filter(i -> i != 0)
.findFirst()
.orElse(s1.length() - s2.length()); // no differing char found: fall back to length, as the loop version does
};
You can use your custom comparator like this:
List<String> orderedByAlpha = setList.stream()
.sorted(customComparatorUsingStreams)
.collect(Collectors.toList());
System.out.println(orderedByAlpha);
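To check the character-level comparator against the question's sample data, here is a small self-contained sketch. The class name BraceSetSortSketch and the extra guard for two empty sets are additions for illustration and not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BraceSetSortSketch {
    // Compares strings such as "{a, b}" character by character,
    // skipping the opening brace; the empty set "{}" sorts last.
    static final Comparator<String> BY_CONTENT_EMPTY_LAST = (s1, s2) -> {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 2 && len2 == 2) return 0; // added guard: two empty sets are equal
        if (len1 == 2) return 1;
        if (len2 == 2) return -1;
        int lim = Math.min(len1, len2) - 1;
        for (int k = 1; k < lim; k++) {
            char c1 = s1.charAt(k);
            char c2 = s2.charAt(k);
            if (c1 != c2) {
                return c1 - c2;
            }
        }
        return len1 - len2;
    };

    // Returns a sorted copy and leaves the input untouched.
    static List<String> sort(List<String> sets) {
        List<String> copy = new ArrayList<>(sets);
        copy.sort(BY_CONTENT_EMPTY_LAST);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of("{}", "{a}", "{b}", "{a, b}", "{a, c}")));
        // [{a}, {a, b}, {a, c}, {b}, {}]
    }
}
```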
Another take on it (slightly similar to the answer by Aomine) would be to strip the strings of the characters that make String#compareTo() fail, in this case '{' and '}'. Also, the special case that an empty set ("{}") is to be sorted after the rest needs to be taken care of.
The following code implements such a comparator:
static final Comparator<String> COMPARE_IGNORING_CURLY_BRACES_WITH_EMPTY_LAST = (s1, s2) -> {
Function<String, String> strip = string -> string.replaceAll("[{}]", "");
String strippedS1 = strip.apply(s1);
String strippedS2 = strip.apply(s2);
return strippedS1.isEmpty() || strippedS2.isEmpty() ?
strippedS2.length() - strippedS1.length() :
strippedS1.compareTo(strippedS2);
};
Of course, this is not the most efficient solution. If efficiency is truly important here, I would loop through the characters, like String#compareTo() does, as suggested by ETO.

split string and store it into HashMap java 8

I want to split below string and store it into HashMap.
String responseString = "name~peter-add~mumbai-md~v-refNo~";
First I split the string using a hyphen (-) as the delimiter and store the result in an ArrayList as below:
public static List<String> getTokenizeString(String delimitedString, char separator) {
final Splitter splitter = Splitter.on(separator).trimResults();
final Iterable<String> tokens = splitter.split(delimitedString);
final List<String> tokenList = new ArrayList<String>();
for(String token: tokens){
tokenList.add(token);
}
return tokenList;
}
List<String> list = MyClass.getTokenizeString(responseString, '-');
and then I use the code below to convert it to a HashMap using a stream.
Map<String, String> map = list.stream()
.collect(Collectors.toMap(k -> k.split("~")[0], v -> v.split("~")[1]));
The stream collector doesn't work as there is no value for refNo.
It works correctly if I have an even number of elements in the ArrayList.
Is there any way to handle this? Also, please suggest how I can do these two tasks with Java 8 streams alone (I don't want to use the getTokenizeString() method).
Unless Splitter is doing any magic, the getTokenizeString method is obsolete here. You can perform the entire processing as a single operation:
Map<String,String> map = Pattern.compile("\\s*-\\s*")
.splitAsStream(responseString.trim())
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length>1? a[1]: ""));
By using the regular expression \s*-\s* as separator, you are considering white-space as part of the separator, hence implicitly trimming the entries. There’s only one initial trim operation before processing the entries, to ensure that there is no white-space before the first or after the last entry.
Then, simply split the entries in a map step before collecting into a Map.
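A self-contained version of the single-pipeline approach might look like the sketch below. The class name KeyValueSplitSketch and the method name parse are made up for illustration; the input is the string from the question.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeyValueSplitSketch {
    // Hyphen separator with optional surrounding white-space.
    private static final Pattern ENTRY_SEPARATOR = Pattern.compile("\\s*-\\s*");

    // Splits "k1~v1-k2~v2-..." into a map; a key without a value maps to "".
    static Map<String, String> parse(String response) {
        return ENTRY_SEPARATOR.splitAsStream(response.trim())
            .map(entry -> entry.split("~", 2)) // limit 2 keeps a trailing empty value
            .collect(Collectors.toMap(kv -> kv[0], kv -> kv.length > 1 ? kv[1] : ""));
    }

    public static void main(String[] args) {
        System.out.println(parse("name~peter-add~mumbai-md~v-refNo~"));
    }
}
```

Collectors.toMap throws an IllegalStateException on duplicate keys, so this sketch assumes each key appears only once in the input.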
First of all, you don't have to split the same String twice.
Second of all, check the length of the array to determine if a value is present for a given key.
Map<String, String> map =
list.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
This assumes you want to store the key with an empty-string value if it has no corresponding value.
Or you can skip the list variable:
Map<String, String> map1 =
MyClass.getTokenizeString(responseString, '-')
.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
private final String dataSheet = "103343262,6478342944, 103426540,84528784843, 103278808,263716791426, 103426733,27736529279, "
+ "103426000,27718159078, 103218982,19855201547, 103427376,27717278645, "
+ "103243034,81667273413";
final int chunk = 2;
AtomicInteger counter = new AtomicInteger();
Map<String, String> pairs = Arrays.stream(dataSheet.split(","))
.map(String::trim)
.collect(Collectors.groupingBy(i -> counter.getAndIncrement() / chunk))
.values()
.stream()
.collect(Collectors.toMap(k -> k.get(0), v -> v.get(1)));
result:
pairs =
"103218982" -> "19855201547"
"103278808" -> "263716791426"
"103243034" -> "81667273413"
"103426733" -> "27736529279"
"103426540" -> "84528784843"
"103427376" -> "27717278645"
"103426000" -> "27718159078"
"103343262" -> "6478342944"
We need to group every 2 elements into a key/value pair, so we partition the list into chunks of 2. The expression (counter.getAndIncrement() / 2) yields the same number for every 2 consecutive calls, e.g.:
IntStream.range(0,6).forEach((i)->System.out.println(counter.getAndIncrement()/2));
prints:
0
0
1
1
2
2
You may use the same idea to partition list into chunks.
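The counter-based grouping relies on a stateful lambda, which is only safe for a sequential stream. As an alternative (a different technique from the one above, shown as a sketch), the tokens can be paired by index. The class name PairUpSketch is made up for illustration, and a trailing unpaired token is silently ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PairUpSketch {
    // Turns "k1,v1, k2,v2, ..." into a map by pairing tokens 2i and 2i+1.
    static Map<String, String> pairUp(String csv) {
        List<String> tokens = Arrays.stream(csv.split(","))
            .map(String::trim)
            .collect(Collectors.toList());
        return IntStream.range(0, tokens.size() / 2)
            .boxed()
            .collect(Collectors.toMap(i -> tokens.get(2 * i),
                                      i -> tokens.get(2 * i + 1)));
    }

    public static void main(String[] args) {
        System.out.println(pairUp("103343262,6478342944, 103426540,84528784843"));
    }
}
```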
Another short way to do it:
String responseString = "name~peter-add~mumbai-md~v-refNo~";
Map<String, String> collect = Arrays.stream(responseString.split("-"))
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
System.out.println(collect);
First you split the String on -, then you map each entry with map(s -> s.split("~", 2)) to create a Stream<String[]> like [name, peter][add, mumbai][md, v][refNo, ], and finally you collect it with toMap, where a[0] becomes the key and a[1] the value.

Adding non-duplicated elements to existing keys in java 8 functional style

I have a map I want to populate:
private Map<String, Set<String>> myMap = new HashMap<>();
with this method:
private void compute(String key, String[] parts) {
myMap.computeIfAbsent(key, k -> getMessage(parts));
}
compute() is invoked as follows:
for (String line : messages) {
String[] parts = line.split("-");
validator.validate(parts); //validates parts are as expected
String key = parts[parts.length - 1];
compute(key, parts);
}
parts elements are like this:
[AB, CC, 123]
[AB, FF, 123]
[AB, 456]
In the compute() method, as you can see I am trying to use the last part of the element of the array as a key and the other parts to be used as values for the map I am looking to build.
My Question: How do I add to existing key only the unique values using Java 8 functional style e.g.
{123=[AB, FF, CC]}
As you requested I added a lambda variant, which just adds the parts via lambda to the map in the compute-method:
private void compute(String key, String[] parts) {
myMap.computeIfAbsent(key,
s -> Stream.of(parts)
.limit(parts.length - 1)
.collect(toSet()));
}
But in this case you will only get something like 123=[AB, CC] in your map. Use merge instead, if you want to add also all values which come on subsequent calls:
private void compute(String key, String[] parts) {
myMap.merge(key,
Stream.of(parts)
.limit(parts.length - 1)
.collect(toSet()),
(currentSet, newSet) -> {currentSet.addAll(newSet); return currentSet;});
}
I am not sure what you intend with computeIfAbsent, but from what you listed as parts and what you expect as output, you may also want to try the following instead of the whole code you listed :
// the function to identify your key
Function<String[], String> keyFunction = strings -> strings[strings.length - 1];
// the function to identify your values
Function<String[], List<String>> valuesFunction = strings -> Arrays.asList(strings).subList(0, strings.length - 1);
// a collector to add all entries of a collection to a (sorted) TreeSet
Collector<List<String>, TreeSet<Object>, TreeSet<Object>> listTreeSetCollector = Collector.of(TreeSet::new, TreeSet::addAll, (left, right) -> {
left.addAll(right);
return left;
});
Map myMap = Arrays.stream(messages) // or: messages.stream()
.map(s -> s.split("-"))
.peek(validator::validate)
.collect(Collectors.groupingBy(keyFunction,
Collectors.mapping(valuesFunction, listTreeSetCollector)));
Using your samples as input you get the result you mentioned (well, actually sorted, as I used a TreeSet).
String[] messages = new String[]{
"AB-CC-123",
"AB-FF-123",
"AB-456"};
produces a map containing:
123=[AB, CC, FF]
456=[AB]
Last, but not least: if you can, pass the key and the values themselves to your method. Don't split the logic about identifying the key and identifying the values. That makes it really hard to understand your code later on or by someone else.
Try this:
private void compute(String[] parts) {
int lastIndex = parts.length - 1;
String key = parts[lastIndex];
List<String> values = Arrays.asList(parts).subList(0, lastIndex);
myMap.computeIfAbsent(key, k -> new HashSet<>()).addAll(values);
}
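Put together with the loop from the question, the computeIfAbsent variant can be tried out with the sketch below. Validation is left out, and the class name GroupByLastPartSketch and the method name group are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GroupByLastPartSketch {
    // Uses the last "-"-separated part as the key and collects the
    // remaining parts into a Set, so duplicates are dropped automatically.
    static Map<String, Set<String>> group(String... messages) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String message : messages) {
            String[] parts = message.split("-");
            int lastIndex = parts.length - 1;
            result.computeIfAbsent(parts[lastIndex], k -> new HashSet<>())
                  .addAll(Arrays.asList(parts).subList(0, lastIndex));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group("AB-CC-123", "AB-FF-123", "AB-456"));
    }
}
```

computeIfAbsent returns the existing set when the key is already present, which is why the addAll call after it covers both the first and the subsequent lines for a key.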
Or if you want, you can replace the entire loop with a stream:
Map<String, Set<String>> myMap = messages.stream() // if messages is an array, use Arrays.stream(messages)
.map(line -> line.split("-"))
.peek(validator::validate)
.collect(Collectors.toMap(
parts -> parts[parts.length - 1],
parts -> new HashSet<>(Arrays.asList(parts).subList(0, parts.length - 1)),
(a, b) -> { a.addAll(b); return a; }));
To add more parts to a possibly existing key you're using the wrong method; you want merge(), not computeIfAbsent().
If validator.validate() throws a checked Exception, you must call it outside a stream, so you'll need a foreach loop:
for (String message : messages) {
String[] parts = message.split("-");
validator.validate(parts);
LinkedList<String> list = new LinkedList<>(Arrays.asList(parts));
String key = list.getLast();
list.removeLast();
myMap.merge(key, new HashSet<>(list), (a, b) -> { a.addAll(b); return a; }); // merge needs a function returning the merged set; Set::addAll returns boolean
}
Using a LinkedList, which has methods getLast() and removeLast(), makes the code very readable.
Disclaimer: Code may not compile or work as it was thumbed in on my phone (but there's a reasonable chance it will work)

Using java 8 streams to generate pairs of integers

I am trying to generate pairs of integers. I have a class Pair with a constructor taking 2 ints. The following code works but seems rather clunky, in particular the conversion from an IntStream to an object stream using mapToObj(Integer::new).
private static List<Pair> success() {
return IntStream.range(0, 10).
mapToObj(Integer::new).flatMap(i -> IntStream.range(12, 15).
mapToObj(j -> new Pair(i, j))).
collect(Collectors.toList());
}
Firstly does anyone have a more elegant way to do this ?
Secondly, when I refactored to extract some streams as variables, I got an error: IllegalStateException: stream has already been operated upon or closed. Here is the refactored method. Does anyone know if this is a problem with the code?
static List<Pair> fail() {
Stream<Integer> outer = IntStream.range(0, 10).mapToObj(Integer::new);
IntStream inner = IntStream.range(12, 15);
Stream<Pair> pairStream = outer.flatMap(i ->
inner.mapToObj(j -> new Pair(i, j)));
return pairStream.collect(Collectors.toList());
}
It is possible to make it a bit more concise by replacing mapToObj(Integer::new) with boxed(), but apart from that, Java is not that concise:
IntStream.range(0, 10)
.boxed()
.flatMap(i -> IntStream.range(12, 15)
.mapToObj(j -> new Pair(i, j)))
.collect(Collectors.toList());
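As for the second question: There are other answers which link to the problem. The concrete problem is that inner is a single stream instance, yet the outer flatMap() tries to use it once for every outer element. A stream can only be operated upon once, hence the IllegalStateException.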
This way it works:
final IntStream range = IntStream.range(0, 10);
List<Pair> ps = range
.boxed().flatMap(i -> {
final IntStream range1 = IntStream.range(12, 15);
return range1.
mapToObj(j -> new Pair(i, j));
}).
collect(Collectors.toList());
Why not use plain for-loops? Plain for-loops will:
Look nicer
Make your intent clear
static List<Pair> fail() {
List<Pair> pairs = new ArrayList<>(30);
for (int i = 0; i < 10; i++) {
for (int j = 12; j < 15; j++) {
pairs.add(new Pair(i, j));
}
}
return pairs;
}
If your Pair class accepts primitive ints you can eliminate the unnecessary boxing this way:
private static List<Pair> success() {
return IntStream.range(0, 10).
mapToObj(i -> IntStream.range(12, 15).
mapToObj(j -> new Pair(i, j))).
flatMap(Function.identity()).
collect(Collectors.toList());
}
As for extracting streams into variables, you may create a supplier instead:
private static List<Pair> success() {
Supplier<IntStream> inner = () -> IntStream.range(12, 15);
return IntStream.range(0, 10).
mapToObj(i -> inner.get().
mapToObj(j -> new Pair(i, j))).
flatMap(Function.identity()).
collect(Collectors.toList());
}
Though it seems unnecessary to me to extract the stream into a variable.
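To see the difference between a shared stream variable and a supplier side by side, here is a minimal sketch. The nested Pair class is a stand-in for the one from the question, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PairStreamSketch {
    // Stand-in for the question's Pair class (constructor taking 2 ints).
    static final class Pair {
        final int first;
        final int second;

        Pair(int first, int second) {
            this.first = first;
            this.second = second;
        }
    }

    // Reuses one IntStream instance for every outer element:
    // the second use throws IllegalStateException.
    static List<Pair> withSharedInner() {
        IntStream inner = IntStream.range(12, 15);
        return IntStream.range(0, 10)
            .boxed()
            .flatMap(i -> inner.mapToObj(j -> new Pair(i, j)))
            .collect(Collectors.toList());
    }

    // Creates a fresh IntStream for every outer element via a Supplier.
    static List<Pair> withSupplier() {
        Supplier<IntStream> inner = () -> IntStream.range(12, 15);
        return IntStream.range(0, 10)
            .boxed()
            .flatMap(i -> inner.get().mapToObj(j -> new Pair(i, j)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withSupplier().size()); // 30
        try {
            withSharedInner();
        } catch (IllegalStateException e) {
            System.out.println("shared inner stream failed: " + e.getMessage());
        }
    }
}
```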
