Java Regex match to avoid splitting


I'm reading a text file that has multiple lines like below.
key1:Combine(val -> [{"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}])
key2:Combine(val -> [{"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"}])
key3:Combine(val -> [{"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"}])
For each line I need to create a map that has key and the Json string as the val.
For example for the above example I would need my map to be like this
map1 = key1,{"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}
map2 = key2,{"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"}
map3 = key3,{"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"}
I'm using the split approach below and then stripping the last two characters of the second value.
String[] temp = str.split(":Combine\\(val -> \\[");
I'm trying to create a regex pattern that extracts the key and value directly, which is what I need help with.

You will need to use a pattern and matcher with the regex that you require like so:
Pattern pattern = Pattern.compile("val -> (.*)\\)");
Matcher matcher = pattern.matcher(inputString);
matcher.results().forEach((match) -> {
System.out.println(match.group(1));
});
Will output:
[{"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}]
[{"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"}]
[{"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"}]
To convert the JSON, I recommend Gson. Create a class for the items, for example:
class GenericItem {
public Integer id;
public String pid;
}
Because each captured value is a JSON array, parse it into an array of that class:
GenericItem[] key1 = new Gson().fromJson(match.group(1), GenericItem[].class);
This way, if you wish to use the data, you can write
key1[0].id; // id of the first item for key 1
key1[1].pid; // pid of the second item for key 1
@WJS had a better answer for the regex pattern, however: mine only returns the JSON values, while theirs also captures the key, which you can use to identify each group.

Alternative using split:
Used regex:
"^(\\w+):Combine\\(val -> \\[(.*)\\]"
Split in context:
/**
* Content of inputFile.txt:
* key1:Combine(val -> [{"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}])
* key2:Combine(val -> [{"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"}])
* key3:Combine(val -> [{"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"}])
*/
public static void main(String[] args) throws IOException {
String fileName = "C:\\Users\\myUserName\\Desktop\\inputFile.txt"; // windows file system
List<String> linesFromInputFile = Files.readAllLines(Paths.get(fileName));
Pattern replacePattern = Pattern.compile("^(\\w+):Combine\\(val -> \\[(.*)\\]");
Pattern splitPattern = Pattern.compile("#");
Map<String, String> mapResult = linesFromInputFile.stream()
.map(line -> replacePattern.matcher(line).replaceFirst("$1#$2"))
.map(replacedLine -> splitPattern.split(replacedLine))
.collect(Collectors.toMap(arr -> arr[0], arr -> arr[1]));
// Print output
mapResult.entrySet().stream()
.sorted(Comparator.comparing(Map.Entry::getKey))
.forEach(entry -> System.out.printf("Key: '%s' => Value: '%s'%n"
, entry.getKey()
, entry.getValue()));
}
Output (the value still ends with the closing parenthesis, because the pattern stops at the last ]):
Key: 'key1' => Value: '{"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"})'
Key: 'key2' => Value: '{"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"})'
Key: 'key3' => Value: '{"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"})'
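If the trailing ) shown in the output above is not wanted, one option is to anchor the pattern at both ends so that the closing ]) is consumed but not captured. The following is only a sketch under that assumption; the class name CombineLineParser and the choice to skip non-matching lines are mine, not part of the answer above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps each "keyN:Combine(val -> [...])" line to key -> JSON objects.
class CombineLineParser {
    // Anchored at both ends, so the closing "])" is matched but stays outside group 2.
    private static final Pattern LINE =
            Pattern.compile("^(\\w+):Combine\\(val -> \\[(.*)\\]\\)$");

    // Keeps input order; lines that do not match the pattern are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "key1:Combine(val -> [{\"id\":\"123\",\"pid\":\"Xd34d\"},{\"id\":\"124\",\"pid\":\"sdDfsd\"}])",
                "not a matching line");
        parse(lines).forEach((k, v) -> System.out.println(k + " => " + v));
    }
}
```

Using matches() rather than replaceFirst() also means a malformed line is dropped instead of being passed through unchanged.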

Here is one of many possible solutions.
String[] strs = {
"key1:Combine(val -> [{\"id\":\"123\",\"pid\":\"Xd34d\"},{\"id\":\"124\",\"pid\":\"sdDfsd\"}])",
"key2:Combine(val -> [{\"id\":\"211\",\"pid\":\"Xd34d\"},{\"id\":\"223\",\"pid\":\"sdDfsd\"}])",
"key3:Combine(val -> [{\"id\":\"423\",\"pid\":\"Xd34d\"},{\"id\":\"454\",\"pid\":\"sdDfsd\"}])" };
Here is one way to parse it, and you can apply it in a stream to make it easy to build the map. The key and value end up in groups 1 and 2 below.
(.*?): reluctant capture of characters up to first :
.*?\\[ reluctant skipping of characters up to and including first [
(.*}) capture of remaining characters up to and including last }
String regex = "(.*?):Combine.*?\\[(.*})";
Pattern p = Pattern.compile(regex);
Map<String, String> results = Arrays.stream(strs)
.flatMap(st -> p.matcher(st).results())
.collect(Collectors
.toMap(m -> m.group(1), m -> m.group(2)));
results.entrySet().forEach(System.out::println);
Prints
key1={"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}
key2={"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"}
key3={"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"}
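To see why the key group uses a reluctant quantifier, it may help to compare it with the greedy form on one input line. This is a small illustration only; the class QuantifierDemo and its helper firstGroup are hypothetical names introduced for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting reluctant and greedy capture.
class QuantifierDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "key1:Combine(val -> [{\"id\":\"123\"}])";
        // Reluctant: stops at the first ':' and prints key1
        System.out.println(firstGroup("(.*?):", line));
        // Greedy: runs to the last ':' in the line and prints key1:Combine(val -> [{"id"
        System.out.println(firstGroup("(.*):", line));
    }
}
```

Because the JSON itself contains colons, the greedy version would swallow part of the value into the key.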

You indicated you wanted to avoid splitting, but your post shows an example that uses split, so I thought I would address that as well. You can split on multiple delimiters using the alternation (|) operator. In the following case the split produces an array of size 3; only the first two values are used as key and value.
String regex = ":Combine.*?\\[|(?:\\])";
String s =
"key1:Combine(val -> [{\"id\":\"123\",\"pid\":\"Xd34d\"},{\"id\":\"124\",\"pid\":\"sdDfsd\"}])";
String [] parts = s.split(regex);
for (int i = 0; i < parts.length; i++) {
System.out.printf("parts[%d] -> %s%n", i, parts[i]);
}
prints
parts[0] -> key1
parts[1] -> {"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}
parts[2] -> )
And using it in a stream to create a map:
Map<String, String> results = Arrays.stream(strs)
.map(st -> st.split(regex))
.collect(Collectors.toMap(a -> a[0], a -> a[1]));
results.entrySet().forEach(System.out::println);
prints
key1={"id":"123","pid":"Xd34d"},{"id":"124","pid":"sdDfsd"}
key2={"id":"211","pid":"Xd34d"},{"id":"223","pid":"sdDfsd"}
key3={"id":"423","pid":"Xd34d"},{"id":"454","pid":"sdDfsd"}

Related

Java-Stream - Split, group and map the data from a String using a single Stream

I have a string as below:
String data = "010$$fengtai,010$$chaoyang,010$$haidain,027$$wuchang,027$$hongshan,027$$caidan,021$$changnin,021$$xuhui,020$$tianhe";
And I want to convert it into a map of type Map<String,List<String>> (as shown below) by performing the following steps:
first split the string by , and then split by $$;
the substring before $$ serves as the key while grouping the data, and the substring after $$ needs to be placed into a list, which becomes the value of the map.
Example of the resulting Map:
{
027=[wuchang, hongshan, caidan],
020=[tianhe],
010=[fengtai, chaoyang, haidain],
021=[changnin, xuhui]
}
I've used a traditional way of achieving this:
private Map<String, List<String>> parseParametersByIterate(String sensors) {
List<String[]> dataList = Arrays.stream(sensors.split(","))
.map(s -> s.split("\\$\\$"))
.collect(Collectors.toList());
Map<String, List<String>> resultMap = new HashMap<>();
for (String[] d : dataList) {
List<String> list = resultMap.get(d[0]);
if (list == null) {
list = new ArrayList<>();
list.add(d[1]);
resultMap.put(d[0], list);
} else {
list.add(d[1]);
}
}
return resultMap;
}
But it seems complicated and verbose, so I want to implement this logic as a one-liner (i.e. a single stream statement).
What I have tried so far is below
Map<String, List<String>> result = Arrays.stream(data.split(","))
.collect(Collectors.groupingBy(s -> s.split("\\$\\$")[0]));
But the output doesn't match the one I want to have. How can I generate a Map structured as described above?

You simply need to map the values of the mapping. You can do that by specifying a second argument to Collectors.groupingBy:
Collectors.groupingBy(s -> s.split("\\$\\$")[0],
Collectors.mapping(s -> s.split("\\$\\$")[1],
Collectors.toList()
))
Instead of splitting twice, you can split first and group afterwards:
Arrays.stream(data.split(","))
.map(s -> s.split("\\$\\$"))
.collect(Collectors.groupingBy(s -> s[0],
Collectors.mapping(s -> s[1],Collectors.toList())
));
Which now outputs:
{027=[wuchang, hongshan, caidan], 020=[tianhe], 021=[changnin, xuhui], 010=[fengtai, chaoyang, haidain]}
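The output above comes out in HashMap order. If sorted keys are preferred, the three-argument form of Collectors.groupingBy accepts a map supplier. A minimal sketch, assuming every entry contains $$ (otherwise parts[1] would fail); the class name SortedGrouping is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical variant of the answer above that collects into a TreeMap.
class SortedGrouping {
    static Map<String, List<String>> group(String data) {
        return Arrays.stream(data.split(","))
                .map(s -> s.split("\\$\\$", 2))      // [key, value]
                .collect(Collectors.groupingBy(
                        parts -> parts[0],
                        TreeMap::new,                // keys sorted lexicographically
                        Collectors.mapping(parts -> parts[1], Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(group("010$$fengtai,027$$wuchang,010$$chaoyang,020$$tianhe"));
    }
}
```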

You can extract the required information from the string in a single pass, using the regex engine only once and without allocating intermediate arrays, instead of making multiple String.split() calls (first by comma, then by $$).
Since you're already using regular expressions (interpreting "\\$\\$" requires the regex engine), it makes sense to use them to their full power.
Matcher.results()
We can define the following Pattern that captures the pieces you're interested in:
public static final Pattern DATA = // use the proper name to describe a piece of information (like "027$$hongshan") that the pattern captures
Pattern.compile("(\\d+)\\$\\$(\\w+)");
Using this pattern, we can create a Matcher and call the Java 9 method Matcher.results(), which produces a stream of MatchResult objects.
MatchResult is an object encapsulating information about the captured sequence of characters. We can access the groups using method MatchResult.group().
private static Map<String, List<String>> parseParametersByIterate(String sensors) {
return DATA.matcher(sensors).results() // Stream<MatchResult>
.collect(Collectors.groupingBy(
matchResult -> matchResult.group(1), // extracting "027" from "027$$hongshan"
Collectors.mapping(
matchResult -> matchResult.group(2), // extracting "hongshan" from "027$$hongshan"
Collectors.toList())
));
}
main()
public static void main(String[] args) {
String data = "010$$fengtai,010$$chaoyang,010$$haidain,027$$wuchang,027$$hongshan,027$$caidan,021$$changnin,021$$xuhui,020$$tianhe";
parseParametersByIterate(data)
.forEach((k, v) -> System.out.println(k + " -> " + v));
}
Output:
027 -> [wuchang, hongshan, caidan]
020 -> [tianhe]
021 -> [changnin, xuhui]
010 -> [fengtai, chaoyang, haidain]
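Matcher.results() requires Java 9. On Java 8 the same single-pass idea can be written with a plain find() loop. This is a sketch only; the class name FindLoopGrouping is hypothetical, and a LinkedHashMap is used here so that keys keep their order of first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java 8 compatible version of the regex-based grouping.
class FindLoopGrouping {
    private static final Pattern DATA = Pattern.compile("(\\d+)\\$\\$(\\w+)");

    static Map<String, List<String>> group(String sensors) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = DATA.matcher(sensors);
        while (m.find()) {
            // group(1) is the code before "$$", group(2) the name after it
            result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group("010$$fengtai,027$$wuchang,010$$chaoyang"));
    }
}
```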

Replace a map of values in string

Let's say I have a String text = "abc" and I want to replace a map of values, eg:
a->b
b->c
c->a
How would you go for it?
Because obviously:
map.entrySet().forEach(el -> text = text.replaceAll(el.getKey(), el.getValue()))
won't work, since the second replacement also overwrites the result of the first (and in the end you won't get bca).
So how would you avoid this "replacement of the previous replacement"?
I saw this answer, but I'm hoping for a more concise and simple solution (ideally without external Apache packages).
By the way, the strings can also be longer than one character.

I came up with this solution with java streams.
String text = "abc";
Map<String, String> replaceMap = new HashMap<>();
replaceMap.put("a", "b");
replaceMap.put("b", "c");
replaceMap.put("c", "a");
System.out.println("Text = " + text);
text = Arrays.stream(text.split("")).map(x -> {
String replacement = replaceMap.get(x);
if (replacement != null) {
return x.replace(x, replacement);
} else {
return x;
}
}).collect(Collectors.joining(""));
System.out.println("Processed Text = " + text);
Output
Text = abc
Processed Text = bca

This is a problem I'd normally handle with regex replacement. The code for that in Java is a bit verbose, but this should work:
String text = "abc";
Map<String, String> map = new HashMap<>();
map.put("a", "b");
map.put("b", "c");
map.put("c", "a");
String regex = map.keySet()
.stream()
.map(s -> Pattern.quote(s))
.collect(Collectors.joining("|"));
String output = Pattern.compile(regex)
.matcher(text)
.replaceAll((m) -> {
String s = m.group();
String r = map.get(s);
return r != null ? r : s;
});
System.out.println(output);
// bca
It's relatively straightforward, if a little verbose. First, create a regex that accepts any of the keys in the map (using Pattern.quote() to escape them), and then use a lambda replacement to look up the appropriate replacement in the map whenever a match is found.
The performance-intensive part is just compiling the regex in the first place; the replacement itself should make only one pass through the string.
This requires Java 9 or later, since Matcher.replaceAll(Function) was added in Java 9.
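Two details may matter once keys and values are longer than one character. The string returned from the replacement function is still interpreted as a replacement string, so $ and \ in a value need Matcher.quoteReplacement(); and when one key is a prefix of another, the order of the alternatives decides which one wins. The sketch below handles both under the assumption that the longest key should be preferred; the class name MapReplacer is hypothetical.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: replaces every key found in the text in one pass (Java 9+).
class MapReplacer {
    static String replaceAll(String text, Map<String, String> map) {
        if (map.isEmpty()) {
            return text; // an empty alternation would match at every position
        }
        String regex = map.keySet().stream()
                // Longest keys first, so "ab" is tried before "a"
                .sorted(Comparator.<String>comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(regex)
                .matcher(text)
                // quoteReplacement keeps '$' and '\' in values literal
                .replaceAll(m -> Matcher.quoteReplacement(map.get(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("abc", Map.of("a", "b", "b", "c", "c", "a"))); // prints bca
    }
}
```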

From Java 8 onwards, String has a chars() method that returns an IntStream of the string's char values; you can convert each value back to a character and look it up in your map.
If your map is a String-to-String map, you could use:
text = text.chars().mapToObj(el -> String.valueOf((char) el))
.map(s -> map.getOrDefault(s, s)) // keep characters that have no mapping
.collect(Collectors.joining());
If your map is Character to Character, look up the char directly and convert the result to a String:
text = text.chars().mapToObj(el -> String.valueOf(map.getOrDefault((char) el, (char) el)))
.collect(Collectors.joining());

Collect key values from array to map without duplicates

My app gets a string from a web service. It looks like this:
name=Raul&city=Paris&id=167136
I want to get map from this string:
{name=Raul, city=Paris, id=167136}
Code:
Arrays.stream(input.split("&"))
.map(sub -> sub.split("="))
.collect(Collectors.toMap(string -> string[0], string -> string[1]));
It works in most cases, but the app can get a string with duplicate keys, like this:
name=Raul&city=Paris&id=167136&city=Oslo
The app then crashes with the following uncaught exception:
Exception in thread "main" java.lang.IllegalStateException: Duplicate key city (attempted merging values Paris and Oslo)
I tried to change collect method:
.collect(Collectors.toMap(tokens -> tokens[0], tokens -> tokens[1]), (r, strings) -> strings[0]);
But the compiler says no:
Cannot resolve method 'collect(java.util.stream.Collector<T,capture<?>,java.util.Map<K,U>>, <lambda expression>)'
And Array type expected; found: 'T'
I guess it's because I have an array. How do I fix it?

You are misunderstanding the final argument of toMap (the merge operator). It must be passed to toMap itself, not to collect. When toMap finds a duplicate key, it hands the value currently in the map and the new value for the same key to the merge operator, which produces the single value to store.
For example, if you want to just store the first value found then use (s1, s2) -> s1. If you want to comma separate them, use (s1, s2) -> s1 + ", " + s2.
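Put together with the code from the question, the two merge functions mentioned above might look like this. It is a sketch rather than the only way to write it: the class and method names are hypothetical, and the LinkedHashMap supplier is an extra choice made here so that the printed order follows the input.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper showing where the merge function goes in toMap.
class QueryStringParser {
    // Keeps the first value seen for a duplicate key.
    static Map<String, String> keepFirst(String input) {
        return Arrays.stream(input.split("&"))
                .map(pair -> pair.split("=", 2))
                .filter(kv -> kv.length == 2)
                .collect(Collectors.toMap(kv -> kv[0], kv -> kv[1],
                        (first, second) -> first,
                        LinkedHashMap::new));
    }

    // Joins all values of a duplicate key with ", ".
    static Map<String, String> joinAll(String input) {
        return Arrays.stream(input.split("&"))
                .map(pair -> pair.split("=", 2))
                .filter(kv -> kv.length == 2)
                .collect(Collectors.toMap(kv -> kv[0], kv -> kv[1],
                        (s1, s2) -> s1 + ", " + s2,
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String input = "name=Raul&city=Paris&id=167136&city=Oslo";
        System.out.println(keepFirst(input)); // prints {name=Raul, city=Paris, id=167136}
        System.out.println(joinAll(input));   // prints {name=Raul, city=Paris, Oslo, id=167136}
    }
}
```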

If you want to keep all values of duplicated keys and group them by key (since the app can get a string with duplicate keys), you can use Collectors.groupingBy with a custom collector (Collector.of(...)) instead of Collectors.toMap():
String input = "name=Raul&city=Paris&city=Berlin&id=167136&id=03&id=505";
Map<String, Set<Object>> result = Arrays.stream(input.split("&"))
.map(splitString -> splitString.split("="))
.filter(keyValuePair -> keyValuePair.length == 2)
.collect(
Collectors.groupingBy(array -> array[0], Collector.of(
() -> new HashSet<>(), (set, array) -> set.add(array[1]),
(left, right) -> {
if (left.size() < right.size()) {
right.addAll(left);
return right;
} else {
left.addAll(right);
return left;
}
}, Collector.Characteristics.UNORDERED)
)
);
This way you'll get :
result => size = 3
"city" -> size = 2 ["Berlin", "Paris"]
"name" -> size = 1 ["Raul"]
"id" -> size = 3 ["167136","03","505"]

You can achieve a similar result using Kotlin collections (toMap() keeps the last value for a duplicate key):
val res = message
.split("&")
.map {
val entry = it.split("=")
Pair(entry[0], entry[1])
}
println(res)
println(res.toMap()) //distinct by key
The result is
[(name, Raul), (city, Paris), (id, 167136), (city, Oslo)]
{name=Raul, city=Oslo, id=167136}

split string and store it into HashMap java 8

I want to split below string and store it into HashMap.
String responseString = "name~peter-add~mumbai-md~v-refNo~";
First I split the string using the hyphen (-) as delimiter and store the tokens in an ArrayList as below:
public static List<String> getTokenizeString(String delimitedString, char separator) {
final Splitter splitter = Splitter.on(separator).trimResults();
final Iterable<String> tokens = splitter.split(delimitedString);
final List<String> tokenList = new ArrayList<String>();
for(String token: tokens){
tokenList.add(token);
}
return tokenList;
}
List<String> list = MyClass.getTokenizeString(responseString, '-');
and then use the code below to convert it to a HashMap using a stream.
Map<String, String> map = list.stream()
.collect(Collectors.toMap(k -> k.split("~")[0], v -> v.split("~")[1]));
The stream collector doesn't work because there is no value for refNo.
It works correctly if I have an even number of elements (every key has a value).
Is there any way to handle this? Also, please suggest how I can do both tasks with Java 8 streams (I don't want to use the getTokenizeString() method).

Unless Splitter is doing any magic, the getTokenizeString method is unnecessary here. You can perform the entire processing as a single operation:
Map<String,String> map = Pattern.compile("\\s*-\\s*")
.splitAsStream(responseString.trim())
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length>1? a[1]: ""));
By using the regular expression \s*-\s* as separator, you are considering white-space as part of the separator, hence implicitly trimming the entries. There’s only one initial trim operation before processing the entries, to ensure that there is no white-space before the first or after the last entry.
Then, simply split the entries in a map step before collecting into a Map.
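To make the implicit trimming concrete, here is the same pipeline wrapped in a small runnable class and fed an input with stray spaces around the hyphens. The class name TildePairs and the sample input are assumptions for illustration only.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical wrapper around the splitAsStream pipeline shown above.
class TildePairs {
    // Whitespace around '-' is part of the separator, so entries come out trimmed.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*-\\s*");

    static Map<String, String> parse(String response) {
        return SEPARATOR.splitAsStream(response.trim())
                .map(entry -> entry.split("~", 2))   // limit 2 keeps a trailing empty value
                .collect(Collectors.toMap(kv -> kv[0], kv -> kv.length > 1 ? kv[1] : ""));
    }

    public static void main(String[] args) {
        Map<String, String> map = parse(" name~peter - add~mumbai -md~v- refNo~ ");
        map.forEach((k, v) -> System.out.println("'" + k + "' -> '" + v + "'"));
    }
}
```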

First of all, you don't have to split the same String twice.
Second of all, check the length of the array to determine if a value is present for a given key.
Map<String, String> map =
list.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
This assumes you want to store the key with an empty-string value when a key has no corresponding value.
Or you can skip the list variable:
Map<String, String> map1 =
MyClass.getTokenizeString(responseString, '-')
.stream()
.map(s -> s.split("~"))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));

private final String dataSheet = "103343262,6478342944, 103426540,84528784843, 103278808,263716791426, 103426733,27736529279, "
+ "103426000,27718159078, 103218982,19855201547, 103427376,27717278645, "
+ "103243034,81667273413";
final int chunk = 2;
AtomicInteger counter = new AtomicInteger();
Map<String, String> pairs = Arrays.stream(dataSheet.split(","))
.map(String::trim)
.collect(Collectors.groupingBy(i -> counter.getAndIncrement() / chunk))
.values()
.stream()
.collect(Collectors.toMap(k -> k.get(0), v -> v.get(1)));
result:
pairs =
"103218982" -> "19855201547"
"103278808" -> "263716791426"
"103243034" -> "81667273413"
"103426733" -> "27736529279"
"103426540" -> "84528784843"
"103427376" -> "27717278645"
"103426000" -> "27718159078"
"103343262" -> "6478342944"
We need to group every 2 elements into key/value pairs, so we partition the list into chunks of 2: (counter.getAndIncrement() / 2) yields the same number for every 2 consecutive calls, e.g.:
IntStream.range(0,6).forEach((i)->System.out.println(counter.getAndIncrement()/2));
prints:
0
0
1
1
2
2
You may use the same idea to partition list into chunks.
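The counter-based grouping depends on a mutable counter shared by the stream, so it assumes the elements are processed sequentially and in order. An alternative that avoids the shared state is to index into the token array directly. This is a sketch under that assumption; the class name IndexPairing is hypothetical, and a trailing unpaired token is simply ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical index-based variant of the chunking idea.
class IndexPairing {
    static Map<String, String> pair(String csv) {
        String[] tokens = Arrays.stream(csv.split(","))
                .map(String::trim)
                .toArray(String[]::new);
        // Pair i uses tokens[2i] as key and tokens[2i + 1] as value.
        return IntStream.range(0, tokens.length / 2)
                .boxed()
                .collect(Collectors.toMap(
                        i -> tokens[2 * i],
                        i -> tokens[2 * i + 1],
                        (oldValue, newValue) -> newValue, // last value wins for a repeated key
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        System.out.println(pair("103343262,6478342944, 103426540,84528784843"));
    }
}
```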

Another short way to do it:
String responseString = "name~peter-add~mumbai-md~v-refNo~";
Map<String, String> collect = Arrays.stream(responseString.split("-"))
.map(s -> s.split("~", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length > 1 ? a[1] : ""));
System.out.println(collect);
First you split the String on -, then map(s -> s.split("~", 2)) turns it into a Stream<String[]> like [name, peter][add, mumbai][md, v][refNo, ], and finally toMap collects it, with a[0] as the key and a[1] as the value.

String manipulation in Java 8 Streams

I have a stream of Strings like:
Token1:Token2:Token3
Here ':' is the delimiter character. Token3 may itself contain the delimiter character, or may be absent.
We have to convert this stream into a map with Token1 as the key and an array of two strings as the value: array[0] = Token2 and array[1] = Token3 if Token3 is present, else null.
I have tried something like:
return Arrays.stream(inputArray)
.map( elem -> elem.split(":"))
.filter( elem -> elem.length==2 )
.collect(Collectors.toMap( e-> e[0], e -> {e[1],e[2]}));
But it didn't work. Besides, it doesn't handle the case where Token3 is absent or contains the delimiter character.
How can I accomplish this with Java 8 lambda expressions?

You can map every input string to a regex Matcher, keep only those that actually match, and collect with the toMap collector using the Matcher.group() method:
Map<String, String[]> map = Arrays.stream(inputArray)
.map(Pattern.compile("([^:]++):([^:]++):?(.+)?")::matcher)
.filter(Matcher::matches)
.collect(Collectors.toMap(m -> m.group(1), m -> new String[] {m.group(2), m.group(3)}));
Full test:
String[] inputArray = {"Token1:Token2:Token3:other",
"foo:bar:baz:qux", "test:test"};
Map<String, String[]> map = Arrays.stream(inputArray)
.map(Pattern.compile("([^:]++):([^:]++):?(.+)?")::matcher)
.filter(Matcher::matches)
.collect(Collectors.toMap(m -> m.group(1), m -> new String[] {m.group(2), m.group(3)}));
map.forEach((k, v) -> {
System.out.println(k+" => "+Arrays.toString(v));
});
Output:
test => [test, null]
foo => [bar, baz:qux]
Token1 => [Token2, Token3:other]
The same problem can be solved with String.split as well. You just need to use the two-argument version of split and specify the maximum number of parts you want:
Map<String, String[]> map = Arrays.stream(inputArray)
.map(elem -> elem.split(":", 3)) // 3 means that no more than 3 parts are necessary
.filter(elem -> elem.length >= 2)
.collect(Collectors.toMap(m -> m[0],
m -> new String[] {m[1], m.length > 2 ? m[2] : null}));
The result is the same.

You can achieve what you want with the following:
return Arrays.stream(inputArray)
.map(elem -> elem.split(":", 3)) // split into at most 3 parts
.filter(arr -> arr.length >= 2) // discard invalid input (?)
.collect(Collectors.toMap(arr -> arr[0], arr -> Arrays.copyOfRange(arr, 1, 3))); // will add null as the second element if the array length is 2
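The null padding mentioned in the last comment comes from Arrays.copyOfRange, which allows the end index to lie beyond the source array and fills the extra slots with null. A tiny demonstration, with CopyOfRangeDemo and valuePart as hypothetical names:

```java
import java.util.Arrays;

// Hypothetical demo of how copyOfRange pads a missing Token3 with null.
class CopyOfRangeDemo {
    // Returns [Token2, Token3], where Token3 is null when the line has only two parts.
    static String[] valuePart(String line) {
        String[] parts = line.split(":", 3);   // at most 3 parts; Token3 keeps its own ':'
        return Arrays.copyOfRange(parts, 1, 3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(valuePart("test:test")));       // prints [test, null]
        System.out.println(Arrays.toString(valuePart("foo:bar:baz:qux"))); // prints [bar, baz:qux]
    }
}
```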
