Split String by | and numbers

Let's imagine I have the following strings:
String one = "123|abc|123abc";
String two = "123|ab12c|abc|456|abc|def";
String three = "123|1abc|1abc1|456|abc|wer";
String four = "123|abc|def|456|ghi|jkl|789|mno|pqr";
If I do a split on them I expect the following output:
one = ["123|abc|123abc"];
two = ["123|ab12c|abc", "456|abc|def"];
three = ["123|1abc|1abc1", "456|abc|wer"];
four = ["123|abc|def", "456|ghi|jkl", "789|mno|pqr"];
The string has the following structure:
Each value starts with one or more digits, followed by any number of groups, where a group is a | followed by any number of characters.
When the text after a | consists only of digits, it starts a new value.
More examples:
In - 123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty
Out - ["123456|xxxxxx|zzzzzzz|xa2314|xzxczxc", "1234|qwerty"]
I tried multiple variations of the following, but none of them work:
value.split( "\\|\\d+|\\d+" )

You may split on \|(?=\d+(?:\||$)):
List<String> nums = Arrays.asList(new String[] {
"123|abc|123abc",
"123|ab12c|abc|456|abc|def",
"123|1abc|1abc1|456|abc|wer",
"123|abc|def|456|ghi|jkl|789|mno|pqr"
});
for (String num : nums) {
String[] parts = num.split("\\|(?=\\d+(?:\\||$))");
System.out.println(num + " => " + Arrays.toString(parts));
}
This prints:
123|abc|123abc => [123|abc|123abc]
123|ab12c|abc|456|abc|def => [123|ab12c|abc, 456|abc|def]
123|1abc|1abc1|456|abc|wer => [123|1abc|1abc1, 456|abc|wer]
123|abc|def|456|ghi|jkl|789|mno|pqr => [123|abc|def, 456|ghi|jkl, 789|mno|pqr]
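As a quick check, the same split can be wrapped in a small helper and run against the extra example from the question. This is only a sketch; the class and method names (PipeSplitDemo, splitRecords) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class PipeSplitDemo {
    // Hypothetical helper: splits at every | that is followed by a digits-only field.
    static List<String> splitRecords(String value) {
        // The lookahead (?=\d+(?:\||$)) only peeks at the next field,
        // so the digits themselves stay in the result.
        return Arrays.asList(value.split("\\|(?=\\d+(?:\\||$))"));
    }

    public static void main(String[] args) {
        String in = "123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty";
        // prints [123456|xxxxxx|zzzzzzz|xa2314|xzxczxc, 1234|qwerty]
        System.out.println(splitRecords(in));
    }
}
```

Because only the | is consumed, a field such as xa2314 that merely contains digits does not trigger a split.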

Instead of splitting, you can match the parts in the string:
\b\d+(?:\|(?!\d+(?:$|\|))[^|\r\n]+)*
\b A word boundary
\d+ Match 1+ digits
(?: Non capture group
\|(?!\d+(?:$|\|)) Match | and assert not only digits till either the next pipe or the end of the string
[^|\r\n]+ Match 1+ chars other than a pipe or a newline
)* Close the non capture group and repeat it zero or more times (use + instead, as the Java code below does, to require at least one pipe char)
String regex = "\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+";
String string = "123|abc|def|456|ghi|jkl|789|mno|pqr";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(string);
List<String> matches = new ArrayList<String>();
while (m.find())
matches.add(m.group());
for (String s : matches)
System.out.println(s);
Output
123|abc|def
456|ghi|jkl
789|mno|pqr
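The difference between the * and + quantifiers mentioned in the breakdown only shows up when a value consists of digits alone. A minimal sketch, assuming Java 9+ for Matcher.results(); the class name, the helper, and the input "123|456|abc" are hypothetical and not from the question:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MatchPartsDemo {
    // Same pattern as in the answer; only the final quantifier differs.
    static final Pattern ZERO_OR_MORE =
            Pattern.compile("\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)*");
    static final Pattern ONE_OR_MORE =
            Pattern.compile("\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+");

    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        return pattern.matcher(input)
                .results()                 // Java 9 or later
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "123|456|abc"; // hypothetical: the first value has no further fields
        System.out.println(findAll(ZERO_OR_MORE, input)); // [123, 456|abc]
        System.out.println(findAll(ONE_OR_MORE, input));  // [456|abc]
    }
}
```

With +, a value that is only digits is skipped entirely, so * is the safer choice if such values can occur.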

Related

Java Regex Matcher skipping the matches

Below is my Java code to delete all pairs of adjacent matching letters, but I am running into problems with the Java Matcher class.
My Approach
I am trying to find all successive repeated characters in the input e.g.
aaa, bb, ccc, ddd
Next, replace each odd-length match with a single occurrence of the repeated character and each even-length match with "", i.e.
aaa -> a
bb -> ""
ccc -> c
ddd -> d
s occurs only once, so it is not matched by the regex pattern and is excluded from the substitution.
I am calling Matcher.appendReplacement to do conditional replacement of the patterns matched in input, based on the group length (even or odd).
Code:
public static void main(String[] args) {
String s = "aaabbcccddds";
int i=0;
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("([a-z])\\1+");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(i).length()%2==0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
i++;
}
m.appendTail(output);
System.out.println(output);
}
Input : aaabbcccddds
Actual Output : aaabbcccds (only replacing ddd with d but skipping aaa, bb and ccc)
Expected Output : acds

This can be done in a single replaceAll call like this:
String repl = str.replaceAll( "(?:(.)\\1)+", "" );
The regex (?:(.)\1)+ matches every stretch made of pairs of the same character (an even number of repetitions) and replaces it with the empty string, thus leaving a single character from each odd-length run.
Code using Pattern and Matcher:
final Pattern p = Pattern.compile( "(?:(.)\\1)+" );
Matcher m = p.matcher( "aaabbcccddds" );
String repl = m.replaceAll( "" );
//=> acds
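To see how the pair-removing regex behaves on a few more inputs, here is a small sketch. The class and method names are hypothetical, and the inputs other than aaabbcccddds are made up for illustration.

```java
class EvenRunsDemo {
    // Removes every stretch that consists of pairs of the same character.
    static String removePairs(String input) {
        return input.replaceAll("(?:(.)\\1)+", "");
    }

    public static void main(String[] args) {
        System.out.println(removePairs("aaabbcccddds")); // acds
        System.out.println(removePairs("aabbb"));        // b
        System.out.println(removePairs("abc"));          // abc
    }
}
```

Note that this is a single pass: for an input such as abba the result is aa, because the two a characters only become adjacent after bb has been removed.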

You can try it like this:
public static void main(String[] args) {
String s = "aaabbcccddds";
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(\\w)(\\1+)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(2).length()%2!=0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
}
m.appendTail(output);
System.out.println(output);
}
It is similar to yours, but group 1 on its own holds only the first character, so its length is always 1. That's why I introduce a second group, which holds the rest of the adjacent characters. Since its length is one less than the length of the run, I reverse the odd/even logic, and voilà:
acds
is printed.
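Another way to keep the loop from the question is to measure the whole match with m.group() (group 0) instead of m.group(i) with a changing index. A minimal sketch with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunCollapseDemo {
    // Even-length runs are dropped, odd-length runs are reduced to one character.
    static String collapse(String s) {
        StringBuffer output = new StringBuffer();
        Matcher m = Pattern.compile("([a-z])\\1+").matcher(s);
        while (m.find()) {
            // m.group() is the whole run, e.g. "aaa"; $1 is the repeated character.
            String replacement = (m.group().length() % 2 == 0) ? "" : "$1";
            m.appendReplacement(output, replacement);
        }
        m.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaabbcccddds")); // acds
    }
}
```

This keeps the single capturing group and the original even/odd test, so only the group index changes compared with the question's code.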

You don't need multiple if statements. Try:
(?:(\w)(?:\1\1)+|(\w)\2+)(?!\1|\2)
Replace with $1
Java code:
str.replaceAll("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)", "$1");
Regex breakdown:
(?: Start of non-capturing group
(\w) Capture a word character
(?:\1\1)+ Match one or more further pairs of the same character (an odd run length in total)
| Or
(\w) Capture a word character
\2+ Match one or more further occurrences of the same character
) End of non-capturing group
(?!\1|\2) Not followed by the previously captured character
Using Pattern and Matcher with StringBuffer:
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) m.appendReplacement(output, "$1");
m.appendTail(output);
System.out.println(output);

regex double split

What should a Java regex look like if I want to find two matches
1. NEW D City
2. 1259669
From
Object No: NEW D City | Item ID: 1259669
I tried with
(?<=:\s)\w+
but it only gets
1. NEW
2. 1259669
https://regex101.com/r/j5jwK2/1

Using a pattern to capture both values is simpler. Here is the regex used:
Object No: ([^|]*)\| Item ID: (\d*)
And here is code generated by regex101, adapted to produce the output you want.
final String regex = "Object No: ([^|]*)\\| Item ID: (\\d*)";
final String string = "Object No: NEW D City | Item ID: 1259669";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(i + ": " + matcher.group(i));
}
}
Output:
1: NEW D City
2: 1259669
A similar but more generic solution would be [^:]*[:\s]*([^|]*)\|[^:]*[:\s]*(\d*) (not perfect; I didn't try to make it efficient).
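One detail worth checking: [^|]* also captures the space in front of the |, so Group 1 is "NEW D City " with a trailing space. A small sketch that trims it; the class name and the extract helper are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ObjectLineDemo {
    private static final Pattern LINE =
            Pattern.compile("Object No: ([^|]*)\\| Item ID: (\\d*)");

    // Returns {object number, item id}, or an empty array if the line does not match.
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        // trim() removes the space that [^|]* picks up before the pipe.
        return new String[] { m.group(1).trim(), m.group(2) };
    }

    public static void main(String[] args) {
        String[] values = extract("Object No: NEW D City | Item ID: 1259669");
        System.out.println("[" + values[0] + "]"); // [NEW D City]
        System.out.println("[" + values[1] + "]"); // [1259669]
    }
}
```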

You may use a combination of two splits:
String key = "Object No: NEW D City | Item ID: 1259669";
String[] parts = key.split("\\s*\\|\\s*");
List<String> result = new ArrayList<>();
for (String part : parts) {
String[] kvp = part.split(":\\s*");
if (kvp.length == 2) {
result.add(kvp[1]);
System.out.println(kvp[1]); // demo
}
}
First, you split with \s*\|\s* (a | surrounded by zero or more whitespace characters) and then with :\s*, a colon followed by zero or more whitespace characters.
Another approach is to use the :\s*([^|]+) pattern and take the trimmed Group 1 value:
String s = "Object No: NEW D City | Item ID: 1259669";
List<String> result = new ArrayList<>();
Pattern p = Pattern.compile(":\\s*([^|]+)");
Matcher m = p.matcher(s);
while(m.find()) {
result.add(m.group(1).trim());
System.out.println(m.group(1).trim()); // For demo
}
In this regex, ([^|]+) is a capturing group (its contents are available via matcher.group(1)) that matches one or more (+) chars other than | (the [^|] negated character class).
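The matching approach can be packaged as a reusable method. The following is a sketch under the assumption that every value follows a colon and ends at the next | or at the end of the string; the names ColonValueDemo and valuesOf are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonValueDemo {
    private static final Pattern VALUE = Pattern.compile(":\\s*([^|]+)");

    // Collects the trimmed text between each colon and the next pipe (or end of string).
    static List<String> valuesOf(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = VALUE.matcher(s);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [NEW D City, 1259669]
        System.out.println(valuesOf("Object No: NEW D City | Item ID: 1259669"));
    }
}
```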

What is the best regex which can split list of http headers?

My header list format is string of:
"headerName1:value1,headerName2:value2,headerName3:value3,..."
So since a comma can be present in headers, splitting using it might be a problem.
So what would be the characters that might not be present within the headers which I can use for splitting?
This is my code:
public List<Header> getHeaders(String headers) {
List<Header> headersList = new ArrayList<>();
if (!"".equals(headers)) {
String[] spam = headers.split(",");
for (String aSpam : spam) {
String[] header = aSpam.split(":",2);
if (header.length > 1) {
headersList.add(new Header(header[0], header[1]));
} else {
throw new HTTPSinkAdaptorRuntimeException("Invalid format");
}
}
}
return headersList;
}
My desired output is an array, {"headerName1:value1", "headerName2:value2", "headerName3:value3", ...}
The problem is that it does not work well in a scenario like this:
"From: Donna Doe, chief bottle washer ,TO: John Doe, chief bottle washer "

I believe you want to extract any 1+ word chars before : as key and then any number of any chars before the end of string or the first sequence of 1+ word chars followed with :.
You may consider using
(\w+):([^,]*(?:,(?!\s*\w+:)[^,]*)*)
which is an unrolled variant of the (\w+):(.*?)(?=\s*\w+:|$) regex.
Details:
(\w+) - Group 1 (key)
: - a colon
([^,]*(?:,(?!\s*\w+:)[^,]*)*) - Group 2 (value):
[^,]* - zero or more chars other than ,
(?:,(?!\s*\w+:)[^,]*)* - zero or more sequences of:
,(?!\s*\w+:) - comma not followed with 0+ whitespaces and then 1+ word chars + :
[^,]* - zero or more chars other than ,
The (.*?)(?=\s*\w+:|$) is more readable, but less efficient. It captures into Group 2 any 0+ chars other than line break chars (with (.*?)), but as few as possible (due to *?) up to the first occurrence of end of string ($) or 0+ whitespaces + 1 or more word chars + : (with the (?=\s*\w+:|$) positive lookahead).
Java code:
Map<String,String> hash = new HashMap<>();
String s = "headerName1:va,lu,e1, headerName2:v,a,lue2,headerName3:valu,,e3,hn:dddd, ddd:val";
Pattern pattern = Pattern.compile("(\\w+):([^,]*(?:,(?!\\s*\\w+:)[^,]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
hash.put(matcher.group(1), matcher.group(2));
}
System.out.println(hash);
// => {headerName1=va,lu,e1, ddd=val, headerName2=v,a,lue2, hn=dddd, headerName3=valu,,e3}
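Applied to the problematic input from the question, the same pattern can be sketched as follows. The class and method names are hypothetical; a LinkedHashMap is used here (an assumption, not part of the answer) so that the headers keep their original order, and the values are trimmed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderSplitDemo {
    private static final Pattern HEADER =
            Pattern.compile("(\\w+):([^,]*(?:,(?!\\s*\\w+:)[^,]*)*)");

    // Maps each header name to its value; commas inside a value are kept.
    static Map<String, String> parse(String headers) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = HEADER.matcher(headers);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "From: Donna Doe, chief bottle washer ,TO: John Doe, chief bottle washer ";
        // prints {From=Donna Doe, chief bottle washer, TO=John Doe, chief bottle washer}
        System.out.println(parse(s));
    }
}
```

A value that itself contains a comma followed by a word and a colon will still be cut at that comma, since that is exactly what the lookahead treats as the start of the next header.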

regex to remove round brackets from a string

I have a string
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried following code
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern does not match.
Please Help
Thanks

Use
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity

You may use
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
Details
The "\\[{2}(.*)\\|(.*)]]" pattern used with matches() behaves like ^\[{2}(.*)\|(.*)]]\z: it matches a string that starts with [[, captures any 0 or more chars other than line break chars, as many as possible, into Group 1, then matches a |, captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]].
The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing all chunks of 1+ non-word characters with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()) and replacing all spaces with _ (.replace(" ", "_")) as the final touch.
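For comparison, the pattern in the question fails because \w* cannot cross the space in "unique identity". Below is a sketch that keeps the question's structure but captures the three pieces directly. The names are illustrative, and it assumes exactly one word before the parentheses and one word inside them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkDemo {
    // [[word (word)|label]] where the label may contain word characters and whitespace
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(\\w+)\\s\\((\\w+)\\)\\s*\\|([\\w\\s]*)\\]\\]");

    // Returns {"word_word", label}, or an empty array if the input has another shape.
    static String[] parse(String s) {
        Matcher m = LINK.matcher(s);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1) + "_" + m.group(2), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String[] parts = parse("[[Identity (philosophy)|unique identity]]");
        System.out.println(parts[0]); // Identity_philosophy
        System.out.println(parts[1]); // unique identity
    }
}
```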

Get Two Specific Word Using Regex and Save it to HashMap

I need a help,
I have a String like
LOCALHOST = https://192.168.56.1
I want to get "LOCALHOST" and the IP address and then save them to a HashMap.
This is my code so far. I don't know how to use regex, please help.
The output that I want is the HashMap {LOCALHOST=192.168.56.1}
public static void main(String[] args) {
try {
String line = "LOCALHOST = https://192.168.56.1";
//this should be a hash map
ArrayList<String> urls = new ArrayList<String>();
//didnt know how to get two string
Matcher m = Pattern.compile("([^ =]+)").matcher(line);
while (m.find()) {
urls.add(m.group());
}
System.out.println(urls);
} catch (Exception e) {
System.out.println("Error: " + e.getMessage());
}
}
Thank you for the help

To answer the question as per the title:
String line = "LOCALHOST = https://192.168.56.1";
Map<String, String> map = new HashMap<String, String>();
String[] parts = line.split(" *= *");
map.put(parts[0], parts[1]);
The regex splits on the equals sign and consumes any spaces around it too, so you don't have to trim the parts.
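With this split, the value stored is the full URL (https://192.168.56.1), while the question asks for {LOCALHOST=192.168.56.1}. One possible way to get there, sketched with hypothetical names, is to strip the scheme after splitting:

```java
import java.util.HashMap;
import java.util.Map;

class HostEntryDemo {
    // Splits "NAME = URL" and stores the URL without its http:// or https:// prefix.
    static Map<String, String> parse(String line) {
        Map<String, String> map = new HashMap<>();
        String[] parts = line.split(" *= *", 2); // limit 2: split at the first = only
        if (parts.length == 2) {
            map.put(parts[0], parts[1].replaceFirst("^https?://", ""));
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("LOCALHOST = https://192.168.56.1")); // {LOCALHOST=192.168.56.1}
    }
}
```

The length check avoids an ArrayIndexOutOfBoundsException for lines that contain no equals sign.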

Try something like the following:
final Pattern p = Pattern.compile("^(.+) = https:\\/\\/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})$");
final Matcher m = p.matcher(line);
final Map<String,String> map = new HashMap<String,String>();
if (m.matches())
{
final String lh = m.group(1);
final String ip = m.group(2);
map.put(lh,ip);
}
Learn to use a good interactive Regular Expression editor like the one at regex101.com
/^(.+) = https:\/\/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$/m
^ Start of line
1st Capturing group (.+)
.+ Any character (except newline), 1 to infinite times [greedy]
 = https:\/\/ Literal " = https://"
2nd Capturing group (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
\d{1,3} Digit [0-9], 1 to 3 times [greedy]
\. Literal .
\d{1,3} Digit [0-9], 1 to 3 times [greedy]
\. Literal .
\d{1,3} Digit [0-9], 1 to 3 times [greedy]
\. Literal .
\d{1,3} Digit [0-9], 1 to 3 times [greedy]
$ End of line
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
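In Java, the m modifier corresponds to the Pattern.MULTILINE flag. The sketch below uses it to read several entries from one string; the class name, the helper, and the second entry (SERVER) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostFileDemo {
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern ENTRY = Pattern.compile(
            "^(.+) = https:\\/\\/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})$",
            Pattern.MULTILINE);

    // Maps each name to its IP address, in the order the lines appear.
    static Map<String, String> parse(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        // The SERVER line is a hypothetical second entry.
        String text = "LOCALHOST = https://192.168.56.1\nSERVER = https://10.0.0.7";
        System.out.println(parse(text)); // {LOCALHOST=192.168.56.1, SERVER=10.0.0.7}
    }
}
```

Keep in mind that \d{1,3} only checks the shape of the address; a value such as 999.1.1.1 would also be accepted.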

Map<String, String> map = new HashMap<>();
String line = "LOCALHOST = https://192.168.56.1";
String[] s = line.split("=");
map.put(s[0].trim(), s[1].trim());

This is very simple and does not require Pattern/Matcher. Try this:
HashMap<String, String> x = new HashMap<String, String>();
String line = "LOCALHOST = https://192.168.56.1";
String[] items = line.split("=");
// HashMap has no add(); use put(), and trim the spaces around the = sign
x.put(items[0].trim(), items[1].trim());
