Split string in 3 blocks (Text - Number - Other characters) if possible - java

I need to divide a string into at most 3 blocks:
The first block contains only letters.
The second contains only numbers.
The third contains the remainder.
Examples:
Karbwqeaf 11D
Jablunkovska 21/2
Tastoor Nstraat 43
Schzelkjedow
Heajsd 3/5/7 m 344
Lasdasdt seavees 3., 729. tasdasd F 2.
ul. Pasydufasdfa 73k/120
I need to split like this:
Block1: Karbwqeaf
Block2: 11
Block3: D
Block1: Jablunkovska
Block2: 21
Block3: /2
Block1: Tastoor Nstraat
Block2: 43
Block3:
Block1: Schzelkjedow
Block2:
Block3:
Block1: Heajsd
Block2: 3
Block3: /5/7 m 344
Block1: Lasdasdt seavees
Block2: 3
Block3: ., 729. tasdasd F 2.
Block1: ul. Pasydufasdfa
Block2: 73
Block3: k/120
Below is my code, but I don't know how to change it so that all my requirements are met. Any ideas?
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> allAddresses = Arrays.asList("Karbwqeaf 11D", "Jablunkovska 21/2", "Tastoor Nstraat 43", "Schzelkjedow",
        "Heajsd 3/5/7 m 344", "Lasdasdt seavees 3., 729. tasdasd F 2.", "ul. Pasydufasdfa 73k/120");
// (.+) is greedy, so it only backtracks as far as the LAST "whitespace + digit";
// (\d) captures a single digit, and inputs without any digit never match.
Pattern pattern = Pattern.compile("(.+)\\s(\\d)(.*)");
for (String address : allAddresses) {
    Matcher matcher = pattern.matcher(address);
    if (matcher.matches()) {
        String block1 = matcher.group(1);
        String block2 = matcher.group(2);
        String block3 = matcher.group(3);
        System.out.println("block1 = " + block1);
        System.out.println("block2 = " + block2);
        System.out.println("block3 = " + block3);
    }
}

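You can use 3 capturing groups: the first matches 1 or more non-digit characters, the second matches 1 or more digits and the third matches any character 0+ times. The number part is optional, so an input without digits fills only the first group.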
^([^\d\r\n]+)(?:\h+(\d+)(.*))?$
Explanation
^ Start of string
( Capture group 1
[^\d\r\n]+ Match 1+ chars other than a digit or a newline
) Close group 1
(?: Non capture group
\h+ Match 1+ horizontal whitespace chars
(\d+)(.*) Capture 1 or more digits in group 2 and capture 0 or more times any character in group 3
)? Close the non capture group and make it optional
$ End of string
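The answer only gives the regex. Below is a minimal Java sketch of how it might be applied to the question's addresses; the class name `SplitBlocks` and the `split` helper are illustrative, not from the answer. Groups 2 and 3 sit inside the optional part, so they are null when an address has no number; the sketch maps that to an empty string. `\h` requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitBlocks {
    // The answer's regex; \h (horizontal whitespace) needs Java 8+.
    static final Pattern BLOCKS = Pattern.compile("^([^\\d\\r\\n]+)(?:\\h+(\\d+)(.*))?$");

    // Hypothetical helper: returns {block1, block2, block3}, or null if the
    // address does not match at all (for example, when it starts with a digit).
    static String[] split(String address) {
        Matcher m = BLOCKS.matcher(address);
        if (!m.matches()) {
            return null;
        }
        String[] blocks = new String[3];
        for (int i = 0; i < 3; i++) {
            // Groups inside the optional part are null when there is no number.
            String group = m.group(i + 1);
            blocks[i] = group == null ? "" : group;
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("Karbwqeaf 11D", "Jablunkovska 21/2",
                "Tastoor Nstraat 43", "Schzelkjedow", "Heajsd 3/5/7 m 344",
                "Lasdasdt seavees 3., 729. tasdasd F 2.", "ul. Pasydufasdfa 73k/120");
        for (String address : addresses) {
            String[] b = split(address);
            System.out.println("Block1: " + b[0]);
            System.out.println("Block2: " + b[1]);
            System.out.println("Block3: " + b[2]);
        }
    }
}
```

For "Heajsd 3/5/7 m 344" this gives "Heajsd", "3" and "/5/7 m 344", and for "Schzelkjedow" it gives the name plus two empty blocks, as in the expected output.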

Related

Split String by | and numbers

Let's imagine I have the following strings:
String one = "123|abc|123abc";
String two = "123|ab12c|abc|456|abc|def";
String three = "123|1abc|1abc1|456|abc|wer";
String four = "123|abc|def|456|ghi|jkl|789|mno|pqr";
If I do a split on them I expect the following output:
one = ["123|abc|123abc"];
two = ["123|ab12c|abc", "456|abc|def"];
three = ["123|1abc|1abc1", "456|abc|wer"];
four = ["123|abc|def", "456|ghi|jkl", "789|mno|pqr"];
The string has the following structure:
It starts with 1 or more digits, followed by any number of segments, each consisting of a | and any number of characters.
When the segment after a | consists only of digits, it starts a new value.
More examples:
In - 123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty
Out - ["123456|xxxxxx|zzzzzzz|xa2314|xzxczxc", "1234|qwerty"]
I tried multiple variations of the following, but they do not work:
value.split( "\\|\\d+|\\d+" )
You may split on \|(?=\d+(?:\||$)):
import java.util.Arrays;
import java.util.List;

List<String> nums = Arrays.asList(new String[] {
"123|abc|123abc",
"123|ab12c|abc|456|abc|def",
"123|1abc|1abc1|456|abc|wer",
"123|abc|def|456|ghi|jkl|789|mno|pqr"
});
for (String num : nums) {
String[] parts = num.split("\\|(?=\\d+(?:\\||$))");
System.out.println(num + " => " + Arrays.toString(parts));
}
This prints:
123|abc|123abc => [123|abc|123abc]
123|ab12c|abc|456|abc|def => [123|ab12c|abc, 456|abc|def]
123|1abc|1abc1|456|abc|wer => [123|1abc|1abc1, 456|abc|wer]
123|abc|def|456|ghi|jkl|789|mno|pqr => [123|abc|def, 456|ghi|jkl, 789|mno|pqr]
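If the split is needed in several places, one option is to compile the answer's lookahead regex once and reuse it through `Pattern.split`, since `String.split` recompiles a multi-character regex like this one on every call. The class and method names below are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitValues {
    // The answer's regex: a pipe, but only when the next field is all digits,
    // i.e. the digits are followed by another pipe or by the end of the input.
    static final Pattern NEW_VALUE = Pattern.compile("\\|(?=\\d+(?:\\||$))");

    static String[] split(String input) {
        return NEW_VALUE.split(input);
    }

    public static void main(String[] args) {
        String input = "123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty";
        System.out.println(input + " => " + Arrays.toString(split(input)));
    }
}
```

For the question's extra example this prints `[123456|xxxxxx|zzzzzzz|xa2314|xzxczxc, 1234|qwerty]`: the pipe before `xa2314` is not a split point because the lookahead needs the field to start with digits and end at a pipe or at the end of the input.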
Instead of splitting, you can match the parts in the string:
\b\d+(?:\|(?!\d+(?:$|\|))[^|\r\n]+)*
\b A word boundary
\d+ Match 1+ digits
(?: Non capture group
\|(?!\d+(?:$|\|)) Match | and assert not only digits till either the next pipe or the end of the string
[^|\r\n]+ Match 1+ chars other than a pipe or a newline
)* Close the non capture group and optionally repeat (use + to repeat one or more times to match at least one pipe char)
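import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+";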
String string = "123|abc|def|456|ghi|jkl|789|mno|pqr";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(string);
List<String> matches = new ArrayList<String>();
while (m.find())
matches.add(m.group());
for (String s : matches)
System.out.println(s);
Output
123|abc|def
456|ghi|jkl
789|mno|pqr

Regex not matching all numbers with delimiters

Need a single combined regex for the following pattern:
Prefix: 2221-2720 , Length: 16
Prefix: 51-55 , Length: 16
where the delimiters between digits can be a space ( ), minus sign (-), period (.), backslash (\) or equals sign (=). The condition is that no more than one delimiter (of the same or a different type) may occur between any two digits.
Valid number - 230.293.217.952.148.4
Valid number - 230.293 217-952.148.4
Invalid number - 230..293.217.952.148.4
Invalid number - 230.293.-217. 952.148.4
A valid input is one where you have 16 digits separated by any/no delimiters as long as there are no two delimiters adjacent to each other.
I have come up with the following regex:
(2[\s=\\.-]*2[\s=\\.-]*2[\s=\\.-]*[1-9][\s=\\.-]*|2[\s=\\.-]*2[\s=\\.-]*[3-9][\s=\\.-]*[0-9][\s=\\.-]*|2[\s=\\.-]*[3-6][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){1}|2[\s=\\.-]*7[\s=\\.-]*[01][\s=\\.-]*[0-9][\s=\\.-]*|2[\s=\\.-]*7[\s=\\.-]*2[\s=\\.-]*0[\s=\\.-]*)[0-9](?:[\s=\\.-]*[0-9]){11}|(5[\s=\\.-]*[1-5][\s=\\.-]*)[0-9](?:[\s=\\.-]*[0-9]){13}
It does not match certain patterns. For example:
2 3 0 2 9 3 2 1 7 9 5 2 1 4 8 4
23-02-93-21-79-52-14-84
2 3 0 3 4 5 8 0 9 4 9 3 0 8 2 3
For the same numbers, it matches (as expected) the following patterns:
2302932179521484
230.293.217.952.148.4
2303458094930823
230.345.809.493.082.3
230-345-809-493-082-3
There seems to be an issue with delimiters. Kindly let me know what is wrong with my regex.
For this rule
A valid input is one where you have 16 digits separated by any/no
delimiters as long as there are no two delimiters adjacent to each
other
Prefix: 2221-2720 , Length: 16
Prefix: 51-55 , Length: 16
2221 can also be written as 2.2-2.1
For these rules, it might be easier to write a pattern with 2 capture groups to match the whole string.
Then using some Java code, you can check the value of the capture groups for the ranges.
^((\d[ =\\.-]?\d)[ =\\.-]?\d[ =\\.-]?\d)(?:[ =\\.-]?\d){12}$
The pattern matches:
^ Start of string
( Capture group 1
(\d[ =\\.-]?\d) Capture group 2, match 2 digits with an optional delimiter (space, =, \, . or -) between them
[ =\\.-]?\d[ =\\.-]?\d Match twice: an optional delimiter followed by a single digit
) close group 1
(?:[ =\\.-]?\d){12} Repeat 12 times: an optional delimiter followed by a single digit
$ End of string
For example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] strings = {
"2221.7.952.148.412.32",
"230.293.217.952.148.4",
"5511111111111111",
"130.293 217-952.148.4",
"30..293.217.952.148.4",
"5..5",
".5.5."
};
String regex = "^((\\d[ =\\\\.-]?\\d)[ =\\\\.-]?\\d[ =\\\\.-]?\\d)(?:[ =\\\\.-]?\\d){12}$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
int grp1 = Integer.parseInt(matcher.group(1).replaceAll("\\D+", ""));
int grp2 = Integer.parseInt(matcher.group(2).replaceAll("\\D+", ""));
if ((grp1 >= 2221 && grp1 <= 2720) || (grp2 >=51 && grp2 <= 55)) {
System.out.println("Match for " + matcher.group());
}
}
}
Output
Match for 2221.7.952.148.412.32
Match for 230.293.217.952.148.4
Match for 5511111111111111
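The answer above replaces the original regex rather than pinpointing its bug. A sketch that rebuilds the question's regex suggests the culprit: the third prefix alternative, 2[\s=\\.-]*[3-6][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){1}, is the only one that does not end with [\s=\\.-]*, so for prefixes 23xx to 26xx a delimiter directly after the fourth digit cannot be consumed. That would explain why 230.293.217.952.148.4 and 230-345-809-493-082-3 (no delimiter after the fourth digit) match while 2 3 0 2 ... and 23-02-... do not. The sketch also shows that [\s=\\.-]* accepts runs such as "..", which the single optional delimiter [ =\\.-]? in the answer avoids. The class name and the `build` helper are hypothetical, and `matches()` is used so the whole input must match.

```java
import java.util.regex.Pattern;

public class PrefixRegexCheck {
    // The question's delimiter class: zero or more of whitespace, =, \, . or -
    static final String D = "[\\s=\\\\.-]*";

    // Rebuilds the question's regex. With patched == true, the third
    // alternative (prefixes 23xx-26xx) also ends with D, like the other four.
    static Pattern build(boolean patched) {
        // Equivalent to 2D[3-6]D[0-9](?:D[0-9]){1} from the question.
        String third = "2" + D + "[3-6]" + D + "[0-9]" + D + "[0-9]" + (patched ? D : "");
        String regex = "(2" + D + "2" + D + "2" + D + "[1-9]" + D
                + "|2" + D + "2" + D + "[3-9]" + D + "[0-9]" + D
                + "|" + third
                + "|2" + D + "7" + D + "[01]" + D + "[0-9]" + D
                + "|2" + D + "7" + D + "2" + D + "0" + D
                + ")[0-9](?:" + D + "[0-9]){11}"
                + "|(5" + D + "[1-5]" + D + ")[0-9](?:" + D + "[0-9]){13}";
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Pattern original = build(false);
        Pattern patched = build(true);
        String[] inputs = {
            "2302932179521484",
            "2 3 0 2 9 3 2 1 7 9 5 2 1 4 8 4",
            "23-02-93-21-79-52-14-84",
            "2 3 0 3 4 5 8 0 9 4 9 3 0 8 2 3",
            "230..293.217.952.148.4"
        };
        for (String s : inputs) {
            // matches() requires the whole input to match, like ^...$
            System.out.println(s + "  original=" + original.matcher(s).matches()
                    + "  patched=" + patched.matcher(s).matches());
        }
    }
}
```

Both versions accept "230..293.217.952.148.4", because the `*` in each delimiter class allows more than one delimiter between two digits; the answer's `[ =\\.-]?` enforces the "at most one delimiter" rule.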

Replace dash characters between three or more numbers separated by dashes with spaces in a sentence

I want to replace the dashes in a run of three or more numbers separated by dashes (each number is a single digit, 0-9) with single spaces in a sentence. What is a good way to do this?
Sample Input:
4-2-2-1 kim yoong-yun
4 -2 - 2 - 1 and 4 - 5
1-2-3-4-5
1-5
4 - 5
Expected Output:
4 2 2 1 kim yoong-yun
4 2 2 1 and 4 - 5
1 2 3 4 5
1-5 // will not replace
4 - 5 // will not replace
I know I can do this with this complex method:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sentence = "4-2-3-1";
Pattern pCode = Pattern.compile("\\b(?:\\d ?- ?){2,}\\d");
Matcher mCode = pCode.matcher(sentence);
while (mCode.find()) {
sentence = mCode.replaceFirst(mCode.group(0).replaceAll(" ?- ?", " "));
mCode = pCode.matcher(sentence);
}
System.out.print(sentence); // 4 2 3 1
But can I do it in a single replace, or is there a simpler solution?
In Java 9+, you may use the Matcher#replaceAll(Function<MatchResult,String> replacer) method:
String sentence = "4-2-3-1";
Pattern pCode = Pattern.compile("\\b\\d(?:\\s?-\\s?\\d){2,}\\b");
Matcher mCode = pCode.matcher(sentence);
String result = mCode.replaceAll(x -> x.group().replaceAll("\\s?-\\s?", " ") ); // drop each dash together with its optional surrounding space
System.out.println( result ); // => 4 2 3 1
In earlier versions, use:
String sentence = "4-2-3-1";
Pattern pCode = Pattern.compile("\\b\\d(?:\\s?-\\s?\\d){2,}\\b");
Matcher mCode = pCode.matcher(sentence);
StringBuffer sb = new StringBuffer();
while (mCode.find()) {
mCode.appendReplacement(sb, mCode.group().replaceAll("\\s?-\\s?", " "));
}
mCode.appendTail(sb);
System.out.println(sb); // => 4 2 3 1
The regex is a bit modified to follow the best practices (quantified parts should be moved as far to the right as possible):
\b\d(?:\s?-\s?\d){2,}\b
Details:
\b - word boundary
\d - a single digit
(?:\s?-\s?\d){2,} - two or more occurrences of:
\s?-\s? - a - optionally preceded and followed by one whitespace char
\d - a single digit
\b - word boundary
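A self-contained sketch of the Java 9+ approach, run against all of the question's sample inputs. The class and method names are illustrative. Inside each matched run the sketch replaces \s?-\s? with one space, so that "4 -2 - 2 - 1" becomes "4 2 2 1" with no doubled spaces; `Matcher.quoteReplacement` guards against `$` or `\` in the replacement, although these runs only ever contain digits and spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DashDigits {
    // The answer's pattern: a digit followed by two or more "-digit" steps,
    // each dash optionally surrounded by one whitespace char.
    static final Pattern RUN = Pattern.compile("\\b\\d(?:\\s?-\\s?\\d){2,}\\b");

    // Java 9+: Matcher.replaceAll(Function<MatchResult, String>).
    static String replaceDashes(String sentence) {
        Matcher m = RUN.matcher(sentence);
        return m.replaceAll(r -> Matcher.quoteReplacement(
                r.group().replaceAll("\\s?-\\s?", " ")));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "4-2-2-1 kim yoong-yun",
            "4 -2 - 2 - 1 and 4 - 5",
            "1-2-3-4-5",
            "1-5",
            "4 - 5"
        };
        for (String s : inputs) {
            System.out.println(s + " => " + replaceDashes(s));
        }
    }
}
```

Runs of only two numbers ("1-5", "4 - 5") and dashes between words ("yoong-yun") are left untouched because the pattern needs at least three digits.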
You can use the following function
private static String unDash(String input) {
String[] splitString = input.split("\\s*-\\s*");
if(splitString.length < 3){
return input;
} else {
return String.join(" ", splitString);
}
}
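Splitting on "\\s*-\\s*" also trims the whitespace around each '-'. String.join then combines the split parts using a delimiter, which in our case is " ". Note that this splits on every dash in the input, so it only fits inputs that consist solely of the number sequence: for "4-2-2-1 kim yoong-yun" it would also replace the dash in "yoong-yun".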
Here’s a 1-liner:
String s2 = s1.matches("(\\d+[ -]+){2,}\\d") ? s1.replaceAll("[ -]+", " ") : s1;
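Your logic is “replace separators if there are three or more numbers”, and this code captures that succinctly. Because String.matches requires the whole string to match, it only handles inputs that consist solely of the number sequence, such as "1-2-3-4-5".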

Regular expressions: some groups missing

I have following Java code:
String s2 = "SUM 12 32 42";
Pattern pat1 = Pattern.compile("(PROD)|(SUM)(\\s+(\\d+))+");
Matcher m = pat1.matcher(s2);
System.out.println(m.matches());
System.out.println(m.groupCount());
for (int i = 1; i <= m.groupCount(); ++i) {
System.out.println(m.group(i));
}
which produces:
true
4
null
SUM
42
42
I wonder what the null is and why 12 and 32 are missing (I expected to find them among the groups).
A repeated group will contain the match of the last substring matching the expression for the group.
It would be nice if the regexp engine would give back all substrings that matched a group. Unfortunately this is not supported:
Regular expression with variable number of groups?
Furthermore, groups are static and numbered like this:
            0
 _______________________
/                       \
(PROD)|(SUM)(\\s+(\\d+))+
\____/ \___/|    \____/|
  1      2  |      4   |
            \__________/
                 3
Group X from this part of your regex:
(\\s+(\\d+))+
|          |
+----------+--> X
will first match 12, then 32 and finally 42. Each time X's value gets changed, and replaces the previous one. If you want all values, you'll need a Pattern & Matcher.find() approach:
String s = "SUM 12 32 42 PROD 1 2";
Matcher m = Pattern.compile("(PROD|SUM)((\\s+\\d+)+)").matcher(s);
while(m.find()) {
System.out.println("Matched : " + m.group(1));
Matcher values = Pattern.compile("\\d+").matcher(m.group(2));
while(values.find()) {
System.out.println(" : " + values.group());
}
}
which will print:
Matched : SUM
: 12
: 32
: 42
Matched : PROD
: 1
: 2
And you see null printed because group 1 is (PROD), which did not take part in the match.
I wonder what's a null
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
http://download.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#group%28int%29
The string given does not match every part of the pattern: the (PROD) alternative does not take part in the match, so group 1 is null.
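A small sketch that makes both points from the answers observable: the (PROD) branch did not participate, so group 1 is null (and `Matcher.start(1)` returns -1), and a repeated group keeps only its last iteration. It also shows that `|` has the lowest precedence, so the pattern reads as "PROD" or "SUM followed by numbers"; that is why the find() answer groups the keywords as (PROD|SUM). The class name is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    static final Pattern P = Pattern.compile("(PROD)|(SUM)(\\s+(\\d+))+");

    public static void main(String[] args) {
        Matcher m = P.matcher("SUM 12 32 42");
        if (m.matches()) {
            // (PROD) is a separate alternative that did not take part in this match.
            System.out.println(m.group(1) + " " + m.start(1));        // null -1
            // Groups 3 and 4 are inside a repeated group: only the last iteration is kept.
            System.out.println("[" + m.group(3) + "] " + m.group(4)); // [ 42] 42
        }
        // | binds loosest, so the PROD branch is just the literal "PROD".
        System.out.println(P.matcher("PROD 1 2").matches());          // false
    }
}
```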

How to write a Java pattern in this format: any characters (int,int) (int,int) number number any number of (int,int,int)

For example
Maze0.bmp (0,0) (319,239) 65 120
Maze0.bmp (0,0) (319,239) 65 120 (254,243,90)
Maze0.bmp (0,0) (319,239) 65 120 (254,243,90) (0,0,0)
Maze0.bmp (0,0) (319,239) 65 120 (254,243,90) (0,0,0) (11,33,44)
I want to get the Maze0.bmp and all the numbers. I have:
Pattern pattern = Pattern.compile("([A-z][^\\s]*)\\s+\\((\\d+),(\\d+)\\)\\s+\\((\\d+),(\\d+)\\)\\s+(\\d+)\\s+(\\d+)\\s+(\\((\\d+),(\\d+),(\\d+)\\)\\s*)");
BufferedReader stdin = new BufferedReader(new InputStreamReader( System.in));
String input;
Matcher matcher = null;
boolean isMatched = false;
while (!isMatched) {
System.out.println("Please enter right format\n");
input = stdin.readLine();
matcher = pattern.matcher(input);
while(matcher.find()) {
isMatched = true;
for (int i = 1; i <= matcher.groupCount(); ++i)
System.out.println(matcher.group(i));
}
}
but it doesn't work correctly. For example, if my input is
Maze0.bmp (0,0) (319,239) 65 120 (254,243,90) (0,0,0)
it does not capture the last tuple (0,0,0).
Here is the best I can come up with. Note that I used TWO patterns, because for some reason Java refuses to capture repeating groups (if anyone happens to know why, please leave a comment).
final Pattern outerPattern = Pattern.compile("(.*?) \\((\\d+),(\\d+)\\) \\((\\d+),(\\d+)\\) (\\d+) (\\d+)(.*)");
final Pattern optionalTuplePattern = Pattern.compile(" \\((\\d+),(\\d+),(\\d+)\\)");
final BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
boolean isMatched;
do
{
System.out.println("Please enter right format:");
Matcher m = outerPattern.matcher(stdin.readLine());
if (isMatched = m.find())
{
System.out.println(String.format("name='%s', first tuple: [%s,%s], second tuple: [%s,%s], first single number: %s, second single number: %s", m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), m.group(6), m.group(7)));
m = optionalTuplePattern.matcher(m.group(8));
while(m.find())
{
System.out.println(String.format("+ optional tuple: [%s,%s,%s]", m.group(1), m.group(2), m.group(3)));
}
}
}while(!isMatched);
OK, sorry, I have to revise. The Java matcher does not capture a number of groups that cannot be determined when the regex is compiled. But this works (tested):
Matcher m = Pattern.compile("\\((\\d+),(\\d+),(\\d+)\\)").matcher("(23,56,78) (54,22,11)");
while(m.find())
{
for(int i = 1; i <= m.groupCount(); ++i)
System.out.println(m.group(i));
}
I don't know the details of matching in Java, but I know regex very well.
Try this context:
while matching BITMAP records is not done
("
([A-z][^\s]*) 'maze.bmp' ~ group 1
\s+
\( (\d+),(\d+) \) '0' '0' ~ group 2,3
\s+
\( (\d+),(\d+) \) '319' '239' ~ group 4,5
\s+
(\d+) '65' ~ group 6
\s+
(\d+) '120' ~ group 7
\s+
(
(?: \( \d+,\d+,\d+ \) \s+ )+ '(254,243,90) (0,0,0) ' ~ group 8
)
") - context = global
{
// save to BITMAP.array (groups 1 - 7)
copy group 8 to variable '(254,243,90) (0,0,0) '
new matching of TUPLES, group 8 is the regex subject for this new match
("
(\d+)
") - context = global
append TUPLES.array (254 243 90 0 0 0)
to BITMAP.array (maze.bmp 0 0 319 239 65 120 <append> 254 243 90 0 0 0)
// do next BITMAP record
}
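A Java sketch of the two-pass idea in the pseudocode: one regex validates the fixed part and captures the trailing tuples as a single string, and a second regex pulls every number out of that string. This is an illustration, not the answerer's code. It assumes the name contains no whitespace (\S+ instead of [A-z][^\s]*, which also avoids [A-z] matching the characters [, \, ], ^, _ and `), and it makes the tuple list optional so that lines without tuples also match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MazeLine {
    // First pass: name, two (x,y) pairs, two numbers, then zero or more
    // (r,g,b) tuples captured together as one string in group 8.
    static final Pattern RECORD = Pattern.compile(
            "(\\S+)\\s+\\((\\d+),(\\d+)\\)\\s+\\((\\d+),(\\d+)\\)\\s+(\\d+)\\s+(\\d+)"
            + "((?:\\s+\\(\\d+,\\d+,\\d+\\))*)\\s*");
    // Second pass: every number inside group 8.
    static final Pattern NUMBER = Pattern.compile("\\d+");

    // Returns the name followed by all numbers, or null if the line does not match.
    static List<String> parse(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            values.add(m.group(i));
        }
        // Group 8 always participates (possibly as ""), so it is never null.
        Matcher n = NUMBER.matcher(m.group(8));
        while (n.find()) {
            values.add(n.group());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("Maze0.bmp (0,0) (319,239) 65 120 (254,243,90) (0,0,0)"));
    }
}
```

For the question's failing input this prints `[Maze0.bmp, 0, 0, 319, 239, 65, 120, 254, 243, 90, 0, 0, 0]`, so the last tuple (0,0,0) is included.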
