Java Regex Help: Splitting String on spaces, "=>", and commas - java

I need to split a string on any of the following sequences:
1 or more spaces
0 or more spaces, followed by a comma, followed by 0 or more spaces,
0 or more spaces, followed by "=>", followed by 0 or more spaces
I haven't worked with Java regexes before, so I'm a little confused. Thanks!
Example:
add r10,r12 => r10
store r10 => r1

Just create regex matching any of your three cases and pass it into split method:
string.split("\\s*(=>|,|\\s)\\s*");
Regex here means literally
Zero or more whitespaces (\\s*)
Arrow, or comma, or whitespace (=>|,|\\s)
Zero or more whitespaces (\\s*)
You can replace whitespace \\s (detects spaces, tabs, line breaks, etc) with plain space character if necessary.
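As a quick check, here is a minimal sketch that runs this pattern against the two example lines from the question. The class name `SplitDemo` and the helper `tokens` are made up for illustration.

```java
import java.util.Arrays;

public class SplitDemo {
    // Delimiter from the answer above: "=>", a comma, or a single whitespace
    // character, with optional whitespace on either side.
    static final String DELIMITER = "\\s*(=>|,|\\s)\\s*";

    // Hypothetical helper: split one line into its tokens.
    static String[] tokens(String line) {
        return line.split(DELIMITER);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("add r10,r12 => r10"))); // [add, r10, r12, r10]
        System.out.println(Arrays.toString(tokens("store r10 => r1")));    // [store, r10, r1]
    }
}
```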

Strictly translated
For simplicity, I'm going to interpret your indication of "space" ( ) as "any whitespace" (\s).
Translating your spec more or less "word for word" is to delimit on any of:
1 or more spaces
\s+
0 or more spaces (\s*), followed by a comma (,), followed by 0 or more spaces (\s*)
\s*,\s*
0 or more spaces (\s*), followed by a "=>" (=>), followed by 0 or more spaces (\s*)
\s*=>\s*
To match any of the above: (\s+|\s*,\s*|\s*=>\s*)
Reduced form
However, your spec can be "reduced" to:
0 or more spaces
\s*
followed by either a space, comma, or "=>"
(\s|,|=>)
followed by 0 or more spaces
\s*
Put it all together: \s*(\s|,|=>)\s*
The reduced form avoids some corner cases in which the strictly translated form produces unexpected empty "matches".
Code
Here's some code:
import java.util.regex.Pattern;
public class Temp {
// Strictly translated form:
//private static final String REGEX = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";
// "Reduced" form:
private static final String REGEX = "\\s*(\\s|=>|,)\\s*";
private static final String INPUT =
"one two,three=>four , five six => seven,=>";
public static void main(final String[] args) {
final Pattern p = Pattern.compile(REGEX);
final String[] items = p.split(INPUT);
// Shorthand for above:
// final String[] items = INPUT.split(REGEX);
for(final String s : items) {
System.out.println("Match: '"+s+"'");
}
}
}
Output:
Match: 'one'
Match: 'two'
Match: 'three'
Match: 'four'
Match: 'five'
Match: 'six'
Match: 'seven'
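To make the corner case concrete, here is a small sketch comparing the two forms on the input "a , b" (an input picked for illustration, it is not from the question). In the strict form the first alternative \s+ consumes the space before the comma on its own, so the comma becomes a second delimiter with an empty string in between. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class StrictVsReduced {
    static final String STRICT = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";
    static final String REDUCED = "\\s*(\\s|=>|,)\\s*";

    // Hypothetical helper so both patterns go through the same call.
    static String[] split(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String input = "a , b";
        // Strict form: " " and ", " are two separate delimiters.
        System.out.println(Arrays.toString(split(STRICT, input)));  // [a, , b]
        // Reduced form: " , " is consumed as a single delimiter.
        System.out.println(Arrays.toString(split(REDUCED, input))); // [a, b]
    }
}
```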

String[] splitArray = subjectString.split(" *(,|=>| ) *");
should do it.

Related

Regular expression for finding words that start with # and end with whitespace or a new line

I am looking for a Java regex to find usernames in a text.
Usernames always start with #. No whitespace is acceptable after #. Usernames are combinations of upper- or lowercase letters, digits, ., _, -.
So, the regex should match words that start with # and end with whitespace or newline.
For example, the text hi #anyOne.2 I'm looking for #Name_13
#name14 #_n.a.m.e-15 but not # name16 contains the following matches: anyOne.2, Name_13, name14, _n.a.m.e-15.
I am using
String pattern = "#[^\\s]*(\\w+)";
You may use
\B#(\S+)
Details
\B - a non-word boundary, the char that is right before the current location must be a non-word char (or start of string)
# - a # char
(\S+) - Capturing group 1: one or more non-whitespace characters.
See Java sample usage:
String text = "hi #anyOne.2 I'm looking for #Name_13 #name14 #_n.a.m.e-15 but not # name16";
String pattern = "\\B#(\\S+)";
// Java 9+ (Matcher.results() was added in Java 9; requires java.util.Arrays and java.util.regex.Pattern)
String[] results = Pattern.compile(pattern).matcher(text).results().map(r -> r.group(1)).toArray(String[]::new);
System.out.println(Arrays.toString(results)); // => [anyOne.2, Name_13, name14, _n.a.m.e-15]
// Java 8 (requires java.util.* and java.util.regex.*)
Matcher m = Pattern.compile(pattern).matcher(text);
List<String> strs = new ArrayList<>();
while(m.find()) {
strs.add(m.group(1));
}
System.out.println(strs); // => [anyOne.2, Name_13, name14, _n.a.m.e-15]
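If you also want to enforce the character set listed in the question (letters, digits, ., _ and -), one possible variant is sketched below. This is an assumption about what is wanted, not part of the answer above: it swaps \S+ for an explicit character class and adds (?!\S) so the name must be followed by whitespace or the end of the text. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsernameDemo {
    // \B      : no word character directly before '#'
    // [...]+  : only the characters the question allows in a username
    // (?!\S)  : the name must end at whitespace or at the end of the input
    private static final Pattern USERNAME =
            Pattern.compile("\\B#([A-Za-z0-9._-]+)(?!\\S)");

    // Hypothetical helper: collect group 1 of every match.
    static List<String> usernames(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = USERNAME.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "hi #anyOne.2 I'm looking for #Name_13\n#name14 #_n.a.m.e-15 but not # name16";
        System.out.println(usernames(text)); // [anyOne.2, Name_13, name14, _n.a.m.e-15]
    }
}
```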

Splitting a String by number of delimiters

I am trying to split a string into a string array, there might be number of combinations,
I tried:
String strExample = "A, B";
//possible option are:
1. A,B
2. A, B
3. A , B
4. A ,B
String[] parts;
parts = strExample.split("/"); //Split the string but doesnt remove the space in between them so the 2 item in the string array is space and B ( B)
parts = strExample.split("/| ");
parts = strExample.split(",|\\s+");
Any guidance would be appreciated
To split with comma enclosed with optional whitespace chars you may use
s.split("\\s*,\\s*")
The \s*,\s* pattern matches
\s* - 0+ whitespaces
, - a comma
\s* - 0+ whitespaces
In case you want to make sure there are no leading/trailing spaces, consider trim()ming the string before splitting.
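A minimal sketch that runs the four variants from the question through this pattern, with trim() applied first. The class name and the `parts` helper are made up for the example.

```java
import java.util.Arrays;

public class CommaSplitDemo {
    // Hypothetical helper: trim, then split on a comma with optional whitespace around it.
    static String[] parts(String s) {
        return s.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] inputs = {"A,B", "A, B", "A , B", "A ,B", "  A , B  "};
        for (String input : inputs) {
            // Every variant prints [A, B]
            System.out.println(Arrays.toString(parts(input)));
        }
    }
}
```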
You can use
parts = strExample.split("\\s*,\\s*");
for your case.

Regex to find a string containing more than a single whitespace with no leading/trailing whitespace

Currently i have
Pattern p = Pattern.compile("\\s");
boolean invalidChar = p.matcher(text).find();
I want it to return true only when i have more than a single whitespace.
Also there should not be any whitespace in the beginning or ending of string.
So some valid/invalid text would be
12 34 56 = valid
ab-34 56 = valid
ab  34 = invalid
12  34 53 = invalid
Without regex..
public class Answ {
public static boolean isValid(String s) {
return !s.contains("  "); // two consecutive spaces
}
public static void main(String[] args) {
String st1 = "12 34 56";
System.out.println(isValid(st1));
}
}
Try this:
(^\s{1,}|\s{2,}|\s$)
Final:
Pattern p = Pattern.compile("(^\\s{1,}|\\s{2,}|\\s$)");
Since there can't be whitespace at the start and end of the string, and there cannot be two or more consecutive whitespaces inside, you may use
boolean isValid = s.matches("\\S+(?:\\s\\S+)*");
This expression will match the following:
^ (implicit in matches that anchors the match by default, i.e. the whole string must match the regex pattern) - the start of the string
\S+ - 1 or more chars other than whitespaces
(?:\s\S+)* - zero or more sequences of:
\s - a single whitespace
\S+ - 1 or more chars other than whitespaces
$ (implicit in matches) - the end of the string.
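A small sketch that exercises this pattern on a few inputs. The strings with doubled, leading and trailing spaces are my own test values, and the class and method names are hypothetical.

```java
public class SingleSpaceValidator {
    // Valid: non-whitespace runs separated by exactly one whitespace character,
    // with nothing leading or trailing.
    static boolean isValid(String s) {
        return s.matches("\\S+(?:\\s\\S+)*");
    }

    public static void main(String[] args) {
        System.out.println(isValid("12 34 56")); // true
        System.out.println(isValid("ab-34 56")); // true
        System.out.println(isValid("ab  34"));   // false: two consecutive spaces
        System.out.println(isValid(" 12 34"));   // false: leading space
        System.out.println(isValid("12 34 "));   // false: trailing space
        System.out.println(isValid(""));         // false: \S+ needs at least one character
    }
}
```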
You can use this pattern:
Pattern p = Pattern.compile("(?<!\\S)(?!\\S)");
Matcher m = p.matcher(text);
boolean invalidChar = m.find();
or boolean isValid = !m.find(), as you want.
Where (?<!\\S) means "not preceded by a non-whitespace" (that includes a preceding whitespace or the start of the string) and (?!\\S) "not followed by a non-whitespace" (that includes a following whitespace or the end of the string).
These two lookarounds describe all possible cases:
successive white-spaces (matches the position between the first two white-spaces)
white-space at the beginning or at the end
empty string
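The three cases above can be checked with a short sketch like the following. The helper name `hasBadWhitespace` and the test strings are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class LookaroundCheck {
    // Matches an empty position that has no non-whitespace character
    // on either side of it.
    private static final Pattern BAD_POSITION = Pattern.compile("(?<!\\S)(?!\\S)");

    // Hypothetical helper: true when the text breaks the whitespace rules.
    static boolean hasBadWhitespace(String text) {
        return BAD_POSITION.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasBadWhitespace("12 34 56")); // false
        System.out.println(hasBadWhitespace("ab  34"));   // true: successive whitespace
        System.out.println(hasBadWhitespace(" 12"));      // true: whitespace at the beginning
        System.out.println(hasBadWhitespace("12 "));      // true: whitespace at the end
        System.out.println(hasBadWhitespace(""));         // true: empty string
    }
}
```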
Try this:
boolean isValid = text.matches("\\S(?!.*\\s\\s).*\\S");
Explanation:
\\S - the match begins with a non-whitespace character
(?!.*\\s\\s) - negative lookahead assertion to ensure there are no instances of two whitespace characters next to each other
.* - matches 0 or more of any character
\\S - the match ends with a non-whitespace character
Note: the matches("regex") method returns true only if the regex matches the entire text string.

Splitting a string on whitespaces

I'm currently trying to splice a string into a multi-line string.
The regex should select whitespace characters that have at least 13 characters before them.
The problem is that the 13 character count does not reset after the previous selected white-space. So, after the first 13 characters, the regex selects every white-space.
I'm using the following regex with a positive look-behind of 13 characters:
(?<=.{13})
(there is a whitespace at the end)
The following code reproduces the problem:
import java.util.ArrayList;
public class HelloWorld{
public static void main(String []args){
String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
for (String string : str.split("(?<=.{13}) ")) {
System.out.println(string);
}
}
}
The output of this code is as follows:
This is a test.
The
app
should
break
this
string
in
substring
on
whitespaces
after
13
characters
But it should be:
This is a test.
The app should
break this string
in substring on
whitespaces after
13 characters
You may actually use a lazy limiting quantifier to match the lines and then replace with $0\n:
.{13,}?[ ]
IDEONE demo:
String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
System.out.println(str.replaceAll(".{13,}?[ ]", "$0\n"));
Note that the pattern matches:
.{13,}? - any character that is not a newline (if you need to match any character, use DOTALL modifier, though I doubt you need it in the current scenario), 13 times at least, and it can match more characters but up to the first space encountered
[ ] - a literal space (a character class is redundant, but it helps visualize the pattern).
The replacement pattern - "$0\n" - is re-inserting the whole matched value (it is stored in Group 0) and appends a newline after it.
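If the trailing space that $0 leaves at the end of each line is unwanted, one possible variation is to capture the text without the space and re-insert only the group. This is a sketch of that idea, not the code from the answer; `WrapDemo` and `wrap` are hypothetical names.

```java
public class WrapDemo {
    // Group 1 holds at least 13 characters, matched lazily up to the next space.
    // The space itself is outside the group, so "$1\n" replaces it with a newline.
    static String wrap(String s) {
        return s.replaceAll("(.{13,}?) ", "$1\n");
    }

    public static void main(String[] args) {
        String str = "This is a test. The app should break this string in substring on whitespaces after 13 characters";
        System.out.println(wrap(str));
        // This is a test.
        // The app should
        // break this string
        // in substring on
        // whitespaces after
        // 13 characters
    }
}
```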
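You can match and capture at least 13 characters before a run of spaces (or the end of the string) rather than splitting.
Java code:
// Lazily capture 13 or more characters, then consume the following spaces or stop at the end of the input
Pattern p = Pattern.compile("(.{13,}?)(?: +|$)");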
Matcher m = p.matcher( text );
List<String> matches = new ArrayList<>();
while(m.find()) {
matches.add(m.group(1));
}
It will produce:
This is a test.
The app should
break this string
in substring on
whitespaces after
13 characters
You can do this with split and a regular expression:
line.split("\\s+");
This splits the line into words on runs of one or more whitespace characters.

Regex add space between all punctuation

I need to add spaces between all punctuation in a string.
// "Hello: World." -> "Hello : World ."
// "It's 9:00?" -> "It ' s 9 : 00 ?"
// "1.B,3.D!" -> "1 . B , 3 . D !"
I think a regex is the way to go, matching all non-punctuation [a-zA-Z\\d]+, adding a space before and/or after, then extracting the remainder matching all punctuation [^a-zA-Z\\d]+.
But I don't know how to (recursively?) call this regex. Looking at the first example, the regex will only match the "Hello". I was thinking of just building a new string by continuously removing and appending the first instance of the matched regex, while the original string is not empty.
private String addSpacesBeforePunctuation(String s) {
StringBuilder builder = new StringBuilder();
final String nonpunctuation = "[a-zA-Z\\d]+";
final String punctuation = "[^a-zA-Z\\d]+";
String found;
while (!s.isEmpty()) {
// regex stuff goes here
found = ???; // found group from respective regex goes here
builder.append(found);
builder.append(" ");
s = s.replaceFirst(found, "");
}
return builder.toString().trim();
}
However this doesn't feel like the right way to go... I think I'm over complicating things...
You can use lookarounds based regex using punctuation property \p{Punct} in Java:
str = str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
(?<=\\S) Asserts if prev char is not a white-space
(?<=\\p{Punct}) asserts a position if previous char is a punctuation char
(?=\\p{Punct}) asserts a position if next char is a punctuation char
(?=\\S) Asserts if next char is not a white-space
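Here is a minimal sketch that applies this replacement to the three examples from the question. The class name and the `addSpaces` helper are made up for illustration.

```java
public class PunctuationSpacer {
    // Insert a space at every position that sits between two non-whitespace
    // characters and touches a punctuation character on at least one side.
    static String addSpaces(String s) {
        return s.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("Hello: World.")); // Hello : World .
        System.out.println(addSpaces("It's 9:00?"));    // It ' s 9 : 00 ?
        System.out.println(addSpaces("1.B,3.D!"));      // 1 . B , 3 . D !
    }
}
```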
When you see a punctuation mark, you have four possibilities:
Punctuation is surrounded by spaces
Punctuation is preceded by a space
Punctuation is followed by a space
Punctuation is neither preceded nor followed by a space.
Here is code that does the replacement properly:
String ss = s
.replaceAll("(?<=\\S)\\p{Punct}", " $0")
.replaceAll("\\p{Punct}(?=\\S)", "$0 ");
It uses two expressions: the first adds the missing space before a punctuation mark (case 3), and the second adds the missing space after it (case 2). Since the expressions are applied one after the other, together they take care of case 4 as well. Case 1 requires no change.
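For completeness, a small sketch that wraps the two replacements in a method and runs them on the examples from the question. The names `TwoPassSpacer` and `addSpaces` are hypothetical.

```java
public class TwoPassSpacer {
    static String addSpaces(String s) {
        return s
                // Pass 1: punctuation preceded by a non-space gets a space in front.
                .replaceAll("(?<=\\S)\\p{Punct}", " $0")
                // Pass 2: punctuation followed by a non-space gets a space behind.
                .replaceAll("\\p{Punct}(?=\\S)", "$0 ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("Hello: World.")); // Hello : World .
        System.out.println(addSpaces("It's 9:00?"));    // It ' s 9 : 00 ?
        System.out.println(addSpaces("1.B,3.D!"));      // 1 . B , 3 . D !
    }
}
```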
