Need help splitting the expression with regex - java

I have an expression like this.
A AND (B OR (C OR D))
I want the parentheses as a separate string and not combined with C OR D in the output array.
[A, AND, (, B, OR, (, C, OR, D, ), )]
Appending , in place of SPACE and after every ( and before every ) and then using .split(",") would solve my problem.
Is there a better way to do this by simply using the right regex in the split method?

How about this:
String input = "A AND (B OR (C OR D))";
String regex = "\\s+|(?<=\\()|(?=\\))";
String[] tokens = input.split(regex);
Which returns:
{A, AND, (, B, OR, (, C, OR, D, ), )}
Explanation:
The regex splits at:
One or more whitespace characters (\\s+)
The position right after an opening parenthesis ((?<=\\())
The position right before a closing parenthesis ((?=\\)))
I used positive lookaheads and positive lookbehinds, which are INCREDIBLY useful, so do look them up (no pun intended)
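To check the result, the split above can be wrapped in a small runnable class. This is only a sketch: the class name ParenSplit and the method name tokenize are made up for illustration, and it assumes Java 8 or later, where split does not produce a leading empty string for a zero-width match at the start of the input.

```java
import java.util.Arrays;

class ParenSplit {
    // Split on whitespace, right after "(" and right before ")".
    static String[] tokenize(String input) {
        return input.split("\\s+|(?<=\\()|(?=\\))");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("A AND (B OR (C OR D))");
        System.out.println(Arrays.toString(tokens));
        // prints [A, AND, (, B, OR, (, C, OR, D, ), )]
    }
}
```

Note that this relies on the input having no space directly after "(" or directly before ")"; with such spaces, a whitespace match and a zero-width match can sit next to each other and produce empty tokens.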

I hope this would help:
"A AND (B OR (C OR D))".split(" +| (?=\\()|(?=\\))|(?<=\\()") #=> [A, AND, (, B, OR, (, C, OR, D, ), )]
" +" # splits on one or more spaces
" (?=\\()" # splits on a space followed by an opening parenthesis: e.g. in " (" it would give you a single "(" instead of " " and "(" (like in the next part without whitespace in the beginning)
"(?=\\))" # splits at the empty string followed by a closing parenthesis: e.g. "B)" => ["B", ")"]
"(?<=\\()" # splits at the empty string preceded by an opening parenthesis: e.g. "(B" => ["(", "B"]
Search for "Positive lookahead/lookbehind" in regular expressions (personally I use regex101.com).

Related

How to remove spaces from string only if it occurs once between two words but not if it occurs thrice?

I am a beginner working on a diff and regenerate algorithm but for Strings. I store the patch in a file. To regenerate the new string from old I use that file. Although the code works, I face a problem when using space.
I use replaceAll(" ", ""); for removing spaces. This is fine when the string is [char][space][char], but creates a problem when it is like [space][space][space]. Here, I want a space to be retained (only one).
I thought of doing replaceAll(" ", " ");. But this would leave spaces in type [char][space][char]. I am using scanner to scan through the string.
Is there a way to achieve this?
Input => Output
c => c
cc => cc
c[space]c => cc
c[space][space]c => This is not possible, since there will be padding of one space for each character
c[space][space][space]c => c[space]c
We can also split the string wherever there is more than one whitespace character, then join the resulting array into a single string using the Stream and Collectors API.
The single spaces are removed with replaceAll() in a Stream#map operation:
String test = " this   is a test of   space in string ";
//using the pattern \\s{n,} for splitting at multi spaces
String[] arr = test.split("\\s{2,}");
String s = Arrays.stream(arr)
        .map(str -> str.replaceAll(" ", ""))   // drop the remaining single spaces
        .collect(Collectors.joining(" "));     // rejoin the pieces with one space
System.out.println(s);
Output:
this isatestof spaceinstring
You could use lookarounds to do your replacement:
String newText = text
.replaceAll("(?<! ) (?! )", "")
.replaceAll(" +", " ");
The first replaceAll removes any space not surrounded by spaces; the second one replaces the remaining sequences of spaces by a single one.
Sequences of two or more spaces become a single space, and single spaces are removed.
Lookarounds
A lookaround in the context of regular expressions is a collective term for lookbehinds and lookaheads. These are so-called zero-width assertions: they match a certain pattern but do not actually consume characters. There are positive and negative lookarounds.
A short example: the pattern Ira(?!q) matches the substring Ira, but only if it's not followed by a q. So if the input string is Iraq, it won't match, but if the input string is Iran, then the match is Ira.
More info:
https://www.regular-expressions.info/lookaround.html
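To make the lookaround behaviour concrete, here is a minimal sketch that runs both the two-step space replacement and the Ira(?!q) example. The class name LookaroundDemo and the method names squash and findsIra are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

class LookaroundDemo {
    // Remove isolated single spaces, then collapse longer runs to one space.
    static String squash(String text) {
        return text
                .replaceAll("(?<! ) (?! )", "")  // a space with no space on either side
                .replaceAll(" +", " ");          // any remaining run of spaces
    }

    // True if "Ira" occurs somewhere and is not immediately followed by "q".
    static boolean findsIra(String s) {
        return Pattern.compile("Ira(?!q)").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(squash("c c"));      // prints cc
        System.out.println(squash("c   c"));    // prints c c
        System.out.println(findsIra("Iraq"));   // prints false
        System.out.println(findsIra("Iran"));   // prints true
    }
}
```

Because lookarounds consume nothing, the first replaceAll deletes only the space itself and leaves the neighbouring characters in place.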
If you want to replace any run of whitespace with a single space, you could use:
value.replaceAll("\\s+", " ")
I had to use two replacements:
String e = "a b   c";
e = e.replaceAll("([A-Za-z])\\s([A-Za-z])", "$1$2");
e = e.replaceAll("   ", " ");
System.out.println(e);
Which prints
ab c
The first one replaces any letter-space-letter combo with just the two letters, and then the second replaces any triple-space with a single space.
The first replacement uses backreferences. $1 refers to the part inside the first set of parentheses, which matches the first letter, and $2 refers to the part inside the second set of parentheses.
If you have leading/trailing spaces on the input, you can call trim() before doing the replacements.
e = e.trim();

How to split a string by space and some special character in java

Consider the string .more opr (&x NE &m),&n+1
All I need is to split this string into the following parts: [.more, opr, (, &x, NE, &m, ), ,, &n, +, 1]. In short, I need to split on spaces and on some special symbols such as ( ) , and the arithmetic operators.
How do I write the regex for split() in Java to achieve this?
Split on space or either side of brackets or operators:
str.split(" |(?<=[,()+-])|(?<! )(?=[,()+-])")
The output of:
String str = ".more opr (&x NE &m),&n+1";
System.out.println(Arrays.toString(str.split(" |(?<=[,()+-])|(?<! )(?=[,()+-])")));
is:
[.more, opr, (, &x, NE, &m, ), ,, &n, +, 1]
Or more clearly:
Arrays.stream(str.split(" |(?<=[,()+-])|(?<! )(?=[,()+-])")).forEach(System.out::println);
outputs:
.more
opr
(
&x
NE
&m
)
,
&n
+
1
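The question mentions arithmetic operators in general, while the character class above lists only + and -. One possible extension is sketched below. The class name SymbolSplit and the choice to add * and / are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;

class SymbolSplit {
    // Assumed symbol set: comma, parentheses and the four basic operators.
    private static final String SYMBOLS = "[,()+\\-*/]";

    // Split on a space, right after a symbol, or right before a symbol
    // that is not itself preceded by a space.
    static String[] tokenize(String s) {
        return s.split(" |(?<=" + SYMBOLS + ")|(?<! )(?=" + SYMBOLS + ")");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize(".more opr (&x NE &m),&n+1")));
        // prints [.more, opr, (, &x, NE, &m, ), ,, &n, +, 1]
        System.out.println(Arrays.toString(tokenize("a*b/2")));
        // prints [a, *, b, /, 2]
    }
}
```

The (?<! ) guard keeps " (" from producing an empty token: the space already splits there, so the lookahead before "(" is skipped.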

Java How to remove apostrophes from a string based on the preceding character

I would like to remove the apostrophes in a string based on the preceding character. Please let me know the most efficient way to do this. If apostrophe is found in a string and the preceding character is {D, L, O, d, l, o}, then I would like to keep the apostrophe. Any other letter, remove it.
A few examples:
Name |Expected outcome
F'redricks |Fredricks
D'Angelo |D'Angelo
O’Brien |O'Brien
L'Beam |L'Beam
d'Angelo |d'Angelo
o’Brien |o'Brien
l'Beam |l'Beam
'''AAAA'''' |AAAA
'D'Angelo |D'Angelo
‘O’Brien |O'Brien
‘L'Beam |L'Beam
You can use a regex replace with a negative lookbehind:
input = input.replaceAll("(?i)(?<![dlo])'", "");
This removes any apostrophe not preceded by d, l or o. The (?i) flag is for case-insensitivity.
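Several of the sample names use curly quotes (‘ and ’), and the expected output shows them as straight apostrophes. A sketch that handles both is below. The normalization step, the class name ApostropheCleaner and the method name clean are assumptions added for illustration; the original answer covers only the straight apostrophe.

```java
class ApostropheCleaner {
    static String clean(String input) {
        return input
                // Assumption: left and right curly single quotes count as apostrophes.
                .replaceAll("[\u2018\u2019]", "'")
                // Remove any apostrophe not preceded by d, l or o (case-insensitive).
                .replaceAll("(?i)(?<![dlo])'", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("F'redricks"));    // prints Fredricks
        System.out.println(clean("'D'Angelo"));     // prints D'Angelo
        System.out.println(clean("'''AAAA''''"));   // prints AAAA
    }
}
```

Normalizing first means the lookbehind only has to deal with one apostrophe character.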

Regex to break non-whitespace string into individual characters and digit chunks in Java

I've been reading/searching for awhile now, but can't find anything that quite answers my question case.
Currently, I have a string (str) such as "a1bc23def456" being split using the following regex:
String[] stuff = str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
which gives me a string array that looks like
["a","1","bc","23","def","456"]
but what I am trying to get is a split on every character that is a letter, and before a number begins. So that my array will look like:
["a","1","b","c","23","d","e","f","456"]
so numbers are split from letters, but not from themselves, and letters are split from everything.
I am quite fresh to using regex with Java, so please go easy.
Edit:
This is not quite like the linked "duplicate" question, because the regex answers provided there also result in the same splitting pattern.
I am trying to split groupings of letters. I think it was said well above "so numbers are split from letters, but not from themselves, and letters are split from everything [including other letters]."
The simplest regex that works is:
(?<=\D)|(?=\D)
Which splits before or after a letter (\D means non-digit, which in this context is a letter).
Demo:
System.out.println(Arrays.toString("a1bc23def456".split("(?<=\\D)|(?=\\D)")));
Output:
[a, 1, b, c, 23, d, e, f, 456]
You can use either of the 2 approaches mentioned in a very similar question:
Matching single non-digit characters with \D or (|) digit chunks (\d+): \D|\d+
Splitting a string at the locations between a non-digit and a digit ((?<=\D)(?=\d)) AND right before a non-digit ((?=\D))
Java demo:
String str = "a1bc23def456";
String[] stuff = str.split("(?=[^0-9])|(?<=[^0-9])(?=[0-9])");
System.out.println("Split: " + Arrays.toString(stuff)); // => Split: [a, 1, b, c, 23, d, e, f, 456]
// Or match...
Matcher matcher = Pattern.compile("[^0-9]|[0-9]+").matcher(str);
List<String> result = new ArrayList<>();
while (matcher.find()) {
    result.add(matcher.group(0));
}
System.out.println("Match: " + result); // => Match: [a, 1, b, c, 23, d, e, f, 456]
This works for me:
(?!^|(?<=\d)(?=\d))
It matches anywhere except the beginning of the string or between two digits. If you're using Java 8 or later, you can leave out the ^|, because split no longer produces a leading empty token for a zero-width match at the start (just as it has always removed trailing empty tokens).
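A small runnable sketch of this pattern follows. The class name DigitChunks and the method name split are hypothetical names used only for the example.

```java
import java.util.Arrays;

class DigitChunks {
    // Split at every position except the start of the string
    // and the positions between two digits.
    static String[] split(String s) {
        return s.split("(?!^|(?<=\\d)(?=\\d))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a1bc23def456")));
        // prints [a, 1, b, c, 23, d, e, f, 456]
    }
}
```

The pattern also matches at the very end of the string, but the resulting trailing empty token is dropped by split.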

Converting a String representing a mathematical expression into an array

I want to convert a String such as 1+40.2+(2) into a String array [1, +, 40.2, +, (, 2, )] in order to use it as a parameter for a Shunting Yard algorithm in my Calculator class.
The input will be entered without spaces, so I can't just use input.split("\\s+"). I have come up with a long process involving ArrayLists, StringBuilders, and stacks, but I was wondering if there was an easier way to do this.
input.split("") won't work, since it would return [1, +, 4, 0, ., 2, +, (, 2, )]. This is actually the starting point of my current process, and I can post the pseudocode for it, if anyone is interested (although I'm having problems actually implementing my pseudocode).
Any advice or help is appreciated. Thanks!
I really like the first answer, but if you want to try using regex as suggested in the second comment, here's a regex that will match each element of your equation one by one so you can append it to your list. Note that it assumes the string consists only of decimal numbers, operators, and parentheses.
[0-9\.]+|[+\-*/]|[()]
Note that in character classes, any character except ^-]\ is a literal so that's why the character classes look a bit funny. To construct the corresponding Java pattern, use
Pattern.compile("[0-9\\.]+|[+\\-*/]|[()]")
Example:
String s = "1+40.2+(2)";
Pattern p = Pattern.compile("[0-9\\.]+|[+\\-*/]|[()]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
1
+
40.2
+
(
2
)
The replaceAll string method should be able to help you. Use this to surround the tokens you want to pull out with a special dividing character (I arbitrarily chose ':', but any character/string you're confident won't actually be in the input will work). Then you can split on that character.
String s = "1+40.2+(2)";
String dividingToken = ":";
String[] sSplit = s.replaceAll("\\+", dividingToken + "+" + dividingToken)
        .replaceAll("\\(", dividingToken + "(" + dividingToken)
        .replaceAll("\\)", dividingToken + ")" + dividingToken)
        .split(dividingToken + "+"); // split on runs of dividers, so the "::" between "+" and "(" yields no empty string
for (String str : sSplit) {
    System.out.println(str);
}
Output:
1
+
40.2
+
(
2
)
You could easily loop .replaceAll over an array of tokens (["+", "-", "*", ...]) that you want to split up. Just remember to add "\\" before each one in replaceAll, because many of them have a special meaning in regex, whereas you actually want to match a literal "+".
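A sketch of that loop is below. Instead of adding the backslashes by hand, it uses Pattern.quote to escape each token and Matcher.quoteReplacement to keep the replacement literal. The class name DividerTokenizer, the method name tokenize and the list of symbols are assumptions for illustration, and the ":" divider is assumed never to occur in the input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DividerTokenizer {
    static String[] tokenize(String s) {
        String divider = ":"; // assumption: never appears in the input
        String[] symbols = {"+", "-", "*", "/", "(", ")"};
        for (String sym : symbols) {
            // Surround every occurrence of the symbol with dividers.
            s = s.replaceAll(Pattern.quote(sym),
                    Matcher.quoteReplacement(divider + sym + divider));
        }
        // Drop leading dividers, then split on runs of dividers
        // so that adjacent symbols do not produce empty tokens.
        return s.replaceFirst("^" + divider + "+", "").split(divider + "+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("1+40.2+(2)")));
        // prints [1, +, 40.2, +, (, 2, )]
    }
}
```

This treats every "-" as an operator, so a unary minus such as in "-3" would come out as a separate token.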
