Regular Expression for Extracting Operands from Mathematical Expression - java

No question on SO addresses my particular problem. I know very little about regular expressions. I am building an expression parser in Java and using the regex classes (Pattern and Matcher) for that purpose. I want to extract operands, arguments, operators, symbols and function names from an expression and save them to an ArrayList. Currently I am using this logic:
String string = "2!+atan2(3+9,2+3)-2*PI+3/3-9-12%3*sin(9-9)+(2+6/2)"; // This is just for testing; later it will be provided by the user
List<String> res = new ArrayList<>();
Pattern pattern = Pattern.compile("(\\Q^\\E|\\Q/\\E|\\Q-\\E|\\Q-\\E|\\Q+\\E|\\Q*\\E|\\Q)\\E|\\Q)\\E|\\Q(\\E|\\Q(\\E|\\Q%\\E|\\Q!\\E)"); // This string was built by a function from the supplied operator names, which means the user can add custom operators and custom functions
Matcher m = pattern.matcher(string);
int pos = 0;
while (m.find())
{
if (pos != m.start())
{
res.add(string.substring(pos, m.start()));
}
res.add(m.group());
pos = m.end();
}
if (pos != string.length())
{
addToTokens(res, string.substring(pos));
}
for(String s : res)
{
System.out.println(s);
}
Output:
2
!
+
atan2
(
3
+
9
,
2
+
3
)
-
2
*
PI
+
3
/
3
-
9
-
12
%
3
*
sin
(
9
-
9
)
+
(
2
+
6
/
2
)
The problem is that the expression can now contain a matrix in a user-defined format. I want to treat every matrix as an operand, or as an argument in the case of functions.
Input 1:
String input_1 = "2+3-9*[{2+3,2,6},{7,2+3,2+3i}]+9*6";
Output Should be:
2
+
3
-
9
*
[{2+3,2,6},{7,2+3,2+3i}]
+
9
*
6
Input 2:
String input_2 = "{[2,5][9/8,func(2+3)]}+9*8/5";
Output Should be:
{[2,5][9/8,func(2+3)]}
+
9
*
8
/
5
Input 3:
String input_3 = "<[2,9,2.36][2,3,2!]>*<[2,3,9][23+9*8/8,2,3]>";
Output Should be:
<[2,9,2.36][2,3,2!]>
*
<[2,3,9][23+9*8/8,2,3]>
I now want the ArrayList to contain every operand, operator, argument, function and symbol, one per index. How can I achieve my desired output using a regular expression? Expression validation is not required.

I think you can try with something like:
(?<matrix>(?:\[[^\]]+\])|(?:<[^>]+>)|(?:\{[^\}]+\}))|(?<function>\w+(?=\())|(\d+[eE][-+]\d+)|(?<operand>\w+)|(?<operator>[-+\/*%])|(?<symbol>.)
DEMO
The elements are captured in named capturing groups. If you don't need them, you can use the shorter form:
\[[^\]]+\]|<[^>]+>|\{[^\}]+\}|\d+[eE][-+]\d+|\w+(?=\()|\w+|[-+\/*%]|.
The \[[^\]]+\]|<[^>]+>|\{[^\}]+\} part matches an opening bracket ({, [ or <), then characters that are not the closing bracket of that type, then the closing bracket (}, ] or >). So as long as there are no nested brackets of the same type, there is no problem.
Implementation in Java:
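import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {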
public static void main(String[] args) {
String[] expressions = {"2!+atan2(3+9,2+3)-2*PI+3/3-9-12%3*sin(9-9)+(2+6/2)", "2+3-9*[{2+3,2,6},{7,2+3,2+3i}]+9*6",
"{[2,5][9/8,func(2+3)]}+9*8/5","<[2,9,2.36][2,3,2!]>*<[2,3,9][23 + 9 * 8 / 8, 2, 3]>"};
Pattern pattern = Pattern.compile("(?<matrix>(?:\\[[^]]+])|(?:<[^>]+>)|(?:\\{[^}]+}))|(?<function>\\w+(?=\\())|(?<operand>\\w+)|(?<operator>[-+/*%])|(?<symbol>.)");
for(String expression : expressions) {
List<String> elements = new ArrayList<String>();
Matcher matcher = pattern.matcher(expression);
while (matcher.find()) {
elements.add(matcher.group());
}
for (String element : elements) {
System.out.println(element);
}
System.out.println("\n\n\n");
}
}
}
Explanation of alternatives:
\[[^\]]+\]|<[^>]+>|\{[^\}]+\} - matches an opening bracket of a given type, then one or more characters that are not the closing bracket of that type, then the closing bracket of that type,
\d+[eE][-+]\d+ - matches digits, followed by e or E, followed by a + or - sign, followed by digits, to capture elements like 2e+3,
\w+(?=\() - matches one or more word characters (A-Za-z0-9_) only if they are followed by (, for matching functions like sin,
\w+ - matches one or more word characters (A-Za-z0-9_), for matching operands,
[-+\/*%] - matches one character from the character class, to match operators,
. - matches any other character, to match other symbols.
The order of the alternatives matters: the last alternative, ., matches any character, so it needs to be the last option. The case of \w+(?=\() and \w+ is similar: the second matches everything the first does, so \w+(?=\() has to come first. However, if you don't want to distinguish between functions and operands, \w+ alone is enough for both.
In the longer example, the (?<name> ... ) part in every alternative is a named capturing group, and you can see in the demo how it sorts the matched fragments into groups such as operand, operator, function, etc.
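To find out in Java which named group produced each token, one option is to test every group for null after each find(). The following is only a sketch: the class name TokenKinds, the method classify and the "kind:text" output format are made up for illustration, and the pattern is the one from the implementation above with the alternatives in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenKinds {
    // Same alternatives, in the same order, as the named-group pattern above.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<matrix>\\[[^\\]]+\\]|<[^>]+>|\\{[^}]+\\})"
            + "|(?<function>\\w+(?=\\())"
            + "|(?<operand>\\w+)"
            + "|(?<operator>[-+/*%])"
            + "|(?<symbol>.)");

    private static final String[] KINDS =
            {"matrix", "function", "operand", "operator", "symbol"};

    // Returns entries such as "operand:2"; the format is illustrative only.
    static List<String> classify(String expression) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            for (String kind : KINDS) {
                // Only the group of the alternative that matched is non-null.
                if (m.group(kind) != null) {
                    out.add(kind + ":" + m.group(kind));
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        classify("2*sin(PI)+[1,2]").forEach(System.out::println);
    }
}
```

Because \w does not include the decimal point, a literal such as 2.36 outside a matrix would come out as three tokens (2, ., 36) with this pattern.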

With regular expressions you cannot match arbitrarily deep nesting of balanced brackets; java.util.regex in particular has no recursion construct.
For example, in your second example {[2,5][9/8,func(2+3)]} you need to match the opening brace with its closing brace, but to do that you have to keep track of how many inner braces, brackets and parentheses have been opened and closed. That cannot be done with regular expressions.
If, on the other hand, you simplify your problem to remove any requirement for balancing, then you can probably handle it with regular expressions.
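If balancing is required, a small hand-written scanner that counts bracket depth is one way around the limitation. This is a minimal sketch under stated assumptions: the class MatrixAwareTokenizer is hypothetical, a matrix is assumed to start with [, { or <, the characters < and > are assumed never to be used as comparison operators, and bracket types are only counted, not checked against each other.

```java
import java.util.ArrayList;
import java.util.List;

class MatrixAwareTokenizer {
    private static final String OPEN = "[{<";
    private static final String CLOSE = "]}>";
    private static final String SINGLE = "+-*/%^!(),"; // one-character tokens

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (OPEN.indexOf(c) >= 0) {
                // Scan forward until the depth returns to zero (or the input ends).
                int depth = 0;
                int j = i;
                do {
                    char d = s.charAt(j);
                    if (OPEN.indexOf(d) >= 0) depth++;
                    else if (CLOSE.indexOf(d) >= 0) depth--;
                    j++;
                } while (depth > 0 && j < s.length());
                tokens.add(s.substring(i, j)); // the whole matrix is one operand
                i = j;
            } else if (SINGLE.indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // Number, constant or function name: read up to the next boundary.
                int j = i + 1;
                while (j < s.length() && !isBoundary(s.charAt(j))) j++;
                tokens.add(s.substring(i, j));
                i = j;
            }
        }
        return tokens;
    }

    private static boolean isBoundary(char c) {
        return Character.isWhitespace(c) || OPEN.indexOf(c) >= 0 || SINGLE.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        tokenize("{[2,5][9/8,func(2+3)]}+9*8/5").forEach(System.out::println);
    }
}
```

Unlike the regex with [^\]]+ and similar classes, this also keeps nested brackets of the same type together, for example [[1,2],[3,4]].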

Related

How to split a string every N words [duplicate]

This question already has answers here:
How to split a String by space
(17 answers)
How to split a string array into small chunk arrays in java?
(17 answers)
Splitting at every n-th separator, and keeping the character
(4 answers)
Closed last year.
I want to split one big string into smaller parts. Given, for example:
"A B C D E F G H I J K L"
I want to get arrays (String[]): [A,B,C,D], [E,F,G,H], [I,J,K,L]
Is there a regex for that, or do I need to do it manually, first splitting on every space and then concatenating every N words?
You can create a regex that describes this pattern.
e.g. "((?:\w+\s*){4})"
Or in simple words:
The \w+\s* part means one or more word characters (e.g. letters, digits) followed by zero or more whitespace characters.
It is surrounded by parentheses and followed by {4} to indicate that we want this to occur 4 times.
Finally, that is again wrapped in parentheses, because we want to capture the result.
By contrast, the inner parentheses, to which the {4} applies, start with ?:, which makes them a non-capturing group. We don't want to capture the individual words just yet.
You can use that pattern in Java to extract each chunk of 4 occurrences.
Then you can simply split each individual result with a second regex, \s+ (= whitespace).
Edit
One more thing, you may notice that the first matched group also contains whitespace at the end. You can get rid of that with a more advanced regex: ((?:\w+\s+){3}(?:\w+))\s*
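Put together in Java, the two steps (match a chunk, then split the chunk) could look like the sketch below. The class WordChunks and the method chunk are invented for this example, and the pattern is a variant of the one above rather than the same regex: it uses \S+ instead of \w+ and the quantifier {0,n-1}, so that a final chunk with fewer than n words is kept instead of dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordChunks {
    // Splits text into groups of up to n whitespace-separated words.
    static List<String[]> chunk(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // One word, then up to n-1 further words, each preceded by whitespace.
        Pattern p = Pattern.compile("\\S+(?:\\s+\\S+){0," + (n - 1) + "}");
        Matcher m = p.matcher(text);
        List<String[]> chunks = new ArrayList<>();
        while (m.find()) {
            // Second step: split the matched chunk into its words.
            chunks.add(m.group().split("\\s+"));
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String[] words : chunk("A B C D E F G H I J K L", 4)) {
            System.out.println(Arrays.toString(words));
        }
    }
}
```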
You could use regex for this:
e.g.:
String x = "AAS BASD CAFAS DAFASF EASFASF FAFSASF GA HASF IAS JAS KAS LSA";
ArrayList<String> found = new ArrayList<>();
Pattern pattern = Pattern.compile("(\\w+\\s\\w+\\s\\w+\\s\\w+)");
Matcher m = pattern.matcher(x);
while (m.find()) {
String s = m.group();
found.add(s);
}
//if you want to convert your List to an Array
String[] result = found.toArray(new String[0]);
System.out.println(Arrays.toString(result));
Result: [AAS BASD CAFAS DAFASF, EASFASF FAFSASF GA HASF, IAS JAS KAS LSA]
This pattern ("(\\w+\\s\\w+\\s\\w+\\s\\w+)") matches 4 words separated by one space. The loop iterates over every found match and adds it to your result list.
There are multiple ways to achieve this. For example, let your string be
String str = "A B C D E F G H I J K L";
One way to split it is with a regular expression:
java.util.Arrays.toString(str.split("(?<=\\G....)"))
Here the .... determines how many characters go into each piece (another way to write it is .{4}), so this splits by character count, not by word count.
Another way, using Guava's Splitter, which also splits by a fixed number of characters:
Iterable<String> strArr = Splitter.fixedLength(3).split(str);
There may be more ways to achieve the same result.

Separate numbers in math expression [duplicate]

This question already has answers here:
Parsing an arithmetic expression and building a tree from it in Java
(5 answers)
Closed 8 years ago.
I have a math expression stored as a String:
String math = "12+3=15";
I want to separate the string into the following:
int num1 (The first number, 12)
int num2 (The second number, 3)
String operator (+)
int answer (The answer, 15)
(num1 and num2 can be numbers between 0 and 20, and operator can be any of +, -, *, /)
What is the easiest way to achieve this? I was thinking about regular expressions, but I'm not sure how to do it.
Now, don't scowl at me.. You asked for the simplest solution :P
public static void main(String[] args) {
String math = "12+3=15";
Pattern p = Pattern.compile("(\\d+)(.)(\\d+)=(\\d+)");
Matcher m = p.matcher(math);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
System.out.println(m.group(4));
}
}
O/P :
12
+
3
15
EDIT: (\\d+)(.)(\\d+)=(\\d+) -->
\\d+ matches one or more digits.
. matches any single character (except a line terminator, by default).
() --> captures whatever is inside it.
(\\d+)(.)(\\d+)=(\\d+) --> captures one or more digits, followed by any single character (+, -, * etc.), then again one or more digits; it matches the "=" without capturing it and then captures digits again.
Capture groups are numbered from 1 to n; group 0 represents the entire match.
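As a variation on the pattern explained above, the groups can be given names and the operator restricted to the four characters mentioned in the question. This is only a sketch: the class Equation, its fields and the group names num1, op, num2 and answer are chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Equation {
    // [-+*/] replaces the bare "." so only the four operators are accepted.
    private static final Pattern PATTERN = Pattern.compile(
            "(?<num1>\\d+)(?<op>[-+*/])(?<num2>\\d+)=(?<answer>\\d+)");

    final int num1;
    final String operator;
    final int num2;
    final int answer;

    private Equation(int num1, String operator, int num2, int answer) {
        this.num1 = num1;
        this.operator = operator;
        this.num2 = num2;
        this.answer = answer;
    }

    static Equation parse(String text) {
        Matcher m = PATTERN.matcher(text);
        // matches() requires the whole string to fit the pattern.
        if (!m.matches()) {
            throw new IllegalArgumentException("Not of the form a<op>b=c: " + text);
        }
        return new Equation(
                Integer.parseInt(m.group("num1")),
                m.group("op"),
                Integer.parseInt(m.group("num2")),
                Integer.parseInt(m.group("answer")));
    }

    public static void main(String[] args) {
        Equation e = parse("12+3=15");
        System.out.println(e.num1 + " " + e.operator + " " + e.num2 + " = " + e.answer);
    }
}
```

Using matches() instead of find() means that input with stray characters is rejected rather than partially matched.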
\\b(\\d+)\\b|([+*\/-])
You can simply do this and grab the captures. See demo.
https://regex101.com/r/wU7sQ0/30
Or simply split by \\b. See demo.
https://regex101.com/r/wU7sQ0/31
var re = /\b(\d+)\b|([+=\\-])/gm;
var str = '12+3=15';
var m;
while ((m = re.exec(str)) != null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
// View your result using the m-variable.
// eg m[0] etc.
}

Java Regular expressions issue - Can't match two strings in the same line [duplicate]

This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 8 years ago.
I am experiencing some problems with Java regular expressions.
I have a program that reads through an HTML file and replaces any string enclosed in #VR# markers, e.g. #VR#Test1 2 3 4#VR#.
However, my issue is that if a line contains more than one string surrounded by #VR#, it does not match them separately. It matches from the leftmost #VR# to the rightmost #VR# in the line and thus takes everything in between.
For example:
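<a href="#VR#URL-GOES-HERE#VR#" target="_blank" style="color:#f4f3f1; text-decoration:none;" title="ContactUs">#VR#Google#VR#</a>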
My code would match
URL-GOES-HERE#VR#" target="_blank" style="color:#f4f3f1; text-decoration:none;" title="ContactUs">#VR#Google
Here is my Java code. Would appreciate if you could help me to solve this:
Pattern p = Pattern.compile("#VR#.*#VR#");
Matcher m;
Scanner scanner = new Scanner(htmlContent);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String match_found = m.group().replaceAll("#VR#", "");
System.out.println("group: " + match_found);
}
}
I tried replacing m.group() with m.group(0) and m.group(1) but nothing. Also m.groupCount() always returns zero, even if there are two matches as in my example above.
Thanks, your help will be very much appreciated.
Your problem is that .* is "greedy"; it will try to match as long a substring as possible while still letting the overall expression match. So, for example, in #VR# 1 #VR# 2 #VR# 3 #VR#, the .* will match " 1 #VR# 2 #VR# 3 ", i.e. everything between the first and the last #VR#.
The simplest fix is to make it "non-greedy" (matching as little as possible while still letting the expression match), by changing the * to *?:
Pattern p = Pattern.compile("#VR#.*?#VR#");
"Also m.groupCount() always returns zero, even if there are two matches as in my example above."
That's because m.groupCount() returns the number of capture groups in the underlying pattern (parenthesized subexpressions, whose matched substrings are retrieved using m.group(1), m.group(2) and so on), not the number of matches. In your case, the pattern has no capture groups, so m.groupCount() returns 0.
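The difference between the two quantifiers, and the use of a capture group to get the text without the markers, can be seen in a short sketch. The class VrPlaceholders, the method extract and the sample line are made up for illustration; only the #VR# marker comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VrPlaceholders {
    // Group 1 captures the text between the markers.
    static final Pattern GREEDY = Pattern.compile("#VR#(.*)#VR#");
    static final Pattern RELUCTANT = Pattern.compile("#VR#(.*?)#VR#");

    // Collects group 1 of every match of the given pattern in the line.
    static List<String> extract(Pattern pattern, String line) {
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical input line with two placeholders.
        String line = "<a href=\"#VR#URL#VR#\">#VR#Google#VR#</a>";
        System.out.println(extract(GREEDY, line));    // one long match
        System.out.println(extract(RELUCTANT, line)); // two separate matches
    }
}
```

With the capture group in place, groupCount() on either matcher returns 1, and the replaceAll("#VR#", "") call from the question is no longer needed.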
You can try the regular expression:
#VR#(((?!#VR#).)+)#VR#
Demo:
private static final Pattern REGEX_PATTERN =
Pattern.compile("#VR#(((?!#VR#).)+)#VR#");
public static void main(String[] args) {
String input = "#VR#Google#VR# ";
System.out.println(
REGEX_PATTERN.matcher(input).replaceAll("$1")
); // prints "Google "
}

Regex in Java with multiple condition to extract arithmetic operator

I am new to regular expression syntax. After a whole day of digging on Google, I still can't find a good regex in Java to extract what I want from a string.
For example, I have
stringA = "-3.5 + 2 * 3 / 2";
stringB = "2 * 3 / 2 - 3.5";
The regex I used was
regex = "[\\+\\-\\*\\/]"; // choose +, -, * or / from the target
With this I am able to capture any sign in the string, including the negative sign.
However, I want to capture the minus sign (-) only when it is followed by whitespace.
That is, I want the result from stringA to be [+, *, /] and the result from stringB to be [*, /, -].
I thought I only needed to add another condition to the regex for the minus sign, such as
regex = "[\\+{\\-\\s}\\*\\/]"; // same as before, but with the extra condition that "-" must be followed by whitespace
but square brackets do not work this way. Could anyone kindly show me how to add another condition to the original regex, or write a new regex that meets this need? Thank you so much in advance.
Chi, this might be the simple regex you're looking for:
[+*/]|(?<=\s)-
How does it work?
There is an alternation | in the middle, which is a way of saying "match this or match that."
On the left, the character class [+*/] matches one character that is a +, * or /
On the right, the lookbehind (?<=\s) asserts "preceded by a whitespace character", then we match a minus.
How to use it?
List<String> matchList = new ArrayList<String>();
try {
Pattern regex = Pattern.compile("[+*/]|(?<=\\s)-");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
If you are interested, you may want to read up on regex lookaheads and lookbehinds.
Let me know if you have any question.
What you can do is ditch the character class (the pattern enclosed in []), use alternation (|) instead, and add a negative lookahead to the minus sign so that it does not match when followed by a digit:
String input0 = "2 * 3 / 2 - 3.5";
String input1 = "-3.5 + 2 * 3 / 2";
Pattern p = Pattern.compile("\\+|\\-(?!\\d)|\\*|/");
Matcher m = p.matcher(input0);
while (m.find()) {
System.out.println(m.group());
}
System.out.println();
m = p.matcher(input1);
while (m.find()) {
System.out.println(m.group());
}
Output
*
/
-
+
*
/
Yet another solution.
Maybe you want to catch the minus sign based on its meaning rather than on the surrounding whitespace, i.e. as a binary minus operator and not as the sign in front of a number.
You could have a binary minus without any space at all, as in 3-5, or a minus sign before a number with a space between them (which is allowed in many programming languages, Java included). So, in order to catch your tokens (positive and negative numbers, and binary operators), you can try this:
public static void main(String[] args) {
String numberPattern = "(?:-? *\\d+(?:\\.\\d+)?(?:E[+-]?\\d+)?)";
String opPattern = "[+*/-]";
Pattern tokenPattern = Pattern.compile(numberPattern + "|" + opPattern);
String stringA = "-3.5 + -2 * 3 / 2";
Matcher matcher = tokenPattern.matcher(stringA);
while(matcher.find()) {
System.out.println(matcher.group().trim());
}
}
Here you are catching operators AND ALSO operands, regardless of white spaces. If you only need the binary operators, just filter them.
Try with the string "-3.5+-2*3/2" (without spaces at all) and you'll have your tokens anyway.
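One caveat with the pattern above: because the number alternative is tried first and starts with an optional minus, an input such as 3-5 comes out as the two tokens 3 and -5, so the binary minus is lost. A possible way around that is to decide from context whether a minus is a sign or an operator. The sketch below is an assumption-laden illustration, not the method of the answer above: the class SignAwareTokenizer is hypothetical, and a minus is treated as a sign only at the start of the input or directly after an operator or an opening parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignAwareTokenizer {
    // Unsigned number, or a single operator or parenthesis. Whitespace and any
    // other characters are skipped silently by find().
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?|[-+*/()]");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        boolean expectOperand = true; // true at the start and after an operator or "("
        boolean pendingSign = false;  // a "-" waiting to be attached to a number
        while (m.find()) {
            String t = m.group();
            if (Character.isDigit(t.charAt(0))) {
                tokens.add(pendingSign ? "-" + t : t);
                pendingSign = false;
                expectOperand = false;
            } else if (t.equals("-") && expectOperand && !pendingSign) {
                pendingSign = true; // sign, not operator
            } else {
                if (pendingSign) {  // the sign was not followed by a number
                    tokens.add("-");
                    pendingSign = false;
                }
                tokens.add(t);
                expectOperand = !t.equals(")");
            }
        }
        if (pendingSign) {
            tokens.add("-");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-3.5 + -2 * 3 / 2"));
        System.out.println(tokenize("3-5"));
    }
}
```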
Try String#replaceAll(). It's a very simple pattern.
// [any digit] or [minus followed by any digit] or [decimal]
String regex = "(\\d|-\\d|\\.)";
String stringA = "-3.5 + 2 * 3 / 2";
String stringA1 = stringA.replaceAll(regex, "").trim();
System.out.println(stringA1);
String stringB = "2 * 3 / 2 - 3.5";
String stringB1 = stringB.replaceAll(regex, "").trim();
System.out.println(stringB1);
output
+ * /
* / -
Note: you can then get the individual operators using String#split("\\s+").

Is there a way to stop a RegEx before a character value and start another one after that character?

I am trying to remove the numbers before a character such as a-z or *, /, +, -, and then remove any numbers following that character but before a different character. Here is what I have.
s = s.replaceAll("(\\d+)", "");
s = s.replace("*", r.toString());
where s is the string that I need to read, and r is the result of the operation.
The * is arbitrary. It could be any of the characters mentioned previously.
The problem with this is that it removes every number in the string.
If I were to iterate once with the input of:
26 + 4 - 2
The program returns this:
30 -
It deletes all three numbers and then replaces the "+" with 30.
I would like to change it to resemble this (with one iteration):
26 + 4 - 2
The first RegEx would delete the first set of numbers
+ 4 - 2
The second would remove the numbers after the operator, but before the next operator
+ - 2
The next statement would replace the operator with the result of the expression
30 - 2
I would like the same for problems with other functions such as sine, cosine, etc.
Note: Sine is 'a'
"Sin pi" is the same as "a pi"
After one iteration it should look like
a pi + 2
a + 2
0 + 2
Here is a sample of the code.
This is the Multiply "case"
case '*':
{
int m = n + 1;
while (m < result.length){
if (result[m] != '*' && result[m] != '/' && result[m] != '+' && result[m] != '-'){ //checks the item to see if it is numeric
char ch2 = result[m]; //makes the number a character
number3 += new String(new char[]{ch2}); //combines the character into a string. For example: '2' + '3' = "23".
++m;}
else {
break;
}}
resultNumber = (Double.parseDouble(number2) * Double.parseDouble(number3)); //"number2" holds the value of the numbers before the operator. Example: This number ----> "3" '*' "23"
equation = equation.replaceAll("(\\d+)", ""); // <---- Line I pulled out earlier that I want to change.
equation = equation.replace("*", resultNumber.toString()); // <----- Line I pulled out earlier
result = equation.toCharArray();
number3 = ""; //erases any number held
number2 = ""; //erases any number held
++n;
break;
}
I'll first suggest two alternate approaches, then answer your question as it stands.
Perhaps better without regular expressions
I have many doubts about your application. A proper tokenizer (lexer), together with a very simple parser would likely do a better job and give clearer error messages than your code.
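To give an idea of what such a tokenizer plus parser might look like, here is a minimal recursive-descent sketch. It is not a drop-in replacement for the code in the question: the class TinyCalculator and its method evaluate are hypothetical, and only numbers, + - * /, parentheses and unary minus are handled (no sine, cosine or other functions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyCalculator {
    // Lexer: unsigned numbers, operators and parentheses; whitespace is skipped.
    private static final Pattern TOKEN = Pattern.compile("\\d+(?:\\.\\d+)?|[-+*/()]");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private TinyCalculator(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    static double evaluate(String input) {
        TinyCalculator c = new TinyCalculator(input);
        double value = c.expression();
        if (c.pos != c.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + c.tokens.get(c.pos));
        }
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double value = term();
        while (pos < tokens.size() && (peek().equals("+") || peek().equals("-"))) {
            String op = tokens.get(pos++);
            double rhs = term();
            value = op.equals("+") ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (pos < tokens.size() && (peek().equals("*") || peek().equals("/"))) {
            String op = tokens.get(pos++);
            double rhs = factor();
            value = op.equals("*") ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor := NUMBER | '-' factor | '(' expression ')'
    private double factor() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String t = tokens.get(pos++);
        if (t.equals("-")) {
            return -factor();
        }
        if (t.equals("(")) {
            double value = expression();
            if (pos >= tokens.size() || !tokens.get(pos++).equals(")")) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            return value;
        }
        // Throws NumberFormatException if t is a misplaced operator or ")".
        return Double.parseDouble(t);
    }

    private String peek() {
        return tokens.get(pos);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("26 + 4 - 2"));
    }
}
```

Operator precedence falls out of the grammar (term binds tighter than expression), so no repeated string replacement is needed, and malformed input produces an exception instead of a silently wrong result.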
Matching all operands
Even if you were to use regular expressions, it might make more sense to match both operands in a single pass. I.e. match (\d+)\s*\*\s*(\d+) to match a multiplication of exactly two numbers. You could first search for a match, then extract the operands from the capturing groups, then compute the resulting value and finally glue together substrings including the result:
// Multiplication of unparenthesized integers
Pattern p = Pattern.compile("(\\d+)\\s*\\*\\s*(\\d+)");
Matcher m = p.matcher(s);
while (m.find()) {
int a = Integer.parseInt(m.group(1));
int b = Integer.parseInt(m.group(2));
s = s.substring(0, m.start(1)) + (a*b) + s.substring(m.end(2));
m.reset(s);
}
Answer to the question as it was phrased in the title
Regarding the exact formulation of your question:
Is there a way to stop a RegEx before a character value and start another one after that character?
If you want a regex to not match after a given character in the input, you can achieve that by a negative look-behind assertion. Likewise, to only match after a given character, you can use a positive look-behind assertion.
So a regex starting in (?<!\*.*) would only match up to the first occurrence of '*', whereas a regex starting in (?<=\*.*) would only match after the first occurrence of that character. Both would have to be compiled using DOTALL, or in a more complicated form like (?<!\*(?:\n|.*)*).
But ensuring that these matches correspond to the math you have in mind would likely be very tricky.
