Pretty simple question and my brain is frozen today so I can't think of an elegant solution where I know one exists.
I have a formula which is passed to me in the form "A+B"
I also have a mapping of the formula variables to their "readable names".
Finally, I have a formula parser which will calculate the value of the formula, but only if it's passed the readable names for the variables.
For example, as an input I get
String formula = "A+B"
String readableA = "foovar1"
String readableB = "foovar2"
and I want my output to be "foovar1+foovar2"
The problem with a simple find and replace is that it can easily be broken, because we have no guarantees on what the 'readable' names are. Let's say I take my example again with different parameters
String formula = "A+B"
String readableA = "foovarBad1"
String readableB = "foovarAngry2"
If I do a simple find and replace in a loop, I'll end up replacing the capital A's and B's in the readable names I have already replaced.
This looks like an approximate solution, but I don't have brackets around my variables:
How to replace a set of tokens in a Java String?
That link you provided is an excellent source, since matching using patterns is the way to go. The basic idea here is to first get the tokens using a matcher. After this you will have operators and operands.
Then, do the replacement individually on each Operand.
Finally, put them back together using the Operators.
A somewhat tedious solution would be to scan for all occurrences of A and B, note their indexes in the string, and then use the StringBuilder.replace(int start, int end, String str) method. (In naive form this would not be very efficient though, approaching something like quadratic complexity, or more precisely "number of variables" * "number of possible replacements".)
If you know all of your operators, you could do split on them (like on "+") and then replace individual "A" and "B" (you'd have to do trimming whitespace chars first of course) in an array or ArrayList.
A simple way to do it is
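// Matcher.quoteReplacement (java.util.regex.Matcher) keeps '$' and '\' in the names literal
String formula = "A+B".replaceAll("\\bA\\b", Matcher.quoteReplacement(readableA))
.replaceAll("\\bB\\b", Matcher.quoteReplacement(readableB));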
Your approach will not work well that way.
Formulas (mathematical expressions) should be parsed into an expression structure (e.g. an expression tree),
so that you end up with operand nodes and operator nodes.
Later this expression is evaluated by traversing the tree and respecting the mathematical precedence rules.
I recommend reading more on Expression parsing.
Matching Only
If you don't have to evaluate the expression after doing the substitution, you might be able to use a regex. Something like (\b\p{Alpha}\p{Alnum}*\b)
or the java string "(\\b\\p{Alpha}\\p{Alnum}*\\b)"
Then use find() over and over to find all the variables and store their locations.
Finally, go through the locations and build up a new string from the old one with the variable bits replaced.
Note that it will not do much checking that the supplied expression is reasonable. For example, it wouldn't mind at all if you gave it )A 2 B( and would just replace the A and B (like )XXX 2 XXX(). I don't know if that matters.
This is similar to the link you supplied in your question except you need a different regular expression than they used. You can go to http://www.regexplanet.com/advanced/java/index.html to play with regular expressions and figure out one that will work. I used it with the one I suggested and it finds what it needs in A+B and A + (C* D ) just fine.
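The find-and-rebuild idea above can be sketched in a single pass with Matcher.appendReplacement and appendTail. Because each variable is looked up and written exactly once, text that has already been substituted is never scanned again, which is the failure mode from the question. The class name FormulaRenamer and the method rename are made up for this illustration; treat it as a sketch, not a hardened implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaRenamer {
    // Same identifier pattern as suggested above: a letter followed by letters or digits.
    private static final Pattern VARIABLE = Pattern.compile("\\b\\p{Alpha}\\p{Alnum}*\\b");

    // Replaces every identifier that has an entry in names; unknown identifiers are kept as-is.
    static String rename(String formula, Map<String, String> names) {
        Matcher m = VARIABLE.matcher(formula);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = names.getOrDefault(m.group(), m.group());
            // quoteReplacement keeps '$' and '\' inside a readable name literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("A", "foovarBad1");
        names.put("B", "foovarAngry2");
        System.out.println(rename("A+B", names)); // prints foovarBad1+foovarAngry2
    }
}
```

Like the regex itself, this does no validation: )A 2 B( is happily rewritten.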
Parsing
You parse the expression using one of the available parser generators (Antlr or Sable or ...) or find an algebraic expression parser available as open source and use it. (You would have to search the web to find those, I haven't used one but suspect they exist.)
Then you use the parser to generate a parsed form of the expression, replace the variables and reconstitute the string form with the new variables.
This one might work better but the amount of effort depends on whether you can find existing code to use.
It also depends on whether you need to check that the expression is valid according to the normal rules. This method will most likely not accept invalid expressions.
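If no ready-made parser is at hand, a small hand-written recursive descent parser may be enough. The sketch below assumes a grammar with only + - * /, parentheses, and alphanumeric operands; the names ExprRename, Node and rename are invented for the example. It builds a tree, substitutes the variable leaves, and prints the tree back, parenthesizing every nested operation so the grouping survives.

```java
import java.util.HashMap;
import java.util.Map;

class ExprRename {
    // Tree node: a leaf (variable or number) or a binary operator with two children.
    static final class Node {
        final String text;      // leaf text or operator symbol
        final Node left, right; // both null for a leaf
        Node(String text, Node left, Node right) {
            this.text = text; this.left = left; this.right = right;
        }
    }

    private final String src;
    private int pos = 0;

    private ExprRename(String src) { this.src = src; }

    static String rename(String formula, Map<String, String> names) {
        ExprRename p = new ExprRename(formula);
        Node root = p.expr();
        if (p.peek() != '\0') throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        return render(root, names, true);
    }

    // Skips whitespace and returns the next character, or '\0' at the end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expr := term (('+' | '-') term)*
    private Node expr() {
        Node n = term();
        while (peek() == '+' || peek() == '-') {
            String op = String.valueOf(src.charAt(pos++));
            n = new Node(op, n, term());
        }
        return n;
    }

    // term := factor (('*' | '/') factor)*
    private Node term() {
        Node n = factor();
        while (peek() == '*' || peek() == '/') {
            String op = String.valueOf(src.charAt(pos++));
            n = new Node(op, n, factor());
        }
        return n;
    }

    // factor := '(' expr ')' | operand
    private Node factor() {
        if (peek() == '(') {
            pos++;
            Node n = expr();
            if (peek() != ')') throw new IllegalArgumentException("Missing ')' at index " + pos);
            pos++;
            return n;
        }
        int start = pos; // peek() has already skipped leading whitespace
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Operand expected at index " + pos);
        return new Node(src.substring(start, pos), null, null);
    }

    // Rebuilds the text; nested operations are wrapped in parentheses to keep their grouping.
    private static String render(Node n, Map<String, String> names, boolean top) {
        if (n.left == null) return names.getOrDefault(n.text, n.text);
        String s = render(n.left, names, false) + n.text + render(n.right, names, false);
        return top ? s : "(" + s + ")";
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("A", "foovarBad1");
        names.put("B", "foovarAngry2");
        System.out.println(rename("A+B", names)); // prints foovarBad1+foovarAngry2
    }
}
```

Unlike the regex version, this rejects input such as )A 2 B( with an IllegalArgumentException, at the cost of normalizing whitespace and adding parentheses in the output.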
Related
I am trying to solve a problem in which I have to evaluate a given expression consisting of one or more initializations in the same string, with no operator precedence (although with bracketed sub-expressions). All the operators are right-associative, so I have to evaluate from right to left. I am confused about how to proceed. The detailed problem is given here: http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=108
I'll give you some ideas to try:
First off, you need to recursively evaluate inside brackets. You want to do brackets from most nested to least nested, so use a regex that matches brackets with no ) inside of them. Substring the result of the computation into the part of the string the bracketed expression took up.
If there are no brackets, then now you need to evaluate operators. The reason why the question requires right precedence is to force you to think about how to answer it - you can't just read the string and do calculations. You have to consider the whole string THEN start doing calculations, which means storing some structure describing it. There's a number of strategies you could use to do this, for example:
-You could tokenize the string, either using a scanner or regexes - continually try to see if the next item in the string is a number or which of the operators it is, and push what kind of token it is and its value onto a list. Then, you can evaluate the list from right to left using some kind of case/switch structure to determine what to do for each operator (either that, or each operator is associated with what it does to numbers). = itself would address a map of variable name keys to values, and insert the value under that variable's key, and then return (to be placed into the list) the value it produced, so it can be used for another assignment. It also seems like - can be determined as to whether it's subtraction or a negative number by whether there's a space on its right or not.
-Instead of tokenization, you could use regexes on the string as a whole. But tokenization is more robust. I tried to build a calculator based on applying regexes to the whole string over and over but it's so difficult to get all the rules right and I don't recommend it.
I've written an expression evaluating calculator like this before, so you can ask me questions if you run into specific problems.
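As a rough sketch of the tokenize-then-evaluate strategy: with equal precedence and right-to-left evaluation, an expression is just "operand, optionally followed by an operator and another expression", so the right side can be evaluated recursively before the operator is applied. The code below makes several assumptions that are not taken from the problem statement: integer arithmetic, unset variables read as 0, and - treated only as a binary operator (negative literals are not handled). The class name RightToLeftCalc is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RightToLeftCalc {
    private static final Pattern TOKEN = Pattern.compile("\\s*(\\d+|[A-Za-z]\\w*|[-+*/=()])");
    private final Map<String, Integer> vars = new HashMap<>();
    private List<String> tokens;
    private int pos;

    int eval(String line) {
        tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        int end = 0;
        while (m.find()) {
            // A gap between matches means there is a character no token accepts.
            if (m.start() != end) throw new IllegalArgumentException("Unexpected character at index " + end);
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!line.substring(end).trim().isEmpty()) throw new IllegalArgumentException("Unexpected character at index " + end);
        pos = 0;
        int value = expr();
        if (pos != tokens.size()) throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        return value;
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    // Assumption: a variable that was never assigned reads as 0.
    private int lookup(String name) { return vars.getOrDefault(name, 0); }

    // expr := operand [operator expr]; the right-hand side is evaluated first.
    private int expr() {
        String tok = next();
        Integer left = null; // null means tok is a variable name, resolved lazily
        if (tok.equals("(")) {
            left = expr();
            if (!next().equals(")")) throw new IllegalArgumentException("Missing ')'");
        } else if (Character.isDigit(tok.charAt(0))) {
            left = Integer.parseInt(tok);
        } else if (!Character.isLetter(tok.charAt(0))) {
            throw new IllegalArgumentException("Operand expected, got: " + tok);
        }
        if (pos == tokens.size() || tokens.get(pos).equals(")")) {
            return left != null ? left : lookup(tok);
        }
        String op = next();
        int right = expr();
        if (op.equals("=")) {
            if (left != null) throw new IllegalArgumentException("Left side of = must be a variable");
            vars.put(tok, right);
            return right; // the assigned value feeds any assignment further left
        }
        int l = left != null ? left : lookup(tok);
        switch (op) {
            case "+": return l + right;
            case "-": return l - right;
            case "*": return l * right;
            case "/": return l / right;
            default: throw new IllegalArgumentException("Operator expected, got: " + op);
        }
    }

    public static void main(String[] args) {
        RightToLeftCalc calc = new RightToLeftCalc();
        System.out.println(calc.eval("2 * 3 + 4")); // prints 14, i.e. 2 * (3 + 4)
    }
}
```

Brackets need no special pass here: a bracketed sub-expression is simply another call to expr().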
I am using a regular expression for image file names.
The main reason why I'm using RegEx's is to prevent multiple files for the exact same purpose.
The syntax for the filenames can either be:
1) img_0F_16_-32_0.png
2) img_65_32_x.png
As you might have noticed, "img_" is the general prefix.
What follows is a two-digit hexadecimal number.
After another underscore comes an integer that has to be a power of two between 1 and 512. Yet another underscore is next.
Okay so this far, my regular expression is working flawlessly.
The rest is what I'm having problems with:
Because what can follow is either a pair of integer coordinates (can be 0), separated by an underscore, or an x. After this comes the final ".png". Done.
Now the main problem I am having is that both variants have to be possible,
and also it is highly important that there may not be any duplicate coordinates.
Most importantly, integers, both positive and negative, may never start with one or more zeros!
This would produce duplications like:
401 = 00401
-10 = -0010
This is my first attempt:
img_[0-9a-fA-F]{2}_(1|2|4|8|16|32|64|128|256|512)_([-]?[1-9])?[0-9]*_([-]?[1-9])?[0-9]*[.]png
Thanks for your help in advance,
Tom S.
Why use regular expressions? Why not create a class that decomposes either variant of String to a canonical String, give the class a hashCode() and equals() method that uses this canonical String and then create a HashSet of these objects to make sure that only one of these types of files exists?
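A minimal sketch of that idea follows. The class name ImageName is made up. The pattern is deliberately lenient (it accepts leading zeros), and the numbers are normalized through Integer.parseInt so that 00401 and 401 end up as the same key. Treating upper- and lower-case hex digits as the same file is an assumption; the question does not say.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageName {
    // Lenient on purpose: leading zeros are accepted here and normalized in parse().
    private static final Pattern NAME = Pattern.compile(
            "img_([0-9a-fA-F]{2})_(\\d+)_(?:x|(-?\\d+)_(-?\\d+))\\.png");

    private final String canonical;

    private ImageName(String canonical) { this.canonical = canonical; }

    static ImageName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) throw new IllegalArgumentException("Not an image name: " + fileName);
        int size = Integer.parseInt(m.group(2));
        // Integer.bitCount(size) == 1 holds exactly for powers of two.
        if (size < 1 || size > 512 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("Size must be a power of two from 1 to 512: " + size);
        }
        // Assumption: hex case does not distinguish files.
        String hex = m.group(1).toUpperCase(Locale.ROOT);
        String tail = m.group(3) == null
                ? "x"
                : Integer.parseInt(m.group(3)) + "_" + Integer.parseInt(m.group(4));
        return new ImageName("img_" + hex + "_" + size + "_" + tail + ".png");
    }

    @Override public boolean equals(Object o) {
        return o instanceof ImageName && canonical.equals(((ImageName) o).canonical);
    }
    @Override public int hashCode() { return canonical.hashCode(); }
    @Override public String toString() { return canonical; }

    public static void main(String[] args) {
        Set<ImageName> seen = new HashSet<>();
        System.out.println(seen.add(parse("img_0F_16_-32_0.png")));    // prints true
        System.out.println(seen.add(parse("img_0F_16_-0032_00.png"))); // prints false, same file
    }
}
```

If rejecting non-canonical names outright is preferred over normalizing them, comparing the input with parse(input).toString() does that without a stricter regex.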
I have a bunch of strings representing mathematical functions (which could be nested and have any number of arguments), and I want to be able to use regex to return an array of strings, each string being an argument of the outer-most function. Here's an example:
"f1(f2(x),f3(f4(f5(x,y,z))),f(f(1)))"
I would want a regex pattern that I could use to somehow get an array of all the arguments of f1, which in this case are the strings "f2(x)", "f3(f4(f5(x,y,z)))", and "f(f(1))". There will be no spaces in the input string.
Thank you very much to anyone who can help.
I don't think this can be done with regexes alone.
This would probably require being able to identify balanced parentheses -- for example, once we've parsed f1(f2(x), the next character could either be a ) or a , -- and that's a canonical example of something that can't be done with regexes, but requires a more sophisticated parser.
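A "more sophisticated parser" does not have to be much here: for this particular task a single scan that tracks the parenthesis depth and splits on commas at depth zero is probably enough. The sketch below assumes the input is one function call with no spaces, as in the question; ArgSplitter and topLevelArgs are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class ArgSplitter {
    // Returns the arguments of the outermost call, e.g. "f(a,g(b,c))" gives [a, g(b,c)].
    static List<String> topLevelArgs(String call) {
        int open = call.indexOf('(');
        if (open < 0 || !call.endsWith(")")) {
            throw new IllegalArgumentException("Not a function call: " + call);
        }
        List<String> args = new ArrayList<>();
        int depth = 0;        // nesting depth inside the outer parentheses
        int start = open + 1; // start of the argument currently being scanned
        int last = call.length() - 1; // index of the outer closing parenthesis
        for (int i = open + 1; i < last; i++) {
            char c = call.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (--depth < 0) throw new IllegalArgumentException("Unbalanced parentheses: " + call);
            } else if (c == ',' && depth == 0) {
                args.add(call.substring(start, i));
                start = i + 1;
            }
        }
        if (depth != 0) throw new IllegalArgumentException("Unbalanced parentheses: " + call);
        // Skip the final add only for an empty argument list such as "f()".
        if (start < last || !args.isEmpty()) args.add(call.substring(start, last));
        return args;
    }

    public static void main(String[] args) {
        // prints [f2(x), f3(f4(f5(x,y,z))), f(f(1))]
        System.out.println(topLevelArgs("f1(f2(x),f3(f4(f5(x,y,z))),f(f(1)))"));
    }
}
```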
I'm rather new to the community but I've seen some helpful posts on here so I thought I'd ask.
I've got a homework question that asks us to recursively check whether a given string is a valid prefix expression given by the two following rules (standard):
Variables (a-z) are prefix expressions
If O is a binary operator and F and E are prefix expressions, then OFE is a prefix expression
Now, I kind of get the evaluation and have looked at the prefix-to-infix algorithms, but I can't for the life of me figure out how to implement just the evaluation methods (as I only need to check if it's valid, so not +a-b for example).
I know most of the implementation for these problems is done using stacks but I don't see how I would do it recursively here... some help would be tremendously appreciated.
Think of it this way. (I'm not going to write the code, since that's what you need to learn).
You want to check if a certain string is a prefix expression, so you have a function:
boolean isPrefix(string)
Now, there are two ways that string could be a prefix expression:
It's a character from a-z
It's in the format O(prefix)(prefix)
So first, you check if the string has a length of one and is between a-z, and if so, the answer is yes.
Next you can check if the string starts with an operator O. If it does, you need to test the rest of the string to see if it is composed of two prefix expressions (FE).
So you iterate over every split point i after the operator, passing each pair of substrings (1->i, i->length) into isPrefix(); position 0 is the operator, so it is not part of either operand. If both substrings are valid prefix expressions for some i, the answer is yes.
Otherwise, the answer is no.
That's pretty much it, but the implementation, however, is up to you.
I'm not sure I entirely understand the point of this, but I imagine you should have some method like checkPrefixIn(String s) that looks at only part of the given String, returns true if it is only a prefix, false if it is only an operator (or invalid character), or the return value of checkPrefixIn(partOfS), where partOfS is a substring of the input s
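For anyone who wants something to check their own attempt against, here is one possible sketch of the split-point recursion described in the earlier answer. The set of binary operators is an assumption (+ - * /), since the assignment text quoted in the question does not list them, and PrefixChecker is a made-up name.

```java
class PrefixChecker {
    // Assumption: these are the binary operators; the question does not list them.
    private static final String OPERATORS = "+-*/";

    static boolean isPrefix(String s) {
        // Rule 1: a single variable a-z is a prefix expression.
        if (s.length() == 1) {
            return s.charAt(0) >= 'a' && s.charAt(0) <= 'z';
        }
        // Rule 2 needs an operator plus at least two more characters.
        if (s.length() < 3 || OPERATORS.indexOf(s.charAt(0)) < 0) {
            return false;
        }
        // Try every way to cut the remainder into two non-empty operands F and E.
        for (int i = 2; i < s.length(); i++) {
            if (isPrefix(s.substring(1, i)) && isPrefix(s.substring(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPrefix("+a-bc")); // prints true
        System.out.println(isPrefix("+a-b"));  // prints false
    }
}
```

Trying every split point is simple but can repeat a lot of work on long inputs; a variant that consumes one expression from the front and reports where it ended avoids that.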
Consider following script (it's total nonsense in pseudo-language):
if (Request.hostMatch("asfasfasf.com") && someString.existsIn(new String[] {"brr", "hrr"})) {
    if (Request.clientIp("10.0.x.x")) {
        somevar = "1";
    }
    somevar = "2";
}
else {
    somevar = "first";
}
string foo = "foo";
// etc. etc.
How would you grab if-block's parameters and contents from it? The if-block has format of:
if<whitespace>(<parameters>)<whitespace>{<contents>}<anything>
I tried using String.split() with regex pattern of ^if\s*\(|\)\s*\{|\}\s* but this fails miserably. Namely, the problem is that ) { is also found in the inner if-block, and the closing } is found in many places as well. I don't think either lazy or greedy matching works here.
So... any pointers to what I might need here in order to implement this with regex?
I also need to get the remaining string without the if-block's code (so code starting from else { ...). Using just String.split() seems to make it difficult as there is no information about the length of the parts that were parsed away.
I initially created a loop based solution (using String.substring() heavily) for this, but it's dull. I would like to have something fancier instead. Should I go with regex or create a custom, generic function (there are many other cases than just this) that takes the parseable String and the pattern instead (consider the if<whitespace>(... pattern above)?
Edit: Changed returns to variable assignments as it would have not made sense otherwise.
You'd be far better off using (or writing) a parser than trying to do this with Regex.
Regex is great for some things, but for complex parsing like this, it sucks. Another example where it sucks that gets asked a lot here is parsing HTML - you can do it to a limited degree, but for anything complex, a DOM parser is a much better solution.
For a [very] simple parser, what you need is a recursive function that searches for the braces { and }, recursing down a level each time it comes across an opening brace, and returning back up a level when it finds a closing brace. It then needs to store the string contents between the two braces at each level.
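Here is a minimal sketch of that brace matching for the if<whitespace>(<parameters>)<whitespace>{<contents>}<anything> format from the question. It uses a depth counter rather than explicit recursion; nested blocks can be handled by calling parseIf again on the returned contents. The names IfBlockParser, IfBlock, findClosing and parseIf are invented for the example, and it assumes no parentheses or braces appear inside string literals or comments.

```java
class IfBlockParser {
    // The three parts asked for: condition text, block body, and everything after the block.
    static final class IfBlock {
        final String parameters, contents, rest;
        IfBlock(String parameters, String contents, String rest) {
            this.parameters = parameters; this.contents = contents; this.rest = rest;
        }
    }

    // Returns the index of the delimiter that closes the one at openIndex, counting nesting.
    static int findClosing(String s, int openIndex, char open, char close) {
        int depth = 0;
        for (int i = openIndex; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                depth++;
            } else if (c == close && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("No matching '" + close + "'");
    }

    // Skips whitespace from index i, then checks that the expected delimiter is there.
    private static int expectAfterWhitespace(String s, int i, char expected) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        if (i >= s.length() || s.charAt(i) != expected) {
            throw new IllegalArgumentException("Expected '" + expected + "' at index " + i);
        }
        return i;
    }

    static IfBlock parseIf(String src) {
        String s = src.trim();
        if (!s.startsWith("if")) throw new IllegalArgumentException("Input does not start with 'if'");
        int parenOpen = expectAfterWhitespace(s, 2, '(');
        int parenClose = findClosing(s, parenOpen, '(', ')');
        int braceOpen = expectAfterWhitespace(s, parenClose + 1, '{');
        int braceClose = findClosing(s, braceOpen, '{', '}');
        return new IfBlock(
                s.substring(parenOpen + 1, parenClose),
                s.substring(braceOpen + 1, braceClose),
                s.substring(braceClose + 1));
    }

    public static void main(String[] args) {
        IfBlock b = parseIf("if (a(1) && b) { if (c) { x = 1; } y = 2; } else { z = 3; }");
        System.out.println(b.parameters);  // prints a(1) && b
        System.out.println(b.rest.trim()); // prints else { z = 3; }
    }
}
```

The rest field gives the remaining string (the code starting from else { ...) without needing to know the lengths of the parts that were parsed away.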
A regular expression won't work because a regular grammar can't match things like "any number of open parentheses followed by the same number of close parentheses". A context-free grammar would be needed for that.
Unless you use a context-free grammar parser for Java or a regular expression extension that makes regular expressions no longer regular, your loop-based solution is probably the fanciest solution.
As per the above, you'll need a parser. One type that's easy to implement (and fun to write!) is a recursive descent parser with backtracking. There is also a plethora of parser generators out there, though most of those have a learning curve. One Java-friendly parser generator is JavaCC.