Shunting-yard functions - java

I am using the Shunting-Yard algorithm (https://en.wikipedia.org/wiki/Shunting-yard_algorithm) in a Java program in order to create a calculator. I am almost done, but I still have to implement functions. I have run into a problem: I want the calculator to automatically multiply variables like x and y when they are put together. For example, the calculator converts xy to x*y. I also want the calculator to convert (x)(y) to (x)*(y) and x(y) to x*(y). I have done all of this using the following code:
infix = infix.replaceAll("([a-zA-Z])([a-zA-Z])", "$1*$2");
infix = infix.replaceAll("([a-zA-Z])\\(", "$1*(");
infix = infix.replaceAll("\\)\\(", ")*(");
infix = infix.replaceAll("\\)([a-zA-Z])", ")*$1");
(In my calculator, variable names are always single characters.)
This works great right now, but when I implement functions this will, of course, not work. It will turn "sin(1)" into "s*i*n*(1)". How can I make this code do the multiplication conversion only for variables, and not for functions?

Preprocessing the input to parse isn't a good way to implement what you want. The text replacement can't know what the parsing algorithm knows and you also lose the original input, which can be useful for printing helpful error messages.
Instead, you should decide what to do according to the context. Keep the type of the previously parsed token, with a special type for the beginning of the input.
If the previous token was a value token (a number, a variable name or the closing bracket of a subexpression) and the current one is a value token, too, emit an extra multiplication operator.
The same logic can be used to decide whether a minus sign is a unary negation or a binary subtraction: It's a subtraction if the minus is found after a value token and a negation otherwise.
Your idea to convert x(y) to x * (y) will, of course, clash with function call syntax.
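The context rule above can be sketched as a single scan that remembers the kind of the previous token. This is only a minimal illustration under stated assumptions: single-character variable names (as in the question), non-negative integer literals, no function tokens, and a hypothetical "neg" marker for unary minus. The class name, the enum and the marker are all invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ImplicitMulScanner {
    // Kind of the previously emitted token; START marks the beginning of the input.
    enum Prev { START, VALUE, OPERATOR, OPEN_PAREN }

    static List<String> scan(String infix) {
        List<String> out = new ArrayList<>();
        Prev prev = Prev.START;
        int i = 0;
        while (i < infix.length()) {
            char c = infix.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < infix.length() && Character.isDigit(infix.charAt(i))) i++;
                if (prev == Prev.VALUE) out.add("*");   // value followed by value
                out.add(infix.substring(start, i));
                prev = Prev.VALUE;
            } else if (Character.isLetter(c)) {
                if (prev == Prev.VALUE) out.add("*");   // e.g. "xy" or ")x"
                out.add(String.valueOf(c));
                prev = Prev.VALUE;
                i++;
            } else if (c == '(') {
                if (prev == Prev.VALUE) out.add("*");   // e.g. "x(" or ")("
                out.add("(");
                prev = Prev.OPEN_PAREN;
                i++;
            } else if (c == ')') {
                out.add(")");
                prev = Prev.VALUE;                      // a closed subexpression is a value
                i++;
            } else if (c == '-' && prev != Prev.VALUE) {
                out.add("neg");                         // hypothetical unary-minus marker
                prev = Prev.OPERATOR;
                i++;
            } else if ("+-*/^".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                prev = Prev.OPERATOR;
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }
}
```

With this sketch, scan("(x)(y)") yields the tokens ( x ) * ( y ). To support functions, a function-name token would have to be recognised before the single-letter branch and treated as a non-value, so that no operator is inserted between the name and its opening bracket.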

We can break this down into two parts. There is one rule for bracketed expressions and another for multiplications.
Rather than the Wikipedia article, which is deliberately simplified for explanatory purposes, I would follow a more detailed example like Parsing Expressions by Recursive Descent, which deals with bracketed expressions.
This is the code I use for my parser which can work with implicit multiplication. I have multi-letter variable names and use a space to separate different variables so you can have "2 pi r".
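protected void expression() throws ParseException {
    prefixSuffix();                      // parse the first operand
    Token t = it.peekNext();
    while (t != null) {
        if (t.isBinary()) {
            pushOp(t);                   // explicit binary operator
            it.consume();
            prefixSuffix();
        }
        else if (t.isImplicitMulRhs()) {
            pushOp(implicitMul);         // no operator token in the input to consume
            prefixSuffix();
        }
        else
            break;
        t = it.peekNext();
    }
    // unwind the operators pushed since the last sentinel
    while (!sentinel.equals(ops.peek())) {
        popOp();
    }
}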
This requires a few other functions.
I've used a separate tokenizing step which breaks the input into discrete tokens. The Token class has a number of methods. In particular, Token.isBinary() tests if the operator is a binary operator like +, =, *, /. Another method, Token.isImplicitMulRhs(), tests if the token can appear on the right-hand side of an implicit multiplication; this will be true for numbers, variable names, and left brackets.
An Iterator<Token> is used for the input stream. it.peekNext() looks at the next token and it.consume() moves to the next token in the input.
pushOp(Token) pushes a token onto the operator stack and popOp removes one. pushOp has the logic to handle the precedence of different operators: before pushing, it pops operators from the stack while the incoming operator has lower precedence than the one on top (or equal precedence, for left-associative operators), as decided by compareOps:
protected void pushOp(Token op)
{
    while (compareOps(ops.peek(), op))
        popOp();
    ops.push(op);
}
Of particular note is implicitMul, an artificial token with the same precedence as multiplication, which is pushed onto the operator stack.
prefixSuffix() handles expressions which can be numbers and variables with optional prefix or suffix operators. This will recognise "2", "x", "-2", "x++", removing tokens from the input and adding them to the output/operator stack as appropriate.
We can think of this routine in BNF as
<expression> ::=
<prefixSuffix> ( <binaryOp> <prefixSuffix> )* // normal binary ops x+y
| <prefixSuffix> ( <prefixSuffix> )* // implicit multiplication x y
Handling brackets is done in prefixSuffix(). If this detects a left bracket, it will then recursively call expression(). To detect the matching right bracket, a special sentinel token is pushed onto the operator stack. When the right bracket is encountered in the input, the main loop breaks, all operators on the operator stack are popped until the sentinel is encountered, and control is returned to prefixSuffix(). Code for this might be like
void prefixSuffix() throws ParseException {
    Token t = it.peekNext();
    if (t.equals('(')) {
        it.consume(); // advance the input
        ops.push(sentinel);
        expression(); // parse until ')' encountered
        t = it.peekNext();
        if (t != null && t.equals(')')) {
            it.consume(); // advance the input
            ops.pop(); // discard the sentinel left on the stack by expression()
            return;
        } else throw new ParseException("Unmatched ("); // assumes the parser's own exception type takes a message
    }
    // handle variable names, numbers etc
}

Another approach may be the use of tokens, in a similar way to how a parser works.
The first phase would be to convert the input text into a list of tokens, which are objects that represent both the type of entity found and its value.
For example you can have a variable token, with its value being the name of the variable ('x', 'y', etc.), a token for open or close parenthesis, etc.
Since, I assume, you know in advance the names of the functions that can be used by the calculator, you'll also have a function token, with its value being the function name.
So the output of the tokenizing phase differentiates between variables and functions.
Implementing this is not too hard: just always try to match function names first, so "sin" will be recognized as a function and not as three variables.
Now the second phase can be to insert the missing multiplication operators. This will not be hard now, since you know you just have to insert them between:
{VAR, RIGHT_PAREN} and {VAR, LEFT_PAREN, FUNCTION}
But never between FUNCTION and LEFT_PAREN.
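The two phases could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the function list ("sin", "cos", "tan", "sqrt"), the class and method names, and the NUMBER token on the left-hand side of the rule (so that "2x" also becomes "2*x") are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class FunctionAwareTokenizer {
    enum Type { NUMBER, VAR, FUNCTION, LEFT_PAREN, RIGHT_PAREN, OPERATOR }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical list of known function names; replace with the calculator's own.
    static final List<String> FUNCTIONS = Arrays.asList("sin", "cos", "tan", "sqrt");

    // A "*" is inserted between a LEFT token and a following RIGHT token.
    static final Set<Type> LEFT = EnumSet.of(Type.VAR, Type.NUMBER, Type.RIGHT_PAREN);
    static final Set<Type> RIGHT = EnumSet.of(Type.VAR, Type.LEFT_PAREN, Type.FUNCTION);

    // Phase 1: split the text into typed tokens, trying function names first.
    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            String fn = functionAt(s, i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (fn != null) {
                tokens.add(new Token(Type.FUNCTION, fn));
                i += fn.length();
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                tokens.add(new Token(Type.NUMBER, s.substring(start, i)));
            } else if (Character.isLetter(c)) {       // single-character variable
                tokens.add(new Token(Type.VAR, String.valueOf(c)));
                i++;
            } else if (c == '(') {
                tokens.add(new Token(Type.LEFT_PAREN, "("));
                i++;
            } else if (c == ')') {
                tokens.add(new Token(Type.RIGHT_PAREN, ")"));
                i++;
            } else if ("+-*/^".indexOf(c) >= 0) {
                tokens.add(new Token(Type.OPERATOR, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    // Returns the function name starting at position i, or null if there is none.
    private static String functionAt(String s, int i) {
        for (String f : FUNCTIONS) {
            if (s.startsWith(f, i)) return f;
        }
        return null;
    }

    // Phase 2: insert "*" wherever a LEFT token is directly followed by a RIGHT token.
    static List<Token> insertMultiplication(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Token prev = null;
        for (Token t : in) {
            if (prev != null && LEFT.contains(prev.type) && RIGHT.contains(t.type)) {
                out.add(new Token(Type.OPERATOR, "*"));
            }
            out.add(t);
            prev = t;
        }
        return out;
    }

    // Convenience helper: returns the rewritten expression as text.
    static String rewrite(String infix) {
        StringBuilder sb = new StringBuilder();
        for (Token t : insertMultiplication(tokenize(infix))) sb.append(t.text);
        return sb.toString();
    }
}
```

Here rewrite("xsin(1)") gives "x*sin(1)", while "sin(1)" is left unchanged because FUNCTION is not in the left-hand set. If one function name is a prefix of another (say "sin" and "sinh"), the longer name would need to be listed first.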

Related

How to split a string into either one or two new strings

I'm making a calculator, and some computations require 3 pieces of information (eg, 3*4), whereas other only require 2 (eg, 5!).
I have managed to split the input into 3 parts using the following code:
String[] parts = myInput.split(" ");
String num1 = parts[0];
String operation = parts[1];
String num2 = parts[2];
But this means that when I only type 2 things, it doesn't work.
How can I allow for either 3 or 2 things as an input?
You should not assume that the input will always be in the three-part form. Rather, take a generic input and parse it based on the use cases.
In your case, it boils down to two specific use cases AFTER you accept the input:
1. Operators that operate on a single operand (unary operators)
2. Operators that operate on two operands (binary operators)
For each case, you can define a list of allowed operators. E.g. '!' will be in the case 1 list, while '*' will be in the case 2 list.
Scan through the string and look for the operator position. The pattern [^0-9] matches any character other than a digit. Alternatively, you can do a simple check for the operator with your own custom method.
Once you have figured this out, validate your string (optional); e.g. it makes no sense to have two operands for a unary operator.
Having done this, it is fairly easy to extract the required operands by splitting the string on the operator. The computation is then fairly trivial.
Hope this helps.
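As a rough sketch of the steps above: a single regular expression can locate the operator as the first non-digit, non-space character, and the two operator lists then drive the validation. The class name, the contents of the two lists and the restriction to non-negative integers are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorSplit {
    // operand, one non-digit operator character, optional second operand
    private static final Pattern INPUT =
            Pattern.compile("\\s*(\\d+)\\s*([^0-9\\s])\\s*(\\d+)?\\s*");
    private static final String UNARY = "!";      // case 1 list (assumed)
    private static final String BINARY = "+-*/";  // case 2 list (assumed)

    // Returns {operand, operator} for unary input, {left, operator, right} for binary input.
    static String[] split(String input) {
        Matcher m = INPUT.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse: " + input);
        }
        String left = m.group(1);
        String op = m.group(2);
        String right = m.group(3);               // null when no second operand was given
        if (UNARY.contains(op)) {
            if (right != null) {
                throw new IllegalArgumentException("Unary operator " + op + " takes one operand");
            }
            return new String[] { left, op };
        }
        if (BINARY.contains(op)) {
            if (right == null) {
                throw new IllegalArgumentException("Binary operator " + op + " needs two operands");
            }
            return new String[] { left, op, right };
        }
        throw new IllegalArgumentException("Unknown operator: " + op);
    }
}
```

With this, split("5!") returns two parts and split("3*4") returns three, with or without spaces around the operator.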
Without knowing more I can't help you find a better design to accommodate the commonalities and variances. But to answer the stated question:
You can use parts.length to find out how many parts you were given and use that to branch your code (i.e. with an if/else statement).
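A minimal sketch of branching on parts.length, assuming the input is space-separated as in the question's split(" ") call (so factorial is typed as "5 !") and that operands are whole numbers. The class name and the set of supported operators are made up for the example.

```java
class TwoOrThreeParts {
    static long evaluate(String myInput) {
        String[] parts = myInput.trim().split("\\s+");
        if (parts.length == 2) {
            // two parts: operand followed by a unary operator, e.g. "5 !"
            long n = Long.parseLong(parts[0]);
            if (parts[1].equals("!")) {
                return factorial(n);
            }
            throw new IllegalArgumentException("Unknown unary operator: " + parts[1]);
        } else if (parts.length == 3) {
            // three parts: operand, binary operator, operand, e.g. "3 * 4"
            long a = Long.parseLong(parts[0]);
            long b = Long.parseLong(parts[2]);
            switch (parts[1]) {
                case "+": return a + b;
                case "-": return a - b;
                case "*": return a * b;
                case "/": return a / b;
                default:
                    throw new IllegalArgumentException("Unknown binary operator: " + parts[1]);
            }
        }
        throw new IllegalArgumentException("Expected 2 or 3 parts, got " + parts.length);
    }

    private static long factorial(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("Factorial of a negative number");
        }
        long result = 1;
        for (long i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }
}
```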

defining rule for identifiers in ANTLR

I'm trying to write a grammar in ANTLR, and the rules for recognizing IDs and int literals are written as follows:
ID : Letter(Letter|Digit|'_')*;
TOK_INTLIT : [0-9]+ ;
//this is not the complete grammar btw
and when the input is :
void main(){
int 2a;
}
the problem is, the lexer is recognizing 2 as an int literal and a as an ID, which is completely logical based on the grammar I've written, but I don't want 2a to be recognized this way, instead I want an error to be displayed since identifiers cannot begin with something other than a letter... I'm really new to this compiler course... what should be done here?
It's at least interesting that in C and C++, 2n is an invalid number, not an invalid identifier. That's because the C lexer (or, to be more precise, the preprocessor) is required by the standard to interpret any sequence of digits and letters starting with a digit as a "preprocessor number". Later on, an attempt is made to reinterpret the preprocessor number (if it is still part of the preprocessed code) as one of the many possible numeric syntaxes. 2n isn't, so an error will be generated at that point.
Preprocessor numbers are more complicated than that, but that should be enough of a hint for you to come up with a simple solution for your problem.
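One way to read that hint, shown here as a hand-written Java lexer rather than ANTLR grammar syntax: let anything that starts with a digit swallow the letters, digits and underscores that follow, and only afterwards check whether the whole lexeme is a valid integer literal. The class name, the token labels and the small set of punctuation symbols are assumptions for this sketch; in a grammar the same idea would be an extra lexer rule for the over-long lexeme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PpNumberLexer {
    // "num" deliberately over-matches: a digit followed by any letters, digits or '_'.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<num>[0-9][A-Za-z0-9_]*)|(?<id>[A-Za-z][A-Za-z0-9_]*)|(?<sym>[(){};])");

    static List<String> lex(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (true) {
            while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
            if (pos == src.length()) break;
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            if (m.group("num") != null) {
                String text = m.group("num");
                // Second step: the swallowed lexeme must really be an integer literal.
                if (!text.matches("[0-9]+")) {
                    throw new IllegalArgumentException("Invalid number or identifier: " + text);
                }
                out.add("INTLIT:" + text);
            } else if (m.group("id") != null) {
                out.add("ID:" + m.group("id"));
            } else {
                out.add("SYM:" + m.group("sym"));
            }
            pos = m.end();
        }
        return out;
    }
}
```

With this sketch, "int a2;" lexes normally, while "int 2a;" is rejected as a single bad lexeme instead of being split into 2 and a.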

expression evaluation with right-associativity in java

I am trying to solve a problem in which I have to evaluate a given expression consisting of one or more initializations in the same string, with no operator precedence (although with bracketed sub-expressions). All the operators are right-associative, so I have to evaluate it from right to left. I am confused about how to proceed with the given problem. The detailed problem is given here: http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=108
I'll give you some ideas to try:
First off, you need to recursively evaluate inside brackets. You want to do brackets from most nested to least nested, so use a regex that matches brackets with no ) inside of them. Substring the result of the computation into the part of the string the bracketed expression took up.
If there are no brackets, then now you need to evaluate operators. The reason why the question requires right precedence is to force you to think about how to answer it - you can't just read the string and do calculations. You have to consider the whole string THEN start doing calculations, which means storing some structure describing it. There's a number of strategies you could use to do this, for example:
-You could tokenize the string, either using a scanner or regexes - continually try to see if the next item in the string is a number or which of the operators it is, and push what kind of token it is and its value onto a list. Then, you can evaluate the list from right to left using some kind of case/switch structure to determine what to do for each operator (either that, or each operator is associated with what it does to numbers). = itself would address a map of variable name keys to values, and insert the value under that variable's key, and then return (to be placed into the list) the value it produced, so it can be used for another assignment. It also seems like - can be determined as to whether it's subtraction or a negative number by whether there's a space on its right or not.
-Instead of tokenization, you could use regexes on the string as a whole. But tokenization is more robust. I tried to build a calculator based on applying regexes to the whole string over and over but it's so difficult to get all the rules right and I don't recommend it.
I've written an expression evaluating calculator like this before, so you can ask me questions if you run into specific problems.
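The tokenize-then-evaluate-right-to-left idea might look roughly like this for a bracket-free expression. It is a simplified sketch, not a solution to the linked problem: it assumes tokens are separated by whitespace, handles only integers and the operators = + - * /, and rejects variables that have not been assigned yet. The class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

class RightToLeftEvaluator {
    // Variable name -> current value, written to by "=".
    final Map<String, Integer> vars = new HashMap<>();

    int evaluate(String expr) {
        String[] t = expr.trim().split("\\s+");
        if (t.length % 2 == 0) {
            throw new IllegalArgumentException("Expected operand (operator operand)*");
        }
        // Start from the rightmost operand and fold leftwards: left op (everything to the right).
        int acc = valueOf(t[t.length - 1]);
        for (int i = t.length - 2; i > 0; i -= 2) {
            String op = t[i];
            String left = t[i - 1];
            switch (op) {
                case "=": vars.put(left, acc); break;   // the assigned value is passed on
                case "+": acc = valueOf(left) + acc; break;
                case "-": acc = valueOf(left) - acc; break;
                case "*": acc = valueOf(left) * acc; break;
                case "/": acc = valueOf(left) / acc; break;
                default:
                    throw new IllegalArgumentException("Unknown operator: " + op);
            }
        }
        return acc;
    }

    private int valueOf(String token) {
        if (token.matches("-?\\d+")) {
            return Integer.parseInt(token);
        }
        Integer v = vars.get(token);
        if (v == null) {
            throw new IllegalArgumentException("Unknown variable: " + token);
        }
        return v;
    }
}
```

For "A = B = 4 - 2 - 1" this computes 4 - (2 - 1) = 3 and stores 3 in both B and A. Bracketed sub-expressions would be reduced first, as described above, before this fold is applied.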

Elegant way to do variable substitution in a java string

Pretty simple question and my brain is frozen today so I can't think of an elegant solution where I know one exists.
I have a formula which is passed to me in the form "A+B"
I also have a mapping of the formula variables to their "readable names".
Finally, I have a formula parser which will calculate the value of the formula, but only if its passed with the readable names for the variables.
For example, as an input I get
String formula = "A+B";
String readableA = "foovar1";
String readableB = "foovar2";
and I want my output to be "foovar1+foovar2"
The problem with a simple find and replace is that it can easily be broken, because we have no guarantees on what the 'readable' names are. Let's say I take my example again with different parameters
String formula = "A+B";
String readableA = "foovarBad1";
String readableB = "foovarAngry2";
If I do a simple find and replace in a loop, I'll end up replacing the capital A's and B's in the readable names I have already replaced.
This looks like an approximate solution but I don't have brackets around my variables
How to replace a set of tokens in a Java String?
That link you provided is an excellent source since matching using patterns is the way to go. The basic idea here is first get the tokens using a matcher. After this you will have Operators and Operands
Then, do the replacement individually on each Operand.
Finally, put them back together using the Operators.
A somewhat tedious solution would be to scan for all occurrences of A and B and note their indexes in the string, and then use the StringBuilder.replace(int start, int end, String str) method. (In naive form this would not be very efficient, though, approaching something like quadratic complexity, or more precisely "number of variables" * "number of possible replacements".)
If you know all of your operators, you could do split on them (like on "+") and then replace individual "A" and "B" (you'd have to do trimming whitespace chars first of course) in an array or ArrayList.
A simple way to do it is
String formula = "A+B".replaceAll("\\bA\\b", readableA)
                      .replaceAll("\\bB\\b", readableB);
Your approach does not work well that way.
Formulas (mathematical expressions) should be parsed into an expression structure (e.g. an expression tree),
so that you later have operand nodes and operator nodes.
The expression is then evaluated by traversing the tree and applying the mathematical precedence rules.
I recommend reading more on Expression parsing.
Matching Only
If you don't have to evaluate the expression after doing the substitution, you might be able to use a regex. Something like (\b\p{Alpha}\p{Alnum}*\b)
or the java string "(\\b\\p{Alpha}\\p{Alnum}*\\b)"
Then use find() over and over to find all the variables and store their locations.
Finally, go through the locations and build up a new string from the old one with the variable bits replaced.
Note that it will not do much checking that the supplied expression is reasonable. For example, it wouldn't mind at all if you gave it )A 2 B( and would just replace the A and B (like )XXX 2 XXX(). I don't know if that matters.
This is similar to the link you supplied in your question except you need a different regular expression than they used. You can go to http://www.regexplanet.com/advanced/java/index.html to play with regular expressions and figure out one that will work. I used it with the one I suggested and it finds what it needs in A+B and A + (C* D ) just fine.
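A minimal sketch of that find-and-rebuild loop, using the regular expression suggested above and the standard Matcher.appendReplacement / appendTail pair. The class name and the use of a Map for the variable names are assumptions. Because the formula is scanned once and the inserted names are never rescanned, letters inside a readable name cannot be replaced again.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableSubstitution {
    private static final Pattern IDENT = Pattern.compile("\\b\\p{Alpha}\\p{Alnum}*\\b");

    // Replaces every identifier that has an entry in readableNames; others are kept as-is.
    static String substitute(String formula, Map<String, String> readableNames) {
        Matcher m = IDENT.matcher(formula);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = readableNames.getOrDefault(m.group(), m.group());
            // quoteReplacement keeps '$' and '\' in a readable name from being interpreted
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With A mapped to "foovarBad1" and B mapped to "foovarAngry2", substitute("A+B", names) gives "foovarBad1+foovarAngry2".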
Parsing
You parse the expression using one of the available parser generators (Antlr or Sable or ...) or find an algebraic expression parser available as open source and use it. (You would have to search the web to find those, I haven't used one but suspect they exist.)
Then you use the parser to generate a parsed form of the expression, replace the variables and reconstitute the string form with the new variables.
This one might work better but the amount of effort depends on whether you can find existing code to use.
It also depends on whether you need to validate the expression is valid according to the normal rules. This method will not accept invalid expressions, most likely.

Expression parsing using binary operators

I'm trying to figure out ways to parse an expression that uses all binary operators. Each operator is surrounded by exactly one set of parentheses, such that:
5x^2 + 3x + 2
would be
((5*(x^2))+((3*x)+2))
and is taken as an args[] argument (more importantly, it is given as a String).
I'm recursively breaking this down, where each recursion breaks down the left part of the top binary operator and call the recursion with that expression as an argument, then again with the right. The base case is when the expression being passed contains no operators.
The problem I'm having is appropriately parsing the left side from the right side. I am trying to develop a way based upon a Scanner that counts the number of parentheses encountered overall, but can't seem to determine a final solution. Does anyone have an idea how to parse this correctly in order to pass it as an expression to the recursive method?
P.s. - Language I am using is Java
EDIT::::
I am using this parser as a part of a GUI graph plotter, so I would set the variable (x) based on what value of the x-axis I am currently looking to generate on the GUI graph. So, the expression being parsed within the program (as shown in the second code tag above) would be broken down and operated on to produce a final "y" value that would correlate to the position on the window where a small dot would be used to represent that point on the line of the graph.
Maybe this will better explain how I am trying to use this.
I would start with an element class
interface Element {
}
And two elements
abstract class Operator implements Element {
    // implemented by each concrete operator
    abstract Operand operate(Operand a, Operand b);
}
class Operand implements Element {
    int value;
    Operand(int value) { this.value = value; }
}
Now you can create your Operator factory
class OperatorFactory {
    Operator createOperator(String symbol) {
        if ("+".equals(symbol))
            return new Operator() {
                @Override
                Operand operate(Operand a, Operand b) {
                    return new Operand(a.value + b.value);
                }
            };
        // "-", "*", "/" and "^" continue in the same way
        throw new IllegalArgumentException("Unknown operator: " + symbol);
    }
}
Now you're able to make yourself a stack processor that recurs when you reach a "(" and operates when you reach a ")". I imagine the rest will be pretty trivial from there.
What you want to implement is a simple recursive descent parser, which means that each production of your grammar will turn into a recursive function probably similar to what you're currently doing.
I don't know if you're familiar with the BNF syntax but here is one possible grammar for your language of binary operators. I'm omitting the power operator, which I leave for you to implement.
Expression ::= Expression + Term
Expression ::= Expression - Term
Expression ::= Term
Term ::= Term * Factor
Term ::= Term / Factor
Term ::= Factor
Factor ::= number
Factor ::= ( Expression )
With that you can see that you're defining an Expression by using an Expression, hence the use of recursive functions.
Please read this Wikipedia link and you'll immediately see how to implement what you want.
http://en.wikipedia.org/wiki/Recursive_descent_parser
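As a hedged illustration of the grammar above, here is one possible recursive descent evaluator. The productions Expression ::= Expression + Term are left-recursive, and a function that begins by calling itself would never terminate, so this sketch uses the usual rewrite into loops: Expression ::= Term (('+' | '-') Term)* and likewise for Term. The class name, the use of double values and the decision to evaluate directly instead of building a tree are choices made for this example; the power operator and the variable x are left out, as in the grammar.

```java
class RecursiveDescent {
    private final String src;
    private int pos = 0;

    private RecursiveDescent(String src) {
        this.src = src.replaceAll("\\s+", "");   // ignore whitespace
    }

    static double evaluate(String text) {
        RecursiveDescent p = new RecursiveDescent(text);
        double value = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    private boolean nextIs(char a, char b) {
        return pos < src.length() && (src.charAt(pos) == a || src.charAt(pos) == b);
    }

    // Expression ::= Term (('+' | '-') Term)*
    private double expression() {
        double value = term();
        while (nextIs('+', '-')) {
            char op = src.charAt(pos++);
            double right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // Term ::= Factor (('*' | '/') Factor)*
    private double term() {
        double value = factor();
        while (nextIs('*', '/')) {
            char op = src.charAt(pos++);
            double right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // Factor ::= number | '(' Expression ')'
    private double factor() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;                                // consume '('
            double value = expression();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;                                // consume ')'
            return value;
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at position " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }
}
```

For a plotter, the same three functions could return tree nodes instead of numbers, so the expression is parsed once and then evaluated for each x.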
Is your problem with logic or code?
In your logic, the only issue I see is that precedence and associativity should also be considered (depending on the operator).
If the problem is with code, can you post the code? Also, an example with the expected output vs. actual output would help, so I don't have to put it in Eclipse and run it myself.
