Stucked with Reverse Polish Notation - java

I'm having a homework: When we input the math expression like -(2 + 3) * 1/5 , the output should be -1. After doing some research, I found that RPN algorithm is the way to solve this problem. So what I did is convert the expression from infix to postfix. But the problem is I can't determine the operands in some situations, like this:
Input: 11+((10-2)*6)+7
Infix-to-Postfix-----------
Output: 11102-6*+7+
There is no white space between "11" and "10", and between "10" and "2" so I can't determine every single operand correctly .
Because my output (postfix) is a string, I'm totally don't know how to solve this problem. Is there any idea for this?

You described the problem -- and obvious solution -- in your posting: the postfix output you chose destroys critical information from the original expression. The obvious solution is that you have to change your postfix routine to preserve that information.
The particular problem is that you can no longer parse a string of digits into the original integers. The obvious solution is to retain or insert a unique separator. When you emit (output) an integer, add some sort of punctuation. Since RPN uses only numbers and a handful of operators, choose something easy for you to detect and to read yourself: a space, comma, or anything else that would work for you.
For instance, if you use a simple space, then you'd have RPN format as
11 10 2 -6 *+7 +
When you read this in your RPN evaluator, use the separator as a "push integer" signal (or operator).
Note that I have used this separator as a terminal character on every integer, not merely a separator between consecutive integers. Making it a terminal simplifies both output processing and input parsing. Deciding whether or not to add that symbol depends on only one token (the integer), rather than making it conditional upon two adjacent tokens (requiring a small amount of context status).

Related

Evaluating an infix expression (accounting for doubles and negatives)

The program should have a loop asking the user whether or not they have an infix expression to enter. If the answer is “yes”, the program should prompt the user to input the next expression.
I’ve got this down
The expression must be entered as a single string. The input string must be echoed back out as a string.
The program must use the 2-stack method to evaluate the expression. Print the numerical value.
Use this data:
3*(4+2*(6-4)+1)+2*3
2.75 * ((3 + 1.) – (_3.14159*7+3.1))
The underscore symbol _ means the number is negative.
I understand how to use the 2-stack method, I’m just having trouble coming up with a way to build up the number piece by piece. For example, _3.14159, I’m thinking I’ll need a loop to go through and update a number (int n for example) after each char is read in. I just can’t wrap my head around exactly how to do it. If someone could point me in the right direction it’d be greatly appreciated.
Edit: I haven’t found anything else that addresses this specifically. We can’t use a built in method to do this, we need to make it ourselves. Also, other examples don’t seem to account for doubles.

How to calculate string 2*2+6/3+6

I want to make calculator. I don't know to split the string and calculate the result. Is there any algorithm or some easy way to get result? I have already searched for it. But it only tells infix expressions.
Using an External Library
I Google searched the term "android library for calculating math expressions in strings" and the first result was: http://mathparser.org/
If you go to that link and scroll down a bit, it actually shows an animation of how it works. There also seems to be plenty of tutorials.
Creating your own Algorithm
Whilst using an external library in this case certainly does seem to be the optimal option, if you were required to develop the algorithm yourself the use of BODMAS (or PEMDAS depending where you are from) will be critical. At least in your case however, brackets and orders would not be needed.
Your algorithm could initially iterate the string for all cases of division e.g. where the character '/' is found, then multiplication '*', then addition '+' and finally subtraction '-'.
For each case the operands would be the digits to the left and right of the operator. Then replace the immediate expression with the calculation in the overall expression.
So the steps of your recursive style algorithm may look like the following:
2*2+6/3+6
2*2+2+6
4+2+6
6+6
12
Some things you'll need to be able to do:
Convert a character to the expected integer ( hint: subtract '0' or 48 )
Consider if your operands will be longer than one character e.g. 10 (look into substrings)
Possibly think about performance improvement e.g. 4+2+6 could be calculated in one step rather than two

Regex expression for comma and dash seperated text of items

I do have a Java Web Application, where I get some inputs from the user. Once I got this input I have to parse it and the parsing part depends on what kind of input I'll get. I decided to use the Pattern class of java for some of predefined user inputs.
So I need the last 2 regex patterns:
a)Enumaration:
input can be - A03,B24.1,A25.7
The simple way would be to check if there are a comma in there ([^,]+) but it will end up with a lot of updates in to parsing function, which I would like to avoid. So, in addition to comma it should check if it starts with
letter
minimum 3 letters (combined with numbers)
can have one dot in the word
minimum 1 comma (updated it)
b) Mixed
input can be A03,B24.1-B35.5,A25.7
So all of what Enumuration part got, but with addition that it can have a dash minimum one.
I've tried to use multiple online regex generators but didnt get it correct. Would be much appreciated if you can help.
Here is what I got if its B24.1-B35.5 if its just a simple range.
"='.{1}\\d{0,2}-.{1}\\d{0,2}'|='.{1}\\d{1,2}.\\d{1,2}-.{1}\\d{1,2}.\\d{1,2}'";
Edit1: Valid and Invalid inputs
for a)Enumaration
A03,B24.1,A25.7 Valid
A03,B24.1 Valid
A03,B24.1-B25.1 -Invalid because in this case (enumaration) it should not contain dash
A03 invalid because no comma
A03,B24.1 - Valid
A03 Invalid
for b)Mixed
everything that a enumeration has with addition that it can have dash too.
You can use this regex for (a) Enumeration part as per your rules:
[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
Rules:
Verifies that each segment starts with a letter
Minimum of three letters or numbers [A-Za-z][A-Za-z0-9]{2,}
Optionally followed by decimal . and one or more alphabets and numbers i.e (?:\.[A-Za-z0-9]{1,})?
Same thing repeated, and seperated by a comma ,. Also must have atleast one comma so using + i.e (?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
?: to indicate non-capturing group
Using [A-Za-z0-9] instead of \w to avoid underscores
Regex101 Demo
For (b) Mixed, you haven't shared too many valid and invalid cases, but based on my current understanding here's what I have:
[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:[,-][A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
Note that , from previous regex has been replaced with [,-] to allow - as well!
Regex101 Demo
// Will match
A03,B24.1-B35.5,A25.7
A03,B24.1,A25.7
A03,B24.1-B25.1
Hope this helps!
EDIT: Making sure each group starts with a letter (and not a number)
Thanks to #diginoise and #anubhava for pointing out! Changed [A-Za-z0-9]{3,} to [A-Za-z][A-Za-z0-9]{2,}
As I said in the comments, I would chop the input by commas and verify each segment separately. Your domain ICD 10 CM codes is very well defined and also I would be very wary of any input which could be non valid, yet pass the validation.
Here is my solution:
regex
([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)
... however I would avoid that.
Since your domain is (moste likely) medical software, people's lives (or at least well being) is at stake. Not to mention astronomical damages and the lawyers ever-chasing ambulances. Therefore avoid the easy solution, and implement the bomb proof one.
You could use the regex to establish that given code is definitely not valid. However if a code passes your regex it does not mean that it is valid.
bomb proof method
See this example: O09.7, O09.70, O09.71, O09.72, O09.73 are valid entries, but O09.1 is not valid.
Therefore just get all possible codes. According to this gist there are 42784 different codes. Just load them to memory and any code which is not in the set, is not valid. You could compress said list and be clever about the encoding in memory, to occupy less space, but verbatim all codes are under 300kB on disk, so few MBs max in memory, therefore not a massive cost to pay for a price of people not having left instead of right kidney removed.

Java, poor regex performance with lazy expressions

The code is actually in Scala (Spark/Scala) but the library scala.util.matching.Regex, as per the documentation, delegates to java.util.regex.
The code, essentially, reads a bunch of regex from a config file and then matches them against logs fed to the Spark/Scala app. Everything worked fine until I added a regex to extract strings separated by tabs where the tab has been flattened to "#011" (by rsyslog). Since the strings can have white-spaces, my regex looks like:
(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)#011(.+?)
The moment I add this regex to the list, the app takes forever to finish processing logs. To give you an idea of the magnitude of delay, a typical batch of a million lines takes less than 5 seconds to match/extract on my Spark cluster. If I add the expression above, a batch takes an hour!
In my code, I have tried a couple of ways to match regex:
if ( (regex findFirstIn log).nonEmpty ) { do something }
val allGroups = regex.findAllIn(log).matchData.toList
if (allGroups.nonEmpty) { do something }
if (regex.pattern.matcher(log).matches()){do something}
All three suffer from poor performance when the regex mentioned above it added to the list of regex. Any suggestions to improve regex performance or change the regex itself?
The Q/A that's marked as duplicate has a link that I find hard to follow. It might be easier to follow the text if the referenced software, regexbuddy, was free or at least worked on Mac.
I tried negative lookahead but I can't figure out how to negate a string. Instead of /(.+?)#011/, something like /([^#011]+)/ but that just says negate "#" or "0" or "1". How do I negate "#011"? Even after that, I am not sure if negation will fix my performance issue.
The simplest way would be to split on #011. If you want a regex, you can indeed negate the string, but that's complicated. I'd go for an atomic group
(?>(.+?)#011)
Once matched, there's no more backtracking. Done and looking forward for the next group.
Negating a string
The complement of #011 is anything not starting with a #, or starting with a # and not followed by a 0, or starting with the two and not followed... you know. I added some blanks for readability:
((?: [^#] | #[^0] | #0[^1] | #01[^1] )+) #011
Pretty terrible, isn't it? Unlike your original expression it matches newlines (you weren't specific about them).
An alternative is to use negative lookahead: (?!#011) matches iff the following chars are not #011, but doesn't eat anything, so we use a . to eat a single char:
((?: (?!#011). )+)#011
It's all pretty complicated and most probably less performant than simply using the atomic group.
Optimizations
Out of my above regexes, the first one is best. However, as Casimir et Hippolyte wrote, there's a room for improvements (factor 1.8)
( [^#]*+ (?: #(?!011) [^#]* )*+ ) #011
It's not as complicated as it looks. First match any number (including zero) of non-# atomically (the trailing +). Then match a # not followed by 011 and again any number of non-#. Repeat the last sentence any number of times.
A small problem with it is that it matches an empty sequence as well and I can't see an easy way to fix it.

expression evaluation with right-associativity in java

I am trying to solve a problem in which I have to solve a given expression consisting of one or more initialization in a same string with no operator precedence (although with bracketed sub-expressions). All the operators have right precedence so I have to evaluate it from right to left. I am confused how to proceed for the given problem. Detailed problem is given here : http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=108
I'll give you some ideas to try:
First off, you need to recursively evaluate inside brackets. You want to do brackets from most nested to least nested, so use a regex that matches brackets with no ) inside of them. Substring the result of the computation into the part of the string the bracketed expression took up.
If there are no brackets, then now you need to evaluate operators. The reason why the question requires right precedence is to force you to think about how to answer it - you can't just read the string and do calculations. You have to consider the whole string THEN start doing calculations, which means storing some structure describing it. There's a number of strategies you could use to do this, for example:
-You could tokenize the string, either using a scanner or regexes - continually try to see if the next item in the string is a number or which of the operators it is, and push what kind of token it is and its value onto a list. Then, you can evaluate the list from right to left using some kind of case/switch structure to determine what to do for each operator (either that, or each operator is associated with what it does to numbers). = itself would address a map of variable name keys to values, and insert the value under that variable's key, and then return (to be placed into the list) the value it produced, so it can be used for another assignment. It also seems like - can be determined as to whether it's subtraction or a negative number by whether there's a space on its right or not.
-Instead of tokenization, you could use regexes on the string as a whole. But tokenization is more robust. I tried to build a calculator based on applying regexes to the whole string over and over but it's so difficult to get all the rules right and I don't recommend it.
I've written an expression evaluating calculator like this before, so you can ask me questions if you run into specific problems.

Categories