how can I simplify a regular expression generated from a DFA - java

I am trying to implement Brzozowski's algebraic method in Java to generate a regular expression for the language accepted by a given DFA. The generated expression is correct but not simplified.
For example :
E|(E)e(e)|(A|(E)e(A))(A|e|(B)e(A))*(B|(B)e(e))|(B|(E)e(B)|(A|(E)e(A))(A|e|(B)e(A))*(E|(B)e(B)))(B|e|(E)e(B)|(A|(E)e(A))(A|e|(B)e(A))*(E|(B)e(B)))*(E|(E)e(e)|(A|(E)e(A))(A|e|(B)e(A))*(B|(B)e(e)))
(e = epsilon, E = the empty set)
instead of : (A|B)*AB
The "Transitive closure method" returns nearly the same result.
One solution is to minimize the automaton first, but I think that is too heavy just to get a simplified regular expression.
Also, using Java's regular expression methods to simplify a regular expression is not pretty at all :).
So I would appreciate any help finding a solution.

Yes, it is possible. Take a look at this Peter Norvig article. Oh! There is a second part. The solution is in python, but you can easily adapt it.
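Independently of the linked article, a common first step is to apply the basic algebraic identities for E (empty set) and e (epsilon) while the expression is being built. The sketch below is a minimal illustration of that idea, not Norvig's code; the class name RegexSimplifier and its tiny AST are assumptions made up for this example. It only applies local identities such as E|r = r, Er = E, er = r and (e|r)* = r*, so it removes the E/e noise but will not by itself reach a minimal form like (A|B)*AB.

```java
import java.util.Objects;

// Hypothetical helper: smart constructors that simplify while building the regex.
final class RegexSimplifier {
    /** Minimal regex AST node; kind is one of 'E', 'e', 's' (symbol), '|', '.', '*'. */
    static final class Re {
        final char kind;
        final char sym;       // meaningful only when kind == 's'
        final Re left, right; // children; null where unused

        Re(char kind, char sym, Re left, Re right) {
            this.kind = kind; this.sym = sym; this.left = left; this.right = right;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Re)) return false;
            Re r = (Re) o;
            return kind == r.kind && sym == r.sym
                && Objects.equals(left, r.left) && Objects.equals(right, r.right);
        }
        @Override public int hashCode() { return Objects.hash(kind, sym, left, right); }
        @Override public String toString() {
            switch (kind) {
                case 's': return String.valueOf(sym);
                case '|': return "(" + left + "|" + right + ")";
                case '.': return "" + left + right;
                case '*': return "(" + left + ")*";
                default:  return String.valueOf(kind); // 'E' or 'e'
            }
        }
    }

    static final Re EMPTY = new Re('E', '\0', null, null);
    static final Re EPSILON = new Re('e', '\0', null, null);

    static Re sym(char c) { return new Re('s', c, null, null); }

    static Re union(Re a, Re b) {
        if (a.equals(EMPTY)) return b;   // E|r = r
        if (b.equals(EMPTY)) return a;   // r|E = r
        if (a.equals(b)) return a;       // r|r = r
        return new Re('|', '\0', a, b);
    }

    static Re concat(Re a, Re b) {
        if (a.equals(EMPTY) || b.equals(EMPTY)) return EMPTY; // Er = rE = E
        if (a.equals(EPSILON)) return b;                      // er = r
        if (b.equals(EPSILON)) return a;                      // re = r
        return new Re('.', '\0', a, b);
    }

    static Re star(Re a) {
        if (a.equals(EMPTY) || a.equals(EPSILON)) return EPSILON;          // E* = e* = e
        if (a.kind == '*') return a;                                       // (r*)* = r*
        if (a.kind == '|' && a.left.equals(EPSILON)) return star(a.right); // (e|r)* = r*
        if (a.kind == '|' && a.right.equals(EPSILON)) return star(a.left); // (r|e)* = r*
        return new Re('*', '\0', a, null);
    }
}
```

If the DFA-to-regex code calls union, concat and star instead of concatenating strings, a fragment such as A|(E)e(A) from the example above collapses to A as it is constructed.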

Related

Are regex engines of Java and Groovy the same?

I am currently writing some regex-based code in Groovy. To create and test my regexes, though, I use books that refer to the Java regex engine and the Java-oriented http://www.regexplanet.com/advanced/java/index.html.
I am a bit worried: is the Groovy regex engine really the same as the Java one? I know they are very close, but are there any differences? If you know the answer, could you kindly point me to a reference on the subject?
From the language documentation:
The pattern operator (~) provides a simple way to create a java.util.regex.Pattern instance.
I can't find phrasing where the documentation guarantees this is the regular expression engine used for pattern matching throughout Groovy; I do however find it very, very, very, very unlikely Groovy would use two RE engines in its implementation now or switch the RE engine in the future.
"Because Groovy is based on Java, you can use Java's regular expression package with Groovy. Simply put import java.util.regex.* at the top of your Groovy source code. Any Java code using regular expressions will then automatically work in your Groovy code too."
Source: regular-expressions.info
Here is a Groovy example that combines a regex match with findAll:
assert ['abc'] == ['def', 'abc', '123'].findAll { it =~ /abc/ }
You can find more examples here (including the sample above), thanks to Mr. Haki.
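For comparison, the same filter can be written in plain Java against java.util.regex directly. This is only a sketch; the class and method names are made up for the illustration. It relies on the fact that Groovy treats the Matcher produced by =~ as true when find() succeeds.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical Java counterpart of: ['def', 'abc', '123'].findAll { it =~ /abc/ }
final class FindAllExample {
    static List<String> findAll(Pattern pattern, String... items) {
        return Stream.of(items)
                .filter(s -> pattern.matcher(s).find()) // same test Groovy applies to the Matcher
                .collect(Collectors.toList());
    }
}
```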

What's the right way to implement a custom expression evaluator in Java?

I have a map that contains a number of properties e.g. "a", "b", "c" ...
I want to define a template where I can evaluate an expression such as,
"a" && "b" && !"c" to mean the following,
true if
"a" is in the map, "b" is in the map but "c" is not in the map
false otherwise
What's a way to implement this in Java? Does JUEL help?
Edit:
To make this clear I need to create a configurable language where you could define any kind of expressions in a configuration file that will need to be evaluated at runtime.
For example, I need my Java code to parse a file that could contain any expressions such as,
"a" && "b"
!("a" && "d")
I don't know what expressions will need to be evaluated at compile time. Hope this makes the requirement more clear.
What's a way to implement this in Java?
I would use something like JavaCC or ANTLR:
1. Read the tutorials for the tool, and study the examples (especially if you have never learned / been taught about grammars and parser generators before).
2. Write a grammar for your simple language.
3. Add "actions" to the grammar to evaluate the expressions on the fly.
4. Generate the Java classes.
You will have a learning curve, but this should be an extremely simple / non-problematic grammar to implement.
Does JUEL help?
Probably not. For a start, anything based on JUEL would accept full EL syntax.
The basic techniques are recursive descent, Dijkstra's shunting-yard algorithm, or parser generation via any of quite a number of systems. If you only need to handle parentheses, 'not', 'and', and 'or', I wouldn't go beyond recursive descent myself.
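As a rough illustration of the recursive-descent option, here is a minimal sketch for the grammar in the question: quoted names, !, &&, || and parentheses. The class name MapExpression and the precedence choice (! binds tighter than &&, which binds tighter than ||) are assumptions for this example; a quoted name evaluates to true when it is a key of the map.

```java
import java.util.Map;

// Hypothetical evaluator: or := and ("||" and)* ; and := not ("&&" not)* ;
// not := "!" not | "(" or ")" | quoted-name
final class MapExpression {
    private final String text;
    private final Map<String, ?> props;
    private int pos = 0;

    private MapExpression(String text, Map<String, ?> props) {
        this.text = text;
        this.props = props;
    }

    static boolean evaluate(String expression, Map<String, ?> props) {
        MapExpression parser = new MapExpression(expression, props);
        boolean result = parser.or();
        parser.match(""); // skips trailing whitespace only
        if (parser.pos != expression.length()) throw parser.error("unexpected trailing input");
        return result;
    }

    private boolean or() {
        boolean value = and();
        while (match("||")) value |= and(); // |= always parses the right-hand side
        return value;
    }

    private boolean and() {
        boolean value = not();
        while (match("&&")) value &= not();
        return value;
    }

    private boolean not() {
        if (match("!")) return !not();
        if (match("(")) {
            boolean value = or();
            if (!match(")")) throw error("expected ')'");
            return value;
        }
        if (match("\"")) {
            int end = text.indexOf('"', pos);
            if (end < 0) throw error("unterminated name");
            String name = text.substring(pos, end);
            pos = end + 1;
            return props.containsKey(name);
        }
        throw error("unexpected input");
    }

    /** Skips whitespace, then consumes the token if it is next in the input. */
    private boolean match(String token) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (!text.startsWith(token, pos)) return false;
        pos += token.length();
        return true;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }
}
```

Each line of the configuration file can then be passed to MapExpression.evaluate together with the property map at runtime.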

How do I solve a simple String arithmetic expression such as 5-2*10?

I'm having a lot of trouble trying to do this for some reason. I have a class which wants me to evaluate a complex Java expression such as (3 + 5[3*2-4]), using recursion. I think I have an idea on how I want to approach it, but I can't seem to figure out how to solve something really simple first off - like
5-2*10
I have no clue how to do that. They don't allow you to import any outside scripts, nor are you allowed to convert it to a postfix expression.
I don't expect anybody to write me the code, but if anybody could send me off in the right direction or give me a little pseudocode I'd really appreciate it. I've spent about two hours to no avail trying to understand how I could use string tokenizers and other stuff to solve it, but I always run into a wall that I don't know how to get around. Thanks a lot in advance!
Back at university, I had to do this as well.
The approach I took was to parse expressions using recursive descent. The article does a great job of providing an overview of how your parser should tokenize your input, which you can then go on to evaluate. The key thing to realize is that making your parser top-down will make your life easier: if you started at the bottom of the parse tree with individual characters and used the rules to combine them into larger tokens, you would have to maintain a stack, and things would get overly complicated. By contrast, since you know that all you are doing is logical operations, you can first assume that your expression matches your production rules and then look at the internal logical implications of that assumption.
These Brief Notes on Parsing explain very well how the implementations differ between a top-down parser and a bottom-up parser. However, depending on how complex the expressions your parser needs to handle are, you might choose to implement a bottom-up parser, because for all their complexity, bottom-up parsing algorithms are more powerful than top-down ones.
The parser I built was in OCaml, and functional programming turned out to be quite a good solution for this use case.
Please let me know if you have any questions!
You could consecutively reduce sub-expressions (so called "redexes") until no more reductions are possible.
This replacement of inner expressions can be done with regular expressions:
"(\d+)([*/])(-?\d+)"
"(\d+)([+-])(-?\d+)"
"\[(-?\d+)\]"
...
Loops in loops. See Pattern, Matcher, find.
As this seems to be homework, I leave the rest of the challenge to you.
Negative numbers may be a bit challenging; you might try a unary minus operator.
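A minimal sketch of this reduction loop, under some assumptions of mine: the input contains only non-negative integer literals, the explicit operators + - * /, and round or square brackets (no implicit multiplication such as 5[...]). The patterns are variations on the ones above, with lookarounds added so that a '-' is read as a sign only where it cannot be a binary operator, and so that an addition never takes an operand away from a pending * or /. The class name RedexReducer is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reducer: repeatedly rewrites the leftmost reducible sub-expression.
final class RedexReducer {
    // A leading '-' belongs to the number only if no digit or closing bracket precedes it.
    private static final String NUM = "((?<![\\d\\])])-?\\d+)";
    // A bracket around a single number; not reduced when glued to a preceding operand.
    private static final Pattern BRACKET =
        Pattern.compile("(?<![\\d\\])])[\\[(](-?\\d+)[\\])]");
    private static final Pattern MUL = Pattern.compile(NUM + "([*/])(-?\\d+)");
    // Guards: not directly after * or /, and not directly before a digit, * or /.
    private static final Pattern ADD =
        Pattern.compile("(?<![*/])" + NUM + "([+-])(-?\\d+)(?![\\d*/])");

    static long evaluate(String expression) {
        String s = expression.replaceAll("\\s+", "");
        while (true) {
            Matcher m;
            if ((m = BRACKET.matcher(s)).find()) {
                s = splice(s, m, m.group(1));
            } else if ((m = MUL.matcher(s)).find() || (m = ADD.matcher(s)).find()) {
                long a = Long.parseLong(m.group(1));
                long b = Long.parseLong(m.group(3));
                s = splice(s, m, String.valueOf(apply(a, m.group(2).charAt(0), b)));
            } else {
                // Nothing left to reduce; malformed input ends in NumberFormatException.
                return Long.parseLong(s);
            }
        }
    }

    private static long apply(long a, char op, long b) {
        switch (op) {
            case '*': return a * b;
            case '/': return a / b; // integer division
            case '+': return a + b;
            default:  return a - b;
        }
    }

    /** Replaces the matched region of s with the given text. */
    private static String splice(String s, Matcher m, String replacement) {
        return s.substring(0, m.start()) + replacement + s.substring(m.end());
    }
}
```

For 5-2*10 the string goes through 5-20 and ends at -15. Note that this uses java.util.regex, which the assignment's rules may or may not allow.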
You are asked to implement an expression analyzer.
Here is a summary of how it is usually done. You scan the string from left to right, using the following procedures alternately. Each method consumes some of the text and returns an integer value.
A method to evaluate an integer: It starts with a digit. Collect all the contiguous digits and convert that to a value.
A method to evaluate a factor: If it starts with a digit, evaluate an integer. If it starts with a '(', evaluate an expression. If not, it is an error.
A method to evaluate a term: Evaluate a factor. While the next character is a * or a /, skip it, evaluate an additional factor, and multiply or divide the previous value by the new value.
A method to evaluate a sum: Evaluate a term. While the next character is a + or a -, skip it, evaluate an additional term and add or subtract the new value to the previous value.
A method to evaluate an expression: It starts with a '('. Skip it and evaluate a sum. If the next character is a ')', skip it. If not, it is an error.
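The five procedures above could be sketched as follows. This is a minimal illustration rather than a complete solution to the assignment: the class name ArithmeticParser is made up, whitespace is stripped up front, only round parentheses and non-negative integer literals are handled, and the top level is treated as a sum.

```java
// Hypothetical recursive-descent evaluator; one method per procedure described above.
final class ArithmeticParser {
    private final String text;
    private int pos = 0;

    private ArithmeticParser(String text) {
        this.text = text.replaceAll("\\s+", "");
    }

    static int evaluate(String text) {
        ArithmeticParser parser = new ArithmeticParser(text);
        int value = parser.sum();
        if (parser.pos != parser.text.length()) throw parser.error();
        return value;
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    /** Collects contiguous digits and converts them to a value. */
    private int integer() {
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        return Integer.parseInt(text.substring(start, pos));
    }

    private int factor() {
        if (Character.isDigit(peek())) return integer();
        if (peek() == '(') return expression();
        throw error();
    }

    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = text.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    private int sum() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = text.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    /** Parenthesized sum: skips '(', evaluates a sum, then requires ')'. */
    private int expression() {
        pos++; // skip '('
        int value = sum();
        if (peek() != ')') throw error();
        pos++; // skip ')'
        return value;
    }

    private IllegalArgumentException error() {
        return new IllegalArgumentException("Unexpected input at position " + pos);
    }
}
```

Because term calls factor and sum calls term, multiplication binds tighter than addition without any explicit precedence table.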
For simple String operations, like "5-2*10", one solution is:
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.script.ScriptEngineManager;
import javax.script.ScriptEngine;
import javax.script.ScriptException;
I don't want to consider the idea of not importing.
Basically, this is the code for the "Calculate" button, assuming that the expression 5-2*10 was introduced in a TextField:
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByName("JavaScript");
String r = jTextField1.getText();
try {
    jTextField1.setText(engine.eval(r).toString());
} catch (ScriptException ex) {
    Logger.getLogger(MegaCal.class.getName()).log(Level.SEVERE, null, ex);
}

Should we use regular expression in Java?

I know regular expressions are very powerful, and to become an expert with them is not easy.
One of my colleagues once wrote a Java class to parse formatted text files. Unfortunately, it caused a StackOverflowError in the first integration test. The bug seemed difficult to find, until another colleague from the structural programming world came over and fixed it quickly by throwing away all the regular expressions and using many nested conditional statements and many split and trim calls instead. It works very well!
Well, why do we need regular expressions in a programming language like Java? As far as I know, the only necessary use of regular expressions is the find/replace function in text editors.
Like everything else: Use with care and KISS
I use regexes quite often, but I don't go over the top and write a 100 character regex, because I know that I (personally) won't understand it later... in fact I think my limit is about 30-40 characters, something larger than that makes me spend too much time scratching my head.
Anything that can be expressed as a regular expression can, by definition, be expressed as a chain of IFs. You use a regex basically for two reasons:
Regex libraries tend to have optimized implementations that, most of the time, will be better than a hand-coded "IF" chain for some expressions.
Regexes are usually easier to follow, if properly written, than the IF chains, especially for more complex expressions.
If your expression gets too complex, then use the advice given by this answer. If it gets truly nasty, think about learning how to use a parser generator like ANTLR or JavaCC. A simple grammar can usually replace a regex, and it is a lot easier to maintain.
So the multiple nested conditional statements with many split and trim methods are easier for you to debug than a single line or two with regular expressions?
My preference is regular expressions because, once you learn them, they are far more maintainable and far easier to read than huge nested if statements.
If you find that a regular expression would get too complex and unmaintainable, use code instead. Regular expressions can get very complex even for things that sound very simple at first. For example, validating dates in the format mm/dd/yy[yy] is as "simple" as:
^(((((((0?[13578])|(1[02]))[\.\-/]?((0?[1-9])|([12]\d)|(3[01])))|(((0?[469])|(11))[\.\-/]?((0?[1-9])|([12]\d)|(30)))|((0?2)[\.\-/]?((0?[1-9])|(1\d)|(2[0-8]))))[\.\-/]?(((19)|(20))?([\d][\d]))))|((0?2)[\.\-/]?(29)[\.\-/]?(((19)|(20))?(([02468][048])|([13579][26])))))$
Nobody can maintain that. Manually parsing the date will need more code but can be much more readable and maintainable.
Regular expressions are very powerful and useful for matching TEXT patterns, but are bad for validation with numeric parts like dates.
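As a point of comparison, here is a minimal sketch of date validation in code using java.time (Java 8 and later). It is deliberately narrower than the regex above: it assumes a fixed MM/dd/yyyy layout with '/' separators and four-digit years, and the class name DateCheck is made up. Strict resolution is what rejects impossible dates such as February 29 in a non-leap year.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical validator for dates written as MM/dd/yyyy.
final class DateCheck {
    // 'uuuu' (rather than 'yyyy') is the year field that works with STRICT resolution.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("MM/dd/uuuu").withResolverStyle(ResolverStyle.STRICT);

    static boolean isValid(String text) {
        try {
            LocalDate.parse(text, FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false; // wrong layout or a day that does not exist
        }
    }
}
```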
As always, you should use the best tool for the job. I would define the "best tool" by the most simple, understandable, effective method that fulfills the requirements.
Often regexes will simplify code and make it more readable. But this is not always the case.
Also, I would not jump to conclusions that regexes caused the StackOverflowError.
Regular expressions are a tool (like many others). You should use it when the work to be done could best be done with that tool. To know which tool to use, it helps to ask a question like "When could I use regular expressions?". And of course it will become easier to decide which tool to use when you have many different tools in your toolbox and you know them fairly well.
You can use regexes cleverly by splitting them into smaller chunks, something like,
final String REGEX_SOMETHING = "something";
final String REGEX_WHATEVER = "whatever";
..
String REGEX_COMPLETE = REGEX_SOMETHING + REGEX_WHATEVER + ...
Regular expressions can be easier to read, but they can also be too complicated. It depends on the format of data you want to match.
The Java RE implementation still has some quirks, with the effect that some quite simple expressions (like '((?:[^'\\]|\\.)*)') cause a stack overflow when matching longer strings. So make sure you test with real life data (and more extreme examples, too) - or use a regex engine with a different implementation (there are several ones, also as Java libraries).
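One commonly suggested workaround, sketched here under the assumption that the overflow comes from the engine recursing once per character on the alternation inside the repeated group, is to "unroll" the loop and use possessive quantifiers so that runs of ordinary characters are consumed without alternation. The class name QuotedString is made up; test against your own data before relying on this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rewrite of '((?:[^'\\]|\\.)*)' as: normal* (escape normal*)*
final class QuotedString {
    private static final Pattern QUOTED =
        Pattern.compile("'([^'\\\\]*+(?:\\\\.[^'\\\\]*+)*+)'");

    /** Returns the text between the first pair of single quotes, or null if none. */
    static String body(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

The group loop now iterates once per escape sequence rather than once per character, which keeps the matching depth small for long strings with few escapes.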
Regular expressions are very powerful for finding patterns in content. You can certainly avoid regular expressions and rely on conditional statements, but you will soon notice that it takes many lines of code to accomplish the same task. Using too many nested conditional statements increases the cyclomatic complexity of your code; as a result, it becomes even more difficult to test because there are too many branches to cover. It also makes the code difficult to read and understand.
Granted, your colleague should have written test cases for his regular expressions first.
There's no right or wrong answer here. If the task is simple, then there's no need to use a regular expression. Otherwise, it is nice to sprinkle a few regular expressions here and there to make your code easy to read.

distinguishing a string with flex

I need to tokenize some strings by splitting them on operators such as = and !=. I was successful using a regex until a string contained the != operator. In that case the string was separated into two parts, as expected, but the ! stayed on the left-hand side even though it is part of the operator. I therefore believe that a regex is not suitable for this, and I would like to use lex instead. Since I do not have much knowledge of or experience with lex, I am not sure whether it fits my task. Basically, I am trying to replace the right-hand side of each operator with actual values from other data. Do you think lex would be helpful in my case?
Thanks.
Should you use lex? It depends on how complex your language is. It's a very powerful tool, worth understanding (especially with yacc, or in Java you could use ANTLR or JavaCC).
public String[] split(String regex) does take a regex, not just a string. You could use the regex "!?=", which means zero or one ! followed by =. But the problem with using split is that it won't tell you what the actual delimiter was.
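One way to keep the delimiter is to walk the input with a Matcher instead of calling split. The sketch below is a minimal illustration; the class name OperatorTokenizer is made up, and it assumes the only operators are = and !=.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer that returns operands and the operators between them.
final class OperatorTokenizer {
    private static final Pattern OPERATOR = Pattern.compile("!?=");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = OPERATOR.matcher(input);
        int last = 0;
        while (m.find()) {
            tokens.add(input.substring(last, m.start()).trim()); // operand before the operator
            tokens.add(m.group());                               // the operator itself: "=" or "!="
            last = m.end();
        }
        tokens.add(input.substring(last).trim());                // trailing operand
        return tokens;
    }
}
```

With the operator available as its own token, the right-hand operand can be looked up and replaced without losing track of whether the comparison was = or !=.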
With what little info we have about your application, I'd be tempted to use regular expressions. There are lots of experts here on Stack Overflow to help. A great place to start is the Java regex tutorial.
(Thanks to Falle1234 for picking up my mistake - now corrected.)
