java comparing two Pattern objects


Is there an easy way to compare two Pattern objects?
I have a Pattern that was compiled from the regex "//" to check for comments in code.
Since there are several regexes that describe comments, I want a way to tell them apart.
How can this be done? The Pattern class does not implement the equals method.

You can compare Pattern objects by comparing the results of calling pattern() or toString(), but this doesn't do what you want (if I understand your question correctly). Specifically, this compares the strings that were passed to the Pattern.compile(...) factory method, and it takes no account of flags that were passed separately from the pattern string.
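To make the flags caveat concrete, here is a minimal sketch. The class and method names (PatternStringComparison, samePatternString) are made up for illustration; only Pattern.compile, pattern(), matcher() and the CASE_INSENSITIVE flag come from the standard API.

```java
import java.util.regex.Pattern;

class PatternStringComparison {
    // Hypothetical helper: compares only the regex source text and ignores flags.
    static boolean samePatternString(Pattern a, Pattern b) {
        return a.pattern().equals(b.pattern());
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("todo");
        Pattern ignoreCase = Pattern.compile("todo", Pattern.CASE_INSENSITIVE);

        // The source strings are identical ...
        System.out.println(samePatternString(plain, ignoreCase));  // true
        // ... but the two patterns do not accept the same input.
        System.out.println(plain.matcher("TODO").matches());       // false
        System.out.println(ignoreCase.matcher("TODO").matches());  // true
    }
}
```

So a string-only comparison can report two patterns as "equal" even though they behave differently.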
There is no simple way to test if two non-identical regexes are equivalent. For example ".+" and "..*" represent equivalent regexes, but there is no straight-forward way to determine this using the Pattern API.
I don't know if the problem is theoretically solvable ... in the general case. @Akim comments:
There is no finite axiomatization to regex equivalence, so the short answer is "this is not doable by tree transformations of the regexes themselves". However one can compare the languages of two automata (test their equality), so one can compute whether two regexes are equivalent. Note that I'm referring to the "genuine" regexes, with no extensions such as back-references to capture groups, which escape the realm of rational languages, i.e., that of automata.
I also want to comment on the accepted answer. The author provides some code that he claims shows that Pattern's equals method is inherited from Object. In fact, the output he is seeing is consistent with that ... but it doesn't show it.
The correct way to know if this is the case is to look at the javadoc ... where the equals method is listed in the list of inherited methods. That is definitive.
So why doesn't the example show what the author says it shows?
It is possible for two methods to behave the same way, but be implemented differently. If we treat the Pattern class as a black box, then we cannot show that this is not happening. (Or at least ... not without using reflection.)
The author has only run this on one platform. Other platforms could behave differently.
On the second point, my recollection is that in the earlier implementation of Pattern (in Java 1.4) the Pattern.compile(...) methods kept a cache of recently compiled pattern objects (see footnote 1). If you compiled a particular pattern string twice, the second time you might get the same object as was returned the first time. That would cause the test code to output:
true
true
true
true
But what does that show? Does it show that Pattern overrides Object.equals? No!
The lesson here is that you should figure out how a Java library method behaves primarily by looking at the javadocs:
If you write a "black box" test, you are liable to draw incorrect conclusions ... or at least, conclusions that may not be true for all platforms.
If you base your conclusions on "reading the code", you run the risk of drawing conclusions that are invalid for other platforms.
1 - Even if my recollection is incorrect, such an implementation would be consistent with the javadocs for the Pattern.compile(...) methods. They do not say that each compile call returns a new Pattern object.

Maybe I do not fully understand the question. But as you can see in the following example, there is a default java.lang.Object.equals(Object) method for every Java Object. This method compares the references to the objects, i.e. uses the == operator.
package test;

import java.util.regex.Pattern;

public class Main {
    private static final Pattern P1 = Pattern.compile("//.*");
    private static final Pattern P2 = Pattern.compile("//.*");

    public static void main(String[] args) {
        System.out.println(P1.equals(P1));
        System.out.println(P1.equals(P2));
        System.out.println(P1.pattern().equals(P1.pattern()));
        System.out.println(P1.pattern().equals(P2.pattern()));
    }
}
Outputs:
true
false
true
true

For mysterious reasons, the Pattern class doesn't implement equals(). For example, this simple unit test will fail:
@Test
public void testPatternEquals() {
    Pattern p1 = Pattern.compile("test");
    Pattern p2 = Pattern.compile("test");
    assertEquals(p1, p2); // fails!
}
The most common workaround for this seems to be to compare the string representations of the Pattern objects (which returns the String used to create the Pattern):
@Test
public void testPatternEquals() {
    Pattern p1 = Pattern.compile("test");
    Pattern p2 = Pattern.compile("test");
    assertEquals(p1.toString(), p2.toString()); // succeeds!
}

Pattern doesn't but String does. Why not just compare the regex from which the Patterns were compiled?

I know automata may solve your problem, but that may be complicated.
Roughly, you should at least compare pattern.pattern() and pattern.flags(), though that is not enough to decide whether two regexes are equivalent.

You can compare string representations from which patterns have been made:
Pattern p1 = getPattern1();
Pattern p2 = getPattern2();
if (p1.pattern().equals(p2.pattern())){
// your code here
}

I think I get the idea of the question, and since I was searching for ways to compare Patterns I ended up here (two years too late probably, well, sorry...).
I'm writing tests and I need to know if a method of mine returns the expected pattern. While the text via toString() or pattern() might be the same, the flags can be different and the result when using the pattern would be unexpected.
A while ago I wrote my own general implementation of toString(). It collects all fields including the private ones and constructs a string that can be used for logging and apparently for testing. It showed that fields root and matchRoot were different when compiling two equal patterns. Assuming that those two aren't that relevant for equality and since there is a field flag, my solution is quite good if not perfect.
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Utility {
    /**
     * Don't call this method from a <code>toString()</code> method with
     * <code>useExistingToString</code> set to <code>true</code>!!!
     */
    public static String toString(Object object, boolean useExistingToString, String... ignoreFieldNames) {
        if (object == null) {
            return null;
        }
        Class<? extends Object> clazz = object.getClass();
        if (useExistingToString) {
            try {
                // avoid the default implementation Object.toString()
                Method methodToString = clazz.getMethod("toString");
                if (!methodToString.getDeclaringClass().isAssignableFrom(Object.class)) {
                    return object.toString();
                }
            } catch (Exception e) {
                // fall through to the reflective field dump below
            }
        }
        List<String> ignoreFieldNameList = Arrays.asList(ignoreFieldNames);
        Map<String, Object> fields = new HashMap<String, Object>();
        while (clazz != null) {
            for (Field field : clazz.getDeclaredFields()) {
                String fieldName = field.getName();
                if (ignoreFieldNameList.contains(fieldName) || fields.containsKey(fieldName)) {
                    continue;
                }
                boolean accessible = field.isAccessible();
                if (!accessible) {
                    // note: newer JDKs may refuse this for private fields of JDK classes
                    field.setAccessible(true);
                }
                try {
                    Object fieldValue = field.get(object);
                    if (fieldValue instanceof String) {
                        fieldValue = stringifyValue(fieldValue);
                    }
                    fields.put(fieldName, fieldValue);
                } catch (Exception e) {
                    fields.put(fieldName, "-inaccessible- " + e.getMessage());
                }
                if (!accessible) {
                    field.setAccessible(false);
                }
            }
            // travel upwards in the class hierarchy
            clazz = clazz.getSuperclass();
        }
        return object.getClass().getName() + ": " + fields;
    }

    public static String stringifyValue(Object value) {
        if (value == null) {
            return "null";
        }
        return "'" + value.toString() + "'";
    }
}
And the test is green:
String toString1 = Utility.toString(Pattern.compile("test", Pattern.CASE_INSENSITIVE), false, "root", "matchRoot");
String toString2 = Utility.toString(Pattern.compile("test", Pattern.CASE_INSENSITIVE), false, "root", "matchRoot");
assertEquals(toString1, toString2);

To determine whether two Pattern objects are equivalent, the simplest thing to do is to compare the actual string pattern and the flags used to create that pattern:
boolean isPatternEqualToPattern(final Pattern p1, final Pattern p2) {
    return p1.flags() == p2.flags()
        && p1.pattern().equals(p2.pattern());
}
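If the comparison is needed in several places (map keys, assertEquals in tests), one option is to wrap it in a small value class. The following is only a sketch under that assumption: PatternKey is a hypothetical name, not part of the JDK, and it captures textual equality of regex source plus flags, not equivalence of the languages matched.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical value wrapper: two keys are equal when the regex source
// text and the flags of the wrapped patterns are equal.
final class PatternKey {
    private final String regex;
    private final int flags;

    PatternKey(Pattern pattern) {
        this.regex = pattern.pattern();
        this.flags = pattern.flags();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PatternKey)) {
            return false;
        }
        PatternKey other = (PatternKey) o;
        return flags == other.flags && regex.equals(other.regex);
    }

    @Override
    public int hashCode() {
        return Objects.hash(regex, flags);
    }

    public static void main(String[] args) {
        PatternKey a = new PatternKey(Pattern.compile("//.*"));
        PatternKey b = new PatternKey(Pattern.compile("//.*"));
        PatternKey c = new PatternKey(Pattern.compile("//.*", Pattern.MULTILINE));
        System.out.println(a.equals(b)); // true: same source, same flags
        System.out.println(a.equals(c)); // false: flags differ
    }
}
```

Patterns such as ".+" and "..*" would still compare as different with this approach, because only the text is compared.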

Although the other answers might solve the problem, I do not think they are the real answer to the problem.
If you really want to compare two patterns you essentially want to compare two regular languages.
To do this, cs stackexchange has already posted a solution:
https://cs.stackexchange.com/questions/12876/equivalence-of-regular-expressions
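A fast method to check the equivalence of regular languages is the Hopcroft and Karp algorithm (HK), which tests whether two finite automata accept the same language.
Here is a Java implementation of an algorithm by Hopcroft and Karp:
http://algs4.cs.princeton.edu/65reductions/HopcroftKarp.java.html
Be aware that this class appears to implement their maximum bipartite matching algorithm rather than the automata equivalence test, so it may not be directly usable for comparing regexes.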

Related

Java 8 functional interfaces versus functions

Suppose I have an application that needs to apply several custom transformations to strings. The needs will grow over time. The following two approaches do exactly the same thing, but I am wondering which one is more beneficial in the long run. Are they the same? Or does one offer more benefits than the other as the number of transforms increases and varies?
Suppose we have these:
public static final String PL = "(";
public static final String PR = ")";
public static final String Q1 = "'";
Here is each approach's setup and usage.
Approach 1:
@FunctionalInterface
public interface StringFunction {
    String applyFunction(String s);
}
public class StrUtils {
    public static String transform(String s, StringFunction f) {
        return f.applyFunction(s);
    }
    public static String putInQ1(String s) {
        return Q1.concat(s).concat(Q1);
    }
    public static String putInParens(String s) {
        return PL.concat(s).concat(PR);
    }
    // and so on...
}
Which I would use like this:
System.out.println(StrUtils.transform("anSqlStr", StrUtils::putInQ1));
System.out.println(StrUtils.transform("sqlParams", StrUtils::putInParens));
Approach 2:
Here, I use straightforward Function:
Function<String, String> putInQ1 = n -> Q1.concat(n).concat(Q1);
Function<String, String> putInParens = n -> PL.concat(n).concat(PR);
// and so on...
Which I would use like this:
System.out.println(putInQ1.apply("anSqlStr"));
System.out.println(putInParens.apply("sqlParams"));

You sketched two ways of offering a certain functionality.
The first one is to explicitly offer it as a method:
public static String putInQ1(String s) {
return Q1.concat(s).concat(Q1);
}
which is supposed to be used via a method reference.
The second one is to offer it as a Function object:
Function<String, String> putInQ1 = n -> Q1.concat(n).concat(Q1);
(Here, you did not say where these instances should be located. I assume that you would also create a class that contains all these Function instances, possibly as public static final fields.)
JBNizet mentioned a third option: You could use the methods directly, and not via method references. Indeed, the purpose of the transform function is not entirely clear. The only justification for this would be that you want to pass in arbitrary method references there, but these method references would just be Function objects - like in the second approach...
However, in a technical sense, the difference is not so large. Just to illustrate the point: Both approaches can trivially be converted into each other! The method can be implemented based on the function object
public static String putInQ1(String s) {
return putInQ1.apply(s);
}
And a function object can be created from the method reference:
Function<String, String> putInQ1 = StrUtils::putInQ1;
So the main question may be: How do you want to offer this functionality to the user of your library?
For this, consider the use case that you have an input string, and want to put it into ( parentheses ), and the result into ' single quotes ':
String doItWithMethodReferences(String input) {
    String result = input;
    result = StrUtils.transform(result, StrUtils::putInParens);
    result = StrUtils.transform(result, StrUtils::putInQ1);
    return result;
}
String doItWithFunctionObjects(String input) {
    String result = input;
    result = StringFunctions.putInParens.apply(result);
    result = StringFunctions.putInQ1.apply(result);
    return result;
}
String doItWithMethods(String input) {
    String result = input;
    result = StrUtils.putInParens(result);
    result = StrUtils.putInQ1(result);
    return result;
}
You can see that there is hardly a difference between the approaches that would qualify one of them as "better" or "worse" than the other in terms of readability, except for the obvious fact that the last one is simpler than the first one by avoiding the unnecessary transform calls.
Of course, each of these methods could be written "more compactly", in a single line. But depending on the number and the structure of the operations, this could severely reduce the readability, and in fact, this leads to another point: I could imagine that extensibility may be something to consider. Imagine you wanted to create a single operation that placed a string into '( single quotes and parentheses )' at once.
With methods:
public static String putInPandQ1(String s) {
return putInQ1(putInParens(s));
}
With functions:
Function<String, String> putInPandQ1 = putInParens.andThen(putInQ1);
I think that the andThen function would be a nice feature that helps to compose more complex string manipulations.
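As a quick illustration of that composition, here is a self-contained sketch. The class name ComposeDemo and the upper-case constant names are chosen for this example only; andThen and compose are the default methods of java.util.function.Function.

```java
import java.util.function.Function;

class ComposeDemo {
    static final Function<String, String> PUT_IN_PARENS = s -> "(" + s + ")";
    static final Function<String, String> PUT_IN_Q1 = s -> "'" + s + "'";

    // andThen: apply the parentheses first, then the single quotes.
    static final Function<String, String> PUT_IN_P_AND_Q1 = PUT_IN_PARENS.andThen(PUT_IN_Q1);

    public static void main(String[] args) {
        System.out.println(PUT_IN_P_AND_Q1.apply("sqlParams"));          // '(sqlParams)'
        // compose runs its argument first, so the order is reversed here.
        System.out.println(PUT_IN_PARENS.compose(PUT_IN_Q1).apply("x")); // ('x')
    }
}
```

The composed function is itself a Function<String, String>, so it can be passed anywhere the simple ones can.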
(But taking that arbitrarily far, one has to ask whether you are not actually attempting to implement a template engine or a new domain-specific programming language...)
A short note: All this seems fairly unrelated to performance. Whether you do return s0 + s1; or return s0.concat(s1) will often not matter, and in the few cases where it does matter, you can change the implementation later - because, given the functionality that is sketched in the question, the decision about using + or concat or some StringBuilder trickery is exactly that: An implementation detail.
And another note, as pointed out in the comments: Instead of defining your own StringFunction interface, you could use UnaryOperator<String>. Both are "structurally equal", but the latter is part of the standard API. Imagine that there are already many libraries out there, with methods that expect the standard UnaryOperator<String> as an argument. When you only have instances of your own StringFunction, then you may have to convert these instances so that your code can cooperate with other code. This is trivial, of course, but the interfaces in the functional package are carefully chosen to cover a large range of application cases, and I think that the interoperability between libraries can be greatly increased when programmers don't needlessly create new interfaces that already exist in the standard API. One could argue that the introduction of the StringFunction makes code easier, because it does not need the <String> generic parameter. But if you want this, then you should simply declare the interface as interface StringFunction extends UnaryOperator<String> { }, which simply is a further specialization, and will keep the compatibility with other code. Additionally, you'll then conveniently inherit all the default methods from Function, like the andThen that I mentioned above.
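A minimal sketch of that specialization, assuming a demo class named StringFunctionDemo and a transform helper that accepts the standard type (both invented for this example). Note that the single abstract method is then the inherited apply, not applyFunction.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

// No new abstract method: apply(String) is inherited from UnaryOperator<String>.
@FunctionalInterface
interface StringFunction extends UnaryOperator<String> {
}

class StringFunctionDemo {
    // Accepts the standard type, so StringFunction instances and plain lambdas both fit.
    static String transform(String s, UnaryOperator<String> f) {
        return f.apply(s);
    }

    public static void main(String[] args) {
        StringFunction quote = s -> "'" + s + "'";
        StringFunction parens = s -> "(" + s + ")";

        System.out.println(transform("abc", quote)); // 'abc'

        // andThen is inherited from Function; its result is a plain Function,
        // not a StringFunction.
        Function<String, String> both = parens.andThen(quote);
        System.out.println(both.apply("abc"));       // '(abc)'
    }
}
```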

Why not simply define a method
public static String putInWhatever(String s, String left, String right) {
    return left + s + right;
}
with overloaded variants for the case where left and right are equal? No complicated functional interfaces or lambdas are needed.

Is there a way in Java to have mutually exclusive options handled by the compiler?

I'm wondering if there's a way to declare a method that takes a String and an Enum, but only require the string to be used if a certain enum is used.
Example:
public enum SearchType {
REGEX,DEFAULT
}
public static List<File> Search(String path, SearchType search, String pattern) {
//do things
}
Ideally, the pattern string would only be required when the programmer uses SearchType.REGEX, and if you either forgot it, or included it with SearchType.DEFAULT, your program would not compile.
As it stands, the programmer has to pass in an empty string when using SearchType.DEFAULT, and the code must check for the mutually exclusive options with a couple of if statements. Right now I have it throwing an IllegalArgumentException if you include a pattern with DEFAULT or forget the pattern on a REGEX, since both of those things indicate the person using this function probably made a mistake.
My questions are:
Is this kind of compile-time parameter checking even possible with Java 8?
Is there a more idiomatic/safe/logical way to handle this case?
This sounds like it would be handled by an Interface of some kind if it can even be done.
Thanks!

You should declare two methods: one method is called to do one job, not two. So here, you can declare:
public static List<File> search(String path);
and
public static List<File> search(String path, String pattern);
Your enum SearchType is useless in that case. Remember to add javadoc to your methods and the user won't be confused.
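A rough sketch of what the two overloads could look like. The class name FileSearch, the shared filter helper, and the choice to match the pattern against the file name are assumptions made for this example; the question does not say what the search actually does.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FileSearch {
    // Default search: every entry directly under the given directory.
    static List<File> search(String path) {
        return filter(new File(path).listFiles(), null);
    }

    // Regex search: only entries whose name matches the pattern.
    static List<File> search(String path, String pattern) {
        return filter(new File(path).listFiles(), Pattern.compile(pattern));
    }

    // Hypothetical shared helper; a null pattern means "accept everything".
    static List<File> filter(File[] entries, Pattern pattern) {
        List<File> result = new ArrayList<>();
        if (entries == null) { // listFiles() returns null if path is not a readable directory
            return result;
        }
        for (File entry : entries) {
            if (pattern == null || pattern.matcher(entry.getName()).matches()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(search(".").size() + " entries in the working directory");
        System.out.println(search(".", ".*\\.java").size() + " of them end in .java");
    }
}
```

With this shape the compiler enforces the rule: a default search cannot be given a pattern, and a regex search cannot be called without one.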

There is no way to get this kind of compile-time checking in Java 8 from a single method that takes the enum.
However, you could write this in a cleaner way with varargs. Something like this:
public static List<File> Search(String path, SearchType search, String... searchArgs) {
    if (search == SearchType.REGEX) {
        if (searchArgs.length != 1) {
            throw new IllegalArgumentException("Improper number of arguments for regex search: Expected 1, got " + searchArgs.length);
        }
        // Do the search
    }
    if (search == SearchType.DEFAULT) {
        if (searchArgs.length != 0) {
            throw new IllegalArgumentException("Improper number of arguments for default search: Expected 0, got " + searchArgs.length);
        }
        // Do the search
    }
    return new ArrayList<>(); // placeholder result so that the sketch compiles
}
That way, it can be called like this:
Search("C:\\", SearchType.REGEX, "[a]");
Search("C:\\", SearchType.DEFAULT);
You still won't receive a compile error if you call it the wrong way, though.

What is this design (or anti-) pattern and more importantly is there a better way?

I'm receiving from a webservice a list of key-value pairs, and have inherited the following code:
public String iconValue = null;
... (over 50 class variables assigned in MyObject constructor below)
public MyObject(List<Attribute> attrs) {
String attrName, attrValue;
for (Attribute a : attrs) {
try
{
attrName = a.getName();
attrValue = a.getValue();
if (attrValue == null || "".equals(attrValue.trim()))
continue;
if (ICONS.equals(attrName)) {
//Do something including assignment
this.iconValue = attrValue;
}
else if (URL.equals(attrName))
{
//Do something including assignment
}
else if (...) A giant list of over 50 different attributes hardcoded
{
//Do something including assignment
}
...
So, except for keeping a hashmap, is there a better way than the above to keep hard-coded variables within the class and use this "when-if" pattern?
Also, does this pattern have a name?

One way I can think about is to use ENUMs and dynamically dispatch the works to each of the ENUM object, instead of doing a huge if else, esp. since ENUMs can be looked up by their names.
That would be like a strategy pattern.
For example:
Implement an ENUM to have a method doJob() for each of the instances;
Use the valueOf() method to dispatch the works.
Code sample:
public enum Strategies {
    URL {
        @Override
        public void doJob(MyObject mo) {
            // do the work
        }
    },
    ICONS {
        @Override
        public void doJob(MyObject mo) {
            // another work
        }
    };
    public abstract void doJob(MyObject mo);
}
And when using it,
try {
    // pass the MyObject being populated, e.g. "this" inside the constructor
    Strategies.valueOf(attrName).doJob(this);
} catch (IllegalArgumentException e) {
    // ENUM does not exist, illegal parameter
}
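For a version that can be compiled and run on its own, here is a small sketch of the same idea. Item stands in for MyObject, and AttributeHandler, apply and dispatch are hypothetical names; it also assumes the attribute names from the web service are exactly the enum constant names, which may not hold in the real code.

```java
// Stand-in for MyObject, reduced to two fields for the example.
class Item {
    String iconValue;
    String url;
}

enum AttributeHandler {
    ICONS {
        @Override
        void apply(Item target, String value) {
            target.iconValue = value;
        }
    },
    URL {
        @Override
        void apply(Item target, String value) {
            target.url = value;
        }
    };

    abstract void apply(Item target, String value);

    // Looks up the handler by name; returns false when no constant has that name.
    static boolean dispatch(Item target, String name, String value) {
        try {
            valueOf(name).apply(target, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // unknown attribute name
        }
    }
}

class EnumDispatchDemo {
    public static void main(String[] args) {
        Item item = new Item();
        System.out.println(AttributeHandler.dispatch(item, "ICONS", "star.png")); // true
        System.out.println(item.iconValue);                                       // star.png
        System.out.println(AttributeHandler.dispatch(item, "COLOR", "red"));      // false
    }
}
```

Note that valueOf throws a NullPointerException for a null name, so null names would need a separate check.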

If you want to take a different action for each possible value of attribute, you will end up with something about that verbose, I'm afraid. Some improvements though:
If you are using Java 7 or above, you can use switch statements with Strings.
If you are not, you could create an Enum that has a static method that returns an Enum element you could switch on. It's no performance improvement, but it might help with readability of your code.

Does this pattern have a name?
Nope.
In Java 7 you can express that as:
switch (attrName) {
    case ICONS:
        // Do something including assignment
        break;
    case URL:
        // Do something including assignment
        break;
    // and so on
}
... provided that ICONS, URL and the other strings are compile-time constants.
That is more concise and more robust. It is also (probably) more efficient because the switch can most likely be implemented using hashing.
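To show the compile-time-constant requirement, here is a small runnable sketch. The constant values "icons" and "url" and the names SwitchDemo and describe are assumptions for the example; the real attribute names are not given in the question.

```java
class SwitchDemo {
    // static final fields initialized with literals are compile-time constants,
    // which is what makes them usable as case labels.
    static final String ICONS = "icons"; // assumed value
    static final String URL = "url";     // assumed value

    static String describe(String attrName) {
        switch (attrName) { // throws NullPointerException if attrName is null
            case ICONS:
                return "icon attribute";
            case URL:
                return "url attribute";
            default:
                return "unknown attribute";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("icons")); // icon attribute
        System.out.println(describe("color")); // unknown attribute
    }
}
```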

I don't think it has a name, but you could call it "using polymorphism wrong" (if type safety is a concern). It depends on whether you have a well defined data contract or not. Is the data you're receiving a proper object, or just "random" data?
If it's a proper object I would create a concrete representation and use something like Dozer (or, if you don't want to be tied down with a dependency, roll your own mapper using reflection) to convert between them.
If it's more or less random data, I'd just use a Map, or similar data structure.

Mockito: Stub method with complex object as a parameter

Maybe this is a newbie question, but can't find the answer.
I need to stub a method with Mockito. If the method has "simple" arguments, then I can do it. For example, a find method with two parameters, car color and number of doors:
when(carFinderMock.find(eq(Color.RED),anyInt())).thenReturn(Car1);
when(carFinderMock.find(eq(Color.BLUE),anyInt())).thenReturn(Car2);
when(carFinderMock.find(eq(Color.GREEN), eq(5))).thenReturn(Car3);
The problem is that the find argument is a complex object.
mappingFilter = new MappingFilter();
mappingFilter.setColor(eq(Color.RED));
mappingFilter.setDoorNumber(anyInt());
when(carFinderMock.find(mappingFilter)).thenReturn(Car1);
This code does not work. The error is "Invalid use of argument matchers! 1 matchers expected, 2 recorded".
Can't modify the "find" method, it needs to be a MappingFilter parameter.
I suppose that I have to do "something" to tell Mockito that when mappingFilter.getColor is RED and mappingFilter.getDoorNumber is anything, then it has to return Car1 (and the same for the other two statements).
But how?

Use a Hamcrest matcher, as shown in the documentation:
when(carFinderMock.find(argThat(isRed()))).thenReturn(car1);
where isRed() is defined as
private Matcher<MappingFilter> isRed() {
    return new BaseMatcher<MappingFilter>() {
        // TODO implement abstract methods. matches() should check that the filter is RED.
    };
}

Since 2.1.0, Mockito has its own matcher mechanism built on top of the org.mockito.ArgumentMatcher interface. This lets you avoid using Hamcrest. Usage is almost the same as with Hamcrest. Keep in mind that ArgumentMatcher is a functional interface, so the implementation of a matcher can be expressed as a lambda expression.
private ArgumentMatcher<SomeObject> isYellow() {
    return argument -> argument.isYellow();
}
and then
when(mock.someMethod(argThat(isYellow()))).thenReturn("Hurray");

You need to implement the equals() method of your MappingFilter correctly. In equals() you should compare only color and not doorNumber.
In its simplest form, it should look like this:
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof MappingFilter)) {
        return false; // also covers null
    }
    MappingFilter other = (MappingFilter) obj;
    // remember to override hashCode() so that it stays consistent with this
    return other.getColor() == this.getColor();
}
Also, you should form your MappingFilter simply as below instead of using any matcher such as eq
mappingFilter = new MappingFilter();
mappingFilter.setColor(Color.RED);
mappingFilter.setDoorNumber(10); //Any integer

How is the Object.equals method supposed to work in Java?

This is my source code. I'm trying to implement a simple program that asks the user a question, expects the answer to be "yes" or "no", and terminates only if the user answers "yes" or "no". The book I have suggests not using the == comparison and using the equals method instead, so that the program can understand if the user typed "y e s" instead of "yes". But this way the result is the same, and the method seems to check whether the user's answer is exactly "yes" or "no". It doesn't accept, for example, an answer of "n o". Is that logical for that method? Is it supposed to work that way? How can I change the program to accept answers like "Yes", "ye s", "No", "NO" etc.? I would appreciate your help :)
import acm.program.*;

public class YesNoExample extends ConsoleProgram {
    public void run() {
        while (true) {
            String answer = readLine("Would you like instructions? ");
            if (askYesNoQuestion(answer)) {
                break;
            }
            println("Please answer yes or no.");
        }
    }

    private boolean askYesNoQuestion(String str) {
        if (str.equals("yes") || str.equals("no")) {
            return true;
        } else {
            return false;
        }
    }
}

If you use == you'll be comparing the references (memory pointers) of two String objects. If you use equals, a custom made method in the String class will be run that does some "intelligent" comparison, in this case, check that the characters are all the same, and the whole thing has the same length.
If you'd like to support mixed case letters, you could use "someString".equalsIgnoreCase("SoMeString") (which will return true). This is done (said roughly) by making both strings lowercase (so the case doesn't matter) and comparing them using equals.
Edit: The other posters made me realize that, in addition to capitalization, you also want to look for String equality where spaces don't matter. If that's the case, a similar trick to turning everything to lowercase applies, where you first remove all the spaces, as @LouisWasserman says in his answer.
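The differences described above can be seen in a short runnable sketch. The class name StringEqualityDemo and the sameObject helper are invented for the example; new String("yes") is used only to force a separate object, similar to text read at runtime.

```java
class StringEqualityDemo {
    // Hypothetical helper: true only when both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) creates a distinct object with the same characters.
        String typed = new String("yes");

        System.out.println(sameObject(typed, "yes"));        // false: different objects
        System.out.println(typed.equals("yes"));             // true: same characters
        System.out.println("Yes".equals("yes"));             // false: case differs
        System.out.println("Yes".equalsIgnoreCase("yes"));   // true
        System.out.println("ye s".equalsIgnoreCase("yes"));  // false: the space still counts
    }
}
```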

If you need to fuzzily identify yes/no, first you need exact rules as to what matches. Based on your examples, I can suggest this:
private boolean askYesNoQuestion(String str) {
    str = str.replace(" ", "").toUpperCase();
    return str.equals("YES") || str.equals("NO");
}
If interested in top performance and not at all in intelligibility, use this:
private static final Pattern p =
    Pattern.compile("y\\s*e\\s*s|n\\s*o", Pattern.CASE_INSENSITIVE);

private boolean askYesNoQuestion(String str) {
    // p is a constant and never null, so no null check on it is needed
    return p.matcher(str.trim()).matches();
}

Semantics of == vs .equals()
First off you misunderstand the semantics.
== tests for object identity. A == B says is A a reference to the exact same object as B.
.equals() applies custom logic to test if the objects are equal in some logical manner, without being the exact same object. For this to be implemented correctly, both objects should have the same .hashCode() value as well.
Idiomatic Java Solution
The String class is final, which means it can't be subclassed, so you can't override .equals() for String.
What you need to do is preprocess the input into something that can be directly compared to the target value with .equalsIgnoreCase().
One way to do this is to use answer.replaceAll("\\s","") to remove all the whitespace; then you can compare it to your target String literal with .equalsIgnoreCase().
A better method to replace askYesNoQuestion() would be:
private boolean isAnswerYesOrNo(final String answer)
{
    final String input = answer.replaceAll("\\s","");
    return "yes".equalsIgnoreCase(input) || "no".equalsIgnoreCase(input);
}
Comparing a literal to the input insulates the comparison from NullPointerExceptions if the value happens to be null: "yes".equalsIgnoreCase(input) can never throw a NullPointerException (although answer itself must still be non-null for the replaceAll call to work). This is idiomatic Java.
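A self-contained variant of the same idea is sketched below. The class name YesNoParser and the explicit null guard are additions for this example, not part of the answer's method; the guard is there because replaceAll would throw on a null reference before the literal-first comparison is ever reached.

```java
class YesNoParser {
    static boolean isAnswerYesOrNo(String answer) {
        if (answer == null) {
            return false; // assumption: treat missing input as "not a yes/no answer"
        }
        // Strip all whitespace, then compare case-insensitively against the literals.
        String input = answer.replaceAll("\\s", "");
        return "yes".equalsIgnoreCase(input) || "no".equalsIgnoreCase(input);
    }

    public static void main(String[] args) {
        System.out.println(isAnswerYesOrNo("ye s"));  // true
        System.out.println(isAnswerYesOrNo(" N O ")); // true
        System.out.println(isAnswerYesOrNo("maybe")); // false
        System.out.println(isAnswerYesOrNo(null));    // false
    }
}
```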
Get a better book
That book isn't very useful if it really says what you are claiming it says. Also, it is teaching you to write lots of code to handle bad input, when that is a complete anti-pattern; a well-designed program would exit with a verbose explanation of what the exact problem was and what can be done to fix the input.

With the explanation of == and .equals well described above, here are two examples of one-liners that do the comparison you want.
if ( Pattern.matches("\\s*[yY]\\s*[eE]\\s*[sS]\\s*", input) ) {
// do something
}
if ( input.replaceAll("\\s", "").equalsIgnoreCase("yes") ) {
// do something
}
