Why must I specify the whole string in a Java regular expression? [duplicate] - java

This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 6 years ago.
Suppose, I have a string:
String str = "some strange string with searched symbol";
I want to search it for some text, say "string". So we have the following:
str.matches("string"); //false
str.matches(".*string.*"); //true
So, as stated in the title, why must I specify the whole string in a Java regular expression?
The Java documentation says:
public boolean matches(String regex)
Tells whether or not this string matches the given regular expression.
It doesn't say
Tells whether or not this whole string matches the given regular expression.
For example, in PHP it would be:
$str = "some strange string with searched symbol";
var_dump(preg_match('/string/', $str)); // int(1)
var_dump(preg_match('/.*string.*/', $str)); //int(1)
So both regexes match.
And I think this is correct, because if I wanted to test the whole string I would use str.matches("^string$");
PS: Yes, I know that to search for a substring it is simpler and faster to use str.indexOf("string") or str.contains("string"). My question is only about Java regular expressions.
UPDATE: As stated by @ChrisJester-Young (and @GyroGearless), one solution, if you want to search for a regex match within part of a subject string, is to use the find() method like this:
String str = "some strange string with searched symbol";
Matcher m = Pattern.compile("string").matcher(str);
System.out.println(m.find()); //true

matches always matches the whole input string. If you want to allow substrings to match, use find.
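The difference can be seen side by side in a minimal sketch. The class and helper names (MatchesVsFind, wholeMatch, partialMatch, prefixMatch) are made up for illustration; matches(), find() and lookingAt() are the standard java.util.regex.Matcher methods.

```java
import java.util.regex.Pattern;

final class MatchesVsFind {
    // True only if the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches any substring of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True if the regex matches a prefix of the input.
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String str = "some strange string with searched symbol";
        System.out.println(wholeMatch("string", str));     // false
        System.out.println(partialMatch("string", str));   // true
        System.out.println(prefixMatch("some", str));      // true
        System.out.println(wholeMatch(".*string.*", str)); // true
    }
}
```

String.matches(regex) behaves like wholeMatch here, which is why the pattern has to be padded with .* on both sides to get substring behaviour.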

As the documentation you quoted says,
public boolean matches(String regex)
Tells whether or not this string matches the given regular expression.
What it means is whether the string as a whole matches the given regex, i.e. matches verifies whether your string is an instance of the given regex, not whether it contains a substring which is an instance of the given regex. As Chris suggested, you can use find instead, or for your problem you can use contains.

Yes, as you now know, matches() returns true only if the complete string matches.
But, in your update, you have stated that:
As stated by @ChrisJester-Young (and @GyroGearless) the only solution, if you want to search regex that is part of a subject string, is to use find() method...
I would like to point out that using find() is not the only solution. There is at least one more that I know of:
String str = "some strange string with searched symbol";
boolean found = str.split("string").length>1;
System.out.println(found);
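This prints true here, but it is a hack rather than the proper way to do it, and it does not work for every input: split() discards trailing empty strings, so when the match sits at the very end of the input (for example "a string".split("string")) the array has length 1 and the check reports false. Passing a limit of -1, as in str.split("string", -1), keeps those trailing empty strings.
There may be many more solutions.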
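To see where the split-based check and find() agree and where they diverge, here is a small comparison sketch. The class and method names (SplitHack, viaSplit, viaSplitKeepTrailing, viaFind) are hypothetical; only String.split and Matcher.find come from the JDK.

```java
import java.util.regex.Pattern;

final class SplitHack {
    // The hack from the answer: more than one piece means the regex matched somewhere.
    static boolean viaSplit(String input, String regex) {
        return input.split(regex).length > 1;
    }

    // Same idea, but a negative limit keeps trailing empty strings.
    static boolean viaSplitKeepTrailing(String input, String regex) {
        return input.split(regex, -1).length > 1;
    }

    // The straightforward approach.
    static boolean viaFind(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String middle = "some strange string with searched symbol";
        String atEnd = "a string";
        System.out.println(viaSplit(middle, "string"));            // true
        System.out.println(viaSplit(atEnd, "string"));             // false
        System.out.println(viaSplitKeepTrailing(atEnd, "string")); // true
        System.out.println(viaFind(atEnd, "string"));              // true
    }
}
```

Even with the limit argument, find() remains the clearer choice because it states the intent directly and does not allocate an array of substrings.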

Related

Regular expression to match whole list as well as its parts [duplicate]

This question already has answers here:
Java Regex: repetitive groups?
(3 answers)
Closed 2 years ago.
In Java I have a string like +aba,biba,-miba, which is a list of sort orders. There might be any number of parts; "aba", "biba" and "miba" are just examples.
I would like to write a regular expression which finds the +/- sign and aba, biba, miba.
I would also like to check whether the full string matches the syntax, which means that I need to match +aba,biba,-miba as a whole as well.
I managed to write a regex for the first part:
([+-]?)([^,]*)[,]?
How should I complete the expression so that I can get the second part out of it as well?
Depending on the complexity of the list, i.e. what could be part of it, a regex to check the entire list is quite straightforward. This regex could contain a group that represents each part as well as a quantifier, but you wouldn't be able to extract all the parts with that single regex, because in Java a repeated capturing group keeps only its last match. Thus you'd need to either use a simple split() to get the parts or a second regex to extract them.
Assuming your list is separated by commas, doesn't contain whitespace and only allows +/- as well as lower-case characters, you could use the following expression to check the format of the list:
boolean listMatches = list.matches("^([+-]?[a-z]+(,(?!$))?)*$");
Note that String.matches() makes ^ and $ superfluous, but I added them for completeness in case you use another method to apply the expression. This basically checks for any number of lower-case "names", each preceded by an optional + or - and followed by a comma if it isn't the last character in the string. Because the comma is optional, it also accepts names that run together without a separator, such as +aba-miba.
Note that this would allow an empty list as well. If the list must contain at least one element you might use something like this:
boolean listMatches = list.matches("^[+-]?[a-z]+(,[+-]?[a-z]+)*$");
Looking for the parts could then look like this:
Pattern partPattern = Pattern.compile("([+-]?)([a-z]+)");
Matcher partMatcher = partPattern.matcher(list);
while (partMatcher.find()) {
    String direction = partMatcher.group(1);
    String name = partMatcher.group(2);
}
Note that this could also be done with a combination of list.split(",") and, for each resulting part, charAt(0) and substring(1) - it's up to you :)
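The split-based alternative might look roughly like the sketch below. It assumes the list has the format checked by the stricter expression above (lower-case names, optional +/-, comma separated); the class name SortOrderParser and the choice to return each part as a two-element array are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class SortOrderParser {
    // Returns one {direction, name} pair per list element; direction is "+", "-" or "".
    static List<String[]> parse(String list) {
        // Validate first, so charAt(0) below never sees an empty part.
        if (!list.matches("[+-]?[a-z]+(,[+-]?[a-z]+)*")) {
            throw new IllegalArgumentException("Malformed sort list: " + list);
        }
        List<String[]> result = new ArrayList<>();
        for (String part : list.split(",")) {
            char first = part.charAt(0);
            boolean signed = first == '+' || first == '-';
            String direction = signed ? String.valueOf(first) : "";
            String name = signed ? part.substring(1) : part;
            result.add(new String[] { direction, name });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] p : parse("+aba,biba,-miba")) {
            System.out.println("direction='" + p[0] + "' name='" + p[1] + "'");
        }
    }
}
```

Validating with one regex and extracting with split keeps both steps simple, at the cost of scanning the string twice.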

Java regular expression match end doesn't work [duplicate]

This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 2 years ago.
I want to match the last character "c". What am I doing wrong? The documentation clearly explains that $ matches the end of the line. I have used regex millions of times in the Unix shell etc. and it always worked as expected, but not in Java.
String string = "abc";
if (string.matches("c$")) { // I know that .*c$ will work.
    System.out.println("yes"); // This is never printed
}
Where is the error?
I know that .*c$ will work, but I can't find this information in the Javadoc.
Can someone tell me how I should interpret this official Java tutorial?
https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
or this?
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#sum
Am I unable to read? It seems pretty obvious, but I really can't find the solution; I feel really stupid right now!
Under java.lang.String there is a method whose documentation clearly says the following:
matches(String regex)
Tells whether or not this string matches the given regular expression.
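This will print YES, because find() looks for a match anywhere in the input, whereas matches() requires the whole input to match: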
String line = "abc";
Pattern pattern = Pattern.compile("c$");
Matcher matcher = pattern.matcher(line);
System.out.println(matcher.find() ? "YES" : "NO");

Is there any Java function that acts like the 'LIKE' statement in SQL?

I want to find a function in Java that can check whether a string matches the pattern "%A%B%", just like the 'LIKE' statement in SQL. The function should return true if the string matches the pattern and false if not.
Can anyone suggest any class, function or line of code? Thank you!
Use a regular expression. Learn more here: https://docs.oracle.com/javase/tutorial/essential/regex/
The easiest way of calling it is using String.matches(String regex).
If you want to apply the same regular expression repeatedly, it's better to precompile it and use a Pattern.
A typical invocation sequence is then
Pattern p = Pattern.compile(".*A.*B.*"); // you keep this stored for re-use
Matcher m = p.matcher("BARBARIAN");
boolean b = m.matches();
There is a good Online Regex Tester and Debugger tool, where you can check your regular expression.
Pattern.compile(".*A.*B.*").matcher(input).matches()
will return true if input contains an A followed, somewhere later, by a B.
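For arbitrary LIKE patterns, the translation to a regex can be automated. The following is only a sketch under simplifying assumptions: % means any sequence of characters, _ means exactly one character, there is no ESCAPE clause, and matching is case-sensitive (which differs from some databases). The class and method names (SqlLike, toRegex, like) are invented for this example.

```java
import java.util.regex.Pattern;

final class SqlLike {
    // Translates a LIKE pattern: % becomes .*, _ becomes ., everything else is quoted literally.
    static Pattern toRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // DOTALL lets the wildcards span line breaks as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean like(String input, String likePattern) {
        return toRegex(likePattern).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("BARBARIAN", "%A%B%")); // true
        System.out.println(like("BARN", "%A%B%"));      // false
    }
}
```

Quoting the literal runs with Pattern.quote matters: without it, a LIKE pattern such as a.c would treat the dot as a regex wildcard.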

Replace all with a string having regex wild chars [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
java String.replaceAll without regex
I have a string and I need to replace some parts of it.
The text to be replaced contains regex wildcard characters, though. Example:
String target = "Something * to do in ('AAA', 'BBB')";
String replacement = "Hello";
String originalText = "ABCDEFHGIJKLMN" + target + "ABCDEFHGIJKLMN";
System.out.println(originalText.replaceAll(target, replacement));
I get:
ABCDEFHGIJKLMNSomething * to do in ('AAA', 'BBB')ABCDEFHGIJKLMN
Why doesn't the replacement occur?
Because *, ( and ) are all metacharacters in regular expressions, all of them need to be escaped. Java has a convenient method for this:
java.util.regex.Pattern.quote(target)
However, the better option might be to not use the regex-based replaceAll function at all and simply use replace. Then you do not need to escape anything.
String.replaceAll() takes a regular expression, so it interprets these metacharacters instead of matching them literally.
One approach is to escape these characters (e.g. \* in the regex, written as "\\*" in a Java string literal).
Another would be to do the replacement yourself by using String.indexOf() to find the start of the contained string. indexOf() doesn't take a regexp but rather a normal string.
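Both literal approaches can be tried on the string from the question. In this sketch the class and method names (LiteralReplace, viaQuote, viaReplace) are hypothetical; Pattern.quote, Matcher.quoteReplacement and String.replace are standard JDK methods. quoteReplacement is included on the assumption that the replacement should be literal too, since replaceAll gives $ and \ special meaning there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplace {
    // Keeps replaceAll, but quotes both the target and the replacement.
    static String viaQuote(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    // String.replace works on literal character sequences and still replaces every occurrence.
    static String viaReplace(String text, String target, String replacement) {
        return text.replace(target, replacement);
    }

    public static void main(String[] args) {
        String target = "Something * to do in ('AAA', 'BBB')";
        String originalText = "ABCDEFHGIJKLMN" + target + "ABCDEFHGIJKLMN";
        System.out.println(viaQuote(originalText, target, "Hello"));   // ABCDEFHGIJKLMNHelloABCDEFHGIJKLMN
        System.out.println(viaReplace(originalText, target, "Hello")); // ABCDEFHGIJKLMNHelloABCDEFHGIJKLMN
    }
}
```

Despite its name, replace substitutes all occurrences, so it is a drop-in choice whenever no regex features are needed.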

Why the second argument is not being taken as regex?

I came across an interesting question on Java regex:
Is there a regular expression way to replace a set of characters with another set (like shell tr command)?
So I tried the following:
String a = "abc";
a = a.replaceAll("[a-z]", "[A-Z]");
Now if I print a, the output is
[A-Z][A-Z][A-Z]
Here I think the first argument is being treated as a regex, but not the second argument.
So is there a problem with this code, or is something else the reason?
This is the way replaceAll works.
See API:
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
The answer to the linked question is a quite clear »No«, so this should come as no surprise.
As you can see from the documentation the second argument is indeed a regular string that is used as replacement:
Parameters:
regex – the regular expression to which this string is to be matched
replacement – the string to be substituted for each match
The second argument is a plain String that gets substituted for each match, as the API says.
If you want to turn lower case to upper case, there is a toUpperCase function available in the String class. For functionality equivalent to the tr utility, I think there is no support in Java (up to Java 7).
The replacement string is usually taken literally, except for the sequence $n, where n denotes the number of a capturing group in the regex, and the backslash, which escapes the character that follows it. $n inserts the text captured by that group into the replacement.
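A short sketch of the $n behaviour described above. The date format and the names ReplacementGroups, reorderDate and literalDollar are assumptions made for the example; replaceAll and Matcher.quoteReplacement are the standard JDK calls.

```java
import java.util.regex.Matcher;

final class ReplacementGroups {
    // $1, $2 and $3 refer to the three capturing groups of the regex.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // quoteReplacement turns $ and \ into literals, so "$1" is inserted verbatim.
    static String literalDollar(String input) {
        return input.replaceAll("[a-z]+", Matcher.quoteReplacement("$1"));
    }

    public static void main(String[] args) {
        System.out.println(reorderDate("2020-01-15"));            // 15/01/2020
        System.out.println(literalDollar("abc def"));             // $1 $1
        System.out.println("abc".replaceAll("[a-z]", "[A-Z]"));   // [A-Z][A-Z][A-Z]
    }
}
```

The last line reproduces the output from the question: brackets and hyphens have no special meaning in a replacement string, so they are copied as-is.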
I consider a regex to be a way to express a condition (i.e. does a given string match this expression). With that in mind, what you are asking would mean "please replace what matches in my string with ... another condition", which doesn't make much sense.
Now, trying to understand what you are looking for, it seems to me that you want some automatic mapping between classes of characters (e.g. [a-z] -> [A-Z]). As far as I know this does not exist and you would have to write it yourself (except for the aforementioned toUpperCase()).
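Writing such a mapping yourself is not much code. The sketch below is a deliberately simplified stand-in for tr: it takes two equally long strings of explicit characters and does not expand ranges such as a-z. The class and method names (Tr, tr) are hypothetical.

```java
final class Tr {
    // Replaces each character found in 'from' with the character at the same index in 'to'.
    static String tr(String input, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int idx = from.indexOf(c);
            out.append(idx >= 0 ? to.charAt(idx) : c); // unmapped characters pass through
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tr("hello", "el", "ip")); // hippo
        System.out.println(tr("abc", "abc", "ABC")); // ABC
    }
}
```

For the specific case of [a-z] to [A-Z], toUpperCase() remains the simpler option.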
public String replaceAll(String regex, String replacement)
The first argument is a regular expression; every substring that matches that pattern is replaced by the second argument. If you want to convert lower case to upper case, use the toUpperCase() method.
You should look into jtr. Example of usage:
String hello = "abccdefgdhcij";
CharacterReplacer characterReplacer;
try {
    characterReplacer = new CharacterReplacer("a-j", "Helo, Wrd!");
    hello = characterReplacer.doReplacement(hello);
} catch(CharacterParseException e) {
}
System.out.println(hello);
Output:
Hello, World!
