text wrongly matchs with sub string of words in group - java

I want to check the text to see if it starts with what or who and and is a question type, so for that I wrote the following code:
private static void startWithQOrIf(String commentstr){
String urlPattern = "(|who|what).*\\?.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
everything works good but for example when I try:
whooooooooo is the follower?
will match as well but should not because I am looking for who not whooooooooo
Any idea?

You can ensure a whole word using a word boundary \b:
(|who|what)\\b.*\\?.*$
^^
If the words in the alternation group are supposed to appear at the start of the string, you can just use matches and remove $ anchor:
String urlPattern = "(|who|what)\\b.*\\?.*";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.matches()) { // < - Here, matches is used
System.out.println("yes");
}
Note that (|who|what) matches either an empty string, or who, or what. If you do not plan to allow empty string, use just (who|what).

You must use word boundaries.
String urlPattern = "\\b(who|what)\\b.*\\?.*$";

Related

No match for Java Regular Expression

I am running into an issue where my code is unable to find regex occurrences. Code:
String content = "This\ is\ an\ example.=This is an example\nThis\ is\ second\:=This is second"
String regex = "\"^.*(?=\\=)\"gm";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(content);
List<String> mKeys = new ArrayList<>();
while (m.find()) {
mKeys.add(m.group());
}
mKeys turns out to be empty. I have already validated my regex here https://regex101.com/r/YResRc/3. I am expecting the list to contain two keys from the content.
Your content contains no " quotes, and no text gm, so why would you expect that regex to match?
FYI: Syntaxes like "foo"gm or /foo/gm are something other languages do for regex literals. Java doesn't do that.
The g flag is implied by the fact that you're using a find() loop, and m is the MULTILINE flag that affects ^ and $ and you can specify that using the (?m) pattern, or by adding a second parameter to compile(), i.e. one of these ways:
Pattern p = Pattern.compile("foo", Pattern.MULTILINE);
Pattern p = Pattern.compile("(?m)foo");
Your regex should simply be:
(?m)^.*(?==)
which means: Match everything from the beginning of a line up to the last = sign on the line.
Test
String content = "This is an example.=This is an example\nThis is second:=This is second";
String regex = "(?m)^.*(?==)";
Matcher m = Pattern.compile(regex).matcher(content);
List<String> mKeys = new ArrayList<>();
while (m.find()) {
mKeys.add(m.group());
}
System.out.println(mKeys);
Output
[This is an example., This is second:]

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me a value along with colons. ie I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the reqex captured group can be used to replace.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
UPDATE
Looking at your update, you just need ReplaceFirst only:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a * before it, and once found, tries to match 1+ ASCII letters and then assert that there is a : after them. With :[A-Za-z]+:, the checking only starts after a regex engine found : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totlally OK to hardcode colons in the replacement pattern since they are hardcoded in the regex pattern.
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

Regular expression 30 numbers + space + hypen + space+ 1 number

I have this code to find this pattern: 201409250200131738007947036000 - 1 ,inside the text
final String patternStr = "(\\d{30} - \\d{1})";
final Pattern p = Pattern.compile(patternStr);
final Matcher m = p.matcher(page);
if (m.matches()) {
System.out.println("SUCCESS");
}
But for any strange reasson in Java did't work, Can somebody help me where is the error please?
The reason is that the matches method checks for the entire given string to match the regex.
So i.e. if your string is 123456123412345612341234561234 - 8 it will match, if it is my number 123456123412345612341234561234 - 8 is inside other text it won't.
Use the find method to accomplish your task:
if (m.find()) {
System.out.println("SUCCESS");
}
It will search inside the given string instead of attempting to match the entire string.
From the documentation for Matcher, matches:
Attempts to match the entire region against the pattern.
As opposed to find which:
Attempts to find the next subsequence of the input sequence that matches the pattern.
So use matches to match an entire String against a pattern, use find to locate a pattern inside a String.
Try:
final String patternStr = "\\d{30}+\\s-\\s\\d";
final Pattern p = Pattern.compile(patternStr);
final Matcher m = p.matcher(page);
while (m.find()) {
System.out.printf("FOUND A MATCH: %s%n", matcher.group());
}
I edited your pattern slightly to make it more robust. This will print each match that it finds.

Pattern matching with string containing dots

Pattern is:
private static Pattern r = Pattern.compile("(.*\\..*\\..*)\\..*");
String is:
sentVersion = "1.1.38.24.7";
I do:
Matcher m = r.matcher(sentVersion);
if (m.find()) {
guessedClientVersion = m.group(1);
}
I expect 1.1.38 but the pattern match fails. If I change to Pattern.compile("(.*\\..*\\..*)\\.*");
// notice I remove the "." before the last *
then 1.1.38.XXX fails
My goal is to find (x.x.x) in any incoming string.
Where am I wrong?
Problem is probably due to greedy-ness of your regex. Try this negation based regex pattern:
private static Pattern r = Pattern.compile("([^.]*\\.[^.]*\\.[^.]*)\\..*");
Online Demo: http://regex101.com/r/sJ5rD4
Make your .* matches reluctant with ?
Pattern r = Pattern.compile("(.*?\\..*?\\..*?)\\..*");
otherwise .* matches the whole String value.
See here: http://regex101.com/r/lM2lD5

Java RegExp to get variable image name with its extension from javascript

I am trying to get the image name from the following javascript.
var g_prefetch ={'Im': {url:'\/az\/hprichbg\/rb\/WhiteTippedRose_ROW10477559674_1366x768.jpg', hash:'674'}
Problem:
The name of the image is variable. That is, in the above example code the image changes regularly.
Output I want:
WhiteTippedRose_ROW10477559674_1366x768.jpg
and i tried the following regExp :
Pattern p = Pattern.compile("\{\'Im\'\: \{url\:\'\\\/az\\\/hprichbg\\\/rb\\\/(.*?)\.jpg\'\, hash\:\'674\'\}");
//System.out.println(p);
Matcher m=p.matcher(out);
if(m.find()) {
System.out.println(m.group());
}
I don't know too much RegExp so please help me and let me understand the approach.
Thank You
I would use the following regex, it should be fast enough:
Pattern p = Pattern.compile("[^/]+\\.jpg");
Matcher m = p.matcher(str);
if (m.find()) {
String match = m.group();
System.out.println(match);
}
This will match the a full sequence of characters ending with .jpg not including /.
I think that the correct approach will be to check the correct legality of a file name.
Here is a list of not legal characters for Windows: "\\/:*?\"<>|"
for Mac /:
Linux/Unix /;
Here is more complex example assuming format will change , it is mostly designed for legal Window file name:
String s = "{'Im': {url:'\\/az\\/hprichbg\\/rb\\/?*<>WhiteTippedRose_ROW10477559674_1366x768.jpg', hash:'674'}";
Pattern p = Pattern.compile("[^\\/:*?\"<>|]+\\.jpg");
Matcher m = p.matcher(s);
if (m.find()) {
String match = m.group();
System.out.println(match);
}
This will still print
WhiteTippedRose_ROW10477559674_1366x768.jpg
Here you may find a demo
Assuming that the image is always placed after a / and does not contain any /, you can use the following:
String s = "{'Im': {url:'\\/az\\/hprichbg\\/rb\\/WhiteTippedRose_ROW10477559674_1366x768.jpg', hash:'674'}";
s = s.replaceAll(".*?([^/]*?\\.jpg).*", "$1");
System.out.println("s = " + s);
outputs:
s = WhiteTippedRose_ROW10477559674_1366x768.jpg
In substance:
.*? skip the beginning of the string until the next pattern is found
([^/]*?\\.jpg) a group like "xxx.jpg" where xxx does not contain any "/"
.* rest of the string
$1 returns the content of the group
If the String is always of this form, I would simply do:
int startIndex = s.indexOf("rb\\/") + 4;
int endIndex = s.indexOf('\'', startIndex);
String image = s.substring(startIndex, endIndex);

Categories