Java Regex not working as desired - java

I created this regex, but somehow it only matches the first part
of the input, not the last part. What is going on?
Here's the code:
String m = "-2√3254i/18.5";
String regex = "-?\\d+(\\.\\d*)?\\√\\d+(\\.\\d*)?i\\/\\d+(\\.\\d*)?";
I have tried many different ways, such as:
-?\\d+(\\.\\d*)?\\√\\d+(\\.\\d*)?i+\\/+\\d+(\\.\\d*)?
-?\\d+(\\.\\d*)?\\√\\d+(\\.\\d*)?i/\\d+(\\.\\d*)?
-?\\d+(\\.\\d*)?\\√\\d+(\\.\\d*)?i\\(\\/\\d+(\\.\\d*))?
None of them work.
The output is always
-2√3254
Any suggestions?
Thank you.

Okay so my regex is actually composed of many regexes:
String regex = "a regex | another regex | -?\\d+(\\.\\d*)?\\√\\d+(\\.\\d*)?i"
+ "|another regex | -?\\d+(\\.\\d*)?\\√\\d+(\\.\\d*)?i\\/\\d+(\\.\\d*)?";
The problem happens between the two alternatives shown. The first of them is picked up by the matcher first, but I really intended the second one to match my String m = "-2√32454i/18.5"
It seems the matcher stops at the first alternative that matches: alternatives separated by | are tried from left to right, and the first one that succeeds wins, even when a later one would match more of the input.
All I had to do was rearrange the order of my regexes which make up my regex.
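The effect of reordering can be reproduced with a small sketch. The class name, the constants and the firstMatch helper below are made up for illustration, and the two alternatives are simplified versions of the ones above (no placeholder regexes, no unnecessary escapes). The square-root sign is written as the escape \u221A so the source stays ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // U+221A is the square-root sign used in the question.
    static final String SHORT_FORM = "-?\\d+(\\.\\d*)?\u221A\\d+(\\.\\d*)?i";
    static final String LONG_FORM = SHORT_FORM + "/\\d+(\\.\\d*)?";

    // Hypothetical helper: returns the first substring matched by regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "-2\u221A3254i/18.5";
        // Short alternative first: it succeeds, so the longer one is never tried.
        System.out.println(firstMatch(SHORT_FORM + "|" + LONG_FORM, input));
        // Longer alternative first: the whole input is matched.
        System.out.println(firstMatch(LONG_FORM + "|" + SHORT_FORM, input));
    }
}
```

The first call returns only the part up to and including i, the second returns the complete input, which is why putting the more specific alternative first fixes the problem.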

How to remove the # in a string using Pattern in java

I need to remove a part of the string which starts with #.
My sample code works for one string and fails for another.
Failed one: Not able to remove #news4buffalo:
String regex = "\\#\\w+ || #\\w*";
String rawContent = "RT #news4buffalo: Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
    rawContent = rawContent.replaceAll(regex, "");
}
Success one:
String regex = "\\#\\w+ || #\\w*";
String rawContent = "#ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
    rawContent = rawContent.replaceAll(regex, "");
}
Output:
couldn't agree more. Good crowd last night. #LetsGoFish
From your question it looks like this regex can work for you:
rawContent = rawContent.replaceAll("#\\S*", "");
You can try in this way as well.
String s = "#ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
System.out.println(s.replaceAll("#[^\\s]*\\s+", ""));
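// [^\\s]* consumes everything up to the next whitespace; \\s+ removes the whitespace that follows as well.
// Because \\s+ is required, a tag at the very end of the string (like #LetsGoFish here) is left in place.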
The regex is only considering word characters whereas your input String contains a colon :. You can solve this by replacing \\w with \\S (any non-whitespace character) in your regex. Also there is no need for two patterns.
String regex = "#\\S*";
You don't need to escape # so don't add \ before it like "\\#" (it confuses people).
Don't use a Matcher to check whether the string contains a part that should be replaced and then call replaceAll, because that scans the string a second time. Just call replaceAll directly; if there is nothing to replace, it leaves the string unchanged. Also, String.replaceAll compiles the regex on every call, so use replaceAll on a Matcher obtained from a precompiled Pattern to avoid recompiling it.
A regex of the form foo||bar doesn't look right. Regex uses a single pipe | to represent OR, so such a regex means foo OR the empty string OR bar. The empty string is special (every string contains it at the start, at the end, and between every pair of characters), which can cause surprises: "foo".replaceAll("|foo", "x") returns xfxoxox rather than replacing the word foo, because the empty alternative is tried first and succeeds at every position, so f is never used as the first character of a foo match.
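A short sketch of what the empty alternative does in practice. The class name and the shortened sample strings are made up for illustration; ASKER_REGEX is the regex from the question, unchanged.

```java
class EmptyAlternative {
    // The regex from the question: "\\#\\w+ " OR the empty string OR " #\\w*".
    static final String ASKER_REGEX = "\\#\\w+ || #\\w*";

    public static void main(String[] args) {
        // The empty alternative matches at every position, so "foo" is never matched.
        System.out.println("foo".replaceAll("|foo", "x"));

        // "#news4buffalo" is followed by ':' rather than a space, so the first
        // alternative fails and the empty one takes over: nothing is removed.
        System.out.println("RT #news4buffalo: Police".replaceAll(ASKER_REGEX, ""));

        // "#ZaslowShow " ends with a space and is removed; the tag at the end is not.
        System.out.println("#ZaslowShow couldn't agree. #LetsGoFish".replaceAll(ASKER_REGEX, ""));
    }
}
```

The first line printed is xfxoxox, the second is the unchanged input, and the third still contains #LetsGoFish, which matches the behaviour reported in the question.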
Anyway, it seems that you would like to accept any #xxxx word, so consider something like "#\\w+" if you want to make sure that there is at least one character after #.
You can also add the condition that # must be the first character of a word (in case you don't want to remove the part after # in e-mail addresses). To do this, use a look-behind like (?<=\\s|^)#, which checks that # is preceded by whitespace or placed at the start of the string.
You can also remove the space after the word you wanted to remove (if there is any).
So you can try with
String regex = "(?<=\\s|^)#\\w*\\s?";
which for data like
RT #news4buffalo: Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…
will return
RT : Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…
But if you would also like to remove characters that \\w does not cover, such as :, you can simply use \\S, which matches any non-whitespace character, so your regex can look like
String regex = "(?<=\\s|^)#\\S*\\s?";
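Putting the advice from this answer together, a minimal sketch could look like the following. The class name, the stripHashtags helper and the shortened sample strings are made up for illustration; the regex is the one shown above.

```java
import java.util.regex.Pattern;

class HashtagRemover {
    // Compiled once and reused, so replaceAll does not recompile the pattern.
    private static final Pattern HASHTAG = Pattern.compile("(?<=\\s|^)#\\S*\\s?");

    // Hypothetical helper: removes every #tag that starts a word,
    // together with one following whitespace character if present.
    static String stripHashtags(String text) {
        return HASHTAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // prints: RT Police say a shooter fired
        System.out.println(stripHashtags("RT #news4buffalo: Police say a shooter fired"));
        // The '#' inside user#example.com is not preceded by whitespace, so it stays.
        // prints: mail user#example.com or see today
        System.out.println(stripHashtags("mail user#example.com or see #news today"));
    }
}
```

No find() check is needed beforehand: when nothing matches, replaceAll returns the input unchanged.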

Java // No match with RegExp and square brackets

I have a string like
Berlin -> Munich [label="590"]
and now I'm looking for a regular expression in Java that checks whether a given line (like the one above) is valid.
Currently, my RegExp looks like "\\w\\s*->\\s*\\w\\s*\\[label=\"\\d\"\\]"
However, it doesn't work. I've found that \\w\\s*->\\s*\\w\\s* still works, but as soon as I add \\[ it can't find the occurrence (\\w\\s*->\\s*\\w\\s*\\[).
I also found that it works when '->' is removed (\\w\\s*\\s*\\w\\s*\\[).
Is the arrow the problem? I can hardly imagine that.
I really need some help on this.
Thank you in advance
This is the correct regular expression:
"\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]"
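The partial results you report are consistent with using find(), which looks for a matching substring: \\w\\s*->\\s*\\w\\s* matches "n -> M", but once \\[ is appended, the single \\w after the arrow would have to be followed directly by whitespace or [, and "M" is followed by "unich". Without the arrow, \\w\\s*\\s*\\w\\s*\\[ matches "ch [". The + quantifiers in the expression above remove the problem.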
Also, if you are really into German city names, you might have to consider names like Castrop-Rauxel (which some wit has called the Latin name of Wanne-Eickel ;-) )
Try this
String message = "Berlin -> Munich [label=\"590\"]";
Pattern p = Pattern.compile("\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]");
Matcher matcher = p.matcher(message);
while (matcher.find()) {
    System.out.println(matcher.group());
}
You need to match more than one character: \\w and \\d each match exactly one character, so add the + quantifier to match whole words and numbers.
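Since the question asks whether a whole line is valid, matches() may be a better fit than find(), because it only succeeds when the entire input fits the pattern. A minimal sketch, assuming that is the intent; the class name and the isValidEdge helper are made up for illustration:

```java
import java.util.regex.Pattern;

class EdgeLineCheck {
    // \w+ and \d+ accept one or more characters; a bare \w or \d accepts exactly one.
    private static final Pattern EDGE =
            Pattern.compile("\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]");

    // Hypothetical helper: true only if the whole line has the expected shape.
    static boolean isValidEdge(String line) {
        return EDGE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEdge("Berlin -> Munich [label=\"590\"]"));   // true
        System.out.println(isValidEdge("Berlin -> Munich [label=\"\"]"));      // false: empty label
        System.out.println(isValidEdge("x Berlin -> Munich [label=\"590\"]")); // false: extra token
    }
}
```

With find() instead of matches(), the third line would also be accepted, because "Berlin -> Munich [label="590"]" is a matching substring of it.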

Java regular expression - Search string by group

Please could someone explain this for me:
We have a regular expression which we use to check if a string matches a specific sequence. The regular expression is shown below:
JPRN(JAPICCTI\d{6})|(JAPICCTI\d{6})
I want to try and understand what this code is trying to achieve:
Pattern matcher = Pattern.compile("JPRN(JAPICCTI\\d{6})|(JAPICCTI\\d{6})");
Matcher m = matcher.matcher("JAPICCTI132323");
if (m.find()) {
    Matcher m2 = matcher.matcher(m.group());
    if (m2.find()) {
        return m2.replaceAll("$1");
    }
}
The string it tries to check (i.e. JAPICCTI132323) does match the regular expression.
However, I don't understand why the matching is done twice, i.e. once on the string and again on the "group". What would be the reason for doing this?
Also, what is the purpose of the $1 string?
This is failing because m2.replaceAll("$1") returns an empty string, but I was expecting it to return JAPICCTI132323. Since I don't understand what the code is doing, I am struggling to understand why the result is an empty string.
Thanks in advance.
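The | symbol indicates alternation, which means "try the left alternative first; if it does not match, try the right one".
$1 in a replacement string stands for the text captured by the first group. Here the input has no JPRN prefix, so the second alternative matches and group 1 captures nothing, which is why replaceAll("$1") produces an empty string.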
If you have a number of capture groups: (one\d+)(two\w+\d)(three.*?)
Then you could use $1, $2 and $3 to represent the matched strings.
You can also name a capture group, like (?<first>regexpattern) or (?<phone>\d{2}\s\d{4}). Java supports this from version 7 onward; a group name may contain only letters and digits and must start with a letter, and it is referenced in a replacement string as ${name}.
You might have to do some testing, but you should be able to specify $1$2 as the replacement: a group that did not take part in the match adds nothing, while the other one adds the matched text.
Because the two groups sit in different alternatives, only one of them can capture in any given match, so the replacement never contains both strings.
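To see how $1 and $1$2 behave with the pattern from the question, here is a minimal sketch. The class name and the replaceWith helper are made up for illustration, and the input with the JPRN prefix is an assumed example of the other form the pattern accepts.

```java
import java.util.regex.Pattern;

class GroupReplacement {
    private static final Pattern ID =
            Pattern.compile("JPRN(JAPICCTI\\d{6})|(JAPICCTI\\d{6})");

    // Hypothetical helper: replaces every match in input with the given replacement string.
    static String replaceWith(String input, String replacement) {
        return ID.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Second alternative matches, group 1 is unset: prints []
        System.out.println("[" + replaceWith("JAPICCTI132323", "$1") + "]");
        // Group 2 holds the match: prints JAPICCTI132323
        System.out.println(replaceWith("JAPICCTI132323", "$1$2"));
        // First alternative matches, group 1 holds the ID: prints JAPICCTI132323
        System.out.println(replaceWith("JPRNJAPICCTI132323", "$1$2"));
    }
}
```

This also suggests that the second matcher in the original code is not needed: a single replaceAll on the input gives the same result.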

Match characters via regex in a string multiple times

I'm trying to replace all occurrences in a string with a regex using a Pattern object, but it only replaces the odd occurrences:
final Pattern p = Pattern.compile("(^|\\W|\\\\N)(recursive)(\\W|$)", Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("i-i-i").replaceAll("$1I$3"));
This returns:
I-i-I
But I also need to match the i in the middle, and somehow it doesn't catch that. I also tried a simplified regex (^|-)(I)($|-) on i-i-i-i-i-i, which returned I-i-I-i-I-i.
I guess it is because the odd dashes (at 4x+1) were already matched, so they can't be matched a second time for the even i's. Is it possible to allow that?
It seems that your problem is that you are trying to use the same - character in more than one match. In that case you should probably use a look-around. For example, you can change your
(^|-)(I)($|-)
pattern to
(^|-)(I)(?=($|-))
and use $1I as the replacement. This way the regex only checks that I is followed by $ or -, without including it in the match, so
final Pattern p = Pattern.compile("(^|-)(I)(?=($|-))",
Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("i-i-i-i-i-i").replaceAll("$1I"));
prints
I-I-I-I-I-I
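For comparison, the consuming and the look-ahead versions can be run side by side. This is only a sketch: the class name and the capitalize helper are made up for illustration, and the third pattern, which also replaces the leading group with a look-behind, is a variation that does not appear in the answer above.

```java
import java.util.regex.Pattern;

class OverlapDemo {
    // Hypothetical helper: case-insensitive replaceAll with the given regex.
    static String capitalize(String regex, String replacement, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .replaceAll(replacement);
    }

    public static void main(String[] args) {
        // The trailing dash is consumed, so the next match cannot start with it: I-i-I
        System.out.println(capitalize("(^|-)(I)($|-)", "$1I$3", "i-i-i"));
        // The look-ahead only inspects the trailing dash: I-I-I
        System.out.println(capitalize("(^|-)(I)(?=($|-))", "$1I", "i-i-i"));
        // Look-behind and look-ahead, so only the letter itself is replaced: I-I-I
        System.out.println(capitalize("(?<=^|-)I(?=$|-)", "I", "i-i-i"));
    }
}
```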

Need regex to match the given string

I need a regex to match a particular string, say 1.4.5, in the string below. My string will look like
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which is giving [c1.4.5k] as an output. But I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
If I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, you want a word character before the number and no digit after it. Is that correct?
For that you can use lookbehind and lookahead assertions. These assertions only check their condition without consuming any characters, so what they check is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because of that, you can remove the capturing group. You are also repeating yourself in the pattern, you can simplify that, too:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
Would be my pattern for that. (The ?: is a non capturing group)
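Used from Java, that pattern could be applied as in the following sketch. The class name, the firstVersion helper and the second sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionExtractor {
    // Lookbehind: a word character must precede the number.
    // Lookahead: no further digit may follow it.
    private static final Pattern VERSION =
            Pattern.compile("(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)");

    // Hypothetical helper: returns the first d.d.d sequence found, or null if there is none.
    static String firstVersion(String text) {
        Matcher m = VERSION.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstVersion("absdfsdfsdfc1.4.5kdecsdfsdff")); // 1.4.5
        System.out.println(firstVersion("c1.4.56k"));                     // null
    }
}
```

Because the surrounding characters are only asserted and never consumed, m.group() returns 1.4.5 without the c and k that the original pattern included.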
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Do you need to match x numbers, separated by x-1 dots in between?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Where c is the character that would be immediately before the 1.4.5 and k is the character immediately after 1.4.5. You can replace c and k with any regular expression that would suit your purposes
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str = "absdfsdfsdfc**1.4.5**kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
String regex = "[0-9]{1}\\.[0-9]{1}\\.[0-9]{1}";
Matcher matcher = Pattern.compile(regex).matcher(str);
if (matcher.find()) {
    String year = matcher.group(0);
    System.out.println(year);
} else {
    System.out.println("no match found");
}
