My goal is to check whether one string equals any of several other strings, using only a regex, in Java 8.
I used the syntax below:
"^UK (Main Land)|German|Japan|Swiss|French|Italian$"
This works for German, Japan, Swiss and French, but validation fails for UK (Main Land) and Italian.
What do I have to change to make it work?
There are a couple of issues here.
The parentheses, as literal chars, must be escaped.
If you use Matcher.find(), you need the ^ and $ anchors to make sure the pattern matches the entire string (although \A and \z would be better), but you need to group the alternatives with either (...) or (?:...).
You do not need the group and anchors if you use String.matches() or Pattern.matches that ensure an entire string match.
I'd rather use
boolean result = text.matches("UK \\(Main Land\\)|German|Japan|Swiss|French|Italian");
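To make the difference between the two approaches concrete, here is a minimal sketch. The class and method names (CountryMatch, isKnownWithMatches, isKnownWithFind) are made up for illustration; only String.matches(), Pattern.compile() and Matcher.find() are real API.

```java
import java.util.regex.Pattern;

class CountryMatch {
    // The parentheses are escaped so they match literal '(' and ')'.
    private static final String ALTERNATIVES =
            "UK \\(Main Land\\)|German|Japan|Swiss|French|Italian";

    // For find(), the alternatives are grouped so the anchors apply to all of them.
    private static final Pattern ANCHORED =
            Pattern.compile("\\A(?:" + ALTERNATIVES + ")\\z");

    // matches() requires the whole string to match, so no anchors or group are needed.
    static boolean isKnownWithMatches(String text) {
        return text.matches(ALTERNATIVES);
    }

    // find() looks for a match anywhere, so the pattern itself must be anchored.
    static boolean isKnownWithFind(String text) {
        return ANCHORED.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"UK (Main Land)", "Italian", "Italians", "German"}) {
            System.out.println(s + " -> " + isKnownWithMatches(s) + " / " + isKnownWithFind(s));
        }
    }
}
```

Both helpers should agree on every input; the second one is only needed if you already work with a Matcher and find().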
Related
I used regex101 to make my expression, and it looks like this using their symbols
\d+ [+-\/*] \d*
Basically I want a user to enter like 123 + 123 but the entire statement is one string with exactly one space after the first number and one space after the operator
The above expression works there, but it doesn't carry over unchanged into Java.
I thought these symbols were universal, but I guess not. Any ideas how to convert this to the proper syntax?
Regular expressions are not universal. In general, no two regular expression systems are the same.
Java does not have regular expressions. Some Java classes support regular expressions. The Pattern class defines the regular expressions that are used by some Java classes including Matcher, which seems likely to be the class you are using.
As already identified in the comments, \ is the escape-the-next-character character in Java. If you want to represent \ in a String, you must use \\. For example, \d in a regular expression must be written \\d in a Java String.
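As a rough illustration of the escaping rule, the sketch below writes the expression as a Java string literal. Two things here are my assumptions rather than part of the question: the character class is rewritten as [-+/*], because in the original [+-\/*] the sequence +-\/ is read as a range that also admits ',' and '.', and the second \d* is changed to \d+ so that a second number is required. The class name ArithmeticInput and the method isValid are invented for the example.

```java
import java.util.regex.Pattern;

class ArithmeticInput {
    // As typed in a regex tester:   \d+ [-+/*] \d+
    // As a Java string literal, every backslash is doubled.
    private static final Pattern EXPR = Pattern.compile("\\d+ [-+/*] \\d+");

    // matches() requires the entire input to fit the pattern.
    static boolean isValid(String input) {
        return EXPR.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123 + 123", "123 +123", "12 * 3", "123 , 123"}) {
            System.out.println("'" + s + "' valid? " + isValid(s));
        }
    }
}
```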
You can simply use groups () and design a RegEx as you wish. This RegEx might be one way to do so:
((\d+\s)(\+|\-)(\s\d+))
It has four groups, and the outermost one (group 1, or $1 in a replacement string) holds the entire matched input.
You will also need to escape the \ characters as the host language requires.
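For what it is worth, this is how the four groups could be read back in Java. It is only a sketch: ExprGroups and groups() are hypothetical names, and note that this particular pattern accepts only + and - as operators.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExprGroups {
    // Same pattern as above, with each backslash doubled for the Java string literal.
    private static final Pattern EXPR =
            Pattern.compile("((\\d+\\s)(\\+|\\-)(\\s\\d+))");

    // Returns groups 1..4, or null if the whole input does not match.
    static String[] groups(String input) {
        Matcher m = EXPR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("123 + 456")));
    }
}
```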
I work on a product that provides a Java API to extend it.
The API provides a function which
takes a Perl regular expression and
returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note
I know the regex systems are similar, but I can't get a Java regex working either.
You may use
^(?!.*\.(x[ms]l|cfg)$).+
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed with xml, xsl or cfg ((x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than linebreak chars. Might be omitted if the entire string match is not required (in some tools it is required though).
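A small Java sketch of the same idea, in case you end up filtering the returned list yourself. The names ExtensionFilter and keep() are invented for illustration and are not part of the product API mentioned in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionFilter {
    // Negative lookahead: reject names that end in .xml, .xsl or .cfg.
    private static final Pattern KEEP =
            Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    // Keeps only the names for which the whole string matches the pattern.
    static List<String> keep(List<String> names) {
        return names.stream()
                .filter(n -> KEEP.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keep(Arrays.asList("a.xml", "b.txt", "c.cfg", "d.xsl", "e.xmlx")));
    }
}
```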
You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted types
/(?<!\.(?:xml|xsl|cfg))\z/
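The lookbehind variant also works in Java, since all three alternatives have a fixed length. A hedged sketch follows; the Perl delimiters / / are dropped, and LookbehindFilter and isWanted() are made-up names.

```java
import java.util.regex.Pattern;

class LookbehindFilter {
    // Matches the end of the string only when it is NOT preceded by .xml, .xsl or .cfg.
    private static final Pattern NOT_UNWANTED =
            Pattern.compile("(?<!\\.(?:xml|xsl|cfg))\\z");

    // find() is used because the pattern matches an empty position, not the whole name.
    static boolean isWanted(String name) {
        return NOT_UNWANTED.matcher(name).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"report.txt", "conf.cfg", "style.xsl", "xml"}) {
            System.out.println(s + " wanted? " + isWanted(s));
        }
    }
}
```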
I want to match something like this
$(string).not(string).not(string)
The not(string) can repeat zero or more times, after $(string).
Note that the string can be whatever things, except nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)). My thinking was that *? would non-greedily match any number of not(string) sequences, and that the lookahead would stop the match at whatever is not a not(string), so that I could extract only the part I want.
However, when I tested on the input like
$(string).not(string).not(string).append(string)
group(0) returns the whole string, whereas I only need $(string).not(string).not(string).
Obviously I am still missing or misusing something. Any suggestions?
Try this one (escaped for java):
(\\$\\(string\\)(?:(?:\\.not\\(.*?\\))+))
It should capture just the part that you are after. You can test it out (unescaped for java though)
If we assume that parentheses are not nested, you can write something like this:
String p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
There is no need to add a lookahead, since the non-capturing group has a greedy quantifier (so the group is repeated as many times as possible).
If what you called string in your question may be a quoted string with parentheses inside, as in Pshemo's example $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore characters inside quoted parts.
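Here is a short sketch that runs both variants with find(), assuming non-nested parentheses as stated above. The names NotChain, extract() and extractQuoted() are mine, not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotChain {
    // Plain variant: an argument is anything up to the next ')'.
    private static final Pattern PLAIN =
            Pattern.compile("\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*");

    // Quoted variant: an argument may also be a double-quoted string containing ')'.
    private static final String ARG = "(?:\\s*\"[^\"]*\"\\s*|[^)]*)";
    private static final Pattern QUOTED =
            Pattern.compile("\\$\\(" + ARG + "\\)(?:\\.not\\(" + ARG + "\\))*");

    // Returns the first match of the given pattern, or null if there is none.
    private static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    static String extract(String input) {
        return first(PLAIN, input);
    }

    static String extractQuoted(String input) {
        return first(QUOTED, input);
    }

    public static void main(String[] args) {
        System.out.println(extract("$(string).not(string).not(string).append(string)"));
        System.out.println(extractQuoted("$(string).not(\".not(foo)\").not(string)"));
    }
}
```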
According to the Matcher documentation, "group zero denotes the entire pattern". Use group(1).
(\$\([\w ]+\))(\.not\([\w ]+\))*
This will also work. It gives you two groups: one containing the part with the $ sign, and another for the ".not" parts.
Please note: you might have to add escape characters for Java.
when I tried this regex
\"(\S\S+)\"(?!;c)
on this string "MM:";d it comes as matched as I wanted
and on this string "MM:";c it comes as not matched as desired.
But when I add a second group, by moving the semicolon inside that group and making it optional using |
\"(\S\S+)\"(;|)(?!c)
for this string "MM:";c it comes as matched, when I expected it not to match, as before.
I tried this on Java and then on Javascript using Regex tool debuggex:
This link contains a snippet of the above
What am I doing wrong?
Note that the | is there so that the semicolon is not required. Also, the c in the examples is just a stand-in for a word, which is why I am using a negative lookahead.
After following Holger's response of using possessive quantifiers,
\"(\S\S+)\";?+(?!c)
it worked, here is a link to it on RegexPlanet
I believe that the regex engine will do whatever it can to find a match. Since your expression says the semicolon is optional, the engine finds that it can still match the entire expression: if the semicolon is not consumed by the second group, the next character is ';' rather than 'c', so the negative lookahead succeeds. This has to do with the backtracking way that regex works: it keeps trying alternatives until it finds a match or runs out of options.
In other words, the process goes like this:
\"(\S\S+)\" - matches "MM:"
(;|) - try the semicolon? Matched
(?!c) - oops, the next character is c, so the negative lookahead fails. No match. Go back
(;|) - try the empty alternative. We still have ';c' left to match
(?!c) - the next character is ';', not 'c', so the negative lookahead succeeds. We have a match
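You can observe this backtracking from Java by looking at what the second group captured. The sketch below uses invented names (OptionalSemicolon, semicolonGroup); the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalSemicolon {
    // Pattern from the question: optional semicolon in group 2, then a negative lookahead.
    private static final Pattern P =
            Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Returns what group 2 captured, or null if the pattern does not match at all.
    static String semicolonGroup(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // For "MM:";d the semicolon is captured; for "MM:";c the engine
        // backtracks and matches with an empty group 2 instead.
        System.out.println("[" + semicolonGroup("\"MM:\";d") + "]");
        System.out.println("[" + semicolonGroup("\"MM:\";c") + "]");
    }
}
```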
An update (based on your comment). The following pattern may work better, since it requires both lookaheads to succeed at the same position:
\"(\S\S+)\"(;|)(?!c)(?!;c)
The problem is that you don’t want to make the semicolon optional in the sense of regular expression. An optional semicolon implies that the matcher is allowed to try both, matching with or without it. So even if the semicolon is there the matcher can ignore it creating an empty match for the group letting the lookahead succeed.
But you want to consume the semicolon if it’s there, so it is not allowed to be used to satisfy the negative look-ahead. With Java’s regex engine that’s pretty easy: use ;?+
This is called a “possessive quantifier”. Like with the ? the semicolon doesn’t need to be there but if it’s there it must match and cannot be ignored. So the regex engine has no alternatives any more.
So the entire pattern looks like \"(\S\S+)\";?+(?!c) or \"(\S\S+)\"(;?+)(?!c) if you need the semicolon in a group.
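A quick way to check the possessive version in Java, using the grouped form of the pattern. The names PossessiveSemicolon and isAccepted are hypothetical.

```java
import java.util.regex.Pattern;

class PossessiveSemicolon {
    // ;?+ is possessive: if a semicolon is present it is consumed and never given back.
    private static final Pattern P =
            Pattern.compile("\"(\\S\\S+)\"(;?+)(?!c)");

    static boolean isAccepted(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"\"MM:\";d", "\"MM:\";c", "\"MM:\"", "\"MM:\"c"}) {
            System.out.println(s + " accepted? " + isAccepted(s));
        }
    }
}
```

Because the engine cannot retry the quantifier without the semicolon, the lookahead is always evaluated right after it.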
I am trying to convert the following regular expression from Java to .NET:
(?i:(?:([^\d,]+?)\W+\b((?:CA|SD|SC|CT|DC)\b)?\W*)?(\d{5}(?:[- ]\d{3,4})?)?)
When I run a match against the following string:
Mountain View, CA 94043
using a Pattern and Matcher object in Java, it populates four groups with the values:
"Mountain View, CA 94043"
"Mountain View"
"CA"
"94043"
However, in .NET, there are two matches. The first match populates the four groups with these values:
"Mountain " (there is a space at the end of group 0)
"Mountain"
""
""
The second match populates the four groups with these values:
"View, CA 94043"
"View"
"CA"
"94043"
I also tried the expression using RegexBuddy using both the Java and .NET modes and in RegexBuddy, both modes work like the .NET version.
Thanks everyone!
Add ^ to the beginning of your pattern, and add $ to the end of it to match the beginning and end of the string, respectively. This will make the pattern match the entire string and produces your desired result:
// Requires: using System; using System.Text.RegularExpressions;
string input = "Mountain View, CA 94043";
string pattern = @"^(?i:(?:([^\d,]+?)\W+\b((?:CA|SD|SC|CT|DC)\b)?\W*)?(\d{5}(?:[- ]\d{3,4})?)?)$";
Match m = Regex.Match(input, pattern);
foreach (Group g in m.Groups)
{
    Console.WriteLine(g.Value);
}
Since you didn't restrict the pattern to be an exact match, as above, it found partial matches, especially since some of your groups are completely optional. Thus, it considers "Mountain" a match, then considers "View, CA 94043" as the next match.
EDIT: as requested in the comments, I'll try to point out the differences between the Java and .NET regex approaches.
In Java the matches() method returns true/false if the pattern matches the whole string. Thus it doesn't require the pattern to be modified with boundary anchors or atomic zero-width assertions. In .NET there is no such equivalent method that will do this for you. Instead, you need to explicitly add either the ^ and $ metacharacters, to match the start and end of the string or line, respectively, or the \A and \z metacharacters to do the same for the entire string. For a reference of .NET metacharacters check out this MSDN page. I'm not sure which set of anchors Java's matches() uses, although this article suggests \A and \z are used.
Java's matches() returns a boolean, and .NET provides the Regex.IsMatch() method to do the same thing (apart from the already discussed difference of matching the entire string). The .NET equivalent of Java's find() method is the Regex.Match() method, which you can use in a loop to continue to find the next match. In addition, .NET offers a Regex.Matches() method that will do this for you, and returns a collection of successful matches. Depending on your needs and the input this might be fine, but for added flexibility you may want to check Match.Success in a loop and use the Match.NextMatch() method to keep looking for matches (an example of this is available in the NextMatch link).
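For comparison, here is a sketch of the Java side using the unanchored pattern from the question. AddressMatch and parseWhole() are names I made up; the find() loop is included only to show the API shape, and its output is not listed here.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressMatch {
    // Pattern from the question, unanchored, with backslashes doubled for Java.
    private static final Pattern ADDRESS = Pattern.compile(
            "(?i:(?:([^\\d,]+?)\\W+\\b((?:CA|SD|SC|CT|DC)\\b)?\\W*)?(\\d{5}(?:[- ]\\d{3,4})?)?)");

    // matches() succeeds only if the entire input matches, so no ^ or $ is needed.
    static String[] parseWhole(String input) {
        Matcher m = ADDRESS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String input = "Mountain View, CA 94043";
        System.out.println(Arrays.toString(parseWhole(input)));

        // find() reports successive (possibly partial or empty) matches,
        // which is closer to what Regex.Match()/NextMatch() do in .NET.
        Matcher m = ADDRESS.matcher(input);
        while (m.find()) {
            System.out.println("[" + m.group() + "]");
        }
    }
}
```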