How can "STAR" not be treated as a quantifier in a regular expression? - java

There is no problem matching an IP address of the following form (for example):
255.3.3.6
with this regex (from: http://www.mkyong.com/regular-expressions/how-to-validate-ip-address-with-regular-expression/):
"^([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\." +
"([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\." +
"([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\." +
"([01]?\\d\\d?|2[0-4]\\d|25[0-5])$";
but I want an IP pattern that also handles addresses like the following:
255.*.3.100
OR
*.*.3.100
OR
*.*.*.*
(any position in the IP can be a star)
I use this pattern:
"^([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.|(\\*)\\." +
"([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.|(\\*)\\." +
"([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.|(\\*)\\." +
"([01]?\\d\\d?|2[0-4]\\d|25[0-5])|(\\*)\\.$";
but it does not work.
I think the star in my pattern is treated as a quantifier.
What should I do? Please help me.

The asterisk is simply one more alternative for each octet, so put it inside the octet group. Then compose the pattern without repeating that group four times:
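import java.util.regex.Matcher;
import java.util.regex.Pattern;
// One octet: a number 0-255 or a literal * (escaped, so it is not a quantifier)
String group = "(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5]|\\*)";
// Four octets separated by literal dots
String patstr = "^" + group + "(\\." + group + "){3}$";
Pattern pat = Pattern.compile( patstr );
Matcher mat = pat.matcher( args[0] );
System.out.println( mat.matches() );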
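A self-contained variant of the snippet above, with sample inputs hard-coded instead of read from args[0]. The class name WildcardIp and the helper isWildcardIp are made up for this sketch; the octet group is the same one used above.

```java
import java.util.regex.Pattern;

public class WildcardIp {
    // One octet: 0-255, or a literal asterisk. The backslash makes * an
    // ordinary character instead of a quantifier.
    private static final String OCTET = "(?:[01]?\\d\\d?|2[0-4]\\d|25[0-5]|\\*)";

    // Four octets separated by literal dots, compiled once and reused.
    private static final Pattern IP =
            Pattern.compile("^" + OCTET + "(?:\\." + OCTET + "){3}$");

    // Hypothetical helper: true if the whole string is an IPv4 address in
    // which any octet may be replaced by *.
    public static boolean isWildcardIp(String s) {
        return IP.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"255.3.3.6", "255.*.3.100", "*.*.3.100", "*.*.*.*",
                            "256.1.1.1", "1.2.3", "1.**.3.4"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWildcardIp(s));
        }
    }
}
```

The last three samples are rejected: 256 is out of range, 1.2.3 has only three octets, and ** is not a single octet.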
The grammar represented by OP's regular expression can be written as
IP ::= D P
     | A P D P
     | A P D P
     | A P D
     | A P
D ::= Number
P ::= '.'
A ::= '*'
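Note that the | operator has the lowest precedence of all regex operators, so it splits the whole pattern into the five alternatives above instead of choosing between a number and * within each octet. The star itself is not the problem: \\* is escaped and matched literally, not as a quantifier. Consequently a full match (as with matches()) fails for every valid address, and also for every address where a number is replaced by an asterisk.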
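To see the five alternatives in action, here is a small sketch that compiles the asker's pattern unchanged and tests a few strings with matches(). The class PrecedenceDemo and the method fullMatch are illustrative names only; the results noted in the comments follow from the grammar above.

```java
import java.util.regex.Pattern;

public class PrecedenceDemo {
    // The asker's pattern, copied verbatim. Because | binds more loosely than
    // concatenation, this is five separate alternatives, not four octets.
    static final Pattern ASKER = Pattern.compile(
            "^([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.|(\\*)\\." +
            "([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.|(\\*)\\." +
            "([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.|(\\*)\\." +
            "([01]?\\d\\d?|2[0-4]\\d|25[0-5])|(\\*)\\.$");

    // True if the entire input matches one of the five alternatives.
    public static boolean fullMatch(String input) {
        return ASKER.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Whole addresses are rejected ...
        System.out.println(fullMatch("255.3.3.6"));   // false
        System.out.println(fullMatch("255.*.3.100")); // false
        // ... but fragments that fit a single alternative are accepted.
        System.out.println(fullMatch("255."));        // true  (D P)
        System.out.println(fullMatch("*.3."));        // true  (A P D P)
        System.out.println(fullMatch("*."));          // true  (A P)
    }
}
```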

Related

RegEx for matching special patterns

I'm trying to match a string like this: 62.00|LQ+2*2,FP,MD*3 "Description"
where the two decimal digits are optional, and each user is identified by two characters and can be followed by
(\+[\d]+)?, (\*[\d]+)?, neither, or both, in either order,
like:
LQ*2+4 | LQ+4*2 | LQ*2 | LQ+8 | LQ
The description is also optional.
This is what I have tried:
Pattern.compile("^(?<number>[\\d]+(\\.[\\d]{2})?)\\|(?<users>([A-Z]{2}){1}(((\\+[\\d]+)?(\\*[\\d]+)?)|((\\+[\\d]+)?(\\*[\\d]+)?))((,[A-Z]{2})(((\\+[\\d]+)?(\\*[\\d]+)?)|((\\+[\\d]+)?(\\*[\\d]+)?)))*)(\\s\\\"(?<message>.+)\\\")?$");
I need to get all the users so I can split them on ',' and then process each one with further regexes, but I cannot get anything out of the match. The desired output from
62.00|LQ+2*2,FP,MD*3 "Description"
should be:
62.00
LQ+2*2,FP,MD*3
Description
Accepted inputs look like these:
62.00|LQ+2*2,FP,MD*3
30|LQ "Burgers"
35.15|LQ*2,FP+2*4,MD*3+4 "Potatoes"
35.15|LQ,FP,MD
This regex should precisely match the inputs you described:
^(\d+(?:\.\d{1,2})?)\|([a-zA-Z]{2}(?:(?:\+\d+(?:\*\d+)?)|(?:\*\d+(?:\+\d+)?))?(?:,[a-zA-Z]{2}(?:(?:\+\d+(?:\*\d+)?)|(?:\*\d+(?:\+\d+)?))?)*)(?: +(.+))?$
Group 1 will contain the number, with up to two optional decimal digits; group 2 will contain the comma-separated users as described in your post; and group 3 will contain the optional description, if present, including its surrounding quotes.
Explanation of regex:
^ - Start of string
(\d+(?:\.\d{1,2})?) - Matches the number, which may have one or two digits after the decimal point, and captures it in group 1
\| - Matches the literal | that follows the number
([a-zA-Z]{2}(?:(?:\+\d+(?:\*\d+)?)|(?:\*\d+(?:\+\d+)?))?(?:,[a-zA-Z]{2}(?:(?:\+\d+(?:\*\d+)?)|(?:\*\d+(?:\+\d+)?))?)*) - Matches two letters, optionally followed by either + and a number (optionally followed by * and a number) or * and a number (optionally followed by + and a number); then zero or more further entries of the same form, each preceded by a comma. The whole list is captured in group 2
(?: +(.+))? - Matches the optional description after one or more spaces and captures it (including its quotes) in group 3
$ - Marks end of input
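A sketch of how this regex might be used from Java to get the three desired outputs. It assembles the same pattern from a repeated USER fragment (equivalent to the long form above) and, as an assumption about what you want, strips the quotes that group 3 keeps. OrderLineParser and parse are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderLineParser {
    // One user: two letters, optionally followed by +n (then optionally *n)
    // or by *n (then optionally +n); the same fragment the answer repeats twice.
    private static final String USER =
            "[a-zA-Z]{2}(?:(?:\\+\\d+(?:\\*\\d+)?)|(?:\\*\\d+(?:\\+\\d+)?))?";

    // group 1 = number, group 2 = comma-separated users, group 3 = description
    private static final Pattern LINE = Pattern.compile(
            "^(\\d+(?:\\.\\d{1,2})?)\\|(" + USER + "(?:," + USER + ")*)(?: +(.+))?$");

    // Hypothetical helper: returns {number, users, description}, or null if the
    // line does not match; description is null when it is absent.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String description = m.group(3);
        // Group 3 keeps the surrounding quotes, so strip them here.
        if (description != null && description.length() >= 2
                && description.startsWith("\"") && description.endsWith("\"")) {
            description = description.substring(1, description.length() - 1);
        }
        return new String[] {m.group(1), m.group(2), description};
    }

    public static void main(String[] args) {
        String[] inputs = {
            "62.00|LQ+2*2,FP,MD*3 \"Description\"",
            "30|LQ \"Burgers\"",
            "35.15|LQ*2,FP+2*4,MD*3+4 \"Potatoes\"",
            "35.15|LQ,FP,MD"
        };
        for (String input : inputs) {
            String[] parts = parse(input);
            System.out.println("number: " + parts[0]);
            // The user list can now be split on commas for further processing.
            for (String user : parts[1].split(",")) {
                System.out.println("  user: " + user);
            }
            System.out.println("description: " + parts[2]);
        }
    }
}
```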
I'm guessing that we have several optional groups here, which might not be a problem. What I'm not sure about is the range of possible inputs and the desired outputs.
RegEx 1
If we just want to match everything, which is my guess, we might start with something like:
[0-9]+(\.[0-9]{2})?\|[A-Z]{2}[+*]?([0-9]+)?[+*]?([0-9]+)?,[A-Z]{2},[A-Z]{2}[+*]?([0-9]+)?(\s+"Description")?
Here we simply add a ? after every sub-expression we want to be optional, use character classes and quantifiers, and sweep from left to right to cover all inputs.
If we want to capture something, we simply wrap that part in a capturing group ().
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexOneTest {
    public static void main(String[] args) {
        final String regex = "[0-9]+(\\.[0-9]{2})?\\|[A-Z]{2}[+*]?([0-9]+)?[+*]?([0-9]+)?,[A-Z]{2},[A-Z]{2}[+*]?([0-9]+)?(\\s+\"Description\")?";
        final String string = "62.00|LQ+2*2,FP,MD*3 \"Description\"\n"
                + "62|LQ+2*2,FP,MD*3 \"Description\"\n"
                + "62|LQ+2*2,FP,MD*3\n"
                + "62|LQ*2,FP,MD*3\n"
                + "62|LQ+8,FP,MD*3\n"
                + "62|LQ,FP,MD";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        // Print every match together with each of its capturing groups
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
RegEx 2
If we want to output the three groups listed in the question:
([0-9]+(\.[0-9]{2})?)\|([A-Z]{2}[+*]?([0-9]+)?[+*]?([0-9]+)?,[A-Z]{2},[A-Z]{2}[+*]?([0-9]+)?)(\s+"Description")?
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTwoTest {
    public static void main(String[] args) {
        final String regex = "([0-9]+(\\.[0-9]{2})?)\\|([A-Z]{2}[+*]?([0-9]+)?[+*]?([0-9]+)?,[A-Z]{2},[A-Z]{2}[+*]?([0-9]+)?)(\\s+\"Description\")?";
        final String string = "62.00|LQ+2*2,FP,MD*3 \"Description\"\n"
                + "62|LQ+2*2,FP,MD*3 \"Description\"\n"
                + "62|LQ+2*2,FP,MD*3\n"
                + "62|LQ*2,FP,MD*3\n"
                + "62|LQ+8,FP,MD*3\n"
                + "62|LQ,FP,MD";
        // Java replacement strings refer to groups as $1, $3, $7 (not \1);
        // \n here is an ordinary newline escape in the Java string literal
        final String subst = "$1\n$3\n$7";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);
        System.out.println("Substitution result: " + result);
    }
}
RegEx 3
Based on the updated desired output, this might work:
([0-9]+(\.[0-9]{2})?)\|((?:[A-Z]{2}[+*]?([0-9]+)?[+*]?([0-9]+)?,?)(?:[A-Z]{2}[+*]?([0-9]+)?[*+]?([0-9]+)?,?[A-Z]{2}?[*+]?([0-9]+)?[+*]?([0-9]+)?)?)(\s+"(.+?)")?

Regex for circle and polygon string with decimal/integer values

I'm trying to create regex patterns to be used in Java for the following two strings:
CIRCLE ( (187.8562 ,-88.562 ) , 0.774 )
and
POLYGON ( (17.766 55.76676,77.97666 -32.866888,54.97799 54.2131,67.666777 24.9771,17.766 55.76676) )
Please note that
One or more whitespace characters may appear anywhere, except between letters and between the digits of a number. [UPDATED]
The words CIRCLE and POLYGON are fixed but not case-sensitive. [UPDATED]
For the second string the number of point sets is not fixed. Here I've given 5 point sets for simplicity.
Points are sets of decimal/integer numbers. [UPDATED]
A positive decimal number can have a + sign. [UPDATED]
A leading zero is not mandatory for a decimal number. [UPDATED]
For a polygon, at least 3 point sets are required, and the first and last point sets will be the same (enclosed polygon). [UPDATED]
Any help or suggestion will be appreciated.
I've tried this:
(CIRCLE)(\\s+)(\\()(\\s+)(\\()(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(,)(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(\\))(\\s+)(,)(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(\\))
Could you please provide working regex patterns for those two strings?
I suggest removing the spaces from your string before passing it to the regex.
Circle:
CIRCLE\(\(-?\d+\.\d+,-?\d+\.\d+\),[-]?\d+\.\d+\)
Polygon:
POLYGON\(\((-?\d+\.\d+\s+-?\d+\.\d+,)+-?\d+\.\d+\s+-?\d+\.\d+\)\)
Circle including spaces:
CIRCLE\s*\(\s*\(\s*-?\d+\.\d+\s*,\s*-?\d+\.\d+\s*\)\s*,\s*-?\d+\.\d+\s*\)
Polygon including spaces:
POLYGON\s*\(\s*\(\s*(-?\d+\.\d+\s+-?\d+\.\d+\s*,\s*)+\s*-?\d+\.\d+\s+-?\d+\.\d+\s*\)\s*\)
Circle including spaces updated:
/CIRCLE\s*\(\s*\(\s*[+-]?\d*\.\d+\s*,\s*[+-]?\d*\.\d+\s*\)\s*,\s*[+-]?\d*\.\d+\s*\)/i
Polygon including spaces updated:
/POLYGON\s*\(\s*\(\s*([+-]?\d*\.\d+)\s+([+-]?\d*\.\d+)\s*(,\s*[+-]?\d*\.\d+\s+[+-]?\d*\.\d+)+\s*,\s*\1\s+\2\s*\)\s*\)/i
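The /.../i notation above is not Java syntax. In Java, the i flag becomes Pattern.CASE_INSENSITIVE and every backslash is doubled inside a string literal. A sketch of the two updated patterns translated that way (ShapeRegex is just a container class for this example):

```java
import java.util.regex.Pattern;

public class ShapeRegex {
    // Updated circle pattern; the /i flag becomes Pattern.CASE_INSENSITIVE.
    public static final Pattern CIRCLE = Pattern.compile(
            "CIRCLE\\s*\\(\\s*\\(\\s*[+-]?\\d*\\.\\d+\\s*,\\s*[+-]?\\d*\\.\\d+\\s*\\)"
            + "\\s*,\\s*[+-]?\\d*\\.\\d+\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    // Updated polygon pattern; \1 and \2 require the closing point to repeat
    // the text of the first point exactly.
    public static final Pattern POLYGON = Pattern.compile(
            "POLYGON\\s*\\(\\s*\\(\\s*([+-]?\\d*\\.\\d+)\\s+([+-]?\\d*\\.\\d+)\\s*"
            + "(,\\s*[+-]?\\d*\\.\\d+\\s+[+-]?\\d*\\.\\d+)+\\s*,\\s*\\1\\s+\\2\\s*\\)\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        System.out.println(CIRCLE.matcher("CIRCLE ( (187.8562 ,-88.562 ) , 0.774 )").matches());
        System.out.println(POLYGON.matcher("POLYGON ( (17.766 55.76676,77.97666 -32.866888,"
                + "54.97799 54.2131,67.666777 24.9771,17.766 55.76676) )").matches());
    }
}
```

As written, both patterns require a decimal point in every number, so a plain integer such as 5 is not accepted.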
UPDATED ANSWER:
This matches the examples from the question and comments:
(CIRCLE|POLYGON)([( ]+)([+ \-\.]?(\d+)?([ \.]\d+[ ,)]+))+
Any help or suggestion will be appreciated.
My suggestion is to break it up into pieces. Just as you'd want to break up a large, complex function into smaller functions so that each part is easy to see and understand, you want to break up a large, complex regex pattern into smaller patterns for the same reason. For example:
// Nest inside your own class; requires import java.util.regex.Pattern;
private interface Patterns {
    String UNSIGNED_INTEGER = "(?:0|[1-9]\\d*+)";
    String DECIMAL_PART = "(?:[.]\\d++)";
    String UNSIGNED_NUMBER_WITH_INTEGER_PART =
        "(?:" + UNSIGNED_INTEGER + DECIMAL_PART + "?+)";
    String UNSIGNED_NUMBER =
        "(?:" + UNSIGNED_NUMBER_WITH_INTEGER_PART + "|" + DECIMAL_PART + ")";
    String NUMBER = "(?:[+-]?+" + UNSIGNED_NUMBER + ")";
    // Two numbers separated by whitespace: one point of a polygon
    String NUMBER_PAIR = "(?:" + NUMBER + "\\s++" + NUMBER + ")";
    String OPTIONAL_SPACE = "(?:\\s*+)";
    String LPAREN = "(?:" + OPTIONAL_SPACE + "[(]" + OPTIONAL_SPACE + ")";
    String RPAREN = "(?:" + OPTIONAL_SPACE + "[)]" + OPTIONAL_SPACE + ")";
    String COMMA = "(?:" + OPTIONAL_SPACE + "," + OPTIONAL_SPACE + ")";
    Pattern CIRCLE = Pattern.compile(
        OPTIONAL_SPACE + "CIRCLE" + OPTIONAL_SPACE + LPAREN +
            LPAREN +
                NUMBER + COMMA + NUMBER +
            RPAREN + COMMA +
            NUMBER +
        RPAREN + OPTIONAL_SPACE,
        Pattern.CASE_INSENSITIVE);
    Pattern POLYGON = Pattern.compile(
        OPTIONAL_SPACE + "POLYGON" + OPTIONAL_SPACE + LPAREN +
            LPAREN +
                NUMBER_PAIR + "(?:" + COMMA + NUMBER_PAIR + "){3,}+" +
            RPAREN +
        RPAREN + OPTIONAL_SPACE,
        Pattern.CASE_INSENSITIVE);
}
Notes:
The above is not tested. My goal was to show you how to do this maintainably, rather than to simply do it for you. (It should work as-is, though, unless I have typos or whatnot.)
Note the pervasive use of non-capture groups (?:...). This allows each subpattern to be a separate module; for example, something like COMMA + "+" is well-defined as meaning "one or more commas, plus optional spaces".
Also note the pervasive use of possessive quantifiers like ?+ and *+ and ++. It's easier to tell what is matched by a given occurrence of NUMBER when you know that NUMBER will never "stop short" before a trailing digit. (Imagine having a function whose behavior depended on the code that runs after it. That would be confusing, right? Well, the non-possessive quantifiers can change their meaning depending on what follows, which can have similarly confusing results for large, complex regexes.) This also has considerable performance benefits in the event of a near-match.
I made no attempt to detect the requirement that the first and last point sets be the same (enclosed polygon). Regexes are not suited to this, since a regex is a string-description language, and "same" in this case is not a string concept but a mathematical one. (It's easy to tell that 1 +0.3 is equivalent to +1.0 .30 if you use something like BigDecimal to store the actual values; but to try to express that using a regex would be pure folly.)
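If you do want that check, one way to follow the BigDecimal suggestion is to validate the syntax with a regex first and then compare the first and last points numerically. A minimal sketch; PolygonClosure, isClosed and its simplified NUMBER pattern are assumptions for illustration, and the input is assumed to have already matched the POLYGON pattern.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolygonClosure {
    // Simplified number pattern (an assumption): optional sign, then digits
    // with an optional fraction, or a bare fraction such as .30
    private static final Pattern NUMBER =
            Pattern.compile("[+-]?(?:\\d+(?:\\.\\d+)?|\\.\\d+)");

    // Hypothetical helper: true if the first and last points of an already
    // validated POLYGON string are numerically equal.
    public static boolean isClosed(String polygon) {
        List<BigDecimal> values = new ArrayList<>();
        Matcher m = NUMBER.matcher(polygon);
        while (m.find()) {
            values.add(new BigDecimal(m.group()));
        }
        int n = values.size();
        if (n < 4 || n % 2 != 0) {
            return false; // need at least two complete points
        }
        // compareTo ignores scale, so 1 and +1.0 compare as equal (equals would not).
        return values.get(0).compareTo(values.get(n - 2)) == 0
                && values.get(1).compareTo(values.get(n - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isClosed("POLYGON((1 +0.3, 2 2, 3 3, +1.0 .30))")); // true
        System.out.println(isClosed("POLYGON((1 1, 2 2, 3 3, 4 4))"));         // false
    }
}
```

Using compareTo rather than equals is the design choice that matters here: BigDecimal.equals treats 1 and 1.0 as different because their scales differ.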

Java pattern for [j-*]

Please help me with pattern matching. I want to build a pattern that matches words starting with j- or c- in a string such as the following:
[j-test] is a [c-test]'s name with [foo] and [bar]
The pattern needs to find [j-test] and [c-test] (brackets inclusive).
What I have tried so far:
String template = "[j-test] is a [c-test]'s name with [foo] and [bar]";
Pattern patt = Pattern.compile("\\[[*[j|c]\\-\\w\\-\\+\\d]+\\]");
Matcher m = patt.matcher(template);
while (m.find()) {
System.out.println(m.group());
}
And it gives this output:
[j-test]
[c-test]
[foo]
[bar]
which is wrong. Please help me; thanks for your time.
Inside a character class, you don't need alternation to match j or c. A character class by itself means "match any single character from the ones inside it", so [jc] alone matches either j or c. (In fact, | inside a character class is just a literal | character.)
Also, you don't need to spell out what follows j- or c-, since all you care about is that the word starts with j- or c-.
Simply use this pattern:
Pattern patt = Pattern.compile("\\[[jc]-[^\\]]*\\]");
To explain:
Pattern patt = Pattern.compile("(?x) " // Embedded flag for Pattern.COMMENTS
+ "\\[ " // Match starting `[`
+ " [jc] " // Match j or c
+ " - " // then a hyphen
+ " [^ " // A negated character class
+ " \\]" // Match any character except ]
+ " ]* " // 0 or more times
+ "\\] "); // till the closing ]
The (?x) flag makes the regex ignore whitespace, which often helps in writing readable regexes.
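A quick way to check that the compact and the free-spacing forms behave the same, sketched with a hypothetical findAll helper. The (?x) version here keeps the character class free of whitespace, so its meaning does not depend on how free-spacing mode treats spaces inside [...].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketTags {
    // Compact form: [ then j or c, a hyphen, anything except ], then ]
    public static final Pattern COMPACT = Pattern.compile("\\[[jc]-[^\\]]*\\]");

    // Same pattern with (?x), so the spaces between the pieces are ignored.
    public static final Pattern COMMENTED = Pattern.compile("(?x) \\[ [jc] - [^\\]]* \\]");

    // Hypothetical helper: collect every match of p in text, in order.
    public static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String template = "[j-test] is a [c-test]'s name with [foo] and [bar]";
        System.out.println(findAll(COMPACT, template));   // [[j-test], [c-test]]
        System.out.println(findAll(COMMENTED, template)); // [[j-test], [c-test]]
    }
}
```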

How to use multiple different patterns?

How can I check a string against multiple patterns instead of a single one? It works with one pattern, but when I try to combine several it doesn't work.
When I run the code with just one of them (time or price), I get the match from the string, but when I combine them I don't get any output.
Thanks for your help.
Here is my code:
String line = "This order was places for QT 30.00$ ! OK? and time is 2:45";
String pattern = "\\d+[.,]\\d+.[$]"+"\\d:\\d\\d";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
The "+" operator does not separate patterns - it concatenates strings.
What you can do is provide a pattern that accepts characters in between the two groups.
String pattern = "(\\d+[.,]\\d+.[$]).*(\\d:\\d\\d)";
The parentheses above are optional. If you include them, you can get the matched price and time as separate strings:
if (m.find( )) {
System.out.println("Found value: " + m.group(1) + " with time: " + m.group(2));
}
EDIT:
Just noticed your comment that you're looking for OR, not AND.
You can do that with an expression of the form X|Y:
String pattern = "\\d+[.,]\\d+.[$]|\\d:\\d\\d";
This will match either a price or a time, whichever occurs first. You can get the match with m.group(0).
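If you want every price and every time in the line rather than only the first one, call find() repeatedly; each call resumes searching after the previous match. A sketch (PriceOrTime and findAll are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceOrTime {
    // Alternation: a price such as 30.00$ OR a time such as 2:45
    static final Pattern PRICE_OR_TIME = Pattern.compile("\\d+[.,]\\d+.[$]|\\d:\\d\\d");

    // Hypothetical helper: every price or time in the line, in order.
    public static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = PRICE_OR_TIME.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "This order was places for QT 30.00$ ! OK? and time is 2:45";
        System.out.println(findAll(line)); // [30.00$, 2:45]
    }
}
```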

How do I find a group of words using Reg-ex?

Here is the code:
String Str ="Animals \n" +
"Dog \n" +
"Cat \n" +
"Fruits \n" +
"Apple \n" +
"Banana \n" +
"Watermelon \n" +
"Sports \n" +
"Soccer \n" +
"Volleyball \n";
Str has 3 categories (Animals, Fruits, Sports), each on its own line. Using a regular expression, how do I find the contents of Fruits, so that the output looks like this:
Apple
Banana
Watermelon
I would like an explanation to go with your answer as well, so that I have a better understanding of this problem.
Thanks. :)
Assuming that you want to extract the text between the word "Fruits" and the word "Sports", you could use a regular expression with a capturing group. If the string matches, you then extract the group that contains the text you want.
For example:
Pattern p = Pattern.compile("Fruits(.*?)Sports", Pattern.DOTALL);
// Fruits         : the literal string "Fruits"
// (.*?)          : capture everything in between (lazily, so it stops at the first "Sports")
// Sports         : the literal string "Sports"
// Pattern.DOTALL : tells the regex to treat newlines like normal characters
Alternatively, you can use a more advanced regular expression with a positive lookbehind and a positive lookahead. The regular expression still looks for the text between the words "Fruits" and "Sports", but those words themselves are not part of the match.
Pattern p = Pattern.compile("(?<=Fruits).*?(?=Sports)", Pattern.DOTALL);
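To turn the capturing-group version into the three-line output asked for, one option is to split group 1 on newlines and trim each line. A sketch, with FruitsSection and fruits as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FruitsSection {
    // Lazy capture of everything between the headings; DOTALL lets . cross newlines.
    static final Pattern FRUITS = Pattern.compile("Fruits(.*?)Sports", Pattern.DOTALL);

    // Hypothetical helper: the trimmed, non-empty lines between "Fruits" and
    // "Sports", or an empty list if there is no such section.
    public static List<String> fruits(String text) {
        List<String> items = new ArrayList<>();
        Matcher m = FRUITS.matcher(text);
        if (m.find()) {
            for (String line : m.group(1).split("\n")) {
                String item = line.trim();
                if (!item.isEmpty()) {
                    items.add(item);
                }
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String str = "Animals \n" + "Dog \n" + "Cat \n"
                + "Fruits \n" + "Apple \n" + "Banana \n" + "Watermelon \n"
                + "Sports \n" + "Soccer \n" + "Volleyball \n";
        for (String fruit : fruits(str)) {
            System.out.println(fruit);
        }
    }
}
```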
I would start by splitting the string into an array of lines (String[] lines = Str.split("\n");), then loop through the lines, adding elements to their proper categories as you go along and switching categories whenever you see a heading.
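A sketch of that splitting approach. It assumes the heading names are known in advance (the HEADINGS set is an assumption, since nothing in the string itself marks a line as a heading), and it uses Set.of, so it needs Java 9 or later. CategorySplitter and group are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CategorySplitter {
    // Assumption: the heading names are known in advance.
    static final Set<String> HEADINGS = Set.of("Animals", "Fruits", "Sports");

    // Walk the lines, starting a new category at each heading.
    public static Map<String, List<String>> group(String text) {
        Map<String, List<String>> categories = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (HEADINGS.contains(line)) {
                current = new ArrayList<>();
                categories.put(line, current);
            } else if (current != null) {
                current.add(line); // lines before the first heading are ignored
            }
        }
        return categories;
    }

    public static void main(String[] args) {
        String str = "Animals \nDog \nCat \nFruits \nApple \nBanana \nWatermelon \n"
                + "Sports \nSoccer \nVolleyball \n";
        System.out.println(group(str).get("Fruits")); // [Apple, Banana, Watermelon]
    }
}
```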
