Period in regex using java

I'm trying to find valid mail IDs in a given string using regular expressions. This is my code:
Pattern pat3 = Pattern.compile("[(a-z)+][(a-z\\d)]+{3,}\\#[(a-z)+]\\.[(a-z)+]");
Matcher mat3 = pat3.matcher("dasdsa#2 #ada. ss2#dad.2om p2# 2#2.2 fad2#yahoo.com 22#yahoo.com fad#yahoo.com");
System.out.println(mat3.pattern() + " ");
while(mat3.find()){
System.out.println("Position: " + mat3.start() + " ");
}
The problem is that nothing is printed. What I expect it to print is: 39, 67.
Can someone explain why \\. doesn't work? Before I added \\., my regex was working fine up to that point.

Change your pattern to the following:
[a-z]+[a-z\\d]+{3,}\\#[a-z]+\\.[a-z]+
So the code will be:
Pattern pat3 = Pattern.compile("[a-z]+[a-z\\d]+{3,}\\#[a-z]+\\.[a-z]+");
// Your Code
while(mat3.find()){
System.out.println("Position: " + mat3.start() + " --- Match: " + mat3.group());
}
This gives the following result:
Pattern :: [a-z]+[a-z\d]+{3,}\#[a-z]+\.[a-z]+
Position: 39 --- Match: fad2#yahoo.com
Position: 67 --- Match: fad#yahoo.com
Explanation:
You wrote the pattern as
[(a-z)+][(a-z\\d)]+{3,}\\#[(a-z)+]\\.[(a-z)+]
The character class [(a-z)+] does not match one or more lower-case letters. It matches exactly one character out of these: (, a-z, ), +. That is why adding \\. broke the match: \\#[(a-z)+]\\. requires exactly one character between # and the dot, so fad2#yahoo.com cannot match.
To match one or more lower-case letters, put the quantifier outside the class: [a-z]+
So if you remove the \\. part from your pattern, the loop
while(mat3.find()){
System.out.println("Position: " + mat3.start() + " --- Match: " + mat3.group());
}
gives:
Pattern :: [(a-z)+][(a-z\d)]+{3,}\#[(a-z)+][(a-z)+]
Position: 15 --- Match: ss2#da // not ss2#dad
Position: 39 --- Match: fad2#ya // not fad2#yahoo
Position: 67 --- Match: fad#ya // not fad#yahoo
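The difference between a quantifier inside and outside a character class can be checked in isolation. The following is a minimal sketch; the class name CharClassDemo, the helper matchesWhole and the sample inputs are illustrative and not taken from the question.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns true when the WHOLE input matches the regex.
    public static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside [...], the characters ( ) and + are literals,
        // so this class matches exactly ONE character.
        System.out.println(matchesWhole("[(a-z)+]", "a"));   // a single letter
        System.out.println(matchesWhole("[(a-z)+]", "+"));   // a literal plus sign
        System.out.println(matchesWhole("[(a-z)+]", "abc")); // three characters

        // With the quantifier outside the class, one or more letters are allowed.
        System.out.println(matchesWhole("[a-z]+", "abc"));

        // The tail of the pattern: letters, a literal dot, letters.
        System.out.println(matchesWhole("#[a-z]+\\.[a-z]+", "#yahoo.com"));
        System.out.println(matchesWhole("#[(a-z)+]\\.[(a-z)+]", "#yahoo.com"));
    }
}
```

Only the third and the last call report false: in both cases the class is asked to cover more than one character.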

Related

How to capture multiline repeated groups using regular expression

I've been trying to write a regular expression in a Kotlin application that I can use to parse multiline journal entries that are delimited by means of a timestamp prefix like so:
28-03-2020 23:00:00 - This
is
line
1
28-03-2021 14:23:15 - This
is
line
2
Each repeating group should capture the timestamp (1) and all text that occurs until either the next timestamp pattern at the start of a line or the end of text (2).
So, in the example above I expect the following output:
Match 1
Group 1: 28-03-2020 23:00:00
Group 2: This\nis\nline\n1\n
Match 2
Group 1: 28-03-2021 14:23:15
Group 2: This\nis\nline\n2\n
So far, I've managed to conjure up a regular expression that can capture the first match using:
^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}) -([\s\S]*?)(?=^\d{2}.*?)
However, I've been unsuccessful in capturing the entries as repeated groups so far. Can someone help?
I've set up a regex101 session to test it.

If you want to match:
Each repeating group should capture the timestamp and all text
that occurs until either the next timestamp pattern at the start of a
line or the end of text.
you can capture the timestamp at the start of a line in group 1.
Rather than setting an end boundary such as a newline or a digit at the start of a line, capture in group 2 all lines that do not start with a timestamp-like pattern, using a negative lookahead.
^(\d{2}-\d{2}-\d{4}\h+\d{2}:\d{2}:\d{2})\h+-\h*(.*(?:\R(?!\d{2}-\d{2}-\d{4}\h+\d{2}:\d{2}:\d).*)*)
^ Start of a line (the Java code below compiles the pattern with Pattern.MULTILINE)
(\d{2}-\d{2}-\d{4}\h+\d{2}:\d{2}:\d{2}) Capture group 1, match a datetime like pattern
\h+-\h* Match - preceded by 1+ horizontal whitespace char and followed by optional ones
( Capture group 2
.* Match the whole line
(?: Non capture group
\R Match a newline
(?!\d{2}-\d{2}-\d{4}\h+\d{2}:\d{2}:\d) Negative lookahead, assert not a datetime like pattern directly to the right
.* If the assertion is true, match the whole line
)* Match a newline and the rest of the line if it does not start with a datetime like pattern
) Close group 2
For example
String regex = "^(\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2})\\h+-\\h*(.*(?:\\R(?!\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d).*)*)";
String string = "28-03-2020 23:00:00 - This\n"
+ "is\n"
+ "line\n"
+ "1\n\n"
+ "28-03-2021 14:23:15 - This\n"
+ "is\n"
+ "line\n"
+ "2\n\n\n\n"
+ "28-03-2020 23:00:00 - This\n"
+ "is\n"
+ "12\n"
+ "line\n"
+ "1";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println("--------------------");
}
Output
28-03-2020 23:00:00
This
is
line
1
--------------------
28-03-2021 14:23:15
This
is
line
2
--------------------
28-03-2020 23:00:00
This
is
12
line
1
--------------------

You should use Pattern.DOTALL like this.
public static void main(String[] args) {
String s = "28-03-2020 23:00:00 - This\n"
+ "is\n"
+ "line\n"
+ "1\n"
+ "\n"
+ "28-03-2021 14:23:15 - This\n"
+ "is\n"
+ "line\n"
+ "2\n"
+ "\n";
Pattern pat = Pattern.compile(
"(\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2})\\s*-\\s*(.*?)\n\n",
Pattern.DOTALL);
Matcher m = pat.matcher(s);
while (m.find()) {
System.out.println("Group 1 : " + m.group(1));
System.out.println("Group 2 : " + m.group(2));
}
}
output:
Group 1 : 28-03-2020 23:00:00
Group 2 : This
is
line
1
Group 1 : 28-03-2021 14:23:15
Group 2 : This
is
line
2

How about this.
public static void main(String[] args) {
String s = "28-03-2020 23:00:00 - This\n"
+ "is\n"
+ "line\n"
+ "1\n"
+ "\n"
+ "28-03-2021 14:23:15 - This\n"
+ "is\n"
+ "line\n"
+ "2\n"
+ "\n";
Pattern r = Pattern.compile("^(\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2}) -(?:[\\s\\D]*?)^(\\d{1,2})",Pattern.MULTILINE);
Matcher matcher = r.matcher(s);
while(matcher.find()) {
System.out.println("Group 1 : " + matcher.group(1));
System.out.println("Group 2 : " + matcher.group(2));
}
}
And the output is as below.
Group 1 : 28-03-2020 23:00:00
Group 2 : 1
Group 1 : 28-03-2021 14:23:15
Group 2 : 2

Refactor regex Pattern into Java flavor pattern

I have a regex pattern created on regex101.com:
https://regex101.com/r/cMvHlm/7/codegen?language=java
However, that regex does not seem to work in my Java program (I use Spring Tool Suite as my IDE):
@Test
public void testRegex() {
//Pattern referenceCodePattern = Pattern.compile("((\\h|\\:)+)(([\u00DFA-Za-z0-9-_#\\\\\\/])+)(([[:punct:]])?)");
Pattern pattern = Pattern.compile(""
+ "(?:\\s+|chiffre|job-id|job-nr[.]|job-nr|\\bjob id\\b|job nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job nr.|ziffer|kennziffer|kennz.|referenz code|referenz-code|"
+ "referenzcode|ref[.] nr[.]|ref[.] id|ref id|ref[.]id|ref[.]-nr[.]|ref[.]- nr[.]|"
+ "referenz nummer|referenznummer|referenz nr[.]|stellenreferenz| referenz-nr[.]|referenznr[.]|referenz|referenznummer der stelle|id#|id #|stellenausschreibungen|"
+ "stellenausschreibungs\\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots id|stellenangebots-id|stellenangebotsid|stellen id|stellen-id|stellenid|stellenreferenz|"
+ "stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\\bst[.]-nr[.]\\b|\\bst[.] nr[.]\\b|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|"
+ "bewerbungskennziffer|projekt id|projekt-id|reference number|reference no[.]|reference code|job code|job id|job vacancy no[.]|job-ad-number|auto req id|job ref|\\bstellenausschreibung nr[.]\\b)"
+ ":?(?:\\w*)(?:\\s*)([A-Z]*\\s*)([!\"#$%&'()*+,\\-.\\/:;<=>?#[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?#[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?#[\\]^_`{|}~]*\\w*[!\"#$%&'()*+,\\-.\\/:;<=>?#[\\]^_`{|}~]*)?");
String line = "Referenznummer: INDUSTRY Kontakt: ZAsdfsdfS Herr Andrafgdh Neue Str. 7 21244 Buchholz +42341 22322 mdjob.bu44lz#zaqusssis.de Stellenanzeige teilen: Jetzt online bewerben! oder bewerben Sie sich mit\n" +
"Geben Sie bei Ihrer Bewerbung die Stellenreferenz und die Stellenbezeichnung an! \n" +
"Stellenreferenz: 21533448-JOtest\n\n" +
"Stellenausschreibung Nr. PD-666/19";
// Create a Pattern object
//Pattern r = Pattern.compile(pattern);
Matcher m = pattern.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
}else {
System.out.println("NO MATCH");
}
}
I get the following error:
java.util.regex.PatternSyntaxException: Unclosed character class near index 1337
at java.util.regex.Pattern.error(Pattern.java:1957)
at java.util.regex.Pattern.clazz(Pattern.java:2550)
at java.util.regex.Pattern.clazz(Pattern.java:2506)
at java.util.regex.Pattern.clazz(Pattern.java:2506)
at java.util.regex.Pattern.clazz(Pattern.java:2506)
at java.util.regex.Pattern.sequence(Pattern.java:2065)
at java.util.regex.Pattern.expr(Pattern.java:1998)
at java.util.regex.Pattern.group0(Pattern.java:2907)
at java.util.regex.Pattern.sequence(Pattern.java:2053)
at java.util.regex.Pattern.expr(Pattern.java:1998)
at java.util.regex.Pattern.compile(Pattern.java:1698)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
Is there a way to find out where index 1337 is?

The main problem with the regex is that both [ and ] must be escaped inside a character class in a Java regex, because Java uses them there to form character class unions and intersections, so they are "special".
Another issue is that the [.]\b patterns won't work as expected: a word boundary after a non-word char requires a word char immediately to the right of the current position. You need \B there, not \b.
You do not need to escape the / char in a Java regex pattern.
You do not have to repeat the pattern at the end of the regex; you may "repeat" it with a limiting {0,3} quantifier after wrapping the repeated pattern in a non-capturing group, (?:...).
Consider a while block to get all matches. You may use a boolean flag to see whether there were any matches.
Also, you probably want to move the \\s+ alternative to the end of the first group because it is too generic, but I will leave it at the start for the time being.
Use
Pattern pattern = Pattern.compile(""
+ "(?:\\s+|chiffre|job-id|job-nr[.]|job-nr|\\bjob id\\b|job nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job nr\\.|ziffer|kennziffer|kennz\\.|referenz code|referenz-code|"
+ "referenzcode|ref[.] nr[.]|ref[.] id|ref id|ref[.]id|ref[.]-nr[.]|ref[.]- nr[.]|"
+ "referenz nummer|referenznummer|referenz nr[.]|stellenreferenz| referenz-nr[.]|referenznr[.]|referenz|referenznummer der stelle|id#|id #|stellenausschreibungen|"
+ "stellenausschreibungs\\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots id|stellenangebots-id|stellenangebotsid|stellen id|stellen-id|stellenid|stellenreferenz|"
+ "stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\\bst[.]-nr[.]\\B|\\bst[.] nr[.]\\B|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|"
+ "bewerbungskennziffer|projekt id|projekt-id|reference number|reference no[.]|reference code|job code|job id|job vacancy no[.]|job-ad-number|auto req id|job ref|\\bstellenausschreibung nr[.]\\B)"
+ ":?\\w*\\s*([A-Z]*\\s*)([!\"#$%&'()*+,\\-./:;<=>?#\\[\\]^_`{|}~]*(?:\\w*[!\"#$%&'()*+,\\-./:;<=>?#\\[\\]^_`{|}~]*){0,3})?");
String line = "Referenznummer: INDUSTRY Kontakt: ZAsdfsdfS Herr Andrafgdh Neue Str. 7 21244 Buchholz +42341 22322 mdjob.bu44lz#zaqusssis.de Stellenanzeige teilen: Jetzt online bewerben! oder bewerben Sie sich mit\n" +
"Geben Sie bei Ihrer Bewerbung die Stellenreferenz und die Stellenbezeichnung an! \n" +
"Stellenreferenz: 21533448-JOtest\n\n" +
"Stellenausschreibung Nr. PD-666/19";
Matcher m = pattern.matcher(line);
boolean found = false;
while (m.find()) {
found = true;
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
System.out.println(" ----------------------- " );
}
if (!found) {
System.out.println("NO MATCH");
}
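As for the original question of locating the failing index: PatternSyntaxException exposes getIndex() and getDescription(), so the surrounding part of the pattern can be printed. The following is a minimal sketch of that idea together with the two points above (escaping [ inside a class, and \B after a dot). The class name RegexSyntaxCheck, its helper methods and the short sample patterns are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexSyntaxCheck {
    // Returns null when the regex compiles, otherwise a message that shows
    // up to 20 characters of the pattern on each side of the reported index.
    public static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getIndex() may be -1 when the position is unknown, so clamp it.
            int idx = Math.max(0, Math.min(e.getIndex(), regex.length()));
            int from = Math.max(0, idx - 20);
            int to = Math.min(regex.length(), idx + 20);
            return e.getDescription() + " near index " + e.getIndex()
                    + ": ..." + regex.substring(from, to) + "...";
        }
    }

    public static boolean compiles(String regex) {
        return describeError(regex) == null;
    }

    public static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped [ inside a class opens a nested class, so the outer one stays open.
        System.out.println(describeError("[#[\\]^_]"));
        // Escaping both brackets makes them literals.
        System.out.println(compiles("[#\\[\\]^_]"));
        // After "." comes a space: no word boundary there, but \B succeeds.
        System.out.println(finds("nr[.]\\b", "nr. 5"));
        System.out.println(finds("nr[.]\\B", "nr. 5"));
    }
}
```

Printing the snippet around the index is usually enough to spot which character class is left open in a long concatenated pattern.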

In Java, an unescaped [ inside a character class is always treated as opening a nested class, never as a literal.
This is why some recommend always escaping the literal class metacharacters [ and ],
which carries over to almost all engines.
Converting
[!"#$%&'()*+,\-.\/:;<=>?@[\]^_`{|}~]
to [!-/:-@\[\]-`{-~]
then refactoring the regex.
(Note that there may be usability problems with the regex as well.)
Before refactor :
(?:\s+|chiffre|job-id|job-nr[.]|job-nr|\bjob[ ]id\b|job[ ]nr[.]|jobnummer|jobnr[.]|jobid|jobcode|job[ ]nr.|ziffer|kennziffer|kennz.|referenz[ ]code|referenz-code|referenzcode|ref[.][ ]nr[.]|ref[.][ ]id|ref[ ]id|ref[.]id|ref[.]-nr[.]|ref[.]-[ ]nr[.]|referenz[ ]nummer|referenznummer|referenz[ ]nr[.]|stellenreferenz|[ ]referenz-nr[.]|referenznr[.]|referenz|referenznummer[ ]der[ ]stelle|id\#|id[ ]\#|stellenausschreibungen|stellenausschreibungs\s?nr[.]|stellenausschreibungs-nr[.]|stellenausschreibungsnr[.]|stellenangebots[ ]id|stellenangebots-id|stellenangebotsid|stellen[ ]id|stellen-id|stellenid|stellenreferenz|stellen-referenz|ref[.]st[.]nr[.]|stellennumer|\bst[.]-nr[.]\b|\bst[.][ ]nr[.]\b|kenn-nr[.]|positionsnummer|kennwort|stellenkey|stellencode|job-referenzcode|stellenausschreibung|bewerbungskennziffer|projekt[ ]id|projekt-id|reference[ ]number|reference[ ]no[.]|reference[ ]code|job[ ]code|job[ ]id|job[ ]vacancy[ ]no[.]|job-ad-number|auto[ ]req[ ]id|job[ ]ref|\bstellenausschreibung[ ]nr[.]\b):?(?:\w*)(?:\s*)([A-Z]*\s*)([!"#$%&'()*+,\-.\/:;<=>?#\[\]^_`{|}~]*(?:\w*[!"#$%&'()*+,\-.\/:;<=>?#\[\]^_`{|}~]*){3})?
After refactor :
(?:\s+|chiffre|job(?:-(?:id|nr[.]?|referenzcode|ad-number)|[ ](?:(?:nr|vacancy[ ]no)[.]|code|id|ref)|n(?:ummer|r[.])|id|code)|\b(?:job[ ]id|st(?:[.][ \-]|ellenausschreibung[ ])nr[.])\b|(?:bewerbungskenn)?ziffer|kenn(?:z(?:iffer|.)|-nr[.]|wort)|ref(?:eren(?:z(?:[ ](?:code|n(?:ummer|r[.]))|-?code|n(?:ummer|r[.]|ummer[ ]der[ ]stelle))?|ce[ ](?:n(?:umber|o[.])|code))|[.](?:[ ](?:nr[.]|id)|id|(?:-[ ]?|st[.])nr[.])|[ ]id)|stellen(?:referenz|a(?:usschreibung(?:en|s(?:\s?|-)?nr[.])?|ngebots[ \-]?id)|[ ]?id|-(?:id|referenz)|numer|key|code)|[ ]referenz-nr[.]|id[ ]?\#|p(?:ositionsnummer|rojekt[ \-]id)|auto[ ]req[ ]id):?\w*\s*[A-Z]*\s*(?:[!-/:-@\[\]-`{-~]*(?:\w*[!-/:-@\[\]-`{-~]*){3})?

Java Regex OR operator not working properly

I have these strings:
String test1=":test:block1:%a1%a2%a3%a4:block2:BL";
and
String test2=":test:block2:BL:block1:%a1%a2%a3%a4";
I've created a regex pattern in order to isolate this piece of the string
block1:%a1%a2%a3%a4:
from the rest of the string, leaving the strings like this:
in the case of test1="block1:%a1%a2%a3%a4:"; (with ':' at the end)
in the case of test2=":block1:%a1%a2%a3%a4"; (with ':' at the beginning)
The regex I've created is:
"(block1:(.*?):|:block1:(.*))";
It works with test1, but with test2 it returns this:
block1:%a1%a2%a3%a4:block2:BL";
Can someone give me a hand with this?
Cheers!

You may use
block1:([^:]*)
It matches block1: text and then captures into Group 1 any 0 or more chars other than :.
See Java demo:
String patternString = "block1:([^:]*)";
String[] tests = {":test:block1:%a1%a2%a3%a4:block2:BL",
":test:block2:BL:block1:%a1%a2%a3%a4"};
for (int i=0; i<tests.length; i++)
{
Pattern p = Pattern.compile(patternString, Pattern.DOTALL);
Matcher m = p.matcher(tests[i]);
if(m.find())
{
System.out.println(tests[i] + " matched. Match: " +
m.group(0) + ", Group 1: " + m.group(1));
}
}
Output:
:test:block1:%a1%a2%a3%a4:block2:BL matched. Match: block1:%a1%a2%a3%a4, Group 1: %a1%a2%a3%a4
:test:block2:BL:block1:%a1%a2%a3%a4 matched. Match: block1:%a1%a2%a3%a4, Group 1: %a1%a2%a3%a4

Nested/Repeated Group in Regex

I have to parse a multi line string and retrieve the email addresses in a specific location.
And I have done it using the below code:
String input = "Content-Type: application/ms-tnef; name=\"winmail.dat\"\r\n"
+ "Content-Transfer-Encoding: binary\r\n" + "From: ABC aa DDD <aaaa.b#abc.com>\r\n"
+ "To: DDDDD dd <sssss.r#abc.com>\r\n" + "CC: Rrrrr rrede <sssss.rv#abc.com>, Dsssssf V R\r\n"
+ " <dsdsdsds.vr#abc.com>, Psssss A <pssss.a#abc.com>, Logistics\r\n"
+ " <LOGISTICS#abc.com>, Gssss Bsss P <gdfddd.p#abc.com>\r\n"
+ "Subject: RE: [MyApps] (PRO-34604) PR for Additional Monitor allocation [CITS\r\n"
+ " Ticket:258849]\r\n" + "Thread-Topic: [MyApps] (PRO-34604) PR for Additional Monitor allocation\r\n"
+ " [CITS Ticket:258849]\r\n" + "Thread-Index: AQHRXMJHE6KqCFxKBEieNqGhdNy7Pp8XHc0A\r\n"
+ "Date: Mon, 1 Feb 2016 17:56:17 +0530\r\n"
+ "Message-ID: <B7F84439E634A44AB586E3FF2EA0033A29E27E47#JETWINSRVRPS01.abc.com>\r\n"
+ "References: <JA.101.1453963700000#myapps.abc.com>\r\n"
+ " <JA.101.1453963700000.978.1454311765375#myapps.abc.com>\r\n"
+ "In-Reply-To: <JIRA.450101.1453963700000.978.1454311765375#myapps.abc.com>\r\n"
+ "Accept-Language: en-US\r\n" + "Content-Language: en-US\r\n" + "X-MS-Has-Attach:\r\n"
+ "X-MS-Exchange-Organization-SCL: -1\r\n"
+ "X-MS-TNEF-Correlator: <B7F84439E634A44AB586E3FF2EA0033A29E27E47#JETWINSRVRPS01.abc.com>\r\n"
+ "MIME-Version: 1.0\r\n" + "X-MS-Exchange-Organization-AuthSource: TURWINSRVRPS01.abc.com\r\n"
+ "X-MS-Exchange-Organization-AuthAs: Internal\r\n" + "X-MS-Exchange-Organization-AuthMechanism: 04\r\n"
+ "X-Originating-IP: [1.1.1.7]";
Pattern pattern = Pattern.compile("To:(.*<([^>]*)>).*Message-ID", Pattern.DOTALL);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
Pattern innerPattern = Pattern.compile("<([^>]*)>");
Matcher innerMatcher = innerPattern.matcher(matcher.group(1));
while (innerMatcher.find()) {
System.out.println("-->:" + innerMatcher.group(1));
}
}
Here it works fine. I'm first grouping the part from To till the Message which is the required part. And then I have another grouping to extract the email ids.
Is there any better way to do this? Can we do it with one pattern matcher set?
Update:
This is the expected output:
-->:sssss.r#abc.com
-->:sssss.rv#abc.com
-->:dsdsdsds.vr#abc.com
-->:pssss.a#abc.com
-->:LOGISTICS#abc.com
-->:gdfddd.p#abc.com

Ideally, you could have used lookarounds:
(?<=To:.*)<([^>]+)>(?=.*Message-ID)
Unfortunately, Java doesn't support unbounded quantifiers such as .* in lookbehinds. A workaround is to put an upper bound on the repetition:
(?<=To:.{0,1000})<([^>]+)>(?=.*Message-ID)
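The bounded-lookbehind workaround can be tried on a shortened header block. This is a minimal sketch, assuming Pattern.DOTALL is set so that . can cross the \r\n line breaks; the class name BoundedLookbehindDemo, the method extract and the sample headers are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehindDemo {
    // The lookbehind allows at most 1000 characters between "To:" and the "<".
    private static final Pattern ADDRESS = Pattern.compile(
            "(?<=To:.{0,1000})<([^>]+)>(?=.*Message-ID)", Pattern.DOTALL);

    // Collects the text between < and > for every match.
    public static List<String> extract(String headers) {
        List<String> found = new ArrayList<>();
        Matcher m = ADDRESS.matcher(headers);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String headers = "From: A <a#abc.com>\r\n"
                + "To: B <b#abc.com>\r\n"
                + "CC: C <c#abc.com>,\r\n"
                + " D <d#abc.com>\r\n"
                + "Message-ID: <id#abc.com>\r\n";
        // The From address has no "To:" before it, and the Message-ID value
        // has no "Message-ID" after it, so both are skipped.
        System.out.println(extract(headers));
    }
}
```

The bound is the weak point of this approach: an address that starts more than 1000 characters after To: is silently missed, so the limit has to be chosen with the expected header size in mind.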

I think you are looking for all the emails inside <...> that come after To: and before Message-ID. So, you may use a \G based regex for one pass:
Pattern pt = Pattern.compile("(?:\\bTo:|(?!^)\\G).*?<([^>]*)>(?=.*Message-ID)", Pattern.DOTALL);
Matcher m = pt.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
The regex matches:
(?:\\bTo:|(?!^)\\G) - a leading boundary, either To: as a whole word or the location after the previous successful match
.*? - any characters, as few as possible, up to the first occurrence of the subpattern that follows
<([^>]*)> - substring starting with < followed with zero or more characters other than > (Group 1) and followed with a closing >
(?=.*Message-ID) - a positive lookahead that makes sure there is Message-ID somewhere ahead of the current match.

Advanced parsing of numeric ranges from string

I'm using Java to parse strings input by the user, representing either single numeric values or ranges. The user can input the following string:
10-19
And his intention is to use whole numbers from 10-19 --> 10,11,12...19
The user can also specify a list of numbers:
10,15,19
Or a combination of the above:
10-19,25,33
Is there a convenient method, perhaps based on regular expressions, to perform this parsing? Or must I split the string using String.split(), then manually iterate the special signs (',' and '-' in this case)?

This is how I would go about it:
Split the input using , as the delimiter.
If a token matches the regular expression ^(\\d+)-(\\d+)$, I know I have a range. I would then extract the two numbers and create the range (it is a good idea to check that the first number is lower than the second, because you never know...), and act accordingly.
If a token matches the regular expression ^\\d+$, I know I have a single number, and I would act accordingly.
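The steps above can be sketched as follows. This is one possible reading of the approach, not the answerer's own code: the class name RangeParser, the method parse and the choice to reject descending ranges with an exception are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeParser {
    private static final Pattern RANGE = Pattern.compile("^(\\d+)-(\\d+)$");
    private static final Pattern SINGLE = Pattern.compile("^\\d+$");

    // Expands input such as "10-19,25,33" into the individual numbers.
    public static List<Integer> parse(String input) {
        List<Integer> values = new ArrayList<>();
        // A limit of -1 keeps trailing empty tokens, so "1,2," is rejected below.
        for (String token : input.split(",", -1)) {
            Matcher range = RANGE.matcher(token);
            if (range.matches()) {
                int from = Integer.parseInt(range.group(1));
                int to = Integer.parseInt(range.group(2));
                if (from > to) {
                    throw new IllegalArgumentException("Descending range: " + token);
                }
                // A long counter avoids overflow when 'to' is Integer.MAX_VALUE.
                for (long i = from; i <= to; i++) {
                    values.add((int) i);
                }
            } else if (SINGLE.matcher(token).matches()) {
                values.add(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("Not a number or range: \"" + token + "\"");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("10-19,25,33"));
    }
}
```

Whitespace is not tolerated here, so "1 - 9" is rejected. Trimming each token before matching would be a small change if lenient input is preferred.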

This tested (and fully commented) regex solution meets the OP requirements:
Java regex solution
// TEST.java 20121024_0700
import java.util.regex.*;
public class TEST {
public static Boolean isValidIntRangeInput(String text) {
Pattern re_valid = Pattern.compile(
"# Validate comma separated integers/integer ranges.\n" +
"^ # Anchor to start of string. \n" +
"[0-9]+ # Integer of 1st value (required). \n" +
"(?: # Range for 1st value (optional). \n" +
" - # Dash separates range integer. \n" +
" [0-9]+ # Range integer of 1st value. \n" +
")? # Range for 1st value (optional). \n" +
"(?: # Zero or more additional values. \n" +
" , # Comma separates additional values. \n" +
" [0-9]+ # Integer of extra value (required). \n" +
" (?: # Range for extra value (optional). \n" +
" - # Dash separates range integer. \n" +
" [0-9]+ # Range integer of extra value. \n" +
" )? # Range for extra value (optional). \n" +
")* # Zero or more additional values. \n" +
"$ # Anchor to end of string. ",
Pattern.COMMENTS);
Matcher m = re_valid.matcher(text);
return m.matches();
}
public static void printIntRanges(String text) {
Pattern re_next_val = Pattern.compile(
"# extract next integers/integer range value. \n" +
"([0-9]+) # $1: 1st integer (Base). \n" +
"(?: # Range for value (optional). \n" +
" - # Dash separates range integer. \n" +
" ([0-9]+) # $2: 2nd integer (Range) \n" +
")? # Range for value (optional). \n" +
"(?:,|$) # End on comma or string end.",
Pattern.COMMENTS);
Matcher m = re_next_val.matcher(text);
String msg;
int i = 0;
while (m.find()) {
msg = " value["+ ++i +"] ibase="+ m.group(1);
if (m.group(2) != null) {
msg += " range="+ m.group(2);
}
System.out.println(msg);
}
}
public static void main(String[] args) {
String[] arr = new String[]
{ // Valid inputs:
"1",
"1,2,3",
"1-9",
"1-9,10-19,20-199",
"1-8,9,10-18,19,20-199",
// Invalid inputs:
"A",
"1,2,",
"1 - 9",
" ",
""
};
// Loop through all test input strings:
int i = 0;
for (String s : arr) {
String msg = "String["+ ++i +"] = \""+ s +"\" is ";
if (isValidIntRangeInput(s)) {
// Valid input line
System.out.println(msg +"valid input. Parsing...");
printIntRanges(s);
} else {
// Match attempt failed
System.out.println(msg +"NOT valid input.");
}
}
}
}
Output:
String[1] = "1" is valid input. Parsing...
value[1] ibase=1
String[2] = "1,2,3" is valid input. Parsing...
value[1] ibase=1
value[2] ibase=2
value[3] ibase=3
String[3] = "1-9" is valid input. Parsing...
value[1] ibase=1 range=9
String[4] = "1-9,10-19,20-199" is valid input. Parsing...
value[1] ibase=1 range=9
value[2] ibase=10 range=19
value[3] ibase=20 range=199
String[5] = "1-8,9,10-18,19,20-199" is valid input. Parsing...
value[1] ibase=1 range=8
value[2] ibase=9
value[3] ibase=10 range=18
value[4] ibase=19
value[5] ibase=20 range=199
String[6] = "A" is NOT valid input.
String[7] = "1,2," is NOT valid input.
String[8] = "1 - 9" is NOT valid input.
String[9] = " " is NOT valid input.
String[10] = "" is NOT valid input.
Note that this solution simply demonstrates how to validate an input line and how to parse/extract value components from each line. It does not further validate that, for range values, the second integer is larger than the first. This check, however, could easily be added.
Edit:2012-10-24 07:00 Fixed index i to count from zero.

In MATLAB, you can use
strinput = '10-19,25,33'
eval(cat(2,'[',strrep(strinput,'-',':'),']'))
It is best to include some input checks. Negative numbers will also cause problems with this method.

In the simplest approach, you can use the evil eval for this:
A = eval('[10:19,25,33]')
A =
10 11 12 13 14 15 16 17 18 19 25 33
BUT of course you should think twice before you do that. Especially if this is a user-supplied string! Imagine what would happen if the user supplied any other command...
eval('!rm -rf /')
You would have to make sure that there really is nothing else than what you want. You could do this by regexp.
