I'm trying to find a proper regex in Java to detect all occurrences of version 1 in a large body of text. I only care about version 1 or version 1.0, but not 1.1. The version string can be followed by any other character or by the end of the line.
How do I do that in java?
Thanks in advance
String regex="(version)(\\s)(1|1\\.0)";
Pattern p = Pattern.compile(regex);
Matcher m = null;
String testString1="version 1";
m = p.matcher(testString1);
System.out.println (m.find());
String testString2="version 1.0";
m = p.matcher(testString2);
System.out.println (m.find());
String testString3="version 1.1"; // should not match
m = p.matcher(testString3);
System.out.println (m.find());
If the version string is part of a longer string, use this lookahead regex:
\bversion\s+1(?:\.0)?(?=\s|$)
In Java:
final String regex = "\\bversion\\s+1(?:\\.0)?(?=\\s|$)";
(?=\\s|$) is a positive lookahead that asserts that whitespace or the end of the line follows the version number.
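A minimal sketch of how the lookahead pattern above behaves on the three test strings from the question and inside longer text. The class name VersionOneCheck and the helper containsVersionOne are made up for illustration.

```java
import java.util.regex.Pattern;

class VersionOneCheck {
    // Pattern from the answer above: "version", whitespace, "1" with an optional ".0",
    // then a lookahead requiring whitespace or the end of the input.
    private static final Pattern VERSION_ONE =
            Pattern.compile("\\bversion\\s+1(?:\\.0)?(?=\\s|$)");

    // Hypothetical helper: true if the text mentions version 1 or 1.0 anywhere.
    static boolean containsVersionOne(String text) {
        return VERSION_ONE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsVersionOne("version 1"));              // true
        System.out.println(containsVersionOne("version 1.0"));            // true
        System.out.println(containsVersionOne("version 1.1"));            // false
        System.out.println(containsVersionOne("uses version 1.0 today")); // true
    }
}
```

Note that this lookahead only accepts whitespace or the end of the input, so a string such as "version 1," is not matched. If punctuation after the number should count, the lookahead would need to be widened.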
Proposal:
"(version)\\s(1(\\.0)?)([^\\.0-9].*|$)"
The string "version" needs to be present
followed by any whitespace
then a single "1"
optionally followed by ".0"
and then either the next character is not "." or a digit (so 1.01 and 1.0.1 are rejected), or the string ends
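The rules listed above can be checked with a small sketch. The class VersionProposalDemo and the helper versionOf are hypothetical names; only the regex comes from the proposal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionProposalDemo {
    // Regex from the proposal: group 2 holds "1" or "1.0", group 4 holds whatever follows.
    private static final Pattern PROPOSAL =
            Pattern.compile("(version)\\s(1(\\.0)?)([^\\.0-9].*|$)");

    // Hypothetical helper: returns the matched version number, or null if nothing matches.
    static String versionOf(String text) {
        Matcher m = PROPOSAL.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(versionOf("version 1"));         // 1
        System.out.println(versionOf("version 1.0 final")); // 1.0
        System.out.println(versionOf("version 1.1"));       // null
        System.out.println(versionOf("version 1.01"));      // null
        System.out.println(versionOf("version 1.0.1"));     // null
    }
}
```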
I am having trouble understanding how a regular expression can match text without including part of the matched text in the result. Perhaps I need to work with groups, which I'm not doing; I often see the term non-capturing group used.
The goal is say I have ticket in a log file as follows:
TICKET/A/ADMIN/05MAR2020// to return only A/ADMIN/05MAR2020
or if
TICKET/A/ENGINEERING/05MAR2020. to return only A/ENGINEERING/05MAR2020
where the "//" or "." has been removed
Lastly to ignore lines like:
TICKET HAS BEEN COMPLETED
using regex = "(?<=^TICKET\\s{0,2}/).*(?://|\\.)?"
So I am telling the parser to look for TICKET at the start of the string followed by a forward slash, but not to return TICKET, and to look for either a double forward slash "//" or a period "." at the end of the string, but to make this optional.
My Java 1.8.x code follows:
// used in the import statement: import java.util.regex.Matcher;
// import java.util.regex.Pattern;
private static void testRegex() {
String ticket1 = "TICKET/A/ITSUPPORT/05MAR2020//";
String ticket2 = "TICKET /B/ADMIN/06MAR2020.";
String ticket3 = "TICKET/C/GENERAL/07MAR2020";
//https://www.regular-expressions.info/brackets.html
String regex = "(?<=^TICKET\\s{0,2}/).*(?://|\\.)?";
Pattern pat = Pattern.compile(regex);
Matcher mat = pat.matcher(ticket1);
if (mat.find()) {
String myticket = ticket1.substring(mat.start(), mat.end());
System.out.println(myticket+ ", Expect 'A/ITSUPPORT/05MAR2020'");
}
mat = pat.matcher(ticket2);
if (mat.find()) {
String myticket = ticket2.substring(mat.start(), mat.end());
System.out.println(myticket+", Expect 'B/ADMIN/06MAR2020'");
}
mat = pat.matcher(ticket3);
if (mat.find()) {
String myticket = ticket3.substring(mat.start(), mat.end());
System.out.println(myticket+", Expect 'C/GENERAL/07MAR2020'");
}
regex = "(//|\\.)";
pat = Pattern.compile(regex);
mat = pat.matcher(ticket1);
if (mat.find()) {
String myticket = ticket1.substring(mat.start(), mat.end());
System.out.println(myticket+", "+mat.start() + ", " + mat.end() + ", " + mat.groupCount());
}
}
My actual results follow:
A/ITSUPPORT/05MAR2020//, Expect 'A/ITSUPPORT/05MAR2020'
B/ADMIN/06MAR2020., Expect 'B/ADMIN/06MAR2020'
C/GENERAL/07MAR2020, Expect 'C/GENERAL/07MAR2020'
//, 28, 30, 1
Any suggestions would be appreciated. Please note that I have been learning from StackOverflow for a long time but this is my first post, so I hope the question is asked appropriately. Thank you.
You could use a positive lookahead at the end of the pattern instead of a match.
The lookahead asserts that what is at the end of the string is an optional // or .
As the dot and the double forward slash are optional, you have to make the .* non greedy by writing .*?
(?<=^TICKET\s{0,2}/).*?(?=(?://|\.)?$)
In parts
(?<= Positive lookbehind, assert what is on the left is
^ Start of the string
TICKET\s{0,2}/ Match TICKET and 0-2 whitespace chars followed by /
) Close lookbehind
.*? Match any char except a newline 0+ times, as few as possible (non greedy)
(?= Positive lookahead, assert what is on the right is
(?: Non capture group for the alternation | because both can be followed by $
// Match 2 forward slashes
| Or
\. Match a dot
)? Close the non capture group and make it optional
$ Assert the end of the string
) Close the positive lookahead
In Java
String regex = "(?<=^TICKET\\s{0,2}/).*?(?=(?://|\\.)?$)";
Output of the updated pattern with your code:
A/ITSUPPORT/05MAR2020, Expect 'A/ITSUPPORT/05MAR2020'
B/ADMIN/06MAR2020, Expect 'B/ADMIN/06MAR2020'
C/GENERAL/07MAR2020, Expect 'C/GENERAL/07MAR2020'
//, 28, 30, 1
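Since the question also mentions groups, here is a hedged sketch that puts the lookaround pattern from this answer next to a capturing-group variant. The capturing-group regex and the names TicketExtractor, viaLookaround and viaGroup are not from the original posts; they are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TicketExtractor {
    // Lookaround version from the answer: only the ticket body is part of the match.
    private static final Pattern LOOKAROUND =
            Pattern.compile("(?<=^TICKET\\s{0,2}/).*?(?=(?://|\\.)?$)");

    // Assumed alternative: match the whole line and read the body from group 1.
    private static final Pattern GROUPED =
            Pattern.compile("^TICKET\\s{0,2}/(.*?)(?://|\\.)?$");

    // Returns the ticket body, or null for lines that are not tickets.
    static String viaLookaround(String line) {
        Matcher m = LOOKAROUND.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Same result, but taken from the capturing group instead of the whole match.
    static String viaGroup(String line) {
        Matcher m = GROUPED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(viaLookaround("TICKET/A/ITSUPPORT/05MAR2020//")); // A/ITSUPPORT/05MAR2020
        System.out.println(viaGroup("TICKET /B/ADMIN/06MAR2020."));          // B/ADMIN/06MAR2020
        System.out.println(viaGroup("TICKET HAS BEEN COMPLETED"));           // null
    }
}
```

With the group variant, "not returning TICKET" simply means reading group(1) rather than group(), so no lookbehind is needed.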
I need to extract a substring from a string using regex. The tricky (for me) part is that the string may be in one of two formats:
either LLDDDDLDDDDDDD/DDD (eg. AB1000G242424/001) or just between 1 and 7 digits (eg. 242424).
The substring I need to extract is:
If string is 7 digits or longer, then extract substring consisting of 7 digits.
Else (if string is shorter than 7 digits), then extract substring consisting of 1-6 digits.
Below is one of my tries.
String regex = ("([0-9]{7}|[0-9]{0,6})");
Pattern pattern = Pattern.compile(regex);
Matcher matcher;
matcher = pattern.matcher("242424");
String extractedNr1 = "";
while (matcher.find()) {
extractedNr1 += matcher.group();
}
matcher = pattern.matcher("AB1000G242424/001");
String extractedNr2 = "";
while (matcher.find()) {
extractedNr2 += matcher.group();
}
System.out.println("ExtractedNr1 = " + extractedNr1);
System.out.println("ExtractedNr2 = " + extractedNr2);
Output:
ExtractedNr1 = 242424
ExtractedNr2 = 1000242424001
I understand the second one is a concatenation of all the matches, but I don't understand why the matches are arranged like that. Can I make a regex that will stop immediately after finding a match (with priority for the first option, that is 7 digits)?
I thought about using some conditional statement, but apparently these are not supported in java.util.regex, and I cannot use third party library.
I can do this in java obviously, but the whole point is in using regex.
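Regex is a secondary concern here; the runs of digits must be compared by length. Since in regex \d stands for a digit and \D for a non-digit, you can use Pattern.splitAsStream as follows:
// needs java.util.Comparator, java.util.Optional and java.util.regex.Pattern
Optional<String> takeDigits(String s) {
    // split on runs of non-digits, then keep the longest run of at most 7 digits
    return Pattern.compile("\\D+").splitAsStream(s)
            .filter(w -> !w.isEmpty() && w.length() <= 7)
            .max(Comparator.comparingInt(String::length));
}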
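The same idea, comparing digit runs by length, can also be sketched with a plain Matcher loop instead of a stream. This is an assumed variant, not the answerer's code; it keeps the same rule (longest run of at most 7 digits), and the names LongestDigitRun and longestRun are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestDigitRun {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the longest run of 1 to 7 digits, or an empty Optional if there is none.
    static Optional<String> longestRun(String s) {
        String best = null;
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            String run = m.group();
            // Skip runs longer than 7 digits; otherwise keep the longest seen so far.
            if (run.length() <= 7 && (best == null || run.length() > best.length())) {
                best = run;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(longestRun("242424").orElse("none"));            // 242424
        System.out.println(longestRun("AB1000G242424/001").orElse("none")); // 242424
    }
}
```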
You can use String.replaceAll to remove the non-digit characters:
String extracted = new String("AB1000G242424/001").replaceAll("[^0-9]","");
if (extracted.length() > 7)
extracted = extracted.substring(0, 7);
Output:
1000242
I want to parse the numbers that come before each hyphen; the answer should be 0 0 1 (as integers). What is the best way to parse this in Java?
public static String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
Please help me out.
Use the regex below with the Pattern and Matcher classes.
Pattern.compile("\\d+(?=-)");
\\d+ - Matches one or more digits. + repeats the previous token \\d (which matches a digit character) one or more times.
(?=-) - Only if it's followed by a hyphen. (?=-) is called a positive lookahead assertion; it asserts that the match must be followed by a - symbol.
String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
Matcher m = Pattern.compile("\\d+(?=-)").matcher(str);
while(m.find())
{
System.out.println(m.group());
}
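The loop above prints the matches as strings. Since the question asks for integers, here is a small sketch that collects them into a list instead. The class HyphenDigits and the method parse are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenDigits {
    // One or more digits, only when directly followed by a hyphen.
    private static final Pattern BEFORE_HYPHEN = Pattern.compile("\\d+(?=-)");

    // Hypothetical helper: every number that precedes a hyphen, in order of appearance.
    static List<Integer> parse(String input) {
        List<Integer> values = new ArrayList<>();
        Matcher m = BEFORE_HYPHEN.matcher(input);
        while (m.find()) {
            values.add(Integer.parseInt(m.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        String str = "[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
        System.out.println(parse(str)); // [0, 0, 1, 1, 0, 0, 0, 1, 0]
    }
}
```

The digits inside S1, S2 and S3 are skipped because they are followed by "|", "," or "]" rather than a hyphen.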
One lazy way: if you already know the pattern of the string, use substring and indexOf to locate each value.
String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
// start one character after "[" so that only the digit is passed to parseInt
int int1 = Integer.parseInt(str.substring(str.indexOf("[") + 1, str.indexOf("-S1")));
and so on.
I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using p{Z} to match whitespaces(have found this somewhere in this forum, as Java regex does not identify \s).
The problem I am facing with this code is that the search stops at the first match, and escapes the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
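As a quick check of the explanation above, here is a minimal sketch that applies the same regex through a precompiled Pattern. The class ReplyPrefixStripper and the method strip are hypothetical names, and Pattern.CASE_INSENSITIVE is used in place of the inline (?i) flag.

```java
import java.util.regex.Pattern;

class ReplyPrefixStripper {
    // Same regex as above; the flag replaces the inline (?i).
    private static final Pattern LEADING_RE =
            Pattern.compile("^(\\s*re\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    // Removes every "Re:" variant at the start of the title and leaves later ones alone.
    static String strip(String title) {
        return LEADING_RE.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(strip(title)); // This is a Re: sample string.
    }
}
```

Because the pattern is anchored with ^ and the group repeats with +, a single replacement removes the whole run of prefixes, so no find() loop is needed.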
in your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
Here is what it does. It matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}\p{Z}\p{Z}
which makes no sense for what you are trying to do.
You'd be better off using a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is as the doc says the exact same thing as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )
I would like to create a regular expression pattern that matches only if the pattern string is not followed by anything else in the input string. Here is what I tried:
Pattern p = Pattern.compile("google.com");//I want to know the right format
String input1 = "mail.google.com";
String input2 = "mail.google.com.co.uk";
Matcher m1 = p.matcher(input1);
Matcher m2 = p.matcher(input2);
boolean found1 = m1.find();
boolean found2 = m2.find();//This should be false because "google.com" is followed by ".co.uk" in input2 string
Any help would be appreciated!
Your pattern should be google\.com$. The $ character matches the end of a line. Read about regex boundary matchers for details.
Here is how to match and get the non-matching part as well.
Here is the raw regex pattern:
^(.*)google\.com$
^ - match beginning of string
(.*) - capture everything in a group up to the next match
google - matches google literal
\. - matches the . literal has to be escaped with \
com - matches com literal
$ - matches end of string
Note: In Java the \ in the String literal has to be escaped as well! ^(.*)google\\.com$
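A short sketch of the capturing pattern in use, assuming the two inputs from the question. The class GoogleSuffixMatcher and the method prefixOf are invented names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoogleSuffixMatcher {
    // Group 1 captures everything before a trailing "google.com".
    private static final Pattern ENDS_WITH_GOOGLE = Pattern.compile("^(.*)google\\.com$");

    // Returns the part before "google.com", or null if the input does not end with it.
    static String prefixOf(String host) {
        Matcher m = ENDS_WITH_GOOGLE.matcher(host);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("mail.google.com"));       // mail.
        System.out.println(prefixOf("mail.google.com.co.uk")); // null
    }
}
```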
You should use google\.com$. $ character matches the end of a line.
Pattern p = Pattern.compile("google\\.com$");//I want to know the right format
String input2 = "mail.google.com.co.uk";
Matcher m2 = p.matcher(input2);
boolean found2 = m2.find();
System.out.println(found2);
Output = false
Pattern p = Pattern.compile("google\\.com$");
The dollar sign means it has to occur at the end of the line/string being tested. Note too that your dot will match any character, so if you want it to match a dot only, you need to escape it.
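To illustrate why the dot needs escaping, here is a small sketch comparing the two patterns. The input "mail.googleXcom" is made up for the demonstration, as are the names EscapedDotDemo and found.

```java
import java.util.regex.Pattern;

class EscapedDotDemo {
    // Hypothetical helper: true if the regex is found somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped dot: "." matches any character, so "googleXcom" is accepted.
        System.out.println(found("google.com$", "mail.googleXcom"));         // true
        // Escaped dot: only a literal "." is accepted.
        System.out.println(found("google\\.com$", "mail.googleXcom"));       // false
        System.out.println(found("google\\.com$", "mail.google.com"));       // true
        System.out.println(found("google\\.com$", "mail.google.com.co.uk")); // false
    }
}
```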