I have a search string.
When it contains a dollar symbol, I want to capture all the characters that follow it, up to but not including a dot or a subsequent dollar symbol. The latter would start a subsequent match.
So for either of these search strings:
"/bla/$V_N.$XYZ.bla";
"/bla/$V_N.$XYZ";
I would want to return:
V_N
XYZ
If the search string contains percent symbols, I also want to return what's between the pair of % symbols.
The following regex seems to do the trick for that.
"%([^%]*?)%";
Meaning:
Start and end with a %,
have a capture group, the (),
containing a character class that matches anything except a % symbol (the leading caret negates the class),
repeated, but not greedily: *?
Where some languages use %1, %2 for capture groups, Java uses backslash-number syntax (\1, \2) instead. So this string compiles and generates output.
I suspect the dollar symbol and dot need escaping, as they are special symbols:
$ is usually end of string
. is a meta sequence for any character.
I have tried using double backslashes (\\),
both in character classes, e.g. [^\\.\\$%],
and using OR'd notation, %|\\$,
in attempts to combine this logic, and can't seem to get anything to play ball.
I wonder if another pair of eyes can see how to solve this conundrum!
My attempts so far:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
    public static void main(String[] args) {
        String search = "/bla/$V_N.$XYZ.bla";
        String pattern = "([%\\$])([^%\\.\\$]*?)\\1?";
        /* Either % or $ in first capture group ([%\\$])
         * Second capture group - anything except %, dot or dollar sign
         * non greedy group ( *?)
         * then a backreference to an optional first capture group \\1?
         * Have to use two \, since you escape \ in a Java string.
         */
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(search);
        List<String> results = new ArrayList<String>();
        while (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                results.add(m.group(i));
            }
        }
        for (String result : results) {
            System.out.println(result);
        }
    }
}
The following links may be helpful:
An interactive Java playground where you can experiment and copy/paste code.
Regex101
Java RegexTester
Java backreferences (The optional backreference \\1 in the Regex).
Link that summarises Regex special characters often found in languages
Java Regex book EPub link
Regex Info Website
Matcher class in the Javadocs
You may use
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "[%$]([^%.$]*)";
Matcher matcher = Pattern.compile(pattern).matcher(search);
while (matcher.find()) {
    System.out.println(matcher.group(1));
} // => V_N, XYZ
See the Java demo and the regex demo.
NOTE
You do not need an optional \1? at the end of the pattern. As it is optional, it does not restrict the match context and is redundant (the negated character class already cannot match either $ or %).
[%$]([^%.$]*) matches % or $, then captures into Group 1 any zero or more chars other than %, . and $. You only need the Group 1 value, hence matcher.group(1) is used.
In a character class, neither . nor $ is special, so they do not need escaping in [%.$] or [%$].
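If the % values must also be closed by a matching %, one possible variation is to give each delimiter its own alternative. This is only a sketch under that assumption, not part of the pattern above; the class name PlaceholderExtractor and the method extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderExtractor {
    // Alternative 1: the text between a pair of % symbols (Group 1).
    // Alternative 2: the text after a $ up to the next %, dot or $ (Group 2).
    private static final Pattern PLACEHOLDER =
            Pattern.compile("%([^%]*)%|\\$([^%.$]+)");

    // Hypothetical helper: returns every captured name in order of appearance.
    static List<String> extract(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            names.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extract("/bla/$V_N.$XYZ.bla"));  // [V_N, XYZ]
        System.out.println(extract("/bla/%abc%/$V_N.txt")); // [abc, V_N]
    }
}
```

With separate alternatives, the closing % of a pair is consumed as part of the match, so it cannot start a new (empty) match of its own.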
String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00
Wherever /Jack/M appears in the string, I want to pull out the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it, using a regex each time.
I tried something like this https://regex101.com/r/4te0Lg/1 but it still doesn't work.
.Section(\d+(?:\.\d+)?).*/Jack/M
If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has 2 and only 2 decimal places.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example on how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher
import java.util.regex.*;
public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // The decimal point is escaped (\\.) so that it only matches a literal '.'
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        Matcher m;
        String[] tokens = s.split(",(?=#)"); // split the sections into different strings
        for (String t : tokens) { // check every string that we got from the split
            m = p.matcher(t);
            if (m.matches()) { // if the string matches the pattern, print the capturing groups
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
            }
        }
    }
}
You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character that is not the start of #Section followed by a digit
\bJack/M, Match the word Jack and /M,
\d+\h+ Match 1+ digits and 1+ horizontal whitespace chars
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
Regex demo
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
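A minimal sketch of how that pattern could be used with find(), assuming the sections sit on one line as in the sample string of the earlier answer (a dot does not match line breaks by default). The class name JackSections and the method findPairs are invented for the example; \h needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JackSections {
    private static final Pattern JACK = Pattern.compile(
            "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Hypothetical helper: returns "section=value" for every section that contains Jack/M.
    static List<String> findPairs(String s) {
        List<String> pairs = new ArrayList<>();
        Matcher m = JACK.matcher(s);
        while (m.find()) {
            pairs.add(m.group(1) + "=" + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        System.out.println(findPairs(s)); // [250342=10.00, 251234=11.00]
    }
}
```

Because the tempered token refuses to step over the next #Section, a section without Jack/M cannot borrow the value of a later one.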
I need to construct a regular expression such that it should not allow / at the start or end, and there should not be more than one / in sequence.
Valid Expression: AB/CD
Valid Expression: AB
Invalid Expression: //AB//CD//
Invalid Expression: ///////
Invalid Expression: AB////////
The / character is just a separator between two words. It should not appear more than once in a row between words.
Assuming you only want to allow alphanumerics (including underscore) between slashes, it's pretty trivial:
boolean foundMatch = subject.matches("\\w+(?:/\\w+)*");
Explanation:
\w+ # Match one or more alnum characters
(?: # Start a non-capturing group
/ # Match a single slash
\w+ # Match one or more alnum characters
)* # Match that group any number of times
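A quick way to check this pattern against the examples from the question (the class name SlashPathCheck and the method isValid are just placeholders for this sketch):

```java
class SlashPathCheck {
    // matches() must consume the whole string, so no ^ or $ is needed.
    static boolean isValid(String subject) {
        return subject.matches("\\w+(?:/\\w+)*");
    }

    public static void main(String[] args) {
        String[] samples = {"AB/CD", "AB", "//AB//CD//", "///////", "AB////////"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
        // AB/CD -> true
        // AB -> true
        // //AB//CD// -> false
        // /////// -> false
        // AB//////// -> false
    }
}
```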
This regex does it:
^(?!/)(?!.*//).*[^/]$
So in java:
if (str.matches("(?!/)(?!.*//).*[^/]"))
Note that ^ and $ are implied by matches(), because matches must match the whole string to be true.
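For comparison, here is a small sketch of the lookahead version in use (SlashLookaheadCheck and isValid are hypothetical names). Note that it only constrains the slashes: unlike a \w-based pattern it accepts any other characters between them, and it rejects the empty string because [^/] needs one character.

```java
class SlashLookaheadCheck {
    // (?!/)     : the string must not start with a slash
    // (?!.*//)  : there must be no double slash anywhere
    // .*[^/]    : anything, as long as the last character is not a slash
    static boolean isValid(String str) {
        return str.matches("(?!/)(?!.*//).*[^/]");
    }

    public static void main(String[] args) {
        System.out.println(isValid("AB/CD"));      // true
        System.out.println(isValid("A B/C-D"));    // true, non-word characters are allowed
        System.out.println(isValid("AB////////")); // false
        System.out.println(isValid(""));           // false
    }
}
```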
[a-zA-Z]+(/[a-zA-Z]+)+
It matches
a/b
a/b/c
aa/vv/cc
It doesn't match
a
/a/b
a//b
a/b/
Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Reg {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-zA-Z]+(/[a-zA-Z]+)+");
        Matcher matcher = pattern.matcher("a/b/c");
        System.out.println(matcher.matches());
    }
}
I'm trying to get behaviour similar to what \ has in Java String literals: if there are two of them, it's a literal \, otherwise it "escapes" whatever follows. So if a delimiter follows a single release char, it doesn't count as a delimiter. But two release chars resolve to a release char literal, so the delimiter that follows should then be considered a delimiter. In other words, if an odd number of release chars precedes a delimiter, it's ignored; for zero or an even number, it's a delimiter. So, in the examples below (? is the release char, : the delimiter):
?: <- : is not a delimiter
??: <- : is a delimiter
???: <- : is not a delimiter
????: <- : is a delimiter
Here's sample code showing what doesn't work.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestPattern
{
    public static void main(final String[] args)
    {
        final Matcher m = Pattern.compile("(\\?\\?)*[^\\?]\\:").matcher("a??:b:c");
        m.find(0);
        System.out.println(m.end());
    }
}
The following should work
\b(\?{2})*:
The * means there can be zero of that group. So that capturing group can be the empty string. [^\\?] can be any character that isn't a ?, as ? is not a special character inside a character class. The \ is ignored.
Therefore, b: (with an empty string preceding it) matches, and the second colon is your last (and, in this case, first) match.
I think you simply want "(\\?\\?)*\\?:".
Your regex means:
Zero or more '??'
(\\?\\?)*
Followed by not '?'
[^\\?]
Ending in ':'
\\:
So, your last match is the last colon. That's why the result offset is 6.
You could change for:
final Matcher m = Pattern.compile("((\\?){2})+").matcher("a??:b:????:c");
while (m.find()) {
    // outputs 1 and 6, the places where
    // you would have to start escaping...
    System.out.println(m.start());
}
It appears that just by reversing the regex, it works. Putting the "don't match a ?" first, and then the "any even number of ?'s", seems to do the trick:
[^?](\\?\\?)*:
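One possible variation on this idea, as a sketch only: replace the leading [^?] with a negative lookbehind so that a delimiter at the very start of the input is also found and no extra character is consumed. The names ReleaseCharDelimiters and delimiterIndexes are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReleaseCharDelimiters {
    // (?<!\?)    : the run of release chars must not be preceded by another '?'
    // (?:\?\?)*  : an even number (possibly zero) of release chars
    // :          : the delimiter itself
    private static final Pattern DELIMITER = Pattern.compile("(?<!\\?)(?:\\?\\?)*:");

    // Hypothetical helper: returns the index of every ':' that counts as a delimiter.
    static List<Integer> delimiterIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = DELIMITER.matcher(input);
        while (m.find()) {
            indexes.add(m.end() - 1); // the ':' is the last character of each match
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(delimiterIndexes("a??:b:c"));             // [3, 5]
        System.out.println(delimiterIndexes("a?:b??:c???:d????:e")); // [6, 17]
    }
}
```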
I am trying to create a hexadecimal calculator but I have a problem with the regex.
Basically, I want the string to only accept 0-9, A-E, and special characters +-*_
My code keeps returning false no matter how I change the regex, and adding the asterisk gives me a PatternSyntaxException.
public static void main(String[] args) {
    String input = "1A_16+2B_16-3C_16*4D_16";
    String regex = "[0-9A-E+-_]";
    System.out.println(input.matches(regex));
}
Also whenever I add the * as part of the regex it gives me this error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 9
[0-9A-E+-*_]+
         ^
You need to match more than one character with your regex. As it currently stands you only match one character.
To match one or more characters add a + to the end of the regex
[0-9A-E+-_]+
Also to match a * just add a star in the brackets so the final regex would be
[0-9A-E+\\-_*]+
You need to escape the -, otherwise the regex thinks you want to accept all characters between + and _, which is not what you want.
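A small sketch that shows the difference (HexInputCheck and compiles are names invented for this example):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HexInputCheck {
    // Hypothetical helper: true if the regex compiles, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "1A_16+2B_16-3C_16*4D_16";
        // Escaped '-': a literal minus, so +, -, _ and * are all accepted.
        System.out.println(input.matches("[0-9A-E+\\-_*]+")); // true
        // Unescaped: +-_ is the range from '+' to '_', which does not contain '*' ...
        System.out.println(input.matches("[0-9A-E+-_]+"));    // false
        // ... but does contain characters such as 'Z'.
        System.out.println("Z".matches("[0-9A-E+-_]+"));      // true
        // +-* is a reversed range ('+' comes after '*'), hence the exception.
        System.out.println(compiles("[0-9A-E+-*_]+"));        // false
    }
}
```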
Your regex is OK and there should be no exceptions; just add + at the end of the regex, which means one or more characters like those in the brackets. It seems you wanted * as well.
"[0-9A-E+-_]+"
public static boolean isValidCode(String code) {
    Pattern p = Pattern.compile("[fFtTvV\\-~^<>()]+"); //a-zA-Z
    Matcher m = p.matcher(code);
    return m.matches();
}
I have a string that looks something like the following:
12,44,foo,bar,(23,45,200),6
I'd like to create a regex that matches the commas, but only the commas that are not inside of parentheses (in the example above, all of the commas except for the two after 23 and 45). How would I do this (Java regular expressions, if that makes a difference)?
Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):
Pattern regex = Pattern.compile(
    ", # Match a comma\n" +
    "(?! # only if it's not followed by...\n" +
    " [^(]* # any number of characters except opening parens\n" +
    " \\) # followed by a closing parens\n" +
    ") # End of lookahead",
    Pattern.COMMENTS);
This regex uses a negative lookahead assertion to ensure that the next parenthesis after the comma (if any) is not a closing parenthesis. Only then is the comma allowed to match.
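As a minimal sketch, the same lookahead (written on one line, without Pattern.COMMENTS) can be passed straight to String.split. The class name SplitOutsideParens is a placeholder, and the no-nested-parentheses assumption still applies.

```java
import java.util.Arrays;

class SplitOutsideParens {
    // Split on commas that are not followed by "...)" without an intervening "(".
    static String[] split(String input) {
        return input.split(",(?![^(]*\\))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("12,44,foo,bar,(23,45,200),6")));
        // [12, 44, foo, bar, (23,45,200), 6]
    }
}
```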
Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Also the existing solution checks that the comma is not followed by a parenthesis, but that does not guarantee that it is embedded in parentheses.
The regex is very simple:
\(.*?\)|(,)
The left side of the alternation matches a complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.
In this demo, you can see the Group 1 captures in the lower right pane.
You said you want to match the commas, but you can use the same general idea to split or replace.
To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.
import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;
class Program {
    public static void main (String[] args) throws java.lang.Exception {
        String subject = "12,44,foo,bar,(23,45,200),6";
        Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
        Matcher regexMatcher = regex.matcher(subject);
        List<String> group1Caps = new ArrayList<String>();
        // put Group 1 captures in a list
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                group1Caps.add(regexMatcher.group(1));
            }
        } // end of building the list
        // What are all the matches?
        System.out.println("\n" + "*** Matches ***");
        if (group1Caps.size() > 0) {
            for (String match : group1Caps) System.out.println(match);
        }
    } // end main
} // end Program
Here is a live demo
To use the same technique for splitting or replacing, see the code samples in the article in the reference.
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
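For splitting, one way to apply the Group 1 idea is to cut the string at every match where Group 1 is set. This is my own sketch of that, not code taken from the referenced article; SplitByGroupOne and its split method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitByGroupOne {
    private static final Pattern COMMA_OR_PARENS = Pattern.compile("\\(.*?\\)|(,)");

    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = COMMA_OR_PARENS.matcher(subject);
        int last = 0; // start of the piece currently being collected
        while (m.find()) {
            if (m.group(1) != null) { // a comma outside parentheses
                parts.add(subject.substring(last, m.start()));
                last = m.end();
            }
            // Matches of the left alternative (whole parenthesized groups) are skipped.
        }
        parts.add(subject.substring(last)); // the remainder after the last comma
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("12,44,foo,bar,(23,45,200),6"));
        // [12, 44, foo, bar, (23,45,200), 6]
    }
}
```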
I don’t understand this obsession with regular expressions, given that they are unsuited to most tasks they are used for.
// the text before '(' plus the text after ')'
String beforeParen = longString.substring(0, longString.indexOf('(')) + longString.substring(longString.indexOf(')') + 1);
int firstComma = beforeParen.indexOf(',');
while (firstComma != -1) {
    /* do something. */
    firstComma = beforeParen.indexOf(',', firstComma + 1);
}
(Of course this assumes that there is always exactly one opening parenthesis and one matching closing parenthesis somewhere after it.)
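If that assumption is too strict, a plain character scan with a depth counter is another regex-free option. The sketch below is a different technique from the substring approach above: it splits on top-level commas and, as a side effect, also copes with nested or multiple parenthesized groups. DepthSplit is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

class DepthSplit {
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // how many parentheses are currently open
        int start = 0; // start index of the current piece
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(input.substring(start)); // the piece after the last top-level comma
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("12,44,foo,bar,(23,45,200),6")); // [12, 44, foo, bar, (23,45,200), 6]
        System.out.println(split("a,(b,(c,d)),e"));               // [a, (b,(c,d)), e]
    }
}
```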