I'm trying to write a line of regex that performs the following:
A string variable that can contain only:
The letters a to z (upper and lowercase) (zero or many times)
The hyphen character (zero or many times)
The single quote character (zero or one time)
The space character (zero or one time)
Tried searching through many regex websites
.matches("([a-zA-Z_0-9']*(\\s)?)(-)?"))
This comes close to what I want; however, you can't type a-z anymore after you have typed a space character. So it's sequential in a way. I want the validation to allow any sequence of those characters.
Expected:
Allowed to type a string that has any amount of a-zA-Z, zero to one space, zero to one dash, anywhere throughout the string.
This is a validation for that
"^(?!.*\\s.*\\s)(?!.*'.*')[a-zA-Z'\\s-]*$"
Expanded
^ # Begin
(?! .* \s .* \s ) # Max single whitespace
(?! .* ' .* ' ) # Max single, single quote
[a-zA-Z'\s-]* # Optional a-z, A-Z, ', whitespace or - characters
$ # End
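As a quick way to try the pattern above, here is a minimal sketch that runs it against a few sample inputs. The class name `NameValidator` and the helper `isValid` are hypothetical names chosen for illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Pattern;

class NameValidator {
    // Pattern copied from the answer above; compiled once and reused.
    private static final Pattern NAME =
            Pattern.compile("^(?!.*\\s.*\\s)(?!.*'.*')[a-zA-Z'\\s-]*$");

    // Hypothetical helper: true when the whole input satisfies the rules.
    static boolean isValid(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        // One space and one quote pass; two spaces, two quotes, or a digit fail.
        String[] samples = {"Mary-Jane O'Neil", "a b c", "it's o'clock", "abc1"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Note that the empty string is also accepted, because every part of the pattern is optional.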
I guess,
^(?!.*([ ']).*\\1)[A-Za-z' -]*$
might work OK.
Here,
(?!.*([ ']).*\\1)
we are trying to say that, if a space or a single quote (') appears twice in the string, that string is excluded, so we keep only the strings in which each of them occurs zero or one time.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "^(?!.*([ ']).*\\1)[A-Za-z' -]*$";
final String string = "abcAbc- ";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
Output
Full match: abcAbc-
Group 1: null
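The test above only shows an accepted input. A small sketch of the rejected cases may help as well; the class and method names here are made up for illustration. Because `\\1` refers back to whatever `([ '])` captured, only a repeat of the same character (two spaces, or two quotes) triggers the rejection.

```java
import java.util.regex.Pattern;

class BackreferenceCheck {
    // Same pattern as in the test above.
    private static final Pattern P =
            Pattern.compile("^(?!.*([ ']).*\\1)[A-Za-z' -]*$");

    // Hypothetical helper: true when the whole string passes the check.
    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcAbc- "));      // one space
        System.out.println(isValid("O'Neil Smith"));  // one quote and one space
        System.out.println(isValid("a b c"));         // two spaces
        System.out.println(isValid("don't can't"));   // two quotes
    }
}
```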
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
Related
I am having problems understanding how a regular expression can match text but not include the matched text in the result. Perhaps I need to be working with groups, which I'm not doing because I usually see the term non-capturing groups being used.
The goal is say I have ticket in a log file as follows:
TICKET/A/ADMIN/05MAR2020// to return only A/ADMIN/05MAR2020
or if
TICKET/A/ENGINEERING/05MAR2020. to return only A/ENGINEERING/05MAR2020
where the "//" or "." has been removed
Lastly to ignore lines like:
TICKET HAS BEEN COMPLETED
using regex = "(?<=^TICKET\\s{0,2}/).*(?://|\\.)?"
So telling parser look for TICKET at start of string followed by a forward slash, but don't return TICKET. And look for either a double forward slash "//" or "." a period at the end of string but make this optional.
My Java 1.8.x code follows:
// used in the import statement: import java.util.regex.Matcher;
// import java.util.regex.Pattern;
private static void testRegex() {
String ticket1 = "TICKET/A/ITSUPPORT/05MAR2020//";
String ticket2 = "TICKET /B/ADMIN/06MAR2020.";
String ticket3 = "TICKET/C/GENERAL/07MAR2020";
//https://www.regular-expressions.info/brackets.html
String regex = "(?<=^TICKET\\s{0,2}/).*(?://|\\.)?";
Pattern pat = Pattern.compile(regex);
Matcher mat = pat.matcher(ticket1);
if (mat.find()) {
String myticket = ticket1.substring(mat.start(), mat.end());
System.out.println(myticket+ ", Expect 'A/ITSUPPORT/05MAR2020'");
}
mat = pat.matcher(ticket2);
if (mat.find()) {
String myticket = ticket2.substring(mat.start(), mat.end());
System.out.println(myticket+", Expect 'B/ADMIN/06MAR2020'");
}
mat = pat.matcher(ticket3);
if (mat.find()) {
String myticket = ticket3.substring(mat.start(), mat.end());
System.out.println(myticket+", Expect 'C/GENERAL/07MAR2020'");
}
regex = "(//|\\.)";
pat = Pattern.compile(regex);
mat = pat.matcher(ticket1);
if (mat.find()) {
String myticket = ticket1.substring(mat.start(), mat.end());
System.out.println(myticket+", "+mat.start() + ", " + mat.end() + ", " + mat.groupCount());
}
}
My actual results follow:
A/ITSUPPORT/05MAR2020//, Expect 'A/ITSUPPORT/05MAR2020'
B/ADMIN/06MAR2020., Expect 'B/ADMIN/06MAR2020'
C/GENERAL/07MAR2020, Expect 'C/GENERAL/07MAR2020'
//, 28, 30, 1
Any suggestion would be appreciated. Please note that I have been learning from StackOverflow for a long time but this is my first entry; I hope the question is asked appropriately. Thank you.
You could use a positive lookahead at the end of the pattern instead of a match.
The lookahead asserts what is at the end of the string is an optional // or .
As the dot and the double forward slash are optional, you have to make the .*? non greedy.
(?<=^TICKET\s{0,2}/).*?(?=(?://|\.)?$)
In parts
(?<= Positive lookbehind, assert what is on the left is
^ Start of the string
TICKET\s{0,2}/ Match TICKET and 0-2 whitespace chars followed by /
) Close lookbehind
.*? Match any char except a newline 0+ times, as few as possible (non greedy)
(?= Positive lookahead, assert what is on the right is
(?: Non capture group for the alternation | because both can be followed by $
// Match 2 forward slashes
| Or
\. Match a dot
)? Close the non capture group and make it optional
$ Assert the end of the string
) Close the positive lookahead
In Java
String regex = "(?<=^TICKET\\s{0,2}/).*?(?=(?://|\\.)?$)";
Regex demo 1 | Java demo
1. The regex demo has Javascript selected for the demo only
Output of the updated pattern with your code:
A/ITSUPPORT/05MAR2020, Expect 'A/ITSUPPORT/05MAR2020'
B/ADMIN/06MAR2020, Expect 'B/ADMIN/06MAR2020'
C/GENERAL/07MAR2020, Expect 'C/GENERAL/07MAR2020'
//, 28, 30, 1
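Since the question mentions working with groups, a possible alternative to the lookarounds is a plain capturing group. This is a different technique from the lookahead shown above, sketched under the assumption that each log line is matched on its own; `TicketParser` and `parse` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TicketParser {
    // Group 1 holds the ticket body; the optional trailing "//" or "."
    // is matched outside the group, so it is not part of the result.
    private static final Pattern TICKET =
            Pattern.compile("^TICKET\\s{0,2}/(.*?)(?://|\\.)?$");

    // Hypothetical helper: empty when the line is not a ticket line.
    static Optional<String> parse(String line) {
        Matcher m = TICKET.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("TICKET/A/ITSUPPORT/05MAR2020//"));
        System.out.println(parse("TICKET /B/ADMIN/06MAR2020."));
        System.out.println(parse("TICKET/C/GENERAL/07MAR2020"));
        System.out.println(parse("TICKET HAS BEEN COMPLETED"));
    }
}
```

The last line has no slash after TICKET, so the pattern does not match and the helper returns an empty Optional.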
Considering the following string: "${test.one}${test.two}" I would like my regex to return two matches, namely "test.one" and "test.two". To do that I have the following snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final Pattern pattern = Pattern.compile("\\$\\{((?:(?:[A-z]+(?:\\.[A-z0-9()\\[\\]\"]+)*)+|(?:\"[\\w/?.&=_\\-]*\")+)+)}+$");
public static void main(String[] args) {
String testString = "${test.one}${test.two}";
Matcher matcher = pattern.matcher(testString);
while (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
I have some other stuff in there as well, because I want this to also be a valid match ${test.one}${"hello"}.
So, basically, I just want it to match on anything inside of ${} as long as it either follows the format: something.somethingelse (alphanumeric only there) or something.somethingElse() or "something inside of quotations" (alphanumeric plus some other characters). I have the main regex working, or so I think, but when I run the code, it finds two groups,
${test.two}
test.two
I want the output to be
test.one
test.two
Basically, the main problem with your regex is that it matches only at the end of the string, and that [A-z] matches many more chars than just letters. Your grouping also seems off.
If you load your regex at regex101, you will see it matches
\$\{
( - start of a capturing group
(?: - start of a non-capturing group
(?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)
(?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
)+ - repeat the non-capturing group 1 or more times
| - or
(?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
)+ - repeat the group pattern 1+ times
) - end of the capturing group
}+ - 1+ } chars
$ - end of string.
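To make the [A-z] point concrete, here is a small sketch that lists the non-letter characters the range picks up. The class and method names are invented for this illustration.

```java
import java.util.regex.Pattern;

class AzRangeDemo {
    // Collects every char between 'A' and 'z' that [A-z] matches
    // although it is not a letter.
    static String nonLettersMatchedByAz() {
        Pattern az = Pattern.compile("[A-z]");
        StringBuilder extras = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++) {
            if (!Character.isLetter(c) && az.matcher(String.valueOf(c)).matches()) {
                extras.append(c);
            }
        }
        return extras.toString();
    }

    public static void main(String[] args) {
        // The six ASCII characters that sit between 'Z' and 'a'.
        System.out.println(nonLettersMatchedByAz());
    }
}
```

Using [A-Za-z] instead avoids matching these six characters.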
To match any occurrence of your pattern inside a string, you need to use
\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}
See the regex demo, get Group 1 value after a match is found. Details:
\$\{ - a ${ substring
(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:
\"[^\"]*\" - ", 0+ chars other than " and then a "
| - or
\w+(?:\(\))? - 1+ word chars and an optional () substring
(?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
} - a } char.
See the Java demo:
String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
Output:
test.one
test.two
test.one
test.two()
test.one
"hello"
You could use the regular expression
(?<=\$\{")[a-z]+(?="\})|(?<=\$\{)[a-z]+\.[a-z]+(?:\(\))?(?=\})
which has no capture groups. The character classes [a-z] can be modified as required, provided they do not include a double-quote, period or right brace.
Demo
Java's regex engine performs the following operations.
(?<=\$\{") # match '${"' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
(?="\}) # match '"}' in a positive lookahead
| # or
(?<=\$\{) # match '${' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
\.[a-z]+ # match '.' followed by 1+ lowercase letters
(?:\(\))? # optionally match `()`
(?=\}) # match '}' in a positive lookahead
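A minimal Java sketch of how this lookaround-only expression could be used follows. The input string and the names `LookaroundDemo` and `extract` are assumptions for illustration. Because there are no capture groups, the whole match (`group()`) is the value of interest, and quoted values come back without their quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // The expression from above, escaped for a Java string literal.
    private static final Pattern P = Pattern.compile(
            "(?<=\\$\\{\")[a-z]+(?=\"\\})|(?<=\\$\\{)[a-z]+\\.[a-z]+(?:\\(\\))?(?=\\})");

    // Hypothetical helper: returns every full match in order.
    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("${test.one}${test.two()}${\"hello\"}"));
    }
}
```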
I have a search string.
When it contains a dollar symbol, I want to capture all characters thereafter, but not include the dot or a subsequent dollar symbol. The latter would constitute a subsequent match.
So for either of these search strings:
"/bla/$V_N.$XYZ.bla";
"/bla/$V_N.$XYZ";
I would want to return:
V_N
XYZ
If the search string contains percent symbols, I also want to return what's between the pair of % symbols.
The following regex seems to do the trick for that.
"%([^%]*?)%";
Inferring:
Start and end with a %,
Have a capture group - the ()
have a character class containing anything except a % symbol, (caret infers not a character)
repeated - but not greedily *?
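For reference, the percent-pair regex described above could be exercised like this. It is only a sketch; the sample input and the names `PercentTokens` and `extract` are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentTokens {
    // Group 1 is the text between a pair of % symbols.
    private static final Pattern P = Pattern.compile("%([^%]*?)%");

    // Hypothetical helper: collects the Group 1 value of every match.
    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("/bla/%FOO%/x/%BAR%"));
    }
}
```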
Where some languages allow %1, %2, for capture groups, Java uses backslash\number syntax instead. So, this string compiles and generates output.
I suspect the dollar symbol and dot need escaping, as they are special symbols:
$ is usually end of string
. is a meta sequence for any character.
I have tried escaping them with double backslashes (\\),
both as character classes, e.g. [^\\.\\$%]
and using OR'd notation %|\\$
in attempts to combine this logic, and can't seem to get anything to play ball.
I wonder if another pair of eyes can see how to solve this conundrum!
My attempts so far:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String[] args) {
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "([%\\$])([^%\\.\\$]*?)\\1?";
/* Either % or $ in first capture group ([%\\$])
* Second capture group - anything except %, dot or dollar sign
* non greedy group ( *?)
* then a backreference to an optional first capture group \\1?
* Have to use two \, since you escape \ in a Java string.
*/
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(search);
List<String> results = new ArrayList<String>();
while (m.find())
{
for (int i = 0; i<= m.groupCount(); i++) {
results.add(m.group(i));
}
}
for (String result : results) {
System.out.println(result);
}
}
}
The following links may be helpful:
An interactive Java playground where you can experiment and copy/paste code.
Regex101
Java RegexTester
Java backreferences (The optional backreference \\1 in the Regex).
Link that summarises Regex special characters often found in languages
Java Regex book EPub link
Regex Info Website
Matcher class in the Javadocs
You may use
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "[%$]([^%.$]*)";
Matcher matcher = Pattern.compile(pattern).matcher(search);
while (matcher.find()){
System.out.println(matcher.group(1));
} // => V_N, XYZ
See the Java demo and the regex demo.
NOTE
You do not need an optional \1? at the end of the pattern. As it is optional, it does not restrict the match context and is redundant (the negated character class already cannot match $ or %).
[%$]([^%.$]*) matches % or $, then captures into Group 1 any zero or more chars other than %, . and $. You only need the Group 1 value, hence matcher.group(1) is used.
In a character class, neither . nor $ are special, thus, they do not need escaping in [%.$] or [%$].
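The last point can be checked with a tiny sketch (class and method names are hypothetical): if the dot were still a wildcard inside the brackets, the letter "a" would match too.

```java
import java.util.regex.Pattern;

class CharClassLiterals {
    // Inside [...] the dot and dollar sign are plain characters.
    static boolean inClass(String ch) {
        return Pattern.matches("[%.$]", ch);
    }

    public static void main(String[] args) {
        System.out.println(inClass("."));  // literal dot
        System.out.println(inClass("$"));  // literal dollar sign
        System.out.println(inClass("a"));  // not in the class
    }
}
```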
I need only specific number of special characters in a password. I tried the following regex
(?=.*[$#!%*?&]{1})
It takes special character from that set but accepts even multiple special characters.
{1} means number of characters from the set I am allowing the string to validate.
For example,
Alpha1? should be true for the above regular expression
#lpha1? should not be validated by above regex because now it has 2 characters from that set.
Can someone please help?
Any help is appreciated. Thanks in advance
Try this Regex:
^[^$#!%*?&\n]*[$#!%*?&][^$#!%*?&\n]*$
Explanation:
^ - asserts the start of the string
[^$#!%*?&\n]* - matches 0+ occurrences of any character that is NOT in this set of characters: $, #, !, %, *, ?, & or a newline character
[$#!%*?&] - matches one occurrence of one of these characters: $, #, !, %, *, ?, &
[^$#!%*?&\n]* - matches 0+ occurrences of any character that is NOT in this set of characters: $, #, !, %, *, ?, & or a newline character
$ - asserts the end of the string
Click for Demo
JAVA Code:(Generated here)
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "^[^$#!%*?&\\n]*[$#!%*?&][^$#!%*?&\\n]*$";
final String string = "abc12312\n"
+ "$123\n"
+ "$123?\n"
+ "Alpha1?\n"
+ "#lpha1?";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Update:
To get the strings with exactly 2 special characters, use this:
^(?:[^$#!%*?&\n]*[$#!%*?&]){2}[^$#!%*?&\n]*$
To get strings with exactly 5 special characters, replace {2} with {5}.
To get strings with 2-5 special characters, use {2,5}
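The quantifier can also be filled in programmatically. The following is a sketch only; `SpecialCharCount`, `between` and `hasBetween` are hypothetical names, and the character set is the one used in the patterns above.

```java
import java.util.regex.Pattern;

class SpecialCharCount {
    // Character set taken from the answer; '\n' is excluded as in the original.
    private static final String SPECIAL = "[$#!%*?&]";
    private static final String OTHER = "[^$#!%*?&\\n]*";

    // Hypothetical helper: pattern requiring between min and max special chars.
    static Pattern between(int min, int max) {
        return Pattern.compile(
                "^(?:" + OTHER + SPECIAL + "){" + min + "," + max + "}" + OTHER + "$");
    }

    static boolean hasBetween(String s, int min, int max) {
        return between(min, max).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasBetween("Alpha1?", 1, 1)); // exactly one
        System.out.println(hasBetween("#lpha1?", 1, 1)); // two, so rejected
        System.out.println(hasBetween("#lpha1?", 2, 5)); // within 2-5
    }
}
```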
In Java you can use the method replaceAll to remove every character that is not in your set of special characters. The method takes a regular expression as its first argument. The length of the result is the count of special characters:
String password = "#foo!";
int size = password.replaceAll("[^$#!%*?&]","").length();
System.out.println(size);
// will print 2
I have a string that looks something like the following:
12,44,foo,bar,(23,45,200),6
I'd like to create a regex that matches the commas, but only the commas that are not inside of parentheses (in the example above, all of the commas except for the two after 23 and 45). How would I do this (Java regular expressions, if that makes a difference)?
Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):
Pattern regex = Pattern.compile(
", # Match a comma\n" +
"(?! # only if it's not followed by...\n" +
" [^(]* # any number of characters except opening parens\n" +
" \\) # followed by a closing parens\n" +
") # End of lookahead",
Pattern.COMMENTS);
This regex uses a negative lookahead assertion to ensure that the next following parenthesis (if any) is not a closing parenthesis. Only then the comma is allowed to match.
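One way this lookahead could be put to work is with `String.split`, sketched below on the string from the question. The class and method names are illustrative, and the same no-nesting assumption applies.

```java
import java.util.Arrays;
import java.util.List;

class CommaSplit {
    // Splits on commas that are not followed by a ")" before the next "(".
    static List<String> splitOutsideParens(String s) {
        return Arrays.asList(s.split(",(?![^(]*\\))"));
    }

    public static void main(String[] args) {
        System.out.println(splitOutsideParens("12,44,foo,bar,(23,45,200),6"));
        // prints [12, 44, foo, bar, (23,45,200), 6]
    }
}
```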
Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Also, the existing solution checks that the comma is not followed by a closing parenthesis, but that does not guarantee that the comma is actually enclosed in parentheses.
The regex is very simple:
\(.*?\)|(,)
The left side of the alternation matches complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.
In this demo, you can see the Group 1 captures in the lower right pane.
You said you want to match the commas, but you can use the same general idea to split or replace.
To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.
import java.util.*;
import java.io.*;
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) throws java.lang.Exception {
String subject = "12,44,foo,bar,(23,45,200),6";
Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
Matcher regexMatcher = regex.matcher(subject);
List<String> group1Caps = new ArrayList<String>();
// put Group 1 captures in a list
while (regexMatcher.find()) {
if(regexMatcher.group(1) != null) {
group1Caps.add(regexMatcher.group(1));
}
} // end of building the list
// What are all the matches?
System.out.println("\n" + "*** Matches ***");
if(group1Caps.size()>0) {
for (String match : group1Caps) System.out.println(match);
}
} // end main
} // end Program
Here is a live demo
To use the same technique for splitting or replacing, see the code samples in the article in the reference.
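As a rough idea of what splitting with this technique might look like (this is my own sketch, not the code from the referenced article; `GroupOneSplit` and `split` are hypothetical names), the Group 1 commas can serve as cut points while the parenthesized matches are skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOneSplit {
    private static final Pattern P = Pattern.compile("\\(.*?\\)|(,)");

    // Cuts the subject at every comma captured in Group 1.
    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(subject);
        int start = 0;
        while (m.find()) {
            if (m.group(1) != null) {          // a comma outside parentheses
                parts.add(subject.substring(start, m.start()));
                start = m.end();
            }                                  // else: "(...)" match, ignored
        }
        parts.add(subject.substring(start));   // trailing piece
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("12,44,foo,bar,(23,45,200),6"));
        // prints [12, 44, foo, bar, (23,45,200), 6]
    }
}
```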
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
I don’t understand this obsession with regular expressions, given that they are unsuited to most tasks they are used for.
String beforeParen = longString.substring(0, longString.indexOf('(')) + longString.substring(longString.indexOf(')') + 1);
int firstComma = beforeParen.indexOf(',');
while (firstComma != -1) {
/* do something. */
firstComma = beforeParen.indexOf(',', firstComma + 1);
}
(Of course this assumes that there is always exactly one opening parenthesis and one matching closing parenthesis at some point after it.)
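If that assumption is too strict, a regex-free variant could instead walk the string once and keep a parenthesis depth counter. This is a different technique from the substring approach above, shown only as a sketch with hypothetical names; it reports the positions of the commas in the original string and also copes with nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelCommas {
    // Returns the indices of commas that are outside all parentheses.
    static List<Integer> find(String s) {
        List<Integer> positions = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (c == ',' && depth == 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(find("12,44,foo,bar,(23,45,200),6"));
        // prints [2, 5, 9, 13, 25]
    }
}
```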