I am looking for regex which can help me replace strings like
source=abc/task=cde/env=it --> source='abc'/task='cde'/env='it'
To be more precise, I want to replace a string which starts with = and ends with either / or end of the string with ''
Tried code like this
"source=abc/task=cde/env=it".replaceAll("=(.*?)/","'$1'")
But that results in
source'abc'task'cde'env=it
Using lookahead and look behind:
(?<==)([^/]*)((?=/)|$)
Lookbehind allows you to specify what comes before your match. In this case an equals: (?<==).
The main match in my regex looks for any non-slash character, zero or more times: ([^/]*)
Lookahead allows you to specify what comes after your match. In this case, a slash: (?=/).
The $ matches the end of the line, so that the last item in your test data becomes quoted. ((?=/)|$) combines with this with the lookahead, meaning "either a slash comes after the match or this is the end of the line".
Here it is in action in a test.
#Test
public void test_quote_items() {
String regex = "(?<==)([^/]*)((?=/)|$)";
String actual = "source=abc/task=cde/env=it".replaceAll(regex,"'$1'");
String expected = "source='abc'/task='cde'/env='it'";
assertEquals(expected, actual);
}
Try
String input = "source=abc/task=cde/env=it".replaceAll("=(.*?)(/|$)","='$1'/");
The problems I found are that you are not replacing the =
and also the / is not there for the end of String, that also needs to be replaced when found.
output
source='abc'/task='cde'/env='it'/
If you don't want the last '/', that is trivial to remove isn't it.
I want to split an input string based on the regex pattern using Pattern.split(String) api. The regex uses both positive and negative lookaheads. The regex is supposed to split on a delimiter (,) and needs to ignore the delimiter if it is enclosed in double inverted quotes("x,y").
The regex is - (?<!(?<!\Q\\E)\Q\\E)\Q,\E(?=(?:[^\Q"\E]*(?<=\Q,\E)\Q"\E[[^\Q,\E|\Q"\E] | [\Q"\E]]+[^\Q"\E]*[^\Q\\E]*[\Q"\E]*)*[^\Q"\E]*$)
The input string for which this split call is getting timed out is -
"","1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]","QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I read that the lookup technics are heavy and can cause the timeouts if the string is too long. And if I remove the backward slashes enclosing [\"BOLT,HI-JOK\"] at the end of the string, then the regex is able to detect and split.
The pattern also does not detect the first delimiter at place [STIFFENER]","QH20426AD3 with the above string. But if I remove the backward slashes enclosing [\"BOLT,HI-JOK\"] at the end of the string, then the regex is able to detect it.
I am not very experienced with the lookup in regex, can some one please give hints about how can I optimize this regex and avoid time outs?
Any pointers, article links are appreciated!
If you want to split on a comma, and the strings that follow are from an opening till closing double quote after it:
,(?="[^"\\]*(?:\\.[^"\\]*)*")
The pattern matches:
, Match a comma
(?= Positive lookahad
"[^"\\]* Match " and 0+ times any char except " or \
(?:\\.[^"\\]*)*" Optionally repeat matching \ to escape any char using the . and again match any chars other than " and /
) Close lookahead
Regex demo | Java demo
String string = "\"\",\"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]\",\"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\\\"BOLT,HI-JOK\\\"]\"\n";
String[] parts = string.split(",(?=\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
for (String part : parts)
System.out.println(part);
Output
""
"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]"
"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I have the following regular expression that I'm using to remove the dev. part of my URL.
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll(".*\\.(?=.*\\.)", ""));
Outputs: mydomain.com but this is giving me issues when the domains are in the vein of dev.mydomain.com.pe or dev.mydomain.com.uk in those cases I am getting only the .com.pe and .com.uk parts.
Is there a modifier I can use on my regex to make sure it only takes what is before the first . (dot included)?
Desired output:
dev.mydomain.com -> mydomain.com
stage.mydomain.com.pe -> mydomain.com.pe
test.mydomain.com.uk -> mydomain.com.uk
You may use
^[^.]+\.(?=.*\.)
See the regex demo and the regex graph:
Details
^ - start of string
[^.]+ - 1 or more chars other than dots
\. - a dot
(?=.*\.) - followed with any 0 or more chars other than line break chars as many as possible and then a ..
Java usage example:
String result = domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
Following regex will work for you. It will find first part (if exists), captures rest of the string as 2nd matching group and replaces the string with 2nd matching group. .*? is non-greedy search that will match until it sees first dot character.
(.*?\.)?(.*\..*)
Regex Demo
sample code:
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "stage.mydomain.com.pe";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "test.mydomain.com.uk";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
output:
mydomain.com
mydomain.com.pe
mydomain.com.uk
mydomain.com
I am using an email validation pattern I found at How to validate an email and it works fine except it allows a + in the first part of the email and that isn't allowed in my specs. The original code is
public static final String EMAIL_PATTERN = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*#"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
protected boolean isInvalidEmail(String email) {
pattern = Pattern.compile(EMAIL_PATTERN);
matcher = pattern.matcher(email);
return !matcher.matches();
}
I thought I could just remove the + from "^[_A-Za-z0-9-\\+] but I get a Pattern Syntax Exception: Unclosed Character Class. Can someone tell me why removing the + uncloses the class? Thanks!
You have to remove the \\+ portion.
\\ escapes the \ character. \+ escapes the + regex operator. Thus \\+ breaks down to \+ which means match the literal + character.
Note: The + regex operator means match one or more of the preceding element.
The reason that it gives you Unclosed Character Class is because only removing the + now escapes the closing square bracket so it is considered part of the pattern. Hence, the class does not have a matching closing square bracket. As Jonny Henly mentions the solution is to remove the \\+ to align with your spec, but this gives the answer as to why it is unclosed.
I have a CSS style that I need to extract the color from using a Java regex.
eg
color:#000;
I need to extract the thing after : to ;. Can anyone give an example?
I'm not sure how to apply it to Java, but one regex to do this would be:
^color:\s*(#[0-9a-f]+);?$
To just extract from : up to ; do something like:
Pattern pattern = Pattern.compile("[^:]*:(.*);");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
String value = matcher.group(1);
System.out.println("'" + value+ "'"); // do something with value
}
[^:]* - any number of chars that are not ':'
: - one ':'
(...) - a capturing group
.*- any number of any character
;- the terminating ';'
use color:(.*); for only accepting values for 'color'.
/(?<=:).+(?=;)/
That will do it for you
Not sure how you implement regex in Java though.
www.regexr.com to help you text out your regex in real time.
The expression
":(#.+);"
should do it