I need a Regex to merge multiple numbers in a line without merging them all together.
Example line :
Hello World9.99 123 456.00 7 890 123.45 0.97
My desired output is :
Hello World9.99 123456.00 7890123.45 0.97
I know basic regex but am not experienced with lookaheads/behinds.
So far I created this method :
final String regex = "(?<!\\.\\d{1,3})\\s+(?=\\d{1,3}\\.?\\d{2}?)";
public String mergeNumbers(String s){
return s.replaceAll(regex, "");
}
This works fine if the number tied to the word has a dot.
But I just can't figure out how to match this line without a dot at the beginning :
Hello World99 123 456.00 7 890 123.45 0.97
This is returning :
Hello World99123456.00 7890123.45 0.97
but I want :
Hello World99 123456.00 7890123.45 0.97
So my question is :
How can I modify my regex to match both cases?
I suggest using
.replaceAll("\\b(?<!\\.)(\\d+)\\s+(?=\\d)", "$1")
See the regex demo.
Details:
\b - a word boundary
(?<!\.) - there can be no . immediately before the current location
(\d+) - Group 1 (referred to with $1 backreference from the string replacement pattern): one or more digits
\s+ - 1+ whitespaces
(?=\d) - there must be a digit immediately to the right of the current location.
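As a quick check, the suggestion can be dropped into the question's mergeNumbers method and run against both sample lines. This is only a sketch; the class name MergeNumbersDemo is made up for illustration.

```java
class MergeNumbersDemo {
    // Pattern from the answer: a digit run that starts at a word boundary,
    // is not preceded by a dot, and is followed by whitespace plus a digit.
    private static final String REGEX = "\\b(?<!\\.)(\\d+)\\s+(?=\\d)";

    static String mergeNumbers(String s) {
        // Keep the digits ($1) and drop the whitespace that follows them.
        return s.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        // Number glued to the word contains a dot.
        System.out.println(mergeNumbers("Hello World9.99 123 456.00 7 890 123.45 0.97"));
        // Number glued to the word has no dot.
        System.out.println(mergeNumbers("Hello World99 123 456.00 7 890 123.45 0.97"));
    }
}
```

In both cases the number attached to World is left alone: there is no word boundary between the letter and the first digit, so \b cannot match there, and the digits after a dot are rejected by (?<!\.).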
Related
I want to split an input string based on a regex pattern using the Pattern.split(CharSequence) API. The regex uses both lookaheads and lookbehinds. It is supposed to split on a delimiter (,) but ignore the delimiter when it is enclosed in double quotes ("x,y").
The regex is - (?<!(?<!\Q\\E)\Q\\E)\Q,\E(?=(?:[^\Q"\E]*(?<=\Q,\E)\Q"\E[[^\Q,\E|\Q"\E] | [\Q"\E]]+[^\Q"\E]*[^\Q\\E]*[\Q"\E]*)*[^\Q"\E]*$)
The input string for which this split call is getting timed out is -
"","1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]","QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I read that lookaround techniques are heavy and can cause timeouts if the string is too long. If I remove the backslashes around [\"BOLT,HI-JOK\"] at the end of the string, the regex is able to detect the delimiters and split.
With the string above, the pattern also fails to detect the first delimiter at [STIFFENER]","QH20426AD3. Again, if I remove the backslashes around [\"BOLT,HI-JOK\"] at the end of the string, the regex detects it.
I am not very experienced with lookarounds in regex. Can someone please give hints about how I can optimize this regex and avoid timeouts?
Any pointers, article links are appreciated!
If you want to split on a comma only when it is directly followed by a complete quoted string, from an opening double quote to its closing double quote:
,(?="[^"\\]*(?:\\.[^"\\]*)*")
The pattern matches:
, Match a comma
(?= Positive lookahead
"[^"\\]* Match " and 0+ times any char except " or \
(?:\\.[^"\\]*)*" Optionally repeat matching \ followed by any char (the . consumes the escaped char), then again any chars other than " and \, and finally the closing "
) Close lookahead
Regex demo | Java demo
String string = "\"\",\"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]\",\"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\\\"BOLT,HI-JOK\\\"]\"\n";
String[] parts = string.split(",(?=\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
for (String part : parts) {
    System.out.println(part);
}
Output
""
"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]"
"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I have the following regular expression that I'm using to remove the dev. part of my URL.
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll(".*\\.(?=.*\\.)", ""));
This outputs mydomain.com, but it gives me issues when the domains are in the vein of dev.mydomain.com.pe or dev.mydomain.com.uk: in those cases I get only the com.pe and com.uk parts.
Is there a modifier I can use on my regex to make sure it only takes what is before the first . (dot included)?
Desired output:
dev.mydomain.com -> mydomain.com
stage.mydomain.com.pe -> mydomain.com.pe
test.mydomain.com.uk -> mydomain.com.uk
You may use
^[^.]+\.(?=.*\.)
See the regex demo.
Details
^ - start of string
[^.]+ - 1 or more chars other than dots
\. - a dot
(?=.*\.) - followed by any 0 or more chars other than line break chars, as many as possible, and then a dot.
Java usage example:
String result = domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
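To see how this behaves on the inputs from the question, here is a small runnable sketch. The class and method names (StripSubdomainDemo, stripFirstLabel) are hypothetical, and the bare mydomain.com input is an extra case added to show that nothing is removed when only one dot is present.

```java
class StripSubdomainDemo {
    // Removes the leading label (up to and including the first dot),
    // but only when at least one more dot follows it.
    static String stripFirstLabel(String domain) {
        return domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "dev.mydomain.com",
            "stage.mydomain.com.pe",
            "test.mydomain.com.uk",
            "mydomain.com"
        };
        for (String d : samples) {
            System.out.println(d + " -> " + stripFirstLabel(d));
        }
    }
}
```

Because the pattern is anchored with ^ and [^.]+ cannot cross a dot, at most the first label is removed, which is what keeps com.pe and com.uk attached to mydomain.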
The following regex will work for you. It finds the first part (if it exists), captures the rest of the string as the 2nd group, and replaces the whole string with that group. .*? is a non-greedy match that stops at the first dot character.
(.*?\.)?(.*\..*)
Regex Demo
sample code:
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "stage.mydomain.com.pe";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "test.mydomain.com.uk";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
output:
mydomain.com
mydomain.com.pe
mydomain.com.uk
mydomain.com
I have a regex for capturing strings that are between double quotes and do not start or end with /.
But it is not yet the solution I want.
The regex should satisfy these conditions:
Condition 1. Capture text between two double or single quotes.
Condition 2. But it shouldn't capture if starts with [ and ends with ]
Condition 3. But it shouldn't capture if it starts with /" and ends with /", or starts with /' and ends with /'
Example:
REGEX: \"(\/?.)*?\"
Input: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])
Captured output:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
5. "test"
6. "in"
Expected result:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
Condition 1 explanation:
Capture the text between double or single quotes.
example:
input : "test","m2m:cin.as"
output: "test","m2m:cin.as"
Condition 2 explanation:
If the text starts with [ and ends with ], it should not be captured, even when it contains double or single quotes.
example:
input: ["test"]
output: it should not capture
Condition 3 explanation:
In the expected result above, the input "[/"Dimming Value/"]" contains two pairs of double quotes, but only the outer pair should delimit the capture; the /" inside is skipped. So the output is [/"Dimming Value/"]. I want the same behavior for /' (a single quote preceded by /).
Note:
For input "[/"Dimming Value/"]" or '[/'Dimming Value/']', although the text is between double or single quotes and contains [ and ], the string should not be ignored. The output should be [/"Dimming Value/"].
As I understand it, you want to capture text between double quotes, except:
if the opening double quote is prefixed by [ or the closing double quote is suffixed by ]
a double quote prefixed by / should not be the beginning or end of the matched text
I don't know whether you also want to capture text between single quotes, because your text is not completely clear.
To reject matches based on the character before the quote, you need a negative lookbehind, with syntax (?<!prefix that you don't want). Java's regex engine supports this; JavaScript only supports it in engines that implement ES2018 or later.
The best regex I could build that returns what you want for your example (you can check it on regex101.com or a similar site) is:
(?<![\[/])\"(?!\])(\/?.)*?\"(?![\]/])
I added the restriction not to match when the opening double quote is followed by ], to prevent matching "][" in the text ["test"]["in"]
If your engine does not support lookbehind (an older JavaScript engine, for example), this regex will not solve your problem there.
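For what it's worth, java.util.regex does accept bounded lookbehind such as (?<![\[/]), so the lookbehind pattern above can be tried directly in Java. A minimal sketch, with a made-up class name (LookbehindDemo) and helper (findQuoted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Same pattern as above, escaped for a Java string literal.
    private static final Pattern QUOTED =
        Pattern.compile("(?<![\\[/])\\\"(?!\\])(\\/?.)*?\\\"(?![\\]/])");

    // Returns every quoted string that is not wrapped in [ ] and whose
    // delimiting quotes are not preceded by /.
    static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath("
            + "Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),"
            + "\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
        for (String s : findQuoted(input)) {
            System.out.println(s);
        }
    }
}
```

On the sample input this yields the four quoted strings and skips ["test"] and ["in"], so no post-filtering is needed when lookbehind is available.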
Do you have any way to process the results, and exclude the bad matches?
If so, you can match bad prefix and bad suffix and exclude it from the results:
[\[]?\"(\/?.)*?\"[\]]?
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
["test"]
["in"]
Full JavaScript code, including post-processing:
'Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])'
.match(/[\[]?\"(\/?.)*?\"[\]]?/g).filter(s => !s.startsWith('[') && !s.endsWith(']'))
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
EDIT:
equivalent java code:
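// Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern
CharSequence yourStringHere = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
List<String> allMatches = new ArrayList<>(); // collects the accepted matches
Matcher m = Pattern.compile("[\\[]?\\\"(\\/?.)*?\\\"[\\]]?")
        .matcher(yourStringHere);
while (m.find()) {
    String s = m.group();
    // Skip matches that picked up a [ prefix or a ] suffix
    if (!s.startsWith("[") && !s.endsWith("]")) {
        allMatches.add(s);
    }
}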
I have a Java regex to capture a value of the form (number, space, text), but when the text continues on a new line, the continuation is not included in the group. Sample:
Regex: (\d{14}\s.*\n)
Value :
59105999000321 My value start
with new line number
12105999000343 AAAAAasasas asdasdasd
32105999000323 asdasxasxasx asdasd
the group result in https://regex101.com/ is:
Match 1
Full match 0-29 `59105999000321 My value start`
Group 1. 0-29 `59105999000321 My value start`
Match 2
Full match 51-87 `12105999000343 AAAAAasasas asdasdasd`
Group 1. 51-87 `12105999000343 AAAAAasasas asdasdasd`
Match 3
Full match 88-122 `32105999000323 asdasxasxasx asdasd`
Group 1. 88-122 `32105999000323 asdasxasxasx asdasd`
So my regex gives the following result:
group 1 : 59105999000321 My value start
with new line number
group 2: 12105999000343 AAAAAasasas asdasdasd
group 3: 32105999000323 asdasxasxasx asdasd
I expect
group 1 : 59105999000321 My value start with new line number
group 2: 12105999000343 AAAAAasasas asdasdasd
group 3: 32105999000323 asdasxasxasx asdasd
How do I do this in Java?
Try Regex: (\d{14}\s.*(?:\n*(?!^\d{14}).*)*)
Demo
Explanation:
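Since each record starts with a line of 14 digits, the negative lookahead (?!^\d{14}) placed after \n* stops the match before any such line, so continuation lines are absorbed into the current record. Note that ^ only matches at line starts when multiline mode is on (Pattern.MULTILINE or (?m) in Java).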
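Since the question asks about Java, here is a minimal sketch of how the pattern might be used there. It assumes the input uses \n line endings and that Pattern.MULTILINE is set so that ^ matches at each line start; the class name RecordDemo, the helper extractRecords, and the step that replaces the embedded newline with a space are illustrative additions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordDemo {
    // MULTILINE is needed: without it ^ matches only at the start of the
    // input and the lookahead would never stop at the next record.
    private static final Pattern RECORD =
        Pattern.compile("(\\d{14}\\s.*(?:\\n*(?!^\\d{14}).*)*)", Pattern.MULTILINE);

    static List<String> extractRecords(String text) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            // Group 1 still contains the line break of a wrapped record;
            // replace it with a space to get a single-line value.
            records.add(m.group(1).replace("\n", " "));
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "59105999000321 My value start\n"
            + "with new line number\n"
            + "12105999000343 AAAAAasasas asdasdasd\n"
            + "32105999000323 asdasxasxasx asdasd";
        for (String r : extractRecords(text)) {
            System.out.println(r);
        }
    }
}
```

If the input may contain \r\n line endings, the pattern and the replace call would both need adjusting.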
Given an excerpt of text like
Preface (optional, up to multiple lines)
Main : sequence1
sequence2
sequence3
sequence4
Epilogue (optional, up to multiple lines)
which Java regular expression could be used to extract all the sequences (i.e. sequence1, sequence2, sequence3, sequence4 above)? For example, a Matcher.find() loop?
Each "sequence" is preceded by and may also contain 0 or more white spaces (including tabs).
The following regex
(?m).*Main(?:[ |t]+:(?:[ |t]+(\S+)[\r\n])+
only yields the first sequence (sequence1).
You may use the following regex:
(?m)(?:\G(?!\A)[^\S\r\n]+|^Main\s*:\s*)(\S+)\r?\n?
Details:
(?m) - multiline mode on
(?:\G(?!\A)[^\S\r\n]+|^Main\s*:\s*) - either of the two:
\G(?!\A)[^\S\r\n]+ - end of the previous successful match (\G(?!\A)) and then 1+ horizontal whitespaces ([^\S\r\n]+, can be replaced with [\p{Zs}\t]+ or [\s&&[^\r\n]]+)
| - or
^Main\s*:\s* - start of a line, Main, 0+ whitespaces, :, 0+ whitespaces
(\S+) - Group 1 capturing 1+ non-whitespace symbols
\r?\n? - an optional CR and an optional LF.
See the Java code below:
String p = "(?m)(?:\\G(?!\\A)[^\\S\r\n]+|^Main\\s*:\\s*)(\\S+)\r?\n?";
String s = "Preface (optional, up to multiple lines)...\nMain : sequence1\n sequence2\n sequence3\n sequence4\nEpilogue (optional, up to multiple lines)";
Matcher m = Pattern.compile(p).matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}