Separate regex groups - java

I have a Java regex to capture a (number, space, text) value, but when the text contains a new line the value isn't grouped as one match. Sample:
regex expression (\d{14}\s.*\n)
Value :
59105999000321 My value start
with new line number
12105999000343 AAAAAasasas asdasdasd
32105999000323 asdasxasxasx asdasd
the group result in https://regex101.com/ is:
Match 1
Full match 0-29 `59105999000321 My value start`
Group 1. 0-29 `59105999000321 My value start`
Match 2
Full match 51-87 `12105999000343 AAAAAasasas asdasdasd`
Group 1. 51-87 `12105999000343 AAAAAasasas asdasdasd`
Match 3
Full match 88-122 `32105999000323 asdasxasxasx asdasd`
Group 1. 88-122 `32105999000323 asdasxasxasx asdasd`
My regex gives the following result:
group 1 : 59105999000321 My value start
with new line number
group 2: 12105999000343 AAAAAasasas asdasdasd
group 3: 32105999000323 asdasxasxasx asdasd
I expect
group 1 : 59105999000321 My value start with new line number
group 2: 12105999000343 AAAAAasasas asdasdasd
group 3: 32105999000323 asdasxasxasx asdasd
How do I do this in Java?

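Try this regex: (\d{14}\s.*(?:\n*(?!^\d{14}).*)*)
Explanation:
Since each record must start with 14 digits, a negative lookahead for that same prefix is added after \n*, so continuation lines are consumed until the next line that starts with 14 digits. The ^ inside the lookahead only matches at line starts in multiline mode (the m flag on regex101, Pattern.MULTILINE or (?m) in Java).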
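A minimal Java sketch of the idea above. It uses a slightly simplified variant of the pattern (anchoring each record with (?m)^ and dropping the inner ^), and it assumes the input uses \n line endings. The class and method names (RecordGrouper, groupRecords) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordGrouper {
    // A record starts with 14 digits and a whitespace character. Every following
    // line that does not itself start a new record is treated as a continuation.
    // Assumption: lines are separated by \n only.
    private static final Pattern RECORD =
            Pattern.compile("(?m)^\\d{14}\\s.*(?:\\n(?!\\d{14}\\s).*)*");

    static List<String> groupRecords(String input) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            // Join continuation lines with a single space.
            records.add(m.group().replace("\n", " "));
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "59105999000321 My value start\n"
                + "with new line number\n"
                + "12105999000343 AAAAAasasas asdasdasd\n"
                + "32105999000323 asdasxasxasx asdasd";
        int i = 1;
        for (String record : groupRecords(input)) {
            System.out.println("group " + i++ + ": " + record);
        }
    }
}
```

Replacing the newline with a space is one possible way to get the single-line value shown in the expected result; the regex itself only decides where each record ends.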

Related

match repeated blocks separated by ~

To match the following text:
text : SS~B66\88~PRELIMINARY PAGES\M01~HEADING PAGES
It has this format:<code1>~<description1>\<code2>~<description2>\<code3>~<description3>....<codeN>~<descriptionN>
I used this regex: [A-Z0-9 ]+~[A-Z0-9 ]+(?:\\[A-Z0-9 ]+~[A-Z0-9 ]+)+
So:
case 1. SS~B66\88~PRELIMINARY PAGES\M01~HEADING PAGES (Match: OK)
case 2. SS~B66\88~PRELIMINARY PAGES~HEADING PAGES (No Match: OK because I removed the code 'M01')
case 3. SS~B66~PRELIMINARY PAGES\M01~HEADING PAGES (No Match: OK because I removed the code '88')
More examples:
SS~B66\88~MEKLKE\M01~MOIIE
B~A310\0~PRELIM#INARY\00-00~HEADING
My problem is that <code> and <description> can contain any type of character, so I replaced my regex with:
.+~.+(?:\\.+~.+)+
but this new regex also matches case 2 and case 3.
Thank you for your help.
Instead of using [A-Z0-9 ] which would not match all the allowed chars, or .+ which would match too much, you can use a negated character class [^~\\] matching any char except \ and ~ to set the boundaries for the matched parts.
^[^~]+~[^~\\]+(?:\\[^~]+~[^~\\]+)+$
^ Start of string
[^~]+~ Match any char other than ~, then match ~
[^~\\]+ Match 1+ times any char other than ~ and \
(?: Non capture group
\\[^~]+~[^~\\]+ Match \, then 1+ chars other than ~, then ~, then 1+ chars other than ~ and \
)+ Close the group and repeat 1 or more times to match at least one \
$ End of string
Regex demo (The demo contains \n to not cross the newlines in the example data)
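For Java, the same pattern could be applied roughly as follows. This is only a sketch: the class and method names (BlockFormat, isValid) are made up for the example, and each backslash of the regex has to be doubled again inside the Java string literal.

```java
import java.util.regex.Pattern;

final class BlockFormat {
    // Regex: ^[^~]+~[^~\\]+(?:\\[^~]+~[^~\\]+)+$
    // Every regex backslash is written twice more in the Java string literal.
    private static final Pattern BLOCKS =
            Pattern.compile("^[^~]+~[^~\\\\]+(?:\\\\[^~]+~[^~\\\\]+)+$");

    static boolean isValid(String text) {
        return BLOCKS.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Case 1: well-formed blocks
        System.out.println(isValid("SS~B66\\88~PRELIMINARY PAGES\\M01~HEADING PAGES"));
        // Case 2: code 'M01' removed
        System.out.println(isValid("SS~B66\\88~PRELIMINARY PAGES~HEADING PAGES"));
        // Case 3: code '88' removed
        System.out.println(isValid("SS~B66~PRELIMINARY PAGES\\M01~HEADING PAGES"));
    }
}
```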

how to check regex starts and ends with regex

I have a regex that captures strings between double quotes that do not start or end with /.
But it is not quite the solution I want.
The regex should follow these conditions:
Condition 1. Capture text between two double or single quotes.
Condition 2. But it shouldn't capture the text if it starts with [ and ends with ].
Condition 3. But it shouldn't capture if it starts with /" and ends with /", or starts with /' and ends with /'.
Example:
REGEX: \"(\/?.)*?\"
Input: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])
output:
captured output:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
5. "test"
6. "in"
Expected result:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
Condition 1 explanation:
Capture the text between double or single quotes.
example:
input : "test","m2m:cin.as"
output: "test","m2m:cin.as"
Condition 2 explanation:
If the quoted text is enclosed between [ and ], it should not be captured, even though it has double or single quotes.
example:
input: ["test"]
output: it should not capture
Condition 3 explanation:
In the expected result above, the input "[/"Dimming Value/"]" contains two pairs of double quotes, but only the outer pair is captured; the /" inside does not end the match. So the output is [/"Dimming Value/"]. I want the same behavior for /' (a single quote preceded by /).
Note:
For the input "[/"Dimming Value/"]" or '[/'Dimming Value/']', although the text between the quotes contains [ and ], the string should not be ignored. The output should be [/"Dimming Value/"].
As I understand it, you want to capture text between double quotes, except:
if the initial double quote is prefixed by [ or the final double quote is suffixed by ]
double quotes prefixed by / should not be the beginning or end of the matched text
I don't know if you also want to capture text between single quotes, because your text is not completely clear.
To reject a match based on the character before it, you need a negative lookbehind, with syntax (?<!prefix that you don't want). Java's java.util.regex supports lookbehind, but JavaScript only gained it in ES2018, so older JavaScript engines reject it.
The best regex I could build to return what you want for your example (you can check it on regex101.com or a similar site) is:
(?<![\[/])\"(?!\])(\/?.)*?\"(?![\]/])
I added the restriction to not match if the initial double quote is suffixed by ], to prevent matching "][" in the text ["test"]["in"].
If your engine does not support lookbehind, this regex will not work there.
In that case, do you have any way to process the results and exclude the bad matches?
If so, you can match the bad prefix and bad suffix as well, and then exclude those matches from the results:
[\[]?\"(\/?.)*?\"[\]]?
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
["test"]
["in"]
Full JavaScript code, including post-processing:
'Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])'
.match(/[\[]?\"(\/?.)*?\"[\]]?/g).filter(s => !s.startsWith('[') && !s.endsWith(']'))
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
EDIT:
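Equivalent Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

CharSequence yourStringHere = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
List<String> allMatches = new ArrayList<>();
// Match quoted strings, optionally including a leading [ or a trailing ]
Matcher m = Pattern.compile("[\\[]?\\\"(\\/?.)*?\\\"[\\]]?")
        .matcher(yourStringHere);
while (m.find()) {
    String s = m.group();
    // Keep only the matches that are not wrapped in [ or ]
    if (!s.startsWith("[") && !s.endsWith("]")) {
        allMatches.add(s);
    }
}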
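Since java.util.regex does support lookbehind, the lookaround version of the regex can also be used directly in Java, without filtering the results afterwards. A rough sketch, assuming only double quotes need to be handled; the class and method names (QuotedStrings, extract) are hypothetical, and the capturing group is swapped for a non-capturing one because only the full match is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedStrings {
    // Regex: (?<![\[/])"(?!\])(?:/?.)*?"(?![\]/])
    // - the opening quote must not follow [ or /
    // - the opening quote must not be directly followed by ]
    // - the closing quote must not be followed by ] or /
    private static final Pattern QUOTED =
            Pattern.compile("(?<![\\[/])\"(?!\\])(?:/?.)*?\"(?![\\]/])");

    static List<String> extract(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath("
                + "Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),"
                + "\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
        extract(input).forEach(System.out::println);
    }
}
```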

Regex to merge multiple numbers with spaces in one line

I need a Regex to merge multiple numbers in a line without merging them all together.
Example line :
Hello World9.99 123 456.00 7 890 123.45 0.97
My desired output is :
Hello World9.99 123456.00 7890123.45 0.97
I know basic regex but am not experienced with lookaheads/behinds.
So far I created this method :
final String regex = "(?<!\\.\\d{1,3})\\s+(?=\\d{1,3}\\.?\\d{2}?)";
public String mergeNumbers(String s){
return s.replaceAll(regex, "");
}
This works fine if the number tied to the word has a dot.
But I just can't figure out how to match this line without a dot at the beginning :
Hello World99 123 456.00 7 890 123.45 0.97
This is returning :
Hello World99123456.00 7890123.45 0.97
but I want :
Hello World99 123456.00 7890123.45 0.97
So my question is :
How can I modify my regex to match both cases?
I suggest using
.replaceAll("\\b(?<!\\.)(\\d+)\\s+(?=\\d)", "$1")
See the regex demo.
Details:
\b - a word boundary
(?<!\.) - there can be no . immediately before the current location
(\d+) - Group 1 (referred to with $1 backreference from the string replacement pattern): one or more digits
\s+ - 1+ whitespaces
(?=\d) - there must be a digit immediately to the right of the current location.
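Put together, the suggestion could look like the sketch below. The class name NumberMerger is made up for the example; mergeNumbers keeps the name used in the question.

```java
import java.util.regex.Pattern;

final class NumberMerger {
    // Digits that start at a word boundary and are not preceded by a dot,
    // followed by whitespace and then another digit.
    private static final Pattern GAP =
            Pattern.compile("\\b(?<!\\.)(\\d+)\\s+(?=\\d)");

    static String mergeNumbers(String s) {
        // Keep the digits (group 1) and drop the whitespace after them.
        return GAP.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(mergeNumbers("Hello World9.99 123 456.00 7 890 123.45 0.97"));
        System.out.println(mergeNumbers("Hello World99 123 456.00 7 890 123.45 0.97"));
    }
}
```

The word boundary is what keeps World99 apart from 123: there is no boundary between the letter d and the digit 9, so the digits attached to the word never start a match.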

Regex: Select first specific word occurrences inside enclosed elements

I have a String containing url paths:
...
/test/section/1.png
"/test/section/test/2.png" "/test/section/test/2.png"
(/test/section/test/3.png)
...
I want to get the first "test" occurrence in each url element enclosed in quotes or parentheses.
So far I have managed to match the first occurrence in each string together with the preceding '"' or '(':
(\(|\")(\/orbeon\/)
Matches are shown in bold.
Current output:
/test/section/1.png
"/test/ section/test/2.png" "/test/ section/test/2.png"
(/test/ section/test/3.png)
Desired output:
/test/section/1.png
" /test/ section/test/2.png" " /test/ section/test/2.png"
( /test/ section/test/3.png)
How can I exclude the character before the matching word?
Caution! I want only the first occurrence of the word in each enclosed url path:
Corner case: /test/ section/test/2.png
I am using this regex with Java.
Your current (\(|\")(\/orbeon\/) regex matches ( or " into Group 1 and /orbeon/ into Group 2.
Thus, when you execute matcher.find(), you will need to access Group 2 using matcher.group(2).
Else, use a lookbehind: Pattern.compile("(?<=[(\"])/orbeon/"), and you will have access to the necessary text with matcher.group() or matcher.group(0). The (?<=[(\"]) positive lookbehind will assert the presence of ( or " before /orbeon/, and if not present, there won't be any match.
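A small sketch of the lookbehind variant. The question's regex uses /orbeon/ while its sample paths use /test/; this example assumes /test/ so that it runs against the sample input. The class and method names (FirstSegmentFinder, findStarts) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstSegmentFinder {
    // Match /test/ only when it directly follows ( or ", without consuming that character.
    private static final Pattern FIRST_SEGMENT = Pattern.compile("(?<=[(\"])/test/");

    // Returns the start offset of every match.
    static List<Integer> findStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = FIRST_SEGMENT.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "/test/section/1.png\n"
                + "\"/test/section/test/2.png\" \"/test/section/test/2.png\"\n"
                + "(/test/section/test/3.png)";
        for (int start : findStarts(text)) {
            System.out.println("match at offset " + start);
        }
    }
}
```

The unquoted path on the first line and the second /test/ inside each enclosed path are skipped, because neither is directly preceded by ( or ".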

What is Java Regex to match number then space then 3 alphabet char?

I want to validate an amount of money with its currency unit.
100 USD : valid
1.11 USD : not valid
1,12 USD : not valid
12 US : not valid
So the valid string is "the number then space then 3 alphabet char".
text.matches("^\\d+ [a-zA-Z]{3}*$")
I got error:
Exception caught: Dangling meta character '*' near index 16
^\d+ [a-zA-Z]{3}*$
So how to fix it?
I fixed it by omitting the *, and then it works fine:
text.matches("^\\d+ [a-zA-Z]{3}$")
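The error comes from the * directly after {3}: a quantifier applied to another quantifier has nothing to repeat, so Java rejects the pattern. A short sketch of the corrected check follows; the class and method names (AmountValidator, isValid) are only illustrative. Because Matcher.matches() already requires the whole input to match, the ^ and $ anchors are left out here.

```java
import java.util.regex.Pattern;

final class AmountValidator {
    // One or more digits, a single space, then exactly three ASCII letters.
    private static final Pattern AMOUNT = Pattern.compile("\\d+ [a-zA-Z]{3}");

    static boolean isValid(String text) {
        // matches() succeeds only if the entire string fits the pattern.
        return AMOUNT.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"100 USD", "1.11 USD", "1,12 USD", "12 US"}) {
            System.out.println(s + " : " + (isValid(s) ? "valid" : "not valid"));
        }
    }
}
```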
