Java RegEx combine patterns in any form


I'm trying to match links in some legal documents. I've gotten fairly far, but I think I'm missing something. This is what I have so far:
(\d( )?)?(([[a-zA-Z]\.])+?) ([0-9]+?)\b:([0-9]+?)?\b
I have a base construction which I can match:
? = optional
number/space?/string/space/number/:/number
But now I want to optionally match any combination of the following:
-/number
,/space/number
,/space/number/-/number
This is my best match:
(\d( )?)?(([[a-zA-Z]\.])+?) ([0-9]+?)\b:([0-9]+?)(, [0-9]+?)?(-[0-9]+?)?(, ([0-9]+?)-([0-9]+?)?)?\b
I can match this:
8 Law 84:145, 252-320
But not this:
8 Law 84:145, 252-320, 458, 517-665

You may use
(\d+)\s*([a-zA-Z]+)\s+(\d+):(\d+)((?:-\d+|,\s\d+(?:-\d+)?)*)
See the regex demo
The main part I added is ((?:-\d+|,\s\d+(?:-\d+)?)*), which matches zero or more repetitions of either of the following alternatives and captures the whole run into Group 5:
-\d+ - a hyphen and 1+ digits
| - or
,\s\d+(?:-\d+)? - comma, whitespace, 1+ digits, and then an optional sequence of - and 1+ digits.
Do not forget to double backslashes in the Java string literal inside the code.
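As a minimal sketch of how the pattern above might be used from Java, the class below compiles it with doubled backslashes and returns the five captured groups. The names CitationMatcher and parse are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CitationMatcher {
    // Same regex as above; every backslash is doubled inside the Java string literal.
    static final Pattern CITATION = Pattern.compile(
            "(\\d+)\\s*([a-zA-Z]+)\\s+(\\d+):(\\d+)((?:-\\d+|,\\s\\d+(?:-\\d+)?)*)");

    // Returns groups 1..5, or null when the whole input is not a citation.
    static String[] parse(String input) {
        Matcher m = CITATION.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = parse("8 Law 84:145, 252-320, 458, 517-665");
        System.out.println(String.join(" | ", g));
    }
}
```

The last array element holds the whole trailing list (for the sample input, everything after 145), which can then be split on commas if the individual ranges are needed.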

Related

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may contain a separator ("." or ",").
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.matches(regex, String), but I have not been able to find the correct regex.

Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
You can see this regular expression here
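A small sketch of the whitelist variant in use, assuming the whole input should be one equation. The class name SimpleEquation and its methods are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleEquation {
    // Whitelist variant from above: number, operator(s) with optional spaces, number.
    static final Pattern EQUATION = Pattern.compile(
            "((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)");

    static boolean isEquation(String s) {
        return EQUATION.matcher(s).matches();
    }

    // Returns {left operand, operator, right operand}, or null if s is not an equation.
    static String[] split(String s) {
        Matcher m = EQUATION.matcher(s);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is the left number, group 3 the operator part, group 4 the right number.
        return new String[] { m.group(1), m.group(3).trim(), m.group(4) };
    }
}
```

Note that [\d,.]+ is deliberately loose: it also accepts runs such as "1..2", so a stricter number pattern may be needed if the operands are going to be parsed afterwards.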

In your regex you have to escape the dot as \. to match it literally, and escape the + as \+, or else ?+ is read as a possessive quantifier. To match 1+ digits you have to add a quantifier: [0-9]+
For your example data, you could match 1+ digits followed by an optional part consisting of a dot or a comma and 1+ more digits, once on each side of the operator. To match any single character between the two numbers, you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
Regex demo
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any character (use .+ to repeat 1+ times)
Same as the first pattern
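To illustrate the difference between the dot and an explicit operator class, here is a hedged sketch that builds both variants from the same number sub-pattern. The class name and the choice of [-+*] as the operator list are assumptions taken from the suggestion above.

```java
import java.util.regex.Pattern;

final class NumberOperatorNumber {
    // 1+ digits, optionally followed by . or , and 1+ more digits.
    static final String NUMBER = "\\d+(?:[.,]\\d+)?";

    // Any single character between the two numbers.
    static final Pattern ANY_CHAR = Pattern.compile(NUMBER + "." + NUMBER);

    // Only one of the listed operators between the two numbers.
    static final Pattern OPERATOR = Pattern.compile(NUMBER + "[-+*]" + NUMBER);

    static boolean matchesAnyChar(String s) {
        return ANY_CHAR.matcher(s).matches();
    }

    static boolean matchesOperator(String s) {
        return OPERATOR.matcher(s).matches();
    }
}
```

Because the dot also matches a digit, matchesAnyChar accepts a plain digit string such as "500400" (one digit is consumed as the "middle" character), while matchesOperator rejects it. matches() already requires the whole string to match, so no anchors are needed here.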

Regular expression match fails if only whitespace after the - character

I am working on a regular expression where the pattern is:
1.0.0[ - optional description]/1.0.0.0[ - optional description].txt
The [ - optional description] part is of course, optional. So some possible VALID values are
1.0.0/1.0.0.0.txt
1.0.0/1.0.0.0 - xyz.txt
1.0.0 - abc/1.0.0.0 - xyz.txt
1.0.0 - abc/1.0.0.0.txt
To be a little more robust in the pattern matching, I'd like to match zero or more spaces before and after the "-" character. So all these would be valid too.
1.0.0 - abc/1.0.0.0 - xyz.txt
1.0.0-abc/1.0.0.0-xyz.txt
1.0.0 -abc/1.0.0.0- xyz.txt
To do this matching, I have the following regular expression (Java code):
String part1 = "((\\d+.{1}\\d+.{1}\\d+)(\\s*-\\s*(.+))?)";
String part2 = "((\\d+.{1}\\d+.{1}\\d+.{1}\\d+)(\\s*-\\s*(.+))?\\.sql)";
pattern = Pattern.compile(part1+ "/" + part2);
So far this regular expression is working well. But while unit testing I found a case I can't quite figure out yet: the string contains a "-" character surrounded by spaces, but there is no description after the "-" character. This would look like:
1.0.0 - /1.0.0.0.txt
1.0.0- /1.0.0.0-xyz.txt
In these cases, I want the pattern match to FAIL. But with my current regular expression the match succeeds. I think what I want is if there is a "-" character surrounded by any number of spaces like " - " then there must also be at least 1 non-space character following it. But I can't quite figure out the regex for this.
Thanks!

Something like,
^\d+\.\d+\.\d+(?:\s*-\s*\w+)?\/\d+\.\d+\.\d+\.\d+(?:\s*-\s*\w+)?\.txt$
Or you can combine the \.\d+ repetitions as
^\d+(?:\.\d+){2}(?:\s*-\s*\w+)?\/\d+(?:\.\d+){3}(?:\s*-\s*\w+)?\.txt$
Regex Demo
Changes
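.{1} When you want to match something exactly once, there is no need for {1}; it is implicit. Also, an unescaped . matches any character, so it is written as \. to match a literal dot.
(?:\s*-\s*\w+) Matches zero or more whitespace characters (\s*), followed by -, zero or more whitespace characters again, and then \w+, a description of at least one word character.
The ? at the end of this pattern makes it optional.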
This same pattern is repeated again at the end to match the second part.
^ Anchors the regex at the start of the string.
$ Anchors the regex at the end of the string. These two anchors are necessary so that nothing else is allowed in the string.
Don't group the patterns using () unless you need to capture them, since capturing costs memory. Use (?:..) if you want to group patterns without capturing them.
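A quick way to check this against the cases from the question is a sketch like the following. It assumes the final dot is meant to be a literal one (written \.txt), and the class name VersionPath is hypothetical.

```java
import java.util.regex.Pattern;

final class VersionPath {
    // Compact form with (?:\.\d+){n}; backslashes doubled for the Java string literal.
    static final Pattern PATH = Pattern.compile(
            "^\\d+(?:\\.\\d+){2}(?:\\s*-\\s*\\w+)?/\\d+(?:\\.\\d+){3}(?:\\s*-\\s*\\w+)?\\.txt$");

    // find() is enough here because ^ and $ already pin the match to the whole string.
    static boolean isValid(String s) {
        return PATH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1.0.0 - abc/1.0.0.0 - xyz.txt"));
        System.out.println(isValid("1.0.0 - /1.0.0.0.txt"));
    }
}
```

Since \w+ only covers letters, digits and underscores, a description containing spaces or other punctuation would be rejected by this sketch.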

In the group that matches the optional part, you need to replace .+ with \\S+, where \S means any non-whitespace character. This forces the optional part to contain at least one non-whitespace character in order to match the pattern:
String part1 = "((\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?)";
String part2 = "((\\d+\\.\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?\\.txt)";
Also note that .{1} (which is the same as just .) matches any character. From the examples, you want to match a dot, so it should be replaced with \.
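Put together, the two parts could be used as in the sketch below, which also reads the descriptions back out of the capturing groups. The class and method names are assumptions, and every dot between the version numbers is written as \\. here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescribedVersionPath {
    static final String PART1 = "((\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?)";
    static final String PART2 = "((\\d+\\.\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?\\.txt)";
    static final Pattern PATH = Pattern.compile(PART1 + "/" + PART2);

    // Returns {first description, second description}; an entry is null when that
    // description is absent. Returns null when the input does not match at all.
    static String[] descriptions(String s) {
        Matcher m = PATH.matcher(s);
        if (!m.matches()) {
            return null;
        }
        // Group 4 is the \S+ inside PART1, group 8 the \S+ inside PART2.
        return new String[] { m.group(4), m.group(8) };
    }
}
```

The \S+ is greedy, but backtracking trims it until the following "/" or ".txt" can match, so "abc" and "xyz" come out cleanly for the inputs from the question.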

Something like
^\d+\.\d+\.\d+(?:\s*-\s*[^\/\s]+)?\/\d+\.\d+\.\d+\.\d+?(?:\s*-\s*[^.\s]+)?\.\w+$
Check it out here at regex101.

regex for optional characters

I am using the following regex:
^([W|w][P|p]|[0-9]){8}$
The above regex also accepts wp1234567 (wp + 7 digits). The expected input is WP + 6 digits, wp + 6 digits, or exactly 8 digits.
For example:
WP123456
wp126456
64535353

Note that [W|w] matches W, w and |, since | inside a character class loses its special meaning as an alternation operator. Also, by putting the grouping (...) around [W|w][P|p]|[0-9] and quantifying it with {8}, you match 8 occurrences of the whole alternation, that is, eight items that are each either a WP pair or a single digit.
You should apply the limiting quantifiers to the digits inside each alternative, rather than to the whole group, so that either wp + 6 digits or just 8 digits is allowed:
^(?:[Ww][Pp][0-9]{6}|[0-9]{8})$
See demo
The regex matches:
^ - start of string (not necessary if you check the whole string with String#matches())
(?:[Ww][Pp][0-9]{6}|[0-9]{8}) - 2 alternatives:
[Ww][Pp][0-9]{6} - W or w followed with P or p followed with 6 digits
| - or...
[0-9]{8} - exactly 8 digits
$ - end of string
Other scenarios (just in case):
If you need to match strings consisting of 7 or 8 digits, you need to replace {8} limited quantifier with {7,8}:
^(?:[Ww][Pp][0-9]{6}|[0-9]{7,8})$
And in case you do not want to match Wp123456 or wP123456, use one more alternation in the beginning:
^(?:(?:WP|wp)[0-9]{6}|[0-9]{8})$
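The difference between the original attempt and the first corrected pattern can be checked with a sketch like this one. The class name WpCode and its method names are made up for the example.

```java
final class WpCode {
    // The original attempt: eight repetitions of "a WP-like pair or one digit".
    static final String ORIGINAL = "^([W|w][P|p]|[0-9]){8}$";

    // The corrected pattern: WP/wp plus six digits, or exactly eight digits.
    static final String FIXED = "^(?:[Ww][Pp][0-9]{6}|[0-9]{8})$";

    static boolean matchesOriginal(String s) {
        return s.matches(ORIGINAL);
    }

    static boolean matchesFixed(String s) {
        return s.matches(FIXED);
    }

    public static void main(String[] args) {
        for (String s : new String[] { "WP123456", "wp1234567", "|p1234567", "64535353" }) {
            System.out.println(s + " original=" + matchesOriginal(s) + " fixed=" + matchesFixed(s));
        }
    }
}
```

With the original pattern, "wp1234567" is accepted (the pair counts as one of the eight items) and so is "|p1234567", while the intended "WP123456" is rejected because it only provides seven items.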

How to get the priority in regex?

My regex is wrong:
([A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9]*\.)
I need to accept strings like:
a-b.
ab.
a.
But I do not want to match this string: a-.
What should I change?

[A-Za-z0-9]+\.|[A-Za-z0-9]+-?[A-Za-z0-9]\.
The idea is:
-? - optional dash
\. - escaped dot, to match literal dot
| - alternation (one or the other)
x+ - one or more repetitions, equivalent to xx*
If you don't mind matching underscores too, you can use the word character set:
\w+\.|\w+-?\w\.
See it in action

You can try like this by using an optional group.
"(?i)[A-Z0-9](?:-?[A-Z0-9]+)*\\."
(?i) flag for caseless matching.
[A-Z0-9] one alphanumeric character
(?:-?[A-Z0-9]+)* any amount of (?: an optional hyphen followed by one or more alnum )
\. literal dot
See demo at Regexplanet (click Java)

This works for your test cases:
[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.
See live demo.
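For comparison, the three patterns suggested in the answers above can be run side by side. This is only a sketch; the class and constant names are invented for the example.

```java
import java.util.regex.Pattern;

final class LabelPatterns {
    // The three suggestions from the answers above, in order.
    static final Pattern ALTERNATION =
            Pattern.compile("[A-Za-z0-9]+\\.|[A-Za-z0-9]+-?[A-Za-z0-9]\\.");
    static final Pattern OPTIONAL_GROUP =
            Pattern.compile("(?i)[A-Z0-9](?:-?[A-Z0-9]+)*\\.");
    static final Pattern INNER_CLASS =
            Pattern.compile("[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\\.");

    // True when the whole string s is matched by p.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

All three accept a-b., ab. and a. and reject a-. when the whole string must match. They differ on inputs outside the question: a-b-c. is rejected by the first pattern but accepted by the other two, and a--b. is accepted only by the third, because its character class allows consecutive hyphens.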

Regular expression for match

How can I find lines like this in a file?
######## this_is_a_line.sh ########
I tried the regular expression below:
(#)+( )+(A-Za-z0-9_)+(.sh)( )+(#)+
But this doesn't seem to work. Can anyone tell me what is wrong?

You used (...) (a capturing group) instead of [...] (a character class). Use the character class:
#+ +[A-Za-z0-9_]+\.sh +#+
    ^^^^^^^^^^^^^
See regex demo (note most of the capture groups are redundant here, and I removed them. Also, . must be escaped to match a literal dot.)
The [A-Za-z0-9_]+ matches 1 or more letters, digits or _.
The (A-Za-z0-9_)+ matches 1 or more repetitions of the literal text A-Za-z0-9_ (see demo).
Also, in Java, you can use \w to match [A-Za-z0-9_] and shorten your regex to
#+ +\w+\.sh +#+
Do not forget that you need to double each \ in the pattern string in Java (String pattern = "#+ +\\w+\\.sh +#+";).
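As a rough sketch of how the shortened pattern could be applied to the lines of a file, the helper below filters a list of lines. BannerLines and find are hypothetical names, and the lines are assumed to have been read already.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class BannerLines {
    // Same pattern as above, with each backslash doubled for the Java string.
    static final Pattern BANNER = Pattern.compile("#+ +\\w+\\.sh +#+");

    // Keeps only the lines that consist entirely of a banner such as "### name.sh ###".
    static List<String> find(List<String> lines) {
        return lines.stream()
                .filter(line -> BANNER.matcher(line).matches())
                .collect(Collectors.toList());
    }
}
```

The lines could come from java.nio.file.Files.readAllLines, for example. If the banner may appear in the middle of a longer line, find() on the matcher would be the looser alternative to matches().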
