Regex Exceptions - java

I am trying to come up with a java regex that will match numbers with 2 too 3 decimals and not match any decimal number more than 3.
this is my regex
[0-9]{2}[.][0-9]{3}
It matches 41.51778000 and 18.740
but I only want it to match numbers that have exactly 3 decimal places and not numbers with more than three

You need to ask the regex to match the end and start as well.
^[0-9]{2}[.][0-9]{3}$

You must use word boundary on either side to stop unexpected matches:
\b[0-9]{2}[.][0-9]{2,3}\b
In Java it would be:
\\b\\d{2}\\.\\d{2,3}\\b

You can invoke Matcher#matches or String.matches instead of, say, Matcher#find to match the whole String.
Otherwise, you can prepend ^ and append $ to your pattern, to delimit start and end of input.
Finally, you can surround your pattern with something like \\D, or \\b or \\w to respectively match non-digits, word boundaries or whitespace around it, if you need to invoke find on an input containing more than 1 instance of the pattern.

Related

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may have a comma ("." or ",")!
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.match(regex, String), but I am not being able to find the correct regex.
Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
You can see this regular expression here
In your regex you have to escape the dot \. to match it literally and escape the \+ or else it would make the ? a possessive quantifier. To match 1+ digits you have to use a quantifier [0-9]+
For your example data, you could match 1+ digits followed by an optional part which matches either a dot or a comma at the start and at the end. If you want to match 1 time any character you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
Regex demo
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any character (Use .+) to repeat 1+ times
Same as the first pattern

How to validate dash-separated values using regex?

I have a string with digits and separators.
The digits may either be separated by a comma or a hyphen. But there may never be two digits that are both separated by hyphens without a comma in between.
Example:
valid: 123,12,2,1-3,1,1-3,1
invalid: 123,12,2,1-3,1,1-3-5,1
I have a regex that almost works, except it does not detect those 1-3-5 invalid lines.
How can I improve the following?
^([0-9])+((,|-)[0-9]+)*$
You can decompose your input:
normal: one or more digits, optionally followed by a dash then one or more digits;
special: a comma.
The regex for the normal case can be written as \d+(?:-\d+)?; for the special case, this is simply ,.
Applying the normal* (special normal*)* pattern, and adding anchors and quantifiers, we have:
^\d+(?:-\d+)?(,\d+(?:-\d+)?)*$
Here's a solution:
^(?:\d+(?:-\d+)?(?:,|$))+$
Demo
Explanation: Match a number, optionally followed by a dash and another number, then match either a comma or the end of the string. And repeat.
You can add condition using look-around which will search for -digits- so your regex can look like:
^(?!.*-\\d+-)[0-9]+([,-][0-9]+)*$
^^^^^^^^^^^^-negative look-ahead, match will fail if there is any -digits- in your string

Constructing a specific regex

I want to make a regular expression in Java, with the next criteria:
Length: 10 characters exactly. Not more, not less.
Can accept any character between A-Z (only uppercase letters) and between digits 0-9.
Can accept only one dash character '-' in any position. It cannot accept any other characters, strictly only one dash.
EXAMPLES:
ABCD-12345
F-01234GHK
09-PL89GG5
LJ8U9N3-Y2
PLN86D4V-1
I have been making tries with regex of my own invention, some regular expressions that are close to the result I want, but with no success.
Do I have to combine two regular expressions?
Please, help me to get rid of this issue.... and thanks in advance!!!
I think you need lookahead (which is a way of combining two regular expressions, sort of).
^(?!.*-.*-)[A-Z0-9-]{10}$
The second part will match 10 characters that are A-Z, 0-9, or dash; the first part is negative lookahead that will reject a pattern that has two dashes in it.
You can use this:
^(?![^-]*+-[^-]*+-)[A-Z0-9-]{10}$
Note: If you use the matches method you can remove anchors.

How to make a regular expression that matches tokens with delimiters and separators?

I want to be able to write a regular expression in java that will ensure the following pattern is matched.
<D-05-hello-87->
For the letter D, this can either my 'D' or 'E' in capital letters and only either of these letters once.
The two numbers you see must always be a 2 digit decimal number, not 1 or 3 numbers.
The string must start and end with '<' and '>' and contain '-' to seperate parts within.
The message in the middle 'hello' can be any character but must not be more than 99 characters in length. It can contain white spaces.
Also this pattern will be repeated, so the expression needs to recognise the different individual patterns within a logn string of these pattersn and ensure they follow this pattern structure. E.g
So far I have tried this:
([<](D|E)[-]([0-9]{2})[-](.*)[-]([0-9]{2})[>]\z)+
But the problem is (.*) which sees anything after it as part of any character match and ignores the rest of the pattern.
How might this be done? (Using Java reg ex syntax)
Try making it non-greedy or negation:
(<([DE])-([0-9]{2})-(.*?)-([0-9]{2})>)
Live Demo: http://ideone.com/nOi9V3
Update: tested and working
<([DE])-(\d{2})-(.{1,99}?)-(\d{2})>
See it working: http://rubular.com/r/6Ozf0SR8Cd
You should not wrap -, < and > in [ ]
Assuming that you want to stop at the first dash, you could use [^-]* instead of .*. This will match all non-dash characters.

Java regex mix two patterns

How can i get this pattern to work:
Pattern pattern = Pattern.compile("[\\p{P}\\p{Z}]");
Basically, this will split my String[] sentence by any kind of punctuation character (p{P} or any kind of whitespace (p{Z}). But i want to exclude the following case:
(?<![A-Za-z-])[A-Za-z]+(?:-[A-Za-z]+){1,}(?![A-Za-z-])
pattern explained here: Java regex patterns
which are the hyphened words like this: "aaa-bb", "aaa-bb-cc", "aaa-bb-c-dd". SO, i can i do that?
Unfortunately it seems like you can't merge both expressions, at least as far as I know.
However, maybe you can reformulate your problem.
If, for example, you want to split between words (which can contain hyphens), try this expression:
(?>[^\p{L}-]+|-[^\p{L}]+|^-|-$)
This should match any sequence of non-letter characters that are not a minus or any minus that is followed my a non-letter character or that is the first or last character in the input.
Using this expression for a split should result in this:
input="aaa-bb, aaa-bb-cc, aaa-bb-c-dd,no--match,--foo"
ouput={"aaa-bb","aaa-bb-cc","aaa-bb-c-dd","no","match","","foo"}
The regex might need some additional optimization but it is a start.
Edit: This expression should get rid of the empty string in the split:
(?>[^\p{L}-][^\p{L}]*|-[^\p{L}]+|^-|-$)
The first part would now read as "any non-character which is not a minus followed by any number of non-character characters" and should match .-- as well.
Edit: in case you want to match words that could potentially contain hyphens, try this expression:
(?>(?<=[^-\p{L}])|^)\p{L}+(?:-\p{L}+)*(?>(?=[^-\p{L}])|$)
This means "any sequence of letters (\p{L}+) followed by any number of sequences consisting of one minus and at least one more letters ((?:-\p{L}+)*+). That sequence must be preceeded by either the start or anything not a letter or minus ((?>(?<=[^-\p{L}])|^)) and be followed by anything that is not a letter or minus or the end of the input ((?>(?=[^-\p{L}])|$))".

Categories