Limited currency regex - java

I have found a lot of good currency regular expressions that get very close to what I need. Alas, I am no regex guru and can't seem to edit my current regex to meet requirements.
I need to limit the valid inputs to the format of 'xxx,xxx.xx'. The max allowed amount needs to be '999,999.99' with commas optional. I've been using this regex until now:
^([0-9]{1,3}(,[0-9]{3})*|([0-9]+))(.[0-9]{2})?$
It has been working great except for not being able to make the upper limit '999,999.99'. Thanks for the help!
Update
I've been tinkering and I've managed to come up with this:
/^(?:([0-9]{3}?,?)?[0-9]{3}(?:\.[0-9]?[0-9]?)?)$/
Still testing to see if it works. RegexPlanet isn't passing it with any of the Strings I try, but I'll be going through my app and manually testing.

burning_LEGION's answer authorizes some cases I think you probably don't want:
- 999,9
- 9.
I'll assume you want those conditions fulfilled:
- if there is a comma, there are 3 numbers after
- if there is a point, there are 2 numbers after
^\d{1,3}(,?\d{3})?(\.\d{2})?$

use this regex ^\d{1,3}(,?\d{1,3}){0,1}(\.\d{0,2})?$

Related

RegEx: Handling subgroups of Big - OR - Group as individuals and not as group count together

Let's say I have this pattern:
(?:StackOverflow is (.*)|(.*) is StackOverflow)
I am using Java or Python. But I think they work quite similar.
My Input Strings would be:
StackOverflow is great
or
great is StackOverflow
In the real use case, I don't know the pattern and I don't know the input String. Both are set by the user.
I've tested it with regex101.com .
The result looks like this:
StackOverflow is great : Group 0 is great
great is StackOverflow : Group 1 is great
However I need both times to have Group 0 be great .
So what I'm trying to achieve is: Only count those groups, that actually exist in the input strings. Any other part of the big surrounding OR - group should be ignored.
I searched already on the internet but I don't really know what to search for in this case.
Is there a way to do this in RegEx?
Generally speaking, regex doesn't work that way. Groups are numbered from left to right and there's nothing you can do about it.
That said, the regex module for python does it differently. It would consider both of these groups as #1. Unfortunately I don't know if such a thing exists for Java.
However, I think the real solution here is for the user to input a different regex. For example, your regex could be written as (StackOverflow is )?(.+)(?(1)| is StackOverflow), which is functionally equivalent except the word you're matching will always be in group #2. (Of course, this solution doesn't work if the word absolutely must be captured in group #1.)

regular expression to limit 300 words

I need a regex which success 0-300 words and fails 301 or more words.
I tried:
^\s*(\S+\s+){0,300}\S*$
I also checked
^\W*(?:\w+\b\W*){0,300}$
Both are working fine but in Java I get a java.lang.StackOverflowError. I know using a larger "XSS" I get around this issue but I wanted to ask if there is a way to optimize the regex?
I believe the problem is because the Java implementation of Pattern uses up a stack for each repetition of a group because of backtracking. The solution might be to either change your approach as others have answered or to make all quantifiers possessive :
^\s*(\S+\s+){0,300}+\S*$
or
^\W*(?:\w+\b\W*){0,300}+$
For more info, see here or here.
You can use String.split and check the size of the returned array.
Your regex will fail if the 300th word is the last word and there is notspace ahead of it.You should use
^ *(?:\S+(?: +|$)){0,300} *$

Java regex to match quoted numbers

I need to clean up a JSON including incorrectly quoted numbers via a short Java (not JS!) Regex snippet. Example for what I have:
[{"series":"a","x":"1","y":"111.71"},{"series":"a","x":"2","y":"120.25"}]
Example for what I would need to get:
[{"series":"a","x":1,"y":111.71},{"series":"a","x":2,"y":120.25}]
So I only need to match and eliminate quote characters if preceeded or followed by [0-9], but how to avoid replacing part of the number is beyond my lowly regex skills.
Any help greatly appreciated!
EDIT (2nd round):
Thanks for the fast feedback! I'm not too worried about false positives since I can control the contents of the descriptors, and I'll make sure they're text-only. Spaces can be avoided as well, only negative numbers might occur - good one! Separators are always commas (",") for the JSON, the arbitrary number of decimals in of the double values are always separated by dots ("."). I cannot fix the JSON source unfortunately, and I definitely want to clean this up in Java.
Trying out the suggestions now and reporting back. I'll also toy around with this: http://www.regular-expressions.info/lookaround.html#lookbehind
How about replaceAll("\"(-?\\d+([.]\\d+)?)\"","$1");
This works for your specific example, but would not work if other numbers have a different format (see my comment):
String s = "[{\"series\":\"a\",\"x\":\"1\",\"y\":\"111.71\"},{\"series\":\"a\",\"x\":\"2\",\"y\":\"120.25\"}]";
String clean = s.replaceAll("\"(\\d+\\.?\\d*)\"", "$1");
System.out.println(clean);
outputs:
[{"series":"a","x":1,"y":111.71},{"series":"a","x":2,"y":120.25}]

Regex: How not to match a few letters

I have the following string: SEE ATTACHED ADDENDUM TO HUD-1194,520.07
Inside that string is HUD-1 and after that is 194,520.07. What I want is the 194,520.07 part.
I have written the following regular expression to pull that value out:
[^D\-1](?:-|\()?\$?(?:\d{1,3}[ ,]?)*(?:\.\d+)\)?
However, this pulls out: 94,520.07
I know it has something to do with this part: [^D\-1] "eating" to many of the 1's. Any ideas how I can stop it from "eating" 1's after the first one that appears in HUD-1?
UPDATED:
The reason for all the other stuff is I only want to match as well if the value after HUD-1 is a money amount. And the rest of that regex tries to determine all the different ways a money amount could be written
Why not something as simple as:
.*HUD\-1(.*+)
Ok, you need to be more restrictive I see based on your updated question. Try changing [^D\-1] to just (?:HUD\-1)?. For what it's worth, your currency RegEx is vary lax, allowing input like:
001 001 .31412341234123
You might consider not reinventing the wheel there, I'm sure you can find a currency RegEx quickly via Google. Otherwise, I'd also suggest anchoring your RegEx with a $ at the end of it.
this change will make the second match group of the regex include the full number you would like (everything after the first 1), and put the possible HUD-1 in a separate matching group, if present.
(HUD-1)?((?:-|\()?\$?(?:\d{1,3}[ ,]?)*(?:\.\d+)\)?)

Java Regex, capturing groups with comma separated values

InputString: A soldier may have bruises , wounds , marks , dislocations or other Injuries that hurt him .
ExpectedOutput:
bruises
wounds
marks
dislocations
Injuries
Generalized Pattern Tried:
".[\s]?(\w+?)"+ // bruises.
"(?:(\s)?,(\s)?(\w+?))*"+ // wounds marks dislocations
"[\s]?(?:or|and) other (\w+)."; // Injuries
The pattern should be able to match other input strings like: A soldier may have bruiser or other injuries that hurt him.
On trying the generalized pattern above, the output is:
bruises
dislocations
Injuries
There is something wrong with the capturing group for "(?:(\s)?,(\s)?(\w+?))*". The capturing group has one more occurences.. but it returns only "dislocations". "marks" and "dislocation: are devoured.
Could you please suggest what should be the right pattern, and where is the mistake?
This question comes closest to this question, but that solution didn't help.
Thanks.
When the capture group is annotated with a quantifier [ie: (foo)*] then you will only get the last match. If you wanted to get all of them then you need to quantifier inside the capture and then you will have to manually parse out the values. As big a fan as I am of regex, I don't think it's appropriate here for any number of reasons... even if you weren't ultimately doing NLP.
How to fix: (?:(\s)?,(\s)?(\w+?))*
Well, the quantifier basically covers the whole regex in that case and you might as well use Matcher.find() to step through each match. Also, I'm curious why you have capture groups for the whitespace. If all you are trying to do is find a comma-separated set of words then that's something like: \w+(?:\s*,\s*\w+)* Then don't bother with capture groups and just split the whole match.
And for anything more complicated re: NLP, GATE is a pretty powerful tool. The learning curve is steep at times but you have a whole industry of science-guys to draw from: http://gate.ac.uk/
Regex in not suited for (natural) language processing. With regex, you can only match well defined patterns. You should really, really abandon the idea of doing this with regex.
You may want to start a new question where you specify what programming language you're using to perform this task and ask for pointers there.
EDIT
PSpeed posted a promising link to a 3rd party library, Gate, that's able to do many language processing tasks. And it's written in Java. I have not used it myself, but looking at the people/institutions working on it, it seems pretty solid.
The pattern that works is: \w+(?:\s*,\s*\w+)* and then manually separate CSV
There is no other method to do this with Java Regex.
Ideally, Java regex is not suitable for NLP. A useful tool for text mining is: gate.ac.uk
Thanks to Bart K. , and PSpeed.

Categories