Extracting two numbers from a string - java

I have a String like the following one:
"some value is 25 but must not be bigger then 12"
I want to extract the two numbers from the string.
The numbers are integers.
There might be no text before the first number and some text after the second number.
I tried to do it with a regexp and groups, but failed miserably:
public MessageParser(String message) {
Pattern stringWith2Numbers = Pattern.compile(".*(\\d?).*(\\d?).*");
Matcher matcher = stringWith2Numbers.matcher(message);
if (!matcher.matches()) {
couldParse = false;
firstNumber = 0;
secondNumber = 0;
} else {
final String firstNumberString = matcher.group(1);
firstNumber = Integer.valueOf(firstNumberString);
final String secondNumberString = matcher.group(2);
secondNumber = Integer.valueOf(secondNumberString);
couldParse = true;
}
}
Any help is apreciated.

Your pattern should look more like:
Pattern stringWith2Numbers = Pattern.compile("\\D*(\\d+)\\D+(\\d+)\\D*");
You need to accept \\d+ because it can be one or more digits.

Your ".*" patterns are being greedy, as is their wont, and are gobbling up as much as they can -- which is going to be the entire string. So that first ".*" is matching the entire string, rendering the rest moot. Also, your "\\d?" clauses indicate a single digit which happens to be optional, neither of which is what you want.
This is probably more in line with what you're shooting for:
Pattern stringWith2Numbers = Pattern.compile(".*?(\\d+).*?(\\d+).*?");
Of course, since you don't really care about the stuff before or after the numbers, why bother with them?
Pattern stringWith2Numbers = Pattern.compile("(\\d+).*?(\\d+)");
That ought to do the trick.
Edit: Taking time out from writing butt-kickingly awesome comics, Alan Moore pointed out some problems with my solution in the comments. For starters, if you have only a single multi-digit number in the string, my solution gets it wrong. Applying it to "This 123 is a bad string" would cause it to return "12" and "3" when it ought to simply fail. A better regex would stipulate that there MUST be at least one non-digit character separating the two numbers, like so:
Pattern stringWith2Numbers = Pattern.compile("(\\d+)\\D+(\\d+)");
Also, matches() applies the pattern to the entire string, essentially bracketing it in ^ and $; find() would do the trick, but that's not what the OP was using. So sticking with matches(), we'd need to bring back in those "useless" clauses in front of and after the two numbers. (Though having them explicitly match non-digits instead of the wildcard is better form.) So it would look like:
Pattern stringWith2Numbers = Pattern.compile("\\D*(\\d+)\\D+(\\d+)\\D*");
... which, it must be noted, is damn near identical to jjnguy's answer.

Your regex matches, but everything gets eaten up by your first .* and the rest matches the empty string.
Change your regex to "\\D*(\\d+)\\D+(\\d+)\\D*".
This should be read as: At least one numeric digit followed by at least one character that isn't a numeric digit, followed by at least one numeric digit.

Related

Java Regex Look-Behind Doesn't Work

So I am working on regex comparing phone numbers and this is the result:
(?:(?:0{2}|\+)?([1-9][0-9]))? ?([1-9][0-9])? ?([1-9][0-9]{5})
As you can see there are spaces between the numbers. I want them to appear only when there is some other number before the space so:
"0022 45 432345" - should match
"45 345678" or "560032" - should match
" 324400" - shouldn't match because of the space in the beginning
I've been reading different tutorials about regexes and found out about look-behinds, but simple construction like that(just for test):
Pattern p2 = Pattern.compile("(?<=abc)aa");
Matcher m2 = p2.matcher("abcaa");
doesn't work.
Can you tell me what's wrong?
Another problem is - I want a character only happen when it is THE FIRST character in a string, otherwise it shouldn't occur. So the code:
0043 022 234567 should not work, but 022 123450 should match.
I'm stuck right now and would appreciate any help a lot.
This should work just fine. The spaces are moved into the optional groups and are themselves optional. This way, they only match if the group before them is present, but even then they are still optional. No look-behind required.
(?:(?:(?:00|\+)?([1-9][0-9]) ?)?([1-9][0-9]) ?)?([1-9][0-9]{5})
Lookbehind is a zero length match.
The javadoc for the Matcher.matches method determines if the whole String is a match.
What you're looking for is something the Matcher.find and Matcher.group methods. Something like:
final Pattern pattern = Pattern.compile("(?<=abc)aa");
final Matcher matcher = pattern.matcher("abaca");
final String subMatch;
if (matcher.find()) {
subMatch = matcher.group();
} else {
subMatch = "";
}
System.out.println(subMatch);
Example.

Extract substring from end till first alphabet in java

I have a string of format: A-2-Q4567
More examples: AB-456-T12, A24-5-M12345, etc.
I want to extract the last numerical values out of these strings, which are: 4567, 12, 12345 respectively (which is the numerical value of the substring from the end till first non-numeric character is encountered)
I can split the string, get the last string from the splitted string array, and then do a parseInt after removing the non-numerical characters from it.
But is there a more elegant way of doing this?
You can use this regex: (\d+$). It returns the last sequence of digits in the string.
EDIT - some explanation:
The \d means any digit.
The + means one or more of the previous symbols. Since the previous symbol is a digit, then \d+ means "one or more digits".
The $ means the end of the string, so \d+$ is the last sequence of digits in the string.
you can do this :
String getLastNumeric(String input)
{
String str="";
char c;
for(int i=input.length()-1;i>=0 && Character.isDigit(c=input.charAt(i));i--)
str=c+str;
return str;
}
The regex solutions might be more elegant but performance-wise I think the above is the best because Regex match can be more expensive than a simple for loop with a simple condition to evaluate.
Ofcourse The Regex is more flexible, what if your requirements change and now a dash "-" must precede the numbers ? with Regex it should be just a matter of changing one regex expression.
I put the Regex version here but remember if you're sure your requirements won't change I think the above solution is better on the CPU :
Matcher matcher= Pattern.compile("(\\d+$)").matcher(input);
if(matcher.find())
return matcher.group();
return "";

Java Regex to check "=number", ex "=5455"?

I want to check a string that matches the format "=number", ex "=5455".
As long as the fist char is "=" & the subsequence is any number in [0-9] (dot is not allowed), then it will popup "correct" message.
if(str.matches("^[=][0-9]+")){
Window.alert("correct");
}
So, is this ^[=][0-9]+ the correct one?
if it is not correct, can u provide a correct solution?
if it is correct, then can u find a better solution?
I'm no big regex expert and more knowledgeable people than me might correct this answer, but:
I don't think there's a point in using [=] rather than simply = - the [...] block is used to declare multiple choices, why declare a multiple choice of one character?
I don't think you need to use ^ (if your input string contains any character before =, it won't match anyway). I'm unsure as to whether its presence makes your regex faster, slower or has no effect.
In conclusion, I'd use =[0-9]+
That should be correct it is looking for an anchored at the beginning = sign and then 1 or more digits between 0-9
Your regex will work, even though it can be simplified:
.matches() does not really do regex matching, since it tries and matches all the input against the regex; therefore the beginning of input anchor is not needed;
you don't need the character class around the =.
Therefore:
if (str.matches("=[0-9]+")) { ... }
If you want to match a string which only begins with that regex, you have to use a Pattern, a Matcher and .find():
final Pattern p = Pattern.compile("^=[0-9]+");
final Matcher m = p.matcher(str);
if (m.find()) { ... }
And finally, Matcher also has .lookingAt() which anchors the regex only at the beginning of the input.

Java regex split double from string

I am having a problem splitting something like the following string:
43.80USD
What I want is to be able to split the expression into an array that has "43.80" as the first element and "USD" as the second. So the result would be something like:
["43.80", "USD"]
I am sure there is some way to do this with regex, but I am not proficient enough with it to figure it out on my own. Any help would be much appreciated.
If the format of your string is fixed you can split it as follows
String[] currency = "48.50USD".split("(?<=\\d)(?=[a-zA-Z])");
System.out.println("Amount='"+currency[0]+"'; Denomination='"+currency[1]+"'");
// prints: Amount='48.50'; Denomination='USD'
The regex above uses a positive look-behind (?<=) and a positive lookahead (?=) to find a separator (which is of zero-length here) that's preceded with a number and followed by a letter.
If your data really looks like "43.80USD" then you can use
"43.80USD".split("(?i)(?=[a-z])",2)
(?=[a-z]) will split before any of a-z characters
(?i) will make used regex case-insensitive so it will also work for uSd
second argument is max size of result array, since you don't want ["43.80", "U", "S, "D"] but ["43.80", "USD"] we need to use 2.
This regex works(\d*\.\d*)([a-zA-Z]*). Group 1 will be the amount, including the decimal. Group 2 will be the USD or other monetary name. Note that this regex only requires a decimal point, everything else is optional. So this also matches: "45123.15542ABCDEFG". Group 1 will be 45123.15542 and group 2 will be ABCDEFG. If you want more strict requirements, tell me what they are and Ill put it in. Otherwise your code will look something like:
Pattern p = Pattern.compile("(\\d*\\.\\d*)([a-zA-Z]*)");//Note the double \\ to escape twice.
Matcher m = p.matcher("43.80USD");
String amount, type;
if(m.matches){
amount = m.group(1);
type = m.group(2);
}

Is this Regex incorrect? No matches found

I'm trying to parse through a string formatted like this, except with more values:
Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value
The Regex
((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))
In the actual string, there are about double the amount of key/values, but I'm keeping it short for brevity. I have them in parentheses so I can call them in groups. The keys I have stored as Constants, and they will always be the same. The problem is, it never finds a match which doesn't make sense (unless the Regex is wrong)
Judging by your comment above, it sounds like you're creating the Pattern and Matcher objects and associating the Matcher with the target string, but you aren't actually applying the regex. That's a very common mistake. Here's the full sequence:
String regex = "Key1=(.*),Key2=(.*)"; // etc.
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(targetString);
// Now you have to apply the regex:
if (m.find())
{
String value1 = m.group(1);
String value2 = m.group(2);
// etc.
}
Not only do you have to call find() or matches() (or lookingAt(), but nobody ever uses that one), you should always call it in an if or while statement--that is, you should make sure the regex actually worked before you call any methods like group() that require the Matcher to be in a "matched" state.
Also notice the absence of most of your parentheses. They weren't necessary, and leaving them out makes it easier to (1) read the regex and (2) keep track of the group numbers.
Looks like you'd do better to do:
String[] pairs = data.split(",");
Then parse the key/value pairs one at a time
Your regex is working for me...
If you are always getting an IllegalStateException, I would say that you are trying to do something like:
matcher.group(1);
without having invoked the find() method.
You need to call that method before any attempt to fetch a group (or you will be in an illegal state to call the group() method)
Give this a try:
String test = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
Pattern pattern = Pattern.compile("((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))");
Matcher matcher = pattern.matcher(test);
matcher.find();
System.out.println(matcher.group(1));
It's not wrong per se, but it requires a lot of backtracking which might cause the regular expression engine to bail. I would try a split as suggested elsewhere, but if you really need to use a regular expression, try making it non-greedy.
((Key1)=(.*?)),((Key2)=(.*?)),((Key3)=(.*?)),((Key4)=(.*?)),((Key5)=(.*?)),((Key6)=(.*?)),((Key7)=(.*?))
To understand why it requires so much backtracking, understand that for
Key1=(.*),Key2=(.*)
applied to
Key1=x,Key2=y
Java's regular expression engine matches the first (.*) to x,Key2=y and then tries stripping characters off the right until it can get a match for the rest of the regular expression: ,Key2=(.*). It effectively ends up asking,
Does "" match ,Key2=(.*), no so try
Does "y" match ,Key2=(.*), no so try
Does "=y" match ,Key2=(.*), no so try
Does "2=y" match ,Key2=(.*), no so try
Does "y2=y" match ,Key2=(.*), no so try
Does "ey2=y" match ,Key2=(.*), no so try
Does "Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes so the first .* is "x" and the second is "y".
EDIT:
In Java, the non-greedy qualifier changes things so that it starts off trying to match nothing and then building from there.
Does "x,Key2=(.*)" match ,Key2=(.*), no so try
Does ",Key2=(.*)" match ,Key2=(.*), yes.
So when you've got 7 keys it doesn't need to unmatch 6 of them which involves unmatching 5 which involves unmatching 4, .... It can do it's job in one forward pass over the input.
I'm not going to say that there's no regex that will work for this, but it's most likely more complicated to write (and more importantly, read, for the next person that has to deal with the code) than it's worth. The closest I'm able to get with a regex is if you append a terminal comma to the string you're matching, i.e, instead of:
"Key1=value1,Key2=value2"
you would append a comma so it's:
"Key1=value1,Key2=value2,"
Then, the regex that got me the closest is: "(?:(\\w+?)=(\\S+?),)?+"...but this doesn't quite work if the values have commas, though.
You can try to continue tweaking that regex from there, but the problem I found is that there's a conflict in the behavior between greedy and reluctant quantifiers. You'd have to specify a capturing group for the value that is greedy with respect to commas up to the last comma prior to an non-capturing group comprised of word characters followed by the equal sign (the next value)...and this last non-capturing group would have to be optional in case you're matching the last value in the sequence, and maybe itself reluctant. Complicated.
Instead, my advice is just to split the string on "=". You can get away with this because presumably the values aren't allowed to contain the equal sign character.
Now you'll have a bunch of substrings, each of which that is a bunch of characters that comprise a value, the last comma in the string, followed by a key. You can easily find the last comma in each substring using String.lastIndexOf(',').
Treat the first and last substrings specially (because the first one does not have a prepended value and the last one has no appended key) and you should be in business.
If you know you always have 7, the hack-of-least resistance is
^Key1=(.+),Key2=(.+),Key3=(.+),Key4=(.+),Key5=(.+),Key6=(.+),Key7=(.+)$
Try it out at http://www.fileformat.info/tool/regex.htm
I'm pretty sure that there is a better way to parse this thing down that goes through .find() rather than .matches() which I think I would recommend as it allows you to move down the string one key=value pair at a time. It moves you into the whole "greedy" evaluation discussion.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
The simplest solution is the most robust.
final String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
final String[] pairs = data.split(",");
for (final String pair: pairs)
{
final String[] keyValue = pair.split("=");
final String key = keyValue[0];
final String value = keyValue[1];
}

Categories