digit regex no match - java

I keep getting an error that I shouldn't be getting and I am no regex expert but it should be so simple. I looked over it so many times and can't figure out why it isn't working. I have also searched a bunch for something similar but I can't find anyone that has the same problem.
This is the error I'm getting:
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:485)
at DailyData.importUsers(DailyData.java:456)
at DailyData.main(DailyData.java:40)
Here is my code. Through debugging I found it's the last line that throws the exception:
Pattern memberSincePattern = Pattern.compile("\\W*(\\d+):(\\d+):(\\d+)\\W*(\\d+)/(\\d+)/(\\d+)");
Matcher memberSinceMatcher = memberSincePattern.matcher("12:12:12 12/12/2012");
String msGroupOne = memberSinceMatcher.group(1);
I am using eclipse on Ubuntu 14.04 LTS.
I have imported the proper libraries and have tried \d{1,2} for the digits as well as getting rid of the leading \W*. I want it to be able to grab either 1 or two digits for each group.
I get no syntax errors or warnings on this either.

As the exception indicates, you need to find a match to your regex before looking for a matched group.
For example, you could use Matcher#matches, as follows:
Pattern memberSincePattern = Pattern.compile("\\W*(\\d+):(\\d+):(\\d+)\\W*(\\d+)/(\\d+)/(\\d+)");
Matcher memberSinceMatcher = memberSincePattern.matcher("12:12:12 12/12/2012");
if (memberSinceMatcher.matches()) {
    String msGroupOne = memberSinceMatcher.group(1);
}
Here's the javadoc entry for Matcher#matches.
As a side note, I'd like to point out that if you want to match only a sub-sequence of your original String, at least one time, you should use Matcher#find instead of Matcher#matches. Possibly in a while loop :)
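To illustrate the find-in-a-loop idea, here is a minimal sketch. The class name, the method name and the sample text are made up for the example, and the leading \W* from the question's pattern is dropped because find() scans forward on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MemberSinceFinder {
    // Pattern from the question, minus the leading \W* (not needed with find()).
    private static final Pattern MEMBER_SINCE =
            Pattern.compile("(\\d+):(\\d+):(\\d+)\\W*(\\d+)/(\\d+)/(\\d+)");

    // Collects group 1 (the first number) of every timestamp found in the text.
    static List<String> firstGroups(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = MEMBER_SINCE.matcher(text);
        while (m.find()) {              // find() must succeed before group() is legal
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input containing two timestamps.
        System.out.println(firstGroups("joined 12:12:12 12/12/2012, renewed 1:02:03 4/5/2013"));
    }
}
```

Calling group() is only valid after matches(), lookingAt() or find() has returned true, which is why the call sits inside the loop body.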

Related

Java Regular Expression not evaluating

I have a string that changes frequently, in the form:
*** START OF THIS PROJECT FILENAME ***
Where FILENAME can be multiple words in different instances. I tried running the regex :
Pattern.matches("\\*\\*\\* START OF THIS PROJECT ", line);
where line is one such string.
I also tried using a Matcher, where beginningOfFilePattern is compiled from the same regex as above:
Matcher beginFileAccelerator;
beginFileAccelerator = beginningOfFilePattern.matcher(line);
if (beginFileAccelerator.find()) {
    // Do something
}
I've exhaustively tried at least 30 different combinations of regex, and I simply can't find the solution. If anyone could lend me an eye, I would greatly appreciate it.
Pattern.matches tries to match the entire string against the pattern, because under the covers it uses Matcher#matches, which says:
Attempts to match the entire region against the pattern.
In your case, that will fail at the end, because the input doesn't end with "PROJECT ". It has more after that.
To allow anything at the end, add .*:
Pattern.matches("\\*\\*\\* START OF THIS PROJECT .*", line)
// Here -----------------------------------------^
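If it helps, the difference between the three Matcher entry points can be sketched like this. The class and method names and the sample line are invented for the illustration; only matches(), lookingAt() and find() are the real java.util.regex API:

```java
import java.util.regex.Pattern;

class HeaderLineCheck {
    // The prefix-only pattern from the question (no trailing .*).
    private static final Pattern PREFIX =
            Pattern.compile("\\*\\*\\* START OF THIS PROJECT ");

    // matches(): the whole line must be consumed by the pattern.
    static boolean wholeLine(String line) { return PREFIX.matcher(line).matches(); }

    // lookingAt(): the pattern must match at the start, the rest is ignored.
    static boolean atStart(String line) { return PREFIX.matcher(line).lookingAt(); }

    // find(): the pattern may match anywhere in the line.
    static boolean anywhere(String line) { return PREFIX.matcher(line).find(); }

    public static void main(String[] args) {
        String line = "*** START OF THIS PROJECT SOME FILE NAME ***"; // hypothetical line
        System.out.println(wholeLine(line)); // false: text follows the prefix
        System.out.println(atStart(line));   // true
        System.out.println(anywhere(line));  // true
    }
}
```

So with the original pattern, lookingAt() or find() would also work without appending .* to the regex.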

Regex for commas and periods allowed

I tried searching for an answer to this question and also reading the Regex Wiki but I couldn't find what I'm looking for exactly.
I have a program that validates a document. (It was written by someone else).
If certain lines or characters don't match the regex then an error is generated. I've noted that a few false errors are always generated and I want to correct this. I believe I have narrowed down the problem to this:
Here is an example:
This error is flagged by the program logic:
ERROR: File header immediate origin name is invalid: CITIBANK, N.A.
Here is the code that causes that error:
if(strLine.substring(63,86).matches("[A-Z,a-z,0-9, ]+")){
}else{
JOptionPane.showMessageDialog(null, "ERROR: File header immediate origin name is invalid: "+strLine.substring(63,86));
errorFound=true;
fileHeaderErrorFound=true;
bw.write("ERROR: File header immediate origin name is invalid: "+strLine.substring(63,86));
bw.newLine();
I believe the error is raised at runtime because the text contains a period and a comma. I am unsure how to allow these in the regex.
I have tried using this
if(strLine.substring(63,86).matches("[A-Z,a-z,0-9,,,. ]+")){
and it seemed to work. I just wanted to make sure that is the correct way, because it doesn't look right.
You're right in your analysis, the match failed because there was a dot in the text that isn't contained in the character class.
However, you can simplify the regex - no need to repeat the commas, they don't have any special meaning inside a class:
if(strLine.substring(63,86).matches("[A-Za-z0-9,. ]+"))
Are you sure that you'll never have to match non-ASCII letters or any other kind of punctuation, though?
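Letters and digits (a-zA-Z0-9) can be replaced by \w, the "word character" class, with the caveat that \w also matches the underscore.
The period and comma don't need escaping inside a character class and can be used as is. Hence this regex might come in handy (in a Java string literal the backslash is doubled, and the space and the + quantifier from the original pattern are kept):
"[\\w,. ]+"
Hope this helps. :)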
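A quick way to compare the candidate character classes is a small check like the one below. The class and method names are made up for the sketch, and the sample names other than the one from the error message are hypothetical:

```java
class OriginNameCheck {
    // Pattern from the question: the dot is missing, so "N.A." is rejected.
    static boolean original(String name) { return name.matches("[A-Z,a-z,0-9, ]+"); }

    // Simplified explicit class: ASCII letters, digits, comma, period, space.
    static boolean explicit(String name) { return name.matches("[A-Za-z0-9,. ]+"); }

    // \w variant: same as above, but the underscore is accepted as well.
    static boolean wordClass(String name) { return name.matches("[\\w,. ]+"); }

    public static void main(String[] args) {
        String name = "CITIBANK, N.A.";
        System.out.println(original(name));            // false
        System.out.println(explicit(name));            // true
        System.out.println(wordClass(name));           // true
        System.out.println(explicit("SOME_BANK"));     // false: underscore not allowed
        System.out.println(wordClass("SOME_BANK"));    // true: \w includes '_'
    }
}
```

Which of the two fixed variants is appropriate depends on whether an underscore should count as valid in the origin name.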

Getting an exception while applying regex to str.split method

This code runs fine when I use it outside Android, that is, in a pure Java environment. (There is a link that says this is a duplicate of another question, but it's not.) I want to know why it runs in Java without Android, but crashes on Android.
String[] ar = new String[iters];
ar = myStr.split("(?<=\\G.{16})");
However, when I apply the same in an Android environment, I get the following exception:
04-13 13:50:22.255: E/AndroidRuntime(2147): FATAL EXCEPTION: main
04-13 13:50:22.255: E/AndroidRuntime(2147): java.util.regex.PatternSyntaxException: Look-behind pattern matches must have a bounded maximum length near index 12:
04-13 13:50:22.255: E/AndroidRuntime(2147): (?<=\G.{16})
Possible reason:
It looks like a bug in the Java version your Android is using, which was corrected in later Java versions.
\G can be considered an anchor which represents either
the end of the previous match, or
the start of the string (if no match was found yet),
and like any anchor it is zero-length.
I suspect that the main part of that bug is that \G is seen by look-behind as the entire previous match, not its end, and since the previous match could have any length, look-behind complains because it can't determine an obvious maximum length.
Workaround:
Avoid split("(?<=\\Gwhatever)").
Instead of finding delimiters, use the Matcher class to find() the things you want to get from the text. So your code can look like:
String myStr = "0123456789012345678901234567890123456789012345678901234567890123456789";
Matcher m = Pattern.compile(".{1,16}").matcher(myStr);
while (m.find()) {
    String s = m.group();
    // do what you want with the current token stored in `s`
    System.out.println(s);
}
Output:
0123456789012345
6789012345678901
2345678901234567
8901234567890123
456789
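If regex is not a requirement, fixed-size chunks can also be cut with plain substring calls, which sidesteps the look-behind issue entirely. This is only a sketch; the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

class Chunker {
    // Splits s into consecutive pieces of at most `size` characters.
    static List<String> chunks(String s, int size) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += size) {
            // The last piece may be shorter than `size`.
            parts.add(s.substring(i, Math.min(s.length(), i + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        String myStr = "0123456789012345678901234567890123456789012345678901234567890123456789";
        for (String part : chunks(myStr, 16)) {
            System.out.println(part);
        }
    }
}
```

It produces the same pieces as the find() loop above, and it behaves the same on any runtime because no regex engine is involved.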

Simple Java regular expression matching fails

Before y'all jump on me for posting something similar to previous questions asked, yes, there seem to be a number of regex related questions but nothing which seems to help me, or at least that I can see.
I am trying to parse strings in JAVA using PATTERN and MATCHER and am really having no joy. My regular expression seems to match my input string when I use a few of the online regular expression testing websites but Java simply does not match my expression.
My input string is:
"Big apple" title="Little Apple" type="Container" url="http://malcolm.com/testing"
The regular expression I am using to match is ".*" title="(.*)" type="Container" url="(.*)"
Essentially I want to pull out the text within the second and the fourth set of quotes. There will always be 4 sets of quotes with text within and around.
I am coding as follows:
Variable XMLSubstring contains the string above (including the quotes) and is as stated, even when I print it out.
Pattern p = Pattern.compile(".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");
Matcher m = p.matcher(XMLSubstring);
It doesn't appear to be rocket science I'm attempting but I'm pulling my hair out trying to debug the bloody thing.
Is there something wrong with my regex pattern?
Is there something wrong with the code I am using?
Am I simply a moron and should stop coding with immediate effect?
EDIT & UPDATE: I have found the problem. My string had a space at the end of it which was breaking the parser! How silly, and I think based on that, I need to accept the third suggestion of mine and give up programming. Thanks all for your assistance.
Try this,
String str="\"Big apple\" title=\"Little Apple\" type=\"Container\" url=\"http://malcolm.com/testing\"";
Pattern p=Pattern.compile(".* title=\\\".*\\\" type=\\\"Container\\\" url=\\\".*\\\"");
Matcher m=p.matcher(str);
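Since the question's update says the culprit was a trailing space, here is a small sketch that keeps the two capture groups from the question and trims the input before calling matches(). The class and method names are made up for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeExtractor {
    // Pattern from the question, with both capture groups kept.
    private static final Pattern ATTRS = Pattern.compile(
            ".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");

    // Returns {title, url}, or null when the (trimmed) input does not match.
    static String[] extract(String input) {
        Matcher m = ATTRS.matcher(input.trim()); // trim() removes the stray trailing space
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Note the trailing space, as described in the question's update.
        String raw = "\"Big apple\" title=\"Little Apple\" type=\"Container\" "
                + "url=\"http://malcolm.com/testing\" ";
        System.out.println(ATTRS.matcher(raw).matches()); // false: matches() needs the whole input
        String[] parts = extract(raw);
        System.out.println(parts[0]); // Little Apple
        System.out.println(parts[1]); // http://malcolm.com/testing
    }
}
```

Online testers usually behave like find(), which would explain why the pattern appeared to work there while matches() rejected the untrimmed string.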

Splitting long error message

I am currently trying to add a few error messages to my application. For that I am using JOptionPane.showMessageDialog(...);
Basically everything is working as I'd expect it to. But one thing is a bit of a pain.
I am using e.getMessage() to get the description of the error that occurred.
In the case of an SQL connection error this is such a long message that it can't possibly fit on the screen. So I thought I'd split it after every sentence, using split("[\\.]").
This is working as well, BUT: the message includes a part like this
Error: "java.net.SocketTimeoutException: Receive timed out"., which, of course ends up in:
Error: "java
net
SocketTimeoutException: Receive timed out"
How could I avoid this behaviour? Or is there possibly a better way to get a split error message?
Why not just split on every space that has dot before it?
Try maybe split("(?<=[.])\\s+")
(?<=[.]) is a positive look-behind. It makes sure that the run of whitespace \\s+ has a dot before it without including that dot in the match, so the dot stays in place after the split while the whitespace is removed.
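As a rough illustration of that split, the sketch below runs it on a made-up message. The class name, method name and the message text are assumptions for the example, not the real driver output:

```java
import java.util.Arrays;

class SentenceSplitter {
    // Splits on whitespace that directly follows a dot; the dot itself is kept.
    static String[] sentences(String message) {
        return message.split("(?<=[.])\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical error text containing a dotted class name.
        String message = "Connection failed. Error: \"java.net.SocketTimeoutException: "
                + "Receive timed out\". Check the host and port.";
        for (String sentence : sentences(message)) {
            System.out.println(sentence);
        }
        System.out.println(Arrays.toString(sentences("One. Two.")));
    }
}
```

The dots inside java.net.SocketTimeoutException are not followed by whitespace, so the class name stays in one piece.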
Not sure until your input and expected result are posted in full, but you could use "lookarounds" for that purpose.
For instance:
String input = "Error: \"java.net.SocketTimeoutException: Receive timed out\".";
System.out.println(Arrays.toString(input.split("(?<!\\w)\\.(?!\\w)")));
Output
[Error: "java.net.SocketTimeoutException: Receive timed out"]
Explanation
It splits the String based on (escaped) dot Patterns neither preceded nor followed by any word character
It prints the split Array (here, only 1 element since the package-delimiting dots do not match the Pattern as expected)
An alternative to using regexes is WordUtils.wrap from the apache.commons.lang package. Using a regex has the advantage of not needing an additional library, but makes the code a wee bit less readable. In your case that's not really a big issue, but as an added benefit, commons.lang contains a good deal of useful stuff which might come in handy in your project.
It is one of the libraries which is pretty much a constant in my tool-belt.
