Getting an exception while applying regex to str.split method - java

This code runs fine when I use it outside Android, that is, in a pure Java environment. (There is a link that says this is a duplicate of another question, but it is not.) I want to know why it runs in Java without Android but crashes on Android.
String[] ar = new String[iters];
ar = myStr.split("(?<=\\G.{16})");
However, when I run the same code in an Android environment, I get the following exception:
04-13 13:50:22.255: E/AndroidRuntime(2147): FATAL EXCEPTION: main
04-13 13:50:22.255: E/AndroidRuntime(2147): java.util.regex.PatternSyntaxException: Look-behind pattern matches must have a bounded maximum length near index 12:
04-13 13:50:22.255: E/AndroidRuntime(2147): (?<=\G.{16})

Possible reason:
It looks like a bug in the Java version your Android is using, which was corrected in later Java versions.
\G can be considered an anchor which represents either
the end of the previous match, or
the start of the string (if no match has been found yet),
and like any anchor it is zero-length.
I suspect the main part of the bug is that look-behind sees \G as the entire previous match rather than its end. Since the previous match could have any length, look-behind complains because it cannot determine a maximum length.
Workaround:
Avoid split("(?<=\\Gwhatever)").
Instead of finding delimiters, use the Matcher class to find() the things you want to get from the text. So your code can look like:
String myStr = "0123456789012345678901234567890123456789012345678901234567890123456789";
Matcher m = Pattern.compile(".{1,16}").matcher(myStr);
while (m.find()) {
    String s = m.group();
    // do what you want with the current token stored in `s`
    System.out.println(s);
}
Output:
0123456789012345
6789012345678901
2345678901234567
8901234567890123
456789

Related

Regex code not collecting multiple lines of matching pattern

I'm new to using regex and I was hoping that someone could help me with this.
I have this regex code which is supposed to identify tab groups in a tablature file. It works on regex testing websites such as regexr.com, regextester.com, and extendsclass.com/regex-tester, but when I code it in Java using the example text shown below, I am given each individual line as its own separate group, instead of 4 groups containing all the text which are separated only by one newline.
I have read through the Stack Overflow thread "Regular expression works on regex101.com, but not on prod" and have been careful to avoid string literal problems and multiline problems. I've also tried the code with other regex engines on regex101 and it worked, but it still does not work in my Java code shown below.
I tried enabling the multiline flag but it still doesn't work. I thought it was a problem with my code, but then I got the same wrong output on other regex tester websites: myregexp.com and freeformatter.com/java-regex-tester
Here is the original regex. It is long, so it might be easier to use the simplified regex below, as both have the same problem I was describing:
RealRegexCode = (^|[\n\r])(((?<=^|[\n\r])[^\S\n\r]*\|*[^\S\n\r]*((E|A|D|G|B|e|a|d|g|b)[^\S\n\r]*\|*(?=(([^\S\n\r]*-[ -]*(?=\|))|([ -]*((\(?[a-zB-Z0-9]+\)?)+[^\S\n\r]*-[ -]*)+((\(?[a-zB-Z0-9]+\)?)+){0,1}[^\S\n\r]*))[|\r\n]|$)))((([^\S\n\r]*-[ -]*(?=\|))|([ -]*((\(?[a-zB-Z0-9]+\)?)+[^\S\n\r]*-[ -]*)+((\(?[a-zB-Z0-9]+\)?)+){0,1}[^\S\n\r]*))\|)+(((?<=\|)[^\S\n\r]*((E|A|D|G|B|e|a|d|g|b)[^\S\n\r]*\|*(?=(([^\S\n\r]*-[ -]*(?=\|))|([ -]*((\(?[a-zB-Z0-9]+\)?)+[^\S\n\r]*-[ -]*)+((\(?[a-zB-Z0-9]+\)?)+){0,1}[^\S\n\r]*))[|\r\n]|$)))((([^\S\n\r]*-[ -]*(?=\|))|([ -]*((\(?[a-zB-Z0-9]+\)?)+[^\S\n\r]*-[ -]*)+((\(?[a-zB-Z0-9]+\)?)+){0,1}[^\S\n\r]*))\|)+)*(\n|\r|$))+
Here is a simplified regex that displays the same problem, provided for the sake of debugging:
SimplifiedRegexCode = (^|[\n\r])([^\n\r]+(\n|\r|$))+
Here is the code that finds the matches using the regex pattern:
public static void main(String[] args) {
    // every backslash in a Java string literal must be escaped
    String filePath = "C:\\Users\\stani\\IdeaProjects\\project\\src\\testing files\\guitar - a thousand matches by passenger.txt";
    Path path = Path.of(filePath);
    List<String> stuff = new ArrayList<>();
    try {
        String rootStr = Files.readString(path);
        Pattern pattern = Pattern.compile("(^|[\\n\\r])([^\\n\\r]+(\\n|\\r|$))+");
        Matcher ptrnMatcher = pattern.matcher(rootStr);
        while (ptrnMatcher.find()) {
            stuff.add(ptrnMatcher.group());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println(new Patterns().MeasureGroupCollection);
    for (String s : stuff)
        System.out.println(s);
}
And here is the text I was testing it with. It might help to copy and paste this in a text editor as stack overflow might distort how the text looks:
e|---------------------------------|------------------------------------|
e|------------------------------------------------------------------|
B|-----1--------(1)----1-----------|-------1---------------1----------1-|
B|-----1--------(1)----0---------0-----1---------1-----3--------(3)-|
G|-----------0------------0--------|-------------0----------------0-----|
G|-----------0---------------0---------------0---------------0------|
D|-----0h2-----2-------2-----------|-------2-------2-------0--------0---|
D|-----2-------2-------2-------2-------2-------2-------0-------0----|
A|-3-------3-------3-------3-------|------------------------------------|
A|-0-------0--------------------------------------------------------|
E|-----------------------------0---|---1-------1-------3-------3--------|
E|-----------------0-------0--------1------1-------3-------3--------|
e|-------------------------------------------------------------------|
B|-----1---------1-----1---------1-----3---------3-------1---------1-|
G|-----------0---------------0---------------0-----------------0-----|
D|-----3-------2-------2-------2-------0-------0---------2-------2---|
A|-----------------3-------3-------------------------3-------3-------|
E|-1-------1-----------------------3-------3-------------------------|
It should identify four different groups from the text. However, in Java and in the two testers I mentioned above, it recognizes each line as its own separate group (i.e. 12 groups).
I couldn't help but respond to this as I am familiar with both regex and guitar haha.
For your short regex, please see the following regex on regex101.com:
https://regex101.com/r/NqGhoh/1/
The multiline modifier is required.
The main problem is that you are handling newlines at both the front and the back of the expression. I have modified the expression in a few ways:
Made the regex match newlines only at the end, always looking for a ^ at the beginning.
Matched the carriage return/newline combination as \r?\n, since a carriage return should always be followed by a newline when it is used.
Used non-capturing groups to reduce overhead and complexity when looking at matches. This is the ?: just inside the parenthesis. It means the group won't be captured in the result, just used for grouping.
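The corrected expression itself sits behind the regex101 link, so the following is only a minimal sketch of the three changes listed above and may differ from the pattern at that link. The class name TabBlocks, the method splitBlocks, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TabBlocks {
    // (?m)      multiline mode, so ^ and $ match at line boundaries
    // ^         every block starts at the beginning of a line
    // [^\r\n]+  one non-empty line
    // \r?\n|$   ended by LF, CRLF, or the end of the input
    // (?: ... ) non-capturing groups, used only for grouping
    private static final Pattern BLOCK =
            Pattern.compile("(?m)^(?:[^\\r\\n]+(?:\\r?\\n|$))+");

    // Returns each run of consecutive non-empty lines as one string.
    static List<String> splitBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // hypothetical input: two tab groups separated by one blank line, CRLF endings
        String text = "e|--0--|\r\nB|--1--|\r\n\r\ne|--3--|\r\nB|--0--|\r\n";
        System.out.println(splitBlocks(text).size()); // prints 2
    }
}
```

With the original pattern, a file saved with Windows line endings would likely end every match after the \r, because the next character is \n and [^\n\r]+ cannot start there. That would explain one match per line, though it is a guess about the input file.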
I started testing your longer regex and may update that as well, though it sounds like you already know what to do with the shorter one corrected.

Java Regular Expression not evaluating

I have a string that changes frequently, in the form of:
*** START OF THIS PROJECT FILENAME ***
Where FILENAME can be multiple words in different instances. I tried running the regex :
Pattern.matches("\\*\\*\\* START OF THIS PROJECT ", line);
where line is equal to one such string.
I also tried using the Matcher, where beginningOfFilePattern is set to the same regex pattern as above:
Matcher beginFileAccelerator;
beginFileAccelerator = beginningOfFilePattern.matcher(line);
if (beginFileAccelerator.find()) {
    // Do something
}
I've exhaustively tried at least 30 different combinations of regex, and I simply can't find the solution. If anyone could lend me a hand I would greatly appreciate it.
Pattern.matches tries to match the entire string against the pattern, because under the covers it uses Matcher#matches, which says:
Attempts to match the entire region against the pattern.
In your case, that will fail at the end, because the input doesn't end with "PROJECT ". It has more after that.
To allow anything at the end, add .*:
Pattern.matches("\\*\\*\\* START OF THIS PROJECT .*", line)
// Here -----------------------------------------^
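To make the difference concrete, here is a small sketch comparing the three Matcher entry points on the unmodified pattern. The class name StartMarkerDemo, the method names, and the sample line are invented for illustration.

```java
import java.util.regex.Pattern;

public class StartMarkerDemo {
    private static final Pattern PREFIX =
            Pattern.compile("\\*\\*\\* START OF THIS PROJECT ");

    // matches(): the pattern must cover the entire input
    static boolean matchesWhole(String line) {
        return PREFIX.matcher(line).matches();
    }

    // lookingAt(): the pattern must match at the start; trailing text is allowed
    static boolean matchesAtStart(String line) {
        return PREFIX.matcher(line).lookingAt();
    }

    // find(): the pattern may match anywhere in the input
    static boolean matchesAnywhere(String line) {
        return PREFIX.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "*** START OF THIS PROJECT MY BOOK TITLE ***"; // hypothetical input
        System.out.println(matchesWhole(line));    // false
        System.out.println(matchesAtStart(line));  // true
        System.out.println(matchesAnywhere(line)); // true
    }
}
```

If only the prefix matters, lookingAt() avoids having to append .* to the pattern.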

digit regex no match

I keep getting an error that I shouldn't be getting and I am no regex expert but it should be so simple. I looked over it so many times and can't figure out why it isn't working. I have also searched a bunch for something similar but I can't find anyone that has the same problem.
This is the error I'm getting:
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:485)
at DailyData.importUsers(DailyData.java:456)
at DailyData.main(DailyData.java:40)
Here is my code. Through debugging I found it's the last line that throws the error:
Pattern memberSincePattern = Pattern.compile("\\W*(\\d+):(\\d+):(\\d+)\\W*(\\d+)/(\\d+)/(\\d+)");
Matcher memberSinceMatcher = memberSincePattern.matcher("12:12:12 12/12/2012");
String msGroupOne = memberSinceMatcher.group(1);
I am using eclipse on Ubuntu 14.04 LTS.
I have imported the proper libraries and have tried \d{1,2} for the digits as well as getting rid of the leading \W*. I want it to be able to grab either 1 or two digits for each group.
I get no syntax errors or warnings on this either.
As the exception indicates, you need to find a match to your regex before looking for a matched group.
For example, you could use Matcher#matches, as follows:
Pattern memberSincePattern = Pattern.compile("\\W*(\\d+):(\\d+):(\\d+)\\W*(\\d+)/(\\d+)/(\\d+)");
Matcher memberSinceMatcher = memberSincePattern.matcher("12:12:12 12/12/2012");
if (memberSinceMatcher.matches()) {
    String msGroupOne = memberSinceMatcher.group(1);
}
Here's the javadoc entry for Matcher#matches.
As a side note, I'd like to point out that if you want to match only a sub-sequence of your original String, at least one time, you should use Matcher#find instead of Matcher#matches. Possibly in a while loop :)
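A minimal sketch of that find() loop, assuming the input may hold several timestamps surrounded by other text. The class name FindLoopDemo, the method years, and the sample sentence are made up. The leading \W* from the question is dropped here because find() already skips ahead to the first possible match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoopDemo {
    private static final Pattern STAMP =
            Pattern.compile("(\\d+):(\\d+):(\\d+)\\W*(\\d+)/(\\d+)/(\\d+)");

    // Collects group 6 (the year) of every timestamp found in the text.
    static List<String> years(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = STAMP.matcher(text);
        // group() is only legal after find() or matches() has returned true
        while (m.find()) {
            result.add(m.group(6));
        }
        return result;
    }

    public static void main(String[] args) {
        // hypothetical input with two timestamps
        System.out.println(years("joined 12:12:12 12/12/2012, renewed 1:02:03 4/5/2013"));
    }
}
```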

Regular expression to search for specific line in a paragraph

I am trying to search for a specific line in a paragraph. Could somebody help me out with a regular expression.
I need to search for " unable to extend table" inside the paragraph :
BasicData:RootContextID=3a88bfa0c11511e1915e9e572a3f5ee0,AuditTimestamp=1340883271834,ContextID=3a88bfa0c11511e1915e9e572a3f5ee0,AuditSchemaName=wMSession,AuditSchemaVersion=1,ServerID=wbrbwm7qi1:5555,SessionID=c8231fb0c11311e1872d8aebd5d052bf,SessionState=2,UserID=Default,SessionName=172.18.186.11,Rpcs=0,Age=621422,$$$AUDITPROCESS={MemData:DefaultJDBCConfig_1=4},ERRORINFO=java.sql.SQLException: [sag-cjdbc42-0000][Oracle JDBC Driver][Oracle]ORA-01653: unable to extend table WMIS712.WMSESSION by 128 in tablespace WEBMDATA 2012-07-10 08:22:01 SAST [ISS.0095.0010E] AuditLogManager Runtime Exception: >>>BasicData:RootContextID=8faed230ca5711e1b0a6f6fdea974793,AuditTimestamp=1341901321940,ContextID=8faed230ca5711e1b0a6f6fdea974793,AuditSchemaName=wMSession,AuditSchemaVersion=1,ServerID=wbrbwm7qi1:5555,SessionID=8fac6130ca5711e1b0a3db011b193ad1,SessionState=2,UserID=Administrator,SessionName=system,Rpcs=0,Age=16<<< publishing log entry com.wm.app.audit.AuditException: [BAA.0002.0000] Wrapped Exception: com.wm.app.store.TSException: [BAT.0002.0000] Wrapped Exception: com.wm.txn.TransactionException: [BAC.0002.0000] Wrapped Exception: com.wm.txn.TransactionException: [BAF.0003.0072] BAF.0003.0072 .
If you know the exact text why don't you just use String's indexOf?
If you just need to know whether your string exists or not, you could just use stringInstance.contains("your string").
However, a very simple regex would be .*YOURTEXTHERE.* where .* matches any character zero or more times, so the pattern reads: anything, then your string, then anything.
Nevertheless, this regex only tells you whether the string exists or not. In fact, the contains(String) method may be a better choice.
Additionally, as @thatidiotguy already said, if you need to know exactly where this string occurs you could use indexOf, or, if you want to find the same string more than once, a Matcher with a compiled regex pattern.
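The three options can be sketched side by side as follows. The class name SubstringSearchDemo, the method allOffsets, and the shortened log line are invented for illustration. Pattern.quote is used so the search text is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstringSearchDemo {
    static final String NEEDLE = "unable to extend table";

    // Start offset of every occurrence of NEEDLE in the haystack.
    static List<Integer> allOffsets(String haystack) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile(Pattern.quote(NEEDLE)).matcher(haystack);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // hypothetical, shortened log text
        String log = "ORA-01653: unable to extend table A; retry: unable to extend table B";
        System.out.println(log.contains(NEEDLE)); // does it occur at all?
        System.out.println(log.indexOf(NEEDLE));  // offset of the first occurrence, -1 if absent
        System.out.println(allOffsets(log));      // offsets of all occurrences
    }
}
```

Note that .*YOURTEXTHERE.* used with matches() will not cross line breaks unless Pattern.DOTALL is set, which is one more reason to prefer contains or find() for multi-line log text.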
Hope this helps! :-)

Java Regex Engine Crashing

Regex Pattern - ([^=](\\s*[\\w-.]*)*$)
Test String - paginationInput.entriesPerPage=5
Java Regex Engine Crashing / Taking Ages (> 2mins) finding a match. This is not the case for the following test inputs:
paginationInput=5
paginationInput.entries=5
My requirement is to get hold of the String on the right-hand side of = and replace it with something. The above pattern is doing it fine except for the input mentioned above.
I want to understand why this happens and how I can optimize the regex for my requirement so as to avoid other peculiar cases.
You can use a look behind to make sure your string starts at the character after the =:
(?<=\\=)([\\s\\w\\-.]*)$
As for why it is crashing, it's the second * around the group. I'm not sure why you need that, since that sounds like you are asking for :
A single character, anything but equals
Then 0 or more repeats of the following group:
Any amount of white space
Then any amount of word characters, dash, or dot
End of string
Anyway, take out that *, and it doesn't spin forever anymore, but I'd still go for the more specific regex using the look behind.
Also, I don't know how you are using this, but why did you have the $ in there? Then you can only match the last one in the string (if you have more than one). It seems like you'd be better off with a look-ahead to the new line or the end: (?=\\n|$)
[Edit]: Update per comment below.
Try this:
=\\s*(.*)$
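For the stated goal of replacing whatever follows the =, a minimal sketch using the look-behind pattern from this answer could look like the following. The class name ReplaceValueDemo and the method replaceValue are made up. Matcher.quoteReplacement guards against $ or \ in the new value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceValueDemo {
    // Look-behind for '=', then the value characters up to the end of the input.
    // There is no nested quantifier, so the engine has nothing to backtrack over.
    private static final Pattern VALUE = Pattern.compile("(?<==)[\\s\\w\\-.]*$");

    // Replaces the text to the right of '=' and leaves inputs without a match untouched.
    static String replaceValue(String input, String newValue) {
        return VALUE.matcher(input).replaceFirst(Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        System.out.println(replaceValue("paginationInput.entriesPerPage=5", "10"));
        // prints paginationInput.entriesPerPage=10
    }
}
```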
