Java regex pattern too long? [closed] - java

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
I have this regex which is a bit longer than usual. I try to capture some values in a text document.
\\n*.*(k\\s=\\s\\d)(.|\\n)*?estimate\\s.*\\n*\\s*((\\d+|<)\\.\\d+)\\s*((\\d+|<)\\.\\d+)\\s*((\\d+|<)\\.\\d+)\\s*((\\d+|<)\\.\\d+)\\s*((\\d+|<)\\.\\d+)\\s*((\\d+|<)\\.\\d+)\\s+
It works perfectly fine on regexr.com,
but in Java only this part works
\\n*.*(k\\s=\\s\\d)(.|\\n)*?estimat
as soon as I add the missing 'e' it stops working.
For now I am ignoring that some groups are filled wrongly.
What goes wrong?

The (.|\\n)*? makes the regex engine perform too many redundant backtracking steps. You need to replace all such parts in your pattern with (?s:.*?), a modifier group that matches any 0+ chars including line break chars. Since there is no alternation, there is no redundant backtracking here.
Note that in JavaScript (as you are testing the pattern at regexr.com that only supports JavaScript regex flavor), the (.|\n)*? should be replaced with [^]*? or [\s\S]*? as its regex engine does not support inline modifiers at all.
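A minimal sketch of the suggested replacement in Java, assuming a made-up input and a shortened pattern (the class name, the `firstK` helper and the sample text are illustrative and not from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallGroupDemo {
    // (?s:.*?) lazily matches any characters, line breaks included, with no alternation.
    static final Pattern PATTERN = Pattern.compile("(k\\s=\\s\\d)(?s:.*?)estimate");

    // Returns the captured "k = N" part, or null if the pattern is not found.
    static String firstK(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "header\nk = 3\nsome lines\nin between\nestimate 0.12\n";
        System.out.println(firstK(text)); // prints "k = 3"
    }
}
```

The same substitution would apply to every `(.|\\n)*?` in the full pattern.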

Related

Extract version from string using java regex [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I could not seem to figure out the proper regex (Java style) to extract just the version part from these strings
EXPECTATION
spring-aop-4.2.5.RELEASE.jar -> 4.2.5
rumi-1.js -> 1
BouncyCastle-Net-12-1.dll -> 12-1
With the following java regex I keep getting a period at the end of line
\\b\\d[\\d|\\.|\\-]*\\b
Can anyone suggest a better regex?
FAULTY_RESULT
spring-aop-4.2.5.RELEASE.jar -> 4.2.5.
rumi-1.js -> 1.
BouncyCastle-Net-12-1.dll -> 12-1.
Digit(s), followed by zero or more lots of a dot/dash and digit(s), not preceded by a word character:
(?<!\w)\d+([.-]\d+)*
In Java, it can be done in one line:
String version = packageName.replaceAll(".*?((?<!\\w)\\d+([.-]\\d+)*).*", "$1");
Here, the target term is captured while the regex matches the entire input and the replacement term returns the target (via the captured group).
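Note that replaceAll returns the input unchanged when the pattern does not match at all. If that case matters, a sketch along the following lines, using Matcher.find with the same core expression, makes the "no version" case explicit (the class name and the `extract` helper are hypothetical):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionExtractor {
    // Digits, then any number of ".digits" or "-digits", not preceded by a word character.
    private static final Pattern VERSION = Pattern.compile("(?<!\\w)\\d+(?:[.-]\\d+)*");

    // Returns the first version-like substring, or an empty Optional if there is none.
    static Optional<String> extract(String fileName) {
        Matcher m = VERSION.matcher(fileName);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] names = {"spring-aop-4.2.5.RELEASE.jar", "rumi-1.js", "BouncyCastle-Net-12-1.dll"};
        for (String name : names) {
            System.out.println(name + " -> " + extract(name).orElse("(none)"));
        }
    }
}
```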
This expression can probably be improved, but here is my first attempt. Note that you need to escape the backslashes when putting it in a Java string.
(\d+(\.?|-?|\d)+\d|\d)
regards.
Try something like this:
\b\d[\d|.|\-]*\d\b
The doubling of the back-slashes goes into the String constant only:
"\\b\\d[\\d|.|\\-]*\\d\\b"
However this will not match a single digit version...

Regex that removes everything but the number [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I am trying to use Java's String.replaceAll() or replaceFirst() method in order to edit data read from a PDF document. A line of data that could be returned is:
21/1**E (6-11) 4479 77000327633 (U)
I wish to store only the 77000327633 in a variable to work with, and I am looking for the correct regex that will capture ONLY this 11-digit number. I've tried searching around for a regex, but nothing seems to give me my desired outcome.
It could be done like this:
String value = "21/1**E (6-11) 4479 77000327633 (U)";
Pattern pattern = Pattern.compile(".* (\\d{11}) .*");
System.out.println(pattern.matcher(value).replaceAll("$1"));
Output:
77000327633
NB: This assumes that your number has 11 digits and that there is a space before and after.
NB2: It is not meant to be perfect; it only shows the idea, which is to define a global pattern with a group and replace everything with the content of the group.
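If the surrounding spaces cannot be relied on, a variant of the same idea is to search for the number directly instead of replacing everything around it. The following is only a sketch under the assumption that the value is always a run of exactly 11 digits; the class name and the `find` helper are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElevenDigitFinder {
    // Exactly 11 digits, with no digit directly before or after the run.
    private static final Pattern ELEVEN_DIGITS = Pattern.compile("(?<!\\d)\\d{11}(?!\\d)");

    // Returns the first run of exactly 11 digits, or null if there is none.
    static String find(String line) {
        Matcher m = ELEVEN_DIGITS.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("21/1**E (6-11) 4479 77000327633 (U)")); // prints 77000327633
    }
}
```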
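This is it: (.*)[ ]([0-9]+)[ ](.*)
You can access your value using $2. Note that the quantifier must be inside the group; with ([0-9])* the group would hold only the last digit.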

Parsing string that contains user entered free text [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to parse strings of the following format in Java:
Number-Action-Msg, Number-Action-Msg, Number-Action-Msg, Number-Action-Msg, ...
For example
"512-WARN-Cannot update the name.,615-PREVENT-The app is currently down, please try again later.,736-PREVENT-Testing,"
I would like to get an array with the following entries:
512-WARN-Cannot update the name.
615-PREVENT-The app is currently down, please try again later.
736-PREVENT-Testing
The problem is that the message is user entered, so I can't rely on just the commas to split up the String. The actions will always be WARN or PREVENT. What's the best way to accomplish this parsing? Thanks!
Instead of splitting by comma you can use this lookahead based regex for matching:
(\d+-(?:WARN|PREVENT).*?)(?=,\d+-(?:WARN|PREVENT)|,$)
(?=,\d+-(?:WARN|PREVENT)|,$) is a positive lookahead asserting that what follows is either a comma followed by digits-(WARN|PREVENT) or a trailing comma at the end of the line.
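In Java, this expression could be applied with a Matcher loop roughly as follows. This is a sketch only: the class name and the `parse` helper are hypothetical, and a message that itself contains text such as ",123-WARN" would still be cut at that point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntryMatcher {
    // An entry runs lazily up to a comma that starts the next entry, or to a final comma.
    private static final Pattern ENTRY = Pattern.compile(
            "(\\d+-(?:WARN|PREVENT).*?)(?=,\\d+-(?:WARN|PREVENT)|,$)");

    // Collects every Number-Action-Msg entry found in the input, in order.
    static List<String> parse(String input) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            entries.add(m.group(1));
        }
        return entries;
    }

    public static void main(String[] args) {
        String input = "512-WARN-Cannot update the name.,615-PREVENT-The app is currently down,"
                + " please try again later.,736-PREVENT-Testing,";
        for (String entry : parse(input)) {
            System.out.println(entry);
        }
    }
}
```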
Seems quite simple:
Regular expression:
WARN|PREVENT
In java:
String string = "512-WARN-Cannot update the name.,615-PREVENT-The app is currently down, please try again later.,736-PREVENT-Testing,";
String regex = "WARN|PREVENT";
System.out.println(Arrays.toString(string.split(regex)));
Will output:
[512-, -Cannot update the name.,615-, -The app is currently down, please try again later.,736-, -Testing,]
Of course you may want to adjust regex adding the -, for example:
String regex = "-WARN-|-PREVENT-";
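Splitting on the action names alone breaks each entry apart. If split is preferred over matching, one possible adjustment is to split only on commas that are followed by the start of a new entry (or that end the input), which keeps each Number-Action-Msg entry whole. A sketch, with a hypothetical class and helper name:

```java
class EntrySplitter {
    // Splits on a comma only when the next entry (digits-ACTION-) or the end of input follows.
    // String.split drops the trailing empty string produced by the final comma.
    static String[] split(String input) {
        return input.split(",(?=\\d+-(?:WARN|PREVENT)-|$)");
    }

    public static void main(String[] args) {
        String string = "512-WARN-Cannot update the name.,615-PREVENT-The app is currently down,"
                + " please try again later.,736-PREVENT-Testing,";
        for (String entry : split(string)) {
            System.out.println(entry);
        }
    }
}
```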

Regular expression based detection [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I need to identify a pattern in given text (string), and I'm looking for a regex for the same. Using a Regex is preferable due to the framework I'm working in.
For instance, consider the text --
Problem:
<<< empty line(s) >>>>
Reason:
here goes some multi-line reasoning...
...
...
As you can see there is "no text (empty line(s)) after Problem: and before Reason: ".
I need to be able to identify this pattern from the text given to me, using a regex.
Any help is much appreciated.
Thanks!
The simplest regex would be
Pattern regex = Pattern.compile("Problem:\\s+Reason:");
which finds the text Problem:, followed by one or more whitespace characters, followed by the text Reason:.
If you want to make sure that there are at least two linebreaks between the two texts, you could also do
Pattern regex = Pattern.compile("Problem:[ \t]*\r?\n(?:[ \t]*\r?\n)+Reason:");
but that's probably not necessary.
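For illustration, the simple pattern could be used like this. The class name, the `hasEmptyProblem` helper and the sample texts are assumptions made for this sketch:

```java
import java.util.regex.Pattern;

class EmptyProblemDetector {
    // "Problem:" followed by nothing but whitespace (including line breaks) up to "Reason:".
    private static final Pattern EMPTY_PROBLEM = Pattern.compile("Problem:\\s+Reason:");

    // True if the text contains a Problem: section with no content before Reason:.
    static boolean hasEmptyProblem(String text) {
        return EMPTY_PROBLEM.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasEmptyProblem("Problem:\n\n\nReason:\nsome reasoning")); // prints true
        System.out.println(hasEmptyProblem("Problem:\ndisk full\nReason:\nlogs"));    // prints false
    }
}
```

find() is used rather than matches() because the pattern only needs to occur somewhere inside the larger text.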

How do I write a regex that fit a desirable pattern and not fit another pattern? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am writing a simplified Java compiler. I wrote a regex for variable name:
"(_?[a-zA-Z]+[\w]*)"
and I want to add that the name can not be certain words, like int, double, true, false...
I tried using ^, but it is not working.
It can be done with a RE, but it's not easy for a human to write it. Treat keywords as identifiers in the scanner and distinguish the identifiers vs keywords in the tokenizer afterwards. That should be substantially easier.
I don't believe this should be done with regular expressions; it is better done using a HashSet<String>, excluding identifier names that are contained in the set.
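A minimal sketch of that approach, assuming the four reserved words mentioned in the question (the class name and the `isValidName` helper are hypothetical, and a real compiler would list all of its keywords):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class IdentifierChecker {
    // The identifier shape from the question: optional underscore, letters, then word characters.
    private static final Pattern NAME = Pattern.compile("_?[a-zA-Z]+\\w*");

    // Reserved words that must not be used as variable names (illustrative subset).
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("int", "double", "true", "false"));

    // A token is a valid name if it has the identifier shape and is not a keyword.
    static boolean isValidName(String token) {
        return NAME.matcher(token).matches() && !KEYWORDS.contains(token);
    }

    public static void main(String[] args) {
        System.out.println(isValidName("count"));   // prints true
        System.out.println(isValidName("int"));     // prints false
        System.out.println(isValidName("integer")); // prints true
    }
}
```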
^ is used for something else:
^ may appear at the beginning of a pattern to require the match to
occur at the very beginning of a line. For example, ^abc matches
abc123 but not 123abc.
Consider using "(?!...)":
(?!...) is a negative look-ahead because it requires that the
specified pattern not exist.
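Applied to the variable-name pattern from the question, a negative look-ahead could look roughly like this. It is a sketch with an illustrative keyword list; the class name and helper are made up. The \b after the keyword group keeps names such as "integer" valid:

```java
import java.util.regex.Pattern;

class LookaheadIdentifier {
    // Reject the token if it is exactly one of the keywords, then apply the identifier shape.
    private static final Pattern NAME =
            Pattern.compile("(?!(?:int|double|true|false)\\b)_?[a-zA-Z]+\\w*");

    // True if the whole token is an identifier that is not a reserved word.
    static boolean isValidName(String token) {
        return NAME.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("int"));     // prints false
        System.out.println(isValidName("integer")); // prints true
        System.out.println(isValidName("_int"));    // prints true
    }
}
```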
I suggest that if it's impossible or too hard, you write ordinary code instead. Regular expressions can sometimes be much slower than hand-optimized code, and they can be confusing enough that you have trouble finding what's wrong with what you've written.
To try out your regular expressions, check this one:
http://gskinner.com/RegExr/
For quick reference, check this one:
http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
