Java regex char sequence

I have a string with multiple "messages" inside it. Each message starts with a certain character sequence. I've tried:
String str = "ab message1ab message2ab message3";
Pattern pattern = Pattern.compile("(?<record>ab\\p{ASCII}+(?!ab))");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    handleMessage(matcher.group("record"));
}
but \p{ASCII}+ is greedy and consumes everything.
The characters a and b may appear individually inside a message; only the sequence ab marks the start of the next message.

\p{ASCII}+ is a greedy quantifier over ASCII characters, meaning that it takes the longest possible match. You can use the reluctant quantifier instead if you want the shortest possible match: \p{ASCII}+?. In that case you also need a positive lookahead assertion to say where a message ends; otherwise the reluctant quantifier stops after a single character.
The regex could become:
Pattern pattern = Pattern.compile("(?<record>ab\\p{ASCII}+?)(?=(ab)|\\z)");
Please note the (ab)|\z alternative in the lookahead: \z lets the last message match, since it is followed by the end of the input rather than by ab.
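To see the reluctant quantifier and the lookahead working together, here is a minimal sketch. The class name MessageSplitter and the method splitMessages are made up for illustration, and the lookahead is written as (?=ab|\z), which is equivalent to the version above without the extra group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
final class MessageSplitter {
    // "ab", then as few ASCII characters as possible,
    // up to (but not including) the next "ab" or the end of input.
    private static final Pattern RECORD =
            Pattern.compile("(?<record>ab\\p{ASCII}+?)(?=ab|\\z)");

    static List<String> splitMessages(String input) {
        List<String> messages = new ArrayList<>();
        Matcher matcher = RECORD.matcher(input);
        while (matcher.find()) {
            messages.add(matcher.group("record"));
        }
        return messages;
    }

    public static void main(String[] args) {
        // prints [ab message1, ab message2, ab message3]
        System.out.println(splitMessages("ab message1ab message2ab message3"));
    }
}
```

A single a or b inside a message does not end it, because the lookahead only fires on the two-character sequence ab.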

Related

How do I write regular expression to match the below pattern

Pattern : (any special characters,new lines,spaces, alphabets and numbers)[!- (any special characters,alphabets and numbers) -!](any special characters,new lines,spaces, alphabets and numbers)
Example :
!|this is first line|
|abc|62883HJKS,JSK|56.23|28378|!-23838.37|63883BC|9729-!|
I need to match all substrings like !-23838.37|63883BC|9729-!
It could work like this:
final String regex = "(!-.*?-!)";
Matcher m = Pattern.compile(regex)
.matcher(
"|abc|6288\n3HJKS,JSK|28378|!-23838|63883BC|9729-!|abc|62883HJKS,JSK|28378|!-8XX38|638,83BC|9729-!|abc|62883HJKS,JSK|28378|!-8XX38|638\n83BC|9729-!|");
while (m.find()) {
System.out.println(m.group(1));
}
Result:
!-23838|63883BC|9729-!
!-8XX38|638,83BC|9729-!
!-8XX38|638\n83BC|9729-! is ignored, because by default the dot does not match a line terminator.
Instead of the regex (!-.*?-!) you could replace the dot with an explicit character class, as in (!-[a-zA-Z0-9|,.?!]*?-!). Use *? for a lazy match, so that you get the shortest range !-...-!.
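The effect of the newline on the lazy match can be checked with a small sketch like the one below. MarkerExtractor and extract are hypothetical names. The dotAll switch is my addition: it uses the standard Pattern.DOTALL flag to show what changes if you do want the dot to cross line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class MarkerExtractor {
    // Collects every shortest !-...-! range.
    // With dotAll = false, '.' stops at line terminators.
    static List<String> extract(String input, boolean dotAll) {
        Pattern pattern = Pattern.compile("!-.*?-!", dotAll ? Pattern.DOTALL : 0);
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a|!-1|2-!|b|!-3\n4-!|";
        System.out.println(extract(input, false).size()); // prints 1
        System.out.println(extract(input, true).size());  // prints 2
    }
}
```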

Java Pattern matcher not matching for HTTP response code [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!
Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find()) {
    // the regex matched somewhere in inputstring
}
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's String.matches() tries to match the whole string.
That's different from Perl regexes, which try to find a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
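The difference between a whole-string match and a match anywhere in the input can be compared side by side with a sketch like this one. MatchesVsFind and its two method names are invented for the example; Matcher.matches() and Matcher.find() are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class MatchesVsFind {
    // True only if the regex matches the entire input (same rule as String.matches).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches anywhere inside the input.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f"};
        for (String s : words) {
            System.out.printf("%-5s [a-z] whole=%b  [a-z]+ whole=%b  [a-z] anywhere=%b%n",
                    s,
                    matchesWhole("[a-z]", s),
                    matchesWhole("[a-z]+", s),
                    matchesAnywhere("[a-z]", s));
        }
    }
}
```

Only dkoe passes the whole-string test with [a-z]+, while every word passes the find() test with [a-z], since each contains at least one lowercase letter.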
Use:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches Strings of length 1. Use something like [a-z]+.
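Alternatively, anchor the pattern explicitly with ^ and $. Neither the anchors nor the surrounding group are required by matches(), which already tests the whole string, but they do no harm: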
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

How to get all integers before hyphen from java String

I want to extract the numbers that come before each hyphen; for the first group the answer should be 0 0 1 (as integers). What would be the best way to parse this in Java?
public static String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
Please help me out.
Use the regex below with the Pattern and Matcher classes.
Pattern.compile("\\d+(?=-)");
\\d+ - Matches one or more digits. + repeats the previous token \\d (which matches a digit character) one or more times.
(?=-) - Matches only if the digits are followed by a hyphen. This is a positive lookahead assertion: it asserts that the match must be followed by a - symbol, without consuming it.
String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
Matcher m = Pattern.compile("\\d+(?=-)").matcher(str);
while(m.find())
{
System.out.println(m.group());
}
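Since the question asks for integers rather than strings, the loop above can be extended to parse each match. This is only a sketch: HyphenIntegers and integersBeforeHyphen are hypothetical names, and Integer.parseInt would throw NumberFormatException on a digit run too long for an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class HyphenIntegers {
    // One or more digits, only when immediately followed by a hyphen.
    private static final Pattern DIGITS_BEFORE_HYPHEN = Pattern.compile("\\d+(?=-)");

    static List<Integer> integersBeforeHyphen(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = DIGITS_BEFORE_HYPHEN.matcher(input);
        while (m.find()) {
            // The hyphen is not part of the match, so m.group() holds digits only.
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
        // prints [0, 0, 1, 1, 0, 0, 0, 1, 0]
        System.out.println(integersBeforeHyphen(str));
    }
}
```

The digits inside S1, S2 and S3 are skipped because they are followed by | or , rather than by a hyphen.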
One lazy way: if you already know the layout of the string, use substring and indexOf to locate each number.
String str ="[0-S1|0-S2|1-S3, 1-S1|0-S2|0-S3, 0-S1|1-S2|0-S3]";
// start one position after "[" so that the bracket itself is not passed to parseInt
int int1 = Integer.parseInt(str.substring(str.indexOf("[") + 1, str.indexOf("-S1")));
and so on.

Regular expression for a string starting with some string

I have a string of this form: (notice)Any_other_string (note that the parentheses are part of the string).
I want to split this string into two parts: (notice) and the rest. I do it as follows:
private static final Pattern p1 = Pattern.compile("(^\\(notice\\))([a-z_A-Z1-9])+");
String content = "(notice)Stack Over_Flow 123";
Matcher m = p1.matcher(content);
System.out.println("Printing");
if (m.find()) {
System.out.println(m.group(0));
System.out.println(m.group(1));
}
I expected the result to be (notice) and Stack Over_Flow 123, but instead the result is (notice)Stack and (notice).
I cannot explain this result. Which regex is suitable for my purpose?
Issue 1: group(0) will always return the entire match - this is specified in the javadoc - and the actual capturing groups start from index 1. Simply replace it with the following:
System.out.println(m.group(1));
System.out.println(m.group(2));
Issue 2: You do not take spaces and other characters, such as underscores, into account (not even the digit 0). I suggest using the dot, ., for matching unknown characters. Or include \\s (whitespace) and _ into your regex. Either of the following regexes should work:
(^\\(notice\\))(.+)
(^\\(notice\\))([A-Za-z0-9_\\s]+)
Note that you need the + inside the capturing group, or it will only find the last character of the second part.
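Both issues can be reproduced with a short sketch. NoticeSplitter, split and lastRepeatedCapture are hypothetical names chosen for this example; the second method deliberately keeps the + outside the group to show why only the last character is captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class NoticeSplitter {
    // Group 1 is the literal "(notice)" prefix, group 2 is everything after it.
    private static final Pattern NOTICE = Pattern.compile("^(\\(notice\\))(.+)");

    static String[] split(String content) {
        Matcher m = NOTICE.matcher(content);
        if (!m.find()) {
            throw new IllegalArgumentException("input does not start with (notice)");
        }
        // group(0) would be the whole match; the capturing groups start at 1.
        return new String[] {m.group(1), m.group(2)};
    }

    // Quantifier outside the group: group 2 is overwritten on every repetition,
    // so only the last matched character survives.
    static String lastRepeatedCapture(String content) {
        Matcher m = Pattern.compile("^(\\(notice\\))([A-Za-z0-9_\\s])+").matcher(content);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String content = "(notice)Stack Over_Flow 123";
        String[] parts = split(content);
        System.out.println(parts[0]);                     // prints (notice)
        System.out.println(parts[1]);                     // prints Stack Over_Flow 123
        System.out.println(lastRepeatedCapture(content)); // prints 3
    }
}
```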

Negative lookahead regex not working

input1="caused/VBN by/IN thyroid disorder"
Requirement: find the word "caused" followed by a slash and one or more capital letters, but only when it is not followed by whitespace + "by/IN".
In the example above "caused/VBN" is followed by " by/IN", so "caused" should not match.
input2="caused/VBN thyroid disorder"
"by/IN" doesn't follow caused, so it should match
regex="caused/[A-Z]+(?![\\s]+by/IN)"
caused/[A-Z]+ -- word 'caused' + / + one or more capital letters
(?![\\s]+by) -- negative lookahead - not matching space and by
Below is a simple method that I used to test
public static void main(String[] args){
    String input = "caused/VBN by/IN thyroid disorder";
    String regex = "caused/[A-Z]+(?![\\s]+by/IN)";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    while(matcher.find()){
        System.out.println(matcher.group());
    }
}
Output: caused/VB
I don't understand why my negative lookahead regex is not working.
You need to include a word boundary in your regular expression:
String regex = "caused/[A-Z]+\\b(?![\\s]+by/IN)";
Without it you can get a match, but not what you were expecting:
"caused/VBN by/IN thyroid disorder";
^^^^^^^^^
this matches because "N by" doesn't match "[\\s]+by"
The [A-Z]+ match is shortened (via backtracking) until the negative lookahead succeeds.
What you have to do is stop the backtracking so that [A-Z]+ always consumes the whole tag.
This can be done in a few different ways.
A positive lookahead, followed by a backreference that consumes what it captured
"caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)"
An atomic group (standalone sub-expression)
"caused(?>/[A-Z]+)(?!\\s+by/IN)"
A possessive quantifier
"caused/[A-Z]++(?!\\s+by/IN)"

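To compare the original pattern with the fixes, a sketch like the following runs each regex against both inputs from the question. LookaheadBacktracking and firstMatch are hypothetical names; the regexes are the ones quoted in this thread, with [\\s]+ written as \\s+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class LookaheadBacktracking {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] regexes = {
            "caused/[A-Z]+(?!\\s+by/IN)",           // original: backtracks to "caused/VB"
            "caused/[A-Z]+\\b(?!\\s+by/IN)",        // word boundary
            "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)",  // lookahead plus backreference
            "caused(?>/[A-Z]+)(?!\\s+by/IN)",       // atomic group
            "caused/[A-Z]++(?!\\s+by/IN)"           // possessive quantifier
        };
        String[] inputs = {
            "caused/VBN by/IN thyroid disorder",
            "caused/VBN thyroid disorder"
        };
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " on \"" + input + "\" -> " + firstMatch(regex, input));
            }
        }
    }
}
```

On the first input only the original pattern finds something (caused/VB); the four fixed patterns find nothing there. On the second input all five match caused/VBN.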