Java regex: find the correct pattern - java

I am trying to use a regex. My input follows this pattern:
STACK
blabla
OVER
blabla
STACK
vlvlv
OVER
and there may be another line at the end.
I wrote this pattern, which seems to work on sites that test regexes but doesn't work in Java:
"^(STACK(\n[^\n]+\n)OVER(\n[^\n]+(\n)?)?)+$"
What is the right pattern?
Thanks

Assuming that you want to check if your entire input can be matched with regex you can use something like
String data =
"STACK\r\n" +
"blabla\r\n" +
"OVER\r\n" +
"blabla\r\n" +
"STACK\r\n" +
"vlvlv\r\n" +
"OVER";
String regex ="(^STACK$((\r?\n|\r).+(\r?\n|\r))^OVER$((\r?\n|\r).+(\r?\n|\r)?)?+)+";
Pattern p = Pattern.compile(regex,Pattern.MULTILINE);
Matcher m = p.matcher(data);
System.out.println(m.matches());
I added the Pattern.MULTILINE flag so that ^ and $ match at the start and end of each line, rather than only at the start and end of the entire input, which is the default.
Also, to require that STACK and OVER are the only word on their line, I surrounded them with ^ and $.
Another thing your regex didn't account for is that the line separator can also be \r\n or \r, so I changed it to reflect that.
The last thing I did was change [^\n] to . since they represent almost the same thing (the dot doesn't match \r, while [^\n] does).
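For reference, here is a minimal, self-contained sketch of the same idea wrapped in a helper. The class name StackOverCheck, the method isValid and the NL constant are made up for illustration; the regex is a condensed variant of the one above (non-capturing groups, line break factored out), not a different technique.

```java
import java.util.regex.Pattern;

class StackOverCheck {
    // One line break in any of the three common styles: \r\n, \n or \r.
    private static final String NL = "(?:\\r?\\n|\\r)";

    // STACK on its own line, one content line, OVER on its own line,
    // then optionally one more content line. The whole block may repeat.
    private static final Pattern BLOCKS = Pattern.compile(
            "(?:^STACK$" + NL + ".+" + NL + "^OVER$(?:" + NL + ".+" + NL + "?)?+)+",
            Pattern.MULTILINE);

    // matches() requires the entire input to fit the pattern.
    static boolean isValid(String input) {
        return BLOCKS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String unix = "STACK\nblabla\nOVER\nblabla\nSTACK\nvlvlv\nOVER";
        String windows = unix.replace("\n", "\r\n");
        System.out.println(isValid(unix));                // true
        System.out.println(isValid(windows));             // true
        System.out.println(isValid("STACK blabla OVER")); // false
    }
}
```

Without Pattern.MULTILINE the inner ^ and $ would only match at the very start and end of the input, so any input with more than one line would be rejected.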

Related

Can't get rid of the empty first element when I use a regex split for polynomials

here is my code
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
I'm trying to get the output 5, 2, 3, 4,
but instead I get this:
exp:
exp:5
exp:2
exp:3
exp:4
I don't know where the empty first line comes from, and I cannot find a way to get rid of it. I tried other regexes and also tried compiling a Pattern, but I still can't get rid of the first line. When I try a new string, "X+X^5+2X^2+3X^3+4X^4", the first line shows exp:X.
I also tried my regex in an online regex tester, and it gives 5, 2, 3, 4, but Eclipse gives an empty element and then 5, 2, 3, 4. I need help figuring this out.
Try using a Pattern and Matcher instead of split, e.g.:
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println("exp: " + matcher.group(1));
}
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How it works:
Pattern used: \^([0-9]+)
It matches any substring that starts with ^ followed by one or more digits (note the + sign). The caret (^) is prefixed with a backslash (\) because it has a special meaning in regular expressions (beginning of the input), but in your case you just want to match a literal ^ character.
We wrap the part we are interested in in a group so that we can refer to it later, during the matching process. That means marking it with parentheses, ( and ).
Then we want to put our pattern into a Java String. In a String literal the \ character has a special meaning: it introduces an escape sequence, e.g. "\n" represents a new line. So if we put our pattern into a String literal, we need to escape the \, and our pattern becomes "\\^([0-9]+)". Note the double \.
Next we iterate through all matches, getting group 1, which is our number. Note that the ^ character is not included in group 1 even though it is part of the pattern. That is because we used parentheses to mark the group we are looking for, which in our case contains only the digits.
The empty first element appears because you are using the split method, which looks for occurrences of the regex and splits the string at each one. Your string starts with X^, which matches your regex, so the piece before that first match is an empty string.
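If you would rather keep split, a small sketch of how the leading empty piece shows up and how it could be filtered out is below. The class name ExponentSplit and the method exponents are hypothetical; the regex is the one from the question, unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExponentSplit {
    // The split regex from the question.
    private static final String SEPARATOR = "(|\\+\\d)[xX]\\^";

    // Splits the polynomial and drops empty pieces, such as the one
    // produced when the input itself starts with a separator ("X^").
    static List<String> exponents(String polynomial) {
        List<String> result = new ArrayList<>();
        for (String part : polynomial.split(SEPARATOR)) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String a = "X^5+2X^2+3X^3+4X^4";
        // Raw split keeps the empty string before the first "X^".
        System.out.println(Arrays.toString(a.split(SEPARATOR))); // [, 5, 2, 3, 4]
        System.out.println(exponents(a));                        // [5, 2, 3, 4]
    }
}
```

Online testers usually show matches rather than the pieces between matches, which is probably why they do not show the empty element.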

Pattern matching with Java regex

My server dev gave me a regex that he says is the requirement for user names. It is #"^\w+([\s-]\w+)*$"
I need help figuring out the right Java expression for this. It is confusing, since I need to add some escape characters to make the compiler happy.
This is what I am trying. Please let me know if it is right:
Pattern p = Pattern.compile("^\\w+([\\s-]\\w+)*\\$", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(username);
if ((username.length() < 3 ) || (m.find())) {
log ("Invalid pattern");
return false;
}
Is this correct ?
The correct pattern is "^\\w+([\\s-]\\w+)*$".
$ denotes the end of the string; if you use \\$, it forces the string to contain a literal $ character, and that's not the intent.
In your regex
^\\w+([\\s-]\\w+)*\\$
                    ^
you don't have to escape this $. It is there to indicate the end of the input.
So the correct regex would be:
^\\w+([\\s-]\\w+)*$
N.B.: This assumes the $ is not meant to be a literal $ character. If it were, you would have to escape it, but in that case I would expect it to be escaped in your source regex as well.
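Putting it together, a minimal sketch of the check could look like this. The class name UsernameValidator and the method isValid are made up; the minimum length of 3 is taken from the question's code. Note one assumption on my part: a name is treated as valid when the pattern matches, so the test uses matches() rather than flagging a successful find() as invalid.

```java
import java.util.regex.Pattern;

class UsernameValidator {
    // Word characters, optionally followed by groups of
    // (one whitespace or hyphen) + word characters. $ is left unescaped.
    private static final Pattern USERNAME =
            Pattern.compile("^\\w+([\\s-]\\w+)*$");

    static boolean isValid(String username) {
        return username != null
                && username.length() >= 3
                && USERNAME.matcher(username).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john-doe"));  // true
        System.out.println(isValid("john doe"));  // true
        System.out.println(isValid("jo"));        // false (too short)
        System.out.println(isValid("john--doe")); // false (two separators in a row)
        System.out.println(isValid("john$"));     // false ($ is not a word character)
    }
}
```

CASE_INSENSITIVE is left out here because \w already covers both upper and lower case letters.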

Java Regex Matcher not giving expected result

I have the following code.
String _partsPattern = "(.*)((\n\n)|(\n)|(.))";
static final Pattern partsPattern = Pattern.compile(_partsPattern);
String text= "PART1: 01/02/03\r\nFindings:no smoking";
Matcher match = partsPattern.matcher(text);
while (match.find()) {
System.out.println( match.group(1));
return; //I just care on the first match for this purpose
}
Output: PART1: 01/02/0
I was expecting PART1: 01/02/03 why is the 3 at the end of my text not matching in my result.
The problem with your regex is that . does not match line separators such as \r or \n, so (.*) stops before the \r. Since the last part of your regex
(.*)((\n\n)|(\n)|(.))
    ^^^^^^^^^^^^^^^^^
is required and none of its alternatives can match \r, the engine backtracks: (.*) gives up its last character, the 3, which is then matched by (.).
If you don't want to include these line separators in your match, just use the pattern "(.*)$" with the Pattern.MULTILINE flag, which makes $ match at the end of each line (that is, just before a standard line separator such as \r, \r\n or \n, without including it in the match).
So try with
String _partsPattern = "(.*)$"; //parenthesis are not required now
final Pattern partsPattern = Pattern.compile(_partsPattern,Pattern.MULTILINE);
Another approach would be to change your regex to something like (.*)((\r\n)|(\n)|(.)) or (.*)((\r?\n)|(.)), but I am not sure what the purpose of the last (.) would be (I would probably remove it). This is just a variation of your original regex.
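To see the difference side by side, here is a small sketch that runs the original regex and the two alternatives against the same text. The class name FirstLine and the helper firstMatch are hypothetical names used only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstLine {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "PART1: 01/02/03\r\nFindings:no smoking";

        // Original regex: (.) has to steal the '3' because nothing can match '\r'.
        System.out.println(firstMatch("(.*)((\n\n)|(\n)|(.))", 0, text)); // PART1: 01/02/0

        // MULTILINE makes $ match just before the "\r\n".
        System.out.println(firstMatch("(.*)$", Pattern.MULTILINE, text)); // PART1: 01/02/03

        // Variation that accepts "\r\n" as a line separator.
        System.out.println(firstMatch("(.*)((\r?\n)|(.))", 0, text));     // PART1: 01/02/03
    }
}
```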
Works for me, giving "PART1: 01/02/03 ". So my guess is that in the real code you read the text, perhaps with BufferedReader.readLine, and erroneously strip a carriage return + linefeed. Far-fetched, but I cannot imagine otherwise. (readLine strips the line terminator itself.)

using regex to slash out initials in pattern

I am trying to strip out the prefixes listed in the pattern below using a regex, but the replacement also removes a character I want to keep. Specifying a word boundary does not help in this case.
String name = "Dr.Dre" ;
Pattern p = Pattern.compile("(Mr.|MR.|Dr.|mr.|DR.|dr.|ms.|Ms.|MS.|Miss.|Mrs.|mrs.|miss.|MR|mr|Mr|Dr|DR|dr|ms|Ms|MS|miss|Miss|Mrs|mrs)"+"\\b");
Matcher m = p.matcher(name);
StringBuffer sb = new StringBuffer();
String namef = m.replaceAll("");
System.out.println(namef);
Input: Dr.Dre or Dr Dre or Dr. Dre
Expected output: Dre or Dre or Dre
Edit:
Thanks for the help, but there is still a small regex issue I am facing.
Program:
String name = "Dr. Dre" ;
Pattern p = Pattern.compile("(Mr\\.|MR\\.|Dr\\.|mr\\.|DR\\.|dr\\.|ms\\.|Ms\\.|MS\\.|Miss\\.|Mrs\\.|mrs\\.|miss\\.|MR|mr|Mr|Dr|DR|dr|ms|Ms|MS|miss|Miss|Mrs|mrs)"+"\\b");
Matcher m = p.matcher(name);
String namef = m.replaceAll("");
System.out.println(namef);
For above program I receive output as:
. Dre
while the desired output is :
Dre
Dot in a regular expression means "any character". You need to escape it with a backslash, which in turn needs to be escaped in a string literal:
Pattern p = Pattern.compile("Mr\\.|MR\\.|Dr\\.|mr\\.|DR\\.|dr\\.|ms\\."); // etc
Note that you'll end up with a double space after removing "Dr." from "or Dr. Dre" though...
EDIT: A space after a dot doesn't count as a word boundary: \b only matches between a word character and a non-word character, and the dot and the space are both non-word characters. If you change your pattern to use \\s instead of \\b, so that it consumes a single whitespace character, it works for "Dr. Dre", but as noted in comments, it then fails for "Dr.Dre". You could either remove the word boundary entirely and add a space to the later parts of the pattern ("DR |Dr |" etc) or use (\\s|\\b), which works for the cases I tried it on but may well have other undesirable side effects.
The question is a bit unclear (you aren't providing the problematic results), but my guess is that the problem lies in the period character. The period has a meaning in regex: it matches ANY character, so "Dr." will match any character after "Dr", not just a literal period. You have to escape it like so: "Dr\." or, in your code specifically, where the backslash itself has to be escaped, like this: "Dr\\."
Hope that helps!
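One possible way to sidestep both the escaping and the word-boundary problem is sketched below. This is not the pattern from either answer: it puts \b directly after the letters, makes the dot and the following whitespace optional, and uses CASE_INSENSITIVE instead of listing every capitalisation. The class name TitleStripper and the method strip are made up, and anchoring at the start of the name is my assumption about what is wanted.

```java
import java.util.regex.Pattern;

class TitleStripper {
    // Title letters, a word boundary, then an optional literal dot and
    // optional whitespace. The \b comes before the dot, so "Dr. Dre" works,
    // and it stops "Dr" from matching inside a name like "Drake".
    private static final Pattern TITLE = Pattern.compile(
            "^\\s*(?:mrs|miss|mr|ms|dr)\\b\\.?\\s*",
            Pattern.CASE_INSENSITIVE);

    static String strip(String name) {
        return TITLE.matcher(name).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Dr.Dre"));  // Dre
        System.out.println(strip("Dr Dre"));  // Dre
        System.out.println(strip("Dr. Dre")); // Dre
        System.out.println(strip("Drake"));   // Drake
    }
}
```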

groovy or java: how to retrieve a block of comments using regex from /** ***/?

This might be a piece of cake for java experts. Please help me out:
I have a block of comments in my program like this:
/*********
block of comments - line 1
line 2
.....
***/
How could I retrieve "block of comments" using regex?
thanks.
Something like this should do:
String str =
"some text\n"+
"/*********\n" +
"block of comments - line 1\n" +
"line 2\n"+
"....\n" +
"***/\n" +
"some more text";
Pattern p = Pattern.compile("/\\*+(.*?)\\*+/", Pattern.DOTALL);
Matcher m = p.matcher(str);
if (m.find())
System.out.println(m.group(1));
(DOTALL says that the . in the pattern should also match new-line characters)
Prints:
block of comments - line 1
line 2
....
Pattern regex = Pattern.compile("/\\*[^\\r\\n]*[\\r\\n]+(.*?)[\\r\\n]+[^\\r\\n]*\\*+/", Pattern.DOTALL);
This works because comments can't be nested in Java.
It is important to use a reluctant quantifier (.*?) or we will match everything from the first comment to the last comment in a file, regardless of whether there is actual code in-between.
/\* matches the opening /*.
[^\r\n]* matches whatever else is on the rest of this line.
[\r\n]+ matches one or more line-break characters (\r or \n).
.*? matches as few characters as possible.
[\r\n]+ matches one or more line-break characters.
[^\r\n]* matches any characters on the line of the closing */.
\*+/ matches one or more * followed by the closing /.
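As a rough, runnable sketch of the pattern just described (the class name CommentBody and the method firstCommentBody are hypothetical), applied to the same sample text as the first answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentBody {
    // Opening line, line break(s), body (reluctant), line break(s), closing line.
    private static final Pattern COMMENT = Pattern.compile(
            "/\\*[^\\r\\n]*[\\r\\n]+(.*?)[\\r\\n]+[^\\r\\n]*\\*+/",
            Pattern.DOTALL);

    // Returns the body of the first multi-line block comment, or null if none.
    static String firstCommentBody(String source) {
        Matcher m = COMMENT.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str =
                "some text\n" +
                "/*********\n" +
                "block of comments - line 1\n" +
                "line 2\n" +
                "....\n" +
                "***/\n" +
                "some more text";
        // Prints the three body lines without leading or trailing line breaks.
        System.out.println(firstCommentBody(str));
    }
}
```

Unlike the shorter /\*+(.*?)\*+/ pattern, this one requires at least one line break after the opening line and before the closing line, so a comment written entirely on one line is not matched.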
Not sure about the multi-line issues, but if it were all on one line, you could do this:
^\/\*+.*\*+\/$
That breaks down to:
^ start of a line
\/\*+ start of a comment, one or more *'s (both characters escaped)
.* any number of characters
\*+\/ end of a comment, one or more *'s (both characters escaped)
$ end of a line
By the way, it's "regex" not "regrex" :)
