Java Regex does not match - java

I know that this kind of question is asked very often, but
I can't figure out why this regex does not match.
I want to check whether there is an "M" at the beginning of the line.
Finally, I want the path at the end of the line.
This is why startsWith() doesn't fit my needs.
line = "M 72208 70779 koj src\\com\\company\\testproject\\TestDomainf1.java";
if (line.matches("^(M?)(.*)$")) {}
I've also tried the other way out:
Pattern p = Pattern.compile("(M?)");
Matcher m = p.matcher(line);
if (m.matches()) {
System.out.println("yay!");
}
if (line.matches("(M?)(.*)")) {}
Thanks

The correct regex would be simply
line.matches("M.*")
since the matches method enforces that the whole input sequence must match. However, this is such a simple problem that I wonder if you really need a regex for it. A plain
line.startsWith("M")
or
line.length() > 0 && line.charAt(0) == 'M'
or even just
line.indexOf('M') == 0
will work for your requirement.
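To make the "whole input must match" point concrete, here is a minimal sketch that compares matches() with lookingAt() on the line from the question. The class and method names (MatchModes, wholeMatch, prefixMatch) are made up for illustration. Note that (M?)(.*) accepts any single-line input, because M? is allowed to match nothing, so it cannot tell you whether the line starts with M.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class MatchModes {
    // True only if the regex consumes the entire input (what String#matches does).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches starting at the first character; the rest may be left over.
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String line = "M 72208 70779 koj src\\com\\company\\testproject\\TestDomainf1.java";
        System.out.println(wholeMatch("(M?)", line));                 // false
        System.out.println(wholeMatch("M.*", line));                  // true
        System.out.println(prefixMatch("M", line));                   // true
        System.out.println(wholeMatch("(M?)(.*)", "no leading M"));   // true
    }
}
```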
Performance?
If you are also interested in performance, my second and third options win in that department, whereas the first one may easily be the slowest option: it must first compile the regex, then evaluate it. indexOf has the problem that its worst case is scanning the whole string.
UPDATE
In the meantime you have completely restated your question and made it clear that the regex is what you really need. In this case the following should work:
Matcher m = Pattern.compile("M.*?(\\S+)").matcher(input);
System.out.println(m.matches()? m.group(1) : "no match");
Note, this only works if the path doesn't contain spaces. If it does, then the problem is much harder.
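As a rough sketch, the pattern above could be wrapped in a small helper like this. PathExtractor and pathOf are hypothetical names, and returning null for "no match" is just one possible convention. The last call shows the limitation mentioned above: with a space in the path, only the part after the last space is captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class PathExtractor {
    // Compiled once: a leading M, a lazy run of any characters, then the final run of non-space characters.
    private static final Pattern LINE = Pattern.compile("M.*?(\\S+)");

    // Returns the last whitespace-free token of a line that starts with M, or null otherwise.
    static String pathOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(pathOf("M 72208 70779 koj src\\com\\company\\testproject\\TestDomainf1.java"));
        System.out.println(pathOf("A 72208 70779 koj src\\Other.java"));   // null
        System.out.println(pathOf("M 1 2 koj my dir\\File.java"));         // dir\File.java
    }
}
```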

You don't need a regex for that. Just use String#startsWith(String):
if (line.startsWith("M")) {
// code here
}
OR else use String#toCharArray():
if (line.length() > 0 && line.toCharArray()[0] == 'M') {
// code here
}
EDIT: After your edited requirement to get path from input string.
You still can avoid regex and have your code like this:
String path="";
if (line.startsWith("M"))
path = line.substring(line.lastIndexOf(' ')+1);
System.out.println(path);
OUTPUT:
src\com\company\testproject\TestDomainf1.java

You can use this pattern to check whether an M character appears at the beginning of the string:
if (line.matches("M.*"))
But for something this simple, you can just use this:
if (line.length() > 0 && line.charAt(0) == 'M')

Why not do this
line.startsWith("M");

String str = "M 72208 70779 kij src/com/knapp/testproject/TestDomainf1.java";
if (str.startsWith("M")) {
// code here
}

If you need the path, you can split the string (I am guessing that \t is the separator) and take the last field:
String[] tabS = "M 72208 70779 kij src\\com\\knapp\\testproject\\TestDomainf1.java".split("\t");
String path = tabS[tabS.length - 1];

Related

Using regex in Java

I am checking for precedence in a string in my function. Is there a way to change my code to incorporate regex -- I am quite unfamiliar with regex, and though I have read some tutorials online, still not getting 'it' too well. So any guidance would really be appreciated.
For example in a string XLYZ, if the char after X is not L or C, the 'violation' statement gets printed. Here's the code below:
if (subtnString[cnt]=='X' && cnt+1<subtnString.length){
if (subtnString[cnt+1] != 'L' && subtnString[cnt+1] != 'C') {
System.out.println("Violation: X can be subtracted from L and C only");
return false;
}
}
Is there a way I can use regex to replace this code?
You can use something like this:
Pattern regex = Pattern.compile("^X[LC]");
Matcher regexMatcher = regex.matcher(subjectString);
if(regexMatcher.find() ) { // it matched!
}
else { // nasty message
}
In the demo, see the strings that match.
Explanation
The ^ anchor asserts that we are at the beginning of the string
X matches the literal X
[LC] is a character class that matches either one L or one C
Reference
Java Class String
Using Regular Expressions in Java
To match texts that violate your rule, use this regex:
X[^LC]
To match texts that do not violate your rule, use this:
X[LC]
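For what it's worth, here is a small sketch of how the X[^LC] check could replace the hand-written loop from the question. SubtractionRule and isValid are invented names. Like the original loop, it does not flag an X that is the last character, since there is nothing after it to test.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class SubtractionRule {
    // An X followed by any single character other than L or C.
    private static final Pattern VIOLATION = Pattern.compile("X[^LC]");

    // True if no X in the string is followed by something other than L or C.
    static boolean isValid(String s) {
        return !VIOLATION.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "XLYZ", "XCII", "XIV" }) {
            if (!isValid(s)) {
                System.out.println("Violation in " + s + ": X can be subtracted from L and C only");
            }
        }
    }
}
```

find() is used rather than matches() because the offending pair can sit anywhere in the string.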

Java Regex needed

I need regex that will fail only for below patterns and pass for everything else.
RXXXXXXXXXX (X are digits)
XXX.XXX.XXX.XXX (IP address)
I have basic knowledge of regex but not sure how to achieve this one.
For the first part, I know how to write a regex for a string that does not start with R, but I am not sure how to allow any number of digits except 10.
^[^R][0-9]{10}$ handles the "not R" part, but I am not sure how to express the "not 10 digits" part.
Well, simply define a regex:
Pattern p = Pattern.compile("R[0-9]{10} ((0|1|)[0-9]{1,2}|2([0-4][0-9]|5[0-5]))(\\.((0|1|)[0-9]{1,2}|2([0-4][0-9]|5[0-5]))){3}");
Matcher m = p.matcher(theStringToMatch);
if(!m.matches()) {
//do something, the test didn't pass thus ok
}
Or a jdoodle.
EDIT:
Since you actually wanted two possible patterns to filter out, change the pattern to:
Pattern p = Pattern.compile("(R[0-9]{10})|(((0|1|)[0-9]{1,2}|2([0-4][0-9]|5[0-5]))(\\.((0|1|)[0-9]{1,2}|2([0-4][0-9]|5[0-5]))){3})");
If you want to match the entire string (so that the string should start and end with the pattern), place ^ in front and $ at the end of the pattern.
This should work:
!string.matches("R\\d{10}|(\\d{3}\\.){3}\\d{3}")
The \d means any digit, the braces give how many times the preceding item is repeated, and \. means a literal period character. Parentheses indicate a grouping. Inside a Java string literal each backslash has to be doubled.
Here's a good reference on java regex with examples.
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
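Here is a minimal runnable sketch of the "fail only for these two shapes" idea. It assumes, as the question literally writes it, that the IP form is always four groups of exactly three digits; real IP addresses with shorter groups would pass this filter. InputFilter and isAllowed are made-up names.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class InputFilter {
    // Either R followed by exactly 10 digits, or four dot-separated groups of exactly 3 digits.
    private static final Pattern REJECTED =
            Pattern.compile("R\\d{10}|\\d{3}(\\.\\d{3}){3}");

    // matches() tests the whole string, so no ^ or $ is needed.
    static boolean isAllowed(String s) {
        return !REJECTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("R1234567890"));      // false
        System.out.println(isAllowed("R123456789"));       // true (only 9 digits)
        System.out.println(isAllowed("192.168.001.010"));  // false
        System.out.println(isAllowed("anything else"));    // true
    }
}
```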
Regex is not meant to validate every kind of input. You could, but sometimes it is not the right approach (similar to using a wrench as a hammer: it could do the job but is not meant for it).
Split the string in two parts, by the space, then validate each:
String foo = "R1234567890 255.255.255.255";
String[] stringParts = foo.split(" ");
Pattern p = Pattern.compile("^[^R][0-9]{10}$");
Matcher m = p.matcher(stringParts[0]);
if (m.matches()) {
//the first part is valid
//start validating the IP
String[] ipParts = stringParts[1].split("\\.");
for (String ip : ipParts) {
int ipPartValue = Integer.parseInt(ip);
if (!(ipPartValue >= 0 && ipPartValue <= 255)) {
//error...
}
}
}

Java regular expression for repeated letters

I can't find a regex that matches repeated letters. My problem is that I want to use regex to filter out spam-mails, for example, I want to use regex to detect "spam" and "viagra" in these strings :
"xxxSpAmyyy",
"xxxSPAMyyy",
"xxxvI a Gr AA yyy",
"xxxV iiA gR a xxx"
Do you have any suggestions for how to do that in a good way?
This ignores case and matches the letters whether they are adjacent or have other characters between them:
"(?i).{0,}v.{0,}i.{0,}a.{0,}g.{0,}r.{0,}a.{0,}"
If you know how many characters can be between the letters, you can enter .{0,max_distance} instead of .{0,}
UPDATE:
It works even for duplicates; I have tried it:
String str = "xxxV iiA gR a xxx";
if(str.matches("(?i).{0,}v.{0,}i.{0,}a.{0,}g.{0,}r.{0,}a.{0,}")){
System.out.println("Yes");
}
else{
System.out.println("No");
}
This prints Yes
I think you're on the wrong track. Spam filtering is closely related to machine learning. I'd suggest reading about Bayesian spam filtering.
If you expect to get spam mails with misspelled words (and other kinds of garbage), I'd suggest basing the filter not on entire words but on n-grams.
Like searching this?
"v.{0,3}i.{0,3}a.{0,3}g.{0,3}r.{0,3}a"
See Pattern
Code:
This leaves space for 0 to 3 characters between characters. I did not compile the following,
but it "should work."
String[] strings = new String[] { "xxxV iiA gR a xxx" };
final Pattern spamPattern = makePattern("viagra");
for (String s : strings) {
boolean isSpam = spamPattern.matcher(s).find();
if (isSpam) {
System.out.println("Spam: " + s);
}
}
...
Pattern makePattern(String cusWord) {
cusWord = cusWord.toLowerCase();
StringBuilder sb = new StringBuilder();
sb.append("(?i)"); // Case-insensitive setting.
for (int i = 0; i < cusWord.length(); ) {
int cp = cusWord.codePointAt(i);
i += Character.charCount(cp);
if ('o' == cp) {
sb.append("[o0]");
} else if ('l' == cp) {
sb.append("[l1]");
} else {
sb.appendCodePoint(cp);
}
sb.append(".{0,3}"); // 0 - 3 occurrences of any char.
}
return Pattern.compile(sb.toString());
}
You could try using positive look-aheads
(?=.*v)(?=.*i)(?=.*a)(?=.*g)(?=.*r)(?=.*a).*
Edit:
(?=.*v.*i.*a.*g.*r.*a.*).*
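The two variants behave differently, which a quick sketch can show. The first only requires each letter to be present somewhere, in any order; the edited one requires the letters in sequence. The (?i) flag is my addition (the sample strings are mixed case), and LetterSequence is an invented class name.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class LetterSequence {
    // Each letter must occur somewhere; order is not checked.
    static final Pattern ANY_ORDER =
            Pattern.compile("(?i)(?=.*v)(?=.*i)(?=.*a)(?=.*g)(?=.*r)(?=.*a).*");

    // The letters must occur in this order, with arbitrary gaps between them.
    static final Pattern IN_ORDER =
            Pattern.compile("(?i)(?=.*v.*i.*a.*g.*r.*a.*).*");

    public static void main(String[] args) {
        String spam = "xxxV iiA gR a xxx";
        String harmless = "a grave idea";
        System.out.println(ANY_ORDER.matcher(spam).matches());      // true
        System.out.println(IN_ORDER.matcher(spam).matches());       // true
        System.out.println(ANY_ORDER.matcher(harmless).matches());  // true (false positive)
        System.out.println(IN_ORDER.matcher(harmless).matches());   // false
    }
}
```

So the unordered version is likely to flag a lot of ordinary text that merely contains those letters.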
Did you try any regex?
Something like \w*[sSpPaAmM]+\w* should do the trick
You can test your RE on this site : http://www.regexplanet.com/advanced/java/index.html

Double "Pipes" in title

At my job today, I was made aware of a little error in our pages' titles. Our site is built using .jsp pages and for the titles of our product pages we use
In our admin (where we can set up the titles for each of the products), we would normally add in * anyone ever run into this issue before, and if so, does anyone know of a way to fix the double pipes issue I have encountered?
The problem is that the first argument of the replaceAll method is a regular expression. The "|" is a reserved symbol in regular expressions, and you must escape it if you want to use it as a literal. You can work around it, for example, this way:
String[] words = str.split(" ");
for (int i = 0; i < words.length; i++) {
if (words[i].length() > 0) {
if (!(words[i].substring(0, 1).equals("|"))) {
sb.append(words[i].replaceFirst(words[i].substring(0, 1), words[i].substring(0, 1).toUpperCase()) + " ");
} else {
sb.append(words[i] + " ");
}
}
}
Try using the html escape code for the pipe character ¦.
Your title would be:
"Monkey Thank You ¦ Monkey Thank You Cards"
I think the issue is in the fact that replaceFirst() takes a regex as parameter and a replacement string. Because you push in the first character as is for the regex parameter, what happens with the vertical bar is (omitting adding to the StringBuffer) equivalent to:
String addedToBuffer = "|".replaceFirst("|", "|".toUpperCase());
What happens then, is that we have a regex which matches the empty string or the empty string. Well, any string matches the empty string regex. So the match gets replaced by "|" (to upper case). So "|".replaceFirst("|", "|".toUpperCase()) expands to "||". So the append() call is given the parameter of "|| ".
You can fix your algorithm in two ways:
Fix the regex automatically, use literal notation in between \Q and \E. So your regex to pass to replaceFirst() becomes something like "\\Q"+ literal + "\\E".
Realise that you do not need regexes in the first place. Instead use two append() operations. One to append() the case converted first character of the item to add, the other to append the rest. This looks like this:
for(String s: items) {
if(s.equals("")) {
sb.append(" ");
}
else {
sb.append(Character.toUpperCase(s.charAt(0)));
if(s.length() > 1) {
sb.append(s.substring(1));
}
sb.append(" ");
}
}
The second approach is probably much easier to follow as well.
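To see the effect in isolation, here is a small sketch that reproduces the doubled pipe and then applies the quoting fix. It uses Pattern.quote, which wraps the literal in \Q...\E for you, and Matcher.quoteReplacement for the replacement side. CapitalizeFirst and its method names are made up, and both methods assume a non-empty word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration. Both methods expect a non-empty word.
final class CapitalizeFirst {
    // Reproduces the problem: the first character is used as a regex unescaped.
    static String viaRawRegex(String word) {
        String first = word.substring(0, 1);
        return word.replaceFirst(first, first.toUpperCase());
    }

    // Same idea, but the first character is quoted so it is matched literally.
    static String viaQuotedRegex(String word) {
        String first = word.substring(0, 1);
        return word.replaceFirst(Pattern.quote(first),
                Matcher.quoteReplacement(first.toUpperCase()));
    }

    public static void main(String[] args) {
        System.out.println(viaRawRegex("|"));         // ||
        System.out.println(viaQuotedRegex("|"));      // |
        System.out.println(viaQuotedRegex("monkey")); // Monkey
    }
}
```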

How to find if a Java String contains X or Y and contains Z

I'm pretty sure regular expressions are the way to go, but my head hurts whenever I try to work out the specific regular expression.
What regular expression do I need to find if a Java String (contains the text "ERROR" or the text "WARNING") AND (contains the text "parsing"), where all matches are case-insensitive?
EDIT: I've presented a specific case, but my problem is more general. There may be other clauses, but they all involve matching a specific word, ignoring case. There may be 1, 2, 3 or more clauses.
If you're not 100% comfortable with regular expressions, don't try to use them for something like this. Just do this instead:
String s = test_string.toLowerCase();
if (s.contains("parsing") && (s.contains("error") || s.contains("warning"))) {
....
because when you come back to your code in six months time you'll understand it at a glance.
Edit: Here's a regular expression to do it:
(?i)(?=.*parsing)(.*(error|warning).*)
but it's rather inefficient. For cases where you have an OR condition, a hybrid approach where you search for several simple regular expressions and combine the results programmatically with Java is usually best, both in terms of readability and efficiency.
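One way the hybrid approach might look, as a sketch only: each word is searched with its own tiny case-insensitive regex, and the AND/OR logic lives in plain Java. The "list of clauses, each a list of alternative words" shape is my assumption about the general case described in the question, and ClauseMatcher, containsIgnoreCase and matchesAll are invented names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class ClauseMatcher {
    // True if text contains word, ignoring case. Pattern.quote makes the word a literal.
    static boolean containsIgnoreCase(String text, String word) {
        return Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE)
                .matcher(text).find();
    }

    // Every clause must be satisfied; a clause is satisfied if any one of its words occurs.
    static boolean matchesAll(String text, List<List<String>> clauses) {
        for (List<String> clause : clauses) {
            boolean anyWordFound = false;
            for (String word : clause) {
                if (containsIgnoreCase(text, word)) {
                    anyWordFound = true;
                    break;
                }
            }
            if (!anyWordFound) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> clauses = Arrays.asList(
                Arrays.asList("ERROR", "WARNING"),
                Arrays.asList("parsing"));
        System.out.println(matchesAll("An error at line X doing parsing of Y", clauses));   // true
        System.out.println(matchesAll("A problem at line X doing parsing of Y", clauses));  // false
    }
}
```

Adding a third or fourth clause is then just another inner list, with no change to the regex.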
If you really want to use regular expressions, you can use the positive lookahead operator:
(?i)(?=.*?(?:ERROR|WARNING))(?=.*?parsing).*
Examples:
Pattern p = Pattern.compile("(?=.*?(?:ERROR|WARNING))(?=.*?parsing).*", Pattern.CASE_INSENSITIVE); // you can also use (?i) at the beginning
System.out.println(p.matcher("WARNING at line X doing parsing of Y").matches()); // true
System.out.println(p.matcher("An error at line X doing parsing of Y").matches()); // true
System.out.println(p.matcher("ERROR Hello parsing world").matches()); // true
System.out.println(p.matcher("A problem at line X doing parsing of Y").matches()); // false
try:
if ((str.indexOf("WARNING") > -1 || str.indexOf("ERROR") > -1) && str.indexOf("parsing") > -1)
I usually use this applet to experiment with reg. ex. The expression may look like this:
if (str.matches("(?i)^.*?(WARNING|ERROR).*?parsing.*$")) {
...
But as stated in above answers it's better to not use reg. ex. here.
Regular Expressions are not needed here. Try this:
if((string1.toUpperCase().indexOf("ERROR",0) >= 0 ||
string1.toUpperCase().indexOf("WARNING",0) >= 0 ) &&
string1.toUpperCase().indexOf("PARSING",0) >= 0 )
This also takes care of the case-insensitive criteria
I think this regexp will do the trick (but there must be a better way to do it):
(.*(ERROR|WARNING).*parsing)|(.*parsing.*(ERROR|WARNING))
If you have a variable number of words that you want to match, I would do something like this:
String mystring = "Text I want to match";
String[] matchings = {"warning", "error", "parse", ....};
int matches = 0;
for (int i = 0; i < matchings.length; i++) {
if (mystring.contains(matchings[i])) {
matches++;
}
}
if (matches == matchings.length) {
System.out.println("All Matches found");
} else {
System.out.println("Some word is not matching :(");
}
Note: I haven't compiled this code, so could contain typos.
With multiple .* constructs the regex engine may perform thousands of "back off and retry" trial matches.
Never use .* at the beginning or in the middle of a RegEx pattern.
