Regular expression not matching on first and last word of string - java

I am trying to write a Java program that looks for specific words in a string. It works for the most part, but it doesn't match when the word is the first or last word in the string. Here is an example:
"trying to find the first word".matches(".*[^a-z]find[^a-z].*") //returns true
"trying to find the first word".matches(".*[^a-z]trying[^a-z].*") //returns false
"trying to find the first word".matches(".*[^a-z]word[^a-z].*") //returns false
Any idea how to make this match on any word in the string?
Thanks in advance,
Craig

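The problem is the character class [^a-z] before and after the word: it requires an actual character on each side, and the first word has nothing before it while the last word has nothing after it. I think what you actually want is a word boundary, \b (as per ColinD's comment), rather than "a character that is not in the a-z range". As pointed out in the comments (thanks), you'll also need to handle the start and end of the string.
So try, e.g. (with the backslashes doubled, since this is a Java string literal):
"(?:^|.*\\b)trying(?:\\b.*|$)"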

You can make the delimiters optional with ?. Check the link below and test more cases to confirm that it gives the expected output:
https://regex101.com/r/oP5zB8/1
(.*[^a-z]?trying[^a-z]?.*)
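One caveat worth testing before relying on the optional form: because [^a-z]? may match nothing, the pattern no longer requires any delimiter around the word. The sketch below (the class name OptionalDelimiterCheck is hypothetical) runs it against an input where the target is only part of a longer word:

```java
final class OptionalDelimiterCheck {
    // The pattern from the answer above, unchanged.
    static final String OPTIONAL = "(.*[^a-z]?trying[^a-z]?.*)";

    static boolean matchesOptional(String text) {
        return text.matches(OPTIONAL);
    }

    public static void main(String[] args) {
        // The word on its own, at the start of the string.
        System.out.println(matchesOptional("trying to find the first word"));
        // The same letters embedded in "retrying".
        System.out.println(matchesOptional("retrying the request"));
    }
}
```

Both calls report a match, so this form behaves like a plain substring test rather than a whole-word test.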

I think (^|^.*[^a-z])trying([^a-z].*$|$) just fits your need.
Or (?:^|^.*[^a-z])trying(?:[^a-z].*$|$) for non-capturing parentheses.
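The [^a-z] form and the \b form do not agree on every input, which may matter when choosing between them. A small comparison sketch, with a hypothetical class name and made-up sample inputs: [^a-z] treats digits and uppercase letters as delimiters, while \b treats them as part of a word.

```java
final class DelimiterVsBoundary {
    // Anything outside a-z (or the string edge) counts as a delimiter.
    static final String NOT_LETTER = "(?:^|^.*[^a-z])find(?:[^a-z].*$|$)";
    // Word-boundary variant: letters, digits and underscore are word characters.
    static final String BOUNDARY = ".*\\bfind\\b.*";

    public static void main(String[] args) {
        String[] inputs = { "find", "find9 items", "Xfind it" };
        for (String in : inputs) {
            System.out.println(in + " -> [^a-z]: " + in.matches(NOT_LETTER)
                    + ", \\b: " + in.matches(BOUNDARY));
        }
    }
}
```

Both patterns accept "find" alone; only the [^a-z] pattern accepts "find9 items" and "Xfind it".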

You can try the following program to check for a word at the start and at the end of a string:
package com.ajsodhi.utilities;
import java.util.regex.Pattern;
public class RegExStartEndWordCheck {
    public static final String stringToMatch = "StartingsomeWordsEndWord";
    public static void main(String[] args) {
        String regEx = "Starting[A-Za-z0-9]{0,}EndWord";
        Pattern patternOriginalSign = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
        boolean OriginalStringMatchesPattern = patternOriginalSign.matcher(stringToMatch).matches();
        System.out.println(OriginalStringMatchesPattern);
    }
}

You should use the word boundary \b, which marks the beginning or end of a word, instead of [^a-z], which requires an extra character on each side of the word.
Just something like
".*\\bfind\\b.*"

Related

How to match two string using java Regex

String 1= abc/{ID}/plan/{ID}/planID
String 2=abc/1234/plan/456/planID
How can I match these two strings using Java regex so that it returns true? Basically {ID} can contain anything. Java regex should match abc/{anything here}/plan/{anything here}/planID
If "{anything here}" may be empty, you can use .*. The . matches any character (except a line terminator, by default), and * means "repeat the preceding element any number of times, including zero". So .* means "a string of any length, made of any characters". If {anything here} must contain at least one character, use + instead of *; it means almost the same, but requires at least one character.
My suggestion: abc/.+/plan/.+/planID
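To make the difference between * and + concrete, here is a small sketch. The class name and the second sample input (with an empty first ID) are assumptions for illustration, not taken from the question:

```java
final class IdQuantifiers {
    static final String AT_LEAST_ONE = "abc/.+/plan/.+/planID";
    static final String MAY_BE_EMPTY = "abc/.*/plan/.*/planID";

    public static void main(String[] args) {
        String[] inputs = {
            "abc/1234/plan/456/planID", // both IDs present
            "abc//plan/456/planID"      // first ID empty
        };
        for (String in : inputs) {
            System.out.println(in + " : + -> " + in.matches(AT_LEAST_ONE)
                    + ", * -> " + in.matches(MAY_BE_EMPTY));
        }
    }
}
```

The first input matches both patterns; the input with an empty ID matches only the * version. Note that both . forms also accept IDs containing slashes.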
If {ID} can contain anything I assume it can also be empty.
So this regex should work :
str.matches("^abc.*plan.*planID$");
^abc at the beginning
.* Zero or more of any Character
planID$ at the end
Here is a small program; check it and adapt it to your requirements. It works for your example, so test it against your other cases and leave a comment with any case that fails. I am using the regex classes specifically because you asked for a Java regex solution.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class MatchUsingRejex
{
    public static void main(String args[])
    {
        // Create the pattern to search for
        Pattern pattern = Pattern.compile("abc/.+/plan/.+/planID");
        // Check whether the pattern occurs in the input
        Matcher isMatch = pattern.matcher("abc/1234/plan/456/planID");
        if (isMatch.find())
            System.out.println("Yes");
        else
            System.out.println("No");
    }
}
If the line always starts with 'abc' and ends with 'planID', then the following will work:
String s1 = "abc/{ID}/plan/{ID}/planID";
String s2 = "abc/1234/plan/456/planID";
String pattern = "(?i)abc(?:/\\S+)+planID$";
boolean b1 = s1.matches(pattern);
boolean b2 = s2.matches(pattern);
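If the template string itself is available at run time, one option is to build the regex from it instead of writing it by hand. The sketch below is only one possible approach: the class TemplateMatcher and its methods are hypothetical, and it assumes that each {ID} stands for exactly one non-empty path segment (no slashes), which is stricter than "anything".

```java
import java.util.regex.Pattern;

final class TemplateMatcher {
    // Hypothetical helper: turns "abc/{ID}/plan/{ID}/planID" into a Pattern.
    static Pattern compileTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        // Split on the literal placeholder; -1 keeps trailing empty parts.
        String[] literals = template.split(Pattern.quote("{ID}"), -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                // Assumption: an ID is one non-empty segment without '/'.
                regex.append("[^/]+");
            }
            // Quote the fixed text so its characters are matched literally.
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String template, String candidate) {
        return compileTemplate(template).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String template = "abc/{ID}/plan/{ID}/planID";
        System.out.println(matches(template, "abc/1234/plan/456/planID"));
        System.out.println(matches(template, "abc/12/34/plan/456/planID"));
    }
}
```

Swapping "[^/]+" for ".*" would reproduce the looser "anything, including empty" behaviour discussed above.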

Match String ending with (regex) java

I am following the suggestions on the page, check if string ends with certain pattern
I am trying to display a string that is
Starts with anything
Has the letters ".mp4" in it
Ends explicitly with ', (apostrophe followed by comma)
Here is my Java code:
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.*;
class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        // your code goes here
        String str = " _file='ANyTypEofSTR1ngHere_133444556_266545797_10798866.mp4',";
        Pattern p = Pattern.compile(".*.mp4[',]$");
        Matcher m = p.matcher(str);
        if(m.find())
            System.out.println("yes");
        else
            System.out.println("no");
    }
}
It prints "no". How should I declare my RegEx?
There are several issues in your regex:
"Has the letters .mp4 in it" means somewhere, not necessarily just in front of ',, so another .* should be inserted.
. matches any character. Use \. to match .
[,'] is a character group, i.e. exactly one of the characters in the brackets has to occur.
You can use the following regex instead:
Pattern p = Pattern.compile(".*\\.mp4.*',$");
Your character set [',] checks whether the string ends with a single ' or a single , character.
If you want to match those characters one or more times, use [',]+. However, you probably don't want a character set in this case, since you said the order is important.
To match an apostrophe followed by comma, just use:
.*\\.mp4',$
Also, since . has special meaning, you need to escape it in '.mp4'.
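The patterns discussed in this thread can be compared side by side. This is a sketch with hypothetical names; the second input is a made-up example in which ".mp4" is not directly followed by the closing ',:

```java
import java.util.regex.Pattern;

final class Mp4Suffix {
    static final Pattern ORIGINAL = Pattern.compile(".*.mp4[',]$");   // from the question
    static final Pattern ANYWHERE = Pattern.compile(".*\\.mp4.*',$"); // ".mp4" somewhere, then ', at the end
    static final Pattern ADJACENT = Pattern.compile(".*\\.mp4',$");   // ".mp4" directly before ',

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String fromQuestion = " _file='ANyTypEofSTR1ngHere_133444556_266545797_10798866.mp4',";
        String withQuery = "clip.mp4?x=1',"; // assumed extra test input
        for (String s : new String[] { fromQuestion, withQuery }) {
            System.out.println(found(ORIGINAL, s) + " " + found(ANYWHERE, s) + " " + found(ADJACENT, s));
        }
    }
}
```

The original pattern fails on the question's string because [',] consumes only one of the two trailing characters before $ is required.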

Regex to detect end of line(\n) that has double slash(//)

I need a regex for this example:
//This is a comment and I need this \n position
String notwanted ="//I do not need this end of line position";
Try this regex:
(?<!")\/\/[^\n]+(\n)
You can use the Matcher method matcher.start(1) to get the index of the \n character, but it will not match a line where // is preceded by ". Example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args){
        String example = "//This is a comment and I need this \\n position\n" +
                "String notwanted =\"//I do not need this end of line position\";";
        Pattern regex = Pattern.compile("(?<!\")//[^\\n]+(\\n)");
        Matcher matcher = regex.matcher(example);
        while (matcher.find()) {
            // start(1) is the index of the newline captured by group 1
            System.out.println(matcher.start(1));
        }
    }
}
however it would be enough to use:
(?<!")\/\/[^\n]+
and just use matcher.end(), to get start position of new line.
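A minimal sketch of the matcher.end() variant, with a hypothetical class and method name and a shortened sample input. Because the trailing (\n) group is dropped, this version also reports a comment on the last line, where end() equals the length of the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentEnds {
    // Returns the index just past each "//..." run that is not preceded by a quote.
    static List<Integer> commentEndOffsets(String source) {
        Pattern p = Pattern.compile("(?<!\")//[^\\n]+");
        Matcher m = p.matcher(source);
        List<Integer> ends = new ArrayList<>();
        while (m.find()) {
            ends.add(m.end()); // position of the newline, or end of input
        }
        return ends;
    }

    public static void main(String[] args) {
        String source = "//abc\nString s = \"//no\";";
        System.out.println(commentEndOffsets(source));
    }
}
```

For the sample input the only reported offset is 5, the index of the newline after "//abc"; the // inside the string literal is skipped by the lookbehind.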
Another case, if you would like to split a string using this position, you can also use this one:
example.split("(?<=^//[^\n]{0,1000})\n");
The (?<=^//[^\n]{0,1000}) means:
?<= - lookbehind,
^// - beginning of a line, followed by the // comment sign
[^\n]{0,1000} - multiple characters, but no newlines. Here is the tricky part: a lookbehind needs a bounded length, so you cannot use quantifiers like * or +, which is why you need an interval, in this case from 0 to 1000 characters. Be aware that if your comment is longer than 1000 characters (unlikely, but still possible), it will not work, so set this number (1000 in this example) carefully
\n - the newline you are looking for
But if you would like to split the whole string in multiple places, you will need to add the modifier (?m) (multiline mode) at the beginning of the regex:
(?m)(?<=^//[^\n]{0,1000})\n
but I'm not entirely sure
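The multiline split can be tried on a small input. This is a sketch with a hypothetical class name and a made-up four-line sample; it uses the (?m) pattern exactly as written above:

```java
final class SplitAfterComments {
    // Splits only at newlines that terminate a line starting with "//".
    static String[] splitAfterCommentLines(String source) {
        return source.split("(?m)(?<=^//[^\n]{0,1000})\n");
    }

    public static void main(String[] args) {
        String source = "//comment\ncode line\n//another\nmore";
        for (String part : splitAfterCommentLines(source)) {
            // Show embedded newlines visibly so the pieces are easy to read.
            System.out.println("[" + part.replace("\n", "\\n") + "]");
        }
    }
}
```

On this input the string is cut after "//comment" and after "//another", while the newline after "code line" is left in place.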
>>EDIT<< response to questions from comments
Try this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args){
        String example =
                "//This is a comment and I need this \\n position\n" +
                "String notwanted =\"//I do not need this end of line position\";\n" +
                "String a = aaa; //comment here";
        Pattern regex = Pattern.compile("(?m)(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)");
        Matcher matcher = regex.matcher(example);
        while(matcher.find()){
            System.out.println(matcher.start());
        }
        System.out.println(example.replaceAll("(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)", " (X)\n"));
    }
}
Maybe this regex will fulfill your expectations. If not, please ask another question with more details, such as the input, the expected output, your current code, and your goal.
This should work for you. It's really really awful. Couldn't really think of a much better, versatile solution. I'm assuming you also wanted comments like this:
String myStr = "asasdasd"; //some comment here
^[^"\n]*?(?:[^"\n]*?"(?>\\"|[^"\n])*?"[^"\n]*?)*?[^"\n]*?\/\/.*?(\n)
Regex101

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* followed by 0 or more arbitrary characters.
matcher.group(1) - get the 1st (only) capturing group.
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4,7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
    public static void main(String[] args) {
        String text = "This is a testWithSomeDataInBetweentest.";
        Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
        Matcher m = p.matcher(text);
        if (m.find()) {
            System.out.println("Matched: " + m.group(1));
        } else {
            System.out.println("No match.");
        }
    }
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the pattern against the entire input string (rather than seek a substring that matches), you can use matches() instead of find(). With find(), you can continue searching for more matching substrings through subsequent calls.
Also, your question did not specify what are admissible characters and length of the string between two "test" strings. I assumed any length is OK including zero and that we seek a substring composed of small and capital letters as well as digits.
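The difference between find() and matches(), and the repeated find() calls, can be sketched as follows. The class and method names are hypothetical, and the sample input with two "test...test" runs is an assumption for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindVersusMatches {
    static final Pattern P = Pattern.compile("test([A-Za-z0-9]*)test");

    // Collects group 1 from every match, calling find() until it returns false.
    static List<String> allGroups(String text) {
        Matcher m = P.matcher(text);
        List<String> groups = new ArrayList<>();
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    // True only if the pattern covers the whole input.
    static boolean wholeInputMatches(String text) {
        return P.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(allGroups("test1test and testABtest"));
        System.out.println(wholeInputMatches("testXXXtest"));
        System.out.println(wholeInputMatches("This is a testXXXtest."));
    }
}
```

Here find() yields the groups "1" and "AB" in turn, while matches() accepts "testXXXtest" but rejects the longer sentence that merely contains it.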
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why if you don't need it. Of course you should protect this code against null and strings that are too short.
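One way to add that protection is sketched below. The class SafeSlice and the method threeFromFifth are hypothetical names, and returning an Optional is just one design choice; throwing an exception or returning a default value would work as well.

```java
import java.util.Optional;

final class SafeSlice {
    // Hypothetical helper: the 5th to 7th characters (indices 4..6), if present.
    static Optional<String> threeFromFifth(String s) {
        // substring(4, 7) needs at least 7 characters.
        if (s == null || s.length() < 7) {
            return Optional.empty();
        }
        return Optional.of(s.substring(4, 7));
    }

    public static void main(String[] args) {
        System.out.println(threeFromFifth("testXXXtest").orElse("<too short>"));
        System.out.println(threeFromFifth("short").orElse("<too short>"));
        System.out.println(threeFromFifth(null).orElse("<too short>"));
    }
}
```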
Use the String.replaceAll() Method
If you don't need to optimize for performance, you can try the String.replaceAll() instance method for a more compact option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods
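One behaviour of the replaceAll() approach is worth keeping in mind: when the pattern does not match at all, replaceAll() returns the input unchanged rather than signalling an error. A small sketch with hypothetical names:

```java
final class ReplaceAllExtract {
    // Same pattern and replacement as in the answer above.
    static String extract(String line) {
        return line.replaceAll(".{4}(.{3}).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("testXXXtest")); // long enough for the pattern
        System.out.println(extract("abc"));         // fewer than 7 characters
    }
}
```

For "testXXXtest" the result is "XXX"; for the three-character input the result is the original "abc", so callers may want to check the length first.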

How do I know if a regexp has more than one possible match?

I am writing Java code that has to distinguish regular expressions with more than one possible match from regular expressions that have only one possible match.
For example:
"abc." can have several matches ("abc1", abcf", ...),
while "abcd" can only match "abcd".
Right now my best idea was to look for all unescaped regexp special characters.
I am convinced that there is a better way to do it in Java. Ideas?
(Late addition):
To make things clearer - there is NO specific input to test against. A good solution for this problem will have to test the regex itself.
In other words, I need a method whose signature may look something like this:
boolean isSingleResult(String regex)
This method should return true if and only if there is exactly one possible String s1 for which s1.matches(regex) returns true. (See the examples above.)
This sounds dirty, but it might be worth having a look at the Pattern class in the Java source code.
Taking a quick peek, it seems like it 'normalize()'s the given regex (Line 1441), which could turn the expression into something a little more predictable. I think reflection can be used to tap into some private resources of the class (use caution!). It is possible that, while tokenizing the regex pattern, there are specific indications when it has reached some kind of "multi-matching" element in the pattern.
Update
After having a closer look, there is some data within package scope that you can use to leverage the work of the Pattern tokenizer to walk through the nodes of the regex and check for multiple-character nodes.
After compiling the regular expression, iterate through the compiled "Node"s starting at Pattern.root. Starting at line 3034 of the class, there are the generalized types of nodes. For example class Pattern.All is multi-matching, while Pattern.SingleI or Pattern.SliceI are single-matching, and so on.
All these token classes appear to be in package scope, so it should be possible to do this without using reflection, but instead creating a java.util.regex.PatternHelper class to do the work.
Hope this helps.
If it can only have one possible match it isn't reeeeeally an expression, now, is it? I suspect your best option is to use a different tool altogether, because this does not at all sound like a job for regular expressions, but if you insist, well, no, I'd say your best option is to look for unescaped special characters.
The only regular expression that can ONLY match one input string is one that specifies the string exactly. So you need to match expressions with no wildcard characters or character groups AND that specify a start "^" and end "$" anchor.
"the quick" matches:
"the quick brownfox"
"the quick brown dog"
"catch the quick brown fox"
"^the quick brown fox$" matches ONLY:
"the quick brown fox"
Now I understand what you mean. I live in Belgium...
So here is something that works on most expressions. I wrote it myself, so maybe I forgot some rules.
public static final boolean isSingleResult(String regexp) {
    // Escape sequences that still match more than one character (\d, \w, \s, ...)
    String[] exconexc = "\\d \\D \\w \\W \\s \\S".split(" ");
    for (String s : exconexc) {
        int index = regexp.indexOf(s);
        if (index != -1) // Forbidden char found
        {
            return false;
        }
    }
    // Then remove all remaining escaped characters:
    String regex = regexp.replaceAll("\\\\.", "");
    // Now, all the tokens that can mean more than one match
    String[] mtom = "+ . ? | * { [:alnum:] [:word:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]".split(" ");
    // Iterate over all mtom strings
    for (String s : mtom) {
        int index = regex.indexOf(s);
        if (index != -1) // Forbidden char found
        {
            return false;
        }
    }
    return true;
}
Martijn
As I see it, the only way is to check whether the regexp matches multiple times for a particular input.
package com;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AAA {
    public static void main(String[] args) throws Exception {
        String input = "123 321 443 52134 432";
        Pattern pattern = Pattern.compile("\\d+");
        Matcher matcher = pattern.matcher(input);
        int i = 0;
        while (matcher.find()) {
            ++i;
        }
        System.out.printf("Matched %d times%n", i);
    }
}
