I am new to Java. I want to search for a string in a text file. Suppose the file contains:
Hi, I am learning Java.
I am using the pattern below to search for each exact word.
Pattern p = Pattern.compile("\\b" + searchString + "\\b", Pattern.CASE_INSENSITIVE);
It works fine, but it doesn't find "java." (with the trailing dot). How can I match both forms, i.e. the word delimited by word boundaries and the word followed by "." at the end of the string? Does anyone have any ideas on how I can solve this problem?
You should process your search string to change the dot . into an escaped regex dot: \\. (as written in a Java string literal). Note that an unescaped dot is a metacharacter in regular expressions and means any character. For example, you can replace every dot in your String with \\.
If you don't want to do all that work, just pass java\\. as your search string.
More info:
Using Regular Expressions in Java
Java Regex Tutorial
Java Regular Expressions
Code example:
public static void main(String[] args) {
String fileContent = "Hi i am learning java.";
String searchString = "java";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(fileContent );
while(m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
It would print: 17 java
public static void main(String[] args) {
String fileContent = "Hi i am learning java.";
String searchString = "java\\.";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(fileContent );
while(m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
It would print: 17 java. (note the dot in the end)
EDIT: As a very basic solution, since the only problem you have is with the dot, you can replace all the dots in your string with \\.
public static void main(String[] args) {
String fileContent = "Hi i am learning java.";
String searchString = "java.";
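//escape every dot; this is harmless even if "searchString" contains no dot
//(the replacement needs four backslashes because replaceAll treats a backslash in the replacement as an escape)
searchString = searchString.replaceAll("\\.", "\\\\.");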
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(fileContent );
while(m.find()) {
System.out.println(m.start() + " " + m.group());
}
}
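The dot is only one of several regex metacharacters, so escaping dots by hand leaves others such as + or ( untouched. A minimal sketch of a more general option, assuming the search string should always be treated as literal text, uses the standard Pattern.quote method. The class and helper names below are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteSearchDemo {
    // Returns "<start index> <matched text>" for the first literal,
    // case-insensitive occurrence of 'literal' in 'text', or "no match".
    static String firstMatch(String text, String literal) {
        Pattern p = Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        return m.find() ? m.start() + " " + m.group() : "no match";
    }

    public static void main(String[] args) {
        // The dot in the search string only matches a real dot.
        System.out.println(firstMatch("Hi i am learning java.", "java.")); // 17 java.
        System.out.println(firstMatch("Hi i am learning javax", "java.")); // no match
    }
}
```

Without Pattern.quote, the second call would match "javax", because an unescaped dot matches any character.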
"\\b" + searchstring + "(?:\\.|\\b)"
If you want to stipulate that the dot must be followed by a non-word character or the end of the string, you could add a positive look-ahead:
"\\b" + searchstring + "(?:\\.(?=\\W|$)|\\b)"
Pattern p = Pattern.compile(".*\\W*" + searchWord + "\\W*.*", Pattern.CASE_INSENSITIVE);
To be precise, the above says "find me a bit of text that starts with 0 or more characters, followed by 0 or more non-word characters (\W*), followed by the search word, followed by 0 or more non-word characters, followed by anything else". Note that \W* is not a word boundary: because it can match zero characters, the pattern also matches when the search word is part of a longer word.
This caters for situations where the search word is at the beginning of the file, at the very end, or between punctuation, e.g. "hi,I am learning,java.".
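As a rough check of how this pattern behaves, the sketch below applies it with Matcher.matches() (an assumption, since the answer does not say which matching method is used) and compares it with the optional-dot look-ahead pattern quoted earlier in the thread. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LooseVsStrictDemo {
    // The ".*\W*word\W*.*" pattern, applied to the whole text.
    static boolean looseMatch(String text, String word) {
        String regex = ".*\\W*" + Pattern.quote(word) + "\\W*.*";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).matches();
    }

    // The word-boundary pattern with an optional trailing dot.
    static boolean strictFind(String text, String word) {
        String regex = "\\b" + Pattern.quote(word) + "(?:\\.(?=\\W|$)|\\b)";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("hi,I am learning,java.", "java")); // true
        System.out.println(looseMatch("I like javascript", "java"));      // true
        System.out.println(strictFind("hi,I am learning,java.", "java")); // true
        System.out.println(strictFind("I like javascript", "java"));      // false
    }
}
```

The second line shows the pattern accepting "java" inside "javascript", which may or may not be what you want.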
Hope this helps...
Related
I have to remove "OR" from a given string, but only if the string ends with it.
public class StringReplaceTest {
public static void main(String[] args) {
String text = "SELECT count OR %' OR";
System.out.println("matches:" + text.matches("OR$"));
Pattern pattern = Pattern.compile("OR$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println("Found match at: " + matcher.start() + " to " + matcher.end());
System.out.println("substring:" + text.substring(matcher.start(), matcher.end()));
text = text.replace(text.substring(matcher.start(), matcher.end()), "");
System.out.println("after replace:" + text);
}
}
}
Output:
matches:false
Found match at: 19 to 21
substring:OR
after replace:SELECT count %'
It's removing all occurrences of the string "OR", but I only want to remove it when the string ends with it.
How can I do that?
Also, the regex works with Pattern but not with String.matches().
What is the difference between the two, and what is the best way to remove a string only if it appears at the end?
Use text.matches(".*OR$"), because matches() has to match the entire string.
Or:
if (text.endsWith("OR"))
Or:
text = text.replaceFirst(" OR$", "");
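The difference between the two calls can be sketched as follows: String.matches() succeeds only if the regex covers the whole input, while Matcher.find() looks for a match anywhere in it. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only if the regex matches the entire text (what String.matches does).
    static boolean matchesWhole(String text, String regex) {
        return text.matches(regex);
    }

    // True if the regex matches some part of the text.
    static boolean findsAnywhere(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "SELECT count OR %' OR";
        System.out.println(matchesWhole(text, "OR$"));    // false
        System.out.println(findsAnywhere(text, "OR$"));   // true
        System.out.println(matchesWhole(text, ".*OR$"));  // true
        System.out.println(text.endsWith("OR"));          // true
    }
}
```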
If you just need to remove the last OR, I suggest using the substring method, as it is faster than a full regex pattern. In that case, you can remove the OR using this code (note that lastIndexOf returns -1 when there is no OR at all, so check for that first):
text = text.substring(0, text.lastIndexOf("OR"));
If you need to replace the trailing OR with something else, use this code, which matches the last OR only when it is preceded by a word boundary and sits at the end of the string:
text = text.replaceFirst("\\bOR$", "SOME");
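Because lastIndexOf finds the last OR anywhere in the string, not only at the end, a guarded version may be safer. The sketch below, with a hypothetical helper name, removes the suffix only when the string really ends with it and returns the input unchanged otherwise.

```java
class TrailingSuffix {
    // Removes 'suffix' only if 'text' ends with it; otherwise returns 'text' as-is.
    static String removeTrailing(String text, String suffix) {
        if (!text.endsWith(suffix)) {
            return text;
        }
        return text.substring(0, text.length() - suffix.length());
    }

    public static void main(String[] args) {
        // Brackets make the remaining trailing space visible.
        System.out.println("[" + removeTrailing("SELECT count OR %' OR", "OR") + "]"); // [SELECT count OR %' ]
        System.out.println("[" + removeTrailing("SELECT OR count", "OR") + "]");       // [SELECT OR count]
    }
}
```

Unlike the \\bOR$ regex, endsWith does not check for a word boundary, so a string ending in a word such as FLOOR would also lose its last two letters.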
I have a problem with a regex that is not working. I don't know what I am doing wrong. My code:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("\\btimetable:(.*);");
//also tried "timetable:(.*);" and "(\\btimetable:)(.*)(;)"
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("S:" + m.start() + ", E:" + m.end());
System.out.println("x: "+ test.substring(m.start(), m.end()));
}
Expected result:
(1) "timetable:xxxxxtimetable:"
(2) "timetable: fullihhghtO"
Thanks for any help.
A non-capturing group could be handy in our case:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("(?:\\btimetable:(.*?);)+"); // <-- here
Matcher m = p.matcher(test);
int i = 1;
while (m.find()) {
System.out.println(i + ") "+ m.group(1));
i++;
}
OUTPUT
1) xxxxxtimetable:
2) fullihhghtO
Regex explained:
(?:\\btimetable:(.*?);)+ : the non-capturing group (?:...) consumes "timetable:" and the closing ; without capturing them, while the capturing group (.*?), which is group 1, captures what we want (everything between \btimetable: and ;). Pay special attention to the non-greedy term .*?, which consumes the minimum possible number of characters up to the next ;. Without this lazy form, the regex would use the default "greedy" mode and consume all the characters up to the last ; in the string!
Now, all of that is relevant if you want to capture only the unique part. If you want to capture the whole thing:
1) timetable:xxxxxtimetable:;
2) timetable: fullihhghtO;
It can be done easily by modifying the line with the regex to:
Pattern p = Pattern.compile("\\b(timetable:.*?;)+");
which is even simpler: only one capturing group (see that we still have to use the non-greedy mode!).
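To make the greedy versus lazy difference concrete, here is a small sketch that collects group 1 of every match for both forms of the pattern. The helper name captures is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects capturing group 1 of every match of 'regex' in 'input'.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
        // Greedy: a single capture that runs up to the last ';'.
        System.out.println(captures("\\btimetable:(.*);", test));
        // Lazy: two captures, each stopping at the nearest ';'.
        System.out.println(captures("\\btimetable:(.*?);", test));
    }
}
```

Note that the second capture of the lazy version keeps the leading space that follows "timetable:" in the input.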
You don't need a regex; a simple split would do it:
public static void main(String[] args) {
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
String[] array = test.split(";");
String str1 = array[0].trim();
String str2 = array[1].trim();
System.out.println(str1 + "\n" + str2); //timetable:xxxxxtimetable:
//timetable: fullihhghtO
}
I have a String that contains newline characters, say:
str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";
I want to know how to check for the existence of a particular word, say Joker, in the string str using java.lang.String.matches().
I find that str.matches(".*Joker.*") returns false, and returns true if I remove the newline characters. So what regular expression should be passed as the argument to str.matches()?
One way is: str.replaceAll("\\n", "").matches(".*Joker.*");
The problem is that the dot in .* does not match newlines by default. If you want newlines to be matched, your regex must have the flag Pattern.DOTALL.
If you want to embed that in a regex used in .matches() the regex would be:
"(?s).*Joker.*"
However, note that this will match Jokers too. A regex does not have the notion of words. Your regex would therefore really need to be:
"(?s).*\\bJoker\\b.*"
However, a regex does not need to match all its input text (which is what .matches() does, counterintuitively), only what is needed. Therefore, this solution is even better, and does not require Pattern.DOTALL:
Pattern p = Pattern.compile("\\bJoker\\b"); // \b is a word boundary
p.matcher(str).find(); // returns true
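The three variants discussed above can be compared in one short sketch. The class name and the containsWord helper are hypothetical; the helper wraps the find() approach and quotes the word so that it is treated literally.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // True if 'word' occurs in 'text' as a whole word, newlines or not.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";
        System.out.println(str.matches(".*Joker.*"));           // false: '.' stops at newlines
        System.out.println(str.matches("(?s).*Joker.*"));       // true: DOTALL enabled inline
        System.out.println(containsWord(str, "Joker"));         // true
        System.out.println(containsWord("Two\nJokers\n", "Joker")); // false: not a whole word
    }
}
```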
You can do something much simpler; this is a contains. You do not need the power of regex:
public static void main(String[] args) throws Exception {
final String str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";
System.out.println(str.contains("Joker"));
}
Alternatively you can use a Pattern and find:
public static void main(String[] args) throws Exception {
final String str = "Hello\n" + "Batman,\n" + "Joker\n" + "here\n";
final Pattern p = Pattern.compile("Joker");
final Matcher m = p.matcher(str);
if (m.find()) {
System.out.println("Found match");
}
}
You want to use a Pattern that uses the DOTALL flag, which says that a dot should also match new lines.
String str = "Hello\n"+"Batman,\n" + "Joker\n" + "here\n";
Pattern regex = Pattern.compile(".*Joker.*", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(str);
if (regexMatcher.find()) {
// found a match
}
else
{
// no match
}
I'm using string.split(regex) to cut my string after every ',', but I don't know how to cut after the space that follows the ','.
String content = new String("I, am, the, goddman, Batman");
content.split("(?<=,)");
gives me the array
{"I,"," am,"," the,"," goddman,"," Batman"}
What I actually want is
{"I, ","am, ","the, ","goddman, ","Batman "}
Can anyone help me, please?
Just add the space into your regex:
http://ideone.com/W8SaL
content.split("(?<=, )");
Also, you typoed goddman.
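For reference, here is a self-contained sketch of that split. The class and method names are made up, and the input uses the corrected spelling. The lookbehind (?<=, ) matches the empty position right after each comma-plus-space, so the delimiter stays attached to the preceding piece.

```java
class SplitKeepDelimiter {
    // Splits after every ", " while keeping the comma and space in each piece.
    static String[] splitAfterCommaSpace(String content) {
        return content.split("(?<=, )");
    }

    public static void main(String[] args) {
        for (String part : splitAfterCommaSpace("I, am, the, goddamn, Batman")) {
            System.out.println("\"" + part + "\"");
        }
        // Prints five lines: "I, " "am, " "the, " "goddamn, " "Batman"
    }
}
```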
A positive lookbehind will not let you perform the match when the items are separated by multiple spaces.
public static void main(final String... args) {
// final Pattern pattern = Pattern.compile("(?<=,\\s*)"); won't work!
final Pattern pattern = Pattern.compile(".+?,\\s*|.+\\s*$");
final Matcher matcher =
pattern.matcher("I, am, the, goddamn, Batman ");
while (matcher.find()) {
System.out.format("\"%s\"\n", matcher.group());
}
}
Output:
"I, "
"am, "
"the, "
"goddamn, "
"Batman "
Why this code:
String keyword = "pattern";
String text = "sometextpatternsometext";
String patternStr = "^.*" + keyword + ".*$"; //
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("start = " + start + ", end = " + end);
}
start = 0, end = 23
doesn't work properly?
But this code:
String keyword = "pattern";
String text = "sometext pattern sometext";
String patternStr = "\\b" + keyword + "\\b"; //
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("start = " + start + ", end = " + end);
}
start = 9, end = 16
works fine.
It does work. Your pattern
^.*pattern.*$
says to match:
start at the beginning
accept any number of characters
followed by the string pattern
followed by any number of characters
until the end of the string
The result is the entire input string. If you wanted to find only the word pattern, then the regex would be just the word by itself, or as you found, bracketed with word-boundary metacharacters.
It is not that the first example didn't work, it is that you inadvertently asked it to match more than you meant.
The .* expressions expand to contain all the characters before "pattern" and all the characters after pattern, so the whole expression matches the whole line.
With your second example, \b only asserts a word boundary before and after "pattern" without consuming any characters, so the expression matches exactly the word pattern (start = 9, end = 16).
The problem is in your regex: "^.*" + keyword + ".*$"
The expression .* is greedy: it first consumes as many characters as it can, which is the whole string, and then backtracks until your keyword can match.
To make it match as few characters as possible you have to make it reluctant (non-greedy), i.e. add a question mark after .*:
"^.*?" + keyword + ".*$"
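This time .*? matches the minimum number of characters before your keyword. Note that the overall match still spans the whole string, because of the ^ anchor and the trailing .*$.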
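The span each of these patterns reports can be checked with a short sketch. The class and the span helper are hypothetical; the input is the string from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpanDemo {
    // Returns "start-end" of the first match, or "no match".
    static String span(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.start() + "-" + m.end() : "no match";
    }

    public static void main(String[] args) {
        String text = "sometextpatternsometext";
        System.out.println(span("^.*pattern.*$", text));  // 0-23: anchors plus .* cover everything
        System.out.println(span("^.*?pattern.*$", text)); // 0-23: lazy prefix, same overall span
        System.out.println(span("pattern", text));        // 8-15: only the keyword
        System.out.println(span("\\bpattern\\b", text));  // no match: no word boundary inside the run
    }
}
```

To locate just the keyword, drop the surrounding .* parts; add \b only when the keyword must stand alone as a word.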