I have a file which contains "(*" and "*)". I want to remove everything between these two character sequences.
I used the following code, but it didn't do anything to my string.
String regex = "\\(\\*.*\\*\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
matcher.replaceAll("");
The 'input' is:
(* This program prints out a message. *)
program is
begin
write ("Hello, world!");
end;
You need to capture the return value of replaceAll: the method does not modify the input, it returns a new String with the replacements applied.
Additionally, the regex must match what you actually want to match, in this case a parenthesized comment. Unless your input is unusual, it may look like this:
String regex = "\\(\\*.*\\*\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.replaceAll("(\\*\\*)");
System.out.println(result);
This regex in fact matches the whole region from the first comment start to the last comment end, which is usually not what you want. To make it match non-greedily (reluctantly), use this regex: \(\*.*?\*\) (written with doubled backslashes in a Java string literal).
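Putting both fixes together, here is a minimal runnable sketch that deletes each comment including its (* and *) delimiters (the answer above instead replaces it with an empty "(**)"). The class and method names are illustrative only, and Pattern.DOTALL is an addition not in the answer: by default . does not match line terminators, so without it a comment spanning several lines would be left in place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripComments {
    // Hypothetical helper: removes every (* ... *) comment, delimiters included.
    static String stripComments(String input) {
        // Reluctant .*? stops at the first *) so each comment is matched on its own;
        // DOTALL lets . also match line terminators inside a multi-line comment.
        Pattern pattern = Pattern.compile("\\(\\*.*?\\*\\)", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(input);
        // replaceAll does not change input; it returns a new String.
        return matcher.replaceAll("");
    }

    public static void main(String[] args) {
        String input = "(* This program prints out a message. *)\n"
                + "program is\n"
                + "begin\n"
                + "write (\"Hello, world!\");\n"
                + "end;";
        System.out.println(stripComments(input));
    }
}
```

With the greedy .* instead of .*?, an input such as "a (* x *) b (* y *) c" would lose the b in the middle as well.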
Related
I have following input String:
abc.def.ghi.jkl.mno
The number of dot characters in the input may vary. I want to extract the word after the last . (i.e. mno in the above example). I am using the following regex and it's working perfectly fine:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.find()) {
System.out.println(matcher.group(1));
}
However, I am using a third-party library that does this matching (Kafka Connect, to be precise), and I can only provide the regex pattern to it. The issue is that this library (whose code I can't change) uses matches() instead of find(), and when I execute the same code with matches(), it doesn't work, e.g.:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(1));
}
The above code doesn't print anything. As per the javadoc, matches() tries to match the whole String. Is there any way I can apply similar logic using matches() to extract mno from my input String?
You may use
".*\\.([^.]*)"
It matches:
.*\. - zero or more characters (except line terminators), as many as possible, up to the last . character
([^.]*) - capturing group 1: zero or more characters other than a dot.
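A minimal runnable sketch of that pattern used through matches(), the way the library applies it. The class and method names are illustrative; the third-party library itself is not involved here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastSegment {
    // Hypothetical helper: returns the text after the last '.', or null when
    // matches() fails (for example, when the input contains no dot at all).
    static String lastSegment(String input) {
        // matches() must consume the whole input, so the pattern starts with .*
        Pattern pattern = Pattern.compile(".*\\.([^.]*)");
        Matcher matcher = pattern.matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("abc.def.ghi.jkl.mno")); // mno
    }
}
```

Because group 1 uses *, an input ending in a dot such as "abc." matches with an empty group 1 rather than failing.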
To extract the word after the last . as you described, you could also do this without Pattern and Matcher, as follows:
String input = "abc.def.ghi.jkl.mno";
String getMe = input.substring(input.lastIndexOf(".")+1, input.length());
System.out.println(getMe);
This will work. Use .* at the beginning to enable it to match the entire input.
public static void main(String[] argv) {
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile(".*([^.]{3})$");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
}
}
abc.def.ghi.jkl.mno
mno
A better pattern, which captures everything after the last dot whatever its length, is ".*\\.([^.]+)$"
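To see why the general pattern is safer, the sketch below applies both patterns with matches() to a few inputs. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastWordComparison {
    // Hypothetical helper: applies regex with matches() (as the library does)
    // and returns group 1, or null when the whole input does not match.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fixedLength = ".*([^.]{3})$"; // captures only the last three characters
        String general = ".*\\.([^.]+)$";    // captures everything after the last dot
        for (String input : new String[] {"abc.def.ghi.jkl.mno", "abc.defg", "abc.de"}) {
            System.out.println(input + " -> " + group1(fixedLength, input)
                    + " / " + group1(general, input));
        }
    }
}
```

For "abc.defg" the three-character pattern captures only "efg", and for "abc.de" it does not match at all. With matches() the trailing $ is redundant but harmless.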
I have a string:
bundle://24.0:0/com/keop/temp/Activator.class
And from this string I need to get com/keop/temp/Activator but the following pattern:
Pattern p = Pattern.compile("bundle://.*/(.*)\\.class");
returns only Activator. Where is my mistake?
You need to follow the initial token .* with ? for a non-greedy match.
bundle://.*?/(.*)\\.class
           ^
Your regex uses greedy matching with a . that matches any character except a line terminator. .*/ reads everything up to the final /, and (.*)\\. then matches everything up to the final period, which leaves only Activator. Instead of lazy matching, you can restrict the characters matched before the part you want to non-/ characters. Change it to:
Pattern p = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Sample code:
String str = "bundle://24.0:0/com/keop/temp/Activator.class";
Pattern ptrn = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output of the sample program:
com/keop/temp/Activator
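To compare the original greedy pattern with both fixes, the sketch below runs all three against the same string with find(). The class and helper names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BundlePath {
    // Hypothetical helper: returns group 1 of the first match, or null.
    static String extract(String regex, String str) {
        Matcher m = Pattern.compile(regex).matcher(str);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "bundle://24.0:0/com/keop/temp/Activator.class";
        String[] regexes = {
            "bundle://.*/(.*)\\.class",   // greedy: .*/ runs to the last '/'
            "bundle://.*?/(.*)\\.class",  // reluctant: .*?/ stops at the first '/'
            "bundle://[^/]*/(.*)\\.class" // negated class: [^/]* cannot cross a '/'
        };
        for (String regex : regexes) {
            System.out.println(extract(regex, str));
        }
    }
}
```

The first pattern yields Activator; the other two both yield com/keop/temp/Activator. The negated class states the intent more directly and avoids relying on backtracking order.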
I have a regex w_p[a-z]
It would match input like w_pa, w_pb ... w_pz. I'd like to find out exactly which character was matched, i.e. a, b, ... or z for the above inputs. Is this possible with Java regex?
Yes, you need to capture:
final Pattern pattern = Pattern.compile("w_p([a-z])");
final Matcher m = pattern.matcher(input);
if (m.find()) {
// what is matched is in m.group(1)
String matched = m.group(1);
}
Sure, use regex capturing groups. w_p([a-z]) defines a group for the character you are looking for.
Pattern p = Pattern.compile("w_p([a-z])");
Matcher matcher = p.matcher(input);
if (matcher.find()) {
String character = matcher.group(1);
}
matcher.group(0) contains all that was matched (w_pa or w_pb etc.)
matcher.group(1) contains what was found in the first () pair.
See the documentation for more information.
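A small runnable sketch showing group(0) and group(1) side by side; the class name, helper name, and sample input are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhichLetter {
    private static final Pattern W_P = Pattern.compile("w_p([a-z])");

    // Hypothetical helper: returns the letter after w_p, or null if there is no match.
    static String letter(String input) {
        Matcher m = W_P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = W_P.matcher("prefix w_pb suffix");
        if (m.find()) {
            System.out.println(m.group(0)); // w_pb (the whole match)
            System.out.println(m.group(1)); // b (first capturing group)
        }
    }
}
```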
The REGEX will be something like this:
w_p([a-z])
So you will create a group from which you can get the value.
I am new to regular expressions in Java. I'd like to extract a string by using regular expressions.
This is my String: "Hello,World"
I'd like to extract the text after ",". The result would be "World". I tried this:
final Pattern pattern = Pattern.compile(",(.+?)");
final Matcher matcher = pattern.matcher("Hello,World");
matcher.find();
But what would be the next step?
You don't need a regex for this. You can simply split on the comma and take the second element of the array:
System.out.println("Hello,World".split(",")[1]);
Output:
World
But if you want to use a regex, you need to remove the ? from your pattern.
A ? after + makes the quantifier reluctant, so it matches as few characters as possible. It will only match W and stop there.
You don't need that here: you want to match as much as it can.
So use greedy matching instead.
Here's the code with the modified regex:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
World
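A side-by-side sketch of the reluctant and greedy versions, plus a third variant (an addition not in the answer above) that keeps the reluctant quantifier but anchors it with $, which also forces it to run to the end of the input. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantVsGreedy {
    // Hypothetical helper: returns group 1 of the first match, or null.
    static String afterComma(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(afterComma(",(.+?)", "Hello,World"));  // W: reluctant stops after one char
        System.out.println(afterComma(",(.+)", "Hello,World"));   // World: greedy takes the rest
        System.out.println(afterComma(",(.+?)$", "Hello,World")); // World: $ forces it to the end
    }
}
```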
Extending what you have, you need to remove the ? sign from your pattern to use the greedy matching and then process the matched group:
final Pattern pattern = Pattern.compile(",(.+)"); // removed your '?'
final Matcher matcher = pattern.matcher("Hello,World");
while (matcher.find()) {
String result = matcher.group(1);
// work with result
}
Other answers suggest different approaches to your problem and might offer a better solution for what you need.
System.out.println( "Hello,World".replaceAll(".*,(.*)","$1") ); // output is "World"
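The one-liner relies on the greedy leading .*, so with several commas it keeps the text after the last one, and an input without a comma is returned unchanged because nothing matches. A short sketch of that behavior; the class and method names are illustrative.

```java
public class AfterLastComma {
    // Hypothetical helper: greedy .* backtracks only as far as the last comma;
    // "$1" in the replacement refers to capturing group 1.
    static String viaReplaceAll(String s) {
        return s.replaceAll(".*,(.*)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(viaReplaceAll("Hello,World")); // World
        System.out.println(viaReplaceAll("a,b,World"));   // World
        System.out.println(viaReplaceAll("NoCommaHere")); // NoCommaHere (no match, unchanged)
    }
}
```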
You are using a reluctant expression and will only select a single character W, whereas you can use a greedy one and print your matched group content:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
World
See the Pattern class documentation.
I'm trying to extract snippets of dialogue from a book text. For example, if I have the string
"What's the matter with the flag?" inquired Captain MacWhirr. "Seems all right to me."
Then I want to extract "What's the matter with the flag?" and "Seems all right to me.".
I found a regular expression to use here, which is "[^"\\]*(\\.[^"\\]*)*". This works great in Eclipse when I'm doing a Ctrl+F find regex on my book .txt file, but when I run the following code:
String regex = "\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\"";
String bookText = "\"What's the matter with the flag?\" inquired Captain MacWhirr. \"Seems all right to me.\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(bookText);
if(m.find())
System.out.println(m.group(1));
The only thing that prints is null. So am I not converting the regex into a Java string properly? Do I need to take into account the fact that Java Strings have a \" for the double quotes?
In natural language text, it's unlikely that " is escaped by a preceding backslash, so you should be able to use just the pattern "([^"]*)".
As a Java string literal, this is "\"([^\"]*)\"".
Here it is in Java:
String regex = "\"([^\"]*)\"";
String bookText = "\"What's the matter with the flag?\" inquired Captain MacWhirr. \"Seems all right to me.\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(bookText);
while (m.find()) {
System.out.println(m.group(1));
}
The above prints:
What's the matter with the flag?
Seems all right to me.
On escape sequences
Given this declaration:
String s = "\"";
System.out.println(s.length()); // prints "1"
The string s has only one character, ". The \ is part of an escape sequence at the Java source code level; the string itself contains no backslash.
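The same distinction applies to regex literals: what the regex engine sees is the run-time string, not the source text. A minimal sketch; the class, field, and method names are illustrative.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // One character at run time: a double quote.
    static final String QUOTE = "\"";
    // The regex "([^"]*)" written as a Java literal; 9 characters at run time.
    static final String REGEX = "\"([^\"]*)\"";

    // The source literal "\\\\" is the 2-character regex \\, which matches one backslash.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    public static void main(String[] args) {
        System.out.println(QUOTE.length());         // 1
        System.out.println(REGEX);                  // "([^"]*)"
        System.out.println(REGEX.length());         // 9
        System.out.println(isSingleBackslash("\\")); // true
    }
}
```

This is why a backslash that the regex must see needs four backslashes in the Java source, as in the longer pattern from the question.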
See also
JLS 3.10.6 Escape Sequences for Character and String Literals
The problem with the original code
There's actually nothing wrong with the pattern per se, but you're not capturing the right portion. \1 isn't capturing the quoted text. Here's the pattern with the correct capturing group:
String regex = "\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\"";
String bookText = "\"What's the matter?\" inquired Captain MacWhirr. \"Seems all right to me.\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(bookText);
while (m.find()) {
System.out.println(m.group(1));
}
For visual comparison, here's the original pattern, as a Java string literal:
String regex = "\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\""
                            ^^^^^^^^^^^^^^^^^
                            why capture this part?
And here's the modified pattern:
String regex = "\"([^\"\\\\]*(?:\\\\.[^\"\\\\]*)*)\""
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  we want to capture this part!
As mentioned before, though: this complicated pattern isn't necessary for natural language text, which isn't likely to contain escaped quotes.
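To see where the null in the question came from, the sketch below runs the original pattern and prints both group 0 and group 1. Since the sample text contains no backslash, the repeated group (\\.[^"\\]*)* matches zero times and never takes part in the match, and Matcher.group returns null for such a group. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhyNull {
    // The pattern from the question, as a Java string literal.
    static final String ORIGINAL = "\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\"";

    // Hypothetical helper: returns the matcher positioned on the first match, or null.
    static Matcher firstMatch(String text) {
        Matcher m = Pattern.compile(ORIGINAL).matcher(text);
        return m.find() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = firstMatch("\"Seems all right to me.\" he said.");
        if (m != null) {
            System.out.println(m.group(0)); // the quoted text, quotes included
            System.out.println(m.group(1)); // null: group 1 never took part in the match
        }
    }
}
```

Printing m.group(0) and stripping the outer quotes would also have worked, but moving the capturing group, as in the modified pattern, is cleaner.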
See also
regular-expressions.info/Grouping and backreferences