I am new to regular expressions in Java. I would like to extract part of a string using a regular expression.
This is my String: "Hello,World"
I would like to extract the text after the ",". The result should be "World". I tried this:
final Pattern pattern = Pattern.compile(",(.+?)");
final Matcher matcher = pattern.matcher("Hello,World");
matcher.find();
But what would be the next step?
You don't need a regex for this. You can simply split on the comma and take the second element of the array:
System.out.println("Hello,World".split(",")[1]);
Output:
World
But if you want to use a regex, you need to remove the ? from your pattern.
A ? after + makes the quantifier reluctant: it matches as few characters as possible, so here the group captures only W and stops there.
You don't need that here. You want the group to match as much as it can, up to the end of the string.
So use greedy matching instead.
Here's the code with the modified regex:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
World
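To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns against the same input. The class name GreedyVsReluctantDemo and the helper firstGroup are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Reluctant: .+? stops as soon as the overall pattern is satisfied.
        System.out.println(firstGroup(",(.+?)", "Hello,World")); // W
        // Greedy: .+ takes everything up to the end of the input.
        System.out.println(firstGroup(",(.+)", "Hello,World"));  // World
    }
}
```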
Extending what you have, you need to remove the ? sign from your pattern to use the greedy matching and then process the matched group:
final Pattern pattern = Pattern.compile(",(.+)"); // removed your '?'
final Matcher matcher = pattern.matcher("Hello,World");
while (matcher.find()) {
String result = matcher.group(1);
// work with result
}
Other answers suggest different approaches to your problem and might offer a better solution for what you need.
System.out.println( "Hello,World".replaceAll(".*,(.*)","$1") ); // output is "World"
You are using a reluctant expression and will only select a single character W, whereas you can use a greedy one and print your matched group content:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
World
See Regex Pattern doc
I have following input String:
abc.def.ghi.jkl.mno
The number of dot characters in the input may vary. I want to extract the word after the last . (i.e. mno in the above example). I am using the following regex and it's working perfectly fine:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.find()) {
System.out.println(matcher.group(1));
}
However, I am using a third party library which does this matching (Kafka Connect to be precise) and I can just provide the regex pattern to it. The issue is, this library (whose code I can't change) uses matches() instead of find() to do the matching, and when I execute the same code with matches(), it doesn't work e.g.:
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile("([^.]+$)");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(1));
}
The above code doesn't print anything. As per the javadoc, matches() tries to match the whole String. Is there any way I can apply similar logic using matches() to extract mno from my input String?
You may use
".*\\.([^.]*)"
It matches
.*\. - any 0+ chars as many as possible up to the last . char
([^.]*) - Capturing group 1: any 0+ chars other than a dot.
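As a quick check of the suggested pattern, the sketch below calls matches() (which must consume the whole input) with both the original and the suggested regex. The class name MatchesVsFindDemo and the helper lastSegment are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Hypothetical helper: uses matches(), so the pattern has to cover the entire input.
    static String lastSegment(String input) {
        Matcher matcher = Pattern.compile(".*\\.([^.]*)").matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "abc.def.ghi.jkl.mno";
        // The original pattern only describes the tail, so matches() rejects it.
        System.out.println(Pattern.compile("([^.]+$)").matcher(input).matches()); // false
        // The leading .*\. lets the pattern span the whole string.
        System.out.println(lastSegment(input)); // mno
    }
}
```

Note that this pattern requires at least one dot, so an input without any dot does not match at all.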
To extract the word after the last . you could also do this without Pattern and Matcher, as follows:
String input = "abc.def.ghi.jkl.mno";
String getMe = input.substring(input.lastIndexOf(".")+1, input.length());
System.out.println(getMe);
This will work. Use .* at the beginning to enable it to match the entire input.
public static void main(String[] argv) {
String input = "abc.def.ghi.jkl.mno";
Pattern pattern = Pattern.compile(".*([^.]{3})$");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
}
}
abc.def.ghi.jkl.mno
mno
This is a better pattern if the dot really is anywhere: ".*\\.([^.]+)$"
I have a string:
bundle://24.0:0/com/keop/temp/Activator.class
And from this string I need to get com/keop/temp/Activator but the following pattern:
Pattern p = Pattern.compile("bundle://.*/(.*)\\.class");
returns only Activator. Where is my mistake?
You need to follow the initial token .* with ? for a non-greedy match.
bundle://.*?/(.*)\\.class
^
Your regex uses greedy matching with a . that matches any character (but a newline). .*/ reads everything up to the final /, (.*)\\. matches everything up to the final period. Instead of lazy matching, you can restrict the characters matched to non-/ before the string you want to match. Change to
Pattern p = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Sample code:
String str = "bundle://24.0:0/com/keop/temp/Activator.class";
Pattern ptrn = Pattern.compile("bundle://[^/]*/(.*)\\.class");
Matcher matcher = ptrn.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output of the sample program:
com/keop/temp/Activator
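For comparison, this sketch runs the original greedy pattern, the lazy variant, and the negated-character-class variant against the same string. BundlePathDemo and firstGroup are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BundlePathDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "bundle://24.0:0/com/keop/temp/Activator.class";
        // Greedy .* runs to the last '/', leaving only the class name for the group.
        System.out.println(firstGroup("bundle://.*/(.*)\\.class", str));     // Activator
        // Lazy .*? stops at the first '/' after the prefix.
        System.out.println(firstGroup("bundle://.*?/(.*)\\.class", str));    // com/keop/temp/Activator
        // [^/]* cannot cross a '/', so it also stops at the first one.
        System.out.println(firstGroup("bundle://[^/]*/(.*)\\.class", str));  // com/keop/temp/Activator
    }
}
```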
I need to print the simple bind variable names in the SQL query.
I need to print the words that start with the : character but are NOT followed by a dot (.) character.
In this sample I need to print pOrg and pBusinessId, but NOT parameter.
The regular expression "(:)(\\w+)^\\." is not working.
Could you help me correct the regular expression?
Thanks
Peddi
public void testMethod(){
String regEx="(:)(\\w+)([^\\.])";
String input= "(origin_table like 'I%' or (origin_table like 'S%' and process_status =5))and header_id = NVL( :parameter.number1:NULL, header_id) and (orginization = :pOrg) and (businsess_unit = :pBusinessId";
Pattern pattern;
Matcher matcher;
pattern = Pattern.compile(regEx);
matcher = pattern.matcher(input);
String grp = null;
while(matcher.find()){
grp = matcher.group(2);
System.out.println(grp);
}
}
You can try with something like
String regEx = "(:)(\\w+)\\b(?![.])";
(:)(\\w+)\\b makes sure that you match only entire words starting with :
(?![.]) is a negative lookahead, which makes sure that the matched word is not followed by a .
This regex will also match :NULL, so if there is some reason why it shouldn't be matched, share it with us.
Anyway, to exclude NULL from the results you can add a negative lookbehind:
String regEx = "(:)(\\w+)\\b(?![.])(?<!:NULL)";
To make the regex case insensitive, so that NULL also matches null, compile the pattern with the Pattern.CASE_INSENSITIVE flag:
Pattern pattern = Pattern.compile(regEx,Pattern.CASE_INSENSITIVE);
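Here is a small sketch that applies both patterns to a shortened version of the query from the question, so the effect of the lookahead and the lookbehind can be seen side by side. BindVariableDemo and the helper names are hypothetical; the input is trimmed from the original SQL for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BindVariableDemo {
    // Hypothetical helper: collects group 2 (the name after ':') of every match.
    static List<String> names(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened from the query in the question (spelling kept as in the original).
        String input = "header_id = NVL( :parameter.number1:NULL, header_id)"
                + " and (orginization = :pOrg) and (businsess_unit = :pBusinessId";
        // Lookahead only: :parameter is skipped because a '.' follows it.
        System.out.println(names("(:)(\\w+)\\b(?![.])", input));            // [NULL, pOrg, pBusinessId]
        // Lookahead plus lookbehind: :NULL is rejected as well.
        System.out.println(names("(:)(\\w+)\\b(?![.])(?<!:NULL)", input));  // [pOrg, pBusinessId]
    }
}
```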
Since it looks like you're using camelcase, you can actually simplify things a bit when it comes to excluding :NULL:
:([a-z][\\w]+)\\b(?!\\.)
And $1 will return your variable names.
Alternative that doesn't rely on negative lookahead:
:([a-z][\\w]+)\\b(?:[^\\.]|$)
You can try:
Pattern regex = Pattern.compile("^:.*?[^.]$");
I have a regex w_p[a-z]
It would match input like w_pa, w_pb ... w_pz. I would like to find out exactly which character was matched, i.e. a, b or z for the above input. Is this possible with Java regex?
Yes, you need to capture:
final Pattern pattern = Pattern.compile("w_p([a-z])");
final Matcher m = pattern.matcher(input);
if (m.find()) {
// what is matched is in m.group(1)
}
Sure, use regex groups. w_p([a-z]) defines a group for the character you are looking for.
Pattern p = Pattern.compile("w_p([a-z])");
Matcher matcher = p.matcher(input);
if (matcher.find()) {
String character = matcher.group(1);
}
matcher.group(0) contains all that was matched (w_pa or w_pb etc.)
matcher.group(1) contains what was found in the first () pair.
See the documentation for more information.
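A minimal runnable version of the above, showing what group(0) and group(1) contain for one input. The class name CaptureGroupDemo, the helper matchedLetter, and the sample input "w_pz" are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    private static final Pattern PATTERN = Pattern.compile("w_p([a-z])");

    // Hypothetical helper: returns the letter captured by the group, or null if nothing matches.
    static String matchedLetter(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("w_pz");
        if (matcher.find()) {
            System.out.println(matcher.group(0)); // w_pz (the whole match)
            System.out.println(matcher.group(1)); // z (the first capturing group)
        }
    }
}
```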
The REGEX will be something like this:
w_p([a-z])
So you will create a group from which you can get the value.
I have a file which contains "(*" and "*)". I want to remove everything between these two character sequences.
I used the following code, but it didn't do anything to my string.
String regex = "\\(\\*.*\\*\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
matcher.replaceAll("");
The input is:
(* This program prints out a message. *)
program is
begin
write ("Hello, world!");
end;
You need to capture the return value of your matcher: its replaceAll method does not modify the input, it returns the replaced String.
Additionally, use a regexp to match what you want to match, this time a parenthesized String. If you don't have some strange inputs, it may look like this:
String regex = "\\(\\*.*\\*\\)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.replaceAll("(\\*\\*)");
System.out.println(result);
This regexp in fact captures the whole region from the first comment start to the last comment end, which would usually not be what you want. To let it match non-greedy (reluctantly), use this regexp: \(\*.*?\*\) (with doubled backslashes in Java.)
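The difference only shows up when a line contains more than one comment, so here is a sketch with a made-up single-line input holding two comments. CommentStripDemo and strip are hypothetical names, and this version removes the comments entirely (delimiters included) by replacing with an empty string.

```java
import java.util.regex.Pattern;

class CommentStripDemo {
    // Hypothetical helper: removes every match of the regex and returns the new String.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "(* a *) x (* b *) y"; // illustrative input, not from the question
        // Greedy: one match from the first "(*" to the last "*)", so x is lost too.
        System.out.println("[" + strip("\\(\\*.*\\*\\)", input) + "]");   // [ y]
        // Reluctant: each comment is matched separately, x survives.
        System.out.println("[" + strip("\\(\\*.*?\\*\\)", input) + "]");  // [ x  y]
    }
}
```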