Get String in between either single quotes or empty space - java

I want a regular expression that extracts the name of the class loader, which is delimited either by single quotes or by whitespace, but not by a mixture of both.
Some examples:
2014-05-21 22:05:13.685 TRACE [Core]
sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource
'java/util/LoggerFactory.class'.
expected output sun.misc.Launcher$AppClassLoader#62c8aeb3
2014-05-21 22:05:13.685 TRACE [Core] Class
'org.jboss.weld.metadata.TypeStore' not found in classloader
'org.jboss.modules.ModuleClassLoader#4ebded0b'.
expected output org.jboss.modules.ModuleClassLoader#4ebded0b
2014-05-21 22:04:34.591 INFO [Core] Started plugin
org.zeroturnaround.javarebel.integration.IntegrationPlugin from
/Users/endragor/Downloads/jrebel/jrebel.jar in
sun.misc.Launcher$AppClassLoader#62c8aeb3
expected output sun.misc.Launcher$AppClassLoader#62c8aeb3
Note that in the last example the line ends with a newline character, i.e. nothing follows the class loader name.
This is what I have tried: ".*[\\s|'](.*ClassLoader.*[^']*)['|\\s].*". But it doesn't work. For the first example it returns the text below rather than sun.misc.Launcher$AppClassLoader#62c8aeb3:
sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource
'java/util/LoggerFactory.class
Also, my regex does not handle the case where the class loader string is at the end of the line, i.e. example 3 above. What can I do so that either ' or \\s is used as the delimiter, but not both?

Try this one:
String extractedValue=yourString.replaceAll("(.*)([ '])(.*ClassLoader.*?)(\\2)(.*)", "$3");
Whenever you want to extract a string between delimiters taken from a predefined set, where the opening and closing delimiter must be the same character, you can use a backreference.
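A minimal sketch of the backreference idea, using a Matcher instead of replaceAll. The class name BackreferenceSketch and the helper extract are made up for illustration. The "|$" alternative is my own assumption, not part of the answer above: \2 on its own requires a closing delimiter, so a name at the very end of the line (example 3) would not match without it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceSketch {
    // Group 2 captures the opening delimiter (space or single quote);
    // \2 then requires the very same character as the closing delimiter.
    // The "|$" alternative is an assumption added for names at the end of a line.
    private static final Pattern LOADER =
            Pattern.compile("(.*)([ '])(.*ClassLoader.*?)(?:\\2|$)(.*)");

    // Returns the class loader name, or null when the line contains none.
    static String extract(String line) {
        Matcher m = LOADER.matcher(line);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2014-05-21 22:05:13.685 TRACE [Core] sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource 'java/util/LoggerFactory.class'.",
            "2014-05-21 22:05:13.685 TRACE [Core] Class 'org.jboss.weld.metadata.TypeStore' not found in classloader 'org.jboss.modules.ModuleClassLoader#4ebded0b'.",
            "2014-05-21 22:04:34.591 INFO [Core] Started plugin org.zeroturnaround.javarebel.integration.IntegrationPlugin from /Users/endragor/Downloads/jrebel/jrebel.jar in sun.misc.Launcher$AppClassLoader#62c8aeb3"
        };
        for (String line : lines) {
            System.out.println(extract(line));
        }
    }
}
```

The leading (.*) is greedy, so the delimiter that ends up in group 2 is the last space or quote before "ClassLoader".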

This regex should do it without grouping:
[^\s']*ClassLoader[^\s']*
In Java it should be:
[^\\s']*ClassLoader[^\\s']*
You don't need the pipe | inside [...]; in a regex, [abcd] means a or b or c or d.
Update
Java code added:
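// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) {
    // Match a run of non-whitespace, non-quote characters that contains "ClassLoader"
    Pattern p = Pattern.compile("[^\\s']*ClassLoader[^\\s']*");
    Matcher m = p.matcher("2014-05-21 22:05:13.685 TRACE [Core] sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource 'java/util/LoggerFactory.class'.");
    if (m.find()) {
        System.out.println(m.group());
    }
}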
output:
sun.misc.Launcher$AppClassLoader#62c8aeb3
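To check the other two examples from the question as well, here is a small sketch that runs the same pattern over a quoted name and over a name at the end of the line. The class name CharClassSketch and the helper firstLoader are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassSketch {
    // Same pattern as above: the name may not contain whitespace or single quotes,
    // so it ends at a space, at a quote, or at the end of the input.
    private static final Pattern LOADER = Pattern.compile("[^\\s']*ClassLoader[^\\s']*");

    // Returns the first token containing "ClassLoader", or null if there is none.
    static String firstLoader(String line) {
        Matcher m = LOADER.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Quoted name (example 2); the lowercase "classloader" is not matched.
        System.out.println(firstLoader(
            "2014-05-21 22:05:13.685 TRACE [Core] Class 'org.jboss.weld.metadata.TypeStore' not found in classloader 'org.jboss.modules.ModuleClassLoader#4ebded0b'."));
        // Name at the end of the line (example 3).
        System.out.println(firstLoader(
            "2014-05-21 22:04:34.591 INFO [Core] Started plugin org.zeroturnaround.javarebel.integration.IntegrationPlugin from /Users/endragor/Downloads/jrebel/jrebel.jar in sun.misc.Launcher$AppClassLoader#62c8aeb3"));
    }
}
```

Because the delimiters are simply excluded from the match, this version does not check that the opening and closing delimiter are the same character.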

Since the OP has changed the original post, here is the updated answer.
Simply use the pattern below, which checks for the default toString()-style representation of the object.
[\w\.$]+ClassLoader#[a-z0-9]+
Pattern explanation:
\w A word character: [a-zA-Z_0-9]
X+ X, one or more times
[abc] a, b, or c (simple class)
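A short sketch of how this pattern could be used to collect every class loader name from a multi-line log. The class name HashSuffixSketch and the helper findAll are made up for illustration. Note that this pattern only matches names that carry the #hash suffix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashSuffixSketch {
    // Inside a character class, '.' and '$' are literal characters.
    private static final Pattern LOADER = Pattern.compile("[\\w.$]+ClassLoader#[a-z0-9]+");

    // Collects every match in the given text, in order of appearance.
    static List<String> findAll(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = LOADER.matcher(text);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        String log =
            "2014-05-21 22:05:13.685 TRACE [Core] sun.misc.Launcher$AppClassLoader#62c8aeb3 searching for resource 'java/util/LoggerFactory.class'.\n"
          + "2014-05-21 22:05:13.685 TRACE [Core] Class 'org.jboss.weld.metadata.TypeStore' not found in classloader 'org.jboss.modules.ModuleClassLoader#4ebded0b'.";
        for (String name : findAll(log)) {
            System.out.println(name);
        }
    }
}
```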


Regular expression: Replace everything before first occurrence

I have the following regular expression that I'm using to remove the dev. part of my URL.
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll(".*\\.(?=.*\\.)", ""));
This outputs mydomain.com, but it causes problems when the domains look like dev.mydomain.com.pe or dev.mydomain.com.uk; in those cases I get only the .com.pe and .com.uk parts.
Is there a modifier I can use on my regex to make sure it removes only what comes before the first . (dot included)?
Desired output:
dev.mydomain.com -> mydomain.com
stage.mydomain.com.pe -> mydomain.com.pe
test.mydomain.com.uk -> mydomain.com.uk
You may use
^[^.]+\.(?=.*\.)
Details
^ - start of string
[^.]+ - 1 or more chars other than dots
\. - a dot
(?=.*\.) - followed by any 0 or more chars other than line break chars, as many as possible, and then a dot.
Java usage example:
String result = domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
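A runnable sketch of the replaceFirst call applied to the inputs from the question. The class name DomainSketch and the helper stripFirstLabel are made up for illustration.

```java
class DomainSketch {
    // Removes the first dot-separated label, but only if at least one more dot follows it.
    static String stripFirstLabel(String domain) {
        return domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripFirstLabel("dev.mydomain.com"));      // mydomain.com
        System.out.println(stripFirstLabel("stage.mydomain.com.pe")); // mydomain.com.pe
        System.out.println(stripFirstLabel("test.mydomain.com.uk"));  // mydomain.com.uk
        System.out.println(stripFirstLabel("mydomain.com"));          // mydomain.com (unchanged)
    }
}
```

One caveat: the pattern cannot tell a subdomain from a registered name, so an input such as mydomain.com.pe (without a subdomain) would be reduced to com.pe.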
The following regex will work for you. It matches the first part (if it exists), captures the rest of the string as the second group, and replaces the whole string with that second group. .*? is a non-greedy match that stops at the first dot character.
(.*?\.)?(.*\..*)
sample code:
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "stage.mydomain.com.pe";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "test.mydomain.com.uk";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
output:
mydomain.com
mydomain.com.pe
mydomain.com.uk
mydomain.com

Substring between lines using Regular Expression Java

Hi, I have the following string:
abc test ...
interface
somedata ...
xxx ...
!
sdfff as ##
example
yyy sdd ## .
!
I want to find the content between a line containing the word "interface" or "example" and a line "!".
The required output would be something like this:
String[] output= {"somedata ...\nxxx ...\n","yyy sdd ## .\n"} ;
I can do this manually using substring and iteration, but I want to achieve it with a regular expression.
Is it possible?
This is what I have tried
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?m)\ninterface(.*?)\n!\n");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Am I right? Please suggest the right way of doing it.
Edit :
A small change : I want to find content between a line "interface" or "example" and a line "!".
Can we achieve this too using regex ?
You could use (?s) DOTALL modifier.
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
// (?s) lets . match line breaks, so .*? can span several lines
Pattern pattern = Pattern.compile("(?s)(?<=\\ninterface\\n).*?(?=\\n!\\n)");
Matcher m = pattern.matcher(sample);
while (m.find()) {
    System.out.println(m.group());
}
Output:
somedata
xxx
yyy
Note that the input in your example is different.
(?<=\\ninterface\\n) Asserts that the match must be preceded by the characters which are matched by the pattern present inside the positive lookbehind.
(?=\\n!\\n) Asserts that the match must be followed by the characters which are matched by the pattern present inside the positive lookahead.
Update:
Pattern pattern = Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");
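A self-contained sketch of the updated pattern applied to the string from the question, collecting the blocks into a list. The class name BlockExtractSketch and the helper blocks are made up for illustration. The sketch assumes the marker line consists of exactly the word "interface" or "example", because the lookbehind requires a line break on both sides of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockExtractSketch {
    // (?s) makes '.' match line breaks; the lookarounds keep the marker lines out of the match.
    private static final Pattern BLOCK =
            Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");

    // Returns the text between each marker line and the next "!" line.
    static List<String> blocks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "abc test ...\ninterface\nsomedata ...\nxxx ...\n!\n"
                      + "sdfff as ##\nexample\nyyy sdd ## .\n!\n";
        for (String block : blocks(sample)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

The matches do not include the trailing line break before "!", so append "\n" to each element if the exact output from the question is needed.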

extracting a particular field from url

I want to extract particular fields from the URL of a Facebook page. I am not able to extract them because the link format is not static. E.g., given the example below as input, it should produce the desired output:
1) https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556
Output: 109301862430120
What about this type of link?
Can anyone help me?
So in short, you want to get the name after the last / and (if there is one) before the ? mark.
You can do it using the URI and File classes, like this:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
System.out.println(new File(new URI(data).getRawPath()).getName());
Output: 149675731889496
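For completeness, a runnable sketch of the URI and File approach. The class name LastSegmentSketch and the helper lastSegment are made up for illustration. Note that new URI(String) throws the checked URISyntaxException, so it has to be declared or caught; File is used here only to parse the path, and nothing on disk is accessed.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class LastSegmentSketch {
    // Returns the last path segment of the URL, ignoring any query string.
    static String lastSegment(String url) throws URISyntaxException {
        String path = new URI(url).getRawPath(); // e.g. /pages/Ice-cream/109301862430120
        return new File(path).getName();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(lastSegment(
            "https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556"));
    }
}
```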
If you need to use regex then you can use
([^/?]+)(\\?|$)
and just read the content of group 1 (the first pair of parentheses).
If you don't want to use groups and want the regex to match only the digit part (without including ? in the match), you can use a look-around mechanism such as look-ahead (?=...). The regex would then look like this:
[^/?]+(?=\\?|$)
Code example:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
Pattern p = Pattern.compile("([^/?]+)(\\?|$)");
Matcher m = p.matcher(data);
if (m.find()){
System.out.println(m.group(1));
}
Output:
149675731889496
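The look-ahead variant mentioned above can be sketched the same way; here the whole match is the ID, so group() is read without an index. The class name LookaheadSketch and the helper pageId are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSketch {
    // A run of characters other than '/' and '?' that is followed by '?' or the end of input.
    private static final Pattern ID = Pattern.compile("[^/?]+(?=\\?|$)");

    // Returns the first such run, or null if the input contains none.
    static String pageId(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // With a query string
        System.out.println(pageId(
            "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf"));
        // Without a query string
        System.out.println(pageId("https://www.facebook.com/pages/Ice-cream/109301862430120"));
    }
}
```

Only the first find() result is the ID; for a URL with a query string, a second find() would return the query part, because it is followed by the end of the input.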

Using regex pattern in Java

I created a regex pattern that works perfectly, but I can't get it working in Java:
(\\"|[^" ])+|"(\\"|[^"])*"
applied to
robocopy "C:\test" "C:\test2" /R:0 /MIR /NP
gives (as it should)
[0] => robocopy
[1] => "C:\test"
[2] => "C:\test2"
[3] => /R:0
[4] => /MIR
[5] => /NP
in group 0 according to http://myregextester.com/index.php
Now, how do I get those 6 values in Java?
I tried
Pattern p = Pattern.compile(" (\\\"|[^\" ])+ | \"(\\\"|[^\"])*\" ");
Matcher m = p.matcher(command);
System.out.println(m.matches()); // returns false
but the pattern doesn't even match anything at all?
Update
The original perl regex was:
(\\"|[^" ])+|"(\\"|[^"])*"
Since the regexp string is first processed by the Java compiler before it reaches the regex engine, you need to double every backslash in the expression and put a backslash in front of every double quote.
Pattern p = Pattern.compile("(\\\\\"|[^\" ])+|\"(\\\\\"|[^\"])*\"");
The matches() method matches the whole string against the regex; it returns true only if the entire string matches.
What you are looking for is the find() method; you then get each matched substring using the group() method.
It is usually done by iterating:
while (m.find()) {
    String token = m.group(); // the text matched by this find() call
    // post-processing
}
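Putting the escaped pattern and the find() loop together gives something like the sketch below. The class name TokenizerSketch and the helper tokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerSketch {
    // Java string form of the regex (\\"|[^" ])+|"(\\"|[^"])*"
    static final Pattern TOKEN =
            Pattern.compile("(\\\\\"|[^\" ])+|\"(\\\\\"|[^\"])*\"");

    // Collects every token: either an unquoted word or a double-quoted string.
    static List<String> tokens(String command) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(command);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String command = "robocopy \"C:\\test\" \"C:\\test2\" /R:0 /MIR /NP";
        // matches() is false because no single token spans the whole command
        System.out.println(TOKEN.matcher(command).matches());
        for (String token : tokens(command)) {
            System.out.println(token);
        }
    }
}
```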
matches() tries to match the pattern on entire string. You should use find() method of the Matcher object for your case.
So the solution is:
System.out.println(m.find());

How to split this string using Java Regular Expressions

I want to split the string
String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
to
name
employeeno
dob
joindate
I wrote the following Java code for this, but it prints only name; the other matches are not printed.
String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
Pattern pattern = Pattern.compile("\\[.+\\]+?,?\\s*" );
String[] split = pattern.split(fields);
for (String string : split) {
System.out.println(string);
}
What am I doing wrong here?
Thank you
This part:
\\[.+\\]
matches the first [; the .+ then gobbles up the rest of the string (if there are no line breaks in it) and backtracks just far enough for the \\] to match the last ].
You need to make the .+ reluctant by placing a ? after it:
Pattern pattern = Pattern.compile("\\[.+?\\]+?,?\\s*");
And shouldn't \\]+? just be \\] ?
The error is that you are matching greedily. You can change it to a non-greedy match by adding a ? after the .+:
Pattern.compile("\\[.+?\\],?\\s*")
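To see the difference, here is a small sketch that splits the same input with the greedy pattern from the question and with the non-greedy one. The class name SplitSketch and the constant names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSketch {
    // Pattern from the question: .+ is greedy and runs to the last ']'
    static final Pattern GREEDY = Pattern.compile("\\[.+\\]+?,?\\s*");
    // Corrected pattern: .+? stops at the first ']'
    static final Pattern RELUCTANT = Pattern.compile("\\[.+?\\],?\\s*");

    public static void main(String[] args) {
        String fields = "name[Employee Name], employeeno[Employee No], "
                      + "dob[Date of Birth], joindate[Date of Joining]";
        System.out.println(Arrays.toString(GREEDY.split(fields)));
        // [name]
        System.out.println(Arrays.toString(RELUCTANT.split(fields)));
        // [name, employeeno, dob, joindate]
    }
}
```

With the greedy pattern the single separator covers everything from the first [ to the end of the string, which is why only name was printed.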
There's an online regular expression tester at http://gskinner.com/RegExr/?2sa45 that will help you a lot when you try to understand regular expressions and how they are applied to a given input.
Would it be better to use negated character classes to match the square brackets? \[(\w+\s)+\w+[^\]]\]
See also a good example of how using a negated character class works internally (without backtracking).
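A minimal sketch of the negated-character-class idea. Note that the pattern here is a simplified variant of my own, \[[^\]]*\], and not the exact regex from the comment above; the class name NegatedClassSketch is also made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class NegatedClassSketch {
    // [^\]]* can never run past a closing bracket, so no reluctant quantifier is needed.
    static final Pattern SEPARATOR = Pattern.compile("\\[[^\\]]*\\],?\\s*");

    public static void main(String[] args) {
        String fields = "name[Employee Name], employeeno[Employee No], "
                      + "dob[Date of Birth], joindate[Date of Joining]";
        System.out.println(Arrays.toString(SEPARATOR.split(fields)));
        // [name, employeeno, dob, joindate]
    }
}
```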
