Regex - find data inside left and right encloses - java

I have this string:
text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz
I need to extract 123+456+789
What I have done so far is:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "text=(.*)&";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0));
System.out.println(m.group(1));
}
This captures everything up to the last &, which is 123+456+789&xxxxxxxxx&yyyyyyyyyy, while the expected output is 123+456+789.
Any suggestions on how to fix it (using a regex is mandatory)?

Use a negated character class:
String ps = "text=([^&]*)";
The value you need will be in Group 1.
[^&] matches any character except an ampersand, so [^&]* stops right before the first &.
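As a minimal, self-contained sketch of the negated character class approach (the class name NegatedClassDemo and the helper extractText are made up for illustration, they are not from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: returns the value after "text=" up to (not including)
    // the first '&', or null when the input contains no "text=".
    static String extractText(String input) {
        Matcher m = Pattern.compile("text=([^&]*)").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
        System.out.println(extractText(s)); // 123+456+789
    }
}
```

Because the pattern does not require a trailing &, it should also work when text= is the last parameter in the string.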

You almost have it. You need to make the quantifier lazy (non-greedy) by adding a ? after the *:
String ps = "text=(.*?)&";
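To see the difference between the greedy and the lazy quantifier side by side, here is a small sketch (the class GreedyVsLazyDemo and the helper firstGroup are illustrative names, not part of the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
        // Greedy: .* grabs as much as possible, then backtracks to the LAST '&'.
        System.out.println(firstGroup("text=(.*)&", s));  // 123+456+789&xxxxxxxxx&yyyyyyyyyy
        // Lazy: .*? grabs as little as possible, stopping at the FIRST '&'.
        System.out.println(firstGroup("text=(.*?)&", s)); // 123+456+789
    }
}
```

One thing to keep in mind: the lazy pattern still requires an & after the value, so it finds nothing when text= is the last parameter.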

Try this regex :
([0-9+]+)
Link : https://regex101.com/r/xU2zF4/1
java code :
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "([0-9+]+)";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0)); // the whole match: 123+456+789
System.out.println(m.group(1)); // group 1, here also 123+456+789
}

Related

Extracting string from within round brackets in Java with regex

I'm trying to extract a string from round brackets.
Let's say, I have John Doe (123456789) and I want to output the string 123456789 only.
I have found this link and this regex:
/\(([^)]+)\)/g
However, I wasn't able to figure out how to get the wanted result.
Any help would be appreciated. Thanks!
String str="John Doe (123456789)";
System.out.println(str.substring(str.indexOf("(")+1,str.indexOf(")")));
This uses plain string operations instead; I'm not very familiar with regex.
This works for me:
@Test
public void myTest() {
String test = "test (mytest)";
Pattern p = Pattern.compile("\\((.*?)\\)");
Matcher m = p.matcher(test);
while(m.find()) {
assertEquals("mytest", m.group(1));
}
}
You need to escape brackets in your regexp:
String in = "John Doe (123456789)";
Pattern p = Pattern.compile("\\((\\d*)\\)");
Matcher m = p.matcher(in);
while (m.find()) {
System.out.println(m.group(1));
}
In Java, you need to use
String pattern = "\\(([^()]+)\\)";
Then, the value you need is in .group(1).
String str = "John Doe (123456789)";
String rx = "\\(([^()]+)\\)";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group(1));
}
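If the input can contain more than one parenthesized value, the same escaped pattern can be used in a loop to collect all of them. A rough sketch (ParenExtractDemo and extractAll are hypothetical names, and the second sample value is invented for the demo):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenExtractDemo {
    // \\( and \\) match literal parentheses; ([^()]+) captures what is between them.
    private static final Pattern IN_PARENS = Pattern.compile("\\(([^()]+)\\)");

    // Hypothetical helper: returns every non-empty parenthesized value, in order.
    static List<String> extractAll(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = IN_PARENS.matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("John Doe (123456789)"));             // [123456789]
        System.out.println(extractAll("John Doe (123456789) and Jane (42)")); // [123456789, 42]
    }
}
```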

Java Regex Multiline issue

I have a String read from a file via apache commons FileUtils.readFileToString, which has the following format:
<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)
I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] line. Therefore I created a java regex:
String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
(DOTALL because the input spans multiple lines and I want . to match line breaks as well.)
However, this pattern does not match the String. If I remove the LOGHEADER\[END\] part of the regex, I get a match that contains the whole String. I don't understand why the original regex does not match.
Any help is appreciated - thanks a lot!
The important thing to remember about Java's matches() method is that the regular expression must match the entire input string, not just a part of it.
So you have to use find(), like this, to capture everything between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->:
String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while(m.find()) {
System.out.println(m.group());
}
Or, to follow the logic you suggest (using only matches()), we need to add ^.* at the start and .*$ at the end:
String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
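To illustrate the matches() versus find() point in isolation, here is a small sketch using a shortened, made-up version of the log header (the class and method names are for illustration only):

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Same pattern as in the question, with DOTALL so '.' also matches line breaks.
    static final Pattern HEADER =
            Pattern.compile("LOGHEADER\\[START\\].*LOGHEADER\\[END\\]", Pattern.DOTALL);

    // matches(): the pattern has to cover the input from first to last character.
    static boolean wholeInputMatches(String s) {
        return HEADER.matcher(s).matches();
    }

    // find(): the pattern may match anywhere inside the input.
    static boolean containsHeader(String s) {
        return HEADER.matcher(s).find();
    }

    public static void main(String[] args) {
        // Shortened sample input, assumed for this demo.
        String log = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#...";
        System.out.println(wholeInputMatches(log)); // false: "<!--" precedes LOGHEADER[START]
        System.out.println(containsHeader(log));    // true
    }
}
```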
You need to use the Pattern and Matcher classes together with the find() method. The regex below fetches all the lines that exist between LOGHEADER[START] and LOGHEADER[END].
String s = "<!--LOGHEADER[START]/-->\n" +
"<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" +
"<!--LOGGINGVERSION[2.0.7.1006]/-->\n" +
"<!--NAME[./log/defaultTrace_00.trc]/-->\n" +
"<!--PATTERN[defaultTrace_00.trc]/-->\n" +
"<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" +
"<!--ENCODING[UTF8]/-->\n" +
"<!--FILESET[0, 20, 10485760]/-->\n" +
"<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" +
"<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" +
"<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" +
"<!--LOGHEADER[END]/-->\n" +
"#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
If you also want to match the LOGHEADER lines themselves, the capturing group becomes unnecessary:
Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while(m.find())
{
System.out.println(m.group());
}

Java regex - why did the Matcher find extra characters?

I was experimenting with extracting the 't' and 'f' flags from the input shown below.
I was surprised to see extra characters in the output. Apparently the matcher backtracked, and I don't understand why. What would the correct regex be?
System.out.println("searching...");
// "Sun:\\s Mon:\\s Tue:\\s Wed:\\s Thu:\\s Fri:\\s Sat:\\s "
Pattern p = Pattern.compile("[t|f]");
Matcher m = p.matcher("Sun:t Mon:f Tue:t Wed:t Thu:f Fri:t Sat:f ");
while (m.find()) {
System.out.println(m.group());
}
Output:
searching...
t
f
t
t
f
t
t
f
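Sat has a t in it, so the bare character class also matches that letter; no backtracking is involved. Also note that | is a literal character inside a character class, so [t|f] would match a pipe as well. Try ":([tf])" instead.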
Pattern p = Pattern.compile(":([tf])");
Matcher m = p.matcher("Sun:t Mon:f Tue:t Wed:t Thu:f Fri:t Sat:f ");
while (m.find()) {
System.out.println(m.group(1));
}
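A quick way to convince yourself of both points is to run the two patterns next to each other. This is only a sketch; CharClassPipeDemo, findAll and flags are names I made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassPipeDemo {
    // Hypothetical helper: collects every full match of the regex.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Hypothetical helper: collects only the t/f that directly follows a colon.
    static List<String> flags(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(":([tf])").matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String days = "Sun:t Mon:f Tue:t Wed:t Thu:f Fri:t Sat:f ";
        System.out.println(findAll("[t|f]", days)); // [t, f, t, t, f, t, t, f] (extra t from "Sat")
        System.out.println(flags(days));            // [t, f, t, t, f, t, f]
        System.out.println(findAll("[t|f]", "a|b")); // [|] because | is literal inside [...]
    }
}
```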

Regex for extracting substring from in between data using java

String line = "asdasdasdasd <meta name=\"generator\" content=\"WordPress 3.5.2\" /> asdasdasdasdasd";
Pattern p = Pattern.compile("<meta name=\"generator\" content=\"WordPress\\s+([\\d.]+)\" />");
Matcher m = p.matcher(line);
if(m.matches())
System.out.println(m.group(1));
else
System.out.println("not found");
The regex I have used does not give the desired result. I want the wordpress version from the supplied string.
Matcher#matches() only succeeds when the regex matches the entire string, from the first character to the last. So, to use it, you would need to build a regex for the complete string.
Alternatively, you can use Matcher#find() with just the regex for relevant part of the string:
Pattern p = Pattern.compile("content=\"WordPress\\s+([\\d.]+)\"");
Matcher m = p.matcher(line);
if(m.find())
System.out.println(m.group(1));
else
System.out.println("not found");
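For comparison, here is a sketch that runs matches(), lookingAt() and find() against the same input, plus the "regex for the complete string" variant. The class name and the version helper are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordPressVersionDemo {
    static final Pattern VERSION =
            Pattern.compile("content=\"WordPress\\s+([\\d.]+)\"");

    // Hypothetical helper: returns the version found anywhere in the input, or null.
    static String version(String html) {
        Matcher m = VERSION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "asdasdasdasd <meta name=\"generator\" content=\"WordPress 3.5.2\" /> asdasdasdasdasd";

        System.out.println(VERSION.matcher(line).matches());   // false: must cover the whole string
        System.out.println(VERSION.matcher(line).lookingAt()); // false: must start at index 0
        System.out.println(version(line));                     // 3.5.2

        // matches() succeeds once the pattern is padded to span the whole string.
        Matcher whole = Pattern.compile(".*" + VERSION.pattern() + ".*").matcher(line);
        if (whole.matches()) {
            System.out.println(whole.group(1));                // 3.5.2
        }
    }
}
```

The padded variant relies on the input being a single line, since . does not match line breaks unless DOTALL is set.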
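The dot does not need escaping inside a character class, and a + inside the brackets is a literal plus sign rather than a quantifier, so [\\d.]+ already accepts any run of digits and dots. You can shorten the pattern and use it with find():
Pattern p = Pattern.compile("WordPress\\s+([\\d.]+)");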

Java URL regex not matching

I am trying to count the number of URLs in a Java string:
String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
String urlRegex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>";
int numUrls = 0;
Pattern pattern = Pattern.compile(urlRegex);
Matcher matcher = pattern.matcher(test);
while(matcher.find())
numUrls++;
System.err.println("numUrls = " + numUrls);
When I run this it tells me I have zero (not 2) URLs in the string. Any ideas as to why? Thanks in advance!
The < and > characters in urlRegex are causing a mismatch between your pattern and your input test String. Removing them will yield a numUrls value of 2 as intended.
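Here is a runnable sketch of the counting loop with the angle brackets removed. I am assuming the character class was meant to contain &@#; the class name UrlCountDemo and the method countUrls are just for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlCountDemo {
    // Pattern from the question without the leading '<' and trailing '>'.
    // Assumption: the character classes include '@' (written as &@#).
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Hypothetical helper: counts non-overlapping URL matches in the text.
    static int countUrls(String text) {
        Matcher matcher = URL.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
        System.out.println("numUrls = " + countUrls(test)); // numUrls = 2
    }
}
```

The last character class deliberately leaves out punctuation such as . and , so a full stop right after a URL is not counted as part of it.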
Try this code :
String data = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
Pattern pattern = Pattern.compile("[hH][tT]{2}[Pp][sS]?://(\\w+(\\.\\w+?)?)+");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
System.out.println(matcher.group());
}
Hopefully it will work.
