I tried this regex to capture the username:
highs\(\d+\)\[.*?\]\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]\sftid\(\d+\):\s.
It didn't work on this input:
<55>Mar 17 12:02:00 forcesss-off [Father][1x91422234][eee][hote] abcd(QlidcxpOulqsf): highs(23455814)[mothers][192.192.21.12] ftid(64322816): oops authentication failed with (http-commo-auth, username='testuserMM' password='********'congratulation-fakem='login' )
You can use a much simpler regex for that:
\busername='([^']+)
See demo, result is in Group 1.
REGEX:
\b - Word boundary
username=' - literal string username='
([^']+) - A capturing group that holds the substring we want: 1 or more characters other than a single apostrophe.
UPDATE:
Here are 2 ways to get the text you are looking for:
String str = "<55>Mar 17 12:02:00 forcesss-off [Father][1x91422234][eee][hote] abcd(QlidcxpOulqsf): highs(23455814)[mothers][192.192.21.12] ftid(64322816): oops authentication failed with (http-commo-auth, username='testuserMM' password='********'congratulation-fakem='login' )";
String res = str.replaceAll(".*\\busername='([^']+)'.*", "$1");
System.out.println(res);
String rx = "(?<=\\busername=')[^']+";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group());
}
See IDEONE demo
I would like to be able to find the first occurrence of m² and then numbers in front of it, could be integers or decimal numbers.
E.g.
"some text" 38 m² "some text" ,
"some text" 48,8 m² "some text",
"some text" 48 m² "some text", etc..
What I have so far is:
\d\d,\d\s*(\m\u00B2)|\d\d\s*(\m\u00B2)
This right now finds all occurrences, although I guess it could be fixed with findFirst(). Any ideas how to improve the Regex part?
To get the first match, you just need to use Matcher#find() inside an if block:
String rx = "\\d+(?:,\\d+)?\\s*m\\u00B2";
Pattern p = Pattern.compile(rx);
Matcher matcher = p.matcher("E.g. : 4668,68 m² some text, some text 48 m² etc");
if (matcher.find()){
System.out.println(matcher.group());
}
See IDEONE demo
Note that you can get rid of the alternation group using an optional non-capturing group (?:..)?
Pattern breakdown:
\d+ - 1+ digits
(?:,\d+)? - an optional (0 or 1) sequence of a comma followed by 1+ digits
\s* - 0+ whitespace symbols
m\u00B2 - the letter m followed by ² (superscript two).
This is what I came up with, with your help :) (work in progress; later it should return a BigDecimal value). For now it seems to work:
public static String findArea(String description) {
String tempString = "";
Pattern p = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");
Matcher m = p.matcher(description);
if(m.find()) {
tempString = m.group();
}
//remove everything except digits and the comma (the m and the ² sign) to parse it to BigDecimal later
tempString = tempString.replaceAll("[^0-9,]","");
System.out.println(tempString);
return tempString;
}
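Since the goal stated above is to return a BigDecimal later, here is a minimal sketch of one way that could look. It assumes the comma is always the decimal separator and that no thousands separators occur. The class name AreaParser and the use of Optional are choices made for this sketch, not part of the original code.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AreaParser {
    // Group 1 captures only the number, so no cleanup with replaceAll is needed afterwards.
    private static final Pattern AREA = Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    // Returns the first area found in the description, or an empty Optional if there is none.
    public static Optional<BigDecimal> findArea(String description) {
        Matcher m = AREA.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        // BigDecimal expects a dot as the decimal separator, so the comma is swapped first.
        return Optional.of(new BigDecimal(m.group(1).replace(',', '.')));
    }

    public static void main(String[] args) {
        System.out.println(findArea("some text 48,8 m\u00B2 some text 12 m\u00B2"));
        System.out.println(findArea("no area here"));
    }
}
```

Returning an Optional instead of an empty string makes the "nothing found" case explicit for the caller.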
One simple way of doing it is String.replaceFirst:
description.replaceFirst(@NotNull String regex,
@NotNull String replacement)
JAVADoc: Replaces the first substring of this string that matches the given regular expression with the given replacement.
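The answer above only quotes the method signature, so here is a hedged sketch of how replaceFirst might be applied to this problem. The idea, which is an assumption about what the answer intended, is to match the whole input and replace it with capture group 1. The class and method names are made up for the illustration.

```java
public class ReplaceFirstDemo {
    // Replaces the entire input with group 1, i.e. the first number that is followed by "m²".
    // (?s) lets . match line breaks; the lazy .*? makes the FIRST occurrence win.
    public static String firstArea(String description) {
        return description.replaceFirst("(?s)^.*?(\\d+(?:,\\d+)?)\\s*m\\u00B2.*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstArea("some text 48,8 m\u00B2 more text 12 m\u00B2 end"));
        // Caveat: when nothing matches, replaceFirst returns the input unchanged.
        System.out.println(firstArea("no area here"));
    }
}
```

Because an unmatched input comes back unchanged, this approach cannot distinguish "no area found" from a description that consists only of a number. A Matcher with find() avoids that ambiguity.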
To find only the first or only the last one:
@Test
public void testFindFirstRegExp() {
String pattern = ".* (\\d+,\\d+) .*";
Pattern r = Pattern.compile(pattern);
String line = "some text 44,66 m² some 33,11 m² text 11,22 m² some text";
Matcher m = r.matcher(new StringBuilder(line).reverse().toString());
String expected = "44,66";
String actual = null;
if (m.find()) {
actual = new StringBuilder(m.group(1)).reverse().toString();
}
System.out.println("got first:" + actual);
Assert.assertEquals(expected, actual);
m = r.matcher(line);
expected = "11,22";
actual = null;
if (m.find()) {
actual = m.group(1);
}
System.out.println("got last:" + actual);
Assert.assertEquals(expected, actual);
}
prints:
got first:44,66
got last:11,22
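Note: when you match against the reversed string, remember that the pattern has to be reversed as well, for example:
pattern = ".* (\\d+,\\d+-?) .*"; //reverse for (-?\\d+,\\d+)
A pattern without the leading and trailing .* works as expected on the original string:
pattern = " (\\-?\\d+,\\d+) ";
With it you get all of the matches in a loop: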
while (m.find()) {
actual = m.group(1);
System.out.println("got last:" + actual);
}
Will print:
got last:44,66
got last:33,11
got last:11,22
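As an alternative to reversing the string, the first and last values can also be read from the list of all matches. The following is a minimal sketch of that idea, not the approach used in the test above. The class name FirstLastMatch and the helper allMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstLastMatch {
    // An optional minus sign, digits, a comma, digits.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+,\\d+");

    // Collects every match in the order in which it occurs in the line.
    public static List<String> allMatches(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = allMatches(
                "some text 44,66 m\u00B2 some 33,11 m\u00B2 text 11,22 m\u00B2 some text");
        if (!found.isEmpty()) {
            System.out.println("first: " + found.get(0));
            System.out.println("last: " + found.get(found.size() - 1));
        }
    }
}
```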
I have this string:
text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz
I need to extract 123+456+789
What I have done so far is:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "text=(.*)&";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0));
System.out.println(m.group(1));
}
And I got all text until the last & which is: 123+456+789&xxxxxxxxx&yyyyyyyyyy while the requested output is: 123+456+789
Any suggestions how to fix it (regex is mandatory)?
Use a negated character class:
String ps = "text=([^&]*)";
The value you need will be in Group 1.
The [^&] matches any character but an ampersand.
You almost have it. You need to make the quantifier lazy (non-greedy), like this:
String ps = "text=(.*?)&";
             here ---^
Working demo
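To compare the greedy pattern from the question with the lazy and negated-character-class fixes side by side, here is a small sketch. The class name GreedyVsLazy and the helper firstGroup are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Returns group 1 of the first match, or null when the pattern does not match at all.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
        // Greedy: .* runs to the end and backtracks to the LAST ampersand.
        System.out.println(firstGroup("text=(.*)&", s));
        // Lazy: .*? stops at the FIRST ampersand.
        System.out.println(firstGroup("text=(.*?)&", s));
        // Negated class: never crosses an ampersand, and does not require one to follow.
        System.out.println(firstGroup("text=([^&]*)", s));
    }
}
```

One practical difference: if the value is the last parameter and no & follows it, the lazy pattern finds nothing, while the negated character class still captures the value.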
Try this regex :
([0-9+]+)
Link : https://regex101.com/r/xU2zF4/1
Java code:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "([0-9+]+)";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0)); // whole match: 123+456+789
System.out.println(m.group(1)); // group 1: also 123+456+789
}
I have a String read from a file via apache commons FileUtils.readFileToString, which has the following format:
<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)
I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] line. Therefore I created a java regex:
String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
(DOTALL, since the header spans multiple lines and I want . to match line breaks as well)
However this pattern does not match the String. If I try to remove the LOGHEADER\[END\] part of the regex I get a match, that contains the whole String. I don't get why it is not matching for the original RegEx.
Any help is appreciated - thanks a lot!
The important thing to remember about Java's matches() method is that your regular expression must match the entire input string.
So you have to use find() this way to capture everything in between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->:
String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while(m.find()) {
System.out.println(m.group());
}
Or, to follow the logic you suggest (using just matches()), we need to add ^.* and .*$:
String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
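The difference between matches() and find() can be seen in a short sketch that applies the question's original pattern to a shortened version of the file. The sample content and the class name MatchesVsFind are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question, unchanged.
    private static final Pattern HEADER =
            Pattern.compile("LOGHEADER\\[START\\].*LOGHEADER\\[END\\]", Pattern.DOTALL);

    // matches() succeeds only if the pattern covers the input from the first to the last character.
    public static boolean wholeInputMatches(String content) {
        return HEADER.matcher(content).matches();
    }

    // find() succeeds if the pattern matches anywhere inside the input.
    public static boolean containsHeader(String content) {
        return HEADER.matcher(content).find();
    }

    public static void main(String[] args) {
        String content = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#2015 03 04 11:04:19:687#+0100#Debug#";
        // The input starts with "<!--" and continues after the END marker,
        // so the pattern cannot span the whole input.
        System.out.println(wholeInputMatches(content));
        System.out.println(containsHeader(content));
    }
}
```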
You actually need to use the Pattern and Matcher classes along with the find method. The regex below will fetch all the lines which exist between LOGHEADER[START] and LOGHEADER[END].
String s = "<!--LOGHEADER[START]/-->\n" +
"<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" +
"<!--LOGGINGVERSION[2.0.7.1006]/-->\n" +
"<!--NAME[./log/defaultTrace_00.trc]/-->\n" +
"<!--PATTERN[defaultTrace_00.trc]/-->\n" +
"<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" +
"<!--ENCODING[UTF8]/-->\n" +
"<!--FILESET[0, 20, 10485760]/-->\n" +
"<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" +
"<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" +
"<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" +
"<!--LOGHEADER[END]/-->\n" +
"#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
If you also want to match the LOGHEADER lines themselves, then the capturing group is unnecessary:
Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while(m.find())
{
System.out.println(m.group());
}
I have a String of the form user#domain:port
I want to fetch user, domain and port from this String.
So I created regex:
public static final String MATCH_USER_DOMAIN_PORT = "^([0-9,a-zA-Z-.*_]+)#([a-z0-9]+[\\.-][a-z0-9]+\\.[a-z]{2,}+):(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})$";
and this is my unit test method so far:
public void test____matchesUserDomainWithPort(){
String identityText = "maxim#domain.com:5555";
String user = "";
String domain = "";
String port = "";
if(identityText.matches(MATCH_USER_DOMAIN_PORT))
{
Pattern p = Pattern.compile(MATCH_USER_DOMAIN_PORT);
Matcher m = p.matcher(identityText);
user = m.group(1);
domain= m.group(2);
port= m.group(3);
}
assertEquals("maxim", user);
assertEquals("domain.com", domain);
assertEquals("5555", port);
}
I get error:
java.lang.IllegalStateException: No successful match so far
at java.util.regex.Matcher.ensureMatch(Matcher.java:607)
....
at the line: user = m.group(1);
I opened http://gskinner.com/RegExr/?2v5r0
and there all seems good:
Output:
RegExp: /^([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\.-][a-z0-9]+)*)+\.[a-z]{2,}+:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})$/
pattern: ^([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\.-][a-z0-9]+)*)+\.[a-z]{2,}+:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})$
flags:
3 capturing groups:
group 1: ([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\.-][a-z0-9]+)*)
group 2: ([\.-][a-z0-9]+)
group 3: (6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})
Am I missing something?
In C I would just write: sscanf(identityText,"%[^#]#%[^:]:%511s",user,domain,port);
For sure I can split this text on # and : and get the 3 values, but it would be interesting to know how to do it in a neater way :)
Please help.
Please use
if(identityText.matches(MATCH_USER_DOMAIN_PORT)){
Pattern p = Pattern.compile(MATCH_USER_DOMAIN_PORT);
Matcher m = p.matcher(identityText);
while(m.find()){
user = m.group(1);
domain= m.group(2);
port= m.group(3);
}
}
thanks
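The reason for the exception is that String.matches() uses its own internal matcher, so the Matcher created afterwards has not run any match operation yet when group() is called. A minimal sketch of that behaviour follows. Note that it uses a deliberately simplified pattern that mirrors the sscanf format from the question, not the regex from the question, and that the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherStateDemo {
    // Simplified pattern for this sketch only: user, '#', domain, ':', digits.
    private static final Pattern SIMPLE = Pattern.compile("^([^#]+)#([^:]+):(\\d+)$");

    // Returns true if group() throws because no match operation has been run on this Matcher.
    public static boolean groupThrowsBeforeMatch(String input) {
        Matcher m = SIMPLE.matcher(input);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Runs matches() on the Matcher itself and only then reads the groups.
    public static String[] split(String input) {
        Matcher m = SIMPLE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(groupThrowsBeforeMatch("maxim#domain.com:5555"));
        System.out.println(String.join(" | ", split("maxim#domain.com:5555")));
    }
}
```

Calling m.matches() once on the Matcher is enough here; the separate identityText.matches(...) check followed by a find() loop does the matching work twice.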
Yes, I think your regex is wrong.
public static final String MATCH_USER_DOMAIN_PORT = "^([0-9,a-zA-Z-.*_]+#[a-z0-9]+([\\.-][a-z0-9]+)*)+\\.[a-z]{2,}+:(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})$";
To break it down:
^(
[0-9,a-zA-Z-.*_]+
any number of these characters, will match "maxim"
#
will match "#"
[a-z0-9]+
any number of these characters, will match "domain"
([\\.-][a-z0-9]+)*
will match ".com" (or theoretically ".somethingelse.com", nice)
)+
will make group #1 "maxim#domain.com", I believe, but what's with the "+" ?
\\.
nothing in the input string here
[a-z]{2,}+
is this for a country code like .eu ? Again, what's with the "+" ?
:
(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})
seems overly complicated - probably don't do the numeric validation with the regex
$
Take a look at Using a regular expression to validate an email address for some advice on validation of email addresses.
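Following the suggestion above not to do the numeric validation inside the regex, here is a hedged sketch that lets the pattern capture up to five digits and checks the range 1..65535 in plain Java. The simplified pattern, the class name IdentityParser, and the convention of returning -1 for invalid input are all assumptions of this sketch. Unlike the original regex, it also accepts leading zeros.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentityParser {
    // The regex only checks the overall shape; the port range is checked in code.
    private static final Pattern IDENTITY = Pattern.compile("^([^#]+)#([^:]+):(\\d{1,5})$");

    // Returns the port for input of the form user#domain:port, or -1 if the input
    // has a different shape or the port is outside 1..65535.
    public static int parsePort(String identityText) {
        Matcher m = IDENTITY.matcher(identityText);
        if (!m.matches()) {
            return -1;
        }
        // At most five digits were captured, so parseInt cannot overflow.
        int port = Integer.parseInt(m.group(3));
        return (port >= 1 && port <= 65535) ? port : -1;
    }

    public static void main(String[] args) {
        System.out.println(parsePort("maxim#domain.com:5555"));
        System.out.println(parsePort("maxim#domain.com:70000"));
    }
}
```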
I would like to extract the strings between the following characters in the given string using regex in Java:
/*
1) Between \" and \" ===> 12222222222
2) Between :+ and # ===> 12222222222
3) Between # and > ===> 192.168.140.1
*/
String remoteUriStr = "\"+12222222222\" <sip:+12222222222#192.168.140.1>";
String regex1 = "\"(.+?)\"";
String regex2 = ":+(.+?)#";
String regex3 = "#(.+?)>";
Pattern p = Pattern.compile(regex1);
Matcher matcher = p.matcher(remoteUriStr);
if (matcher.matches()) {
title = matcher.group(1);
}
I am using the code snippet above, but it's not able to extract the strings I want. Am I doing anything wrong? I am quite new to regex.
The matches() method attempts to match the regular expression against the entire string. If you want to match a part of the string, you want the find() method:
if (matcher.find())
You could, however, build a single regular expression to match all three parts at once:
String regex = "\"(.+?)\" \\<sip:\\+(.+?)#(.+?)\\>";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(remoteUriStr);
if (matcher.matches()) {
title = matcher.group(1);
part2 = matcher.group(2);
ip = matcher.group(3);
}
Demo: http://ideone.com/8t2EC
If your input always looks like that and you always want the same parts from it you can put that in a single regex (with multiple capturing groups):
"([^"]+)" <sip:([^#]+)#([^>]+)>
So you can then use
Pattern p = Pattern.compile("\"([^\"]+)\" <sip:([^#]+)#([^>]+)>");
Matcher m = p.matcher(remoteUriStr);
if (m.find()) {
String s1 = m.group(1);
String s2 = m.group(2);
String s3 = m.group(3);
}
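With three groups in one pattern, named groups can make the code easier to read than numeric indexes. The following is a sketch of that variant. The group names display, number and host and the class name SipUriParser are made up here, and the leading + after sip: is matched outside the group so that only the digits are captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SipUriParser {
    // Named groups (available since Java 7) replace the numeric indexes 1, 2 and 3.
    private static final Pattern REMOTE_URI = Pattern.compile(
            "\"(?<display>[^\"]+)\" <sip:\\+(?<number>[^#]+)#(?<host>[^>]+)>");

    // Returns {display, number, host}, or null if the input does not contain the expected shape.
    public static String[] parse(String remoteUriStr) {
        Matcher m = REMOTE_URI.matcher(remoteUriStr);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("display"), m.group("number"), m.group("host") };
    }

    public static void main(String[] args) {
        String remoteUriStr = "\"+12222222222\" <sip:+12222222222#192.168.140.1>";
        // The display part keeps its leading '+', because the '+' sits inside the quotes.
        System.out.println(String.join(" | ", parse(remoteUriStr)));
    }
}
```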