Java regex replace ignoring the first match - java

mediaSourceSpecificJunkCharacters=mediaSourceSpecificJunkCharacters+",";
Pattern p = Pattern.compile("\\[(.*?)\\],",Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher matcher = p.matcher(mediaSourceSpecificJunkCharacters);
while(matcher.find()) {
String stringToMatch=matcher.group(1);
System.out.println("string to match "+stringToMatch);
originalText=originalText.replaceAll(stringToMatch.trim(),"");
}
Here originalText = "this is data from youtube youtube1 youtube2 youtube3 youtube4";
and mediaSourceSpecificJunkCharacters = "[youtube2],[youtube3],[youtube4]".
The first match is youtube3 and not youtube2, so youtube2 never gets replaced. Why is that?

You don't even have youtube1 in your mediaSourceSpecificJunkCharacters. Change that to
String mediaSourceSpecificJunkCharacters = "[youtube1],[youtube2],[youtube3],[youtube4]";
and also change your pattern to
Pattern p = Pattern.compile("\\[(.*?)\\]", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
if you want to replace youtube4 too: the trailing , in your pattern means the last entry only matches when a comma is appended to the string.
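A minimal sketch of the loop with the simplified pattern, assuming the junk entries are meant as literal text rather than as regular expressions. The class and method names (JunkRemover, removeJunk) are made up for illustration. It uses String.replace instead of replaceAll, so characters such as . or + inside an entry are not interpreted as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JunkRemover {
    // Removes every bracketed entry of junkList from text.
    // Assumption: entries are literal strings, not regular expressions.
    static String removeJunk(String text, String junkList) {
        // No trailing comma in the pattern, so the last entry matches as well.
        Pattern p = Pattern.compile("\\[(.*?)\\]", Pattern.DOTALL);
        Matcher matcher = p.matcher(junkList);
        String result = text;
        while (matcher.find()) {
            String junk = matcher.group(1).trim();
            if (!junk.isEmpty()) {
                // String.replace performs a literal (non-regex) replacement.
                result = result.replace(junk, "");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String originalText = "this is data from youtube youtube1 youtube2 youtube3 youtube4";
        String junk = "[youtube2],[youtube3],[youtube4]";
        // trim() only removes the spaces left behind at the end of the text.
        System.out.println(removeJunk(originalText, junk).trim());
    }
}
```

If the entries really are meant to be regular expressions, replaceAll is the right call, but then each entry has to be a valid pattern.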

Related

Regex - find data inside left and right encloses

I have this string:
text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz
I need to extract 123+456+789
What I have done so far is:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "text=(.*)&";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0));
System.out.println(m.group(1));
}
And I got all text until the last & which is: 123+456+789&xxxxxxxxx&yyyyyyyyyy while the requested output is: 123+456+789
Any suggestions how to fix it (regex is mandatory)?
Use a negated character class:
String ps = "text=([^&]*)";
The value you need will be in Group 1.
The [^&] matches any character but an ampersand.
You almost have it; you need to make your regex lazy (non-greedy) like this:
String ps = "text=(.*?)&";
here ---^
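To compare the greedy, lazy and negated-class variants side by side, here is a small sketch. The helper name firstGroup and the class name are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandExtract {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
        // Greedy: .* runs to the last ampersand.
        System.out.println(firstGroup("text=(.*)&", s));
        // Lazy: .*? stops at the first ampersand.
        System.out.println(firstGroup("text=(.*?)&", s));
        // Negated class: cannot cross an ampersand at all.
        System.out.println(firstGroup("text=([^&]*)", s));
    }
}
```

One difference worth knowing: the lazy version still requires an & after the value, so it finds nothing in an input like text=abc, while the negated class returns abc.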
Try this regex:
([0-9+]+)
Link: https://regex101.com/r/xU2zF4/1
Java code:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "([0-9+]+)";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0)); // the whole match: 123+456+789
System.out.println(m.group(1)); // group 1, here the same text: 123+456+789
}

Java Regex Multiline issue

I have a String read from a file via apache commons FileUtils.readFileToString, which has the following format:
<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)
I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] line. Therefore I created a java regex:
String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
(DOTALL because the pattern spans multiple lines and I want . to match line breaks as well.)
However this pattern does not match the String. If I try to remove the LOGHEADER\[END\] part of the regex I get a match, that contains the whole String. I don't get why it is not matching for the original RegEx.
Any help is appreciated - thanks a lot!
The important thing to remember about Java's matches() method is that your regular expression must match the entire input string, not just a part of it.
So, you have to use find() this way to capture everything between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->:
String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while(m.find()) {
System.out.println(m.group());
}
Or, to follow the logic you suggest (just using matches), we need to add ^.* and .*$:
String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
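The difference between matches() and find() can be seen with the original pattern from the question and a shortened header. This is only a sketch: the sample string is abbreviated, and the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // The pattern from the question, unchanged.
    static final Pattern HEADER =
            Pattern.compile("LOGHEADER\\[START\\].*LOGHEADER\\[END\\]", Pattern.DOTALL);

    // matches() succeeds only if the pattern covers the whole input.
    static boolean matchesWhole(String s) {
        return HEADER.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean findsAnywhere(String s) {
        return HEADER.matcher(s).find();
    }

    public static void main(String[] args) {
        // Abbreviated sample: text before START and after END is not covered by the pattern.
        String content = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#2015 03 04 11:04:19:687#+0100#Debug#";
        System.out.println(matchesWhole(content));  // false
        System.out.println(findsAnywhere(content)); // true
    }
}
```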
You actually need to use the Pattern and Matcher classes along with the find method. The regex below fetches all the lines that exist between LOGHEADER[START] and LOGHEADER[END].
String s = "<!--LOGHEADER[START]/-->\n" +
"<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" +
"<!--LOGGINGVERSION[2.0.7.1006]/-->\n" +
"<!--NAME[./log/defaultTrace_00.trc]/-->\n" +
"<!--PATTERN[defaultTrace_00.trc]/-->\n" +
"<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" +
"<!--ENCODING[UTF8]/-->\n" +
"<!--FILESET[0, 20, 10485760]/-->\n" +
"<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" +
"<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" +
"<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" +
"<!--LOGHEADER[END]/-->\n" +
"#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
If you also want to match the LOGHEADER lines themselves, the capturing group is unnecessary:
Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while(m.find())
{
System.out.println(m.group());
}

How to match regex in java

I want to match input like this: GJ-16-RS-1234. I have tried the following patterns, but they are not working.
My regex patterns are:
String str_tempPattern = "(^[A-Z]{2})\\-([0-9]{2})\\-([A-Z]{1,2})\\-([0-9]{1,4}$)";
String str_tempPattern = "(^[A-Z]{2})-([0-9]{1,2})-([A-Z]{1,2})-([0-9]{1,4})$";
String str_tempPattern = "^[A-Z]{2}\\-[0-9]{1,2}\\-[A-Z]{1,2}\\-[0-9]{1,4}$";
I am using a TextWatcher and check for changes in afterTextChanged():
Pattern p = Pattern.compile(str_tempPattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.find()){
}
Just set the condition using the matches method, which requires the whole string to match:
if (string.matches("[A-Z]{2}\\-[0-9]{1,2}\\-[A-Z]{1,2}\\-[0-9]{1,4}"))
{
// Yes, it matches
}
else
{
// No, it does not match
}
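Since the check runs on every text change, it may be worth compiling the pattern once instead of calling String.matches each time (String.matches compiles the regex on every call). The sketch below keeps the CASE_INSENSITIVE flag from the question; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class CodeValidator {
    // Compiled once and reused. CASE_INSENSITIVE is kept from the question.
    private static final Pattern CODE = Pattern.compile(
            "[A-Z]{2}-[0-9]{1,2}-[A-Z]{1,2}-[0-9]{1,4}",
            Pattern.CASE_INSENSITIVE);

    // matches() requires the whole input to fit the pattern, so no ^ or $ is needed.
    static boolean isValid(CharSequence input) {
        return CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("GJ-16-RS-1234"));  // true
        System.out.println(isValid("gj-16-rs-1234"));  // true, because of CASE_INSENSITIVE
        System.out.println(isValid("GJ-16-RS-12345")); // false, too many digits
    }
}
```

The method takes a CharSequence, so the text passed to the watcher callback can be handed over without converting it to a String first.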

Matcher. How to get index of found group?

I have a sentence and I want to count the words, semiPunctuation and endPunctuation in it.
Calling m.group() returns the matched String, but how do I know which group matched?
I can check each group for null, but that does not seem like a good approach.
String input = "Some text! Some example text.";
int wordCount=0;
int semiPunctuation=0;
int endPunctuation=0;
Pattern pattern = Pattern.compile( "([\\w]+) | ([,;:\\-\"\']) | ([!\\?\\.]+)" );
Matcher m = pattern.matcher(input);
while (m.find()) {
// is there a better way than these null checks?
if(m.group(1)!=null) wordCount++;
if(m.group(2)!=null) semiPunctuation++;
if(m.group(3)!=null) endPunctuation++;
}
You could use named groups to capture the expressions:
Pattern pattern = Pattern.compile( "(?<words>\\w+)|(?<semi>[,;:\\-\"'])|(?<end>[!?.])" );
Matcher m = pattern.matcher(input);
while (m.find()) {
if (m.group("words") != null) {
wordCount++;
}
...
}
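A complete, runnable version of the named-group approach might look like this (named groups need Java 7 or later). The class name and the int[] return value are assumptions made to keep the sketch short. Note that the pattern has no spaces around |, since a space inside a regex is a literal character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCounter {
    private static final Pattern TOKEN = Pattern.compile(
            "(?<words>\\w+)|(?<semi>[,;:\\-\"'])|(?<end>[!?.])");

    // Returns {wordCount, semiPunctuation, endPunctuation}.
    static int[] count(String input) {
        int wordCount = 0;
        int semiPunctuation = 0;
        int endPunctuation = 0;
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one alternative took part in each match; the others are null.
            if (m.group("words") != null) {
                wordCount++;
            } else if (m.group("semi") != null) {
                semiPunctuation++;
            } else if (m.group("end") != null) {
                endPunctuation++;
            }
        }
        return new int[] {wordCount, semiPunctuation, endPunctuation};
    }

    public static void main(String[] args) {
        int[] c = count("Some text! Some example text.");
        System.out.println(c[0] + " words, " + c[1] + " semi, " + c[2] + " end");
    }
}
```

The null check itself does not go away; the names only make it clearer which alternative is being tested.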

How do I use regex in Java to pull this from html?

I'm trying to pull data from the ESPN box scores, and one of the html files has:
<td style="text-align:left" nowrap>Channing Frye, PF</td>
and I'm only interested in grabbing the name (Channing Frye) and the position (PF)
Right now, I've been using Pattern.quote(start) + "(.*?)" + Pattern.quote(end) to grab the text between start and end. I'm not sure how to grab text that starts with the pattern .../http://espn.go.com/nba/player/_/id/, then contains (any integer)/anyfirst-anylast">, then the name I need (Channing Frye), then </a>, then the position I need (PF), and ends with the pattern </td>.
Thanks!
Here is the pattern:
http://espn.go.com/nba/player/_/id/(\d+)/([\w-]+)">(.*?)</a>,\s*(\w+)</td>
You can use this tool - http://www.regexplanet.com/advanced/java/index.html for verifying regular expressions.
You could use this pattern:
\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<
This will match any link in the HTML that contains /nba/player/:
String re = "\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<";
String str = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern p = Pattern.compile(re, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
example: http://regex101.com/r/hA3uV0
Use this regex:
[A-Z\sa-z0-9]+(?=</a>)|\w+(?=</td>)
Here is one regex, where:
. matches any character, and .+ matches one or more characters
.* matches zero or more characters
\s matches a whitespace character
String str = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern pattern = Pattern.compile("<td.+>.*<a.+>(.+)</a>[\\s,]+(.+)</td>");
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
You can use:
String lString = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern lPattern = Pattern.compile("<td.+><a.+id/\\d+/.+\\-.+>(.+)</a>, (.+)</td>");
Matcher lMatcher = lPattern.matcher(lString);
while(lMatcher.find()) {
System.out.println(lMatcher.group(1));
System.out.println(lMatcher.group(2));
}
This will give you:
Channing Frye
PF
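For a self-contained test, the sketch below applies the first pattern from the answers above to a cell that includes the player link. The href value, in particular the id 0000, is a made-up placeholder and not the real ESPN id; the class and method names are also hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlayerCellParser {
    // Groups: 1 = numeric id, 2 = name slug, 3 = player name, 4 = position.
    private static final Pattern CELL = Pattern.compile(
            "http://espn\\.go\\.com/nba/player/_/id/(\\d+)/([\\w-]+)\">(.*?)</a>,\\s*(\\w+)</td>");

    // Returns {name, position}, or null if the cell does not contain a player link.
    static String[] parse(String html) {
        Matcher m = CELL.matcher(html);
        return m.find() ? new String[] {m.group(3), m.group(4)} : null;
    }

    public static void main(String[] args) {
        // Hypothetical cell: the id 0000 is a placeholder for illustration only.
        String html = "<td style=\"text-align:left\" nowrap>"
                + "<a href=\"http://espn.go.com/nba/player/_/id/0000/channing-frye\">"
                + "Channing Frye</a>, PF</td>";
        String[] result = parse(html);
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

A cell without the <a> element does not match, which is why parse returns null instead of assuming a match. For anything beyond a quick scrape, an HTML parser is usually less fragile than a regex.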
