Regex and RSS Feed - java

I wrote a simple RSS feed reader and now have a problem: the text in the item description contains <br/> tags.
For example:
"my dog <br/> is black and he <br/> stay on table "
I want to remove these tags from the string, so I wrote this method:
private static String IsMatch(String s, String pattern) {
try {
Pattern patt = Pattern.compile(pattern);
Matcher matcher = patt.matcher(s);
return matcher.group();
} catch (RuntimeException e) {
return "Error";
} }
and
String regex ="[<br/>]";
theString2=IsMatch(theString,regex);
AppLog.logString(theString2);
But this method always returns "Error". Can anyone tell me what the problem is?
Best regards
Antonio

The problem is that you never invoke find() on your matcher. You must invoke find() before invoking group(), and check that find() returns true.
I am not sure what your IsMatch method is supposed to do. As it stands, it will return either the match (i.e. "<br/>", assuming you invoke find() first) or "Error".
Also, don't put square brackets around <br/> in your regexp; they are not needed. Brackets define a character class, so [<br/>] matches any single one of those characters rather than the whole tag.
I would really consider using replace instead of a regexp for your purposes:
String s = "my dog <br/> is black and he <br/> stay on table ".replace("<br/>","");
As a recommendation, don't catch an exception without logging it. Exceptions provide valuable information that should not be hidden, and swallowing them makes debugging really hard when a problem occurs.
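A minimal sketch of the two approaches described above, assuming the goal is simply to drop every <br/> from the description. The class and method names (BrTagDemo, firstMatch, stripBreaks) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BrTagDemo {
    // Returns the first match of the pattern, or null when there is none.
    static String firstMatch(String s, String pattern) {
        Matcher matcher = Pattern.compile(pattern).matcher(s);
        // find() must succeed before group() may be called.
        return matcher.find() ? matcher.group() : null;
    }

    // Removes every literal "<br/>"; String.replace does not interpret regex syntax.
    static String stripBreaks(String s) {
        return s.replace("<br/>", "");
    }

    public static void main(String[] args) {
        String text = "my dog <br/> is black and he <br/> stay on table ";
        System.out.println(firstMatch(text, "<br/>"));
        System.out.println(stripBreaks(text));
    }
}
```

Note that removing the tag leaves the spaces on both sides of it in place, so the result contains double spaces where each tag used to be.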

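String regex = "(\\<br\\/\\>)";
I think you need to pay attention to the details of regular expression syntax:
1. Escape special characters. In a Java string literal each backslash must itself be doubled, as above. (In Java's regex flavor <, / and > have no special meaning, so escaping them is harmless but not required.)
Different languages' regular expression flavors have different specifications.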

Related

Find List of Strings which match in Text using Regex

I am getting stuck in this situation.
public void findListOfPattern(){
String text = "abce1213abcd231asdf";
String find = "1213|231|1232";
Pattern part = Pattern.compile(find);
Matcher mat = part.matcher(text);
System.out.println(mat.find()); //True
}
I get true if any of the strings in find matches.
I want the list of matches from the text.
The text can be large, and there can be many more strings to find.
In find, 1213, 231 and 1232 are separate alternatives.
The result should look like: 1213,231
You need to invoke mat.group() to return the desired match.
Typically you'd loop while mat.find() returns true and collect all matches successively by invoking mat.group().
You can then build your expected result String by concatenating the outcome of mat.group() as you wish, e.g. with a StringBuilder.
Notes
See the API documentation for java.util.regex.Matcher.
You need to invoke Matcher#find for Matcher#group to yield a result rather than throw IllegalStateException.
Your Pattern only has the default group. If you had used parentheses or named groups (available from Java 7), you could also invoke the overloads Matcher#group(int group) or Matcher#group(String name).
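The loop described above could be sketched as follows. FindAllDemo and findAll are hypothetical names; the text and pattern are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Collects every non-overlapping match of regex in text, in order of appearance.
    static List<String> findAll(String text, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher mat = Pattern.compile(regex).matcher(text);
        // Each successful find() moves to the next match; group() returns its text.
        while (mat.find()) {
            matches.add(mat.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> found = findAll("abce1213abcd231asdf", "1213|231|1232");
        System.out.println(String.join(",", found)); // prints 1213,231
    }
}
```

Returning a List keeps the matches separate; joining them with a comma is only one way to present the result.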

Regex capturing group

I am trying to capture host address from string with regex. My code looks like the following:
private static final Pattern OBTAIN_HOST_PATTERN = Pattern.compile("Host:\\s?(.*)");
public static String getHostAddress(String line) {
Matcher m = OBTAIN_HOST_PATTERN.matcher(line);
if (m.matches()) {
return OBTAIN_HOST_PATTERN.matcher(line).group(1);
}
return "Pattern does not match.";
}
Then I call getHostAddress("Host: abc"); and it throws java.lang.IllegalStateException: No match found, which suggests that the line matches the pattern but capturing the group does not work. Could you please help me find out why this happens and what I am missing? Thanks in advance :)
Edit: I resolved the issue. It was because I was creating the matcher twice (or at least I think that was the reason), but can someone explain why this happens?
The statement
return OBTAIN_HOST_PATTERN.matcher(line).group(1);
creates a new Matcher and calls neither matches() nor find() on it before group(1). Since the if statement has already found a match on m, you can just do
return m.group(1);
You could do even better by naming your group, so you don't get confused by group indexes when looking for the corresponding group. It can be achieved as follows:
"Host:\\s?(?<mygroupname>.*)"
and then
m.group("mygroupname")
A bit of doc about it : https://blogs.oracle.com/xuemingshen/entry/named_capturing_group_in_jdk7
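Putting both suggestions together, a corrected version of the method might look like the sketch below. The group name host and the class name HostDemo are chosen here for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostDemo {
    // Same pattern as in the question, with the capturing group given a name.
    private static final Pattern OBTAIN_HOST_PATTERN =
            Pattern.compile("Host:\\s?(?<host>.*)");

    static String getHostAddress(String line) {
        Matcher m = OBTAIN_HOST_PATTERN.matcher(line);
        if (m.matches()) {
            // Reuse the matcher that actually performed the match.
            return m.group("host");
        }
        return "Pattern does not match.";
    }

    public static void main(String[] args) {
        System.out.println(getHostAddress("Host: abc")); // prints abc
    }
}
```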

Looking for Correct Java REGEX for this kind of payload

I have the following two different payloads, for which I am trying to write a Java regex:
Input Payload 1
ISA*00* *00* *ZZ*EXDO *ZZ*047336389 *150327*1007*U*00401*900063730*0*P*>~
GS*QM*EXDO*047336389*20150327*1007*900063730*X*004010~
ST*214*900063730~
B10*326GENT15173**EXDO~
L11*019*TN~
Input Payload 2
ISA*00* *00* *02*HJBT *01*047336389 *140103*1751*U*00401*000012003*0*P*>\
GS*QM*HJBT*047336389*20140103*1751*12003*X*004010\
ST*214*0001\
B10*117094*B065199*HJBT\
N1*SH*INTEVA PRODUCTS LLC-\
I have the following regex:
.*(ST\*214|ST\*210).*
I tried to evaluate the regex at http://www.regexplanet.com/advanced/java/index.html
I see matches() as NO for the 1st payload and matches() as YES for the 2nd payload. I am looking for an updated regex that works for both payloads.
My purpose is to validate the payload, just as String.contains can do with the following approach:
payload.toString().contains("ST*214") || payload.toString().contains("ST*210")
I want to use a regex instead of String.contains here.
"(?s).*(ST\\*214|ST\\*210).*"
In Java you need to enable DOTALL mode so that . also matches line terminators. This can be done by including the (?s) modifier. Without it, .* cannot cross a line break, so your pattern could match only the single line ST*214*900063730~ of the first payload rather than the whole input.
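A small sketch of the difference, assuming the payload really contains line feeds. The payload below is a shortened excerpt of the first input; DotallDemo and its method names are hypothetical. The last method uses find(), which behaves like String.contains and does not depend on DOTALL.

```java
import java.util.regex.Pattern;

class DotallDemo {
    static final String REGEX = ".*(ST\\*214|ST\\*210).*";

    // Default mode: '.' stops at line terminators, so matches() fails on multi-line input.
    static boolean matchesDefault(String payload) {
        return payload.matches(REGEX);
    }

    // DOTALL mode, equivalent to prefixing the regex with (?s).
    static boolean matchesDotall(String payload) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(payload).matches();
    }

    // find() looks for the segment anywhere in the input, like String.contains.
    static boolean containsSegment(String payload) {
        return Pattern.compile("ST\\*21[04]").matcher(payload).find();
    }

    public static void main(String[] args) {
        String payload = "GS*QM*EXDO~\nST*214*900063730~\nB10*326GENT15173**EXDO~";
        System.out.println(matchesDefault(payload));  // prints false
        System.out.println(matchesDotall(payload));   // prints true
        System.out.println(containsSegment(payload)); // prints true
    }
}
```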
Use the following regexp:
".*(ST\*214|ST\*210).*"
I tested your two strings with the following code:
public class RegTest {
public static void main (String[] args) {
String test1 = "ISA*00* 00 ZZEXDO *ZZ*047336389*150414*1108*U*00401*979863647*0*P*>~ GSQMEXDO*047336389*20150414*1108*979863647*X*004010~ ST*214*979863647~ B10*186143**EXDO~";
String test2 = "ISA*00* 00 *02*HJBT *01*047336389*140103*1751*U*00401*000012003*0*P*>\\GSQMHJBT*047336389*20140103*1751*12003*X*004010\\ST*214*0001\\B10*117094*B065199*HJBT\\N1*SH*INTEVA PRODUCTS LLC-\\";
if (test1.matches(".*(ST\\*214|ST\\*210).*")) {
System.out.println("String1 matches");
}
if (test2.matches(".*(ST\\*214|ST\\*210).*")) {
System.out.println("String2 matches");
}
}
}
Just a small fix: the regexp shown above the code lost two '\' characters. You can use the regexp from the code.
I think you are trying to match a literal '*' character, so you should escape it with a backslash:
.*(ST\*214|ST\*210).*
or
.*ST\*(214|210).*
or
.*ST\*21(4|0).*
or
.*ST\*21[40].*
Are the line feeds part of your payload, or just formatting?

Modify "Black Box" Servlet Response Output

Problem:
I have a servlet that generates reports, more specifically the table body of a report. It is a black box; we do not have access to the source code.
Nevertheless, it works satisfactorily, and the servlet is not planned to be rewritten or replaced anytime soon.
We need to modify its response text in order to update a few links it generates to other reports. I was thinking of doing it with a filter that would find the anchor text and replace it using a regex.
Research:
I ran into this question that has a regex filter. It should be what I need, but then maybe not.
I am not trying to parse HTML in the strict sense of the parsing term, and I am not working with the full spec of the language. What I have is a subset of HTML tags that compose a table body, and does not have nested tables, so the HTML subset generated by the servlet is not recursive.
I just need to find / replace the anchors targets and add an attribute to the tag.
So the question is:
I need to modify the output of a servlet in order to change all links of the kind:
<a href="http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg">
into links like:
<a href="http://myOtherPage.com/webReports/report.xhtml?id=MyReport&filters=abcdefg" target="_parent">
Should I use the regex filter written by Jeremy Stein, or is there a better solution?
Assuming that the only part of the target A tags that varies is the query component of the href attribute, this tested regex solution should do a pretty good job:
// TEST.java 20121024_0800
import java.util.regex.*;
public class TEST {
public static String fixReportAnchorElements(String text) {
Pattern re_report_anchor = Pattern.compile(
"<a href=\"http://mypage\\.com/servlets/reports/\\?a=report&id=([^\"]+)\">",
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher m = re_report_anchor.matcher(text);
return m.replaceAll(
"<a href=\"http://myOtherPage.com/webReports/report.xhtml?id=$1\" target=\"_parent\">"
);
}
public static void main(String[] args) {
String input =
"test <a href=\"http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg\"> test";
String output = fixReportAnchorElements(input);
System.out.println(output);
}
}
I used Jeremy Stein's classes, with a few changes:
a) Make sure that neither the servlet nor anything further down the filter chain calls getOutputStream() on the wrapper object, or it will throw an IllegalStateException (see the answer by BalusC on the subject).
b) I wanted to make a single change on the page, so I did not put any filterConfig in web.xml.
b.2) In fact, I did not put anything in web.xml at all. I used the javax.servlet.annotation.WebFilter annotation on the class itself.
c) I set the Pattern and replace strings directly on the class:
Pattern searchPattern = Pattern.compile("<a (.*?) href=\".*?id=(.*?)[&|&]filtros=(.*?)\" (.*?)>(.*?)</a>");
String replaceString = "<a $1 href=\"/webReports/report.xhtml?idRel=$2&filtros=$3\" target=\"_parent\" $4>$5</a>";
Note the .*? to match as little as possible and avoid matching more than wanted.
For testing the matching and the regex, I used an applet I found while researching the subject.
Hope this helps anyone with the same problem.
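To see what the pattern and replacement string from c) do outside a servlet container, here is a standalone sketch. The class name AnchorRewriteDemo and the sample HTML are made up for illustration; the filter and response wrapper themselves are not shown.

```java
import java.util.regex.Pattern;

class AnchorRewriteDemo {
    // Pattern and replacement copied from item c) above.
    static final Pattern SEARCH = Pattern.compile(
            "<a (.*?) href=\".*?id=(.*?)[&|&]filtros=(.*?)\" (.*?)>(.*?)</a>");
    static final String REPLACE =
            "<a $1 href=\"/webReports/report.xhtml?idRel=$2&filtros=$3\" target=\"_parent\" $4>$5</a>";

    static String rewrite(String html) {
        // $1..$5 in REPLACE refer to the five capturing groups of SEARCH.
        return SEARCH.matcher(html).replaceAll(REPLACE);
    }

    public static void main(String[] args) {
        // Hypothetical table cells containing two report links.
        String html =
                "<td><a class=\"x\" href=\"http://mypage.com/servlets/reports/?a=report&id=R1&filtros=abc\" title=\"t\">One</a></td>"
              + "<td><a class=\"x\" href=\"http://mypage.com/servlets/reports/?a=report&id=R2&filtros=def\" title=\"t\">Two</a></td>";
        System.out.println(rewrite(html));
    }
}
```

Because every quantifier is reluctant, the two anchors are rewritten separately instead of being swallowed by one long match. As written, the pattern expects at least one attribute before href and one after it, so a bare <a href="..."> would not be matched.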

Capturing dot and comma in Java RegExp

I have the following code in Java:
Pattern fieldsPattern = Pattern.compile("(\"([^\"]+)\")|"
+"("+this.field_tag+"([0-9a-zA-Z_]+))");
Matcher fieldsMatcher = fieldsPattern.matcher(field);
while(fieldsMatcher.find())
{
//...
}
This code should capture expressions like "expression" and :expression (field_tag is just ":"). The problem occurs when I try to capture an expression like "10.1" or "10,1". It doesn't work.
But the expressions
"10-1",
"10+1"
work as expected.
I also tried this regexp on regexpal.com, a site with a JavaScript implementation of RegExp. On this site expressions like "10.1" and "10,1" work fine.
Is there any difference between Java and JavaScript in capturing dots? What am I doing wrong?
This works for me
Pattern fieldsPattern = Pattern.compile("(\"[^\"]+\")");
String field =" aa \"10\" \"10.1\" and \"10,1\"";
Matcher fieldsMatcher = fieldsPattern.matcher(field);
while(fieldsMatcher.find()) {
System.out.println(fieldsMatcher.group());
}
prints
"10"
"10.1"
"10,1"
The second set of brackets in the regex appears to be redundant, but it is harmless.
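For completeness, here is a sketch that keeps both alternatives of the original pattern, assuming field_tag is ":" as stated in the question. FieldPatternDemo and capture are hypothetical names, and wrapping the tag in Pattern.quote is an added precaution that the original code does not have.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldPatternDemo {
    static List<String> capture(String fieldTag, String field) {
        // Group 2 holds quoted text, group 4 holds the name after the tag.
        Pattern fieldsPattern = Pattern.compile("(\"([^\"]+)\")|"
                + "(" + Pattern.quote(fieldTag) + "([0-9a-zA-Z_]+))");
        Matcher fieldsMatcher = fieldsPattern.matcher(field);
        List<String> captured = new ArrayList<>();
        while (fieldsMatcher.find()) {
            String quoted = fieldsMatcher.group(2);
            captured.add(quoted != null ? quoted : fieldsMatcher.group(4));
        }
        return captured;
    }

    public static void main(String[] args) {
        System.out.println(capture(":", "\"10.1\" :price \"10,1\"")); // prints [10.1, price, 10,1]
    }
}
```

With this input the dot and the comma are captured like any other character inside the quotes, which suggests the problem lies outside the pattern itself.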
