Problem:
I have a servlet that generates reports, more specifically the table body of a report. It is a black box; we do not have access to the source code.
Nevertheless, it works satisfactorily, and the servlet is not planned to be rewritten or replaced anytime soon.
We need to modify its response text in order to update a few links it generates to other reports. I was thinking of doing it with a filter that would find the anchor text and replace it using a regex.
Research:
I ran into this question that has a regex filter. It should be what I need, but then maybe not.
I am not trying to parse HTML in the strict sense of the parsing term, and I am not working with the full spec of the language. What I have is a subset of HTML tags that compose a table body, and does not have nested tables, so the HTML subset generated by the servlet is not recursive.
I just need to find / replace the anchors targets and add an attribute to the tag.
So the question is:
I need to modify the output of a servlet in order to change all links of the kind:
<a href="http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg">
into links like:
<a href="http://myOtherPage.com/webReports/report.xhtml?id=MyReport&filters=abcdefg" target="_parent">
Should I use the regex filter written by Jeremy Stein, or is there a better solution?
Assuming that the only part of the target A tags which varies is the query component of the href attribute, this tested regex solution should do a pretty good job:
// TEST.java 20121024_0800
import java.util.regex.*;

public class TEST {
    public static String fixReportAnchorElements(String text) {
        Pattern re_report_anchor = Pattern.compile(
            "<a href=\"http://mypage\\.com/servlets/reports/\\?a=report&id=([^\"]+)\">",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = re_report_anchor.matcher(text);
        return m.replaceAll(
            "<a href=\"http://myOtherPage.com/webReports/report.xhtml?id=$1\" target=\"_parent\">"
        );
    }

    public static void main(String[] args) {
        String input =
            "test <a href=\"http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg\"> test";
        String output = fixReportAnchorElements(input);
        System.out.println(output);
    }
}
I used Jeremy Stein's classes, with a few changes:
a) Make sure that neither the servlet nor anything further down the filter chain calls getOutputStream() on the wrapper object, or it will throw an IllegalStateException (see the answer by BalusC on the subject).
b) I wanted to make a single change on the page, so I did not put any filterConfig in the web.xml.
b.2) In fact, I did not put anything in the web.xml at all; I used the javax.servlet.annotation.WebFilter annotation on the class itself.
c) I set the Pattern and replace strings directly on the class:
Pattern searchPattern = Pattern.compile("<a (.*?) href=\".*?id=(.*?)[&|&]filtros=(.*?)\" (.*?)>(.*?)</a>");
String replaceString = "<a $1 href=\"/webReports/report.xhtml?idRel=$2&filtros=$3\" target=\"_parent\" $4>$5</a>";
Note the reluctant quantifier .*?, which matches as little as possible and so avoids matching more than wanted.
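To illustrate why the reluctant form matters, here is a minimal sketch comparing a greedy and a reluctant quantifier. The class name ReluctantDemo and the sample markup are made up for the illustration and are not part of the filter above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Returns the first substring of text matched by regex, or null if there is none.
    public static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two anchors on the same line.
        String text = "<a href=\"r1\">one</a> <a href=\"r2\">two</a>";
        // Greedy: runs on to the last '>' in the text, swallowing both anchors.
        System.out.println(firstMatch("<a (.*)>", text));
        // Reluctant: stops at the first '>' that lets the match succeed.
        System.out.println(firstMatch("<a (.*?)>", text));
    }
}
```

With the greedy version a single match would cover both links, so a replacement would rewrite them as if they were one tag.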
For testing the matching and the regex, I used this applet I found while researching the subject.
Hope this helps anyone with the same problem.
I want to extract one particular word using a regex in Java.
In the paragraph below, I need to find the network interface name:
resource "azurerm_network_interface" "nic_LinuxVMCent-nhi" {
name = "nic_LinuxVMCent-nhi"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
ip_configuration {
name = "pubIP_LinuxVMCent-nhi"
subnet_id = azurerm_subnet.sub_wind12VM-PtN.id
private_ip_address_allocation = "Dynamic"
public_ip_address_id = azurerm_public_ip.pubIP_LinuxVMCent-nhi.id
}
}
data "azurerm_snapshot" "snapLinuxVMCent-nhi" {
name = "CentOS76New-0"
resource_group_name = "SaaSworkloadsnaps"
}
Expected Result Ex:
nic_LinuxVMCent-nhi
This is a multi-line bit of text. However, there appears to be a line which you could recognise with a regex:
resource "azurerm_network_interface" "nic_LinuxVMCent-nhi" {
So the regex for that would be ^resource "azurerm_network_interface" "([^"]+)" {$ - see https://regexr.com/67ldb
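You can use Matcher.matches() on each line (or Matcher.find() with Pattern.MULTILINE on the whole text) to see if any line matches this expression; if it does, matcher.group(1) will be the value you're looking for. Note that in Java the literal { has to be escaped as \{, otherwise Pattern.compile throws a PatternSyntaxException.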
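The approach described above could be sketched like this in Java. The class and method names (NicNameFinder, findNicName) are invented for the example, and it assumes the whole file content is available as one string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NicNameFinder {
    // MULTILINE lets ^ and $ match at line boundaries; the literal { is escaped for Java.
    private static final Pattern NIC = Pattern.compile(
        "^resource \"azurerm_network_interface\" \"([^\"]+)\" \\{$",
        Pattern.MULTILINE);

    // Returns the first network interface name found, or null if no line matches.
    public static String findNicName(String config) {
        Matcher m = NIC.matcher(config);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String config =
            "resource \"azurerm_network_interface\" \"nic_LinuxVMCent-nhi\" {\n"
          + "name = \"nic_LinuxVMCent-nhi\"\n"
          + "}\n"
          + "data \"azurerm_snapshot\" \"snapLinuxVMCent-nhi\" {\n"
          + "name = \"CentOS76New-0\"\n"
          + "}\n";
        System.out.println(findNicName(config));
    }
}
```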
You can use this regex to find the network interface name:
(?<=resource \"azurerm_network_interface\" \").+(?=\" {)
I have used a lookbehind and a lookahead so that only the name itself is matched.
I don't know network interfaces, so this regex solution is specific to "azurerm_network_interface".
If you need any additional help, please comment down below.
Cheers :)
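For reference, a lookaround variant might look like the following in Java. This is only a sketch: the class name LookaroundNicFinder is made up, the brace is escaped because Java requires it, and [^"]+ is swapped in for .+ so the match cannot run past the closing quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundNicFinder {
    // The lookbehind and lookahead assert the surrounding text without consuming it,
    // so group() returns only the interface name.
    private static final Pattern NIC = Pattern.compile(
        "(?<=resource \"azurerm_network_interface\" \")[^\"]+(?=\" \\{)");

    // Returns the first interface name found, or null if there is none.
    public static String find(String config) {
        Matcher m = NIC.matcher(config);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "resource \"azurerm_network_interface\" \"nic_LinuxVMCent-nhi\" {";
        System.out.println(find(line));
    }
}
```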
I have the following configuration in the urlrewrite.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN" "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">
<urlrewrite use-query-string="true">
<rule>
<from>^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$</from>
<to type="redirect" last="true">/events$4$5</to>
</rule>
</urlrewrite>
The regex ^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$ has 7 groups, which are:
(/event/showEventList): matches /event/showEventList
(\.{1}): matches a single dot (.)
(\bhtm\b|\bhtml\b): matches only htm or html
(\?{0,1}): matches a question mark (?), which may occur zero times or once
([a-zA-Z0-9-_=&]{0,}+): matches the query string, which may contain zero or more characters
(#{0,1}): matches a hash sign (#), which may occur zero times or once
([a-zA-Z0-9-_=&]{0,}+): matches the fragment, which may contain zero or more characters
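To make the group numbering concrete, the following sketch lists what each of the seven groups captures for a given input. The class name GroupDump is invented for the illustration; the regex is the one from the rule above.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDump {
    static final String REGEX =
        "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})"
      + "([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$";

    // Returns the text captured by groups 1..7, or an empty array if the input does not match.
    public static String[] groups(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            groups("/event/showEventList.html?pageNumber=1#key=val")));
    }
}
```

For the test URL, groups 4 and 5 hold the question mark and the query string, while groups 6 and 7 hold the hash sign and the fragment.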
If I test this configuration with a test URL: /event/showEventList.html?pageNumber=1#key=val, I am expecting that the redirected URL would be /events?pageNumber=1, but I am getting /events?pageNumber=1#key=val
I have a code snippet to test it, which is:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRewriterRegexTest {
    public static void main(String[] args) {
        String input = "/event/showEventList.html?pageNumber=1#key=val";
        String regex = "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        System.out.println(matcher.replaceFirst("/events$4$5"));
    }
}
It outputs to: /events?pageNumber=1.
Any pointer would be very helpful.
I'd simplify the expression a bit.
Escape slashes, as they are typically used as delimiters for the regex (\/event\/showEventList)
Remove superfluous quantifier (\.)
Shorten the html string test (htm(l)?) - careful, this messes with your capturing group numbers
Remove word boundary checks around html
Use ? instead of {0,1}
Use * instead of {0,}
Remove possessive quantifier (I don't see why you'd need it)
Ignore everything after #, you don't seem to need it in your replacement
This gives us ^(\/event\/showEventList)(\.)(htm(l)?)(\??)([a-zA-Z0-9-_=&]+)*#(.+)$ which, with the replacement adjusted to /events$5$6 for the new group numbers, substitutes your example to /events?pageNumber=1
To play around, see https://regexr.com/4otp7
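A quick way to check the simplified expression from Java is sketched below. The class name SimplifiedRuleCheck is made up, and the replacement /events$5$6 is an assumption that follows from the shifted group numbers: the extra (l)? group pushes the question mark and query string to groups 5 and 6.

```java
import java.util.regex.Pattern;

public class SimplifiedRuleCheck {
    private static final Pattern RULE = Pattern.compile(
        "^(\\/event\\/showEventList)(\\.)(htm(l)?)(\\??)([a-zA-Z0-9-_=&]+)*#(.+)$");

    // Applies the rule; the input is returned unchanged when the pattern does not match.
    public static String rewrite(String url) {
        return RULE.matcher(url).replaceFirst("/events$5$6");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/event/showEventList.html?pageNumber=1#key=val"));
    }
}
```

Note that this expression requires a # to be present, so a URL without a fragment is left untouched.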
I've simplified the expression and here is the working solution
<from>^(\/event\/showEventList\.html?)(\?[a-zA-Z0-9-_=&]*)\#.*$</from>
<to type="redirect" last="true">/events$2</to>
This matches the URL and captures everything from the beginning of the query string up to the first occurrence of #.
Explanation:
Group 1: matches the URL /event/showEventList.html or /event/showEventList.htm
Group 2: matches the query string (the ? followed by zero or more allowed characters) up to the first occurrence of #
Group 2 is the string you want to use for the redirect; anything from # onwards, including the # itself, is ignored.
I am answering my own question so that anyone who stumbles upon the same problem in future can benefit from it.
The problem has nothing to do with the UrlRewriteFilter framework. By enabling the debug log for this framework, I saw that the URL it receives before applying the defined rules does not contain the URL hash (#). From other SO answers and by analyzing the browser's network traffic, I saw that the browser does not send the URL fragment to the server, so it is not available in the HttpServletRequest. This is why the regular expressions are not working.
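The split between what stays in the browser and what reaches the server can be illustrated with java.net.URI. This is only an approximation of what a browser puts on the request line; the class name FragmentDemo and the localhost URL are made up for the example.

```java
import java.net.URI;

public class FragmentDemo {
    // Approximates the request target a browser sends: path plus query, never the fragment.
    public static String requestTarget(String browserUrl) {
        URI uri = URI.create(browserUrl);
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        // Hypothetical address as typed into the browser.
        String url = "http://localhost:8080/event/showEventList.html?pageNumber=1#key=val";
        // What the server-side rule gets to see.
        System.out.println(requestTarget(url));
        // What stays behind in the browser.
        System.out.println(URI.create(url).getRawFragment());
    }
}
```

Any rewrite rule that depends on seeing the # therefore has nothing to match against on the server.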
Since the hash is available in the client browser, I was able to solve the problem with JavaScript, thanks to the HTML5 History API:
<script type="text/javascript">
    window.addEventListener('DOMContentLoaded', (event) => {
        const url = new URL(window.location);
        url.hash = '';
        history.replaceState(null, document.title, url);
    });
</script>
I am working with some legacy code that has a static method call which we need to remove from our source tree.
The existing code is as follows:
Logger.getInstance(JdkUtil.forceInit(SomeBusiness.class));
What we need to end up with is:
Logger.getInstance(SomeBusiness.class);
I've spent all day today trying to figure out how to do that replacement. Since I have very little experience with regular expressions, I have only been able to come up with a pattern that matches the source string.
The pattern JdkUtil.forceInit([a-zA-Z_0-9]*.class) finds matches on the input string I am providing. I've tested this at https://www.freeformatter.com/java-regex-tester.html
So if anyone can post a Java solution to this, I would really appreciate it.
Below is some Groovy code that I have so far. What I am missing is how to correctly perform the replacement described above.
String source = 'Logger.getInstance(JdkUtil.forceInit(RtpRuleEngineCompiledImpl.class))'
String regexpPattern = 'JdkUtil.forceInit\\([a-zA-Z_0-9\\)]*.class\\)'
String replaced = source.replaceFirst(regexpPattern, 'hello')
println replaced
When I run the above code I get the following output:
Logger.getInstance(hello)
Obviously 'hello' is just for testing.
Thanks in advance to anyone who can give me some suggestions.
You'll likely want to do something such as:
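class StackOverflow {
    public static void main(String[] args) {
        String source = "Logger.getInstance(JdkUtil.forceInit(RtpRuleEngineCompiledImpl.class))";
        // The dots are escaped so they match a literal '.' rather than any character;
        // the inner parentheses form capture group 1 (the class literal).
        String regexpPattern = "JdkUtil\\.forceInit\\(([a-zA-Z_0-9]*\\.class)\\)";
        // $1 inserts whatever group 1 captured.
        String replaced = source.replaceFirst(regexpPattern, "$1");
        System.out.println(replaced);
    }
}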
Result:
Logger.getInstance(RtpRuleEngineCompiledImpl.class)
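The replacement $1 refers to the first capture group, so the whole match (the JdkUtil.forceInit(...) call) is replaced by just the class literal that was captured inside the parentheses.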
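If the call appears many times in a source file, the same idea extends to replaceAll. The following is a sketch under a few assumptions: the class name ForceInitRemover is invented, the pattern also accepts package-qualified class names, and it tolerates whitespace inside the parentheses.

```java
import java.util.regex.Pattern;

public class ForceInitRemover {
    // Group 1 captures the class literal, optionally package-qualified.
    private static final Pattern FORCE_INIT = Pattern.compile(
        "JdkUtil\\.forceInit\\(\\s*([\\w.]+\\.class)\\s*\\)");

    // Replaces every JdkUtil.forceInit(X.class) in the source with X.class.
    public static String strip(String source) {
        return FORCE_INIT.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Hypothetical source text with two occurrences.
        String source =
            "Logger.getInstance(JdkUtil.forceInit(SomeBusiness.class));\n"
          + "Logger.getInstance(JdkUtil.forceInit(com.acme.Other.class));";
        System.out.println(strip(source));
    }
}
```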
I have the following Java code:
str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");
This turns a String like so:
How now <fizz>brown</fizz> cow.
Into:
How now cow.
However, I want it to just strip the <fizz> and </fizz> tags, or just standalone </fizz> tags, and leave the element's content alone. So, a regex that would turn the above into:
How now brown cow.
Or, using a more complex String, something that turns:
How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.
Into:
How now brown cow.
I tried this:
str = str.replaceAll("<.*?></.*?>|<.*?/>", "");
And that doesn't work at all. Any ideas? Thanks in advance!
"How now <fizz>brown</fizz> cow.".replaceAll("<[^>]+>", "")
You were almost there ;)
Try this:
str = str.replaceAll("<.*?>", "")
While there are other correct answers, none give any explanation.
The reason your regex <.*?>.*?</.*?>|<.*?/> doesn't work is that it selects every pair of tags together with everything inside them. You can see that in action on debuggex.
The reason your second attempt <.*?></.*?>|<.*?/> doesn't work is that it selects from the beginning of a tag up to the first closing tag that immediately follows another tag.
The regex you need is much simpler: <.*?>. It simply selects every tag, regardless of whether it is an opening or closing one.
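The difference between the three expressions can be checked with a short sketch like the one below. The class name TagPatternComparison is made up for the example; the patterns and inputs are the ones from the question.

```java
public class TagPatternComparison {
    // Removes every match of regex from input.
    public static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String simple = "How now <fizz>brown</fizz> cow.";
        String nested = "How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.";

        // First attempt: removes the tags together with the content between them.
        System.out.println(strip("<.*?>.*?</.*?>|<.*?/>", simple));
        // Second attempt: finds nothing to remove in the simple string.
        System.out.println(strip("<.*?></.*?>|<.*?/>", simple));
        // Tag-only pattern: removes just the tags and keeps the content.
        System.out.println(strip("<.*?>", simple));
        System.out.println(strip("<.*?>", nested));
    }
}
```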
You can try this too:
str = str.replaceAll("<.*?>", "");
Please have a look at the below example for better understanding:
public class StringUtils {
    public static void main(String[] args) {
        System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
        System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }

    public static String replaceAll(String strInput) {
        return strInput.replaceAll("<.*?>", "");
    }
}
Output:
How now brown cow.
How now brown cow.
This isn't elegant, but it is easy to follow. The code below removes the start and end XML tags if they are present together in a line:
<url>"www.xml.com"</url> , <body>"This is xml"</body>
Regex :
to_replace='<\w*>|<\/\w*>',value=""
If you want to parse an XML log file with a Java regex, you can use <[^<]+<. Applied to <name>DEV</name>, it matches <name>DEV<, so the text between the two < characters is name>DEV. You will have to experiment with the regex to get exactly what you need.
I have the following REGEX that I'm serving up to java via an xml file.
[a-zA-Z -\(\) \-]+
This regex is used to validate server side and client side (via javascript) and works pretty well at allowing only alphabetic content and a few other characters...
My problem is that it also lets zero-length (empty) strings through.
Does anyone have a simple and yet elegant solution to this?
I already tried...
[a-zA-Z -\(\) \-]{1,}+
but that didn't seem to work.
Cheers!
UPDATE FOLLOWING INVESTIGATION
It appears the code I provided does in fact work...
String inputStr = " ";
String pattern = "[a-zA-Z -\\(\\) \\-]+";
boolean patternMatched = java.util.regex.Pattern.matches(pattern, inputStr);
if (patternMatched) {
    out.println("Pattern MATCHED");
} else {
    out.println("NOT MATCHED");
}
After looking at this more closely I think the problem may well be within the logic of some of my java bean coding... It appears the regex is dropped out at the point where the string parse should take place, thereby allowing empty strings to be submitted... And also any other string... EEJIT that I am...
Cheers for the help in peer reviewing my initial stupid thought....!
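For anyone checking the same thing, here is a small sketch of how the pattern behaves. The class name NameValidator, the STRICTER pattern and the isValidName helper are hypothetical additions, not part of the original code. One detail worth knowing: inside the character class, " -\(" is read as a range from the space character to "(", so the original pattern also accepts characters such as # and $.

```java
import java.util.regex.Pattern;

public class NameValidator {
    // The pattern from the question: + already requires at least one character.
    static final String ORIGINAL = "[a-zA-Z -\\(\\) \\-]+";
    // Hypothetical stricter variant: letters, parentheses, space and hyphen only.
    static final String STRICTER = "[a-zA-Z() \\-]+";

    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Assumed extra rule: also reject input that consists only of whitespace.
    public static boolean isValidName(String input) {
        return !input.trim().isEmpty() && matches(STRICTER, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(ORIGINAL, ""));   // empty string
        System.out.println(matches(ORIGINAL, " "));  // single space
        System.out.println(matches(ORIGINAL, "#"));  // inside the accidental range
        System.out.println(isValidName(" "));
        System.out.println(isValidName("Anne-Marie (Jr)"));
    }
}
```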
Have you tried this:
[a-zA-Z -\(\) \-]+