Regular Expression - detect Specific Url and replace that string

I want to detect URLs for one exact domain in a string, replace each with another string, and finally make the result clickable in a TextView.
What I want:
this is sample text with one type of url mydomain.com/pin/123456. another type of url is mydomain.com/username.
Well, I wrote these regexes:
([Hh][tT][tT][pP][sS]?://)?(?:www\\.)?example\\.com/?.*
([Hh][tT][tT][pP][sS]?://)?(?:www\\.)?example\\.com/pin/?.*
this regex can detect:
http://www.example.com
https://www.example.com
www.example.com
example.com
Hhtp://www.example.com // and all other wrong type in http
with anything after .com
Issues:
1. How do I detect the end of the URL (a space or a dot)?
2. How do I detect the two types of URL, one with /pin/ and one without?
3. How do I replace a detected URL like mydomain.com/pin/123 with PostLink and mydomain.com/username with ProfileLink?
4. I know how to make them clickable with Linkify, but if possible, show me the best way to provide a content provider for the links so that each one opens with the proper activity.

You could try:
([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
which is a regex I found after a quick search here on stackoverflow:
Regular expression to find URLs within a string
I just removed the http:// part of that regex to fit your needs.
Be aware though that because of that it now tracks everything that is connected with a dot and no whitespace. For example: a.a would also be found

With special thanks to Gildraths
Answer to question 1
String urlRegex = "(https?://)?(?:www\\.)?example\\.com([\\w.,#?^=%&:/~+#-]*[\\w#?^=%&/~+#-])?";
Pattern pattern = Pattern.compile(urlRegex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(textString);
Answer to question 2, 3
while (matcher.find()) {
    String url = matcher.group();
    // Answer to question 2 - if pinIndex >= 0, the URL contains "/pin/"
    int pinIndex = url.indexOf("/pin/");
    if (pinIndex >= 0) {
        // Everything after "/pin/" (5 characters) is the id
        String profileId = url.substring(pinIndex + 5);
        // Answer to question 3 - replace the matched URL with custom text
        textString = textString.replace(url, "#" + profileId);
    }
}
Answer to question 4
// Pattern to detect replaced custom text
Pattern profileLink = Pattern.compile("[#]+[A-Za-z0-9-_]+\\b");
// Schema
String Link = "content://" + context.getString(R.string.profile_authority) + "/"; // context: a Context instance, e.g. the Activity
// Make it linkify ;)
Linkify.addLinks(textView, profileLink, Link);
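The matching and replacing steps above can also be done in a single pass. The following is a minimal sketch, not the code from the answer: the class name UrlRewriteSketch, the method rewrite, and the "@" prefix for profile links are assumptions made for illustration, and it assumes ids and usernames consist of word characters only. It uses one pattern with two alternatives, so the capture group that matched tells the two URL types apart, and a trailing dot or space is left outside the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRewriteSketch {
    // Optional scheme and "www.", then the domain, then either /pin/<id> (group 1)
    // or /<username> (group 2). \w+ stops at a space or a trailing dot.
    private static final Pattern URL = Pattern.compile(
            "(?:https?://)?(?:www\\.)?example\\.com/(?:pin/(\\w+)|(\\w+))",
            Pattern.CASE_INSENSITIVE);

    static String rewrite(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Group 1 is set only for /pin/ URLs; otherwise group 2 holds the username.
            String replacement = m.group(1) != null
                    ? "#" + m.group(1)   // post link
                    : "@" + m.group(2);  // profile link (hypothetical prefix)
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
                "one type of url example.com/pin/123456. another is example.com/username."));
        // prints: one type of url #123456. another is @username.
    }
}
```

Using appendReplacement avoids calling String.replace inside the loop, which would also rewrite identical text elsewhere in the string.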

Related

Need to get Particular word using java Regex

I want to extract one particular word using a regex in Java.
In the text below, I need to find the network interface name:
resource "azurerm_network_interface" "nic_LinuxVMCent-nhi" {
name = "nic_LinuxVMCent-nhi"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
ip_configuration {
name = "pubIP_LinuxVMCent-nhi"
subnet_id = azurerm_subnet.sub_wind12VM-PtN.id
private_ip_address_allocation = "Dynamic"
public_ip_address_id = azurerm_public_ip.pubIP_LinuxVMCent-nhi.id
}
}
data "azurerm_snapshot" "snapLinuxVMCent-nhi" {
name = "CentOS76New-0"
resource_group_name = "SaaSworkloadsnaps"
}
Expected Result Ex:
nic_LinuxVMCent-nhi

This is a multi-line bit of text. However, there appears to be a line which you could recognise with a regex:
resource "azurerm_network_interface" "nic_LinuxVMCent-nhi" {
So the regex for that would be ^resource "azurerm_network_interface" "([^"]+)" {$ - see https://regexr.com/67ldb
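You can use Matcher.matches() on each line (or Matcher.find() with Pattern.MULTILINE on the whole text) to see whether any line matches this expression; if one does, matcher.group(1) will be the value you're looking for. Note that in a Java pattern the trailing { must be escaped as \\{.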
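A minimal sketch of that approach, assuming the whole file content is available as one string; the class name NicNameSketch and the method findNicName are made up for this example. Pattern.MULTILINE lets ^ and $ match at line boundaries, so find() can scan the full text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NicNameSketch {
    // MULTILINE makes ^ and $ match at the start and end of each line.
    // The literal { is escaped because Java rejects an unescaped one here.
    private static final Pattern NIC = Pattern.compile(
            "^resource \"azurerm_network_interface\" \"([^\"]+)\" \\{$",
            Pattern.MULTILINE);

    // Returns the interface name from the first matching line, or null if none matches.
    static String findNicName(String text) {
        Matcher m = NIC.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "resource \"azurerm_network_interface\" \"nic_LinuxVMCent-nhi\" {",
                "name = \"nic_LinuxVMCent-nhi\"",
                "}",
                "data \"azurerm_snapshot\" \"snapLinuxVMCent-nhi\" {",
                "name = \"CentOS76New-0\"",
                "}");
        System.out.println(findNicName(text)); // prints: nic_LinuxVMCent-nhi
    }
}
```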

you can use this regex to find the network interface name:
(?<=resource \"azurerm_network_interface\" \").+(?=\" {)
I have used a lookbehind and a lookahead to isolate the name.
Also, here's a link to regex101:
Link
I don't know network interfaces, so this regex solution is specific to "azurerm_network_interface".
If you need any additional help, please comment down below.
Cheers :)

Removing Hashtag using Java WebFilter

I have the following configuration in the urlrewrite.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN" "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">
<urlrewrite use-query-string="true">
<rule>
<from>^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$</from>
<to type="redirect" last="true">/events$4$5</to>
</rule>
</urlrewrite>
The regex ^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$ has 7 groups, which are:
(/event/showEventList): matches /event/showEventList
(\.{1}): matches a single dot (.)
(\bhtm\b|\bhtml\b): matches only htm or html
(\?{0,1}): matches a question mark (?), which may occur zero or one time
([a-zA-Z0-9-_=&]{0,}+): matches the query string, which may be zero or more characters long
(#{0,1}): matches a hash sign (#), which may occur zero or one time
([a-zA-Z0-9-_=&]{0,}+): matches the fragment, which may be zero or more characters long
If I test this configuration with a test URL: /event/showEventList.html?pageNumber=1#key=val, I am expecting that the redirected URL would be /events?pageNumber=1, but I am getting /events?pageNumber=1#key=val
I have a code snippet to test it, which is:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class UrlRewriterRegexTest {
public static void main(String[] args) {
String input = "/event/showEventList.html?pageNumber=1#key=val";
String regex = "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
System.out.println(matcher.replaceFirst("/events$4$5"));
}
}
It outputs to: /events?pageNumber=1.
Any pointer would be very helpful.

I'd simplify the expression a bit.
Escape slashes, as they are typically used as delimiters for the regex (\/event\/showEventList)
Remove superfluous quantifier (\.)
Shorten the html string test (htm(l)?) - careful, this messes with your capturing group numbers
Remove word boundary checks around html
Use ? instead of {0,1}
Use * instead of {0,}
Remove possessive quantifier (I don't see why you'd need it)
Ignore everything after #, you don't seem to need it in your replacement
This gives us ^(\/event\/showEventList)(\.)(htm(l)?)(\??)([a-zA-Z0-9-_=&]+)*#(.+)$ which substitutes your example to /events?pageNumber=1
To play around, see https://regexr.com/4otp7

I've simplified the expression and here is the working solution
<from>^(\/event\/showEventList\.html?)(\?[a-zA-Z0-9-_=&]*)\#.*$</from>
<to type="redirect" last="true">/events$2</to>
This matches everything from the beginning of the query string up to the first occurrence of #.
Explanation:
Group 1: matches the URL /event/showEventList.html or /event/showEventList.htm
Group 2: matches the query string (zero or more characters) up to the first occurrence of #
Group 2 is the string to use for the redirect; everything from # onward, including the # itself, is ignored.

I am answering my own question so that anyone who stumbles upon the same problem in the future can benefit from it.
This has nothing to do with the UrlRewriteFilter framework. By enabling the debug log for this framework I have seen that the URL it receives before applying the defined rules doesn't contain the URL hash (#). From other SO answers and by analyzing the browser's network traffic, I saw that the browser does not send the URL fragment to the server, so it's not available in the HttpServletRequest. This is why the regular expressions are not working.
Since this hash is available in the client browser and thanks to HTML5 History API I am able to solve the problem using JavaScript:
<script type="text/javascript">
window.addEventListener('DOMContentLoaded', (event) => {
const url = new URL(window.location);
url.hash = '';
history.replaceState(null, document.title, url);
});
</script>
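To illustrate why the server-side rule never sees the hash, the sketch below splits the test URL with java.net.URI. It only models the behaviour described above and is not part of the filter: the class name FragmentSketch, the method requestTarget, and the localhost host are assumptions for this example. The path and query are what a browser puts on the request line; the fragment is a separate component that stays on the client.

```java
import java.net.URI;

public class FragmentSketch {
    // Builds the part of the URL that ends up in the HTTP request line:
    // path plus optional query, never the fragment.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String url = "http://localhost/event/showEventList.html?pageNumber=1#key=val";
        System.out.println(requestTarget(url));                // prints: /event/showEventList.html?pageNumber=1
        System.out.println(URI.create(url).getRawFragment());  // prints: key=val
    }
}
```

Because the rewrite rule only ever receives the first value, the groups for # and the fragment always match the empty string on the server.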

Find List of Strings which match in Text using Regex

I am getting stuck in this situation.
public void findListOfPattern(){
String text = "abce1213abcd231asdf";
String find = "1213|231|1232";
Pattern part = Pattern.compile(find);
Matcher mat = part.matcher(text);
System.out.println(mat.find()); //True
}
I get true if any of the strings in find matches.
I want the list of matches from the text.
The text can be large and contain more matches, and the find string can contain more alternatives.
In find, 1213, 231 and 1232 are separate strings.
The result should look like: 1213,231

You need to invoke mat.group() to return the desired match.
Typically you'd loop while mat.find() returns true and print all matches successively by invoking mat.group().
You can then build your expected result String by concatenating the outcome of mat.group() as you wish, e.g. with a StringBuilder.
Notes
API here.
You need to invoke Matcher#find in order for Matcher#group to yield any result and not throw IllegalStateException
Your Pattern only has the default group. If you'd used parentheses or named groups (from Java 7), you could also invoke the overloads Matcher#group(int group) or Matcher#group(String name).
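The loop described above could look like the following minimal sketch. The class name FindAllSketch and the method findAll are made up for this example; it collects the matches into a list and joins them with commas instead of using a StringBuilder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllSketch {
    // Collects every match of the alternation, in the order it appears in the text.
    static List<String> findAll(String text, String alternatives) {
        Matcher mat = Pattern.compile(alternatives).matcher(text);
        List<String> matches = new ArrayList<>();
        while (mat.find()) {          // advances to the next match, false when none is left
            matches.add(mat.group()); // the text matched by the last successful find()
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> found = findAll("abce1213abcd231asdf", "1213|231|1232");
        System.out.println(String.join(",", found)); // prints: 1213,231
    }
}
```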

Modify "Black Box" Servlet Response Output

Problem:
I have a servlet that generates reports, more specifically the table body of a report. It is a black box; we do not have access to the source code.
Nevertheless, it's working satisfactorily, and there are no plans to rewrite or replace the servlet anytime soon.
We need to modify its response text in order to update a few links it generates to other reports. I was thinking of doing it with a filter that would find the anchor text and replace it using a regex.
Research:
I ran into this question that has a regex filter. It should be what I need, but then maybe not.
I am not trying to parse HTML in the strict sense of the parsing term, and I am not working with the full spec of the language. What I have is a subset of HTML tags that compose a table body, and does not have nested tables, so the HTML subset generated by the servlet is not recursive.
I just need to find / replace the anchors targets and add an attribute to the tag.
So the question is:
I need to modify the output of a servlet in order to change all links of the kind:
<a href="http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg">
into links like:
<a href="http://myOtherPage.com/webReports/report.xhtml?id=MyReport&filters=abcdefg" target="_parent">
Should I use the regex filter written by @Jeremy Stein or is there a better solution?

Assuming that the only part of the target A tags which vary is the query component of the href attribute, then this tested regex solution should do a pretty good job:
// TEST.java 20121024_0800
import java.util.regex.*;
public class TEST {
public static String fixReportAnchorElements(String text) {
Pattern re_report_anchor = Pattern.compile(
"<a href=\"http://mypage\\.com/servlets/reports/\\?a=report&id=([^\"]+)\">",
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher m = re_report_anchor.matcher(text);
return m.replaceAll(
"<a href=\"http://myOtherPage.com/webReports/report.xhtml?id=$1\" target=\"_parent\">"
);
}
public static void main(String[] args) {
String input =
"test <a href=\"http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg\"> test";
String output = fixReportAnchorElements(input);
System.out.println(output);
}
}

I used Jeremy Stein's classes, with a few changes:
a) Make sure that nothing down the filter chain, nor the servlet itself, calls getOutputStream() on the wrapper object, or it will throw an IllegalStateException (check this answer by BalusC on the subject).
b) I wanted to make a single change on the page, so I did not put any filterConfig on the web.xml.
b.2) I also did not put anything on the web.xml at all. Used the javax.servlet.annotation.WebFilter on the class itself.
c) I set the Pattern and replace strings directly on the class:
Pattern searchPattern = Pattern.compile("<a (.*?) href=\".*?id=(.*?)[&|&]filtros=(.*?)\" (.*?)>(.*?)</a>");
String replaceString = "<a $1 href=\"/webReports/report.xhtml?idRel=$2&filtros=$3\" target=\"_parent\" $4>$5</a>";
Note the reluctant .*? quantifiers, which match as little as possible to avoid matching more than wanted.
For testing the matching and the regex, I used this applet I found while researching the subject.
Hope this helps anyone with the same problem.
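The effect of the .*? quantifier mentioned above can be seen in isolation with a small sketch. It is not the filter itself: the class name ReluctantSketch, both method names, and the simplified anchor pattern are assumptions chosen to keep the example short.

```java
public class ReluctantSketch {
    // Greedy: .* runs to the LAST "> on the line, swallowing everything in between.
    static String greedy(String html) {
        return html.replaceAll("<a href=\"(.*)\">", "[$1]");
    }

    // Reluctant: .*? stops at the FIRST "> it can, so each anchor is matched separately.
    static String reluctant(String html) {
        return html.replaceAll("<a href=\"(.*?)\">", "[$1]");
    }

    public static void main(String[] args) {
        String html = "<a href=\"x\">one</a> <a href=\"y\">two</a>";
        System.out.println(greedy(html));    // prints: [x">one</a> <a href="y]two</a>
        System.out.println(reluctant(html)); // prints: [x]one</a> [y]two</a>
    }
}
```

With several anchors in one table row, the greedy form would merge them into a single match, which is why the reluctant form is used in the pattern above.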

Capturing dot and comma in Java RegExp

I have following code in Java:
Pattern fieldsPattern = Pattern.compile("(\"([^\"]+)\")|"
+"("+this.field_tag+"([0-9a-zA-Z_]+))");
Matcher fieldsMatcher = fieldsPattern.matcher(field);
while(fieldsMatcher.find())
{
//...
}
This code should capture expressions like "expression" and :expression (field_tag is just ":"). The problem occurs when I try to capture an expression like "10.1" or "10,1". It doesn't work.
But the expressions
"10-1",
"10+1"
work as expected.
I also tried use this regexp on regexpal.com - site with javascript implementation of RegExp. On this site expressions like "10.1" and "10,1" works fine.
Is there any difference in java vs javascript in capturing dots? What am I doing wrong?

This works for me
Pattern fieldsPattern = Pattern.compile("(\"[^\"]+\")");
String field =" aa \"10\" \"10.1\" and \"10,1\"";
Matcher fieldsMatcher = fieldsPattern.matcher(field);
while(fieldsMatcher.find()) {
System.out.println(fieldsMatcher.group());
}
prints
"10"
"10.1"
"10,1"
The second set of brackets in the regex appear to be redundant, but are harmless.
