I want to replace all relative links with absolute links in a String that holds the contents of an HTML file. I wrote the following method, which does not work: some links end up with the base URL duplicated, like http://www.google.dehttp://www.google.de/resource?
public static String replacePattern(URL targetUrl,String urlAsString,String patternString) throws IOException{
System.out.println(targetUrl.toString());
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(urlAsString);
Set<String> replacedStrings = new TreeSet<String>();
//return matcher.replaceAll(targetUrl.toString()+"$0");
while (matcher.find()) {
String relativeLink = matcher.group(1);
//System.out.println("Find Link " + relativeLink);
if(!replacedStrings.contains(relativeLink)){
//System.out.println("Relative Link " + relativeLink);
String newLink = targetUrl.toString() + relativeLink;
//System.out.println("New Link " + newLink);
urlAsString = urlAsString.replace(relativeLink,newLink);
replacedStrings.add(relativeLink);
}
}
return urlAsString;
}
urlAsString is a String that contains the whole page content. My patterns are
href=['\"](/[^'\"]+)['\"]
and
src=['\"](/[^'\"]+)['\"]
Use the URL class to resolve a relative link against a base URL:
URL baseUrl = new URL("http://www.domain.com/folder/");
URL url = new URL(baseUrl, "url.html");
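One likely cause of the duplicated base URL in the question is that String.replace rewrites every occurrence of the relative link, including occurrences inside links that were already made absolute (for example when the method runs once for href and once for src). A minimal sketch that rewrites each match exactly once, combining the URL class with Matcher.appendReplacement, could look like this. The class name LinkRewriter and the method absolutize are hypothetical, and the pattern only covers quoted href and src values that start with a slash.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
class LinkRewriter {
    // Group 1: attribute name plus opening quote, group 2: link starting with '/',
    // group 3: closing quote.
    private static final Pattern LINK =
            Pattern.compile("((?:href|src)=['\"])(/[^'\"]+)(['\"])");

    static String absolutize(String baseUrl, String html) {
        try {
            URL base = new URL(baseUrl);
            Matcher m = LINK.matcher(html);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                // Resolve the relative link against the base URL.
                String absolute = new URL(base, m.group(2)).toString();
                // quoteReplacement keeps '$' and '\' in the link from being
                // interpreted as group references.
                m.appendReplacement(out,
                        Matcher.quoteReplacement(m.group(1) + absolute + m.group(3)));
            }
            m.appendTail(out);
            return out.toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve links against " + baseUrl, e);
        }
    }
}
```

Because each match is replaced at its own position, a link that appears several times is never processed twice, so no set of already replaced strings is needed.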
I am using Java 8 and would like to check whether a URL is valid based on a pattern.
If it is valid, I should get the attributes bookId, authorId, category, and mediaId.
Pattern: <basepath>/books/<bookId>/author/<authorId>/<isbn>/<category>/mediaId/<filename>
And this is the sample URL
URL => https:/<baseurl>/v1/files/library/books/1234-4567/author/56784589/32475622347586/media/324785643257567/507f1f77bcf86cd799439011_400.png
Here the base path is /v1/files/library.
I have seen some pattern-matching examples, but I couldn't relate them to my use case, probably because I am not good at regex. I am also using Apache Commons utilities, but I am not sure how to achieve this with them either.
Any help or hint would be appreciated.
Try this solution (uses named capture groups in regex):
public static void main(String[] args)
{
Pattern p = Pattern.compile("http[s]?:.+/books/(?<bookId>[^/]+)/author/(?<authorId>[^/]+)/(?<isbn>[^/]+)/media/(?<mediaId>[^/]+)/(?<filename>.+)");
Matcher m = p.matcher("https:/<baseurl>/v1/files/library/books/1234-4567/author/56784589/32475622347586/media/324785643257567/507f1f77bcf86cd799439011_400.png");
if (m.matches())
{
System.out.println("bookId = " + m.group("bookId"));
System.out.println("authorId = " + m.group("authorId"));
System.out.println("isbn = " + m.group("isbn"));
System.out.println("mediaId = " + m.group("mediaId"));
System.out.println("filename = " + m.group("filename"));
}
}
prints:
bookId = 1234-4567
authorId = 56784589
isbn = 32475622347586
mediaId = 324785643257567
filename = 507f1f77bcf86cd799439011_400.png
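The same idea can be wrapped in a reusable method that answers both parts of the question: whether the URL is valid, and which attributes it carries. The sketch below is Java 8 compatible and returns an empty Optional for URLs that do not match. The class name BookUrlParser is hypothetical, and it assumes the layout of the sample URL (a literal media segment followed by mediaId) and the base path /v1/files/library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the segment layout is taken from the sample URL.
class BookUrlParser {
    private static final String[] GROUPS =
            {"bookId", "authorId", "isbn", "mediaId", "filename"};

    private static final Pattern BOOK_URL = Pattern.compile(
            "https?:.+/v1/files/library/books/(?<bookId>[^/]+)"
            + "/author/(?<authorId>[^/]+)"
            + "/(?<isbn>[^/]+)"
            + "/media/(?<mediaId>[^/]+)"
            + "/(?<filename>[^/]+)");

    // Returns the named parts if the whole URL matches, otherwise Optional.empty().
    static Optional<Map<String, String>> parse(String url) {
        Matcher m = BOOK_URL.matcher(url);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String group : GROUPS) {
            parts.put(group, m.group(group));
        }
        return Optional.of(parts);
    }
}
```

A caller can then write BookUrlParser.parse(url).isPresent() for the validity check and read individual values with get("bookId") and so on.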
I have a String xxxxxxxxsrc="/slm/attachment/63338424306/Note.jpg"xxxxxxxx. I want to extract the substrings slm/attachment/63338424306/Note.jpg and Note.jpg from it into the variables temp1 and temp2.
How can I do that using regex in Java?
Note: 63338424306 could be any number, and Note.jpg could be anything,
like Note.png, abc.jpg, xxxx.yyy, etc.
Please help me extract these two strings using regex.
You can use a negative lookbehind to get the file name
((?:.(?<!/))+)\"
and the regex below to get the full path
/(.*)\"
Sample code
public static void main(String[] args) {
Pattern pattern = Pattern.compile("/(.*)\"");
Pattern pattern1 = Pattern.compile("((?:.(?<!/))+)\"");
String matchString = "/slm/attachment/63338424306/Note.jpg\"xxxxxxxx";
Matcher matcher = pattern.matcher(matchString);
String fullString = "";
while (matcher.find()) {
fullString = matcher.group(1);
}
matcher = pattern1.matcher(matchString);
String fileName = "";
while (matcher.find()) {
fileName = matcher.group(1);
}
System.out.println(fullString + " " + fileName);
}
As per your comment, I am taking the string as declared in my code below.
Please clarify if your input string is different or if I'm missing something.
public static void main(String[] args) {
String str = "xxxxxxxxsrc=\"/slm/attachment/63338424306/Note.jpg\"xxxxxxxx";
String url = null;
// The below pattern will grab string between quotes
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group(1));
url = m.group(1);
}
// and this will grab filename from the path(url)
p = Pattern.compile("(?:.(?<!/))+$");
m = p.matcher(url);
while (m.find()) {
System.out.println(m.group());
}
}
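Both values can also be captured in one pass with nested groups, which avoids running two separate patterns. The following is a sketch under the assumption that the value always appears as src="/..." with double quotes; the class name SrcPathExtractor and the method extract are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class SrcPathExtractor {
    // Group 1: the path without the leading '/', group 2: the last path segment.
    private static final Pattern SRC =
            Pattern.compile("src=\"/((?:[^\"/]+/)*([^\"/]+))\"");

    // Returns {path, fileName}, or null if the input has no matching src attribute.
    static String[] extract(String input) {
        Matcher m = SRC.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }
}
```

For the string from the question, the first element would be temp1 (the path without the leading slash) and the second element temp2 (the file name).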
I have an id (Long) that JavaScript cannot handle. So when I return id, I also want to return another property, id_str.
Like this:
{"id":43777753494847488, "id_str":"43777753494847488"}
I am using the FasterXML Jackson writeValueAsString(object) method.
What should I do?
I failed to rewrite the JsonSerializer; maybe it's too hard for me. So I modify the JSON string instead. Here is the code:
public static String expandUserIDStr(String json) {
String key = "user_id";
String expandKey = "user_id_str";
String r = "\"" + key + "\":(\\d+)[,]{0,1}";
Pattern patter = Pattern.compile(r);
Matcher matcher = patter.matcher(json);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
String expandContent = "\"" + expandKey + "\":\"" + matcher.group(1) +"\"," + matcher.group(0);
System.out.println(expandContent);
matcher.appendReplacement(buffer, expandContent);
}
matcher.appendTail(buffer);
return buffer.toString();
}
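The same string-level approach can be written more compactly with replaceAll and back-references, with the key passed in as a parameter and optional whitespace around the colon tolerated. This is only a sketch: the class name IdStrExpander is hypothetical, and like the code above it treats JSON as plain text, so it assumes the key never appears unescaped inside a string value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; works on the serialized JSON text, not on a parsed tree.
class IdStrExpander {
    // Adds "<key>_str":"<digits>" right after every numeric "<key>":<digits> pair.
    static String expand(String json, String key) {
        Pattern numericField =
                Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*(\\d+)");
        // $0 is the original pair, $1 the digits; the key itself is quoted so that
        // '$' or '\' in it cannot be read as a group reference.
        String replacement =
                "$0,\"" + Matcher.quoteReplacement(key) + "_str\":\"$1\"";
        return numericField.matcher(json).replaceAll(replacement);
    }
}
```

If changing the serialized class is an option, an additional getter that returns String.valueOf(id) and is annotated with @JsonProperty("id_str") may be simpler than post-processing the string.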
I need to do a conditional replacement with regex in Java (see the JavaScript sample below). I couldn't find an easy way to do it in Java. Does anyone know whether Java has an equivalent function or callback?
On Android, I need to find an img tag that contains a certain token and, if such a tag is found, modify its src attribute. The img tag may also have other attributes, which should be kept, and they can appear anywhere inside the tag. The src value is not known in advance; I only know that it should be modified.
So the code looks for the token attribute in an img tag and modifies the src of that tag.
(img attribute1 token="this is the token" style="width:100px;......" src="http://......" someOtherAttributes /)
or
(img src="http://......" attribute1 token="this is the token" someOtherAttributes style="width:100px;......" /)
In the result only the src is modified, like
(img src="http://....../pathabc" attribute1 token="this is the token" someOtherAttributes style="width:100px;......" /)
In JavaScript you could do:
var passedInToken = "this is the token";
var srcPath = "/pathabc";
var regex = new RegExp("(?:<img\\s)([^<]*)(?:token=\""+passedInToken+"\"\\s*)([^>]*)>", "gi");
var foundToken = regex.test(testSourceHtmlString);
if (foundToken) {
testSourceHtmlString = testSourceHtmlString.replace(regex, function(matchStr, grp1, grp2){
var attributeStrToBeReserved = grp1+" "+grp2;
var findSrcRegex = new RegExp("(src=\".*?\")");
if (findSrcRegex.test(attributeStrToBeReserved)){
attributeStrToBeReserved = attributeStrToBeReserved.replace(findSrcRegex, function(match, g1){
return g1 + srcPath;
});
}
return "<img token=\""+passedInToken+"\""+attributeStrToBeReserved+">";
});
}
It looks like using a Matcher may achieve the same in Java. Is there a better approach?
String passedInToken = "this is the token";
String srcPath = "/pathabc";
StringBuffer sb = new StringBuffer();
String srcPatternString = "(src=\".*?\")";
Pattern srcPattern = Pattern.compile(srcPatternString);
String patternString = "(?:<img\\s)([^<]*)(?:token=\""+passedInToken+"\"\\s*)([^>]*)>";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(htmlStr);
while(matcher.find()) {
String srcPart = matcher.group(1) + " " + matcher.group(2);
StringBuffer srcSb = new StringBuffer();
Matcher strMatcher = srcPattern.matcher(srcPart);
while (strMatcher.find()) {
strMatcher.appendReplacement(srcSb, strMatcher.group(1)+ srcPath);
}
strMatcher.appendTail(srcSb);
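// Rebuild the tag with the token attribute, as the JavaScript version does.
String srcStr = "<img token=\"" + passedInToken + "\" " + srcSb.toString() + ">";
// quoteReplacement keeps '$' and '\' in the tag from being read as group references.
matcher.appendReplacement(sb, Matcher.quoteReplacement(srcStr));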
}
matcher.appendTail(sb);
String newHtmlStr = sb.toString();
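The find / appendReplacement / appendTail loop is the usual Java counterpart of a JavaScript replace callback, and it also works with StringBuffer on Android. One possible simplification is to match each img tag as a whole, check for the token with contains, and rewrite only the src value, which leaves all other attributes in their original order. The sketch below follows that idea; the class name ImgSrcRewriter and the method appendToSrc are hypothetical, and it assumes double-quoted attributes and no '>' inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a regex-based sketch, not a full HTML parser.
class ImgSrcRewriter {
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\s[^>]*>", Pattern.CASE_INSENSITIVE);
    // Group 1: ' src="', group 2: the current value, group 3: closing quote.
    private static final Pattern SRC_ATTR =
            Pattern.compile("(\\ssrc=\")([^\"]*)(\")");

    static String appendToSrc(String html, String token, String srcPath) {
        String tokenAttr = "token=\"" + token + "\"";
        Matcher tag = IMG_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            String imgTag = tag.group();
            // The "callback" part: only tags carrying the token are changed.
            if (imgTag.contains(tokenAttr)) {
                imgTag = SRC_ATTR.matcher(imgTag).replaceFirst(
                        "$1$2" + Matcher.quoteReplacement(srcPath) + "$3");
            }
            tag.appendReplacement(out, Matcher.quoteReplacement(imgTag));
        }
        tag.appendTail(out);
        return out.toString();
    }
}
```

Here the path is inserted before the closing quote of the src value, which matches the desired result shown in the question.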
I want to know how I can copy the "?ned=us&topic=t" part of "http://news.google.com/?ned=us&topic=t". Basically, I want to copy the path of the URL, i.e. the portion after the ".com". How do I do this?
public class Example {
public static String url = "http://news.google.com/?ned=us&topic=t";
public static void main(String[] args) {
WebDriver driver = new FirefoxDriver();
driver.get(url);
WebElement reportCln=driver.findElement(By.id("id_submit_button"));
String path=driver.getCurrentUrl();
System.out.println(path);
}
}
You should have a look at the java.net.URL class and its getPath() and getQuery() methods.
@Test
public void urls() throws MalformedURLException {
final URL url = new URL("http://news.google.com/?ned=us&topic=t");
assertEquals("ned=us&topic=t", url.getQuery());
assertEquals("?ned=us&topic=t", "?" + url.getQuery());
assertEquals("/", url.getPath());
}
Regular expressions are fun, but IMO this is easier to understand.
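Since the question asks for everything after the host, the path and the query can be joined back together. The sketch below uses java.net.URI, which offers the same kind of accessors without a checked exception; the class name UrlParts and the method pathAndQuery are made up for this example.

```java
import java.net.URI;

// Hypothetical helper for illustration.
class UrlParts {
    // Returns the part of the URL after the host: path plus "?query" if present.
    static String pathAndQuery(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }
}
```

With the URL from the question this yields the leading slash followed by the query string; dropping the first character gives exactly the "?ned=us&topic=t" part.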
Try this:
String request_uri = null;
String url = "http://news.google.com/?ned=us&topic=t";
if (url.startsWith("http://")) {
request_uri = url.substring(7).split("/")[1];
} else {
request_uri = url.split("/")[1];
}
System.out.println (request_uri); // prints: ?ned=us&topic=t
If you're only interested in the query string, e.g. for google.com/search?q=key+words you want to ignore search?, then just split on ? directly and take the second element
// prints: q=key+words
System.out.println ("google.com/search?q=key+words".split("\\?")[1]);
You can use a regular expression to extract the part you want:
String txt = "http://news.google.com/?ned=us&topic=t";
String re1 = "(http:\\/\\/news\\.google\\.com\\/)"; // unwanted part
String re2 = "(\\?.*)"; // wanted part
Pattern p = Pattern.compile(re1 + re2, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String query = m.group(2);
System.out.print(query);
}