Extracting image src in HTML with jsoup - java

I tried to get the .png file's link with this code:
Elements e = rawData.select("img[class=competitive-rank]");
for (Element el : e) {
    playerRankIconURL = el.attr("src");
    println(playerRankIconURL);
}
But it does not seem to be working properly. What am I doing wrong?

Your selector is looking for an img with the class competitive-rank, but there isn't one. It's the div which has that class.
You probably instead want to select an img which is contained by a div with that class, which you could do with the selector div.competitive-rank img.
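A minimal, self-contained sketch of that descendant selector (the markup below is invented to mirror the question's structure, where the class sits on a div wrapping the img):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RankIconDemo {
    // Selects the img nested inside the div carrying the class
    static String extractSrc(String html) {
        Document doc = Jsoup.parse(html);
        Element img = doc.selectFirst("div.competitive-rank img");
        return img == null ? "" : img.attr("src");
    }

    public static void main(String[] args) {
        String html = "<div class=\"competitive-rank\"><img src=\"rank.png\"></div>";
        System.out.println(extractSrc(html)); // prints rank.png
    }
}
```

Note that selectFirst returns null when nothing matches, which is often the real cause of a selector that "seems to be not working".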

Related

Why does JSoup create empty files named after the link?

I'm trying to get all the images of a website, but why does Jsoup not only get the images of the page but also create a document named after the part of the link following the slash?
Elements imageElements = document.select("img[src$=.png], img[src$=.jpg], img[src$=.jpeg]");
for (Element imageElement : imageElements) {
    String strImageURL = imageElement.attr("abs:src");
}
Here is the full code
I found a way to fix it; it is probably a prnt.sc issue.
Instead of selecting everything with the img tag like this:
Elements img = doc.getElementsByTag("img");
I selected everything whose src ends in .png, .jpg, or .jpeg. Here is the code:
Elements imageElements = document.select("img[src$=.png], img[src$=.jpg], img[src$=.jpeg]");
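A sketch of that selector together with abs:src (the HTML string and base URI below are made up for illustration; with a real fetch via Jsoup.connect(url).get(), the base URI is set automatically):

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageUrlDemo {
    // Resolves each matching relative src against the document's base URI via abs:src
    static List<String> pngJpgUrls(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("img[src$=.png], img[src$=.jpg], img[src$=.jpeg]")) {
            urls.add(img.attr("abs:src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<img src=\"/a/logo.png\"><img src=\"icon.svg\"><img src=\"b/photo.jpg\">";
        for (String u : pngJpgUrls(html, "https://example.com/page/")) {
            System.out.println(u);
        }
    }
}
```

The .svg image is filtered out by the selector, and the two remaining srcs come back as absolute URLs.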

JavaFx WebEngine - Overwriting a website's stylesheet with (local) files

I'd like to customise the appearance of a website that I am loading, so I created a little test.css file that does nothing but changing the look of all table rows:
tr {
height: 22px;
background-image: url("test.png");
}
How do I get the WebEngine to load this file and replace the page's own CSS rules with mine?
Also, I'd like to be able to load page-specific CSS files rather than one huge file for all pages.
I found this page, but it only shows how to run through the DOM and assign a new style to the desired elements by hand. This is, of course, not what I want. Instead, I'd like the browser to use my files as 'user defaults'.
Thanks for any help :)
First off, I have to state that I hope you know what you are doing, as these things can seriously damage a web site.
So here is what you can do:
You grab the Document from the WebEngine, retrieve the head element, and append a style child element to it containing the contents of the stylesheet you want to add.
Document doc = webView.getEngine().getDocument();
URL scriptUrl = getClass().getResource(pathToAttachedDocument); // pathToAttachedDocument = CSS file you want to attach
String content = IOUtils.toString(scriptUrl); // Use Apache Commons IO for convenience
Element appendContent = doc.createElement("style");
appendContent.appendChild(doc.createTextNode(content));
doc.getElementsByTagName("head").item(0).appendChild(appendContent);
By the way, JavaScript can be added in a similar way; it's just 'script' instead of 'style'.
I would do it like this to ADD or REPLACE any rules:
String css = getFileAsString("style.css");
Document doc = webEngine.getDocument();
Element e = doc.getElementById("my_style");
e.setTextContent(css);
... given a
<style id="my_style"></style>
tag in the HTML document.
setUserStyleSheetLocation() was designed for that very purpose: to let the user of the web page style it as they want.
Usage:
webEngine.setUserStyleSheetLocation(styleSheetURL.toString());

Unable to click on images selector SELENIUM JAVA

I am unable to click on a .png image and encounter an error.
HTML:
<a onmouseover="i2uiSetMenuCoords(this,event)" href="javascript:showMenu('9721')"><img hspace="1" src="./skins/e2-modern/images/dropdown.png" border="0px"></a>
Code:
if (navigateToDetails) {
SearchListSelectorExt selector = new SearchListSelectorExt();
//switchToFrame(getFrames(FRAME_TYPE.rcp_content));
//switchToFrame(getHeaderFrames());
WebElement element= selector.get(By.xpath("//a[contains(#src,'./skins/e2-modern/images/dropdown.png'"));
Object value = selector.getElementValue(element);
systemDocID = value.toString();
selector.clickName(systemDocID);
//selector.clickName(CustomerItem);
}
Your XPath is wrong. Use the one below:
//a/img[contains(@src,'/skins/e2-modern/images/dropdown.png')]
Hope this helps you. Kindly get back if it is not working.
Try the XPath below:
//img[contains(@src,'dropdown.png')]
Here, we are directly looking for the img tag such that its src attribute contains the text dropdown.png.
If there is more than one web element satisfying the above XPath, then try to make it unique by adding extra attributes or a parent:
//a/img[contains(@src,'dropdown.png')]
//img[@hspace='1' and contains(@src,'dropdown.png')]
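If a browser session isn't handy, the corrected expression can be sanity-checked with the JDK's built-in XPath engine. The snippet below is a sketch that rewrites the question's markup as well-formed XML (img self-closed, the onmouseover attribute dropped):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XPathCheck {
    // Evaluates the expression against the XML and returns the matched img's src
    static String matchSrc(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Element img = (Element) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
        return img == null ? "" : img.getAttribute("src");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a href=\"javascript:showMenu('9721')\">"
                   + "<img hspace=\"1\" src=\"./skins/e2-modern/images/dropdown.png\" border=\"0px\"/></a>";
        System.out.println(matchSrc(xml, "//a/img[contains(@src,'dropdown.png')]"));
    }
}
```

Once the expression matches here, the same string can be handed to Selenium's By.xpath(...) for the actual click.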

HTML Parsing and removing anchor tags while preserving inner html using Jsoup

I have to parse some HTML and remove the anchor tags, but I need to preserve the innerHTML of the anchor tags.
For example, if my HTML text is:
String html = "<div> <p> some text <a href=\"#\"> some link text </a> </p> </div>";
Now I can parse the above html and select for a tag in jsoup like this,
Document doc = Jsoup.parse(inputHtml);
//this would give me all elements which have anchor tag
Elements elements = doc.select("a");
and I can remove all of them with:
element.remove();
But that would remove the complete anchor tag from start bracket to close bracket, and the inner HTML would be lost. How can I preserve the inner HTML while removing only the start and close tags?
Also, Please Note : I know there are methods to get outerHTML() and
innerHTML() from the element, but those methods only give me ways to
retrieve the text, the remove() method removes the complete html of
the tag. Is there any way in which I can only remove the outer tags
and preserve the innerHTML ?
Thanks a lot in advance and appreciate your help.
--Rajesh
Use unwrap(); it preserves the inner HTML:
doc.select("a").unwrap();
check the api-docs for more info:
http://jsoup.org/apidocs/org/jsoup/select/Elements.html#unwrap%28%29
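A small sketch, on a throwaway HTML string, of what unwrap() does:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UnwrapDemo {
    // Drops every <a> tag but keeps its children in place
    static String stripAnchors(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a").unwrap();
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(stripAnchors("<p>some text <a href=\"#\">some link text</a></p>"));
        // the anchor tags are gone, "some link text" stays inside the <p>
    }
}
```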
How about extracting the inner HTML first, adding it to the DOM and then removing your tags? This code is untested, but should do the trick:
Edit:
I updated the code to use replaceWith(), making the code more intuitive and probably more efficient; thanks to A.J.'s hint in the comments.
Document doc = Jsoup.parse(inputHtml);
Elements links = doc.select("a");
String baseUri = links.get(0).baseUri();
for(Element link : links) {
Node linkText = new TextNode(link.html(), baseUri);
// optionally wrap it in a tag instead:
// Element linkText = doc.createElement("span");
// linkText.html(link.html());
link.replaceWith(linkText);
}
Instead of using a text node, you can wrap the inner html in anything you want; you might even have to, if there's not just text inside your links.

Read href inside anchor tag using Java

I have an HTML snippet like this:
<a href="XXXXXXXXXX">View or apply to job</a>
I want to read the href value XXXXXXXXXX using Java.
Point to note: I am reading the HTML file from a URL using new InputStreamReader(url.openStream()).
I am getting a complete HTML file, and above snippet is a part of that file.
How can I do this?
Thanks
Karunjay Anand
Use an HTML parser like Jsoup. The API is easy to learn, and for your case the following code snippet will do:
URL url = new URL("http://example.com/");
Document doc = Jsoup.parse(url, 3*1000);
Elements links = doc.select("a[href]"); // a with href
for (Element link : links) {
System.out.println("Href = "+link.attr("abs:href"));
}
Use an HTML parser like TagSoup or something similar.
You can use Java's own HTMLEditorKit for parsing HTML. This way you won't need to depend on any third-party HTML parser. Here is an example of how to use it.
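Since the original example link did not survive, here is a minimal sketch of pulling href values out with the JDK's HTMLEditorKit parser; the helper name and test string below are invented for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HrefExtractor {
    // Collects the href attribute of every <a> start tag in the HTML
    static List<String> extractHrefs(String html) throws Exception {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) hrefs.add(href.toString());
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractHrefs("<a href=\"XXXXXXXXXX\">View or apply to job</a>"));
    }
}
```

It is far more limited than Jsoup (no CSS selectors, lenient but dated parsing), but it ships with the JDK.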
