I am using Jsoup to fetch all images of a particular manga chapter from online manga sites, starting from only the first page's link.
I have successfully retrieved the total page count and the src of the first page's image. For example, given the link "http://www.mangapanda.com/feng-shen-ji/1/1", the output is:
Total page : 49
Title : Feng Shen Ji 1
ImageURL : http://i15.mangapanda.com/feng-shen-ji/1/feng-shen-ji-2974919.jpg
What I want to do now is fetch the src of the second page's image and then auto-increment to get the rest. The link to the second page's image appears in the HTML as:
<div id="prefetchimg" style="background-image: url("http://i34.mangapanda.com/feng-shen-ji/1/feng-shen-ji-2974921.jpg");"></div>
But when I use Jsoup like this:
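import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
String url = "http://www.mangapanda.com/feng-shen-ji/1";
Document doc = Jsoup.connect(url).userAgent("Mozilla").get();
Elements divs = doc.select("div");
for (Element divParse : divs) {
    if (divParse.id().equals("prefetchimg")) {
        System.out.println(divParse);
    }
}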
I only get
<div id="prefetchimg"></div>
Instead of
<div id="prefetchimg" style="background-image: url("http://i34.mangapanda.com/feng-shen-ji/1/feng-shen-ji-2974921.jpg");"></div>
How do I get the style attribute?
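A different route, which sidesteps the JavaScript-generated style attribute entirely, is to build every page URL from the first-page link and the total page count, then fetch each page with Jsoup and read its image src the same way as for page 1. The sketch below only generates the URLs. It assumes (this is not confirmed by the site) that page n of a chapter lives at the first-page URL with its last path segment replaced by n; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PageUrls {

    // Assumption: page n of the chapter is "<chapter prefix>/n", where the prefix
    // is everything up to and including the last '/' of the first-page URL.
    static List<String> pageUrls(String firstPageUrl, int totalPages) {
        int lastSlash = firstPageUrl.lastIndexOf('/');
        if (lastSlash < 0) {
            throw new IllegalArgumentException("No '/' in " + firstPageUrl);
        }
        String prefix = firstPageUrl.substring(0, lastSlash + 1);
        List<String> urls = new ArrayList<>();
        for (int page = 1; page <= totalPages; page++) {
            urls.add(prefix + page);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = pageUrls("http://www.mangapanda.com/feng-shen-ji/1/1", 49);
        System.out.println(urls.size());  // prints 49
        System.out.println(urls.get(1));  // prints http://www.mangapanda.com/feng-shen-ji/1/2
    }
}
```

Each generated URL could then be passed to Jsoup.connect(...) exactly as in the question's code.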
@eltabo said:
OK, in your case the tag has been modified by a JavaScript function, so Jsoup can't see this attribute.
That is correct: Jsoup only parses the HTML the server sends and does not run JavaScript. For pages whose markup is modified by JavaScript, use a JavaScript-capable client such as HtmlUnit.
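With a JavaScript-capable client the rendered style attribute becomes available, but the image address still has to be pulled out of the CSS background-image value. A minimal sketch using only java.util.regex, assuming the value has the url("...") form shown in the question; the class and method names are hypothetical, and URLs that themselves contain quotes or parentheses are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StyleUrl {

    // Matches url(...) with optional single or double quotes around the address.
    private static final Pattern CSS_URL =
            Pattern.compile("url\\(\\s*['\"]?([^'\")]+)['\"]?\\s*\\)");

    // Returns the first url(...) value in a CSS declaration string, if any.
    static Optional<String> firstCssUrl(String style) {
        if (style == null) {
            return Optional.empty();
        }
        Matcher m = CSS_URL.matcher(style);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String style = "background-image: url(\"http://i34.mangapanda.com/feng-shen-ji/1/feng-shen-ji-2974921.jpg\");";
        // prints http://i34.mangapanda.com/feng-shen-ji/1/feng-shen-ji-2974921.jpg
        System.out.println(firstCssUrl(style).orElse("no url"));
    }
}
```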
Related
How to fetch the anchor's href attribute shown in the screenshot, using Selenium in Java?
If you want to fetch the anchor node's href value, and the class name is unique and not dynamically generated, you can do the following:
WebElement element = driver.findElement(By.xpath("//div[@class='_6ks']/a"));
String url = element.getAttribute("href");
System.out.println("=> The URL is: " + url);
If that doesn't work, share the full HTML as text so that it is easier for us to track down the element.
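To check what an XPath like //div[@class='_6ks']/a selects, the same expression (extended with /@href) can be evaluated offline with the JDK's javax.xml.xpath API. This is a sketch on a made-up, well-formed snippet, not Selenium: real pages are often not well-formed XML, and @class='_6ks' matches only when the class attribute is exactly "_6ks" (an element with several classes would need contains(@class, '_6ks') instead). The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class HrefByXPath {

    // Evaluates //div[@class='_6ks']/a/@href on a well-formed snippet and
    // returns the href text, or "" when nothing matches.
    static String hrefOf(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate("//div[@class='_6ks']/a/@href",
                    new InputSource(new StringReader(xhtml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String snippet = "<div class=\"_6ks\"><a href=\"https://example.com/page\">link</a></div>";
        System.out.println(hrefOf(snippet)); // prints https://example.com/page
    }
}
```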
I am using Jsoup to parse a web page with the following command:
Document document = Jsoup.connect("http://www.blablabla.de/").get();
then
System.out.println(document.toString());
I get the desired result. But when I save the web page and then try the same operation on the saved file
Document doc = Jsoup.parse("/user/test/test.html","UTF-8");
System.out.println(doc.toString());
I get:
<html>
 <head></head>
 <body>
  /home/1.html
 </body>
</html>
My second issue is that I want to get the content of every single div of a specific class. I am using
Elements elements = document.select("div.things.subthings");
The divs I want to select look like this:
<div class="col_a col text">
<div class="text">
done
</div>
</div>
Regarding parsing the saved web page:
You are calling the wrong overload. The call actually resolves to this one:
static Document parse(String html, String baseUri) // Parse HTML into a Document.
You want to call this one:
static Document parse(File in, String charsetName) // Parse the contents of a file as HTML.
Try this instead:
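import java.io.File;
// Passing a File selects parse(File, String), which reads and decodes the file from disk
Document doc = Jsoup.parse(new File("/user/test/test.html"), "UTF-8");
System.out.println(doc.toString());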
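Java chooses an overload from the static types of the arguments at compile time, so two String arguments always select parse(String html, String baseUri), and the path string itself becomes the HTML to parse. The sketch below uses hypothetical stand-in methods (not jsoup's) to show the effect.

```java
import java.io.File;

public class OverloadDemo {

    // Hypothetical stand-ins mirroring the two parse overloads quoted above.
    static String parse(String html, String baseUri) {
        return "parsed as HTML markup: " + html;
    }

    static String parse(File in, String charsetName) {
        return "read from file: " + in.getPath();
    }

    public static void main(String[] args) {
        // Two Strings select parse(String, String): the path is treated as markup.
        System.out.println(parse("/user/test/test.html", "UTF-8"));
        // Wrapping the path in a File selects parse(File, String).
        System.out.println(parse(new File("/user/test/test.html"), "UTF-8"));
    }
}
```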
Regarding your second issue, getting the content of every div with a specific class:
Try one of the CSS queries below.
To find all divs with class="col_a col text":
div.col_a.col.text
To find all divs with class="col_a col text" OR class="text":
div.col_a.col.text, div.text
To find all divs with class="col_a col text" that have a div with class="text" among their descendants:
div.col_a.col.text:has(div.text)
How can I insert some iframes inside an HTML page's iframe?
<HTML>
<div id="data">
<iframe height="160" width="600">
</iframe>
</div>
</HTML>
I was able to find the specific location using XPath:
HtmlInlineFrame frame = (HtmlInlineFrame) page.getByXPath("//div[@id='data']/iframe").get(0);
I'm not clear on how to insert another HTML page (an iframe as an HTML page) inside this selected iframe. I have to insert more than one HTML page into this iframe. Please suggest some way.
((HtmlInlineFrame) page1.getByXPath("//div[@id='data']/iframe").get(0)).setTextContent(page2.asXml());
This works, but there is still a problem: the content set with page2.asXml() is treated as text, so when the page is viewed as XML afterwards, every '<' is replaced with &lt; and every '>' with &gt;.
((HtmlInlineFrame) page1.getByXPath("//div[@id='data']/iframe").get(0)).appendChild(page2);
This fixes the earlier issue, but it still adds two unwanted lines.
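The escaping follows from how DOM text works: setTextContent stores its argument as character data, so a serializer must write '<' as &lt;, whereas importing parsed nodes and appending them keeps the markup. The sketch below shows the same distinction with the JDK's standard W3C DOM rather than HtmlUnit, on a made-up host and child document; it does not reproduce the "two unwanted lines" mentioned above, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FrameContentDemo {

    // Parses a well-formed XML string into a DOM Document (sketch: broad catch).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a node back to a string without the XML declaration.
    static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Sets the child markup as plain text, so the serializer escapes it.
    static String insertAsText(String host, String childXml) {
        Document doc = parse(host);
        Element frame = (Element) doc.getElementsByTagName("iframe").item(0);
        frame.setTextContent(childXml);
        return serialize(doc.getDocumentElement());
    }

    // Parses the child and imports it as element nodes, so it stays markup.
    static String insertAsNodes(String host, String childXml) {
        Document doc = parse(host);
        Node child = doc.importNode(parse(childXml).getDocumentElement(), true);
        Element frame = (Element) doc.getElementsByTagName("iframe").item(0);
        frame.appendChild(child);
        return serialize(doc.getDocumentElement());
    }

    public static void main(String[] args) {
        String host = "<div id=\"data\"><iframe height=\"160\" width=\"600\"></iframe></div>";
        String child = "<p>page2</p>";
        System.out.println(insertAsText(host, child));  // '<' in the child comes out as &lt;
        System.out.println(insertAsNodes(host, child)); // the child stays a <p> element
    }
}
```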
I am trying to find an element on a page using Selenium. Here is some example content:
<body id="tinymce" class="mceContentBody " contenteditable="true" dir="ltr" style="overflow: auto;">
Here is how I am trying to select it:
driver.findElement(By.cssSelector("body#tinymce")).sendKeys("Hello, everyone!! Don't worry it is a test letter to check connection!!");
I do not get an element returned though.
It looks like you are testing against the TinyMCE editor.
The issues are:
It's inside an iframe, so you need to switch into the iframe first.
You need to send keys to the <body> element (not an <input>) inside that iframe.
Here is what to do:
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebElement;
// switch to the iframe; use a locator of your choice, "#editMe_ifr" here as an example
WebElement editorFrame = driver.findElement(By.cssSelector("#editMe_ifr"));
driver.switchTo().frame(editorFrame);
WebElement body = driver.findElement(By.tagName("body")); // then find the body
body.sendKeys(Keys.CONTROL + "a"); // send Ctrl+A to select all
body.sendKeys("Some text");
driver.switchTo().defaultContent(); // switch back to the main document when done
Further reading:
Interact with a cute editor using webdriver
Using C# with selenium and cleditor.
You can change your HTML to this:
<body>
<input id="tinymce" type="text"/>
</body>
And you can change the selector from body#tinymce to #tinymce. You shouldn't need to specify the tag name when selecting by id, because the id should be unique anyway.
I have the following HTML:
<div align='center' style='height:50px'>
<H1>A Simple Sample Web Page</H1>
<IMG SRC='http://sheldonbrown.com/images/scb_eagle_contact.jpeg'>
<H4>By Sheldon Brown</H4>
<H2>Demonstrating a few HTML features</H2>
</div>
HTML is really a very simple language. '
<P>
'command, which will insert a blank line.If you would like to make a link or
bookmark to this page, the URL is:
<BR>
http://sheldonbrown.com/web_sample1.html
</center>
But the image appears behind the text instead of below!
What's wrong?
If iText cannot handle it, which library is better?
This is my code:
import java.io.FileOutputStream;
import java.io.StringReader;
import java.util.List;
import com.itextpdf.text.Document;
import com.itextpdf.text.Element;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream("C:\\hello-world.pdf"));
document.open();
String content = "<div align='center' style='height:50px'><H1>A Simple Sample Web Page</H1><IMG SRC='http://sheldonbrown.com/images/scb_eagle_contact.jpeg'><H4>By Sheldon Brown</H4><H2>Demonstrating a few HTML features</H2></div>HTML is really a very simple language. '<P>' command, which will insert a blank line.If you would like to make a link or bookmark to this page, the URL is:<BR> http://sheldonbrown.com/web_sample1.html</center>";
// use the snippet for the PDF document
List<Element> objects = HTMLWorker.parseToList(new StringReader(content), null);
for (Element element : objects) {
    document.add(element);
}
document.close();
Do you have any CSS applied to this HTML? Have you managed to view this HTML some other way, for example in a browser (which one)? In a browser it renders as you describe: http://jsfiddle.net/TjUSJ/.
Maybe you want to remove the height style property on that <div>? The text seems to render in the middle, but it actually starts 50px from the top, where the div's fixed height ends, so it overlaps the image. See this other fiddle, without the height style: http://jsfiddle.net/TjUSJ/1/
Also, remember that the <center> tag is deprecated.
The problem was that I was using an old version.
I switched to the latest one, 5.1.2, and it works!