jsoup: removing iframe tags - java

I am using jsoup 1.6.1 and have a problem when I try to remove an iframe tag from HTML. When the iframe has no body (i.e. <iframe pro=value />), the remove() method removes all the content after that tag. Here is my sample code:
String html = "<p> This is start.</p><iframe frameborder=\"0\" marginheight=\"0\" /><p> This is end</p>";
Document doc = Jsoup.parse(html, "UTF-8");
doc.select("iframe").remove();
System.out.println(doc.text());
It returns:
This is start.
But I expect:
This is start. This is end
Thanks in advance

It appears that the closing tag for iframe is required. You can't use a self-closing tag:
http://msdn.microsoft.com/en-us/library/ie/ms535258(v=vs.85).aspx
http://stackoverflow.com/questions/923328/line-after-iframe-is-not-visible
http://www.w3resource.com/html/iframe/HTML-iframe-tag-and-element.php
So jsoup is following the spec: it treats whatever follows the iframe tag as the iframe's body. When you remove the iframe, "This is end" gets removed along with it.
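One possible workaround, sketched here on the assumption that you cannot change the incoming markup, is to rewrite self-closing iframe tags into an explicit open/close pair before handing the string to jsoup. The names IframeNormalizer and closeSelfClosingIframes are made up for this example, and the regular expression assumes that no attribute value contains a '>' character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup.
final class IframeNormalizer {

    // Matches <iframe ... /> and captures the attribute text in group 1.
    // Assumption: attribute values never contain '>'.
    private static final Pattern SELF_CLOSING_IFRAME =
            Pattern.compile("<iframe\\b([^>]*?)\\s*/>", Pattern.CASE_INSENSITIVE);

    // Rewrites every self-closing iframe into <iframe ...></iframe>.
    static String closeSelfClosingIframes(String html) {
        return SELF_CLOSING_IFRAME.matcher(html).replaceAll("<iframe$1></iframe>");
    }

    public static void main(String[] args) {
        String html = "<p> This is start.</p>"
                + "<iframe frameborder=\"0\" marginheight=\"0\" />"
                + "<p> This is end</p>";
        String fixed = closeSelfClosingIframes(html);
        System.out.println(fixed);
        // The normalized string can then be parsed as in the question:
        // Document doc = Jsoup.parse(fixed); doc.select("iframe").remove();
    }
}
```

With the closing tag in place, the paragraph after the iframe should no longer be swallowed as the iframe's body, so removing the iframe should leave it intact.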

Related

Get web elements as they are shown in View Source

Using this code below I get the href from a link element.
String url
List<WebElement> wElements = DriverFactory.getWebDriver().findElements(By.className("link-class"))
if(wElements.size() > 0) {
url = wElements[0].getAttribute("href")
}
The problem is that it finds the element but not the href attribute!!!
If I use "Firefox Inspector" the element appears as <span> with the right class name but without href attribute.
<div><span class="link-class">The Title</span></div>
If I use the "View Page Source" the same element appears as an <a> tag, it has href attribute, but different class name!!!
<a href="/href/attribute/here" class="other-link-class"><span>The Title</span></a>
So, is there any way to get the href of an <a> element as it is shown in "View Page Source"? Using Java of course.
There can be differences between the web elements as shown through View Source and as shown through the Inspector tool.
Both are browser features that let users look at the HTML of a web page, but they show different things. View Source shows the HTML exactly as it was delivered from the AUT (Application Under Test) to the browser. Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. In short, View Source shows the original markup, including any scripts, but not the changes those scripts make at runtime, and HTML errors may appear corrected in the Inspect Element tool.
Selenium works against the live DOM, the same one the Firefox Inspector shows. Since the <span> elements there do not contain href attributes, the href values can't be collected when you execute your tests.

How to locate ::before element in Selenium and get its contents?

screenshot.png
Hi All,
I have a situation where a ::before pseudo-element in the HTML renders the asterisk (mandatory field marker) on the page. Please see the attached screenshot.
HTML code:
<lightning-input-field class="customRequired abc">
::before
<lightning-picklist>
</lightning-picklist>
</lightning-input-field>
How to write xpath for ::before?
I think there is no straightforward solution. Pseudo-elements such as ::before are not part of the DOM, so XPath cannot select them. Use JavaScript instead:
querySelector takes CSS selectors
css_selector = 'lightning-input-field.customRequired'
browser.execute_script("return window.getComputedStyle(document.querySelector('{}'),':before').getPropertyValue('content')".format(css_selector))

Reading HTML using jsoup

I am trying to get an HTML element from a website using jsoup, but the HTML that I get from Jsoup.connect(url) is incomplete compared to what I see in the inspector on the website.
EDIT : this is the link i'm working with https://www.facebook.com/livemap##35.831640894,24.82275312499999,2z
The numbers in the end designate the coordinates of the map, and you don't have to sign in to access the page, so there is no authentication problem
UPDATE :
I have found that the element I want does not get expanded when using jsoup. Is this a problem related to slow page loading? If so, how can I make sure that Jsoup.connect(url) fully loads the web page before fetching the HTML?
from inspector (the <div id="u_0_e"> is expanded)
from jsoup.connect (the <div id="u_0_e"> is not expanded)
Jsoup doesn't execute JavaScript or jQuery events, so you get the initial page as it is before any JavaScript has run.

Trouble finding element with cssSelector having parameters at end of URL

I have a link in my application as mentioned below:
<a title="xyz" href="abc/home?locale=en"> some text </a>
I wrote a cssSelector to get this element.
a[href*='home?locale=en']
The problem is that this css selector works fine in Firebug, Firepath and Chrome console. However it does not identify element(s) with Selenium WebDriver i.e. By.cssSelector("a[href*='home?locale=en']") does not work.
I identified that the ? character is the problem. However, I do not know how to work around it.
Instead of using href with cssSelector, use By.linkText() or By.partialLinkText() to identify the element. Try the selector below.
driver.findElement(By.partialLinkText("some text"));
The trouble was that my server appends a jsessionid to the href. The jsessionid is not shown on the browser side.
href as printed by System.out.print:
https://abc/home;jsessionid=3213323123123AAA3232?locale=en
Same href on browser side:
/abc/home?locale=en
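Given the two href values above, one way to compare or match links regardless of the session ID is to strip the ;jsessionid=... path parameter first. The sketch below assumes the ID always appears in that form, directly before the query string. SessionIdStripper and stripJsessionId are hypothetical names. On the selector side, matching the two stable parts separately, for example a[href*='home'][href*='locale=en'], should avoid the problem as well.

```java
import java.util.regex.Pattern;

// Hypothetical helper for normalizing hrefs read from the page.
final class SessionIdStripper {

    // ";jsessionid=" followed by everything up to '?', '#' or the end of the string.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#]*", Pattern.CASE_INSENSITIVE);

    // Returns the href without its jsessionid path parameter, if it has one.
    static String stripJsessionId(String href) {
        return JSESSIONID.matcher(href).replaceFirst("");
    }

    public static void main(String[] args) {
        String raw = "https://abc/home;jsessionid=3213323123123AAA3232?locale=en";
        System.out.println(stripJsessionId(raw)); // prints https://abc/home?locale=en
    }
}
```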

How to insert contents into HTML page's iframe using HTMLUnit?

How can I insert some iframes inside an HTML page's iframe?
<HTML>
<div id="data">
<iframe height="160" width="600">
</iframe>
</div>
</HTML>
I was able to find the specific location using XPath:
HtmlInlineFrame frame = (HtmlInlineFrame) page.getByXPath("//div[@id='data']/iframe").get(0);
I'm not clear on how to insert another HtmlPage (an iframe as an HtmlPage) inside this selected iframe. I have to insert more than one HtmlPage into this iframe. Please suggest a way.
((HtmlInlineFrame) page1.getByXPath("//div[@id='data']/iframe").get(0)).setTextContent(page2.asXml());
This will work, but there is a problem: a parser is working in between, so the content set from
page2.asXml();
is escaped. When the page is then viewed as XML, every '<' has been replaced with &lt; and every '>' with &gt;.
((HtmlInlineFrame) page1.getByXPath("//div[@id='data']/iframe").get(0)).appendChild(page2);
This fixes the earlier issue, but it still adds two unwanted lines.
