So I found that if you get the page source with WebDriver, you actually get the generated source of the entire DOM (not just the HTML source code of the page that loaded). You can then use this String to generate a Jsoup Document. This is cool, because Jsoup is much faster than WebDriver at searching for elements, and it has a much better API for doing so.
So, is there any way to turn a Jsoup Element into a WebDriver WebElement? I saw another post on Stack Overflow about using a method to generate an XPath from the Jsoup document, but that's not what I'm looking for, since WebDriver will still have to parse the page and use the XPath to look up the element, defeating the purpose (unless your purpose is purely to use Jsoup for its superior selector methods).
The reason I want to try to use Jsoup to find WebElements for WebDriver is that on some websites, WebDriver is very, very slow (I work for a company that automates hundreds of third-party websites; we have no control over these sites).
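For context, the handoff I'm describing looks something like this (a minimal sketch; the URL and selector are placeholders):

WebDriver driver = new FirefoxDriver();
driver.get("http://example.com");
// Parse the fully rendered DOM with Jsoup and do the element searching there.
Document doc = Jsoup.parse(driver.getPageSource());
Elements rows = doc.select("table.results tr");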
There seems to be a confusion between interactive and non-interactive tools here.
WebDriver tests are very often slow (in my experience) due to unnecessary and defensive waits and delays, improperly understood frameworks, and code written by junior or outsourced developers - but fundamentally also because WebDriver mimics a real user's actions in 'real time' on a real browser, communicating with the browser app via an API (based on a specification) and a protocol. It's interactive.
(Less so with HtmlUnit, PhantomJS etc.)
By contrast, Jsoup is just a glorified HTTP client with extra parsing capabilities. It's non-interactive, and ultimately works off a snapshot String of data. We'd expect it to be much faster for its particular use-cases.
Clearly both are HTTP clients of a sort, and can share static web content, which is why WebDriver could pass data off for processing by Jsoup (though I've never heard of this use-case before).
However, Jsoup can never turn one of its Elements (a Java snapshot object containing some properties) into a WebDriver WebElement, which is more a kind of 'live' proxy to a real and interactive object within a program like Firefox or Chrome. (Again, less so with HtmlUnit, PhantomJS etc.)
So you need to decide whether interactivity is important to you. If it's crucial to mimic a real user, WebDriver has to 'drive' the process using a real browser.
If it's not, then you can consider the headless browsers like HtmlUnit and (especially) PhantomJS, as they will be able to execute JavaScript and update the DOM in a way that the HTTP libraries and Jsoup can't. You can then pass the output to Jsoup etc.
Potentially, if you went down the PhantomJS route, you could do all your parsing there using the JavaScript API. See: Use PhantomJS to extract html and text etc.
For a lot of people, interactivity isn't important at all, and it's quicker to drop WebDriver completely and rely on the libraries.
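For example, when interactivity doesn't matter, the whole round trip can be a couple of lines of plain Jsoup (a sketch; this only works when the content you need is in the served HTML rather than built by JavaScript):

// Fetch and parse the static HTML directly; no browser involved.
Document doc = Jsoup.connect("http://example.com/page").get();
Elements links = doc.select("a[href]");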
I know this question is incredibly old, but for anyone who comes across it later: the method below will return an XPath for your Jsoup Element. I translated it to Java myself; the original source I copied the code from is https://stackoverflow.com/a/48376038/13274510.
You can then use the XPath with WebDriver.
Edit: Code works now
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static String jsoupToXpath(Element element) {
    List<String> components = new ArrayList<>();
    // Text nodes have an empty tag name; start from the enclosing element instead.
    Element child = element.tagName().isEmpty() ? element.parent() : element;
    while (child.parent() != null) {
        Element parent = child.parent();
        Elements siblings = parent.children();
        String componentToAdd;
        if (siblings.size() == 1) {
            // Only child: no positional index needed.
            componentToAdd = child.tagName();
        } else {
            // Compute this element's 1-based position among same-tag siblings.
            int x = 1;
            for (Element sibling : siblings) {
                if (child.tagName().equals(sibling.tagName())) {
                    if (child == sibling) {
                        break;
                    }
                    x++;
                }
            }
            componentToAdd = String.format("%s[%d]", child.tagName(), x);
        }
        components.add(componentToAdd);
        child = parent;
    }
    // Components were collected leaf-first; reverse to build the path from the root.
    Collections.reverse(components);
    return "/" + String.join("/", components);
}
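Usage with WebDriver would then look something like this (a sketch; doc, driver, and the div.price selector are assumptions/placeholders):

// Find the element with Jsoup's fast selectors, then hand its XPath to WebDriver.
Element jsoupElement = doc.select("div.price").first();
WebElement webElement = driver.findElement(By.xpath(jsoupToXpath(jsoupElement)));
webElement.click();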
In several threads here, there is a workaround posted for Selenium drag and drop on pages that use HTML5 drag and drop. The workaround involves using JavaScript to simulate the drag and drop, for example Unable to perform HTML5 drag and drop using javascript for Selenium WebDriver test, and https://gist.github.com/rcorreia/2362544. This solution works well on this page: http://the-internet.herokuapp.com/drag_and_drop.
The general approach is to read the JavaScript file here (https://gist.github.com/rcorreia/2362544#file-drag_and_drop_helper-js) into a string, referred to as 'jsfile' below.
Then, in Selenium (with Java), pass in the CSS selectors for the source and the destination, where #column-a is the ID of the source and #column-b is the target.
((JavascriptExecutor) driver).executeScript(jsfile +"$('#column-a').simulateDragDrop({ dropTarget: '#column-b'});");
It works like a champ on that page.
However, a similar approach does not seem to work on this page, https://crossbrowsertesting.github.io/drag-and-drop.html. Nothing happens when I run
((JavascriptExecutor) driver).executeScript(jsfile +"$('#draggable').simulateDragDrop({ dropTarget: '#droppable'});");
I have pages that seem to behave like this second page (i.e., no drag and drop happens). As a first step in understanding this, I'd like to get an idea of why this approach does not seem to work in the latter case.
On re-testing https://crossbrowsertesting.github.io/drag-and-drop.html, it looks like the straightforward use of the Actions class does the trick for drag and drop. In the particular app that I am testing, which is set up with some additional code to help with accessibility, I was able to get drag and drop happening by setting focus on the first element and hitting the Return key, then setting focus on the target element and hitting Return again. I am fairly sure that this is custom event handling, so it may not work in other applications. Just in case, I've posted code below which does this in Selenium.
// Simulates drag and drop for accessibility-enabled widgets: focus the source
// and press Enter, then focus the destination and press Enter again.
public void dndHtml5(String xPathSource, String xPathDestination) {
    clickEnterKeyOnElement(xPathSource);
    clickEnterKeyOnElement(xPathDestination);
}

public void clickEnterKeyOnElement(String xPath) {
    setFocusOnElement(xPath);
    WebElement target = element(xPath);
    target.sendKeys(Keys.ENTER);
}

public void setFocusOnElement(String xPath) {
    WebElement element = element(xPath);
    // Moving the mouse to the element gives it focus for the keyboard input.
    Actions actions = new Actions(driver);
    actions.moveToElement(element).build().perform();
}

public WebElement element(String xPath) {
    return driver.findElement(By.xpath(xPath));
}
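For reference, the straightforward Actions usage mentioned above would be something like this on the crossbrowsertesting page (a sketch):

// Native Selenium drag and drop via the Actions class.
WebElement source = driver.findElement(By.id("draggable"));
WebElement target = driver.findElement(By.id("droppable"));
new Actions(driver).dragAndDrop(source, target).build().perform();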
I want to automate a test using selenium java in which I need to check whether a specific text is NOT present on the entire page.
The page has many elements where this text may be present. If it were only a few elements, I could resolve this via the standard driver.findElement(By.xpath("//somepath")).getText(). But I want to write an efficient test that doesn't need tons of locators just for this check.
You can use the XPath selector //*[contains(text(), "YOUR_TEXT")] to find the text you need.
Example of the code on Python:
def find_text_on_the_page(text):
    selector = '//*[contains(text(), "{}")]'.format(text)
    elements = browser.find_elements_by_xpath(selector)
    assert len(elements), 'The text is not on the page'
Hope this helps you.
You could try to locate it via an XPath selector, but I would not recommend a test like that. Surely you know roughly where to look for the text - at least which part of the web page? Here is a code sample showing how to achieve this:
List<WebElement> list = driver.findElements(By.xpath("//*[contains(text(),'" + text + "')]"));
Assert.assertTrue("Text not found!", list.size() > 0);
or
String bodyText = driver.findElement(By.tagName("body")).getText();
Assert.assertTrue(bodyText.contains(text));
or in a method:
public boolean isTextOnPagePresent(String text) {
    WebElement body = driver.findElement(By.tagName("body"));
    String bodyText = body.getText();
    return bodyText.contains(text);
}
Hope this helps, but as I mentioned, try defining a smaller scope for the check.
I am trying to parse a webpage and extract data using Jsoup. But the link is dynamic and puts up a wait-for-loading page before displaying the details, so Jsoup seems to process the waiting page rather than the details page. Is there any way to make it wait until the page is fully loaded?
If some of the content is created dynamically once the page is loaded, then your best chance of parsing the full content would be to use Selenium with Jsoup:
WebDriver driver = new FirefoxDriver();
driver.get("http://stackoverflow.com/");
Document doc = Jsoup.parse(driver.getPageSource());
Probably, the page in question is generated by JavaScript in the browser (client-side). Jsoup does not interpret JavaScript, so you are out of luck. However, you could analyze the page loading in the network tab of the browser developer tools and find out which AJAX calls are made during page load. These calls also have URLs, and you may get all the information you need by accessing them directly. Alternatively, you can use a real browser engine to load the page, via a library like Selenium WebDriver or the JavaFX WebKit component if you are using Java 8.
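A sketch of the direct-AJAX approach (the endpoint URL is a hypothetical example of what you might find in the network tab):

// Hit the AJAX endpoint directly instead of the HTML shell.
// Jsoup will fetch non-HTML responses if told to ignore the content type.
String json = Jsoup.connect("http://example.com/api/details?id=42")
        .ignoreContentType(true)
        .execute()
        .body();
// Parse the JSON with the library of your choice from here.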
I think I am just expanding luksch's answer a bit more. I am not familiar with web frameworks, so the answer looked a little difficult to understand. Since the page loads dynamically, using a parser like Jsoup is difficult: we must know that all the elements have loaded completely before attempting to parse. So instead of parsing immediately, use WebDriver (Selenium) to check the elements' status, and once they are loaded, get the page source and parse it - or use WebDriver itself to gather the data you need instead of a separate parser.
WebDriver driver = new ChromeDriver();
driver.get("<DynamicURL>");
// Poll until the dynamically loaded elements appear and carry real values.
// valuePresent() and processElements() are the poster's own helper methods.
List<WebElement> elements = null;
while (elements == null)
{
    elements = driver.findElements(By.className("marker"));
    if (!valuePresent(elements))
    {
        elements = null; // not ready yet; keep polling
    }
}
processElements(elements);
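As a side note, Selenium's explicit waits do the same polling with a built-in timeout, which avoids looping forever if the elements never appear (a sketch, reusing the processElements helper from above):

// Wait up to 30 seconds for the dynamically added elements to show up.
WebDriverWait wait = new WebDriverWait(driver, 30);
List<WebElement> elements = wait.until(
        ExpectedConditions.presenceOfAllElementsLocatedBy(By.className("marker")));
processElements(elements);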
I'm trying to extract Google Translate's pinyin transliteration of a Chinese word using Selenium but am having some trouble finding its WebElement.
For example, the word I look up is "事". My code would be as follows:
String word = "事";
WebDriver driver = new HtmlUnitDriver();
driver.get("http://translate.google.com/#zh-CN/zh-CN/" + word);
When I go to the actual page in my browser, I can see that its pinyin is "Shì" and that its id, according to Inspect Element, is src-translit. However, when I view source, though id="src-translit" is present, you don't see anything resembling "Shì" nearby. It's simply empty.
Thinking that the page had had no time to load properly, I implemented a waiting period of 30 seconds (kind of a long wait, I know, but I just wanted to see if it would work).
int timeoutInSeconds = 30;
WebDriverWait wait = new WebDriverWait(driver, timeoutInSeconds);
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("src-translit")));
Unfortunately, even with the wait time, the transliteration element and its text still come back empty.
WebElement transliteration = driver.findElement(By.id("src-translit"));
String pinyin = transliteration.getText();
My question, then, is: what has happened to src-translit? Why won't it display in the HTML, and how can I go about finding it and copying it from Google Translate?
Sounds like JavaScript isn't being executed. Looking at the docs, you can enable JavaScript like this:
HtmlUnitDriver driver = new HtmlUnitDriver();
driver.setJavascriptEnabled(true);
or
HtmlUnitDriver driver = new HtmlUnitDriver(true);
See if that makes a difference.
EDIT:
I still think the problem is related to JavaScript. When I run it using FirefoxDriver, it works fine: the AJAX request is made, and the src-translit element is updated with "Shì".
Workaround:
In any case, monitoring the network traffic, you can see that when you want to translate 事, it makes an AJAX call to:
http://translate.google.com/translate_a/t?client=t&sl=zh-CN&tl=zh-CN&hl=en&sc=2&ie=UTF-8&oe=UTF-8&pc=1&oc=1&otf=1&rom=1&srcrom=1&ssel=0&tsel=0&q=%E6%B2%92%E4%BA%8B
Which returns JSON:
[[["事","事","Shì","Shì"]],,"zh-CN",,[["事",,false,false,0,0,0,0]],,,,[],10]
Maybe you could parse that instead for now.
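A sketch of that workaround, reusing Jsoup as a plain HTTP client (no guarantee the endpoint keeps behaving this way):

// Fetch the raw JSON the translate page requests via AJAX.
String json = Jsoup.connect("http://translate.google.com/translate_a/t")
        .data("client", "t")
        .data("sl", "zh-CN")
        .data("tl", "zh-CN")
        .data("rom", "1")
        .data("srcrom", "1")
        .data("q", "事")
        .ignoreContentType(true)
        .execute()
        .body();
// The transliteration ("Shì") is the third entry of the first inner array.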
I am using HtmlUnit in Jython and am having trouble selecting a pull-down link. The page I am going to has a table with other AJAX links, and I can click on them and move around and it seems okay, but I can't seem to figure out how to click on a pull-down menu that allows for more links on the page (this pull-down affects the AJAX table, so it's not redirecting me or anything).
Here's my code:
selectField1 = page.getElementById("pageNumSelection")
options2 = selectField1.getOptions()
theOption3 = options2[4]
This gets the option I want, and I verify it's the right one. So I select it:
MoreOnPage = selectField1.setSelectedAttribute(theOption3, True)
And I am stuck here (not sure whether selecting it works or not, because I don't get any message), and I'm not sure what to do next. How do I refresh the page to see the larger list? When clicking on links, all you have to do is find the link and call linkNameVariable.click(), and it works. But I'm not sure how to refresh a pull-down. When I try to use the web client to create an XML page based on the select variable, I still get the old page.
To make it a bit easier, I used the HtmlUnit Scripter and got some code that should work, but it's Java and I'm not sure how to port it to Jython. Here it is:
try
{
    HtmlPage page = webClient.getPage(url);
    HtmlSelect selectField1 = (HtmlSelect) page.getElementById("pageNumSelection");
    List<HtmlOption> options2 = selectField1.getOptions();
    HtmlOption theOption3 = null;
    // Find the option whose visible text is "100".
    for (HtmlOption option : options2)
    {
        if (option.getText().equals("100"))
        {
            theOption3 = option;
            break;
        }
    }
    selectField1.setSelectedAttribute(theOption3, true);
}
catch (IOException e)
{
    e.printStackTrace();
}
Have a look at HtmlForm's getSelectByName:
HtmlSelect htmlSelect = form.getSelectByName("stuff[1].type");
HtmlOption htmlOption = htmlSelect.getOption(3);
htmlOption.setSelected(true);
Be sure that WebClient.setJavaScriptEnabled is called. The documentation seems to indicate that it is on by default, but I think this is wrong.
Alternatively, you can use WebDriver, which is a framework that supports both HtmlUnit and Selenium. I personally find the syntax easier to deal with than HtmlUnit.
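If you go the WebDriver route, the equivalent selection would look something like this (a sketch using the pageNumSelection id from the question):

// Selenium's Select support class wraps <select> handling.
WebDriver driver = new HtmlUnitDriver(true); // true = JavaScript enabled
driver.get(url);
Select pageSize = new Select(driver.findElement(By.id("pageNumSelection")));
pageSize.selectByVisibleText("100");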
If I understand correctly, the selection of an option in the select box triggers an AJAX call which, once finished, modifies some part of the page.
The problem here is that since AJAX is, by definition, asynchronous, you can't really know when the call is finished and when you may inspect the page again to find the new content.
HtmlUnit has a class named NicelyResynchronizingAjaxController, an instance of which you can pass to the WebClient's setAjaxController method. As indicated in the Javadoc, using this AJAX controller will automatically make asynchronous calls coming from a direct user interaction synchronous instead. Once the setSelectedAttribute method is called, you'll thus be able to see the changes made to the original page.
The other option is to use WebClient's waitForBackgroundJavaScript method after the selection is done, and inspect the page once the background JavaScript has ended or the timeout has been reached.
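Putting both suggestions together, the setup would look something like this (a sketch):

// Make user-triggered AJAX calls synchronous before interacting with the select.
WebClient webClient = new WebClient();
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
HtmlPage page = webClient.getPage(url);
HtmlSelect select = (HtmlSelect) page.getElementById("pageNumSelection");
select.setSelectedAttribute(select.getOption(4), true);
// Alternatively, give any background JavaScript up to 10 seconds to finish.
webClient.waitForBackgroundJavaScript(10000);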
This isn't really an answer to the question, because I've not used HtmlUnit much before, but you might want to look at Selenium, and in particular Selenium RC. With Selenium RC you are able to control the interactions with a page displayed in a native browser (Firefox, for example). It has developer APIs for Java and Python, among others.
I understand that HtmlUnit uses its own JavaScript and web-browser rendering engine, and I'm wondering whether that may be the problem.