Java jsoup selecting links - java

I am trying to develop a web scraper. I can extract all the links from a page, but I want to get only some specific ones. I looked into it but could not manage it, as I don't have good knowledge of HTML.

You can use the CSS selector presented in the snippet below:
doc.select("div.indepth-content > div.content > ul.indepth-list a")
From the screenshot, it seems you're using the Chrome browser. If so, next time you can ask it to generate the CSS query for you:
Right click on the element you target
Click on "Inspect" (a node should appear selected)
Right click on this node, then select the Copy entry and its Copy selector sub-entry
=> The CSS selector is copied in the clipboard
Please note that Chrome tends to generate (very) long CSS queries. Also, it can't generate CSS selectors for matching multiple elements.
However, if you press CTRL + F while the DevTools pane is open and the Elements tab is selected, you can type a CSS selector and browse the matched elements.
For more details, you can have a look at the following resources:
JSoup CSS selector tutorial
JSoup CSS selector full syntax
How to generate CSS selectors with Chrome Developer Tools?

Element divcontent = doc.select("div.content").first();
Element ul = divcontent.select("ul.indepth-list").first();
Elements links = ul.select("a[href]"); // all anchors inside the list that carry an href
Written without an editor, so I can't be sure the syntax is correct.

Related

Get web elements as they shown through view source

Using this code below I get the href from a link element.
String url
List<WebElement> wElements = DriverFactory.getWebDriver().findElements(By.className("link-class"))
if(wElements.size() > 0) {
url = wElements[0].getAttribute("href")
}
The problem is that it finds the element but not the href attribute!!!
If I use "Firefox Inspector" the element appears as <span> with the right class name but without href attribute.
<div><span class="link-class">The Title</span></div>
If I use "View Page Source", the same element appears as an <a> tag. It has an href attribute, but a different class name!!!
<a href="/href/attribute/here" class="other-link-class"><span>The Title</span></a>
So, is there any way to get the href of an <a> element as it is shown in "View Page Source"? Using Java, of course.
There can be differences between the elements shown through View Source and those shown through the Inspector tool.
Both are browser features that let users look into the HTML of the webpage, but they show different things. View Source shows the HTML exactly as it was delivered from the AUT (Application under Test) to the browser. Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. In short, View Source shows the original markup, including the JavaScript source, but not the changes that JavaScript makes to the HTML, while HTML errors may already be corrected in the Inspect Element tool.
Since the Firefox Inspector shows no href attribute, the same holds when your tests run: WebDriver works against the live DOM, where the <span> elements do not contain an href attribute, so its value can't be collected.
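One possible way to read the link as it appears in View Page Source is to bypass the live DOM and work on the raw response body instead. The sketch below is an assumption, not something the answer above proposes: it uses the JDK's java.net.http.HttpClient (Java 11+) to download the page and a deliberately simple regular expression to pull the href of the first <a> tag with a given class. The names RawSourceLinks, fetch and hrefByClass are hypothetical, and a real HTML parser such as jsoup would be more robust than a regex.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawSourceLinks {

    // Downloads the HTML as the server sent it, before any JavaScript runs.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Returns the href of the first <a> tag whose class attribute equals cssClass.
    // Simplification (assumption): attributes are double-quoted.
    static Optional<String> hrefByClass(String rawHtml, String cssClass) {
        Pattern anchorTag = Pattern.compile("<a\\b[^>]*>", Pattern.CASE_INSENSITIVE);
        Pattern href = Pattern.compile("\\bhref=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
        String classAttr = "class=\"" + cssClass + "\"";
        Matcher tags = anchorTag.matcher(rawHtml);
        while (tags.find()) {
            String tag = tags.group();
            if (tag.contains(classAttr)) {
                Matcher m = href.matcher(tag);
                if (m.find()) {
                    return Optional.of(m.group(1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Uses the markup from the question instead of a network call.
        String source = "<a href=\"/href/attribute/here\" class=\"other-link-class\">"
                + "<span>The Title</span></a>";
        System.out.println(hrefByClass(source, "other-link-class").orElse("not found"));
    }
}
```

In a test, the string returned by fetch(...) would be passed to hrefByClass(...). The href in the source may be relative, so it might still need to be resolved against the page URL.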

Picking a JavaScript button in HtmlUnit

On AliExpress, on this page for example: https://www.aliexpress.com/item/32956185908.html
how do you pick a particular country and colour?
For example, the following JS configures the countries:
"skuPropertyValues":[
{"propertyValueDisplayName":"China","propertyValueId":201336100,"propertyValueIdLong":201336100,"propertyValueName":"China","skuPropertySendGoodsCountryCode":"CN","skuPropertyTips":"China","skuPropertyValueShowOrder":2,"skuPropertyValueTips":"China"},
{"propertyValueDisplayName":"GERMANY","propertyValueId":201336101,"propertyValueIdLong":201336101,"propertyValueName":"GERMANY","skuPropertySendGoodsCountryCode":"DE","skuPropertyTips":"GERMANY","skuPropertyValueShowOrder":2,"skuPropertyValueTips":"GERMANY"},
{"propertyValueDisplayName":"SPAIN","propertyValueId":201336104,"propertyValueIdLong":201336104,"propertyValueName":"SPAIN","skuPropertySendGoodsCountryCode":"ES","skuPropertyTips":"SPAIN","skuPropertyValueShowOrder":2,"skuPropertyValueTips":"SPAIN"},
{"propertyValueDisplayName":"Russian Federation","propertyValueId":201336103,"propertyValueIdLong":201336103,"propertyValueName":"Russian Federation","skuPropertySendGoodsCountryCode":"RU","skuPropertyTips":"Russian Federation","skuPropertyValueShowOrder":2,"skuPropertyValueTips":"Russian Federation"}]},
I'm not sure how to pick one through JavaScript. Or am I looking at the wrong thing: would it be CSS and picking up a div tag?
You have to enable JS support and, after retrieving the page, wait until all the JavaScript is done. The JS code will generate divs from your code above (use page.asXml() to get an idea of the page from the viewpoint of HtmlUnit).
If the divs are there, you can click() on them, like on any other HTML element. This click should trigger the same JS code as in real browsers.
To find the div elements please have a look at https://htmlunit.sourceforge.io/gettingStarted.html; there are many different options listed.
Like @RBRi said, use the click method on the element. To choose the element (a div in this case) you can use the getByXPath method:
Starting from the page, you can access the element using its XPath. The XPath can easily be obtained with the browser's inspect tool by copying the full XPath of the div (right click on the div and use the copy option). For your URL, the "CHINA" element has the XPath: /html/body/div[6]/div/div[2]/div/div[2]/div[7]/div/div[1]/ul/li[1]/div/ "GERMANY": /html/body/div[6]/div/div[2]/div/div[2]/div[7]/div/div[1]/ul/li[2]/div/span
page1 = webClient.getPage("https://www.aliexpress.com/item/32956185908.html");
// get china div using xpath
element = ((HtmlElement)page1.getByXPath("/html/body/div[6]/div/div[2]/div/div[2]/div[7]/div/div[1]/ul/li[1]/div/").get(0));
// Click over china
element.click();
webClient.waitForBackgroundJavaScript(10000);
If you want to iterate over the list of countries you can use the same approach, but this time copy the XPath of the element that contains all the countries, get that element, and then iterate over the divs inside it. In this case, you can use the "class" attribute to get those divs:
page1 = webClient.getPage("https://www.aliexpress.com/item/32956185908.html");
element = ((HtmlElement)page1.getByXPath("/html/body/div[6]/div/div[2]/div/div[2]/div[7]/div/div[1]/ul").get(0)); // Get the element that contains all countries
List<HtmlElement> elements = element.getElementsByAttribute("div", "class", "sku-property-text");
Then you can iterate over the list of elements and click the one you prefer.
:
choosenElement.click();
:

How to get the right Xpath for the <li> HTML element?

I need to make Selenium click on an element of a drop-down menu, using Java and IntelliJ. I need to click on the "Today" button. I tried copying the XPath, using a CSS selector, and using extensions like XPath Finder, with no result. The element is of type <li>, so I guess the problem is there. Any suggestions on how to find the correct XPath?
P.S. sorry for uploading the image, as a new user, i can't put them exactly in the text.
Drop down menu image
html code for the elements
You can't always get a reusable XPath locator for Selenium from the browser's tools: they return an absolute XPath. You need to construct a relative XPath for the elements.
Here you can learn about XPath and how XPath locators work.
The following locators are based on the image you have posted.
XPath:
WebElement liToday = driver.findElement(By.xpath("//div[contains(@class,'daterangepicker') and contains(@class,'dropdown-menu')]/div[@class='ranges']/ul/li[text()='Today']"));
CSS Selector:
WebElement liToday = driver.findElement(By.cssSelector("div.daterangepicker.dropdown-menu > div.ranges > ul > li"));
After locating the element, wait for it to become visible and then click it.
This part applies once you have clicked the date box and the dropdown is showing.
new WebDriverWait(driver,30).until(ExpectedConditions.visibilityOf(liToday));
liToday.click();
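A relative XPath like the one above can be checked offline before it goes into a Selenium test. The following is a minimal sketch under stated assumptions: it uses the JDK's javax.xml.xpath engine rather than Selenium, and the MARKUP string is a hypothetical, well-formed stand-in for the dropdown (the real HTML was only posted as an image). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativeXPathCheck {

    // Hypothetical stand-in for the date-range dropdown markup.
    static final String MARKUP =
            "<html><body>"
          + "<div class='daterangepicker dropdown-menu opensleft'>"
          + "<div class='ranges'><ul>"
          + "<li>Today</li><li>Yesterday</li><li>Last 7 Days</li>"
          + "</ul></div></div>"
          + "</body></html>";

    // The relative XPath from the answer: anchored on class names, not on position.
    static final String TODAY_XPATH =
            "//div[contains(@class,'daterangepicker') and contains(@class,'dropdown-menu')]"
          + "/div[@class='ranges']/ul/li[text()='Today']";

    // Returns the text of the first node matching the expression, or null if nothing matches.
    static String firstMatchText(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(
                    expression, new InputSource(new StringReader(markup)), XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or markup: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatchText(MARKUP, TODAY_XPATH));
    }
}
```

Because the expression does not depend on how many wrapper elements precede the dropdown, it keeps matching when the page layout around it changes, which is the main weakness of a copied absolute XPath.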

Unable to locate element in Selenium webdriver 2.0

I am unable to locate this element with the class name. Below is the HTML code:
<a class="j-js-stream-options j-homenav-options jive-icon-med jive-icon-gear" title="Stream options" href="#"></a>
I tried to create an xpath using class and title both of them did the work in eclipse...ex:
//a[@title='Stream options']
//a[contains(@class,'j-js-stream-options j-homenav-options jive-icon-med jive-icon-gear')]
..
None of the above options worked, and I tried a few others too. Essentially I want to click on this element and perform some action. I want to locate the randomly created XPath so that I can click on the element in the next run.
FYI: the element is hidden; I need to click on some other element before it appears. It is a dynamically created element whose XPath changes all the time.
Any suggestions would be greatly appreciated...Thanks
Is the element you want to select in a separate iframe? If so, you need to switch to the correct iframe (driver.switchTo().frame("frame-id")) before firing the xpath selector.
Additionally, something to watch out for is that old versions of IE didn't have a native xpath library. See this answer for more details.

How to access the link by searching its text in Selenium WebDriver?

I have a file hierarchy displayed as a tree structure in a web page,
as shown in the picture.
The HTML code is
<div class="rtMid rtSelected">
<span class="rtSp"/>
<img class="rtImg" alt="Automation" src="http://192.168.1.6/eprint_prod_3.8/images/StoreImages/close_folder.png"/>
<span class="rtIn" title="Automation">Automation (1)</span>
</div>
In Selenium WebDriver, is there a way to click on the Automation (1) link by searching only for its text? I don't want to use XPath because the location will keep changing, so is there a way to find the element by its text and click on it?
XPath is powerful. If you found it unreliable, you are probably not using it right. Please spend some time with an XPath tutorial.
This is a simple solution to your question, but there could be many other things you need to think about, e.g. matching both title and text.
driver.findElement(By.xpath(".//span[text()='Automation (1)']")).click();
A CSS selector is also powerful, and it is faster and more readable than XPath. But in your case it doesn't help, because CSS selectors can't find elements by text.
Searching by title worked well:
driver.findElement(By.xpath("//span[contains(@title,'Automation')]")).click();
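The difference between the text-based and the title-based locator can be seen by evaluating both against the snippet from the question. This is only a sketch, assuming the JDK's javax.xml.xpath engine as a stand-in for the browser (the snippet happens to be well-formed XML); TextAndTitleXPath and countMatches are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextAndTitleXPath {

    // The markup from the question.
    static final String SNIPPET =
            "<div class=\"rtMid rtSelected\">"
          + "<span class=\"rtSp\"/>"
          + "<img class=\"rtImg\" alt=\"Automation\" "
          + "src=\"http://192.168.1.6/eprint_prod_3.8/images/StoreImages/close_folder.png\"/>"
          + "<span class=\"rtIn\" title=\"Automation\">Automation (1)</span>"
          + "</div>";

    // Counts how many nodes in the markup match the XPath expression.
    static int countMatches(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(markup)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or markup: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String byText = ".//span[text()='Automation (1)']";
        String byTitle = "//span[contains(@title,'Automation')]";

        System.out.println(countMatches(SNIPPET, byText));   // 1
        System.out.println(countMatches(SNIPPET, byTitle));  // 1

        // If the item count shown in the label changes, the exact text match breaks
        // while the title match keeps working.
        String changed = SNIPPET.replace("(1)", "(2)");
        System.out.println(countMatches(changed, byText));   // 0
        System.out.println(countMatches(changed, byTitle));  // 1
    }
}
```

Neither expression depends on where the node sits in the tree, which addresses the concern about the location changing.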
3 Approaches
Approach 1:
By Class Name:
Here the class name for the text Automation (1) is rtIn.
Perform driver.findElement(By.className("rtIn")).click();
Approach 2:
By CSS Selector of Parent and Class Name
CSS Selector of Parent: .rtSelected
WebElement element1 = driver.findElement(By.cssSelector(".rtSelected"));
element1.findElement(By.className("rtIn")).click();
Approach 3:
By Direct CSS Selector:
1. .rtIn
2. .rtSelected > .rtIn
It is better to use the second CSS Selector
driver.findElement(By.cssSelector(".rtSelected > .rtIn")).click();
