JSoup CSS / DOM questions

1.
(From:https://www.virustotal.com/en/file/7b6b268cbca9d421aabba5f08533d3dcaba50e0f7887b07ef2bd66bf218b35ff/analysis/)
I want to get the text shown in the picture. In Chrome DevTools I would use the expression below (I basically walked into another child node of the span to find the MD5 value), but in jsoup it seems to work differently and only returns the "md5" label text:
document.getElementById("additional-info-content").childNodes[1].children[1].childNodes[1].innerHTML
I can't manage to get it using the jsoup DOM methods or selectors.
(If possible, please give an example of both.)
2.
How do I specify a child in CSS in Jsoup?
For example, I right click on the span class field above the blue marked line in the picture, and click "Copy Selector":
#file-details > div:nth-child(2) > div:nth-child(1) > span
It gives me file-details as the first element, even though it's not the only file-details in the document. But okay, let's say it should be like this(?):
#additional-info-content > div:file-details > div:nth-child(2) > div:nth-child(1) > span
How do I translate it into a working jsoup CSS selector that includes the child? (If possible, a DOM example as well.)
3.
Is there any good advice on how to find the right path when looking for a specific value/node?
What I do now is open Developer Tools, click on a unique div class name, check the Properties pane in DevTools for the child nodes, and keep digging through the child nodes until I find the right path (like the one I copied in the first question).
Is there a better way to look at this?
I mean, using the DevTools console is so simple: I just write
.children[1].childNodes[3].children[1] while looking at the properties until I see the attribute I need, but I guess that's not the right way?

1)
// connect to url and retrieve source code as document
Document doc = Jsoup
.connect(url)
.userAgent("Mozilla/5.0")
.referrer("http://www.google.com")
.get();
String md5= doc
// use CSS selector to grab only enums which contain md5
.select("div#file-details.extra-info > div.enum-container > div.enum:contains(md5)")
// use the first element in the result set
.first()
// use only its text node and ignore the text node of the span
.ownText();
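The selector above can be tried offline before pointing it at the live page. The sketch below assumes a simplified, hypothetical stand-in for the page markup (the real structure and the hash value will differ) and needs jsoup on the classpath. It shows the selector route next to a DOM-style traversal comparable to the DevTools expression in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

final class Md5Lookup {
    // Hypothetical, simplified markup; the hash values are made up.
    static final String HTML =
        "<div id=\"file-details\" class=\"extra-info\">"
      + "<div class=\"enum-container\">"
      + "<div class=\"enum\"><span class=\"field-key\">md5</span>0123abcd</div>"
      + "<div class=\"enum\"><span class=\"field-key\">sha1</span>4567ef</div>"
      + "</div></div>";

    // Selector route: find the enum that mentions md5, then read only its own
    // text, which skips the label inside the child <span>.
    static String bySelector(String html) {
        Document doc = Jsoup.parse(html);
        Element row = doc
            .select("div#file-details.extra-info > div.enum-container > div.enum:contains(md5)")
            .first();
        // first() returns null when nothing matches, so guard before using it.
        return row == null ? null : row.ownText().trim();
    }

    // DOM route: child(i) counts element children only, while childNode(i)
    // counts every node, text nodes included.
    static String byTraversal(String html) {
        Document doc = Jsoup.parse(html);
        Element details = doc.getElementById("file-details");
        Element row = details.child(0).child(0); // enum-container, then the first enum
        Node value = row.childNode(1);           // node 0 is the <span>, node 1 is the text
        return value instanceof TextNode ? ((TextNode) value).text().trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(bySelector(HTML));
        System.out.println(byTraversal(HTML));
    }
}
```

Roughly speaking, `children` in DevTools corresponds to `children()` / `child(i)` in jsoup, and `childNodes` corresponds to `childNodes()` / `childNode(i)`. Indexes copied from the browser may still be off, because whitespace text nodes and script-generated elements can differ between the rendered DOM and the source jsoup parses.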
2) There are lots of ways to specify children. You can use CSS selectors or some of the jsoup convenience methods.
If I want to extract the text bar from the following HTML:
<html>
<body>
<div>
<span><b>foo</b></span>
<span><b>bar</b></span>
</div>
</body>
</html>
Each of these will produce the same result:
doc.select("div > span > b").last().ownText();
doc.select("div > span > b").get(1).ownText();
doc.select("div > span:last-child > b").text();
doc.select("div > span:last-child").text();
doc.select("div > span").last().text();
doc.select("div > span").get(1).text();
doc.select("div > span:last-child > b").first().ownText();
doc.select("span > b").last().text();
Deciding which way to go really depends on the HTML structure of the document you are parsing. See CSS Selectors for more examples.
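To check the expressions listed above, they can be run against the same snippet parsed from a string. This is a small sketch that assumes jsoup is on the classpath; the class and method names are only illustrative. Every expression targets the second span, so for this snippet each one yields bar.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final class ChildSelectors {
    // The snippet from the answer, written without whitespace between tags.
    static final String HTML =
        "<html><body><div>"
      + "<span><b>foo</b></span>"
      + "<span><b>bar</b></span>"
      + "</div></body></html>";

    // Runs each expression from the answer and collects the returned text.
    static List<String> results(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        out.add(doc.select("div > span > b").last().ownText());
        out.add(doc.select("div > span > b").get(1).ownText());
        out.add(doc.select("div > span:last-child > b").text());
        out.add(doc.select("div > span:last-child").text());
        out.add(doc.select("div > span").last().text());
        out.add(doc.select("div > span").get(1).text());
        out.add(doc.select("div > span:last-child > b").first().ownText());
        out.add(doc.select("span > b").last().text());
        return out;
    }

    public static void main(String[] args) {
        results(HTML).forEach(System.out::println);
    }
}
```

Note that `first()` and `last()` return null when the selection is empty, and `get(1)` throws if there are fewer than two matches, so real scraping code usually checks the result before chaining.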
3) Examine the page's source code, not the DOM rendered in the browser. Jsoup does not execute JavaScript. If the page's DOM is modified by scripts on load, you need to render the page before parsing it. Here is an example of how to do this: https://stackoverflow.com/a/38572859/1176178

Related

How to get the right Xpath for the <li> HTML element?

I need to make Selenium click on an element of a drop-down menu, using Java and IntelliJ. I need to click the "Today" button. I tried copying the XPath, using a CSS selector, and using extensions like XPath Finder, with no result. The element is an <li>, so I guess the problem is there. Any suggestions on how to find the correct XPath?
P.S. Sorry for uploading images; as a new user, I can't embed them directly in the text.
Drop down menu image
html code for the elements

You can't always get a reusable XPath locator for Selenium from the browser's tools, because they return an absolute XPath. You need to construct a relative XPath for the element instead.
Here you can learn about XPath and how XPath locators work.
The following locators are based on the image you posted.
XPath:
WebElement liToday = driver.findElement(By.xpath("//div[contains(@class,'daterangepicker') and contains(@class,'dropdown-menu')]/div[@class='ranges']/ul/li[text()='Today']"));
CSS Selector:
WebElement liToday = driver.findElement(By.cssSelector("div.daterangepicker.dropdown-menu > div.ranges > ul > li"));
After locating the element (this part applies once you have clicked the date box and the dropdown is showing):
new WebDriverWait(driver,30).until(ExpectedConditions.visibilityOf(liToday));
liToday.click();
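A relative XPath like the one above can be checked without a browser by evaluating it against a small well-formed snippet with the JDK's built-in XPath engine. The markup below is a hypothetical reconstruction based only on the class names in the locators; the real page may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativeXPathDemo {
    // Hypothetical markup modelled on the class names used in the locators.
    static final String XML =
        "<body><div class=\"daterangepicker dropdown-menu opensleft\">"
      + "<div class=\"ranges\"><ul>"
      + "<li>Today</li><li>Yesterday</li><li>Last 7 Days</li>"
      + "</ul></div></div></body>";

    // The relative XPath from the answer: anchored on class names, not on position.
    static final String TODAY =
        "//div[contains(@class,'daterangepicker') and contains(@class,'dropdown-menu')]"
      + "/div[@class='ranges']/ul/li[text()='Today']";

    // A locator without the text test, comparable to the CSS selector ending in "ul > li".
    static final String ALL_ITEMS = "//div[@class='ranges']/ul/li";

    // Counts how many nodes an XPath expression matches in the given XML.
    static int count(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Today matches: " + count(XML, TODAY));
        System.out.println("All li matches: " + count(XML, ALL_ITEMS));
    }
}
```

The second expression matches every list item. Since `findElement` returns the first match, a locator without the text test only clicks "Today" if that entry happens to come first in the list.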

Dynamic element inside several 'li'

Now I have a more complicated case (at least for me) and I am struggling to find an element that is dynamic (it changes daily). This is how it looks on the page. At the top is a 'ul':
<ul class="feed-tips" id="Grid"
Below it are 50 'li' elements with the same class:
<li class="feed-item vevent tip-list-row"
Inside one of those 'li' elements is:
<div class="tip medium-9 small-12 column padding-reset dtstart tip-list-row__tip">
Heading4
So the link to the page and Heading4 (in the 'href') are dynamic and will be useless from, e.g., tomorrow.
Above is the 5th 'li' in the list. I tried to find the element with a CSS selector, but it does not work. Here is what I tried:
//Open 5th from the list
driver.findElement(By.cssSelector("#Grid > li:nth-child(5) > div.tip.medium-9.small-12.column.padding-reset.dtstart.tip-list-row__tip > div.tip-match.medium-12.column > div.tip-teams > a")).click();
Thank you in advance.

If you are trying to find the anchor element with a dynamic href, you can use an XPath like the one below:
//ul[@id='Grid']/li//div[contains(@class, 'tip-teams')]//a[@href]
I did not fully understand the problem, but this lists all links that have an href attribute within that hierarchy. In this case, do not use any indexing. It is also not necessary to drill down through every level of the hierarchy.
//a[@href] matches all links that have an href attribute, without comparing its value.
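As a rough offline check, the expression can be evaluated with the JDK's XPath engine against a cut-down list. The markup and href values here are invented for illustration, and the positional variant `SECOND` is an added assumption showing how one particular item could be targeted if a position really is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FeedLinks {
    // Hypothetical three-item version of the feed; hrefs are made up.
    static final String XML =
        "<ul class=\"feed-tips\" id=\"Grid\">"
      + item("/tips/a") + item("/tips/b") + item("/tips/c")
      + "</ul>";

    // Every link under a tip-teams div, regardless of its href value.
    static final String ALL =
        "//ul[@id='Grid']/li//div[contains(@class, 'tip-teams')]//a[@href]";

    // Positional variant (an assumption, not from the answer): only the 2nd list item.
    static final String SECOND =
        "(//ul[@id='Grid']/li)[2]//div[contains(@class, 'tip-teams')]//a[@href]";

    // Builds one list item with the nesting described in the question.
    static String item(String href) {
        return "<li class=\"feed-item\"><div class=\"tip\"><div class=\"tip-match\">"
             + "<div class=\"tip-teams\"><a href=\"" + href + "\">match</a></div>"
             + "</div></div></li>";
    }

    // Returns the href of every anchor matched by the expression, in document order.
    static List<String> hrefs(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                out.add(((Element) hits.item(i)).getAttribute("href"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefs(XML, ALL));
        System.out.println(hrefs(XML, SECOND));
    }
}
```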

Java jsoup selecting links

I am trying to develop a web scraper. I can extract all the links from a page, but I want to get only some specific ones. I looked into it but could not manage it, as I don't have good knowledge of HTML.

You can use the CSS selector presented in the snippet below:
doc.select("div.indepth-content > div.content > ul.indepth-list a")
From the screenshot, it seems you're using Chrome. If so, next time you can ask it to generate the CSS query for you:
Right-click on the element you are targeting
Click on "Inspect" (a node should appear selected)
Right-click on this node, then select the Copy entry and its Copy selector sub-entry
=> The CSS selector is copied to the clipboard
Please note that Chrome tends to generate (very) long CSS queries. Also, it can't generate CSS selectors that match multiple elements.
However, if you press CTRL + F while the DevTools pane is open and the Elements tab is selected, you can type a CSS selector and browse the matched elements.
For more details, you can have a look at the following resources:
JSoup CSS selector tutorial
JSoup CSS selector full syntax
How to generate CSS selectors with Chrome Developer Tools?
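For reference, here is a minimal sketch of how the selector from this answer could be used to collect the link targets. It assumes jsoup on the classpath, an invented page fragment that mirrors the class names in the selector, and `https://example.com/` as a placeholder base URI.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final class IndepthLinks {
    // Hypothetical fragment; only the links inside ul.indepth-list are wanted.
    static final String HTML =
        "<div class=\"indepth-content\"><div class=\"content\">"
      + "<ul class=\"indepth-list\">"
      + "<li><a href=\"/story/1\">First</a></li>"
      + "<li><a href=\"/story/2\">Second</a></li>"
      + "</ul></div></div>"
      + "<a href=\"/about\">About</a>";

    // Selects the anchors under the list and resolves each href against the base URI.
    static List<String> links(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc
            .select("div.indepth-content > div.content > ul.indepth-list a")
            .eachAttr("abs:href");
    }

    public static void main(String[] args) {
        links(HTML, "https://example.com/").forEach(System.out::println);
    }
}
```

The `abs:` prefix asks jsoup for absolute URLs; when the document comes from `Jsoup.connect(url).get()`, the base URI is set from the fetched URL, so it does not have to be passed by hand.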

Element divcontent = doc.select("div.content").first();
Element ul = divcontent.select("ul.indepth-list").first();
Elements links = ul.select("a[href]");
Written without an editor, so I can't be sure the syntax is correct.

How can I click on a particular href in the code below

I am in a situation where there is no unique id and there are a number of divs under a class. CSS selectors and XPaths are so generic that the element is not being recognized.
This is what the HTML looks like:
This is my code which doesn't work:
@Test
public void NaviToEpisode(){
driver.findElement(By.linkText("/episode")).click();
title_episode = driver.getTitle();
Assert.assertTrue(title_episode.contains("File uploading"));
}
Please help!

You can use cssSelector, in your case it would be:
driver.findElement(By.cssSelector("#links>div>a")).click();
If you use Firefox, install the Firebug plugin, then right-click the element you wish to inspect and choose "Inspect with Firebug" from the menu. Once the relevant snippet is highlighted, right-click it and you should see an option to copy the XPath or CSS path.

Try this:
driver.findElement(By.xpath("//*[contains(@href, '/episode/')]")).click();

<div id="links" . . > seems to be static; I hope it's unique as well. The following CSS selector can be used to select the first link (i.e. /episodes/):
#links div:nth-child(1) a
Similarly, you can use CSS selectors to select subsequent elements. For example, to select the 2nd element:
#links div:nth-child(2) a
So instead of using By.linkText("/episode"), use By.cssSelector("#links div:nth-child(1) a").

How to access the link by searching its text in Selenium WebDriver?

I have an HTML file hierarchy in a tree structure on a web page,
as shown in the picture.
The HTML code is
<div class="rtMid rtSelected">
<span class="rtSp"/>
<img class="rtImg" alt="Automation" src="http://192.168.1.6/eprint_prod_3.8/images/StoreImages/close_folder.png"/>
<span class="rtIn" title="Automation">Automation (1)</span>
</div>
In Selenium WebDriver, is there a way to click on the Automation (1) link by searching only for its text? I don't want to use XPath because the location will keep changing, so is there a way to find it by its text and click on it?

XPath is powerful. If you found it unreliable, you are probably not using it right. Please spend some time with the XPath Tutorial.
This is a simple solution to your question, but there could be many other things you need to think about. E.g. matching title and text, etc.
driver.findElement(By.xpath(".//span[text()='Automation (1)']")).click();
CSS selectors are also powerful, and they are faster and more readable than XPath. But they don't support finding an element by its text, which is what you need in your case.

Searching by title worked well:
driver.findElement(By.xpath("//span[contains(@title,'Automation')]")).click();
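The text-based and title-based locators from these answers can be compared offline with the JDK's XPath engine. The sketch below uses the snippet from the question with a shortened, made-up image path. The third expression, `BY_PREFIX`, is an added assumption (not from the answers) for the case where the count in parentheses changes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TreeNodeXPath {
    // The tree node from the question; the src value is shortened.
    static final String XML =
        "<div class=\"rtMid rtSelected\">"
      + "<span class=\"rtSp\"/>"
      + "<img class=\"rtImg\" alt=\"Automation\" src=\"close_folder.png\"/>"
      + "<span class=\"rtIn\" title=\"Automation\">Automation (1)</span>"
      + "</div>";

    static final String BY_TEXT = ".//span[text()='Automation (1)']";      // exact text match
    static final String BY_TITLE = "//span[contains(@title,'Automation')]"; // attribute match
    // Assumption: tolerate a changing counter by matching only the leading text.
    static final String BY_PREFIX = "//span[starts-with(normalize-space(.),'Automation')]";

    // Returns the text content of the first node matched by the expression, or null.
    static String firstText(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node hit = (Node) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
            return hit == null ? null : hit.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText(XML, BY_TEXT));
        System.out.println(firstText(XML, BY_TITLE));
        System.out.println(firstText(XML, BY_PREFIX));
    }
}
```

None of these expressions depends on where the node sits in the tree, which addresses the concern about the location changing.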

3 Approaches
Approach 1:
By Class Name:
The element containing the text Automation (1) has the class name rtIn.
Use driver.findElement(By.className("rtIn")).click();
Approach 2:
By CSS Selector of Parent and Class Name
CSS selector of the parent: .rtSelected
WebElement element1 = driver.findElement(By.cssSelector(".rtSelected"));
element1.findElement(By.className("rtIn")).click();
Approach 3:
By Direct CSS Selector:
1. .rtIn
2. .rtSelected > .rtIn
It is better to use the second CSS selector:
driver.findElement(By.cssSelector(".rtSelected > .rtIn")).click();
