How to get Attribute using selector syntax in jsoup - java

I need to get the value of the href attribute of an <a> tag.
I know that a.attr("href") gives me the href attribute value.
But I want to know whether Jsoup offers any other way to get the href attribute, as in jTidy
(using syntax like //a/@href).
That is, can I use some selector syntax to get the attribute directly?
Thanks.

No, you can't retrieve the attribute value with a selector alone. A selector's purpose is to select elements by various criteria.
But you can select only those elements that have the attribute and then ask an element for its value.
Element withAttr = doc.select("a[href]").first();
String attrValue = withAttr.attr("href");

Related

Get element from multiple div class with colon in css html

There are two div elements with the same class:
<div class="website text:middle"> A</div>
<div class="website text:middle"> 1</div>
How do I get A and 1? I tried using getElementById with :eq(0), but it returns null.
Method getElementById queries for elements with a specified id, not class; I'm not sure what you were trying to query with :eq(0) either.
Try:
// String html = ...
Document doc = Jsoup.parse(html);
List<String> result = doc.getElementsByClass("text:middle").eachText();
// result = ["A", "1"]
EDIT
You can query for elements that match multiple classes! See Jsoup select div having multiple classes.
However, a colon (:) is a special character in CSS and needs to be escaped when it appears as part of a class name in a selector query. I don't think jsoup currently supports this; it simply treats everything after a colon as a pseudo-class.
To add to Janez's correct answer - while jsoup's CSS selector (currently) doesn't support escaping a : character in the class name, there are other ways to get it to work if you want to use the select() method instead of getElementsByXXX -- e.g. if you want to combine selectors in one call:
Elements divs = doc.select("div[class=website text:middle]");
That will find div elements with the literal attribute class="website text:middle".
Or:
Elements divs = doc.select("div[class~=text:middle]");
That finds elements whose class attribute matches the regex /text:middle/.
For the presented data, though, I think the getElementsByClass() DOM method is the way to go and the most general. I just wanted to show a couple of alternatives for other cases.
document.querySelectorAll(".website")[0] // 0 is the index into the list of matches
In browser JavaScript you can use querySelector or querySelectorAll; they are supported by every current browser.

How to select all classNames of a page using java?

I'm using Selenium (with Java) to search all the classNames of a page and then use a regex to save only the className(s) that have "insignia" in them.
I tried the code below, with a regex, to search for classNames that mention "insignia", but it didn't return any result.
System.out.println(driver.findElements(By.className(".*\\binsignia\\b.*")).get(1).getAttribute("src"));
You can't use Regex inside a locator string. You can use a CSS selector and find all elements that contain "insignia" in the class name.
System.out.println(driver.findElements(By.cssSelector("[class*='insignia']")).get(1).getAttribute("src"));
CSS selector reference
driver.findElements(By.xpath("//*[contains(@class,'insignia')]"))
returns the WebElements whose class attribute contains insignia. XPath 1.0's contains() does a plain substring match, so a regex pattern must not be passed to it.
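If the regex with word boundaries from the question is still wanted, one option is to apply it in Java after the elements have been located. The sketch below is only an illustration: it assumes the class attribute strings were already collected (for example by calling getAttribute("class") on each WebElement), and the class name ClassNameFilter, the method matchingClassNames and the sample values are invented for this example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class ClassNameFilter {

    // Splits each class attribute value on whitespace and keeps the individual
    // class names in which the pattern is found, in first-seen order.
    static Set<String> matchingClassNames(List<String> classAttributes, Pattern pattern) {
        Set<String> matches = new LinkedHashSet<>();
        for (String attribute : classAttributes) {
            if (attribute == null) {
                continue; // elements without a class attribute
            }
            for (String name : attribute.trim().split("\\s+")) {
                if (!name.isEmpty() && pattern.matcher(name).find()) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical class attribute values, standing in for what
        // getAttribute("class") might return on a real page.
        List<String> classAttributes = List.of("badge insignia-gold", "header", "insignia big");
        Pattern insignia = Pattern.compile("\\binsignia\\b");
        System.out.println(matchingClassNames(classAttributes, insignia));
    }
}
```

The CSS selector [class*='insignia'] narrows the elements in the browser; the regex then decides which individual class names to keep.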

How to extract a list of string from xml file?

I have an xml file and an attribute "name" in some of the tags.
If I give the correct xpath - is there a way to extract a list of strings, each element being one of the values of this attribute?
(I do not need to get the entire list of DOM nodes...)
With XPath 2.0 or with XQuery you can write //@name/string() to get a sequence of string values of all name attributes in the document. With XPath 1.0 you can select the attribute nodes with //@name but then you need to use your host language (e.g. Java) to build a list of all the attribute values.
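For the XPath 1.0 route, a minimal sketch using the XPath support that ships with the JDK (javax.xml.xpath) might look like the following. The class name AttributeValues, the helper attributeValues and the sample XML are made up for illustration; only //@name comes from the answer above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeValues {

    // Evaluates an XPath 1.0 expression that selects attribute nodes and
    // copies each attribute's value into a plain list of strings.
    static List<String> attributeValues(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList attrs = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                values.add(attrs.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, invented for this example.
        String xml = "<items><item name=\"alpha\"/><group><item name=\"beta\"/></group><item/></items>";
        System.out.println(attributeValues(xml, "//@name"));
    }
}
```

The XPath engine still returns attribute nodes here; the loop is the host-language step that turns them into strings.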

How to getText() for XPath with attribute 'display:none'

I would like to use getText() on the element matched by one XPath; I need the text it contains.
//span(contains(@style,'display:none'))
The XPath works when tested in Firebug. I've tried getText and getAttribute, so far no luck.
It's a little hard to say without the exact HTML, which you have not specified in your question...
To begin with, you need to change this:
"//span(contains(@style,'display:none'))"
To this:
"//span[contains(@style,'display:none')]"
UPDATE:
Alternatively, since the span element is not visible, you might be able to do it with:
String innerHTML = elem.getAttribute("innerHTML");
Where elem is the parent node of the span element.
Then, in order to get the actual text, you will need to parse the innerHTML string.
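Because the element is invisible (it has display:none), Selenium cannot natively interact with it, and getText() will not give you its text. You need to cast your driver to JavascriptExecutor and read the text with JavaScript. In the browser console, the equivalent lookup is shown below; note that $x is a console helper, so inside executeScript you would pass the located element as an argument and return its textContent instead.
$x("//span[contains(@style,'display:none')]")[0].textContent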
The [0] returns the 1st element returned by the xpath.
This will return the inner text of the element.
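The bracket-versus-parenthesis fix suggested above can be checked outside the browser with the JDK's own XPath engine. This is only a sketch under assumptions: the markup is a made-up, well-formed fragment (the question did not include the real HTML), the class and method names are invented, and a plain XML DOM has no notion of CSS visibility, so it says nothing about how Selenium treats hidden elements.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicateSyntax {

    // Returns true when the JDK XPath engine accepts the expression.
    static boolean compiles(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    // Evaluates the expression against the markup and returns the string
    // value of the first matching node (empty if nothing matches).
    static String firstText(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragment standing in for the page from the question.
        String xml = "<div><span style=\"display:none\">hidden text</span><span>visible</span></div>";
        String withBrackets = "//span[contains(@style,'display:none')]";
        String withParentheses = "//span(contains(@style,'display:none'))";

        // The parenthesised form is expected to be rejected as a syntax error.
        System.out.println("brackets compile: " + compiles(withBrackets));
        System.out.println("parentheses compile: " + compiles(withParentheses));
        System.out.println("text: " + firstText(xml, withBrackets));
    }
}
```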

XPath to find element based on another XPath element

I have a Java AST and I am trying to find a variable inside it via XPath.
Let's say the variable is called 'foobar'; I could use
//VariableDeclarator/VariableDeclaratorId[@Image='foobar']
but what if I don't know the text 'foobar' and want to read it from another element?
//VariableDeclarator/VariableDeclaratorId[@Image=//SynchronizedStatement/Expression/PrimaryExpression/PrimaryPrefix/Name]
The 'Name' node has the information 'foobar' in @Image, but PrimaryPrefix/Name[@Image] does not work.
How must I rewrite the condition //SynchronizedStatement/Expression/PrimaryExpression/PrimaryPrefix/Name so that it is equivalent to @Image='foobar'?
Thanks
Try this XPath:
//VariableDeclarator/VariableDeclaratorId[@Image=//SynchronizedStatement/Expression/PrimaryExpression/PrimaryPrefix/Name/@Image]
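Why the trailing /@Image matters: in XPath 1.0, comparing an attribute with a node-set compares string values, and the string value of an element is its text content, not one of its attributes. The sketch below tries both variants with the JDK's XPath engine. The XML is a hand-written stand-in for the AST (the real tree was not shown in the question), and the class, constant and method names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AstAttributeJoin {

    static final String NAME_PATH =
            "//SynchronizedStatement/Expression/PrimaryExpression/PrimaryPrefix/Name";

    // Hypothetical AST serialised as XML; node names follow the question.
    static final String SAMPLE_AST =
            "<CompilationUnit>"
            + "<VariableDeclarator><VariableDeclaratorId Image=\"foobar\"/></VariableDeclarator>"
            + "<VariableDeclarator><VariableDeclaratorId Image=\"other\"/></VariableDeclarator>"
            + "<SynchronizedStatement><Expression><PrimaryExpression><PrimaryPrefix>"
            + "<Name Image=\"foobar\"/>"
            + "</PrimaryPrefix></PrimaryExpression></Expression></SynchronizedStatement>"
            + "</CompilationUnit>";

    // Returns the Image values of the VariableDeclaratorId elements whose
    // @Image equals the node-set selected by rightHandSide.
    static List<String> matchedImages(String xml, String rightHandSide) {
        String expression =
                "//VariableDeclarator/VariableDeclaratorId[@Image=" + rightHandSide + "]";
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> images = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                images.add(((Element) nodes.item(i)).getAttribute("Image"));
            }
            return images;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Compares @Image with the Name element itself (its text content).
        System.out.println(matchedImages(SAMPLE_AST, NAME_PATH));
        // Compares @Image with the Name element's Image attribute.
        System.out.println(matchedImages(SAMPLE_AST, NAME_PATH + "/@Image"));
    }
}
```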
