I am trying to write an XPath for an element whose id contains a randomly generated name or number (usually the project name). The generated XPaths look like this:
//*[@id="job_10"]/td[3]/a
//*[@id="job_11"]/td[3]/a
//*[@id="job_12"]/td[3]/a
10, 11 and 12 are the project numbers; they can be words too.
Any suggestions?
Try the XPath expression below:
//*[starts-with(@id, "job_")]/td[3]/a
This matches any element whose id attribute starts with job_, whatever string or number follows the prefix.
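To see the starts-with() predicate in action outside a browser, here is a minimal sketch using the JDK's javax.xml.xpath. The sample markup, the class name JobLinkXPathDemo and the helper jobLinkTexts are made up for illustration, and the input has to be well-formed XML for this parser to accept it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JobLinkXPathDemo {

    // Hypothetical markup: row ids are "job_" followed by a number or a word.
    static final String SAMPLE =
        "<table>"
        + "<tr id=\"job_10\"><td>a</td><td>b</td><td><a>Project 10</a></td></tr>"
        + "<tr id=\"job_alpha\"><td>a</td><td>b</td><td><a>Project alpha</a></td></tr>"
        + "<tr id=\"other_1\"><td>a</td><td>b</td><td><a>Not a job</a></td></tr>"
        + "</table>";

    // Returns the text of every link in the third cell of a row whose id starts with "job_".
    static List<String> jobLinkTexts(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "//*[starts-with(@id, 'job_')]/td[3]/a",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(jobLinkTexts(SAMPLE));
    }
}
```

The row with id other_1 is skipped because its id does not start with the prefix; the numeric and the word suffix are both matched.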
Are you trying to select an HTML element? If so, you might be better off using a CSS selector instead of XPath.
If you have the HTML as a String, you can use jsoup to parse the document, then use a CSS selector to find the last row in the table.
String htmlDocument = ...
Document doc = Jsoup.parse(htmlDocument);
// select() returns an Elements collection, so take the first match
Element anchor = doc.select("tr[id^=job_]:last-child td:nth-child(3) a").first();
I need to get the text 6537.
I have tried many XPaths, for example:
driver.findElement(By.xpath("//b[contains(text(),'Client ID')]")).getText()
This only returns the text Master Client Id, not 6537.
If I change the XPath to //b[contains(text(),'Client ID')]/text()
then Selenium gives the error below:
The result of the xpath expression "//b[contains(text(),'Client ID')]/text()" is: [object Text]. It should be an element.
Based on the screenshot, the text you are trying to capture is not part of the b tag. It is part of the text of the parent element.
What you need to do is get the text of the entire parent div and use a regex to extract the number.
String parentText = driver.findElement(By.xpath("//b[contains(text(),'Client ID')]/..")).getText();
// remove all non-digit characters, leaving only the number
String number = parentText.replaceAll("\\D+","");
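One caveat with stripping non-digits: if the parent text contains more than one number, all the digits get glued together. A small sketch of the difference, using made-up sample strings (the real page text is not shown in the question) and hypothetical helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClientIdExtractor {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Removes every non-digit character; separate digit runs end up merged.
    static String stripNonDigits(String text) {
        return text.replaceAll("\\D+", "");
    }

    // Returns only the first run of digits, or an empty string if there is none.
    static String firstNumber(String text) {
        Matcher matcher = DIGITS.matcher(text);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        // Assumed sample strings, not taken from the real page.
        System.out.println(stripNonDigits("Master Client Id: 6537"));       // 6537
        System.out.println(stripNonDigits("Master Client Id: 6537 (v2)"));  // 65372
        System.out.println(firstNumber("Master Client Id: 6537 (v2)"));     // 6537
    }
}
```

If the div only ever contains the label and one number, replaceAll is enough; firstNumber is the safer choice when other digits may appear nearby.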
I want to extract two adjacent tags from a website. The first is an a tag, whose href should be extracted as an absolute URL. The second is a div tag, from which I want the data inside it.
I want the output to look like the following:
100 USD http:\www.somesite..............
200 usd http:\www.thesite.............
The reason is that I will later insert them into a database table.
I tried the following code, but I couldn't get the absolute URL, and I couldn't get rid of the tags either; I want to extract only the data, without the tags.
Document doc = Jsoup.connect("http://www.bezaat.com/ksa/jeddah/cars/all/1?so=77").get();
for (Element link : doc.select("div.rightFloat.price,a[abs:href].more-details"))
{
String absHref = url.attr("abs:href");
String attr = link.absUrl("href");
System.out.println(link);
}
If I try using
System.out.println(link.text())
in my code, I lose the hyperlink completely!
Any help please?
I don't think jsoup's comma combinator in a CSS selector guarantees any ordering in the output. At least I would not count on it, even if you happen to get the two elements in the order you expect. Instead of using the comma selector, I would first loop over the outer containers that hold the adjacent elements you are interested in. Within each container you can then access the price and the link.
Something like this. Note that this is off the top of my head and untested!
Document doc = Jsoup.connect("http://www.bezaat.com/ksa/jeddah/cars/all/1?so=77").get();
for (Element adDiv : doc.select("div.category-listing-normal-ad")){
Element priceDiv = adDiv.select("div.rightFloat.price").first();
Element linkA = adDiv.select("a.more-details").first();
System.out.println(priceDiv.text() + " " + linkA.absUrl("href"));
}
For example, I have HTML content like this:
<div>go to the text from here.<br> from there <br> Go to the text</div>
In the above content, I want to wrap only the word "text" in a span tag, as in the expected output below, using Java.
I'm using the org.w3c.dom package.
I tried the following, but without success:
Element e = doc.createElement("span");
String text = preElement.getTextContent();
if (text.indexOf("text") >= 0) {
e.setTextContent("text");
}
// Afterwards, how do I insert this into the document? How do I use the insertBefore method for the in-between text?
Expected Output:
<div>go to the <span>text</span> from here.<br> from there <br> Go to the <span>text</span></div>
Please help.
You have to use the splitText method on your text node to split it into three nodes, isolating the word you need to wrap in your element. Then you only have to replace the text node you just isolated (use replaceChild) with the new element. There is no need to create a new text node; you can simply put the one you removed into the element you added.
Java implementation reference:
http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Text.html#splitText%28int%29
http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html#replaceChild%28org.w3c.dom.Node,%20org.w3c.dom.Node%29
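The splitText/replaceChild steps could look roughly like the sketch below. It assumes the content is available as well-formed XML (so <br/> rather than <br>), only looks at the direct text children of the given element, and matches the word as a plain substring (so it would also hit "context"). The class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class WrapWordInSpan {

    // Wraps every occurrence of word found in the direct text children of parent in a <span>.
    static void wrapWord(Element parent, String word) {
        Document doc = parent.getOwnerDocument();
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE) {
                Text text = (Text) child;
                int index = text.getData().indexOf(word);
                if (index >= 0) {
                    // First split: "before" stays in text, match starts with the word.
                    Text match = text.splitText(index);
                    // Second split: match is now exactly the word, rest is what follows.
                    Text rest = match.splitText(word.length());
                    Element span = doc.createElement("span");
                    parent.replaceChild(span, match); // swap the isolated text node for the span
                    span.appendChild(match);          // reuse the removed node inside the span
                    next = rest;                      // keep scanning the remainder
                }
            }
            child = next;
        }
        parent.normalize(); // drop empty text nodes left behind by the splits
    }

    // Convenience helper for the demo: parses xml, wraps the word, returns the root element.
    static Element parseAndWrap(String xml, String word) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            wrapWord(root, word);
            return root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        Element div = parseAndWrap(
            "<div>go to the text from here.<br/> from there <br/> Go to the text</div>", "text");
        System.out.println(div.getElementsByTagName("span").getLength() + " span(s) inserted");
    }
}
```

To handle text nested deeper in the tree, the same loop can be applied recursively to each child element.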
I have the string below in HTML, and I want to build a DOM tree and get the name/value pairs. How can I do this using an HTML parser, an XML parser or a regular expression? Any code snippet would be useful. Thanks.
<$$TagStarts>
<==0>Name0</==0><##0>Value0</##0>
<==1>Name1</==1><##1>Value1</##1>
<==2>Name2</==2><##2>Value2</##2>
<==3>Name3</==3><##3>Value3</##3>
<==4>Name4</==4><##4>Value4</##4>
<==5>Name5</==5><##5>Value5</##5>
</$$TagStarts>
I am assuming the tag names are just a sample and that you will have some meaningful tag names.
Try any of the following HTML parsers:
http://home.ccil.org/~cowan/XML/tagsoup/
http://nekohtml.sourceforge.net/
http://jtidy.sourceforge.net/
They will give you a W3C-compliant Document object. After that, it is just a matter of calling getElementsByTagName or getElementById, or using XPath or XQuery, to get the elements from the DOM.
Otherwise you can use the following, which have their own document object implementations:
http://htmlcleaner.sourceforge.net/ [It also has some basic XPath support]
http://jsoup.org/ [It has jquery like query API]
Addition: check the jsoup selector syntax:
http://jsoup.org/cookbook/extracting-data/selector-syntax
I would recommend either jsoup or NekoHTML.
I'm trying to use dom4j to parse an xhtml document. If I simply print out the document I can see the entire document so I know it is being loaded correctly. The two divs that I'm trying to select are at the exact same level in the document.
html
body
div
table
tbody
tr
td
table
tbody
tr
td
div class="definition"
div class="example"
My code is
List<Element> list = document.selectNodes("//html/body/div/table/tbody/tr/td/table/tbody/tr/td");
but the list is empty when I do System.out.println(list);
If I only do List<Element> list = document.selectNodes("//html"); it does actually return a list with one element in it. So I'm confused about what's wrong with my XPath and why it won't find those divs.
Try declaring the XHTML namespace for the XPath, e.g. bind it to the prefix x and use //x:html/x:body... as the XPath expression (see also this article, which is for Groovy rather than plain Java). Something like the following should probably do it in Java:
DefaultXPath xpath = new DefaultXPath("//x:html/x:body/...");
Map<String,String> namespaces = new TreeMap<String,String>();
namespaces.put("x","http://www.w3.org/1999/xhtml");
xpath.setNamespaceURIs(namespaces);
list = xpath.selectNodes(document);
(untested)
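The same namespace effect can be reproduced without dom4j. The sketch below uses the JDK's own javax.xml.xpath with a namespace-aware DOM parser, so it is an illustration of the idea rather than the dom4j API; the sample document, the prefix x and the helper name count are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XhtmlNamespaceXPath {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical XHTML document with a default namespace on the root element.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div class=\"definition\">d</div><div class=\"example\">e</div>"
        + "</body></html>";

    // Counts the nodes matched by expression, with the prefix "x" bound to the XHTML namespace.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the namespace is not recorded on elements
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "x" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList("x").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(SAMPLE, "//html/body/div"));       // unprefixed names: no match
        System.out.println(count(SAMPLE, "//x:html/x:body/x:div")); // prefixed names: both divs
    }
}
```

In XPath 1.0 an unprefixed element name only matches elements in no namespace, which is why the prefix is needed even though the document itself uses a default namespace.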
What about just "//div"? Or "//html/body/div/table/tbody"? I've found long literal XPath expressions hard to debug, as it's easy for my eyes to get tricked... so I break them down until it DOES work and then build back up again.
An alternative could be:
//div[@class='definition' or @class='example']
This searches for "div" elements anywhere in the document whose "class" attribute value equals "definition" or "example".
I find that this approach more clearly shows what you are trying to retrieve from the page. An added benefit is that if the structure of the page changes but the div classes stay the same, your XPath doesn't need to be updated.
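A quick way to convince yourself that the class-based expression does not care about nesting depth is the sketch below. It uses the JDK's javax.xml.xpath on a made-up, namespace-free sample in which the two divs sit at different levels; the class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DivsByClassXPath {

    // Hypothetical page: the two interesting divs are nested at different depths.
    static final String SAMPLE =
        "<html><body>"
        + "<div class=\"definition\">a word</div>"
        + "<table><tr><td><div class=\"example\">a sentence</div></td></tr></table>"
        + "<div class=\"footer\">ignored</div>"
        + "</body></html>";

    // Returns the class attribute of every div matched by the class-based expression.
    static List<String> matchedClasses(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "//div[@class='definition' or @class='example']",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> classes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                classes.add(((Element) nodes.item(i)).getAttribute("class"));
            }
            return classes;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchedClasses(SAMPLE));
    }
}
```

Note that @class='definition' is an exact string comparison, so a div with class="definition wide" would not match; contains() would be needed for that case.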
You can also check that your XPath works against an HTML document using the following Firefox plugin, which I find very useful:
Firefox Plugin - XPath Checker 0.4.4