Get first-level elements of an HTML document - Java

I want to get the first-level (direct child) elements of the <wicket:extend> tag in the document below. I am using the Jericho API for HTML parsing but didn't find any method to get only the first-level elements.
<body>
<wicket:extend>
<div wicket:id="container1"> some elements</div>
<div wicket:id="container2">
<h2>This is container 2</h2>
</div>
<div wicket:id="container3">
<h2>This is container 3</h2>
</div>
<div id="panel2">
<h2>This is panel2</h2>
</div>
<h3>This is heading3</h3>
</wicket:extend>
Expected output:
<div wicket:id="container1">
<div wicket:id="container2">
<div wicket:id="container3">
<div id="panel2">
<h3>This is heading3</h3>
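The key is to look only at the immediate children of <wicket:extend>, not at all of its descendants. Jericho's Element has a getChildElements() method intended for exactly this (check its behaviour in your Jericho version). As a dependency-free alternative, here is a minimal sketch that swaps Jericho for the JDK's built-in DOM parser. It assumes the markup is well-formed XML (the snippet above is, once </body> is added), and the names FirstLevelChildren, directChildStartTags and startTag are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstLevelChildren {

    /** Start tags of the direct element children of the first element named parentTag. */
    static List<String> directChildStartTags(String xml, String parentTag) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-unaware parsing (the default) treats "wicket:extend" and
            // "wicket:id" as plain names, so the undeclared prefix is not an error.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            Node parent = doc.getElementsByTagName(parentTag).item(0);
            if (parent == null) {
                return result;
            }
            // getChildNodes() goes exactly one level down, so nested <h2> elements
            // are never visited. Text nodes (whitespace, "some elements") are skipped.
            NodeList children = parent.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add(startTag((Element) child));
                }
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse input as XML", ex);
        }
    }

    /** Rebuilds a start tag such as <div wicket:id="container1"> from an element. */
    static String startTag(Element e) {
        StringBuilder sb = new StringBuilder("<").append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            sb.append(' ').append(a.getNodeName())
              .append("=\"").append(a.getNodeValue()).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        String html = "<body><wicket:extend>"
                + "<div wicket:id=\"container1\"> some elements</div>"
                + "<div wicket:id=\"container2\"><h2>This is container 2</h2></div>"
                + "<div id=\"panel2\"><h2>This is panel2</h2></div>"
                + "<h3>This is heading3</h3>"
                + "</wicket:extend></body>";
        directChildStartTags(html, "wicket:extend").forEach(System.out::println);
    }
}
```

Only start tags are rebuilt here. The important part is iterating getChildNodes() of a single element; getElementsByTagName() on an element would instead search all of its descendants.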

Related

How to get specific sub-elements of html data using Jsoup

So I am trying to get all prices from an HTML file using Jsoup. The simplified HTML is structured something like this:
//some html
<div class="price-point-wrap use-roundtrippricing">
<div class="price-point-wrap-top use-roundtrippricing">
<div class="pp-from-total use-roundtrippricing">Roundtrip</div>
</div>
<div class="price-point price-point-revised use-roundtrippricing">
$509
</div>
<div class="fare-select-button-div">
<input type="button" aria-describedby="sr_product_ECONOMY_123-745|1975-UA" value="Select" class="fare-select-button">
<span class="visuallyhidden">fare for Economy (lowest)</span>
</div>
</div>
//some html
<div class="price-point-wrap use-roundtrippricing">
<div class="price-point-wrap-top use-roundtrippricing">
<div class="pp-from-total use-roundtrippricing">Roundtrip</div>
</div>
<div class="price-point price-point-revised use-roundtrippricing">
$1,046
</div>
<div class="fare-select-button-div">
<input type="button" aria-describedby="sr_product_MIN-BUSINESS-OR-FIRST_123-745|1975-UA" value="Select" class="fare-select-button">
<span class="visuallyhidden">fare for First (2-cabin, lowest)</span>
</div>
<div class="pp-remaining-seats">5 tickets left at this price</div>
</div>
//some html
This is what I have tried so far:
File input = new File("Flights.html");
Document document = Jsoup.parse(input, "UTF-8", "");
Elements prices = document.getElementsByClass("price-point");
for(Element e: prices){
System.out.println(e.toString());
}
This gives me the following result:
<div class="price-point price-point-revised use-roundtrippricing">
$509
</div>
<div class="price-point price-point-revised use-roundtrippricing">
$1,046
</div>
.....
But now I only want prices like:
509
1046
I tried a regex that keeps only the digits, e.toString().replaceAll("\\D+",""), when printing. This seems to work, but it is not how I want to achieve it. How can I get only the numbers using Jsoup?
Thanks to the comment from @Eritrean, I realized I needed to use e.text() instead of e.toString(), which gave me:
$509
$1,046
I still need a regex such as e.text().replaceAll("[$,]", "") to get rid of the dollar signs and commas.
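Instead of a hand-written regex, the JDK's locale-aware NumberFormat can parse the string returned by e.text(). A minimal sketch, assuming every price is in US dollars formatted like "$1,046"; PriceParser and parseUsd are hypothetical names. The cast to DecimalFormat holds for the JDK's built-in locales but is not guaranteed by the NumberFormat API.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class PriceParser {

    /** Parses a US-formatted price such as "$509" or "$1,046" (e.g. the result of e.text()). */
    static BigDecimal parseUsd(String text) {
        // For Locale.US the JDK returns a DecimalFormat whose pattern understands
        // the "$" prefix and the "," grouping separator.
        DecimalFormat usd = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.US);
        // Parse into BigDecimal so cents (e.g. "$12.50") are kept exactly.
        usd.setParseBigDecimal(true);
        try {
            return (BigDecimal) usd.parse(text.trim());
        } catch (ParseException ex) {
            throw new IllegalArgumentException("Not a USD price: " + text, ex);
        }
    }

    public static void main(String[] args) {
        for (String price : new String[] {"$509", "$1,046"}) {
            System.out.println(parseUsd(price).toPlainString());
        }
    }
}
```

In the question's loop this would be called as parseUsd(e.text()); text() has already removed the markup, so only the currency formatting is left to handle.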

Parse data of a certain tag that comes before a particular class

I need to parse data from a web page by tag ("p"). I tried this:
Elements content = document.getElementsByTag("p");
for(Element el : content) {
System.out.println(el.text());
}
It works, but I get superfluous data.
For example:
<div class="DicCellTerm">
<h1>Impossible</h1>
<div class=des>
<p class=par2><span class=hint><em>smth</em></span></p>
<p class=par2>1) (<em>with</em>) all, do</p>
<p class=par2>2) <span class=hint><em>text</em></span> some words</p>
<p class=par3>it is impossible</p>
</div>
</div>
</div><!--DicCell end-->
<div align="center" class="AdContent" id="adcontentnoprint">
<div class=SharedItems>
<div class=DicCellParent>
<span class=LinkOtherDic>+ dictionary <strong>impossible</strong> - translate</span>
<div class=DicCellOther id=diccellothershow>
<h2>impossible</h2>
<div class=des>
<p class=par1>1) important, is</p>
<p class=par1>what</p>
<p class=par1>2) true, false</p>
</div>
</div>
<!--DicCellOther end-->
</div>
<!--DicCellParent end-->
<div class=DicCellParent>
<span class=LinkOtherDic>+ translate <strong>important</strong> - dictionary</span>
<div class=DicCellOther id=diccellothershow>
<h2>importnant</h2>
<div class=des>
<p class=par1>1) müim, emiyetli; emiyet bar</p>
<p class=par1>it is very important - bu pek müimdir, bunıñ büyük emiyeti bar</p>
<p class=par1>2) qopayıp, qabarıp</p>
</div>
</div>
<!--DicCellOther end-->
</div>
<!--DicCellParent end-->
</div>
<!--SharedItems end-->
I need to get only the data in "p" tags that come before the SharedItems class.
I tried parsing data by the class "DicCellTerm" and got the correct data, but all of it is written on one line, and I need it laid out as on the web page.
Elements elements = document.select(".DicCellTerm p");
This selects every p inside the element with class DicCellTerm; you can then iterate over the elements and print each one's text. Here is a link to all the selectors jsoup supports; this is where I get most of my help =)
https://jsoup.org/apidocs/index.html?org/jsoup/select/Selector.html
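To keep one line per paragraph, print each matched element's text separately rather than the combined text of the whole collection (jsoup's Elements.text() joins everything with spaces). Since jsoup is not part of plain Java, the sketch below expresses the same ".DicCellTerm p" selection as an XPath query with the JDK's javax.xml.xpath API. It assumes the page has been cleaned into well-formed XML with quoted attributes (the original markup uses unquoted ones, which an XML parser rejects), and DicCellTermParagraphs is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DicCellTermParagraphs {

    // XPath equivalent of the CSS selector ".DicCellTerm p": every <p> that is a
    // descendant of an element whose class list contains the token "DicCellTerm".
    static final String EXPR =
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' DicCellTerm ')]//p";

    /** Text of each matching paragraph, one list entry per <p>. */
    static List<String> paragraphTexts(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(EXPR, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() flattens nested <span>/<em> into plain text.
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<div><div class=\"DicCellTerm\"><div class=\"des\">"
                + "<p class=\"par2\">1) (<em>with</em>) all, do</p>"
                + "<p class=\"par3\">it is impossible</p>"
                + "</div></div>"
                + "<div class=\"SharedItems\"><p class=\"par1\">1) important, is</p></div></div>";
        // Printing each paragraph separately keeps one line per <p>;
        // the <p> under SharedItems is not selected.
        paragraphTexts(xml).forEach(System.out::println);
    }
}
```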

Selenium - Find element using xpath or cssSelector

I need to find and click the element "Compute vmSwitch". I tried many ways using XPath (class and contains()) as well as cssSelector, but could not locate the element:
driver.findElement(By.xpath("//span[contains(@class,'nopadding vm-create-text-style-3 block-with-text-4 ng-binding') and contains(text(), 'Compute vmSwitch')]")).click();
The HTML is given below:
<div class="w-full"><br>
<img class="img-responsive center-block m-t-47" src="/src/icon/background/create_vm_img5.png">
<div class="col-md-12 m-t-md wordwrap">
<p class="nopadding vm-create-text-style-3 block-with-text-4 ng-binding">
Compute vmSwitch</p>
</div>
Why are you searching for a span tag? The text is inside a p element.
If this is your html:
<html>
<head></head>
<body>
<div class="w-full">
<br>
<img class="img-responsive center-block m-t-47" src="/src/icon/background/create_vm_img5.png">
<div class="col-md-12 m-t-md wordwrap">
<p class="nopadding vm-create-text-style-3 block-with-text-4 ng-binding"> Compute vmSwitch</p>
</div>
</div>
</body>
</html>
You could try:
WebElement elem2 = driver.findElement(By.xpath("//div[@class='w-full']"));
elem2.findElement(By.xpath(".//p[text()=' Compute vmSwitch']")).click();
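Two things in the original attempt stand out: the text sits in a <p>, not a <span>, and in the question's markup the text node starts with a newline and a space, so an exact text() comparison is fragile. The sketch below checks candidate XPath expressions against a well-formed copy of that markup using the JDK's XPath engine instead of a browser; VmSwitchLocator and its members are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class VmSwitchLocator {

    // Target the <p> (not a <span>), address the attribute with @class, and use
    // normalize-space() so leading/trailing whitespace in the text is ignored.
    static final String P_EXPR = "//p[contains(@class, 'vm-create-text-style-3')"
            + " and normalize-space() = 'Compute vmSwitch']";

    /** Number of nodes the XPath expression selects in the given XML. */
    static int count(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        // Well-formed copy of the question's markup (<br/> and <img/> closed).
        String xml = "<div class=\"w-full\"><br/>"
                + "<img class=\"img-responsive center-block m-t-47\" src=\"/src/icon/background/create_vm_img5.png\"/>"
                + "<div class=\"col-md-12 m-t-md wordwrap\">"
                + "<p class=\"nopadding vm-create-text-style-3 block-with-text-4 ng-binding\">\n Compute vmSwitch</p>"
                + "</div></div>";
        System.out.println("p matches: " + count(xml, P_EXPR)); // 1
        System.out.println("span matches: "
                + count(xml, "//span[contains(text(), 'Compute vmSwitch')]")); // 0
    }
}
```

If the expression matches here, the same string should also work in Selenium as driver.findElement(By.xpath(VmSwitchLocator.P_EXPR)).click(), since browsers implement XPath 1.0 functions such as normalize-space().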

position() function returns the wrong data

I am using Selenium and Java to write a test. I have the DOM below:
<body>
<div class='t'><span>1</span></div>
<div class='t'></div>
<div class='t'><span>2</span></div>
<div class='t'><span>3</span></div>
<div class='t'><span>4</span></div>
<div class='t'><span>5</span></div>
<div class='t'><span>6</span></div>
<div class='t'><span>7</span></div>
</body>
Why is the result the same for both
//div[position()>1 and @class='t' and .//span]
and
//div[position()>2 and @class='t' and .//span]
and the result is:
<div class="t">
<span>2</span>
</div>
<div class="t">
<span>3</span>
</div>
<div class="t">
<span>4</span>
</div>
<div class="t">
<span>5</span>
</div>
<div class="t">
<span>6</span>
</div>
<div class="t">
<span>7</span>
</div>
My expectation for the first XPath matches this result, but for the second one I think it should be:
<div class="t">
<span>3</span>
</div>
<div class="t">
<span>4</span>
</div>
<div class="t">
<span>5</span>
</div>
<div class="t">
<span>6</span>
</div>
<div class="t">
<span>7</span>
</div>
I just figured out that the XPath should be //div[@class='t' and .//span][position()>2]. It first selects all div elements that have t as their class attribute and at least one <span> inside, and then keeps only the elements after the second position of that filtered set.
The XPath below:
//div[position()>1 and @class='t' and .//span]
specifies that the div must have class='t', must contain a span, and must be at a position greater than 1. There is no span in the 2nd div, so the XPath above returns results starting from the third div.
Meanwhile, the XPath below:
//div[position()>2 and @class='t' and .//span]
specifies that the div must have class='t', must contain a span, and must be at a position greater than 2. Because all three conditions sit in the same predicate, position() counts every div sibling, not only the ones that pass the other tests, so the result again starts from the third div.
The div in the third position is
<div class='t'><span>2</span></div>
It has class='t' and a span, and its position (3) is greater than 2.
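The difference can be reproduced outside Selenium with the JDK's XPath engine. A minimal sketch (PositionPredicateDemo is a hypothetical name) that evaluates the three expressions from this thread against the question's DOM and prints the span text of each selected div:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionPredicateDemo {

    // The DOM from the question: 8 divs, and the 2nd one has no <span>.
    static final String DOM = "<body>"
            + "<div class='t'><span>1</span></div>"
            + "<div class='t'></div>"
            + "<div class='t'><span>2</span></div>"
            + "<div class='t'><span>3</span></div>"
            + "<div class='t'><span>4</span></div>"
            + "<div class='t'><span>5</span></div>"
            + "<div class='t'><span>6</span></div>"
            + "<div class='t'><span>7</span></div>"
            + "</body>";

    /** Span texts of the divs selected by expr, in document order. */
    static List<String> select(String expr) {
        try {
            NodeList divs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(DOM)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < divs.getLength(); i++) {
                texts.add(divs.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        // position() inside the same predicate counts all 8 div siblings.
        System.out.println(select("//div[position()>1 and @class='t' and .//span]")); // [2, 3, 4, 5, 6, 7]
        System.out.println(select("//div[position()>2 and @class='t' and .//span]")); // [2, 3, 4, 5, 6, 7]
        // A second predicate renumbers: position() now counts only the 7 filtered divs.
        System.out.println(select("//div[@class='t' and .//span][position()>2]"));    // [3, 4, 5, 6, 7]
    }
}
```

Positions in a step predicate are counted per parent. If you need them counted across the whole document instead, wrap the path in parentheses: (//div[@class='t' and .//span])[position()>2]. Here every div shares the same <body> parent, so both forms give the same result.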

How do you get the value inside an XML document?

In the XML below, I need to confirm that "Internet" is present.
<section id="landing-content">
<div id="header">
<div class="container">
<div class="row">
<div class="span12">
<h1 class="theme--primary">Internet</h1>
</div>
</div>
</div>
</div>
I tried the following:
WebElement findInternet = driver.findElement(By.xpath("//h1"));
System.out.println(findInternet);
I think this will work for you:
WebElement findInternet = driver.findElement(By.cssSelector("h1.theme--primary"));
System.out.println(findInternet.getText());
Your XPath selector will probably work as well; the key thing you were missing is that your println was printing the findInternet WebElement object itself. getText() returns the visible inner text of the selected element.
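The same point applies outside Selenium: compare the element's text, not the element object. A minimal JDK-only sketch, assuming the snippet is closed with </section> so it parses as XML; HeadingText and headingText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class HeadingText {

    /** Text of the first h1 with class "theme--primary", or "" if there is none. */
    static String headingText(String xml) {
        try {
            // evaluate(String, InputSource) returns the XPath string value of the
            // result, i.e. the text content of the first matching node.
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//h1[@class='theme--primary']", new InputSource(new StringReader(xml)))
                    .trim();
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<section id=\"landing-content\"><div id=\"header\"><div class=\"container\">"
                + "<div class=\"row\"><div class=\"span12\">"
                + "<h1 class=\"theme--primary\">Internet</h1>"
                + "</div></div></div></div></section>";
        // Check the text, not the node object.
        System.out.println("Internet present: " + "Internet".equals(headingText(xml)));
    }
}
```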
