Get number of words in a text - java

I'm using Java and Selenium, and I have to extract the number of words in a specific text. I'm stuck because I get more results than I expected.
Consider the following HTML:
<div data-v-2f952c88="" class="text1">
<section data-v-3b70ad5b="" data-v-2f952c88="" data-content-provider="ABC" class="description__section">
<div data-v-051a83e7="" data-v-3b70ad5b="" class="markdown" data-v-2f952c88="">
<p>Headline 1
Hello everyone i´m new at stack overflow</p>
<p> And I need your help
to get the total of words in this exemple
</p>
</div>
</section>
<section data-v-3b70ad5b="" data-v-2f952c88="" data-content-provider="DEF" class="description__section">
<div data-v-051a83e7="" data-v-3b70ad5b="" class="markdown" data-v-2f952c88="">
<p>I Love Coding
I use Java</p>
<p> Another Text
And Selenium
</p>
</div>
</section>
</div>
<div data-v-2f952c99="" class="querty">
<section data-v-3b755ad5b="" data-v-2f952288="" data-content-provider="DEF" class="description__section">
<div data-v-051a18e7="" data-v-3b789d5b="" class="markdown" data-v-2f962c88="">
<p>This is another text along the WEBPAGE
I don´t want to count this words in my total count</p>
</div>
</section>
</div>
In Java I've created this function:
private String countWords(WebDriver driver){
int totalLetters = 0;
try{
List<WebElement> className = driver.findElements(By.cssSelector("[class*='text1']"));
for(WebElement classElement: className){
if(classElement!=null) {
String[] tags = {"p", "section"};
for (String tag: tags) {
List<WebElement> elements = driver.findElements(By.tagName(tag));
for (WebElement element: elements) {
String text=element.getText();
String[] words = text.split("\\s+");
if (words!=null) {
totalLetters = totalLetters + words.length;
}
}
}
}
}
}
catch(NoSuchMethodError e){
//e.printStackTrace();
throw e;
}
String s=String.valueOf(totalLetters);
System.out.println("How many word? " + s);
return s;
}
My problem is that the function counts the words inside every "p" and "section" tag on the webpage, but I only want the "p" and "section" tags inside the first div (class="text1").
What am I doing wrong?

It counts every 'p' and 'section' tag because the inner lookup calls driver.findElements, which searches the whole page rather than only the matched div.
Does this help you find your problem?
Or is your problem that it also counts the text in the div with class="querty"?
<div data-v-2f952c99="" class="querty">
<section data-v-3b755ad5b="" data-v-2f952288="" data-content-provider="DEF" class="description__section">
<div data-v-051a18e7="" data-v-3b789d5b="" class="markdown" data-v-2f962c88="">
<p>This is another text along the WEBPAGE
I don´t want to count this words in my total count</p>
</div>
</section>
</div>
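A minimal sketch of the counting step only, assuming the paragraph texts were already collected from a scoped lookup (for example classElement.findElements(By.tagName("p")) followed by getText(), instead of driver.findElements(...)). The class and method names are hypothetical. It counts only the "p" texts, because a section's getText() already contains the text of its paragraphs, so counting both tags would count the same words twice. It also treats blank text as zero words, since "".split("\\s+") returns one empty token.

```java
import java.util.List;

// Hypothetical helper; the texts stand in for WebElement.getText() results.
final class ScopedWordCount {

    // Counts whitespace-separated words; blank text counts as zero words.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split("\\s+") would yield one empty token
        }
        return trimmed.split("\\s+").length;
    }

    // Sums the word counts of the paragraph texts taken from one container.
    static int countWords(List<String> paragraphTexts) {
        int total = 0;
        for (String text : paragraphTexts) {
            total += countWords(text);
        }
        return total;
    }

    public static void main(String[] args) {
        // Texts of the four <p> elements inside the div with class "text1".
        List<String> texts = List.of(
                "Headline 1\nHello everyone i'm new at stack overflow",
                "And I need your help\nto get the total of words in this exemple",
                "I Love Coding\nI use Java",
                "Another Text\nAnd Selenium");
        System.out.println("How many words? " + countWords(texts)); // 33
    }
}
```

The text of the "querty" div never reaches this helper, as long as the paragraphs are looked up on the matched element and not on the driver.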

Related

Iterate <div> inside <ul> tag Java - Jsoup

I'm trying to get all <div> inside a <ul> tag using jsoup.
This is the HTML
<html>
<head>
<title>Try jsoup</title>
</head>
<body>
<ul class="product__listing product__grid">
<div class="product-item">
<div class="content-thumb_gridpage">
<a class="thumb" href="index1.html" title="Tittle 1">
</div>
</div>
<div class="product-item">
<div class="content-thumb_gridpage">
<a class="thumb" href="index2.html" title="Tittle 2">
</div>
</div>
<div class="product-item">
<div class="content-thumb_gridpage">
<a class="thumb" href="index3.html" title="Tittle 3">
</div>
</div>
</ul>
</body>
</html>
I want to iterate over every <div class="product-item"> so that I can add the attributes of each <a class="thumb"> to a list:
List-product-details
[0] href="index1.html" title="Tittle 1"
[1] href="index2.html" title="Tittle 2"
[2] href="index3.html" title="Tittle 3"
Note that there can be any number ('N') of product-item divs.
Here is what I have so far:
Elements productList = sneakerList.select("ul.product__listing product__grid");
Elements product = productList.select("ul.product-item");
for (int i = 0; i < product.size(); i++) {
Elements productInfo = product.get(i).select("div.product-item").select("div.content-thumb_gridpage").select("a.thumb");
System.out.format("%s %s %s\n", productInfo.attr("title"), productInfo.attr("href"), productInfo.text());
}
Did you try debugging line by line and checking at which line your code doesn't do what you expect?
I see two mistakes.
The first selector, "ul.product__listing product__grid", contains a space. As written it means: find a ul element with class product__listing and, inside it, search for an element <product__grid> </product__grid>. You probably meant: select a ul element that has both class product__listing and class product__grid. To match both classes on the same element, remove the space and put a dot . before the second class name. The correct selector is "ul.product__listing.product__grid".
The second selector, "ul.product-item", returns an empty result. That's because you are already inside the ul and you are searching for another ul. The selector is relative to the elements you have already selected, so ".product-item" is enough.
With those two fixes I get the output:
Tittle 1 index1.html
Tittle 2 index2.html
Tittle 3 index3.html

Java selenium - skip span element within div

Here's my HTML structure. I am trying to skip a span within a div so that I get only the div's own text (which is dynamic) for testing.
<div class="items">
<div class="payment void" id="payment-000000899799">
<div class="payment__details">
<div class="method">CASH <span class="label"><span>Payment Voided</span></span></div>
<div class="date">2/12/2021, 3:02:15 PM</div>
</div>
<div class="payment__details">
<div class="amount">$20.00</div>
<div class="ref">Ref ID: REF-ID-01</div>
</div>
</div>
<div class="payment sale" id="payment-000000899806">
<div class="payment__details">
<div class="method">CASH </div>
<div class="date">2/12/2021, 3:02:21 PM</div>
</div>
<div class="payment__details">
<div class="amount">$100.00</div>
<div class="ref">Ref ID: REF-ID-02</div>
</div>
</div>
</div>
In my step definition I have:
List<List<String>> data = capturedData.raw();
WebElement paymentDetails = driver.findElement(By.xpath("(//*[@class='payment__details']/div[@class='method'])[" + data.get(1).get(0) + "]"));
String paymentType = paymentDetails.getText();
System.out.println(paymentType); //This prints CASH Payment Voided
But I actually want only 'CASH', which is the div's own text, without the span's text 'Payment Voided'. The data comes from a feature file.
How do I get only the div's text and skip the text of the span inside the same div?
You cannot skip it directly, since the span is part of the element.
1st Approach
String fulltext = driver.findElement(By.xpath("//*[@class='payment__details']/div[@class='method'][1]")).getText();
String spantext = driver.findElement(By.xpath("//*[@class='payment__details']/div[@class='method'][1]/span")).getText();
//Now replace the span text with "" and trim the leftover whitespace
String output = fulltext.replace(spantext, "").trim();
2nd Approach
WebElement parent = driver.findElement(By.xpath("//*[@class='payment__details']/div[@class='method'][1]"));
JavascriptExecutor js = (JavascriptExecutor) driver;
//firstChild is the text node in front of the span; its text may end with a space
String output = (String) js.executeScript("return arguments[0].firstChild.textContent",parent);
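The string handling of the 1st approach can be tried without a browser. This is only a sketch: the class and method names are hypothetical, and the arguments stand in for the getText() results of the div and of its child span. Note that String.replace removes every occurrence of the child text, so it would also strip the div's own text if that happened to contain the same words.

```java
// Hypothetical helper illustrating the 1st approach on plain strings.
final class OwnText {

    // Removes each child text from the parent's full text, then trims the rest.
    static String ownText(String fullText, String... childTexts) {
        String result = fullText;
        for (String child : childTexts) {
            if (!child.isEmpty()) {
                result = result.replace(child, "");
            }
        }
        return result.trim();
    }

    public static void main(String[] args) {
        // Voided payment: the div text includes the span text.
        System.out.println(ownText("CASH Payment Voided", "Payment Voided")); // CASH
        // Regular payment: there is no span, so only trimming happens.
        System.out.println(ownText("CASH "));                                 // CASH
    }
}
```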

Get the total number of letters in a text

I have to extract the total number of letters in the following example.
<div data-v-2f952c88="" class="text">
<section data-v-3b70ad5b="" data-v-2f952c88="" data-content-provider="ABC" class="description__section"><!---->
<div data-v-051a83e7="" data-v-3b70ad5b="" class="markdown" data-v-2f952c88="">
<p>Headline 1<br>
This is my first example</p>
<p> Another Text
this is onother example
</p>
</div>
</section>
<section data-v-3b70ad5b="" data-v-2f952c88="" data-content-provider="DEF" class="description__section">
<div data-v-051a83e7="" data-v-3b70ad5b="" class="markdown" data-v-2f952c88="">
<p>Headline 2<br>
Java Rocks</p>
<p> Another Text
Selenium also rocks
</p>
</div>
</section>
</div>
How can I extract all the letters inside the several tags "p" that are under several tags "section"?
Have you tried iterating through all the elements? Something like this (adapt it as needed, it is only meant to convey the idea):
for (WebElement element : driver.findElements(By.tagName("p"))) {
//Work with element.getText()
}
What about this?
String[] tags = {"p", "section"};
int totalLetters = 0;
for (String tag: tags) {
List<WebElement> elements = driver.findElements(By.tagName(tag));
for (WebElement element: elements) {
totalLetters = totalLetters + element.getText().length();
}
}
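One caveat about the snippet above: getText().length() counts every character, including spaces, digits and line breaks, and a section's text already contains the text of its paragraphs, so looping over both tags counts that text twice. If only letters should be counted, a sketch like the following could be applied to the "p" texts alone. The class and method names are hypothetical, and the strings stand in for getText() results.

```java
import java.util.List;

// Hypothetical helper; counts letters only, not spaces, digits or punctuation.
final class LetterCount {

    // Counts the code points that Character.isLetter accepts.
    static long countLetters(String text) {
        return text.codePoints().filter(Character::isLetter).count();
    }

    // Sums the letter counts of several texts.
    static long countLetters(List<String> texts) {
        long total = 0;
        for (String text : texts) {
            total += countLetters(text);
        }
        return total;
    }

    public static void main(String[] args) {
        String paragraph = "Headline 2\nJava Rocks";
        System.out.println(paragraph.length());      // 21 characters in total
        System.out.println(countLetters(paragraph)); // 17 letters
    }
}
```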
First of all, have you read anything about the HTML DOM?
In JavaScript, using the DOM, you can do something like this:
var myCollection = document.getElementsByTagName("p");
That gives you a collection of "p" tags.
You can access them by index number: y = myCollection[1]; or loop over them:
var i;
for (i = 0; i < myCollection.length; i++) {
//do something with myCollection...
}
Your example could look something like this:
var myCollection = document.getElementsByTagName("p");
var i;
var added = 0;
for (i = 0; i < myCollection.length; i++) {
added += myCollection[i].innerText.length;
}
alert(added);
I hope you find it useful!

How to find same element from item grid using loop in java selenium?

I am trying to check, using a loop, whether the "Add to cart" button is present in every item box in the following code:
<div class="page-body">
<div class="product-selectors">
<div class="product-filters-wrapper">
<div class="product-grid">
<div class="item-box">
<div class="item-box">
<div class="item-box">
<div class="item-box">
</div>
Each item box contains the following code:
<div class="item-box">
<div class="product-item" data-productid="20">
<div class="picture">
<div class="details">
<h2 class="product-title">
<div class="product-rating-box" title="1 review(s)">
<div class="description"> 12x optical zoom; SuperRange Optical Image Stabilizer </div>
<div class="add-info">
<div class="prices">
<div class="buttons">
<input class="button-2 product-box-add-to-cart-button" type="button" onclick="AjaxCart.addproducttocart_catalog('/addproducttocart/catalog/20/1/1 ');return false;" value="Add to cart">
</div>
</div>
</div>
</div>
</div>
I need to check, using a loop, whether every item box has an "Add to cart" button. Can anyone help, please?
I suggest avoiding the loop unless there is an explicit need for it. You can find the number of "Add to cart" buttons and compare it with a known value:
By byCss = By.cssSelector(".item-box>div input[value='Add to cart']");
int cartCount = driver.findElements(byCss).size();
if (cartCount != 4){
//fail the test
}
If you do want to loop and check whether the input button exists in each box:
By itemBoxes = By.className("item-box");
By button = By.cssSelector("[type='button'][value='Add to cart']");
List<WebElement> webElementList = driver.findElements(itemBoxes);
for (WebElement element: webElementList){
//findElements returns a list; its size is 1 if the button exists in this box
if (element.findElements(button).size() != 1){
//fail
}
}
You can search by XPath inside the loop.
Something like:
"(.//input[@value='Add to cart'])[1]"
"(.//input[@value='Add to cart'])[2]"
"(.//input[@value='Add to cart'])[3]"
etc.
I'm not sure this XPath is exactly right, but the general approach should work for you.
Or something like this:
String xpath = ".//input[@value='Add to cart']";
List<WebElement> addToCartButtons = driver.findElements(By.xpath(xpath));
for (WebElement button : addToCartButtons) {
button.click();
}
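To try the relative XPath idea without a browser, here is a sketch that uses the JDK's built-in XPath engine on a small, well-formed XML stand-in for the product grid. It is not Selenium code, and the class name, method name and sample markup are assumptions made for illustration. The attribute test is written with @ (as in @value), and the leading dot in .//input keeps each search inside the current item box.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; uses the JDK XPath API, not Selenium.
final class ItemBoxCheck {

    // Returns the zero-based indexes of item boxes without exactly one button.
    static List<Integer> boxesWithoutButton(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList boxes = (NodeList) xpath.evaluate(
                    "//div[@class='item-box']", doc, XPathConstants.NODESET);
            List<Integer> missing = new ArrayList<>();
            for (int i = 0; i < boxes.getLength(); i++) {
                // ".//" searches only below the current item box.
                Double count = (Double) xpath.evaluate(
                        "count(.//input[@value='Add to cart'])",
                        boxes.item(i), XPathConstants.NUMBER);
                if (count.intValue() != 1) {
                    missing.add(i);
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate the markup", e);
        }
    }
}
```

With Selenium, the equivalent check would call findElements on each item-box element with the same relative expression, as in the loop shown in the earlier answer.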

How do I correctly parse data using JSoup (java)

I want to parse the data out of this HTML (CompanyName, Location, jobDescription, ...) using JSoup (Java). I get stuck when trying to iterate over the job listings.
The HTML extract below is one of many "joblisting" divs that I want to iterate over and extract data from. I just can't work out how to iterate over the specific div objects. Sorry for the beginner question, but maybe someone who already knows which function to use can help. select?
<div class="between_listings"><!-- local.spacer --></div>
<div id="joblisting-2944914" class="joblisting listing-even listing-even company-98028 " itemscope itemtype="http://schema.org/JobPosting">
<div class="company_logo" itemprop="hiringOrganization" itemscope itemtype="http://schema.org/Organization">
<a href="/stellenangebote-des-unternehmens--Delivery-Hero-Holding-GmbH--98028.html" title="Jobs Delivery Hero Holding GmbH" itemprop="url">
<img src="/upload_de/logo/D/logoDelivery-Hero-Holding-GmbH-98028DE.gif" alt="Logo Delivery Hero Holding GmbH" itemprop="image" width="160" height="80" />
</a>
</div>
<div class="job_info">
<div class="h3 job_title">
<a id="jobtitle-2944914" href="/stellenangebote--Junior-Business-Intelligence-Analyst-CRM-m-f-Berlin-Delivery-Hero-Holding-GmbH--2944914-inline.html?ssaPOP=204&ssaPOR=203" title="Arbeiten bei Delivery Hero Holding GmbH" itemprop="url">
<span itemprop="title">Junior Business Intelligence Analyst / CRM (m/f)</span>
</a>
</div>
<div class="h3 company_name" itemprop="hiringOrganization" itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Delivery Hero Holding GmbH</span>
</div>
</div>
<div class="job_location_date">
<div class="job_location target-location">
<div class="job_location_info" itemprop="jobLocation" itemscope itemtype="http://schema.org/Place">
<div class="h3 locality" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality"> Berlin</span>
</div>
<span class="location_actions">
<a href="javaScript:PopUp('http://www.stepstone.de/5/standort.html?OfferId=2944914&ssaPOP=203&ssaPOR=203','resultList',800,520,1)" class="action_showlistingonmap showlabel" title="Google Maps" itemprop="maps">
<span class="location-icon"><!-- --></span>
<span class="location-label">Google Maps</span>
</a>
</span>
</div>
</div>
<div class="job_date_added" itemprop="datePosted"><time datetime="2014-07-04">04.07.14</time></div>
</div>
<div class="job_actions">
</div>
</div>
<div class="between_listings"><!-- local.spacer --></div>
Java code:
File input = new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt");
// Load file into extraction1
Document ParseResult = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements jobListingElements = ParseResult.select(".joblisting");
for (Element jobListingElement: jobListingElements) {
jobListingElement.select(".companyName span[itemprop=\"name\"]");
// other element properties
System.out.println(jobListingElements);
}
Thank you!
So you already have your Jsoup document, right? Then it seems pretty easy, as long as the CSS class joblisting does not appear anywhere else.
Document document = Jsoup.parse(new File("d:/bla.html"), "utf-8");
Elements elements = document.select(".joblisting");
for (Element element : elements) {
Elements jobTitleElement = element.select(".job_title span");
Elements companyNameElement = element.select(".company_name span[itemprop=name]");
String companyName = companyNameElement.text();
String jobTitle = jobTitleElement.text();
System.out.println(companyName);
System.out.println(jobTitle);
}
I don't know why the attribute selector [itemprop*=\"name\"] does not find the span (further reading: http://jsoup.org/cookbook/extracting-data/selector-syntax).
Got it: span[itemprop=name], without any quotes or escapes, works. Other attributes or values should also work to get a more specific selection.
