How do I get "this text" from the following HTML using Jsoup?
<h2 class="link title"><a href="myhref.html">this text<img width=10
height=10 src="img.jpg" /><span class="blah">
<span>Other texts</span><span class="sometime">00:00</span></span>
</a></h2>
When I try
String s = document.select("h2.title").select("a[href]").first().text();
it returns
this textOther texts00:00
I tried to read the API documentation for Selector in Jsoup but could not figure out much.
Also, how do I get an element with class="link title blah" (multiple classes)? Forgive me, I know only a little of both Jsoup and CSS.
Use Element#ownText() instead of Element#text().
String s = document.select("h2.link.title a[href]").first().ownText();
Note that you can select elements with multiple classes by concatenating the class selectors, as in h2.link.title, which selects <h2> elements that have at least both the link and title classes.
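To make the difference between the full text and the "own" text concrete, here is a minimal sketch that uses the JDK's built-in DOM parser instead of Jsoup, so it runs without extra dependencies. The class OwnTextDemo and its helpers parseRoot and ownText are hypothetical names made up for this illustration, and the markup is the question's snippet with the attribute values quoted so that it parses as XML. It only approximates what Element#ownText() does, by collecting the text nodes that are direct children of the element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnTextDemo {
    // Parses a well-formed XML/XHTML fragment and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    // Concatenates only the text nodes that are direct children of the element,
    // skipping everything nested inside child elements.
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<h2 class=\"link title\"><a href=\"myhref.html\">this text"
                + "<img width=\"10\" height=\"10\" src=\"img.jpg\" />"
                + "<span class=\"blah\"><span>Other texts</span>"
                + "<span class=\"sometime\">00:00</span></span></a></h2>";
        Element a = (Element) parseRoot(html).getElementsByTagName("a").item(0);
        System.out.println(a.getTextContent()); // prints "this textOther texts00:00"
        System.out.println(ownText(a));         // prints "this text"
    }
}
```

With Jsoup on the classpath none of this is needed, since ownText() is already provided on Element.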
I want to get the text All New Products and Launches. I am trying this:
driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2/span")).getText();
but it prints Folder:
<h2 class="iselmf-h2-icon">
<span>Folder:</span>
All New Products and Launches
</h2>
The text is in the <h2> tag, not the <span> tag. Try
driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2")).getText();
There are two ways to do this:
1) driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2")).getText();
2) Get the text from both elements and then concatenate it, something like
String firstblock = driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2")).getText();
String secondblock = driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2/span")).getText();
String finaltext = firstblock + secondblock;
Look at the HTML tags
<span>Folder:</span>
The </span> closing tag marks the end of the span, so if you use the span as the locator, getText() returns only Folder:.
The parent element of the span is the h2 tag. So to get everything between the <h2> and </h2> tags, you need to do this:
driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2")).getText();
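The three cases can be compared outside the browser with a small sketch that uses the JDK's javax.xml.xpath instead of Selenium. The class FolderTitleXPathDemo and its eval helper are hypothetical, and the wrapping div with id iselmf-folder-detail is an assumption, since the question does not show which element carries that id.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class FolderTitleXPathDemo {
    // Assumption: a <div> carries the id; the question only shows the <h2>.
    static final String HTML =
            "<div id=\"iselmf-folder-detail\">"
            + "<h2 class=\"iselmf-h2-icon\"><span>Folder:</span>"
            + " All New Products and Launches </h2>"
            + "</div>";

    // Evaluates an XPath expression against HTML and returns its string value.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(HTML)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String h2 = "//*[@id='iselmf-folder-detail']/h2";
        // The span alone: prints "Folder:"
        System.out.println(eval("normalize-space(" + h2 + "/span)"));
        // The whole h2, span included: prints "Folder: All New Products and Launches"
        System.out.println(eval("normalize-space(" + h2 + ")"));
        // Only the h2's direct text node: prints "All New Products and Launches"
        System.out.println(eval("normalize-space(" + h2 + "/text()[normalize-space()])"));
    }
}
```

In Selenium itself, findElement returns elements rather than text nodes, so the text() step cannot be passed to By.xpath directly. One option there is to read the h2 text and strip the span's text from the front of it.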
I had a quite similar issue when getting the text from a span element. The solution was to use .getAttribute("value") like this:
driver.findElement(By.xpath(".//*[@id='iselmf-folder-detail']/h2/span")).getAttribute("value");
**String org.openqa.selenium.WebElement.getAttribute(String name)**
I would like to select the text inside the strong-tag but without the div under it...
Is there a possibility to do this with jsoup directly?
My try for the selection (doesn't work, selects the full content inside the strong-tag):
Elements selection = htmlDocument.select("strong").select("*:not(.dontwantthatclass)");
HTML:
<strong>
I want that text
<div class="dontwantthatclass">
</div>
</strong>
You are looking for the ownText() method.
String txt = htmlDocument.select("strong").first().ownText();
Have a look at the various methods jsoup offers for this: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html. You can use remove(), removeChild(), etc.
One thing you can do is use regex.
Here is a sample regex that matches a start tag together with its end tag, as well as self-closing tags such as <br/>:
https://www.debuggex.com/r/1gmcSdz9s3MSimVQ
So you can do it like
selection.replace(/<([^ >]+)[^>]*>.*?<\/\1>|<[^\/]+\/>/ig, "");
You can further modify this regex to match most of your cases.
Another thing you can do is process your variable further using JavaScript or VBScript:
Elements selection = htmlDocument.select("strong");
jQuery code here:
var removeHTML = function(text, selector) {
var wrapped = $("<div>" + text + "</div>");
wrapped.find(selector).remove();
return wrapped.html();
}
Together with a regular expression, you can use jsoup's ownText() method to get the text and remove any unwanted strings.
I guess you're using jQuery, so you could use "innerText" property on your "strong" element:
var selection = htmlDocument.select("strong")[0].innerText;
https://jsfiddle.net/scratch_cf/8ds4uwLL/
PS: If you want to wrap the retrieved text into a "strong" tag, I think you'll have to build a new element like $('<strong>retrievedText</strong>');
I usually find elements by their ID, tag, and so on. But my text sits directly in the body tag with no tags of its own, so how can I find it? I know it is in the body tag, but there are other elements there too. The "text I want to find" is a PHP error that gets displayed, and I am hoping to catch it. I usually write WebElement x = driver.findElement(By.??); but I can't proceed because I am not sure what to use here.
Sample HTML doc
<head></head>
<body>
Text I want to find
<div>xx</div>
<div>yy</div>
</body>
The reason for the java tag is that I am using Java to write my code.
In your situation I'd use the "context item expression", i.e. the . (dot) operator. So if I write an XPath like this:
//div[contains(.,'Text To Be Searched')]
then it will find all div elements which contain the text Text To Be Searched. For you my answer would be
driver.findElement(By.xpath("//body[contains(.,'Text I want to find')]"));
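As a rough check of what that expression matches, the sketch below evaluates it with the JDK's javax.xml.xpath rather than Selenium. The class BodyTextXPathDemo and its eval helper are hypothetical, and the sample document from the question is wrapped in an html root element so that it parses as XML. Note that contains(., ...) also matches when the text sits inside a nested div, while body/text() looks only at text placed directly in the body.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class BodyTextXPathDemo {
    // The question's sample document, wrapped in <html> so it is well-formed XML.
    static final String HTML =
            "<html><head></head><body>\n"
            + "Text I want to find\n"
            + "<div>xx</div>\n"
            + "<div>yy</div>\n"
            + "</body></html>";

    // Evaluates an XPath expression against HTML and returns its string value.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(HTML)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Is there a body whose text (descendants included) contains the message? prints "true"
        System.out.println(eval("boolean(//body[contains(., 'Text I want to find')])"));
        // First non-blank text node directly under body: prints "Text I want to find"
        System.out.println(eval("normalize-space(//body/text()[normalize-space()][1])"));
    }
}
```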
You should put that text inside a p tag, and then you can write:
WebElement x = driver.findElement(By.tagName("p"));
I have an HTML file like
<div class="student">
<h4 id="Classnumber100" class="studentheading">
<a id="studentlink22" href="/grade8/greg">22. Greg</a>
</h4>
<div class="studentcategories">
<div class="studentneighborhoods">
</div>
</div>
</div>
I want to use Jsoup to get the URL /grade8/greg and the text "22. Greg".
I tried the selector
Elements listo = doc.select("h4 #studentlink22");
but I am not able to get the values.
Actually, I want to select based on Classnumber100.
There are 300 records in the HTML page, and the only thing consistent across them is Classnumber100.
So I want my selector to select all the hrefs and text after Classnumber100.
How can I do that?
I tried
doc.select("class#studentheading"); and many other possibilities, but they are not working.
First of all, multiple elements should not share the same id, so each of these elements should not have the id Classnumber100. However, if this is the case, then you can still select them using the selector [id=Classnumber100].
If you're only interested in the a tags inside, then you can use [id=Classnumber100] > a.
Upon re-reading the question, it appears that the h4 tags you're interested in share the class studentheading. In that case you can use the class selector, i.e.
doc.select(".studentheading > a")
The select method looks for the HTML tag, here h4 and a, and then secondarily the attributes if you tell it to do so. Have you looked at the jsoup site? The use of select is well described there for this situation.
e.g.
// code not tested
Elements listo = doc.select("h4[id=Classnumber100]").select("a");
String text = listo.text(); // for "22. Greg"
String path = listo.attr("href"); // for "/grade8/greg"
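Since the page holds many such records, it may help to see the per-record loop spelled out. The sketch below is not Jsoup: it uses the JDK's javax.xml.xpath, with //h4[@class='studentheading']/a as a rough XPath counterpart of the .studentheading > a selector. The class StudentLinksDemo, the outer wrapper div, and the second record (/grade8/anna) are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StudentLinksDemo {
    // Returns "href -> text" for every <a> directly inside an <h4 class="studentheading">.
    static List<String> studentLinks(String xml) {
        try {
            NodeList anchors = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//h4[@class='studentheading']/a",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                result.add(a.getAttribute("href") + " -> " + a.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // First record is from the question; the second one is made up.
        String page = "<div>"
                + "<div class=\"student\"><h4 id=\"Classnumber100\" class=\"studentheading\">"
                + "<a id=\"studentlink22\" href=\"/grade8/greg\">22. Greg</a></h4></div>"
                + "<div class=\"student\"><h4 id=\"Classnumber100\" class=\"studentheading\">"
                + "<a id=\"studentlink23\" href=\"/grade8/anna\">23. Anna</a></h4></div>"
                + "</div>";
        for (String link : studentLinks(page)) {
            System.out.println(link);
        }
    }
}
```

With Jsoup, the same loop would iterate over the Elements returned by doc.select(".studentheading > a") and call attr("href") and text() on each element.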
I have to parse a web page that contains multiple divs with the same class name but different names and no ids,
e.g.
<div class="answer" style="display: block;" name="yyy" oldblock="block" jQuery1317140119108="11">
and
<div class="answer" style="display: block;" name="xxx" oldblock="block" jQuery1317140119108="11">
I want to select and parse data from only one of the divs, say the one with name="yyy". The content inside the divs consists of <href> links, which differ for each div.
I've looked up the selector syntax on the Jsoup website but can't find a way to do this. Can you please help me with this or let me know if I'm missing something?
Use the [attributename=attributevalue] selector.
Elements xxxDivs = document.select("div.answer[name=xxx]");
// ...
Elements yyyDivs = document.select("div.answer[name=yyy]");
// ...