jsoup to get a particular id from a html file

I have an HTML file like this:
<div class="student">
<h4 id="Classnumber100" class="studentheading">
<a id="studentlink22" href="/grade8/greg">22. Greg</a>
</h4>
<div class="studentcategories">
<div class="studentneighborhoods">
</div>
</div>
</div>
I want to use jsoup to get the URL /grade8/greg and the text "22. Greg".
I tried this selector:
Elements listo = doc.select("h4 #studentlink22");
but I am not able to get the values.
Actually, I want to select based on Classnumber100.
There are 300 records in the HTML page, and the only consistent thing is "Classnumber100".
So I want my selector to select all the hrefs and texts under Classnumber100.
How can I do that?
I tried
doc.select("class#studentheading"); and many other variations, but none of them work.

First of all, multiple elements should not share the same id, so each of these elements should not have the id Classnumber100. However, if this is the case, then you can still select them using the selector [id=Classnumber100].
If you're only interested in the a tags inside, then you can use [id=Classnumber100] > a.
Upon re-reading the question, it appears that the h4 tags you're interested in share the class attribute studentheading. In that case, you can use the class selector, i.e.
doc.select(".studentheading > a")
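A minimal, self-contained sketch of the attribute-selector approach, assuming jsoup is on the classpath. The class StudentLinks, the helper linksUnder, and the second record in the sample HTML ("23. Anna") are hypothetical. It loops over every matched link instead of calling text() or attr() on the whole Elements list, because with 300 records you want one href/text pair per record (Elements.text() joins all texts together, and Elements.attr() returns only the first match's value).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class StudentLinks {

    // Hypothetical helper: collects "href -> text" pairs for every <a> that is
    // a direct child of an element whose id attribute equals the given value.
    public static List<String> linksUnder(Document doc, String id) {
        List<String> result = new ArrayList<>();
        // [id=...] matches every element carrying that id, even when the id is
        // (incorrectly) repeated; "> a" keeps only direct <a> children.
        for (Element link : doc.select("[id=" + id + "] > a")) {
            result.add(link.attr("href") + " -> " + link.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two records sharing the same non-unique id, as in the question.
        String html =
            "<div class=\"student\"><h4 id=\"Classnumber100\" class=\"studentheading\">"
            + "<a id=\"studentlink22\" href=\"/grade8/greg\">22. Greg</a></h4></div>"
            + "<div class=\"student\"><h4 id=\"Classnumber100\" class=\"studentheading\">"
            + "<a id=\"studentlink23\" href=\"/grade8/anna\">23. Anna</a></h4></div>";
        Document doc = Jsoup.parse(html);

        // Prints "/grade8/greg -> 22. Greg" and "/grade8/anna -> 23. Anna"
        for (String pair : linksUnder(doc, "Classnumber100")) {
            System.out.println(pair);
        }

        // The class-based selector finds the same two links.
        System.out.println(doc.select(".studentheading > a").size()); // 2
    }
}
```

If the ids turn out to be unique per record after all, only the selector changes; the per-element loop stays the same.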

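The select method looks for the HTML tag (here h4 and a) and then, secondarily, for attributes if you tell it to. Have you looked at the jsoup site? The use of select is well described there for this situation.
Since there are many matching records, iterate over the matches rather than calling text() and attr() on the whole Elements list, which would join all the texts together and return only the first href. For example:
// code not tested
Elements listo = doc.select("h4[id=Classnumber100]").select("a");
for (Element link : listo) {
    String text = link.text();       // e.g. "22. Greg"
    String path = link.attr("href"); // e.g. "/grade8/greg"
    System.out.println(path + " " + text);
}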

Related

How to get Css selector when using java?

I am having trouble selecting an element with a CSS selector.
I have this element:
<div role="button" class="jss300 jss299" tabindex="-1">
<span class="jss313">system-all</span></div>
</div>
and I am trying to build a CSS selector for it. I tried this:
"div[class~='system-paloaltonetworks']"
What I need is the text of the element; in this case I want to get "system-paloaltonetworks" into a String variable.
I hope the question is clear now.

"system-paloaltonetworks" is the element's text, not its class attribute (the class is jss313). You can't locate an element by its text with a CSS selector; you need to use XPath. (Note also that the element is a span, not a div.)
driver.findElement(By.xpath("//span[text()='system-paloaltonetworks']"));

You are using class~= but you aren't comparing it with the element's actual class (jss313).
Try: String text = driver.findElement(By.xpath("//*[@class='jss313']")).getText();

Extract text from only some divs in the same class with jsoup

I would like to extract text from a specific <div> of a website using jsoup, but I'm not sure how.
The problem is that I want to get the text from a div that has class="name".
But there can be more <div>s with this class (and I don't want the text from those).
It looks like this in the HTML file:
.
.
<div class="name">
Some text I don't want
<span class="a">Tree</span>
</div>
.
.
<div class="name">Some text I do want</div>
.
.
So the only difference is that the <div> I want the text from does not have a <span> inside it. But I have not found a way to use that as a key to extract the text in jsoup.
Is it possible?

Use JSoup's selector syntax. For instance, to select all divs with class="name", use
Elements nameElements = doc.select("div.name");
Note that the text you "do" and "don't" want above sits in the same relative HTML location, and I don't see why you want one and not the other; HTML and JSoup see them the same way.
If you want to skip elements that contain span elements, one way is to iterate through the elements obtained above and test, by selector, whether they contain span elements:
Elements nameElements = doc.select("div.name");
for (Element element : nameElements) {
    if (element.select("span").isEmpty()) {
        System.out.println("No span");
        System.out.println(element.text());
        System.out.println();
    } else {
        System.out.println("span");
        System.out.println(element.text());
        System.out.println();
    }
}

You can select all div elements with class="name" and then loop through them, checking whether each element has child elements; if it has none, it is the div you want.
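A minimal sketch of the "no child elements" check, assuming jsoup is on the classpath; the class NameDivs and the helper leafNameTexts are hypothetical names. Element.children() lists only element children (text nodes are not included), so a div.name whose only content is text has an empty children list. Note that this skips a div.name with any child element, not only a <span>.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NameDivs {

    // Hypothetical helper: returns the text of every div.name that has no
    // child elements, i.e. skips the ones that wrap a <span>.
    public static List<String> leafNameTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        for (Element div : doc.select("div.name")) {
            // children() returns element children only; plain text is ignored.
            if (div.children().isEmpty()) {
                texts.add(div.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        String html =
            "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
            + "<div class=\"name\">Some text I do want</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(leafNameTexts(doc)); // [Some text I do want]
    }
}
```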

How do I get this text using Jsoup?

How do I get "this text" from the following HTML code using Jsoup?
<h2 class="link title"><a href="myhref.html">this text<img width=10
height=10 src="img.jpg" /><span class="blah">
<span>Other texts</span><span class="sometime">00:00</span></span>
</a></h2>
When I try
String s = document.select("h2.title").select("a[href]").first().text();
it returns
this textOther texts00:00
I tried to read the API for Selector in Jsoup but could not figure out much.
Also, how do I select an element with class="link title blah" (multiple classes)? Forgive me, I only know a little about both Jsoup and CSS.

Use Element#ownText() instead of Element#text().
String s = document.select("h2.link.title a[href]").first().ownText();
Note that you can select elements with multiple classes by simply concatenating the class-name selectors, as in h2.link.title, which selects <h2> elements that have at least both the link and title classes.
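A small sketch contrasting ownText() with text() on the question's markup, assuming jsoup is on the classpath; the class OwnTextDemo and the helper linkOwnText are hypothetical. ownText() collects only the text nodes that are direct children of the <a>, so the nested spans are skipped, while text() includes all descendant text. The same chained-class syntax covers the class="link title blah" case, e.g. doc.select(".link.title.blah").

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {

    // Hypothetical helper: returns only the text placed directly inside the
    // first <a href> under an <h2> that has both the "link" and "title"
    // classes, or null if no such link exists.
    public static String linkOwnText(Document doc) {
        Element a = doc.select("h2.link.title a[href]").first();
        // ownText() ignores text that belongs to descendant elements (the spans).
        return a == null ? null : a.ownText();
    }

    public static void main(String[] args) {
        String html =
            "<h2 class=\"link title\"><a href=\"myhref.html\">this text"
            + "<img width=10 height=10 src=\"img.jpg\" /><span class=\"blah\">"
            + "<span>Other texts</span><span class=\"sometime\">00:00</span></span>"
            + "</a></h2>";
        Document doc = Jsoup.parse(html);

        Element a = doc.select("h2.link.title a[href]").first();
        System.out.println(a.text());         // also includes the nested span text
        System.out.println(linkOwnText(doc)); // this text
    }
}
```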

How can I using Javascript Swap Out A h2 URL Destination with Limited Access to HTML?

I don't have access to my HTML code, but I have access to JavaScript in the footer of my document. With that said, I would like to replace the URL "/visitor_signup" with a new URL of my choosing, let's say "http://www.example.com/account_signup".
I would also like to do the same for "/user_signup", say swapping it to "http://www.example.com/master_signup".
I have to use JavaScript to do this, and I don't have any understanding of JS.
How do I make this work with JS code?
My code:
<div class="grid_12">
<div id="login">
<div class="panel" id="login-form">
<div id="login-promo">
<div class="clear"></div>
<h2><a href="/visitor_signup">Visitor Sign-Up ></a></h2>
<h2><a href="/user_signup">User Sign-Up ></a></h2>
</div>
</div>
</div>
</div>
</div>

You mean something like this?
var anchors = document.body.getElementsByTagName("a");
for (var i = 0; i < anchors.length; i++) {
    var anc = anchors[i];
    var href = anc.getAttribute("href");
    if (href === "/visitor_signup") {
        anc.setAttribute("href", "http://www.example.com/account_signup");
    } else if (href === "/user_signup") {
        anc.setAttribute("href", "http://www.example.com/master_signup");
    }
}
WARNING: because of the way browsers render HTML (parsing the page, fetching referenced resources semi-sequentially, and evaluating JavaScript along the way), someone might see the HTML before your script runs, and even click the '/visitor_signup' link.

Given your limitations, especially:
no access to the HTML code
no id attributes on the elements
your best bet is to:
use document.body.getElementsByTagName("a") to find all a tags,
check the href of each one,
and change it accordingly.
EDIT: This is exactly what @milan's answer does, so please disregard this one.

Since you can't edit the HTML and the <h2>s aren't differentiated, using jQuery might be easier than plain JS for reaching the elements.
The jQuery could look like this:
$('#login-promo h2:first a').attr("href", "/account_signup").parent().next().find('a').attr("href", "/master_signup");
Here we select the <a> in the first <h2> and change its href. Then we go back to the <a>'s parent, find the <a> in the next <h2>, and change its href too.

Select by "name" in JSoup

I have to parse a web page that contains multiple divs with the same class name but different name attributes, and no ids.
For example:
<div class="answer" style="display: block;" name="yyy" oldblock="block" jQuery1317140119108="11">
and
<div class="answer" style="display: block;" name="xxx" oldblock="block" jQuery1317140119108="11">
I want to select and parse data from only one of the divs, namely the one with name="yyy". (The content inside the divs is href links, which differ for each div.)
I've looked up the selector syntax in the Jsoup webpage but can't get a way to work around it. Can you please help me with this or let me know if I'm missing something?

Use the [attributename=attributevalue] selector.
Elements xxxDivs = document.select("div.answer[name=xxx]");
// ...
Elements yyyDivs = document.select("div.answer[name=yyy]");
// ...
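A self-contained sketch that combines the class selector, the [name=value] selector, and a descendant a[href] selector to pull the links out of one div, assuming jsoup is on the classpath. The class AnswerLinks, the helper hrefsFor, and the sample hrefs (/y1, /y2, /x1) are hypothetical. It assumes simple name values such as xxx or yyy, since the value is concatenated straight into the selector string.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AnswerLinks {

    // Hypothetical helper: collects the href of every link inside the
    // div.answer elements whose name attribute equals the given value.
    public static List<String> hrefsFor(Document doc, String name) {
        List<String> hrefs = new ArrayList<>();
        // Class selector + attribute selector on the div, then descend to links.
        for (Element link : doc.select("div.answer[name=" + name + "] a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment with two answer divs.
        String html =
            "<div class=\"answer\" name=\"yyy\"><a href=\"/y1\">Y1</a><a href=\"/y2\">Y2</a></div>"
            + "<div class=\"answer\" name=\"xxx\"><a href=\"/x1\">X1</a></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(hrefsFor(doc, "yyy")); // [/y1, /y2]
    }
}
```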
