JSoup - Select only one listobject - java

I'm trying to extract certain data from a website using JSoup and Java. So far I've been successful in what I'm trying to achieve.
<ul class="beverageFacts">
<li><span>Årgång</span><strong>**2009** </strong></li>
I want to extract the value marked with ** (2009) in the HTML above. I can do this with the following JSoup code:
doc.select("ul.beverageFacts li:lt(1) strong");
I'm using :lt(1) because several more list items follow that I want to omit.
Now to my problem: there's an optional information tab on the site I'm extracting data from, and it also contains a list with the class "beverageFacts". At the moment my code extracts that data too, which I don't want.
That markup is further down in the page source. I've tried to use the :lt(1) index there as well, but it won't work.
<div id="beverageMoreFacts" style="display: block">
<ul class="beverageFacts"><li class="half">
<span> Färg</span><strong> Ljusgul färg.</strong>
My overall result is that I extract "2009 Ljusgul färg." instead of only "2009". How can I write my code so it only extracts the first part, which it successfully does, and omits the rest?
EDIT:
I get the same result using:
doc.select("ul.beverageFacts li:eq(0) strong");
Thanks,
Z

You are qualifying only one part, whereas you should qualify both. Try this:
doc.select("ul.beverageFacts:eq(0) li:eq(0) strong");
What you are saying is: give me the first list item of each list of beverages. What you need to say instead is: Give me the first item of the first list of beverages.
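A caveat, with a sketch: jsoup's :eq(n) (like :lt(n)) tests an element's position among its parent's children, not its position in the list of matches. So ul.beverageFacts:eq(0) still matches the nested list if that list is the first child of #beverageMoreFacts, as it is in the snippet in the question. The sketch below assumes jsoup is on the classpath and uses the question's markup with closing tags added; it compares that selector with a variant that takes the first matching <ul> in document order via Elements.first() and searches only inside it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstBeverageFact {
    // The two lists from the question (closing tags added, non-ASCII written as escapes).
    static final String SAMPLE =
        "<ul class=\"beverageFacts\">"
        + "<li><span>\u00c5rg\u00e5ng</span><strong>2009 </strong></li>"
        + "<li><span>Other</span><strong>omitted</strong></li>"
        + "</ul>"
        + "<div id=\"beverageMoreFacts\" style=\"display: block\">"
        + "<ul class=\"beverageFacts\"><li class=\"half\">"
        + "<span> F\u00e4rg</span><strong> Ljusgul f\u00e4rg.</strong></li></ul>"
        + "</div>";

    // :eq(0) is checked against each <ul>'s sibling position, so both lists can match.
    static String withSiblingIndex(String html) {
        return Jsoup.parse(html).select("ul.beverageFacts:eq(0) li:eq(0) strong").text();
    }

    // First ul.beverageFacts in document order, then search only inside that list.
    static String firstListOnly(String html) {
        Document doc = Jsoup.parse(html);
        Element firstList = doc.select("ul.beverageFacts").first();
        if (firstList == null) {
            return null;
        }
        Element strong = firstList.select("li:eq(0) strong").first();
        return strong == null ? null : strong.text(); // text() trims the trailing space
    }

    public static void main(String[] args) {
        System.out.println(withSiblingIndex(SAMPLE)); // "2009" plus the nested list's colour text
        System.out.println(firstListOnly(SAMPLE));    // 2009
    }
}
```

If the real page places the first list somewhere other than its parent's first child, the answer's selector may behave differently; the Elements.first() variant does not depend on that.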

Related

Using jsoup for extracting attributes from "a" inside "span" inside "class" for sports software

I've been reading all the questions I could find regarding jsoup and attributes, classes, spans and so on, but none of them helped me get this data from this website.
I am working on some sports software and retrieve match data from the site soccer24.com. Now I want to get more data from specific match pages (win/lose history), so I need either the last scores or, even better, the "win" or "lose" result.
The scores are written like this:
<td class="" style="cursor: pointer;"><span class="score"><strong>2 : 1</strong></span></td>
Here I could work with the "2 : 1".
This is what I'm trying right now:
Elements wl =docl.select("span.score");
System.out.println(wl);
for(Element w :wl){
System.out.println(w.ownText());
}
The result is written like this:
<td class="winLose" style="cursor: pointer;"><span class="winLoseIcon"><a title="Win" class="form-bg-last form-w"><span></span></a></span></td>
Here I would need the "Win" from the a tag's title attribute.
I've really tried everything but can't extract it, and I would be really grateful for any help. Before I make it another question: I would also need the odds movement.
I get the final odds, but the movements are written like this:
<span class="up" alt="1.73[u]1.75">1.75</span>
so I need the "alt" attribute.
If I could get all these things it would be awesome. I know it's not a big deal for you, but I've been trying for hours now and this is really my last resort.
Thanks in advance :)
If I understand your question correctly, you want to extract an attribute from an element? If so, select the element and read the attribute with attr().
EDIT:
Now it seems your real issue is not jsoup parsing but getting the content.
The link contains #h2h;overall, which means the data you want is not in the server's initial response: after the page loads, it makes an AJAX request to another URL (http://d.soccer24.com/x/feed/d_hh_K2AUJ0ih_en_2).
When I checked the response, I found that the page repeatedly calls the server and updates the result. Both the request and the response are encrypted. The following updated code should display the correct results.
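// Imports needed by the snippets below
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// ** Test Data
//Document doc = Jsoup.parse("<html><body><h1></h1><table>"
// + "<td class=\"winLose\" style=\"cursor: pointer;\"><span class=\"winLoseIcon\"><a title=\"Win\" class=\"form-bg-last form-w\"><span></span></a></span></td>"
// + "<span class=\"up\" alt=\"1.73[u]1.75\">1.75</span>" + "</table></body></html>");
//
// Request the AJAX feed URL directly, sending the X-Fsign header along with it
Connection con = Jsoup.connect("http://d.soccer24.com/x/feed/d_hh_K2AUJ0ih_en_2");
con.header("X-Fsign", "SW9D1eZo");
Document doc = con.get(); // throws IOException: declare or catch it
// Print the title attribute ("Win", "Lose", ...) of each result icon link
Elements elems = doc.select("td.winLose > span.winLoseIcon > a[title]");
for (Element elem : elems) {
    System.out.println(elem.attr("title"));
}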
Similarly for odds:
Elements oddsElems = doc.select("span.up[alt]");
for (Element elem : oddsElems) {
    System.out.println(elem.attr("alt"));
}
RESULT:
..Lots of lines Win | Lose | Draw..
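A self-contained sketch of all three extractions, run against the question's snippets instead of the live feed (assumes jsoup on the classpath; the class and method names are illustrative). It also shows why the loop in the question prints empty lines: ownText() returns only text placed directly inside span.score, and here the score sits in the nested <strong>, so text() is needed. The cells are wrapped in <table><tr> because an HTML parser ignores a <td> that is not inside a table.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MatchRowExtractor {
    // The three snippets from the question, wrapped in a table row.
    static final String SAMPLE = "<table><tr>"
        + "<td class=\"\" style=\"cursor: pointer;\"><span class=\"score\"><strong>2 : 1</strong></span></td>"
        + "<td class=\"winLose\" style=\"cursor: pointer;\"><span class=\"winLoseIcon\">"
        + "<a title=\"Win\" class=\"form-bg-last form-w\"><span></span></a></span></td>"
        + "<td><span class=\"up\" alt=\"1.73[u]1.75\">1.75</span></td>"
        + "</tr></table>";

    // Score text; span.score has no own text, so ownText() would return "".
    static List<String> scores(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element score : doc.select("span.score")) {
            out.add(score.text());
        }
        return out;
    }

    // Win/lose result taken from the title attribute of the icon link.
    static List<String> results(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element link : doc.select("td.winLose a[title]")) {
            out.add(link.attr("title"));
        }
        return out;
    }

    // Raw odds-movement string from the alt attribute.
    static List<String> oddsMovements(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element odds : doc.select("span.up[alt]")) {
            out.add(odds.attr("alt"));
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        System.out.println(scores(doc));        // [2 : 1]
        System.out.println(results(doc));       // [Win]
        System.out.println(oddsMovements(doc)); // [1.73[u]1.75]
    }
}
```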

How to click one of two identical elements with Selenium WebDriver?

In FirePath I see two elements with identical attributes; FirePath returns two results for my locator.
Here is the highlighted HTML code in Firebug:
<button class="list_header_search_toggle icon-search btn btn-icon table-btn-lg" style="margin-left:0px">
And here is the whole code:
<button class="list_header_search_toggle icon-search btn btn-icon table-btn-lg" style="margin-left:0px">
<span class="sr-only">Search</span>
</button>
NOTE: There is only one search button. I searched for it everywhere and there is only one, but FirePath shows two. Why?
How do I code this in Selenium WebDriver?
The snippet from firepath:
Update:
Html code image, from firepath:
You can use XPath functions, for example:
position() returns the position of a node within the node-set being filtered (in a location step, its position among the matching siblings):
//button[@id='hdr_problem_task']/th[2]/button[position()=1]
last() selects the last one:
//button[@id='hdr_problem_task']/th[2]/button[last()]
Something like first() doesn't exist; instead, use an index:
//button[@id='hdr_problem_task']/th[2]/button[1]
If the button has some text, you can use that as well:
//button[@id='hdr_problem_task']/th[2]/button[text()='button name']
or with contains():
//button[@id='hdr_problem_task']/th[2]/button[contains(text(), 'button name')]
UPDATE:
The button contains the text Search, so you can use an XPath with contains().
One more small suggestion: keep future maintenance in mind. Instead of the following locator:
//*[@id='hdr_problem_task']/th[2]/button
it is better to name the tag explicitly:
//button[@id='hdr_problem_task']/th[2]/button
You can also use the th tag's name attribute value to identify the correct Search button, as shown below:
//th[@name='search'][1]/button/span[text()='Search']
Let me know whether it works for you.
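Why two matches can appear, sketched with the JDK's own javax.xml.xpath API so it runs without a browser: a positional predicate such as [1] is applied per parent, so //button[...][1] still matches every button that is the first one inside its own parent. Wrapping the path in parentheses, (//button[...])[1], indexes the whole result set instead. The same per-parent rule applies to th[...][1] in the last locator above. The markup is hypothetical (two header rows, each with an identically classed search button); XPath 1.0 semantics are the same in the browser, and the resulting string can be passed to Selenium's By.xpath().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathIndexDemo {
    // Hypothetical markup: two header rows, each holding one identically classed button.
    static final String SAMPLE = "<table>"
        + "<tr id='hdr_problem_task'><th name='search'>"
        + "<button class='list_header_search_toggle icon-search'><span class='sr-only'>Search</span></button>"
        + "</th></tr>"
        + "<tr id='hdr_problem_task_copy'><th name='search'>"
        + "<button class='list_header_search_toggle icon-search'><span class='sr-only'>Search</span></button>"
        + "</th></tr>"
        + "</table>";

    // Number of nodes the XPath expression selects in the given XML.
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String button = "//button[contains(@class, 'list_header_search_toggle')]";
        System.out.println(count(SAMPLE, button));                          // 2
        System.out.println(count(SAMPLE, button + "[1]"));                  // 2: [1] is per parent <th>
        System.out.println(count(SAMPLE, "(" + button + ")[1]"));           // 1: first in the document
        System.out.println(count(SAMPLE, "//th[@name='search'][1]/button")); // 2: one per <tr>
    }
}
```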

Embed hyperlink in sentence for Wicket property file

I'm facing a situation where I want to display a link like this:
For more information check out our FAQ.
where the full stop is displayed right after the link. It seems like overkill to define multiple properties for this, e.g.:
faq.info=For more information check out our
faq.markup=FAQ
faq.href=http://www.some.very.nice.url.com
faq.fullstop=.
Nor do I want to put only the dot in the HTML. Is it possible to insert the dot at the end, or the link in between?
You can embed components inside string messages:
<wicket:message key="faq.info">
<a wicket:id="faq">
<wicket:message key="faq.info.label"/>
</a>
</wicket:message>
faq.info=For more information check out our ${faq}.
faq.info.label=FAQ
For this to work you'll have to add a link with id "faq" from your Java code.

Using Jsoup to select classes and id

I am using this as an example
http://www.shopping.com/digital-camera/products?CLT=SCH&KW=digital+camera
On the page linked above there is an element with the class numTotalResults:
<span class="numTotalResults">
Results 1 - 40 of 1500+
</span>
I fetched the page using
Document query_result = Jsoup.connect("http://www.shopping.com")
.data("CLT", "digital camera")
.post();
but when I run
System.out.println(query_result.select(".numTotalResults"));
System.out.println(query_result.select("#quickLookItem-1"));
System.out.println(query_result.select("[name=D0]"));
nothing is printed, while
System.out.println(query_result);
System.out.println(query_result.select("span"));
clearly print out the values.
The selector seems to work only with tag names such as div, span and a, but I can't select by class or id.
Can someone help me?
Thanks
Edit:
It seems like the post did not go through. I don't quite understand why it didn't.
Instead of using POST request, try GET one:
Document query_result = Jsoup.connect("http://www.shopping.com/digital-camera/products?CLT=SCH&KW=digital+camera")
.get();
Take a look at how this search works: it doesn't use the POST method, and it keeps all search parameters in the query string. After this small change, your first select example will work well.
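A sketch of the same GET request built with data() instead of a hand-written query string (for GET requests, jsoup appends the data() pairs to the URL's query string), plus an offline check that the class selector itself works once the document actually contains the element. It assumes jsoup on the classpath; the parameter values are taken from the search URL above, and the live site may no longer serve this markup.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SearchResults {
    // GET request; jsoup turns the data() pairs into ?CLT=SCH&KW=digital+camera
    static Document fetch(String keywords) throws IOException {
        return Jsoup.connect("http://www.shopping.com/digital-camera/products")
            .data("CLT", "SCH")
            .data("KW", keywords)
            .get();
    }

    // Text of the first span.numTotalResults, or null if the document has none.
    static String totalResults(Document doc) {
        Element span = doc.select("span.numTotalResults").first();
        return span == null ? null : span.text(); // text() collapses the newlines
    }

    public static void main(String[] args) {
        // Offline check with the markup from the question; fetch("digital camera")
        // would return the live page instead.
        Document sample = Jsoup.parse(
            "<span class=\"numTotalResults\">\nResults 1 - 40 of 1500+\n</span>");
        System.out.println(totalResults(sample)); // Results 1 - 40 of 1500+
    }
}
```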

Getting Started With Android & JSOUP

I am currently attempting to make an Android application and have come to the conclusion that I must use JSOUP to finish it. I am using JSOUP to extract data from the Internet and then post it on my app.
What I am trying to figure out is how to extract multiple bits of data from the URL and then display each one in its own TextView in my XML layout (if that is the correct approach?).
Here is a snippet of the HTML I am trying to extract from:
<a href="http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m12296&MI=122&RN=BACoN TURKEY SLICED" OnClick="javascript: NewWindow('http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m12296&MI=122&RN=BACON TURKEY SLICED', 'RDA_window', 'width=450, height=600, scrollbars=no, toolbar=no, directories=no, status=no, menubar=no, copyhistory=no');return false" Class="recipeLink">BACON TURKEY SLICED</a>
I am trying to extract the words BACON TURKEY SLICED.
The problem is I do not understand JSOUP at all. I have a rough idea of how it works, but I can't seem to use it in practice. I was wondering if someone could give me a push in the right direction.
Also, I have tried reading the cookbook, to no avail.
If anyone could help, thank you so much!
EDIT
Here are two more; I believe they have exactly the same structure.
<a href="http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m4903&MI=122&RN=STATION OMELET" OnClick="javascript: NewWindow('http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m4903&MI=122&RN=STATION OMELET', 'RDA_window', 'width=450, height=600, scrollbars=no, toolbar=no, directories=no, status=no, menubar=no, copyhistory=no');return false" Class="recipeLink">STATION OMELET</a>
<a href="http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m784&MI=122&RN=CEREAL HOT GRITS" OnClick="javascript: NewWindow('http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m784&MI=122&RN=CEREAL HOT GRITS', 'RDA_window', 'width=450, height=600, scrollbars=no, toolbar=no, directories=no, status=no, menubar=no, copyhistory=no');return false" Class="recipeLink">CEREAL HOT GRITS</a>
So, this answer is going to assume that you are interested in:
<a href=".." >TEXT YOU WANT</a>
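All these <a> tags have the class "recipeLink".
Given your example, here as a String (OnClick attribute omitted for brevity):
String tastyTurkeySandwich = "<a href=\"http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?RCN=m12296&MI=122&RN=BACON TURKEY SLICED\" class=\"recipeLink\">BACON TURKEY SLICED</a>";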
You can extract the (first) text with the following code:
Document doc = Jsoup.parse(tastyTurkeySandwich);
Elements links = doc.select("a[href].recipeLink");
// This will just print the text in the first one
System.out.println(links.first().text());
To iterate over an Elements (which implements the Iterable interface) instance:
for (Element link : links) {
// Calling link.text() will return BACON TURKEY SLICED etc. etc.
System.out.println(link.text());
}
In short:
a[href] will match all the <a> tags that have a href attribute.
the .recipeLink part will filter that selection to only include links that have the recipeLink class.
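Building on this, a minimal sketch that collects every recipe name into a List, which could then be handed to one TextView per item (the Android UI code is not shown). It assumes jsoup on the classpath; the sample is the question's three links with the OnClick handlers removed, plus one hypothetical non-recipe link to show that .recipeLink filters it out.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class RecipeNames {
    static final String BASE =
        "http://www.campusdish.com/en-US/CSMA/OldDominion/Locations/rda.aspx?";

    // Three recipe links from the question plus a hypothetical navigation link.
    static final String SAMPLE =
        "<a href=\"" + BASE + "RCN=m12296&MI=122&RN=BACON TURKEY SLICED\" class=\"recipeLink\">BACON TURKEY SLICED</a>"
        + "<a href=\"" + BASE + "RCN=m4903&MI=122&RN=STATION OMELET\" class=\"recipeLink\">STATION OMELET</a>"
        + "<a href=\"" + BASE + "RCN=m784&MI=122&RN=CEREAL HOT GRITS\" class=\"recipeLink\">CEREAL HOT GRITS</a>"
        + "<a href=\"http://www.campusdish.com/\">Home</a>";

    // Link text of every <a> that has an href and the recipeLink class, in page order.
    static List<String> recipeNames(String html) {
        List<String> names = new ArrayList<>();
        for (Element link : Jsoup.parse(html).select("a[href].recipeLink")) {
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(recipeNames(SAMPLE));
        // [BACON TURKEY SLICED, STATION OMELET, CEREAL HOT GRITS]
    }
}
```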
