I want to display my element in a TextView.
Code:
Document doc = Jsoup.parse(myURL);
Elements name = doc.getElementsByClass(".lNameHeader");
for (Element nametext : name) {
    String text = nametext.text();
    tabel1.setText(text);
}
But it displays nothing.
(The site I am parsing is http://roosters.gepro-osi.nl/roosters/rooster.php?leerling=120777&type=Leerlingrooster&afdeling=12-13_OVERIG&tabblad=2&school=905)
Your previous question shows that myURL is a String. In that case you are calling the static method Jsoup.parse(String html), which parses the string itself as HTML instead of fetching the page.
You need the one that takes a URL to make the connection:
Document doc = Jsoup.parse(new URL(myURL), 2000);
Elements name = doc.getElementsByClass("lNameHeader");
Also drop the leading . character from the class name. If you don't wish to specify a timeout you can simply use:
Document doc = Jsoup.connect(myURL).get();
Actually the class for it is:
lNameHeader
Note that the first character is not 1 (one); it is l (lowercase letter L).
So it should be:
Elements name = doc.getElementsByClass("lNameHeader");
Note also that Jsoup's getElementsByClass method doesn't work like a CSS selector, so the . must be omitted.
How can I use Selenium with Java to verify that the values in a data table are masked after the 4th character?
I have used the code below to extract the WebElements from the GUI.
List<WebElement> ids = driver.findElements(By.xpath("//tbody/tr/td[1]"));
Example of data:
1111$$$$
2222$$
Find all the elements you need in a loop, or use a CSS expression that returns a list of web elements:
WebElement myElm = driver.findElement(By.cssSelector("CSSExpression"));
String text = myElm.getText();
// or, for an input field:
// String text = myElm.getAttribute("value");
char c = text.charAt(4); // index 4 is the first character after the 4th
if (c == '$') {
    // good
} else {
    // bad
}
You can also find the first position in the string where $ appears and make sure every character from there on is also $.
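The idea of locating the first $ and checking the characters that follow it can be sketched in plain Java, independent of Selenium. The class name MaskCheck and the method isMaskedAfter are made up for illustration. The sketch assumes that "masked" means the first $ sits immediately after the visible prefix and that every later character is also $.

```java
public class MaskCheck {
    // Hypothetical helper: true when the first `visible` characters contain
    // no '$' and every character after them is '$' (at least one is required).
    static boolean isMaskedAfter(String text, int visible) {
        if (text == null || text.length() <= visible) {
            return false; // nothing follows the visible prefix
        }
        int firstMask = text.indexOf('$');
        if (firstMask != visible) {
            return false; // mask starts too early, too late, or is missing
        }
        for (int i = firstMask; i < text.length(); i++) {
            if (text.charAt(i) != '$') {
                return false; // an unmasked character after the mask began
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMaskedAfter("1111$$$$", 4)); // true
        System.out.println(isMaskedAfter("2222$$", 4));   // true
        System.out.println(isMaskedAfter("1111$2$$", 4)); // false
    }
}
```

In a Selenium test you would presumably call this once per cell, passing the text read from each WebElement returned by findElements.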
I have an a element with several attributes; one of them, data-product-id, is the one I want.
For example, with data-product-id="002212" I am interested in the number "002212".
My problem is that there can be several a elements like this one.
This is what the link looks like:
<a href="something.com" title="test tile" class="title-product" data-jsevent="obj:title--product" data-product-name="test" data-product-id="002212" ddata-product-price="1.99" data-product-brand="test" data-product-quantity="1">
I did something like this:
Elements links = document.select("a.title-product");
This returns every a element with the class title-product. Now, how can I pick out of that result the element whose data-product-id is my number, 002212?
I can't convert links to a String.
I also tried something like this:
if(links.contains("data-product-id=\"002212\"")){
System.out.println("it works");
} else {
System.out.println("nothing");
}
But links.contains always returns false, even though this number is there.
I also tried this:
String linktext = a.attr("data-product-id");
It works, but I only get the first element, with for example the number 002211 instead of 002212.
And this gives null:
String linktext = a.attr("data-product-id=\"002212\"");
Solved. The line below did it:
Elements links = document.select("a[data-product-id=\"002212\"]");
I came up with something like this, which didn't work out. I am trying to extract only the text that contains the keyword, not the entire text of the web page just because the page contains that keyword somewhere.
String pconcat = "";
for (int i = 0; i < urls.length; i++) {
    Document doc = Jsoup.connect(urls[i]).ignoreContentType(true).timeout(60 * 1000).get();
    for (int x = 0; x < keyWords.length; x++) {
        if (doc.body().text().toLowerCase().contains(keyWords[x].toLowerCase())) {
            Elements e = doc.select("body:contains(" + keyWords[x] + ")");
            for (Element element : e) {
                pconcat += element.text();
                System.out.println("pconcat" + pconcat);
            }
        }
    }
}
Consider example.com: if the keyword I look for is "documents", I need the output to be "This domain is established to be used for illustrative examples in documents." and nothing else.
You don't need to lowercase the body text in order to use the :contains selector; it is case insensitive. The Jsoup selector documentation describes :contains(text) as:
"elements that contains the specified text. The search is case insensitive. The text may appear in the found element, or any of its descendants."
select() only returns elements if it finds a match. Its documented return value is:
"elements that match the query (empty if none match)"
You don't need an if statement to check for "documents". Just use a CSS selector to select any element that matches, then do something with the results.
Document doc = Jsoup
        .connect(url)
        .ignoreContentType(true)
        .timeout(60 * 1000)
        .get();
for (String keyword : keywords) {
    String selector = String.format(
            "p:contains(%s)",
            keyword.toLowerCase());
    String content = doc
            .select(selector)
            .text();
    System.out.println(content);
}
Output:
This domain is established to be used for illustrative examples in
documents. You may use this domain in examples without prior
coordination or asking for permission.
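The output above is the whole matching paragraph, while the question asks for only the sentence that contains the keyword. One possible way to narrow it down, sketched here in plain Java, is to split the paragraph text into sentences and keep the ones that contain the keyword. The class SentenceFilter and the method sentencesContaining are hypothetical names, and the sentence split is a naive regex on ., ! and ? followed by whitespace, so it will misjudge abbreviations such as "e.g.".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceFilter {
    // Hypothetical helper: returns the sentences of `text` that contain
    // `keyword`, compared case-insensitively.
    static List<String> sentencesContaining(String text, String keyword) {
        List<String> matches = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        // Naive split: break after '.', '!' or '?' when whitespace follows.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String trimmed = sentence.trim();
            if (trimmed.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(trimmed);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // In the answer above, this string would come from doc.select(selector).text().
        String paragraph = "This domain is established to be used for illustrative examples in documents. "
                + "You may use this domain in examples without prior coordination or asking for permission.";
        for (String sentence : sentencesContaining(paragraph, "documents")) {
            System.out.println(sentence);
        }
    }
}
```

For the example.com paragraph this prints only the first sentence, which is the output the question asks for.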
I am using Jsoup to parse some HTML, and I would like to read the aria-label attribute value of a specific div. The line I am trying to parse is the following:
<div class="tiny-star star-rating-non-editable-container" aria-label=" Rated 5 stars out of five stars ">
I have used the following:
Document document = Jsoup.connect(url).get();
Elements stars= document.select("div.tiny-star star-rating-non-editable-container[aria-label]");
String value = stars.text();
System.out.println("The rating is " + value);
However, the String value comes back blank. Why is this?
That selector expression won't give you what you expect. It's treated as a two-part selector:
div.tiny-star - find a div element with class tiny-star
star-rating-non-editable-container[aria-label] - then look for a descendant element named star-rating-non-editable-container that has an aria-label attribute
Try something more like:
// select() returns Elements (a list), not a single Element
Elements divsWithStars = document.select(
        "div.tiny-star.star-rating-non-editable-container[aria-label]");
// attr() on Elements reads the attribute from the first match that has it
String ariaLabel = divsWithStars.attr("aria-label");
Note the dot rather than a space between tiny-star and star-rating-..., and also that select returns the elements that host the aria-label attribute, not the attribute itself. You have to use attr to extract the attribute value.
I am completely lost and confused when using Jsoup to parse this HTML document...
I don't mean to just ask for straight-up code, but if someone has the time or can get me started, that would be great...
Here is the document:
http://radar.weather.gov/ridge/RadarImg/N0R/ILN/
If you view the source, these are the lines I am trying to fetch:
<tr><td valign="top"><img src="/icons/image2.gif" alt="[IMG]"></td><td><a href="ILN_20140112_0021_N0R.gif">ILN_20140112_0021_N0R.gif</a></td><td align="right">12-Jan-2014 00:23 </td><td align="right">2.2K</td><td> </td></tr>
As you will notice, there are many of these. I need the value in
<a href=
I need that value from the first ten of those lines.
As I said, if anyone has the time to help me out, it would be greatly appreciated!
First you need to load the contents of the HTML page into a Document:
String url = "http://radar.weather.gov/ridge/RadarImg/N0R/ILN/";
Document doc = Jsoup.connect(url).get();
Next, select the Elements you want from the Document. The following line selects all <a> elements that have an href attribute and whose text contains the string "gif":
Elements links = doc.select("a[href]:contains(gif)");
Then, to print the value from the first ten, you can just use a loop. The attr() method lets you extract only the value of a given attribute, rather than the complete HTML or its text:
int count = Math.min(10, links.size()); // avoids an IndexOutOfBoundsException when fewer than ten links match
for (int i = 0; i < count; i++) {
    System.out.println(links.get(i).attr("href"));
}
The output is:
ILN_20140112_0221_N0R.gif
ILN_20140112_0227_N0R.gif
ILN_20140112_0232_N0R.gif
ILN_20140112_0237_N0R.gif
ILN_20140112_0242_N0R.gif
ILN_20140112_0248_N0R.gif
ILN_20140112_0253_N0R.gif
ILN_20140112_0258_N0R.gif
ILN_20140112_0303_N0R.gif
ILN_20140112_0308_N0R.gif
This is essentially the basic approach for most of the parsing you will do in Jsoup. You should have a go at extracting some other Elements from the page, using the Jsoup selector documentation for reference.
Try this:
String html = "<tr><td><img src='/icons/image2.gif' alt='[IMG]'></td><td><a href='ILN_20140112_0021_N0R.gif'>ILN_20140112_0021_N0R.gif</a></td><td align='right'>12-Jan-2014 00:23</td><td align='right'>2.2K</td><td> </td></tr>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
// value will be "ILN_20140112_0021_N0R.gif"
String value = link.text();
Edit: Refer to @ashatte's solution instead.
Document doc = Jsoup.parse(
        new URL("http://radar.weather.gov/ridge/RadarImg/N0R/ILN/"),
        3000); // or whatever your link is; 3000 is the timeout in milliseconds
int rowIndex = 0; // counter used to ignore the top 2 rows
for (Element item : doc.select("tr")) {
    // selects the <tr> elements, so item is a single <tr>
    if (rowIndex > 1) {
        Element link = item.select("a").first(); // first <a> in the row, or null
        if (link != null && link.hasAttr("href")) {
            String href = link.attr("href"); // href attribute of the selected <a>
            System.out.println(href);
        }
    }
    rowIndex++;
}
This is just one way to do it among many. I strongly suggest you read through the Jsoup cookbook.