jsoup extract specific attribute from a hyperlink - java

I have some hyperlinks in a web page, and I want to extract the title attribute of each one.
I tried
select("a[href]").attr("title")
but I get nothing.
Edit
The complete div here
Trial code:
Elements es = doc.select("div.mini-placard");
for (Element e : es) {
    System.out.println(e.select("span.align-image-vertically").select("a").attr("title"));
}
No output!

Select the link element first, then read its attributes, as below:
String html = "<p>An <a href='http://example.com/' title='hi'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkTitle = link.attr("title"); // "hi"
Courtesy
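To read the title of every link rather than only the first one, a minimal sketch along the following lines may help. The markup is hypothetical (the asker's real div is not shown), and the class and method names are made up for illustration. Note that `Elements.attr` returns an empty string when nothing in the selection has the attribute, which could explain output that looks blank.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.List;

public class LinkTitles {
    // Returns the title of every <a> element that actually has a title attribute.
    public static List<String> linkTitles(String html) {
        Document doc = Jsoup.parse(html);
        // a[title] skips links without a title; eachAttr collects one value per match.
        return doc.select("a[title]").eachAttr("title");
    }

    public static void main(String[] args) {
        // Hypothetical markup, not the asker's actual page.
        String html = "<div class='mini-placard'>"
                + "<a href='/one' title='First'>one</a>"
                + "<a href='/two'>two</a>"
                + "<a href='/three' title='Third'>three</a>"
                + "</div>";
        System.out.println(linkTitles(html)); // [First, Third]
    }
}
```

If this still yields an empty list on the real page, it is worth printing `doc.html()` to check whether the links are present in the fetched markup at all.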

Related

jsoup: parse data of certain tag which is just after a particular tag

I have been trying to parse some information with jsoup in Java for the last 3 days -_-. This is my code:
Document document = Jsoup.connect(urlofpage).get();
Elements links = document.select(".contentBox");
for (Element link : links) {
    // String name = link.text();
    String title = link.select("h2").text();
    String content = link.select("p").text();
    System.out.println(title);
    System.out.println(content);
}
It fetches the data as directed, with the h2 text and the p text separated, but the problem is that I want the text of the <p> tags that come right after each <h2> tag.
For example (HTML content):
<h2>main content</h2>
<div class="acx"><div>
<p>content</p>
<p>content 2</p>
<h2>content 2</h2>
<div class="acx"><div>
<p>new content od 2</p>
<p>new 2</p>
The result should look like this (as an array):
array[0] = "content content 2",
array[1] = "new content od 2 new 2",
Any solutions?
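You can experiment with the "~" sibling combinator, which matches every p that comes after an h2 under the same parent. For example, if the p elements are siblings of the h2 elements:
link.select("h2 ~ p").get(0).text(); // returns "content"
link.select("h2 ~ p").get(1).text(); // returns "content 2"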
Keep your initial approach and iterate over the required tags inside each selected .contentBox element:
Document document = Jsoup.connect(urlofpage).get();
Elements links = document.select(".contentBox");
for (Element link : links) {
    for (Element h2Tag : link.select("h2")) {
        System.out.println(h2Tag.text());
    }
    for (Element pTag : link.select("p")) {
        System.out.println(pTag.text());
    }
}
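To group the paragraph text under each heading, as the question asks, one possible sketch is to walk the siblings that follow each h2 until the next h2 is reached. It assumes the p elements are siblings of the h2 elements (that is, that the `acx` div in the sample is closed before the paragraphs), and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ParagraphsAfterHeading {
    // For each h2 inside .contentBox, joins the text of the p siblings
    // that follow it, stopping at the next h2.
    public static List<String> paragraphsPerHeading(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element h2 : doc.select(".contentBox h2")) {
            StringBuilder text = new StringBuilder();
            Element sibling = h2.nextElementSibling();
            while (sibling != null && !sibling.tagName().equals("h2")) {
                if (sibling.tagName().equals("p")) {
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(sibling.text());
                }
                sibling = sibling.nextElementSibling();
            }
            result.add(text.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed structure: the acx div is closed, so h2 and p are siblings.
        String html = "<div class='contentBox'>"
                + "<h2>main content</h2><div class='acx'></div>"
                + "<p>content</p><p>content 2</p>"
                + "<h2>content 2</h2><div class='acx'></div>"
                + "<p>new content od 2</p><p>new 2</p>"
                + "</div>";
        System.out.println(paragraphsPerHeading(html));
        // [content content 2, new content od 2 new 2]
    }
}
```

If the paragraphs are actually nested inside the div that follows each heading, the loop would need to descend into that div instead of walking siblings.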

java jsoup parse how to parse html

Is there any possible way to parse
Huhi
in html:
Huhi
White
Angle
Output:
Huhi
White
Angle
Create your document, select all the a[href] links, then iterate through them and read the text each one contains, like so:
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    String text = link.text();
}
Just select the a elements, iterate over them, and print their text:
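// The href values are placeholders; the original markup is not shown.
String html = "<a href=\"#\">Huhi</a>\n" +
    "<a href=\"#\">White</a>\n" +
    "<a href=\"#\">Angle</a>";
Document doc = Jsoup.parse(html);
Elements links = doc.select("a");
for (Element link : links) {
    System.out.println(link.text());
}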
For further reference, see the jsoup selector-syntax documentation.

How to extract those elements in Jsoup

I want to extract the "Abstract" and the "Title" as shown in the photo below. However, I can't extract the title, and my attempt to extract "Abstract" as a tag didn't work.
String html = "http://example.com/";
Document doc = Jsoup.parse(html);
Element link = doc.select("Abstract").first();
Try this:
Element title = doc.select("FONT[size=+1]").first();
Element abstractParagraph = doc.select("CENTER:has(b:containsOwn(Abstract)) + p").first();
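One thing that may also be worth checking in the question's snippet: `Jsoup.parse(String)` treats its argument as HTML markup, not as a URL to fetch, and `select("Abstract")` looks for a tag named Abstract rather than for text. The sketch below illustrates both points. The class name and the sample markup are hypothetical, since the real page is not shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseVersusConnect {
    // Parses the input as markup and returns the visible body text.
    public static String bodyTextOf(String input) {
        return Jsoup.parse(input).body().text();
    }

    public static void main(String[] args) {
        String notMarkup = "http://example.com/";

        // The URL string is parsed as plain text; no page is downloaded.
        System.out.println(bodyTextOf(notMarkup)); // http://example.com/

        // "Abstract" is read as a tag name, so nothing matches.
        Document doc = Jsoup.parse(notMarkup);
        System.out.println(doc.select("Abstract").isEmpty()); // true

        // Hypothetical markup: matching by text needs a pseudo-selector.
        String markup = "<center><b>Abstract</b></center><p>Sample abstract text.</p>";
        System.out.println(Jsoup.parse(markup).select("b:containsOwn(Abstract)").size()); // 1
    }
}
```

To work on the live page, the document would first have to be fetched, for example with `Jsoup.connect(url).get()`, before any selector is applied.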

Parse the inner html tags using jSoup

I want to find the important links in a site using the Jsoup library. Suppose we have the following markup:
<h1>This is important </h1>
Now, while parsing, how can we find out that the a tag is inside the h1 tag?
You can do it this way:
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements headlinesCat1 = doc.getElementsByTag("h1");
for (Element headline : headlinesCat1) {
    Elements importantLinks = headline.getElementsByTag("a");
    for (Element link : importantLinks) {
        String linkHref = link.attr("href");
        String linkText = link.text();
        System.out.println(linkHref);
    }
}
Taken from the JSoup Cookbook.
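Or use a selector ("h1 > a" matches only links that are direct children of the h1, while "h1 a" would match links at any depth inside it):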
Elements elements = doc.select("h1 > a");
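The difference between the child combinator and the descendant combinator can be seen in a small sketch like the one below. The markup, class name, and method name are made up for illustration.

```java
import org.jsoup.Jsoup;

import java.util.List;

public class HeadingLinks {
    // Returns the href of every element matched by the given CSS query.
    public static List<String> hrefs(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).eachAttr("href");
    }

    public static void main(String[] args) {
        // Hypothetical markup: one link directly in the h1, one nested in <em>,
        // and one outside the h1 altogether.
        String html = "<h1><a href='/direct'>Direct</a> "
                + "<em><a href='/nested'>Nested</a></em></h1>"
                + "<p><a href='/other'>Other</a></p>";

        System.out.println(hrefs(html, "h1 > a")); // [/direct]
        System.out.println(hrefs(html, "h1 a"));   // [/direct, /nested]
    }
}
```

The descendant form behaves like the nested `getElementsByTag` loops, which also find links at any depth inside the heading.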

Jsoup get href within a class

I have this html code that I need to parse
<a class="sushi-restaurant" href="/greatSushi">Best Sushi in town</a>
I know there's a jsoup example that gets all links in a page, e.g.
Elements links = doc.select("a[href]");
for (Element link : links) {
    print(" * a: <%s> (%s)", link.attr("abs:href"),
        trim(link.text(), 35));
}
but I need a piece of code that returns the href for that specific class.
Thanks, guys.
You can select elements by class. This example finds elements with the class sushi-restaurant, then gets the absolute URL of the first result.
Make sure that when you parse the HTML, you specify the base URL (where the document was fetched from) to allow jsoup to determine what the absolute URL of a link is.
public static void main(String[] args) {
    String html = "<a class=\"sushi-restaurant\" href=\"/greatSushi\">Best Sushi in town</a>";
    Document doc = Jsoup.parse(html, "http://example.com/");
    // find all <a class="sushi-restaurant">...
    Elements links = doc.select("a.sushi-restaurant");
    Element link = links.first();
    // 'abs:' resolves "/greatSushi" to "http://example.com/greatSushi":
    String url = link.attr("abs:href");
    System.out.println("url = " + url);
}
Shorter version:
String url = doc.select("a.sushi-restaurant").first().attr("abs:href");
Hope this helps!
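Alternatively, loop over all links and check the class attribute yourself:
Elements links = doc.select("a");
for (Element link : links) {
    String attribute = link.attr("class");
    if (attribute.equalsIgnoreCase("sushi-restaurant")) {
        System.out.println(link.attr("href")); // this is probably what you need
    }
}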
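Because `first()` returns null when nothing matches, the one-line version above can throw a NullPointerException on pages without such a link. A null-safe variant might look like the sketch below, assuming a jsoup version that provides `selectFirst`. The class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ClassHref {
    // Returns the absolute href of the first element matching cssQuery,
    // or an empty string when nothing matches.
    public static String absoluteHref(String html, String baseUri, String cssQuery) {
        // The base URI lets jsoup resolve relative links such as "/greatSushi".
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst(cssQuery); // null when there is no match
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a class=\"sushi-restaurant\" href=\"/greatSushi\">Best Sushi in town</a>";
        System.out.println(absoluteHref(html, "http://example.com/", "a.sushi-restaurant"));
        // http://example.com/greatSushi
        System.out.println(absoluteHref(html, "http://example.com/", "a.ramen-bar").isEmpty());
        // true
    }
}
```

Selecting with `a.sushi-restaurant` also matches links that carry additional classes, which a string comparison on the whole class attribute would miss.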
