I am extracting details from a page which I'm administering. I tried using jsoup to extract the links then from that extract names of users but it's not working. It only shows links other than user links. I tried extracting names from this link
https://www.facebook.com/plugins/fan.php?connections=100&id=pageid
which is working quite well but does not works for this link
https://www.facebook.com/browse/?type=page_fans&page_id=
Can anyone help me...Below is the code which I tried.
doc = Jsoup.connect("https://www.facebook.com/browse/?type=page_fans&page_id=mypageid").get();
Elements els = doc.getElementsByClass("fsl fwb fcb");
Elements link = doc.select("a[href]");
for(Element ele : link)
{
system.out.println(ele.attr("href"));
} }
Try This
Document doc = Jsoup.connect("https://www.facebook.com/plugins/fan.php?connections=100&id=pageid").timeout(0).get();
Elements nameLinks = doc.getElementsByClass("link");
for (Element users : nameLinks) {
String name = users.attr("title");
String url = users.attr("href");
System.out.println(name + "-" + url);
}
It will give all the users name and URl present on the first link defined in your question.
Related
I would like to parse a complete page with JSOUP .I have done in the past, parts of a page, but not a complete page. Here is for parts:
String url = "http://www.billboard.com/charts/artist-100";
doc = Jsoup.connect(url).get();
Elements names = doc.select("div.chart-row__title > h2.chart-row__song");
for (Element p : names)
Names.add(p.text().toString());
So i'm trying to create a little program that updates a World of Warcraft addon for me. Im using jsoup to get a list of links on a specific site. How do I ignore files/links that don't end in .zip?
This is my link list so far, as you can see it will print a list of all the links on the site. The goal is to only find .zip files (there are only two). And then download one of them. Direct link to download changes every time they update the addon, so I can't just download a specific link. I need to find the latest version every time.
public static void LinkList() {
Document doc;
try {
doc = Jsoup.connect("http://www.tukui.org/dl.php").get();
Elements links = doc.select("a[href]");
for (Element link : links) {
System.out.println("\nlink : " + link.attr("href"));
}
} catch (IOException e) {
e.printStackTrace();
}
}
You can use [attr$=value] selector to checks if attribute ends with value
Elements links = doc.select("a[href$=zip]");
Demo:
Document doc = Jsoup.connect("http://www.tukui.org/dl.php").get();
Elements links = doc.select("a[href$=zip]");
List<String> list = new ArrayList<>();
for (Element link : links) {
System.out.println("link : " + link.attr("href"));
list.add(link.attr("href"));
}
String[] arr = list.toArray(new String[list.size()]);
System.out.println("array content:" + Arrays.toString(arr));
Output:
link : http://www.tukui.org/downloads/tukui-15.79.zip
link : http://www.tukui.org/downloads/elvui-6.82.zip
link : /client/win/tc2430.zip
array content:[http://www.tukui.org/downloads/tukui-15.79.zip, http://www.tukui.org/downloads/elvui-6.82.zip, /client/win/tc2430.zip]
I have this html code that I need to parse
<a class="sushi-restaurant" href="/greatSushi">Best Sushi in town</a>
I know there's an example for jsoup that you can get all links in a page,e.g.
Elements links = doc.select("a[href]");
for (Element link : links) {
print(" * a: <%s> (%s)", link.attr("abs:href"),
trim(link.text(), 35));
}
but I need a piece of code that can return me the href for that specific class.
Thanks guys
You can select elements by class. This example finds elements with the class sushi-restaurant, then gets the absolute URL of the first result.
Make sure that when you parse the HTML, you specify the base URL (where the document was fetched from) to allow jsoup to determine what the absolute URL of a link is.
public static void main(String[] args) {
String html = "<a class=\"sushi-restaurant\" href=\"/greatSushi\">Best Sushi in town</a>";
Document doc = Jsoup.parse(html, "http://example.com/");
// find all <a class="sushi-restaurant">...
Elements links = doc.select("a.sushi-restaurant");
Element link = links.first();
// 'abs:' makes "/greatsushi" = "http://example.com/greatsushi":
String url = link.attr("abs:href");
System.out.println("url = " + url);
}
Shorter version:
String url = doc.select("a.sushi-restaurant").first().attr("abs:href");
Hope this helps!
Elements links = doc.select("a");
for (Element link : links) {
String attribute=link.attr("class");
if(attribute.equalsIgnoreCase("sushi-place")){
print link.href//You probably need this
}
}
How do I extract full URL's from all paragraphs on a web page using jsoup? I am able to extract only the relative URL's.
Expected:
http://fr.wikipedia.org/wiki/Husni_al-Zaim
Actual: /Husni_al-Zaim
My Code:
Elements links = doc.select("p");
Elements linkss = links.select("a");
for (Element link : linkss) {
if (link.text().matches("^[A-Z].+") == true) {
list.add(new NamedLink(link.attr("href"), link.text()));
}
}
Use .absUrl("href") instead of .attr("href"). This only works when you get the document from a webpage or parse the full file from disk (and thus do not massage portions from HTML to text and back as in your example).
Document document = Jsoup.connect("http://stackoverflow.com").get();
Elements paragraphLinks = document.select("p a");
for (Element paragraphLink : paragraphLinks) {
String absUrl = paragraphLink.absUrl("href");
// ...
}
I am using Jsoup to extract URL of an webpage. The href attribute of those URL's are relative like:
example
Here is my attempt:
Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
String url = dl.select("a").attr("href");
}
This works fine, but if I use
String url = dl.select("a").attr("abs:href");
to get the absolute URL like http://example.com/text, it is not working. How can I get the absolute URL?
You need Element#absUrl().
String url = dl.select("a").absUrl("href");
You can by the way shorten the select:
Document document = Jsoup.connect(url).get();
Elements links = document.select("div.results dl a");
for (Element link : links) {
String url = link.absUrl("href");
}
String url = dl.select("a").absUrl("href");
Is not correct because dl.select("a") will not return a single item but a collection.
You need to get elements by index
eg :
Elements elems = dl.select("a");
Element a1 = elems.get(0); //0 is the index first element increasing to (elems.size()-1)
now you can do
a1.absUrl("href");
If you are sure only one item will result from the select above, or that the item you want will be the first, you can:
String url = dl.select("a").get(0).absUrl("href");
Which is also same as
String url = dl.select("a").first().absUrl("href");
It doesn't have to be the first element anyway, you can always replace the 0 in
String url = dl.select("a").get(0).absUrl("href"); with the index of your element.
Or use a select that is more specific that will only result in one element.