Extracting user details from facebook page

I am extracting details from a page that I administer. I tried using jsoup to extract the links and then get the user names from them, but it's not working: it only returns links other than the user links. Extracting names from this link
https://www.facebook.com/plugins/fan.php?connections=100&id=pageid
works quite well, but the same approach does not work for this link:
https://www.facebook.com/browse/?type=page_fans&page_id=
Can anyone help? Below is the code I tried.
doc = Jsoup.connect("https://www.facebook.com/browse/?type=page_fans&page_id=mypageid").get();
Elements els = doc.getElementsByClass("fsl fwb fcb");
Elements link = doc.select("a[href]");
for (Element ele : link) {
    System.out.println(ele.attr("href"));
}

Try this:
Document doc = Jsoup.connect("https://www.facebook.com/plugins/fan.php?connections=100&id=pageid").timeout(0).get();
Elements nameLinks = doc.getElementsByClass("link");
for (Element users : nameLinks) {
String name = users.attr("title");
String url = users.attr("href");
System.out.println(name + "-" + url);
}
This prints the name and URL of every user listed on the first link in your question.

Related

Scan complete page with JSOUP

I would like to parse a complete page with jsoup. In the past I have parsed parts of a page, but not a complete page. Here is the code for parts:
String url = "http://www.billboard.com/charts/artist-100";
doc = Jsoup.connect(url).get();
Elements names = doc.select("div.chart-row__title > h2.chart-row__song");
for (Element p : names) {
    Names.add(p.text()); // text() already returns a String
}

Finding a specific file on a site using jsoup

I'm trying to create a little program that updates a World of Warcraft addon for me. I'm using jsoup to get a list of links on a specific site. How do I ignore files/links that don't end in .zip?
This is my link list so far. As you can see, it prints a list of all the links on the site. The goal is to find only the .zip files (there are only two) and then download one of them. The direct download link changes every time they update the addon, so I can't just download a fixed link. I need to find the latest version every time.
public static void LinkList() {
    Document doc;
    try {
        doc = Jsoup.connect("http://www.tukui.org/dl.php").get();
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            System.out.println("\nlink : " + link.attr("href"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

You can use the [attr$=value] selector to check whether an attribute value ends with value:
Elements links = doc.select("a[href$=zip]");
Demo:
Document doc = Jsoup.connect("http://www.tukui.org/dl.php").get();
Elements links = doc.select("a[href$=zip]");
List<String> list = new ArrayList<>();
for (Element link : links) {
System.out.println("link : " + link.attr("href"));
list.add(link.attr("href"));
}
String[] arr = list.toArray(new String[list.size()]);
System.out.println("array content:" + Arrays.toString(arr));
Output:
link : http://www.tukui.org/downloads/tukui-15.79.zip
link : http://www.tukui.org/downloads/elvui-6.82.zip
link : /client/win/tc2430.zip
array content:[http://www.tukui.org/downloads/tukui-15.79.zip, http://www.tukui.org/downloads/elvui-6.82.zip, /client/win/tc2430.zip]
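
Since the output above still mixes several .zip links, a follow-up step could narrow the list down to one addon. The following is a minimal sketch in plain Java, not part of jsoup; the class name ZipLinkFilter, the method findAddonZip, and the "elvui-" prefix are hypothetical names chosen for illustration, and it assumes the addon's file name keeps a stable prefix between releases.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class ZipLinkFilter {
    // Returns the first href whose file name starts with addonPrefix
    // and ends with ".zip", compared case-insensitively.
    static Optional<String> findAddonZip(List<String> hrefs, String addonPrefix) {
        String prefix = addonPrefix.toLowerCase(Locale.ROOT);
        return hrefs.stream()
                .filter(href -> {
                    // Take the part after the last '/', or the whole string if there is none.
                    String name = href.substring(href.lastIndexOf('/') + 1)
                                      .toLowerCase(Locale.ROOT);
                    return name.startsWith(prefix) && name.endsWith(".zip");
                })
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample hrefs copied from the output shown above.
        List<String> hrefs = List.of(
                "http://www.tukui.org/downloads/tukui-15.79.zip",
                "http://www.tukui.org/downloads/elvui-6.82.zip",
                "/client/win/tc2430.zip");
        System.out.println(findAddonZip(hrefs, "elvui-").orElse("not found"));
    }
}
```

Because the match is on the file-name prefix rather than the full URL, the lookup should keep working when the version number in the link changes, as long as the prefix assumption holds.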

Jsoup get href within a class

I have this HTML that I need to parse:
<a class="sushi-restaurant" href="/greatSushi">Best Sushi in town</a>
I know there's a jsoup example that gets all the links in a page, e.g.
Elements links = doc.select("a[href]");
for (Element link : links) {
print(" * a: <%s> (%s)", link.attr("abs:href"),
trim(link.text(), 35));
}
but I need a piece of code that returns the href for that specific class.
Thanks, guys.

You can select elements by class. This example finds elements with the class sushi-restaurant, then gets the absolute URL of the first result.
Make sure that when you parse the HTML, you specify the base URL (where the document was fetched from) to allow jsoup to determine what the absolute URL of a link is.
public static void main(String[] args) {
    String html = "<a class=\"sushi-restaurant\" href=\"/greatSushi\">Best Sushi in town</a>";
    Document doc = Jsoup.parse(html, "http://example.com/");
    // find all <a class="sushi-restaurant">...
    Elements links = doc.select("a.sushi-restaurant");
    Element link = links.first();
    // 'abs:' resolves "/greatSushi" to "http://example.com/greatSushi":
    String url = link.attr("abs:href");
    System.out.println("url = " + url);
}
Shorter version:
String url = doc.select("a.sushi-restaurant").first().attr("abs:href");
Hope this helps!
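
To illustrate what resolving a relative href against a base URL means, here is a small sketch using only java.net.URI from the JDK. It is an illustration of the general idea, not jsoup's own implementation, and jsoup may behave differently in edge cases; the class name AbsoluteUrlSketch, the method resolve, and the "http://example.com/menu/" and "http://other.example/x" URLs are made up for the example.

```java
import java.net.URI;

class AbsoluteUrlSketch {
    // Resolves an href against the URL of the page it was found on.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Root-relative href: keeps the scheme and host of the base.
        System.out.println(resolve("http://example.com/", "/greatSushi"));
        // Path-relative href: appended to the base's directory.
        System.out.println(resolve("http://example.com/menu/", "greatSushi"));
        // Already absolute href: returned unchanged.
        System.out.println(resolve("http://example.com/", "http://other.example/x"));
    }
}
```

Without a base URL there is nothing to resolve a relative href against, which is why the base is passed as the second argument to Jsoup.parse in the answer above.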

Elements links = doc.select("a");
for (Element link : links) {
    String attribute = link.attr("class");
    if (attribute.equalsIgnoreCase("sushi-restaurant")) {
        System.out.println(link.attr("href")); // the href of the matching link
    }
}

How to extract full URLs from all paragraphs in a webpage using jsoup

How do I extract full URLs from all paragraphs on a web page using jsoup? I am only able to extract the relative URLs.
Expected:
http://fr.wikipedia.org/wiki/Husni_al-Zaim
Actual: /Husni_al-Zaim
My Code:
Elements links = doc.select("p");
Elements linkss = links.select("a");
for (Element link : linkss) {
    if (link.text().matches("^[A-Z].+")) {
        list.add(new NamedLink(link.attr("href"), link.text()));
    }
}

Use .absUrl("href") instead of .attr("href"). This only works when you get the document from a webpage or parse the full file from disk (and thus do not massage portions from HTML to text and back as in your example).
Document document = Jsoup.connect("http://stackoverflow.com").get();
Elements paragraphLinks = document.select("p a");
for (Element paragraphLink : paragraphLinks) {
String absUrl = paragraphLink.absUrl("href");
// ...
}

How to extract absolute URL from relative HTML links using Jsoup?

I am using Jsoup to extract the URLs of a webpage. The href attributes of those links are relative, like:
example
Here is my attempt:
Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
String url = dl.select("a").attr("href");
}
This works fine, but if I use
String url = dl.select("a").attr("abs:href");
to get the absolute URL like http://example.com/text, it is not working. How can I get the absolute URL?

You need Element#absUrl().
String url = dl.select("a").absUrl("href");
You can also shorten the select:
Document document = Jsoup.connect(url).get();
Elements links = document.select("div.results dl a");
for (Element link : links) {
String url = link.absUrl("href");
}

String url = dl.select("a").absUrl("href");
is not correct, because dl.select("a") returns a collection (Elements), not a single element.
You need to get an element by index, e.g.:
Elements elems = dl.select("a");
Element a1 = elems.get(0); // 0 is the index of the first element; valid indexes run up to elems.size() - 1
Now you can call
a1.absUrl("href");
If you are sure the select above yields only one element, or that the element you want is the first, you can write:
String url = dl.select("a").get(0).absUrl("href");
which is equivalent, when the result is non-empty, to
String url = dl.select("a").first().absUrl("href");
It doesn't have to be the first element; you can replace the 0 in
String url = dl.select("a").get(0).absUrl("href");
with the index of the element you want.
Alternatively, use a more specific selector that matches only one element.
