Parse a website with jSoup - java

I am trying to parse a website, specifically this one. It does not provide an API for that, as it does for BF4 and other titles, but the owner said that I should just parse the data.
The problem is that jSoup retrieves the page, but if you look carefully, the website then makes a second HTTP GET request, and the search completes only after that.
From what I could gather, I think it also sends some parameters in the header.
If I just call that with jSoup, I get some data, but where the search results should be I get the message:
Please activate Javascript to see the search results.
Is there a way to get the data? I really need this; any help is very much appreciated.
Please help

jSoup does not execute JavaScript, so you need a JavaScript-capable client, e.g., HtmlUnit or Selenium.

Related

Java - use searchbar on given website

Let me just start by saying that this is a soft question.
I am rather new to application development, which is why I'm asking a question without presenting any actual code. I know the basics of Java coding, and I was wondering if anyone could enlighten me on the following topic:
Say I have an external website, Craigslist, or some other site that allows me to search through products/services/results manually by typing a query into a search box somewhere on the page. The trouble is that there is no API for this site that I can use.
However I do know that http://sfbay.craigslist.org/search/sss?query=QUERYHERE&sort=rel points me to a list of results, where QUERYHERE is replaced by what I'm looking for.
What I'm wondering here is: is it possible to store these results in an Array (or List or some form of Collection) in Java?
Is there perhaps some library or external tool that can allow me to specify a query to search for, have it paste it in to a search-link, perform the search, and fill an Array with the results?
Or is what I am describing impossible without an API?
That depends. If the website can return the results as XML or JSON (often by appending .xml or .json to the URL), you can parse them easily: use the DOM parser for XML in Java, or download a JSON library to parse JSON.
Otherwise you will receive the HTML page that a user would see in a browser. You can try to parse it as XML, but it will take a lot of work to map all the fields in the HTML to the list you want.
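As a minimal sketch of the XML case: assuming the site returned a document whose results sit in <item> elements with a <title> child (a hypothetical structure for illustration, not the actual format of Craigslist or any other site), the JDK's DOM parser could collect the titles into a List like this:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlResultParser {

    // Collects the text of the first <title> inside every <item> element.
    // The element names "item" and "title" are assumptions for illustration.
    static List<String> parseTitles(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList titleNodes = item.getElementsByTagName("title");
                if (titleNodes.getLength() > 0) {
                    titles.add(titleNodes.item(0).getTextContent().trim());
                }
            }
            return titles;
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here
            throw new IllegalArgumentException("Could not parse XML response", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a real response
        String sample = "<results><item><title>Bike</title></item>"
                + "<item><title>Desk</title></item></results>";
        System.out.println(parseTitles(sample)); // prints [Bike, Desk]
    }
}
```

The same loop works for any other child element; only the tag names change. For plain HTML pages a dedicated HTML parser is usually less work than forcing the page through an XML parser.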

Get data from specific webpage

I need to get data from the table with id="maintable" on the website "http://trackinn.az/GeoLoc/reports.aspx?login=ho&password=ho". I tried many methods; some of them worked for other websites, but not for this one. Could someone help, please?
Thanks.
You cannot get the data from that table with a plain page fetch. It is created dynamically by JavaScript (AJAX in particular), so an HTML parser alone will not see it; you would need a client that executes JavaScript.
It would be possible to get the data if the page, or the part you want, were static, but it is not.
If you are using this site to get geolocation, use the Google Geocoding API instead; it would be a better option.

Roblox Forum, Web Scraping App (Android) Questions

I've been waiting for an idea and I think I finally have one. I am going to attempt to make an Android app using web scraping that will allow me to navigate and use the forums on Roblox (.com if you really want to look it up) better than I can now. Not only are the forums pretty bad in general, but they are even worse on my Android device (Samsung Galaxy Player). Can anyone give me any pointers or advice? I'm not sure what libraries I should use... This is my first big attempt at coding :)
Oh, obviously I would want to give it a feature to reply to posts, but I'm not sure how login for that type of thing would work...
EDIT: I got the idea from this application: GooglePlay, Github
You should look up how to get the data from the website, and you should also make sure that you understand HTML. You also need a simple way to handle HTML.
Get the page (use the example in the question): Read data from webpage
A bit about html: http://www.w3schools.com/html/default.asp
Handle html: http://jsoup.org/cookbook/extracting-data/dom-navigation
To log in, send a POST request with the login information to the standard login page, then keep the cookies that were generated and pass them along with your other requests.
A little about handling cookies: Java: Handling cookies when logging in with POST
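The login step could be sketched roughly as follows, using HttpURLConnection together with a default CookieManager so that cookies set by the login response are sent back on later requests. The login URL and the form field names passed in are hypothetical; the real ones have to be read from the site's login form.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class ForumLogin {

    // Installs a process-wide cookie store; HttpURLConnection then saves
    // cookies from responses and attaches them to later requests.
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // Encodes form fields as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    // POSTs the form fields to loginUrl and returns the HTTP status code.
    // Both loginUrl and the field names in the map are assumptions.
    static int login(String loginUrl, Map<String, String> fields) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formEncode(fields).getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

Call enableCookies() once before login(...); any request made afterwards through HttpURLConnection in the same process should carry the session cookies. Many login forms also include hidden fields (for example a token), which would have to be fetched and added to the map first.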
Some things you also might want to think about:
Linear or branched view of posts in the forum?
Should you get a message if someone posts a new post?
Your own search function?
Signature?
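You can use the jsoup library for Java; it makes parsing HTML data easy. Example (the doc object holds the complete web page):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
// Fetch and parse the complete page at url (a String holding the page address)
Document doc = Jsoup.connect(url).get();
// Select every div whose class attribute is exactly "abc"
Elements headlinesCat1 = doc.select("div[class=abc]");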

The combination of HTTP POST and GET requests + Javascript calling in JAVA?

Okay, this is going to be hard to explain but here goes nothing:
Lately I've been working a lot with POST and GET requests, but now I want to send a POST/GET request to this site called: http://www.mangareader.net/
The main problem I'm facing is that I want to use the search function of this site. Normally I would send a GET request or something like that, but apparently this search function doesn't work that way; it works with some kind of JavaScript code? I don't know exactly what it is, but try typing "Elf" in the search bar, and you'll get a drop-down list of all the manga (Japanese comics) with the word "Elf" in them. I want to know what this process is called, and how I can implement it in a Java program. For instance:
Logging in to a website
-> Send an HTTP POST request. Get HTML data back. Process the HTML data. Get the information I need from the HTML source.
Using a search function on a regular site like google.com or bing.com
-> Send a GET request. Get HTML data back. Process the HTML data. Get the information I need from the HTML source.
Using the search function on mangareader.net
-> ??????????
How would I achieve this? A theoretic explanation is enough, but a practical example would be great as well.
If you analyse the requests that the JavaScript makes when you search, you see the following:
GET http://www.mangareader.net/actions/search/?q=test&limit=100 [HTTP/1.1 200 OK 113ms]
In other words, you can search on the site by a GET-request to
http://www.mangareader.net/actions/search/?q=test&limit=100
where the q parameter contains your search word.
This site uses an AJAX call to get a | (pipe symbol) separated list from the page
/actions/search?q=term
It parses this list using string split and then turns it into a combo box.
I have little experience with Java, but a simple GET request to this page should work.
Replace {term} with your search term:
http://www.mangareader.net/actions/search/?q={term}&limit=100
You can use the Chrome network monitor to see it for yourself.
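Once the response body has been fetched, splitting it in Java might look like the sketch below. It assumes one result per line with fields separated by |; the exact layout of the real response is not documented here and should be checked in the network monitor first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipeListParser {

    // Splits a response body into rows (one per line) and each row into
    // fields on the pipe symbol. The one-result-per-line layout is an assumption.
    static List<List<String>> parse(String body) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : body.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // String.split takes a regex, so the pipe must be escaped;
            // the limit -1 keeps trailing empty fields.
            rows.add(Arrays.asList(line.split("\\|", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data, not an actual response from the site
        String sample = "Title One|/title-one\nTitle Two|/title-two\n";
        for (List<String> row : parse(sample)) {
            System.out.println(row.get(0) + " -> " + row.get(1));
        }
    }
}
```

The escaping matters: an unescaped "|" is the regex alternation operator and would split the line into single characters.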

How do I send a query to a website and parse the results?

I want to do some development in Java. I'd like to be able to access a website, say for example
www.chipotle.com
On the top right, they have a place where you can enter in your zip code and it will give you all of the nearest locations. The program will just have an empty box for user input for their zip code, and it will query the actual chipotle server to retrieve the nearest locations. How do I do that, and also how is the data I receive stored?
This will probably be a followup question as to what methods I should use to parse the data.
Thanks!
First you need to know the parameters needed to execute the query and the URL which these parameters should be submitted to (the action attribute of the form). With that, your application will have to do an HTTP request to the URL, with your own parameters (possibly only the zip code). Finally parse the answer.
This can be done with standard Java API classes, but it won't be very robust. A better solution would be HttpClient. Here are some examples.
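For illustration, here is a minimal sketch of the request step using the JDK's built-in java.net.http.HttpClient (Java 11 and later), which is a different class from the third-party HttpClient library the answer most likely means. The base URL and the parameter name "zip" are hypothetical; the real ones come from the form's action attribute and input names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class LocationQuery {

    // Appends the URL-encoded zip code as a query parameter.
    // The parameter name "zip" is an assumption for illustration.
    static URI buildUri(String baseUrl, String zipCode) {
        String encoded = URLEncoder.encode(zipCode, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "?zip=" + encoded);
    }

    // Sends a GET request and returns the response body as a String,
    // ready to be handed to whatever parser fits the returned format.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call such as fetch(buildUri("https://example.com/locations", userInput)) would return the raw response text; encoding the user input first keeps spaces and special characters from breaking the URL.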
"This will probably be a followup question as to what methods I should use to parse the data."
It very much depends on what the website actually returns.
If it returns static HTML, a regular (strict) or permissive HTML parser should be used.
If it returns dynamic HTML (i.e. HTML with embedded Javascript) you may need to use something that evaluates the Javascript as part of the content extraction process.
There may also be a web API designed for programs (like yours) to use. Such an API would typically return the results as XML or JSON so that you don't have to scrape the results out of an HTML document.
Before you go any further you should check the Terms of Service for the site. Do they say anything about what you are proposing to do?
A lot of sites DO NOT WANT people to scrape their content or provide wrappers for their services. For instance, if they get income from ads shown on their site, what you are proposing to do could divert visitors away from their site, resulting in a loss of potential or actual income.
If you don't respect a website's ToS, you could be on the receiving end of lawyers' letters ... or worse. In addition, they could already be using technical means to make life difficult for people who scrape their service.
