Getting a specific value from a webpage from scraped HTML - java

I am currently using Java to scrape the HTML code from this page: http://counter.onlineclock.net/
I want to get the value from the counter, but it is unique to each instance of the webpage; that is, if it's open in different browsers or by different people, it will show a different value.
Because of this, when I scrape the HTML, the value that I am looking for is just blank. I am wondering if there is any way at all for me to get the current value I am looking for.
For example, if I have the counter at 4 I would like to be able to get that value. It does not have to be in java, any language or any way.

JSoup is a great library for scraping data out of a web page. There are a lot of good examples of its usage on the web.

Related

Outputting Search results using Jsoup in java

I'm trying to create a Java program where I can insert a String into a search bar and then record/print out the results.
The site is: http://maple.fm/khroa
I'm fairly new to JSoup. I've spent several hours reading that page's HTML and have come across variables that could be used to insert the String I need and get results, but I'm not sure exactly how to do that. Would someone be able to point me in the right direction?
I think you missed the point of JSOUP.
JSOUP can parse a page that is already loaded - it is not used to interact with a page (as you want). You could use Selenium to interact with the page (http://www.seleniumhq.org/) and then use JSOUP to parse the loaded page's source code.
In this case, the search results all seem to be loaded when the page loads, and the Item Search function only filters the (already existing) results with JavaScript.
There are no absolute links you could use to get the results for a particular search.
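Since the page only filters data it has already loaded, the same filtering can be reproduced locally once the items have been parsed out of the page source. This is a minimal sketch of that idea; the class name, method name and item names are hypothetical, and how the items are actually stored in the page is not covered here.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class ItemFilter {
    // Returns the item names that contain the query, ignoring case,
    // similar to what a client-side search box does on already-loaded data.
    static List<String> filter(List<String> itemNames, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        return itemNames.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical item names standing in for data parsed from the loaded page.
        List<String> items = List.of("Red Potion", "Blue Potion", "Steel Sword");
        System.out.println(filter(items, "potion"));
    }
}
```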

Java - use searchbar on given website

Let me just start by saying that this is a soft question.
I am rather new to application development, which is why I'm asking a question without presenting any actual code. I know the basics of Java coding, and I was wondering if anyone could enlighten me on the following topic:
Say I have an external website, Craigslist, or some other site that allows me to search through products/services/results manually by typing a query into a search box somewhere on the page. The trouble is that there is no API for this site for me to use.
However I do know that http://sfbay.craigslist.org/search/sss?query=QUERYHERE&sort=rel points me to a list of results, where QUERYHERE is replaced by what I'm looking for.
What I'm wondering here is: is it possible to store these results in an Array (or List or some form of Collection) in Java?
Is there perhaps some library or external tool that lets me specify a query to search for, inserts it into a search link, performs the search, and fills an Array with the results?
Or is what I am describing impossible without an API?
It depends. If the website can return the result as XML or JSON (usually by adding .xml or .json at the end of the URL), you can parse it easily: use DOM for XML in Java, or download a JSON library to parse the JSON.
Otherwise you will receive HTML, that is, the page a user would see in a browser. You can try to parse it as XML, but it will take a lot of work to map all the fields in the HTML to the list you want.
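As a rough illustration of the HTML route, the sketch below builds the search URL from the question, downloads the page with the JDK's own HTTP client (Java 11+), and collects result titles into a List. The class and method names are made up for this example, and the "result-title" class attribute is purely an assumption; the real markup has to be checked in the page source.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchScraper {
    // ASSUMPTION: result links look like <a ... class="result-title" ...>text</a>.
    private static final Pattern TITLE =
            Pattern.compile("<a\\b[^>]*class=\"result-title\"[^>]*>([^<]*)</a>");

    // Builds the search URL from the question, URL-encoding the user's query.
    static String buildSearchUrl(String query) {
        return "http://sfbay.craigslist.org/search/sss?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&sort=rel";
    }

    // Downloads the page body as a String (needs network access).
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Collects the text of every link matching the assumed markup.
    static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1).trim());
        }
        return titles;
    }

    public static void main(String[] args) {
        // Offline demo on a hand-written snippet; no request is sent here.
        String sample = "<li><a href=\"/a\" class=\"result-title\">Road bike</a></li>";
        System.out.println(buildSearchUrl("used bike"));
        System.out.println(extractTitles(sample));
    }
}
```

Against the live site the two steps would be chained as extractTitles(fetch(buildSearchUrl("used bike"))). A regular expression like this is fragile; a real HTML parser is usually the more robust choice once the markup gets complicated.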

How to integrate a part of one html website into java program?

Given an HTML website which displays the outside temperature and other unimportant pieces of information:
<div style="">15</div>
15 is the target number, which I want to extract as a variable.
What I want is for the Java program to go to the website, search for the particular line of HTML code (temperature=15;) and, once it is found, display it like this: http://i.stack.imgur.com/lY0qi.jpg
All I want to know is what syntax I should use to make the program request that number.
Extracting information from a website is called crawling or scraping.
You basically go to the web site, get the HTML source and search it for your element. You can search with a regular expression or (more common) with a parser like Jsoup.
You will find a lot of working examples on the official site of Jsoup (e.g. http://jsoup.org/cookbook/extracting-data/example-list-links). Jsoup will parse the HTML source into a DOM-like structure with elements and nodes. You can search for specific nodes, e.g. for all DIV elements. Then you can iterate over them and get your temperature.
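For the regular-expression route, a minimal sketch using only the JDK could look like this. It assumes the temperature is the first div whose entire content is an integer, as in the snippet from the question; the class and method names are invented for the example, and on a real page the pattern would need to be narrowed (for instance by an id or class attribute).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperatureExtractor {
    // Matches a <div> (with any attributes) whose whole content is an integer.
    private static final Pattern DIV_NUMBER =
            Pattern.compile("<div\\b[^>]*>\\s*(-?\\d+)\\s*</div>");

    // Returns the first matching number, or an empty result if none is found.
    static OptionalInt extractTemperature(String html) {
        Matcher m = DIV_NUMBER.matcher(html);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // The HTML would normally come from an HTTP request; here it is inlined.
        String html = "<html><body><div style=\"\">15</div></body></html>";
        OptionalInt temperature = extractTemperature(html);
        if (temperature.isPresent()) {
            System.out.println("Temperature: " + temperature.getAsInt());
        } else {
            System.out.println("Temperature not found");
        }
    }
}
```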
There are tools called scrapers that extract information from the web. There are many Java APIs that let you write your own scraper. You can try JSoup, HtmlUnit or Jaunt.

parsing web page which is changing real time in JAVA

Here's what I want to do. I'm quite a beginner with this, so this may be a lame question, but I want to implement a GUI application in Java which gets data from sports livescore pages,
e.g.
http://www.futbol24.com/Live/
http://livescore.com/
and parses it (somehow) in my app. Then I will be able to store it in, for example, a JTable, save full-time results in a database, play sounds after a goal is scored, and so on.
What is the best way to do this?
It would be almost impossible to parse an HTML document from a live web page and get specific information from it. If you did manage to work out exactly where in the document the data is, the page structure could change at any time. The scores might not even be in the HTML - they could be fetched by Javascript in the page.
I suggest you find an RSS feed of the information you want. Then you'll only have a nice, small piece of XML to parse. That's what it's for.
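Assuming such a feed exists, reading it needs nothing beyond the XML parser that ships with the JDK. The sketch below pulls the title of every item out of an RSS document; the feed content is invented for the example, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RssTitles {
    // Returns the <title> text of every <item> in the given RSS document.
    static List<String> itemTitles(String rssXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList titleNodes = item.getElementsByTagName("title");
                if (titleNodes.getLength() > 0) {
                    titles.add(titleNodes.item(0).getTextContent().trim());
                }
            }
            return titles;
        } catch (Exception e) {
            // Malformed XML and parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample feed; a real one would be downloaded first.
        String feed = "<rss version=\"2.0\"><channel><title>Scores</title>"
                + "<item><title>Team A 1-0 Team B</title></item>"
                + "<item><title>Team C 2-2 Team D</title></item>"
                + "</channel></rss>";
        for (String title : itemTitles(feed)) {
            System.out.println(title);
        }
    }
}
```

The resulting List can then be polled on a timer and compared with the previous run to detect a new goal, or fed into a table model for display.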

Read and Analyze Data on another webpage and insert onto mine

I'm trying to make a simple webpage which obtains football league table data:
http://www.skysports.com/football/league/0,19540,11660,00.html
For example, I want to read in the points column and divide it by the number of games played to get an average points per game column that I will print on my webpage.
How can I do this online?
I'm quite experienced at doing this with offline programs such as C/Matlab, but I don't know where to start with it online.
Thanks
I wouldn't suggest doing it client side (in the browser). It will be easier to scrape on the server side (using Java, for example) with the following steps:
1. Grab the content of the webpage (skysports).
2. Use existing html markup with regex to locate the desired content part.
3. Strip/split html markup with regex to get records (tr) and fields (td).
4. Cast values and do your math.
5. Use results to generate your version of html or json or whatever.
6. Serve the generated content to your client.
In general, scraping is easy but not guaranteed to work tomorrow, as the source html markup may change at any time (and without warning).
I can provide a basic sample in C# if you want. (Sorry, I haven't written Java since 1997.)
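The splitting and math steps above can be sketched in Java roughly as follows. This is only an illustration under a loud assumption: each data row is taken to hold team, games played and points in its first three td cells, which is not the real Sky Sports layout. The class name and the sample table are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PointsPerGame {
    private static final Pattern ROW =
            Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.DOTALL);
    private static final Pattern CELL =
            Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.DOTALL);

    // ASSUMPTION: columns are team, played, points (hypothetical layout).
    static Map<String, Double> averages(String html) {
        Map<String, Double> result = new LinkedHashMap<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            // Split the record into fields and strip any nested markup.
            List<String> fields = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                fields.add(cell.group(1).replaceAll("<[^>]+>", "").trim());
            }
            if (fields.size() < 3) {
                continue; // header rows (th cells) or unrelated rows
            }
            try {
                int played = Integer.parseInt(fields.get(1));
                int points = Integer.parseInt(fields.get(2));
                if (played > 0) {
                    result.put(fields.get(0), (double) points / played);
                }
            } catch (NumberFormatException e) {
                // Row does not contain numbers where expected; skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample standing in for the downloaded page.
        String html = "<table><tr><th>Team</th><th>Pl</th><th>Pts</th></tr>"
                + "<tr><td>Alpha FC</td><td>10</td><td>25</td></tr>"
                + "<tr><td>Beta United</td><td>10</td><td>15</td></tr></table>";
        // Generate a simple HTML row per team for the output page.
        averages(html).forEach((team, avg) ->
                System.out.println("<tr><td>" + team + "</td><td>" + avg + "</td></tr>"));
    }
}
```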
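You can use jQuery.get like this:
$.get('http://www.skysports.com/football/league/0,19540,11660,00.html', function(data) {
    // Do the parsing here. Note that the browser's same-origin policy blocks
    // this cross-domain request unless the target server allows it via CORS.
});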
There are several programming languages capable of getting at this information. PHP would be the classic method, using curl or file_get_contents and regex parsing to extract the bits you want. You could do it with Yahoo Pipes as well if your web host does not allow remote URL retrieval.
If none of the Java brigade come back with something better contact me and I'll do some rough code for you in PHP.
