I'm trying to make a simple webpage which obtains football league table data from
http://www.skysports.com/football/league/0,19540,11660,00.html
For example, I want to read in the points column and divide it by the number of games played to get an average points-per-game column that I will print on my webpage.
How can I do this online?
I'm quite experienced at doing this with offline programs such as C/Matlab, but I don't know where to start with it online.
Thanks
I wouldn't suggest doing it client-side (in the browser). It will be easier to scrape on the server side (using Java, for example) following the steps below (a rough sketch in Java follows the list):
Grab the content of the webpage (skysports)
Use the existing HTML markup with a regex to locate the desired part of the content.
Strip/split the HTML markup with regexes to get records (tr) and fields (td).
Cast the values and do your maths.
Use the results to generate your version of the HTML, or JSON, or whatever.
Serve the generated content to your client.
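Roughly, those steps in Java might look like the sketch below; the regexes and the column positions (games played in the second cell, points in the last cell) are assumptions about Sky Sports' markup and will need checking against the real page:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeagueTableScraper {
    public static void main(String[] args) throws Exception {
        // 1. Grab the content of the page
        String url = "http://www.skysports.com/football/league/0,19540,11660,00.html";
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }

        // 2-3. Split records (tr) and fields (td) with regexes
        Pattern rowPattern = Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.DOTALL);
        Pattern cellPattern = Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL);
        Matcher rows = rowPattern.matcher(html);
        while (rows.find()) {
            List<String> values = new ArrayList<>();
            Matcher cells = cellPattern.matcher(rows.group(1));
            while (cells.find()) {
                values.add(cells.group(1).replaceAll("<[^>]+>", "").trim());
            }
            if (values.size() < 3) continue; // not a data row
            try {
                // 4. Cast values and do the maths: points per game
                //    (assumed positions: played = 2nd cell, points = last cell)
                double played = Double.parseDouble(values.get(1));
                double points = Double.parseDouble(values.get(values.size() - 1));
                if (played == 0) continue;
                System.out.printf("%s: %.2f points per game%n", values.get(0), points / played);
            } catch (NumberFormatException ignore) {
                // header rows etc.
            }
        }
        // 5-6. From here, build your own HTML/JSON and serve it to your client.
    }
}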
In general, scraping is easy but not guaranteed for tomorrow, as the source HTML markup may change at any time (and without warning).
I can provide a basic sample in C# if you want. (Sorry, I haven't written Java since 1997.)
You can use jQuery.get like this:
$.get('http://www.skysports.com/football/league/0,19540,11660,00.html', function(data) {
    // do the parsing here
    // note: a cross-domain request like this will be blocked by the browser's
    // same-origin policy unless the data is proxied through your own server
});
There are several programming languages capable of getting at this information; PHP would be the classic method, using curl or file_get_contents and regex parsing to extract the bits you want. You could do it with Yahoo Pipes as well if your web host does not allow remote URL retrieval.
If none of the Java brigade comes back with something better, contact me and I'll write some rough code for you in PHP.
Related
Let me just start by saying that this is a soft question.
I am rather new to application development, and thus why I'm asking a question without presenting you with any actual code. I know the basics of Java coding, and I was wondering if anyone could enlighten me on the following topic:
Say I have an external website, Craigslist, or some other site that allows me to search through products/services/results manually by typing a query into a search box somewhere on the page. The trouble is that there is no API for this site for me to use.
However, I do know that http://sfbay.craigslist.org/search/sss?query=QUERYHERE&sort=rel points me to a list of results, where QUERYHERE is replaced by what I'm looking for.
What I'm wondering here is: is it possible to store these results in an Array (or List or some form of Collection) in Java?
Is there perhaps some library or external tool that can allow me to specify a query to search for, have it pasted into a search link, perform the search, and fill an Array with the results?
Or is what I am describing impossible without an API?
This depends. If the queried website can return the result as XML or JSON (usually with a .xml or .json at the end of the URL), you can parse it easily with a DOM parser for XML in Java, or download and use a JSON library to parse the JSON.
Otherwise you will receive the HTML of the page that a user would see in a browser; you can try to parse it as XML, but you will have a lot of work mapping all the fields in the HTML to get the list you want.
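As a rough illustration of the JSON branch, here is a sketch using the org.json library (just one choice of JSON library); the endpoint and the "results"/"title" field names are purely hypothetical, since Craigslist does not advertise such an API:

import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

import org.json.JSONArray;
import org.json.JSONObject;

public class JsonSearchExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical JSON endpoint -- replace with whatever the site actually exposes
        String url = "http://example.org/search.json?query=bicycle";
        String body;
        try (InputStream in = new URL(url).openStream();
             Scanner sc = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            body = sc.hasNext() ? sc.next() : "";
        }

        // Collect the result titles into a List, as asked in the question.
        // "results" and "title" are assumed field names for this sketch.
        List<String> titles = new ArrayList<>();
        JSONArray results = new JSONObject(body).getJSONArray("results");
        for (int i = 0; i < results.length(); i++) {
            titles.add(results.getJSONObject(i).getString("title"));
        }
        System.out.println(titles);
    }
}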
Here's what I want to do. I'm quite a beginner with this, so maybe it's a lame question, but I want to implement a GUI application in Java which gets data from sports live-score pages,
e.g.
http://www.futbol24.com/Live/
http://livescore.com/
and parse it (somehow) in my app... and then I will be able to, for example, store it in a JTable, save full-time results in a database, play sounds after a goal is scored, and so on.
What is the best way to do this?
It would be almost impossible to parse an HTML document from a live web page and get specific information from it. If you did manage to work out exactly where in the document the data is, the page structure could change at any time. The scores might not even be in the HTML - they could be fetched by Javascript in the page.
I suggest you find an RSS feed of the information you want. Then you'll only have a nice, small piece of XML to parse. That's what it's for.
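For instance, a minimal sketch of reading an RSS feed with the standard javax.xml.parsers DOM API; the feed URL below is a placeholder for whichever live-score feed you find:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssScoreReader {
    public static void main(String[] args) throws Exception {
        // Placeholder feed URL -- use the real RSS feed of the scores you want
        String feedUrl = "http://example.org/livescores.rss";

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(feedUrl);

        // Each RSS entry is an <item> element with a <title>
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = item.getElementsByTagName("title").item(0).getTextContent();
            System.out.println(title); // e.g. feed this into your JTable model
        }
    }
}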
I want to collect domain names (crawling). I have written a simple Java application that reads an HTML page and saves the code in a text file. Now I want to parse this text in order to collect all domain names without duplicates. But I need the domain names without "http://www.", just domainname.topleveldomain, or possibly domainname.subdomain.topleveldomain, or whatever number of subdomains (then the collected links need to be extracted the same way, collecting the links inside them, until I reach a certain number of links, say 100).
I have asked about this in previous posts (https://stackoverflow.com/questions/11113568/simple-efficient-java-web-crawler-to-extract-hostnames) and searched. JSoup seems like a good solution, but I have not worked with JSoup before, so before going into it deeply I just want to ask: does it achieve what I want to do? Any other suggestions for achieving my simple crawling in a simple way are welcome.
jsoup is a Java library for working with real-world HTML. It provides
a very convenient API for extracting and manipulating data, using the
best of DOM, CSS, and jquery-like methods
So yes, you can connect to a website, extract its HTML, and parse it with jsoup.
The logic for extracting the top-level domain is "your part": you will need to write that code yourself (there is a rough sketch after the links below).
Take a look at the docs for more options...
Use selector-syntax to find elements
Use DOM methods to navigate a document
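A minimal jsoup sketch of the "your part" bit, collecting de-duplicated hostnames from the links on one page; the seed URL is a placeholder, and the limit of 100 mirrors the number mentioned in the question:

import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HostnameCollector {
    public static void main(String[] args) throws Exception {
        String startUrl = "http://example.org/"; // placeholder seed page
        Set<String> hosts = new LinkedHashSet<>(); // a Set removes duplicates for free

        Document doc = Jsoup.connect(startUrl).userAgent("Mozilla/5.0").get();
        for (Element link : doc.select("a[href]")) {
            String href = link.attr("abs:href"); // absolute URL of the link
            try {
                String host = new URI(href).getHost();
                if (host != null) {
                    // strip a leading "www." as asked in the question
                    hosts.add(host.startsWith("www.") ? host.substring(4) : host);
                }
            } catch (Exception ignore) {
                // malformed URLs are skipped
            }
            if (hosts.size() >= 100) break; // stop once we have enough
        }
        hosts.forEach(System.out::println);
    }
}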
How do I print out a document (which is taken from a database or from the current fields of a form) in Java with a customized page size? The most important thing is that I want to customize the page to my requirements (text alignment may also be needed). I am not a Java hard coder. Your help will be a big help to me.
Thanks.
It's not clear what you mean by "(which is taken from a database or from the current fields of a form)". I suggest going through the 2D Graphics tutorial; Printing in Java is described there in detail.
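A minimal sketch of the standard java.awt.print route with a custom paper size; the dimensions, margins and placeholder text are assumptions (sizes are in 1/72-inch points), so adjust them to your requirements:

import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.print.PageFormat;
import java.awt.print.Paper;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;

public class CustomPagePrint implements Printable {
    public int print(Graphics g, PageFormat pf, int pageIndex) throws PrinterException {
        if (pageIndex > 0) return NO_SUCH_PAGE; // single-page example
        Graphics2D g2 = (Graphics2D) g;
        g2.translate(pf.getImageableX(), pf.getImageableY());
        // Placeholder: draw whatever came from your database or form fields
        g2.drawString("Text from the database or form goes here", 20, 20);
        return PAGE_EXISTS;
    }

    public static void main(String[] args) throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        PageFormat pf = job.defaultPage();

        Paper paper = new Paper();
        paper.setSize(420, 595);                       // roughly A5, in 1/72-inch points
        paper.setImageableArea(36, 36, 420 - 72, 595 - 72); // half-inch margins
        pf.setPaper(paper);

        job.setPrintable(new CustomPagePrint(), pf);
        if (job.printDialog()) {
            job.print();
        }
    }
}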
Everywhere I've worked that wanted well-formatted output from a Java back-end, we've deployed Apache FOP (http://xmlgraphics.apache.org/fop/), which allowed us to use XSLT to convert XML to PDF. It works really well, but has a pretty steep learning curve.
I want to do some development in Java. I'd like to be able to access a website, say for example
www.chipotle.com
At the top right, they have a place where you can enter your zip code, and it will give you all of the nearest locations. My program will just have an empty box where the user can enter their zip code, and it will query the actual Chipotle server to retrieve the nearest locations. How do I do that, and also, how is the data I receive stored?
This will probably be a followup question as to what methods I should use to parse the data.
Thanks!
First you need to know the parameters required to execute the query and the URL these parameters should be submitted to (the action attribute of the form). With that, your application will have to make an HTTP request to that URL with your own parameters (possibly only the zip code). Finally, parse the answer.
This can be done with standard Java API classes, but it won't be very robust. A better solution would be HttpClient. Here are some examples.
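A minimal sketch with Apache HttpClient 4.x, assuming a hypothetical store-locator URL and a hypothetical "zip" parameter; you would read the real action URL and input names from the form's HTML:

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class StoreLocatorClient {
    public static void main(String[] args) throws Exception {
        String zip = "94105"; // value the user typed into your box
        // Hypothetical endpoint and parameter name -- take them from the form's action/input names
        String url = "http://www.example.com/store-locator?zip=" + zip;

        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            // The body is whatever the server returns: HTML, XML or JSON
            String body = EntityUtils.toString(response.getEntity());
            System.out.println(body); // parse this in the next step
        }
    }
}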
This will probably be a followup question as to what methods I should use to parse the data.
It very much depends on what the website actually returns.
If it returns static HTML, a regular (strict) or permissive HTML parser should be used.
If it returns dynamic HTML (i.e. HTML with embedded Javascript) you may need to use something that evaluates the Javascript as part of the content extraction process.
There may also be a web API designed for programs (like yours) to use. Such an API would typically return the results as XML or JSON so that you don't have to scrape the results out of an HTML document.
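For the dynamic-HTML case mentioned above, one option (a suggestion, not something the answer prescribes) is HtmlUnit, a headless browser that executes the page's Javascript before you read the DOM; the URL below is a placeholder:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class DynamicPageFetcher {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Let HtmlUnit run the page's Javascript so script-generated content is present
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setCssEnabled(false);

            HtmlPage page = webClient.getPage("http://www.example.com/locations"); // placeholder URL
            // asXml() gives the DOM after scripts have run; extract your data from that
            System.out.println(page.asXml());
        }
    }
}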
Before you go any further you should check the Terms of Service for the site. Do they say anything about what you are proposing to do?
A lot of sites DO NOT WANT people to scrape their content or provide wrappers for their services. For instance, if they get income from ads shown on their site, what you are proposing could divert visitors away from their site, with a resulting loss of potential or actual income.
If you don't respect a website's ToS, you could be on the receiving end of lawyers' letters ... or worse. In addition, they could already be using technical means to make life difficult for people scraping their service.