Here's what I want to do. I'm quite a beginner with this, so maybe it's a lame question, but I want to implement a GUI application in Java which gets data from sports livescore pages, e.g.
http://www.futbol24.com/Live/
http://livescore.com/
and parse it (somehow) in my app. Then I would be able to store it in, for example, a JTable, save full-time results in a database, play a sound after a goal is scored, and so on.
What is the best way to do this?
It would be almost impossible to parse an HTML document from a live web page and get specific information from it. If you did manage to work out exactly where in the document the data is, the page structure could change at any time. The scores might not even be in the HTML - they could be fetched by Javascript in the page.
I suggest you find an RSS feed of the information you want. Then you'll only have a nice, small piece of XML to parse. That's what it's for.
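As a rough sketch, assuming a feed exists at a hypothetical URL (neither site above is confirmed to offer one), Java's built-in DOM parser is enough to pull the item titles out of the XML:

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FeedReader {
    public static void main(String[] args) throws Exception {
        // Hypothetical feed URL - replace with a real RSS feed of scores
        String feedUrl = "http://example.com/livescores.rss";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(feedUrl);
        // Each RSS <item> carries a <title>; this also picks up the channel title
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getTextContent());
        }
    }
}

From there, pushing each title into a JTable model, or diffing successive polls to decide when to play a goal sound, is ordinary Swing work.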
Related
Currently I'm using Java to scrape the HTML code from this page: http://counter.onlineclock.net/
I want to get the value from the counter, but this is unique for each version of the webpage; that is, if it's open in different browsers or for different people, it will be a different value.
Because of this, when I scrape the HTML, the value that I am looking for is just blank. I am wondering if there is any way at all for me to get the current value I am looking for.
For example, if the counter is at 4, I would like to be able to get that value. It does not have to be in Java; any language or any approach will do.
JSoup is a great library for scraping data out of a web page. There are a lot of good examples of its usage on the web.
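As a minimal sketch of the jsoup API (the selector below is an assumption, since the counter page's real markup would need inspecting; note also that a value filled in by JavaScript after the page loads will not be present in the raw HTML jsoup downloads):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CounterScraper {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://counter.onlineclock.net/").get();
        // "#counter" is an assumed selector - inspect the page for the real one
        Element counter = doc.selectFirst("#counter");
        System.out.println(counter == null ? "not found" : counter.text());
    }
}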
Let me just start by saying that this is a soft question.
I am rather new to application development, which is why I'm asking a question without presenting any actual code. I know the basics of Java coding, and I was wondering if anyone could enlighten me on the following topic:
Say I have an external website, Craigslist, or some other site that allows me to search through products/services/results manually by typing a query into a searchbox somewhere on the page. The trouble is that there is no API for this site for me to use.
However, I do know that http://sfbay.craigslist.org/search/sss?query=QUERYHERE&sort=rel points me to a list of results, where QUERYHERE is replaced by what I'm looking for.
What I'm wondering here is: is it possible to store these results in an Array (or List or some form of Collection) in Java?
Is there perhaps some library or external tool that allows me to specify a query, paste it into the search link, perform the search, and fill an Array with the results?
Or is what I am describing impossible without an API?
It depends. If the query website can return the result as XML or JSON (usually with .xml or .json at the end of the URL), you can parse it easily: use Java's DOM API for the XML, or download a JSON library and parse the JSON.
Otherwise you will receive HTML, i.e. the page a user would see in a browser. You can then try to parse it as XML, but you will have a lot of work mapping all the fields in the HTML to get the list you want.
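In practice an HTML parser like jsoup takes most of that work away. A rough sketch, where the "p.row" selector is an assumption about Craigslist's result markup and would need checking against the actual page source:

import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CraigslistSearch {
    public static List<String> search(String query) throws Exception {
        // Builds the same URL as typing the query into the search box
        Document doc = Jsoup.connect("http://sfbay.craigslist.org/search/sss")
                .data("query", query)
                .data("sort", "rel")
                .get();
        List<String> results = new ArrayList<>();
        // Assumed selector for one result row; adjust after inspecting the page
        for (Element row : doc.select("p.row")) {
            results.add(row.text());
        }
        return results;
    }
}

So no, an official API is not strictly required; the trade-off is that your selectors break whenever the site changes its markup.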
I want to collect domain names (crawling). I have written a simple Java application that reads an HTML page and saves the code in a text file. Now I want to parse this text in order to collect all domain names without duplicates. But I need the domain names without "http://www.", just domainname.topleveldomain, or possibly domainname.subdomain.topleveldomain with any number of subdomains (then the collected links need to be extracted the same way, and the links inside them collected, until I reach a certain number of links, say 100).
I have asked about this in a previous post, https://stackoverflow.com/questions/11113568/simple-efficient-java-web-crawler-to-extract-hostnames , and searched. JSoup seems like a good solution, but I have not worked with JSoup before, so before going deep into it I just want to ask: does it achieve what I want to do? Any other suggestions for achieving my simple crawling in a simple way are welcome.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
So yes, you can connect to a website, extract its HTML, and parse it with jsoup.
The logic of extracting the top-level domain is "your part": you will need to write that code logic yourself.
Take a look at the docs for more options...
Use selector-syntax to find elements
Use DOM methods to navigate a document
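A rough sketch of "your part", assuming you dedupe hostnames with a Set and strip a leading "www." by hand; crawling onward until you hit your 100-link budget is then just a loop around this method:

import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HostCollector {
    public static Set<String> collectHosts(String seedUrl) throws Exception {
        Document doc = Jsoup.connect(seedUrl).get();
        Set<String> hosts = new HashSet<>(); // a Set drops duplicates for free
        for (Element link : doc.select("a[href]")) {
            try {
                // absUrl resolves relative links against the page URL
                String host = new URI(link.absUrl("href")).getHost();
                if (host != null) {
                    // keep any subdomains, but drop a leading "www."
                    hosts.add(host.startsWith("www.") ? host.substring(4) : host);
                }
            } catch (URISyntaxException ignored) {
                // skip malformed links rather than aborting the crawl
            }
        }
        return hosts;
    }
}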
I'm trying to make a simple webpage which obtains football league table data from
http://www.skysports.com/football/league/0,19540,11660,00.html
For example, I want to read in the points column and divide it by the number of games played, to get an average points-per-game column that I will print onto my webpage.
How can I do this online?
I'm quite experienced at doing this with offline programs such as C/Matlab, but I don't know where to start with it online.
Thanks
I wouldn't suggest doing it client side (in the browser). It will be easier to scrape on the server side (using Java, for example), following these steps:
Grab the content of the webpage (skysports).
Use the existing HTML markup with a regex to locate the desired content part.
Strip/split the HTML markup with a regex to get records (tr) and fields (td).
Cast the values and do your math.
Use the results to generate your version of the HTML (or JSON, or whatever).
Serve the generated content to your client.
In general, scraping is easy but not guaranteed to keep working, as the source HTML markup may change at any time (and without warning).
I can provide a basic sample in C# if you want. (Sorry, I haven't written Java since 1997.)
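For a rough idea of the grab/split/math steps in Java, a minimal sketch (the regex is exactly as fragile as the caveat above warns, and which cells hold points and games played is an assumption to check against the skysports table):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PointsPerGame {
    // Matches the text content of each table cell
    private static final Pattern TD = Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL);

    // Step 1: grab the raw page content
    static String fetch(String url) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            String line;
            while ((line = in.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Steps 2-3: split one table row's HTML into its cell texts
    static List<String> cells(String rowHtml) {
        List<String> cells = new ArrayList<>();
        Matcher m = TD.matcher(rowHtml);
        while (m.find()) {
            cells.add(m.group(1).replaceAll("<[^>]+>", "").trim()); // strip inner tags
        }
        return cells;
    }

    // Step 4: the math
    static double pointsPerGame(int points, int played) {
        return played == 0 ? 0.0 : (double) points / played;
    }
}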
You can use jQuery.get like this:
$.get('http://www.skysports.com/football/league/0,19540,11660,00.html', function(data) {
    // do the parsing here
});
There are several programming languages capable of getting at this information. PHP would be the classic method, using curl or file_get_contents plus regex parsing to extract the bits you want. You could do it with Yahoo Pipes as well, if your web host does not allow remote URL retrieval.
If none of the Java brigade come back with something better, contact me and I'll do some rough code for you in PHP.
I'm creating a website for reading stories online using Grails, and I'm facing a business problem: if I post a story of, say, 30 A4 pages in the FCKeditor and it has been saved, how can I display it across 30 pages with pagination or something like that? I'm out of simple ideas, and I think making many lists out of the story is a rough approach. Is there any Java/Grails/Groovy or even jQuery idea that can save my day?
You're probably going to have to write some custom code here.
If you want to auto-paginate, then you'll have to index into the string and grab a fixed number of characters, words, paragraphs, or whatever makes sense. If you don't need to auto-paginate, then you can embed pagination information in the text string. Either way you'll have to expose a page number field in the view, then use that to page into the text based on what the user provided.
Sorry I don't have a more elegant solution.
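As a starting point, here is a minimal sketch of the auto-pagination idea, paging by a fixed word count. The page size is an arbitrary assumption, and it treats the story as plain text; since FCKeditor actually saves HTML, in practice you would split on paragraph boundaries instead:

import java.util.Arrays;
import java.util.List;

public class StoryPager {
    private static final int WORDS_PER_PAGE = 400; // assumed page size

    // Returns the given 1-based page of the story, split on whitespace
    public static String page(String story, int pageNumber) {
        List<String> words = Arrays.asList(story.trim().split("\\s+"));
        int from = (pageNumber - 1) * WORDS_PER_PAGE;
        if (from < 0 || from >= words.size()) return "";
        int to = Math.min(from + WORDS_PER_PAGE, words.size());
        return String.join(" ", words.subList(from, to));
    }

    public static int pageCount(String story) {
        int words = story.trim().split("\\s+").length;
        return (words + WORDS_PER_PAGE - 1) / WORDS_PER_PAGE;
    }
}

The controller action then just reads a page parameter from the request and renders page(storyText, pageNumber), exposing pageCount to the view for the pagination links.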
If you are willing to do it client-side, there are jQuery options, such as the SimplePager plugin or jQuery Paginate. You will probably need your HTML to be divided into "segments" such as DIVs or LIs or something divisible, though.