How to Store html content in database - java

I want to store a year of temperatures from a weather forecast web site like this one in a database so that I can use them later in an Android application. I tried to use Jsoup, but I only get pieces of the table containing the temperatures.
Is there any way to get that HTML table content so that I can store it?

It would be a whole lot better if you used the API provided by wunderground instead of using jsoup in order to screen scrape the page.
The main reasons are that the implementation will be a lot cleaner and also your implementation will be immune to stylistic changes in wunderground web pages.
Here is a guide on how to consume a REST web service with Spring.
Once you have retrieved the data from the API you could easily store the data in a database using an ORM framework like Hibernate since you would have already created the objects to retrieve the data.
You can make your life even easier if you use Spring with Hibernate integration to save the data. Check out this guide.
The guides mentioned above use Spring Boot to make it extremely easy to get started with the Spring framework (gone are the days when it was almost impossible for a novice to get started with a Spring project alone).

Broadly speaking, the HTML document displayed on the website would have to be parsed programmatically, tokenized, converted to suitable data types and finally stored in the database. However, it should first be checked whether the data on the website can be read via a SOAP web service or something similar, as the interface would be cleaner and the approach more robust.
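The parse, tokenize and convert steps described above can be sketched roughly as follows. This is a minimal sketch, not the site's actual layout: the class names, the two-column table shape (an ISO date followed by a temperature) and the use of regular expressions are all assumptions made to keep the example dependency-free. For real pages a proper HTML parser such as Jsoup is likely to be more robust.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed table row (hypothetical shape): a date and its temperature. */
final class TemperatureReading {
    final LocalDate date;
    final double celsius;

    TemperatureReading(LocalDate date, double celsius) {
        this.date = date;
        this.celsius = celsius;
    }
}

final class TemperatureTableParser {
    // Matches one <tr>...</tr> block and, inside it, each <td>...</td> cell.
    private static final Pattern ROW =
            Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern CELL =
            Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Tokenizes the table into rows and cells, then converts each row to typed values. */
    static List<TemperatureReading> parse(String html) {
        List<TemperatureReading> readings = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            if (cells.size() < 2) {
                continue; // header rows (<th> only) and short rows carry no data cells
            }
            readings.add(new TemperatureReading(
                    LocalDate.parse(cells.get(0)),        // assumes ISO dates such as 2014-01-31
                    Double.parseDouble(cells.get(1))));   // assumes a plain decimal number
        }
        return readings;
    }
}
```

The resulting list of typed readings is what would then be handed to the storage layer, for example a JDBC batch insert or an ORM entity.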

Related

Spring MVC with Google Charts, what is the best way to create javascript data tables?

I'm building a Spring MVC application where I create graphs in the browser using data from a database. I have already written the data handling in the backend and have the keys and values I need in a Java map. I was wondering what the best way is to put that data into the JavaScript data tables that Google Charts uses. I found that there is a Google Visualization Data Source Library, but I didn't find much information or many examples for it.
I'm leaning toward sending the Java map via the model to JavaScript and parsing the data there. But if that JavaScript parsing could somehow be avoided, it would be great...
The data source library you linked to is a complete package for handling database access and returning data to your charts via the Visualization API's Query interface. If you don't want to use that, you have a spectrum of options which vary the workload from the server to the client. On one end, you can offload the parsing to the client by passing your data object as-is (or a JSON representation of it) and let the client parse that into a DataTable. On the other end, you can create a Java class that replicates the structure of a DataTable constructor's object-literal parameter (see documentation). Everything in between is transforming your data on the server into a format that is easier for the client to parse into a DataTable.
As far as I am aware, there is no pre-packaged DataTable class for Java, other than what comes in the data source library you linked to.
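As a rough illustration of the server-heavy end of that spectrum, the sketch below turns a Java map into a JSON string shaped like the DataTable constructor's object literal (a `cols` array and a `rows` array of `c` cells with `v` values). The class name `DataTableJson`, the method names and the two-column layout are assumptions for this example, and the JSON is built by hand only to avoid a library dependency. A JSON library would normally be the safer choice.

```java
import java.util.Map;

final class DataTableJson {
    /** Builds a two-column table: a string label column and a number value column. */
    static String fromMap(String keyLabel, String valueLabel, Map<String, ? extends Number> data) {
        StringBuilder json = new StringBuilder();
        json.append("{\"cols\":[")
            .append("{\"label\":").append(quote(keyLabel)).append(",\"type\":\"string\"},")
            .append("{\"label\":").append(quote(valueLabel)).append(",\"type\":\"number\"}")
            .append("],\"rows\":[");
        boolean first = true;
        for (Map.Entry<String, ? extends Number> entry : data.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            // Each row is an object with a "c" array holding one cell per column.
            json.append("{\"c\":[{\"v\":").append(quote(entry.getKey()))
                .append("},{\"v\":").append(entry.getValue()).append("}]}");
        }
        return json.append("]}").toString();
    }

    /** Minimal JSON string escaping: backslash, double quote and control characters. */
    private static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            if (ch == '"' || ch == '\\') {
                out.append('\\').append(ch);
            } else if (ch < 0x20) {
                out.append(String.format("\\u%04x", (int) ch));
            } else {
                out.append(ch);
            }
        }
        return out.append('"').toString();
    }
}
```

Passing a `LinkedHashMap` keeps the rows in insertion order. On the client, the parsed object can be handed to the DataTable constructor, so no per-row parsing is needed in JavaScript. Values such as NaN or infinity are not valid JSON and would need handling before use.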

System architecture - Java Backend, Database, Mobile Apps

I've been building Java backends with Spring, Hibernate and RDBMSs for a while now. I also regularly work on mobile applications for iOS and Android.
So I have a full stack of technology to use for this task, but I am looking for something more advanced that better fits the requirements. I have had some thoughts about it, but I should first explain how my current systems work and then how I want my upcoming systems to look.
Currently using
Spring Framework to connect everything together
Hibernate with Entity beans for persistence
MySQL or others as RDBMS
DTO objects created with Dozer
RESTful API to expose services
DTOs are transferred in JSON format
This setup works. But I have the feeling that it's just too much work and life could be simpler with other technologies.
What I am looking for
On the mobile side, I want to receive data for the current screen that I can easily cache. JSON is already serialized, so it would be easy to save to disk in the mobile application without using yet another database. So the question is: how could I store the data in the backend so that I can retrieve it more easily, without using entity beans, DTOs and Dozer to convert between them? Isn't there another database solution that already delivers JSON? What about graph databases, for example OrientDB or Neo4J?
I definitely want to go with Java and Spring, and I am open to a replacement for Hibernate, RDBMS and entity beans and DTOs.
Looking forward to your answers!
Your current design ("This setup works") has the niceties a good system should have: it is tiered and has good separation of concerns.
If I understand your requirement correctly, your argument is: if my end data format is JSON, why not store the data as JSON, which would rid you of a lot of plumbing code and effort in the middle tier?
It would let you fetch the data from storage and pass it straight to the requesting client. Those are your requirements in a nutshell. Please correct me if I am wrong.
Now, JSON is more of a textual notation than a storage format. JSON is generally consumed by the view tier of an MVC architecture, as it's easy to render on screen using JavaScript.
Your reasoning for using a NoSQL DB that directly delivers JSON is credible, given that the end client is going to be a mobile app.
The overall architecture looks good and highly optimized for mobile access.
As for NoSQL JSON storage, the following document-store NoSQL DBs support a JSON interface:
i. CouchDB
ii. JasDB
iii. SchemaFreeDB
You can evaluate any of these to suit your needs.
(full disclosure - I'm an engineer with Kinvey, a BaaS provider)
One option you might consider is using Backend-as-a-service. Most BaaS providers use JSON to transfer the data over the wire, which sounds like it would be compatible with your requirements.
In addition, you'll typically get a lot of common mobile app functionality baked in (i.e. push notifications, file storage and CDN infrastructure, user management, etc). This could be especially useful if you are building multiple apps, each with their own backend; rather than reinventing the wheel each time, simply spin up a new backend.
One last but important note is pricing. A lot depends on your use case, but from what I've seen, a BaaS provider is usually significantly cheaper than rolling your own solution on AWS or some other cloud provider, especially since most providers offer a free tier.
Even though this question is a bit old, maybe a quick alternative for RDBMS: MongoDB. It is a document database with document-level locking. It scales really well.
Main point: it uses JSON as its document storage (actually the Binary JSON a.k.a. BSON, but that is just a superset). Inserting a document into the database is as easy as
db.collection.insert(JSON);
on the mongo shell and
DBObject bson = (DBObject) JSON.parse(JSONstr);
collection.insert(bson);
in the Java driver.

Using Lucene/Solr with Spring Data

I am using Spring Data (Mongo) for my web application (close to a social networking website). Now, I wish to provide search capabilities over the content written within the application (such as posts, tags, friends, etc.).
I believe Lucene/Solr is one of the better libraries to go for such cases, but am not sure how to use (integrate?) it with Spring Data (or maybe there is some inherent support within Spring for it).
Would appreciate help (documentation, links, blog posts, etc.) on this!
Though the post has been around for a while, you may have a look at this one https://github.com/SpringSource/spring-data-solr/
The Spring Data for Solr project provides a natural Spring Data like API for querying data from Solr. Read the examples for a quick overview.
I found a good read here - http://adeithzya.wordpress.com/2011/08/25/using-apache-solr-with-spring-framework - that hits the nail on its head!
Integrating them is relatively easy, the difficult part is maintaining data consistency between them. For example, how would you answer these questions:
How and when do you intend to perform CRUD with Mongo and Solr? Do you write to Mongo first (with or without waiting for a confirmation?) and then to Solr?
If you're using async writes with Mongo, what happens when you send the data to Solr and then get an exception from Mongo (the data exists in Solr but not in Mongo)?
What happens if you get an error while trying to write to Solr (the data exists in Mongo but not in Solr)?
What happens if you delete something from Mongo and, right after that, someone performs a search and Solr returns that very document because it still has it indexed?
The point is that there will be an inconsistency window where Mongo and Solr are not in sync, and you probably want to handle at least some of these issues.
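One possible way to handle the "written to Mongo but not to Solr" case is to treat Mongo as the source of truth and queue failed index writes for a later retry. The sketch below only illustrates that idea. `PrimaryStore`, `SearchIndex` and `DualWriter` are hypothetical stand-ins, not Spring Data or Solr APIs, and the in-memory queue is not thread-safe or durable.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the Mongo repository (the source of truth). */
interface PrimaryStore {
    void save(String id, String body);
}

/** Hypothetical stand-in for the Solr client; indexing may fail. */
interface SearchIndex {
    void index(String id, String body) throws Exception;
}

final class DualWriter {
    private final PrimaryStore store;
    private final SearchIndex index;
    // Documents that reached the primary store but still need indexing.
    private final Queue<String[]> pendingReindex = new ArrayDeque<>();

    DualWriter(PrimaryStore store, SearchIndex index) {
        this.store = store;
        this.index = index;
    }

    /** Writes to the primary store first; if that throws, nothing is indexed. */
    void save(String id, String body) {
        store.save(id, body);
        try {
            index.index(id, body);
        } catch (Exception e) {
            // The store has the document but the index does not: remember it for retry.
            pendingReindex.add(new String[] {id, body});
        }
    }

    /** Retries each queued document once and returns how many are still pending. */
    int retryPending() {
        int attempts = pendingReindex.size();
        for (int i = 0; i < attempts; i++) {
            String[] doc = pendingReindex.poll();
            try {
                index.index(doc[0], doc[1]);
            } catch (Exception e) {
                pendingReindex.add(doc);
            }
        }
        return pendingReindex.size();
    }
}
```

Writing to the store first means a search can miss a fresh document for a while, but it never returns one that was never stored. Deletes would need the same treatment in the opposite direction, and a real system would likely persist the retry queue.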

Can I realistically move all templating to javascript for my webapp?

I have a Spring-MVC based webapp with a JSP front end. It is your basic CRUD app with various other management and reporting screens thrown in.
We are currently using JSP with JSTL for our view, but our designer doesn't know JSP so it's been a real pain to merge his design changes into the source. Due to that, my recent thought has been that if we could just hand the entire UI over to him and let him implement it entirely in HTML/Javascript, making ajax requests for JSON data for the dynamic portions, we would be able to remove that entire merge process and just host his static HTML files. Development for him would be simple as he would be able to hit our REST webapp on our test server for sample JSON data using jsonp.
If the designer is proficient with javascript, what would we lose by changing our spring-mvc webapp to only return JSON views and use jQote or jquery-tmpl to do all dynamic bits in the HTML?
Are there any deal breakers in going this route?
You'd just lose the ability to take advantage of JSP-based frameworks and templates. If:
your developer is proficient in Javascript,
you expect future developers in his place to be proficient as well, and
you are okay with making javascript a requirement for your site
then this can be a good strategy. The JSON will probably make your AJAX calls a lot faster than returning actual content would do. You'll probably be able to make the site a lot more responsive to user interaction.
The problem with injecting content via JavaScript is that search engines may not see it. They get the page source as it is at load time. If this is an internal application that may not matter, but if it's a public-facing site it could mean very bad things.
You can build entire interfaces from JSON data and a bit of JavaScript on the client. As a technique it works quite well and is fast, but beware of the SEO implications.
One more point to add:
Say you have 300 rows of data to show: you would load them 100 rows at a time using JS and show each batch to the user as it arrives.
This mimics streaming. Content is shown as soon as each request has returned.
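One way to read the point above is that the JSON endpoint returns the rows in pages and the client requests them one after another. A minimal server-side sketch of the slicing, assuming a hypothetical `Pager` helper and a zero-based page index:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Pager {
    /** Returns the rows of the given zero-based page, or an empty list past the end. */
    static <T> List<T> page(List<T> rows, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize;   // long avoids int overflow on large indexes
        if (from >= rows.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, rows.size());
        return new ArrayList<>(rows.subList((int) from, to)); // copy, so the page is independent of the source list
    }
}
```

The client would keep requesting the next page index until it receives an empty page.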

Android - Obtaining data from a website

I'm finding my way around Android and so far so good. My next big challenge is coming to grips with web services. I would like to build an app that reads data from a web site or a database on a web server and stores the data in my app.
Basically, it will be an app that I build in conjunction with a news website that pulls their latest articles into the app. What I'm finding difficult is how to bridge the gap between my application and the data in the SQL Server database.
I'm familiar with building asp websites that read data from a database, but how would I do something similar with an app?
Do I ask the website to store the articles in an xml format? Or, is there another way that I can request a specific article and be provided with the content?
I hope I'm phrasing the question correctly and that someone can just guide me to the right way to approach this.
Thanks in advance.
You can approach this problem from different perspectives.
The common solution is to build a web service that bridges the gap between your mobile application and the data that remains on your server. I personally prefer to set up a Rails backend and thus have a RESTful API that helps me access my data. For instance, to retrieve the list of articles I could just request the following URL: http://my_server_host/articles. So for the web service part you can use whatever you want: Rails, J2EE, .NET etc. And you can choose the model that fits your needs (REST, SOAP, XML-RPC etc.).
Then you will have to write a class that contains all the necessary calls to the web service you have built. Basically, if your web service returns its results as XML you will have to:
Send the request to the appropriate URL. (See: HttpGet or HttpPost if you want to modify a resource).
Parse the XML returned. (In short, you can use SAX or DOM to parse your XML response and transform them to a business entity (an Article, a User etc.).)
This hopefully gives you a hint about a possible solution. By the way Google is your friend, but I will probably come back to add external links/resources to help you more.
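The parsing step can be sketched with the DOM API as follows. This is only an illustration under assumptions: the `<article>`, `<title>` and `<body>` element names and the `Article` class are made up for the example, and the HTTP request is left out, so the response body is assumed to have been read into a string already.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Business entity built from one <article> element (hypothetical shape). */
final class Article {
    final String title;
    final String body;

    Article(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

final class ArticleXmlParser {
    /** Parses the web service response and maps each <article> element to an Article. */
    static List<Article> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("article");
            List<Article> articles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                articles.add(new Article(text(element, "title"), text(element, "body")));
            }
            return articles;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers deal with a single unchecked exception.
            throw new IllegalArgumentException("Malformed article XML", e);
        }
    }

    /** Returns the trimmed text of the first descendant with the given tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? "" : children.item(0).getTextContent().trim();
    }
}
```

DOM loads the whole document into memory, which is usually fine for a short list of articles. For large responses a streaming parser such as SAX may be the better fit.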
Edit
Another possible solution could work for you, since all you need is to retrieve some articles: just set up a simple WordPress blog, for instance. WordPress gives you a URL for the blog's RSS feed, so all you have to do is parse that RSS feed (XML). There is a great article on the IBM website about parsing an RSS feed that you can find here. By the way, this solution is only possible if you are willing to keep your articles on a WordPress blog. But hopefully you get the point.
Reading your data directly from the database on the server would be bad practice. You'd have to open up some ports, and that's definitely not what you want (and if you don't have root access, you can't anyway).
For non-interactive content (what you want) you would use XML or JSON.
