Getting data from a website that needs you to log in (Java)

Getting data from a website that needs you to log in (Java) - java

I don't even know if what I'm asking is possible and I don't know what to search for on Google.
Basically, there are multiple projects that would require me to fetch some data from websites. The example I'm thinking of right now is to grab my account info from a banking site http://www.americanexpress.ca I'd like to know how I'd make it so my login info is entered in the fields on the left and grab the data from the resulting page. I'd then make methods to parse that data.
Obviously, this would need to be secure as I don't want my banking info stolen.
Sorry if the solution is obvious as I've never tried grabbing data from websites.

As mentioned, Apache HttpClient is one option, though personally I've always found HtmlUnit to be a bit more convenient to work with (from an API standpoint) for doing things like this. HtmlUnit is built on top of HttpClient, and exposes a higher-level API for interacting with and manipulating page content.

You have to use Apache HttpClient (or same) library. It have all required classes for you.

Related

Getting Data from Internet in Java

I thought of making the following application for my college project in java. I know core java. I want to know what should i read "specifically" for this project as there is less time:
It will have an interface to put your query. This string would go as a query to internet search engines and with the help of search engine find the data (the first web page that we see (that is data for my application for this time. :) )).
I do not want to display the data. I just want the HTML file or the source code of the generated web page. Is it sounding like Common Getaway Interface? I do not know about this.
But i think it for the same purpose. If it is this. please guide me to know how to implement this.
Whatever please specify
Problem 1 : What should i read ? Any direct help at this point is not my intention. I want to implement it myself.
Problem 2 : Is connecting to internet requires some jnlp knowledge too.
for eg. as on google we search something it shows us the links of the websites. I can see the source code of this generated web page. I just want this page for my application to work on.
EDIT:
I do not want to rely on google only or any particular web server. I want to decide that by my application.
Please also refer to my problem 2.
As i discovered that we have Terms of Conditions for websites should i try to make my crawler. Would then my application not breaking the rules . Well its important for me.

Ashish,
Here what I would recommend.
Learn the basics of JSON from these links (Introduction ,lib download)
Then look at the Google Web Search JSON API here.
Learn how to GET the data from servers using HttpClient library here.
Now what you have to do is, fire a get request for the search, read the JSON response, parse the response using the JSON lib from #1 and you have the search results.
Most of the search engines (Bing etc) offer Jason/REST apis so you can do the same for other search engines.
Note: Jason APIs are normally used from JavaScritps on the UI side but since its very easy and quick to learn, I suggested you that. You can also explore (if time permits) the XML based APIs also.

URL url = new URL("http://fooooo.com");
in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
{
System.out.println(inputLine);
}
Should be enough to get you started .
And yes , do check if you are not violating the usage terms of a website . Search Engines dont really like you trying to access them via a program .
Many , Including Google , has APIs specifically designed for this purpose.

you can do everything you want using HTMLUnit. It´s like a web browser but for java. Check some examples at their website.

Read "Working with URL's" in the Java tutorial to get an idea what is behind the available libs like HTMLUnit, HttpClient, etc

I do not want to display the data. I just want the HTML file or the source code of the generated web page.
You probably dont need the HTML either. Google provide its search results as a web service using this API. Similarly for other search engine GIYF. You get the search results as XML, which is far more easier for you to parse. Plus the XML wont have any unwanted data like ads.

Adding content to Liferay via API

I am starting using Liferay Portal and I have two basic needs which I would like to achieve with Liferay.
Is there a posibility to add content to CMS through API level? I would like to insert some data "from code".
More important. How to achieve such situation that for every created user there will be its own homepage generated with some predefined template elements on it?
I have tried to Google something so far, but I did not find it helpful. Maybe some keywords?
After some analysis of documentation devoted to services and ServiceBuilder I realized that it is not what I want.
Let me show an example based on Websphere.
In Websphere we have bunch of EJB components available to perform some actions, exchange information with portal, easy to use. Isn't there any similar mechanism in Liferay not involving web services?

My recommendation for this kind of question is to take a look at the sevencogs-hook sourcecode. The structure of this hook is basically just a long script that runs once, setting up a complete demo site with users, sites, pages, content etc. The code runs once (after the first deployment) and then never again. There are no (obvious) conditionals, no context to understand etc.
You can basically just step through everything and - in that process - understand how content (and pages, images, blog posts, etc.) are created and positioned on pages in Liferay.
This hook accesses the Java API, a very similar API is available through Webservices. Basically all of Liferay's portlets also use the same API to do their business.
Edit: Additional information to keep this answer valuable/current: Sevencogs is discontinued, but still available in old releases (source & binary). The API has slightly changed, so compiling/running it will need a bit of work. James Falkner has blogged about the leftovers and lessons learnt - those snippets are extracted from sevencogs and contain the relevant code pieces to work with the API.

Looking at this page from the documentation: It smells like a SOAP interface (they mention some sort of document uploader service and I've read axis).
You'll find some url examples that should give a list of available webservices.

For number 1, you can use the one of the:
JournalArticleLocalServiceUtil.addArticle()
methods to programmatically add Liferay Web Content from a portlet. If you download the Liferay Portal Source you can see the structure of these methods.
For number 2, can create page templates with preconfigured portlets on them (through the Plugins-SDK), and then use the API to programmatically create the pages using one of the:
LayoutLocalServiceUtil.addLayout()
methods.
If you have any more speific questions about these comment back, and I hope this helps!

Can I realistically move all templating to javascript for my webapp?

I have a Spring-MVC based webapp with a JSP front end. It is your basic CRUD app with various other management and reporting screens thrown in.
We are currently using JSP with JSTL for our view, but our designer doesn't know JSP so it's been a real pain to merge his design changes into the source. Due to that, my recent thought has been that if we could just hand the entire UI over to him and let him implement it entirely in HTML/Javascript, making ajax requests for JSON data for the dynamic portions, we would be able to remove that entire merge process and just host his static HTML files. Development for him would be simple as he would be able to hit our REST webapp on our test server for sample JSON data using jsonp.
If the designer is proficient with javascript, what would we lose by changing our spring-mvc webapp to only return JSON views and use jQote or jquery-tmpl to do all dynamic bits in the HTML?
Are there any deal breakers in going this route?

You'd just lose the ability to take advantage of JSP-based frameworks and templates. If:
your developer is proficient in Javascript,
you expect future developers in his place to be proficient as well, and
you are okay with making javascript a requirement for your site
then this can be a good strategy. The JSON will probably make your AJAX calls a lot faster than returning actual content would do. You'll probably be able to make the site a lot more responsive to user interaction.

The problem with injecting content via JavaScript is that search engines cannot see it. They get the page source as it is a load time. If this is an internal application that may not matter, but if it's a public-facing site it could mean very bad things.
You can build entire interfaces from JSON data and a bit of JavaScript on the client. As a technique it works quite well and is fast, but beware of the SEO implications.

One more point to add:
Say you are loading 300 rows of data to show, then you will have to load 100 row using JS and then show it to user.
It will mimic the streaming features. Content will be shown after request is populated.

Android - Obtaining data from a website

I'm finding my way around Android and so far so good. My next big challenge is coming to grips with web services. I would like to build an app that reads data from a web site or database on web server and store the data in my app.
Basically, it will be an app that I build in conjunction with a news website that pulls their latest articles into the app. What I'm finding difficult is how to bridge the gap between my application and the data in the SQL Server database.
I'm familiar with building asp websites that read data from a database, but how would I do something similar with an app?
Do I ask the website to store the articles in an xml format? Or, is there another way that I can request a specific article and be provided with the content?
I hope I'm phrasing the question correctly and that someone can just guide me to the right way to approach this.
Thanks in advance.

You can approach this problem from different perspectives.
The common solution is to build a Webservice that will bridge the gap between your mobile application and the data that remain in your server. I personnaly prefer to setup a Rails backend and thus have a RESTful API that will help me access my data. For instance, to retrieve the list of articles I could just request the following url: http://my_server_host/articles. So for the Webservice part you can have whatever you want: Rails, J2EE, .NET etc. And you can choose the model that fits your needs (REST, SOAP, XML-RPC etc.).
Then you will have to write a class that will contain all the necessary calls to the Webservice you have built. Basically, if your Webservice returns the results as an XML format you will have to:
Send the request to the appropriate URL. (See: HttpGet or HttpPost if you want to modify a resource).
Parse the XML returned. (In short, you can use SAX or DOM to parse your XML response and transform them to a business entity (an Article, a User etc.).)
This hopefully gives you a hint about a possible solution. By the way Google is your friend, but I will probably come back to add external links/resources to help you more.
Edit
Another possible solution that could work for you, since all you need is to retrieve some articles. Just setup a simple Wordpress blog for instance. Wordpress gives you an URL for the blog's RSS feed, all you will have to do is to parse that RSS feed (XML). There is a great article on the IBM website for parsing an RSS feed that you can find here. By the way, this solution is only possible if you want to save your articles on a Wordpress blog. But you got the point hopefully.

Reading your data form the Database on the Server would be bad practice. You'd have to open up some ports and that's defiantly not what you want (if you don't have root-access, you also can't).
For non-interactive content (what you want) you would use XML or JSON.

Automatic sitemap generation

We have recently installed a Google Search Appliance in order to power our internal search (via the Java API), and all seems to be well, however I have a question regarding 'automatic' site-map generation that I'm hoping you guys may know the answer to.
We are aware of the GSA's ability to auto-generate site maps for each of its collections, however this process is rather manual, and considering that we have around 10 regional sites that need to be updated as often as possible, its not ideal to have to log into the admin interface on a regular basis in order to export them to the site root where search engines can find them.
Unfortunately there doesn't seem to be any API support for this, at least none that I can find, so I was wondering if anyone had any ideas for a solution/workaround or, if all else fails, the best alternative.
At present I'm thinking that if we can get the full index back from the API in the form of a list, then we can write an XML file out using that the old fashioned way using a chronjob or similar, however this seems like a bit of a clumsy solution - any better ideas.

You could try the GSA Admin Toolkit, or simply write some code yourself which just logs in on the administration page and then uses that session to invoke the sitemap export URL (which is basically what the Admin Toolkit does).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.