How to programmatically download image from website? - java

I need to download images from a website, and I have the login name and password, but if I just use a URL to download the image, it throws an exception: there is no value in session.
I think I need to log in to the website before I can programmatically download the image.
Do you have any solutions? Thanks in advance!

In simple circumstances you can use a URLConnection with the URL and stream the contents down. More generally I'd strongly advise you use Apache HttpClient since you'll need to do authentication and possibly receive and send cookies to the server. Read the user guide regarding Authentication and Methods, particularly Get.
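For the simple case, a minimal sketch using only java.net could look like the following. It assumes the site uses a form login that sets a session cookie; the URLs, the form field names (`username`, `password`) and the output file name are placeholders, not values from the real site.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class SessionImageDownloader {

    // Install a JVM-wide cookie store so that cookies set by the login
    // response are sent back on later HttpURLConnection requests.
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // POST an already form-encoded body (e.g. "username=me&password=secret")
    // to the login URL and return the HTTP status code.
    static int login(String loginUrl, String formBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    // Copy a response body to a file, replacing any existing file.
    // Returns the number of bytes written.
    static long save(InputStream body, Path target) throws IOException {
        return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Fetch the image; the cookie store adds the session cookie automatically.
    static long download(String imageUrl, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(imageUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return save(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URLs and credentials; replace with the real site's values.
        enableCookies();
        login("http://example.com/login", "username=me&password=secret");
        download("http://example.com/images/photo.jpg", Paths.get("photo.jpg"));
    }
}
```

If the login involves hidden form fields, redirects with extra tokens, or a non-Basic authentication scheme, this sketch will likely not be enough, which is where HttpClient becomes the more practical choice.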

Use the HttpClient libraries to write a spider that accesses the content.
If you want to stick with Java, I would suggest recording the HTTP traffic for the login and the content access, and then rebuilding that communication using the library.
There are libraries for other languages as well, such as LWP for Perl.

Although the java.net package provides basic functionality for accessing resources via HTTP, it doesn't provide the full flexibility or functionality needed by many applications. HttpClient seeks to fill this void by providing an efficient, up-to-date, and feature-rich package implementing the client side of the most recent HTTP standards and recommendations.
Designed for extension while providing robust support for the base HTTP protocol, HttpClient may be of interest to anyone building HTTP-aware client applications such as web browsers, web service clients, or systems that leverage or extend the HTTP protocol for distributed communication.
HTTPClient
HTTPClient Authentication

I'd like to mention HtmlUnit. It is a headless browser for Java with JavaScript support.

Related

Android: Downloading an NTLM-authentication-protected file

I want to download a file from a SharePoint server that is protected with NTLM authentication from my Android application. I found some tutorials but had no success with them.
I tried using the Java CIFS Client Library and was not successful with that either.
I investigated this post: Manipulating SharePoint list items with Android (JAVA) and NTLM Authentication, but I do not want to consume a web service, I just want to download a file.
Any suggestions?
Did you use standard Java java.net.Authenticator http://developer.android.com/reference/java/net/Authenticator.html? If it doesn't support NTLM check http://developer.android.com/reference/org/apache/http/auth/NTCredentials.html and related org.apache.http package. Also look at blog http://mrrask.wordpress.com/2009/08/21/android-authenticating-via-ntlm/ where it is shown how to use it. In par
Why complicate things
You should be able to send the authentication in the Uri.
URL url = new URL("http://user:pass@sub.domain.com/FolderName/FileName.docx");
This technique should work with both Windows Authentication and Basic Authentication
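Whether a given HTTP client actually sends credentials embedded in the URL is not something to rely on. A more explicit variant, sketched here for Basic authentication only (it does not cover NTLM), extracts the user-info part and builds the `Authorization` header by hand. The class and method names are made up for illustration, and `java.util.Base64` assumes Java 8 or a recent Android API level.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UrlCredentials {

    // Build a Basic Authorization header value from the "user:pass" part
    // of a URI, or return null when the URI carries no user info.
    static String basicAuthHeader(URI uri) {
        String userInfo = uri.getUserInfo();
        if (userInfo == null) {
            return null;
        }
        String encoded = Base64.getEncoder()
                .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://user:pass@sub.domain.com/FolderName/FileName.docx");
        System.out.println(basicAuthHeader(uri)); // prints "Basic dXNlcjpwYXNz"
    }
}
```

The returned value can then be set on a connection with `setRequestProperty("Authorization", header)` before the request is sent.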
Try using Chilkat. It's not free, but you can easily integrate it into your code.
Chilkat Link

Do Applets use Browser for HTTP Requests?

Is there any interaction between applets and their hosting browser when making HTTP requests, or are requests made completely independently of native browser code?
Specifically, do Java applets running in a browser have some implicit way of sharing the browser's session state and cache?
I've read a few posts from non-authoritative sources saying that when an applet makes an HTTP request that it will use the browser's cache, and that it will also have access (somehow) to the browser's cookies.
Tests I've done using URLConnection suggest that this is not the case, and my gut feeling is that it sounds far too convenient to be true. I would assume that nothing in the JVM knows anything about the world outside of that JVM, meaning the only other way this could work would be if the JVM implementation were specific to the browser and its implementation of the URL-related methods delegated to native browser code.
If cookie data is not implicitly shared or available, is it best practice to pass a session ID in a param tag to the applet? Are there security concerns with this approach? If the applet doesn't use the browser's cache for requests, how does caching requests in an applet work?
Applets are executed by the Java Plugin, which is a browser plugin. The applet is indeed part of an HTML page loaded by the browser, can communicate with the browser DOM and with JavaScript code in the page, and uses the browser to send requests to its originating server.
See http://docs.oracle.com/javase/tutorial/deployment/applet/appletExecutionEnv.html and http://docs.oracle.com/javase/tutorial/deployment/applet/server.html for more information.
My testing with Windows 7, Java 1.6.23 and Firefox, Chrome and Internet Explorer shows that HttpURLConnections from within an applet's JVM interact in no way with the browser. They don't use the cache, and don't have cookie headers added.
I think it depends on the Java plugin. My experience is that usually it uses the browser cache for network connections, and usually it transmits the cookies. I have had to empty the browser cache before to get a new file in an applet.
If you look at the Oracle Java 7 Plugin Control Panel, you will see an option in the network parameters to use direct connections for the applets, but the default is to use "browser parameters".
As for the cookies, I have seen in the past some Java plugins that did not transmit the session cookies, in particular on MacOS X (Apple even suggested a workaround). But most developers now assume that they are transmitted, and in practice it usually works.
Applets do not share session information by default, but you can pass the session ID as an applet parameter at initialization and then use that session ID for each HTTP request.
Applets can interact with the browser to make HTTP requests via JavaScript calls.
If you use any Java HTTP APIs, e.g. URLConnection, Apache HttpClient or java.net.Socket, these libraries will not interact with the browser. They behave as if they were in a standalone JVM.
Caching depends on the API you use. Apache HttpClient has a cache, and URLConnection lets you write your own cache easily enough.
You cannot directly access the existing browser cache from JavaScript yet, though it is coming: https://developer.mozilla.org/en-US/docs/Web/API/CacheStorage.
A param tag cannot change once the page is rendered, which matters because, for example, OAuth tokens need refreshing periodically.
You could fetch cookies from the browser via JavaScript and manually add them to a Java initiated HTTP request. This mechanism allows them to be updated.
There is not much added risk sharing a cookie. You would have to remove the HTTPOnly flag on the cookie if there is one.
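As a rough sketch of adding the cookie manually to a Java-initiated request: the cookie name `JSESSIONID`, the URL and the session value below are placeholders. In an applet the value would come from a param tag or from JavaScript, which is not shown here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class SessionCookieRequest {

    // Format a single cookie the way it appears in a Cookie request header.
    static String cookieHeader(String name, String value) {
        return name + "=" + value;
    }

    // Open a connection that carries the given session cookie. Nothing is
    // sent over the network until the caller reads the response.
    static HttpURLConnection open(String url, String cookieName, String sessionId)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Cookie", cookieHeader(cookieName, sessionId));
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Hard-coded placeholder; a real applet would receive this value
        // from the hosting page.
        HttpURLConnection conn = open("http://example.com/data", "JSESSIONID", "abc123");
        System.out.println(conn.getResponseCode());
    }
}
```

Because the header is set per request, a refreshed session value can simply be passed in on the next call.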
If you are allowing Java in the browser, your users are letting you do pretty much anything. Java inside the browser does have a sandbox, but it's worryingly easy to break out of. If you can design apps without Java, they will be much more secure for users.
From the point of view of the person writing the Applet, Java is secure and much more flexible than JavaScript in a Browser.

js- can i authenticate a user into my app using OAuth with only javascript->clientside, and js/java->server side?

I want to use OAuth in one of my apps, specifically a Google Chrome extension. Can it be done through JavaScript code? My only requirement is that it should be done with client side Javascript code, and the server can use either JavaScript or Java.
If this cannot be done, then can I use simple userid-password authentication?
Again, my only requirement is that it should be done with client side Javascript code, and the server can use either javascript or java.
You can definitely use OAuth in a Google Chrome extension, although bear in mind that your application keys and secrets will be readable in the bundle.
For more information: http://code.google.com/chrome/extensions/tut_oauth.html (the example uses one of Google's API endpoints, but you could use any OAuth 1.0a provider). Since you are interested in building a Chrome extension, you will not be affected by the normal hassle of request origin (cross site scripting) restrictions.
You can use "normal" userid and password authorization as well of course (especially over SSL/HTTPS). If you plan on going public with the APIs then I would recommend OAuth though.
JavaScript is a purely client-side scripting language. It can't be used on the server side.
Second, if you want your client to be authenticated, there must be a server-side program to do so.

Client HTTP Post to external sites

Is there any web language that allows the client itself to create HTTP POSTs to external sites?
I know that JavaScript does this with XMLHttpRequest, but it does not allow cross-domain posting, unless the recipient domain wants to allow the sending domain.
I want to post data to an external site (that I don't control) and have the request be authenticated with what the client's browser already has (cookies, etc).
Is this possible? I tried cURL but it seems to make a server HTTP post, not a client HTTP post.
Edit:
A bit more insight of what I am trying to do:
I am trying to POST JSON to the website using the user's session (I said cookies but I believe they are PHP sessions, which I guess I still consider cookies).
The website does NOT check the referrer (poor security #1)
I can execute JavaScript and HTML on the webpage using my personal homepage (poor security #2)
The JSON code will still work even if the content-type is form (poor security #3)
There is no security checking at all, just PHP session checking.
The form idea is wonderful and it works. The problem, again, is that it's JSON. Having sent the POST data as foo={"test":"123", "test2":"456"}, the whole foo= part messes it up. Plus, forms seem to turn JSON into form encoding, so it's sending:
foo=%7B%22
test%22%3A+%22
123%22%2C+%22
test2%22%3A+%22
456%22%7D
when I need it to send:
{"test":"123", "test2":"456"}
So with everything known, is there a better chance of sending JSON or not?
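The escaping described in the question is ordinary application/x-www-form-urlencoded encoding. As an illustration only (this is Java's `URLEncoder`, not the browser's own code, and the class name is made up), the same transformation can be reproduced like this:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncodingDemo {

    // Encode a value as a single form field, the way an HTML form
    // submission with the default content type would.
    static String asFormField(String name, String value) {
        try {
            return name + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"test\":\"123\", \"test2\":\"456\"}";
        System.out.println(asFormField("foo", json));
        // prints foo=%7B%22test%22%3A%22123%22%2C+%22test2%22%3A%22456%22%7D
    }
}
```

The braces, quotes, colons and commas are percent-encoded and the space becomes `+`, so a receiver that expects a raw JSON body will not see valid JSON.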
I don't think so: you won't get hold of the user's auth cookies for the third party site from the server side (because of the Same Origin Policy), and you can't make Ajax requests to the third party site.
The best you can do is probably create a <form> (maybe in an <iframe>), point it to the third party site, populate it with data, and have the user submit it (or auto-submit it). You will not be able to get hold of the request results programmatically (again because of the Same Origin Policy), but maybe it'll do: you can still show the request results to the user.
I think for obvious reasons this is not allowed. If it were allowed, what would stop a malicious person from posting form data from a person's browser to any number of sites in some hidden iframe or popup window?
If this is part of your application's design, you need to rethink what you are trying to accomplish.
EDIT: As @Pekka pointed out, you can submit a form to a remote site using a typical form submit. I was referring to using some client-side Ajax solution. Sorry for the confusion.
You should follow the way OpenID and other single sign-on systems work. With OpenID, your website POSTs a token to the OpenID service and gets the authentication result in return. Refer to the How Does it Work? section here
Yes, you can use a special Flash library that supports cross-domain calls: YUI connection manager
Added: not sure about the cookie authentication issue though...
The client cannot post to an external site directly; it's a breach of basic cross-domain security models. The exception is accessing JavaScript with JSONP. What you describe would require access to a user's cookies for another website, which is impossible as the browser only allows cookie access within the same domain/path.
You would need to use a server-side proxy to make cross-domain requests, but you still cannot access external cookies: http://jquery-howto.blogspot.com/2009/04/cross-domain-ajax-querying-with-jquery.html

java socket to read content of webpage

Is it possible to use the Java socket API to read the content of a webpage, e.g. "www.yahoo.com"? Can somebody here show an example?
And how about reading content of a page protected by the web app login screen?
Thanks in advance,
dara kok
It's possible but not advisable. The webpage is returned using HTTP, which is more than just a stream of bytes. This means that in order to use a socket, your application would need to understand the instructions in the HTTP responses and behave accordingly.
To programmatically access a webpage, use Jakarta Commons HTTP Client.
With regards to secure webpages, it will depend on how they are secured, however given HTTP Client can maintain cookies you should be able to perform the login through code too.
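To make the "possible but not advisable" point concrete, here is a minimal sketch of a plain-socket GET. It assumes the server answers plain HTTP on port 80, and it makes no attempt to handle redirects, chunked transfer encoding, compression, cookies or HTTPS, which is exactly the work an HTTP client library does for you.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawSocketGet {

    // Build a minimal HTTP/1.1 GET request. "Connection: close" asks the
    // server to close the socket after the response, so reading until
    // end-of-stream terminates.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Extract the numeric code from a status line such as "HTTP/1.1 200 OK".
    static int statusCode(String statusLine) {
        return Integer.parseInt(statusLine.split(" ")[1]);
    }

    public static void main(String[] args) throws IOException {
        String host = "www.yahoo.com";
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, "/").getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String statusLine = in.readLine();
            if (statusLine == null) {
                throw new IOException("Server closed the connection without a response");
            }
            System.out.println("Status: " + statusCode(statusLine));

            // Everything after the status line: headers, a blank line, then the body.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Many sites will answer such a request with a redirect to HTTPS rather than the page itself, so the output is often just a status line and headers.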
Further to Nick's answer (i.e. use the Jakarta Commons HTTP Client): the login security depends on how the login page is implemented. If it is an Apache .htaccess secured site, you will need to place the username/password information in the request header. Alternatively (and more usually), if it is an HTML form, you will need to extract the form fields from the original HTML and send those as key/value parameters in the HTTP GET/POST request.
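For the HTML form case, the key/value parameters end up as a form-encoded request body. A small sketch of building that body follows; the field names `username` and `password` and their values are hypothetical, and the real names have to be read from the login page's HTML.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class LoginFormBody {

    // Join the fields as name=value pairs separated by '&', with both
    // names and values form-encoded.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "dara");
        fields.put("password", "p@ss word");
        System.out.println(encode(fields)); // prints username=dara&password=p%40ss+word
    }
}
```

The resulting string is what gets written as the POST body with the content type application/x-www-form-urlencoded, or appended after `?` for a GET.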
