How do you set javascript as enabled when using DefaultHttpClient? - java

I'm trying to use DefaultHttpClient to log into xbox.com. I realize that you can't be logged in without visiting http://login.live.com, so I was going to submit the form on that page and then use the cookies in any requests to xbox.com.
The problem is that requesting anything from live.com using DefaultHttpClient returns the following message:
Windows Live ID requires JavaScript to sign in. This web browser either does not support JavaScript, or scripts are being blocked.
How do I make DefaultHttpClient tell the server that JavaScript is available? I tried looking in the default options and also adding it as a parameter object, but I can't see what I need to do.

The reason this is happening is that this line of HTML is getting parsed from live:
<noscript><meta http-equiv="Refresh" content="0; URL=http://login.live.com/jsDisabled.srf?mkt=EN-US&lc=1033"/>Windows Live ID requires JavaScript to sign in. This web browser either does not support JavaScript, or scripts are being blocked.<br /><br />To find out whether your browser supports JavaScript, or to allow scripts, see the browser's online help.</noscript>
This is used to redirect you if your client does not have JavaScript enabled (and will therefore act on <noscript> tags).
You could try to use a less intelligent HTTP library which does no parsing of the content, but which instead simply does the transport and leaves the parsing to you.
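To confirm that a fetched page contains this kind of fallback, one option is to scan the raw HTML for a meta refresh inside a <noscript> block. The following is only a minimal sketch: the class name NoscriptCheck and the regular expression are assumptions made for illustration, and a regex is a fragile way to read HTML in general.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoscriptCheck {
    // Matches a meta refresh inside a <noscript> block and captures the target URL.
    // Illustrative pattern only; it expects the attribute order seen in the page above.
    private static final Pattern NOSCRIPT_REFRESH = Pattern.compile(
            "<noscript>.*?<meta\\s+http-equiv=\"Refresh\"\\s+content=\"[^\"]*?URL=([^\"]+)\"",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the noscript redirect target, if the HTML contains one. */
    public static Optional<String> noscriptRedirect(String html) {
        Matcher m = NOSCRIPT_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<noscript><meta http-equiv=\"Refresh\" "
                + "content=\"0; URL=http://login.live.com/jsDisabled.srf?mkt=EN-US&lc=1033\"/>"
                + "Windows Live ID requires JavaScript to sign in.</noscript>";
        System.out.println(noscriptRedirect(html).orElse("no noscript fallback found"));
    }
}
```

If the method returns a URL, the message you are seeing is simply the page's fallback markup in the response body, not something the server chose based on a "JavaScript enabled" setting sent by the client.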

Use Wireshark to trace the communication using both a browser and your program, and look for the differences. It's hard to say what, exactly, live.com/xbox.com are looking for, but there is likely some AJAX-y code used to get the actual content.


Related

JAVA: how to download webpage dynamically created by servlet

I want to download the source of a webpage to a file (*.htm), i.e. the entire content with all HTML markup, from this URL:
http://isap.sejm.gov.pl/DetailsServlet?id=WDU20061831353
which works perfectly fine with the FileUtils.copyURLToFile method.
However, the said URL has also some links, for instance one which I'm very interested in:
http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true
This link works perfectly fine if I open it in a regular browser, but when I try to download it in Java by means of FileUtils, I get only an empty page with the single message "trwa ladowanie danych" (which means "loading data..."); nothing further happens and the target page is never loaded.
Could anyone help me with this? From the URL I can see that the page uses Servlets -- is there a special way to download pages created with servlets?
Regards --
This isn't a servlet issue - that just happens to be the technology used to implement the server, but generally clients don't need to care about that. I strongly suspect it's just that the server is responding with different data depending on the request headers (e.g. User-Agent). I see a very different response when I fetch it with curl compared to when I load it in Chrome, for example.
I suggest you experiment with curl, making a request which looks as close as possible to a request from a browser, and then fiddling until you can find out exactly which headers are involved. You might want to use Wireshark or Fiddler to make it easy to see the exact requests/responses involved.
Of course, even if you can fetch the original HTML correctly, there's still all the Javascript - it would be entirely feasible for the HTML to contain none of the data, but for it to include Javascript which does the actual data fetching. I don't believe that's the case for this particular page, but you may well find it happens for
Try using Selenium WebDriver to load the main page:
HtmlUnitDriver driver = new HtmlUnitDriver(true);
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
driver.get(baseUrl);
and then navigate to the link
driver.findElement(By.name("name of link")).click();
UPDATE: I checked the following: if I turn off the cookies in Firefox and then try to load my page:
http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true
then I get the same incorrect result as in my Java app (i.e. a page with the "loading data" message instead of the proper content).
Now, how can I manage the cookies in java to download this page properly then?
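One possible approach using only the standard library is to install a java.net.CookieManager before opening any connections, so that HttpURLConnection stores cookies from the first response and sends them back on later requests. This is a sketch under assumptions: it assumes the server only needs a cookie set by the details page, the class name CookieFetch is made up for the example, and UTF-8 decoding is a guess that may not match the page's real encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class CookieFetch {
    /** Installs a JVM-wide, in-memory cookie store used by HttpURLConnection. */
    public static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    /** Fetches a URL and returns the body as text (UTF-8 is an assumption). */
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        enableCookies(); // must happen before the first connection is opened
        String base = "http://isap.sejm.gov.pl/";
        // First request: lets the server set whatever cookies it wants.
        fetch(base + "DetailsServlet?id=WDU20061831353");
        // Second request: cookies stored above are sent back automatically.
        String related = fetch(base + "RelatedServlet?id=WDU20061831353&type=9&isNew=true");
        System.out.println(related);
    }
}
```

FileUtils.copyURLToFile opens its connection through the URL class as well, so it would normally pick up the same default cookie handler once it is installed. If the page still shows the loading message, the content is probably filled in by JavaScript and cookies alone will not be enough.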

How to append html to a website response before it reaches the browser in Java?

Recently I used a Mac application called Spotflux. I think it's written in Java (because if you hover over its icon it literally says "java"...).
It's just a VPN app. However, to support itself, it can show you ads... while browsing. You can be browsing on chrome, and the page will load with a banner at the bottom.
Since it is a VPN app, it obviously can control what goes in and out of your machine, so I guess that it simply appends some html to any website response before passing it to your browser.
I'm not interested in making a VPN or anything like that. The real question is: how, using Java, can you intercept the html response from a website and append more html to it before it reaches your browser? Suppose that I want to make an app that literally puts a picture at the bottom of every site you visit.
This is, of course, a hypothetical answer - I don't really know how Spotflux works.
However, I'm guessing that as part of its VPN, it installs a proxy server. Proxy servers intercept HTTP requests and responses, for a variety of reasons - most corporate networks use proxy servers for caching, monitoring internet usage, and blocking access to NSFW content.
As a proxy server can see all HTTP traffic between your browser and the internet, it can modify that HTTP; often, a proxy server will inject an HTTP header, for instance; injecting an additional HTML tag for an image would be relatively easy.
Here's a sample implementation of a proxy server in Java.
There are many ways to do this. Probably the easiest would be to proxy HTTP requests through a web proxy like RabbIT (written in java). Then just extend the proxy to mess with the response being sent back to the browser.
In the case of Rabbit, this can either be done with custom code, or with a special Filter. See their FAQ.
WARNING: this is not as simple as you think. Adding an image to the bottom of every screen will be hard to do, depending on what kind of HTML is returned by the server. Depending on what CSS, javascript, etc that the remote site uses, you can't just put the same HTML markup in and expect it to act the same everywhere.
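To illustrate just the rewriting step, here is a minimal sketch of a function a proxy filter might call on an HTML body before forwarding it. The class name BannerInjector and the banner markup are hypothetical, and this is not how Spotflux or RabbIT are known to do it. It inserts the snippet before the last closing body tag and falls back to appending when there is none.

```java
public class BannerInjector {
    private static final String CLOSING_BODY = "</body>";

    /** Inserts snippet before the last </body> tag (case-insensitive), or appends it. */
    public static String inject(String html, String snippet) {
        // Scan backwards so we find the last closing tag without changing the string.
        for (int i = html.length() - CLOSING_BODY.length(); i >= 0; i--) {
            if (html.regionMatches(true, i, CLOSING_BODY, 0, CLOSING_BODY.length())) {
                return html.substring(0, i) + snippet + html.substring(i);
            }
        }
        return html + snippet; // no closing body tag found
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hi</p></body></html>";
        System.out.println(inject(page, "<img src=\"banner.png\">"));
        // prints <html><body><p>Hi</p><img src="banner.png"></body></html>
    }
}
```

A real proxy would also have to recompute Content-Length, decompress gzip-encoded bodies before editing them, and skip non-HTML responses. HTTPS traffic cannot be rewritten this way unless the proxy terminates TLS.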

Do Applets use Browser for HTTP Requests?

Is there any interaction between applets and their hosting browser when making HTTP requests, or are requests made completely independently of native browser code?
Specifically, do Java applets running in a browser have some implicit way of sharing the browser's session state and cache?
I've read a few posts from non-authoritative sources saying that when an applet makes an HTTP request that it will use the browser's cache, and that it will also have access (somehow) to the browser's cookies.
Tests I've done using URLConnection suggest that this is not the case, and my gut feeling is that it sounds far too convenient to be true. I would assume that nothing in the JVM knows anything about the world outside of that JVM, meaning the only other way this could work would be if the JVM implementation were specific to the browser and its implementation of the URL-related methods delegated to native browser code.
If cookie data is not implicitly shared or available, is best practice to pass a session ID in a param tag to the applet? Are there security concerns with this approach? If the applet doesn't use the browser's cache for requests, how does caching requests in an applet work?
Applets are executed by the Java Plugin, which is a browser plugin. The applet is indeed part of an HTML page loaded by the browser, can communicate with the browser DOM and with JavaScript code in the page, and uses the browser to send requests to its originating server.
See http://docs.oracle.com/javase/tutorial/deployment/applet/appletExecutionEnv.html and http://docs.oracle.com/javase/tutorial/deployment/applet/server.html for more information.
In my testing with Windows 7, Java 1.6.23 and Firefox, Chrome and Internet Explorer, HttpURLConnections from within an applet's JVM do not interact with the browser in any way. They don't use the cache, and don't have cookie headers added.
I think it depends on the Java plugin. My experience is that usually it uses the browser cache for network connections, and usually it transmits the cookies. I have had to empty the browser cache before to get a new file in an applet.
If you look at the Oracle Java 7 Plugin Control Panel, you will see an option in the network parameters to use direct connections for the applets, but the default is to use "browser parameters".
As for the cookies, I have seen in the past some Java plugins that did not transmit the session cookies, in particular on MacOS X (Apple even suggested a workaround). But most developers now assume that they are transmitted, and in practice it usually works.
Applets do not share the session information by default, but you can pass the session ID via an applet parameter at initialization and then send that session ID with each HTTP request.
Applets can interact with the browser to make HTTP requests via JavaScript calls.
If you use any Java HTTP APIs, e.g. URLConnection, Apache HttpClient, or java.net.Socket, these libraries will not interact with the browser. They behave as if they were in a standalone JVM.
Caching depends on the API you use. Apache HttpClient has a cache; URLConnection lets you write your own cache easily enough.
You cannot directly access the existing cache from JavaScript yet, though it is coming: https://developer.mozilla.org/en-US/docs/Web/API/CacheStorage.
A param tag can not change once the page is rendered, e.g. OAuth tokens need refreshing periodically.
You could fetch cookies from the browser via JavaScript and manually add them to a Java initiated HTTP request. This mechanism allows them to be updated.
There is not much added risk sharing a cookie. You would have to remove the HTTPOnly flag on the cookie if there is one.
If you are allowing Java in the browser, your users are letting you do pretty much anything. Java inside the browser does have a sandbox, but it's worryingly easy to break out of. If you can design apps without Java, they will be much more secure for users.
From the point of view of the person writing the Applet, Java is secure and much more flexible than JavaScript in a Browser.

GWT: Check if URL is dead

I'm trying to check if a url (in String form) returns 404 error. However, I can't seem to use java.net.URL, and I read somewhere that java.net is not supported in GWT? If so, how do I check if URL is dead or not in GWT?
Much appreciated.
You are right. In client-side GWT you cannot use java.net.URL. Take a look at Google's JRE Emulation Reference if you are unsure which parts of the Java standard library can be used with GWT.
Theoretically it would be possible to check a URL with an AJAX request (see RequestBuilder). But due to the same origin policy it is likely that the browser prevents such an attempt.
So I think you should implement the check on your application's server side (see the link provided by Roflcoptr in the comments) and call that routine via GWT-RPC.
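The server-side check itself could look roughly like the sketch below, which a GWT-RPC service method would call. The class name UrlChecker, the five-second timeouts, and the choice to treat any status below 400 as "alive" are assumptions made for the example; the GWT-RPC wiring is left out.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class UrlChecker {
    /**
     * Returns true if the URL answers a HEAD request with a status below 400.
     * Malformed URLs, non-HTTP URLs, timeouts and connection errors count as dead.
     */
    public static boolean isAlive(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD"); // headers only, no body download
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status < 400;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String url : args) {
            System.out.println(url + " -> " + (isAlive(url) ? "alive" : "dead"));
        }
    }
}
```

Some servers reject HEAD requests even for pages that exist, so falling back to GET on a 405 response may be worth considering.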

Using HTTP OPTIONS to retrieve information about REST resources

This problem relates to the Restlet framework and Java
When a client wants to discover the resources available on a server - they must send an HTTP request with OPTIONS as the request type. This is fine I guess for non human readable clients - i.e. in code rather than a browser.
The problem I see here is - browsers (human readable) using GET, will NOT be able to quickly discover the resources available to them and find out some extra help documentation etc - because they do not use OPTIONS as a request type.
Is there a way to make a browser send an OPTIONS/GET request so the server can fire back formatted XML to the client (as this is what happens in Restlet - i.e. the server response is to send all information back as XML), and display this in the browser?
Or have I got my thinking all wrong, i.e. the point of OPTIONS is that it is meant to be used inside a client's code and not meant to be read via a browser?
Use the TunnelService (which by default is already enabled) and simply add the method=OPTIONS query parameter to your URL.
(The Restlet FAQ Q19 is a similar question.)
I think OPTIONS is not designed to be 'user-visible'.
How would you dispatch an OPTIONS request from the browser? (Note that the form element only allows GET and POST.)
You could send it using XMLHttpRequest and then get back XML in your JavaScript callback and render it appropriately. But I'm not convinced this is something that your user should really know about!
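For comparison, this is roughly what sending OPTIONS from client code looks like in plain Java. It is only a sketch: the class name OptionsProbe is made up, and it reads the standard Allow response header, which a given Restlet resource may or may not set. The XML body described in the question would have to be read from the response stream separately.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class OptionsProbe {
    /** Sends an OPTIONS request and returns the Allow header, or null if absent. */
    public static String allowedMethods(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("OPTIONS");
        try {
            conn.getResponseCode(); // forces the request to be sent
            return conn.getHeaderField("Allow");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: OptionsProbe <url>");
            return;
        }
        System.out.println("Allow: " + allowedMethods(args[0]));
    }
}
```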
