JavaScript truncated over servlet connection - Java

I have written a servlet that should act as a web proxy, but some of the JavaScript GET calls only return part of the original content when I load a page such as localhost:8080/Proxy?requestURL=example.com.
When printing the content of the JavaScript files to the console, they are complete.
But the response that reaches the browser is truncated.
I am writing the response like this:
ServletOutputStream sos = resp.getOutputStream();
OutputStreamWriter writer = new OutputStreamWriter(sos);
..
String str = content_of_get_request
..
writer.write(str);
writer.flush();
writer.close();
The strange thing is that when I directly request the JavaScript that was loaded during the page request, like this:
localhost:8080/Proxy?requestURL=anotherexample.com/needed.js
the whole content is returned to the browser.
It would be great if someone had an idea.
Regards
UPDATE:
The problem was the way I built the response String:
while ((line = rd.readLine()) != null) {
    response.append(line);
}
I read the stream line by line and appended each line to a StringBuffer, but it appears that Firefox and Chrome had a problem with that. Since readLine() strips the line terminators, the whole script arrived as a single enormous line (which by itself can break scripts: a // comment, for example, then comments out everything that follows).
It seems that some browsers impose a maximum line length for JavaScript, although no maximum line length is mentioned in the HTTP/1.1 RFC.
Fix:
Appending a "\n" to each line fixes the issue:
response.append(line + "\n");
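A more robust approach for a proxy is to avoid the line/charset round-trip entirely and copy the raw response bytes through unmodified. A minimal sketch, assuming conn is the URLConnection to the target site and resp is the HttpServletResponse (the names are illustrative):
InputStream in = conn.getInputStream();
OutputStream out = resp.getOutputStream();
byte[] buffer = new byte[8192];
int n;
while ((n = in.read(buffer)) != -1) {
    out.write(buffer, 0, n); // forward the bytes exactly as received
}
out.flush();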

What you are doing is just reading the HTML response; you are not actually fetching the other resources that are referenced in the HTML, like images, JS files, etc.
You can observe this when you monitor how the browser renders the HTML, e.g. through Firebug for Firefox:
1) The browser receives the HTML response.
2) It then parses the response for referenced resources and makes a separate GET call for each of them.
So in order for the proxy to work you need to mimic this browser behavior.
My advice is to use an already available open-source library such as HtmlUnit; see the sketch below.
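For reference, a minimal HtmlUnit sketch (the URL is a placeholder; note that in recent HtmlUnit versions the package is org.htmlunit rather than com.gargoylesoftware.htmlunit):
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

try (WebClient webClient = new WebClient()) {
    // getPage fetches the URL, and HtmlUnit then fetches the referenced
    // resources and executes the page's JavaScript, like a browser would
    HtmlPage page = webClient.getPage("http://example.com");
    String rendered = page.asXml(); // the DOM after scripts have run
    System.out.println(rendered);
}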

Related

Using ServletOutputStream and PrintWriter in the same response [duplicate]

I want to redirect to a page after writing the Excel file. The servlet code is given below:
ByteArrayOutputStream outByteStream = new ByteArrayOutputStream();
workbook.write(outByteStream);
byte [] outArray = outByteStream.toByteArray();
response.setContentType("application/ms-excel");
response.setContentLength(outArray.length);
response.setHeader("Content-Disposition", "attachment; filename=name_"+date+".xlsx");
response.setIntHeader("Refresh", 1);
OutputStream outStream = response.getOutputStream();
outStream.write(outArray);
response.sendRedirect("url/reports.jsp");
This code downloads an Excel file that I have created.
When I call the above servlet, the Excel file is downloaded, but the last line throws the following exception:
Servlet Error: ::java.lang.IllegalStateException: Cannot call sendRedirect() after the response has been committed
Hence I am unable to redirect to a new page. What can I do to access the response object after I write the output to outStream?
The basic problem is that this ...
I want to redirect to a page after writing the Excel file.
... describes two separate responses. The server cannot chain them together by itself because the client will expect only one response to each request. Because two requests are required to elicit two responses, automation of this sequence will require client-side scripting.
Personally, I would probably put the script on the front end: a handler on the appropriate button or link that first downloads the file and then (on success) issues a request for the new page. It would also be possible to do as suggested in comments, however: put script in the new page that downloads the file.
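For what it's worth, the servlet half of that approach is just the code from the question with the sendRedirect (and the Refresh header) removed; a sketch:
// Serve only the file; navigation to reports.jsp is triggered by
// client-side script in a separate request.
ByteArrayOutputStream outByteStream = new ByteArrayOutputStream();
workbook.write(outByteStream);
byte[] outArray = outByteStream.toByteArray();
response.setContentType("application/ms-excel");
response.setContentLength(outArray.length);
response.setHeader("Content-Disposition", "attachment; filename=name_" + date + ".xlsx");
response.getOutputStream().write(outArray);
// no sendRedirect() here - the response is committed once the body is written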
You cannot have a body with a redirect, because a browser receiving a redirect will issue a second request to the URL it finds in the Location header, and it is the response to that second request that is displayed; unless that is also a redirect, in which case it will issue a third request, and so on...

HTTP Server - Serving up favicon.ico

I'm playing around with setting up my own Java HTTP server to better understand HTTP servers and what goes on under the hood of the web. I've developed a pretty simple server and have been able to serve both HTML pages and data in JSON form. Then I saw that the browser (I'm using Chrome, but assuming it's the same for others) was sending a request for favicon.ico. I'm able to identify that request on my server, so I'm trying to serve up a random icon I downloaded and resized to 16x16 pixels in PNG format, as that's what the internet says the size needs to be. Here's my code; note it's not supposed to be anything professional, just something that works for my basic educational purposes:
[set up ServerSocket and listen]
public static String err_header = "HTTP/1.1 500 ERR\nAccess-Control-Allow-Origin: *";
public static String success_header = "HTTP/1.1 200 OK\nAccess-Control-Allow-Origin: *";
public static String end_header = "\r\n\r\n";
while (true) {
    try {
        System.out.println("Listening for new connections");
        clientSocket = server.accept();
        System.out.println("Connection established");
        InputStreamReader isr = new InputStreamReader(clientSocket.getInputStream());
        BufferedReader reader = new BufferedReader(isr);
        String getLine = reader.readLine(); // first line of HTTP request
        handleRequest(getLine, clientSocket);
    } // end of try
    catch (Exception e) {
        [error stuff]
    } // end of catch
} // end of while
HandleRequest method:
public static void handleRequest(String getLine, Socket clientSocket) throws Exception {
    if (getLine.substring(5, 16).equals("favicon.ico")) {
        List<String> iconTag = new ArrayList<String>();
        iconTag.add("\nContent-Type: image/png");
        handleFileRequest("[file]", iconTag, clientSocket);
    } // end of if
    else {
        handleFileRequest("[file]", clientSocket);
    } // end of else
} // end of handleRequest
handleFileRequest for images:
public static void handleFileRequest(String fileName, List<String> headerTags, Socket clientSocket) throws Exception {
    OutputStream out = clientSocket.getOutputStream();
    BufferedReader read = new BufferedReader(new FileReader(fileName));
    out.write(success_header.getBytes("UTF-8"));
    Iterator<String> itr = headerTags.iterator();
    while (itr.hasNext()) {
        out.write(itr.next().getBytes("UTF-8"));
    } // end of while
    out.write(end_header.getBytes("UTF-8"));
    String readLine = "";
    while ((readLine = read.readLine()) != null) {
        out.write(readLine.getBytes("UTF-8"));
    } // end of while
    out.flush();
    out.close();
} // end of handleFileRequest
And it appears to work: the server sends the file and the browser shows the 200 OK response, but there's no favicon. When I filter network requests to just images, the one image requested by the page being served is listed, but the favicon request is not (it shows up in the "other" section). Similarly, clicking the other image shows it in the preview, whereas that's not the case with the favicon request; the other image also shows up in the page just fine.
I also tried including the Content-Length header, but that didn't seem to make a difference. Am I missing something obvious?
Also, just to clarify: I know I can include the favicon in the actual HTML page; the goal isn't just to make it appear, but to understand how it works.
Reading binary files
It seems the content of the favicon is not served correctly.
I suspect this is most likely due to the way you read its content:
while ((readLine = read.readLine()) != null) {
    out.write(readLine.getBytes("UTF-8"));
}
Reading binary content line by line is inappropriate: the concept of lines, and likewise UTF-8 encoding, make no sense in the context of binary files.
You also cannot reconstruct binary content correctly line by line this way, because BufferedReader's readLine method doesn't return the full line: it strips the line terminator from the end.
And you cannot manually add a newline character back, because you cannot know what exactly it was (\n, \r\n, or \r).
Here's a simpler and correct way to read the content of a binary file, using java.nio.file.Files and Paths:
byte[] bytes = Files.readAllBytes(Paths.get("/path/to/file"));
Once you have this, it's easy to produce a correct file header with the content length, using the value of bytes.length.
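For example, the handleFileRequest from the question could be reworked along these lines (a sketch based on the question's code; error handling omitted):
public static void handleFileRequest(String fileName, List<String> headerTags, Socket clientSocket) throws Exception {
    OutputStream out = clientSocket.getOutputStream();
    byte[] bytes = Files.readAllBytes(Paths.get(fileName)); // raw bytes, no lines, no charset
    out.write(success_header.getBytes("UTF-8"));
    for (String tag : headerTags) {
        out.write(tag.getBytes("UTF-8"));
    }
    out.write(("\nContent-Length: " + bytes.length).getBytes("UTF-8"));
    out.write(end_header.getBytes("UTF-8"));
    out.write(bytes); // binary content written unmodified
    out.flush();
    out.close();
}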
What happens when you visit a page in a browser
It may help to clarify a few things.
When you open a URL in a browser,
the browser sends a GET request to the web server to download the content of the original URL that you have specified.
Once it has the page content, it will send further GET requests:
Fetch a favicon if it doesn't have one already. The location of this may be specified in the HTML document, or else the browser will try to fetch SERVERNAME/favicon.ico by default
Fetch the images specified in src attribute of any (valid) <img/> tags in the document
Fetch the style sheets specified in the href attribute of any (valid) <link rel="stylesheet"/> tags in the document
... and similarly for <script/> tags, and so on...
The favicon is purely cosmetic, shown in browser tab titles;
the other resources are essential for rendering a page.
In text-based browsers like lynx they are not essential,
and such browsers will obviously not fetch these resources.
This is the explanation for why the favicon is requested, and how.
How does a web server serve files?
In the most basic case, serving a file has two important components:
Produce an appropriate HTTP header: each line in the header is in name: value format, and each line must end with \r\n (the HTTP spec requires CRLF, though many clients tolerate a bare \n).
There must be at least a Content-type header.
The header must be terminated by a blank line.
After the blank line that terminates the header,
the content can be anything, even binary.
To illustrate with an example,
consider the curl command, which dumps the content of a url to standard output.
If you run curl url-to-some-html-file,
you will see the content of the html file.
If you run curl url-to-some-image-file,
you will see the content of the image file.
It will be unreadable, and your terminal will probably make funny noises.
You can redirect the output to a file with curl url-to-some-image-file > image.png,
and that will give you an image file,
binary content,
that you can open in any image viewer tool.
In short, serving a file is really just writing a header to the connection,
then a blank line to terminate the header,
then the content itself.
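In Java terms, the complete response for a small text file could be as simple as this (a sketch; out is assumed to be the socket's OutputStream, and the body is ASCII so its length in characters equals its length in bytes):
String body = "hello";
String header = "HTTP/1.1 200 OK\r\n"
        + "Content-Type: text/plain\r\n"
        + "Content-Length: " + body.length() + "\r\n"
        + "\r\n"; // the blank line that terminates the header
out.write((header + body).getBytes("UTF-8"));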
Debugging the serving of an image
An easy way to check whether an image is served correctly is to save the response to a file using curl,
and then verify that the saved file and the original file are identical,
for example using the cmp command:
curl -o file url-to-favicon
cmp file /path/to/original
The output of cmp should be empty.
This command only produces output if it finds a difference in the two files.
Implementing a simple HTTP server
Instead of using a raw ServerSocket,
here's a drastically simpler way to implement an HTTP server, using the JDK's built-in com.sun.net.httpserver.HttpServer:
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

HttpServer server = HttpServer.create(new InetSocketAddress(1234), 0);
// Serve the favicon as raw bytes, with an exact Content-Length.
server.createContext("/favicon.ico", t -> {
    byte[] bytes = Files.readAllBytes(Paths.get("/path/to/favicon"));
    t.sendResponseHeaders(200, bytes.length);
    try (OutputStream os = t.getResponseBody()) {
        os.write(bytes);
    }
});
// Serve the index page as text; a length of 0 means "unknown, use chunked encoding".
server.createContext("/", t -> {
    Charset charset = StandardCharsets.UTF_8;
    List<String> lines = Files.readAllLines(Paths.get("/path/to/index"), charset);
    t.sendResponseHeaders(200, 0);
    try (OutputStream os = t.getResponseBody()) {
        for (String line : lines) {
            os.write((line + "\n").getBytes(charset));
        }
    }
});
server.start();

Java BufferedReader doesn't load full content of a webpage

Hello, this is my first question on here and I was wondering if anybody has a solution to my problem: I am trying to get the full content of a webpage after everything has loaded. For example, I have a website that pulls information in after the page has loaded, like a search page that uses Ajax to request data from the server. When I run the code, all I get is the basic shell of the webpage and nothing from the search results.
URL url = new URL("a_url");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
    System.out.println(inputLine);
}
in.close();
I am searching The Pirate Bay for torrents, as I am testing the use of magnet downloads in Java. When I try to collect the magnet links and the names of the torrents, inputLine does not print anything that I have searched for, only what the website consists of before the search results have been added. Any help would be much appreciated, thanks.
With your code, you're requesting the page from the server and printing it to standard output.
Any content pulled in after the page has loaded is requested by some JavaScript, and that JavaScript is interpreted by the web browser. If you want the same result, you need to execute the JavaScript the way the browser does; a headless browser library such as HtmlUnit can do this (jsoup, by contrast, only parses the static HTML and does not run scripts).
Other solution: the JavaScript is accessing the server via an HTTP API. Try to call that API directly from your Java code, without requesting the main page.
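A sketch of that second approach, assuming you have identified (for example in the browser's network tab) the endpoint the page's script calls; the URL here is purely hypothetical:
// Call the data endpoint directly instead of the HTML page.
URL api = new URL("https://example.com/api/search?q=test"); // hypothetical endpoint
HttpURLConnection conn = (HttpURLConnection) api.openConnection();
conn.setRequestProperty("Accept", "application/json");
BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"));
String line;
while ((line = in.readLine()) != null) {
    System.out.println(line); // raw response, ready for a JSON parser
}
in.close();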

How can I download HTML from a website as a browser would (with remembered login data) in Java

Okay, so what I want to do is download HTML from Facebook from Java code.
I know how to do that; the problem comes when I want to download the HTML as I would see it in View page source in my browser when I'm logged in, instead of getting the Facebook login page.
I know that I can use the API, but I just want to check one thing in the HTML, and it seems like too big a thing to include and use a whole API for.
So I was wondering if there is a simple way of doing that (maybe I should request some URL first with my credentials, although I don't think that is the way to do it).
What I want to do is download HTML from Facebook from Java code
You can do that by reading from a URLConnection.
import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL facebook = new URL("http://www.facebook.com/or any dir");
        URLConnection yc = facebook.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
        }
        in.close();
    }
}
You can input any URL and get the source code of the given page.
To view the source code, or to save it to a file, redirect the output:
java URLConnectionReader > facebook.html
The problem comes when I want to download the HTML as it would be if I were logged in (but of course I'm not; it just downloads the login page). And I don't know how to programmatically log in, so that I can download the HTML as it would be after I've logged in.
First a word of caution, if you don't have direct permission to do this, beware, the site in question may preclude this in their terms of service.
To answer the question, there are many, many reasons a site would reject a login. To do this successfully you need to get as close as possible to how a browser would handle the transaction. To do that you need to see what a real browser is doing.
HTTPS is trickier, as many HTTP sniffers can't deal with it, but HttpWatch claims it can. Check out the HTTP transactions and then try to replicate them.
Your url.openConnection() call will actually return an instance of HttpURLConnection; cast to that type and you'll be able to easily set various HTTP headers, such as the User-Agent.
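For example (the header values are illustrative):
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("User-Agent", "Mozilla/5.0");         // mimic a real browser
conn.setRequestProperty("Accept-Language", "en-US,en;q=0.5"); // match what the browser sends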
A final note: you say a cookie may be required. Your code isn't going to deal with cookies as written; for that you'll need to use a cookie manager.
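The JDK's java.net.CookieManager covers simple cases; a minimal sketch:
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Install a process-wide, in-memory cookie store; subsequent
// URLConnections will send and store cookies automatically.
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));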

Guaranteed way to correctly get the contents of www.bing.com/

I have been working on a program that gets the contents of www.bing.com and saves it to a file, but of the two ways I have tried, one using sockets and the other using HtmlUnit, neither shows the contents 100% correctly when I open the file. I know there are other options out there, but I am looking for one that is guaranteed to get the contents of www.bing.com/ correctly. I would therefore appreciate it if someone could point me to a means of accomplishing this.
The differences you see are likely due to the web server providing different content to different browsers based on the user agent string and other request headers.
Try setting the User-Agent header in your socket and HtmlUnit strategies to the one you are comparing against and see if the result is as expected. Moreover, you will likely have to replicate the request headers exactly as they are sent by your target browser.
What is "incorrect" about what is returned? Keep in mind, Bing is probably generating some of the content via JavaScript; your client will need to make additional requests to retrieve the JavaScript files, run the JavaScript, etc.
You can use a URL.openConnection() to create a URLConnection and call URLConnection.getInputStream(). You can read the InputStream contents and write it to a file.
If you need to override the User-Agent because the server is using it to serve different content, you can do so by first setting the http.agent system property to an empty string:
/* Somewhere in your code before you make requests */
System.setProperty("http.agent", "");
or using -Dhttp.agent= on your java command line
and then setting the User-Agent to something useful on the connection before you get the InputStream.
URLConnection conn = ...; // create your URL connection as described above
String userAgent = ...;   // some user-agent string here
conn.setRequestProperty("User-Agent", userAgent);
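Putting it together, a minimal sketch that fetches the page with a browser-like User-Agent and saves the body to a file (the user-agent string is only an example):
System.setProperty("http.agent", ""); // suppress Java's default User-Agent suffix
URL url = new URL("https://www.bing.com/");
URLConnection conn = url.openConnection();
conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // example value
try (InputStream in = conn.getInputStream()) {
    Files.copy(in, Paths.get("bing.html"), StandardCopyOption.REPLACE_EXISTING);
}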
