BufferedReader can't read long line - java

I am reading this file: https://www.reddit.com/r/tech/top.json?limit=100 into a BufferedReader from an HttpURLConnection. I've got it to read some of the file, but it only reads about 1/10th of what it should. Changing the size of the input buffer makes no difference; it prints the same thing, just in smaller chunks:
try{
URL url = new URL(urlString);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder sb = new StringBuilder();
int charsRead;
char[] inputBuffer = new char[500];
while(true) {
charsRead = reader.read(inputBuffer);
if(charsRead < 0) {
break;
}
if(charsRead > 0) {
sb.append(String.copyValueOf(inputBuffer, 0, charsRead));
Log.d(TAG, "Value read " + String.copyValueOf(inputBuffer, 0, charsRead));
}
}
reader.close();
return sb.toString();
} catch(Exception e){
e.printStackTrace();
}
I believe the issue is that the text is all on one line since it's not formatted in json correctly, and BufferedReader can only take a line so long. Is there any way around this?

read() should keep returning data for as long as there is something left to read; it returns -1 only at the end of the stream. The reader tracks its position, so each call continues where the previous one stopped. The array size only limits how much a single call can return; there is no limit on the total amount you can read.
You could try the following:
try (InputStream is = connection.getInputStream();
     ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    int read;
    byte[] buffer = new byte[4096];
    // read() returns -1 at the end of the stream
    while ((read = is.read(buffer)) != -1) {
        baos.write(buffer, 0, read);
    }
    return new String(baos.toByteArray(), StandardCharsets.UTF_8);
} catch (IOException ex) {
    // Do not swallow the error: a failed read would otherwise look like missing data
    ex.printStackTrace();
}
The above method is using purely the bytes from the stream and reading it into the output stream, then creating the string from that.

I suggest using a third-party HTTP client. It could reduce your code to just a few lines, and you don't have to worry about all those little details. The bottom line is that someone has already written the code you are trying to write, and it works and is already well tested. A few suggestions:
Apache Http Client - A well known and popular Http client, but might be a bit bulky and complicated for a simple case like yours.
Ok Http Client - Another well-known Http client
And finally, my favorite (because it is written by me) MgntUtils Open Source library that has Http Client. Maven artifacts can be found here, GitHub that includes the library itself as a jar file, source code, and Javadoc can be found here and JavaDoc is here
Just to demonstrate the simplicity of what you want to do, here is the code using the MgntUtils library. (I tested the code and it works like a charm.)
private static void testHttpClient() {
HttpClient client = new HttpClient();
client.setContentType("application/json; charset=utf-8");
client.setConnectionUrl("https://www.reddit.com/r/tech/top.json?limit=100");
String content = null;
try {
content = client.sendHttpRequest(HttpMethod.GET);
} catch (IOException e) {
content = client.getLastResponseMessage() + TextUtils.getStacktrace(e, false);
}
System.out.println(content);
}

My wild guess is that your default platform charset was UTF-8 and encoding problems were raised. For remote content the encoding should be specified, and not assumed to be equal to the default encoding on your machine.
The charset of the response data must be correct, and for that the headers must be inspected. The default should be Latin-1 (ISO-8859-1), but browsers interpret that as Windows Latin-1 (Cp-1252).
String charset = connection.getContentType().replaceFirst("^.*?(charset=|$)", "");
if (charset.isEmpty()) {
charset = "Windows-1252"; // Windows Latin-1
}
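A slightly more defensive way to pick the charset out of the header could look like the sketch below. The class name ContentTypeCharset, the method charsetOf and the regular expression are my own illustration, not an existing API; the fallback is passed in by the caller, so you can use Windows-1252 as above or anything else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class ContentTypeCharset {
    // Matches charset=NAME or charset="NAME", ignoring case.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type value, or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback; // header absent
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback; // no charset parameter
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            return fallback; // malformed or unsupported charset name
        }
    }

    public static void main(String[] args) {
        System.out.println(
                charsetOf("application/json; charset=UTF-8", StandardCharsets.ISO_8859_1));
    }
}
```

With HttpURLConnection you would call it as charsetOf(connection.getContentType(), fallback); getContentType() can return null, which is why the null check is there.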
It is then better to read bytes, as there is no exact correspondence between the number of bytes read and the number of chars read. If a buffer ends with the first char of a surrogate pair (the two UTF-16 chars that together form a code point above U+FFFF), I do not know how efficient the underlying "repair" is.
BufferedInputStream in = new BufferedInputStream(connection.getInputStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[512];
while (true) {
int bytesRead = in.read(buffer);
if (bytesRead < 0) {
break;
}
if (bytesRead > 0) {
out.write(buffer, 0, bytesRead);
}
}
return out.toString(charset);
And indeed it is safe to do:
sb.append(inputBuffer, 0, charsRead);
(Taking a copy was probably a repair attempt.)
By the way char[500] takes almost twice the memory of byte[512].
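To illustrate why I prefer collecting all bytes first and decoding once: if you ever turn each chunk of bytes into a String on its own, a multi-byte character that straddles two chunks gets damaged. (An InputStreamReader keeps decoder state between reads, so it does not have this problem; hand-written per-chunk decoding does.) The class below is only a sketch with made-up names, and it simulates reads of a fixed chunk size on an in-memory array instead of a real connection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the chunking simulates successive read() calls.
class ChunkDecodingDemo {
    // Decodes every chunk separately, as if each read() result became a String at once.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes the complete byte array in one go.
    static String decodeOnce(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // the second character takes two bytes in UTF-8
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeOnce(data).equals(text));        // prints true
        System.out.println(decodePerChunk(data, 2).equals(text)); // prints false
    }
}
```

In the second call the two bytes of the accented character land in different chunks, and each half is replaced by U+FFFD.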
I saw that the site uses gzip compression in my browser. That makes sense for text such as json. I mimicked it by setting a request header Accept-Encoding: gzip.
URL url = new URL("https://www.reddit.com/r/tech/top.json?limit=100");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("Accept-Encoding", "gzip");
try (InputStream rawIn = connection.getInputStream()) {
String charset = connection.getContentType().replaceFirst("^.*?(charset=|$)", "");
if (charset.isEmpty()) {
charset = "Windows-1252"; // Windows Latin-1
}
boolean gzipped = "gzip".equals(connection.getContentEncoding());
System.out.println("gzip=" + gzipped);
try (InputStream in = gzipped ? new GZIPInputStream(rawIn)
: new BufferedInputStream(rawIn)) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[512];
while (true) {
int bytesRead = in.read(buffer);
if (bytesRead < 0) {
break;
}
if (bytesRead > 0) {
out.write(buffer, 0, bytesRead);
}
}
return out.toString(charset);
}
}
It might be that, for "browsers" that do not announce gzip support, the response erroneously carried the content length of the compressed content, which would be a bug.

I believe the issue is that the text is all on one line since it's not formatted in json correctly, and BufferedReader can only take a line so long.
This explanation is not correct:
You are not reading a line at a time, and BufferedReader is not treating the text as line based.
Even when you do read from a BufferedReader a line at a time (i.e. using readLine()) the only limits on the length of a line are the inherent limits of a Java String length (2^31 - 1 characters), and the size of your heap.
Also, note that "correct" JSON formatting is subjective. The JSON specification says nothing about formatting. It is common for JSON emitters to not waste CPU cycles and network bandwidth on formatting for JSON that a human will only rarely read. Application code that consumes JSON needs to be able to cope with this.
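If you want to convince yourself that the reader's internal buffer size does not cap the line length, a quick experiment along these lines should do. It is only a sketch: the class and method names are made up, and it reads from an in-memory StringReader rather than from the network.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class for illustration.
class LongLineDemo {
    // Reads the first line of the text through a BufferedReader with the given buffer size.
    static String firstLine(String text, int bufferSize) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text), bufferSize)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x');
        }
        // A 16-char buffer still yields the complete one-million-char line.
        System.out.println(firstLine(sb.toString(), 16).length()); // prints 1000000
    }
}
```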
So what is actually going on?
Unclear, but here are some possibilities:
A StringBuilder also has an inherent limit of 2^31 - 1 characters. However, with (at least) some implementations, if you attempt to grow a StringBuilder beyond that limit, it will throw an OutOfMemoryError. (This behavior doesn't appear to be documented, but it is clear from reading the source code in Java 8.)
Maybe you are reading the data too slowly (e.g. because your network connection is too slow) and the server is timing out the connection.
Maybe the server has a limit on the amount of data that it is willing to send in a response.
Since you haven't mentioned any exceptions and you always seem to get the same amount of data, I suspect the 3rd explanation is the correct one.

Related

Java Socket HTTP GET request

I'm trying to create a simple Java program that creates an HTTP request to an HTTP server hosted locally, by using Socket.
This is my code:
try
{
//Create Connection
Socket s = new Socket("localhost",80);
System.out.println("[CONNECTED]");
DataOutputStream out = new DataOutputStream(s.getOutputStream());
DataInputStream in = new DataInputStream(s.getInputStream());
String header = "GET / HTTP/1.1\n"
+"Host:localhost\n\n";
byte[] byteHeader = header.getBytes();
out.write(byteHeader,0,header.length());
String res = "";
/////////////READ PROCESS/////////////
byte[] buf = new byte[in.available()];
in.readFully(buf);
System.out.println("\t[READ PROCESS]");
System.out.println("\t\tbuff length->"+buf.length);
for(byte b : buf)
{
res += (char) b;
}
System.out.println("\t[/READ PROCESS]");
/////////////END READ PROCESS/////////////
System.out.println("[RES]");
System.out.println(res);
System.out.println("[CONN CLOSE]");
in.close();
out.close();
s.close();
}catch(Exception e)
{
e.printStackTrace();
}
But when I run it, the server responds with a '400 Bad Request' error.
What is the problem? Maybe I need to add some HTTP headers, but I don't know which ones.
There are a couple of issues with your request:
String header = "GET / HTTP/1.1\n"
+ "Host:localhost\n\n";
The line break to be used must be Carriage-Return/Newline, i.e. you should change that to
String header = "GET / HTTP/1.1\r\n"
+ "Host:localhost\r\n\r\n";
Next problem comes when you write the data to the OutputStream:
byte[] byteHeader = header.getBytes();
out.write(byteHeader,0,header.length());
Calling getBytes() without specifying a charset uses the system's default charset, which might differ from the one needed here; better use getBytes("8859_1") (an alias for ISO-8859-1). When writing to the stream, you use header.length(), which can differ from the length of the resulting byte array if the charset converts one character into multiple bytes (e.g. with UTF-8). Better use byteHeader.length.
out.write(byteHeader,0,header.length());
String res = "";
/////////////READ PROCESS/////////////
byte[] buf = new byte[in.available()];
After sending the header data you should call flush() on the OutputStream to make sure that no internal buffer in the streams being used prevents the data from actually being sent to the server.
in.available() only returns the number of bytes you can read from the InputStream without blocking. It's not the length of the data being returned from the server. As a simple solution for starters, you can add Connection: close\r\n to your header data and simply read the data you're receiving from the server until it closes the connection:
StringBuffer sb = new StringBuffer();
byte[] buf = new byte[4096];
int read;
while ((read = in.read(buf)) != -1) {
sb.append(new String(buf, 0, read, "8859_1"));
}
String res = sb.toString();
Oh, and independent from the topic of doing an HTTP request on your own:
String res = "";
for(byte b : buf)
{
res += (char) b;
}
This is a performance and memory nightmare because Java strings are immutable: every += allocates a new String and copies everything accumulated so far, so the total work grows quadratically with the length of the response. For a response of 100 KB, that adds up to about 5 billion characters copied, i.e. on the order of 5 GB or more of memory allocated along the way, leading to a lot of garbage collection runs in the process.
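As a rough sketch of the alternative (class and method names are mine, just for illustration): decode the whole buffer in one call. Besides avoiding the repeated copying, this also sidesteps a subtle bug in the cast, because (char) b sign-extends bytes of 0x80 and above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
class BytesToText {
    // The approach from the question: one cast and one new String per byte.
    static String castEachByte(byte[] buf) {
        String res = "";
        for (byte b : buf) {
            res += (char) b; // negative bytes turn into chars in the 0xFF80..0xFFFF range
        }
        return res;
    }

    // Decodes the first 'length' bytes in one step; ISO-8859-1 maps each byte to one char.
    static String decodeLatin1(byte[] buf, int length) {
        return new String(buf, 0, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = {0x48, (byte) 0xE9}; // 'H' followed by e-acute in ISO-8859-1
        System.out.println(decodeLatin1(buf, buf.length).equals("H\u00e9")); // prints true
        System.out.println(castEachByte(buf).equals("H\u00e9"));             // prints false
    }
}
```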
Oh, and about the response of the server: this most likely comes from the invalid line breaks. The server will regard the whole header, including the empty line, as a single line and complain about the wrong format of the GET request due to additional data after HTTP/1.1.
According to HTTP 1.1:
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body [...].
So, every line of your request needs to end with \r\n.
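Putting the CR LF rule together, building the request bytes could look roughly like this. The class RawHttpRequest and its buildGet method are hypothetical names for this sketch; the Connection: close header is included on the assumption that you want the server to close the connection after the response so you can simply read until end of stream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class RawHttpRequest {
    private static final String CRLF = "\r\n";

    // Builds a minimal HTTP/1.1 GET request; every line ends with CR LF,
    // and an empty line terminates the header section.
    static byte[] buildGet(String host, String path) {
        String request = "GET " + path + " HTTP/1.1" + CRLF
                + "Host: " + host + CRLF
                + "Connection: close" + CRLF
                + CRLF;
        // Explicit charset, so the result does not depend on the platform default
        return request.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] request = buildGet("localhost", "/");
        System.out.println(request.length + " bytes to send");
    }
}
```

You would then write the whole array with out.write(request) followed by out.flush(), which also avoids the length mismatch between the String and the byte array.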

Writing an HTTP proxy in Java using only the Socket class

I'm trying to write an HTTP proxy in Java using only the Socket class. I had attempted to construct one earlier, and I was successfully sending a request by writing to the socket's output stream, but I am having a hard time reading the response. The research I have conducted suggests that I should use the input stream and read it line by line, but I have not been able to read any web pages successfully using this method. Would anyone have any suggestions as to where I could go from here?
My code actually uses a byte buffer to read from the input stream in order to read the page in bytes:
InputStream input = clientSocket.getInputStream();
byte[] buffer = new byte[48*1024];
byte[] redData;
StringBuilder clientData = new StringBuilder();
String redDataText;
int red;
while((red = input.read(buffer)) > -1) {
redData = new byte[red];
System.arraycopy(buffer, 0, redData, 0, red);
redDataText = new String(redData, "UTF-8");
System.out.println("Got message!! " + redDataText);
clientData.append(redDataText);
}
If you are asking for a way to read an InputStream by lines, this one may serve you:
BufferedReader bufferedReader=new BufferedReader(new InputStreamReader(input, "UTF-8"));
String line;
StringBuilder clientData=new StringBuilder();
while ((line=bufferedReader.readLine()) != null)
{
clientData.append(line);
}
You have to be careful not to read an InputStream in this fashion unless you are a priori sure that it contains just plain text (and not binary data).
BTW: For the sake of efficiency, I recommend pre-sizing clientData with an initial capacity close to the expected final size (otherwise it starts from the default capacity of 16 and has to be resized several times).

Trouble reading bytes from webpage response (amf)

I'm trying to write a program that can read different types of encoding from webpage responses. Right now I'm trying to figure out how to successfully read AMF data's response. Sending it is no problem, and with my HttpWrapper, it gets the response string just fine, but many of the characters get lost in translation. For that purpose, I'm trying to receive the response as bytes, to then convert into readable text.
The big thing I'm getting is that characters get lost in translation, literally. I use a program called Charles 3.8.3 to help me get an idea of what I should be seeing in the response, both hex-wise and AMF-wise. It's generally fine when it comes to normal characters, but whenever it sees a non-Unicode character, I always get "ef bf bd". My code for reading the HTTP response is as follows:
BufferedReader d = new BufferedReader(new InputStreamReader(new DataInputStream(conn.getInputStream())));
while (d.read() != -1) {
String bytes = new String(d.readLine().getBytes(), "UTF-8");
result += bytes;
}
I then try to convert it to hex, as follows:
for (int x = 0; x < result.length(); x++) {
byte b = (byte) result.charAt(x);
System.out.print(String.format("%02x", b & 0xFF));
}
My output is: 0000000001000b2f312f6f6e526573756c7400046e756c6c00000**bf**
Whereas Charles 3.8.3 is: 0000000001000b2f312f6f6e526573756c7400046e756c6c00000**0b**
I'm at my wits end on how to resolve this, so any help would be greatly appreciated!
Thank you for your time
It looks like you're using readLine() because you're used to working with text. Wikipedia says AMF is a binary encoding, so you should be able to do something like this, rather than going through an encode/decode noop (you'd need to use ISO-8859-1, not UTF-8 for that to work) with a string.
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[2048];
try (InputStream in = conn.getInputStream()) {
int read;
while ((read = in.read(buffer)) >= 0) {
out.write(buffer, 0, read);
}
}
out.toByteArray();
// Convert to hex if you want.
Your code assumes that every stream uses UTF-8 encoding. This is simply incorrect. You will need to inspect the content-type response header field.
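The "ef bf bd" you keep seeing is the UTF-8 encoding of U+FFFD, the replacement character that the decoder substitutes for bytes that are not valid UTF-8. A small sketch shows the effect; the class name and the sample bytes are made up for illustration and are not taken from your AMF response.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
class BinaryRoundTrip {
    // Formats each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the bytes to a String with the given charset, then encodes them again.
    static byte[] roundTrip(byte[] bytes, Charset charset) {
        return new String(bytes, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x0b, (byte) 0x8b}; // 0x8b alone is not valid UTF-8
        System.out.println(toHex(roundTrip(data, StandardCharsets.UTF_8)));      // prints 000befbfbd
        System.out.println(toHex(roundTrip(data, StandardCharsets.ISO_8859_1))); // prints 000b8b
    }
}
```

ISO-8859-1 survives the round trip because it maps every byte value to exactly one char, but for binary data it is simpler not to go through a String at all.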

Resumable upload from Java client to Grails web application?

After almost 2 workdays of Googling and trying several different possibilities I found throughout the web, I'm asking this question here, hoping that I might finally get an answer.
First of all, here's what I want to do:
I'm developing a client and a server application with the purpose of exchanging a lot of large files between multiple clients on a single server. The client is developed in pure Java (JDK 1.6), while the web application is done in Grails (2.0.0).
As the purpose of the client is to allow users to exchange a lot of large files (usually about 2GB each), I have to implement it in a way, so that the uploads are resumable, i.e. the users are able to stop and resume uploads at any time.
Here's what I did so far:
I actually managed to do what I wanted to do and stream large files to the server while still being able to pause and resume uploads using raw sockets. I would send a regular request to the server (using Apache's HttpClient library) to get the server to send me a port that was free for me to use, then open a ServerSocket on the server and connect to that particular socket from the client.
Here's the problem with that:
Actually, there are at least two problems with that:
I open those ports myself, so I have to manage open and used ports myself. This is quite error-prone.
I actually circumvent Grails' ability to manage a huge amount of (concurrent) connections.
Finally, here's what I'm supposed to do now and the problem:
As the problems I mentioned above are unacceptable, I am now supposed to use Java's URLConnection/HttpURLConnection classes, while still sticking to Grails.
Connecting to the server and sending simple requests is no problem at all, everything worked fine. The problems started when I tried to use the streams (the connection's OutputStream in the client and the request's InputStream in the server). Opening the client's OutputStream and writing data to it is as easy as it gets. But reading from the request's InputStream seems impossible to me, as that stream is always empty, as it seems.
Example Code
Here's an example of the server side (Groovy controller):
def test() {
InputStream inStream = request.inputStream
if(inStream != null) {
int read = 0;
byte[] buffer = new byte[4096];
long total = 0;
println "Start reading"
while((read = inStream.read(buffer)) != -1) {
println "Read " + read + " bytes from input stream buffer" //<-- this is NEVER called
}
println "Reading finished"
println "Read a total of " + total + " bytes" // <-- 'total' will always be 0 (zero)
} else {
println "Input Stream is null" // <-- This is NEVER called
}
}
This is what I did on the client side (Java class):
public void connect() {
final URL url = new URL("myserveraddress");
final byte[] message = "someMessage".getBytes(); // Any byte[] - will be a file one day
HttpURLConnection connection = url.openConnection();
connection.setRequestMethod("GET"); // other methods - same result
// Write message
DataOutputStream out = new DataOutputStream(connection.getOutputStream());
out.writeBytes(message);
out.flush();
out.close();
// Actually connect
connection.connect(); // is this placed correctly?
// Get response
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = null;
while((line = in.readLine()) != null) {
System.out.println(line); // Prints the whole server response as expected
}
in.close();
}
As I mentioned, the problem is that request.inputStream always yields an empty InputStream, so I am never able to read anything from it (of course). But as that is exactly what I'm trying to do (so I can stream the file to be uploaded to the server, read from the InputStream and save it to a file), this is rather disappointing.
I tried different HTTP methods, different data payloads, and also rearranged the code over and over again, but did not seem to be able to solve the problem.
What I hope to find
I hope to find a solution to my problem, of course. Anything is highly appreciated: hints, code snippets, library suggestions and so on. Maybe I'm even having it all wrong and need to go in a totally different direction.
So, how can I implement resumable file uploads for rather large (binary) files from a Java client to a Grails web application without manually opening ports on the server side?
The HTTP GET method has special headers for range retrieval: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35 Most downloaders use them for resumable downloads from a server.
As I understand it, there is no standard practice for using these headers in POST/PUT requests, but that's up to you, right? You can write a fairly standard Grails controller that accepts a standard HTTP upload with a header like Range: bytes=500-999. The controller should then write these 500 uploaded bytes into the file, starting at position 500.
In this case you don't need to open any sockets or invent your own protocol.
P.S. 500 bytes is just an example; you would probably use much bigger parts.
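The two building blocks of such a scheme might be sketched as follows. Everything here is an assumption about a protocol you would design yourself: the names ResumableChunks, contentRange and writeAt are hypothetical, and the header value uses the Content-Range syntax ("bytes start-end/total", end inclusive) rather than the Range request syntax shown above, since it describes the body being sent.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helpers for a self-designed chunked upload; not an existing API.
class ResumableChunks {
    // Client side: formats a value such as "bytes 500-999/2000" for the chunk being sent.
    static String contentRange(long start, int length, long total) {
        return "bytes " + start + "-" + (start + length - 1) + "/" + total;
    }

    // Server side: writes one received chunk into the target file at the given offset.
    static void writeAt(File target, long offset, byte[] chunk) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(target, "rw")) {
            file.seek(offset);
            file.write(chunk);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentRange(500, 500, 2000)); // prints bytes 500-999/2000
    }
}
```

To resume, the client would first ask the server how many bytes it already has (for example the current file length) and continue from that offset.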
Client Side Java Programming:
public class NonFormFileUploader {
static final String UPLOAD_URL= "http://localhost:8080/v2/mobileApp/fileUploadForEOL";
static final int BUFFER_SIZE = 4096;
public static void main(String[] args) throws IOException {
// file path is hard-coded here; it could be taken from the program's first argument instead
String filePath = "G:/study/GettingStartedwithGrailsFinalInfoQ.pdf";
File uploadFile = new File(filePath);
System.out.println("File to upload: " + filePath);
// creates a HTTP connection
URL url = new URL(UPLOAD_URL);
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
httpConn.setDoOutput(true);
httpConn.setRequestMethod("POST");
// sets file name as a HTTP header
httpConn.setRequestProperty("fileName", uploadFile.getName());
// opens output stream of the HTTP connection for writing data
OutputStream outputStream = httpConn.getOutputStream();
// Opens input stream of the file for reading data
FileInputStream inputStream = new FileInputStream(uploadFile);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = -1;
while ((bytesRead = inputStream.read(buffer)) != -1) {
System.out.println("bytesRead:"+bytesRead);
outputStream.write(buffer, 0, bytesRead);
outputStream.flush();
}
System.out.println("Data was written.");
outputStream.flush();
outputStream.close();
inputStream.close();
int responseCode = httpConn.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
// reads server's response
BufferedReader reader = new BufferedReader(new InputStreamReader(
httpConn.getInputStream()));
String response = reader.readLine();
System.out.println("Server's response: " + response);
} else {
System.out.println("Server returned non-OK code: " + responseCode);
}
}
}
Server Side Grails Programme:
Inside the controller:
def fileUploadForEOL(){
def result
try{
result = mobileAppService.fileUploadForEOL(request);
}catch(Exception e){
log.error "Exception in fileUploadForEOL service",e
}
render result as JSON
}
Inside the Service Class:
def fileUploadForEOL(request){
def status = false;
int code = 500
def map = [:]
try{
String fileName = request.getHeader("fileName");
File saveFile = new File(SAVE_DIR + fileName);
System.out.println("===== Begin headers =====");
Enumeration<String> names = request.getHeaderNames();
while (names.hasMoreElements()) {
String headerName = names.nextElement();
System.out.println(headerName + " = " + request.getHeader(headerName));
}
System.out.println("===== End headers =====\n");
// opens input stream of the request for reading data
InputStream inputStream = request.getInputStream();
// opens an output stream for writing file
FileOutputStream outputStream = new FileOutputStream(saveFile);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead;
long count = 0;
// count only real data; read() returns -1 at the end of the stream
while ((bytesRead = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, bytesRead);
count += bytesRead;
}
println "count:"+count
System.out.println("Data received.");
outputStream.close();
inputStream.close();
System.out.println("File written to: " + saveFile.getAbsolutePath());
code = 200
}catch(Exception e){
mLogger.log(java.util.logging.Level.SEVERE,"Exception in fileUploadForEOL",e);
}finally{
map <<["code":code]
}
return map
}
I have tried the above code and it works for me, but only for files of about 3 to 4 MB. For small files, some bytes are missing or nothing arrives at all, even though the Content-Length request header is present; I'm not sure why this happens.

Transmission of files through Socket or HTTP, between Android devices and desktops

I have custom socket client/server code for transmitting data (files or text). When I transfer binary files, some bytes get converted into out-of-range characters, so I send them as a hex string instead. That works, but it does not solve my other problem, and it has performance problems as well.
I took help from Java code To convert byte to Hexadecimal.
When I download images from the net, the same thing happens: some bytes change into something else. I have compared the files byte by byte.
Converting to a String shows ? instead of the symbol. I have tried readers and a byte array input stream, and all the examples I could find on the net. What mistake could I be making?
My Code to save bytes to file:
void saveFile(String strFileName){
try{
URL url = new URL(strImageRoot + strFileName);
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
BufferedWriter bw = new BufferedWriter(new FileWriter(strImageDownloadPath + strFileName));
String line = null;
while ( (line = reader.readLine()) != null) {
bw.write(line);
}
}catch(FileNotFoundException fnfe){
System.out.println("FileNotFoundException occured!!!");
}catch(IOException ioe){
}catch(Exception e){
System.out.println("Exception occured : " + e);
}finally{
System.out.println("Image downloaded!!!");
}
}
I had a similar issue when I was building a socket client/server application. The bytes would come out as weird characters, and I tried all sorts of things to compare them. Then I came across a discussion where someone pointed out that I should use a DataInputStream and a DataOutputStream and let them do the conversion to and from bytes. That worked for me completely; I never touched the bytes at all.
use this code
File root = android.os.Environment.getExternalStorageDirectory();
File dir = new File (root.getAbsolutePath() + "/image");
if(dir.exists()==false) {
dir.mkdirs();
}
URL url = new URL("http://4.bp.blogspot.com/-zqJs1fVcfeY/TiZM7e-pFqI/AAAAAAAABjo/aKTtTDTCgKU/s1600/Final-Fantasy-X-Night-Sky-881.jpg");
//URL url = new URL(DownloadUrl);
//you can write here any link
File file = new File(dir,"Final-Fantasy-X-Night-Sky-881.jpg");
long startTime = System.currentTimeMillis();
//Open a connection to that URL.
URLConnection ucon = url.openConnection();
//* Define InputStreams to read from the URLConnection.
InputStream is = ucon.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is);
//* Read bytes to the Buffer until there is nothing more to read(-1).
ByteArrayBuffer baf = new ByteArrayBuffer(6000);
int current = 0;
while ((current = bis.read()) != -1) {
baf.append((byte) current);
}
//Write the bytes read to the file.
FileOutputStream fos = new FileOutputStream(file);
fos.write(baf.toByteArray());
fos.flush();
fos.close();
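If you would rather not hold the whole image in memory or depend on ByteArrayBuffer, a plain stream-to-stream copy should work as well. This is a minimal sketch with a made-up class name; it never goes through a Reader, Writer or String, so no byte can be altered by character decoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration.
class BinaryCopy {
    // Copies all bytes from in to out unchanged and returns how many were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 signals end of stream
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

In the download case you would pass ucon.getInputStream() and a FileOutputStream for the target file, ideally both opened in a try-with-resources statement so they get closed even when an exception occurs.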
You should take a look at this link: How to encode decode in base64 in Android.
You can send the byte array obtained from a file as a string by encoding it in Base64. Compared with a hex string, this also reduces the amount of data transmitted (about 4 characters per 3 bytes instead of 2 characters per byte).
At the receiving end, just decode the string using Base64 to obtain the byte array.
Then you can use @Deepak Swami's solution to save the bytes to a file.
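A minimal sketch of the encode and decode steps, assuming java.util.Base64 is available (Java 8 and later; on Android it requires API level 26, and older versions would use android.util.Base64 instead). The class and method names are only for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper for illustration.
class Base64Transfer {
    // Sender side: turns raw bytes into a text-safe string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Receiver side: restores the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = new byte[300];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i; // covers every possible byte value
        }
        String encoded = encode(data);
        System.out.println(encoded.length());                     // prints 400
        System.out.println(Arrays.equals(data, decode(encoded))); // prints true
    }
}
```

Base64 needs four characters for every three bytes, whereas a hex string needs two characters per byte, so 300 bytes become 400 characters here instead of 600.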
I recently found out that PHP service APIs have no notion of a byte array. Any string can be a byte stream at the same time, so the APIs expect a Base64 string in the request parameter. Please see the posts:
String to byte array in php
Passing base64 encoded strings in URL
Hence Base64 is quite important: it also allows you to save byte arrays in preferences, and it can help when you have to send file data across the network using serialization.
Happy Coding :-)
