I am downloading a Google Cloud Storage object (a GZIP file of ~400 MB containing 25 million rows) with the following code:
Blob blob = storage.get(bucketName, blobName);
ReadChannel readChannel = blob.reader();
The ReadChannel is then gunzipped and wrapped in a BufferedReader as follows:
BufferedReader reader =
new BufferedReader(
new InputStreamReader(
new GZIPInputStream(
Channels.newInputStream(readChannel)),
UTF_8));
Problem:
The BufferedReader consistently stops after exactly 10110000 lines (out of 25 million). I have tried this more than 10 times with the same result.
Extra Info:
At that point the Google ReadChannel still reports endOfStream=false, but the BufferedReader's InputStreamReader reports endOfFile=true.
I am able to read all 25 million rows from the exact same ReadChannel by using the following:
InputStream inputStream = Channels.newInputStream(readChannel);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(500 * 1024 * 1024);
IOUtils.copy(inputStream, outputStream);
byte[] gzipFileBytes = outputStream.toByteArray();
BufferedReader reader =
new BufferedReader(
new InputStreamReader(
new GZIPInputStream(
new ByteArrayInputStream(gzipFileBytes)),
UTF_8));
I would really appreciate your help.
Related
What is the difference between these two stream chaining methods?
First:
BufferedReader br = new BufferedReader(
new InputStreamReader(someUrlConnection.getInputStream(), encoding));
Second:
InputStream raw = someUrlConnection.getInputStream();
InputStream buffer = new BufferedInputStream(raw);
Reader reader = new InputStreamReader(buffer);
I want to download only the first 3 bytes of a file from the web, but I can't manage it.
This approach downloads the whole file:
BufferedReader r = new BufferedReader(new InputStreamReader(imageStream), 3);
It seems that the InputStream always downloads the whole file.
BufferedReader is handy if you are trying to read characters.
For example:
char[] charBuff = new char[n];
new BufferedReader(new InputStreamReader(stream)).read(charBuff, 0, n);
This will read up to n characters (not bytes) from the input stream and store them in the char array.
If you just want to read bytes and store them in a byte array, try using this:
byte[] byteBuff = new byte[n];
new BufferedInputStream(inputStream).read(byteBuff, 0, n);
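One caveat with the snippets above: a single read(buffer, 0, n) call is allowed to return fewer than n bytes, so looping until n bytes have arrived (or the stream ends) is safer. A minimal sketch, assuming a hypothetical helper class FirstBytes with a readFirst method (neither is part of any library), and using an in-memory stream in place of the web connection:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class FirstBytes {

    // Reads up to n bytes from the stream, looping because a single
    // read() call may return fewer bytes than requested.
    static byte[] readFirst(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int read = in.read(buf, total, n - total);
            if (read == -1) {
                break; // stream ended before n bytes were available
            }
            total += read;
        }
        // Trim the array if the stream was shorter than n bytes.
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the network stream from the question.
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        System.out.println(Arrays.toString(readFirst(in, 3)));
    }
}
```

This only limits how much your code consumes. How much data actually crosses the network is decided by the connection and the server, not by this loop.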
connection.setRequestProperty("Range", "bytes="+0+"-"+2);
connection.connect();
BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder total = new StringBuilder();
String line;
line = r.readLine();
Log.i(LOG_TAG, line);
I am trying to read a JSON response using a BufferedReader as shown below. I'm using the Apache Commons HTTP client. The response arrives as single-line JSON of around 1060000 characters, approximately 1 MB in size. The problem I am facing is that only part of the stream is read by the reader and the rest is missing. How can I read the full JSON without losing any data? Is this related to the 'CharBufferSize' of BufferedReader or to the number of characters in the stream?
InputStream stream = method.getResponseBodyAsStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
StringBuilder builder = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
builder.append(line);
}
Try using a JSON parser.
import org.codehaus.jackson.*;
JsonFactory fac = new JsonFactory();
JsonParser parser = fac.createJsonParser(stream);
If you just want to copy the complete stream into the StringBuilder, you should use the InputStreamReader and a char-array buffer.
InputStream stream = method.getResponseBodyAsStream();
InputStreamReader reader = new InputStreamReader(stream, "UTF-8");
StringBuilder builder = new StringBuilder();
char[] buffer = new char[4096];
int read;
while ((read = reader.read(buffer)) != -1) {
builder.append(buffer, 0, read);
}
I was finally able to solve this using IOUtils from the Apache Commons IO library. Here is the code:
BoundedInputStream boundedInputStream= new BoundedInputStream(stream);
BufferedReader reader = new BufferedReader(new InputStreamReader(boundedInputStream,"UTF-8"));
StringBuilder builder= new StringBuilder();
StringBuilderWriter writer = new StringBuilderWriter(builder);
IOUtils.copy(reader, writer);
Although it has been a while, this may be helpful for someone.
Here is the original source:
Most Robust way of reading a file or stream using Java (To prevent DoS attacks)
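The linked title is about guarding against oversized input. As a rough JDK-only sketch of that idea (this is not the Commons IO code above, and the class BoundedRead, the method readAtMost, and the maxChars limit are all hypothetical names chosen for illustration), one could stop reading once a character limit is exceeded:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration only.
final class BoundedRead {

    // Copies the reader into a String, but fails as soon as the input
    // would exceed maxChars characters, so memory use stays bounded.
    static String readAtMost(Reader reader, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            if (sb.length() + n > maxChars) {
                throw new IOException("input exceeds " + maxChars + " characters");
            }
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the HTTP response reader.
        System.out.println(readAtMost(new StringReader("{\"ok\":true}"), 1024));
    }
}
```

The limit would need to be chosen per application; the 1024 used in main is arbitrary.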
What's the difference between using a BufferedReader and a BufferedInputStream?
A BufferedReader is used for reading character data. A BufferedOutputStream is used for writing binary data.
Any classes inheriting from Reader or Writer deal with 16-bit Unicode character data, whereas classes inheriting from InputStream or OutputStream are concerned with processing binary data. The classes InputStreamReader and OutputStreamWriter can be used to bridge between the two classes of data.
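To make the bridging concrete, here is a small sketch that encodes text to bytes with OutputStreamWriter and decodes it back with InputStreamReader. The class name BridgeDemo and its methods are hypothetical, and in-memory streams stand in for files or sockets:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class BridgeDemo {

    // Characters -> bytes: OutputStreamWriter encodes text onto a byte stream.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    // Bytes -> characters: InputStreamReader decodes a byte stream into text.
    static String decode(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode("h\u00e9llo"); // five characters, one of them non-ASCII
        System.out.println(data.length);    // the byte count differs from the character count
        System.out.println(decode(data));
    }
}
```

The point of the sketch is that the number of bytes and the number of characters need not match, which is why the two class families are kept separate.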
BufferedReader reads data from a file as strings. BufferedOutputStream writes to a file in bytes. BufferedInputStream reads data in bytes.
Sample for BufferedReader:
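// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(new File(your_file)))) {
    String thisLine;
    while ((thisLine = br.readLine()) != null) {
        System.out.println(thisLine);
    }
} catch (IOException e) {
    e.printStackTrace();
}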
Sample for BufferedOutputStream:
//Construct the BufferedOutputStream object
bufferedOutput = new BufferedOutputStream(new FileOutputStream(filename));
//Start writing to the output stream
bufferedOutput.write("Line 1".getBytes());
bufferedOutput.write("\r\n".getBytes());
bufferedOutput.write("Line 2".getBytes());
bufferedOutput.write("\r\n".getBytes());
BufferedInputStream reads bytes. Sample:
//Construct the BufferedInputStream object
bufferedInput = new BufferedInputStream(new FileInputStream(filename));
//Buffer that each read() call fills; the size is arbitrary
byte[] buffer = new byte[1024];
int bytesRead = 0;
while ((bytesRead = bufferedInput.read(buffer)) != -1) {
String chunk = new String(buffer, 0, bytesRead);
System.out.print(chunk);
}
As the names imply, one is for reading data, and the other is for outputting data.
I'm trying to read a text file line by line using InputStream from the assets directory in Android.
I want to convert the InputStream to a BufferedReader to be able to use the readLine().
I have the following code:
InputStream is;
is = myContext.getAssets().open ("file.txt");
BufferedReader br = new BufferedReader (is);
The third line produces the following error:
Multiple markers at this line
The constructor BufferedReader(InputStream) is undefined.
What I'm trying to do in C# would be something like:
StreamReader file;
file = File.OpenText ("file.txt");
line = file.ReadLine();
line = file.ReadLine();
...
What am I doing wrong or how should I do that? Thanks!
BufferedReader can't wrap an InputStream directly. It wraps another Reader. In this case you'd want to do something like:
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"));
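A self-contained sketch of this wrapping, reading every line with readLine(). The class LineReaderDemo and method readLines are hypothetical names, and a ByteArrayInputStream stands in for the asset stream, since the Android asset API is not available outside an app:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
final class LineReaderDemo {

    // Wraps the byte stream in an InputStreamReader (bytes -> chars),
    // then in a BufferedReader so that readLine() is available.
    static List<String> readLines(InputStream is) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for myContext.getAssets().open("file.txt").
        InputStream is = new ByteArrayInputStream(
                "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(is)) {
            System.out.println(line);
        }
    }
}
```

Closing the BufferedReader also closes the streams it wraps, which is why a single try-with-resources is enough here.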
A BufferedReader constructor takes a reader as argument, not an InputStream. You should first create a Reader from your stream, like so:
Reader reader = new InputStreamReader(is);
BufferedReader br = new BufferedReader(reader);
Preferably, you also provide a Charset or character encoding name to the InputStreamReader constructor. Since a stream just provides bytes, converting these to text means the encoding must be known. If you don't specify it, the system default is assumed.
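To illustrate why the encoding matters, the sketch below decodes the same two bytes with two different charsets and gets different text. The class CharsetDemo and method decodeLine are hypothetical names used only for this example:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class CharsetDemo {

    // Decodes the first line of the given bytes using an explicit charset.
    static String decodeLine(byte[] data, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // The UTF-8 encoding of the single character U+00E9 (e with acute accent).
        byte[] data = {(byte) 0xC3, (byte) 0xA9};

        String asUtf8 = decodeLine(data, StandardCharsets.UTF_8);
        String asLatin1 = decodeLine(data, StandardCharsets.ISO_8859_1);

        // Same bytes, different text: one character versus two.
        System.out.println(asUtf8.length());
        System.out.println(asLatin1.length());
    }
}
```

Passing the charset explicitly keeps the result the same on every machine, instead of depending on whatever the default happens to be.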
InputStream is = myContext.getAssets().open("file.txt");
InputStreamReader r = new InputStreamReader(is);
BufferedReader br = new BufferedReader(r);