I am trying to convert a large stream (4mb) to a string which i eventually convert it to a JSON Array.
when the stream size is small ( in KB ) every thing works fine, the minute it starts to process the 4mb stream it runs out of memory
below is what i use use to convert the stream to string, I've tried almost every thing and i suspect the issue is with the while loop. can some one please help?
public String convertStreamToString(InputStream is)
throws IOException {
if (is != null) {
Writer writer = new StringWriter();
char[] buffer = new char[1024];
try
{
Reader reader = new BufferedReader(
new InputStreamReader(is, "UTF-8"));
int n;
while ((n = reader.read(buffer)) != -1)
{
writer.write(buffer, 0, n);
}
}
finally
{
is.close();
}
return writer.toString();
} else {
return "";
}
}
Update:
ok this is where i reached at the moment, am i on the right track?
I think i am close.. not sure what else i can close or flush to regain memory..
public String convertStreamToString(InputStream is)
throws IOException {
String encoding = "UTF-8";
int maxlines = 2000;
StringWriter sWriter = new StringWriter(7168);
BufferedWriter writer = new BufferedWriter(sWriter);
BufferedReader reader = null;
if (is == null) {
return "";
} else {
try {
int count = 0;
reader = new BufferedReader(new InputStreamReader(is, encoding));
for (String line; (line = reader.readLine()) != null;) {
if (count++ % maxlines == 0) {
sWriter.close();
// not sure what else to close or flush here to regain memory
//Log.v("Max Lines Reached", "Max Lines Reached");;
}
writer.write(line);
}
Log.v("Finished Loop", "Looping over");
} finally {
is.close();
writer.close();
}
return writer.toString();
}
}
StringWriter writes to a StringBuffer internally. A StringBuffer is basically a wrapper round a char array. That array has a certain capacity. When that capacity is insufficient, StringBuffer will allocate a new larger char array and copy the contents of the previous one. At the end you call toString() on the StringWriter, which will again copy the contents of the char array into the char array of the resulting String.
If you have any means of knowing beforehand what the needed capacity is, you should use StringWriter's contructor that sets the initial capacity. That would avoid needlessly copying arrays to increase the buffer.
Yet that doesn't avoid the final copy that happens in toString(). If you're dealing with streams that can be large, you may need to reconsider whether you really need that inputstream as a String. Using a sufficiently large char array directly would avoid all the copying around, and would greatly reduce memory usage.
The ultimate solution would be to do some of the processing of the input, before all of the input has come in, so the characters that have been processed can be discarded. This way you'll only need to hold as much in memory as what is needed for a processing step.
Related
I seem to be hitting a constant unexpected end of my file. My file contains first a couple of strings, then byte data.
The file contains a few separated strings, which my code reads correctly.
However when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with me using the Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);
String line;
{
line = br.readLine();
String split[] = line.split(" ");
if (!split[0].equals("#binvox")) {
ErrorHandler.log("Not a binvox file");
return false;
}
ErrorHandler.log("Binvox version: " + split[1]);
}
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as followed:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
For reading the whole String you should do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine();) != null) {
lines.add(line);
}
and then you may do a cycle to split each line, or just do what you have to do during the cycle.
As icza has alraedy wrote, you can't create a InputStream and a BufferedReader and user both. The BufferedReader will read from the InputStream as many as he wants, and then you can't access your data from the InputStream.
You have several ways to fix it:
Don't use any Reader. Read the bytes yourself from an InputStream and call new String(bytes) on it.
Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. That'll look like that:
public byte[] readBytes (Reader in) throws IOException
{
String base64 = in.readLine(); // Note that a Base64-representation never contains \n
byte[] data = Base64.getDecoder().decode(base64);
return data
}
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
Suggested solution is not to mix text and binary data in one file. They should be stored in 2 separate files and then they can be read separately. If the remaining data is not binary, then you should not read them via InputStream but via your wrapper BufferedReader just as you read the first lines.
I recommend to create a BinvoxDetectorStream that pre-reads some bytes
public class BinvoxDetectorStream extends InputStream {
private InputStream orig;
private byte[] buffer = new byte[4096];
private int buflen;
private int bufpos = 0;
public BinvoxDetectorStream(InputStream in) {
this.orig = new BufferedInputStream(in);
this.buflen = orig.read(this.buffer, 0, this.buffer.length);
}
public BinvoxInfo getBinvoxVersion() {
// creating a reader for the buffered bytes, to read a line, and compare the header
ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
BufferedReader rdr = new BufferedReader(new InputStreamReader(bais)));
String line = rdr.readLine();
String split[] = line.split(" ");
if (split[0].equals("#binvox")) {
BinvoxInfo info = new BinvoxInfo();
info.version = split[1];
split = rdr.readLine().split(" ");
[... parse all properties ...]
// seek for "data\r\n" in the buffered data
while(!(bufpos>=6 &&
buffer[bufpos-6] == 'd' &&
buffer[bufpos-5] == 'a' &&
buffer[bufpos-4] == 't' &&
buffer[bufpos-3] == 'a' &&
buffer[bufpos-2] == '\r' &&
buffer[bufpos-1] == '\n') ) {
bufpos++;
}
return info;
}
return null;
}
#Override
public int read() throws IOException {
if(bufpos < buflen) {
return buffer[bufpos++];
}
return orig.read();
}
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxInfo();
if (info == null) {
return false;
}
...
[moving bytes in the usual way, but using bds!!! ]
This way we preserve the original bytes in bds, so we'll be able to copy it later.
I saw someone else's code that solved exactly this.
He/she used DataInputStream, which can do a readLine (although deprecated) and readByte.
For reading any input stream to a buffer there are two methods. Can someone help me understand which is the better method and why? And in which situation we should use each method?
Reading line by line and appending it to the buffer.
Eg:
public String fileToBuffer(InputStream is, StringBuffer strBuffer) throws IOException{
StringBuffer buffer = strBuffer;
InputStreamReader isr = null;
try {
isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
String line = null;
while ((line = br.readLine()) != null) {
buffer.append(line + "\n");
}
} finally {
if (is != null) {
is.close();
}
if (isr != null) {
isr.close();
}
}
return buffer.toString();
}
Reading up to buffer size ie 1024 bytes in a char array.
Eg:
InputStreamReader isr = new InputStreamReader(is);
final int bufferSize = 1024;
char[] buffer = new char[bufferSize];
StringBuffer strBuffer = new StringBuffer();
/* read the base script into string buffer */
try {
while (true) {
int read = isr.read(buffer, 0, bufferSize);
if (read == -1) {
break;
}
strBuffer.append(buffer, 0, read);
}
} catch (IOException e) {
}
Consider
public String fileToBuffer(InputStream is, StringBuffer strBuffer) throws IOException {
StringBuilder sb = new StringBuilder(strBuffer);
try (BufferedReader rdr = new BufferedReader(new InputStreamReader(is))) {
for (int c; (c = rdr.read()) != -1;) {
sb.append((char) c);
}
}
return sb.toString();
}
Depends on the purpose.
For work with text files read lines (if you need them).
For work with raw binary data use chunks of bytes.
In you examples chunks of bytes are more robust.
What if a line is too long and breaks some of intermediate objects?
If your file is binary, do you know how big a line will be?
May be the size of file.
Trying to "swallow" too big String may cause ErrorOutOfMemory.
With 1024 bytes it (ok - almost) never happens.
Chunking by 1024 bytes may take longer, but its more reliable.
Using 'readLine' isn't so neat. The asker's method 2 is quite standard, but the below method is unique (and likely better):
//read the whole inputstream and put into a string
public String inputstream2str(InputStream stream) {
Scanner s = new Scanner(stream).useDelimiter("\\A");
return s.hasNext()? s.next():"";
}
From a String you can convert to byte array or whatever buffer you want.
I am getting OutOfMemory Exception. Why? I am using this code for logging. Does this approach correct?
Exceptions and closing of streams are handled in parent methods.
private static void writeToFile(File file, FileWriter out, String message) throws IOException {
if (file.exists() && file.isFile()) {
if ((file.length() + message.getBytes().length) <= FILE_MAX_SIZE_B) {
out.write(message);
} else {
int cutLenght = (int) (file.length() + message.getBytes().length - FILE_MAX_SIZE_B);
FileInputStream fileInputStream = new FileInputStream(file);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileInputStream));
char[] buf = new char[1024];
int numRead = 0;
StringBuffer text = new StringBuffer(1000);
while ((numRead=bufferedReader.read(buf)) != -1) {
text.append(buf,0,numRead);
}
String result = new String(text).substring(cutLenght);
result += message;
FileWriter fileWriter = new FileWriter(file, appendToFile);
writeToFile(file, fileWriter, result);
bufferedReader.close();
}
}
}
EDIT:
I am using this method for writting my logs in file. So for example in one second I can call 10 logs. I am getting error on lines:
while ((numRead=bufferedReader.read(buf)) != -1) {
text.append(buf,0,numRead);
}
My guess is that you are getting the OutOfMemoryError because you are reading the entire contents of the log file back into memory once it has gotten too close to its maximum size.
You could instead read and write it in smaller chunks, but that could be tricky since you have to avoid overwriting something you haven't already read.
Overall, this technique seems like a very inefficient method of maintaining the log data. Some alternative approaches off the top of my head:
(1) maintain a set of n log files, each with maximum size FILE_MAX_SIZE_B/n. When the first log fills up, open the next one for writing, and so on; when the last one fills up, go back to the first one. In this way you are discarding some of the oldest log data each time you switch files, but not all of it, and still maintaining your overall size limit.
(2) rotate the data within a single file. After each write, add a marker that indicates this is the end of the log stream. When the file has reached its maximum size, just start again at the beginning, overwriting the data that is there. The marker will tell you where the latest message is.
Try something like this:
void appendToFile(File f, CharSequence message, Charset cs, long maximumSize) throws IOException {
long available = maximumSize - f.length();
if (available > 0) {
FileOutputStream fos = new FileOutputStream(f, true);
try {
CharBuffer chars = CharBuffer.wrap(message);
ByteBuffer bytes = ByteBuffer.allocate(8 * 1024); // Re-used when encoding the string
CharsetEncoder enc = cs.newEncoder();
CoderResult res;
do {
res = enc.encode(chars, bytes, true);
bytes.flip();
long len = Math.min(available, bytes.remaining());
available -= len;
fos.write(bytes.array(), bytes.position(), (int) len);
bytes.clear();
} while (res == CoderResult.OVERFLOW && available > 0);
} finally {
fos.close();
}
}
}
Testable with this:
File f = new File(getCacheDir(), "tmp.txt");
f.delete();
// Or whatever charset you want.
Charset cs = Charset.forName("UTF-8");
int maxlen = 2 * 1024; // For this test, 2kb
try {
for (int i = 0; i < maxlen / 20; i++) {
// Write 30 characters for maxlen/20 times == guaranteed overflow
appendToFile(f, "123456789012345678901234567890", cs, maxlen);
System.out.println("Length=" + f.length());
}
} catch (Throwable t) {
t.printStackTrace();
}
f.delete();
Well, you're getting OOM because you're trying to load a huge file into memory.
Did you try opening it with append option instead?
you get OOME because you load the whole file, then get some part of the string. Instead, do a skip on your input stream and read.
I am making an HTTP get request to a website for an android application I am making.
I am using a DefaultHttpClient and using HttpGet to issue the request. I get the entity response and from this obtain an InputStream object for getting the html of the page.
I then cycle through the reply doing as follows:
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
String x = "";
x = r.readLine();
String total = "";
while(x!= null){
total += x;
x = r.readLine();
}
However this is horrendously slow.
Is this inefficient? I'm not loading a big web page - www.cokezone.co.uk so the file size is not big. Is there a better way to do this?
Thanks
Andy
The problem in your code is that it's creating lots of heavy String objects, copying their contents and performing operations on them. Instead, you should use StringBuilder to avoid creating new String objects on each append and to avoid copying the char arrays. The implementation for your case would be something like this:
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
StringBuilder total = new StringBuilder();
for (String line; (line = r.readLine()) != null; ) {
total.append(line).append('\n');
}
You can now use total without converting it to String, but if you need the result as a String, simply add:
String result = total.toString();
I'll try to explain it better...
a += b (or a = a + b), where a and b are Strings, copies the contents of both a and b to a new object (note that you are also copying a, which contains the accumulated String), and you are doing those copies on each iteration.
a.append(b), where a is a StringBuilder, directly appends b contents to a, so you don't copy the accumulated string at each iteration.
Have you tried the built in method to convert a stream to a string? It's part of the Apache Commons library (org.apache.commons.io.IOUtils).
Then your code would be this one line:
String total = IOUtils.toString(inputStream);
The documentation for it can be found here:
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString%28java.io.InputStream%29
The Apache Commons IO library can be downloaded from here:
http://commons.apache.org/io/download_io.cgi
Another possibility with Guava:
dependency: compile 'com.google.guava:guava:11.0.2'
import com.google.common.io.ByteStreams;
...
String total = new String(ByteStreams.toByteArray(inputStream ));
I believe this is efficient enough... To get a String from an InputStream, I'd call the following method:
public static String getStringFromInputStream(InputStream stream) throws IOException
{
int n = 0;
char[] buffer = new char[1024 * 4];
InputStreamReader reader = new InputStreamReader(stream, "UTF8");
StringWriter writer = new StringWriter();
while (-1 != (n = reader.read(buffer))) writer.write(buffer, 0, n);
return writer.toString();
}
I always use UTF-8. You could, of course, set charset as an argument, besides InputStream.
What about this. Seems to give better performance.
byte[] bytes = new byte[1000];
StringBuilder x = new StringBuilder();
int numRead = 0;
while ((numRead = is.read(bytes)) >= 0) {
x.append(new String(bytes, 0, numRead));
}
Edit: Actually this sort of encompasses both steelbytes and Maurice Perry's
Possibly somewhat faster than Jaime Soriano's answer, and without the multi-byte encoding problems of Adrian's answer, I suggest:
File file = new File("/tmp/myfile");
try {
FileInputStream stream = new FileInputStream(file);
int count;
byte[] buffer = new byte[1024];
ByteArrayOutputStream byteStream =
new ByteArrayOutputStream(stream.available());
while (true) {
count = stream.read(buffer);
if (count <= 0)
break;
byteStream.write(buffer, 0, count);
}
String string = byteStream.toString();
System.out.format("%d bytes: \"%s\"%n", string.length(), string);
} catch (IOException e) {
e.printStackTrace();
}
Maybe rather then read 'one line at a time' and join the strings, try 'read all available' so as to avoid the scanning for end of line, and to also avoid string joins.
ie, InputStream.available() and InputStream.read(byte[] b), int offset, int length)
Reading one line of text at a time, and appending said line to a string individually is time-consuming both in extracting each line and the overhead of so many method invocations.
I was able to get better performance by allocating a decent-sized byte array to hold the stream data, and which is iteratively replaced with a larger array when needed, and trying to read as much as the array could hold.
For some reason, Android repeatedly failed to download the entire file when the code used the InputStream returned by HTTPUrlConnection, so I had to resort to using both a BufferedReader and a hand-rolled timeout mechanism to ensure I would either get the whole file or cancel the transfer.
private static final int kBufferExpansionSize = 32 * 1024;
private static final int kBufferInitialSize = kBufferExpansionSize;
private static final int kMillisecondsFactor = 1000;
private static final int kNetworkActionPeriod = 12 * kMillisecondsFactor;
private String loadContentsOfReader(Reader aReader)
{
BufferedReader br = null;
char[] array = new char[kBufferInitialSize];
int bytesRead;
int totalLength = 0;
String resourceContent = "";
long stopTime;
long nowTime;
try
{
br = new BufferedReader(aReader);
nowTime = System.nanoTime();
stopTime = nowTime + ((long)kNetworkActionPeriod * kMillisecondsFactor * kMillisecondsFactor);
while(((bytesRead = br.read(array, totalLength, array.length - totalLength)) != -1)
&& (nowTime < stopTime))
{
totalLength += bytesRead;
if(totalLength == array.length)
array = Arrays.copyOf(array, array.length + kBufferExpansionSize);
nowTime = System.nanoTime();
}
if(bytesRead == -1)
resourceContent = new String(array, 0, totalLength);
}
catch(Exception e)
{
e.printStackTrace();
}
try
{
if(br != null)
br.close();
}
catch(IOException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
EDIT: It turns out that if you don't need to have the content re-encoded (ie, you want the content AS IS) you shouldn't use any of the Reader subclasses. Just use the appropriate Stream subclass.
Replace the beginning of the preceding method with the corresponding lines of the following to speed it up an extra 2 to 3 times.
String loadContentsFromStream(Stream aStream)
{
BufferedInputStream br = null;
byte[] array;
int bytesRead;
int totalLength = 0;
String resourceContent;
long stopTime;
long nowTime;
resourceContent = "";
try
{
br = new BufferedInputStream(aStream);
array = new byte[kBufferInitialSize];
If the file is long, you can optimize your code by appending to a StringBuilder instead of using a String concatenation for each line.
byte[] buffer = new byte[1024]; // buffer store for the stream
int bytes; // bytes returned from read()
// Keep listening to the InputStream until an exception occurs
while (true) {
try {
// Read from the InputStream
bytes = mmInStream.read(buffer);
String TOKEN_ = new String(buffer, "UTF-8");
String xx = TOKEN_.substring(0, bytes);
To convert the InputStream to String we use the
BufferedReader.readLine() method. We iterate until the BufferedReader return null which means there's no more data to read. Each line will appended to a StringBuilder and returned as String.
public static String convertStreamToString(InputStream is) {
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line = null;
try {
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return sb.toString();
}
}`
And finally from any class where you want to convert call the function
String dataString = Utils.convertStreamToString(in);
complete
I am use to read full data:
// inputStream is one instance InputStream
byte[] data = new byte[inputStream.available()];
inputStream.read(data);
String dataString = new String(data);
Note that this applies to files stored on disk and not to streams with no default size.
Is there a way to prepend a line to the File in Java, without creating a temporary file, and writing the needed content to it?
No, there is no way to do that SAFELY in Java. (Or AFAIK, any other programming language.)
No filesystem implementation in any mainstream operating system supports this kind of thing, and you won't find this feature supported in any mainstream programming languages.
Real world file systems are implemented on devices that store data as fixed sized "blocks". It is not possible to implement a file system model where you can insert bytes into the middle of a file without significantly slowing down file I/O, wasting disk space or both.
The solutions that involve an in-place rewrite of the file are inherently unsafe. If your application is killed or the power dies in the middle of the prepend / rewrite process, you are likely to lose data. I would NOT recommend using that approach in practice.
Use a temporary file and renaming. It is safer.
There is a way, it involves rewriting the whole file though (but no temporary file). As others mentioned, no file system supports prepending content to a file. Here is some sample code that uses a RandomAccessFile to write and read content while keeping some content buffered in memory:
public static void main(final String args[]) throws Exception {
File f = File.createTempFile(Main.class.getName(), "tmp");
f.deleteOnExit();
System.out.println(f.getPath());
// put some dummy content into our file
BufferedWriter w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f)));
for (int i = 0; i < 1000; i++) {
w.write(UUID.randomUUID().toString());
w.write('\n');
}
w.flush();
w.close();
// append "some uuids" to our file
int bufLength = 4096;
byte[] appendBuf = "some uuids\n".getBytes();
byte[] writeBuf = appendBuf;
byte[] readBuf = new byte[bufLength];
int writeBytes = writeBuf.length;
RandomAccessFile rw = new RandomAccessFile(f, "rw");
int read = 0;
int write = 0;
while (true) {
// seek to read position and read content into read buffer
rw.seek(read);
int bytesRead = rw.read(readBuf, 0, readBuf.length);
// seek to write position and write content from write buffer
rw.seek(write);
rw.write(writeBuf, 0, writeBytes);
// no bytes read - end of file reached
if (bytesRead < 0) {
// end of
break;
}
// update seek positions for write and read
read += bytesRead;
write += writeBytes;
writeBytes = bytesRead;
// reuse buffer, create new one to replace (short) append buf
byte[] nextWrite = writeBuf == appendBuf ? new byte[bufLength] : writeBuf;
writeBuf = readBuf;
readBuf = nextWrite;
};
rw.close();
// now show the content of our file
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
}
You could store the file content in a String and prepend the desired line by using a StringBuilder-Object. You just have to put the desired line first and then append the file-content-String.
No extra temporary file needed.
No. There are no "intra-file shift" operations, only read and write of discrete sizes.
It would be possible to do so by reading a chunk of the file of equal length to what you want to prepend, writing the new content in place of it, reading the later chunk and replacing it with what you read before, and so on, rippling down the to the end of the file.
However, don't do that, because if anything stops (out-of-memory, power outage, rogue thread calling System.exit) in the middle of that process, data will be lost. Use the temporary file instead.
private static void addPreAppnedText(File fileName) {
FileOutputStream fileOutputStream =null;
BufferedReader br = null;
FileReader fr = null;
String newFileName = fileName.getAbsolutePath() + "#";
try {
fileOutputStream = new FileOutputStream(newFileName);
fileOutputStream.write("preappendTextDataHere".getBytes());
fr = new FileReader(fileName);
br = new BufferedReader(fr);
String sCurrentLine;
while ((sCurrentLine = br.readLine()) != null) {
fileOutputStream.write(("\n"+sCurrentLine).getBytes());
}
fileOutputStream.flush();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
fileOutputStream.close();
if (br != null)
br.close();
if (fr != null)
fr.close();
new File(newFileName).renameTo(new File(newFileName.replace("#", "")));
} catch (IOException ex) {
ex.printStackTrace();
}
}
}