Basically, I have this code to decompress a string that is stored in a file:
public static String decompressRawText(File inFile) {
    InputStream in = null;
    InputStreamReader isr = null;
    StringBuilder sb = new StringBuilder(STRING_SIZE);
    try {
        in = new FileInputStream(inFile);
        in = new BufferedInputStream(in, BUFFER_SIZE);
        in = new GZIPInputStream(in, BUFFER_SIZE);
        isr = new InputStreamReader(in);
        int length = 0;
        while ((length = isr.read(cbuf)) != -1) {
            sb.append(cbuf, 0, length);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            in.close();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    return sb.toString();
}
Since physical I/O is quite time-consuming, and since my compressed files are all quite small (around 2K from 2M of text), is it possible to do the above on a file that is already mapped into memory, possibly using Java NIO? Thanks
It won't make any difference, at least not much. Mapped files were about 20% faster for I/O the last time I looked. You still have to actually do the I/O: mapping just saves some data copying. I would look at increasing BUFFER_SIZE to at least 32k, and also the size of cbuf, which should be a local variable in this method rather than a member variable so that the method is thread-safe. It might also be worth not compressing files under a certain size threshold, say 10k.
Also you should be closing isr here, not in.
It might be worth putting another BufferedInputStream on top of the GZIPInputStream, as well as the one underneath it, so that each read does more work at once.
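Because each compressed file is only about 2K, one alternative to memory mapping is to read the whole compressed file with a single call and inflate it from memory. This follows the advice above: it uses a 32k buffer, a local cbuf, and closes the outermost reader. The sketch below assumes UTF-8 text, which means the writing side must use the same charset. The class name `InMemoryGunzip` is hypothetical. `FileChannel.map` would also work, but as noted above it is unlikely to gain much for files this small.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: one physical read, then decompression entirely from memory.
final class InMemoryGunzip {
    private static final int BUFFER_SIZE = 32 * 1024;

    static String decompressRawText(Path inFile) throws IOException {
        byte[] compressed = Files.readAllBytes(inFile); // a ~2K file is read in one call
        StringBuilder sb = new StringBuilder();
        char[] cbuf = new char[BUFFER_SIZE]; // local, so concurrent calls don't share it
        // Closing the outermost Reader also closes the GZIPInputStream beneath it.
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE),
                StandardCharsets.UTF_8)) { // assumption: the text was written as UTF-8
            int n;
            while ((n = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, n);
            }
        }
        return sb.toString();
    }
}
```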
I am working on a school project where I want to make a personal storage server. At the moment, I am trying to transfer a file from the client machine to the server. However, when I test this with an image, only part of the file arrives intact before the rest is corrupted.
Please bear in mind that I am a reasonably new programmer and that my technical knowledge may be somewhat limited.
I am using a byte array through a DataOutputStream to transfer the file. I want to use this method because it should work for any file type. I've tried setting the buffer size to exactly the size of the file and to something larger, but neither has worked.
Server:
public void run() {
    try {
        System.out.println("ip: " + clientSocket.getInetAddress().getHostAddress());
        out = new DataOutputStream(clientSocket.getOutputStream());
        in = new DataInputStream(clientSocket.getInputStream());
        in.read(buffer, 0, buffer.length);
        fileOut = new FileOutputStream("X:\\My Documents\\My Pictures\\gradient.jpg");
        fileOut.write(buffer, 0, buffer.length);
        in.close();
        out.close();
        clientSocket.close();
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    }
}
Client:
public void startConnection(String ip, int port) {
    try {
        clientSocket = new Socket(ip, port);
        out = new DataOutputStream(clientSocket.getOutputStream());
        in = new DataInputStream(clientSocket.getInputStream());
        x = false;
        Path filePath = Paths.get("C:\\Users\\georg\\Documents\\gradient.jpg");
        buffer = Files.readAllBytes(filePath);
        Thread.sleep(3000);
        //Files.write(filePath, buffer);
        //out.write(buffer,0,buffer.length);
        x = true;
        sendMessage(buffer);
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    } catch (InterruptedException ex) {
        Logger.getLogger(PCS_Client.class.getName()).log(Level.SEVERE, null, ex);
    }
}
public byte[] sendMessage(byte[] buffer) {
    if (x==true){
        try {
            out.write(buffer,0,buffer.length);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
    return null;
}
Here is a comparison of the files I've tried to send vs the files I receive:
https://imgur.com/gallery/T7nUUJT
Curiously, sending a single-colour image produces a single-colour image on the server. I believe the issue may have to do with the timing of code execution, but I am not sure and do not know how to go about fixing it.
The issue is in your server code, at this line:
in.read(buffer, 0, buffer.length);
You expect to read all the data at once, but if you read the documentation you will find this:
public final int read(byte[] b, int off, int len) throws IOException
Reads up to len bytes of data from the contained input stream into an array of bytes. An attempt is made to read as many as len bytes, but a smaller number may be read, possibly zero. The number of bytes actually read is returned as an integer.
The important part is "Reads up to len bytes of data".
You must use the return value of read and call read repeatedly until there is nothing more to read.
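Calling read in a loop needs a stopping point. The client code above never tells the server how large the file is, and it never closes its side of the connection. The sketch below therefore assumes a simple framing convention that is not in the original code: the client first writes the file length with writeInt, then writes the bytes. The server reads the length and then calls read repeatedly until every byte has arrived. The class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical helpers; assumes a length-prefix convention agreed by client and server.
final class FileTransfer {
    /** Client side: send the length first, then the file bytes. */
    static void send(DataOutputStream out, byte[] data) throws IOException {
        out.writeInt(data.length);
        out.write(data, 0, data.length);
        out.flush();
    }

    /** Server side: read the length, then loop on read() until all bytes are in. */
    static byte[] receive(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] buffer = new byte[length];
        int total = 0;
        while (total < length) {
            // read() may return fewer bytes than requested, so keep track of the offset
            int n = in.read(buffer, total, length - total);
            if (n == -1) {
                throw new EOFException("Connection closed after " + total + " of " + length + " bytes");
            }
            total += n;
        }
        return buffer; // DataInputStream.readFully(buffer) performs the same loop for you
    }
}
```

On the server, you would then write the array returned by `receive` to the FileOutputStream. This writes only the bytes that were actually received, instead of a pre-sized buffer.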
I have a problem when users upload large files (> 1 GB) (I'm using the flow.js library): it creates hundreds of thousands of small chunk files (e.g. 100KB each) inside a temporary directory, but then fails to merge them into a single file due to an OutOfMemoryError. This does not happen when the file is under 1 GB. I know it sounds tedious, and you will probably suggest increasing -Xmx in my container, but I want another angle besides that.
Here is my code
private void mergeFile(String identifier, int totalFile, String outputFile) throws AppException{
    File[] fileDatas = new File[totalFile]; //we know the size of file here and create specific amount of the array
    byte fileContents[] = null;
    int totalFileSize = 0;
    int filePartUploadSize = 0;
    int tempFileSize = 0;
    //I'm creating array of file and append the length
    for (int i = 0; i < totalFile; i++) {
        fileDatas[i] = new File(identifier + "." + (i + 1)); //identifier is the name of the file
        totalFileSize += fileDatas[i].length();
    }
    try {
        fileContents = new byte[totalFileSize];
        InputStream inStream;
        for (int j = 0; j < totalFile; j++) {
            inStream = new BufferedInputStream(new FileInputStream(fileDatas[j]));
            filePartUploadSize = (int) fileDatas[j].length();
            inStream.read(fileContents, tempFileSize, filePartUploadSize);
            tempFileSize += filePartUploadSize;
            inStream.close();
        }
    } catch (FileNotFoundException ex) {
        throw new AppException(AppExceptionCode.FILE_NOT_FOUND);
    } catch (IOException ex) {
        throw new AppException(AppExceptionCode.ERROR_ON_MERGE_FILE);
    } finally {
        write(fileContents, outputFile);
        for (int l = 0; l < totalFile; l++) {
            fileDatas[l].delete();
        }
    }
}
Please show me what is "inefficient" about this method. Once again, only large files cannot be merged using this method; smaller ones (< 1 GB) are no problem at all.
I would appreciate it if you did not suggest increasing the heap memory, and instead showed me the fundamental error of this method. Thanks.
It's unnecessary to allocate the entire file size in memory by declaring a byte array of the entire size. Building the concatenated file in memory in general is totally unnecessary.
Just open an output stream for your target file. Then, for each file you are combining, read it as an input stream, write its bytes to the output stream, and close it when you finish. When you're done with all of them, close the output file. Total memory use will be a few thousand bytes for the buffer.
Also, don't do I/O operations in finally block (except closing and stuff).
Here is a rough example you can play with.
ArrayList<File> files = new ArrayList<>();// put your files here
File output = new File("yourfilename");
BufferedOutputStream boss = null;
try
{
    boss = new BufferedOutputStream(new FileOutputStream(output));
    for (File file : files)
    {
        BufferedInputStream bis = null;
        try
        {
            bis = new BufferedInputStream(new FileInputStream(file));
            int data;
            // read() returns -1 at end of stream; check it before writing,
            // otherwise write(-1) would append a stray 0xFF byte for every file
            while ((data = bis.read()) != -1)
            {
                boss.write(data);
            }
        }
        catch (Exception e)
        {
            //do error handling stuff, log it maybe?
        }
        finally
        {
            try
            {
                if (bis != null)
                {
                    bis.close();//do this in a try catch just in case
                }
            }
            catch (Exception e)
            {
                //handle this
            }
        }
    }
} catch (Exception e)
{
    //handle this
}
finally
{
    try
    {
        if (boss != null)
        {
            boss.close();
        }
    }
    catch (Exception e) {
        //handle this
    }
}
... show me the fundamental error of this method
The implementation flaw is that you are creating a byte array (fileContents) whose size is the total file size. If the total file size is too big, that will cause an OOME. Inevitably.
Solution - don't do that! Instead "stream" the file by reading from the "chunk" files and writing to the final file using a modest sized buffer.
There are other problems with your code too. For instance, it could leak file descriptors because you are not ensuring that inStream is closed under all circumstances. Read up on the "try-with-resources" construct.
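The streaming approach and try-with-resources can be combined into a replacement for mergeFile. The sketch below keeps the original `identifier + "." + n` chunk naming. It assumes plain IOException in place of the question's AppException, which is not shown. It deletes the chunks only after a successful merge, rather than in `finally`. Memory use stays at one 64 KB buffer whatever the total size, and the output can exceed 2 GB because nothing is sized with an int. The class name is hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical streaming merge; IOException stands in for the question's AppException.
final class ChunkMerger {
    private static final int BUFFER_SIZE = 64 * 1024; // the only per-merge allocation

    /** Appends identifier.1 .. identifier.N to outputFile one buffer at a time, then deletes the chunks. */
    static void mergeFile(String identifier, int totalFile, String outputFile) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        try (OutputStream out = new FileOutputStream(outputFile)) {
            for (int i = 1; i <= totalFile; i++) {
                // each chunk is closed as soon as it has been copied, even on error
                try (InputStream in = new FileInputStream(identifier + "." + i)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
        // Only reached if every chunk was copied; on failure the chunks are kept for a retry.
        for (int i = 1; i <= totalFile; i++) {
            new File(identifier + "." + i).delete();
        }
    }
}
```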
I am developing an Android app that needs to download a zip file (around 1.5 MB max) containing a small number of logos (PNG files of 20-30 KB on average) from a web server.
I have encapsulated the process of downloading and unzipping the files into Android internal storage in an AsyncTask's doInBackground() method.
The issue is that the unZipIntoInternalStorage() method I have developed (pasted below) sometimes runs forever. Usually it takes around 900 ms to unzip and save the logos into internal storage, but for some unknown reason around 1 in 4 executions blocks in this loop (and stays there "forever", taking more than 2 or 3 minutes to decompress all the PNG files):
while ((count = zipInputStream.read(buffer)) != -1) {
    outputStream.write(buffer, 0, count);
}
Edit: After some logging and debugging, I found that the line slowing execution down so much is zipInputStream.read(buffer) inside the while condition. Any ideas why it sometimes runs extremely fast and other times extremely slowly?
Here is my complete method to unzip the downloaded files and save them into android internal storage. I also add the method where the zipInputStream is initialized, from the zip file downloaded (both methods executed inside doInBackground() ):
private void unZipIntoInternalStorage(ZipInputStream zipInputStream) {
    long start = System.currentTimeMillis();
    Log.i(LOG_TAG, "Unzipping started ");
    try {
        File iconsDir = context.getDir("icons", Context.MODE_PRIVATE);
        ZipEntry zipEntry;
        byte[] buffer = new byte[1024];
        int count;
        FileOutputStream outputStream;
        while ((zipEntry = zipInputStream.getNextEntry()) != null) {
            File icon = new File(iconsDir, zipEntry.getName());
            outputStream = new FileOutputStream(icon);
            while ((count = zipInputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, count);
            }
            zipInputStream.closeEntry();
            outputStream.close();
        }
        zipInputStream.close();
    } catch (Exception e) {
        Log.e(LOG_TAG + " Decompress", "unzip error ", e);
        e.printStackTrace();
    }
    Log.i(LOG_TAG, "Unzipping completed time required: " + (System.currentTimeMillis() - start) + " ms");
}
private ZipInputStream httpDownloadIconsZip(String zipUrl) {
    URLConnection urlConnection;
    try {
        URL finalUrl = new URL(zipUrl);
        urlConnection = finalUrl.openConnection();
        return new ZipInputStream(urlConnection.getInputStream());
    } catch (IOException e) {
        Log.e(LOG_TAG, Log.getStackTraceString(e));
        return null;
    }
}
To clarify: after testing and debugging this method several times, the blocking always happens in the nested while loop I described above, but I can't find the reason (see edited clarification).
I have also already tried this method with the BufferedOutputStream class, with the same results: the nested while loop sometimes runs forever, and other times unzipping succeeds in less than a second.
I hope I have been as clear as possible. I have spent long hours looking for possible causes of the issue in several posts about unzipping files and Java I/O methods, with no success.
Any help appreciated. Thanks
I would suspect the InputStream rather than the output to be the issue.
Try :
return new ZipInputStream(new BufferedInputStream(urlConnection.getInputStream()));
You can add an argument for setting buffer size, but default settings should be fine for your use case.
The problem is typically caused by small packet size, leading to one read forcing several IO operations.
Ideally, you should also use a BufferedOutputStream: a read could return much less than 1 kB, but you still pay for a full I/O operation on each write.
As a general rule, remember that I/O is typically orders of magnitude slower than in-memory work, and it often leads to the scheduler putting your task in a wait state. So use a buffered stream anywhere the stream is not in memory (i.e. basically always, except for in-memory streams such as ByteArrayInputStream or StringBufferInputStream).
In your case, due to zip protocol, your read could lead to any number of smaller reads on the actual network socket, as Zip parses and interprets headers and contents of the compressed file.
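The buffering suggested above can be sketched in plain Java. The sketch puts a BufferedInputStream between the raw stream and the ZipInputStream, and a BufferedOutputStream around each output file. It deliberately leaves out the Android parts. On a device, you would pass `urlConnection.getInputStream()` as `raw` and `context.getDir("icons", Context.MODE_PRIVATE)` as `dir`. The class name and the 8 KB buffer size are assumptions, not taken from the question.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extracts a zip stream into dir, buffering both directions.
final class BufferedUnzip {
    /** Returns the number of file entries written. */
    static int unzip(InputStream raw, File dir) throws IOException {
        int files = 0;
        byte[] buffer = new byte[8 * 1024];
        // The zip parser issues many small reads; the BufferedInputStream serves them from memory.
        try (ZipInputStream zip = new ZipInputStream(new BufferedInputStream(raw))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) { // also closes the previous entry
                File target = new File(dir, entry.getName());
                if (entry.isDirectory()) {
                    target.mkdirs();
                    continue;
                }
                target.getParentFile().mkdirs();
                try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
                    int count;
                    while ((count = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, count);
                    }
                }
                files++;
            }
        }
        return files;
    }
}
```

Buffering reduces the cost of many small reads, but it cannot help if the network itself stalls. In that case, setting `URLConnection.setReadTimeout(...)` before reading makes a stalled read fail with a SocketTimeoutException instead of blocking indefinitely.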
I know this error. If your zip file is damaged and you try to unzip it with ZipInputStream, you get an infinite loop, because the file has no EOF.
But if you unzip it with ZipFile, you can catch that exception!
public static boolean unZipByFilePath(String fileName, String unZipDir) {
    long startUnZipTime = System.currentTimeMillis();
    try {
        File f = new File(unZipDir);
        if (!f.exists()) {
            f.mkdirs();
        }
        ZipFile zipfile = new ZipFile(fileName);
        Enumeration<? extends ZipEntry> e = zipfile.entries();
        while (e.hasMoreElements()) {
            ZipEntry entry = e.nextElement();
            String destFilePath = unZipDir + "/" + entry.getName();
            File desFile = new File(destFilePath);
            if (entry.isDirectory()) {
                // a directory entry has no data: create it and move on
                // (opening a FileOutputStream on a directory would throw)
                desFile.mkdirs();
                continue;
            }
            desFile.getParentFile().mkdirs();
            BufferedInputStream is = new BufferedInputStream(zipfile.getInputStream(entry));
            // BUFFER is a buffer-size constant defined elsewhere in this class
            BufferedOutputStream dest = new BufferedOutputStream(new FileOutputStream(desFile), BUFFER);
            byte data[] = new byte[BUFFER];
            int count;
            while ((count = is.read(data, 0, BUFFER)) != -1) {
                dest.write(data, 0, count);
            }
            dest.flush();
            dest.close();
            is.close();
        }
        zipfile.close();
    } catch (Exception e) {
        Log.e(TAG, "unZipByFilePath failed : " + e.getMessage());
        return false;
    }
    return true;
}
Here is how I compressed the string into a file:
public static void compressRawText(File outFile, String src) {
    FileOutputStream fo = null;
    GZIPOutputStream gz = null;
    try {
        fo = new FileOutputStream(outFile);
        gz = new GZIPOutputStream(fo);
        gz.write(src.getBytes());
        gz.flush();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            gz.close();
            fo.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here is how I decompressed it:
static int BUFFER_SIZE = 8 * 1024;
static int STRING_SIZE = 2 * 1024 * 1024;
public static String decompressRawText(File inFile) {
    InputStream in = null;
    InputStreamReader isr = null;
    StringBuilder sb = new StringBuilder(STRING_SIZE);//constant resizing is costly, so set the STRING_SIZE
    try {
        in = new FileInputStream(inFile);
        in = new BufferedInputStream(in, BUFFER_SIZE);
        in = new GZIPInputStream(in, BUFFER_SIZE);
        isr = new InputStreamReader(in);
        char[] cbuf = new char[BUFFER_SIZE];
        int length = 0;
        while ((length = isr.read(cbuf)) != -1) {
            sb.append(cbuf, 0, length);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            in.close();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    return sb.toString();
}
The decompression seems to take forever. I have a feeling that I am doing too many redundant steps in the decompression part. Any idea how I could speed it up?
EDIT: I have modified the code above based on the following recommendations:
1. I changed the pattern to simplify my code a bit, but if I can't use IOUtils, is it still OK to use this pattern?
2. I set the StringBuilder buffer to 2M, as suggested by entonio. Should I set it a little higher? Memory is still OK; I still have around 10M available according to the Eclipse heap monitor.
3. I cut the BufferedReader and added a BufferedInputStream, but I am still not sure about BUFFER_SIZE. Any suggestions?
These modifications improved the time taken to loop over all my 30 2M files from almost 30 seconds to around 14, but I need to get it under 10. Is that even possible on Android? Basically, I need to process 60M of text in total. I have divided it into 30 files of 2M each, and before processing each string, I timed how long it takes just to loop over all the files and load each file's string into memory. Since I don't have much experience: would it be better to use 60 1M files instead? Or is there any other improvement I should adopt? Thanks.
ALSO: Since physical I/O is quite time-consuming, and since my compressed files are all quite small (around 2K from 2M of text), is it possible to do the above on a file that is already mapped into memory, possibly using Java NIO? Thanks
The BufferedReader's only purpose is the readLine() method, which you don't use, so why not just read from the InputStreamReader? Decreasing the buffer size may also help. You should probably also specify the encoding when both reading and writing, though that shouldn't affect performance.
edit: more data
If you know the size of the string ahead of time, you should add a length parameter to decompressRawText and use it to initialise the StringBuilder. Otherwise the StringBuilder will be resized repeatedly to accommodate the result, and that's costly.
edit: clarification
2MB implies a lot of resizes. There is no harm if you specify a capacity higher than the length you end up with after reading (other than temporarily using more memory, of course).
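If the length isn't tracked anywhere else, the gzip format itself records it. RFC 1952 stores the uncompressed size, modulo 2^32, in the last four bytes of the file (the ISIZE field, little-endian). The sketch below reads that field. It assumes single-member gzip files under 4 GB, such as those written by one GZIPOutputStream. It also assumes ASCII or UTF-8 text. With those encodings, the decoded char count never exceeds the byte count, so the value is a safe upper bound for the StringBuilder capacity. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: reads the ISIZE trailer field of a gzip file.
final class GzipSize {
    /** Uncompressed byte count from the last 4 bytes (little-endian, modulo 2^32). */
    static long uncompressedSize(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(raf.length() - 4);
            int b0 = raf.read(), b1 = raf.read(), b2 = raf.read(), b3 = raf.read();
            // assemble as an unsigned 32-bit value held in a long
            return ((long) b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
        }
    }
}
```

You could then size the builder with `new StringBuilder((int) GzipSize.uncompressedSize(path))` for files like the 2M ones in the question.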
You should wrap the FileInputStream with a BufferedInputStream before wrapping with a GZipInputStream, rather than using a BufferedReader.
The reason is that, depending on implementation, any of the various input classes in your decoration hierarchy could decide to read on a byte-by-byte basis (and I'd say the InputStreamReader is most likely to do this). And that would translate into many read(2) calls once it gets to the FileInputStream.
Of course, this may just be superstition on my part. But, if you're running on Linux, you can always test with strace.
Edit: one nice pattern to follow when building up a chain of stream decorators is to use a single InputStream variable. Then you only have one thing to close in your finally block (and can use Jakarta Commons IOUtils, now Apache Commons IO, to avoid lots of nested try-catch-finally blocks).
InputStream in = null;
try
{
    in = new FileInputStream("foo");
    in = new BufferedInputStream(in);
    in = new GZIPInputStream(in);
    // do something with the stream
}
finally
{
    IOUtils.closeQuietly(in);
}
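On Java 7 or later, the same decorator chain can be closed without IOUtils by using try-with-resources. That also answers the question's point about not being able to use IOUtils. The sketch below declares one resource per layer, so if GZIPInputStream's constructor throws (for example, on a file that isn't gzip), the layers already opened are still closed. Unlike closeQuietly, errors raised during close are reported rather than swallowed. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: the FileInputStream -> BufferedInputStream -> GZIPInputStream chain
// with try-with-resources in place of IOUtils.closeQuietly.
final class GunzipBytes {
    static byte[] readGzip(String path) throws IOException {
        try (InputStream file = new FileInputStream(path);
             InputStream buffered = new BufferedInputStream(file);
             InputStream in = new GZIPInputStream(buffered)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } // resources close in reverse order; closing an already-closed layer is harmless
    }
}
```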
Add a BufferedInputStream between the FileInputStream and the GZIPInputStream.
Similarly when writing.
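For the writing side, the same idea gives FileOutputStream, then BufferedOutputStream, then GZIPOutputStream, with a Writer on top so that the charset is explicit, as another answer suggested. The sketch assumes UTF-8 and a 32 KB buffer. Neither value comes from the question. The class name is hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: buffered, charset-explicit version of compressRawText.
final class GzipText {
    static void compressRawText(String path, String src) throws IOException {
        try (OutputStream file = new FileOutputStream(path);
             OutputStream buffered = new BufferedOutputStream(file, 32 * 1024);
             OutputStream gz = new GZIPOutputStream(buffered, 32 * 1024);
             Writer writer = new OutputStreamWriter(gz, StandardCharsets.UTF_8)) { // assumption: UTF-8
            writer.write(src);
        }
        // Closing the Writer flushes it, writes the gzip trailer and flushes the buffer to disk;
        // the remaining closes on already-closed layers do nothing.
    }
}
```

The reading side must then decode with the same charset, for example by passing `StandardCharsets.UTF_8` to the InputStreamReader.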
I have a Java Applet that I'm making some edits to and am running into performance issues. More specifically, the applet generates an image which I need to export to the client's machine.
This is really at the proof-of-concept stage, so bear with me. For now, the image is exported to the client's machine at a predefined location (this will be replaced with a save dialog or something similar in the future). However, the process takes nearly 15 seconds for a 32 KB file.
I've done some 'shoot-from-the-hip' profiling by printing messages to the console at logical points throughout the method in question. To my surprise, the bottleneck appears to be the actual data stream writing, not the JPEG encoding.
KEEP IN MIND THAT I ONLY HAVE A BASIC KNOWLEDGE OF JAVA AND ITS METHODS
So go slow :p - I'm mainly looking for suggestions to solve the problem rather than the solution itself.
Here is the block of code where the magic happens:
ByteArrayOutputStream jpegOutput = new ByteArrayOutputStream();
JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(jpegOutput);
encoder.encode(biFullView);
byte[] imageData = jpegOutput.toByteArray();
String myFile="C:" + File.separator + "tmpfile.jpg";
File f = new File(myFile);
try {
    dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(myFile),512));
    dos.writeBytes(byteToString(imageData));
    dos.flush();
    dos.close();
}
catch (SecurityException ee) {
    System.out.println("writeFile: caught security exception");
}
catch (IOException ioe) {
    System.out.println("writeFile: caught i/o exception");
}
As I mentioned, using System.out.println() I've narrowed the performance bottleneck down to the DataOutputStream block. Using a variety of machines with varying hardware specs seems to have little effect on the overall performance.
Any pointers/suggestions/direction would be much appreciated.
EDIT:
As requested, byteToString():
public String byteToString(byte[] data){
    String text = new String();
    for ( int i = 0; i < data.length; i++ ){
        text += (char) ( data[i] & 0x00FF );
    }
    return text;
}
You might want to take a look at ImageIO.
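And I think the reason for the performance problem is the looping in byteToString: each += copies the whole string built so far, so the loop takes quadratic time. You never want to do concatenation in a loop. You could use the String(byte[], charset) constructor with ISO-8859-1 instead (the plain String(byte[]) constructor decodes with the platform default charset, which can alter bytes above 0x7F), but you don't really need to turn the bytes into a string anyway.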
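Mapping each byte `b` to the char `(b & 0xFF)` is exactly ISO-8859-1 decoding, so the loop can be replaced by one constructor call. The sketch below shows the two side by side. The class and method names are hypothetical. It also shows why the string is unnecessary. `DataOutputStream.writeBytes` writes only the low byte of each char, so `writeBytes(byteToString(imageData))` ends up writing the original `imageData` bytes, which a plain `write(imageData)` does directly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of the question's loop and a one-pass equivalent.
final class ByteToString {
    /** The question's approach: each += copies the whole string, so this is O(n^2). */
    static String slowByteToString(byte[] data) {
        String text = "";
        for (int i = 0; i < data.length; i++) {
            text += (char) (data[i] & 0x00FF);
        }
        return text;
    }

    /** Same result in one pass: ISO-8859-1 maps bytes 0x00-0xFF to chars U+0000-U+00FF. */
    static String fastByteToString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }
}
```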
If you don't need the image data byte array you can encode directly to the file:
String myFile = "C:" + File.separator + "tmpfile.jpg";
File f = new File(myFile);
OutputStream out = null;
try {
    out = new BufferedOutputStream(new FileOutputStream(f));
    JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    encoder.encode(biFullView);
}
catch (SecurityException ee) {
    System.out.println("writeFile: caught security exception");
}
catch (IOException ioe) {
    System.out.println("writeFile: caught i/o exception");
} finally {
    // close the BufferedOutputStream (not just the FileOutputStream under it)
    // so that its buffered bytes are flushed to the file
    if (out != null) {
        try {
            out.close();
        } catch (IOException ignored) {
        }
    }
}
If you need the byte array to perform other operations it's better to write it directly to the FileOutputStream:
//...
fos = new FileOutputStream(myFile);
fos.write(imageData, 0, imageData.length);
//...
You could also use the standard ImageIO API (the classes in the com.sun.image.codec.jpeg package are not part of the core Java APIs and were removed in Java 7).
String myFile="C:" + File.separator + "tmpfile.jpg";
File f = new File(myFile);
ImageIO.write(biFullView, "jpeg", f);
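One detail is easy to miss: `ImageIO.write` returns a boolean, and it returns `false` instead of throwing when no registered writer can handle that format for the given image. The sketch below turns that into an exception so a failed export is not silent. The class and method names are hypothetical, and the file location is left to the caller.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical wrapper around ImageIO.write that does not ignore its return value.
final class JpegExport {
    static void exportJpeg(BufferedImage image, File target) throws IOException {
        if (!ImageIO.write(image, "jpeg", target)) {
            throw new IOException("No JPEG writer available for image type " + image.getType());
        }
    }
}
```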