Compressing content in Java without file I/O

I would like to run a repeated compression task for CPU profiling, without doing any file I/O: the input should come strictly from a byte stream. I want to do this in Java (the target of my benchmark).
Does anyone have a suggestion for how to do this?
I used the Zip API, which uses ZipEntry, but ZipEntry triggers file I/O.
Any suggestions or code samples are highly appreciated.

I used the Zip API, which uses ZipEntry, but ZipEntry triggers file I/O.
I wouldn't expect it to if you use a ByteArrayOutputStream as the underlying output:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zipStream = new ZipOutputStream(baos);
... write to zipStream ...
Likewise, wrap the byte array you read from in a ByteArrayInputStream.
Of course, ZipOutputStream is appropriate if you want to create content in the zip compression format, which is good for (or at least handles :) multiple files. For a single stream of data, you may want to use DeflaterOutputStream or GZIPOutputStream instead, again with a ByteArrayOutputStream as the underlying output.
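For illustration, here is a minimal in-memory round trip with GZIP (a sketch, suitable as the inner loop of a CPU benchmark; readAllBytes needs Java 9+):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Compress a byte[] and decompress it again, with no file I/O anywhere.
static byte[] compress(byte[] input) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
        gzip.write(input);
    }
    return baos.toByteArray();
}

static byte[] decompress(byte[] compressed) throws IOException {
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
        return gzip.readAllBytes(); // Java 9+; loop with a buffer on older JDKs
    }
}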

Instead of using a FileInputStream or FileOutputStream, you can use a ByteArrayInputStream and a ByteArrayOutputStream.

iText - OutOfMemory creating more than 1000 PDFs

I want to create a ZipOutputStream filled with PDF/A files. I'm using iText (version 5.5.7). With more than 1000 PDF entries I get an OutOfMemoryError on doc.close() and can't find the leak.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(baos));
zos.setEncoding("Cp850");
for (MyObject o : objects) {
    try {
        String pdfFilename = o.getName() + ".pdf";
        zos.putNextEntry(new ZipEntry(pdfFilename));
        pdfBuilder.buildPdfADocument(zos);
        zos.closeEntry();
    } ...
PdfBuilder
public void buildPdfADocument(org.apache.tools.zip.ZipOutputStream zos) {
    Document doc = new Document(PageSize.A4);
    PdfAWriter writer = PdfAWriter.getInstance(doc, zos, PdfAConformanceLevel.PDF_A_1B);
    writer.setCloseStream(false); // so it does not close my zos
    writer.setViewerPreferences(PdfWriter.ALLOW_PRINTING | PdfWriter.PageLayoutSinglePage);
    writer.createXmpMetadata();
    doc.open();
    // adding Elements to doc,
    // with flushContent() on PdfPTables
    InputStream sRGBprofile = servletContext.getResourceAsStream("/WEB-INF/conf/AdobeRGB1998.icc");
    ICC_Profile icc = ICC_Profile.getInstance(sRGBprofile);
    writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
    // try to close/flush everything possible
    doc.close();
    writer.setXmpMetadata(null);
    writer.flush();
    writer.close();
    if (sRGBprofile != null) {
        sRGBprofile.close();
    }
}
Any suggestions on how I can fix it? Am I forgetting something?
I've already tried the plain java.util.zip ZipOutputStream, but it doesn't make any difference.
Thanks for your answers! I understand the issue with the ByteArrayOutputStream, but I am not sure what the best approach is in my case. It's a web application, and I need to store the zip in a database blob somehow.
What I am doing now is creating the PDFs directly in the ZipOutputStream with iText and saving the byte array of the corresponding ByteArrayOutputStream to the blob. The options I see are:
Split my data into packages of 500 objects, save the first 500 PDFs to the database, then open the zip and add the next 500, and so on. But I assume that puts me in the same situation I have now, namely a too-large stream open in memory.
Save the PDFs on the server (not sure if there's enough space), create a temporary zip file, and then submit its bytes to the blob.
Any suggestions/ideas?
It's because your ZipOutputStream is backed by a ByteArrayOutputStream, so even closing the entries keeps the full ZIP contents in memory.
You need another approach for this number of documents (1000+ files).
In your example you are loading all the PDFs into memory; you would need to work in blocks of documents to limit that memory load.
Another approach is to serialize your PDFs to the filesystem and then create your zip file from there, as sketched below.
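A sketch of that temp-file approach (the PreparedStatement ps and its BLOB column are assumptions about your persistence layer):
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.zip.ZipOutputStream;

// Stream the ZIP to a temporary file instead of a ByteArrayOutputStream,
// so memory use stays flat no matter how many PDFs you add.
File tempZip = File.createTempFile("pdfs", ".zip");
try (ZipOutputStream zos = new ZipOutputStream(
        new BufferedOutputStream(new FileOutputStream(tempZip)))) {
    // ... putNextEntry / buildPdfADocument / closeEntry per PDF, as before ...
}

// Then hand the file to the database as a stream rather than a byte[]:
try (InputStream in = new BufferedInputStream(new FileInputStream(tempZip))) {
    ps.setBinaryStream(1, in, tempZip.length()); // standard JDBC 4 overload
    ps.executeUpdate();
}
tempZip.delete();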

Download a large file from HDFS

I was given a DataInputStream from an HDFS client for a large file (around 2GB), and I need to store it as a file on my host.
I was thinking about using apache common IOUtils and doing something like this...
File temp = getTempFile(localPath);
DataInputStream dis = HDFSClient.open(filepath); // around 2GB file (zipped)
InputStream in = new BufferedInputStream(dis);
OutputStream out = new FileOutputStream(temp);
IOUtils.copy(in, out);
I was looking for other solutions that might work better than this approach. My main concern is the buffering, both in the input stream and in IOUtils.copy.
For files larger than 2GB it is recommended to use IOUtils.copyLarge() (if we are speaking about the same IOUtils: org.apache.commons.io.IOUtils).
The copy in IOUtils uses a default buffer size of 4KB (although you can specify another buffer size as a parameter).
The difference between copy() and copyLarge() is the return value.
For copy(), if the stream is bigger than 2GB the copy will succeed, but the result is -1.
For copyLarge() the result is exactly the number of bytes copied.
See more in the documentation here:
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copyLarge(java.io.InputStream,%20java.io.OutputStream)
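A minimal sketch of the copyLarge variant, reusing dis and temp from the question and assuming Commons IO is on the classpath:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;

try (InputStream in = new BufferedInputStream(dis);
     OutputStream out = new BufferedOutputStream(new FileOutputStream(temp))) {
    long copied = IOUtils.copyLarge(in, out); // exact byte count, even past 2GB
    System.out.println("Copied " + copied + " bytes");
}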

java Files.readAllBytes(image.png) doesn't work

I was trying to read from one file and then write to another file. I use the code below to do so.
byte[] bytes = Files.readAllBytes(file1);
Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2), "UTF-8"));
for (int i = 0; i < bytes.length; i++)
    writer.write(bytes[i]);
writer.close();
But when I change file1 to picture.png and file2 to picture2.png, this method doesn't work and I can't open picture2.png using image viewer.
What have I done wrong?
Writers are for writing text, possibly in different encodings (i.e. UTF-8, UTF-16, etc). For writing raw bytes, don't use writers. Just use (File)OutputStreams.
It is truly as simple as
byte[] bytes = ...;
FileOutputStream fos = ...;
fos.write(bytes);
The other answers explain why what you have potentially fails.
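Filled out, that idea looks like this (a sketch; the file names are the ones from the question):
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

byte[] bytes = Files.readAllBytes(Paths.get("picture.png"));
try (FileOutputStream fos = new FileOutputStream("picture2.png")) {
    fos.write(bytes); // raw bytes, no charset conversion involved
}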
I'm curious why you're already using one Java NIO method, but not others? The library already has methods to do this for you.
byte[] bytes = Files.readAllBytes(file1);
Files.write(file2, bytes, StandardOpenOption.CREATE_NEW); // or relevant OpenOptions
or
FileOutputStream out = new FileOutputStream(file2); // or buffered
Files.copy(file1, out);
out.close();
or
Files.copy(file1, file2, options);
The problem is that Writer.write() doesn't take a byte. It takes a char, and the OutputStreamWriter then encodes each char to UTF-8, which for values above 127 produces more than one byte, so arbitrary binary data gets mangled on the way out.
But once you've got the whole thing read in as a byte[], you can just use Files.write() to send the whole array to a file in much the same way that you read it in:
Files.write(filename, bytes);
This is the more modern NIO idiom, rather than using an OutputStream.
It's worth reading the tutorial.

Reading , Writing unreadable file stream to save in database

I need to write a file stream to a database. The file content must be readable only through the program; opening the file manually should not display readable content. I decided to use ObjectOutputStream, as it is the binary writing mechanism in Java. But I can still see the string content when I open the file.
Writing to stream
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream os = new ObjectOutputStream(baos);
os.writeObject("HIIIIIIIIIIIIIIIIIIIIII HOW ARE YOU");
The created content looks like
’ t #HIIIIIIIIIIIIIIIIIIIIII HOW ARE YOU
How do I get output that is fully binary?
The file content must be readable only through the program; opening the file manually should not display readable content.
So you need some security.
I decided to use ObjectOutputStream, as it is the binary writing mechanism in Java.
That's (a) a non sequitur, and (b) security by obscurity: i.e. it is no security at all.
You should use encryption.
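For example, a minimal encryption sketch with the standard JCE (key management, i.e. where the key lives, is up to your application and is the hard part):
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// In practice you would load the key from a keystore, not generate it each run.
KeyGenerator keyGen = KeyGenerator.getInstance("AES");
keyGen.init(128);
SecretKey key = keyGen.generateKey();

// Fresh random IV per message; store it alongside the ciphertext.
byte[] iv = new byte[12];
new SecureRandom().nextBytes(iv);

Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
byte[] ciphertext = cipher.doFinal(
        "HIIIIIIIIIIIIIIIIIIIIII HOW ARE YOU".getBytes(StandardCharsets.UTF_8));
// ciphertext (plus the IV) is what you store in the database blob.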

Client-Server File Transfer in Java

I'm looking for an efficient way to transfer files between client and server processes using TCP in Java. My server code looks something like this:
socket = serverSocket.accept();
InputStream is = socket.getInputStream();
OutputStream os = socket.getOutputStream();
FileInputStream fis = new FileInputStream(new File(filename));
I'm just unsure how to proceed. I know I want to read bytes from fis and then write them to os, but I'm unsure about the best way to read and write bytes using byte streams in Java; I'm only familiar with reading and writing text using Readers and Writers. Can anyone tell me the appropriate way to do this? What should I wrap os and fis in (if anything), and how do I keep reading bytes until the end of the file without a hasNext() method (or equivalent)?
You could do something like:
byte[] contents = new byte[BUFFER_SIZE];
int numBytes = 0;
// read from the file, write to the socket
while ((numBytes = fis.read(contents)) > 0) {
    os.write(contents, 0, numBytes);
}
You could use Apache's IOUtils.copy(in, out) or
import org.apache.commons.fileupload.util.Streams;
...
Streams.copy(in, out, false);
Inspecting the source might prove interesting. ( http://koders.com ?)
There is java.nio.channels.FileChannel with a transferTo method; opinions in the community are mixed on whether it works better for smaller or larger files.
A simple block-wise copy between an InputStream and an OutputStream would be okay. You could wrap them in buffered streams. See the transferTo sketch below.
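For completeness, a server-side sketch using FileChannel.transferTo, assuming socket and filename as in the question (transferTo may move fewer bytes than requested, hence the loop):
import java.io.FileInputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

try (FileInputStream fis = new FileInputStream(filename)) {
    FileChannel in = fis.getChannel();
    WritableByteChannel out = Channels.newChannel(socket.getOutputStream());
    long pos = 0, size = in.size();
    while (pos < size) {
        pos += in.transferTo(pos, size - pos, out);
    }
}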
