Is there an efficient way to decompress bytearrays (gzip) to String? - java

I'm trying to decompress a large byte array with around 1,500,000 bytes.
The data is retrieved from a web service as a Base64 string, which I first decode to a byte array.
The byte array contains a compressed JSON string from which I want to read an object model via GSON.
My app is using around ~12 MB of RAM before I start my decompression method below:
public static String decompress(byte[] arr) {
    if (arr == null || arr.length == 0)
        return null;
    byte[] buffer = new byte[1024];
    try {
        ByteArrayOutputStream out = new ByteArrayOutputStream(arr.length);
        GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(arr));
        int len;
        while ((len = gzip.read(buffer)) > 0) {
            out.write(buffer, 0, len);
        }
        gzip.close();
        out.close();
        System.gc();
        return out.toString("UTF-8");
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    } catch (OutOfMemoryError e) {
        e.printStackTrace();
        return null;
    }
}
The OutOfMemoryError occurs while writing to the output stream.
I've already checked many SO questions, but the only hint I got was that coding it as a streaming process would help; this wasn't explained at all and I didn't find any leads on the net. So if somebody knows whether it is possible to do the decompression memory-efficiently, please explain it or link an article that covers the topic.
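
One way to do the decompression as a streaming process, sketched here under the assumption that the JSON maps to a hypothetical MyModel class: wrap the GZIPInputStream in a Reader and hand it straight to GSON, so the decompressed text is never materialized as a single String.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import com.google.gson.Gson;

// Sketch only: MyModel stands in for your actual GSON model class.
public static MyModel decompressToModel(byte[] arr) throws IOException {
    try (Reader reader = new InputStreamReader(
            new GZIPInputStream(new ByteArrayInputStream(arr)), StandardCharsets.UTF_8)) {
        // GSON consumes the reader incrementally; the full JSON string is never built in memory.
        return new Gson().fromJson(reader, MyModel.class);
    }
}

The compressed byte[] still has to fit in memory, but the decompressed text never exists as one big String, which is what usually triggers the OutOfMemoryError here.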

Related

IO Image reading and writing: Is writing array of bytes different from writing byte at a time using write(int b) method?

I am new to Java IO and tried to simply copy a photo file. I used two ways to achieve this: the first works nicely, but the second doesn't.
This code works fine:
try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    int n = 0;
    byte[] buf = new byte[4092];
    while ((n = input.read(buf)) != -1) {
        output.write(buf, 0, n);
        output.flush();
    }
} catch (IOException e) {
    System.out.println("Error: " + e.getMessage());
    e.printStackTrace();
}
But the second doesn't work. After the program finishes, the copy file has exactly the same size as the original, but when trying to open it, it shows a "format not supported" error.
try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    int byteRead = input.read();
    while (byteRead != -1) {
        byteRead = input.read();
        output.write(byteRead);
        output.flush();
    }
} catch (IOException e) {
    System.out.println("Error: " + e.getMessage());
    e.printStackTrace();
}
I don't understand where the problem is; the two samples seem to be doing the same thing.
Is reading into and writing from a byte array different from reading and writing a single byte at a time?
Isn't writing an int to a stream with the write(int b) method only writing the lowest 8 bits, and vice versa, as said in the documentation?
public abstract void write(int b) throws IOException
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
I hope someone can help.
You're not writing out the first byte: you call input.read() and check that it's not -1, but then call input.read() again before writing anything:
// Broken code
int byteRead = input.read();
while (byteRead != -1) {
    byteRead = input.read();
    output.write(byteRead);
    output.flush();
}
(As a side effect, the broken loop also writes the final -1, which write(int) truncates to the single byte 0xFF; that's why the copy ends up exactly the same size as the original even though the first byte is missing.) If you just move the next input.read() call to the end of the loop, it will work:
// Working code with duplication
int byteRead = input.read();
while (byteRead != -1) {
    output.write(byteRead);
    output.flush();
    byteRead = input.read();
}
Or you could combine the "read and test" to avoid duplication:
// Working code without duplication
int byteRead;
while ((byteRead = input.read()) != -1) {
    output.write(byteRead);
    output.flush();
}
However, this is still a very inefficient way of copying a stream. Copying a chunk at a time, as per your first code, is much more efficient (or using the built-in transferTo method if you're using Java 9 or higher, as rostamn79 notes).
Baeldung.com provides information on the stream.transferTo() method, which, depending on the stream implementations involved, can avoid an additional copy through the Java heap:
https://www.baeldung.com/java-inputstream-to-outputstream
Example code
@Test
public void givenUsingJavaNine_whenCopyingInputStreamToOutputStream_thenCorrect() throws IOException {
    String initialString = "Hello World!";
    try (InputStream inputStream = new ByteArrayInputStream(initialString.getBytes());
         ByteArrayOutputStream targetStream = new ByteArrayOutputStream()) {
        inputStream.transferTo(targetStream);
        assertEquals(initialString, new String(targetStream.toByteArray()));
    }
}
Note that transferTo is called on the source stream, with the target stream passed as its argument.
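Applied to the original photo-copy task, a minimal sketch (same file names as in the question) could look like this:

// Requires Java 9+ for InputStream.transferTo.
try (InputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     OutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    input.transferTo(output); // copies the whole stream in chunks internally
} catch (IOException e) {
    e.printStackTrace();
}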

How to save e-mail Attachment directly to Database without saving to HDD First?

I found a nice Java program that connects to my email server and fetches the content of new emails. This program also downloads the email attachments to my HDD. But I need to save the attachments (PDF, Excel, Word, images, etc.) directly to my database table (I am using an Oracle 12c database) instead of first saving them to the HDD and then uploading them.
I am a Java rookie programmer; any tips on my question are welcome.
Thanks!
Here is the code snippet that saves the attachments to the HDD:
public void procesMultiPart(Multipart content) {
    try {
        for (int i = 0; i < content.getCount(); i++) {
            BodyPart bodyPart = content.getBodyPart(i);
            Object o = bodyPart.getContent();
            if (o instanceof String) {
                System.out.println("procesMultiPart");
            } else if (null != bodyPart.getDisposition() && bodyPart.getDisposition().equalsIgnoreCase(Part.ATTACHMENT)) {
                String fileName = bodyPart.getFileName();
                System.out.println("fileName = " + fileName);
                InputStream inStream = bodyPart.getInputStream();
                FileOutputStream outStream = new FileOutputStream(new File(downloadDirectory + fileName));
                byte[] tempBuffer = new byte[4096]; // 4 KB
                int numRead;
                while ((numRead = inStream.read(tempBuffer)) != -1) {
                    outStream.write(tempBuffer, 0, numRead); // write only the bytes actually read
                }
                inStream.close();
                outStream.close();
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (MessagingException e) {
        e.printStackTrace();
    }
}
Caveat: I can't really test this but this is basically what you're looking for:
//----------snip
InputStream inStream = bodyPart.getInputStream();
// The out stream can be any output stream; I switch this to one that writes to memory (byte[]).
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
byte[] tempBuffer = new byte[4096]; // 4 KB
int numRead;
while ((numRead = inStream.read(tempBuffer)) != -1) {
    outStream.write(tempBuffer, 0, numRead); // write only the bytes actually read
}
// Handle the object here
byte[] attachment = outStream.toByteArray();
// Pseudo code begins
SQL.createAttachment(attachment); // I'm assuming there's a static method to do this
inStream.close();
outStream.close();
//-----------------snip
The code is literally the same; you just need to target the data correctly. That means having a connection to your database, writing some SQL (or using a framework) to insert into it, etc.
This is probably outside the scope of a single answer. How would I handle it? Probably something like this (I'm assuming you can open a connection and have all that working; I obviously don't have your schema):
static Connection oracle; // Pseudo code
// SQL class
public static void createAttachment(byte[] blob) {
    // exception handling skipped
    Query q = oracle.createQuery("INSERT INTO Attachments VALUES(?)");
    q.setParameter(0, blob);
    q.execute();
}
I hope that points you in the right direction. It isn't comprehensive, but it is a solution. It is also a bad design, but that probably isn't an issue for what you're working with. I'm not even addressing resource management here.
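For reference, a plain-JDBC version of that pseudo code might look like the sketch below. The table and column names (attachments, file_name, data) are made up for illustration; adjust them to your schema.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical schema: attachments(file_name VARCHAR2, data BLOB)
public static void createAttachment(Connection conn, String fileName, byte[] blob) throws SQLException {
    String sql = "INSERT INTO attachments (file_name, data) VALUES (?, ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setString(1, fileName);
        ps.setBytes(2, blob); // for very large files, ps.setBinaryStream(2, inStream) avoids building the byte[]
        ps.executeUpdate();
    }
}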

Why does getResourceAsStream() and reading file with FileInputStream return arrays of different length?

I want to read files as byte arrays and realized that the number of bytes read varies depending on the method used. Here is the relevant code:
public byte[] readResource() {
    try (InputStream is = getClass().getClassLoader().getResourceAsStream(FILE_NAME)) {
        int available = is.available();
        byte[] result = new byte[available];
        is.read(result, 0, available);
        return result;
    } catch (Exception e) {
        log.error("Failed to load resource '{}'", FILE_NAME, e);
    }
    return new byte[0];
}

public byte[] readFile() {
    File file = new File(FILE_PATH + FILE_NAME);
    try (InputStream is = new FileInputStream(file)) {
        int available = is.available();
        byte[] result = new byte[available];
        is.read(result, 0, available);
        return result;
    } catch (Exception e) {
        log.error("Failed to load file '{}'", FILE_NAME, e);
    }
    return new byte[0];
}
Calling File.length() and reading with the FileInputStream returns the correct length of 21566 bytes for the given test file, while reading the file as a resource returns 21622 bytes.
Does anyone know why I get different results and how to fix it so that readResource() returns the correct result?
Why does getResourceAsStream() and reading file with FileInputStream return arrays of different length?
Because you're misusing the available() method in a way that is specifically warned against in the Javadoc:
"It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream."
and
Does anyone know why I get different results and how to fix it so that readResource() returns the correct result?
Read in a loop until end of stream.
According to the API docs of InputStream, InputStream.available() does not return the size of the resource; it returns
an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking
To get the size of a resource from a stream, you need to fully read the stream, and count the bytes read.
To read the stream and return the contents as a byte array, you could do something like this:
try (InputStream is = getClass().getClassLoader().getResourceAsStream(FILE_NAME);
     ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = is.read(buffer)) != -1) {
        bos.write(buffer, 0, bytesRead);
    }
    return bos.toByteArray();
}
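On Java 9 or newer, InputStream.readAllBytes() does that loop for you; a minimal sketch:

try (InputStream is = getClass().getClassLoader().getResourceAsStream(FILE_NAME)) {
    return is.readAllBytes(); // reads until end of stream, regardless of what available() reports
}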

Java: How do I convert InputStream to GZIPInputStream?

I have a method like
public void put(@Nonnull final InputStream inputStream, @Nonnull final String uniqueId) throws PersistenceException {
    // a.) create gzip of inputStream
    final GZIPInputStream zipInputStream;
    try {
        zipInputStream = new GZIPInputStream(inputStream);
    } catch (IOException e) {
        e.printStackTrace();
        throw new PersistenceException("Persistence Service could not receive input stream to persist for " + uniqueId);
    }
I want to convert the inputStream into zipInputStream; what is the way to do that?
The above method is incorrect and throws an exception saying "Not a Zip Format".
Converting Java streams is really confusing to me and I never get them right.
The GZIPInputStream is to be used to decompress an incoming InputStream. To compress an incoming InputStream using GZIP, you basically need to write it to a GZIPOutputStream.
You can get a new InputStream out of it if you use ByteArrayOutputStream to write gzipped content to a byte[] and ByteArrayInputStream to turn a byte[] into an InputStream.
So, basically:
public void put(@Nonnull final InputStream inputStream, @Nonnull final String uniqueId) throws PersistenceException {
    final InputStream zipInputStream;
    try {
        ByteArrayOutputStream bytesOutput = new ByteArrayOutputStream();
        GZIPOutputStream gzipOutput = new GZIPOutputStream(bytesOutput);
        try {
            byte[] buffer = new byte[10240];
            for (int length = 0; (length = inputStream.read(buffer)) != -1;) {
                gzipOutput.write(buffer, 0, length);
            }
        } finally {
            try { inputStream.close(); } catch (IOException ignore) {}
            try { gzipOutput.close(); } catch (IOException ignore) {}
        }
        zipInputStream = new ByteArrayInputStream(bytesOutput.toByteArray());
    } catch (IOException e) {
        e.printStackTrace();
        throw new PersistenceException("Persistence Service could not receive input stream to persist for " + uniqueId);
    }
    // ...
}
You can, if necessary, replace the ByteArrayOutputStream/ByteArrayInputStream with a FileOutputStream/FileInputStream on a temporary file as created by File#createTempFile(), especially if those streams can contain large data which might overflow the machine's available memory when used concurrently.
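A minimal sketch of that temp-file variant, assuming the same put(...) context as above (on Java 9+, the buffer loop could equally be inputStream.transferTo(gzipOutput)):

// Compress to a temp file instead of an in-memory byte[].
File tempFile = File.createTempFile("gzip-", ".tmp");
tempFile.deleteOnExit();
try (GZIPOutputStream gzipOutput = new GZIPOutputStream(new FileOutputStream(tempFile))) {
    byte[] buffer = new byte[10240];
    for (int length; (length = inputStream.read(buffer)) != -1;) {
        gzipOutput.write(buffer, 0, length);
    }
}
InputStream zipInputStream = new FileInputStream(tempFile); // gzipped data, streamed from disk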
GZIPInputStream is for reading gzip-encoded content.
If your goal is to take a regular input stream and compress it in the GZIP format, then you need to write those bytes to a GZIPOutputStream.
See also this answer to a related question.

why initialize this byte array to 1024

I'm relatively new to Java and I'm attempting to write a simple Android app. I have a large text file with about 3500 lines in the assets folder of my application, and I need to read it into a string. I found a good example of how to do this, but I have a question about why the byte array is initialized to 1024. Wouldn't I want to initialize it to the length of my text file? Also, wouldn't I want to use char, not byte? Here is the code:
private void populateArray() {
    AssetManager assetManager = getAssets();
    InputStream inputStream = null;
    try {
        inputStream = assetManager.open("3500LineTextFile.txt");
    } catch (IOException e) {
        Log.e("IOException populateArray", e.getMessage());
    }
    String s = readTextFile(inputStream);
    // Add more code here to populate array from string
}

private String readTextFile(InputStream inputStream) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    byte buf[] = new byte[1024];
    int len;
    try {
        while ((len = inputStream.read(buf)) != -1) {
            outputStream.write(buf, 0, len);
        }
        outputStream.close();
        inputStream.close();
    } catch (IOException e) {
        Log.e("IOException readTextFile", e.getMessage());
    }
    return outputStream.toString();
}
EDIT: Based on your suggestions, I tried this approach. Is it any better? Thanks.
private void populateArray() {
    AssetManager assetManager = getAssets();
    InputStream inputStream = null;
    InputStreamReader iStreamReader = null;
    try {
        inputStream = assetManager.open("List.txt");
        iStreamReader = new InputStreamReader(inputStream, "UTF-8");
    } catch (IOException e) {
        Log.e("IOException populateArray", e.getMessage());
    }
    String s = readTextFile(iStreamReader);
    // more code here
}

private String readTextFile(InputStreamReader inputStreamReader) {
    StringBuilder sb = new StringBuilder();
    char buf[] = new char[2048];
    int read;
    try {
        do {
            read = inputStreamReader.read(buf, 0, buf.length);
            if (read > 0) {
                sb.append(buf, 0, read);
            }
        } while (read >= 0);
    } catch (IOException e) {
        Log.e("IOException readTextFile", e.getMessage());
    }
    return sb.toString();
}
This example is not good at all. It's full of bad practices (hiding exceptions, not closing streams in finally blocks, not specifying an explicit encoding, etc.). It uses a 1024-byte buffer because it doesn't have any way of knowing the length of the input stream.
Read the Java IO tutorial to learn how to read text from a file.
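For contrast, a version of readTextFile without those problems might look like this sketch (exceptions propagated, streams closed via try-with-resources, encoding explicit):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

private String readTextFile(InputStream inputStream) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
        char[] buf = new char[2048];
        int read;
        while ((read = reader.read(buf)) != -1) {
            sb.append(buf, 0, read);
        }
    }
    return sb.toString();
}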
You are reading the file into a buffer of 1024 bytes.
Then those 1024 bytes are written to outputStream.
This process repeats until the whole file has been read into the outputStream.
As JB Nizet mentioned, the example is full of bad practices.
Wouldn't I want to initialize it to the length of my text file? Also, wouldn't I want to use char, not byte?
Yes, and yes ... and as other answers have said, you've picked an example with a number of errors in it.
However, there is a theoretical problem doing both; i.e. setting the buffer length to the file length and using a character buffer rather than a byte buffer. The problem is that the file size is measured in bytes, but the size of the buffer needs to be measured in characters. This is normally fine, but it is theoretically possible that you will need more characters than the file size in bytes; e.g. if the input file used a 6 bit character set and packed 4 characters into 3 bytes.
To read from a file I usually use a Scanner and a StringBuilder.
Scanner scan = new Scanner(new BufferedInputStream(new FileInputStream(filename)), "UTF-8");
StringBuilder sb = new StringBuilder();
while (scan.hasNextLine()) {
    sb.append(scan.nextLine());
    sb.append("\n");
}
scan.close();
return sb.toString();
Try to throw your exceptions instead of swallowing them. The caller must know there was a problem reading your file.
Edit: Also note that using a BufferedInputStream is important. Otherwise it will try to read byte by byte, which can be slow.
