I would like to know what I'm supposed to do in the case of a gzip-encoded response. This is the method handling my responses:
private InputStream getResultStream(Response response) throws IOException
{
    InputStream resultStream = null;
    if (response != null)
    {
        String encoding = response.getHeader("Content-Encoding");
        if ((encoding != null) && encoding.equalsIgnoreCase("gzip")) {
            // What to do here?
            Log.d("Stream :", "Read GZIP");
        } else if ((encoding != null) && encoding.equalsIgnoreCase("deflate")) {
            resultStream = new InflaterInputStream(response.getStream(), new Inflater(true));
            Log.d("Stream :", "Read Deflated.");
        } else {
            resultStream = response.getStream();
            Log.d("Stream :", "Read Normal.");
        }
    }
    return resultStream;
}
How do I approach this?
Wrap your stream in a GZIPInputStream and read from that.
resultStream = new GZIPInputStream(response.getStream());
// proceed reading as usual
Disclaimer: I have not had a chance to test this.
According to Android's HTTP Clients blog post, if you are on Android 2.3+, HttpURLConnection can do this for you automatically.
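If you go that route, reading the response is just plain stream handling; a minimal sketch (untested), where readStream is a hypothetical consumer of the decompressed bytes:
HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
try {
    // On Android 2.3+ the platform sends "Accept-Encoding: gzip" and
    // transparently decompresses the response body for you.
    InputStream in = new BufferedInputStream(conn.getInputStream());
    readStream(in); // hypothetical consumer of the plain bytes
} finally {
    conn.disconnect();
}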
If you just want to know how to read a gzipped stream, wrap your InputStream in a GZIPInputStream:
InputStream is = ...
GZIPInputStream zis = new GZIPInputStream(new BufferedInputStream(is));
try {
    // Reading from 'zis' gets you the uncompressed bytes...
    processStream(zis);
} finally {
    zis.close();
}
Do the following:
resultStream = new java.util.zip.GZIPInputStream(response.getStream());
I have two services, frontend_service and backend_service. I'm getting a large file from backend_service and trying to forward it to the user via frontend_service using
response.getBodyAsStream(), but this causes "java.lang.OutOfMemoryError: GC overhead limit exceeded" in frontend_service.
Code for backend_service:
public static Result downloadLargeFile(String filePath) {
    File file = new File(filePath);
    InputStream inputStream = new FileInputStream(file);
    return ok(inputStream);
}
Code for frontend_service:
public static F.Promise<Result> downloadLargeFile(String filePath) {
    // this will call backend_service's downloadLargeFile method
    String backEndUrl = getBackEndUrl(filePath);
    return getInputStream(backEndUrl);
}
public static Promise<Result> getInputStream(String url) {
    return WS.url(url).get().map(
        response -> {
            InputStream inputStream = response.getBodyAsStream();
            return ok(inputStream);
        }
    );
}
I tried the solution suggested here: reading a few bytes at a time from the input stream, writing them to a tmp file in frontend_service, and sending the tmp file as the output of frontend_service.
public static Promise<Result> getInputStream(String url) {
    return WS.url(url).get().map(
        response -> {
            InputStream inputStream = null;
            OutputStream outputStream = null;
            try {
                inputStream = response.getBodyAsStream();
                // write the input stream to a tmp file
                final File tmpFile = new File("/tmp/tmp.txt");
                outputStream = new FileOutputStream(tmpFile);
                int read = 0;
                byte[] buffer = new byte[500];
                while ((read = inputStream.read(buffer)) != -1) {
                    outputStream.write(buffer, 0, read);
                }
                return ok(tmpFile);
            } catch (IOException e) {
                e.printStackTrace();
                return badRequest();
            } finally {
                if (inputStream != null) { inputStream.close(); }
                if (outputStream != null) { outputStream.close(); }
            }
        }
    );
}
The above code also throws java.lang.OutOfMemoryError. I'm trying a 1 GB file.
I do not have the implementation at hand, so I will describe the algorithm.
1. Play uses AsyncHttpClient under the hood of WS. You need to get hold of it, or create one as described in https://www.playframework.com/documentation/2.3.x/JavaWS#Using-WSClient
2. Then you need to implement an AsyncCompletionHandler, as in the description of the class: https://static.javadoc.io/org.asynchttpclient/async-http-client/2.0.0/org/asynchttpclient/AsyncHttpClient.html
3. In the onBodyPartReceived method of the AsyncCompletionHandler, push each body part to a chunked Play response, as in the sketch below. Chunked responses are described here: https://www.playframework.com/documentation/2.3.x/JavaStream#Chunked-responses
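A minimal sketch (untested) of these three steps, assuming the Play 2.3 Java API and the com.ning AsyncHttpClient it ships with; getBackEndUrl is the helper from the question, and creating a client per request is only for brevity:
import java.io.IOException;
import com.ning.http.client.AsyncCompletionHandler;
import com.ning.http.client.AsyncHandler;
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.HttpResponseBodyPart;
import com.ning.http.client.Response;
import play.mvc.Result;
import play.mvc.Results;
import play.mvc.Results.ByteChunks;
import play.mvc.Results.Chunks;

public static Result downloadLargeFile(String filePath) {
    final String backEndUrl = getBackEndUrl(filePath); // helper from the question
    Chunks<byte[]> chunks = new ByteChunks() {
        @Override
        public void onReady(final Chunks.Out<byte[]> out) {
            try {
                new AsyncHttpClient().prepareGet(backEndUrl).execute(new AsyncCompletionHandler<Void>() {
                    @Override
                    public AsyncHandler.STATE onBodyPartReceived(HttpResponseBodyPart part) {
                        // forward each body part as one chunk; the full file is never buffered
                        out.write(part.getBodyPartBytes());
                        return AsyncHandler.STATE.CONTINUE;
                    }

                    @Override
                    public Void onCompleted(Response response) {
                        out.close(); // terminates the chunked response
                        return null;
                    }
                });
            } catch (IOException e) {
                out.close();
            }
        }
    };
    return Results.ok(chunks);
}
This way the frontend holds only one body part in memory at a time instead of the whole 1 GB body.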
P.S.
A discussion of a similar solution, but in the opposite direction (streaming uploads to the "backend" (Amazon) service through the "frontend" (Play 2) service):
https://groups.google.com/d/msg/asynchttpclient/EpNKLSG9ymM/BAGvwl0Wby8J
I'm having issues with the following code:
try (
    InputStream is = new FileInputStream(file);
    BufferedReader br = new BufferedReader(
        new InputStreamReader(is,
            Charset.forName(SidFileUtils.charsetDetection(is))))
) {
    br.readLine();
    br.readLine();
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        lines.add(line);
    }
} catch (ExceptionTechnique | IOException e) {
    LOG.error("Erreur lors de la lecture du fichier " + file.getName(), e);
}
This part of the code, Charset.forName(...), is giving me a Stream Closed error. I think it's because I'm using the InputStream twice and it has already been consumed, but I'm not sure.
Can you help me understand what is wrong with this code, please?
Thanks a lot in advance!
Yes, charsetDetection has no option other than to read (and so consume) the stream. Some streams can mark and reset the read position, when the specific InputStream supports it.
if (in.markSupported()) {
    final int maxBytesNeededForDetection = 8192;
    in.mark(maxBytesNeededForDetection);
    // ... do the detection ...
    in.reset();
} else {
    throw new IllegalStateException("mark/reset not supported");
}
BufferedInputStream indeed supports it, but only up to the buffer size; otherwise an IOException("Resetting to invalid mark") is raised.
One should then specify the buffer size in the constructor.
In this case it seems no mark/reset is used by the detection, which is understandable, as such a technique only covers streams that support marking.
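A sketch (untested) of the mark/reset route applied to the question's code, assuming the detection neither closes the stream nor reads past the 8 KB mark limit; SidFileUtils.charsetDetection is the question's own helper:
try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(file), 8192)) {
    in.mark(8192);                                       // remember the start of the stream
    String detected = SidFileUtils.charsetDetection(in); // may consume up to 8192 bytes
    in.reset();                                          // rewind to the beginning
    BufferedReader br = new BufferedReader(new InputStreamReader(in, Charset.forName(detected)));
    br.readLine();
    br.readLine();
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        lines.add(line);
    }
}
If mark/reset is not an option, detect the charset on a throwaway stream and reopen the file afterwards: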
Charset charset = null;
try (InputStream is = new FileInputStream(file)) {
    charset = Charset.forName(SidFileUtils.charsetDetection(is));
}
if (charset != null) {
    // reopen the file and read it with the detected charset
    ...
}
I have a database that contains blobs, and a password-protected zip inside this database. Using the standard File object approach, I traditionally see:
File zipFile = new File("C:\\file.zip");
net.lingala.zip4j.core.ZipFile table = new net.lingala.zip4j.core.ZipFile(zipFile);
if (table.isEncrypted())
    table.setPassword(password);
net.lingala.zip4j.model.FileHeader entry = table.getFileHeader("file_inside_the_zip.txt");
return table.getInputStream(entry); // Decrypted input stream!
My question is: how do I implement something like this without temporary files, purely from an InputStream of the blob? So far I have something like this:
InputStream zipStream = getFileFromDataBase("stuff.zip");
//This point forward I have to save zipStream as a temporary file and use the traditional code above
I faced the same problem while processing a password protected zipped file in a Hadoop File System (HDFS). HDFS doesn't know about the File object.
This is what worked for me using zip4j:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path hdfsReadPath = new Path(zipFilePath); // like "hdfs://master/dir/sub/data/the.zip"
FSDataInputStream inStream = fs.open(hdfsReadPath);
ZipInputStream zipInputStream = new ZipInputStream(inStream, passWord.toCharArray());
LocalFileHeader zipEntry = null;
BufferedReader reader = new BufferedReader(new InputStreamReader(zipInputStream));
while ((zipEntry = zipInputStream.getNextEntry()) != null) {
    String entryName = zipEntry.getFileName();
    System.out.println(entryName);
    if (!zipEntry.isDirectory()) {
        String line;
        while ((line = reader.readLine()) != null) {
            // process the line
        }
    }
}
reader.close();
zipInputStream.close();
I believe it is not possible via zip4j as it is very centered around files.
Have a look at this one: http://blog.alutam.com/2012/03/31/new-library-for-reading-and-writing-zip-files-in-java/
There is a way you can achieve it with net.lingala.zip4j.io.inputstream.ZipInputStream, given a byte[] zipFile and a String password:
String zipPassword = "abcabc";
ZipInputStream innerZip = new ZipInputStream(new ByteArrayInputStream(zipFile), zipPassword.toCharArray());
Then you can loop over your non-protected zip in the same way (zipIs being a ZipInputStream over the outer archive):
LocalFileHeader zipEntry;
File zip = null;
while ((zipEntry = zipIs.getNextEntry()) != null) {
    zip = new File(file.getAbsolutePath(), zipEntry.getFileName());
    ....
}
public void extractWithZipInputStream(File zipFile, char[] password) throws IOException {
    LocalFileHeader localFileHeader;
    int readLen;
    byte[] readBuffer = new byte[4096];
    InputStream inputStream = new FileInputStream(zipFile);
    try (ZipInputStream zipInputStream = new ZipInputStream(inputStream, password)) {
        while ((localFileHeader = zipInputStream.getNextEntry()) != null) {
            File extractedFile = new File(localFileHeader.getFileName());
            try (OutputStream outputStream = new FileOutputStream(extractedFile)) {
                while ((readLen = zipInputStream.read(readBuffer)) != -1) {
                    outputStream.write(readBuffer, 0, readLen);
                }
            }
        }
    }
}
This method needs to be adapted to your needs; for example, you may have to change the output location. I have tried it and it works. For a better understanding, see https://github.com/srikanth-lingala/zip4j
I need some help with the problem below. I am working on a project where I need to deal with files.
I get a handle on an input stream from the user, and before writing it to disk I need to perform certain steps:
1. calculate the file digest
2. check that only one zip file is present, and unzip the data if zipped
3. dos2unix conversion
4. record length validation
5. encrypt and save the file to disk
I also need to break the flow if there is any exception in the process.
I tried to use piped output and input streams, but the constraint is that Java recommends running them in 2 separate threads, and once I have read from the input stream I cannot reuse it for the other processing steps. Files can be very big, so I cannot cache all the data in a buffer.
Please provide your suggestions, or point me to any third-party lib I can use for this.
The biggest issue is that you'll need to peek ahead in the provided InputStream to decide if you received a zipfile or not.
private boolean isZipped(InputStream is) throws IOException {
    try {
        return new ZipInputStream(is).getNextEntry() != null;
    } catch (final ZipException ze) {
        return false;
    }
}
After this you need to reset the input stream to the initial position before setting up a DigestInputStream.
Then read either a ZipInputStream or the DigestInputStream directly.
After you've done your processing, read the DigestInputStream to the end so you can obtain the digest.
The code below has been validated with a wrapping CountingInputStream that keeps track of the total number of bytes read from the provided FileInputStream.
final FileInputStream fis = new FileInputStream(filename);
final CountingInputStream countIs = new CountingInputStream(fis);
final boolean isZipped = isZipped(countIs);
// make sure we reset the inputstream before calculating the digest
fis.getChannel().position(0);
final DigestInputStream dis = new DigestInputStream(countIs, MessageDigest.getInstance("SHA-256"));
// decide which inputStream to use
InputStream is = null;
ZipInputStream zis = null;
if (isZipped) {
    zis = new ZipInputStream(dis);
    zis.getNextEntry();
    is = zis;
} else {
    is = dis;
}
final File tmpFile = File.createTempFile("Encrypted_", ".tmp");
final OutputStream os = new CipherOutputStream(new FileOutputStream(tmpFile), obtainCipher());
try {
    readValidateAndWriteRecords(is, os);
    failIf2ndZipEntryExists(zis);
} catch (final Exception e) {
    os.close();
    tmpFile.delete();
    throw e;
}
System.out.println("Digest: " + obtainDigest(dis));
dis.close();

System.out.println("\nValidating bytes read and calculated digest");
final DigestInputStream dis2 = new DigestInputStream(new CountingInputStream(new FileInputStream(filename)), MessageDigest.getInstance("SHA-256"));
System.out.println("Digest: " + obtainDigest(dis2));
dis2.close();
Not really relevant, but these are the helper methods:
private String obtainDigest(DigestInputStream dis) throws IOException {
    final byte[] buff = new byte[1024];
    // drain the remaining bytes; each read updates the digest
    while (dis.read(buff) != -1) {
        // nothing to do with the bytes themselves
    }
    return DatatypeConverter.printBase64Binary(dis.getMessageDigest().digest());
}
private void readValidateAndWriteRecords(InputStream is, final OutputStream os) throws IOException {
    final BufferedReader br = new BufferedReader(new InputStreamReader(is));
    // dos2unix conversion is done automatically by readLine
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        // record length validation
        if (line.length() < 1) {
            throw new RuntimeException("RecordLengthValidationFailed");
        }
        os.write((line + "\n").getBytes());
    }
}
private void failIf2ndZipEntryExists(ZipInputStream zis) throws IOException {
    if (zis != null && zis.getNextEntry() != null) {
        throw new RuntimeException("Zip File contains multiple entries");
    }
}
==> output:
Digest: jIisvDleAttKiPkyU/hDvbzzottAMn6n7inh4RKxPOc=
CountingInputStream closed. Total number of bytes read: 1100
Validating bytes read and calculated digest
Digest: jIisvDleAttKiPkyU/hDvbzzottAMn6n7inh4RKxPOc=
CountingInputStream closed. Total number of bytes read: 1072
Fun question, I may have gone overboard with my answer :)
I'm using Apache Commons Compress to create tar archives and decompress them. My problems start with this method:
private void decompressFile(File file) throws IOException {
    logger.info("Decompressing " + file.getName());
    BufferedOutputStream outputStream = null;
    TarArchiveInputStream tarInputStream = null;
    try {
        tarInputStream = new TarArchiveInputStream(new FileInputStream(file));
        TarArchiveEntry entry;
        while ((entry = tarInputStream.getNextTarEntry()) != null) {
            if (!entry.isDirectory()) {
                File compressedFile = entry.getFile();
                File tempFile = File.createTempFile(compressedFile.getName(), "");
                byte[] buffer = new byte[BUFFER_MAX_SIZE];
                outputStream = new BufferedOutputStream(new FileOutputStream(tempFile), BUFFER_MAX_SIZE);
                int count = 0;
                while ((count = tarInputStream.read(buffer, 0, BUFFER_MAX_SIZE)) != -1) {
                    outputStream.write(buffer, 0, count);
                }
            }
            deleteFile(file);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (outputStream != null) {
            outputStream.flush();
            outputStream.close();
        }
    }
}
Every time I run the code, the compressedFile variable is null, even though the while loop iterates over all entries in my test tar.
Could you help me to understand what I'm doing wrong?
From the official documentation
Reading entries from a tar archive:
TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
tarInput.read(content, offset, content.length - offset);
}
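A runnable reading of that metacode (a sketch, untested): read() may return fewer bytes than requested, so accumulate an offset until the whole entry of entry.getSize() bytes has been consumed:
TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
while (offset < content.length) {
    int n = tarInput.read(content, offset, content.length - offset);
    if (n == -1) {
        break; // truncated archive
    }
    offset += n;
}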
I have written an example starting from your implementation, tested with a very trivial .tar (just one text entry).
Not knowing the exact requirement, I only solve the problem of reading the archive while avoiding the null pointer. Debugging shows the entry is available, as you also found:
private static void decompressFile(File file) throws IOException {
    BufferedOutputStream outputStream = null;
    TarArchiveInputStream tarInputStream = null;
    try {
        tarInputStream = new TarArchiveInputStream(new FileInputStream(file));
        TarArchiveEntry entry;
        while ((entry = tarInputStream.getNextTarEntry()) != null) {
            if (!entry.isDirectory()) {
                File compressedFile = entry.getFile();
                String name = entry.getName();
                int size = 0;
                int c;
                while (size < entry.getSize()) {
                    c = tarInputStream.read();
                    System.out.print((char) c);
                    size++;
                }
                (.......)
As I said, I tested with a tar including only a text entry (you can also try this approach to verify the code) to be sure that the null is avoided.
You will need to make all the adaptations needed for your real case.
Clearly, you will have to handle the streams as in the metacode I posted at the top; it shows how to deal with the single entries.
Try using the getNextEntry() method instead of the getNextTarEntry() method.
The latter returns a TarArchiveEntry, which is probably not what you want!