Unable to decompress gzipped files after uploading input stream chunks in S3 - java

I'd like to take my input stream and upload gzipped parts to s3 in a similar fashion to the multipart uploader.
However, I want to store the individual file parts in S3 and not turn the parts into a single file.
To do so, I have created the following methods.
But when I try to decompress each part, gzip throws an error: gzip: file_part_2.log.gz: not in gzip format.
I'm not sure whether I am compressing each part correctly.
If I re-initialise the GZIPOutputStream (gzip = new GZIPOutputStream(baos);) and call gzip.finish() after resetting the byte array output stream (baos.reset();), then I am able to decompress each part. I'm not sure why I need to do this; is there a similar reset for the GZIPOutputStream?
public void upload(String bucket, String key, InputStream is, int partSize) throws Exception
{
    String row;
    BufferedReader br = new BufferedReader(new InputStreamReader(is, ENCODING));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(baos);

    int partCounter = 0;
    int lineCounter = 0;
    while ((row = br.readLine()) != null) {
        if (baos.size() >= partSize) {
            partCounter = this.uploadChunk(bucket, key, baos, partCounter);
            baos.reset();
        } else if (!row.equals("")) {
            row += '\n';
            gzip.write(row.getBytes(ENCODING));
            lineCounter++;
        }
    }

    gzip.finish();
    br.close();
    baos.close();

    if (lineCounter == 0) {
        throw new Exception("Aborting upload, file contents is empty!");
    }

    // Final chunk
    if (baos.size() > 0) {
        this.uploadChunk(bucket, key, baos, partCounter);
    }
}
private int uploadChunk(String bucket, String key, ByteArrayOutputStream baos, int partCounter)
{
    ObjectMetadata metaData = new ObjectMetadata();
    metaData.setContentLength(baos.size());

    String[] path = key.split("/");
    String[] filename = path[path.length - 1].split("\\.");
    filename[0] = filename[0] + "_part_" + partCounter;
    path[path.length - 1] = String.join(".", filename);

    amazonS3.putObject(
        bucket,
        String.join("/", path),
        new ByteArrayInputStream(baos.toByteArray()),
        metaData
    );

    log.info("Upload chunk {}, size: {}", partCounter, baos.size());

    return partCounter + 1;
}

The problem is that you're using a single GZIPOutputStream for all chunks. So you're actually writing pieces of one gzipped file, which would have to be recombined to be useful.
Making the minimal change to your existing code:
if (baos.size() >= partSize) {
    gzip.close();
    partCounter = this.uploadChunk(bucket, key, baos, partCounter);
    baos = new ByteArrayOutputStream();
    gzip = new GZIPOutputStream(baos);
}
You need to do the same at the end of the loop. Also, you shouldn't throw an exception if the line counter is 0: it's entirely possible that the file is exactly divisible into a set number of chunks.
To improve the code, I would wrap the GZIPOutputStream in an OutputStreamWriter and a BufferedWriter, so that you don't need to do the string-bytes conversion explicitly.
And lastly, don't use ByteArrayOutputStream.reset(). It doesn't save you anything over just creating a new stream, and opens the door for errors if you ever forget to reset.
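Putting those suggestions together, a rough sketch of what the revised method could look like, with one GZIPOutputStream per part and the string conversion handled by an OutputStreamWriter/BufferedWriter. It reuses the ENCODING constant and uploadChunk method from the question and is illustrative rather than tested; note that the size check is only approximate, because both the writer and the gzip stream buffer data before it reaches the ByteArrayOutputStream:
public void upload(String bucket, String key, InputStream is, int partSize) throws Exception
{
    BufferedReader br = new BufferedReader(new InputStreamReader(is, ENCODING));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(baos);
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(gzip, ENCODING));

    int partCounter = 0;
    int linesInPart = 0;
    String row;
    while ((row = br.readLine()) != null) {
        if (!row.isEmpty()) {
            writer.write(row);
            writer.write('\n');
            linesInPart++;
        }
        if (baos.size() >= partSize) {
            writer.close();                       // finishes the gzip stream for this part
            partCounter = this.uploadChunk(bucket, key, baos, partCounter);
            baos = new ByteArrayOutputStream();   // fresh stream per part
            gzip = new GZIPOutputStream(baos);
            writer = new BufferedWriter(new OutputStreamWriter(gzip, ENCODING));
            linesInPart = 0;
        }
    }
    br.close();

    writer.close();                               // finish the last part
    if (linesInPart > 0) {                        // skip upload if nothing went into the last part
        this.uploadChunk(bucket, key, baos, partCounter);
    }
}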

Related

Part of HttpServletRequest body input stream is not read when using InputStream#read(byte[])

I am trying to create a GitHub webhook. It sends a payload every time I publish a new package to one of my repositories. My issue is that I cannot seem to read the whole body: it gets cut off at the same number of bytes each time. However, I can see the whole body if I read it using HttpServletRequest#getReader(). Is there something I am doing wrong when trying to read the input stream?
Here is the code for reading the body:
byte[] bodyBytes = new byte[request.getContentLength()];
System.out.println(request.getContentLength());
request.getInputStream().read(bodyBytes);
//System.out.println(request.getReader().readLine()); //works correctly

try (FileWriter fw = new FileWriter(new File("./payload.txt"))) {
    for (byte i : bodyBytes)
        fw.write("0x" + String.format("%02x ", i) + " ");
    fw.write("\n\n\n");
    fw.write(new String(bodyBytes));
}
As per the Javadocs, InputStream.read(byte[]) will read at least one byte, when available, and at most as many as the size of the byte array argument. It may read less for any reason, in which case you have to call it repeatedly to get the entire content. Simplest case: write to a ByteArrayOutputStream:
byte[] buf = new byte[1024];
int r;
InputStream is = request.getInputStream();
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    while ((r = is.read(buf)) >= 0) {
        baos.write(buf, 0, r);
    }
    // the bytes are now accessible:
    byte[] entireContent = baos.toByteArray();
}
This is the principle; it has the disadvantage that it stores the entire content into memory. You may want to process each "batch" of the input and write it to the file instead of in memory, e.g. as:
byte[] buf = new byte[1024];
int r, i;
InputStream is = request.getInputStream();
try (FileWriter fw = new FileWriter(new File("./payload.txt"))) {
    while ((r = is.read(buf)) >= 0) {
        for (i = 0; i < r; i++) {
            fw.write("0x" + String.format("%02x ", buf[i]) + " ");
        }
        // *************** NOTE ****************************
        // Apparently you need the entire content as well, so
        // this kind of streaming does not apply in this case.
        // You have to store the entire content in memory.
        // Keeping the code here as an example/reference.
    }
}
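As a side note, if you are on Java 9 or newer, the read-until-EOF loop can be replaced by InputStream.readAllBytes(), which performs the same accumulation for you. A minimal sketch:
// Java 9+: readAllBytes() loops internally until EOF,
// so partial reads are no longer your problem to handle.
byte[] entireContent = request.getInputStream().readAllBytes();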

SAXParseException question (how to strip any BOM characters from an xml file)

I have some data in an XML file and I am using the Processing library to parse that file. I ran into the BOM marker issue, which caused some errors to be thrown. I found a workaround elsewhere, but it is very slow: I'm using Apache Commons BOMInputStream to read the file as a bunch of bytes, after skipping the ones that represent the BOM data.
I think that the source of my problem is actually my lack of knowledge about streams, readers and writers. There are so many different readers and writers and all kinds of "streams" (a word I barely understand) that I want to pull my hair out trying to figure out which one to use and how. I think I just picked the wrong implementation.
Question:
Can someone show me why my code is so slow, and also help me improve my understanding of file i/o?
Code:
private static XML noBOM(String filename, PApplet p) throws FileNotFoundException, IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    File f = new File(filename);
    InputStream stream = new FileInputStream(f);
    BOMInputStream bomIn = new BOMInputStream(stream);
    int tmp = -1;
    while ((tmp = bomIn.read()) != -1) {
        out.write(tmp);
    }
    String strXml = out.toString();
    return p.parseXML(strXml);
}
public static Map<String, Float> lifeExpectancyFromXML(String filename, PApplet p,
        int year) throws FileNotFoundException, IOException {
    Map<String, Float> dataMap = new HashMap<>();
    XML xml = noBOM(filename, p);
    if (xml != null) {
        XML[] records = xml.getChild("data").getChildren("record");
        for (XML record : records) {
            XML[] fields = record.getChildren("field");
            String country = fields[0].getContent();
            int entryYear = fields[2].getIntContent();
            float lifeEx = fields[3].getFloatContent();
            if (entryYear == year) {
                System.out.println("Country: " + country);
                System.out.println("Life Expectency: " + lifeEx);
                dataMap.put(country, lifeEx);
            }
        }
    } else {
        System.out.println("String could not be parsed.");
    }
    return dataMap;
}
The problem is probably that the InputStream is read byte by byte. Try using a buffer to make it more performant:
try (BOMInputStream bis = new BOMInputStream(new FileInputStream(new File(filename)))) {
    byte[] buffer = new byte[1000];
    int n;
    while ((n = bis.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
}
Updated:
Be careful to write only the bytes actually read (out.write(buffer, 0, n)). If you write the whole buffer on every pass, the resulting ByteArrayOutputStream can end with stale bytes, and trimming the resulting string (out.toString("UTF-8").trim()) is only a partial workaround.
My solution was to use BufferedReader instead of creating my own buffer. It made everything quite speedy:
private static XML noBOM(String path, PApplet p) throws
        FileNotFoundException, UnsupportedEncodingException, IOException {
    //set default encoding
    String defaultEncoding = "UTF-8";
    //create BOMInputStream to get rid of any Byte Order Mark
    BOMInputStream bomIn = new BOMInputStream(new FileInputStream(path));
    //If BOM is present, determine encoding. If not, use UTF-8
    ByteOrderMark bom = bomIn.getBOM();
    String charSet = bom == null ? defaultEncoding : bom.getCharsetName();
    //get buffered reader for speed
    InputStreamReader reader = new InputStreamReader(bomIn, charSet);
    BufferedReader breader = new BufferedReader(reader);
    //Build string to parse into XML using Processing's PApplet.parseXML
    StringBuilder buildXML = new StringBuilder();
    int c;
    while ((c = breader.read()) != -1) {
        buildXML.append((char) c);
    }
    reader.close();
    return p.parseXML(buildXML.toString());
}
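For comparison, the whole read-into-a-string step can also be done with IOUtils from Apache Commons IO, the same library that provides BOMInputStream. This is just an illustrative alternative, not the poster's code:
// Illustrative alternative: IOUtils (org.apache.commons.io.IOUtils) drains the
// stream into a String using the charset detected from the BOM.
private static XML noBOM(String path, PApplet p) throws IOException {
    try (BOMInputStream bomIn = new BOMInputStream(new FileInputStream(path))) {
        ByteOrderMark bom = bomIn.getBOM();
        String charSet = bom == null ? "UTF-8" : bom.getCharsetName();
        return p.parseXML(IOUtils.toString(bomIn, charSet));
    }
}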

Inline input stream processing in Java

I need some help with the problem below. I am working on a project where I need to deal with files.
I get a handle to an input stream from the user, and before writing it to disk I need to perform certain steps:
calculate the file digest
check that only one zip file is present, and unzip the data if it is zipped
dos2unix conversion
record length validation
encrypt and save the file to disk
I also need to break the flow if there is any exception in the process.
I tried to use a piped output and input stream, but the constraint is that Java recommends running them in two separate threads, and once I read from the input stream I am not able to use it in the other processing steps. The files can be very big, so I cannot cache all the data in a buffer.
Please provide your suggestions, or is there any third-party lib I can use for this?
The biggest issue is that you'll need to peek ahead in the provided InputStream to decide if you received a zipfile or not.
private boolean isZipped(InputStream is) throws IOException {
    try {
        return new ZipInputStream(is).getNextEntry() != null;
    } catch (final ZipException ze) {
        return false;
    }
}
After this you need to reset the InputStream to its initial position before setting up a DigestInputStream.
Then read from a ZipInputStream or the DigestInputStream directly.
After you've done your processing, read the DigestInputStream to the end so you can obtain the digest.
The code below has been validated through a wrapping CountingInputStream that keeps track of the total number of bytes read from the provided FileInputStream.
final FileInputStream fis = new FileInputStream(filename);
final CountingInputStream countIs = new CountingInputStream(fis);

final boolean isZipped = isZipped(countIs);
// make sure we reset the inputstream before calculating the digest
fis.getChannel().position(0);

final DigestInputStream dis = new DigestInputStream(countIs, MessageDigest.getInstance("SHA-256"));

// decide which inputStream to use
InputStream is = null;
ZipInputStream zis = null;
if (isZipped) {
    zis = new ZipInputStream(dis);
    zis.getNextEntry();
    is = zis;
} else {
    is = dis;
}

final File tmpFile = File.createTempFile("Encrypted_", ".tmp");
final OutputStream os = new CipherOutputStream(new FileOutputStream(tmpFile), obtainCipher());
try {
    readValidateAndWriteRecords(is, os);
    failIf2ndZipEntryExists(zis);
} catch (final Exception e) {
    os.close();
    tmpFile.delete();
    throw e;
}

System.out.println("Digest: " + obtainDigest(dis));
dis.close();

System.out.println("\nValidating bytes read and calculated digest");
final DigestInputStream dis2 = new DigestInputStream(new CountingInputStream(new FileInputStream(filename)), MessageDigest.getInstance("SHA-256"));
System.out.println("Digest: " + obtainDigest(dis2));
dis2.close();
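A side note on the reset step: if the source is not a FileInputStream, fis.getChannel().position(0) is not available. In that case mark()/reset() on a BufferedInputStream can serve the same purpose. A rough, untested sketch (userProvidedStream is a placeholder name):
// Sketch only: BufferedInputStream supports mark()/reset() as long as no more
// than the given readLimit bytes are read before reset() is called. The zip
// probe in isZipped() only needs to read a small local header, so 64 KB is a
// generous limit.
InputStream in = new BufferedInputStream(userProvidedStream);
in.mark(64 * 1024);                    // readLimit must cover the zip probe
boolean zipped = isZipped(in);
in.reset();                            // back to the first byte
DigestInputStream dis = new DigestInputStream(in, MessageDigest.getInstance("SHA-256"));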
Not really relevant, but these are the helper methods:
private String obtainDigest(DigestInputStream dis) throws IOException {
    final byte[] buff = new byte[1024];
    while (dis.read(buff) > 0) {
        // keep reading until EOF so the digest covers the whole stream
    }
    return DatatypeConverter.printBase64Binary(dis.getMessageDigest().digest());
}
private void readValidateAndWriteRecords(InputStream is, final OutputStream os) throws IOException {
    final BufferedReader br = new BufferedReader(new InputStreamReader(is));
    // dos2unix is done automatically by readLine
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        // record length validation
        if (line.length() < 1) {
            throw new RuntimeException("RecordLengthValidationFailed");
        }
        os.write((line + "\n").getBytes());
    }
}
private void failIf2ndZipEntryExists(ZipInputStream zis) throws IOException {
    if (zis != null && zis.getNextEntry() != null) {
        throw new RuntimeException("Zip File contains multiple entries");
    }
}
==> output:
Digest: jIisvDleAttKiPkyU/hDvbzzottAMn6n7inh4RKxPOc=
CountingInputStream closed. Total number of bytes read: 1100
Validating bytes read and calculated digest
Digest: jIisvDleAttKiPkyU/hDvbzzottAMn6n7inh4RKxPOc=
CountingInputStream closed. Total number of bytes read: 1072
Fun question, I may have gone overboard with my answer :)

InputStream returns unexpected -1/empty

I keep hitting an unexpected end of my file. The file contains a couple of strings first, then byte data.
The file contains a few separated strings, which my code reads correctly.
However, when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with my use of the Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);

String line;
{
    line = br.readLine();
    String split[] = line.split(" ");
    if (!split[0].equals("#binvox")) {
        ErrorHandler.log("Not a binvox file");
        return false;
    }
    ErrorHandler.log("Binvox version: " + split[1]);
}

ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
    buffer.write(data, 0, nRead);
    cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as follows:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
For reading the whole String you should do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
lines.add(line);
}
and then you can loop over the lines to split each one, or just do whatever you have to do inside the loop.
As icza has already written, you can't create an InputStream and a BufferedReader and use both. The BufferedReader will read from the InputStream as much as it wants, and then you can't access your data from the InputStream.
You have several ways to fix it:
Don't use any Reader. Read the bytes yourself from an InputStream and call new String(bytes) on it.
Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. That'll look like this:
public byte[] readBytes(BufferedReader in) throws IOException
{
    String base64 = in.readLine(); // Note that a Base64 representation never contains \n
    byte[] data = Base64.getDecoder().decode(base64);
    return data;
}
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
The suggested solution is not to mix text and binary data in one file. They should be stored in 2 separate files and then they can be read separately. If the remaining data is not binary, then you should not read it via the InputStream but via your wrapper BufferedReader, just as you read the first lines.
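If the header and the binary payload do have to live in one file, one way to avoid the read-ahead problem is to read the header lines byte by byte from the InputStream itself, without any Reader (option 1 above), and only then switch to bulk reads. A rough, untested sketch; readLine here is a helper of my own, not a library call:
// Reads one '\n'-terminated header line directly from the stream,
// consuming exactly the bytes of that line and nothing more.
static String readLine(InputStream in) throws IOException {
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1 && b != '\n') {
        line.write(b);
    }
    return line.toString("US-ASCII").trim(); // binvox headers are ASCII; trim drops a stray '\r'
}

// Usage: read header lines until "data", then read the voxel bytes in bulk.
String first = readLine(in);               // e.g. "#binvox 1"
String header;
while (!(header = readLine(in)).equals("data")) {
    // parse "dim", "translate", "scale" lines here
}
byte[] voxels = new byte[16384];
int n = in.read(voxels);                   // a single read may not fill the array; loop in real code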
I recommend creating a BinvoxDetectorStream that pre-reads some bytes:
public class BinvoxDetectorStream extends InputStream {
    private InputStream orig;
    private byte[] buffer = new byte[4096];
    private int buflen;
    private int bufpos = 0;

    public BinvoxDetectorStream(InputStream in) throws IOException {
        this.orig = new BufferedInputStream(in);
        this.buflen = orig.read(this.buffer, 0, this.buffer.length);
    }

    public BinvoxInfo getBinvoxVersion() throws IOException {
        // creating a reader for the buffered bytes, to read a line, and compare the header
        ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
        BufferedReader rdr = new BufferedReader(new InputStreamReader(bais));
        String line = rdr.readLine();
        String split[] = line.split(" ");
        if (split[0].equals("#binvox")) {
            BinvoxInfo info = new BinvoxInfo();
            info.version = split[1];
            split = rdr.readLine().split(" ");
            // ... parse all properties ...
            // seek for "data\r\n" in the buffered data
            while (!(bufpos >= 6 &&
                     buffer[bufpos - 6] == 'd' &&
                     buffer[bufpos - 5] == 'a' &&
                     buffer[bufpos - 4] == 't' &&
                     buffer[bufpos - 3] == 'a' &&
                     buffer[bufpos - 2] == '\r' &&
                     buffer[bufpos - 1] == '\n')) {
                bufpos++;
            }
            return info;
        }
        return null;
    }

    @Override
    public int read() throws IOException {
        if (bufpos < buflen) {
            return buffer[bufpos++] & 0xFF; // mask to 0-255 per the InputStream contract
        }
        return orig.read();
    }
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxVersion();
if (info == null) {
    return false;
}
...
[moving bytes in the usual way, but using bds!!! ]
This way we preserve the original bytes in bds, so we'll be able to copy it later.
I saw someone else's code that solved exactly this.
He/she used DataInputStream, which can do a readLine (although deprecated) and readByte.
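For illustration only (this is not the code referred to above), that approach could look roughly like the sketch below. DataInputStream.readLine() is deprecated because it does not convert bytes to characters properly, but for a plain ASCII header it leaves the stream positioned exactly at the start of the binary data:
// Rough sketch: read the ASCII header with the (deprecated) readLine(),
// then read the binary voxel data from the very same stream.
// readLine() strips the line terminator; readAllBytes() needs Java 9+.
DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
String magic = dis.readLine();             // e.g. "#binvox 1"
String header;
while (!(header = dis.readLine()).equals("data")) {
    // parse "dim", "translate", "scale" lines here
}
byte[] voxelData = dis.readAllBytes();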

Prepend lines to file in Java

Is there a way to prepend a line to a file in Java, without creating a temporary file and writing the needed content to it?
No, there is no way to do that SAFELY in Java. (Or AFAIK, any other programming language.)
No filesystem implementation in any mainstream operating system supports this kind of thing, and you won't find this feature supported in any mainstream programming languages.
Real world file systems are implemented on devices that store data as fixed sized "blocks". It is not possible to implement a file system model where you can insert bytes into the middle of a file without significantly slowing down file I/O, wasting disk space or both.
The solutions that involve an in-place rewrite of the file are inherently unsafe. If your application is killed or the power dies in the middle of the prepend / rewrite process, you are likely to lose data. I would NOT recommend using that approach in practice.
Use a temporary file and rename it. That is safer.
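As a sketch of that approach (illustrative only; prependLine and its arguments are placeholder names), java.nio.file makes the write-then-replace pattern fairly compact:
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch: prepend a line by writing a temp file in the same
// directory and then moving it over the original. Assumes the target file
// has a parent directory; add StandardCopyOption.ATOMIC_MOVE where the
// filesystem supports it.
static void prependLine(Path file, String line) throws IOException {
    Path tmp = Files.createTempFile(file.getParent(), "prepend", ".tmp");
    try (OutputStream out = Files.newOutputStream(tmp)) {
        out.write((line + System.lineSeparator()).getBytes());
        Files.copy(file, out);   // stream the original content after the new line
    }
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
}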
There is a way; it involves rewriting the whole file, though (but no temporary file). As others mentioned, no file system supports prepending content to a file. Here is some sample code that uses a RandomAccessFile to write and read content while keeping some content buffered in memory:
public static void main(final String args[]) throws Exception {
    File f = File.createTempFile(Main.class.getName(), "tmp");
    f.deleteOnExit();
    System.out.println(f.getPath());

    // put some dummy content into our file
    BufferedWriter w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f)));
    for (int i = 0; i < 1000; i++) {
        w.write(UUID.randomUUID().toString());
        w.write('\n');
    }
    w.flush();
    w.close();

    // prepend "some uuids" to our file
    int bufLength = 4096;
    byte[] appendBuf = "some uuids\n".getBytes();
    byte[] writeBuf = appendBuf;
    byte[] readBuf = new byte[bufLength];

    int writeBytes = writeBuf.length;

    RandomAccessFile rw = new RandomAccessFile(f, "rw");
    int read = 0;
    int write = 0;
    while (true) {
        // seek to read position and read content into read buffer
        rw.seek(read);
        int bytesRead = rw.read(readBuf, 0, readBuf.length);

        // seek to write position and write content from write buffer
        rw.seek(write);
        rw.write(writeBuf, 0, writeBytes);

        // no bytes read - end of file reached
        if (bytesRead < 0) {
            break;
        }

        // update seek positions for write and read
        read += bytesRead;
        write += writeBytes;
        writeBytes = bytesRead;

        // reuse buffer, create new one to replace (short) append buf
        byte[] nextWrite = writeBuf == appendBuf ? new byte[bufLength] : writeBuf;
        writeBuf = readBuf;
        readBuf = nextWrite;
    }
    rw.close();

    // now show the content of our file
    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}
You could store the file content in a String and prepend the desired line by using a StringBuilder: put the desired line first and then append the file-content String.
No extra temporary file needed.
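A minimal sketch of that idea, assuming Java 11+ for Files.readString/writeString; note that for big files this holds the whole content in memory:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: read everything, build the new content with the line in front,
// then overwrite the file. Simple, but the entire file is held in memory
// and a crash mid-write can leave the file truncated.
static void prependInMemory(Path file, String line) throws IOException {
    String content = Files.readString(file);
    StringBuilder sb = new StringBuilder();
    sb.append(line).append(System.lineSeparator()).append(content);
    Files.writeString(file, sb.toString());
}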
No. There are no "intra-file shift" operations, only read and write of discrete sizes.
It would be possible to do so by reading a chunk of the file of equal length to what you want to prepend, writing the new content in place of it, reading the later chunk and replacing it with what you read before, and so on, rippling down to the end of the file.
However, don't do that, because if anything stops (out-of-memory, power outage, rogue thread calling System.exit) in the middle of that process, data will be lost. Use the temporary file instead.
private static void addPrependText(File fileName) {
    FileOutputStream fileOutputStream = null;
    BufferedReader br = null;
    FileReader fr = null;
    String newFileName = fileName.getAbsolutePath() + "#";
    try {
        fileOutputStream = new FileOutputStream(newFileName);
        fileOutputStream.write("prependTextDataHere".getBytes());
        fr = new FileReader(fileName);
        br = new BufferedReader(fr);
        String sCurrentLine;
        while ((sCurrentLine = br.readLine()) != null) {
            fileOutputStream.write(("\n" + sCurrentLine).getBytes());
        }
        fileOutputStream.flush();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (fileOutputStream != null)
                fileOutputStream.close();
            if (br != null)
                br.close();
            if (fr != null)
                fr.close();
            new File(newFileName).renameTo(new File(newFileName.replace("#", "")));
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
