Upload to S3 using Gzip in Java

I'm new to Java and I'm trying to upload a large file (~10 GB) to Amazon S3. Could anyone please help me with how to use a GZIP output stream for it?
I've been through some documentation but got confused about byte streams and gzip streams. Must they be used together? Can anyone help me with this piece of code?
Thanks in advance.

Have a look at this question:
"Is it possible to gzip and upload this string to Amazon S3 without ever being written to disk?"
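import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;
ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
try (GZIPOutputStream gzipOut = new GZIPOutputStream(byteOut)) {
    // write your stuff to gzipOut
}
// closing gzipOut writes the gzip trailer, so only now is byteOut complete
byte[] bites = byteOut.toByteArray();
// write the bites to the Amazon stream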
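Since it's a large file (~10 GB), you might want to have a look at multipart upload: a ByteArrayOutputStream keeps everything in memory and, being backed by a single byte array, cannot hold more than about 2 GB.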
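For a ~10 GB file, one way to combine gzip with multipart upload is to let GZIPOutputStream write into a stream that cuts its output into fixed-size parts, so that at most one part is held in memory at a time. The following is a minimal sketch of that idea, not an SDK utility: PartSplittingOutputStream is a hypothetical class, and the Consumer<byte[]> callback stands in for whatever uploads one part (for example an S3 multipart "upload part" call). S3 requires every part except the last to be at least 5 MB, so choose the part size accordingly.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.function.Consumer;

/**
 * Hypothetical helper: collects the bytes written to it into fixed-size parts and
 * hands each full part to a callback. Wrapped by a GZIPOutputStream, it lets you
 * compress a large file while holding at most one part in memory.
 */
final class PartSplittingOutputStream extends OutputStream {
    private final int partSize;
    private final Consumer<byte[]> partHandler; // assumption: your code uploads the part here
    private final ByteArrayOutputStream current;
    private boolean closed;

    PartSplittingOutputStream(int partSize, Consumer<byte[]> partHandler) {
        if (partSize <= 0) throw new IllegalArgumentException("partSize must be positive");
        this.partSize = partSize;
        this.partHandler = partHandler;
        this.current = new ByteArrayOutputStream(partSize);
    }

    @Override
    public void write(int b) {
        current.write(b);
        if (current.size() == partSize) emitPart();
    }

    @Override
    public void write(byte[] b, int off, int len) {
        while (len > 0) {
            // Copy only as much as still fits into the current part
            int n = Math.min(len, partSize - current.size());
            current.write(b, off, n);
            off += n;
            len -= n;
            if (current.size() == partSize) emitPart();
        }
    }

    private void emitPart() {
        partHandler.accept(current.toByteArray());
        current.reset();
    }

    /** Hands over the final, possibly smaller, part. */
    @Override
    public void close() {
        if (closed) return;
        closed = true;
        if (current.size() > 0) emitPart();
    }
}
```

Usage would look like `new GZIPOutputStream(new PartSplittingOutputStream(8 * 1024 * 1024, part -> uploadPart(part)))`, where uploadPart is a hypothetical method of yours; in the AWS SDK for Java v1 the surrounding calls are initiateMultipartUpload, uploadPart and completeMultipartUpload. Because the compressed size is not known up front, uploading in parts avoids having to declare a content length for the whole object.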

This question could have been more specific, and there are several ways to achieve this. One approach might look like the following.
The example depends on the commons-io and commons-compress libraries, and uses classes from the java.nio.file package.
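import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.IOUtils;
public static void compressAndUpload(AmazonS3 s3, InputStream in)
        throws IOException
{
    // Create temp file
    Path tmpPath = Files.createTempFile("prefix", "suffix");
    try {
        // Compress into the temp file; closing gzOut writes the gzip trailer
        try (OutputStream out = Files.newOutputStream(tmpPath);
             GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out)) {
            IOUtils.copy(in, gzOut);
        }
        // Only measure the size once the compressed file is complete
        long size = Files.size(tmpPath);
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentType("application/x-gzip");
        metadata.setContentLength(size);
        // Upload file to S3
        try (InputStream fileIn = Files.newInputStream(tmpPath)) {
            s3.putObject(new PutObjectRequest("bucket", "key", fileIn, metadata));
        }
    } finally {
        // Clean up the temp file
        Files.deleteIfExists(tmpPath);
    }
}
Buffering and error handling are omitted for brevity. Note that the gzip stream must be closed before Files.size() is called; otherwise the gzip trailer is missing and both the reported size and the uploaded data are incomplete.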
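If you would rather not depend on commons-io and commons-compress, the JDK's own java.util.zip.GZIPOutputStream can do the compression step instead. The sketch below (GzipToTempFile is a hypothetical helper name) only covers writing the temp file; its path and Files.size(path) could then feed the ObjectMetadata and PutObjectRequest exactly as in the answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a stream into a temp file using only the JDK. */
final class GzipToTempFile {
    private GzipToTempFile() {}

    /** Returns the path of a temp file holding the gzip-compressed contents of {@code in}. */
    static Path compressToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("upload-", ".gz");
        try (OutputStream gzOut = new GZIPOutputStream(Files.newOutputStream(tmp), 64 * 1024)) {
            // A manual copy loop keeps this Java 8 compatible (Java 9+ could use in.transferTo(gzOut))
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                gzOut.write(buffer, 0, n);
            }
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // don't leave a half-written file behind
            throw e;
        }
        // The gzip stream is closed at this point, so Files.size(tmp) is the final compressed length
        return tmp;
    }
}
```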

Related

Creating an Avro file in Amazon S3 bucket

How do I create an Avro file in an S3 bucket and then append Avro records to it?
I have all the Avro records in the form of a byte array, and they were successfully transferred to an Avro file. But as far as I know, this file is not a complete Avro file, since a complete Avro file is schema + data.
The following is the code that transfers the byte records to a file in S3.
Does anyone know how to create a schema-based Avro file and then transfer these bytes to that same file?
public void sendByteData(byte [] b, Schema schema){
try{
AWSCredentials credentials = new BasicAWSCredentials("XXXXX", "XXXXXX");
AmazonS3 s3Client = new AmazonS3Client(credentials);
//createFolder("encounterdatasample", "avrofiles", s3Client);
ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(b.length);
InputStream stream = new ByteArrayInputStream(b);
/* File file = new File("/home/abhishek/sample.avro");
DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(writer);
dataFileWriter.create(schema, file);
s3Client.putObject("encounterdatasample", dataFileWriter.create(schema, file), stream, meta);
*/
s3Client.putObject("encounterdatasample", "sample.avro", stream,meta);
System.out.println("Done writing the data");
}catch(Exception e){
e.printStackTrace();
}
}
The commented-out code doesn't work; I was just trying to play around with it.
Any help would be appreciated.
Thanks.
I believe your assertion is correct: you can't encode both the data and the schema in the byte array. You need to use some container, typically a file, to encode both.
With a few fixes, the code you have commented out should work. I just did something similar from within a Lambda written in Java: I wrote the file out to local disk (/tmp) using DataFileWriter, then put that file to S3 using your syntax without issue.
Two suggestions:
Call dataFileWriter.close() once you're finished writing to the file.
Use the file object directly in the s3Client.putObject call, e.g. s3Client.putObject(bucket, key, file), rather than passing the result of dataFileWriter.create(...) as the key.

iText - OutOfMemory creating more than 1000 PDFs

I want to create a ZipOutputStream filled with PDF/A documents. I'm using iText (version 5.5.7). For more than 1000 PDF entries I get an OutOfMemoryError on doc.close() and can't find the leak.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(baos));
zos.setEncoding("Cp850");
for (MyObject o : objects) {
try {
String pdfFilename = o.getName() + ".pdf";
zos.putNextEntry(new ZipEntry(pdfFilename));
pdfBuilder.buildPdfADocument(zos);
zos.closeEntry();
} ...
PdfBuilder
public void buildPdfADocument(org.apache.tools.zip.ZipOutputStream zos){
Document doc = new Document(PageSize.A4);
PdfAWriter writer = PdfAWriter.getInstance(doc, zos, PdfAConformanceLevel.PDF_A_1B);
writer.setCloseStream(false); // to not close my zos
writer.setViewerPreferences(PdfWriter.ALLOW_PRINTING | PdfWriter.PageLayoutSinglePage);
writer.createXmpMetadata();
doc.open();
// adding Element's to doc
// with flushContent() on PdfPTables
InputStream sRGBprofile = servletContext.getResourceAsStream("/WEB-INF/conf/AdobeRGB1998.icc");
ICC_Profile icc = ICC_Profile.getInstance(sRGBprofile);
writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
//try to close/flush everything possible
doc.close();
writer.setXmpMetadata(null);
writer.flush();
writer.close();
if(sRGBprofile != null){
sRGBprofile.close();
}
}
Any suggestions how can I fix it? Am I forgetting something?
I've already tried to use Java's own java.util.zip.ZipOutputStream, but it doesn't make any difference.
Thanks for your answers! I understand the issue with the ByteArrayOutputStream, but I am not sure what the best approach is in my case. It's a web application and I need to store the zip in a database blob somehow.
What I am doing now is creating the PDFs directly in the ZipOutputStream with iText and saving the byte array of the corresponding ByteArrayOutputStream to the blob. The options I see are:
Split my data into packages of 500 objects, save the first 500 PDFs to the database, then open the zip and add the next 500, and so on. But I assume this leaves me in the same situation as now, namely with too big a stream open in memory.
Try to save the PDFs on the server (not sure if there's enough space), create a temporary zip file and then submit the bytes to the blob.
Any suggestions/ideas?
It's because your ZipOutputStream is backed by a ByteArrayOutputStream, so even after you close the entries, the full ZIP contents stay in memory.
You need another approach to handle this number of files (1000+).
In your example you are holding all the PDF files in memory; you will need to process the documents in blocks to minimize this memory load.
Another approach is to write your PDFs to the filesystem and then create your zip file from there.
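A minimal sketch of the filesystem approach, assuming java.util.zip rather than the org.apache.tools.zip classes used in the question: the ZIP is written to a temporary file, so the heap only holds the current entry's buffers instead of the whole archive. ZipToTempFile and EntryWriter are hypothetical names; in the question, an EntryWriter would wrap pdfBuilder.buildPdfADocument (whose parameter type would then need to become a plain OutputStream). If you need a specific entry-name encoding such as Cp850, java.util.zip.ZipOutputStream also has a constructor that takes a Charset.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: build the ZIP in a temp file instead of a ByteArrayOutputStream. */
final class ZipToTempFile {
    private ZipToTempFile() {}

    /** Hypothetical callback that writes one document (e.g. one PDF) into the open entry. */
    interface EntryWriter {
        void writeTo(OutputStream entryOut) throws IOException;
    }

    /** Writes one entry per map key, in iteration order, and returns the temp file's path. */
    static Path writeZip(Map<String, EntryWriter> entries) throws IOException {
        Path zipFile = Files.createTempFile("documents-", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(zipFile)))) {
            for (Map.Entry<String, EntryWriter> entry : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(entry.getKey()));
                // The writer must not close zos (compare writer.setCloseStream(false) in the question)
                entry.getValue().writeTo(zos);
                zos.closeEntry();
            }
        }
        return zipFile;
    }
}
```

To get the result into the blob without loading it into memory, you could pass Files.newInputStream(zipFile) and Files.size(zipFile) to PreparedStatement.setBinaryStream(int, InputStream, long); whether that really streams depends on your JDBC driver. Delete the temp file afterwards.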

How to write encoded text to a file using Java?

How to write encoded text to a file using java/jsp with FileWriter?
FileWriter testfilewriter = new FileWriter(testfile, true);
testfilewriter.write(testtext);
testtext is the text
testfile is a String (encoded)
What I'm trying to do is encode testfile with Base64 and store it in a file. What is the best way to do so?
Since your data is not plain text, you can't use FileWriter. What you need is FileOutputStream.
Encode your text as Base64 (this uses the Base64 class from Apache Commons Codec):
byte[] encodedText = Base64.encodeBase64( testtext.getBytes("UTF-8") );
and write to file:
try (OutputStream stream = new FileOutputStream(testfile)) {
stream.write(encodedText);
}
Or, if you don't want to lose existing data, open the stream in append mode by passing true as the append argument:
try (OutputStream stream = new FileOutputStream(testfile, true)) {
stream.write(encodedText);
}
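On Java 8 or later the same thing works without Commons Codec, using java.util.Base64. A minimal sketch (Base64FileAppender is a hypothetical name); StandardOpenOption.CREATE plus APPEND plays the role of the true append flag above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Base64;

/** Hypothetical helper: Base64-encode a string and append it to a file, JDK only. */
final class Base64FileAppender {
    private Base64FileAppender() {}

    static void appendEncoded(Path file, String text) throws IOException {
        byte[] encoded = Base64.getEncoder().encode(text.getBytes(StandardCharsets.UTF_8));
        // CREATE + APPEND: create the file if needed, otherwise keep the existing data
        Files.write(file, encoded, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Note that appending several separately encoded strings back to back produces a concatenation of padded Base64 blocks, which a standard decoder may reject when read as one string; writing a line separator between records keeps each one decodable on its own.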
You can do the encoding yourself and then write to the file as suggested by @Alper, OR, if you want a stream that does the encoding/decoding while writing to and reading from the file, the Apache Commons Codec library will come in handy: see Base64OutputStream and Base64InputStream.
Interestingly, Java 8 has a similar API, Base64.Encoder. Check out its wrap method.
Hope this helps.
The approach to follow depends on the algorithm you are using, and writing the encoded file is the same as writing any other file in Java.
IMHO, if you are trying to do this in a JSP, use a servlet instead, as JSPs are not meant for the business layer.
I'm not going to give the code, as it is pretty easy if you try it, but I'll share the best way to do it as pseudocode. Here are the steps to write your encoded text:
Open the input file in read mode and the output file in append mode.
If the input file isn't huge (it can fit in memory), read the whole file at once; otherwise read it line by line.
Encode the text retrieved from the file using a Base64 encoder.
Write the result to the output file in append mode.
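The steps above could be sketched like this with the JDK's java.util.Base64 (Java 8+; an assumption, since the answer only names a generic Base64 encoder). Instead of reading line by line, it reads fixed-size byte chunks through Base64.getEncoder().wrap(...), which carries incomplete 3-byte groups over to the next write, so the output is the same as encoding the whole file at once. Base64FileEncoder is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Base64;

/** Hypothetical helper: read a file in chunks, Base64-encode it, append to another file. */
final class Base64FileEncoder {
    private Base64FileEncoder() {}

    static void encodeFile(Path input, Path output) throws IOException {
        try (InputStream in = Files.newInputStream(input);
             // Closing the wrapping stream writes the final Base64 padding and closes the file
             OutputStream out = Base64.getEncoder().wrap(Files.newOutputStream(
                     output, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // leftover bytes of a 3-byte group wait for the next write
            }
        }
    }
}
```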
You can't use a FileWriter directly for this task.
You asked how you can do it, but you didn't give any information about which JDK and library you use, so here are a few solutions with the standard tools.
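If you're using Java 8:
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Base64;
String testFile = ""; // path of the output file
try (Writer writer = new OutputStreamWriter(
        Base64.getEncoder().wrap(
            java.nio.file.Files.newOutputStream(
                Paths.get(testFile),
                StandardOpenOption.CREATE, // create the file if it doesn't exist yet
                StandardOpenOption.APPEND)),
        StandardCharsets.UTF_8)
) {
    writer.write("text to be encoded in Base64");
}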
If you're using Java 7 with Guava:
String testFile = "";
CharSink sink = BaseEncoding.base64()
.encodingSink(
com.google.common.io.Files.asCharSink(
new File(testFile),
StandardCharsets.UTF_8,
FileWriteMode.APPEND))
.asCharSink(StandardCharsets.UTF_8);
try (Writer writer = sink.openStream()) {
writer.write("text to be encoded in Base64");
}
If you're using Java 6 with Guava:
String testFile = "";
CharSink sink = BaseEncoding.base64()
.encodingSink(
com.google.common.io.Files.asCharSink(
new File(testFile),
Charsets.UTF_8,
FileWriteMode.APPEND))
.asCharSink(Charsets.UTF_8);
Closer closer = Closer.create();
try {
Writer writer = closer.register(sink.openStream());
writer.write("text to be encoded in Base64");
} catch (Throwable e) { // must catch Throwable
throw closer.rethrow(e);
} finally {
closer.close();
}
I don't have much knowledge about other libraries so I won't pretend I do and add another helper.

Obtaining the wav file from ServletInputStream

Our current project requires us to send an audio file to the server and then use the audio file for further computation.
Using the Java Sound API, I was able to capture the recording and save it as a WAV file on my system. Then, to pass the audio to the server, I am using Apache Commons HttpClient to post a request to the server (I am using the InputStreamEntity provided by Apache and sending the data chunked).
The problem appears when I try to recreate/retrieve the WAV file on the server. I understand that I would have to use the AudioSystem.write API to create the WAV file (exactly as was done on my system). However, although the file gets created, it does not play (I am using VLC media player to test it, FYI). I have searched Google for sample code and tried to implement it, but I am unable to play the file once it gets created.
The following snippets show the approaches I have tried, starting with Approach 1:
//******************************************************************
try {
InputStream is = request.getInputStream();
FileOutputStream fs = new FileOutputStream("output123.wav");
byte[] tempbuffer = new byte[4096];
int bytesRead;
while((bytesRead=is.read(tempbuffer))!=-1)
{
fs.write(tempbuffer, 0,bytesRead);
}
is.close();
fs.close();
AudioInputStream inputStream = AudioSystem.getAudioInputStream(new File("output123.wav"));
int numofbytes = inputStream.available();
byte[] buffer = new byte[numofbytes];
inputStream.read(buffer);
int bytesWritten = AudioSystem.write(inputStream, AudioFileFormat.Type.WAVE,new File("outputtest.wav"));
System.out.println("written"+bytesWritten);
Approach 2
InputStream is = request.getInputStream();
System.out.println("inputStream obtained : "+is.toString());
ByteArrayInputStream bais = null;
byte[] audioBuffer = IOUtils.toByteArray(is);
System.out.println(" is audioBuffer empty? : length = ? "+audioBuffer.length);
try {
AudioFileFormat ai = AudioSystem.getAudioFileFormat(is);
System.out.println("ai bytelength ? "+ai.getByteLength());
System.out.println("ai frame length = "+ai.getFrameLength());
Set<Map.Entry<String,Object>> audioProperties = ai.getFormat().properties().entrySet();
System.out.println("entry set is empty ? "+audioProperties.isEmpty());
for(Map.Entry me : audioProperties){
System.out.println("key = "+me.getKey());
System.out.println("value ="+me.getValue());}
bais = new ByteArrayInputStream(audioBuffer);
AudioInputStream ais = new AudioInputStream(bais, new AudioFormat(8000,8,2,true,true), 2);
AudioSystem.write(ais, AudioFileFormat.Type.WAVE,new File("testtest.wav"));
//*************************************************************************************
The AudioFormat properties all turned out to be null. Are these null values causing the problem? So while creating the WAV file on the server, I tried to set the properties manually once again, but even then the WAV file would not play.
I have also tried quite a few approaches already mentioned on this site, but somehow they aren't working. I am sure I am missing something, but I am unable to pinpoint the exact problem.
It would be really helpful if you could point out how to go from a ServletInputStream to a WAV file.
P.S. (1) I know the code is shabby, because I have been in a trial-and-error situation for quite some time now, but I will give more details on the approaches if needed.
(2) Apologies for the clumsiness; this happens to be my first post.
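This is not how you copy a stream (from Approach 1): available() only estimates how many bytes can be read without blocking, and a single read() call may fill only part of the buffer. You already have the correct stream-copying code just above this: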
int numofbytes = inputStream.available();
byte[] buffer = new byte[numofbytes];
inputStream.read(buffer);
If all your server wants to do is get the data and write it to a file, then you do not need to use any of the audio API: simply treat the data as a stream of bytes.
So the part of Approach 1 before any mention of AudioInputStream should be sufficient.
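Following that advice, the server side can be reduced to a plain byte copy. In this sketch (UploadSaver is a hypothetical name), Files.copy reads until the end of the stream, so it replaces both the manual loop and the available()-based read; in a servlet you would pass request.getInputStream(). The header check is an optional extra that is not part of the answer: it relies only on WAV files starting with "RIFF" and carrying "WAVE" at byte offset 8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: persist an uploaded body byte-for-byte, without the audio API. */
final class UploadSaver {
    private UploadSaver() {}

    /** Copies the whole stream to {@code target} and returns the number of bytes written. */
    static long save(InputStream body, Path target) throws IOException {
        // Files.copy loops until end of stream, unlike a single read() sized by available()
        return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Cheap sanity check: "RIFF" at offset 0 and "WAVE" at offset 8. */
    static boolean looksLikeWav(Path file) throws IOException {
        byte[] header = new byte[12];
        try (InputStream in = Files.newInputStream(file)) {
            int off = 0;
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n == -1) return false; // file shorter than a RIFF header
                off += n;
            }
        }
        return header[0] == 'R' && header[1] == 'I' && header[2] == 'F' && header[3] == 'F'
            && header[8] == 'W' && header[9] == 'A' && header[10] == 'V' && header[11] == 'E';
    }
}
```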
Although this might not be the perfect solution, due to time constraints I adopted a simpler approach: using java.util.zip, I simply zipped the file, sent it to the server, and wrote a layer where the file gets unzipped; then I deleted the zip files. It seems like an immature solution (because the original challenge was to send the audio file), and I now incur the overhead of zipping the files, but the file transfer should be relatively faster. Thanks for your help, guys.

Writing to CSV Files and then Zipping it up in Appengine (Java)

I'm currently working on a project written in Java on Google App Engine.
App Engine does not allow files to be written to disk, so classes that represent on-disk files, such as File, cannot be used.
I want to write data and export it to a few csv files, and then zip it up, and allow the user to download it.
How may I do this without using any File classes? I'm not very experienced in file handling so I hope you guys can advise me.
Thanks.
You can create a zip file and add to it while the user is downloading it. If you are using a servlet, this is straightforward:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    // ..... process request
    // ..... then respond
    response.setContentType("application/zip");
    response.setStatus(HttpServletResponse.SC_OK);
    // note: intentionally no content length set; chunked transfer is used automatically if the stream is larger than the response's internal buffer
    ZipOutputStream zipOut = new ZipOutputStream(response.getOutputStream());
    byte[] buffer = new byte[1024 * 32];
    try {
        // case 1: you already have an input stream, typically a ByteArrayInputStream over a byte[] of previously prepared CSV data
        InputStream in = new BufferedInputStream(getMyFirstInputStream()); // placeholder for your own method
        try {
            zipOut.putNextEntry(new ZipEntry("FirstName"));
            int length;
            while ((length = in.read(buffer)) != -1) {
                zipOut.write(buffer, 0, length);
            }
            zipOut.closeEntry();
        } finally {
            in.close();
        }
        // case 2: write directly to the output stream, i.e. you have your raw data but need to create the CSV representation
        zipOut.putNextEntry(new ZipEntry("SecondName"));
        // example setup; the key is to use the write methods of the output stream 'zipOut' below
        MySerializer mySerializer = new MySerializer(); // placeholder class, e.g. a CSV writer
        Object myData = getMyData(); // placeholder: the data the serializer turns into a CSV file
        mySerializer.setOutput(zipOut);
        // write whatever you have to zipOut
        mySerializer.write(myData);
        zipOut.closeEntry();
        // repeat for the next file, or use a for loop
    } finally {
        zipOut.close();
    }
}
There is no reason to store your data in files unless you have memory constraints. Files give you an InputStream and an OutputStream, both of which have in-memory equivalents (ByteArrayInputStream and ByteArrayOutputStream).
Note that creating a CSV writer usually means taking a piece of data (an array list, a map, whatever you have) and turning it into byte[] parts. Append the byte[] parts to an OutputStream using a tool like DataOutputStream (or your own) or an OutputStreamWriter.
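As an illustration of that description, here is a minimal sketch (CsvZip is a hypothetical name) that writes rows through an OutputStreamWriter into ZIP entries backed by a ByteArrayOutputStream, so no File is involved. The quoting follows the common RFC 4180 convention, which is an assumption about the CSV dialect you need. The writer is flushed rather than closed, because closing it would also close the ZipOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: turn rows into CSV entries of a ZIP, entirely in memory. */
final class CsvZip {
    private CsvZip() {}

    /** Quotes a field if it contains a comma, quote or line break, doubling embedded quotes. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Writes rows as CSV to {@code out}; flushes but does not close it. */
    static void writeCsv(List<List<String>> rows, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) w.write(',');
                w.write(escape(row.get(i)));
            }
            w.write("\r\n");
        }
        w.flush(); // don't close: that would close the ZipOutputStream too
    }

    /** Builds a ZIP with one CSV entry per map key and returns its bytes. */
    static byte[] zipCsvFiles(Map<String, List<List<String>>> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, List<List<String>>> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                writeCsv(file.getValue(), zip);
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

In the servlet above, writeCsv(rows, zipOut) could take the place of the placeholder MySerializer in case 2, streaming straight to the response instead of a byte array.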
If your data is not huge, meaning it can stay in memory, then exporting to CSV, zipping it up and streaming it for download can all be done on the fly. Caching can be done at any of these steps, which depends greatly on your application's business logic.
