I have a lot of files on S3 that I need to zip and then provide the zip via S3. Currently I stream them into a local zip file and then upload that file again. This takes up a lot of disk space: each file is around 3-10 MB, and I have to zip up to 100,000 files, so a single zip can exceed 1 TB. I would therefore like a solution along these lines:
Create a zip file on S3 from files on S3 using Lambda Node
There, the zip seems to be created directly on S3 without taking up local disk space. But I haven't managed to translate that solution to Java. I am also finding conflicting information about the Java AWS SDK, saying that a change to its streaming behavior was planned in 2017.
Not sure if this will help, but here's what I've been doing so far (Upload is my local model that holds the S3 information). I just removed logging and such for better readability. I think the download doesn't take up disk space, since I "pipe" the InputStream directly into the zip. But like I said, I would also like to avoid the local zip file and create it directly on S3. That would probably require the ZipOutputStream to be created with S3 as the target instead of a FileOutputStream. Not sure how that can be done.
public File zipUploadsToNewTemp(List<Upload> uploads) {
    List<String> names = new ArrayList<>();
    byte[] buffer = new byte[1024];
    File tempZipFile;
    try {
        tempZipFile = File.createTempFile(UUID.randomUUID().toString(), ".zip");
    } catch (Exception e) {
        throw new ApiException(e, BaseErrorCode.FILE_ERROR, "Could not create Zip file");
    }
    try (FileOutputStream fileOutputStream = new FileOutputStream(tempZipFile);
         ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream)) {
        for (Upload upload : uploads) {
            InputStream inputStream = getStreamFromS3(upload);
            ZipEntry zipEntry = new ZipEntry(upload.getFileName());
            zipOutputStream.putNextEntry(zipEntry);
            writeStreamToZip(buffer, zipOutputStream, inputStream);
            inputStream.close();
        }
        zipOutputStream.closeEntry();
        zipOutputStream.close();
        return tempZipFile;
    } catch (IOException e) {
        logError(type, e);
        if (tempZipFile.exists()) {
            FileUtils.delete(tempZipFile);
        }
        throw new ApiException(e, BaseErrorCode.IO_ERROR,
                "Error zipping files: " + e.getMessage());
    }
}
// I am not even sure, but I think this takes up memory and not disk space
private InputStream getStreamFromS3(Upload upload) {
    try {
        String filename = upload.getId() + "." + upload.getFileType();
        return s3FileService.getObject(upload.getBucketName(), filename, upload.getPath());
    } catch (ApiException e) {
        throw e;
    } catch (Exception e) {
        logError(type, e);
        throw new ApiException(e, BaseErrorCode.UNKOWN_ERROR,
                "Unknown error communicating with S3 for file: " + upload.getFileName());
    }
}
private void writeStreamToZip(byte[] buffer, ZipOutputStream zipOutputStream,
        InputStream inputStream) {
    try {
        int len;
        while ((len = inputStream.read(buffer)) > 0) {
            zipOutputStream.write(buffer, 0, len);
        }
    } catch (IOException e) {
        throw new ApiException(e, BaseErrorCode.IO_ERROR, "Could not write stream to zip");
    }
}
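For comparison, here is a minimal self-contained sketch of the same zipping loop using plain java.util.zip, with no S3 involved (the class name StreamZipper and the entry names are made up for illustration); it shows one entry opened and explicitly closed per source stream:

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class StreamZipper {

    // Zips a map of entry names to input streams; each entry is closed before
    // the next one is started.
    public static byte[] zipStreams(Map<String, InputStream> sources) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            byte[] buffer = new byte[8192];
            for (Map.Entry<String, InputStream> source : sources.entrySet()) {
                zos.putNextEntry(new ZipEntry(source.getKey()));
                try (InputStream in = source.getValue()) {
                    int len;
                    while ((len = in.read(buffer)) > 0) {
                        zos.write(buffer, 0, len);
                    }
                }
                zos.closeEntry(); // close every entry, not just the last one
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Reads the entry names back, to verify the archive is well formed.
    public static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e; (e = zis.getNextEntry()) != null;) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Swapping the ByteArrayOutputStream for any other OutputStream (a file, or a piped stream feeding an upload) leaves the loop unchanged.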
And finally the upload source code. The InputStream is created from the temp zip file.
public PutObjectResult upload(InputStream inputStream, String bucketName, String filename, String folder) {
    String uploadKey = StringUtils.isEmpty(folder) ? "" : (folder + "/");
    uploadKey += filename;
    ObjectMetadata metaData = new ObjectMetadata();
    byte[] bytes;
    try {
        bytes = IOUtils.toByteArray(inputStream);
    } catch (IOException e) {
        throw new ApiException(e, BaseErrorCode.IO_ERROR, e.getMessage());
    }
    metaData.setContentLength(bytes.length);
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketPrefix + bucketName, uploadKey, byteArrayInputStream, metaData);
    putObjectRequest.setCannedAcl(CannedAccessControlList.PublicRead);
    try {
        return getS3Client().putObject(putObjectRequest);
    } catch (SdkClientException se) {
        throw s3Exception(se);
    } finally {
        IOUtils.closeQuietly(inputStream);
    }
}
Just found a similar question to what I need also without answer:
Upload ZipOutputStream to S3 without saving zip file (large) temporary to disk using AWS S3 Java
You can get an input stream from your S3 data, then zip that batch of bytes and stream it back to S3:
long numBytes; // length of data to send in bytes... somehow you know it before processing the entire stream
PipedOutputStream os = new PipedOutputStream();
PipedInputStream is = new PipedInputStream(os);
ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(numBytes);
new Thread(() -> {
    /* Write to os here; make sure to close it when you're done */
    try (ZipOutputStream zipOutputStream = new ZipOutputStream(os)) {
        ZipEntry zipEntry = new ZipEntry("myKey");
        zipOutputStream.putNextEntry(zipEntry);
        S3ObjectInputStream objectContent = amazonS3Client.getObject("myBucket", "myKey").getObjectContent();
        byte[] bytes = new byte[1024];
        int length;
        while ((length = objectContent.read(bytes)) >= 0) {
            zipOutputStream.write(bytes, 0, length);
        }
        objectContent.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}).start();
amazonS3Client.putObject("myBucket", "myKey", is, meta);
is.close(); // always close your streams
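Here is a self-contained sketch of that piped pattern with the S3 parts removed (the class name PipedZipDemo is made up; the in-memory consumer loop stands in for the putObject call), so the producer/consumer handshake can be run and inspected locally:

```java
import java.io.*;
import java.util.zip.*;

public class PipedZipDemo {

    // A producer thread writes a zip entry into the pipe; the calling thread
    // drains the pipe, standing in for the putObject() upload.
    public static byte[] zipViaPipe(String entryName, byte[] content) {
        try {
            PipedOutputStream os = new PipedOutputStream();
            PipedInputStream is = new PipedInputStream(os);

            Thread producer = new Thread(() -> {
                // Closing the ZipOutputStream closes os, which signals
                // end-of-stream to the reading side.
                try (ZipOutputStream zos = new ZipOutputStream(os)) {
                    zos.putNextEntry(new ZipEntry(entryName));
                    zos.write(content);
                    zos.closeEntry();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            // With S3, this loop would be replaced by putObject(bucket, key, is, meta).
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int len;
            while ((len = is.read(buffer)) != -1) {
                sink.write(buffer, 0, len);
            }
            is.close();
            producer.join();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper for checking the result.
    public static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            return zis.getNextEntry().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the producer and consumer must run on different threads, or the bounded pipe buffer fills up and the program deadlocks.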
I would suggest using an Amazon EC2 instance (as low as 1c/hour, or you could even use a Spot Instance to get it at a lower price). Smaller instance types are lower cost but have limited bandwidth, so play around with the size to get your preferred performance.
Write a script to loop through the files then:
Download
Zip
Upload
Delete local files
All the zip magic happens on local disk. No need to use streams. Just use the Amazon S3 download_file() and upload_file() calls.
If the EC2 instance is in the same region as Amazon S3 then there is no Data Transfer charge.
Related
I've been trying to tackle this problem for a day or two and can't seem to figure out precisely how to add text files to a zip file. I was able to add these text files to a 7zip file, which was insanely easy, but a zip file seems much more complicated for some reason. I want to return a zip file for user-facing reasons, by the way.
Here's what I have now:
(I know the code isn't too clean at the moment; I plan to tackle that after getting the bare functionality down.)
private ZipOutputStream addThreadDumpsToZipFile(File file, List<Datapoint<ThreadDump>> allThreadDumps, List<Datapoint<String>> allThreadDumpTextFiles) {
    ZipOutputStream threadDumpsZipFile = null;
    try {
        // create new zip output stream targeting the given file
        // TODO missing step: create text files containing each thread dump, then add to zip
        threadDumpsZipFile = new ZipOutputStream(new FileOutputStream(file));
        FileInputStream fileInputStream = null;
        try {
            // add data to each thread dump entry
            for (int i = 0; i < allThreadDumpTextFiles.size(); i++) {
                // create a file for each thread dump
                File threadDumpFile = new File("thread_dump_" + i + ".txt");
                FileUtils.writeStringToFile(threadDumpFile, allThreadDumpTextFiles.get(i).toString());
                // add entry/file to zip file (creates block to add input to)
                ZipEntry threadDumpEntry = new ZipEntry("thread_dump_" + i); // might need to add extension here?
                threadDumpsZipFile.putNextEntry(threadDumpEntry);
                // add the content to this entry
                fileInputStream = new FileInputStream(threadDumpFile);
                byte[] byteBuffer = new byte[(int) threadDumpFile.length()]; // see if this sufficiently returns length of data
                int bytesRead = -1;
                while ((bytesRead = fileInputStream.read(byteBuffer)) != -1) {
                    threadDumpsZipFile.write(byteBuffer, 0, bytesRead);
                }
            }
            threadDumpsZipFile.flush();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                fileInputStream.close();
            } catch (Exception e) {
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return threadDumpsZipFile;
}
As you can sort of guess, I have a set of Thread Dumps that I want to add to my zip file and return to the user.
Let me know if you guys need any more info!
PS: There might be some bugs in this question; I just realized with some breakpoints that threadDumpFile.length() won't really work.
Look forward to your replies!
Thanks,
Arsa
Here's a crack at it. I think you'll want to keep the file extensions when you create your ZipEntry objects. See if you can implement the createTextFiles() function below; the rest of this works. I stubbed that method to return a single "test.txt" file with some dummy data to verify.
void zip() {
    try {
        FileOutputStream fos = new FileOutputStream("yourZipFile.zip");
        ZipOutputStream zos = new ZipOutputStream(fos);
        File[] textFiles = createTextFiles(); // should be an easy step
        for (int i = 0; i < textFiles.length; i++) {
            addToZipFile(textFiles[i].getName(), zos);
        }
        zos.close();
        fos.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
void addToZipFile(String fileName, ZipOutputStream zos) throws Exception {
    File file = new File(fileName);
    FileInputStream fis = new FileInputStream(file);
    ZipEntry zipEntry = new ZipEntry(fileName);
    zos.putNextEntry(zipEntry);
    byte[] bytes = new byte[1024];
    int length;
    while ((length = fis.read(bytes)) >= 0) {
        zos.write(bytes, 0, length);
    }
    zos.closeEntry();
    fis.close();
}
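A self-contained version of the idea above (ThreadDumpZipper and the entry names are made up for illustration; the real code would substitute its own dump strings): it writes each dump to a temp .txt file, zips the files keeping the extension on the entry name, then reads the archive back to verify:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

public class ThreadDumpZipper {

    // Writes each dump string to a temp .txt file, zips the files (keeping the
    // extension on the entry name), then reads the archive back and returns the
    // entry names so the result can be checked.
    public static List<String> zipDumps(List<String> dumps) {
        try {
            File zipTarget = File.createTempFile("dumps", ".zip");
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipTarget))) {
                for (int i = 0; i < dumps.size(); i++) {
                    Path tempFile = Files.createTempFile("dump", ".txt");
                    Files.writeString(tempFile, dumps.get(i));
                    // Keep the .txt extension so unzipped files open as text.
                    zos.putNextEntry(new ZipEntry("thread_dump_" + i + ".txt"));
                    Files.copy(tempFile, zos); // streams the file into the current entry
                    zos.closeEntry();
                    Files.delete(tempFile);
                }
            }
            // Read the entry names back to verify the archive is well formed.
            List<String> names = new ArrayList<>();
            try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipTarget))) {
                for (ZipEntry e; (e = zis.getNextEntry()) != null;) {
                    names.add(e.getName());
                }
            }
            zipTarget.delete();
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```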
I have a zip file file.zip that is compressed. Is there a way to change the compression method of the file to store (no compression)?
I have written and tried the following code and it works, but I will be running this in an environment where memory and storage are limited and there might not be enough space. I am using the zip4j library.
This code extracts the input zip to a folder, then rezips it with the store compression method. The problem with this is that at one point during execution there are three copies of the data on storage, which is a problem because space is limited.
try {
    String zip = "input.zip";
    final ZipFile zipFile = new ZipFile(zip);
    zipFile.extractAll("dir");
    File file = new File("dir");
    ZipParameters params = new ZipParameters();
    params.setCompressionMethod(Zip4jConstants.COMP_STORE);
    params.setIncludeRootFolder(false);
    ZipFile output = new ZipFile(new File("out.zip"));
    output.addFolder(file, params);
    file.delete();
    return "Done";
} catch (Exception e) {
    e.printStackTrace();
    return "Error";
}
So any suggestions on another way to approach this problem? Or maybe some speed or memory optimizations to my current code?
As an alternative, we can read the files from the zip one by one, either in memory or into a temp file, like this:
ZipInputStream is = ...
ZipOutputStream os = ...
os.setMethod(ZipOutputStream.STORED);
int bSize = ... // calculate max available size
byte[] buf = new byte[bSize];
for (ZipEntry e; (e = is.getNextEntry()) != null;) {
    ZipEntry e2 = new ZipEntry(e.getName());
    e2.setMethod(ZipEntry.STORED);
    int n = is.read(buf);
    if (is.read() == -1) {
        // in memory
        e2.setSize(n);
        e2.setCompressedSize(n);
        CRC32 crc = new CRC32();
        crc.update(buf, 0, n);
        e2.setCrc(crc.getValue());
        os.putNextEntry(e2);
        os.write(buf, 0, n);
        is.closeEntry();
        os.closeEntry();
    } else {
        // use tmp file
    }
}
Reading in memory is supposed to be faster.
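As a runnable illustration of the in-memory branch above (the class name StoredEntryDemo is made up): a STORED entry must have its size, compressed size, and CRC-32 set before putNextEntry, because the zip writer cannot compute them on the fly for uncompressed entries:

```java
import java.io.*;
import java.util.zip.*;

public class StoredEntryDemo {

    // Writes a single uncompressed (STORED) entry; size, compressed size, and
    // CRC-32 are computed up front, as the format requires for this method.
    public static byte[] storeUncompressed(String name, byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(sink)) {
                ZipEntry entry = new ZipEntry(name);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                CRC32 crc = new CRC32();
                crc.update(data, 0, data.length);
                entry.setCrc(crc.getValue());
                zos.putNextEntry(entry);
                zos.write(data);
                zos.closeEntry();
            }
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for checking which method the first entry was written with.
    public static int firstEntryMethod(byte[] zipBytes) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            return zis.getNextEntry().getMethod();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Omitting any of the three setters makes putNextEntry throw a ZipException for STORED entries.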
I finally got it after a few hours by playing around with input streams.
try {
    File output = new File("out.zip");
    byte[] read = new byte[1024];
    ZipInputStream zis = new ZipInputStream(new FileInputStream("input.zip"));
    ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(output));
    ZipEntry ze;
    zos.setLevel(ZipOutputStream.STORED);
    zos.setMethod(ZipOutputStream.STORED);
    while ((ze = zis.getNextEntry()) != null) {
        int l;
        zos.putNextEntry(ze);
        System.out.println("WRITING: " + ze.getName());
        while ((l = zis.read(read)) > 0) {
            zos.write(read, 0, l);
        }
        zos.closeEntry();
    }
    zis.close();
    zos.close();
    return "Done";
} catch (Exception e) {
    e.printStackTrace();
    return "Error";
}
Thanks so much for your answer, Evgeniy Dorofeev; I literally just got my answer when I read yours! However, I prefer my method, as it only takes up a maximum of 1 KB in memory (am I right?). Also, I tried executing your code, and only the first file in the input zip was transferred.
I have a method like
public void put(@Nonnull final InputStream inputStream, @Nonnull final String uniqueId) throws PersistenceException {
    // a.) create gzip of inputStream
    final GZIPInputStream zipInputStream;
    try {
        zipInputStream = new GZIPInputStream(inputStream);
    } catch (IOException e) {
        e.printStackTrace();
        throw new PersistenceException("Persistence Service could not received input stream to persist for " + uniqueId);
    }
I want to convert the inputStream into zipInputStream. What is the way to do that?
The above method is incorrect and throws an exception saying "Not a Zip Format".
Converting Java streams is really confusing to me, and I never seem to get it right.
The GZIPInputStream is to be used to decompress an incoming InputStream. To compress an incoming InputStream using GZIP, you basically need to write it to a GZIPOutputStream.
You can get a new InputStream out of it if you use ByteArrayOutputStream to write gzipped content to a byte[] and ByteArrayInputStream to turn a byte[] into an InputStream.
So, basically:
public void put(@Nonnull final InputStream inputStream, @Nonnull final String uniqueId) throws PersistenceException {
    final InputStream zipInputStream;
    try {
        ByteArrayOutputStream bytesOutput = new ByteArrayOutputStream();
        GZIPOutputStream gzipOutput = new GZIPOutputStream(bytesOutput);
        try {
            byte[] buffer = new byte[10240];
            for (int length = 0; (length = inputStream.read(buffer)) != -1;) {
                gzipOutput.write(buffer, 0, length);
            }
        } finally {
            try { inputStream.close(); } catch (IOException ignore) {}
            try { gzipOutput.close(); } catch (IOException ignore) {}
        }
        zipInputStream = new ByteArrayInputStream(bytesOutput.toByteArray());
    } catch (IOException e) {
        e.printStackTrace();
        throw new PersistenceException("Persistence Service could not received input stream to persist for " + uniqueId);
    }
    // ...
You can, if necessary, replace the ByteArrayOutputStream/ByteArrayInputStream with a FileOutputStream/FileInputStream on a temporary file as created by File#createTempFile(), especially if those streams can contain large data which might overflow the machine's available memory when used concurrently.
GZIPInputStream is for reading gzip-encoding content.
If your goal is to take a regular input stream and compress it in the GZIP format, then you need to write those bytes to a GZIPOutputStream.
See also this answer to a related question.
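A minimal round-trip sketch of that direction fix (the class name GzipRoundTrip is made up): GZIPOutputStream compresses while copying, GZIPInputStream decompresses, and running a stream through both returns the original bytes:

```java
import java.io.*;
import java.util.zip.*;

public class GzipRoundTrip {

    // Compresses an arbitrary InputStream into a gzip byte array.
    public static byte[] gzip(InputStream in) {
        try {
            ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
            try (GZIPOutputStream gzipOut = new GZIPOutputStream(bytesOut)) {
                in.transferTo(gzipOut); // compress while copying
            }
            return bytesOut.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompresses a gzip byte array back to the original bytes.
    public static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            gzipIn.transferTo(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The "Not a Zip Format"-style exception in the question comes from handing plain, uncompressed data to the decompressing side.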
Could you point me to some code or a URL with examples of how to use the Dropbox Java API to upload binary files such as .doc files, JPGs, and video files?
The current examples on the web only cover uploading a text file. When I try to read files using a Java InputStream, convert them to a byte array, and pass them into the Dropbox upload functions, the files get corrupted. The same issue occurs with downloading files. Thanks in advance.
Regards,
Waruna.
EDIT--
Code Sample
FileInputStream fis = new FileInputStream(file);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
for (int readNum; (readNum = fis.read(buf)) != -1;) {
    bos.write(buf, 0, readNum);
    System.out.println("read " + readNum + " bytes,");
}
ByteArrayInputStream inputStream2 = new ByteArrayInputStream(bos.toByteArray());
Entry newEntry = mDBApi.putFile("/uploads/" + file.getName(), inputStream2, file.toString().length(), null, null);
System.out.println("Done. \nRevision of file: " + newEntry.rev + " " + newEntry.mimeType);
return newEntry.rev;
The 3rd argument of DropboxAPI.putFile() should be the number of bytes to read from the input stream; you are passing the length of the filename.
Instead of
Entry newEntry = mDBApi.putFile("/uploads/"+file.getName(), inputStream2,
file.toString().length(), null, null);
Use
Entry newEntry = mDBApi.putFile("/uploads/"+file.getName(), inputStream2,
bos.size(), null, null);
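To illustrate the fix without the Dropbox dependency (the class name ByteCountDemo is made up): after buffering a stream into a ByteArrayOutputStream, the length to pass to putFile() is the number of buffered bytes, i.e. bos.size(), not the length of any string related to the file:

```java
import java.io.*;

public class ByteCountDemo {

    // Buffers a stream and returns the actual byte count; this is the value
    // that belongs in the length argument of an upload call like putFile().
    public static int bufferedByteCount(InputStream in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            for (int n; (n = in.read(buf)) != -1;) {
                bos.write(buf, 0, n);
            }
            return bos.size(); // number of bytes actually read, not a filename length
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing a too-small count truncates the upload, which is one way binary files end up "corrupted".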
I don't think you need to convert to a byte array; simply using a FileInputStream is enough for any file, text as well as binary. The following code works; I just tested it with a JPG.
DropboxAPI<?> client = new DropboxAPI<WebAuthSession>(session);
FileInputStream inputStream = null;
try {
    File file = new File("some_pic.jpg");
    inputStream = new FileInputStream(file);
    DropboxAPI.Entry newEntry = client.putFile("/testing.jpg", inputStream,
            file.length(), null, null);
    System.out.println("The uploaded file's rev is: " + newEntry.rev);
} catch (DropboxUnlinkedException e) {
    // User has unlinked, ask them to link again here.
    System.out.println("User has unlinked.");
} catch (DropboxException e) {
    System.out.println("Something went wrong while uploading.");
} catch (FileNotFoundException e) {
    System.out.println("File not found.");
} finally {
    if (inputStream != null) {
        try {
            inputStream.close();
        } catch (IOException e) {}
    }
}
I have a 7zip archive which contains a few hundred files separated into different directories. The target is to download it from an FTP server and then extract it on the phone.
My problem is that the 7zip SDK doesn't offer much. I am looking for examples, tutorials, and snippets about decompressing 7z files.
(Decompression via Intent is only a secondary option)
Go here:
LZMA SDK just provides the encoder and decoder for encoding/decoding the raw data, but 7z archive is a complex format for storing multiple files.
I found this page, which provides an alternative that works like a charm. You only have to add compile 'org.apache.commons:commons-compress:1.8' to your Gradle build script and use the feature you need. For this issue I did the following:
AssetManager am = getAssets();
InputStream inputStream = null;
try {
    inputStream = am.open("a7ZipedFile.7z");
    File file1 = createFileFromInputStream(inputStream);
} catch (IOException e) {
    e.printStackTrace();
}
SevenZFile sevenZFile = null;
try {
    File f = new File(this.getFilesDir(), "a7ZipedFile.7z");
    OutputStream outputStream = new FileOutputStream(f);
    byte[] buffer = new byte[1024];
    int length = 0;
    while ((length = inputStream.read(buffer)) > 0) {
        outputStream.write(buffer, 0, length);
    }
    try {
        sevenZFile = new SevenZFile(f);
        SevenZArchiveEntry entry = sevenZFile.getNextEntry();
        while (entry != null) {
            System.out.println(entry.getName());
            FileOutputStream out = openFileOutput(entry.getName(), Context.MODE_PRIVATE);
            byte[] content = new byte[(int) entry.getSize()];
            sevenZFile.read(content, 0, content.length);
            out.write(content);
            out.close();
            entry = sevenZFile.getNextEntry();
        }
        sevenZFile.close();
        outputStream.close();
        inputStream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
} catch (IOException e) {
    // Logging exception
    e.printStackTrace();
}
The only drawback is the approximately 200 KB added by the imported library. Other than that, it is really easy to use.