Unzipping multi-part zip file volumes using Java

I need to unzip a set of files that together form one zip archive. This isn't a set of independent zip files; it is one big zip file that has been broken up into multiple parts based on a size requirement.
For example, if you have a 2.5MB zip file and your mail system only supports 1MB files, you can ask Zip to create 3 files of at most 1MB each.
So it creates a.zip.001, a.zip.002, a.zip.003, and so on. Different libraries name the parts differently, but essentially they all work the same way.
How do you unzip this in Java? It doesn't look like the compression libraries in the standard library support this.
Thanks.

Try concatenating all the parts into a single file and then extracting that file. This works when the parts are plain byte-wise slices of one archive, as with the .001, .002 naming above. Something like:
// Needs java.io.*, java.nio.file.Files, java.util.*, java.util.zip.ZipFile
File dir = new File("D:/arc");
File full = new File(dir, "archive-full.zip");
// Sort the part names; skip the output file, which lives in the same directory.
Set<String> files = new TreeSet<String>();
for (String fname : dir.list()) {
    if (!fname.equals(full.getName())) {
        files.add(fname);
    }
}
try (FileOutputStream fos = new FileOutputStream(full)) {
    for (String fname : files) {
        // Files.copy writes the whole part; a single read() sized by available() may stop short.
        Files.copy(new File(dir, fname).toPath(), fos);
    }
}
ZipFile zipFile = new ZipFile(full);
/* extract files from zip */
Update: used a TreeSet to sort the file names, as dir.list() doesn't guarantee alphabetical order.
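If you would rather not write the joined archive to disk, the same idea can be sketched with SequenceInputStream, which chains the parts so that ZipInputStream sees one continuous stream. This is only a sketch under assumptions: the class name SplitZipReader and the baseName parameter are made up for illustration, the parts are assumed to be plain byte-wise slices with zero-padded suffixes so that sorting by name gives the right order, and the contents are returned in memory, which only suits small archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: reads split parts (a.zip.001, a.zip.002, ...) as one ZIP stream. */
final class SplitZipReader {

    /** Returns entry name -> content for every file entry found in the joined parts. */
    static Map<String, byte[]> readAll(Path partsDir, String baseName) throws IOException {
        // Collect the parts and sort them by name so that .001 comes before .002.
        List<Path> parts;
        try (Stream<Path> listing = Files.list(partsDir)) {
            parts = listing
                    .filter(p -> p.getFileName().toString().startsWith(baseName + "."))
                    .sorted()
                    .collect(Collectors.toList());
        }

        // Open every part; SequenceInputStream reads them back to back.
        List<InputStream> streams = new ArrayList<>();
        for (Path part : parts) {
            streams.add(Files.newInputStream(part));
        }

        Map<String, byte[]> result = new LinkedHashMap<>();
        // Closing the ZipInputStream also closes the chained part streams.
        try (ZipInputStream zin = new ZipInputStream(
                new SequenceInputStream(Collections.enumeration(streams)))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zin.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                result.put(entry.getName(), out.toByteArray());
            }
        }
        return result;
    }
}
```

With the parts from the question in D:/arc, the call would look like SplitZipReader.readAll(Paths.get("D:/arc"), "a.zip"). For large archives you would write each entry to a file instead of collecting byte arrays.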

Related

Transferring files from AWS lambda tmp folder to sftp server

I have created an AWS Lambda function that takes some files from an S3 bucket, zips them, and transfers the zipped file to an SFTP server. When I look on the server, I see that the tmp folder has been carried over with the files: the zip file contains a tmp folder, and inside that folder are the files that I had zipped. I have scoured the internet and AWS trying to figure out how to change the directory in AWS Lambda when I retrieve the files to be zipped, but have not had any luck. I don't want a tmp folder in my zip file. When I unzip it, I just want to see the files that I selected, without any folders. Does anyone know how to do this? I am programming in Java.
My code is below.
private DownloadFile(){
    File localFile = new File(fileName);
    //pull data and audit files from s3 bucket
    s3Client.getObject(new GetObjectRequest("pie-dd-demo/daniel20", fileName), localFile);
    zipOS = new ZipOutputStream(fos);
    //send files to be zipped
    writeToZipFile(fileName, zipOS);
}
public static void writeToZipFile(String path, ZipOutputStream zipStream)
        throws FileNotFoundException, IOException {
    File aFile = new File(path);
    FileInputStream fis = new FileInputStream(aFile);
    ZipEntry zipEntry = new ZipEntry(path);
    try {
        zipStream.putNextEntry(zipEntry);
        byte[] bytes = new byte[1024];
        int length;
        while ((length = fis.read(bytes)) >= 0) {
            zipStream.write(bytes, 0, length);
            System.out.println(path + "write to zipfile complete");
        }
    } catch (FileNotFoundException exception) {
        // Output expected FileNotFoundExceptions.
    } catch (Exception exception) {
        // Output unexpected Exceptions.
    }
    zipStream.closeEntry();
    fis.close();
}
I think the problem is that you are creating the zip entry with new ZipEntry(path), which means the resulting zip file uses the full path as the name of the entry.
You can retrieve just the file name from a full path in Java as follows:
File f = new File("/tmp/folder/cat.png");
String fname = f.getName();
You can then use fname to create the zip entry by calling new ZipEntry(fname).
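Put together, the write method could look roughly like the sketch below. The class name FlatZipWriter and the method name addFile are made up for illustration, and it uses java.nio.file.Path instead of the String path from the question; the only point is that the entry name comes from the last path element rather than from the full path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: adds files to a ZIP without their parent folders. */
final class FlatZipWriter {

    /** Stores the file under its bare name, e.g. /tmp/report.csv becomes report.csv. */
    static void addFile(Path file, ZipOutputStream zipStream) throws IOException {
        // Only the last path element is used as the entry name.
        String entryName = file.getFileName().toString();
        zipStream.putNextEntry(new ZipEntry(entryName));
        // Files.copy streams the whole file into the current entry.
        Files.copy(file, zipStream);
        zipStream.closeEntry();
    }
}
```

Note that two files with the same name in different folders would then collide, so this only fits when the selected file names are unique.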

SevenZFile - Apache Commons Compress 1.15, Uncompress

While uncompressing a .7z file, empty folders are ignored. I want empty folders to be created as well when uncompressing any .7z file.
My code is below:
SevenZFile sevenZFile = new SevenZFile(new File(filename));
SevenZArchiveEntry entry;
while ((entry = sevenZFile.getNextEntry()) != null){
    if (entry.isDirectory()){
        continue;
    }
    File curfile = new File(DestinationPath,entry.getName());
    File parent = curfile.getParentFile();
    if (!parent.exists()) {
        parent.mkdirs();
    }
    FileOutputStream out = new FileOutputStream(curfile);
    byte[] content = new byte[(int) entry.getSize()];
    sevenZFile.read(content, 0, content.length);
    out.write(content);
    out.close();
}
sevenZFile.close();
Your code seems to work.
Probably the folders weren't in "yourfile.7zip" to begin with.
This is a common issue with 7zip, and you have to update your 7zip version.
If the archive does contain the folder entries, just use:
if (entry.isDirectory()){
    // mkdirs() also creates any missing parent folders
    new File(DestinationPath,entry.getName()).mkdirs();
    continue;
}
Since:
A file output stream is an output stream for writing data to a File or
to a FileDescriptor.
a FileOutputStream cannot create a folder. Creating it explicitly through File is the proper way to accomplish the task, because the library does not create folders for you.

Extract files from *.gz extension

I have successfully extracted *.gz from *.tgz, and now I have no idea how to actually extract the final files.
There are some options using custom packages, but that's not an option for me; I need to use standard Java packages only.
What I tried is using the same function that I use for *.tgz on the *.gz, but it doesn't work:
java.util.zip.ZipException: Not in GZIP format 1.gz
Here is the function that extracts *.tgz files.
public String ExtractFile(String path) {
    try {
        File newFile = new File(this.getFullPathWithoutExtension(path) + ".gz");
        try (FileInputStream fileStream = new FileInputStream(path);
             GZIPInputStream gStream = new GZIPInputStream(fileStream);
             FileOutputStream outStream = new FileOutputStream(newFile)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = gStream.read(buf)) > 0) {
                outStream.write(buf, 0, len);
            }
        }
        return newFile.getPath();
    } catch (Exception e) {
        System.out.print(e);
    }
    return null;
}
TL;DR: *.tgz files are extracted to *.gz files, but the *.gz files cannot be extracted.
A .tgz file wouldn't normally be extracted to a .gz file - it would be extracted to a .tar file. (A .gz file is gzipped; a .tar file is an uncompressed archive containing multiple files; a .tgz is a .tar file that's then been gzipped - you've already "undone" the gzipping.)
I don't think there's anything within Java's standard libraries to handle tar files - so you'll either have to revisit your "I can't use anything not in the standard library" decision or reimplement it yourself. The file format is easily available if you decide to do that.
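If you do decide to reimplement it, the basic layout is simple enough to sketch with the standard library alone: a tar file is a sequence of 512-byte header blocks, each followed by the entry data padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. The class below is only a minimal sketch and not a full tar implementation. The name MiniTarReader is made up, it assumes short ASCII names in the 100-byte name field, it ignores checksums, long-name extensions and the binary size encoding used for very large files, and it keeps everything in memory.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal tar reader using only standard Java classes. */
final class MiniTarReader {
    private static final int BLOCK = 512;

    /** Reads the regular-file entries of an uncompressed tar stream into memory. */
    static Map<String, byte[]> readAll(InputStream tar) throws IOException {
        DataInputStream in = new DataInputStream(tar);
        Map<String, byte[]> files = new LinkedHashMap<>();
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without the zero-filled trailer
            }
            if (isAllZero(header)) {
                break; // end-of-archive marker
            }
            String name = field(header, 0, 100);                            // entry name
            long size = Long.parseLong(field(header, 124, 12).trim(), 8);   // size in octal
            byte type = header[156];                                        // type flag

            byte[] data = new byte[(int) size];
            in.readFully(data);
            // Entry data is padded up to the next 512-byte boundary.
            int padding = (int) ((BLOCK - size % BLOCK) % BLOCK);
            in.readFully(new byte[padding]);

            // '0' (or NUL in old archives) marks a regular file; skip everything else.
            if (type == '0' || type == 0) {
                files.put(name, data);
            }
        }
        return files;
    }

    private static boolean isAllZero(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes a NUL-terminated ASCII header field. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }
}
```

For a .tgz you would wrap the file stream first, for example MiniTarReader.readAll(new GZIPInputStream(new FileInputStream(path))), so the gunzip step and the tar step happen in one pass without an intermediate file.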

Zipping Files using util.zip No directory

I have the following situation:
I am able to zip my files with the following method:
public boolean generateZip(){
    // Create a buffer for reading the files
    byte[] application = new byte[100000];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // These are the files to include in the ZIP file
    String[] filenames = new String[]{"/subdirectory/index.html", "/subdirectory/webindex.html"};
    try {
        // Create the ZIP file
        ZipOutputStream out = new ZipOutputStream(baos);
        // Compress the files
        for (int i=0; i<filenames.length; i++) {
            byte[] filedata = VirtualFile.fromRelativePath(filenames[i]).content();
            ByteArrayInputStream in = new ByteArrayInputStream(filedata);
            // Add ZIP entry to output stream.
            out.putNextEntry(new ZipEntry(filenames[i]));
            // Transfer bytes from the file to the ZIP file
            int len;
            while ((len = in.read(application)) > 0) {
                out.write(application, 0, len);
            }
            // Complete the entry
            out.closeEntry();
            in.close();
        }
        // Complete the ZIP file
        out.close();
    } catch (IOException e) {
        System.out.println("There was an error generating ZIP.");
        e.printStackTrace();
    }
    downloadzip(baos.toByteArray());
}
This works perfectly and I can download the xy.zip which contains the following directory and file structure:
subdirectory/
----index.html
----webindex.html
My aim is to completely leave out the subdirectory and the zip should only contain the two files. Is there any way to achieve this?
(I am using Java on Google App Engine).
Thanks in advance
If you are sure the names in the filenames array stay unique once the directory is left out, change the line that constructs the ZipEntry:
String zipEntryName = new File(filenames[i]).getName();
out.putNextEntry(new ZipEntry(zipEntryName));
This uses java.io.File#getName().
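The uniqueness condition matters because ZipOutputStream.putNextEntry throws a ZipException when the same entry name is added twice. If you are not sure, you could check up front. The following is just a sketch; the class name FlatEntryNames and the method name flatten are made up for illustration.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: computes directory-free entry names and rejects collisions. */
final class FlatEntryNames {

    /** Maps each flattened entry name to its source path, in the order given. */
    static Map<String, String> flatten(String[] paths) {
        Map<String, String> byEntryName = new LinkedHashMap<>();
        for (String path : paths) {
            // Same call as in the answer: keep only the last path element.
            String entryName = new File(path).getName();
            String previous = byEntryName.putIfAbsent(entryName, path);
            if (previous != null) {
                throw new IllegalArgumentException(
                        "Both " + previous + " and " + path + " would become " + entryName);
            }
        }
        return byEntryName;
    }
}
```

You would call flatten(filenames) before opening the ZipOutputStream, then loop over the map: the key is the entry name and the value is the path to read the content from.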
You can use Apache Commons IO to list all your files, then read each one into a byte array.
Replace the line below
String[] filenames = new String[]{"/subdirectory/index.html", "/subdirectory/webindex.html"};
with the following
Collection<File> files = FileUtils.listFiles(new File("/subdirectory"), new String[]{"html"}, true);
for (File file : files)
{
    FileInputStream fileStream = new FileInputStream(file);
    byte[] filedata = IOUtils.toByteArray(fileStream);
    fileStream.close();
    //From here you can proceed with your zipping.
}
Let me know if you have issues.
You could use the isDirectory() method on VirtualFile.

Java most efficient way to retrieve something out of the middle of a ZIP

I am looking for the most efficient way (in terms of speed) to retrieve a file from the middle of a ZIP file.
E.g. I have a ZIP file that contains 700 folders (numbered 1 to 700). Each folder holds a picture and an mp3 file. There is a special folder called Info, which contains an XML file. The problem is that I need to iterate through the ZIP file to find the XML file, and then I display images from the desired folders. I am using the ZipFile approach (thus I iterate through the whole ZIP file; even if I want folder 666, I need to go through 665 items first), so selecting from the ZIP file is extremely slow.
I would like to ask: if you have faced a similar issue, how did you solve it? Is there any approach in Java that turns my ZIP file into a virtual folder so that I can browse it much more quickly? Is there an external library that is the most efficient in terms of time?
Source code snippet:
try {
    FileInputStream fin = new FileInputStream(
            "sdcard/external_sd/mtp_data/poi_data/data.zip");
    ZipInputStream zin = new ZipInputStream(fin);
    ZipEntry ze = null;
    while ((ze = zin.getNextEntry()) != null) {
        // Log.d("ZE", ze.getName());
        if (ze.getName().startsWith("body/665/")) {
            // Log.d("FILE F", "soubor: "+ze.getName());
            if (ze.getName().endsWith(".jpg")
                    || ze.getName().endsWith(".JPG")) {
                Log.d("OBR", "picture: " + ze.getName());
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                byte[] buffer = new byte[1024];
                int count;
                while ((count = zin.read(buffer)) != -1) {
                    baos.write(buffer, 0, count);
                }
                byte[] bytes = baos.toByteArray();
                bmp = BitmapFactory.decodeByteArray(bytes, 0,
                        bytes.length);
                photoField.add(bmp);
                i++;
            }
        }
    }
}
The ZipFile.getEntry() and ZipFile.getInputStream() methods can be used to access a specific file in a ZIP archive. For example:
ZipFile file = ...
ZipEntry entry = file.getEntry("folder1/picture.jpg");
InputStream in = file.getInputStream(entry);
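The reason this is faster than looping over a ZipInputStream is that ZipFile reads the archive's central directory when it is opened, so getEntry is a lookup by name rather than a scan through all the preceding entries. A fuller sketch, with the made-up names ZipEntryLookup and readEntry, might look like this. It reads the entry into a byte array, which you could then hand to BitmapFactory.decodeByteArray as in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: fetches one named entry without walking the whole archive. */
final class ZipEntryLookup {

    /** Returns the bytes of the named entry, or throws if the archive has no such entry. */
    static byte[] readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            // getEntry returns null when the name is not present.
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + zipPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int count;
                while ((count = in.read(buffer)) != -1) {
                    out.write(buffer, 0, count);
                }
                return out.toByteArray();
            }
        }
    }
}
```

If you need many entries from the same archive, it is probably better to keep one ZipFile open and call getEntry repeatedly instead of reopening the file for every lookup as this sketch does.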
