I am trying to unzip a file in Java and add all of its contents to an ArrayList rather than write them to disk. I pass in the path to a zip file, read it, and add each entry to the list as a File. When I then come to process those files, they have strange paths inside my project directory that do not exist.
Can someone please help me here?
public void processZipFile(String path) {
File file = new File(path);
file.setReadable(true);
ZipFile zip;
ArrayList<File> files = new ArrayList<File>();
try {
zip = new ZipFile(file);
Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip
.entries();
while (entries.hasMoreElements()) {
ZipEntry entry = entries.nextElement();
File f = new java.io.File(entry.getName());
allFiles.add(f);
}
} catch (ZipException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
AFAIK, entry.getName() won't return anything you can use to open a file. It is only the name of an entry inside the zip, not the path of a physical file on disk, so new File(entry.getName()) is resolved against the working directory (your project directory) and points at nothing.
I would suggest storing the InputStream for every entry in your list, obtained with zipFile.getInputStream(entry), then reading the contents from those streams and closing them once you no longer need them.
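A minimal sketch of that idea, with one change that is my assumption rather than part of the answer: instead of keeping the streams open, each entry is read into a byte array right away, so nothing depends on the ZipFile staying open. The class name ZipToMemory and the method readAll are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not taken from the question or the answer.
final class ZipToMemory {

    /** Reads every non-directory entry of the zip at the given path into memory. */
    static Map<String, byte[]> readAll(String path) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(new File(path))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    // Key by the entry name; it is a name inside the zip, not a disk path.
                    contents.put(entry.getName(), toBytes(in));
                }
            }
        }
        return contents;
    }

    /** Drains the stream into a byte array. */
    private static byte[] toBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }
}
```

This holds every entry in memory at once, so it only suits archives that comfortably fit in the heap.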
I have gone through the link on how to extract a .tar file and several links on Stack Overflow about doing this in Java.
However, I didn't find any that address my concern, which is multilevel (nested) .tar/.tgz/.zip files.
My concern is with something like the structure below:
Abc.tar.gz
--DEF.tar
--sample1.txt
--sample2.txt
--FGH.tgz
--sample3.txt
-sample4.txt
This is a simple example. The archive can be nested in any combination, such as a .tar inside a .tar inside a .gz inside a .tgz, and so on.
My problem is that I can only extract the first level using the Apache Commons Compress library. That is, if Abc.tar.gz is extracted, only DEF.tar is available in the destination/output folder. Beyond that, my extraction does not work.
I tried to feed the output of the first extraction as input to the second on the fly, but I got stuck with a FileNotFoundException: at that point the output file is not yet in place, so the second extraction cannot find it.
Pseudocode:
public class CommonExtraction {
TarArchiveInputStream tar = null;
if((sourcePath.trim().toLowerCase.endsWith(".tar.gz")) || sourcePath.trim().toLowerCase.endsWith(".tgz")) {
try {
tar=new TarArchiveInputStream(new GzipCompressorInputStream(new BufferedInputStream(new FileInputStream(sourcePath))));
extractTar(tar,destPath)
} catch (Exception e) {
e.printStackTrace();
}
}
}
Public static void extractTar(TarArchiveInputStream tar, String outputFolder) {
try{
TarArchiveEntry entry;
while (null!=(entry=(TarArchiveEntry)tar.getNextTarEntry())) {
if(entry.getName().trim().toLowerCase.endsWith(".tar")){
final String path = outputFolder + entry.getName()
tar=new TarArchiveInputStream(new BufferedInputStream(new FileInputStream(path))) // failing as .tar folder after decompression from .gz not available at destination path
extractTar(tar,outputFolder)
}
extractEntry(entry,tar,outputFolder)
}
tar.close();
}catch(Exception ex){
ex.printStackTrace();
}
}
Public static void extractEntry(TarArchiveEntry entry , InputStream tar, String outputFolder){
final String path = outputFolder + entry.getName()
if(entry.isDirectory()){
new File(path).mkdirs();
}else{
//create directory for the file if not exist
}
// code to read and write until last byte is encountered
}
}
PS: please ignore the syntax errors in the code.
Try this
try (InputStream fi = file.getInputStream();
InputStream bi = new BufferedInputStream(fi);
InputStream gzi = new GzipCompressorInputStream(bi, false);
ArchiveInputStream archive = new TarArchiveInputStream(gzi)) {
withArchiveStream(archive, result::appendEntry);
}
As far as I can see, .tar.gz and .tgz are the same format. My method withArchiveStream is:
private void withArchiveStream(ArchiveInputStream archInStream, BiConsumer<ArchiveInputStream, ArchiveEntry> entryConsumer) throws IOException {
ArchiveEntry entry;
while((entry = archInStream.getNextEntry()) != null) {
entryConsumer.accept(archInStream, entry);
}
}
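private void appendEntry(ArchiveInputStream archive, ArchiveEntry entry) {
    if (!archive.canReadEntryData(entry)) {
        // BiConsumer.accept cannot throw checked exceptions, so wrap the IOException
        throw new UncheckedIOException(new IOException("Can't read archive entry"));
    }
    if (entry.isDirectory()) {
        return;
    }
    // For example, print the content of the current entry (readAllBytes needs Java 9+)
    try {
        String content = new String(archive.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(content);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}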
You have a recursive problem, so you can use recursion to solve it. Here is some pseudocode to show how it can be done:
public class ArchiveExtractor
{
public void extract(File file)
{
List<File> files = new ArrayList<>(); // list of extracted files (stays empty if the file is not a known archive)
if(isZip(file))
files = extractZip(file);
else if(isTGZ(file))
files = extractTGZ(file);
else if(isTar(file))
files = extractTar(file);
else if(isGZip(file))
files = extractGZip(file);
for(File f : files)
{
if(isArchive(f))
extract(f); // recursive call
}
}
private List<File> extractZip(File file)
{
// extract archive and return list of extracted files
}
private List<File> extractTGZ(File file)
{
// extract archive and return list of extracted files
}
private List<File> extractTar(File file)
{
// extract archive and return list of extracted files
}
private List<File> extractGZip(File file)
{
// extract archive and return list of extracted files
}
}
where:
isZip() tests if the file extension is zip
isTGZ() tests if the file extension is tgz
isTar() tests if the file extension is tar
isGZip() tests if the file extension is gz
isArchive() means isZip() || isTGZ() || isTar() || isGZip()
As for the directory where each archive is extracted: you are free to do as you want.
If you process test.zip for example, you may extract in the same directory as where the archive is,
or create the directory test and extract in it.
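The same recursion can also be applied to streams instead of files, which avoids having to wait for an inner archive to be written to disk first. The sketch below is an assumption-laden illustration, not the answer's method: it uses only java.util.zip and therefore only descends into nested .zip entries; for .tar/.tgz layers the Commons Compress streams would have to be wrapped in the same way. NestedZipWalker and list are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch: lists files inside nested .zip archives without temp files.
final class NestedZipWalker {

    /** Collects the names of all regular entries, descending into nested .zip entries. */
    static List<String> list(InputStream in, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        // Deliberately not closed here: closing it would also close the parent stream.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String name = prefix + entry.getName();
            if (name.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // The stream is positioned on this entry's data, so recurse on it directly.
                names.addAll(list(zip, name + "!/"));
            } else {
                names.add(name);
            }
        }
        return names;
    }
}
```

The caller owns the outermost stream and should close it, for example with try-with-resources around the FileInputStream passed to list(stream, "").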
I have created an AWS Lambda function that takes some files from an S3 bucket, zips them, and transfers the zipped file to an SFTP server. When I look on the server, I see that the tmp folder has been carried over with the files: the zip file contains a tmp folder, and the files I zipped are inside it. I have scoured the internet and the AWS documentation trying to figure out how to change the directory in AWS Lambda when I retrieve the files to be zipped, but have not had any luck. I don't want a tmp folder in my zip file. When I unzip it, I just want to see the files I selected, without any folders. Does anyone know how to do this? I am programming in Java.
My code is below.
private DownloadFile(){
File localFile = new File(fileName);
//pull data and audit files from s3 bucket
s3Client.getObject(new GetObjectRequest("pie-dd-demo/daniel20", fileName), localFile);
zipOS = new ZipOutputStream(fos);
//send files to be zipped
writeToZipFile(fileName, zipOS);
}
public static void writeToZipFile(String path, ZipOutputStream zipStream)
throws FileNotFoundException, IOException {
File aFile = new File(path);
FileInputStream fis = new FileInputStream(aFile);
ZipEntry zipEntry = new ZipEntry(path);
try {
zipStream.putNextEntry(zipEntry);
byte[] bytes = new byte[1024];
int length;
while ((length = fis.read(bytes)) >= 0) {
zipStream.write(bytes, 0, length);
System.out.println(path + "write to zipfile complete");
}
} catch (FileNotFoundException exception) {
// Output expected FileNotFoundExceptions.
} catch (Exception exception) {
// Output unexpected Exceptions.
}
zipStream.closeEntry();
fis.close();
}
I think the problem is that you are creating a zip entry using new ZipEntry(path) and that means that the resulting zip file will contain the full path as the name of the zip entry.
You can retrieve the actual filename from a full path/file in Java as follows:
File f = new File("/tmp/folder/cat.png");
String fname = f.getName();
You can then use fname to create the zip entry by calling new ZipEntry(fname).
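Applied to the writeToZipFile method from the question, this could look roughly as follows. It is a sketch under assumptions: the class name FlatZipWriter is hypothetical, and try-with-resources replaces the original's manual close calls.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical wrapper class around the question's writeToZipFile method.
final class FlatZipWriter {

    /** Adds the file at the given path to the zip, named by its base name only. */
    static void writeToZipFile(String path, ZipOutputStream zipStream) throws IOException {
        File aFile = new File(path);
        try (InputStream fis = new FileInputStream(aFile)) {
            // aFile.getName() drops the directories, e.g. "/tmp/cat.png" becomes "cat.png".
            zipStream.putNextEntry(new ZipEntry(aFile.getName()));
            byte[] buffer = new byte[1024];
            int length;
            while ((length = fis.read(buffer)) >= 0) {
                zipStream.write(buffer, 0, length);
            }
            zipStream.closeEntry();
        }
    }
}
```

Note that two files with the same base name in different directories would then collide inside the archive, and putNextEntry rejects the duplicate name.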
I want to extract a zip file that contains a jar file. The zip has a complex folder structure, and one of the folders contains a jar file. When I use the following code to extract the jar file, the program goes into an infinite loop while reading the jar and never recovers. It keeps writing the contents of the jar until the disk runs out of space, even though the jar is only a few MB.
Please find the code snippet below:
// using a ZipInputStream to get the zipIn by passing the zipFile as FileInputStream
ZipEntry entry = zipIn.getNextEntry();
String fileName= entry.getName()
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileName));
byte[] bytesIn = new byte[(int)bufferSize];
while (zipIn.read(bytesIn) > 0) // This is the part where the loop does not end
{
bos.write(bytesIn);
}
..
// flushing an closing the bos
Please let me know if there is any way we can avoid this and get the jar file out at required location.
Does this suit your needs?
public static void main(String[] args) {
try {
copyJarFromZip("G:\\Dateien\\Desktop\\Desktop.zip",
"G:\\Dateien\\Desktop\\someJar.jar");
} catch (IOException ex) {
ex.printStackTrace();
}
}
public static void copyJarFromZip(final String zipPath, final String targetPath) throws IOException {
try (ZipFile zipFile = new ZipFile(zipPath)) {
for (final Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
ZipEntry zipEntry = e.nextElement();
if (zipEntry.getName().endsWith(".jar")) {
Files.copy(zipFile.getInputStream(zipEntry), Paths.get(targetPath),
StandardCopyOption.REPLACE_EXISTING);
}
}
}
}
I'm looking for a way to extract a zip file. So far I have tried java.util.zip and org.apache.commons.compress, but both gave corrupted output.
The input is a ZIP file containing a single .doc file.
java.util.zip: output corrupted.
org.apache.commons.compress: output is a blank file, but 2 MB in size.
So far only commercial software like WinRAR works perfectly. Is there a Java library that can do this?
This is my method using java.util library:
public void extractZipNative(File fileZip)
{
ZipInputStream zis;
StringBuilder sb;
try {
zis = new ZipInputStream(new FileInputStream(fileZip));
ZipEntry ze = zis.getNextEntry();
byte[] buffer = new byte[(int) ze.getSize()];
FileOutputStream fos = new FileOutputStream(this.tempFolderPath+ze.getName());
int len;
while ((len=zis.read(buffer))>0)
{
fos.write(buffer);
}
fos.flush();
fos.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally
{
if (zis!=null)
{
try { zis.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
Many thanks,
Mike
I think your input may have been compressed by some "incompatible" zip program like 7-Zip.
First check whether it can be unpacked with classic WinZip or similar.
Java's zip handling copes well with archives that come from a "compatible" zip compressor.
It was an error in my code. I need to specify the offset and length of the bytes to write, i.e. fos.write(buffer, 0, len) instead of fos.write(buffer).
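For reference, a copy loop that avoids this class of bug might look like the sketch below. read may return fewer bytes than the buffer holds, so only the bytes actually read are written. A fixed-size buffer is used because ZipEntry.getSize() can return -1 when the size is unknown, which makes it unsafe for sizing an array. StreamCopy and copy are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper showing the corrected read/write loop.
final class StreamCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len); // write only the len bytes that were just read
            total += len;
        }
        return total;
    }
}
```

With a ZipInputStream positioned on an entry, copy(zis, fos) stops at the end of that entry, because read returns -1 there.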
This works for me:
ZipFile Vanilla = new ZipFile(new File("Vanilla.zip")); // the zip file must be in the working directory
Enumeration<? extends ZipEntry> entries = Vanilla.entries(); // all entries of the zip file
File folderw = new File("tkwgter5834"); // output directory
folderw.mkdirs(); // create it if it does not exist yet
while (entries.hasMoreElements()) { // runs while there are entries left in the zip
    ZipEntry entry = entries.nextElement(); // next entry of the zip
    if (entry.isDirectory()) continue; // directory entries have no content to write
    try (InputStream stream = Vanilla.getInputStream(entry); // decompressed content of this entry
         FileOutputStream outter = new FileOutputStream(new File(folderw, entry.getName()))) { // output file named after the entry
        outter.write(stream.readAllBytes()); // write the entry's bytes, not the raw zip file (Java 9+)
    }
}
Vanilla.close(); // closes the zip file
Have you tried jUnrar? Perhaps it might work:
https://github.com/edmund-wagner/junrar
If that doesn't work either, I guess your archive is corrupted in some way.
If you know the environment that you're going to be running this code in, I think you're much better off just making a call to the system to unzip it for you. It will be way faster than anything that you implement in java.
I wrote the code to extract a zip file with nested directories and it ran slowly and took a lot of CPU. I wound up replacing it with this:
Runtime.getRuntime().exec(String.format("unzip %s -d %s", archive.getAbsolutePath(), basePath));
That works a lot better.
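One caveat with the one-liner above: Runtime.exec(String) splits the command on whitespace, so a path containing spaces breaks it. A hedged alternative is to pass the arguments as a list via ProcessBuilder. The sketch assumes an unzip binary is on the PATH; SystemUnzip, command and run are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes an "unzip" executable is available on the PATH.
final class SystemUnzip {

    /** Builds the argument list; each path is a single argument, so spaces are safe. */
    static List<String> command(File archive, File targetDir) {
        return Arrays.asList("unzip", archive.getAbsolutePath(), "-d", targetDir.getAbsolutePath());
    }

    /** Runs unzip and returns its exit code. */
    static int run(File archive, File targetDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(archive, targetDir))
                .inheritIO() // forward unzip's output so its pipe buffers cannot fill up
                .start();
        return process.waitFor(); // wait for unzip to finish before using the files
    }
}
```

Checking the returned exit code matters here, since a failed extraction does not raise an exception on the Java side.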
I'm trying to compress directory content into zip archive using java.
Everything is fine, but I just want to clarify some facts.
Here is the code which I use to compress files:
public void pack(@Nonnull String archiveName, @Nonnull File outputDir, @Nonnull File targetDir) {
File zipFile = new File(outputDir, "out.zip");
ZipOutputStream zipOutputStream = null;
OutputStream outputStream;
try {
// create stream for writing zip archive
outputStream = new FileOutputStream(zipFile);
zipOutputStream = new ZipOutputStream(outputStream);
// write files recursively
writeFiles(zipOutputStream, targetDir.listFiles(), "");
} catch (IOException e) {
LOGGER.error("IO exception while packing files to archive", e);
} finally {
// close output streams
if (zipOutputStream != null) {
try {
zipOutputStream.close();
} catch (IOException e) {
LOGGER.error("Unable to close zip output stream", e);
}
}
}
}
/**
* Writes specified files and their children (in case of directories) to archive
*
* @param zipOutputStream archive output stream
* @param files files which should be added to archive
* @param path path relative to root of archive where files should be placed
*/
private void writeFiles(@Nonnull ZipOutputStream zipOutputStream, @Nullable File[] files, @Nonnull String path) throws IOException {
if (files == null || files.length == 0) {
return;
}
for (File file : files) {
if (file.isDirectory()) {
// recursively add files in this directory
String fullDirectoryName = path + file.getName() + "/";
File[] childFiles = file.listFiles();
if (childFiles != null && childFiles.length > 0) {
// write child files to archive. current directory will be created automatically
writeFiles(zipOutputStream, childFiles, fullDirectoryName);
} else {
// empty directory. write directory itself to archive
ZipEntry entry = new ZipEntry(fullDirectoryName);
zipOutputStream.putNextEntry(entry);
zipOutputStream.closeEntry();
}
} else {
// put file in archive
BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file));
zipOutputStream.putNextEntry(new ZipEntry(path + file.getName()));
ByteStreams.copy(bufferedInputStream, zipOutputStream);
zipOutputStream.closeEntry();
bufferedInputStream.close();
}
}
}
Now the questions:
Is it correct that by default (and in my case too) I get an already compressed archive (using the Deflate method)?
How do I get an uncompressed archive?
If I call zipOutputStream.setMethod(ZipOutputStream.STORED), I have to provide the size, the compressed size (will it be equal to the size?) and the CRC for every entry, otherwise I get exceptions.
If I don't want to calculate the size and CRC myself, I can use the DEFLATED method with level zero:
zipOutputStream.setMethod(ZipOutputStream.DEFLATED);
zipOutputStream.setLevel(ZipOutputStream.STORED);
So, is it correct that in this case the archive is not compressed at all?
Is there a more convenient and obvious way to create uncompressed archives?
Rather than re-invent the wheel I'd seriously consider using an existing library for this, such as Apache Ant. The basic idiom for creating a zip file is:
Project p = new Project();
p.init();
Zip zip = new Zip();
zip.setProject(p);
zip.setDestFile(new File(outputDir, "out.zip"));
FileSet fs = new FileSet();
fs.setProject(p);
fs.setDir(targetDir);
zip.addFileset(fs);
zip.perform();
By default you will get a compressed archive. For an uncompressed zip all you need to add is
zip.setCompress(false);
after the setDestFile (in fact anywhere before the perform).
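If pulling in Ant is not an option, STORED entries can also be written with java.util.zip alone. The sketch below is one way to do it, assuming the entry data fits in memory so the CRC can be computed before the entry is opened; for a stored entry the compressed size equals the uncompressed size. StoredZip and putStored are hypothetical names.

```java
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes uncompressed (STORED) entries with java.util.zip only.
final class StoredZip {

    /** Adds data as a STORED entry; size and CRC must be known before putNextEntry. */
    static void putStored(ZipOutputStream zos, String name, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length); // stored data is not compressed, so sizes match
        entry.setCrc(crc.getValue());

        zos.putNextEntry(entry);
        zos.write(data);
        zos.closeEntry();
    }
}
```

For large files, the CRC and size would have to be computed in a first streaming pass over the file, followed by a second pass that writes the data.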