Reading files inside a subdir within a zip file

I have a file with the following structure:
BA.zip
|
|--- BA (directory)
     |
     |--- BA_KKSSI_20201013.zip
     |    |
     |    |--- BA_KKSSI_20201013.txt
     |--- BA_KKSSI_20201014.zip
     |    |
     |    |--- BA_KKSSI_20201014.txt
     |--- BA_KKSSI_20201015.zip
          |
          |--- BA_KKSSI_20201015.txt
I need to read BA_KKSSI_20201013.txt without extracting the parent file, BA.zip.
I have already written code that reads the text file when there are no subdirectories. For example:
public static String readChildZip(Path zipPath) throws IOException {
    try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
        // since there is only one text file
        ZipEntry textFile = zipFile.entries().nextElement();
        // the zip
        System.out.println(zipFile.getName());
        InputStream is = zipFile.getInputStream(textFile);
        String contents = IOUtils.toString(is, StandardCharsets.UTF_8);
        return contents;
    }
}
The code above handles only the innermost level (a zip that contains the text file directly, with no subdirectories or nested zips).
I looked through most of the SO posts, and all of them propose extracting the subdirectory first and then reading the nested zip files.
Is there a way to do this without extracting anything first?

You can use ZipInputStream (https://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipInputStream.html) to read entries of the "outer" zip file as zip files themselves.
That is, open the outer zip file as you already do, iterate over its entries, and whenever an entry is itself a zip file, create a ZipInputStream from the InputStream for that ZipEntry.

This returns the contents of the first text file inside the first zip file in zipPath. It assumes the first entry of the outer zip is itself a zip file, and readAllBytes() requires Java 9 or later.
public static String readChildZip(Path zipPath) throws IOException {
    try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
        ZipEntry childZipEntry = zipFile.entries().nextElement();
        try (InputStream childInputStream = zipFile.getInputStream(childZipEntry);
             ZipInputStream childZipStream = new ZipInputStream(childInputStream)) {
            // position the stream at the first entry of the nested zip
            childZipStream.getNextEntry();
            return new String(childZipStream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
And this prints the contents of all text files inside the first zip file in zipPath.
public static void readChildZipAll(Path zipPath) throws IOException {
    try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
        ZipEntry childZipEntry = zipFile.entries().nextElement();
        try (InputStream childInputStream = zipFile.getInputStream(childZipEntry);
             ZipInputStream childZipStream = new ZipInputStream(childInputStream)) {
            ZipEntry grandChildEntry;
            while ((grandChildEntry = childZipStream.getNextEntry()) != null) {
                // readAllBytes() stops at the end of the current entry
                System.out.println(grandChildEntry + ":"
                        + new String(childZipStream.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }
}
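For the layout in the question, where the first entry of the outer zip may be the directory BA/ rather than a zip, the same idea can be extended to skip non-zip entries and search by name. The following is only a sketch under that assumption; the class name NestedZipReader and the method readNestedText are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

class NestedZipReader {

    /**
     * Returns the text of the entry named textName found inside any nested
     * .zip entry of outerZip, or null if no such entry exists.
     * Nothing is written to disk.
     */
    static String readNestedText(Path outerZip, String textName) throws IOException {
        try (ZipFile outer = new ZipFile(outerZip.toFile())) {
            Enumeration<? extends ZipEntry> entries = outer.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Skip directories such as "BA/" and anything that is not a zip.
                if (entry.isDirectory() || !entry.getName().toLowerCase().endsWith(".zip")) {
                    continue;
                }
                try (InputStream raw = outer.getInputStream(entry);
                     ZipInputStream inner = new ZipInputStream(raw)) {
                    ZipEntry innerEntry;
                    while ((innerEntry = inner.getNextEntry()) != null) {
                        if (innerEntry.getName().equals(textName)) {
                            // readAllBytes() (Java 9+) stops at the end of the current entry.
                            return new String(inner.readAllBytes(), StandardCharsets.UTF_8);
                        }
                    }
                }
            }
        }
        return null;
    }
}
```

A call such as NestedZipReader.readNestedText(Path.of("BA.zip"), "BA_KKSSI_20201013.txt") would then scan every nested zip until it finds the requested text file.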

If you want to scan a ZIP, look at the NIO ZIP file system or ZipInputStream.
Here is an example that uses the ZIP file system in a recursive scanner, which can inspect a JAR/ZIP/WAR/EAR hierarchy of any depth. Adapt it to whatever action you need to perform on the content; this example just prints any ".txt" files to the console.
Note that the ZIP file system returns zip filesystem Path objects, which can be used with NIO Files.xxx() calls such as Files.find() and Files.copy(), just like Path objects that originate from the default filesystem.
private static Pattern ZIP_PATTERN = Pattern.compile("(?i).*\\.(jar|war|ear|zip)");
public static void traverseZip(Path zip) {
    System.out.println("traverseZip "+zip.toAbsolutePath());
    try (FileSystem fs = FileSystems.newFileSystem(zip)) {
        for (Path root : fs.getRootDirectories()) {
            try (Stream<Path> stream = Files.find(root, Integer.MAX_VALUE, (p,a) -> true)) {
                stream.forEach(entry -> {
                    System.out.println(zip.toString()+" -> "+entry);
                    // SOME ACTION HERE, for example
                    if (entry.toString().endsWith(".txt")) {
                        cat(entry, System.out);
                    }
                    // recurse into nested archives
                    if (ZIP_PATTERN.matcher(entry.toString()).matches() && Files.isRegularFile(entry)) {
                        traverseZip(entry);
                    }
                });
            }
        }
    }
    catch(IOException e) {
        throw new UncheckedIOException(e);
    }
}
private static void cat(Path path, OutputStream out) {
    try {
        Files.copy(path, out);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
Launch with:
traverseZip(Path.of("some.zip"));
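As a smaller illustration of the point that zip filesystem Path objects work with the usual Files calls, here is a minimal sketch that reads a single entry as text. The class name ZipFsRead and the method readEntry are hypothetical names chosen for this example; the sketch handles one level only and does not descend into nested archives.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

class ZipFsRead {

    /** Reads one entry of a zip as UTF-8 text through the zip file system. */
    static String readEntry(Path zip, String entryPath) throws IOException {
        // The cast selects the (Path, ClassLoader) overload, available since Java 7.
        try (FileSystem fs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
            // This Path belongs to the zip file system, not to the default one.
            Path inside = fs.getPath(entryPath);
            return new String(Files.readAllBytes(inside), StandardCharsets.UTF_8);
        }
    }
}
```

For example, ZipFsRead.readEntry(Path.of("some.zip"), "/docs/a.txt") would return the text of docs/a.txt, assuming such an entry exists in the archive.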

Related

Unable to extract nested tar within zip file i.e. a .tar file inside a zip file and so on

I have read up on how to extract a .tar file in Java, including several links on SO.
However, I didn't find any that address my concern, which is multi-level or nested .tar/.tgz/.zip files.
My case looks something like this:
Abc.tar.gz
--DEF.tar
--sample1.txt
--sample2.txt
--FGH.tgz
--sample3.txt
-sample4.txt
This is a simple example. In practice the nesting can be any combination of compressed formats and folders, such as a .tar inside a .tar inside a .gz, then a .tgz, and so on.
My problem is that I can only extract the first level using the Apache Commons Compress library. That is, when Abc.tar.gz is extracted, only DEF.tar ends up in the destination folder; extraction does not go any deeper.
I tried to feed the output of the first extraction into the second on the fly, but I got a FileNotFoundException: at that point the output file has not yet been written, so the second extraction cannot find it.
Pseudocode:
public class CommonExtraction {
    public static void extract(String sourcePath, String destPath) {
        String lower = sourcePath.trim().toLowerCase();
        if (lower.endsWith(".tar.gz") || lower.endsWith(".tgz")) {
            try {
                TarArchiveInputStream tar = new TarArchiveInputStream(new GzipCompressorInputStream(
                        new BufferedInputStream(new FileInputStream(sourcePath))));
                extractTar(tar, destPath);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    public static void extractTar(TarArchiveInputStream tar, String outputFolder) {
        try {
            TarArchiveEntry entry;
            while (null != (entry = tar.getNextTarEntry())) {
                if (entry.getName().trim().toLowerCase().endsWith(".tar")) {
                    final String path = outputFolder + entry.getName();
                    // failing: the .tar decompressed from the .gz is not yet available at the destination path
                    tar = new TarArchiveInputStream(new BufferedInputStream(new FileInputStream(path)));
                    extractTar(tar, outputFolder);
                }
                extractEntry(entry, tar, outputFolder);
            }
            tar.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    public static void extractEntry(TarArchiveEntry entry, InputStream tar, String outputFolder) {
        final String path = outputFolder + entry.getName();
        if (entry.isDirectory()) {
            new File(path).mkdirs();
        } else {
            // create directory for the file if not exist
        }
        // code to read and write until last byte is encountered
    }
}
P.S. Please ignore syntax details in the code.

Try this:
try (InputStream fi = file.getInputStream();
     InputStream bi = new BufferedInputStream(fi);
     InputStream gzi = new GzipCompressorInputStream(bi, false);
     ArchiveInputStream archive = new TarArchiveInputStream(gzi)) {
    withArchiveStream(archive, result::appendEntry);
}
As far as I can see, .tar.gz and .tgz are the same format. My methods withArchiveStream and appendEntry are:
private void withArchiveStream(ArchiveInputStream archInStream, BiConsumer<ArchiveInputStream, ArchiveEntry> entryConsumer) throws IOException {
    ArchiveEntry entry;
    while ((entry = archInStream.getNextEntry()) != null) {
        entryConsumer.accept(archInStream, entry);
    }
}
private void appendEntry(ArchiveInputStream archive, ArchiveEntry entry) {
    try {
        if (!archive.canReadEntryData(entry)) {
            throw new IOException("Can't read archive entry");
        }
        if (entry.isDirectory()) {
            return;
        }
        // And for example
        String content = new String(archive.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(content);
    } catch (IOException e) {
        // BiConsumer.accept cannot throw a checked exception, so wrap it
        throw new UncheckedIOException(e);
    }
}

You have a recursive problem, so you can use recursion to solve it. Here is some pseudocode to show how it can be done:
public class ArchiveExtractor
{
    public void extract(File file)
    {
        List<File> files = new ArrayList<>(); // list of extracted files
        if(isZip(file))
            files = extractZip(file);
        else if(isTGZ(file))
            files = extractTGZ(file);
        else if(isTar(file))
            files = extractTar(file);
        else if(isGZip(file))
            files = extractGZip(file);
        for(File f : files)
        {
            if(isArchive(f))
                extract(f); // recursive call
        }
    }
    private List<File> extractZip(File file)
    {
        // extract archive and return list of extracted files
    }
    private List<File> extractTGZ(File file)
    {
        // extract archive and return list of extracted files
    }
    private List<File> extractTar(File file)
    {
        // extract archive and return list of extracted files
    }
    private List<File> extractGZip(File file)
    {
        // extract archive and return list of extracted files
    }
}
where:
isZip() tests if the file extension is zip
isTGZ() tests if the file extension is tgz
isTar() tests if the file extension is tar
isGZip() tests if the file extension is gz
isArchive() means isZip() || isTGZ() || isTar() || isGZip()
As for the directory into which each archive is extracted, you are free to choose.
If you process test.zip, for example, you may extract it into the directory that contains the archive,
or create a directory named test and extract into it.
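The recursion can also run over streams instead of extracted files, which avoids needing the nested archive on disk at all. The sketch below shows that shape using java.util.zip so that it runs with the JDK alone; it deliberately swaps zip in for tar. With Apache Commons Compress, the analogous step would presumably be to wrap the stream positioned at the nested entry in a new TarArchiveInputStream, but that variant is not shown or tested here. The class name NestedArchiveWalker and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class NestedArchiveWalker {

    /** Collects the names of all non-archive entries, descending into nested zips. */
    static List<String> listLeaves(InputStream in) throws IOException {
        List<String> leaves = new ArrayList<>();
        walk(new ZipInputStream(in), "", leaves);
        return leaves;
    }

    private static void walk(ZipInputStream zip, String prefix, List<String> leaves)
            throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String name = prefix + entry.getName();
            if (entry.getName().toLowerCase().endsWith(".zip")) {
                // Reading from 'zip' now yields the bytes of the nested archive,
                // so wrap it directly. The wrapper is intentionally not closed:
                // closing it would also close the parent stream.
                walk(new ZipInputStream(zip), name + "!/", leaves);
            } else {
                leaves.add(name);
            }
        }
    }
}
```

The caller remains responsible for closing the outermost InputStream; replacing leaves.add(name) with a copy of the entry bytes to a target file would turn the listing into an extraction.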

ZipInputStream - functional approach to unzip files

I have a zip file containing only files and not directories. I want to unzip the file into a directory by using only functional Java.
The code below works as expected and unzips the file into the target folder.
public static void unzip(Path source, Path target) throws IOException {
    try (ZipInputStream zis = new ZipInputStream(new FileInputStream(source.toFile()))) {
        ZipEntry zipEntry = zis.getNextEntry();
        while (zipEntry != null) {
            Path targetDirResolved = target.resolve(zipEntry.getName());
            // reject entries that would escape the target directory
            Path normalizePath = targetDirResolved.normalize();
            if (!normalizePath.startsWith(target)) {
                throw new IOException("Bad zip entry: " + zipEntry.getName());
            }
            Files.copy(zis, normalizePath, StandardCopyOption.REPLACE_EXISTING);
            zipEntry = zis.getNextEntry();
        }
        zis.closeEntry();
    }
}
I want to achieve the same functionality as above but use a more functional approach.
My initial thoughts are to transform the while loop into something like IntStream.range(0, ).... but the number of entries inside the zip file is not known.
Any ideas?

One way to eliminate the while loop is to use ZipFile.stream() (Predicate.not requires Java 11 or later):
try (ZipFile fs = new ZipFile(source.toFile())) {
    fs.stream()
      .filter(Predicate.not(ZipEntry::isDirectory))
      // etc
      .forEach(System.out::println);
}
You could also use the ZIP file system with Files.find to get a Stream of the ZIP entries (top-level files only) and, if required, apply additional filtering, mapping or other conversions to the stream instead of the forEach shown here:
try (FileSystem fs = FileSystems.newFileSystem(source)) {
    for (Path root : fs.getRootDirectories()) {
        try (Stream<Path> stream = Files.find(root, 1, (p,a) -> a.isRegularFile())) {
            stream.forEach(p -> copy(p, Path.of(target.toString(), p.toString())));
        }
    }
}
Unfortunately, the above still uses a normal loop over the root directories, and copy has to be a separate method in order to handle IOException:
private static void copy(Path from, Path to) {
    try {
        Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
    }
    catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
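Putting the ZipFile.stream() idea together with the path check from the question gives a loop-free unzip. This is a sketch only: StreamUnzip and extract are names invented for this example, and the target is normalized to an absolute path so that the startsWith check compares like with like.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class StreamUnzip {

    /** Unzips source into target using a stream pipeline instead of a while loop. */
    static void unzip(Path source, Path target) throws IOException {
        Path root = target.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(source.toFile())) {
            zip.stream()
               .filter(e -> !e.isDirectory())
               .forEach(e -> extract(zip, e, root));
        }
    }

    // Separate method so the checked IOException can be wrapped for forEach.
    private static void extract(ZipFile zip, ZipEntry entry, Path root) {
        Path out = root.resolve(entry.getName()).normalize();
        // Reject entries such as "../x" that would land outside the target.
        if (!out.startsWith(root)) {
            throw new UncheckedIOException(new IOException("Bad zip entry: " + entry.getName()));
        }
        try (InputStream in = zip.getInputStream(entry)) {
            Files.createDirectories(out.getParent());
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Callers that prefer a checked exception can catch UncheckedIOException around unzip and rethrow its cause.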

Java: Add files to zip-file recursively but without full path

I am trying to put files from a folder inside a zip file in the following structure:
Folder structure:
myFolder
|-file1.txt
|-file2.txt
|-folder172
  |-file817.txt
  |-file818.txt
...
Intended structure inside the zip file:
file1.txt
file2.txt
folder172
|-file817.txt
|-file818.txt
This is my code:
public static void writeZip(String path) throws IOException {
    FileOutputStream fos = new FileOutputStream(path + File.separator + "atest.zip");
    ZipOutputStream zos = new ZipOutputStream(fos);
    try {
        Files.walk(Paths.get(path)).filter(Files::isRegularFile).forEach((string) -> {
            try {
                addToZipFile(string.toString(), zos);
            } catch (IOException e) {
                // a lambda passed to forEach cannot throw a checked exception
                throw new UncheckedIOException(e);
            }
        });
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    zos.close();
    fos.close();
}
public static void addToZipFile(String fileName, ZipOutputStream zos) throws IOException {
    System.out.println("Writing '" + fileName + "' to zip file");
    File file = new File(fileName);
    FileInputStream fis = null;
    fis = new FileInputStream(file);
    // the entry name is the full path, which is what produces the unwanted structure
    ZipEntry zipEntry = new ZipEntry(fileName);
    zos.putNextEntry(zipEntry);
    byte[] bytes = new byte[1024];
    int length;
    while ((length = fis.read(bytes)) >= 0) {
        zos.write(bytes, 0, length);
    }
    zos.closeEntry();
    fis.close();
}
The problem is that when I call writeZip("/home/arthur/.grutil/");, I get the following structure in the zip file:
home
|-arthur
  |-.grutil
    |-file1.txt
    |-file2.txt
    |-folder172
      |-file817.txt
      |-file818.txt
...
How do I need to change my code to get the intended structure (as described above) rather than the structure with the full path '/home/arthur/.grutil/ ...'?

While this can be done with the older ZipOutputStream, I would recommend against it.
It is much more intuitive to think of a zip archive as a compressed filesystem inside a file than as a stream of bytes. For this reason, Java provides the ZipFileSystem.
So all you need to do is open the zip as a FileSystem and then copy the files across manually.
There are a couple of gotchas:
Only files are copied; directories need to be created.
The NIO API does not support operations such as relativize across different filesystems, because a Path belongs to the filesystem that created it, so you need to do this yourself.
Here are a couple of simple methods that do exactly that:
/**
 * This creates a Zip file at the location specified by zip
 * containing the full directory tree rooted at contents
 *
 * @param zip the zip file, this must not exist
 * @param contents the root of the directory tree to copy
 * @throws IOException, specific exceptions thrown for specific errors
 */
public static void createZip(final Path zip, final Path contents) throws IOException {
    if (Files.exists(zip)) {
        throw new FileAlreadyExistsException(zip.toString());
    }
    if (!Files.exists(contents)) {
        throw new FileNotFoundException("The location to zip must exist");
    }
    final Map<String, String> env = new HashMap<>();
    //creates a new Zip file rather than attempting to read an existing one
    env.put("create", "true");
    // locate file system by using the syntax
    // defined in java.net.JarURLConnection;
    // zip.toUri() yields a well-formed file: URI on every platform
    final URI uri = URI.create("jar:" + zip.toUri());
    try (final FileSystem zipFileSystem = FileSystems.newFileSystem(uri, env);
         final Stream<Path> files = Files.walk(contents)) {
        final Path root = zipFileSystem.getPath("/");
        files.forEach(file -> {
            try {
                copyToZip(root, contents, file);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
    }
}
/**
 * Copy a specific file/folder to the zip archive
 * If the file is a folder, create the folder. Otherwise copy the file
 *
 * @param root the root of the zip archive
 * @param contents the root of the directory tree being copied, for relativization
 * @param file the specific file/folder to copy
 */
private static void copyToZip(final Path root, final Path contents, final Path file) throws IOException {
    // relativize on the default filesystem, then resolve the result as a String inside the zip
    final Path to = root.resolve(contents.relativize(file).toString());
    if (Files.isDirectory(file)) {
        Files.createDirectories(to);
    } else {
        Files.copy(file, to);
    }
}
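For comparison, the same relativize step also fixes the entry names when staying with ZipOutputStream, as in the question. This is a minimal sketch, not the approach recommended above; RelativeZip and zipFolder are hypothetical names, and the sketch assumes the zip file is written outside the folder being zipped so that the archive does not try to include itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class RelativeZip {

    /** Zips every regular file under folder, naming entries relative to folder. */
    static void zipFolder(Path folder, Path zipFile) throws IOException {
        try (OutputStream os = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(os);
             Stream<Path> files = Files.walk(folder)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                // e.g. "folder172/file817.txt" instead of the full absolute path;
                // zip entry names always use forward slashes
                String name = folder.relativize(file).toString().replace('\\', '/');
                try {
                    zos.putNextEntry(new ZipEntry(name));
                    Files.copy(file, zos);
                    zos.closeEntry();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Intermediate directories are not written as separate entries here; most zip tools create them implicitly from the entry names.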

Extracting a zip file containing a jar file in Java

I want to extract a zip file which contains a jar file. The zip has a complex folder structure, and one of the folders contains a jar file. When I use the following code to extract the jar file, the program goes into an infinite loop while reading the jar and never recovers. It keeps writing the contents of the jar until the disk is full, even though the jar is only a few MB.
Please find the code snippet below:
// using a ZipInputStream to get the zipIn by passing the zipFile as FileInputStream
ZipEntry entry = zipIn.getNextEntry();
String fileName = entry.getName();
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileName));
byte[] bytesIn = new byte[(int) bufferSize];
while (zipIn.read(bytesIn) > 0) // This is the part where the loop does not end
{
    bos.write(bytesIn);
}
..
// flushing and closing the bos
Please let me know if there is any way to avoid this and get the jar file written to the required location.

Does this suit your needs?
public static void main(String[] args) {
    try {
        copyJarFromZip("G:\\Dateien\\Desktop\\Desktop.zip",
                "G:\\Dateien\\Desktop\\someJar.jar");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
public static void copyJarFromZip(final String zipPath, final String targetPath) throws IOException {
    try (ZipFile zipFile = new ZipFile(zipPath)) {
        for (final Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
            ZipEntry zipEntry = e.nextElement();
            // copy any entry whose name ends with .jar to the target path
            if (zipEntry.getName().endsWith(".jar")) {
                Files.copy(zipFile.getInputStream(zipEntry), Paths.get(targetPath),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
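If a manual copy loop is preferred over Files.copy, one issue visible in the question's snippet is that bos.write(bytesIn) writes the whole buffer regardless of how many bytes read() actually returned. Whether that alone explains the runaway output cannot be told from the snippet. A conventional loop looks roughly like the sketch below; StreamCopy is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        // read() returns -1 at end of stream; for a ZipInputStream that is
        // the end of the current entry.
        while ((read = in.read(buffer)) != -1) {
            // Write only the bytes that were actually read in this pass.
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

The method closes neither stream, so it can be called once per entry on the same ZipInputStream.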

How to copy file inside jar to outside the jar?

I want to copy a file from a jar. The file that I am copying is going to be copied outside the working directory. I have done some tests and all methods I try end up with 0 byte files.
EDIT: I want the copying of the file to be done via a program, not manually.

First of all, some of the answers posted before are entirely correct, but I want to give mine as well: sometimes we can't use open source libraries under the GPL, or we are too lazy to download the jar, or whatever your reason is. Here is a standalone solution.
The function below copies the resource to the folder that contains the jar file:
/**
 * Export a resource embedded into a Jar file to the local file path.
 *
 * @param resourceName ie.: "/SmartLibrary.dll"
 * @return The path to the exported resource
 * @throws Exception
 */
static public String ExportResource(String resourceName) throws Exception {
    // note that each / is a directory down in the "jar tree", with the jar being the root of the tree
    try (InputStream stream = ExecutingClass.class.getResourceAsStream(resourceName)) {
        if (stream == null) {
            throw new Exception("Cannot get resource \"" + resourceName + "\" from Jar file.");
        }
        String jarFolder = new File(ExecutingClass.class.getProtectionDomain().getCodeSource().getLocation().toURI().getPath()).getParentFile().getPath().replace('\\', '/');
        // try-with-resources closes both streams, and avoids a NullPointerException
        // in cleanup when one of them was never opened
        try (OutputStream resStreamOut = new FileOutputStream(jarFolder + resourceName)) {
            int readBytes;
            byte[] buffer = new byte[4096];
            while ((readBytes = stream.read(buffer)) > 0) {
                resStreamOut.write(buffer, 0, readBytes);
            }
        }
        return jarFolder + resourceName;
    }
}
Just change ExecutingClass to the name of your class, and call it like this:
String fullPath = ExportResource("/myresource.ext");
Edit for Java 7+ (for your convenience)
As answered by GOXR3PLUS and noted by Andy Thomas you can achieve this with:
Files.copy( InputStream in, Path target, CopyOption... options)
See GOXR3PLUS answer for more details

Given your comment about 0-byte files, I have to assume you're trying to do this programmatically, and, given your tags, that you're doing it in Java. If that's true, then just use Class.getResource() to get a URL pointing to the file in your JAR, then Apache Commons IO FileUtils.copyURLToFile() to copy it out to the file system. E.g.:
URL inputUrl = getClass().getResource("/absolute/path/of/source/in/jar/file");
File dest = new File("/path/to/destination/file");
FileUtils.copyURLToFile(inputUrl, dest);
Most likely, the problem with whatever code you have now is that you're (correctly) using a buffered output stream to write to the file but (incorrectly) failing to close it.
Oh, and you should edit your question to clarify exactly how you want to do this (programmatically, not, language, ...)

Faster way to do it with Java 7+, plus code to get the current directory:
/**
 * Copy a file from source to destination.
 *
 * @param source
 *            the source
 * @param destination
 *            the destination
 * @return true if succeeded, false if not
 */
public static boolean copy(InputStream source, String destination) {
    boolean success = true;
    System.out.println("Copying ->" + source + "\n\tto ->" + destination);
    try {
        Files.copy(source, Paths.get(destination), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException ex) {
        // logger is a java.util.logging.Logger field of the enclosing class
        logger.log(Level.WARNING, "", ex);
        success = false;
    }
    return success;
}
Testing it (icon.png is an image inside the package image of the application):
copy(getClass().getResourceAsStream("/image/icon.png"), getBasePathForClass(Main.class) + "icon.png");
About getBasePathForClass(Main.class): see the answer I added to "Getting the Current Working Directory in Java".

Java 8 comes with some useful classes and methods to deal with this (FileSystem has actually been there since 1.7). As somebody already mentioned, a JAR is basically a ZIP file, so you could use
// env is a Map<String, ?> of file system properties; it may be empty
final URI jarFileUri = URI.create("jar:file:" + file.toURI().getPath());
final FileSystem fs = FileSystems.newFileSystem(jarFileUri, env);
(See Zip File)
Then you can use one of the convenient methods like:
fs.getPath("filename");
Then you can use the Files class:
try (final Stream<Path> sources = Files.walk(from)) {
    sources.forEach(src -> {
        // relativize inside the jar, then resolve as a String on the target filesystem
        final Path dest = to.resolve(from.relativize(src).toString());
        try {
            if (Files.isDirectory(src)) {
                if (Files.notExists(dest)) {
                    log.trace("Creating directory {}", dest);
                    Files.createDirectories(dest);
                }
            } else {
                log.trace("Extracting file {} to {}", src, dest);
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to unzip file.", e);
        }
    });
}
Note: I used this to unpack JAR files for testing.

Robust solution:
public static void copyResource(String res, String dest, Class<?> c) throws IOException {
    // try-with-resources closes the resource stream after the copy
    try (InputStream src = c.getResourceAsStream(res)) {
        if (src == null) {
            throw new FileNotFoundException(res + " (resource not found)");
        }
        Files.copy(src, Paths.get(dest), StandardCopyOption.REPLACE_EXISTING);
    }
}
You can use it like this:
File tempFileGdalZip = File.createTempFile("temp_gdal", ".zip");
copyResource("/gdal.zip", tempFileGdalZip.getAbsolutePath(), this.getClass());

Use the JarInputStream class:
// assuming you already have an InputStream to the jar file..
JarInputStream jis = new JarInputStream( is );
// get the first entry (getNextJarEntry() returns a JarEntry; getNextEntry() returns a ZipEntry)
JarEntry entry = jis.getNextJarEntry();
// we will loop through all the entries in the jar file
while ( entry != null ) {
    // test the entry.getName() against whatever you are looking for, etc
    if ( matches ) {
        // read from the JarInputStream until the read method returns -1
        // ...
        // do what ever you want with the read output
        // ...
        // if you only care about one file, break here
    }
    // get the next entry
    entry = jis.getNextJarEntry();
}
jis.close();
See also: JarEntry

To copy a file from inside your jar to the outside, use the following approach:
1. Get an InputStream for the file inside your jar using getResourceAsStream().
2. Open the target file using a FileOutputStream.
3. Copy bytes from the input stream to the output stream.
4. Close the streams to prevent resource leaks.
Example code, which also takes a flag that controls whether an existing file is replaced:
public File saveResource(String name) throws IOException {
    return saveResource(name, true);
}
public File saveResource(String name, boolean replace) throws IOException {
    return saveResource(new File("."), name, replace);
}
public File saveResource(File outputDirectory, String name) throws IOException {
    return saveResource(outputDirectory, name, true);
}
public File saveResource(File outputDirectory, String name, boolean replace)
        throws IOException {
    File out = new File(outputDirectory, name);
    if (!replace && out.exists())
        return out;
    // Step 1:
    InputStream resource = this.getClass().getResourceAsStream(name);
    if (resource == null)
        throw new FileNotFoundException(name + " (resource not found)");
    // Step 2 and automatic step 4
    try(InputStream in = resource;
        OutputStream writer = new BufferedOutputStream(
                new FileOutputStream(out))) {
        // Step 3
        byte[] buffer = new byte[1024 * 4];
        int length;
        while((length = in.read(buffer)) >= 0) {
            writer.write(buffer, 0, length);
        }
    }
    return out;
}

A jar is just a zip file. Unzip it (using whatever method you're comfortable with) and copy the file normally.

${JAVA_HOME}/bin/jar -cvf /path/to.jar
