How do you uncompress a split volume zip in Java? - java

I need to reassemble a 100-part zip file and extract the content. I tried simply concatenating the zip volumes together in an input stream but that does not work. Any suggestions would be appreciated.
Thanks.

Here is the code you can start from. It extracts a single file entry from the multivolume zip archive:
package org.test.zip;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
public class Main {
public static void main(String[] args) throws IOException {
ZipInputStream is = new ZipInputStream(new SequenceInputStream(Collections.enumeration(
Arrays.asList(new FileInputStream("test.zip.001"), new FileInputStream("test.zip.002"), new FileInputStream("test.zip.003")))));
try {
for(ZipEntry entry = null; (entry = is.getNextEntry()) != null; ) {
OutputStream os = new BufferedOutputStream(new FileOutputStream(entry.getName()));
try {
final int bufferSize = 1024;
byte[] buffer = new byte[bufferSize];
for(int readBytes = -1; (readBytes = is.read(buffer, 0, bufferSize)) > -1; ) {
os.write(buffer, 0, readBytes);
}
os.flush();
} finally {
os.close();
}
}
} finally {
is.close();
}
}
}

Just a note to make it more dynamic -- 100% based on mijer code below.
private void CombineFiles (String[] files) throws FileNotFoundException, IOException {
Vector<FileInputStream> v = new Vector<FileInputStream>(files.length);
for (int x = 0; x < files.length; x++)
v.add(new FileInputStream(inputDirectory + files[x]));
Enumeration<FileInputStream> e = v.elements();
SequenceInputStream sequenceInputStream = new SequenceInputStream(e);
ZipInputStream is = new ZipInputStream(sequenceInputStream);
try {
for (ZipEntry entry = null; (entry = is.getNextEntry()) != null;) {
OutputStream os = new BufferedOutputStream(new FileOutputStream(entry.getName()));
try {
final int bufferSize = 1024;
byte[] buffer = new byte[bufferSize];
for (int readBytes = -1; (readBytes = is.read(buffer, 0, bufferSize)) > -1;) {
os.write(buffer, 0, readBytes);
}
os.flush();
} finally {
os.close();
}
}
} finally {
is.close();
}
}
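With 100 parts it may be preferable not to open every FileInputStream up front. The following is a minimal sketch, not part of the original answers: the class name LazyParts and the method concat are hypothetical, and it assumes the caller passes the part files already in the right order. Each part is opened only when SequenceInputStream asks for the next element.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

public class LazyParts {
    /** Concatenates the given part files, opening each one only when it is needed. */
    static InputStream concat(List<Path> partsInOrder) {
        final Iterator<Path> it = partsInOrder.iterator();
        return new SequenceInputStream(new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return Files.newInputStream(it.next());
                } catch (IOException e) {
                    // Enumeration cannot throw checked exceptions, so wrap it
                    throw new UncheckedIOException(e);
                }
            }
        });
    }
}
```

The returned stream can be wrapped in a ZipInputStream exactly as in the code above; SequenceInputStream closes each part once it has been read to the end.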

Just concatenating the segment data did not work for me. In this case the segments had been created with the Linux command-line zip (InfoZip version 3.0):
> zip -s 5m data.zip -r data/
Segment files named data.z01, data.z02, ..., data.zip were created.
The first segment data.z01 contained the spanning signature 0x08074b50, as described in the Zip File Format Specification by PKWARE. The presence of these 4 bytes made Java ZipInputStream ignore all entries in the archive. The central directory in the last segment also contained extra segment information compared to a non-split archive, but that did not cause ZipInputStream any problems.
All I had to do was to skip the spanning signature. The following code will extract entries both from an archive that has been segmented with zip -s and from a zip file that has been split by the Linux split command, like this: split -d -b 5M data.zip data.zip. (the trailing dot is part of the output prefix). The code is based on szhem's.
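import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
public class ZipCat {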
private final static byte[] SPANNING_SIGNATURE = {0x50, 0x4b, 0x07, 0x08};
public static void main(String[] args) throws IOException {
List<InputStream> asList = new ArrayList<>();
byte[] buf4 = new byte[4];
PushbackInputStream pis = new PushbackInputStream(new FileInputStream(args[0]), buf4.length);
asList.add(pis);
if (pis.read(buf4) != buf4.length) {
throw new IOException(args[0] + " is too small for a zip file/segment");
}
if (!Arrays.equals(buf4, SPANNING_SIGNATURE)) {
pis.unread(buf4, 0, buf4.length);
}
for (int i = 1; i < args.length; i++) {
asList.add(new FileInputStream(args[i]));
}
try (ZipInputStream is = new ZipInputStream(new SequenceInputStream(Collections.enumeration(asList)))) {
for (ZipEntry entry = null; (entry = is.getNextEntry()) != null;) {
if (entry.isDirectory()) {
new File(entry.getName()).mkdirs();
} else {
try (OutputStream os = new BufferedOutputStream(new FileOutputStream(entry.getName()))) {
byte[] buffer = new byte[1024];
int count = -1;
while ((count = is.read(buffer)) != -1) {
os.write(buffer, 0, count);
}
}
}
}
}
}
}
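The signature check can also be isolated into a small helper that works on any InputStream, which makes it easy to try out in memory. This is a sketch under assumptions: the class name SpanningSignature and the method skipIfPresent are hypothetical, and it only handles the 4-byte marker described above, not other differences a split archive may have.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

public class SpanningSignature {
    // 0x08074b50 in the little-endian byte order in which it appears on disk
    private static final byte[] SIGNATURE = {0x50, 0x4b, 0x07, 0x08};

    /** Returns a stream positioned after the spanning signature, if the input starts with one. */
    static InputStream skipIfPresent(InputStream first) throws IOException {
        PushbackInputStream in = new PushbackInputStream(first, SIGNATURE.length);
        byte[] head = new byte[SIGNATURE.length];
        int n = 0;
        // a single read() may return fewer than 4 bytes, so loop until full or EOF
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (n < head.length || !Arrays.equals(head, SIGNATURE)) {
            // not a spanning signature: put back whatever was consumed
            in.unread(head, 0, n);
        }
        return in;
    }
}
```

The first segment would be passed through skipIfPresent and the result used as the first element of the SequenceInputStream.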

Related

java.io.UTFDataFormatException while reading file entry name

I'm trying to "pack" several files (previously inside a jar archive) into another single non-jar file by using DataInputStream / DataOutputStream.
The idea was:
First int = number of entries
First UTF is the first entry name
Second Int is entry byte array length (entry size)
Then repeat for every entry.
The code:
public static void main(String[] args) throws Throwable {
test();
System.out.println("========================================================================================");
final DataInputStream dataInputStream = new DataInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut")));
for (int int1 = dataInputStream.readInt(), i = 0; i < int1; ++i) {
final String utf = dataInputStream.readUTF();
System.out.println("Entry name: " + utf);
final byte[] array = new byte[dataInputStream.readInt()];
for (int j = 0; j < array.length; ++j) {
array[j] = dataInputStream.readByte();
}
System.out.println("Entry bytes length: " + array.length);
}
}
Unpacking original & packing to new one:
private static void test() throws Throwable {
JarInputStream stream = new JarInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJar.jar")));
JarInputStream stream1 = new JarInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJar.jar")));
final byte[] buffer = new byte[2048];
final DataOutputStream outputStream = new DataOutputStream(new FileOutputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut")));
int entryCount = 0;
for (ZipEntry entry; (entry = stream.getNextJarEntry()) != null; ) {
entryCount++;
}
outputStream.writeInt(entryCount);
for (JarEntry entry; (entry = stream1.getNextJarEntry()) != null; ) {
int entryRealSize = stream1.read(buffer);
if (!(entryRealSize == -1)) {
System.out.println("Writing: " + entry.getName() + " Length: " + entryRealSize);
outputStream.writeUTF(entry.getName());
outputStream.writeInt(entryRealSize);
for (int len = stream1.read(buffer); len != -1; len = stream1.read(buffer)) {
outputStream.write(buffer, 0, len);
}
}
}
outputStream.flush();
outputStream.close();
}
Apparently I'm able to unpack the first entry without any problems, but not the second one and the others:
Entry name: META-INF/services/org.jd.gui.spi.ContainerFactory
Entry bytes length: 434
Exception in thread "main" java.io.UTFDataFormatException: malformed input around byte 279
at java.io.DataInputStream.readUTF(DataInputStream.java:656)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at it.princekin.esercizio.Bootstrap.main(Bootstrap.java:29)
Disconnected from the target VM, address: '127.0.0.1:54384', transport: 'socket'
Process finished with exit code 1
Does anyone know how to fix this? Why is this working for the first entry but not the others?
My take on this is that the jar file (which in fact is a zip file) has a Central Directory which is only read with the ZipFile (or JarFile) class.
The Central Directory contains some data about the entries such as the size.
I think the ZipInputStream will not read the Central Directory and thus the ZipEntry will not contain the size (returning -1 as it is unknown) whereas reading ZipEntry from ZipFile class will.
So if you first read the size of each entry using a ZipFile and store that in a map, you can easily get it when reading the data with the ZipInputStream.
This page includes some good examples as well.
So my version of your code would be:
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
public class JarRepacker {
public static void main(String[] args) throws Throwable {
JarRepacker repacker = new JarRepacker();
repacker.repackJarToMyFileFormat("commons-cli-1.3.1.jar", "randomJarOut.bin");
repacker.readMyFileFormat("randomJarOut.bin");
}
private void repackJarToMyFileFormat(String inputJar, String outputFile) throws Throwable {
int entryCount;
Map<String, Integer> sizeMap = new HashMap<>();
try (ZipFile zipFile = new ZipFile(inputJar)) {
entryCount = zipFile.size();
zipFile.entries().asIterator().forEachRemaining(e -> sizeMap.put(e.getName(), (int) e.getSize()));
}
try (final DataOutputStream outputStream = new DataOutputStream(new FileOutputStream(outputFile))) {
outputStream.writeInt(entryCount);
try (ZipInputStream stream = new ZipInputStream(new BufferedInputStream(new FileInputStream(inputJar)))) {
ZipEntry entry;
final byte[] buffer = new byte[2048];
while ((entry = stream.getNextEntry()) != null) {
final String name = entry.getName();
outputStream.writeUTF(name);
final Integer size = sizeMap.get(name);
outputStream.writeInt(size);
//System.out.println("Writing: " + name + " Size: " + size);
int len;
while ((len = stream.read(buffer)) > 0) {
outputStream.write(buffer, 0, len);
}
}
}
outputStream.flush();
}
}
private void readMyFileFormat(String fileToRead) throws IOException {
try (DataInputStream dataInputStream
= new DataInputStream(new BufferedInputStream(new FileInputStream(fileToRead)))) {
int entries = dataInputStream.readInt();
System.out.println("Entries in file: " + entries);
for (int i = 1; i <= entries; i++) {
final String name = dataInputStream.readUTF();
final int size = dataInputStream.readInt();
System.out.printf("[%3d] Reading: %s of size: %d%n", i, name, size);
final byte[] array = new byte[size];
for (int j = 0; j < array.length; ++j) {
array[j] = dataInputStream.readByte();
}
// Still need to do something with this array...
}
}
}
}
The problem probably lies in mixing non-reciprocal read/write methods:
The writer method writes with outputStream.writeInt(entryCount) and the main method reads with dataInputStream.readInt(). That is OK.
The writer method writes with outputStream.writeUTF(entry.getName()) and the main method reads with dataInputStream.readUTF(). That is OK.
The writer method writes with outputStream.writeInt(entryRealSize) and the main method reads with dataInputStream.readInt(). That is OK.
The writer method writes with outputStream.write(buffer, 0, len) and the main method reads with dataInputStream.readByte() several times. NOT RECIPROCAL (and slow).
If you write an array of bytes with write(buffer, offset, len), read it back with readFully(buffer, offset, len): write(buffer, offset, len) writes exactly len raw bytes onto the output stream, and readFully blocks until exactly that many have been read. Note that readByte() also consumes one raw byte per call (DataOutputStream adds no per-byte metadata), so the byte-by-byte loop is inefficient rather than the direct cause of the exception; that comes from the writer bug described below.
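A minimal sketch of a matched write/read pair for one length-prefixed record, done entirely in memory. The class and method names (LengthPrefixed, writeRecord, readPayload) are hypothetical; the record layout (UTF name, int length, raw bytes) is the one described in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthPrefixed {
    /** Writes one record: name, payload length, then exactly that many payload bytes. */
    static byte[] writeRecord(String name, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(name);
            out.writeInt(payload.length);
            out.write(payload, 0, payload.length);
        }
        return bos.toByteArray();
    }

    /** Reads the record back; readFully keeps reading until the whole array is filled. */
    static byte[] readPayload(byte[] record) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            in.readUTF(); // entry name, ignored in this sketch
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        }
    }
}
```

The key property is that the int written before the payload is the exact number of bytes that follow, so the reader always ends up positioned at the start of the next record.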
Bugs in the writer method
There is also a major bug in the writer method: it invokes stream1.read(buffer) repeatedly but uses the results inconsistently. The length returned by the first read is written as the entry size and that first chunk of data is then discarded, while the bytes that follow come from the later reads. The declared size and the payload therefore disagree, and the reader loses its place in the stream after the first entry.
If you need to know the input file size before writing it in the output stream, you have two choices:
Either choose a buffer large enough (like 204800) to hold the whole file, keeping in mind that a single read() call is not guaranteed to fill the buffer even then.
Or separate the read and write steps: first a method that reads the whole file into memory (a byte[], for example), and then another method that writes the byte[] onto the output stream.
Full fixed solution
I've fixed your program, with specific, decoupled methods for each task. The process consists of parsing the input file into an in-memory model, writing it to an intermediate file according to your custom format, and then reading it back.
public static void main(String[] args)
throws Throwable
{
File inputJarFile=new File(args[0]);
File intermediateFile=new File(args[1]);
List<FileData> fileDataEntries=parse(inputJarFile);
write(fileDataEntries, intermediateFile);
read(intermediateFile);
}
public static List<FileData> parse(File inputJarFile)
throws IOException
{
List<FileData> list=new ArrayList<>();
try (JarInputStream stream=new JarInputStream(new FileInputStream(inputJarFile)))
{
for (ZipEntry entry; (entry=stream.getNextJarEntry()) != null;)
{
byte[] data=readAllBytes(stream);
if (data.length > 0)
{
list.add(new FileData(entry.getName(), data));
}
stream.closeEntry();
}
}
return list;
}
public static void write(List<FileData> fileDataEntries, File output)
throws Throwable
{
try (DataOutputStream outputStream=new DataOutputStream(new FileOutputStream(output)))
{
int entryCount=fileDataEntries.size();
outputStream.writeInt(entryCount);
for (FileData fileData : fileDataEntries)
{
int entryRealSize=fileData.getData().length;
{
System.out.println("Writing: " + fileData.getName() + " Length: " + entryRealSize);
outputStream.writeUTF(fileData.getName());
outputStream.writeInt(entryRealSize);
outputStream.write(fileData.getData());
}
}
outputStream.flush();
}
}
public static void read(File intermediateFile)
throws IOException
{
try (DataInputStream dataInputStream=new DataInputStream(new FileInputStream(intermediateFile)))
{
for (int entryCount=dataInputStream.readInt(), i=0; i < entryCount; i++)
{
String utf=dataInputStream.readUTF();
int entrySize=dataInputStream.readInt();
System.out.println("Entry name: " + utf + " size: " + entrySize);
byte[] data=readFixedLengthBuffer(dataInputStream, entrySize);
System.out.println("Entry bytes length: " + data.length);
}
}
}
private static byte[] readAllBytes(InputStream input)
throws IOException
{
byte[] buffer=new byte[4096];
byte[] total=new byte[0];
int len;
do
{
len=input.read(buffer);
if (len > 0)
{
byte[] total0=total;
total=new byte[total0.length + len];
System.arraycopy(total0, 0, total, 0, total0.length);
System.arraycopy(buffer, 0, total, total0.length, len);
}
}
while (len >= 0);
return total;
}
private static byte[] readFixedLengthBuffer(InputStream input, int size)
throws IOException
{
byte[] buffer=new byte[size];
int pos=0;
int len;
do
{
len=input.read(buffer, pos, size - pos);
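if (len < 0)
{
// without this check a truncated file would make the loop spin forever
throw new java.io.EOFException("Unexpected end of stream after " + pos + " of " + size + " bytes");
}
pos+=len;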
}
while (pos < size);
return buffer;
}
private static class FileData
{
private final String name;
private final byte[] data;
public FileData(String name, byte[] data)
{
super();
this.name=name;
this.data=data;
}
public String getName()
{
return this.name;
}
public byte[] getData()
{
return this.data;
}
}

Create a directory set of files to multiple zips?

I want to convert a directory that is too large for a single archive into multiple zip files. Right now I am able to convert all files in the directory to one zip file using the code below.
package bulkimport;
import java.io.*;
import java.util.zip.*;
public class Tzip {
static final int BUFFER = 2048;
public static void main (String argv[]) {
try {
BufferedInputStream origin = null;
FileOutputStream dest = new
FileOutputStream("D:/abc/file.zip");
ZipOutputStream out = new ZipOutputStream(new
BufferedOutputStream(dest));
//out.setMethod(ZipOutputStream.DEFLATED);
byte data[] = new byte[BUFFER];
// get a list of files from current directory
File f = new File("D://current//");
String files[] = f.list();
for (int i=0; i<files.length; i++) {
System.out.println("Adding: "+files[i]);
FileInputStream fi = new
FileInputStream("D://current//"+files[i]);
origin = new
BufferedInputStream(fi, BUFFER);
ZipEntry entry = new ZipEntry(files[i]);
out.putNextEntry(entry);
int count;
while((count = origin.read(data, 0,
BUFFER)) != -1) {
out.write(data, 0, count);
}
origin.close();
}
out.close();
} catch(Exception e) {
e.printStackTrace();
}
}
}
What do I need to change in the code above so that it creates multiple zip files based on a condition (i.e. with a threshold of 50 MB, it creates multiple zip files, each holding about 50 MB of compressed data)?
Any help is appreciated.
Thanks
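One possible approach, sketched under stated assumptions: count the compressed bytes that ZipOutputStream emits and start a new, independent zip file once the count reaches the threshold. The names RollingZip, CountingOutputStream and zipInChunks are hypothetical. The sketch only looks at regular files directly inside the source directory, checks the size after each complete entry (so an archive can overshoot the threshold by up to one file), and does not count the central directory written when each archive is closed.

```java
import java.io.BufferedOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class RollingZip {
    /** Counts the bytes that pass through, i.e. the compressed size written so far. */
    static final class CountingOutputStream extends FilterOutputStream {
        long count;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    /** Writes baseName.1.zip, baseName.2.zip, ... into targetDir and returns their paths. */
    static List<Path> zipInChunks(Path sourceDir, Path targetDir, String baseName,
                                  long thresholdBytes) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(sourceDir)) {
            for (Path p : ds) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);

        List<Path> zips = new ArrayList<>();
        int next = 0;
        while (next < files.size()) {
            Path zip = targetDir.resolve(baseName + "." + (zips.size() + 1) + ".zip");
            CountingOutputStream counter =
                    new CountingOutputStream(new BufferedOutputStream(Files.newOutputStream(zip)));
            try (ZipOutputStream out = new ZipOutputStream(counter)) {
                do {
                    Path file = files.get(next++);
                    out.putNextEntry(new ZipEntry(file.getFileName().toString()));
                    Files.copy(file, out);
                    out.closeEntry(); // flushes the compressed data for this entry into the counter
                } while (next < files.size() && counter.count < thresholdBytes);
            }
            zips.add(zip);
        }
        return zips;
    }
}
```

For the 50 MB case the call would look like zipInChunks(Paths.get("D:/current"), Paths.get("D:/abc"), "file", 50L * 1024 * 1024). Each resulting archive is a complete zip on its own, unlike the split volumes discussed in the first question.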

Extracting SFX 7-Zip

I want to extract two specific files from a .zip file. I tried the following library:
ZipFile zipFile = new ZipFile("myZip.zip");
Result:
Exception in thread "main" java.util.zip.ZipException: error in opening zip file
I also tried:
public void extract(String targetFileName) throws IOException
{
OutputStream outputStream = new FileOutputStream("targetFile.foo");
FileInputStream fileInputStream = new FileInputStream("myZip.zip");
ZipInputStream zipInputStream = new ZipInputStream(new BufferedInputStream(fileInputStream));
ZipEntry zipEntry;
while ((zipEntry = zipInputStream.getNextEntry()) != null)
{
if (zipEntry.getName().equals("targetFile.foo"))
{
byte[] buffer = new byte[8192];
int length;
while ((length = zipInputStream.read(buffer)) != -1)
{
outputStream.write(buffer, 0, length);
}
outputStream.close();
break;
}
}
}
Result:
No exception, but an empty targetFile.foo file.
Note that the .zip file is of type SFX 7-zip and initially had the .exe extensions so that may be the reason for the failure.
As noted in the comments, extracting an SFX 7-Zip file is basically not supported by your library. But you can do it with the Commons Compress and XZ libraries together with a quick "hack":
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
...
protected File un7zSFXFile(File file, String password)
{
SevenZFile sevenZFile = null;
File tempFile = new File("/tmp/" + file.getName() + ".temp");
try
{
FileInputStream in = new FileInputStream(file);
/**
* Yes this is Voodoo Code:
* the first 205824 bytes are skipped, as these are basically the 7z-sfx-runnable.dll
* common-compress does fail if this information is not cut away
* ATTENTION: the amount of bytes may vary depending of the 7z Version used!
*/
in.skip(205824);
// EndOfVoodoCode
tempFile.getParentFile().mkdirs();
tempFile.createNewFile();
FileOutputStream temp = new FileOutputStream(tempFile);
byte[] buffer = new byte[1024];
int length;
while((length = in.read(buffer)) > 0)
{
temp.write(buffer, 0, length);
}
temp.close();
in.close();
LOGGER.info("prepared exefile for un7zing");
if (password!=null) {
sevenZFile = new SevenZFile(tempFile, password.toCharArray());
} else {
sevenZFile = new SevenZFile(tempFile);
}
SevenZArchiveEntry entry;
boolean first = true;// accept only files with
while((entry = sevenZFile.getNextEntry()) != null)
{
if(entry.isDirectory())
{
continue;
}
File curfile = new File(file.getParentFile(), entry.getName());
File parent = curfile.getParentFile();
if(!parent.exists())
{
parent.mkdirs();
}
FileOutputStream out = new FileOutputStream(curfile);
byte[] content = new byte[(int) entry.getSize()];
sevenZFile.read(content, 0, content.length);
out.write(content);
out.close();
}
}
catch(Exception e)
{
throw e;
}
finally
{
try
{
tempFile.delete();
sevenZFile.close();
}
catch(Exception e)
{
LOGGER.trace("error on closing stream: " + sevenZFile.getDefaultName(), e);
}
}
}
Please note that this simple solution only unpacks into the directory in which the SFX file is placed!
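Because the hard-coded offset 205824 depends on the 7-Zip version, one alternative is to search for the 7z signature bytes (0x37 0x7A 0xBC 0xAF 0x27 0x1C) and skip up to that position. This is only a sketch: the class SevenZipOffset and the method findSignature are hypothetical names, and it assumes the first occurrence of the six bytes marks the start of the archive, which may not hold if the SFX stub happens to contain the same byte sequence.

```java
import java.io.IOException;
import java.io.InputStream;

public class SevenZipOffset {
    // magic bytes at the start of every 7z archive
    private static final byte[] SIGNATURE =
            {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    /** Returns the offset of the first 7z signature in the stream, or -1 if none is found. */
    static long findSignature(InputStream in) throws IOException {
        int matched = 0;
        long pos = 0;
        int b;
        while ((b = in.read()) != -1) {
            pos++;
            if (b == (SIGNATURE[matched] & 0xFF)) {
                if (++matched == SIGNATURE.length) {
                    return pos - SIGNATURE.length;
                }
            } else {
                // all signature bytes are distinct, so the only possible
                // partial match after a mismatch is a fresh first byte
                matched = (b == (SIGNATURE[0] & 0xFF)) ? 1 : 0;
            }
        }
        return -1;
    }
}
```

The stream passed in should be buffered (for example a BufferedInputStream over the FileInputStream), since the search reads one byte at a time. The returned offset would then replace the constant passed to in.skip(...).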

Splitting and Merging large files (size in GB) in Java

Suppose,
I am splitting 2590400 KB (approx 2.5 GB) file in 30 parts.
It will produce 30 files with size of 86347 KB.
Which seems correct, 2590400/30 = 86346.66666667
Now if I merge all 30 parts again, it produces a file of 3453873 KB, which should be 2590410 KB.
Can anyone explain why there is this difference? I am using the code below to split and merge files.
SplitFile.java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
/**
* @author vishal.zanzrukia
*
*/
public class SplitFile {
public static final String INPUT_FILE = "D:\\me\\projects\\input\\file\\path.txt";
public static final int NUMBER_OF_OUTPUT_FILES = 30;
public static final String FILE_SUFFIX = ".txt";
/**
* split file
*
* @throws Exception
*/
static void splitFile() throws Exception{
File inputFile = new File(INPUT_FILE + "_Splits");
inputFile.mkdir();
RandomAccessFile raf = new RandomAccessFile(INPUT_FILE, "r");
long sourceSize = raf.length();
long bytesPerSplit = sourceSize / NUMBER_OF_OUTPUT_FILES;
long remainingBytes = sourceSize % NUMBER_OF_OUTPUT_FILES;
int maxReadBufferSize = 8 * 1024; // 8KB
for (int destIx = 1; destIx <= NUMBER_OF_OUTPUT_FILES; destIx++) {
BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(INPUT_FILE + "_Splits\\split." + destIx + FILE_SUFFIX));
if (bytesPerSplit > maxReadBufferSize) {
long numReads = bytesPerSplit / maxReadBufferSize;
long numRemainingRead = bytesPerSplit % maxReadBufferSize;
for (int i = 0; i < numReads; i++) {
readWrite(raf, bw, maxReadBufferSize);
}
if (numRemainingRead > 0) {
readWrite(raf, bw, numRemainingRead);
}
} else {
readWrite(raf, bw, bytesPerSplit);
}
bw.close();
}
if (remainingBytes > 0) {
BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + NUMBER_OF_OUTPUT_FILES + 1));
readWrite(raf, bw, remainingBytes);
bw.close();
}
raf.close();
}
/**
* join file
*
* @throws Exception
*/
static void joinFiles() throws Exception{
int maxReadBufferSize = 8 * 1024;
BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(INPUT_FILE + "_Splits\\fullJoin" + FILE_SUFFIX));
File inputFileDir = new File(INPUT_FILE + "_Splits");
RandomAccessFile raf = null;
if(inputFileDir.isDirectory()){
for(File file : inputFileDir.listFiles()){
raf = new RandomAccessFile(file, "r");
long numReads = raf.length() / maxReadBufferSize;
long numRemainingRead = raf.length() % maxReadBufferSize;
for (int i = 0; i < numReads; i++) {
readWrite(raf, bw, maxReadBufferSize);
}
if (numRemainingRead > 0) {
readWrite(raf, bw, numRemainingRead);
}
raf.close();
}
}
bw.close();
}
public static void mergeFiles() {
File[] files = new File[NUMBER_OF_OUTPUT_FILES];
for(int i=1;i<=NUMBER_OF_OUTPUT_FILES;i++){
files[i-1] = new File(INPUT_FILE + "_Splits\\split."+i+FILE_SUFFIX);
}
String mergedFilePath = INPUT_FILE + "_Splits\\fullJoin" + FILE_SUFFIX;
File mergedFile = new File(mergedFilePath);
mergeFiles(files, mergedFile);
}
public static void mergeFiles(File[] files, File mergedFile) {
FileWriter fstream = null;
BufferedWriter out = null;
try {
fstream = new FileWriter(mergedFile, true);
out = new BufferedWriter(fstream);
} catch (IOException e1) {
e1.printStackTrace();
}
for (File f : files) {
System.out.println("merging: " + f.getName());
FileInputStream fis;
try {
fis = new FileInputStream(f);
BufferedReader in = new BufferedReader(new InputStreamReader(fis));
String aLine;
while ((aLine = in.readLine()) != null) {
out.write(aLine);
out.newLine();
}
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
try {
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws Exception {
// splitFile();
mergeFiles();
}
static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
byte[] buf = new byte[(int) numBytes];
int val = raf.read(buf);
if (val != -1) {
bw.write(buf);
}
}
}
Use your joinFiles method: don't read a file line by line using a Reader if you want to keep it exactly as it was, because line endings may differ by platform.
Instead, read the parts as binary files using an InputStream or RandomAccessFile and write them using an OutputStream.
The only problem in your joinFiles method is that it uses File.listFiles(), which makes no guarantees about the order in which the files are returned.
I combined your mergeFiles() code with joinFiles() to make this work (remember to invoke joinFiles() instead of mergeFiles() from your main method):
static void joinFiles(File[] files) throws Exception {
int maxReadBufferSize = 8 * 1024;
BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream(INPUT_FILE + "_Splits\\fullJoin"
+ FILE_SUFFIX));
RandomAccessFile raf = null;
for (File file : files) {
raf = new RandomAccessFile(file, "r");
long numReads = raf.length() / maxReadBufferSize;
long numRemainingRead = raf.length() % maxReadBufferSize;
for (int i = 0; i < numReads; i++) {
readWrite(raf, bw, maxReadBufferSize);
}
if (numRemainingRead > 0) {
readWrite(raf, bw, numRemainingRead);
}
raf.close();
}
bw.close();
}
public static void joinFiles() throws Exception {
File[] files = new File[NUMBER_OF_OUTPUT_FILES];
for (int i = 1; i <= NUMBER_OF_OUTPUT_FILES; i++) {
files[i - 1] = new File(INPUT_FILE + "_Splits\\split." + i + FILE_SUFFIX);
}
joinFiles(files);
}
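If the parts have to be discovered with listFiles() rather than rebuilt from a known count, they need to be filtered and sorted by part number, because a plain name sort would put split.10.txt before split.2.txt. A minimal sketch, assuming the split.N.txt naming from the question; the class SplitOrder and its methods are hypothetical names.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitOrder {
    // matches the naming used in the question: split.1.txt, split.2.txt, ...
    private static final Pattern PART = Pattern.compile("split\\.(\\d+)\\.txt");

    /** Extracts N from "split.N.txt". */
    static int partNumber(File f) {
        Matcher m = PART.matcher(f.getName());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a part file: " + f.getName());
        }
        return Integer.parseInt(m.group(1));
    }

    /** Keeps only part files (dropping e.g. fullJoin.txt) and sorts them numerically. */
    static File[] partsInOrder(File[] candidates) {
        return Arrays.stream(candidates)
                .filter(f -> PART.matcher(f.getName()).matches())
                .sorted(Comparator.comparingInt(SplitOrder::partNumber))
                .toArray(File[]::new);
    }
}
```

The filter matters here because the joined output is written into the same directory as the parts, so a second run of listFiles() would otherwise pick up the previous fullJoin file as input.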
The problem is the very last line of code:
static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
byte[] buf = new byte[(int) numBytes];
int val = raf.read(buf);
if (val != -1) {
bw.write(buf);
}
}
When you write, you write back numBytes of data, but the read function has usefully returned:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of this file has been reached.
Therefore, your fix is to use a different write:
bw.write(buf, 0, val);
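A slightly more defensive variant of readWrite, as a sketch: it loops until the requested number of bytes has been copied or the file ends, and only ever writes the bytes that read actually returned. The class name SafeReadWrite is hypothetical, and the parameter type is widened to OutputStream so it can be tried against an in-memory stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class SafeReadWrite {
    /** Copies up to numBytes from raf to out, stopping early at end of file. */
    static void readWrite(RandomAccessFile raf, OutputStream out, long numBytes) throws IOException {
        byte[] buf = new byte[(int) Math.min(numBytes, 8 * 1024)];
        long remaining = numBytes;
        while (remaining > 0) {
            int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                break; // end of file reached: nothing more to copy
            }
            out.write(buf, 0, n); // write only what was actually read
            remaining -= n;
        }
    }
}
```

With this version a short read at the end of a part no longer pads the output with stale buffer contents.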

Modifying a text file in a ZIP archive in Java

My use case requires me to open a txt file, say abc.txt which is inside a zip archive which contains key-value pairs in the form
key1=value1
key2=value2
.. and so on where each key-value pair is in a new line.
I have to change one value corresponding to a certain key and put the text file back in a new copy of the archive. How do I do this in java?
My attempt so far:
ZipFile zipFile = new ZipFile("test.zip");
final ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("out.zip"));
for(Enumeration e = zipFile.entries(); e.hasMoreElements(); ) {
ZipEntry entryIn = (ZipEntry) e.nextElement();
if(!entryIn.getName().equalsIgnoreCase("abc.txt")){
zos.putNextEntry(entryIn);
InputStream is = zipFile.getInputStream(entryIn);
byte [] buf = new byte[1024];
int len;
while((len = (is.read(buf))) > 0) {
zos.write(buf, 0, len);
}
}
else{
// I'm not sure what to do here
// Tried a few things and the file gets corrupt
}
zos.closeEntry();
}
zos.close();
Java 7 introduced a much simpler way of manipulating zip archives: the FileSystems API, which lets you access the contents of a zip file as a file system.
Besides offering a much more straightforward API, it does the modification in place and doesn't require you to rewrite the other (irrelevant) files in the zip archive yourself (as done in the accepted answer).
Here's sample code that solves OP's use case:
import java.io.*;
import java.nio.file.*;
public static void main(String[] args) throws IOException {
modifyTextFileInZip("test.zip");
}
static void modifyTextFileInZip(String zipPath) throws IOException {
Path zipFilePath = Paths.get(zipPath);
try (FileSystem fs = FileSystems.newFileSystem(zipFilePath, (ClassLoader) null)) { // the cast avoids an ambiguous overload on newer JDKs
Path source = fs.getPath("/abc.txt");
Path temp = fs.getPath("/___abc___.txt");
if (Files.exists(temp)) {
throw new IOException("temp file exists, generate another name");
}
Files.move(source, temp);
streamCopy(temp, source);
Files.delete(temp);
}
}
static void streamCopy(Path src, Path dst) throws IOException {
try (BufferedReader br = new BufferedReader(
new InputStreamReader(Files.newInputStream(src)));
BufferedWriter bw = new BufferedWriter(
new OutputStreamWriter(Files.newOutputStream(dst)))) {
String line;
while ((line = br.readLine()) != null) {
line = line.replace("key1=value1", "key1=value2");
bw.write(line);
bw.newLine();
}
}
}
For more zip archive manipulation examples, see demo/nio/zipfs/Demo.java sample which you can download here (look for JDK 8 Demos and Samples).
You had almost got it right. One possible reason the file was shown as corrupted is that you might have used
zos.putNextEntry(entryIn)
in the else part as well. This creates a new entry in the zip file containing information from the existing zip file. The existing information contains the entry name (file name) and its CRC, among other things.
And then, when you try to update the text file and close the zip file, it will throw an error because the CRC defined in the entry and the CRC of the data you are trying to write differ.
You might also get an error if the length of the text that you are trying to replace is different from the existing one, i.e. you are trying to replace
key1=value1
with
key1=val1
This boils down to the problem that the buffer you are trying to write to has length different than the one specified.
ZipFile zipFile = new ZipFile("test.zip");
final ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("out.zip"));
for(Enumeration e = zipFile.entries(); e.hasMoreElements(); ) {
ZipEntry entryIn = (ZipEntry) e.nextElement();
if (!entryIn.getName().equalsIgnoreCase("abc.txt")) {
zos.putNextEntry(entryIn);
InputStream is = zipFile.getInputStream(entryIn);
byte[] buf = new byte[1024];
int len;
while((len = is.read(buf)) > 0) {
zos.write(buf, 0, len);
}
}
else{
zos.putNextEntry(new ZipEntry("abc.txt"));
InputStream is = zipFile.getInputStream(entryIn);
byte[] buf = new byte[1024];
int len;
while ((len = (is.read(buf))) > 0) {
String s = new String(buf);
if (s.contains("key1=value1")) {
buf = s.replaceAll("key1=value1", "key1=val2").getBytes();
}
zos.write(buf, 0, (len < buf.length) ? len : buf.length);
}
}
zos.closeEntry();
}
zos.close();
The following expression ensures that even if the replacement data is shorter than the original, no IndexOutOfBoundsException occurs.
(len < buf.length) ? len : buf.length
Only a little improvement to:
else{
zos.putNextEntry(new ZipEntry("abc.txt"));
InputStream is = zipFile.getInputStream(entryIn);
byte[] buf = new byte[1024];
int len;
while ((len = (is.read(buf))) > 0) {
String s = new String(buf);
if (s.contains("key1=value1")) {
buf = s.replaceAll("key1=value1", "key1=val2").getBytes();
}
zos.write(buf, 0, (len < buf.length) ? len : buf.length);
}
}
That should be:
else{
zos.putNextEntry(new ZipEntry("abc.txt"));
InputStream is = zipFile.getInputStream(entryIn);
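long size = entryIn.getSize();
if (size > Integer.MAX_VALUE) {
throw new IllegalStateException("...");
}
byte[] bytes = new byte[(int)size];
// a single read() may return fewer bytes than requested, so loop until the array is full
int off = 0;
for (int n; off < bytes.length && (n = is.read(bytes, off, bytes.length - off)) > 0; ) {
off += n;
}
zos.write(new String(bytes).replaceAll("key1=value1", "key1=val2").getBytes());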
}
This is in order to capture all the occurrences.
The reason is that, with the first version, you could have "key1" in one read and "=value1" in the next, and so miss the occurrence you want to change.
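Once the whole entry is available as text, the value can also be replaced by key rather than by matching the old key=value string, so the old value does not need to be known. A minimal sketch, with hypothetical names (KeyValueEdit, setValue); it assumes one key=value pair per line, as in the question, and leaves decoding the entry bytes with the right charset to the caller.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueEdit {
    /** Replaces the value of every line that starts with "key=", leaving other lines untouched. */
    static String setValue(String text, String key, String newValue) {
        // MULTILINE makes ^ and $ match at line boundaries; quote() guards against
        // regex metacharacters in the key
        Pattern line = Pattern.compile("^" + Pattern.quote(key) + "=.*$", Pattern.MULTILINE);
        return line.matcher(text).replaceAll(Matcher.quoteReplacement(key + "=" + newValue));
    }
}
```

For example, setValue(content, "key1", "val2") rewrites the key1 line and does not touch a line such as key10=... because the pattern requires "=" directly after the key.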
