Simple question,
I'm writing a series of text files into a zip by wrapping a FileOutputStream in a ZipOutputStream and then in a PrintWriter.
public static int saveData(File outfile, DataStructure input) {
//variables
ArrayList<String> out = null;
FileOutputStream fileout = null;
ZipOutputStream zipout = null;
PrintWriter printer = null;
//parameter tests
try {
fileout = new FileOutputStream(outfile);
zipout = new ZipOutputStream(fileout);
printer = new PrintWriter(zipout);
} catch (Exception e) {
e.printStackTrace();
return util.FILE_INVALID;
}
for(DataItem data : input){
//process the data into a list of strings
try {
zipout.putNextEntry(new ZipEntry( dataFileName ));
for(String s : out) {
printer.println(s);
}
// flush buffered text into the current entry before closing it
printer.flush();
zipout.closeEntry();
} catch (Exception e) {
try {
fileout.close();
} catch (Exception x) {
x.printStackTrace();
return util.CRITICAL_ERROR;
}
e.printStackTrace();
return util.CRITICAL_ERROR;
}
}
try {
// closing the ZipOutputStream finishes the archive and closes fileout as well
zipout.close();
} catch (Exception e) {
e.printStackTrace();
return util.CRITICAL_ERROR;
}
return util.SUCCESS;
}
Previously, in the app I've been developing, I've just been saving to the current directory for testing, and I know that if a file already exists it will be overwritten (I have been exploiting this). What I don't know is the behaviour for zips. Will it overwrite entries of the same name? Or will it simply overwrite the whole zip file (which would be convenient for my purposes)?
K.Barad
As Joel said, if you try to add a duplicate ZipEntry you will get an exception. If you want to replace the current entry you need to delete it and re-insert it.
Something like the following should achieve it:
private ZipFile addFileToExistingZip(File zipFile, File versionFile) throws IOException{
// get a temp file
File tempFile = File.createTempFile(zipFile.getName(), null);
// delete it, otherwise you cannot rename your existing zip to it.
tempFile.delete();
boolean renameOk=zipFile.renameTo(tempFile);
if (!renameOk)
{
throw new RuntimeException("could not rename the file "+zipFile.getAbsolutePath()+" to "+tempFile.getAbsolutePath());
}
byte[] buf = new byte[4096 * 1024];
ZipInputStream zin = new ZipInputStream(new FileInputStream(tempFile));
ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFile));
ZipEntry entry = zin.getNextEntry();
while (entry != null) {
String name = entry.getName();
boolean toBeDeleted = false;
// compare names exactly; a substring match could drop unrelated entries
if (versionFile.getName().equals(name)) {
toBeDeleted = true;
}
if(!toBeDeleted){
// Add ZIP entry to output stream.
out.putNextEntry(new ZipEntry(name));
// Transfer bytes from the ZIP file to the output file
int len;
while ((len = zin.read(buf)) > 0) {
out.write(buf, 0, len);
}
}
entry = zin.getNextEntry();
}
// Close the streams
zin.close();
// Compress the files
InputStream in = new FileInputStream(versionFile);
String fName = versionFile.getName();
// Add ZIP entry to output stream.
out.putNextEntry(new ZipEntry(fName));
// Transfer bytes from the file to the ZIP file
int len;
while ((len = in.read(buf)) > 0) {
out.write(buf, 0, len);
}
// Complete the entry
out.closeEntry();
in.close();
// Complete the ZIP file
out.close();
tempFile.delete();
return new ZipFile(zipFile);
}
The above code worked for me where I needed to add a new zip entry to an existing zip file. If the entry is already present in the zip, it is overwritten.
Comments/improvements in the code are welcome!
Thanks!
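As an alternative to copying the archive by hand, here is a minimal sketch that assumes Java 7 or later and the zip file system provider that ships with the JDK. The class name `ZipEntryReplacer` and the method `addOrReplace` are hypothetical names chosen for illustration. The archive is mounted as a file system and the entry is copied in with `REPLACE_EXISTING`, so an entry of the same name is replaced rather than duplicated.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the answer above.
final class ZipEntryReplacer {

    // Adds 'source' to the zip at 'zipPath', replacing an entry of the same name.
    static void addOrReplace(Path zipPath, Path source) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // create the archive if it does not exist yet
        URI uri = URI.create("jar:" + zipPath.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            // Entries live at the root of the mounted zip file system.
            Path inside = zipFs.getPath("/" + source.getFileName());
            Files.copy(source, inside, StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the updated archive
    }
}
```

Calling `addOrReplace` twice with the same source file name leaves a single entry holding the latest content.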
If you try to add a duplicate ZipEntry you will get an exception. If you want to replace the current entry you need to delete it and re-insert it. I suspect the exception you get is much the same as this one.
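To address both halves of the question, here is a small sketch (the class and method names are hypothetical). Within one ZipOutputStream, a second putNextEntry with the same name is rejected with a ZipException. Separately, opening `new FileOutputStream(file)` without the append flag truncates the file, so writing a new archive to an existing path replaces the whole zip rather than merging entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; names are illustrative only.
final class ZipOverwriteDemo {

    // Returns true if adding the same entry name twice to one stream is rejected.
    static boolean duplicateEntryRejected() throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream())) {
            zos.putNextEntry(new ZipEntry("data.txt"));
            zos.write("first".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            try {
                zos.putNextEntry(new ZipEntry("data.txt")); // same name again
                return false;
            } catch (ZipException expected) {
                return true;
            }
        }
    }

    // Writes a zip with a single entry; FileOutputStream truncates an existing file.
    static void writeZip(File target, String entryName, String text) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(target))) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(text.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
    }

    // Counts the entries of a zip file on disk.
    static int entryCount(File zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            return zf.size();
        }
    }
}
```

Writing two different archives to the same path with `writeZip` leaves only the entries of the second one.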
Related
I need to get the content of a file from a ZIP archive (only one file, and I know its name) over SFTP. The only thing I have is the ZIP's InputStream. Most examples show how to get the content using this statement:
ZipFile zipFile = new ZipFile("location");
But as I said, I don't have the ZIP file on my local machine and I don't want to download it. Is an InputStream enough to read it?
Update: this is how I do it:
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
public class SFTP {
public static void main(String[] args) {
String SFTPHOST = "host";
int SFTPPORT = 3232;
String SFTPUSER = "user";
String SFTPPASS = "mypass";
String SFTPWORKINGDIR = "/dir/work";
Session session = null;
Channel channel = null;
ChannelSftp channelSftp = null;
try {
JSch jsch = new JSch();
session = jsch.getSession(SFTPUSER, SFTPHOST, SFTPPORT);
session.setPassword(SFTPPASS);
java.util.Properties config = new java.util.Properties();
config.put("StrictHostKeyChecking", "no");
session.setConfig(config);
session.connect();
channel = session.openChannel("sftp");
channel.connect();
channelSftp = (ChannelSftp) channel;
channelSftp.cd(SFTPWORKINGDIR);
ZipInputStream zipStream = new ZipInputStream(channelSftp.get("file.zip"));
ZipEntry entry = zipStream.getNextEntry();
System.out.println(entry.getName()); //Yes, I got its name, now I need to get content
} catch (Exception ex) {
ex.printStackTrace();
} finally {
session.disconnect();
channelSftp.disconnect();
channel.disconnect();
}
}
}
Below is a simple example of how to extract a ZIP file. You will still need to check whether an entry is a directory, but this is the simplest version.
The step you are missing is reading the input stream and writing the contents to a buffer which is written to an output stream.
// Expands the zip file passed as argument 1, into the
// directory provided in argument 2
public static void main(String args[]) throws Exception
{
if(args.length != 2)
{
System.err.println("zipreader zipfile outputdir");
return;
}
// create a buffer to improve copy performance later.
byte[] buffer = new byte[2048];
// open the zip file stream
InputStream theFile = new FileInputStream(args[0]);
ZipInputStream stream = new ZipInputStream(theFile);
String outdir = args[1];
try
{
// now iterate through each item in the stream. The get next
// entry call will return a ZipEntry for each file in the
// stream
ZipEntry entry;
while((entry = stream.getNextEntry())!=null)
{
String s = String.format("Entry: %s len %d added %TD",
entry.getName(), entry.getSize(),
new Date(entry.getTime()));
System.out.println(s);
// Once we get the entry from the stream, the stream is
// positioned ready to read the raw data, and we keep
// reading until read returns 0 or less.
String outpath = outdir + "/" + entry.getName();
FileOutputStream output = null;
try
{
output = new FileOutputStream(outpath);
int len = 0;
while ((len = stream.read(buffer)) > 0)
{
output.write(buffer, 0, len);
}
}
finally
{
// we must always close the output file
if(output!=null) output.close();
}
}
}
finally
{
// we must always close the zip file.
stream.close();
}
}
Code excerpt came from the following site:
http://www.thecoderscorner.com/team-blog/java-and-jvm/12-reading-a-zip-file-from-java-using-zipinputstream#.U4RAxYamixR
Well, I've done this:
zipStream = new ZipInputStream(channelSftp.get("Port_Increment_201405261400_2251.zip"));
zipStream.getNextEntry();
sc = new Scanner(zipStream);
while (sc.hasNextLine()) {
System.out.println(sc.nextLine());
}
It helps me to read ZIP's content without writing to another file.
The ZipInputStream is itself an InputStream and delivers the contents of each entry after each call to getNextEntry(). Take special care not to close the stream from which the contents are read, since it is the same object as the ZIP stream:
public void readZipStream(InputStream in) throws IOException {
ZipInputStream zipIn = new ZipInputStream(in);
ZipEntry entry;
while ((entry = zipIn.getNextEntry()) != null) {
System.out.println(entry.getName());
readContents(zipIn);
zipIn.closeEntry();
}
}
private void readContents(InputStream contentsIn) throws IOException {
byte contents[] = new byte[4096];
int direct;
while ((direct = contentsIn.read(contents, 0, contents.length)) >= 0) {
System.out.println("Read " + direct + " bytes of content.");
}
}
When delegating reading contents to other logic, it can be necessary to wrap the ZipInputStream with a FilterInputStream to close only the entry instead of the whole stream as in:
public void readZipStream(InputStream in) throws IOException {
ZipInputStream zipIn = new ZipInputStream(in);
ZipEntry entry;
while ((entry = zipIn.getNextEntry()) != null) {
System.out.println(entry.getName());
readContents(new FilterInputStream(zipIn) {
@Override
public void close() throws IOException {
zipIn.closeEntry();
}
});
}
}
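To make the wrapper idea concrete, here is a self-contained sketch (the class name `ZipEntryTextReader` is hypothetical, and UTF-8 text entries are an assumption). Each entry is read through a BufferedReader in a try-with-resources block. Closing that reader would normally close the ZipInputStream as well, but the FilterInputStream wrapper turns close() into closeEntry(), so the loop can move on to the next entry.

```java
import java.io.BufferedReader;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; assumes every file entry holds UTF-8 text.
final class ZipEntryTextReader {

    // Reads every file entry as text, keyed by entry name, in archive order.
    static Map<String, String> readAll(InputStream in) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zipIn = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Closing this wrapper closes only the current entry, not zipIn.
                InputStream entryOnly = new FilterInputStream(zipIn) {
                    @Override
                    public void close() throws IOException {
                        zipIn.closeEntry();
                    }
                };
                StringBuilder text = new StringBuilder();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(entryOnly, StandardCharsets.UTF_8))) {
                    int c;
                    while ((c = reader.read()) != -1) {
                        text.append((char) c);
                    }
                }
                result.put(entry.getName(), text.toString());
            }
        }
        return result;
    }
}
```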
OP was close. Just need to read the bytes. The call to getNextEntry positions the stream at the beginning of the entry data (docs). If that's the entry we want (or the only entry), then the InputStream is in the right spot. All we need to do is read that entry's decompressed bytes.
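// Note: with ZipInputStream, getSize() can return -1 when the size is not
// known up front; this loop assumes the size is available.
byte[] bytes = new byte[(int) entry.getSize()];
int i = 0;
while (i < bytes.length) {
// .read doesn't always fill the buffer we give it.
// Keep calling it until we get all the bytes for this entry.
int n = zipStream.read(bytes, i, bytes.length - i);
if (n < 0) break; // entry ended earlier than getSize() announced
i += n;
}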
So if these bytes really are text, then we can decode those bytes to a String. I'm just assuming utf8 encoding.
new String(bytes, "utf8")
Side note: I personally use apache commons-io IOUtils to cut down on this kind of lower level stuff. The docs for ZipInputStream.read seem to imply that read will stop at the end of the current zip entry. If that is true, then reading the current textual entry is one line with IOUtils.
String text = IOUtils.toString(zipStream, StandardCharsets.UTF_8);
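If you would rather not depend on getSize() or on commons-io, the following sketch reads the current entry until read returns -1, which ZipInputStream does at the end of each entry. The class name `ZipEntryBytes` and both method names are hypothetical, and UTF-8 is an assumption about the content.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; assumes the wanted entry holds UTF-8 text.
final class ZipEntryBytes {

    // Returns the text of the named entry, or null if the archive has no such entry.
    static String readEntryAsText(InputStream zipped, String wantedName) throws IOException {
        try (ZipInputStream zipStream = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zipStream.getNextEntry()) != null) {
                if (entry.getName().equals(wantedName)) {
                    return new String(readCurrentEntry(zipStream), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    // Reads the decompressed bytes of the entry the stream is positioned at.
    static byte[] readCurrentEntry(ZipInputStream zipStream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = zipStream.read(buffer)) != -1) { // -1 marks the end of this entry
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

In the SFTP case from the question, the stream returned by `channelSftp.get(...)` could be passed as the `zipped` argument.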
Unzip an archive (zip) into a given directory, preserving the file structure.
Note: this code depends on org.apache.commons.io.IOUtils, but you can replace it with your own stream-copying code.
public static void unzipDirectory(File archiveFile, File destinationDir) throws IOException
{
Path destPath = destinationDir.toPath();
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(archiveFile)))
{
ZipEntry zipEntry;
while ((zipEntry = zis.getNextEntry()) != null)
{
Path resolvedPath = destPath.resolve(zipEntry.getName()).normalize();
if (!resolvedPath.startsWith(destPath))
{
throw new IOException("The requested zip-entry '" + zipEntry.getName() + "' does not belong to the requested destination");
}
if (zipEntry.isDirectory())
{
Files.createDirectories(resolvedPath);
} else
{
if(!Files.isDirectory(resolvedPath.getParent()))
{
Files.createDirectories(resolvedPath.getParent());
}
try (FileOutputStream outStream = new FileOutputStream(resolvedPath.toFile()))
{
IOUtils.copy(zis, outStream);
}
}
}
}
}
Here is a more generic solution that processes a zip InputStream with a BiConsumer. It's nearly the same solution that haui used.
private void readZip(InputStream is, BiConsumer<ZipEntry,InputStream> consumer) throws IOException {
try (ZipInputStream zipFile = new ZipInputStream(is);) {
ZipEntry entry;
while((entry = zipFile.getNextEntry()) != null){
consumer.accept(entry, new FilterInputStream(zipFile) {
@Override
public void close() throws IOException {
zipFile.closeEntry();
}
});
}
}
}
You can use it by just calling
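readZip(<some inputstream>, (entry, is) -> {
/* don't forget to close this stream after processing. */
try {
is.read(); // ... <- to read each entry
} catch (IOException e) {
// BiConsumer.accept cannot throw checked exceptions
throw new UncheckedIOException(e);
}
});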
If your ZIP contains a single file (for example, the zipped content of an HTTP response), you can read its text content using Kotlin as follows:
@Throws(IOException::class)
fun InputStream.readZippedContent() = ZipInputStream(this).use { stream ->
stream.nextEntry?.let { stream.bufferedReader().readText() } ?: String()
}
This extension function unzips the first entry of the ZIP stream and reads its content as plain text.
Usage:
val inputStream: InputStream = ... // your zipped InputStream
val textContent = inputStream.readZippedContent()
We are storing zip files, containing XML files, in HDFS. We need to be able to programmatically unzip the file and stream out the contained XML files, using Java. FileSystem.open returns a FSDataInputStream but ZipFile constructors only take File or String as parameters. I really don't want to have to use FileSystem.copyToLocalFile.
Is it possible to stream the contents of a zip file stored in HDFS without first copying the zip file to the local file system? If so how?
Hi, please find the sample code below:
public static Map<String, byte[]> loadZipFileData(FileSystem fileSystem, String hdfsFilePath) {
try {
// the caller supplies the Hadoop FileSystem that holds the zip file
ZipInputStream zipInputStream = readZipFileFromHDFS(fileSystem, new Path(hdfsFilePath));
ZipEntry zipEntry = null;
byte[] buf = new byte[1024];
Map<String, byte[]> listOfFiles = new LinkedHashMap<>();
while ((zipEntry = zipInputStream.getNextEntry()) != null ) {
int bytesRead = 0;
String entryName = zipEntry.getName();
if (!zipEntry.isDirectory()) {
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
while ((bytesRead = zipInputStream.read(buf, 0, 1024)) > -1) {
outputStream.write(buf, 0, bytesRead);
}
listOfFiles.put(entryName, outputStream.toByteArray());
outputStream.close();
}
zipInputStream.closeEntry();
}
zipInputStream.close();
return listOfFiles;
} catch (Exception e) {
e.printStackTrace();
// every path must return a value; report failure as an empty map
return Collections.emptyMap();
}
}
protected static ZipInputStream readZipFileFromHDFS(FileSystem fileSystem, Path path) throws Exception {
if (!fileSystem.exists(path)) {
throw new IllegalArgumentException(path.getName() + " does not exist");
}
FSDataInputStream fsInputStream = fileSystem.open(path);
ZipInputStream zipInputStream = new ZipInputStream(fsInputStream);
return zipInputStream;
}
I've been trying to tackle this problem for a day or two and can't figure out precisely how to add text files to a zip file. I was able to work out how to add these text files to a 7zip file, which was insanely easy, but a zip file seems much more complicated for some reason. I need to return a zip file for user reasons, by the way.
Here's what I have now:
(I know the code isn't too clean at the moment, I plan to tackle that after getting the bare functionality down).
private ZipOutputStream addThreadDumpsToZipFile(File file, List<Datapoint<ThreadDump>> allThreadDumps, List<Datapoint<String>> allThreadDumpTextFiles) {
ZipOutputStream threadDumpsZipFile = null;
try {
//create a new zip output stream that writes to the given file
//TODO missing step: create text files containing each thread dump then add to zip
threadDumpsZipFile = new ZipOutputStream(new FileOutputStream(file));
FileInputStream fileInputStream = null;
try {
//add data to each thread dump entry
for(int i=0; i<allThreadDumpTextFiles.size();i++) {
//create file for each thread dump
File threadDumpFile = new File("thread_dump_"+i+".txt");
FileUtils.writeStringToFile(threadDumpFile,allThreadDumpTextFiles.get(i).toString());
//add entry/file to zip file (creates block to add input to)
ZipEntry threadDumpEntry = new ZipEntry("thread_dump_"+i); //might need to add extension here?
threadDumpsZipFile.putNextEntry(threadDumpEntry);
//add the content to this entry
fileInputStream = new FileInputStream(threadDumpFile);
byte[] byteBuffer = new byte[(int) threadDumpFile.length()]; //see if this sufficiently returns length of data
int bytesRead = -1;
while ((bytesRead = fileInputStream.read(byteBuffer)) != -1) {
threadDumpsZipFile.write(byteBuffer, 0, bytesRead);
}
}
threadDumpsZipFile.flush();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
fileInputStream.close();
} catch(Exception e) {
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return threadDumpsZipFile;
}
As you can sort of guess, I have a set of Thread Dumps that I want to add to my zip file and return to the user.
Let me know if you guys need any more info!
PS: There might be some bugs in this question, I just realized with some breakpoints that the threadDumpFile.length() won't really work.
Look forward to your replies!
Thanks,
Arsa
Here's a crack at it. I think you'll want to keep the file extensions when you make your ZipEntry objects. See if you can implement the below createTextFiles() function; the rest of this works -- I stubbed that method to return a single "test.txt" file with some dummy data to verify.
void zip()
{
try {
FileOutputStream fos = new FileOutputStream("yourZipFile.zip");
ZipOutputStream zos = new ZipOutputStream(fos);
File[] textFiles = createTextFiles(); // should be an easy step
for (int i = 0; i < textFiles.length; i++) {
addToZipFile(textFiles[i].getName(), zos);
}
zos.close();
fos.close();
} catch (Exception e) {
e.printStackTrace();
}
}
void addToZipFile(String fileName, ZipOutputStream zos) throws Exception {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file);
ZipEntry zipEntry = new ZipEntry(fileName);
zos.putNextEntry(zipEntry);
byte[] bytes = new byte[1024];
int length;
while ((length = fis.read(bytes)) >= 0) {
zos.write(bytes, 0, length);
}
zos.closeEntry();
fis.close();
}
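If the thread dumps are already available as strings, the intermediate text files may not be needed at all. The sketch below writes each string straight into its own zip entry. The class name `ThreadDumpZipper`, the method name, and the `thread_dump_<i>.txt` naming are assumptions for illustration, as is the UTF-8 encoding.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; entry names and encoding are illustrative choices.
final class ThreadDumpZipper {

    // Writes each text as its own entry and closes 'target' when done.
    static void writeTextEntries(OutputStream target, List<String> texts) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(target)) {
            for (int i = 0; i < texts.size(); i++) {
                zos.putNextEntry(new ZipEntry("thread_dump_" + i + ".txt"));
                zos.write(texts.get(i).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } // close() writes the central directory that makes the zip readable
    }
}
```

Passing a `FileOutputStream` as the target produces a zip file on disk; a `ByteArrayOutputStream` keeps the archive in memory.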
I have a folder (ZipFilesFolder) containing 10 zip files, say one.zip, two.zip, three.zip ... ten.zip. I pass each file from this folder to zipFileToUnzip as zipFilename. I need the results in the same folder (ZipFilesFolder): after unzipping, folders named one, two, three and so on should appear in place of one.zip, two.zip, etc.
public static void zipFileToUnzip(File zipFilename) throws IOException {
try {
//String destinationname = "D:\\XYZ";
byte[] buf = new byte[1024];
ZipInputStream zipinputstream = null;
ZipEntry zipentry;
zipinputstream = new ZipInputStream(new FileInputStream(zipFilename));
zipentry = zipinputstream.getNextEntry();
while (zipentry != null) {
//for each entry to be extracted
String entryName = zipentry.getName();
System.out.println("entryname " + entryName);
int n;
FileOutputStream fileoutputstream;
File newFile = new File(entryName);
String directory = newFile.getParent();
if (directory == null) {
if (newFile.isDirectory()) {
break;
}
}
fileoutputstream = new FileOutputStream(
destinationname + entryName);
while ((n = zipinputstream.read(buf, 0, 1024)) > -1) {
fileoutputstream.write(buf, 0, n);
}
fileoutputstream.close();
zipinputstream.closeEntry();
zipentry = zipinputstream.getNextEntry();
}//while
zipinputstream.close();
} catch (IOException e) {
}
}
This is my code, but it is not working. Could anybody help me get the desired output?
There are a couple of problems with your code:
it does not compile since destinationname is commented out, but is still referenced when opening the FileOutputStream
IOExceptions are caught and ignored. If you throw them you would get error messages that could help you diagnose the problem
when opening the FileOutputStream, you just concatenate two strings without adding a path-separator in between.
if the file to be created is in a directory, the directory is not created and thus FileOutputStream cannot create the file.
streams are not closed when exceptions occur.
If you do not mind using guava, which simplifies life when it comes to copying streams to files, you could use this code instead:
public static void unzipFile(File zipFile) throws IOException {
File destDir = new File(zipFile.getParentFile(), Files.getNameWithoutExtension(zipFile.getName()));
try(ZipInputStream zipStream = new ZipInputStream(new FileInputStream(zipFile))) {
ZipEntry zipEntry = zipStream.getNextEntry();
if(zipEntry == null) throw new IOException("Empty or no zip-file");
while(zipEntry != null) {
File destination = new File(destDir, zipEntry.getName());
if(zipEntry.isDirectory()) {
destination.mkdirs();
} else {
destination.getParentFile().mkdirs();
Files.asByteSink(destination).writeFrom(zipStream);
}
zipEntry = zipStream.getNextEntry();
}
}
}
Alternatively you might also use zip4j, see also this question.
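For completeness, here is a JDK-only sketch of the same idea that needs neither Guava nor zip4j. The class name `SiblingUnzipper` and the method name are hypothetical. It extracts one.zip into a folder named one next to the archive, creates parent directories as needed, and rejects entries whose names would resolve outside the destination folder. It assumes the archive name has an extension, so that the folder name differs from the file name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper using only java.nio and java.util.zip.
final class SiblingUnzipper {

    // Extracts e.g. /dir/one.zip into /dir/one and returns that folder.
    static Path unzipNextToArchive(Path zipFile) throws IOException {
        String fileName = zipFile.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
        Path destDir = zipFile.toAbsolutePath().getParent().resolve(baseName).normalize();
        try (ZipInputStream zipStream = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zipStream.getNextEntry()) != null) {
                Path target = destDir.resolve(entry.getName()).normalize();
                if (!target.startsWith(destDir)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies up to the end of the current entry; zipStream stays open.
                    Files.copy(zipStream, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return destDir;
    }
}
```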
I'm using Apache Commons Compress to create tar archives and decompress them. My problems start with this method:
private void decompressFile(File file) throws IOException {
logger.info("Decompressing " + file.getName());
BufferedOutputStream outputStream = null;
TarArchiveInputStream tarInputStream = null;
try {
tarInputStream = new TarArchiveInputStream(
new FileInputStream(file));
TarArchiveEntry entry;
while ((entry = tarInputStream.getNextTarEntry()) != null) {
if (!entry.isDirectory()) {
File compressedFile = entry.getFile();
File tempFile = File.createTempFile(
compressedFile.getName(), "");
byte[] buffer = new byte[BUFFER_MAX_SIZE];
outputStream = new BufferedOutputStream(
new FileOutputStream(tempFile), BUFFER_MAX_SIZE);
int count = 0;
while ((count = tarInputStream.read(buffer, 0, BUFFER_MAX_SIZE)) != -1) {
outputStream.write(buffer, 0, count);
}
}
deleteFile(file);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (outputStream != null) {
outputStream.flush();
outputStream.close();
}
}
}
Every time I run the code, compressedFile variable is null, but the while loop is iterating over all entries in my test tar.
Could you help me to understand what I'm doing wrong?
From the official documentation
Reading entries from an tar archive:
TarArchiveEntry entry = tarInput.getNextTarEntry();
byte[] content = new byte[entry.getSize()];
LOOP UNTIL entry.getSize() HAS BEEN READ {
tarInput.read(content, offset, content.length - offset);
}
I have written an example starting from your implementation and tested it with a very trivial .tar (just one text entry).
Not knowing the exact requirements, I only address reading the archive while avoiding the NullPointerException. When debugging, the entry is available, as you also found:
private static void decompressFile(File file) throws IOException {
BufferedOutputStream outputStream = null;
TarArchiveInputStream tarInputStream = null;
try {
tarInputStream = new TarArchiveInputStream(
new FileInputStream(file));
TarArchiveEntry entry;
while ((entry = tarInputStream.getNextTarEntry()) != null) {
if (!entry.isDirectory()) {
File compressedFile = entry.getFile();
String name = entry.getName();
int size = 0;
int c;
while (size < entry.getSize()) {
c = tarInputStream.read();
System.out.print((char) c);
size++;
}
(.......)
As I said, I tested with a tar containing only one text entry (you can try this approach too, to verify the code) to be sure that the null is avoided.
You will need to adapt this to your real needs.
Clearly you will have to handle the streams as in the pseudocode I posted at the top.
It shows how to deal with individual entries.
Try using getNextEntry() method instead of getNextTarEntry() method.
The second method returns a TarArchiveEntry. Probably this is not what you want!