RandomAccessFile from ZipEntry (java) - java

I was looking for something about reading zip-archives via RandomAccessFile. So, I found this example: http://www.java2s.com/Code/JavaAPI/java.io/RandomAccessFilereadLine.htm
However, it doesn't work for me: it reports that there is no such file or directory, although the file path is right. Is this example incorrect?
UPDATE: from docs.oracle.com:
RandomAccessFile(String name, String mode)
Creates a random access file stream to read from, and optionally to write to, a file with the specified name.
It's weird that this example tries to create the RAF with the entry name as the "name" parameter.
There's one more example with the same thing: http://www.java-tips.org/java-se-tips/java.util.zip/how-to-read-files-within-a-zip-file-3.html

I think this is a case where un-vetted code winds up on the internets and causes no end of problems.
There is no way the code in those two examples is going to do anything useful. The only way that code would do anything is if the contents of the zip file had already been extracted into the folder that contains the zip.
Long and short: you can't use RAF with ZipEntry because the ZipEntry refers to a compressed stream. You can't do random access on a stream (unless you decompress the whole thing and buffer the results).
It's really interesting to me how:
a) the code in the java-tips article doesn't follow proper naming conventions for Java
b) the code in both articles is astoundingly similar
Here's sample code that shows how to properly use ZipInputStream
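A minimal sketch of that approach, assuming you want the bytes of a single named entry. The class and method names (ZipEntryReader, readEntry) are made up for illustration. The archive is scanned sequentially, and the matching entry is decompressed into memory:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not taken from either of the linked articles.
final class ZipEntryReader {

    /**
     * Scans the archive sequentially and returns the decompressed bytes of
     * the entry called entryName, or null if there is no such entry.
     */
    static byte[] readEntry(Path zipFile, String entryName) throws IOException {
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

Once the entry is fully buffered like this, you can wrap the array in a java.nio.ByteBuffer and jump around in it freely, which is about as close to "random access" as a compressed entry allows.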

With the NIO.2 File API (Java 7) working with zip files becomes much easier.
Try (untested):
try (FileSystem zipFS = FileSystems.newFileSystem(URI.create("jar:" + zipURI), Map.of())) {
    Path targetInZipPath = zipFS.getPath(targetInZipPathString);
    // do something here
}
Read more about the ZIP filesystem (JDK module jdk.zipfs) here: https://docs.oracle.com/en/java/javase/17/docs/api/jdk.zipfs/module-summary.html

Related

Thread.currentThread().getContextClassLoader().getResourceAsStream reads a property file multiple times

I have a property file named sysconfig.properties that I want to read multiple times, because its content can change. But I found that after I change the content of sysconfig.properties and read it again, I still get the old content, the same as the first time I read the file. The content of sysconfig.properties is as follows:
isInitSuccess=TRUE
isStartValid=2013
Sometimes it will be changed as follows:
isInitSuccess=FALSE
isStartValid=2013
The code that reads the properties file is as follows:
InputStream inStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(filePath);
I use this code to read the file multiple times, but every time "isInitSuccess" is "TRUE", even after I changed the file to isInitSuccess=FALSE. Does the system read the file only once, so that later reads just get the input stream from memory?
But when I use the code below, it will work fine:
InputStream inStream = new FileInputStream(new File(strPath));
I googled but did not find anything helpful. The problem confuses me a lot; any help would be appreciated.
You need to read up on what the classpath is.
In short, Java has a concept of classpath which includes all the resources (.class files, .properties files, and anything really) it needs to run. When you use ClassLoader#getResourceAsStream(String), you're actually getting the InputStream of a classpath resource. This resource can be a physical resource on disk or it can be in an archive.
When you use a FileInputStream, you are getting the InputStream of a file on disk.
The InputStream from the ClassLoader and the one from the FileInputStream do not correspond to the same file.
You should read up on how your IDE (or whatever build system) handles your files.
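If the file really is meant to change while the program runs, one option is to load it from an explicit filesystem path on every read, as the FileInputStream variant in the question does. A minimal sketch, with a made-up class name (ReloadableConfig) and the assumption that you know where the file lives on disk:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: re-reads the file from disk on every call.
final class ReloadableConfig {

    /** Opens the file at the given path and parses it into a fresh Properties object. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }
}
```

Each call opens the file again, so an edit made between two calls shows up in the second result. Whether the classpath copy of the file ever changes depends on how your build copies resources, which is why the two streams can disagree.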

handling about 450.000 files in a zip

My question is simple. Would Java handle a .zip file with about 450,000 files in there? The code that I wrote would not load all of the files, just one specific file would be searched in the zip, and be read line by line. The file size is about 500kb.
Would this work, or will I get an OutOfMemoryError?
Oh, sorry: uncompressed, each file is about 0.5 MB. Zipped, all the files together are about 250 MB.
OK, the names of the files in that zip are ID + date (unique). If I have to check a log, I'll call Java with the ID + date, and Java reads just that one file, never more.
Edit: It works, and it works very well. About 400,000 files in one zip is no problem, as long as you have the memory to zip the files.
Edit 2: It works on Linux filesystems without a problem; on NTFS it sometimes crashed. NTFS seems to have a problem with that many files in one zip.
Using the zip filesystem in Java 7, you can actually access one individual file pretty easily and open a BufferedReader on it.
First you have to create the FileSystem:
public static FileSystem getZipFileSystem(final String zipPath) throws IOException
{
    final Path path = Paths.get(zipPath).toAbsolutePath();
    final Map<String, Object> env = new HashMap<>();
    // path.toUri() yields a properly formed file: URI, also for Windows paths
    final URI uri = URI.create("jar:" + path.toUri());
    return FileSystems.newFileSystem(uri, env, null);
}
Once you have done that, you can create a BufferedReader from an entry in the zip itself:
try (
    final FileSystem fs = getZipFileSystem("/path/to/the.zip");
    final BufferedReader reader = Files.newBufferedReader(fs.getPath("path/to/entry"),
        StandardCharsets.UTF_8);
) {
    // operate on the reader
}
You could also read all lines in the entry at once using Files.readAllLines().
If you wish to copy a zip entry to a file on the filesystem, you can also do that:
Files.copy(fs.getPath("path/to/entry"), Paths.get("file/on/local/fs"));
Or you can directly copy the result to an OutputStream, or directly create an entry from an OutputStream...
Or even walk the entire zip using Files.walkFileTree().
Or get all the entries in a "directory" in a zip using Files.newDirectoryStream(). Note that, as its name says, this is a stream; unlike File.listFiles() (which only works on files on disk anyway), it returns an Iterable over the entries rather than an array.
Or... Or... Or...
Note that a FileSystem needs to be .close()d.
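As an illustration of the Files.newDirectoryStream() option, here is a small sketch that lists the names inside one "directory" of a zip. The class and method names (ZipDirectoryLister, list) are hypothetical; try-with-resources takes care of closing both the directory stream and the FileSystem:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper built on the zip filesystem provider.
final class ZipDirectoryLister {

    /** Returns the file names found directly under dirInZip, e.g. "/logs". */
    static List<String> list(Path zipFile, String dirInZip) throws IOException {
        URI uri = URI.create("jar:" + zipFile.toAbsolutePath().toUri());
        List<String> names = new ArrayList<>();
        // Resources are closed in reverse order: directory stream first, then the filesystem.
        try (FileSystem fs = FileSystems.newFileSystem(uri, new HashMap<String, Object>());
             DirectoryStream<Path> entries = Files.newDirectoryStream(fs.getPath(dirInZip))) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }
}
```

The iteration order of a DirectoryStream is not specified, so sort the result if you need a stable order.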
I'm not sure that I understand what you're trying to do.
If it's 0.5 MB/file and 450,000 files, you'll need 225GB. You won't have enough memory to do all this in a single zip in memory even if you get 90% compression.
I'd recommend breaking it into manageable chunks. You'll be able to parallelize that way too, so it's not a bad idea.

tempfile gives wrong absolutepath after renameto

I'm changing the contents of a file: I read the file line by line, replace what I want, and write line by line to a temp file. When the whole file is processed, I delete the original file and rename the temp file to the original filename.
Like this:
File orginialFile = new File("C:\\java\\workspace\\original.xml");
File tempFile = File.createTempFile("tempfile", ".tmp", new File("C:\\java\\workspace\\"));
while ((str_String = reader.readLine()) != null) {
//read lines and replace and write lines
}
orginialFile.delete();
tempFile.renameTo(new File("C:\\java\\workspace\\original.xml"));
After this is done, I request the absolute path of the temp file (tempFile.getAbsolutePath();). But this gives me
c:\java\workspace\tempfile3729727953777802965.tmp (the number changes on every run of the program) instead of c:\java\workspace\original.xml
How come?
I debugged it and just before the moment that I request the absolutepath, I checked in c:\java\workspace (windows explorer) and there is no tempfile. Only original file.
So the process runs correctly, I just wanted to know why it is not showing the renamed absolutepath. (I would use it for logging)
Thx
From the documentation of java.io.File, just before the section "Interoperability with java.nio.file package":
Instances of the File class are immutable; that is, once created, the abstract pathname represented by a File object will never change.
So it won't show the renamed absolutepath.
There is a missing reader.close() before the delete. Likely edited out for us. Also you can do:
tempFile.renameTo(orginialFile);
Have you checked the return value from renameTo()? I suspect it to be false.
Also pay attention to the api documentation. It states that a lot of things can go wrong - e.g. moving between file systems.
You might be better off with Files.move
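A rough sketch of the Files.move variant, with a hypothetical helper name (ReplaceFile.replace). Unlike renameTo(), which only returns false, Files.move throws an IOException that says what went wrong, and it returns the target path, which is the value you would want to log:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the name is illustrative only.
final class ReplaceFile {

    /**
     * Moves tempFile over target, replacing target if it exists, and
     * returns the path where the data now lives (the target).
     */
    static Path replace(Path tempFile, Path target) throws IOException {
        return Files.move(tempFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With REPLACE_EXISTING there is no need to delete the original file first. The tempFile object itself still names the old location afterwards, just like the File in the question, so use the returned Path for logging.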

Xuggle can't open in-memory input

I am working on a program that integrates Hadoop's MapReduce framework with Xuggle. For that, I am implementing an IURLProtocolHandlerFactory class that reads from and writes to in-memory Hadoop data objects.
You can see the relevant code here:
https://gist.github.com/4191668
The idea is to register each BytesWritable object in the IURLProtocolHandlerFactory class with a UUID, so that when I later refer to that name while opening the file, it returns an IURLProtocolHandler instance that is attached to that BytesWritable object, and I can read from and write to memory.
The problem is that I get an exception like this:
java.lang.RuntimeException: could not open: byteswritable:d68ce8fa-c56d-4ff5-bade-a4cfb3f666fe
at com.xuggle.mediatool.MediaReader.open(MediaReader.java:637)
(see also under the posted link)
When debugging I see that the objects are correctly found in the factory, what's more, they are even being read from in the protocol handler. If I remove the listeners from/to the output file, the same happens, so the problem is already with the input. Digging deeper in the code of Xuggle I reach the JNI code (which tries to open the file) and I can't get further than this. This apparently returns an error code.
XugglerJNI.IContainer_open__SWIG_0
I would really appreciate some hint where to go next, how should I continue debugging. Maybe my implementation has a flaw, but I can't see it.
I think the problem you are running into is that a lot of the types of inputs/outputs are converted to a native file descriptor in the IContainer JNI code, but the thing you are passing cannot be converted. It may not be possible to create your own IURLProtocolHandler in this way, because it would, after a trip through XuggleIO.map(), just end up calling IContainer again and then into the IContainer JNI code which will probably try to get a native file descriptor and call avio_open().
However, there may be a couple of things that you can open in IContainer which are not files and have no file descriptors, and which would be handled correctly. The things you can open can be seen in the IContainer code, namely java.io.DataOutput and java.io.DataOutputStream (and the corresponding inputs). I recommend writing your own DataInput/DataOutput implementation that wraps around your BytesWritable objects, and opening that in IContainer.
If that doesn't work, then write your inputs to a temp file and read the outputs from a temp file :)
You can copy the file to the local filesystem first and then try to open the container:
filePath = split.getPath();
final FileSystem fileSystem = filePath.getFileSystem(job);
Path localFile = new Path(filePath.getName());
fileSystem.createNewFile(localFile);
fileSystem.copyToLocalFile(filePath, localFile);
int result = container.open(filePath.getName(), IContainer.Type.READ, null);
This code works for me in the RecordReader class.
In your case, you could copy the file to the local filesystem first and then try to create the MediaReader.

decompress .gz file in batch

I have 100 .gz files which I need to decompress.
I have a couple of questions:
a) I am using the code given at http://www.roseindia.net/java/beginners/JavaUncompress.shtml to decompress the .gz files. It's working fine.
Question: is there a way to get the file name of the compressed file? I know that Java's Zip classes give an enumeration of entries to work with, which can give me the file name, size, etc. stored in the .zip file. But is there something similar for .gz files, or is the file name simply filename.gz with the .gz removed?
b) Is there a more elegant way to decompress a .gz file by calling a utility from the Java code, such as calling the 7-Zip application from a Java class? Then I wouldn't have to worry about input/output streams.
Thanks in advance.
Kapil
a) Zip is an archive format, while gzip is not. So an entry iterator does not make much sense unless (for example) your gz-files are compressed tar files. What you want is probably:
File outFile = new File(infile.getParent(), infile.getName().replaceAll("\\.gz$", ""));
b) Do you only want to uncompress the files? If not you may be ok with using GZIPInputStream and read the files directly, i.e. without intermediate decompression.
But ok. Let's say you really only want to uncompress the files. If so, you could probably use this:
public static File unGzip(File infile, boolean deleteGzipfileOnSuccess) throws IOException {
    // Output file: same directory, same name without the trailing ".gz"
    File outFile = new File(infile.getParent(), infile.getName().replaceAll("\\.gz$", ""));
    // try-with-resources closes both streams, even if copying fails
    try (GZIPInputStream gin = new GZIPInputStream(new FileInputStream(infile));
         FileOutputStream fos = new FileOutputStream(outFile)) {
        byte[] buf = new byte[100000];
        int len;
        while ((len = gin.read(buf)) > 0) {
            fos.write(buf, 0, len);
        }
    }
    if (deleteGzipfileOnSuccess) {
        infile.delete();
    }
    return outFile;
}
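For the other case mentioned under b), where you only need the content and not an uncompressed file on disk, a sketch of reading a .gz file directly could look like this. It assumes the compressed data is UTF-8 text, and the class name (GzipLineReader) is made up:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompresses on the fly, no intermediate file.
final class GzipLineReader {

    /** Reads all lines of a gzip-compressed text file (assumed to be UTF-8). */
    static List<String> readLines(Path gzFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }
}
```

For large files you would process each line inside the loop instead of collecting them all in a list.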
Regarding A, the gunzip command creates an uncompressed file with the original name minus the .gz suffix. See the man page.
Regarding B, Do you need gunzip specifically, or will another compression algorithm do? There's a java port of the LZMA compression algorithm used by 7zip to create .7z files, but it will not handle .gz files.
If you have a fixed number of files to decompress once, why don't you use existing tools for that?
As Paul Morie noticed, gunzip can do that:
for i in *.gz; do gunzip "$i"; done
And it would automatically name them, stripping .gz$
On windows, try winrar, probably, or gunzip from http://unxutils.sf.net
GZip is normally used only on single files, so it generally does not contain information about individual files. To bundle multiple files into one compressed archive, they are first combined into an uncompressed Tar file (with info about individual contents), and then compressed as a single file. This combination is called a Tarball.
There are libraries to extract the individual file info from a Tar, just as with ZipEntries. One example. You will first have to extract the .gz file into a temporary file in order to use it, or at least feed the GZIPInputStream into the Tar library.
You may also call 7-Zip from the command line using Java. 7-Zip command-line syntax is here: 7-Zip Command Line Syntax. Example of calling the command shell from Java: Executing shell commands in Java. You will have to call 7-Zip twice: once to extract the Tar from the .tar.gz or .tgz file, and again to extract the individual files from the Tar.
Or, you could just do the easy thing and write a brief shell script or batch file to do your decompression. There's no reason to hammer a square peg in a round hole -- this is what batch files are made for. As a bonus, you can also feed them parameters, reducing the complexity of a java command line execution considerably, while still letting java control execution.
Have you tried
gunzip *.gz
.gz files (gzipped) can store the filename of a compressed file. So for example FuBar.doc can be saved inside myDocument.gz and with appropriate uncompression, the file can be restored to the filename FuBar.doc. Unfortunately, java.util.zip.GZIPInputStream does not support any way of reading the filename even if it is stored inside the archive.
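If you need that stored name anyway, the gzip header is simple enough to parse by hand. The sketch below follows the header layout in RFC 1952: a 10-byte fixed part, an optional "extra" field, then the optional zero-terminated file name in ISO 8859-1. The class and method names (GzipHeaderName, readOriginalName) are made up, and this is only a sketch that reads the first member's header, not a full gzip parser:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that reads only the gzip header, not the compressed data.
final class GzipHeaderName {

    private static final int FEXTRA = 0x04; // header flag: extra field present
    private static final int FNAME = 0x08;  // header flag: original file name present

    /** Returns the original file name stored in the gzip header, or null if none is stored. */
    static String readOriginalName(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] fixed = new byte[10]; // ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
        data.readFully(fixed);
        if ((fixed[0] & 0xff) != 0x1f || (fixed[1] & 0xff) != 0x8b) {
            throw new IOException("Not a gzip stream");
        }
        int flags = fixed[3] & 0xff;
        if ((flags & FEXTRA) != 0) {
            // The extra field is prefixed with its length as a little-endian 16-bit value.
            int xlen = data.readUnsignedByte() | (data.readUnsignedByte() << 8);
            data.readFully(new byte[xlen]);
        }
        if ((flags & FNAME) == 0) {
            return null;
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) { // the name ends with a zero byte
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Files written by java.util.zip.GZIPOutputStream do not carry a name in the header, so for those the method returns null and you are back to stripping the .gz suffix.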
