How to decompress pkAES-256 Deflate encrypted zip files? - java

I need to unzip zip files with Java which are compressed and password secured with the following information:
Method: pkAES-256 Deflate
Characteristics: 0xD StrongCrypto : Encrypt StrongCrypto
I tried to use zip4j but it always gives me this stacktrace:
net.lingala.zip4j.exception.ZipException: java.io.IOException: java.util.zip.DataFormatException: invalid code lengths set
at net.lingala.zip4j.tasks.AsyncZipTask.performTaskWithErrorHandling(AsyncZipTask.java:51)
at net.lingala.zip4j.tasks.AsyncZipTask.execute(AsyncZipTask.java:38)
at net.lingala.zip4j.ZipFile.extractFile(ZipFile.java:494)
at net.lingala.zip4j.ZipFile.extractFile(ZipFile.java:460)
at Main.main(Main.java:29)
Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid code lengths set
at net.lingala.zip4j.io.inputstream.InflaterInputStream.read(InflaterInputStream.java:55)
at net.lingala.zip4j.io.inputstream.ZipInputStream.read(ZipInputStream.java:141)
at net.lingala.zip4j.io.inputstream.ZipInputStream.read(ZipInputStream.java:121)
at net.lingala.zip4j.tasks.AbstractExtractFileTask.unzipFile(AbstractExtractFileTask.java:82)
at net.lingala.zip4j.tasks.AbstractExtractFileTask.extractFile(AbstractExtractFileTask.java:64)
at net.lingala.zip4j.tasks.ExtractFileTask.executeTask(ExtractFileTask.java:39)
at net.lingala.zip4j.tasks.ExtractFileTask.executeTask(ExtractFileTask.java:21)
at net.lingala.zip4j.tasks.AsyncZipTask.performTaskWithErrorHandling(AsyncZipTask.java:44)
... 4 more
Caused by: java.util.zip.DataFormatException: invalid code lengths set
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
at net.lingala.zip4j.io.inputstream.InflaterInputStream.read(InflaterInputStream.java:45)
... 11 more
Does anybody know how to deal with such an encryption? I can only open these files with 7zip - but I need to do that with Java.
Thank you for your help.

The ZIP file format, or at least the version that is universally understood and supported by tons of libraries, only supports one kind of encryption. It is called 'ZipCrypto', and it is of dubious quality: it's not completely broken, but it's rather easy to end up in a scenario where someone who shouldn't be able to read that zip file will figure it out. It is, for example, quite easy to try tons of passwords, so if the password is a simple dictionary word, it's mostly useless. This is the crypto you get when you run zip -e on the command line for just about every distribution of the 'zip' executable.
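WinZip and PKWare each added, all on their own, an AES-based extension to the ZIP format. 'StrongCrypto' with an AES-256 method is one of those extensions (the 'pk' prefix in the method name points at the PKWare flavour). It sounds like you have that.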
zip is more or less public domain. It's tricky: PKWare as a company more or less owns various parts of it, but nevertheless, e.g. the /bin/unzip command in your linux distro is fully open source, and legally the fate of zip is somewhat tricky to explain. So when WinZip, on its own, just added features to the zip concept, that was quite idiotic: neither the open source community at large, nor PKWare, would agree to this random flyby upgrade. For a long while, these 'WinZip based strongcrypto zip files that end in .zip' just weren't zip files, and if that's confusing, the blame falls entirely on WinZip, Inc.'s shoulders. What you have just isn't a zip file, even if it looks like one.
However, since then, WinZip and PKWare have reached an agreement and they can decrypt each other's stronger crypto offerings. The open source community, though, has mostly washed its hands of this and doesn't consider these strongcrypto options as 'zip files'. That explains why the library you have cannot decrypt this file, and probably never will.
Thus, because of this mess entirely due to PKWare and WinZip's shenanigans: if you want to encrypt a zip file, I STRONGLY suggest you don't use zip's built in stuff (neither ZipCrypto, which is bad, nor StrongCrypto, which is badly supported). Instead, just zip as normal with no encryption, and then encrypt the resulting file (and don't name that file foo.zip, as it is no longer a zip file; foo.zip.enc would be a better name).
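To make the "zip first, encrypt afterwards" suggestion concrete, here is a minimal sketch using only the JDK's javax.crypto classes. Everything in it is an assumption on my part rather than something from the question: the class name FileEncryptor, the AES-GCM mode, the PBKDF2 parameters, and the salt + IV + ciphertext layout are just one reasonable choice. It also reads the whole file into memory, so it is only suitable for files that fit in the heap.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

public class FileEncryptor {
    // Illustrative parameters, not taken from any standard container format.
    private static final int SALT_LEN = 16;
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;
    private static final int ITERATIONS = 100_000;

    // Derives a 256-bit AES key from the password with PBKDF2.
    private static SecretKeySpec deriveKey(char[] password, byte[] salt)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 256);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    // Output layout: salt (16 bytes) | IV (12 bytes) | ciphertext with GCM tag.
    static byte[] encrypt(byte[] plain, char[] password) throws GeneralSecurityException {
        SecureRandom random = new SecureRandom();
        byte[] salt = new byte[SALT_LEN];
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(salt);
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plain);
        return ByteBuffer.allocate(SALT_LEN + IV_LEN + ciphertext.length)
                .put(salt).put(iv).put(ciphertext).array();
    }

    // Reverses encrypt(); throws if the password is wrong or the data was modified.
    static byte[] decrypt(byte[] blob, char[] password) throws GeneralSecurityException {
        byte[] salt = Arrays.copyOfRange(blob, 0, SALT_LEN);
        byte[] iv = Arrays.copyOfRange(blob, SALT_LEN, SALT_LEN + IV_LEN);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt),
                new GCMParameterSpec(TAG_BITS, iv));
        int offset = SALT_LEN + IV_LEN;
        return cipher.doFinal(blob, offset, blob.length - offset);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java FileEncryptor foo.zip password  ->  writes foo.zip.enc
        byte[] zipBytes = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[0] + ".enc"), encrypt(zipBytes, args[1].toCharArray()));
    }
}
```

Because GCM is authenticated, decrypting with the wrong password fails with an exception instead of handing garbage to the unzip step.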
If you're stuck on this, and there is no possibility to change the format of the file being sent, you need 7zip. 7zip is open source and can probably decrypt this file, whereas most open source 'zip' libraries can't. A big problem is that there is no all-java 7zip impl that I'm aware of. There is the 7-Zip-JBinding project, which just farms out the work to a native library, which means you need a so-called 'native' file with your java project (a DLL on windows, a .so file on linux, and a .jnilib file on mac), and you need one such file for every architecture/OS combo you want to support. Kinda painful, and it ruins the 'write once run anywhere' promise of java, but it's what you'd have to do. The site looks like it's old enough to order beers, but as far as I know it is being maintained, so there's that. But, seriously, don't use zip's built in encryption stuff, it sucks. Try to avoid it.
NB: The reason 7zip can do it is a difference of opinion: the open source communities supporting plain zip endeavour to keep it simple to ensure as many platforms as possible can do it, which is probably why there are various all-java zip impls around. 7zip tries to go for awesome support, at the cost of making it a lot harder to port 7zip around, which is probably why there isn't an all-java 7zip impl, only a binding. As a consequence, 7zip is willing to try to figure out how to decrypt this strongcrypto stuff, plain zip isn't.

Related

Get RandomAccessFile from JAR archive

Summary:
I have a program I want to ship as a single jar file.
It depends on three big resource files (700MB each) in a binary format. The file content can easily be accessed via indexing, my parser therefore reads these files as RandomAccessFile-objects.
So my goal is to access resource files from a jar via File objects.
My problem:
When accessing the resource files from my file system, there is no issue, but I aim to pack them into the jar file of the program, so the user does not need to handle these files themselves.
The only way I found so far to access a file packed in a jar is via InputStream (generated by class.getResourceAsStream()), which is totally useless for my application as it would be much too slow reading these files from start to end instead of using RandomAccessFile.
Copying the file content into a file, reading it and deleting it at runtime is no option either, for the same reason.
Can someone confirm that there is no way to achieve my goal or provide a solution (or a hint so I can work it out myself)?
What I found so far:
I found this answer and if I understand the answer it says that there is no way to solve my problem:
Resources in a .jar file are not files in the sense that the OS can access them directly via normal file access APIs.
And since java.io.File represents exactly that kind of file (i.e. a thing that looks like a file to the OS), it can't be used to refer to anything in a .jar file.
A possible workaround is to extract the resource to a temporary file and refer to that with a File.
I think I can follow the reasoning behind it, but it is over eight years old now and while I am not very educated when it comes to file systems and archives, I know that the Java language has evolved quite a lot since then, so maybe there is hope? :)
Probably useless background information:
The files are genomes in the 2bit format and I use the TwoBitParser from biojava via the wrapper class TwoBitFacade. The Javadocs can be found here and here.
Resources are not files, and they live in a JAR file, which is not a random access medium.
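For reference, the temporary-file workaround quoted in the question can be sketched as below. This is only an illustration under assumptions: the class name ResourceExtractor and the resource name /genome.2bit are made up, and the full copy it performs is exactly the cost the question wants to avoid, so it only pays off if the extracted file is reused for many random accesses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceExtractor {
    /** Copies the whole stream into a temporary file and returns its path. */
    static Path extractToTempFile(InputStream resource) throws IOException {
        Path temp = Files.createTempFile("resource-", ".bin");
        temp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        Files.copy(resource, temp, StandardCopyOption.REPLACE_EXISTING);
        return temp;
    }

    public static void main(String[] args) throws IOException {
        // "/genome.2bit" is a hypothetical resource packed into the jar.
        try (InputStream in = ResourceExtractor.class.getResourceAsStream("/genome.2bit")) {
            if (in == null) {
                System.err.println("resource not found on the classpath");
                return;
            }
            Path temp = extractToTempFile(in);
            // From here on the data is a real file, so random access works.
            try (RandomAccessFile raf = new RandomAccessFile(temp.toFile(), "r")) {
                raf.seek(raf.length() / 2);
                System.out.println("byte at midpoint: " + raf.read());
            }
        }
    }
}
```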

How to mark file as already encrypted

I am trying to implement a simple encryption utility for educational purposes. It works, at least with simple files, but once I have successfully encrypted a file, I'd like not to encrypt it again, because that could lead to losing my data if I encrypt/decrypt it in the wrong way... Is there a way to prevent me from doing that?
I am using Java, and the default encryption library.
Thanks in advance
The answer to what you want to know here depends very much on how you're encrypting the files in the first place.
I'll list a couple of different approaches that might help you, however.
Approach 1 - Scripting
If you're using a 3rd party tool such as an encryption util written by another Java programmer, and if you're running this tool in some kind of a shell session, your best bet might be to wrap the invocation of said tool in a shell script.
If you're running on Windows this could be a batch file, on Linux a bash script.
Essentially you use this approach by working out ahead of time what command you need to use, then putting that command into said shell script while substituting any parameters you need to change.
Following on from the wrapped command you could then provide further commands to rename the file, or embed some kind of information in its file properties or file name. A possible example might be something like:
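REM %1 is the file passed to the script; skip it if an encrypted copy exists
IF NOT EXIST "%~1.encrypted" (
    encrypt "%~1" -a -b -c -d
    rename "%~1" "%~nx1.encrypted"
) ELSE (
    echo %~1 is already encrypted
)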
NOTE: These are just theoretical examples as I don't know what your OS is
If this was saved in a file called 'myencrypt.bat', then you could just type
myencrypt.bat afile.ext
Approach 2 - Custom Bytes
If you have direct control of the source code and consequently the application that performs this encryption, then why not make a pseudo file format?
Add some kind of a marker into the file that your program then checks for.
By way of an example, you could perhaps add the following string to the front of the file:
ENCFOriginalFile.Ext
Turn that into a set of bytes, then load the file in, encrypt it, prepend the bytes from the text and save it back out, maybe with a custom file extension.
When you come to encrypt a file again, all you then need to do is read the first 4 bytes, and if they are equal to ENCF you know that the file is already encrypted.
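A rough Java sketch of that marker check follows. The class and method names are made up for illustration, and it only handles the 4-byte ENCF prefix (storing the original file name after it is left out). Keep in mind that an unencrypted file which happens to start with the bytes ENCF would be misdetected, so a longer or more unusual marker lowers that risk. InputStream.readNBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class EncryptedMarker {
    // The 4-byte marker from the answer above.
    static final byte[] MAGIC = "ENCF".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file starts with the ENCF marker. */
    static boolean isAlreadyEncrypted(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int read = in.readNBytes(head, 0, head.length); // may be short for tiny files
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        }
    }

    /** Writes the marker followed by the already-encrypted bytes. */
    static void writeWithMarker(Path target, byte[] ciphertext) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(MAGIC);
            out.write(ciphertext);
        }
    }
}
```

The encryption utility would call isAlreadyEncrypted before doing any work and refuse (or ask for confirmation) when it returns true.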
Those are just 2 ideas I can think of off the top of my head, but it's late here and I'm tired. If I was more awake I could probably come up with a whole page full.
Since it is encrypted it cannot be opened in the default program for that file type, so you can safely rename the file. This can be done for example by adding .enc as the extension. Doing so will make it easy for you and your Java application to spot the encrypted file.
Depending on your use case you can also let your Java application manage a database of encrypted files.

Verifying integrity of documents

What are the steps to verify the integrity of these documents? doc, docx, docm, odt, rtf, pdf, odf, odp, xls, xlsx, xlsm, ppt, pptm
Or at least of some of them. Usually when uploaded to a content repository.
I guess that the inputStream is read properly from the MultiPart http request 99.99% of the time, otherwise an exception would be thrown and action taken. But a user can upload an already corrupted file. Do I use third party libraries for checking that? I didn't see anything like that in odftoolkit, itextpdf, pdfbox, apache poi or tika.
There are many kinds of "corrupt".
Some corruptions should be easy to detect. For instance a truncated ODF file will most likely fail when you attempt to open it because the ZIP reader can't read it.
Others will be literally impossible to detect. For instance a one character corruption in an RTF file will be undetectable, and so (I think) will most RTF file truncations.
I'd be surprised if you found a single (free) tool to do this job for all of those file types, even to the extent that it is technically possible. The current generation of open source libraries for reading / writing document formats tend to focus on one family of formats only. If you are serious about this, you probably need to use a commercial library.
For all of the file formats listed above there are 3rd-party libraries which can open them. I don't know of a "verification only" one, but I think being able to open a file without exceptions is at least a basic check that it is in the specified format... One such (commercial) library is Aspose - not affiliated, just a happy customer...
You can do checksums/hashes (that is, a secure hash) of the file before uploading, then upload the checksum separately. If the subsequently downloaded file has the same checksum, it has not been changed (to a certain high probability, depending on the checksum/hash used) from the original.
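As a sketch of the checksum idea, the following computes a SHA-256 digest of a stream with the JDK's MessageDigest. The class and method names are placeholders; the client would compute the digest before uploading, send it alongside the file, and the server would recompute it over the received bytes and compare. Note this only detects changes in transit or storage, not a file that was already corrupt before hashing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileChecksum {
    /** Reads the stream to the end and returns its SHA-256 digest as lowercase hex. */
    static String sha256Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```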
Check out the LibreOffice project (which already handles these formats). It has parts written in Java, and you could surely find and use their mechanisms to check for corrupted files.
I think you can get the code from here:
http://www.libreoffice.org/get-involved/developers/

Android: .java files readable from .apk file?

I'm currently developing an application for a company which includes livescoring. The XML-files I access (from the net like: "http://company.com/files/xml/livescoring.xml") are not intended to be public and should only be known to me.
I was wondering if it is possible for anyone to decode the .apk file and read my original .java files (which include the link to the XML files).
So, I renamed the .apk file to .zip and could access the "classes.dex", which seemed to include the .java files (or classes). Googling led me to a tool named "AvaBoxV2" which decoded this "classes.dex" file. Now I have a folder including an "out" folder where files named .smali exist. I opened one of these with an editor and finally there is the link to the xml file. Not good. :(
Is there a way to encrypt my app or the classes.dex file? I don't want to tell that company that anyone can access the original xml-files. Maybe signing the app helps?
Also, do you know a really noob-friendly tutorial to prepare apps (signing, versioning,...) for Google Market?
Thanks in advance!
The .java source code is not included in the APK.
It is possible to disassemble the Dalvik bytecode into bytecode mnemonics using a tool like baksmali, and decompilers can produce an approximation of the Java code, but there's no way a user can recover the original .java source exactly.
Furthermore, you can use a tool like proguard (included in the Android SDK) to obfuscate your byte code, making it hard to interpret the behavior of the disassembled bytecode.
You can make small tricks too, like storing the link string in some sort of obfuscated form, and then de-obfuscating it at run-time in your app (a simple example would be to use base 64 encoding, but someone could probably reverse that quickly if they wanted to).
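A tiny sketch of the base 64 trick, with made-up class and method names: you would run obfuscate once at development time, paste the result into the source as a constant, and call deobfuscate at run time. This uses java.util.Base64, which on Android is only available from API level 26; on older versions android.util.Base64 would be the equivalent. It hides the string from a casual look at the .smali files, nothing more.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlObfuscation {
    /** Run at development time to produce the string that gets stored in the app. */
    static String obfuscate(String plain) {
        return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    /** Run inside the app to get the real value back. */
    static String deobfuscate(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = obfuscate("http://company.com/files/xml/livescoring.xml");
        System.out.println(stored);              // what ends up in the bytecode
        System.out.println(deobfuscate(stored)); // what the app uses at run time
    }
}
```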
That said, it's pretty trivial for someone to run tcpdump and sniff the network traffic between your device and the server, and get the URL that way, so there's no way to completely prevent anyone from getting this value.
Yeah, it's impossible to fully prevent something like this. It's the same on a desktop application, or any other application.
As mentioned, obfuscation will help, but people who are persistent can still get past it, especially for something like a URL.
One way of making it much more tricky for hackers is to use PHP on your webserver and some sort of token system to determine if the request is coming from your app or not... That would get a bit tricky though, so I don't really suggest it.

Reading binary file without knowing file format

I'm working on a Java project and I have to read some files like these:
- EntryID.data
- EntryID.index
- KeyText.data
- KeyText.index
...
I think these files are used in a dictionary project but I can't find any documentation about them. How can I read them or find out their format? Sorry for my English =.=
Thanks a lot!
This looks like files from a database management system. One file to store the data, another one to store at least one index to speed up queries.
I'd start with a hex editor and look at the file. Sometimes the binary content gives a hint.
Another idea: look at the classpath and inspect property and resource files. Maybe you'll find a database driver or some config files with jdbc connect strings.
Google told me that all four files are used by Apple's Dictionary.app. Have a look at this blog, this can point you in the correct direction.
Last note - reading undocumented binaries is a challenge. I usually start with 010 Editor to analyse the data structure and develop a Java based test tool to read the data. It's some sort of trial and error evolutionary process.
Well, this is kinda difficult. data could mean anything.
You could try the UNIX utility file or open the file with a hex editor and look for interesting strings (the utility strings is helpful for that too).
Some information is in info.plist.
KeyText.data is sometimes compressed using zlib. 78 9C is a well-known zlib header, so you can decompress when you find it. The size of the decompressed entry comes before the compressed entry.
The size of each entry comes before that entry in the array.
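In Java, looking for that header and trying to decompress from there could be sketched like this. The class and method names are invented, and the sketch deliberately ignores the size fields mentioned above because their exact width and byte order are not given here; it simply scans for the bytes 78 9C and lets java.util.zip.Inflater read until the zlib stream ends. The byte pair can also occur by chance inside other data, in which case inflateAt will throw.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class ZlibProbe {
    /** Returns the index of the first 78 9C byte pair at or after 'from', or -1. */
    static int findZlibHeader(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 1 < data.length; i++) {
            if ((data[i] & 0xFF) == 0x78 && (data[i + 1] & 0xFF) == 0x9C) {
                return i;
            }
        }
        return -1;
    }

    /** Inflates the zlib stream starting at 'offset' and returns the decompressed bytes. */
    static byte[] inflateAt(byte[] data, int offset) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data, offset, data.length - offset);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress possible: the stream is cut off or needs a preset dictionary.
                    throw new DataFormatException("truncated or dictionary-dependent stream");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native zlib resources
        }
    }
}
```

After a successful inflateAt you could call findZlibHeader again with a later starting offset to walk through the remaining entries.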
A C# library is at https://github.com/kurema/MacDictionaryGeneral. But *.index is too difficult to understand and implement. info.plist says *.index is a trie index, which is not enough information to understand it fully.
