Reading binary file without knowing file format - java

I'm working on a Java project and I have to read some files like these:
- EntryID.data
- EntryID.index
- KeyText.data
- KeyText.index
...
I think these files are used in a dictionary project, but I can't find any documentation about them. How can I read them or figure out their format? Sorry for my English.
Thanks a lot!

These look like files from a database management system: one file stores the data, and another stores at least one index to speed up queries.
I'd start with a hex editor and look at the files. Sometimes the binary content gives a hint.
Another idea: look at the classpath and inspect property and resource files. Maybe you'll find a database driver or some config files with JDBC connection strings.
Google told me that all four files are used by Apple's Dictionary.app. Have a look at this blog; it can point you in the right direction.
Last note: reading undocumented binaries is a challenge. I usually start with 010 Editor to analyse the data structure and then develop a Java-based test tool to read the data. It's a sort of trial-and-error, evolutionary process.
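A Java test tool of that kind can start very small. The following is a minimal sketch (the class name HexDump and the 256-byte limit are arbitrary choices for illustration, not part of any existing tool) that prints the first bytes of a file as offset, hex and ASCII columns. That is often enough to spot a magic number, a length field or an embedded string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {
    /** Formats up to limit bytes as "offset  hex bytes  ascii" lines, 16 bytes per line. */
    public static String dump(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        int end = Math.min(limit, data.length);
        for (int offset = 0; offset < end; offset += 16) {
            sb.append(String.format("%08x  ", offset));
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                if (offset + i < end) {
                    int b = data[offset + i] & 0xFF;
                    sb.append(String.format("%02x ", b));
                    // Printable ASCII is shown as-is, everything else as '.'
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    sb.append("   "); // pad the last, partial line
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(dump(data, 256));
    }
}
```

Running it as java HexDump KeyText.data prints the first 256 bytes; raise the limit once you know where the interesting structures start.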

Well, this is kind of difficult. .data could mean anything.
You could try the UNIX utility file, or open the file with a hex editor and look for interesting strings (the strings utility is helpful for that too).
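If the Unix tools are not at hand, a rough Java equivalent of strings is easy to write. This is a minimal sketch (class and method names are hypothetical) that returns every run of at least minLength printable ASCII bytes. It only finds single-byte ASCII text; text stored as UTF-16 or inside compressed blocks will not show up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PrintableStrings {
    /** Returns runs of at least minLength printable ASCII bytes, similar in spirit to strings. */
    public static List<String> find(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte value : data) {
            int b = value & 0xFF;
            if (b >= 0x20 && b < 0x7F) {
                current.append((char) b);
            } else {
                if (current.length() >= minLength) runs.add(current.toString());
                current.setLength(0); // a non-printable byte ends the current run
            }
        }
        if (current.length() >= minLength) runs.add(current.toString());
        return runs;
    }

    public static void main(String[] args) throws IOException {
        for (String s : find(Files.readAllBytes(Paths.get(args[0])), 4)) {
            System.out.println(s);
        }
    }
}
```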

Some information is in info.plist.
KeyText.data is sometimes compressed using zlib. 78 9C is the well-known zlib header, so you can decompress an entry when you find it. The size of the decompressed entry comes before the compressed entry.
In arrays, the size of each entry comes before the entry itself.
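A minimal sketch of the "find 78 9C and decompress" idea using the JDK's java.util.zip.Inflater. It does not read the size prefix mentioned above, because Inflater detects the end of each zlib stream by itself; it assumes every compressed block is a standard zlib stream starting with 78 9C. Since those two bytes can also occur by chance, candidates that fail to inflate are skipped. The class name is hypothetical, and printing the result as UTF-8 is only a guess at the text encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.*;

public class ZlibScanner {
    /** Inflates every zlib stream that starts with the bytes 78 9C and decodes cleanly. */
    public static List<byte[]> extractZlibBlocks(byte[] data) {
        List<byte[]> blocks = new ArrayList<>();
        int i = 0;
        while (i < data.length - 1) {
            if ((data[i] & 0xFF) == 0x78 && (data[i + 1] & 0xFF) == 0x9C) {
                Inflater inflater = new Inflater();
                inflater.setInput(data, i, data.length - i);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                try {
                    while (!inflater.finished()) {
                        int n = inflater.inflate(buf);
                        out.write(buf, 0, n);
                        // No output and not finished: truncated data or a preset dictionary
                        if (n == 0 && !inflater.finished()) break;
                    }
                    if (inflater.finished()) {
                        blocks.add(out.toByteArray());
                        // Continue scanning right after the compressed bytes just consumed
                        i = data.length - inflater.getRemaining();
                        continue;
                    }
                } catch (DataFormatException e) {
                    // 78 9C occurred by chance; this is not a real zlib stream
                } finally {
                    inflater.end();
                }
            }
            i++;
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (byte[] block : extractZlibBlocks(data)) {
            System.out.println(block.length + " bytes: "
                    + new String(block, 0, Math.min(block.length, 80), StandardCharsets.UTF_8));
        }
    }
}
```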
A C# library is available at https://github.com/kurema/MacDictionaryGeneral. However, *.index is too difficult to understand and implement: info.plist says *.index is a trie index, which is not enough information to understand it fully.

Related

How to mark file as already encrypted

I am trying to implement a simple encryption utility for educational purposes. It works, at least with simple files, but once I have successfully encrypted a file, I'd like to avoid encrypting it again, because that could make me lose my data if I encrypt/decrypt it the wrong way. Is there a way to prevent that?
I am using Java and the default encryption library.
Thanks in advance
The answer to what you want to know here depends very much on how you're encrypting the files in the first place.
I'll list a couple of different approaches that might help, however.
Approach 1 - Scripting
If you're using a 3rd-party tool, such as an encryption utility written by another Java programmer, and you're running this tool in some kind of shell session, your best bet might be to wrap the invocation of that tool in a shell script.
If you're running on Windows this could be a batch file; on Linux, a bash script.
Essentially you use this approach by working out ahead of time what command you need to use, then putting that command into said shell script while substituting any parameters you need to change.
After the wrapped command you could then add further commands to rename the file, or embed some kind of information in its file properties or file name. A possible example (a Windows batch file) might be something like:
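@echo off
REM %1 is the file name passed on the command line (%0 would be the script itself)
IF NOT EXIST "%~1.encrypted" (
    call encrypt "%~1" -a -b -c -d
    REM rename takes only a new name, not a path, as its second argument
    rename "%~1" "%~nx1.encrypted"
) ELSE (
    echo %~1 has already been encrypted
)
NOTE: This is just a theoretical example, as I don't know what your OS is; encrypt and its -a -b -c -d switches stand in for whatever tool you actually use.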
If this was saved in a file called 'myencrypt.bat', then you could just type
myencrypt.bat afile.ext
Approach 2 - Custom Bytes
If you have direct control of the source code, and consequently of the application that performs the encryption, then why not create a pseudo file format?
Add some kind of a marker into the file that your program then checks for.
For example, you could add the following string to the front of the file:
ENCFOriginalFile.Ext
Turn that into a set of bytes, then load the file, encrypt it, prepend the marker bytes, and save it back out, perhaps with a custom file extension.
When you come to encrypt a file again, all you then need to do is read the first 4 bytes and if they are equal to ENCF you know that the file is already encrypted.
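A minimal Java sketch of the marker approach. The class and method names are hypothetical, and the encryption step itself is left out: writeEncrypted receives bytes you have already produced with javax.crypto. Unlike the plain ENCFOriginalFile.Ext string, this sketch stores the file name with a length prefix (DataOutputStream.writeUTF) so that a reader can tell where the name ends and the ciphertext begins. A plaintext file that happens to start with ENCF would be misdetected, so a longer or less common marker reduces that risk.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class EncryptedFileMarker {
    // 4-byte marker written at the very start of every encrypted file
    private static final byte[] MAGIC = "ENCF".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file begins with the ENCF marker. */
    public static boolean isAlreadyEncrypted(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(MAGIC.length); // Java 11+; shorter if the file is tiny
            return Arrays.equals(header, MAGIC);
        }
    }

    /** Writes the marker, the length-prefixed original file name, then the ciphertext. */
    public static void writeEncrypted(Path target, String originalName, byte[] ciphertext)
            throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            out.write(MAGIC);
            out.writeUTF(originalName); // 2-byte length + modified UTF-8
            out.write(ciphertext);
        }
    }

    /** Reads back the original file name stored right after the marker. */
    public static String readOriginalName(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            in.readFully(new byte[MAGIC.length]); // skip the marker
            return in.readUTF();
        }
    }
}
```

Before encrypting, call isAlreadyEncrypted and refuse to continue if it returns true.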
Those are just 2 ideas I can think of off the top of my head, but it's late here and I'm tired. If I was more awake I could probably come up with a whole page full.
Since an encrypted file cannot be opened in the default program for its file type anyway, you can safely rename it, for example by adding .enc as the extension. That makes encrypted files easy to spot, both for you and for your Java application.
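A small sketch of the rename idea with java.nio.file (class and method names are hypothetical). It appends .enc rather than replacing the existing extension, so decryption can restore the original name by stripping the suffix. The check trusts only the file name, so if files may be renamed by hand it is worth combining it with a marker inside the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncExtension {
    private static final String SUFFIX = ".enc";

    /** True if the file name already carries the .enc extension. */
    public static boolean looksEncrypted(Path file) {
        return file.getFileName().toString().endsWith(SUFFIX);
    }

    /** Renames e.g. report.pdf to report.pdf.enc in the same directory; refuses a second .enc. */
    public static Path markEncrypted(Path file) throws IOException {
        if (looksEncrypted(file)) {
            throw new IllegalStateException(file + " already looks encrypted");
        }
        return Files.move(file, file.resolveSibling(file.getFileName() + SUFFIX));
    }
}
```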
Depending on your use case, you can also let your Java application manage a database of encrypted files.

Embed datafile in java code

I have a dictionary file that is used for word matching. The Java code is to be submitted online and executed there (for an online coding competition).
How can I use the dictionary data file while my program executes online?
Could it be embedded in the source code as a compressed byte stream?
Please suggest.
There are multiple ways to achieve this:
Either refer to the dictionary file as a remote resource in your code. This means you host your dictionary file at a different online location that is well known to your application code. You can then download the dictionary file and cache it in memory for use.
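A minimal sketch of "download once, cache in memory", assuming a plain-text dictionary with one word per line in UTF-8 (class name and file format are assumptions). Many judges restrict network access, so check the competition rules before relying on this.

```java
import java.io.*;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class RemoteDictionary {
    // Loaded on first use, then reused for the rest of the run
    private static Set<String> cache;

    /** Downloads one word per line from source on the first call; later calls return the cache. */
    public static synchronized Set<String> get(URL source) throws IOException {
        if (cache == null) {
            Set<String> words = new HashSet<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(source.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String word = line.trim();
                    if (!word.isEmpty()) words.add(word);
                }
            }
            cache = Collections.unmodifiableSet(words);
        }
        return cache; // note: the source argument is ignored once the cache is filled
    }
}
```

Usage would look like RemoteDictionary.get(URI.create("https://example.com/dictionary.txt").toURL()), where the URL is a placeholder for wherever you host the file.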
Or you can encode the dictionary file (for instance in Base64, to take care of special characters in the dictionary file) as a predefined data structure / buffer in your code. However, this means you need to convert your dictionary file and rebuild your application each time you change the dictionary file.
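A sketch of the embedding option, combining GZIP (to keep the constant small) with Base64 (so it fits in a string literal). The class name is hypothetical. encode is a build-time helper you would run once on the dictionary, pasting its output into a constant in the submitted source; decode runs at startup. A single string literal is limited to 65535 bytes in the class file, so a large dictionary would have to be split across several constants.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

public class EmbeddedDictionary {
    /** Build-time helper: gzip the dictionary text and return it as Base64. */
    public static String encode(String dictionaryText) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(dictionaryText.getBytes(StandardCharsets.UTF_8));
        } // closing the stream flushes the gzip trailer
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Runtime: turn the embedded Base64 constant back into a set of words (one per line). */
    public static Set<String> decode(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) words.add(line);
            }
        }
        return words;
    }
}
```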
Pointing to a different "online" location would seem to be the more suitable solution.

Reading lyrics information from a .kar file in java

I want to develop a karaoke player in Java that works with kar files. I got it to play the song, but I couldn't make it read the lyric information from the file. I've searched a lot, but I couldn't find any clue about how kar files work.
How can I do it? I'd appreciate an example.
Thanks.
I don't think there is a Java library for that, so you'll have to write the code yourself.
First, you need to know the inner structure of a .kar file. Since these files are so small, they're probably some kind of text files. Try opening one of them with your text editor and see what it looks like. Then you'll know how to process it.
Apologies, as I don't have a definitive answer for you. I have seen several discussions about this but no really solid solutions. Here are a couple of links to others' discussions on the topic. It seems to be a much more complicated task than one might imagine, as it is an uncommonly used file type. From what I know, it consists of MIDI data, albeit with slightly different metadata.
Reading lyrics information from a .kar file
How to read MIDI file in C#?
Can you open the .kar file in a text editor? What does it give you?
The best I could find is this:
"KAR"
Origin: the company Tune 1000. A kar file (.kar) is in fact a MIDI file whose words are stored in meta events of type TEXT. Texts starting with # are additional indications, separate from the words. Example:
#L specifies the language of the words
#I any additional information
#T title information
#KMIDI KARAOKE SPINS copyright and file-type information
Several lines of titles and information can be present. KaraWin extracts the title information to display it. The text itself has a very simple format: \ indicates a page break, / indicates a line break.
Source.
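The JDK's own javax.sound.midi package can already parse the MIDI structure, so only the lyric conventions described above need custom code. A minimal sketch, with these assumptions: lyrics are in TEXT meta events (type 0x01) in a single track, encoded as ISO-8859-1; tag lines start with # as in the quote, or with @, which many .kar files use instead; \ and / at the start of a syllable mark a page break and a line break. The class name is hypothetical.

```java
import javax.sound.midi.*;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class KarLyrics {
    // Meta-event type 0x01 is a MIDI "text event"
    private static final int TEXT_EVENT = 0x01;

    /** Concatenates the lyric syllables found in TEXT meta events of a .kar file. */
    public static String readLyrics(File karFile) throws InvalidMidiDataException, IOException {
        Sequence sequence = MidiSystem.getSequence(karFile);
        StringBuilder lyrics = new StringBuilder();
        for (Track track : sequence.getTracks()) {
            for (int i = 0; i < track.size(); i++) {
                MidiMessage message = track.get(i).getMessage();
                if (!(message instanceof MetaMessage)) continue;
                MetaMessage meta = (MetaMessage) message;
                if (meta.getType() != TEXT_EVENT) continue;
                String text = new String(meta.getData(), StandardCharsets.ISO_8859_1);
                // Skip header tags (title, language, info, ...)
                if (text.startsWith("#") || text.startsWith("@")) continue;
                if (text.startsWith("\\")) {         // page break
                    lyrics.append("\n\n").append(text.substring(1));
                } else if (text.startsWith("/")) {   // line break
                    lyrics.append('\n').append(text.substring(1));
                } else {
                    lyrics.append(text);
                }
            }
        }
        return lyrics.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readLyrics(new File(args[0])));
    }
}
```

A real player would keep each MidiEvent's tick alongside its syllable, so it can highlight the words in sync with the Sequencer's playback position.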
Besides that, according to Wikipedia, XBMC is an open source program that supports kar files. Since it's an open source project, you can download its code. If you are really interested, you can search through its 10k+ files (in C, not Java) to see how they do it. You could also ask them, or on their forum, for a little guidance.
You can also try this Yahoo group about karaoke software.

How to write/change a visio file with java (jacob)

I wasn't able to find anything useful about how to write a Visio file with Java, and I've probably looked through every article on the web.
I found a lot about doing it with C#, VB, and other languages, but couldn't find what I need.
I also found "How to generate MS Visio diagram automatically?", but I am not sure whether that (reverse engineering) could do what I need, because I really want to create a vsd file with my Java application, write a lot of data into it (from an XML file, for example) and open it afterwards with Visio to check and correct that data.
It took me a while to find this:
http://www.java2s.com/Open-Source/Java/Development/jacob-1.15/com/jacob/samples/visio/Catalogvisio.htm
which at least allowed me to open a Visio file.
But I still need to know how to write into that file and save it afterwards.
So, in short, my current attempt is to use JACOB, but I don't know how to do things like "add a block at position x,y" with that COM bridge. Is there documentation for that?
Any help would be appreciated.
Thanks

Massive multiprogramming and read-only file access

I am trying to create a dictionary-based tagger running on a Hadoop cluster using Pig. Basically, for each document (fairly large text documents, up to a few MBs), it looks up each word of each sentence in the dictionary to read the corresponding value.
There will be up to a few hundred Java programs (not threads) running in parallel, using the dictionary file in read-only mode. The idea is to load the dictionary from text and create a Map to query against it.
Question: what should I be prepared for? Is it even remotely logical to read one file from a multiprogramming environment, or should I first copy the (relatively small) file for each instance of the program? Should I use a BufferedReader while reading the file?
There is very little structured documentation on multiprogramming (compared to multithreading) so I am a bit afraid of running against a wall by doing so.
Note: you are only allowed to answer that my way of thinking is totally wrong if you provide me with a better way ;-)
I think your approach is fine. You should load your dictionary from the DistributedCache into memory and do the checks against the in-memory dictionary (e.g., a HashMap).
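A minimal sketch of the load-then-look-up part. Many processes reading the same file read-only at the same time is not a problem, and a BufferedReader is a reasonable way to read a line-based file; keep in mind, though, that each of the few hundred JVMs holds its own copy of the map in memory. Assumptions: one word<TAB>value pair per line, UTF-8, whitespace tokenization; how the file reaches the task's local directory through the DistributedCache depends on your Hadoop/Pig version, so the Path is left to the caller. Class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class DictionaryTagger {
    /** Loads "word<TAB>value" lines into a HashMap; lines without a tab are ignored. */
    public static Map<String, String> load(Path dictionary) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(dictionary, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0) map.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return map;
    }

    /** Looks up each whitespace-separated token; words not in the dictionary are skipped. */
    public static List<String> tag(String sentence, Map<String, String> dictionary) {
        List<String> tags = new ArrayList<>();
        for (String token : sentence.split("\\s+")) {
            String value = dictionary.get(token);
            if (value != null) tags.add(value);
        }
        return tags;
    }
}
```

Load the map once per task (for example in the UDF's setup or constructor) rather than once per document, so the file is read only once per JVM.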
