I am storing large amounts of information inside of text files that are written via java. I have two questions relating to this:
Is there any efficiency boost to writing in binary or bytecode over Strings?
What would I use to write the data type into a file.
I already have a setup based around Strings, but I want to compare and at least know how to write the file in bytecode or binary.
When I read in the file, it will be translated into Strings again, but according to my reasearch if I write the file straight into bytecode it removes the added process on both ends of translating between Strings and code both for writing the file and for reading it.
cHao has a good point about just using Strings anyway, but I am still interested in the how if how to write varied data types in the file.
In other words, can I still use the FileReader and BufferedReader to read and translate back to Strings, or is there another thing to use. Also using a BinaryWriter, is it still just the FileWriter class that I use???
If you want to write it in "binary", and you want to save space, why not just zip it using the jdk? Meets all your requirements.
Related
How to change the part of content of the file, starting at specific character, without reading and writing whole file?
Use java.io.RandomAccessFile class. You can seek() to an arbitrary position in the file and then read or write from/to there. Try looking at writeUTF(String) for writing text, and getFilePointer() for remembering position in the file. Unfortunately, there is no easy way to "insert" text as you would do it in an editor, instead the contents are always "overwritten".
Also, FileWriter and FileOutputStream support append-mode, which you can use for appending extra data to the end of the file without rewriting it. But if you need to change things in the middle, you have to use random access file.
check out the scanner class
it makes it easier to read and parse strings and primitive types using regular expressions.
I've to make a code to upload/download a file on remote machine. But when i upload the file new line is not saved as well as it automatically inserts some binary characters. Also I'm not able to save the file in its actual format, I've to save it as "filename.ser". I'm using serialization-deserialization concept of java.
Thanks in advance.
How exactly are you transmitting the files? If you're using implementations of InputStream and OutputStream, they work on a byte-by-byte level so you should end up with a binary-equal output.
If you're using implementations of Reader and Writer, they convert the bytes to characters according to some character mapping, and then perform the reverse process when saving. Depending on the platform encodings of the various machines (and possibly other effects if you're not specifying the charset explicitly), you could well end up with differences in the binary file.
The fact that you mention newlines makes me think that you're using Readers to send strings (and possibly that you're stitching the strings back together yourself by manually adding newlines). If you want the files to be binary equal, then send them as a stream of bytes and store that stream verbatim. If you want them to be equal as strings in a given character set, then use Readers and Writers but specify the character set explicitly. If you want them to be transmitted as strings in the platform default set (not very useful), then accept that they're not going to be binary equal as files.
(Also, your question really doesn't provide much information to solve it. To me, it basically reads "I wrote some code to do X, and it doesn't work. Where did I go wrong?" You seem to assume that your code is correct by not listing it, but at the same time recognise that it's not...)
When reading zipfiles (using Java ZipInputStream or any other library) from an unknown source is there any way of detecting which entries are "character data" (and if so the encoding) or "binary data". And, if binary, any way of determining any more information (MIME types, etc.)
EDIT does the ByteOrderMark (BOM) occur in zipentries and if so do we have to make special operations for it.
It basically boils down to heuristics for determining the contents of files. For instance, for text files (ASCII) it should be possible to make a fairly good guess by checking the range of byte values used in the file -- although this will never be completely fool-proof.
You should try to limit the classes of file types you want to identify, e.g. is it enough to discern between "text data" and "binary data" ? If so you should be able to get a fairly high success rate for detection.
For UNIX systems, there is always the file command which tries to identify file types based on (mostly) content.
Maybe implement a Java component that is capable of applying the rules defined in /usr/share/file/magic. I would love to have something like that. (You would basically have to be able to look at the first x couple of bytes.)
I'm writing arbitrary byte arrays (mock virus signatures of 32 bytes) into arbitrary files, and I need code to overwrite a specific file given an offset into the file. My specific question is: is there source code/libraries that I can use to perform this particular task?
I've had this problem with Python file manipulation as well. I'm looking for a set of functions that can kill a line, cut/copy/paste, etc. My assumptions are that these are extremely common tasks, and I couldn't find it in the Java API nor my google searches.
Sorry for not RTFM well; I haven't come across any information, and I've been looking for a while now.
Maybe you are looking for something like the RandomAccessFile class in the standard Java JDK. It supports reads and writes at some offset, as well as byte arrays.
Java's RandomAccessFile is exactly what you want.
It includes methods like seek(long) that allow you to move wherever you need in the file. It also allows for reading and writing at the same time.
As far as I know, Java has primarily lower level functions for manipulating files directly. Here is the best I've come up with
The actions you describe are standard in the Swing world, and for text comes down to manipulating a Document object. These act on data in memory. The class java.nio.channels.FileChannel has similar methods that act directly on a file. Neither fine the end of lines automatically, but other classes in java.io and java.nio do.
Apache Commons has a sandbox library called Flatfile which looks like it does what you want. The problem is that no code has been released yet. You may, however, want to talk to people working on it to get some more ideas. I didn't do a general check on libraries.
Have you looked into File/FileReader/FileWriter/BufferedReader? You can get the contents of the files and manipulate it as you like, you can search the data in the files, you can overwrite files, create new, append to an existing....
I am not sure this is exactly what you are asking for but I use these APIs all the time for logging, RTF editors, text file creation for email, and many other things.
As far as cut/copy/past goes, I have not come across the ability to do that directly, however, you can output the contents of the file and "copy" what part of it you want and "paste" it into a new file, or append it to an existing.
While writing a byte array to a file is a common task, writing to a give file 32-bytes byte array just once is just not something you are going to find in java.io :)
To get started, would the below method and comments look reasonable to you? I bet someone here, maybe even myself, could whip it out quick like.
public static void writeFauxVirusSignature(File file, byte[] bytes, long offset) {
//open file
//move to offset
//write bytes
//close file
}
Questions:
How big could the potential target files be?
Do you need performance?
I ask because clean, easy to read code would use Apache Commons lib's, but large file writes in a performance sensitive environment will necessitate using java.nio libraries
Has anybody written any classes for reading and writing Palm Database (PDB) files in Java? (I mean on a server, not on the Palm device itself.) I tried to google, but all I got were Protein Data Bank references.
I wrote a Perl program that does it using Palm::PDB.pm, but I want to turn it into a servlet for a GWT app.
The jSyncManager project at http://www.jsyncmanager.org/ is under the LGPL and includes classes to read and write PDB files -- look in jSyncManager/API/Protocol/Util/DLPDatabase.java in its source code. It looks like the core code you need from this could be isolated from the rest of the library with a little effort.
There are a few ways that you can go about this;
Easiest but slowest: Find a perl-> java bridge. This will not be quick, but it will work and it should involve the least amount of work.
Find a C++/C# implementation that you have the source to and convert it (this should be the fastest solution)
Find a Java reader ... there seems to be a few listed under google... however I do not have any experience with these.
Depending on what your intended usage is, you might look into writing a simple reader yourself. The format is pretty simple and you only need to handle a couple of simple fields to parse it.
Basically there is a header for the entire file which has a 2 byte integer at the end which specifies the number of record. So just skip your way through the bytes for all the other fields in the header and then read the last field which is the number of records in the file. Be aware that the PDB format writes integers with most significant byte first.
Following this, there will be a record header for each record, the first field of which is the actual offset into the file for the record itself. Again, be aware of the byte order.
So, now you have the offsets into the file for each record in the file, which should make it very easy to read the actual records as long as you know the format of these for the type of PDB file you are trying to read.
Wikipedia has a nice overview of the header formats.
Maybe JPilot can help? They must have a lot of Java code dealing with Palm OS data.