I'm writing arbitrary byte arrays (mock virus signatures of 32 bytes) into arbitrary files, and I need code to overwrite a specific file given an offset into the file. My specific question is: is there source code/libraries that I can use to perform this particular task?
I've had this problem with Python file manipulation as well. I'm looking for a set of functions that can kill a line, cut/copy/paste, etc. My assumptions are that these are extremely common tasks, and I couldn't find it in the Java API nor my google searches.
Sorry for not RTFM well; I haven't come across any information, and I've been looking for a while now.
Maybe you are looking for something like the RandomAccessFile class in the standard JDK. It supports reading and writing at a given offset, including byte arrays.
Java's RandomAccessFile is exactly what you want.
It includes methods like seek(long) that allow you to move wherever you need in the file. It also allows for reading and writing at the same time.
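For example, a minimal sketch (the file name and offset here are made up):

import java.io.IOException;
import java.io.RandomAccessFile;

public class SignatureWriter {
    public static void main(String[] args) throws IOException {
        byte[] signature = new byte[32]; // your 32-byte mock signature
        // "rw" opens the file for both reading and writing
        try (RandomAccessFile raf = new RandomAccessFile("target.bin", "rw")) {
            raf.seek(1024);       // jump to the desired offset
            raf.write(signature); // overwrite 32 bytes in place
        }
    }
}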
As far as I know, Java mostly has lower-level functions for manipulating files directly. Here is the best I've come up with:
The actions you describe are standard in the Swing world, and for text they come down to manipulating a Document object. These act on data in memory. The class java.nio.channels.FileChannel has similar methods that act directly on a file. Neither finds the ends of lines automatically, but other classes in java.io and java.nio do.
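For instance, a rough sketch of a positioned write with FileChannel (the path and offset are illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelOverwrite {
    public static void main(String[] args) throws IOException {
        byte[] signature = new byte[32];
        try (FileChannel channel = new RandomAccessFile("target.bin", "rw").getChannel()) {
            // write(ByteBuffer, long) writes at the given position
            // without moving the channel's own position
            channel.write(ByteBuffer.wrap(signature), 1024L);
        }
    }
}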
Apache Commons has a sandbox library called Flatfile which looks like it does what you want. The problem is that no code has been released yet. You may, however, want to talk to people working on it to get some more ideas. I didn't do a general check on libraries.
Have you looked into File/FileReader/FileWriter/BufferedReader? You can get the contents of the files and manipulate them as you like; you can search the data in the files, overwrite files, create new ones, or append to existing ones.
I am not sure this is exactly what you are asking for but I use these APIs all the time for logging, RTF editors, text file creation for email, and many other things.
As far as cut/copy/paste goes, I have not come across the ability to do that directly; however, you can output the contents of the file, "copy" whatever part of it you want, and "paste" it into a new file, or append it to an existing one.
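For example, a quick sketch of that copy-and-append idea (the file names and line range are arbitrary):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class LineCopier {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("source.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("dest.txt", true))) { // true = append
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (lineNo >= 10 && lineNo <= 20) { // "copy" lines 10-20
                    out.println(line);              // "paste" them into the other file
                }
            }
        }
    }
}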
While writing a byte array to a file is a common task, writing a 32-byte array at a given offset into a given file is just not something you are going to find ready-made in java.io :)
To get started, would the method and comments below look reasonable to you? I bet someone here, maybe even myself, could whip it up quickly.
public static void writeFauxVirusSignature(File file, byte[] bytes, long offset) throws IOException {
    // open file for reading and writing
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        // move to offset
        raf.seek(offset);
        // write bytes
        raf.write(bytes);
    } // try-with-resources closes the file
}
Questions:
How big could the potential target files be?
Do you need performance?
I ask because clean, easy-to-read code would use the Apache Commons libraries, but large file writes in a performance-sensitive environment will necessitate using the java.nio libraries.
Related
I need an XML parser to parse a file that is approximately 1.8 GB.
So the parser should not load the whole file into memory.
Any suggestions?
Aside from the recommended SAX parsing, you could use the StAX API (a kind of SAX evolution), which is included in the JDK (package javax.xml.stream).
StAX Project Home: http://stax.codehaus.org/Home
Brief introduction: http://www.xml.com/pub/a/2003/09/17/stax.html
Javadoc: https://docs.oracle.com/javase/8/docs/api/javax/xml/stream/package-summary.html
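A minimal read loop looks something like this (the element name "record" is hypothetical):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new FileInputStream("huge.xml"));
        while (reader.hasNext()) {
            // next() advances the cursor; only the current event is held in memory
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "record".equals(reader.getLocalName())) {
                // process one record at a time here
            }
        }
        reader.close();
    }
}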
Use a SAX based parser that presents you with the contents of the document in a stream of events.
The StAX API is easier to deal with than SAX. Here is a short tutorial.
Try VTD-XML. I've found it to be more performant, and more importantly, easier to use than SAX.
As others have said, use a SAX parser, as it is a streaming parser. Using the various events, you extract your information as necessary and then, on the fly store it someplace else (database, another file, what have you).
You can even store it in memory if you truly just need a minor subset, or if you're simply summarizing the file. Depends on the use case of course.
If you're spooling to a DB, make sure you take some care to make your process restartable or whatever. A lot can happen in 1.8GB that can fail in the middle.
Stream the file into a SAX parser and read it into memory in chunks.
SAX gives you a lot of control, and being event-driven makes sense. The API is a little hard to get a grip on; you have to pay attention to some things, like when the characters() method is called. But the basic idea is that you write a content handler that gets called when the start and end of each XML element is read. So you can keep track of the current xpath in the document, identify which paths have the data you're interested in, and identify which path marks the end of a chunk that you want to save, hand off, or otherwise process.
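As a sketch, such a content handler looks roughly like this (the element name "record" is made up):

import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RecordHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // reset the buffer at the start of each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element!
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName)) {
            // the full text of one <record> is available here; hand it off, then forget it
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new java.io.File("huge.xml"), new RecordHandler());
    }
}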
Use almost any SAX Parser to stream the file a bit at a time.
I had a similar problem - I had to read a whole XML file and create a data structure in memory. On this data structure (the whole thing had to be loaded) I had to do various operations. A lot of the XML elements contained text (which I had to output in my output file, but wasn't important for the algorithm).
Firstly, as suggested here, I used SAX to parse the file and build up my data structure. My file was 4GB and I had an 8GB machine, so I figured maybe 3GB of the file was just text, and java.lang.String would probably need 6GB for that text, given its UTF-16 representation.
If the JVM takes up more space than the computer has physical RAM, then the machine will swap. Doing a mark+sweep garbage collection will result in the pages getting accessed in a random-order manner and also objects getting moved from one object pool to another, which basically kills the machine.
So I decided to write all my strings out to disk in a file (the FS can obviously handle sequential writes of 3GB just fine, and when reading it back the OS will use available memory as a file-system cache; there might still be random-access reads, but fewer than with a GC in Java). I created a little helper class which you are more than welcome to download if it helps you: StringsFile javadoc | Download ZIP.
StringsFile file = new StringsFile();
StringInFile str = file.newString("abc"); // writes string to file
System.out.println("str is: " + str.toString()); // fetches string from file
+1 for StAX. It's easier to use than SAX because you don't need to write callbacks (you essentially just loop over all elements of the file until you're done), and it has (AFAIK) no limit on the size of the files it can process.
We have a file I/O bottleneck. We have a directory which contains lots of JPEG files, and we want to read them in in real time as a movie. Obviously this is not an ideal format, but this is a prototype object tracking system and there is no possibility to change the format as they are used elsewhere in the code.
From each file we build a frame object, which basically means having a BufferedImage and an explicit ByteBuffer containing all of the information from the image.
What is the best strategy for this? The data is on an SSD which in theory has read/write rates around 400 MB/s, but in practice is reading no more than 20 files per second (3-4 MB/s) using the naive implementation:
bufferedImg = ImageIO.read(imageFile); // [1]
byte[] data = ((DataBufferByte) bufferedImg.getRaster().getDataBuffer()).getData(); // [2]
imgBuf = ByteBuffer.wrap(data);
However, Java provides lots of possibilities for improving this:
(1) Channels, especially FileChannels.
(2) Gathering/scattering.
(3) Direct buffering.
(4) Memory-mapped buffers.
(5) Multithreading - use a bunch of Callables to access many files simultaneously (rough sketch after this list).
(6) Wrapping the files in a single large file.
(7) Other things I haven't thought of yet.
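For reference, option (5) might look roughly like this (a pure sketch; the directory name and pool size are mine):

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.imageio.ImageIO;

public class ParallelLoader {
    public static void main(String[] args) throws Exception {
        File[] frames = new File("frames").listFiles();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<BufferedImage>> results = new ArrayList<Future<BufferedImage>>();
        for (final File f : frames) {
            // each Callable decodes one JPEG; decoding is CPU-bound, so it can overlap with I/O
            results.add(pool.submit(new Callable<BufferedImage>() {
                public BufferedImage call() throws Exception {
                    return ImageIO.read(f);
                }
            }));
        }
        for (Future<BufferedImage> r : results) {
            BufferedImage img = r.get(); // blocks until that frame is decoded
        }
        pool.shutdown();
    }
}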
I would just like to know if anyone has extensively tested the different options, and knows what is optimal? I assume that (3) is a must, but I would still like to optimise the reading of a single file as far as possible, and am unsure of the best strategy.
Bonus question: in the code snippet above, when does the JVM actually 'hit the disk' and read in the contents of the file? Is it at [1], or is that just a file handle which 'points' to the object? It would make sense to evaluate lazily, but I don't know how the implementation of the ImageIO class works.
ImageIO.read(imageFile)
Since it returns a BufferedImage, I assume it hits the disk right there at [1] rather than returning just a file handle.
I am storing large amounts of information inside of text files that are written via Java. I have two questions relating to this:
Is there any efficiency boost to writing in binary or bytecode over Strings?
What would I use to write the data types into a file?
I already have a setup based around Strings, but I want to compare and at least know how to write the file in bytecode or binary.
When I read in the file, it will be translated into Strings again, but according to my research, if I write the file straight as bytes, it removes the added process of translating between Strings and bytes on both ends, both for writing the file and for reading it.
cHao has a good point about just using Strings anyway, but I am still interested in how to write varied data types to the file.
In other words, can I still use FileReader and BufferedReader to read and translate back to Strings, or is there something else to use? Also, for writing binary, is it still just the FileWriter class that I use?
If you want to write it in "binary", and you want to save space, why not just zip it using the jdk? Meets all your requirements.
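For instance, a minimal sketch with GZIPOutputStream (the file names are arbitrary):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipText {
    public static void main(String[] args) throws Exception {
        // writing: your existing String-based code, just wrapped in a GZIP stream
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream("data.txt.gz")), "UTF-8")) {
            out.write("lots of text...\n");
        }
        // reading: wrap in the other direction and keep using readLine()
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("data.txt.gz")), "UTF-8"))) {
            System.out.println(in.readLine());
        }
    }
}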
In my program I want the user to be able to take some images from a directory and save them under a single file that can be transferred to another computer and actually read and displayed (using the same program).
How would I go about doing this, especially if I want to save other data along with it, perhaps objects and such? I know you can use the ObjectOutputStream class, but I'm not sure how to integrate it with images.
So overall, I want the program to be able to read/write data, objects, and images to/from a single file.
Thanks in advance.
[EDIT - From Responses + Comment regarding Zip Files]
A zip might be able to get the job done.
But I want it to be readable only by the program. (Do you think making it a ZIP and changing the file extension would work, then when reading it just changing it back and reading it as a ZIP?) I don't want users to be able to see the contents directly.
I'll elaborate a bit more: it's a game, and users can create their own content using XML files, images, and such. But when a user creates something, I don't want other users to be able to see exactly how they created it, or what they used, only the end result.
You can programmatically create a ZIP file and read a ZIP file from Java; there is no need to expose it as a regular .zip file.
See the java.util.zip package for more information, and these others for code samples on how to read/write ZIP files using Java.
Now, if you want to prevent the users from unzipping this file, but you don't want to complicate your life by encrypting the content or creating a complex format, you can emulate a simple Internet message format, similar to the one used by e-mail to attach files.
You can read more about the internet message format here
This would be a custom file format only used by your application so you can do it as simple as you want. You just have to define your format.
It could be:
Header with the names ( and number ) of files in that bundle.
Followed by a list of separators (for instance limit.a.txt=yadayada, some identifier to know you have finished with that content)
Actual content
So, you create the bundle with something like the following:
public void createBundle() {
ZipOutputStream out = ....
writeHeader( out );
writeLimits( out, yourFiles );
for( File f : yourFiles ) {
writeFileTo( f, out );
}
out.close();
}
Sort of...
And the result would be a zipped file with something like:
filenames =a.jpg, b.xml, c.ser, d.properties, e.txt
limits.a.jpg =poiurqpoiurqpoeiruqeoiruqproi
limits.b.xml =faklsdjfñaljsdfñalksjdfa
limits.c.ser =sdf09asdf0as9dfasd09fasdfasdflkajsdfñlk
limits.d.properties =adfa0sd98fasdf90asdfaposdifasdfklasdfkñm
limits.e.txt =asdf9asdfaoisdfapsdfñlj
attachments=
<include binary data from a.jpg here>
--poiurqpoiurqpoeiruqeoiruqproi
<include binary data from b.xml here>
--faklsdjfñaljsdfñalksjdfa
etc
Since it's your file format, you can keep it as simple as possible or complicate it ad infinitum.
If you manage to include a MIME library in your app, that could save you a lot of time.
Finally, if you want to add extra security, you have to encrypt the file, which is not that hard after all. The problem is that if you ship the encryption code too, your users could get curious about it and decompile it to find out how it works. But a good encryption mechanism would prevent this.
So, depending on your needs, you can go from a simple ZIP, to a ZIP with a custom format, to a ZIP with a complicated custom format, to a ZIP with a custom, complicated, encrypted format.
Since that's too broad, you may ask about specific parts here: https://stackoverflow.com/questions/ask
In your case I would use a ZIP library to package all the images in a ZIP file. For the metadata you want to save along with these, use XML files. XML and ZIP are quite de-facto standards today, simple to handle and yet flexible if you want to add new files or metadata. There are also serialization tools to serialize your objects into XML. (I don't know them exactly in Java, but I'm sure there are some.)
Yep, just pack/unpack them with java.util.zip.*, which is pretty straightforward. Every Windows version since XP has built-in ZIP support, so you're good to go. There are many good (and faster) free ZIP libraries for Java/C#, too.
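Packing a directory of images is only a few lines (a sketch; the paths and the .dat extension are made up):

import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ImagePacker {
    public static void main(String[] args) throws Exception {
        File[] images = new File("images").listFiles();
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("bundle.dat"))) {
            for (File img : images) {
                zip.putNextEntry(new ZipEntry(img.getName())); // one entry per image
                Files.copy(img.toPath(), zip);                 // stream the bytes into the entry
                zip.closeEntry();
            }
        }
    }
}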
I know you can use the ObjectOutputStream class, but im not sure how to integrate it with images.
Images are binary data, so reading an image into a byte[] and writing the byte[] to an ObjectOutputStream should work. It is, however, memory-hogging, since every byte of the image eats at least one byte of the JVM's memory. You'll need to take this into account.
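A rough sketch of that approach (the class and file names are mine):

import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;

public class BundleWriter {
    public static void main(String[] args) throws Exception {
        byte[] imageBytes = Files.readAllBytes(new File("a.jpg").toPath()); // whole image in memory!
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("bundle.bin"))) {
            out.writeObject("a.jpg");     // some metadata, e.g. the original file name
            out.writeObject(imageBytes);  // the raw image data as a byte[]
        }
    }
}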
Has anybody written any classes for reading and writing Palm Database (PDB) files in Java? (I mean on a server, not on the Palm device itself.) I tried to google, but all I got were Protein Data Bank references.
I wrote a Perl program that does it using Palm::PDB.pm, but I want to turn it into a servlet for a GWT app.
The jSyncManager project at http://www.jsyncmanager.org/ is under the LGPL and includes classes to read and write PDB files -- look in jSyncManager/API/Protocol/Util/DLPDatabase.java in its source code. It looks like the core code you need from this could be isolated from the rest of the library with a little effort.
There are a few ways that you can go about this:
Easiest but slowest: find a Perl-to-Java bridge. This will not be quick, but it will work, and it should involve the least amount of work.
Find a C++/C# implementation that you have the source to and convert it (this should be the fastest solution).
Find a Java reader... there seem to be a few listed on Google; however, I do not have any experience with these.
Depending on what your intended usage is, you might look into writing a simple reader yourself. The format is pretty simple and you only need to handle a couple of simple fields to parse it.
Basically there is a header for the entire file which has a 2-byte integer at the end specifying the number of records. So just skip your way through the bytes for all the other fields in the header and then read the last field, which is the number of records in the file. Be aware that the PDB format writes integers with the most significant byte first.
Following this, there will be a record header for each record, the first field of which is the actual offset into the file for the record itself. Again, be aware of the byte order.
So, now you have the offsets into the file for each record in the file, which should make it very easy to read the actual records as long as you know the format of these for the type of PDB file you are trying to read.
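A rough sketch of reading the record offsets (the 76-byte skip and 8-byte record-entry size assume the standard PDB header layout; verify them against the spec):

import java.io.DataInputStream;
import java.io.FileInputStream;

public class PdbOffsets {
    public static void main(String[] args) throws Exception {
        try (DataInputStream in = new DataInputStream(new FileInputStream("data.pdb"))) {
            in.skipBytes(76);                        // skip the fixed header fields
            int numRecords = in.readUnsignedShort(); // 2-byte count, MSB first (DataInputStream is big-endian too)
            int[] offsets = new int[numRecords];
            for (int i = 0; i < numRecords; i++) {
                offsets[i] = in.readInt();           // 4-byte offset of the record in the file
                in.skipBytes(4);                     // record attributes + unique ID
            }
            // now seek to each offset (e.g. with RandomAccessFile) and parse the record payload
        }
    }
}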
Wikipedia has a nice overview of the header formats.
Maybe JPilot can help? They must have a lot of Java code dealing with Palm OS data.