How to change specific part of a file using java? - java

I was writing a program that implements a dictionary.
Actually what I did is just to write a java applet to show the words which is defined in a .xml file. And I did that with the org.w3c.dom package.
Now, I want to add a new feature that users can modify a word in the dictionary in the the program then the modification will be saved to the original .xml file.
Here is my question: what should I do to save the changes? Note that users can only modify one word a time so I don't want to load the whole file and modify the certain part and re-write the whole file to the disk. Is there a novel way to do that?

An XML file is a sequential text file. This means that there is no formula or other convenient way to locate the n-th word in a dictionary stored in XML. Elements need to be written one after the other, character by character (and one character may or may not result in a byte). Thus, what is called a random update, is out.
Look at JAXB for a most convenient way to read and write XML, and invest some work so that a user cannot update in memory and terminate the program without saving.

Reading and writing files in specific formats is a little bit trickier that what you portray.
Seen with "XML eyes" you are only changing a portion of the file - but to do that on the file level you need to seek to the position of change and write new bytes from there. The problem with that is that the content after that position won't adjust according to the new portion you write.
TL;DR - no - you need to read+write the complete XML file when making changes.

Related

Parsing XML file from the end of file

I want to use XML for storing some data. But I do not want read full file when I want to get the last data that was inserted there, as well as I do not want to rewrite full file when adding new data there. Is there a standard way in java to parse xml file not from the beginning but from the end. So that for example SAX or StaX parser will first encounter last closing root tag and than last tag. Or if I want to do this I should read and write everything like I am reading/writing regular text file?
Fundamentally, XML is a poor representation choice for this. The format is inherently "contained" like this, and I haven't seen any APIs which encourage you to fight against that.
Options:
Choose a different format entirely (e.g. use a database)
Create lots of small XML files instead - each one self-contained. When you want the whole of the data, read all the files
Just swallow the hit and read/write the whole file each time.
I found a good topic on this with example solutions for what I want.
This link: http://www.oreillynet.com/xml/blog/2007/03/parsing_xml_backwards.html
Seems that XML is not good file format to achieve what I want. There is no standard parser that can parse XML from the end instead of beginning.
Probably the best solution for will be storing all xml data in one file that contains composition of many xml files contents. On each line stored separate contents of XML. The file itself is not well formed XML but each line contains well formed xml that I will parse using standard xml parser(StaX).
This way I will be able to read just lines from the end of file and append new data to the end of file. When I need the whole data or only the part of it I will read all line or part of them. Probably I can also implement pagination from the end of file for that because the file can be big.
Why XML in each line? I think it is easy to use API for parsing it as well as it is human readable to store data in xml instead of just separating values in the line with some symbol.
Why not use sax/stax and simply process only your last entry? Yes, it will need to open and go through the whole file, but at least it's fairly efficient as opposed to loading the whole DOM tree.
Short of doing that, I don't think you can do what you're asking using XML as a source.
Another alternative, apart from the ones provided by Jon Skeet in his answer, would be to keep the same format but insert the latest entries first, and stop processing the files as soon as you've read your entry.

Reading PDF in java as a file and making "PDF" editable

I have a program which will be used for building questions database. I'm making it for a site that want user to know that contet was donwloaded from that site. That's why I want the output be PDF - almost everyone can view it, almost nobody can edit it (and remove e.g. footer or watermark, unlike in some simpler file types). That explains why it HAS to be PDF.
This program will be used by numerous users which will create new databases or expand existing ones. That's why having output formed as multple files is extremly sloppy and inefficient way of achieving what I want to achieve (it would complicate things for the user).
And what I want to do is to create PDF files which are still editable with my program once created.
I want to achieve this by implementing my custom file type readable with my program into the output PDF.
I came up with three ways of doing that:
Attach the file to PDF and then corrupting the part of PDF which contains it in a way it just makes the PDF unaware that it contains the file, thus making imposible for user to notice it (easely). Upon reading the document I'd revert the corruption and extract file using one of may PDF libraries.
Hide the file inside an image which would be added to the PDF somwhere on the first or last page, somehow (that is still need to work out) hidden from the public eye. Knowing it's location, it should be relativley easy to retrieve it using PDF library.
I have learned that if you add "%" sign as a first character in line inside a PDF, the whole line will be ignored (similar to "//" in Java) by the PDF reader (atleast Adobe reader), making possible for me to add as many lines as I want to the PDF (if I know where, and I do) whitout the end user being aware of that. I could implement my whole custom file into PDF that way. The problem here is that I actually have to read the PDF using one of the Java's input readers, but I'm not sure which one. I understand that PDF can't be read like a text file since it's a binary file (Right?).
In the end, I decided to go with the method number 3.
Unless someone has any better ideas, and the conditions are:
1. One file only. And that file is PDF.
2. User must not be aware of the addition.
The problem is that I don't know how to read the PDF as a file (I'm not trying to read it as a PDF, which I would do using a PDF library).
So, does anyone have a better idea?
If not, how do I read PDF as a FILE, so the output is array of characters (with newline detection), and then rewrite the whole file with my content addition?
In Java, there is no real difference between text and binary files, you can read them both as an inputstream. The difference is that for binary files, you can't really create a Reader for it, because that assumes there's a way to convert the byte stream to unicode characters, and that won't work for PDF files.
So in your case, you'd need to read the files in byte buffers and possibly loop over them to scan for bytes representing the '%' and end-of-line character in PDF.
A better way is to use another existing way of encoding data in a PDF: XMP tags. This is allows any sort of complex Key-Value pairs to be encoded in XML and embedded in PDF's, JPEGs etc. See http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf.
There's an open source library in Java that allows you to manipulate that: http://pdfbox.apache.org/userguide/metadata.html. See also a related question from another guy who succeeded in it: custom schema to XMP metadata or http://plindenbaum.blogspot.co.uk/2010/07/pdfbox-insertextract-metadata-frominto.html
It's all just 1's and 0's - just use RandomAccessFile and start reading. The PDF specification defines what a valid newline character(s) is/are (there are several). Grab a hex editor and open a PDF and you can at least start getting a feel for things. Be careful of where you insert your lines though - you'll need to add them towards the end of the file where they won't screw up the xref table offsets to the obj entries.
Here's a related question that may be of interest: PDF parsing file trailer
I would suggest putting your comment immediately before the startxref line. If you put it anywhere else, you could wind up shifting things around and breaking the xref table pointers.
So a simple algorithm for inserting your special comment will be:
Go to the end of the file
Search backwards for startxref
Insert your special comment immediately before startxref - be sure to insert a newline character at the end of your special comment
Save the PDF
You can (and should) do this manually in a hex editor.
Really important: are your users going to be saving changes to these files? i.e. if they fill in the form field, are they going to hit save? If they are, your comment lines may be removed during the save (and different versions of different PDF viewers could behave differently in this regard).
XMP tags are the correct way to do what you are trying to do - you can embed entire XML segments, and I think you'd be hard pressed to come up with a data structure that couldn't be expressed as XML.
I personally recommend using iText for this, but I'm biased (I'm one of the devs). The iText In Action book has an excellent chapter on embedding XMP data into PDFs. Here's some sample code from the book (which I definitely recommend): http://itextpdf.com/examples/iia.php?id=217

Indexing text files in java

I have a set of text files providing informations that are parsed, analysed and allow building a model. Sometime, the user of this model wants to know which part of a text file was used to generate a given model item.
For that I am thinking of keeping track of the range of lines (or bytes) ids to be able to read the appropriate text part once required.
My question is: I wonder if it their exists any java Reader able to read a file by using a start and stop line (or byte) id instead of reading the file from the begining and counting the lines (bytes)?
Best regards
If you know exactly amount of bytes, that should be skipped, you can use seek method method of RandomAccessFile
To read from the certain byte - SeekableByteChannel. Of cause, there aren't any Readers able to start from the line id - because positions of line separators are unknown.
You can use InputStream.mark() and InputStream.skip() to navigate to concrete position into the file.
But are you sure you really have to implement this yourself? Take a look on Lucine - the indexing service that probably will help you.

How to edit a specific attribute in file?

So say you have a file that is written in XML or soe other coding language of that sort. Is it possible to just rewrite one line rather than getting the entire file into a string, then changing then line, then having to rewrite the whole string back to the file?
In general, no. File systems don't usually support the idea of inserting or modifying data in the middle of a file.
If your data file is in a fixed-size record format then you can edit a record without overwriting the rest of the file. For something like XML, you could in theory overwrite one value with a shorter one by inserting semantically-irrelevant whitespace, but you wouldn't be able to write a larger value.
In most cases it's simpler to just rewrite the whole file - either by reading and writing in a streaming fashion if you can (and if the file is too large to read into memory in one go) or just by loading the whole file into some in-memory data structure (e.g. XDocument), making the changes, and then saving the file again. (You may want to consider saving to a different file then moving the files around to avoid losing data if the save operation fails for some reason.)
If all of this ends up being too expensive, you should consider using multiple files (so each one is smaller) or a database.
If the line you want to replace is larger than the new line that you want to replace it with, then it is possible as long as it is acceptable to have some kind of padding (for example white-space characters ' ') which will not effect your application.
If on the other hand the new content are larger than the content to be replaced you will need to shift all the data downwards, so you need to rewrite the file, or at least from the replaced line onwards.
Since you mention XML, it might be you are approaching your problem in the wrong way. Could it be that what you need is to replace a specific XML node? In which case you might consider using DOM to read the XML into a hierarchy of nodes and adding/updating/removing in there before writing the XML tree back to the file.

How to write data to a file through java?

I want to make a GUI application that contains three functions as follows:
Add a record
Edit a record
Delete a record
A record contains two fields - Name and Profession
There are two restrictions for the application
You can't use database to store info. You have to use a flat file.
Total file should not be re-written for every add/delete operation.
So, my questions are mentioned below:
Q1. Which file format would be better? (.xml or .csv or .txt or any other)
Q2. How can we perform the add/delete operation without the whole file being re-written?
The second part of your question is answered here : Best Way to Write Bytes in the Middle of a File in Java
As for the format - I would go with something as simple as possible. You don't want to have to deal with a bunch of markup processing, as using RandomAccessFile, you will going directly to a byte position. A fixed width style format would be good, so that based on the record number, you can calculate the starting position of a record or field in the file, without having to read everything in the file. The fields would then be padded out to the fixed width with spaces or some other suitable character.
I would go with CSV, zipped. it is both readable, and editable externally.
If CSV is your choice, this can help: http://javacsv.sourceforge.net/
Did you look at this? http://sourceforge.net/projects/flatworm/
Also consider Apache Derbi and HSQLDB
Another solution is this http://www.coyotegulch.com/products/jisp/index.html
You can reinvent the wheel, but that is only required if this is an academic assigment...
Given that the whole file must not be rewritten, I would suggest using RandomAccessFile that allow you to read and write only the record you want.
For the file format, a binary file, using fixed length for the record : ex: Name on 20 characters, Profession on 30.
This will allow you to use the seek() method of RandomAccessFile to directly access your data.

Categories