I know how to truncate a RandomAccessFile so that bytes at the end are removed:
raf.getChannel().truncate(file.length() - 4);
or
raf.setLength(file.length() - 4);
But how do I truncate a RandomAccessFile in such a way that bytes at the start are removed? I don't need to write the contents of this file to a new file. I googled and could not find an answer. Please help. Thanks in advance.
It's not an operation most file systems support. The model is a sequence of bytes starting at a particular place on the disc; files are variable length and can be appended to, so truncating at the end is relatively easy.
So you will actually need to copy all the bytes in the file. Avoid this if at all possible. One technique for managing queue-like files (such as logs) is to keep a sequence of files: start a new file periodically and drop the oldest one off the end.
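If you really do have to do it in place, here is a minimal sketch of the copy-down approach (the removeFromFront helper is hypothetical; note this is not crash-safe, since a failure mid-copy leaves the file corrupted):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Drops the first `count` bytes by copying the remainder down to offset 0,
// then truncating the tail.
static void removeFromFront(String fileName, long count) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
         FileChannel ch = raf.getChannel()) {
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
        long size = ch.size();
        long readPos = count, writePos = 0;
        while (readPos < size) {
            buf.clear();
            int n = ch.read(buf, readPos);
            if (n <= 0) break;
            readPos += n;
            buf.flip();
            while (buf.hasRemaining())
                writePos += ch.write(buf, writePos);
        }
        ch.truncate(writePos);  // cut off the now-duplicated tail
    }
}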
My app creates a large amount of output, but only over a long time. Each time there is new output to add it is just a string (a few hundred bytes worth).
It would simplify my code considerably if I could add incrementally (i.e. append) to a pre-existing GZIP (or Zip) file. Is this even possible (in Java, specifically)?
I am looking for a solution that will create a file that can be opened by 3rd party apps.
I realize I can decompress the file, add the additional text and compress it again as a new blob.
Thanks
PVS
Yes. See this example in C in the examples directory of the zlib distribution: gzlog.h and gzlog.c. It does exactly that, allowing you to append short pieces of data to a gzip file. It does so efficiently, by not compressing the additions until a threshold is reached, and then compressing what hasn't been compressed so far. After each addition, the gzip file contains the addition and is a valid gzip file. The code also protects against system crashes in the middle of an append operation, recovering the file on the next append operation.
Though allowed by the format, this code does not simply concatenate short gzip streams. That would result in very poor compression.
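For completeness, the naive concatenation is still easy in plain Java if each addition is large enough that the compression penalty doesn't matter. A sketch (gunzip and java.util.zip.GZIPInputStream both read concatenated members as a single stream):
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Appends `text` to the gzip file as a new gzip member.
static void appendGzip(String fileName, String text) throws IOException {
    try (GZIPOutputStream out =
             new GZIPOutputStream(new FileOutputStream(fileName, true))) {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    } // close() writes the member's trailer, leaving a valid gzip file
}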
I have a file which contains a lot of zeros, and as per the requirement the zeros in the file are invalid. I am using the RandomAccessFile API to locate data in the file. Is there a way to remove all the zeros from the file using the same API?
You'll have to stream through the file and write out the content, minus the zeros, to a separate temporary file. You can then close and delete the original and rename the new file to the old file name. That's your best alternative for this particular use case.
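A sketch of that approach using java.nio.file (the temp file is created in the same directory so the final move stays on one file system):
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Streams src to a temp file, dropping every zero byte, then replaces src.
static void stripZeros(Path src) throws IOException {
    Path tmp = Files.createTempFile(src.toAbsolutePath().getParent(), "strip", ".tmp");
    try (InputStream in = new BufferedInputStream(Files.newInputStream(src));
         OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
        int b;
        while ((b = in.read()) != -1) {
            if (b != 0) out.write(b); // skip the invalid zero bytes
        }
    }
    Files.move(tmp, src, StandardCopyOption.REPLACE_EXISTING);
}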
You can use RandomAccessFile to read the file's data, and when you reach a point where you need to change the data you can overwrite the existing bytes with an equal number of bytes. This only works if the new value is exactly the same length as the old value.
With RandomAccessFile it's difficult, and equally complex, when the two sizes (the value being changed and the new value) are different: it involves a lot of seeks, reads, and writes to move the data around.
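For the same-length case, the overwrite itself is just a seek and a write (the file name and offset here are placeholders):
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Only valid when the replacement is exactly as long as the bytes it replaces.
try (RandomAccessFile raf = new RandomAccessFile("data.bin", "rw")) {
    byte[] replacement = "NEWVAL".getBytes(StandardCharsets.US_ASCII);
    raf.seek(128);          // offset of the old value
    raf.write(replacement); // overwrites exactly replacement.length bytes
}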
Read the whole file, change the parts you have to change, and write a new file. You can process one line at a time, or read the whole file into memory, modify it, and write it all back out again. It is a good idea to perform the edit in the following manner:
Read the file.
Write it to a temporary file (just as a back-up).
Rename the original to the back-up name.
Work on the temporary file.
Remove the back-up if you were successful.
I would like to read an arbitrary number of lines. The files are normal ASCII text files for the moment (they may be UTF-8/multibyte character files later).
So what I want is a method that reads only specific lines from a file (for example, lines 101-200), and while doing so it should not block anything (i.e. the same file can be read by another thread for lines 201-210, and it should not wait for the first reading operation).
In the case where there are no lines to read, it should gracefully return whatever it could read. The output of the method could be a List<String>.
The solution I thought of so far was to read the entire file first to find the number of lines as well as the byte position of each newline character, then use RandomAccessFile to read the bytes and convert them to lines. I have to convert the bytes to Strings, but that can be done after the reading is done. I would avoid the end-of-file exception for reading beyond the file by proper bookkeeping. The solution is a bit inefficient, as it goes through the file twice, but the file size can be really big and we want to keep very little in memory.
If there is a library for such a thing that would work, but a simpler native Java solution would be great.
As always I appreciate your clarification questions and I will edit this question as it goes.
Why not use Scanner and just loop through hasNextLine() until you get to the count you want, and then grab as many lines as you wish? If it runs out, it'll fail gracefully. That way you're only reading the file once (unless Scanner reads it fully; I've never looked under the hood, but it doesn't sound like you care, so... there you go :)
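Something along these lines (a sketch; line numbers are 1-based and inclusive, and each call opens its own Scanner so concurrent reads don't block each other):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Returns lines from..to, or whatever exists if the file ends early.
static List<String> readLines(File file, int from, int to) throws FileNotFoundException {
    List<String> result = new ArrayList<>();
    try (Scanner sc = new Scanner(file)) {
        for (int line = 1; line <= to && sc.hasNextLine(); line++) {
            String text = sc.nextLine();
            if (line >= from) result.add(text);
        }
    }
    return result;
}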
If you want to minimise memory consumption, I would use a memory mapped file. This uses almost no heap. The amount of the file kept in memory is handled by the OS so you don't need to tune the behaviour yourself.
FileChannel fc = new FileInputStream(fileName).getChannel();
final MappedByteBuffer map = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
If you have a file of 2 GB or more, you need multiple mappings. In the simplest case you can scan the data and remember all the indexes. The indexes themselves could take a lot of space, though, so you might only remember every Nth one, e.g. every tenth.
e.g. a 2 GB file with 40 byte lines could have 50 million lines requiring 400 MB of memory.
Another way around having a large index is to create another memory mapped file.
FileChannel fc = new RandomAccessFile(fileName, "rw").getChannel();
final MappedByteBuffer map2 = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()/10);
The problem being, you don't know how big the file needs to be before you start. Fortunately if you make it larger than needed, it doesn't consume memory or disk space, so the simplest thing to do is make it very large and truncate it when you know the size it needs to be.
This could also be used to avoid re-indexing the file each time you load it (only re-index when it has changed). If the file is only ever appended to, you could index from the end of the file each time.
Note: this approach can use a lot of virtual memory. For a 64-bit JVM this is no problem, as your limit is likely to be 256 TB. For a 32-bit application, your limit is likely to be 1.5 - 3.5 GB, depending on your OS.
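To make the sparse-index idea concrete, here is a sketch that records the start offset of every tenth line while scanning the map from the earlier snippet (assumes a file under 2 GB so a single mapping suffices; java.util imports omitted):
// Remember the start of every 10th line; to reach line L, jump to
// lineStarts.get(L / 10) and scan forward at most 9 newlines.
List<Long> lineStarts = new ArrayList<>();
lineStarts.add(0L);                      // line 0 starts at offset 0
int newlines = 0;
for (int i = 0; i < map.limit(); i++) {
    if (map.get(i) == '\n' && ++newlines % 10 == 0)
        lineStarts.add((long) (i + 1));  // offset just past the newline
}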
I've got a RandomAccessFile in Java where I manage some data. Simplified:
At the start of the file I have an index (an 8-byte long value per dataset, which represents the offset where the real data can be found).
So if I want to know where I can find the data of dataset no. 3, for example, I read 8 bytes at offset (2*8). (Indexing starts at 0.)
A dataset itself consists of 4 bytes representing the size of the dataset, followed by all the bytes belonging to the dataset.
That works fine as long as I always rewrite the whole file.
It's pretty important here that dataset no. 3 could have been written as the first entry in the file, so the index is ordered but the data itself is not.
If I insert a new dataset, I always append it to the end of the file, but the number of datasets in one file is limited. If I can store 100 datasets in the file, there will always be 100 entries in the index. If the offset read from the index for a dataset is 0, the dataset is new and will be appended to the file.
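In code, my read path looks roughly like this (a sketch; error handling omitted):
import java.io.IOException;
import java.io.RandomAccessFile;

// Index entry n is an 8-byte offset at position n*8; each dataset is a
// 4-byte length followed by that many bytes.
static byte[] readDataset(RandomAccessFile raf, int n) throws IOException {
    raf.seek(n * 8L);
    long offset = raf.readLong();
    if (offset == 0) return null;          // 0 means "no data stored yet"
    raf.seek(offset);
    byte[] data = new byte[raf.readInt()]; // 4-byte size prefix
    raf.readFully(data);
    return data;
}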
But there's one case which is not working for me yet. If I read dataset no. 3 from the file, add some data to it in my application, and want to update it in the file, I have no idea how to do this.
If it has the same length as before I can simply overwrite the old data. But if the new dataset has more bytes than the old one, I'll have to move all the data in the file that is behind this dataset and update the indexes for those datasets.
Any idea how to do that?
Or is there maybe a better way to manage storing these datasets in a file?
PS: Yes, of course I thought of using a database, but that is not applicable for my project. I really do need simple files.
You can't easily insert data into the middle of a file. You'd basically have to read all the remaining data, write the "new" data and then rewrite the "old" data. Alternatively, you could invalidate the old "slot" (potentially allowing it to be reused later) and then just write the whole new record to the end of the file. Your file format isn't really clear to me, to be honest, but fundamentally you need to be aware that you can't insert (or delete) in the middle of a file.
I've got a RandomAccessFile in Java where i manage some data.
Stop right there. You have a file. You are presently accessing it via RandomAccessFile in Java. However your entire question relates to the file itself, not to RandomAccessFile or Java. You have a major file design problem, as you are assuming facilities like inserting into the middle of a file that don't exist in any filesystem I have used since about 1979.
As the others have answered, there's no real possibility of making the file longer or shorter without rewriting the whole thing. But there are some workarounds, and maybe one of them will work for you after all:
Limit all datasets to a fixed length.
Delete by changing/removing the index entry, and add by always appending to the end of the file. Update by removing the old dataset and appending the new dataset to the end if the new one is longer. Compress the file from time to time by actually deleting the "ignored" datasets and moving all valid datasets together (rewriting everything).
If you can't limit the datasets to a fixed length and you intend to update a dataset making it longer, you can also leave a pointer at the end of the first part of a dataset and continue it later in the file. That way you get a structure like a linked list. If a lot of editing takes place, it would also make sense here to rearrange and compress the file.
Most solutions have a data overhead but file size is usually not the problem and as mentioned you can let some method "clean it up".
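A sketch of the delete-and-append update described above, assuming the index/offset layout from the question:
import java.io.IOException;
import java.io.RandomAccessFile;

// Appends the new version of dataset n and re-points its index entry.
// The old bytes become dead space until a compaction pass rewrites the file.
static void updateDataset(RandomAccessFile raf, int n, byte[] data) throws IOException {
    long newOffset = raf.length();
    raf.seek(newOffset);
    raf.writeInt(data.length);   // 4-byte size prefix
    raf.write(data);
    raf.seek(n * 8L);            // index entry for dataset n
    raf.writeLong(newOffset);
}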
PS: I hope it's ok to answer such old questions - I couldn't find anything about it in the help center and I'm relatively new here.
Is it possible to shift the contents of a file while writing to it using FileWriter?
I need to constantly write data to the head of the file, and if I do that it overwrites the file.
What technique should I use to do this, or should I make copies of the file (with the new data on top) on every file write?
If you want to overwrite certain bytes of the file and not others, you can use seek and write to do so. If you want to change the content of every byte in the file (by, for example, adding a single byte to the beginning of the file) then you need to write a new file and potentially rename it after you've done writing it.
Think of the answer to the question "what will be the contents of the byte at offset x after I'm done?". If, for a large percent of the possible values of x the answer is "not what it used to be" then you need a new file.
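So for prepending, the usual pattern is to write a new file and rename it over the old one, for example (a sketch; the temp file is created alongside the original so the rename stays on the same file system):
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Writes `header` followed by the old contents into a temp file,
// then replaces the original.
static void prepend(Path file, String header) throws IOException {
    Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "prepend", ".tmp");
    try (OutputStream out = Files.newOutputStream(tmp)) {
        out.write(header.getBytes(StandardCharsets.UTF_8));
        Files.copy(file, out);  // append the original bytes
    }
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
}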
Rather than contenting ourselves with the question "what will be the contents of the byte at offset x after I'm done?", let's change the mindset and ask why the file system, or perhaps the hard disk firmware, can't: a) provide another mode of accessing the file (say, inline insertion); b) increase the length of the file by the number of bytes added at the front, in the middle, or even at the end; and c) move each byte from the insertion point onwards by newcontent.length positions.
It would be easier and faster to handle these operations at the disk firmware or file system implementation level rather than leaving that job to the application developer. I hope file system writers or hard disk vendors will offer such a feature soon.
Regards,
Samba