Rolling file implementation - java

I am always curious how a rolling file is implemented in logs.
How would one even start creating a file writing class in any language in order to ensure that the file size is not exceeded.
The only possible solution I can think of is this:
write method:
    size = file size + size of string to write
    if (size > limit)
        close the file writer
        open file reader
        read the file
        close file reader
        open file writer (clears the whole file)
        remove the size from the beginning to accommodate the new string to write
        write the new truncated string
    write the string we received
This seems like a terrible implementation, but I cannot think of anything better.
Specifically I would love to see a solution in java.
EDIT: By "remove the size from the beginning" I mean: say I have a 20-byte string (which is the limit) and I want to write another 3-byte string. I remove 3 bytes from the beginning, am left with the last 17 bytes, and by appending the new string I have 20 bytes again.

Because your question made me look into it, here's an example from the logback logging framework. The RollingFileAppender#rollover() method looks like this:
public void rollover() {
    synchronized (lock) {
        // Note: This method needs to be synchronized because it needs exclusive
        // access while it closes and then re-opens the target file.
        //
        // make sure to close the hereto active log file! Renaming under windows
        // does not work for open files
        this.closeOutputStream();
        try {
            rollingPolicy.rollover(); // this actually does the renaming of files
        } catch (RolloverFailure rf) {
            addWarn("RolloverFailure occurred. Deferring roll-over.");
            // we failed to roll-over, let us not truncate and risk data loss
            this.append = true;
        }
        try {
            // update the currentlyActiveFile
            currentlyActiveFile = new File(rollingPolicy.getActiveFileName());
            // This will also close the file. This is OK since multiple
            // close operations are safe.
            // COMMENT MINE this also sets the new OutputStream for the new file
            this.openFile(rollingPolicy.getActiveFileName());
        } catch (IOException e) {
            addError("setFile(" + fileName + ", false) call failed.", e);
        }
    }
}
As you can see, the logic is pretty similar to what you posted. They close the current OutputStream, perform the rollover, then open a new one (openFile()). Obviously, this is all done in a synchronized block since many threads are using the logger, but only one rollover should occur at a time.
A RollingPolicy decides how to perform a rollover, and a TriggeringPolicy decides when to perform one. With logback, you usually base these policies on file size or time.
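To make that concrete without pulling in logback, here is a minimal sketch of the same close-rename-reopen approach in plain Java. Everything here (the RollingWriter name, the maxBytes limit, the ".1" backup suffix) is my own illustration, not any framework's API:
import java.io.Closeable;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class RollingWriter implements Closeable {
    private final File file;
    private final long maxBytes;
    private Writer out;

    public RollingWriter(File file, long maxBytes) throws IOException {
        this.file = file;
        this.maxBytes = maxBytes;
        this.out = new FileWriter(file, true); // append to an existing file
    }

    public synchronized void write(String s) throws IOException {
        // char count approximates byte count; fine for a sketch
        if (file.length() + s.length() > maxBytes) {
            rollover();
        }
        out.write(s);
        out.flush();
    }

    // Close the active file, rename it to a backup, then start a fresh one.
    // Nothing is ever truncated away, unlike the read-shift-rewrite approach.
    private void rollover() throws IOException {
        out.close(); // renaming an open file fails on Windows
        File backup = new File(file.getPath() + ".1");
        if (backup.exists() && !backup.delete()) {
            throw new IOException("could not delete " + backup);
        }
        if (!file.renameTo(backup)) {
            throw new IOException("could not rename " + file);
        }
        out = new FileWriter(file, false); // new, empty active file
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }
}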

Java: deleting gz files failed occasionally

UPDATE 2: Please close this question
After further debugging it turns out that the problem is not in the inner try block but in a bug inside the 'while' loop. An exception was thrown there and not caught, which skipped the inner try block. Apologies for my mistake; please delete this thread.
UPDATE: added logging to capture errors during delete.
I am downloading 8000-ish GZ files from a server, processing their contents locally, then deleting the downloaded copy upon completion. I am running this over a number of threads, each processing a disjoint batch of GZ files. But I do not understand why my code occasionally (not always) fails to delete the GZ files. The code generally looks like this:
....
private static final Logger LOG = Logger.getLogger(....class.getName());
.....
for (String inputGZFile : gzFiles) { // gzFiles is a list of URLs to be processed by this thread
    try {
        // downloadFrom was used but never declared in the original snippet;
        // presumably it is the URL parsed from inputGZFile
        URL downloadFrom = new URL(inputGZFile);
        File downloadTo = new File(this.outFolder + "/" + new File(downloadFrom.getPath()).getName());
        FileUtils.copyURLToFile(downloadFrom, downloadTo);
        InputStream fileStream = new FileInputStream(downloadTo);
        InputStream gzipStream = new GZIPInputStream(fileStream);
        Reader decoder = new InputStreamReader(gzipStream, Charset.forName("utf8"));
        Scanner inputScanner = new Scanner(decoder);
        inputScanner.useDelimiter(" .");
        String content;
        while (inputScanner.hasNextLine() && (content = inputScanner.nextLine()) != null) {
            // do something
        }
        try {
            inputScanner.close(); // also closes the decoder, gzip and file streams
            FileUtils.forceDelete(downloadTo);
        } catch (Exception e) {
            LOG.info("\t thread " + id + " deleting gz file error " + inputGZFile);
            LOG.info("\t thread " + id + ExceptionUtils.getFullStackTrace(e));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The only reason I can think of is that the scanner did not close the file or release the file handle. But that would be strange, because I already call the close method on the scanner.
Any suggestions highly appreciated.
Without the ability to look into your log files, or debug your system first hand, it is close to impossible to tell you what is going wrong here.
But what you can definitely do: make that call to FileUtils.forceDelete(downloadTo); within a finally block, for example.
The whole point of try/catch/finally is to enable you to enforce that specific actions always take place, no matter what happened in the try block!
Also note: if you are unable to tell what your code does, then add logging support to it. So that instead of printStackTrace(); you log the whole exception to a place where it does not get lost.
Meaning: the real answer here is that you step back and take the necessary actions to find out where your problems are coming from.
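For example, here is a sketch of the question's loop body with try-with-resources plus finally, reusing downloadFrom, outFolder and the FileUtils calls from the question (StandardCharsets is from java.nio.charset); the delete runs even when the parsing throws:
File downloadTo = new File(outFolder, new File(downloadFrom.getPath()).getName());
FileUtils.copyURLToFile(downloadFrom, downloadTo);
try (Scanner inputScanner = new Scanner(
        new InputStreamReader(
                new GZIPInputStream(new FileInputStream(downloadTo)),
                StandardCharsets.UTF_8))) {
    while (inputScanner.hasNextLine()) {
        String content = inputScanner.nextLine();
        // do something
    }
} finally {
    // runs whether or not the processing above threw an exception
    FileUtils.forceDelete(downloadTo);
}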

Prevent Text File From Being Deleted When Accessed From Multiple Threads

I'm trying to debug a problem that just surfaced in my program. Until now, I've been writing, reading and updating a props file with no problem, using the following code structure:
public void setAndReplacePropValue(String dir, String key, String value) throws FileNotFoundException, IOException {
    if (value != null) {
        File file = new File(dir);
        if (!file.exists()) {
            System.out.println("File: " + dir + " is not present. Attempting to create new file now..");
            new FilesAndFolders().createTextFileWithDirsIfNotPresent(dir);
        }
        if (file.exists()) {
            try {
                FileInputStream fileInputStream = new FileInputStream(file);
                Properties properties = new Properties();
                properties.load(fileInputStream);
                fileInputStream.close();
                FileOutputStream fileOutputStream = new FileOutputStream(file);
                properties.setProperty(key, value);
                properties.store(fileOutputStream, null);
                fileOutputStream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            System.out.println("File: " + dir + " does not exist and attempt to create new file failed");
        }
    }
}
However, recently I noticed that a specific file (let's call it: C:\\Users\\Admin\\Desktop\\props.txt) is being deleted after being updated from multiple threads. I'm not sure of the exact source of this error, as it seems to happen randomly.
I thought that, perhaps, if two threads call setAndReplacePropValue() and the first thread calls FileOutputStream fileOutputStream = new FileOutputStream(file); (which truncates the file) before it has a chance to re-write the data (via properties.store(fileOutputStream, null)), then the second thread might call fileInputStream = new FileInputStream(file); on an empty file - causing that thread to delete the previous data when writing the 'empty' data back to the file.
To test my hypothesis I tried calling setAndReplacePropValue() from multiple threads several hundred to thousand times in a row while making changes to setAndReplacePropValue() as needed. Here are my results:
If setAndReplace() is declared as static + synchronized, the original props data is preserved. This remains true even when I add a random delay after calling FileOutputStream - as long as the JVM exits normally. If the JVM is killed/terminated (after FileOutputStream is called) then the previous data will be deleted.
If I remove both the static and synchronized modifiers from setAndReplace() and call setAndReplace() 5,000 times, the old data is still preserved (why?) - as long as the JVM exits normally. This appears to be true even when I add a random delay in setAndReplace() (after calling FileOutputStream).
When I try modifying the props file using an ExecutorService (I occasionally access setAndReplacePropValue() via an ExecutorService in my program), the file content is preserved as long as there's no delay after FileOutputStream. If I add a delay and the delay is > the 'timeout' value set in future.get() (so an interrupted exception is thrown), the data is NOT preserved. This remains true even if I add the static + synchronized keywords to the method.
In short, my question is: what is the most likely explanation for why the file is being deleted? (I thought point 3 might explain the error, but I'm not actually sleeping after calling new FileOutputStream() in my program, so presumably this would not prevent data from being written back to the file.) Is there another possibility I didn't think of?
Also, why is point 2 true? If the method is not declared static/synchronized, shouldn't this cause one thread to create an InputStream from an empty file? Thanks.
Unfortunately it is very difficult to provide feedback on your code without a ton of additional information, but hopefully my comments will be helpful.
In general, having multiple threads reading and writing the same file is a really bad idea. I can't agree more with @Hovercraft-Full-Of-Eels, who recommends that you have 1 thread do the reading/writing and the other threads just add updates to a shared BlockingQueue.
But that said here are some comments.
If setAndReplace() is declared as static + synchronized the original props data is preserved.
Right, this stops the terrible race condition in your code where 2 threads could be trying to write to the output file at the same time. Or it could be that 1 thread starts to write and another thread reads an empty file causing data to be lost.
If JVM is killed/terminated (after FileOutputStream is called) then previous data will be deleted.
I don't quite understand this part, but your code should have good try/finally clauses to make sure that the files are closed appropriately when the JVM terminates. If the JVM is hard-killed then the file may have been opened but not written yet (depending on timing). In this case, I would recommend that you write to a temporary file and rename it to your properties file, which is atomic. Then you might miss the update if the JVM is killed, but the file will never be overwritten and left empty.
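A rough sketch of that write-then-atomic-rename idea, using java.nio.file (the variable names are illustrative, and it assumes dir is an absolute path so getParent() is non-null):
Path target = Paths.get(dir);
Path temp = Files.createTempFile(target.getParent(), "props-", ".tmp");
try (OutputStream out = Files.newOutputStream(temp)) {
    properties.store(out, null);
}
// Readers now see either the complete old file or the complete new one,
// never a half-written file.
Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);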
If I remove both static and synchronized modifiers from setAndReplace() and call setAndReplace() 5,000 times, the old data is still preserved (why?)
No idea. Depends on race conditions. Maybe you are just getting lucky.
When I try modifying props file using ExecutorService (I occasionally access setAndReplacePropValue() via ExecutorService in my program), file content is preserved as long as there's no delay after FileOutputStream. If I add delay and the delay is > 'timeout' value set in future.get() (so interrupted exception is thrown) the data is NOT preserved. This remains true even if I add static + synchronized keywords to method.
I can't answer that without seeing the specific code.
This would actually be a good idea if you had a fixed thread pool with 1 thread; then each of the threads that wants to update a value would just submit the field/value object to the thread pool. This is approximately what @Hovercraft-Full-Of-Eels was talking about.
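As a sketch, that could look like this (the PropertyFileUpdater class and update() method are names I made up; setAndReplacePropValue() is the method from the question):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PropertyFileUpdater {
    // one thread does all the writing, so writes can never interleave
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    void update(String dir, String key, String value) {
        writer.submit(() -> {
            try {
                setAndReplacePropValue(dir, key, value); // the method from the question
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
    }

    void shutdown() {
        writer.shutdown();
    }
}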
Hope this helps.

Usefulness of DELETE_ON_CLOSE

There are many examples on the internet showing how to use StandardOpenOption.DELETE_ON_CLOSE, such as this:
Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE);
Other examples similarly use Files.newOutputStream(..., StandardOpenOption.DELETE_ON_CLOSE).
I suspect all of these examples are probably flawed. The purpose of writing a file is that you're going to read it back at some point; otherwise, why bother writing it? But wouldn't DELETE_ON_CLOSE cause the file to be deleted before you have a chance to read it?
If you create a work file (to work with large amounts of data that are too large to keep in memory) then wouldn't you use RandomAccessFile instead, which allows both read and write access? However, RandomAccessFile doesn't give you the option to specify DELETE_ON_CLOSE, as far as I can see.
So can someone show me how DELETE_ON_CLOSE is actually useful?
First of all, I agree with you: in the Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE) example the use of DELETE_ON_CLOSE is meaningless. After a (not so intense) search through the internet, the only example I could find which shows the usage as mentioned was the one from which you might have got it (http://softwarecave.org/2014/02/05/create-temporary-files-and-directories-using-java-nio2/).
This option is not intended to be used with Files.write(...) only. The API makes it quite clear:
This option is primarily intended for use with work files that are used solely by a single instance of the Java virtual machine. This option is not recommended for use when opening files that are open concurrently by other entities.
Sorry, I can't give you a meaningful short example, but think of such a file like a swap file/partition used by an operating system: the current JVM needs to temporarily store data on disc, and after shutdown the data is of no use anymore. As a practical example, it is similar to a JEE application server which might decide to serialize some entities to disc to free up memory.
edit Maybe the following (oversimplified) code can be taken as an example to demonstrate the principle. (So please: nobody should start a discussion about how this "data management" could be done differently, that using a fixed temporary filename is bad, and so on, ...)
in the try-with-resources block you need, for some reason, to externalize data (the reasons are not the subject of the discussion)
you have random read/write access to this externalized data
this externalized data is only of use inside the try-with-resources block
with the StandardOpenOption.DELETE_ON_CLOSE option you don't need to handle the deletion after use yourself; the JVM will take care of it (the limitations and edge cases are described in the API)
static final int RECORD_LENGTH = 20;
static final String RECORD_FORMAT = "%-" + RECORD_LENGTH + "s";

// add exception handling, left out only for the example
public static void main(String[] args) throws Exception {
    EnumSet<StandardOpenOption> options = EnumSet.of(
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE,
            StandardOpenOption.READ,
            StandardOpenOption.DELETE_ON_CLOSE
    );
    Path file = Paths.get("/tmp/external_data.tmp");
    try (SeekableByteChannel sbc = Files.newByteChannel(file, options)) {
        // during your business processing the below two cases might happen
        // several times in random order

        // example of huge datastructure to externalize
        String[] sampleData = {"some", "huge", "datastructure"};
        for (int i = 0; i < sampleData.length; i++) {
            byte[] buffer = String.format(RECORD_FORMAT, sampleData[i])
                    .getBytes();
            ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
            sbc.position(i * RECORD_LENGTH);
            sbc.write(byteBuffer);
        }

        // example of processing which needs the externalized data
        Random random = new Random();
        byte[] buffer = new byte[RECORD_LENGTH];
        ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
        for (int i = 0; i < 10; i++) {
            sbc.position(RECORD_LENGTH * random.nextInt(sampleData.length));
            sbc.read(byteBuffer);
            byteBuffer.flip();
            System.out.printf("loop: %d %s%n", i, new String(buffer));
        }
    }
}
DELETE_ON_CLOSE is intended for working temp files.
If you need to perform some operation whose data must be temporarily stored in a file, but you don't need the file outside of the current execution, DELETE_ON_CLOSE is a good solution for that.
An example is when you need to store information that can't be kept in memory, for example because it is too large.
Another example is when you need to store the information temporarily and only need it at a later moment, and you don't want to occupy memory for it.
Imagine also a situation in which a process needs a lot of time to complete. You store the information in a file and only use it later (perhaps many minutes or hours later). This guarantees that no memory is used for that information while you don't need it.
DELETE_ON_CLOSE tries to delete the file when you explicitly close it by calling close(), or when the JVM shuts down if it was not manually closed before.
Here are two possible ways it can be used:
1. When calling Files.newByteChannel
This method returns a SeekableByteChannel suitable for both reading and writing, in which the current position can be modified.
Seems quite useful for situations where some data needs to be stored out of memory for read/write access and doesn't need to be persisted after the application closes.
2. Write to a file, read back, delete:
An example using an arbitrary text file:
Path p = Paths.get("C:\\test", "foo.txt");
System.out.println(Files.exists(p));
try {
    Files.createFile(p);
    System.out.println(Files.exists(p));
    try (BufferedWriter out = Files.newBufferedWriter(p, Charset.defaultCharset(), StandardOpenOption.DELETE_ON_CLOSE)) {
        out.append("Hello, World!");
        out.flush();
        try (BufferedReader in = Files.newBufferedReader(p, Charset.defaultCharset())) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
System.out.println(Files.exists(p));
This outputs (as expected):
false
true
Hello, World!
false
This example is obviously trivial, but I imagine there are plenty of situations where such an approach may come in handy.
However, I still believe the old File.deleteOnExit() method may be preferable, as you won't need to keep the output stream open for the duration of any read operations on the file.
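For comparison, here is a minimal sketch of that deleteOnExit() approach: the writer can be closed before reading, and the file is removed on normal JVM termination.
File temp = File.createTempFile("work", ".tmp");
temp.deleteOnExit(); // deletion deferred until the JVM exits normally
try (BufferedWriter out = new BufferedWriter(new FileWriter(temp))) {
    out.write("Hello, World!");
}
// the stream is closed, yet the file still exists and can be re-opened
try (BufferedReader in = new BufferedReader(new FileReader(temp))) {
    System.out.println(in.readLine());
}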

Why is InputStream.available() so time consuming?

I have implemented my own class to read pcap files. (Binary files, i.e. tcpdump, wireshark)
public class PcapReader implements Iterator<PcapPacket> {
    private InputStream is;

    public PcapReader(File file) throws FileNotFoundException, IOException {
        is = new DataInputStream(
                new BufferedInputStream(
                        new FileInputStream(file)));
    }

    @Override
    public boolean hasNext() {
        try {
            return (is.available() > 0);
        } catch (IOException e) {
            return false;
        }
    }

    // pseudo code!
    @Override
    public PcapPacket next() {
        is.read(header);
        is.read(body);
        return new PcapPacket(header, body);
    }

    // more code here
}
Then I use it like this:
PcapReader reader = new PcapReader(file);
while (reader.hasNext()) {
PcapPacket pcapPacket = reader.next();
//process packet
}
The file under test is 190 MB. I also use JVisualVM to profile.
hasNext() is called 1.7 million times, taking 7.7 seconds in total
next() is called the same number of times, taking 3.6 seconds
My main question is: why is hasNext() so time consuming in absolute terms, and also twice as expensive as next()?
When you call is.available() in your hasNext() method, it goes down to the FileInputStream.available() implementation. This is a native method, as one may see from the FileInputStream source code.
In the end, this is indeed a time-consuming operation, as the operating system's implementation of the file operations has to check ahead whether more data is available to be read. So it will effectively do a read operation without advancing the file pointer (or advancing it and moving it back), just to check whether there is a "next" byte.
I'm sure that the internal (native) implementation of the available() method is not something as simple as return availableSize;, but more complicated: the stream counts the available data using OS APIs, especially, for example, for log files that are being written while the stream reads them.
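If all you need is a cheaper end-of-file check, one possible sketch (relying on the BufferedInputStream already in the question's stream chain, which supports mark/reset) probes a single byte from the buffer instead of calling available():
@Override
public boolean hasNext() {
    try {
        is.mark(1);        // remember the current position
        int b = is.read(); // usually served from BufferedInputStream's buffer, no native call
        is.reset();        // rewind so next() sees this byte again
        return b != -1;
    } catch (IOException e) {
        return false;
    }
}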
I have implemented my own class to read pcap files.
Because you're not using jNetPcap, or because you are using jNetPcap but need something that can read from a File?
If the latter, you probably want to use a pattern other than one that has a "more data is available" method and a separate "so read that data" method; instead, use something that reads the data and either returns a "packet available"/"end of file"/"error" indication or throws an exception for one or both of the latter conditions (DataInputStream appears to throw exceptions for both I/O errors and EOF, so it might make sense to do the same for your class); a sketch of that pattern follows at the end of this answer.
Yeah, that means it can't be an Iterator, but maybe Iterators weren't originally intended to represent records in a sequential file (besides, if you really want it to be an Iterator, what are you going to do about the remove method?).
And if you can avoid needing to read from a File, you could then use jNetPcap's own routines for reading capture files, which, in libpcap 1.1.0 and later, can also read some pcap-ng files.
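To illustrate the read-and-signal pattern suggested above, here is a minimal sketch; the dis field is assumed to be the question's DataInputStream, and PCAP_HEADER_LENGTH plus the capturedLengthFrom() helper are hypothetical stand-ins for the real pcap record layout:
// Returns the next packet, or null at a clean end of file.
public PcapPacket readPacket() throws IOException {
    byte[] header = new byte[PCAP_HEADER_LENGTH];
    try {
        dis.readFully(header);
    } catch (EOFException eof) {
        return null; // no more packets
    }
    byte[] body = new byte[capturedLengthFrom(header)]; // parse the captured length from the header
    dis.readFully(body); // an EOFException here would mean a truncated file
    return new PcapPacket(header, body);
}
Callers then just loop with for (PcapPacket p; (p = reader.readPacket()) != null; ) { ... } and never need a look-ahead call.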

FileOutputStream does not create file

I actually checked other posts that could be related to this and I couldn't find any answer to my question, so I had to create this new one:
The file does not get created in the given location with this code:
File as = new File("C:\\Documents and Settings\\<user>\\Desktop\\demo1\\One.xls");
if (!as.exists()) {
    as.createNewFile();
}
FileOutputStream fod = new FileOutputStream(as);
BufferedOutputStream dob = new BufferedOutputStream(fod);
byte[] asd = {65, 22, 123};
byte a1 = 87;
dob.write(asd);
dob.write(a1);
dob.flush();
if (dob != null) {
    dob.close();
}
if (fod != null) {
    fod.close();
}
The code runs fine and I don't get any FileNotFoundException!!
Is there anything that I'm missing here?
You can rewrite your code like this:
BufferedOutputStream dob = null;
try {
    File file = new File("C:\\Documents and Settings\\<user>\\Desktop\\demo1\\One.xls");
    System.out.println("file exists: " + file.exists());
    FileOutputStream fod = new FileOutputStream(file);
    System.out.println("file created: " + file.exists());
    dob = new BufferedOutputStream(fod);
    byte[] asd = {65, 22, 123};
    byte a1 = 87;
    dob.write(asd);
    dob.write(a1);
    //dob.flush();
}
catch (Exception ex) {
    ex.printStackTrace();
}
finally {
    if (dob != null) {
        dob.close();
    }
}
In this case it is only necessary to call the topmost stream handler close() method - the BufferedOutputStream's one:
Closes this output stream and releases any system resources associated with the stream.
The close method of FilterOutputStream calls its flush method, and then calls the close method of its underlying output stream.
so, the dob.flush() in the try block is commented out, because the dob.close() line in the finally block flushes the stream. Also, it releases the system resources (e.g. "closes the file"), as stated in the apidoc quote above. Using the finally block is a good practice:
The finally block always executes when the try block exits. This ensures that the finally block is executed even if an unexpected exception occurs. But finally is useful for more than just exception handling — it allows the programmer to avoid having cleanup code accidentally bypassed by a return, continue, or break. Putting cleanup code in a finally block is always a good practice, even when no exceptions are anticipated.
The FileOutputStream constructor creates an empty file on the disk:
Creates a file output stream to write to the file represented by the specified File object. A new FileDescriptor object is created to represent this file connection.
First, if there is a security manager, its checkWrite method is called with the path represented by the file argument as its argument.
If the file exists but is a directory rather than a regular file, does not exist but cannot be created, or cannot be opened for any other reason then a FileNotFoundException is thrown.
Where a FileDescriptor is:
Instances of the file descriptor class serve as an opaque handle to the underlying machine-specific structure representing an open file, an open socket, or another source or sink of bytes. The main practical use for a file descriptor is to create a FileInputStream or FileOutputStream to contain it.
Applications should not create their own file descriptors.
This code should either produce a file or throw an exception. You have even confirmed that none of the conditions for throwing an exception are met, e.g. you are replacing the <user> string and the demo1 directory exists. Please rewrite this to point at a new empty file and run it.
If it still behaves the same, then unless I have missed something this might be a bug. In that case, add this line to the code and post the output:
System.out.println(System.getProperty("java.vendor")+" "+System.getProperty("java.version"));
Judging from the path, I'd say you are using Windows, am I right? Which version?
Then it means there is already a file with that name in your directory.
