Reduce number of open files in Java code

Hi, I have some code that uses this block:
RandomAccessFile file = new RandomAccessFile("some file", "rw");
FileChannel channel = file.getChannel();
// some code
String line = "some data";
ByteBuffer buf = ByteBuffer.wrap(line.getBytes());
channel.write(buf);
channel.close();
file.close();
but the specifics of the application are that I have to generate a large number of temporary files, more than 4000 on average (used for Hive inserts into a partitioned table).
The problem is that sometimes I catch this exception
Failed with exception Too many open files
while the app is running.
I wonder if there is any way to tell the OS that the file is already closed and no longer used, and why
channel.close();
file.close();
do not reduce the number of open files. Is there any way to do this in Java code?
I have already increased the max number of open files in
#/etc/sysctl.conf:
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.ipc.somaxconn=8096
Update:
I tried to isolate the problem, so I split the code to investigate each part of it (create files, upload to Hive, delete files).
Using the 'File' or 'RandomAccessFile' class fails with the exception "Too many open files".
Finally, I used this code:
FileOutputStream s = null;
FileChannel c = null;
try {
    s = new FileOutputStream(filePath);
    c = s.getChannel();
    // do writes (FileChannel.write takes a ByteBuffer, not a String)
    c.write(ByteBuffer.wrap("some data".getBytes()));
    c.force(true);
    s.getFD().sync();
} catch (IOException e) {
    // handle exception
} finally {
    if (c != null)
        c.close();
    if (s != null)
        s.close();
}
And this works with large numbers of files (tested on 20K files of 5KB each). The code itself does not throw the exception the way the previous two classes did.
But the production code (with Hive) still hit the exception, and it appears that the Hive connection through JDBC is the reason for it.
I will investigate further.
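Note: JDBC objects (Connection, Statement, ResultSet) hold sockets, and therefore file descriptors, too, and leaking them produces the same "Too many open files" error. A minimal sketch of closing them with try-with-resources; the JDBC URL and SQL here are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class HiveInsertExample {
    static void runInsert(String jdbcUrl, String sql) throws SQLException {
        // Both resources are closed automatically, in reverse order,
        // even if execute() throws, so the socket descriptor is released.
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}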

The number of open file handles that can be used by the OS is not the same thing as the number of file handles that can be opened by a single process. Most Unix systems restrict the number of file handles per process. Most likely it's something like 1024 file handles for your JVM.
a) You need to set the ulimit in the shell that launches the JVM to some higher value. (Something like 'ulimit -n 4000')
b) You should verify that you don't have any resource leaks that are preventing your files from being 'finalized'.

Make sure to use a finally {} block. If an exception occurs for some reason, the close will never happen in the code as written.
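On Java 7 and later, try-with-resources gives the same guarantee with less code; a minimal sketch of the snippet from the question restructured that way:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public final class WriteExample {
    static void writeLine(String path, String line) throws IOException {
        // The file and its channel are closed automatically, in reverse
        // order, whether or not the write throws.
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileChannel channel = file.getChannel()) {
            channel.write(ByteBuffer.wrap(line.getBytes()));
        }
    }
}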

Is this the exact code? I can think of one scenario where you open all the files in a loop but wrote the code to close all of them only at the end, which would cause this problem. Please post the full code.

Related

Creating mp4 file doesn't remove tmp files

I'm trying to write an InputStream that is an mp4 that I get from calling an external SOAP service. When I do so, it always generates tmp files in my chosen temporary directory (java.io.tmpdir) that aren't removable and stay there after the writing is done.
Writing images that I also get from the SOAP service works normally, without the permanent tmp files in the directory. I'm using Java 1.8 and Spring Boot.
This is what I'm doing:
File targetFile = new File("D:/archive/video.mp4");
targetFile.getParentFile().mkdirs();
targetFile.setWritable(true);
InputStream inputStream = filesToWrite.getInputStream();
OutputStream outputStream = new FileOutputStream(targetFile);
try {
    int byteRead;
    while ((byteRead = inputStream.read()) != -1) {
        outputStream.write(byteRead);
    }
} catch (IOException e) {
    logger.fatal("Error# SaveFilesThread for guid: " + guid, e);
} finally {
    try {
        inputStream.close();
        outputStream.flush();
        outputStream.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I also tried:
byte data[] = IOUtils.toByteArray(inputStream);
Path file = Paths.get("video.mp4");
Files.write(file, data);
And from apache commons IO:
FileUtils.copyInputStreamToFile(initialStream, targetFile);
When your code starts, the damage is already done. Your code is not the source of the temporary files (it's a ton of work for something that could be done much more simply, though; see below); it's the framework that ends up handing you that filesToWrite variable.
It is somewhat likely that you can hook in at an earlier point and get the raw InputStream representing the socket or HTTP connection, and start saving the files straight from there. Alternatively, perhaps filesToWrite has a way to get at the files themselves, in which case you can just move them into place instead of copying them over.
But your code to do this is a mess: it has bad exception handling, leaks resources, is way too much code for a simple job, and is possibly 2000x to 10000x slower than needed depending on your hard disk (I'm not exaggerating; calling single-byte read() on unbuffered streams is thousands of times slower!).
// add `throws IOException` to your method signature.
// it saves files, it's supposed to throw IOException,
// 'doing I/O' is in the very definition of your method!
try (InputStream in = filesToWrite.getInputStream();
     OutputStream out = new FileOutputStream(targetFile)) {
    in.transferTo(out);
}
That's it. That solves all the problems: no leaks, no speed loss, a tiny amount of code, and it fixes the deplorable error handling (which, here, is: log something to the log, then print something to standard out, then potentially leak a bunch of resources, then don't tell the calling code anything went wrong and return exactly as if the copy operation succeeded).
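One caveat: InputStream.transferTo(OutputStream) only exists since Java 9, and the question mentions Java 1.8. On Java 8 the same idea is a short buffered loop (or Files.copy(in, path)); a minimal sketch:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class CopyExample {
    // Java 8 equivalent of in.transferTo(out): copy in big chunks,
    // never one byte at a time.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}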

How to check that file is opened by another process in Java? [duplicate]

I need to write a custom batch file renamer. I've got the bulk of it done, except I can't figure out how to check whether a file is already open. I'm just using the java.io.File class, and there is a canWrite() method, but that doesn't seem to test whether the file is in use by another program. Any ideas on how I can make this work?
Using the Apache Commons IO library...
boolean isFileUnlocked = false;
try {
    org.apache.commons.io.FileUtils.touch(yourFile);
    isFileUnlocked = true;
} catch (IOException e) {
    isFileUnlocked = false;
}

if (isFileUnlocked) {
    // Do stuff you need to do with a file that is NOT locked.
} else {
    // Do stuff you need to do with a file that IS locked.
}
(The Q&A is about how to deal with Windows "open file" locks ... not how to implement this kind of locking portably.)
This whole issue is fraught with portability issues and race conditions:
You could try to use FileLock, but it is not necessarily supported for your OS and/or filesystem.
It appears that on Windows you may be unable to use FileLock if another application has opened the file in a particular way.
Even if you did manage to use FileLock or something else, you've still got the problem that something may come in and open the file between you testing the file and doing the rename.
A simpler though non-portable solution is to just try the rename (or whatever it is you are trying to do) and diagnose the return value and / or any Java exceptions that arise due to opened files.
Notes:
If you use the Files API instead of the File API you will get more information in the event of a failure.
On systems (e.g. Linux) where you are allowed to rename a locked or open file, you won't get any failure result or exceptions. The operation will just succeed. However, on such systems you generally don't need to worry if a file is already open, since the OS doesn't lock files on open.
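To illustrate the "just try the operation and diagnose the result" approach with the Files API (the paths here are placeholders): unlike File.renameTo's bare boolean, Files.move throws a typed exception you can inspect.
import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class RenameCheckExample {
    // Attempts the rename and reports why it failed, if it failed.
    static boolean tryRename(Path source, Path target) throws IOException {
        try {
            Files.move(source, target);
            return true;
        } catch (FileSystemException e) {
            // On Windows this typically means another process has the file open.
            System.err.println("Rename failed: " + e.getReason());
            return false;
        }
    }
}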
// To check whether a file is open or not (not for .txt files):
// the file we want to check
String fileName = "C:\\Text.xlsx";
File file = new File(fileName);

// try to rename the file to its own name
File sameFileName = new File(fileName);
if (file.renameTo(sameFileName)) {
    // the rename succeeded
    System.out.println("file is closed");
} else {
    // the file did not accept the renaming operation
    System.out.println("file is opened");
}
On Windows I found the answer https://stackoverflow.com/a/13706972/3014879 using
fileIsLocked = !file.renameTo(file)
most useful, as it avoids false positives when processing write-protected (or read-only) files.
org.apache.commons.io.FileUtils.touch(yourFile) doesn't check if your file is open or not. Instead, it changes the timestamp of the file to the current time.
I used IOException and it works just fine:
try {
    String filePath = "C:\\sheet.xlsx";
    // note: opening a FileWriter truncates the file if it is NOT locked
    FileWriter fw = new FileWriter(filePath);
    fw.close();
} catch (IOException e) {
    System.out.println("File is open");
}
I don't think you'll ever get a definitive solution for this, the operating system isn't necessarily going to tell you if the file is open or not.
You might get some mileage out of java.nio.channels.FileLock, although the javadoc is loaded with caveats.
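A minimal sketch of that approach, with the usual caveats from the javadoc: tryLock may not be supported on every OS/filesystem, and on many platforms the lock is only advisory:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public final class LockCheckExample {
    // Returns true if we could briefly acquire an exclusive lock on the file.
    static boolean isLockable(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock(); // null: another process holds a lock
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        }
    }
}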
Hi, I really hope this helps.
I tried all the options before and none really worked on Windows. The only thing that helped me accomplish this was trying to move the file, even to the same place, under an ATOMIC_MOVE. If the file is being written by another program or Java thread, this will definitely produce an exception.
try {
    Files.move(Paths.get(currentFile.getPath()),
            Paths.get(currentFile.getPath()), StandardCopyOption.ATOMIC_MOVE);
    // do your stuff here, since the file is not being written by another program
} catch (Exception e) {
    // do not write, since the file is being written by another program
}
If the file is in use, new FileOutputStream(file) throws java.io.FileNotFoundException with 'The process cannot access the file because it is being used by another process' in the exception message.

How to test if a file is "complete" (completely written) with Java

Let's say you had an external process writing files to some directory, and you had a separate process periodically trying to read files from this directory. The problem to avoid is reading a file that the other process is currently in the middle of writing out, so it would be incomplete. Currently, the process that reads uses a minimum file age timer check, so it ignores all files unless their last modified date is more than XX seconds old.
I'm wondering if there is a cleaner way to solve this problem. If the filetype is unknown (could be a number of different formats) is there some reliable way to check the file header for the number of bytes that should be in the file, vs the number of bytes currently in the file to confirm they match?
Thanks for any thoughts or ideas!
The way I've done this in the past is that the process writing the file writes to a "temp" file, and then moves the file to the read location when it has finished writing the file.
So the writing process would write to info.txt.tmp. When it's finished, it renames the file to info.txt. The reading process then just had to check for the existence of info.txt - and it knows that if it exists, it has been written completely.
Alternatively you could have the write process write info.txt to a different directory, and then move it to the read directory if you don't like using weird file extensions.
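A minimal sketch of that writer-side convention using NIO (the paths are made up for illustration); ATOMIC_MOVE makes the move fail outright if the filesystem cannot perform it atomically:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public final class PublishExample {
    static void publish(byte[] data) throws IOException {
        Path tmp = Paths.get("/incoming/info.txt.tmp");  // hypothetical paths
        Path done = Paths.get("/incoming/info.txt");
        Files.write(tmp, data);                          // readers ignore *.tmp
        // The rename is atomic: readers see either no file or a complete one.
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }
}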
You could use an external marker file. The writing process could create a file XYZ.lock before it starts creating file XYZ, and delete XYZ.lock after XYZ is completed. The reader would then easily know that it can consider a file complete only if the corresponding .lock file is not present.
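The reader-side check then reduces to an existence test; a small sketch (the file naming is illustrative):
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class LockMarkerExample {
    // XYZ is considered complete only once the writer has deleted XYZ.lock.
    static boolean isComplete(Path dataFile) {
        Path marker = Paths.get(dataFile.toString() + ".lock");
        return Files.exists(dataFile) && !Files.exists(marker);
    }
}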
I had no option of using temp markers etc., as the files are being uploaded by clients over keypair SFTP, and they can be very large in size.
It's quite hacky, but I compare the file size before and after sleeping a few seconds.
It's obviously not ideal to block the thread, but in our case it is merely running as a background system process, so it seems to work fine:
private boolean isCompletelyWritten(File file) throws InterruptedException {
    long fileSizeBefore = file.length();
    Thread.sleep(3000);
    long fileSizeAfter = file.length();
    System.out.println("comparing file size " + fileSizeBefore + " with " + fileSizeAfter);
    return fileSizeBefore == fileSizeAfter;
}
Note: as mentioned below, this might not work on Windows. This was used in a Linux environment.
One simple solution I've used in the past for this scenario with Windows is to use boolean File.renameTo(File) and attempt to move the original file to a separate staging folder:
boolean success = potentiallyIncompleteFile.renameTo(stagingAreaFile);
If success is false, then the potentiallyIncompleteFile is still being written to.
This is possible to do using the Apache Commons IO library's FileUtils.copyFile() method. If you try to copy the file and get an IOException, it means that the file is not completely saved.
Example:
public static void copyAndDeleteFile(File file, String destinationPath, long delayThreadPeriod) {
    try {
        FileUtils.copyFile(file, new File(destinationPath));
    } catch (IOException e) {
        e.printStackTrace();
        // the copy failed, so wait and retry until the writer has finished
        try { Thread.sleep(delayThreadPeriod); } catch (InterruptedException ie) { return; }
        copyAndDeleteFile(file, destinationPath, delayThreadPeriod);
    }
}
Or periodically check, with some delay, the size of the folder that contains this file:
FileUtils.sizeOfDirectory(folder);
Even if the number of bytes is equal, the content of the files may be different.
So I think you have to compare the old and the new file byte by byte.
Two options that seem to solve this issue:
The best option: the writer process notifies the reading process somehow that the writing has finished.
Write the file to {id}.tmp, then rename it to {id}.java when finished, and have the reading process run only on *.java files. Renaming takes much less time, and the chance that the two processes touch the file at the same time decreases.
First, there's Why doesn't OS X lock files like windows does when copying to a Samba share?, but that's a variation of what you're already doing.
As far as reading arbitrary files and looking for sizes, some formats have that information and some do not, but even those that do have no common way of representing it. You would need specific knowledge of each format, and would have to manage each one independently.
If you absolutely must act on the file the "instant" it's done, then your writing process would need to send some kind of notification. Otherwise, you're pretty much stuck polling the files, and reading the directory is quite cheap in terms of I/O compared to reading random blocks from random files.
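If the polling itself bothers you, java.nio.file.WatchService replaces the periodic directory scan with OS change notifications. Note it only tells you a file was created or modified, not that the writer has finished, so it combines best with the rename convention described above. A minimal sketch:
import java.nio.file.*;

public final class DirectoryWatchExample {
    // Blocks forever, printing the name of every file created in dir.
    static void watch(Path dir) throws Exception {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // blocks until an event arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println("created: " + event.context());
                }
                key.reset(); // re-arm the key for further events
            }
        }
    }
}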
One more method to test that a file is completely written:
private void waitUntilIsReadable(File file) throws InterruptedException {
    boolean isReadable = false;
    int loopsNumber = 1;
    while (!isReadable && loopsNumber <= MAX_NUM_OF_WAITING_60) {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            log.trace("InputStream readable. Available: {}. File: '{}'",
                    in.available(), file.getAbsolutePath());
            isReadable = true;
        } catch (Exception e) {
            log.trace("InputStream is not readable yet. File: '{}'", file.getAbsolutePath());
            loopsNumber++;
            TimeUnit.MILLISECONDS.sleep(1000);
        }
    }
}
Use this for Unix if you are transferring files using FTP or WinSCP:
public static void isFileReady(File entry) throws Exception {
    long realFileSize = entry.length();
    long currentFileSize = 0;
    do {
        try (FileInputStream fis = new FileInputStream(entry)) {
            currentFileSize = 0;
            byte[] b = new byte[1024];
            int nResult;
            // count the bytes actually readable; read() returning -1 means EOF
            while ((nResult = fis.read(b)) != -1) {
                currentFileSize += nResult;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("currentFileSize=" + currentFileSize + ", realFileSize=" + realFileSize);
    } while (currentFileSize != realFileSize);
}

Unable to access Java-created file -- sometimes

In Java, I'm working with code running under WinXP that creates a file like this:
public synchronized void store(Properties props, byte[] data) {
    try {
        File file = filenameBasedOnProperties(props);
        if (file.exists()) {
            return;
        }
        File temp = File.createTempFile("tempfile", null);
        FileOutputStream out = new FileOutputStream(temp);
        out.write(data);
        out.flush();
        out.close();
        file.getParentFile().mkdirs();
        temp.renameTo(file);
    } catch (IOException ex) {
        // Complain and whine and stuff
    }
}
Sometimes, when a file is created this way, it's just about totally inaccessible from outside the code (though the code responsible for opening and reading the file has no problem), even when the application isn't running. When accessed via Windows Explorer, I can't move, rename, delete, or even open the file. Under Cygwin, I get the following when I ls -l the directory:
ls: cannot access [big-honkin-filename]
total 0
?????????? ? ? ? ? ? [big-honkin-filename]
As implied, the filenames are big, but under the 260-character max for XP (though they are slightly over 200 characters).
To further add to the sense that my computer just wants me to feel stupid, sometimes the files created by this code are perfectly normal. The only pattern I've spotted is that once one file in the directory "locks", the rest are screwed.
Anybody ever run into something like this before, or have any insights into what's going on here?
Make sure you always close the stream in a finally block. In your case if an exception is thrown the stream might not get closed and will leak a file handle. You could use procexp from SysInternals to see which process holds the handle to the file.
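For illustration, the write portion of the method from the question restructured with try-with-resources, so the handle is released even when write() throws:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public final class StoreExample {
    static File writeTempFile(byte[] data) throws IOException {
        File temp = File.createTempFile("tempfile", null);
        // The stream is closed even if write() throws, so no stray handle
        // can keep the file locked afterwards.
        try (FileOutputStream out = new FileOutputStream(temp)) {
            out.write(data);
        }
        return temp;
    }
}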
Although, by definition, NTFS should handle path length up to 2^15-1, in practice the length of paths is limited to 255.
You can create files with a longer path name (filename including parent folder names), but you cannot access them afterwards. The error I get in these cases is that the file could not be found. To get rid of these files, I have to shorten the names of parent folders, until the path length is short enough.

Reading a file and editing it in Java

What I am doing is reading in an HTML file and looking for a specific location in the HTML where I can enter some text.
So I am using a BufferedReader to read in the HTML file and split it by the </HEAD> tag. I want to enter some text before this tag, but I am not sure how to do it. The HTML would then be along the lines of ...(newText)</HEAD>.
Would I need a PrintWriter to the same file, and if so, how would I tell it to write at the correct location?
I am not sure which way would be most efficient to do something like this.
Please help.
Thanks in advance.
Here is part of my java code:
File f = new File("newFile.html");
FileOutputStream fos = new FileOutputStream(f);
PrintWriter pw = new PrintWriter(fos);
BufferedReader read = new BufferedReader(new FileReader("file.html"));
String str;
int i = 0;
boolean found = false;
try {
    while ((str = read.readLine()) != null) {
        String[] data = str.split("</HEAD>");
        if (found == false) {
            pw.write(data[0]);
            System.out.println(data[0]);
            pw.write("</script>");
            found = true;
        }
        if (i < 1) {
            pw.write(data[1]);
            System.out.println(data[1]);
            i++;
        }
        pw.write(str);
        System.out.println(str);
    }
} catch (Exception e) {
    e.printStackTrace();
}
When I do this it gets to a point in the file and I get these errors:
FATAL ERROR: MERLIN: Unable to connect to EDG API,
Cannot find .edg_properties file.,
java.lang.OutOfMemoryError: unable to create new native thread,
Cannot truncate table,
EXCEPTION:Cannot open connection to server: SQLExceptio,
Caught IOException: java.io.IOException: JZ0C0: Connection is already closed, ...
I'm not sure why I get these or what they all mean.
Please help.
Should be pretty easy:
Read file into a String
Split into before/after chunks
Open a temp file for writing
Write before chunk, your text, after chunk
Close up, and move temp file to original
Sounds like you are wondering about the last couple steps in particular. Here is the essential code:
File htmlFile = ...;
...
File tempFile = File.createTempFile("foo", ".html");
FileWriter writer = new FileWriter(tempFile);
writer.write(before);
writer.write(yourText);
writer.write(after);
writer.close();
tempFile.renameTo(htmlFile);
Most people suggest writing to a temporary file and then copying the temporary file over the original on successful completion.
The forum thread has some ideas of how to do it.
GL.
For reading and writing you can use FileReaders/FileWriters or the corresponding IO stream classes.
For the editing, I'd suggest to use an HTML parser to handle the document. It can read the HTML document into an internal datastructure which simplifies your effort to search for content and apply modification. (Most?) Parsers can serialize the document to HTML again.
At least you can then be sure not to corrupt the HTML document structure.
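As a sketch of that approach using the jsoup library (one possible parser choice, not the only one), inserting text into the head of the parsed document and re-serializing it:
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public final class HeadEditExample {
    // Parses the document, appends a script element inside <head>,
    // and writes the re-serialized HTML back out.
    static void addToHead(File htmlFile) throws IOException {
        Document doc = Jsoup.parse(htmlFile, "UTF-8");
        doc.head().appendElement("script").text("// your new text here");
        Files.write(htmlFile.toPath(),
                doc.outerHtml().getBytes(StandardCharsets.UTF_8));
    }
}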
Following up on the list of errors in your edit, a lot of that probably stems from the OutOfMemoryError. That means you simply ran out of memory in the JVM, so Java was unable to allocate objects. This may be caused by a memory leak in your application, or it could simply be that the work you're trying to do transiently needs more memory than you have allocated.
You can increase the amount of memory that the JVM starts up with by providing the -Xmx argument to the java executable, e.g.:
-Xmx1024m
would set the maximum heap size to 1024 megabytes.
The other issues might possibly be caused by this; when objects can't reliably be created or modified, lots of weird things tend to happen. That said, there are a few things you can take action on. In particular, whatever MERLIN is, it looks like it can't do its work because it needs a properties file for EDG, which it's unable to find in the location it's looking in. You'll probably need to either put a config file there, or tell it to look at another location.
The other IOExceptions are fairly self-explanatory. Your program could not establish a connection to the server because of a SQLException (the underlying exception itself will probably be found in the logs); and some other part of the program tried to communicate to a remote machine using a closed connection.
I'd look at fixing the properties file (if it's not a benign error) and the memory issues first, and then seeing if any of the remaining problems still manifest.
