Specifically, I am using Lucene to perform full-text searching. In certain scenarios the index file might become corrupted or simply has not been created yet, at which point I delete the file and rewrite the index to it. My question pertains to the act of deleting and rewriting a file in a multithreaded Java program.
Will synchronizing protect the file while it's being deleted and restored? In other words, will it block another thread that comes along, calls the same method, and tries to begin rewriting while a rewrite is already in progress?
The setDirectory method needs to be run before any other method in the class will work (they throw errors otherwise), so does the way I have set up the synchronization protect me from any multithreading mishaps?
When another thread calls setDirectory while buildCompleteIndex is already in progress, will that thread simply wait for it to finish, after which its check on whether the path exists will pass and it will move on to opening the index?
In Lucene, do I have to synchronize when writing, deleting, or searching the index, or can these tasks be done concurrently?
public void setDirectory(int organizationId) throws IOException {
this.organizationId = organizationId;
File path = new File(INDEX_PATH + "/" + String.valueOf(organizationId));
//If path does not exist, create it and create new index for organization
synchronized(this) {
if(!path.exists()) {
path.mkdirs();
buildCompleteIndex(organizationId, false);
}
}
this.directory = FSDirectory.open(path); //Open directory
}
private void buildCompleteIndex(int organizationId, boolean rebuildDir) {
if(rebuildDir) {
File path = new File(INDEX_PATH + "/" + String.valueOf(organizationId));
try {
Utils.deleteDirectory(path);
} catch (IOException e) {
throw new LuceneIndexException("Error rebuilding index directory.", e);
}
path.mkdirs();
}
List<Tag> tagList = tagDAO.findAll(organizationId);
for(Tag tag : tagList) {
add(tag);
}
}
I am writing a method that takes a string of HTML and writes it to a file. The method should increment the file name if the file already exists. For example, if wordmatch.html already exists, then a new file wordmatch1.html should be created, and so forth.
I have created a method that writes the HTML to a file. I'm working on the last part: incrementally changing the name of the new file if the file already exists.
public void saveContent(WordMatch wordMatch){
logger.info(wordMatch);
try {
File file = new File("wordmatch0.html");
String html = wordMatch.toString();
String cleanedHTML = html.replace("WordMatch(content=","").replace(")","");
logger.info(cleanedHTML);
if (file.createNewFile()) {
System.out.println("File created: " + file.getName());
try {
FileWriter myWriter = new FileWriter("word_match.html");
myWriter.write(cleanedHTML);
myWriter.close();
System.out.println("Successfully wrote to the file.");
} catch (IOException e) {
System.out.println("An error occurred.");
e.printStackTrace();
}
} else {
String fileName = file.getName().toString();
String index = fileName.substring(fileName.indexOf("h") + 1);
index = index.substring(0, index.indexOf("."));
Integer parsedInt = Integer.parseInt(index);
System.out.println(parsedInt);
parsedInt+=1;
fileName = fileName.replace(index,parsedInt.toString());
System.out.println(fileName);
System.out.println("fileName should have been printed by now");
file = new File(fileName);
FileWriter myWriter = new FileWriter(file);
myWriter.write(cleanedHTML);
myWriter.close();
//TODO add method to write file name with new index
System.out.println("File already exists.");
}
} catch (IOException e) {
System.out.println("An error occurred.");
e.printStackTrace();
}
}
Any help with this would be greatly appreciated. Thanks.
A simple approach is to count the files whose names match your file name and then use numberOfFiles to build the new file name:
Stream<Path> files = Files.list(Paths.get("C:\\your\\local\\path"));
long numberOfFiles = files.map(Path.class::cast)
.filter(path -> path.getFileName().toString().startsWith("wordmatch"))
.count();
Beyond that, you still have to handle certain edge cases to get a robust algorithm for managing your files.
A problem that seems trivial but has so many pitfalls.
The algorithm you wrote won't work for the following reasons:
A simple if-else is not enough; you need a loop to find the last index, because many files may already have been created.
The else block tries to parse an index out of a file name that shouldn't have one.
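The loop mentioned above could look roughly like the following. This is only a sketch under assumptions: the class and method names (IndexedFileWriter, writeToNextFreeFile) are made up for illustration, and it deliberately picks the first free index, so gaps left by deleted files get reused.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class IndexedFileWriter {
    // Hypothetical helper: tries base.ext, base1.ext, base2.ext, ... until one can be created.
    static Path writeToNextFreeFile(Path dir, String baseName, String extension, String content)
            throws IOException {
        for (int index = 0; ; index++) {
            String suffix = (index == 0) ? "" : String.valueOf(index);
            Path candidate = dir.resolve(baseName + suffix + extension);
            try {
                // createFile fails if the file already exists, so two callers cannot claim the same name.
                Files.createFile(candidate);
            } catch (FileAlreadyExistsException e) {
                continue; // name is taken, try the next index
            }
            Files.write(candidate, content.getBytes(StandardCharsets.UTF_8));
            return candidate;
        }
    }
}
```

Calling it twice with "wordmatch" and ".html" on an empty directory should produce wordmatch.html and then wordmatch1.html.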
Moreover, there are additional questions that may arise.
What if someone deleted the intermediate indexes and now you have 1 and 4: do you want to go with 2 or 5?
Can anything other than the program delete files from the directory?
Are nested directories possible?
How often are files created?
Can someone manually create a file with a valid name, bypassing the program?
And the more important question is: do you really want to stick to a strict brute-force counter based on the actual files listed in a directory?
If the answer is yes, it would be more reasonable to list the files using File.list(), take the highest index and increment it, instead of trying to create a file and incrementing on failure.
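import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
File directoryPath = new File("path_to_your_dir");
// "dir" is the parent directory, so test the child entry; keep only regular files named word_match<index>.<ext>
FilenameFilter filenameFilter = (dir, name) -> new File(dir, name).isFile()
&& name.startsWith("word_match") && name.contains(".");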
Integer maxIndex = Arrays.stream(directoryPath.list(filenameFilter))
// Take numeric index from the end of the file name, beware of file extensions!
.map(name -> name.substring("word_match".length(), name.lastIndexOf('.')))
.map(Main::parseOrDefault) // Parse it to a number
.max(Integer::compareTo) // Define how to compare them
.orElse(-1);
// -1 is no files, 0 if a file with no index, otherwise max index
System.out.println(maxIndex);
}
private static Integer parseOrDefault(String integer) {
try {
return Integer.valueOf(integer);
} catch (NumberFormatException e) {
return 0;
}
}
}
If the answer is no, you can keep a counter that is persisted somewhere (file system, DB) and incremented on every write, regardless of what is in the directory.
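A persisted counter could be sketched as follows, assuming the counter lives in a small text file. The class name PersistedCounter and the choice of a plain file are illustrative assumptions; the synchronized keyword only guards threads sharing the same instance within one JVM, not other processes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PersistedCounter {
    private final Path counterFile; // hypothetical location of the stored counter

    PersistedCounter(Path counterFile) {
        this.counterFile = counterFile;
    }

    // Reads the last value (0 if the file is missing or empty), increments it, and stores it again.
    synchronized int next() throws IOException {
        int current = 0;
        if (Files.exists(counterFile)) {
            String text = new String(Files.readAllBytes(counterFile), StandardCharsets.UTF_8).trim();
            current = text.isEmpty() ? 0 : Integer.parseInt(text);
        }
        int next = current + 1;
        Files.write(counterFile, String.valueOf(next).getBytes(StandardCharsets.UTF_8));
        return next;
    }
}
```

Because the value is stored on disk, a new PersistedCounter pointing at the same file continues from where the previous one stopped.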
An even simpler approach is to work out how frequently files are created and simply append a timestamp/date-time of suitable precision to the end of each file name.
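As a rough illustration of the timestamp idea, assuming millisecond precision is enough for the rate at which files are created (the class name TimestampedName and the pattern are illustrative choices, not part of the question):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TimestampedName {
    // Year, month, day, then time down to milliseconds; sorts chronologically as plain text.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss-SSS");

    // Hypothetical helper: builds e.g. wordmatch-<timestamp>.html for the given moment.
    static String fileNameFor(String baseName, String extension, LocalDateTime when) {
        return baseName + "-" + STAMP.format(when) + extension;
    }
}
```

In real use the last argument would typically be LocalDateTime.now(). Two files created within the same millisecond would still collide, so this only works when the creation frequency is known to be lower than that.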
I am facing a strange issue in a multithreaded environment, even though this code is quite old and has been working for a long time.
One person complained that even though a file created by one thread exists, another thread reports that no such file exists.
Here is a sample method where the problem occurs.
/**
* Creates a temporary directory. Will be deleted when the program closed if it
* is empty!
*
* @return the temporary directory
* @throws com.osm.exception.WMException if there is a problem
*/
public static File createTempDir() throws WMException {
synchronized (pdm.semaphore) {
try {
final File parent = WMSession.getWMSession().getRootTempDir();
if (!parent.exists()) {
throw new IllegalStateException(parent + " does not exist"); //frozen
}
final File tmpDirectory = File.createTempFile("WMTempDir", "", parent); //frozen
tmpDirectory.delete();
tmpDirectory.mkdirs();
logFileCreated(tmpDirectory);
return tmpDirectory;
} catch (final IOException ioe) {
throw new WMException(ioe);
}
}
}
This code is called from another method, shown below.
void copy_from_db2_using_temp_dir(String phys_name, int storage_type, int store_date, int file_flag,
String directory, String file_name) throws WMException {
final File destDir = new File(directory);
if (!destDir.exists()) {
// no conflict possible since destination directory does not yet exist.
pdm.copy_from_db2(phys_name, storage_type, store_date, file_flag, directory, file_name);
return;
}
final File tmpDir = WMFile.createTempDir();
final File tmpDestFile = new File(tmpDir, file_name);
final File destFile = new File(directory, file_name);
try {
final boolean destFileExistsFlag = destFile.exists();
if (destFileExistsFlag && (file_flag != DEL_OLD)) {
final String msg = pdm.fix_mesg(pdm.catgets("data_mgr", 266, "*** ERROR: Cannot overwrite file '{1}'"),
destFile.getAbsolutePath());
throw new WMException(msg);
}
pdm.copy_from_db2(phys_name, storage_type, store_date, file_flag, tmpDir.getAbsolutePath(), file_name);
if (tmpDestFile.isFile() && destFile.isDirectory()) {
final String msg = pdm.fix_mesg(pdm.catgets("data_mgr", 269, "*** ERROR: Could not remove file '{1}'"),
destFile.getAbsolutePath());
throw new WMException(msg);
}
moveFiles(tmpDestFile, destFile, (file_flag == DEL_OLD));
} finally {
deleteTempDir(tmpDir);
}
}
The other thread/process always finds the condition !parent.exists() to be true, which is incorrect, as it should find the parent directory.
I need suggestions, or any logging that would help determine whether the problem lies in the invocation or in the code itself.
I found something on StackOverflow but am not sure whether it is relevant here:
File.exists() issue in multi threaded env
The if (!parent.exists()) { check in your createTempDir function is triggered because the parent folder of the file that you are trying to create does not exist. This has nothing to do with multithreading.
Example:
Let's say we are trying to create the folder C:\myGame\logs in the createTempDir method. Your code will first test whether C:\myGame exists. If it does not, your code throws an IllegalStateException and does not continue execution.
In other words: the parent directory in which you want to create your logs directory does not exist. This could be due to a number of reasons:
WMSession.getWMSession().getRootTempDir() is not properly configured: it points to a wrong filepath.
Perhaps you don't even need to assert that the parent directory exists. Because you call mkdirs() in your code, all required ancestor-directories for your logs directory will be automatically created.
You can consider the following solutions:
Properly configure WMSession so that it points to the correct folder, assuming that you want the parent directory to exist in advance of your code execution.
Simply don't check whether the parent directory exists, as mkdirs handles this for you.
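The second option might be sketched with java.nio.file as below. This is not the original WMFile code: the class name TempDirs and the Path parameter are assumptions made for illustration, and the sketch creates the parent explicitly rather than relying on a later mkdirs call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TempDirs {
    // Hypothetical replacement for createTempDir: ensures the parent exists, then creates a unique directory in it.
    static Path createTempDir(Path parent) throws IOException {
        // Creates the parent and any missing ancestors; does nothing if it is already a directory.
        Files.createDirectories(parent);
        // Creates a new, uniquely named directory directly, with no delete-then-mkdirs step.
        return Files.createTempDirectory(parent, "WMTempDir");
    }
}
```

Files.createTempDirectory creates the directory in a single call, which also avoids the window between tmpDirectory.delete() and tmpDirectory.mkdirs() in the original method.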
zip4j is a great library, but I ran into a problem when using it in a class that runs in a thread. The zip4j method is called from a class that implements a thread, and sometimes (not always) it leaves files uncompressed, and sometimes there are leftover files with the extension *.zip345. The process also returns net.lingala.zip4j.exception.ZipException: cannot rename modified zip file.
The method zip4jProcess is called from a public method of the class SZipInterface.
SZipInterface is created inside the thread class (e.g. ThreadObj) and instantiated once per thread. No static methods are used.
What is the cause of the problems? How do you fix it? Is zip4j thread safe?
Method:
private int zip4jProcess() {
int status = 0;
if (null != getInFiles() && getInFiles().length > 0) {
for (String file : getInFiles()) {
File sourceFile = new File(file);
ZipFile zipFile = null;
ZipParameters zipParams = new ZipParameters();
if (getPassword() != null
&& !getPassword().trim().equalsIgnoreCase("")) {
zipParams.setPassword(getPassword());
zipParams.setEncryptFiles(true);
zipParams
.setEncryptionMethod(Zip4jConstants.ENC_METHOD_STANDARD);
}
zipParams
.setCompressionLevel(Zip4jConstants.DEFLATE_LEVEL_NORMAL);
if (sourceFile.exists()) {
try {
zipFile = new ZipFile(getZipFileName());
if (zipFile.getFile().exists()) {
zipFile.addFile(sourceFile, zipParams);
if (log.isDebugEnabled()) {
log.debug("Adding: " + sourceFile.getName()
+ " to " + zipFile.getFile().getName()
+ " Pass: " + getPassword());
}
} else {
zipFile.createZipFile(sourceFile, zipParams);
if (log.isDebugEnabled()) {
log.debug("Creating: " + sourceFile.getName()
+ " to " + zipFile.getFile().getName()
+ " Pass: " + getPassword());
}
}
} catch (ZipException e) {
log.error(e);
status = 1;
}
}
}
}
return status;
}
I believe the leftover or uncompressed files appear when multiple threads try to use the same zip file at the same time (probably at zipFile.addFile(...)).
So try handling the addFile differently with concurrency in mind.
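One way to serialize access per zip file is sketched below. It deliberately contains no zip4j calls: ZipFileLocks and withLock are hypothetical names, and the action passed in stands for whatever code adds to or creates the archive. It assumes all threads run in the same JVM and refer to the archive by the same file name string.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ZipFileLocks {
    // One lock object per zip file name, shared by all threads in this JVM.
    private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    // Runs the action while holding the lock for the given zip file name.
    static void withLock(String zipFileName, Runnable action) {
        Object lock = LOCKS.computeIfAbsent(zipFileName, name -> new Object());
        synchronized (lock) {
            action.run();
        }
    }
}
```

In zip4jProcess, the exists/addFile/createZipFile section would go inside the action, with ZipException handled inside it since Runnable cannot throw checked exceptions. Threads writing to different archives would not block each other.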
Their support forum said it's tricky and not currently supported - see the link for the limitations of doing it.
This can be quite tricky to implement, if not impossible to achieve,
especially when using encryption or when compressing the file (and not
just using the store method, which just copies the source file to the
zip without any compression). A current block of file being
compressed/decompressed depends on the previous block. So, if multiple
threads were to read or write, these threads cannot do this process
simultaneously, but have to wait until the block n-1 (if n is the
current block) is read/wrote. So, its as good as running the process
in the same thread.
Writing different files in different threads to a zip file (each
thread handling a unique file in the zip) can be tricky as well. For
example: AES encryption requires a unique number (as part of salt
calculation) for each file in the zip. And another example: if a zip
file is being created and multiple number of files being added (with
compression), then the second thread, which will start writing the
second file to the zip should know exactly at which location in the
zip file to start writing, and this cannot be determined until the
first thread is done writing.
Some compression algorithms, like LZMA/LZMA2, support multithreading.
Unfortunately, these compression methods are not supported by Zip4j at
the moment.
Full text of their response (in case the post gets removed).
When a directory monitored by a WatchService gets deleted, its parent directory does not immediately reflect the deletion in its File's listFiles method and cannot be deleted. Until the entire service is explicitly stopped the consequences for the parent appear to be:
The recommended recursive solution for deleting a non-empty directory failing.
deleteOnExit not being carried out on normal termination
Calls to delete returning false and having no effect on the filesystem.
To demonstrate, this test code:
import java.io.*;
import java.nio.file.*;
class DirectoryTester {
static WatchService watcher;
static {
try{watcher = FileSystems.getDefault().newWatchService();}
catch (IOException e) {e.printStackTrace();}
}
public static void main(String[] args) throws IOException {
String SEPARATE = System.getProperty("file.separator");
String testDirName = System.getProperty("user.dir") + SEPARATE + "testDir";
String subDirName = testDirName + SEPARATE + "subDir";
String fileName = subDirName + SEPARATE +"aFile";
create(fileName);
Paths.get(subDirName).register(watcher, StandardWatchEventKinds.ENTRY_DELETE);
delete(new File(testDirName));
}
static void create(String nameOfFile) throws IOException {
new File(nameOfFile).getParentFile().mkdirs();
Files.createFile(Paths.get(nameOfFile));
System.out.println("Created " + nameOfFile);
}
static void delete(File toDelete) throws IOException {
if (toDelete.isDirectory())
for (File c : toDelete.listFiles())
delete(c);
int numContainedFiles = toDelete.listFiles() != null ? toDelete.listFiles().length : 0;
if (!toDelete.delete()) {
System.out.println("Failed to delete " + toDelete + " containing " + numContainedFiles);
}
else {
System.out.println("Deleted " + toDelete + " containing " + numContainedFiles);
}
}
}
gives the following output on windows, which corresponds with testDir not being deleted on the filesystem.
Created C:\Dropbox\CodeSpace\JavaTestbed\src\testDir\subDir\aFile
Deleted C:\Dropbox\CodeSpace\JavaTestbed\src\testDir\subDir\aFile containing 0
Deleted C:\Dropbox\CodeSpace\JavaTestbed\src\testDir\subDir containing 0
Failed to delete C:\Dropbox\CodeSpace\JavaTestbed\src\testDir containing 1
If I put a breakpoint after the subDir deletion, I can see that it has actually been deleted on the filesystem. Resuming from the breakpoint causes the last deletion to succeed, suggesting that this might be an issue with the visibility of changes made by the watch service thread. Does anyone know what is going on here, and whether it is a bug? What I am actually trying to do is delete monitored directories without stopping the monitoring of other directories. Given that the API does not appear to provide a method for unregistering a path, what other standard Java ways are there of accomplishing this?
possibly related:
http://bugs.sun.com/view_bug.do?bug_id=6972833
The WatchService has an open handle to each watched directory. If a watched directory is deleted, then the WatchService closes the handle so that the directory entry can be removed from the parent directory. A problem arises for utilities and applications that expect to be able to delete the parent directory immediately, as it can take a few milliseconds for the watch service to get the notification and close the handle. If the tool attempts to delete the parent directory during that time, then it will fail. We don't have a solution to this issue at this time.
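One option that may be worth trying is to keep the WatchKey returned by register and cancel it before deleting, since WatchKey.cancel() stops the watch for that one directory while other registrations stay active. The sketch below assumes this; the class name WatchedDirectoryDeleter is made up, and whether cancelling releases the Windows handle quickly enough to avoid the failure described above is an assumption that would need to be verified on the affected system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchKey;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WatchedDirectoryDeleter {
    // Cancels the watch on one directory, then deletes the given tree bottom-up.
    static void cancelAndDelete(WatchKey keyForWatchedDir, Path directoryToDelete) throws IOException {
        keyForWatchedDir.cancel(); // only this registration is cancelled; the WatchService keeps running

        List<Path> entries;
        try (Stream<Path> walk = Files.walk(directoryToDelete)) {
            // Reverse path order puts children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry); // throws with a reason instead of silently returning false
        }
    }
}
```

In the test program above, this would mean storing the result of Paths.get(subDirName).register(...) and passing it along with the testDir path.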
If I do this:
File f = new File("c:\\text.txt");
if (f.exists()) {
System.out.println("File exists");
} else {
System.out.println("File not found!");
}
Then the file gets created and always returns "File exists". Is it possible to check if a file exists without creating it?
EDIT:
I forgot to mention that it's in a for loop. So here's the real thing:
for (int i = 0; i < 10; i++) {
File file = new File("c:\\text" + i + ".txt");
System.out.println("New file created: " + file.getPath());
}
When you instantiate a File, you're not creating anything on disk but just building an object on which you can call some methods, like exists().
That's fine and cheap, don't try to avoid this instantiation.
The File instance has only two fields:
private String path;
private transient int prefixLength;
And here is the constructor :
public File(String pathname) {
if (pathname == null) {
throw new NullPointerException();
}
this.path = fs.normalize(pathname);
this.prefixLength = fs.prefixLength(this.path);
}
As you can see, the File instance is just an encapsulation of the path. Creating it in order to call exists() is the correct way to proceed. Don't try to optimize it away.
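A small check along these lines can confirm the behavior. The class and method names (FileExistsDemo, existsWithoutCreating) are invented for this illustration only.

```java
import java.io.File;

class FileExistsDemo {
    // Builds a File for dir/name and asks the file system whether it exists; nothing is created.
    static boolean existsWithoutCreating(File dir, String name) {
        File candidate = new File(dir, name); // only wraps a path string
        return candidate.exists();            // read-only query against the file system
    }
}
```

Calling this for a name that is not on disk returns false and leaves the directory unchanged; a file only appears once something like createNewFile() or a writer is used.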
Starting from Java 7 you can use java.nio.file.Files.exists:
Path p = Paths.get("C:\\Users\\first.last");
boolean exists = Files.exists(p);
boolean notExists = Files.notExists(p);
if (exists) {
System.out.println("File exists!");
} else if (notExists) {
System.out.println("File doesn't exist!");
} else {
System.out.println("File's status is unknown!");
}
In the Oracle tutorial you can find some details about this:
The methods in the Path class are syntactic, meaning that they operate on the Path instance. But eventually you must access the file system to verify that a particular Path exists, or does not exist. You can do so with the exists(Path, LinkOption...) and the notExists(Path, LinkOption...) methods. Note that !Files.exists(path) is not equivalent to Files.notExists(path). When you are testing a file's existence, three results are possible:
The file is verified to exist.
The file is verified to not exist.
The file's status is unknown. This result can occur when the program does not have access to the file.
If both exists and notExists return false, the existence of the file cannot be verified.
Creating a File instance does not create a file on the file system, so the posted code will do what you require.
The Files.exists method has noticeably poor performance in JDK 8 and can slow an application significantly when used to check files that don't actually exist.
The same applies to Files.notExists, Files.isDirectory and Files.isRegularFile.
According to this, you can use the following instead:
Paths.get("file_path").toFile().exists()