File issues with threading in Tomcat - Java

I have a Tomcat server and a controller that writes the data coming in the request to a file. My doubt is whether multiple threads within the server can write to the same file at the same time and cause issues.
My requirement is that all requests append data to the same file. I am not doing any threading on my end.
My code is as follows:
File file = new File(fileName);
try {
    if (!file.exists()) {
        file.createNewFile();
    }
    InputStream inputStream = request.getInputStream();
    FileWriter fileWriter = new FileWriter(fileName, true);
    BufferedWriter bufferWriter = new BufferedWriter(fileWriter);
    bufferWriter.write(IOUtils.toString(inputStream));
    bufferWriter.flush();
    bufferWriter.close();
} catch (IOException e) {
    e.printStackTrace();
}

There is a standard solution for this kind of issue.
Create a singleton class that is shared between all threads.
This singleton holds a BlockingQueue (e.g. LinkedBlockingQueue) into which all threads put their messages for writing to the single file.
The singleton is itself a Thread, and inside its run() method it constantly takes values from the queue and sequentially writes them to the target file.
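A minimal sketch of that approach (the class name, file path, and daemon setup are illustrative assumptions, not part of the original answer):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FileWriterSingleton extends Thread {

    private static final FileWriterSingleton INSTANCE = new FileWriterSingleton("/tmp/output.txt");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final String fileName;

    private FileWriterSingleton(String fileName) {
        this.fileName = fileName;
        setDaemon(true);
        start();
    }

    public static FileWriterSingleton getInstance() {
        return INSTANCE;
    }

    // Called by request threads; only enqueues, never blocks on file I/O.
    public void submit(String message) throws InterruptedException {
        queue.put(message);
    }

    @Override
    public void run() {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName, true))) {
            while (true) {
                // take() blocks until a message is available, so writes happen
                // strictly one at a time, in arrival order.
                writer.write(queue.take());
                writer.flush();
            }
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}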

My requirement is that all requests append data to the same file
Doing a task for each request (like logging or, in your case, appending text to a file) is best implemented using a filter (javax.servlet.Filter). You then don't have to create a singleton manually, and you can turn the filter on or off depending on whether you need its functionality.
However, you still need to synchronize concurrent access to your file. As Andremoniy pointed out, you can do this using a dedicated Thread, so that your filter does not block the request/response.
EDIT
One thing about the shared object used to write to the file: it is better to store an instance of this object in the javax.servlet.ServletContext than to create a singleton object. This is the standard way to make an object accessible to all other components in a Java web application using servlets.
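A hedged sketch of registering the shared writer in the ServletContext via a listener (reusing the queue-backed writer sketched above; attribute and class names are illustrative):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class WriterContextListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Make the single shared writer visible to all servlets and filters.
        sce.getServletContext().setAttribute("fileWriter", FileWriterSingleton.getInstance());
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // Stop/close the writer here if it holds resources.
    }
}

Any servlet or filter can then fetch the same instance with getServletContext().getAttribute("fileWriter").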

Related

Is it worth to use singleton to load a configuration file?

I have a Java web application. This application loads its configuration file with a singleton, according to the source below.
public class Configuration {
    private static Configuration config;
    private static Properties prop = null;
    private static InputStream input = null;

    private Configuration() throws IOException {
        input = this.getClass().getResourceAsStream("/config.properties");
        prop = new Properties();
        prop.load(input);
        input = new FileInputStream(prop.getProperty("soap.config"));
        prop = new Properties(prop);
        prop.load(new InputStreamReader(input, StandardCharsets.UTF_8));
    }

    public static Configuration getInstance() throws IOException {
        if (config == null) {
            config = new Configuration();
        }
        return config;
    }
}
config.properties (located in the resources folder of the Java project):
soap.config=/home/wildfly/soap.properties
Content of soap.properties file:
server=192.168.1.1
user=John
pass=thepass
Server features:
Total memory: 8 GB RAM, 30% used
1 core, 40 GB hard disk
WildFly server
Linux virtual machine
If I want to change some value in the config file, it is also necessary to restart the application via the WildFly admin console. I think it would be more useful to change config values without restarting the application.
Additionally, the application receives more than a thousand requests a day, and the server status looks fine.
Questions:
Is it beneficial or worth to use singleton to load a configuration file?
The instruction this.getClass().getResourceAsStream("/config.properties") will read the config file and close it immediately afterwards. Is that correct?
Let's break this into two pieces: is a singleton OK for a config file? Kind of. Singletons have some major flaws when it comes to testability. In general it's better to use injection and pass an instance of the class to every place that needs it. A singleton will work, but it will make your testing far harder.
Is there a better way to handle config files so they pick up changes? Yes. Use a WatchService to be notified when the file changes (https://docs.oracle.com/javase/tutorial/essential/io/notification.html). The Configuration class should handle this: when the file changes, it parses the new file and updates itself. This does leave a gap for race conditions, where part of the data is fetched from the old file and part from the new. There are techniques to avoid that, however (providing all the data atomically, or allowing a client to lock the configuration and only updating when it is unlocked, etc.).
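A minimal sketch of the WatchService approach, assuming the config lives at a known filesystem path (not inside a jar); the class and callback names are illustrative:

import java.io.IOException;
import java.nio.file.*;

public class ConfigWatcher implements Runnable {

    private final Path configFile;
    private final Runnable onChange;

    public ConfigWatcher(Path configFile, Runnable onChange) {
        this.configFile = configFile;
        this.onChange = onChange;
    }

    @Override
    public void run() {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            // WatchService registers directories, not single files,
            // so watch the parent and filter events by file name.
            configFile.getParent().register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
            while (true) {
                WatchKey key = watcher.take(); // blocks until an event arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (configFile.getFileName().equals(event.context())) {
                        onChange.run(); // e.g. tell Configuration to reload itself
                    }
                }
                if (!key.reset()) {
                    break; // directory no longer accessible
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}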
The best solution I can think of:
simply give the config another field, lastFileEditedTime, that stores the timestamp of the file that was loaded last
create a static variable, lastFileUpdateCheckedTime, that stores the last check time in milliseconds (System.currentTimeMillis()); if the last check was made more than x seconds ago, check the file again
use both in the getInstance() method: first check against lastFileUpdateCheckedTime, and if that triggers, check against lastFileEditedTime (a sketch follows below)
(you could also make both static or add both to the config, however you like)
This way the system keeps loading updated files, but will not reload too many times per second or hit the filesystem timestamp check too often.
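A minimal sketch of that getInstance() variant, inside the Configuration class from the question; the 5-second check interval and the hard-coded file path are assumptions for illustration:

private static long lastFileUpdateCheckedTime; // last time we checked the disk
private static long lastFileEditedTime;        // lastModified() of the loaded file

public static synchronized Configuration getInstance() throws IOException {
    long now = System.currentTimeMillis();
    // Hit the filesystem at most once every 5 seconds (assumed interval).
    if (config == null || now - lastFileUpdateCheckedTime > 5000) {
        lastFileUpdateCheckedTime = now;
        File file = new File("/home/wildfly/soap.properties"); // the soap.config value
        if (config == null || file.lastModified() > lastFileEditedTime) {
            lastFileEditedTime = file.lastModified();
            config = new Configuration(); // reload from the changed file
        }
    }
    return config;
}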
And to answer your questions:
yes, a singleton is beneficial, because it prevents reloading the file each time, and all parts of your code that use it see (more or less) the same config
no, getResourceAsStream will give you an open stream, and you should close it. Activate compiler warnings; they will show this clearly. Best use try-with-resources for closing, as sketched below.
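A sketch of the constructor rewritten with try-with-resources (getInstance() must then handle or declare the IOException; the helper variable name is mine):

private Configuration() throws IOException {
    Properties bootstrap = new Properties();
    // try-with-resources closes the classpath stream as soon as it is loaded
    try (InputStream in = this.getClass().getResourceAsStream("/config.properties")) {
        bootstrap.load(in);
    }
    prop = new Properties(bootstrap);
    // likewise for the external soap.config file
    try (Reader reader = new InputStreamReader(
            new FileInputStream(bootstrap.getProperty("soap.config")), StandardCharsets.UTF_8)) {
        prop.load(reader);
    }
}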

What is the recommended way to append to files on HDFS?

I'm having trouble figuring out a safe way to append to files in HDFS.
I'm using a small, 3-node Hadoop cluster (CDH v.5.3.9 to be specific). Our process is a data pipeliner which is multi-threaded (8 threads) and it has a stage which appends lines of delimited text to files in a dedicated directory on HDFS. I'm using locks to synchronize access of the threads to the buffered writers which append the data.
My first issue is deciding on the approach generally.
Approach A is to open the file, append to it, then close it for every line appended. This seems slow and would seem to create too many small blocks, or at least I see some such sentiment in various posts.
Approach B is to cache the writers but periodically refresh them to make sure the list of writers doesn't grow unbounded (currently it's one writer per input file processed by the pipeliner). This seems more efficient, but I imagine keeping streams open over a period of time, however controlled, may be an issue, especially for readers of the output files.
Beyond that, my real issues are these two. I am using the Hadoop FileSystem Java API to do the appending and am intermittently getting these two exceptions:
org.apache.hadoop.ipc.RemoteException: failed to create file /output/acme_20160524_1.txt for DFSClient_NONMAPREDUCE_271210261_1 for client XXX.XX.XXX.XX because current leaseholder is trying to recreate file.
org.apache.hadoop.ipc.RemoteException: BP-1999982165-XXX.XX.XXX.XX-1463070000410:blk_1073760252_54540 does not exist or is not under Construction blk_1073760252_54540{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[[DISK]DS-ccdf4e55-234b-4e17-955f-daaed1afdd92:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-1f66db61-759f-4c5d-bb3b-f78c260e338f:NORMAL|RBW]]}
Anyone have any ideas on either of those?
For the first problem, I've tried implementing the logic discussed in this post, but it didn't seem to help.
I'm also interested in the role of the dfs.support.append property, if it is applicable at all.
My code for getting the file system:
userGroupInfo = UserGroupInformation.createRemoteUser("hdfs");
Configuration conf = new Configuration();
conf.set(key1, val1);
...
conf.set(keyN, valN);
fileSystem = userGroupInfo.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws Exception {
        return FileSystem.get(conf);
    }
});
My code for getting the OutputStream:
org.apache.hadoop.fs.Path file = ...

public OutputStream getOutputStream(boolean append) throws IOException {
    OutputStream os = null;
    synchronized (file) {
        if (isFile()) {
            os = (append) ? fs.append(file) : fs.create(file, true);
        } else if (append) {
            // Create the file first, to avoid a "failed to append to non-existent file" exception
            FSDataOutputStream dos = fs.create(file);
            dos.close();
            // or, this can be: fs.createNewFile(file);
            os = fs.append(file);
        }
        // Creating a new file
        else {
            os = fs.create(file);
        }
    }
    return os;
}
I got file appending working with CDH 5.3 / HDFS 2.5.0. My conclusions so far are as follows:
Cannot have one dedicated thread doing appends per file, or multiple threads writing to multiple files, whether we're writing via one and the same instance of the HDFS FileSystem API or via different instances.
Cannot refresh (i.e. close and reopen) the writers; they must stay open.
This last item leads to an occasional, relatively rare ClosedChannelException, which appears to be recoverable (by retrying the append).
We use a single-threaded executor service with a blocking queue (one for appending to all files); one writer per file, and the writers stay open (until the end of processing, when they are closed). A sketch of this setup follows below.
When we upgrade to a CDH newer than 5.3, we'll want to revisit this and see which threading strategy makes sense: one and only one thread, one thread per file, or multiple threads writing to multiple files. Additionally, we'll want to see whether writers can be, or need to be, periodically closed and reopened.
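A hedged sketch of that setup; the class and method names are illustrative, not the poster's actual code, and retry logic for the recoverable ClosedChannelException is omitted:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppender {

    private final FileSystem fs;
    // A single-threaded executor backed by an unbounded queue: appends for
    // all files are queued and executed strictly one at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // One long-lived writer per file; writers stay open until shutdown.
    private final Map<Path, FSDataOutputStream> writers = new ConcurrentHashMap<>();

    public HdfsAppender(FileSystem fs) {
        this.fs = fs;
    }

    public void append(Path file, byte[] line) {
        executor.submit(() -> {
            try {
                FSDataOutputStream out = writers.computeIfAbsent(file, f -> {
                    try {
                        return fs.exists(f) ? fs.append(f) : fs.create(f);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
                out.write(line);
                out.hflush();
            } catch (Exception e) {
                e.printStackTrace(); // retry logic would go here
            }
        });
    }

    public void close() throws Exception {
        executor.shutdown();
        for (FSDataOutputStream out : writers.values()) {
            out.close();
        }
    }
}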
In addition, I have seen the following error, and was able to make it go away by setting 'dfs.client.block.write.replace-datanode-on-failure.policy' to 'NEVER' on the client side.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[XXX.XX.XXX.XX:50010, XXX.XX.XXX.XX:50010], original=[XXX.XX.XXX.XX:50010, XXX.XX.XXX.XX:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969) ~[hadoop-hdfs-2.5.0.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035) ~[hadoop-hdfs-2.5.0.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) ~[hadoop-hdfs-2.5.0.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:532) ~[hadoop-hdfs-2.5.0.jar:?]

Java- using an InputStream as a File

I'm trying to generate a PDF document from an uploaded ".docx" file using JODConverter.
The call to the method that generates the PDF is something like this:
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
I'm using Apache Commons FileUpload to handle uploading the docx file, from which I can get an InputStream object. I'm aware that java.io.File is just an abstract reference to a file in the system.
I want to avoid the disk write (saving the InputStream to disk) and the disk read (reading the saved file back in JODConverter).
Is there any way I can get a File object referring to an input stream? Any other way to avoid disk I/O will also do!
EDIT: I don't care if this ends up using a lot of system memory. The application is going to be hosted on a LAN with very few to zero parallel users.
File-based conversions are faster than stream-based ones (provided by StreamOpenOfficeDocumentConverter), but they require the OpenOffice.org service to run locally and to have the correct permissions on the files.
Try the stream-based overload to avoid writing to disk:
convert(java.io.InputStream inputStream, DocumentFormat inputFormat, java.io.OutputStream outputStream, DocumentFormat outputFormat)
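A hedged sketch of that call, assuming the JODConverter 2.x classes StreamOpenOfficeDocumentConverter and DefaultDocumentFormatRegistry; inputStream comes from FileUpload and outputStream could be the servlet response's stream, so nothing touches the disk on the application side:

DocumentFormatRegistry registry = new DefaultDocumentFormatRegistry();
DocumentFormat docx = registry.getFormatByFileExtension("docx");
DocumentFormat pdf = registry.getFormatByFileExtension("pdf");

OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
try {
    DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
    // Streams in, streams out: no temp file is created by the application.
    converter.convert(inputStream, docx, outputStream, pdf);
} finally {
    connection.disconnect();
}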
There is no way to do this and keep the code solid. For one thing, the convert() method only takes two Files as arguments.
So you would have to extend File, which is possible in theory but very fragile, as you would have to delve into the library code, which can change at any time and break your extended class.
(Well, there is a way to avoid disk writes if you use a RAM-backed filesystem and read/write from that filesystem, of course.)
Chances are that Commons FileUpload has written the upload to the filesystem anyhow.
Check whether your FileItem is an instance of DiskFileItem. If it is, the write implementation of DiskFileItem will try to move the file to the File object you pass. You then aren't causing any extra disk I/O, since the write has already happened.
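An illustrative sketch of that check (the target file name is an assumption):

// If the upload already landed on disk, DiskFileItem.write() moves (renames)
// the temp file instead of copying it, so no extra I/O occurs.
if (fileItem instanceof DiskFileItem && !fileItem.isInMemory()) {
    fileItem.write(new File("document.docx"));
} else {
    // Small uploads may be held in memory; fall back to the stream.
    try (InputStream in = fileItem.getInputStream()) {
        Files.copy(in, Paths.get("document.docx"), StandardCopyOption.REPLACE_EXISTING);
    }
}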

Deny access to a file for all other processes?

I'm writing an application (for educational purposes) which needs to use a database management system (I wrote my own extremely primitive DBMS; it is part of the task). I want to ensure that, at any time while my application is running, the contents of all tables are correct. For that purpose I wrote a method that looks through each file and makes the necessary checks. The problem is that I want to call this method only once, when the application starts, and then deny access to the files to ensure that nobody changes their contents while my program is working.
I use the following approach. When the application starts, I initialize an InputStreamReader and an OutputStreamWriter, store them, and close them only when my application terminates.
Part of initialization method:
FileInputStream fis = new FileInputStream(file);
FileOutputStream fos = new FileOutputStream(file, true);
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
this.tables.get(table).put("fis", fis);
this.tables.get(table).put("fos", fos);
this.tables.get(table).put("isr", isr);
this.tables.get(table).put("osw", osw);
Close method:
try {
    for (Map<String, Object> table_map : tables.values()) {
        OutputStreamWriter osw = (OutputStreamWriter) table_map.get("osw");
        InputStreamReader isr = (InputStreamReader) table_map.get("isr");
        if (osw != null)
            osw.close();
        if (isr != null)
            isr.close();
    }
} catch (IOException e) {
    throw new DBException("Closing error");
}
Partly, this approach works: when I try to modify any of these files using MS Notepad, I get the following error
"The process cannot access the file because it is being used by
another process"
That's what I want to see. But if I use Notepad++, I can make any modifications while my application is running, which is not what I expect. So what can I do to ensure that no other process can modify my files?
I tried to use FileLock, but it denies access only to my own process, if I'm not mistaken.
Sorry for my poor English; I hope you understand my question anyway.
I'm not sure this is a problem worth solving. Whatever approach you take, someone with the right privileges can probably undo your file protection and make changes anyway.
It is best to focus on gracefully handling invalid data and otherwise trusting what is in the file. Adding some kind of integrity check (per row or per table) will make it harder for someone to accidentally or maliciously change your data in a way that still looks "valid"; a sketch follows below.
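An illustrative sketch of a per-row integrity check using CRC32; the column separator and method names are assumptions, not part of the original answer:

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public final class RowChecksum {

    // Append a checksum column when writing a row to the table file.
    public static String sign(String row) {
        CRC32 crc = new CRC32();
        crc.update(row.getBytes(StandardCharsets.UTF_8));
        return row + "|" + Long.toHexString(crc.getValue());
    }

    // Verify and strip the checksum when reading a row back.
    public static String verify(String signedRow) {
        int sep = signedRow.lastIndexOf('|');
        if (sep < 0 || !sign(signedRow.substring(0, sep)).equals(signedRow)) {
            throw new IllegalStateException("Row failed integrity check: " + signedRow);
        }
        return signedRow.substring(0, sep);
    }
}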
If you read the section "Platform dependencies" in the java.nio.channels.FileLock docs, you see that:
FileLocks are not (only) for locking inside one JVM but for all processes on the computer.
File locks (note the different spelling) are greatly platform- and configuration-specific.
So you basically have to ask yourself: what protection do I really need?
If you only want to guard against running your program multiple times on the same data, you can assume that your program "behaves well" and
use FileLocks (see the sketch after this answer), or
use a marker lock file, or
use a "dirty/locked" marker inside the file
If you want to protect against every other program, then you are lost, as you have seen in the Notepad++ scenario: considering all platforms and all the possible ways to circumvent locks, and using Java, you have no chance.
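A hedged sketch of the FileLock option from the list above: a "well-behaved" second instance of the same program fails to acquire the lock and exits (the lock file name is an assumption):

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SingleInstanceGuard {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile("db.lock", "rw");
        FileChannel channel = raf.getChannel();
        FileLock lock = channel.tryLock(); // null if another process holds the lock
        if (lock == null) {
            System.err.println("Another instance is already running.");
            return;
        }
        try {
            // ... run the application against the data files ...
        } finally {
            lock.release();
            raf.close();
        }
    }
}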

Java: How can my two apps access the same file?

I've made two apps designed to run concurrently (I do not want to combine them); one reads from a certain file and the other writes to it. When one or the other is running, there are no errors, but if they are both running I get an "access is denied" error.
Relevant code of the first:
class MakeImage implements Runnable {
    @Override
    public void run() {
        File file = new File("C:/Users/jeremy/Desktop/New folder (3)/test.png");
        while (true) {
            try {
                //make image
                if (image != null) {
                    file.createNewFile();
                    ImageIO.write(image, "png", file);
                    hello.repaint();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
Relevant code of the second:
BufferedImage image = null;
try {
    // Read from a file
    image = ImageIO.read(new File("C:/Users/jeremy/Desktop/New folder (3)/test.png"));
} catch (Exception e) {
    e.printStackTrace();
}
if (image != null) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ImageIO.write(image, "png", baos);
    baos.flush();
    byte[] imageInByte = baos.toByteArray();
    baos.close();
    returns = Base64.encodeBase64String(imageInByte);
}
I looked at this: Java: how to handle two processes trying to modify the same file, but that is about both processes writing to the file, whereas here only one is. I tried the retry-later method suggested in the former's answer without any luck. Any help would be greatly appreciated.
Unless you use OS-level file locking of some sort and check for the locks, you're not going to be able to do this reliably very easily. A fairly reliable way to manage this is to use another file in the directory as a semaphore: "touch" it when you're writing or reading, remove it when you're done, and check for the existence of the semaphore before accessing the file (a sketch follows below). Otherwise you will need to use a database of some sort to store the file lock (guaranteed consistency) and check for it there.
That said, you really should just combine this into one program.
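An illustrative sketch of that marker-file idea (the .lock file name is an assumption; File.createNewFile() is atomic, so only one process can "hold" the marker at a time):

File marker = new File("C:/Users/jeremy/Desktop/New folder (3)/test.png.lock");

// Before reading or writing test.png:
while (!marker.createNewFile()) {
    Thread.sleep(50); // the other app holds the marker; wait and retry
}
try {
    // ... read or write test.png here ...
} finally {
    marker.delete(); // release the marker so the other app can proceed
}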
Try RandomAccessFile.
This is a useful but very dangerous feature. It goes like this: if you create different instances of RandomAccessFile for the same file, you can concurrently write to different parts of that file.
You can create multiple threads pointing to different parts of the file using the seek method, and multiple threads can update the file at the same time. seek allows you to move to any part of the file, even one that doesn't exist yet (past EOF), hence you can move to any location in a newly created file and write bytes at that location. You can open multiple instances of the same file, seek to different locations, and write to multiple locations at the same time, as in the sketch below.
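A minimal sketch of that idea (the file name and offsets are illustrative):

try (RandomAccessFile raf = new RandomAccessFile("shared.dat", "rw")) {
    raf.seek(1024);                          // jump to byte offset 1024, even past EOF
    raf.write("written by app A".getBytes());
    raf.seek(0);                             // a second region of the same file
    raf.writeInt(42);                        // another writer could target this region concurrently
}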
Use synchronized on the method that modifies the file.
Edited:
Per the definition of a thread-safe class: "A class is said to be thread-safe if it works correctly in the presence of the underlying OS's interleaving and scheduling, with no synchronization mechanism required on the client side."
I believe the file is to be accessed from a different machine, so there must be some client-server mechanism. If there is, let the server side have the synchronization mechanism; then it doesn't matter how many clients access it.
If not, synchronized is more than enough.
