How to handle simultaneous reads/writes of text file [duplicate] - java

Here's the scenario:
ThreadA is going to read from some socket, and write data to "MyFile.txt"
ThreadB is going to read "MyFile.txt", and when it reaches the end it will loop until new data is available (because I don't want to re-open "MyFile.txt" and lose the position I had already reached).
Is it possible to do such a thing?
If not, is there another way to do it?

The problem you describe is the classic Producer-Consumer Problem.
The common solution to this is to use a BlockingQueue.
An example of real-world usage is in AjaxYahooSearchEngineMonitor.
Thread A submits a string to the queue and then returns immediately.
Thread B picks up the items from the queue one by one and processes them.
When there is no item in the queue, Thread B just waits. See line 83 of the source code.
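A minimal sketch of that producer/consumer setup, assuming Thread A reads lines from a socket and Thread B appends them to the file (all class, field and file names here are placeholders, not part of the original answer):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SocketToFilePipeline {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Thread A: read lines from the socket and hand them to the queue.
    Runnable producer(Socket socket) {
        return () -> {
            try (BufferedReader in =
                     new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    queue.put(line);              // returns immediately (queue is unbounded)
                }
            } catch (IOException | InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Thread B: block on the queue and append each line to the file.
    Runnable consumer(PrintWriter fileOut) {
        return () -> {
            try {
                while (true) {
                    String line = queue.take();   // waits when the queue is empty
                    fileOut.println(line);
                    fileOut.flush();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }
}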

Related

Java threading correct way of implementation

The main point of this project is to add a timestamp to the database each second.
What I need:
1. BackgroundTask for adding a generated timestamp to the DB.
2. BackgroundTask for adding data to a buffer while the server connection is offline.
3. Add the data that was saved to the buffer into the DB, while continuing to save new timestamps as the app runs.
I have completed the 1st and 2nd parts but am having trouble figuring out the 3rd.
I have 2 thread classes and both implement Runnable.
When server status is positive, Thread A adds data to database.
When server status is negative Thread B creates a Buffer and stores the data there.
Now I need Thread C, which tries to connect to the server every 5 seconds; when it establishes the connection, Thread B should somehow insert the buffered data into the database (in FIFO order).
I'm having trouble figuring out what to do with the threads and the correct way of implementing the further functionality; could someone give me some direction on how to implement it?
How about using a BlockingQueue, specifically a LinkedBlockingQueue (FIFO)?
Thread A will keep adding the data to the queue whether or not the connection is available, and Thread B will try to read from it and commit to the DB; when there is no data in the queue, Thread B will block and wait for data.
Note: go for an unbounded blocking queue if you want to ensure no tasks are rejected.
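A minimal sketch of that setup, where tryInsertIntoDb is a hypothetical placeholder for your actual JDBC code and the retry interval is arbitrary:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TimestampBuffer {

    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>(); // FIFO, unbounded

    // Thread A: generate a timestamp every second, whether or not the server is up.
    public void produce() throws InterruptedException {
        while (true) {
            queue.put(System.currentTimeMillis());
            Thread.sleep(1000);
        }
    }

    // Thread B: drain the queue in order; retry until the insert succeeds.
    public void consume() throws InterruptedException {
        while (true) {
            long ts = queue.take();            // blocks while the queue is empty
            while (!tryInsertIntoDb(ts)) {     // hypothetical DB call
                Thread.sleep(5000);            // wait before retrying the connection
            }
        }
    }

    private boolean tryInsertIntoDb(long ts) {
        // placeholder: attempt the JDBC insert and report success/failure
        return true;
    }
}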
I would implement it using two threads: thread A would write to a LinkedList every second, and that's all thread A would be concerned with; thread B would continuously read from the head (LinkedList.remove()) and attempt to upload it to the database. This way, if an upload fails you can retry indefinitely until it succeeds and then continue reading from the head of the LinkedList.
However, you would have to keep thread safety in mind. That said, you should be fine if you set thread B to run half a second behind thread A, since thread B would then never get ahead of thread A, even if every single upload to the database succeeds.

Pausing a thread from another one: how to or alternatives

I'm facing a problem. I have a Java client-server application that manages a restaurant; I need to use just keyboard input and output in order to manage it.
My waiter class has two threads: one that reads orders from the input, and one that continuously listens (using multicast) for a ready dish that needs to be delivered to a table.
Considering that everything is meant to be done with the keyboard, my "ordering" thread writes "Do you wanna make an order? [y/n]" to standard output and waits for an answer, while the "delivery" thread listens for something to be delivered.
If the waiter chooses to order something, the second thread doesn't show anything until the order is finished (done using a status boolean); if the waiter is free (meaning he is seeing the hint "Do you wanna order?") and a ready dish arrives, he will see "There is something to be delivered. Wanna deliver it? [y/n]" on standard output and wait for an answer.
My problem is that, whatever he chooses, I don't have any control over which thread will read the answer: did he mean to deliver or to order?
I've tried many possibilities, none of which work:
- closing the ordering Scanner on standard input, but it can't be closed;
- pausing the first thread from the other one, but you can't do that in Java;
- synchronizing everything, which doesn't work because the threads are meant to work together, not one at a time;
- using semaphores/status booleans, but in that case I'd need to modify the whole "ordering" part, including an infinite loop that checks the semaphores (I can't use acquire or release without stopping everything).
Any ideas/hints on how to solve the problem?
Have one thread, and only one thread, listening to UDP. It can store the results in a thread-safe collection for the keyboard thread to read or wait on.
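A minimal sketch of that idea, using a plain DatagramSocket for brevity (a MulticastSocket would work the same way); the class and method names are placeholders:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DeliveryListener implements Runnable {

    // Thread-safe collection shared with the keyboard thread.
    private final Queue<String> readyDishes = new ConcurrentLinkedQueue<>();
    private final DatagramSocket socket;

    public DeliveryListener(DatagramSocket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        byte[] buf = new byte[1024];
        while (!Thread.currentThread().isInterrupted()) {
            try {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);    // only this thread ever touches the socket
                readyDishes.add(new String(packet.getData(), 0, packet.getLength()));
            } catch (Exception e) {
                return;
            }
        }
    }

    // Called from the keyboard thread between prompts; never blocks.
    public String pollReadyDish() {
        return readyDishes.poll();
    }
}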

Access File through multiple threads

I want to access a large file (the file size may vary from 30 MB to 1 GB) through 10 threads, then process each line in the file and write the lines to another file through 10 threads. If I use only one thread for the IO, the other threads are blocked. The processing takes some time, almost equivalent to reading a line from the file system. There is one more constraint: the data in the output file must be in the same order as in the input file.
I want your thoughts on the design of this system. Is there any existing API to support concurrent access to files?
Also, writing to the same file may lead to deadlock.
Please suggest how to achieve this, given that I am concerned about the time constraint.
I would start with three threads.
a reader thread that reads the data, breaks it into "lines" and puts them in a bounded blocking queue (Q1),
a processing thread that reads from Q1, does the processing and puts them in a second bounded blocking queue (Q2), and
a writer thread that reads from Q2 and writes to disk.
Of course, I would also ensure that the output file is on a physically different disk than the input file.
If processing tends to be slower than the I/O (monitor the queue sizes), you could then start experimenting with two or more parallel "processors" that are synchronized in how they read and write their data.
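A minimal sketch of that three-thread pipeline with bounded queues (the file names, queue sizes, poison-pill marker and the process() body are placeholders, not from the original answer):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreeStagePipeline {

    private static final String POISON = "\u0000EOF";              // end-of-stream marker

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q1 = new ArrayBlockingQueue<>(1000);  // reader -> processor
        BlockingQueue<String> q2 = new ArrayBlockingQueue<>(1000);  // processor -> writer

        Thread reader = new Thread(() -> {
            try (BufferedReader in = Files.newBufferedReader(Paths.get("input.txt"))) {
                String line;
                while ((line = in.readLine()) != null) q1.put(line);
                q1.put(POISON);
            } catch (Exception e) { throw new RuntimeException(e); }
        });

        Thread processor = new Thread(() -> {
            try {
                String line;
                while (!(line = q1.take()).equals(POISON)) q2.put(process(line));
                q2.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread writer = new Thread(() -> {
            try (BufferedWriter out = Files.newBufferedWriter(Paths.get("output.txt"))) {
                String line;
                while (!(line = q2.take()).equals(POISON)) { out.write(line); out.newLine(); }
            } catch (Exception e) { throw new RuntimeException(e); }
        });

        reader.start(); processor.start(); writer.start();
        reader.join(); processor.join(); writer.join();
    }

    private static String process(String line) {
        return line;   // placeholder for the real per-line work
    }
}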
You should abstract away the file reading. Create a class that reads the file and dispatches the content to a number of threads.
The class shouldn't dispatch plain strings; it should wrap them in a Line class that contains meta information, e.g. the line number, since you want to keep the original sequence.
You need a processing class that does the actual work on the collected data. In your case there is no work to do; the class just stores the information, but you can extend it someday to do additional stuff (e.g. reverse the string, append some other strings, ...).
Then you need a merger class that does some kind of multiway merge sort on the processing threads' output and collects all the references to the Line instances in sequence.
The merger class could also write the data back to a file, but to keep the code clean...
I'd recommend creating an output class that again abstracts away all the file handling and so on.
Of course, this approach needs a lot of memory. If you are short on main memory, you'd need a stream-based approach that works more or less in place to keep the memory overhead small.
UPDATE: Stream-based approach
Everything stays the same except:
The reader thread pumps the read data into a Balloon. The balloon holds a certain number of Line instances (the bigger the number, the more main memory you consume).
The processing threads take Lines from the balloon, and the reader pumps more lines into the balloon as it empties.
The merger class takes the lines from the processing threads as above, and the writer writes the data back to a file.
Maybe you should use FileChannel in the I/O threads, since it's more suited to reading big files and probably consumes less memory while handling the file (but that's just an educated guess).
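As a rough illustration of the Line wrapper described above (the field and method names are just placeholders), something as small as this would be enough:

// Carries one line of input plus the metadata needed to restore the original order.
public final class Line {
    private final long lineNumber;
    private final String content;

    public Line(long lineNumber, String content) {
        this.lineNumber = lineNumber;
        this.content = content;
    }

    public long getLineNumber() { return lineNumber; }
    public String getContent()  { return content; }
}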
Any sort of IO whether it be disk, network, etc. is generally the bottleneck.
By using multiple threads you exacerbate the problem, as it is very likely that only one thread can access the IO resource at a time.
It would be best to use one thread to read, pass the info off to a worker pool of threads, and then write directly from there. But again, if the workers write to the same place there will be bottlenecks, since only one can hold the lock. That is easily fixed by passing the data to a single writer thread.
In short:
A single reader thread writes to a BlockingQueue or the like; this gives it a natural ordered sequence.
The worker pool threads wait on the queue for data, recording each item's sequence number.
The worker threads then write the processed data to another BlockingQueue, this time attaching the original sequence number, so that
the writer thread can take the data and write it out in sequence.
This will likely yield the fastest implementation possible.
One possible way would be to create a single thread that reads the input file and puts the read lines into a blocking queue. Several threads then wait for data from this queue and process it.
Another possible solution may be to split the file into chunks and assign each chunk to a separate thread.
To avoid blocking you can use asynchronous IO. You may also take a look at the Proactor pattern from Pattern-Oriented Software Architecture, Volume 2.
You can do this using FileChannel in Java, which allows multiple threads to access the same file. FileChannel allows you to read and write starting from a given position. See the sample code below:
import java.io.*;
import java.nio.*;
import java.nio.channels.*;

public class OpenFile implements Runnable {

    private FileChannel _channel;       // channel to read from
    private FileChannel _writeChannel;  // channel to write to
    private int _startLocation;         // byte offset this thread is responsible for
    private int _size;                  // number of bytes to copy

    public OpenFile(int loc, int sz, FileChannel chnl, FileChannel write) {
        _startLocation = loc;
        _size = sz;
        _channel = chnl;
        _writeChannel = write;
    }

    public void run() {
        try {
            System.out.println("Reading the channel: " + _startLocation + ":" + _size);
            ByteBuffer buff = ByteBuffer.allocate(_size);
            if (_startLocation == 0)
                Thread.sleep(100);
            // Positioned read/write: each thread works on its own region of the file,
            // so the threads do not step on each other.
            _channel.read(buff, _startLocation);
            ByteBuffer wbuff = ByteBuffer.wrap(buff.array());
            int written = _writeChannel.write(wbuff, _startLocation);
            System.out.println("Read the channel: " + buff + ":" + new String(buff.array()) + ":Written:" + written);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        FileOutputStream ostr = new FileOutputStream("OutBigFile.dat");
        FileInputStream str = new FileInputStream("BigFile.dat");
        String b = "Is this written";
        //ostr.write(b.getBytes());
        FileChannel chnl = str.getChannel();
        FileChannel write = ostr.getChannel();
        ByteBuffer buff = ByteBuffer.wrap(b.getBytes());
        write.write(buff);

        // Three threads, each copying a different 10,000-byte region of the same file.
        Thread t1 = new Thread(new OpenFile(0, 10000, chnl, write));
        Thread t2 = new Thread(new OpenFile(10000, 10000, chnl, write));
        Thread t3 = new Thread(new OpenFile(20000, 10000, chnl, write));
        t1.start();
        t2.start();
        t3.start();
        t1.join();
        t2.join();
        t3.join();
        write.force(false);
        str.close();
        ostr.close();
    }
}
In this sample, there are three threads reading from the same file and writing to the same file without conflicting. The logic in this sample does not take into account that the assigned regions need not end at a line boundary, etc. You will have to find the right logic based on your data.
I have encountered a similar situation before and the way I've handled it is this:
Read the file in the main thread line by line and submit the processing of each line to an executor. A reasonable starting point is the ExecutorService documentation. If you are planning on using a fixed number of threads, you might be interested in the Executors.newFixedThreadPool(10) factory method in the Executors class. The javadocs on this topic aren't bad either.
Basically, I'd submit all the jobs, call shutdown, and then in the main thread continue to write to the output file in order for all the Futures that are returned. You can leverage the blocking nature of the Future's get() method to ensure ordering, but you really shouldn't use multithreading to write, just as you wouldn't use it to read. Makes sense?
However, 1 GB data files? If I were you, I'd first be interested in meaningfully breaking those files down.
PS: I've deliberately avoided code in the answer, as I'd like the OP to try it himself. Enough pointers to the specific classes, API methods and an example have been provided.
Be aware that the ideal number of threads is limited by the hardware architecture and other factors (you could think about consulting the thread pool to calculate the best number of threads). Assuming that "10" is a good number, we proceed. =)
If you are looking for performance, you could do the following:
Read the file using the threads you have and process each line according to your business rule. Keep one control variable that indicates the next line expected to be inserted into the output file.
If the next expected line has finished processing, append it to a buffer (a Queue) (it would be ideal if you could find a way to insert it directly into the output file, but you would have locking problems). Otherwise, store this "future" line inside a binary search tree, ordering the tree by line position. A binary search tree gives you a time complexity of O(log n) for searching and inserting, which is really fast for your context. Continue filling the tree until the next "expected" line has finished processing.
Activate the thread that is responsible for opening the output file, consuming the buffer periodically and writing the lines into the file.
Also, keep track of the smallest expected node of the BST still to be inserted into the file. You can use it to check whether the future line is inside the BST before searching for it.
When the next expected line has finished processing, insert it into the Queue and check whether the next element is inside the binary search tree. If the next line is in the tree, remove the node from the tree, append the content of the node to the Queue, and repeat the search while the next line is already inside the tree.
Repeat this procedure until all files are done processing, the tree is empty and the Queue is empty. (A small sketch of this tree-based reordering follows the complexity summary below.)
This approach uses
- O(n) to read the file (but is parallelized)
- O(1) to insert the ordered lines into a Queue
- O(log n) each to read from and write to the binary search tree
- O(n) to write the new file
plus the costs of your business rule and I/O operations.
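As a rough, non-authoritative sketch of that reordering idea, one could use a TreeMap (a red-black tree, i.e. a balanced BST keyed by line number) as the buffer; the class and method names below are made up for illustration:

import java.io.BufferedWriter;
import java.io.IOException;
import java.util.TreeMap;

public class OrderedWriter {

    private final TreeMap<Long, String> pending = new TreeMap<>(); // the "BST", keyed by line number
    private long nextExpected = 0;                                  // next line number to write
    private final BufferedWriter out;

    public OrderedWriter(BufferedWriter out) {
        this.out = out;
    }

    // Called by any processing thread when a line is finished.
    public synchronized void submit(long lineNumber, String processedLine) throws IOException {
        pending.put(lineNumber, processedLine);
        // Flush every line that is now contiguous with what has already been written.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            out.write(pending.pollFirstEntry().getValue());
            out.newLine();
            nextExpected++;
        }
    }
}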
Hope it helps.
Spring Batch comes to mind.
Maintaining the order would require a post-processing step, i.e. store the read index/key in order in the processing context. The processing logic should store the processed information in the context as well. Once processing is done, you can then post-process the list and write it to a file.
Beware of OOM issues though.
Since order needs to be maintained, the problem itself implies that reading and writing cannot be done in parallel, as they are sequential processes; the only thing you can do in parallel is the processing of records, but that alone doesn't solve much with only one writer.
Here is a design proposal:
Use one thread t1 to read the file and store the data into a LinkedBlockingQueue Q1.
Use another thread t2 to read the data from Q1 and put it into another LinkedBlockingQueue Q2.
Thread t3 reads the data from Q2 and writes it into a file.
To make sure that you don't encounter an OutOfMemoryError, you should initialize the queues with an appropriate size.
You can use a CyclicBarrier to ensure all threads complete their operation.
Additionally, you can set an action on the CyclicBarrier where you can do your post-processing tasks, as in the sketch below.
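A minimal sketch of the barrier-action idea only (the stage bodies are placeholders, not the actual queue pipeline code from this answer):

import java.util.concurrent.CyclicBarrier;

public class PipelineBarrierDemo {

    public static void main(String[] args) {
        // The barrier action runs once, after all three pipeline threads have called await().
        Runnable postProcessing = () -> System.out.println("All stages finished - post-processing here");
        CyclicBarrier barrier = new CyclicBarrier(3, postProcessing);

        Runnable stage = () -> {
            try {
                // ... read from / write to the queues here ...
                barrier.await();          // signal that this stage is done
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        };

        new Thread(stage, "t1-reader").start();
        new Thread(stage, "t2-processor").start();
        new Thread(stage, "t3-writer").start();
    }
}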
Good Luck, hoping you get the best design.
Cheers !!
I have faced a similar problem in the past, where I had to read data from a single file, process it, and write the result to another file. Since the processing part was very heavy, I tried to use multiple threads. Here is the design I followed to solve my problem:
Use the main program as the master: read the whole file in one go (but don't start processing). Create one data object for each line, with its sequence order.
Use one PriorityBlockingQueue, say queue, in main, and add these data objects into it. Share a reference to this queue in the constructor of every thread.
Create different processing units, i.e. threads that listen on this queue. When we add data objects to this queue, we call the notifyAll method. All threads will process individually.
After processing, put all results into a single map, keyed by their sequence number.
When the queue is empty and all threads are idle, processing is done. Stop the threads, iterate over the map and write the results to a file.


deleting a file in java while uploading it in other thread

I'm trying to build a semi file-sharing program, where each computer acts both as a server and as a client.
I give multiple threads the option to download the file from my system.
Also, I've got a user interface that can receive a delete message.
My problem is that the minute a delete message is received, I want to wait for all the threads that are downloading the file to finish downloading, and ONLY then execute file.delete().
What is the best way to do it?
I thought about some kind of database that holds the thread/file associations, then iterating over it and checking whether each thread is still active, but it seems clumsy. Is there a better way?
Thanks
I think you can do this more simply than with a database. I would put a thin wrapper class around File... a TrackedFile. It holds the File inside, plus a count of how many people are reading it. When you want to delete, just stop allowing new people to grab the file, and wait for the count to get to 0.
Since you are dealing with many threads accessing shared state, make sure you properly use java.util.concurrent.
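A minimal sketch of what that TrackedFile wrapper could look like (the method names are my own, not from the original answer):

import java.io.File;

public class TrackedFile {

    private final File file;
    private int readers = 0;                 // how many download threads currently hold the file
    private boolean deleteRequested = false;

    public TrackedFile(File file) {
        this.file = file;
    }

    // A download thread must call this before reading; returns false once deletion is pending.
    public synchronized boolean acquire() {
        if (deleteRequested) return false;
        readers++;
        return true;
    }

    // Must be called (e.g. in a finally block) when the download thread is done.
    public synchronized void release() {
        readers--;
        if (readers == 0) notifyAll();
    }

    // Called from the UI thread: stop new readers, wait for current ones, then delete.
    public synchronized void delete() throws InterruptedException {
        deleteRequested = true;
        while (readers > 0) wait();
        file.delete();
    }
}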
I am not sure this addresses all your problems, but this is what I have in mind:
Assuming that all read/write/delete operations occur only from within the same application, a thread synchronization mechanism using locks can be useful.
For every new file that arrives, a new read/write lock can be created (See Java's ReentrantReadWriteLock). The read lock should be acquired for all read operations, while the write lock should be acquired for write/delete operations. Of course, when the lock is acquired you should check whether the operation is still meaningful (i.e. whether the file still exists).
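For illustration, a hedged sketch of that per-file ReentrantReadWriteLock idea (class and method names are placeholders; the actual streaming and existence checks are up to you):

import java.io.File;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedFileEntry {

    private final File file;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public SharedFileEntry(File file) {
        this.file = file;
    }

    // Download/upload threads: many may read at once under the read lock.
    public void read() {
        lock.readLock().lock();
        try {
            if (!file.exists()) return;    // the operation may no longer be meaningful
            // ... stream the file to the client ...
        } finally {
            lock.readLock().unlock();
        }
    }

    // Delete handler: the write lock is granted only when no readers hold the lock.
    public void delete() {
        lock.writeLock().lock();
        try {
            file.delete();
        } finally {
            lock.writeLock().unlock();
        }
    }
}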
Your delete-event-handling thread (probably your UI thread) will become unresponsive if you have to wait for all readers to finish. Instead, queue the delete and periodically poll for deletions that can be processed. You can use:
private class DeleteRunnable implements Runnable {
    public void run() {
        while (!done) {
            // Copy the master list so we don't hold its lock while waiting on readers.
            // "done", "masterList" and DeletedObject are fields/types of the enclosing class.
            ArrayList<DeletedObject> tmpList;
            synchronized (masterList) {
                tmpList = new ArrayList<>(masterList);
            }
            for (DeletedObject o : tmpList) {
                // Wait up to 500 ms for the readers of this object to finish,
                // and only then remove it from the master list.
                if (o.waitForReaders(500, TimeUnit.MILLISECONDS)) {
                    synchronized (masterList) {
                        masterList.remove(o);
                    }
                }
            }
        }
    }
}
If you were to restructure your design slightly so that loading the file from disk and uploading it to the client were not done in the same thread, you could wait for the file to stop being accessed simply by locking new threads out of reading the file, then iterating over all of the threads reading from it and calling join() on each one, one at a time. As long as the file-reading threads terminate directly after loading the file, the iteration will finish the moment the last thread is no longer reading the file, and you are good to go.
The following paragraph is based on the assumption that you keep re-reading the file data multiple times, even if the reading threads are reading during the same general time frame, since that's what it sounds like you're doing.
Doing it this way, separating file reading into its own threads, would also allow you to have a single thread loading a specific file and multiple client uploads getting the data from that single reading pass over the file. There are several optimizations you could implement with this, depending on what type of project this is. If you do, make sure you don't keep too much file data in memory, or the obvious will happen. But if the nature of your project guarantees few and/or small files that will not take up too much memory, this is a great side effect of separating file loading into a separate thread.
If you go the join() route, you could use the join(milliseconds) variant if you want the deletion thread to be able to wait a certain period and then demand that the other threads stop (for huge files and/or times when many files are being accessed and the disk is slow). Compute a deadline of (now + theDurationYouWantToWait), call join(deadline - currentTime) on each thread, and once currentTime >= deadline send an interrupt to all file-loading threads; have the file-loading threads check for the interrupt in the loop where they read file data, then join() the thread that the timed join() gave up on and continue the joining iteration you were doing.
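A rough sketch of that timed join-then-interrupt loop, assuming the reading threads check for interruption in their read loops (names like readerThreads and maxWaitMillis are hypothetical):

import java.util.List;

public class ImpatientDeleter {

    // Waits up to maxWaitMillis for every file-reading thread, then interrupts the stragglers.
    public static void waitThenDelete(List<Thread> readerThreads, long maxWaitMillis,
                                      java.io.File file) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMillis;

        for (Thread reader : readerThreads) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining > 0) {
                reader.join(remaining);      // wait, but only until the shared deadline
            }
            if (reader.isAlive()) {
                reader.interrupt();          // the reading loop should check isInterrupted()
                reader.join();               // then wait for the thread to actually exit
            }
        }
        file.delete();                       // no reader threads remain at this point
    }
}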
