Updating the collection by timeout with constant availability


Is there a Java/Scala collection (List, Array or something similar) that automatically replaces its whole content after a given timeout? The catch is that the update should happen in another thread, and the old data should remain available while the update is running.
My case is enriching a Flink stream with a periodically updated collection whose data I get from PostgreSQL. I do not want to query the database for each message; instead I do a full download (SELECT * FROM table) from the table every N minutes. I also don't want processing to pause while the data is being updated.

Viktor,
I think caching is one of the best solutions you can use. But if you are not familiar with caching, the following answer may help.
Solution
You can use the TimerTask class to run a specific method at a fixed interval. The following simple example gives you an idea.
ListUpdater (a TimerTask subclass)
import java.util.TimerTask;

public class ListUpdater extends TimerTask {
    @Override
    public void run() {
        // You can write your code to update your list here.
        System.out.println("Updating list");
    }
}
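Application Class
import java.util.Timer;
import java.util.TimerTask;

public class Test {
    public static void main(String[] args) {
        TimerTask task = new ListUpdater();
        Timer timer = new Timer();
        // First run after 1000 ms, then every 6000 ms
        timer.schedule(task, 1000, 6000);
    }
}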
In this example, the timer will print the “Updating list” message every 6 seconds, with a 1 second delay for the first time of execution.
There are a lot of thread-safe collections in Java. You are free to use any of them:
ConcurrentLinkedQueue
ArrayBlockingQueue
LinkedBlockingDeque
LinkedBlockingQueue
PriorityBlockingQueue
SynchronousQueue
DelayQueue
Or to create a brand new thread-safe list:
List newList = Collections.synchronizedList(new ArrayList());
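The answer above refreshes the data on a timer but does not show how readers keep seeing the old data while a reload is in progress. A minimal sketch of one common pattern, assuming a full reload fits in memory: build the new list in the background and publish it with a single atomic reference swap. `RefreshingSnapshot` and the `loader` supplier (standing in for the `SELECT * FROM table` query) are hypothetical names, and `List.copyOf` needs Java 10 or later.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: holds an immutable snapshot that a background thread replaces periodically.
final class RefreshingSnapshot<T> implements AutoCloseable {
    private final AtomicReference<List<T>> current;
    private final Supplier<List<T>> loader; // assumption: e.g. runs SELECT * FROM table
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    RefreshingSnapshot(Supplier<List<T>> loader, long period, TimeUnit unit) {
        this.loader = loader;
        // Load once up front so readers never see an empty snapshot.
        this.current = new AtomicReference<>(List.copyOf(loader.get()));
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
    }

    void refresh() {
        try {
            // Build the new snapshot off to the side; readers keep using the old one meanwhile.
            List<T> fresh = List.copyOf(loader.get());
            current.set(fresh); // atomic swap: readers see the old list or the new one, never a mix
        } catch (RuntimeException e) {
            // Keep serving the old snapshot; an uncaught exception would cancel
            // all further runs scheduled with scheduleAtFixedRate.
            System.err.println("reload failed, keeping old snapshot: " + e);
        }
    }

    // Lock-free read of the latest published snapshot.
    List<T> get() {
        return current.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Readers only ever call `get()`, so there is no pause during a reload; the cost is that two copies of the data exist in memory while the new one is being built.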

Related

Java Thread: Real Time Application Example

I was asked a question in an interview: I have a list available in the main method, and some operation has to be performed on each item in the list. How would I achieve this using threads?
Consider the following scenario:
I have a list of integers and I need to print all the values in it. Can this be done with threads, so that multiple threads run over the items and each thread prints one value, rather than one thread printing all the values? I am not trying to modify any value in the list.

I hope you are looking for something like this:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaltiThreadExample {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        for (int i : list) {
            // One thread per item; each thread prints its own value
            Thread th = new Thread() {
                @Override
                public void run() {
                    System.out.println(i);
                }
            };
            th.start();
        }
    }
}
Output of one execution (the order can differ from run to run):
run:
3
1
2
BUILD SUCCESSFUL (total time: 0 seconds)

Yes, this is the typical producer-consumer paradigm.
Imagine a Runnable class that receives an Iterator as a parameter, waits on a shared monitor, consumes one item from the iterator, and then notifies the same monitor. It loops while the iterator has more items.
With that in place, it is enough to create the list of numbers, create the consumer threads, pass them the list's iterator, and start them.
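A minimal sketch of the idea above, assuming the list is fixed before the threads start: several worker threads share one Iterator, and a synchronized block on a shared lock object plays the role of the monitor, so only one thread calls `hasNext()`/`next()` at a time. It uses a synchronized block rather than explicit `wait()`/`notify()`, and `SharedIteratorWorkers` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class SharedIteratorWorkers {
    // Starts nThreads workers that each repeatedly take the next item from one shared iterator.
    static <T> void process(List<T> items, int nThreads, Consumer<T> action) throws InterruptedException {
        Iterator<T> it = items.iterator();
        Object lock = new Object(); // the monitor guarding the shared iterator
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    T item;
                    synchronized (lock) {
                        if (!it.hasNext()) {
                            return; // no items left: this worker finishes
                        }
                        item = it.next();
                    }
                    action.accept(item); // do the actual work outside the lock
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait until every item has been handled
        }
    }
}
```

For the interview question, `SharedIteratorWorkers.process(Arrays.asList(1, 2, 3), 3, System.out::println)` prints each number from whichever worker picked it up, in no guaranteed order.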

The code below is not tested at all; it's just what came to mind. The last variant, using parallelStream(), might be what you are looking for.
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class DemoApplication {
    public static void main(String[] args) {
        final List<Integer> myIntegerList = Arrays.asList(1, 2, 3);
        // Good old for-each loop (single thread)
        for (Integer item : myIntegerList) {
            System.out.print(item);
        }
        // Java 8 forEach with a Consumer (single thread)
        final Consumer<Integer> consumer = new Consumer<Integer>() {
            @Override
            public void accept(Integer item) {
                System.out.print(item);
            }
        };
        myIntegerList.forEach(consumer);
        // Java 8 forEach with a lambda (single thread)
        myIntegerList.forEach((item) -> System.out.print(item));
        // Java 8 forEach on a parallelStream with a lambda:
        // items may be printed from several threads, in any order
        myIntegerList.parallelStream().forEach((item) -> System.out.print(item));
    }
}

I am trying to understand the advantage of threads.
There are basically two reasons for using multiple threads in a program:
(1) Asynchronous event handling: Imagine a program that must wait for and respond to several different kinds of input, and each kind of input can happen at completely arbitrary times.
Before threads, we used to write a big event loop that would poll for each kind of event and then dispatch to different handler functions. Things could get ugly when one or more of the event handlers was stateful (i.e., what it did next depended on the history of previous events).
A program that has one thread for each different kind of event often is much cleaner. That is to say, it's easier to understand, easier to modify, etc. Each thread loops waiting for just one kind of event, and its state (if any) can be kept in local variables, or its state can be implicit (i.e., depends on what function the thread is in at any given time).
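A minimal sketch of that structure, assuming each kind of event arrives on its own BlockingQueue: each handler thread blocks on just one queue and keeps its state (here, a simple counter) in a local variable instead of in a shared event loop. `PerEventThreads`, the `STOP` sentinel, and the string events are illustrative assumptions.

```java
import java.util.concurrent.BlockingQueue;

final class PerEventThreads {
    static final String STOP = "STOP"; // hypothetical sentinel that tells a handler to exit

    // Starts a thread that waits for one kind of event; its state lives in local variables.
    static Thread startHandler(String name, BlockingQueue<String> events, StringBuffer log) {
        Thread t = new Thread(() -> {
            int handled = 0; // per-handler state, no shared event loop needed
            try {
                while (true) {
                    String event = events.take(); // blocks until this kind of event arrives
                    if (event.equals(STOP)) {
                        break;
                    }
                    handled++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.append(name).append('=').append(handled).append(';'); // StringBuffer is thread-safe
        }, name);
        t.start();
        return t;
    }
}
```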
(2) Multiprocessing (a.k.a., "parallel processing", "concurrent programming",...): Using worker threads to perform background computations probably is the most widespread model of multiprocessing in use at this moment in time.
Multithreading is the lowest level of all multiprocessing models, which means (a) it is the hardest to understand, but (b) it is the most versatile.

It can be done. We can make use of a ConcurrentHashMap: put the list into this map and pass the map to the threads. Each thread then tries to get the lock on the resource it operates on.

I have to combine the results that come from n threads in Java

I need to write a CSV file, so I need to call a web service for more than 2k records (once per record). I am using threads to call the web service.
Currently I am doing it like this:
for (String customer : Customers)
{
    Thread th = new Thread(new TaskFile(customer));
    th.start();
}
// static Map = TaskFile.map
// iterate map...

public class TaskFile implements Runnable
{
    private String customer;
    public static Map map = new HashMap();
    public TaskFile(String customer)
    {
        this.customer = customer;
    }
    public void run()
    {
        // web service call...
        map.put(customer, result);
    }
}
So I am using a static map to collect the results from all threads, but is there any other way to combine the thread results? My concern is that static fields are loaded by the JVM.
I may have 100k records in future.
OK, I have my own framework to manage these threads, which I can't share for security reasons; the static map here is just an example. So please help me with the static Map and with collecting the results.

The simplest answer for this situation is to use a Queue.
Create one queue, create as many threads as you like (I recommend thread pools, but that's your choice), and give each thread the queue. Each thread then pushes items onto the queue as fast as it likes.
You then have one consumer that reads all of the results out of the queue.
BlockingQueue<Type> queue = new LinkedBlockingQueue<>();
for (String customer : Customers) {
    Thread t = new Thread(new Producer(queue, customer));
    t.start();
}
Thread consumer = new Thread(new Consumer(queue));
consumer.start();
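The Producer and Consumer classes are not spelled out above. A minimal self-contained sketch of the same idea, under these assumptions: `callService` is a hypothetical stand-in for the web service call, each result is formatted as a CSV row, and the consumer stops after it has taken exactly one result per customer. It keeps one thread per customer to mirror the snippet; for 100k records a thread pool is the better choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class QueueCollector {
    // One producer per customer, one consumer that drains exactly customers.size() results.
    static List<String> collect(List<String> customers, Function<String, String> callService)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (String customer : customers) {
            // Producer: call the (hypothetical) service and push a CSV row onto the queue.
            new Thread(() -> queue.add(customer + "," + callService.apply(customer))).start();
        }
        List<String> rows = new ArrayList<>(); // only touched by the consumer thread
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < customers.size(); i++) {
                    rows.add(queue.take()); // blocks until the next result is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        consumer.join(); // after join(), the consumer's writes to rows are visible here
        return rows;
    }
}
```

In a real program the consumer would write each row to the CSV file instead of adding it to a list.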

I would not spawn 100k threads at once; you may easily end up overwhelming the web service. Instead, use a ThreadPoolExecutor to run the tasks; it also has methods to wait for results. That way you can easily tune the number of concurrent requests.
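A minimal sketch of the thread-pool approach, assuming `callService` is a hypothetical stand-in for the web service call: a fixed pool caps the number of requests in flight, and `invokeAll` waits for every task and returns the futures in the same order as the submitted tasks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PooledLookup {
    // Calls the service for every customer with at most `concurrency` requests in flight.
    static Map<String, String> lookupAll(List<String> customers, int concurrency,
                                         Function<String, String> callService)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String customer : customers) {
                tasks.add(() -> callService.apply(customer));
            }
            // Blocks until all tasks are done; futures are in the same order as tasks.
            List<Future<String>> futures = pool.invokeAll(tasks);
            Map<String, String> results = new LinkedHashMap<>(); // keeps input order for the CSV
            for (int i = 0; i < customers.size(); i++) {
                results.put(customers.get(i), futures.get(i).get()); // rethrows a task's failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The results are combined by the calling thread only after all tasks have finished, so no static or shared mutable map is needed.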
If you just want to add your results to a CSV file, I would put a central Disruptor (the LMAX Disruptor library) in between, where your web service client threads are the producers and your CSV-writing thread is the consumer. That way you neatly decouple calling the web service from writing the file, minimizing latency in the process.

First, you need to use a thread-safe map, like ConcurrentHashMap.
Second, you don't need a static map; you can pass the map to each Runnable in its constructor:
public class TaskFile implements Runnable
{
    private String customer;
    private Map map;
    public TaskFile(String customer, Map map)
    {
        this.customer = customer;
        this.map = map;
    }
    public void run()
    {
        // web service call...
        map.put(customer, result);
    }
}
// ...
Map map = new ConcurrentHashMap();
for (String customer : Customers)
{
    Thread th = new Thread(new TaskFile(customer, map));
    th.start();
}
Notice that I have ignored generics because you did not specify which types you use, but you should definitely use generics.
Also, as @Marko Topolnik said, starting a thread for each row is overkill.

This looks like you are creating one thread for each customer record, which will probably put a heavy load on your system (for reference, look in Task Manager to see how many threads other processes typically use).
My approach here would be to add the customer records to some form of (concurrent!) Queue object, and then create a handful of Runnable objects whose task is to pull from this queue as fast as they can, processing records, looping until the queue is empty, at which point they terminate.
I'm not sure whether this approach would have much performance benefit over simply handling the records single-threaded, but it is a much more sane way of doing multithreading.
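A minimal sketch of that approach, assuming the queue is completely filled before the workers start (so an empty queue really means the work is done) and that `process` is a hypothetical stand-in for the per-record work. Note that `process` must not return null, because ConcurrentHashMap rejects null values.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

final class QueueDrainers {
    // A handful of workers pull records from one concurrent queue until it is empty.
    static Map<String, String> run(Collection<String> customers, int nWorkers,
                                   Function<String, String> process) throws InterruptedException {
        ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>(customers);
        Map<String, String> results = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < nWorkers; i++) {
            Thread w = new Thread(() -> {
                String customer;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((customer = pending.poll()) != null) {
                    results.put(customer, process.apply(customer));
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // all records processed once every worker has exited
        }
        return results;
    }
}
```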

I think your concern is that if the application is deployed and there is more than one request, your static map doesn't work (because the results of both jobs will be put in the same static map!).
In this case I would suggest creating a map for each job and passing it to each thread's constructor. You can remove the static modifier from the map in the thread class, so that all threads for one job refer to a single map in which you store the results.
Map<String, Object> map = new ConcurrentHashMap<>(); // thread-safe: several threads put into it at once
List<Thread> threads = new ArrayList<>();
for (String customer : Customers)
{
    Thread th = new Thread(new TaskFile(customer, map));
    threads.add(th);
    th.start();
}
for (Thread th : threads) {
    th.join(); // wait until every thread has stored its result
}
Iterate over the map once all threads have completed; the reference is retained, so the results are accessible.
But, as others have mentioned, there are many possible leaks in your approach! Do keep an eye out.

Thread removing from arrayList and function inserting

As the title states, I have a problem with a thread-based structure. What I need is:
one thread running in a loop that checks whether the list contains something and, if so, performs some operation on the object and then removes it from the list
a function that is called from 'outside' and adds new objects to this list.
Here is my approach:
public class Queue implements Runnable {
    private List<X> listOfObjects = new ArrayList<X>();
    public void addToList(X toAdd){
        listOfObjects.add(toAdd);
    }
    public void run() {
        while(true){
            synchronized(listOfObjects){
                if(!listOfObjects.isEmpty()){
                    listOfObjects.get(0).doSth();
                    listOfObjects.remove(0);
                }
            }
        }
    }
}
Is it proper approach? Should I also synchronize adding to this list?

Looks like you should try an implementation of java.util.concurrent.BlockingQueue, rather than try and write your own! I suspect a LinkedBlockingQueue will work nicely for you. You can write entries into the queue from multiple sources, and your consumer thread will take each entry off and process it in turn, in a thread-safe fashion.
Note that your consumer thread for a BlockingQueue will wait (by calling the take() method). Your implementation above, however, will spin and consume CPU whilst waiting for entries to be processed in the queue (this should be evident if you run it and monitor your CPU usage).
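A minimal sketch of the question's class rebuilt on a LinkedBlockingQueue. The `Consumer<X>` handler is an assumption that stands in for `doSth()`, and interrupting the consumer thread is used as the stop signal; `WorkQueue` is a hypothetical name chosen to avoid clashing with `java.util.Queue`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// The question's Queue class rebuilt on a BlockingQueue; handler stands in for doSth().
final class WorkQueue<X> implements Runnable {
    private final BlockingQueue<X> items = new LinkedBlockingQueue<>();
    private final Consumer<X> handler;

    WorkQueue(Consumer<X> handler) {
        this.handler = handler;
    }

    // Safe to call from any thread; no extra synchronization needed.
    public void addToList(X toAdd) {
        items.add(toAdd);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                X next = items.take(); // sleeps instead of spinning while the queue is empty
                handler.accept(next);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interruption is the stop signal
        }
    }
}
```

Usage: start it with `new Thread(workQueue).start()`, call `addToList` from any thread, and call `interrupt()` on that thread to stop it.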
Here's an explanatory article.

You need to synchronize all accesses to your list:
public void addToList(X toAdd){
    synchronized(listOfObjects) {
        listOfObjects.add(toAdd);
    }
}
Alternatively, you could use a thread-safe implementation of List such as CopyOnWriteArrayList, in which case you can remove all the synchronized blocks.
PS: as mentioned in another answer, you seem to be reimplementing an ArrayBlockingQueue.

It appears you are trying to create a queue for a thread to process elements added.
A simpler approach is to use an ExecutorService.
ExecutorService service = Executors.newSingleThreadExecutor();
// to add a task.
service.submit(new Runnable() {
public void run() {
// process X here
}
});

It's better to use the Collections.synchronizedList() wrapper instead of manually synchronized blocks.

Java- Efficient Scheduling Structure?

I apologise for the length of this problem, but I thought it important to include sufficient detail given that I'm looking for a suitable approach to my problem, rather than a simple code suggestion!
General description:
I am working on a project that requires tasks being able to be 'scheduled' at some relative repeating interval.
These intervals are in terms of some internal time, represented as an integer that is incremented as the program executes (so not equal to real time). Each time this happens, the schedule will be interrogated to check for any tasks due to execute at this timestep.
If a task is executed, it should then be rescheduled to run again at a position relative to the current time (e.g. in 5 timesteps). This relative position is simply stored as an integer property of the Task object.
The problem:
I am struggling somewhat to decide upon how I should structure this- partly because it is a slightly difficult set of search terms to look for.
As it stands, I am thinking that each time the timer is incremented I need to:
Execute tasks at the '0' position in the schedule
Re-add those tasks to the schedule again at their relative position (e.g. a task that repeats every 5 steps will be returned to the position 5)
Each group of tasks in the schedule will have its 'time until execution' decremented by one (e.g. a task at position 1 will move to position 0)
Assumptions:
There are a couple of assumptions that may limit the possible solutions I can use:
The interval must be relative, not a specific time, and is defined to be an integer number of steps from the current time
These intervals may take any integer value, i.e. they are not bounded.
Multiple tasks may be scheduled for the same timestep, but their order of execution is not important
All execution should remain in a single thread- multi-threaded solutions are not suitable due to other constraints
The main questions I have are:
How could I design this Schedule to work in an efficient manner? What datatypes/collections may be useful?
Is there another structure/approach I should consider?
Am I wrong to dismiss scheduling frameworks (e.g. Quartz), which appear to work in the 'real' time domain rather than the 'non-real' time domain?
Many thanks for any possible help. Please feel free to comment for further information if necessary; I will edit wherever needed!

Well, Quartz is quite a powerful tool, but it has limited configuration possibilities, so if you need specific features you should probably write your own solution.
However, it's a good idea to study the Quartz source code and data structures, because they have successfully dealt with many problems you would run into, e.g. inter-process synchronization at the database level, running delayed tasks, etc.
I once wrote my own scheduler, adapted to tasks for which Quartz would probably not be easy to adapt, but once I had learned Quartz I understood how much I could have improved my solution, knowing how it was done in Quartz.

How about this? It uses your own ticks via executeNextInterval():
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
public class Scheduler {
private LinkedList<Interval> intervals = new LinkedList<Scheduler.Interval>();
public void addTask(Runnable task, int position) {
if(position<0){
throw new IllegalArgumentException();
}
while(intervals.size() <= position){
intervals.add(new Interval());
}
Interval interval = intervals.get(position);
interval.add(task);
}
public void executeNextInterval(){
// Nothing scheduled at all: advance without running anything
if (intervals.isEmpty()) {
return;
}
Interval current = intervals.removeFirst();
current.run();
}
private static class Interval {
private List<Runnable> tasks = new ArrayList<Runnable>();
public void add(Runnable task) {
tasks.add(task);
}
public void run() {
for (Runnable task : tasks) {
task.run();
}
}
}
}
You might want to add some error handling, but it should do the job.
And here are some unit tests for it :)
import org.junit.Assert;
import org.junit.Test;
public class TestScheduler {
private static class Task implements Runnable {
public boolean didRun = false;
public void run() {
didRun = true;
}
}
Runnable fail = new Runnable() {
@Override
public void run() {
Assert.fail();
}
};
@Test
public void queue() {
Scheduler scheduler = new Scheduler();
Task task = new Task();
scheduler.addTask(task, 0);
scheduler.addTask(fail, 1);
Assert.assertFalse(task.didRun);
scheduler.executeNextInterval();
Assert.assertTrue(task.didRun);
}
@Test
public void queueWithGaps() {
Scheduler scheduler = new Scheduler();
scheduler.addTask(fail, 1);
scheduler.executeNextInterval();
}
@Test
public void queueLonger() {
Scheduler scheduler = new Scheduler();
Task task0 = new Task();
scheduler.addTask(task0, 1);
Task task1 = new Task();
scheduler.addTask(task1, 1);
scheduler.addTask(fail, 2);
scheduler.executeNextInterval();
scheduler.executeNextInterval();
Assert.assertTrue(task0.didRun);
Assert.assertTrue(task1.didRun);
}
}

A circular linked list might be the data structure you're looking for. Instead of decrementing a field in each task element, you simply advance the 'current' pointer around the circular list of tasks. A pseudocode structure might look something like this:
tick():
current = current.next()
for task : current.tasklist():
task.execute()
Any time you schedule a new task, you just add it at the position N ticks ahead of the current tick.
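A minimal Java sketch of the circular-list idea, implemented as a fixed ring of slots (an array-backed variant rather than linked nodes). It assumes every delay is at least 1 and smaller than the ring size; since the question allows unbounded intervals, a real version would need a larger ring or an overflow structure. `RingScheduler` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// A fixed ring of slots; each slot holds the tasks due when 'current' reaches it.
// Assumption: 1 <= delay < ring size.
final class RingScheduler {
    private final List<List<Runnable>> slots = new ArrayList<>();
    private int current = 0; // index of the slot for the current tick

    RingScheduler(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new ArrayList<>());
        }
    }

    // Add a task at the position `delay` ticks ahead of the current tick.
    void schedule(Runnable task, int delay) {
        if (delay < 1 || delay >= slots.size()) {
            throw new IllegalArgumentException("delay must be in [1, " + (slots.size() - 1) + "]");
        }
        slots.get((current + delay) % slots.size()).add(task);
    }

    // tick(): current = current.next(), then run every task in the new current slot.
    void tick() {
        current = (current + 1) % slots.size();
        List<Runnable> due = slots.get(current);
        slots.set(current, new ArrayList<>()); // detach due tasks so the slot is empty for its next lap
        for (Runnable task : due) {
            task.run(); // a repeating task can call schedule() again from here
        }
    }
}
```

No per-task counter is ever decremented; each tick costs only the work of the tasks that are actually due.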

Here are a couple of thoughts:
Keep everything simple. If you don't have millions of tasks, there is no need for an optimized data structure (except pride or the urge for premature optimization).
Avoid relative times. Use an absolute internal tick. If you add a task, set the "run next time" to the current tick value. Add it to the list, sort the list by time.
When looking for tasks, start at the head of the list and pick everything which has a time <= current tick, run the task.
Collect all those tasks in another list. After all have run, calculate the "run next time" based on the current tick and the increment (so you don't get tasks that loop), add all of them to the list, sort.
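A minimal sketch of these steps, with one swap: a PriorityQueue ordered by the absolute "run next time" replaces "add to the list, then sort". A new task is due at the current tick value, due tasks are collected, and all of them are rescheduled from the current tick only after they have run. `TickScheduler` and `addRepeating` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class TickScheduler {
    private static final class Entry {
        final Runnable task;
        final long interval; // repeat every `interval` ticks
        long nextRun;        // absolute tick at which the task is due

        Entry(Runnable task, long interval, long nextRun) {
            this.task = task;
            this.interval = interval;
            this.nextRun = nextRun;
        }
    }

    // Ordered by absolute due time, so the head is always the next task to run.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.nextRun, b.nextRun));
    private long now = 0; // internal tick counter, not wall-clock time

    // A new task's "run next time" is the current tick value.
    void addRepeating(Runnable task, long interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        queue.add(new Entry(task, interval, now));
    }

    void tick() {
        List<Entry> ran = new ArrayList<>();
        // Run everything whose time is <= the current tick.
        while (!queue.isEmpty() && queue.peek().nextRun <= now) {
            Entry e = queue.poll();
            e.task.run();
            ran.add(e);
        }
        // Reschedule only after all due tasks have run, so none runs twice in one tick.
        for (Entry e : ran) {
            e.nextRun = now + e.interval;
            queue.add(e);
        }
        now++; // advance the internal clock
    }
}
```

Each tick only inspects the head of the queue, so ticks with nothing due are cheap regardless of how many tasks are scheduled.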

Take a look at the way DelayQueue uses a PriorityQueue to maintain such an ordered list of events. DelayQueue works with real time and hence can use the timed wait methods available in Condition and LockSupport. You could implement something like a SyntheticDelayQueue that behaves the same way as DelayQueue but uses your own synthetic time service. You would obviously have to replace the timed wait/signalling mechanisms that come for free with the JDK, though, and this might be non-trivial to do efficiently.

If I had to do it, I'd create a simple queue (a linked-list variant). Each node of this queue would contain a dynamic data structure (a simple list, for example) holding all the tasks that need to be done at that time-step. At each time-step, the process reads the first node of the queue and executes the tasks it finds in that node's list. At the end of each execution, it computes the rescheduling and adds the task to the appropriate later node, creating nodes up to that position if needed. The first node is then removed, and the second node (now the first) is executed at the next time-step. This system does not require any per-task counters to be kept track of, and all the data structures needed are in the Java standard library. This should solve your problem.

Use a ScheduledExecutorService. It has everything you need built right in. Here's how simple it is to use:
// Create a single-threaded ScheduledExecutorService
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1); // 1 thread
// Schedule something to run in 10 seconds time
scheduler.schedule(new Runnable() {
public void run() {
// Do something
}}, 10, TimeUnit.SECONDS);
// Schedule something to run in 2 hours time
scheduler.schedule(new Runnable() {
public void run() {
// Do something else
}}, 2, TimeUnit.HOURS);
// etc as you need

Java: Best way to retrieve timings from multiple threads

We have 1000 threads that hit a web service and time how long the call takes. We wish for each thread to return their own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return the timing - we have considered two options so far :-
1. Once a thread has its timing result, it calls a singleton that provides a synchronised method to write to the file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call is made after the thread has taken its timing, being blocked while waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces
Which would be the best way, or are there any other better ways ?
Thanks very much in advance
Paul

Use the latter method.
Your workers implement Callable. You then submit them to a thread pool and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.*;
import java.util.concurrent.*;

public class WebServiceTester {
    public static class Tester implements Callable<Long> {
        public Long call() {
            long start = System.nanoTime();
            // Do your test here
            long end = System.nanoTime();
            return end - start; // elapsed time in nanoseconds
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            futures.add(pool.submit(new Tester()));
        }
        // A List rather than a Set, so identical timings are not collapsed
        List<Long> results = new ArrayList<>();
        for (Future<Long> future : futures) {
            results.add(future.get()); // blocks until that task has finished
        }
        pool.shutdown();
        // Manipulate results however you wish....
    }
}

Another possible solution I can think of would be to use a CountDownLatch (from the java concurrency packages), each thread decrementing it (flagging they are finished), then once all complete (and the CountDownLatch reaches 0) your main thread can happily go through them all, asking them what their time was.
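A minimal sketch of the CountDownLatch approach, assuming `call` is a hypothetical stand-in for the web service call: each thread writes its elapsed time into its own array slot and then counts down. The latch's happens-before guarantee (actions before `countDown()` are visible after `await()` returns) means the main thread can read every slot without extra locking. `LatchTimings` is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;

final class LatchTimings {
    // Each worker records its own elapsed time, then counts down the shared latch.
    static long[] timeAll(int nThreads, Runnable call) throws InterruptedException {
        long[] timings = new long[nThreads]; // one slot per thread, so no locking is needed
        CountDownLatch done = new CountDownLatch(nThreads);
        for (int i = 0; i < nThreads; i++) {
            final int slot = i;
            new Thread(() -> {
                try {
                    long start = System.nanoTime();
                    call.run(); // stands in for the web service call (assumption)
                    timings[slot] = System.nanoTime() - start;
                } finally {
                    done.countDown(); // count down even if the call throws
                }
            }).start();
        }
        done.await(); // returns once every thread has counted down
        return timings;
    }
}
```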

The executor framework can be used here. The timing can be done inside the Callable object, and the Future can tell you whether the task has completed.

You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.
