Caching data in JVM from external database - Initial load and keep syncing - java

I am using an in-memory cache (Spring cache abstraction + Caffeine) and I would like to load all the records from our Oracle database into our application's memory for caching. To do that, I am planning to load the data after application startup and also at regular intervals (@Scheduled with a cron expression), with the intention of updating all cache entries.
I understand that ApplicationReadyEvent will be triggered after startup and that the method hydrate() will then execute. I am trying to figure out how I can prevent @Scheduled from being triggered during that time. Say the application starts at 1:59 pm, the cron job is scheduled to run at 2 pm (to update the cache), and the initial load (started at 1:59 pm) takes about 3-4 minutes; I don't want @Scheduled to run at 2 pm. Can you suggest some ways I can do that, please? Is there a way to develop this logic with just @Scheduled? Please also suggest any improvements I can make to this technique.
@Scheduled(cron = "0 */5 * * * *") // assume every 5 minutes
@EventListener(ApplicationReadyEvent.class)
public void hydrate() {
    // load from Oracle table
    // save to cache - some implementation
}
This might be silly, but one solution I can think of is:
public class CacheHydrator {
    private boolean isRunning = false;

    @Scheduled(cron = "0 */5 * * * *") // assume every 5 minutes
    @EventListener(ApplicationReadyEvent.class)
    public void hydrate() {
        if (!isRunning) {
            try {
                isRunning = true;
                // load from Oracle table
                // save to cache - some implementation
            } finally {
                isRunning = false;
            }
        }
    }
}
Is this any good?
Thanks
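One concern with a plain boolean flag: the startup listener and the scheduler can run on different threads, and both may read the flag as false before either sets it, so two loads could still overlap. Below is a minimal sketch of an atomic version of the same guard, written in plain Java so it runs without Spring. The class name, the Runnable loader parameter, and the boolean return value are assumptions for illustration; in the real bean the method would carry the two annotations from the question and perform the Oracle load itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class CacheHydrator {
    // true while a load is in progress
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Returns true if this call performed the load, false if it was skipped.
    boolean hydrate(Runnable loader) {
        // compareAndSet checks and claims the flag in one atomic step
        if (!running.compareAndSet(false, true)) {
            return false; // a load is already in progress, so skip this run
        }
        try {
            loader.run(); // hypothetical: load from the Oracle table and save to the cache
            return true;
        } finally {
            running.set(false);
        }
    }
}
```

With this guard, a scheduled run that fires while the initial load is still in progress simply returns without doing anything, and the next scheduled run proceeds normally.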

Related

Creating a deep copy of cache in multithreaded Java application

Setup
I have a multithreaded Java application which will receive 200-300 requests per second to perform a task 'A' (which takes approximately 30 milliseconds) on an input received in a request.
The application has a cache (max size = 1 MB) which is read by each thread to perform task 'A' on the input received:
public class DataProvider {
    private HashMap<KeyObject, ValueObject> cache;
    private Database database;

    // Scheduled to run at 15-second intervals by a background thread
    public synchronized void updateData() {
        this.cache = database.getData();
    }

    public HashMap<KeyObject, ValueObject> getCache() {
        return this.cache;
    }
}
KeyObject and ValueObject are POJOs. ValueObject contains a List of another POJO.
For every request received, the task is done in the following way:
public class TaskExecutor {
    private DataProvider dataProvider;

    public boolean doTask(final InputObject input) {
        final HashMap<KeyObject, ValueObject> data = dataProvider.getCache(); // shallow copy I think
        // Do Task 'A' using data
    }
}
Problem
One of the threads starts executing task 'A' at timestamp 't' using data 'd1' from the cache. At time 't + t1' the cache data gets updated to 'd2'. The thread now uses data 'd2' to finish the rest of the task, which completes at 't + t1 + t2'. Part of the task was therefore completed with different data, which leads to an invalid outcome.
Current Approach
Each thread will create a deep copy of the cache and then use that copy to perform the task. The deep copy would be made with whichever of the following approaches performs best:
How do you make a deep copy of an object in Java?
Deep clone utility recommendation
Limitation
Cloning using deep copy will create thousands of objects, which may crash the JVM.
None of the cloning approaches looks good in terms of performance.
For your use case, returning a new cache from database.getData() is the much better choice. If you go this way, you only have to create a new cache object once every 15 seconds. If you instead clone the cache in each task, you would have to create up to 4501 cache objects in 15 seconds (at 300 requests per second, that is 4500 clones plus the refresh itself). Obviously, returning a new cache object is the right choice.
If the code you provided is the same as the code in your project, I believe the database.getData() method is changing the content of a single cache object instead of returning a new one. If you return a new cache object from this method, your problem will be solved.
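A minimal sketch of that idea, assuming String keys and values in place of KeyObject and ValueObject, and a freshData parameter standing in for the result of database.getData(). The class name SnapshotCache is hypothetical. Each refresh publishes a brand-new immutable map, and a task that reads the reference once keeps working on that same snapshot until it finishes.

```java
import java.util.Map;

final class SnapshotCache {
    // volatile so that a map published by the refresh thread is visible to reader threads
    private volatile Map<String, String> cache = Map.of();

    // Called by the background refresh; freshData stands in for database.getData()
    void updateData(Map<String, String> freshData) {
        // Build a new immutable map and swap the reference; the old map is never modified
        this.cache = Map.copyOf(freshData);
    }

    // A task should call this once and use the returned map for its whole run
    Map<String, String> getCache() {
        return cache;
    }
}
```

Note that Map.copyOf only copies the map itself. If the value objects are mutable and the refresh modifies them in place, they would need to be rebuilt on each refresh as well.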

get input from multiple threads and upload file with fixed size to S3

I wrote a thread-safe class that takes input from multiple threads and uploads the result to S3 once the accumulated data reaches a fixed size.
S3Exporter class
import com.google.common.base.Preconditions;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// this class is thread safe.
public class S3Exporter {
    private static final int BUFFER_PADDING = 1000;
    private final int targetSize;
    private final ByteArrayOutputStream buf;
    private volatile boolean started;

    public S3Exporter(final int targetSize) {
        buf = new ByteArrayOutputStream(targetSize + BUFFER_PADDING);
        this.targetSize = targetSize;
        started = false;
    }

    public synchronized void start() {
        started = true;
    }

    public synchronized void end() {
        started = false;
        flush();
    }

    public synchronized void export(byte[] data) throws IOException {
        Preconditions.checkState(started, "Not started!");
        // Append the whole array; the offset refers to data, not to buf
        buf.write(data, 0, data.length);
        flushIfNeeded();
    }

    private void flushIfNeeded() {
        if (buf.size() >= targetSize) {
            flush();
        }
    }

    public synchronized void flush() {
        if (buf.size() > 0) {
            // upload buf to s3, it's a time-consuming operation
            buf.reset();
        }
    }
}
The client calls the export method to pass data; if an exception is thrown, the client will pass that data again later.
To avoid losing data when restarting the application, I add a shutdown hook when creating S3Exporter object:
S3Exporter exporter = new S3Exporter(10000);
Runtime.getRuntime().addShutdownHook(new Thread(() -> exporter.end()));
My concern is that the class is not scalable: it could become a bottleneck of the system as the data volume grows. I can think of two ways to improve the situation:
1. Do the time-consuming upload asynchronously: use an executor to upload and call ThreadPoolExecutor.awaitTermination() in the shutdown hook.
2. Just put the data into a LinkedBlockingQueue in the export method and use multiple threads to handle it. (As I understand it, this is more scalable than the first.)
Either way, I would need to do more work in the shutdown hook thread to make sure accepted data is not lost, and as far as I know that is not a good idea. I would be taking the risk of losing data when restarting the application, which is the last thing I want to see.
My question
1. Is my concern about scalability a real problem? (To make the question less vague, say each piece of data is a few bytes and the export method is called at 500 TPS.)
2. If the answer to the first question is yes, are my improvements right? How should I do the cleanup work to avoid losing data?
Scalability depends on requirements, constraints, desired service level, personal preferences, expected user growth rate, and especially money: given infinite resources, every piece of software can be scaled. You didn't mention any of these, so I guess you don't have actual figures. At this stage, as a programmer, your job is to write a correct program that uses a predictable amount of resources.
Your program seems correct, and most of your assumptions are correct, too. However, I suggest immediately storing chunks in some local persistent database (or the raw filesystem) and having a periodic job, run in a separate thread, that uploads groups of chunks to S3. I would also remove any shutdown hooks (you can use Camel for the boring parts). Such hooks are unreliable and should only be used as a last resort for quick and optional cleanup (optional in the sense that you must be prepared for the cleanup not having run properly to the end).
By using a file instead of memory, your data can survive fatal errors, and the working memory required by your application is almost independent of the load: it costs a negligible amount of extra CPU and some disk I/O, which is far cheaper than memory.
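A minimal sketch of the store-then-upload approach, assuming one file per chunk in a local spool directory. The class name DiskSpool, the file naming scheme, and the uploader callback (which stands in for the real S3 call) are all hypothetical. Producers only write to disk; a single background thread would call drain() periodically, for example from a ScheduledExecutorService.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DiskSpool {
    private final Path dir;

    DiskSpool(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called by producer threads: the chunk is on disk before this method returns
    void export(byte[] data) {
        try {
            Path tmp = Files.createTempFile(dir, "chunk-", ".tmp");
            Files.write(tmp, data);
            // Rename after writing so drain() does not pick up a partially written file
            String name = tmp.getFileName().toString().replace(".tmp", ".dat");
            Files.move(tmp, tmp.resolveSibling(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called periodically from one background thread; returns the number of chunks uploaded
    int drain(Consumer<byte[]> uploader) {
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> ready = files.filter(p -> p.toString().endsWith(".dat"))
                                    .collect(Collectors.toList());
            for (Path p : ready) {
                uploader.accept(Files.readAllBytes(p)); // hypothetical S3 upload
                Files.delete(p); // delete only after the upload returned normally
            }
            return ready.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a chunk is deleted only after its upload returns, a crash between the two steps leaves the file in place and it is uploaded again on the next run. The upload side therefore has to tolerate duplicates.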

How to implement Java single Database thread

I have made a Java program that connects to a SQLite database using SQLite4Java.
I read from the serial port and write values to the database. This worked fine in the beginning, but now my program has grown and I have several threads. I have tried to handle that with a SQLiteQueue variable that executes database operations with something like this:
public void insertTempValue(final SQLiteStatement stmt, final long logTime, final double tempValue)
{
    if (checkQueue("insertTempValue(SQLiteStatement, long, double)", "Queue is not running!", false))
    {
        queue.execute(new SQLiteJob<Object>()
        {
            protected Object job(SQLiteConnection connection) throws SQLiteException
            {
                stmt.bind(1, logTime);
                stmt.bind(2, tempValue);
                stmt.step();
                stmt.reset(true);
                return null;
            }
        });
    }
} // end insertTempValue(SQLiteStatement, long, double)
But now my SQLite-class can't execute the statements reporting :
DB[1][U]: disposing [INSERT INTO Temperatures VALUES (?,?)]DB[1][U] from alien thread
SQLiteDB$6@8afbefd: job exception com.almworks.sqlite4java.SQLiteException: [-92] statement is disposed
So the execution does not happen.
I have tried to figure out what's wrong and I think I need a Java wrapper that makes all the database operations calls from a single thread that the other threads go through.
Here is my problem: I don't know how to implement this in a good way.
How can I make a method-call and ensure that it always runs from the same thread?
Put all your database access code into a package and make all the classes package private. Write one Runnable or Thread subclass with a run() method that runs a loop. The loop checks for queued information requests, and runs the appropriate database access code to find the information, putting the information into the request and marking the request complete before going back to the queue.
Client code queues data requests and waits for answers, perhaps by blocking until the request is marked complete.
Data requests would look something like this:
public class InsertTempValueRequest {
    // Set once in the constructor, then only read by the database thread
    private final long logTime;
    private final double tempValue;
    // Written by the database thread, read by client threads
    private volatile boolean isComplete = false;

    // Called from a client thread, which queues this object after construction
    public InsertTempValueRequest(final long logTime, final double tempValue) {
        this.logTime = logTime;
        this.tempValue = tempValue;
    }

    // Called from client threads after queueing to check for completion
    public boolean isComplete() {
        return isComplete;
    }

    // Called from the database thread after dequeuing this object
    void execute(SQLiteConnection connection, SQLiteStatement statement) {
        // execute the statement using logTime and tempValue member data, and commit
        isComplete = true;
    }
}
This will work, but I suspect the implementation will be a lot of hassle. I think you could also get by with a lock that permits only one thread at a time to access the database, provided that (and this is the difference from your existing situation) each access begins by creating the database resources, including statements, from scratch and disposes of them before releasing the lock.
I found a solution to my problem. I have now implemented a wrapper class that performs all operations on my older SQLite class through an ExecutorService, inspired by Thread Executor Example, and I got the correct usage from the Java Doc for ExecutorService.
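The wrapper itself was not posted, so the following is only a minimal sketch of what such a class might look like, assuming a single-thread executor. The class name SingleThreadDb and the call method are hypothetical. Every job handed to it runs on the same worker thread, whichever thread submitted it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SingleThreadDb {
    // One worker thread: every submitted job runs on it, in submission order
    private final ExecutorService dbThread = Executors.newSingleThreadExecutor();

    // Runs the job on the database thread and blocks the caller until it is done.
    // Must not be called from inside another job, or the caller would wait on itself.
    <T> T call(Callable<T> job) {
        try {
            return dbThread.submit(job).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        dbThread.shutdown();
    }
}
```

For this to help with the "alien thread" error, the connection and its prepared statements would also have to be created inside a job, so that they are created and used on the same thread.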

How to schedule a periodic Java task under WebSphere 7?

I need to have a Java method run every 30 seconds within a WebSphere 7 clustered environment (two boxes with 1 server each) - what's the current best-practice to do this while avoiding concurrency issues?
Some more details:
We've got records in an Oracle database that need to be twiddled exactly once. If they get double-twiddled, bad things will happen.
So I'm imagining something like this:
public synchronized void BatchTwiddle() {
    List<MyRecord> myList = findRecordsToBeTwiddled();
    twiddleRecords(myList);
}

public void twiddleRecords(List<MyRecord> myList) {
    for (MyRecord myRecord : myList) {
        myRecord.twiddleRecord();
    }
}
How do I get BatchTwiddle() called every thirty seconds when there are multiple servers (a total of 2) involved? Is it best to just run it on ONE server?
So far, I've been digging into the WebSphere Scheduler concept, using ScheduledExecutorService, or using EJB Timers, but nothing seems like a clear winner yet.
Use the node configuration to determine which node has to run your task and which does not. Then create a construct like this:
if (shouldRun(THIS_NODE_NAME)) {
    // do job...
}
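A minimal sketch of how that check could be combined with a 30-second schedule, assuming the node names come from configuration. The class name NodeGatedJob and both constructor parameters are hypothetical, and a plain ScheduledExecutorService is used here only for illustration; inside an application server a container-managed scheduler may be the better fit.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NodeGatedJob {
    private final String thisNode;       // hypothetical: name of the node this JVM runs on
    private final String designatedNode; // hypothetical: the single node allowed to run the job

    NodeGatedJob(String thisNode, String designatedNode) {
        this.thisNode = thisNode;
        this.designatedNode = designatedNode;
    }

    boolean shouldRun() {
        return thisNode.equals(designatedNode);
    }

    // Starts the schedule on every node; the job body is skipped on all but the designated one
    ScheduledExecutorService start(Runnable job) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            if (shouldRun()) {
                job.run();
            }
        }, 0, 30, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The trade-off of this design is that the job stops running entirely while the designated node is down, until the configuration is changed to point at the other node.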

Long running method causing race condition

I'm relatively new to Hibernate, so please be gentle. I'm having an issue with a long-running method (~2 minutes) and changing the value of a status field on an object stored in the DB. The pseudo-code below should help explain my issue.
public void foo(Thing thing) {
    if ("ready".equals(thing.getStatus())) {
        thing.setStatus("finished");
        doSomethingAndTakeALongTime();
    } else {
        // Thing already has a status of finished. Send the user back a message.
    }
}
The pseudo-code shouldn't take much explanation. I want doSomethingAndTakeALongTime() to run, but only if it has a status of "ready". My issue arises whenever it takes 2 minutes for doSomethingAndTakeALongTime() to finish and the change to thing's status field doesn't get persisted to the database until it leaves foo(). So another user can put in a request during those 2 minutes and the if statement will evaluate to true.
I've already tried updating the field and flushing the session manually, but it didn't seem to work. I'm not sure what to do from here and would appreciate any help.
PS: My Hibernate session is managed by Spring.
Basically, you need to run it in a separate thread so that the method returns immediately. Otherwise it will indeed block until the long-running task is finished. You can pass the entity itself to the thread, so that the thread can update the status itself. Here's a basic kickoff example using a simple Thread.
public class Task extends Thread {
    private Entity entity;

    public Task(Entity entity) {
        this.entity = entity;
    }

    public void run() {
        entity.setStatus(Status.RUNNING);
        // ...
        // Long running task here.
        // ...
        entity.setStatus(Status.FINISHED);
    }
}
and
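public synchronized void foo(Entity entity) {
    if (entity.getStatus() == Status.READY) {
        // Mark as running while still holding the lock, so a second call cannot start it again
        entity.setStatus(Status.RUNNING);
        new Task(entity).start();
    } else {
        // ...
    }
}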
With the Status in an enum you can even use a switch statement instead of an if/else.
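switch (entity.getStatus()) {
    case READY:
        // Mark as running before starting, so a second call cannot start it again
        entity.setStatus(Status.RUNNING);
        new Task(entity).start();
        break;
    case RUNNING:
        // It is still running .. Have patience!
        break;
    case FINISHED:
        // It is finished!
        break;
}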
For more robust control of running threads, you may want to consider ExecutorService instead. With it you can control the maximum number of threads and specify a timeout.
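A minimal sketch of the ExecutorService variant, assuming the status lives in memory rather than in the database. The class name TaskRunner, the pool size of 4, and the Runnable parameter are hypothetical. The READY to RUNNING transition is claimed atomically before the work is handed to the pool, so a second request cannot start the same task again.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

final class TaskRunner {
    enum Status { READY, RUNNING, FINISHED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.READY);
    // A fixed pool caps the number of long tasks that run at the same time
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Returns true if the task was started, false if it is already running or finished
    boolean start(Runnable longTask) {
        // Only one caller can move the status from READY to RUNNING
        if (!status.compareAndSet(Status.READY, Status.RUNNING)) {
            return false;
        }
        pool.execute(() -> {
            try {
                longTask.run();
            } finally {
                status.set(Status.FINISHED);
            }
        });
        return true;
    }

    Status status() {
        return status.get();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

When the status is a column in the database, as in the question, the same claim step would have to be persisted and committed before the long task starts, otherwise other sessions still see the old value.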
What is the method doSomethingAndTakeALongTime() doing? Is it performing a DB operation or just executing some business logic?
If it is not doing any DB operation and the status itself is correct, you can persist the object before calling that method.
If it is doing a DB operation, you need to wait for it. So even if you move it into a thread, you need to wait for that thread to complete (thread.join() does that).
The point is that before you persist, you must have completed all operations based on your ORM object, right? So try to restructure the logic so that the method has finished executing before you persist.
thanks.
