I'm using a threadpool to run some tasks in my application. Each task contains an object called TaskContext, which looks pretty much like this:
public class TaskContext implements Serializable {

    private static InheritableThreadLocal<TaskContext> taskContextTL = new InheritableThreadLocal<>();

    private final String taskName;
    private final String user;

    public TaskContext(String taskName, String user) {
        this.taskName = taskName;
        this.user = user;
    }

    public String getTaskName() {
        return taskName;
    }

    public static synchronized TaskContext getTaskContext() {
        return taskContextTL.get();
    }

    public static synchronized void setTaskContext(TaskContext context) {
        taskContextTL.set(context);
    }
}
I use InheritableThreadLocal because I need the task data to be inherited by children threads.
At the beginning of each task, I call setTaskContext(new TaskContext(taskName, user)) to set the task parameters, and before the task ends I call setTaskContext(null) to clear this data.
The problem is that for some reason, when the same thread runs a different task, and for that thread I use the getTaskContext().getTaskName() method, I don't get the current task name but some previous task name that this thread ran.
Why is this happening? Why doesn't setting the InheritableThreadLocal value to null clear the data? How can it be avoided?
Thanks a lot for the help
Update:
I found a source online that claims this: "calling set(null) to remove the value might keep the reference to this pointer in the map, which can cause memory leak in some scenarios. Using remove is safer to avoid this issue."
But I'm not sure what it means...
The source you found that claims "calling set(null) to remove the value might keep the reference to this pointer in the map, which can cause memory leak in some scenarios. Using remove is safer to avoid this issue." is https://rules.sonarsource.com/java/tag/leak/RSPEC-5164.
Although I don't fully understand why they claim this, I trust the people from sonarsource.com enough to consider this claim valid.
More to the point of your question, they also provide a fix for this problem. Adapted to your code fragment, it means that you should not use setTaskContext(null) to remove the TaskContext, but rather create a method
public static void clearTaskContext() {
    taskContextTL.remove();
}
and use this method to remove the TaskContext.
Also note that I didn't make this method synchronized, and the synchronization in getTaskContext() and setTaskContext() is not needed either. Since the TaskContext is stored in a ThreadLocal that is (as its name implies) local to a specific thread, there can never be a synchronization issue with it.
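For example, here is a minimal sketch of how a pooled task could use these methods, clearing the context in a finally block so it is removed even if the task throws. The executor, the task body and the example task name/user are just placeholders, not taken from your code:

// hypothetical task wrapper: set the context at the start, always remove it at the end
Runnable task = () -> {
    TaskContext.setTaskContext(new TaskContext("nightly-report", "alice"));
    try {
        // ... actual task work that may call TaskContext.getTaskContext() ...
    } finally {
        TaskContext.clearTaskContext();   // uses ThreadLocal.remove() instead of set(null)
    }
};
executor.submit(task);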
I am trying to find answer to a very specific question. Trying to go through documentation but so far no luck.
Imagine this piece of code
@Override
public void handleRequest(InputStream input, OutputStream output, Context context) throws IOException {
    Request request = parseRequest(input);
    List<String> validationErrors = validate(request);
    if (validationErrors.size() == 0) {
        ordersManager.getOrderStatusForStore(orderId, storeId);
    } else {
        generateBadRequestResponse(output, "Invalid Request", null);
    }
}
private List<String> validate(Request request) {
    orderId = request.getPathParameters().get(PATH_PARAM_ORDER_ID);
    programId = request.getPathParameters().get(PATH_PARAM_STORE_ID);
    return new ArrayList<>();
}
Here, I am storing orderId and storeId in field variables. Is this okay? I am not sure whether AWS will cache this function (and hence the field variables) or instantiate a new Java object for every request. If it's a new object, then storing them in field variables is fine, but I'm not sure.
AWS will spin up a JVM and instantiate an instance of your code on the first request. AWS has an undocumented spin-down time: if you do not invoke your Lambda again within this time limit, it will shut down the JVM. You will notice these initial requests take significantly longer, but once your function is "warmed up" it will be much quicker.
So to directly answer your question, your instance will be reused if the next request comes in quick enough. Otherwise, a new instance will be stood up.
A simple Lambda function that can illustrate this point:
/**
 * A Lambda handler to see where this runs and when instances are reused.
 */
public class LambdaStatus {

    private String hostname;
    private AtomicLong counter;

    public LambdaStatus() throws UnknownHostException {
        this.counter = new AtomicLong(0L);
        this.hostname = InetAddress.getLocalHost().getCanonicalHostName();
    }

    public void handle(Context context) {
        counter.getAndIncrement();
        context.getLogger().log("hostname=" + hostname + ",counter=" + counter.get());
    }
}
Logs from invoking the above.
22:49:20 hostname=ip-10-12-169-156.ec2.internal,counter=1
22:49:27 hostname=ip-10-12-169-156.ec2.internal,counter=2
22:49:39 hostname=ip-10-12-169-156.ec2.internal,counter=3
01:19:05 hostname=ip-10-33-101-18.ec2.internal,counter=1
Strongly not recommended.
Multiple invocations may use the same Lambda function instance and this will break your current functionality.
When it comes to Lambda, you need to ensure your instance variables are thread-safe and can be accessed by multiple threads. Limit your instance variable writes to initialization - once only.
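For illustration, one way to keep the request data out of instance fields is to have validate(...) hand the parsed values back to the handler. The ValidatedInput holder below is made up for this sketch and reshapes the original validate(...) signature:

// hypothetical holder so the handler keeps everything in local variables
private static final class ValidatedInput {
    final String orderId;
    final String storeId;

    ValidatedInput(String orderId, String storeId) {
        this.orderId = orderId;
        this.storeId = storeId;
    }
}

// returns null when the request is invalid (sketch only)
private ValidatedInput validate(Request request) {
    String orderId = request.getPathParameters().get(PATH_PARAM_ORDER_ID);
    String storeId = request.getPathParameters().get(PATH_PARAM_STORE_ID);
    return new ValidatedInput(orderId, storeId);
}

@Override
public void handleRequest(InputStream input, OutputStream output, Context context) throws IOException {
    Request request = parseRequest(input);
    ValidatedInput validated = validate(request);
    if (validated != null) {
        ordersManager.getOrderStatusForStore(validated.orderId, validated.storeId);
    } else {
        generateBadRequestResponse(output, "Invalid Request", null);
    }
}

Because every request's data now lives on the stack of the invoking thread, reusing the same handler instance across invocations is harmless.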
I need to check some data, whether or not to send a tracking info. This data is saved inside the Realm database. Here is the model:
public class RealmTrackedState extends RealmObject {

    @PrimaryKey
    private int id = 1;

    private RealmList<RealmChat> realmChatsStarted;
    private boolean isSupportChatOpened;
    private boolean isSupportChatAnswered;

    /* getters and setters */
}
The idea is that every chat that is not already in realmChatsStarted should be tracked and then added to this list. A similar thing applies to the isSupportChatOpened boolean; however, because of the business logic this is a special case.
So I've wrapped this inside one Realm object, and I've wrapped access to it in a few shouldTrack() methods, like this:
@Override
public void insertOrUpdateAsync(@NonNull final RealmModel object, @Nullable OnInsertListener listener) {
    Realm instance = getRealmInstance();
    instance.executeTransactionAsync(realm -> realm.insertOrUpdate(object),
            () -> notifyOnSuccessNclose(listener, instance),
            error -> notifyOnErrorNclose(listener, error, instance));
}

@Override
public RealmTrackedState getRealmTrackedState() {
    try (Realm instance = getRealmInstance()) {
        RealmResults<RealmTrackedState> trackedStates = instance.where(RealmTrackedState.class).findAll();
        if (!trackedStates.isEmpty()) {
            return instance.copyFromRealm(trackedStates.first());
        }
        RealmTrackedState trackedState = new RealmTrackedState();
        trackedState.setRealmChatsStarted(new RealmList<>());
        insertOrUpdateAsync(trackedState, null);
        return trackedState;
    }
}

@Override
public boolean shouldTrackChatStarted(@NonNull RealmChat chat) {
    if (getCurrentUser().isRecruiter()) {
        return false;
    }
    RealmList<RealmChat> channels = getRealmTrackedState().getRealmChatsStarted();
    for (RealmChat trackedChat : channels) {
        if (trackedChat.getId() == chat.getId()) {
            return false;
        }
    }
    getRealmInstance().executeTransaction(realm -> {
        RealmTrackedState realmTrackedState = getRealmTrackedState();
        realmTrackedState.addChatStartedChat(chat);
        realm.insertOrUpdate(realmTrackedState);
    });
    return true;
}
The same happens for any other field inside the RealmTrackedState model.
So, within the presenter class, where I'm firing the tracking, I have this:
private void trackState() {
    if (dataManager.shouldTrackChatStarted(chatCache)) {
        //track data
    }
    if (dataManager.shouldTrackSupportChatOpened(chatCache)) {
        //track data
    }
    if (dataManager.shouldTrackWhatever(chatCache)) {
        //track data
    }
    ...
}
And I wonder:
a. How much of a performance impact would this have?
I'm new to Realm, but to me opening and closing a DB looks ... heavy.
What I like about this implementation is that each should(...) method is standalone. Even though I'm launching three of them in a row here, in other cases I'd probably use only one.
However, would it be wiser to get this main object once and then operate on it? It sounds like it would.
b. I see that I can operate on either synchronous or asynchronous transactions. I'm afraid that stacking a series of synchronous transactions may clog the CPU, and using a series of asynchronous ones may cause unexpected behaviour.
c. @PrimaryKey - I used this because of a wild copy-paste session. Assuming that this class should have only one instance, is this the correct way to do it?
ad a.
Realm caches instances, so opening and closing them is not as expensive as it sounds. The first time an app opens a Realm file, a number of consistency checks are performed (primarily whether the model classes match the classes on disk), but the next time you open an instance this check is skipped.
ad b.
If your transactions depend on each other, you might have to be careful. On the other hand, why have multiple transactions? An async transaction will notify you when it has completed, which can help you get the behaviour you expect.
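For instance, here is a rough sketch of chaining two dependent writes through the completion callback; firstObject, secondObject and handleError are placeholders, not from your code:

realm.executeTransactionAsync(
        r -> r.insertOrUpdate(firstObject),              // first write
        () -> realm.executeTransactionAsync(
                r -> r.insertOrUpdate(secondObject)),    // started only after the first commit succeeded
        error -> handleError(error));                    // hypothetical error handler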
ad c.
Primary keys are useful when you update objects (using insertOrUpdate()), as the value is used to decide whether you are creating/inserting or updating an object.
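As a small illustration, assuming realm is an open instance and the RealmTrackedState model above with its fixed id of 1:

realm.executeTransaction(r -> {
    RealmTrackedState state = new RealmTrackedState();   // unmanaged object, id defaults to 1
    r.insertOrUpdate(state);   // no row with id == 1 yet, so this inserts
    r.insertOrUpdate(state);   // a row with id == 1 now exists, so this updates it instead of inserting a second one
});
// realm.where(RealmTrackedState.class).count() is still 1 afterwards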
I have the following objects:
// this class is immutable, acts like a container for several properties
public class MyDataAddOps {

    private final boolean isActive;
    private final Map<String, Object> additionalProps;

    public MyDataAddOps(boolean isActive, Map<String, Object> additionalProps) {
        this.isActive = isActive;
        this.additionalProps = additionalProps;
    }

    public boolean isActive() { return isActive; }

    public Map<String, Object> getAdditionalProps() { return additionalProps; }
}
// this class acts as a "spring" bean that calls load() on construction,
// and then another scheduler bean calls load() per some cron expression (once a minute for example)
public class MyDataAddOpsService {

    private MyDataAddOps data;

    // this method is executed periodically from outside,
    // via some spring quartz for example;
    // the quartz job is not re-entrant
    public void load() {
        // opens some defined file and returns the content string
        String fileData = getFileContent();
        boolean isActive = getIsActive(fileData);
        Map<String, Object> props = getProps(fileData);
        data = new MyDataAddOps(isActive, props);
    }

    // this method is executed by many worker threads inside the application
    public boolean isActive() {
        return data.isActive();
    }

    public final Map<String, Object> getProps() {
        return data.getAdditionalProps();
    }
}
This approach probably has a race condition where one thread executes isActive() and another executes load(), although load() only swaps a reference and the object's state is never changed.
What is the best solution to support such concurrency? I would like to avoid synchronized on the methods, and also a read-write lock.
Maybe AtomicReference or volatile? Or maybe it would be better to return only a reference to the data itself without the proxy methods, so there is no need for locking at all and all the usage logic lives outside this service?
public class MyDataAddOpsService {

    private MyDataAddOps data;

    public void load() {
        ....
        data = new MyDataAddOps(isActive, props);
    }

    public MyDataAddOps getData() {
        return data;
    }
}
Your code has yet to grow towards having a race condition; currently it contains something much more severe: a data race. Publishing a reference across threads without inducing a happens-before relationship between the write and the future reads means that the reader can see the data object in a partially initialized, inconsistent state. Your proposed solution does not help with that.
Once you make the data field volatile, only then will you have a race condition between one thread first reading the data reference, then another thread updating the data reference, then the first thread reading isActive from the old data. This may actually be a benign case for your logic.
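Concretely, the minimal change along those lines is marking the field volatile; a sketch of just the relevant parts of your service:

public class MyDataAddOpsService {

    // volatile ensures a fully constructed MyDataAddOps is visible to reader
    // threads once load() has written the reference (safe publication)
    private volatile MyDataAddOps data;

    public void load() {
        String fileData = getFileContent();
        boolean isActive = getIsActive(fileData);
        Map<String, Object> props = getProps(fileData);
        data = new MyDataAddOps(isActive, props);   // publish one immutable snapshot
    }

    public boolean isActive() {
        return data.isActive();   // always reads a consistent, fully built snapshot
    }
}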
We have a class in our project, let it be named AttributeUpdater, that handles copying values from one entity to another. The core method traverses the attributes of an entity and copies them, as specified, into the second one. During that loop the AttributeUpdater collects all reports (which contain information about what value was overwritten during copying) into a list for eventual logging purposes. This list is cleared if the old entity whose values were overwritten was never persisted to the database, because in that case you would only be overwriting default values, and logging that is deemed redundant. In pseudo Java code:
public class AttributeUpdater {

    public static CopyResult updateAttributes(Entity source, Entity target, String[] attributes) {
        List<CopyReport> reports = new ArrayList<CopyReport>();
        for (String attribute : attributes) {
            reports.add(copy(source, target, attribute));
        }
        if (target.isNotPersisted()) {
            reports.clear();
        }
        return new CopyResult(reports);
    }
}
Now someone had the epiphany that there is one case in which the reports actually matter even if the entity has not been persisted yet. This would not be that big of a deal if I could just add another parameter to the method signature, but that is somewhat out of the question due to the actual structure of the class and the amount of refactoring required. Since the method is static, the only other solution I came up with is adding a flag as a static field and setting it just before the function call.
public class AttributeUpdater {

    public static final ThreadLocal<Boolean> isDeletionEnabled = new ThreadLocal<Boolean>() {
        @Override protected Boolean initialValue() {
            return Boolean.TRUE;
        }
    };

    public static Boolean getDeletionEnabled() { return isDeletionEnabled.get(); }

    public static void setDeletionEnabled(Boolean b) { isDeletionEnabled.set(b); }

    public static CopyResult updateAttributes(Entity source, Entity target, String[] attributes) {
        List<CopyReport> reports = new ArrayList<CopyReport>();
        for (String attribute : attributes) {
            reports.add(copy(source, target, attribute));
        }
        if (isDeletionEnabled.get() && target.isNotPersisted()) {
            reports.clear();
        }
        return new CopyResult(reports);
    }
}
ThreadLocal is a container used for thread-safety. This solution, while it does the job, has (at least for me) one major drawback: for all the other methods which assume that the reports are deleted, there is now no way of guaranteeing that those reports will be deleted as expected. Again, refactoring is not an option. So I came up with this:
public class AttributeUpdater {

    private static final ThreadLocal<Boolean> isDeletionEnabled = new ThreadLocal<Boolean>() {
        @Override protected Boolean initialValue() {
            return Boolean.TRUE;
        }
    };

    public static Boolean getDeletionEnabled() { return isDeletionEnabled.get(); }

    public static void disableDeletionForNextCall() { isDeletionEnabled.set(Boolean.FALSE); }

    public static CopyResult updateAttributes(Entity source, Entity target, String[] attributes) {
        List<CopyReport> reports = new ArrayList<CopyReport>();
        for (String attribute : attributes) {
            reports.add(copy(source, target, attribute));
        }
        if (isDeletionEnabled.get() && target.isNotPersisted()) {
            reports.clear();
        }
        isDeletionEnabled.set(Boolean.TRUE);
        return new CopyResult(reports);
    }
}
This way I can guarantee that for old code the function will always work like it did before the change. The downside to this solution, especially for nested entities, is that I am going to be accessing the ThreadLocal container a lot - iterating over one of those means calling disableDeletionForNextCall() for each nested element. Also, as the method is called a lot overall, there are valid performance concerns.
TL;DR: Look at pseudo Java source code. First one is old code, second and third are different attempts to allow deletion disabling. Parameters cannot be added to method signature.
Is there a possibility to determine which solution is better or is this merely a philosophical issue? Or is there even a better solution to this problem?
The obvious way to decide which solution is better in terms of performance would be benchmarking this. As both solutions access the thread-local variable at least for reading, I doubt that they would differ too much. You could perhaps combine them like this:
if (!isDeletionEnabled.get())
    isDeletionEnabled.set(Boolean.TRUE);
else if (target.isNotPersisted())
    reports.clear();
In this case, you will have the benefit of the second solution (guaranteed resetting of the flag) without unnecessary writes.
I doubt there will be much practical difference. With a bit of luck, the HotSpot JVM will compile the thread local variable into some nice native code which works without too much of a performance penalty, though I have no actual experience there.
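Put back into the context of the method, the combined variant might look roughly like this (same class and helpers as in the question's third snippet; only the deletion check changes):

public static CopyResult updateAttributes(Entity source, Entity target, String[] attributes) {
    List<CopyReport> reports = new ArrayList<CopyReport>();
    for (String attribute : attributes) {
        reports.add(copy(source, target, attribute));
    }
    if (!isDeletionEnabled.get()) {
        // deletion was disabled for this call only: reset the flag and keep the reports
        isDeletionEnabled.set(Boolean.TRUE);
    } else if (target.isNotPersisted()) {
        reports.clear();
    }
    return new CopyResult(reports);
}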
I've used the State pattern to implement a simple finite state machine. Looking at the description given on Wikipedia, and more specifically at the suggested Java implementation, I wondered why the classes implementing the State interface (i.e. the various states) are not Singletons.
In the suggested implementation, a new State is created whenever a transition occurs. However, one object is sufficient to represent each state. So why waste time creating a new instance every time a transition occurs?
Because each state can store instance variables?
Take a look at the Wikipedia example you reference:
class StateB implements State {

    private int count = 0;

    public void writeName(StateContext stateContext, String name) {
        System.out.println(name.toUpperCase());
        if (++count > 1) {
            stateContext.setState(new StateA());
        }
    }
}
Can you see how it stores a count of the number of times it has been entered?
Now, in an FSM you probably want each state to be idempotent (subsequent calls give the same feedback), but the State pattern is more general. One target use, as described on the Wikipedia page, is:
A clean way for an object to
partially change its type at runtime
As most objects probably use their local variables when performing actions, you would want the "changed type" version to use local variables as well.
Assume your object has a state. Now what if you need "just one more whole thing like that"?
You may want a 'stateful' State object (as demonstrated in one of the examples on the referenced Wikipedia page), and in addition you may want to run several state machines of the same type in the same JVM.
This wouldn't be possible if each State was a Singleton.
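To make that concrete, here is a small sketch built on the Wikipedia StateB example above; it assumes the accompanying StateContext exposes setState(...) and a writeName(String) method that delegates to the current state. If StateB were a singleton, both contexts below would share a single count:

// two independent state machines of the same type in one JVM
StateContext firstMachine = new StateContext();
StateContext secondMachine = new StateContext();

firstMachine.setState(new StateB());     // each machine gets its own StateB, hence its own count
secondMachine.setState(new StateB());

firstMachine.writeName("monday");        // firstMachine's count becomes 1
firstMachine.writeName("tuesday");       // count reaches 2, firstMachine switches to StateA
secondMachine.writeName("saturday");     // secondMachine's count is still 1, it stays in StateB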
If your states don't need machine-specific additional state data, it makes perfect sense to reuse them across machines. That doesn't mean they are Singletons: Singletons also imply global access which you almost never want.
Here's a simple state machine that reuses states, but doesn't make them singletons.
public class SwitchState
{
    public SwitchState(bool isOn)
    {
        mIsOn = isOn;
    }

    public void InitToggleState(SwitchState toggleState)
    {
        mToggleState = toggleState;
    }

    public bool IsOn { get { return mIsOn; } }

    public SwitchState Toggle() { return mToggleState; }

    private SwitchState mToggleState;
    private bool mIsOn;
}

public class LightSwitch
{
    public LightSwitch()
    {
        mState = sOnState;
    }

    public bool IsOn { get { return mState.IsOn; } }

    public void Toggle()
    {
        mState = mState.Toggle();
    }

    static LightSwitch()
    {
        sOnState = new SwitchState(true);
        sOffState = new SwitchState(false);
        sOnState.InitToggleState(sOffState);
        sOffState.InitToggleState(sOnState);
    }

    private static SwitchState sOnState;
    private static SwitchState sOffState;
    private SwitchState mState;
}
You can see there will only be a single on and off state in the entire application regardless of how many LightSwitch instances there are. At the same time, nothing outside of LightSwitch has access to the states, so they aren't singletons. This is a classic example of the Flyweight pattern.
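Since the question refers to the Java implementation, here is a rough Java rendering of the same flyweight idea; it is a translation sketch of the C#-style code above, not part of the original answer:

// flyweight states: shared by every LightSwitch, but not globally reachable the way singletons are
final class SwitchState {

    static final SwitchState ON = new SwitchState(true);
    static final SwitchState OFF = new SwitchState(false);

    static {
        ON.toggleState = OFF;
        OFF.toggleState = ON;
    }

    private final boolean isOn;
    private SwitchState toggleState;

    private SwitchState(boolean isOn) {
        this.isOn = isOn;
    }

    boolean isOn() { return isOn; }

    SwitchState toggle() { return toggleState; }
}

public class LightSwitch {

    private SwitchState state = SwitchState.ON;

    public boolean isOn() { return state.isOn(); }

    public void toggle() { state = state.toggle(); }
}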
The question should be asked the other way around: why have State as a singleton? A singleton is only needed when you require global access and it is an error to have more than one instance.
It's certainly not an error to have more than one instance of a State, and you also do not require global access, so there is no need to make them singletons.