Java Optimization with Threads - java

I'm using a custom class Foo in Java as the key type in a HashMap. All the fields of Foo instances are immutable (they are declared final and private and are assigned values only in the constructor). Thus, the hashCode() of a given Foo object is also fixed, and for optimization purposes, I am calculating it in the constructor and simply returning that value in the hashCode() method.
Instances of Foo also have a value() method which returns a similar fixed value once the object has been instantiated. Currently I am also calculating it in the constructor and returning it in the method, but there is a difference between hashCode() and value(): hashCode() is called for the first time almost instantly after the object is created, but value() is called much later. I understand that having a separate Thread to calculate the hash-code would simply increase the run-time because of synchronization issues, but:
is this a good way to calculate value()? Would it improve run-time at all?
are simple Threads enough, or do I need to use pools etc.?
Note: this may seem like I'm optimizing the wrong parts of my program, but I've already worked on the 'correct' parts and brought the average run-time down from ~17 seconds to ~2 seconds. Edit: there will be upwards of 5000 Foo objects, and that's a conservative estimate.

It definitely sounds like deferred calculation is a good approach here - and yes, if you create a lot of these objects, a thread pool is the way to go.
As for value()'s return value until it's ready, I would stay away from returning invalid values, and instead either make it blocking (and add some isValueReady() helper) or make it instantly return a "future": some object that offers the same isReady check and a blocking get method.
Also, never rely on "much later" - always make sure the value there is ready before using it.

I recommend creating a Future for value: create a static fixed thread pool (Executors.newFixedThreadPool) and submit the value calculations to it. This way there's no risk that value will be accessed before it's available; the worst case is that whatever is accessing value will block on a Future.get call (or use the version with a timeout if, e.g., deadlock is a concern).
Because Future.get throws checked exceptions, which can be a nuisance, you can wrap the get call in your class's getter method and wrap the checked exceptions in a RuntimeException:
class MyClass {
    private static final ExecutorService executor = Executors.newFixedThreadPool(/* some value that makes sense */);
    private final Future<Value> future;
    public MyClass() {
        future = executor.submit(/* Callable */);
    }
    public boolean isValueDone() {
        return future.isDone();
    }
    public Value value() {
        try {
            return future.get();
        } catch(InterruptedException|ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
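For a concrete picture, here is a minimal sketch of the same idea using CompletableFuture, whose join() throws only unchecked exceptions, so no wrapping is needed. The names LazyFoo and expensiveValue, the pool size of 4, and the daemon threads are assumptions made for illustration; none of them come from the question.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the question's Foo; only the deferred value() is modeled.
final class LazyFoo {
    // Shared pool; daemon threads so the pool never keeps the JVM alive (an assumption of this sketch).
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "foo-value");
        t.setDaemon(true);
        return t;
    });

    private final CompletableFuture<Long> value;

    LazyFoo(int seed) {
        // Start the computation right away, but do not wait for it in the constructor.
        value = CompletableFuture.supplyAsync(() -> expensiveValue(seed), POOL);
    }

    boolean isValueDone() {
        return value.isDone();
    }

    // Blocks only if the background computation has not finished yet.
    long value() {
        return value.join();
    }

    // Placeholder for the real, costly calculation.
    private static long expensiveValue(int seed) {
        long acc = seed;
        for (int i = 0; i < 1_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }
}
```

Whether this beats computing the value in the constructor depends on how expensive the calculation really is; for a cheap calculation the task submission overhead can easily dominate.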

Related

Is "double checked locking" broken here in java?

I find an example for double checked locking.
However, I think this example is invalid because it's possible that another thread may see a non-null reference to a DoorControlManage object of door 1 but see the default values for fields of the DoorControlManage object of door 1 rather than the values set in the constructor.
(Ref: https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html)
Could you let me know whether I am right?
Thanks a lot!
public class DoorControlManager {
    private static HashMap<Integer, DoorControlManager> mInstances = new HashMap<>();
    public static DoorControlManager getInstance(int door) {
        if (!mInstances.containsKey(door)) {
            synchronized (mInstances) {
                if (!mInstances.containsKey(door)) {
                    mInstances.put(door, new DoorControlManager(door));
                }
            }
        }
        return mInstances.get(door);
    }
    ...
}
Yes this code is broken, though not for the normal reason.
In this case, you have different threads accessing a HashMap without proper synchronization. Since HashMap is not a thread-safe class, this is not thread-safe. It is possible that the first containsKey call will see stale values in the internals of the map, and behave in unspecified (implementation-dependent) ways.
Making "simple" changes to concurrency sensitive code can completely destroy the properties that make the original version thread-safe. If you are going to attempt to write "clever" code like this, you need to have a deep understanding of Java concurrency ... and how the Java Memory Model really works.
There are a couple of ways that this code could be written correctly:
Use a ConcurrentHashMap and implement the getInstance method as:
    return mInstances.computeIfAbsent(
            door, d -> new DoorControlManager(d));
Keep using a HashMap and don't use the DCL pattern. Simply lock before testing.
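The two options above could look roughly like this. It is a sketch only: the class names DoorRegistry and LockedDoorRegistry are invented for the example, and the real DoorControlManager presumably has more state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Option 1: ConcurrentHashMap performs the check-then-create atomically per key.
final class DoorRegistry {
    private static final Map<Integer, DoorRegistry> INSTANCES = new ConcurrentHashMap<>();
    final int door;

    private DoorRegistry(int door) {
        this.door = door;
    }

    static DoorRegistry getInstance(int door) {
        return INSTANCES.computeIfAbsent(door, d -> new DoorRegistry(d));
    }
}

// Option 2: plain HashMap, but every access happens while holding the lock.
final class LockedDoorRegistry {
    private static final Map<Integer, LockedDoorRegistry> INSTANCES = new HashMap<>();
    final int door;

    private LockedDoorRegistry(int door) {
        this.door = door;
    }

    static LockedDoorRegistry getInstance(int door) {
        synchronized (INSTANCES) {
            LockedDoorRegistry existing = INSTANCES.get(door);
            if (existing == null) {
                existing = new LockedDoorRegistry(door);
                INSTANCES.put(door, existing);
            }
            return existing;
        }
    }
}
```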
Note that the DCL initialization pattern is not broken in Java 5+, provided that you are initializing a single field and the field is declared volatile. But there are other (better) ways to achieve the same effect, so its use is not recommended.
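For reference, a sketch of what single-field DCL with volatile looks like, next to the initialization-on-demand holder idiom, which is one commonly cited alternative (the answer does not say which alternatives it has in mind). The class names Settings and HolderSettings are hypothetical.

```java
// Double-checked locking for a single field; correct on Java 5+ only because the field is volatile.
final class Settings {
    private static volatile Settings instance;

    private Settings() {}

    static Settings getInstance() {
        Settings local = instance;              // single volatile read on the fast path
        if (local == null) {
            synchronized (Settings.class) {
                local = instance;               // re-check while holding the lock
                if (local == null) {
                    local = new Settings();
                    instance = local;
                }
            }
        }
        return local;
    }
}

// Holder idiom: the JVM's class initialization does the locking, and only on first use.
final class HolderSettings {
    private HolderSettings() {}

    private static final class Holder {
        static final HolderSettings INSTANCE = new HolderSettings();
    }

    static HolderSettings getInstance() {
        return Holder.INSTANCE;
    }
}
```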

Java store reflected Method statically in class: Safe?

Is something like the following 'safe' in Java, and why?
public final class Utility {
    private Utility() {}
    private static Method sFooMethod = null;
    public static void callFoo(SomeThing thing) {
        try {
            if(sFooMethod == null)
                sFooMethod = SomeThing.class.getMethod("foo");
            sFooMethod.invoke(thing);
        } catch(Exception e) {} // Just for simplicity here
    }
}
My rationale would be that even if another thread writes to sFooMethod in the background and the current thread sees it suddenly somewhere during execution of callFoo(), it would still just result in the same old reflective invoke of thing.foo()?
Extra question: In what ways does the following approach differ (positive/negative) from the above? Would it be preferred?
public final class Utility {
    private Utility() {}
    private static final Method sFooMethod;
    static {
        Method m = null;
        try {
            m = SomeThing.class.getMethod("foo");
        } catch(Exception e) {} // lookup failed; m stays null
        // Assign the final field exactly once, outside the try, so that this compiles.
        sFooMethod = m;
    }
    public static void callFoo(SomeThing thing) {
        try {
            if(sFooMethod != null)
                sFooMethod.invoke(thing);
        } catch(Exception e) {}
    }
}
Background update from comment:
I am writing an Android app and I need to call a method that was private until API 29, when it was made public without being changed. In an alpha release (can't use this yet) of the AndroidX core library Google provides a HandlerCompat method that uses reflection to call the private method if it is not public. So I copied Google's method into my own HandlerCompatP class for now, but I noticed that if I call it 1000 times, then the reflective lookup will occur 1000 times (I couldn't see any caching). So that got me thinking about whether there is a good way to perform the reflection once only, and only if needed.
"Don't use reflection" is not an answer here as in this case it is required, and Google themselves intended for it to happen in their compatibility library. My question is also not whether using reflection is safe and/or good practice, I'm well aware it's not good in general, but instead whether given that I am using reflection, which method would be safe/better.
The key to avoiding memory consistency errors is understanding the happens-before relationship. This relationship is simply a guarantee that memory writes by one specific statement are visible to another specific statement.
The Java Language Specification states the following:
17.4.5. Happens-before Order
Two actions can be ordered by a happens-before relationship. If one
action happens-before another, then the first is visible to and
ordered before the second.
If we have two actions x and y, we write hb(x, y) to indicate that x
happens-before y.
If x and y are actions of the same thread and x comes before y in
program order, then hb(x, y).
In your case, the write to the static field and the subsequent read happen in the same thread, so a happens-before relationship is established between them and the read will always see the effect of the write.
Also, all threads are going to write the same data. At worst, all eligible threads will write to the variable at the same time. The variable will end up referring to the object that was assigned last, and the other, no longer referenced objects will be garbage collected.
It is unlikely that many threads in your app will enter the method at once, so the repeated object creation should not cause a significant performance hit. But if you want to set the variable only once, the second approach is better, because static initializer blocks are thread-safe.
Is something like the following 'safe' in Java, and why?
No, I would not recommend using reflection unless you have to.
Most of the time, developers design their classes so that access to a hidden field or method is never required. There will most likely be a better way to access the hidden content.
In particular, hidden fields and methods can change their names when the library that contains them is updated. Your code could then suddenly stop working and you would not know why, since the compiler would not report any errors.
It is also faster to access a method or field directly than through reflection, because reflection first has to look it up and direct access doesn't.
So don't use reflection if you don't have to.
I'm not sure what your goal is -- there is probably a better way to do what you're trying to do.
The second approach, with a static initializer, is preferable because your first implementation has a race condition.
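A minimal sketch of the "look it up once" approach, using a nested holder class so that the lookup runs only when it is first needed and class initialization guarantees it runs exactly once. CharSequence.length() stands in for the Android method from the question, and the names CachedLookup, Holder, and lengthOf are hypothetical.

```java
import java.lang.reflect.Method;

final class CachedLookup {
    private CachedLookup() {}

    // Initialized by the JVM on first access to Holder, exactly once, thread-safely.
    private static final class Holder {
        static final Method LENGTH = find();

        private static Method find() {
            try {
                return CharSequence.class.getMethod("length");
            } catch (NoSuchMethodException e) {
                return null; // callers fall back to the direct call
            }
        }
    }

    static int lengthOf(CharSequence s) {
        Method m = Holder.LENGTH;
        if (m == null) {
            return s.length();
        }
        try {
            return (Integer) m.invoke(s);
        } catch (ReflectiveOperationException e) {
            return s.length();
        }
    }
}
```

Compared with the plain static initializer in the question, the holder defers the lookup until lengthOf is first called; if the enclosing class is only ever loaded when the method is needed, the two behave the same in practice.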

Proper use of volatile variables and synchronized blocks

I am trying to wrap my head around thread safety in java (or in general). I have this class (which I hope complies with the definition of a POJO) which also needs to be compatible with JPA providers:
public class SomeClass {
    private Object timestampLock = new Object();
    // are "volatile"s necessary?
    private volatile java.sql.Timestamp timestamp;
    private volatile String timestampTimeZoneName;
    private volatile BigDecimal someValue;
    public ZonedDateTime getTimestamp() {
        // is synchronisation necessary here? is this the correct usage?
        synchronized (timestampLock) {
            return ZonedDateTime.ofInstant(timestamp.toInstant(), ZoneId.of(timestampTimeZoneName));
        }
    }
    public void setTimestamp(ZonedDateTime dateTime) {
        // is this the correct usage?
        synchronized (timestampLock) {
            this.timestamp = java.sql.Timestamp.from(dateTime.toInstant());
            this.timestampTimeZoneName = dateTime.getZone().getId();
        }
    }
    // is synchronisation required?
    public BigDecimal getSomeValue() {
        return someValue;
    }
    // is synchronisation required?
    public void setSomeValue(BigDecimal val) {
        someValue = val;
    }
}
As stated in the commented rows in the code, is it necessary to define timestamp and timestampTimeZoneName as volatile and are the synchronized blocks used as they should be? Or should I use only the synchronized blocks and not define timestamp and timestampTimeZoneName as volatile? A timestampTimeZoneName of a timestamp should not be erroneously matched with another timestamp's.
This link says
Reads and writes are atomic for all variables declared volatile
(including long and double variables)
Should I understand that accesses to someValue in this code through the setter/getter are thread safe thanks to volatile definitions? If so, is there a better (I do not know what "better" might mean here) way to accomplish this?
To determine if you need synchronized, try to imagine a place where you can have a context switch that would break your code.
In this case, if the context switch happens where I put the comment, then in getTimestamp() you're going to be reading different values from each timestamp type.
Also, although assignments are atomic, the expression java.sql.Timestamp.from(dateTime.toInstant()) certainly isn't, so you can get a context switch in between dateTime.toInstant() and the call to from. In short, you definitely need the synchronized blocks.
synchronized (timestampLock) {
    this.timestamp = java.sql.Timestamp.from(dateTime.toInstant());
    //CONTEXT SWITCH HERE
    this.timestampTimeZoneName = dateTime.getZone().getId();
}
synchronized (timestampLock) {
    return ZonedDateTime.ofInstant(timestamp.toInstant(), ZoneId.of(timestampTimeZoneName));
}
In terms of volatile, I'm pretty sure they're required. You have to guarantee that each thread definitely is getting the most updated version of a variable.
This is the contract of volatile. And although it may be covered by the synchronized block, and volatile not actually necessary here, it's good to write anyway. If the synchronized block does the job of volatile already, the VM won't do the guarantee twice. This means volatile won't cost you any more, and it's a very good flashing light that says to the programmer: "I'M USED IN MULTIPLE THREADS".
For someValue: If there's no synchronized block here, then volatile is definitely necessary. If you call a set in one thread, the other thread has no cue that tells it the value may have been updated outside of this thread, so it may use an old, cached value. The JIT can do a lot of funny optimizations if it assumes a single thread, ones that can simply break your program.
Now I'm not entirely certain if synchronized is required here. My guess is no. I would add it anyway to be safe though. Or you can let java worry about the synchronization and use http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicInteger.html
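Since the link points at AtomicInteger but someValue is a BigDecimal, the closest counterpart is AtomicReference. The sketch below is an assumption about how that could look, not part of the original class; the name ValueHolder and the add method are invented to show the one case where volatile alone would not be enough, namely a read-modify-write.

```java
import java.math.BigDecimal;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the question's someValue field.
final class ValueHolder {
    private final AtomicReference<BigDecimal> someValue =
            new AtomicReference<>(BigDecimal.ZERO);

    // Plain get/set behave like a volatile read/write.
    BigDecimal getSomeValue() {
        return someValue.get();
    }

    void setSomeValue(BigDecimal val) {
        someValue.set(val);
    }

    // Read-modify-write: with a volatile field, two threads could both read the
    // same old value and one update would be lost. updateAndGet retries instead.
    BigDecimal add(BigDecimal delta) {
        return someValue.updateAndGet(current -> current.add(delta));
    }
}
```

For a simple getter and setter pair like the one in the question, volatile gives the same visibility guarantee; the atomic wrapper only pays off once updates depend on the previous value.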
Nothing new here, this is just a more explicit version of something @Cruncher already said:
You need synchronized whenever it is important for two or more fields in your program to be consistent with one another. Suppose you have two parallel lists, and your code depends on them both being the same length. That's called an invariant, as in: the two lists are invariably the same length.
How can you write a method, append(x,y), that adds a new pair of values to the lists without temporarily breaking the invariant? You can't. The method must add one item to the first list, breaking the invariant, and then add the other item to the second list, fixing it again. There's no other way.
In a single-threaded program, that temporary broken state is no problem because no other method can possibly use the lists while append(x,y) is running. That's no longer true in a multithreaded program. In the worst case, append(x,y) could add x to the x list, and then the scheduler could suspend the thread at that exact moment to allow other threads to run. The CPUs could execute millions of instructions before append(x,y) gets to finish the job and make the lists right again. During all of that time, other threads would see the broken invariant, and possibly corrupt your data or crash the program as a result.
The fix is for append(x,y) to be synchronized on some object, and (this is the important part), for every other method that uses the lists to be synchronized on the same object. Since only one thread can be synchronized on a given object at a given time, it will not be possible for any other thread to see the lists in an inconsistent state.
So, if thread A calls append(x,y), and thread B tries to look at the lists "at the same time", will thread B see what the lists looked like before or after thread A did its work? That's called a race condition. And with only the synchronization that I have described so far, there's no way to know which thread will win. All we've done so far is to guarantee one particular invariant.
If it matters which thread wins the race, then that means that there is some higher-level invariant that also needs protection. You will have to add more synchronization to protect that one too. "Thread safety" -- two little words to name a subject that is both broad and deep.
Good Luck, and Have Fun!
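The parallel-lists example can be sketched as follows. The class name PairedLists and its methods are made up for illustration; the point is only that append and every reader synchronize on the same lock object.

```java
import java.util.ArrayList;
import java.util.List;

final class PairedLists {
    private final Object lock = new Object();
    private final List<Integer> xs = new ArrayList<>();
    private final List<Integer> ys = new ArrayList<>();

    void append(int x, int y) {
        synchronized (lock) {
            xs.add(x);   // invariant is temporarily broken here
            ys.add(y);   // and restored here, before the lock is released
        }
    }

    // Readers take the same lock, so they can never observe the broken state.
    boolean lengthsMatch() {
        synchronized (lock) {
            return xs.size() == ys.size();
        }
    }

    int size() {
        synchronized (lock) {
            return xs.size();
        }
    }
}
```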
// is synchronisation required?
public BigDecimal getSomeValue() {
    return someValue;
}
// is synchronisation required?
public void setSomeValue(BigDecimal val) {
    someValue = val;
}
I think yes, you are required to use a synchronized block. Consider a case in which one thread is setting the value while another thread is reading it through the getter method; your own example already uses a synchronized block for this reason. So if you access your variable inside the method, you need the synchronized block.

Do You Cache Properties in Local Variables?

Consider the class Foo.
public class Foo {
    private double size;
    public double getSize() {
        return this.size; // Always O(1)
    }
}
Foo has a property called size, which is frequently accessed, but never modified, by a given method. I've always cached a property in a variable whenever it is accessed more than once in any method, because "someone told me so" without giving it much thought. i.e.
public void test(Foo foo) {
    double size = foo.getSize(); // Cache it or not?
    // size will be referenced in several places later on.
}
Is this worth it, or an overkill?
If I don't cache it, are modern compilers smart enough to cache it themselves?
A couple of factors (in no particular order) that I consider when deciding whether or not to store the value returned by a call to a "get() method":
Performance of the get() method - Unless the API specifies, or unless the calling code is tightly coupled with the called method, there are no guarantees of the performance of the get() method. The code may be fine in testing now, but may get worse if the get() method's performance changes in the future or if testing does not reflect real-world conditions (e.g. testing with only a thousand objects in a container when a real-world container might have ten million). Used in a for-loop condition, the get() method will be called before every iteration.
Readability - A variable can be given a specific and descriptive name, providing clarification of its use and/or meaning in a way that may not be clear from inline calls to the get() method. Don't underestimate the value of this to those reviewing and maintaining the code.
Thread safety - Can the value returned by the get() method potentially change if another thread modifies the object while the calling method is doing its thing? Should such a change be reflected in the calling method's behavior?
Regarding the question of whether or not compilers will cache it themselves, I'm going to speculate and say that in most cases the answer has to be 'no'. The only way the compiler could safely do so would be if it could determine that the get() method would return the same value at every invocation. And this could only be guaranteed if the get() method itself was marked final and all it did was return a constant (i.e. an object or primitive also marked 'final'). I'm not sure, but I think this is probably not a scenario the compiler bothers with. The JIT compiler has more information and thus could have more flexibility, but you have no guarantee that a given method will get JIT'ed.
In conclusion, don't worry about what the compiler might do. Caching the return value of a get() method is probably the right thing to do most of the time, and will rarely (i.e. almost never) be the wrong thing to do. Favor writing code that is readable and correct over code that is fast(est) and flashy.
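To make the for-loop point concrete, here is a small sketch that counts how often the getter actually runs. CountingBox and GetterLoopDemo are hypothetical names, and the call counter exists only for the demonstration (it is not thread-safe).

```java
final class CountingBox {
    private final int size;
    private int calls;   // how many times getSize() has been invoked

    CountingBox(int size) {
        this.size = size;
    }

    int getSize() {
        calls++;
        return size;
    }

    int calls() {
        return calls;
    }
}

final class GetterLoopDemo {
    private GetterLoopDemo() {}

    // The loop condition is evaluated before every iteration, so the getter
    // runs once per iteration plus once more for the final, failing check.
    static int sumUncached(CountingBox box) {
        int sum = 0;
        for (int i = 0; i < box.getSize(); i++) {
            sum += i;
        }
        return sum;
    }

    // The getter runs exactly once; the loop reads the local variable.
    static int sumCached(CountingBox box) {
        int size = box.getSize();
        int sum = 0;
        for (int i = 0; i < size; i++) {
            sum += i;
        }
        return sum;
    }
}
```

With a box of size 5, the uncached loop invokes the getter 6 times and the cached one invokes it once; both compute the same sum. Whether the JIT later removes that difference for a trivial getter is a separate question.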
I don't know whether there is a "right" answer, but I would keep a local copy.
In your example, I can see that getSize() is trivial, but in real code, I don't always know whether it is trivial or not; and even if it is trivial today, I don't know that somebody won't come along and change the getSize() method to make it non-trivial sometime in the future.
The biggest factor would be performance. If it's a simple operation that doesn't require a whole lot of CPU cycles, I'd say don't cache it. But if you constantly need to execute an expensive operation on data that doesn't change, then definitely cache it. For example, in my app the currently logged-in user is serialized to JSON on every page. The serialization is pretty expensive, so to improve performance I now serialize the user once at sign-in and then reuse the serialized version when putting JSON on the page. Here is the before and after; it made a noticeable improvement in performance:
// Before
public User(Principal principal) {
    super(principal.getUsername(), principal.getPassword(), principal.getAuthorities());
    uuid = principal.getUuid();
    id = principal.getId();
    name = principal.getName();
    isGymAdmin = hasAnyRole(Role.ROLE_ADMIN);
    isCustomBranding = hasAnyRole(Role.ROLE_CUSTOM_BRANDING);
    locations.addAll(principal.getLocations());
}
public String toJson() {
    return JSONAdapter.getGenericSerializer().serialize(this); // serialized on every call
}
// After
public User(Principal principal) {
    super(principal.getUsername(), principal.getPassword(), principal.getAuthorities());
    uuid = principal.getUuid();
    id = principal.getId();
    name = principal.getName();
    isGymAdmin = hasAnyRole(Role.ROLE_ADMIN);
    isCustomBranding = hasAnyRole(Role.ROLE_CUSTOM_BRANDING);
    locations.addAll(principal.getLocations());
    json = JSONAdapter.getGenericSerializer().serialize(this); // serialized once, in the constructor
}
public String toJson() {
    return json;
}
The User object has no setter methods, so there is no way the data would ever change unless the user signs out and then back in. In this case I'd say it is safe to cache the value.
If the value of size was calculated each time say by looping through an array and thus not O(1), caching the value would have obvious benefits performance-wise. However since size of Foo is not expected to change at any point and it is O(1), caching the value mainly aids in readability. I recommend continuing to cache the value simply because readability is often times more of a concern than performance in modern computing systems.
IMO, even if you are really worried about performance this is a bit of overkill, but there are a couple of ways to ensure that the value is "cached" by your VM.
First, you can create static final constants for the possible results (1 or 0 in your example), so only one copy is stored for the whole class. The per-instance state is then just a boolean, while getSize() still returns a double (you could also use an int if the result is only ever 0 or 1):
private static final double D_ZERO = 0.0;
private static final double D_ONE = 1.0;
private boolean ZERO = false;
public double getSize(){
    return (ZERO ? D_ZERO : D_ONE);
}
Or, if you can set the size when the object is constructed, you can use a final field. A final field can be assigned in the constructor or, if it is static, in a static initializer; since this is an instance field, use the constructor:
private final int SIZE;
public Foo(){
    SIZE = 0;
}
public double getSize(){
    return this.SIZE;
}
This can be accessed via foo.getSize().
In my code, I would cache it if either the getSize() method is time-consuming or, more often, the result is used in more or less complex expressions.
For example if calculating an offset from the size
int offset = fooSize * count1 + fooSize * count2;
is easier to read (for me) than
int offset = foo.getSize() * count1 + foo.getSize() * count2;

java - is initializing a temporary variable for simple getters better or not?

A very unimportant question about Java performance, but it made me wondering today.
Say I have simple getter:
public Object getSomething() {
    return this.member;
}
Now, say I need the result of getSomething() twice (or more) in some function/algorithm. My question: is there any difference in either calling getSomething() twice (or more) or in declaring a temporary, local variable and use this variable from then on?
That is, either
public void algo() {
    Object o = getSomething();
    ... use o ...
}
or
public void algo() {
    ... call getSomething() multiple times ...
}
I tend to mix both options, for no specific reason. I know it doesn't matter, but I am just wondering.
Thanks!
Technically, it's faster to not call the method multiple times, however this might not always be the case. The JVM might optimize the method calls to be inline and you won't see the difference at all. In any case, the difference is negligible.
However, it's probably safer to always use a getter. What if the value of the state changes between your calls? If you want to use a consistent version, then you can save the value from the first call. Otherwise, you probably want to always use the getter.
In any case, you shouldn't base this decision on performance because it's so negligible. I would pick one and stick with it consistently. I would recommend always going through your getters/setters.
Getters and setters are about encapsulation and abstraction. When you decide to invoke the getter multiple times, you are making assumptions about the inner workings of that class. For example that it does no expensive calculations, or that the value is not changed by other threads.
I'd argue that it's better to call the getter once and store its result in a temporary variable, thus allowing you to freely refactor the implementing class.
As an anecdote, I was once bitten by a change where a getter returned an array, but the implementing class was changed from an array property to using a list and doing the conversion in the getter.
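A sketch of how that kind of change bites: once the getter converts a list to an array, every call allocates and copies. The names Inventory and InventoryClient are hypothetical and only illustrate the anecdote.

```java
import java.util.ArrayList;
import java.util.List;

final class Inventory {
    // Used to be a String[] field; now a list, converted on demand.
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    // After the refactoring, each call copies the list into a fresh array.
    String[] getItems() {
        return items.toArray(new String[0]);
    }
}

final class InventoryClient {
    private InventoryClient() {}

    static int totalLength(Inventory inventory) {
        String[] items = inventory.getItems();   // one conversion, then reuse the local
        int total = 0;
        for (int i = 0; i < items.length; i++) {
            total += items[i].length();
        }
        return total;
    }
}
```

A caller that wrote inventory.getItems()[i] inside the loop would trigger a full copy per element, turning a linear loop into a quadratic one without any visible change at the call site.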
The compiler should optimize either one to be basically the same code.
