In particular, is the method javax.xml.bind.DatatypeConverter.parseBase64Binary(String) thread-safe?
There is nothing in the documentation that suggests the class is thread-safe. Therefore I recommend you assume it isn't.
I'd recommend Base64 from Apache Commons Codec, which states in its documentation that it is thread-safe.
My reading of the source code is that that implementation is thread-safe.
The parseBase64Binary method calls a parseBase64Binary method on a shared DatatypeConverterImpl object that is lazily created.
It is easy to see that the lazy creation is thread safe. (The code is in another Answer ...)
An examination of DatatypeConverterImpl shows that it has no instance variables, so there cannot be thread-safety concerns over access / update to the state of the instance.
The DatatypeConverterImpl.parseBase64Binary method (in turn) calls the static _parseBase64Binary method.
The _parseBase64Binary method uses its input (which is immutable) and local variables that refer to thread-confined objects. The only exception is the decodeMap variable which is a private static final array.
The decodeMap variable is initialized and safely published during class (static) initialization.
Once initialized, the decodeMap variable is only ever read. Thus, there can be no synchronization issues or memory model "hazards" related to updates.
Of course, this analysis only applies to the version of the class that I linked to. It is conceivable that the method is not thread-safe in other versions. (But the source code is freely available for multiple versions, so you should be able to check this for the version of JAXB that you are using.)
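If you are on Java 8 or later, java.util.Base64 is another option: its Javadoc states that Base64.Decoder instances are safe for use by multiple concurrent threads (and javax.xml.bind was removed from the JDK in Java 11). The following is a minimal sketch, not taken from any of the answers, that shares one decoder across a thread pool; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedDecoderDemo {
    // One decoder shared by all threads; the JDK documents Base64.Decoder as thread-safe.
    private static final Base64.Decoder DECODER = Base64.getDecoder();

    /** Decodes the same input concurrently on several threads and returns every result. */
    static List<String> decodeConcurrently(String base64, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(
                        () -> new String(DECODER.decode(base64), StandardCharsets.UTF_8)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // "aGVsbG8=" is the Base64 encoding of "hello"
        System.out.println(decodeConcurrently("aGVsbG8=", 100).get(0));
    }
}
```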
I took a look at the source code of javax.xml.bind.DatatypeConverter.parseBase64Binary(String) (JAXB API). It is a static method that takes an immutable class (String) as its argument:
final public class DatatypeConverter {
    ...
    // delegate to this instance of DatatypeConverter
    private static volatile DatatypeConverterInterface theConverter = null;
    ...
    public static byte[] parseBase64Binary( String lexicalXSDBase64Binary ) {
        if (theConverter == null) initConverter();
        return theConverter.parseBase64Binary( lexicalXSDBase64Binary );
    }
    ...
    private static synchronized void initConverter() {
        theConverter = new DatatypeConverterImpl();
    }
    ...
}
We can assume that it's thread-safe: theConverter is volatile and the initConverter() method is static synchronized, so the lazily created converter is safely published to every thread.
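The same pattern can be reproduced without JAXB. In this sketch, Base64Converter, StatelessConverter and LazyConverterHolder are hypothetical stand-ins for DatatypeConverterInterface, DatatypeConverterImpl and DatatypeConverter, and java.util.Base64 does the actual decoding. The volatile field ensures that a thread reading a non-null reference also sees a fully constructed converter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in for DatatypeConverterInterface.
interface Base64Converter {
    byte[] parseBase64Binary(String lexical);
}

// Hypothetical stand-in for DatatypeConverterImpl: no instance fields, so one instance
// can be shared by all threads.
final class StatelessConverter implements Base64Converter {
    @Override
    public byte[] parseBase64Binary(String lexical) {
        return Base64.getDecoder().decode(lexical);
    }
}

public final class LazyConverterHolder {
    // volatile: a thread that reads a non-null reference also sees the fully constructed object.
    private static volatile Base64Converter theConverter = null;

    private LazyConverterHolder() {}

    public static byte[] parseBase64Binary(String lexical) {
        if (theConverter == null) initConverter();
        return theConverter.parseBase64Binary(lexical);
    }

    // Like the JAXB code, this does not re-check the field, so two racing threads may each
    // create a converter; that is harmless because the converter is stateless.
    private static synchronized void initConverter() {
        theConverter = new StatelessConverter();
    }

    public static void main(String[] args) {
        byte[] bytes = parseBase64Binary("aGVsbG8=");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // hello
    }
}
```

Because the initializer does not re-check the field, two threads racing through the null check may each create a converter; with a stateless converter, the only cost is an extra short-lived object.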
Near the bottom of http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html, it says:
Double-Checked Locking Immutable Objects
If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic.
The sample and explanation for the mutable case are as follows:
// Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null)
            synchronized(this) {
                if (helper == null)
                    helper = new Helper();
            }
        return helper;
    }
    // other functions and members...
}
The first reason it doesn't work
The most obvious reason it doesn't work is that the writes that initialize the Helper object and the write to the helper field can be done or perceived out of order. Thus, a thread which invokes getHelper() could see a non-null reference to a helper object, but see the default values for fields of the helper object, rather than the values set in the constructor.
If the compiler inlines the call to the constructor, then the writes that initialize the object and the write to the helper field can be freely reordered if the compiler can prove that the constructor cannot throw an exception or perform synchronization.
Even if the compiler does not reorder those writes, on a multiprocessor the processor or the memory system may reorder those writes, as perceived by a thread running on another processor.
My question is: why doesn't an immutable class have this problem? I cannot see how the reordering relates to whether the class is mutable.
The reason why the code is "broken" for usual objects is that helper could be non-null but point to an object that has not been completely initialised yet, as explained in your quote.
However if the Helper class is immutable, meaning that all its fields are final, the Java Memory Model guarantees that they are safely published even if the object is made available through a data race (which is the case in your example):
final fields also allow programmers to implement thread-safe immutable objects without synchronization. A thread-safe immutable object is seen as immutable by all threads, even if a data race is used to pass references to the immutable object between threads. This can provide safety guarantees against misuse of an immutable class by incorrect or malicious code. final fields must be used correctly to provide a guarantee of immutability.
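A sketch of the idiom the quoted page describes, assuming Java 5 or later (JSR 133 semantics): Helper is immutable because all of its fields are final and assigned in the constructor, so the helper field itself does not need to be volatile. The host and port fields are hypothetical. One detail differs from the broken example in the question: getHelper() reads the unsynchronized field only once into a local variable, because two separate racy reads could, in principle, see a non-null value first and null second.

```java
public class ImmutableHelperDemo {
    // Immutable: all fields final, assigned once in the constructor, and `this` does not escape.
    static final class Helper {
        private final String host;
        private final int port;

        Helper(String host, int port) {
            this.host = host;
            this.port = port;
        }

        String address() {
            return host + ":" + port;
        }
    }

    static final class Foo {
        // Deliberately not volatile: the final-field guarantee covers Helper's fields.
        private Helper helper = null;

        Helper getHelper() {
            Helper h = helper;               // read the racy field only once...
            if (h == null) {
                synchronized (this) {
                    h = helper;
                    if (h == null) {
                        h = new Helper("localhost", 8080);
                        helper = h;
                    }
                }
            }
            return h;                        // ...so a second racy read cannot return null
        }
    }

    public static void main(String[] args) {
        System.out.println(new Foo().getHelper().address()); // localhost:8080
    }
}
```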
Immutable classes did have the problem. The part that you have quoted only became true after the changes to the Java Memory Model made in JSR 133.
Specifically, the changes that affect immutable objects relate to the semantics of the final keyword. Check out http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight.
The important part is:
The values for an object's final fields are set in its constructor. Assuming the object is constructed "correctly", once an object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization.
I am reading about static method synchronization in Java, where I read that static methods acquire the lock on a java.lang.Class object. I am trying to understand the concept of java.lang.Class and its role in static method synchronization, and I have these questions.
I was reading a blog that says every class in Java has an instance of java.lang.Class, and all instances of a class share this object. Does an instance of java.lang.Class describe the type of an object? What is the role of java.lang.Class here? How does it describe the type of an object?
Secondly, for static method synchronization, we need to get the monitor of java.lang.Class. Why is that? Why do we need a lock on the java.lang.Class monitor? Why not on an instance of our own class, for example Test (my own custom class)?
Can someone elaborate on this? Sorry if it sounds like a pretty basic question, but I am new to this concept.
Tentative explanation, although admittedly it is not fully correct. For whatever class C, when you do:
final C c = new C();
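two objects are involved here: the Class<C> object (which is loaded by a class loader; resolution goes through the loader of the class containing this code, not the thread's context class loader) and the c instance. c knows which class it belongs to via its .getClass() method (defined in Object).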
The fact that the new keyword is able to establish a "backlink" to the correct Class is the responsibility of the JVM implementation. While this is certainly mentioned in the JLS, I cannot tell where...
Now, more to the point.
If you have a method declared as:
synchronized void whatever() { someCode(); }
then it is roughly equivalent to (why roughly: see below):
void whatever()
{
    synchronized(this) {
        someCode();
    }
}
That is, this code is synchronized at the instance level.
If the method is static however, this:
public static synchronized void meh() { someOtherCode(); }
is roughly equivalent to (why roughly: see below):
public static void meh()
{
    synchronized(C.class) { // not getClass(): that is an instance method, unavailable in a static context
        someOtherCode();
    }
}
One thing to note is that each class has exactly one Class object (per class loader); no matter how many instances of class C you create, .getClass() will always return the same Class object, which is also the object the class literal C.class refers to. Try this:
public class ClassIdentity {
    public static void main(final String... args)
    {
        final String s1 = "foo";
        final String s2 = "bar";
        System.out.println(s1.getClass() == s2.getClass()); // true
    }
}
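In an instance method, getClass() is shorthand for this.getClass(); a static method has no this, which is why it uses the class literal C.class instead. Both refer to the same Class object, and since Class is itself an Object, it obeys the monitor rules of any Object.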
And since here we always refer to the exact same object, monitor rules apply ;)
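Now, "roughly": in the code written above, the logic is the same; however, the bytecode differs. A synchronized method is only marked with the ACC_SYNCHRONIZED flag and the JVM acquires the monitor when the method is invoked, while a synchronized block compiles to explicit monitorenter/monitorexit instructions. The JIT will have its say in it as well and will eventually optimize code paths.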
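The equivalence can be checked with Thread.holdsLock(Object), which reports whether the current thread holds a given object's monitor. In this sketch (class and method names are illustrative), the static synchronized method and the synchronized(MonitorDemo.class) block both hold the lock of the Class object, while the synchronized instance method holds the lock of this instead.

```java
public class MonitorDemo {
    // Holds the monitor of MonitorDemo.class while it runs.
    static synchronized boolean staticMethodHoldsClassLock() {
        return Thread.holdsLock(MonitorDemo.class);
    }

    // The block form locks the very same Class object explicitly.
    static boolean blockFormHoldsClassLock() {
        synchronized (MonitorDemo.class) {
            return Thread.holdsLock(MonitorDemo.class);
        }
    }

    // A synchronized instance method holds the monitor of `this`...
    synchronized boolean instanceMethodHoldsThisLock() {
        return Thread.holdsLock(this);
    }

    // ...but not the monitor of the Class object.
    synchronized boolean instanceMethodHoldsClassLock() {
        return Thread.holdsLock(MonitorDemo.class);
    }

    public static void main(String[] args) {
        MonitorDemo d = new MonitorDemo();
        System.out.println(staticMethodHoldsClassLock());      // true
        System.out.println(blockFormHoldsClassLock());         // true
        System.out.println(d.instanceMethodHoldsThisLock());   // true
        System.out.println(d.instanceMethodHoldsClassLock());  // false
        System.out.println(d.getClass() == MonitorDemo.class); // true
    }
}
```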
Every object in Java is an instance of some class. In addition, every class is represented at runtime by an object, which is itself an instance of a class: java.lang.Class.
Instance of java.lang.Class describes type of object?
Not exactly. java.lang.Class is the class of the objects that represent classes.
What is the role of java.lang.Class here? How does it describes type of object?
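It is the type of the objects that describe all types: each Class instance describes one class, interface, array type or primitive type.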
Secondly for static method synchronization, we need to get the monitor of java.lang.Class. Why is that? Why do we need its instances lock not our class lock?
You need to synchronize on some object. Static methods have no access to this, by definition, so the only shared object left is the Class object of the class in which they are defined.
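To make the "Class describes a type" point concrete, this small sketch inspects a few Class objects. It only uses standard java.lang.Class methods; the class name ClassObjectDemo is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassObjectDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Class<?> c = list.getClass(); // the runtime class of the object, not the declared type List

        System.out.println(c.getName());                 // java.util.ArrayList
        System.out.println(c == ArrayList.class);        // true: one Class object shared by all ArrayLists
        System.out.println(c.getSuperclass().getName()); // java.util.AbstractList
        System.out.println(List.class.isInterface());    // true
        System.out.println(int.class.isPrimitive());     // true
        System.out.println(c.getClass() == Class.class); // true: a Class object is itself an instance of Class
    }
}
```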
Secondly for static method synchronization, we need to get the monitor of java.lang.Class. Why is that? Why do we need a lock on java.lang.Class monitor? Why not on the instance of our own class for example Test(my own custom class)?
There are two ways to synchronize static methods. One is this:
static synchronized void methodName(){}
In this case, the caller need not care about acquiring the lock explicitly. Internally, every static method of this class that is marked synchronized acquires the lock of the class's java.lang.Class instance. Instance locks (e.g. on new ClassName()) cannot be used here, because the method is static, and static methods can be called without any instance of the class existing. Also, static methods are shared by all the objects of that class, so locking an instance of the class is out of the question.
The other way is to use a synchronized block inside the static method:
static void methodName() {
    synchronized(ClassName.class){ // same as above approach
        // method definition
    }
    // synchronized(this){ } // not allowed: compile-time error, there is no `this` in a static method
    // To lock an instance of this class you could do the following. But it is not useful at all,
    // because every call locks a different new instance, so no synchronization is achieved.
    synchronized(new ClassName()){ }
}
OR
static final OtherClass lock = new OtherClass();
static void methodName() {
    synchronized(lock){ // an instance of another class can be used as well
        // method definition
    }
}
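A runnable version of the last approach, assuming a plain Object is acceptable as the lock (the answer's OtherClass works the same way). Making the lock private means no code outside the class can acquire it, which is not true of StaticLockDemo.class: any code can synchronize on a class literal. The counter and thread counts are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticLockDemo {
    // Dedicated lock object: private, so code outside this class cannot lock it by accident.
    private static final Object LOCK = new Object();
    private static int counter = 0; // shared mutable state guarded by LOCK

    static void increment() {
        synchronized (LOCK) {
            counter++; // read-modify-write, so it must happen under the lock
        }
    }

    static int get() {
        synchronized (LOCK) {
            return counter;
        }
    }

    /** Starts `threads` threads that each call increment() `perThread` times, then waits for them. */
    static void run(int threads, int perThread) {
        List<Thread> started = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> {
                for (int i = 0; i < perThread; i++) increment();
            });
            th.start();
            started.add(th);
        }
        for (Thread th : started) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        run(4, 10_000);
        System.out.println(get()); // 40000
    }
}
```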
The class java.lang.Class is a representation of your class. The primary use of the class Class is reflection (getting constructors and methods, for example).
Think of it as a metaobject: all instances of one class share this metaobject.
The designers of Java chose to attach monitors to objects. To use a monitor in a static method, you therefore have to use the aforementioned metaobject (the Class object).
I think that this made the design and implementation of the monitor for synchronized blocks easier. Also, as mentioned, the class java.lang.Class is used for reflection and therefore is there already.
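Because Class is what reflection is built on, it can also be used to look at the synchronization discussed above. This sketch (method names are illustrative) uses java.lang.reflect.Modifier to show that a static synchronized method carries the synchronized modifier, whereas a static method that merely contains a synchronized(ReflectionDemo.class) block does not, even though both lock the same Class object.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ReflectionDemo {
    public static synchronized void meh() { }

    public static void blockForm() {
        synchronized (ReflectionDemo.class) { }
    }

    /** Looks up a no-argument method of this class and checks its modifiers. */
    static boolean isStaticSynchronized(String methodName) {
        try {
            Method m = ReflectionDemo.class.getDeclaredMethod(methodName);
            int mods = m.getModifiers();
            return Modifier.isStatic(mods) && Modifier.isSynchronized(mods);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isStaticSynchronized("meh"));       // true
        System.out.println(isStaticSynchronized("blockForm")); // false: a block does not change the modifiers
    }
}
```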
I am looking at production code in the Hadoop framework that does not make sense to me. Why are we using transient, and why can't I make the utility method a static method (I was told by the lead not to make isThinger a static method)? I looked up the transient keyword and it is related to serialization. Is serialization really used here?
//extending from MapReduceBase is a requirement of hadoop
public static class MyMapper extends MapReduceBase {
    // why the use of transient keyword here?
    transient Utility utility;
    public void configure(JobConf job) {
        String test = job.get("key");
        // seems silly that we have to create Utility instance.
        // can't we use a static method instead?
        utility = new Utility();
        boolean res = utility.isThinger(test);
        foo (res);
    }
    void foo (boolean a) { }
}
public class Utility {
    final String stringToSearchFor = "ineverchange";
    // it seems we could make this static. Why can't we?
    public boolean isThinger(String word) {
        boolean val = false;
        // >= 0: indexOf returns 0 for a match at the very start of word
        if (word.indexOf(stringToSearchFor) >= 0) {
            val = true;
        }
        return val;
    }
}
The problem in your code is the difference between the local mode (which development and test cases usually use) and the distributed mode.
In the local mode everything will be inside a single JVM, so you can safely assume that if you change a static variable (or a static method that shares some state, in your case stringToSearchFor) the change will be visible for the computation of every chunk of input.
In distributed mode, every chunk is processed in its own JVM. So if you change the state (e.g. stringToSearchFor), the change won't be visible to the other processes that run on other hosts/JVMs/tasks.
This is an inconsistency that leads to the following design principles when writing map/reduce functions:
Be as stateless as possible.
If you need state (mutable classes for example), never declare references in the map/reduce classes static (otherwise they will behave differently during testing/development than in production).
Immutable constants (for example configuration keys as String) should be defined static and final.
transient is pretty much useless in Hadoop: Hadoop does not serialize anything in the user code (Mapper/Reducer) classes/objects. It only becomes an issue if you do something with Java serialization that we don't know of.
For your case, if Utility is really a utility and stringToSearchFor is an immutable constant (never changed), you can safely declare isThinger as static. And please remove that transient if you don't do any Java serialization with your MapReduceBase.
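A minimal sketch of that suggestion, outside Hadoop: the search string becomes a static final constant and isThinger becomes a static method. ThingerUtil is a hypothetical name, and the sketch uses String.contains(...), which also finds a match at the very start of the word.

```java
public final class ThingerUtil {
    // Immutable constant: static and final, so it is the same in every JVM and task.
    private static final String STRING_TO_SEARCH_FOR = "ineverchange";

    private ThingerUtil() {} // no instances needed

    // Stateless, so it can safely be static.
    public static boolean isThinger(String word) {
        return word.contains(STRING_TO_SEARCH_FOR);
    }

    public static void main(String[] args) {
        System.out.println(isThinger("xineverchange")); // true
        System.out.println(isThinger("other"));         // false
    }
}
```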
Unless there is something not shown here, I suspect that whether to make isThinger a static method largely comes down to style. In particular, unless you inject the Utility instance instead of instantiating it on demand inside configure(), using an instance is rather pointless: as written, it cannot be overridden, nor is it any easier to test than a static method.
As for transient, you are right that it is unnecessary. I wouldn't be surprised if the original developer was using Serialization somewhere in the inheritance or implementation chain, and that they were avoiding a compiler warning by marking the non-serializable instance variable as transient.
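To illustrate the injection point, here is a plain-Java sketch in which Worker is a hypothetical stand-in for the mapper and receives its Utility through the constructor. Hadoop normally instantiates mapper classes itself, so in a real job the instance would have to be supplied some other way; the point is only that an injected instance can be replaced, for example by a stub in a test, which a direct static call cannot.

```java
public class InjectionDemo {
    // Hypothetical instance-based utility; an instance only pays off if it can be swapped.
    static class Utility {
        boolean isThinger(String word) {
            return word.contains("ineverchange");
        }
    }

    // Hypothetical stand-in for the mapper: the Utility is passed in rather than created inside.
    static class Worker {
        private final Utility utility;

        Worker(Utility utility) {
            this.utility = utility;
        }

        boolean process(String value) {
            return utility.isThinger(value);
        }
    }

    public static void main(String[] args) {
        Worker real = new Worker(new Utility());
        // A test can inject a stub that always answers true.
        Worker stubbed = new Worker(new Utility() {
            @Override
            boolean isThinger(String word) {
                return true;
            }
        });
        System.out.println(real.process("abc"));    // false
        System.out.println(stubbed.process("abc")); // true
    }
}
```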
Suppose I have a Utility class:
public class Utility {
    private Utility() {} // Don't worry, just doing this as a guarantee.
    public static int stringToInt(String s) {
        return Integer.parseInt(s);
    }
}
Now suppose that, in a multithreaded application, one thread calls the Utility.stringToInt() method and, while it is executing inside the method, another thread calls the same method, passing a different s.
What happens in this case? Does Java lock a static method?
There is no issue here. Each thread uses its own stack, so the different s values cannot collide. And Integer.parseInt() is thread-safe, as it only uses local variables.
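A sketch that exercises the scenario from the question, with ParseUtility as a hypothetical stand-in for the Utility class: many tasks call stringToInt concurrently with different strings, and every call works only with its own parameter and locals, so the results are correct without any locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParseUtility {
    private ParseUtility() {}

    // No shared state: s and the result live in the calling thread's stack frame.
    static int stringToInt(String s) {
        return Integer.parseInt(s);
    }
}

public class StringToIntDemo {
    /** Parses "0".."n-1" on a thread pool and returns the sum of the parsed values. */
    static long parseAllConcurrently(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                String s = Integer.toString(i);
                futures.add(pool.submit(() -> ParseUtility.stringToInt(s)));
            }
            long sum = 0;
            for (Future<Integer> f : futures) sum += f.get();
            return sum;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAllConcurrently(1000)); // 499500 = 0 + 1 + ... + 999
    }
}
```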
Java does not lock a static method, unless you add the keyword synchronized.
Note that when you lock a static method, you grab the monitor of the Class object the method is implemented under, so a thread inside one synchronized static method will prevent other threads from entering any of the other "synchronized" static methods of that class.
Now, in your example, you don't need to synchronize in this particular case. That is because parameters are passed by value: each call gets its own copy of the reference s in its own stack frame, and the String it points to is immutable. Likewise, simultaneous calls to Integer.parseInt(s) will each create their own stack frame, with copies of s's value passed into the separate stack frames.
If Integer.parseInt(...) were implemented badly (for example, if it used static non-final members during its execution), then there would be a large cause for concern. Fortunately, the implementers of the Java libraries are better programmers than that.
In the example you gave, there is no shared data between threads AND there is no data which is modified. (You would have to have both for there to be a threading issue.)
You can write:
public enum Utility {
    ; // no instances
    public synchronized static int stringToInt(String s) {
        // does something which needs to be synchronised...
        return Integer.parseInt(s); // placeholder so the method compiles
    }
}
This is effectively the same as:
public enum Utility {
    ; // no instances
    public static int stringToInt(String s) {
        synchronized(Utility.class) {
            // does something which needs to be synchronised...
            return Integer.parseInt(s); // placeholder so the method compiles
        }
    }
}
However, the second form won't mark the method itself as synchronized (its modifiers no longer include synchronized), and you don't need synchronisation unless you are accessing shared data which can be modified.
It doesn't, unless specified explicitly. Furthermore, in this case there won't be any thread-safety issue, since "s" is immutable and also local to the method.
You don't need synchronization here, as the variable s is local.
You need to worry only if multiple threads share resources, e.g. if s were a static field, then you would have to think about multithreading.
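By contrast, once the data is shared, locking matters. In this hypothetical sketch, every call records its input in a static ArrayList, which is not thread-safe; declaring the static methods synchronized makes them all use the monitor of SharedStaticFieldDemo.class, so concurrent additions cannot be lost. Without synchronized, concurrent ArrayList.add calls may lose elements or throw an exception.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedStaticFieldDemo {
    // Shared, mutable static state; ArrayList itself is not thread-safe.
    private static final List<String> history = new ArrayList<>();

    // Both methods are static synchronized, so they lock SharedStaticFieldDemo.class
    // and never touch history at the same time.
    static synchronized int parseAndRecord(String s) {
        history.add(s);
        return Integer.parseInt(s);
    }

    static synchronized int historySize() {
        return history.size();
    }

    /** Starts threads that each call parseAndRecord perThread times, then waits for all of them. */
    static void runThreads(int threads, int perThread) {
        List<Thread> started = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    parseAndRecord(Integer.toString(i));
                }
            });
            th.start();
            started.add(th);
        }
        for (Thread th : started) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        runThreads(4, 1_000);
        System.out.println(historySize()); // 4000
    }
}
```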
System.out is declared as public static final PrintStream out.
But you can call System.setOut() to reassign it.
Huh? How is this possible if it's final?
(same point applies to System.in and System.err)
And more importantly, if you can mutate the public static final fields, what does this mean as far as the guarantees (if any) that final gives you? (I never realized nor expected System.in/out/err behaved as final variables)
JLS 17.5.4 Write Protected Fields:
Normally, final static fields may not be modified. However System.in, System.out, and System.err are final static fields that, for legacy reasons, must be allowed to be changed by the methods System.setIn, System.setOut and System.setErr. We refer to these fields as being write-protected to distinguish them from ordinary final fields.
The compiler needs to treat these fields differently from other final fields. For example, a read of an ordinary final field is "immune" to synchronization: the barrier involved in a lock or volatile read does not have to affect what value is read from a final field. Since the value of write-protected fields may be seen to change, synchronization events should have an effect on them. Therefore, the semantics dictate that these fields be treated as normal fields that cannot be changed by user code, unless that user code is in the System class.
By the way, you can actually mutate final fields via reflection by calling setAccessible(true) on them (or by using Unsafe methods). Such techniques are used during deserialization, by Hibernate and other frameworks, etc., but they have one limitation: code that has seen the value of a final field before the modification is not guaranteed to see the new value afterwards. What's special about the fields in question is that they are free of this limitation, since the compiler treats them in a special way.
Java uses a native method to implement setIn(), setOut() and setErr().
On my JDK1.6.0_20, setOut() looks like this:
public static void setOut(PrintStream out) {
    checkIO();
    setOut0(out);
}
...
private static native void setOut0(PrintStream out);
You still can't "normally" reassign final variables, and even in this case, you aren't directly reassigning the field (i.e. you still can't compile "System.out = myOut"). Native methods allow some things that you simply can't do in regular Java, which explains why there are restrictions with native methods such as the requirement that an applet be signed in order to use native libraries.
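The effect can be observed from ordinary Java code: after System.setOut(...), reading the static field System.out yields the new stream, even though the field is declared final. This sketch (class and method names are illustrative) redirects System.out to an in-memory buffer, runs some code, and always restores the original stream in a finally block.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class RedirectDemo {
    /** Runs action with System.out redirected to memory and returns what it printed. */
    static String captureStdout(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true); // autoflush
        System.setOut(capture); // the field System.out now refers to capture
        try {
            action.run();
        } finally {
            System.setOut(original); // restore the real console stream
            capture.close();         // flushes and closes the in-memory stream
        }
        return buffer.toString(); // platform default charset, matching the PrintStream
    }

    public static void main(String[] args) {
        String captured = captureStdout(() -> System.out.print("hello"));
        System.out.println("captured: " + captured); // captured: hello
    }
}
```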
To extend on what Adam said, here is the impl:
public static void setOut(PrintStream out) {
    checkIO();
    setOut0(out);
}
and setOut0 is defined as:
private static native void setOut0(PrintStream out);
Depends on the implementation. The final one may never change but it could be a proxy/adapter/decorator for the actual output stream, setOut could for example set a member that the out member actually writes to. In practice however it is set natively.
The out declared as final in the System class is a class-level variable,
whereas the out in the method below is a local variable (the method parameter).
We are nowhere passing the class-level out, which is the final one, into this method:
public static void setOut(PrintStream out) {
    checkIO();
    setOut0(out);
}
Usage of the above method is as below:
System.setOut(new PrintStream(new FileOutputStream("somefile.txt")));
Now the data will be diverted to the file.
Hope this explanation makes sense.
So native methods and reflection play no role here in changing the purpose of the final keyword.
As for how, we can take a look at the source code of java/lang/System.c:
/*
 * The following three functions implement setter methods for
 * java.lang.System.{in, out, err}. They are natively implemented
 * because they violate the semantics of the language (i.e. set final
 * variable).
 */
JNIEXPORT void JNICALL
Java_java_lang_System_setOut0(JNIEnv *env, jclass cla, jobject stream)
{
    jfieldID fid =
        (*env)->GetStaticFieldID(env,cla,"out","Ljava/io/PrintStream;");
    if (fid == 0)
        return;
    (*env)->SetStaticObjectField(env,cla,fid,stream);
}
...
In other words, JNI can "cheat". ; )
I think setOut0 is modifying the local variable out; it can't modify the class-level variable out.