transient variables and static methods in hadoop, dev seeks enlightenment


I am looking at production code in hadoop framework which does not make sense. Why are we using transient and why can't I make the utility method a static method (was told by the lead not to make isThinger a static method)? I looked up the transient keyword and it is related to serialization. Is serialization really used here?
//extending from MapReduceBase is a requirement of hadoop
public static class MyMapper extends MapReduceBase {
    // why the use of transient keyword here?
    transient Utility utility;

    public void configure(JobConf job) {
        String test = job.get("key");
        // seems silly that we have to create Utility instance.
        // can't we use a static method instead?
        utility = new Utility();
        boolean res = utility.isThinger(test);
        foo (res);
    }

    void foo (boolean a) { }
}

public class Utility {
    final String stringToSearchFor = "ineverchange";

    // it seems we could make this static. Why can't we?
    public boolean isThinger(String word) {
        boolean val = false;
        if (word.indexOf(stringToSearchFor) > 0) {
            val = true;
        }
        return val;
    }
}

The problem in your code is the difference between local mode (typically used for development and test cases) and distributed mode.
In local mode everything runs inside a single JVM, so you can safely assume that if you change a static variable (or state shared by a static method, in your case stringToSearchFor), the change is visible while every chunk of input is processed.
In distributed mode, every chunk is processed in its own JVM. So if you change the state (e.g. stringToSearchFor), the change won't be visible to the other processes that run on other hosts/JVMs/tasks.
This is an inconsistency that leads to the following design principles when writing map/reduce functions:
Be as stateless as possible.
If you need state (mutable classes, for example), never declare the references in the map/reduce classes static; otherwise the code will behave differently in testing/development than in production.
Immutable constants (for example configuration keys as String) should be defined static and final.
transient is pretty much useless in Hadoop: Hadoop does not serialize anything in the user-code (Mapper/Reducer) class/object. It only matters if you are doing something with Java serialization that we don't know of.
In your case, if Utility really is a utility and stringToSearchFor is an immutable constant (so it never changes), you can safely declare isThinger as static. And please remove that transient if you don't do any Java serialization with your MapReduceBase.
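A minimal sketch of the stateless variant described above, assuming stringToSearchFor really never changes. The class name StaticUtility is hypothetical, and the check uses contains() so that a match at index 0 also counts, which differs from the `> 0` comparison in the question.

```java
// Hypothetical stateless replacement for the Utility class in the question.
final class StaticUtility {
    // Immutable constant: identical in every JVM that loads this class.
    private static final String STRING_TO_SEARCH_FOR = "ineverchange";

    private StaticUtility() { } // utility class, no instances needed

    // Depends only on its argument and the constant, so it behaves the
    // same in local mode and in distributed mode.
    static boolean isThinger(String word) {
        return word != null && word.contains(STRING_TO_SEARCH_FOR);
    }

    public static void main(String[] args) {
        System.out.println(isThinger("xx-ineverchange"));
        System.out.println(isThinger("something else"));
    }
}
```

Because nothing here is mutable, there is no state that could diverge between tasks.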

Unless there is something not shown here, I suspect that the question of making isThinger a static method largely comes down to style. In particular, since you are instantiating the Utility on demand inside the mapper rather than injecting it, the instance is rather pointless: as written, it cannot be overridden, nor is it any easier to test than a static method.
As for transient, you are right that it is unnecessary. I wouldn't be surprised if the original developer was using Serialization somewhere in the inheritance or implementation chain, and that they were avoiding a compiler warning by marking the non-serializable instance variable as transient.
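To make concrete what transient does when Java serialization is actually in play, here is a small sketch. The Holder and Helper classes are hypothetical and unrelated to Hadoop; the point is only that a transient field is skipped when the object is written and comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientDemo {
    // Hypothetical helper that is deliberately NOT Serializable.
    static class Helper { }

    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "kept";
        // Skipped by Java serialization. Without transient, writeObject
        // would fail with NotSerializableException because of this field.
        transient Helper helper = new Helper();
    }

    // Serializes the holder into a byte array and reads it back.
    static Holder roundTrip(Holder in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Holder) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Holder copy = roundTrip(new Holder());
        System.out.println(copy.name);           // the ordinary field survives
        System.out.println(copy.helper == null); // the transient field is not restored
    }
}
```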

Related

Is DatatypeConverter thread-safe?

In particular, is the method javax.xml.bind.DatatypeConverter.parseBase64Binary(String) thread-safe?

There is nothing in the documentation that suggests the class is thread-safe. Therefore I recommend you assume it isn't.
I'd recommend Base64 from Apache Commons Codec which states it's thread-safe in the documentation.

My reading of the source code is that that implementation is thread-safe.
The parseBase64Binary method calls a parseBase64Binary method on a shared DatatypeConverterImpl object that is lazily created.
It is easy to see that the lazy creation is thread safe. (The code is in another Answer ...)
An examination of DatatypeConverterImpl shows that it has no instance variables, so there cannot be thread-safety concerns over access / update to the state of the instance.
The DatatypeConverterImpl.parseBase64Binary method (in turn) calls the static _parseBase64Binary method.
The _parseBase64Binary method uses its input (which is immutable) and local variables that refer to thread-confined objects. The only exception is the decodeMap variable which is a private static final array.
The decodeMap variable is initialized and safely published during class (static) initialization.
Once initialized, the decodeMap variable is only ever read. Thus, there can be no synchronization issues or memory model "hazards" related to updates.
Of course, this analysis only applies to the version of the class that I linked to. It is conceivable that the method is not thread-safe in other versions. (But the source code is freely available for multiple versions, so you should be able to check this for the version of JAXB that you are using.)
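The pattern this analysis relies on can be sketched in isolation: a static method that touches only its immutable argument, local variables, and a static final table that is filled during class initialization and never written afterwards. The HexDecoder class below is a hypothetical stand-in for illustration, not the JAXB code.

```java
import java.util.Arrays;

final class HexDecoder {
    // Filled once during class initialization, afterwards only read.
    private static final int[] DECODE_MAP = new int[128];
    static {
        Arrays.fill(DECODE_MAP, -1);
        for (int i = 0; i < 10; i++) DECODE_MAP['0' + i] = i;
        for (int i = 0; i < 6; i++) {
            DECODE_MAP['a' + i] = 10 + i;
            DECODE_MAP['A' + i] = 10 + i;
        }
    }

    private HexDecoder() { }

    static byte[] parseHex(String text) {
        if (text.length() % 2 != 0) throw new IllegalArgumentException("odd length");
        byte[] out = new byte[text.length() / 2]; // local, confined to the calling thread
        for (int i = 0; i < out.length; i++) {
            int hi = digit(text.charAt(2 * i));
            int lo = digit(text.charAt(2 * i + 1));
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    private static int digit(char c) {
        int v = c < 128 ? DECODE_MAP[c] : -1;
        if (v < 0) throw new IllegalArgumentException("not a hex digit: " + c);
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads decode concurrently; no shared state is ever written.
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                if (parseHex("cafe")[0] != (byte) 0xCA) throw new AssertionError();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("done");
    }
}
```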

Take a look at the source code of javax.xml.bind.DatatypeConverter.parseBase64Binary(String) (JAXB API). It is a static method that takes an instance of an immutable class (String) as its argument:
final public class DatatypeConverter {
...
// delegate to this instance of DatatypeConverter
private static volatile DatatypeConverterInterface theConverter = null;
...
public static byte[] parseBase64Binary( String lexicalXSDBase64Binary ) {
if (theConverter == null) initConverter();
return theConverter.parseBase64Binary( lexicalXSDBase64Binary );
}
...
private static synchronized void initConverter() {
theConverter = new DatatypeConverterImpl();
}
...
}
We can assume that it's thread safe. The initConverter() method is static synchronized.
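Note that initConverter() as quoted does not re-check the field inside the lock, so two threads that both see null could each create an instance; judging by the analysis in the other answer, that is harmless because the implementation holds no state. A sketch of the same lazy pattern with the re-check added, using hypothetical names (LazyConverter, Converter):

```java
final class LazyConverter {
    // Hypothetical stateless converter interface.
    interface Converter {
        String convert(String in);
    }

    // volatile makes the fully constructed instance visible to all threads.
    private static volatile Converter theConverter;
    // Counts constructions; only touched while holding the class lock.
    private static int created;

    private LazyConverter() { }

    static String convert(String in) {
        if (theConverter == null) initConverter();
        return theConverter.convert(in);
    }

    private static synchronized void initConverter() {
        // Re-check under the lock so that only one instance is ever created.
        if (theConverter == null) {
            created++;
            theConverter = String::trim;
        }
    }

    static synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        System.out.println(convert("  a  "));
        System.out.println(convert("b "));
        System.out.println(createdCount());
    }
}
```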

What is the alternative to a static initialization block?

My projects had some developer who loved a static initialization block. What is the alternative to this? What is the downside of this alternative?
public class BlockTest {
String test = new String();
static{
test = "test string";
}
}
As far as I understood the static initialization block is used to set values of static field if it cannot be done in one line. But I do not understand why we need a special block for that. This leads to less readability and some confusion.

The example is not good. First of all, it does not compile: you cannot assign an instance variable from a static init block. But even if it were correct
public class BlockTest {
static String test = new String();
static{
test = "test string";
}
it would make no sense since it is equivalent to
public class BlockTest {
static String test = "test string";
but this static init block has no alternative
public class Object {
private static native void registerNatives();
static {
registerNatives();
}
...

It can be used to perform all the tasks that need to be done when the class is referenced for the first time, even before any instances of the class are created. It could contain calls to different methods or just the initialization of static members. A static block ensures that these activities are performed only once in the lifetime of the class, and before any other operation takes place with regard to the class.
A programmer can depend on a static block because it is guaranteed to be executed only once and before any other activity related to that class is performed.
Moreover, I do not think it hampers readability at all. It again may vary from person to person.

If you have static members in your class which require more elaborate handling, you won't get around a static initializer (a kind of static constructor). These must be initialized somewhere, after all. You could do that in the constructor of your class, but then you would reinitialize these values every time you create a new object.
There is no real alternative if you must handle more than just a simple initialization.
See this post and this.
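As a sketch of initialization that needs more than one expression, the hypothetical StatusCodes class below fills an unmodifiable map in a static block. For comparison it also shows the private static method that the Java tutorial mentions as an alternative to static blocks; both run once, when the class is initialized.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class StatusCodes {
    // Variant 1: static initialization block.
    static final Map<Integer, String> BY_BLOCK;
    static {
        Map<Integer, String> m = new HashMap<>();
        m.put(200, "OK");
        m.put(404, "Not Found");
        BY_BLOCK = Collections.unmodifiableMap(m);
    }

    // Variant 2: the same work moved into a private static method,
    // which can be called directly from the field declaration.
    static final Map<Integer, String> BY_METHOD = buildCodes();

    private static Map<Integer, String> buildCodes() {
        Map<Integer, String> m = new HashMap<>();
        m.put(200, "OK");
        m.put(404, "Not Found");
        return Collections.unmodifiableMap(m);
    }

    private StatusCodes() { }

    public static void main(String[] args) {
        System.out.println(BY_BLOCK.get(200));
        System.out.println(BY_BLOCK.equals(BY_METHOD));
    }
}
```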

If you have simple assignments, you can do the assignment directly in the member declaration. There is no need for a separate block, which just adds complexity and hurts readability.
An alternative would be to use lazy initialization. The advantage is that it can also be arbitrarily complex, but is only executed when actually needed. Of course, this only works if you have getters in your classes. If you access the members directly, this would be a big change.
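One common way to get lazy initialization of a static value is the holder-class idiom, sketched here with hypothetical names (LazyConfig, load). The nested class is only initialized on the first call to get(), and the JVM's class initialization guarantees that load() runs once.

```java
final class LazyConfig {
    private static int loads; // counts how often the expensive setup ran

    // Initialized on first access to Holder.VALUE, not when LazyConfig is loaded.
    private static final class Holder {
        static final String VALUE = load();
    }

    private LazyConfig() { }

    // Stand-in for arbitrarily complex initialization logic.
    private static String load() {
        loads++;
        return "expensive value";
    }

    // Callers must go through the getter; this is the price of laziness.
    static String get() {
        return Holder.VALUE;
    }

    static int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        System.out.println(get());
        System.out.println(get());
        System.out.println(loadCount());
    }
}
```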

IMHO, there is no need for a static block.
String test = "test string";
And from the docs:
Instance variables can be initialized in constructors, where error handling or other logic can be used. To provide the same capability for class variables, the Java programming language includes static initialization blocks.
But
Note: It is not necessary to declare fields at the beginning of the class definition, although this is the most common practice. It is only necessary that they be declared and initialized before they are used.

Java Constructor and static method

When should I use a constructor and when should I use static method?
Can you explain above with small snippet? I skimmed through a few threads but I'm still not clear with this.

Joshua Bloch advises favoring static factory methods over constructors (which I think is good practice). A couple of advantages and disadvantages:
Advantages of static factory methods:
unlike constructors, they have names
unlike constructors, they are not required to create a new object each time they're invoked (you can cache instances, e.g. Boolean.valueOf(..))
unlike constructors, they can return an object of any subtype of their return type (great flexibility)
Disadvantages of static factory methods:
They are not really distinguishable from other static methods (it's hard to find out how to initialize an object if you are not familiar with the API)
The main disadvantage (if you use only static factory methods, and make constructors private) is that you cannot subclass that class.
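A small sketch of the first two advantages, using a hypothetical Temperature class. Two constructors taking a single double could not coexist, but two named factories can, and the factory is free to hand out a cached instance.

```java
final class Temperature {
    // Preconstructed instance that the factories may hand out repeatedly.
    private static final Temperature FREEZING = new Temperature(0.0);

    private final double celsius;

    // Private constructor: objects are only obtainable through the factories.
    // As noted above, this also means the class cannot be subclassed.
    private Temperature(double celsius) {
        this.celsius = celsius;
    }

    // Named factory; returns the cached instance for 0 degrees Celsius.
    static Temperature ofCelsius(double c) {
        return c == 0.0 ? FREEZING : new Temperature(c);
    }

    // Same parameter type as ofCelsius, told apart by name only.
    static Temperature ofFahrenheit(double f) {
        return ofCelsius((f - 32.0) * 5.0 / 9.0);
    }

    double celsius() {
        return celsius;
    }

    public static void main(String[] args) {
        System.out.println(ofFahrenheit(212.0).celsius());
        System.out.println(ofCelsius(0.0) == ofFahrenheit(32.0));
    }
}
```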

Use a public constructor when you only ever want to return a new object of that type and you want simplicity.
A good example is StringBuilder as it's mutable and you are likely to want a new object each time.
public String toString() {
StringBuilder sb = new StringBuilder();
// append fields to the sb
return sb.toString();
}
Use a static factory method when you might want to re-use objects (esp. if immutable), you might want to return a sub-class, or you want descriptive construction. A good example is EnumSet, which has a number of static factories that do different things even though some take the same arguments.
EnumSet.noneOf(RetentionPolicy.class);
// has the same arguments, but is not the same as
EnumSet.allOf(RetentionPolicy.class);
In this case, using a static factory makes clear what the difference is between these two ways of constructing the set.
Also, EnumSet can return two different implementations: RegularEnumSet, optimised for enums with a small number of values (<= 64), and JumboEnumSet for enums with many values.

Always use a constructor if your class has a state (even for a single instance; singleton pattern ).
Only use static for utility methods like in java.lang.Math
Example:
public static int max(int a, int b) {
return (a >= b) ? a : b;
}
Doesn't change any state (instance variables) of an object, thus it can be declared static.

Use a constructor when you need an object, with its own copy of the fields for every instance.
Use a static method when you want to do something without creating an object.
Example:
public class Test {
    public int value;
    public static int staticValue;

    public int getValue() {
        return ++value;
    }

    public static int getStaticValue() {
        return ++staticValue;
    }
}

public class TestClass {
    public static void main(String[] args) {
        Test obj = new Test();
        Test obj1 = new Test();
        System.out.println(obj.getValue());        // 1: each object has its own value
        System.out.println(obj1.getValue());       // 1
        System.out.println(Test.getStaticValue()); // 1: staticValue is shared by the class
        System.out.println(Test.getStaticValue()); // 2
    }
}

Static factory methods have names, constructors don't. Thus factory methods can carry natural documentation about what they do that constructors can't. For example, see the factory methods in the Guava Libraries, like ImmutableMap.copyOf(otherMap). While this might have little effect on behaviour of construction, it has a huge effect on readability of the code. Definitely consider this if you're publishing an API.
Also you can use a factory when you need to do any more complicated configuration of the object you're creating, especially if you need to publish to other threads (registering in pools, exposing as an MBean, all manner of other things...) to avoid racy publication. (See e.g. Java Concurrency In Practice section 3.2)
Static methods that do something (e.g. Math.min) are not really the same thing as static factories, which can be considered direct replacements for constructors, with added flexibility, evolvability and (often) clarity.
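The publication point can be sketched as follows. SafeListener and its REGISTRY are hypothetical stand-ins for whatever pool or registry the object has to be exposed to; the idea is that the constructor never lets `this` escape, and the factory registers the object only after construction has finished.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class SafeListener {
    // Hypothetical registry that other threads may read.
    static final List<SafeListener> REGISTRY = new CopyOnWriteArrayList<>();

    private final String name;

    // The constructor only assigns fields; 'this' does not escape from it.
    private SafeListener(String name) {
        this.name = name;
    }

    // Factory: construct first, publish afterwards.
    static SafeListener create(String name) {
        SafeListener listener = new SafeListener(name);
        REGISTRY.add(listener);
        return listener;
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        SafeListener l = create("first");
        System.out.println(REGISTRY.contains(l));
        System.out.println(l.name());
    }
}
```

Registering from inside the constructor instead would let another thread observe the object before its fields are set.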

Whenever you need to create an instance of an object you will have to use the constructor.
So, if you want to create a Car object, then you will need a constructor for that.
The keyword static means, that your method can be called without creating an instance.

class Car
{
    private int num_of_seats;

    public Car(int number_of_seats)
    {
        this.num_of_seats = number_of_seats;
    }

    // You want to get a name that has to do with this class,
    // but it is not bound to any data of an instance.
    // So you don't need any instance of the class, and
    // you can declare it as static.
    static String getClassName()
    {
        return "[Car]";
    }
}
In general, you use static for methods and data that are not tied to any particular instance of the class.
Another example is:
class Ring
{
    private List nodes;

    public Ring(List nodes)
    {
        this.nodes = nodes;
    }

    // You want to calculate the distance of two ids on the ring, but
    // you don't care about the ring. You care only about the ids.
    // However, this functionality logically falls into the notion of
    // the ring, that's why you put it here and you can declare it
    // as static. That way you don't have to manage an instance of
    // ring.
    static double calculateDistance(int id_1, int id_2)
    {
        // The divisor is arbitrary, just like the calculation.
        // 383.0 (not 383) avoids integer division.
        return (id_1 - id_2) / 383.0;
    }
}
As the posts above say, it's just a matter of what you want to do and how you want to do it. Also, don't try to understand everything right away: write some code, then try different approaches to that code and try to understand what your code does. Examples are good, but you need to write code yourself and then understand what you did. I think it's the only way you will figure out why you do stuff the way you have to do it.

Static factory methods do not have to instantiate a new object every time they are called. Since object instantiation can be expensive, this allows instances to be cached and reused, which can improve performance.
This is the explanation from Effective Java:
This allows immutable classes (Item 15) to use preconstructed
instances, or to cache instances as they’re constructed, and dispense
them repeatedly to avoid creating unnecessary duplicate objects. The
Boolean.valueOf(boolean) method illustrates this technique: it never
creates an object. This technique is similar to the Flyweight pattern
[Gamma95, p. 195]. It can greatly improve performance if equivalent
objects are requested often, especially if they are expensive to
create.

I.e. if you want to use a singleton, which means that you have only one instance of the object, possibly shared with others, then you need a static method which internally calls the constructor. Every time someone wants an instance of that object you return the same one, so you consume memory for only one. You always need a constructor in object-oriented programming. In Java and in many other languages a default constructor is implied and generated automatically, but if you need some custom functionality you have to write your own.
Above you see a few good examples of the usage. However, if you have something specific in mind, please let us know: I mean a specific case where you are not sure whether you should use a static method or a constructor. Either way you will definitely need a constructor, but I am not sure about the static method.
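A minimal sketch of the singleton described above, with a hypothetical AppSettings class: the constructor is private and runs once, and the static method always returns that one instance.

```java
final class AppSettings {
    private static int constructed; // how many times the constructor ran

    // The single instance, created when the class is initialized.
    private static final AppSettings INSTANCE = new AppSettings();

    // Private: nobody outside this class can create a second instance.
    private AppSettings() {
        constructed++;
    }

    // Static accessor: every caller receives the same object.
    static AppSettings getInstance() {
        return INSTANCE;
    }

    static int constructedCount() {
        return constructed;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance());
        System.out.println(constructedCount());
    }
}
```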

java: "final" System.out, System.in and System.err?

System.out is declared as public static final PrintStream out.
But you can call System.setOut() to reassign it.
Huh? How is this possible if it's final?
(same point applies to System.in and System.err)
And more importantly, if you can mutate the public static final fields, what does this mean as far as the guarantees (if any) that final gives you? (I never realized nor expected System.in/out/err behaved as final variables)

JLS 17.5.4 Write Protected Fields:
Normally, final static fields may not be modified. However System.in, System.out, and System.err are final static fields that, for legacy reasons, must be allowed to be changed by the methods System.setIn, System.setOut and System.setErr. We refer to these fields as being write-protected to distinguish them from ordinary final fields.
The compiler needs to treat these fields differently from other final fields. For example, a read of an ordinary final field is "immune" to synchronization: the barrier involved in a lock or volatile read does not have to affect what value is read from a final field. Since the value of write-protected fields may be seen to change, synchronization events should have an effect on them. Therefore, the semantics dictate that these fields be treated as normal fields that cannot be changed by user code, unless that user code is in the System class.
By the way, you can actually mutate final fields via reflection by calling setAccessible(true) on them (or by using Unsafe methods). Such techniques are used during deserialization, by Hibernate and other frameworks, etc., but they have one limitation: code that has seen the value of a final field before the modification is not guaranteed to see the new value after it. What's special about the fields in question is that they are free of this limitation, since the compiler treats them specially.
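A sketch of the reflection route for a final instance field, using a hypothetical Point class. It reads the value back through the same Field object because, as noted above, ordinary code that has already seen the old value is not guaranteed to observe the change. Treat this as an illustration of the mechanism only; JDK releases may warn about or restrict it.

```java
import java.lang.reflect.Field;

final class FinalFieldDemo {
    static final class Point {
        private final int x;

        Point(int x) {
            this.x = x;
        }
    }

    // Overwrites the final field x reflectively and returns the value
    // that reflection reports afterwards.
    static int overwriteX(Point p, int newValue) {
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true); // lifts the access check, including for this final field
            f.setInt(p, newValue);
            return f.getInt(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteX(new Point(1), 2));
    }
}
```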

Java uses a native method to implement setIn(), setOut() and setErr().
On my JDK1.6.0_20, setOut() looks like this:
public static void setOut(PrintStream out) {
checkIO();
setOut0(out);
}
...
private static native void setOut0(PrintStream out);
You still can't "normally" reassign final variables, and even in this case, you aren't directly reassigning the field (i.e. you still can't compile "System.out = myOut"). Native methods allow some things that you simply can't do in regular Java, which explains why there are restrictions with native methods such as the requirement that an applet be signed in order to use native libraries.
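From the caller's side, the reassignment looks like this. The sketch (class and method names are hypothetical) swaps System.out for a stream backed by a buffer, runs some code, and restores the original stream afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

final class SetOutDemo {
    // Runs the action with System.out redirected and returns what it printed.
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement); // "System.out = replacement" would not compile
        try {
            action.run();
        } finally {
            System.setOut(original); // always restore the real stream
        }
        replacement.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = capture(() -> System.out.print("hello"));
        System.out.println("captured: " + text);
    }
}
```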

To extend on what Adam said, here is the impl:
public static void setOut(PrintStream out) {
checkIO();
setOut0(out);
}
and setOut0 is defined as:
private static native void setOut0(PrintStream out);

Depends on the implementation. The final one may never change but it could be a proxy/adapter/decorator for the actual output stream, setOut could for example set a member that the out member actually writes to. In practice however it is set natively.

The out that is declared final in the System class is a class-level variable,
whereas the out in the method below is a local variable (a parameter).
Nowhere are we passing the class-level out, which is the final one, into this method:
public static void setOut(PrintStream out) {
checkIO();
setOut0(out);
}
usage of the above method is as below:
System.setOut(new PrintStream(new FileOutputStream("somefile.txt")));
Now the output will be diverted to the file.
I hope this explanation makes sense.
So, in my view, native methods and reflection play no role here in changing the purpose of the final keyword.

As far as how, we can take a look at the source code to java/lang/System.c:
/*
* The following three functions implement setter methods for
* java.lang.System.{in, out, err}. They are natively implemented
* because they violate the semantics of the language (i.e. set final
* variable).
*/
JNIEXPORT void JNICALL
Java_java_lang_System_setOut0(JNIEnv *env, jclass cla, jobject stream)
{
jfieldID fid =
(*env)->GetStaticFieldID(env,cla,"out","Ljava/io/PrintStream;");
if (fid == 0)
return;
(*env)->SetStaticObjectField(env,cla,fid,stream);
}
...
In other words, JNI can "cheat". ; )

I think setOut0 is modifying the local variable out; it can't modify the class-level variable out.

Change private member to default for testing

Is it a good idea to change private class members to default (package access) in order to test their behavior? I mean that the test case would live in the test directory, but in the same package as the class of the tested member.
EDIT: Everything you all say is true. But classes often have private helper methods, and these methods can be complicated enough to need testing. And it is too bad to have to test public methods just to make sure the complicated private methods work correctly. Don't you think so?

I generally prefer writing my classes and tests in a way that makes testing against the public API sensible. So basically I'm saying that if you need to access the private state of your class under test, your test is probably already too involved in the internals of that class.

No, it isn't, because changing the object under test may change the result. If you really need to call private members or methods during a test, it's safer to add an accessor. This still changes the class, but with a lower risk. Example:
private void method() { /* ... */ }

// For testing purposes only, remove for production
@Deprecated // just another way to create awareness ;)
void testMethod() {
    method();
}
OK - one more solution if you need to test private methods: you can call any method through the reflection API.
Assuming, we have:
public class SomeClass {
    private Object helper(String s, String t) { /* ... */ }
}
then we can test it like
@Test public void testHelper() {
    try {
        SomeClass some = new SomeClass();
        // look up the private method by name and parameter types
        Method helperMethod = some.getClass().getDeclaredMethod("helper", String.class, String.class);
        helperMethod.setAccessible(true);
        Object result = helperMethod.invoke(some, "s", "t");
        // do some assert...
    } catch(Exception e) {
        // TODO - proper exception handling
    }
}

I understand what you mean about needing to test private methods, and I also see why people say only test the public methods. I have just encountered some legacy code that has a lot of private methods, some of which are called by public methods, but some are threads, or called by threads, which are kicked off when the object is constructed. Since the code is riddled with bugs and lacks any comments I am forced to test the private code.
I have used this method to address the issue.
MainObject.cs
class MainObject
{
protected int MethodOne(); // Should have been private.
....
}
TestMainObject.cs
class ExposeMainObject : MainObject
{
public int MethodOne();
}
class TestMainObject
{
public void TestOne()
{
}
}
Since the test objects aren't shipped I can't see a problem with it, but if there is please tell me.

Testing trumps privacy modifiers. Really, how often is a bug caused by having "a little too much" visibility for a method? Compared to bugs caused by a method that was not fully tested?
It would be nice if Java had a "friend" option, like C++. But a limitation in the language should never be an excuse for not testing something.
Michael Feathers chimes in on this debate in "Working Effectively with Legacy Code" (excellent book), and suggests that this may be a smell of a sub-class that wants to be extracted (and have public methods).
In our shop (~ 1M LOC), we replace 'private' with '/*TestScope*/' as an indicator that a method should be effectively private, but still testable.
Trying to circumvent 'private' with reflection is IMHO a smell. It's making the tests harder to write, read, and debug in order to retain a 'fetish' of privacy, which you're working around anyway. Why bother?
