Why is Java's HashTable synchronized? - java

What was the driving factor or design plan in making the methods of HashTable synchronized?
This link says that HashTable is synchronized because its methods are synchronized. But, I want to know the reason "why" the methods were synchronized?
Was it just to provide some synchronization feature? A developer could explicitly handle a race condition through synchronization techniques. Why provide HashTable with this feature?

Keep in mind: these classes were created "ages" ago - when you check the javadoc for Hashtable, you find it says "since Java 1.0"; whereas HashMap says "1.2"!
Back then, Java was trying to compete with languages like C and C++; by providing unique selling points such as "built-in concurrency".
But people quickly figured that one better synchronizes containers when using them in multi-threaded environments!
So my (more of an opinion-based) answer is: at the time when this class was first designed, people assumed that the requirement "can be used by multiple threads" was more important than "gives optimal performance".
Because Java was "advertised" like: "use it to write multi-threaded write once run everywhere code". That approach fails quickly when the default container classes given to people need additional outside wrapping to actually make them "multi-threaded" ready.
Over the years, the people behind Java came to understand that "more granular" solutions are required. Therefore the newer core collection classes are not synchronized, to avoid the corresponding performance hits. Meaning: the default with collections is to go "unprotected"; so you have to put in some thought when one of your requirements is "multi-threaded" correctness.
Same for "lists" btw: Vector is synchronized; ArrayList is not.
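The "additional outside wrapping" mentioned above can be sketched with Collections.synchronizedMap around a plain HashMap. This is only a minimal illustration, not a recommendation for every case; the class and method names (SynchronizedWrapperSketch, fillFromThreads) are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SynchronizedWrapperSketch {
    // Several threads insert disjoint keys into one shared, wrapped HashMap.
    static int fillFromThreads(int threads, int keysPerThread) {
        // The wrapper serializes each individual call, much like Hashtable does.
        Map<Integer, Integer> shared = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    shared.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }
}
```

Note that, as with Hashtable, only single calls are protected this way; iterating over the wrapped map still requires synchronizing on it manually.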

We cannot tell you why. Those who designed Java over two decades ago maybe can. It's not a useful question. Assuming you actually wanted to ask about java.util.Hashtable and not the fictional HashTable type, bear in mind that it's been obsolescent for nineteen years. Nineteen years! Don't use it. It (and Vector) have cruft that the replacement types, both synchronized and unsynchronized, do not carry. Use the modern (as of nineteen years ago) types.


Violation of single responsibility principle in Iterator from Java core

Why does the java.util.Iterator interface have a remove() method?
Certainly this method is sometimes necessary, and everyone has become accustomed to its presence. But in fact the main and only objective of an iterator is to provide access to the container's elements. And when someone wants to create their own implementation of this interface, and cannot or does not want to provide the ability to remove an element, they are forced to throw UnsupportedOperationException. And throwing that exception usually indicates a not too well thought out architecture or some flaw in the design.
I really don't understand the reasons for such a decision. And I guess it would be more correct to split off a specific subinterface to support the optional method:
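Such a split could be sketched roughly as follows. The names ReadOnlyIterator, RemovableIterator and IteratorSplitSketch are hypothetical and not part of the JDK; the adapter method is only there so the sketch can be tried against existing collections.

```java
import java.util.Iterator;

// Hypothetical base interface: traversal only, no removal.
interface ReadOnlyIterator<E> {
    boolean hasNext();
    E next();
}

// Hypothetical subinterface that adds the optional operation.
interface RemovableIterator<E> extends ReadOnlyIterator<E> {
    void remove();
}

final class IteratorSplitSketch {
    // Adapts a java.util.Iterator to the traversal-only view, hiding remove().
    static <E> ReadOnlyIterator<E> readOnly(final Iterator<E> source) {
        return new ReadOnlyIterator<E>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public E next() { return source.next(); }
        };
    }
}
```

For what it is worth, since Java 8 Iterator.remove() is a default method that throws UnsupportedOperationException, so implementers no longer have to write that stub themselves.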
Are there any well-reasoned explanations for why remove() is part of Iterator? Isn't this an example of a direct violation of the single responsibility principle from SOLID?
In addition to fancy technical answers ... please consider the timeline too. The "single responsibility principle" was coined by Robert Martin at some point in the middle/late 90s.
The Java iterator interface came into existence with Java 1.2; so around 1998.
It is very much possible that the folks at Sun had never heard of this concept while working on the early releases of Java.
Of course, many smart people have the same ideas without reading a book about it ... so a good designer might have implemented something "SRP"-like without knowing about "SRP" - but it also requires a high degree of awareness to uncover all the big and small violations of this rule ...
This design decision is explained in the Java Collections API Design FAQ. Specifically, see the first question on why collections don't support immutability and instead require optional operations. The short answer is that they didn't want "an explosion" in the number of interfaces.
There seems to be a mix-up of semantics here. Robert C. Martin defines Single Responsibility as a "single reason to change" (SRP.pdf), not as "doing only a single thing". SRP is very much related to cohesion: a software module should only contain things that are functionally related to each other.
With these things in mind, I don't think that having the remove method included in Iterator violates the SRP. Removing an element is often something you might want to do while iterating over the elements; the operations are essentially cohesive. Also, enabling the removal of elements through Iterator makes the Iterable interface (which was added in Java 5) much more powerful. This feature is utilized in e.g. many of the methods in Guava's Iterables utility class.
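As a small illustration of why removal and iteration are cohesive, here is a sketch of filtering a list in place through Iterator.remove(). The class and method names (IteratorRemoveSketch, dropOdd) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class IteratorRemoveSketch {
    // Returns a copy of the input with all odd numbers removed.
    static List<Integer> dropOdd(List<Integer> input) {
        List<Integer> result = new ArrayList<>(input);
        Iterator<Integer> it = result.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                // Removes the element last returned by next(), without
                // invalidating the ongoing iteration.
                it.remove();
            }
        }
        return result;
    }
}
```

Calling result.remove(...) directly inside a for-each loop over the same ArrayList would typically fail with ConcurrentModificationException, which is why the iterator is the natural place for this operation.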
More info on the history of the term in this excellent article by Uncle Bob himself.

What's wrong with returning this?

At the company I work for there's a document describing good practices that we should adhere to in Java. One of them is to avoid methods that return this, like for example in:
class Properties {
    public Properties add(String k, String v) {
        // store (k,v) somewhere
        return this;
    }
}
I would have such a class so that I'm able to write:
properties.add("name", "john").add("role","swd"). ...
I've seen this idiom many times, for example in StringBuilder, and don't find anything wrong with it.
Their argument is:
... can be the source of synchronization problems or failed expectations about the states of target objects.
I can't think of a situation where this could be true, can any of you give me an example?
EDIT: The document doesn't specify anything about mutability, so I don't see the difference between chaining the calls and doing:
properties.add("name", "john");
properties.add("role", "swd");
I'll try to get in touch with the originators, but I wanted to do it with my guns loaded, that's why I posted the question.
SOLVED: I got to talk with one of the authors. His original intention was apparently to avoid releasing objects that are not yet ready, like in a Builder pattern, and he explained that if a context switch happens between calls, the object could be in an invalid state. I argued that this had nothing to do with returning this, since you could make the same mistake by calling the methods one by one, and had more to do with synchronizing the building process properly. He admitted the document could be more explicit and will revise it soon. Victory is mine/ours!
My guess is that they are against mutable state (and often are rightly so). If you are not designing fluent interfaces returning this but rather return a new immutable instance of the object with the changed state, you can avoid synchronization problems or have no "failed expectations about the states of target objects". This might explain their requirement.
The only serious basis for the practice is avoiding mutable objects; the criticism that it is "confusing" and leads to "failed expectations" is quite weak. One should never use an object without first getting familiar with its semantics, and enforcing constraints on the API just to cater for those who opt out of reading Javadoc is not a good practice at all— especially because, as you note, returning this to achieve a fluent API design is one of the standard approaches in Java, and indeed a very welcome one.
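To make the immutable alternative concrete, here is a minimal sketch of a fluent class whose add method returns a new instance instead of this. The class name ImmutableProperties and its accessors are assumptions for illustration, not anything from the company document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ImmutableProperties {
    private final Map<String, String> values;

    ImmutableProperties() {
        this(Collections.<String, String>emptyMap());
    }

    private ImmutableProperties(Map<String, String> values) {
        this.values = values;
    }

    // Returns a NEW instance containing the extra pair; this object is unchanged.
    ImmutableProperties add(String k, String v) {
        Map<String, String> copy = new HashMap<>(values);
        copy.put(k, v);
        return new ImmutableProperties(Collections.unmodifiableMap(copy));
    }

    String get(String k) {
        return values.get(k);
    }

    int size() {
        return values.size();
    }
}
```

Chaining still reads the same, e.g. new ImmutableProperties().add("name", "john").add("role", "swd"), but an instance that has been handed to another thread can never be observed half-built.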
I think sometimes this approach can be really useful, for example in the 'builder' pattern.
I can say that in my organization this kind of thing is controlled by Sonar rules, and we don't have such a rule.
Another guess is that maybe the project was built on top of an existing codebase and this is a kind of legacy restriction.
So the only thing I can suggest here is to talk to the people who wrote this doc :)
Hope this helps
I think it's perfectly acceptable to use that pattern in some situations.
For example, as a Swing developer, I use GridBagLayout fairly frequently for its strengths and flexibility, but anyone who's ever used it (with its partner in crime GridBagConstraints) knows that it can be quite verbose and not very readable.
A common workaround that I've seen online (and one that I use) is to create a subclass of GridBagConstraints (GBConstraints) that has a setter for each different property, where each setter returns this. This allows the developer to chain the different properties on an as-needed basis.
The resultant code is about 1/4 the size, and far more readable/maintainable, even to the casual developer who might not be familiar with using GridBagConstraints.
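A rough sketch of what such a subclass might look like is shown below. The setter names (at, weight, fillWith, anchorAt) are invented for this example; the versions found online differ in naming and in how many properties they cover.

```java
import java.awt.GridBagConstraints;

class GBConstraints extends GridBagConstraints {
    // Each setter assigns the inherited public field(s) and returns this for chaining.
    GBConstraints at(int x, int y) {
        gridx = x;
        gridy = y;
        return this;
    }

    GBConstraints weight(double wx, double wy) {
        weightx = wx;
        weighty = wy;
        return this;
    }

    GBConstraints fillWith(int fillMode) {
        fill = fillMode;
        return this;
    }

    GBConstraints anchorAt(int anchorMode) {
        anchor = anchorMode;
        return this;
    }
}
```

A call site then shrinks to something like panel.add(label, new GBConstraints().at(0, 0).weight(1.0, 0.0).fillWith(GridBagConstraints.HORIZONTAL)) instead of several separate field assignments.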

Important topics/APIs list for Java interview

This is a little different from what has already been asked here.
I would like to know which topics/APIs are most important for Java interviews. For example -
Concurrency,
Collections
.....and like that.
The reason is that implementations like ConcurrentHashMap (read here) have so many details in them that one would like to discuss them, as they cover many important aspects.
java.io - difference between streams and writers. Buffered streams.
java.util - the collection framework. Set and List. What's HashMap, TreeMap. Some questions on efficiency of concrete collections
java.lang - wrapper types, autoboxing
java.util.concurrent - synchronization aids, atomic primitives, executors, concurrent collections.
multithreading - object monitors, synchronized keyword, methods - static and non-static.
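For the last item, the difference between static and non-static synchronized methods comes down to which monitor is locked. A minimal sketch that makes this visible with Thread.holdsLock (the class name MonitorSketch is made up for the example):

```java
final class MonitorSketch {
    // Instance method: locks the monitor of this particular object.
    synchronized boolean instanceMethodHoldsThis() {
        return Thread.holdsLock(this);
    }

    // Instance method again: it does NOT lock the class-level monitor.
    synchronized boolean instanceMethodHoldsClass() {
        return Thread.holdsLock(MonitorSketch.class);
    }

    // Static method: locks the monitor of the MonitorSketch.class object.
    static synchronized boolean staticMethodHoldsClass() {
        return Thread.holdsLock(MonitorSketch.class);
    }
}
```

Because the two kinds of method use different monitors, a static synchronized method and an instance synchronized method of the same class do not exclude each other.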
I'd say there are two things you need for every java interview:
For Basic knowledge of the Language, consult your favorite book or the Sun Java Tutorial
For Best Practices read Effective Java by Joshua Bloch
Apart from that, read whatever seems appropriate to the job description, but I'd say these two are elementary.
I guess these packages are relevant for every java job:
java.lang (Core classes)
java.io (File and Resource I/O)
java.util (Collections Framework)
java.text (Text parsing / manipulation)
IMHO it's more important to have a firm understanding of the concepts than specific knowledge of the API, and especially of the internal workings of specific classes. For example:
knowing that HashMap is not synchronized is important
knowing how this might affect a multithreaded app is important
knowing what kind of solutions exist for this problem is important
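One of the solutions alluded to above can be sketched with ConcurrentHashMap, whose merge method performs the read-modify-write as a single atomic step. The class and method names here (CounterSketch, countConcurrently) are made up for the illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CounterSketch {
    // Several threads increment the same key; merge() keeps each update atomic.
    static int countConcurrently(int threads, final int incrementsPerThread) {
        final Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts.get("hits");
    }
}
```

With a plain HashMap and a separate get followed by put, the same loop could lose updates or corrupt the map; being able to explain why is usually worth more in an interview than reciting the API.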
I wouldn't worry too much about specific API details like individual methods of ConcurrentHashMap, unless you're interviewing for a job that is advertised as needing a lot of advanced threading logic.
A thorough understanding of the basic Java APIs is more important, and books like Effective Java can help there. At least as important though is to know higher level concepts like Object Orientation and Design Patterns.
Understanding what Polymorphism, Encapsulation and Inheritance are, and when and how to use them, is vital. Know how to decide between Polymorphism and Delegation (is-a versus has-a is a decent start, but the Liskov Substitution Principle is a better guide), and why you may want to favor composition over inheritance. I highly recommend "Agile Software Development" by Robert Martin for this subject. Or check out this link for an initial overview.
Know some of the core patterns like singleton, factory, facade and observer. Read the GoF book and/or Head First Design Patterns.
I also recommend learning about refactoring, unit testing, dependency injection and mocking.
All these subjects won't just help you during interviews, they will make you a better developer.
We usually require the following knowledge on new developers:
Low level (programming) questions:
http://www.interview-questions-java.com/
Antipatterns:
http://en.wikipedia.org/wiki/Anti-pattern
Design:
http://en.wikipedia.org/wiki/Design_Patterns
In some of the interviews I have been to, the java.io package was also covered, sometimes with absurd questions on which exceptions some rarely used method declares, or whether some strange looking constructor overload exists.
Concurrency is always important for higher-level positions, but I think that knowing the concepts well (and understanding them ofc) would win you more points than specific API knowledge.
Some other APIs that get mentioned at interviews are Reflection (maybe a couple of questions on what can be achieved with it) and also java.lang.ref.Reference and its subclasses.
I ask some basic questions ('what's the difference between a list and a set?', 'what's an interface?', etc) and then I go off the resume. If hibernate is on there 5 times, I expect the candidate to be able to define ORM. You would be surprised how often it happens that they can't. I am also interested in how the candidate approaches software -- do they have a passion for it? And it is very important that the candidate believes in TDD. Naturally, if it's a really senior position, the questions will be more advanced (e.g. 'what's ThreadLocal and when do you use it'), but for most candidates this is not necessary.
Completely agree with Luke here. We cannot stick to a few APIs when preparing for Core Java interviews. I think a complete understanding of OOP concepts in Java is a must. Having good knowledge of OOP shows the interviewer that the person can learn new APIs easily and quickly.
Topics that should be covered are as follows:
OOPS Concept
Upcasting & DownCasting
Threading
Collection framework.
Here is a good post to get started. Core Java Interview Q & A

OGNL thread safety

I'm going to reuse the OGNL library outside of Struts2. I have a rather large set of formulas, which is why I would like to precompile all of them:
Ognl.parseExpression(expressionString);
But I'm not sure if a precompiled expression can be used in a multi-threaded environment. Does anybody know if it can be used?
This PropertyUtils code from OGNL is written to be thread-safe, and so I would guess that compiled expressions are intended to be thread safe.
Further evidence is that most of the accessor APIs take the mutable state as a context parameter (e.g. see PropertyAccessor), so the classes themselves have little mutable state. Immutable classes are intrinsically thread-safe. The developer guide urges extensions to be thread-safe, and finally, looking through the code, where there is mutable state, it is guarded by a synchronized block; for example, see EvaluationPool.
In summary, it seems OGNL has been designed to be thread-safe. Whether it actually is or not is another question! You could write a quick test to see for sure, using for example Concutest. Alternatively, if the number of threads is reasonable, storing all the expressions in a ThreadLocal sidesteps the issue altogether, at the cost of a little extra memory (or possibly not, as OGNL does expression caching.)
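The ThreadLocal idea could look roughly like the sketch below. To keep it self-contained it does not depend on OGNL: the parser is passed in as a Function, and in real use it would presumably wrap Ognl.parseExpression (which throws a checked exception that the wrapper would have to handle). The class name PerThreadExpressionCache is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class PerThreadExpressionCache {
    private final Function<String, Object> parser;

    // Each thread lazily gets its own private map, so no parsed tree is ever shared.
    private final ThreadLocal<Map<String, Object>> cache =
            ThreadLocal.withInitial(HashMap::new);

    PerThreadExpressionCache(Function<String, Object> parser) {
        this.parser = parser;
    }

    // Parses the expression at most once per thread and reuses it afterwards.
    Object get(String expression) {
        return cache.get().computeIfAbsent(expression, parser);
    }
}
```

The cost is one parse per expression per thread, which is the "little extra memory" trade-off mentioned above.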
I think your best option is to contact original developers, directly or through mailing list:
http://www.opensymphony.com/ognl/members.action
https://ognl.dev.java.net/servlets/ProjectMailingListList
The project seems to be abandoned for some time, so there is hardly anybody else who knows :/

Empirical data on the effects of immutability?

In class today, my professor was discussing how to structure a class. The course primarily uses Java and I have more Java experience than the teacher (he comes from a C++ background), so I mentioned that in Java one should favor immutability. My professor asked me to justify my answer, and I gave the reasons that I've heard from the Java community:
Safety (especially with threading)
Reduced object count
Allows certain optimizations (especially for garbage collector)
The professor challenged my statement by saying that he'd like to see some statistical measurement of these benefits. I cited a wealth of anecdotal evidence, but even as I did so, I realized he was right: as far as I know, there hasn't been an empirical study of whether immutability actually provides the benefits it promises in real-world code. I know it does from experience, but others' experiences may differ.
So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?
I would point to Item 15 in Effective Java. The value of immutability is in the design (and it isn't always appropriate - it is just a good first approximation) and design preferences are rarely argued from a statistical point of view, but we have seen mutable objects (Calendar, Date) that have gone really bad, and serious replacements (JodaTime, JSR-310) have opted for immutability.
The biggest advantage of immutability in Java, in my opinion, is simplicity. It becomes much simpler to reason about the state of an object, if that state cannot change. This is of course even more important in a multi-threaded environment, but even in simple, linear single-threaded programs it can make things far easier to understand.
See this page for more examples.
"So, my question is, have there been any statistical studies done on the effects of immutability in real-world code?"
I'd argue that your professor is just being obtuse -- not necessarily intentionally, and not even necessarily a bad thing. It's just that the question is too vague. Two real problems with the question:
"Statistical studies on the effect of [x]" doesn't really mean anything if you don't specify what kind of measurements you're looking for.
"Real-world code" doesn't really mean anything unless you state a specific domain. Real world code includes scientific computing, game development, blog engines, automated proof generators, stored procedures, operating system kernels, etc.
For what it's worth, the ability for the compiler to optimize immutable objects is well-documented. Off the top of my head:
The Haskell compiler performs deforestation (also called short-cut fusion), where Haskell will transform the expression map f . map g to map (f . g). Since Haskell functions are pure, these expressions are guaranteed to produce equivalent output, but the second form is faster since it doesn't need to create an intermediate list.
Common subexpression elimination where we could convert x = foo(12); y = foo(12) to temp = foo(12); x = temp; y = temp; is only possible if the compiler can guarantee foo is a pure function. To my knowledge, the D compiler can perform substitutions like this using the pure and immutable keywords. If I remember correctly, some C and C++ compilers will aggressively optimize calls to these functions marked "pure" (or whatever the equivalent keyword is).
So long as we don't have mutable state, a sufficiently smart compiler can execute linear blocks of code across multiple threads with a guarantee that we won't corrupt the state of variables in another thread.
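The map-fusion rewrite from the first point can be mimicked by hand in Java to see why purity matters. This is only a sketch of the equivalence, with made-up names (MapFusionSketch, twoPasses, fused); the Java compiler does not perform this transformation for you.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class MapFusionSketch {
    // Unfused: apply g to every element, materialize a list, then apply f.
    static List<Integer> twoPasses(List<Integer> xs,
                                   Function<Integer, Integer> f,
                                   Function<Integer, Integer> g) {
        List<Integer> intermediate = xs.stream().map(g).collect(Collectors.toList());
        return intermediate.stream().map(f).collect(Collectors.toList());
    }

    // Fused: a single pass applying f(g(x)), with no intermediate list.
    static List<Integer> fused(List<Integer> xs,
                               Function<Integer, Integer> f,
                               Function<Integer, Integer> g) {
        return xs.stream().map(f.compose(g)).collect(Collectors.toList());
    }
}
```

The two versions are interchangeable only if f and g have no side effects; if either mutated shared state, the change in evaluation order could become observable.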
Regarding concurrency, the pitfalls of concurrency using mutable state are well-documented and don't need to be restated.
Sure, this is all anecdotal evidence, but that's pretty much the best you'll get. The immutable vs mutable debate is largely a pissing match, and you are not going to find a paper making a sweeping generalization like "functional programming is superior to imperative programming".
At most, you'll probably find that you can summarize the benefits of immutable vs mutable in a set of best practices rather than as codified studies and statistics. For example, mutable state is the enemy of multithreaded programming; on the other hand, mutable queues and arrays are often easier to write and more efficient in practice than their immutable variants.
It takes practice, but eventually you learn to use the right tool for the job, rather than shoehorning your favorite pet paradigm into a project.
I think your professor's being overly stubborn (probably deliberately, to push you to a fuller understanding). Really the benefits of immutability are not so much what the compiler can do with optimisations, but that it's much easier for us humans to read and understand. A variable that is guaranteed to be set when the object is created and is guaranteed not to change afterwards is much easier to grok and reason about than one which has this value now but might be set to some other value later.
This is especially true with threading, in that you don't need to worry about processor caches and monitors and all that boilerplate that comes with avoiding concurrent modifications, when the language guarantees that no such modification can possibly occur.
And once you express the benefits of immutability as "the code is easier to follow", it feels a bit sillier to ask for empirical measurements of productivity increases vis-a-vis "easier-to-followness".
On the other hand, the compiler and Hotspot can probably perform certain optimisations based on knowing that a value can never change - like you, I have a feeling that this takes place and is a good thing, but I'm not sure of the details. It's a lot more likely that there will be empirical data for the types of optimisation that can occur, and how much faster the resulting code is.
Don't argue with the prof. You have nothing to gain.
These are open questions, like dynamic vs static typing. We sometimes think functional techniques involving immutable data are better for various reasons, but it's mostly a matter of style so far.
What would you objectively measure? GC and object count could be measured with mutable/immutable versions of the same program (although how typical that would be would be subjective, so this is a pretty weak argument). I can't imagine how you could measure the removal of threading bugs, except maybe anecdotally by comparison with a real world example of a production application plagued by intermittent issues fixed by adding immutability.
Immutability is a good thing for value objects. But how about other things? Imagine an object that creates a statistic:
Stats s = new Stats ();
... some loop ...
s.count ();
s.end ();
s.print ();
which should print "Processed 536.21 rows/s". How do you plan to implement count() with an immutable object? Even if you use an immutable value object for the counter itself, s can't be immutable since it would have to replace the counter object inside of itself. The only way out would be:
s = s.count ();
which means to copy the state of s for every round in the loop. While this can be done, it surely isn't as efficient as incrementing the internal counter.
Moreover, most people would fail to use this API right because they would expect count() to modify the state of the object instead of returning a new one. So in this case, it would create more bugs.
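The pitfall is easy to reproduce. Below is a minimal sketch with two hypothetical classes, MutableStats and ImmutableStats, that only track the counter (timing and printing are left out):

```java
// Mutable version: count() changes the object in place.
final class MutableStats {
    private long count;

    void count() {
        count++;
    }

    long total() {
        return count;
    }
}

// Immutable version: count() returns a new object and leaves this one untouched.
final class ImmutableStats {
    private final long count;

    ImmutableStats() {
        this(0);
    }

    private ImmutableStats(long count) {
        this.count = count;
    }

    ImmutableStats count() {
        return new ImmutableStats(count + 1);
    }

    long total() {
        return count;
    }
}
```

With the immutable version, a caller who writes s.count(); instead of s = s.count(); silently keeps a total of zero, which is exactly the kind of misuse described above.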
As other comments have claimed, it would be very, very hard to collect statistics on the merits of immutable objects, because it would be virtually impossible to find control cases - pairs of software applications which are alike in every way, except that one uses immutable objects and the other does not. (In nearly every case, I would claim that one version of that software was written some time after the other, and learned numerous lessons from the first, and so improvements in performance will have many causes.) Any experienced programmer who thinks about this for a moment ought to realize this. I think your professor is trying to deflect your suggestion.
Meanwhile, it is very easy to make cogent arguments in favor of immutability, at least in Java, and probably in C# and other OO languages. As Yishai states, Effective Java makes this argument well. So does the copy of Java Concurrency in Practice sitting on my bookshelf.
Immutable objects allow code which wants to share an object's value to do so by sharing a reference. Mutable objects, however, have an identity, which allows code which wants to share an object's identity to do so by sharing a reference. Both kinds of sharing are essential in most applications. If one doesn't have immutable objects available, it's possible to share values by copying them into either new objects or objects supplied by the intended recipient of those values. Getting by without mutable objects is much harder. One could somewhat "fake" mutable objects by saying stateOfUniverse = stateOfUniverse.withSomeChange(...), but that would require that nothing else modify stateOfUniverse while its withSomeChange method is running [precluding any sort of multi-threading]. Further, if one were e.g. trying to track a fleet of trucks, and part of the code was interested in one particular truck, it would be necessary for that code to always look up that truck in a table of trucks any time it might have changed.
A better approach is to subdivide the universe into entities and values. Entities would have changeable characteristics, but an immutable identity, so a storage location of e.g. type Truck could continue to identify the same truck even as the truck itself changes position, loads and unloads cargo, etc. Values would not generally have a particular identity, but would have immutable characteristics. A Truck might store its location as type WorldCoordinate. A WorldCoordinate that represents 45.6789012N 98.7654321W would continue to do so as long as any reference to it exists; if a truck that was at that location moved north slightly, it would create a new WorldCoordinate to represent 45.6789013N 98.7654321W, abandon the old one, and store a reference to that new one.
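The entity/value split described above might be sketched like this. Only the names Truck and WorldCoordinate come from the text; the fields, the movedNorth helper and the accessors are assumptions made for the example.

```java
// Value: immutable characteristics, no meaningful identity.
final class WorldCoordinate {
    private final double latitude;
    private final double longitude;

    WorldCoordinate(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    double latitude() { return latitude; }
    double longitude() { return longitude; }

    // "Moving" produces a new value; the old coordinate is never modified.
    WorldCoordinate movedNorth(double degrees) {
        return new WorldCoordinate(latitude + degrees, longitude);
    }
}

// Entity: immutable identity, changeable characteristics.
final class Truck {
    private final int id;
    private WorldCoordinate location;

    Truck(int id, WorldCoordinate location) {
        this.id = id;
        this.location = location;
    }

    int id() { return id; }
    WorldCoordinate getLocation() { return location; }
    void setLocation(WorldCoordinate newLocation) { this.location = newLocation; }
}
```

Any code holding a reference to the Truck sees its new position without a lookup, while any code holding the old WorldCoordinate still sees exactly the value it was given.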
It is generally easiest to reason about code when everything encapsulates either an immutable value or an immutable identity, and when the things which are supposed to have an immutable identity are mutable. If one didn't want to use any mutable objects outside a variable stateOfUniverse, updating a truck's position would require something like:
ImmutableMapping<int,Truck> trucks = stateOfUniverse.getTrucks();
Truck myTruck = trucks.get(myTruckId);
myTruck = myTruck.withLocation(newLocation);
trucks = trucks.withItem(myTruckId,myTruck);
stateOfUniverse = stateOfUniverse.withTrucks(trucks);
but reasoning about that code would be more difficult than would be:
myTruck.setLocation(newLocation);
