Ambiguity in the Iterable interface and its implementation in Java

Does the Java Iterator interface require us to return a new object each time next() is called? I went through the documentation and found no obligation to return a new object per call, but this causes real ambiguity. It seems that the Hadoop MapReduce framework breaks an undocumented convention, which caused many problems in my simple program (including when using Java 8 Streams). It returns the same object with different contents on each call to next() on the Iterator (contrary to my intuition, this does not seem to break any documented rule of the Iterator interface). I want to know why this happens. Is it MapReduce's fault? Or is it Java's fault for not documenting that the Iterator interface must return a new instance on every call to next()?
For the sake of simplicity, and to show what is happening in Hadoop MapReduce, I wrote my own Iterator similar to what MapReduce does, so you can see what I'm getting at (it is not a flawless program and may have other problems, but please focus on the concept I'm trying to show).
Imagine I have the following Hospital Entity:
@Getter
@Setter
@AllArgsConstructor
@ToString
public class Hospital {
private AREA area;
private int patients;
public Hospital(AREA area, int patients) {
this.area = area;
this.patients = patients;
}
public Hospital() {
}
}
For which I have written the following MyCustomHospitalIterable:
public class MyCustomHospitalIterable implements Iterable<Hospital> {
private List<Hospital> internalList;
private CustomHospitalIteration customIteration = new CustomHospitalIteration();
public MyCustomHospitalIterable(List<Hospital> internalList) {
this.internalList = internalList;
}
@Override
public Iterator<Hospital> iterator() {
return customIteration;
}
public class CustomHospitalIteration implements Iterator<Hospital> {
private int currentIndex = 0;
private Hospital currentHospital = new Hospital();
@Override
public boolean hasNext() {
if (MyCustomHospitalIterable.this.internalList.size() - 1 > currentIndex) {
currentIndex++;
return true;
}
return false;
}
@Override
public Hospital next() {
Hospital hospital =
MyCustomHospitalIterable.this.internalList.get(currentIndex);
currentHospital.setArea(hospital.getArea());
currentHospital.setPatients(hospital.getPatients());
return currentHospital;
}
}
}
Here, instead of returning a new object from each next() call, I return the same object with different contents. You might ask what the advantage of doing this is. It has its advantages in MapReduce: with big data, they avoid creating new objects for performance reasons. Does this break any documented rule of the Iterator interface?
Now let's see some consequences of implementing Iterable that way.
Consider the following simple program:
public static void main(String[] args) {
List<Hospital> hospitalArray = Arrays.asList(
new Hospital(AREA.AREA1, 10),
new Hospital(AREA.AREA2, 20),
new Hospital(AREA.AREA3, 30),
new Hospital(AREA.AREA1, 40));
MyCustomHospitalIterable hospitalIterable = new MyCustomHospitalIterable(hospitalArray);
List<Hospital> hospitalList = new LinkedList<>();
Iterator<Hospital> hospitalIter = hospitalIterable.iterator();
while (hospitalIter.hasNext()) {
Hospital hospital = hospitalIter.next();
System.out.println(hospital);
hospitalList.add(hospital);
}
System.out.println("---------------------");
System.out.println(hospitalList);
}
It is quite illogical and counterintuitive that the output of the program is as follows:
Hospital{area=AREA2, patients=20}
Hospital{area=AREA3, patients=30}
Hospital{area=AREA1, patients=40}
---------------------
[Hospital{area=AREA1, patients=40}, Hospital{area=AREA1, patients=40}, Hospital{area=AREA1, patients=40}]
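A caller that cannot change the Iterable can still protect itself by copying each element before storing it. A minimal sketch (with a trimmed-down Hospital class and an iterator that reuses one instance, mimicking the behavior above; class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DefensiveCopyDemo {
    // Minimal stand-in for the Hospital entity from the question.
    static class Hospital {
        String area;
        int patients;
        Hospital(String area, int patients) { this.area = area; this.patients = patients; }
        @Override
        public String toString() { return "Hospital{area=" + area + ", patients=" + patients + "}"; }
    }

    // An iterator that keeps returning the same mutable instance, like the one in the question.
    static Iterator<Hospital> reusingIterator(List<Hospital> src) {
        Hospital shared = new Hospital(null, 0);
        Iterator<Hospital> it = src.iterator();
        return new Iterator<Hospital>() {
            @Override
            public boolean hasNext() { return it.hasNext(); }
            @Override
            public Hospital next() {
                Hospital h = it.next();
                shared.area = h.area;         // overwrite the shared instance
                shared.patients = h.patients;
                return shared;
            }
        };
    }

    // The caller-side fix: copy each element before storing a reference to it.
    static List<Hospital> collectSafely(Iterator<Hospital> iter) {
        List<Hospital> out = new ArrayList<>();
        while (iter.hasNext()) {
            Hospital h = iter.next();
            out.add(new Hospital(h.area, h.patients)); // defensive copy
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hospital> src = List.of(new Hospital("AREA1", 10), new Hospital("AREA2", 20));
        // Without the copy, both list entries would point at the shared instance.
        System.out.println(collectSafely(reusingIterator(src)));
    }
}
```

With the copy in place, the collected list keeps each element's contents instead of four references to one final state.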
And to make it worse, imagine what happens when we are working with Streams in Java. What would be the output of the following program?
public static void main(String[] args) {
List<Hospital> hospitalArray = Arrays.asList(
new Hospital(AREA.AREA1, 10),
new Hospital(AREA.AREA2, 20),
new Hospital(AREA.AREA3, 30),
new Hospital(AREA.AREA1, 40));
MyCustomHospitalIterable hospitalIterable = new MyCustomHospitalIterable(hospitalArray);
Map<AREA, Integer> sortedHospital =
StreamSupport.stream(hospitalIterable.spliterator(), false)
.collect(Collectors.groupingBy(
Hospital::getArea, Collectors.summingInt(Hospital::getPatients)));
System.out.println(sortedHospital);
}
It depends on whether we use a parallel Stream or a sequential one.
With the sequential one, the output is as follows:
{AREA2=20, AREA1=40, AREA3=30}
and with the parallel one it is:
{AREA1=120}
As a user I want to use interfaces as they are, without any concern about the implementations behind them.
The problem is that here I know how MyCustomHospitalIterable is implemented, but in Hadoop MapReduce I have to implement a method like the one below, and I have no idea where the Iterable<IntWritable> comes from or what its implementation is. I just want to use it as a pure Iterable, but as I showed above it does not work as expected:
public void reduce(Text key, Iterable<IntWritable> values, Context context
) throws IOException, InterruptedException {
List<IntWritable> list = new LinkedList<>();
Iterator<IntWritable> iter = values.iterator();
while (iter.hasNext()) {
IntWritable count = iter.next();
System.out.println(count);
list.add(count);
}
System.out.println("---------------------");
System.out.println(list);
}
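For the Hadoop case, the usual remedy is the same idea: copy each value before keeping a reference to it (with the real org.apache.hadoop.io.IntWritable that is new IntWritable(count.get())). A self-contained sketch using a minimal stand-in for IntWritable, so it runs without Hadoop on the classpath:

```java
import java.util.ArrayList;
import java.util.List;

public class ReduceCopyDemo {
    // Minimal stand-in for org.apache.hadoop.io.IntWritable; only the
    // int constructor, get(), and set() used here mirror the real class.
    static class IntWritable {
        private int value;
        IntWritable(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
        @Override
        public String toString() { return Integer.toString(value); }
    }

    // Copies each (possibly reused) value object before adding it to the list,
    // mirroring the fix for the reduce() method in the question.
    static List<IntWritable> copyAll(Iterable<IntWritable> values) {
        List<IntWritable> list = new ArrayList<>();
        for (IntWritable count : values) {
            list.add(new IntWritable(count.get())); // copy, don't keep the shared reference
        }
        return list;
    }

    public static void main(String[] args) {
        IntWritable shared = new IntWritable(0);
        int[] data = {1, 2, 3};
        // An Iterable that, like Hadoop's, keeps handing back the same object.
        Iterable<IntWritable> values = () -> new java.util.Iterator<IntWritable>() {
            int i = 0;
            @Override
            public boolean hasNext() { return i < data.length; }
            @Override
            public IntWritable next() { shared.set(data[i++]); return shared; }
        };
        System.out.println(copyAll(values)); // prints [1, 2, 3]
    }
}
```

Without the copy, the resulting list would contain three references to the shared object and print [3, 3, 3].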
Here are my questions:
Why has my simple program broken?
Is it MapReduce's fault for not following an undocumented, conventional rule of Iterable and Iterator (or is there documentation for this behavior that I have missed)?
Is it Java's fault for not documenting that Iterable and Iterator must return a new object on each call?
Or is it my fault as a programmer?

It is very unusual to return the same mutable object with different contents from an Iterable. I did not find anything about this in the Java language reference, though I have not searched much. It is simply too error-prone to be correct usage.
Your mention of other tools, like Streams, is apt.
Also, Java's upcoming record type is intended precisely for such tuple-like usage, of course as multiple immutable objects. "Your" Iterable suffers from not being usable with collections, unless one does a .next().clone() or similar.
This weakness of the Iterable is in the same category as using a mutable object as a Map key. It is dead wrong.
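The mutable-Map-key comparison can be made concrete. A small sketch (the Key class is hypothetical) showing how mutating a key after insertion strands the entry in the wrong hash bucket:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    // A key whose equals/hashCode depend on a mutable field.
    static class Key {
        int id;
        Key(int id) { this.id = id; }
        @Override
        public boolean equals(Object o) { return o instanceof Key && ((Key) o).id == id; }
        @Override
        public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key k = new Key(1);
        map.put(k, "value");
        k.id = 2;                       // mutate the key after insertion
        // The entry still exists, but lookups hash to a different bucket.
        System.out.println(map.get(k)); // prints null
    }
}
```

The same failure mode applies to a collection populated from an iterator that mutates the objects it already handed out.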

Related

How to keep track of a String variable while changing it with Functions using Stream API?

I want to use Stream API to keep track of a variable while changing it with functions.
My code:
public String encoder(String texteClair) {
for (Crypteur crypteur : algo) {
texteClair = crypteur.encoder(texteClair);
}
return texteClair;
}
I have a list of classes with methods, and I want to pass a variable through all of them, as done in the code above.
It works perfectly, but I was wondering how it could be done with streams.
Could we use reduce()?
Use an AtomicReference: the reference itself is effectively final, but its wrapped value may change:
public String encoder(String texteClair) {
AtomicReference<String> ref = new AtomicReference<>(texteClair);
algo.stream().forEach(c -> ref.updateAndGet(c::encoder)); // credit Ole V.V
return ref.get();
}
Could we use reduce()?
I guess we could. But keep in mind that this isn't the best use case for streams.
Because you've mentioned "classes" in the plural, I assume that Crypteur is either an abstract class or an interface. As a general rule you should favor interfaces over abstract classes, so I'll assume that Crypteur is an interface (if it's not, that's not a big issue) and that it has at least one implementation similar to this:
public interface Encoder {
String encoder(String str);
}
public class Crypteur implements Encoder {
private UnaryOperator<String> operator;
public Crypteur(UnaryOperator<String> operator) {
this.operator = operator;
}
@Override
public String encoder(String str) {
return operator.apply(str);
}
}
Then you can utilize your encoders with a stream like this:
public static void main(String[] args) {
List<Crypteur> algo =
List.of(new Crypteur(str -> str.replaceAll("\\p{Punct}|\\p{Space}", "")),
new Crypteur(str -> str.toUpperCase(Locale.ROOT)),
new Crypteur(str -> str.replace('A', 'W')));
String result = encode(algo, "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system");
System.out.println(result);
}
public static String encode(Collection<Crypteur> algo, String str) {
return algo.stream()
.reduce(str,
(String result, Crypteur encoder) -> encoder.encoder(result),
(result1, result2) -> { throw new UnsupportedOperationException(); });
}
Note that the combiner, which is used in parallel streams to combine partial results, deliberately throws an exception to indicate that this task isn't parallelizable. All transformations must be applied sequentially; we can't, for instance, apply some encoders to the given string, apply the rest of them separately to the same string, and then merge the two results - it's not possible.
Output
EVERYPIECEOFKNOWLEDGEMUSTHWVEWSINGLEUNWMBIGUOUSWUTHORITWTIVEREPRESENTWTIONWITHINWSYSTEM

How to populate Map using the multiple classes using the thread safety approach [duplicate]

This question already has answers here:
Thread safe Hash Map?
(3 answers)
Closed 1 year ago.
Maybe this question already has an answer, but based on my search terms I could not find something that fits my case, so I'm posting the question here. This question may seem silly, but it is something new to me. Please provide some suggestions or workarounds.
I have a Populator class that holds a Map. I want this map to be populated with various values during code execution; at the end, I want to obtain all the values in the Map and process them further based on my requirements.
As of now, I am using a static method and variable to achieve this, and it seems to work fine. But my mentor advised me that this will not be thread-safe when processing multiple requests in parallel. I would like to know how to make the code below thread-safe.
I will explain with code for better understanding:
Following is my Populator class, which is used to populate the Map during processing; I access the Map at the end for further processing. I have created one more class, AnotherPopulator, as a workaround for the issue, but it does not work as I need:
public class Populator {
@Getter
private static final HashMap<String,String> namespaces = new HashMap<>();
public static void namespacePopulator(String key,String value){
namespaces.put(key,value);
}
}
@NoArgsConstructor
@Getter
class AnotherPopulator {
private final HashMap<String,String> namespaces = new HashMap<>();
public void namespacePopulator(String key,String value){
this.namespaces.put(key,value);
}
}
Following are classes A and B, which will be invoked by Main to populate the Map during execution:
public class A {
public void populatorA() {
Populator.namespacePopulator("KeyA", "ValueA");
}
public void anotherPopulatorA(){
AnotherPopulator anotherPopulator = new AnotherPopulator();
anotherPopulator.namespacePopulator("KeyAA","ValueA1");
}
}
public class B {
public void populatorB() {
Populator.namespacePopulator("KeyB", "ValueB");
}
public void anotherPopulatorB(){
AnotherPopulator anotherPopulator = new AnotherPopulator();
anotherPopulator.namespacePopulator("KeyB1","ValueB1");
}
}
Following is my Main class, which will invoke A and B and then finally obtain the Map with all the values populated during execution:
public class Main {
public static void main(String[] args) {
A a = new A();
B b = new B();
a.populatorA();
b.populatorB();
//This will provide me with desired result but does not provide the thread safety
System.out.println(Populator.getNamespaces());
System.out.println("****************************");
//This will provide thread safety but does not provide the desired result I would want as new object will be created at every stage
AnotherPopulator anotherPopulator = new AnotherPopulator();
System.out.println(anotherPopulator.getNamespaces());
//I would like to populate a Map present in class from various classes during the execution then finally I would like to obtain all the values that were added during the
// execution but want to do this using the thread safety approach
}
}
Following is the output I get. The 1st part has the values I need, but it's not a thread-safe approach. The 2nd part does not have the values I need, but I believe it is a thread-safe approach.
{KeyB=ValueB, KeyA=ValueA}
****************************
{}
I would like to know how I can declare a Map using a thread-safe approach, populate it during the entire execution life cycle, and then finally obtain all the values together.
I hope I have explained the issue clearly. Any help/workarounds/suggestions will be really helpful. Thanks in advance.
As mentioned in the comment, use ConcurrentHashMap:
public class Populator {
@Getter
private static final ConcurrentHashMap<String,String> namespaces = new ConcurrentHashMap<>();
public static void namespacePopulator(String key,String value){
namespaces.putIfAbsent(key,value);
}
}
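A minimal sketch of that suggestion in action, with two threads standing in for classes A and B (class and key names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPopulatorDemo {
    // Shared, thread-safe map, as in the answer above.
    static final Map<String, String> NAMESPACES = new ConcurrentHashMap<>();

    static void populate(String key, String value) {
        // putIfAbsent is atomic: no lost updates even under concurrent calls.
        NAMESPACES.putIfAbsent(key, value);
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads populating concurrently, standing in for classes A and B.
        Thread a = new Thread(() -> populate("KeyA", "ValueA"));
        Thread b = new Thread(() -> populate("KeyB", "ValueB"));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(NAMESPACES.size()); // prints 2
    }
}
```

Note that ConcurrentHashMap makes individual operations atomic; compound check-then-act sequences still need methods like putIfAbsent or computeIfAbsent rather than separate get/put calls.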

Java Stage-based Processing Implementation

There's some domain knowledge/business logic baked into the problem I'm trying to solve but I'll try to boil it down to the basics as much as possible.
Say I have an interface defined as follows:
public interface Stage<I, O> {
StageResult<O> process(StageResult<I> input) throws StageException;
}
This represents a stage in a multi-stage data processing pipeline, my idea is to break the data processing steps into sequential (non-branching) independent steps (such as read from file, parse network headers, parse message payloads, convert format, write to file) represented by individual Stage implementations. Ideally I'd implement a FileInputStage, a NetworkHeaderParseStage, a ParseMessageStage, a FormatStage, and a FileOutputStage, then have some sort of
Stage<A, C> compose(Stage<A, B> stage1, Stage<B, C> stage2);
method such that I can eventually compose a bunch of stages into a final stage that looks like FileInput -> FileOutput.
Is this something (specifically the compose method, or a similar mechanism for aggregating many stages into one stage) even supported by the Java type system? I'm hacking away at it now and I'm ending up in a very ugly place involving reflection and lots of unchecked generic types.
Am I heading off in the wrong direction or is this even a reasonable thing to try to do in Java? Thanks so much in advance!
You didn't post enough implementation details to show where the type-safety issues are, but here is my take on how you could address the problem:
First, don't make the whole thing too generic; make your stages specific regarding their inputs and outputs.
Then create a composite stage which implements Stage and combines two stages into one final result.
Here is a very simple implementation:
public class StageComposit<A, B, C> implements Stage<A, C> {
final Stage<A, B> stage1;
final Stage<B, C> stage2;
public StageComposit(Stage<A, B> stage1, Stage<B, C> stage2) {
this.stage1 = stage1;
this.stage2 = stage2;
}
@Override
public StageResult<C> process(StageResult<A> input) {
return stage2.process(stage1.process(input));
}
}
Stage result
public class StageResult<O> {
final O result;
public StageResult(O result) {
this.result = result;
}
public O get() {
return result;
}
}
Example specific Stages:
public class EpochInputStage implements Stage<Long, Date> {
@Override
public StageResult<Date> process(StageResult<Long> input) {
return new StageResult<Date>(new Date(input.get()));
}
}
public class DateFormatStage implements Stage<Date, String> {
@Override
public StageResult<String> process(StageResult<Date> input) {
return new StageResult<String>(
new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
.format(input.get()));
}
}
public class InputSplitStage implements Stage<String, List<String>> {
@Override
public StageResult<List<String>> process(StageResult<String> input) {
return new StageResult<List<String>>(
Arrays.asList(input.get().split("[-:\\s]")));
}
}
And finally a small test demonstrating how to combine them all:
public class StageTest {
@Test
public void process() {
EpochInputStage efis = new EpochInputStage();
DateFormatStage dfs = new DateFormatStage();
InputSplitStage iss = new InputSplitStage();
Stage<Long, String> sc1 =
new StageComposit<Long, Date, String>(efis, dfs);
Stage<Long, List<String>> sc2 =
new StageComposit<Long, String, List<String>>(sc1, iss);
StageResult<List<String>> result =
sc2.process(new StageResult<Long>(System.currentTimeMillis()));
System.out.print(result.get());
}
}
Output for the current time would be a list of strings:
[2015, 06, 24, 16, 27, 55]
As you can see, there are no type-safety issues or any type casts. When you need to handle other types of inputs and outputs, or convert them to suit the next stage, just write a new Stage and hook it up in your stage-processing chain.
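Since Java 8, the same composition can also be expressed as a lambda behind a generic compose method, avoiding a named composite class. A minimal sketch with the Stage interface simplified (the StageResult wrapper and checked StageException are dropped so the snippet stands alone):

```java
public class StageComposeDemo {
    // Simplified stage: raw input/output types, no StageResult wrapper.
    interface Stage<I, O> {
        O process(I input);
    }

    // Generic composition: the types line up, no casts or reflection needed.
    static <A, B, C> Stage<A, C> compose(Stage<A, B> first, Stage<B, C> second) {
        return input -> second.process(first.process(input));
    }

    public static void main(String[] args) {
        Stage<Long, String> format = epoch -> "t=" + epoch;   // Long -> String
        Stage<String, Integer> length = String::length;       // String -> Integer
        Stage<Long, Integer> pipeline = compose(format, length);
        System.out.println(pipeline.process(12345L)); // prints 7 ("t=12345".length())
    }
}
```

The StageComposit class above does the same thing; the lambda form just removes the boilerplate once the functional-interface shape is in place.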
You may want to consider using a composite pattern or a decorator pattern. With the decorator, each stage will wrap or decorate the previous stage. To do this, have each stage implement the interface as you are doing, and allow a stage to contain another stage.
The process() method then no longer needs to accept a StageResult parameter, since it can call the contained Stage's process() method itself, get the StageResult, perform its own processing, and return another StageResult.
One advantage is that you can restructure your pipeline at run time.
Each Stage that may contain another can extend ComposableStage, and each stage that is an end point of the process can extend LeafStage. Note that I used those terms just to name the classes by function; you can create more imaginative names.

Thread-safe cache of one object in java

Let's say we have a CountryList object in our application that should return the list of countries. Loading the countries is a heavy operation, so the list should be cached.
Additional requirements:
CountryList should be thread-safe
CountryList should load lazy (only on demand)
CountryList should support the invalidation of the cache
CountryList should be optimized considering that the cache will be invalidated very rarely
I came up with the following solution:
public class CountryList {
private static final Object ONE = new Integer(1);
// MapMaker is from Google Collections Library
private Map<Object, List<String>> cache = new MapMaker()
.initialCapacity(1)
.makeComputingMap(
new Function<Object, List<String>>() {
@Override
public List<String> apply(Object from) {
return loadCountryList();
}
});
private List<String> loadCountryList() {
// HEAVY OPERATION TO LOAD DATA
}
public List<String> list() {
return cache.get(ONE);
}
public void invalidateCache() {
cache.remove(ONE);
}
}
What do you think about it? Do you see anything bad about it? Is there another way to do it? How can I make it better? Should I look for a totally different solution in this case?
Thanks.
Google Collections actually supplies just the thing for this: Supplier.
Your code would be something like:
private Supplier<List<String>> supplier = new Supplier<List<String>>(){
public List<String> get(){
return loadCountryList();
}
};
// volatile reference so that changes are published correctly see invalidate()
private volatile Supplier<List<String>> memorized = Suppliers.memoize(supplier);
public List<String> list(){
return memorized.get();
}
public void invalidate(){
memorized = Suppliers.memoize(supplier);
}
Thank you all, especially user "gid", who gave the idea.
My goal was to optimize the performance of the get() operation, considering that the invalidate() operation will be called very rarely.
I wrote a testing class that starts 16 threads, each calling the get() operation one million times. With this class I profiled several implementations on my 2-core machine.
Testing results
Implementation          Time
no synchronisation      0.6 sec
normal synchronisation  7.5 sec
with MapMaker           26.3 sec
with Suppliers.memoize  8.2 sec
with optimized memoize  1.5 sec
1) "No synchronisation" is not thread-safe, but gives us the best performance, as a baseline to compare against.
@Override
public List<String> list() {
if (cache == null) {
cache = loadCountryList();
}
return cache;
}
@Override
public void invalidateCache() {
cache = null;
}
2) "Normal synchronisation" - pretty good performance; the standard no-brainer implementation.
@Override
public synchronized List<String> list() {
if (cache == null) {
cache = loadCountryList();
}
return cache;
}
@Override
public synchronized void invalidateCache() {
cache = null;
}
3) "with MapMaker" - very poor performance.
See my question at the top for the code.
4) "with Suppliers.memoize" - good performance. But since its performance is the same as "normal synchronisation", we either need to optimize it or should just use "normal synchronisation".
See the answer from user "gid" for the code.
5) "with optimized memoize" - performance comparable to the "no sync" implementation, but thread-safe. This is the one we need.
The cache-class itself:
(The Supplier interface used here is from the Google Collections Library and has just one method, get(); see http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/base/Supplier.html)
public class LazyCache<T> implements Supplier<T> {
private final Supplier<T> supplier;
private volatile Supplier<T> cache;
public LazyCache(Supplier<T> supplier) {
this.supplier = supplier;
reset();
}
private void reset() {
cache = new MemoizingSupplier<T>(supplier);
}
@Override
public T get() {
return cache.get();
}
public void invalidate() {
reset();
}
private static class MemoizingSupplier<T> implements Supplier<T> {
final Supplier<T> delegate;
volatile T value;
MemoizingSupplier(Supplier<T> delegate) {
this.delegate = delegate;
}
@Override
public T get() {
if (value == null) {
synchronized (this) {
if (value == null) {
value = delegate.get();
}
}
}
return value;
}
}
}
Example use:
public class BetterMemoizeCountryList implements ICountryList {
LazyCache<List<String>> cache = new LazyCache<List<String>>(new Supplier<List<String>>(){
@Override
public List<String> get() {
return loadCountryList();
}
});
@Override
public List<String> list(){
return cache.get();
}
@Override
public void invalidateCache(){
cache.invalidate();
}
private List<String> loadCountryList() {
// this should normally load a full list from the database,
// but just for this instance we mock it with:
return Arrays.asList("Germany", "Russia", "China");
}
}
Whenever I need to cache something, I like to use the Proxy pattern. Doing it with this pattern offers separation of concerns: your original object can be concerned with lazy loading, while your proxy (or guardian) object is responsible for validating the cache.
In detail:
Define an object CountryList class which is thread-safe, preferably using synchronization blocks or other semaphore locks.
Extract this class's interface into a CountryQueryable interface.
Define another object, CountryListProxy, that implements the CountryQueryable.
Only allow the CountryListProxy to be instantiated, and only allow it to be referenced through its interface.
From here, you can insert your cache-invalidation strategy into the proxy object. Save the time of the last load and, upon the next request for the data, compare the current time to the cache time. Define a tolerance level where, if too much time has passed, the data is reloaded.
As far as Lazy Load, refer here.
Now for some good down-home sample code:
public interface CountryQueryable {
public void operationA();
public String operationB();
}
public class CountryList implements CountryQueryable {
private boolean loaded;
public CountryList() {
loaded = false;
}
//This particular operation might be able to function without
//the extra loading.
@Override
public void operationA() {
//Do whatever.
}
//This operation may need to load the extra stuff.
@Override
public String operationB() {
if (!loaded) {
load();
loaded = true;
}
//Do whatever.
return whatever;
}
private void load() {
//Do the loading of the Lazy load here.
}
}
public class CountryListProxy implements CountryQueryable {
//In accordance with the Proxy pattern, we hide the target
//instance inside of our Proxy instance.
private CountryQueryable actualList;
//Keep track of the lazy time we cached.
private long lastCached;
//Define a tolerance time, 2000 milliseconds, before refreshing
//the cache.
private static final long TOLERANCE = 2000L;
public CountryListProxy() {
//You might even retrieve this object from a Registry.
actualList = new CountryList();
//Initialize it to something stupid.
lastCached = Long.MIN_VALUE;
}
@Override
public synchronized void operationA() {
if ((System.currentTimeMillis() - lastCached) > TOLERANCE) {
//Refresh the cache.
lastCached = System.currentTimeMillis();
} else {
//Cache is okay.
}
//Delegate to the wrapped instance.
actualList.operationA();
}
@Override
public synchronized String operationB() {
if ((System.currentTimeMillis() - lastCached) > TOLERANCE) {
//Refresh the cache.
lastCached = System.currentTimeMillis();
} else {
//Cache is okay.
}
//Delegate to the wrapped instance.
return actualList.operationB();
}
}
public class Client {
public static void main(String[] args) {
CountryQueryable queryable = new CountryListProxy();
//Do your thing.
}
}
Your needs seem pretty simple here. The use of MapMaker makes the implementation more complicated than it has to be. The whole double-checked locking idiom is tricky to get right and only works on Java 1.5+. And to be honest, it breaks one of the most important rules of programming:
Premature optimization is the root of
all evil.
The double-checked locking idiom tries to avoid the cost of synchronization in the case where the cache is already loaded. But is that overhead really causing problems? Is it worth the cost of more complex code? I say assume it is not until profiling tells you otherwise.
Here's a very simple solution that requires no 3rd-party code (ignoring the JCIP annotation). It makes the assumption that an empty list means the cache hasn't been loaded yet. It also prevents the contents of the country list from escaping to client code that could potentially modify the returned list. If this is not a concern for you, you could remove the call to Collections.unmodifiableList().
public class CountryList {
@GuardedBy("cache")
private final List<String> cache = new ArrayList<String>();
private List<String> loadCountryList() {
// HEAVY OPERATION TO LOAD DATA
}
public List<String> list() {
synchronized (cache) {
if( cache.isEmpty() ) {
cache.addAll(loadCountryList());
}
return Collections.unmodifiableList(cache);
}
}
public void invalidateCache() {
synchronized (cache) {
cache.clear();
}
}
}
I'm not sure what the map is for. When I need a lazy, cached object, I usually do it like this:
public class CountryList
{
private static List<Country> countryList;
public static synchronized List<Country> get()
{
if (countryList==null)
countryList=load();
return countryList;
}
private static List<Country> load()
{
... whatever ...
}
public static synchronized void forget()
{
countryList=null;
}
}
I think this is similar to what you're doing but a little simpler. If you have a need for the map and the ONE that you've simplified away for the question, okay.
If you want it thread-safe, you should synchronize the get and the forget.
What do you think about it? Do you see something bad about it?
Bleah - you are using a complex data structure, MapMaker, with several features (map access, concurrency-friendly access, deferred construction of values, etc.) for the sake of a single feature you are after (deferred creation of a single construction-expensive object).
While reusing code is a good goal, this approach adds overhead and complexity. In addition, it misleads future maintainers: seeing a map data structure there, they will think there is a map of keys/values when there is really only one thing (the list of countries). Simplicity, readability, and clarity are key to future maintainability.
Is there other way to do it? How can i make it better? Should i look for totally another solution in this cases?
Seems like you are after lazy-loading. Look at solutions to other SO lazy-loading questions. For example, this one covers the classic double-check approach (make sure you are using Java 1.5 or later):
How to solve the "Double-Checked Locking is Broken" Declaration in Java?
Rather than simply repeating the solution code here, I think it is useful to read the discussion about lazy loading via double-check there to grow your knowledge base. (Sorry if that comes off as pompous - just trying to teach fishing rather than hand out fish.)
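For completeness, a sketch of what the double-checked approach looks like for this cache on Java 1.5+ (the loader body is a stand-in for the heavy operation; the volatile field is what makes the idiom valid):

```java
import java.util.Arrays;
import java.util.List;

public class CountryListDcl {
    // volatile is required: it guarantees the reference write is published
    // safely, which is what fixes the classic broken double-check.
    private volatile List<String> cache;

    private List<String> loadCountryList() {
        // Stand-in for the heavy load operation.
        return Arrays.asList("Germany", "Russia", "China");
    }

    public List<String> list() {
        List<String> result = cache;       // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = cache;            // re-check under the lock
                if (result == null) {
                    cache = result = loadCountryList();
                }
            }
        }
        return result;
    }

    public void invalidateCache() {
        cache = null;                      // next list() call reloads
    }
}
```

Only the first call (and the first call after invalidation) takes the lock; subsequent reads pay just the volatile read.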
There is a library out there (from Atlassian) - one of its util classes is called LazyReference. A LazyReference is a reference to an object that can be lazily created (on first get). It is guaranteed thread-safe, and the init is also guaranteed to occur only once - if two threads call get() at the same time, one thread will compute and the other will block and wait.
see a sample code:
final LazyReference<MyObject> ref = new LazyReference<MyObject>() {
protected MyObject create() throws Exception {
// Do some useful object construction here
return new MyObject();
}
};
//thread1
MyObject myObject = ref.get();
//thread2
MyObject myObject = ref.get();
This looks OK to me (I assume MapMaker is from Google Collections?). Ideally you wouldn't need to use a Map because you don't really have keys, but as the implementation is hidden from any callers, I don't see this as a big deal.
This is way too simple to need the ComputingMap stuff. You only need a dead-simple implementation where all methods are synchronized, and you should be fine. This will obviously block the first thread hitting it (while it loads the cache), and any other thread hitting it while the first thread loads the cache (and the same again if anyone calls invalidateCache - where you also have to decide whether invalidateCache should load the cache anew or just null it out, letting the first subsequent get block), but after that all threads should go through nicely.
Use the initialization-on-demand holder idiom:
public class CountryList {
private CountryList() {}
private static class CountryListHolder {
static final List<Country> INSTANCE = loadCountryList();
}
public static List<Country> getInstance() {
return CountryListHolder.INSTANCE;
}
...
}
Follow-up to Mike's solution above. My comment didn't format as expected... :(
Watch out for synchronization issues in operationB, especially since load() is slow:
public String operationB() {
if (!loaded) {
load();
loaded = true;
}
//Do whatever.
return whatever;
}
You could fix it this way:
public String operationB() {
synchronized (this) {
if (!loaded) {
load();
loaded = true;
}
}
//Do whatever.
return whatever;
}
Make sure you ALWAYS synchronize every access to the loaded variable on the same lock.

Persistent data structures in Java

Does anyone know of a library, or at least some research, on creating and using persistent data structures in Java? I don't mean persistence as in long-term storage, but persistence in terms of immutability (see the Wikipedia entry).
I'm currently exploring different ways to model an API for persistent structures. Using builders seems to be an interesting solution:
// create persistent instance
Person p = Builder.create(Person.class)
.withName("Joe")
.withAddress(Builder.create(Address.class)
.withCity("paris")
.build())
.build();
// change persistent instance, i.e. create a new one
Person p2 = Builder.update(p).withName("Jack");
Person p3 = Builder.update(p)
.withAddress(Builder.update(p.address())
.withCity("Berlin")
.build())
.build();
But this still feels somewhat boilerplate-heavy. Any ideas?
Builders will make your code too verbose to be usable. In practice, almost all immutable data structures I've seen pass in state through the constructor. For what it's worth, here is a nice series of posts describing immutable data structures in C# (which should convert readily to Java):
Part 1: Kinds of Immutability
Part 2: Simple Immutable Stack
Part 3: Covariant Immutable Stack
Part 4: Immutable Queue
Part 5: Lolz! (included for completeness)
Part 6: Simple Binary Tree
Part 7: More on Binary Trees
Part 8: Even More on Binary Trees
Part 9: AVL Tree Implementation
Part 10: Double-ended Queue
Part 11: Working Double-ended Queue Implementation
C# and Java are extremely verbose, so the code in these articles is quite scary. I recommend learning OCaml, F#, or Scala and familiarizing yourself with immutability with those languages. Once you master the technique, you'll be able to apply the same coding style to Java much more easily.
I guess the obvious choices are:
o Switch to a transient data structure (a builder) for the update. This is quite normal; StringBuilder for String manipulation, for example. As in your example:
Person p3 =
Builder.update(p)
.withAddress(
Builder.update(p.address())
.withCity("Berlin")
.build()
)
.build();
o Always use persistent structures. Although there appears to be lots of copying, you should actually be sharing almost all state, so it is nowhere near as bad as it looks.
final Person p3 = p
.withAddress(
p.address().withCity("Berlin")
);
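A minimal sketch of what such always-persistent classes might look like (class and method names are hypothetical): every field is final, and every withX method returns a copy that shares all unchanged state with the original:

```java
// All fields final; "updates" return new instances sharing unchanged state
final class Address {
    final String city;
    Address(String city) { this.city = city; }
    Address withCity(String city) { return new Address(city); }
}

final class Person {
    final String name;
    final Address address;
    Person(String name, Address address) {
        this.name = name;
        this.address = address;
    }
    Person withName(String name) { return new Person(name, address); }
    Person withAddress(Address address) { return new Person(name, address); }
}

public class PersistentDemo {
    public static void main(String[] args) {
        Person p = new Person("Joe", new Address("Paris"));
        Person p3 = p.withAddress(p.address.withCity("Berlin"));
        System.out.println(p.address.city);    // Paris: the original is untouched
        System.out.println(p3.address.city);   // Berlin
        System.out.println(p3.name == p.name); // true: unchanged state is shared
    }
}
```

Only the path from the root to the changed field is copied; everything else (here, the name) is shared by reference.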
o Explode the data structure into lots of variables and recombine with one huge and confusing constructor.
final Person p3 = Person.of(
p.name(),
Address.of(
p.house(), p.street(), "Berlin", p.country()
),
p.x(),
p.y(),
p.z()
);
o Use call back interfaces to provide the new data. Even more boilerplate.
final Person p3 = Person.of(new PersonInfo() {
public String name () { return p.name(); }
public Address address() {
return Address.of(new AddressInfo() {
private final Address a = p.address();
public String house () { return a.house() ; }
public String street () { return a.street() ; }
public String city () { return "Berlin" ; }
public String country() { return a.country(); }
});
}
public Xxx x() { return p.x(); }
public Yyy y() { return p.y(); }
public Zzz z() { return p.z(); }
});
o Use nasty hacks to make fields transiently available to code.
final Person p3 = new PersonExploder(p) {{
a = new AddressExploder(a) {{
city = "Berlin";
}}.get();
}}.get();
(Funnily enough, I had just put down a copy of Purely Functional Data Structures by Chris Okasaki.)
Have a look at Functional Java. Currently provided persistent datastructures include:
Singly-linked list (fj.data.List)
Lazy singly-linked list (fj.data.Stream)
Nonempty list (fj.data.NonEmptyList)
Optional value (a container of length 0 or 1) (fj.data.Option)
Set (fj.data.Set)
Multi-way tree (a.k.a. rose tree) (fj.data.Tree)
Immutable map (fj.data.TreeMap)
Products (tuples) of arity 1-8 (fj.P1..P8)
Vectors of arity 2-8 (fj.data.vector.V2..V8)
Pointed list (fj.data.Zipper)
Pointed tree (fj.data.TreeZipper)
Type-safe, generic heterogeneous list (fj.data.hlist.HList)
Immutable arrays (fj.data.Array)
Disjoint union datatype (fj.data.Either)
A number of usage examples are provided with the binary distribution. The source is available under a BSD license from Google Code.
I implemented a few persistent data structures in Java. All open source (GPL) on Google code for anyone who is interested:
http://code.google.com/p/mikeralib/source/browse/#svn/trunk/Mikera/src/mikera/persistent
The main ones I have so far are:
Persistent mutable test object
Persistent hash maps
Persistent vectors/lists
Persistent sets (including a specialised persistent set of ints)
Here is a very simple attempt using a dynamic proxy:
class ImmutableBuilder {
static <T> T of(Immutable immutable) {
Class<?> targetClass = immutable.getTargetClass();
return (T) Proxy.newProxyInstance(targetClass.getClassLoader(),
new Class<?>[]{targetClass},
immutable);
}
public static <T> T of(Class<T> aClass) {
return of(new Immutable(aClass, new HashMap<String, Object>()));
}
}
class Immutable implements InvocationHandler {
private final Class<?> targetClass;
private final Map<String, Object> fields;
public Immutable(Class<?> aTargetClass, Map<String, Object> immutableFields) {
targetClass = aTargetClass;
fields = immutableFields;
}
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
if (method.getName().equals("toString")) {
// XXX: toString() result can be cached
return fields.toString();
}
if (method.getName().equals("hashCode")) {
// XXX: hashCode() result can be cached
return fields.hashCode();
}
// XXX: naming policy here
String fieldName = method.getName();
if (method.getReturnType().equals(targetClass)) {
Map<String, Object> newFields = new HashMap<String, Object>(fields);
newFields.put(fieldName, args[0]);
return ImmutableBuilder.of(new Immutable(targetClass, newFields));
} else {
return fields.get(fieldName);
}
}
public Class<?> getTargetClass() {
return targetClass;
}
}
usage:
interface Person {
String name();
Person name(String name);
int age();
Person age(int age);
}
public class Main {
public static void main(String[] args) {
Person mark = ImmutableBuilder.of(Person.class).name("mark").age(32);
Person john = mark.name("john").age(24);
System.out.println(mark);
System.out.println(john);
}
}
Directions to grow this:
naming policy (getName, withName, name)
caching the toString() and hashCode() results
an equals() implementation (should be straightforward, although not implemented here)
Hope it helps :)
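For the unimplemented equals() mentioned above, one approach (a standalone sketch, not part of the original code; the Point interface is just for illustration) is to compare the backing field maps of two proxies inside invoke():

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Proxy-backed immutable whose equals() compares the underlying field maps
class FieldMapHandler implements InvocationHandler {
    final Map<String, Object> fields;
    FieldMapHandler(Map<String, Object> fields) { this.fields = fields; }

    public Object invoke(Object proxy, Method method, Object[] args) {
        String name = method.getName();
        if (name.equals("equals")) {
            if (args[0] == null || !Proxy.isProxyClass(args[0].getClass())) {
                return false;
            }
            InvocationHandler h = Proxy.getInvocationHandler(args[0]);
            return h instanceof FieldMapHandler
                    && fields.equals(((FieldMapHandler) h).fields);
        }
        if (name.equals("hashCode")) { return fields.hashCode(); }
        if (name.equals("toString")) { return fields.toString(); }
        return fields.get(name); // accessor: look the value up by method name
    }
}

interface Point { int x(); int y(); }

public class EqualsDemo {
    static Point point(int x, int y) {
        Map<String, Object> m = new HashMap<String, Object>();
        m.put("x", x);
        m.put("y", y);
        return (Point) Proxy.newProxyInstance(Point.class.getClassLoader(),
                new Class<?>[]{Point.class}, new FieldMapHandler(m));
    }

    public static void main(String[] args) {
        System.out.println(point(1, 2).equals(point(1, 2))); // true
        System.out.println(point(1, 2).equals(point(3, 4))); // false
    }
}
```

Two proxies are equal exactly when their field maps are equal, which matches the value semantics of the immutable objects they represent.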
It is very difficult, if not impossible, to make things immutable that were not designed to be.
If you can design from the ground up:
use only final fields
do not reference non-immutable objects
Do you want immutability:
so external code cannot change the data?
so once set a value cannot be changed?
In both cases there are easier ways to accomplish the desired result.
Stopping external code from changing the data is easy with interfaces:
public interface Person {
String getName();
Address getAddress();
}
public interface PersonImplementor extends Person {
void setName(String name);
void setAddress(Address address);
}
public interface Address {
String getCity();
}
public interface AddressImplementor extends Address {
void setCity(String city);
}
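Client code would then only ever receive the read-only Person view, while internal code uses the implementor interface (a sketch with a hypothetical PersonBean; note this is a compile-time guard only, since a determined caller could still downcast):

```java
interface Person { String getName(); }
interface PersonImplementor extends Person { void setName(String name); }

class PersonBean implements PersonImplementor {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

public class ViewDemo {
    // The factory mutates via PersonImplementor, then exposes only Person
    static Person create(String name) {
        PersonImplementor p = new PersonBean();
        p.setName(name);
        return p;
    }

    public static void main(String[] args) {
        Person p = create("Joe");
        System.out.println(p.getName()); // Joe; the Person view offers no setters
    }
}
```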
Then stopping changes to a value once it has been set is also "easy" using java.util.concurrent.atomic.AtomicReference (although Hibernate or another persistence layer may need to be adapted). Note that compareAndSet compares references, so try compareAndSet(null, value) first and fall back to an equality check:
class PersonImpl implements PersonImplementor {
private final AtomicReference<String> name = new AtomicReference<String>();
private final AtomicReference<Address> address = new AtomicReference<Address>();
public void setName(String name) {
if ( !this.name.compareAndSet(null, name)
&& !name.equals(this.name.get())) {
throw new IllegalStateException("name already set to "+this.name.get()+", cannot set to "+name);
}
}
// .. similar code follows....
}
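The same write-once behavior can be factored into a small reusable holder (a sketch; the compareAndSet(null, value) attempt must come first, with an equals() fallback so re-setting the same value stays idempotent):

```java
import java.util.concurrent.atomic.AtomicReference;

// Reusable write-once holder built on AtomicReference: the first set() wins,
// re-setting an equal value is allowed, a different value is rejected.
final class WriteOnce<T> {
    private final AtomicReference<T> ref = new AtomicReference<T>();

    public void set(T value) {
        if (!ref.compareAndSet(null, value) && !value.equals(ref.get())) {
            throw new IllegalStateException("already set to " + ref.get());
        }
    }

    public T get() { return ref.get(); }
}

public class WriteOnceDemo {
    public static void main(String[] args) {
        WriteOnce<String> name = new WriteOnce<String>();
        name.set("Joe");
        name.set("Joe");                // idempotent re-set is allowed
        System.out.println(name.get()); // Joe
        try {
            name.set("Jack");           // different value: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```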
But why do you need anything more than just interfaces to accomplish the task?
Google Guava now hosts a variety of immutable/persistent data structures.
