How do I make a Java class immutable in Clojure?

How do I make a Java class immutable in Clojure? - java

I'd like to wrap java's PriorityQueue class in clojure for use in another part of my program. What I'm trying to figure out is if there is any way to do this in a lispy manner and make the priority queue immutable. Are there any good ways to do this, or am I just going to be better off using the PriorityQueue as a mutable data structure?

I don't think there is a simple way to wrap a mutable data structure as an immutable one. Immutable data structures become efficient when the new version can share data with the old version in clever ways, and I can't really see how this can be done without access to the internals of PriorityQueue.
If you really want a persistent priority queue this thread might be interesting. Those seems to have linear-time inserts though, so if that is an issue maybe you have to look for another implementation.
Edit: On second thought, a simple implementation of a persistent priority queue is just to store the (prio, value)-pairs in a sorted set. Something like this:
(defn make-pqueue []
(sorted-set))
(defn pqueue-add [pq x prio]
(conj pq [prio x]))
(defn pqueue-peek [pq]
(first pq))
(defn pqueue-pop [pq]
(let [top (first pq)]
(disj pq top)))
Of course, the code above is pretty limited (no multiple entries, for instance) but it illustrates the idea.

You can't automagically make mutable class immutable. One can always call java class directly and mutate it.
To force immutability you can either implement it in clojure, or extend java class and throw exceptions in all mutable method implementations.

Related

Vector vs SynchronizedList performance

While reading Oracle tutorial on collections implementations, i found the following sentence :
If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList
source : List Implementations
but when searching for difference between them, many people discourage the using of Vector and should be replaced by SynchronizedList when the synchronization is needed.
So which side has right to be followed ?

When you use Collections.synchronizedList(new ArrayList<>()) you are separating the two implementation details. It’s clear, how you could change the underlying storage model from “array based” to, e.g. “linked nodes”, by simply replacing new ArrayList with new LinkedList without changing the synchronized decoration.
The statement that Vector “will be slightly faster” seems to be based on the fact that its use does not bear a delegation between the wrapper and the underlying storage, but deriving statements about the performance from that was even questionable by the time, when ArrayList and the synchronizedList wrapper were introduced.
It should be noted that when you are really concerned about the performance of a list accessed by multiple threads, you will use neither of these two alternatives. The idea of making a storage thread safe by making all access methods synchronized is flawed right from the start. Every operation that involves multiple access to the list, e.g. simple constructs like if(!list.contains(o)) list.add(o); or iterating over the list or even a simple Collections.swap(list, i, j); require additional manual synchronization to work correctly in a multi-threaded setup.
If you think it over, you will realize that most operations of a real life application consist of multiple access and therefore will require careful manual locking and the fact that every low level access method synchronizes additionally can not only take away performance, it’s also a disguise pretending a safety that isn’t there.

Vector is an old API, of course. No questions there.
For speed though, it might be only because the synchronized list involves extra method calls to reach and return the data since it is a "wrapper" on top of a list after all. That's all there is to it, imho.

How can I be sure that my class is immutable

I am required to create a class very similar to String however instead of storing an array of characters, the object must store an array of bytes because I will be dealing with binary data, not strings.
I am using HashMaps within my application. I am therefore keen to make my custom byteArray class immutable since immutable objects perform faster searches in hashmaps. (I would like a source for this fact please)
I'm pretty sure my class is immutable, but its still performing poorly vs string in hashmap searches. How can I be sure it is immutable?

The most important thing is to copy the bytes into your array. If you have
this.bytes = passedInArray;
The caller can modify passedInArray and hence modify this.bytes. You must do
this.bytes = Arrays.copyOf(passedInArray, passedInArray.length);
(Or similar, clone is o.k. too). If this class will be mainly used as a key in Maps, I'd calculate the hashcode immediately (in the constructor), simpler than doing it lazily.
Implement the obvious equals() and I think you are done.

Your question is "How can I be sure that my class is immutable?" I'm not sure that's what you mean to ask, but the way to make your class immutable is listed by Josh Bloch in Effective Java, 2nd Ed. in item 15 here, and which I'll summarize in this answer:
Don't provide any mutator methods (methods that change the object's state, usually called "setters").
Ensure the class can't be extended. Generally, make the class final. This keeps others from subclassing it and modifying protected fields.
Make all fields final, so you can't change them.
Make all fields private, so others can't change them.
"Ensure exclusive access to mutable components." That is, if something else points to the data and therefore can alter it, make a defensive copy (as #user949300 pointed out).
Note that immutable objects don't automatically yield a the big performance boost. The boost from immutable objects would be from not having to lock or copy the object, and from reusing it instead of creating a new one. I believe the searches in HashMap use the class' hashCode() method, and the lookup should be O(c), or constant-time and fast. If you are having performance issues, you may need to look at if there's slowness in your hashCode() method (unlikely), or issues elsewhere.
One possibility is if you have implemented hashCode() poorly (or not at all) and this is causing a large number of collisions in your HashMap -- that is, calling that method with different instances of your class returns mostly similar or same values -- then the instances will be stored in a linked list at the location specified by hashCode(). Traversing this list will convert your efficiency from constant-time to linear-time, making performance much worse.

since immutable objects perform faster searches in hashmaps. (I would like a source for this fact please)
No, this isn't true. Performance as a hashmap key will be determined by the runtime, and collision avoidance, of hashCode.
I'm pretty sure my class is immutable, but its still performing poorly vs string in hashmap searches. How can I be sure it is immutable?
Your problem is more likely to be a poor choice of hashCode implementation. Consider basing your implementation around Arrays.hashCode.
(Your question ArrayList<Byte> vs String in Java suggests you're trying to tune a specific implementation; the advice there to use byte[] is good.)

Java HashMap - deep copy

I am just trying to find out the best solution how to make a deep copy of HashMap. There are no objects in this map which implement Cloneable. I would like to find better solution than serialization and deserialization.

Take a look at Deep Cloning, on Google Code you can find a library. You can read it on https://github.com/kostaskougios/cloning.
How it works is easy. This can clone any object, and the object doesnt have to implement any interfaces, like serializable.
Cloner cloner = new Cloner();
MyClass clone = cloner.deepClone(o);
// clone is a deep-clone of o
Be aware though: this may clone thousands of objects (if the cloned object has that many references). Also, copying Files or Streams may crash the JVM.
You can, however, ignore certain instances of classes, like streams et cetera. It's worth checking this library and its source out.

I don't think it can be implemented in a generic way.
If you have the chance to simply implement clone, I'd go that way.
A bit more complex is creating a type map, where you look up some kind of clone implmentation class based on the class of each object
When the objects might form a Directed Acyclic Graph, I'd in general keep a Map from the original to the clone of every object I've ever seen, and check if I've already made it
When you have a general graph, the problem gets really nasty. You might have strange constraints of the object creation order, it might even be impossible when you have final fields.
For now, I'd propose to re-write your question in a less general way

This is not easy, we are using some sort of workaround:
1) convert the map to json string. (for example, using Google Gson)
2) convert the json string back to map.
Do note that there is performance issue, but this is kind of easiest way.

Java: Should I always replace Arrays for ArrayLists?

Well, it seems to me ArrayLists make it easier to expand the code later on both because they can grow and because they make using Generics easier. However, for multidimensional arrays, I find the readability of the code is better with standard arrays.
Anyway, are there some guidelines on when to use one or the other? For example, I'm about to return a table from a function (int[][]), but I was wondering if it wouldn't be better to return a List<List<Integer>> or a List<int[]>.

Unless you have a strong reason otherwise, I'd recommend using Lists over arrays.
There are some specific cases where you will want to use an array (e.g. when you are implementing your own data structures, or when you are addressing a very specific performance requirement that you have profiled and identified as a bottleneck) but for general purposes Lists are more convenient and will offer you more flexibility in how you use them.
Where you are able to, I'd also recommend programming to the abstraction (List) rather than the concrete type (ArrayList). Again, this offers you flexibility if you decide to chenge the implementation details in the future.
To address your readability point: if you have a complex structure (e.g. ArrayList of HashMaps of ArrayLists) then consider either encapsulating this complexity within a class and/or creating some very clearly named functions to manipulate the structure.

Choose a data structure implementation and interface based on primary usage:
Random Access: use List for variable type and ArrayList under the hood
Appending: use Collection for variable type and LinkedList under the hood
Loop and process: use Iterable and see the above for use under the hood based on producer code
Use the most abstract interface possible when handing around data. That said don't use Collection when you need random access. List has get(int) which is very useful when random access is needed.
Typed collections like List<String> make up for the syntactical convenience of arrays.
Don't use Arrays unless you have a qualified performance expert analyze and recommend them. Even then you should get a second opinion. Arrays are generally a premature optimization and should be avoided.
Generally speaking you are far better off using an interface rather than a concrete type. The concrete type makes it hard to rework the internals of the function in question. For example if you return int[][] you have to do all of the computation upfront. If you return List> you can lazily do computation during iteration (or even concurrently in the background) if it is beneficial.

The List is more powerful:
You can resize the list after it has been created.
You can create a read-only view onto the data.
It can be easily combined with other collections, like Set or Map.
The array works on a lower level:
Its content can always be changed.
Its length can never be changed.
It uses less memory.
You can have arrays of primitive data types.

I wanted to point out that Lists can hold the wrappers for the primitive data types that would otherwise need to be stored in an array. (ie a class Double that has only one field: a double) The newer versions of Java convert to and from these wrappers implicitly, at least most of the time, so the ability to put primitives in your Lists should not be a consideration for the vast majority of use cases.
For completeness: the only time that I have seen Java fail to implicitly convert from a primitive wrapper was when those wrappers were composed in a higher order structure: It could not convert a Double[] into a double[].

It mostly comes down to flexibility/ease of use versus efficiency. If you don't know how many elements will be needed in advance, or if you need to insert in the middle, ArrayLists are a better choice. They use Arrays under the hood, I believe, so you'll want to consider using the ensureCapacity method for performance. Arrays are preferred if you have a fixed size in advance and won't need inserts, etc.

Why are variables declared with their interface name in Java? [duplicate]

This question already has answers here:
What does it mean to "program to an interface"?
(33 answers)
Closed 6 years ago.
This is a real beginner question (I'm still learning the Java basics).
I can (sort of) understand why methods would return a List<String> rather than an ArrayList<String>, or why they would accept a List parameter rather than an ArrayList. If it makes no difference to the method (i.e., if no special methods from ArrayList are required), this would make the method more flexible, and easier to use for callers. The same thing goes for other collection types, like Set or Map.
What I don't understand: it appears to be common practice to create local variables like this:
List<String> list = new ArrayList<String>();
While this form is less frequent:
ArrayList<String> list = new ArrayList<String>();
What's the advantage here?
All I can see is a minor disadvantage: a separate "import" line for java.util.List has to be added. Technically, "import java.util.*" could be used, but I don't see that very often either, probably because the "import" lines are added automatically by some IDE.

When you read
List<String> list = new ArrayList<String>();
you get the idea that all you care about is being a List<String> and you put less emphasis on the actual implementation. Also, you restrict yourself to members declared by List<String> and not the particular implementation. You don't care if your data is stored in a linear array or some fancy data structure, as long as it looks like a List<String>.
On the other hand, reading the second line gives you the idea that the code cares about the variable being ArrayList<String>. By writing this, you are implicitly saying (to future readers) that you shouldn't blindly change actual object type because the rest of the code relies on the fact that it is really an ArrayList<String>.

Using the interface allows you to quickly change the underlying implementation of the List/Map/Set/etc.
It's not about saving keystrokes, it's about changing implementation quickly. Ideally, you shouldn't be exposing the underlying specific methods of the implementation and just use the interface required.

I would suggest thinking about this from the other end around. Usually you want a List or a Set or any other Collection type - and you really do not care in your code how exactly this is implemented. Hence your code just works with a List and do whatever it needs to do (also phrased as "always code to interfaces").
When you create the List, you need to decide what actual implementation you want. For most purposes ArrayList is "good enough", but your code really doesn't care. By sticking to using the interface you convey this to the future reader.
For instance I have a habit of having debug code in my main method which dumps the system properties to System.out - it is usually much nicer to have them sorted. The easiest way is to simply let "Map map = new TreeMap(properties);" and THEN iterate through them, as TreeMap returns the keys sorted.
When you learn more about Java, you will also see that interfaces are very helpful in testing and mocking, since you can create objects with behaviour specified at runtime conforming to a given interface. An advanced (but simple) example can be seen at http://www.exampledepot.com/egs/java.lang.reflect/ProxyClass.html

if later you want to change implementation of the list and use for example LinkedList(maybe for better performance) you dont have to change the whole code(and API if its library). if order doesnt matter you should return Collection so later on you can easily change it to Set if you would need items to be sorted.

The best explanation I can come up with (because I don't program in Java as frequently as in other languages) is that it make it easier to change the "back-end" list type while maintaining the same code/interface everything else is relying on. If you declare it as a more specific type first, then later decide you want a different kind... if something happens to use an ArrayList-specific method, that's extra work.
Of course, if you actually need ArrayList-specific behavior, you'd go with the specific variable type instead.

The point is to identify the behavior you want/need and then use the interface that provides that behavior. The is the type for your variable. Then, use the implementation that meets your other needs - efficiency, etc. This is what you create with "new". This duality is one of the major ideas behind OOD. The issue is not particularly significant when you are dealing with local variables, but it rarely hurts to follow good coding practices all the time.

Basically this comes from people who have to run large projects, possibly other reasons - you hear it all the time. Why, I don't actually know. If you have need of an array list, or Hash Map or Hash Set or whatever else I see no point in eliminating methods by casting to an interface.
Let us say for example, recently I learned how to use and implemented HashSet as a principle data structure. Suppose, for whatever reason, I went to work on a team. Would not that person need to know that the data was keyed on hashing approaches rather than being ordered by some basis? The back-end approach noted by Twisol works in C/C++ where you can expose the headers and sell a library thus, if someone knows how to do that in Java I would imagine they would use JNI - at which point is seems simpler to me to use C/C++ where you can expose the headers and build libs using established tools for that purpose.
By the time you can get someone who can install a jar file in the extensions dir it would seem to me that entity could be jus short steps away - I dropped several crypto libs in the extensions directory, that was handy, but I would really like to see a clear, concise basis elucidated. I imagine they do that all the time.
At this point it sounds to me like classic obfuscation, but beware: You have some coding to do before the issue is of consequence.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.