detect circular reference in an object - java

Suppose you have a Java object. Is it possible to detect whether circular references exist inside that object?
I would like to hear if there is a library to deal with this problem.
Thanks in advance.

Beware, this is not a trivial task, but you already know this, right? ;)
Java provides IdentityHashMap, which compares keys by reference (==) rather than by equals() and is designed to be used in such cases.

Conceptually simple, but can be quite complex to implement.
First off, a lot depends on what type of objects you're dealing with. If only a small number of object classes, and you "own" the classes and can modify them to add "search yourself" code, then it becomes much easier:
Add an interface to each class and implement the "search yourself" method in each class. The method receives a list of objects, and returns a return code. The method compares its own address to each object on the list, returning true (ie, loop found) if one matches. Then (if no match) it adds its own address to the list and calls, in turn, the "search yourself" method of each object reference it contains. If any of these calls results in a true return code, that is returned, otherwise false is returned. (This is a "depth-first, recursive" search.)
If you don't "own" the classes then you must use reflection to implement essentially the above algorithm without modifying the classes.
There are other search algorithms that can be used -- "breadth-first", and various non-recursive versions of depth-first -- but they all represent trade-offs of one sort or another between heap storage, stack storage, and performance.
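A minimal sketch of the "search yourself" idea, assuming you own the classes. The SelfSearching interface, the GraphNode class and the method names are hypothetical and exist only for illustration. The set is backed by an IdentityHashMap, so objects are compared by reference rather than by equals(). Unlike the plain list described above, each object is removed from the set again once its references have been searched, so an object that is merely reachable twice (shared, but not circular) is not reported as a loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical interface added to every class you own.
interface SelfSearching {
    // Returns true if a loop is reachable from this object.
    boolean searchYourself(Set<Object> onPath);
}

// Hypothetical example class that holds references to other objects.
final class GraphNode implements SelfSearching {
    final List<GraphNode> references = new ArrayList<>();

    @Override
    public boolean searchYourself(Set<Object> onPath) {
        // add() returns false when this very instance is already on the current path
        if (!onPath.add(this)) {
            return true;
        }
        for (GraphNode child : references) {
            if (child != null && child.searchYourself(onPath)) {
                return true;
            }
        }
        onPath.remove(this); // backtrack: this object is no longer on the path
        return false;
    }

    static boolean hasLoop(GraphNode start) {
        // Identity-based set: membership is decided with ==, not equals()
        Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        return start.searchYourself(onPath);
    }
}
```

Calling GraphNode.hasLoop(root) starts the depth-first search. Without a second "already finished" set the search may revisit shared subgraphs many times, which is one of the storage versus performance trade-offs mentioned above.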

A bit of a lateral answer, but how about using net.sf.json.JSONObject.fromObject(...) which checks for circular references and throws an exception if any are found. Also, you can configure the library to handle circular references differently if necessary. You would have to write a getter for those class members that exist in the cyclical relationship, since that is what JSONObject uses to create the JSON.

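It seems to be a simple task to code. Use the Java Reflection API to crawl the graph of Java objects and collect the visited objects in a set. If you visit an object that is already in the set, the graph contains either a cycle or an object that is referenced from more than one place; to tell the two apart, keep only the objects on the current DFS path in the set. To crawl, use a BFS or DFS algorithm.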
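A rough sketch of such a reflective crawl, under some loudly stated assumptions: it only descends into objects whose class name does not start with "java." (so collections, strings and arrays are treated as leaves), and it needs reflective access to the fields of your own classes. Employee, Department and ReflectiveCycleFinder are hypothetical names used for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo classes that can refer to each other.
final class Employee { Department department; }
final class Department { Employee manager; }

final class ReflectiveCycleFinder {

    static boolean hasCycle(Object root) {
        Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        try {
            return visit(root, onPath);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("field not accessible", e);
        }
    }

    // Assumption: stay out of JDK classes and arrays to keep the sketch short.
    private static boolean isCrawlable(Class<?> type) {
        return !type.isArray() && !type.getName().startsWith("java.");
    }

    private static boolean visit(Object obj, Set<Object> onPath) throws IllegalAccessException {
        if (obj == null || !isCrawlable(obj.getClass())) {
            return false;
        }
        if (!onPath.add(obj)) {
            return true; // same instance reached again along the current path
        }
        // Walk the class and its user-defined superclasses to see inherited fields too
        for (Class<?> c = obj.getClass(); c != null && isCrawlable(c); c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
                    continue;
                }
                field.setAccessible(true);
                if (visit(field.get(obj), onPath)) {
                    return true;
                }
            }
        }
        onPath.remove(obj); // backtrack
        return false;
    }
}
```

A fuller version would also look inside arrays, collections and maps; this one deliberately does not, so cycles that pass through a java.util collection go undetected.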

You don't need any library. It's a simple breadth-first search algorithm plus the reflection API. Of course, you can try to find a library implementing this algorithm.

Related

how can I return an object by giving only an interface [duplicate]

When programming in Java I practically always, just out of habit, write something like this:
public List<String> foo() {
    return new ArrayList<String>();
}
Most of the time without even thinking about it. Now, the question is: should I always specify the interface as the return type? Or is it advisable to use the actual implementation of the interface, and if so, under what circumstances?
It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList:
List<String> bar = foo();
List<String> myList = bar instanceof LinkedList ? new ArrayList<String>(bar) : bar;
but that just seems horrible and my coworkers would probably lynch me in the cafeteria. And rightfully so.
What do you guys think? What are your guidelines, when do you tend towards the abstract solution, and when do you reveal details of your implementation for potential performance gains?
Return the appropriate interface to hide implementation details. Your clients should only care about what your object offers, not how you implemented it. If you start with a private ArrayList, and decide later on that something else (e.g., LinkedList, skip list, etc.) is more appropriate, you can change the implementation without affecting clients if you return the interface. The moment you return a concrete type the opportunity is lost.
"For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList."
As everybody else has mentioned, you shouldn't care how the library has implemented the functionality; that reduces coupling and increases the maintainability of the library.
If you, as a library client, can demonstrate that the implementation is performing badly for your use case, you can then contact the person in charge and discuss the best path to follow (a new method for this case or just changing the implementation).
That said, your example reeks of premature optimization.
If the method is or can be critical, it might mention the implementation details in the documentation.
Without being able to justify it with reams of CS quotes (I'm self taught), I've always gone by the mantra of "Accept the least derived, return the most derived," when designing classes and it has stood me well over the years.
I guess what that means in terms of interface versus concrete return is that if you are trying to reduce dependencies and/or decouple, returning the interface is generally more useful. However, if the concrete class implements more than that interface, it is usually more useful to the callers of your method to get the concrete class back (i.e. the "most derived") rather than arbitrarily restrict them to a subset of that returned object's functionality - unless you actually need to restrict them. Then again, you could also just increase the coverage of the interface. Needless restrictions like this I compare to thoughtless sealing of classes; you never know. Just to talk a bit about the former part of that mantra (for other readers), accepting the least derived also gives maximum flexibility for callers of your method.
-Oisin
Sorry to disagree, but I think the basic rule is as follows:
For input arguments use the most generic.
For output values, the most specific.
So, in this case you want to declare the implementation as:
public ArrayList<String> foo() {
return new ArrayList<String>();
}
Rationale:
The input case is already known and explained by everyone: use the interface, period. However, the output case can look counter-intuitive.
You want to return the implementation because you want the client to have the most information about what it is receiving. In this case, more knowledge is more power.
Example 1: the client wants to get the 5th element:
return Collection: must iterate up to the 5th element
return List: list.get(4)
Example 2: the client wants to remove the 5th element:
return List: must create a new list without the specified element (list.remove() is optional).
return ArrayList: arrayList.remove(4)
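To make Example 1 concrete, here is a small sketch of what the caller has to write in each case. The class and method names are hypothetical, and both methods assume the input holds at least five elements.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

final class FifthElement {
    // With only a Collection, the caller has to walk an iterator to the 5th element.
    static <T> T fromCollection(Collection<T> items) {
        Iterator<T> it = items.iterator();
        for (int i = 0; i < 4; i++) {
            it.next(); // skip the first four elements
        }
        return it.next();
    }

    // With a List, positional access is part of the declared type.
    static <T> T fromList(List<T> items) {
        return items.get(4);
    }
}
```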
So it's a big truth that using interfaces is great because it promotes reusability, reduces coupling, improves maintainability and makes people happy ... but only when used as input.
So, again, the rule can be stated as:
Be flexible for what you offer.
Be informative with what you deliver.
So, next time, please return the implementation.
In OO programming, we want to encapsulate the data as much as possible. Hide the actual implementation as much as possible, abstracting the types as high as possible.
In this context, I would answer: only return what is meaningful. Does it make sense at all for the return value to be the concrete class? In your example, ask yourself: will anyone use a LinkedList-specific method on the return value of foo?
If no, just use the higher-level interface. It's much more flexible, and allows you to change the backend.
If yes, ask yourself: can't I refactor my code to return the higher-level interface? :)
The more abstract your code is, the fewer changes you are required to make when changing a backend. It's as simple as that.
If, on the other hand, you end up casting the return values to the concrete class, that's a strong sign that you should probably return the concrete class instead. Your users/teammates should not have to know about more or less implicit contracts: if you need to use the concrete methods, just return the concrete class, for clarity.
In a nutshell: code abstract, but explicitly :)
In general, for a public facing interface such as APIs, returning the interface (such as List) over the concrete implementation (such as ArrayList) would be better.
The use of an ArrayList or LinkedList is an implementation detail of the library that should be considered for the most common use case of that library. And of course, internally, having private methods handing off LinkedLists wouldn't necessarily be a bad thing, if it provides facilities that would make the processing easier.
There is no reason that a concrete class shouldn't be used in the implementation, unless there is a good reason to believe that some other List class would be used later on. But then again, changing the implementation details shouldn't be as painful as long as the public facing portion is well-designed.
The library itself should be a black box to its consumers, so they don't really have to worry about what's going on internally. That also means that the library should be designed so that it can be used in the way it is intended.
It doesn't matter all that much whether an API method returns an interface or a concrete class; despite what everyone here says, you almost never change the implementation class once the code is written.
What's far more important: always use minimum-scope interfaces for your method parameters! That way, clients have maximal freedom and can use classes your code doesn't even know about.
When an API method returns ArrayList, I have absolutely no qualms with that, but when it demands an ArrayList (or, all too common, Vector) parameter, I consider hunting down the programmer and hurting him, because it means that I can't use Arrays.asList(), Collections.singletonList() or Collections.EMPTY_LIST.
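A short illustration of the parameter point, with hypothetical class and method names. The method that takes List<String> accepts every list the caller can come up with; the one that takes ArrayList<String> rejects the convenience lists at compile time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class WordStats {
    // Minimum-scope parameter type: any List implementation is accepted.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    // Overly specific parameter type: only an ArrayList compiles.
    static int totalLengthStrict(ArrayList<String> words) {
        return totalLength(words);
    }

    static int demo() {
        // totalLengthStrict(Arrays.asList("a", "bb")) would not compile
        return totalLength(Arrays.asList("a", "bb"))
                + totalLength(Collections.singletonList("ccc"))
                + totalLength(Collections.<String>emptyList());
    }
}
```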
As a rule, I only pass back internal implementations if I am in some private, inner workings of a library, and even so only sparingly. For everything that is public and likely to be called from the outside of my module I use interfaces, and also the Factory pattern.
Using interfaces in such a way has proven to be a very reliable way to write reusable code.
The main question has been answered already: you should always use the interface. I would, however, just like to comment on this part:
"It is obvious that using the interface has a lot of advantages (that's why it's there). In most cases it doesn't really matter what concrete implementation is used by a library function. But maybe there are cases where it does matter. For instance, if I know that I will primarily access the data in the list randomly, a LinkedList would be bad. But if my library function only returns the interface, I simply don't know. To be on the safe side I might even need to copy the list explicitly over to an ArrayList."
If you are returning a data structure that you know has poor random access performance -- O(n) and typically a LOT of data -- there are other interfaces you should be specifying instead of List, like Iterable so that anyone using the library will be fully aware that only sequential access is available.
Picking the right type to return isn't just about interface versus concrete implementation, it is also about selecting the right interface.
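As a sketch of that idea, assuming a hypothetical EventLog class that keeps its entries in a LinkedList: declaring the return type as Iterable tells callers that sequential traversal is all they should rely on.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

final class EventLog {
    // Internally a LinkedList, so get(i) would be O(n)
    private final List<String> entries =
            new LinkedList<>(Arrays.asList("start", "work", "stop"));

    // Iterable exposes traversal only; there is no get(int) on the declared type.
    Iterable<String> entries() {
        return entries;
    }
}
```

Callers simply write for (String entry : log.entries()) { ... }. Note that this only narrows the declared type; it does not by itself stop a caller from casting the result back to List.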
You use interface to abstract away from the actual implementation. The interface is basically just a blueprint for what your implementation can do.
Interfaces are good design because they allow you to change implementation details without having to fear that any of its consumers are directly affected, as long as your implementation still does what your interface says it does.
To work with interfaces you would declare the variable with the interface type and instantiate the implementation, like this:
IParser parser = new Parser();
Now IParser would be your interface, and Parser would be your implementation. Now when you work with the parser object from above, you will work against the interface (IParser), which in turn will work against your implementation (Parser).
That means that you can change the inner workings of Parser as much as you want, it will never affect code that works against your IParser parser interface.
In general, use the interface in all cases where you have no need of the functionality of the concrete class. Note that for lists, Java has added the RandomAccess marker interface, primarily to distinguish a common case where an algorithm may need to know if get(i) is constant time or not.
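For example, instead of testing for a specific class such as LinkedList, a caller can test for the java.util.RandomAccess marker. The helper below is a hypothetical sketch, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;

final class IndexedLists {
    // Returns the list itself if it advertises fast get(i), otherwise an ArrayList copy.
    static <T> List<T> forIndexedAccess(List<T> list) {
        return (list instanceof RandomAccess) ? list : new ArrayList<>(list);
    }
}
```

This keeps the calling code independent of which concrete list the library happens to return.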
For uses of code, Michael above is right that being as generic as possible in the method parameters is often even more important. This is especially true when testing such a method.
You'll find (or have found) that as you return interfaces, they permeate through your code. e.g. you return an interface from method A and you have to then pass an interface to method B.
What you're doing is programming by contract, albeit in a limited fashion.
This gives you enormous scope to change implementations under the covers (provided these new objects fulfill the existing contracts/expected behaviours).
Given all of this, you have benefits in terms of choosing your implementation, and how you can substitute behaviours (including testing - using mocking, for example). In case you hadn't guessed, I'm all in favour of this and try to reduce to (or introduce) interfaces wherever possible.

Java, best way to map objects in this specific case

This may be a duplicate, but I don't know the correct terminology to even search for what I want, so I apologise if this is and the title is not completely specific.
This is my scenario:
I have 2 different types of Objects I want to map to each other, call them ObjectA and ObjectB. These Objects are generated in a for-loop. I want each instance in an iteration to be mapped to the other. So basically, ObjectA(1) - ObjectB(1), ObjectA(2) - ObjectB(2), etc. There will be roughly 500 - 3000 entries to map.
The reason for this is some methods will be passed ObjectA and I need to get the corresponding ObjectB, and vice versa. I also can not use the initial loop index as reference, it just needs to be one of the objects.
I have tried making use of Guava HashBiMap, which works, but I don't like it for several reasons. 1) This is the only instance I will be making use of any Guava class, and I don't necessarily want to add ~500kb to my package for it. (This is for a mobile app, so trying to keep it small) and 2) I need to iterate the objects every frame, and iterating through the keySet() was giving significant memory allocation. I am sure there are ways around this and it was probably just some mistake I made, but still.. reason 1.
The current solution I have, since I know the indexes are mapped, is to just have 2 ArrayLists (or actually libgdx Arrays, but for the purpose of understanding the logic we can assume ArrayList), and I simply do:
ObjectA objectA = objectAList.get(objectBList.indexOf(objectB));
And vice versa.
Anyway, I don't like this solution either. It feels expensive and I am sure there is a much simpler and faster method; I just don't specifically know what to search for.
Thanks for the help!
Maybe this will be helpful for you. There is an interface in Apache Commons Collections called BidiMap
https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/BidiMap.html
Classes implementing this interface have the methods you are looking for, i.e. getting a key by value and vice versa, and this package weighs about 150 kb.
regards
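If adding a library is not an option, the same two-way lookup can be sketched with two plain HashMaps. This is a hypothetical helper, not the BidiMap API; it assumes every object is paired exactly once and that ObjectA and ObjectB have usable equals()/hashCode() (the defaults inherited from Object are fine).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a minimal two-way lookup built from two HashMaps.
final class Pairing<A, B> {
    private final Map<A, B> aToB = new HashMap<>();
    private final Map<B, A> bToA = new HashMap<>();

    // Registers the pair in both directions; call this once per loop iteration.
    void pair(A a, B b) {
        aToB.put(a, b);
        bToA.put(b, a);
    }

    B partnerOfA(A a) { return aToB.get(a); }

    A partnerOfB(B b) { return bToA.get(b); }
}
```

Each lookup is an expected constant-time hash lookup instead of the linear scan done by indexOf. For the per-frame iteration you can keep iterating your existing arrays and use the maps only for the partner lookup.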

Check whether a Java Object has been modified

I would like to use a clean/automatic way to check if a Java Object has been modified.
My specific problem is the following:
In my Java application, I use the XStream library to deserialize XML to Java Objects, which the user can then modify. I'd like a way to check whether these Objects in memory are at some point different from the serialized ones, so I can inform the user and ask whether they want to save the changes (i.e. serialize using XStream) or not.
My application has many Objects, and they are quite complex.
Please consider that I don't use databases in my application, so I'm not interested in solutions like using hibernate.
Two approaches:
Implement a hashcode for your objects, and compare the hashcode of the in-memory objects against the hashcode of the serialized objects to see if they've been changed. This has a low impact on your class design, but performance will go down as O(n^2) as the number of objects increases. Note that two different objects might return the same hashcode, but a good hashing implementation will make this very unlikely. If you are concerned about this, implement and use your own equals() method.
Have your objects implement the Observer pattern and have each setter method, or any other method that modifies the object, notify the observer when it's called. Performance will be better for large numbers of objects (as long as they aren't changing constantly), but it requires you to introduce Observer code into possibly lightweight classes. Java provides the java.util.Observable class and the java.util.Observer interface as utilities (both deprecated since Java 9), but you'll still need to do most of the work.
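A hand-rolled sketch of the Observer approach, avoiding the deprecated java.util.Observable. All names (ChangeListener, Customer, DirtyTracker) are hypothetical, and how the listener list should be treated during serialization is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical listener interface, named for illustration only.
interface ChangeListener {
    void changed(Object source);
}

// Hypothetical model class whose setter notifies its listeners.
final class Customer {
    private final List<ChangeListener> listeners = new ArrayList<>();
    private String name;

    void addListener(ChangeListener listener) { listeners.add(listener); }

    String getName() { return name; }

    void setName(String newName) {
        if (Objects.equals(name, newName)) {
            return; // value is unchanged, so nothing is reported
        }
        name = newName;
        for (ChangeListener listener : listeners) {
            listener.changed(this);
        }
    }
}

// Remembers whether anything it listens to has changed since the last save.
final class DirtyTracker implements ChangeListener {
    private boolean dirty;

    @Override
    public void changed(Object source) { dirty = true; }

    boolean isDirty() { return dirty; }

    void markSaved() { dirty = false; }
}
```

One tracker can be registered on every deserialized object; the "save changes?" prompt then only has to check isDirty().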
You can store a version field in the object. Whenever the object changes, it should update its version field (increment it); you can then compare the version field with the version field of the serialized object.
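A minimal sketch of the version-field idea; the Settings class and its members are hypothetical.

```java
// Hypothetical class with a change counter; names are for illustration only.
final class Settings {
    private String title;
    private long version; // incremented on every modification

    String getTitle() { return title; }

    void setTitle(String title) {
        this.title = title;
        version++; // every mutating method has to remember to do this
    }

    long getVersion() { return version; }
}
```

After loading or saving, remember settings.getVersion(); the object has been modified if the current value differs from the remembered one.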

Why are variables declared with their interface name in Java? [duplicate]

This question already has answers here:
What does it mean to "program to an interface"?
(33 answers)
Closed 6 years ago.
This is a real beginner question (I'm still learning the Java basics).
I can (sort of) understand why methods would return a List<String> rather than an ArrayList<String>, or why they would accept a List parameter rather than an ArrayList. If it makes no difference to the method (i.e., if no special methods from ArrayList are required), this would make the method more flexible, and easier to use for callers. The same thing goes for other collection types, like Set or Map.
What I don't understand: it appears to be common practice to create local variables like this:
List<String> list = new ArrayList<String>();
While this form is less frequent:
ArrayList<String> list = new ArrayList<String>();
What's the advantage here?
All I can see is a minor disadvantage: a separate "import" line for java.util.List has to be added. Technically, "import java.util.*" could be used, but I don't see that very often either, probably because the "import" lines are added automatically by some IDE.
When you read
List<String> list = new ArrayList<String>();
you get the idea that all you care about is being a List<String> and you put less emphasis on the actual implementation. Also, you restrict yourself to members declared by List<String> and not the particular implementation. You don't care if your data is stored in a linear array or some fancy data structure, as long as it looks like a List<String>.
On the other hand, reading the second line gives you the idea that the code cares about the variable being ArrayList<String>. By writing this, you are implicitly saying (to future readers) that you shouldn't blindly change actual object type because the rest of the code relies on the fact that it is really an ArrayList<String>.
Using the interface allows you to quickly change the underlying implementation of the List/Map/Set/etc.
It's not about saving keystrokes, it's about changing implementation quickly. Ideally, you shouldn't be exposing the underlying specific methods of the implementation and just use the interface required.
I would suggest thinking about this from the other end. Usually you want a List or a Set or any other Collection type - and you really do not care in your code how exactly this is implemented. Hence your code just works with a List and does whatever it needs to do (also phrased as "always code to interfaces").
When you create the List, you need to decide what actual implementation you want. For most purposes ArrayList is "good enough", but your code really doesn't care. By sticking to using the interface you convey this to the future reader.
For instance I have a habit of having debug code in my main method which dumps the system properties to System.out - it is usually much nicer to have them sorted. The easiest way is to simply let "Map map = new TreeMap(properties);" and THEN iterate through them, as TreeMap returns the keys sorted.
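The debug dump described above might look like the following sketch. The class name is hypothetical, and it assumes all system property keys are Strings, which is normally the case.

```java
import java.util.Map;
import java.util.TreeMap;

final class PropertyDump {
    static Map<Object, Object> sortedSystemProperties() {
        // TreeMap iterates its keys in their natural order (String order here)
        return new TreeMap<>(System.getProperties());
    }

    static void dump() {
        for (Map.Entry<Object, Object> entry : sortedSystemProperties().entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

Because the loop only relies on the Map interface, switching between a sorted and an unsorted dump means changing a single constructor call.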
When you learn more about Java, you will also see that interfaces are very helpful in testing and mocking, since you can create objects with behaviour specified at runtime conforming to a given interface. An advanced (but simple) example can be seen at http://www.exampledepot.com/egs/java.lang.reflect/ProxyClass.html
If you later want to change the implementation of the list and use, for example, LinkedList (maybe for better performance), you don't have to change the whole code (and the API, if it's a library). If order doesn't matter you should return Collection, so that later on you can easily change it to a Set, for example a TreeSet if you need the items to be sorted.
The best explanation I can come up with (because I don't program in Java as frequently as in other languages) is that it makes it easier to change the "back-end" list type while maintaining the same code/interface everything else is relying on. If you declare it as a more specific type first, then later decide you want a different kind... if something happens to use an ArrayList-specific method, that's extra work.
Of course, if you actually need ArrayList-specific behavior, you'd go with the specific variable type instead.
The point is to identify the behavior you want/need and then use the interface that provides that behavior. That is the type for your variable. Then, use the implementation that meets your other needs - efficiency, etc. This is what you create with "new". This duality is one of the major ideas behind OOD. The issue is not particularly significant when you are dealing with local variables, but it rarely hurts to follow good coding practices all the time.
Basically this comes from people who have to run large projects, possibly other reasons - you hear it all the time. Why, I don't actually know. If you have need of an array list, or Hash Map or Hash Set or whatever else I see no point in eliminating methods by casting to an interface.
Let us say, for example, that I recently learned how to use HashSet and implemented it as a principal data structure. Suppose, for whatever reason, I went to work on a team. Would not that person need to know that the data was keyed on hashing approaches rather than being ordered by some basis? The back-end approach noted by Twisol works in C/C++ where you can expose the headers and sell a library thus; if someone knows how to do that in Java I would imagine they would use JNI - at which point it seems simpler to me to use C/C++, where you can expose the headers and build libs using established tools for that purpose.
By the time you can get someone who can install a jar file in the extensions dir it would seem to me that entity could be just a few short steps away - I dropped several crypto libs in the extensions directory, that was handy, but I would really like to see a clear, concise basis elucidated. I imagine they do that all the time.
At this point it sounds to me like classic obfuscation, but beware: You have some coding to do before the issue is of consequence.

Considering object encapsulation, should getters return an immutable property?

When a getter returns a property, such as a List of other related objects, should that list and its objects be immutable, to prevent code outside of the class from changing the state of those objects without the main parent object knowing?
For example if a Contact object, has a getDetails getter, which returns a List of ContactDetails objects, then any code calling that getter:
can remove ContactDetails objects from that list without the Contact object knowing of it.
can change each ContactDetails object without the Contact object knowing of it.
So what should we do here? Should we just trust the calling code and return easily mutable objects, or go the hard way and make an immutable class for each mutable class?
It's a matter of whether you should be "defensive" in your code. If you're the (sole) user of your class and you trust yourself then by all means no need for immutability. However, if this code needs to work no matter what, or you don't trust your user, then make everything that is externalized immutable.
That said, most properties I create are mutable. An occasional user botches this up, but then again it's his/her fault, since it is clearly documented that mutation should not occur via mutable objects received via getters.
It depends on the context. If the list is intended to be mutable, there is no point in cluttering up the API of the main class with methods to mutate it when List has a perfectly good API of its own.
However, if the main class can't cope with mutations, then you'll need to return an immutable list - and the entries in the list may also need to be immutable themselves.
Don't forget, though, that you can return a custom List implementation that knows how to respond safely to mutation requests, whether by firing events or by performing any required actions directly. In fact, this is a classic example of a good time to use an inner class.
If you have control of the calling code then what matters most is that the choice you make is documented well in all the right places.
Joshua Bloch in his excellent "Effective Java" book says that you should ALWAYS make defensive copies when returning something like this. That may be a little extreme, especially if the ContactDetails objects are not Cloneable, but it's always the safe way. If in doubt always favour code safety over performance - unless profiling has shown that the cloning is a real performance bottleneck.
There are actually several levels of protection you can add.
You can simply return the member, which essentially gives any other class access to the internals of your class. Very unsafe, but in fairness widely done. It will also cause you trouble later if you want to change the internals so that the ContactDetails are stored in a Set.
You can return a newly created list with references to the same objects as the internal list. This is safer - another class can't remove from or add to the internal list, but it can modify the existing objects.
Thirdly, you can return a newly created list with copies of the ContactDetails objects. That's the safe way, but can be expensive.
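The three levels can be sketched side by side. Contact and ContactDetails are modelled on the question; their fields, the copy() method and the getter names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element class with a hand-written copy method.
final class ContactDetails {
    String value;

    ContactDetails(String value) { this.value = value; }

    ContactDetails copy() { return new ContactDetails(value); }
}

final class Contact {
    private final List<ContactDetails> details = new ArrayList<>();

    void addDetail(ContactDetails d) { details.add(d); }

    // Level 1: hands out the internal list itself.
    List<ContactDetails> getDetailsDirect() { return details; }

    // Level 2: new list, same element objects (shallow copy).
    List<ContactDetails> getDetailsShallowCopy() { return new ArrayList<>(details); }

    // Level 3: new list holding copies of the elements (deep copy).
    List<ContactDetails> getDetailsDeepCopy() {
        List<ContactDetails> copy = new ArrayList<>(details.size());
        for (ContactDetails d : details) {
            copy.add(d.copy());
        }
        return copy;
    }
}
```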
I would do this a better way. Don't return a list at all - instead return an iterator over the list. That way you don't have to create a new list (List has a method to get an iterator). Note that the iterators of the standard list classes such as ArrayList support remove(), so to keep the external class from modifying the list, return the iterator of a read-only wrapper, e.g. Collections.unmodifiableList(list).iterator(). The external class can still modify the items, unless you write your own iterator that clones the elements as needed. If you later switch to using another collection internally it can still return an iterator, so no external changes are needed.
In the particular case of a Collection, List, Set, or Map in Java, it is easy to return an unmodifiable view of the collection using return Collections.unmodifiableList(list); (or the matching unmodifiableCollection, unmodifiableSet or unmodifiableMap).
Of course, if it is possible that the backing-data will still be modified then you need to make a full copy of the list.
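The difference between a read-only view and a read-only copy, as a small sketch with a hypothetical Roster class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Roster {
    private final List<String> names = new ArrayList<>();

    void add(String name) { names.add(name); }

    // Read-only view: callers cannot modify it, but it reflects later changes to 'names'.
    List<String> view() {
        return Collections.unmodifiableList(names);
    }

    // Read-only snapshot: a copy that later changes to 'names' do not affect.
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(names));
    }
}
```

Calling add(), remove() or similar on either returned list throws UnsupportedOperationException.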
Depends on the context, really. But generally, yes, one should write as defensive code as possible (returning array copies, returning readonly wrappers around collections etc.). In any case, it should be clearly documented.
I used to return a read-only version of the list, or at least, a copy. But each object contained in the list must be editable, unless they are immutable by design.
I think you'll find that it's very rare for every gettable to be immutable.
What you could do is to fire events when a property is changed within such objects. Not a perfect solution either.
Documentation is probably the most pragmatic solution ;)
Your first imperative should be to follow the Law of Demeter or ‘Tell don't ask’; tell the object instance what to do e.g.
contact.print( printer ) ; // or
contact.show( new Dialog() ) ; // or
contactList.findByName( searchName ).print( printer ) ;
Object-oriented code tells objects to do things. Procedural code gets information then acts on that information. Asking an object to reveal the details of its internals breaks encapsulation; it is procedural code, not sound OO programming, and as Will has already said it is a flawed design.
If you follow the Law of Demeter approach any change in the state of an object occurs through its defined interface, therefore side-effects are known and controlled. Your problem goes away.
When I was starting out I was still heavily under the influence of HIDE YOUR DATA OO PRINCIPLES LOL. I would sit and ponder what would happen if somebody changed the state of one of the objects exposed by a property. Should I make them read only for external callers? Should I not expose them at all?
Collections brought out these anxieties to the extreme. I mean, somebody could remove all the objects in the collection while I'm not looking!
I eventually realized that if your objects hold such tight dependencies on their externally visible properties and their types that, if somebody touches them in a bad place, you go boom, your architecture is flawed.
There are valid reasons to make your external properties readonly and their types immutable. But that is the corner case, not the typical one, imho.
First of all, setters and getters are an indication of bad OO. Generally the idea of OO is you ask the object to do something for you. Setting and getting is the opposite. Sun should have figured out some other way to implement Java beans so that people wouldn't pick up this pattern and think it's "Correct".
Secondly, each object you have should be a world in itself--generally, if you are going to use setters and getters they should return fairly safe independent objects. Those objects may or may not be immutable because they are just first-class objects. The other possibility is that they return native types which are always immutable. So saying "Should setters and getters return something immutable" doesn't make too much sense.
As for making immutable objects themselves, you should virtually always make the members inside your object final unless you have a strong reason not to (Final should have been the default, "mutable" should be a keyword that overrides that default). This implies that wherever possible, objects will be immutable.
As for predefined quasi-object things you might pass around, I recommend you wrap stuff like collections and groups of values that go together into their own classes with their own methods. I virtually never pass around an unprotected collection simply because you aren't giving any guidance/help on how it's used where the use of a well-designed object should be obvious. Safety is also a factor since allowing someone access to a collection inside your class makes it virtually impossible to ensure that the class will always be valid.
