Is comparing serialized object representations a good idea in Java?

I need to compare huge, deeply nested, complex Java objects. There are no circular references (phew). Adding an "equals" method to each class would take weeks. My idea is that serializing these objects to byte arrays and then comparing those arrays might be easier.
Very simplified example:
public class Car implements Serializable {
    String name;
    String brand;

    public Car(String name, String brand) {
        this.name = name;
        this.brand = brand;
    }
}
public class App {
    public static void main(String[] args) {
        Car carA = new Car("Ka", "Ford");
        Car carB = new Car("P85D", "Tesla");
        Car carC = new Car("Ka", "Ford");

        byte[] carAr = SerializationUtils.serialize(carA);
        byte[] carBr = SerializationUtils.serialize(carB);
        byte[] carCr = SerializationUtils.serialize(carC);

        boolean res = Arrays.equals(carAr, carBr);
        System.out.println(res);
        res = Arrays.equals(carAr, carAr);
        System.out.println(res);
        res = Arrays.equals(carAr, carCr);
        System.out.println(res);
    }
}
My initial tests (for much smaller and simpler structures) show that it works. Unfortunately I cannot find any proof that object properties are always serialized in the same order.
UPDATE
These classes are generated by a Maven plugin, and I do not yet know how to intercept that process to add equals and hashCode methods.

If you're looking to compare objects for equality, serializing them and then comparing the bytes is pointless when you can simply override each class's equals method and compare objects by their fields. There will undoubtedly be cases where certain fields should not be taken into account, and serialization may not let you exclude them.
Adding "equal" method to each class takes weeks.
I'm sure this isn't the case. Modern IDEs are able to generate these methods for you, and libraries (such as Guava) offer methods that can compare objects by their fields in a single line of code.
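For example, a minimal IDE-generated equals/hashCode for the Car class from the question might look like the following sketch (using java.util.Objects):

import java.io.Serializable;
import java.util.Objects;

public class Car implements Serializable {
    String name;
    String brand;

    public Car(String name, String brand) {
        this.name = name;
        this.brand = brand;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Car)) return false;
        Car other = (Car) o;
        return Objects.equals(name, other.name) && Objects.equals(brand, other.brand);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, brand);
    }
}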

You might want to give Apache's EqualsBuilder a try, most specifically this method (and variants of it). I think that in the latest version (3.7) it is fully recursive, but there also seems to be a bug with Strings that is only fixed in the yet-to-be-released 3.8. Checking transient fields is also an option on some of the methods.
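For illustration, the reflection-based variant might be used roughly like this with the Car class from the question (a sketch; exact overloads and behaviour depend on the commons-lang3 version):

import org.apache.commons.lang3.builder.EqualsBuilder;

public class ReflectionCompareExample {
    public static void main(String[] args) {
        Car carA = new Car("Ka", "Ford");
        Car carC = new Car("Ka", "Ford");

        // Compares all non-static, non-transient fields via reflection.
        System.out.println(EqualsBuilder.reflectionEquals(carA, carC)); // true

        // Overloads let you include transient fields or exclude named fields.
        System.out.println(EqualsBuilder.reflectionEquals(carA, carC, "brand"));
    }
}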

No, it isn't a good idea.
First, it ignores the content of all non-serializable base classes.
Second, it ignores any transient fields that may be present.
Third, it takes into account differences in referenced objects that you may wish to ignore.
Fourth, it requires that all classes concerned are serializable.
Fifth, it wastes time and space.
It isn't a good idea to post-modify generated code, but you could look into building a set of custom Comparators that live outside the classes, so that you can call Comparator.compare(T object1, T object2) to compare them.
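A hypothetical comparator for the Car example could look like the sketch below (it assumes Car's fields, or equivalent getters, are accessible from where the comparator is defined):

import java.util.Comparator;

public class CarComparison {
    // Ordering by brand, then name; compare() == 0 stands in for "equal"
    // without touching the generated class.
    static final Comparator<Car> BY_BRAND_THEN_NAME =
            Comparator.comparing((Car c) -> c.brand)
                      .thenComparing(c -> c.name);

    public static void main(String[] args) {
        Car carA = new Car("Ka", "Ford");
        Car carC = new Car("Ka", "Ford");
        System.out.println(BY_BRAND_THEN_NAME.compare(carA, carC) == 0); // true
    }
}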

Related

How to find elements of my custom array list that are not present in another array list

ArrayList<ParkingList> Parkings = new ArrayList<ParkingList>();
ArrayList<ParkingList> ParkingsDB = new ArrayList<ParkingList>();
For example, Parkings may contain the objects (a, b, c, d) and ParkingsDB may contain (a, b). How can I find c and d?
I tried this method, but it didn't work:
ArrayList<ParkingList> temp = new ArrayList<ParkingList>(Parkings);
temp.removeAll(ParkingsDB);
My class definition:
public class ParkingList {
    Rectangle R;
    String Name;
    int level;
    String BuildingName;

    public ParkingList(String BuildingName, String Name, Rectangle R, int level) {
        this.R = R;
        this.Name = Name;
        this.level = level;
        this.BuildingName = BuildingName;
    }
}
I just want to know: was the method I used above correct? Maybe I have another problem I need to fix.
My criterion is that two objects are equal only if all of the attributes in one object are the same as in the other object.
In order to use removeAll on a collection of a custom type, you'll need to provide an implementation of the equals method, and if possible also hashCode, as it is used by certain collections in the Collections API.
Another solution would be to use removeIf and specify the criterion that defines when two objects are equal.
e.g.
ArrayList<ParkingList> temp = new ArrayList<>(Parkings);
temp.removeIf(x -> ParkingsDB.stream()
        .anyMatch(e -> e.getName().equals(x.getName())));
In this case the criterion is: when any given object in temp has the same name as any given object in ParkingsDB, it is removed from the temp list.
Now you simply need to decide whether to provide your own implementation of equals and hashCode or to use the example above; in either case you need to define a criterion for when two given objects are equal.
This is irrelevant to the problem at hand, but you don't seem to follow the Java naming conventions at all.
Variables and methods (except constructors, which are a special kind of method) should start with a lowercase letter and follow camelCase, i.e. parkings rather than Parkings, name rather than Name, and so on.
Also, you have freely exposed the state of ParkingList. You should enforce encapsulation here by making the fields private and providing getters and setters only where necessary.
The easiest way, as already mentioned, is to implement the ParkingList.equals() method; you can have your IDE generate it, for example.
Then your code:
temp.removeAll(ParkingsDB);
will work as you expected, since the list implementation relies on equals() to check elements.
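As a minimal sketch (assuming all four attributes participate, per your criterion, and that Rectangle's own equals() compares what you need), the generated methods inside ParkingList could look roughly like this, using java.util.Objects:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ParkingList)) return false;
    ParkingList other = (ParkingList) o;
    return level == other.level
            && Objects.equals(Name, other.Name)
            && Objects.equals(BuildingName, other.BuildingName)
            && Objects.equals(R, other.R);
}

@Override
public int hashCode() {
    return Objects.hash(BuildingName, Name, R, level);
}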
You may also use streams:
List<ParkingList> temp = Parkings.stream()
        .filter(parking -> !ParkingsDB.contains(parking)) // contains() also relies on equals()
        .collect(Collectors.toList());

Map vs. Class Properties advice

I'm writing a program with a bunch of classes that will be serialized to save in a database and to be sent through a network.
To make it easier to access the class properties via a command-line interface, I'm considering storing the properties in a Map, instead of giving each property its own variable.
Basically, instead of using something like this:
String id = account.getUserId();
I would do this
String id = account.properties.get("userId");
Is this an advisable way to do things?
Yes, it's a pretty sensible model. It's sometimes called the "prototype object model" and is very similar to how you would work in JavaScript where every object is effectively a Map. This in turn has led to the very popular JSON serialisation format.
Nice features:
You don't have to worry about messy inheritance hierarchies - you can just alter the properties at will.
You can create a new object just by copying from another object (the prototype)
Code to manipulate the data can do so in a uniform way, without having to explicitly name all the variables.
It's more "dynamic" compared to a static class definition - it's easy to extend and modify your objects
Potential risks / downsides:
You need to keep track of your property names if you use Strings - the compiler won't do it for you! This issue can be alleviated by using Enums as keys, but then you lose some flexibility...
You don't get the benefits of static type checking, so you may find that you need to write more JUnit tests as a result to ensure things are working properly
There is a slight performance overhead (though probably not enough to worry about, as map lookups are very fast)
I actually wrote an entire game in the 90s using a variant of this object model (Tyrant) and it worked very well.
Rather than having a Map object exposed however, you may want to consider encapsulating this functionality so that you can use an accessor method on the object itself, e.g.
String id = account.getProperty("userId");
How I prefer to do this is often like this:
enum StringPropertyType {
    USERID, FIRSTNAME, LASTNAME
}

interface StringAttributes {
    String get(StringPropertyType s);
    void put(StringPropertyType s, String value);
}

class MapBasedStringAttributes implements StringAttributes {
    Map<StringPropertyType, String> map = new HashMap<>();

    public String get(StringPropertyType s) { return map.get(s); }
    public void put(StringPropertyType s, String value) { map.put(s, value); }
}
This gives you compile-time safety, refactoring support, etc.
You could also use StringPropertyType.name() to get the string representation of the enum value and use a Map<String, String> instead.
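A short usage sketch of the enum-keyed variant above (the values are illustrative):

StringAttributes account = new MapBasedStringAttributes();
account.put(StringPropertyType.USERID, "u-123");
account.put(StringPropertyType.FIRSTNAME, "Ada");

// The compiler now catches key typos that a raw String key would let through.
String id = account.get(StringPropertyType.USERID);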

How to detect the "likeness" of data

Generally speaking, can you suggest an approach that would let me test objects to make sure they are alike?
Accept that objects are alike if over n% of the object's content is identical.
Other than brute force, are there any libraries I can take advantage of?
Thanks.
As a starting point, have a look at the Levenshtein distance and see whether it's relevant to your use case.
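For reference, here is a rough sketch of the classic dynamic-programming Levenshtein distance (not tied to any particular library). Two strings could then be considered "alike" when the distance, divided by the longer length, falls below your n% threshold:

public static int levenshtein(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;  // i deletions
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;  // j insertions
    for (int i = 1; i <= a.length(); i++) {
        for (int j = 1; j <= b.length(); j++) {
            int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
            d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // delete
                                        d[i][j - 1] + 1),  // insert
                               d[i - 1][j - 1] + cost);    // substitute
        }
    }
    return d[a.length()][b.length()];
}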
This could only be done on a case by case basis. If I really needed this functionality, I'd define an interface:
public interface Similar<Entity> {
    boolean isSimilar(Entity other);
}
Each implementing class can define what it means to be 'similar' to another instance. The things to keep in mind are the same issues you would consider for cloning: shallow copy vs. deep copy, etc.
Naive implementation of Person:
public class Person implements Similar<Person> {
    private String firstName;
    private String lastName;

    public String getLastName() {
        return lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public boolean isSimilar(Person other) {
        if (other != null) {
            if (lastName.equalsIgnoreCase(other.getLastName())
                    || firstName.equalsIgnoreCase(other.getFirstName())) {
                return true;
            }
        }
        return false;
    }
}
I believe you can find a good solution if you focus on the details of your specific problem. The only "reasonable" solution I have in mind for the general case is based on reflection: scan the data members and find similarities of corresponding pairs of members recursively.
However, there are so many problems with this idea, so I don't think it's feasible. Among them:
1) The concept of weight of member subtrees should be well defined in order to be able to return a similarity percent.
2) How do you handle data members that belong to only one of the objects? This will happen frequently when comparing an instance of class A to an instance of a descendant class B.
3) Maybe the biggest problem: The mapping between the internal structure of an object to its abstract data representation is not an injective function. For example, two hashmaps representing the same mapping may have different inner structure, due to different history of table re-allocations.
One thing you can try is encoding the objects and then comparing the results. In particular, I've done this with JSON. For detecting whether objects match completely, this is straightforward.
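For example, with Jackson (one possible library choice; it assumes the objects are serialisable by Jackson, e.g. via getters or field visibility) the whole-object check might look like this sketch:

import com.fasterxml.jackson.databind.ObjectMapper;

public final class JsonCompare {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // True when both objects serialise to structurally equal JSON trees;
    // JsonNode.equals() compares the trees, so field order does not matter.
    public static boolean sameJson(Object a, Object b) {
        return MAPPER.valueToTree(a).equals(MAPPER.valueToTree(b));
    }
}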
You could implement the Comparable interface and define your own 'logic' for comparing instances of a class.
As mentioned before me, for text similarity you could use distance-calculation algorithms, which you can find in the SimMetrics library (http://www.dcs.shef.ac.uk/~sam/simmetrics.html).
Another way to compare is by comparing object hash codes (after you override the hashCode() method of the Object class) - not sure that's what you are looking for.

Why not allow an external interface to provide hashCode/equals for a HashMap?

With a TreeMap it's trivial to provide a custom Comparator, thus overriding the semantics provided by Comparable objects added to the map. HashMaps however cannot be controlled in this manner; the functions providing hash values and equality checks cannot be 'side-loaded'.
I suspect it would be both easy and useful to design an interface and to retrofit this into HashMap (or a new class)? Something like this, except with better names:
interface Hasharator<T> {
    int alternativeHashCode(T t);
    boolean alternativeEquals(T t1, T t2);
}

class HasharatorMap<K, V> {
    HasharatorMap(Hasharator<? super K> hasharator) { ... }
}

class HasharatorSet<T> {
    HasharatorSet(Hasharator<? super T> hasharator) { ... }
}
The case insensitive Map problem gets a trivial solution:
new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);
Would this be doable, or can you see any fundamental problems with this approach?
Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)
EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)
EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.
EDIT: Accepted answer with relation to performance; would love a more specific answer!
EDIT: There is an implementation; see the accepted answer below.
EDIT: Rephrased the first sentence to indicate more clearly that it's the side-loading I'm after (and not ordering; ordering does not belong in HashMap).
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap (in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
        HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap is commons-collections' own IdentityMap (only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator" to a Map instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
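As a rough, untested sketch of the idea (it assumes commons-collections4's AbstractHashedMap exposes a protected capacity/load-factor constructor alongside the hooks listed above, and it ignores null keys), a case-insensitive-keyed map could look something like this:

import org.apache.commons.collections4.map.AbstractHashedMap;

public class CaseInsensitiveKeyMap<V> extends AbstractHashedMap<String, V> {

    public CaseInsensitiveKeyMap() {
        super(16, 0.75f); // assumed protected constructor; null keys not handled here
    }

    @Override
    protected int hash(Object key) {
        return ((String) key).toLowerCase().hashCode();
    }

    @Override
    protected boolean isEqualKey(Object key1, Object key2) {
        return ((String) key1).equalsIgnoreCase((String) key2);
    }
}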
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}
You can't use a HashingStrategy with the built in HashSet or HashMap. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy and a java.util.Map called UnifiedMapWithHashingStrategy.
Let's look at an example.
public class Data
{
    private final int id;

    public Data(int id)
    {
        this.id = id;
    }

    public int getId()
    {
        return id;
    }

    // No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy and use it.
java.util.Set<Data> set =
        new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map? UnifiedSetWithHashingStrategy uses half the memory of a UnifiedMap, and one quarter the memory of a HashMap. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains(), but not get(). UnifiedSetWithHashingStrategy implements Pool in addition to Set, so it also implements a form of get().
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
        new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase() which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
    new HashingStrategy<String>()
    {
        @Override
        public int computeHashCode(String string)
        {
            int hashCode = 0;
            for (int i = 0; i < string.length(); i++)
            {
                hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
            }
            return hashCode;
        }

        @Override
        public boolean equals(String string1, String string2)
        {
            return string1.equalsIgnoreCase(string2);
        }
    };
Note: I am a developer on GS collections.
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String, you can write a wrapper class around String for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
    private String s;
    private String lowerString;

    public LowerStringWrapper(String s) {
        this.s = s;
        this.lowerString = s.toLowerCase();
    }

    // getter methods omitted

    // Rely on the hashing of String, as we know it to be good.
    public int hashCode() { return lowerString.hashCode(); }

    // We overrode hashCode, so we MUST also override equals. It is required
    // that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
    // restore that invariant.
    public boolean equals(Object obj) {
        if (obj instanceof LowerStringWrapper) {
            return lowerString.equals(((LowerStringWrapper) obj).lowerString);
        } else {
            return lowerString.equals(obj);
        }
    }
}
Good question; ask Josh Bloch. I submitted that concept as an RFE for Java 7, but it was dropped; I believe the reason was something performance-related. I agree, though, that it should have been done.
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode, and a reference to the callback interface responsible for equality checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzen's answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
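For what it's worth, LinkedHashMap is the standard-library example of that trade-off: it keeps hash-based lookups but threads a linked list through the entries to give a predictable (insertion-order) iteration, at the cost of extra memory per entry. A tiny illustration:

import java.util.LinkedHashMap;
import java.util.Map;

public class OrderedHashExample {
    public static void main(String[] args) {
        Map<String, Integer> ordered = new LinkedHashMap<>();
        ordered.put("b", 2);
        ordered.put("a", 1);
        ordered.put("c", 3);

        // Iteration follows insertion order (b, a, c), not hash order.
        ordered.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}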
In short, your idea sounds really good, but if you actually tried to implement it, you would see that doing so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no way of efficiently combining the two structures without losing some of the guarantees of one or the other.
There's such a feature in com.google.common.collect.CustomConcurrentHashMap; unfortunately, there's currently no public way to set the Equivalence (their Hasharator). Maybe they're not done with it yet, or maybe they don't consider the feature useful enough. Ask on the Guava mailing list.
I wonder why it hasn't happened yet, as it was mentioned in this talk over two years ago.

Vectors in Java, how to return multiple vectors in an object

I'm working on a Java program, and I have several vectors defined and filled (from a file) inside a method. I need to return the contents of all the vectors from the method. I have heard you can put them all in one object to return them. Is that possible, and if so, how? If not, do you have any possible solutions for me? Thanks in advance for your help!
Here is a code snippet:
Object getInventory()
{
    Vector<String> itemID = new Vector<String>();
    Vector<String> itemName = new Vector<String>();
    Vector<Integer> pOrdered = new Vector<Integer>();
    Vector<Integer> pInStore = new Vector<Integer>();
    Vector<Integer> pSold = new Vector<Integer>();
    Vector<Double> manufPrice = new Vector<Double>();
    Vector<Double> sellingPrice = new Vector<Double>();
    Object inventoryItem = new Object(); // object to store vectors in

    try
    {
        Scanner infile = new Scanner(new FileReader("Ch10Ex16Data.txt"));
        int i = 0;
        while (infile.hasNext())
        {
            itemID.addElement(infile.next());
            itemName.addElement(infile.next() + infile.nextLine());
            pOrdered.addElement(infile.nextInt());
            pInStore.addElement(pOrdered.elementAt(i));
            pSold.addElement(0);
            manufPrice.addElement(infile.nextDouble());
            sellingPrice.addElement(infile.nextDouble());
            i++;
        }
        infile.close();

        System.out.println(itemID);
        System.out.println(itemName);
        System.out.println(pOrdered);
        System.out.println(pInStore);
        System.out.println(pSold);
        System.out.println(manufPrice);
        System.out.println(sellingPrice);
    }
    catch (Exception f)
    {
        System.out.print(f);
    }
    return inventoryItem;
}
Personally, I'd scrap that approach completely. It seems like you need a Product class:
public class Product {
    private String itemName;
    private int itemID;
    // etc etc

    public Product(String itemName, int itemID) {
        this.itemName = itemName;
        this.itemID = itemID;
        // etc etc
    }

    public String getItemName() {
        return itemName;
    }

    public int getItemID() {
        return itemID;
    }

    // etc etc
}
Then something like this:
public class Inventory {
    private List<Product> products = new ArrayList<Product>();
    // etc etc

    public Inventory(String fileName) throws IOException {
        // Load file,
        // Read each product,
        products.add(new Product(...product arguments)); // add to the list
    }

    public Product[] getProducts() {
        return products.toArray(new Product[]{});
    }
}
First of all, use ArrayList instead of Vector. Then use a Map as your return object, where each value in the map is one of your lists.
Second of all, a much better approach is to create an object that actually holds each of your fields and return a java.util.List of those objects.
public class Item
{
    String id;
    String name;
    Integer pOrdered;
    Integer inStore;
    // ...
}
You're doing a few things wrong.
Firstly, don't use Vector. Like, ever. If ordering is important to you, you want List on the API (and possibly ArrayList or LinkedList as an implementation).
Secondly, you're trying to have a large number of arrays have values that happen to line up. That's going to be nearly impossible to use. Just create a class that represents one record, and return the List of those.
Thirdly: do not catch that exception. You don't know what to do with it, and you're just going to confuse yourself. Only catch an exception if you have a really good idea what to do in the error case (printing out an error message without a stack is just about never the right thing).
The signature of your method is the most important part. If you get that right, the implementation doesn't matter nearly as much. Aim for something that looks like this:
List<Item> getInventory(File input) throws IOException {
}
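For completeness, a rough sketch of what the body could look like with an Item class along the lines of the one above (it assumes Item also carries pSold, manufPrice, and sellingPrice fields; imports from java.io and java.util are needed):

List<Item> getInventory(File input) throws IOException {
    List<Item> items = new ArrayList<>();
    try (Scanner in = new Scanner(input)) {
        while (in.hasNext()) {
            Item item = new Item();
            item.id = in.next();
            item.name = in.next() + in.nextLine();
            item.pOrdered = in.nextInt();
            item.inStore = item.pOrdered; // everything ordered starts out in the store
            item.pSold = 0;
            item.manufPrice = in.nextDouble();
            item.sellingPrice = in.nextDouble();
            items.add(item);
        }
    }
    return items;
}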
You really should reconsider your design here. You have multiple vectors, each with properties of the same type of thing — an item in your inventory. You should probably turn this into a single class, perhaps InventoryItem, with members for the name, price, etc. Then, when reading in each item, you construct an InventoryItem with the given properties, and return a single Vector<InventoryItem>.
If you're really attached to keeping track of all those individual Vectors, you could just return a Vector[] with all the vectors you have:
return new Vector[] { itemID, itemName, pOrdered, pInStore, pSold, manufPrice, sellingPrice };
Also, as Robin says, you should use the ArrayList container instead of Vector. The only thing that will change is that you need to change all calls from someVector.addElement to someList.add.
Sounds like this should be tagged "Homework".
Okay, first of all, are you required to use all these Vectors, or is that your own decision? Though some may point out that using ArrayLists is better, I'd do away with them and create your own Item class.
This way, instead of having a conceptual item's properties distributed across multiple Vectors (the way you're doing now) you have 1 Item instance per item, with fields for all the data relevant to that item. Now, you only need one data structure (Vector or ArrayList) for all your item objects, and you can return that structure from getInventory().
The easiest way to declare the object would be something like
List<Vector<? extends Object>> inventoryItem = new ArrayList<Vector<? extends Object>>();
but this has several problems, namely that Java's generics aren't reified, so you have to test and cast the contents of each vector that you get back. A better solution would be to define a container object that has each of the Vectors as fields and add to those.
However, this looks like it is really missing the point. Instead, you should define an InventoryItem class that has each of your seven fields. Each time you read an object from the file, instantiate a new InventoryItem and populate its fields. Then add it to a single Vector.
Also, it is generally recommended that you do not use the Vector class. Instead, you should use ArrayList. Vector should really only be used if you need its synchronization properties, and even then you should consider wrapping some other list in a Collections.synchronizedList().
Finally, the places where you would want to catch just an Exception can be counted on one hand. You should really be catching an IOException and even that you might want to consider just rethrowing. Also, you should call printStackTrace() on the exception rather than System.out.println().
I find that a good rule of thumb is that it's never really a good idea to pass collections around outside your objects. They are obviously useful inside your object, but outside you lose control and they are not obvious.
Consider the principle of making your code readable instead of documenting it. If you take a collection, how does that tell the caller what to pass in? Even if you use generics, there is no way to assert control over what happens to the collection--someone could be adding to it and deleting from it in another thread after it's passed to you.
There is no reason not to create a business class that contains your collections along with the business logic to manipulate them (yeah, there is always business logic--it's the copy and paste code you'll find around the locations that you access the collection).
I used to find it frustrating that the JDK always seems to take arrays of built-in types rather than collections, but it makes a lot more sense after coming to terms with the idea that passing collections (like passing around any basic type) is just not a very good idea.
While in general I heartily agree with the advice to use List/ArrayList instead of Vector, it is important to know why. Indeed, I have to vehemently disagree with Dustin who says not to use Vector "ever".
A Vector is in essence a synchronized ArrayList. If you truly need synchronization, by all means then, ignore Dustin's admonition, and use Vector.
There is another instance in which Vector is justified. And that is when you need to maintain compatibility with a pre-Java2 code base.
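If synchronization is the only reason to reach for Vector, the wrapping approach mentioned above looks like this:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A synchronized view over an ArrayList; individual calls are thread-safe,
// but iterating still requires synchronizing on the returned list.
List<String> items = Collections.synchronizedList(new ArrayList<String>());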
