I have an external key-value storage which contains a set of binary values per key. These binary values are Scala objects serialized using Jackson 2.5 with the Scala module (jackson-module-scala).
To preserve set semantics in the storage, the binary representation of an object should be stable, i.e. serializing it twice should produce the same sequence of bytes. Does Jackson give this guarantee? If it uses JSON serialization as an intermediate step, for example, there is no defined node ordering, so the binary serialization will not be stable.
This may get even trickier if an object contains unordered structures internally (like sets or maps). Does Jackson handle that?
If not, are there sound alternatives?
Update:
I have found com.fasterxml.jackson.databind.SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, which resolves part of the question.
You can use the option MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, as mentioned in the answers to this question, to achieve stable serialization (together with the ORDER_MAP_ENTRIES_BY_KEYS option that you already mentioned).
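A minimal sketch of that configuration (the value variable is a placeholder for the object being stored; exception handling omitted):

import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

ObjectMapper mapper = new ObjectMapper();
// Sort bean properties alphabetically instead of relying on reflection order.
mapper.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true);
// Sort map entries by key before writing them out.
mapper.configure(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS, true);
byte[] stableBytes = mapper.writeValueAsBytes(value);

Note that these options do not order the elements of a Set: Jackson writes a collection in its iteration order, so for fully stable output the sets themselves need a deterministic order (e.g. a SortedSet).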
Some more background is explained in these emails:
"... in absence of explicit declaration (via #JsonPropertyOrder), and general mechanisms (sort-alphabetically; Object Id and Type Id preceding other properties), there is no defined ordering."
and
"One thing to note is that Oracle did change behavior of JDK 7, such that previously stable ordering of methods and fields returned by Introspection become arbitrary." ... "Existing default ordering is based on simple traversal and ordering that JDK provides; ...".
(quoted from an email from Tatu Saloranta).
Related
I've read the thread How to set DynamoDB range key, String or Map. However, after some thought, I have a few more considerations.
I use Jackson to serialize and deserialize objects in a Spring app. The app was created by someone else, and it's now in my hands to enhance it after its initial release.
There are 2 options for generating the partition key (both sketched below):
Use a delimiter (e.g. value1#value2)
Serialize to JSON (e.g. {"field1":"value1","field2":"value2"})
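For concreteness, a minimal sketch of both options (the PartitionKey class and its field names are purely illustrative; exception handling omitted):

import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical key class; pinning the property order keeps the JSON stable.
@JsonPropertyOrder({"field1", "field2"})
class PartitionKey {
    public String field1;
    public String field2;
}

PartitionKey key = new PartitionKey();
key.field1 = "value1";
key.field2 = "value2";

// Option 1: delimiter
String delimiterKey = key.field1 + "#" + key.field2;         // value1#value2

// Option 2: Jackson
String jsonKey = new ObjectMapper().writeValueAsString(key); // {"field1":"value1","field2":"value2"}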
Delimiter-approach:
If I use a delimiter, I can choose one that is very unlikely (in my case) to show up as part of value1 or value2. This approach generates a shorter string than using Jackson, and the official DynamoDB documentation gives examples with this approach.
However, this approach means I need to document the order and the rules. If other apps (e.g. Node-based or Go) use the same DB, I need to tell their developers to read the docs.
Jackson-approach:
If I use Jackson, it's quite self-documenting. I can use @JsonFormat to maintain formatting consistency, @JsonFilter to pick which fields I want to serialize for this case, and @JsonProperty to shorten field names (although I doubt that I'll need to), and the implementation will be more robust.
But then again, in my case there's already a table that uses the delimiter approach, which means two different formats would need to be covered in the app documentation. I also think that key generation should be very explicit.
I lean a bit toward the delimiter approach for the sake of consistent formatting, but I might be missing the benefits of the Jackson approach, especially for the future development of this project.
My question is: should I use the delimiter approach or the other?
Does anyone know a reflection-based Java object graph serializer which stores fields identified by field order instead of by field name? This is what I want to do:
load a JSON file with Jackson JSON deserializer
save it in binary format which doesn't contain the field names...
load the previously serialized object with the OBFUSCATED version of the application.
The serialized content won't be transferred to any other JVM. Excluding serialized POJOs from obfuscation is not an option for now.
Protostuff by default orders fields from top to bottom as defined in your POJO. You can additionally control the field numbers using annotations.
Note that the order is not guaranteed on some (non-Sun) VMs, especially Dalvik.
Sun JDK 6 or higher is recommended for guaranteed ordering.
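A sketch of how that looks, assuming the io.protostuff 1.x artifacts (class and field names are hypothetical):

import io.protostuff.LinkedBuffer;
import io.protostuff.ProtostuffIOUtil;
import io.protostuff.Schema;
import io.protostuff.Tag;
import io.protostuff.runtime.RuntimeSchema;

// Field numbers pinned explicitly, so renamed (obfuscated) fields still match.
public class Pojo {
    @Tag(1) String name;
    @Tag(2) int count;
}

Pojo pojo = new Pojo();
pojo.name = "example";
pojo.count = 42;

Schema<Pojo> schema = RuntimeSchema.getSchema(Pojo.class);
LinkedBuffer buffer = LinkedBuffer.allocate(512);
byte[] data = ProtostuffIOUtil.toByteArray(pojo, schema, buffer);

Pojo restored = schema.newMessage();
ProtostuffIOUtil.mergeFrom(data, restored, schema);

Since only the tag numbers end up in the binary output, the obfuscated build can read data written by the unobfuscated one.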
Note: Due to the lack of questions like this on SO, I've decided to put one up myself as a Q&A
Serializing objects (using an ObjectOutputStream and an ObjectInputStream) is a method for storing an instance of a Java object as data that can later be deserialized for use. This can cause problems and frustration when the class used to deserialize the data does not remain the same (source-code changes; program updates).
So how can an Object be serialized and deserialized with an updated / downgraded version of a Class?
Here are a few common ways of serializing an object so that it can be deserialized in a backwards-compatible way.
1. Store the data in the JSON format using import and export methods designed to save all fields needed to recreate the instance. This can be made backwards-compatible by including a version key that allows an update algorithm to be called if the version is too low (see the Gson sketch after this list). A common library for this is Google's Gson, which can represent Java objects in JSON as well as edit JSON files directly.
2. Use the built-in Java Properties class in a way similar to the method described above. Properties objects can later be stored using a stream (store()), which writes a regular Java properties file, or saved as XML (storeToXML()).
3. Sometimes simple objects can easily be represented with key-value pairs in a place where storing them in a JSON, XML, or Properties file is either too complicated or not necessary (overkill, one could say). In this case, an effective way of serializing the object is to use the ObjectOutputStream class to serialize a HashMap containing key-value pairs, where the key could be a String and the value could be an Object (HashMap<String,Object>). This allows all of the object's fields to be stored, along with a version key, while providing a lot of versatility (a sketch of this follows the notes below).
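For option 1, a minimal sketch using Gson (the Settings class, its fields, and the version numbers are hypothetical; JsonParser.parseString requires Gson 2.8.6+):

import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

class Settings {
    String name;
    int volume;
}

static String exportJson(Settings s) {
    JsonObject root = new JsonObject();
    root.addProperty("version", 2);      // bump whenever the format changes
    root.addProperty("name", s.name);
    root.addProperty("volume", s.volume);
    return new Gson().toJson(root);
}

static Settings importJson(String json) {
    JsonObject root = JsonParser.parseString(json).getAsJsonObject();
    int version = root.get("version").getAsInt();
    Settings s = new Settings();
    s.name = root.get("name").getAsString();
    // "volume" was added in version 2; older data falls back to a default.
    s.volume = version >= 2 ? root.get("volume").getAsInt() : 100;
    return s;
}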
Note: Although serializing an object using the ObjectOutputStream for persistent storage is normally considered bad practice, it can be used either way as long as the class's source code remains the same.
Also Note about versioning: Changes to a class can be safely made without disrupting deserialization using an ObjectOutputStream as long as they are a compatible change. As mentioned in the Versioning of Serializable Objects chapter of the Object Serialization Specification:
A compatible change is a change that does not affect the contract between the class and its callers.
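For option 3, a sketch of the map-based approach (the file name and field values are placeholders; exception handling omitted):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Writing: store the fields as key-value pairs plus a version key.
HashMap<String, Object> state = new HashMap<>();
state.put("version", 1);
state.put("name", "Alice");
state.put("score", 9001);
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("save.dat"))) {
    out.writeObject(state);
}

// Reading: this keeps working across updates to our own classes,
// because only JDK classes (HashMap, String, Integer, ...) were serialized.
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("save.dat"))) {
    @SuppressWarnings("unchecked")
    HashMap<String, Object> loaded = (HashMap<String, Object>) in.readObject();
    int version = (Integer) loaded.get("version");
    // Run an upgrade step here if version is lower than expected.
}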
I have a JSON object which I have constructed within my Java program.
JSONObject jObj = {"AAA":"aaa","BBB":"bbb","CCC":"ccc"}
I am sending this object to a server which expects the JSON object in the following form:
{"BBB":"bbb","AAA":"aaa","CCC":"ccc"}
My question is: does the order of the JSON object really matter on the server side? If yes, how can I change the order?
Does the order of the JSON object really matter on the server side?
It should not matter. According to various JSON specifications, the order of the attributes is not significant. For example:
"An object is an unordered set of name/value pairs." (Source json.org)
"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array." (Source RFC 7159)
Unfortunately, there are nitwits out there [1] who ignore that aspect of the specs, and place some significance on the order of the attributes. (The mistake is usually made when there is a disconnect between the people specifying the APIs and those implementing them, and the people doing the specification work don't really understand JSON.)
Fortunately, the chances are that whoever designed / implemented the server didn't make that mistake. Most Java JSON parsers I've come across don't preserve the attribute order when parsing ... by default [2]. It would be hard to accidentally implement a server where the order of the JSON attributes being parsed was significant.
If yes, how can I change the order?
With difficulty, I fear:
You could generate the JSON by hand.
There is at least one JSON-for-Java implementation [3] that allows you to supply the Map object that holds a JSON object's attributes. If you use a LinkedHashMap or a TreeMap, it should retain the insertion order or the lexical order of the attribute keys, respectively.
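For instance, a minimal sketch with JSON-simple, assuming the server really is order-sensitive:

import java.util.LinkedHashMap;
import java.util.Map;
import org.json.simple.JSONValue;

Map<String, Object> attrs = new LinkedHashMap<>();
attrs.put("BBB", "bbb");
attrs.put("AAA", "aaa");
attrs.put("CCC", "ccc");
// JSONValue.toJSONString writes the map in its iteration order, and a
// LinkedHashMap iterates in insertion order.
String json = JSONValue.toJSONString(attrs); // {"BBB":"bbb","AAA":"aaa","CCC":"ccc"}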
[1] For example, the nitwits that this poor developer was working for: https://stackoverflow.com/a/4515863/139985
[2] RFC 7159 also says this: "JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences." By my reading, this recommends that JSON libraries should hide any order of the pairs from application code.
[3] JSON-simple: https://code.google.com/p/json-simple/. There could be others too.
IMHO, it's not possible.
The JSON docs say:
An object is an unordered set of name/value pairs
So the way to do it is to get the values in the required order, rather than to order the JSON.
You could use a list, assuming your server can accept it:
{"list": [ {"AAA":"aaa"},{"BBB":"bbb"},{"CCC":"ccc"}]}
The other answers rightly point out that the order should not matter. There are, however, circumstances where the order may matter in a specific implementation that misunderstands the unordered nature of JSON.
For example, say you want to take a hash of the JSON string and store the hash for comparison against future hashes. The hash will be different if the order of the fields in the JSON string is not the same the next time you create the hash (even though the data in the JSON string is the same).
This can happen if you're working with an API or a deserializer that returns JSON strings with the fields in an inconsistent order.
This question discusses that issue more thoroughly and provides solutions for getting a consistent order: JSON order mixed up
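To make that concrete, a small sketch showing that two orderings of the same data hash differently (exception handling omitted):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

MessageDigest md = MessageDigest.getInstance("SHA-256");
byte[] h1 = md.digest("{\"a\":1,\"b\":2}".getBytes(StandardCharsets.UTF_8));
byte[] h2 = md.digest("{\"b\":2,\"a\":1}".getBytes(StandardCharsets.UTF_8));
// Same logical object, different bytes, so the hashes differ.
System.out.println(Arrays.equals(h1, h2)); // prints false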
The order of fields in a JSON object actually can matter; it depends on the serializer you are using. For example, when you serialize an instance of a subclass, you may get an extra JSON field such as type=''. When you deserialize it, the type field must come before any other JSON field, otherwise the object takes on the type of the parent.
I have two sets both containing the same object types. I would like to be able to access the following:
the intersection of the 2 sets
the objects contained in set 1 and not in set 2
the objects contained in set 2 and not in set 1
My question relates to how best to compare the two sets to acquire the desired views. The class in question has numerous id properties which can be used to uniquely identify that entity. However, there are also numerous properties in the class that describe the current status of the object. The two sets can contain objects that match according to the ids, but which are in a different state (and as such, not all properties are equal between the two objects).
So - how do I best implement my solution. To implement an equals() method for the class which does not take into account the status properties and only looks at the id properties would not seem to be very true to the name 'equals' and could prove to be confusing later on. Is there some way I can provide a method through which the comparisons are done for the set methods?
Also, I would like to be able to access the 3 views described above without modifying the original sets.
All help is much appreciated!
(Edit: My first suggestion has been removed because of an unfortunate implementation detail in TreeSet, as pointed out by Martin Konecny. Some collection classes (e.g. TreeSet) allow you to supply a Comparator that is to be used to compare elements, so you might want to use one of those classes - at least, if there is some natural way of ordering your objects.)
If not (i.e. if it would be difficult to implement compareTo(), while it would be simpler to implement hashCode() and equals()), you could create a wrapper class which implements those two methods by looking at the relevant fields of the objects they wrap, and create a regular HashSet of these wrapper objects.
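A sketch of such a wrapper (Entity and getId() are hypothetical names for your class and its key accessor):

// Equality and hashing delegate to the entity's id only.
final class ByIdWrapper {
    final Entity entity;
    ByIdWrapper(Entity entity) { this.entity = entity; }
    @Override
    public boolean equals(Object o) {
        return o instanceof ByIdWrapper
                && ((ByIdWrapper) o).entity.getId().equals(entity.getId());
    }
    @Override
    public int hashCode() {
        return entity.getId().hashCode();
    }
}

Wrap the elements of both sets, build two HashSets of the wrappers, and compute the three views on those.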
Short version: implement equals based on the entity's key, not state.
Slightly longer version: What the equals method should check depends on the type of object. For something that's considered a "value" object (say, an Integer or String or an Address), equality is typically based on all fields being the same. For an object with a set of fields that uniquely identify it (its primary key), equality is typically based on the fields of the primary key only. Equality doesn't necessarily need to (and often shouldn't) take in to consideration the state of an object. It needs to determine whether two objects are representations of the same thing. Also, for objects that are used in a Set or as keys in a Map, the fields that are used to determine equality should generally not be mutable, since changing them could cause a Set/Map to stop working as expected.
Once you've implemented equals like this, you can use Guava to view the differences between the two sets:
Set<Foo> notInSet2 = Sets.difference(set1, set2);
Set<Foo> notInSet1 = Sets.difference(set2, set1);
Both difference sets will be live views of the original sets, so changes to the original sets will automatically be reflected in them.
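The intersection you asked for can be obtained the same way, again as a live view:
Set<Foo> inBoth = Sets.intersection(set1, set2);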
This is a requirement for which the Standard C++ Library fares better with its set type, which accepts a comparator for this purpose. In the Java library, your need is modeled better by a Map: one mapping from your candidate key to either the rest of the status-related fields, or to the complete object that happens to also contain the candidate key. (Note that the C++ set type is mandated to be some sort of balanced tree, usually implemented as a red-black tree, which means it's equivalent to Java's TreeSet, which does accept a custom Comparator.) It's ugly to duplicate the data, but it's also ugly to try to work around it, as you've already found.
If you have control over the type in question and can split it up into separate candidate key and status parts, you can eliminate the duplication. If you can't go that far, consider combining the candidate key fields into a single object held within your larger, complete object; that way, the Map key type will be the same as that candidate key type, and the only storage overhead will be the map keys' object references. The candidate key data would not be duplicated.
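A sketch of that, assuming the entity exposes its candidate key via a hypothetical getKey() returning a Key object (Key must implement equals()/hashCode() on the id fields):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

Map<Key, Entity> byKey1 = new HashMap<>();
for (Entity e : set1) byKey1.put(e.getKey(), e);
Map<Key, Entity> byKey2 = new HashMap<>();
for (Entity e : set2) byKey2.put(e.getKey(), e);

// Copy the key sets so the original collections stay untouched.
Set<Key> inBoth = new HashSet<>(byKey1.keySet());
inBoth.retainAll(byKey2.keySet());
Set<Key> onlyIn1 = new HashSet<>(byKey1.keySet());
onlyIn1.removeAll(byKey2.keySet());
Set<Key> onlyIn2 = new HashSet<>(byKey2.keySet());
onlyIn2.removeAll(byKey1.keySet());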
Note that most set types are implemented as maps under the covers; they map from the would-be set element type to something like a Boolean flag. Apparently there's too much code that would be duplicated in wholly disjoint set and map types. Once you realize that, backing up from using a set in an awkward way to using a map no longer seems to impose the storage overhead you thought it would.
It's a somewhat depressing realization, having chosen the mathematically correct idealized data structure, only to find it's a false choice down a layer or two, but even in your case your problem sounds better suited to a map representation than a set. Think of it as an index.