Although it is possible to serialize a lambda in Java 8, it is strongly discouraged; even serializing inner classes is discouraged. The reason given is that lambdas may not deserialize properly on another JRE. However, if that is the only concern, doesn't this mean that there is a way to serialize a lambda safely?
For example, say I define a class to be something like this:
import java.util.function.Predicate;

public class MyClass {
    private String value;
    private Predicate<String> validateValue;

    public MyClass(String value, Predicate<String> validate) {
        this.value = value;
        this.validateValue = validate;
    }

    public void setValue(String value) {
        if (!validateValue.test(value)) throw new IllegalArgumentException();
        this.value = value;
    }

    public void setValidation(Predicate<String> validate) {
        this.validateValue = validate;
    }
}
If I declared an instance of the class like this, I should not serialize it:
MyClass obj = new MyClass("some value", (s) -> !s.isEmpty());
But what if I made an instance of the class like this:
// Could even be a static nested class
public class IsNonEmpty implements Predicate<String>, Serializable {
    @Override
    public boolean test(String s) {
        return !s.isEmpty();
    }
}
MyClass isThisSafeToSerialize = new MyClass("some string", new IsNonEmpty());
Would this now be safe to serialize? My instinct says that yes, it should be safe, since there's no reason that interfaces in java.util.function should be treated any differently from any other random interface. But I'm still wary.
It depends on which kind of safety you want. It’s not the case that serialized lambdas cannot be shared between different JREs. They have a well-defined persistent representation, the SerializedLambda. When you study how it works, you’ll find that it relies on the presence of the defining class, which will have a special method that reconstructs the lambda.
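For reference, a minimal sketch (not from the original answer) of how a lambda is opted into serialization in the first place: the intersection-type cast is standard Java 8, while the deserialization hook it causes the compiler to emit is exactly the compiler detail mentioned above.

import java.io.Serializable;
import java.util.function.Predicate;

public class SerializableLambdaDemo {
    public static void main(String[] args) {
        // The intersection cast marks the lambda as serializable; without it,
        // writeObject would throw NotSerializableException.
        Predicate<String> p = (Predicate<String> & Serializable) s -> !s.isEmpty();
        System.out.println(p.test("x")); // true
    }
}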
What makes it unreliable is the dependency on compiler-specific artifacts, e.g. the synthetic target method, which has a generated name; simple changes like inserting another lambda expression, or recompiling the class with a different compiler, can break compatibility with existing serialized lambda expressions.
However, manually written classes aren’t immune to this. Without an explicitly declared serialVersionUID, the default algorithm calculates an id by hashing class artifacts, including private and synthetic ones, which introduces a similar compiler dependency. So the minimum to do, if you want reliable persistent forms, is to declare an explicit serialVersionUID.
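A minimal sketch of the question’s IsNonEmpty with that fix applied:

import java.io.Serializable;
import java.util.function.Predicate;

public class IsNonEmpty implements Predicate<String>, Serializable {
    // Pinning the id removes the dependency on the compiler-generated hash.
    private static final long serialVersionUID = 1L;

    @Override
    public boolean test(String s) {
        return !s.isEmpty();
    }
}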
Or you turn to the most robust form possible:
public enum IsNonEmpty implements Predicate<String> {
    INSTANCE;

    @Override
    public boolean test(String s) {
        return !s.isEmpty();
    }
}
Serializing this constant does not store any properties of the actual implementation, besides its class name (and the fact that it is an enum, of course) and a reference to the name of the constant. Upon deserialization, the actual unique instance of that name will be used.
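An illustrative round trip, assuming the enum above, showing that deserialization resolves back to the unique constant:

import java.io.*;

public class EnumRoundTrip {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(IsNonEmpty.INSTANCE); // writes class + constant name only
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            // The same instance comes back, regardless of which compiler built it.
            System.out.println(in.readObject() == IsNonEmpty.INSTANCE); // true
        }
    }
}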
Note that serializable lambda expressions may create security issues because they open an alternative way of getting hold of an object that allows invoking the target method. However, this applies to all serializable classes: every variant shown in your question and this answer allows deliberately deserializing an object that can invoke the encapsulated operation. But with explicitly serializable classes, the author is usually more aware of this fact.
Related
I have a number of dumb object classes that I would like to serialize as Strings for the purpose of out-of-process storage. This is a pretty typical place to use double-dispatch / the visitor pattern.
public interface Serializeable {
    <T> T serialize(Serializer<T> serializer);
}

public interface Serializer<T> {
    T serialize(Serializeable s);
    T serialize(FileSystemIdentifier fsid);
    T serialize(ExtFileSystemIdentifier extFsid);
    T serialize(NtfsFileSystemIdentifier ntfsFsid);
}
public class JsonSerializer implements Serializer<String> {
    public String serialize(Serializeable s) {...}
    public String serialize(FileSystemIdentifier fsid) {...}
    public String serialize(ExtFileSystemIdentifier extFsid) {...}
    public String serialize(NtfsFileSystemIdentifier ntfsFsid) {...}
}
public abstract class FileSystemIdentifier implements Serializeable {}
public class ExtFileSystemIdentifier extends FileSystemIdentifier {...}
public class NtfsFileSystemIdentifier extends FileSystemIdentifier {...}
With this model, the classes that hold data don't need to know about the possible ways to serialize that data. JSON is one option, but another serializer might "serialize" the data classes into SQL insert statements, for example.
If we take a look at the implementation of one of the data classes, it looks pretty much the same as all the others. The class calls the serialize() method on the Serializer passed to it, providing itself as the argument.
public class ExtFileSystemIdentifier extends FileSystemIdentifier {
    public <T> T serialize(Serializer<T> serializer) {
        return serializer.serialize(this);
    }
}
I understand why this common code cannot be pulled into a parent class. Although the text of the method is identical, inside each class the compiler knows unambiguously that the static type of this is ExtFileSystemIdentifier, and can (at compile time) emit the bytecode to call the most type-specific overload of serialize().
I believe I understand most of what is happening with the V-table lookup as well. The compiler only knows the serializer parameter as being of the abstract type Serializer. It must, at runtime, look into the V-table of the serializer object to discover the location of the serialize() method for the specific subclass, in this case JsonSerializer.serialize().
The typical usage is to take a data object, known to be a Serializeable, and serialize it by handing it to a serializer object, known to be a Serializer. The specific types of the objects are not known at compile time.
List<Serializeable> list = //....
Serializer<String> serializer = //....
List<String> results = list.stream()
        .map(serializer::serialize)
        .collect(Collectors.toList());
This works similarly to the other invocation, but in reverse.
public class JsonSerializer implements Serializer<String> {
    public String serialize(Serializeable s) {
        return s.serialize(this);
    }
    // ...
}
The V-table lookup is now done on the instance of Serializeable, and it will find, for example, ExtFileSystemIdentifier.serialize. The compiler can statically determine that the closest matching overload is the one for Serializer<T> (it just so happens to also be the only overload).
This is all well and good. It achieves the main goal of keeping the input and output data classes oblivious to the serialization class. And it also achieves the secondary goal of giving the user of the serialization classes a consistent API regardless of what sort of serialization is being done.
Imagine now that a second set of dumb data classes exists in a different project. A new serializer needs to be written for these objects. The existing Serializeable interface can be reused in the new project. The Serializer interface, however, contains references to the data classes from the other project.
In an attempt to generalize this, the Serializer interface could be split into three:
public interface Serializer<T> {
    T serialize(Serializeable s);
}

public interface ProjectASerializer<T> extends Serializer<T> {
    T serialize(FileSystemIdentifier fsid);
    T serialize(ExtFileSystemIdentifier extFsid);
    // ... other data classes from Project A
}

public interface ProjectBSerializer<T> extends Serializer<T> {
    T serialize(ComputingDevice device);
    T serialize(PortableComputingDevice portable);
    // ... other data classes from Project B
}
In this way, the Serializer and Serializable interfaces could be packaged and reused. However, this breaks the double-dispatch and it results in an infinite loop in the code. This is the part I'm uncertain about in the V-table lookup.
When stepping through the code in a debugger, the issue arises in the data class's serialize method.
public class ExtFileSystemIdentifier implements Serializeable {
    public <T> T serialize(Serializer<T> serializer) {
        return serializer.serialize(this);
    }
}
What I think is happening is that at compile time, the compiler selects the overload of serialize from the options available in the Serializer interface (since the compiler knows the argument only as a Serializer<T>). So by the time the runtime does the V-table lookup, the method being looked up is the wrong one, and the runtime selects JsonSerializer.serialize(Serializeable), leading to the infinite loop.
A possible solution to this problem is to provide a more type-specific serialize method in the data class.
public interface ProjectASerializable extends Serializeable {
    <T> T serialize(ProjectASerializer<T> serializer);
}

public class ExtFileSystemIdentifier implements ProjectASerializable {
    public <T> T serialize(Serializer<T> serializer) {
        return serializer.serialize(this);
    }

    public <T> T serialize(ProjectASerializer<T> serializer) {
        return serializer.serialize(this);
    }
}
Program control flow will bounce around until the most type-specific Serializer overload is reached. At that point, the ProjectASerializer<T> interface will have a more specific serialize method for the data class from Project A; avoiding the infinite loop.
This makes double-dispatch slightly less attractive. There is now more boilerplate code in the data classes. It was bad enough that obviously duplicate code couldn't be factored out to a parent class because doing so would circumvent the double-dispatch trickery; now there is more of it, and it compounds with the depth of the Serializer inheritance hierarchy.
Double-dispatch is static typing trickery. Is there some more static typing trickery that will help me avoid the duplicated code?
As you noticed, the serialize method of
public interface Serializer<T> {
    T serialize(Serializeable s);
}
does not make sense. The visitor pattern is there for doing case analysis, but with this method you make no progress (you already know it is a Serializeable), hence the inevitable infinite recursion.
What would make sense is a base Serializer interface that has at least one concrete type to visit, with that concrete type shared between the two projects. If there is no shared concrete type, then there is no hope of a Serializer hierarchy being useful.
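A sketch of that point, where SharedIdentifier stands in for a hypothetical concrete type visible to both projects:

public interface Serializer<T> {
    T serialize(SharedIdentifier shared); // hypothetical shared concrete type
}

public interface ProjectASerializer<T> extends Serializer<T> {
    T serialize(FileSystemIdentifier fsid);
    // ... further Project A types
}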
Now, if you are looking to reduce boilerplate when implementing the visitor pattern, I suggest the use of a code generator (via annotation processing), e.g. adt4j or derive4j.
We have some legacy reflected proxy generation code which basically works like this if you look at it as a black box:
Object someObject = new Anything();
Object debugObject = ProxyUtils.wrapWithDebugLogging(someObject);
wrapWithDebugLogging takes any object and overrides any method it can (final methods are obviously unfixable if you're extending a real class), intercepting it to log a message about the call, then calling the real method.
Inside, it's using cglib to do the work and has a bit of protective logic before it constructs the proxy, because anonymous classes are final, yet can be handled by using the superclass or single interface they implement:
Class<?> clazz = someObject.getClass();
List<Class<?>> interfaces = new ArrayList<>(Arrays.asList(clazz.getInterfaces()));

// Anonymous classes are final so you can't extend them, but we know they only
// have one superclass or one interface.
if (clazz.isAnonymousClass()) {
    clazz = interfaces.isEmpty() ? clazz.getSuperclass() : interfaces.get(0);
}

Enhancer enhancer = new Enhancer();
if (clazz.isInterface()) {
    if (!interfaces.contains(clazz)) {
        interfaces.add(clazz);
    }
} else {
    enhancer.setSuperclass(clazz);
}
enhancer.setInterfaces(interfaces.toArray(new Class<?>[0]));
The problem is that Java 8's "lambda" classes return false for isAnonymousClass(). But we would like to treat them exactly the same as the anonymous class.
It has been pointed out before that there is no way to determine that a class is a lambda class "by design". But this just seems more like something lacking in the reflection API to me and it certainly isn't the first time Java has "forgotten" to add something obvious to a new API.
So is there a sensible way to distinguish a lambda from a non-lambda without having this feature in the API? I can see that isSynthetic() returns true, but it also returns true for all kinds of other things, presumably.
You shouldn’t create conditional code depending on whether a class was generated for a lambda or not. After all, only the properties of the class matter.
The class is final, so you can’t subclass it. Even if it weren’t final, subclassing wouldn’t be possible because it has only private constructors. And it implements an interface. These are the relevant properties of the class.
It’s not unrealistic to encounter the same scenario without any lambda expressions:
final class NotALambda implements Function<String, String> {
    public static final Function<String, String> INSTANCE = new NotALambda();

    private NotALambda() {}

    public String apply(String t) {
        return t.toLowerCase();
    }
}
Why do you want to treat this class differently from a class generated via Function<String, String> f = String::toLowerCase;? It has the same properties and the same obstacles for creating a proxy. And in the comments you said you want to make a distinction based on whether the method is declared final or not. This makes even less sense, as I could add a final modifier to the method in the above example without changing anything: neither the semantics nor the difficulties you will face when creating a proxy.
I don't think this limitation (if you want to call it that) should be an issue.
If you're properly programming to interfaces where appropriate, making your proxy have a superclass (other than Object) becomes unnecessary. Set the proxy's superclass when it is available (no private constructor, not final, etc.).
In all cases, all you need to do is capture the proxied object (the target), intercept all method invocations on it, do your logging, and then delegate the invocation to it.
public static Object wrapWithDebugLogging(Object target) {
    ... // prepare the enhancer as described above
    enhancer.setCallback(new MethodInterceptor() {
        @Override
        public Object intercept(Object obj, Method method, Object[] args, MethodProxy proxy) throws Throwable {
            logger.debug("Some useful logging message.");
            return method.invoke(target, args);
        }
    });
    return enhancer.create();
}
It doesn't matter if the invoked method was final; you're not trying to override it, you're just intercepting it.
Here are the results of a quick experiment. The foo method just outputs getClass().toString() of its argument. (getClass().getName() outputs the same results.)
foo(new Runnable() {
    @Override
    public void run() {
        System.out.println("run");
    }
});
output: Test32$1
foo(() -> System.out.println("run"));
output: Test32$$Lambda$1/640070680
No guarantees about how portable this is.
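If you decide to accept the fragility, a heuristic along these lines is possible. Note that the "$$Lambda" substring is an undocumented naming convention observed in the experiment above, not something any specification guarantees:

// Fragile sketch: relies on the lambda factory's naming convention.
static boolean looksLikeLambdaClass(Class<?> clazz) {
    return clazz.isSynthetic() && clazz.getName().contains("$$Lambda");
}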
I was playing with the following question: Using Java 8's Optional with Stream::flatMap and wanted to add a method to a custom Optional<T> and then check if it worked.
More precisely, I wanted to add a stream() method to my CustomOptional<T> that returns an empty stream if no value is present, or a stream with a single element if it is present.
However, I discovered that Optional<T> is declared final.
Why is this so? There are loads of classes that are not declared as final, and I personally do not see a reason here to declare Optional<T> final.
As a second question: if the worry is that the methods would be overridden, why not make all the methods final and leave the class itself non-final?
According to this page of the Java SE 8 API docs, Optional<T> is a value based class. According to this page of the API docs, value-based classes have to be immutable.
Declaring all the methods in Optional<T> final would prevent them from being overridden, but it would not prevent an extending class from adding fields and methods. Extending the class and adding a field, together with a method that changes the value of that field, would make the subclass mutable, and hence would allow the creation of a mutable Optional<T>. The following is an example of such a subclass, which could be created if Optional<T> were not declared final.
// Example created by @assylias
public class Sub<T> extends Optional<T> {
    private T t;

    public void set(T t) {
        this.t = t;
    }
}
Declaring Optional<T> final prevents the creation of subclasses like the one above and hence guarantees Optional<T> to be always immutable.
As others have stated, Optional is a value-based class, and since value-based classes should be immutable, it needs to be final.
But we missed the point of this. One of the main reasons why value-based classes are immutable is to guarantee thread safety: making a class immutable makes it thread-safe. Take, for example, String or primitive wrappers like Integer or Float; they are declared final for similar reasons.
Probably, the reason is the same as why String is final; that is, so that all users of the Optional class can be assured that the methods on the instance they receive keep to their contract of always returning the same value.
Though we cannot extend the Optional class, we can create our own wrapper class.
public final class Opt {
    private Opt() {
    }

    public static <T> Stream<T> filledOrEmpty(T t) {
        return Optional.ofNullable(t).isPresent() ? Stream.of(t) : Stream.empty();
    }
}
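Usage, for illustration:

Stream<String> one = Opt.filledOrEmpty("hello"); // stream with one element
Stream<String> none = Opt.filledOrEmpty(null);   // empty stream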
Hope it helps!
Recently, I discovered code with the following structure:
Interface:
public interface Base<T> {
    public T fromValue(String v);
}
Enum implementation:
public enum AddressType implements Base<AddressType> {
    NotSpecified("Not Specified."),
    Physical("Physical"),
    Postal("Postal");

    private final String label;

    private AddressType(String label) {
        this.label = label;
    }

    public String getLabel() {
        return this.label;
    }

    @Override
    public AddressType fromValue(String v) {
        return valueOf(v);
    }
}
My immediate reaction is that one cannot create an instance of an enum by deserialization or by reflection, so fromValue() should be static.
I'm not trying to start a debate, but is this correct? I have read Why would an Enum implement an interface, and I totally agree with the answers provided, but the above example is invalid.
I am asking because the "architect" won't take my word for it, so I want to build a strong argument (with facts) for why the above approach is good or bad.
Your Base interface does not declare valueOf, and the fromValue method is indeed implemented, so I see no reason why this code should not compile. If you are referring to the valueOf call inside fromValue, that is a call to the static method defined for every enum. I would have to agree, though, that the design is quite misguided, as you need an arbitrary member of the enum just to call fromValue and get the real member.
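For comparison, the static shape the question argues for might look like the sketch below (labels omitted for brevity). Note that a static method cannot implement Base<T>'s fromValue, which is exactly the tension in this design:

public enum AddressType {
    NotSpecified, Physical, Postal;

    // A static factory needs no pre-existing instance to be called.
    public static AddressType fromValue(String v) {
        return valueOf(v); // the enum's built-in lookup by constant name
    }
}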
On the other hand, in a project I'm working on right now, I have several enums implementing a common interface. This is because the enums are related, and I want to be able to treat them uniformly with respect to their common semantics.
In my opinion this design is wrong. In order to use fromValue(), one has to get an instance of the enum beforehand. Thus, it will look like:
AddressType type = AddressType.Postal.fromValue("Physical");
What sense does it make?
Your Base interface seems to serve a whole other purpose (if any).
It is probably meant to be a String-to-T converter, since it generates a T from a String. The enum is simply wrong if it implements this interface (@yegor256 already pointed out why). So you can keep the enum, and you can have some AddressTypeConverter implements Base<AddressType> which calls AddressType.valueOf() in its fromValue() method.
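A sketch of that converter (the class name follows the answer's suggestion):

public class AddressTypeConverter implements Base<AddressType> {
    @Override
    public AddressType fromValue(String v) {
        return AddressType.valueOf(v); // delegate to the enum's static lookup
    }
}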
But don't get me wrong: enums implementing interfaces are NOT a bad practice, it's just this particular usage that is completely wrong.
[Mind the gap: I know that the best solution would be to get rid of the enum completely, but that's not an option for today as mentioned in the comments, but it is planned for the (far) future.]
We have two deployment units: frontend and backend. The frontend uses an enum and calls an EJB service at the backend with the enum as a parameter. But the enum changes frequently, so we don't want the backend to know its values.
String constants
A possible solution would be to use String constants instead of enums, but that would cause a lot of little changes in the frontend. I'm searching for a solution that causes as few changes as possible there.
Wrapper class
Another solution is the use of a wrapper class with the same interface as an enum. The enum becomes a wrapper class, and the enum values become constants within that wrapper. I had to write some deserialization code to ensure object identity (as enums have), but I don't know if it is a correct solution. What if different classloaders are used?
The wrapper class will implement a Java interface, which will replace the enum in the backend. But will the deserialization code execute in the backend even so?
Example for a wrapper class:
import java.io.InvalidObjectException;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class Locomotion implements Serializable {
    private static final long serialVersionUID = -6359307469030924650L;

    public static final List<Locomotion> list = new ArrayList<Locomotion>();

    public static final Locomotion CAR = createValue(4654L);
    public static final Locomotion CYCLE = createValue(34235656L);
    public static final Locomotion FEET = createValue(87687L);

    public static Locomotion createValue(long type) {
        Locomotion enumValue = new Locomotion(type);
        list.add(enumValue);
        return enumValue;
    }

    private final long ppId;

    private Locomotion(long type) {
        this.ppId = type;
    }

    private Object readResolve() throws ObjectStreamException {
        for (Locomotion enumValue : list) {
            if (this.equals(enumValue)) {
                return enumValue;
            }
        }
        throw new InvalidObjectException("Unknown enum value '" + ppId + "'");
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + (int) (ppId ^ (ppId >>> 32));
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (!(obj instanceof Locomotion)) {
            return false;
        }
        Locomotion other = (Locomotion) obj;
        if (ppId != other.ppId) {
            return false;
        }
        return true;
    }
}
Have you had the same problem? How did you solve it?
Ok, let me see if I understand. You said that:
"The frontend uses an enum and calls an EJB service at the backend with the enum as a parameter. But the enum changes frequently, so we don't want the backend to know its values."
When you say "values" I assume you are referring to the numeric value you pass in the enum constructor and not to the enum constants themselves.
Therefore, this implies that the frontend and the backend will have two different versions of the enum class, but the enum constants in them will be the same.
I am only assuming the communication is via RMI (but this is not entirely clear in your post).
Now, serialization/deserialization of enums works differently than for other objects. According to the Java Serialization Specification, when an enum is serialized, only its name is serialized. When it is deserialized, it is rebuilt using the Enum.valueOf(name) method.
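Illustratively (with a throwaway Direction enum), deserialization resolves the transmitted name against the receiver's local version of the enum class:

public class EnumWireFormat {
    enum Direction { NORTH, SOUTH }

    public static void main(String[] args) {
        // What readObject effectively does for an enum: resolve by name.
        Direction d = Enum.valueOf(Direction.class, "NORTH");
        System.out.println(d); // NORTH
    }
}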
So your original wrapper proposal would not work, because the server, due to the stipulated serialization of enums, will never know the actual values of the enums in the client.
Bottom line: if you intend to pass an enum to the server, there is no way to do what you intend, because the values in the frontend will never reach the backend if serialization is involved.
If RMI is implied, a good solution would be to use code mobility. This way you could place the problematic class in a repository accessible to both server and client, and when the frontend developers change the class definition, you publish the class in the repository and the server gets it from there.
See this article about dynamic code downloading using the codebase property in RMI: http://download.oracle.com/javase/6/docs/technotes/guides/rmi/codebase.html
Another possible solution is to stop using a Java enum and use a Java class with final constants, as we used to do in the old days before enums; that way you can ensure that its values will be properly serialized when they are sent to the backend.
Somewhat like this:
public class Fruit implements Serializable {
    private static final long serialVersionUID = 1L;

    public static final Fruit ORANGE = new Fruit("orange");
    public static final Fruit LEMON = new Fruit("lemon");

    private String name;

    private Fruit(String name) {
        this.name = name;
    }
}
This way you can be in full control of what happens upon deserialization and your wrapper pattern might work this way.
This type of construction cannot substitute for an enum completely; for instance, it cannot be used in switch statements. But if that is an issue, you could use this object as the parameter sent to the server, and let the server rebuild the enum out of it with its own version of the enum class.
Your enum, therefore, could have two new methods: one to build a Fruit out of the enum itself, and one to convert back:
public static Fruit toFruit(FruitEnum value);
public static FruitEnum valueOf(Fruit fruit);
And you can use those to convert back and forth between the versions of the parameter on each side.
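A hypothetical round trip at the boundary, using the method names sketched above (FruitEnum stands in for the frontend's enum):

// Frontend: convert the enum to the stable wire class before the EJB call.
Fruit wire = FruitEnum.toFruit(FruitEnum.ORANGE);

// Backend: rebuild its own version of the enum from the wire object.
FruitEnum local = FruitEnum.valueOf(wire);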
It's an odd request, as I would think the server should know the values of what is going into the database, but OK, I'll play along. Perhaps you could do this:
public enum Giant { Fee, Fi, Fo, Fum }

public void client() {
    Giant giant = Giant.Fee;
    server(giant);
}

public void server(Enum<?> e) {
    String valueForDB = e.name();
    // or perhaps: String valueForDB = e.toString();
}
For data transfer between frontend and backend, both need to use the same class versions because of possible serialization during marshalling of parameters. So again, they have to know exactly the same enums, or whatever other classes you try to use. Switching enums to something different won't work either. You have to settle on a known class identity for both.
So if the server should perform actions based on processing or calculating the values of the parameters, use strings or whatever other non-changing class you decide on and put your values inside: a string of characters, an array of numbers, or whatever.
So if you put your database id inside the wrapper object, the server will be able to get the objects out of the database. But still, they both need exactly the same version of the wrapper class on their classpaths.
Okay, I can't be too exact because I don't see your code, but in my experience something that changes like that should be external data, not enums.
What I almost always find is that if I externalize the information that was in the enums, then I have to externalize a few other pieces as well, but after doing it all I end up factoring away a LOT of code.
Any time you actually use the values of an enum, you are almost certainly writing duplicate code. What I mean is that if you have enums like "HEARTS", "DIAMONDS"...
The ONLY way they can be used in your code is in something like a switch statement:
switch (card.suit) {
    case HEARTS:
        loadGraphic(Suit.HEARTS);
        // or better yet:
        Suit.HEARTS.loadGraphic();
        break;
    case SPADES:
        Suit.SPADES.loadGraphic();
        break;
    ...
}
Now, this is obviously stupid, but I imposed the stupid constraint of saying that you USED the values in the code. My assertion is that if you don't USE the values, you don't need an enum. Let's not use the values in code and see:
card.suit.loadGraphic();
Wow, all gone. But suddenly the entire point of using an enum is gone: instead, you can get rid of the whole class and preload a "Suit" factory with 4 instances from a text file containing strings like "Heart.png" and "Spade.png".
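A minimal sketch of that externalized factory; the file format and the names here are assumptions for illustration, not from the original post:

import java.util.HashMap;
import java.util.Map;

public class SuitFactory {
    public static final class Suit {
        private final String graphicFile;

        Suit(String graphicFile) {
            this.graphicFile = graphicFile;
        }

        public void loadGraphic() {
            // Load this.graphicFile from wherever the graphics live.
        }
    }

    private final Map<String, Suit> suits = new HashMap<>();

    // In a real setup these pairs would be read from a text file.
    public SuitFactory() {
        suits.put("hearts", new Suit("Heart.png"));
        suits.put("spades", new Suit("Spade.png"));
        suits.put("diamonds", new Suit("Diamond.png"));
        suits.put("clubs", new Suit("Club.png"));
    }

    public Suit byName(String name) {
        return suits.get(name);
    }
}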
Nearly every time I use enums I end up factoring them out like this.
I'm not saying there isn't any code that can benefit from enums--but the better that I get at factoring code and externalizing data, the less I can imagine really needing them.