We have the following scenario:
There is a program (already written) in Java which runs on a web server. Let's call it JavaServerProgram. It takes user input, performs some calculations and finally generates a bunch of classes. Let's call these classes JavaClasses. All these classes are serializable to JSON.
There is a library (already written) in C# that contains many classes describing a tree-like data structure. Let's call it C#Data. Let's call the root class C#Root. All classes in C#Data are (de-)serializable to/from JSON.
The bunch of JavaClasses that JavaServerProgram outputs can be converted into a C#Root instance. We have a library written in C# for that, which takes JSON representations of the JavaClasses as input and creates a C#Root instance. Let's call it C#Convert. This library will always be needed by another project, i.e. it cannot be discontinued.
There is a program (already written) in C# that takes an instance of C#Root and performs some actions on a client (like showing a GUI, modifying files, ...). Let's call it C#ClientRun.
The workflow should be like this:
JavaServerProgram runs on the server and outputs JavaClasses.
The JavaClasses are converted into a C#Root instance on the server.
C#ClientRun gets the C#Root instance as input and runs on the client.
The question is, how do we implement the whole thing?
Version A:
We use all already existing programs and libraries. That means:
We modify JavaServerProgram so that after creating the JavaClasses, it serializes them to JSON and outputs them.
We write a C# program that takes the JSON representations of the JavaClasses from JavaServerProgram as input, uses C#Convert to create a C#Root instance, serializes the C#Root instance to JSON and outputs it.
Then, after JavaServerProgram has run, we run that C# program and finally send the resulting C#Root JSON to the client, where it is deserialized into a C#Root instance and passed to C#ClientRun.
Pro: We use existing code.
Con: We have overhead due to
the conversion being a separate program (it takes time and memory for the OS to manage it),
"media discontinuity": Instead of directly converting the JavaClasses into C#Root, we must serialize them to JSON (in Java) to be able to send them to the converter. (The converter does NOT deserialize them to JavaClasses, though. It processes the JSON directly.)
Version B:
We make a Java clone of C#Convert, i.e.
we duplicate all classes from C#Data in Java, as well as their ability to be serialized to JSON,
we duplicate the conversion algorithm in Java but without using JSON in between, i.e. we directly convert from JavaClasses to C#Root.
Then we extend JavaServerProgram to contain the above clone, i.e. after creating the JavaClasses it converts them into a C#Root instance and serializes it to a single JSON document. Then we send that C#Root JSON to the client, where it is deserialized into a C#Root instance and passed to C#ClientRun.
Pro: We have no separate-program overhead and no "media discontinuity".
Con: We need to maintain the C#Data classes and the conversion algorithm in two languages (C# and Java).
Version C:
We find a way to write code both in C# and Java but compile it to some common intermediate language that runs in one shared environment (like JVM / .NET-VM).
Pro: No duplicate code and no overhead/"media discontinuity".
Con: Cannot see any. (The time needed to get to know this new environment does not count as a con since it will be just invested once.)
Can anyone elaborate on the pros and cons from a practical perspective? For example:
Version A: Will the expected overhead be relevant? Or is it going to be small?
Version B: Is maintaining duplicate code in different languages practical? Is it common? Are there tools to assist? (Maybe there are tools that can automatically convert from C# to Java?)
Version C: Does such an environment as described exist? Which one? Has anyone experience with it?
I would probably prefer the Version B alternative.
As I understand it, that creates a clean separation of concerns between Java and C#, i.e. all Java runs on the server and all C# runs on the client. Both sides then share a common object model for the objects that need to be transferred. Note that the data format should be clearly documented, so that it is obvious which side is at fault if there are any issues.
You might also consider making an entirely separate API between the client and server, even if it happens to look very similar to some existing data structures. That would let you evolve the API without necessarily affecting other systems.
But your question implies that this library is used in other contexts, so I would recommend deciding which language to use in which situations. Otherwise you will keep running into problems like "Code from project K would be perfect for project L, but is in the wrong language", and you risk various staffing issues: holy language wars, conflicts, knowledge gaps, extra training, recruitment difficulties, etc.
Version A does extra work on the server that might or might not cause a noticeable performance overhead. More importantly, it makes debugging harder, since it can be difficult to tell whether an error is in the Java code or in the conversion code, and the server developer may not be able to debug the C# code efficiently.
Version C is a nice thought, but even if it is possible it would be an uncommon solution, so you will likely have many more issues with build systems, compatibility and finding help when problems arise.
Related
Is it possible to integrate the two worlds at least on the data transfer level?
Say I have Java objects which are provided through a Spring WebMVC REST endpoint, and my Dart client accesses these resources with AJAX. It would be nice if the transferred type were known to the Dart client as well, so that I don't have to synchronize the Dart and Java versions of the same type definition and the IDE could give me suggestions and errors if the data access on the client side is invalid.
EDIT
A little more explanation of what I'm trying to do, because it seems I was not clear enough:
On the Java side I define a bean which is converted to JSON by Spring WebMVC + Jackson. So the transfer unit IS already JSON. I can easily access the data dynamically with Dart, but that's not what I want to do.
I want to parse the retrieved data into a Dart class which, as it turns out, is a replica of the original Java bean's class definition. Take a look at JsonObject's explanation on Dart's site, especially the Language abstract class. That's exactly what I'm doing right now: I'm defining an abstract class which describes the JSON data I'm retrieving from the server. This way Dart can report errors if I'm accessing non-existent fields or doing incompatible casts, etc. Of course this can still result in a parse error, but after that I can work with the data in a typed manner.
My problem is that to achieve this I have to manually synchronize the data bean's class definition on the Java side and the abstract class definition on the Dart side. I'm wondering if anybody is working on something like a code generator which creates a Dart class definition from a Java class definition.
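To make the idea concrete, here is a rough sketch of what such a generator could look like on the Java side, using only reflection. Everything in it is an assumption for illustration: `DartClassGenerator`, its `generate` method, the type mapping and the `LanguageBean` example are made-up names, and the emitted text follows the abstract-class style described above, which may need adjusting for newer Dart versions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;

// Hypothetical helper: emits the text of a Dart abstract class from a Java bean's fields.
final class DartClassGenerator {

    // Assumed type mapping; anything unknown falls back to Dart's `dynamic`.
    private static String dartType(Class<?> t) {
        if (t == String.class) return "String";
        if (t == int.class || t == Integer.class || t == long.class || t == Long.class) return "int";
        if (t == double.class || t == Double.class || t == float.class || t == Float.class) return "double";
        if (t == boolean.class || t == Boolean.class) return "bool";
        if (List.class.isAssignableFrom(t)) return "List";
        return "dynamic";
    }

    static String generate(Class<?> bean) {
        StringBuilder out = new StringBuilder();
        out.append("abstract class ").append(bean.getSimpleName()).append(" {\n");
        for (Field f : bean.getDeclaredFields()) {
            // Skip constants and compiler-generated fields; only bean state is exported.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            out.append("  ").append(dartType(f.getType()))
               .append(' ').append(f.getName()).append(";\n");
        }
        return out.append("}\n").toString();
    }
}

// Hypothetical bean standing in for a server-side type.
final class LanguageBean {
    String name;
    int rank;
    List<String> tags;
}
```

Calling `DartClassGenerator.generate(LanguageBean.class)` returns the Dart source as a string, which a build step could write next to the client code. Nested beans, generics and Jackson annotations are not handled here.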
You are asking for an editor feature which would quietly lead to performance degradation of the editor. For any such feature the editor would need to build and maintain an object graph or syntax tree for the Java objects (presumably an AST in the Java world) and then compare it with Dart's syntax tree.
Also, users of other languages (C#, Ruby, etc.) would put forward the same request. There does not seem to be any sane way to validate objects from a different programming world within performance limits.
I can borrow some more points from the Stack Overflow Q&As below on why it's simpler to use JSON/XML than any other way to exchange data between the Java/C# world and the Dart world:
How does Java's serialization work and when it should be used instead of some other persistence technique?
Read a BigInteger serialized from Java into C#
I want to send some complex objects from a Java client to C server via a TCP Socket.
How can I do that?
Fundamentally the question is, "How to serialize/deserialize objects in a
language agnostic manner?" Specifically Java and C in your case. Since you'll
be sending this data over a network, it is also important to take care of network order/endianness issues.
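As a small illustration of the byte-order point, here is a sketch of length-prefixed framing on the Java side. The `Framing` class and the frame layout (4-byte big-endian length followed by a UTF-8 payload) are assumptions chosen for the example, not something prescribed by the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout: 4-byte big-endian length, then the UTF-8 payload.
final class Framing {

    static byte[] encode(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(body.length); // DataOutputStream writes ints high byte first (network order)
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static String decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            byte[] body = new byte[in.readInt()]; // length prefix, read back as big-endian
            in.readFully(body);
            return new String(body, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a real connection the bytes from `encode` would be written to the socket's output stream. The C side would read the four length bytes, convert them with `ntohl`, and then read that many payload bytes.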
I assume you have access to both the client and the server. This means you
get to choose how to serialize the data. (If not, the answer is simple: write
to the specs of what the other side is expecting.)
Personally, I would use Protocol Buffers.
There are Java bindings
and C bindings.
If you don't like Protocol Buffers, there are other options like:
JSON (already mentioned)
YAML
Apache Thrift
XDR
roll your own
...
Write the fields of the Java objects to a string (perhaps JSON), send them via TCP, and have the C program read the string and use it to initialize new C variables on the other end.
This question is pretty old, but in case someone is still looking for a good solution, you can try out the Protocol Buffers implementation mentioned in the previous answer by @Adam Liss: (developers.google.com/protocol-buffers/)
In short, you define each complex message type used by your protocol, and the tool generates C++/Java/Python code which can serialize and deserialize it.
For the same purpose in C, a research project at the Technische Universität München (TUM), Germany, has created a code generator in standard C that can be used with embedded C projects. It is fully compatible (with limitations due to C structs) with Google's protobuf implementation. It works better than the C bindings because it does not need to be linked against any library.
I had issues getting the C bindings to work on the embedded systems I was working with, because they need to be linked with the support library.
This saved my (painful) day on my embedded project, passing complex network data (requests and responses) between an embedded system and an Android app (Java) / desktop app (C++/Qt).
I'd like to create a web API of some kind (I don't have a preference for the protocol), where the server uses Java and the client uses PHP.
I want the request and response to both be objects (instances of classes, not JSON-style hashes). The objects' fields can be primitive types or other objects. I would define all the necessary classes in both the client and server code. PHP and Java have similar object models, so it shouldn't be hard to write corresponding classes in both languages.
To make this work, there would need to be some automated way to serialize an object on one side, and unserialize it on the other. It would need to know which PHP class maps to which Java class, and how to convert the fields. I could write something, but is there an existing protocol for transferring objects like this? Can this be done with SOAP?
Java and PHP objects are not interchangeable. You will have to define the object types on both ends, and the transfer protocol could be anything you like. Serialization and deserialization makes the whole process transparent. The transport medium could be JSON, XML, YAML, or anything else for that matter.
For record-like objects, for example:
{"_type":"MyCoolObjectType", "a":1, "b":2, "c":3}
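On the Java side, a type tag like `_type` can be resolved with a small registry. This is only a sketch: it assumes the JSON has already been parsed into a `Map` by whatever JSON library you use, and `TypeRegistry` and the Java class `MyCoolObjectType` are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping the "_type" tag of a parsed record to a factory.
final class TypeRegistry {
    private final Map<String, Function<Map<String, Object>, Object>> factories = new HashMap<>();

    void register(String typeName, Function<Map<String, Object>, Object> factory) {
        factories.put(typeName, factory);
    }

    // Looks up the factory for the record's "_type" and lets it build the object.
    Object decode(Map<String, Object> record) {
        Object typeName = record.get("_type");
        Function<Map<String, Object>, Object> factory = factories.get(typeName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown _type: " + typeName);
        }
        return factory.apply(record);
    }
}

// Hypothetical Java counterpart of the record shown above.
final class MyCoolObjectType {
    final int a, b, c;

    MyCoolObjectType(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }
}
```

Each class is registered once under its tag, with a factory that reads its fields from the map. The PHP side would keep the equivalent mapping from tag to class, which is the part that has to be maintained in both languages.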
If you want to write once and use everywhere, I'd recommend using the same language on both ends; otherwise you'll need a compiler that can translate between your chosen languages.
A SOAP web service can handle the basic abstraction as long as the request/response is not very complex. You can create the classes in Java and then get the API to export a WSDL for them.
You need to have both sides serialize to the same string format. The PHP and Java serialization formats are different and therefore incompatible. You need a common exchange format, and I recommend that you DON'T use PHP's. However, the PHP serialization functions are fairly simple and are contained in the ext/standard/var.c file in the PHP source, if you choose to use it.
See the following:
Unserialize in Java a serialized php object - A similar question to yours.
http://en.wikipedia.org/wiki/Serialization#Serialization_formats
http://en.wikipedia.org/wiki/XML
XML, API, CSV, SOAP! Understanding the Alphabet Soup of Data Exchange
From http://en.wikipedia.org/wiki/XML (emphasis mine):
Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.
I'm working on a Scala-based script language (internal DSL) that allows users to define multiple data transformations functions in a Scala script file. Since the application of these functions could take several hours I would like to cache the results in a database.
Users are allowed to change the definition of the transformation functions and also to add new functions. However, when the user restarts the application with a slightly modified script, I would like to execute only those functions that have been changed or added. The question is how to detect those changes. For simplicity, let us assume that the user can only adapt the script file, so that any reference to something not defined in this script can be assumed to be unchanged.
In this case what's the best practice for detecting changes to such user-defined functions?
So far I have thought about:
parsing the script file and calculating fingerprints based on the source code of the function definitions
getting the bytecode of each function at runtime and building fingerprints based on this data
applying the functions to some test data and calculating fingerprints on the results
However, all three approaches have their pitfalls.
Writing a parser for Scala to extract the function definitions could be quite some work, especially if you want to detect changes that indirectly affect the behaviour of your functions (e.g. if your function calls another (changed) function defined in the script).
Bytecode analysis could be another option, but I have never worked with those libraries, so I have no idea whether they can solve my problem or how they deal with Java's dynamic binding.
The approach with example data is definitely the simplest one, but has the drawback that different user-defined functions could accidentally be mapped to the same fingerprint if they return the same results for my test data.
Does anyone have experience with one of these "solutions", or can anyone suggest a better one?
The second option doesn't look difficult. For example, with the Javassist library, obtaining the bytecode of a method is as simple as
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.bytecode.CodeAttribute;

// className is the fully qualified name of the class to inspect
CtClass c = ClassPool.getDefault().get(className);
for (CtMethod m : c.getDeclaredMethods()) {
    CodeAttribute ca = m.getMethodInfo().getCodeAttribute();
    if (ca != null) { // null for methods without a body (native or abstract)
        byte[] byteCode = ca.getCode();
        ...
    }
}
So, as long as you assume that the results of your methods depend only on the code of those methods, it's pretty straightforward.
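Turning such a byte array into a fingerprint only needs the standard library. A minimal sketch, assuming SHA-256 is an acceptable hash for your cache keys (`Fingerprint` and `sha256Hex` are names made up for this example):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex-encoded SHA-256 fingerprint of a method's bytecode.
final class Fingerprint {

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // mask to treat the byte as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The result of `Fingerprint.sha256Hex(byteCode)` could be stored next to the cached result and compared on the next run. A method whose fingerprint is unchanged would not need to be re-executed, under the assumption stated above that its result depends only on its own code.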
UPDATE:
On the other hand, since your methods are written in Scala, they probably contain some closures, so that parts of their code reside in anonymous classes, and you may need to trace usage of these classes somehow.
The log4j network adapter sends events as a serialised Java object. I would like to be able to capture this object and deserialise it in a different language (Python). Is this possible?
NOTE: Capturing the network traffic is easy; it's just a TCP socket and reading in a stream. The difficulty is the deserialising part.
Generally, no.
The stream format for Java serialization is defined in this document, but you need access to the original class definitions (and a Java runtime to load them into) to turn the stream data back into something approaching the original objects. For example, classes may define writeObject() and readObject() methods to customise their own serialized form.
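The easy part of the format is the header: every serialization stream starts with a two-byte magic number (0xACED) and a two-byte version (5), both exposed as constants in `java.io.ObjectStreamConstants`. The sketch below checks for that header; the `StreamHeader` class and its method names are made up for illustration, and recognising the header is of course a long way from decoding the objects that follow it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper that recognises the header of a Java serialization stream.
final class StreamHeader {

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // True if the data starts with STREAM_MAGIC followed by STREAM_VERSION.
    static boolean looksLikeJavaSerialization(byte[] data) {
        if (data.length < 4) return false;
        int magic = ((data[0] & 0xff) << 8) | (data[1] & 0xff);
        int version = ((data[2] & 0xff) << 8) | (data[3] & 0xff);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xffff)
                && version == ObjectStreamConstants.STREAM_VERSION;
    }
}
```

A capturing program in any language can apply the same four-byte check to confirm what kind of stream it is reading. Everything after the header depends on class descriptors and possibly custom `writeObject()` logic, as described above.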
(edit: lubos hasko suggests having a little java program to deserialize the objects in front of Python, but the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more. edit2: I could be wrong here, I don't know what gets serialized. If it's just log4j classes you should be fine. On the other hand, it's possible to log arbitrary exceptions, and if they get put in the stream as well my point stands.)
It would be much easier to customise the log4j network adapter and replace the raw serialization with some more easily-deserialized form (for example you could use XStream to turn the object into an XML representation)
Theoretically, it's possible. The Java Serialization, like pretty much everything in Javaland, is standardized. So, you could implement a deserializer according to that standard in Python. However, the Java Serialization format is not designed for cross-language use, the serialization format is closely tied to the way objects are represented inside the JVM. While implementing a JVM in Python is surely a fun exercise, it's probably not what you're looking for (-:
There are other (data) serialization formats that are specifically designed to be language agnostic. They usually work by stripping the data formats down to the bare minimum (number, string, sequence, dictionary and that's it) and thus requiring a bit of work on both ends to represent a rich object as a graph of dumb data structures (and vice versa).
Two examples are JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language).
ASN.1 (Abstract Syntax Notation One) is another data serialization format. Instead of dumbing the format down to a point where it can be easily understood, ASN.1 is self-describing, meaning all the information needed to decode a stream is encoded within the stream itself.
And, of course, XML (eXtensible Markup Language), will work too, provided that it is not just used to provide textual representation of a "memory dump" of a Java object, but an actual abstract, language-agnostic encoding.
So, to make a long story short: your best bet is to either try to coerce log4j into logging in one of the above-mentioned formats, replace log4j with something that does that or try to somehow intercept the objects before they are sent over the wire and convert them before leaving Javaland.
Libraries that implement JSON, YAML, ASN.1 and XML are available for both Java and Python (and pretty much every programming language known to man).
I would recommend moving to a third-party format (by creating your own log4j adapters etc) that both languages understand and can easily marshal / unmarshal, e.g. XML.
In theory it's possible. Now how difficult in practice it might be depends on whether Java serialization format is documented or not. I guess, it's not. edit: oops, I was wrong, thanks Charles.
Anyway, this is what I suggest you do:
Capture the stream from log4j and deserialize the Java object in your own little Java program.
Now that you have the object again, serialize it using your own custom formatter.
Tip: Maybe you don't even have to write your own custom formatter. For example, JSON (scroll down for libs) has libraries for Python and Java, so you could in theory use a Java library to serialize your objects and the equivalent Python library to deserialize them.
Send the output stream to your Python application and deserialize it there.
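The steps above can be sketched roughly as follows. This is not log4j-specific: `DemoEvent` is a made-up stand-in for the real event class (a real bridge would need the log4j classes on its classpath), `LogBridge` is a hypothetical name, and the hand-written JSON escaping only covers quotes and backslashes, so a real JSON library would be the better choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a serialized log event.
final class DemoEvent implements Serializable {
    private static final long serialVersionUID = 1L;
    final String level;
    final String message;

    DemoEvent(String level, String message) {
        this.level = level;
        this.message = message;
    }
}

final class LogBridge {

    // Simulates what arrives on the wire: the event in Java's native format.
    static byte[] toJavaBytes(DemoEvent event) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(event);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Steps 1 and 2: deserialize in Java, then re-serialize as language-neutral JSON.
    static String javaBytesToJson(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            DemoEvent event = (DemoEvent) in.readObject();
            return "{\"level\":\"" + escape(event.level)
                    + "\",\"message\":\"" + escape(event.message) + "\"}";
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize event", e);
        }
    }

    // Minimal escaping for the sketch only; control characters are not handled.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The JSON string returned by `javaBytesToJson` is what would be sent on to the Python application, where `json.loads` can read it.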
Charles wrote:
the problem is that for this
to work, your "little java program"
needs to load the same versions of all
the same classes that it might
deserialize. Which is tricky if you're
receiving log messages from one app,
and really tricky if you're
multiplexing more than one log stream.
Either way, it's not going to be a
little program any more.
Can't you just simply reference Java log4j libraries in your own java process? I'm just giving general advice here that is applicable to any pair of languages (name of the question is pretty language agnostic so I just provided one of the generic solutions). Anyway, I'm not familiar with log4j and don't know whether you can "inject" your own serializer into it. If you can, then of course your suggestion is much better and cleaner.
Well, I am not a Python expert, so I can't comment on how to solve your problem there, but if your program is in .NET you can use IKVM.NET to deserialize Java objects easily. I have tried this by creating a .NET client for log4j log messages written to a socket appender, and it worked really well.
I am sorry, if this answer does not make sense here.
If you can have a JVM on the receiving side and the class definitions for the serialized data, and you only want to use Python and no other language, then you may use Jython:
you would deserialize what you received using the correct Java methods
and then you process what you get with your Python code