Java serialization alternative with better performance [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Suppose I use the standard Java object serialization to write/read small (< 1K) Java objects to/from a memory buffer. The most critical part is the deserialization, i.e. reading Java objects from the memory buffer (byte array).
Is there any faster alternative to the standard Java serialization for this case?

You might also want to have a look at FST.
It also provides tools for off-heap reading/writing.

Have a look at Kryo.
It's much, much faster than the built-in serialization mechanism (which writes out a lot of strings and relies heavily on reflection), but a bit harder to use.
Edit: R.Moeller below suggested FST, which I had never heard of until now but which looks to be both faster than Kryo and compatible with Java's built-in serialization (which should make it even easier to use), so I'd look at that first.

Try Google protobuf or Thrift.

The standard serialization adds a lot of type information which is then verified when the object is deserialized. When you know the type of the object you are deserializing, this is usually not necessary.
What you could do is create your own serialization method for each class, which just writes all the values of the object to a byte buffer, and a constructor (or factory method, if you swing that way) which takes such a byte buffer and reads all the fields back from it.
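A minimal sketch of that per-class approach, assuming a hypothetical Sample class with three fixed-size fields. Because the reader already knows the type, only the raw values are written: no class descriptor, no type checks.

```java
import java.nio.ByteBuffer;

// Hypothetical example class: writer and reader both know the type,
// so only the field values go into the buffer.
final class Sample {
    // Fixed size: 4 (int) + 8 (long) + 8 (double) bytes.
    static final int SIZE = Integer.BYTES + Long.BYTES + Double.BYTES;

    final int id;
    final long timestamp;
    final double value;

    Sample(int id, long timestamp, double value) {
        this.id = id;
        this.timestamp = timestamp;
        this.value = value;
    }

    // "Deserializing constructor": reads the fields in exactly the order writeTo() wrote them.
    Sample(ByteBuffer in) {
        this.id = in.getInt();
        this.timestamp = in.getLong();
        this.value = in.getDouble();
    }

    void writeTo(ByteBuffer out) {
        out.putInt(id).putLong(timestamp).putDouble(value);
    }
}

class SampleDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(Sample.SIZE);
        new Sample(7, 1_000L, 2.5).writeTo(buf);
        buf.flip(); // switch the buffer from writing to reading
        Sample copy = new Sample(buf);
        System.out.println(copy.id + " " + copy.timestamp + " " + copy.value); // 7 1000 2.5
    }
}
```

The price of skipping the type information is that reading must mirror writing exactly (same order, same types), and any layout change has to be handled by hand.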
But, like AlexR, I wonder whether you really need that. Serialization is usually only needed when the data leaves the program (like getting stored on disk or sent over the network to another program).

Java's standard serialisation is known to be slow, and to use a huge amount of bytes on disk. It is very simple to do your own custom serialisation.
Java's standard serialisation is nice for demo projects, but for the above reasons it is not well suited for professional projects. Furthermore, versioning is not well under your control.
Java provides all you need for custom serialisation; see the demo code in my post at
Java partial (de)serialization of objects
With that approach you can even specify the binary file format, so that it could be read in from C or C#, too.
Another advantage: custom-serialized objects can need less space than they do in main memory (a boolean may take up to 4 bytes in main memory, depending on the JVM and where it is stored, but only 1 byte when custom-serialized as a byte).
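As a sketch of such a hand-rolled format, the hypothetical Settings class below writes a leading format-version byte (so versioning stays under your control), big-endian integers, a length-prefixed UTF-8 string, and the boolean as a single byte. The layout is an assumption chosen for illustration; because it is written down explicitly, a C or C# reader could follow it too.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record with an explicit, documented layout (big-endian):
//   byte formatVersion | int id | int nameLength | UTF-8 name bytes
//   | byte active (0/1) | int priority (present from version 2 on)
final class Settings {
    static final byte FORMAT_VERSION = 2;

    int id;
    String name = "";
    boolean active;
    int priority = 5; // default for data written before version 2

    void write(DataOutputStream out) throws IOException {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        out.writeByte(FORMAT_VERSION);
        out.writeInt(id);
        out.writeInt(utf8.length);
        out.write(utf8);
        out.writeByte(active ? 1 : 0); // one byte on disk
        out.writeInt(priority);
    }

    static Settings read(DataInputStream in) throws IOException {
        int version = in.readByte();
        if (version < 1 || version > FORMAT_VERSION) {
            throw new IOException("Unsupported format version: " + version);
        }
        Settings s = new Settings();
        s.id = in.readInt();
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        s.name = new String(utf8, StandardCharsets.UTF_8);
        s.active = in.readByte() != 0;
        if (version >= 2) {
            s.priority = in.readInt(); // field added in version 2
        }
        return s;
    }

    // Convenience helpers for in-memory round trips.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    static Settings fromBytes(byte[] data) throws IOException {
        return read(new DataInputStream(new ByteArrayInputStream(data)));
    }
}

class SettingsDemo {
    public static void main(String[] args) throws IOException {
        Settings s = new Settings();
        s.id = 42;
        s.name = "abc";
        s.active = true;
        byte[] bytes = s.toBytes();
        Settings copy = Settings.fromBytes(bytes);
        System.out.println(bytes.length + " bytes, priority " + copy.priority); // 17 bytes, priority 5
    }
}
```

Old (version 1) files still load, with priority falling back to its default, because the reader branches on the version byte rather than on class metadata.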
If different project partners have to read your serialized data, Google's Protobuf is an alternative to look at.

Related

Serialization InvalidClass best practice

I am making an application in Java which uses serialization to store information in files. The trouble I ran into is that every time I update one of the classes being stored, I obviously get an InvalidClassException. The process I follow for now is to just rm all the files and rebuild them. Obviously that's tedious with 5 users, and I couldn't continue it with 10. What's the standard best practice when updating serialized objects so as not to lose the information in the files?

Mostly?
Stop using Java's baked-in serialization. It sucks. This isn't just an opinion: the OpenJDK engineers themselves routinely raise fairly serious eyebrows when the topic of Java's baked-in serialization mechanism (ObjectInputStream / ObjectOutputStream) comes up. In particular:
It is binary.
It is effectively unspecified; you will never be reading or writing it with anything other than java code.
Assuming multiple versions are involved (and they are; that's what your question is all about), it is extremely convoluted and requires advanced Java knowledge to spin together some tests ensuring that things are backwards/forwards compatible as desired.
The format is not particularly efficient even though it is binary.
The API is weirdly un-Java-like (with structural typing even; that's... bizarre).
So what should I do?
You use an explicit serializer: a library that you include which does the serialization. There are many options. You can use GSON or Jackson to turn your object into a JSON string, and then store that. JSON is textual, fairly easy to read, and can be read and modified by just about any language. Because you 'control' what happens, it's a lot simpler to tweak the format and define what is supposed to happen (e.g. if you add a new field in the code, you can specify what its default should be, for example via a field initializer or your Jackson or GSON configuration, and that's the value that you get when you read in a file written with a version of your class that didn't have that field).
JSON is not efficient on disk at all, but it's trivial to wrap your writes and reads with GZIPOutputStream / GZIPInputStream if that's an issue.
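A sketch of that gzip wrapping, assuming the JSON string has already been produced by GSON or Jackson; only the compression is shown, using java.util.zip, and the helper names are made up:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipJsonFiles {
    // 'json' is assumed to come from e.g. Gson's toJson(...) or Jackson's
    // ObjectMapper.writeValueAsString(...); this sketch only adds compression.
    static void writeCompressed(Path file, String json) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8)) {
            w.write(json);
        } // closing the writer also finishes the gzip stream
    }

    static String readCompressed(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // readAllBytes: Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("model", ".json.gz");
        writeCompressed(file, "{\"name\":\"example\",\"count\":3}");
        System.out.println(readCompressed(file));
    }
}
```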
An alternative is protobuf. It is more effort, but you end up with a binary data format that is fairly compact even if not compressed, that can still be read and written from many, many languages, and that also parses way faster (this is usually irrelevant, since computers are so fast that the bottleneck will be the network or disk; but if you're reading this stuff on battery-powered Raspberry Pis or whatnot, it matters).
I really want to stick with java's baked-in serialization
Read the docs, then. The specific part you want here is what serialVersionUID is all about, but there are so many warts and caveats that you should most definitely not just put an svuid in and move on with life; you'll run into the next weird bug in about 5 seconds. Read it all, experiment, attempt to understand it fully.
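To give a flavour of those caveats, here is a sketch with a hypothetical User class whose nickname field is imagined as having been added in a later version. Keeping serialVersionUID fixed lets older files load, but a field that has no value in the stream arrives as null, because field initializers and constructors of a Serializable class are not run during deserialization; a private readObject hook has to repair it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class User implements Serializable {
    // Must stay the same across versions, or old files fail with InvalidClassException.
    private static final long serialVersionUID = 1L;

    String name;
    String nickname = ""; // imagined as added in version 2; this initializer is skipped on deserialization

    // Found by reflection and called by ObjectInputStream for this class.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // fields missing from the stream stay null/0/false
        if (nickname == null) {
            nickname = ""; // repair the default by hand
        }
    }

    static User roundTrip(User u) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(u);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (User) in.readObject();
        }
    }
}
```

Within one program you can only approximate the old-file case, e.g. by round-tripping a User whose nickname is null; a real test needs bytes written by the old class version, which is exactly the testing pain described above.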
Then give up, realize it's a mess and ridiculously complicated to test properly, and use one of the above options.

When to use which Data Structure? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am studying Data Structures in a Fundamentals of Software Development course. I have encountered the following data structures:
Structs
Arrays
Lists
Queues
Hash Tables
... among others. I have a good understanding of how they work, but I'm struggling to understand when and where to use them.
I can identify the use of the Queue Data structure, as this would be helpful in printer and/or thread queuing and prioritizing.
Knowing the strengths and weaknesses of a data structure and implementing it in code are different things, and I am finding the former difficult.
What is a simple example of the use of each of the data structures listed above?
For example:
Queue: first-in, first-out → used for printer queue to queue docs

I had trouble understanding them when I first started programming, so I decided to give you a heads-up to start with.
I am trying to be as simple as possible. See the Oracle docs for further details.
Struct: whenever you need an object-like structure in which you group related data, use a struct. Java has no structs as such, though; classes are used in their place.
Arrays: arrays occupy contiguous memory. Whenever you want constant-time access by index, use arrays; unlike linked lists, they are very fast for this.
But the drawback with arrays is that you need to know the size at initialization time. Also, arrays do not support higher-level methods such as add(), remove(), clear(), contains(), indexOf() etc.
List: an interface which can be implemented using arrays (ArrayList) or linked lists (LinkedList). Both support all the higher-level methods mentioned above.
Also, an ArrayList resizes itself whenever it runs out of space. You can specify the initial capacity with which its underlying array is created, but whenever that limit is reached, it creates a bigger array and copies the contents of the old one into it. (A LinkedList simply allocates a new node for each element, so it never needs to resize.)
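A short illustration of the array/ArrayList difference in Java (the values are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;

class ArrayVsList {
    static int[] makeArray() {
        // Size is fixed when the array is created; unset slots hold 0.
        int[] scores = new int[3];
        scores[0] = 90;
        scores[2] = 75;
        // scores[3] = 60; would throw ArrayIndexOutOfBoundsException
        return scores;
    }

    static List<Integer> makeList() {
        // The initial capacity 2 is only a starting size, not a limit.
        List<Integer> scores = new ArrayList<>(2);
        scores.add(90);
        scores.add(80);
        scores.add(75); // exceeds the capacity: a bigger backing array is allocated and filled
        scores.remove(Integer.valueOf(80)); // remove by value; remove(int) would remove by index
        return scores;
    }

    public static void main(String[] args) {
        int[] a = makeArray();
        List<Integer> l = makeList();
        System.out.println(a[1] + " " + a.length);                         // 0 3
        System.out.println(l + " " + l.contains(75) + " " + l.indexOf(75)); // [90, 75] true 1
    }
}
```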
Queue or Stack: these describe an access discipline (an abstract data type) rather than a concrete storage structure. If you want a FIFO implementation, you implement a Queue on top of either an array or a linked list (yes, you can implement this technique on both of these data structures).
https://en.wikibooks.org/wiki/Data_Structures/Stacks_and_Queues
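To make the "same technique on both structures" point concrete: in the JDK, ArrayDeque (backed by a resizable array) and LinkedList (backed by linked nodes) both implement Queue, and ArrayDeque can also serve as a stack. A small sketch with made-up document names:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

class QueueAndStack {
    // Removes elements until the queue is empty, concatenating them in removal order.
    static String drain(Queue<String> q) {
        StringBuilder sb = new StringBuilder();
        while (!q.isEmpty()) {
            sb.append(q.poll());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Queue<String> arrayBacked = new ArrayDeque<>(); // resizable array underneath
        Queue<String> linked = new LinkedList<>();      // linked nodes underneath
        for (String doc : new String[] {"doc1", "doc2", "doc3"}) {
            arrayBacked.offer(doc);
            linked.offer(doc);
        }
        System.out.println(drain(arrayBacked)); // doc1doc2doc3 (FIFO)
        System.out.println(drain(linked));      // doc1doc2doc3 (same order)

        Deque<String> stack = new ArrayDeque<>(); // LIFO via push/pop
        stack.push("doc1");
        stack.push("doc2");
        System.out.println(stack.pop()); // doc2
    }
}
```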
HashMap: a HashMap is used whenever you want to store key-value pairs. None of the other data structures mentioned gives you efficient lookup by key for this purpose. A key can be anything from a String to any Object (note that it has to be an object and cannot be a primitive; an int key is autoboxed to Integer), and a value can also be any object.
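A minimal HashMap sketch with hypothetical names as keys and phone extensions as values; the int values are autoboxed to Integer:

```java
import java.util.HashMap;
import java.util.Map;

class PhoneBook {
    static Map<String, Integer> build() {
        Map<String, Integer> extensions = new HashMap<>();
        extensions.put("alice", 4211); // int 4211 is autoboxed to Integer
        extensions.put("bob", 4305);
        extensions.put("alice", 4299); // same key: the old value is replaced
        return extensions;
    }

    public static void main(String[] args) {
        Map<String, Integer> ext = build();
        System.out.println(ext.get("alice"));              // 4299
        System.out.println(ext.getOrDefault("carol", -1)); // -1 (no such key)
        System.out.println(ext.size());                    // 2
    }
}
```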
Google each data structure for more details.

It depends on what you need. If you read and learn more about these data structures you will find convenient ways for their implementation.
Maybe read this book? http://www.amazon.com/Data-Structures-Abstraction-Design-Using/dp/0470128704

All these data structures are used based on the needs of the program. Try to find the advantages of one data structure over another; that should make things clearer for you. What I say might not be very clear, but I'll give it a shot.
For example:
Structs are used to create a data type. Say you want a data type for a book that holds the book's name and related details: a Book structure.
Lists are easier to traverse in both directions if you use (doubly) linked lists, and are sometimes better than arrays. Queues, well, you can imagine them as real-life queues: first in, first out. So you can use them when you need that ordering.
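Since Java has no struct keyword, the Book structure mentioned above would typically be a small class that only groups fields (on Java 16 or later, a record does the same with less code). A sketch with assumed field names:

```java
// Groups related data about a book into one type, the way a struct would in C.
final class Book {
    final String title;
    final String author;
    final int year;

    Book(String title, String author, int year) {
        this.title = title;
        this.author = author;
        this.year = year;
    }

    @Override
    public String toString() {
        return title + " by " + author + " (" + year + ")";
    }
}

class BookDemo {
    public static void main(String[] args) {
        Book b = new Book("Dune", "Frank Herbert", 1965);
        System.out.println(b); // Dune by Frank Herbert (1965)
    }
}
```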
Like I said, looking for advantages of one over the other should get things clear for you.

java embedded library on-disk key-value database [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
What I think I'm looking for is a no-SQL, library-embedded, on-disk (i.e. not in-memory) database that's accessible from Java (and preferably runs inside my instance of the JVM). That's not really much of a database, and I'm tempted to roll my own. Basically I'm looking for the "should we keep this in memory or put it on disk" portion of a database.
Our model has grown to several gigabytes. Right now this is all done in memory, meaning we're pushing the JVM to upward of several gigabytes. It's currently all stored in a flat XML file, serialized and deserialized with XStream and compressed with Java's built-in gzip libraries. That worked well while our model stayed under 100MB, but now that it's larger than that, it's becoming a problem.
Loosely speaking, that model can be broken down as:
Project
configuration component (directed-acyclic-graph), not at all database friendly
a list of a dozen "experiment" structures
each containing a list of about a dozen "run-model" structures.
each run-model contains hundreds of megs of data. Once written they are never edited.
What I'd like to do is have something that conforms to a map interface, of guid -> run-model. This mini-database would keep a flat table of these objects. On our experiment model, we would replace the list of run-models with a list of guids, and add, at the application layer, a get call to this map, which would pull it off the disk and into memory.
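If you do roll your own, the core of the idea can be sketched as a map-like store keyed by GUID, with one file per run-model that is read from disk only on get(). This is a hypothetical illustration: RunModelStore is a made-up name, turning a run-model into bytes (e.g. with XStream) is assumed to happen elsewhere, and it has none of a real database's caching, eviction, or crash safety.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical roll-your-own store: one file per GUID, values kept as bytes on disk.
final class RunModelStore {
    private final Path dir;

    RunModelStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    private Path fileFor(UUID id) {
        return dir.resolve(id + ".bin");
    }

    void put(UUID id, byte[] serializedRunModel) {
        try {
            Files.write(fileFor(id), serializedRunModel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the value from disk only when asked; returns null for unknown ids.
    byte[] get(UUID id) {
        Path f = fileFor(id);
        if (!Files.exists(f)) {
            return null;
        }
        try {
            return Files.readAllBytes(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RunModelStoreDemo {
    public static void main(String[] args) throws IOException {
        RunModelStore store = new RunModelStore(Files.createTempDirectory("run-models"));
        UUID id = UUID.randomUUID();
        store.put(id, new byte[] {1, 2, 3}); // stand-in for a serialized run-model
        System.out.println(store.get(id).length); // 3
    }
}
```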
That means we can keep configuration of our program in XML (which I'm very happy with) and keep a table of the big data in a DBMS that will keep us from consuming multi-GB of memory. On program start and exit I could then load and unload the two portions of our model (the config section in XML, and the run-models in the database format) from an archiving format.
I'm sort of feeling gung-ho about this, and think that I could probably implement it with some of XStream's XML inspection strategies and a custom map implementation, but a voice in the back of my head is telling me I should find a library to do it instead.
Should I roll my own or is there a database that's small enough to fit this bill?
Thanks guys,
-Geoff

http://www.mapdb.org/
Also take a look at this question: Alternative to BerkeleyDB?

Since MapDB is a possible solution for your problem, Chronicle Map is also worth consideration. It's an embeddable Java key-value store, optionally persistent, offering a very similar programming model to MapDB: it is also accessed via the vanilla java.util.Map interface, with transparent serialization of keys and values.
The major difference is that according to third-party benchmarks, Chronicle Map is times faster than MapDB.
Regarding stability, no bugs have been reported about the Chronicle Map data storage for months now, while it is in active use in many projects.
Disclaimer: I'm the developer of Chronicle Map.

Using Java, what's the simplest method of writing a file to disk in a format that is easily readable by other applications

I've been asked to "write a file to disk in a format that is easily readable by other applications. There is no requirement for the file to be human readable". The data to be written to file is a combination of integer, string and date variables.
I can't quite figure out what the aim of this question is and what the correct answer should be.
What are the core considerations to be made in order to write a file to disk in a format that is easily readable by other applications (using the simplest possible method).
No this is not homework.

This is a pretty vague requirement. If the other applications are also written in Java, then Java serialization may be the best approach.
EDIT: @leftbrain, in answer to your comment, I think I would lean toward XML; it was designed to support basic interoperability among applications. The three kinds of data mentioned (integer, string, and date) can generally be represented exactly with no special tricks, and there is good support for XML processing across most programming languages. However, each data type (in the abstract) can present challenges. I would have asked the following:
What range of integer values need to be supported?
What assumptions (if any) can be made about the string data? Does the full Unicode character set need to be supported?
What range of dates, and what calendar systems, need to be supported?
Is there a well-defined structure to the data?
Is the performance (in terms of time and/or memory) of the write or read operation an issue? What does "easily readable" mean?
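A sketch of the XML option using the JDK's built-in StAX writer (javax.xml.stream), for a hypothetical record with an integer, a string and a date. Writing the date in ISO-8601 form (yyyy-MM-dd) is an assumption, but a common choice that most languages can parse:

```java
import java.io.StringWriter;
import java.time.LocalDate;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class XmlRecordWriter {
    // Writes one hypothetical record: <record id="..."><name>...</name><created>...</created></record>
    static String toXml(int id, String name, LocalDate created) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("record");
        w.writeAttribute("id", Integer.toString(id));
        w.writeStartElement("name");
        w.writeCharacters(name); // special characters such as & and < are escaped
        w.writeEndElement();
        w.writeStartElement("created");
        w.writeCharacters(created.toString()); // ISO-8601, e.g. 2024-01-31
        w.writeEndElement();
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(toXml(42, "A & B", LocalDate.of(2024, 1, 31)));
    }
}
```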

What the interviewer is looking for is to find out what experience you have in this area, or what you might consider when implementing such a solution. It's an open-ended question which has no right answer; it rather depends on the requirements.
Some suggestions: use serialization of a data structure. Here are a few, compared: http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
Use an SQL or NoSQL database. Here are some NoSQL databases. http://nosql-database.org/
Write the data to disk using DataOutputStream (IO), heap or direct ByteBuffers, or memory-mapped files. This can work well for simple cases like the one suggested in the question. As your requirements get more complicated, you might consider other options.
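For the DataOutputStream option, a sketch of one record holding the three data types from the question. The layout (int, writeUTF string, date as an epoch-day long) is an assumption for illustration; a reader in another language must know it, including that DataOutputStream is big-endian and that writeUTF uses Java's modified UTF-8 with a 2-byte length prefix.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;

// Hypothetical record type holding an integer, a string and a date.
final class Rec {
    final int id;
    final String name;
    final LocalDate date;

    Rec(int id, String name, LocalDate date) {
        this.id = id;
        this.name = name;
        this.date = date;
    }
}

class BinaryRecordFile {
    // Layout (big-endian): int id | UTF name (2-byte length + modified UTF-8)
    // | long date as days since 1970-01-01 (LocalDate.toEpochDay)
    static void write(Path file, Rec r) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(r.id);
            out.writeUTF(r.name);
            out.writeLong(r.date.toEpochDay());
        }
    }

    static Rec read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int id = in.readInt();
            String name = in.readUTF();
            LocalDate date = LocalDate.ofEpochDay(in.readLong());
            return new Rec(id, name, date);
        }
    }
}
```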

If you need to support multiple languages you can use xml.

Java: Serializing a huge amount of data to a single file

I need to serialize a huge amount of data (around 2 GB) consisting of small objects into a single file in order to be processed later by another Java process. Performance is kind of important. Can anyone suggest a good method to achieve this?

Have you taken a look at google's protocol buffers? Sounds like a use case for it.

I don't know why Java Serialization got voted down, it's a perfectly viable mechanism.
It's not clear from the original post, but is all 2G of data in the heap at the same time? Or are you dumping something else?
Out of the box, Serialization isn't the "perfect" solution, but if you implement Externalizable on your objects, Serialization can work just fine. Serialization's big expense is figuring out what to write and how to write it. By implementing Externalizable, you take those decisions out of its hands, thus gaining quite a boost in performance, and a space savings.
While I/O is a primary cost of writing large amounts of data, the incidental costs of converting the data can also be very expensive. For example, you don't want to convert all of your numbers to text and then back again; it's better to store them in a more native format if possible. ObjectOutputStream and ObjectInputStream have methods to write/read Java's primitive types.
If all of your data is designed to be loaded in to a single structure, you could simply do ObjectOutputStream.writeObject(yourBigDatastructure), after you've implemented Externalizable.
However, you could also iterate over your structure and call writeObject on the individual objects.
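A sketch of the Externalizable route with a hypothetical Tick class, writing a count followed by each object individually. The periodic reset() is an addition not mentioned above: ObjectOutputStream remembers every object it has written so it can emit back-references, and reset() clears that table so millions of small objects are not all kept alive. An in-memory byte array keeps the sketch self-contained; for the 2 GB case the stream would wrap a buffered file stream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical small object: writeExternal/readExternal decide the exact bytes
// with primitive writes instead of reflective field handling.
class Tick implements Externalizable {
    long time;
    double price;

    public Tick() {} // Externalizable requires a public no-arg constructor

    Tick(long time, double price) {
        this.time = time;
        this.price = price;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(time);
        out.writeDouble(price);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        time = in.readLong();
        price = in.readDouble();
    }
}

class TickStream {
    static final int RESET_EVERY = 10_000;

    static byte[] writeAll(List<Tick> ticks) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(ticks.size()); // count first, then each object on its own
            for (int i = 0; i < ticks.size(); i++) {
                out.writeObject(ticks.get(i));
                if ((i + 1) % RESET_EVERY == 0) {
                    out.reset(); // drop the back-reference table of already written objects
                }
            }
        }
        return bytes.toByteArray();
    }

    static List<Tick> readAll(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            List<Tick> ticks = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                ticks.add((Tick) in.readObject());
            }
            return ticks;
        }
    }
}
```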
Either way, you're going to need some "objectToFile" routine, perhaps several. And that's effectively what Externalizable provides, as well as a framework to walk your structure.
The other issue, of course, is versioning, etc. But since you implement all of the serialization routines yourself, you have full control over that as well.

The simplest approach that comes immediately to my mind is using a memory-mapped NIO buffer (java.nio.MappedByteBuffer). Use a single buffer (approximately) corresponding to the size of one object and flush/append them to the output file when necessary. Memory-mapped buffers are very efficient.
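A sketch of appending through a memory-mapped buffer, assuming hypothetical fixed-size records (a long plus a double). It maps one region per batch of records rather than per object, a deviation chosen here to keep the number of mappings small; whether that matters for your workload is worth measuring.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedAppender {
    // Hypothetical fixed-size record: one long timestamp + one double value.
    static final int RECORD_SIZE = Long.BYTES + Double.BYTES;

    // Appends a batch of records by mapping a region at the current end of the file.
    static void append(Path file, long[] times, double[] values) throws IOException {
        if (times.length != values.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        if (times.length == 0) {
            return;
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long start = ch.size();
            long size = (long) RECORD_SIZE * times.length;
            // Grow the file first: FileChannel.map does not guarantee behaviour for
            // regions past end-of-file, while a positional write there extends the file.
            ch.write(ByteBuffer.allocate(1), start + size - 1);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, start, size);
            for (int i = 0; i < times.length; i++) {
                buf.putLong(times[i]).putDouble(values[i]);
            }
            buf.force(); // flush the modified pages to the storage device
        }
    }
}
```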

Have you tried java serialization? You would write them out using an ObjectOutputStream and read 'em back in using an ObjectInputStream. Of course the classes would have to be Serializable. It would be the low effort solution and, because the objects are stored in binary, it would be compact and fast.
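The low-effort version, sketched with a hypothetical Serializable Item class and a whole ArrayList written in one writeObject call:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;

// Hypothetical element type; implementing Serializable is all that is required.
class Item implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int qty;

    Item(String name, int qty) {
        this.name = name;
        this.qty = qty;
    }
}

class ItemFile {
    static void save(Path file, ArrayList<Item> items) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(items); // writes the list and every Item it references
        }
    }

    @SuppressWarnings("unchecked") // the file is trusted to contain an ArrayList<Item>
    static ArrayList<Item> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (ArrayList<Item>) in.readObject();
        }
    }
}
```

Only read back files you trust: ObjectInputStream instantiates whatever serializable classes the stream names, which makes crafted input a known security risk.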

I developed JOAFIP as a database alternative.

Apache Avro might also be useful. It's designed to be language-independent and has bindings for the popular languages.
Check it out.

Protocol buffers make sense. Here's an excerpt from their wiki: http://code.google.com/apis/protocolbuffers/docs/javatutorial.html
Getting More Speed
By default, the protocol buffer compiler tries to generate smaller files by using reflection to implement most functionality (e.g. parsing and serialization). However, the compiler can also generate code optimized explicitly for your message types, often providing an order of magnitude performance boost, but also doubling the size of the code. If profiling shows that your application is spending a lot of time in the protocol buffer library, you should try changing the optimization mode. Simply add the following line to your .proto file:
option optimize_for = SPEED;
Re-run the protocol compiler, and it will generate extremely fast parsing, serialization, and other code.

You should probably consider a database solution; optimizing how information is stored is all databases do, and if you use Hibernate, you keep your object model as is and don't really even think about your DB (I believe that's why it's called Hibernate: just store your data off, then bring it back).

If performance is very important, then you need to write it yourself. You should use a compact binary format, because with 2 GB the disk I/O operations are very important. If you use any human-readable format like XML or other scripts, you increase the data size by a factor of 2 or more.
Depending on the data, it can be sped up if you compress the data on the fly with a low (fast) compression level.
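Compressing on the fly with a fast, low compression level can be sketched with java.util.zip; Deflater.BEST_SPEED (level 1) trades compression ratio for throughput. Whether this pays off depends on the data, so it is worth measuring:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class FastCompression {
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // level 1: cheapest to compute
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(data);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not released by the stream's close()
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // Java 9+
        }
    }
}
```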
A total no-go is Java serialization, because it checks every object to see whether it is a reference to an object already seen, and keeps a table of all those objects while writing and reading.


Java is a programming language and computing platform first released by Sun Microsystems in 1995.