Non-RAM Storage

I am learning Java from "Thinking in Java" by Bruce Eckel. I am unable to understand the concept of non-RAM storage.
As the book says:
Non-RAM storage. If data lives completely outside a program, it can
exist while the program is not running, outside the control of the
program. The two primary examples of this are streamed objects, in
which objects are turned into streams of bytes, generally to be sent
to another machine, and persistent objects, in which the objects are
placed on disk so they will hold their state even when the program is
terminated. The trick with these types of storage is turning the
objects into something that can exist on the other medium, and yet can
be resurrected into a regular RAM-based object when necessary. Java
provides support for lightweight persistence, and mechanisms such as
JDBC!
What is lightweight persistence? And what is meant by turning the objects into something that can exist on the other medium, and yet can be resurrected into a regular RAM-based object when necessary?

Persistent data is information that can outlive the program that creates it. The majority of complex programs use persistent data: GUI applications need to store user preferences across program invocations, web applications track user movements and orders over long periods of time, etc. (source provided below)
Here is the answer to your question:
Lightweight persistence is a storage mechanism that requires little or no work from the developer. For example, Java serialization is a form of lightweight persistence because it can be used to persist Java objects directly to a file with very little effort.
I am very happy that you are not just reading the book, but asking questions about what you come across in it. Good luck!
source

There is a process in Java (and other languages) called serialization. Basically, it lets you turn an object into a byte stream, so it can be written to a file, stored in a database, sent to a cloud, etc. The idea is that there is an easy and automatic translation between the stored object and the in-memory RAM object. If you do it yourself, such as writing individual fields to a file or database, you need to come up with a file format or database schema. This is heavyweight storage.
Here is a tutorial on Java serialization: http://www.tutorialspoint.com/java/java_serialization.htm
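To make the "turn it into bytes, then resurrect it" idea concrete, here is a minimal sketch using only the standard java.io classes. The Preferences class, its fields, and the helper names toBytes/fromBytes are made up for illustration; the bytes are kept in memory here, but the same ObjectOutputStream could just as well wrap a file or socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical example class; any class that implements Serializable would do.
class Preferences implements Serializable {
    private static final long serialVersionUID = 1L;
    final String userName;
    final int fontSize;

    Preferences(String userName, int fontSize) {
        this.userName = userName;
        this.fontSize = fontSize;
    }
}

class SerializationDemo {
    // Turn an object into a stream of bytes that can live outside the program.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Resurrect" a regular in-memory object from those bytes.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = toBytes(new Preferences("alice", 14));
        Preferences restored = (Preferences) fromBytes(stored);
        System.out.println(restored.userName + " " + restored.fontSize); // prints "alice 14"
    }
}
```

Note that no file format or schema had to be designed; that is what makes this approach "lightweight".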

Related

Saving object for future use

I am writing a program (a bot) to play a Risk-like game in an AI competition. I'm new to programming so I've used some very basic coding so far. In this game, each round the program receives some information from the game engine. In the program, I have a class BotState, which allows me to handle information from the current round, such as the opponent bot's moves, or the regions currently under my control, etc. This information is put in some ArrayLists. I have some getters to access this information and use them in the main class.
My problem is that each round, the information is overwritten (each round means a new run of the program), so I can only access the information from the current round. What I would like to do is save all of the information each round, so that for example if the game state is at round 10, I can still access the moves that the opponent made on round 8.
I looked for ways to solve this problem, and I came across something named "object serialization". I didn't quite understand how it works, so I would like to know if there is a simpler/better way to do what I want, or if serialization is the way to go. Thanks for your help.
edit: I can't link the program to my disk or a database. I upload the source files of the bot to the game server, so everything has to be in the source files.

Object serialization should be fairly simple for your case.
Simply put, it is a way to store your object on disk and
to later take the data from the disk and recreate your object
in memory in the same state it was in before serializing it.
Another way is to define some sort of representation yourself,
e.g. an XML chunk for each object, and to store those
chunks in an XML file. You can view this as a custom serialization,
but it's still a serialization.
Another way is to store your objects in a database.
All in all, you need some permanent/persistent storage
for your objects (whether it's the disk directly or a DB,
which is again using the disk at the lowest level).

Consider using a modeling framework for your application. The Eclipse Modeling Framework (EMF) comes with a simple XMI serialization built into it. If your model is small and/or simple enough it may be worth it. Have a look at this EMF introduction tutorial and this tutorial on serialization in EMF.
Also, have a look at this question: What's the easiest way to persist java objects?.

Storing Large Amounts of Dictionary-Like Data Within an Application in Java

I fear I may not be truly understanding the utility of database software like MySQL, so perhaps this is an easy question to answer.
I'm writing a program that stores and accesses a bestiary for use in the program. It is a stand-alone application, meaning that it will not connect to the internet or a database (which I am under the impression requires a connection to a server). Currently, I have an enormous .txt file that it parses via a simple pattern (habitat is on every tenth line, starting with the seventh; name is on every tenth line, starting with the first; etc.). This is prone to parsing errors (problems with reading data that is unrecognizable with the specified encoding, as a lot of the data is copy/pasted by lazy data-entry-ists) and I just feel that parsing a giant .txt file every time I want data is horribly inefficient. Plus, I've never seen a deployed program that had a .txt lying around called "All of our important data.txt".
Are databases the answer? Can they be used simply in basic applications like this one? Writing a class for each animal seems silly. I've heard XML can help, too, but I know virtually nothing about it except that it's a mark-up language.
In summary, I just don't know how to store large amounts of data within an application. A good analogy would be: How would you store data for a dictionary/encyclopedia application?

So you are saying that a standalone application without internet access cannot have a database connection? Well, your basic assumption that a DB cannot exist in standalone apps is wrong. Today's web applications use browser-assisted SQL databases to store data. All you need is to experiment rather than speculate. If you need direction, start with the lightweight SQLite.

While databases are undoubtedly a good idea for the kind of application you're describing, I'll throw another suggestion your way, which might suit you if your data doesn't necessarily need to change at all, and there's not a "huge" amount of it.
Java provides the ability to serialise objects, which you could use to persist and retrieve object instance data directly to/from files. Using this simple approach, you could:
Write code to parse your text file into a collection of serialisable application-specific object instances;
Serialise these instances to some file(s) which form part of your application;
De-serialise the objects into memory every time the application is run;
Write your own Java code to search and retrieve data from these objects yourself, for example using ordered collection structures with custom comparators.
This approach may suffice if you:
Don't expect your data to change;
Do expect it to always fit within memory on the JVMs you're expecting the application will be run on;
Don't require sophisticated querying abilities.
Even if one or more of the above things do not hold, it may still suit you to try this approach, so that your next step could be to use a so-called object-relational mapping tool like Hibernate or Castor to persist your serialisable data not in a file, but a database (XML or relational). From there, you can use the power of some database to maintain and query your data.
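As a rough sketch of the serialise-once, load-at-startup steps listed above: the Creature class, its two fields, and the Bestiary helper are invented for illustration, and the text-file parsing step is left out (the entries are built by hand instead). A TreeMap with a case-insensitive comparator stands in for the "ordered collection with custom comparator".

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical bestiary entry; the real one would carry whatever the text file holds.
class Creature implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String habitat;

    Creature(String name, String habitat) {
        this.name = name;
        this.habitat = habitat;
    }
}

class Bestiary {
    // Index the entries by name (ignoring case) and serialise the whole map to one file.
    static void save(List<Creature> creatures, Path file) {
        TreeMap<String, Creature> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Creature c : creatures) {
            byName.put(c.name, c);
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(byName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // De-serialise the map back into memory, e.g. once at application start-up.
    @SuppressWarnings("unchecked")
    static TreeMap<String, Creature> load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (TreeMap<String, Creature>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bestiary", ".ser");
        save(Arrays.asList(new Creature("Griffin", "mountains"),
                           new Creature("Kraken", "deep sea")), file);
        TreeMap<String, Creature> byName = load(file);
        System.out.println(byName.get("kraken").habitat); // prints "deep sea"
        Files.delete(file);
    }
}
```

In a shipped application the serialised file would be generated once at build time and bundled with the program, so the text parsing never runs on the user's machine.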

Inserting to and searching a large amount of data in Java

I am writing a program in Java which tracks data about baseball cards. I am trying to decide how to store the data persistently. I have been leaning towards storing the data in an XML file, but I am unfamiliar with XML APIs. (I have read some online tutorials and started experimenting with the classes in the javax.xml hierarchy.)
The software has two major use cases: the user will be able to add cards and search for cards.
When the user adds a card, I would like to immediately commit the data to the persistent storage. Does the standard API allow me to insert data in a random-access way (or even appending might be okay)?
When the user searches for cards (for example, by a player's name), I would like to load a list from the storage without necessarily loading the whole file.
My biggest concern is that I need to store data for a large number of unique cards (in the neighborhood of thousands, possibly more). I don't want to store a list of all the cards in memory while the program is open. I haven't run any tests, but I believe that I could easily hit memory constraints.
XML might not be the best solution. However, I want to make it as simple as possible to install, so I am trying to avoid a full-blown database with JDBC or any third-party libraries.
So I guess I'm asking if I'm heading in the right direction and if so, where can I look to learn more about using XML in the way I want. If not, does anyone have suggestions about what other types of storage I could use to accomplish this task?

While I would certainly not discourage the use of XML, it does have some drawbacks in your context.
"Does the standard API allow me to insert data in a random-access way"
Yes, in memory. You will have to save the entire model back to file though.
"When the user searches for cards (for example, by a player's name), I would like to load a list from the storage without necessarily loading the whole file"
Unless you're expecting multiple users to be reading/writing the file, I'd probably pull the entire file/model into memory at load and keep it there until you want to save (doing periodic writes in the background is still a good idea).
"I don't want to store a list of all the cards in memory while the program is open. I haven't run any tests, but I believe that I could easily hit memory constraints"
That would be my concern too. However, you could use a SAX parser to read the file into a custom model. This would reduce the memory overhead (as DOM parsers can be a little greedy with memory).
"However, I want to make it as simple as possible to install, so I am trying to avoid a full-blown database with JDBC"
I'd do some more research in this area. I (personally) use H2 and HSQLDB a lot for storage of large amounts of data. These are small, personal database systems that don't require any additional installation (just a Jar file linked to the program) or special server/services.
They make it really easy to build complex searches across the datastore that you would otherwise need to create yourself.
If you were to use XML, I would probably do one of three things:
1 - If you're going to maintain the XML document in memory, I'd get familiar with XPath (simple tutorial & Java's API) for searching.
2 - I'd create a "model" of the data using Objects to represent the various nodes, reading it in using a SAX parser. Writing may be a little more tricky.
3 - Use a simple SQL DB (and Object model) - it will simplify the overall process (IMHO)
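For what it's worth, option 2 could look something like the sketch below. It uses the JDK's built-in SAX parser to stream through the document and keep only the matching cards, rather than building a full DOM. The XML layout (a card element with player and year attributes) and the CardSearch class are assumptions made up for the example, not something from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CardSearch {
    // Streams through the XML once and collects the years of the given player's cards.
    // Only the matches are kept in memory, not the whole document.
    static List<String> yearsFor(String player, String xml) {
        List<String> years = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Assumed layout: <card player="..." year="..."/>
                if ("card".equals(qName) && player.equals(attrs.getValue("player"))) {
                    years.add(attrs.getValue("year"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return years;
    }

    public static void main(String[] args) {
        String xml = "<cards>"
                + "<card player=\"Babe Ruth\" year=\"1933\"/>"
                + "<card player=\"Ty Cobb\" year=\"1911\"/>"
                + "<card player=\"Babe Ruth\" year=\"1916\"/>"
                + "</cards>";
        System.out.println(yearsFor("Babe Ruth", xml)); // prints "[1933, 1916]"
    }
}
```

The same handler works on a file by passing a File or FileInputStream to parse(...). Every search still reads the file from start to finish, which is one reason an embedded SQL database tends to be less work in the long run.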
Additional
As if I hadn't dumped enough on you ;)
If you really want to use XML (and again, I wouldn't discourage you from it), you might consider having a look at an XML-database-style solution
Apache Xindice (apparently retired)
Or you could have a look at what some other people think
Use XML as database in Java
Java: XML into a Database, whats the simplest way?
For example ;)

What is the purpose of Serialization in Java?

I have read quite a number of articles on Serialization and how it is so nice and great but none of the arguments were convincing enough. I am wondering if someone can really tell me what is it that we can really achieve by serializing a class?

Let's define serialization first, then we can talk about why it's so useful.
Serialization is simply turning an existing object into a byte array. This byte array represents the class of the object, the version of the object, and the internal state of the object. This byte array can then be used between JVMs running the same code to transmit/read the object.
Why would we want to do this?
There are several reasons:
Communication: If you have two machines that are running the same code, and they need to communicate, an easy way is for one machine to build an object with information that it would like to transmit, and then serialize that object to the other machine. It's not the best method for communication, but it gets the job done.
Persistence: If you want to store the state of a particular operation in a database, it can be easily serialized to a byte array, and stored in the database for later retrieval.
Deep Copy: If you need an exact replica of an Object, and don't want to go to the trouble of writing your own specialized clone() method, simply serializing the object to a byte array, and then de-serializing it to another object achieves this goal.
Caching: Really just an application of the above, but sometimes an object takes 10 minutes to build, but would only take 10 seconds to de-serialize. So, rather than hold onto the giant object in memory, just cache it out to a file via serialization, and read it in later when it's needed.
Cross JVM Synchronization: Serialization works across different JVMs that may be running on different architectures.
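The deep-copy use is perhaps the least obvious of these, so here is a small sketch of it. The DeepCopy class and its copy method are names invented for the example; it only works when everything reachable from the object is Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class DeepCopy {
    // Serialises the object graph to a byte array and reads it straight back,
    // which yields a copy that shares no mutable state with the original.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<int[]> original = new ArrayList<>();
        original.add(new int[] {1, 2, 3});

        ArrayList<int[]> clone = copy(original);
        clone.get(0)[0] = 99; // modifies only the copy's array

        System.out.println(original.get(0)[0]); // prints "1"
    }
}
```

A shallow copy such as new ArrayList<>(original) would have shared the inner int[] with the original, so the original would have shown 99 here.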

While you're running your application, all of its objects are stored in memory (RAM). When you exit, that memory gets reclaimed by the operating system, and your program essentially 'forgets' everything that happened while it was running. Serialization remedies this by letting your application save objects to disk so it can read them back the next time it starts. If your application is going to provide any way of saving/sharing a previous state, you'll need some form of serialization.

I can share my story and I hope it will give some ideas why serialization is necessary. However, the answers to your question are already remarkably detailed.
I had several projects that need to load and read a bunch of text files. The files contained stop words, biomedical verbs, biomedical abbreviations, words semantically connected to each other, etc. The contents of these files are simple: words!
Now for each project, I needed to read the words from each of these files and put them into different arrays; as the contents of the file never changed, it became a common, however redundant, task after the first project.
So, what I did is that I created one object to read each of these files and to populate individual arrays (instance variables of the objects). Then I serialized the objects and then for the later projects, I simply deserialized them. I didn't have to read the files and populate the arrays again and again.

In essence:
Serialization is the process of
converting a set of object instances
that contain references to each other
into a linear stream of bytes, which
can then be sent through a socket,
stored to a file, or simply
manipulated as a stream of data.
See uses from Wiki:
Serialization has a number of advantages. It provides:
a method of persisting objects which is more convenient than writing their properties to a text file on disk, and re-assembling them by reading this back in.
a method of issuing remote procedure calls, e.g., as in SOAP
a method for distributing objects, especially in software componentry such as COM, CORBA, etc.
a method for detecting changes in time-varying data.

The most obvious is that you can transmit the serialized class over a network,
and the recipient can construct a duplicate of the original instance. Likewise,
you can save a serialized structure to a file system.
Also, note that serialization is recursive, so you can serialize an entire heterogeneous
data structure in one swell foop, if desired.

Serialized objects maintain state in space (they can be transferred over the network, file system, etc.) and in time (they can outlive the JVM that created them).
Sometimes this is useful.

I use serialized objects to standardize the arguments I pass to functions or class constructors. Passing one serialized bean is much cleaner than a long list of arguments. The result is code that is easier to read and debug.

For the simple purpose of learning (notice, I said learning, I did not say best, or even good, but just for the sake of understanding stuff), you could save your data to a text file on the computer, then have a program that reads that info, and based on the file, you could have your program respond differently. If you were more advanced, it wouldn't necessarily have to be a txt file, but something else.
Serializing, on the other hand, puts things directly into computer language. It's like you're telling a Spanish computer something in Spanish, rather than telling it something in French, forcing it to learn French, then save things into its native Spanish by translating everything. Not the most tech-intensive answer, I'm just trying to create an understandable example in a common language format.
Serialization is also faster, because in Java, objects are handled on the heap, and take much longer than if they were represented as primitives on the stack. Speed, speed, speed. And less file processing from a programmer point of view.

One of the classic examples of serialization in daily life is the "Save Game" option in computer games. When the player decides to save their progress, the application writes the current state of the game to a file via serialization, and when the player chooses "Load Game", the serialized file is read and the game state is re-created.

Java: Advice on handling large data volumes. (Part Deux)

Alright. So I have a very large amount of binary data (let's say, 10GB) distributed over a bunch of files (let's say, 5000) of varying lengths.
I am writing a Java application to process this data, and I wish to institute a good design for the data access. Typically what will happen is this:
One way or another, all the data will be read during the course of processing.
Each file is (typically) read sequentially, requiring only a few kilobytes at a time. However, it is often necessary to have, say, the first few kilobytes of each file simultaneously, or the middle few kilobytes of each file simultaneously, etc.
There are times when the application will want random access to a byte or two here and there.
Currently I am using the RandomAccessFile class to read into byte buffers (and ByteBuffers). My ultimate goal is to encapsulate the data access into some class such that it is fast and I never have to worry about it again. The basic functionality is that I will be asking it to read frames of data from specified files, and I wish to minimize the I/O operations given the considerations above.
Examples for typical access:
Give me the first 10 kilobytes of all my files!
Give me byte 0 through 999 of file F, then give me byte 1 through 1000, then give me 2 through 1001, etc, etc, ...
Give me a megabyte of data from file F starting at such and such byte!
Any suggestions for a good design?

Use Java NIO and MappedByteBuffers, and treat your files as a list of byte arrays. Then, let the OS worry about the details of caching, reading, flushing, etc.
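A minimal sketch of what that might look like for the "give me N bytes of file F starting at offset X" case. FrameReader and readFrame are hypothetical names, and the file is mapped afresh on every call purely to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FrameReader {
    // Returns `length` bytes of `file` starting at `offset`, read through a
    // read-only memory mapping so the OS page cache does the buffering.
    // The requested range must lie within the file.
    static byte[] readFrame(Path file, long offset, int length) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] frame = new byte[length];
            map.get(frame); // copy the mapped region into a regular array
            return frame;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small test file whose byte at position i has value i.
        Path file = Files.createTempFile("frames", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);

        byte[] frame = readFrame(file, 10, 3);
        System.out.println(frame[0] + " " + frame[1] + " " + frame[2]); // prints "10 11 12"
    }
}
```

For the sliding-window access pattern (bytes 0 through 999, then 1 through 1000, and so on) it would probably make more sense to map each file once, keep the MappedByteBuffer around, and read from it by position instead of remapping per request.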

@Will
Pretty good results. Reading a large binary file quick comparison:
Test 1 - Basic sequential read with RandomAccessFile.
2656 ms
Test 2 - Basic sequential read with buffering.
47 ms
Test 3 - Basic sequential read with MappedByteBuffers and further frame buffering optimization.
16 ms

Wow. You are basically implementing a database from scratch. Is there any possibility of importing the data into an actual RDBMS and just using SQL?
If you do it yourself you will eventually want to implement some sort of caching mechanism, so the data you need comes out of RAM if it is there, and you are reading and writing the files in a lower layer.
Of course, this also entails a lot of complex transactional logic to make sure your data stays consistent.

I was going to suggest that you follow up on Eric's database idea and learn how databases manage their buffers—effectively implementing their own virtual memory management.
But as I thought about it more, I concluded that most operating systems already do a better job of implementing file system caching than you can likely do without low-level access in Java.
There is one lesson from database buffer management that you might consider, though. Databases use an understanding of the query plan to optimize the management strategy.
In a relational database, it's often best to evict the most-recently-used block from the cache. For example, a "young" block holding a child record in a join won't be looked at again, while the block containing its parent record is still in use even though it's "older".
Operating system file caches, on the other hand, are optimized to reuse recently used data (and reading ahead of the most recently used data). If your application doesn't fit that pattern, it may be worth managing the cache yourself.

You may want to take a look at an open source, simple object database called jdbm - it has a lot of this kind of thing developed, including ACID capabilities.
I've done a number of contributions to the project, and it would be worth a review of the source code if nothing else to see how we solved many of the same problems you might be working on.
Now, if your data files are not under your control (i.e. you are parsing text files generated by someone else, etc...) then the page-structured type of storage that jdbm uses may not be appropriate for you - but if all of these files are files that you are creating and working with, it may be worth a look.

@Eric
But my queries are going to be much, much simpler than anything I can do with SQL. And wouldn't a database access be much more expensive than a binary data read?

This is to answer the part about minimizing I/O traffic. On the Java side, all you can really do is wrap your readers in BufferedReaders (or, since this data is binary, your input streams in BufferedInputStreams). Aside from that, your operating system will handle other optimizations like keeping recently-read data in the page cache and doing read-ahead on files to speed up sequential reads. There's no point in doing additional buffering in Java (although you'll still need a byte buffer to return the data to the client).

I had someone recommend Hadoop (http://hadoop.apache.org) to me just the other day. It looks like it could be pretty nice, and might have some marketplace traction.

I would step back and ask yourself why you are using files as your system of record, and what gains that gives you over using a database. A database certainly gives you the ability to structure your data. Given the SQL standard, it might be more maintainable in the long run.
On the other hand, your file data may not be structured so easily within the constraints of a database. The largest search company in the world :) doesn't use a database for their business processing. See here and here.
