Is there a portable shared memory mechanism in Java? - java

We are looking for a shared memory mechanism to transfer large amounts of data between processes without copying, in Java. It has to be portable (including Windows). Is there such a thing? We were thinking about using mmap-ed files, as they are portable, however their contents are written to disk which is not desirable. Are there alternatives?
Otherwise, Windows has page-file-backed sections; is there an easy way to use these in Java? We are probably ok if we can use some other shared memory mechanism on *nix and those on Windows.

There are a couple of solutions in OpenHFT. These support a rolling queue which can be read and written to concurrently, and a SharedHashMap which is entirely off heap. On Linux you can "write" to a tmpfs filesystem and on Windows you can use a RAM drive.
These libraries support replication between machines over TCP. (The replication for Chronicle has been used for a while but for SharedHashMap, replication is still in beta)
While this is portable across OSes, it uses some features internal to OpenJDK/HotSpot JVMs and so it doesn't work on the IBM JVM AFAIK.
Note: the libraries support reading and writing data using a form of serialization which doesn't create garbage, or using in-place, off-heap data structures, i.e. you don't need to deserialize the whole object to access a portion of it.
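To make the tmpfs idea concrete, here is a minimal sketch using only the JDK's own memory mapping. It is not the OpenHFT API. The class name SharedRegion and the choice of /dev/shm are assumptions for illustration; /dev/shm is usually a tmpfs mount on Linux, and the sketch falls back to the default temp directory elsewhere (on Windows you would point it at a RAM drive yourself).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch only: plain JDK memory mapping, not the OpenHFT API.
final class SharedRegion {
    // Assumption: /dev/shm is a tmpfs mount (typical on Linux). Elsewhere we
    // fall back to the default temp directory, which is usually disk-backed.
    static Path defaultDir() {
        Path shm = Paths.get("/dev/shm");
        return Files.isDirectory(shm) && Files.isWritable(shm)
                ? shm
                : Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Maps `size` bytes of `file` read-write, creating the file if needed.
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}
```

Two processes that call SharedRegion.map on the same path get views of the same file; one can putLong(0, value) and the other getLong(0). The Java documentation leaves the timing of cross-mapping visibility OS-dependent, and nothing here orders or synchronizes the accesses, which is the part the libraries above take care of.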

Related

Share memory between jars

I have several private jars in my project and I need them to write and read values to\from the same b-dimensional array.
EDIT: I now have another requirement: to allow a C++\MFC app to write to this shared memory. So, all in all, I have a C++ app writing to the shared memory and jars reading from it. What is the best way to achieve it? Maybe a web service?
If yes, how do I implement that?
You can:
run all the jars in the same JVM. This way they will share memory natively.
use memory mapped files, possibly on a ram drive if you don't need persistence.
Chronicle has a number of libraries to make using memory mapped files easier as it
offers thread safe operations into native memory.
builds some useful data structures such as a key-value store and persisted queue
supports more efficient serialization and zero-copy access to native memory.
For information on all the Apache 2.0 Open Source libraries
BTW Chronicle also supports distribution of its data structures which means the jars don't have to be running on the same machine.
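For the memory-mapped file option, the Java and C++ sides have to agree on a byte layout. A minimal sketch of one such layout follows. It assumes the array is two-dimensional with 32-bit int cells stored row-major in little-endian order; none of that is stated in the question, and the class name SharedGrid is made up for illustration. It is not how Chronicle does it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: a fixed-size 2-D int array laid out row-major in a byte buffer.
// Assumptions (not from the question): 32-bit int cells and little-endian
// byte order, so a C++ writer on x86 can store plain int32_t values.
final class SharedGrid {
    private final ByteBuffer buf;
    private final int rows;
    private final int cols;

    SharedGrid(ByteBuffer backing, int rows, int cols) {
        if (backing.capacity() < (long) rows * cols * Integer.BYTES) {
            throw new IllegalArgumentException("buffer too small");
        }
        // duplicate() shares the bytes but has its own position and byte order.
        this.buf = backing.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        this.rows = rows;
        this.cols = cols;
    }

    // Byte offset of cell (r, c) from the start of the buffer.
    private int offset(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            throw new IndexOutOfBoundsException(r + "," + c);
        }
        return (r * cols + c) * Integer.BYTES;
    }

    int get(int r, int c) {
        return buf.getInt(offset(r, c));
    }

    void set(int r, int c, int value) {
        buf.putInt(offset(r, c), value);
    }
}
```

In real use the backing buffer would be a MappedByteBuffer obtained from FileChannel.map, and the C++ app would map the same file and treat it as int32_t[rows][cols]. The sketch does no locking, so readers can observe a half-updated array unless the two sides agree on some protocol.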

Portable persistence queue -> uploader

I need to implement a disk-backed queue which can accept real-time profiling data from multiple threads and then upload that data over potentially faulty transports. Initially targeted at Java but long-term we will need to use the same mechanism in Objective-C, Flash, JavaScript. Targeted at Android Java as well as desktop.
This will be contained within a single process, so an MQ solution is probably out. Performance is a point of significant consideration, meaning we'd trade some reliability for performance.
I'm curious about two things:
Given the above architecture, is there any available technology that'll completely or partially solve this problem?
Given the goal of eventually re-implementing or ideally re-using this mechanism in different platforms, is there any way to build this in a way that can be easily used in say both Objective-C & Android Java?
How's this architecture look?
If you want to keep a limited amount of data (a circular log) and can reserve a fixed amount of persistent storage for it, then the most effective solution is memory-mapped buffers. The persister is simply a cache of several buffers, serving both the profiling queue and the uploader.
When reimplementing it on other platforms, chances are that the platform has no mapping facility. In this case, the buffers can be read from and written to the file directly. This can be less efficient than mapping to memory, but is still no less efficient than other solutions (e.g. an embedded database).
As for the architecture, the picture does not show data being read back from the persister (otherwise, what is the persister for?). The profiling queue actually comprises all of the data (including the persisted part), and what the picture calls the profiling queue is just the buffers in main memory. These need not be contiguous, so "buffer cache" is a better name than "queue".
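A circular log over a fixed-size buffer could look roughly like the sketch below. The layout (an 8-byte write counter followed by fixed-size slots) and the class name CircularLog are assumptions made for illustration, not something the answer specifies. Any ByteBuffer works as backing; in the design described above it would be a memory-mapped one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of a circular log over a fixed-size buffer.
// Layout (an assumption for illustration): an 8-byte write counter, then
// `slots` fixed-size records, each a 4-byte length followed by the payload.
final class CircularLog {
    private static final int HEADER = Long.BYTES;
    private final ByteBuffer buf;
    private final int slots;
    private final int slotSize;

    CircularLog(ByteBuffer buf, int slots, int slotSize) {
        if (buf.capacity() < HEADER + (long) slots * slotSize) {
            throw new IllegalArgumentException("buffer too small");
        }
        this.buf = buf;
        this.slots = slots;
        this.slotSize = slotSize;
    }

    // Total number of records appended so far.
    synchronized long written() {
        return buf.getLong(0);
    }

    // Appends a record, overwriting the oldest one once the log is full.
    synchronized void append(String msg) {
        byte[] data = msg.getBytes(StandardCharsets.UTF_8);
        if (data.length > slotSize - Integer.BYTES) {
            throw new IllegalArgumentException("record too large");
        }
        long seq = buf.getLong(0);
        int base = HEADER + (int) (seq % slots) * slotSize;
        buf.putInt(base, data.length);
        for (int i = 0; i < data.length; i++) {
            buf.put(base + Integer.BYTES + i, data[i]);
        }
        buf.putLong(0, seq + 1); // bump the counter once the record is in place
    }

    // Returns record number `seq`, or null if it has not been written yet
    // or has already been overwritten.
    synchronized String read(long seq) {
        long w = buf.getLong(0);
        if (seq < 0 || seq >= w || seq < w - slots) {
            return null;
        }
        int base = HEADER + (int) (seq % slots) * slotSize;
        byte[] data = new byte[buf.getInt(base)];
        for (int i = 0; i < data.length; i++) {
            data[i] = buf.get(base + Integer.BYTES + i);
        }
        return new String(data, StandardCharsets.UTF_8);
    }
}
```

The uploader keeps its own sequence number and calls read(seq) until it gets null. If it falls more than `slots` records behind, the oldest records are lost, which matches the stated willingness to trade some reliability for performance. The synchronized methods only protect threads inside one process.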

Memory-mapped files: pros and cons?

I need to share data between two Java applications running on the same machine (two different JVMs). Note that the data to be shared is large (about 7 GB). The applications must access the data very fast because they have to answer incoming queries at a very high rate. I don't want each application to hold its own copy of the data.
I've seen that one option is to use memory-mapped files. Application A gets the data from somewhere (let's say a database) and stores it in files. Then application B may access these files using java.nio. I don't know exactly how memory-mapped files work, I only know that the data is stored in a file and that this file (or a part of it) is mapped to a region of the memory (virtual memory?). So, the two applications can read-write the data in memory and the changes are automatically (I guess?) committed to the file. I also don't know if there is a maximum size for a file to be entirely mapped in memory.
My first question is what are the different possibilities for two applications to share data in this scenario (I mean taking into account that the amount of data is very large and that access to this data must be very fast)? To be clear, this question is not specific to memory-mapped I/O; I just want to know what other ways there are to solve the same problem.
My second question is what are the pros and cons of using memory-mapped files?
Thanks
My first question is what are the different possibilities for two applications to share data?
As S.Lott points out, there are a lot of mechanisms:
OS-level message queues
OS-level POSIX shared memory segments (persist after process death)
OS-level memory mappings (could be anonymous or file-backed)
OS-level anonymous pipes (unidirectional)
OS-level named pipes (unidirectional)
OS-level sockets (bidirectional) -- whether AF_UNIX or AF_INET or AF_INET6
OS-level shared global memory -- suitable for multi-threaded programs
Storing data in files
Application-level message queues
Application-level blackboard-style tuplespaces
Application-level key/value stores
Application-level remote procedure call frameworks -- many are available
Application-level web-based frameworks
My second question is what are the pros and cons of using memory-mapped files?
Pros:
very fast -- depending upon how you access the data, potentially zero-copy mechanisms can be used to operate directly on the data with no speed penalties. Care must be taken to update objects in a consistent manner.
should be very portable -- available on Unix systems for probably 25 years (give or take), and apparently Windows has mechanisms too.
Cons:
Single-system sharing. If you want to distribute your application over multiple machines, shared memory isn't a great option. Distributed shared memory systems are available, but they feel very much like the wrong interface to my way of thinking.
Even on a single system, if the memory is located on a single NUMA node but needed to be accessed by processors from multiple nodes, the inter-node requests may significantly slow processing compared to giving each node their own segment of the memory.
You can't just store pointers -- everything must be stored as offsets to base addresses, because the memory may be mapped at different locations in different processes. I have no idea what this means for Java objects, though presumably someone smart did their best to make it transparent to Java programmers. If you're not using their provided mechanisms, then you probably must do the work yourself. (Without actual pointers in Java, perhaps this is not very onerous.)
Updating objects consistently has proven to be very difficult. Passing immutable objects in message-passing systems instead generally results in programs with fewer concurrency bugs. (Concurrent programming in Erlang feels very natural and straight-forward. Concurrent programming in more imperative languages tends to introduce a huge pile of new concurrency controls: semaphores, mutexes, spinlocks, monitors).
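To illustrate the "offsets instead of pointers" point in Java terms, here is a small sketch of a linked list that lives entirely inside a buffer and links its nodes by offset from the start of the buffer. The layout and the class name OffsetList are invented for this example. Because nothing in the buffer depends on where it is mapped, a second process mapping the same bytes at a different address can walk the same list.

```java
import java.nio.ByteBuffer;

// Sketch: a singly linked list stored inside a buffer, using offsets from
// the start of the buffer instead of pointers. Layout is an assumption made
// for illustration: bytes 0-3 hold the head offset, bytes 4-7 the next free
// offset, then 8-byte nodes (int value, int offset of next node; -1 = none).
final class OffsetList {
    private static final int HEAD = 0;
    private static final int FREE = 4;
    private static final int FIRST_NODE = 8;
    private static final int NODE_SIZE = 8;
    private static final int NIL = -1;
    private final ByteBuffer buf;

    private OffsetList(ByteBuffer buf) {
        this.buf = buf;
    }

    // Formats an empty list in the buffer.
    static OffsetList create(ByteBuffer buf) {
        buf.putInt(HEAD, NIL);
        buf.putInt(FREE, FIRST_NODE);
        return new OffsetList(buf);
    }

    // Attaches to a buffer that already holds a list, e.g. another mapping.
    static OffsetList attach(ByteBuffer buf) {
        return new OffsetList(buf);
    }

    // Prepends a value; the new node links to the old head by offset.
    void push(int value) {
        int node = buf.getInt(FREE);
        if (node + NODE_SIZE > buf.capacity()) {
            throw new IllegalStateException("buffer full");
        }
        buf.putInt(node, value);
        buf.putInt(node + Integer.BYTES, buf.getInt(HEAD));
        buf.putInt(FREE, node + NODE_SIZE);
        buf.putInt(HEAD, node);
    }

    // Walks the list by following offsets and sums the values.
    long sum() {
        long total = 0;
        for (int n = buf.getInt(HEAD); n != NIL; n = buf.getInt(n + Integer.BYTES)) {
            total += buf.getInt(n);
        }
        return total;
    }
}
```

With java.nio the backing buffer would typically be a MappedByteBuffer. A single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes, so a data set of about 7 GB would have to be split across several mappings, and offsets would then need to identify both the mapping and the position within it. The sketch does nothing about concurrent updates, which is exactly the hard part described above.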
Memory-mapped files sound like a headache. A simpler and less error-prone option would be to use a shared database with a cluster-aware cache. That way only writes go down to the database and reads can be served from the cache.
As an example of how to do this in Hibernate see http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache

Sharing hashmap between different JVM running on the same machine

I have some maps that contain cached data from a DB. Currently 5 instances of the same server are running on the same machine in different JVMs. How can I share the maps between the JVMs? The cache is write once, read many. The current problem is that this cache makes the JVM footprint very big, so storing the map in every JVM consumes a lot of memory. I need a solution that does not consume much CPU time. Is there a way to do this in the same way class sharing is done between JVMs?
Thanks
Nikesh PL
Basically, you can't: those are two different address spaces.
You could serialize one and read it from the other, but that wouldn't be like sharing them.
How about a process to manage the cache, and a quick, low-bandwidth interface that your application programs can use to access the data?
Why don't you look at Coherence, a project from Oracle? It's not free but you can download and test it for free on a development system. It does precisely what you are looking for. It is used as a cache for storing database data but is ultimately a map of keys and values. It's pretty simple to set up and use. Here's a link to get you started:
http://download.oracle.com/docs/cd/E13924_01/coh.340/e14135.pdf
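Since the cache is write once and read many, another possibility, in line with the memory-mapped file answers elsewhere on this page, is to store the map as a flat sorted table that every JVM maps read-only. The sketch below is only an illustration under assumptions: keys and values are both longs, the class name FlatLongMap is made up, and the entry count is passed out of band rather than stored in a header.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Sketch: a write-once, read-many long -> long table stored as sorted
// fixed-size entries (8-byte key, 8-byte value) and searched by binary search.
final class FlatLongMap {
    private static final int ENTRY = 2 * Long.BYTES;
    private final ByteBuffer buf;
    private final int size;

    private FlatLongMap(ByteBuffer buf, int size) {
        this.buf = buf;
        this.size = size;
    }

    // Writes all entries into `target` in ascending key order.
    static FlatLongMap write(Map<Long, Long> source, ByteBuffer target) {
        int i = 0;
        // Copying into a TreeMap sorts the keys by their natural order.
        for (Map.Entry<Long, Long> e : new TreeMap<>(source).entrySet()) {
            target.putLong(i * ENTRY, e.getKey());
            target.putLong(i * ENTRY + Long.BYTES, e.getValue());
            i++;
        }
        return new FlatLongMap(target, i);
    }

    // Opens a table of `size` entries that was written earlier.
    static FlatLongMap open(ByteBuffer buf, int size) {
        return new FlatLongMap(buf, size);
    }

    // Returns the value for `key`, or null if the key is absent.
    Long get(long key) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long k = buf.getLong(mid * ENTRY);
            if (k < key) {
                lo = mid + 1;
            } else if (k > key) {
                hi = mid - 1;
            } else {
                return buf.getLong(mid * ENTRY + Long.BYTES);
            }
        }
        return null;
    }
}
```

One process would call write with a MappedByteBuffer over a shared file, and the other JVMs would map that file with MapMode.READ_ONLY and call open. The operating system can then keep a single copy of the pages for all of them. Lookups cost a binary search instead of a hash probe, and variable-length keys or values would need a more elaborate layout.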

What solutions exist for a JVM-based queue that is larger than heap?

I am looking at possible technology choices for queues (or perhaps streams are a better description) in a JVM-based system.
Some requirements:
Must be accessible from the JVM / Java.
Queues must support sizes larger than the JVM heap, possibly bigger than all available RAM. Thus, support for utilizing the disk (or network) for storage is implied.
Queues do not currently need to be durable past the process lifetime.
Most uses of the queue will have a single producer and a single consumer. Concurrency for any particular queue is thus not an issue. (Obviously concurrency is important across queues.)
Queues are ad-hoc and temporary. They pop into existence, are filled, are drained, and go away.
Small queues should preferably stay in memory, then shift to slower storage based on resource availability. This requirement could be met above the queuing technology.
I am examining several options but am curious what options I am missing?
Use one of the available JMS implementations, for example ActiveMQ or Qpid from Apache.
I ran across this FIFO queue with spill to disk which is kind of interesting and has some of the properties I'm looking for:
http://code.google.com/p/ashes-queue/
I have considered using Terracotta's BigMemory as a tool for pushing queue data into direct memory and off-heap.
How about using Redis as a message queue? It operates in memory and can be made persistent once the data does not fit in RAM.
HSQLDB provides an in-process database engine where you can use RAM, the local disk or a network server to store the database. That might float your boat, especially if you want to seamlessly move to a network solution rather than the local disk later on. Transitioning from small to large queues would then involve moving data from one database to another. There are standard ways to do this, but they might be pretty slow.
The more I think about it, the more I think this is not a good match. For what it's worth, the in-memory DB is very fast in my experience.
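For comparison with the products above, the "small queues stay in memory, large ones spill to disk" requirement can also be sketched by hand in a few dozen lines. This is only an illustration, not a replacement for any of the suggestions: the class name SpillQueue, the String payload and the length-prefixed file format are all assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Sketch: a FIFO queue that keeps the first `memoryLimit` items on the heap
// and spills everything after that to a temporary file.
final class SpillQueue implements AutoCloseable {
    private final ArrayDeque<String> memory = new ArrayDeque<>();
    private final int memoryLimit;
    private final File file;
    private final RandomAccessFile disk;
    private long readPos;  // file offset of the oldest spilled item
    private long writePos; // file offset where the next spilled item goes

    SpillQueue(int memoryLimit) throws IOException {
        this.memoryLimit = memoryLimit;
        this.file = File.createTempFile("spill-queue", ".bin");
        this.disk = new RandomAccessFile(file, "rw");
    }

    synchronized void offer(String item) throws IOException {
        // Use memory only while nothing is spilled, so that every item in
        // memory is older than every item on disk.
        if (readPos == writePos && memory.size() < memoryLimit) {
            memory.add(item);
            return;
        }
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        disk.seek(writePos);
        disk.writeInt(data.length);
        disk.write(data);
        writePos = disk.getFilePointer();
    }

    // Returns the oldest item, or null if the queue is empty.
    synchronized String poll() throws IOException {
        if (!memory.isEmpty()) {
            return memory.poll();
        }
        if (readPos == writePos) {
            return null;
        }
        disk.seek(readPos);
        byte[] data = new byte[disk.readInt()];
        disk.readFully(data);
        readPos = disk.getFilePointer();
        if (readPos == writePos) { // drained: reclaim the disk space
            disk.setLength(0);
            readPos = 0;
            writePos = 0;
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public synchronized void close() throws IOException {
        disk.close();
        file.delete();
    }
}
```

The file only shrinks when the spilled part is fully drained, so a queue that never catches up keeps growing on disk. A real implementation would probably want buffered I/O and a smarter policy for moving spilled items back into memory.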
