I have several private jars in my project and I need them to write and read values to\from the same b-dimensional array.
EDIT: Now I have another requirement - to allow a C++/MFC app to write to this shared memory. So, all in all, I have a C++ app writing to the shared memory and jars reading from it. What is the best way to achieve it? Maybe a web service?
If yes - how to implement that?
You can:
run all the jars in the same JVM. This way they will share memory natively.
use memory mapped files, possibly on a ram drive if you don't need persistence.
Chronicle has a number of libraries to make using memory mapped files easier, as it:
offers thread safe operations into native memory.
builds some useful data structures such as a key-value store and persisted queue.
supports more efficient serialization and zero-copy access to native memory.
For information on all the Apache 2.0 Open Source libraries
BTW Chronicle also supports distribution of its data structures which means the jars don't have to be running on the same machine.
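To make the memory-mapped file option concrete, here is a minimal sketch using only plain `java.nio` (it is not Chronicle's API). The class name `SharedGrid`, the row-major layout of 4-byte ints, and the little-endian byte order are all assumptions chosen for illustration; little-endian is picked because a C++ writer on x86 would most likely store raw ints that way. There is no locking here, so it assumes a single writer and readers that can tolerate seeing a value slightly late.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: a rows x cols grid of ints backed by a memory-mapped file.
// Every process (or jar) that maps the same file sees the same values.
final class SharedGrid {
    private final MappedByteBuffer buffer;
    private final int cols;

    SharedGrid(Path file, int rows, int cols) throws IOException {
        this.cols = cols;
        long bytes = (long) rows * cols * Integer.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            this.buffer = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
        }
        // Assumption: the native writer stores little-endian ints.
        buffer.order(ByteOrder.LITTLE_ENDIAN);
    }

    void set(int row, int col, int value) {
        buffer.putInt(offset(row, col), value);
    }

    int get(int row, int col) {
        return buffer.getInt(offset(row, col));
    }

    // Row-major layout: cell (row, col) lives at (row * cols + col) * 4 bytes.
    private int offset(int row, int col) {
        return (row * cols + col) * Integer.BYTES;
    }
}
```

Each jar would construct its own `SharedGrid` over the same path (for example a file on a RAM drive), and the C++ side would map the same file and index it with the same row-major formula.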
Related
We are looking for a shared memory mechanism to transfer large amounts of data between processes without copying, in Java. It has to be portable (including Windows). Is there such a thing? We were thinking about using mmap-ed files, as they are portable, however their contents are written to disk which is not desirable. Are there alternatives?
Otherwise, Windows has page-file-backed sections; is there an easy way to use these in Java? We are probably ok if we can use some other shared memory mechanism on *nix and those on Windows.
There are a couple of solutions in OpenHFT. These support a rolling queue which can be read and written to concurrently, and a SharedHashMap which is entirely off heap. In Linux you can "write" to a tmpfs filesystem and on Windows you can use a Ramdrive.
These libraries support replication between machines over TCP. (The replication for Chronicle has been used for a while but for SharedHashMap, replication is still in beta)
While this is portable across OSes, it uses some features internal to OpenJDK/HotSpot JVMs and so it doesn't work on the IBM JVM AFAIK.
Note: the libraries support reading and writing data using a form of serialization which doesn't create garbage, or using in-place, off-heap data structures, i.e. you don't need to deserialize the whole object to access a portion of it.
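The idea of reading one field in place, without deserializing the whole object, can be illustrated with plain `java.nio`. This is only a sketch of the general technique and not the OpenHFT API; the record layout, the class name `TradeView` and its field names are made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-layout record stored off heap:
// [id: long][quantity: int][price: double] = 20 bytes per record.
// The view object is reused, so reading a field allocates nothing.
final class TradeView {
    static final int SIZE = Long.BYTES + Integer.BYTES + Double.BYTES;
    private static final int ID = 0;
    private static final int QUANTITY = ID + Long.BYTES;
    private static final int PRICE = QUANTITY + Integer.BYTES;

    private final ByteBuffer buf;
    private int base; // byte offset of the record currently viewed

    TradeView(ByteBuffer buf) {
        this.buf = buf;
    }

    // Point the view at another record; no data is copied.
    TradeView moveTo(int recordIndex) {
        base = recordIndex * SIZE;
        return this;
    }

    long id() { return buf.getLong(base + ID); }
    int quantity() { return buf.getInt(base + QUANTITY); }
    double price() { return buf.getDouble(base + PRICE); }

    void write(long id, int quantity, double price) {
        buf.putLong(base + ID, id);
        buf.putInt(base + QUANTITY, quantity);
        buf.putDouble(base + PRICE, price);
    }
}
```

With a buffer from `ByteBuffer.allocateDirect(n * TradeView.SIZE)` or a mapped file, `view.moveTo(i).price()` touches only the eight bytes of that one field.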
I need to implement a disk-backed queue which can accept real-time profiling data from multiple threads and then upload that data over potentially faulty transports. Initially targeted at Java, but long-term we will need to use the same mechanism in Objective-C, Flash, JavaScript. Targeted at Android Java as well as desktop.
This will be contained within a single process, so an MQ solution is probably out. Performance is a point of significant consideration, meaning we'd trade some reliability for performance.
I'm curious about two things:
Given the above architecture, is there any available technology that'll completely or partially solve this problem?
Given the goal of eventually re-implementing or ideally re-using this mechanism in different platforms, is there any way to build this in a way that can be easily used in say both Objective-C & Android Java?
How's this architecture look?
If you want to keep a limited amount of data (a circular log) and are able to reserve a fixed amount of persistent memory for it, then the most effective solution is memory-mapped buffers. The persister is then simply a cache of several buffers, serving both the profiling queue and the uploader.
When reimplementing it on other platforms, chances are that the platform has no mapping facility. In this case, buffers can be read and written directly. This can be less efficient than mapping to memory, but still no less efficient than other solutions (e.g. embedded database).
As for the architecture, the picture does not show data being read back from the persister (otherwise, what is the persister needed for?). The profiling queue actually encompasses all of the data, including the persisted part, and what the picture calls the profiling queue is really the set of buffers in main memory. Since these need not be contiguous, "buffer cache" is a better name than "queue".
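A circular log over a fixed-size buffer, as suggested above, might look roughly like the following sketch. The class name `CircularLog`, the header layout (one `long` counting appended samples) and the choice of `long` samples are assumptions for illustration. The same code works over a heap `ByteBuffer` or a `MappedByteBuffer`, and the `synchronized` methods only protect threads inside one process.

```java
import java.nio.ByteBuffer;

// Hypothetical circular log of long samples in a fixed-size buffer.
// Layout: [written: long][slot 0][slot 1]...[slot capacity-1]
// Once the buffer is full, new samples overwrite the oldest ones.
final class CircularLog {
    private static final int HEADER = Long.BYTES;
    private final ByteBuffer buf;
    private final int capacity; // number of sample slots

    CircularLog(ByteBuffer buf) {
        this.buf = buf;
        this.capacity = (buf.capacity() - HEADER) / Long.BYTES;
    }

    // Total number of samples ever appended.
    synchronized long written() {
        return buf.getLong(0);
    }

    // Sequence number of the oldest sample still held.
    synchronized long oldest() {
        return Math.max(0, written() - capacity);
    }

    synchronized void append(long sample) {
        long seq = written();
        buf.putLong(slot(seq), sample);
        buf.putLong(0, seq + 1); // update the counter after the slot is filled
    }

    synchronized long get(long seq) {
        if (seq < oldest() || seq >= written()) {
            throw new IllegalArgumentException("sample " + seq + " is not in the log");
        }
        return buf.getLong(slot(seq));
    }

    private int slot(long seq) {
        return HEADER + (int) (seq % capacity) * Long.BYTES;
    }
}
```

An uploader would remember the last sequence number it sent and read forward from `Math.max(lastSent + 1, oldest())`, accepting that samples overwritten in the meantime are lost.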
I need to share data between two Java applications running on the same machine (two different JVMs). I should note that the data to be shared is large (about 7 GB). The applications must access the data very fast because they have to answer incoming queries at a very high rate. I don't want each application to hold its own copy of the data.
I've seen that one option is to use memory-mapped files. Application A gets the data from somewhere (let's say a database) and stores it in files. Then application B may access these files using java.nio. I don't know exactly how memory-mapped files work, I only know that the data is stored in a file and that this file (or a part of it) is mapped to a region of the memory (virtual memory?). So, the two applications can read-write the data in memory and the changes are automatically (I guess?) committed to the file. I also don't know if there is a maximum size for a file to be entirely mapped in memory.
My first question is what are the different possibilities for two applications to share data in this scenario (I mean taking into account that the amount of data is very large and that access to this data must be very fast)? I should note that this question is not related to memory-mapped I/O; it is just to know what the other ways to solve the same problem are.
My second question is what are the pros and cons of using memory-mapped files?
Thanks
My first question is what are the different possibilities for two applications to share data?
As S.Lott points out, there are a lot of mechanisms:
OS-level message queues
OS-level POSIX shared memory segments (persist after process death)
OS-level memory mappings (could be anonymous or file-backed)
OS-level anonymous pipes (unidirectional)
OS-level named pipes (unidirectional)
OS-level sockets (bidirectional) -- whether AF_UNIX or AF_INET or AF_INET6
OS-level shared global memory -- suitable for multi-threaded programs
Storing data in files
Application-level message queues
Application-level blackboard-style tuplespaces
Application-level key/value stores
Application-level remote procedure call frameworks -- many are available
Application-level web-based frameworks
My second question is what are the pros and cons of using memory-mapped files?
Pros:
very fast -- depending upon how you access the data, potentially zero-copy mechanisms can be used to operate directly on the data with no speed penalties. Care must be taken to update objects in a consistent manner.
should be very portable -- available on Unix systems for probably 25 years (give or take), and apparently Windows has mechanisms too.
Cons:
Single-system sharing. If you want to distribute your application over multiple machines, shared memory isn't a great option. Distributed shared memory systems are available, but they feel very much like the wrong interface to my way of thinking.
Even on a single system, if the memory is located on a single NUMA node but needs to be accessed by processors from multiple nodes, the inter-node requests may significantly slow processing compared to giving each node its own segment of the memory.
You can't just store pointers -- everything must be stored as offsets to base addresses, because the memory may be mapped at different locations in different processes. I have no idea what this means for Java objects, though presumably someone smart did their best to make it transparent to Java programmers. If you're not using their provided mechanisms, then you probably must do the work yourself. (Without actual pointers in Java, perhaps this is not very onerous.)
Updating objects consistently has proven to be very difficult. Passing immutable objects in message-passing systems instead generally results in programs with fewer concurrency bugs. (Concurrent programming in Erlang feels very natural and straight-forward. Concurrent programming in more imperative languages tends to introduce a huge pile of new concurrency controls: semaphores, mutexes, spinlocks, monitors).
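In Java, the "offsets instead of pointers" point can be sketched as follows: a linked list stored inside a `ByteBuffer`, where each node's "next" field is a byte offset from the start of the buffer rather than an address. The class name `OffsetList` and the layout are hypothetical, chosen only for this example, and no concurrency control is attempted.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical singly linked list stored inside a buffer.
// Layout: [head: int][nextFree: int] then nodes of [value: int][next: int].
// "next" and "head" are offsets from the start of the buffer, so the list
// stays valid wherever the buffer happens to be mapped.
final class OffsetList {
    private static final int HEAD = 0;
    private static final int NEXT_FREE = Integer.BYTES;
    private static final int FIRST_NODE = 2 * Integer.BYTES;
    private static final int NODE_SIZE = 2 * Integer.BYTES;
    private static final int NIL = -1;

    private final ByteBuffer buf;

    // Pass initialise = true only in the party that creates the list.
    OffsetList(ByteBuffer buf, boolean initialise) {
        this.buf = buf;
        if (initialise) {
            buf.putInt(HEAD, NIL);
            buf.putInt(NEXT_FREE, FIRST_NODE);
        }
    }

    // Prepend a value; throws IndexOutOfBoundsException when the buffer is full.
    void push(int value) {
        int node = buf.getInt(NEXT_FREE);
        buf.putInt(node, value);
        buf.putInt(node + Integer.BYTES, buf.getInt(HEAD)); // next = old head
        buf.putInt(HEAD, node);
        buf.putInt(NEXT_FREE, node + NODE_SIZE);
    }

    // Walk the chain of offsets from the head.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        int node = buf.getInt(HEAD);
        while (node != NIL) {
            out.add(buf.getInt(node));
            node = buf.getInt(node + Integer.BYTES);
        }
        return out;
    }
}
```

A second process that maps the same bytes would build `new OffsetList(buffer, false)` and could walk the list regardless of the address at which its own mapping landed.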
Memory mapped files sound like a headache. A simpler and less error-prone option would be to use a shared database with a cluster-aware cache. That way only writes go down to the database and reads can be served from the cache.
As an example of how to do this in Hibernate see http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
I have some maps that contain cached data from a db. Currently 5 instances of the same server are running on the same machine in different JVMs. How can I share the maps between JVMs? The cache is write once and read many. Currently the problem is that, because of this cache, the JVM footprint is very big, so storing this map in every JVM consumes a lot of memory. I need a solution which does not consume much CPU time. Is there a way to do this in the same way class sharing is done between JVMs?
Thanks
Nikesh PL
Basically, you can't: those are two different address spaces.
You could serialize one and read it from the other, but that wouldn't be like sharing them.
How about a process to manage the cache, and a quick, low-bandwidth interface that your application programs can use to access the data?
Why don't you look at Coherence, a project from Oracle? It's not free, but you can download and test it for free on a development system. It does precisely what you are looking for. It is used as a cache for storing database data but is ultimately a map of keys and values. It's pretty simple to set up and use. Here's a link to get you started:
http://download.oracle.com/docs/cd/E13924_01/coh.340/e14135.pdf
I am new to memcached and caching in general. I have a java web application running on Ubuntu + Tomcat + MySQL on a VPS Server with 1GB of memory.
Does it make sense to add a memcached layer with about 256MB for caching? Will this be too much load on the server? Which is more appropriate: caching rendered HTML pages or database objects?
Please advise.
If you're going to cache pages, don't use memcached, use Varnish. However, there's a good chance that's not a great use of memory. Cacheing pages trades memory for computation and database work, but it does cost quite a lot of memory per page, so it's best for cases where the computation and database work needed to produce a single page amounts to a lot (or the pages are very small!). Also, consider that page cacheing won't be effective, or even possible, if you want to use per-user customisation on your pages (eg showing the number of items in a shopping cart). At least not without getting into some truly hairy shenanigans (edge-side includes, anyone?).
If you're not going to cache pages, and your app is on a single machine, then there's no point using memcached or similar. The point of cache servers like that is to make the memory on one machine work as a cache for another - like how a file server shares a disk, they're essentially memory servers. On a single machine, you might as well give all the memory to Java and cache objects on the heap.
Are you using an object-relational mapper? If so, see if it has any support for a second-level cache. The big three implementations (Hibernate, OpenJPA, and EclipseLink) all support in-memory caches. They're likely to do a much better job than you would if you did the cacheing yourself.
But, if you're not using a mapper, you have no choice but to do the cacheing yourself. There are extension points in LinkedHashMap for building LRU caches, and then of course there's the people's favourite, SoftReference, in combination with a HashMap. Plus, there are probably cache implementations out there you could download and use - I'd be shocked if there wasn't something in the Apache Commons libraries.
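The LinkedHashMap extension point mentioned above is `removeEldestEntry`, combined with the access-order constructor. A minimal sketch (the class name `LruCache` and the `maxEntries` parameter are just illustrative names):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap's removeEldestEntry hook.
// Not thread-safe on its own; wrap it with Collections.synchronizedMap
// if several request threads share it.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put/putAll after an insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of two, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently.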
memcached won't add any noticeable load on your server, but it will be memory your app can't use. If you only plan to have a single app server for a while, you're better off using an in-JVM cache.
As far as what to cache, the answer falls somewhere in the middle of the above. You don't want to cache exactly what's in your database and you certainly don't want to cache the final output. You have a data model representation in your application that isn't exactly what's in the DB (e.g. a User object might be made up of multiple queries from a few different tables). Cache that kind of thing, as it's the most reusable.
There's lots of info in the memcached site that should help you understand and get going with caching in general and memcached specifically.
It might make sense to do that. Why not try a smaller size like 64 MB and see how that goes? When you use more resources for memcached, there is less for everything else. You should try it and see what gives you the best performance.