I'm looking for an on-disk implementation of java.util.Map. Nothing too fancy, just something that I can point at a directory or file and have it store its contents there, in some way it chooses. Does anyone know of such a thing?
You could have a look at the Disk-Backed-map project.
A library that implements a disk backed map in Java
A small library that provides a disk-backed map implementation for storing a large number of key-value pairs. The in-memory map implementations (HashMap, Hashtable) max out around 3-4 million keys/GB of memory for very simple key/value pairs, and in most cases the limit is much lower. A disk-backed map, on the other hand, can store between 16 million (64-bit JVM) and 20 million (32-bit JVM) keys/GB, regardless of the size of the key/value pairs.
If you are looking for key-object structures to persist data, then NoSQL databases are a very good choice. You'll find that some of them, such as MongoDB or Redis, scale and perform well for big datasets, and beyond hash-based lookups they provide interesting query and transactional features.
In essence, these systems are Map implementations, and it shouldn't be too complicated to write your own adapter that implements java.util.Map to bridge to them.
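For example, a rough sketch of such an adapter backed by Redis, assuming the Jedis client; the RedisBackedMap class name and the localhost connection are just illustrative, and only the handful of operations shown are implemented:
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;
import redis.clients.jedis.Jedis;

// Minimal java.util.Map adapter over Redis (String keys and values only).
public class RedisBackedMap extends AbstractMap<String, String> {
    private final Jedis jedis = new Jedis("localhost", 6379);

    @Override
    public String get(Object key) {
        return jedis.get((String) key);          // null if the key is absent
    }

    @Override
    public String put(String key, String value) {
        String old = jedis.get(key);
        jedis.set(key, value);
        return old;
    }

    @Override
    public boolean containsKey(Object key) {
        return jedis.exists((String) key);
    }

    @Override
    public String remove(Object key) {
        String old = jedis.get((String) key);
        jedis.del((String) key);
        return old;
    }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        // Full iteration is deliberately left out of this sketch.
        throw new UnsupportedOperationException("iteration not implemented");
    }
}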
MapDB (mapdb.org) does exactly what you are looking for. Besides disk backed TreeMap and HashMap it gives you other collection types.
Its maps are also thread-safe and perform really well.
See Features
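For instance, a minimal sketch of opening a file-backed map, assuming the MapDB 3.x API (the file and map names are just placeholders):
import java.util.concurrent.ConcurrentMap;
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;

// Open (or create) a map persisted in a single file.
DB db = DBMaker.fileDB("mymap.db").make();
ConcurrentMap<String, String> map = db
        .hashMap("mymap", Serializer.STRING, Serializer.STRING)
        .createOrOpen();

map.put("key", "value");
System.out.println(map.get("key"));

db.close();   // flushes data to disk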
Chronicle Map is a modern and very fast solution to this problem. It implements the ConcurrentMap interface and persists the data to disk (under the hood, this is done by mapping Chronicle Map's memory to a file).
You could use a simple EHCache implementation. The nice thing about EHCache is that it can be very simple to set up :-)
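For illustration, a rough sketch of a disk-persistent cache using the Ehcache 3.x builder API; the cache name, directory, and tier sizes are just placeholders:
import java.io.File;
import org.ehcache.Cache;
import org.ehcache.PersistentCacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

// A cache with a small heap tier and a persistent disk tier.
PersistentCacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
        .with(CacheManagerBuilder.persistence(new File("cache-dir")))
        .withCache("diskCache",
                CacheConfigurationBuilder.newCacheConfigurationBuilder(
                        String.class, String.class,
                        ResourcePoolsBuilder.newResourcePoolsBuilder()
                                .heap(1_000, EntryUnit.ENTRIES)   // hot entries in RAM
                                .disk(50, MemoryUnit.MB, true)))  // persistent disk tier
        .build(true);

Cache<String, String> cache = cacheManager.getCache("diskCache", String.class, String.class);
cache.put("key", "value");

cacheManager.close();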
I take it you've ruled out serialising / deserialising an actual Map instance?
This seems like a relatively new open-source solution to the problem. I've used it and like it so far:
https://github.com/jankotek/JDBM4
I have a large amount of data I need to store in a Map<String, int...>. I need to be able to perform the following functions:
containsKey(String key)
get(String key).add(int value)
put(String key, singletonList(int value))
entrySet().iterator()
I originally was just using a HashMap<String, ArrayList<Integer>>, where singletonList is a function that creates a new ArrayList<Integer> and adds the given value to it. However, this strategy does not scale to the amount of data I am using, as it stores everything in RAM, and my RAM is not big enough to hold all the data.
My next idea was to just dump everything into a file. However, this would mean that get, containsKey, and put would become very expensive operations, which is not at all desirable. Of course, I could keep everything sorted, but that is often difficult in a large file.
I was wondering if there is a better strategy out there.
Using an embedded database engine (key-value store), such as MapDB, is one way to go.
From MapDB's official website:
MapDB is an embedded database engine. It provides Java collections backed by a disk or memory database store. MapDB has excellent performance comparable to java.util.HashMap and other collections, but is not limited by GC. It is also a very flexible engine with many storage backends, cache algorithms, and so on. Finally, MapDB is a pure-Java single 400 KB JAR and only depends on JRE 6+ or Android 2.1+.
If that works for you, you can start from here.
Why don't you try ObjectOutputStream to store the map to a file? You can then use ObjectInputStream to read it back.
I hope it helps.
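For example, a minimal sketch using plain Java serialization (the file name is just a placeholder). Note that this still loads the entire map into memory when reading it back, so it only helps if the data fits in RAM at least once:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.List;
import java.util.Map;

// Write the map to disk (HashMap, String and ArrayList are all Serializable)...
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("map.bin"))) {
    out.writeObject(map);
}

// ...and read it back later.
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("map.bin"))) {
    @SuppressWarnings("unchecked")
    Map<String, List<Integer>> restored = (Map<String, List<Integer>>) in.readObject();
}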
Is there a simple way of having a file backed Map?
The contents of the map are updated regularly, with some being deleted, as well as some being added. To keep the data that is in the map safe, persistence is needed. I understand a database would be ideal, but sadly due to constraints a database can't be used.
I have tried:
Writing the whole contents of the map to file each time it gets updated. This worked, but obviously has the drawback that the whole file is rewritten each time; the contents of the map are expected to range from a couple of entries to ~2000. There are also some concurrency issues (e.g. writing out of order results in loss of data).
Using a RandomAccessFile and keeping a pointer to each entry's start byte so that each entry can be looked up using seek(). Again, this had a similar issue as before: changing an entry would involve updating all of the references after it.
Ideally, the solution would involve some sort of caching, so that only the most recently accessed entries are kept in memory.
Is there such a thing? Or is it available via a third party jar? Someone suggested Oracle Coherence, but I can't seem to find much on how to implement that, and it seems a bit like using a sledgehammer to crack a nut.
You could look into MapDB which was created with this purpose in mind.
MapDB provides concurrent Maps, Sets and Queues backed by disk storage
or off-heap-memory. It is a fast and easy to use embedded Java
database engine.
Yes, Oracle Coherence can do all of that, but it may be overkill if that's all you're doing.
One way to do this is to "overflow" from RAM to disk:
BinaryStore diskstore = new BerkeleyDBBinaryStore("mydb", ...); // disk-based store backed by Berkeley DB
SimpleSerializationMap mapDisk = new SimpleSerializationMap(diskstore); // Map view that serializes entries into the store
LocalCache mapRAM = new LocalCache(100 * 1024 * 1024); // size-limited in-memory cache, 100MB in RAM
OverflowMap cache = new OverflowMap(mapRAM, mapDisk); // entries evicted from RAM overflow to disk
Starting in version 3.7, you can also transparently overflow from RAM journal to flash journal. While you can configure it in code (as per above), it's generally just a line or two of config and then you ask for the cache to be configured on your behalf, e.g.
// simplest example; you'd probably use a builder pattern or a configurable cache factory
NamedCache cache = CacheFactory.getCache("mycache");
For more information, see the doc available from http://coherence.oracle.com/
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.
jdbm2 looks promising. I've never used it, but it seems to be a candidate to meet your requirements:
JDBM2 provides a HashMap and TreeMap which are backed by disk storage. It is a very easy and fast way to persist your data. JDBM2 also has minimal hardware requirements and is highly embeddable (the jar is only 145 KB).
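For reference, a rough sketch along the lines of the JDBM2 examples; I haven't verified the exact class names against a current release, so treat this as an approximation:
import jdbm.PrimaryHashMap;
import jdbm.RecordManager;
import jdbm.RecordManagerFactory;

// Open (or create) the on-disk record manager and a named map inside it.
RecordManager recMan = RecordManagerFactory.createRecordManager("databaseFile");
PrimaryHashMap<String, String> map = recMan.hashMap("mapName");

map.put("key", "value");
recMan.commit();   // persist changes to disk
recMan.close();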
You'll find many more solutions if you look for key/value stores.
I basically want to store a hashtable on disk so I can query it later. My program is written in Java.
The hashtable maps from String to List.
There are a lot of key-value stores out there, but after doing a lot of research/reading, it's not clear which one is best for my purposes. Here are some things that are important to me.
Simple key-value store which allows you to retrieve a value with a single key.
Good Java client that is documented well.
Dataset is small and there is no need for advanced features. Again, I want it to be simple.
I have looked into Redis and MongoDB. Both look promising but not ideal for my purposes.
Any info would be appreciated.
If your dataset is small and you want it to be SIMPLE, why don't you serialize your hashmap to a file or an RDBMS and load it in your application?
How do you want to "query" your hashmap? Key approximation? Value 'likeness'? I don't know; it seems overkill to me to maintain a key-value store just for the sake of it.
What you are looking for is a library that supports object prevalence. These libraries are designed to be simple and fast, providing a collection-like API. Below are a few such libraries that allow you to work with collections but behind the scenes use disk storage.
space4j
Advagato
Prevayler
Before providing any sort of answer, I'd start by asking why you need to store this hashtable on disk at all: according to your description the data set is small, so I assume it can fit into memory. If it is just to be able to reuse this structure after restarting your application, then you can probably use any sort of format to persist it.
Second, you don't provide any reasons why Redis or MongoDB are not ideal. Based on your three (short) requirements, I would have said Redis is probably your best bet:
good Java clients
not only able to store lists, but also supports operations on the list values (so data is not opaque)
The only reason I could suppose for eliminating Redis is that you are looking for strict ACID characteristics. If that's what you are looking for, then you could probably take a look at Berkeley DB JE. It has been around for a while and the documentation is good.
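For example, a rough sketch using Berkeley DB JE's collections API, which can expose a database as a java.util.Map; the directory, database name, and bindings here are just illustrative:
import java.io.File;
import com.sleepycat.bind.tuple.TupleBinding;
import com.sleepycat.collections.StoredMap;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

// The environment directory must already exist on disk.
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
Environment env = new Environment(new File("bdb-dir"), envConfig);

DatabaseConfig dbConfig = new DatabaseConfig();
dbConfig.setAllowCreate(true);
Database db = env.openDatabase(null, "myStore", dbConfig);

// StoredMap gives a java.util.Map view over the database; the bindings handle (de)serialization.
StoredMap<String, String> map = new StoredMap<>(db,
        TupleBinding.getPrimitiveBinding(String.class),
        TupleBinding.getPrimitiveBinding(String.class),
        true);   // true = writable

map.put("key", "value");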
Check out JDBM2 - http://code.google.com/p/jdbm2/
I worked on the JDBM 1 code base, and have been impressed with what I've seen in jdbm2
Chronicle Map should be a perfect fit. It's an embeddable key-value store written in pure Java, so it acts as the best possible "client" (though actually there is no "client" or "server"; you just open your database and have full in-process read/update access to it).
Chronicle Map resides in a single file. This file can be moved around the filesystem, and even sent to another machine with a different OS and/or architecture, and it will still be an openable Chronicle Map database.
To create or open a data store (if the database file is non-existent, it is created, otherwise an existing store is accessed):
ChronicleMap<String, List<Point>> map = ChronicleMap
.of(String.class, (Class<List<Point>>) (Class) List.class)
.averageKey("range")
.averageValue(asList(of(0, 0), of(1, 1)))
.entries(10_000)
.createPersistedTo(myDatabaseFile);
Then you can work with the created ChronicleMap object just as with a simple HashMap, without bothering with key and value serialization.
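For example, reusing the Point helper and myDatabaseFile from the snippet above:
// Use it like any other Map; updates are persisted to myDatabaseFile.
map.put("range", asList(of(0, 0), of(5, 7)));
List<Point> points = map.get("range");
map.close();   // release the memory-mapped file when done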
DISCLAIMER:
This question was not meant to be argumentative!
What is the fastest and least memory-hungry way of looking up a key-value pair? I will be storing items in a key-value relation and I need to access them quickly. Should I use an SQLite database? A Map? A Hashtable? A HashMap? Please give some advantages/disadvantages of whatever method you suggest.
Any hash-based Map structure is the way to go, as long as your hash function for the key is efficient. You can use value IDs as the result of the lookup to conserve memory during the search.
If your data is already in a database, though, you can leave this search entirely to the RDBMS; after all, they're made for this stuff.
If your data is in memory, Maps in general are your friends - they are meant for this.
Don't use a Hashtable, however. It is much slower than newer Map implementations because its methods are synchronized, which most of the time is not needed (and when it is needed, there is a much better alternative; see below).
In single-threaded context, HashMap is probably going to be OK.
If you need thread safety, use a ConcurrentHashMap.
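A quick sketch of both cases (the map contents are just illustrative):
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Single-threaded use: a plain HashMap gives average constant-time lookups.
Map<String, Integer> ids = new HashMap<>();
ids.put("alice", 1);
Integer id = ids.get("alice");

// Shared between threads: ConcurrentHashMap instead of a synchronized Hashtable.
Map<String, Integer> shared = new ConcurrentHashMap<>();
shared.putIfAbsent("bob", 2);   // atomic check-then-put, no external locking needed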
I need a data structure to store users which should be retrieved by id.
I noticed there are several classes that implement the Map interface. Which one should be my default choice? They all seem quite equivalent to me.
Probably it depends on how many users you plan to have and if you will need them ordered or just getting single items by id.
HashMap uses hash codes to store things so you have constant time for put and get operations but items are always unordered.
TreeMap instead uses a red-black tree (a balanced binary search tree), so you have O(log n) time for basic operations, but items are kept ordered in the tree.
I would use HashMap because it's the simpler one (remember to give it a suitable initial capacity). Remember that these data structures are not synchronized by default; if you plan to use the map from more than one thread, take care to use ConcurrentHashMap.
A middle ground is LinkedHashMap, which uses the same structure as HashMap (hashCode and equals) but also keeps a doubly linked list of the elements inserted in the map, maintaining insertion order. This hybrid gives you ordered items (ordered in the sense of insertion order) without the performance cost of TreeMap, as illustrated below.
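A small sketch of the iteration-order differences (the user IDs are made up):
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// The same entries, inserted in the same order into each map.
Map<Integer, String> hash   = new HashMap<>();
Map<Integer, String> tree   = new TreeMap<>();
Map<Integer, String> linked = new LinkedHashMap<>();

for (Map<Integer, String> m : Arrays.asList(hash, tree, linked)) {
    m.put(42, "carol");
    m.put(7,  "alice");
    m.put(19, "bob");
}

System.out.println(hash.keySet());   // no guaranteed order
System.out.println(tree.keySet());   // sorted by key: [7, 19, 42]
System.out.println(linked.keySet()); // insertion order: [42, 7, 19]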
No concurrency: use java.util.HashMap
Concurrency: use java.util.concurrent.ConcurrentHashMap
If you want some control on the order used by iterators, use a TreeMap or a LinkedHashMap.
This is covered on the Java Collections Trail, Implementations page.
If they all seem equivalent then you haven't read the documentation. Sun's documentation is pretty much as terse as it gets and provides very important points for making your choices.
Start here.
Your choice could be modified by how you intend to use the data structure, and where you would rather have performance - reads, or writes?
In a user-login system, my guess is that you'll be doing more reads than writes.
(I know I already answered this once, but I feel this needs saying)
Have you considered using a database to store this information? Even if it's SQLite, it'd probably be easier than storing your user database in the program code or loading the entire dataset into memory each time.
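For example, a rough sketch using SQLite over JDBC, assuming the xerial sqlite-jdbc driver is on the classpath; the table and column names are made up for illustration:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Opens (or creates) a users.db file in the working directory.
try (Connection conn = DriverManager.getConnection("jdbc:sqlite:users.db")) {
    conn.createStatement().execute(
            "CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, name TEXT)");

    // Insert or update a user by id.
    try (PreparedStatement insert = conn.prepareStatement(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)")) {
        insert.setString(1, "u42");
        insert.setString(2, "Alice");
        insert.executeUpdate();
    }

    // Retrieve a user by id.
    try (PreparedStatement query = conn.prepareStatement(
            "SELECT name FROM users WHERE id = ?")) {
        query.setString(1, "u42");
        try (ResultSet rs = query.executeQuery()) {
            if (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}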