Does the Java API of RocksDB support prefix scan?

I have a huge key-value data set in RocksDB and I need to find keys by a known key prefix. I do not want to scan the whole data set and filter on the prefix. Is there any way to do that?

You can use something like this.
RocksIterator exposes a seek() API that positions the iterator at the first key at or past the bytes you pass in. Keys are stored in sorted order, so you can iterate from there and stop as soon as a key no longer starts with the prefix.
Please find the sample code below.
List<String> result = new ArrayList<String>();
// Close the iterator when done; it holds native resources.
try (RocksIterator iterator = db.newIterator()) {
    for (iterator.seek(prefix.getBytes()); iterator.isValid(); iterator.next()) {
        String key = new String(iterator.key());
        // Keys are sorted, so the first non-matching key ends the scan.
        if (!key.startsWith(prefix)) {
            break;
        }
        result.add(key);
    }
}
Hope it will help you.

The answer from @Pramatha V works pretty well, although I made some improvements to the code. I do not deserialize the iterator key in every iteration. I use Bytes.increment() from the Kafka common utils (you can extract this class and use it in your code directly). This function increments the underlying byte array by adding 1, which gives me the next key greater than my prefix key. I use BYTES_LEXICO_COMPARATOR (also from the same class) for the comparison, but you are free to implement and use your own comparator. The function returns a map of byte arrays, which you can deserialize later in your code.
public Map<byte[], byte[]> prefixScan(final byte[] prefix) {
    final Map<byte[], byte[]> result = new HashMap<>();
    // First key past the prefix range; increment() is the helper described above.
    final byte[] rawLastKey = increment(prefix);
    // try-with-resources closes the iterator even if the loop throws.
    try (RocksIterator iterator = db.newIterator()) {
        for (iterator.seek(prefix); iterator.isValid(); iterator.next()) {
            // Stop once the current key reaches the upper bound.
            if (Bytes.BYTES_LEXICO_COMPARATOR.compare(iterator.key(), rawLastKey) >= 0) {
                break;
            }
            result.put(iterator.key(), iterator.value());
        }
    }
    return result;
}
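To make the upper-bound idea easier to try without a RocksDB instance, here is a minimal sketch that uses a TreeMap as a stand-in for the sorted store. The class PrefixScanSketch and its methods upperBound, prefixScan and newStore are hypothetical names of mine, not part of RocksDB or Kafka, and upperBound is my own variant of the increment step (it drops trailing 0xFF bytes instead of carrying). It needs Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap ordered by unsigned bytewise comparison
// stands in for the sorted key space of the store.
final class PrefixScanSketch {

    // Smallest key strictly greater than every key that starts with `prefix`,
    // or null when no such bound exists (empty prefix or all bytes 0xFF).
    static byte[] upperBound(byte[] prefix) {
        byte[] bound = Arrays.copyOf(prefix, prefix.length);
        for (int i = bound.length - 1; i >= 0; i--) {
            if (bound[i] != (byte) 0xFF) {
                bound[i]++;
                return Arrays.copyOf(bound, i + 1); // drop trailing 0xFF bytes
            }
        }
        return null;
    }

    // Returns, in key order, all keys that start with `prefix`.
    static List<byte[]> prefixScan(TreeMap<byte[], byte[]> store, byte[] prefix) {
        byte[] end = upperBound(prefix);
        List<byte[]> keys = new ArrayList<>();
        // tailMap(prefix, true) plays the role of iterator.seek(prefix).
        for (byte[] key : store.tailMap(prefix, true).keySet()) {
            if (end != null && Arrays.compareUnsigned(key, end) >= 0) {
                break; // past the last key with this prefix
            }
            keys.add(key);
        }
        return keys;
    }

    static TreeMap<byte[], byte[]> newStore() {
        return new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
    }

    public static void main(String[] args) {
        TreeMap<byte[], byte[]> store = newStore();
        for (String k : new String[] {"apple", "user:1", "user:2", "userz"}) {
            store.put(k.getBytes(StandardCharsets.UTF_8), new byte[0]);
        }
        for (byte[] key : prefixScan(store, "user:".getBytes(StandardCharsets.UTF_8))) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

The loop never deserializes keys; it only compares raw bytes against the precomputed bound, which is the point of the improvement described above.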

Seek is very slow for me: 5.35 seconds on an SSD with 1 billion records.
The keys have a fixed size of 16 bytes, made of two 8-byte longs [xx,xx].
I searched by the first long, i.e. an 8-byte prefix.
Use a ColumnFamily for mapping keys.

Related

Access to cache elements grouped by key

I want to cache a large number of Java objects (String, byte[]) with a composite key (String, int) in a cache like JCS or Infinispan.
The keys can be grouped by the string part (let's call it ID) of it:
KEY = VALUE
-------------
A 1 = valueA1
A 4 = valueA4
A 5 = valueA5
B 9 = valueB9
C 3 = valueC3
C 7 = valueC7
I need to remove elements grouped by the ID part of the key, so for example A should remove A 1, A 4 and A 5.
First I tried something like this:
final List<String> keys = cache.keySet()
.stream().filter(k -> k.getId().equals(id)).collect(Collectors.toList());
keys.forEach(cache::remove);
While this works, it is, not surprisingly, very expensive and thus slow.
So I tried another approach, using only the ID as the key and grouping the values in a map:
KEY = VALUE
---------------------------------------------
A = {1 = valueA1, 4 = valueA4, 5 = valueA5}
B = {9 = valueB9}
C = {3 = valueC3, 7 = valueC7}
Removing a group is then very efficient:
cache.remove(id);
But putting requires a get:
Map<Integer, Value> map = cache.get(key.getId());
if (map == null) {
map = new HashMap<>();
}
map.put(key.getInt(), value);
cache.put(key.getId(), map);
Now there are fewer elements in the cache with a simpler key, but the values are larger and more complex. Testing with hundreds of thousands of elements in the cache, deletes are fast and puts and gets don't seem to be noticeably slower.
Is this a valid solution or are there better approaches?

I suggest you use computeIfAbsent and save a put and get invocation as follows:
cache.computeIfAbsent(key.getId(), k -> new HashMap<Integer,Value>()).put(key.getInt(),value);
This method ensures the creation of the secondary map only if it is not already mapped in the primary map, and saves the need for an additional get invocation since it returns the secondary map mapped to the primary key.
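A minimal sketch of the grouped layout with computeIfAbsent, assuming a plain HashMap stands in for the cache. GroupedCacheSketch and its methods are hypothetical names, and values are simplified to String.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the outer map is keyed by the ID part, the inner map
// by the int part of the composite key.
final class GroupedCacheSketch {
    // Plain HashMap standing in for the real cache (assumption).
    private final Map<String, Map<Integer, String>> cache = new HashMap<>();

    void put(String id, int n, String value) {
        // Creates the inner map only when the ID is not mapped yet,
        // then returns whichever inner map is now associated with it.
        cache.computeIfAbsent(id, k -> new HashMap<>()).put(n, value);
    }

    String get(String id, int n) {
        Map<Integer, String> group = cache.get(id);
        return group == null ? null : group.get(n);
    }

    // Removing a whole group is a single call on the outer map.
    void removeGroup(String id) {
        cache.remove(id);
    }

    int groupCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        GroupedCacheSketch c = new GroupedCacheSketch();
        c.put("A", 1, "valueA1");
        c.put("A", 4, "valueA4");
        c.put("B", 9, "valueB9");
        c.removeGroup("A");
        System.out.println(c.get("A", 1) + " " + c.get("B", 9));
    }
}
```

With a real cache that serializes or replicates its values, mutating the returned inner map in place may not be written back; whether it is depends on the cache, and this sketch does not cover that case.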
References:
Map::computeIfAbsent
What is the difference between putIfAbsent and computeIfAbsent in Java 8 Map ?

Convert list to hashmap

The title of the question may give you the impression that it is a duplicate, but I believe it is not.
I have only a few months of experience with Java and about a month with MongoDB, Spring Boot and REST.
I have a Mongo collection with 3 fields in a document: _id (default field), appName and appKey. I am using a list to iterate through all the documents and find the one whose appName and appKey match the values passed in. This collection currently has only 4 entries, so it runs smoothly. But I was reading a bit about collections and found that with a larger number of documents, searching a list will be much slower than a HashMap lookup.
As I said, I am quite new to Java and am having trouble converting my code to use a HashMap, so I was hoping someone could guide me through this.
I am also attaching my code for reference.
public List<Document> fetchData() {
// Collection that stores appName and appKey
MongoCollection<Document> collection = db.getCollection("info");
List<Document> nameAndKeyList = new ArrayList<Document>();
// Getting the list of appName and appKey from info DB
AggregateIterable<Document> output = collection
.aggregate(Arrays.asList(new BasicDBObject("$group", new BasicDBObject("_id",
new BasicDBObject("_id", "$id").append("appName", "$appName").append("appKey", "$appKey"))
)));
for (Document doc : output) {
nameAndKeyList.add((Document) doc.get("_id"));
}
return nameAndKeyList;
}// End of Method
And then I am calling it in another method of the same class:
List<Document> nameAndKeyList = new ArrayList<>();
//InfoController is the name of the class
InfoController obj1 = new InfoController();
nameAndKeyList = obj1.fetchData();
// Fetching and checking if the appName & appKey pair
// is present in the DB one by one.
// If appName & appKey mismatches, it increments the value
// of 'i' and check them with the other values in DB
for (int i = 0; i < nameAndKeyList.size(); i++) {
"followed by my code"
And if I am not mistaken, the loop above will no longer be needed either.
Thanks in advance.

You just need a simple find query to get the record you need directly from MongoDB.
Document document = collection
.find(new Document("appName", someappname).append("appKey", someappkey)).first();

First of all, a list is not much slower or faster than a HashMap. A HashMap is commonly used to store key-value pairs such as an ID and a name. In your case I see you are using an ArrayList without a specified size. It may be better to use a linked list when you do not know the size, because an ArrayList is backed by an array and grows by copying it. If you want to build a HashMap out of the list, or use a HashMap directly, you need to map an ID to the value for each record.
HashMap<String /*type of the identifier*/, String /*type of value*/> map = new HashMap<String, String>();
for (Document doc : output) {
    // Document.get returns Object, so convert to the map's String types.
    map.put(String.valueOf(doc.get("_id")), String.valueOf(doc.get("_value")));
}

First, avoid premature optimization (look up the expression if you don’t know what it is). Put a realistic number of thousands of items containing near-realistic data in your list. Try to retrieve an item that isn’t there. This will force your for loop to traverse the entire list. See how long it takes. Try a number of times to get an impression of whether you get impatient. If you don’t, you’re done.
If you find out that you need a speed-up, I agree that HashMap is one of the obvious solutions to try. One of the first things to consider is a key type for your HashMap. As I understand it, you need to search for an item where appName and appKey are both right. The good solution is to write a simple class with these two fields and equals and hashCode methods (I’ll call it DocumentHashMapKey for now, think of a better name). For hashCode(), try Objects.hash(appName, appKey). If it doesn’t give satisfactory performance with the data you have, consider alternatives. Now you are ready to build your HashMap<DocumentHashMapKey, Document>.
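A minimal sketch of such a key class, assuming both fields are non-null strings. The name AppNameAndKey is hypothetical, and the map value is simplified to String instead of Document so the example runs without the MongoDB driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: two keys are equal when both fields are equal.
final class AppNameAndKey {
    private final String appName;
    private final String appKey;

    AppNameAndKey(String appName, String appKey) {
        this.appName = Objects.requireNonNull(appName);
        this.appKey = Objects.requireNonNull(appKey);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AppNameAndKey)) {
            return false;
        }
        AppNameAndKey other = (AppNameAndKey) o;
        return appName.equals(other.appName) && appKey.equals(other.appKey);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals: same fields, same order.
        return Objects.hash(appName, appKey);
    }

    public static void main(String[] args) {
        Map<AppNameAndKey, String> docs = new HashMap<>();
        docs.put(new AppNameAndKey("billing", "k-123"), "billing document");
        // A new key object with the same field values finds the entry.
        System.out.println(docs.get(new AppNameAndKey("billing", "k-123")));
        // A mismatching appKey finds nothing.
        System.out.println(docs.get(new AppNameAndKey("billing", "wrong")));
    }
}
```

Making the fields final keeps the hash code stable while the key sits in the map.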
If you’re lazy or just want a first impression of how a HashMap performs, you may also build your keys by concatenating appName + "$##" + appKey (where the string in the middle is something that is unlikely to be part of a name or key) and use HashMap<String, Document>.
Everything I said can be refined depending on your needs. This was just to get you started.

Thanks everyone for your help, without which I would not have got to a solution.
public HashMap<String, String> fetchData() {
    // Collection that stores appName and appKey
    MongoCollection<Document> collection = db.getCollection("info");
    HashMap<String, String> appKeys = new HashMap<String, String>();
    // Getting the list of appName and appKey from info DB
    AggregateIterable<Document> output = collection
            .aggregate(Arrays.asList(new BasicDBObject("$group", new BasicDBObject("_id",
                    new BasicDBObject("_id", "$id").append("appName", "$appName").append("appKey", "$appKey"))
            )));
    String appName = null;
    String appKey = null;
    for (Document doc : output) {
        Document temp = (Document) doc.get("_id");
        appName = (String) temp.get("appName");
        appKey = (String) temp.get("appKey");
        appKeys.put(appName, appKey);
    }
    return appKeys;
}
Calling the above method from another method of the same class:
InfoController obj = new InfoController();
// Fetching the values of 'appName' & 'appKey' stored in the 'info' DB
HashMap<String, String> appKeys = obj.fetchData();
storedAppkey = appKeys.get(appName);
// Handling the case of mismatch
if (storedAppkey == null || !storedAppkey.equals(appKey))
{//Then the response and further processing that I need to do.
Now what HashMap has done is that it has made my code more readable and the 'for' loop that I was using for iterating is gone, although it might not make much difference in the performance as of now.
Thanks once again to everyone for your help and support.

Access to the key-value pair of a Map with one element in Java

A method of mine returns a Map<A,B>. In some clearly identified cases, the map only contains one key-value pair, effectively only being a wrapper for the two objects.
Is there an efficient / elegant / clear way to access both the key and the value? It seems overkill to iterate over the one-element entry set. I'm looking for something that would lower the brain power required for people who will maintain this, along the lines of:
(...)
// Only one result.
else {
A leKey = map.getKey(whicheverYouWantThereIsOnlyOne); // Is there something like this?
B leValue = map.get(leKey); // This actually exists. Any Daft Punk reference was non-intentional.
}
Edit: I ended up going with @akoskm's solution below. In the end, the only satisfying way of doing this without iteration was with a TreeMap, and the overhead made that unreasonable.
It turns out there is not always a silver bullet, especially as this would be a very small rabbit to kill with it.

If you need both key/value then try something like this:
Entry<Long, AccessPermission> onlyEntry = map.entrySet().iterator().next();
onlyEntry.getKey();
onlyEntry.getValue();
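To make the "exactly one entry" assumption visible to maintainers, the call can be wrapped in a small helper. This is only a sketch; SingleEntry and only are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper: returns the sole entry of a map and fails loudly
// when the map does not hold exactly one entry.
final class SingleEntry {
    static <K, V> Map.Entry<K, V> only(Map<K, V> map) {
        if (map.size() != 1) {
            throw new IllegalArgumentException(
                    "expected exactly one entry but found " + map.size());
        }
        return map.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        Map<String, Integer> map = Collections.singletonMap("answer", 42);
        Map.Entry<String, Integer> entry = only(map);
        System.out.println(entry.getKey() + " = " + entry.getValue());
    }
}
```

The size check costs little and turns a silent "picked an arbitrary entry" into an explicit error if the map ever grows.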

You can use TreeMap or ConcurrentSkipListMap.
TreeMap<String, String> myMap = new TreeMap<String, String>();
String firstKey = myMap.firstEntry().getKey();
String firstValue = myMap.firstEntry().getValue();
Another way to use this:
String firstKey = myMap.firstKey();
String firstValue = myMap.get(myMap.firstKey());
This can work as an alternate solution.

There is a method called keySet() that returns the set of keys. Read this thread.
else {
A leKey = map.keySet().iterator().next();
B leValue = map.get(leKey); // This actually exists. Any Daft Punk reference was non-intentional.
}

Using a for-each loop and var:
for(var entry : map.entrySet()){
A key = entry.getKey();
B value = entry.getValue();
}

Iterating Hashtable in Java (puzzle)

So here's the conundrum:
"kb" is an instance of a class that extends java.util.Hashtable
The key is a String, the stored value is of a class called "IntelCard"
This code extracts the keys, and endeavors to print data from the table
Set<String> ks = kb.keySet();
System.out.println(ks); // is this what we thought?
for(String key: ks){
IntelCard ic = kb.get(key);
String o = String.format("%-24s %24s %8s",
ic.name, ic.alliance, ic.might);
System.out.println(o);
}
This is the output:
[commanderv, repo, olaf, triguy]
triguy galactica 10000
triguy galactica 10000
triguy galactica 10000
triguy galactica 10000
We can see the dump of "ks" which is supposed to be the set of keys. But apparently it is selecting only the last "touched" entry in the Hashtable. (In this test, "triguy" was the last value added.)
Is there a need to reset the Hashtable selector somehow? It doesn't make sense, since the code selects each value by key. Is there a need to reset the selector on the key set (ks)? That doesn't make sense either since the loop should simply iterate over the entire set.
I dunno, what am I missing?
---v

Probably you have the same IntelCard object associated with multiple keys. To be sure you are iterating over all keys, print the key itself, for example with String.format("%-24s %24s %8s", key, ic.alliance, ic.might).
Iterate over the map using Map.Entry<K,V> instead of the keySet()/get() pair:
for(final Map.Entry<String,IntelCard> e : kb.entrySet()) {
IntelCard ic = e.getValue();
String o = String.format("%-24s %24s %8s", ic.name, ic.alliance, ic.might);
System.out.println(o);
}

Apparently you added 4 similar (same toString() of fields) IntelCards with different keys. Hashtable has unique keys, not necessarily unique values.
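The question does not show how the table was filled, so the following is only a guess at one way to get that output: a single card object is reused and stored under every key. SharedValueDemo and Card are hypothetical stand-ins for the asker's classes.

```java
import java.util.Hashtable;
import java.util.Map;

final class SharedValueDemo {
    // Hypothetical stand-in for the IntelCard class from the question.
    static final class Card {
        String name;
        Card(String name) { this.name = name; }
    }

    static final String[] KEYS = {"commanderv", "repo", "olaf", "triguy"};

    // Suspected bug: one Card instance is mutated and stored under every key,
    // so every key ends up pointing at the last state of that instance.
    static Hashtable<String, Card> buildWithSharedInstance() {
        Hashtable<String, Card> table = new Hashtable<>();
        Card card = new Card("");
        for (String key : KEYS) {
            card.name = key;
            table.put(key, card);
        }
        return table;
    }

    // Fix: create a fresh Card for each key.
    static Hashtable<String, Card> buildWithFreshInstances() {
        Hashtable<String, Card> table = new Hashtable<>();
        for (String key : KEYS) {
            table.put(key, new Card(key));
        }
        return table;
    }

    // True when every stored card carries the name of its own key.
    static boolean namesMatchKeys(Map<String, Card> table) {
        for (Map.Entry<String, Card> e : table.entrySet()) {
            if (!e.getKey().equals(e.getValue().name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Card> e : buildWithSharedInstance().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().name);
        }
    }
}
```

With the shared instance, all four keys are present but each one reports the name triguy, which matches the symptom in the question. Static fields in the card class would produce the same effect.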

java.util.Collections$UnmodifiableMap problem : code included

I am building a Facebook platform web app using GWT and hosting it on App Engine.
I am adding validation code that uses supplied query string parameters in the callback url. GWT allows me to get these parameters by calling Window.Location.getParameterMap() and the returned Map is immutable.
I may be wrong, but I think this problem has nothing to do with FB, GWT or App Engine specifically and comes down to my misunderstanding something about Map objects.
I don't think that my code attempts to modify the supplied Map but the error I get seems to suggest that my code is trying to modify an immutable Map.
Can someone please take a look and let me know where I am modifying an unmodifiable Map?
I would supply a stack trace but I can't find a way to get a stack trace for this to display in App Engine logs.
Thanks in advance for any and all help :-)
/**
* Validation Test
* To generate the signature for these arguments:
* 1. Remove the fb_sig key and value pair.
* 2. Remove the "fb_sig_" prefix from all of the keys.
* 3. Sort the array alphabetically by key.
* 4. Concatenate all key/value pairs together in the format "k=v".
* 5. Append your secret key.
* 6. Take the md5 hash of the whole string.
* @param fbQueryStringParams
* @return String
*/
public String test(Map<String,List<java.lang.String>> fbQueryStringParams) {
String appSecret = TinyFBClient.APP_SECRET;
String fbSig = fbQueryStringParams.get("fb_sig").get(0);
StringBuilder sb = new StringBuilder();
TreeMap<String,String> sortedMap = new TreeMap<String,String>();
// Get a Set view of the Map of query string parameters.
Set<Map.Entry<String,List<java.lang.String>>> mapEntries = fbQueryStringParams.entrySet();
// Iterate through the Set view, inserting into a SortedMap all Map.Entry's
// that do not have a Key value of "fb_sig".
Iterator<Map.Entry<String,List<java.lang.String>>> i = mapEntries.iterator();
while(i.hasNext()) {
Map.Entry<String,List<java.lang.String>> mapEntry = i.next();
if(!mapEntry.getKey().equals("fb_sig")) { // 1. Remove the fb_sig key and value pair.
sortedMap.put(mapEntry.getKey(),mapEntry.getValue().get(0)); // 3. Sort the array alphabetically by key.
}
}
// Get a Set view of the Map of alphabetically sorted Map.Entry objects.
Set<Map.Entry<String,String>> sortedMapEntries = sortedMap.entrySet();
// Iterate through the Set view, appending the concatenated key's and value's
// to a StringBuilder object.
Iterator<Map.Entry<String,String>> ii = sortedMapEntries.iterator();
while(ii.hasNext()) {
Map.Entry<String,String> mapEntry = ii.next();
// 4. Concatenate all key/value pairs together in the format "k=v".
sb.append(mapEntry.getKey().replaceAll("fb_sig_","")); // 2. Remove the "fb_sig_" prefix from all of the keys.
sb.append("=");
sb.append(mapEntry.getValue());
}
sb.append(appSecret); // 5. Append your secret key.
String md5 = DigestUtils.md5Hex(sb.toString()); // 6. Take the md5 hash of the whole string.
// Build and return an output String for display.
StringBuilder output = new StringBuilder();
output.append("fbSig = "+fbSig);
output.append("<br/>");
output.append("md5 = "+md5);
return output.toString();
}

Copy the result of Window.Location.getParameterMap() into a HashMap and it will work.
So you send new HashMap<String, List<String>>(Window.Location.getParameterMap()) over RPC, and that works.
The problem is that unmodifiableMap is not Serializable for GWT. I know that it has a Serializable marker, but in GWT it works a little differently. Most collection classes have a custom GWT implementation and some are not 100% compatible.

I don't see any unmodifiable collections.
Your code is pretty complicated. If I understood it right, then this should be equivalent. I wouldn't use Map.Entry objects, and TreeMap has a handy constructor for your needs. Finally, I'd prefer the for-each loop over the iterator.
public String test(Map<String, List<java.lang.String>> fbQueryStringParams) {
    String appSecret = TinyFBClient.APP_SECRET;
    String fbSig = fbQueryStringParams.get("fb_sig").get(0);
    StringBuilder sb = new StringBuilder();
    // Copying into a TreeMap sorts by key and leaves the supplied map untouched.
    TreeMap<String, List<String>> sortedMap = new TreeMap<String, List<String>>(fbQueryStringParams);
    sortedMap.remove("fb_sig"); // remove the unwanted entry
    for (String key : sortedMap.keySet()) {
        List<String> values = sortedMap.get(key);
        String printableKey = key.replaceAll("fb_sig_", "");
        String value = "EMPTY LIST";
        if (!values.isEmpty()) {
            // This could have been your problem, you always
            // assume, all lists in the map are not empty
            value = values.get(0);
        }
        sb.append(String.format("%s=%s", printableKey, value));
    }
    sb.append(appSecret);
    String md5 = DigestUtils.md5Hex(sb.toString());
    // Build and return an output String for display.
    StringBuilder output = new StringBuilder();
    output.append("fbSig = " + fbSig);
    output.append("<br/>");
    output.append("md5 = " + md5);
    return output.toString();
}
While refactoring I found one possible bug: when you create the sorted map in your code, you assume that all lists in the map are non-empty. The first empty list will make get(0) throw an IndexOutOfBoundsException in the first loop (or a NullPointerException if the list itself is null).

Do a System.out.println(fbQueryStringParams.getClass()); at the start of the method (or log it, or whatever you need to be able to see what it is).
If that argument is passed to you from the system it is very likely wrapped as an unmodifiable collection since they don't want you altering it.
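A minimal sketch of what that check shows on a plain JVM, assuming the map was wrapped with Collections.unmodifiableMap (GWT's emulated classes may report something different). UnmodifiableMapDemo and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class UnmodifiableMapDemo {
    // Class name reported for a map wrapped by Collections.unmodifiableMap.
    static String wrappedClassName() {
        Map<String, String> wrapped = Collections.unmodifiableMap(new HashMap<String, String>());
        return wrapped.getClass().getName();
    }

    // True when the map refuses a put; note that a modifiable map keeps the probe entry.
    static boolean rejectsWrites(Map<String, String> map) {
        try {
            map.put("probe", "value");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copying into a HashMap gives a modifiable map with the same entries.
    static Map<String, String> mutableCopy(Map<String, String> source) {
        return new HashMap<>(source);
    }

    public static void main(String[] args) {
        Map<String, String> wrapped =
                Collections.unmodifiableMap(Collections.singletonMap("fb_sig", "abc"));
        System.out.println(wrapped.getClass());
        System.out.println(wrapped.get("fb_sig"));               // reads are fine
        System.out.println(rejectsWrites(wrapped));              // writes are refused
        System.out.println(rejectsWrites(mutableCopy(wrapped))); // the copy accepts writes
    }
}
```

Reading from the wrapper never throws; only mutating calls such as put or remove do, which helps narrow down where the error originates.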

Did I understand correctly that you are calling Window.Location.getParameterMap in your client code and sending the result to the server in an RPC call? In that case the question is: is that ParameterMap serializable? Not all implementations are in fact supported in GWT. So it might be that your server code is not even called and that it crashes before it can send the request. Did you see any warnings during GWT compilation?
The code, although the implementation can be cleaned up and indeed can throw an exception on an empty or null list, is NOT modifying the supplied parameter Map or the List in the Map values. So the problem is probably somewhere else.
Why don't you run your application in hosted mode (or development mode, as it is called in GWT 2.0)?
David
