How to find intersection of two keys in Redis using Java? - java

I have two keys in Redis. The first key holds a set of strings. The second key holds a sorted set of objects (each with a string value and a score). I want to fetch the elements where the string in the first key matches the string field of an object in the second key.
If I replace the objects with plain strings in the second key, I am able to fetch them, but I want to fetch the list of strings along with their scores.
I am using Spring Data Redis and Jedis for Redis handling.
Is it possible to fetch the list of common strings and their corresponding scores? If yes, how?

How you store your data will affect how you want to retrieve it. By storing your keys as listed in the comment, you are basically limited to string manipulation to determine anything useful, and that really isn't the value of using Redis. (It's not meant for "searching", it's meant for fast lookups.)
Consider something like this:
The keys used in Redis will be your first set of strings, each containing a list of values. The values in those lists will be your second set of strings, and may be duplicated in different lists (as you see fit).
LPUSH "x1" "POJO[field1=x1, field2=y1]" "POJO[field1=x1, field2=y2]"
LPUSH "x2" "POJO[field1=x2, field2=y2]"
etc...
When you want the values stored under your first string, read its list:
LRANGE x1 0 1000 (or LRANGE x1 0 -1 to read the whole list, which avoids a separate LLEN call)
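If the data stays in a set and a sorted set as in the question, the common strings and their scores can also be worked out on the client. The following is only a minimal sketch under stated assumptions: the members of the first key and the member/score pairs of the second key have already been fetched into a Set and a Map, and the names ScoredIntersection and intersect are made up for illustration. It uses no Redis client API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class ScoredIntersection {
    // Keeps only the scored entries whose member also appears in the plain set.
    // Both arguments are assumed to have been read from Redis beforehand.
    static Map<String, Double> intersect(Set<String> members, Map<String, Double> scored) {
        Map<String, Double> common = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : scored.entrySet()) {
            if (members.contains(entry.getKey())) {
                common.put(entry.getKey(), entry.getValue());
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the two Redis keys.
        Set<String> first = new HashSet<>(Arrays.asList("a", "b", "c"));
        Map<String, Double> second = new LinkedHashMap<>();
        second.put("b", 2.0);
        second.put("c", 5.0);
        second.put("d", 7.0);
        System.out.println(intersect(first, second)); // {b=2.0, c=5.0}
    }
}
```

If the intersection should happen inside Redis instead, ZINTERSTORE may be worth a look. As far as I know it also accepts plain sets as input keys, but check the Redis documentation for your version before relying on that.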

Related

Data Structure choices based on requirements

I'm completely new to programming and to Java in particular, and I am trying to determine which data structure to use for a specific situation. Since I'm not familiar with data structures in general, I have no idea what each structure does or what its limitations are.
So I have a CSV file with a bunch of items in it, let's say characters and matching numbers. So my list looks like this:
A,1,B,2,B,3,C,4,D,5,E,6,E,7,E,8,E,9,F,10......etc.
I need to be able to read this in, and then:
1) display just the letters or just the numbers, sorted alphabetically or numerically
2) search to see if an element is contained in either list.
3) search to see if an element pair (for example A - 1 or B-10) is contained in the matching list.
Think of it as an Excel spreadsheet with two columns. I need to be able to sort by either column while maintaining the relationship, and I need to be able to do an IF column A = some variable AND the corresponding column B contains some other variable, then do such and such.
I also need to be able to insert a pair into the original list at any location. So insert A into list 1 and insert 10 into list 2, but make sure they retain the relationship A-10.
I hope this makes sense, and thank you for any help! I am working on purchasing a Data Structures in Java book to work through and trying to sign up for the class at our local college, but it's only offered every spring...
You could use two sorted Maps such as TreeMap.
One would map Characters to numbers (Map<Character,Number> or something similar). The other would perform the reverse mapping (Map<Number, Character>)
Let's look at your requirements:
1) display just the letters or just the numbers sorted alphabetically or numerically
Just iterate over one of the maps. The iteration will be ordered.
2) search to see if an element is contained in either list.
Just check the corresponding map. Looking for a number? Check the Map whose keys are numbers.
3) search to see if an element pair (for example A - 1 or B-10) is contained in the matching list.
Just get() the value for A from the Character map, and check whether that value is 10. If so, then A-10 exists. If there's no value, or the value is not 10, then A-10 doesn't exist.
When adding or removing elements you'd need to take care to modify both maps to keep them in sync.
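Here is a rough sketch of the two-map idea. It makes some assumptions that are not in the answer: because the sample data repeats letters (B-2 and B-3), the letter map here points to a sorted set of numbers rather than a single number, each number is assumed to appear only once, and the class and method names (PairIndex, add, containsPair and so on) are invented for the example.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PairIndex {
    // letter -> sorted numbers paired with it (a letter may repeat in the CSV)
    private final TreeMap<Character, TreeSet<Integer>> byLetter = new TreeMap<>();
    // number -> letter (assumption: each number appears only once)
    private final TreeMap<Integer, Character> byNumber = new TreeMap<>();

    // Adds a pair to both maps so they stay in sync.
    void add(char letter, int number) {
        if (byNumber.containsKey(number)) {
            throw new IllegalArgumentException("number already paired: " + number);
        }
        byLetter.computeIfAbsent(letter, k -> new TreeSet<>()).add(number);
        byNumber.put(number, letter);
    }

    // Requirement 1: keys of a TreeMap iterate in sorted order.
    Set<Character> letters() { return byLetter.keySet(); }
    Set<Integer> numbers() { return byNumber.keySet(); }

    // Requirement 2: membership checks against the matching map.
    boolean containsLetter(char letter) { return byLetter.containsKey(letter); }
    boolean containsNumber(int number) { return byNumber.containsKey(number); }

    // Requirement 3: look up the letter, then check its numbers.
    boolean containsPair(char letter, int number) {
        TreeSet<Integer> paired = byLetter.get(letter);
        return paired != null && paired.contains(number);
    }

    public static void main(String[] args) {
        PairIndex index = new PairIndex();
        index.add('B', 3);
        index.add('A', 1);
        index.add('B', 2);
        System.out.println(index.letters());            // [A, B]
        System.out.println(index.numbers());            // [1, 2, 3]
        System.out.println(index.containsPair('B', 3)); // true
    }
}
```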

User Defined Index for an ArrayList of IDs

I'm in a situation where (in Java) I need to define an ArrayList of generated IDs. I don't know how many will be generated at any given time, but I do know that when one is generated, the user who generated it needs to set a custom index and be able to retrieve the ID by that index. What would be the generally accepted standard way of storing and working with a data structure like this? An ArrayList of arrays?
Sounds like a use case for a Map, in which you can use the ID as the key and a value (or potentially an array of values, if multiple values can have the same ID) as the value. You can then index into the map and retrieve data using the key. The benefit is that this works even if you want to change the ID from an int to a String or even some other type.
The problem with using a List like this is if I have two ids 1 and 3000 then there are 2998 indices that are wasted, which is not exactly ideal.
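As a small illustration of the Map approach, the sketch below stores each generated ID under the index the user picked. Everything specific in it is an assumption for the example: the index is a String, the ID is a java.util.UUID, and IdRegistry, generate and lookup are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class IdRegistry {
    // user-chosen index -> generated ID; no slots are wasted for unused indices
    private final Map<String, UUID> idsByIndex = new HashMap<>();

    // Generates an ID and stores it under the index chosen by the user.
    UUID generate(String userIndex) {
        UUID id = UUID.randomUUID();
        idsByIndex.put(userIndex, id);
        return id;
    }

    // Returns the ID stored under the index, or null if there is none.
    UUID lookup(String userIndex) {
        return idsByIndex.get(userIndex);
    }

    public static void main(String[] args) {
        IdRegistry registry = new IdRegistry();
        UUID created = registry.generate("3000");
        System.out.println(created.equals(registry.lookup("3000"))); // true
        System.out.println(registry.lookup("1"));                    // null
    }
}
```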

Efficient way to find the difference between two data sets

I have two copies of data, here 1 represents my volumes and 2 represent my issues. I have to compare COPY2 with COPY1 and find all the elements which are missing in COPY2 (COPY1 will always be a superset and COPY2 can be equal or will always be a subset).
Now, I have to get the missing volume and the issue in COPY2.
So from the following figure (scenario) I should get this result:
Missing files – 1-C, 1-D, 2-C, 2-C, 3-A, 3-B, 4-E.
Question-
What data structure should I use to store the above values (volume and issue) in java?
How should I implement this scenario in java in the most efficient manner to find the difference between these 2 copies?
I suggest a flat HashSet<VolumeIssue>. Each VolumeIssue instance corresponds to one categorized issue, such as 1-C.
In that case all you will need to find the difference is a call
copy1.removeAll(copy2);
What is left in copy1 are all the issues present in copy1 and missing from copy2.
Note that your VolumeIssue class must properly implement equals and hashCode for this to work.
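A minimal sketch of what that could look like, assuming a volume is an int and an issue is a String (the question does not say which types are used). The helper missing works on a copy so that copy1 itself is not modified. The name MissingIssues is made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class VolumeIssue {
    final int volume;
    final String issue;

    VolumeIssue(int volume, String issue) {
        this.volume = volume;
        this.issue = Objects.requireNonNull(issue);
    }

    // Two instances are equal when both volume and issue match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VolumeIssue)) return false;
        VolumeIssue other = (VolumeIssue) o;
        return volume == other.volume && issue.equals(other.issue);
    }

    // Must agree with equals so that HashSet lookups work.
    @Override
    public int hashCode() {
        return Objects.hash(volume, issue);
    }

    @Override
    public String toString() {
        return volume + "-" + issue;
    }
}

class MissingIssues {
    // Returns the elements of copy1 that are absent from copy2.
    static Set<VolumeIssue> missing(Set<VolumeIssue> copy1, Set<VolumeIssue> copy2) {
        Set<VolumeIssue> result = new HashSet<>(copy1); // copy, so copy1 stays intact
        result.removeAll(copy2);
        return result;
    }

    public static void main(String[] args) {
        Set<VolumeIssue> copy1 = new HashSet<>(Arrays.asList(
                new VolumeIssue(1, "A"), new VolumeIssue(1, "B"), new VolumeIssue(2, "A")));
        Set<VolumeIssue> copy2 = new HashSet<>(Arrays.asList(new VolumeIssue(1, "A")));
        System.out.println(missing(copy1, copy2)); // contains 1-B and 2-A, in no fixed order
    }
}
```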
Since you've added the Guava tag, I'd go for a variation of Marco Topolnik's answer. Instead of removing one set from the other, use Sets.difference(left, right), whose Javadoc says:
"Returns an unmodifiable view of the difference of two sets. The returned set contains all elements that are contained by set1 and not contained by set2. set2 may also contain elements not present in set1; these are simply ignored. The iteration order of the returned set matches that of set1."
What data structure should I use to store the above values (volume and issue) in java?
You can use a HashMap for each copy, where the key is the volume and the value is a List of its issues.
How should I implement this scenario in java in the most efficient manner to find the difference between these 2 copies?
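Look up the same key (volume) in both HashMaps, which gives you two Lists of issues, and then find the difference between those two lists.
Suppose list1 and list2 are the lists stored under the same key in the two maps. Then:
List<Issue> diff = new ArrayList<>(list1);
diff.removeAll(list2);
Note that removeAll returns a boolean and modifies the list it is called on, so the difference is computed on a copy here.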
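Applied to every volume, the per-key comparison might look like the sketch below. It assumes Integer volumes and String issues, and VolumeDiff and missing are names invented for the example. A volume that is absent from the second map is treated as having no issues there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VolumeDiff {
    // For each volume in copy1, collects the issues that copy2 lacks.
    static Map<Integer, List<String>> missing(Map<Integer, List<String>> copy1,
                                              Map<Integer, List<String>> copy2) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : copy1.entrySet()) {
            List<String> present = copy2.get(entry.getKey());
            if (present == null) {
                present = Collections.<String>emptyList(); // volume missing entirely
            }
            List<String> diff = new ArrayList<>(entry.getValue()); // copy before removing
            diff.removeAll(present);
            if (!diff.isEmpty()) {
                result.put(entry.getKey(), diff);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the figure from the question.
        Map<Integer, List<String>> copy1 = new HashMap<>();
        copy1.put(1, Arrays.asList("A", "B", "C", "D"));
        copy1.put(2, Arrays.asList("A", "C"));
        Map<Integer, List<String>> copy2 = new HashMap<>();
        copy2.put(1, Arrays.asList("A", "B"));
        System.out.println(missing(copy1, copy2)); // {1=[C, D], 2=[A, C]}
    }
}
```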

Comparator for TreeBag to sort by the number of occurrences

I have a source of strings (let us say, a text file) and many strings repeat multiple times. I need to get the top X most common strings in the order of decreasing number of occurrences.
The idea that came to mind first was to create a sortable Bag (something like org.apache.commons.collections.bag.TreeBag) and supply a comparator that will sort the entries in the order I need. However, I cannot figure out what is the type of objects I need to compare. It should be some kind of an internal map that combines my object (String) and the number of occurrences, generated internally by TreeBag. Is this possible?
Or would I be better off simply using a HashMap and sorting it by value, as described in, for example, "Java sort HashMap by value"?
Why not put the strings in a map from each string to the number of times it appears in the text?
Then traverse the entries in the map and keep adding them to a min-heap of size X. If the heap is already full, replace its minimum only when the new entry has a larger count.
This takes O(n log x) time.
Otherwise, after counting, sort the entries by number of occurrences and take the first x. A TreeMap would come in handy here :) (I'd add a link to the Javadocs, but I'm on a tablet.)
This takes O(n log n) time.
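A minimal sketch of the map plus min-heap variant, using only java.util. It inserts every entry and evicts the smallest count whenever the heap grows past X, so only the X largest counts survive. The names TopStrings and topX are hypothetical, and the order among strings with equal counts is not defined.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopStrings {
    // Returns the x most frequent strings, most frequent first.
    static List<String> topX(List<String> words, int x) {
        // Step 1: count occurrences of each string.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }

        // Step 2: min-heap by count, so the least frequent kept entry is on top.
        Comparator<Map.Entry<String, Integer>> byCount =
                (a, b) -> Integer.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > x) {
                heap.poll(); // evict the current minimum
            }
        }

        // The heap yields ascending counts, so prepend to get descending order.
        LinkedList<String> result = new LinkedList<>();
        while (!heap.isEmpty()) {
            result.addFirst(heap.poll().getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(topX(words, 2)); // [a, b]
    }
}
```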
With Guava's TreeMultiset, just use Multisets.copyHighestCountFirst.

Java Collections - Efficient search for DateTime ranges

I have a case where I have a table (t1) which contains items like
| id | timestamp | att1 | att2 |
Now I have to iterate over a collection of elements of type att1 and get all records from t1 which are between two certain timestamps for this att1. I have to do this operation several times for a single att1.
So in order to go easy on the database queries, I intended to load every entry from t1 which has a certain att1 attribute once into a collection and perform the subsequent searches on this collection.
Is there a collection that could handle a search like between '2011-02-06 09:00:00' and '2011-02-06 09:00:30'? It's not guaranteed to contain entries for those two timestamps.
Before writing an implementation for that (most likely a very slow implementation ^^) I wanted to ask you guys if there might be some existing collections already or how I could tackle this problem.
Thanks!
Yes. Use TreeMap, which is basically a sorted map of key=>value pairs, and its method TreeMap::subMap(fromKey, toKey).
In your case you would use the timestamps as keys and, as values, the att1 attribute, the id, or whatever else is most convenient for you.
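A short sketch of the subMap approach, with a few assumptions of mine: timestamps are java.time.LocalDateTime, the stored value is the record id, and each key maps to a list because several records could share a timestamp. TimestampIndex, add and between are hypothetical names. Note that the two-argument subMap(fromKey, toKey) excludes toKey, so the sketch uses the four-argument overload to include both ends.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TimestampIndex {
    // timestamp -> ids of the records carrying that timestamp
    private final TreeMap<LocalDateTime, List<Integer>> idsByTime = new TreeMap<>();

    void add(LocalDateTime timestamp, int id) {
        idsByTime.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(id);
    }

    // Returns the ids with from <= timestamp <= to, in timestamp order.
    // Neither bound needs to be present in the map.
    List<Integer> between(LocalDateTime from, LocalDateTime to) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> ids : idsByTime.subMap(from, true, to, true).values()) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        TimestampIndex index = new TimestampIndex();
        index.add(LocalDateTime.of(2011, 2, 6, 8, 59, 59), 1);
        index.add(LocalDateTime.of(2011, 2, 6, 9, 0, 10), 2);
        index.add(LocalDateTime.of(2011, 2, 6, 9, 0, 20), 3);
        index.add(LocalDateTime.of(2011, 2, 6, 9, 0, 31), 4);
        System.out.println(index.between(
                LocalDateTime.of(2011, 2, 6, 9, 0, 0),
                LocalDateTime.of(2011, 2, 6, 9, 0, 30))); // [2, 3]
    }
}
```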
The closest I can think of, and this isn't really what I would consider ideal, is to write a comparator that sorts dates so that those within the range count as less than those outside the range (always return -1 when comparing in to out, 0 when comparing in to in or out to out, and +1 when comparing out to in).
Then, use this comparator to sort a collection (I suggest an ArrayList). The values within the range will appear first.
You might just be better off writing your own filter, though. Input a collection (I recommend a LinkedList), iterate over it, and remove anything not in the range. Keep a master copy around for spawning new ones to pass into the filter, if you need to.
You can make the object you want in your collection, which I think is att1, implement the Comparable interface and have its compareTo method compare the timestamp field. With this in place it will work in any sorted collection, such as a TreeSet, making it easy to iterate and pull out everything in a certain range.
