TL;DR I am looking for a way to store, increment and retrieve ranges of event counts by minute.
I am looking for a way to build an incrementing time series in Redis, storing counts to the minute. My goal is to be able to look up a time range and get the values. For instance, if an event occurred for a specific key 30 times in a minute, I would want to do something like zrange and get the counts back. I am also hoping to use something like zincrby to increment the value.
I have of course looked at a sorted set, which seemed like a perfect fit until I realized that I can only do a range scan on the score and not the value. The optimal solution would be to use the number of minutes as the score and the value in the sorted set as the number of events for that minute. The problem I ran into is that zincrby only increments the score and not the value, and I was unable to find a way to increment the value atomically.
I also looked into a hash, using the current minute as the key and the event count as the value. I was able to increment the value using hincrby, but the problem is that it doesn't support fetching a range of keys.
Any help would be appreciated.
As you probably know, a well-posed question already contains its answer, and you have already listed the Redis ways to solve your problem:
Use ZSET - key as time and value as counter.
Use HSET - key as time and value as counter.
Use string keys - key name as time and value as counter.
Why only these cases? Because only these structures (ZSET, HSET and string keys) have atomic methods to increment values.
So actually:
You should make the right choice of data structure.
You should solve the problem of selecting a range of data.
The answer to the first question is a compromise between memory and performance. From your question, you do not need any kind of sorting, so a sorted set is not the best solution: it consumes a lot of memory, and ZINCRBY has time complexity O(log(N)), whereas HINCRBY and INCRBY are O(1). So we should choose between hashes and string keys. Please look at the question and answer about memory optimization in Redis; according to it, I think you should use hashes as the data type for your solution.
The second question is common to all of these data structures, because none of them offers select-by-name features or their analogs. We can use HMGET or Lua scripting to solve this problem. In either case the solution has time complexity O(n).
Here is a sample with Jedis (I'm not a Java programmer, sorry for possible errors):
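import java.util.ArrayList;
import java.util.List;
import redis.clients.jedis.Jedis;

int fromMinute = 1;
int toMinute = 10; // exclusive upper bound
// Build the list of hash field names, one per minute in the range
List<String> fields = new ArrayList<String>();
for (int i = fromMinute; i < toMinute; i++) {
    fields.add(Integer.toString(i));
}
Jedis jedis = new Jedis("localhost");
// hmget takes the field names as varargs; a missing field comes back as null
List<String> values = jedis.hmget("your_set_name", fields.toArray(new String[0]));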
This solution is atomic and fast, has time complexity O(n), and consumes as little memory as possible in Redis.
I have a list of objects, each of which contains a statusEnum. Now I want to return all the objects that fall under a specific list of provided statuses.
A simple solution is to loop over the list of objects and then run another for loop over the provided list of statusEnums. This would work; however, it would make the time complexity O(n^2). Is there a way I could reduce it to O(n)?
I can't change the map. The only other solution I could think of is maintaining another map with the statusEnums as the key, but then it would increase the space complexity a lot.
EDIT
I had a HashMap of objects (which I referred to as a list).
Here is the code I came up with, for others ...
public List<MyObject> getObjectsBasedOnCriteria(List<ObjectStatus> statuses, String secondCriteria) {
    // Copy the requested statuses into an EnumSet so each lookup is O(1)
    EnumSet<ObjectStatus> enumSet = EnumSet.copyOf(statuses);
    for (Map.Entry<Long, MyObject> objEntry : myObjs.entrySet()) {
        MyObject obj = objEntry.getValue();
        if (enumSet.contains(obj.getStatus()) && obj.equals(secondCriteria)) {
            ...
        }
    }
}
Use a Set to hold the statusEnums (probably an EnumSet), and check whether each instance's status is in that set using set.contains(object.getStatus()), or similar.
Lookups in EnumSet and HashSet are O(1), so the solution is linear (assuming just one status per object). EnumSet.contains is more efficient than HashSet.contains with enum values; however, the choice is irrelevant to overall time complexity.
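A minimal sketch of the set-based filter described above. The Status enum and Item class are hypothetical stand-ins for the asker's ObjectStatus and stored objects; they are not taken from the original code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StatusFilterSketch {
    // Hypothetical enum standing in for the asker's ObjectStatus
    enum Status { NEW, ACTIVE, CLOSED }

    // Hypothetical value type standing in for the asker's objects
    static final class Item {
        final long id;
        final Status status;

        Item(long id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    // One pass over the map values with an O(1) set lookup per item
    static List<Item> withStatuses(Map<Long, Item> items, List<Status> wanted) {
        List<Item> result = new ArrayList<>();
        if (wanted.isEmpty()) {
            // EnumSet.copyOf rejects an empty collection that is not an EnumSet
            return result;
        }
        Set<Status> wantedSet = EnumSet.copyOf(wanted);
        for (Item item : items.values()) {
            if (wantedSet.contains(item.status)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

The extra space is one small set of statuses, not a second map keyed by status.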
Assuming you have a sane number of statuses, especially if you have an enum of statuses, you can use an EnumSet or a HashMap to match the status.
Even if you don't do this, the time complexity is O(n * m), where n is the number of entries and m is the number of statuses you are looking for. In general it is assumed that you will have many more records than statuses you are checking for.
The number of possible enum values is limited to a few thousand due to a limitation in the way Java is compiled, so this is always an upper bound for enums.
In Java.
How can I map a set of numbers(integers for example) to another set of numbers?
All the numbers are positive and all the numbers are unique in their own set.
The first set of numbers can have any value, the second set of numbers represent indexes of an array, and so the goal is to be able to access the numbers in the second set through the numbers in the first set. This is a one to one association.
Speed is crucial as the method will have to be called many times each second.
Edit: I tried it with SE hashmap implementation, but found it to be slow for my purposes.
There's an article, devoted to this problem (with a solution): Implementing a world fastest Java int-to-int hash map
Code can be found in related GitHub repository. (Best results are in class IntIntMap4a.java )
Citation from the article:
Summary
If you want to optimize your hash map for speed, you have to do as much as you can of the following list:
Use underlying array(s) with capacity equal to a power of 2 - it will allow you to use cheap & instead of expensive % for array index
Do not store the state in the separate array - use dedicated fields for free/removed keys and values.
Interleave keys and values in the one array - it will allow you to load a value into memory for free.
Implement a strategy to get rid of 'removed' cells - you can sacrifice some of remove performance in favor of more frequent get/put.
Scramble the keys while calculating the initial cell index - this is required to deal with the case of consecutive keys.
Yes, I know how to use citation formatting. But it looks awful and doesn't handle bullet lists well.
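To make some of the quoted tips concrete, here is a rough sketch of an open-addressing int-to-int map. It is not the article's IntIntMap4a code. The class name, the NO_VALUE sentinel and the scrambling constant are assumptions chosen for illustration. It shows a power-of-two capacity with & masking, interleaved keys and values, a dedicated field for the "free" key, and key scrambling. Removal is left out.

```java
final class IntIntMapSketch {
    private static final int FREE_KEY = 0;   // marks an empty cell
    static final int NO_VALUE = -1;          // assumption: returned when a key is absent

    private int[] data;          // interleaved: data[2*i] is a key, data[2*i + 1] its value
    private int mask;            // capacity - 1, where capacity is a power of two
    private int size;            // number of non-zero keys stored in data
    private boolean hasFreeKey;  // key 0 lives in dedicated fields, not in the array
    private int freeValue;

    IntIntMapSketch(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        data = new int[capacity * 2];
        mask = capacity - 1;
    }

    // Scramble the key so that consecutive keys do not land in consecutive cells
    private static int scramble(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    int get(int key) {
        if (key == FREE_KEY) {
            return hasFreeKey ? freeValue : NO_VALUE;
        }
        int cell = scramble(key) & mask;   // cheap & instead of %
        while (true) {
            int k = data[cell << 1];
            if (k == FREE_KEY) return NO_VALUE;
            if (k == key) return data[(cell << 1) + 1];
            cell = (cell + 1) & mask;      // linear probing
        }
    }

    void put(int key, int value) {
        if (key == FREE_KEY) {
            hasFreeKey = true;
            freeValue = value;
            return;
        }
        if ((size + 1) * 2 > mask + 1) {
            grow();                        // keep the load factor at or below 0.5
        }
        int cell = scramble(key) & mask;
        while (true) {
            int k = data[cell << 1];
            if (k == FREE_KEY) {
                data[cell << 1] = key;
                data[(cell << 1) + 1] = value;
                size++;
                return;
            }
            if (k == key) {
                data[(cell << 1) + 1] = value;
                return;
            }
            cell = (cell + 1) & mask;
        }
    }

    // Double the capacity and reinsert every stored pair
    private void grow() {
        int[] old = data;
        data = new int[old.length * 2];
        mask = data.length / 2 - 1;
        size = 0;
        for (int i = 0; i < old.length; i += 2) {
            if (old[i] != FREE_KEY) {
                put(old[i], old[i + 1]);
            }
        }
    }
}
```

Because the load factor never exceeds 0.5, every probe sequence is guaranteed to reach an empty cell, so get and put always terminate.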
The structure you are looking for is called an associative array. In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears just once in the collection.
In Java in particular, as already mentioned, this is easily done with a HashMap.
HashMap<Integer, Integer> cache = new HashMap<Integer, Integer>();
You can insert elements with the method put
cache.put(21, 42);
and you can retrieve a value with get
Integer key = 21;
Integer value = cache.get(key);
System.out.println("Key: " + key +" value: "+ value);
Key: 21 value: 42
If you want to iterate through the data, you can use an iterator over the key set:
Iterator<Integer> it = cache.keySet().iterator();
while (it.hasNext()) {
    Integer k = it.next();
    System.out.println("key: " + k + " value: " + cache.get(k));
}
Sounds like HashMap<Integer,Integer> is what you're looking for.
If you are willing to use an external library, you can use Apache's IntToIntMap, which is a part of Apache Lucene.
It implements a pretty efficient int to int map that uses primitives for tasks that should not suffer the boxing overhead.
If you have a limit for the size of the first list, you can just use a large array. Suppose you know the first list only has numbers 0-99; then you can use int[100], with the first number as the array index.
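A small sketch of that array-based lookup, assuming the keys are known to fall in a fixed range starting at 0. The class name and the MISSING sentinel are made up for illustration.

```java
import java.util.Arrays;

final class DirectLookupSketch {
    // Assumption: real indexes are non-negative, so -1 can mean "no mapping"
    static final int MISSING = -1;

    private final int[] table;

    // keyRange is the exclusive upper bound of the keys, e.g. 100 for keys 0-99
    DirectLookupSketch(int keyRange) {
        table = new int[keyRange];
        Arrays.fill(table, MISSING);
    }

    // The key itself is the array index, so no hashing or boxing is involved
    void put(int key, int index) {
        table[key] = index;
    }

    int get(int key) {
        return table[key];
    }
}
```

A key outside the declared range throws ArrayIndexOutOfBoundsException in this sketch; whether to check bounds explicitly depends on how trusted the keys are.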
Your requirements can be satisfied by the Map interface. As an example, see HashMap<K,V>.
See Map and HashMap
I need some structure in which to store N enums, some of them repeated, and to be able to extract them easily. So far I've tried to use an EnumSet like this:
cards = EnumSet.of(
BEST_OF_THREE,
BEST_OF_THREE,
SIMPLE_QUESTION,
SIMPLE_QUESTION,
STAR);
But now I see it can only have one of each. Conceptually, which would be the best structure to use for this problem?
Regards
jose
You can use a Map from the enum type to Integer, where the integer indicates how many of each there are. Google Guava's Multiset does this for you, and handles the edge cases of adding an enum when there is not already an entry, and removing an enum when none are left.
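If pulling in Guava is not an option, the counting map can be sketched with a plain EnumMap. The enum type name Card is an assumption (the constant names come from the question), and the class itself is only an illustration of the two edge cases mentioned above.

```java
import java.util.EnumMap;
import java.util.Map;

final class CardCounterSketch {
    // Assumed enum type; the constants are the ones used in the question
    enum Card { BEST_OF_THREE, SIMPLE_QUESTION, STAR }

    private final Map<Card, Integer> counts = new EnumMap<>(Card.class);

    // Adding creates the entry with 1 if it is absent, otherwise increments it
    void add(Card card) {
        counts.merge(card, 1, Integer::sum);
    }

    // Removing drops the entry once its count reaches zero
    boolean remove(Card card) {
        Integer current = counts.get(card);
        if (current == null) {
            return false;
        }
        if (current == 1) {
            counts.remove(card);
        } else {
            counts.put(card, current - 1);
        }
        return true;
    }

    int count(Card card) {
        return counts.getOrDefault(card, 0);
    }
}
```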
Another strategy is to use the Enumeration ordinal index. Because this index is unique, you can use this to index into an int array that is sized to the Enumeration size, where the count in each array slot would indicate how many of each enumeration you have. Like this:
// initialize array for counting each enumeration type
// Java zero-initializes int arrays, so every count starts at 0
int[] cardCount = new int[CardEnum.values().length];
...
// incrementing the count for an enumeration (when we add)
cardCount[BEST_OF_THREE.ordinal()]++;
...
// decrementing the count for an enumeration (when we remove)
cardCount[BEST_OF_THREE.ordinal()]--;
// DEBUG: assert cardCount[BEST_OF_THREE.ordinal()] >= 0
...
// getting the count for an enumeration
int count = cardCount[BEST_OF_THREE.ordinal()];
... Some time later
Having read the clarifying comments underneath the original post that explained what the OP was asking, it is clear that you're best off with a linear structure with an entry per element. I didn't realize that you didn't need detailed information on how many of each you needed. Storing them in a MultiSet or an equivalent counting structure makes it hard to randomly pick, as you need to attribute an index picked at random from [0, size) to a particular container, which takes log time.
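A sketch of that linear structure, assuming the goal is to draw a random card and take it out of the collection. The Card enum type name and the class are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class CardDeckSketch {
    // Assumed enum type; the constants are the ones used in the question
    enum Card { BEST_OF_THREE, SIMPLE_QUESTION, STAR }

    // One list entry per card, so duplicates are simply repeated entries
    private final List<Card> cards = new ArrayList<>();
    private final Random random;

    CardDeckSketch(Random random) {
        this.random = random;
    }

    void add(Card card) {
        cards.add(card);
    }

    int size() {
        return cards.size();
    }

    // Picks a uniformly random entry and removes it in O(1) by moving
    // the last entry into its slot. Throws if the deck is empty.
    Card drawRandom() {
        int i = random.nextInt(cards.size());
        int last = cards.size() - 1;
        Card picked = cards.get(i);
        cards.set(i, cards.get(last));
        cards.remove(last);
        return picked;
    }
}
```

The swap-with-last trick does not preserve order, which is fine when the only access pattern is random picking.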
Sets don't allow duplicates, so if you want repeats you'll need either a List or a Map.
If you just need the number of duplicates, an EnumMap with Integer values is probably your best bet.
If the order is important, and you need quick access to the number of each type, you'll probably need to roll your own data structure.
If the order is important (but the count of each is not), then a List is the way to go, which implementation depends on how you will use it.
LinkedList - Best when there will be many inserts/removals from the beginning of the List. Indexing into a LinkedList is very expensive, and should be avoided whenever possible. If a List is built by shifting data onto the front of the list, but any later additions are at the end, conversion to an ArrayList once the initial List is built is a good idea - especially if indexing into the List is anticipated at any point.
ArrayList - When in doubt, this is a good place to start. Inserting or removing items requires shifting, so if this is a common operation look elsewhere.
TreeList - This is a good all-around option, and insertions and removals anywhere in the List are inexpensive. This does require the Apache commons library, and uses a bit more memory than the others.
Benchmarks, and the code used to generate them, can be found in this gist.
I'm new to Java and, as a learning project, I would like to program a little vocabulary application, so that the user can test himself but also search for entries. However, I struggle to find the right data structure for this, and even after spending the last few days googling for it, I'm still at a loss.
Here is what I have in mind for my vocabulary object:
import java.io.*;
class Vocab implements Serializable {
String lang1;
String lang2;
int rightAnswersInARow; // to influence what to ask during testing
int numberOfTimesSearched; // to influence search suggestions
// ... plus the appropriate setter and getter methods.
}
Now for the testing, at first glance an ArrayList seems to be the most appropriate (choosing a random number and then selecting that object to test). But what if I would also like to factor in the rightAnswersInARow and ask vocabularies with a low number more often? My approach would be to count the number of objects for each value, give each value an interval (e.g. the interval for rightAnswersInARow = 0 would be inflated by the factor 3) and then randomly select from there.
But even if I go through the ArrayList each time, get the rightAnswersInARow and determine the intervals...how would I then map the calculated number to the right index since the elements are not sorted? So would a TreeSet be more appropriate?
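One way to read the interval idea, sketched under assumptions: each entry gets a weight derived from its rightAnswersInARow (3 for a streak of 0, matching the factor in the example, and 1 otherwise), a random number is drawn from the total weight, and a single pass over the unsorted list finds the entry whose interval contains it. The class and method names are hypothetical, and an int array of streaks stands in for the list of Vocab objects.

```java
import java.util.Random;

final class WeightedPickSketch {
    // Assumed weighting: entries never answered correctly are three times as likely
    static int weightFor(int rightAnswersInARow) {
        return rightAnswersInARow == 0 ? 3 : 1;
    }

    // streaks[i] is the rightAnswersInARow of the i-th vocabulary entry.
    // Returns the index of the chosen entry; the array must not be empty.
    static int pickIndex(int[] streaks, Random random) {
        int total = 0;
        for (int streak : streaks) {
            total += weightFor(streak);
        }
        // r falls into exactly one entry's interval of width weightFor(streak)
        int r = random.nextInt(total);
        for (int i = 0; i < streaks.length; i++) {
            r -= weightFor(streaks[i]);
            if (r < 0) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable for a non-empty array");
    }
}
```

Because the intervals are laid out in list order, no sorting is needed; the cost is one O(n) pass per pick.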
To search for entries in both languages and maybe even adding a dropdown-list with suggested words (like in Google's search) would require that I find the strings quickly (HashMap?). Or maybe go through 2+ (one for each language) TreeSets to reach the first element that starts with those letters, then selecting the next few elements from there? But that would mean the search would always suggest the same words, ignoring which words were searched for the most.
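The TreeSet idea for one language could look roughly like the sketch below: jump to the first word at or after the prefix and collect words while they still start with it. The class and method names are hypothetical. As the question notes, this returns suggestions in alphabetical order and ignores how often each word was searched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class PrefixSuggestSketch {
    // Words sharing a prefix are contiguous in a sorted set of strings,
    // starting at the first element greater than or equal to the prefix.
    static List<String> suggest(NavigableSet<String> words, String prefix, int limit) {
        List<String> suggestions = new ArrayList<>();
        for (String word : words.tailSet(prefix, true)) {
            if (suggestions.size() >= limit || !word.startsWith(prefix)) {
                break;
            }
            suggestions.add(word);
        }
        return suggestions;
    }
}
```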
What would you suggest? Have a HashMap with each value pair and manually implement something like a relational database?
Thank you in advance! :)
I have a class along the lines of:
public class Observation {
private String time;
private double x;
private double y;
//Constructors + Setters + Getters
}
I can choose to store these objects in any type of collection (Standard class or 3rd party like Guava). I have stored some example data in an ArrayList below, but like I said I am open to any other type of collection that will do the trick. So, some example data:
ArrayList<Observation> ol = new ArrayList<Observation>();
ol.add(new Observation("08:01:23",2.87,3.23));
ol.add(new Observation("08:01:27",2.96,3.17));
ol.add(new Observation("08:01:27",2.93,3.20));
ol.add(new Observation("08:01:28",2.93,3.21));
ol.add(new Observation("08:01:30",2.91,3.23));
The example assumes a matching constructor in Observation. The timestamps are stored as String objects as I receive them as such from an external source but I am happy to convert them into something else. I receive the observations in chronological order so I can create and rely on a sorted collection of observations. The timestamps are NOT unique (as can be seen in the example data) so I cannot create a unique key based on time.
Now to the problem. I frequently need to find one (1) observation with a time equal or nearest to a certain time, e.g if my time was 08:01:29 I would like to fetch the 4th observation in the example data and if the time is 08:01:27 I want the 3rd observation.
I can obviously iterate through the collection until I find the time that I am looking for, but I need to do this frequently and at the end of the day I may have millions of observations so I need to find a solution where I can locate the relevant observations in an efficient manner.
I have looked at various collection-types including ones where I can filter the collections with Predicates but I have failed to find a solution that would return one value, as opposed to a subset of the collection that fulfills the "<="-condition. I am essentially looking for the SQL equivalent of SELECT * FROM ol WHERE time <= t LIMIT 1.
I am sure there is a smart and easy way to solve my problem so I am hoping to be enlightened. Thank you in advance.
Try a TreeSet with a comparator that compares the time. It maintains an ordered set, and you can call TreeSet.floor(E) to find the greatest element less than or equal to the one given (you should provide a dummy Observation with the time you are looking for). You also have headSet and tailSet for ordered subsets.
It has O(log n) time for adding and retrieving. I think it is very suitable for your needs.
If you prefer a Map you can use a TreeMap with similar methods.
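A sketch of the TreeMap variant, under two assumptions that are not in the original: the timestamps are parsed into java.time.LocalTime, and when several observations share a timestamp only the most recently added one is kept, which matches the question's example of wanting the 3rd observation for 08:01:27. The same caveat applies to a TreeSet whose comparator looks only at the time, since elements that compare as equal are treated as duplicates.

```java
import java.time.LocalTime;
import java.util.Map;
import java.util.TreeMap;

final class ObservationIndexSketch {
    // Simplified stand-in for the question's Observation class
    static final class Observation {
        final LocalTime time;
        final double x;
        final double y;

        Observation(LocalTime time, double x, double y) {
            this.time = time;
            this.x = x;
            this.y = y;
        }
    }

    private final TreeMap<LocalTime, Observation> byTime = new TreeMap<>();

    // A later observation with the same timestamp replaces the earlier one
    void add(String time, double x, double y) {
        LocalTime t = LocalTime.parse(time);
        byTime.put(t, new Observation(t, x, y));
    }

    // Returns the observation at or just before the given time, or null if none
    Observation atOrBefore(String time) {
        Map.Entry<LocalTime, Observation> entry = byTime.floorEntry(LocalTime.parse(time));
        return entry == null ? null : entry.getValue();
    }
}
```

If every observation with a shared timestamp must be retained, the map value could be a list of observations instead.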
Sort your collection (an ArrayList will probably work best here) and use binarySearch, which returns an integer index of either a match or the "closest" possible match, i.e. it returns the...
index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(),
Have the Observation class implement Comparable and use a TreeSet to store the objects, which will keep the elements sorted. TreeSet implements SortedSet, so you can use headSet or tailSet to get a view of the set before or after the element you're searching for. Use the first or last method on the returned set to get the element you're seeking.
If you are stuck with ArrayList, but can keep the elements sorted yourself, use Collections.binarySearch to search for the element. It returns a non-negative index if the exact element is found, or a negative number that can be used to determine the closest element. http://download.oracle.com/javase/1.4.2/docs/api/java/util/Collections.html#binarySearch(java.util.List,%20java.lang.Object)
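A sketch of turning the binarySearch result into an "at or before" index on a sorted list. The helper name floorIndex is made up. Since binarySearch gives no guarantee about which of several equal elements it finds, the sketch walks forward to the last duplicate.

```java
import java.util.Collections;
import java.util.List;

final class FloorSearchSketch {
    // Returns the index of the last element <= key in a sorted list, or -1 if there is none
    static <T extends Comparable<? super T>> int floorIndex(List<T> sorted, T key) {
        int idx = Collections.binarySearch(sorted, key);
        if (idx < 0) {
            // idx == -(insertionPoint) - 1, so the floor sits at insertionPoint - 1
            return -idx - 2;
        }
        // Exact match: move to the last of any run of equal elements
        while (idx + 1 < sorted.size() && sorted.get(idx + 1).compareTo(key) == 0) {
            idx++;
        }
        return idx;
    }
}
```

With zero-padded HH:mm:ss strings, as in the question's data, plain String comparison already matches chronological order, so the list can be searched without converting the timestamps.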
If you are lucky enough to be using Java 6, and the performance overhead of keeping a SortedSet is not a big deal for you, take a look at the TreeSet ceiling, floor, higher and lower methods.