I have two array lists in Java with the same structure, each holding entries that have a string value and an integer count. I need to merge them into a single list: if a value is already present, I just increment its count; if it is not present, I add the value and its count as they are.
The question is: is there any way I can do this gracefully, other than iterating in a for loop and checking every value with an if?
You can't, there's too much custom logic. Iterate, check and add - that's the best approach, and will be more readable.
Technically, you can use a Multiset from Guava, but there the count is maintained by the collection itself rather than by you, so it might require some more work.
The question is: is there any way I can do this gracefully, other than iterating in a for loop and checking every value with an if?
Short answer is no.
You would be better off using a HashMap as the container; at least the merging operation would perform faster. You need a loop in any case, since there is no addAll / putAll which could update your counts.
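A minimal sketch of the HashMap approach, assuming Java 8 or later for Map.merge. The Entry class and the merge method are hypothetical names invented for this illustration; the question does not show the real element type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeCounts {
    // Hypothetical value/count pair standing in for the list elements in the question.
    static final class Entry {
        final String value;
        final int count;

        Entry(String value, int count) {
            this.value = value;
            this.count = count;
        }
    }

    // Folds both lists into one map, summing the counts of values that repeat.
    static Map<String, Integer> merge(List<Entry> first, List<Entry> second) {
        Map<String, Integer> totals = new HashMap<>();
        for (Entry e : first) {
            totals.merge(e.value, e.count, Integer::sum);
        }
        for (Entry e : second) {
            totals.merge(e.value, e.count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Entry> a = Arrays.asList(new Entry("apple", 2), new Entry("pear", 1));
        List<Entry> b = Arrays.asList(new Entry("apple", 3), new Entry("plum", 4));
        Map<String, Integer> totals = merge(a, b);
        System.out.println("apple=" + totals.get("apple") + ", pear=" + totals.get("pear"));
    }
}
```

The loop is still there, but each lookup is a hash probe instead of a scan of the other list, so the merge is linear in the total number of entries.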
Related
I need some structure in which to store N enums, some of them repeated, and to be able to extract them easily. So far I've tried to use an EnumSet like this:
cards = EnumSet.of(
BEST_OF_THREE,
BEST_OF_THREE,
SIMPLE_QUESTION,
SIMPLE_QUESTION,
STAR);
But now I see it can only hold one of each. Conceptually, which would be the best structure to use for this problem?
Regards
jose
You can use a Map from the enum type to Integer, where the integer indicates how many of each constant there are. Google Guava's Multiset does this for you, and handles the edge cases of adding an enum when there is no entry for it yet and of removing one when none are left.
Another strategy is to use each enum constant's ordinal. Because the ordinal is unique, you can use it to index into an int array sized to the number of constants, where the count in each slot indicates how many of that constant you have. Like this:
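The map-based version could look roughly like the sketch below. It uses the JDK's EnumMap rather than Guava, and the Card enum and CardCounter class are hypothetical names that only mirror the constants from the question.

```java
import java.util.EnumMap;
import java.util.Map;

public class CardCounter {
    // Hypothetical enum mirroring the constants used in the question.
    enum Card { BEST_OF_THREE, SIMPLE_QUESTION, STAR }

    private final Map<Card, Integer> counts = new EnumMap<>(Card.class);

    // Adds one copy of the card, creating the entry if it does not exist yet.
    void add(Card card) {
        counts.merge(card, 1, Integer::sum);
    }

    // Removes one copy; returns false if there was none. The key is dropped at zero.
    boolean remove(Card card) {
        Integer current = counts.get(card);
        if (current == null) {
            return false;
        }
        if (current == 1) {
            counts.remove(card);
        } else {
            counts.put(card, current - 1);
        }
        return true;
    }

    // Number of copies currently held; zero for cards never added.
    int count(Card card) {
        return counts.getOrDefault(card, 0);
    }
}
```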
// initialize array for counting each enumeration type
// (Java guarantees that every element of a new int array starts at zero)
int[] cardCount = new int[CardEnum.values().length];
...
// incrementing the count for an enumeration (when we add)
cardCount[BEST_OF_THREE.ordinal()]++;
...
// decrementing the count for an enumeration (when we remove)
cardCount[BEST_OF_THREE.ordinal()]--;
// DEBUG: assert cardCount[BEST_OF_THREE.ordinal()] >= 0
...
// getting the count for an enumeration
int count = cardCount[BEST_OF_THREE.ordinal()];
... Some time later
Having read the clarifying comments underneath the original post that explained what the OP was asking, it is clear that you're best off with a linear structure with one entry per element. I didn't realize that you didn't need detailed information on how many of each you had. Storing them in a Multiset or an equivalent counting structure makes it hard to pick one at random, as you need to map an index picked at random from [0, size) to a particular bucket, which takes logarithmic time.
Sets don't allow duplicates, so if you want repeats you'll need either a List or a Map.
If you just need the number of duplicates, an EnumMap with Integer values is probably your best bet.
If the order is important, and you need quick access to the number of each type, you'll probably need to roll your own data structure.
If the order is important (but the count of each is not), then a List is the way to go, which implementation depends on how you will use it.
LinkedList - Best when there will be many inserts/removals from the beginning of the List. Indexing into a LinkedList is very expensive, and should be avoided whenever possible. If a List is built by shifting data onto the front of the list, but any later additions are at the end, conversion to an ArrayList once the initial List is built is a good idea - especially if indexing into the List is anticipated at any point.
ArrayList - When in doubt, this is a good place to start. Inserting or removing items requires shifting, so if this is a common operation look elsewhere.
TreeList - This is a good all-around option, and insertions and removals anywhere in the List are inexpensive. This does require the Apache Commons Collections library, and uses a bit more memory than the others.
Benchmarks, and the code used to generate them, can be found in this gist.
I am working on an algorithm that will store a small list of objects as a sublist of a larger set of objects. The objects are inherently ordered, so an ordered list is required.
The most common operations performed will be, in order of frequency:
retrieving the nth element from the list (for some arbitrary n)
inserting a single element at the beginning or end of the list
removing the first or last n elements from the list (for some arbitrary n)
Removing from and inserting into the middle will never be done, so there is no need to consider the efficiency of that.
My question is: which implementation of List is most efficient for this use case in Java (i.e. LinkedList, ArrayList, Vector, etc.)? Please defend your answer by explaining the implementations of the different data structures so that I can make an informed decision.
Thanks.
NOTE
No, this is not a homework question. No, I do not have an army of research assistants who can do the work for me.
Based on your first criterion (arbitrary access) you should use an ArrayList. ArrayLists (and arrays in general) provide lookup/retrieval in constant time. In contrast, it takes linear time to look up items in a LinkedList.
For ArrayLists, insertion or deletion at the end takes amortized constant time. The same holds for a LinkedList that keeps a reference to its last node, as Java's doubly linked java.util.LinkedList does; for a singly linked list without one it is linear.
For ArrayLists, insertion or deletion at front requires linear time (with consistent reuse of space, these may become constant depending on implementation). LinkedList operations at front of list are constant.
The last two use cases somewhat balance each other out; however, your most common case definitely suggests array-based storage.
As far as basic implementation details:
ArrayLists are basically just sequential sections of memory. If you know where the beginning is, you can just do a single addition to find the location of any element. Operations at the front are expensive because elements may have to be shifted to make room.
LinkedLists are disjoint in memory and consist of nodes linked to each other (with a reference to the first node). To find the nth node, you have to start at the first node and follow links until you reach the desired node. Operations at the front just require creating a node and updating your start pointer.
I vote for a doubly linked list. http://docs.oracle.com/javase/6/docs/api/java/util/Deque.html
Probably the best data structure for this purpose would be a deque implemented with a dynamic array, which is basically an ArrayList that starts adding elements to the middle of the internal array instead of the beginning. Unfortunately Java's ArrayDeque does not support looking up an nth element.
It is, however, pretty easy to implement one yourself (or lookup an existing implementation), and then all three of the described operations can be done in O(1).
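As a rough sketch of such a structure (not a tuned implementation): the class below uses a circular buffer with wrap-around indexing, a common variant of the "start in the middle" idea. IndexedArrayDeque and all of its method names are invented for this illustration.

```java
public class IndexedArrayDeque<T> {
    private Object[] items = new Object[8];
    private int head = 0;   // position in items of the first logical element
    private int size = 0;

    public int size() {
        return size;
    }

    // Returns the nth element (0-based) in constant time.
    @SuppressWarnings("unchecked")
    public T get(int n) {
        if (n < 0 || n >= size) {
            throw new IndexOutOfBoundsException("index " + n + ", size " + size);
        }
        return (T) items[(head + n) % items.length];
    }

    public void addFirst(T value) {
        growIfFull();
        head = (head - 1 + items.length) % items.length;
        items[head] = value;
        size++;
    }

    public void addLast(T value) {
        growIfFull();
        items[(head + size) % items.length] = value;
        size++;
    }

    // Drops the first n elements; the slots are cleared so they can be garbage collected.
    public void removeFirst(int n) {
        checkCount(n);
        for (int i = 0; i < n; i++) {
            items[(head + i) % items.length] = null;
        }
        head = (head + n) % items.length;
        size -= n;
    }

    // Drops the last n elements, clearing their slots as well.
    public void removeLast(int n) {
        checkCount(n);
        for (int i = size - n; i < size; i++) {
            items[(head + i) % items.length] = null;
        }
        size -= n;
    }

    private void checkCount(int n) {
        if (n < 0 || n > size) {
            throw new IllegalArgumentException("cannot remove " + n + " of " + size);
        }
    }

    // Doubles the backing array when it is full, unwrapping elements to start at 0.
    private void growIfFull() {
        if (size < items.length) {
            return;
        }
        Object[] bigger = new Object[items.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = items[(head + i) % items.length];
        }
        items = bigger;
        head = 0;
    }
}
```

Here get is O(1) and the two add methods are amortized O(1). Removing n elements clears n slots in this sketch; skipping that step would make removal pure index arithmetic, at the cost of keeping the removed objects reachable until they are overwritten.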
You can do all of them with an ArrayList with minimal confusion if you're not worried about efficiency.
I would use some sort of queue or stack if I were only inserting at the front or the end. They have the least overhead. Or you could also use a linked list.
To remove N elements from the front or the end I would use a linked list: you can just unlink one node and the ones before or after it are gone. I.e., if I delete the first 5 elements, I just unlink at the 5th element and the ones before it disappear. Likewise, if I delete the last 6 elements, I just unlink at the 6th-to-last one and the rest disappear. Java will do the garbage collecting for you. This operation would be O(1).
Is this a homework question?
Definitely go for LinkedList. For both inserting a value at the beginning/end of the list and removing the first/last element in the list, it runs in O(1). This is because all that needs to be changed to carry out these operations is a couple of pointers, a minimally costly operation.
Although ArrayLists retrieve the nth element in O(1) while LinkedLists retrieve it in O(n), ArrayLists run the risk of having to resize when elements are inserted. What do you suppose happens when the memory allotted for the ArrayList is used up and you try to insert another element? The ArrayList allocates a larger backing array (in the OpenJDK implementation, roughly 1.5 times the old capacity rather than twice) and copies every element into it, a costly operation when it happens, although it is rare enough that appends remain amortized O(1). LinkedLists don't have this problem since, again, all that is done is the addition of a pointer.
I don't know a whole lot about Java Vectors, but if they're anything like C++ vectors, then they're very similar to ArrayLists.
I hope this helps.
java.util.TreeMap of Long to Object, and use index of i+tm.firstKey()
All,
I am wondering what the most efficient way is to check whether a row already exists in a List<Set<Foo>>. A Foo object has a key/value pair (as well as other fields which aren't applicable to this question). Each Set in the List is unique.
As an example:
List[
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:4]
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:2][Foo_Key:C, Foo_Value:4]
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:3]
]
I want to be able to check if a new Set (Ex: Set[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:4]) exists in the List.
Each Set could contain anywhere from 1 to 20 Foo objects. The List can contain anywhere from 1 to 100,000 Sets. Foos are not guaranteed to be in the same order in each Set (so they will have to be pre-sorted into the correct order somehow, as in a TreeSet).
Idea 1: Would it make more sense to turn this into a matrix? Where each column would be the Foo_Key and each row would contain a Foo_Value?
Ex:
A B C
-----
1 3 4
1 2 4
1 3 3
And then look for a row containing the new values?
Idea 2: Would it make more sense to create a hash of each Set and then compare it to the hash of a new Set?
Is there a more efficient way I'm not thinking of?
Thanks
If you use TreeSets for your Sets can't you just do list.contains(set) since a TreeSet will handle the equals check?
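A sketch of how that could look, with one deliberate change: the rows are kept in a HashSet of Sets instead of a List, so that contains is a hash lookup rather than a linear scan (this combines the answer above with Idea 2 from the question). Set.equals ignores element order for any standard Set implementation, not only TreeSet. The Foo class below is a hypothetical stand-in and assumes that equality is defined by the key/value pair alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class RowLookup {
    // Hypothetical Foo: equality and hash code cover only the key/value pair.
    static final class Foo {
        final String key;
        final int value;

        Foo(String key, int value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Foo)) {
                return false;
            }
            Foo other = (Foo) o;
            return key.equals(other.key) && value == other.value;
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, value);
        }
    }

    // Convenience helper for building one row.
    static Set<Foo> row(Foo... foos) {
        return new HashSet<>(Arrays.asList(foos));
    }

    public static void main(String[] args) {
        Set<Set<Foo>> rows = new HashSet<>();
        rows.add(row(new Foo("A", 1), new Foo("B", 3), new Foo("C", 4)));
        rows.add(row(new Foo("A", 1), new Foo("B", 2), new Foo("C", 4)));
        rows.add(row(new Foo("A", 1), new Foo("B", 3), new Foo("C", 3)));

        // Same content as the first row, listed in a different order.
        Set<Foo> candidate = row(new Foo("C", 4), new Foo("A", 1), new Foo("B", 3));
        System.out.println(rows.contains(candidate));
    }
}
```

One caveat: a row must not be modified after it has been added to the outer HashSet, because its hash code would change and the lookup would no longer find it.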
Also, consider using Guava's Multiset class.
I would recommend you use a less weird data structure. As for finding stuff: Generally Hashes or Sorting + Binary Searching or Trees are the ways to go, depending on how much insertion/deletion you expect. Read a book on basic data structures and algorithms instead of trying to re-invent the wheel.
Lastly: if this is not a purely academic question, loop through the list and do the comparison. Most likely, that is acceptably fast. Even 100,000 entries will take a fraction of a second, and therefore not matter in 99% of all use cases.
I like to quote Knuth: Premature optimisation is the root of all evil.
I have a case where I have a table (t1) which contains items like
| id | timestamp | att1 | att2 |
Now I have to iterate over a collection of elements of type att1 and get all records from t1 which are between two certain timestamps for this att1. I have to do this operation several times for a single att1.
So in order to go easy on the database queries, I intended to load every entry from t1 which has a certain att1 attribute once into a collection and perform the subsequent searches on this collection.
Is there a collection that could handle a search like between '2011-02-06 09:00:00' and '2011-02-06 09:00:30'? It's not guaranteed to contain entries for those two timestamps.
Before writing an implementation for that (most likely a very slow implementation ^^) I wanted to ask you guys if there might be some existing collections already or how I could tackle this problem.
Thanks!
Yes. Use TreeMap which is basically a sorted map of key=>value pairs and its method TreeMap::subMap(fromKey, toKey).
In your case you would use timestamps as keys to the map and for values att1 attribute or id or whatever else would be most convenient for you.
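A minimal sketch of the range lookup. It assumes java.time.LocalDateTime keys and integer ids as values; the class and method names are made up for the example. Each key maps to a list because several rows might share the same timestamp.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TimestampRange {
    // Records one row id under its timestamp (ISO-8601 text such as 2011-02-06T09:00:00).
    static void put(NavigableMap<LocalDateTime, List<Integer>> index, String time, int id) {
        index.computeIfAbsent(LocalDateTime.parse(time), k -> new ArrayList<>()).add(id);
    }

    // Collects all ids whose timestamp lies in [from, to], both ends inclusive.
    // Neither bound needs to be present as a key in the map.
    static List<Integer> between(NavigableMap<LocalDateTime, List<Integer>> index,
                                 String from, String to) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> ids : index
                .subMap(LocalDateTime.parse(from), true, LocalDateTime.parse(to), true)
                .values()) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<LocalDateTime, List<Integer>> index = new TreeMap<>();
        put(index, "2011-02-06T08:59:59", 1);
        put(index, "2011-02-06T09:00:10", 2);
        put(index, "2011-02-06T09:00:10", 3);
        put(index, "2011-02-06T09:00:31", 4);
        System.out.println(between(index, "2011-02-06T09:00:00", "2011-02-06T09:00:30"));
    }
}
```

The four-argument subMap comes from NavigableMap and lets you choose whether each bound is inclusive; the two-argument form mentioned above treats the upper bound as exclusive.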
The closest I can think of, and this isn't really what I would consider ideal, is to write a comparator that sorts dates so that those within the range count as less than those outside the range (always return -1 when comparing in to out, 0 when comparing in to in or out to out, and +1 when comparing out to in).
Then, use this comparator to sort a collection (I suggest an ArrayList). The values within the range will appear first.
You might just be better off writing your own filter, though. Input a collection (I recommend a LinkedList), iterate over it, and remove anything not in the range. Keep a master copy around for spawning new ones to pass into the filter, if you need to.
You can make the object you want in your collection, which I think is att1, implement the Comparable interface and then have the compareTo method compare the timestamp field. With this in place it will work in any sorted collection, such as a TreeSet, making it easy to iterate and pull out everything in a certain range.
I've been able to read a four-column text file into a HashMap and get it to write to an output file. However, I need to get the second column (distinct values) into a HashSet and write that to the output file. I've been able to create the HashSet, but it is grabbing everything and not sorting. By the way, I'm new, so please take this into consideration when you answer. Thanks
Neither HashSet nor HashMap are meant to sort. They're fundamentally unsorted data structures. You should use an implementation of SortedSet, such as TreeSet.
Some guesses, related to Mr. Skeet's answer and your apparent confusion...
Are you sure you are not inserting the whole line into the TreeSet? If you are going to use ONLY the second column, you will need to split() the strings (representing the lines) into columns; that is not done automatically.
Also, if you are actually trying to sort the whole file using the second column as the key, you will need a TreeMap instead, with the second column as the key and the whole line as the value. But that won't solve the splitting; it only keeps the relation between the line and the key.
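A small sketch of the splitting step combined with a TreeSet. It assumes the columns are separated by whitespace, which the question does not state, and it takes the lines from a List so that the example runs on its own; the class and method names are invented for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class SecondColumn {
    // Returns the distinct values of the second column, in sorted order.
    static SortedSet<String> distinctSecondColumn(List<String> lines) {
        SortedSet<String> values = new TreeSet<>();
        for (String line : lines) {
            // Assumption: columns are separated by one or more whitespace characters.
            String[] columns = line.trim().split("\\s+");
            if (columns.length >= 2) {
                values.add(columns[1]); // only the second column, not the whole line
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1 pear   x y",
                "2 apple  x y",
                "3 pear   x y");
        System.out.println(distinctSecondColumn(lines));
    }
}
```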
Edit: Here is some terminology for you, you might need it.
You have a Set. It's a collection of other objects, like Strings. You add objects to it, and then you can fetch all objects in it by iterating through the set. Adding is done through the add() method, and iterating can be done using the enhanced for loop syntax or the iterator() method.
The set doesn't "grab" or "take" stuff; you add something to the set, in this case a String, not an array of Strings (which is written as String[]).
(It is possible to add an array to a HashSet, since arrays are objects too, but equality and hashing then use the array's identity, not the contents of its Strings. A TreeSet without a custom Comparator would throw a ClassCastException instead, because arrays are not Comparable. Maybe that's what you are doing.)
String key = splittedLine[1]; // 2:nd element
"The second element of the keys" doesn't make sense at all. And what are the duplicates you're talking about? (note the correct use of apostrophes... :-)