Searching data in an array(list) - java

I have an ArrayList containing Attributes:
class Attribute{
    private int id;
    public int getID(){
        return this.id;
    }
    private String value;
    public String getValue(){
        return this.value;
    }
    //... more properties here...
}
I filled the ArrayList with hundreds of these attributes, and I want to find the Attribute with a given ID. I want to do something like this:
ArrayList<Attribute> arr = new ArrayList<Attribute>();
fillList(arr); //Method that puts a lot of these Attributes in the list
arr.find(234); // what I'd like: find the attribute with ID 234 (ArrayList has no such method)
Is looping over the ArrayList the only solution?

Well, something is going to have to loop over the list, yes. There are various ways of doing this, using different libraries and so on.
If you fill the list in order (e.g. so that low IDs always come before high IDs), you can perform a binary search in O(log N) time. Otherwise, it'll be O(N).
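A minimal sketch of that binary search, assuming the list is already sorted by ascending ID and using a simplified Attribute with an int ID (the constructor and the AttributeSearch.findById helper are hypothetical, not part of the question's code):

```java
import java.util.List;

// Simplified stand-in for the question's Attribute class (hypothetical constructor).
class Attribute {
    private final int id;
    private final String value;

    Attribute(int id, String value) {
        this.id = id;
        this.value = value;
    }

    int getID() { return id; }
    String getValue() { return value; }
}

class AttributeSearch {
    // Binary search over a list sorted by ascending ID: O(log N) comparisons.
    // Returns the matching Attribute, or null if no element has that ID.
    static Attribute findById(List<Attribute> sortedById, int id) {
        int low = 0;
        int high = sortedById.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            Attribute candidate = sortedById.get(mid);
            if (candidate.getID() < id) {
                low = mid + 1;
            } else if (candidate.getID() > id) {
                high = mid - 1;
            } else {
                return candidate;
            }
        }
        return null;
    }
}
```

This relies on get(mid) being cheap, so it suits an ArrayList; on a LinkedList each get(mid) is itself O(N).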
If you're going to search by IDs a lot, however, why not create a Map<Integer, Attribute> to start with - e.g. a HashMap, or a LinkedHashMap if you want to preserve ordering?
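A sketch of building such a map from the already-filled list, again with a simplified, hypothetical Attribute constructor and an illustrative AttributeIndex.indexById helper:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the question's Attribute class (hypothetical constructor).
class Attribute {
    private final int id;
    private final String value;

    Attribute(int id, String value) {
        this.id = id;
        this.value = value;
    }

    int getID() { return id; }
    String getValue() { return value; }
}

class AttributeIndex {
    // Builds an ID -> Attribute index in one O(N) pass; each later lookup
    // is expected O(1). If two attributes share an ID, the later one wins.
    static Map<Integer, Attribute> indexById(List<Attribute> attributes) {
        Map<Integer, Attribute> byId = new HashMap<>();
        for (Attribute a : attributes) {
            byId.put(a.getID(), a);
        }
        return byId;
    }
}
```

After building the index once, a lookup such as indexById(arr).get(234) returns the attribute, or null if there is none. Replacing new HashMap<>() with new LinkedHashMap<>() would preserve insertion order.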
If you're only going to search for a single ID (or a few), however, this almost certainly won't be worth it - there's a cost involved in hashing, after all; filling the map will be more expensive than filling the list, and the difference is likely to be greater than the time saved looking up a few IDs.
Have you already established that this is a performance bottleneck? If so, this is an easy place to improve by using a map (or just a sorted list with a binary search). If not, I wouldn't disturb your code if it more naturally uses a list than a map - but you should certainly check whether it's a bottleneck or not.

You want to use a Map

If you wish to access elements of a collection using element attributes, and this attribute is guaranteed to be unique per element, then you really should use a Map. Try a Map with the Attribute.id as the key.

Related

Is using a HashMap the simplest solution for storing an Object that has an ID?

I have a class in some code, ChatChannel (some unnecessary code omitted), that I'm having a bit of trouble with.
import java.util.HashMap;
public class ChatChannel {
    // ChannelID -> ChatChannel object
    private static HashMap<String, ChatChannel> registeredChannels = new HashMap<>();
    public static void registerChannel(ChatChannel channel) {
        registeredChannels.put(channel.getId(), channel);
    }
    public static ChatChannel getChannelById(String id) {
        return registeredChannels.get(id); // null if no channel has this ID
    }
    /** The actual ChatChannel item is defined BELOW THIS LINE **/
    private String name;
    private String id;
    public ChatChannel(String name, String id) {
        this.name = name;
        this.id = id;
    }
    // Instance method: a static method cannot read the instance field id
    public String getId() {
        return id;
    }
}
Essentially, this class will allow me to separate messages sent by users into "channels." Users may only receive messages in joined channels, and may only send a message to their active channel. Channels should be accessible using their ID (for example, global).
However, my problem is I don't know whether I should use a HashMap or Collection in order to keep the code light and simple. Ideally, I'd like to be able to reference any ChatChannel by its id at any point in the code, so I don't need to constantly pass around these ChatChannels. What, if any, would the performance gain of using HashMap (and external IDs) be? Would it be roughly equal to using a Collection and then iterating through it using my getId() method? If so, which is considered "proper" Java?
To answer the stated question "Should I be using a HashMap or Collection for performance?" — you can't and won't use a "Collection" in this sense because a Collection is an abstract concept, represented in Java as an interface.
A Collection could be a List, or a Map, or a Set, among other things. You can write a method that, for example, accepts (any kind of) a Collection and performs an operation on everything in the Collection, but in your case here you must decide on what kind of collection to use in your implementation.
Since you're retrieving a channel given an identifier String, a Map is a useful choice because it is a key-to-value mapping; you don't have to iterate through it to find the element that has the desired key.
You should generally declare things generically, then instantiate them with a specific implementation. That is, when working with it in your code you don't care what sort of Map it is, just that it's a Map. The actual map that you allocate could be a HashMap or a LinkedHashMap or a TreeMap — since maintaining the insertion order or keeping things sorted doesn't seem to matter here, the plain HashMap appears appropriate.
private static Map<String, ChatChannel> registeredChannels = new HashMap<>();
// ^^^ generic declaration | specific implementation ^^^
You might also know something about how many channels there are likely to be, or at least the size of the starting set of channels, so you may also consider the initialCapacity and the loadFactor parameters to the constructor, for example
// Allocate with room for 10 initial channels, expand the map size when 75% full
private static Map<String, ChatChannel> registeredChannels =
        new HashMap<>(10, 0.75f); // loadFactor is a float, hence the f suffix
It is quite likely you have IDs from a continuous range, like 1,2,3,4... or 110,111,112,113,114..., maybe with some holes. It is then easy to map such sequences onto 0,1,2,3,4... (for example, by subtracting the smallest ID).
Now you can use a plain array(!), which is very fast. The numbers 0..n map to array indices, and access cannot be faster. Each slot holds a reference to the corresponding data (e.g. the session data).
Basically an array is a map. Key is index number, and value is what it contains or points to.
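A sketch of the array-as-map idea, assuming integer IDs that fall within a known range [minId, maxId]; the DenseIdTable class is hypothetical. The "hash" is simply id - minId:

```java
// Hypothetical lookup table for integer IDs drawn from a known range
// [minId, maxId]. Lookup is a single array access; holes in the range
// simply stay null.
class DenseIdTable<V> {
    private final int minId;
    private final Object[] slots; // generic arrays can't be created directly

    DenseIdTable(int minId, int maxId) {
        if (maxId < minId) {
            throw new IllegalArgumentException("maxId < minId");
        }
        this.minId = minId;
        this.slots = new Object[maxId - minId + 1];
    }

    void put(int id, V value) {
        // Throws ArrayIndexOutOfBoundsException if id is outside the range
        slots[id - minId] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int id) {
        int index = id - minId;
        if (index < 0 || index >= slots.length) {
            return null; // outside the range: treat as "not present"
        }
        return (V) slots[index];
    }
}
```

The array reserves a slot for every ID in the range, so this only pays off when the IDs are dense; for sparse or string IDs (such as "global"), a HashMap remains the simpler choice.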

Performing the fastest search - which collection should I use?

I know:
If you need fast access to elements using index, ArrayList should be choice.
If you need fast access to elements using a key, use HashMap.
If you need fast addition and removal of elements, use LinkedList (but it has very poor seeking performance).
In order to perform the fastest search, on the basis of data stored in a collection object, which collection should I use?
Below is my code:
public void fillAndSearch(Collection<Student> collection) {
if(collection!=null){
for (int i=0; i<=10; i++) {
Student student = new Student("name" + i, "id" + i);
collection.add(student);
}
}
//here We have to perform searching for "name7" or "id5",
//then which implementation of collection will be fastest?
}
class Student {
String name;
String id;
Student(String name, String id) {
this.name = name;
this.id = id;
}
}
The thing often skipped when comparing ArrayList and LinkedList is caching and memory layout. ArrayList is backed by an array, which means its element references are stored in contiguous memory. This lets the CPU caches exploit locality ("when a byte in memory was accessed, most likely nearby bytes will be accessed soon"). Because of this, ArrayList is faster than LinkedList in all but one case: inserting/deleting the element at the beginning of the list (because all elements in the array have to be shifted). Adding/deleting at the end or in the middle, iterating, and accessing elements are all faster with ArrayList.
If you need to search for a student by both name and id, that sounds to me like a map with a composite key: Map<Student, StudentData>. I would recommend the HashMap implementation, unless you need to both search the collection and retrieve all elements sorted by key, in which case TreeMap may be a better idea. Remember, though, that HashMap has expected O(1) access time, while TreeMap has O(log n) access time.
With the given restrictions, you should use HashMap.
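One way to build such a composite key is a small immutable class whose equals() and hashCode() both use name and id. StudentKey is a hypothetical name; note that this only helps when both name and id are known at lookup time:

```java
import java.util.Objects;

// Hypothetical composite key: two keys are equal only if both name and id match.
// HashMap relies on equals() and hashCode() being consistent with each other.
final class StudentKey {
    private final String name;
    private final String id;

    StudentKey(String name, String id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StudentKey)) return false;
        StudentKey other = (StudentKey) o;
        return Objects.equals(name, other.name) && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }
}
```

On Java 16 or later, record StudentKey(String name, String id) {} generates equivalent equals() and hashCode() automatically.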
It will give you quick search, as you wished.
If you care about traversing elements in specific order, you should choose TreeMap (natural order) or LinkedHashMap (insertion order).
If your collection is guaranteed to be immutable, you can use a sorted ArrayList with binary search; it will save you some memory. In this case, though, you can search by only one specific key, which is undesirable in many real-world applications.
Anyway, you would need a really huge number of elements (millions/billions) to feel the difference between O(log N) and O(1) solutions.
If you want to learn more about data structures, I recommend the Algorithms course by Princeton University on Coursera.
There is nothing wrong with keeping multiple collections to access your data faster.
In this situation I would use two HashMap<String, Student> instances, one for each search key.
(PS: If you don't know which kind of key will be used in a search, you can store both in the same map, as long as a name can never equal an id.)
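A sketch of the two-map approach, assuming names and ids are each unique (if two students share a name, the later add() replaces the earlier one in the name map). StudentIndex and its methods are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Student as defined in the question (fields name and id).
class Student {
    String name;
    String id;

    Student(String name, String id) {
        this.name = name;
        this.id = id;
    }
}

// Hypothetical wrapper that keeps one HashMap per search key, so a lookup
// by either name or id is an expected O(1) get() instead of a linear scan.
class StudentIndex {
    private final Map<String, Student> byName = new HashMap<>();
    private final Map<String, Student> byId = new HashMap<>();

    void add(Student s) {
        // Both maps hold references to the same Student object, not copies.
        byName.put(s.name, s);
        byId.put(s.id, s);
    }

    Student findByName(String name) {
        return byName.get(name); // null if absent
    }

    Student findById(String id) {
        return byId.get(id); // null if absent
    }
}
```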

Random access for HashMap keys

I need to randomly access keys in a HashMap. Right now, I am using Set's toArray() method on the Set that HashMap's keySet() returns, and casting it as a String[] (my keys are Strings). Then I use Random to pick a random element of the String array.
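public String randomKey() {
    // toArray(new String[0]) returns a real String[]; casting the Object[]
    // returned by plain toArray() throws ClassCastException at runtime
    String[] keys = myHashMap.keySet().toArray(new String[0]);
    Random rand = new Random();
    return keys[rand.nextInt(keys.length)];
}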
It seems like there ought to be a more elegant way of doing this!
I've read the following post, but it seems even more convoluted than the way I'm doing it. If the following solution is better, why is that so? Selecting random key and value sets from a Map in Java
There is no facility in a HashMap to return an entry without knowing the key so, if you want to use only that class, what you have is probably as good a solution as any.
Keep in mind however that you're not actually restricted to using a HashMap.
If you're going to be reading this collection far more often than writing it, you can create your own class which contains both a HashMap of the mappings and a different collection of the keys that allows random access (like a Vector).
That way, you won't incur the cost of converting the map to a set then an array every time you read, it will only happen when necessary (adding or deleting items from your collection).
Unfortunately, a Vector allows duplicate keys, so you would have to defend against that when inserting (to ensure fairness when selecting a random key). That will increase the cost of insertion.
Deletion would also cost more, since you would have to search the vector for the item to remove.
I'm not sure there's an easy single collection for this purpose. If you wanted to go the whole hog, you could have your current HashMap, a Vector of the keys, and yet another HashMap mapping the keys to the vector indexes.
That way, all operations (insert, delete, change, get-random) could be O(1) in time (a delete moves the vector's last key into the removed key's slot and updates its index), very efficient in terms of time, perhaps less so in terms of space :-)
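A sketch of that three-part structure, under a few assumptions: it uses ArrayList rather than Vector (no synchronization), it is not thread-safe, and RandomKeyMap with its methods is a hypothetical name. Removal stays O(1) by moving the last key into the removed key's slot instead of shifting the list:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical wrapper: the values map, a list of keys for random access,
// and a key -> position map so removal stays O(1). Not thread-safe.
class RandomKeyMap<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final List<K> keys = new ArrayList<>();
    private final Map<K, Integer> positions = new HashMap<>();
    private final Random random = new Random();

    V put(K key, V value) {
        if (!values.containsKey(key)) {
            // New key: append it, so the key list never holds duplicates.
            positions.put(key, keys.size());
            keys.add(key);
        }
        return values.put(key, value);
    }

    V get(K key) {
        return values.get(key);
    }

    V remove(K key) {
        Integer pos = positions.remove(key);
        if (pos == null) {
            return null; // key not present
        }
        // Move the last key into the vacated slot, then drop the last slot;
        // no elements are shifted, so removal is O(1).
        K last = keys.remove(keys.size() - 1);
        if (pos < keys.size()) {
            keys.set(pos, last);
            positions.put(last, pos);
        }
        return values.remove(key);
    }

    // Returns a uniformly chosen key, or null if the map is empty.
    K randomKey() {
        if (keys.isEmpty()) {
            return null;
        }
        return keys.get(random.nextInt(keys.size()));
    }

    int size() {
        return keys.size();
    }
}
```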
Or there's a halfway solution that still uses a wrapper but creates a long-lived array of strings whenever you insert, change or delete a key. That way, you only create the array when needed and you still amortise the costs. Your class then uses the hashmap for efficient access with a key, and the array for random selection.
And the change there is minimal. You already have the code for creating the array, you just have to create your wrapper class which provides whatever you need from a HashMap (and simply passes most calls through to the HashMap) plus one extra function to get a random key (using the array).
Now, I'd only consider using those methods if performance is actually a problem though. You can spend untold hours making your code faster in ways that don't matter :-)
If what you have is fast enough, it's fine.
Why not copy the keys into a list, shuffle it once with the Collections.shuffle method, save it in a variable, and simply pop one off the top as required?
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#shuffle(java.util.List)
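A sketch of the shuffle approach, assuming you want each key at most once (sampling without replacement) and that the map does not change while you are drawing keys. ShuffledKeys is a hypothetical helper; it pops from the end of the list, which is equivalent after a shuffle and O(1) for an ArrayList:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shuffle a snapshot of the keys once, then hand them
// out one at a time. Later changes to the map are not reflected here.
class ShuffledKeys<K> {
    private final List<K> remaining;

    ShuffledKeys(Map<K, ?> map) {
        remaining = new ArrayList<>(map.keySet());
        Collections.shuffle(remaining);
    }

    // Returns the next random key, or null once every key has been used.
    K next() {
        if (remaining.isEmpty()) {
            return null;
        }
        return remaining.remove(remaining.size() - 1); // O(1): removes from the end
    }
}
```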
You could avoid copying the whole keyset into a temporary data structure, by first getting the size, choosing the random index and then iterating over the keyset the appropriate number of times.
This code would need to be synchronized to avoid concurrent modifications.
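A sketch of the index-then-iterate approach; RandomKeys.randomKey is a hypothetical helper. It uses O(1) extra space but O(N) time per call, and, as noted above, the map must not be modified between size() and the iteration:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Random;

class RandomKeys {
    private static final Random RANDOM = new Random();

    // Picks a uniformly random key without copying the key set: choose an
    // index first, then advance an iterator that many steps.
    // Returns null for an empty map.
    static <K> K randomKey(Map<K, ?> map) {
        int size = map.size();
        if (size == 0) {
            return null;
        }
        int target = RANDOM.nextInt(size);
        Iterator<K> it = map.keySet().iterator();
        for (int i = 0; i < target; i++) {
            it.next(); // skip keys before the chosen index
        }
        return it.next();
    }
}
```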
If you really just want any element of a set (an arbitrary one, not a randomly chosen one), this will work fine:
String s = set.iterator().next();
If you are unsure whether there is an element in the set, use:
String s;
Iterator<String> it = set.iterator();
if (it.hasNext()) {
s = it.next();
}
else {
// set was empty
}

How to structure data that can be both indexed and sorted on different keys?

I'd like to maintain a set of data that has two main attributes: 1. I can quickly look up the existence of an object by a numerical ID, and 2. I want to sort the data, but avoid needlessly sorting it since it can be slow. For a more concrete example, I have a set of user data, where each user has a unique ID (an int) and a unique username (a String). I'll be adding and removing users, and occasionally I want to generate a human-readable, alphabetically sorted list for the user, but as my number of users increases, so does the time needed to sort the data.
How would you structure this? The only reasonable approach I can think of involves creating two separate data structures and redundantly adding/removing items to/from BOTH structures at the same time. As my data grows, this will use more memory than a single structure would. I might also introduce more bugs this way, as I have to remind myself to duplicate the operations on both structures when I come back to add to the code later. In other words, I could have:
TreeMap<String,Integer> nameSortedMap = new TreeMap<String,Integer>(String.CASE_INSENSITIVE_ORDER);
and
Map<Integer,String> idMap = new HashMap<Integer,String>();
Whenever I add or remove data, I do it on both maps. If I am retrieving a username by ID, I'd call idMap.get(id), or idMap.containsKey(id) to see if a user exists. On the other hand, if I need to display a sorted list, I would use nameSortedMap.keySet(), which I gather should already be in name order, avoiding the need for additional work each time a sorted list is needed.
How's my thought process? Is there a better or simpler way to accomplish this? Thank you!
There are two ways I can think of:
Use a database and index both columns. Databases are fast, and can be very small (see: SQLite), but they're probably overkill if you don't need to save the data, or if this is the only thing you would use it for.
Create a class containing both of your maps above, which handles all inserting and deleting. That way, you only have one place where you have to remember to do operations on both. This is one of the major selling points for object oriented programming.
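A sketch of the second option, assuming IDs and usernames are both unique. UserDirectory and its method names are hypothetical; the point is that the two maps are private, so no code outside the class can update one without the other:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical wrapper: the only class that touches both maps, so every
// add/remove updates them together.
class UserDirectory {
    private final Map<Integer, String> idMap = new HashMap<>();
    private final TreeMap<String, Integer> nameSortedMap =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void addUser(int id, String username) {
        String oldName = idMap.put(id, username);
        if (oldName != null) {
            nameSortedMap.remove(oldName); // renamed user: drop the stale name
        }
        nameSortedMap.put(username, id);
    }

    void removeUser(int id) {
        String name = idMap.remove(id);
        if (name != null) {
            nameSortedMap.remove(name);
        }
    }

    boolean exists(int id) {
        return idMap.containsKey(id);
    }

    String usernameFor(int id) {
        return idMap.get(id); // null if no such user
    }

    // Already in case-insensitive name order: a read-only view, no sorting needed.
    Set<String> sortedUsernames() {
        return Collections.unmodifiableSet(nameSortedMap.keySet());
    }
}
```

Because the TreeMap uses String.CASE_INSENSITIVE_ORDER, names that differ only in case (e.g. "Bob" and "bob") count as the same key there.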
If you don't display the ordered list very often, I don't think it's necessary to keep two maps; but if you prefer to do that, I would suggest creating one class that extends HashMap and overrides the methods to keep both maps in sync. Example:
import java.util.HashMap;
import java.util.Set;
import java.util.TreeMap;
public class UserMap extends HashMap<Integer, String> {
    private final TreeMap<String, Integer> nameSortedMap = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    // Note: putAll(), clear() and similar methods are NOT overridden here,
    // so calling them could leave nameSortedMap out of sync.
    @Override
    public String put(Integer key, String value) {
        String previous = super.put(key, value);
        if (previous != null) {
            nameSortedMap.remove(previous); // user renamed: drop the old name
        }
        nameSortedMap.put(value, key);
        return previous;
    }
    @Override
    public String remove(Object key) {
        // super.remove avoids calling this method recursively
        String removed = super.remove(key);
        if (removed != null) {
            nameSortedMap.remove(removed);
        }
        return removed;
    }
    public Set<String> getSortedNames() {
        return nameSortedMap.keySet();
    }
}

Searchable list of objects in Java

I want to create a large (~300,000 entries) List of self defined objects of the class Drug.
Every Drug has an ID and I want to be able to search the Drugs in logarithmic time via that ID.
What kind of List do I have to use?
How do I declare that it should be searchable via the ID?
The various implementations of the Map interface should do what you want.
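If you plan to use Drug objects themselves as HashMap keys, remember to override both equals() and hashCode() in your Drug class; if the ID is the key, the ID type's own equals() and hashCode() are used.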
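public class Drug implements Comparable<Drug> {
    private final String id; // ID type assumed; any Comparable type works
    public Drug(String id) { this.id = id; }
    public String getId() { return id; }
    @Override
    public int compareTo(Drug o) {
        return this.id.compareTo(o.getId());
    }
}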
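Then, on your List, you can use Collections.binarySearch:
import java.util.Collections;
import java.util.List;
// drugList: list of all drugs; drugToSearchFor: a Drug containing the id you want
static boolean containsDrug(List<Drug> drugList, Drug drugToSearchFor) {
    // Binary search requires a sorted list. Sorting costs O(n log n),
    // so in real code sort once up front rather than on every search.
    Collections.sort(drugList);
    int index = Collections.binarySearch(drugList, drugToSearchFor);
    // A non-negative index means a drug with that id was found
    return index >= 0;
}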
Wouldn't you use a TreeMap with the ID as your key, instead of a List?
If searching by a key is important to you, then you probably need a Map and not a List. From the Java Collections Trail:
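A sketch of that TreeMap approach, which gives the logarithmic lookup asked for without a separate sort step. The String ID type, the Drug constructor and the DrugCatalog class are assumptions for illustration:

```java
import java.util.TreeMap;

// Minimal Drug with an ID; the String ID type and constructor are assumptions.
class Drug {
    private final String id;
    private final String name;

    Drug(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
}

class DrugCatalog {
    // TreeMap is a red-black tree: get/put/remove are O(log n), and the
    // keys stay sorted, so no separate sort step is needed.
    private final TreeMap<String, Drug> byId = new TreeMap<>();

    void add(Drug drug) {
        byId.put(drug.getId(), drug);
    }

    Drug findById(String id) {
        return byId.get(id); // null if absent
    }
}
```

Since the ID (not the Drug) is the key, Drug needs no equals()/hashCode() override here. With an int ID you would use TreeMap<Integer, Drug> instead, and a HashMap would give expected O(1) lookups if sorted order is not needed.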
"The three general-purpose Map implementations are HashMap, TreeMap and LinkedHashMap. If you need SortedMap operations or key-ordered Collection-view iteration, use TreeMap; if you want maximum speed and don't care about iteration order, use HashMap; if you want near-HashMap performance and insertion-order iteration, use LinkedHashMap."
Due to the high number of entries, you might consider using a database instead of holding everything in memory.
If you still want to keep it in memory, you might have a look at B-trees.
You could use any list, and as long as it is sorted you can use a binary search.
But I would use a HashMap, which searches in expected O(1) time.
I know I am being pretty redundant here, but as everybody said, isn't this exactly the use case for a Map?
