Java's Hashtable - How to get every entry

I'm working on a chat server, and I'm putting the Clients into a Hashtable.
This Hashtable is composed of <String name, Connection c> entries, where Connection holds a Socket and its input/output streams.
I can send messages just by looking up a nick in the Hashtable, but how can I send a message to all the people?
Can I "scout" (this was the term I didn't know) every entry of the Hashtable? (Like with an array, I want to go over each entry in a loop and send the message to everyone.)
Thanks in advance.

You could answer your own question by reading the javadocs for Hashtable (or HashMap). "Read the javadocs" is an important lesson that every beginner in Java should learn and remember.
In this case, the javadocs will show you 3 methods that could be useful:
The keySet() method returns a set view of the keys in the table.
The values() method returns a collection view of the values in the table.
The entrySet() method returns a set view of the key/value pairs in the table.
You can iterate over these collections like any other collection. There are examples in the other answers.
However, I get the impression that your application is multi-threaded. If that is the case, then there are two other problems that you need to deal with to make your program reliable:
If two or more threads could use the same object or data structure, they need to take the necessary steps to ensure that they are properly synchronized. If they don't then there is a non-zero probability that some sequence of operations will result in the data structure being put into an inconsistent state, or that one or more threads will see an inconsistent state (due to memory caches, values saved in registers, etc).
If one thread is using one of a HashMap's collection iterators and another adds or removes an entry, then the first one is likely to get a ConcurrentModificationException.
If you solve the above two problems by locking out all other operations on the HashMap while your "send to all" operation is going on, you are unintentionally creating a performance bottleneck. Basically, everything else stops until the operation has finished. You get a similar effect (but on a finer scale) if you simply put a synchronization wrapper around the HashMap.
You need to read and learn about these things. (And there's far too much to explain in a single SO answer.) A simple (but not universal) solution to all 3 problems that will probably work in your use-case is to use a ConcurrentHashMap instead of a plain HashMap.
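For illustration only, here is a minimal sketch of a broadcast over a ConcurrentHashMap; the Connection interface and its send method are assumptions standing in for the classes described in the question:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

interface Connection {                 // hypothetical stand-in for the question's Connection class
    void send(String message);
}

class ChatServer {
    // Iterators over a ConcurrentHashMap are weakly consistent: no
    // ConcurrentModificationException, and no need to lock the whole map while broadcasting.
    private final ConcurrentMap<String, Connection> clients = new ConcurrentHashMap<>();

    void register(String nick, Connection connection) {
        clients.put(nick, connection);
    }

    void broadcast(String message) {
        for (Connection c : clients.values()) {
            c.send(message);
        }
    }
}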

I can send messages just by looking up a nick in the Hashtable, but how can I send a message to all the people?
Then do the same for all nicknames in the hash table:
for (String name : yourTable.keySet())
yourTable.get(name).send("your message");
or, alternatively:
for (Connection conn : yourTable.values())
conn.send("your message");

You can iterate over all the values in the Hashtable and do what you wish to all of them:
Map<String, Connection> users;
for (Connection connection : users.values()) {
// Send the message to each Socket here.
}

Hashtable has keySet(), which returns all the keys in the table. I am posting this from mobile, so I couldn't get you an example link. If you want all the connections as well, you can use entrySet().
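For example (a rough sketch, reusing the users map from the answer above):

for (Map.Entry<String, Connection> entry : users.entrySet()) {
    String nick = entry.getKey();
    Connection connection = entry.getValue();
    // both the nick and the connection are available here
}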

Related

Should we use HashSet?

A HashSet is backed by a HashMap. From its JavaDoc:
This class implements the Set interface, backed by a hash table
(actually a HashMap instance)
When taking a look at the source we can also see how they relate to each other:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
Therefore a HashSet<E> is backed by a HashMap<E,Object>. For all HashSets in our application we have one reference object PRESENT that we use in the HashMap for the value. While the memory needed to store PRESENT is negligible, we still store a reference to it for each value in the map.
Would it not be more efficient to use null instead of PRESENT? A further consideration is whether we should forgo the HashSet altogether and directly use a HashMap, given that the circumstances permit the use of a Map instead of a Set.
My basic problem that triggered these thoughts is the following situation: I have a collection of objects with the following properties:
Large collection of objects (> 30,000)
Insertion order is not relevant
Efficient check if an item is contained
Adding new items to the collection is not relevant
The chosen solution should perform optimally with respect to the above criteria and also minimize memory consumption. On that basis, the data structures HashSet and HashMap spring to mind. When thinking about alternative approaches, the key question is:
How to check containment efficiently?
The only answer that comes to my mind is using the item's hash to calculate the storage location. I might be missing something here. Are there any other approaches?
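To make that idea concrete, here is a deliberately simplified sketch of hash-based containment checking; it is not the JDK implementation, just an illustration of why the item's hash gives the storage location:

import java.util.ArrayList;
import java.util.List;

class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    TinyHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<E>());
        }
    }

    private List<E> bucketFor(Object item) {
        // The item's hash selects the bucket, i.e. the storage location.
        int index = (item.hashCode() & 0x7fffffff) % buckets.size();
        return buckets.get(index);
    }

    void add(E item) {
        List<E> bucket = bucketFor(item);
        if (!bucket.contains(item)) {
            bucket.add(item);
        }
    }

    boolean contains(Object item) {
        // Only one bucket is scanned, so the check is O(1) on average.
        return bucketFor(item).contains(item);
    }
}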
I had a look at various existing questions that shed some light on the issue, but didn't quite answer mine:
Java : HashSet vs. HashMap
clarifying facts behind Java's implementation of HashSet/HashMap
Java HashSet vs HashMap
I am not looking for suggestions of alternative libraries or frameworks to address this; I want to understand if there is another way to think about efficient containment checking of an element in a Collection.
In short, yes, you should use HashSet. It might not be the most efficient Set implementation possible, but that hardly ever matters, unless you are working with huge amounts of data.
In that case, I would suggest using specialized libraries: EnumMap if you can use enums, primitive collections like Trove if your data is mostly primitives, various other data structures that are optimized for certain data types, or even an in-memory database.
Don't get me wrong, I'm someone who likes performance tuning too, but replacing the built-in data structures should only be done when it's really necessary. For most cases, they work perfectly fine.
What you could do, in case you really want to save every last bit of memory and do not care about insertion, is to use a fixed-size array, sort it, and do a binary search every time. But I doubt that it's more efficient than a HashSet.
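As a rough illustration of that sorted-array approach (the example data is made up; in practice it would be the ~30,000 element collection):

import java.util.Arrays;

public class SortedArrayContainment {
    public static void main(String[] args) {
        String[] items = { "delta", "alpha", "charlie", "bravo" };
        Arrays.sort(items);                             // one-time O(n log n) cost

        // O(log n) containment check, with no per-entry node or reference overhead.
        boolean found = Arrays.binarySearch(items, "charlie") >= 0;
        System.out.println(found);                      // prints: true
    }
}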
Hashtables and HashSets are used for entirely different purposes, so maybe the two shouldn't be compared in terms of "which is more efficient". A HashSet is more suitable for the mathematical notion of a set (e.g. {1,2,3,4}): it contains no duplicates and allows at most one null element. A HashMap, on the other hand, is a key/value mapping: it allows multiple null values as well as duplicate values, just not duplicate keys. I know this probably reads like "the difference between a Hashtable and a HashSet", but my point is that they really can't be compared.

Atomic way to reorder keys in a ConcurrentSkipListMap / ConcurrentSkipListSet?

Summary of this post: I have a set of ordered items whose order may change over time. I need to be able to iterate through this set from multiple threads, each of which may also want to update the order of the items.
For example, multiple threads need to access String keys in some arbitrary sorted order. The strings are not sorted according to their natural ordering, but by some values that may change (hence, a custom Comparator). My original implementation was to use a TreeSet and synchronize on it. If any of the keys needed to be reordered, a thread would remove the key from the set, update the comparison value, and reinsert the key. To make this work, the keys are plain Strings, but the comparator has access to the values. This is a weird arrangement where the order of keys may change over time, but since a changed key is always removed and reinserted when it changes, it seems to work. (I suppose it could also work if the Strings were wrapped inside another object.)
I recently became aware of the ConcurrentSkipListSet/ConcurrentSkipListMap implementations which are basically thread-safe sorted sets (resp. maps.) It seems like I can now iterate through the keys without having to lock the entire data structure. However, is there a way I can use them to atomically remove a key and replace it with another, like the operation I was doing above, so that other iterating threads don't miss the item, and without having to use synchronize blocks?
If anyone can suggest a better data structure for this type of operation, I'm all ears, too!
is there a way I can use them to atomically remove a key and replace it with another, like the operation I was doing above, so that other iterating threads don't miss the item, and without having to use synchronize blocks?
The short answer is no. If you need to remove and reinsert, there is no atomic way to do this with any collection that I know of.
That said, one possibility would be for you to reinsert the item before deleting it from the skip list. This would cause a duplicate, but that may be easier to handle than a missing entry. You would reinsert it after you changed the object so it would sort differently. This assumes that the object would then be non-equal as well. But if the other threads that are processing the list can't handle the duplicates, then I think you are out of luck.
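A rough sketch of that reinsert-before-remove idea on a ConcurrentSkipListSet; the Entry type, its score field, and the comparator are invented for the illustration and are not from the original post:

import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

final class Entry {
    final String key;
    final int score;                       // the value that determines the ordering; a new copy is made when it changes
    Entry(String key, int score) { this.key = key; this.score = score; }
}

public class ReorderSketch {
    // Order by score, then by key, so distinct keys never compare as equal.
    static final Comparator<Entry> BY_SCORE_THEN_KEY =
            (a, b) -> a.score != b.score ? Integer.compare(a.score, b.score)
                                         : a.key.compareTo(b.key);

    static final ConcurrentSkipListSet<Entry> SET =
            new ConcurrentSkipListSet<>(BY_SCORE_THEN_KEY);

    // Insert the updated copy first, then remove the stale one. A concurrent
    // iterator may briefly see the key twice, but it is much less likely to
    // miss it entirely than with remove-then-reinsert. This is still not atomic.
    static void reorder(Entry stale, int newScore) {
        SET.add(new Entry(stale.key, newScore));
        SET.remove(stale);
    }
}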

How best to get List nodes for a cache implementation

Okay first I will preface this with "I am very very new to Java" (i.e., a few days in), but I am a programmer by trade.
I have come across a situation where I want to load data. However, I would like to cache that data to prevent extraneous calls to the API (or whatever the data source may be). After thinking about it a bit, I have come up with a cache scheme which seems pretty reasonable to me.
The idea is that the DataCache class has two collections: a hash table with key type String and value type CacheData, and a linked list of keys. CacheData has two data members: the actual result of the API call in string form, and a reference (ListIterator?) to a node of the linked list.
When a request comes in for data, we check whether it's in the hash. If not, we fetch from the API, add the resulting key to the front of the linked list, and store a CacheData object in the hash containing the result, along with a reference to the first node of the linked list (the one we just added). If the data IS found in the hash, we break the node out of the linked list, move it to the front, and return the data from CacheData. The benefit: every operation is guaranteed to execute in O(1), if I'm understanding correctly.
Can I store the integer hash value of the 'request' in the linked list instead of the string (request) as a whole? If so, how can I access the result in the hashmap given that integer? (none of the methods seem to take an 'int' as param). Also...is my approach to this situation sound? Or is there perhaps something in Java that would make this easier?
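For comparison, the scheme described above (hash lookup plus a recency-ordered linked list) is essentially what java.util.LinkedHashMap provides when constructed in access order. A hedged sketch, with the cache size limit and the fetchFromApi placeholder invented for illustration:

import java.util.LinkedHashMap;
import java.util.Map;

class DataCache {
    private static final int MAX_ENTRIES = 1000;   // illustrative limit, not from the question

    // accessOrder = true: get() moves an entry to the end of the internal linked
    // list, and removeEldestEntry() lets us evict the least recently used entry.
    private final Map<String, String> cache =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    String get(String request) {
        String cached = cache.get(request);
        if (cached == null) {
            cached = fetchFromApi(request);        // placeholder for the real data source call
            cache.put(request, cached);
        }
        return cached;
    }

    private String fetchFromApi(String request) {
        return "result for " + request;            // stand-in implementation
    }
}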

Efficiently finding duplicates in a constrained many-to-many dataset?

I have to write a bulk operation version of something our webapp
lets you do on a more limited basis from the UI. The desired
operation is to assign objects to a category. A category can have
multiple objects but a given object can only be in one category.
The workflow for the task is:
1) Using the browser, a file of the following form is uploaded:
# ObjectID, CategoryID
Oid1, Cid1
Oid2, Cid1
Oid3, Cid2
Oid4, Cid2
[etc.]
The file will most likely have tens to hundreds of lines, but
definitely could have thousands of lines.
In an ideal world, a given object id would only occur once in the file
(reflecting the fact that an object can only be assigned to one category).
But since the file is created outside of our control, there's no guarantee
that's actually true, and the processing has to deal with that possibility.
2) The server will receive the file, parse it, pre-process it
and show a page something like:
723 objects to be assigned to 126 categories
142 objects not found
42 categories not found
Do you want to continue?
[Yes] [No]
3) If the user clicks the Yes button, the server will
actually do the work.
Since I don't want to parse the file in both steps (2) and (3), as
part of (2), I need to build a container that will live across
requests and hold a useful representation of the data that will let me
easily provide the data to populate the "preview" page and will let me
efficiently do the actual work. (While obviously we have sessions, we
normally keep very little in-memory session state.)
There is an existing
assignObjectsToCategory(Set<ObjectId> objectIds, CategoryId categoryId)
function that is used when assignment is done through the UI. It is
highly desirable for the bulk operation to also use this API since it
does a bunch of other business logic in addition to the simple
assignment and we need that same business logic to run when this bulk
assign is done.
Initially it was going to be OK if the file "illegally" specified
multiple categories for a given object -- the object could simply be
assigned arbitrarily to one of the categories the file associated it
with.
So I was initially thinking that in step (2) as I went through the
file I would build up and put into the cross-request container a
Map<CategoryId, Set<ObjectId>> (specifically a HashMap for quick
lookup and insertion) and then when it was time to do the work I could
just iterate on the map and for each CategoryId pull out the
associated Set<ObjectId> and pass them into assignObjectsToCategory().
However, the requirement on how to handle duplicate ObjectIds changed.
And they are now to be handled as follows:
If an ObjectId appears multiple times in the file and
all times is associated with the same CategoryId, assign
the object to that category.
If an ObjectId appears multiple times in the file and
is associated with different CategoryIds, consider that
an error and make mention of it on the "preview" page.
That seems to mess up my Map<CategoryId, Set<ObjectId>> strategy
since it doesn't provide a good way to detect that the ObjectId I
just read out of the file is already associated with a CategoryId.
So my question is how to most efficiently detect and track these
duplicate ObjectIds?
What came to mind is to use both "forward" and "reverse" maps:
public class CrossRequestContainer
{
...
Map<CategoryId, Set<ObjectId>> objectsByCategory; // HashMap
Map<ObjectId, List<CategoryId>> categoriesByObject; // HashMap
Set<ObjectId> illegalDuplicates;
...
}
Then as each (ObjectId, CategoryId) pair was read in, it would
get put into both maps. Once the file was completely read in, I
could do:
for (Map.Entry<ObjectId, List<CategoryId>> entry : categoriesByObject.entrySet()) {
List<CategoryId> categories = entry.getValue();
if (categories.size() > 1) {
ObjectId object = entry.getKey();
if (!all_categories_are_equal(categories)) {
illegalDuplicates.add(object);
// Since this is an "illegal" duplicate I need to remove it
// from every category that it appeared with in the file.
for (CategoryId category : categories) {
objectsByCategory.get(category).remove(object);
}
}
}
}
When this loop finishes, objectsByCategory will no longer contain any "illegal"
duplicates, and illegalDuplicates will contain all the "illegal" duplicates to
be reported back as needed. I can then iterate over objectsByCategory, get the Set<ObjectId> for each category, and call assignObjectsToCategory() to do the assignments.
But while I think this will work, I'm worried about storing the data twice, especially
when the input file is huge. And I'm also worried that I'm missing something re: efficiency
and this will go very slowly.
Are there ways to do this that won't use double memory but can still run quickly?
Am I missing something that even with the double memory use will still run a lot
slower than I'm expecting?
Given the constraints you've described, I don't think there's a way to do this using a lot less memory.
One possible optimization, though, is to only maintain lists of categories for objects that are listed in multiple categories, and otherwise just map object to category, i.e.:
Map<CategoryId, Set<ObjectId>> objectsByCategory; // HashMap
Map<ObjectId, CategoryId> categoryByObject; // HashMap
Map<ObjectId, Set<CategoryId>> illegalDuplicates; // HashMap
Yes, this adds yet another container, but it will (hopefully) contain only a few entries; also, the memory requirements of the categoryByObject map are reduced (cutting out one list overhead per entry).
The logic is a little more complicated of course. When a duplicate is initially discovered, the object should be removed from the categoryByObject map and added into the illegalDuplicates map. Before adding any object into the categoryByObject map, you will need to first check the illegalDuplicates map.
Finally, it probably won't hurt performance to build the objectsByCategory map in a separate loop after building the other two maps, and it will simplify the code a bit.
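A hedged sketch of that bookkeeping, reusing the question's ObjectId and CategoryId types (assumed to have sensible equals/hashCode); the method names are invented for illustration:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CrossRequestContainer {
    Map<CategoryId, Set<ObjectId>> objectsByCategory = new HashMap<>();
    Map<ObjectId, CategoryId> categoryByObject = new HashMap<>();
    Map<ObjectId, Set<CategoryId>> illegalDuplicates = new HashMap<>();

    // Called once per (ObjectId, CategoryId) pair read from the file.
    void addPair(ObjectId object, CategoryId category) {
        Set<CategoryId> conflicts = illegalDuplicates.get(object);
        if (conflicts != null) {                   // already known to be conflicting
            conflicts.add(category);
            return;
        }
        CategoryId existing = categoryByObject.get(object);
        if (existing == null) {
            categoryByObject.put(object, category);
        } else if (!existing.equals(category)) {
            // Second, different category: move the object to illegalDuplicates.
            categoryByObject.remove(object);
            Set<CategoryId> bad = new HashSet<>();
            bad.add(existing);
            bad.add(category);
            illegalDuplicates.put(object, bad);
        }
        // If existing.equals(category), it is a harmless repeat: nothing to do.
    }

    // After the whole file is read, build objectsByCategory in one pass,
    // as suggested above; illegal duplicates are already excluded.
    void buildObjectsByCategory() {
        for (Map.Entry<ObjectId, CategoryId> e : categoryByObject.entrySet()) {
            objectsByCategory
                    .computeIfAbsent(e.getValue(), c -> new HashSet<ObjectId>())
                    .add(e.getKey());
        }
    }
}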

Timeout Mechanism for Hashtable

I have a hashtable that is under heavy traffic. I want to add a timeout mechanism to it that removes records that are too old. My concerns are:
- It should be lightweight.
- The remove operation is not time critical. If the timeout value is 1 hour, removing an entry after 1 hour or after 1 hour and 15 minutes is fine.
My idea is:
I create a big array (used as a ring buffer) that stores the put time and the hashtable key.
When adding to the hashtable, I use the array index to find the next slot in the array.
If the array slot is empty, I store the insertion time and the hashtable key there.
If the array slot is not empty, I compare the stored insertion time to check whether a timeout has occurred.
If a timeout has occurred, I remove the corresponding entry from the hashtable (if it hasn't been removed yet).
If no timeout has occurred, I increment the index until I find an empty or timed-out array slot.
When removing from the hashtable there is no operation on the big array.
In short, every add operation to the hashtable may remove at most one timed-out element from the hashtable, or do nothing.
What would be a more elegant and more lightweight solution?
Thanks for any help.
My approach would be to use the Guava MapMaker:
ConcurrentMap<String, MyValue> graphs = new MapMaker()
.maximumSize(100)
.expireAfterWrite(1, TimeUnit.HOURS)
.makeComputingMap(
new Function<String, MyValue>() {
public MyValue apply(String string) {
return calculateMyValue(string);
}
});
This might not be exactly what you're describing, but chances are it's close enough. And it's much easier to produce (plus it's using a well-tested code base).
Note that you can tweak the behaviour of the resulting Map by calling different methods before the make*() call.
You should rather consider using a LinkedHashMap or maybe a WeakHashMap.
The former has a constructor to set the iteration order of its elements to the order of last access; this makes it trivial to remove too old elements. And its removeEldestEntry method can be overridden to define your own policy on when to remove the eldest entry automatically after the insertion of a new one.
The latter uses weak references to keys, so any key which has no other reference to it can be automatically garbage collected.
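A hedged sketch of the LinkedHashMap idea: each value carries its insertion time, and removeEldestEntry evicts the oldest entry once it has passed the timeout. At most one stale entry is dropped per put, which fits the relaxed timing requirement from the question. All names here are invented for the illustration:

import java.util.LinkedHashMap;
import java.util.Map;

class ExpiringMap<K, V> {

    private static final class Timestamped<T> {
        final T value;
        final long insertedAt = System.currentTimeMillis();
        Timestamped(T value) { this.value = value; }
    }

    private final long timeoutMillis;

    // A default LinkedHashMap keeps insertion order, so the eldest entry is the oldest write.
    private final LinkedHashMap<K, Timestamped<V>> map =
            new LinkedHashMap<K, Timestamped<V>>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
                    return System.currentTimeMillis() - eldest.getValue().insertedAt > timeoutMillis;
                }
            };

    ExpiringMap(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    synchronized void put(K key, V value) {
        map.put(key, new Timestamped<V>(value));
    }

    synchronized V get(K key) {
        Timestamped<V> t = map.get(key);
        return t == null ? null : t.value;
    }
}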
I think a much easier solution is to use LRUMap from Apache Commons Collections. Of course you can write your own data structures if you enjoy it or want to learn, but this problem is so common that numerous ready-made solutions exist. (I'm sure others will point you to other implementations too; after a while your problem will be choosing the right one from them :))
Under the assumption that the currently most heavily accessed items in your cache structure are in the significant minority, you may well get by with randomly selecting items for removal (you have a low probability of removing something very useful). I've used this technique and, in this particular application, it worked very well and took next to no implementation effort.
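For what it's worth, a rough sketch of that random-eviction idea; the size cap and all names are invented for the illustration (the answer above doesn't prescribe a specific structure):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomEvictionCache<K, V> {
    private static final int MAX_ENTRIES = 10000;        // illustrative cap

    private final Map<K, V> map = new HashMap<>();
    private final List<K> keys = new ArrayList<>();      // mirror of the key set for O(1) random picks
    private final Random random = new Random();

    synchronized void put(K key, V value) {
        if (!map.containsKey(key) && map.size() >= MAX_ENTRIES) {
            // Evict a randomly chosen victim; if most entries are "cold", the
            // chance of throwing away something heavily used is low.
            int victimIndex = random.nextInt(keys.size());
            K victim = keys.get(victimIndex);
            keys.set(victimIndex, keys.get(keys.size() - 1));   // swap-remove, O(1)
            keys.remove(keys.size() - 1);
            map.remove(victim);
        }
        if (map.put(key, value) == null) {
            keys.add(key);
        }
    }

    synchronized V get(K key) {
        return map.get(key);
    }
}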
