Iterating through a map many times efficiently - java

I have a cookie manager class that stores Lists of cookies by their domain in a Map. The size will stay below 100 most of the time.
Map<String, CookieList> cookieMap;
Every time I set up cookies for a connection, it needs to iterate through all domains (Strings), check whether each one is acceptable, then insert the CookieList. I will be iterating through the map many times. I have a separate List holding the domains and search that, then get the CookieList by the key.
List<String> domainList;
// host is from the connection being set up
for (String domain : domainList) {
if (host.contains(domain)) {
CookieList list = cookieMap.get(domain);
// set up cookies
}
}
Since I'm using contains, I can't directly get the Key from cookieMap. Is this a good way or should I just be iterating Map's EntrySet? If so, would LinkedHashMap be good in this example?

Instead of maintaining a Map and a List, you could use Map.keySet to get the domains.
for (String domain : cookieMap.keySet()) {
if (host.contains(domain)) {
CookieList list = cookieMap.get(domain);
}
}
There is nothing inefficient about this: the for loop is O(n) in the number of domains, and each call to cookieMap.get is O(1) on average for a HashMap.

Map<String, CookieList> cookieMap = new HashMap<String, CookieList>();
for (Map.Entry<String, CookieList> entry : cookieMap.entrySet()) { // the key type is String, matching the map
if (host.contains(entry.getKey())) {
CookieList list = entry.getValue();
}
}
Hope this helps you.
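For reference, here is a minimal self-contained sketch of the entrySet approach. The class name CookieMatcher, the method matchingCookies, and the use of List<String> as a stand-in for CookieList are assumptions for illustration, not part of the original code. A LinkedHashMap is used only so that iteration follows insertion order; a plain HashMap behaves the same way for this loop apart from ordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CookieMatcher {
    // Hypothetical stand-in: each domain maps to a list of cookie strings.
    private final Map<String, List<String>> cookieMap = new LinkedHashMap<>();

    void put(String domain, List<String> cookies) {
        cookieMap.put(domain, cookies);
    }

    // One pass over the entries: key and value come from the same entry,
    // so no second lookup with get() is needed.
    List<String> matchingCookies(String host) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : cookieMap.entrySet()) {
            if (host.contains(entry.getKey())) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }
}
```

With fewer than 100 domains, this linear scan per connection is unlikely to be a bottleneck, and it removes the need to keep a separate domain list in sync with the map.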

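I think your code is already reasonably efficient. If the hosts you connect to match the stored domains exactly, you can use
domainList.retainAll(hosts)
before your for loop, so that you stop doing a check on every iteration. Note that retainAll takes a Collection and compares elements with equals, so it does not perform the substring match that contains does. Effectively, your code will look as follows:
List<String> hostList = new ArrayList<String>(domainList); // copy, so domainList itself is not modified
hostList.retainAll(hosts); // hosts: a Collection<String> of exact host names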
for (String hostEntry : hostList) { // I'd rename "host" so I can use it here
CookieList list = cookieMap.get(hostEntry);
// set up cookies
}

Related

Optimal way to maintain and quickly look up which objects contain a specific token (string) without maintaining two hash maps?

My system takes in a documentID and a list of strings that represent tokens associated with the document. The primary metric I am trying to optimize for is returning a list of all the document ids that are associated with a given token. I am pretty confident I should start with something like HashMap<String, HashSet<Integer>> tokenLookupMap where the string is the token and the hash set is the set of document IDs that contain that token. The tricky part is how to easily deal with documents being overwritten with new token lists (inserts completely overwrite the existing token lists with the new input). For example if my input looks like:
insertDocument(docId: 1, tokens: {token1, token2, token3} )
// query on token1 returns docIDs:[1]
insertDocument(docId: 2, tokens: {token1, token2, token3} )
// query on token1 returns docIDs:[1, 2]
insertDocument(docId: 1, tokens: {token4, token5, token6} )
// query on token1 returns docIDs:[2]
// query on token4 returns docIDs:[1]
I need to be able to efficiently update all the values in tokenLookupMap to reflect any tokens that are no longer present in the overridden document. Currently I'm maintaining a second hash map HashMap<Integer, HashSet<String>> documentLookupMap; to provide the "opposite" lookup perspective such that I can quickly look up what tokens are associated with a given document id and remove the old ones before an overwrite. This definitely allows me to optimize for lookups by token (insert time doesn't matter as much as queries) but it feels silly or even dangerous to have two structs that sort of represent the same thing and share a lot of overlapping space. Aside from the space increase and slight time increase on insert I technically run the risk of the structures getting out of sync.
Are there more optimal ways I could go about this? I could always put the two hash maps in a separate class and lock it down with limited public methods but are there ways to change the structure and perhaps avoid maintaining two structures altogether? Here's the most relevant code:
private HashMap<Integer, HashSet<String>> documentLookupMap;
private HashMap<String, HashSet<Integer>> tokenLookupMap;
private void insertDocument(int docId, HashSet<String> tokens ) {
if( documentLookupMap.containsKey(docId)) {
// if we've already indexed a doc with the same id we need to clean up first
var oldTokens = documentLookupMap.get(docId);
for (String token : oldTokens) {
tokenLookupMap.get(token).remove(docId);
// not sure if this is beneficial big picture on large data sets / space constraints
if(tokenLookupMap.get(token).isEmpty()) {
tokenLookupMap.remove(token);
}
}
}
documentLookupMap.put(docId, tokens);
for (String token : tokens) {
tokenLookupMap.computeIfAbsent(token,t->new HashSet<Integer>()).add(docId);
}
}
private Set<Integer> getDocsForToken(String token) {
return tokenLookupMap.containsKey(token) ? tokenLookupMap.get(token) : new HashSet<Integer>();
}
This needs to scale efficiently to tens of thousands of documents / tokens
Thanks in advance for any insights!
One thing that comes to my mind would be to maintain the Document-Token relation in separate classes and maintain 2 maps only for lookup:
class Document {
Integer docId;
//using arrays saves some space and tokens don't seem to change that often
Token[] tokens;
}
class Token {
String token;
Set<Document> documents;
}
Map<Integer, Document> docs = new HashMap<>();
Map<String, Token> tokens = new WeakHashMap<>();
When inserting a new document you basically clear the set of tokens and rebuild it:
private void insertDocument(int docId, Set<String> tokenNames ) { // parameter renamed so it no longer shadows the tokens map
Document doc = docs.computeIfAbsent(docId, ...);
//clear the tokens
for( Token old : doc.tokens ) {
old.documents.remove(doc);
}
//add the new tokens
Set<Token> newTokens = new HashSet<>();
for( String t: tokenNames ) {
Token newToken = tokens.computeIfAbsent(t, ...); // tokens is the lookup map declared above
newToken.documents.add(doc);
newTokens.add(newToken);
}
doc.tokens = newTokens.toArray(new Token[0]);
}
Of course this could be optimized to ignore tokens that aren't changed.
Note the use of WeakHashMap for tokens: since tokens could be abandoned at some point they should not use up any more memory. WeakHashMap would allow the garbage collector to remove those that aren't reachable by anyone else, e.g. those that aren't listed in any document. Be aware that WeakHashMap holds its values strongly, so this only works if the Token value does not itself keep a strong reference to the same String instance that is used as its key.
Of course it could take some time until the gc kicks in, and in the meantime a token lookup could return tokens that aren't used anymore. You'd either need to filter those or remove the tokens from the token map manually once they no longer have any document references.
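The option mentioned in the question, keeping both hash maps but hiding them behind one small class so they cannot drift out of sync, can be sketched as follows. This is only an illustration under assumptions: the class name TokenIndex and the field names are hypothetical, and the input set is copied defensively so later changes by the caller cannot corrupt the index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TokenIndex {
    private final Map<Integer, Set<String>> tokensByDoc = new HashMap<>();
    private final Map<String, Set<Integer>> docsByToken = new HashMap<>();

    // Inserts a document, replacing any token list previously stored for docId.
    void insertDocument(int docId, Set<String> tokens) {
        Set<String> old = tokensByDoc.put(docId, new HashSet<>(tokens));
        if (old != null) {
            // Remove the stale reverse entries left by the previous version.
            for (String token : old) {
                Set<Integer> docs = docsByToken.get(token);
                docs.remove(docId);
                if (docs.isEmpty()) {
                    docsByToken.remove(token);
                }
            }
        }
        for (String token : tokens) {
            docsByToken.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
        }
    }

    // Read-only view, so callers cannot modify the index from outside.
    Set<Integer> getDocsForToken(String token) {
        return Collections.unmodifiableSet(
                docsByToken.getOrDefault(token, Collections.emptySet()));
    }
}
```

Because both maps are private and only insertDocument writes to them, the two structures can only get out of sync through a bug inside this one method.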

Java Hashmap Iterator Issue on condition

I am trying to loop through a hashmap which contains Sessions and IDs. Multiple sessions may have the same ID. On each method call, I need to iterate through the hashmap and find which sessions are listed against a given ID.
class contains:
private static Map<Session, String> peers = new HashMap<Session, String> ();
Method contains:
for (Map.Entry<Session, String> entry : peers.entrySet()) {
if(entry.getValue() == clientId){
Session peer = entry.getKey();
peer.getBasicRemote().sendObject(figure);
}
}
But the problem is that it runs only once. I also checked the size of the hashmap, and it is exactly the number of entries I expect.
As the value is a String, you should compare it via equals(): == only checks whether two references point to the same object, not whether two strings have the same content. Without seeing more of the code, though, that's only a guess.
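The difference between the two comparisons can be shown with a short sketch. The class and method names here are hypothetical, and new String(...) is used only to force a distinct object, the way a value built at runtime typically would be.

```java
import java.util.Objects;

class StringCompareDemo {
    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison: true if both strings hold the same characters (null-safe).
    static boolean sameContent(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String stored = new String("client-42"); // distinct object with the same content
        String clientId = "client-42";
        System.out.println(sameReference(stored, clientId)); // false
        System.out.println(sameContent(stored, clientId));   // true
    }
}
```

In the loop from the question, this corresponds to replacing entry.getValue() == clientId with clientId.equals(entry.getValue()).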

Best way to save two depending Strings in Java and compare if new strings already exist

I need to save two dependent Strings (action and parameter) in a file, a hashtable/map, or an array, whichever is the best solution for speed and memory.
My application iterates through a large number of forms on a website, and I want to skip a form if the combination (String action, String parameter) has already been tested and therefore saved. I think an array would be too slow if I have thousands of different action and parameter tuples. I'm not experienced enough to choose the right approach for this. I tried a hashtable but it does not work:
Hashtable<String, String> ht = new Hashtable<String, String>();
if (ht.containsKey(action) && ht.get(action).contains(parameter)) {
System.out.println("Tupel already exists");
continue;
}
else
ht.put(action, parameter);
If an action and parameter will always be a 1-to-1 mapping (an action will only ever have one parameter), then your basic premise should be fine (though I'd recommend HashMap over Hashtable, as it is unsynchronized and therefore usually faster, and it supports null keys).
If you will have many parameters for a given action, then you want Map<String, Set<String>> - where action is the key and each action is then associated with a set of parameters.
Declare it like this:
Map<String, Set<String>> map = new HashMap<>();
Use it like this:
Set<String> parameterSet = map.get(action); // lookup the parameterSet
if (parameterSet != null && parameterSet.contains(parameter)) { // if it exists and contains the parameter
System.out.println("Tupel already exists");
} else { // pair doesn't exist
if (parameterSet == null) { // create parameterSet if needed
parameterSet = new HashSet<String>();
map.put(action, parameterSet);
}
parameterSet.add(parameter); // and add your parameter
}
As for the rest of your code and other things that may not be working:
I'm not sure what your use of continue is for in your original code; it's hard to tell without the rest of the method.
I'm assuming the creation of your hashtable is separated from the usage - if you're recreating it each time, then you'll definitely have problems.
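The same Map<String, Set<String>> idea can be written more compactly with computeIfAbsent, relying on the fact that Set.add returns false when the element was already present. This is a sketch only; the class name SeenPairs and the method markTested are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SeenPairs {
    // Created once and reused across all forms, not rebuilt per iteration.
    private final Map<String, Set<String>> seen = new HashMap<>();

    // Returns true the first time a pair is recorded,
    // false if the (action, parameter) pair was already present.
    boolean markTested(String action, String parameter) {
        return seen.computeIfAbsent(action, a -> new HashSet<>()).add(parameter);
    }
}
```

In the form loop, a false result from markTested(action, parameter) would mean the combination was already tested and can be skipped.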

HashSet contains substring

I have a HashSet of Strings in the format: something_something_name="value"
Set<String> name= new HashSet<String>();
Farther down in my code I want to check if a String "name" is included in the HashSet. In this little example, if I'm checking to see if "name" is a substring of any of the values in the HashSet, I'd like it to return true.
I know that .contains() won't work since that works using .equals(). Any suggestions on the best way to handle this would be great.
With your existing data structure, the only way is to iterate over all entries checking each one in turn.
If that's not good enough, you'll need a different data structure.
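The linear scan over the existing set can be sketched like this. The class name SubstringSearch and the method anyContains are hypothetical names for illustration.

```java
import java.util.Set;

class SubstringSearch {
    // Linear scan: true if any element contains the needle as a substring.
    static boolean anyContains(Set<String> values, String needle) {
        for (String value : values) {
            if (value.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

Each call costs time proportional to the size of the set, which is usually fine for small sets or infrequent lookups.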
You can build a map (name -> strings) as follows:
Map<String, List<String>> name_2_keys = new HashMap<>();
for (String entry : names) { // names is the HashSet of full strings
String[] parts = entry.split("_");
String name = parts[2].split("=")[0]; // keep only the name, dropping the ="value" suffix
List<String> keys = name_2_keys.get(name);
if (keys == null) {
keys = new ArrayList<>();
}
keys.add(entry);
name_2_keys.put(name, keys);
}
Then retrieve all the strings containing the name name:
List<String> keys = name_2_keys.get(name);
You can keep another map where name is the key and something_something_name is the value.
Thus, you would be able to move from name -> something_something_name -> value. If you want a single interface, you can write a wrapper class around these two maps, exposing the functionality you want.
I posted a MapFilter class here a while ago.
You could use it like:
MapFilter<String> something = new MapFilter<String>(yourMap, "something_");
MapFilter<String> something_something = new MapFilter<String>(something, "something_");
You will need to make your container into a Map first.
This would only be worthwhile doing if you look for the substrings many times.

Compare two lists & print elements which are repeating in second list

I have searched but couldn't find specific to what I was looking for.
I have a list of duplicates derived from the main list.
E.G.:
duplicateSet { D16A, D2243A, D2235A}
xConnectors { D16A, xxx, xxxx, xxxx, D16A, xxxx , D2243A ,xxxx, D2243A, xxxx, D2235A, xxxx, xxxx, D2235A}
I wrote this code
Set duplicateConnectors = new HashSet();
for(String s : duplicateSet)
{
for(IXConnector xCon : xConnectors)
{
if(s.equals(xCon.getAttribute("Name")))
{
duplicateConnectors.add(xCon);
vReporter.report(getDefaultSeverity(), "Connector {0} is duplicate", xCon);
}
}
}
The output I get is
Connector D16A is duplicate
Connector D16A is duplicate
Connector D2243A is duplicate
Connector D2243A is duplicate
But I need the above output as a single line per connector name:
Connectors D16A and D16A are duplicates.
Connectors D2243A and D2243A are duplicates.
Your current code runs in quadratic time, i.e. O(n^2), which does not scale well: as your input grows, the running time grows quadratically.
You should use a HashSet to your advantage here. A hash set does not allow duplicates, and its items are hashed into an indexed array, so insertion and contains run in constant time on average. You then need only one loop that checks whether each connector name is in the hash set; if it is, it is a duplicate, and this check is also constant time. The whole algorithm therefore becomes linear.
Set<String> dupes = new HashSet<String>();
for(String s : duplicateSet)
dupes.add(s);
for(IXConnector xCon : xConnectors)
{
String name = xCon.getAttribute("Name");
if(dupes.contains(name))
vReporter.report(getDefaultSeverity(), "Connectors {0} and {0} are duplicates.", xCon);
}
If you want to print the message only once, you can change the HashSet to a HashMap and use a boolean as a value to represent whether or not you have printed a message yet.
Map<String, Boolean> dupes = new HashMap<String, Boolean>();
for(String s : duplicateSet)
dupes.put(s, false);
for(IXConnector xCon : xConnectors)
{
String name = xCon.getAttribute("Name");
if(dupes.containsKey(name) && !dupes.get(name))
{
vReporter.report(getDefaultSeverity(), "Connectors {0} and {0} are duplicates.", xCon);
dupes.put(name, true);
}
}
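As a variation, the duplicates can also be found directly by counting names in a single pass, which produces one message per repeated name without a precomputed duplicate set. This sketch works on plain name strings, because the IXConnector and vReporter APIs are not shown here; the class name DuplicateReport and the method messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateReport {
    // Builds one message per name that occurs more than once, in first-seen order.
    static List<String> messages(List<String> connectorNames) {
        // LinkedHashMap keeps the order in which names were first encountered.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : connectorNames) {
            counts.merge(name, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                out.add("Connectors " + e.getKey() + " and " + e.getKey() + " are duplicates.");
            }
        }
        return out;
    }
}
```

In the original setting, the names would come from xCon.getAttribute("Name") and each message would be passed to the reporter instead of being collected in a list.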
