Optimisation of searching HashMap with list of values


I have a map in which values have references to lists of objects.
//key1.getElements() - produces the following
[Element N330955311 ({}), Element N330955300 ({}), Element N3638066598 ({})]
I would like to search the list of every key and find each given element that occurs at least twice (count >= 2).
Currently my approach is very slow. I have a lot of data, and I know execution time is relative, but it takes about 40 seconds.
My approach:
public String occurrenceAtLeastTwo(String id)
//Search for id
//Outer loop through Map
//get first map value and return elements
//inner loop iterating through key.getElements()
//if match with id, then increment count
//return Strings with count == 2 else return null
The reason this is so slow is that I'm searching for a lot of ids (about 8000) and my map has about 3000 keys. So the work is more than 8000 * 3000 * 8000 comparisons (given that every id/element exists in the map's value lists at least once).
Please help me find a more efficient way to do this search. I don't have much Java experience, so perhaps there's something obvious I'm missing.
Edit: here is the real code, as requested:
public void findAdjacents() {
    for (int i = 0; i < nodeList.size(); i++) {
        count = 0;
        inter = null;
        container = findIntersections(nodeList.get(i));
        if (container != null) {
            intersections.add(container);
        }
    }
}
public String findIntersections(String id) {
    Set<Map.Entry<String, Element>> entrySet = wayList.entrySet();
    for (Map.Entry<String, Element> entry : entrySet) {
        w1 = (Way) entry.getValue();
        for (Node n : w1.getNodes()) {
            container2 = String.valueOf(n);
            if (container2.contains(id)) {
                count++;
            }
            if (count == 2) {
                inter = id;
                count = 0;
            }
        }
    }
    // null when the id was never counted twice
    return inter;
}

Based on the pseudocode you provided, there is no need to iterate over all the keys in the Map. You can call get(id) on the map directly. If the Map contains it, you get back the list of elements, which you can iterate to pick out the element if its count is >= 2. If the id is not there, null is returned. That way you can optimize your code a bit.
Thanks
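The get(id) shortcut only applies when the ids are the map's keys; in the question they live inside the value lists. A common alternative (a swapped-in technique, not the one proposed above) is to walk every value list once and count each element in a HashMap, so the cost is proportional to the total number of elements rather than ids × keys × elements. The sketch below assumes elements with proper equals/hashCode (plain String ids here); the class name and sample data are hypothetical. Note that it uses exact equality, whereas the question's code uses a substring test (String.contains).

```java
import java.util.*;

public class SharedElements {
    // Single pass over every value list: count how often each element occurs,
    // then keep the elements whose count reaches minCount.
    static <K, E> Set<E> occurringAtLeast(Map<K, List<E>> map, int minCount) {
        Map<E, Integer> counts = new HashMap<>();
        for (List<E> elements : map.values()) {
            for (E element : elements) {
                counts.merge(element, 1, Integer::sum); // start at 1, or add 1
            }
        }
        Set<E> result = new HashSet<>();
        for (Map.Entry<E, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: way id -> node ids
        Map<String, List<String>> ways = new HashMap<>();
        ways.put("w1", Arrays.asList("N330955311", "N330955300", "N3638066598"));
        ways.put("w2", Arrays.asList("N330955300", "N111"));
        ways.put("w3", Arrays.asList("N3638066598", "N222"));
        // TreeSet only to print in a stable order
        System.out.println(new TreeSet<>(occurringAtLeast(ways, 2)));
    }
}
```

With all counts available after one pass, looking up any of the 8000 ids is a single HashMap lookup instead of a full scan.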

Related

Read a map N keys at a time

I have a map (instaWords) which is filled with thousands of words. I need to loop over it N items at a time. Here is my code. In this code I need to read instaWords in chunks of, e.g., 500 words and execute "updateInstaPhrases" with those words. Any help?
private static Map<InstaWord, List<Integer>> instaWords = new HashMap<InstaWord, List<Integer>>();
// here a couple of words are added to instaWords
updateInstaPhrases(instaWords);
private static void updateInstaPhrases(Map<InstaWord, List<Integer>> wordMap)
        throws SQLException, UnsupportedEncodingException {
    for (Map.Entry<InstaWord, List<Integer>> entry : wordMap.entrySet()) {
        InstaWord instaWord = entry.getKey();
        List<Integer> profiles = entry.getValue();
        pst.setBytes(1, instaWord.word.getBytes("UTF-8"));
        pst.setBytes(2, instaWord.word.getBytes("UTF-8"));
        pst.setBytes(3, (instaWord.lang == null) ?
                "".getBytes("UTF-8") :
                instaWord.lang.getBytes("UTF-8"));
        String profilesList = "";
        boolean first = true;
        for (Integer p : profiles) {
            profilesList += (first ? "" : ", ") + p;
            first = false;
        }
        pst.setString(4, profilesList);
        pst.addBatch();
    }
    System.out.println("Words batch executed");
    pst.executeBatch();
    con.commit();
}
What I need is to iterate through a HashMap in chunks (e.g. 500 items at a time).

You can keep a counter, initialized to 0 and incremented for each item, while collecting the items as you see fit (for example, in an ArrayList<Map.Entry<InstaWord, List<Integer>>>). When the counter (after incrementing) equals 500, process the whole batch, reset the counter to 0 and clear the collection. After the loop ends, process whatever is left in the collection, since the last batch is usually smaller than 500.
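A minimal sketch of the counter approach, generic over the map's types. The class and method names are hypothetical, and the Consumer stands in for whatever per-batch work you need; in the question it would call pst.addBatch() for each entry and pst.executeBatch() once per chunk.

```java
import java.util.*;
import java.util.function.Consumer;

public class MapChunks {
    // Collects entries into a batch; whenever the counter reaches chunkSize,
    // the batch is processed, then the counter and the batch are reset.
    static <K, V> void forEachChunk(Map<K, V> map, int chunkSize,
                                    Consumer<List<Map.Entry<K, V>>> process) {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        int counter = 0;
        for (Map.Entry<K, V> entry : map.entrySet()) {
            batch.add(entry);
            counter++;
            if (counter == chunkSize) {
                process.accept(new ArrayList<>(batch)); // hand over a copy
                batch.clear();
                counter = 0;
            }
        }
        if (!batch.isEmpty()) {
            process.accept(batch); // the last, smaller batch
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> words = new HashMap<>();
        for (int i = 0; i < 1200; i++) {
            words.put(i, "word" + i);
        }
        forEachChunk(words, 500, chunk -> System.out.println("chunk of " + chunk.size()));
    }
}
```

With 1200 entries and a chunk size of 500 this processes batches of 500, 500 and 200.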
Another option is to let the counter control the loop and to declare explicitly the iterator you draw the Map.Entry objects from. That way it's probably a bit clearer what is going on.
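The second option could look roughly like this sketch (class and method names are hypothetical): the explicitly declared iterator supplies the entries, and the counter bounds the inner loop so each pass yields one chunk. It collects the chunks into a list for easy inspection; in the question you would run the batch update inside the while loop instead.

```java
import java.util.*;

public class MapChunksIterator {
    // The counter drives the inner loop; the explicitly declared iterator
    // supplies the Map.Entry objects, so each pass of the outer loop yields one chunk.
    static <K, V> List<List<Map.Entry<K, V>>> chunks(Map<K, V> map, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Map.Entry<K, V>>> result = new ArrayList<>();
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            List<Map.Entry<K, V>> chunk = new ArrayList<>();
            for (int counter = 0; counter < chunkSize && it.hasNext(); counter++) {
                chunk.add(it.next());
            }
            result.add(chunk);
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap only so the chunks come out in insertion order
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < 7; i++) {
            m.put("k" + i, i);
        }
        for (List<Map.Entry<String, Integer>> chunk : chunks(m, 3)) {
            System.out.println(chunk);
        }
    }
}
```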

how to find the duplicates in ArrayList using hashmap in java?

My program reads large text files (several MB) that contain source and destination IP addresses (for example 192.168.125.10,112.25.2.1). Here, read is an ArrayList that holds the data.
I have generated unique ids (uid, of type int) from srcip and destip, and now I am storing them in
static ArrayList<Integer[]> prev = new ArrayList<Integer[]>();
where each array is:
static Integer[] multi1;
multi1 = new Integer[]{(int)uid,count,flag};
I have to print all the uids with their counts (frequencies) using a HashMap.
Please suggest a solution.
for (ArrayList<String> read : readFiles.values())
{
    if (file_count <= 2)
    {
        for (int i = 0; i < read.size(); i++)
        {
            String str1 = read.get(i).split(",")[0]; //get only srcIP
            String str2 = read.get(i).split(",")[1]; //get only destIP
            StringTokenizer tokenizer1 = new StringTokenizer(str1, ".");
            StringTokenizer tokenizer2 = new StringTokenizer(str2, ".");
            if (tokenizer1.hasMoreTokens() && tokenizer2.hasMoreTokens())
            {
                sip_oct1 = Integer.parseInt(tokenizer1.nextToken());
                sip_oct2 = Integer.parseInt(tokenizer1.nextToken());
                sip_oct3 = Integer.parseInt(tokenizer1.nextToken());
                sip_oct4 = Integer.parseInt(tokenizer1.nextToken());
                dip_oct1 = Integer.parseInt(tokenizer2.nextToken());
                dip_oct2 = Integer.parseInt(tokenizer2.nextToken());
                dip_oct3 = Integer.parseInt(tokenizer2.nextToken());
                dip_oct4 = Integer.parseInt(tokenizer2.nextToken());
                uid = uniqueIdGenerator(sip_oct1, sip_oct2, sip_oct3, sip_oct4, dip_oct1, dip_oct2, dip_oct3, dip_oct4);
            }
            multi1 = new Integer[]{(int) uid, count, flag};
            prev.add(multi1);
            System.out.println(prev.get(i)[0]); //getting uids from prev
            Map<ArrayList<Integer[]>, Integer> map = new HashMap<ArrayList<Integer[]>, Integer>();
            for (int j = 0; j < prev.size(); j++)
            {
                Integer temp = map.get(prev.get(i)[0]);
                count = map.get(temp);
                map.put(temp, (count == null) ? 1 : count++);
            }
            printMap(map);
            System.out.println("uids--->" + prev.get(i)[0] + " Count--- >" + count + " flag--->" + prev.get(i)[2]);
        }
    }
    file_count++;
}
}
public static void printMap(Map<ArrayList<Integer[]>, Integer> map)
{
    for (Entry<ArrayList<Integer[]>, Integer> entry : map.entrySet())
    {
        System.out.println(" Value : " + entry.getValue() + "key : " + entry.getKey());
    }
}
public static double uniqueIdGenerator(int oc1, int oc2, int oc3, int oc4, int oc5, int oc6, int oc7, int oc8)
{
    int a, b;
    double c;
    a = ((oc1 * 10 + oc2) * 10 + oc3) * 10 + oc4;
    b = ((oc5 * 10 + oc6) * 10 + oc7) * 10 + oc8;
    c = Math.log(a) + Math.log(b);
    return Math.round(c * 1000);
}

Now that I understand what you want, there are (at least) two ways of doing this.
1st: Make a list with the uids, then a second list in which you store a value (your uid) and keep a count. I was thinking of a HashMap, but I didn't see an easy way to change the count there. Maybe an ArrayList of lists with two values each.
Then loop over your list of uids and, with a second for loop, check whether the uid is already in the second list. If it is, add one to its count; if it is not, add it to the list.
2nd: Do the same thing, but with classes (very Java). Then you can put even more info into the class ;)
Hope this helps!
*Edit: @RC. indeed gives cleaner code.
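Since the question asks for a HashMap specifically: since Java 8, Map.merge makes updating a count in place a one-liner, which avoids the nested loop over a second list. The sketch below assumes the question's Integer[]{uid, count, flag} rows, with the uid at index 0; the class name and sample values are hypothetical. It uses a LinkedHashMap (a HashMap subclass) only so uids print in first-seen order.

```java
import java.util.*;

public class UidFrequency {
    // Counts how often each uid occurs. merge(key, 1, Integer::sum) stores 1
    // for a new key and otherwise adds 1 to the existing count.
    static Map<Integer, Integer> countUids(List<Integer[]> prev) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Integer[] row : prev) {
            counts.merge(row[0], 1, Integer::sum); // row[0] holds the uid
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical rows; in the question uid comes from uniqueIdGenerator(...)
        List<Integer[]> prev = new ArrayList<>();
        prev.add(new Integer[]{3012, 0, 0});
        prev.add(new Integer[]{4477, 0, 1});
        prev.add(new Integer[]{3012, 0, 0});
        for (Map.Entry<Integer, Integer> e : countUids(prev).entrySet()) {
            System.out.println("uid--->" + e.getKey() + " count--->" + e.getValue());
        }
    }
}
```

Each row costs one expected O(1) map update, so the whole count is linear in the number of rows.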

Converting singly linked list to a map

I have been given an assignment to upgrade an existing program:
Figure out how to recode the qualifying exam problem using a Map for each terminal line, on the assumption that the size of the problem is dominated by the number of input lines, not the 500 terminal lines.
The program reads a text file in which each line has a number and a name. The number is the PC number and the name is the user who logged on. For each PC, the program returns the user who logged on most often. Here is the existing code:
public class LineUsageData {
    SinglyLinkedList<Usage> singly = new SinglyLinkedList<Usage>();
    //function to add a user to the linked list or to increment count by 1
    public void addObservation(Usage usage){
        for(int i = 0; i < singly.size(); ++i){
            if(usage.getName().equals(singly.get(i).getName())){
                singly.get(i).incrementCount(1);
                return;
            }
        }
        singly.add(usage);
    }
    //returns the user with the most connections to the PC
    public String getMaxUsage(){
        int tempHigh = 0;
        int high = 0;
        String userAndCount = "";
        for(int i = 0; i < singly.size(); ++i){//goes through list and keeps highest
            tempHigh = singly.get(i).getCount();
            if(tempHigh > high){
                high = tempHigh;
                userAndCount = singly.get(i).getName() + " " + singly.get(i).getCount();
            }
        }
        return userAndCount;
    }
}
I am having trouble with the theoretical side. We can use a HashMap or a TreeMap. I am trying to think through how I would build a map that holds the list of users for each PC. I can reuse the Usage object, which holds the name and the count of the user, but I am not supposed to alter that object.

When checking whether a Usage is present in the list, you perform a linear search each time (O(N)). If you replace your list with a Map<String, Usage>, you'll be able to search by name in sublinear time: TreeMap has O(log N) time for search and update, and HashMap has amortized O(1) (constant) time.
So, the most effective data structure in this case is HashMap.
import java.util.*;
public class LineUsageData {
    Map<String, Usage> map = new HashMap<String, Usage>();
    //function to add a user to the map or to increment count by 1
    public void addObservation(Usage usage) {
        Usage existentUsage = map.get(usage.getName());
        if (existentUsage == null) {
            map.put(usage.getName(), usage);
        } else {
            existentUsage.incrementCount(1);
        }
    }
    //returns the user with the most connections to the PC
    public String getMaxUsage() {
        Usage maxUsage = null;
        for (Usage usage : map.values()) {
            if (maxUsage == null || usage.getCount() > maxUsage.getCount()) {
                maxUsage = usage;
            }
        }
        return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
    }
    // alternative version that uses Collections.max
    public String getMaxUsageAlt() {
        Usage maxUsage = map.isEmpty() ? null :
                Collections.max(map.values(), new Comparator<Usage>() {
                    @Override
                    public int compare(Usage o1, Usage o2) {
                        // Integer.compare avoids the overflow risk of subtracting counts
                        return Integer.compare(o1.getCount(), o2.getCount());
                    }
                });
        return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
    }
}
A Map can also be iterated in time proportional to its size, so you can use the same procedure to find the maximum element in it. I gave you two options: a manual approach, or the Collections.max utility method.
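The assignment asks for a Map per terminal line, so a rough sketch of the whole flow might nest two maps: PC number to (user name to logon count). To stay self-contained, it replaces the Usage class (whose source isn't shown) with a plain Integer count, and the "number name" input format, class name and sample lines are assumptions for illustration.

```java
import java.util.*;

public class TerminalUsage {
    // Outer map: PC (terminal line) number -> inner map of user name -> logon count.
    // With HashMap, each observation is an expected O(1) lookup plus update.
    private final Map<Integer, Map<String, Integer>> usageByPc = new HashMap<>();

    void addObservation(int pc, String user) {
        usageByPc.computeIfAbsent(pc, k -> new HashMap<>()).merge(user, 1, Integer::sum);
    }

    // Returns "user count" for the most frequent user on the given PC, or null.
    String getMaxUsage(int pc) {
        Map<String, Integer> users = usageByPc.get(pc);
        if (users == null || users.isEmpty()) {
            return null;
        }
        Map.Entry<String, Integer> max =
                Collections.max(users.entrySet(), Map.Entry.comparingByValue());
        return max.getKey() + " " + max.getValue();
    }

    public static void main(String[] args) {
        TerminalUsage t = new TerminalUsage();
        // Hypothetical input lines of the form "number name"
        String[] lines = {"1 alice", "1 bob", "1 alice", "2 carol"};
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            t.addObservation(Integer.parseInt(parts[0]), parts[1]);
        }
        System.out.println(t.getMaxUsage(1));
        System.out.println(t.getMaxUsage(2));
    }
}
```

Ties between users with the same count are resolved by map iteration order, which HashMap does not define; use a TreeMap for the inner map if you need a deterministic choice.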

In simple words: you use a LinkedList (singly or doubly linked) when you have a list of items that you usually plan to traverse,
and a Map implementation when you have dictionary-like entries, where a key corresponds to a value and you plan to access the value by its key.
To convert your SinglyLinkedList to a HashMap or TreeMap, you need to decide which property of your items will be used as the key (it must be a property with unique values).
Assuming you use the name property of your Usage class, you can do this
(a simple example):
//You could also use TreeMap, depending on your needs.
Map<String, Usage> usageMap = new HashMap<String, Usage>();
//Iterate through your SinglyLinkedList.
for (Usage usage : singly) {
    //Add all items to the Map
    usageMap.put(usage.getName(), usage);
}
//Access a value using its name as the key of the Map.
Usage accessedUsage = usageMap.get("AUsageName");
Also note that
Map<String, Usage> usageMap = new HashMap<>();
is valid since Java 7, thanks to diamond type inference.

I solved this offline and didn't get a chance to see some of the answers, which look very helpful. Sorry about that, Nick and Aivean, and thanks for the responses. Here is the code I ended up writing to get this to work:
public class LineUsageData {
    Map<Integer, Usage> map = new HashMap<Integer, Usage>();
    int hash = 0;
    public void addObservation(Usage usage){
        hash = usage.getName().hashCode();
        System.out.println(hash);
        while((map.get(hash)) != null){
            if(map.get(hash).getName().equals(usage.name)){
                map.get(hash).count++;
                return;
            }else{
                hash++;
            }
        }
        map.put(hash, usage);
    }
    public String getMaxUsage(){
        String str = "";
        int tempHigh = 0;
        int high = 0;
        //for loop
        for(Integer key : map.keySet()){
            tempHigh = map.get(key).getCount();
            if(tempHigh > high){
                high = tempHigh;
                str = map.get(key).getName() + " " + map.get(key).getCount();
            }
        }
        return str;
    }
}

Java Algorithm: pair list entries by multiple case criteria

I fear this won't be an easy question. I've been thinking about a proper solution for this problem for a long time and hope that a fresh bunch of brains have a better view on the problem - let's get to it:
Data:
What we're working with here is a CSV file containing multiple columns; the relevant ones for this problem are:
User ID (Integer, 3 to 8 digits; multiple entries with the same UserID exist). THE LIST IS SORTED BY THIS.
Query (String)
Epoc (Long, epoch time in milliseconds)
clickurl (String)
Every entry in the data has non-null values for these attributes.
Example Data:
SID,UID,query,rawdate,timestamp,timegap,epoc,lengthwords,lengthchars,rank,clickurl
5,142,westchester.gov,2006-03-20 03:55:57,Mon Mar 20 03:55:57 CET 2006,0,1142823357504,1,15,1,http://www.westchestergov.com
10,142,207 ad2d 530,2006-04-08 01:31:14,Sat Apr 08 01:31:14 CEST 2006,10000,1144452674507,3,12,1,http://www.courts.state.ny.us
11,142,vera.org,2006-04-08 08:38:42,Sat Apr 08 08:38:42 CEST 2006,11000,1144478322507,1,8,1,http://www.vera.org
Note: multiple entries can have the same value for 'Epoc'; this is due to the tools used to gather the data.
Note 2: the list has about 700,000 entries.
Goal: Match pairs of entries that have the same query
Scope: entries that share the same UserID
Due to the mentioned anomaly in the data gathering process, the following has to be considered:
If two entries share the same value for 'Query' and for 'Epoc', the following elements in the list have to be checked against these criteria until the next entry has a different value for one of these attributes. The group of entries that share the same Query and Epoc values is to be considered as -one- entry, so to match a pair, another entry has to be found that matches the 'Query' value. For lack of a better name, let's call a group that shares the same Query and Epoc values a 'chain'.
With that out of the way, it gets a bit easier: there are 3 types of pair compositions we can get out of this:
Entry & Entry
Entry & Chain
Chain & Chain
Type 1 here just means two entries in the list that share the same value for 'Query', but not for 'Epoc'.
So this sums up the Equal Query Pairs.
There's also the case of Different Query Pairs, which can be described as follows:
After we have matched the equal query pairs, there may be entries that have not been paired with any other entry because their query didn't match. Every such entry is part of the set called 'different queries'.
The members of this set have to be paired without following any criteria, but chains are still treated as -one- entry of the pair.
As for matching pairs in general, there must be no redundant pairs: a single entry can be part of any number of pairs, but two individual entries can only form one pair.
EXAMPLE:
The following entries are to be paired
UID,Query,Epoc,Clickurl
772,Donuts,1141394053510,https://www.dunkindonuts.com/dunkindonuts/en.html
772,Donuts,1141394053510,https://www.dunkindonuts.com/dunkindonuts/en.html
772,Donuts,1141394053510,https://www.dunkindonuts.com/dunkindonuts/en.html
772,raspberry pi,1141394164710,http://www.raspberrypi.org/
772,stackoverflow,1141394274810,http://en.wikipedia.org/wiki/Buffer_overflow
772,stackoverflow,1141394274850,http://www.stackoverflow.com
772,tall women,1141394275921,http://www.tallwomen.org/
772,raspberry pi,1141394277991,http://www.raspberrypi.org/
772,Donuts,114139427999,http://de.wikipedia.org/wiki/Donut
772,stackoverflow,1141394279999,http://www.stackoverflow.com
772,something,1141399299991,http:/something.else/something/
In this example, Donuts is a chain, so the pairs are (line numbers without the header):
Equal Query Pairs: (1-3,9) (4,8) (5,6) (5,10) (6,10)
Different Query Pairs: (7,11)
My -failed- approach to the problem:
The algorithm I developed to solve this works as follows:
Iterate the list of entries until the value for UserID changes.
Then, apply the following to a separate list that only contains the just-iterated elements sharing the same UserID:
for (int i = 0; i < list.size(); i++) {
    Entry tempI = list.get(i);
    Boolean iMatched = false;
    //boolean to save whether or not c1 is set
    Boolean c1done = false;
    Boolean c2done = false;
    //Hashsets holding the clickurl values of the entries that form a pair
    HashSet<String> c1 = null;
    HashSet<String> c2 = null;
    for (int j = i + 1; j < list.size(); j++) {
        Entry tempJ = list.get(j);
        // Queries match
        if (tempI.getQuery().equals(tempJ.getQuery())) {
            // whether or not the Entry at position i has been matched
            if (!iMatched) {
                iMatched = true;
            }
            HashSet<String> e1 = new HashSet<String>();
            HashSet<String> e2 = new HashSet<String>();
            int k = 0;
            // Times match
            HashSet<String> chainset = new HashSet<String>();
            if (tempI.getEpoc() == tempJ.getEpoc()) {
                chainset.add(tempI.getClickurl());
                chainset.add(tempJ.getClickurl());
            } else {
                e1.add(tempI.getClickurl());
                if (c1 == null) {
                    c1 = e1;
                    c1done = true;
                } else {
                    if (c2 == null) {
                        c2 = e1;
                        c2done = true;
                    }
                }
            }
            //check how far the chain goes and get their entries
            if ((j + 1) < list.size()) {
                Entry tempjj = list.get(j + 1);
                if (tempjj.getEpoc() == tempJ.getEpoc()) {
                    k = j + 1;
                    //search for the end of the chain
                    while ((k < list.size())
                            && (tempJ.getQuery().equals(list.get(k)
                                    .getQuery()))
                            && (tempJ.getEpoc() == list.get(k).getEpoc())) {
                        chainset.add(tempJ.getClickurl());
                        chainset.add(list.get(k).getClickurl());
                        k++;
                    }
                    j = k + 1; //continue the iteration at the end of the chain
                    if (c1 == null) {
                        c1 = chainset;
                        c1done = true;
                    } else {
                        if (c2 == null) {
                            c2 = chainset;
                            c2done = true;
                        }
                    }
                    // Times don't match
                }
            } else {
                e2.add(tempJ.getClickurl());
                if (c1 == null) {
                    c1 = e2;
                    c1done = true;
                } else {
                    if (c2 == null) {
                        c2 = e2;
                        c2done = true;
                    }
                }
            }
            /** Block that compares the clicks in the Hashsets and computes the resulting data
             * left out for now to not make this any more complicated than it already is
             **/
        // Queries don't match
        } else {
            if (!dq.contains(tempJ)) { //note: dq is an ArrayList holding the entries of the different query set
                dq.add(tempJ);
            }
        }
        if (j == al.size() - 1) {
            if (!iMatched) {
                dq.add(tempI);
            }
        }
    }
    if (dq.size() >= 2) {
        for (int z = 0; z < dq.size() - 1; z++) {
            if (dq.get(z + 1) != null) {
                /** Filler, iterate dq just like the normal list with two loops
                 *
                 **/
            }
        }
    }
}
So, using an excessive number of loops, I try to match the pairs, resulting in a horribly long runtime whose end I have not yet seen.
Okay, I hope I didn't forget anything crucial; I'll add further information later if needed.
If you've made it this far, thanks for reading. Hopefully you have an idea that might help me.

Use SQL: import the data into a database and then perform the queries there. Your file is too large; it's no wonder that it takes so long to go through it. :)

First, remove all but one entry from each chain. To do this, sort by (userid, query, epoch) and remove duplicates.
Then scan the sorted list and take all entries for each (userid, query) pair. If there is only one, save it in a list for later processing; otherwise, emit all pairs.
For all the entries of a given user that you have saved for later processing (these are type 2 & 3), emit pairs.
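The three steps above can be sketched in Java roughly as follows. This is one possible reading, not a definitive implementation: Row is a hypothetical stand-in for a CSV row (click URLs omitted), chains are collapsed wherever (uid, query, epoc) match after sorting, leftover entries of a user are paired with every other leftover of that user, and pairs are reported as line-number labels, with a chain written as "first-last".

```java
import java.util.*;

public class QueryPairs {
    // Hypothetical stand-in for one CSV row: its line number plus the fields used here.
    static final class Row {
        final int line;
        final int uid;
        final String query;
        final long epoc;

        Row(int line, int uid, String query, long epoc) {
            this.line = line;
            this.uid = uid;
            this.query = query;
            this.epoc = epoc;
        }
    }

    static List<String> pairs(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((Row r) -> r.uid)
                .thenComparing(r -> r.query)
                .thenComparingLong(r -> r.epoc)
                .thenComparingInt(r -> r.line));

        // Step 1: collapse each run with equal (uid, query, epoc) into a single unit.
        List<Row> units = new ArrayList<>();
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < sorted.size(); ) {
            int j = i;
            while (j + 1 < sorted.size() && sameKey(sorted.get(i), sorted.get(j + 1), true)) {
                j++;
            }
            units.add(sorted.get(i));
            labels.add(i == j ? String.valueOf(sorted.get(i).line)
                              : sorted.get(i).line + "-" + sorted.get(j).line);
            i = j + 1;
        }

        List<String> result = new ArrayList<>();
        List<String> leftovers = new ArrayList<>(); // unmatched units of the current user
        for (int i = 0; i < units.size(); ) {
            // Step 2: after sorting, units with the same (uid, query) are adjacent.
            int j = i;
            while (j + 1 < units.size() && sameKey(units.get(i), units.get(j + 1), false)) {
                j++;
            }
            if (i == j) {
                leftovers.add(labels.get(i));                   // saved for later
            } else {
                emitAllPairs(labels.subList(i, j + 1), result); // equal query pairs
            }
            // Step 3: when this user's units end, pair up the user's leftovers.
            if (j + 1 == units.size() || units.get(j + 1).uid != units.get(i).uid) {
                emitAllPairs(leftovers, result);
                leftovers.clear();
            }
            i = j + 1;
        }
        return result;
    }

    static boolean sameKey(Row a, Row b, boolean compareEpoc) {
        return a.uid == b.uid && a.query.equals(b.query) && (!compareEpoc || a.epoc == b.epoc);
    }

    // Emits every unordered pair once, ordering labels by their first line number.
    static void emitAllPairs(List<String> labels, List<String> out) {
        List<String> byLine = new ArrayList<>(labels);
        byLine.sort(Comparator.comparingInt(s -> Integer.parseInt(s.split("-")[0])));
        for (int a = 0; a < byLine.size(); a++) {
            for (int b = a + 1; b < byLine.size(); b++) {
                out.add("(" + byLine.get(a) + "," + byLine.get(b) + ")");
            }
        }
    }

    public static void main(String[] args) {
        // The example from the question (line, UID, Query, Epoc).
        List<Row> rows = Arrays.asList(
                new Row(1, 772, "Donuts", 1141394053510L),
                new Row(2, 772, "Donuts", 1141394053510L),
                new Row(3, 772, "Donuts", 1141394053510L),
                new Row(4, 772, "raspberry pi", 1141394164710L),
                new Row(5, 772, "stackoverflow", 1141394274810L),
                new Row(6, 772, "stackoverflow", 1141394274850L),
                new Row(7, 772, "tall women", 1141394275921L),
                new Row(8, 772, "raspberry pi", 1141394277991L),
                new Row(9, 772, "Donuts", 114139427999L),
                new Row(10, 772, "stackoverflow", 1141394279999L),
                new Row(11, 772, "something", 1141399299991L));
        System.out.println(pairs(rows));
    }
}
```

On the question's example this should reproduce the pairs listed there: (1-3,9) (4,8) (5,6) (5,10) (6,10) and (7,11). Apart from the quadratic pair emission within a group, which is inherent in the output, the cost is dominated by a single O(n log n) sort.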

How to use indexOf on a List containing HashMap with multiple key/value pairs

I have a List containing HashMaps. Each HashMap in the list might have multiple key/value pairs. I want to call indexOf on the list to find the index of the element that matches the passed-in HashMap. However, the problem is that HashMap's equals method compares the entire entrySet, which is not what I want.
Example:
List<HashMap> benefit = new ArrayList<HashMap>();
HashMap map1 = new HashMap();
map1.put("number", "1");
benefit.add(map1);
HashMap map2 = new HashMap();
map2.put("number", "2");
map2.put("somethingelse", "blahblah"); //1
benefit.add(map2);
HashMap find = new HashMap();
find.put("number", "2");
int index = benefit.indexOf(find);
if (index >= 0)
    System.out.println(benefit.get(index).get("number"));
The above code does not print anything because of the line marked //1.
What do I have to do so that the above code actually prints 2?
Is there a way to implement Comparable on the list so that I can define my own comparison?

I think you're looking for retainAll(), so that you compare only the entries you're interested in:
int index = myIndexOf(benefit, find);
...
static int myIndexOf(List<HashMap> benefit, Map find) {
    int i = 0;
    for (Map map : benefit) {
        // copy the map, then keep only the keys that also appear in find
        Map tmp = new HashMap(map);
        tmp.keySet().retainAll(find.keySet());
        if (tmp.equals(find)) {
            return i;
        }
        i++;
    }
    return -1;
}
It's possible, of course, to declare your own subclass of List that overrides the indexOf method with this behaviour. However, I don't think that's a good idea. It would violate the contract of the indexOf method:
returns the lowest index i such that (o==null ? get(i)==null : o.equals(get(i)))
This would be confusing to someone else maintaining the code. You might then think that you could subclass HashMap to redefine equals, but that would violate the symmetry property of Object.equals().

The way you are trying to achieve your goal is wrong. The indexOf method works exactly as it should in this case. It is trying to find an exact match, not a partial one.
What you are trying to do, if I understand correctly, is to find a map in your list that contains a specific entry. In that case you should perform the search manually: go through all the maps, call containsKey() on each, and then compare the value you expect to find with the value associated with the key.
The other way would be to create a proxy class around your List and add a new method findMapWithEntry(String key, String value), which would perform this search for you (the same search described above).
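A minimal sketch of that proxy idea, assuming String keys and values. The class name MapListProxy is hypothetical, and returning an index (rather than the map itself) is a design choice made here so the result can be used like indexOf.

```java
import java.util.*;

public class MapListProxy {
    // Wraps a List of maps and adds a search for a single key/value entry.
    private final List<Map<String, String>> maps;

    MapListProxy(List<Map<String, String>> maps) {
        this.maps = maps;
    }

    // Returns the index of the first map that contains key with exactly this value, or -1.
    int findMapWithEntry(String key, String value) {
        for (int i = 0; i < maps.size(); i++) {
            Map<String, String> m = maps.get(i);
            if (m.containsKey(key) && Objects.equals(m.get(key), value)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Map<String, String>> benefit = new ArrayList<>();
        Map<String, String> map1 = new HashMap<>();
        map1.put("number", "1");
        benefit.add(map1);
        Map<String, String> map2 = new HashMap<>();
        map2.put("number", "2");
        map2.put("somethingelse", "blahblah");
        benefit.add(map2);

        int index = new MapListProxy(benefit).findMapWithEntry("number", "2");
        if (index >= 0) {
            System.out.println(benefit.get(index).get("number"));
        }
    }
}
```

Unlike overriding indexOf, this leaves the List and Map equality contracts untouched; the partial match is explicit in the method name.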

Why not change the way you search?
List<Map> matchingBenefits = new ArrayList<Map>();
for (Map m : benefit) {
    if (m.containsKey("number") && m.get("number").equals("2"))
        matchingBenefits.add(m);
}
for (Map m : matchingBenefits) {
    System.out.println(m.get("number"));
}

You can always override the indexOf method. Looking at the source for ArrayList:
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}
So it's not a very complex search algorithm at all. You may look at something like:
List<Map<String, String>> benefit = new ArrayList<Map<String, String>>() {
    @Override
    public int indexOf(Object o) {
        if (o == null) {
            return super.indexOf(null);
        }
        // assuming one pair: take the single key/value entry of the searched map
        Map.Entry<?, ?> wanted = ((Map<?, ?>) o).entrySet().iterator().next();
        // traverse the hashmaps; size() and get(i) are used because
        // ArrayList's size and elementData fields are private
        for (int i = 0; i < size(); i++) {
            Map<String, String> candidate = get(i);
            if (candidate.containsKey(wanted.getKey())
                    && java.util.Objects.equals(candidate.get(wanted.getKey()), wanted.getValue())) {
                return i;
            }
        }
        return -1;
    }
};
My advice would be to consider a different data structure, perhaps writing your own one for it.

Given that you cannot change the design, would writing your own find method help?
The code below should work if I understood correctly what you're trying to do, and it runs in O(n):
public static String find(List<HashMap<String, String>> listMap, String key, String value) {
    for (int i = 0; i < listMap.size(); i++) {
        // value.equals(...) avoids a NullPointerException for maps without the key
        if (value.equals(listMap.get(i).get(key))) {
            return value;
        }
    }
    return null;
}
