Read a map N keys at a time - Java

I have a map (instaWords) which is filled with thousands of words. I need to loop over it N items at a time. In the code below I need to read instaWords in chunks of, e.g., 500 words and execute "updateInstaPhrases" with each chunk. Any help?
private static Map<InstaWord, List<Integer>> instaWords = new HashMap<InstaWord, List<Integer>>();

// here a couple of words are added to instaWords
updateInstaPhrases(instaWords);

private static void updateInstaPhrases(Map<InstaWord, List<Integer>> wordMap)
        throws SQLException, UnsupportedEncodingException {
    for (Map.Entry<InstaWord, List<Integer>> entry : wordMap.entrySet()) {
        InstaWord instaWord = entry.getKey();
        List<Integer> profiles = entry.getValue();
        pst.setBytes(1, instaWord.word.getBytes("UTF-8"));
        pst.setBytes(2, instaWord.word.getBytes("UTF-8"));
        pst.setBytes(3, (instaWord.lang == null)
                ? "".getBytes("UTF-8")
                : instaWord.lang.getBytes("UTF-8"));
        String profilesList = "";
        boolean first = true;
        for (Integer p : profiles) {
            profilesList += (first ? "" : ", ") + p;
            first = false;
        }
        pst.setString(4, profilesList);
        pst.addBatch();
    }
    System.out.println("Words batch executed");
    pst.executeBatch();
    con.commit();
}
What I need is to iterate through a HashMap 'in chunks' (e.g. 500 items each time).

You may keep a counter, initialized to 0 and incremented for each item, while collecting the entries in whatever container fits (say, an ArrayList<Map.Entry<InstaWord, List<Integer>>>). If the counter (after incrementing) equals 500, process the whole batch, reset the counter to 0 and clear the collection.
Another option is to have the counter control the loop and declare explicitly the iterator you draw the Map.Entry objects from. That way it is probably a bit clearer what is going on.
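A minimal sketch of the first option, reusing the question's names and assuming updateInstaPhrases can be called once per chunk:
private static final int CHUNK_SIZE = 500;

private static void updateInChunks(Map<InstaWord, List<Integer>> wordMap)
        throws SQLException, UnsupportedEncodingException {
    Map<InstaWord, List<Integer>> chunk = new HashMap<>();
    for (Map.Entry<InstaWord, List<Integer>> entry : wordMap.entrySet()) {
        chunk.put(entry.getKey(), entry.getValue());
        if (chunk.size() == CHUNK_SIZE) {
            updateInstaPhrases(chunk); // flush a full batch of 500
            chunk.clear();             // start collecting the next chunk
        }
    }
    if (!chunk.isEmpty()) {
        updateInstaPhrases(chunk);     // flush the remaining (< 500) entries
    }
}
The iterator variant just replaces the for-each with an explicit wordMap.entrySet().iterator() and an inner loop that draws up to 500 entries per pass; the bookkeeping is identical.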

Related

Java HashMap performing put() and get() in same line

I am using a HashMap<String, Integer> to keep track of the number of occurrences of specific strings. I am performing this operation in a single-threaded manner, in the following way:
HashMap<String, Integer> count = new HashMap<>();
// List<String> words = ...;
for (String word : words) {
    if (!count.containsKey(word)) {
        count.put(word, 0);
    }
    count.put(word, count.get(word) + 1);
}
Is it possible that, for the same word, the count increases by more than 1 because I am performing a put and a get on the same key at the same time? I.e., say word = "hello" and initially count.get(word) => 1. When I perform count.put(word, count.get(word) + 1), could count.get(word) then return 3 instead of 2?
To answer your question directly: no, it is not possible for the statement count.put(word, count.get(word) + 1) to increment the value by more than 1. Although the two method calls are in the same statement, they are performed sequentially: the get is performed first, to find the second argument to pass to the put.
You can combine your missing-key test and initialisation into a single statement:
count.putIfAbsent(word, 0);
Note, though, that putIfAbsent returns the previous value, or null when the key was absent, so count.put(word, 1 + count.putIfAbsent(word, 0)) would throw a NullPointerException for a new word. To fold the lookup and the default into one expression, use getOrDefault:
count.put(word, 1 + count.getOrDefault(word, 0));
However there is also a method that already combines those two operations:
count.merge(word, 1, Integer::sum);
Map has the methods compute and merge, which allow shorter updates of the values for the keys:
compute
for (String word : words) {
    count.compute(word, (w, prev) -> prev == null ? 1 : prev + 1);
}
merge
for (String word : words) {
    count.merge(word, 1, (prev, one) -> prev + one);
}
The lambda expression (prev, one) -> prev + one is simply a function of two Integer arguments returning their sum, so it may be replaced with the method reference Integer::sum:
for (String word : words) {
    count.merge(word, 1, Integer::sum);
}
It is absolutely safe to do this in a single thread.
No, it's not possible that the "count increases by more than 1 because I am performing a put and get on the same key at the same time": two operations can never happen at the same time in single-threaded execution.
The code count.put(word, count.get(word) + 1); executes its steps in the following order:
Integer value1 = count.get(word);
int value2 = value1.intValue();
int value3 = value2 + 1;
Integer value4 = Integer.valueOf(value3); // autoboxing uses Integer.valueOf
count.put(word, value4);
By the way, your code will produce quite a lot of garbage and will not be very efficient.
This way is more effective:
private static class CounterHolder {
    int value;
}

Map<String, CounterHolder> count = new HashMap<>();
List<String> words = ...
for (String word : words) {
    CounterHolder holder = count.get(word);
    if (holder == null) {
        holder = new CounterHolder();
        count.put(word, holder);
    }
    ++holder.value;
}

Optimisation of searching HashMap with list of values

I have a map in which values have references to lists of objects.
//key1.getElements() - produces the following
[Element N330955311 ({}), Element N330955300 ({}), Element N3638066598 ({})]
I would like to search every key's list and find whether a given element occurs at least twice.
Currently my approach to this is very slow; I have a lot of data, and while I know execution time is relative, it takes ~40 seconds.
My approach:
public String occurrence(String id) // pseudocode: find ids occurring >= 2 times
    // Search for id
    // Outer loop through Map
    //   get first map value and return elements
    //   inner loop iterating through key.getElements()
    //     if match with id, then increment count
    // return Strings with count == 2, else return null
The reason this is so slow is that I'm searching for a lot of ids, ~8000, and I have ~3000 keys in my map. So it's > 8000*3000*8000 (given that every id/element exists in the key/value map at least once).
Please help me with a more efficient way to do this search. I'm not too deep into practising Java, so perhaps there's something obvious I'm missing.
Edit: real code added after a request:
public void findAdjacents() {
    for (int i = 0; i < nodeList.size(); i++) {
        count = 0;
        inter = null;
        container = findIntersections(nodeList.get(i));
        if (container != null) {
            intersections.add(container);
        }
    }
}

public String findIntersections(String id) {
    Set<Map.Entry<String, Element>> entrySet = wayList.entrySet();
    for (Map.Entry entry : entrySet) {
        w1 = (Way) wayList.get(entry.getKey());
        for (Node n : w1.getNodes()) {
            container2 = String.valueOf(n);
            if (container2.contains(id)) {
                count++;
            }
            if (count == 2) {
                inter = id;
                count = 0;
            }
        }
    }
    if (inter != null)
        return inter;
    else
        return null;
}
Based on the pseudocode you provided, there is no need to iterate over all the keys in the Map: you can directly call get(id) on it. If the Map contains the id, you get back the list of elements, which you can iterate to check whether the element occurs >= 2 times; if the id is not there, get returns null. In that case you can optimize your code a bit.
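If the ids are not literally the map's keys (the posted code matches them inside each way's node list), another option is to scan wayList once, build a frequency map of every node id, and then answer each of the ~8000 queries with a single O(1) lookup. A sketch reusing the question's wayList, Way and Node names, assuming exact id matches rather than the original substring test:
private final Map<String, Integer> occurrences = new HashMap<>();

// Build the index once: count every node id across all ways.
public void buildIndex() {
    for (Map.Entry<String, Element> entry : wayList.entrySet()) {
        Way w = (Way) wayList.get(entry.getKey());
        for (Node n : w.getNodes()) {
            occurrences.merge(String.valueOf(n), 1, Integer::sum);
        }
    }
}

// Each query is now a single lookup instead of a scan of every way.
public String findIntersections(String id) {
    Integer count = occurrences.get(id);
    return (count != null && count >= 2) ? id : null;
}
This turns roughly 8000 full scans of the map into one scan plus 8000 constant-time lookups.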
Thanks

How to find the duplicates in an ArrayList using a HashMap in Java?

My program reads large txt files (MBs in size) which contain a source IP and a destination IP (for example 192.168.125.10,112.25.2.1). Here read is an ArrayList in which the data is present.
I have generated unique ids (uid, an int) from srcip and destip, and I am now storing them in
static ArrayList<Integer[]> prev = new ArrayList<Integer[]>();
where the array is:
static Integer[] multi1;
multi1 = new Integer[]{(int) uid, count, flag};
I have to print all the uids with their counts (frequencies) using a HashMap. Please suggest a solution.
for (ArrayList<String> read : readFiles.values())
{
    if (file_count <= 2)
    {
        for (int i = 0; i < read.size(); i++)
        {
            String str1 = read.get(i).split(",")[0]; // get only srcIP
            String str2 = read.get(i).split(",")[1]; // get only destIP
            StringTokenizer tokenizer1 = new StringTokenizer(str1, ".");
            StringTokenizer tokenizer2 = new StringTokenizer(str2, ".");
            if (tokenizer1.hasMoreTokens() && tokenizer2.hasMoreTokens())
            {
                sip_oct1 = Integer.parseInt(tokenizer1.nextToken());
                sip_oct2 = Integer.parseInt(tokenizer1.nextToken());
                sip_oct3 = Integer.parseInt(tokenizer1.nextToken());
                sip_oct4 = Integer.parseInt(tokenizer1.nextToken());
                dip_oct1 = Integer.parseInt(tokenizer2.nextToken());
                dip_oct2 = Integer.parseInt(tokenizer2.nextToken());
                dip_oct3 = Integer.parseInt(tokenizer2.nextToken());
                dip_oct4 = Integer.parseInt(tokenizer2.nextToken());
                uid = uniqueIdGenerator(sip_oct1, sip_oct2, sip_oct3, sip_oct4, dip_oct1, dip_oct2, dip_oct3, dip_oct4);
            }
            multi1 = new Integer[]{(int) uid, count, flag};
            prev.add(multi1);
            System.out.println(prev.get(i)[0]); // getting uids from prev
            Map<ArrayList<Integer[]>, Integer> map = new HashMap<ArrayList<Integer[]>, Integer>();
            for (int j = 0; j < prev.size(); j++)
            {
                Integer temp = map.get(prev.get(i)[0]);
                count = map.get(temp);
                map.put(temp, (count == null) ? 1 : count++);
            }
            printMap(map);
            System.out.println("uids--->" + prev.get(i)[0] + " Count--->" + count + " flag--->" + prev.get(i)[2]);
        }
    }
    file_count++;
}
}

public static void printMap(Map<ArrayList<Integer[]>, Integer> map)
{
    for (Entry<ArrayList<Integer[]>, Integer> entry : map.entrySet())
    {
        System.out.println(" Value : " + entry.getValue() + " key : " + entry.getKey());
    }
}

public static double uniqueIdGenerator(int oc1, int oc2, int oc3, int oc4, int oc5, int oc6, int oc7, int oc8)
{
    int a, b;
    double c;
    a = ((oc1 * 10 + oc2) * 10 + oc3) * 10 + oc4;
    b = ((oc5 * 10 + oc6) * 10 + oc7) * 10 + oc8;
    c = Math.log(a) + Math.log(b);
    return Math.round(c * 1000);
}
Now, understanding what you want, there are (at least) two ways of doing this.
1st: Make a list with the uids. Then keep a second structure in which you store each value (your uid) together with a count. A HashMap works for this; before Java 8 updating the count took a get followed by a put, and since Java 8 a single merge call does it. An ArrayList of two-value entries works too.
Then loop over your list of uids and check whether each uid is already in the second structure. If it is, add one to its count; if it is not, add it with a count of one.
2nd: Do the same thing, but with classes (very Java). Then you can put even more info into the class ;)
Hope this helps! A minimal sketch of the map-based counting follows.
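A sketch of that counting, assuming prev holds Integer[]{uid, count, flag} triples as in the question (class and sample values here are illustrative):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UidCounter {
    public static void main(String[] args) {
        // Stand-in for the question's prev list of {uid, count, flag} triples.
        List<Integer[]> prev = new ArrayList<>();
        prev.add(new Integer[]{1234, 0, 0});
        prev.add(new Integer[]{5678, 0, 0});
        prev.add(new Integer[]{1234, 0, 0});

        // merge inserts 1 for a new uid, otherwise adds 1 to the existing count.
        Map<Integer, Integer> frequencies = new HashMap<>();
        for (Integer[] triple : prev) {
            frequencies.merge(triple[0], 1, Integer::sum);
        }

        for (Map.Entry<Integer, Integer> e : frequencies.entrySet()) {
            System.out.println("uid ---> " + e.getKey() + " count ---> " + e.getValue());
        }
    }
}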
*edit: #RC. indeed gives cleaner code.

Converting singly linked list to a map

I have been given an assignment to upgrade an existing program.
Figure out how to recode the qualifying exam problem using a Map for each terminal line, on the assumption that the size of the problem is dominated by the number of input lines, not the 500 terminal lines.
The program takes in a text file that has number, name pairs. The number is a PC number and the name is the user who logged on. The program returns, for each PC, the user who logged on the most. Here is the existing code:
public class LineUsageData {
    SinglyLinkedList<Usage> singly = new SinglyLinkedList<Usage>();

    // function to add a user to the linked list or to increment count by 1
    public void addObservation(Usage usage) {
        for (int i = 0; i < singly.size(); ++i) {
            if (usage.getName().equals(singly.get(i).getName())) {
                singly.get(i).incrementCount(1);
                return;
            }
        }
        singly.add(usage);
    }

    // returns the user with the most connections to the PC
    public String getMaxUsage() {
        int tempHigh = 0;
        int high = 0;
        String userAndCount = "";
        for (int i = 0; i < singly.size(); ++i) { // goes through list and keeps highest
            tempHigh = singly.get(i).getCount();
            if (tempHigh > high) {
                high = tempHigh;
                userAndCount = singly.get(i).getName() + " " + singly.get(i).getCount();
            }
        }
        return userAndCount;
    }
}
I am having trouble on the theoretical side. We can use a HashMap or a TreeMap. I am trying to think through how I would form a map that would hold the list of users for each PC. I can reuse the Usage object, which holds the name and the count for a user; I am not supposed to alter that object, though.
When checking whether a Usage is present in the list you perform a linear search each time (O(N)). If you replace your list with a Map<String, Usage>, you'll be able to search for a name in sublinear time: a TreeMap has O(log N) search and update, a HashMap amortized O(1) (constant) time.
So the most effective data structure in this case is a HashMap.
import java.util.*;

public class LineUsageData {
    Map<String, Usage> map = new HashMap<String, Usage>();

    // function to add a user to the map or to increment count by 1
    public void addObservation(Usage usage) {
        Usage existentUsage = map.get(usage.getName());
        if (existentUsage == null) {
            map.put(usage.getName(), usage);
        } else {
            existentUsage.incrementCount(1);
        }
    }

    // returns the user with the most connections to the PC
    public String getMaxUsage() {
        Usage maxUsage = null;
        for (Usage usage : map.values()) {
            if (maxUsage == null || usage.getCount() > maxUsage.getCount()) {
                maxUsage = usage;
            }
        }
        return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
    }

    // alternative version that uses Collections.max
    public String getMaxUsageAlt() {
        Usage maxUsage = map.isEmpty() ? null :
                Collections.max(map.values(), new Comparator<Usage>() {
                    @Override
                    public int compare(Usage o1, Usage o2) {
                        // Integer.compare avoids the overflow risk of subtracting counts
                        return Integer.compare(o1.getCount(), o2.getCount());
                    }
                });
        return maxUsage == null ? null : maxUsage.getName() + " " + maxUsage.getCount();
    }
}
A Map can also be iterated in time proportional to its size, so you can use the same procedure to find the maximum element in it. I gave you two options: the manual approach, or the Collections.max utility method.
In simple words: you use a LinkedList (singly or doubly linked) when you have a list of items that you usually plan to traverse, and a Map implementation when you have dictionary-like entries, where a key corresponds to a value and you plan to access values by their keys.
In order to convert your SinglyLinkedList to a HashMap or TreeMap, you need to find out which property of your items will be used as the key (it must be a property with unique values).
Assuming you are using the name property from your Usage class, you can do this (a simple example):
// You could also use TreeMap, depending on your needs.
Map<String, Usage> usageMap = new HashMap<String, Usage>();
// Iterate through your SinglyLinkedList.
for (Usage usage : singly) {
    // Add all items to the Map.
    usageMap.put(usage.getName(), usage);
}
// Access a value using its name as the key of the Map.
Usage accessedUsage = usageMap.get("AUsageName");
Also note that:
Map<String, Usage> usageMap = new HashMap<>();
is valid, thanks to the diamond operator.
I solved this offline and didn't get a chance to see some of the answers, which looked very helpful. Sorry about that, Nick and Aivean, and thanks for the responses. Here is the code I ended up writing to get this to work:
public class LineUsageData {
    Map<Integer, Usage> map = new HashMap<Integer, Usage>();
    int hash = 0;

    public void addObservation(Usage usage) {
        hash = usage.getName().hashCode();
        System.out.println(hash);
        while ((map.get(hash)) != null) {
            if (map.get(hash).getName().equals(usage.name)) {
                map.get(hash).count++;
                return;
            } else {
                hash++;
            }
        }
        map.put(hash, usage);
    }

    public String getMaxUsage() {
        String str = "";
        int tempHigh = 0;
        int high = 0;
        // for loop
        for (Integer key : map.keySet()) {
            tempHigh = map.get(key).getCount();
            if (tempHigh > high) {
                high = tempHigh;
                str = map.get(key).getName() + " " + map.get(key).getCount();
            }
        }
        return str;
    }
}

Most efficient way to check if a pair is present in a text

Introduction:
One of the features used by many sentiment analysis programs is calculated by assigning to relevant unigrams, bigrams or pairs a specific score according to a lexicon. In more detail:
An example lexicon could be:
//unigrams
good 1
bad -1
great 2
//bigrams
good idea 1
bad idea -1
//pairs (--- stands for whatever):
hold---up -0.62
how---i still -0.62
Given a sample text T, for each unigram, bigram or pair in T I want to check whether a correspondence is present in the lexicon.
The unigram/bigram part is easy: I load the lexicon into a Map and then iterate over my text, checking each word for presence in the dictionary (a sketch follows). My problem is with detecting pairs.
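A sketch of that easy unigram case (the lexicon contents and names here are only illustrative):
import java.util.HashMap;
import java.util.Map;

public class UnigramScorer {
    public static void main(String[] args) {
        // Load the unigram lexicon once.
        Map<String, Float> lexicon = new HashMap<>();
        lexicon.put("good", 1f);
        lexicon.put("bad", -1f);
        lexicon.put("great", 2f);

        // Iterate the text, checking each word against the dictionary.
        String text = "a great plan and a good idea";
        float score = 0f;
        for (String word : text.split("\\s+")) {
            Float value = lexicon.get(word); // O(1) lookup per token
            if (value != null) {
                score += value;
            }
        }
        System.out.println("score = " + score); // prints: score = 3.0
    }
}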
My Problem:
One way to check whether specific pairs are present in my text would be to iterate over the whole lexicon of pairs and use a regex on the text, checking for each lexicon entry whether "start_of_pair.*end_of_pair" matches. This seems very wasteful, because I'd have to iterate over the WHOLE lexicon for each text to analyze. Any ideas on how to do this in a smarter way?
Related questions: Most Efficient Way to Check File for List of Words and Java: Most efficient way to check if a String is in a wordlist
One could realize a frequency map of bigrams as:
Map<String, Map<String, Integer>> bigramFrequencyMap = new TreeMap<>();
mapping first lexeme, then second lexeme, to a frequency count. Fill the map with the desired bigrams at an initial frequency of 0.
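For example, registering the bigram "good idea" (a hypothetical lexicon entry; the inner TreeMap is an arbitrary choice):
// Create the inner map for "good" on first use, then register the ending.
bigramFrequencyMap
        .computeIfAbsent("good", k -> new TreeMap<>())
        .put("idea", 0);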
static final int MAX_DISTANCE = 5;
Then a lexical scan would keep the second-lexeme frequency maps of the last MAX_DISTANCE lexemes:
List<Map<String, Integer>> lastLexemes = new ArrayList<>();

void processLexeme() {
    String lexeme = readLexeme();
    // Check whether this lexeme completes a bigram started earlier:
    for (Map<String, Integer> prior : lastLexemes) {
        Integer freq = prior.get(lexeme);
        if (freq != null) {
            prior.put(lexeme, 1 + freq);
        }
    }
    Map<String, Integer> lexemeSecondFrequencies = bigramFrequencyMap.get(lexeme);
    if (lexemeSecondFrequencies != null) {
        // Could remove lexemeSecondFrequencies if already present in lastLexemes.
        lastLexemes.add(0, lexemeSecondFrequencies); // addFirst
        if (lastLexemes.size() > MAX_DISTANCE) {
            lastLexemes.remove(lastLexemes.size() - 1); // removeLast
        }
    }
}
The optimization is to keep only the bigrams' second halves, so that only registered bigrams are handled.
In the end I solved it this way:
I loaded the pair lexicon as a Map<String, Map<String, Float>>, where the outer key is the first half of a pair and the inner map holds all the possible endings for that start together with the corresponding sentiment value.
Basically I keep a list of possible endings (enabledTokens), which I extend each time I read a new token, and then I search this list to see whether the current token is the ending of some previous pair.
With a few modifications to prevent the previous token from being used right away as an ending, this is my code:
private Map<String, Map<String, Float>> firstPartMap;
private List<LexiconPair> enabledTokensForUnigrams, enabledTokensForBigrams;
private Queue<List<LexiconPair>> pairsForBigrams; // is initialized with two empty lists
private Token oldToken;

public void parseToken(Token token) {
    String unigram = token.getText();
    String bigram = null;
    if (oldToken != null) {
        bigram = oldToken.getText() + " " + token.getText();
    }
    checkIfPairMatchesAndUpdateFeatures(unigram, enabledTokensForUnigrams);
    checkIfPairMatchesAndUpdateFeatures(bigram, enabledTokensForBigrams);
    List<LexiconPair> pairEndings = toPairs(firstPartMap.get(unigram));
    if (bigram != null) pairEndings.addAll(toPairs(firstPartMap.get(bigram)));
    pairsForBigrams.add(pairEndings);
    enabledTokensForUnigrams.addAll(pairEndings);
    enabledTokensForBigrams.addAll(pairsForBigrams.poll());
    oldToken = token;
}
private void checkIfPairMatchesAndUpdateFeatures(String text, List<LexiconPair> listToCheck) {
    Iterator<LexiconPair> iter = listToCheck.iterator();
    while (iter.hasNext()) {
        LexiconPair next = iter.next();
        if (next.getText().equals(text)) {
            float val = next.getValue();
            POLARITY polarity = getPolarity(val);
            for (LexiconFeatureSubset lfs : lexiconsFeatures) {
                lfs.handleNewValue(Math.abs(val), polarity);
            }
            // iter.remove();
            // return; // remove only 1 occurrence
        }
    }
}
