I am trying to add the following pairs to a collection:
698xxxxxxx - personA
698xxxxxxx - personB
699xxxxxxx - personA
699xxxxxxx - personB
I go through a lot of files and try to add the pairs I find there to a collection. I want to be able to have a table that shows each number and the people it was correlated with, without having duplicate PAIRS. For example:
1-personA ok
1-personB ok
2-personA ok
3-personB ok
3-personB NOT OK as it's already there
I tried using a Multimap, but I'm not sure if it's the right choice. Whatever the solution is, please show me how to iterate through its values as well so I can use the pairs. Sorry for the demanding post, but I'm new to Java and I find it a little hard to understand the APIs.
Thanks in advance
There are three obvious alternatives, depending on what you require.
If there can only be one person for each phone number, then use a simple Map<PhoneNo, Name>.
If a given phone number can be associated with multiple people, then use either a Map<PhoneNo, Set<Name>> or a multi-map class.
If you also want to find out the phone number or numbers for each person, you need two maps or two multi-maps ... or a bidirectional map.
There is a secondary choice you need to make: hash-table versus tree-based organizations. A hash table will give you O(1) lookup/insert/remove (assuming that the hash function is good). A tree-based implementation gives O(logN) operations ... but it also allows you to iterate over the entries (or values) in key order.
While the standard Java class libraries don't provide multi-maps or bidirectional maps, they can easily be implemented by combining the simple collection classes.
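A minimal sketch of the Map<PhoneNo, Set<Name>> option, assuming numbers and names are plain Strings (the class and method names PhoneBook, add, peopleFor and print are illustrative, not from any library). The outer TreeMap is the tree-based choice, so numbers come out in key order when iterating; each value is a HashSet, so adding the same number/person pair twice has no effect.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PhoneBook {
    // number -> people seen with that number; TreeMap iterates the numbers in sorted order
    private final Map<String, Set<String>> peopleByNumber = new TreeMap<>();

    // Returns true if the pair is new, false if it was already recorded
    boolean add(String number, String person) {
        return peopleByNumber.computeIfAbsent(number, k -> new HashSet<>()).add(person);
    }

    // All people correlated with a number (empty set if the number is unknown)
    Set<String> peopleFor(String number) {
        return peopleByNumber.getOrDefault(number, Collections.emptySet());
    }

    // Iterate over every (number, person) pair
    void print() {
        for (Map.Entry<String, Set<String>> e : peopleByNumber.entrySet()) {
            for (String person : e.getValue()) {
                System.out.println(e.getKey() + " - " + person);
            }
        }
    }

    public static void main(String[] args) {
        PhoneBook book = new PhoneBook();
        book.add("1", "personA");
        book.add("1", "personB");
        book.add("2", "personA");
        book.add("3", "personB");
        boolean addedAgain = book.add("3", "personB"); // duplicate pair is ignored
        System.out.println("added again? " + addedAgain);
        book.print();
    }
}
```

If you also need the reverse lookup (person to numbers), a second map of the same shape, updated in add, would do it.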
You can use the Map interface in Java, which stores key-value pairs.
You can have this as a reference: http://www.tutorialspoint.com/java/java_map_interface.htm
You may need a HashMap with the name of the person as the key and a HashSet of numbers as the value. A HashSet does not allow duplicates, so duplicate numbers will not be stored in it. Here is the code:
Map<String, Set<String>> records = new HashMap<>();
In Java there are a couple of options. If you don't know the cardinality of persons or numbers, then go for:
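public class Pair {
    final String number, person;
    Pair(String number, String person) { this.number = number; this.person = person; }
    @Override public boolean equals(Object o) { // needed so a HashSet can detect duplicate pairs
        return o instanceof Pair && number.equals(((Pair) o).number) && person.equals(((Pair) o).person);
    }
    @Override public int hashCode() { return java.util.Objects.hash(number, person); }
}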
Then use a Set to protect yourself from duplicates, like this:
Set<Pair> pairs = new HashSet<>();
// ....
pairs.add(new Pair("689xxxx", "personA"));
for (Pair pair : pairs) {
    System.out.println(pair.number + " - " + pair.person);
}
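On Java 16 or later, a record is a more compact alternative to writing the Pair class by hand: records generate equals and hashCode from their components, which is exactly what the HashSet needs in order to reject a duplicate pair. A sketch (PairDemo is just an illustrative name):

```java
import java.util.HashSet;
import java.util.Set;

// equals/hashCode are generated from (number, person), so equal pairs collapse in a HashSet
record Pair(String number, String person) {}

class PairDemo {
    public static void main(String[] args) {
        Set<Pair> pairs = new HashSet<>();
        pairs.add(new Pair("698xxxxxxx", "personA"));
        pairs.add(new Pair("698xxxxxxx", "personA")); // duplicate, ignored
        pairs.add(new Pair("699xxxxxxx", "personB"));
        for (Pair pair : pairs) {
            System.out.println(pair.number() + " - " + pair.person());
        }
        System.out.println(pairs.size()); // 2
    }
}
```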
Hajo
Related
I have a pretty large list containing many instances of one class, and this class has many attributes (member variables). My problem is to find a feasible data structure to store these instances that allows searches based on multiple attributes, like a database search (e.g. a Student class where each student has an age, date of birth, grade and GPA: find all 2nd-year students whose ages are between 20 and 23). A Map does not seem applicable, as it only allows a single key, and if I create a multi-attribute index for searching, the big O is still not decreased. I also considered using trees like an AVL tree, and I don't think that would work.
I'd be grateful if someone could give me some hints.
I think what you are looking for is an Inverted Index (using attribute name + value as keys) or possibly one Inverted Index per attribute. A search would build the intersection of all results found for each attribute.
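A rough sketch of the one-index-per-attribute idea, using a hypothetical Student record with only year and age (Java 16+ for records). The year index is a HashMap; the age index is a TreeMap, so an age range can be read with subMap. A query intersects the per-attribute result sets with retainAll.

```java
import java.util.*;

// Hypothetical Student type used only for this sketch
record Student(String name, int year, int age) {}

class StudentIndex {
    // One index per attribute: exact-match on year, range queries on age
    private final Map<Integer, Set<Student>> byYear = new HashMap<>();
    private final NavigableMap<Integer, Set<Student>> byAge = new TreeMap<>();

    void add(Student s) {
        byYear.computeIfAbsent(s.year(), k -> new HashSet<>()).add(s);
        byAge.computeIfAbsent(s.age(), k -> new HashSet<>()).add(s);
    }

    // Students in the given year whose age lies in [minAge, maxAge]
    Set<Student> find(int year, int minAge, int maxAge) {
        Set<Student> result = new HashSet<>(byYear.getOrDefault(year, Collections.emptySet()));
        Set<Student> inAgeRange = new HashSet<>();
        for (Set<Student> bucket : byAge.subMap(minAge, true, maxAge, true).values()) {
            inAgeRange.addAll(bucket);
        }
        result.retainAll(inAgeRange); // intersection of the per-attribute results
        return result;
    }
}
```

Usage for the example in the question would be index.find(2, 20, 23). Building the intersection still costs time proportional to the sizes of the matching sets, so this pays off when each attribute filter is selective.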
You could do this:
Build an AVL tree with objects sorted by the most recurrent attribute (just one, e.g. "id" or "name").
Then create a search function that, instead of taking a value, takes a Java lambda expression F (so your search condition must be something like F(myObj) == true instead of myObj.defaultAttribute == searchParameter).
For the example you gave, F could be something like ((myObj) -> myObj.year==2 && myObj.age>=20 && myObj.age<=23)
I hope it helps.
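A minimal sketch of the lambda-based search, using java.util.function.Predicate for F and a hypothetical Student record (Java 16+). The search accepts any Iterable, so it could walk the values of a tree as well as a list. Since the predicate is opaque, the tree ordering cannot be used to skip elements, so every object is visited (O(n)).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical Student type for this sketch
record Student(String name, int year, int age) {}

class PredicateSearch {
    // Visits every element and keeps those for which F(obj) is true
    static <T> List<T> search(Iterable<T> items, Predicate<? super T> f) {
        List<T> hits = new ArrayList<>();
        for (T item : items) {
            if (f.test(item)) hits.add(item);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
            new Student("Ann", 2, 21), new Student("Bob", 2, 25), new Student("Cy", 1, 22));
        List<Student> found = search(students, s -> s.year() == 2 && s.age() >= 20 && s.age() <= 23);
        System.out.println(found); // [Student[name=Ann, year=2, age=21]]
    }
}
```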
While implementing an IP-lookup structure, I was trying to maintain a set of keys in a trie-like structure that allows me to search the "floor" of a key (that is, the largest key that is less than or equal to a given key). I decided to use the Apache Commons Collections 4 PatriciaTrie, but sadly I found that floorEntry and the related methods are not public. My current "dirty" solution is forcing them with reflection (in Scala):
val pt = new PatriciaTrie[String]()
val method = pt.getClass.getSuperclass.getDeclaredMethod("floorEntry", classOf[Object])
method.setAccessible(true)
// and then for retrieving the entry for floor(key)
val entry = method.invoke(pt, key).asInstanceOf[Entry[String, String]]
Is there any clean way to get the same functionality? Why are these methods not publicly available?
Why those methods are not public, I don't know. (Maybe it's because you can achieve what you want with the common Map API.)
Here's a way to fulfil your requirement:
PatriciaTrie<String> trie = new PatriciaTrie<>();
trie.put("a", "a");
trie.put("b", "b");
trie.put("d", "d");
String floorKey = trie.headMap("d").lastKey(); // "b", not "d": headMap() excludes the key itself
According to the docs, this is very efficient, since it depends on the number of bits of the largest key of the trie.
EDIT: As per the comment below, the code above has a bounds issue: headMap() returns a view of the map whose keys are strictly lower than the given key. This means that, in the above example, trie.headMap("b").lastKey() will return "a" instead of "b" (as needed).
In order to fix this bounds issue, you can use the following trick:
String cFloorKey = trie.headMap("c" + "\uefff").lastKey(); // b
String dFloorKey = trie.headMap("d" + "\uefff").lastKey(); // d
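Now everything works as expected for these keys, since \uefff sorts above the characters that normally appear in keys (it is not the highest char, though: that is \uffff, Character.MAX_VALUE). Searching for key + "\uefff" returns key if it belongs to the trie, or otherwise the element immediately prior to key, provided the trie contains no longer key that starts with key followed by a lower character (e.g. with "b" and "ba" in the trie, trie.headMap("b" + "\uefff").lastKey() returns "ba", not "b").
This trick works for String keys, but it is extensible to other types as well, e.g. for Integer keys you could search for key + 1, for Date keys you could add 1 millisecond, etc.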
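A sketch that avoids the sentinel character entirely: check for an exact match first, and only otherwise take the last key of headMap(key). It is written against java.util.SortedMap and demonstrated with a TreeMap. PatriciaTrie implements SortedMap (via Trie), so the same helper should apply to it with the trie's own key ordering, but that is an untested assumption here. For a plain TreeMap, NavigableMap.floorKey already does this directly.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class FloorLookup {
    // Largest key <= key in any SortedMap, or null if no such key exists
    static <K, V> K floorKey(SortedMap<K, V> map, K key) {
        if (map.containsKey(key)) return key;       // exact match is its own floor
        SortedMap<K, V> below = map.headMap(key);   // keys strictly less than key
        return below.isEmpty() ? null : below.lastKey();
    }

    public static void main(String[] args) {
        SortedMap<String, String> m = new TreeMap<>();
        m.put("a", "a");
        m.put("b", "b");
        m.put("d", "d");
        System.out.println(floorKey(m, "b")); // b
        System.out.println(floorKey(m, "c")); // b
        System.out.println(floorKey(m, "0")); // null
    }
}
```

The cost is one extra containsKey call per lookup, which for a trie is bounded by the key length like the headMap lookup itself.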
In Java, how can I map a set of numbers (integers, for example) to another set of numbers?
All the numbers are positive and all the numbers are unique in their own set.
The first set of numbers can have any value, the second set of numbers represent indexes of an array, and so the goal is to be able to access the numbers in the second set through the numbers in the first set. This is a one to one association.
Speed is crucial as the method will have to be called many times each second.
Edit: I tried the Java SE HashMap implementation, but found it too slow for my purposes.
There's an article devoted to this problem (with a solution): Implementing a world fastest Java int-to-int hash map
The code can be found in the related GitHub repository. (The best results are in class IntIntMap4a.java.)
Citation from the article:
Summary
If you want to optimize your hash map for speed, you have to do as much as you can of the following list:
Use underlying array(s) with capacity equal to a power of 2 - it will allow you to use cheap & instead of expensive % for array index
Do not store the state in the separate array - use dedicated fields for free/removed keys and values.
Interleave keys and values in the one array - it will allow you to load a value into memory for free.
Implement a strategy to get rid of 'removed' cells - you can sacrifice some of remove performance in favor of more frequent get/put.
Scramble the keys while calculating the initial cell index - this is required to deal with the case of consecutive keys.
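A simplified sketch of an open-addressing int-to-int map that follows several points of the summary above: power-of-2 capacity with & for indexing, keys and values interleaved in one array, a dedicated field for the "free" key 0 instead of a state array, and key scrambling. It is not the article's IntIntMap4a; it has no remove (so the "removed cells" point does not arise), is not thread-safe, and returns -1 for missing keys on the assumption that values are array indexes and therefore non-negative.

```java
// Simplified open-addressing int -> int map (linear probing, no remove, not thread-safe)
final class IntIntMapSketch {
    private static final int FREE_KEY = 0;   // marks an empty cell in the array
    static final int NO_VALUE = -1;          // returned for missing keys (values assumed >= 0)

    private int[] data;          // interleaved: key at 2*i, value at 2*i + 1
    private int mask;            // capacity - 1; capacity is always a power of 2
    private int size;            // number of keys stored in the array
    private boolean hasFreeKey;  // key 0 cannot live in the array, so it gets its own fields
    private int freeValue;

    IntIntMapSketch(int capacity) {
        int cap = 2;
        while (cap < capacity) cap <<= 1;    // round up to a power of 2
        data = new int[cap * 2];
        mask = cap - 1;
    }

    // Scramble the key so that consecutive keys do not fill consecutive cells
    private static int mix(int key) {
        int h = key * 0x9E3779B9;            // 32-bit golden-ratio multiplier
        return h ^ (h >>> 16);
    }

    int get(int key) {
        if (key == FREE_KEY) return hasFreeKey ? freeValue : NO_VALUE;
        int i = mix(key) & mask;             // cheap & instead of %
        while (true) {
            int k = data[2 * i];
            if (k == key) return data[2 * i + 1];
            if (k == FREE_KEY) return NO_VALUE;
            i = (i + 1) & mask;              // linear probing
        }
    }

    void put(int key, int value) {
        if (key == FREE_KEY) {
            hasFreeKey = true;
            freeValue = value;
            return;
        }
        int i = mix(key) & mask;
        while (true) {
            int k = data[2 * i];
            if (k == key) { data[2 * i + 1] = value; return; }   // overwrite existing key
            if (k == FREE_KEY) {                                  // empty cell: insert here
                data[2 * i] = key;
                data[2 * i + 1] = value;
                if (++size > (mask + 1) / 2) grow();             // keep load factor <= 0.5
                return;
            }
            i = (i + 1) & mask;
        }
    }

    // Double the capacity and re-insert every stored key
    private void grow() {
        int[] old = data;
        data = new int[old.length * 2];
        mask = (mask << 1) | 1;
        size = 0;
        for (int j = 0; j < old.length; j += 2) {
            if (old[j] != FREE_KEY) put(old[j], old[j + 1]);
        }
    }
}
```

Keeping the load factor at or below 0.5 guarantees an empty cell always exists, so the probe loops in get and put terminate.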
Yes, I know how to use citation formatting. But it looks awful and doesn't handle bullet lists well.
The structure you are looking for is called an associative array. In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears just once in the collection.
In Java in particular, as already mentioned, this is easily done with a HashMap.
HashMap<Integer, Integer> cache = new HashMap<Integer, Integer>();
You can insert elements with the put method:
cache.put(21, 42);
and you can retrieve a value with get:
Integer key = 21;
Integer value = cache.get(key);
System.out.println("Key: " + key + " value: " + value);
which prints:
Key: 21 value: 42
If you want to iterate through the data, you can use an iterator over the key set:
Iterator<Integer> it = cache.keySet().iterator();
while (it.hasNext()) {
    Integer k = it.next();
    System.out.println("key: " + k + " value: " + cache.get(k));
}
Sounds like HashMap<Integer,Integer> is what you're looking for.
If you are willing to use an external library, you can use Apache's IntToIntMap, which is part of Apache Lucene.
It implements a pretty efficient int to int map that uses primitives for tasks that should not suffer the boxing overhead.
If you have a limit on the values in the first list, you can just use a large array. Suppose you know the first list only has numbers 0-99: you can use int[100] and use the first number as the array index.
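A minimal sketch of the plain-array approach, assuming every key in the first set lies in [0, maxKey]. Because the mapped values are array indexes (never negative), -1 can mark "no mapping". ArrayIndexMap is an illustrative name.

```java
import java.util.Arrays;

class ArrayIndexMap {
    static final int MISSING = -1;
    private final int[] indexOf;

    // Assumes every key lies in [0, maxKey]
    ArrayIndexMap(int maxKey) {
        indexOf = new int[maxKey + 1];
        Arrays.fill(indexOf, MISSING); // -1 marks "no mapping", since array indexes are >= 0
    }

    void put(int key, int index) { indexOf[key] = index; }

    int get(int key) { return indexOf[key]; } // a single array read: no hashing, no boxing

    public static void main(String[] args) {
        ArrayIndexMap m = new ArrayIndexMap(99); // keys 0-99
        m.put(42, 3);
        System.out.println(m.get(42)); // 3
        System.out.println(m.get(7));  // -1
    }
}
```

Memory grows with the largest key rather than with the number of entries, so this only fits when the key range is reasonably small.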
Your requirements can be satisfied by the Map interface. As an example, see HashMap<K,V>.
See Map and HashMap
I have 2 sets of data.
Let's say one is people and the other is groups.
A person can be in multiple groups, while a group can have multiple people.
My operations will basically be CRUD on group and people.
As well as a method that makes sure a list of people are in different groups (which gets called a lot).
Right now I'm thinking of making a table of binary 0's and 1's with horizontally representing all the people and vertically all the groups.
I can perform the method in O(n) time by adding the lists of binaries and comparing the sum with the "or" of the lists of binaries (the two are equal only if no bits overlap).
E.g
Group A B C D
ppl1 1 0 0 1
ppl2 0 1 1 0
ppl3 0 0 1 0
ppl4 0 1 0 0
check (ppl1, ppl2) = (1001 + 0110) == (1001 | 0110)
= 1111 == 1111
= true
check (ppl2, ppl3) = (0110 + 0010) == (0110 | 0010)
= 1000 == 0110
= false
I'm wondering if there is a data structure that does something similar already so I don't have to write my own and maintain O(n) runtime.
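The bit-vector idea above can be sketched with java.util.BitSet, where bit i set means the person is in group i (mapping columns A, B, C, D to bits 0 to 3 is an assumption of this sketch). Instead of comparing a sum with an OR, it keeps the union of all groups seen so far and uses BitSet.intersects to detect an overlap; GroupCheck and its methods are illustrative names.

```java
import java.util.BitSet;
import java.util.List;

class GroupCheck {
    // Builds a BitSet from group indexes, e.g. groups(0, 3) for "1001" (groups A and D)
    static BitSet groups(int... indexes) {
        BitSet b = new BitSet();
        for (int i : indexes) b.set(i);
        return b;
    }

    // True if no two people in the list share a group
    static boolean allInDifferentGroups(List<BitSet> people) {
        BitSet seen = new BitSet();
        for (BitSet p : people) {
            if (seen.intersects(p)) return false; // a group is already taken
            seen.or(p);                           // add this person's groups to the union
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet ppl1 = groups(0, 3), ppl2 = groups(1, 2), ppl3 = groups(2);
        System.out.println(allInDifferentGroups(List.of(ppl1, ppl2))); // true
        System.out.println(allInDifferentGroups(List.of(ppl2, ppl3))); // false
    }
}
```

Each step costs time proportional to the number of 64-bit words in the bitsets, so the whole check is linear in the number of people for a fixed number of groups.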
I don't know all of the details of your problem, but my gut instinct is that you may be overthinking things here. How many objects are you planning on storing in this data structure? If you have really large amounts of data to store here, I would recommend that you use an actual database instead of a data structure. The type of operations you are describing here are classical examples of things that relational databases are good at. MySQL and PostgreSQL are examples of large-scale relational databases that could do this sort of thing in their sleep. If you'd like something lighter-weight, SQLite would probably be of interest.
If you do not have large amounts of data that you need to store in this data structure, I'd recommend keeping it simple, and only optimizing it when you are sure that it won't be fast enough for what you need to do. As a first shot, I'd just recommend using Java's built-in List interface to store your people and a Map to store groups. You could do something like this:
// Use a list to keep track of People
List<Person> myPeople = new ArrayList<Person>();
Person steve = new Person("Steve");
myPeople.add(steve);
myPeople.add(new Person("Bob"));
// Use a Map to track Groups
Map<String, List<Person>> groups = new HashMap<String, List<Person>>();
groups.put("Everybody", myPeople);
groups.put("Developers", Arrays.asList(steve));
// Does a group contain everybody?
groups.get("Everybody").containsAll(myPeople); // returns true
groups.get("Developers").containsAll(myPeople); // returns false
This definitely isn't the fastest option available, but if you do not have a huge number of People to keep track of, you probably won't even notice any performance issues. If you do have some special conditions that would make the speed of using regular Lists and Maps unfeasible, please post them and we can make suggestions based on those.
EDIT:
After reading your comments, it appears that I misread your issue on the first run through. It looks like you're not so much interested in mapping groups to people, but instead mapping people to groups. What you probably want is something more like this:
Map<Person, List<String>> associations = new HashMap<Person, List<String>>();
Person steve = new Person("Steve");
Person ed = new Person("Ed");
associations.put(steve, Arrays.asList("Everybody", "Developers"));
associations.put(ed, Arrays.asList("Everybody"));
// This is the tricky part
boolean sharesGroups = checkForSharedGroups(associations, Arrays.asList(steve, ed));
So how do you implement the checkForSharedGroups method? In your case, since the numbers surrounding this are pretty low, I'd just try out the naive method and go from there.
public boolean checkForSharedGroups(
Map<Person, List<String>> associations,
List<Person> peopleToCheck){
List<String> groupsThatHaveMembers = new ArrayList<String>();
for(Person p : peopleToCheck){
List<String> groups = associations.get(p);
if (groups == null) continue; // this person has no recorded groups
for(String s : groups){
if(groupsThatHaveMembers.contains(s)){
// We've already seen this group, so we can return
return false;
} else {
groupsThatHaveMembers.add(s);
}
}
}
// If we've made it to this point, nobody shares any groups.
return true;
}
This method probably doesn't have great performance on large datasets, but it is very easy to understand. Because it's encapsulated in its own method, it should also be easy to update if it turns out you need better performance. If you do need more speed, the first change I would make is to store groupsThatHaveMembers in a HashSet instead of an ArrayList, which turns each contains check from a linear scan into an (expected) constant-time lookup. Also make sure Person overrides both equals and hashCode, so that it works reliably as a key in the associations map; the same applies if you replace String with a custom type for groups.
The reason why I'm not too concerned about performance is that the numbers you've mentioned aren't really that big as far as algorithms are concerned. Because this method returns as soon as it finds two matching groups, in the worst case you will call ArrayList.contains a number of times equal to the number of groups that exist. In the best-case scenario, it only needs to be called twice. Performance will likely only be an issue if you call checkForSharedGroups very, very often, in which case you might be better off finding a way to call it less often instead of optimizing the method itself.
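A sketch of the same check with a HashSet of seen groups, made generic over the person and group types (Strings stand in for Person in the usage, an assumption for brevity). It keeps the contract of checkForSharedGroups: true means nobody shares a group. Set.add returns false when the element is already present, which replaces the separate contains call.

```java
import java.util.*;

class SharedGroups {
    // True if no two people in peopleToCheck share a group
    static <P, G> boolean noSharedGroups(Map<P, ? extends Collection<G>> associations,
                                         Collection<P> peopleToCheck) {
        Set<G> seen = new HashSet<>();
        for (P p : peopleToCheck) {
            Collection<G> groups = associations.get(p);
            if (groups == null) continue;         // person has no recorded groups
            for (G group : groups) {
                if (!seen.add(group)) return false; // add() is false if the group was already seen
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> associations = new HashMap<>();
        associations.put("Steve", List.of("Everybody", "Developers"));
        associations.put("Ed", List.of("Everybody"));
        System.out.println(noSharedGroups(associations, List.of("Steve", "Ed"))); // false
    }
}
```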
Have you considered a hash table? If you know all of the keys you'll be using ahead of time, it's possible to use a perfect hash function, which gives constant-time lookups even in the worst case.
How about having two separate entities for People and Group? Inside People, keep a set of Groups, and vice versa.
class People {
    Set<Group> groups = new HashSet<>();
    // API for addGroup, getGroup
}
class Group {
    Set<People> people = new HashSet<>();
    // API for addPeople, getPeople
}
check(People p1, People p2):
1) call getGroup on both p1 and p2,
2) compare the sizes of the two sets,
3) iterate over the smaller set and check whether each group is present in the other set.
Now you can store the People objects in basically any data structure: preferably a linked list if the size is not fixed, otherwise an array.
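Steps 2 and 3 above can be sketched as a small generic helper (DisjointCheck and shareAny are illustrative names). With hash-based sets the expected cost is proportional to the size of the smaller set. The standard library's java.util.Collections.disjoint(a, b) answers the inverse question (true when there is no common element) and could be used instead.

```java
import java.util.*;

class DisjointCheck {
    // True if the two sets have at least one element in common
    static <T> boolean shareAny(Set<T> a, Set<T> b) {
        Set<T> small = a.size() <= b.size() ? a : b; // step 2: pick the smaller set
        Set<T> large = (small == a) ? b : a;
        for (T item : small) {                       // step 3: probe the larger set
            if (large.contains(item)) return true;
        }
        return false;
    }
}
```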
So the question is regarding optimization of the code. I have a table of retirement ages, which I'm going to list below:
Year of Birth      Full Retirement Age
1937 or earlier    65
1938               65 years 2 months
1939               65 years 4 months
1940               65 years 6 months
...
(and the list is a long list)
What I want to do is store this table in a list object or something similar, so that I can pass the year of birth and the list object into a method and get back the corresponding retirement age. I don't want to have a lot of if and else statements in my code, because the list is so damn big and the code will be confusing.
What can be a possible solution for this problem?
Thanks in advance
Try using a Map instead of a List. Use the year of birth as the key, so that you can directly get the associated value from the map.
You can use a map, but there is a chance of duplicate keys.
Two people can be born in the same year.
Use a Multimap:
A Multimap that can hold duplicate key-value pairs and that maintains the insertion ordering of values for a given key. See the Multimap documentation for information common to all multimaps.
Use a Map. A Map is not a List; it stores Key:Value pairs.
Map<String, Object> map = new HashMap<String, Object>();
map.put("1937", 65);
...
To go through a map you can use this:
for (String key : map.keySet()) {
System.out.println(map.get(key));
}
You can change values for <String, Object> as you wish (Integer, Date... or whatever). Always follow the same order <KeyType, ValueType>
Store your list/table into a HashMap...then retrieve from your method, something like:
public String getRetirementAge(String yearOfBirth) {
return yourMap.get(yearOfBirth);
}
If you have data for every year i would use a java map http://docs.oracle.com/javase/tutorial/collections/interfaces/map.html where the key is the year and the value is the retirement value.
This would give you O(1) lookups.
If you have sparse data and somehow have to calculate the nearest year, you could either use a sorted List with binary search, which gives you O(log n), or even use a B-tree.
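A minimal sketch of the sorted-data-plus-binary-search option, using parallel int arrays and Arrays.binarySearch (Collections.binarySearch works the same way on a sorted List). Only the three rows quoted in the question are included, and ages are stored in months; the full table would be loaded the same way. Assumptions: a birth year between two rows uses the earlier row, years before the first row use it ("1937 or earlier"), and years after the last row use the last row, which matches tables that end in "... and later".

```java
import java.util.Arrays;

class RetirementAge {
    // Sorted first birth year of each row; the first row also covers every earlier year
    private static final int[] FROM_YEAR = {1937, 1938, 1939};
    // Full retirement age in months for the matching row (65y, 65y 2m, 65y 4m)
    private static final int[] AGE_MONTHS = {65 * 12, 65 * 12 + 2, 65 * 12 + 4};

    static int retirementAgeMonths(int birthYear) {
        int i = Arrays.binarySearch(FROM_YEAR, birthYear);
        if (i < 0) i = -i - 2;   // not found: insertion point - 1 is the nearest earlier row
        if (i < 0) i = 0;        // before the first row: "1937 or earlier"
        return AGE_MONTHS[i];
    }

    public static void main(String[] args) {
        int m = retirementAgeMonths(1938);
        System.out.println(m / 12 + " years " + m % 12 + " months"); // 65 years 2 months
    }
}
```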
BR,
David
I would recommend that you store this information in a database, especially if the list is a very long list (which you say it is). There will be many optimizations that come from using a database. For one thing, you won't have to store that huge list in memory. For another, SQL queries for data are often much faster than data structures in code. Martin Fowler has an (admittedly old) article about this at http://www.martinfowler.com/articles/dblogic.html. The other thing you gain from putting this in a database is that this is the type of list that is likely to change. They are already talking about adjusting retirement age in order to save social security. It is much easier to update data in a database than it is to edit code and recompile / redeploy.
The type of database you use can be NoSQL or relational, embedded or online. That decision I'll leave up to you. It will be a bonus for you if there is already a database available to this application for other reasons.