How to "join" Hashtables in Java?

I have two strings:
A { 1,2,3,4,5,6 }
B { 6,7,8,9,10,11 }
It doesn't really matter what the numbers in the strings are. The user then picks what to join on:
A hashjoin A.a1 = B.b5 B
I think I should put A into a hashtable with A.a1 as the key and then iterate through B? The keys will be whatever the user wants to join on, and the data will be what's in the strings.

Are you sure you're trying to join hashtables? Perhaps you have the wrong data structure?
Look into java.util.Set (and java.util.HashSet). If you want the items that are in both tables, then it's a simple Set operation like so:
Collection A = new ...
...fill the A up...
Collection B = new ...
...fill the B up...
Set join = new HashSet();
join.addAll(A);
join.retainAll(B);
If you mean something more like a SQL table join, then the output will depend on what type of join you mean to perform, and what the equals sign means in this case. Note you'll have to write a Pair class (which you should make more descriptive than Pair for your exact case)
For a full (cross) join, i.e. every pairing of an element of A with an element of B:
List<Pair> pairs = new ArrayList<Pair>();
for (Number numberA : A) {
for (Number numberB : B) {
pairs.add(new Pair(numberA, numberB));
}
}
For a full (cross) join filtered by a where clause:
List<Pair> pairs = new ArrayList<Pair>();
for (Number numberA : A) {
for (Number numberB : B) {
if (check the condition of the where clause here) {
pairs.add(new Pair(numberA, numberB));
}
}
}
That's about as good an answer as can be given under the circumstances, as your question isn't very specific. If these general answers don't help you out, then you'll need to explain your question in more detail to get a more detailed answer.
--- First Edit, after some clarification ---
Ok, so it's an SQL-like equi-join.
Hashtables are Maps, which means they have an element in one "domain" which can be used to look up an element in another "domain". In a hash table, the first domain is the set of keys, and the second domain is the set of values. Think of it as a bunch of labels and a bunch of items. If the equi-join is to be performed, it must join like elements. That means it will either join one key to another key, or it will join one value to another value.
For keys:
Hashtable A = ...
Hashtable B = ...
Set keyJoin = new HashSet();
keyJoin.addAll(A.keySet());
keyJoin.retainAll(B.keySet());
For values:
Hashtable A = ...
Hashtable B = ...
Set valueJoin = new HashSet();
valueJoin.addAll(A.values());
valueJoin.retainAll(B.values());
It doesn't make sense to join the hashtables themselves, because one "matching" value may live in both hashtables but be referenced by two different keys. Likewise, one "matching" key found in two different hashtables might not refer to the same value.
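If what you want is the build-and-probe equi-join the question describes (A hashjoin A.a1 = B.b5 B), it could be sketched roughly like this. This is only a minimal sketch under my own assumptions: rows are plain int[] arrays, column indexes are zero-based, and the names HashJoinSketch and hashJoin are made up for illustration. It uses computeIfAbsent, so it needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
class HashJoinSketch {
    // Joins rows of a and b where a[colA] == b[colB].
    // Each result row is the A row followed by the B row.
    static List<int[]> hashJoin(List<int[]> a, int colA, List<int[]> b, int colB) {
        // Build phase: index every row of A by its join column.
        Map<Integer, List<int[]>> index = new HashMap<>();
        for (int[] rowA : a) {
            index.computeIfAbsent(rowA[colA], k -> new ArrayList<>()).add(rowA);
        }
        // Probe phase: look up each row of B in the index.
        List<int[]> joined = new ArrayList<>();
        for (int[] rowB : b) {
            List<int[]> matches = index.get(rowB[colB]);
            if (matches == null) {
                continue; // no A row has this join value
            }
            for (int[] rowA : matches) {
                int[] out = new int[rowA.length + rowB.length];
                System.arraycopy(rowA, 0, out, 0, rowA.length);
                System.arraycopy(rowB, 0, out, rowA.length, rowB.length);
                joined.add(out);
            }
        }
        return joined;
    }
}
```

The map holds a list per key because several rows of A may share the same join value. Each row of B then costs one hash lookup instead of a scan over all of A.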

Your question doesn't make much sense. A hashtable (or hashmap) stores data as keys and values. You've said nothing about which of those values should be keys, and which should be values.

Related

JAVA : Best performance-wise method to find an object stored in hashMap

I have a bunch of objects stored in a HashMap<Long,Person>, and I need to find the Person object with a specific attribute without knowing its ID.
For example, the Person class:
public class Person {
long id;
String firstName;
String lastName;
String userName;
String password;
String address;
..
(around 7-10 attributes in total)
}
Let's say I want to find the object with username = "mike". Is there any way to find it without actually iterating over the whole hash map like this:
for (Map.Entry<Long,Person> entry : map.entrySet()) {
if (entry.getValue().getUserName().equalsIgnoreCase("mike")) {
// found the person
}
}
The answers I found here were pretty old.
If you want speed and are always looking for one specific attribute, your best bet is to create another 'cache' hash-map keyed with that attribute.
The memory taken up will be insignificant for less than a million entries and the hash-map lookup will be much much faster than any other solution.
Alternatively you could put all search attributes into a single map (ie. names, and ids). Prefix the keys with something unique if you're concerned with collisions. Something like:
String ID_PREFIX = "^!^ID^!^";
String USERNAME_PREFIX = "^!^USERNAME^!^";
String FIRSTNAME_PREFIX = "^!^FIRSTNAME^!^";
Map<String,Person> personMap = new HashMap<String,Person>();
//add a person
void addPersonToMap(Person person)
{
personMap.put(ID_PREFIX+person.id, person);
personMap.put(USERNAME_PREFIX+person.userName, person);
personMap.put(FIRSTNAME_PREFIX+person.firstName, person);
}
//search person
Person findPersonByID(long id)
{
return personMap.get(ID_PREFIX+id);
}
Person findPersonByUsername(String username)
{
return personMap.get(USERNAME_PREFIX+username);
}
//or a more generic version:
//Person foundPerson = findPersonByAttribute(FIRSTNAME_PREFIX, "mike");
Person findPersonByAttribute(String attr, String attr_value)
{
return personMap.get(attr+attr_value);
}
The above assumes that each attribute is unique amongst all the Persons. This might be true for ID and username, but the question specifies firstname=mike which is unlikely to be unique.
In that case you want to abstract with a list, so it would be more like this:
Map<String,List<Person>> personMap = new HashMap<String,List<Person>>();
//add a person
void addPersonToMap(Person person)
{
insertPersonIntoMap(ID_PREFIX+person.id, person);
insertPersonIntoMap(USERNAME_PREFIX+person.userName, person);
insertPersonIntoMap(FIRSTNAME_PREFIX+person.firstName, person);
}
//note that a List allows duplicates, so check before adding in case this is called multiple times for the same person.
void insertPersonIntoMap(String key, Person person)
{
List<Person> personsList = personMap.get(key);
if(personsList==null)
{
personsList = new ArrayList<Person>();
personMap.put(key,personsList);
}
if(!personsList.contains(person))
personsList.add(person);
}
//we know id is unique, so we can just get the only person in the list
Person findPersonByID(long id)
{
List<Person> personList = personMap.get(ID_PREFIX+id);
if(personList!=null)
return personList.get(0);
return null;
}
//get list of persons with firstname
List<Person> findPersonsByFirstName(String firstname)
{
return personMap.get(FIRSTNAME_PREFIX+firstname);
}
At that point you're really getting into a grab-bag design but still very efficient if you're not expecting millions of entries.
The best performance-wise method I can think of is to have another HashMap, with the key being the attribute you want to search for, and the value being a list of objects.
For your example this would be HashMap<String, List<Person>>, with the key being the username. The downside is that you have to maintain two maps.
Note: I've used a List<Person> as the value because we cannot guarantee that username is unique among all users. The same applies for any other field.
For example, to add a Person to this new map you could do:
Map<String, List<Person>> peopleByUsername = new HashMap<>();
// ...
Person p = ...;
peopleByUsername.computeIfAbsent(
p.getUsername(),
k -> new ArrayList<>())
.add(p);
Then, to return all people whose username is, e.g., joesmith:
List<Person> matching = peopleByUsername.get("joesmith");
Getting one or a few entries from a volatile map
If the map you're operating on can change often and you only want to get a few entries then iterating over the map's entries is ok since you'd need space and time to build other structures or sort the data as well.
Getting many entries from a volatile map
If you need to get many entries from that map, you might get better performance by sorting the entries first (e.g. build a list and sort that) and then using binary search. Alternatively, you could build an intermediate map that uses the attribute(s) you need to search for as its key.
Note, however, that both approaches need time up front, so this only yields better performance when you're looking for many entries.
Getting entries multiple times from a "persistent" map
If your map and its values don't change (or not that often), you could maintain a map attribute -> person. This would mean some effort for the initial setup and for updating the additional map (unless your data doesn't change), as well as some memory overhead, but it speeds up lookups tremendously later on. This is a worthwhile approach when you do very few "writes" compared to how often you do lookups, and if you can spare the memory overhead (which depends on how big those maps would be and how much memory you have to spare).
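As a rough illustration of maintaining such an attribute -> person map next to the primary one, here is a minimal sketch. The class name PersonIndex, its methods, and the cut-down Person type are my own assumptions, and it assumes usernames are unique. The point is that every write goes through one place, so the two maps cannot drift apart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper that keeps an id map and a username map in sync.
class PersonIndex {
    // Cut-down stand-in for the Person class from the question.
    static class Person {
        final long id;
        final String userName;

        Person(long id, String userName) {
            this.id = id;
            this.userName = userName;
        }
    }

    private final Map<Long, Person> byId = new HashMap<>();
    private final Map<String, Person> byUserName = new HashMap<>();

    // Adds or replaces a person; a replaced person's old username entry is dropped.
    void put(Person p) {
        Person old = byId.put(p.id, p);
        if (old != null) {
            byUserName.remove(old.userName);
        }
        byUserName.put(p.userName, p);
    }

    // Removes a person from both maps.
    void remove(long id) {
        Person old = byId.remove(id);
        if (old != null) {
            byUserName.remove(old.userName);
        }
    }

    Person findById(long id) {
        return byId.get(id);
    }

    Person findByUserName(String userName) {
        return byUserName.get(userName);
    }
}
```

Both lookups are then a single hash lookup; the price is the second map's memory and the extra work on every write.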
Consider one hashmap per alternate key. This will have a "high" setup cost, but will result in quick retrieval by alternate key.
Set up the hashmap using the Long key value. Run through the hashmap's Person objects and create a second hashmap (HashMap<String, Person>) for which username is the key. Perhaps fill both hashmaps at the same time.
In your case, you will end up with something like HashMap<Long, Person> idKeyedMap and HashMap<String, Person> usernameKeyedMap.
You can also put all the key values in the same map, if you define the map as Map<Object, Person>. Then, when you add the (id, person) pair, you also need to add the (username, person) pair. Caveat: this is not a great technique.
What is the best way to solve the problem?
There are many ways to tackle this as you can see in the answers and comments.
The right choice depends on how the Map is being used (and perhaps how it is created). If the Map is built from a select statement, with the long id value coming from a table column, we might think we should use HashMap<Long, Person>.
Another way to look at the problem is to consider that usernames should also be unique (i.e. no two persons should ever share the same username). So instead create the map as a HashMap<String, Person>, with the username as the key and the Person object as the value.
Using the latter:
Map<String, Person> users = retrieveUsersFromDatabase(); // perform db select and build map
String username = "mike";
Person person = users.get(username); // null if there is no such username
A lookup by key like this takes constant time on average, which is about as fast as retrieving a Person from a Map gets.
You can simply convert the HashMap to a List using:
List<Person> list = new ArrayList<>(map.values());
Now you can iterate through the list easily. This way you can search the HashMap's values on any property of the Person class, not just firstname.
The only downside is that you end up creating a List object. With the Stream API you can instead filter map.values() directly, without building the intermediate list. A parallel stream may help on very large maps, but either way it is still a linear scan.
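The stream-based variant mentioned above might look something like this. It is only a sketch: StreamSearchSketch, its two methods, and the cut-down Person type are names I made up for illustration. Note that it still visits the values one by one; it just avoids copying them into a list first.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
class StreamSearchSketch {
    // Cut-down stand-in for the Person class from the question.
    static class Person {
        final long id;
        final String userName;
        final String firstName;

        Person(long id, String userName, String firstName) {
            this.id = id;
            this.userName = userName;
            this.firstName = firstName;
        }
    }

    // First person whose username matches, ignoring case; empty if there is none.
    static Optional<Person> findByUserName(Map<Long, Person> map, String userName) {
        return map.values().stream()
                .filter(p -> p.userName.equalsIgnoreCase(userName))
                .findFirst();
    }

    // All persons with the given first name (the attribute need not be unique).
    static List<Person> findByFirstName(Map<Long, Person> map, String firstName) {
        return map.values().stream()
                .filter(p -> p.firstName.equals(firstName))
                .collect(Collectors.toList());
    }
}
```

Swapping stream() for parallelStream() is a one-word change, but whether it pays off depends on the map size, so measure before relying on it.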
Sorting and finding of value object can be done by designing and using an appropriate Comparator class.
Comparator Class : Designing a Comparator with respect to a specific attribute can be done as follows:
class UserComparator implements Comparator<Person>{
@Override
public int compare(Person p1, Person p2) {
return p1.userName.compareTo(p2.userName);
}
}
Usage : Comparator designed above can be used as follows:
HashMap<Long, Person> personMap = new HashMap<Long, Person>();
.
.
.
ArrayList<Person> pAL = new ArrayList<Person>(personMap.values()); //create list of values
Collections.sort(pAL,new UserComparator()); // sort the list using comparator
Person p = new Person(); // create a dummy object
p.userName="mike"; // Only set the username
int i= Collections.binarySearch(pAL,p,new UserComparator()); // search the list using comparator
if(i>=0){
Person p1 = pAL.get(i); //Obtain object if username is present
}else{
System.out.println("Insertion point: "+ (-i - 1)); // binarySearch returns (-(insertion point) - 1) if username is not present
}

Get the same element values on multiple arrays

I've been searching SO for this question, and most answers only deal with comparing two arrays using a nested loop. My problem is much the same, but on a bigger scale. Suppose I have a hundred or a thousand users on my app, and each user has a list of the items they want.
Something like this
User1 = {apple,orange,guava,melon,durian}
User2 = {apple, melon,banana,lemon,mango}
User3 = {orange,carrots,guava,melon,tomato}
User4 = {mango,carrots,tomato,apple,durian}
.
.
Nuser = ...
I want to see how many times apple or orange is listed across all the users' arrays. So I am basically comparing, but on a bigger scale. The data isn't static either: a user can enter a fruit the developers don't know about, several users can enter that same unknown fruit, and the system should still be able to work out how many times it was listed. Keep in mind that this is dynamic. The number of users could reach, for example, 100 depending on how popular the app becomes. I can't afford to do a nested loop here.
PS: this is not the exact problem, but it is the simplest scenario I can think of to explain my problem.
PS: just to clarify, I don't intend to use a 3rd-party lib like Guava either. I am having a problem with ProGuard when I use it.
Edit
Just read that the original poster cannot use Java 8, which is a pity, because it would really make this very easy!
Java 7 solution
final Map<String, Integer> occurencesByFruit = new HashMap<>();
for (User user : users) {
String[] fruits = user.getFruits();
for (String fruit : fruits) {
final Integer currentCount = occurencesByFruit.get(fruit);
if (currentCount == null) {
occurencesByFruit.put(fruit, 1);
} else {
occurencesByFruit.put(fruit, currentCount + 1);
}
}
}
Java 8 solution
I'd stream the users, flatMap() to the actual fruit elements, and then use Collectors.groupingBy() with a downstream collector Collectors.counting().
This will give you a Map where the keys are the fruits, and the values are the occurrences of each fruit throughout all your users.
List<User> users = Arrays.asList(/* ... */);
final Map<String, Long> occurencesByFruit = users.stream()
.map(User::getFruits)
.flatMap(Arrays::stream)
.collect(Collectors.groupingBy(f -> f, Collectors.counting()));
A HashMap<Item, Integer> fruits looks like a good fit here. You could iterate over all Users (you would need to store all Users in some kind of list, such as ArrayList<User> users) and check the list of items chosen by each User (I suppose User would have a field ArrayList<Item> items to store its items). You could achieve it with something like this:
for (User user : users) { // for each User from users list
for (Item item : user.items) { // check each item chosen by this user
if (fruits.containsKey(item)) { // if the item is already present in the HashMap, increment its count
int previousNumberOfItems = fruits.get(item);
fruits.put(item, previousNumberOfItems + 1);
} else { // otherwise record the first occurrence of this item
fruits.put(item, 1);
}
}
}
I would either create an ArrayList containing a HashMap with strings and ints or use two ArrayLists (one of type String and one of type Integer). Then you can iterate over every entry in each of the user arrays (this is only a simple nested loop). For every entry in the current user array you check if there is already the same entry in the ArrayList you created additionally. If so, you increment the respective int. If not, you add a string and an int. In the end, you have the number of occurrences of all the fruit strings in the added ArrayLists, which is, if I understood you correctly, just what you wanted.

Iterate efficiently through 2 different List with same Type of Object(Java8)

I have two lists, each containing a large number (N) of objects:
List<Foo> objectsFromDB = {{MailId=100, Status=""}, {MailId=200, Status=""}, {MailId=300, Status=""} ... {MailId=N , Status= N}}
List<Foo> feedBackStatusFromCsvFiles = {{MailId=100, Status= "OPENED"}, {MailId=200, Status="CLICKED"}, {MailId=300, Status="HARDBOUNCED"} ... {MailId=N , Status= N}}
Little Insights:
objectsFromDB retrieves rows from my database by calling a Hibernate method.
feedBackStatusFromCsvFiles calls a CSV parser method and unmarshals the rows to Java objects.
My entity class Foo has all setters and getters. So I know that the basic idea is to use a foreach like this:
for (Foo fooDB : objectsFromDB) {
for(Foo fooStatus: feedBackStatusFromCsvFiles){
if(fooDB.getMailId().equals(fooStatus.getMailId())){
fooDB.setStatus(fooStatus.getStatus());
}
}
}
As far as my modest knowledge as a junior developer goes, I think doing it like this is very bad practice? Should I implement a Comparator and use it to iterate over my lists of objects? Should I also check for null cases?
Thanks to all of you for your answers!
Assuming Java 8 and considering the fact that feedbackStatus may contain more than one element with the same ID.
Transform the list into a Map using ID as key and having a list of elements.
Iterate the list and use the Map to find all messages.
The code would be:
final Map<String, List<Foo>> listMap =
objectsFromDB.stream().collect(
Collectors.groupingBy(item -> item.getMailId())
);
for (final Foo feedBackStatus : feedBackStatusFromCsvFiles) {
listMap.getOrDefault(feedBackStatus.getMailId(), Collections.emptyList()).forEach(item -> item.setStatus(feedBackStatus.getStatus()));
}
Use maps from collections to avoid the nested loops.
List<Foo> aList = new ArrayList<>();
List<Foo> bList = new ArrayList<>();
for(int i = 0;i<5;i++){
Foo foo = new Foo();
foo.setId((long) i);
foo.setValue("FooA"+String.valueOf(i));
aList.add(foo);
foo = new Foo();
foo.setId((long) i);
foo.setValue("FooB"+String.valueOf(i));
bList.add(foo);
}
final Map<Long,Foo> bMap = bList.stream().collect(Collectors.toMap(Foo::getId, Function.identity()));
aList.stream().forEach(it->{
Foo bFoo = bMap.get(it.getId());
if( bFoo != null){
it.setValue(bFoo.getValue());
}
});
The only other solution would be to have the DTO layer return a map of MailId -> Foo object, so that you could stream the CSV list and simply look up the DB Foo object. Otherwise, sorting or iterating over both of the lists costs more time than it saves. That holds true until the map actually causes a memory constraint on the platform; until then, let the garbage collector do its job, and do yours as easily as possible.
Given that your lists may contain tens of thousands of elements, you should be concerned that your simple nested-loop approach will be too slow. It will certainly perform a lot more comparisons than it needs to.
If memory is comparatively abundant, then the fastest suitable approach would probably be to form a Map from mailId to (list of) corresponding Foo from one of your lists, somewhat as @MichaelH suggested, and to use that to match mailIds. If mailId values are not certain to be unique in one or both lists, however, then you'll need something a bit different from Michael's specific approach. Even if mailIds are sure to be unique within both lists, it will be a bit more efficient to form only one map.
For the most general case, you might do something like this:
// The initial capacity is set (more than) large enough to avoid any rehashing
Map<Long, List<Foo>> dbMap = new HashMap<>(3 * objectsFromDB.size() / 2);
// Populate the map
// This could be done more efficiently if the objects were ordered by mailId,
// which perhaps the DB could be enlisted to ensure.
for (Foo foo : objectsFromDB) {
Long mailId = foo.getMailId();
List<Foo> foos = dbMap.get(mailId);
if (foos == null) {
foos = new ArrayList<>();
dbMap.put(mailId, foos);
}
foos.add(foo);
}
// Use the map
for (Foo fooStatus: feedBackStatusFromCsvFiles) {
List<Foo> dbFoos = dbMap.get(fooStatus.getMailId());
if (dbFoos != null) {
String status = fooStatus.getStatus();
// Iterate over only the Foos that we already know have matching Ids
for (Foo fooDB : dbFoos) {
fooDB.setStatus(status);
}
}
}
On the other hand, if you are space-constrained, so that creating the map is not viable, yet it is acceptable to reorder your two lists, then you should still get a performance improvement by sorting both lists first. Presumably you would use Collections.sort() with an appropriate Comparator for this purpose. Then you would obtain an Iterator over each list, and use them to iterate cooperatively over the two lists. I present no code, but it would be reminiscent of the merge step of a merge sort (but the two lists are not actually merged; you only copy status information from one to the other). But this makes sense only if the mailIds from feedBackStatusFromCsvFiles are all distinct, for otherwise the expected result of the whole task is not well determined.
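For what it's worth, the sort-then-walk idea described above could be sketched along these lines. This is not the answerer's code: the class name SortedMergeSketch, the cut-down Foo with a long mailId, and the use of index variables instead of Iterators are all my own assumptions. It also assumes random-access, mutable lists such as ArrayList, and distinct mailIds on the CSV side.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a merge-style pass over two sorted lists.
class SortedMergeSketch {
    // Cut-down stand-in for the Foo entity from the question.
    static class Foo {
        final long mailId;
        String status;

        Foo(long mailId, String status) {
            this.mailId = mailId;
            this.status = status;
        }
    }

    // Sorts both lists by mailId, then walks them in step and copies the
    // CSV status onto every DB object with the same mailId.
    static void mergeStatuses(List<Foo> dbFoos, List<Foo> csvFoos) {
        Comparator<Foo> byMailId = Comparator.comparingLong(f -> f.mailId);
        dbFoos.sort(byMailId);
        csvFoos.sort(byMailId);

        int i = 0; // position in dbFoos
        int j = 0; // position in csvFoos
        while (i < dbFoos.size() && j < csvFoos.size()) {
            Foo db = dbFoos.get(i);
            Foo csv = csvFoos.get(j);
            if (db.mailId < csv.mailId) {
                i++; // no status for this DB object
            } else if (db.mailId > csv.mailId) {
                j++; // no DB object for this status
            } else {
                db.status = csv.status;
                i++; // stay on this CSV row: more DB rows may share the mailId
            }
        }
    }
}
```

After the two sorts, the walk itself touches each element at most once, so no extra map is needed, only the in-place reordering.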
Your problem is merging each Foo's last status into the database objects, so you can do it in two steps, which makes it clearer and more readable:
filter the Foos that need merging.
merge the Foos with their last status.
//because only the last status matters, you don't need groupingBy to create a complex Map.
//assumes: import static java.util.stream.Collectors.toMap;
Map<String, String> lastStatus = feedBackStatusFromCsvFiles.stream()
.collect(toMap(Foo::getMailId, Foo::getStatus
, (previous, current) -> current));
//find the Foos in the database that need merging
Predicate<Foo> fooThatNeedMerge = it -> lastStatus.containsKey(it.getMailId());
//merge each Foo's last status from the CSV.
Consumer<Foo> mergingFoo = it -> it.setStatus(lastStatus.get(it.getMailId()));
objectsFromDB.stream().filter(fooThatNeedMerge).forEach(mergingFoo);

The efficient way to get a list of objects that are not in the database in Java

Suppose I have a list of objects (ArrayList objects) and a db table for the objects. I want to find the objects which have not been stored in my database. The objects are identified by their "id". I can think of two solutions, but I do not know which one is more efficient.
The first solution I thought of is to construct one db query to get all the objects that exist in the db, and loop through those existing objects to determine the ones that are not in the db
ArrayList<Integer> ids = new ArrayList<Integer>();
for(MyObject o : objects){
ids.add(o.getId());
}
//I use sugar orm on Android, raw query can be seen as
// "select * from my_object where id in [ id1,id2,id3 ..... ]"
List<MyObjectRow> unwanted_objects = MyObject.find("id in (?,?,?,?,.....)",ids);
//remove the query results from the original arraylist
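for(MyObjectRow o : unwanted_objects){
//use a java.util.Iterator so that removing while looping does not throw ConcurrentModificationException
for(Iterator<MyObject> it = objects.iterator(); it.hasNext();){
if(it.next().getId() == o.getId()) it.remove();
}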
}
The second solution is to query existence of every object in db, and add non-existed object to result array
ArrayList<MyObject> result_objects = new ArrayList<MyObject>();
boolean exist = false;
for(MyObject o : objects){
exist = MyObject.find("EXIST( select 1 from my_object where id = ?)", o.getId());
if(!exist){
result_objects.add(o);
}
}
The first solution only requires one query, but looping through all the found objects makes the complexity O(n*n).
The second solution issues n db queries, but it only has a complexity of O(n).
Which one may be better in terms of performance?
I would use option 1 with a change to use a Map<Integer, MyObject> to improve the performance of the removal of query results from the original list:
List<Integer> ids = new ArrayList<Integer>();
Map<Integer, MyObject> mapToInsert = new HashMap<Integer, MyObject>();
for(MyObject o : objects) {
//add the ids of the objects to possibly insert
ids.add(o.getId());
//using the id of the object as the key in the map
mapToInsert.put(o.getId(), o);
}
//retrieve the ids of the elements that already exist in database
List<MyObjectRow> unwanted_objects = MyObject.find("id in (?,?,?,?,.....)",ids);
//remove the query results from the map, not the list
for(MyObjectRow o : unwanted_objects){
mapToInsert.remove(o.getId());
}
//insert the values that still exist in mapToInsert
Collection<MyObject> valuesToInsert = mapToInsert.values();
You don't know the efficiency of the database operations. If the database is a b-tree under the hood, that query could take O(log n). If your indices aren't set up correctly, you may be looking at O(n) performance for that query. Your measurement of efficiency here is also ignoring any transaction costs: the cost to initiate a connection with the database, process the query, and close the connection. This is a 'fixed' cost, and I wouldn't want to pay it in a loop if I didn't have to.
Go with the first solution.

Sorting of 2 or more massive resultsets?

I need to be able to sort multiple intermediate result sets and write them to a file in sorted order. The sort is based on a single column/key value. Each result set record will be a list of values (like a record in a table).
The intermediate result sets are obtained by querying entirely different databases.
The intermediate result sets are already sorted on some key (or column). They need to be combined and sorted again on the same key (or column) before being written to a file.
Since these result sets can be massive (on the order of MBs), this cannot be done in memory.
My solution, broadly:
Use a hash and a random access file. Since the result sets are already sorted, when retrieving them I will store the sorted column values as keys in a hashmap. The value in the hashmap will be an address in the random access file where every record associated with that column value will be stored.
Any ideas ?
Have a pointer into every set, initially pointing to the first entry
Then choose the next result from the set, that offers the lowest entry
Write this entry to the file and increment the corresponding pointer
This approach has basically no overhead, and for a fixed number of sets the time is O(n). (It's merge sort, btw.)
Edit
To clarify: It's the merge part of merge sort.
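With more than two sets, picking "the set that offers the lowest entry" can be done with a priority queue. Here is a minimal sketch of that, under my own assumptions: each pre-sorted result set is stood in for by an Iterator<Integer> of sort keys, the output file by a List, and the names KWayMergeSketch, Cursor and merge are made up. Only one entry per set is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the merge step for any number of sorted inputs.
class KWayMergeSketch {
    // The "pointer" into one set: its current entry plus the rest of the set.
    static class Cursor {
        int head;
        final Iterator<Integer> rest;

        Cursor(int head, Iterator<Integer> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    static List<Integer> merge(List<Iterator<Integer>> sortedSources) {
        Comparator<Cursor> byHead = (x, y) -> Integer.compare(x.head, y.head);
        PriorityQueue<Cursor> queue = new PriorityQueue<>(byHead);
        // Point at the first entry of every non-empty set.
        for (Iterator<Integer> source : sortedSources) {
            if (source.hasNext()) {
                queue.add(new Cursor(source.next(), source));
            }
        }
        List<Integer> out = new ArrayList<>(); // stands in for the output file
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll(); // the set offering the lowest entry
            out.add(smallest.head);         // "write this entry to the file"
            if (smallest.rest.hasNext()) {  // advance that set's pointer
                smallest.head = smallest.rest.next();
                queue.add(smallest);
            }
        }
        return out;
    }
}
```

With k sets and n entries in total this does about n queue operations of O(log k) each, so for two or three sets it is effectively the same linear pass as the steps above.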
If you've got 2 pre-sorted result sets, you should be able to iterate them concurrently while writing the output file. You just need to compare the current row in each set:
Simple example (not ready for copy-and-paste use!):
ResultSet a,b;
//fetch a and b
a.first();
b.first();
while (!a.isAfterLast() || !b.isAfterLast()) {
if (a.isAfterLast()) {
writeToFile(b);
b.next();
}
else if (b.isAfterLast()) {
writeToFile(a);
a.next();
} else {
int valueA = a.getInt("SORT_PROPERTY");
int valueB = b.getInt("SORT_PROPERTY");
if (valueA < valueB) {
writeToFile(a);
a.next();
} else {
writeToFile(b);
b.next();
}
}
}
Sounds like you are looking for an implementation of the Balance Line algorithm.
