Preserving memory with two HashMaps that contain similar values - java

I am loading 2 large datasets into two separate HashMaps, sequentially. (The datasets are deserialized into many Record objects, depicted below.) The HashMaps are declared like so, with the id of the Record as the key:
Map<Long, Record> recordMapA = new HashMap<>();
Map<Long, Record> recordMapB = new HashMap<>();
The Record object looks like so:
class Record {
Long id;
Long timestamp;
String category;
String location;
}
In many cases, the records are the same between the two datasets, except that the timestamp field differs. For my use case, any two Record objects are equal if all field values except for the timestamp field are the same.
// These two records are the same because only the timestamp differs
Record recordA = new Record(54321L, 1615270861975L, "foo", "USA");
Record recordB = new Record(54321L, 1615357219994L, "foo", "USA");
To save memory, is there a way to make it so that if two Record objects are "equal", the corresponding map entry values in maps A and B refer to the same Record object in memory? I've overridden the equals and hashCode methods of Record to ignore timestamp, and then checked whether recordMapA already contains the same record. If so, I put the record from recordMapA into recordMapB instead of the new Record that has been deserialized from dataset B. However, the impact on memory seems negligible so far.
One side note is that I need to retain both maps (instead of merging them into one) for purposes of comparison later.

If the records are 'small enough' then I would not bother trying anything fancy. For large records, the easiest way seems to be to do what you're doing.
void addToMap(Long key, Record rec, Map<Long,Record> map,
              Map<Long,Record> otherMap) {
    // Reuse the instance already stored in the other map, if there is one
    Record existing = otherMap.get(key);
    map.put(key, existing != null ? existing : rec);
}
This assumes that if the key is present in the other map, the record stored under it is the same. If that is not guaranteed, you need to check:
void addToMap(Long key, Record rec, Map<Long,Record> map,
              Map<Long,Record> otherMap) {
    Record existing = otherMap.get(key);
    // Share the instance only when the two records are really equal
    if (existing != null && existing.equals(rec))
        map.put(key, existing);
    else
        map.put(key, rec);
}
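A minimal sketch of the whole approach, assuming a hypothetical `DataRecord` class that stands in for the question's `Record` (renamed here only to avoid confusion with `java.lang.Record`) and whose `equals`/`hashCode` ignore `timestamp`. Keep in mind that a shared instance carries only one timestamp, so this only fits if the second dataset's timestamp can be dropped. Both maps also still allocate their own entry objects, so only the record object itself is saved, which may be part of why the measured effect is small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Record class.
final class DataRecord {
    final Long id;
    final Long timestamp;
    final String category;
    final String location;

    DataRecord(Long id, Long timestamp, String category, String location) {
        this.id = id;
        this.timestamp = timestamp;
        this.category = category;
        this.location = location;
    }

    // Equality deliberately ignores timestamp, as described in the question.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DataRecord)) return false;
        DataRecord other = (DataRecord) o;
        return Objects.equals(id, other.id)
                && Objects.equals(category, other.category)
                && Objects.equals(location, other.location);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, category, location);
    }
}

final class SharedRecordLoader {
    // Puts rec into map, reusing the instance from otherMap when it is "equal".
    static void addToMap(Long key, DataRecord rec,
                         Map<Long, DataRecord> map,
                         Map<Long, DataRecord> otherMap) {
        DataRecord existing = otherMap.get(key);
        // rec.equals(null) is false, so a missing key falls through to rec
        map.put(key, rec.equals(existing) ? existing : rec);
    }
}
```

After loading both datasets this way, `recordMapA.get(id) == recordMapB.get(id)` holds for every pair of records that differ only in their timestamp.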

Related

JAVA : Best performance-wise method to find an object stored in hashMap

I have a bunch of objects stored in a HashMap<Long,Person>. I need to find the Person object with a specific attribute without knowing its ID.
For example, the Person class:
public class Person {
long id;
String firstName;
String lastName;
String userName;
String password;
String address;
..
(around 7-10 attributes in total)
}
Let's say I want to find the object with username = "mike". Is there any way to find it without iterating over the whole hash map like this:
for (Map.Entry<Long,Person> entry : map.entrySet()) {
    if (entry.getValue().getUserName().equalsIgnoreCase("mike")) { /* found */ }
}
The answers I found here were pretty old.
If you want speed and are always looking for one specific attribute, your best bet is to create another 'cache' hash-map keyed with that attribute.
The memory taken up will be insignificant for less than a million entries and the hash-map lookup will be much much faster than any other solution.
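A minimal sketch of such a secondary 'cache' map, assuming usernames are unique and using a hypothetical, cut-down `IndexedPerson` class in place of the question's `Person`. Keys are lower-cased so that lookups behave roughly like the question's `equalsIgnoreCase` check.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, cut-down version of the question's Person class.
final class IndexedPerson {
    final long id;
    final String userName;

    IndexedPerson(long id, String userName) {
        this.id = id;
        this.userName = userName;
    }
}

final class UsernameIndex {
    private final Map<String, IndexedPerson> byUserName = new HashMap<>();

    // Builds the index once by walking the primary id-keyed map.
    UsernameIndex(Map<Long, IndexedPerson> byId) {
        for (IndexedPerson p : byId.values()) {
            byUserName.put(normalize(p.userName), p);
        }
    }

    // Hash lookup instead of a full scan; returns null when nobody matches.
    IndexedPerson find(String userName) {
        return byUserName.get(normalize(userName));
    }

    private static String normalize(String userName) {
        return userName.toLowerCase(Locale.ROOT);
    }
}
```

The index is a snapshot: if persons are added to or removed from the primary map later, the index has to be updated as well.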
Alternatively you could put all search attributes into a single map (ie. names, and ids). Prefix the keys with something unique if you're concerned with collisions. Something like:
String ID_PREFIX = "^!^ID^!^";
String USERNAME_PREFIX = "^!^USERNAME^!^";
String FIRSTNAME_PREFIX = "^!^FIRSTNAME^!^";
Map<String,Person> personMap = new HashMap<String,Person>();
//add a person
void addPersonToMap(Person person)
{
    personMap.put(ID_PREFIX+person.id, person);
    personMap.put(USERNAME_PREFIX+person.userName, person);
    personMap.put(FIRSTNAME_PREFIX+person.firstName, person);
}
//search person
Person findPersonByID(long id)
{
return personMap.get(ID_PREFIX+id);
}
Person findPersonByUsername(String username)
{
return personMap.get(USERNAME_PREFIX+username);
}
//or a more generic version:
//Person foundPerson = findPersonByAttribute(FIRSTNAME_PREFIX, "mike");
Person findPersonByAttribute(String attr, String attr_value)
{
return personMap.get(attr+attr_value);
}
The above assumes that each attribute is unique amongst all the Persons. This might be true for ID and username, but the question specifies firstname=mike which is unlikely to be unique.
In that case you want to abstract with a list, so it would be more like this:
Map<String,List<Person>> personMap = new HashMap<String,List<Person>>();
//add a person
void addPersonToMap(Person person)
{
    insertPersonIntoMap(ID_PREFIX+person.id, person);
    insertPersonIntoMap(USERNAME_PREFIX+person.userName, person);
    insertPersonIntoMap(FIRSTNAME_PREFIX+person.firstName, person);
}
//note that a List allows duplicates, so we skip persons that are already in it; this makes repeated calls for the same person safe.
void insertPersonIntoMap(String key, Person person)
{
    List<Person> personsList = personMap.get(key);
    if(personsList==null)
    {
        personsList = new ArrayList<Person>();
        personMap.put(key,personsList);
    }
    if(!personsList.contains(person))
        personsList.add(person);
}
//we know id is unique, so we can just get the only person in the list
Person findPersonByID(long id)
{
List<Person> personList = personMap.get(ID_PREFIX+id);
if(personList!=null)
return personList.get(0);
return null;
}
//get list of persons with firstname
List<Person> findPersonsByFirstName(String firstname)
{
return personMap.get(FIRSTNAME_PREFIX+firstname);
}
At that point you're really getting into a grab-bag design but still very efficient if you're not expecting millions of entries.
The best performance-wise method I can think of is to have another HashMap, with the key being the attribute you want to search for, and the value being a list of objects.
For your example this would be HashMap<String, List<Person>>, with the key being the username. The downside is that you have to maintain two maps.
Note: I've used a List<Person> as the value because we cannot guarantee that username is unique among all users. The same applies for any other field.
For example, to add a Person to this new map you could do:
Map<String, List<Person>> peopleByUsername = new HashMap<>();
// ...
Person p = ...;
peopleByUsername.computeIfAbsent(
p.getUsername(),
k -> new ArrayList<>())
.add(p);
Then, to return all people whose username is, e.g., joesmith:
List<Person> matching = peopleByUsername.get("joesmith");
Getting one or a few entries from a volatile map
If the map you're operating on can change often and you only want to get a few entries then iterating over the map's entries is ok since you'd need space and time to build other structures or sort the data as well.
Getting many entries from a volatile map
If you need to get many entries from that map, you might get better performance by sorting the entries first (e.g. build a list and sort it) and then using binary search. Alternatively, you could build an intermediate map that uses the attribute(s) you need to search for as its key.
Note, however, that both approaches take time up front, so they only yield better performance when you're looking up many entries.
Getting entries multiple times from a "persistent" map
If your map and its values don't change (or not that often), you could maintain a map attribute -> person. This means some effort for the initial setup and for updating the additional map (unless your data doesn't change), as well as some memory overhead, but it speeds up lookups tremendously later on. This is a worthwhile approach when you do very few "writes" compared to how often you do lookups and if you can spare the memory overhead (which depends on how big those maps would be and how much memory you have to spare).
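A rough sketch of keeping such an additional map in step with the primary one, by routing every write through one small class. The `Member` and `MemberRegistry` names are hypothetical, and the sketch assumes that usernames are unique.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, cut-down person type.
final class Member {
    final long id;
    final String userName;

    Member(long id, String userName) {
        this.id = id;
        this.userName = userName;
    }
}

final class MemberRegistry {
    private final Map<Long, Member> byId = new HashMap<>();
    private final Map<String, Member> byUserName = new HashMap<>();

    // Adds or replaces a member and keeps the username map consistent.
    void add(Member m) {
        Member previous = byId.put(m.id, m);
        if (previous != null) {
            // Drop the stale username entry of the member that was replaced
            byUserName.remove(previous.userName);
        }
        byUserName.put(m.userName, m);
    }

    // Removes a member from both maps; returns null if the id is unknown.
    Member remove(long id) {
        Member removed = byId.remove(id);
        if (removed != null) {
            byUserName.remove(removed.userName);
        }
        return removed;
    }

    Member findById(long id) {
        return byId.get(id);
    }

    Member findByUserName(String userName) {
        return byUserName.get(userName);
    }
}
```

Every write touches two maps, which is the maintenance cost described above; reads by either key remain a single hash lookup.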
Consider one hashmap per alternate key. This will have a "high" setup cost, but will result in quick retrieval by alternate key.
Set up the hashmap using the Long key value. Run through the hashmap's Person objects and create a second hashmap (HashMap<String, Person>) for which username is the key. Perhaps fill both hashmaps at the same time.
In your case, you will end up with something like HashMap<Long, Person> idKeyedMap and HashMap<String, Person> usernameKeyedMap.
You can also put all the key values in the same map if you define the map as Map<Object, Person>. Then, when you add the (id, person) pair, you also need to add the (username, person) pair. Caveat: this is not a great technique.
What is the best way to solve the problem?
There are many ways to tackle this as you can see in the answers and comments.
Consider how the Map is being used (and perhaps how it is created). If the Map is built from a select statement, with the long id value coming from a table column, we might think we should use HashMap<Long, Person>.
Another way to look at the problem is to consider that usernames should also be unique (i.e. no two persons should ever share the same username). So instead create the map as a HashMap<String, Person>, with the username as the key and the Person object as the value.
Using the latter:
Map<String, Person> users = retrieveUsersFromDatabase(); // perform db select and build map
String username = "mike";
Person mike = users.get(username); // null if there is no such user
This will be the fastest way to retrieve the object you want to find in a Map containing Person objects as its values.
You can simply copy the HashMap's values into a List using:
List<Person> list = new ArrayList<>(map.values());
Now you can iterate through the list easily. This way you can search the HashMap values on any property of the Person class, not just firstname.
The only downside is that you end up creating a list object. With the stream API you can filter map.values() directly in a single operation and skip the intermediate list. This is still a linear scan, and parallel streams are likely to help only for very large maps.
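A minimal sketch of the stream variant, assuming Java 8+ and a hypothetical, cut-down `SimplePerson` class. It filters the map's values directly, so no intermediate list is created, but it still visits entries one by one until a match is found.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, cut-down version of the question's Person class.
final class SimplePerson {
    final long id;
    final String userName;

    SimplePerson(long id, String userName) {
        this.id = id;
        this.userName = userName;
    }
}

final class StreamLookup {
    // Linear scan over the values; stops at the first match.
    static Optional<SimplePerson> findByUserName(Map<Long, SimplePerson> map,
                                                 String userName) {
        return map.values().stream()
                .filter(p -> p.userName.equalsIgnoreCase(userName))
                .findFirst();
    }
}
```

The same shape works for any other attribute by swapping the predicate passed to `filter`.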
Sorting and finding of value object can be done by designing and using an appropriate Comparator class.
Comparator Class : Designing a Comparator with respect to a specific attribute can be done as follows:
class UserComparator implements Comparator<Person>{
@Override
public int compare(Person p1, Person p2) {
return p1.userName.compareTo(p2.userName);
}
}
Usage : Comparator designed above can be used as follows:
HashMap<Long, Person> personMap = new HashMap<Long, Person>();
.
.
.
ArrayList<Person> pAL = new ArrayList<Person>(personMap.values()); //create list of values
Collections.sort(pAL,new UserComparator()); // sort the list using comparator
Person p = new Person(); // create a dummy object
p.userName="mike"; // Only set the username
int i= Collections.binarySearch(pAL,p,new UserComparator()); // search the list using comparator
if(i>=0){
Person p1 = pAL.get(i); //Obtain object if username is present
}else{
System.out.println("Insertion point: "+ (-i - 1)); // binarySearch returns -(insertion point) - 1 if username is not present
}

java: check if an object's attribute exists in HashMap values

I have a HashMap with key of type Double and my custom object as value.
It looks like this:
private static Map<Double, Incident> incidentHash = new HashMap<>();
The Incident object has following attributes: String date, String address, String incidentType.
Now I have a String date that I get from the user as input and I want to check if there exists any incident in the HashMap with that user inputted date. There can be many Incidents in the HashMap with the given date but as long as there's at least one Incident with the given date, I can do something.
I can just iterate over all the values in the HashMap and check if a given date exists but I was wondering if there is any better and more efficient way possible without modifying the data structure.
Given your HashMap, NO, there is not another way of doing so without iterating that HashMap.
As for changing the structure, you could use a Map<String, List<Incident>>. That way you would have a date as the key and a List of incidents for that date, given your requirement: There can be many Incidents in the HashMap with the given date.
So this would be an O(1) lookup:
//considering that the key is added when you have at least one incident
if (yourHash.get("yourDateStringWhatEverTheFormatIs") != null)
You can use streams API (from Java8) as shown in the below code with inline comments:
String userInput="10-APR-2017";
Optional<Map.Entry<Double, Incident>> matchedEntry =
incidentHash.entrySet().stream().
//filter with the condition to match
filter(element -> element.getValue().getDate().equals(userInput)).findAny();
//if the entry is found, do your logic
matchedEntry.ifPresent(value -> {
//do something here
});
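Since the question only needs to know whether at least one matching incident exists, `anyMatch` is a slightly more direct fit than `findAny`. A minimal sketch, assuming Java 8+ and a hypothetical, cut-down `DatedIncident` class that exposes `getDate()`:

```java
import java.util.Map;

// Hypothetical, cut-down version of the question's Incident class.
final class DatedIncident {
    private final String date;

    DatedIncident(String date) {
        this.date = date;
    }

    String getDate() {
        return date;
    }
}

final class IncidentCheck {
    // True as soon as one incident with the given date is seen; still O(n) in the worst case.
    static boolean existsOnDate(Map<Double, DatedIncident> incidents, String date) {
        return incidents.values().stream()
                .anyMatch(incident -> date.equals(incident.getDate()));
    }
}
```

Like the loop with `break`, the stream short-circuits on the first match.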
If you need something that works before JDK 1.8, you can refer to the code below:
String userInput="10-APR-2017";
Set<Map.Entry<Double, Incident>> entries = incidentHash.entrySet();
Map.Entry<Double, Incident> matchedEntry = null;
for(Iterator<Map.Entry<Double, Incident>> iterator = entries.iterator();
iterator.hasNext();) {
Map.Entry<Double, Incident> temp = iterator.next();
if(temp.getValue().getDate().equals(userInput)) {
matchedEntry = temp;
break;
}
}
You can use a TreeMap with your custom Comparator. Note that a TreeMap orders its keys, so the date would have to be (part of) the key for the Comparator to compare dates.
You would have to iterate through the map until you find a data that matches. Since you only need to know if any occurrences exist you can simply exit the loop when you find a match instead of iterating the rest of the map.
You can also keep a second Hash/TreeMap that maps the attribute to the object, so you can check this attribute quickly. But you have to maintain one such map for each attribute you want to access quickly. This makes things a bit more complex and uses more memory, but can be much faster.
If this is not an option the stream API referenced in other answers is a nice and tidy way to iterate over all objects to search for an attribute.
private static Map<Double, Incident> incidentHash = new HashMap<>();
private static Map<String, List<Incident>> incidentsPerDayMap = new HashMap<>();
Given that you don't want to iterate the Map and that iterating is currently the only way to get the required value, I would recommend another Map that has the Date as key and a List<Incident> as value. It can be a TreeMap, e.g.:
Map<Date, List<Incident>> incidents = new TreeMap<>();
You can put the entry in this Map whenever an entry is added into the original Map, e.g.:
Incident incident = ...; // incident object
Date date = ...; // date of the incident
incidents.computeIfAbsent(date, t -> new ArrayList<>()).add(incident);
Once the user inputs the Date, you can get all the incidents belonging to this date just by calling incidents.get(date). Although that gives you a list that you still need to iterate over, it will contain far fewer elements, and the get method of TreeMap guarantees O(log n) complexity because the keys are sorted. So your search operation will be much more efficient.
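A minimal sketch of such a date-keyed index. It makes two assumptions that are not in the answer above: it uses `java.time.LocalDate` instead of `Date`, and it stores only a `String` incident type per entry to stay short. The `IncidentsByDate` class and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class IncidentsByDate {
    // Sorted by date; each date maps to the incident types recorded on it.
    private final NavigableMap<LocalDate, List<String>> byDate = new TreeMap<>();

    void add(LocalDate date, String incidentType) {
        byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(incidentType);
    }

    // O(log n) existence check, no iteration over incidents needed.
    boolean hasIncidentOn(LocalDate date) {
        return byDate.containsKey(date);
    }

    // Counts incidents between from and to, both inclusive (from must not be after to).
    int countBetween(LocalDate from, LocalDate to) {
        int total = 0;
        for (List<String> incidents : byDate.subMap(from, true, to, true).values()) {
            total += incidents.size();
        }
        return total;
    }
}
```

The range query is what a sorted map adds over a plain HashMap; if only exact-date lookups are needed, a HashMap keyed by date would do as well.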

How to put record in map of linkedhashmap java or how to create data structure?

How do I put a record into:
Map<String, LinkedHashMap<String, String>> fileRecords = new LinkedHashMap<String, LinkedHashMap<String,String>>();
I have two files in the same format. I have to create a data structure for each of them and then compare the two structures to identify the differences: for each CAID, find the blocks and compare the data.
So I tried to read the file line by line, but I am confused about whether to put the records in a map or in an ArrayList (I have to preserve the insertion order).
Take the ID as unique; for each unique ID, I want to store the corresponding block and data.
The decision to use a Map or a List or a Set lies on the purpose of the data being read.
If you need to index the data (e.g. by ID), then use a Map.
Otherwise, if you want to iterate through the data in order, I would recommend a List or a Set: a Set if your dataset must not contain duplicates, or a List if it may.
About the code using Map, the piece of code below groups Block and Data per ID:
public void read() {
Map<String, LinkedHashMap<String, String>> fileRecords = new LinkedHashMap<>();
String id, block, data;
//while(dataInFile()) {
// I assume data is read row by row into vars id, block and data
// get the block map for the just read ID
LinkedHashMap<String, String> blockRecords = fileRecords.get(id);
// first Block for the ID
if (blockRecords == null) {
blockRecords = new LinkedHashMap<>();
fileRecords.put(id, blockRecords);
}
// read data and block
blockRecords.put(block, data);
}
This code also allows you to group block and data if the IDs are not sorted in the file.
If the ID is always an integer, I recommend using an Integer instead of a String. It saves memory and CPU during the matching.
Also, if the data in your file is not just these 3 fields, I'd recommend using objects instead. Then you will need to be more mindful of how the data is used when designing the object model.
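The question also asks how to compare the two files once they are loaded. A rough sketch under the assumption that each file has already been read into the nested map shown above; the `FileRecords` class and its method names are hypothetical. `computeIfAbsent` replaces the explicit null check when adding a row.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class FileRecords {
    // Adds one (id, block, data) row, creating the inner map the first time an id is seen.
    static void put(Map<String, LinkedHashMap<String, String>> records,
                    String id, String block, String data) {
        records.computeIfAbsent(id, k -> new LinkedHashMap<>()).put(block, data);
    }

    // Returns the ids whose block-to-data mapping differs between the two files,
    // including ids that appear in only one of them. Map.equals ignores the
    // order of the blocks, so only their content is compared.
    static Set<String> differingIds(Map<String, LinkedHashMap<String, String>> first,
                                    Map<String, LinkedHashMap<String, String>> second) {
        Set<String> allIds = new LinkedHashSet<>(first.keySet());
        allIds.addAll(second.keySet());
        Set<String> differing = new LinkedHashSet<>();
        for (String id : allIds) {
            if (!Objects.equals(first.get(id), second.get(id))) {
                differing.add(id);
            }
        }
        return differing;
    }
}
```

For each id that is reported, the two inner maps can then be compared block by block to find the exact differences.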

Best way to save some data and then retrieve it

I have a project where I save some data coming from different channels of a Soap Service, for example:
String_Value Long_timestamp Double_value String_value String_value Int_value
I can have many lines (i.e. 200), with different values, like the one above.
I thought that I could use an ArrayList, however data can have a different structure than the one above, so an ArrayList maybe isn't a good solution in order to retrieve data from it.
For example, above I have 4 values after the first two, which are always fixed, but in another channel I may have 3 or 5 values. When I want to retrieve data, I must know how many values a particular line has, and I think that an ArrayList doesn't help me with that.
What solution could I use?
When you have a need to uniquely identify varying length input, a HashMap usually works quite well. For example, you can have a class:
public class Record
{
private HashMap<String, String> values;
public Record()
{
// create your hashmap.
values = new HashMap<String, String>();
}
public String getData(String key)
{
return values.get(key);
}
public void addData(String key, String value)
{
values.put(key, value);
}
}
With this type of structure, you can save as many different values as you want. What I would do is loop through each value passed from Soap and simply add to the Record, then keep a list of Record objects.
Record rec = new Record();
rec.addData("timestamp", timestamp);
rec.addData("Value", value);
rec.addData("Plans for world domination", dominationPlans);
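An alternative sketch for the variable-length case, which keeps the two fixed fields typed and the remaining values in a list, so that `values.size()` answers how many values a line has. It assumes that a line is whitespace-separated and that no value contains whitespace; neither is stated in the question. The `ChannelLine` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ChannelLine {
    final String name;          // first fixed field
    final long timestamp;       // second fixed field
    final List<String> values;  // the 3, 4, 5, ... remaining values

    private ChannelLine(String name, long timestamp, List<String> values) {
        this.name = name;
        this.timestamp = timestamp;
        this.values = values;
    }

    // Parses "name timestamp v1 v2 ..." (assumed whitespace-separated).
    static ChannelLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected at least name and timestamp: " + line);
        }
        List<String> rest = new ArrayList<>(Arrays.asList(parts).subList(2, parts.length));
        return new ChannelLine(parts[0], Long.parseLong(parts[1]),
                Collections.unmodifiableList(rest));
    }
}
```

A `List<ChannelLine>` then holds all lines, whatever their length.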
You could build classes representing the entities and then build a parser. If the data isn't in a standard format (e.g. JSON, YAML, etc.), you have no choice but to develop your own parser.
Create a class with fields.
class ClassName{
int numberOfValues;
String dataString;
...
}
Now create an ArrayList of that class, like ArrayList<ClassName>. For each record, fill an object of that class with numberOfValues and dataString and add it to the ArrayList.

How to check if a Set contains an object which has one member variable equal to some value

I have a java Set of Result objects. My Result class definition looks like this:
private String url;
private String title;
private Set<String> keywords;
I have stored my information in a database table called Keywords which looks like this
Keywords = [id, url, title, keyword, date-time]
As you can see there isn't a one-to-one mapping between an object and a row in the database. I am using SQL (MySQL DB) to extract the values and have a suitable ResultSet object.
How do I check whether the Set already contains a Result with a given URL.
If the set already contains a Result object with the current URL I simply want to add the extra keyword to the Set of keywords, otherwise I create a new Result object for adding to the Set of Result objects.
When you iterate over the JDBC resultSet (to create your own set of Results) why don't you put them into a Map? To create the Map after the fact:
Map<String, List<Result>> map = new HashMap<String, List<Result>>();
for (Result r : resultSet) {
if (map.containsKey(r.url)) {
map.get(r.url).add(r);
} else {
List<Result> list = new ArrayList<Result>();
list.add(r);
map.put(r.url, list);
}
}
Then just use map.containsKey(url) to check.
Normalization is your friend
http://en.wikipedia.org/wiki/Database_normalization
If it's possible, I suggest changing your database design to eliminate this problem. Your current design requires storing the id, url, title and date-time once per keyword, which could waste quite a bit of space if you have lots of keywords.
I would suggest having two tables. Assuming that the id field is guaranteed to be unique, the first table would store the id, url, title and date-time and would only have one row per id. The second table would store the id and a keyword. You would insert multiple rows into this table as required.
Is that possible / does that make sense?
You can use a Map with the URLs as the keys:
Map<String, Result> map = new HashMap<String, Result>();
for (Result r : results) {
if (map.containsKey(r.url)) {
map.get(r.url).keywords.addAll(r.keywords);
} else {
map.put(r.url, r);
}
}
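The same idea can be applied while reading the rows, so that no duplicate Result objects are created in the first place. A minimal sketch, assuming each row has already been pulled out of the JDBC ResultSet as a `{url, title, keyword}` array; the `PageResult` and `ResultGrouper` names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the question's Result class.
final class PageResult {
    final String url;
    final String title;
    final Set<String> keywords = new LinkedHashSet<>();

    PageResult(String url, String title) {
        this.url = url;
        this.title = title;
    }
}

final class ResultGrouper {
    // Each row is {url, title, keyword}; one PageResult is created per distinct url.
    static Map<String, PageResult> group(Iterable<String[]> rows) {
        Map<String, PageResult> byUrl = new LinkedHashMap<>();
        for (String[] row : rows) {
            byUrl.computeIfAbsent(row[0], url -> new PageResult(url, row[1]))
                    .keywords.add(row[2]);
        }
        return byUrl;
    }
}
```

If a Set of results is still needed afterwards, `new HashSet<>(byUrl.values())` produces one.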
I think that you need to override the equals() method of your Result class. In that method you put the logic that checks what you are looking for.
N.B. When you override the equals() method, you also need to override the hashCode() method.
For more on overriding equals() and hashCode(), see this other question.
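A minimal sketch of what that override could look like if two results count as equal whenever their URLs match; the `UrlResult` class is a hypothetical stand-in for `Result`. Note that a Set can then tell you whether a result with that URL is present, but it has no method to hand back the stored element, so for adding keywords to an existing result a Map keyed by URL remains more convenient.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the question's Result class.
final class UrlResult {
    final String url;
    final String title;
    final Set<String> keywords = new HashSet<>();

    UrlResult(String url, String title) {
        this.url = url;
        this.title = title;
    }

    // Two results are considered equal when their URLs are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UrlResult)) return false;
        return Objects.equals(url, ((UrlResult) o).url);
    }

    // Must be consistent with equals, so it uses the URL only.
    @Override
    public int hashCode() {
        return Objects.hashCode(url);
    }
}
```

With this in place, `set.contains(new UrlResult(url, null))` checks for a given URL using a throwaway probe object.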
