ArrayList of objects vs ArrayList of HashMaps - Java

I have a file containing approximately 10,000 JSON dumps. Each JSON object has about 20 fields, of which only 5 are of use to me. I need to iterate over the file, parse each JSON object and store the relevant elements for further processing.
In Java, what would be an efficient data structure to store the relevant JSON fields? I am torn between an ArrayList of objects (for which I will create a bean to hold the various fields) and an ArrayList of HashMaps (where each of the relevant JSON fields will be stored as a key-value pair).
Which of the two is better with regard to memory usage and computation?

It depends on your use case. If you are going to use all 5 fields as they are, for example putting them in a database or displaying them in a UI, then use the first approach (a list of beans). If you are going to use the fields selectively (1 out of 5 fields here, another of the 5 fields there), then the second approach is better (a list of hash maps).

A List of beans has better type safety and readability. Use that until you can prove that there is a problem with that approach.

If you have a fixed set of fields, an object will be smaller than a HashMap.
The HashMap has to store the keys as Strings (plus an entry per key) for each instance. Accessing the fields in an object will also be much faster. Accessing a field in an object is a single bytecode operation. Accessing a HashMap requires computing the hash for the given key and then accessing an element in an array.
Regardless, performance is probably not a dominant factor for this particular problem and using an Object will probably be more readable.
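To make the comparison concrete, here is a minimal sketch of both layouts. The class name JsonRecord and the five field names are made up for illustration, since the question does not say which fields are kept, and the JSON parsing itself is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bean: the five field names are placeholders, not taken from the question.
final class JsonRecord {
    final String id;
    final String name;
    final String type;
    final long timestamp;
    final double score;

    JsonRecord(String id, String name, String type, long timestamp, double score) {
        this.id = id;
        this.name = name;
        this.type = type;
        this.timestamp = timestamp;
        this.score = score;
    }
}

final class StorageDemo {
    // Bean variant: every field is typed and read with a plain field access.
    static List<JsonRecord> asBeans() {
        List<JsonRecord> records = new ArrayList<>(10_000);
        records.add(new JsonRecord("42", "sample", "demo", 1_000L, 0.5));
        return records;
    }

    // Map variant: every row carries its own key/value entries and values need casting.
    static List<Map<String, Object>> asMaps() {
        List<Map<String, Object>> records = new ArrayList<>(10_000);
        Map<String, Object> row = new HashMap<>();
        row.put("id", "42");
        row.put("name", "sample");
        row.put("type", "demo");
        row.put("timestamp", 1_000L);
        row.put("score", 0.5);
        records.add(row);
        return records;
    }
}
```

With the bean, a typo in a field name is a compile error; with the map, a typo in a key silently returns null.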

Related

Best way to map simple values

I have a question about mapping one Long value to another Long value and the best way to achieve that.
I have to map the left values to the right values just before writing data to the database.
3 => 70
8 => 12
1 => 45
Is there any "best way"? I was thinking about a static map where the left Long is the key and the right Long is the value, so I just have to get the value corresponding to a given key.
Is this a good approach?
You have two main options: an associative container, or an array. If the input values are all within a small range and performance is very important, you could use an array. Otherwise you may as well use a map as you said.
As @John Zwinck points out, a map is generally fine for this type of thing. The cost of mapping from one Long to another is trivial and is going to be dwarfed by the network latency of writing to a database (so don't use a primitive array :).
Open for extension, but closed for modification
I think it's probably more important for you to consider what happens if the mappings change or you need to add another one. In line with SOLID principles (and in particular open-closed), it should be possible to modify the mappings without changing the class.
In practice you should make sure you can read the mappings (initially, on demand or periodically) from an external source (e.g. a property file, a database, a NoSQL cache).
Use a map and pass the initial capacity to the constructor if you know the number of key-value pairs. Choose the map implementation carefully, depending on your concurrency and ordering requirements.
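A rough sketch of how the suggestions above could fit together: the mappings are read from properties-style text and kept in a pre-sized, read-only map. The class name ValueMapper and the `3=70` text format are assumptions for illustration; a real setup would load from whichever external source you choose.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: holds Long-to-Long mappings loaded from an external source.
final class ValueMapper {
    private final Map<Long, Long> mappings;

    ValueMapper(Properties props) {
        // Size the map up front so it does not need to resize while loading.
        Map<Long, Long> loaded = new HashMap<>(Math.max(16, props.size() * 2));
        for (String key : props.stringPropertyNames()) {
            loaded.put(Long.valueOf(key.trim()), Long.valueOf(props.getProperty(key).trim()));
        }
        this.mappings = Collections.unmodifiableMap(loaded);
    }

    // Returns the mapped value, or null when the input has no mapping.
    Long map(Long input) {
        return mappings.get(input);
    }

    // Convenience loader for properties-style text such as "3=70".
    static ValueMapper fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ValueMapper(props);
    }
}
```

Usage would look like `ValueMapper.fromText("3=70\n8=12\n1=45\n").map(3L)`. Changing a mapping then means editing the external text, not the class.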

Create Weka Instance with string attribute

I'm trying to convert an ArrayList of a custom Instance class (code I have inherited) to a Weka Instances structure so I can use the Weka IBk classifier on it.
In the custom Instance the features are represented with a HashMap. So if I'm classifying a film review, for example, a feature might be a HashMap entry of ("funny", 2), 2 being the number of occurrences of the word "funny".
Although there's probably a better way, I'm iterating over my instances to try to convert them to Weka Instances.
The problem is I can't call instance.setValue("funny", 2), as setValue() requires (int, double) input. Is there a way to do this, or should I be approaching it a different way?
You can create an attribute per key (get all your distinct keys in a list, sort it to keep the order fixed). The attributes will have numeric values which are the number of occurrences.
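The idea above can be sketched in plain Java without touching the Weka API: collect the distinct keys, sort them so that every word has a fixed index, then turn each HashMap into a dense array of counts. The class and method names here are made up for illustration, and how the resulting arrays are handed to Weka is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: turns sparse word-count maps into fixed-order numeric vectors.
final class FeatureVectors {
    // Collect every distinct key across all instances; TreeSet keeps them sorted.
    static List<String> vocabulary(List<Map<String, Integer>> instances) {
        TreeSet<String> keys = new TreeSet<>();
        for (Map<String, Integer> features : instances) {
            keys.addAll(features.keySet());
        }
        return new ArrayList<>(keys);
    }

    // One slot per vocabulary word; words missing from this instance count as 0.
    static double[] toVector(Map<String, Integer> features, List<String> vocabulary) {
        double[] values = new double[vocabulary.size()];
        for (int i = 0; i < values.length; i++) {
            Integer count = features.get(vocabulary.get(i));
            values[i] = (count == null) ? 0.0 : count;
        }
        return values;
    }
}
```

The index of a word in the vocabulary list is then the int that an (int, double) style setter expects, and the array entry is the double.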

How to get the Object Reference having the hashcode or other Unique String in Java

This is my issue. I'm storing data in a database table which has a column where I store the hashcode (or it could be some other unique String such as an ID, because the JVM can relocate the objects, changing the hashcode). Once I get that String back, I want to access the object mapped to that String. I can do it with a HashMap like:
ConcurrentHashMap<String, MyClass> MyClassDictionary;
The average number of objects to store would be around 800 or more. I could take other options to avoid this kind of thing, but I really want to know if any of you know a better way than using a HashMap.
I found something about a Referenceable interface that I could implement; you can check it out at the following link:
http://docs.oracle.com/javase/jndi/tutorial/objects/storing/reference.html
Thanks for reading.
You can use any immutable key in the HashMap. String is immutable by nature, which means the object cannot be changed; if someone tries to change it, a new object is created and the original remains as it is. So you are safe if you use unique strings as keys. The advantage of using immutable keys in any hashed collection is that your key object will always be preserved unchanged, so there is no chance that someone changes the key by mistake and you lose the reference to the value. If the key is not immutable and it is changed from some other place in the code, then you will never be able to fetch the value associated with that key. This is sometimes referred to as a memory leak in Java.
The hashCode of an object is very explicitly not unique; it is quite legal for your hashCode() method to just return 0 all the time. You will need to use some other identifier.
You look like you're crossing two separate issues here: Are your objects being stored in the database or just in memory? If they're only in memory, then there's no reason to put the identifier in the database, because the objects will get thrown away when the program restarts. If they're in the database, you need some sort of object-relational mapping solution to recreate Java objects from database rows, and you should look at JPA.
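If the objects really do live only in memory, a small registry keyed by a generated identifier is one way to avoid relying on hashCode(). This is only a sketch under that assumption; the class name Registry and the choice of a random UUID as the identifier are mine, not something from the question.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory registry: hands out a unique String id for each stored object.
final class Registry<T> {
    private final ConcurrentMap<String, T> byId = new ConcurrentHashMap<>();

    // Stores the value and returns the id that can be written to the database column.
    String register(T value) {
        String id = UUID.randomUUID().toString();
        byId.put(id, value);
        return id;
    }

    // Returns the stored object, or null when the id is unknown.
    T lookup(String id) {
        return byId.get(id);
    }

    // Removes the entry so the object can be garbage collected.
    T remove(String id) {
        return byId.remove(id);
    }
}
```

The ids only mean something while the process is running, which is the point made above: after a restart the map is empty and any ids left in the database no longer resolve.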

Java map content comparison

Here is a tricky data structure and data organization case.
I have an application that reads data from large files and produces objects of various types (e.g., Boolean, Integer, String) that are categorized in a few (less than a dozen) groups and then stored in a database.
Each object is currently stored in a single HashMap<String, Object> data structure. Each such HashMap corresponds to a single category (group). Each database record is built from the information in all the objects contained in all categories (HashMap data structures).
A requirement has appeared for checking whether subsequent records are "equivalent" in the number and type of columns, where equivalence must be verified across all maps by comparing the name (HashMap key) and the type (actual class) of each stored object.
I am looking for an efficient way of implementing this functionality, while maintaining the original object categorization, because listing objects by category in the fastest possible way is also a requirement.
An idea would be to just sort the keys (e.g., by replacing each HashMap with a TreeMap) and then walk over all maps. An alternative would be to just copy everything in a TreeMap for comparison purposes only.
What would be the most efficient way of implementing this functionality?
Also, how would you go about finding the difference (i.e., the fields added and those removed) between successive records?
Create a meta SortedSet in which you store all the created maps.
That means a SortedSet<Map<String,Object>>, e.g. a TreeSet which has a custom Comparator<Map<String,Object>> that checks exactly your requirements: the same number and names of keys and the same object type per value.
You can then use the contains() method of this meta set structure to find out if a similar record does already exist.
==== EDIT ====
Since I misunderstood the relation between database records and the maps in the first place, I have to adjust the semantics of my answer a little.
I would still use the mentioned SortedSet<Map<String,Object>>, but the Map<String,Object> would now point to the Map that you and havexy suggested.
On the other hand, it could be a step forward to use a Set<Set<KeyAndType>> or SortedSet<Set<KeyAndType>>, where KeyAndType contains only the key and the type, with an appropriate Comparable implementation or equals() with hashCode().
Why? You asked how to find the differences between two records? If each record relates to one of those inner Set<KeyAndType> you can easily use retainAll() to form the intersection of two successive Sets.
Compare this to the idea of a SortedSet<Map<String,Object>>: either way, the logic that differentiates between the fields lives in the comparator, in one case comparing inner sets, in the other comparing inner maps. Since this information gets lost when the surrounding set is constructed, it will be hard to get the differences between two records later on, unless you have another, reduced structure that makes such differences easy to find. And since such a Set<KeyAndType> could act both as a key and as an easy basis for comparison between two records, it is a good candidate for both purposes.
If, furthermore, you want to keep the relation between such a Set<KeyAndType> and your record or the group of Map<String,Object>, your meta structure could be something like:
Map<Set<KeyAndType>,DatabaseRecord> or Map<Set<KeyAndType>,GroupOfMaps> implemented by a simple LinkedHashMap which allows simple iteration in original order.
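A minimal sketch of the Set<KeyAndType> idea, assuming KeyAndType is a small value class with equals() and hashCode(), and showing only a single map per record for brevity. The helper names (RecordShape, shapeOf, added, removed) are invented for illustration, and treating null values as Void.class is an arbitrary choice. It uses removeAll() on copies to get the added and removed fields; retainAll() on a copy would give the intersection in the same way.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Value class holding only a column name and the runtime type stored under it.
final class KeyAndType {
    final String key;
    final Class<?> type;

    KeyAndType(String key, Class<?> type) {
        this.key = key;
        this.type = type;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeyAndType)) return false;
        KeyAndType other = (KeyAndType) o;
        return key.equals(other.key) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, type);
    }

    @Override
    public String toString() {
        return key + ":" + type.getSimpleName();
    }
}

final class RecordShape {
    // Reduce a map to its "shape": key names plus value classes (null treated as Void).
    static Set<KeyAndType> shapeOf(Map<String, Object> map) {
        Set<KeyAndType> shape = new HashSet<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            Object value = e.getValue();
            shape.add(new KeyAndType(e.getKey(), value == null ? Void.class : value.getClass()));
        }
        return shape;
    }

    // Fields present in the current record but not in the previous one.
    static Set<KeyAndType> added(Set<KeyAndType> previous, Set<KeyAndType> current) {
        Set<KeyAndType> result = new HashSet<>(current);
        result.removeAll(previous);
        return result;
    }

    // Fields present in the previous record but not in the current one.
    static Set<KeyAndType> removed(Set<KeyAndType> previous, Set<KeyAndType> current) {
        return added(current, previous);
    }
}
```

Two records are equivalent exactly when their shapes are equal sets, so `shapeOf(a).equals(shapeOf(b))` covers the equivalence check and the same sets can serve as map keys.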
One solution is to keep both the category-based HashMaps and a combined TreeMap. This has a slightly higher memory requirement, though not by much, as you'll just keep the same references in both of them.
So whenever you are adding/removing to HashMap you will do the same operation in the TreeMap too. This way both will always be in sync.
You can then use TreeMap for comparison, whether you want comparison of type of object or actual content comparison.
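Roughly, the two-structure approach could look like the sketch below. The class name CategorizedStore and the "category/key" naming in the combined TreeMap are my own assumptions (the prefix is there so that equal key names in different categories do not collide), and values are assumed to be non-null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical store: per-category HashMaps plus one combined, sorted TreeMap kept in sync.
final class CategorizedStore {
    private final Map<String, Map<String, Object>> byCategory = new HashMap<>();
    private final TreeMap<String, Object> combined = new TreeMap<>();

    // Every write goes to both structures; they share the same value reference.
    void put(String category, String key, Object value) {
        byCategory.computeIfAbsent(category, c -> new HashMap<>()).put(key, value);
        combined.put(category + "/" + key, value);
    }

    void remove(String category, String key) {
        Map<String, Object> group = byCategory.get(category);
        if (group != null) {
            group.remove(key);
        }
        combined.remove(category + "/" + key);
    }

    // Fast listing by category straight from the HashMap side.
    Map<String, Object> category(String category) {
        return byCategory.getOrDefault(category, Collections.emptyMap());
    }

    // Walk both sorted maps in step: same keys and same value classes means same shape.
    boolean sameShapeAs(CategorizedStore other) {
        if (combined.size() != other.combined.size()) {
            return false;
        }
        Iterator<Map.Entry<String, Object>> mine = combined.entrySet().iterator();
        Iterator<Map.Entry<String, Object>> theirs = other.combined.entrySet().iterator();
        while (mine.hasNext()) {
            Map.Entry<String, Object> a = mine.next();
            Map.Entry<String, Object> b = theirs.next();
            if (!a.getKey().equals(b.getKey())) return false;
            if (a.getValue().getClass() != b.getValue().getClass()) return false;
        }
        return true;
    }
}
```

Because both TreeMaps iterate in key order, the comparison is a single linear pass regardless of the order in which the objects were inserted.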

Best Java data structure to store a 3 column oracle table? 3 column array? or double map?

What is the best data structure to store an Oracle table that's about 140 rows by 3 columns? I was thinking about a multi-dimensional array.
By best I do not necessarily mean most efficient (but I'd be curious to know your opinions), since the program will run as a job with plenty of time to run, but I do have some restrictions:
It is possible for multiple keys to be "null" at first, so the first column might have multiple null values. I also need to be able to access elements from the other columns. Is there anything better than a linear search to access the data?
So again, something like [][][] would work, but is there something like a 3-column map where I can access by the key or by the second column? I know maps have only two values.
All data will probably be strings or cast as strings.
Thanks
A custom class with 3 fields, and a java.util.List of that class.
There's no benefit in shoe-horning data into arrays in this case, you get no improvement in performance, and certainly no improvement in code maintainability.
This is another example of people writing FORTRAN in an object-oriented language.
Java's about objects. You'd be much better off if you started using objects to abstract your problem, hide details away from clients, and reduce coupling.
What sensible object, with meaningful behavior, do those three items represent? I'd start with that, and worry about the data structures and persistence later.
"All data will probably be strings or cast as strings."
This is fine if they really are strings, but I'd encourage you to look deeper and see if you can do better.
For example, if you write an application that uses credit scores you might be tempted to persist it as a number column in a database. But you can benefit from looking at the problem harder and encapsulating that value into a CreditScore object. When you have that, you realize that you can add something like units ("FICO" versus "TransUnion"), scale (range from 0 to 850), and maybe some rich behavior (e.g., rules governing when to reorder the score). You encapsulate everything into a single object instead of scattering the logic for operating on credit scores all over your code base.
Start thinking less in terms of tables and columns and more about objects. Or switch languages. Python has the notion of tuples built in. Maybe that will work better for you.
If you need to access your data by key and by another key, then I would just use 2 maps for that and define a separate class to hold your record.
class Record {
String field1;
String field2;
String field3;
}
and
Map<String, Record> firstKeyMap = new HashMap<String, Record>();
Map<String, Record> secondKeyMap = new HashMap<String, Record>();
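Here is one way the two-map idea could be wrapped up, as a sketch only. I have used the hypothetical names Row and RowTable, and I skip null keys when indexing because the question says the first column may hold several nulls; those rows stay reachable through the full list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row class for the 3-column table; all columns kept as Strings.
final class Row {
    final String first;
    final String second;
    final String third;

    Row(String first, String second, String third) {
        this.first = first;
        this.second = second;
        this.third = third;
    }
}

// Keeps every row in a list and indexes non-null values of the first two columns.
final class RowTable {
    private final List<Row> rows = new ArrayList<>();
    private final Map<String, Row> byFirst = new HashMap<>();
    private final Map<String, Row> bySecond = new HashMap<>();

    void add(Row row) {
        rows.add(row);
        // Null keys are not indexed, so several rows may have a null first column.
        if (row.first != null) byFirst.put(row.first, row);
        if (row.second != null) bySecond.put(row.second, row);
    }

    Row findByFirst(String key) {
        return byFirst.get(key);
    }

    Row findBySecond(String key) {
        return bySecond.get(key);
    }

    List<Row> all() {
        return rows;
    }
}
```

Both lookups are hash lookups rather than linear searches, and each map only holds references to the same Row objects. If a column can repeat non-null values, the last row added wins in that index, so a Map<String, List<Row>> would be needed instead.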
I'd create an object that maps to your record and then create a collection of these objects.
