Which Java data structure moves the position of the last retrieved element?

I used to understand a bunch of Java data structures, but I have not used them for a while. I am looking for two data structures:
1. the data structure that moves the last retrieved element to the LAST position;
2. the data structure that moves the last retrieved element to the FIRST position.
I have tried to look for them on the Internet and saw e.g. LinkedList, ArrayList, HashMap, HashSet, etc. They all come with a description and instructions on how to use them, but none of them provide the two behaviors I mentioned above. So which are those two Java data structures?

See java.util.LinkedHashMap. From the API documentation: it is possible to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). In other words, the last retrieved entry moves to the LAST position, which is your #1.
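For illustration, here is a minimal sketch of that access-order mode; the class and variable names are mine, not from the API docs:

import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // The third constructor argument (true) selects access-order iteration:
        // every get() moves the retrieved entry to the end of the iteration order.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        map.get("a"); // "a" is now the most-recently accessed entry

        System.out.println(map.keySet()); // prints [b, c, a]
    }
}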

If you need a fixed-size data structure, please consider using the LRUMap from Apache Commons Collections.
This is only half an answer, though, since an LRUMap-like structure accomplishes only #1 of your requirements.
For #2, you need an MRUMap-like structure (MRU stands for most-recently-used), which moves the last accessed element to the front. As a guide, refer to this LinkedHashMap-based implementation, whose drawback is that only the put() operation counts as an access.
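In case that link goes stale, here is a minimal move-to-front sketch of my own; MoveToFrontList is a hypothetical class name, and lookups are linear:

import java.util.LinkedList;

// Hypothetical sketch: a list that moves the last retrieved element to the FIRST position.
public class MoveToFrontList<E> {
    private final LinkedList<E> elements = new LinkedList<>();

    public void add(E element) {
        elements.add(element);
    }

    // Retrieves the element equal to the argument and relocates it to the front.
    public E retrieve(E element) {
        int index = elements.indexOf(element);
        if (index < 0) {
            return null; // not present
        }
        E found = elements.remove(index);
        elements.addFirst(found);
        return found;
    }

    @Override
    public String toString() {
        return elements.toString();
    }
}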


ArrayList of objects vs ArrayList of HashMaps

I have a file containing approximately 10,000 JSON dumps. Each JSON object has about 20 fields, of which only 5 are of use to me. I need to iterate over the file, parse each JSON object, and store the relevant elements for further processing.
What would be an efficient Java data structure for storing the relevant JSON fields? I am torn between an ArrayList of objects (for which I would create a bean to hold the various fields) and an ArrayList of HashMaps (where each relevant JSON field is stored as a key-value pair).
Which of the two is better in terms of memory usage and computation?
It depends on your use case. If you are going to use all five fields together, for example putting them in a database or displaying them in a UI, then use the first approach (a list of beans). If you are going to use the fields selectively (one of the five fields here, another there), then the second approach (a list of hash maps) is better.
A List of beans has better type safety and readability. Use that until you can prove that there is a problem with the approach.
If you have a fixed set of fields, an object will be smaller than a HashMap.
A HashMap has to store the keys as Strings in each instance. Accessing the fields of an object will also be much faster: a field access is a single bytecode operation, whereas a HashMap lookup requires computing the hash of the given key and then accessing an element in an array.
Regardless, performance is probably not the dominant factor for this particular problem, and using an object will probably be more readable.
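To make the comparison concrete, here is a small sketch of both approaches; the field names are hypothetical stand-ins for your five relevant JSON fields:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JsonFieldStorageDemo {
    // Approach 1: a bean with one typed field per relevant JSON attribute.
    static class Record {
        String id;
        String name;
        int count;
        boolean active;
        double score;
    }

    public static void main(String[] args) {
        List<Record> beans = new ArrayList<>();
        List<Map<String, Object>> maps = new ArrayList<>();

        Record r = new Record();
        r.id = "42";
        r.name = "example";
        beans.add(r);

        // Approach 2: every map instance stores the key strings again.
        Map<String, Object> m = new HashMap<>();
        m.put("id", "42");
        m.put("name", "example");
        maps.add(m);

        String fromBean = beans.get(0).name;               // one field access
        String fromMap = (String) maps.get(0).get("name"); // hash lookup plus a cast
    }
}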

Problems that we use a BiMap to solve

I'm reviewing the capabilities of Google's Guava API and I ran into a data structure that I haven't seen used in my 'real world programming' experience, namely the BiMap. Is the only benefit of this construct the ability to quickly retrieve the key for a given value? Are there any problems whose solution is best expressed using a BiMap?
Any time you want to be able to do a reverse lookup without having to populate two maps. For instance a phone directory where you would like to lookup the phone number by name, but would also like to do a reverse lookup to get the name from the number.
Louis mentioned the memory savings possible in a BiMap implementation. That's the only thing you can't get by wrapping two Map instances yourself. Still, if you let us wrap the Map instances for you, we can take care of a few edge cases. (You could handle all of these yourself, but why bother? :)) A usage sketch follows the list below.
If you call put(newKey, existingValue), we'll error out immediately to keep the two maps in sync, rather than adding the entry to one map before realizing that it conflicts with an existing mapping in the other. (We provide forcePut if you do want to override the existing value.) We provide similar safeguards for inserting null or other invalid values.
BiMap views keep the two maps in sync: If you remove an element from the entrySet of the original BiMap, its corresponding entry is also removed from the inverse. We do the same kind of thing in Entry.setValue.
We handle serialization: A BiMap and its inverse stay "connected," and the entries are serialized only once.
We provide a smart implementation of inverse() so that foo.inverse().inverse() returns foo, rather than a wrapper of a wrapper.
We override values() to return a Set. This set is identical to what you'd get from inverse().keySet() except that it maintains the same iteration order as the original BiMap.
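A minimal usage sketch of the phone-directory example with Guava's HashBiMap (names and numbers are made up):

import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;

public class PhoneDirectoryDemo {
    public static void main(String[] args) {
        BiMap<String, String> nameToNumber = HashBiMap.create();
        nameToNumber.put("Alice", "555-0100");
        nameToNumber.put("Bob", "555-0199");

        System.out.println(nameToNumber.get("Alice"));              // 555-0100
        System.out.println(nameToNumber.inverse().get("555-0199")); // Bob

        // put() with an already-mapped value fails fast to keep both
        // directions in sync; forcePut() overrides the existing mapping.
        nameToNumber.forcePut("Carol", "555-0100"); // Alice's entry is removed
    }
}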

Java map content comparison

Here is a tricky data structure and data organization case.
I have an application that reads data from large files and produces objects of various types (e.g., Boolean, Integer, String) that are categorized in a few (less than a dozen) groups and then stored in a database.
Each object is currently stored in a single HashMap<String, Object> data structure. Each such HashMap corresponds to a single category (group). Each database record is built from the information in all the objects contained in all categories (HashMap data structures).
A requirement has appeared for checking whether subsequent records are "equivalent" in the number and type of columns, where equivalence must be verified across all maps by comparing the name (HashMap key) and the type (actual class) of each stored object.
I am looking for an efficient way of implementing this functionality, while maintaining the original object categorization, because listing objects by category in the fastest possible way is also a requirement.
An idea would be to just sort the keys (e.g., by replacing each HashMap with a TreeMap) and then walk over all maps. An alternative would be to just copy everything in a TreeMap for comparison purposes only.
What would be the most efficient way of implementing this functionality?
Also, how would you go about finding the difference (i.e., the fields added and those removed) between successive records?
Create a meta SortedSet in which you store all the created maps.
That means a SortedSet<Map<String,Object>>, e.g. a TreeSet with a custom Comparator<Map<String,Object>> that checks exactly your requirements: the same number and names of keys, and the same object type per value.
You can then use the contains() method of this meta set structure to find out whether a similar record already exists.
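A sketch of such a comparator, assuming non-null values: it treats two maps as equal exactly when they have the same keys with the same value classes, and otherwise orders them by the first differing key or type name so a TreeSet can store them.

import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class SchemaComparator implements Comparator<Map<String, Object>> {
    @Override
    public int compare(Map<String, Object> a, Map<String, Object> b) {
        // Normalize both maps to sorted key order before walking them in parallel.
        TreeMap<String, Object> left = new TreeMap<>(a);
        TreeMap<String, Object> right = new TreeMap<>(b);
        Iterator<Map.Entry<String, Object>> li = left.entrySet().iterator();
        Iterator<Map.Entry<String, Object>> ri = right.entrySet().iterator();
        while (li.hasNext() && ri.hasNext()) {
            Map.Entry<String, Object> le = li.next();
            Map.Entry<String, Object> re = ri.next();
            int byKey = le.getKey().compareTo(re.getKey());
            if (byKey != 0) {
                return byKey;
            }
            int byType = le.getValue().getClass().getName()
                    .compareTo(re.getValue().getClass().getName());
            if (byType != 0) {
                return byType;
            }
        }
        // All shared positions matched; the map with fewer entries sorts first.
        return Integer.compare(left.size(), right.size());
    }
}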
==== EDIT ====
Since I misunderstood the relation between the database records and the maps in the first place, I have to change the semantics of my answer a little.
I would still use the mentioned SortedSet<Map<String,Object>>, but the Map<String,Object> would of course now be the map that you and havexy suggested.
On the other hand, it could be a step forward to use a Set<Set<KeyAndType>> or SortedSet<Set<KeyAndType>>, where KeyAndType contains only the key and the type, with an appropriate Comparable implementation or equals() and hashCode().
Why? You asked how to find the differences between two records. If each record relates to one of those inner Set<KeyAndType> instances, you can easily use retainAll() to form the intersection of two successive sets.
Compare this to the SortedSet<Map<String,Object>> idea: either way, the logic that differentiates between the fields sits in the comparator, in one case comparing inner sets, in the other comparing inner maps. Since that information is lost once the surrounding set is constructed, it will be hard to get the differences between two records later on, unless you keep another, reduced structure that makes such differences easy to find. And since a Set<KeyAndType> can act both as a key and as an easy basis for comparing two records, it is a good candidate for both purposes.
If, furthermore, you want to keep the relation between such a Set<KeyAndType> and your record or group of Map<String,Object>, your meta structure could be something like:
Map<Set<KeyAndType>,DatabaseRecord> or Map<Set<KeyAndType>,GroupOfMaps>, implemented by a simple LinkedHashMap, which allows simple iteration in the original order.
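Here is a compact sketch of the KeyAndType idea together with the set difference; the class is hypothetical and the field names are made up:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class RecordDiffDemo {
    // One instance per column: the key name plus the value's class.
    static final class KeyAndType {
        final String key;
        final Class<?> type;

        KeyAndType(String key, Class<?> type) {
            this.key = key;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof KeyAndType
                    && key.equals(((KeyAndType) o).key)
                    && type.equals(((KeyAndType) o).type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, type);
        }

        @Override
        public String toString() {
            return key + ":" + type.getSimpleName();
        }
    }

    public static void main(String[] args) {
        Set<KeyAndType> previous = new HashSet<>();
        previous.add(new KeyAndType("id", Integer.class));
        previous.add(new KeyAndType("name", String.class));

        Set<KeyAndType> current = new HashSet<>();
        current.add(new KeyAndType("id", Integer.class));
        current.add(new KeyAndType("active", Boolean.class));

        Set<KeyAndType> removed = new HashSet<>(previous);
        removed.removeAll(current); // fields that disappeared
        Set<KeyAndType> added = new HashSet<>(current);
        added.removeAll(previous);  // fields that are new

        System.out.println("removed: " + removed); // [name:String]
        System.out.println("added: " + added);     // [active:Boolean]
    }
}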
One solution is to keep both the category-based HashMaps and a combined TreeMap. This requires slightly more memory, though not much, as you just keep the same references in both of them.
Whenever you add to or remove from a HashMap, you perform the same operation on the TreeMap too. This way both will always be in sync.
You can then use the TreeMap for comparison, whether you want to compare the types of the objects or their actual contents.
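A minimal sketch of that dual bookkeeping (the wrapper class is hypothetical):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DualIndex {
    // Per-category map for fast lookup; combined sorted map for comparisons.
    private final Map<String, Object> byKey = new HashMap<>();
    private final TreeMap<String, Object> sorted = new TreeMap<>();

    public void put(String key, Object value) {
        byKey.put(key, value);  // the same reference is stored in both maps,
        sorted.put(key, value); // so the extra memory cost is only the map entries
    }

    public void remove(String key) {
        byKey.remove(key);
        sorted.remove(key);
    }

    public TreeMap<String, Object> sortedView() {
        return sorted; // use this side for content comparison
    }
}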

Neo4j indexing (with Lucene) - good way to organize node "types"?

This is actually more of a Lucene question, but it's in the context of a neo4j database.
I have a database that's divided into 50 or so node types (so "collections" or "tables" in other types of dbs). Each has a subset of properties that need to be indexed, some share the same name, some don't.
When searching, I always want to find nodes of a specific type, never across all nodes.
I can see three ways of organizing this:
1. One index per type; properties map naturally to index fields: index 'foo', 'id'='1234'.
2. A single global index, where each field maps to a property name; to distinguish the type, either include it as part of the value ('id'='foo:1234') or check the nodes once they're returned (I expect duplicates to be very rare).
3. A single index, with the type as part of the field name: 'foo.id'='1234'.
Once created, the database is read-only.
Are there any benefits to one of those, in terms of convenience, size/cache efficiency, or performance?
As I understand it, for the first option neo4j will create a separate physical index for each type, which seems suboptimal. For the third, I end up with most Lucene documents having only a small subset of the fields; I am not sure whether that affects anything.
I came across this problem recently when I was building an ActiveRecord connection adapter for Neo4j over REST, to be used in a Rails project. Since ActiveRecord and ActiveRelation both have a tight coupling with SQL syntax, it became difficult to fit everything into NoSQL. It might not be the best solution, but here's how I solved it:
I created an index named model_index which indexes nodes under two keys, type and model.
Index lookups with the type key currently happen with just one value, model. This was introduced primarily to achieve SHOW TABLES SQL functionality, which can get me a list of all models present in the graph.
Index lookups with the model key take place with values corresponding to the different model names in my system. This is primarily for achieving DESC <TABLENAME> functionality.
With each table creation (as in CREATE TABLE), a node is created, with the table definition attributes being stored as node properties.
The created node is indexed under model_index with type:model and model:<model-name>. This adds the newly created model to the list of 'tables' and also allows one to reach the model node directly via an index lookup with the model key.
For each record created per model (type in your case), an outgoing edge is created labeled instances directed from model node to this new record. v[123] :=> [instances] :=> v[245] where v[123] represents model node and v[245] represents a record of v[123]'s type.
Now if you want to get all instances of a specified type, you could lookup the model_index with model:<model-name> to reach a model node and then fetch all adjacent nodes over an outgoing edge labeled instances. Filtered lookups can be further achieved by applying filters and other complex traversals.
The above solution keeps model_index from clogging, since it holds only the two keys, and achieves an effective record lookup via one index lookup and a single-level traversal.
Although in your case nodes of different types are not adjacent to each other, even if you wanted them to be, you could determine the type of any arbitrary node by simply looking up its adjacent node over an incoming edge labeled instances. Furthermore, I'm considering incorporating SpringDataGraph's pattern of storing a __type__ property on each instance node to avoid this adjacent-node lookup.
I'm currently translating AREL to Gremlin scripts for almost everything. You can find the source code for my AR adapter at https://github.com/yournextleap/activerecord-neo4j-adapter
Hope this helps, Cheers! :)
A single index will be smaller than several little indexes, because some data, such as the term dictionary, will be shared. However, since a term dictionary lookup is an O(lg(n)) operation, a lookup in a bigger term dictionary might be a little slower. (With 50 indexes merged into one, this only requires about 6 more comparisons, since 2^6 >= 50; you are unlikely to notice any difference.)
Another advantage of a smaller index is that the OS cache is likely to make queries run faster.
Instead of your options 2 and 3, I would index two different fields, id and type, and search for (id:ID AND type:TYPE), but I don't know whether that is possible with neo4j.
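In raw Lucene such a query would look roughly like the sketch below (using the modern BooleanQuery.Builder API; whether neo4j's index layer exposes this is exactly the open question):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class TypedLookup {
    // Builds (id:ID AND type:TYPE) as a single Lucene query.
    public static BooleanQuery queryFor(String type, String id) {
        return new BooleanQuery.Builder()
                .add(new TermQuery(new Term("type", type)), BooleanClause.Occur.MUST)
                .add(new TermQuery(new Term("id", id)), BooleanClause.Occur.MUST)
                .build();
    }
}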
spring-data-neo4j is using the first approach - it creates a different index for each type. So I guess that's a good option for the general scenario. But in your particular case it might be suboptimal, as you say. I'd run some benchmarks to measure the performance.
The other two, by the way, seem a bit artificial. You are possibly indexing completely unrelated information in the same index, which doesn't sound right.

Structure for holding data in this instance (HashMap/ArrayList etc.)?

The best way to describe this is to explain the situation.
Imagine I have a factory that produces chairs. The factory is split into 5 sections. A chair can be made fully in one area or across a number of areas. The makers of the chairs add the chair's attributes to a chair object. At the end of the day these objects are collected by my imaginary program and added into some data type (ArrayList, etc.).
When a chair is added, the program must check whether the chair already exists; if so, it must not replace the existing chair but append this chair's attributes to it (don't worry about this part, I've got it covered).
So basically I want a structure in which I can easily check whether an object exists: if not, just insert it; otherwise perform the append. So I need to find the chair matching a certain unique ID, kind of like a set. Except it's not about matching the same object: if a chair is made in three areas it will be three distinct objects (in real life they all represent the same chair), yet I only want one object that will hold the combined attribute contents of all of them.
Once the program has collected everything and performed the update for all areas of the factory, it needs to iterate over each object and add its contents to a DB. Again, don't worry about adding to the DB; that's covered.
I just want to know what the best data structure in Java would be to match this spec.
Thank you in advance.
I'd say a HashMap: it lets you quickly check whether an object exists with a given unique ID, and retrieve that object if it does exist in the collection. Then it's simply a matter of performing your merge function to add attributes to the object that is already in the collection.
Unlike most other collections (ArrayList, e.g.), HashMaps are actually optimized for looking something up by a unique ID, and it will be just as fast at doing this regardless of how many objects you have in your collection.
This answer originally made reference to the Hashtable class, but after further research (and some good comments), I discovered that you're always better off using a HashMap. If you need synchronization, you can call Collections.synchronizedMap() on it. See here for more information.
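A sketch of that check-then-merge pattern; Chair and its merge() method are placeholders for your own classes:

import java.util.HashMap;
import java.util.Map;

public class ChairCollector {
    // Placeholder for the poster's chair class.
    static class Chair {
        final String id;
        final Map<String, String> attributes = new HashMap<>();

        Chair(String id) {
            this.id = id;
        }

        void merge(Chair other) {
            attributes.putAll(other.attributes); // "append this chair's attributes"
        }
    }

    private final Map<String, Chair> chairsById = new HashMap<>();

    public void collect(Chair chair) {
        Chair existing = chairsById.get(chair.id);
        if (existing == null) {
            chairsById.put(chair.id, chair); // first sighting: straight insert
        } else {
            existing.merge(chair);           // already known: append attributes
        }
    }
}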
I'd say use an ArrayList. Override the hashCode()/equals() methods on your Chair object to use the unique ID. That way you can just use list.contains(chair) to check whether a chair already exists.
I'd say use an EnumMap. Define an enum of all possible part categories, so you can query the EnumMap for which part is missing:
public enum Category {
    SEAT, REST, LEGS, CUSHION
}
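For example (a hypothetical usage; the attribute values are made up), the missing categories can then be computed with EnumSet:

import java.util.EnumMap;
import java.util.EnumSet;

public class ChairPartsDemo {
    public enum Category {
        SEAT, REST, LEGS, CUSHION
    }

    public static void main(String[] args) {
        EnumMap<Category, String> parts = new EnumMap<>(Category.class);
        parts.put(Category.SEAT, "oak seat");
        parts.put(Category.LEGS, "steel legs");

        // complementOf() yields every constant not yet present in the map;
        // copyOf() requires a non-empty collection, which holds here.
        EnumSet<Category> missing =
                EnumSet.complementOf(EnumSet.copyOf(parts.keySet()));
        System.out.println(missing); // [REST, CUSHION]
    }
}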
