Problem
I have the following tree:
    2
   / \
  3   5
 /   / \
6   4   1
that is represented in the following way and order:
id parent
------------
2 null
3 2
6 3
5 2
4 5
1 5
Purpose:
Store this flattened tree in a recursive structure in O(n). O(n*log(n)) is acceptable, but not ideal. I know how to solve it in O(n^2), but I stored the data in DFS order precisely so that I could "parse" it more efficiently. E.g.:
class R {
int id;
List<R> children;
}
that looks like this in a JSON form:
{
  id: 2,
  children: [
    {
      id: 3,
      children: [ ... ]
    },
    {
      id: 5,
      children: [ ... ]
    }
  ]
}
How can I do this? The programming language is not important, because I can translate it in Java.
Java code:
R r = new R();
Map<Long, Line> map = createMap2();
List<Line> vals = new ArrayList<Line>(map.values());
r.id = vals.get(0).id;
vals.remove(0);
r.children = createResource(vals, r.id);
...
private static List<R> createResource(List<Line> l, Long pid) {
List<R> lr = new ArrayList<R>();
if ( l.size() > 0 ) {
Long id = l.get(0).id;
Long p = l.get(0).pid;
l.remove(0);
if ( pid.equals(p) ) {
R r = new R();
r.id = id;
r.children = createResource(l, id);
lr.add(r);
}
else {
return createResource(l, pid); // of course, this is not ok
}
}
return lr;
}
The problem with the code above is that only 2, 3 and 6 are stored in the recursive structure (class R). I want to store the whole flattened tree (many Line objects) in that recursive structure (an R object), not only some of the nodes.
P.S.: The problem is simplified. I'm not interested in solutions tailored to this specific example, because there are many fields involved and thousands of entries. I am also interested in solutions that work well in the worst case (different kinds of trees), because that is what is guaranteed to the user.
What about something like this? In the first pass, hash the parents as arrays of their children and identify the root; in the second, begin with the root and for each of its children, insert a new object, with its own children and so on:
To take your example, the first pass would generate
parent_hash = {2:[3,5], 3:[6], 5:[4,1]}
root = 2
The second pass would go something like:
object 2 -> object 3 -> object 6
         -> object 5 -> object 4
                     -> object 1
done
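The two passes described above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the Line and R classes are reduced to ids, the names TwoPassTreeBuilder, childrenOf and attach are made up for this example, and the root is taken to be the single row whose parent is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPassTreeBuilder {
    /** One flat row: a node id and its parent id (null for the root). Hypothetical stand-in for Line. */
    static class Line {
        final Long id;
        final Long pid;
        Line(Long id, Long pid) { this.id = id; this.pid = pid; }
    }

    /** The recursive structure from the question, reduced to id and children. */
    static class R {
        final Long id;
        final List<R> children = new ArrayList<>();
        R(Long id) { this.id = id; }
    }

    static R build(List<Line> lines) {
        // First pass: group child ids under their parent id and remember the root.
        Map<Long, List<Long>> childrenOf = new HashMap<>();
        Long rootId = null;
        for (Line line : lines) {
            if (line.pid == null) {
                rootId = line.id;
            } else {
                childrenOf.computeIfAbsent(line.pid, k -> new ArrayList<>()).add(line.id);
            }
        }
        if (rootId == null) {
            throw new IllegalArgumentException("no row with a null parent");
        }
        // Second pass: start at the root and attach children recursively.
        return attach(rootId, childrenOf);
    }

    private static R attach(Long id, Map<Long, List<Long>> childrenOf) {
        R node = new R(id);
        // Leaves have no entry in the map, so fall back to an empty list.
        for (Long childId : childrenOf.getOrDefault(id, Collections.emptyList())) {
            node.children.add(attach(childId, childrenOf));
        }
        return node;
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
                new Line(2L, null), new Line(3L, 2L), new Line(6L, 3L),
                new Line(5L, 2L), new Line(4L, 5L), new Line(1L, 5L));
        R root = build(lines);
        System.out.println(root.id + " has " + root.children.size() + " children"); // prints "2 has 2 children"
    }
}
```

Both passes touch each row once, so the expected cost is O(n) with a HashMap, and the rows do not need to be in DFS order. For a very deep tree the recursive second pass could exhaust the call stack, in which case an explicit stack would be the safer choice.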
The problem with your code is that once an entry doesn't satisfy the p == pid condition, it is lost forever. Instead of discarding such entries, you should stop and return immediately. The offending entry must also be handed back so that the proper instance of R further up the call chain can handle it.
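One way to make sure a non-matching entry is handed back to an ancestor instead of being dropped is to keep the currently open ancestors on an explicit stack. The sketch below is an assumption-laden illustration, not the asker's code: Line and R are reduced to ids, the class name DfsOrderTreeBuilder is invented here, and the rows are assumed to arrive in DFS (pre-order) order with a single null-parent root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class DfsOrderTreeBuilder {
    /** Hypothetical stand-in for the question's Line: node id and parent id. */
    static class Line {
        final Long id;
        final Long pid;
        Line(Long id, Long pid) { this.id = id; this.pid = pid; }
    }

    /** The recursive structure from the question, reduced to id and children. */
    static class R {
        Long id;
        List<R> children = new ArrayList<>();
    }

    static R build(List<Line> lines) {
        // Ancestors of the current position; the most recently opened node is on top.
        Deque<R> open = new ArrayDeque<>();
        R root = null;
        for (Line line : lines) {
            R node = new R();
            node.id = line.id;
            if (line.pid == null) {
                if (root != null) {
                    throw new IllegalArgumentException("more than one root row");
                }
                root = node;
            } else {
                // Close finished subtrees until the parent of this row is on top.
                // This plays the role of "returning the entry to the caller upstream".
                while (!open.isEmpty() && !open.peek().id.equals(line.pid)) {
                    open.pop();
                }
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("rows are not in DFS order at id " + line.id);
                }
                open.peek().children.add(node);
            }
            open.push(node);
        }
        return root;
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
                new Line(2L, null), new Line(3L, 2L), new Line(6L, 3L),
                new Line(5L, 2L), new Line(4L, 5L), new Line(1L, 5L));
        R root = build(lines);
        System.out.println(root.id + " has " + root.children.size() + " children"); // prints "2 has 2 children"
    }
}
```

Every node is pushed once and popped at most once, so the whole pass is O(n) regardless of the shape of the tree, and no recursion depth is involved.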
You can easily represent the whole tree in an array, since each node of the tree can be represented by an index in the array. For a binary tree, the children of index i would be at index 2*i+1 and index 2*i+2. It would then be simple to convert the array to any other representation. The array itself would be a space-efficient representation for balanced trees, but would waste a lot of space for very unbalanced trees. (This should not matter unless you're dealing with a very large amount of data.)
If you need a memory-efficient way to handle large unbalanced trees, then it would make sense to use the standard Node representation of trees. To convert from your list, you could use a HashMap as גלעד ברקן suggested. However, if the ids of the nodes are mostly contiguous (as in the example, where they run from 1 to 6), you could also just use an array in which index i stores the Node with id i. This lets you easily find a parent node and assign child nodes to it as they are created.
(cf. my Trees tutorial for storing trees as arrays.)
I found a simple solution based on that "DFS" order.
This approach works whether I use a list of "Line" objects or a map.
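private static List<R> createResource(List<Line> l, Long pid) {
    List<R> lr = new ArrayList<R>();
    // In DFS order, the next child of pid (if there is one) is always at the head of the list.
    while (!l.isEmpty() && pid.equals(l.get(0).pid)) {
        // Consume the head; use a LinkedList (or an index cursor) so that this is O(1).
        Line line = l.remove(0);
        R r = new R();
        r.id = line.id;
        // The recursion consumes the whole subtree of this child before returning.
        r.children = createResource(l, line.id);
        lr.add(r);
    }
    // The head now belongs to an ancestor (or the list is empty): leave it for the caller.
    return lr;
}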
It looks like O(n^2) because there is a loop plus recursion over all elements, but it is O(n): thanks to the DFS order, the next element for which createResource is called is always at position 0, so finding it is O(1), and the recursion consumes every element exactly once. (This assumes that removing the head of the list is itself O(1), e.g. with a LinkedList; ArrayList.remove(0) shifts all remaining elements.)
But if the rows are not in DFS order (for example, because a Map other than a LinkedHashMap is involved), I recommend the solution that hashes parents to arrays of their children (as suggested by גלעד ברקן).
Related
I have a data in tabular format like below:
Activity | ActivityID | ParentID
a1 | 1 |
a2 | 2 | 1
a3 | 3 |
a4 | 4 | 3
a5 | 5 | 2
a6 | 6 | 3
a7 | 7 | 1
I want to represent it like below in java:
a1 -> a2 -> a5
   -> a7
a3 -> a4
   -> a6
Basically, a List of tree objects where a1 and a3 are the roots, each with 2 children (a2, a7 and a4, a6 respectively), and a2 has one child (a5). The tree is not necessarily binary, and the data set can be big, with one parent having 50-100 children.
What would be the most effective way in Java ?
For a list of the tree, you can store your data in a structure like this:
final class Node {
    final int id;
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
So, the final structure is: List<Node>
The data structure you are looking for is called an N-ary tree (see Wikipedia and NIST).
If you are familiar with binary trees (at most two children per node), it will be easy to move to an n-ary tree (n children).
In your case you have a forest of n-ary trees (a list of n-ary trees); alternatively, you can treat it as one big tree with a common root, where all your actual trees begin at level one.
The simplest way is to create a Node{info, listChildren} with an info field and a list (an ArrayList, maybe) that holds the children, plus an NTree class that contains methods such as addChild. Generally, a recursive function checks some condition to choose the right path for inserting a new node.
An example of implementing N-ary tree source code and binary tree source code example
If you want to improve your implementation or optimise it, you have to consider some other points, such as:
The type of the list of children in each node, which depends on the possible number of children: if the number is small, a simple array will do; if it is large, a hash table may be better, etc.
Whether your tree changes or not (dynamic, with inserts and deletes).
How the tree will be traversed.
Avoiding recursion: replace the recursive method with an iterative one.
The construction steps: normally, for each element, we start from the root and follow the right path until we reach the leaf where the new child should be inserted. You may be able to insert the children directly in the right place; it depends on your data and how you want to organize your tree.
Using an array to store the tree, which also depends strongly on your data.
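As a rough illustration of the Node{info, listChildren} idea and of replacing recursion with iteration, here is a minimal Java sketch. The names NaryTreeDemo, Node, addChild and preOrder are assumptions made for this example; the forest is simply a List of root nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class NaryTreeDemo {
    /** A node with a payload and any number of children. */
    static class Node {
        final String info;
        final List<Node> children = new ArrayList<>();
        Node(String info) { this.info = info; }

        /** Appends a child and returns this node so calls can be chained. */
        Node addChild(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Pre-order traversal of a forest using an explicit stack instead of recursion. */
    static List<String> preOrder(List<Node> roots) {
        List<String> visited = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        // Push in reverse so that the first root ends up on top of the stack.
        for (int i = roots.size() - 1; i >= 0; i--) {
            stack.push(roots.get(i));
        }
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            visited.add(node.info);
            // Same trick for the children: reverse push keeps left-to-right order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Node a1 = new Node("a1")
                .addChild(new Node("a2").addChild(new Node("a5")))
                .addChild(new Node("a7"));
        Node a3 = new Node("a3").addChild(new Node("a4")).addChild(new Node("a6"));
        System.out.println(preOrder(Arrays.asList(a1, a3))); // prints [a1, a2, a5, a7, a3, a4, a6]
    }
}
```

An ArrayList keeps the children in insertion order, which is usually adequate for 50-100 children per node; a different collection would only be worth considering if children must be looked up by key.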
Hope this helps a little bit to begin and to dig further.
Here is a short and straightforward algorithm. It's the quick-and-dirty version; you can rewrite it into a better one as you wish.
I am assuming there is a table array such that table[i].activityID gives the i-th activity's id, table[i].parentID gives that activity's parent id, and so on.
[Note: I'm also assuming that if an element has no parent, its parent id is -1; i.e. in your example, the parent id of a1 and a3 would be -1. This way we can tell which activities have no parent. If you have a better idea, you can use that.]
Let us first define the class Activity
import java.util.ArrayList;
import java.util.List;

class Activity implements Comparable<Activity> {
    int id;
    String name;
    List<Activity> children;

    Activity(int id, String name) {
        this.id = id;
        this.name = name;
        this.children = new ArrayList<>();
    }

    @Override
    public int compareTo(Activity activity) {
        // Integer.compare avoids the overflow that plain subtraction can cause
        return Integer.compare(this.id, activity.id);
    }
}
Now that the Activity class is defined, we'll create the activities and put them in a TreeMap (just to keep track of them).
Note that the activities whose parent id is -1 will be the roots of the trees [and, according to your example, several trees can be formed from your table]. We'll keep track of the roots while linking the activities.
// to keep track of all activities, keyed by id
TreeMap<Integer, Activity> activities = new TreeMap<>();
for (int i = 0; i < table.length; i++) {
    Activity a = new Activity(table[i].activityID, table[i].activityName);
    activities.put(a.id, a);
}
// to keep track of root activities
List<Activity> roots = new ArrayList<>();
for(int i=0; i<table.length; i++) {
// check if there is a parent of this Activity
Activity activity = activities.get(table[i].activityID);
// now, if activity has a parent
// then, add activity in the parent's children list
if(table[i].parentID != -1) {
Activity parent = activities.get(table[i].parentID);
// this is very bad way of writing code
// just do it for now
// later rewrite it
parent.children.add(activity);
} else {
// and activity does not have a parent
// then, it is a root
roots.add(activity);
}
}
Now, we're going to create a traverse method, which will help to traverse the tree node in order.
void traverse(Activity u) {
System.out.println(u.name);
for(Activity v: u.children) {
traverse(v);
}
}
Now, you can call this function using the root activities:
for(Activity rootActivity: roots) {
traverse(rootActivity);
}
That's it...
I want to cache a large number of Java objects (String, byte[]) with a composite key (String, int) in a cache like JCS or Infinispan.
The keys can be grouped by the string part (let's call it ID) of it:
KEY = VALUE
-------------
A 1 = valueA1
A 4 = valueA4
A 5 = valueA5
B 9 = valueB9
C 3 = valueC3
C 7 = valueC7
I need to remove elements grouped by the ID part of the key, so for example A should remove A 1, A 4 and A 5.
First I tried something like this:
final List<String> keys = cache.keySet()
.stream().filter(k -> k.getId().equals(id)).collect(Collectors.toList());
keys.forEach(cache::remove);
While this works, it is - not surprising - very expensive and thus slow.
So I tried another approach by using only the ID as key and group the values in a map:
KEY = VALUE
---------------------------------------------
A = {1 = valueA1, 4 = valueA4, 5 = valueA5}
B = {9 = valueB9}
C = {3 = valueC3, 7 = valueC7}
Removing a group is then very efficient:
cache.remove(id);
But putting requires a get:
Map<Integer, Value> map = cache.get(key.getId());
if (map == null) {
map = new HashMap<>();
}
map.put(key.getInt(), value);
cache.put(key.getId(), map);
Now there are fewer elements in the cache, with a simpler key, but the values are larger and more complex. Testing with hundreds of thousands of elements in the cache, deletes are fast and puts and gets don't seem to be noticeably slower.
Is this a valid solution or are there better approaches?
I suggest you use computeIfAbsent and save a put and get invocation as follows:
cache.computeIfAbsent(key.getId(), k -> new HashMap<Integer,Value>()).put(key.getInt(),value);
This method ensures the creation of the secondary map only if it is not already mapped in the primary map, and saves the need for an additional get invocation since it returns the secondary map mapped to the primary key.
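To see the grouping and the computeIfAbsent call in one place, here is a small sketch that uses a plain ConcurrentHashMap as a stand-in for the cache. That stand-in is an assumption: a real cache such as JCS or Infinispan may copy or serialize values, in which case mutating the inner map returned by computeIfAbsent might not be reflected in the cache without an explicit put. The class and method names (GroupedCache, removeGroup) are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class GroupedCache {
    // Outer key: the ID part of the composite key. Inner key: the int part.
    private final Map<String, Map<Integer, String>> cache = new ConcurrentHashMap<>();

    /** Creates the group on first use, then stores the value inside it. */
    void put(String id, int n, String value) {
        cache.computeIfAbsent(id, k -> new ConcurrentHashMap<>()).put(n, value);
    }

    /** Returns the value for (id, n), or null if the group or the entry is missing. */
    String get(String id, int n) {
        Map<Integer, String> group = cache.get(id);
        return group == null ? null : group.get(n);
    }

    /** Removes every entry that shares the given ID with a single call. */
    void removeGroup(String id) {
        cache.remove(id);
    }

    public static void main(String[] args) {
        GroupedCache c = new GroupedCache();
        c.put("A", 1, "valueA1");
        c.put("A", 4, "valueA4");
        c.put("B", 9, "valueB9");
        c.removeGroup("A");
        System.out.println(c.get("A", 4)); // prints null
        System.out.println(c.get("B", 9)); // prints valueB9
    }
}
```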
References:
Map::computeIfAbsent
What is the difference between putIfAbsent and computeIfAbsent in Java 8 Map ?
I am quite new to java streams. Do I need to re-create the stream each time in this loop or is there a better way to do this? Creating the stream once and using the .noneMatch twice results in "stream already closed" exception.
for ( ItemSetNode itemSetNode : itemSetNodeList )
{
Stream<Id> allUserNodesStream = allUserNodes.stream().map( n -> n.getNodeId() );
Id nodeId = itemSetNode.getNodeId();
//if none of the user node ids match the node id, the user is missing the node
if ( allUserNodesStream.noneMatch( userNode -> userNode.compareTo( nodeId ) == 0 ) )
{
isUserMissingNode = true;
break;
}
}
Thank you !
I would suggest you make a list of all the user ids outside the loop. Just make sure the class Id overrides equals() function.
List<Id> allUsersIds = allUserNodes.stream().map(n -> n.getNodeId()).collect(Collectors.toList());
for (ItemSetNode itemSetNode : itemSetNodeList)
{
Id nodeId = itemSetNode.getNodeId();
if (!allUsersIds.contains(nodeId))
{
isUserMissingNode = true;
break;
}
}
The following code should be equivalent, except that the value of the boolean is reversed so it's false if there are missing nodes.
First all the user node Ids are collected to a TreeSet (if Id implements hashCode() and equals() you should use a HashSet). Then we stream itemSetNodeList to see if all those nodeIds are contained in the set.
TreeSet<Id> all = allUserNodes
.stream()
.map(n -> n.getNodeId())
.collect(Collectors.toCollection(TreeSet::new));
boolean isAllNodes = itemSetNodeList
.stream()
.allMatch(n -> all.contains(n.getNodeId()));
There are many ways to write equivalent (at least to outside eyes) code, this uses a Set to improve the lookup so we don't need to keep iterating the allUserNodes collection constantly.
You want to avoid using a stream in a loop, because that will turn your algorithm into O(n²) when you're doing a linear loop and a linear stream operation inside it. This approach is O(n log n), for the linear stream operation and O(log n) TreeSet lookup. With a HashSet this goes down to just O(n), not that it matters much unless you're dealing with large amount of elements.
You also could do something like this:
Set<Id> allUserNodeIds = allUserNodes.stream()
.map(ItemSetNode::getNodeId)
.collect(Collectors.toCollection(TreeSet::new));
return itemSetNodeList.stream()
.anyMatch(n -> !allUserNodeIds.contains(n.getNodeId())); // true if at least one node is missing
Or even:
Collectors.toCollection(() -> new TreeSet<>(new YourComparator()));
Terminal operations of a Stream, such as noneMatch(), consume the Stream, so it cannot be reused afterwards.
If you need to reuse this Stream:
Stream<Id> allUserNodesStream = allUserNodes.stream().map( n -> n.getNodeId() );
just move it into a method :
public Stream<Id> getAllUserNodesStream(){
return allUserNodes.stream().map( n -> n.getNodeId());
}
and invoke it as you need it to create it :
if (getAllUserNodesStream().noneMatch( userNode -> userNode.compareTo( nodeId ) == 0 ))
Keep in mind that each terminal operation traverses the source collection again, just as a loop would.
Running the same traversal many times may not be desirable, so consider this before creating the same stream repeatedly.
Instead of creating a new stream for every nodeId to look for a match:
if (allUserNodesStream.noneMatch( userNode -> userNode.compareTo( nodeId ) == 0 ) ) {
isUserMissingNode = true;
break;
}
use a Set that contains all the ids of allUserNodes instead:
if (!idsFromUserNodes.contains(nodeId)) {
    isUserMissingNode = true;
    break;
}
This makes the logic simpler and the performance better.
Of course, this assumes that compareTo() is consistent with equals(), which is strongly recommended (though not required).
For each item in itemSetNodeList, noneMatch() checks whether its id is absent from allUserNodes and returns true if it is. anyMatch() stops at the first such missing item and returns true; if every item is found, it returns false.
boolean isUserMissing = itemSetNodeList.stream()
        .anyMatch(n -> allUserNodes.stream()
                .noneMatch(u -> u.getNodeId().compareTo(n.getNodeId()) == 0));
I have an "Item" class that contains the following fields (in short): id (related to the primary key of the Item table on SQL Server), description, sequence (a non-null integer), and link (a reference to the id of the parent object; can be null).
I would like to sort by using Java as follows:
Id Sequence Link Description
1 1 null Item A
99 ..1 1 Son of A, first of the sequence
57 ..2 1 Son of A, second of the sequence
66 ..3 1 Son of A, third of the sequence
2 2 null Item B
3 3 null Item C
...
(I put the dots for better visualization)
That is, I would like the children of a certain item to come directly below their parent, ordered by the "sequence" field.
I tried using the comparator, but it failed:
public class SequenceComparator implements Comparator<Item> {
@Override
public int compare(Item o1, Item o2) {
String x1 = o1.getSequence().toString();
String x2 = o2.getSequence().toString();
int sComp = x1.compareTo(x2);
if (sComp != 0) {
return sComp;
} else {
x1 = o1.getLink().toString();
x2 = o2.getLink() == null?"":o2.getLink().toString();
return x1.compareTo(x2);
}
}
}
How can I do that?
New answer: I don’t think you want one comparator to control the complete sorting, because when sorting children you need the sequence of the parent, and you don’t have an easy or natural access to that from within the comparator.
Instead I suggest a sorting in a number of steps:
Put the items into groups by parent items. So one group will be the item with id 1 and all its children. Items with no children will be in a group on their own.
Sort each group so the parent comes first and then all the children in the right order.
Sort the groups by the parent’s sequence.
Concatenate the sorted groups into one list.
Like this, using both Java 8 streams and List.sort():
// group by parent id
Map<Integer, List<Item>> intermediate = input.stream()
.collect(Collectors.groupingBy(i -> i.getLink() == null ? Integer.valueOf(i.getId()) : i.getLink()));
// sort each inner list so that parent comes first and then children by sequence
for (List<Item> innerList : intermediate.values()) {
innerList.sort((i1, i2) -> {
if (i1.getLink() == null) { // i1 is parent
return -1; // parent first
}
if (i2.getLink() == null) {
return 1;
}
return i1.getSequence().compareTo(i2.getSequence());
});
}
// sort lists by parent’s sequence, that is, sequence of first item
List<Item> result = intermediate.values().stream()
.sorted(Comparator.comparing(innerList -> innerList.get(0).getSequence()))
.flatMap(List::stream)
.collect(Collectors.toList());
The output is (leaving out the item description):
1 1 null
99 ..1 1
57 ..2 1
66 ..3 1
2 2 null
3 3 null
(This output was produced with a toString method that printed the dots when converting an item with a parent to a String.)
If you cannot use Java 8, I still believe the general idea of the steps mentioned above will work, only some of the steps will require a little more code.
I deleted my previous answer since I had misunderstood the part about what getLink() returns and then decided that that answer wasn’t worth trying to salvage.
Edit:
I am actually ignoring this piece from the documentation of Collectors.groupingBy(): “There are no guarantees on the …, mutability, of the … List objects returned.” It still works with my Java 8. If immutability of the list should prevent sorting, the solution is to create a new ArrayList containing the same items.
With thanks to Stuart Marks for the inspiration, the comparator for sorting the inner lists need not be as clumsy as above. The sorting can be written in this condensed way:
innerList.sort(Comparator.comparing(itm -> itm.getLink() == null ? null : itm.getSequence(),
Comparator.nullsFirst(Integer::compare)));
Given that there are only two layers in the hierarchy, this boils down to a classic multi-level sort. There are two kinds of items, parents and children, distinguished by whether the link field is null. The trick is that the sorting at each level isn't on a particular field. Instead, the value on which to sort depends on what kind of item it is.
The first level of sorting should be on the parent value. The parent value of a parent item is its sequence, but the parent value of a child item is the sequence of the parent it's linked to. Child items are linked to parent items via their id, so the first thing we need to do is to build up a map from ids to sequence values of parent nodes:
Map<Integer, Integer> idSeqMap =
list.stream()
.filter(it -> it.getLink() == null)
.collect(Collectors.toMap(Item::getId, Item::getSequence));
(This assumes that ids are unique, which is reasonable as they're related to the table primary key.)
Now that we have this map, you can write a lambda expression that gets the appropriate parent value from the item. (This assumes that all non-null link values point to existing items.) This is as follows:
(Item it) -> it.getLink() == null ? it.getSequence() : idSeqMap.get(it.getLink())
The second level of sorting should be on the child value. The child value of a parent item is null, so nulls will need to be sorted before any non-null value. The child value of a child item is its sequence. A lambda expression for getting the child value is:
(Item it) -> it.getLink() == null ? null : it.getSequence()
Now, we can combine these using the Comparator helper functions introduced in Java 8. The result can be passed directly to the List.sort() method.
list.sort(Comparator.comparingInt((Item it) -> it.getLink() == null ? it.getSequence() : idSeqMap.get(it.getLink()))
.thenComparing((Item it) -> it.getLink() == null ? null : it.getSequence(),
Comparator.nullsFirst(Integer::compare))
.thenComparingInt(Item::getId));
The first level of sorting is pretty straightforward; just pass the first lambda expression (which extracts the parent value) to Comparator.comparingInt.
The second level of sorting is a bit tricky. I'm assuming the result of getLink() is a nullable Integer. First, we have to extract the child value using the second lambda expression. This results in a nullable value, so if we were to pass this to thenComparing we'd get a NullPointerException. Instead, thenComparing allows us to pass a secondary comparator. We'll use this to handle nulls. For this secondary comparator we pass
Comparator.nullsFirst(Integer::compare)
This compares Integer objects, with nulls sorted first, and non-nulls compared in turn using the Integer.compare method.
Finally, we compare id values as a last resort. This is optional if you're using this comparator only for sorting; duplicates will end up next to each other. But if you use this comparator for a TreeSet, you'll want to make sure that different items never compare as equal. Presumably a database id value would be sufficient to differentiate all unique items.
Considering your data structure is a Tree (with null as the root node) with no cycles:
You have to walk up the tree for both o1 and o2 until you find a common ancestor. Once you do, take one step back along both branches to find their relative order (with Sequence)
Finding the common ancestor may be tricky to do, and I don't know if it is possible in linear time, but certainly possible in O(n log n) time (with n the length of the branches)
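One possible way to realize the common-ancestor idea without searching for the ancestor explicitly is to compare the root-to-item paths: the first position where two paths differ is exactly one step below the common ancestor. The sketch below is an illustration under assumptions, not a definitive implementation: Item is reduced to id, sequence and link, the class name AncestorPathSort is invented, every non-null link is assumed to point to an existing item, and the data is assumed to contain no cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AncestorPathSort {
    /** Reduced stand-in for the question's Item; link is the parent's id or null. */
    static class Item {
        final int id;
        final int sequence;
        final Integer link;
        Item(int id, int sequence, Integer link) {
            this.id = id;
            this.sequence = sequence;
            this.link = link;
        }
    }

    /** Walks up the parent links and returns the path from the top-level ancestor down to the item. */
    static List<Item> pathFromRoot(Item item, Map<Integer, Item> byId) {
        List<Item> path = new ArrayList<>();
        for (Item cur = item; cur != null; cur = cur.link == null ? null : byId.get(cur.link)) {
            path.add(cur);
        }
        Collections.reverse(path);
        return path;
    }

    static List<Item> sort(List<Item> items) {
        Map<Integer, Item> byId = new HashMap<>();
        for (Item it : items) {
            byId.put(it.id, it);
        }
        // Compute each path once so the comparator does not walk the tree repeatedly.
        Map<Integer, List<Item>> paths = new HashMap<>();
        for (Item it : items) {
            paths.put(it.id, pathFromRoot(it, byId));
        }
        Comparator<Item> byPath = (a, b) -> {
            List<Item> pa = paths.get(a.id);
            List<Item> pb = paths.get(b.id);
            int common = Math.min(pa.size(), pb.size());
            for (int i = 0; i < common; i++) {
                // First difference = the two branches just below the common ancestor.
                int c = Integer.compare(pa.get(i).sequence, pb.get(i).sequence);
                if (c == 0) {
                    c = Integer.compare(pa.get(i).id, pb.get(i).id); // tie-break equal sequences
                }
                if (c != 0) {
                    return c;
                }
            }
            // One path is a prefix of the other: the ancestor comes first.
            return Integer.compare(pa.size(), pb.size());
        };
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(byPath);
        return sorted;
    }

    public static void main(String[] args) {
        List<Item> input = Arrays.asList(
                new Item(66, 3, 1), new Item(3, 3, null), new Item(99, 1, 1),
                new Item(2, 2, null), new Item(57, 2, 1), new Item(1, 1, null));
        List<Integer> ids = new ArrayList<>();
        for (Item it : sort(input)) {
            ids.add(it.id);
        }
        System.out.println(ids); // prints [1, 99, 57, 66, 2, 3]
    }
}
```

Precomputing the paths costs time proportional to the total path length, after which each comparison is bounded by the depth of the shallower item. Unlike the two-level solutions, this also handles deeper hierarchies.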
Say I have a csv file where one of the columns is unix timestamp. It is not sorted on that, but following those in order could be a useful relationship. When I want that, I could use ORDER BY, but adding relationship pointers should be faster right, and make use of the NOSQL? Do I have to sort this and add the relationship as I ingest, or can I do it after a query?
After I run the first query and get a subset back:
result = engine.execute(START...WHERE...ORDER BY time)
Can I then go through results adding this relationship like:
prev.createRelationshipTo(next, PRECEDES);
I tried two different ways using foreach or iterator and both had runtime errors casting a string to a Node:
for (Map<String, Object>row : result) {
String n = (String) row.values().iterator().next();
System.out.println(n);
}
Iterator<Node> nodes = result.columnAs("n.chosen");
if (nodes.hasNext()) {
Node prev = nodes.next();
while (nodes.hasNext()) {
Node n = nodes.next();
prev.createRelationshipTo(n, null);
prev = n;
}
}
Also, there is the edge case of two rows having the same timestamp. I don't care about the order that is chosen, but I want it to not break the relation chain.