Large branching trees in java?


My question is about scalable logic branching.
Is there an elegant way to do branching logic trees in Java (although I've always thought that they look more like root systems, but that's beside the point)? I'm trying to develop a very simple text-based adventure game as a side project to my studies, but I'm not sure what the best way to navigate these large logic systems is.
What I'm trying currently is an array that holds four values: stage, location, step, choice.
[EDIT - added choice variable to store user choice, changed name to reflect actual name in my code so that I don't get confused later]
int[] decisionPoint = {stage, location, step, choice};
A stage is supposed to represent a single major part of the tree.
A location is supposed to represent my location within the tree.
A step is supposed to represent my progress through a given location.
Choice is the user input.
At the moment, since I'm only dealing with a single tree, stage isn't being used much. Location and step are working well, but any time I get into a decision within a step the system breaks down.
I could keep creating more and more variables to represent deeper and deeper layers into the tree, but I feel like Java probably provides a better solution somewhere.
Currently, I'm using switch statements to figure out where in the program I am based on the values stored in decisionPoint. Is there something better? Or is there a way to extend the array beyond what I'm using here to make it a bit more polymorphic? (In the methods for the individual questions/text/whatever, could I just have them create a larger array from a smaller one? Could I pass a smaller array as an argument but define the parameter as a larger array?)
//Switch example
switch (LocationTracker.getLocation()) { // start location-finding switch
    case 1: // Location 1
        switch (LocationTracker.getStep()) { // start location 1 switch
            case 1:
                location1s1(graphicsStuff);
                break;
            case 2:
                location1s2(graphicsStuff);
                break;
        }
        break; // end location 1 switch
    case 2: // Location 2
        switch (LocationTracker.getStep()) {
            // same stuff that happened above
        }
        break;
} // end location-finding switch
Everything I find online just brings me to irrelevant pages about different online survey creators that I can use. If I could view their source-code that'd be kind of nice, but since I can't, I'm hoping you guys can help. :)
[EDIT]
Wow, what a nice response in such a short time at such an early hour!
I'll try to go into very explicit detail about how I'm solving the problem right now. It's worth mentioning that this does technically work, it's just that every time I need a branch inside a branch I have to create another variable inside a string array to keep track of my position, and really I'm just fishing for a solution that doesn't need an infinitely expanding string as the program becomes more and more complex.
Right now I have a program with 5 classes:
The Main Class which starts the GUI
The GUI class which provides three services: userInput, userOptions, and outputArea.
The DecisionTreeStage1 class which handles the logic of my problem at the moment (using switch statements).
The LocationTracker class which is designed to track my location within the DecisionTreeStage1 class
The DialogueToOutput class which changes the options that the users have, and also updates the output fields with the results of their actions.
Special point of interest:
I want to have multiple decision branches at some point, and one main tree (maybe call it Yggdrasil? :D). For now, the DecisionTreeStage1 represents a very isolated system that isn't planning to go anywhere. I hope to use the stage variable stored in my array to move from one major branch to the next (climbing the tree if you will). My current implementation for this just uses nested switch statements to decide where I'm going. This imposes an annoying limitation: every time my path gets deeper and deeper I need another variable in my array to store that data. For example:
//Switch example deeper
switch (LocationTracker.getLocation()) { // start location-finding switch
    case 1: // Location 1
        switch (LocationTracker.getStep()) { // start location 1 switch
            case 1:
                switch (LocationTracker.getChoice()) { // make a decision at this step based on the user choice
                    // ...
Given this example, what if the user choice doesn't just lead to some logic? (In this case, just an update to the outputArea) What if it leads to ANOTHER branching path? And that leads to another branching path? Eventually, I would want all paths to converge on the same spot so that I could move to the next "stage."
My real hope is to make this problem infinitely scalable. I want to be able to go as deep into one branch as I need to, without having to create a static and arbitrary number of variable declarations in my decisionPoint array every time.
I haven't been able to find much information about this, like I said.
Let me try presenting this question: Are there any other branching logic statements other than:
if (something)
    stuff;
else
    otherStuff;
and
switch (something) {
    case 1:
        stuff;
        break;
    case 2:
        otherStuff;
        break;
}
And if so, what are they?
PS - I know about the ternary if statement in Java, but it doesn't seem useful to what I'm doing. :)

You can build normal tree structures in Java, similar to the trees that can be built in C. Whether or not object references are theoretically pointers, they substitute for pointers nicely in tree construction:
class Node {
Node left;
Node right;
Node parent;
}
You can also build graphs (including cyclic graphs) and linked lists with no problem. There is no obvious reason why large structures should cause problems (apart from the memory that each object reference uses).
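The Node class above is binary, while an adventure game usually needs any number of choices per point. Below is a minimal sketch of an n-ary version, assuming each node only needs its text and a list of labelled choices; StoryNode, addChoice and follow are hypothetical names, not anything from the question's code. Because each node simply references its children, depth is limited only by memory, and no extra counters are needed in an array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type: the text shown to the player plus any number of
// labelled choices, each leading to another node.
class StoryNode {
    final String text;
    final List<String> labels = new ArrayList<>();
    final List<StoryNode> children = new ArrayList<>();

    StoryNode(String text) {
        this.text = text;
    }

    // Adds a choice and returns the child so branches can be built fluently.
    StoryNode addChoice(String label, StoryNode child) {
        labels.add(label);
        children.add(child);
        return child;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // Follows the given choice indices from this node and returns where the
    // player ends up, however deep that is.
    StoryNode follow(int... choices) {
        StoryNode current = this;
        for (int choice : choices) {
            current = current.children.get(choice);
        }
        return current;
    }
}

class StoryDemo {
    public static void main(String[] args) {
        StoryNode start = new StoryNode("You wake up in a dark room.");
        StoryNode hallway = start.addChoice("Open the door", new StoryNode("A hallway stretches ahead."));
        start.addChoice("Go back to sleep", new StoryNode("Nothing happens."));

        // Converging paths: two choices can point at the same node.
        StoryNode stage2 = new StoryNode("Stage 2 begins.");
        hallway.addChoice("Walk left", stage2);
        hallway.addChoice("Walk right", stage2);

        System.out.println(start.follow(0, 1).text);
    }
}
```

Letting several choices share one child is what makes branches converge before the next "stage"; strictly speaking that turns the tree into a directed acyclic graph, which Java references handle just as easily.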

Instead of returning a value, you could return a Callable which just needs to be executed. This can then be chained (theoretically infinitely).
You could have a LocationEvaluation, for example, which returns a SpecificLocationEvaluator, which in turn returns one of StepEvaluation or ChoiceEvaluator or some such. All of these would implement the Callable interface.
Depending on how you do it, you could have strict type checking so that a LocationEvaluation always returns a SpecificLocationEvaluator, or it can be generic so that you can chain any of them in any order.
Once you build the structure out, you would essentially have a tree which would be traversed to solve it.
I don't understand the problem adequately to be able to provide more concrete implementation details - and apologies if I misunderstood some of the branching (i.e. the names of the classes / steps above)
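A minimal sketch of the generic variant of that chaining idea, assuming every step is a java.util.concurrent.Callable whose result is the next step (or null when the chain ends). Step and StepRunner are hypothetical names; a real game would read the choice from the GUI instead of the hard-coded value used here.

```java
import java.util.concurrent.Callable;

// Hypothetical step type: calling a step does its work (show text, read a
// choice, ...) and returns the next step, or null when the chain ends.
interface Step extends Callable<Step> {}

class StepRunner {
    // Runs steps one after another and returns how many were executed.
    // Nothing here depends on how deeply a step is nested.
    static int run(Step first) {
        int executed = 0;
        Step current = first;
        while (current != null) {
            try {
                current = current.call();
            } catch (Exception e) {
                throw new IllegalStateException("step failed", e);
            }
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        Step nextStage = () -> {
            System.out.println("All paths converge here.");
            return null;
        };
        Step nestedBranch = () -> {
            System.out.println("A nested branch.");
            return nextStage;
        };
        int choice = 2; // would come from user input in the game
        Step location1 = () -> choice == 1 ? nextStage : nestedBranch;
        run(location1);
    }
}
```

Since each step only decides which step comes next, a branch inside a branch is just another lambda or class returning a Step; no new position variable is needed.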

Related

Is defining variables for every condition better than using getters and setters?

I have two code snippets; can anyone tell me which approach is better and why?
Approach 1 -
if (("Male").equalsIgnoreCase(input.getSex()) || ("Female").equalsIgnoreCase(input.getSex())) {
    // do something
} else {
    // do something else
}
Approach 2 -
String tempSex = input.getSex();
if (("Male").equalsIgnoreCase(tempSex) || ("Female").equalsIgnoreCase(tempSex)) {
    // do something
} else {
    // do something else
}
This is one condition; my code has a lot of conditions similar to this one, and in some of them I have to compare against many more strings.
Is it a good approach to define a variable for every condition, or can I use getters and setters?

These two approaches are essentially identical in terms of performance assuming the getSex function is a trivial getter (if getSex is complex or involves changing some other state in the class then these two bits of code are NOT equivalent).
I would prefer the first from a style point of view in that the extra local variable is slightly confusing to the flow of the code.
However, if your main purpose in using code of this form is to validate legal input (as it appears from your example), I would try to create a method
boolean input.isSexValid() to encapsulate that functionality which would make the code less repetitive and more readable.
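A minimal sketch of such a validation method, assuming a hypothetical Input class with a getSex() getter like the one in the question. Keeping the allowed values in a Set also helps with the conditions that compare against many more strings: you add entries to the set instead of growing the if. Set.of needs Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the question's input object.
class Input {
    // Allowed values in lower case; extend this set instead of the if condition.
    private static final Set<String> VALID_SEXES = Set.of("male", "female");

    private final String sex;

    Input(String sex) {
        this.sex = sex;
    }

    String getSex() {
        return sex;
    }

    // Calls the getter once and checks membership case-insensitively.
    boolean isSexValid() {
        String value = getSex();
        return value != null && VALID_SEXES.contains(value.toLowerCase(Locale.ROOT));
    }
}
```

Lower-casing with Locale.ROOT matches equalsIgnoreCase for plain ASCII values like these; for arbitrary Unicode input the two can differ, so the original equalsIgnoreCase loop is the safer choice there.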

Strong argument that this is primarily opinion based, but:
I vote Approach 2.
What if the getter is slow (say it has to go to a DB)? Approach 1 calls it twice, which means a redundant round trip to the DB.

Speed up a simple method in Java

I tried to find out why a part of my application runs very slowly. I used 'jmc' (Java Mission Control) for 5 minutes while running the part of my application that takes so long.
Analysing the Methods section, I found that 66% of the time was spent in one method (no method call inside it showed up).
The method looks like this and is called about 4 million times:
public DataCell getKNIMECell(int rowIdx) {
if(m_missingFlags.contains(rowIdx))
return DataType.getMissingCell();
switch(m_type) {
case R_LOGICAL:
return BooleanCellFactory.create((boolean)m_data[rowIdx]);
case R_INT:
return IntCellFactory.create((int) m_data[rowIdx]);
case R_DOUBLE:
return DoubleCellFactory.create((double) m_data[rowIdx]);
case R_FACTOR:
case R_STRING:
return StringCellFactory.create((String) m_data[rowIdx]);
default:
}
return null;
}
m_type is a class member and an enum defined within another class like this:
public enum RType { R_DOUBLE, R_LOGICAL, R_INT, R_STRING, R_FACTOR };
The array m_data is of type 'Object' and has around 4 million entries.
m_missingFlags is an ArrayList<Integer>.
I really don't know how to speed up that part of the code. Any ideas? As I said, none of the calls within that method seems to take a lot of time.

m_missingFlags is an ArrayList<>
ArrayList.contains scans the list element by element, so this may be your bottleneck if the list is big. Try using a HashSet.
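A minimal sketch of that change, assuming the missing row indices are already collected in a List<Integer> as in the question; MissingFlags and toLookupSet are hypothetical names. The set is built once, and each later contains() is an expected constant-time hash lookup instead of a linear scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MissingFlags {
    // Converts the list of missing row indices into a HashSet once.
    static Set<Integer> toLookupSet(List<Integer> missingRows) {
        return new HashSet<>(missingRows);
    }

    public static void main(String[] args) {
        List<Integer> missingList = new ArrayList<>();
        for (int i = 0; i < 4_000_000; i += 10) {
            missingList.add(i);
        }
        Set<Integer> missingSet = toLookupSet(missingList);

        // missingList.contains(rowIdx) could walk all 400,000 entries per call;
        // missingSet.contains(rowIdx) hashes the boxed Integer instead.
        int missing = 0;
        for (int rowIdx = 0; rowIdx < 4_000_000; rowIdx++) {
            if (missingSet.contains(rowIdx)) {
                missing++;
            }
        }
        System.out.println(missing + " missing rows");
    }
}
```

If the row indices are dense, a java.util.BitSet (one bit per row, checked with get(rowIdx)) avoids the Integer boxing as well.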

My guess (and I just decided to write an answer because the comments were getting overwhelming) is that the performance issue comes from the large Object array and the contains call on the list of missing flags (also, we have no idea which List implementation that is).
My approaches to fix this would be to
cache DataCell instances (I hope DataCell is immutable), particularly for booleans
use a different data structure for m_missingFlags (e.g. a Bloom filter, some tree, or a hash set).
create an array per data type (this avoids some casting issues but costs more memory).
That is roughly the order in which I would try things, but your mileage may vary, as I have no idea what DataCell is composed of.
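A minimal sketch of "an array per data type", assuming a column holds only doubles; DoubleColumn is a hypothetical class, not part of KNIME's API. Values live in a primitive double[] instead of an Object[] of boxed values, so reads need no cast or unboxing. Missing rows are tracked in a java.util.BitSet, which is a swap-in for the hash or tree suggested above and gives a constant-time bit test.

```java
import java.util.BitSet;

// Hypothetical typed column: primitive storage plus a bit per missing row.
class DoubleColumn {
    private final double[] values;
    private final BitSet missing;

    DoubleColumn(int size) {
        values = new double[size];
        missing = new BitSet(size);
    }

    void set(int rowIdx, double value) {
        values[rowIdx] = value;
        missing.clear(rowIdx);
    }

    void setMissing(int rowIdx) {
        missing.set(rowIdx);
    }

    boolean isMissing(int rowIdx) {
        return missing.get(rowIdx); // constant-time bit test
    }

    // No cast and no unboxing: the caller gets a primitive directly.
    double get(int rowIdx) {
        if (isMissing(rowIdx)) {
            throw new IllegalStateException("row " + rowIdx + " is missing");
        }
        return values[rowIdx];
    }
}
```

One such class per R type (boolean, int, double, String) replaces the per-row switch on m_type with a single choice made when the column is created; the cost is more classes and, for some types, more memory.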

Helping the JVM with stack allocation by using separate objects

I have a bottleneck method which attempts to add points (as x-y pairs) to a HashSet. The common case is that the set already contains the point in which case nothing happens. Should I use a separate point for adding from the one I use for checking if the set already contains it? It seems this would allow the JVM to allocate the checking-point on stack. Thus in the common case, this will require no heap allocation.
Ex. I'm considering changing
HashSet<Point> set;
public void addPoint(int x, int y) {
if(set.add(new Point(x,y))) {
//Do some stuff
}
}
to
HashSet<Point> set;
public void addPoint(int x, int y){
if(!set.contains(new Point(x,y))) {
set.add(new Point(x,y));
//Do some stuff
}
}
Is there a profiler which will tell me whether objects are allocated on heap or stack?
EDIT: To clarify why I think the second might be faster: in the first case the object may or may not be added to the collection, so it escapes and cannot be optimized. In the second case, the first object allocated is clearly non-escaping, so it can be optimized by the JVM and put on the stack. The second allocation only occurs in the rare case where the point is not already contained.

Marko Topolnik properly answered your question; the space allocated for the first new Point may or may not be immediately freed and it is probably foolish to bank on it happening. But I want to expand on why you're currently in a deep state of sin:
You're trying to optimise this the wrong way.
You've identified object creation to be the bottleneck here. I'm going to assume that you're right about this. You're hoping that, if you create fewer objects, the code will run faster. That might be true, but it will never run very fast as you've designed it.
Every object in Java has a pretty fat header (16 bytes; an 8-byte "mark word" full of bit fields and an 8-byte pointer to the class type) and, depending on what's happened in your program thus far, possibly another pretty fat trailer. Your HashSet isn't storing just the contents of your objects; it's storing pointers to those fat-headers-followed-by-contents. (Actually, it's storing pointers to Entry objects that themselves store pointers to Points. Two levels of indirection there.)
A HashSet lookup, then, figures out which bucket it needs to look at and then chases one pointer per thing in the bucket to do the comparison (as one great big chain in series). There probably aren't very many of these objects, but they almost certainly aren't stored close together, making your cache angry. Note that object allocation in Java is extremely cheap (you just increment a pointer), so this pointer chasing is quite probably a bigger source of slowness than the allocation.
Java doesn't provide any abstraction like C++'s templates, so the only real way to make this fast and still provide the Set abstraction is to copy HashSet's code, change all of the data structures to represent your objects inline, modify the methods to work with the new data structures, and, if you're still worried, make copies of the relevant methods that take a list of parameters corresponding to object contents (i.e. contains(int, int)) that do the right thing without constructing a new object.
This approach is error-prone and time-consuming, but unfortunately it's often necessary when working on Java projects where performance matters. Take a look at the Trove library Marko mentioned and see if you can use it instead; Trove did exactly this for the primitive types.
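A minimal sketch of the "represent your objects inline" idea, assuming you only need add and contains for (int, int) points. IntPairSet is a hypothetical class; it is not a copy of HashSet's code but uses open addressing with linear probing (a swap-in for HashMap's chained buckets), with coordinates stored in parallel int arrays so neither add(int, int) nor contains(int, int) allocates anything. Removal is deliberately left out.

```java
// Hypothetical specialised set of (x, y) points stored inline.
class IntPairSet {
    private int[] xs = new int[16];
    private int[] ys = new int[16];
    private boolean[] used = new boolean[16];
    private int size;

    // Mixes both coordinates into a slot index; capacity is a power of two.
    private static int slot(int x, int y, int capacity) {
        int h = 31 * x + y;
        h ^= (h >>> 16);
        return h & (capacity - 1);
    }

    // Linear probing: walk forward until the pair or an empty slot is found.
    // The load factor stays at or below 1/2, so an empty slot always exists.
    private int find(int x, int y) {
        int i = slot(x, y, used.length);
        while (used[i] && (xs[i] != x || ys[i] != y)) {
            i = (i + 1) & (used.length - 1);
        }
        return i;
    }

    boolean contains(int x, int y) {
        return used[find(x, y)];
    }

    // Returns true if the pair was newly added, like Set.add.
    boolean add(int x, int y) {
        int i = find(x, y);
        if (used[i]) {
            return false;
        }
        xs[i] = x;
        ys[i] = y;
        used[i] = true;
        size++;
        if (size * 2 > used.length) {
            grow();
        }
        return true;
    }

    int size() {
        return size;
    }

    // Doubles the capacity and re-inserts every stored pair.
    private void grow() {
        int[] oldXs = xs;
        int[] oldYs = ys;
        boolean[] oldUsed = used;
        xs = new int[oldXs.length * 2];
        ys = new int[oldYs.length * 2];
        used = new boolean[oldUsed.length * 2];
        for (int j = 0; j < oldUsed.length; j++) {
            if (oldUsed[j]) {
                int i = find(oldXs[j], oldYs[j]);
                xs[i] = oldXs[j];
                ys[i] = oldYs[j];
                used[i] = true;
            }
        }
    }
}
```

With this, the question's addPoint becomes `if (set.add(x, y)) { ... }` with no Point objects at all, which sidesteps the escape-analysis bet entirely.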
With that out of the way, a monomorphic call site is one where only one method is called. HotSpot aggressively inlines calls from monomorphic call sites. You'll notice that HashSet.contains punts to HashMap.containsKey. You'd better pray for HashMap.containsKey to be inlined, since you need the hashCode and equals calls inside it to be monomorphic. You can verify that your code is being compiled nicely by using the -XX:+PrintAssembly option (which also requires -XX:+UnlockDiagnosticVMOptions and the hsdis disassembler plugin) and poring over the output, but it's probably not; and even if it is, it's probably still slow because of what a HashSet is.

As soon as you have written new Point(x,y), you are creating a new object. It may happen not to be placed on the heap, but that's just a bet you can lose. For example, the contains call should be inlined for the escape analysis to work, or at least it should be a monomorphic call site. All this means that you are optimizing against a quite erratic performance model.
If you want to avoid allocation the solid way, you can use Trove library's TLongHashSet and have your (int,int) pairs encoded as single long values.
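Encoding a pair into one long is a small bit-twiddling step that is easy to get wrong. Here is a minimal sketch; PointCodec is a hypothetical helper, and its result is what you would pass to a long-keyed set such as Trove's TLongHashSet. The important detail is masking y, because a negative int would otherwise be sign-extended and overwrite the high 32 bits that hold x.

```java
// Hypothetical helper: x in the high 32 bits, y in the low 32 bits.
final class PointCodec {
    private PointCodec() {}

    static long encode(int x, int y) {
        // Mask y so its sign bit does not smear into the high half.
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int decodeX(long key) {
        return (int) (key >>> 32);
    }

    static int decodeY(long key) {
        return (int) key; // keeps the low 32 bits, restoring the sign
    }
}
```

Without the mask, encode(1, -1) and encode(0, -1) would both produce -1L and collide in the set.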

Java: efficient Collection concept for paired objects

I'm missing some kind of collection functionality for a specific problem.
I'd like to start with some information about the problem's background; maybe there's a more elegant way to solve it that doesn't end in the specific problem I'm stuck with:
I'm modelling a volume mesh made of tetrahedral cells (the 2D-analog would be a triangle mesh). Two tetrahedrons are considered to be adjacent if they share one triangle-face (which occupies three vertices). My application has to be able to navigate from cell to cell via their common face.
To meet some other requirements I had to split the faces into two so-called half-faces, which share the same vertices but belong to different cells and have opposite orientations.
The application needs to be able to do calls like this (where Face models a half-face):
Cell getAdjacentCell(Cell cell, int faceIndex) {
Face face = cell.getFace(faceIndex);
Face partnerFace = face.getPartner();
if (partnerFace == null) return null; // no adjacent cell present
Cell adjacentCell = partnerFace.getCell();
return adjacentCell;
}
The implementation of the getPartner()-method is the method in question. My approach is as follows:
Face objects can create an immutable Signature object containing only the vertex configuration, the orientation (clockwise (cw) or counter-clockwise (ccw)) and a back-reference to the originating Face object. Face.Signature objects are considered equal (@Override equals()) if they occupy the same three vertices, regardless of their orientation and their associated cell.
I created two sets in the Mesh-objects to contain all half-faces grouped by their orientation:
Set<Face.Signature> faceSignatureCcw = new HashSet<Face.Signature>();
Set<Face.Signature> faceSignatureCw = new HashSet<Face.Signature>();
Now I'm able to determine if a partner exists ...
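class Face {
    public Face getPartner() {
        Face.Signature signature = this.getSignature();
        // look in the set holding the opposite orientation
        Set<Face.Signature> candidates = signature.isCcw()
                ? this.getMesh().faceSignatureCw
                : this.getMesh().faceSignatureCcw;
        boolean partnerExists = candidates.contains(signature);
        // ...but there is no way to get the matching element back out of the Set
        return null;
    }
}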
... but Set does not allow you to retrieve the specific object it contains! It merely confirms that it contains an object that matches via .equals().
(end of background information)
I need a Collection-concept which provides the following functionality:
add a Face-Object to the Collection (duplicates are prohibited by the application and thus cannot occur)
retrieve the partner from the Collection for a given Face-Object that .equals() but has the opposite orientation
A possible (but way too slow) solution would be:
class PartnerCollection {
List<Face.Signature> faceSignatureCcw = new ArrayList<Face.Signature>();
List<Face.Signature> faceSignatureCw = new ArrayList<Face.Signature>();
void add(Face.Signature faceSignature) {
(faceSignature.isCcw() ? faceSignatureCcw : faceSignatureCw).add(faceSignature); // store by own orientation
}
Face.Signature getPartner(Face.Signature faceSignature) {
List<Face.Signature> partnerList = faceSignature.isCcw() ? faceSignatureCw : faceSignatureCcw;
for (Face.Signature partnerSignature : partnerList) {
if (faceSignature.equals(partnerSignature)) return partnerSignature;
}
return null;
}
}
To be complete: The final application will have to handle hundreds of thousands of Face-Objects in a real-time environment. So performance is an issue.
Thanks in advance to anyone who at least tried to follow me up to this point :)
I hope someone out there has the right idea to solve this.

Anything wrong with using two Map<Face.Signature, Face.Signature>?
One for each direction?
That's what I'd do. There's practically no code to it.
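A minimal sketch of the two-map approach, assuming a signature whose equality depends only on its three vertex ids, as the question describes. Signature and PartnerIndex are hypothetical stand-ins (the face back-reference is typed as Object to keep the sketch self-contained). Unlike a Set, a Map hands back the stored object, so getPartner becomes one expected constant-time lookup in the opposite-orientation map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical half-face signature: equality ignores orientation and cell.
final class Signature {
    final int[] sortedVertices; // vertex ids, sorted so order does not matter
    final boolean ccw;
    final Object face;          // back-reference to the originating half-face

    Signature(int a, int b, int c, boolean ccw, Object face) {
        this.sortedVertices = new int[] {a, b, c};
        Arrays.sort(this.sortedVertices);
        this.ccw = ccw;
        this.face = face;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Signature
                && Arrays.equals(sortedVertices, ((Signature) o).sortedVertices);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(sortedVertices);
    }
}

// One map per orientation; each signature maps to itself.
class PartnerIndex {
    private final Map<Signature, Signature> ccwFaces = new HashMap<>();
    private final Map<Signature, Signature> cwFaces = new HashMap<>();

    void add(Signature s) {
        (s.ccw ? ccwFaces : cwFaces).put(s, s);
    }

    // Looks in the opposite-orientation map; null means no adjacent cell.
    Signature getPartner(Signature s) {
        return (s.ccw ? cwFaces : ccwFaces).get(s);
    }
}
```

This relies on the question's guarantee that duplicates cannot occur within one orientation; otherwise put would silently replace an earlier half-face.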

It's late at night here and I haven't read your question completely, so I apologize if this doesn't make any sense, but have you considered using a graph data structure? If a graph data structure is indeed a possible solution, you might want to check out JGraphT.

Have you considered just giving each Face a partner data member? As in,
public class Face
{
Face partner;
//whatever else
}
The Face.Signature construct is a bit hairy and really shouldn't be needed. If every face has a partner (or enough Face objects can have a partner that it makes sense to think that there is a has-a relationship between a Face and a partner Face), the connection should just be an instance variable. If you can use this approach, it should vastly simplify your code. If not, post back the reason this doesn't work for you so that I can keep trying to help.
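A minimal sketch of how the partner field could be filled in once, assuming all half-faces are known when the mesh is built and that a vertex triple occurs at most twice (true for a valid manifold tetrahedral mesh). HalfFace and linkPartners are hypothetical names; the map is only used during this single linking pass, after which getPartner is a plain field read. List.of needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical half-face with a direct partner reference.
class HalfFace {
    final List<Integer> key; // sorted vertex ids, so orientation does not matter
    HalfFace partner;        // stays null on the mesh boundary

    HalfFace(int a, int b, int c) {
        int[] v = {a, b, c};
        Arrays.sort(v);
        this.key = List.of(v[0], v[1], v[2]);
    }

    HalfFace getPartner() {
        return partner; // no lookup at query time
    }

    // One pass over all half-faces: the first face with a given key waits in
    // the map until its twin arrives, then both references are set.
    static void linkPartners(List<HalfFace> faces) {
        Map<List<Integer>, HalfFace> unmatched = new HashMap<>();
        for (HalfFace f : faces) {
            HalfFace twin = unmatched.remove(f.key);
            if (twin == null) {
                unmatched.put(f.key, f);
            } else {
                f.partner = twin;
                twin.partner = f;
            }
        }
    }
}
```

When cells are added or removed at runtime, only the affected faces need relinking, which keeps the per-query cost at a single field access.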

Using the design you have now, there is no way around something needing to iterate somewhere. The question is where you want that iteration to occur. I suggest you do this:
List<Face.Signature> partnerList = faceSignature.isCcw() ? faceSignatureCw : faceSignatureCcw;
int idx = partnerList.indexOf(faceSignature);
if(idx == -1)
return null;
return partnerList.get(idx);
Also, as long as you are using Lists and know that they will have to be pretty big, you might as well give them an initial capacity, e.g. new ArrayList<>(100000) or so.
Of course, this isn't the only method, just one that ensures the iteration will be optimal.
EDIT: After some thought, I believe the ideal data-structure for this would be an Octuply Linked List, which can make things confusing, but also very fast (comparatively).

Why should pop() take an argument?

Quick background
I'm a Java developer who's been playing around with C++ in my free/bored time.
Preface
In C++, you often see pop taking an argument by reference:
void pop(Item& removed);
I understand that it is nice to "fill in" the parameter with what you removed. That totally makes sense to me. This way, the person who asked to remove the top item can have a look at what was removed.
However, if I were to do this in Java, I'd do something like this:
Item pop() throws StackException;
This way, pop either returns an Item (possibly null) or throws an exception.
My C++ textbook shows me the example above, but I see plenty of stack implementations whose pop takes no arguments (the STL stack, for example).
The Question
How should one implement the pop function in C++?
The Bonus
Why?

To answer the question: you should not implement the pop function in C++, since it is already implemented by the STL. The std::stack container adapter provides the method top to get a reference to the top element on the stack, and the method pop to remove the top element. Note that the pop method alone cannot be used to perform both actions, as you asked about.
Why should it be done that way?
Exception safety: Herb Sutter gives a good explanation of the issue in GotW #82.
Single-responsibility principle: also mentioned in GotW #82. top takes care of one responsibility and pop takes care of the other.
Don't pay for what you don't need: For some code, it may suffice to examine the top element and then pop it, without ever making a (potentially expensive) copy of the element. (This is mentioned in the SGI STL documentation.)
Any code that wishes to obtain a copy of the element can do this at no additional expense:
Foo f(s.top());
s.pop();
Also, this discussion may be interesting.
If you were going to implement pop to return the value, it doesn't matter much whether you return by value or write it into an out parameter. Most compilers implement RVO, which will optimize the return-by-value method to be just as efficient as the copy-into-out-parameter method. Just keep in mind that either of these will likely be less efficient than examining the object using top() or front(), since in that case there is absolutely no copying done.

The problem with the Java approach is that its pop() method has at least two effects: removing an element, and returning an element. This violates the single-responsibility principle of software design, which in turn opens the door to design complexities and other issues. It also implies a performance penalty.
In the STL way of things the idea is that sometimes when you pop() you're not interested in the item popped. You just want the effect of removing the top element. If the function returns the element and you ignore it then that's a wasted copy.
If you provide two overloads, one which takes a reference and another which doesn't, then you allow the user to choose whether he (or she) is interested in the returned element or not. The performance of the call will be optimal.
The STL doesn't overload the pop() functions but rather splits these into two functions: back() (or top() in the case of the std::stack adapter) and pop(). The back() function just returns the element, while the pop() function just removes it.
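For comparison with the Java side of the question, here is a minimal sketch of the same top()/pop() split written in Java; SplitStack is a hypothetical class. In Java, returning an element only copies a reference, so the copy-cost argument mostly disappears, but the single-responsibility argument still applies: top() only reads, and pop() only removes.

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

// Hypothetical Java stack mirroring the STL split.
class SplitStack<T> {
    private final ArrayList<T> items = new ArrayList<>();

    void push(T item) {
        items.add(item);
    }

    boolean isEmpty() {
        return items.isEmpty();
    }

    // Reads the top element without modifying the stack.
    T top() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("stack is empty");
        }
        return items.get(items.size() - 1);
    }

    // Removes the top element and returns nothing.
    void pop() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("stack is empty");
        }
        items.remove(items.size() - 1);
    }
}
```

The JDK's own java.util.Deque takes the combined route instead: peek() reads the head and pop() removes and returns it.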

Using C++0x (since standardized as C++11) makes the whole thing hard again.
As
stack.pop(item); // move top data to item without copying
makes it possible to efficiently move the top element from the stack. Whereas
item = stack.top(); // make a copy of the top element
stack.pop(); // delete top element
doesn't allow such optimizations.

The only reason I can see for using this syntax in C++:
void pop(Item& removed);
is if you're worried about unnecessary copies taking place.
If you return the Item by value, it may require an additional copy of the object, which may be expensive.
In reality, C++ compilers are very good at copy elision, and almost always implement return value optimization (often even when you compile with optimizations disabled), which makes the point moot, and may even mean the simple "return by value" version becomes faster in some cases.
But if you're into premature optimization (if you're worried that the compiler might not optimize away the copy, even though in practice it will do it), you might argue for "returning" parameters by assigning to a reference parameter.
More information here

IMO, a good signature for the equivalent of Java's pop function in C++ would be something like:
boost::optional<Item> pop();
Using option types (boost::optional, or std::optional since C++17) is the best way to return something that may or may not be available.
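Java has the same option type since Java 8, so the question's Java signature could use it as well. A minimal sketch, where OptionalStack is a hypothetical class built on java.util.ArrayDeque: an empty Optional signals "nothing to pop" without returning null or throwing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Hypothetical Java analogue of boost::optional<Item> pop().
class OptionalStack<T> {
    private final Deque<T> items = new ArrayDeque<>();

    void push(T item) {
        items.push(item); // ArrayDeque rejects null elements
    }

    Optional<T> pop() {
        // pollFirst returns null on an empty deque, which becomes Optional.empty().
        return Optional.ofNullable(items.pollFirst());
    }
}
```

Callers then write something like `stack.pop().ifPresent(item -> use(item))`, where use is whatever the caller needs, and the empty case is visible in the type rather than hidden in a null check.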
