Java or C++ for my particular agent-based model (ABM)?

I unfortunately need to develop an agent-based model. My background is C++; I'm decent but not a professional programmer. My goal is to determine whether, my background aside for the moment, the following kind of algorithm would be faster or dramatically easier to write in C++ or Java.
My agents will be of class Host. Their private member variables include their infection and immune statuses (type int) with respect to different strains. (In C++, I might use an unordered_map or vector to hold this information, depending on the number of strains.) I plan to keep track of all hosts in a vector, vector<Host *> hosts.
The program will need to know at any time all the particular hosts infected with a particular strain or with immunity to a particular strain. For each strain, I could thus maintain two separate structures, e.g., vector<Host *> immune and vector<Host *> infectious (I might make each two-dimensional, indexed by strain and then host).
Hosts can die. It seems like this creates a mess in C++, in that I would have to find the right individual to kill in hosts and then search through the other structures (immune and infectious) to find all pointers to this object. I'm under the impression that Java will delete all these pointers implicitly if I delete the underlying object. Is this true? Is there a dramatically better way to do this in C++ than what I have here?
Thanks in advance for any help.
I should add that if I use C++, I will use smart pointers. That said, I still don't see a slick way to delete all pointers to an object when the object needs to go. (When a host dies, I want to delete it from memory.)
I realize there's a lot to learn in Java. I'm hoping someone with more perspective on the differences between the languages, and who can understand what I need to do (above), can tell me if one language will obviously be more efficient than another.

I'm under the impression that Java will delete all these pointers implicitly if I delete the underlying object. Is this true?
Nope. You actually have it backwards: once you remove all the references, Java will (eventually) delete the underlying object for you. So you'll still need to search through all three of your data structures (hosts, immune, and infectious) to kill that particular host.
However, this "search" will be fast and simple if you use the right data structures; a HashSet will do the job very nicely.
private HashSet<Host> hosts;
private HashSet<Host> immune;
private HashSet<Host> infectious;

public void killHost(Host deadManWalking) {
    hosts.remove(deadManWalking);
    immune.remove(deadManWalking);
    infectious.remove(deadManWalking);
}
It's really that simple, and each remove runs in expected O(1) time. (The default identity-based equals and hashCode on Object already do the right thing here; if you want value-based equality instead, override both consistently in your implementation of Host. Neither is technically challenging.)
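If you do go the value-based route, here's a minimal sketch of the override, assuming a hypothetical unique id field on Host (adapt it to whatever actually identifies your hosts):

import java.util.Objects;

public class Host {
    private final long id; // hypothetical unique identifier

    public Host(long id) {
        this.id = id;
    }

    // Two Hosts are equal iff they carry the same id.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Host)) return false;
        return id == ((Host) o).id;
    }

    // Must be consistent with equals: equal ids, equal hash codes.
    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}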
My memories of C++ are too hazy for me to give any sort of authoritative comparison between the two languages; I did a ton of C++ work in college and haven't touched it since. Will C++ code run faster? Done right, and assuming you don't have any memory leaks, I'd suspect it would, though Java's rep as a slow language is mostly a holdover from its youth; it's pretty decent these days. Easier to write? Well, given that you'd be learning the language, probably not. But the learning curve from C++ to Java is pretty gentle, and I personally don't miss C++ at all. Once you know both languages, Java is, in my opinion, vastly easier to work with. YMMV, natch, but it may well be worth the effort for you.

I can't answer all your questions, but
I'm under the impression that Java will delete all these pointers implicitly if I delete the underlying object.
In Java you don't delete an object; instead, it becomes eligible for garbage collection once it is no longer reachable through any reference (the JVM does not use reference counting). However, you may want to use weak references here; that way the object can be collected as soon as no strong references to it remain.
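For instance, here's a minimal sketch of that weak-reference idea applied to the question's scenario (the wrapper class and method names are mine):

import java.lang.ref.WeakReference;

class Host { /* the agent class from the question */ }

class ImmuneEntry {
    // Holds the Host only weakly, so this entry does not keep the Host
    // alive once every strong reference to it (e.g., from the owning
    // 'hosts' collection) has been dropped.
    private final WeakReference<Host> hostRef;

    ImmuneEntry(Host host) {
        this.hostRef = new WeakReference<>(host);
    }

    // Returns null once the Host has been garbage collected.
    Host getHostIfAlive() {
        return hostRef.get();
    }
}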

Actually, your impression is basically backwards: Java will assume an object (the host in this case) is dead when there is no longer any pointer giving access to that object. At that point it'll clean up the object (automatically).
At a guess, however, there's one collection that "owns" the hosts, and would be responsible for deleting a host when it dies. The other pointers to the host don't own it. If that's the case, then in C++ you'd normally handle this by having the "owning" collection contain a shared_ptr to the host, and the other collections contain weak_ptrs to the host. To use the object via a weak_ptr, you have to first convert that to a shared_ptr that you can dereference to get to the host itself. If, however, the object has been deleted, the attempt at converting the weak_ptr to a shared_ptr will fail, and you'll know the host is dead (and you can then delete your reference to it).

Related

DRY Principle: Angular2/Typescript and Java back end object duplication

I'm a Java developer but I've recently begun learning Angular2/Typescript. I've worked with Angular 1.x before so I'm not a complete noob :)
While working through a POC with a RESTful Spring Boot back end and Angular2 front end, I noticed myself duplicating model objects on both sides a lot, e.g.:
Java Object
public class Car {
    private Double numSeats;
    private Double numDoors;
    .....
}
Now, in the interest of TypeScript being strongly typed, I'd create a similar object within my front-end project:
export interface Car {
    numSeats: number;
    numDoors: number;
}
I'm duplicating the work and constantly violating the DRY (Don't Repeat Yourself) principle here.
I'm wondering if there is a better way of going about this. I was thinking about code generation tools like jSweet, but I'm interested to hear if anyone else has come across the same issue and how they approached it.
There are two schools of thought on whether this is a violation of the DRY principle. If you're really, really sure that there's a natural mapping you would always apply to bind JSON in each language, then you could say that it is duplicate work, which is (at least part of) the thinking behind IDL-type languages in technologies like CORBA (but I'm showing my age).
OTOH, maybe each system (the server, the client, an alternate client if anyone were to write one) should be free to independently define the internal representation of objects that is best suited to that system (given its language, what it plans to do, etc.).
In your example, the TypeScript certainly doesn't contain all of the information needed to define the Java "equivalent". ('number' could map to a lot of things, and the TypeScript says nothing about access modifiers...) Of course you can narrow that down by adopting conventions, but my point is it's not self-evident that there'd be a 1-to-1 mapping.
Maybe one language handles references more gracefully than another. Maybe one can't deal with circular references but the other can. Maybe one has reason to prefer a more flat view of the object. Maybe a lot of things.
All of that said, it certainly is true that if you modify the JSON structure of an object, and you're maintaining each system's internal representation independently, then you likely have to make code changes in multiple places to accommodate that single underlying change. And pragmatically, if that can be avoided, it's a good thing.
So if you can come up with a code generator that processes the more expressive language's representation to create a representation for the less expressive language, and maybe at least use that by default, you may find it's not a bad thing for your project.
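As a rough illustration of that generator idea, here's a minimal reflection-based sketch that emits a TypeScript interface from a Java class. The type-mapping conventions here are assumptions, and a real project would more likely lean on a dedicated tool like jSweet:

import java.lang.reflect.Field;

public class TsInterfaceEmitter {

    // Emits a TypeScript interface with one property per declared field.
    public static String emit(Class<?> clazz) {
        StringBuilder sb = new StringBuilder();
        sb.append("export interface ").append(clazz.getSimpleName()).append(" {\n");
        for (Field field : clazz.getDeclaredFields()) {
            sb.append("    ").append(field.getName())
              .append(": ").append(tsType(field.getType())).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    // Assumed conventions: Java numeric types collapse to 'number',
    // String to 'string', booleans to 'boolean', everything else to 'any'.
    private static String tsType(Class<?> t) {
        if (Number.class.isAssignableFrom(t) || t == int.class || t == long.class
                || t == double.class || t == float.class || t == short.class
                || t == byte.class) {
            return "number";
        }
        if (t == String.class) return "string";
        if (t == boolean.class || t == Boolean.class) return "boolean";
        return "any";
    }
}

Run against the Car class above, emit(Car.class) prints the matching interface, leaving the Java side as the single source of truth.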

Why do we need getters?

I have read the stackoverflow page which discusses "Why use getters and setters?", and I have been convinced by some of the reasons for using a setter, for example: later validation, data encapsulation, etc. But what is the reason for using getters anyway? I don't see any harm in getting the value of a private field, or any reason to validate before you get a field's value. Is it OK to never use a getter and always get a field's value using dot notation?
If a given field in a Java class is visible for reading (on the RHS of an expression), then, unless it is declared final, it is also possible to assign that field (on the LHS of an expression). For example:
class A {
    int someValue;
}

A a = new A();
int value = a.someValue; // if you can do this (potentially harmless)
a.someValue = 10;        // then you can also do this (bad)
Besides the above problem, a major reason for having a getter in a class is to shield the consumer of that class from implementation details. A getter does not necessarily have to simply return a value. It could return a value distilled from a Collection or something else entirely. By using a getter (and a setter), we free the consumer of the class from having to worry about the implementation changing over time.
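For instance, here's a minimal sketch of a getter that distills its value from a collection (the class and field names are hypothetical):

import java.util.ArrayList;
import java.util.List;

class Order {
    private final List<Double> lineItemPrices = new ArrayList<>();

    public void addLineItem(double price) {
        lineItemPrices.add(price);
    }

    // Looks like a plain getter to the caller, but the value is computed
    // from the underlying collection; the representation can change later
    // without touching any calling code.
    public double getTotal() {
        return lineItemPrices.stream().mapToDouble(Double::doubleValue).sum();
    }
}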
I want to focus on practicalities, since I think you're at a point where you haven't seen the conceptual benefits line up just yet with the actual practice.
The obvious conceptual benefit is that setters and getters can be changed without impacting the outside world using those functions. Another Java-specific benefit is that all methods not marked as final are capable of being overridden, so you get the ability for subclasses to override the behavior as a bonus.
Overkill?
Yet you're probably at a point where you've heard these conceptual benefits before and it still sounds like overkill for your more daily scenarios. A difficult part of understanding software engineering practices is that they are generally designed to deal with very real world, large-scale codebases being managed by teams of developers. A lot of things are going to seem like overkill initially when you're just working on a small project of your own.
So let's get into some practical, real-world scenarios. I formerly worked in a very large-scale codebase. It was a low-level C codebase with a long legacy, sometimes barely a step above assembly, but many of the lessons I learned there translate to all kinds of languages.
Real-World Grief
In this codebase, we had a lot of bugs, and the majority of them related to state management and side effects. For example, we had cases where two fields of a structure were supposed to stay in sync with each other. The range of valid values for one field depended on the value of the other. Yet we ran into bugs where those two fields were out of sync. Unfortunately, since they were just public variables with a very global scope ('global' should really be considered a matter of degree, relative to the amount of code that can access a variable, rather than an absolute), there were potentially tens of thousands of lines of code that could be the culprit.
As a simpler example, we had cases where the value of a field was never supposed to be negative, yet in our debugging sessions, we found negative values. Let's call this value that's never supposed to be negative, x. When we discovered the bugs resulting from x being negative, it was long after x was touched by anything. So we spent hours placing memory breakpoints and trying to find needles in a haystack by looking at all possible places that modified x in some way. Eventually we found and fixed the bug, but it was a bug that should have been discovered years earlier and should have been much less painful to fix.
Such would have been the case if large portions of the codebase weren't just directly accessing x and used functions like set_x instead. If that were the case, we could have done something as simple as this:
void set_x(int new_value)
{
    assert(new_value >= 0);
    x = new_value;
}
... and we would have discovered the culprit immediately and fixed it in a matter of minutes. Instead, we discovered it years after the bug was introduced and it took us meticulous hours of headaches to trace it down and fix.
Such is the price we can pay for ignoring engineering wisdom. After dealing with the 10,000th issue that could have been avoided by a practice as simple as going through functions rather than touching raw data throughout a codebase, if your hairs haven't all turned grey by that point, you're still generally not going to have a cheerful disposition.
The biggest value of getters and setters comes from the setters. It's the state manipulation that you generally want to control the most to prevent and detect bugs. The getter becomes a necessity simply as a result of requiring a setter to modify the data. Yet getters can also be useful when you want to exchange raw state for a computation non-intrusively (by changing just one function's implementation), e.g., returning a value computed on the fly instead of a stored field.
Interface Stability
One of the most difficult things to appreciate earlier in your career is going to be interface stability (to prevent public interfaces from changing constantly). This is something that can only be appreciated with projects of scale and possibly compatibility issues with third parties.
When you're working on a small project on your own, you might be able to change the public definition of a class to your heart's content and rewrite all the code using it to update it with your changes. It won't seem like a big deal to constantly rewrite the code this way, as the amount of code using an interface might be quite small (ex: a few hundred lines of code using your class, and all code that you personally wrote).
When you work on a large-scale project and look down at millions of lines of code, changing the public definition of a widely-used class might mean that 100,000 lines of code need to be rewritten using that class in response. And a lot of that code won't even be your own code, so you have to intrusively analyze and fix other people's code and possibly collaborate with them closely to coordinate these changes. Some of these people may not even be on your team: they may be third parties writing plugins for your software or former developers who have moved on to other projects.
You really don't want to run into this scenario repeatedly, so designing public interfaces well enough to keep them stable (unchanging) becomes a key skill for your most central interfaces. If those interfaces are leaking implementation details like raw data, then the temptation to change them over and over is going to be a scenario you can face all the time.
So you generally want to design interfaces to focus on "what" they should do, not "how" they should do it, since the "how" might change a lot more often than the "what". For example, perhaps a function should append a new element to a list. However, you may want to swap out the list data structure it's using for another, or introduce a lock to make that function thread safe ("how" concerns). If these "how" concerns are not leaked to the public interface, then you can change the implementation of that class (how it's doing things) locally without affecting any of the existing code that is requesting it to do things.
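Here's a minimal sketch of that separation (the class and names are hypothetical): the public interface promises only the "what" (append an entry), while the backing container and the lock are "how" details that can change without breaking callers:

import java.util.ArrayDeque;
import java.util.Deque;

class EventLog<T> {
    // "How" details: the backing structure (say, previously a LinkedList)
    // and the lock can both change without affecting any caller.
    private final Deque<T> entries = new ArrayDeque<>();
    private final Object lock = new Object();

    // "What": append an entry to the log, thread-safely.
    public void append(T entry) {
        synchronized (lock) {
            entries.addLast(entry);
        }
    }
}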
You also don't want classes to do too much and become monolithic, since then your class variables will become "more global" (become visible to a lot more code even within the class's implementation) and it'll also be hard to settle on a stable design when it's already doing so much (the more classes do, the more they'll want to do).
Getters and setters aren't the best examples of such interface design, but they do avoid exposing those "how" details at least slightly better than a publicly exposed variable, and thus have fewer reasons to change (break).
Practical Avoidance of Getters/Setters
Is it OK to never use a getter and always get a field's value using dot notation?
This could sometimes be okay. For example, if you are implementing a tree structure and it utilizes a node class as a private implementation detail that clients never use directly, then trying too hard to focus on the engineering of this node class is probably going to start becoming counter-productive.
There your node class isn't a public interface. It's a private implementation detail for your tree. You can guarantee that it won't be used by anything more than the tree implementation, so there it might be overkill to apply these kinds of practices.
Where you don't want to ignore such practices is in the real public interface, the tree interface. You don't want to allow the tree to be misused and left in an invalid state, and you don't want an unstable interface which you're constantly tempted to change long after the tree is being widely used.
Another case where it might be okay is if you're just working on a scrap project/experiment as a kind of learning exercise, and you know for sure that the code you write is rather disposable and is never going to be used in any project of scale or grow into anything of scale.
Nevertheless, if you're very new to these concepts, I think it's a useful exercise even for your small scale projects to err on the side of using getters/setters. It's similar to how Mr. Miyagi got Daniel-San to paint the fence, wash the car, etc. Daniel-San finds it all pointless with his arms exhausted on top of that. Then Mr. Miyagi goes "hyah hyah hyoh hyah" throwing big punches and kicks, and using that indirect training, Daniel-San blocks all of them without realizing how he's even doing it.
In Java you can't tell the compiler to allow read-only access to a public field from the outside while keeping it writable from the inside (a final field is read-only for everyone, including the class itself).
So exposing public fields opens the door to uncontrolled modifications.
Fields are not polymorphic.
The alternative to a getter would be a public field; however, fields are not polymorphic.
This means that you cannot extend the class and "override" the field without introducing weird behaviour. Basically, the value you get will depend on how you refer to the field.
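A minimal sketch of that weird behaviour (the classes here are hypothetical): the field you read is chosen by the reference's static type, while the method is dispatched dynamically:

class Base {
    String name = "base";
    String getName() { return name; }
}

class Derived extends Base {
    String name = "derived"; // hides Base.name rather than overriding it
    @Override
    String getName() { return name; }
}

public class FieldHidingDemo {
    public static void main(String[] args) {
        Base b = new Derived();
        System.out.println(b.name);      // prints "base": the static type picks the field
        System.out.println(b.getName()); // prints "derived": dynamic dispatch picks the method
    }
}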
Furthermore, you can't include the field in an interface and you can't perform validation (that applies more to a setter).

Most efficient way to store 5 attributes

So I'm trying to store 5 attributes of an object, which are 5 different integers.
What would be the best way to store these? I was thinking of arrays, but arrays aren't flexible. I also need to be able to retrieve all 5 attributes, so arrays probably won't work well.
Here's some background if it helps: I am currently making a game similar to Terraria (or Minecraft in 2D).
I wanted to store where the object is on the map (x, y), where it is on the screen at that part of the map (x, y), and what type of object it is.
import java.awt.Point;

public class MyClass {
    private Point pointOnMap;
    private Point pointOnScreen;
    // ...
}
The Point class binds x & y values into a single object (which makes sense) and gives you useful, basic methods it sounds like you'll need, such as translate and distance. http://docs.oracle.com/javase/7/docs/api/java/awt/Point.html
First, it is not possible to predict the most efficient way to store the attributes without seeing all of your code. (And I for one don't want to :-)) Second, you haven't clearly explained what you are optimizing for. Speed? Memory usage? Minimization of GC pauses?
However, this smells of premature optimization: wasting lots of time trying to optimize the performance of something that hasn't been built yet, without any evidence that the performance of this part of the codebase is going to be significant.
My advice would be:
1. Pick a simple design and implement it; e.g., 5 private int variables with getters and setters. If that is inconvenient, then choose a more convenient API.
2. Complete the program.
3. Get it working.
4. Benchmark it. Does it run fast enough? If yes, stop.
5. Profile it. Pick the biggest performance hotspot and optimize that.
6. Rerun the benchmarking and profiling to check that your optimization has made things faster. If yes, then "commit" it. If not, back it out.
7. Go to step 4.
I would suggest a HashMap where the key is objectId-attributeName and the value is the integer attribute value, since you have to do retrieval based on the key. Lookup is then an O(1) operation.
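A minimal sketch of that composite-key idea (the class name and key format are assumptions):

import java.util.HashMap;
import java.util.Map;

public class AttributeStore {
    private final Map<String, Integer> attributes = new HashMap<>();

    // Key format: "objectId-attributeName".
    public void put(int objectId, String attributeName, int value) {
        attributes.put(objectId + "-" + attributeName, value);
    }

    // Expected O(1) retrieval; returns null if the attribute is absent.
    public Integer get(int objectId, String attributeName) {
        return attributes.get(objectId + "-" + attributeName);
    }
}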

Creating objects on the stack memory in Java?

This is just a simple theoretical question out of curiosity. I have always been something of a Java fanboy. But one thing makes me wonder: why doesn't Java provide a mechanism for creating objects on the stack? Wouldn't it be more efficient if I could just create a small Point(int x, int y) object on the stack instead of the heap, like creating a struct in C#? Is there any special security reason behind this restriction in Java? :)
The strategy here is that instead of leaking this decision into the language, Java lets the JVM/Hotspot/JIT/runtime decide where and how it wants to allocate memory.
There is research going on to use "escape analysis" to figure out which objects don't actually need to go onto the heap and to stack-allocate them instead. I am not sure if this has made it into a mainstream JVM already. But if it does, it will be controlled by the runtime (think -XX:something), not the developer.
The upside of this is that even old code can benefit from these future enhancements without itself being updated.
If you like to manually manage this (but still have the compiler check that it stays "safe"), take a look at Rust.
This will tentatively be coming to Java. There is no real ETA set for it, so you can only hope it will arrive by Java 10.
The proposal is called Value Types and you can follow it in the mailing list of Project Valhalla.
I do not know if there were any prior reasons as to why it wasn't in the language in the first place, maybe originally it was thought of as unneeded or there was simply no time to implement this.
A common problem would be initializing some global reference with an object created on the stack: when the method which created the object exits, what do you point to?
That being said, objects are created on the stack in Java; it's just done behind your back using the escape analysis mentioned above, which makes sure the scenario described can't occur.
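A rough sketch of what escape analysis can exploit (the classes here are mine): p never escapes sum(), so a JIT that performs escape analysis is free to skip the heap allocation entirely, with no change to the source:

public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // 'p' never leaves this method, so a modern JIT may allocate it on
    // the stack (or dissolve it into registers) instead of the heap.
    static long sum(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += p.x + p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000));
    }
}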

Simulating Destructors in Clojure

Problem Statement
I have two machines, A and B, both running Clojure.
B has some in memory data structure.
A holds an object A_P which is a reference/pointer to some object B_O in B's memory.
Now, as long as A_P is NOT GC-ed by A, I do not want B_O GC-ed by B.
However, once A_P has been GC-ed by A (and nothing else in A refers to B_O, and nothing else in B refers to B_O), then I want B_O to be eligible to be GC-ed.
Solution in Languages with Destructors
In C++, this is easy: I use destructors. When A_P gets destroyed, A sends B a msg to decrement the number of external references to B_O, and when that's 0, and the number of internal references to B_O is also 0, then B_O gets GC-ed.
Solution in Java/Clojure?
Now, I know that Java does not have destructors. However, I'm wondering if Clojure has a way around this problem.
Thanks!
No good solution exists, without a real distributed garbage collector. Even in C++, you cannot do this safely, because you implemented reference counting and pretended it was a real garbage collector; but if two objects point to each other across the machine divide, and are both unreferenced locally, they still both have a nonzero reference count and cannot be collected.
No, Clojure (based on the JVM/CLR) doesn't have C++-style destructors, because of the automatic memory management model of the JVM. There are things like finalizers, but it is recommended not to use them. Instead you should model your solution on a message-passing mechanism rather than machine A holding a "pointer/reference" to data in B. I know this answer is very high level, because you haven't provided any specific problem details in your question. If you need more details about how to solve a particular problem, please provide the complete context and I am sure someone will be able to help you.
This is an inherently difficult problem: distributed garbage collection is really hard, if not impossible, to get right.
However you might just be able to make it work using Java finalisers and overriding the finalize() method. You can then implement a messaging technique similar to the one you describe for C++.
This will have issues in the more general case (it won't help you with circular references across machines as amalloy points out) and there are some other quirks to be aware of (mostly around your lack of control over exactly when the finaliser gets called) but you might be able to get it to work in your specific situation.
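A rough sketch of that finaliser-based technique on the Java side (ReleaseChannel and sendRelease are hypothetical stand-ins for whatever messaging layer connects A and B):

interface ReleaseChannel {
    void sendRelease(String remoteObjectId);
}

class RemoteHandle {
    private final String remoteObjectId;
    private final ReleaseChannel channel;

    RemoteHandle(String remoteObjectId, ReleaseChannel channel) {
        this.remoteObjectId = remoteObjectId;
        this.channel = channel;
    }

    // Runs at some unspecified point after this handle becomes unreachable;
    // tells machine B to decrement its external reference count for B_O.
    @Override
    protected void finalize() throws Throwable {
        try {
            channel.sendRelease(remoteObjectId);
        } finally {
            super.finalize();
        }
    }
}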
Assuming you're using a data structure like a ref or atom for holding data structure A somewhere inside it, you can use listeners for monitoring the state of that structure for removals of A, and those listeners can send appropriate message to B. clojure.data/diff could be really useful for finding the structures that were removed.
The other option would be to have the function responsible for removing A send the message immediately after the removal. As part of this, though, make sure that that code was actually responsible for the removal of A, and not some other update.