Most efficient way to store 5 attributes - java

So I'm trying to store 5 attributes of an object, which are 5 different integers.
What would be the best way to store these? I was thinking of arrays, but arrays aren't flexible. I also need to be able to retrieve all 5 attributes, so arrays probably won't work well.
Here's some background if it helps: I am currently making a game similar to Terraria (or Minecraft in 2D).
I wanted to store where the object is on the map(x,y), where it is on the screen at the part of the map(x,y), and what type of object it is.

import java.awt.Point;

public class MyClass {
    private Point pointOnMap;
    private Point pointOnScreen;
    // ...
}
The Point class binds x & y values into a single object (which makes sense) and gives you useful, basic methods it sounds like you'll need, such as translate and distance. http://docs.oracle.com/javase/7/docs/api/java/awt/Point.html

First, it is not possible to predict the most efficient way to store the attributes without seeing all of your code. (And I for one don't want to :-)) Second, you haven't clearly explained what you are optimizing for. Speed? Memory usage? Minimization of GC pauses?
However, this smells of premature optimization: spending lots of time trying to optimize the performance of something that hasn't been built, without any evidence that the performance of this part of the codebase is going to be significant.
My advice would be:
1. Pick a simple design and implement it; e.g. 5 private int variables with getters and setters. If that is inconvenient, then choose a more convenient API.
2. Complete the program.
3. Get it working.
4. Benchmark it. Does it run fast enough? If yes, stop.
5. Profile it. Pick the biggest performance hotspot and optimize that.
6. Rerun the benchmarking and profiling to check that your optimization has made things faster. If yes, then "commit" it. If not, then back it out.
7. Go to step 4.
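As a rough illustration of the simple design suggested in the first step, here is a minimal sketch with five private int fields. The class name GameObject and the field names are assumptions derived from the question's description (map position, screen position, object type), not the asker's actual code.

```java
// Hypothetical class: all names are illustrative, based on the question's description.
final class GameObject {
    private int mapX;
    private int mapY;
    private int screenX;
    private int screenY;
    private int type;

    GameObject(int mapX, int mapY, int screenX, int screenY, int type) {
        this.mapX = mapX;
        this.mapY = mapY;
        this.screenX = screenX;
        this.screenY = screenY;
        this.type = type;
    }

    int getMapX() { return mapX; }
    int getMapY() { return mapY; }
    int getScreenX() { return screenX; }
    int getScreenY() { return screenY; }
    int getType() { return type; }

    // Both coordinates of a position are updated together so they stay consistent.
    void setMapPosition(int x, int y) {
        this.mapX = x;
        this.mapY = y;
    }

    void setScreenPosition(int x, int y) {
        this.screenX = x;
        this.screenY = y;
    }
}
```

All five values can be read back through the getters, and if profiling later shows a problem, the internal representation can change without touching callers.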

I would suggest a HashMap where the key is objectId-attributeName and the value is the integer value, since you have to retrieve values by key. A lookup is an O(1) operation on average.
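A minimal sketch of what this suggestion might look like, assuming integer object ids and string attribute names. The class AttributeStore and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: keys are built as "<objectId>-<attributeName>".
final class AttributeStore {
    private final Map<String, Integer> values = new HashMap<>();

    private static String key(int objectId, String attributeName) {
        return objectId + "-" + attributeName;
    }

    void put(int objectId, String attributeName, int value) {
        values.put(key(objectId, attributeName), value);
    }

    // Returns null when the attribute was never stored for this object.
    Integer get(int objectId, String attributeName) {
        return values.get(key(objectId, attributeName));
    }
}
```

Note that every access builds a String key and boxes the int into an Integer, so although the lookup is constant time on average, it costs more than reading a plain field.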

Related

Efficiency and Instantiation - Java

I understand it might be hard to answer this question without knowing the details of the problem, but I hope someone has encountered a similar situation before and could help me.
I am writing a simulator in Java and I have n neurons that communicate with each other during the simulation time. Each of these neurons has specific parameters and properties, and I need to access and maybe manipulate their values during the simulation time.
I am wondering which of the following is the "right" choice:
Storing information in 1-D and 2-D ArrayLists - this means a lot of lookups and requires extra care to make sure the information is linked properly.
Having one class with fields and methods required for a neuron and making different instances of it for every neuron (using the constructor to provide parameters specific to that neuron).
Basically, my question is where is the limit for making instances of a class? When does it become too many and inefficient? 100s? 1000s?
Let me know if I should explain more.
Appreciate any other suggestion as well.
Thank you.
Only memory limits how many instances of a class you can create; there are no other performance-related issues with having many instances (*).
However, class instances do have some additional information stored with them, so they come with more memory overhead than arrays. Each instance carries an object header, typically 12 to 16 bytes on a 64-bit HotSpot JVM, and its total size is rounded up to a multiple of 8 bytes.
(*) In theory anyway, in practice you may encounter more GC overhead or have worse performance due to cache misses if the instances are not spread favorably in memory vs the array solution.
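For illustration, the second option from the question (one class, one instance per neuron) could be sketched as below. The fields threshold and potential and the firing rule are invented for the example; the real simulator's parameters are unknown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical neuron: the fields and the firing rule are assumptions for illustration.
final class Neuron {
    private final double threshold;
    private double potential;

    Neuron(double threshold) {
        this.threshold = threshold;
    }

    // Accumulates input; returns true and resets when the threshold is reached.
    boolean receive(double input) {
        potential += input;
        if (potential >= threshold) {
            potential = 0.0;
            return true;
        }
        return false;
    }

    double getPotential() { return potential; }
}

// Holds one Neuron instance per simulated neuron.
final class Network {
    private final List<Neuron> neurons = new ArrayList<>();

    Network(int count, double threshold) {
        for (int i = 0; i < count; i++) {
            neurons.add(new Neuron(threshold));
        }
    }

    Neuron get(int index) { return neurons.get(index); }
    int size() { return neurons.size(); }
}
```

Creating thousands of such small objects is routine for a JVM; the state of each neuron stays in one place instead of being spread across parallel lists.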

Why would data structures like queues and stacks be used if arrays are easier to use and more powerful?

This may be a silly question, but it's been bugging me for a while. Most programming languages have arrays (e.g. Java, C/C++, C#... OK, Python has lists!), but in much of the literature I see, some data structures (such as stacks and queues) are treated as more basic than arrays. But since so many languages have such great support for arrays, why would anyone use a stack or queue? I realize that conceptually a data structure other than an array may better fit the model, but since you may have to implement your own stack or queue, it's a lot more work given how fundamental arrays are.
One example I'm thinking of is from these notes on the Gale-Shapley algorithm:
Maintain a list of free men (in a stack or queue).
Wouldn't it be easier to just use an array and trust yourself to only look at the front/end of it?
I guess I'm asking, why would anyone bother using something like a stack or queue when most are implemented using arrays, and arrays are more powerful anyways?
There are several reasons. First of all, it's helpful if your implementation more closely matches the algorithm. That is, if your algorithm uses a stack it's useful if you can say, for example, stack.push(thing) rather than writing:
if (stack_index >= stack_array.length)
{
    // have to extend the stack
    ....
}
stack_array[stack_index++] = thing;
And you'd have to write that code every place you push something onto the stack. And you'd have to write multiple lines of code every time you wanted to pop from the stack.
I don't know about you, but I find it incredibly easy to make mistakes when I'm writing code like that.
Seems easier, much more clear, and way more reliable to encapsulate that functionality into a Stack object that you can then use as it's intended: with calls to push and pop methods.
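As a sketch of that encapsulation in Java, the array bookkeeping can live in one small class. IntStack is a hypothetical name, and the doubling growth policy is just one common choice.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Minimal array-backed stack of ints; growth and index handling are hidden from callers.
final class IntStack {
    private int[] items = new int[4];
    private int size;

    void push(int value) {
        if (size == items.length) {
            // Extend the backing array; callers never see this step.
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = value;
    }

    int pop() {
        if (size == 0) {
            throw new NoSuchElementException("stack is empty");
        }
        return items[--size];
    }

    boolean isEmpty() { return size == 0; }
}
```

Callers write stack.push(thing) and stack.pop(); the capacity check exists in exactly one place.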
Another benefit is that when you find yourself having to do a quick thread-safe stack, you can modify your Stack class to put locks around any code that changes the internal structure (i.e. the array), and any callers automatically have a thread-safe stack. If you were to address the array directly in your code, then you'd have to go to every place that you access the underlying array and add your lock statements. I'd give better than even odds that you'd make a mistake in there somewhere and then you'd have an interesting time tracking down that intermittent failure.
Arrays themselves aren't particularly powerful. They're flexible, but they have no smarts. We wrap behavior around arrays to limit what they can do so that we don't have to "trust ourselves" not to do stupid things, and also so that we can do the right things more easily. If I'm implementing an algorithm that uses a queue, then I'm going to be thinking in terms of a queue that has Enqueue and Dequeue operations rather than in terms of a linked list or an array that has head and tail indexes that I have to manage, etc.
Finally, if you write a Stack or Queue or similar data structure implementation, test it and prove it correct, you can use it over and over again in multiple projects without ever having to look at the code again. It just works. That's opposed to rolling your own with an array so that you have to debug it not just in every project you use a stack in, but you have the potential of screwing up every single push or pop.
To sum up, we create data structures rather than use raw arrays:
Because it's easier to think in terms of data structure operations rather than the mechanics of working with an array.
Code re-use: write once, debug, and then use it in multiple projects.
Simplifies code (stack.push(item) rather than multiple lines of array indexing).
Reduces potential for error.
Easier for the next guy to come by and understand what you did. "Oh, he's pushing items onto a stack."
Internally I'd say most of those classes are implemented with the help of arrays. But it would be tedious to use Arrays as Stacks or Queues. Arrays are fixed length things where you cannot insert stuff at arbitrary places. You would have to do much copying around of array elements, enlarging and shrinking the array or keep in mind what your head and tail positions are etc. The Stack and Queue classes do all this for you and you can just use the much more convenient push, pop, etc. methods.
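In Java, for instance, java.util.ArrayDeque is an array-backed class that can serve as either a stack or a queue. The sketch below uses a hypothetical helper class, FreeList, only to show the two orderings side by side.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

// Hypothetical helper showing LIFO and FIFO access over the same array-backed class.
final class FreeList {
    // Stack behavior: the last id pushed is the first one popped.
    // Throws NoSuchElementException if no ids are given.
    static int lastIn(int... ids) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int id : ids) {
            stack.push(id);
        }
        return stack.pop();
    }

    // Queue behavior: the first id added is the first one removed.
    // Throws NoSuchElementException if no ids are given.
    static int firstIn(int... ids) {
        Queue<Integer> queue = new ArrayDeque<>();
        for (int id : ids) {
            queue.add(id);
        }
        return queue.remove();
    }
}
```

Resizing and head/tail index management happen inside ArrayDeque, so the calling code only expresses the ordering it needs.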
Arrays are just one more type of data structure. They have specific use cases, just like any other.
All data structures have particular properties, e.g.
fixed vs variable size;
ordered vs unordered;
allows duplicates vs prohibits duplicates
covariant or not
can contain primitives or not
specific time complexity on insertion/removal/retrieval operations
iteration order
...
Whether you choose to use an array or any other data structure depends upon what you are trying to do, and whether that data structure possesses the properties you require.
And it is better to have simple data structures which do one thing well, than to attempt to have an uber data structure which does everything.
You can change the size of a stack or queue more easily than the size of an array, which is very difficult.
If you know how big your array should be, use an array. But if you don't know, stacks and queues are the better choice.
Admirable that you would trust yourself so much, but when you're creating a project with several other developers (or even by yourself), you can't rely on trust.
Different data structures make it more obvious what the code is doing and prevent you from doing wrong things (no matter how much you trust yourself). They can also provide performance, concurrency, or content guarantees, and dozens of other things that simple arrays can't do or just aren't the best fit for.
Why would you have pockets when backpacks are invented?
I agree with @Vampire. Arrays provide instant access to any of their elements if you have the index, whereas stacks and queues give you the convenience of LIFO and FIFO orderings, which make many algorithms much easier to implement. Also, with stacks and queues it's easier to add or remove elements: an array has a fixed amount of memory at all times and its elements sit in consecutive memory blocks, so you would have to move a lot of data and allocate/deallocate memory accordingly. Please check this link for further information.

Why do we need getters?

I have read the Stack Overflow page which discusses "Why use getters and setters?", and I have been convinced by some of the reasons for using a setter, for example: later validation, data encapsulation, etc. But what is the reason for using getters anyway? I don't see any harm in getting the value of a private field, or any reason to validate before you get a field's value. Is it OK to never use a getter and always get a field's value using dot notation?
If a given non-final field in a Java class is visible for reading (on the RHS of an expression), then it can also be assigned (on the LHS of an expression). For example:
class A {
int someValue;
}
A a = new A();
int value = a.someValue; // if you can do this (potentially harmless)
a.someValue = 10; // then you can also do this (bad)
Besides the above problem, a major reason for having a getter in a class is to shield the consumer of that class from implementation details. A getter does not necessarily have to simply return a value. It could return a value distilled from a Collection or something else entirely. By using a getter (and a setter), we free the consumer of the class from having to worry about the implementation changing over time.
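A small sketch of a getter that distills its value from a collection rather than returning a stored field. The Inventory class and its weight-based example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: callers see a plain getter, but no "totalWeight" field exists.
final class Inventory {
    private final List<Integer> itemWeights = new ArrayList<>();

    void addItem(int weight) {
        itemWeights.add(weight);
    }

    // Computed on demand from the collection; could later be cached without changing callers.
    int getTotalWeight() {
        int total = 0;
        for (int weight : itemWeights) {
            total += weight;
        }
        return total;
    }
}
```

If the class had exposed a public totalWeight field instead, switching to a computed value would have required changing every piece of code that read it.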
I want to focus on practicalities, since I think you're at a point where you haven't seen the conceptual benefits line up just yet with the actual practice.
The obvious conceptual benefit is that setters and getters can be changed without impacting the outside world using those functions. Another Java-specific benefit is that all methods not marked as final are capable of being overridden, so you get the ability for subclasses to override the behavior as a bonus.
Overkill?
Yet you're probably at a point where you've heard these conceptual benefits before and it still sounds like overkill for your more daily scenarios. A difficult part of understanding software engineering practices is that they are generally designed to deal with very real world, large-scale codebases being managed by teams of developers. A lot of things are going to seem like overkill initially when you're just working on a small project of your own.
So let's get into some practical, real-world scenarios. I formerly worked in a very large-scale codebase. It was a low-level C codebase with a long legacy and sometimes barely a step above assembly, but many of the lessons I learned there translate to all kinds of languages.
Real-World Grief
In this codebase, we had a lot of bugs, and the majority of them related to state management and side effects. For example, we had cases where two fields of a structure were supposed to stay in sync with each other. The range of valid values for one field depended on the value of the other. Yet we ran into bugs where those two fields were out of sync. Unfortunately since they were just public variables with a very global scope ('global' should really be considered a degree with respect to the amount of code that can access a variable rather than an absolute), there were potentially tens of thousands of lines of code that could be the culprit.
As a simpler example, we had cases where the value of a field was never supposed to be negative, yet in our debugging sessions, we found negative values. Let's call this value that's never supposed to be negative, x. When we discovered the bugs resulting from x being negative, it was long after x was touched by anything. So we spent hours placing memory breakpoints and trying to find needles in a haystack by looking at all possible places that modified x in some way. Eventually we found and fixed the bug, but it was a bug that should have been discovered years earlier and should have been much less painful to fix.
Such would have been the case if large portions of the codebase weren't just directly accessing x and used functions like set_x instead. If that were the case, we could have done something as simple as this:
void set_x(int new_value)
{
assert(new_value >= 0);
x = new_value;
}
... and we would have discovered the culprit immediately and fixed it in a matter of minutes. Instead, we discovered it years after the bug was introduced and it took us meticulous hours of headaches to trace it down and fix.
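The same idea carries over to Java. The sketch below uses a hypothetical Position class and throws an exception instead of using assert, since Java assertions are disabled unless the JVM is started with -ea.

```java
// Hypothetical class: x must never be negative, and the setter enforces it.
final class Position {
    private int x;

    int getX() { return x; }

    void setX(int newValue) {
        if (newValue < 0) {
            // Fails at the exact call site that supplied the bad value.
            throw new IllegalArgumentException("x must be non-negative, got " + newValue);
        }
        x = newValue;
    }
}
```

The stack trace of the exception points directly at the offending caller, which is the information that was so hard to recover in the debugging story above.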
Such is the price we can pay for ignoring engineering wisdom, and after dealing with the 10,000th issue which could have been avoided with a practice as simple as depending on functions rather than raw data throughout a codebase, if your hairs haven't all turned grey at that point, you're still generally not going to have a cheerful disposition.
The biggest value of getters and setters comes from the setters. It's the state manipulation that you generally want to control the most to prevent/detect bugs. The getter becomes a necessity simply as a result of requiring a setter to modify the data. Yet getters can also be useful sometimes when you want to exchange a raw state for a computation non-intrusively (by just changing one function's implementation), e.g.
Interface Stability
One of the most difficult things to appreciate earlier in your career is going to be interface stability (to prevent public interfaces from changing constantly). This is something that can only be appreciated with projects of scale and possibly compatibility issues with third parties.
When you're working on a small project on your own, you might be able to change the public definition of a class to your heart's content and rewrite all the code using it to update it with your changes. It won't seem like a big deal to constantly rewrite the code this way, as the amount of code using an interface might be quite small (ex: a few hundred lines of code using your class, and all code that you personally wrote).
When you work on a large-scale project and look down at millions of lines of code, changing the public definition of a widely-used class might mean that 100,000 lines of code need to be rewritten using that class in response. And a lot of that code won't even be your own code, so you have to intrusively analyze and fix other people's code and possibly collaborate with them closely to coordinate these changes. Some of these people may not even be on your team: they may be third parties writing plugins for your software or former developers who have moved on to other projects.
You really don't want to run into this scenario repeatedly, so designing public interfaces well enough to keep them stable (unchanging) becomes a key skill for your most central interfaces. If those interfaces are leaking implementation details like raw data, then the temptation to change them over and over is going to be a scenario you can face all the time.
So you generally want to design interfaces to focus on "what" they should do, not "how" they should do it, since the "how" might change a lot more often than the "what". For example, perhaps a function should append a new element to a list. However, you may want to swap out the list data structure it's using for another, or introduce a lock to make that function thread safe ("how" concerns). If these "how" concerns are not leaked to the public interface, then you can change the implementation of that class (how it's doing things) locally without affecting any of the existing code that is requesting it to do things.
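To make the "what" versus "how" distinction concrete, here is a minimal sketch assuming a hypothetical EventLog class. The public interface only says "append an entry"; the list type and the locking are private details that could be swapped out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: the interface exposes "what" (append, size), not "how".
final class EventLog {
    // The concrete list type and the lock are implementation details.
    private final List<String> entries = new ArrayList<>();
    private final Object lock = new Object();

    void append(String entry) {
        synchronized (lock) {
            entries.add(entry);
        }
    }

    int size() {
        synchronized (lock) {
            return entries.size();
        }
    }
}
```

Replacing the ArrayList with another structure, or changing the locking strategy, would not require touching any code that calls append.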
You also don't want classes to do too much and become monolithic, since then your class variables will become "more global" (become visible to a lot more code even within the class's implementation) and it'll also be hard to settle on a stable design when it's already doing so much (the more classes do, the more they'll want to do).
Getters and setters aren't the best examples of such interface design, but they do avoid exposing those "how" details at least slightly better than a publicly exposed variable, and thus have fewer reasons to change (break).
Practical Avoidance of Getters/Setters
Is it OK to never use a getter and always get a field's value using dot notation?
This could sometimes be okay. For example, if you are implementing a tree structure and it utilizes a node class as a private implementation detail that clients never use directly, then trying too hard to focus on the engineering of this node class is probably going to start becoming counter-productive.
There your node class isn't a public interface. It's a private implementation detail for your tree. You can guarantee that it won't be used by anything more than the tree implementation, so there it might be overkill to apply these kinds of practices.
Where you don't want to ignore such practices is in the real public interface, the tree interface. You don't want to allow the tree to be misused and left in an invalid state, and you don't want an unstable interface which you're constantly tempted to change long after the tree is being widely used.
Another case where it might be okay is if you're just working on a scrap project/experiment as a kind of learning exercise, and you know for sure that the code you write is rather disposable and is never going to be used in any project of scale or grow into anything of scale.
Nevertheless, if you're very new to these concepts, I think it's a useful exercise even for your small scale projects to err on the side of using getters/setters. It's similar to how Mr. Miyagi got Daniel-San to paint the fence, wash the car, etc. Daniel-San finds it all pointless with his arms exhausted on top of that. Then Mr. Miyagi goes "hyah hyah hyoh hyah" throwing big punches and kicks, and using that indirect training, Daniel-San blocks all of them without realizing how he's even doing it.
In Java you can't tell the compiler to allow read-only access to a public field from outside.
So exposing public fields opens the door to uncontrolled modifications.
Fields are not polymorphic.
The alternative to a getter would be a public field; however, fields are not polymorphic.
This means that you cannot extend the class and "override" the field without introducing weird behaviour. Basically, the value you get will depend on how you refer to the field.
Furthermore, you can't include the field in an interface and you can't perform validation (that applies more to a setter).
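A short sketch of that "weird behaviour", using hypothetical Animal and Dog classes. A field redeclared in a subclass hides the parent's field instead of overriding it, so the result depends on the static type of the reference, while a method call is dispatched on the runtime type.

```java
// Hypothetical classes illustrating field hiding versus method overriding.
class Animal {
    public String sound = "generic";

    public String getSound() {
        return "generic";
    }
}

class Dog extends Animal {
    // Hides Animal.sound; it does not override it.
    public String sound = "woof";

    @Override
    public String getSound() {
        return "woof";
    }
}
```

With Animal a = new Dog(), the expression a.sound reads Animal's field and yields "generic", whereas a.getSound() returns "woof". Casting to Dog and reading ((Dog) a).sound yields "woof".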

If classes all contain lots of useful class variables, will it have an impact on performances?

Whenever I write a new class, I use quite a ton of class variables to describe the class's properties, to the point where, when I go back to review the code I've typed, I see 40 to 50 class variables. Regardless of whether they are public, protected, or private, they are all used prominently throughout the classes I've defined.
Even though the class variables consist mostly of primitives, like booleans, integers, doubles, etc., I still have this uneasy feeling that some of my classes with large numbers of class variables may have an impact on performance, however negligible it may be.
But being rational as possible, if I consider unlimited RAM size and unlimited Java class variables, a Java class may be an infinitely large block of memory in the RAM, which the first portion of the block contains the class variables partitions, and the rest of the block contains the addresses to the class methods within the Java class. With this amount of RAM, the performance for it is very nontrivial.
But the above isn't easing my mind. If we were to consider limited RAM but unlimited Java class variables, what would be the result? What would really happen in an environment where performance matters?
And, since it will probably get mentioned: I don't know if having lots of class variables counts as bad Java practice when all of them are important and all classes have been refactored.
Thanks in advance.
Performance has nothing to do with the number of fields an object has. Memory consumption is of course potentially affected, but if the variables are needed, you can't do much about it.
Don't worry too much about performance. Make your code simple, readable, maintainable, tested. Then, if you notice performance problems, measure and profile to see where they come from, and optimize where needed.
Maintainability and readability are affected by the number of fields an object has, though. 40 to 50 fields is quite a lot, and is probably an indication that your classes do too much on their own and have too many responsibilities. Refactoring them into many smaller classes and using composition would probably be a good idea.
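As a rough sketch of that kind of refactoring, related fields can be grouped into small classes that a larger class then holds by composition. Player, Health, and Location are invented names; the asker's actual classes are unknown.

```java
// Hypothetical example: fields that belong together are grouped into small classes.
final class Health {
    private final int current;
    private final int max;

    Health(int current, int max) {
        this.current = current;
        this.max = max;
    }

    int getCurrent() { return current; }
    int getMax() { return max; }
    boolean isDead() { return current <= 0; }
}

final class Location {
    private final int x;
    private final int y;

    Location(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }
}

// The larger class is composed of the smaller ones instead of holding every field itself.
final class Player {
    private final Health health;
    private final Location location;

    Player(Health health, Location location) {
        this.health = health;
        this.location = location;
    }

    Health getHealth() { return health; }
    Location getLocation() { return location; }
}
```

Each small class has one responsibility and can be tested on its own, at the cost of one extra object per group.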
I hope I don't sound like an ass, but in my view having more than 10 properties in a class is usually a hint of a bad design and requires justification.
Performance-wise, if you very often need all those properties, then you're going to be saving some memory, as each object also has a header. So instead of having 5-10 classes you put everything into one and you save some bytes.
Depending on which garbage collector you use, having bigger objects can be more expensive to allocate (this is true for the CMS garbage collector, but not for the parallel one). More GC work = less time for your app to run.
Unless you're writing a high-traffic, low-latency application, the benefits of having fewer classes (and using less memory) are going to be completely overwhelmed by the extra effort needed for maintenance.
The biggest problem I see in having a class with a lot of variables is Thread safety - it is going to be really hard to reason about the invariants in such a case. Also reading/maintaining such a class is going to be really hard.
Of course, if you make as many fields as you can immutable, that is going to be a lot better.
I try to go with : less is better, easier to maintain.
A basic principle we are always taught is to keep cohesion high (one class focuses on one task) and coupling low (less interdependency among classes, so that changes in one do not affect others).
While designing a system, I believe the focus should be more on maintainable design; performance will take care of itself. I don't think there is a fixed limit on the number of variables a class can have as a good practice, as this will strictly depend on your requirements.
For example, if I have a requirement where the application suggests a course to a student, and the algorithm needs 50 inputs (scores, hobbies, etc.), it will not matter whether this data is available in one class or several, as the whole information needs to be loaded into RAM for faster execution.
I will say again: take care of your design. It is harmful both to keep unnecessary variables in a class (as it will load non-required information into RAM) and to split into more classes than required (more references and hence more pointer chasing).
1. I always use this as a rule of thumb: a class should have only one reason to change, so it should do only one thing.
2. Keeping this in mind, I take those variables which are needed to define this class's attributes.
3. I make sure that my class follows the cohesion principle, where the methods within the class reflect the class name.
4. After sorting everything out, if I need some other variables to make my class work, then I have to use them; I have no choice. Moreover, after all the thinking and work that goes into creating a class, it will hardly be affected by some additional variables.
Sometimes class variables are used as static final constants to store some default strings like product name, version, OS version, etc. Or even to store product specific settings like font size, type, etc. Those static variables can be kept at class level.
You can also use a HashMap instead of a simple class if you just want to store field constants or things like product settings that rarely change. That may help speed up your response time.
Two things I would like to mention :
1. All instance variables are stored in the heap area of RAM.
2. All static variables are stored in a non-heap area (the method area, to be specific).
Whatever the type of variable (instance or static), ultimately all reside in RAM.
Now, coming to your question: as far as instance variables are concerned, Java's built-in garbage collector will work, in most cases well and truly effectively, to keep freeing memory. However, objects referenced from static variables are not garbage collected for as long as the class stays loaded.
If you are highly concerned with memory issues due to large number of variables in your class, you can resort to using Weak References instead of traditional strong reference.

java best practices in matrix/vector library

i have to write a simple vector/matrix library for a small geometry related project i'm working on. here's what i'm wondering.
when doing mathematical operations on vectors in a java environment, is it better practice to return a new instance of a vector or modify the state of the original.
i've seen it back and forth and would just like to get a majority input.
certain people say that the vectors should be immutable and static methods should be used to create new ones, others say that they should be mutable and normal methods should be used to modify their state. i've seen it in some cases where the object is immutable and normal methods are called which returns a new vector from the object without changing the state - this seems a little off to me.
i would just like to get a feel for if there is any best practice for this - i imagine it's something that's been done a million times and am really just wondering if there's a standard way to do this.
i noticed the apache commons math library returns a new vector every time from the original.
How important is performance going to be? Is vector arithmetic going to be a large component so that it affects the performance of overall system?
If it is not, and there is going to be a lot of concurrency, then immutable vectors will be useful because they reduce concurrency issues.
If there are a lot of mutations on vectors, then the overhead of the new objects that immutable vectors require will become significant, and it may be better to have mutable vectors and do the concurrency the hard way.
It depends. Generally speaking, immutability is better.
First and foremost, it is automatically threadsafe. It is easier to maintain and test.
That said, sometimes you need speed where creating new instances will take too much time.
(Note: If you're not 100% positive you need that amount of speed, you don't need it. Think high-frequency trading and real-time math-intensive applications. And even then, you should go simple first, and optimize later.)
As for static vs normal methods, following good OOP principles, you shouldn't have static methods. To create new vectors/matrices you can use the constructor.
Next, what's your backing structure? Your best bet is probably single-dimensional arrays of doubles for vectors and multi-dimensional arrays of doubles for matrices. This at least lets you stay relatively quick by using primitives rather than objects.
If you get to the point that you need even more performance, you can add modifiers on your Vector/Matrix that can change the backing data. You could even decide that the dimensions are immutable but the contents are mutable which would give you some other safeties as well.
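A minimal sketch of the immutable approach described above, backed by a one-dimensional array of doubles. The class name Vec and its methods are assumptions for illustration, not an existing library API.

```java
import java.util.Arrays;

// Hypothetical immutable vector: every operation returns a new instance.
final class Vec {
    private final double[] components;

    Vec(double... components) {
        // Defensive copy so later changes to the caller's array cannot alter this vector.
        this.components = Arrays.copyOf(components, components.length);
    }

    int dimension() { return components.length; }

    double get(int index) { return components[index]; }

    Vec plus(Vec other) {
        if (other.components.length != components.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] sum = new double[components.length];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = components[i] + other.components[i];
        }
        return new Vec(sum);
    }

    Vec scale(double factor) {
        double[] scaled = new double[components.length];
        for (int i = 0; i < scaled.length; i++) {
            scaled[i] = components[i] * factor;
        }
        return new Vec(scaled);
    }
}
```

Since no method changes an existing Vec, instances can be shared between threads without locking. The cost is one allocation per operation, which is the overhead the answers above weigh against mutable vectors.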
