Java best practices in a matrix/vector library

I have to write a simple vector/matrix library for a small geometry-related project I'm working on. Here's what I'm wondering:
When doing mathematical operations on vectors in Java, is it better practice to return a new instance of a vector or to modify the state of the original?
I've seen it done both ways and would just like to get a majority input.
Some people say that vectors should be immutable and that static methods should be used to create new ones; others say that they should be mutable and that normal methods should be used to modify their state. I've also seen cases where the object is immutable and instance methods return a new vector without changing the object's state - this seems a little off to me.
I would just like to get a feel for whether there is any best practice for this - I imagine it's something that's been done a million times, and I'm really just wondering if there's a standard way to do it.
I noticed that the Apache Commons Math library returns a new vector every time from the original.

How important is performance going to be? Is vector arithmetic going to be a large enough component that it affects the performance of the overall system?
If it is not, and there is going to be a lot of concurrency, then immutable vectors will be useful because they reduce concurrency issues.
If there are a lot of mutations on vectors, then the overhead of the new objects that immutable vectors require will become significant, and it may be better to have mutable vectors and handle the concurrency the hard way.

It depends. Generally speaking, immutability is better.
First and foremost, it is automatically thread-safe. It is easier to maintain and test.
That said, sometimes you need speed, and creating new instances can take too much time.
(Note: If you're not 100% positive you need that amount of speed, you don't need it. Think high-frequency trading and real-time, math-intensive applications. And even then, you should go simple first and optimize later.)
As for static vs. normal methods: following good OOP principles, you shouldn't need static methods; to create new vectors/matrices you can use the constructor.
Next, what's your backing structure? Your best bet is probably a single-dimensional array of doubles for vectors and a multi-dimensional array of doubles for matrices. That at least lets you stay relatively quick by using primitives.
If you get to the point that you need even more performance, you can add modifiers on your Vector/Matrix that change the backing data. You could even decide that the dimensions are immutable but the contents are mutable, which would give you some other safeties as well.
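To make the immutable style concrete, here is a minimal sketch (the class and method names are just assumptions for the example): every operation returns a new instance, and a mutable variant would instead write into the backing array in place.

// Minimal sketch of an immutable vector backed by a double[] (illustrative only).
public final class Vec {
    private final double[] components;

    public Vec(double... components) {
        // Defensive copy so callers cannot mutate our state afterwards.
        this.components = components.clone();
    }

    public double get(int i) {
        return components[i];
    }

    // Returns a new Vec instead of modifying this one.
    public Vec add(Vec other) {
        if (other.components.length != components.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] result = new double[components.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = components[i] + other.components[i];
        }
        return new Vec(result);
    }

    public Vec scale(double factor) {
        double[] result = new double[components.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = components[i] * factor;
        }
        return new Vec(result);
    }
}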


Why would data structures like queues and stacks be used if arrays are easier to use and more powerful?

This may be a silly question, but it's been bugging me for a while. Most programming languages have arrays (e.g. Java, C/C++, C#...ok, Python has lists!), but in much of the literature I see, some data structures (such as stacks and queues) are treated as more basic than arrays. Since so many languages have such great support for arrays, why would anyone use a stack or queue? I realize that conceptually a data structure other than an array may better fit the model, but given that you may have to implement your own stack or queue, it seems like a lot of extra work considering how fundamental arrays are.
One example I'm thinking of is from these notes on the Gale-Shapley algorithm: "Maintain a list of free men (in a stack or queue)."
Wouldn't it be easier to just use an array and trust yourself to only look at the front/end of it?
I guess I'm asking: why would anyone bother using something like a stack or queue when most are implemented using arrays, and arrays are more powerful anyway?
There are several reasons. First of all, it's helpful if your implementation more closely matches the algorithm. That is, if your algorithm uses a stack it's useful if you can say, for example, stack.push(thing) rather than writing:
if (stackIndex >= stackArray.length)
{
    // have to extend the stack: allocate a larger array and copy the elements over
    ...
}
stackArray[stackIndex++] = thing;
And you'd have to write that code every place you push something onto the stack. And you'd have to write multiple lines of code every time you wanted to pop from the stack.
I don't know about you, but I find it incredibly easy to make mistakes when I'm writing code like that.
It seems easier, much clearer, and far more reliable to encapsulate that functionality in a Stack object that you can then use as intended: with calls to push and pop methods.
Another benefit is that when you find yourself having to do a quick thread-safe stack, you can modify your Stack class to put locks around any code that changes the internal structure (i.e. the array), and any callers automatically have a thread-safe stack. If you were to address the array directly in your code, then you'd have to go to every place that you access the underlying array and add your lock statements. I'd give better than even odds that you'd make a mistake in there somewhere and then you'd have an interesting time tracking down that intermittent failure.
Arrays themselves aren't particularly powerful. They're flexible, but they have no smarts. We wrap behavior around arrays to limit what they can do so that we don't have to "trust ourselves" not to do stupid things, and also so that we can do the right things more easily. If I'm implementing an algorithm that uses a queue, then I'm going to be thinking in terms of a queue that has Enqueue and Dequeue operations rather than in terms of a linked list or an array that has head and tail indexes that I have to manage, etc.
Finally, if you write a Stack or Queue or similar data structure implementation, test it and prove it correct, you can use it over and over again in multiple projects without ever having to look at the code again. It just works. That's opposed to rolling your own with an array so that you have to debug it not just in every project you use a stack in, but you have the potential of screwing up every single push or pop.
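As a purely illustrative sketch of that kind of wrapper (the class name and growth policy are assumptions for the example), all the index handling and resizing live behind push and pop; marking those two methods synchronized would give the thread-safe variant described above.

import java.util.Arrays;

// Illustrative array-backed stack: the index bookkeeping and resizing are hidden
// behind push/pop instead of being repeated at every call site.
public class ArrayStack<T> {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(T item) {
        if (size == elements.length) {
            // Extend the backing array when it is full.
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = item;
    }

    @SuppressWarnings("unchecked")
    public T pop() {
        if (size == 0) {
            throw new java.util.NoSuchElementException("stack is empty");
        }
        T item = (T) elements[--size];
        elements[size] = null; // let the GC reclaim the reference
        return item;
    }

    public boolean isEmpty() {
        return size == 0;
    }
}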
To sum up, we create data structures rather than use raw arrays:
Because it's easier to think in terms of data structure operations rather than the mechanics of working with an array.
Code re-use: write once, debug, and then use it in multiple projects.
Simplifies code (stack.push(item) rather than multiple lines of array indexing).
Reduces potential for error.
Easier for the next guy to come by and understand what you did. "Oh, he's pushing items onto a stack."
Internally, I'd say most of those classes are implemented with the help of arrays, but it would be tedious to use arrays directly as stacks or queues. Arrays are fixed-length structures where you cannot insert elements at arbitrary positions: you would have to do a lot of copying of array elements, enlarge and shrink the array, keep track of your head and tail positions, and so on. The Stack and Queue classes do all of this for you, and you can just use the much more convenient push, pop, etc. methods.
Arrays are just one more type of data structure. They have specific use cases, just like any other.
All data structures have particular properties, e.g.
fixed vs. variable size
ordered vs. unordered
allows duplicates vs. prohibits duplicates
covariant or not
can contain primitives or not
specific time complexity for insertion/removal/retrieval operations
iteration order
...
Whether you choose to use an array or any other data structure depends upon what you are trying to do, and whether that data structure possesses the properties you require.
And it is better to have simple data structures which do one thing well, than to attempt to have an uber data structure which does everything.
You can change the size of stacks and queues more easily than the size of an array; resizing an array is very difficult.
If you know how big your array should be, use an array. If you don't, stacks and queues are the better choice.
Admirable that you would trust yourself so much, but when you're creating a project with several other developers (or even by yourself), you can't rely on trust.
Different data structures make it more obvious what the code is doing; they prevent you from doing the wrong thing (no matter how much you trust yourself), and they can provide performance, concurrency, or content guarantees and dozens of other things that simple arrays can't do or just aren't the best fit for.
Why would you have pockets when backpacks are invented?
I agree with @Vampire. Arrays provide instant access to any of their elements if you have the index, whereas stacks and queues give you the convenience of LIFO and FIFO orderings, which let you implement many algorithms much more easily. Also, with stacks and queues it's easier to add or remove elements: with an array you have a limited amount of memory at all times, and since your elements sit in consecutive memory blocks, you would have to move a lot of data and allocate/deallocate memory accordingly. Please check this link for further information.

Most efficient way to store 5 attributes

So I'm trying to store 5 attributes of an object, which are 5 different integers.
What would be the best way to store these? I was thinking of arrays, but arrays aren't flexible, and I also need to be able to retrieve all 5 attributes, so arrays probably won't work well.
Here's some background if it helps: I am currently making a game similar to Terraria (or Minecraft in 2D).
I want to store where the object is on the map (x, y), where it is on the screen for that part of the map (x, y), and what type of object it is.
import java.awt.Point;

public class MyClass {
    private Point pointOnMap;
    private Point pointOnScreen;
    // ...
}
The Point class binds x and y values into a single object (which makes sense) and gives you useful basic methods it sounds like you'll need, such as translate and distance. http://docs.oracle.com/javase/7/docs/api/java/awt/Point.html
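For example (the coordinates are made up), those Point methods are used like this:

import java.awt.Point;

public class PointDemo {
    public static void main(String[] args) {
        // Hypothetical coordinates, just to show the Point methods named above.
        Point pointOnMap = new Point(120, 45);
        pointOnMap.translate(3, -2);          // shifts the point by (dx, dy); Point is mutable
        double d = pointOnMap.distance(0, 0); // distance from this point to (0, 0)
        System.out.println(pointOnMap + " is " + d + " from the origin");
    }
}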
It is not possible to predict what is the most efficient way to store the attributes without seeing all of your code. (And I for one don't want to :-)) Second, you haven't clearly explained what you are optimizing for. Speed? Memory usage? Minimization of GC pauses?
However, this smells of premature optimization: wasting lots of time trying to optimize the performance of something that hasn't been built yet, without any evidence that the performance of this part of the codebase is going to be significant.
My advice would be:
Pick a simple design and implement it; e.g. 5 private int variables with getters and setters (see the sketch after this list). If that is inconvenient, then choose a more convenient API.
Complete the program.
Get it working.
Benchmark it. Does it run fast enough? If yes, stop.
Profile it. Pick the biggest performance hotspot and optimize that.
Rerun the benchmarking and profile to check that your optimization has made things faster. If yes, then "commit" it. If not then back it out.
Go to step 4.
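Here is a minimal sketch of what step 1 might look like for the five attributes in the question; the class and field names are just guesses based on the description of map position, screen position, and type.

// A deliberately boring design for step 1: five int fields with accessors.
// "Tile" and the field names are hypothetical, based on the question's description.
public class Tile {
    private int mapX;
    private int mapY;
    private int screenX;
    private int screenY;
    private int type;

    public Tile(int mapX, int mapY, int screenX, int screenY, int type) {
        this.mapX = mapX;
        this.mapY = mapY;
        this.screenX = screenX;
        this.screenY = screenY;
        this.type = type;
    }

    public int getMapX()    { return mapX; }
    public int getMapY()    { return mapY; }
    public int getScreenX() { return screenX; }
    public int getScreenY() { return screenY; }
    public int getType()    { return type; }

    public void setScreenPosition(int screenX, int screenY) {
        this.screenX = screenX;
        this.screenY = screenY;
    }
}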
I would suggest a HashMap where the key is objectId-attributeName and the value is the integer value, since you have to do retrieval based on a key. Lookup will then be an O(1) operation.
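A rough sketch of that suggestion might look like the following; the key format and class name are assumptions for the example.

import java.util.HashMap;
import java.util.Map;

// Sketch of the objectId-attributeName keying scheme suggested above (names are illustrative).
public class AttributeStore {
    private final Map<String, Integer> attributes = new HashMap<>();

    public void put(String objectId, String attributeName, int value) {
        attributes.put(objectId + "-" + attributeName, value);
    }

    public Integer get(String objectId, String attributeName) {
        // HashMap lookup is O(1) on average.
        return attributes.get(objectId + "-" + attributeName);
    }
}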

If classes all contain lots of useful class variables, will it have an impact on performances?

Whenever I write a new class, I use quite a lot of class variables to describe the class's properties, to the point where, when I go back to review the code I've typed, I see 40 to 50 class variables. Regardless of whether they are public, protected, or private, they are all used prominently throughout the classes I've defined.
Even though the class variables consist mostly of primitive types, like booleans, integers, doubles, etc., I still have this uneasy feeling that some of my classes with large numbers of class variables may have an impact on performance, however negligible it may be.
But being as rational as possible: if I consider unlimited RAM and an unlimited number of Java class variables, a Java object could be an arbitrarily large block of memory in RAM, where the first portion of the block contains the fields and the rest of the block contains references to the class's methods. With that amount of RAM, the performance impact of this is far from trivial.
But the above isn't making me feel any easier. If we consider limited RAM but an unlimited number of Java class variables, what would the result be? What would really happen in an environment where performance matters?
And, as may get mentioned anyway: I don't know if having lots of class variables counts as bad Java practice when all of them are important and all the classes have been refactored.
Thanks in advance.
Performance has nothing to do with the number of fields an object has. Memory consumption is of course potentially affected, but if the variables are needed, you can't do much about it.
Don't worry too much about performance. Make your code simple, readable, maintainable, tested. Then, if you notice performance problems, measure and profile to see where they come from, and optimize where needed.
Maintainability and readability are affected by the number of fields an object has, though. 40 to 50 fields is quite a lot, and is probably an indication that your classes do too much on their own and have too many responsibilities. Refactoring them into several smaller classes and using composition would probably be a good idea.
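As an illustrative sketch of that kind of refactoring (all class and field names here are invented), the idea is to group related fields into small, focused classes and compose them:

// Instead of one class with 40-50 fields, group related fields into small
// classes and compose them (these names are purely illustrative).
class Position {
    int x;
    int y;
}

class Health {
    int current;
    int max;
}

class Inventory {
    java.util.List<String> items = new java.util.ArrayList<>();
}

class Player {
    // The "big" class now delegates to a few focused parts.
    private final Position position = new Position();
    private final Health health = new Health();
    private final Inventory inventory = new Inventory();

    Position position()   { return position; }
    Health health()       { return health; }
    Inventory inventory() { return inventory; }
}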
I hope I don't sound like an ass, but in my view having more than 10 properties in a class is usually a hint of bad design and requires justification.
Performance-wise, if you very often need all those properties, then you're going to save some memory, as each object also carries a header. So instead of having 5-10 classes, you put everything into one and save some bytes.
Depending on which garbage collector you use, having bigger objects can be more expensive to allocate (this is true for the CMS garbage collector, but not for the parallel one). More GC work = less time for your app to run.
Unless you're writing a high-traffic, low-latency application, the benefit of having fewer classes (and using less memory) is going to be completely overwhelmed by the extra maintenance effort.
The biggest problem I see with a class that has a lot of variables is thread safety: it is going to be really hard to reason about the invariants in such a case. Reading and maintaining such a class is also going to be difficult.
Of course, the more fields you can make immutable, the better.
I try to go with: less is better, easier to maintain.
A basic principle we are always taught is to keep cohesion high (one class focuses on one task) and coupling low (less interdependency among classes, so that changes in one do not affect the others).
While designing a system, I believe the focus should be on a maintainable design; performance will take care of itself. I don't think there is a fixed limit on the number of variables a class can have as a matter of good practice, as this strictly depends on your requirements.
For example, if I have a requirement where the application suggests a course to a student and the algorithm needs 50 inputs (scores, hobbies, etc.), it will not matter whether this data lives in one class or several, as the whole lot needs to be loaded into RAM for fast execution.
I will say it again: take care of your design. It is harmful both to keep unnecessary variables in a class (as that loads information you don't need into RAM) and to split it into more classes than required (more references and hence more pointer chasing).
1. I always use this as a rule of thumb: a class should have only one reason to change, so it should do only one thing.
2. Keeping this in mind, I take only those variables which are needed to define the class's attributes.
3. I make sure that my class follows the cohesion principle, where the methods within the class reflect the class name.
4. Now, after sorting everything out, if I need some other variables to make my class work, then I need to use them; I have no choice. Moreover, after all the thinking and work that goes into creating a class, it will hardly be affected by a few additional variables.
Sometimes class variables are used as static final constants to store default strings like the product name, version, OS version, etc., or even to store product-specific settings like font size, type, etc. Those static variables can be kept at class level.
You can also use a HashMap instead of a simple class if you just want to store field constants or product settings that rarely change. That may help you speed up your response time.
Two things I would like to mention:
1. All instance variables are stored in the heap area of RAM.
2. All static variables are stored in a non-heap area (the method area, to be specific).
Whatever the type of variable (instance or static), ultimately they all reside in RAM.
Now, coming to your question: as far as instance variables are concerned, Java's built-in garbage collector will, in most cases, work well and effectively to keep freeing memory. Static variables, however, are not garbage collected.
If you are highly concerned with memory issues due to the large number of variables in your class, you can resort to using weak references instead of traditional strong references.
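As a rough illustration of that last point, here is a minimal sketch of java.lang.ref.WeakReference in use; the key caveat is that the referent can be reclaimed by the garbage collector at any time once no strong references remain, so it only suits data you can recompute or reload.

import java.lang.ref.WeakReference;

public class WeakReferenceDemo {
    public static void main(String[] args) {
        // A weakly referenced object may be reclaimed once no strong references remain.
        WeakReference<int[]> cache = new WeakReference<>(new int[1_000_000]);

        int[] data = cache.get();      // may return null after a GC
        if (data != null) {
            data[0] = 42;              // still alive, safe to use
        } else {
            // Recreate or reload the data here.
        }
    }
}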

When to use List<Long> instead of long[]?

There's something I really don't understand: a lot (see my comment) of people complain that Java isn't really OO because you still have access to primitives and primitive arrays. Some people go as far as saying that these should go away...
However, I don't get it. Could you efficiently do things like signal processing (say, write an FFT, for starters), write efficient encryption algorithms, write fast image-manipulation libraries, etc. in Java if you didn't have access to, say, int[] and long[]?
Should I start writing my Java software by using List<Long> instead of long[]?
If the answer is "simply use higher-level libraries that do what you need" (for example, say, signal processing), then how are those libraries supposed to be written?
I personally use List most of the time, because it gives you a lot of convenience. You can also have concurrent collections, but not concurrent raw arrays.
Almost the only situation in which I use raw arrays is when I'm reading a large chunk of binary data, as in image processing. I'm concerned about instantiating Byte objects 100M times, though I have to confess I've never tried working with a Byte list that huge. I have noticed that for something like a 100KB file, List<Byte> works fine.
Most image processing examples etc. use arrays as well, so in this field it's actually more convenient to use raw arrays.
So in conclusion, my practical answer to this is:
Use wrappers unless you are
Working with a very large array, like length > 10M (I'm too lazy to write a benchmark!),
Working in a field where many examples or people prefer raw arrays (e.g. network programming, image processing),
You found out, by doing experiments, that there is a significant performance gain from changing to raw arrays, or
It's easier, for whatever reason, to work with raw arrays on that problem for you.
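As a small illustration of the boxing difference being discussed (the values are arbitrary), both methods below compute the same sum, but the List<Long> version boxes and unboxes every element while the long[] version works directly on primitives.

import java.util.ArrayList;
import java.util.List;

public class SumDemo {
    // Primitive array: no boxing, contiguous memory.
    static long sumArray(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Wrapper list: each element is a boxed Long object.
    static long sumList(List<Long> values) {
        long sum = 0;
        for (Long v : values) {
            sum += v; // auto-unboxing on every iteration
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] raw = {1, 2, 3, 4, 5};
        List<Long> boxed = new ArrayList<>();
        for (long v : raw) {
            boxed.add(v); // auto-boxing
        }
        System.out.println(sumArray(raw) + " == " + sumList(boxed));
    }
}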
In high performance computing, arrays of objects (as well as primitives) are essential as they map more robustly onto the underlying CPU architecture and behave more predictably for things such as cache access and garbage collection. With such techniques, Java is being used very successfully in areas where the received wisdom is that the language is not suitable.
However, if your goal is solely to write code that is highly maintainable and provably self consistent, then the higher level constructs are the obvious way to go. In your direct comparison, the List object hides the issue of memory allocation, growing your list and so on, as well as providing (in various implementations) additional facilities such as particular access patterns like stacks or queues. The use of generics also allows you to carry out refactoring with a far greater level of confidence and the full support of your IDE and toolchain.
An enlightened developer should make the appropriate choice for the use case they are approaching. Suggesting that a language is not "OO enough" because it allows such choices would lead me to suspect that the person either doesn't trust that other developers are as smart as they are or has not had a particularly wide experience of different application domains.
It's a judgment call, really. Lists tend to play better with generic libraries and have stuff like add, contains, etc., while arrays are generally faster, have built-in language support, and can be used as varargs. Select whatever you find serves your purpose better.
Okay.
You need to know the size of an array at the time it is created, and you cannot change its size afterwards. A list, on the other hand, can grow dynamically after it has been created, and it has the add() method to do that.
Have you gone through this link? A nice comparison of arrays vs. List:
Array or List in Java. Which is faster?
List vs. Array

Java: multi-threaded maps: how do the implementations compare?

I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice.
The options I know are:
HashMap. Disastrously un-thread safe.
ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance.
Collections.synchronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives.
Trove or Colt - I think neither of these are thread-safe, but perhaps the code could be adapted to be thread safe.
Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of?
Thanks in advance for your input!
Collections.synchronizedMap() simply makes all the Map methods synchronized.
ConcurrentMap is really the interface you want, and there are several implementations (e.g. ConcurrentHashMap, ConcurrentSkipListMap). It has several operations that Map doesn't which are important for thread-safe code. It is also more granular than a synchronized Map, since an operation will only lock a slice of the backing data structure rather than the entire thing.
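For example (the hit-counter use case here is just an illustration), the atomic operations that ConcurrentMap adds over a plain Map look roughly like this:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

        // Atomic "insert if missing" - no external locking required.
        hits.putIfAbsent("page", 0);

        // Classic compare-and-set loop for a read-modify-write update:
        // replace(key, oldValue, newValue) only succeeds if the value is still oldValue.
        for (;;) {
            Integer current = hits.get("page");
            if (hits.replace("page", current, current + 1)) {
                break; // nobody else changed the value in between
            }
        }

        System.out.println(hits); // {page=1}
    }
}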
I have no experience with the following, but I once worked on a project that swore by Javolution for real-time and memory-sensitive tasks.
I notice in the API there is a FastMap that claims to be thread-safe. As I say, I've no idea whether it's any good for you, but it's worth a look:
API for FastMap
Javolution Home
Google Collections' MapMaker seems like it can do the job too.
Very surprising that it has a 2k footprint! How about making ConcurrentHashMap's concurrency level lower (e.g. 2-3) and optimizing its initial size (i.e. making it smaller)?
I don't know where that memory consumption is coming from, but it may have something to do with maintaining the striped locks. If you lower the concurrency level, there will be fewer of them.
If you want good performance with out-of-the-box thread safety, ConcurrentHashMap is really nice.
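Concretely, that tuning goes through ConcurrentHashMap's three-argument constructor; the numbers below are only an example.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SmallConcurrentMaps {
    public static void main(String[] args) {
        // initialCapacity = 4, loadFactor = 0.75, concurrencyLevel = 2.
        // On the older segmented implementation this means fewer lock stripes and a
        // smaller table than the defaults, which shrinks the per-instance footprint
        // when you create many maps; on newer JDKs concurrencyLevel is a sizing hint.
        Map<String, String> map = new ConcurrentHashMap<>(4, 0.75f, 2);
        map.put("key", "value");
        System.out.println(map.get("key"));
    }
}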
Well, there's a spruced-up Colt in Apache Mahout. It's still not in the concurrency business, though. What's wrong with protecting the code with a synchronized block? Are you expecting some devilishly complex scheme that holds locks at a finer granularity than put or get?
If you can code one, please contribute it to Mahout.
It's worth taking a look at the persistent hash maps in Clojure.
These are immutable, thread safe data structures with performance comparable to classic Java HashMaps. You'd obviously need to wrap them if you want a mutable map, but that shouldn't be difficult.
http://clojure.org/data_structures
