Java help for modelling an in-memory distributed array matrix - java

I want to distribute a huge two-dimensional array Object[][]. It mostly holds primitives, but also String, Date and Color objects. I want to allow different machines to modify data in this matrix. My problem is that I need some help with the modelling.
As far as I can see, only distributed maps are available, such as Hazelcast or OpenHFT, but maps do not fit my needs exactly. What I could do is store all rows (or all columns) of my array in a map. But when I act on a whole row/column, it is blocked for all other threads, even if they operate on another index. If I store every "cell" in a distributed map, the row/column index used as the map key might be bigger than the whole array itself. Are there better tools or modelling approaches for this scenario? I would need a matrix where I could distribute data and lock just a single cell.

Related

Optimize removing process complexity

When I have an array and want to remove one value from it, I need to shift the following elements to the left. The idea is to do the shifting only once, after a number n of null values have accumulated in the array.
Of course this is a micro-optimisation, and ArrayList (or maybe LinkedList) would be the production-quality data structure for dynamic arrays.
Here you might keep an extra list of the nulled entries. At a certain threshold you could use **System.arraycopy** calls to close the gaps. If there are many index-based inserts too, you might opt for keeping gaps, maybe collecting small gaps together.
This is a traditional technique in text editors.
For several data structures you might search through the Guava classes.
For instance copy-on-write data structures.
Or concurrency: compacting in the background.
For a specific data structure and algorithm, maybe someone else can give pointers.
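The deferred shifting described above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name LazyCompactingArray, the threshold parameter and the method names are all made up for this example. Removed slots are set to null, and the shifting happens in one pass once enough gaps have accumulated.

```java
import java.util.Arrays;

/** Hypothetical name: an array that marks removals as null and compacts lazily. */
final class LazyCompactingArray<T> {
    private Object[] items;
    private int used = 0;          // slots in use, including nulled gaps
    private int gaps = 0;          // nulled slots among the used ones
    private final int threshold;   // assumed tuning knob: compact after this many gaps

    LazyCompactingArray(int capacity, int threshold) {
        this.items = new Object[Math.max(1, capacity)];
        this.threshold = threshold;
    }

    void add(T value) {
        if (value == null) throw new IllegalArgumentException("null marks a gap");
        if (used == items.length) items = Arrays.copyOf(items, used * 2);
        items[used++] = value;
    }

    /** Marks the slot as removed; shifts only when enough gaps have accumulated. */
    void removeAt(int slot) {
        if (slot < 0 || slot >= used || items[slot] == null) return;
        items[slot] = null;
        if (++gaps >= threshold) compact();
    }

    /** One pass: copy each run of live elements down with System.arraycopy. */
    void compact() {
        int write = 0, read = 0;
        while (read < used) {
            while (read < used && items[read] == null) read++;   // skip a gap
            int runStart = read;
            while (read < used && items[read] != null) read++;   // measure a live run
            int runLength = read - runStart;
            if (runLength > 0 && runStart != write) {
                System.arraycopy(items, runStart, items, write, runLength);
            }
            write += runLength;
        }
        Arrays.fill(items, write, used, null);   // clear the now unused tail
        used = write;
        gaps = 0;
    }

    int liveCount() { return used - gaps; }
    int slotsInUse() { return used; }

    @SuppressWarnings("unchecked")
    T get(int slot) { return (T) items[slot]; }
}
```

Note that slot numbers change after a compaction, so callers that hold on to indices would need some other handle. Whether this beats a plain ArrayList depends on the workload and would have to be measured.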

Algorithms in checking overlapping genomic regions

I have two large lists of genomic regions in the form of two BED files, and there are many tools that help me check the overlap between the two lists.
Any two regions (one from list A, the other from list B) are said to overlap as long as they share any of their coordinates. There are tools available to do that, but I wish to write an efficient algorithm: I maintain a hash-table-like structure for list A, then iterate over all the regions in list B, and for each region from list B use a quick lookup to tell whether any of the regions in list A overlap with it.
I specifically need an efficient solution since both lists are very large. Thanks very much.
One option would be to:
Create a 1-dimensional R-tree of the regions in one BED file. Insert a range for each exon.
For each region in the other BED file, search the R-tree for intersections of each of that region's exons.
For Java, there are multiple implementations of R-trees. One I've used that supports 1-dimensional ranges is SIRtree, in the library JTS. It provides simple methods to insert ranges and search for intersections.
Any data structure represented in memory will be a scalability concern for sufficiently large BED files. You can address that by either increasing the amount of memory available to the VM (hardware and the -Xmx setting) or by representing your data structure on disk.
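If pulling in JTS is not an option, the yes/no question from the post ("does any region of list A overlap this region of list B?") can also be answered with a sorted array and a binary search. This is a different technique from the R-tree suggested above, sketched with the standard library only. The class name OverlapIndex is hypothetical, and the sketch assumes half-open [start, end) coordinates as in BED and one index per chromosome.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper: answers "does any region of list A overlap [start, end)?". */
final class OverlapIndex {
    private final long[] starts;      // region starts, ascending
    private final long[] maxEndUpTo;  // maxEndUpTo[i] = largest end among regions 0..i

    /** Each region is {start, end}, half-open, all on the same chromosome (assumption). */
    OverlapIndex(long[][] regions) {
        long[][] sorted = regions.clone();
        Arrays.sort(sorted, Comparator.comparingLong((long[] r) -> r[0]));
        starts = new long[sorted.length];
        maxEndUpTo = new long[sorted.length];
        long runningMax = Long.MIN_VALUE;
        for (int i = 0; i < sorted.length; i++) {
            starts[i] = sorted[i][0];
            runningMax = Math.max(runningMax, sorted[i][1]);
            maxEndUpTo[i] = runningMax;
        }
    }

    /** True if at least one indexed region shares a position with [start, end). */
    boolean overlapsAny(long start, long end) {
        // Binary search for the number of regions whose start lies before the query end.
        int lo = 0, hi = starts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] < end) lo = mid + 1; else hi = mid;
        }
        int last = lo - 1;
        // Among those candidates, an overlap exists iff one of them ends after the query start.
        return last >= 0 && maxEndUpTo[last] > start;
    }
}
```

Building the index costs a sort, and each query is a single binary search. It only reports whether an overlap exists; listing the overlapping regions themselves would need an interval tree or the R-tree approach.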

Are Arrays of Structs always faster than Structs of Arrays?

I was wondering whether the Struct of Arrays (SoA) data layout is always faster than an Array of Structs (AoS) or an Array of Pointers (AoP) for problems whose inputs only fit in RAM, programmed in C/Java.
Some days ago I was improving the performance of a Molecular Dynamics algorithm (in C). In summary, this algorithm calculates the force interaction among particles based on their force and position.
Originally the particles were represented by a struct containing 9 different doubles: 3 for the particle's forces (Fx, Fy, Fz), 3 for positions and 3 for velocities. The algorithm had an array containing pointers to all the particles (AoP). I decided to change the layout from AoP to SoA to improve cache use.
So now I have a struct with 9 arrays, where each array stores one force, velocity or position component (x, y, z) for all particles. Each particle is accessed by its own array index.
I had a gain in performance (for an input that only fits in RAM) of about 1.9x, so I was wondering whether changing from AoP or AoS to SoA will typically always perform better, and if not, for which types of algorithms this does not hold.
Much depends on how useful all the fields are. If you have a data structure where using one field means you are likely to use all of them, then an array of structs is more efficient, as it keeps together all the things you are likely to need.
Say you have time series data where you only need a small selection of the possible fields you have. You might have all sorts of data about an event or point in time, but you only need say 3-5 of them. In this case a structure of arrays is more efficient because a) you don't need to cache the fields you don't use b) you often access values in order i.e. caching a field, its next value and its next is useful.
For this reason, time-series information is often stored as a collection of columns.
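To make the two layouts concrete, here is a small Java sketch using the nine fields from the question. The class and method names (Particle, ParticlesSoA, LayoutDemo) are invented for illustration. Note that in Java an array of objects is really an array of references, so it corresponds to the AoP layout rather than to a C-style AoS.

```java
/** Array-of-objects layout: each particle carries all nine fields. */
final class Particle {
    double fx, fy, fz, x, y, z, vx, vy, vz;
}

/** Structure-of-arrays layout: one primitive array per field. */
final class ParticlesSoA {
    final double[] fx, fy, fz, x, y, z, vx, vy, vz;

    ParticlesSoA(int n) {
        fx = new double[n]; fy = new double[n]; fz = new double[n];
        x = new double[n];  y = new double[n];  z = new double[n];
        vx = new double[n]; vy = new double[n]; vz = new double[n];
    }
}

final class LayoutDemo {
    /** Touches one field per object; the other eight fields sit next to it in memory. */
    static double sumFx(Particle[] particles) {
        double sum = 0;
        for (Particle p : particles) sum += p.fx;
        return sum;
    }

    /** Walks a single contiguous double[]; nothing else is pulled into the cache. */
    static double sumFx(ParticlesSoA particles) {
        double sum = 0;
        for (int i = 0; i < particles.fx.length; i++) sum += particles.fx[i];
        return sum;
    }
}
```

Both methods compute the same result; only the memory they walk differs. A loop that needs all nine fields of each particle at once would not benefit in the same way, which is the trade-off described above.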
This will depend on how exactly you access the data.
Try to imagine what exactly happens in the hardware when you access your data, in either SoA or AoS.
To reason about your question, you must consider the following things:
If the cache is absent, the performance should be the same, assuming that memory access latency is equal for all elements of the data.
Now with the cache, if you access consecutive address locations, you will definitely get a performance improvement. This is exactly your case. When you have AoS, the values of a single field such as Fx are not consecutive in memory, so you must lose some performance there.
You are probably accessing your data in for loops like for(int i=0;i<1000000;i++) Fx[i] = 0. So if the amount of data is huge, you will easily see the small performance benefits. If your data were small, this would not matter much.
Finally, you also don't know about the DRAM that you are using. It will give some benefit when you access consecutive data. To understand why, you can refer to the wiki.

Game Interface Design

So I'm making a game similar to Age of Empires 2. So far, what I've been doing is using an Applet and making an array of tiles (a grid) to contain units and buildings. I feel that in the long run this is going to be inefficient. I'm sure there is a better way to do this, perhaps using a GUI, but I'm not a very experienced programmer yet, so I've resorted to this more primitive method. Is there a better, more efficient way to have tiles that contain characters and buildings on the map?
Thanks.
This is pretty much how I want it to look like:
An array of tiles should be fine - this is what is usually used to represent the map in a 2D game.
Note that you will probably want to distinguish between:
Map terrain tiles, where each square contains just one tile
Units / buildings / other game objects, where each square might contain multiple objects
For the terrain a single array is fine, with one tile in each position.
For the other objects, you need a way to store multiple objects in each square. An array with an ArrayList in each square can work - though you will probably want to leave nulls in all the squares that don't have any contents to avoid creating a lot of empty ArrayLists.
It's also useful to create a global collection of game objects so that you can iterate over all game objects and/or find a specific object quickly without searching the whole map. I typically use something like a HashMap for this, where the integer key is a unique ID for each game object.
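Put together, the layout described above might look roughly like this. It is only a sketch: GameObject, TileMap and their fields are hypothetical names, terrain tiles are represented as plain ints for brevity, and bounds checking is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical game object; id is assumed to be unique across the game. */
final class GameObject {
    final int id;
    final String name;
    int x, y;

    GameObject(int id, String name, int x, int y) {
        this.id = id; this.name = name; this.x = x; this.y = y;
    }
}

final class TileMap {
    private final int[][] terrain;               // exactly one terrain tile per square
    private final List<GameObject>[][] objects;  // null until a square gets its first object
    private final Map<Integer, GameObject> objectsById = new HashMap<>();

    @SuppressWarnings("unchecked")
    TileMap(int width, int height) {
        terrain = new int[width][height];
        objects = new List[width][height];
    }

    void setTerrain(int x, int y, int tile) { terrain[x][y] = tile; }
    int terrainAt(int x, int y) { return terrain[x][y]; }

    void add(GameObject o) {
        place(o);
        objectsById.put(o.id, o);
    }

    /** Never returns null; empty squares share one immutable empty list. */
    List<GameObject> objectsAt(int x, int y) {
        List<GameObject> list = objects[x][y];
        return list == null ? Collections.<GameObject>emptyList() : list;
    }

    GameObject find(int id) { return objectsById.get(id); }

    void move(int id, int newX, int newY) {
        GameObject o = objectsById.get(id);
        if (o == null) return;
        List<GameObject> old = objects[o.x][o.y];
        old.remove(o);
        if (old.isEmpty()) objects[o.x][o.y] = null;  // keep empty squares as null
        o.x = newX;
        o.y = newY;
        place(o);
    }

    private void place(GameObject o) {
        if (objects[o.x][o.y] == null) objects[o.x][o.y] = new ArrayList<>();
        objects[o.x][o.y].add(o);
    }
}
```

Rendering can then loop over the terrain array and call objectsAt for each visible square, while game logic that needs one particular unit goes through find.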
It depends on what you consider to be the 'inefficient' part. If most of the array elements are going to contain data, and you need to be able to look up that data quickly and easily, then arrays are certainly the best way to go. They're easy to understand, and the logic mikera described for storing data is a very good way to get started.
There are data structures that very big tile-based games use for storing data. Most of those are optimized for 'sparse' graphs and the like. A sparse graph is one where most of the locations are empty, but a few of them have content. Very big world maps often have this quality. If you're curious about these scenarios, then you should google "quad tree tiling".
But in general, if you're still learning development, stick to the array. Premature optimization is the root of all evil. :)

direct access to vector elements similar to arrays

I'm currently creating a tile-based game, where the elements of the game are placed in four different vectors (there are multiple kinds of game objects with different properties, hence they are stored in different vectors).
These game elements themselves contain x and y coordinates, similar to how they would be stored in a two-dimensional array. I was wondering if there is a way to access these vector elements similar to two-dimensional array access (currently I am using a for loop to cycle through the elements while comparing their coordinates).
This kinda sucks when I need to refresh my display at every game cycle (because of the large number of comparisons and loops).
I'm implementing this in Java, btw.
My recommendation would be to think "object-oriented": create a class named Board or Grid or whatever fits that encapsulates that implementation detail of choosing between 2D array or Vector of Vectors. Add a method that lets you return a board token for a given (i, j) index into the board.
Don't use Vector, use ArrayList.
If you have very large amounts of data, perhaps look at buffers, for instance IntBuffer.
Three Ideas:
You could use a HashMap with the coordinates as key and your elements as value. This is faster than cycling through the vector and lightweight for memory.
You could store null instead of elements at empty coordinates. This way you can access each stored element directly by its coordinates. Fastest, but memory-intensive.
A speed up for what you currently do: Sort your elements by their coordinates once and then use binary search to find them in the vector.
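The first idea, a HashMap keyed by coordinates, could look roughly like this. The class name CoordinateIndex is hypothetical, and packing both ints into one long is just one possible key encoding, chosen here to avoid a separate key class. The sketch assumes at most one element per coordinate.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup table: at most one element per (x, y) coordinate. */
final class CoordinateIndex<T> {
    private final Map<Long, T> byCell = new HashMap<>();

    /** Packs x into the high 32 bits and y into the low 32 bits of one long. */
    static long key(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    void put(int x, int y, T element) { byCell.put(key(x, y), element); }

    /** Returns the element at (x, y), or null if the coordinate is empty. */
    T get(int x, int y) { return byCell.get(key(x, y)); }

    T remove(int x, int y) { return byCell.remove(key(x, y)); }

    int size() { return byCell.size(); }
}
```

A lookup is then a single hash probe instead of a loop over the whole vector. When an element moves, it has to be removed under its old coordinates and put back under the new ones.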
