I want to implement a very simple sliding window. In other words, I will have some kind of list with objects inserted from the right end of that list and dropped from the left end. On every insertion, the previous objects are left-shifted by one index. When the list gets filled with objects, every insertion from the right end drops an object from the left end (and the previous objects, of course, are left-shifted by one index, as usual).
I had in mind either a LinkedList or an ArrayDeque - probably the latter is the better choice, since as far as I know both inserting AND removing to/from either end is constant effort O(1) for an ArrayDeque, which is not the case for a LinkedList. Is that right?
Moreover, I would like to ask the following: Left-shifting all the previous objects stored in the sliding window when I insert a new object is processing-intensive for a large sliding window with 100,000 or even 1,000,000 objects as in my case. Is there any other data structure which might perform better in my application?
NOTE: I use the term "sliding window" for what I want to implement; maybe there is some other term that describes it better, but I think it is clear what I want to do from the above description.
ArrayDeque does what you want. It doesn't move the elements around; it moves the indexes of where the start and the end are. When you add an element, the end counter moves, and when you remove an element, the start counter moves.
One advantage of ArrayDeque is that it can use less memory and doesn't create garbage. On the down side, its backing array only ever grows (and growing means reallocating and copying), whereas LinkedList grows and shrinks one node at a time.
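For illustration, a minimal sketch of a fixed-size sliding window on top of ArrayDeque could look like this (the class and method names here are just placeholders, not from any standard API):

import java.util.ArrayDeque;

// Sketch: adding to a full window evicts the oldest element; nothing is
// ever shifted, only the head and tail indexes of the backing array move.
class SlidingWindow<T> {
    private final ArrayDeque<T> deque;
    private final int capacity;

    SlidingWindow(int capacity) {
        this.capacity = capacity;
        this.deque = new ArrayDeque<>(capacity);
    }

    void add(T item) {
        if (deque.size() == capacity) {
            deque.pollFirst(); // drop from the left end, O(1)
        }
        deque.addLast(item);   // insert at the right end, amortized O(1)
    }
}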
BTW, if you want a lightweight sliding window for the average of some values, an exponentially weighted moving average is much cheaper, as you only need to record two values: the previous average and the last timestamp.
e.g.

double last = 0;
long lastTime = 0;
double halfLife = 60 * 1000; // 60 seconds, for example.

public double ewma(double sample, long time) {
    // Weight for the new sample approaches 1 as the gap since the last
    // sample grows, so stale averages decay away.
    double alpha = 1 - Math.exp((lastTime - time) / halfLife);
    lastTime = time;
    return last = sample * alpha + last * (1 - alpha);
}
or you can approximate this to avoid calling Math.exp with
public double ewma(double sample, long time) {
    long delay = time - lastTime;
    double alpha = delay >= halfLife ? 1.0 : delay / halfLife;
    lastTime = time;
    return last = sample * alpha + last * (1 - alpha);
}
This is many times faster and for short intervals gives much the same result.
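As a hypothetical usage of either method above, feeding irregularly spaced samples (with the 60-second half-life from the snippet):

long now = System.currentTimeMillis();
double smoothed = ewma(42.0, now);       // first sample
smoothed = ewma(50.0, now + 30_000);     // 30 s later: partial weight on the new sample
smoothed = ewma(55.0, now + 600_000);    // 10 min later: the new sample dominates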
Are you talking about a Queue? Take a look at the java.util.LinkedList implementation, as it implements the Queue interface. Also, LinkedList's push and pop are both O(1), but its get is O(N).
Edit: This is the core of LinkedList's add method:
Link<ET> next = link.next;
Link<ET> newLink = new Link<ET>(object, link, next);
link.next = newLink;
next.previous = newLink;
link = newLink;
lastLink = null;
pos++;
expectedModCount++;
list.size++;
list.modCount++;
I did some research and wrote the following article: http://www.heavyweightsoftware.com/blog/linkedlist-vs-arraylist/ and wanted to post a question here.
class ListPerformanceSpec extends Specification {
def "Throwaway"() {
given: "A Linked List"
List<Integer> list
List<Integer> results = new LinkedList<>()
when: "Adding numbers"
Random random = new Random()
//test each list 100 times
for (int ix = 0; ix < 100; ++ix) {
list = new LinkedList<>()
LocalDateTime start = LocalDateTime.now()
for (int jx = 0; jx < 100000; ++jx) {
list.add(random.nextInt())
}
LocalDateTime end = LocalDateTime.now()
long diff = start.until(end, ChronoUnit.MILLIS)
results.add(diff)
}
then: "Should be equal"
true
}
def "Linked list"() {
given: "A Linked List"
List<Integer> list
List<Integer> results = new LinkedList<>()
when: "Adding numbers"
Random random = new Random()
//test each list 100 times
for (int ix = 0; ix < 100; ++ix) {
list = new LinkedList<>()
LocalDateTime start = LocalDateTime.now()
for (int jx = 0; jx < 100000; ++jx) {
list.add(random.nextInt())
}
long total = 0
for (int jx = 0; jx < 10000; ++jx) {
for (Integer num : list) {
total += num
}
total = 0
}
LocalDateTime end = LocalDateTime.now()
long diff = start.until(end, ChronoUnit.MILLIS)
results.add(diff)
}
then: "Should be equal"
System.out.println("Linked list:" + results.toString())
true
}
def "Array list"() {
given: "An Array List"
List<Integer> list
List<Integer> results = new LinkedList<>()
when: "Adding numbers"
Random random = new Random()
//test each list 100 times
for (int ix = 0; ix < 100; ++ix) {
list = new ArrayList<>()
LocalDateTime start = LocalDateTime.now()
for (int jx = 0; jx < 100000; ++jx) {
list.add(random.nextInt())
}
long total = 0
for (int jx = 0; jx < 10000; ++jx) {
for (Integer num : list) {
total += num
}
total = 0
}
LocalDateTime end = LocalDateTime.now()
long diff = start.until(end, ChronoUnit.MILLIS)
results.add(diff)
}
then: "Should be equal"
System.out.println("Array list:" + results.toString())
true
}
}
Why does ArrayList outperform LinkedList by 28% for sequential access when LinkedList should be faster?
My question is different from When to use LinkedList over ArrayList? because I'm not asking when to choose it, but why it's faster.
Array-based lists, such as Java's ArrayList, use much less memory for the same amount of data than link-based lists (LinkedList), and that memory is organized sequentially. This substantially decreases CPU cache thrashing with side data. Since access to RAM has 10-20 times more latency than L1/L2 cache access, this causes a significant time difference.
You can read more about these cache issues in books on computer architecture, or similar resources.
OTOH, link-based lists outperform array-based ones in operations like inserting into the middle of the list or deleting there.
For a solution that has both memory economy (and thus fast iteration) and fast inserting/deleting, one should look at combined approaches, such as in-memory B⁺-trees, or arrays of array lists with proportionally increasing sizes.
From LinkedList source code:
/**
* Appends the specified element to the end of this list.
*
 * <p>This method is equivalent to {@link #addLast}.
 *
 * @param e element to be appended to this list
 * @return {@code true} (as specified by {@link Collection#add})
*/
public boolean add(E e) {
linkLast(e);
return true;
}
/**
* Links e as last element.
*/
void linkLast(E e) {
final Node<E> l = last;
final Node<E> newNode = new Node<>(l, e, null);
last = newNode;
if (l == null)
first = newNode;
else
l.next = newNode;
size++;
modCount++;
}
From ArrayList source code:
/**
* Appends the specified element to the end of this list.
*
 * @param e element to be appended to this list
 * @return <tt>true</tt> (as specified by {@link Collection#add})
*/
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
private void ensureExplicitCapacity(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
So the linked list has to create a new node for each element added, while the array list does not. ArrayList does not reallocate/resize for each new element, so most of the time the array list simply sets an object in the array and increments its size, while the linked list does much more work.
You also commented:
When I wrote a linked list in college, I allocated blocks at a time and then farmed them out.
I do not think this would work in Java. You cannot do pointer tricks in Java, so you would have to allocate a lot of small arrays, or create empty nodes ahead of time. In both cases the overhead would probably be a bit higher.
Why does ArrayList outperform LinkedList by 28% for sequential access when LinkedList should be faster?
You're assuming that, but don't provide anything to back it up. But it's not really a great surprise. An ArrayList has an array as the underlying data store. Accessing this sequentially is extremely fast, because you know exactly where every element is going to be. The only slowdown comes when the array grows beyond a certain size and needs to be expanded, but that can be optimised.
The real answer would probably be: check the Java source code, and compare the implementations of ArrayList and LinkedList.
One explanation is that your base assumption (that multiplication is slower than memory fetches) is questionable.
Based on this document, an AMD Bulldozer takes 1 clock cycle to perform a 64-bit integer multiply instruction (register x register) with 6 cycles of latency1. By contrast, a memory-to-register load takes 1 clock cycle with 4 cycles of latency. But that assumes that you get a cache hit for the memory fetch. If you get a cache miss, you need to add a number of cycles. (20 clock cycles for an L2 cache miss, according to this source.)
Now that is just one architecture, and others will vary. We also need to consider other issues, like constraints on the number of multiplications that can be overlapped, and how well the compiler can organize the instructions to minimize instruction dependencies. But the fact remains that for a typical modern pipelined chip architecture, the CPU can execute integer multiplies as fast as it can execute memory-to-register moves, and much faster if the memory fetches miss the cache.
Your benchmark is using lists with 100,000 Integer elements. When you look at the amount of memory involved, and the relative locality of the heap nodes that represent the lists and the elements, the linked list case will use significantly more memory, and have correspondingly worse memory locality. That will lead to more cache misses per cycle of the inner loop, and worse performance.
Your benchmark results are not surprising2 to me.
The other thing to note is that if you use Java's LinkedList, a separate heap node is used to represent each list node. You can implement your own linked lists more efficiently if your element class has its own next field that can be used to chain the elements. However, that brings its own limitations; e.g. an element can only be in one list at a time.
Finally, as @maaartinus points out, a full IMUL is not required in the case of a Java ArrayList. When reading or writing the ArrayList's array, the indexing multiplication will be either x 4 or x 8, and that can be performed by a MOV with one of the standard addressing modes; e.g.
MOV EAX, [EDX + EBX*4 + 8]
This multiplication can be done (at the hardware level) by shifting, with much less latency than a 64-bit IMUL.
1 - In this context, the latency is the number of cycles delay before the result of the instruction is available ... to the next instruction that depends on it. The trick is to order the instructions so that other work is done during the delay.
2 - If anything, I am surprised that LinkedList appears to be doing so well. Maybe calling Random.nextInt() and autoboxing the result is dominating the loop times?
There is a storage unit, which has a capacity of N items. Initially this unit is empty.
The space is arranged in a linear manner, i.e. one beside the other in a line.
Each storage space has a number, increasing up to N.
When someone drops their package, it is assigned the first available space. The packages can also be picked up; in that case the space becomes vacant.
Example: If the total capacity was 4. and 1 and 2 are full the third person to come in will be assigned the space 3. If 1, 2 and 3 were full and the 2nd space becomes vacant, the next person to come will be assigned the space 2.
The packages they drop have 2 unique properties, assigned for immediate identification. First, they are color-coded based on their content, and second, they are assigned a unique identification number (UIN).
What we want is to query the system:
When the input is color, show all the UIN associated with this color.
When the input is color, show all the numbers where these packages are placed (storage space number).
Show where an item with a given UIN is placed, i.e. storage space number.
I would like to know which data structures to use for this case, so that the system works as efficiently as possible.
I am not told which of these operations is most frequent, which means I have to optimise for all the cases.
Please note that even though the queries do not directly ask for the storage space number, when an item is removed from the store it is removed by its storage space number.
You have mentioned three queries that you want to make. Let's handle them one by one.
I cannot think of a single Data Structure that can help you with all three queries at the same time. So I'm going to give an answer that has three Data Structures and you will have to maintain all the three DS's state to keep the application running properly. Consider that as the cost of getting a respectably fast performance from your application for the desired functionality.
When the input is color, show all the UIN associated with this color.
Use a HashMap that maps Color to a Set of UIN. Whenever an item:
is added - See if the color is present in the HashMap. If yes, add this UIN to the set; else create a new entry with a new set and then add the UIN.
is removed - Find the set for this color and remove this UIN from the set. If the set is now empty, you may remove this entry altogether.
When the input is color, show all the numbers where these packages are placed.
Maintain a HashMap that maps UIN to the number where an incoming package is placed. From the HashMap that we created in the previous case, you can get the list of all UINs associated with the given Color. Then using this HashMap you can get the number for each UIN which is present in the set for that Color.
So now, when a package is to be added, you will have to add the entry to the previous HashMap in the specific Color bucket and to this HashMap as well. On removing, you will have to remove the entry from here too.
Finally,
Show where an item with a given UIN is placed.
If you have done the previous, you already have the HashMap mapping UINs to numbers. This problem is only a sub-problem of the previous one.
The third DS, as I mentioned at the top, will be a min-heap of ints. The heap is initialized with the first N integers at the start. Then, as packages come in, the heap is polled: the number returned represents the storage space where the package is to be put. If the storage unit is full, the heap will be empty. Whenever a package is removed, its number is added back to the heap. Since it is a min-heap, the minimum number bubbles up to the top, satisfying your requirement that when spaces 4 and 2 are empty, the next space to be filled is 2.
Let's do a Big O analysis of this solution for completeness.
Time for initialization: O(N), because we have to initialize a heap of N numbers. The other two HashMaps are empty to begin with and therefore incur no cost.
Time for adding a package: the time to get a number from the heap plus the time to make the appropriate entries in the HashMaps. Getting a number from the heap takes O(log N) at most; adding entries to the HashMaps is O(1). Hence a worst-case overall time of O(log N).
Time for removing a package: also O(log N) at worst, because removing from the HashMaps is only O(1), while adding the freed number back to the min-heap is upper-bounded by O(log N).
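A minimal sketch of the three structures working together (class and member names are my own, with Color simplified to String and UIN to long):

import java.util.*;

// Illustrative sketch only: the three structures must be kept in sync.
class StorageUnit {
    private final Map<String, Set<Long>> uinsByColor = new HashMap<>();
    private final Map<Long, Integer> slotByUin = new HashMap<>();
    private final PriorityQueue<Integer> freeSlots = new PriorityQueue<>();

    StorageUnit(int capacity) {
        for (int i = 1; i <= capacity; i++) freeSlots.add(i); // O(N) init
    }

    int add(long uin, String color) {
        int slot = freeSlots.poll();             // O(log N); assumes a free slot exists
        uinsByColor.computeIfAbsent(color, c -> new HashSet<>()).add(uin); // O(1)
        slotByUin.put(uin, slot);                // O(1)
        return slot;
    }

    void remove(long uin, String color) {
        Integer slot = slotByUin.remove(uin);    // O(1)
        Set<Long> uins = uinsByColor.get(color);
        if (uins != null) {
            uins.remove(uin);
            if (uins.isEmpty()) uinsByColor.remove(color);
        }
        if (slot != null) freeSlots.add(slot);   // O(log N)
    }
}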
This smells of homework or really bad management.
Either way, I have decided to do a version of this where you care most about query speed but don't care about memory or a little extra overhead to inserts and deletes. That's not to say that I think that I'm going to be burning memory like crazy or taking forever to insert and delete, just that I'm focusing most on queries.
TL;DR - to solve your problem, I use a PriorityQueue, an Array, a HashMap, and an ArrayListMultimap (from Guava, a common external library), each one to solve a different problem.
The following section is working code that walks through a few simple inserts, queries, and deletes. It references another class called 'Packg'. That's just a simple data structure which you should be able to figure out just from the calls made to it.
Explanation is below the code
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import com.google.common.collect.ArrayListMultimap;

public class CrazyDataStructure {

    private PriorityQueue<Integer> openSlots;
    private Packg[] currentPackages;
    Map<Long, Packg> currentPackageMap;
    private ArrayListMultimap<String, Packg> currentColorMap;
    private Object $outsideCall;

    public CrazyDataStructure(int howManyPackagesPossible) {
        $outsideCall = new Object();
        this.currentPackages = new Packg[howManyPackagesPossible];
        openSlots = new PriorityQueue<>();
        IntStream.range(0, howManyPackagesPossible).forEach(i -> openSlots.add(i)); //populate the open slots priority queue
        currentPackageMap = new HashMap<>();
        currentColorMap = ArrayListMultimap.create();
    }

    /*
     * args[0] = integer, maximum # of packages
     */
    public static void main(String[] args) {
        int howManyPackagesPossible = Integer.parseInt(args[0]);
        CrazyDataStructure cds = new CrazyDataStructure(howManyPackagesPossible);
        cds.addPackage(new Packg(12345, "blue"));
        cds.addPackage(new Packg(12346, "yellow"));
        cds.addPackage(new Packg(12347, "orange"));
        cds.addPackage(new Packg(12348, "blue"));
        System.out.println(cds.getSlotsForColor("blue")); //should be a list of {0,3}
        System.out.println(cds.getSlotForUIN(12346)); //should be 1 (0-indexed, remember)
        System.out.println(cds.getSlotsForColor("orange")); //should be a list of {2}
        System.out.println(cds.removePackage(2)); //should be the orange one
        cds.addPackage(new Packg(12349, "green"));
        System.out.println(cds.getSlotForUIN(12349)); //should be 2, since that's open
    }

    public int addPackage(Packg packg) {
        synchronized ($outsideCall) {
            int result = openSlots.poll();
            packg.setSlot(result);
            currentPackages[result] = packg;
            currentPackageMap.put(packg.getUIN(), packg);
            currentColorMap.put(packg.getColor(), packg);
            return result;
        }
    }

    public Packg removePackage(int slot) {
        synchronized ($outsideCall) {
            if (currentPackages[slot] == null)
                return null;
            else {
                Packg packg = currentPackages[slot];
                currentColorMap.remove(packg.getColor(), packg);
                currentPackageMap.remove(packg.getUIN());
                currentPackages[slot] = null;
                openSlots.add(slot); //return slot to priority queue
                return packg;
            }
        }
    }

    public List<Packg> getUINsForColor(String color) {
        synchronized ($outsideCall) {
            return currentColorMap.get(color);
        }
    }

    public List<Integer> getSlotsForColor(String color) {
        synchronized ($outsideCall) {
            return currentColorMap.get(color).stream().map(packg -> packg.getSlot()).collect(Collectors.toList());
        }
    }

    public int getSlotForUIN(long uin) {
        synchronized ($outsideCall) {
            if (currentPackageMap.containsKey(uin))
                return currentPackageMap.get(uin).getSlot();
            else
                return -1;
        }
    }
}
I use 4 different data structures in my class.
PriorityQueue: I use the priority queue to keep track of all the open slots. It's log(n) for inserts and for removals, so that shouldn't be too bad. Memory-wise, it's not particularly efficient, but it's linear, so that won't be too bad.
Array I use a regular Array to track by slot #. This is linear for memory, and constant for insert and delete. If you needed more flexibility in the number of slots you could have, you might have to switch this out for an ArrayList or something, but then you'd have to find a better way to keep track of 'empty' slots.
HashMap: ah, the HashMap, the golden child of Big-O complexity. In return for some memory overhead and an annoying capital letter 'M', it's an awesome data structure. Insertions are reasonable, and queries are constant-time. I use it to map between the UINs and the slot for a Packg.
ArrayListMultimap: the only data structure I use that's not plain Java. This one comes from Guava (Google, basically), and it's just a nice little shortcut to writing your own Map of Lists. Also, it plays nicely with nulls, and that's a bonus to me. This one is probably the least efficient of all the data structures, but it's also the one that handles the hardest task, so I can't blame it. It allows us to grab the list of Packg's by color, in constant time relative to the number of slots and in linear time relative to the number of Packg objects it returns.
When you have this many data structures, it makes inserts and deletes a little cumbersome, but those methods should still be pretty straight-forward. If some parts of the code don't make sense, I'll be happy to explain more (by adding comments in the code), but I think it should be mostly fine as-is.
Query 3: Use a hash map; the key is UIN, the value is an object (storage space number, color) (plus any other information about the package). Cost is O(1) to query, insert or delete. Space is O(k), where k is the current number of UINs.
Query 1 and 2: Use a hash map + multiple linked lists.
The hash map's key is color; its value is a pointer (or reference, in Java) to a linked list of the corresponding UINs for that color.
Each linked list contains UINs.
For query 1: ask the hash map, then return the corresponding linked list. Cost is O(k1), where k1 is the number of UINs for the queried color. Space is O(m+k1), where m is the number of unique colors.
For query 2: do query 1, then apply query 3 to each UIN. Cost is O(k1), where k1 is the number of UINs for the queried color. Space is O(m+k1), where m is the number of unique colors.
To insert: given color, number and UIN, insert into the hash map of query 3 an object (num, color); then hash(color) to reach the corresponding linked list and insert the UIN.
To delete: given UIN, ask query 3 for the color, then go through query 1's structure to delete the UIN from its linked list. Then delete the UIN from the hash map of query 3.
Bonus: for managing the storage space itself, the situation is the same as memory management in an OS.
This is very simple to do with a segment tree.
Store each position's number in its slot and query the minimum: it returns the first vacant place. When you capture a place, overwrite its slot so that it can no longer win the min query (the illustration below marks captured places with 0; the implementation uses int.MaxValue).
Package information can be stored in a separate array.
Initially it has the following values:
1 2 3 4
After capturing one it looks as follows:
0 2 3 4
After capturing one more it looks as follows:
0 0 3 4
After capturing one more it looks as follows:
0 0 0 4
After freeing 2 it looks as follows:
0 2 0 4
After capturing one more it looks as follows:
0 0 0 4
and so on.
With a segment tree that fetches the min over a range, each operation can be done in O(log N).
Here is my implementation in C#; it is easy to translate to C++ or Java.
public class SegmentTree
{
private int Mid;
private int[] t;
public SegmentTree(int capacity)
{
this.Mid = 1;
while (Mid <= capacity) Mid *= 2;
this.t = new int[Mid + Mid];
for (int i = Mid; i < this.t.Length; i++) this.t[i] = int.MaxValue;
for (int i = 1; i <= capacity; i++) this.t[Mid + i] = i;
for (int i = Mid - 1; i > 0; i--) t[i] = Math.Min(t[i + i], t[i + i + 1]);
}
public int Capture()
{
int answer = this.t[1];
if (answer == int.MaxValue)
{
throw new Exception("Empty space not found.");
}
this.Update(answer, int.MaxValue);
return answer;
}
public void Erase(int index)
{
this.Update(index, index);
}
private void Update(int i, int value)
{
t[i + Mid] = value;
for (i = (i + Mid) >> 1; i >= 1; i = (i >> 1))
t[i] = Math.Min(t[i + i], t[i + i + 1]);
}
}
Here is an example of usage:
int n = 4;
var st = new SegmentTree(n);
Console.WriteLine(st.Capture()); // 1
Console.WriteLine(st.Capture()); // 2
Console.WriteLine(st.Capture()); // 3
st.Erase(2);
Console.WriteLine(st.Capture()); // 2
Console.WriteLine(st.Capture()); // 4
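Since the question is about Java, a rough line-for-line translation of the class above might look like this (an untested sketch):

// Java translation of the C# segment tree above.
public class SegmentTree {
    private int mid;
    private final int[] t;

    public SegmentTree(int capacity) {
        mid = 1;
        while (mid <= capacity) mid *= 2;
        t = new int[mid + mid];
        for (int i = mid; i < t.length; i++) t[i] = Integer.MAX_VALUE;
        for (int i = 1; i <= capacity; i++) t[mid + i] = i;
        for (int i = mid - 1; i > 0; i--) t[i] = Math.min(t[i + i], t[i + i + 1]);
    }

    public int capture() {
        int answer = t[1];
        if (answer == Integer.MAX_VALUE)
            throw new IllegalStateException("Empty space not found.");
        update(answer, Integer.MAX_VALUE);
        return answer;
    }

    public void erase(int index) {
        update(index, index);
    }

    private void update(int i, int value) {
        t[i + mid] = value;
        for (i = (i + mid) >> 1; i >= 1; i >>= 1)
            t[i] = Math.min(t[i + i], t[i + i + 1]);
    }
}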
For getting the storage space number I used a min-heap approach, a PriorityQueue. This works in O(log n) time for both removal and insertion.
I used 2 BiMaps, self-created data structures, for storing the mapping between UIN, color and storage space number. These BiMaps internally use a HashMap and an array of size N.
In the first BiMap (BiMap1), a HashMap<Color, Set<StorageSpace>> stores the mapping of a color to the set of its storage spaces, and a String array `String[] colorSpace` stores the color at each storage space index.
In the second BiMap (BiMap2), a HashMap<UIN, StorageSpace> stores the mapping between UIN and storage space, and a String array `String[] uinSpace` stores the UIN at each storage space index.
Querying is straightforward with this approach:
When the input is color, show all the UIN associated with this color.
Get the set of storage spaces from BiMap1; for these spaces, use the array in BiMap2 to get the corresponding UINs.
When the input is color, show all the numbers where these packages are placed (storage space number). Use BiMap1's HashMap to get the list.
Show where an item with a given UIN is placed, i.e. storage space number. Use BiMap2 to get the values from the HashMap.
Now, when we are given a storage space to remove, both BiMaps have to be updated. In BiMap1, get the color from the array, get the corresponding Set, and remove the space number from this set. In BiMap2, get the UIN from the array, remove it, and also remove it from the HashMap.
For both BiMaps the removal and insert operations are O(1), and the min-heap works in O(log n); hence the total time complexity is O(log N).
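For concreteness, a sketch of what BiMap2 could look like (field and method names are mine, with UINs held as Strings):

import java.util.HashMap;
import java.util.Map;

// Sketch of BiMap2: UIN <-> storage space, O(1) lookup in both directions.
class UinSpaceBiMap {
    private final Map<String, Integer> spaceByUin = new HashMap<>();
    private final String[] uinSpace; // uinSpace[slot] = UIN stored at that slot

    UinSpaceBiMap(int capacity) {
        uinSpace = new String[capacity];
    }

    void put(String uin, int slot) {
        spaceByUin.put(uin, slot);
        uinSpace[slot] = uin;
    }

    void removeBySlot(int slot) {
        String uin = uinSpace[slot];
        if (uin != null) {
            spaceByUin.remove(uin);
            uinSpace[slot] = null;
        }
    }

    Integer slotOf(String uin) {
        return spaceByUin.get(uin); // null if the UIN is not stored
    }
}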
I'm trying to write a time-efficient algorithm that can detect a group of overlapping circles and make a single circle in the "middle" of the group to represent that group. The practical application of this is representing GPS locations on a map; the conversion into Cartesian co-ordinates is already handled, so that's not relevant. The desired effect is that at different zoom levels, clusters of close-together points just appear as a single circle (which will have the number of points printed in the centre in the final version).
In this example the circles just have a radius of 15, so the distance calculation (Pythagoras) is not square-rooted and is compared to 225 for the collision detection. I was trying anything to shave off time, but the problem is that this really needs to happen very quickly, because it's a user-facing bit of code that needs to be snappy and good looking.
I've given this a go and it works pretty well with small data sets. There are 2 big problems: it takes too long, and it can run out of memory if all the points are on top of one another.
The route I've taken is to calculate the distance between each pair of points in a first pass, then take the shortest distance first and start to combine from there. Anything that's been combined becomes ineligible for combination on that pass, and the whole list is passed back around to the distance calculations again until nothing changes.
To be honest, I think it needs a radical shift in approach and I think it's a little beyond me. I've refactored my code into one class for ease of posting and generated random points to give an example.
package mergepoints;
import java.awt.Point;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Merger {
public static void main(String[] args) {
Merger m = new Merger();
m.subProcess(m.createRandomList());
}
private List<Plottable> createRandomList() {
List<Plottable> points = new ArrayList<>();
for (int i = 0; i < 50000; i++) {
Plottable p = new Plottable();
p.location = new Point((int) Math.floor(Math.random() * 1000),
(int) Math.floor(Math.random() * 1000));
points.add(p);
}
return points;
}
private List<Plottable> subProcess(List<Plottable> visible) {
List<PlottableTuple> tuples = new ArrayList<PlottableTuple>();
// create a tuple to store distance and matching objects together,
for (Plottable p : visible) {
PlottableTuple tuple = new PlottableTuple();
tuple.a = p;
tuples.add(tuple);
}
// work out each Plottable relative distance from
// one another and order them by shortest first.
// We may need to do this multiple times for one set so going in own
// method.
// this is the bit that takes ages
setDistances(tuples);
// Sort so that smallest distances are at the top.
// parse the set and combine any pair less than the smallest distance in
// to a combined pin.
// any plottable thats been combine is no longer eligable for combining
// so ignore on this parse.
List<PlottableTuple> sorted = new ArrayList<>(tuples);
Collections.sort(sorted);
Set<Plottable> done = new HashSet<>();
Set<Plottable> mergedSet = new HashSet<>();
for (PlottableTuple pt : sorted) {
if (!done.contains(pt.a) && pt.distance <= 225) {
Plottable merged = combine(pt, done);
done.add(pt.a);
for (PlottableTuple tup : pt.others) {
done.add(tup.a);
}
mergedSet.add(merged);
}
}
// if we haven't processed anything we are done just return visible
// list.
if (done.size() == 0) {
return visible;
} else {
// change the list to represent the new combined plottables and
// repeat the process.
visible.removeAll(done);
visible.addAll(mergedSet);
return subProcess(visible);
}
}
private Plottable combine(PlottableTuple pt, Set<Plottable> done) {
List<Plottable> plottables = new ArrayList<>();
plottables.addAll(pt.a.containingPlottables);
for (PlottableTuple otherTuple : pt.others) {
if (!done.contains(otherTuple.a)) {
plottables.addAll(otherTuple.a.containingPlottables);
}
}
int x = 0;
int y = 0;
for (Plottable p : plottables) {
Point position = p.location;
x += position.x;
y += position.y;
}
x = x / plottables.size();
y = y / plottables.size();
Plottable merged = new Plottable();
merged.containingPlottables.addAll(plottables);
merged.location = new Point(x, y);
return merged;
}
private void setDistances(List<PlottableTuple> tuples) {
System.out.println("pins: " + tuples.size());
int loops = 0;
// Start from the first item and loop through, then repeat but starting
// with the next item.
for (int startIndex = 0; startIndex < tuples.size() - 1; startIndex++) {
// Get the data for the start Plottable
PlottableTuple startTuple = tuples.get(startIndex);
Point startLocation = startTuple.a.location;
for (int i = startIndex + 1; i < tuples.size(); i++) {
loops++;
PlottableTuple compareTuple = tuples.get(i);
double distance = distance(startLocation, compareTuple.a.location);
setDistance(startTuple, compareTuple, distance);
setDistance(compareTuple, startTuple, distance);
}
}
System.out.println("loops " + loops);
}
private void setDistance(PlottableTuple from, PlottableTuple to,
double distance) {
if (distance < from.distance || from.others == null) {
from.distance = distance;
from.others = new HashSet<>();
from.others.add(to);
} else if (distance == from.distance) {
from.others.add(to);
}
}
// Note: returns the squared distance; the caller compares it against
// the squared radius (225), so no square root is needed.
private double distance(Point a, Point b) {
    if (a.equals(b)) {
        return 0.0;
    }
    double result = (((double) a.x - (double) b.x) * ((double) a.x - (double) b.x))
            + (((double) a.y - (double) b.y) * ((double) a.y - (double) b.y));
    return result;
}
class PlottableTuple implements Comparable<PlottableTuple> {
public Plottable a;
public Set<PlottableTuple> others;
public double distance;
@Override
public int compareTo(PlottableTuple other) {
    return Double.compare(distance, other.distance);
}
}
class Plottable {
public Point location;
private Set<Plottable> containingPlottables;
public Plottable(Set<Plottable> plots) {
this.containingPlottables = plots;
}
public Plottable() {
this.containingPlottables = new HashSet<>();
this.containingPlottables.add(this);
}
public Set<Plottable> getContainingPlottables() {
return containingPlottables;
}
}
}
Map all your circles onto a 2D grid first. You then only need to compare the circles in a cell with the other circles in that cell and in its 8 neighboring cells (you can reduce those nine cells to five by using a brick pattern instead of a regular grid).
If you only need to be really approximate, then you can just group together all the circles that fall into a cell. You will probably also want to merge cells that only have a small number of circles with their neighbors, but this will be fast.
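A rough sketch of the grid bucketing idea (the cell size and class names are assumptions; it also assumes non-negative coordinates, otherwise use Math.floorDiv):

import java.awt.Point;
import java.util.*;

// Sketch: bucket points into cells sized to the merge radius, so each point
// only needs to be compared against its own cell and the 8 neighboring cells.
class PointGrid {
    private final int cellSize;
    private final Map<Long, List<Point>> cells = new HashMap<>();

    PointGrid(int cellSize) { this.cellSize = cellSize; }

    private long key(int cx, int cy) {
        return ((long) cx << 32) ^ (cy & 0xffffffffL); // pack cell coords into one key
    }

    void add(Point p) {
        cells.computeIfAbsent(key(p.x / cellSize, p.y / cellSize),
                k -> new ArrayList<>()).add(p);
    }

    List<Point> nearby(Point p) {
        List<Point> result = new ArrayList<>();
        int cx = p.x / cellSize, cy = p.y / cellSize;
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
                result.addAll(cells.getOrDefault(key(cx + dx, cy + dy),
                        Collections.emptyList()));
        return result;
    }
}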
This problem is going to take a reasonable amount of computation no matter how you do it; the question then is: can you do all the computation up-front so that at run-time it's just doing a look-up? I would build a tree-like structure where each layer is all the points that need to be drawn for a given zoom level. It takes more computation up-front, but at run-time you are simply drawing a list of points, fast.
My idea is to decide what the resolution of each zoom level is (i.e. at zoom level 1 points closer than 15 get merged; at zoom level 2 points closer than 30 get merged), then go through your points making groups of points that are within 15 of each other, and pick a point to represent each group at the higher zoom level. Now you have a 2-layer tree. Then you pass over the second layer, grouping all points that are within 30 of each other, and so on all the way up to your highest zoom level. Now save this tree structure to a file, and at run-time you can very quickly change zoom levels by simply drawing all points at the appropriate tree level. If you need to add or remove points, that can be done dynamically by figuring out where to attach them to the tree.
There are two downsides to this method that come to mind: 1) it will take a long time to compute the tree, but you only have to do this once, and 2) you'll have to think really carefully about how you build the tree, based on how you want the groupings to be done at higher levels; the top level may not end up being the grouping that you want. Maybe instead of building the tree based off the previous layer, you always want to go back to the original points. That said, some loss of precision always happens when you're trying to trade off for faster run-time.
EDIT
So you have a problem which requires O(n^2) comparisons, you say it has to be done in real-time, cannot be pre-computed, and has to be fast. Good luck with that.
Let's analyze the problem a bit: if you do no pre-computation, then in order to decide which points can be merged you have to compare every pair of points; that's O(n^2) comparisons. I suggested building a tree beforehand, O(n^2 log n) once, but then runtime is just a lookup, O(1). You could also do something in between where you do some work before and some at run-time, but that's how these problems always go: you have to do a certain amount of computation, and you can play games by doing some of it earlier, but at the end of the day you still have to do the computation.
For example, if you're willing to do some pre-computation, you could try keeping two copies of the list of points, one sorted by x-value and one sorted by y-value; then instead of comparing every pair of points, you can do 4 binary searches to find all the points within, say, a 30-unit box of the current point. More complicated, so it would be slower for a small number of points (say <100), but it reduces the overall complexity to O(n log n), making it faster for large amounts of data.
EDIT 2
If you're worried about multiple points at the same location, then why don't you do a first pass removing the redundant points? Then you'll have a smaller "search list". In Java, roughly:
List<Point> searchList = new ArrayList<>();
for (Point pt1 : points) {
    boolean clean = true;
    for (Point pt2 : searchList) {
        if (distance(pt1, pt2) < epsilon) {
            clean = false;
            break;
        }
    }
    if (clean)
        searchList.add(pt1);
}
// Now you have a smaller list to act on with only 1 point per cluster
// ... I guess this is actually the same as my first suggestion if you make one of these search lists per zoom level. huh.
EDIT 3: Graph Traversal
A totally new approach would be to build a graph out of the points and do some sort of longest-edge-first graph traversal on them. So pick a point, draw it, and traverse its longest edge; draw that point, etc. Repeat this until you come to a point which doesn't have any untraversed edges longer than your zoom resolution. The number of edges per point gives you an easy way to trade off speed for correctness. If the number of edges per point was small and constant, say 4, then with a bit of cleverness you could build the graph in O(n) time and also traverse it to draw points in O(n) time. Fast enough to do it on the fly with no pre-computation.
Just a wild guess and something that occurred to me while reading responses from others.
Do a multi-step comparison. Assume your combining distance at the current zoom level is 20 meters. First, compute |X1 - X2|. If this is bigger than 20 meters then you are done; the points are too far apart. Next, compute |Y1 - Y2| and do the same thing to reject combining the points.
You could stop here and be happy if you are good with using only horizontal/vertical distances as your metric for combining. Much less math (no squaring or square roots). Pythagoras wouldn't be happy but your users might.
If you really insist on exact answers, do the two subtraction/comparison steps above. If the points are within horizontal and vertical limits, THEN you do the full Pythagoras check with square roots.
Assuming all your points are not highly clustered very close to the combining limit, this should save some CPU cycles.
This is still approximately an O(n^2) technique, but the math should be simpler. If you have the memory, you could store the distances between each pair of points, and then you never have to compute them again. This could take up more memory than you have, and the table also grows at a rate of approximately O(n^2), so be careful.
Also, you could make a linked list or sorted array of all your points, sorted in order of increasing X or increasing Y. (I don't think you need both, just one.) Then walk through the list in sorted order. For each point, check the neighbors out until |X1 - X2| is bigger than your combining distance, and then stop. You don't have to compare each pair of points for O(N^2); you only have to compare neighbors that are close in one dimension, to quickly prune your large list to a small one. As you move through the list, you only have to compare points that have a bigger X than your current candidate, because you already compared and combined with all previous X values. This gets you closer to the O(n) complexity you want. Of course, you would need to check the Y dimension and fully qualify the points to be combined before you actually do it. Don't just use the X distance to make your combining decision.
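Sketched in Java (a standalone helper under assumed types; the question's radius of 15 would be passed in as maxDist):

import java.awt.Point;
import java.util.*;

// Sketch of the sorted sweep: sort by x, then for each point scan forward
// only while the x-gap is within the combining distance.
static List<int[]> closePairs(List<Point> points, int maxDist) {
    points.sort(Comparator.comparingInt((Point p) -> p.x)); // note: sorts in place
    List<int[]> pairs = new ArrayList<>();
    for (int i = 0; i < points.size(); i++) {
        Point a = points.get(i);
        for (int j = i + 1; j < points.size(); j++) {
            Point b = points.get(j);
            if (b.x - a.x > maxDist) break;              // all later points are even farther in x
            if (Math.abs(b.y - a.y) > maxDist) continue; // cheap vertical reject
            long dx = a.x - b.x, dy = a.y - b.y;
            if (dx * dx + dy * dy <= (long) maxDist * maxDist)
                pairs.add(new int[]{i, j});              // indices into the sorted list
        }
    }
    return pairs;
}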
I am generating my world (random, infinite and 2D) in sections that are x by y; when I reach the end of x, a new section is formed. If in section one I have hills, how can I make it so that in section two those hills will continue? Is there some kind of way that I could make this happen?
So it would look something like this
1221
1 = generated land
2 = non-generated land that will fill in between the two 1s
I get this now:
Is there any way to make this flow better?
This seems like just an algorithm issue. Your generation mechanism needs a start point. On the initial call it would be, say, 0; on subsequent calls it would be the finishing position of the previous "chunk".
If I was doing this, I'd probably make the height of the next point plus or minus 0-3 from the previous, using some sort of distribution - e.g. 10% of the time it's +/- 3, 25% of the time it is +/- 2, 25% of the time it is 0 and 40% of the time it is +/- 1.
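As a sketch, assuming a height array per chunk, that distribution could be seeded with the last height of the previous chunk like so (the names and helper shape are illustrative):

import java.util.Random;

// Sketch: generate the next chunk's heights as a random walk starting from
// the final height of the previous chunk, so sections join smoothly.
static int[] nextChunk(int lastHeight, int width, Random rng) {
    int[] heights = new int[width];
    int h = lastHeight;
    for (int i = 0; i < width; i++) {
        int roll = rng.nextInt(100);
        int step;
        if (roll < 10)      step = 3; // 10% of the time
        else if (roll < 35) step = 2; // 25%
        else if (roll < 60) step = 0; // 25%
        else                step = 1; // 40%
        h += rng.nextBoolean() ? step : -step; // random sign
        heights[i] = h;
    }
    return heights;
}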
If I understood your problem correctly, here is a solution:
If you generate the delta (difference) between hills and cap it at a fixed value (so changes are never too big), then you can carry over the value of the last hill from the previous section when generating the new one, and apply the first randomly generated delta (of the new section) to the carried-over hill size.
If you're generating these "hills" sequentially, I would create an accessor method that provides the continuation of said hill with a value to begin the next section. It seems that you are already creating a random height for the hill, constrained by some value, when drawing a hill in a single section. Extend that functionality with this new accessor method.
My take on a possible implementation of this.
public class DrawHillSection {
    private int index;
    private int[] x = new int[50];

    public void drawHillSection() {
        for (int i = 0; i < 50; i++) {
            if (i == 0) {
                // Seed this section with the last height of the previous one.
                x[i] = getPreviousHillSectionHeight(index - 1);
            } else {
                // Your current implementation to create random
                // height with some delta-y limit.
            }
        }
    }

    public int getPreviousHillSectionHeight(int index) {
        return x[49]; // the final height of the previous hill section
    }
}
I'm toying with the idea of creating an odometer style app in Java.
Basically I know I could use a series of loops to control the rotation, but I was thinking of doing it mathematically.
So if each dial rotates through ten positions, from 0 to 9, and there are six dials, that should make for a total of 1,000,000 combinations, if my maths are right.
Math.pow(10, 6);
My question is: what would be the most efficient way of keeping track of the last dial's rotation? Because, like a series of real cogs, for every ten turns of the final dial, the next dial would turn once.
And then for every tenth turn of the second from last dial, the third from last would rotate, and then all the others after it would reset back to zero.
Any advice or suggestions?
A suggested implementation: there's no point making the model more complex than it has to be just for the sake of recreating the object mechanically. See John3136's answer for corroboration.
The rotation model can be as simple as:
int rotations = 0;
/**
* Increment rotations every rotation
*/
void rotate() {
rotations++;
if (rotations >= Math.pow(10, 6)) // consider extracting as constant
rotations = 0; // reset
}
Then to create the view:
/**
 * dial can be from 1 .. 6 (where dial 1 moves every rotation)
 */
int getDialPosition(int dial) {
    int pow = (int) Math.pow(10, dial);
    return (rotations % pow) / (pow / 10);
    // above gets the digit at position dial
}
Notes
Wrap above model into an Odometer class
Build a view that gets refreshed every rotation
Think back to design and separation of concerns. What you have here is a display of some number between 0 and 9999999. The odometer is just a "view" of that number.
I'd make my model (holds a number and has some method to increment the number) and then write a view to display it in whatever GUI style I choose.
You could create a class using the composite pattern, where each counter can contain a nested child counter.
As the child is added to the parent, the parent registers itself with the child, in order to receive increment and decrement events each time the child's counter moves across the zero threshold.
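A minimal sketch of that idea, using a direct parent reference instead of full event registration (names are illustrative):

// Sketch: each dial holds one digit and a reference to the next, slower dial.
class Dial {
    private int value;         // 0..9
    private final Dial parent; // the dial to the left, or null for the last one

    Dial(Dial parent) { this.parent = parent; }

    void increment() {
        value++;
        if (value == 10) {     // crossed the zero threshold
            value = 0;
            if (parent != null) parent.increment(); // carry to the slower dial
        }
    }

    int getValue() { return value; }
}

Chaining six Dial instances gives the odometer; only the fastest dial is ever incremented directly.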