Context: I'm working on an analytics component for an ordering system. There are about 100,000 orders per day and the analytics need to run over the last N (say, 100) days. The relevant data fits in memory. After N days, orders are evicted from the in-memory cache, an entire day at a time. Orders can be created or updated.
A traditional approach would use a ConcurrentHashMap<Date, Queue<Order>>. Every day, the values for keys representing dates more than N days in the past would be deleted. But, of course, the whole point of using Guava is to avoid this. EDIT: changed Map to ConcurrentHashMap, see the end of the question for the rationale.
With Guava collections, a Multimap<Date, Order> would be simpler. Eviction would be similar, implemented explicitly.
While the Cache implementation looks appealing (after all, I am implementing a cache), I'm not sure about the eviction options. Eviction only happens once a day and it's best initiated from outside the cache; I don't want the cache to have to check the age of an order. I'm not even sure whether the cache would use a Multimap, which I think is a suitable data structure in this case.
Thus, my question is: is it possible to use a Cache that uses and exposes the semantics of a Multimap and allows eviction to be controlled from outside, in particular with the rule I need ("delete all orders older than N days")?
As an important clarification, I'm not interested in a LoadingCache but I do need bulk loads (if the application needs to be restarted, the cache has to be populated, from the database, with the last N days of orders).
EDIT: Forgot to mention that the map needs to be concurrent: as orders come in, they are evaluated live against previous orders for the same customer, location, etc.
EDIT2: Just stumbled over Guava issue 135. It looks like Multimap is not concurrent.
I would use neither a Cache nor a Multimap here. While I like and use both of them, there's not much to gain here.
You want to evict your entries manually, so the features of Cache don't really get used here.
You're considering ConcurrentHashMap<Date, Queue<Order>>, which is in a sense more powerful than a Multimap<Date, Order>.
I'd use a Cache if I were thinking about different eviction criteria and if I felt that losing any of its entries at any time [1] was fine.
You may find out that you need a ConcurrentMap<Date, Deque<Order>> or maybe ConcurrentMap<Date, YourOwnQueueFastSearchList<Order>> or whatever. This could probably be managed somehow by the Multimap, but IMHO it gets more complicated instead of simpler.
I'd ask myself "what do I gain by using Cache or Multimap here?". To me it looks like the plain old ConcurrentMap offers about everything you need.
[1] By no means am I suggesting this would happen with Guava. On the contrary, without an eviction reason (capacity, expiration, ...) it works just like a ConcurrentMap. It's just that what you've described feels more like a Map than a Cache.
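For illustration, here is a minimal sketch of that plain-ConcurrentMap approach, assuming LocalDate keys and an externally triggered daily eviction; the class and method names are mine, and Order stands in for the real order type:

import java.time.LocalDate;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

class Order { /* fields of the real order record omitted */ }

// Sketch: per-day order buckets with eviction triggered from outside once a day.
class OrderCache {
    private final ConcurrentMap<LocalDate, Queue<Order>> ordersByDay = new ConcurrentHashMap<>();

    // Called as orders arrive; the per-day queue is itself thread-safe.
    void add(LocalDate day, Order order) {
        ordersByDay.computeIfAbsent(day, d -> new ConcurrentLinkedQueue<>()).add(order);
    }

    // Called once a day from outside: drop every bucket older than N days.
    void evictOlderThan(int nDays) {
        LocalDate cutoff = LocalDate.now().minusDays(nDays);
        ordersByDay.keySet().removeIf(day -> day.isBefore(cutoff));
    }
}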
IMHO the simplest thing to do is to include the date of the order in the order record (I would expect it is a field already). As you only need to clean the cache once per day, it doesn't have to be very efficient, just reasonably timely.
e.g.
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {
    static class Order {
        final long time;

        Order(long time) {
            this.time = time;
        }

        public long getTime() {
            return time;
        }
    }

    final Map<String, Order> orders = new LinkedHashMap<String, Order>();

    public void expireOrdersOlderThan(long dateTime) {
        for (Iterator<Order> iter = orders.values().iterator(); iter.hasNext(); )
            if (iter.next().getTime() < dateTime)
                iter.remove();
    }

    private void generateOrders() {
        for (int i = 0; i < 120000; i++) {
            orders.put("order-" + i, new Order(i));
        }
    }

    public static void main(String... args) {
        for (int t = 0; t < 3; t++) {
            Main m = new Main();
            m.generateOrders();
            long start = System.nanoTime();
            for (int i = 0; i < 20; i++)
                m.expireOrdersOlderThan(i * 1000);
            long time = System.nanoTime() - start;
            System.out.printf("Took an average of %.3f ms to expire 1%% of entries%n", time / 20 / 1e6);
        }
    }
}
prints
Took an average of 9.164 ms to expire 1% of entries
Took an average of 8.345 ms to expire 1% of entries
Took an average of 7.812 ms to expire 1% of entries
For 100,000 orders, I would expect this to take ~10 ms, which is not much of a cost to incur during a quiet period in the middle of the night.
BTW: You can make this more efficient if your OrderIds are sorted by time. ;)
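For example (my own variation, not part of the answer above): since the LinkedHashMap preserves insertion order and the orders were inserted in time order, the expiry loop can stop at the first order that is new enough:

// Variation of expireOrdersOlderThan that relies on insertion order matching time order.
public void expireOrdersOlderThanSorted(long dateTime) {
    for (Iterator<Order> iter = orders.values().iterator(); iter.hasNext(); ) {
        if (iter.next().getTime() >= dateTime)
            break; // every later entry is newer, nothing left to expire
        iter.remove();
    }
}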
Have you considered using a sorted list of some sort? It would allow you to pull entries until you hit one that's fresh enough to stay. Of course this assumes that's your primary function. If what you most need is the O(1) access of a hash map, my answer doesn't apply.
We collect some statistics using AtomicLongs. Some users are seeing contention on these and have suggested using LongAdder instead. However, I see no way to calculate the maximum value as we are currently doing with the AtomicLong:
AtomicLong _current, _total, _max;
...
void add(long delta)
{
    long current = _current.addAndGet(delta);
    if (delta > 0)
    {
        _total.addAndGet(delta);
        long max = _max.get();
        while (current > max)
        {
            if (_max.compareAndSet(max, current))
                break;
            max = _max.get();
        }
    }
}
So I think we can replace _total easily enough with a LongAdder, but because we do _current.addAndGet(delta), that will not work well with a LongAdder, nor can we do a CAS operation for the _max value.
Are there any good algorithms for collecting such statistics based on LongAdder or similar scalable lock-free constructs?
Actually, while I'm asking: our stats typically update 6 to 10 AtomicLongs. If we are seeing contention anyway, could it possibly be better to just grab a lock and update 6 to 10 plain longs?
You don't want LongAdder here, but LongAccumulator: new LongAccumulator(Math::max, Long.MIN_VALUE) does the right thing. LongAdder is a special case of LongAccumulator.
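For illustration, a minimal sketch of what the statistics could look like with a LongAdder for the total and a LongAccumulator for the maximum; the class name and accessors are mine, and _current stays an AtomicLong because the exact running value is still needed:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

class Stats {
    private final AtomicLong _current = new AtomicLong();
    private final LongAdder _total = new LongAdder();
    private final LongAccumulator _max = new LongAccumulator(Math::max, Long.MIN_VALUE);

    void add(long delta) {
        long current = _current.addAndGet(delta);
        if (delta > 0) {
            _total.add(delta);
            _max.accumulate(current); // scalable, lock-free max tracking
        }
    }

    long total() { return _total.sum(); }
    long max()   { return _max.get(); }
}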
I have a very large file (10^8 lines) with counts of events as follows,
A 10
B 11
C 23
A 11
I need to accumulate the counts for each event, so that my map contains
A 21
B 11
C 23
My current approach:
Read the lines, maintain a map, and update the counts in the map as follows
void updateCount(Map<String, Long> countMap, String key, Long c) {
    if (countMap.containsKey(key)) {
        Long val = countMap.get(key);
        countMap.put(key, val + c);
    } else {
        countMap.put(key, c);
    }
}
Currently this is the slowest part of the code (it takes around 25 ms).
Note that the map is backed by MapDB, but I doubt that the updates are slow because of that (are they?).
This is the MapDB configuration for the map:
DBMaker.newFileDB(dbFile).freeSpaceReclaimQ(3)
.mmapFileEnablePartial()
.transactionDisable()
.cacheLRUEnable()
.closeOnJvmShutdown();
Are there ways to speed this up?
EDIT:
The number of unique keys is of the order of the number of pages on Wikipedia. The data is actually page traffic data from here.
You might try
class Counter {
    long count;
}

void updateCount(Map<String, Counter> countMap, String key, int c) {
    Counter counter = countMap.get(key);
    if (counter == null) {
        counter = new Counter();
        countMap.put(key, counter);
        counter.count = c;
    } else {
        counter.count += c;
    }
}
This does not create many Long wrappers; it only allocates as many Counter objects as there are keys.
Note: do not create Longs. Above I made c an int so that the long/Long distinction is not overlooked.
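For completeness, a sketch of how a read loop might drive updateCount, assuming it lives in the same class as the code above and that a plain in-memory HashMap is used instead of the MapDB-backed map; the file name and line format are assumptions taken from the question's example data:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver: accumulate counts with the mutable Counter holder above.
Map<String, Counter> readCounts(String file) throws IOException {
    Map<String, Counter> countMap = new HashMap<>();
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        for (String line; (line = reader.readLine()) != null; ) {
            String[] parts = line.split("\\s+");            // e.g. "A 10"
            updateCount(countMap, parts[0], Integer.parseInt(parts[1]));
        }
    }
    return countMap;
}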
As a starting point, I'd suggest thinking about:
What is the yardstick by which you're saying that 25 ms is actually an unreasonable amount of time for the amount of data involved and for a generic map implementation? If you quantify that, it might help you work out whether anything is wrong (a rough baseline sketch follows this list).
How much time is being spent re-hashing the map versus other operations (e.g. calculation of hash codes on each put)?
What do your "events", as you call them, consist of? How many unique events, and hence unique keys, are there? How are keys to the map being generated, and is there a more efficient way to do so? (In a standard hash map, for example, you create additional objects for each association, and actually store the key objects, increasing the memory footprint.)
Depending on the answers to the above, you could potentially roll a more efficient map structure yourself (see this example that you might be able to adapt). Essentially, you need to look specifically at what is taking the time (e.g. hash code calculation per put / cost of rehashing) and try to optimise that part.
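As a rough way to get such a yardstick, here is a minimal baseline sketch that times the same accumulate-into-a-map pattern against a plain in-memory HashMap with synthetic keys; the sizes and key format are assumptions for illustration, not from the question:

import java.util.HashMap;
import java.util.Map;

// Baseline: how long do a million accumulating puts into a plain HashMap take?
public class MapUpdateBaseline {
    public static void main(String[] args) {
        Map<String, Long> countMap = new HashMap<>();
        int updates = 1_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < updates; i++) {
            String key = "event-" + (i % 100_000);          // ~100k unique keys
            Long val = countMap.get(key);
            countMap.put(key, val == null ? 1L : val + 1L);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(updates + " updates took " + elapsedMs + " ms");
    }
}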
If you are using a TreeMap, there are performance tuning options like
The number of entries in each node.
You could also use specific key and value serializers, which will speed up serialization and deserialization.
You could use Pump mode to build the tree, which is very fast. One caveat is that this is only useful when you are building a new map from scratch. You can find a full example here:
https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/Huge_Insert.java
This is a strange case that recently came up while profiling a specialised collection I've been working on.
The collection is pretty much just two arrays, one an int[] array of keys, and one an Object[] array of values, with a hash function providing rapid lookup. It's all working nicely, but I've come to profiling the code and am getting some weird results; for profiling I've decided to do it the old fashioned way, by grabbing System.currentTimeMillis(), running a test over and over and then checking how much time has elapsed, like so:
long sTime = System.currentTimeMillis();
for (int index : indices)
    foo.remove(index);
long took = System.currentTimeMillis() - sTime;
In my test I have foo prepared with 200,000 entries and a pre-generated list of indices that I will remove. I reset and run the test in a loop for a thousand repetitions and add took to a running total.
Now, for most operations I get extremely good results compared to other data types, except for my remove(int) method. However, I've been struggling to figure out why, as my removal method is identical to my get(int) method (other than the removal, obviously), as shown:
public Object get(int key) {
    int i = getIndex(key); // Hashes key and locates it
    return (i >= 0) ? this.values[i] : null;
}

public Object remove(int key) {
    int i = getIndex(key); // Does exactly the same as above
    if (i >= 0) {
        --this.size;
        ++this.modifications; // For concurrent access behaviour
        this.keys[i] = 0;     // Zero indicates null entry
        Object old = this.values[i];
        this.values[i] = null;
        return old;
    }
    return null;
}
While I would expect removal to take a bit longer, it's taking more than 5 times as long to execute as get(int). However, if I comment out the line this.keys[i] = 0, performance becomes nearly identical to get(int).
Am I correct in observing that this is an issue with assigning a value to my int[] array? I've tried commenting out all of the this.values operations and got the same slow times, but leaving this.values alone while commenting out this.keys[i] = 0 consistently solves the problem. I'm at a total loss as to what's going on; is there anything to be done about it?
The performance is still good considering that removals are relatively rare, but it seems strange that setting a value in an int[] has such a big impact, so I'm curious to know why.
The code as written doesn't work concurrently. If there's other concurrency code not shown, that could well be the source of the timing differences. Other than that, the most likely cause is that merely accessing the keys[] array in addition to the values[] array changes the memory access patterns: for instance, going from registers to memory locations, or from L1 cache to L2 cache, L3 cache, or main memory. 'False sharing' is an example of such a degradation pattern. 'Mechanical sympathy' is a name used for tuning to current hardware architectures.
I ran across some code that was doing something like this:
Map<String,String> fullNameById = buildMap1(dataSource1);
Map<String,String> nameById = buildMap2(dataSource2);
Map<String,String> nameByFullName = new HashMap<String,String>();
Map<String,String> idByName = new HashMap<String,String>();
Set<String> ids = fullNameById.keySet();
for (String nextId : ids) {
    String name = nameById.get(nextId);
    String fullName = fullNameById.get(nextId);
    nameByFullName.put(fullName, name);
    idByName.put(name, nextId);
}
I had to stare at it for several minutes to figure out what was going on. All of that amounts to a join operation on ids and an inversion of one of the original maps. Since Id, FullName and Name are always 1:1:1, it seemed to me that there should be some way to simplify this. I also discovered that the first two maps are never used again, and I find the above code a bit hard to read. So I'm considering replacing it with something like the following, which (to me) reads much more cleanly:
Table<String, String, String> relations = HashBasedTable.create();
addRelationships1(dataSource1, relations);
addRelationships2(dataSource2, relations);
Map<String,String> idByName = relations.column("hasId");
Map<String,String> nameByFullName = relations.column("hasName");
relations = null; // not used hereafter
In addRelationships1 I do
relations.put(id, "hasFullName", fullname);
And in addRelationships2 where my query yields values for id and name I do
relations.put(relations.remove(id,"hasFullName"), "hasName", name);
relations.put(name, "hasId", id);
So my questions are these:
Is there a lurking inefficiency in what I have done, whether in processor time, memory, or GC load? I don't think so, but I'm not that familiar with the efficiency of Table. I am aware that the Table object won't be GC'd after relations = null; I just want to communicate that it's not used again in the rather lengthy section of code that follows.
Have I gained any efficiency? I keep convincing and unconvincing myself that I have and have not.
Do you find this more readable? Or is this only easy for me to read because I wrote it? I'm a tad worried on that front due to the fact that Table is not well known. On the other hand, the top level now pretty clearly says, "gather data from two sources and make these two maps from it." I also like the fact that it doesn't leave you wondering if/where the other two maps are being used (or not).
Do you have an even better, cleaner, faster, simpler way to do it than either of the above?
Please, let's not have the optimize-early/optimize-late discussion here. I'm well aware of that pitfall. If it improves readability without hurting performance, I am satisfied. A performance gain would be a nice bonus.
Note: my variable and method names have been sanitized here to keep the business area from distracting from the discussion, I definitely won't name them addRelationships1 or datasource1! Similarly, the final code will of course use constants not raw strings.
So I did some mini-benchmarking myself and came to the conclusion that there is little difference between the two methods in terms of execution time. I kept the total size of the data being processed constant by trading the number of runs off against the data-set size. I did 4 runs and chose the lowest time for each implementation from among all 4 runs. Reassuringly, both implementations were always fastest on the same run. My code can be found here. Here are my results:
Case                      Maps (ms)   Table (ms)   Table vs Maps
100000 runs of size 10       2931        3035           104%
10000 runs of size 100       2989        3033           101%
1000 runs of size 1000       3129        3160           101%
100 runs of size 10000       4126        4429           107%
10 runs of size 100000       5081        5866           115%
1 run of size 1000000        5489        5160            94%
So using Table seems to be slightly slower for small data sets. Something interesting happens around 100,000 and then by 1 million the table is actually faster. My data will hang out in the 100 to 1000 range, so at least in execution time the performance should be nearly identical.
As for readability, my opinion is that if someone is trying to figure out what is happening nearby and reads the code, it will be significantly easier to see the intent. If they have to actually debug this bit of code, it may be a bit harder, since Table is less common and requires some sophistication to understand.
Another thing I am unsure of is whether or not it's more efficient to create the hash maps, or to just query the table directly in the case where all keys of the map will subsequently be iterated. However that's a different question :)
And the comedic ending is that, as I analyzed the code further (hundreds of lines), I found that the only significant use of nameByFullName.get() outside of logging (of questionable value) was to pass the result to idByName.get(). So in the end I'll actually be building an idByFullName map and an idByName map instead, with no need for any joining, and dropping the whole Table thing anyway. But it made for an interesting SO question, I guess.
tl;dr, but I'm afraid that you'd need to make a bigger step away from the original design. Simulating DB tables might be a nice exercise, but for me your code isn't really readable.
Is there a lurking inefficiency in what I have done... No idea.
Have I gained any efficiency? I'm afraid you need to measure it first. Removing some indirections surely helps, but using a more complicated data structure might offset it. And performance in general is simply too complicated.
Do you find this more readable? I'm afraid not.
Do you have an even better, cleaner, faster, simpler way to do it than either of the above? I hope so....
Where I get lost in such code is the use of strings for everything: it's just too easy to pass a wrong string as an argument. So I'd suggest aggregating them into an object and providing maps for accessing the objects via any part of them. Something as trivial as this should do:
class IdNameAndFullName {
    String id, name, fullName;
}

class IdNameAndFullNameMaps {
    Map<String, IdNameAndFullName> byId;
    Map<String, IdNameAndFullName> byName;
    Map<String, IdNameAndFullName> byFullName;
}
You could obviously replace the class IdNameAndFullNameMaps with a Table. However, besides using a nice pre-existing data structure, I see no advantages in doing so. The disadvantages are:
loss of efficiency
loss of readability (I wouldn't use Table here for the very same reason Tuple should be avoided)
use of String keys (your "hasId" and "hasName").
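For illustration, a sketch of how the three maps might be populated in a single pass over the two source maps, using the 1:1:1 relationship; the buildMaps helper and the use of plain HashMaps are my assumptions (assumes the usual java.util imports):

// Hypothetical assembly of the three lookup maps from the two data sources.
IdNameAndFullNameMaps buildMaps(Map<String, String> fullNameById, Map<String, String> nameById) {
    Map<String, IdNameAndFullName> byId = new HashMap<>();
    Map<String, IdNameAndFullName> byName = new HashMap<>();
    Map<String, IdNameAndFullName> byFullName = new HashMap<>();

    for (Map.Entry<String, String> entry : fullNameById.entrySet()) {
        IdNameAndFullName record = new IdNameAndFullName();
        record.id = entry.getKey();
        record.fullName = entry.getValue();
        record.name = nameById.get(record.id);   // the 1:1:1 join on id

        byId.put(record.id, record);
        byName.put(record.name, record);
        byFullName.put(record.fullName, record);
    }

    IdNameAndFullNameMaps maps = new IdNameAndFullNameMaps();
    maps.byId = byId;
    maps.byName = byName;
    maps.byFullName = byFullName;
    return maps;
}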
I have the following program to remove even numbers from a string vector. When the vector size grows larger, it can take a long time, so I thought of threads, but using 10 threads is not faster than one thread. My PC has 6 cores and 12 threads, so why?
import java.util.*;

public class Test_Threads
{
    static boolean Use_Threads_To_Remove_Duplicates(Vector<String> Good_Email_Address_Vector, Vector<String> To_Be_Removed_Email_Address_Vector)
    {
        boolean Removed_Duplicates = false;
        int Threads_Count = 10, Delay = 5, Average_Size_For_Each_Thread = Good_Email_Address_Vector.size() / Threads_Count;
        Remove_Duplicate_From_Vector_Thread[] RDFVT = new Remove_Duplicate_From_Vector_Thread[Threads_Count];
        Remove_Duplicate_From_Vector_Thread.To_Be_Removed_Email_Address_Vector = To_Be_Removed_Email_Address_Vector;
        for (int i = 0; i < Threads_Count; i++)
        {
            Vector<String> Target_Vector = new Vector<String>();
            if (i < Threads_Count - 1) for (int j = i * Average_Size_For_Each_Thread; j < (i + 1) * Average_Size_For_Each_Thread; j++) Target_Vector.add(Good_Email_Address_Vector.elementAt(j));
            else for (int j = i * Average_Size_For_Each_Thread; j < Good_Email_Address_Vector.size(); j++) Target_Vector.add(Good_Email_Address_Vector.elementAt(j));
            RDFVT[i] = new Remove_Duplicate_From_Vector_Thread(Target_Vector, Delay);
        }
        try { for (int i = 0; i < Threads_Count; i++) RDFVT[i].Remover_Thread.join(); }
        catch (Exception e) { e.printStackTrace(); } // Wait for all threads to finish
        for (int i = 0; i < Threads_Count; i++) if (RDFVT[i].Changed) Removed_Duplicates = true;
        if (Removed_Duplicates) // Collect results
        {
            Good_Email_Address_Vector.clear();
            for (int i = 0; i < Threads_Count; i++) Good_Email_Address_Vector.addAll(RDFVT[i].Target_Vector);
        }
        return Removed_Duplicates;
    }

    public static void out(String message) { System.out.print(message); }
    public static void Out(String message) { System.out.println(message); }

    public static void main(String[] args)
    {
        long start = System.currentTimeMillis();
        Vector<String> Good_Email_Address_Vector = new Vector<String>(), To_Be_Removed_Email_Address_Vector = new Vector<String>();
        for (int i = 0; i < 1000; i++) Good_Email_Address_Vector.add(i + "");
        Out(Good_Email_Address_Vector.toString());
        for (int i = 0; i < 1500000; i++) To_Be_Removed_Email_Address_Vector.add(i * 2 + "");
        Out("=============================");
        Use_Threads_To_Remove_Duplicates(Good_Email_Address_Vector, To_Be_Removed_Email_Address_Vector); // [ Approach 1 : Use 10 threads ]
        // Good_Email_Address_Vector.removeAll(To_Be_Removed_Email_Address_Vector); // [ Approach 2 : just one thread ]
        Out(Good_Email_Address_Vector.toString());
        long end = System.currentTimeMillis();
        Out("Time taken for execution is " + (end - start));
    }
}

class Remove_Duplicate_From_Vector_Thread
{
    static Vector<String> To_Be_Removed_Email_Address_Vector;
    Vector<String> Target_Vector;
    Thread Remover_Thread;
    boolean Changed = false;

    public Remove_Duplicate_From_Vector_Thread(final Vector<String> Target_Vector, final int Delay)
    {
        this.Target_Vector = Target_Vector;
        Remover_Thread = new Thread(new Runnable()
        {
            public void run()
            {
                try
                {
                    Thread.sleep(Delay);
                    Changed = Target_Vector.removeAll(To_Be_Removed_Email_Address_Vector);
                }
                catch (InterruptedException e) { e.printStackTrace(); }
            }
        });
        Remover_Thread.start();
    }
}
In my program you can try "[ Approach 1 : Use 10 threads ]" or "[ Approach 2 : just one thread ]"; there isn't much difference speed-wise. I expected it to be several times faster. Why?
The simple answer is that your threads are all trying to access a single vector by calling synchronized methods. The synchronized modifier on those methods ensures that only one thread can be executing any of the methods on that object at any given time. So a significant part of the parallel portion of the computation involves waiting for other threads.
The other problem is that for an O(N) input list, you have an O(N) setup ... population of the Target_Vector objects ... that is done in one thread. Plus the overheads of thread creation.
All of this adds up to not much speedup.
You should get a significant speedup (with multiple threads) if you used a single ConcurrentHashMap instead of a single Good_Email_Address_Vector object that gets split into multiple Target_Vector objects:
the remove operation is O(1) not O(n),
reduced copying,
the data structure provides better multi-threaded performance due to better handling of contention, and
you don't need to jump through hoops to avoid ConcurrentModificationException.
In addition, the To_Be_Removed_Email_Address_Vector object should be replaced with an unsynchronized List, and List.subList(...) should be used to create views that can be passed to the threads.
In short, you are better off throwing away your current code and starting again. And please use sensible identifier names that follow the Java coding conventions, and wrap your code at around 80 columns so that people can read it!
Vector Synchronization Creates Contention
You've split up the vector to be modified, which avoids some contention. But multiple threads are accessing the static Vector To_Be_Removed_Email_Address_Vector, so much contention still remains (all Vector methods are synchronized).
Use an unsynchronized data structure for the shared, read-only information so that there is no contention between threads. On my machine, running your test with ArrayList in place of Vector cut the execution time in half.
Even without contention, thread-safe structures are slower, so don't use them when only a single thread has access to an object. Additionally, Vector has been largely obsolete since Java 5. Avoid it unless you have to inter-operate with a legacy API you can't alter.
Choose a Suitable Data Structure
A list data structure is going to provide poor performance for this task. Since email addresses are likely to be unique, a set should be a suitable replacement, and removeAll() will run much faster on large sets. Using HashSet in place of the original Vector cut execution time on my (8 core) machine from over 5 seconds to around 3 milliseconds. Roughly half of this improvement is due to using the right data structure for the job.
Concurrent Structures Are a Bad Fit
Using a concurrent data structure here is relatively slow, and doesn't simplify the code, so I don't recommend it.
Using a more up-to-date concurrent data structure is much faster than contending for a Vector, but the concurrency overhead of these data structures is still much higher than single-threaded structures. For example, running the original code on my machine took more than five seconds, while a ConcurrentSkipListSet took half a second, and a ConcurrentHashMap took one eighth of a second. But remember, when each thread had its own HashSet to update, the total time was just 3 milliseconds.
Even when all threads are updating a single concurrent data structure, the code needed to partition the workload is very similar to that used to create a separate Vector for each thread in the original code. From a readability and maintenance standpoint, all of these solutions have equivalent complexity.
If you had a situation where "bad" email addresses were being added to the set asynchronously, and you wanted readers of the "good" list to see those updates auto-magically, a concurrent set would be a good choice. But, with the current design of the API, where consumers of the "good" list explicitly call a blocking filter method to update the list, a concurrent data structure may be the wrong choice.
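For illustration, a minimal sketch of the per-thread partitioning described above, where each thread filters its own HashSet against a shared, read-only set of addresses to remove; the class and variable names are mine and the inputs are assumed to match the original code's:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: split the "good" addresses into per-thread HashSets, filter them in
// parallel against an unsynchronized, read-only removal set, then merge.
class PartitionedFilter {
    static List<String> filter(List<String> good, Set<String> toRemove, int threadCount)
            throws InterruptedException {
        List<Set<String>> parts = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        int chunk = (good.size() + threadCount - 1) / threadCount;

        for (int i = 0; i < threadCount; i++) {
            int from = i * chunk;
            if (from >= good.size()) break;
            int to = Math.min(from + chunk, good.size());
            Set<String> part = new HashSet<>(good.subList(from, to)); // each thread owns its set
            parts.add(part);
            Thread t = new Thread(() -> part.removeAll(toRemove));    // no shared mutable state
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();

        List<String> result = new ArrayList<>();            // note: set semantics drop duplicates
        for (Set<String> part : parts) result.addAll(part);
        return result;
    }
}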
All your threads are working on the same vector. Your access to the vector is serialized (i.e. only one thread can access it at a time) so using multiple threads is likely to be the same speed at best, but more likely to be much slower.
Multiple threads work much faster when you have independent tasks to perform.
In this case, the fastest option is likely to be to create a new List containing all the elements you want to retain and replace the original, in one thread. This will be faster than using a concurrent collection with multiple threads.
For comparison, this is what you can do with one thread. As the collection is fairly small, the JVM doesn't warm up in just one run, so there are multiple dummy runs which are not printed.
public static void main(String... args) {
    for (int n = -50; n < 5; n++) {
        List<String> allIds = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) allIds.add(String.valueOf(i));

        long start = System.nanoTime();
        List<String> oddIds = new ArrayList<String>();
        for (String id : allIds) {
            if ((id.charAt(id.length() - 1) % 2) != 0)
                oddIds.add(id);
        }
        long time = System.nanoTime() - start;

        if (n >= 0)
            System.out.println("Time taken to filter " + allIds.size() + " entries was " + time / 1000 + " micro-seconds");
    }
}
prints
Time taken to filter 1000 entries was 136 micro-seconds
Time taken to filter 1000 entries was 141 micro-seconds
Time taken to filter 1000 entries was 136 micro-seconds
Time taken to filter 1000 entries was 137 micro-seconds
Time taken to filter 1000 entries was 138 micro-seconds