What does "costly" mean in terms of software operations?

What does "costly" mean in terms of software operations? - java

What is meant by Operation is costly or the resource is costly in-terms of Software. When i come across with some documents they mentioned something like Opening a file every-time is a Costly Operation. I can have more examples like this (Database connection is a costly operation, Thread pool is a cheaper one, etc..). At what basis it decided whether the task or operation is costly or cheaper? When we calculating this what the constraints to consider? Is based on the Time also?
Note : I already checked in the net with this but i didn't get any good explanation. If you found kindly share with me and i can close this..

Expensive or Costly operations are those which cause a lot of resources to be used, such as the CPU, Disk Drive(s) or Memory
For example, creating an integer variable in code is not a costly or expensive operation
By contrast, creating a connection to a remote server that hosts a relational database, querying several tables and returning a large results set before iterating over it while remaining connected to the data source would be (relatively) expensive or costly, as opposed to my first example with the Integer.
In order to build scalable, fast applications you would generally want to minimize the frequency of performing these costly/expensive actions, applying techniques of optimisation, caching, parallelism (etc) where they are essential to the operation of the software.
To get a degree of accuracy and some actual numbers on what is 'expensive' and what is 'cheap' in your application, you would employ some sort of profiling or analysis tool. For JavaScript, there is ySlow - for .NET applications, dotTrace - I'd be certain that whatever the platform, a similar solution exists. It's then down to someone to comprehend the output, which is probably the most important part!

Running time, memory use or bandwidth consumption are the most typical interpretations of "cost". Also consider that it may apply to cost in development time.

I'll try explain through some examples:
If you need to edit two field in each row of a Database, if you do it one field at a time that's gonna be close to twice the time as if it was properly done both at same time.
This extra time was not only your waste of time, but also a connection opened longer then needed, memory occupied longer then needed and at the end of the day, your eficience goes down the drain.
When you start scalling, very small amount of time wasted grows into a very big waste of Company resources.

It is almost certainly talking about a time penalty to perform that kind of input / output. Lots of memory shuffling (copying of objects created from classes with lots of members) is another time waster (pass by reference helps eliminate a lot of this).

Usually costly means, in a very simplified way, that it'll take much longer then an operation on memory.
For instance, accessing a File in your file system and reading each line takes much longer then simply iterating over a list of the same size in memory.
The same can be said about database operations, they take much longer then in-memory operations, and so some caution should be used not to abuse these operations.
This is, I repeat, a very simplistic explanation. Exactly what costly means depends on your particular context, the number of operations you're performing, and the overall architecture of the system.

Related

Is there any way to free memory during large data proccessing?

I have a database where I store invoices. I have to make complex operations for any given month with a series of algorithms using the information from all of the invoices. Retrieving and processing the necessary data for these operations takes a large amount of memory since there might be lots of invoices. The problem gets increasingly worse when the interval requested by the user for these calculations goes up to several years. The result is I'm getting a PermGen exception because it seems that the garbage collector is not doing its job between each month calculation.
I've always taken using System.GC to hint the GC to do its job is not a good practice. So my question is, are there any other ways to free memory aside from that? Can you force the JVM to use HD swapping in order to store partial calculations temporarily?
Also, I've tried to use System.gc at the end of each month calculation and the result was a high CPU usage (due to garbage collector being called) and a reasonably lower memory use. This could do the job but I don't think this would be a proper solution.

Don't ever use System.gc(). It always takes a long time to run and often doesn't do anything.
The best thing to do is rewrite your code to minimize memory usage as much as possible. You haven't explained exactly how your code works, but here are two ideas:
Try to reuse the data structures you generate yourself for each month. So, say you have a list of invoices, reuse that list for the next month.
If you need all of it, consider writing the processed files to temporary files as you do the processing, then reloading them when you're ready.

We should remember System.gc() does not really run the Garbage Collector. It simply asks to do the same. The JVM may or may not run the Garbage Collector. All we can do is to make unnecessary data structures available for garbage collection. You can do the same by:
Assigning Null as the value of any data structure after it has been used. Hence no active threads will be able to access it( in short enabling it for gc).
reusing the same structures instead of creating new ones.

Microbenchmarking with Big Data

I'm currently working on my thesis project designing a cache implementation to be used with a shortest-path graph algorithm. The graph algorithm is rather inconsistent with runtimes, so it's too troublesome to benchmark the entire algorithm. I must concentrate on benchmarking the cache only.
The caches I need to benchmark are about a dozen or so implementations of a Map interface. These caches are designed to work well with a given access pattern (the order in which keys are queried from the algorithm above). However, in a given run of a "small" problem, there are a few hundred billion queries. I need to run almost all of them to be confident about the results of the benchmark.
I'm having conceptual problems about loading the data into memory. It's possible to create a query log, which would just be an on-disk ordered list of all the keys (they're 10-character string identifiers) that were queried in one run of the algorithm. This file is huge. The other thought I had would be to break the log up into chunks of 1-5 million queries, and benchmark in the following manner:
Load 1-5 million keys
Set start time to current time
Query them in order
Record elapsed time (current time - start time)
I'm unsure of what effects this will have with caching. How could I perform a warm-up period? Loading the file will likely clear any data that was in L1 or L2 caches for the last chunk. Also, what effect does maintaining a 1-5 million element string array have (and does even iterating it skew results)?
Keep in mind that the access pattern is important! For example, there are some hash tables with move-to-front heuristics, which re-orders the internal structure of the table. It would be incorrect to run a single chunk multiple times, or run chunks out of order. This makes warming CPU caches and HotSpot a bit more difficult (I could also keep a secondary dummy cache that's used for warming but not timing).
What are good practices for microbenchmarks with a giant dataset?

If I understand the problem correctly, how about loading the query log on one machine, possibly in chunks if you don't have enough memory, and streaming it to the machine running the benchmark, across a dedicated network (a crossover cable, probably), so you have minimal interference between the system under test, and the test code/data ...?
Whatever solution you use, you should try multiple runs so you can assess repeatability - if you don't get reasonable repeatability then you can at least detect that your solution is unsuitable!
Updated: re: batching and timing - in practice, you'll probably end up with some form of fine-grained batching, at least, to get the data over the network efficiently. If your data falls into natural large 'groups' or stages then I would time those individually to check for anomalies, but would rely most strongly on an overall timing. I don't see much benefit from timing small batches of thousands (given that you are running through many millions).
Even if you run everything on one machine with a lot of RAM, it might be worth loading the data in one JVM and the code under test on another, so that garbage collection on the cache JVM is not (directly) affected by the large heap required to hold the query log.

Java allocation : allocating objects from a pre-existing/allocated pool

In a Java program when it is necessary to allocate thousands of similar-size objects, it would be better (in my mind) to have a "pool" (which is a single allocation) with reserved items that can be pulled from when needed. This single large allocation wouldn't fragment the heap as much as thousands of smaller allocations.
Obviously, there isn't a way to specifically point an object reference to an address in memory (for its member fields) to set up a pool. Even if the new object referenced an area of the pool, the object itself would still need to be allocated. How would you handle many allocations like this without resorting to native OS libraries?

You could try using the Commons Pool library.
That said, unless I had proof the JVM wasn't doing what I needed, I'd probably hold off on optimizing object creation.

Don't worry about it. Unless you have done a lot of testing and analysis on the actual code being run and know that it is a problem with garbage collection and that the JVM isn't doing a good enough job, spend your time elsewhere.

If you are building an application, where a predictable response time is very important, then pooling of objects, no matter how small they are will pay you dividends. Again, pooling is also a factor of how big of a data set you are trying to pool and how much physical memory your machine has.
There is ample proof on the web that shows that object pooling, no matter how small the objects are, is beneficial for application performance.
There are two levels of pooling you could do:
Pooling of the basic objects such as Vectors, which you retrieve from the pool each time you have to use the vector to form a map or such.
Have the higher level composite objects pooled, which are most commonly used, pooled.
This is generally an application design decision.
Also, in a multi-threaded application, you would like to be sensitive about how many different threads are going to be allocating and returning to the pool. You certainly do not want your application to be bogged down by contention - especially if you are dealing with thousands of objects at the same time.

#Dave and Casey, you don't need any proof to show that contiguous memory layout improves Cache efficiency, which is the major bottleneck in most OOP apps that need high performance but follow a "too idealistic" OOP-design trajectory.
People often think of the GC as the culprit causing low performance in high performance Java applications and after fixing it, just leave it at that, without actually profiling memory-behavior of the application. Note though that un-cached memory instructions are inherently more expensive than arithmetic instructions (and are getting more and more expensive due to the memory access <-> computation gap). So if you care about performance, you should certainly care about memory management.
Cache-aware, or more general, data-oriented programming, is the key to achieving high performance in many kinds of applications, such as games, or mobile apps (to reduce power consumption).
Here is a SO thread on DOP.
Here is a slideshow from the Sony R&D department that shows the usefulness of DOP as applied to a playstation game (high performance required).
So how to solve the problem that Java, does not, in general allow you to allocate a chunk of memory? My guess is that when the program is just starting, you can assume that there is very little internal fragmentation in the already allocated pages. If you now have a loop that allocates thousands or millions of objects, they will probably all be as contiguous as possible. Note that you only need to make sure that consecutive objects stretch out over the same cacheline, which in many modern systems, is only 64 bytes. Also, take a look at the DOP slides, if you really care about the (memory-) performance of your application.
In short: Always allocate multiple objects at once (increase temporal locality of allocation), and, if your GC has defragmentation, run it beforehand, else try to reduce such allocations to the beginning of your program.
I hope, this is of some help,
-Domi
PS: #Dave, the commons pool library does not allocate objects contiguously. It only keeps track of the allocations by putting them into a reference array, embedded in a stack, linked list, or similar.

Java Profiling, Performance Tuning and Memory Profiling exercises

I am about to conduct a workshop profiling, performance tuning, memory profiling, memory leak detection etc. of java applications using JProfiler and Eclipse Tptp.
I need a set of exercises that I could offer to participants where they can:
Use the tool to to profile the discover the problem: bottleneck, memory leak, suboptimal code etc. I am sure there is plenty experience and real-life examples around.
Resolve the problem and implement optimized code
Demonstrate the solution by performing another session of profiling
Ideally, write the unit test that demonstrates the performance gain
Problems nor solutions should not be overly complicated; it should be possible to resolve them in matter of minutes at best and matter of hours at worst.
Some interesting areas to exercise:
Resolve memory leaks
Optimize loops
Optimize object creation and management
Optimize string operations
Resolve problems exacerbated by concurrency and concurrency bottlenecks
Ideally, exercises should include sample unoptimized code and the solution code.

I try to find real life examples that I've seen in the wild (maybe slightly altered, but the basic problems were all very real). I've also tried to cluster them around the same scenario, so you can build up a session easily.
Scenario: you have a time consuming function that you want to do many times for different values, but the same values may pop up again (ideally not too long after it was created). A good and simple example is url-web page pairs that you need to download and process (for the exercise it should be probably simulated).
Loops:
You want to check if any of a set of words pops up in the pages. Use your function in a loop, but with the same value, pseudo code:
for (word : words) {
checkWord(download(url))
}
One solution is quite easy, just download the page before the loop.
Other solution is below.
Memory leak:
simple one: you can also solve your problem with a kind of cache. In the simplest case you can just put the results to a (static) map. But if you don't prevent it, its size will grow infinitely -> memory leak.
Possible solution: use an LRU map. Most likely performance will not degrade too much, but the memory leak should go away.
trickier one: say you implement the previous cache using a WeakHashMap, where the keys are the URLs (NOT as strings, see later), values are instances of a class that contain the URL, the downloaded page and something else. You may assume that it should be fine, but in fact it is not: as the value (which is not weakly referenced) has a reference to the key (the URL) the key will never be eligible to clean up -> nice memory leak.
Solution: remove the URL from the value.
Same as before, but the urls are interned strings ("to save some memory if we happen to have the same strings again"), value does not refer to this. I did not try it, but it seems to me that it would also cause a leak, because interned Strings can not be GC-ed.
Solution: do not intern, which will also lead to the advice that you must not skip: don't do premature optimization, as it is the root of all evil.
Object creation & Strings:
say you want to display the text of the pages only (~remove html tags). Write a function that does it line by line, and appends it to a growing result. At first the result should be a string, so appending will take a lot of time and object allocation. You can detect this problem from performance point of view (why appends are so slow) and from object creation point of view (why we created so many Strings, StringBuffers, arrays, etc).
Solution: use a StringBuilder for the result.
Concurrency:
You want to speed the whole stuff up by doing downloading/filtering in parallel. Create some threads and run your code using them, but do everything inside a big synchronized block (based on the cache), just "to protect the cache from concurrency problems". Effect should be that you effectively use just one thread, as all the others are waiting to acquire the lock on the cache.
Solution: synchronize only around cache operations (e.g. use `java.util.collections.synchronizedMap())
Synchronize all tiny little pieces of code. This should kill performance, probably prevent normal parallel execution. If you are lucky/smart enough you can come up with a dead lock also.
Moral of this: synchronization should not be an ad hoc thing, on an "it will not hurt" basis, but a well thought thing.
Bonus exercise:
Fill up your cache at the beginning and don't do too much allocation afterward, but still have a small leak somewhere. Usually this pattern is not too easy to catch. You can use a "bookmark", or "watermark" feature of the profiler, which should be created right after the caching is done.

Don't ignore this method because it works very well for any language and OS, for these reasons. An example is here. Also, try to use examples with I/O and significant call depth. Don't just use little cpu-bound programs like Mandelbrot. If you take that C example, which isn't too large, and recode it in Java, that should illustrate most of your points.
Let's see:
Resolve memory leaks.
The whole point of a garbage collector is to plug memory leaks. However, you can still allocate too much memory, and that shows up as a large percent of time in "new" for some objects.
Optimize loops.
Generally loops don't need to be optimized unless there's very little done inside them (and they take a good percent of time).
Optimize object creation and management.
The basic approach here is: keep data structure as simple as humanly possible. Especially stay away from notification-style attempts to keep data consistent, because those things run away and make the call tree enormously bushy. This is a major reason for performance problems in big software.
Optimize string operations.
Use string builder, but don't sweat code that doesn't use a solid percent of execution time.
Concurrency.
Concurrency has two purposes.
1) Performance, but this only works to the extent that it allows multiple pieces of hardware to get cranking at the same time. If the hardware isn't there, it doesn't help. It hurts.
2) Clarity of expression, so for example UI code doesn't have to worry about heavy calculation or network I/O going on at the same time.
In any case, it can't be emphasized enough, don't do any optimization before you've proved that something takes a significant percent of time.

I have used JProfiler for profiling our application.But it hasn't been of much help.Then I used JHat.Using JHat you cannot see the heap in real time.You have to take a heap dump and then analyse it. Using the OQL(Object Query Language) is a good technique to find heap leaks.

Object Pooling in Java

What are the pro's and con's of maintaining a pool of frequently used objects and grab one from the pool instead of creating a new one. Something like string interning except that it will be possible for all class objects.
For example it can be considered to be good since it saves gc time and object creation time. On the other hand it can be a synchronization bottleneck if used from multiple threads, demands explicit deallocation and introduces possibility of memory leaks. By tying up memory that could be reclaimed, it places additional pressure on the garbage collector.

First law of optimization: don't do it. Second law: don't do it unless you actually have measured and know for a fact that you need to optimize and where.
Only if objects are really expensive to create, and if they can actually be reused (you can reset the state with only public operations to something that can be reused) it can be effective.
The two gains you mention are not really true: memory allocation in java is free (the cost was close to 10 cpu instructions, which is nothing). So reducing the creation of objects only saves you the time spent in the constructor. This can be a gain with really heavy objects that can be reused (database connections, threads) without changing: you reuse the same connection, the same thread.
GC time is not reduced. In fact it can be worse. With moving generational GCs (Java is, or was up to 1.5) the cost of a GC run is determined by the number of alive objects, not by the released memory. Alive objects will be moved to another space in memory (this is what makes memory allocation so fast: free memory is contiguous inside each GC block) a couple of times before being marked as old and moved into the older generation memory space.
Programming languages and support, as GC, were designed keeping in mind the common usage. If you steer away from the common usage in many cases you may end up with harder to read code that is less efficient.

Unless the object is expensive to create, I wouldn't bother.
Benefits:
Fewer objects created - if object creation is expensive, this can be significant. (The canonical example is probably database connections, where "creation" includes making a network connection to the server, providing authentication etc.)
Downsides:
More complicated code
Shared resource = locking; potential bottleneck
Violates GC's expectations of object lifetimes (most objects will be shortlived)
Do you have an actual problem you're trying to solve, or is this speculative? I wouldn't think about doing something like this unless you've got benchmarks/profile runs showing that there's a problem.

Pooling will mean that you, typically, cannot make objects immutable. This leads to defencive copying so you ultimately wind up making many more copies than you would if you just made a new immutable object.
Immutability is not always desirable, but more often than not you will find that things can be immutable. Making them not immutable so that you can reuse them in a pool is probably not a great idea.
So, unless you know for certain that it is an issue don't bother. Make the code clear and easy to follow and odds are it will be fast enough. If it isn't then the fact that the code is clear and easy to follow will make it easier to speed it up (in general).

Don't.
This is 2001 thinking. The only object "pool" that is still worth anything now a days is a singleton. I use singletons only to reduce the object creation for purposes of profiling (so I can see more clearly what is impacting the code).
Anything else you are just fragmenting memory for no good purpose.
Go ahead and run a profile on creating a 1,000,000 objects. It is insignificant.
Old article here.

It entirely depends on how expensive your objects are to create, compared to the number of times you create them... for instance, objects that are just glorified structs (e.g. contain only a couple of fields, and no methods other than accessors) can be a real use case for pooling.
A real life example: I needed to repetitively extract the n highest ranked items (integers) from a process generating a great number of integer/rank pairs. I used a "pair" object (an integer, and a float rank value) in a bounded priority queue. Reusing the pairs, versus emptying the queue, throwing the pairs away, and recreating them, yielded a 20% performance improvement... mainly in the GC charge, because the pairs never needed to be reallocated throughout the entire life of the JVM.

Object pools are generally only a good idea for expensive object like database connections. Up to Java 1.4.2, object pools could improve performance but as of Java 5.0 object pools where more likely to harm performance than help and often object pools were removed to improve performances (and simplicity)

I agree with Jon Skeet's points, if you don't have a specific reason to create a pool of objects, I wouldn't bother.
There are some situations when a pool is really helpful/necessary though. If you have a resource that is expensive to create, but can be reused (such as a database connection), it might make sense to use a pool. Also, in the case of database connections, a pool is useful for preventing your apps from opening too many concurrent connections to the database.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.