Java program runtime is too fast? Issue with memory

So I am running some simulations that require sample datasets. For the sake of simplicity I am using this Lorem Ipsum generator: http://loremipsum.sourceforge.net/. I set a test parameter called DATASIZE that controls the number of words or paragraphs the generator creates, and I use the generated data to create an "input" and an "output" hash. The output hash is computed from slightly different data. For example:
String input = hash(new LoremIpsum().getWords(DATASIZE));
String output = hash(new LoremIpsum().getWords(DATASIZE - 2));
My question is: does Java keep the first data set in memory and then slightly modify it to produce the output quickly? Maybe I was just pessimistic about the runtime, but it seems very small - virtually zero as measured with System.currentTimeMillis(). Could it be the jar?
I also noticed something odd in my output. I am creating several objects that store this input and output hash, and on some of them the reported runtime is 16 ms; otherwise it is 0. Is this something with memory, or just shoddy code?

It uses a StringBuilder, so the answer to your question is no: there is no reuse/caching in getWords(..) - see https://sourceforge.net/p/loremipsum/code/HEAD/tree/trunk/src/main/java/de/svenjacobs/loremipsum/LoremIpsum.java
Having said that, if you pass a really large number - say 100000 - then you may see a difference. I checked using my all-powerful MacBook Pro:
public static void main(String[] args) {
    LoremIpsum loremipsum = new LoremIpsum();
    long start;
    int number = 100000;
    for (int i = 0; i < 5; i++) {
        start = System.currentTimeMillis();
        loremipsum.getWords(number);
        System.out.println("getWords():" + (System.currentTimeMillis() - start));
    }
}
Output in ms
getWords():11
getWords():7
getWords():5
getWords():4
getWords():4
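A side note on the timings in the question: System.currentTimeMillis() ticks coarsely (roughly 15-16 ms on Windows), which is consistent with readings of exactly 0 or 16. For micro-measurements, System.nanoTime() gives much finer resolution. A minimal sketch, reusing hash and DATASIZE from the question:
// hedged sketch: nanoTime offers sub-millisecond resolution, unlike
// currentTimeMillis(), whose ~15-16 ms tick on Windows would explain
// readings of exactly 0 or 16
long start = System.nanoTime();
String input = hash(new LoremIpsum().getWords(DATASIZE)); // hash, DATASIZE as in the question
long elapsedMicros = (System.nanoTime() - start) / 1_000;
System.out.println("hash took " + elapsedMicros + " µs");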

Related

What is the quickest way to search in a large directory

I'm implementing a file transfer tool in Java that will transfer some number 'X' of files, where X is user-configurable, from one SFTP server to another. The transfer bit works, but it can potentially pick up duplicate files (the logic to prevent this is not yet in place).
Now, the source SFTP server receives several hundred thousand files every day, and I'm not able to figure out how to perform a quick search to avoid duplicate transfers across this behemoth list of files on the source server.
Please also suggest whether there's a better, faster way to achieve this without performing an expensive search operation. If searching through file names is the only way to go, what search paradigm should I use?
Thanks.
6M files is not that much memory. Experimentally, adding the string representations of the first 6M natural numbers to a HashSet<String> works with -Xmx1G and fails with -Xmx512M, and it only takes 2.5 s on my machine (Java 8, 64-bit). Using a HashSet is therefore definitely feasible.
You can drastically lower the memory footprint, at the cost of speed, by using the disk to store an index. In that case you may be better off using an actual database - databases are very well optimized for indexing and searching collections too large to fit in memory.
The code that I used for testing:
import java.util.*;

public class C {
    public static void main(String... args) {
        HashSet<String> hs = new HashSet<>();
        long t = System.currentTimeMillis();
        for (int i = 0; i < 6 * 1000 * 1000; i++) {
            hs.add("" + i); // add returns "false" if the key is already present
        }
        System.out.println("Added " + hs.size() + " keys in "
                + (System.currentTimeMillis() - t));
    }
}
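Applied to the SFTP question, the same HashSet idea might look like the following sketch; listSourceFileNames() and transfer() are hypothetical placeholders, not a real SFTP API:
import java.util.HashSet;
import java.util.Set;

// skip files whose names have already been transferred;
// listSourceFileNames() and transfer() are hypothetical placeholders
Set<String> transferred = new HashSet<>();
for (String name : listSourceFileNames()) {
    if (transferred.add(name)) { // add() returns false for duplicates
        transfer(name);
    }
}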

Java: performance and memory improvement code comparison from CodeChef

So today I solved a very simple problem from CodeChef using Java, and my answer was accepted. My code was:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class INTEST {
    public static void main(String args[]) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String input = reader.readLine();
        int n = Integer.parseInt(input.split(" ")[0]);
        long k = Long.parseLong(input.split(" ")[1]);
        int count = 0;
        String element;
        for (int i = 0; i < n; i++) {
            element = reader.readLine();
            if (Long.parseLong(element) % k == 0) {
                count++;
            }
        }
        System.out.println(count);
    }
}
The online judge reported:
Running Time : 0.58 Second
Memory : 1340.5M
So I looked at some other solutions for the same problem (sorted by time) and found another solution by the user indontop:
public class Main {
    public static void main(String... args) throws Exception {
        byte b;
        byte barr[] = new byte[1028];
        int r = 0, n = 0, k = 0;
        while ((r = System.in.read()) != ' ') {
            n = n * 10 + r - '0';
        }
        //System.out.println(n);
        while ((r = System.in.read()) != '\n') { //change
            k = k * 10 + r - '0';
        }
        //System.out.println(k);
        //System.in.read(); // remove
        n = 0;
        int count = 0;
        while ((r = System.in.read(barr, 0, 1028)) != -1) {
            for (int i = 0; i < barr.length; i++) {
                b = barr[i];
                if (b != '\n') { //change
                    n = n * 10 + b - '0';
                } else {
                    // i++; //remove
                    if (n % k == 0) count++;
                    n = 0;
                }
            }
        }
        System.out.println(count);
    }
}
The execution time and memory for the above code:
Running Time : 0.13 Second
Memory : 0M
I wonder how the user was able to achieve this much performance and memory gain on such a simple problem.
I don't understand the logic behind this code; can anyone explain it, and also explain what is wrong with my code?
Thank You.
How indontop achieved a better memory footprint
Basically, indontop's program reads bytes directly from the input stream, without going through readers or reading lines. The only structure it allocates is a single array of 1028 bytes; no other objects are created directly.
Your program, on the other hand, reads lines from a BufferedReader. Each such line is allocated in memory as a String. But your program is rather short, so it's highly likely that the garbage collector never kicks in, and hence none of the lines that were read are cleared from memory.
What indontop's program does
It reads the input byte by byte and parses the numbers directly from it, without using Integer.parseInt or similar methods. The characters '0' through '9' can be converted to their respective values (0-9) by subtracting '0' from each of them, and a number like "123" is accumulated digit by digit as 1*10*10 + 2*10 + 3.
The bottom line is that the user implements the basic algorithm for interpreting numbers without ever having the full string in memory.
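A minimal standalone sketch of that accumulation loop, with an illustrative input string:
// digit-accumulation parse: "123" -> ((1*10)+2)*10+3 = 123
int n = 0;
for (char c : "123".toCharArray()) {
    n = n * 10 + (c - '0'); // '0'..'9' map to 0..9 by subtracting '0'
}
System.out.println(n); // prints 123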
Is indontop's program better than yours?
My answer to this is no. First, his program is not entirely correct: he reads into an array of bytes but never checks how many bytes were actually read. The last read can leave stale bytes from the previous read in the array, which may produce wrong output; it is sheer luck that this didn't happen on his runs. The fix is sketched below.
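A sketch of the fix, reusing the variables from indontop's code: use the return value r as the loop bound so stale bytes are never reprocessed.
while ((r = System.in.read(barr, 0, barr.length)) != -1) {
    for (int i = 0; i < r; i++) { // only the r bytes actually read
        b = barr[i];
        if (b != '\n') {
            n = n * 10 + b - '0';
        } else {
            if (n % k == 0) count++;
            n = 0;
        }
    }
}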
Now, the rest of this is opinion-based:
Your program is much more readable than his. You have meaningful variable names; he doesn't. You use well-known methods; he doesn't. Your code is concise; his is verbose and repeats the same code several times.
He is reinventing the wheel: Java already has good number-parsing methods, so there is no need to rewrite them.
Reading data byte by byte is inefficient as far as system calls are concerned, and improves the score only in artificial environments like CodeChef and similar sites.
Runtime efficiency
You really can't tell by looking at a single run. These programs run on a shared server that does lots of other things, and too many factors affect performance. Benchmarking is a complicated topic. The numbers you see? Just ignore them.
Premature Optimization
In real world programs, memory is garbage collected when it's needed. Memory efficiency should be improved only if it's something very obvious (don't allocate an array of 1000000 bytes if you only intend to use 1000 of them), or when the program, when running under real conditions, has memory issues.
This is true for the time efficiency as well, but as I said, it's not even clear if his program is more runtime efficient than yours.
Is your program good?
Well, it's not perfect: you are running split twice; it would be better to run it once and store the result in a two-element array, as sketched below. But other than that, it's a good answer to this question.
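A sketch of that small fix to the original code:
// split once and reuse the parts instead of splitting twice
String[] parts = input.split(" ");
int n = Integer.parseInt(parts[0]);
long k = Long.parseLong(parts[1]);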

Reading large files for a simulation (Java crashes with out of heap space)

For a school assignment, I need to create a simulation of memory accesses. First I need to read one or more trace files, each containing memory addresses, one per access. Example:
0 F001CBAD
2 EEECA89F
0 EBC17910
...
where the first integer indicates read/write etc., followed by the hex memory address. With this data I am supposed to run a simulation, so my idea was to parse it into an ArrayList<Trace> (for now I am using Java), with Trace being a simple class containing the memory address and the access type (just a String and an integer). After that I plan to loop through the lists to process them.
The problem is that it runs out of heap space even during parsing. Each trace file is ~200 MB, and I have up to 8, meaning a minimum of ~1.6 GB of data I am trying to "cache". What baffles me is that I am only parsing one file and Java is already using 2 GB according to my task manager...
What is a better way of doing this?
A code snippet can be found at Code Review
The answer I gave on Code Review is the same one you should use here. But since duplication appears to be OK, I'll duplicate the answer here.
The issue is almost certainly the structure of your Trace class and its memory efficiency. You should ensure that instrType and hexAddress are stored in memory-efficient forms. instrType appears to be an int, which is good; just make sure it is declared as an int in the Trace class.
The more likely problem is the size of the hexAddress String. You may not realise it, but Strings are notorious for 'leaking' memory. In this case, you have a line and you think you are just getting the hex string from it... but in reality, the hex string retains the entire line... yeah, really. For example, look at the following code:
import java.util.StringTokenizer;

public class SToken {
    public static void main(String[] args) {
        StringTokenizer tokenizer = new StringTokenizer("99 bottles of beer");
        int instrType = Integer.parseInt(tokenizer.nextToken());
        String hexAddr = tokenizer.nextToken();
        System.out.println(instrType + hexAddr);
    }
}
Now set a break-point in your IDE (I use Eclipse) and run it; you will see that hexAddr contains a char[] for the entire line, with an offset of 3 and a count of 7.
Because of the way String substring and related constructs work, short strings can retain huge amounts of memory (in theory that memory is shared with other strings). As a consequence, you are essentially storing the entire file in memory! (Note: this substring sharing applies to Java up to 7u5; from 7u6 on, substring copies the characters.)
At a minimum, you should change your code to:
hexAddr = new String(tokenizer.nextToken().toCharArray());
But even better would be:
long hexAddr = parseHexAddress(tokenizer.nextToken());
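parseHexAddress is not spelled out in the answer; one plausible implementation, assuming the addresses fit in a long (an 8-digit hex address does):
// hypothetical helper: parse a hex address like "F001CBAD" into a primitive,
// so no String is retained at all
static long parseHexAddress(String hex) {
    return Long.parseLong(hex, 16);
}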
Like rolfl, I answered your question on Code Review. The biggest issue, to me, is reading everything into memory first and then processing it. You need to read a fixed amount, process that, and repeat until finished.
Try using java.nio.ByteBuffer instead of java.util.ArrayList<Trace>; it should also reduce memory usage. For example:
import java.nio.ByteBuffer;

class TraceList {
    // each record: 1 byte for the access type + 4 bytes for the address
    private static final int RECORD_SIZE = 5;
    private final ByteBuffer buffer;

    public TraceList(int maxRecords) {
        // allocate the byte buffer once, up front
        buffer = ByteBuffer.allocate(maxRecords * RECORD_SIZE);
    }

    public void put(byte operationType, int address) {
        // append one record to the byte buffer
        buffer.put(operationType);
        buffer.putInt(address);
    }

    public Trace get(int index) {
        // read one record from the byte buffer by index
        byte type = buffer.get(index * RECORD_SIZE);
        int address = buffer.getInt(index * RECORD_SIZE + 1);
        return new Trace(type, address);
    }
}
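For completeness, a minimal Trace value class matching the get() method above (a sketch; the question's version stores the address as a String, but a primitive is more compact):
// simple value class: one access-type byte plus one 32-bit address
class Trace {
    final byte type;
    final int address;

    Trace(byte type, int address) {
        this.type = type;
        this.address = address;
    }
}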

Existing solution to "smart" initial capacity for StringBuilder

I have a piece of logging- and tracing-related code which is called often throughout the codebase, especially when tracing is switched on. A StringBuilder is used to build the strings, which have a reasonable maximum length, I suppose on the order of hundreds of chars.
Question: Is there an existing library to do something like this:
import java.util.concurrent.atomic.AtomicInteger;

// in reality, StringBuilder is final,
// would have to create a delegated version instead,
// which is quite a big class because of all the append() overloads
public class SmarterBuilder extends StringBuilder {
    private final AtomicInteger capRef;

    SmarterBuilder(AtomicInteger capRef) {
        // optionally save memory at the expense of worst-case resizes:
        // super(capRef.get() * 3 / 4);
        super(capRef.get()); // super(...) must be the first statement
        this.capRef = capRef;
    }

    public void syncCap() {
        // call when the string is fully built
        int cap;
        do {
            cap = capRef.get();
            if (cap >= length()) break;
        } while (!capRef.compareAndSet(cap, length()));
    }
}
To take advantage of this, my logging-related class would have a shared capRef variable with suitable scope.
(Bonus question: I'm curious, is it possible to do syncCap() without looping?)
Motivation: I know the default capacity of StringBuilder is always too small. I could (and currently do) throw in an ad-hoc initial capacity of 100, which still results in resizes in some number of cases, but not always. However, I do not like magic numbers in the source code, and this feature is a case of "optimize once, use in every project".
Make sure you do the performance measurements to confirm you really are getting some benefit for the extra work.
As an alternative to a StringBuilder-like class, consider a StringBuilderFactory. It could provide two static methods: one to get a StringBuilder, and another to be called when you finish building a string, passing it the StringBuilder as an argument so it can record the length. The get method would use the statistics recorded by the other method to choose the initial size.
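A minimal sketch of that factory idea, tracking only the maximum observed length; the class and method names are illustrative:
import java.util.concurrent.atomic.AtomicInteger;

public class StringBuilderFactory {
    private static final AtomicInteger maxSeen = new AtomicInteger(16);

    // choose the initial capacity from the statistics recorded so far
    public static StringBuilder getStringBuilder() {
        return new StringBuilder(maxSeen.get());
    }

    // call when the string is fully built; records its length
    public static void finished(StringBuilder sb) {
        int len = sb.length();
        int cap;
        do {
            cap = maxSeen.get();
            if (cap >= len) return;
        } while (!maxSeen.compareAndSet(cap, len));
    }
}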
There are two ways you could avoid looping in syncCap:
Synchronize.
Ignore failures.
The argument for ignoring failures in this situation is that you only need a random sampling of the actual lengths; if another thread updates the value at the same time, you still get an up-to-date view of the string lengths.
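Under that reasoning, a loop-free syncCap() can be a single best-effort attempt; a sketch as a drop-in replacement for the method in the question:
public void syncCap() {
    // best-effort update: a lost CAS race just drops one sample,
    // which is fine when only a rough view of typical lengths is needed
    int cap = capRef.get();
    if (length() > cap) {
        capRef.compareAndSet(cap, length());
    }
}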
You could store the length of each built string in a statistics array, run your app, and at shutdown take the 90th percentile of the recorded lengths (sort all length values and take the value at position sortedLengths.size() * 0.9).
That way you get an initial StringBuilder size that 90% of your strings will fit into.
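A minimal sketch of that percentile computation; the collection of samples is assumed to happen elsewhere:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LengthStats {
    private final List<Integer> lengths = new ArrayList<>();

    void record(int len) { lengths.add(len); } // call once per built string

    // sort the recorded lengths and take the value at the 90% position;
    // assumes at least one length has been recorded
    int percentile90() {
        List<Integer> sorted = new ArrayList<>(lengths);
        Collections.sort(sorted);
        return sorted.get((int) (sorted.size() * 0.9));
    }
}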
Update
The value could be hard-coded (like Java does with 10 for ArrayList), read from a config file, or calculated automatically during a test phase. But the percentile calculation is not free, so it's best to run your project for some time, measure the 90th percentile on the fly inside the SmartBuilder, output it from time to time, and later change the property file to use that value.
That way you get near-optimal results for each project.
Or, one step further: let your smart builder update the value in the config file from time to time.
But all of this is only worth the effort for data with millions of entries, like digital road maps.

Printing very big BigIntegers

I'm trying to figure out the following issue related to BigInteger in Java 7 x64. I am attempting to raise a number to an extremely high power. The code is below, followed by a description of the problem.
import java.math.BigInteger;

public class Main {
    public static void main(String[] args) {
        // Demo calculation; desired calculation: new BigInteger("4096").pow(800 * 600)
        BigInteger images = new BigInteger("2").pow(15544);
        System.out.println(
                "The number of possible 16 bpc color 800x600 images is: "
                        + images.toString());
    }
}
I am encountering issues printing the result of this operation: when this code executes, it prints the message but not the value of images.toString().
To isolate the problem I started calculating powers of two instead of the desired calculation in the comment on that line. On the two systems I have tested, 2^15544 is the smallest calculation that triggers the problem; 2^15543 works fine.
I'm nowhere close to hitting the memory limit on the host systems, and I don't believe I am close to the VM limit either (in any case, running with the VM arguments -Xmx1024M -Xms1024M has no effect).
After poking around the internet for answers, I have come to suspect that I am hitting a limit in either BigInteger or String related to the maximum array size (Integer.MAX_VALUE) those types use for internal storage. If the problem were in String, I think it would be possible to extend BigInteger and write a print method that spews out a few chars at a time until the entire BigInteger is printed, but I rather suspect the problem lies elsewhere.
Thank you for taking the time to read my question.
The problem is a bug in the Console view in Eclipse.
On my setup, Eclipse (Helios and Juno) can't show a single line longer than 4095 characters without a CRLF. The maximum length can vary depending on your font choice - see below.
Therefore even the following code shows the problem - there's no need for a BigInteger:
StringBuilder str = new StringBuilder();
for (int i = 0; i < 4096; i++) {
    str.append('?');
}
System.out.println(str);
That said, the string is actually printed to the console - you can, for instance, copy it out of it. It is just not shown.
As a workaround, you can enable the 'Fixed width console' setting in the Console preferences, and the string will immediately appear.
The corresponding bugs on Eclipse's bugzilla are:
Display problem in console when a line reaches 4096 characters
Texteditor can't show a line with more than 4095 chars. Limit at 4096 chars.
Long lines are not displayed by editor
According to those, it's a Windows/GTK bug and Eclipse's developers can't do anything about it.
The bug is related to the length of the text in pixels; use a smaller font and you will be able to get more characters in the text before it breaks.
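If changing the console settings is not an option, the asker's own idea of printing in slices also sidesteps the limit; a minimal sketch:
// print a long string in fixed-width chunks so no single console line
// exceeds the ~4095-character limit described above
static void printChunked(String s, int width) {
    for (int i = 0; i < s.length(); i += width) {
        System.out.println(s.substring(i, Math.min(i + width, s.length())));
    }
}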
