I am currently working on a program which requires a preprocessing step: filling multidimensional arrays with around 5765760*2 values.
My issue is that I have to run this preprocessing every time before I actually get to test the data and it takes around 2 minutes.
I don't want to have to wait 2 minutes each time I run a test, but I also don't want to store the values in a file.
Is there a way to store the values in a temporary memory rather than actually outputting them into a file?
I think what you are asking for translates to: "can I make my JVM write data to some place in memory so that another JVM instance can later on read from there?"
And the simple answer is: no, that is not possible.
When the JVM dies, the memory consumed by the JVM is returned to the OS. That stuff is gone.
So even the infamous sun.misc.Unsafe with "direct" memory access does not allow you to do that.
The one thing that would work: if your OS is Linux, you could create a RAM disc. And then you write your file to that.
So, yes, you store your data in a file, but the file resides in memory; thus reading/writing is much faster compared to disk IO. And that data stays available as long as you don't delete the RAM disc or restart your OS.
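For illustration, here is a minimal sketch of that approach, assuming /dev/shm (the tmpfs mount most Linux distributions provide out of the box) and a hypothetical flat double[] holding the preprocessed values; on the next run, load() replaces the 2-minute preprocessing:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class RamDiskCache {
    // /dev/shm is a tmpfs (RAM-backed) mount on most Linux systems
    private static final File CACHE = new File("/dev/shm/preprocessed.bin");

    static boolean isCached() {
        return CACHE.exists();
    }

    static void save(double[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(CACHE)))) {
            out.writeInt(data.length);
            for (double d : data) {
                out.writeDouble(d);
            }
        }
    }

    static double[] load() throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(CACHE)))) {
            double[] data = new double[in.readInt()];
            for (int i = 0; i < data.length; i++) {
                data[i] = in.readDouble();
            }
            return data;
        }
    }
}
```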
On the other hand, when your OS is Linux and you have enough RAM (a few GB should do!), then you should first test whether an "ordinary" file on disk isn't already good enough.
You see, modern OSes do a lot of things in the background. It might look like "writing to disk", but in the end, Linux keeps the file in its page cache, so subsequent reads are served from memory anyway.
So, before you spend hours on bizarre solutions - measure the impact of writing to disk!
Run the preprocessing, save the result using a data structure of your choice, and keep your program running until you need the result.
Can it be stored in memory? Well, yes, it's already in memory! The obvious solution is to keep your program running. You can put your program in a loop with an option to repeat - "enter Y to test again, or N to quit." Then, your program can skip the preprocessing if it's already been done. Until you exit the program, you can do this as many times as you like.
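A minimal sketch of that loop; preprocess() and runTest() are placeholders for the real code:

```java
import java.util.Scanner;

public class TestHarness {
    public static void main(String[] args) {
        double[][] data = preprocess();   // the 2-minute step runs exactly once
        Scanner in = new Scanner(System.in);
        String answer;
        do {
            runTest(data);                // reuses the data already in memory
            System.out.print("Enter Y to test again, or N to quit: ");
            answer = in.nextLine().trim();
        } while (answer.equalsIgnoreCase("Y"));
    }

    static double[][] preprocess() {
        // placeholder for the real preprocessing that fills the arrays
        return new double[1][1];
    }

    static void runTest(double[][] data) {
        // placeholder for the actual test
    }
}
```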
Another thing you might consider is whether your code can be made more efficient. If your code takes less time to run, it won't be quite so annoying to wait for it. In general, if something can be done outside a loop, don't do it inside a loop. If you have an instruction being run five million times, that can add up. If this is homework, you'll likely spend more time making it efficient than you'd spend waiting for it - however, this isn't wasted time, as you're practicing skills you may need later. Obviously, I can't give specific suggestions without the code (and making specific code more efficient would probably be better suited for the Code Review Stack Exchange site).
Related
I am wondering whether there is a way to optimize reading from disk in Java. For example, I want to print the contents of all text files in some directory, but uppercased first. I can create another thread to uppercase them, but can I also optimize reading by adding another thread (or threads) to read files too? I mean two, three or more threads reading different files from disk. Is there some optimization for doing this or not? I hope that I have explained the problem clearly.
I want to print the contents of all text files
This is most likely your bottleneck. If not, you should focus on finding what your bottleneck is, as optimising anything else is likely to complicate your code for no benefit.
I can create another thread to uppercase them,
You can, though passing the work to another thread could be more expensive than making it uppercase, depending on how you do this.
can I also optimize reading by adding another thread (or threads) to read files too?
Possibly. How many disks do you have? If you have one disk, it can usually only do one thing at a time.
I mean two, three or more threads reading different files from disk.
Most desktop drives can only do one operation at a time.
Is there some optimization for doing this or not?
Yes, but as I said, until you know what your bottleneck is, it's hard to jump to a solution.
I can create another thread to uppercase them
That's actually going in the right direction, but simply making all letters uppercase doesn't take enough time to really matter unless you're processing really large chunks of the file.
Multithreading helps because the standard single-threaded model of read-then-process means you're either reading data or processing it, when you could be doing both at the same time.
For example, you could be creating a series of highly compressed (say, JPEG2000 because it's so CPU intensive) images from a large video stream file. You could have one thread reading frames from the stream, placing them into a queue to process, and then have N threads each processing a frame into an image.
You'd tune the number of threads reading data and the number of threads processing data to keep both your disks and CPUs maximally busy without excess contention.
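Applied to the uppercase-files question above, that pattern might look like this sketch: one reader thread feeding a BlockingQueue, one worker per core doing the CPU work. The directory name and the poison-pill marker are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class UppercasePipeline {
    private static final String POISON = "\u0000EOF"; // made-up end-of-stream marker

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(64);
        int workers = Runtime.getRuntime().availableProcessors();

        // CPU-bound consumers: uppercase and print
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    String text;
                    while (!(text = queue.take()).equals(POISON)) {
                        System.out.println(text.toUpperCase());
                    }
                } catch (InterruptedException ignored) {
                }
            });
            pool[i].start();
        }

        // one IO-bound producer: a single spinning disk serves sequential reads best
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get("somedir"), "*.txt")) {
            for (Path file : dir) {
                queue.put(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            }
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON); // one pill per worker so they all shut down
        }
        for (Thread t : pool) {
            t.join();
        }
    }
}
```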
There are some cases where you can use multiple threads to read from a single file to get better performance. But you need a system designed from the ground up to do that. You need lots of disks (less so if they're SSDs), a pretty substantial IO infrastructure along with a system that has a lot of IO bandwidth, and then you need a file system that can handle multiple simultaneous access to a single file. Then the code you have to write to get better performance from reading using more than one thread has to match things like the physical layout of your files on disk.
That works best if you're doing lots of random reads from a file spread over multiple devices. Like a large, high-powered database server.
For example, let's say I have a huge data file spread over four or five disks (or even RAID arrays), with the file spread out over the disks in 64KB chunks. A handful of threads doing 64KB reads would be ideal to read or write such a file in a random-access mode. Let's say everything is really fast and you can read or write 1 GB/sec from such a file.
But if you turn around and just try to copy that data in a stream, you can still use multiple threads to get maximum performance - say 1 GB/sec - but if you just used a single thread to do read() calls in 1 MB chunks, you'd probably get 950 MB/sec - 95% of the maximum multithreaded read performance.
I've actually benchmarked such systems, and most of the time multithreaded IO isn't worth the trouble unless you've invested a lot of money in your hardware and software (open-source file systems tend not to do this very well - you need to get into the realm of IBM's GPFS and Oracle's (née LSC's, then Sun's) QFS) and you know exactly what you're doing when you set it up.
I wrote a program to read the content of a simple 1GB file using a simple buffered reader.
I recorded the time from start to finish to calculate the time taken.
An interesting observation I have made is that on the first run, the reading speed came out to about 80~90MB/s, but when I ran it a second time, it read considerably faster, at around 320MB/s.
I guess this might be a result of memory caching, but I don't know how to avoid it.
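In sketch form, such a harness looks something like this (big.txt is a placeholder for the 1GB file):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadBenchmark {
    public static void main(String[] args) throws IOException {
        long chars = 0;
        long start = System.nanoTime();
        try (BufferedReader reader = new BufferedReader(new FileReader("big.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // +1 roughly accounts for the stripped newline;
                // char count matches byte count only for single-byte encodings
                chars += line.length() + 1;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Read ~%.0f MB in %.2f s (%.1f MB/s)%n",
                chars / 1e6, seconds, chars / 1e6 / seconds);
    }
}
```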
If caching is the problem, you should be able to use the method detailed here to clear your cache (typically by writing to /proc/sys/vm/drop_caches after a sync), assuming you're on a Linux system. This method requires super user access.
I guess what you want to do is compare the speed of readLine against some other methods of reading a 1GB file, and you are getting inconsistent results from running readLine a couple of times?
Perhaps randomize the file contents, or read different files.
I need to load a file into memory. Before I do that I want to make sure that there is enough memory in my VM left. If not I would like to show an error message. I want to avoid the OutOfMemory exception.
Approach:
Get filesize of my file
Use Runtime.getRuntime().freeMemory()
Check if it fits
Would this work or do you have any other suggestions?
The problem with any "check first then do" strategy is that there may be changes between the "check" and the "do" that render the entire thing useless.
A "try then recover" strategy is almost always a better idea and, unfortunately, that means trying to allocate the memory and handling the exception. Even if you do the "check first" option, you should still code for the possibility that the allocation may fail.
A classic example of that would be checking that a file exists before opening it. If someone were to delete the file between your check and open, you'll get an exception regardless of the fact the file was there very recently.
Now, I don't know why you have an aversion to catching the exception, but I'd urge you to rethink it. Since Java relies heavily on exceptions, they're generally accepted as a good way to handle things you don't actually control (such as opening files or allocating memory).
If, as it seems from your comments, you're worried about the out-of-memory affecting other threads, that shouldn't be the case if you try to allocate one big area for the file. If you only have 400M left and you ask for 600, your request will fail but you should still have that 400M left.
It's only if you nickel-and-dime your way up to the limit (say, trying 600 separate 1M allocations) that other threads would start to feel the pinch after you'd done about 400. And that would only happen if you didn't release those 400 in a hurry.
So perhaps a possibility would be to work out how much space you need and make sure you allocate it in one hit. Either it'll work or it won't. If it doesn't, your other threads will be no worse off.
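A minimal sketch of that one-shot strategy; the names are illustrative, and the int cast assumes the file fits in an int-indexed array (under 2 GB):

```java
import java.io.File;

public class LoadWholeFile {
    // "try then recover": make one big request and handle failure,
    // instead of pre-checking freeMemory()
    static byte[] allocateFor(File file) {
        try {
            // single large allocation: either it all fits or nothing is consumed
            return new byte[(int) file.length()];
        } catch (OutOfMemoryError e) {
            System.err.println("Not enough heap for " + file.getName()
                    + " (" + file.length() + " bytes); try a larger -Xmx.");
            return null;
        }
    }
}
```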
I suppose you could use your suggested method to try and make sure the allocation left a bit of space for the other threads (say 100M or 10% or something like that), if you're really concerned. But I'd just go ahead and try anyway. If your other threads can't do their work because of low memory, there's ample precedent to tell the user to provide more memory for the VM.
Personally, I would advise against loading a massive file directly into memory; rather, try to load it in chunks or use some sort of temp file to store intermediate data.
You may want to look at the FileChannel.map(FileChannel.MapMode, long, long) method. This allows mapping a file (think POSIX mmap) without filling the heap. The operating system will (hopefully successfully) take care of the memory for you.
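A short sketch of the mapping approach (bigfile.dat is a placeholder; note that a single map is limited to 2 GB, so larger files need several mapped windows):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("bigfile.dat"); // placeholder file name
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // maps the file into virtual memory; pages are faulted in by the OS
            // on demand rather than copied onto the Java heap
            // (one map is capped at Integer.MAX_VALUE bytes)
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long lines = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') {
                    lines++; // example access: scan the file without heap copies
                }
            }
            System.out.println(lines + " lines");
        }
    }
}
```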
Hello, I am using JDK 7 with 6GB of RAM and an Intel Core i5 processor. I have Java code with an ArrayList of size more than 6000, where each element contains 12 double values. The processing speed has decreased very much, and it now takes around 20 minutes to run the entire code.
What the code does is as follows:
There are some 4500 iterations happening due to nested for loops, and in each iteration a file of 400 KB is read, some processing happens, and some values are stored in the ArrayList.
Once the ArrayList is ready, its values are written to another file through CSVWriter. Then I use a JTable, and the JTable also needs the ArrayList to refer to some of its values, so basically I can't clear this ArrayList.
I have given the values for heap memory as follows
-Xms1024M -Xmx4096M
I am new to programming and rather confused as to what I should do. Can I increase the heap size beyond this? Will that help? My senior suggests using hard disk space for storing the ArrayList elements or for the processing, but I doubt that is possible.
Please help. Any help will be appreciated.
6000 elements of 12 doubles at 8 bytes each is only about 576 KB - not a significant amount of memory.
If your program's speed is getting slower each time until it eventually crashes with an OutOfMemoryError, then it's possible that you have a coding error that is causing a memory leak.
This question has some examples of memory leak causes in Java.
Using VisualVM or some manual logging will help to identify the issue. Static code analysers like FindBugs or PMD may also help.
Heap size isn't your issue. When you have one of those, you'll see an OutOfMemoryError.
Usually what you do when you encounter performance issues like this is you profile, either with something like VisualVM or by hand, using System.nanoTime() to track which part of your code is the bottleneck. From there, you make sure you're using appropriate data structures, algorithms, etc., and then see where you can parallelize your code.
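By hand, that can be as simple as bracketing each suspect stage with System.nanoTime(); the stage methods below are placeholders for the real reading and processing code:

```java
public class StageTiming {
    public static void main(String[] args) {
        long t0 = System.nanoTime();
        readFiles();                  // placeholder: the 4500 file reads
        long t1 = System.nanoTime();
        processValues();              // placeholder: the per-iteration processing
        long t2 = System.nanoTime();
        System.out.printf("read: %d ms, process: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }

    static void readFiles() { /* ... */ }

    static void processValues() { /* ... */ }
}
```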
I guess you're leaking the JTables somehow. This can easily happen with Listeners, TableSorters, etc. A proper tool would tell you, but the better way is IMHO to decompose the problem.
Either the GUI part is what causes the trouble, or it isn't. Ideally, the remaining program should be completely independent of the GUI, so you can run it in isolation and you'll see what happens.
My senior suggests using hard disk space for storing the ArrayList elements or for the processing, but I doubt that is possible.
Many things are possible but few make sense. If you're really storing just 12 × 6000 doubles, then it makes absolutely no sense. Even with the high overhead of boxed Double, it's just a few megabytes.
Another idea: If all the data you need fits into memory, then you can try to read it all upfront. This ensures you're not storing multiple copies.
To debug our Android code we have put in System.out.println(string) calls, which let us know how many times a function has been called. The other method would have been to keep a counter, increment it after every function call, and then print its final value at the end with System.out.println(...). (Practically, in my application the function will be called thousands of times.)
My question is: in terms of CPU resources and clock cycles, which one is lighter: the increment operation or System.out.println?
Incrementing is going to be much, much more efficient - especially if you've actually got anywhere for that output to go. Think of all the operations required by System.out.println vs incrementing a variable. Of course, whether the impact will actually be significant is a different matter - and if your method is already doing a lot of work, then a System.out.println call may not actually make much difference. But if you just want to know how many times it was called, then keeping a counter makes more sense than looking through the logs anyway, IMO.
I would recommend using AtomicLong or AtomicInteger instead of just having a primitive variable, as that way you get simple thread-safety.
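A minimal sketch; doWork() stands in for the function being counted:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CallCounter {
    private static final AtomicLong CALLS = new AtomicLong();

    static void doWork() {
        CALLS.incrementAndGet(); // lock-free, thread-safe increment
        // ... the actual work of the function being counted ...
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            doWork();
        }
        System.out.println("doWork() was called " + CALLS.get() + " times");
    }
}
```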
Incrementing will be a lot faster in terms of clock cycles. Assuming the increment is fairly close to a hardware increment it would only take a couple of clock cycles. That means you can do millions every second.
On the other hand, System.out.println has to convert characters, write to stdout, call out to the OS, and so on. Each of these steps will take many, many clock cycles.
Coming back to your original question, if you're looking at how many times a function gets called you could try and run a profiler - there are various desktop and android solutions available. That way you wouldn't need to pollute your code with counting/printing, and you can keep your production code lean.
Again, thinking a little further: why would you like to know the exact number of times a function is called? If you're concerned about a defect, consider writing some unit tests that will prove exactly how many times a function gets called. If you're concerned about performance, perhaps look at load test techniques in combination with your profiler.