Matlab segmentation fault when iterating vector assignment - java

I've been vectorizing some matlab code I'd previously written, and during this process matlab started crashing due to segmentation faults. I narrowed the problem down to a single type of computation: assigning to multiple struct properties.
For example, even self assignment of this form eventually causes a seg fault when executed several thousand times:
[my_class_instance.my_struct_vector.my_property] = my_class_instance.my_struct_vector.my_property;
I initially assumed this must be a memory leak of some sort, so tried printing out java's free memory after every iteration, but this remained fairly constant.
So yeah, completely at a loss now as to why this breaks :-/
UPDATE: the following changes fixes the seg faulting:
temp = [my_class_instance.my_struct_vector];
[temp.my_property] = temp.my_property;
[my_class_instance.my_struct_vector] = temp;
The question is now why this would fix anything. Something about repeated accessing a handle class rather than a struct list perhaps?
UPDATE 2: THE PLOT THICKENS
I've finally replicated the problem and the work around using a dummy program simple enough to post here:
a simple class:
classdef test_class
properties
test_prop
end
end
And a program that makes a bunch of vector assignments with the class, and will always crash.
test_instance = test_class();
test_instance.test_prop = struct('test_field',{1 1});
for i=1:10000
[test_instance.test_prop.test_field] = test_instance.test_prop.test_field;
end
UPDATE 3: THE PLOT THINS
Turns out I found a bug. According to Matlab tech support, repeated vector assignment of class properties simply won't work in R2011a (and presumably in earlier version). He told me it works fine in R2012a, and then mentioned the same workaround I discovered: use a temporary variable.
So yeah...
pretty sure this question ends with that support ticket, but if any daring individuals want to take a shot as to WHY this bug exists at all, I'd definitely still be interested in such an answer. (learning is fun!)

By far the most likely cause is that the operation is internally using self-modifying code. The problem with this is that modern processors have CPU caches, so if you change code in memory, but the code has already been committed to a cache, it will generate a seg fault.
The reason why it is random is because it depends on whether the modified code is in the cache at the time of modification and other factors.
To avoid this the programmer has to be sure to have the code flush the cache before doing a self-modification.

Related

AnyLogic memory error: how to know how much the threshold is exceeded?

I have a lot of road traffic and markup elements, charts, nodes and arcs within my Main agent. When running the simulation it throws the following error:
Description: The code of method _createPersistentElementsBP4_xjal() is exceeding the 65535 bytes limit.
I read this article: https://noorjax.com/2018/10/17/your-agent-is-too-big-memory-problem/
However, I would like to know how much have I exceeded the limit. Is there any way of getting this information? Because if it is not that far from the threshold, I can make some modifications to drop below that threshold. Otherwise it is painful to create so many new agents, etc.
This is a Java Virtual Machine (JVM) restriction on the Java bytecode size for the method body (i.e., the compiled code size) as I understand it (e.g., see Baeldung's description which links to the relevant JVM specification details). Thus, even though you can see the generated Java source code for the offending method, it isn't actually the length of that that is the limitation (though obviously the length of the source code correlates to some degree with the size of the compiled bytecode).
[As such, I'm surprised if Felipe's idea of reducing variable name lengths makes any difference since they're not stored explicitly like that in the bytecode...]
So, no, you can't tell how much you've exceeded it by (unless I guess you actually interrogate compiled class files and know exactly what you're doing). Even though it is AnyLogic's code generation that is 'causing' the problem, any such situation will normally always be something that you could re-architect better (as with Felipe's example) from an object-oriented (or data structuring) design perspective in the model.

How can I make my matrix multiplication Java code more fail-safe? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I am working on a project, where I was provided a Java matrix-multiplication program which can run in a distributed system , which is run like so :
usage: java Coordinator maxtrix-dim number-nodes coordinator-port-num
For example:
java blockMatrixMultiplication.Coordinator 25 25 54545
Here's a snapshot of how output looks like :
I want to extend this code with some kind of failsafe ability - and am curious about how I would create checkpoints within a running matrix multiplication calculation. The general idea is to recover to where it was in a computation (but it doesn't need to be so fine grained - just recover to beginning, i.e row 0 column 0 )
My first idea is to use log files (like Apache log4j ), where I would be logging the relevant matrix status. Then, if we forcibly shut down the app in the middle of a calculation, we could recover to a reasonable checkpoint.
Should I use MySQL for such a task (or maybe a more lightweight database)? Or would a basic log file ( and using some useful Apache libraries) be good enough ? any tips appreciated, thanks
source-code :
MatrixMultiple
Coordinator
Connection
DataIO
Worker
If I understand the problem correctly, all you need to do is recover your place in a single matrix calculation in the event of a crash or if the application is quit half way through.
Minimum Viable Solution
The simplest approach would be to recover just the two matrixes you were actively multiplying, but none of your progress, and multiply them from the beginning next time you load the application.
The Process:
At the beginning of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, create a file, let's call it recovery_data.txt with the state of the two arrays being multiplied (parameters a and b). Alternatively, you could use a simple database for this.
At the end of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, right before you return, clear the contents of the file, or wipe you database.
When the program is initially run, most likely near the beginning of the main(String[] args) you should check to see if the contents of the text file is non-null, in which case you should multiply the contents of the file, and display the output, otherwise proceed as usual.
Notes on implementation:
Using a simple text file or a full fledged relational database is a decision you are going to have to make, mostly based on the real world data that only you could really know, but in my mind, a textile wins out in most situations, and here are my reasons why. You are going to want to read the data sequentially to rebuild your matrix, and so being relational is not that useful. Databases are harder to work with, not too hard, but compared to a text file there is no question, and since you would not be much use of querying, that isn't balanced out by the ways they usually might make a programmers life easier.
Consider how you are going to store your arrays. In a text file, you have several options, my recommendation would be to store each row in a line of text, separated by spaces or commas, or some other character, and then put an extra line of blank space before the second matrix. I think a similar approach is used in crAlexander's Answer here, but I have not tested his code. Alternatively, you could use something more complicated like JSON, but I think that would be too heavy handed to justify. If you are using a database, then the relational structure should make several logical arrangements for your data apparent as well.
Strategic Checkpoints
You expressed interest in saving some calculations by taking advantage of the possibility that some of the calculations will have already been handled on last time the program ran. Lets look first look at the Pros and Cons of adding in checkpoints after every row has been processed, best I can see them.
Pros:
Save computation time next time the program is run, if the system had been closed.
Cons:
Making the extra writes will either use more nodes if distributed (more on that later) or increase general latency from calculations because you now have to throw in a database write operation for every checkpoint
More complicated to implement (but probably not by too much)
If my comments on the implementation of the Minimum Viable Solution about being able to get away with a text file convinced you that you would not have to add in RDBMS, I take back the parts about not leveraging queries, and everything being accessed sequentially, so a database is now perhaps a smarter choice.
I'm not saying that checkpoints are definitely not the better solution, just that I don't know if they are worth it, but here is what I would consider:
Do you expect people to be quitting half way through a calculation frequently relative to the total amount of calculations they will be running? If you think this feature will be used a lot, then the pro of adding checkpoints becomes much more significant relative to the con of it slowing down calculations as a whole.
Does it take a long time to complete a typical calculation that people are providing the program? If so, the added latency I mentioned in the cons is (percentage wise) smaller, and so perhaps more tolerable, but users are already less happy with performance, and so that cancels out some of the effect there. It also makes the argument for checkpointing more significant because it has the potential to save more time.
And so I would only recommend checkpointing like this if you expect a relatively large amount of instances where this is happening, and if it takes a relatively large amount of time to complete a calculation.
If you decide to go with checkpoints, then modify the approach to:
after every row has been processed on the array that you produce the content of that row to your database, or if you use the textile, at the end of the textile, after another empty line to separate it from the last matrix.
on startup if you need to finish a calculation that has already been begun, solve out and distribute only the rows that have yet to be considered, and retrieve the content of the other rows from your database.
A quick point on implementing frequent checkpoints: You could greatly reduce the extra latency from adding in frequent checkpoints by pushing this task out to an additional thread. Doing this would use more processes, and there is always some latency in actually spawning the process or thread, but you do not have to wait for the entire write operation to be completed before proceeding.
A quick warning on the implementation of any such failsafe method
If there is an unchecked edge case that would mean some sort of invalid matrix would crash the program, this failsafe now bricks the program it entirely by trying it again on every start. To combat this, I see some obvious solutions, but perhaps a bit of thought would let you modify my approaches to something you prefer:
Use a lot of try and catch statements, if you get any sort of error that seems to be caused by malformed data, wipe your recovery file, or modify it to add a note that tells your program to treat it as a special case. A good treatment of this special case may be to display the two matrixes at start with an explanation that your program failed to multiply them likely due to malformed content.
Add data in your file/database on how many times the program has quit while solving the current problem, if this is not the first resume, treat it like the special case in the above option.
I hope that this provided enough information for you to implement your failsafe in the way that makes the most sense given what you suspect the realistic use to be, and note that there are perhaps other ways you could approach this problem as well, and these could equally have their own lists of pros and cons to take into consideration.

Would my time consuming Java app benefit from Rootbeer by shifting some work to GPU?

My Java app reads in thousands of text files, goes through the lines and parse the text into floats and does some calculation with them then saves the results into some files, now I'm using multiple thread to execute them simultaneously, still need about an hour to run the whole app.
Here is a simplified pseudo version of what my app does :
String Lines[]=ReadFile("abc.txt"),Items[];
float its[];
for (int i=0;i<Lines.length;i++)
{
Items=Lines[i].split(",");
its=new float[Items.length];
for (int j=0;j<its.length;j++) its[j]=Float.parseFloat(Items[j]);
do some calculation with : its[]
}
Do more calculation, save results to a StringBuffer.
Save StringBuffer to ("abc_result.txt");
I've read that Rootbeer is able to shift some workload to GPU and speed up the process, so my questions are :
[1] Can Rootbeer handle text parsing and file I/O on the GPU ? Or do I need this portion done on CPU and only do the calculation on GPU ?
[2] Would my app benefit from Rootbeer ?
The answer, as so often is: it depends.
Rootbeer will be able to shift some work to the GPU. But there are lots of caveats.
You have to learn a whole new way of doing things.
It might not speed it up. It rather depends on where the bottleneck in your application is. For instance, if your problem is I/O-bound, then GPU processing won't help. You can check this with your code as it stands, by looking at CPU usage while your app is running. If it's taking an hour, then unless your files are catastrophically huge, it does sound as though it's CPU-bound.
(This is the most important caveat.) You should start by optimising the code that you have, and checking how much room there is for improvement. GPU processing is essentially just throwing more horsepower at the problem; which is a good idea if and only if your code is already running optimally.
Now, lots of the processing in your application, by the look of things, is to do with processing the strings. You're invoking .split() on a String, which breaks it up into a String[]; then you go through the bits and read each one to convert to a float; then you do your calculations. This is quite heavy work, especially on the garbage collector, which has to get rid of all these Strings.
I did some detailed analysis of something very similar, using integers instead of floating point, in this question, and it led to some detailed discussion. The conclusion there was that it made a huge difference to abandon the String-based approach and read the files in character by character, turning them manually into integers. Almost certainly something very similar will help you with producing floating point numbers.
So, in short, Rootbeer might help, but you have a lot you can do before you try something that drastic.

Fastest way to process through million line files with timestamps

So I've got these huge text files that are filled with a single comma delimited record per line. I need a way to process the files line by line, removing lines that meet certain criteria. Some of the removals are easy, such as one of the fields is less than a certain length. The hardest criteria is that these lines all have timestamps. Many records are identical except for their timestamps and I have to remove all records but one that are identical and within 15 seconds of one another.
So I'm wondering if some others can come up with the best approach for this. I did come up with a small program in Java that accomplishes the task, using JodaTime for the timestamp stuff which makes it really easy. However, the initial way I coded the program was running into OutofMemory Heap Space errors. I refactored the code a bit and it seemed ok for the most part but I do still believe it has some memory issues as once in awhile the program just seems to get hung up. That and it just seems to take way too long. I'm not sure if this is a memory leak issue, a poor coding issue, or something else entirely. And yes I tried increasing the Heap Size significantly but still was having issues.
I will say that the program needs to be in either Perl or Java. I might be able to make a python script work too but I'm not overly familiar with python. As I said, the timestamp stuff is easiest (to me) in Java because of the JodaTime library. I'm not sure how I'd accomplish the timestamp stuff in Perl. But I'm up for learning and using whatever would work best.
I will also add the files being read in vary tremendously in size but some big ones are around 100Mb with something like 1.3 million records.
My code essentially reads in all the records and puts them into a Hashmap with the keys being a specific subset of the data from a record that similar records would share. So a subset of the record not including the timestamps which would be different. This way you'd end up with some number of records with identical data but that occurred at different times. (So completely identical minus the timestamps).
The value of each key then, is a Set of all records that have the same subset of data. Then I simply iterate through the Hashmap, taking each set and iterating through it. I take the first record and compare its times to all the rest to see if they're within 15 seconds. If so the record is removed. Once that set is finished it's written out to a file until all the records have been gone through. Hopefully that makes sense.
This works but clearly the way I'm doing it is too memory intensive. Anyone have any ideas on a better way to do it? Or, a way I can do this in Perl would actually be good because trying to insert the Java program into the current implementation has caused a number of other headaches. Though perhaps that's just because of my memory issues and poor coding.
Finally, I'm not asking someone to write the program for me. Pseudo code is fine. Though if you have ideas for Perl I could use more specifics. The main thing I'm not sure how to do in Perl is the time comparison stuff. I've looked a little into Perl libraries but haven't seen anything like JodaTime (though I haven't looked much). Any thoughts or suggestions are appreciated. Thank you.
Reading all the rows in is not ideal, because you need to store the whole lot in memory.
Instead you could read line by line, writing out the records that you want to keep as you go. You could keep a cache of the rows you've hit previously, bounded to be within 15 seconds of the current program. In very rough pseudo-code, for every line you'd read:
var line = ReadLine()
DiscardAnythingInCacheOlderThan(line.Date().Minus(15 seconds);
if (!cache.ContainsSomethingMatchingCriteria()) {
// it's a line we want to keep
WriteLine(line);
}
UpdateCache(line); // make sure we store this line so we don't write it out again.
As pointed out, this assumes that the lines are in time stamp order. If they aren't, then I'd just use UNIX sort to make it so they are, as that'll quite merrily handle extremely large files.
You might read the file and output just the line numbers to be deleted (to be sorted and used in a separate pass.) Your hash map could then contain just the minimum data needed plus the line number. This could save a lot of memory if the data needed is small compared to the line size.

How to store millions of Double during a calculation?

My engine is executing 1,000,000 of simulations on X deals. During each simulation, for each deal, a specific condition may be verified. In this case, I store the value (which is a double) into an array. Each deal will have its own list of values (i.e. these values are indenpendant from one deal to another deal).
At the end of all the simulations, for each deal, I run an algorithm on his List<Double> to get some outputs. Unfortunately, this algorithm requires the complete list of these values, and thus, I am not able to modify my algorithm to calculate the outputs "on the fly", i.e. during the simulations.
In "normal" conditions (i.e. X is low, and the condition is verified less than 10% of the time), the calculation ends correctly, even if this may be enhanced.
My problem occurs when I have many deals (for example X = 30) and almost all of my simulations verify my specific condition (let say 90% of simulations). So just to store the values, I need about 900,000 * 30 * 64bits of memory (about 216Mb). One of my future requirements is to be able to run 5,000,000 of simulations...
So I can't continue with my current way of storing the values. For the moment, I used a "simple" structure of Map<String, List<Double>>, where the key is the ID of the element, and List<Double> the list of values.
So my question is how can I enhance this specific part of my application in order to reduce the memory usage during the simulations?
Also another important note is that for the final calculation, my List<Double> (or whatever structure I will be using) must be ordered. So if the solution to my previous question also provide a structure that order the new inserted element (such as a SortedMap), it will be really great!
I am using Java 1.6.
Edit 1
My engine is executing some financial calculations indeed, and in my case, all deals are related. This means that I cannot run my calculations on the first deal, get the output, clean the List<Double>, and then move to the second deal, and so on.
Of course, as a temporary solution, we will increase the memory allocated to the engine, but it's not the solution I am expecting ;)
Edit 2
Regarding the algorithm itself. I can't give the exact algorithm here, but here are some hints:
We must work on a sorted List<Double>. I will then calculate an index (which is calculated against a given parameter and the size of the List itself). Then, I finally return the index-th value of this List.
public static double algo(double input, List<Double> sortedList) {
if (someSpecificCases) {
return 0;
}
// Calculate the index value, using input and also size of the sortedList...
double index = ...;
// Specific case where I return the first item of my list.
if (index == 1) {
return sortedList.get(0);
}
// Specific case where I return the last item of my list.
if (index == sortedList.size()) {
return sortedList.get(sortedList.size() - 1);
}
// Here, I need the index-th value of my list...
double val = sortedList.get((int) index);
double finalValue = someBasicCalculations(val);
return finalValue;
}
I hope it will help to have such information now...
Edit 3
Currently, I will not consider any hardware modification (too long and complicated here :( ). The solution of increasing the memory will be done, but it's just a quick fix.
I was thinking of a solution that use a temporary file: Until a certain threshold (for example 100,000), my List<Double> stores new values in memory. When the size of List<Double> reaches this threshold, I append this list in the temporary file (one file per deal).
Something like that:
public void addNewValue(double v) {
if (list.size() == 100000) {
appendListInFile();
list.clear();
}
list.add(v);
}
At the end of the whole calculation, for each deal, I will reconstruct the complete List<Double> from what I have in memory and also in the temporary file. Then, I run my algorithm. I clean the values for this deal, and move to the second deal (I can do that now, as all the simulations are now finished).
What do you think of such solution? Do you think it is acceptable?
Of course I will lose some time to read and write my values in an external file, but I think this can be acceptable, no?
Your problem is algorithmic and you are looking for a "reduction in strength" optimization.
Unfortunately, you've been too coy in the the problem description and say "Unfortunately, this algorithm requires the complete list of these values..." which is dubious. The simulation run has already passed a predicate which in itself tells you something about the sets that pass through the sieve.
I expect the data that meets the criteria has a low information content and therefore is amenable to substantial compression.
Without further information, we really can't help you more.
You mentioned that the "engine" is not connected to a database, but have you considered using a database to store the lists of elements? Possibly an embedded DB such as SQLite?
If you used int or even short instead of string for the key field of your Map, that might save some memory.
If you need a collection object that guarantees order, then consider a Queue or a Stack instead of your List that you are currently using.
Possibly think of a way to run deals sequentially, as Dommer and Alan have already suggested.
I hope that was of some help!
EDIT:
Your comment about only having 30 keys is a good point.
In that case, since you have to calculate all your deals at the same time, then have you considered serializing your Lists to disk (i.e. XML)?
Or even just writing a text file to disk for each List, then after the deals are calculated, loading one file/List at a time to verify that List of conditions?
Of course the disadvantage is slow file IO, but, this would reduced your server's memory requirement.
Can you get away with using floats instead of doubles? That would save you 100Mb.
Just to clarify, do you need ALL of the information in memory at once? It sounds like you are doing financial simulations (maybe credit risk?). Say you are running 30 deals, do you need to store all of the values in memory? Or can you run the first deal (~900,000 * 64bits), then discard the list of double (serialize it to disk or something) and then proceed with the next? I thought this might be okay as you say the deals are independent of one another.
Apologies if this sounds patronising; I'm just trying to get a proper idea of the problem.
The flippant answer is to get a bunch more memory. Sun JVM's can (almost happily) handle multi gigabyte heaps and if it's a batch job then longer GC pauses might not be a massive issue.
You may decide that this not a sane solution, the first thing to attempt would be to write a custom list like collection but have it store primitive doubles instead of the object wrapper Double objects. This will help save the per object overhead you pay for each Double object wrapper. I think the Apache common collections project had primitive collection implementations, these might be a starting point.
Another level would be to maintain the list of doubles in a nio Buffer off heap. This has the advantage that the space being used for the data is actually not considered in the GC runs and could in theory could lead you down the road of managing the data structure in a memory mapped file.
From your description, it appears you will not be able to easily improve your memory usage. The size of a double is fixed, and if you need to retain all results until your final processing, you will not be able to reduce the size of that data.
If you need to reduce your memory usage, but can accept a longer run time, you could replace the Map<String, List<Double>> with a List<Double> and only process a single deal at a time.
If you have to have all the values from all the deals, your only option is to increase your available memory. Your calculation of the memory usage is based on just the size of a value and the number of values. Without a way to decrease the number of values you need, no data structure will be able to help you, you just need to increase your available memory.
From what you tell us it sounds like you need 10^6 x 30 processors (ie number of simulations multiplied by number of deals) each with a few K RAM. Perhaps, though, you don't have that many processors -- do you have 30 each of which has sufficient memory for the simulations for one deal ?
Seriously: parallelise your program and buy an 8-core computer with 32GB RAM (or 16-core w 64GB or ...). You are going to have to do this sooner or later, might as well do it now.
There was a theory that I read awhile ago where you would write the data to disk and only read/write a chunk what you. Of course this describes virtual memory, but the difference here is that the programmer controls the flow and location rathan than the OS. The advantage there is that the OS is only allocated so much virtual memory to use, where you have access to the whole HD.
Or an easier option is just to increase your swap/paged memory, which I think would be silly but would help in your case.
After a quick google it seems like this function might help you if you are running on Windows:
http://msdn.microsoft.com/en-us/library/aa366537(VS.85).aspx
You say you need access to all the values, but you cannot possibly operate on all of them at once? Can you serialize the data such that you can store it in a single file. Each record set apart either by some delimiter, key value, or simply the byte count. Keep a byte counter either way. Let that be a "circular file" composed of a left file and a right file operating like opposing stacks. As data is popped(read) off the left file it is processed and pushed(write) into the right file. If your next operation requires a previously processed value reverse the direction of the file transfer. Think of your algorithm as residing at the read/write head of your hard drive. You have access as you would with a list just using different methods and at much reduced speed. The speed hit will be significant but if you can optimize your sequence of serialization so that the most likely accessed data is at the top of the file in order of use and possibly put the left and right files on different physical drives and your page file on a 3rd drive you will benefit from increased hard disk performance due to sequential and simultaneous reads and writes. Of course its a bit harder than it sounds. Each change of direction requires finalizing both files. Logically something like,
if (current data flow if left to right) {send EOF to right_file; left_file = left_file - right_file;} Practically you would want to leave all data in place where it physically resides on the drive and just manipulate the beginning and ending addresses for the files in the master file table. Literally operating like a pair of hard disk stacks. This will be a much slower, more complicated process than simply adding more memory, but very much more efficient than separate files and all that overhead for 1 file per record * millions of records. Or just put all your data into a database. FWIW, this idea just came to me. I've never actually done it or even heard of it done. But I imagine someone must have thought of it before me. If not please let me know. I could really use the credit on my resume.
One solution would be to format the doubles as strings and then add them in a (fast) Key Value store which is ordering by-design.
Then you would only have to read sequentially from the store.
Here is a store that 'naturally' sorts entries as they are inserted.
And they boast that they are doing it at the rate of 100 million entries per second (searching is almost twice as fast):
http://forum.gwan.com/index.php?p=/discussion/comment/897/#Comment_897
With an API of only 3 calls, it should be easy to test.
A fourth call will provide range-based searches.

Categories