ulimit -s 40960 vs ulimit -s 10240
I wrote this because I had been able to run mpirun with OpenMPI
on my 12-core workstation quite happily from day one, when I set up the system a few months ago. Yesterday, when I tried to run a big job under mpirun, the job crashed rather quickly; the error message was something like "mpirun process exited ... with signal 11 (Segmentation fault)". Interestingly (or annoyingly), a job that required less memory ran okay. Since I had never had this problem before, I thought it was a hardware failure. I called my IT guy to explain the problem, and he was kind enough to suggest putting the line "ulimit -s 40960" in my .bashrc. And it works! But I have no clue why mpirun misbehaved all of a sudden, or why that ulimit setting solves the problem completely. I would like to learn from this incident. Does anyone have any ideas to share? Thanks a lot!
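For reference, the fix can be checked interactively before launching a job. This is a minimal bash sketch of what the .bashrc line does, using only the standard `ulimit` builtin (the limits shown on your system may differ):

```shell
# Show the current soft stack limit, in kilobytes.
# 10240 (10 MB) is a common Linux default.
ulimit -Ss

# Show the hard limit; the soft limit can only be raised up to this
# (root can raise the hard limit itself).
ulimit -Hs

# Raise the soft limit for this shell and every process it starts,
# exactly as the line in .bashrc does.
ulimit -s 40960

# Verify the new value before launching the MPI job.
ulimit -s
```

Because `ulimit` applies per shell, putting the line in .bashrc makes every new shell, and any mpirun job started from it, inherit the larger stack limit.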
Okay, my happiness was short-lived.
I still hit the "mpirun noticed that process rank 3 with PID 11591 on node xxx-node exited on signal 11 (Segmentation fault)" problem when I tried a big job. I strongly suspect this has to do with a huge job that crashed the day before, when the disk ran out of space. Could it be that the crashed job dumped something into the commonly used space, and it was not cleared in time for new jobs to use? Beats me...
FYI, "ulimit -s 40960" sets (in this case, increases) the soft stack size limit to 40 megabytes: the value is given in kilobytes, and the common default of 10240 KB is 10 MB.
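The arithmetic, spelled out in the shell (ulimit -s takes kilobytes):

```shell
# 40960 KB is 40 MB; the common default of 10240 KB is 10 MB.
echo $(( 40960 / 1024 ))   # prints 40
echo $(( 10240 / 1024 ))   # prints 10
```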
For additional discussion, see here: http://stackoverflow.com/questions/1...s-unlimited-do