GUIDELINES FOR RUNNING LARGE JOBS ON THE DEPARTMENT OF MATHEMATICS SYSTEMS

1) Run long jobs at lower priority.

If you are going to run a non-interactive job that takes more than
5 minutes to complete, you should make sure to "nice" the job. This
lowers the priority of your job so that people needing interactive
access will not notice it as much. For example, to run "job" in the
background at nice 10 use: "nice +10 job &".

    Time to job completion     Minimum Nice value to use
    **********************     *************************
    5-10 minutes               4
    10-60 minutes              10
    1-2 hours                  16
    2-10 hours                 20
    More than 10 hours         20, and send email to "requests@math"

If you forgot to "nice" a job when it started, you can still use
"renice" once you know the PID (process identification number). Use
"ps -fu $USER" to get a listing of your processes and their PIDs. Then
you could use, for example, "renice +20 PID" where "PID" is the PID of
the long running process.

There are manual pages for these commands: "man renice" and "man ps"
work, but you should use "man tcsh" for information about "nice".

2) Run jobs consecutively, not concurrently if possible.

If you have several commands to run, it is usually better to run them
consecutively, not concurrently. For example, if you had to run "job1",
"job2", and "job3" then use:

    coxeter% (nice +20 job1 ; nice +20 job2 ; nice +20 job3) &

to run them consecutively in the background. The parentheses group the
three commands so that the whole sequence runs in the background;
without them only "job3" would be put in the background.

Note that sphere is a quad processor machine, so at off hours, if not
much else is using the machine, the three jobs could be run
concurrently without interfering (much) with each other or other
people.

If you feel you need to run jobs concurrently, then send email to
"requests@math" to let us know the machine you intend to use, how many
jobs you wish to run, and why concurrent processing is necessary.

3) Make sure you don't fill up all the disk space.

You should always do a test on a small case for any job that will
produce output.
For example, a program that prints a line for each even number while
counting from 1 to 10^12 could fill up all the disk space.

The shell has a facility for limiting the size of created files. For
example, type "limit f 10m" before running your job to limit the size
of created files to 10 megabytes.

On coxeter, file system usage quotas have been implemented in order to
help prevent the accidental consumption of the whole disk by a single
user.

4) Try to make your job restartable.

The system may go down part way through your calculations, for example
during the weekly reboot time. If your job writes out milestones
indicating how far it has gotten and/or the current state of the
calculation, then continuing the job from a later point after an
interruption is simpler.

5) Memory size limits.

coxeter has a limited amount of main memory (RAM). It can support jobs
of 2GB in size by default (with 4GB virtual images). If a job gets too
large then the system will slow down dramatically as swapping occurs.
Try to ensure that your jobs do not use more memory than is necessary,
but send an email to "requests@math" if you need access to more RAM.

sphere does not have these memory restrictions, but if one process gets
too large it may be terminated (by the system or by hand if necessary).

[Last update: February 8, 2008]
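
Example for guideline 4 (restartable jobs). The following is only a
minimal sketch of one way to write out milestones and resume from them,
given in Python purely for illustration. The function name "run_job",
the state file name "job_state.json", and the toy calculation (a sum of
squares) are all assumptions made for this example; they are not part
of the department's systems.

```python
import json
import os


def run_job(total_steps, state_path="job_state.json"):
    """Sum the squares 1..total_steps, saving progress after each step.

    Hypothetical example: the sum of squares stands in for real work.
    """
    # Resume from the last saved milestone if a state file exists.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 1, "total": 0}

    for step in range(state["next_step"], total_steps + 1):
        state["total"] += step * step  # stand-in for one unit of work
        state["next_step"] = step + 1

        # Write to a temporary file and then rename it, so that an
        # interruption never leaves a half-written state file behind.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, state_path)

    return state["total"]
```

If the job is interrupted, running it again with the same state file
continues from the last completed step instead of starting over. A real
job would probably save its state less often (say, every few minutes of
work) to keep the overhead small.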