Large Jobs on Coxeter

If you need to run a large job on Coxeter that requires temporary disk space or substantial CPU time, please read the following information, or type "scratchspace" or "bigjobs" on coxeter.

Temporary Disk Space

Sphere has some space (currently 10GB available) so people can temporarily store files for a day. The directory /scratch/today can be used for this storage. It is recommended that users create their own directories inside /scratch/today (for example "mkdir /scratch/today/joe") and put their temporary files there. Every day, just after midnight, all the files and directories in /scratch/today will be moved to /scratch/yesterday and the previous day's version of /scratch/yesterday will be removed. Files in these directories are NOT backed up and are NOT recoverable; do NOT store anything important in these directories.

Do NOT use any system directories (such as /tmp or /var/tmp) for file storage. Use only your own home directory (and subdirectories of that) or the space in /scratch/today for storing your files. Files in system directories may be removed at any time.
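
For jobs that create their own temporary files, the advice above can be sketched in Python as follows. This is only an illustration, not a department-provided tool: the function name "user_scratch_dir" and its parameters are hypothetical, and only the path /scratch/today comes from the text above.

```python
import getpass
import os

# Path taken from the text above; everything else here is illustrative.
SCRATCH_ROOT = "/scratch/today"


def user_scratch_dir(root=SCRATCH_ROOT, user=None):
    """Create (if needed) and return a per-user directory under `root`.

    Equivalent in spirit to "mkdir /scratch/today/joe". Calling it again
    when the directory already exists is harmless.
    """
    user = user or getpass.getuser()
    path = os.path.join(root, user)
    os.makedirs(path, exist_ok=True)
    return path
```

A job could then write its temporary output under the returned path, for example os.path.join(user_scratch_dir(), "partial-results.txt"), rather than under /tmp or /var/tmp. Files written there remain subject to the daily move and removal described above.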

Guidelines for Running Large Jobs on the Department of Mathematics Systems

  • Run long jobs at lower priority
    • If you are going to run a non-interactive job that takes more than 5 minutes to complete, make sure to "nice" the job. This lowers the priority of your job so that people needing interactive access will not notice it as much. For example, to run "job" in the background at nice 10 use: "nice +10 job &".
    • Minimum nice value to use, by expected time to job completion:
      • 5-10 minutes: 4
      • 10-60 minutes: 10
      • 1-2 hours: 16
      • 2-10 hours: 20
      • More than 10 hours: 20, and send email to "requests@math"
    • If you forget to "nice" a job when you start it, you can still use "renice" once you know the PID (process identification number). Use "ps -fu $USER" to get a listing of your processes and their PIDs. (If you are running a job with multiple threads then use "ps -fLu $USER".) Then you could use, for example, "renice +20 <pid>" where "<pid>" is the PID of any long-running process or thread. There are manual pages for these commands: "man renice" and "man ps" work, but you should use "man tcsh" for information about "nice" (assuming you are using the default shell, tcsh, that we give to people).
  • Run jobs consecutively, not concurrently, if possible.
    • If you have several commands to run, it is usually better to run them consecutively, not concurrently. For example, if you had to run "job1", "job2", and "job3" then use:
        coxeter% nice +20 job1 ; nice +20 job2 ; nice +20 job3 &
      to run them consecutively in the background. Note that sphere is a quad-processor machine, so during off hours, if not much else is using the machine, the three jobs could be run concurrently without interfering (much) with each other or with other people.
    • If you feel you need to run jobs concurrently then send email to "requests@math" to let us know the machine you intend to use, how many jobs you wish to run, and why concurrent processing is necessary.
  • Make sure you don't fill up all the disk space.
    • You should always do a test on a small case for any job that will produce output. For example, a program that counts from 1 to 10^12 and prints a line for each even number could fill up all the disk space. The shell has a facility for limiting the size of created files.
    • For example, type "limit f 10m" before running your job to limit the size of created files to 10 megabytes. On coxeter, file system usage quotas have also been implemented to help prevent a single user from accidentally consuming the whole disk.
  • Try to make your job restartable.
    • The system may go down part way through your calculations, for example, during the weekly reboot time. If your job writes out milestones indicating how far it has gotten, and/or the current state of the calculation, then continuing from a later point after an interruption is simpler.
  • Memory size limits.
    • coxeter has a limited amount of main memory (RAM). It can support jobs of 2GB in size by default (with 4GB virtual images). If a job gets too large, the system will slow down dramatically as swapping occurs. Try to ensure that your jobs do not use more memory than is necessary, but send an email to "requests@math" if you need access to more RAM. sphere does not have these memory restrictions, but if one process gets too large it may be terminated (by the system or by hand if necessary).
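
Several of the guidelines above can also be applied from inside a job itself. The following Python sketch is one possible arrangement, not a department-provided tool: every function name, the checkpoint file format, and the sum-of-squares "calculation" are hypothetical. The handling of the boundary values in the nice-value list (for example, whether exactly 10 minutes means 4 or 10) is an assumption, as is treating jobs of 5 minutes or less as needing no nice value. It uses os.nice and the RLIMIT_FSIZE resource limit, which play roughly the same roles as the "nice" and "limit" shell commands described above.

```python
import json
import os
import resource


def minimum_nice(expected_minutes):
    """Minimum nice value from the guideline list.

    Assumption: boundary values fall in the shorter band, and jobs of
    5 minutes or less need no nice value.
    """
    if expected_minutes <= 5:
        return 0
    if expected_minutes <= 10:
        return 4
    if expected_minutes <= 60:
        return 10
    if expected_minutes <= 120:
        return 16
    return 20  # over 10 hours: also send email to requests@math


def be_polite(expected_minutes, max_file_bytes=10 * 1024 * 1024):
    """Lower this process's priority and cap the size of files it creates."""
    # os.nice adds to the current nice value, so this compounds with any
    # "nice" already applied when the job was started.
    os.nice(minimum_nice(expected_minutes))
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    if hard != resource.RLIM_INFINITY:
        max_file_bytes = min(max_file_bytes, hard)
    resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, hard))


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_i": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place,
    so an interruption never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_job(checkpoint_path, n, every=1000, max_steps=None):
    """Sum i*i for i in range(n), saving a milestone every `every` steps.

    `max_steps` exists only to simulate an interruption: the function
    returns None without saving, as if the machine had gone down.
    """
    state = load_checkpoint(checkpoint_path)
    steps = 0
    while state["next_i"] < n:
        if max_steps is not None and steps >= max_steps:
            return None  # work since the last milestone is lost
        i = state["next_i"]
        state["total"] += i * i
        state["next_i"] = i + 1
        steps += 1
        if state["next_i"] % every == 0:
            save_checkpoint(checkpoint_path, state)
    save_checkpoint(checkpoint_path, state)
    return state["total"]
```

A job might call be_polite(expected_minutes=90) once at start-up and then run_job with a checkpoint file in its own home directory. If the job is interrupted, starting it again with the same checkpoint file resumes from the last saved milestone instead of from the beginning.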

Contact Us

If you need something specialized, please contact us at requests@math.toronto.edu with your requirements.