Description:

The training outline follows:

  1. Slurm Refresher
    • How Slurm actually works.
    • How Slurm schedules jobs.
    • How long to wait; how to better schedule jobs.
    • Slurm and priorities; how is it done?
  2. Key features
  3. Resource Management
  4. Running a job; job/step allocation
    • Examples – GPUs
    • Examples – Job Arrays
  5. Advanced Features
    • Topology Aware Scheduling
    • Job Sanity Check
    • Job profiling
    • Multithreading (SMT)
    • Heterogeneous jobs
  6. Job Dependencies
    • Chain Jobs
    • Staging input before running, and storing outputs
    • Master/Slave programs
    • Submitting collections of programs (multi-prog)
  7. System Information and Job Monitoring
  8. Checkpointing & Restart
  9. Use of the Slurm API (plans to support this in the future on Pawsey systems)

Start: Tuesday, 13 July 2021 @ 09:00

End: Friday, 16 July 2021 @ 12:00

Duration: 12:00

Timezone: Perth

Prerequisites:

This training is aimed at users who have already used Slurm but whose needs go beyond simple batch files or small interactive jobs.

Eligibility:
  • Host institutions

Organiser: Pawsey Supercomputing Research Centre

Contact: training@pawsey.org.au

Host institution: Pawsey Supercomputing Centre

Keywords: slurm, scheduler, supercomputer

Capacity: 16

Event type:
  • Workshop
Advanced Slurm Training https://staging.dresa.org.au/events/advanced-slurm-training