Useful Commands

  • pestat shows every node and its state, along with running jobs, used resources, etc. Use it to check node utilization and see what's available before you start your job.

  • scancel <job-number> cancels the job with that number. You can only cancel your own jobs.

  • squeue shows all queued and running jobs and their status. With this you can see whether your job is running and, if not, why. Some of the reasons you might see, and what they mean, are listed below.

    • launch failed requeued held - this is actually three different messages. "launch failed" means the job failed to start, often because of bad code or an application that crashed. "requeued" means the job has been put back in the queue to run again. "held" means the job is being held back from running again, which prevents a repeatedly crashing job from continuing the start/crash cycle.

    • Resources - means your job is waiting for resources and will run when they become available.

    • PartitionConfig - means your job will not run because the resources you requested don't match the partition you chose.

    • QOSMaxGRESPerNode - means you requested more resources than you are allowed. Most likely this happens when you request a GPU on a non-GPU partition.

    • QOSMinGRES - means you didn't request all the resources the partition you picked requires. Most likely this happens when you try to use a GPU partition but don't request a GPU.
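
As a rough illustration of the reasons listed above, the sketch below prints your own jobs with a short explanation next to each reason. It assumes the standard squeue options -u (user), -h (no header), and -o with the format fields %i (job ID), %T (state), and %r (reason). The explain_reason helper and its wording are hypothetical, written for this example only, and are not part of Slurm or of this cluster's tooling.

```shell
#!/bin/sh
# explain_reason is a hypothetical helper (not part of Slurm): it maps a
# squeue reason string to a short summary of the explanations listed above.
explain_reason() {
    case "$1" in
        "launch failed requeued held")
            echo "job failed to start, was requeued, and is now held" ;;
        Resources)
            echo "waiting for resources to become available" ;;
        PartitionConfig)
            echo "requested resources do not match the partition" ;;
        QOSMaxGRESPerNode)
            echo "requested more resources than allowed" ;;
        QOSMinGRES)
            echo "requested fewer resources than the partition requires" ;;
        *)
            echo "not covered in this list" ;;
    esac
}

# Only query the scheduler when squeue is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    # -u limits output to your jobs, -h drops the header line,
    # -o prints job ID (%i), state (%T), and reason (%r) per job.
    squeue -u "${USER:-$(id -un)}" -h -o "%i %T %r" |
    while read -r jobid state reason; do
        # read puts everything after the second field into $reason,
        # so multi-word reasons stay intact.
        echo "$jobid [$state] $reason: $(explain_reason "$reason")"
    done
fi
```

Running jobs normally have no pending reason, so they fall through to the default branch. If a job turns out to be stuck for a reason you cannot fix, scancel <job-number> removes it so you can resubmit with corrected resource requests.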

Additional Support 

Open an IT Helpdesk request ticket.
Send an email to ITHelp@utc.edu.
Contact the IT Help Desk at 423-425-4000 or visit our IT Chat Portal.