HPC Clusters: Useful Commands

  • pestat shows every node and its state, along with the jobs running on it and the resources in use. It is useful for checking node utilization and seeing what is available before you submit your job.

  • squeue shows the jobs in the queue, both running and pending, and their status. With it you can see whether your job is running and, if not, why: for a pending job the reason appears in the NODELIST(REASON) column. Some reasons you might see, and what they mean, are listed below.

    • Resources - your job is waiting for resources and will run when they become available.

    • PartitionConfig - your job will not run because the resources it requested do not match the partition you chose.

    • QOSMaxGRESPerNode - you requested more generic resources (GRES, such as GPUs) than you are allowed. Most likely you requested a GPU on a non-GPU partition.

    • QOSMinGRES - you did not request all the resources required by the partition you picked. Most likely you chose a GPU partition but did not request a GPU.
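
The reasons above can be looked up automatically for your own pending jobs. The following is a minimal sketch, not an official cluster tool: the function names explain_reason and annotate_pending are hypothetical, and the explanation strings are paraphrased from this article. It assumes bash and that squeue is on your PATH.

```shell
#!/usr/bin/env bash

# Map a Slurm pending-reason code to the explanation given in this article.
# (explain_reason is a hypothetical helper name, not part of Slurm.)
explain_reason() {
  case "$1" in
    Resources)         echo "waiting for resources to become available" ;;
    PartitionConfig)   echo "requested resources do not match the partition" ;;
    QOSMaxGRESPerNode) echo "requested more GRES (e.g. GPUs) than allowed" ;;
    QOSMinGRES)        echo "requested fewer GRES (e.g. GPUs) than the partition requires" ;;
    *)                 echo "not covered in this article" ;;
  esac
}

# Read "JOBID REASON" pairs on stdin and print one annotated line per job.
# (annotate_pending is also a hypothetical helper name.)
annotate_pending() {
  while read -r jobid reason; do
    printf '%s\t%s\t%s\n' "$jobid" "$reason" "$(explain_reason "$reason")"
  done
}

# Usage on the cluster: list your pending jobs as "JOBID REASON" and annotate them.
#   squeue -u "$USER" -h -t PENDING -o "%i %r" | annotate_pending
```

Here -u limits the output to your jobs, -h drops the header line, -t PENDING keeps only pending jobs, and the -o format prints the job ID (%i) followed by the reason (%r). Any reason not listed in this article is reported as not covered rather than guessed at.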

Details

Article ID: 163832
Created: Thu 9/5/24 7:52 PM
Modified: Fri 9/20/24 12:26 PM