Qsub ruserok failed validating
Hi All, I'm trying to add a new cluster to our network, and I've hit a snag with job submission. We have things to the point where the existing networks can submit jobs to the new cluster, but the new cluster cannot submit any jobs.
The plan was to initially install the scheduler on a single box, acting as server, scheduler, compute node, and submission host.

There may be nodes down, which means you are requesting resources that aren't available, or you are simply requesting too many resources. To get a listing of all available nodes, use pbsnodes -a. For each free compute node, you should get something like the following:

To remove jobs from the PBS queue, you can use the qdel command (see the man pages for it). Look at the sections on:

The easiest way to do this is to use the psub script to automatically generate and submit your job to the queue.

Eventually, job submission would be extended to other machines, adding them also as compute nodes on additional queues.

To help myself if I ever need to do this again, and to help anyone else in the same situation, I’ll detail below what I did. First, of course, one needs to install the necessary packages.
This can be done easily, with the caveat that you get Torque v2.4.16, which at this point is at end of life.
The basic idea is to create a PBS script (see below) that details what queue to use and submit it with the qsub command.
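As a rough illustration of that idea, here is a minimal sketch that writes a small PBS script and submits it. The file name hello.pbs, the job name, the queue name "batch", and the resource limits are all assumptions made for the example, so substitute whatever your site actually uses.

```shell
#!/bin/sh
# Write a minimal Torque job script. The queue name "batch" and the
# resource requests are placeholders, not values from any real site.
cat > hello.pbs <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
# Torque sets PBS_O_WORKDIR to the directory qsub was run from.
cd "${PBS_O_WORKDIR:-.}"
echo "running on $(hostname)"
EOF

# Submit only if qsub is on the PATH, so the sketch is harmless elsewhere.
if command -v qsub >/dev/null 2>&1; then
    qsub hello.pbs
else
    echo "qsub not found; wrote hello.pbs only"
fi
```

The #PBS lines are ordinary shell comments that qsub reads as options, so the same settings could instead be passed on the qsub command line.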
The qsub command is how you submit a job, ANY job - serial, parallel, interactive, or non-interactive - to torque.

    qstat -a

    utah.edu:
                                                                             Req'd  Req'd   Elap
    Job ID               Username Queue    Jobname          Sess ID NDS  TSK Memory Time  S Time
    -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
    31.
A node may also be down but still listed in your /home//hosts file.
We'll send email to Raven account holders when we discover a compute node down that we can't quickly bring back on line.
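If you would rather check node state yourself, the pbsnodes -a listing can be filtered by state. The sketch below assumes the usual layout of that listing (an unindented node name followed by indented "key = value" lines, one of which is "state = ..."); the helper name nodes_in_state is made up for this example.

```shell
#!/bin/sh
# nodes_in_state STATE: read `pbsnodes -a` output on stdin and print the
# names of nodes whose "state = ..." line includes STATE.
nodes_in_state() {
    awk -v want="$1" '
        /^[^ \t]/ { node = $1 }                 # unindented line: a node name
        $1 == "state" && $2 == "=" {            # indented "state = ..." line
            n = split($3, states, ",")          # state may be a list, e.g. down,offline
            for (i = 1; i <= n; i++)
                if (states[i] == want) print node
        }
    '
}

# Only call pbsnodes if it is installed on this machine.
if command -v pbsnodes >/dev/null 2>&1; then
    echo "free nodes:"; pbsnodes -a | nodes_in_state free
    echo "down nodes:"; pbsnodes -a | nodes_in_state down
else
    echo "pbsnodes not found on this machine"
fi
```

Any node reported as down here that also appears in your hosts file is a likely reason a job is stuck waiting for resources.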
The error in the torque server_log file is:

    10/27/2010 ;0080; PBS_Server; Req;req_reject; Reject reply code=15023(Bad UID for job execution MSG=ruserok failed validating testuser/testuser from morph4), aux=0, type=Queue Job, from [email protected]

I've checked all of the "allow_node_submit" and "allow_proxy_user" variables that I've ever read about, and they all seem to be set correctly.
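One thing that may be worth ruling out: ruserok() is the standard rhosts-style check, which consults /etc/hosts.equiv and the user's ~/.rhosts on the machine doing the validation. The sketch below is only a rough diagnostic to run on the server, assuming the submitting host is morph4 as in the log above. The helper name host_listed is made up for illustration, and the matching is deliberately simple (first field of each line only; it ignores + and - entries and netgroups).

```shell
#!/bin/sh
# host_listed FILE HOST: succeed if HOST is the first field of some line
# in FILE. Simplified on purpose; real hosts.equiv syntax allows more.
host_listed() {
    [ -r "$1" ] || return 1
    awk -v h="$2" '$1 == h { found = 1 } END { exit !found }' "$1"
}

# Host name taken from the log message; change it to match your setup.
submit_host="morph4"

for f in /etc/hosts.equiv "$HOME/.rhosts"; do
    if host_listed "$f" "$submit_host"; then
        echo "$submit_host is listed in $f"
    else
        echo "$submit_host is not listed in $f (or the file is unreadable)"
    fi
done
```

If the host turns up in neither file, that would be consistent with the "ruserok failed validating" rejection, though it does not by itself prove that this is the cause.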