GridEngine Queue Layout
Last modified: 10/29 16:27
It is no longer really necessary to discuss queues in the traditional sense. In the past, we created queues based on pools of hardware resources, and a user who wanted a particular hardware resource would request the appropriate queue. Often, however, what a user wants, what is best for that user, and what is best for all users are not the same. Allowing individuals to dictate where their jobs run inevitably leads to throughput problems, since it is unreasonable to expect users to understand the complete state and behavior of the scheduler.
We have followed tradition, however, and have created at least one queue per hardware resource. We shall call these "sub-clusters". Each sub-cluster was originally an entire cluster that had to be managed and administered separately. We have now integrated all of these hardware resources into a single accessible network, and we continue to use GridEngine to manage them.
Here is a run-down of how jobs make their way through the queues:
- When a job is submitted, it is checked to see if it is eligible to run in a particular queue. If it is not, the scheduler moves on to the next queue.
- When the scheduler finds a queue that a job is eligible to run in (eligibility is determined via Access Control Lists), it then determines whether the hardware requirements requested by the job (see Using Complexes) match the resources the queue provides. If they match and the queue has available resources, the job is executed. If they do not match, or if the chosen queue does not have available resources, the scheduler tries the next queue.
This behavior progresses down the list of queues which are set up in a particular order:
- High-priority access-controlled queues are checked first
- Normal-priority, open-use queues are checked next
- Low-priority, open-use queues are checked last, and only if the user requests them
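The walk through the queue list described above can be sketched in a few lines of Python. This is only an illustrative model, not GridEngine's actual scheduling code: the `Queue` class, the `pick_queue` function, the `mem_gb` resource key, the user names, and the slot counts are all assumptions invented for the example, and resource matching is simplified to exact equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Queue:
    order: int                           # position in the scheduler's search order
    name: str
    priority: str                        # "High", "Normal", or "Low"
    resources: dict                      # what the queue provides (hypothetical keys)
    allowed_users: Optional[set] = None  # None models an open-use queue (no ACL)
    free_slots: int = 0                  # hypothetical count of available slots


def pick_queue(queues, user, requested, allow_low_priority=False):
    """Return the name of the first queue that can run the job, or None."""
    for q in sorted(queues, key=lambda q: q.order):
        # Low-priority queues are considered only when the user asks for them.
        if q.priority == "Low" and not allow_low_priority:
            continue
        # Access control: skip queues the user is not eligible for.
        if q.allowed_users is not None and user not in q.allowed_users:
            continue
        # Requested resources must match what the queue provides.
        if any(q.resources.get(k) != v for k, v in requested.items()):
            continue
        # The queue must have room; otherwise try the next one.
        if q.free_slots <= 0:
            continue
        return q.name
    return None


# Queue names and orders come from the table; everything else is made up.
queues = [
    Queue(0, "mri.q", "High", {"mem_gb": 16}, allowed_users={"alice"}, free_slots=4),
    Queue(51, "rcnib.q", "Normal", {"mem_gb": 16}, free_slots=0),
    Queue(99, "ib_low_pri2.q", "Low", {"mem_gb": 16}, free_slots=8),
]

print(pick_queue(queues, "alice", {"mem_gb": 16}))
print(pick_queue(queues, "bob", {"mem_gb": 16}))
print(pick_queue(queues, "bob", {"mem_gb": 16}, allow_low_priority=True))
```

In this toy setup, a user on the access list lands in the high-priority queue, while another user finds the normal queue full and is placed only if low-priority queues were requested.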
The following queues are available:
| Order | Name | Priority | Hardware Description (per node) | Nodes |
| 0 | mri.q | High | 16.0 GB RAM, 2 x Xeon X5460, Quad-Core | 115 |
| 1 | canaima.q | High | 4.0 GB RAM, 2 x Opteron 246, GigE | 26 |
| 2 | brca.q | High | 4.0 GB RAM, 2 x Opteron 246, GigE | 13 |
| 4 | aetm.q | High | 8.0 GB RAM, 2 x Opteron 2220, InfiniBand | 12 |
| 5 | chandra.q | High | 8.0 GB RAM, 2 x Opteron 2220, InfiniBand | 10 |
| 6 | msl.q | High | 16.0 GB RAM, 2 x Opteron 2384, Quad-Core, InfiniBand | 36 |
| 6 | ipn.q | High | 24.0 GB RAM, 2 x Opteron 2427, Six-Core, InfiniBand | 38 |
| 49 | rcnib2.q | Normal | 16.0 GB RAM, 2 x Xeon X5460, Quad-Core | 5 |
| 50 | enviro.q | Normal | 8.0 GB RAM, 2 x Opteron 248, Myrinet | 24 |
| 51 | rcnib.q | Normal | 16.0 GB RAM, 2 x Opteron 2220, InfiniBand | 38 |
| 53 | rcnsm_8way.q | Normal | 32.0 GB RAM, 4 x Opteron 880, Dual-Core | 1 |
| 54 | 16way_sparc.q* | Normal | 64.0 GB RAM, 16 x UltraSPARC III | 1 |
| 99 | ib_low_pri2.q | Low | 16.0 GB RAM, 2 x Xeon X5460, Quad-Core | 115 |
| 100 | mx_low_pri.q | Low | 4.0 GB RAM, 2 x Opteron 246, Myrinet | 12 |
| 101 | ib_low_pri.q | Low | 8.0 GB RAM, 2 x Opteron 2220, InfiniBand | 22 |
| 102 | gb_low_pri.q | Low | 4.0 GB RAM, 2 x Opteron 246, GigE | 39 |
* SPARC users should specify -l arch=sparc-sol64
Remember, there is no need to manually specify -q [queue_name.q] if you have correctly identified the requirements for your job. The scheduler will run your job on the best possible hardware given your job requirements and permissions regardless of queue.
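As a small illustration of submitting by requirement rather than by queue, the sketch below assembles a qsub command line from resource requests alone, with no -q option. The helper `build_qsub_args` and the script name `my_job.sh` are hypothetical; the `arch=sparc-sol64` request is the one noted above for SPARC users, and any other resource names would depend on the complexes defined at your site.

```python
def build_qsub_args(script, resources):
    """Build a qsub argument list from resource requests, without naming a queue."""
    args = ["qsub"]
    for name, value in resources.items():
        # Each requirement becomes its own "-l name=value" request.
        args += ["-l", f"{name}={value}"]
    args.append(script)
    return args


# my_job.sh is a placeholder job script name.
print(build_qsub_args("my_job.sh", {"arch": "sparc-sol64"}))
# → ['qsub', '-l', 'arch=sparc-sol64', 'my_job.sh']
```

The resulting list could be passed to `subprocess.run` on a submit host; the scheduler then picks the queue.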