SAM Batch Adapter

1. Introduction

The SAM Batch Adapter package is a Python API that serves as an interface between SAM and the batch systems used for submitting user jobs. The package is fully configurable and makes no assumptions about the underlying batch systems. It comes with a full set of administrative commands that can be used to configure the adapter. Once configured, it contains knowledge about all batch systems available to the local SAM stations. An overview of the SAM Batch Adapter package, its requirements and design, including (somewhat simplified) class diagrams, can be found here.

2. Design Notes

A SAM station can have any number of batch systems available for submitting and running user jobs, and an adapter should be configured for each of them. A station's batch adapter configuration is kept in a local Python module, which is updated every time a valid administrative command is executed. The adapter configuration consists of the batch commands and queues available to users, as well as the default batch system limits. Batch commands are described by their type (e.g., job submission command) and a command string, which may contain any number of predefined string templates (e.g., qstat %__BATCH_JOB_ID__). Each command can be associated with any number of possible outcomes, characterized by the command exit status and by its output string, which may also contain templates.
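The template mechanism can be illustrated with a short Python sketch. The function name expand_command and the convention of passing template values in a dictionary keyed by the bare template name (e.g. BATCH_JOB_ID) are hypothetical and not part of the Batch Adapter API; the sketch only assumes that templates have the form %__NAME__, with NAME made of upper-case letters, digits and underscores.

```python
import re

# Hypothetical helper, not the Batch Adapter API: template tokens are assumed
# to look like %__NAME__, e.g. %__BATCH_QUEUE__ or %__USER_SCRIPT_ARGS__.
TEMPLATE_PATTERN = re.compile(r"%__([A-Z0-9_]+?)__")


def expand_command(command_string, values):
    """Replace every %__NAME__ token in command_string with values["NAME"].

    Raises KeyError if a template has no value, so a partially expanded
    command is never returned.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template %__{name}__")
        return str(values[name])

    return TEMPLATE_PATTERN.sub(substitute, command_string)


print(expand_command("qstat %__BATCH_JOB_ID__", {"BATCH_JOB_ID": "12345"}))
# prints: qstat 12345
```

Raising an error for a missing value, rather than leaving the token in place, is a design choice of this sketch: it keeps an unexpanded template from ever reaching the batch system.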

The Batch Adapter API does not execute batch commands. It only provides functionality for preparing commands before their execution and for analyzing their outcome. It is the responsibility of the API user to execute the commands and interpret their results.
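Analyzing an outcome amounts to comparing the exit status and output of a command, executed by the caller, with the configured outcomes. The Python sketch below shows one possible approach under stated assumptions: outcomes are plain (exit status, expected output or None, description) tuples, each template in an expected output becomes a named regular-expression group, and outcomes that specify an expected output are tried before those that check the exit status alone. The names output_regex and match_outcome are hypothetical, not part of the Batch Adapter API.

```python
import re

# Hypothetical sketch, not the Batch Adapter API.
TEMPLATE_PATTERN = re.compile(r"%__([A-Z0-9_]+?)__")


def output_regex(expected_output):
    """Compile an expected-output string such as
    "%__BATCH_JOB_ID__.%__BATCH_HOST__" into a regular expression.

    Literal text is escaped; each template becomes a named group that
    captures a non-empty, non-greedy run of characters. A template that
    appears a second time must match the same text again.
    """
    parts, seen, position = [], set(), 0
    for match in TEMPLATE_PATTERN.finditer(expected_output):
        name = match.group(1)
        parts.append(re.escape(expected_output[position:match.start()]))
        parts.append(f"(?P={name})" if name in seen else f"(?P<{name}>.+?)")
        seen.add(name)
        position = match.end()
    parts.append(re.escape(expected_output[position:]))
    return re.compile("".join(parts))


def match_outcome(outcomes, exit_status, output):
    """Return (description, template values) for the first matching outcome,
    or (None, {}) if no configured outcome matches.

    outcomes is a list of (exit_status, expected_output, description) tuples,
    where expected_output is None when only the exit status is checked.
    """
    output = output.strip()
    # Try outcomes with an expected output first; sorted() is stable, so the
    # configured order is otherwise preserved.
    for status, expected, description in sorted(outcomes, key=lambda o: o[1] is None):
        if status != exit_status:
            continue
        if expected is None:
            return description, {}
        found = output_regex(expected).fullmatch(output)
        if found:
            return description, found.groupdict()
    return None, {}
```

For example, with the outcomes (0, None, "Success"), (0, "%__BATCH_JOB_ID__.%__BATCH_HOST__", "Successful job submission") and (1, None, "Failure"), an exit status of 0 and the made-up output 12345.d0pbs yield "Successful job submission" together with BATCH_JOB_ID = 12345 and BATCH_HOST = d0pbs.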

There are several types of queues that can be configured for a given adapter. The Batch Adapter API makes no assumptions about how clients use those queues, so different clients may use the same type of queue for different purposes. Adding new queue types is straightforward, which makes the API fairly flexible and extensible. Batch queues can have their own limits configured, and those limits override the default adapter limits.
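The override rule for limits can be sketched in a few lines of Python, assuming (hypothetically) that limits are stored as dictionaries keyed by limit type. The limit type names below are the ones reported by sambatch list limit types; the default values are made up for illustration.

```python
# Hypothetical sketch: limits are stored as {limit type: value} dictionaries.
def effective_limits(adapter_limits, queue_limits):
    """Return the limits that apply to a queue.

    Queue limits override the adapter's default limits of the same type;
    default limits that the queue does not set are kept unchanged.
    """
    limits = dict(adapter_limits)  # copy, so the adapter defaults are not modified
    limits.update(queue_limits)
    return limits


defaults = {"Maximum number of processes per user": 10, "Maximum cpu time per event": 60}
sam_short = {"Maximum number of processes per user": 1}
print(effective_limits(defaults, sam_short))
# {'Maximum number of processes per user': 1, 'Maximum cpu time per event': 60}
```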

3. SAM Job Submission Client

The SAM Batch Adapter API is used by the SAM Job Submission Handler (i.e., by commands such as sam submit and sam run project). The Submission Handler makes Batch Adapter API calls in order to obtain the adapter configuration for a given station and to get the requested batch queue and the various batch commands. These are used for preparing several wrapper scripts:

Once the project wrapper is executed and jobs are submitted to the batch system, the Submission Handler analyzes the submission output using the Batch Adapter API, and subsequently writes all known information about user job(s) into a file.

There are currently three different queue types recognized by the SAM Job Submission Handler: interactive, consumer, and project queues (note that a project queue may be associated with a single consumer queue). These queue types are used to support four different modes of running SAM jobs via the Batch Adapter API:

  1. Interactive-Interactive (II) mode, for jobs submitted to an interactive queue. In this mode SAM projects are started interactively at the time of job submission. The actual user applications (i.e., consumers) are invoked following the successful project startup.
  2. Interactive-Batch (IB) mode, for jobs submitted to a consumer queue. SAM projects are also started interactively at the time of job submission, but the user applications are submitted to the batch system after the successful project startup.
  3. Batch-Interactive (BI) mode, for jobs submitted to a project queue that does not have an associated consumer queue. In this mode projects are not started immediately, but are submitted to the batch system. Once a project has started, the user applications are invoked and run within the same batch slot.
  4. Batch-Batch (BB) mode, for jobs submitted to a project queue that has an associated consumer queue. In this mode projects are submitted to the project queue, while the user applications are submitted to its associated consumer queue (only after the projects have started successfully).
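The mode is determined only by the queue type and, for project queues, by whether an associated consumer queue exists. A hypothetical Python helper expressing that rule is sketched below; the function running_mode, the MODE_NAMES table and the queue type strings are illustrative, not the Submission Handler's actual code.

```python
from typing import Optional

# Hypothetical illustration of the four running modes described above.
MODE_NAMES = {
    "II": "Interactive-Interactive",
    "IB": "Interactive-Batch",
    "BI": "Batch-Interactive",
    "BB": "Batch-Batch",
}


def running_mode(queue_type: str, consumer_queue: Optional[str] = None) -> str:
    """Return the mode abbreviation for a job submitted to a queue.

    consumer_queue is the consumer queue associated with a project queue,
    or None if there is none; it is ignored for other queue types.
    """
    if queue_type == "interactive":
        return "II"  # project and consumers both started interactively
    if queue_type == "consumer":
        return "IB"  # project started interactively, consumers submitted to batch
    if queue_type == "project":
        # project submitted to batch; consumers run in the same slot (BI)
        # or are submitted to the associated consumer queue (BB)
        return "BB" if consumer_queue else "BI"
    raise ValueError(f"unknown queue type: {queue_type!r}")


print(MODE_NAMES[running_mode("project", "sam_long")])  # Batch-Batch
```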

The SAM Job Submission Handler understands several predefined command types that are used for job submission, lookup, and killing:


For the purposes of SAM job submission, only one job submission command needs to be defined (two if the batch system configuration requires different commands for submitting projects and consumers). The lookup and kill commands are intended only to advise users on how to look up or kill their jobs.
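One way a client could honor the "one command, or two if needed" rule is to prefer a type-specific submit command and fall back to the generic one. The Python sketch below assumes, hypothetically, that the configured commands are available as a dictionary keyed by the command type names shown by sambatch list command types; the helper submit_command_for is illustrative and is not taken from the Submission Handler.

```python
# Hypothetical sketch: configured commands are a {command type: command string}
# dictionary, keyed by the names shown by "sambatch list command types".
def submit_command_for(wrapper_kind, commands):
    """Return the command string used to submit a "project" or "consumer" wrapper.

    A type-specific command (e.g. "project submit command") is preferred;
    otherwise the generic "job submit command" is used.
    """
    for command_type in (f"{wrapper_kind} submit command", "job submit command"):
        if command_type in commands:
            return commands[command_type]
    raise LookupError(f"no submit command configured for {wrapper_kind} wrappers")


commands = {
    "job submit command": "qsub -q %__BATCH_QUEUE__ -o %__USER_JOB_OUTPUT__ -e %__USER_JOB_ERROR__ %__USER_SCRIPT__",
    "project submit command": "qsub -l nodes=1:pmaster -k oe -q %__BATCH_QUEUE__ %__USER_SCRIPT__",
}
print(submit_command_for("project", commands))   # the pmaster command
print(submit_command_for("consumer", commands))  # falls back to the job submit command
```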

The predefined templates that are understood by the SAM Job Submission Handler and that can be used to form the batch command strings, as well as to define their output, are listed below:

Just like in the case of queue types, none of the above predefined templates and command types has any special meaning for the Batch Adapter API.

4. Administrative Commands

As mentioned before, the SAM Batch Adapter package comes with a full set of administrative commands that can be used for viewing and modifying the local adapter configuration for a given SAM station.

4.1 Station Configuration


4.2 Adapter Configuration


4.3 Queue Manipulation


4.4 Command Manipulation

4.5 Miscellaneous Commands


Example: Creating a Station's Batch Adapter Configuration

In this example we create the configuration for a new station. We assume that the station's name is "d0station" and that it uses the PBS batch system for submitting jobs. The queues configured for SAM are "sam_short" (intended for short SAM jobs), "sam_long" (intended for long SAM jobs), and a special queue "sam_project" (intended only for SAM projects). The "sam_project" queue requires the resource "pmaster".

We start by adding the new station's configuration:

d0test> sambatch list configured stations
Configured stations: ['samadams', 'cab-test', 'cab', 'd0mainz', 'central-analysis', 'clued0', 'sammy', 'fnal-farm', 'generic_station']

d0test> sambatch add station config --station=d0station
Created new configuration module for station d0station.
Added configuration for station d0station.

d0test> sambatch list configured stations
Configured stations: ['samadams', 'fnal-farm', 'd0mainz', 'clued0', 'central-analysis', 'cab', 'generic_station', 'cab-test', 'd0station', 'sammy']

d0test> sambatch display station config --station=d0station
Station: d0station 
  Available Adapters: []
d0test> 
    

The next step is to add the adapter and its queues. The "sam_short" and "sam_long" queues will be added as consumer queues, while "sam_project" will be added as a project queue with "sam_long" as its associated consumer queue.

d0test> sambatch add adapter --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added batch adapter PBS for station d0station.

d0test> sambatch add consumer queue --queue=sam_short --description="Short SAM jobs" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added consumer queue sam_short to batch adapter PBS for station d0station.

d0test> sambatch add consumer queue --queue=sam_long --description="Long SAM jobs" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added consumer queue sam_long to batch adapter PBS for station d0station.

d0test> sambatch add project queue --queue=sam_project --description="SAM projects" --adapter=PBS --station=d0station --consumer-queue=sam_long
Updated batch configuration for station d0station.
Added project queue sam_project to batch adapter PBS for station d0station.

d0test> sambatch display station config --station=d0station                    
Station: d0station 
  Default Adapter: PBS
  Available Adapters: ['PBS']
    Adapter: PBS 
      Default Queue: sam_short
      Available Queues: ['sam_short', 'sam_project', 'sam_long']
        Consumer Queue: sam_short (Short SAM jobs)
        Project Queue: sam_project (SAM projects)
          Consumer Queue: sam_long (Long SAM jobs)
        Consumer Queue: sam_long (Long SAM jobs)
d0test> 

    

At this point we decide not to allow SAM jobs to be submitted directly to the "sam_long" queue, so we remove it from the list of available queues. This does not affect the configuration of our "sam_project" queue:

d0test> sambatch delete queue --queue=sam_long --adapter=PBS --station=d0station 
Updated batch configuration for station d0station.
Deleted queue sam_long from batch adapter PBS for station d0station.
d0test> sambatch display station config --station=d0station
Station: d0station 
  Default Adapter: PBS
  Available Adapters: ['PBS']
    Adapter: PBS 
      Default Queue: sam_short
      Available Queues: ['sam_short', 'sam_project']
        Consumer Queue: sam_short (Short SAM jobs)
        Project Queue: sam_project (SAM projects)
          Consumer Queue: sam_long (Long SAM jobs)
d0test> 
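The display above suggests that a project queue keeps its own record of its associated consumer queue, independent of the list of available queues. The following Python sketch models that behavior with two hypothetical dataclasses, Queue and Adapter; they are an illustration of the idea, not the package's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


# Hypothetical model, not the Batch Adapter classes: a project queue keeps
# its own reference to its associated consumer queue.
@dataclass
class Queue:
    name: str
    queue_type: str                           # "consumer", "project" or "interactive"
    description: str = ""
    consumer_queue: Optional["Queue"] = None  # used by project queues only


@dataclass
class Adapter:
    name: str
    queues: Dict[str, Queue] = field(default_factory=dict)

    def add_queue(self, queue: Queue) -> None:
        self.queues[queue.name] = queue

    def delete_queue(self, name: str) -> None:
        # Removes the queue from the available queues only; project queues
        # that refer to it are left untouched.
        del self.queues[name]


pbs = Adapter("PBS")
sam_long = Queue("sam_long", "consumer", "Long SAM jobs")
pbs.add_queue(Queue("sam_short", "consumer", "Short SAM jobs"))
pbs.add_queue(sam_long)
pbs.add_queue(Queue("sam_project", "project", "SAM projects", consumer_queue=sam_long))

pbs.delete_queue("sam_long")
print(sorted(pbs.queues))                             # ['sam_project', 'sam_short']
print(pbs.queues["sam_project"].consumer_queue.name)  # sam_long
```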
    

We also decide to set a limit for the number of parallel user jobs for the "sam_short" queue:

d0test> sambatch list limit types
Available limit types: ['Maximum number of processes per user', 'Maximum cpu time per event']
d0test> sambatch set queue limit --limit="Maximum number of processes per user" --value=1 --queue=sam_short --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Limit for "Maximum number of processes per user" has been set to "1" (queue: sam_short, adapter: PBS, station: d0station).
d0test> sambatch display station config --station=d0station                    
Station: d0station 
  Default Adapter: PBS
  Available Adapters: ['PBS']
    Adapter: PBS 
      Default Queue: sam_short
      Available Queues: ['sam_short', 'sam_project']
        Consumer Queue: sam_short (Short SAM jobs)
          Limits:
            Maximum number of processes per user: 1
        Project Queue: sam_project (SAM projects)
          Consumer Queue: sam_long (Long SAM jobs)
d0test> 
    

We still have to add the adapter commands. Since the "sam_project" queue requires a special resource, we need two submission commands: one for the consumer wrapper scripts and one for the project wrapper scripts. For the users' convenience, we also add standard job lookup and kill commands:

d0test> sambatch list command types
Available command types: ['job submit command', 'job lookup command', 'job kill command', 'project submit command', 'project lookup command', 'project kill command', 'consumer submit command', 'consumer lookup command', 'consumer kill command', 'process submit command', 'process lookup command', 'process kill command']
d0test> sambatch list command templates
Available command templates: ['%__USER_PROJECT__', '%__USER_SCRIPT__', '%__USER_SCRIPT_ARGS__', '%__USER_JDF__', '%__USER_JOB_OUTPUT__', '%__USER_JOB_ERROR__', '%__USER_NAME__', '%__BATCH_JOB_ID__', '%__BATCH_JOB_NAME__', '%__BATCH_QUEUE__', '%__BATCH_FLAGS__', '%__BATCH_HOST__', '%__UNIX_PROCESS_ID__', '%__UNIX_HOST__']
d0test> sambatch add command --command-type="job submit command" --command-string="qsub -q %__BATCH_QUEUE__ -o %__USER_JOB_OUTPUT__ -e %__USER_JOB_ERROR__ %__USER_SCRIPT__" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added batch command of type "job submit command" to batch adapter PBS for station d0station.
d0test> sambatch add command --command-type="project submit command" --command-string="qsub -l nodes=1:pmaster -k oe -q %__BATCH_QUEUE__ %__USER_SCRIPT__" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added batch command of type "project submit command" to batch adapter PBS for station d0station.
d0test> sambatch add command --command-type="job lookup command" --command-string="qstat %__BATCH_JOB_ID__.%__BATCH_HOST__" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added batch command of type "job lookup command" to batch adapter PBS for station d0station.
d0test> sambatch add command --command-type="job kill command" --command-string="qdel %__BATCH_JOB_ID__.%__BATCH_HOST__" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added batch command of type "job kill command" to batch adapter PBS for station d0station.
d0test> sambatch display station config --station=d0station
Station: d0station 
  Default Adapter: PBS
  Available Adapters: ['PBS']
    Adapter: PBS 
      Default Queue: sam_short
      Available Queues: ['sam_short', 'sam_project']
        Consumer Queue: sam_short (Short SAM jobs)
          Limits:
            Maximum number of processes per user: 1
        Project Queue: sam_project (SAM projects)
          Consumer Queue: sam_long (Long SAM jobs)
      Available Commands: ['job kill command', 'job lookup command', 'job submit command', 'project submit command']
        Command: qdel %__BATCH_JOB_ID__.%__BATCH_HOST__
          Type: job kill command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 1
            Outcome Description: Failure
        Command: qstat %__BATCH_JOB_ID__.%__BATCH_HOST__
          Type: job lookup command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 1
            Outcome Description: Failure
        Command: qsub -q %__BATCH_QUEUE__ -o %__USER_JOB_OUTPUT__ -e %__USER_JOB_ERROR__ %__USER_SCRIPT__
          Type: job submit command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 1
            Outcome Description: Failure
        Command: qsub -l nodes=1:pmaster -k oe -q %__BATCH_QUEUE__ %__USER_SCRIPT__
          Type: project submit command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 1
            Outcome Description: Failure
d0test> 
    

The final step is to define the successful submission outcomes and add them to the job submission commands:

d0test> sambatch add command result --command-type="job submit command" --exit-status=0 --command-output="%__BATCH_JOB_ID__.%__BATCH_HOST__" --description="Successful job submission" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added exit status 0 outcome for batch command of type "job submit command" (adapter: PBS, station: d0station).
d0test> sambatch add command result --command-type="project submit command" --exit-status=0 --command-output="%__BATCH_JOB_ID__.%__BATCH_HOST__" --description="Successful project submission" --adapter=PBS --station=d0station
Updated batch configuration for station d0station.
Added exit status 0 outcome for batch command of type "project submit command" (adapter: PBS, station: d0station).
d0test> sambatch display station config --station=d0station
Station: d0station 
  Default Adapter: PBS
  Available Adapters: ['PBS']
    Adapter: PBS 
      Default Queue: sam_short
      Available Queues: ['sam_short', 'sam_project']
        Consumer Queue: sam_short (Short SAM jobs)
          Limits:
            Maximum number of processes per user: 1
        Project Queue: sam_project (SAM projects)
          Consumer Queue: sam_long (Long SAM jobs)
      Available Commands: ['job kill command', 'job lookup command', 'job submit command', 'project submit command']
        Command: qdel %__BATCH_JOB_ID__.%__BATCH_HOST__
          Type: job kill command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 1
            Outcome Description: Failure
        Command: qstat %__BATCH_JOB_ID__.%__BATCH_HOST__
          Type: job lookup command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 1
            Outcome Description: Failure
        Command: qsub -q %__BATCH_QUEUE__ -o %__USER_JOB_OUTPUT__ -e %__USER_JOB_ERROR__ %__USER_SCRIPT__
          Type: job submit command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 0
            Expected Output: %__BATCH_JOB_ID__.%__BATCH_HOST__
            Outcome Description: Successful job submission
            Exit Status: 1
            Outcome Description: Failure
        Command: qsub -l nodes=1:pmaster -k oe -q %__BATCH_QUEUE__ %__USER_SCRIPT__
          Type: project submit command
          Known Outcomes:
            Exit Status: 0
            Outcome Description: Success
            Exit Status: 0
            Expected Output: %__BATCH_JOB_ID__.%__BATCH_HOST__
            Outcome Description: Successful project submission
            Exit Status: 1
            Outcome Description: Failure
d0test> 
    
At this point the PBS batch adapter for d0station should be ready for use.
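Before relying on a new configuration, an administrator might want a quick sanity check. The Python sketch below is a hypothetical example of such a check: it uses a simplified, made-up in-memory form of the PBS adapter configuration (command strings omitted) and assumes that every submit command needs a successful outcome from which %__BATCH_JOB_ID__ can be extracted. Neither the data layout nor readiness_problems comes from the SAM Batch Adapter package.

```python
# Hypothetical, simplified in-memory form of the PBS adapter configuration
# above: {command type: [(exit status, expected output or None, description)]}.
# Command strings are omitted; the real configuration module may differ.
PBS = {
    "queues": {"sam_short": "consumer", "sam_project": "project"},
    "commands": {
        "job submit command": [
            (0, None, "Success"),
            (0, "%__BATCH_JOB_ID__.%__BATCH_HOST__", "Successful job submission"),
            (1, None, "Failure"),
        ],
        "project submit command": [
            (0, None, "Success"),
            (0, "%__BATCH_JOB_ID__.%__BATCH_HOST__", "Successful project submission"),
            (1, None, "Failure"),
        ],
        "job lookup command": [(0, None, "Success"), (1, None, "Failure")],
        "job kill command": [(0, None, "Success"), (1, None, "Failure")],
    },
}


def readiness_problems(adapter):
    """Return a list of reasons why the adapter is not ready for job submission."""
    problems = []
    if not adapter.get("queues"):
        problems.append("no queues configured")
    commands = adapter.get("commands", {})
    submit_types = [t for t in commands if t.endswith("submit command")]
    if not submit_types:
        problems.append("no submit command configured")
    for command_type in submit_types:
        # Assumed requirement: a successful submission must produce output
        # from which the batch job ID can be extracted.
        if not any(
            status == 0 and expected and "%__BATCH_JOB_ID__" in expected
            for status, expected, _ in commands[command_type]
        ):
            problems.append(f'"{command_type}" has no success outcome yielding %__BATCH_JOB_ID__')
    return problems


print(readiness_problems(PBS))  # []
```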


Sinisa Veseli
Last modified: Fri May 30 14:46:56 CDT 2003