  • SGE: job submission and monitoring with qsub/qstat/qdel/qhost

    References:

    Oracle Grid Engine

    The qsub command

    SGE - qsub usage examples

    Basic usage of SGE jobs

    qsub is the most stable low-level way to submit jobs: it sends a script to a compute node of the cluster and runs it there.

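    Note that in this setup only the login node is allowed to submit jobs. Compute nodes have no permission to submit and can only execute, so never nest a qsub call inside a script that you submit: it will fail with an error.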
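
    One way to avoid accidental nested submission is a small guard around qsub. The sketch below relies on the fact that Grid Engine exports JOB_ID into the environment of every job it starts; the function names inside_sge_job and safe_submit are hypothetical helpers, not part of SGE, and clusters that do allow submission from compute nodes would not need this.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SGE): refuse to call qsub from
# inside a running job.

# Grid Engine sets JOB_ID for every job it starts, so a non-empty
# JOB_ID means this shell is most likely running inside a job.
inside_sge_job() {
    [ -n "${JOB_ID:-}" ]
}

# Pass all arguments through to qsub, unless we are inside a job.
safe_submit() {
    if inside_sge_job; then
        echo "refusing to submit from inside job ${JOB_ID}" >&2
        return 1
    fi
    qsub "$@"
}
```

    On the login node, safe_submit -cwd work.sh behaves like a plain qsub call (work.sh is a placeholder script name); inside a job it returns 1 without contacting the scheduler.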

    Here is the submission command I use most often:

    qsub -cwd -l vf=5g -P project_name -q queue_name work.sh

    Option by option:

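    -cwd: short for current working directory. The job runs from the directory you submit it in, so the log files are written there too; without -cwd they go to your home directory by default. To send the logs to a directory of your choice, use the -wd option instead.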

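    -l: takes resource=value and declares the resources the job needs. In the command above we request an estimated memory of vf=5g; the number of CPUs usually does not need to be specified. Note that in practice this is of little use: few clusters strictly enforce how much memory a user actually consumes, so vf only affects how quickly your job gets scheduled. Some people exploit this by requesting as little memory as possible to get scheduled sooner. This part is really just an honor system.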

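    -P: large organizations are divided into teams and projects, and each project has to specify its project name. This is mainly for accounting of compute usage later on, that is, for billing; the option has little practical effect on the job itself.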

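    -q: the queue name. This one matters a lot. A queue covers only a specific set of compute nodes, so whichever queue you submit to, you can only use the compute resources assigned to that queue.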
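
    The four options can be put together as in the following minimal sketch. The project name, queue name, script name and log directory are all placeholders, and qsub_args is a hypothetical helper. The script only prints the commands; real submission is opt-in through the SUBMIT variable, which is also an assumption of this sketch.

```shell
#!/bin/sh
# Placeholders (assumptions, replace with your site's values):
PROJECT="my_project"   # project name for -P
QUEUE="my_queue.q"     # queue name for -q
SCRIPT="work.sh"       # job script to submit

# qsub_args MEM [WORKDIR]
# Prints the option string for qsub. Uses -cwd when WORKDIR is
# omitted and -wd WORKDIR otherwise.
qsub_args() {
    mem=$1
    workdir=${2:-}
    if [ -n "$workdir" ]; then
        dir_opt="-wd $workdir"
    else
        dir_opt="-cwd"
    fi
    echo "$dir_opt -l vf=$mem -P $PROJECT -q $QUEUE $SCRIPT"
}

# Show what would be submitted.
echo "qsub $(qsub_args 5g)"
echo "qsub $(qsub_args 5g /path/to/logs)"

# Real submission only when explicitly requested with SUBMIT=1.
if [ "${SUBMIT:-0}" = "1" ]; then
    # Word splitting of the option string is intentional here.
    qsub $(qsub_args 5g)
fi
```

    According to the wc_queue grammar in the help output, -q also accepts the queue@host form to pin a job to one queue instance.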

    To be continued~

    qsub -help
    OGS/GE 2011.11p1
    usage: qsub [options]
       [-a date_time]                           request a start time
       [-ac context_list]                       add context variable(s)
       [-ar ar_id]                              bind job to advance reservation
       [-A account_string]                      account string in accounting record
       [-b y[es]|n[o]]                          handle command as binary
       [-binding [env|pe|set] exp|lin|str]      binds job to processor cores
       [-c ckpt_selector]                       define type of checkpointing for job
       [-ckpt ckpt-name]                        request checkpoint method
       [-clear]                                 skip previous definitions for job
       [-cwd]                                   use current working directory
       [-C directive_prefix]                    define command prefix for job script
       [-dc simple_context_list]                delete context variable(s)
       [-dl date_time]                          request a deadline initiation time
       [-e path_list]                           specify standard error stream path(s)
       [-h]                                     place user hold on job
       [-hard]                                  consider following requests "hard"
       [-help]                                  print this help
       [-hold_jid job_identifier_list]          define jobnet interdependencies
       [-hold_jid_ad job_identifier_list]       define jobnet array interdependencies
       [-i file_list]                           specify standard input stream file(s)
       [-j y[es]|n[o]]                          merge stdout and stderr stream of job
       [-js job_share]                          share tree or functional job share
       [-jsv jsv_url]                           job submission verification script to be used
       [-l resource_list]                       request the given resources
       [-m mail_options]                        define mail notification events
       [-masterq wc_queue_list]                 bind master task to queue(s)
       [-notify]                                notify job before killing/suspending it
       [-now y[es]|n[o]]                        start job immediately or not at all
       [-M mail_list]                           notify these e-mail addresses
       [-N name]                                specify job name
       [-o path_list]                           specify standard output stream path(s)
       [-P project_name]                        set job's project
       [-p priority]                            define job's relative priority
       [-pe pe-name slot_range]                 request slot range for parallel jobs
       [-q wc_queue_list]                       bind job to queue(s)
       [-R y[es]|n[o]]                          reservation desired
       [-r y[es]|n[o]]                          define job as (not) restartable
       [-sc context_list]                       set job context (replaces old context)
       [-shell y[es]|n[o]]                      start command with or without wrapping <loginshell> -c
       [-soft]                                  consider following requests as soft
       [-sync y[es]|n[o]]                       wait for job to end and return exit code
       [-S path_list]                           command interpreter to be used
       [-t task_id_range]                       create a job-array with these tasks
       [-tc max_running_tasks]                  throttle the number of concurrent tasks (experimental)
       [-terse]                                 tersed output, print only the job-id
       [-v variable_list]                       export these environment variables
       [-verify]                                do not submit just verify
       [-V]                                     export all environment variables
       [-w e|w|n|v|p]                           verify mode (error|warning|none|just verify|poke) for jobs
       [-wd working_directory]                  use working_directory
       [-@ file]                                read commandline input from file
       [{command|-} [command_args]]
    
    account_string          account_name
    complex_list            complex[,complex,...]
    context_list            variable[=value][,variable[=value],...]
    ckpt_selector           `n' `s' `m' `x' <interval> 
    date_time               [[CC]YY]MMDDhhmm[.SS]
    job_identifier_list     {job_id|job_name|reg_exp}[,{job_id|job_name|reg_exp},...]
    jsv_url                 [script:][username@]path
    mail_address            username[@host]
    mail_list               mail_address[,mail_address,...]
    mail_options            `e' `b' `a' `n' `s'
    working_directory       path
    path_list               [host:]path[,[host:]path,...]
    file_list               [host:]file[,[host:]file,...]
    priority                -1023 - 1024
    resource_list           resource[=value][,resource[=value],...]
    simple_context_list     variable[,variable,...]
    slot_range              [n[-m]|[-]m] - n,m > 0
    task_id_range           task_id['-'task_id[':'step]]
    variable_list           variable[=value][,variable[=value],...]
    wc_cqueue               wildcard expression matching a cluster queue
    wc_host                 wildcard expression matching a host
    wc_hostgroup            wildcard expression matching a hostgroup
    wc_qinstance            wc_cqueue@wc_host
    wc_qdomain              wc_cqueue@wc_hostgroup
    wc_queue                wc_cqueue|wc_qdomain|wc_qinstance
    wc_queue_list           wc_queue[,wc_queue,...]
    ar_id                   advance reservation id
    max_running_tasks       maximum number of simultaneously running tasks
    exp                     explicit:<socket>,<core>[:...]
    lin                     linear:<amount>[:<socket>,<core>]
    str                     striding:<amount>:<stepsize>[:<socket>,<core>]
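
    Options from the list above do not have to be typed on the command line every time. qsub also reads them from lines in the job script that start with the directive prefix, which is #$ unless changed with -C. The sketch below writes such a script; the file name work.sh and the job name demo_job are placeholders, and the chosen options are only an illustration.

```shell
#!/bin/sh
# Write a job script whose "#$" lines carry the qsub options.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > work.sh <<'EOF'
#!/bin/sh
#$ -cwd
#$ -l vf=5g
#$ -N demo_job
#$ -j y
# JOB_ID is set by Grid Engine when the job runs on a compute node.
echo "job ${JOB_ID:-none} started"
EOF

# Show the embedded directives that qsub would pick up.
grep '^#\$ ' work.sh
```

    With the options embedded this way, qsub work.sh would be expected to behave like qsub -cwd -l vf=5g -N demo_job -j y work.sh, and options given on the command line can still be added for a single run.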
  • Original post: https://www.cnblogs.com/leezx/p/6285787.html