  • Hints for Debugging Parallel Programs

    Using ddd/gdb:

    Use of a debugging tool like gdb can save you large amounts of time and frustration in any debugging project, and it is especially useful in debugging parallel programs. Whenever possible, use a debugging tool such as gdb rather than simply adding calls to printf(). (Ordinarily I suggest using the ddd GUI to gdb. However, when debugging a parallel program this may be difficult, as GUIs take up a lot of space on one's monitor screen.) I have a writeup on the art of debugging, and an introduction to gdb, at my debugging-tutorial Web page.

    Debugging of parallel programs is particularly difficult, both because there is "too much happening at once" and because debugging tools like gdb were not designed for parallel use. However, here is how you can use gdb with MPI, PVM, the various DSM packages, and so on (see the important note on page-based DSMs below):

    First, when you compile your application source code, make sure to use the -g option, to retain the symbol table for gdb.

    Now, get the program running, say on the partition

    fajita.engr.ucdavis.edu
    chimi.engr.ucdavis.edu
    
    Say the name of the program is Prime. A copy of Prime will now be running on each machine. You will need to go to each machine and attach gdb to these invocations of the program. To do this, type
    ps ax | grep Prime
    
    (or ps -e or ps -ux, depending on the system), and find the process number for Prime at each machine. You might find several lines of output from this, such as "tcsh Prime ..." or "rsh chimi Prime...". Ignore these; you want the line which is for the execution of Prime itself.

    (Note: One way around this would be to actually initiate the execution of the program at each node via gdb itself. However, this might be difficult to do with some parallel processing library packages.)

    Then type

    gdb Prime process_number
    
    and then use gdb as usual from that point on.

    Note that Prime was ALREADY running at fajita and chimi! What we have done is attach gdb to two already-running processes. However, in order to keep those processes from running away from you, get them to wait for you, using the following method:

    In your source code define a global integer variable named something like "DebugWait", declare it volatile so that the compiler cannot optimize the loop away, initialize it to 1, and insert code like

    while (DebugWait) ;
    
    at the very beginning of main(). When you attach gdb to the two Prime processes, both will be stuck at that "while" loop line -- which is exactly what you want. Then for both of them, give the gdb command
    (gdb) set var DebugWait = 0
    
    to "liberate" them. Then use gdb as usual, setting break points, single-stepping through the code and so on.
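
    The waiting trick above can be sketched a little more fully as follows. This is only a sketch under some assumptions of mine: the helper name wait_for_debugger() and the environment variable PRIME_DEBUG_WAIT are hypothetical, and the loop sleeps instead of spinning so that a waiting process does not occupy a CPU. It also prints the host name and process number, which saves the ps step.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* volatile forces a fresh read of the flag on every pass, so a value
   written from gdb is actually seen by the loop. */
volatile int DebugWait = 1;

/* Hypothetical helper: call it first thing in main().
   Returns 1 if it waited for a debugger, 0 if waiting was not requested. */
int wait_for_debugger(void)
{
    char host[256];

    /* PRIME_DEBUG_WAIT is an assumed name, not part of any library. */
    if (getenv("PRIME_DEBUG_WAIT") == NULL)
        return 0;

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown-host");
    host[sizeof host - 1] = '\0';   /* gethostname may not terminate on truncation */

    /* Tell the user where to attach: gdb Prime <pid> on this host. */
    fprintf(stderr, "waiting for gdb on %s, pid %ld\n", host, (long)getpid());

    while (DebugWait)
        sleep(1);                   /* wait without burning a CPU */

    assert(DebugWait == 0);         /* we only get here once the flag is cleared */
    return 1;
}
```

    Because the process is sleeping when gdb attaches, it will be stopped inside a library call rather than on the "while" line itself. After giving the command set var DebugWait = 0, set a breakpoint where you want to stop and use continue.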

    If you are using a page-based DSM, you need to tell gdb to ignore seg faults, which comprise the central mechanism for page-based DSM. To do this, issue the command

    handle 11 nostop noprint
    

    to gdb. (Seg faults are signal number 11, SIGSEGV, on most UNIX systems; gdb also accepts the signal name, as in handle SIGSEGV nostop noprint.) Or better yet, place such a line in your .gdbinit startup file during the times when you are debugging your DSM programs.
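
    To see why gdb must be told this, here is a toy illustration of the mechanism, not taken from any actual DSM package. It assumes Linux, and the names dsm_page, on_segv() and touch_protected_page() are made up for the example. A page is mapped with no access rights, the first store to it raises SIGSEGV, and the handler "fetches" the page simply by making it writable, after which the store is retried and succeeds. (Calling mprotect() from a signal handler is not formally guaranteed by POSIX, though it is common practice for this technique.)

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile char *dsm_page;            /* one page standing in for shared memory */
static size_t dsm_page_size;
static volatile sig_atomic_t fault_count;  /* number of "page fetches" so far */

static void on_segv(int sig, siginfo_t *info, void *context)
{
    char *addr = (char *)info->si_addr;
    char *base = (char *)dsm_page;
    (void)sig;
    (void)context;

    if (addr < base || addr >= base + dsm_page_size)
        _exit(1);                          /* a genuine bug, not our page */
    fault_count++;
    /* A real DSM would fetch the page contents from another node here. */
    if (mprotect(base, dsm_page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(2);
}

/* Returns the number of seg faults taken (expected: 1), or -1 on setup failure. */
int touch_protected_page(void)
{
    struct sigaction sa;
    void *mem;
    long ps = sysconf(_SC_PAGESIZE);

    if (ps <= 0)
        return -1;
    dsm_page_size = (size_t)ps;
    mem = mmap(NULL, dsm_page_size, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    dsm_page = mem;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;              /* we need si_addr in the handler */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    fault_count = 0;
    dsm_page[0] = 42;                      /* faults once, then is retried */
    assert(dsm_page[0] == 42);

    signal(SIGSEGV, SIG_DFL);              /* restore normal seg fault behavior */
    munmap(mem, dsm_page_size);
    return (int)fault_count;
}
```

    Run under gdb without the handle command, a program like this would stop at the store to dsm_page[0], even though the seg fault there is intended and harmless.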

    Other Debugging Hints:

    Make sure that you do not have any "zombie" processes still hanging around from previous debugging runs. In our examples above, for instance, our program was named Prime; make sure there aren't any old Prime processes still running, since they may interfere with new Prime processes.

    Use malloc() instead of declaring static arrays. Some message-passing packages, for instance, will just quit without an error message if you have declared large (or in some cases even medium-sized) arrays.
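
    As a minimal sketch of this advice, a declaration such as double x[1000000]; can be replaced by a heap allocation whose failure you can at least detect and report. The helper name alloc_doubles() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on the heap.
   Returns NULL (after printing a message) if the request cannot be met.
   The contents are uninitialized, as with any malloc() result. */
double *alloc_doubles(size_t n)
{
    double *a;

    assert(n > 0);
    if (n > SIZE_MAX / sizeof *a) {        /* n * sizeof(double) would overflow */
        fprintf(stderr, "array of %zu doubles is too large\n", n);
        return NULL;
    }
    a = malloc(n * sizeof *a);
    if (a == NULL)
        fprintf(stderr, "could not allocate %zu doubles\n", n);
    return a;
}
```

    A caller would write something like double *x = alloc_doubles(1000000); check the result for NULL, use x[i] exactly as with an array, and call free(x) when done.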

    If you find that your program still does not accept large arrays, increase your maximum stack size with your shell's limit command: limit stacksize in csh/tcsh, or ulimit -s in sh/bash.
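
    Since the copies of your program on the other machines may not have been started from the shell in which you raised the limit, it can be worth having each process report the limit it actually received. The following is a sketch using the POSIX getrlimit() call; the helper names stack_soft_limit() and report_stack_limit() are my own.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: store the soft stack limit, in bytes, in *bytes.
   Returns 0 on success, 1 if the limit is unlimited (*bytes set to 0),
   and -1 if the limit could not be read. */
int stack_soft_limit(unsigned long long *bytes)
{
    struct rlimit rl;

    assert(bytes != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY) {
        *bytes = 0;
        return 1;
    }
    *bytes = (unsigned long long)rl.rlim_cur;
    return 0;
}

/* Print the limit to stderr, e.g. from each process right after startup. */
void report_stack_limit(void)
{
    unsigned long long bytes;
    int rc = stack_soft_limit(&bytes);

    if (rc == 0)
        fprintf(stderr, "stack limit: %llu bytes\n", bytes);
    else if (rc == 1)
        fprintf(stderr, "stack limit: unlimited\n");
    else
        perror("getrlimit");
}
```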

  • Original article: https://www.cnblogs.com/cy163/p/765658.html