  • Hints for Debugging Parallel Programs

    Using ddd/gdb:

    Using a debugging tool like gdb can save you a great deal of time and frustration in any debugging project, and it is especially useful for parallel programs. Whenever possible, avoid debugging by simply adding calls to printf(); use a debugging tool such as gdb instead. (Ordinarily I suggest using the ddd GUI to gdb. However, this may be difficult when debugging a parallel program, since you need one debugger session per process and GUIs take up a lot of space on one's monitor screen.) I have a writeup on the art of debugging, and an introduction to gdb, at my debugging-tutorial Web page.

    Debugging of parallel programs is particularly difficult, both because there is "too much happening at once," and because debugging tools like gdb were not designed for parallel use. However, here is how you can use gdb with MPI, PVM, the various DSM packages, and so on (see important note on page-based DSMs later on):

    First, when you compile your application source code, make sure to use the -g option, to retain the symbol table for gdb.

    Now, get the program running, say on the partition

    fajita.engr.ucdavis.edu
    chimi.engr.ucdavis.edu
    
    Say the name of the program is Prime. A copy of Prime will now be running on each machine. You will need to go to each machine and attach gdb to these invocations of the program. To do this, type
    ps ax | grep Prime
    
    (or ps -e or ps -ux, depending on the system), and find the process number for Prime at each machine. You might find several lines of output from this, such as "tcsh Prime ..." or "rsh chimi Prime...". Ignore these; you want the line which is for the execution of Prime itself.

    (Note: One way around this would be to actually initiate the execution of the program at each node via gdb itself. However, this might be difficult to do with some parallel processing library packages.)

    Then type

    gdb Prime process_number
    
    and then use gdb as usual from that point on.

    Note that Prime was ALREADY running at fajita and chimi! What we have done is attach gdb to two already-running processes. However, in order to keep those processes from running away from you, get them to wait for you, using the following method:

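    In your source code define an integer variable named something like "DebugWait", initialize it to 1 (declare it volatile if you compile with optimization, so that the compiler keeps re-reading it inside the loop), and insert code like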

    while (DebugWait) ;
    
    at the very beginning of main(). When you attach gdb to the two Prime processes, both will be stuck at that "while" loop line -- which is exactly what you want. Then for both of them, give the gdb command
    (gdb) set var DebugWait = 0
    
    to "liberate" them. Then use gdb as usual, setting break points, single-stepping through the code and so on.
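    The waiting trick above can be sketched as a small helper. This is only an illustration: the function name debug_wait() is hypothetical, and printing the process number and host name is an added convenience (not part of the method as described) so that you know what to attach to on each machine.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flag name taken from the text; volatile so that an optimizing
   compiler must re-read it on every pass through the loop. */
volatile int DebugWait = 1;

/* Hypothetical helper: call it first thing in main().
   Reports where this copy is running, then waits until a debugger
   clears DebugWait. Returns the number of seconds spent waiting. */
int debug_wait(void)
{
    char host[256];
    int waited = 0;

    if (gethostname(host, sizeof host) != 0)
        strcpy(host, "unknown");
    host[sizeof host - 1] = '\0';   /* guard against a truncated name */

    fprintf(stderr, "pid %ld on %s waiting for gdb\n",
            (long)getpid(), host);

    while (DebugWait) {
        sleep(1);                   /* sleep rather than burn CPU */
        waited++;
    }
    return waited;
}
```

    After attaching, the process will typically be stopped inside sleep(); since DebugWait is a global, "set var DebugWait = 0" works from there, and you can then set your breakpoints and continue.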

    If you are using a page-based DSM, you need to tell gdb to ignore seg faults, which comprise the central mechanism for page-based DSM. To do this, issue the command

    handle 11 nostop noprint
    

    to gdb. (Seg faults are signal number 11, SIGSEGV, on most UNIX systems; gdb also accepts the name in place of the number.) Or better yet, place such a line in your .gdbinit startup file while you are debugging your DSM programs.
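    To see why gdb would otherwise stop constantly, here is a minimal sketch of the kind of trick a page-based DSM relies on: a page is made inaccessible, the first access raises a seg fault, and a handler makes the page usable before the access is retried. This is not the code of any particular DSM package. The names on_segv() and touch_protected_page() are hypothetical, a Linux-like POSIX system is assumed, and a real DSM would fetch the page contents from another node inside the handler. Calling mprotect() from a signal handler is common practice in such systems, although POSIX does not list it as async-signal-safe.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count = 0;
static long page_size = 0;

/* Hypothetical handler: unprotect the page containing the faulting
   address so the interrupted access can be retried. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr
                     & ~((uintptr_t)page_size - 1);
    (void)sig;
    (void)ctx;
    fault_count++;
    if (mprotect((void *)addr, (size_t)page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(1);   /* a genuine bad pointer, not one of our pages */
}

/* Returns the number of seg faults taken while touching one
   protected page, or -1 on error. */
int touch_protected_page(void)
{
    struct sigaction sa, old;
    volatile char *page;
    int ok;

    page_size = sysconf(_SC_PAGESIZE);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* The page starts out with no access rights at all. */
    page = mmap(NULL, (size_t)page_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    fault_count = 0;
    page[0] = 42;              /* faults once; the handler fixes it */
    ok = (page[0] == 42);      /* no fault this time */

    munmap((void *)page, (size_t)page_size);
    sigaction(SIGSEGV, &old, NULL);   /* restore previous handling */
    return ok ? (int)fault_count : -1;
}
```

    Run under gdb without the handle command, a program like this would stop at the assignment to page[0] with a SIGSEGV report, even though nothing is wrong.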

    Other Debugging Hints:

    Make sure that you do not have any "zombie" processes still hanging around from previous debugging runs. In our examples above, for instance, our program was named Prime; make sure there aren't any old Prime processes still running, since they may interfere with new Prime processes.

    Use malloc() instead of declaring static arrays. Some message-passing packages, for instance, will just quit without an error message if you have declared large (or in some cases even medium-sized) arrays.

    If you find that your program still does not accept large arrays, use the Unix limit command (limit stacksize in csh/tcsh, ulimit -s in sh/bash) to increase your maximum stack size.
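    As a rough illustration of the last two hints, the sketch below allocates a work array with malloc() and reports the current stack size limit from inside the program. The function names alloc_work_array() and print_stack_limit() are hypothetical; getrlimit() with RLIMIT_STACK is the standard POSIX call behind the shell's limit command.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: get n doubles from the heap instead of
   declaring a large array. Returns NULL if the request fails. */
double *alloc_work_array(size_t n)
{
    double *a;

    if (n > SIZE_MAX / sizeof *a)   /* n * sizeof would overflow */
        return NULL;
    a = malloc(n * sizeof *a);
    if (a == NULL)
        perror("malloc");
    return a;
}

/* Hypothetical helper: print the soft limit on stack size. */
void print_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
}
```

    Remember to free() the array when you are done with it, and to check the returned pointer before use, since a failed allocation is reported only through the NULL return value.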

  • Original article: https://www.cnblogs.com/cy163/p/765658.html