  • Linux MPI Cluster Configuration

    Reference: "Setting Up and Configuring an MPI Parallel Programming Environment on Linux"

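    MPI (Message Passing Interface) is a standard for message-passing parallel programming, and MPICH is one implementation of it. This cluster runs on virtual machines with Ubuntu 14.04. There are three machines, each with the user name ubuntu, named ub0, ub1 and ub2.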

    • Install MPICH
        $ tar -xzvf soft/mpich-3.0.4.tar.gz
        $ cd mpich-3.0.4/
        $ ./configure --prefix=/usr/local/mpich
        $ make && sudo make install

        After installation, append the following lines to /etc/profile to set the environment variables, then run source /etc/profile:

        PATH=$PATH:/usr/local/mpich/bin
        MANPATH=$MANPATH:/usr/local/mpich/man
        export PATH MANPATH
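Appending these lines a second time (for example when repeating the setup on ub1 and ub2) leaves duplicate PATH entries. Below is a minimal sketch of a repeatable version, written as a POSIX sh function; the name add_mpich_env and its marker comment are hypothetical, and writing /etc/profile itself requires root.

```shell
# Hypothetical helper: append the MPICH environment lines to a profile
# file exactly once. Defaults to /etc/profile (needs root); pass a
# scratch file as $1 to try it out first.
add_mpich_env() {
    profile=${1:-/etc/profile}
    marker='# MPICH environment (added by add_mpich_env)'
    # An earlier run left the marker behind: nothing to do.
    if grep -qF "$marker" "$profile" 2>/dev/null; then
        return 0
    fi
    # Single quotes keep $PATH and $MANPATH literal, so they are expanded
    # when the profile is sourced, not when this function runs.
    printf '%s\n' "$marker" \
        'PATH=$PATH:/usr/local/mpich/bin' \
        'MANPATH=$MANPATH:/usr/local/mpich/man' \
        'export PATH MANPATH' >> "$profile"
}
```

Run it once from a root shell, then run source /etc/profile in any shell that is already open.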
    • Single-node test
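      • From the mpich-3.0.4 source directory, copy the examples directory into the installation directory (root is needed because /usr/local/mpich was created by sudo make install):
        sudo cp -r examples/ /usr/local/mpich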

        Then run the following from /usr/local/mpich:

        mpirun -np 10 ./examples/cpi

        The output looks like this:

        Process 0 of 10 is on ub0
        Process 9 of 10 is on ub0
        Process 1 of 10 is on ub0
        Process 4 of 10 is on ub0
        Process 5 of 10 is on ub0
        Process 7 of 10 is on ub0
        Process 2 of 10 is on ub0
        Process 3 of 10 is on ub0
        Process 6 of 10 is on ub0
        Process 8 of 10 is on ub0
        
        pi is approximately 3.1415926544231256, Error is 0.0000000008333325
        wall clock time = 0.020644
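cpi estimates pi as the integral of 4/(1+x^2) over [0, 1] using the midpoint rule. In the classic cpi.c that ships with MPICH, rank r handles intervals r+1, r+1+np, r+1+2np, ..., and the per-rank partial sums are added with MPI_Reduce. The sketch below replays that split serially in awk, so it runs without MPI. The function name simulate_cpi is hypothetical. n = 10000 is an assumption that fits the reported error: for this integrand the midpoint rule overshoots by about h^2/12, which is 8.3e-10 when h = 1/10000.

```shell
# Hypothetical helper: serial replay of cpi's work split (no MPI needed).
# $1 = number of simulated ranks (default 10), $2 = intervals n (default 10000).
simulate_cpi() {
    awk -v np="${1:-10}" -v n="${2:-10000}" 'BEGIN {
        np += 0; n += 0                        # force numeric comparisons
        h = 1.0 / n
        pi = 0.0
        for (rank = 0; rank < np; rank++) {
            # This rank takes intervals rank+1, rank+1+np, rank+1+2*np, ...
            sum = 0.0
            for (i = rank + 1; i <= n; i += np) {
                x = h * (i - 0.5)              # midpoint of interval i
                sum += 4.0 / (1.0 + x * x)
            }
            pi += h * sum                      # stands in for MPI_Reduce with MPI_SUM
        }
        err = pi - 3.141592653589793
        if (err < 0) err = -err
        printf "pi is approximately %.16f, Error is %.16f\n", pi, err
    }'
}
```

For example, simulate_cpi 10 prints a line in the same format as the real program. Changing the rank count only changes the order of the additions, so the estimate agrees up to rounding error, which is why cpi gives the same answer on one node or three.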
    • Cluster configuration
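      • First set up passwordless SSH login. ub0 (a host name) acts as the master node and the other machines are slave nodes. Steps for passwordless SSH login:
        • Send the master node's public key to each slave node. The slave then trusts the master, so the master can log in without a password. Taking ub0's key and ub1 as the example:
        • Generate a key pair on ub0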
        • $ ssh-keygen -t rsa

          Just press Enter at every prompt.

        • Copy ub0's /home/ubuntu/.ssh/id_rsa.pub to /home/ubuntu/.ssh/ on ub1; if /home/ubuntu/.ssh/ does not exist there, create it with mkdir .ssh
        • On ub1, run the following in /home/ubuntu/.ssh/
        • $ cat id_rsa.pub >> authorized_keys
        • Try ssh from ub0 to ub1 to confirm that passwordless login works; if it does, repeat these steps for the next node
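The steps on ub1 can be wrapped in a small function that also sets the permissions OpenSSH expects. With its default StrictModes setting, sshd may ignore a key whose ~/.ssh directory or authorized_keys file is writable by other users. This is a minimal sketch for a POSIX sh on the slave node, and install_pubkey is a hypothetical name. From ub0, OpenSSH's ssh-copy-id ub1 performs the same copy-and-append in one step, asking for the password once.

```shell
# Hypothetical helper, run on the slave node (e.g. ub1): append the
# master's public key to authorized_keys once, with strict permissions.
install_pubkey() {
    pubkey=$1                       # e.g. the id_rsa.pub copied from ub0
    sshdir=${2:-$HOME/.ssh}
    mkdir -p "$sshdir"
    chmod 700 "$sshdir"             # only the owner may access ~/.ssh
    touch "$sshdir/authorized_keys"
    chmod 600 "$sshdir/authorized_keys"
    # Append only if this exact key line is not already present.
    if ! grep -qxF -f "$pubkey" "$sshdir/authorized_keys"; then
        cat "$pubkey" >> "$sshdir/authorized_keys"
    fi
}
```

On ub1 that would be install_pubkey /home/ubuntu/.ssh/id_rsa.pub, after which ssh ub1 from ub0 should no longer ask for a password.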
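      • Copy the compiled installation to the other machines so that MPICH does not have to be built from source on each of them, which saves time. Run the following from /usr/local on ub0 (the ubuntu user must be able to write to /usr/local on ub1 and ub2):
        scp -r mpich ub1:/usr/local/
        scp -r mpich ub2:/usr/local/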
      • Append the following to /etc/hosts on ub0, ub1 and ub2
      • 192.168.0.2 ub0
        192.168.0.3 ub1
        192.168.0.4 ub2

        Note: the lines must be appended to /etc/hosts on all three machines.
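Because the same three lines go into /etc/hosts on every machine, a version that skips lines already present is safe to rerun. This is a minimal sketch. add_cluster_hosts is a hypothetical name, and the addresses are the ones used in this article. It is also worth checking the file by hand. Ubuntu usually maps the machine's own name to 127.0.1.1, and if that line stays, the name may resolve to a loopback address, which can stop processes on other nodes from connecting back.

```shell
# Hypothetical helper: add the cluster's address/name pairs to a hosts
# file, skipping any exact line that is already there. Defaults to
# /etc/hosts (needs root); pass a scratch file as $1 to test it.
add_cluster_hosts() {
    hosts=${1:-/etc/hosts}
    printf '%s\n' '192.168.0.2 ub0' '192.168.0.3 ub1' '192.168.0.4 ub2' |
    while read -r line; do
        # -x: whole-line match, -F: fixed string (dots are not wildcards)
        if ! grep -qxF "$line" "$hosts" 2>/dev/null; then
            printf '%s\n' "$line" >> "$hosts"
        fi
    done
}
```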

      • Copy the pi-computing executable /usr/local/mpich/examples/cpi on the master node to /home/ubuntu, and also send it to /home/ubuntu on ub1 and ub2
      • In /home/ubuntu on the master node, create a file named servers that lists each machine in the cluster and the number of processes to run on it
      • ub0:2
        ub1:2
        ub2:2
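Each line of servers has the form host:count. The small sketch below adds up the counts to show how many process slots the file declares. total_slots is a hypothetical name, and it counts a line without :count as one slot, which is an assumption about the launcher's default. For the file above the total is 6, fewer than the 10 processes requested in the next step; in the sample output those 10 processes are still spread over the three hosts.

```shell
# Hypothetical helper: sum the per-host counts in an MPICH hostfile
# such as servers (lines of the form host:count).
total_slots() {
    awk -F: '
        /^[ \t]*$/ { next }                # skip blank lines
        /^[ \t]*#/ { next }                # skip comment lines
        { total += ($2 == "" ? 1 : $2) }   # assumed: no ":count" means 1
        END { print total + 0 }
    ' "${1:-servers}"
}
```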
      • In /home/ubuntu on ub0, run
      • $ mpiexec -n 10 -f servers ./cpi

        and you should see output like the following:

      • Process 0 of 10 is on ub0
        Process 1 of 10 is on ub1
        Process 4 of 10 is on ub0
        Process 5 of 10 is on ub2
        Process 6 of 10 is on ub1
        Process 7 of 10 is on ub2
        Process 8 of 10 is on ub0
        Process 9 of 10 is on ub1
        Process 2 of 10 is on ub2
        Process 3 of 10 is on ub1
        pi is approximately 3.1415926544231256, Error is 0.0000000008333325
        wall clock time = 0.018768
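If the multi-node run hangs or prompts for a password, checking each host in servers can narrow the problem down. This sketch assumes the servers file and the copies of cpi in /home/ubuntu described above; check_nodes is a hypothetical name. The -n option stops ssh from reading standard input, and BatchMode=yes makes it fail instead of asking for a password. The remote command test -x cpi runs in the remote home directory, so a success confirms both passwordless login and the presence of the executable.

```shell
# Hypothetical pre-flight check: for every host in the hostfile, confirm
# that passwordless ssh works and that cpi exists in the remote home.
check_nodes() {
    hostfile=${1:-servers}
    failed=0
    # Host names are the part of each line before the ':'.
    for host in $(cut -d: -f1 "$hostfile"); do
        if ssh -n -o BatchMode=yes "$host" test -x cpi; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it as check_nodes servers from /home/ubuntu on ub0; every host should report ok before mpiexec is tried again.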
  • Original article: https://www.cnblogs.com/goingmyway/p/5296002.html