  • apex installation summary

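    I recently used a library that depends on apex, and it took a whole morning of fiddling to get it installed. These notes are for whoever hits the same problems later.

    Environment:
    OS: Windows

    Library: pytorch 1.9.0
    CUDA version: 11.1

    VS: 2019

    A note on VS: beyond VS itself and the default recommended C++ components, I installed extra components on the fly when problems came up, without restarting the computer. In theory this should be unrelated to the apex install, but since it happened during the process I record it here as well.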

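    1. CUDA version mismatch

    The library recommends pytorch 1.7.1 with cuda=10.2. Installing according to the library's instructions produced a CUDA mismatch error.

    Opening “apex/setup.py” shows that the CUDA version PyTorch was built with (torch_binary_major, torch_binary_minor) must equal the version of the installed CUDA toolkit as reported by nvcc (bare_metal_major, bare_metal_minor):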

    def get_cuda_bare_metal_version(cuda_dir):
        raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
        output = raw_output.split()
        release_idx = output.index("release") + 1
        release = output[release_idx].split(".")
        bare_metal_major = release[0]
        bare_metal_minor = release[1][0]
    
        return raw_output, bare_metal_major, bare_metal_minor
    
    def check_cuda_torch_binary_vs_bare_metal(cuda_dir):
        raw_output, bare_metal_major, bare_metal_minor = get_cuda_bare_metal_version(cuda_dir)
        torch_binary_major = torch.version.cuda.split(".")[0]
        torch_binary_minor = torch.version.cuda.split(".")[1]
    
        print("\nCompiling cuda extensions with")
        print(raw_output + "from " + cuda_dir + "/bin\n")
    
        if (bare_metal_major != torch_binary_major) or (bare_metal_minor != torch_binary_minor):
            raise RuntimeError("Cuda extensions are being compiled with a version of Cuda that does " +
                               "not match the version used to compile Pytorch binaries.  " +
                               "Pytorch binaries were compiled with Cuda {}.\n".format(torch.version.cuda) +
                               "In some cases, a minor-version mismatch will not cause later errors:  " +
                               "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
                               "You can try commenting out this check (at your own risk).")

    Fix: make CUDA and PyTorch agree by changing one to match the other. Also, according to setup.py, the CUDA version must be greater than 10.0.
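The comparison that setup.py performs can be reproduced outside the build as a quick pre-flight check. The following is a minimal sketch, not apex's own code: the helper names `parse_nvcc_release` and `cuda_versions_match` are made up for illustration, and the sample string only imitates the shape of `nvcc -V` output. Unlike the excerpt above, it uses a regular expression, so it does not rely on the minor version being a single digit.

```python
import re


def parse_nvcc_release(nvcc_output: str):
    """Return (major, minor) strings from the text printed by `nvcc -V`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return match.group(1), match.group(2)


def cuda_versions_match(nvcc_output: str, torch_cuda_version: str) -> bool:
    """Compare the nvcc release with the CUDA version PyTorch was built with."""
    torch_major, torch_minor = torch_cuda_version.split(".")[:2]
    return parse_nvcc_release(nvcc_output) == (torch_major, torch_minor)


# Illustrative sample text; in practice it would come from
# subprocess.check_output(["nvcc", "-V"], universal_newlines=True),
# and the second argument would be torch.version.cuda.
sample = "Cuda compilation tools, release 11.1, V11.1.105"
print(cuda_versions_match(sample, "11.1"))
print(cuda_versions_match(sample, "10.2"))
```

Running a check like this before `setup.py` makes the mismatch visible without waiting for the build to raise the RuntimeError.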

    Final choice

    python: 3.7

    pytorch install command: “”

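    2. Installing nvcc

    In cmd, with the environment activated, entering “nvcc -V” reported that it was not a recognized system command.

    I reran the CUDA 11.1 installer, chose the custom option, unchecked everything else, ticked nvcc, and installed.
    Next I added the nvcc directory to the system PATH, then used a command found online to refresh PATH in place (a job was running and I did not want to restart the computer).
    Entering “nvcc -V” in a cmd window then gave normal output.
    This is probably where the trap was laid: I did not restart after installing, which may be why the later install kept failing until I finally restarted.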
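A PATH change made in the system settings is not seen by processes that were already running, so it can help to check what the current process actually sees. The sketch below is only an illustration under that assumption; `find_nvcc` and `path_contains` are hypothetical helper names, not part of CUDA or apex.

```python
import os
import shutil


def find_nvcc(search_path=None):
    """Return the full path of nvcc if it resolves on the given PATH, else None."""
    # With search_path=None, shutil.which uses the PATH of the current process.
    return shutil.which("nvcc", path=search_path)


def path_contains(directory: str, search_path: str) -> bool:
    """Check whether `directory` is one of the entries of a PATH-style string."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in search_path.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(e)) == wanted for e in entries)


nvcc = find_nvcc()
print("nvcc found at:", nvcc if nvcc else "not on PATH")
```

If `find_nvcc()` returns None inside the Python environment used for the build while “nvcc -V” works in a fresh cmd window, the build process is probably still running with the old PATH.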


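    3. The “Given no hashes to check XXX links for project 'pip': discarding no candidates” error

    The install kept getting stuck at this message.
    1) First, I opened “apex/requirements.txt” and “apex/requirements_dev.txt”, compared them against conda list, and installed the missing libraries.

    2) Next, “https://blog.csdn.net/qq_33019383/article/details/103990248” says to install torch-scatter, so I installed it.
    3) Advice online was to delete the previously downloaded “C:\Users\Administrator\apex” folder and rerun the following commands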

    git clone https://www.github.com/nvidia/apex
    cd apex
    python3 setup.py install

    Unfortunately, none of the above worked.
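The manual comparison in step 1) can be scripted. This is a rough sketch under stated assumptions: the helper names are invented, the example inputs are hypothetical and are not the real contents of apex's requirements files, and the parser only handles simple `name` or `name<specifier>` lines (no URLs or editable installs).

```python
import re


def requirement_names(requirements_text: str):
    """Extract bare package names from requirements.txt-style text."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        # The name ends at the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        names.append(name.lower().replace("_", "-"))
    return names


def missing_packages(requirements_text: str, installed_names):
    """Return required names that are absent from `installed_names`."""
    installed = {n.lower().replace("_", "-") for n in installed_names}
    return [n for n in requirement_names(requirements_text) if n not in installed]


# Hypothetical inputs for illustration only.
example_requirements = "numpy>=1.15\ntqdm  # progress bars\nPyYAML==5.4\n"
example_installed = ["numpy", "pyyaml"]
print(missing_packages(example_requirements, example_installed))
```

In real use the first argument would be the text of a requirements file and the second the package names taken from conda list or pip list.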
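    4. Final fix

    Restart the computer. Since the library mentioned earlier also depends on other packages, I installed them while I was at it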

    pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8 diffdist

    Then run

    cd apex
    python3 setup.py install 

    There were warnings, but the install succeeded.

    torch.__version__  = 1.9.0
    
    
    setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
      warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
    running install
    running bdist_egg
    running egg_info
    writing apex.egg-info\PKG-INFO
    writing dependency_links to apex.egg-info\dependency_links.txt
    writing top-level names to apex.egg-info\top_level.txt
    reading manifest file 'apex.egg-info\SOURCES.txt'
    writing manifest file 'apex.egg-info\SOURCES.txt'
    installing library code to build\bdist.win-amd64\egg
    running install_lib
    running build_py
    creating build\lib
    creating build\lib\apex
    copying apex\__init__.py -> build\lib\apex
    creating build\lib\apex\amp
    copying apex\amp\amp.py -> build\lib\apex\amp
    copying apex\amp\compat.py -> build\lib\apex\amp
    ……
    copying build\lib\apex\pyprof\nvtx\__init__.py -> build\bdist.win-amd64\egg\apex\pyprof\nvtx
    creating build\bdist.win-amd64\egg\apex\pyprof\parse
    copying build\lib\apex\pyprof\parse\db.py -> build\bdist.win-amd64\egg\apex\pyprof\parse
    ……
    copying build\lib\apex\RNN\__init__.py -> build\bdist.win-amd64\egg\apex\RNN
    copying build\lib\apex\__init__.py -> build\bdist.win-amd64\egg\apex
    byte-compiling build\bdist.win-amd64\egg\apex\amp\amp.py to amp.cpython-37.pyc
    ……
    byte-compiling build\bdist.win-amd64\egg\apex\RNN\RNNBackend.py to RNNBackend.cpython-37.pyc
    byte-compiling build\bdist.win-amd64\egg\apex\RNN\__init__.py to __init__.cpython-37.pyc
    byte-compiling build\bdist.win-amd64\egg\apex\__init__.py to __init__.cpython-37.pyc
    creating build\bdist.win-amd64\egg\EGG-INFO
    copying apex.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
    copying apex.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
    copying apex.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
    copying apex.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
    zip_safe flag not set; analyzing archive contents...
    apex.pyprof.nvtx.__pycache__.nvmarker.cpython-37: module references __file__
    apex.pyprof.nvtx.__pycache__.nvmarker.cpython-37: module references __path__
    creating dist
    creating 'dist\apex-0.1-py3.7.egg' and adding 'build\bdist.win-amd64\egg' to it
    removing 'build\bdist.win-amd64\egg' (and everything under it)
    Processing apex-0.1-py3.7.egg
    creating c:\programdata\anaconda3\envs\XXXX\lib\site-packages\apex-0.1-py3.7.egg
    Extracting apex-0.1-py3.7.egg to c:\programdata\anaconda3\envs\XXXX\lib\site-packages
    Adding apex 0.1 to easy-install.pth file
    
    Installed c:\programdata\anaconda3\envs\XXXX\lib\site-packages\apex-0.1-py3.7.egg
    Processing dependencies for apex==0.1
    Finished processing dependencies for apex==0.1

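    5. Follow-up

    1)

    Later I found that the statement that sets up the precision mode raised an error, so the install had not actually succeeded.

    Running the command again,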

    python setup.py install

    the command simply returned to the prompt with no output.

    I switched to

    python setup.py build
    pip install -v --no-cache-dir

    Output

    torch.__version__  = 1.9.0
    
    
    setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
      warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
    

    running bdist_wheel
    running build
    running build_py
    installing to build\bdist.win-amd64\wheel
    running install
    running install_lib
    ………………………………………………………………………………………………………………………………………………
    adding 'apex-0.1.dist-info/WHEEL'
    adding 'apex-0.1.dist-info/top_level.txt'
    adding 'apex-0.1.dist-info/RECORD'
    removing build\bdist.win-amd64\wheel
    Error in atexit._run_exitfuncs:
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\envs\pytorch1.8.1\lib\site-packages\colorama\ansitowin32.py", line 59, in closed
        return stream.closed
    ValueError: underlying buffer has been detached
    done
    Created wheel for apex: filename=apex-0.1-py3-none-any.whl size=206058 sha256=8761f64146164553df82742b07c5ef2cfe9da3a82a636b9457483cb95a9544ba
    Stored in directory: C:\Users\Administrator\AppData\Local\Temp\pip-ephem-wheel-cache-8l21lyri\wheels\17\e2\d0\fbd642567ec1ec2e05aa8db3ae5d45c586c0f909da3f40de6e
    Successfully built apex
    Installing collected packages: apex

    
    

    Successfully installed apex-0.1
    1 location(s) to search for versions of pip:
    * https://pypi.org/simple/pip/
    Fetching project page and analyzing links: https://pypi.org/simple/pip/
    Getting page https://pypi.org/simple/pip/
    Found index url https://pypi.org/simple
    Starting new HTTPS connection (1): pypi.org:443
    https://pypi.org:443 "GET /simple/pip/ HTTP/1.1" 200 16538
    ……………………………………………………………………………………………………………………………………………………………………
    Found link https://files.pythonhosted.org/packages/b1/44/6e26d5296b83c6aac166e48470d57a00d3ed1f642e89adc4a4e412a01643/pip-21.1.2.tar.gz#sha256=eb5df6b9ab0af50fe1098a52fd439b04730b6e066887ff7497357b9ebd19f79b (from https://pypi.org/simple/pip/) (requires-python:>=3.6), version: 21.1.2
    Skipping link: not a file: https://pypi.org/simple/pip/
    Given no hashes to check 167 links for project 'pip': discarding no candidates
    Removed build tracker: 'C:\Users\Administrator\AppData\Local\Temp\pip-req-tracker-hs8z7jdp'

    
    

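    “Successfully installed apex-0.1” indicates that the install succeeded. Note, however, that this command does not build the CUDA and C++ extensions, so any code that touches those parts will run into problems.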
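Because a Python-only install still reports success, it may be worth checking directly whether a compiled extension module is present. The sketch below looks for `amp_C`, the extension module named in the warning from the Swin Transformer run; `has_module` and `apex_extension_report` are hypothetical helper names, and the list of modules to check is an assumption that can be extended.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module named `name` can be located."""
    # find_spec locates the module without importing it.
    return importlib.util.find_spec(name) is not None


def apex_extension_report(extension_names=("amp_C",)):
    """Map each compiled-extension module name to whether it can be found."""
    return {name: has_module(name) for name in extension_names}


for name, present in apex_extension_report().items():
    print(name, "found" if present else "missing (Python-only install?)")
```

A “missing” result right after a successful install suggests the build skipped the extensions, which matches the behaviour described above.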

    For example, running the Swin Transformer example raises a warning that “amp_C” cannot be found. As a knock-on effect, the call

    torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)

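    raises a warning when executed and in fact fails, so distributed initialization never completes. (The NCCL backend is not available in Windows builds of PyTorch, which may be the more direct cause.) As a result, all later code related to distributed training (the sampler, setting the epoch during training) has to be commented out by hand.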
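Instead of commenting out the distributed code by hand, the script could decide at run time whether to initialize a process group and with which backend. This is a sketch under an explicit assumption, namely that NCCL is only usable on Linux with CUDA and that Gloo is the fallback elsewhere; `choose_backend` and `should_init_distributed` are hypothetical helpers and are not part of Swin Transformer or apex.

```python
import platform


def choose_backend(system: str, cuda_available: bool) -> str:
    """Pick a torch.distributed backend name for the given platform.

    Assumption: NCCL is only usable on Linux with CUDA; otherwise use Gloo.
    """
    if system == "Linux" and cuda_available:
        return "nccl"
    return "gloo"


def should_init_distributed(world_size: int) -> bool:
    """Only a multi-process run needs a process group."""
    return world_size > 1


# In a training script the result would be passed as the `backend` argument
# of torch.distributed.init_process_group, with cuda_available taken from
# torch.cuda.is_available().
print(choose_backend(platform.system(), cuda_available=True))
print(choose_backend("Windows", cuda_available=True))
```

With a guard like `should_init_distributed`, a single-GPU run on Windows can skip the sampler and epoch-setting calls that depend on the process group.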

    2)

    For other install methods, see codebrid's “apex install/usage notes”.

    For testing, see “apex install/usage notes”.

  • Original article: https://www.cnblogs.com/PiaoLingJiLu/p/14889612.html