  • PyTorch 1.3 Source Code Analysis, Part 1

    pytorch$ tree -L 1
    .
    ├── android
    ├── aten
    ├── benchmarks
    ├── binaries
    ├── c10
    ├── caffe2
    ├── CITATION
    ├── cmake
    ├── CMakeLists.txt
    ├── CODEOWNERS
    ├── CONTRIBUTING.md
    ├── docker
    ├── docs
    ├── ios
    ├── LICENSE
    ├── Makefile
    ├── modules
    ├── mypy-files.txt
    ├── mypy.ini
    ├── mypy-README.md
    ├── NOTICE
    ├── README.md
    ├── requirements.txt
    ├── scripts
    ├── setup.py
    ├── submodules
    ├── test
    ├── third_party
    ├── tools
    ├── torch
    ├── ubsan.supp
    └── version.txt
    
    17 directories, 15 files

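    Notes on each directory:

    .

    ├── android

    ├── aten (ATen, "A TENsor library for C++11": PyTorch's C++ tensor library. Much of its code declares and defines the logic for tensor operations)

    ├── benchmarks (PyTorch benchmarks)

    ├── binaries (benchmark programs, including mobile benchmarks: "Run pytorch mobile benchmark in PEP")

    ├── c10 (core Tensor implementation shared by mobile and server builds. The name is usually read as "Caffe Ten", a pun on Caffe2 and ATen, rather than an abbreviation of "Caffe Tensor Library")

    ├── caffe2 (TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test. In April 2018 Facebook announced that the Caffe2 repository would be merged into the PyTorch repository, so that the two projects share code, CI, deployment, usage, and day-to-day maintenance. This directory holds the Caffe2 implementation of networks, operators, and so on. It builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old caffe2 project)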

    ├── cmake (TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test)

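    ├── ios (iOS support: the LibTorch header, the LibTorch.podspec file, and the TestApp used for testing)

    ├── modules (optional modules built through their own CMakeLists.txt: detectron, module_test, observers, and rocksdb)

    ├── scripts (build and helper scripts such as build_android.sh and build_ios.sh; also touched by iOS app testing work: "add benchmark code to iOS TestApp")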

    ├── submodules (Re-sync with internal repository)

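    ├── third_party (open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others)

    ├── tools (scripts used to build PyTorch)

    ├── torch (TH / THC provide some hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. Part of this code is generated at build time by tools/setup_helpers/generate_code.py)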

    Details: the tree expanded to two levels

    $ tree -L 2
    .
    ├── android
    │       ├── build.gradle
    │       ├── gradle
    │       ├── gradle.properties
    │       ├── libs
    │       ├── pytorch_android
    │       ├── pytorch_android_torchvision
    │       ├── run_tests.sh
    │       └── settings.gradle
    ├── aten
    │       ├── CMakeLists.txt
    │       ├── conda
    │       ├── src
    │       └── tools
    ├── benchmarks
    │       ├── fastrnns
    │       ├── framework_overhead_benchmark
    │       ├── operator_benchmark
    │       └── README.md
    ├── binaries
    │       ├── at_launch_benchmark.cc
    │       ├── bench_gen
    │       ├── benchmark_args.h
    │       ├── benchmark_helper.cc
    │       ├── benchmark_helper.h
    │       ├── caffe2_benchmark.cc
    │       ├── CMakeLists.txt
    │       ├── convert_and_benchmark.cc
    │       ├── convert_caffe_image_db.cc
    │       ├── convert_db.cc
    │       ├── convert_encoded_to_raw_leveldb.cc
    │       ├── convert_image_to_tensor.cc
    │       ├── core_overhead_benchmark.cc
    │       ├── core_overhead_benchmark_gpu.cc
    │       ├── db_throughput.cc
    │       ├── inspect_gpu.cc
    │       ├── intra_inter_benchmark.cc
    │       ├── make_cifar_db.cc
    │       ├── make_image_db.cc
    │       ├── make_mnist_db.cc
    │       ├── parallel_info.cc
    │       ├── predictor_verifier.cc
    │       ├── print_core_object_sizes_gpu.cc
    │       ├── print_registered_core_operators.cc
    │       ├── run_plan.cc
    │       ├── run_plan_mpi.cc
    │       ├── speed_benchmark.cc
    │       ├── speed_benchmark_torch.cc
    │       ├── split_db.cc
    │       ├── tsv_2_proto.cc
    │       ├── tutorial_blob.cc
    │       └── zmq_feeder.cc
    ├── c10
    │       ├── CMakeLists.txt
    │       ├── core
    │       ├── cuda
    │       ├── hip
    │       ├── macros
    │       ├── test
    │       └── util
    ├── caffe2
    │       ├── c2_aten_srcs.bzl
    │       ├── CMakeLists.txt
    │       ├── contrib
    │       ├── core
    │       ├── cuda_rtc
    │       ├── db
    │       ├── distributed
    │       ├── experiments
    │       ├── ideep
    │       ├── image
    │       ├── __init__.py
    │       ├── mobile
    │       ├── mpi
    │       ├── observers
    │       ├── onnx
    │       ├── operators
    │       ├── opt
    │       ├── perfkernels
    │       ├── predictor
    │       ├── proto
    │       ├── python
    │       ├── quantization
    │       ├── queue
    │       ├── README.md
    │       ├── release-notes.md
    │       ├── requirements.txt
    │       ├── serialize
    │       ├── sgd
    │       ├── share
    │       ├── test
    │       ├── transforms
    │       ├── utils
    │       ├── VERSION_NUMBER
    │       └── video
    ├── CITATION
    ├── cmake
    │       ├── BuildVariables.cmake
    │       ├── Caffe2Config.cmake.in
    │       ├── Caffe2ConfigVersion.cmake.in
    │       ├── cmake_uninstall.cmake.in
    │       ├── Codegen.cmake
    │       ├── Dependencies.cmake
    │       ├── External
    │       ├── GoogleTestPatch.cmake
    │       ├── iOS.cmake
    │       ├── MiscCheck.cmake
    │       ├── Modules
    │       ├── Modules_CUDA_fix
    │       ├── ProtoBuf.cmake
    │       ├── ProtoBufPatch.cmake
    │       ├── public
    │       ├── Summary.cmake
    │       ├── TorchConfig.cmake.in
    │       ├── TorchConfigVersion.cmake.in
    │       ├── Utils.cmake
    │       └── Whitelist.cmake
    ├── CMakeLists.txt
    ├── CODEOWNERS
    ├── CONTRIBUTING.md
    ├── docker
    │       ├── caffe2
    │       └── pytorch
    ├── docs
    │       ├── caffe2
    │       ├── cpp
    │       ├── libtorch.rst
    │       ├── make.bat
    │       ├── Makefile
    │       ├── requirements.txt
    │       └── source
    ├── ios
    │       ├── LibTorch.h
    │       ├── LibTorch.podspec
    │       ├── README.md
    │       └── TestApp
    ├── LICENSE
    ├── Makefile
    ├── modules
    │       ├── CMakeLists.txt
    │       ├── detectron
    │       ├── module_test
    │       ├── observers
    │       └── rocksdb
    ├── mypy-files.txt
    ├── mypy.ini
    ├── mypy-README.md
    ├── NOTICE
    ├── README.md
    ├── requirements.txt
    ├── scripts
    │       ├── add_apache_header.sh
    │       ├── apache_header.txt
    │       ├── apache_python.txt
    │       ├── appveyor
    │       ├── build_android.sh
    │       ├── build_host_protoc.sh
    │       ├── build_ios.sh
    │       ├── build_local.sh
    │       ├── build_mobile.sh
    │       ├── build_pytorch_android.sh
    │       ├── build_raspbian.sh
    │       ├── build_tegra_x1.sh
    │       ├── build_tizen.sh
    │       ├── build_windows.bat
    │       ├── diagnose_protobuf.py
    │       ├── fbcode-dev-setup
    │       ├── get_python_cmake_flags.py
    │       ├── model_zoo
    │       ├── onnx
    │       ├── proto.ps1
    │       ├── read_conda_versions.sh
    │       ├── README.md
    │       ├── remove_apache_header.sh
    │       ├── run_mobilelab.py
    │       ├── temp.sh
    │       └── xcode_build.rb
    ├── setup.py
    ├── submodules
    │       └── nervanagpu-rev.txt
    ├── test
    │       ├── backward_compatibility
    │       ├── bottleneck
    │       ├── common_cuda.py
    │       ├── common_device_type.py
    │       ├── common_distributed.py
    │       ├── common_methods_invocations.py
    │       ├── common_nn.py
    │       ├── common_quantization.py
    │       ├── common_quantized.py
    │       ├── common_utils.py
    │       ├── cpp
    │       ├── cpp_api_parity
    │       ├── cpp_extensions
    │       ├── custom_operator
    │       ├── data
    │       ├── dist_autograd_test.py
    │       ├── dist_utils.py
    │       ├── error_messages
    │       ├── expect
    │       ├── expecttest.py
    │       ├── HowToWriteTestsUsingFileCheck.md
    │       ├── hypothesis_utils.py
    │       ├── jit
    │       ├── jit_utils.py
    │       ├── onnx
    │       ├── optim
    │       ├── rpc_test.py
    │       ├── run_test.py
    │       ├── simulate_nccl_errors.py
    │       ├── test_autograd.py
    │       ├── test_c10d.py
    │       ├── test_c10d_spawn.py
    │       ├── test_cpp_api_parity.py
    │       ├── test_cpp_extensions.py
    │       ├── test_cuda_primary_ctx.py
    │       ├── test_cuda.py
    │       ├── test_dataloader.py
    │       ├── test_data_parallel.py
    │       ├── test_dist_autograd_fork.py
    │       ├── test_dist_autograd_spawn.py
    │       ├── test_distributed.py
    │       ├── test_distributions.py
    │       ├── test_docs_coverage.py
    │       ├── test_expecttest.py
    │       ├── test_fake_quant.py
    │       ├── test_function_schema.py
    │       ├── test_indexing.py
    │       ├── test_jit_disabled.py
    │       ├── test_jit_fuser.py
    │       ├── test_jit.py
    │       ├── test_jit_py3.py
    │       ├── test_jit_string.py
    │       ├── test_logging.py
    │       ├── test_mkldnn.py
    │       ├── test_module
    │       ├── test_multiprocessing.py
    │       ├── test_multiprocessing_spawn.py
    │       ├── test_namedtensor.py
    │       ├── test_namedtuple_return_api.py
    │       ├── test_nccl.py
    │       ├── test_nn.py
    │       ├── test_numba_integration.py
    │       ├── test_optim.py
    │       ├── test_qat.py
    │       ├── test_quantization.py
    │       ├── test_quantized_models.py
    │       ├── test_quantized_nn_mods.py
    │       ├── test_quantized.py
    │       ├── test_quantized_tensor.py
    │       ├── test_quantizer.py
    │       ├── test_rpc_fork.py
    │       ├── test_rpc_spawn.py
    │       ├── test_sparse.py
    │       ├── test_tensorboard.py
    │       ├── test_throughput_benchmark.py
    │       ├── test_torch.py
    │       ├── test_type_hints.py
    │       ├── test_type_info.py
    │       ├── test_type_promotion.py
    │       └── test_utils.py
    ├── third_party (open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others)
    │       ├── benchmark (Google's open-source benchmark library)
    │       ├── cpuinfo (Facebook's open-source cpuinfo, for detecting CPU information)
    │       ├── cub (NVIDIA's open-source CUB, a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming)
    │       ├── eigen (linear algebra and matrix library)
    │       ├── fbgemm (Facebook's open-source low-precision, high-performance matrix library; currently the backend for caffe2's quantized operators on x86)
    │       ├── foxi (ONNXIFI with Facebook Extension)
    │       ├── FP16 (conversion to/from half-precision floating point formats)
    │       ├── FXdiv (C99/C++ header-only library for division via fixed-point multiplication by inverse)
    │       ├── gemmlowp (Google's open-source low-precision matrix multiplication library, https://github.com/google/gemmlowp)
    │       ├── gloo (Facebook's open-source collective communications library with various primitives for multi-machine training)
    │       ├── googletest (Google's open-source unit-testing framework)
    │       ├── ideep (Intel's open-source neural network acceleration library built on MKL-DNN)
    │       ├── ios-cmake (CMake toolchain file for iOS)
    │       ├── miniz-2.0.8 (Miniz, a lossless, high-performance data compression library in a single source file)
    │       ├── nccl (NVIDIA's open-source optimized primitives for collective multi-GPU communication)
    │       ├── neon2sse (ARM-related; intended to simplify ARM->IA32 porting)
    │       ├── NNPACK (acceleration package for neural networks on multi-core CPUs)
    │       ├── onnx (Open Neural Network Exchange, a neural network model exchange format open-sourced by Facebook together with Microsoft; PyTorch, caffe2, ncnn, coreml, and others can currently interoperate with it)
    │       ├── onnx-tensorrt (ONNX-TensorRT: TensorRT backend for ONNX)
    │       ├── protobuf (Google's open-source protobuf)
    │       ├── psimd (portable 128-bit SIMD intrinsics)
    │       ├── pthreadpool (pthread-based thread pool for C/C++)
    │       ├── pybind11 (seamless operability between C++11 and Python)
    │       ├── python-enum (mirror of the enum34 package from PyPI, a PeachPy dependency, to be used in submodules)
    │       ├── python-peachpy (PeachPy, a Python framework for writing high-performance assembly kernels)
    │       ├── python-six (Python 2 and 3 compatibility library)
    │       ├── QNNPACK (Facebook's open-source quantized neural network acceleration library for mobile platforms)
    │       ├── README.md
    │       ├── sleef (SIMD Library for Evaluating Elementary Functions)
    │       ├── tbb (Intel's open-source Threading Building Blocks (TBB))
    │       └── zstd (Facebook's open-source Zstandard, a fast real-time compression library)
    ├── tools
    │       ├── amd_build
    │       ├── aten_mirror.sh
    │       ├── autograd
    │       ├── build_libtorch.py
    │       ├── build_pytorch_libs.py
    │       ├── build_variables.py
    │       ├── clang_format.py
    │       ├── clang_tidy.py
    │       ├── docker
    │       ├── download_mnist.py
    │       ├── flake8_hook.py
    │       ├── generated_dirs.txt
    │       ├── git_add_generated_dirs.sh
    │       ├── git-pre-commit
    │       ├── git_reset_generated_dirs.sh
    │       ├── __init__.py
    │       ├── jit
    │       ├── pyi
    │       ├── pytorch.version
    │       ├── README.md
    │       ├── setup_helpers
    │       └── shared
    ├── torch
    │       ├── abi-check.cpp
    │       ├── autograd
    │       ├── backends
    │       ├── _classes.py
    │       ├── CMakeLists.txt
    │       ├── __config__.py
    │       ├── contrib
    │       ├── csrc
    │       ├── cuda
    │       ├── custom_class.h
    │       ├── distributed
    │       ├── distributions
    │       ├── extension.h
    │       ├── for_onnx
    │       ├── functional.py
    │       ├── __future__.py
    │       ├── hub.py
    │       ├── __init__.py
    │       ├── __init__.pyi.in
    │       ├── jit
    │       ├── _jit_internal.py
    │       ├── legacy
    │       ├── lib
    │       ├── multiprocessing
    │       ├── _namedtensor_internals.py
    │       ├── nn
    │       ├── onnx
    │       ├── _ops.py
    │       ├── optim
    │       ├── py.typed
    │       ├── quantization
    │       ├── quasirandom.py
    │       ├── random.py
    │       ├── README.txt
    │       ├── script.h
    │       ├── serialization.py
    │       ├── _six.py
    │       ├── sparse
    │       ├── _storage_docs.py
    │       ├── storage.py
    │       ├── _tensor_docs.py
    │       ├── tensor.py
    │       ├── _tensor_str.py
    │       ├── testing
    │       ├── _torch_docs.py
    │       ├── utils
    │       ├── _utils_internal.py
    │       └── _utils.py
    ├── ubsan.supp
    └── version.txt
    
    148 directories, 219 files
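The directory and file totals printed by `tree -L 2` can be approximated in a few lines of Python. This is only a sketch: the function name `count_entries` is made up for illustration, hidden entries are skipped on the assumption that this matches tree's default behaviour, and symlinks or other special files may be counted differently than tree counts them.

```python
import os


def count_entries(root, max_depth):
    """Count directories and files under root, descending at most max_depth levels.

    Roughly mirrors the summary line of `tree -L <max_depth>`.
    """
    n_dirs = 0
    n_files = 0
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        # Skip hidden entries (assumed to match tree's default listing).
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        visible_files = [f for f in files if not f.startswith(".")]
        n_dirs += len(dirs)
        n_files += len(visible_files)
        if depth + 1 >= max_depth:
            dirs[:] = []  # entries at the deepest level are counted but not entered
    return n_dirs, n_files
```

Calling `count_entries("pytorch", 2)` on a checkout would return a `(directories, files)` pair comparable to the summary line above.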


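    The core of PyTorch falls into 5 major parts:

    1. c10 (core Tensor implementation shared by mobile and server builds)

    2. aten (ATen, "A TENsor library for C++11": PyTorch's C++ tensor library. Much of its code declares and defines the logic for tensor operations)

    3. caffe2 (TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test. In April 2018 Facebook announced that the Caffe2 repository would be merged into the PyTorch repository, so that the two projects share code, CI, deployment, usage, and day-to-day maintenance. This directory holds the Caffe2 implementation of networks, operators, and so on. It builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old caffe2 project)

    4. torch (TH / THC provide some hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. Part of this code is generated at build time by tools/setup_helpers/generate_code.py)

    5. third_party (open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others)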

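    The details are as follows:

    Core components under c10 (the most central Tensor implementation, shared by mobile and server builds). Note that the C10 library should keep its dependencies minimal. In particular, it should not depend on any implementation-specific or backend-specific library, and it should not depend on any generated protobuf header, because protobuf headers would transitively force anyone who uses C10 to link against a specific protobuf version. It contains: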

    ├── c10

    │       ├── CMakeLists.txt

    │       ├── core

    │       ├── cuda

    │       ├── hip

    │       ├── macros

    │       ├── test

    │       └── util
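The rule that c10 must not include generated protobuf headers can be checked mechanically. The sketch below is an illustration, not a tool from the PyTorch repository: the names `forbidden_includes` and `scan_tree` are hypothetical, and it assumes that generated protobuf headers can be recognized by the `.pb.h` suffix that protoc gives them.

```python
import os
import re

# Matches lines such as:  #include "foo/bar.h"   or   #include <vector>
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')
SOURCE_EXTS = (".h", ".hpp", ".cc", ".cpp", ".cu", ".cuh")


def forbidden_includes(text, forbidden_suffixes=(".pb.h",)):
    """Return (line number, header) pairs for includes ending in a forbidden suffix."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = INCLUDE_RE.match(line)
        if match and match.group(1).endswith(tuple(forbidden_suffixes)):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root, forbidden_suffixes=(".pb.h",)):
    """Apply forbidden_includes to every C/C++/CUDA source file under root."""
    report = {}
    for current, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(SOURCE_EXTS):
                path = os.path.join(current, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    hits = forbidden_includes(handle.read(), forbidden_suffixes)
                if hits:
                    report[os.path.relpath(path, root)] = hits
    return report
```

Running `scan_tree("c10")` from the repository root should return an empty dict if the rule holds. The check only matches include directives textually, so includes hidden behind macros would be missed.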

    Core components under aten (ATen, "A TENsor library for C++11": PyTorch's C++ tensor library. Much of its code declares and defines the logic for tensor operations):

    $ tree -L 2

    .

    ├── CMakeLists.txt

    ├── conda

    │    ├── build.sh

    │    └── meta.yaml

    ├── src

    │    ├── ATen

    │    ├── README.md

    │    ├── TH

    │    ├── THC

    │    ├── THCUNN

    │    └── THNN

    └── tools

            ├── run_tests.sh

            ├── test_install.sh

            └── valgrind.sup

    8 directories, 7 files

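    Under aten/src:

    This directory contains PyTorch's low-level tensor libraries, and it is also where the new C++ ATen is built. The low-level tensor libraries trace back to the original Torch project. The directory contains the following libraries: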

    * TH = TorcH

    * THC = TorcH Cuda

    * THCS = TorcH Cuda Sparse (now defunct)

    * THCUNN = TorcH CUda Neural Network (see cunn)

    * THNN = TorcH Neural Network

    * THS = TorcH Sparse (now defunct)

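    The caffe2 module

    Caffe2 is a lightweight, modular, and scalable deep learning framework. It supports TensorRT 6.0 (for optimized inference) and the PyTorch->ONNX->TRT6 unit test. In April 2018 Facebook announced that the Caffe2 repository would be merged into the PyTorch repository, so that the two projects share code, CI, deployment, usage, and day-to-day maintenance. This directory holds the Caffe2 implementation of networks, operators, and so on. It builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old caffe2 project.

    Core components under torch. TH / THC provide some hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. Ideally, these headers would not be installed at all. Instead, the public functions (in headers like THTensor.h, not THTensor.hpp) should be used to manipulate these structs. However, a few places in torch/csrc violate this abstraction, and they are marked with a pointer to this note. Each of those sites will have to be refactored when the guts of THTensor and the related structures are refactored. Part of the code is generated at build time by tools/setup_helpers/generate_code.py: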

    .

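    ├── autograd (gradient computation)

    ├── backends (backend settings, covering cuda, cudnn, mkl, mkldnn, openmp, and quantized)

    ├── csrc (all code related to Python integration. This contrasts with lib, which contains the Torch libraries that are independent of Python. csrc depends on lib, but not the other way around. It contains api, autograd, cuda, distributed, generic, jit, multiprocessing, onnx, tensor, and utils)

    ├── cuda (CUDA support)

    ├── distributed (distributed processing, including autograd)

    ├── distributions

    ├── jit (just-in-time compilation for better performance)

    ├── legacy (packages that only existed in releases before 0.5)

    ├── lib (Torch libraries that are independent of Python: c10d, libshm, and libshm_windows)

    ├── multiprocessing (multi-process support, including sharing CUDA tensors between processes)

    ├── nn (neural network operations and declarations: backends, intrinsic, modules, parallel, qat, quantized, and utils)

    ├── onnx (model exchange format)

    ├── optim (optimizers)

    ├── quantization (quantization)

    ├── utils (backcompat, bottleneck, data, ffi, hipify, and tensorboard)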

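    The third_party modules

    Open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others. See the annotated listing earlier in this article for the full contents.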

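    A layered view:

    1. Layer 1, C10: the most central Tensor implementation, used on both mobile and server.

    2. Layer 2, ATen + TH*: the implementation of tensor algorithms, made up of ATen and TH*. This layer depends on layer 1. Some of ATen's core has already been moved into C10, and Torch (TH*) is being ported into ATen.

    3. Layer 3, Caffe2: a lightweight, modular, and scalable deep learning framework. It supports TensorRT 6.0 (for optimized inference) and the PyTorch->ONNX->TRT6 unit test. It holds the Caffe2 implementation of networks, operators, and so on, and builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old caffe2 project. This layer depends on layer 2.

    4. Layer 4, Torch: the implementation of PyTorch itself. TH / THC provide some hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. This layer builds libtorch.so and libtorch_python.so (the Python bindings). Logically it depends on ATen + TH* (layer 2), but because the ATen + TH* logic is packaged inside libcaffe2.so, it depends directly on layer 3.

    5. Everything else, such as the third_party libraries: open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others, which support ATen + TH*, Caffe2, and Torch.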
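The layering above can be written down as a small dependency graph and sorted so that every layer appears after the layers it depends on. This is a sketch of the relationships as described in this article, not something read from the build system. The dictionary `LAYER_DEPS` and both function names are made up for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each key depends on the names in its set, following the layered view above.
LAYER_DEPS = {
    "c10": set(),
    "third_party": set(),
    "ATen+TH*": {"c10", "third_party"},
    "caffe2": {"ATen+TH*", "third_party"},
    "torch": {"caffe2", "third_party"},
}


def build_order(deps):
    """Return the layers ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(deps).static_order())


def all_dependencies(deps, name):
    """Return every layer that `name` depends on, directly or indirectly."""
    seen = set()
    stack = list(deps[name])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps[dep])
    return seen
```

`build_order(LAYER_DEPS)` always places torch last, and `all_dependencies(LAYER_DEPS, "torch")` shows that torch reaches c10 only indirectly, through caffe2 and ATen + TH*.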

  • Original article: https://www.cnblogs.com/jeshy/p/11751253.html