  • PaddlePaddle Inference Source Code Analysis (Part 2)

    This part walks through the creation of a Predictor. All of the code below lives under the paddle/fluid/inference/api directory.

    1. The externally exposed interfaces are all declared in paddle_inference_api.h

    namespace paddle_infer {
    using Config = paddle::AnalysisConfig;
    ///
    /// \brief A factory to help create predictors.
    ///
    /// Usage:
    ///
    /// \code{.cpp}
    /// Config config;
    /// ... // change the configs.
    /// auto predictor = CreatePredictor(config);
    /// \endcode
    ///
    PD_INFER_DECL std::shared_ptr<Predictor> CreatePredictor(
        const Config& config);  // NOLINT
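    A minimal caller-side sketch of this factory follows; the model file names are hypothetical and depend on how your model was exported:

    #include "paddle_inference_api.h"

    int main() {
      paddle_infer::Config config;
      // Hypothetical paths; substitute the files exported with your model.
      config.SetModel("model/__model__", "model/__params__");
      // The factory declared above returns a fully initialized predictor.
      auto predictor = paddle_infer::CreatePredictor(config);
      return predictor ? 0 : 1;
    }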

    2. Config is implemented in analysis_config.cc. Enabling the GPU, XPU (Baidu Kunlun), or NPU (Huawei Ascend), configuring MKL, and similar options are all set through Config. Take enabling the GPU as an example:

    void AnalysisConfig::EnableUseGpu(uint64_t memory_pool_init_size_mb,
                                      int device_id) {
    #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
      use_gpu_ = true;
      memory_pool_init_size_mb_ = memory_pool_init_size_mb;
      FLAGS_initial_gpu_memory_in_mb = memory_pool_init_size_mb_;
      gpu_device_id_ = device_id;
    #else
      LOG(ERROR) << "Please compile with gpu to EnableGpu()";
      use_gpu_ = false;
    #endif
    
      Update();
    }
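
    A short usage fragment for this switch (the 100 MB pool size and device id 0 are arbitrary example values):

    paddle_infer::Config config;
    // Request a 100 MB initial GPU memory pool on device 0. On a build
    // without CUDA/HIP this logs an error and falls back to CPU, as the
    // #else branch above shows.
    config.EnableUseGpu(100, 0);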

    Every time the configuration is modified, the Update function is called at the end.

        2.1 Update propagates the modified settings into the pass_builder_ held by Config.

    mutable std::unique_ptr<PassStrategy> pass_builder_;

        If the GPU is enabled, pass_builder_ is reset to a GpuPassStrategy, which comes preloaded with the GPU pass configuration:

    // Transfer pass_builder and copy the existing compatible passes.
      if (!pass_builder_ || ((use_gpu() ^ pass_builder_->use_gpu())) ||
          ((use_xpu() ^ pass_builder_->use_xpu())) ||
          ((use_npu() ^ pass_builder_->use_npu()))) {
        if (use_gpu()) {
          pass_builder_.reset(new GpuPassStrategy);
    
          if (use_tensorrt_) {
            // Append after the Affine_channel_conv_fuse pass.
            pass_builder()->InsertPass(3, "tensorrt_subgraph_pass");
          }
        } else if (use_xpu()) {
          PADDLE_ENFORCE_EQ(
              use_gpu(), false,
              platform::errors::InvalidArgument(
                  "Only one choice can be made between CPU and XPU."));
          pass_builder_.reset(new XpuPassStrategy);
        } else if (use_npu()) {
          PADDLE_ENFORCE_EQ(
              use_gpu(), false,
              platform::errors::InvalidArgument(
                  "Only one choice can be made between GPU and NPU."));
          pass_builder_.reset(new NpuPassStrategy);
        } else {
          pass_builder_.reset(new CpuPassStrategy);
        }
    
      }
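
    The XOR checks above mean the strategy object is rebuilt whenever the device choice flips. The resulting pass list can be inspected through the public pass_builder() accessor; a fragment (assuming a GPU build):

    paddle::AnalysisConfig config;
    config.EnableUseGpu(100, 0);  // Update() swaps in GpuPassStrategy
    // Print every IR pass the current strategy would run.
    for (const auto &pass : config.pass_builder()->AllPasses()) {
      std::cout << pass << std::endl;
    }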

    3. Once the Config is set up, CreatePredictor can be called. The implementation lives in analysis_predictor.cc.

    namespace paddle_infer {
    std::shared_ptr<Predictor> CreatePredictor(const Config &config) {  // NOLINT
      std::shared_ptr<Predictor> predictor(new Predictor(config));
      return predictor;
    }

    Predictor::Predictor(const Config &config) {
      const_cast<Config *>(&config)->SwitchUseFeedFetchOps(false);
      // The second parameter indicates that the discard log is not printed
      predictor_ = paddle::CreatePaddlePredictor<
          Config, paddle::PaddleEngineKind::kAnalysis>(config);
    }

    4. CreatePaddlePredictor is declared in paddle_api.h with two explicit specializations; the Analysis one is used here.

    enum class PaddleEngineKind {
      kNative = 0,         ///< Use the native Fluid facility.
      kAutoMixedTensorRT,  ///< Automatically mix Fluid with TensorRT.
      kAnalysis,           ///< More optimization.
    };
    
    template <typename ConfigT, PaddleEngineKind engine>
    PD_INFER_DECL std::unique_ptr<PaddlePredictor> CreatePaddlePredictor(
        const ConfigT& config);
    
    template <>
    PD_INFER_DECL std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
        NativeConfig, PaddleEngineKind::kNative>(const NativeConfig& config);
    
    template <>
    PD_INFER_DECL std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
        AnalysisConfig, PaddleEngineKind::kAnalysis>(const AnalysisConfig& config);

    The Native variant is implemented in api_impl.cc, while the Analysis variant also lives in analysis_predictor.cc.
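
    The dispatch mechanism itself is ordinary template specialization on an enum tag. A self-contained toy reduction of the pattern (all names here are illustrative, not Paddle's):

    #include <memory>

    enum class EngineKind { kNative, kAnalysis };

    struct PredictorBase {
      virtual ~PredictorBase() = default;
    };
    struct ToyAnalysisPredictor : PredictorBase {};

    struct ToyConfig {};

    // Primary template: declared but never defined, so only explicitly
    // specialized (ConfigT, kind) pairs can be instantiated.
    template <typename ConfigT, EngineKind kind>
    std::unique_ptr<PredictorBase> MakePredictor(const ConfigT &config);

    template <>
    std::unique_ptr<PredictorBase>
    MakePredictor<ToyConfig, EngineKind::kAnalysis>(const ToyConfig &) {
      return std::make_unique<ToyAnalysisPredictor>();
    }

    int main() {
      auto p = MakePredictor<ToyConfig, EngineKind::kAnalysis>(ToyConfig{});
      return p ? 0 : 1;
    }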

    5. The implementation of CreatePaddlePredictor<AnalysisConfig, PaddleEngineKind::kAnalysis> is shown below; parts of the code are omitted to focus on the logic.

    template <>
    std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
        AnalysisConfig, PaddleEngineKind::kAnalysis>(const AnalysisConfig &config) {
      ...
      VLOG(3) << "create AnalysisConfig";
      PADDLE_ENFORCE_EQ(
          config.is_valid(), true,
          platform::errors::InvalidArgument(
              "Note: Each config can only be used for one predictor."));

      // Register custom operators compiled by the user.
      // This function can only be executed once per process.
      static std::once_flag custom_operators_registered;
      std::call_once(custom_operators_registered,
                     []() { inference::RegisterAllCustomOperator(); });

      // Set up the GPU parameters.
      if (config.use_gpu()) {
        ...
        if (config.thread_local_stream_enabled() &&
            process_level_allocator_enabled) {
          PADDLE_THROW(platform::errors::Fatal(
              "When binding threads and streams, the use of "
              "process-level allocators will result in undefined result "
              "errors due to memory asynchronous operations."
              "The thread and stream binding configuration of all "
              "predictors should be the same in a single process."));
        }
      }

      std::unique_ptr<PaddlePredictor> predictor(new AnalysisPredictor(config));
      // Each config can only be used for one predictor: once creation
      // succeeds, the Config is marked invalid.
      config.SetInValid();
      auto predictor_p = dynamic_cast<AnalysisPredictor *>(predictor.get());

      if (!predictor_p->Init(nullptr)) {
        return nullptr;
      }

      if (config.mkldnn_quantizer_enabled() && !predictor_p->MkldnnQuantize()) {
        return nullptr;
      }

      return predictor;
    }

    6. Each AnalysisPredictor is assigned its own unique id:

    explicit AnalysisPredictor(const AnalysisConfig &config) : config_(config) {
        if (config_.shape_range_info_collected()) {
          config_.SwitchIrOptim(false);
          config_.EnableMemoryOptim(false);
        }
        predictor_id_ = inference::GetUniqueId();
      }
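
    inference::GetUniqueId is a small helper; a plausible sketch of such a process-wide id generator, using an atomic counter (the actual Paddle implementation may differ), looks like this:

    #include <atomic>

    // Hand back a fresh id on every call, safely across threads (sketch).
    static int GetUniqueId() {
      static std::atomic<int> counter{0};
      return counter.fetch_add(1, std::memory_order_relaxed);
    }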

    7. AnalysisPredictor::Init contains the core initialization logic and is where the predictor's resources are acquired.

    bool AnalysisPredictor::Init(
        const std::shared_ptr<framework::Scope> &parent_scope,
        const std::shared_ptr<framework::ProgramDesc> &program) {
      VLOG(3) << "Predictor::init()";
      ...
    
      // no matter with or without MKLDNN
      paddle::platform::SetNumThreads(config_.cpu_math_library_num_threads());
    
      if (!PrepareScope(parent_scope)) {
        return false;
      }
      if (!CreateExecutor()) {
        return false;
      }
      if (!PrepareProgram(program)) {
        return false;
      }
    
      // Prepare executor, create local variables.
      if (!PrepareExecutor()) {
        return true;
      }
    
      // Get the feed_target_names and fetch_target_names
      PrepareFeedFetch();
    
      return true;
    }

    8. First comes scope initialization. A Scope is a variable container that holds the input and output variables. PrepareScope enumerates all device information and creates the Scope object.

    // parent_scope = nullptr in this call path
    bool AnalysisPredictor::PrepareScope(
        const std::shared_ptr<framework::Scope> &parent_scope) {
      if (parent_scope) {
        PADDLE_ENFORCE_NOT_NULL(
            parent_scope,
            platform::errors::PreconditionNotMet(
                "Both program and parent_scope should be set in Clone mode."));
        scope_ = parent_scope;
        status_is_cloned_ = true;
      } else {
        // Query the devices; for a GPU build this goes through the CUDA API.
        // Everything device-related lives under the platform directory. All
        // devices are enumerated here and their information is cached.
        paddle::framework::InitDevices();
        // TODO(wilber): we need to release memory occupied by weights.
        scope_.reset(new paddle::framework::Scope());
        status_is_cloned_ = false;
      }
      sub_scope_ = &scope_->NewScope();
      return true;
    }

    Scope stores all of its variables in an unordered_map:

    mutable std::unordered_map<std::string, std::unique_ptr<Variable>, KeyHasher>
          vars_;

    Scopes are linked into a tree: a Scope can spawn sub_scopes, and each sub_scope's parent pointer refers to its parent node. When parameters are stored, the persistable parameters all go into the parent Scope, while the non-persistable ones go into the sub_scope. Each predictor holds its own Scope.
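
    A toy model of this layout (illustrative only, not Paddle's actual class) shows why a lookup from the sub_scope still finds weights stored in the parent:

    #include <memory>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Variable { /* tensor payload elided */ };

    class ToyScope {
     public:
      explicit ToyScope(ToyScope *parent = nullptr) : parent_(parent) {}
      // Create a child scope owned by this one.
      ToyScope &NewScope() {
        kids_.emplace_back(new ToyScope(this));
        return *kids_.back();
      }
      // Create (or fetch) a variable local to this scope.
      Variable *Var(const std::string &name) {
        auto &slot = vars_[name];
        if (!slot) slot.reset(new Variable);
        return slot.get();
      }
      // Look in this scope first, then walk up the parent chain.
      Variable *FindVar(const std::string &name) const {
        auto it = vars_.find(name);
        if (it != vars_.end()) return it->second.get();
        return parent_ ? parent_->FindVar(name) : nullptr;
      }

     private:
      ToyScope *parent_ = nullptr;
      std::unordered_map<std::string, std::unique_ptr<Variable>> vars_;
      std::vector<std::unique_ptr<ToyScope>> kids_;
    };

    int main() {
      ToyScope root;                    // persistable weights live here
      ToyScope &sub = root.NewScope();  // temporaries live here
      root.Var("fc_weight");
      sub.Var("tmp_activation");
      // Lookup from the sub-scope climbs to the parent and finds the weight.
      return sub.FindVar("fc_weight") ? 0 : 1;
    }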

    9. Create the Executor. A Place matching the configuration is created first, e.g. CPUPlace or CUDAPlace, and a NaiveExecutor is then built from place_. NaiveExecutor is used only for inference.

      ...
        place_ = paddle::platform::CPUPlace();
      }
      executor_.reset(new paddle::framework::NaiveExecutor(place_));
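
    A simplified sketch of the branch logic elided above (the real CreateExecutor also handles XPU, NPU, and other devices):

    if (config_.use_gpu()) {
      // Bind the executor to the configured GPU device.
      place_ = paddle::platform::CUDAPlace(config_.gpu_device_id());
    } else {
      place_ = paddle::platform::CPUPlace();
    }
    executor_.reset(new paddle::framework::NaiveExecutor(place_));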

    10. PrepareProgram(program), with program = nullptr here. This is a fairly heavy step: it reads the model file, loads the parameters, runs the pass optimizations, and so on, so it is covered in detail.

    10.1 LoadProgramDesc reads the contents of the model file. The proto here is the ProgramDesc message defined in framework/framework.proto; a framework::ProgramDesc object is then initialized from the parsed proto object.

    message OpDesc {
    
      message Attr {
        required string name = 1;
        required AttrType type = 2;
        optional int32 i = 3;
        optional float f = 4;
        optional string s = 5;
        repeated int32 ints = 6;
        repeated float floats = 7;
        repeated string strings = 8;
        optional bool b = 10;
        repeated bool bools = 11;
        optional int32 block_idx = 12;
        optional int64 l = 13;
        repeated int32 blocks_idx = 14;
        repeated int64 longs = 15;
        repeated double float64s = 16;
      };
    
      message Var {
        required string parameter = 1;
        repeated string arguments = 2;
      };
    
      required string type = 3;
      repeated Var inputs = 1;
      repeated Var outputs = 2;
      repeated Attr attrs = 4;
      optional bool is_target = 5 [ default = false ];
    };
    
    message VarDesc {
    
      message Attr {
        required string name = 1;
        required AttrType type = 2;
        optional int32 i = 3;
        optional string s = 4;
        repeated int32 ints = 5;
      };
    
      required string name = 1;
      required VarType type = 2;
      optional bool persistable = 3 [ default = false ];
      // True if the variable is an input data and
      // have to check the feed data shape and dtype
      optional bool need_check_feed = 4 [ default = false ];
      optional bool is_parameter = 5 [ default = false ];
      optional bool stop_gradient = 6 [ default = false ];
      repeated Attr attrs = 7;
    }
    
    message BlockDesc {
      required int32 idx = 1;
      required int32 parent_idx = 2;
      repeated VarDesc vars = 3;
      repeated OpDesc ops = 4;
      optional int32 forward_block_idx = 5 [ default = -1 ];
    }
    
    // In some cases, Paddle may perform operator definition iterations,
    // and the operator uses OpVersionMap for compatibility testing.
    message OpVersion { required int32 version = 1; }
    message OpVersionMap {
      message OpVersionPair {
        required string op_name = 1;
        required OpVersion op_version = 2;
      }
      repeated OpVersionPair pair = 1;
    }
    
    // Please refer to
    // https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md
    // for more details.
    // TODO(panyx0718): A model can have multiple programs. Need a
    // way to distinguish them. Maybe ID or name?
    message ProgramDesc {
      reserved 2, 3; // For backward compatibility.
      repeated BlockDesc blocks = 1;
      optional Version version = 4;
      optional OpVersionMap op_version_map = 5;
    }
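
    In essence, LoadProgramDesc boils down to parsing the serialized model file into this proto message. A minimal sketch of that step (error handling elided; assumes the protobuf header generated from framework.proto):

    #include <fstream>
    #include <sstream>
    #include <string>

    #include "paddle/fluid/framework/framework.pb.h"

    paddle::framework::proto::ProgramDesc LoadProto(const std::string &path) {
      // Read the whole serialized __model__ file into memory.
      std::ifstream fin(path, std::ios::in | std::ios::binary);
      std::stringstream buffer;
      buffer << fin.rdbuf();
      // Deserialize it into the ProgramDesc message defined above.
      paddle::framework::proto::ProgramDesc proto;
      proto.ParseFromString(buffer.str());
      return proto;
    }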

    10.2 NaiveExecutor->CreateVariables stores the variable information read from the model file into the Scope's vars_. It is called twice: the first call saves the persistable variables into the parent Scope, and the second saves the non-persistable ones into the sub_scope.

    // block_id = 0; persistable is true on the first call and false on the second; scope = sub_scope
    void NaiveExecutor::CreateVariables(const ProgramDesc &desc, int block_id,
                                        bool persistable, Scope *scope) {
      PADDLE_ENFORCE_NOT_NULL(scope,
                              platform::errors::InvalidArgument(
                                  "The Scope to hold variables is nullptr."));
    
      auto &global_block = desc.Block(block_id);
    
      const auto *anc = scope;
      PADDLE_ENFORCE_NE(
          anc->parent(), anc,
          platform::errors::InvalidArgument("Input scope should be child scope."));
      while (anc->parent()) {
        anc = anc->parent();
      }
    
      int num_vars = 0;
      for (auto &var : global_block.AllVars()) {
        if (var->Name() == framework::kEmptyVarName) {
          continue;
        }
        num_vars++;
    
        if (persistable == var->Persistable()) {
          if (persistable) {
            if (!anc->FindVar(var->Name())) {
              auto *ptr = const_cast<Scope *>(anc)->Var(var->Name());
              VLOG(3) << scope << " Create persistable variable " << var->Name()
                      << ", which pointer is " << ptr;
              InitializeVariable(ptr, var->GetType());
            }
          } else {
            auto *ptr = const_cast<Scope *>(scope)->Var(var->Name());
            VLOG(3) << scope << " Create variable " << var->Name()
                    << ", which pointer is " << ptr;
            InitializeVariable(ptr, var->GetType());
          }
        }
      }
      VLOG(4) << "naive executor create " << num_vars << " vars";
    }

     

    10.3 OptimizeInferenceProgram. This invokes the Analyzer to run all the configured passes, producing an optimized ProgramDesc, and then resets inference_program_ to the optimized argument_.ir_analyzed_program().

    // NOTE All the members in AnalysisConfig should be copied to Argument.
    void AnalysisPredictor::OptimizeInferenceProgram() {
      // Copy the settings from config into argument
      PrepareArgument();
      // Walk through analysis_passes and let each pass process argument
      Analyzer().Run(&argument_);
    
      PADDLE_ENFORCE_EQ(
          argument_.scope_valid(), true,
          platform::errors::InvalidArgument("The argument scope should be valid."));
      VLOG(5) << "to prepare executor";
      ARGUMENT_CHECK_FIELD((&argument_), ir_analyzed_program);
      inference_program_.reset(
          new framework::ProgramDesc(argument_.ir_analyzed_program()),
          [](framework::ProgramDesc *prog) {
    // Note, please do NOT use any member variables, because member variables may
    // have been destructed in multiple threads.
    #if PADDLE_WITH_TENSORRT
            ...
    #endif
            delete prog;
          });
      // The config and argument take a lot of storage,
      // when the predictor settings are complete, we release these stores.
      argument_.PartiallyRelease();
      config_.PartiallyRelease();
      LOG(INFO) << "======= optimize end =======";
    }
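
    Because this analysis stage is driven by the pass list assembled in step 2, it can be steered from user code before CreatePredictor. A fragment (the pass name is just an example):

    paddle_infer::Config config;
    config.SwitchIrOptim(true);  // run the IR optimization passes
    config.SwitchIrDebug(true);  // dump the graph after each pass
    // Remove a single pass from the pipeline, e.g. "fc_fuse_pass".
    config.pass_builder()->DeletePass("fc_fuse_pass");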

    11. PrepareExecutor.

      It first runs DisablePrepareDataOpt, which sweeps the ops in inference_program_ and disables the prepare-data optimization if it finds an op that is unfriendly to it. It then calls NaiveExecutor->Prepare, which hands sub_scope to the executor and, following the optimized ProgramDesc, calls CreateOps to create each op in turn and store it in the executor (see the sketch after the code below).

    bool AnalysisPredictor::PrepareExecutor() {
      DisablePrepareDataOpt(inference_program_, 0, false);
    
      executor_->Prepare(sub_scope_, *inference_program_, 0,
                         config_.use_feed_fetch_ops_);
    
      PADDLE_ENFORCE_NOT_NULL(sub_scope_,
                              platform::errors::PreconditionNotMet(
                                  "The sub_scope should not be nullptr."));
    
      return true;
    }
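
    As referenced above, this is roughly what the op-creation step inside NaiveExecutor looks like (a sketch based on naive_executor.cc; details may vary across versions). Note that because the Predictor constructor in step 3 called SwitchUseFeedFetchOps(false), the feed and fetch ops are skipped here:

    void NaiveExecutor::CreateOps(const ProgramDesc &desc, int block_id,
                                  bool with_feed_fetch_ops) {
      for (const auto &op_desc : desc.Block(block_id).AllOps()) {
        // Skip feed/fetch ops when the caller handles I/O itself.
        if (!with_feed_fetch_ops &&
            (op_desc->Type() == "feed" || op_desc->Type() == "fetch")) {
          continue;
        }
        // Instantiate the operator from its description and keep it.
        ops_.emplace_back(OpRegistry::CreateOp(*op_desc));
      }
    }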

    12. PrepareFeedFetch creates the feed and fetch variables in the sub_scope and binds them to the model's input and output ops:

    void AnalysisPredictor::PrepareFeedFetch() {
      PADDLE_ENFORCE_NOT_NULL(sub_scope_,
                              platform::errors::InvalidArgument(
                                  "The sub_scope should not be nullptr."));
      CreateFeedFetchVar(sub_scope_);
      for (auto *op : inference_program_->Block(0).AllOps()) {
        if (op->Type() == "feed") {
          int idx = BOOST_GET_CONST(int, op->GetAttr("col"));
          if (feeds_.size() <= static_cast<size_t>(idx)) {
            feeds_.resize(idx + 1);
          }
          feeds_[idx] = op;
          feed_names_[op->Output("Out")[0]] = idx;
          idx2feeds_[idx] = op->Output("Out")[0];
        } else if (op->Type() == "fetch") {
          int idx = BOOST_GET_CONST(int, op->GetAttr("col"));
          if (fetches_.size() <= static_cast<size_t>(idx)) {
            fetches_.resize(idx + 1);
          }
          fetches_[idx] = op;
          idx2fetches_[idx] = op->Input("X")[0];
        }
      }
    }
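
    These feed/fetch bindings are what the user-facing tensor API resolves against at run time. A fragment of typical caller code (the shape and data are placeholders for your model):

    auto input_names = predictor->GetInputNames();
    auto input = predictor->GetInputHandle(input_names[0]);
    input->Reshape({1, 3, 224, 224});                 // example shape
    std::vector<float> data(1 * 3 * 224 * 224, 0.f);  // placeholder input
    input->CopyFromCpu(data.data());

    predictor->Run();

    auto output_names = predictor->GetOutputNames();
    auto output = predictor->GetOutputHandle(output_names[0]);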
    Contact: emhhbmdfbGlhbmcxOTkxQDEyNi5jb20=
  • Original post: https://www.cnblogs.com/zl1991/p/15688762.html