1. PaddlePaddle runs normally on the CPU, but on the GPU it fails with a long error message. An excerpt:
paddle.fluid.core.EnforceNotMet: enforce allocating <= available failed, 1835602936 > 1651048192 at [/paddle/paddle/fluid/platform/gpu_info.cc:119]
PaddlePaddle Call Stacks:
0 0x7f89b8241736p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f89b91f0afep paddle::platform::GpuMaxChunkSize() + 766
2 0x7f89b9120aadp paddle::memory::GetGPUBuddyAllocator(int) + 141
3 0x7f89b9120cbcp void* paddle::memory::Alloc<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace, unsigned long) + 28
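Solution: the message says that Paddle tried to allocate 1835602936 bytes while only 1651048192 bytes were available on the GPU. Lower the fraction of GPU memory that Paddle reserves at startup by setting this environment variable: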
export FLAGS_fraction_of_gpu_memory_to_use=0
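To avoid setting it again for every run or in every terminal, add this line to the user-level ~/.bashrc or the system-level /etc/profile, whichever you prefer.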
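If editing shell startup files is not convenient, the same variable can be set from Python for a single process. This is only a minimal sketch: the helper name configure_gpu_memory_fraction is made up for illustration, and it assumes that Paddle reads FLAGS_* environment variables when it is imported, so the call has to happen before the import.

```python
import os


def configure_gpu_memory_fraction(fraction="0"):
    """Set the flag for this process only (hypothetical helper).

    setdefault keeps any value already exported in the shell,
    so ~/.bashrc or /etc/profile still takes precedence.
    """
    os.environ.setdefault("FLAGS_fraction_of_gpu_memory_to_use", str(fraction))
    return os.environ["FLAGS_fraction_of_gpu_memory_to_use"]


configure_gpu_memory_fraction()

# Assumption: the flag is read at import time, so import Paddle only now.
# import paddle.fluid as fluid
```

The shell export above remains the simpler choice when every script on the machine should use the same setting.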
2. With the Fluid version of PaddlePaddle, calling exe.run raises an error:
---------------------------------------------------------------------------
EnforceNotMet Traceback (most recent call last)
<ipython-input-2-ca8c92bb26a4> in <module>()
     29 loss = exe.run(fluid.default_main_program(),
     30 feed=feeder.feed(data),
---> 31 fetch_list=[avg_cost])
     32 print("Pass {0},Loss {1}".format(pass_id,loss))

/home/dzqiu/anaconda2/lib/python2.7/site-packages/paddle/fluid/executor.pyc in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    441
    442 self._feed_data(program, feed, feed_var_name, scope)
--> 443 self.executor.run(program.desc, scope, 0, True, True)
    444 outs = self._fetch_data(fetch_list, fetch_var_name, scope)
    445 if return_numpy:

EnforceNotMet: enforce y_dims.size() > y_num_col_dims failed, 1 <= 1 The input tensor Y's rank of MulOp should be larger than y_num_col_dims. at [/paddle/paddle/fluid/operators/mul_op.cc:52]
PaddlePaddle Call Stacks:
0 0x7f3db10d7736p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1 0x7f3db16da696p paddle::operators::MulOp::InferShape(paddle::framework::InferShapeContext*) const + 2774
2 0x7f3db1f0ef7bp paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 91
3 0x7f3db1f0c6edp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 205
4 0x7f3db11734afp paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 255
5 0x7f3db1174500p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 128
Solution:
Before training, run the startup program once so that the parameters are initialized: exe.run(fluid.default_startup_program())
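The required call order can be sketched as below. This is an illustration, not the original poster's code: the train function and its batches argument are hypothetical, and the only executor call assumed is run(program, feed=..., fetch_list=...), as it appears in the traceback above.

```python
def train(exe, startup_program, main_program, batches, fetch_list):
    """Run the startup program once, then the main program for each batch.

    `train` and `batches` are hypothetical names used for illustration.
    """
    # Initialize the parameters first; skipping this step is what
    # leads to the MulOp error shown above.
    exe.run(startup_program)

    results = []
    for feed in batches:
        # One training step per batch, fetching the requested variables.
        results.append(exe.run(main_program, feed=feed, fetch_list=fetch_list))
    return results
```

With Fluid this would be called roughly as train(exe, fluid.default_startup_program(), fluid.default_main_program(), batches, [avg_cost]), where each element of batches is the result of feeder.feed(data).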