Background
Error message:
[libprotobuf FATAL bc_out/baidu/aicd/bvs-algo/common/proto/processor.pb.cc:59] CHECK failed: file != NULL:
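Symptom:
The program fails at runtime. The application code itself was not changed; only the SDK shared library files were replaced with new ones.
In the .cc file generated by protobuf, common/proto/processor.pb.cc, the failing code is: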
const ::google::protobuf::FileDescriptor* file =
    ::google::protobuf::DescriptorPool::generated_pool()->FindFileByName(
        "xxxx.proto");
GOOGLE_CHECK(file != NULL);
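How the program is linked:
The process loads libtest1.so via dlopen at startup;
libtest1.so depends on libcommon.so; the missing xxxx.proto is compiled into this common library, which also statically links libprotobuf;
libtest1 in turn loads another so, libtest222.so, via dlopen, using the following call: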
handle = dlopen(sopath.c_str(), RTLD_NOW | RTLD_GLOBAL | RTLD_DEEPBIND);
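Analysis result
The error occurs because processor.proto cannot be found in the global ::google::protobuf::DescriptorPool::generated_pool(), even though processor.proto lives in the common module and its code was not modified.
Judging from the error, what was registered in the global generated_pool was either overwritten by another copy or wiped out by a re-initialization.
The references say: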
Protocol Buffers holds a global registry based on the .proto filename. When 2 pieces of software try to add the same PB message to this registry, you get a name conflict.
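Within a single process, libprotobuf maintains one global descriptor database.
C++ global objects are initialized before main() is entered (or, for a library loaded with dlopen, when that library is loaded), which means the metadata from the .proto files has already been imported into the registry's tables by the time the code that uses it runs.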
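The registration-before-main behaviour described above can be sketched without protobuf. This is a minimal model under stated assumptions: FileRecord, Registry(), StaticRegistrar and FindFileByName() here are hypothetical stand-ins, not protobuf's own types or functions, and the sketch only mirrors the general idea of a name-keyed registry filled by static initializers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an entry in the generated pool.
struct FileRecord {
  std::string name;
};

// Function-local static, so the map is constructed on first use and is
// safe to touch from other static initializers.
std::map<std::string, FileRecord>& Registry() {
  static std::map<std::string, FileRecord> registry;
  return registry;
}

// Hypothetical stand-in for what a generated .pb.cc file does at load
// time: a namespace-scope object whose constructor registers the file.
struct StaticRegistrar {
  explicit StaticRegistrar(const std::string& name) {
    const bool inserted = Registry().emplace(name, FileRecord{name}).second;
    // Mirrors the name-conflict rule: one entry per .proto file name.
    assert(inserted && "file name registered twice");
    (void)inserted;
  }
};

// Constructed during static initialization: before main() in an
// executable, or during dlopen() for a shared library.
static StaticRegistrar register_processor("processor.proto");

// Returns the record for `name`, or nullptr if it was never registered.
const FileRecord* FindFileByName(const std::string& name) {
  auto it = Registry().find(name);
  return it == Registry().end() ? nullptr : &it->second;
}
```

Calling FindFileByName("processor.proto") from main() returns a non-null pointer without any explicit registration call in main(), because the registrar object has already run.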
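Inspecting the so dependencies showed that one newly introduced so depends on libprotobuf.so. The cause is therefore most likely that the global descriptor database was initialized a second time, similar to the following case: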
- InitGeneratedPoolOnce() is called before generated_pool_init_ is dynamically initialized. It will call InitGeneratedPool() and create EncodedDescriptorDatabase/DescriptorPool for the first time.
- dynamical initialization of generated_pool_init_ happens and it simply zeros the variable's content.
- InitGeneratedPoolOnce() is called again and it calls InitGeneratedPool() again which creates new EncodedDescriptorDatabase/DescriptorPool for the second time.
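The three steps of this case can be replayed in a small model. This is a sketch, assuming a hand-rolled boolean flag in place of the real once-flag; MiniPool, PoolState and InitPoolOnce() are hypothetical names and do not correspond to protobuf's implementation.

```cpp
#include <memory>
#include <set>
#include <string>

// Hypothetical miniature of a pool that records registered file names.
struct MiniPool {
  std::set<std::string> files;
};

struct PoolState {
  bool initialized = false;        // plays the role of generated_pool_init_
  std::unique_ptr<MiniPool> pool;  // plays the role of the generated pool
};

// Creates the pool only if the flag says it has not been created yet.
MiniPool* InitPoolOnce(PoolState& state) {
  if (!state.initialized) {
    state.pool.reset(new MiniPool());  // any earlier pool is discarded here
    state.initialized = true;
  }
  return state.pool.get();
}

// Replays the three steps and reports whether the file registered in
// step 1 is still visible after step 3.
bool RegistrationSurvivesFlagReset() {
  PoolState state;
  InitPoolOnce(state)->files.insert("processor.proto");  // step 1
  state.initialized = false;               // step 2: the flag is zeroed
  MiniPool* second = InitPoolOnce(state);  // step 3: the pool is rebuilt
  return second->files.count("processor.proto") > 0;
}
```

In this model RegistrationSurvivesFlagReset() returns false: once the flag is cleared, the second initialization produces an empty pool and the earlier registration is no longer reachable.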
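Fix:
Removing the new so's dependency on the libprotobuf shared library and restarting the program made the error above go away.
Use patchelf to remove the dependency on the libprotobuf shared library: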
$ patchelf --print-needed libtest222.so
libpthread.so.0
libdl.so.2
libprotobuf.so.11
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
$ patchelf --remove-needed libprotobuf.so.11 libtest222.so
$ patchelf --print-needed libtest222.so
libpthread.so.0
libdl.so.2
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
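The same check can be scripted before a plugin is loaded. The sketch below assumes the text of `patchelf --print-needed` has already been captured into a string; SplitLines() and NeedsSharedProtobuf() are hypothetical helper names, and the prefix match deliberately covers only libprotobuf.so.*, not other protobuf variants.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text such as the output of `patchelf --print-needed` into
// non-empty lines.
std::vector<std::string> SplitLines(const std::string& text) {
  std::vector<std::string> lines;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty()) lines.push_back(line);
  }
  return lines;
}

// Returns true if any needed-library entry names a shared libprotobuf,
// for example "libprotobuf.so.11".
bool NeedsSharedProtobuf(const std::vector<std::string>& needed) {
  const std::string prefix = "libprotobuf.so";
  for (const std::string& entry : needed) {
    if (entry.compare(0, prefix.size(), prefix) == 0) return true;
  }
  return false;
}
```

Fed the first listing above, NeedsSharedProtobuf() returns true; fed the listing after `--remove-needed`, it returns false.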
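Analysis process
1. Wrap the failing code in a simple function, testproto(), and call it early, in init(). The goal is to check whether the call also fails elsewhere and to rule out effects from the rest of the program's execution. Note that the algorithm so files are also loaded in init().
2. After this change, calling testproto() before the algorithm so is loaded does not trigger CHECK failed: file != NULL. A second testproto() call made after the algorithm so is loaded, however, crashes while the log message is being printed, with the stack inside protobuf access code.
If testproto() is called for the first time only after the algorithm so is loaded, it reports CHECK failed: file != NULL.
3. init() loads four third-party so files. Removing them one at a time showed that only one of them triggers the crash.
4. Focusing on that so: unlike the other algorithm so files, it depends on the libprotobuf.so.11 shared library. The guess was that the problem is related to global object initialization.
The references show that libprotobuf maintains a global descriptor database and that C++ global objects are initialized before main() is entered.
Since the program already has libprotobuf statically linked and additionally loads a so that links the libprotobuf shared library, repeated initialization is the likely cause.
The so's dependencies were therefore edited by hand to drop the libprotobuf shared library for verification. With the libprotobuf.so.11 dependency removed, the error no longer occurs.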
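One way to picture the hypothesis in step 4 is a process that holds two independent copies of the registry, with lookups ending up in the copy that never saw the registration. This is only a model under that assumption: PoolCopy, Process and LookupAfterRebinding() are hypothetical, and in the real process the choice of copy would be made by the dynamic linker's symbol resolution rather than by application code.

```cpp
#include <set>
#include <string>

// Hypothetical model of one copy of the generated pool.
class PoolCopy {
 public:
  void Register(const std::string& file) { files_.insert(file); }
  bool Has(const std::string& file) const { return files_.count(file) > 0; }

 private:
  std::set<std::string> files_;
};

// Two copies in one process: the one statically linked into libcommon.so
// and the one that arrives with libprotobuf.so.11.
struct Process {
  PoolCopy static_copy;
  PoolCopy shared_copy;
  // Which copy a call to generated_pool() ends up in.
  const PoolCopy* resolved = &static_copy;
};

// Registers processor.proto in the static copy, optionally redirects
// lookups to the shared copy, then reports whether the file is found.
bool LookupAfterRebinding(bool rebind_to_shared_copy) {
  Process p;
  p.static_copy.Register("processor.proto");
  if (rebind_to_shared_copy) p.resolved = &p.shared_copy;
  return p.resolved->Has("processor.proto");
}
```

LookupAfterRebinding(false) finds the file, while LookupAfterRebinding(true) does not, which is the shape of failure that CHECK failed: file != NULL reports.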
5. Print the address of generated_pool() to verify:
First run, without loading the so in question:
test--- init-testproto
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x366cf20
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x366cf20
Second run, loading the so in question:
test--- init-testproto
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x283d310, xxxx.proto file = 0x7f6c8c00edc0
init libtest222.so // load the shared library
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x283d310, xxxx.proto file = 0x7f6c8c00edc0
test--- test-proto
test--- in log process
signal 11 received // when ::google::protobuf::DescriptorPool::generated_pool() is called before the new so is loaded, the xxxx.proto file pointer is non-null; calling testproto() again after the so is loaded still crashes
deregister service for signal 11
stop node successfully!
Segmentation fault (core dumped)
Third run, loading the new so without calling generated_pool() before the load:
test--- init-testproto
init libtest222.so // load the shared library
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x7f7d600545f0, xxxx.proto file = (nil)
test--- test-proto
test--- in log process
[libprotobuf FATAL bc_out/baidu/aicd/bvs-algo/common/proto/processor.pb.cc:59] CHECK failed: file != NULL: // the first call to ::google::protobuf::DescriptorPool::generated_pool() after the so is loaded reports CHECK failed: file != NULL
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: file != NULL:
signal 11 received
deregister service for signal 11
Aborted (core dumped)
Fourth run, with the so's dependency on the libprotobuf shared library removed, loading the so to verify:
test--- init-testproto
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x39ce350
init libtest222.so // load the shared library
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x39ce350
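Fifth run, keeping the so's dependency on the libprotobuf shared library and loading the track so to verify, but with RTLD_GLOBAL removed from the dlopen call:
handle = dlopen(sopath.c_str(), RTLD_NOW | RTLD_GLOBAL | RTLD_DEEPBIND); // flags currently used to load the so; RTLD_GLOBAL is dropped for this run
init libtest222.so
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x3b12260, xxxx.proto file = 0x7fb79d5cf200 // with RTLD_GLOBAL removed, the so loads and the FileDescriptor for xxxx.proto is still found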
test--- test-proto
test--- in log process
test--- test-proto
RTLD_GLOBAL
causes symbols from shared libraries to be made public and available for relocation. This is needed when you import several separate libraries via dlopen(), that use each other's symbols.
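The difference between the original dlopen call and the fifth run comes down to one bit in the flag mask. The helper below is a small sketch for making that choice explicit per library; LoadOptions and MakeDlopenFlags() are hypothetical names, and only the three flags that appear in this write-up are modelled.

```cpp
#include <dlfcn.h>

// Hypothetical options struct for one dlopen() call.
struct LoadOptions {
  bool global = false;    // RTLD_GLOBAL: make the library's symbols public
  bool deep_bind = true;  // RTLD_DEEPBIND: prefer the library's own symbols
};

// Builds the flag mask for dlopen(). RTLD_NOW is always set, as in the
// original call.
int MakeDlopenFlags(const LoadOptions& options) {
  int flags = RTLD_NOW;
  if (options.global) flags |= RTLD_GLOBAL;
#ifdef RTLD_DEEPBIND  // glibc extension, not available on every platform
  if (options.deep_bind) flags |= RTLD_DEEPBIND;
#endif
  return flags;
}
```

With default options the mask corresponds to the fifth run (no RTLD_GLOBAL), and it would be passed as the second argument of dlopen(sopath.c_str(), ...). Setting global to true reproduces the original mask.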
Relevant protobuf source
1. Relevant definitions in protobuf/include/google/protobuf/descriptor.h:
namespace google {
namespace protobuf {
// Used to construct descriptors.
//
// Normally you won't want to build your own descriptors. Message classes
// constructed by the protocol compiler will provide them for you. However,
// if you are implementing Message on your own, or if you are writing a
// program which can operate on totally arbitrary types and needs to load
// them from some sort of database, you might need to.
//
// Since Descriptors are composed of a whole lot of cross-linked bits of
// data that would be a pain to put together manually, the
// DescriptorPool class is provided to make the process easier. It can
// take a FileDescriptorProto (defined in descriptor.proto), validate it,
// and convert it to a set of nicely cross-linked Descriptors.
//
// DescriptorPool also helps with memory management. Descriptors are
// composed of many objects containing static data and pointers to each
// other. In all likelihood, when it comes time to delete this data,
// you'll want to delete it all at once. In fact, it is not uncommon to
// have a whole pool of descriptors all cross-linked with each other which
// you wish to delete all at once. This class represents such a pool, and
// handles the memory management for you.
//
// You can also search for descriptors within a DescriptorPool by name, and
// extensions by number.
class LIBPROTOBUF_EXPORT DescriptorPool {
 public:
  // Get a pointer to the generated pool. Generated protocol message classes
  // which are compiled into the binary will allocate their descriptors in
  // this pool. Do not add your own descriptors to this pool.
  static const DescriptorPool* generated_pool();
  ...
};
}  // namespace protobuf
}  // namespace google
2. descriptor.cc
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.cc
// generated_pool ====================================================
namespace {
EncodedDescriptorDatabase* GeneratedDatabase() {
  static auto generated_database =
      internal::OnShutdownDelete(new EncodedDescriptorDatabase());
  return generated_database;
}
DescriptorPool* NewGeneratedPool() {
  auto generated_pool = new DescriptorPool(GeneratedDatabase());
  generated_pool->InternalSetLazilyBuildDependencies();
  return generated_pool;
}
}  // anonymous namespace
DescriptorDatabase* DescriptorPool::internal_generated_database() {
  return GeneratedDatabase();
}
DescriptorPool* DescriptorPool::internal_generated_pool() {
  static DescriptorPool* generated_pool =
      internal::OnShutdownDelete(NewGeneratedPool());
  return generated_pool;
}
const DescriptorPool* DescriptorPool::generated_pool() {
  const DescriptorPool* pool = internal_generated_pool();
  // Ensure that descriptor.proto has been registered in the generated pool.
  DescriptorProto::descriptor();
  return pool;
}
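The excerpt above relies on function-local statics for lazy, one-time construction. The pattern can be tried in isolation as follows; MiniPool, NewMiniPool(), InternalMiniPool() and ConstructionCount() are hypothetical stand-ins that only mirror the shape of internal_generated_pool(), not its behaviour inside protobuf.

```cpp
#include <string>

// Hypothetical stand-in type; not protobuf's DescriptorPool.
struct MiniPool {
  std::string label;
};

// Counts how many times the factory ran, so the one-time construction
// can be observed.
int& ConstructionCount() {
  static int count = 0;
  return count;
}

MiniPool* NewMiniPool() {
  ++ConstructionCount();
  return new MiniPool{"generated"};
}

// Same shape as internal_generated_pool(): the function-local static
// pointer is initialized on the first call and returned unchanged on
// every later call.
MiniPool* InternalMiniPool() {
  static MiniPool* pool = NewMiniPool();  // intentionally never deleted
  return pool;
}
```

Every call to InternalMiniPool() within one copy of this code returns the same address. A second copy of the same code elsewhere in the process would presumably carry its own static and therefore its own pool, which fits the differing generated_pool() addresses printed in the runs above.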
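References
Analysis of the automatic reflection mechanism for Google Protocol Buffer message types (original title: 解析Google Protocol Buffer消息类型的自动反射原理)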
global std::once_flag variables don't work with Visual Studio #4773