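In ClassFileParser::parseClassFile(), once the constant pool, the superclass and the interfaces have been parsed, parse_fields() is called to parse the field information. The call looks like this: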
u2 java_fields_count = 0;
// Fields (offsets are filled in later)
FieldAllocationCount fac;
Array<u2>* fields = parse_fields(class_name,
access_flags.is_interface(),
&fac, &java_fields_count,
CHECK_(nullHandle));
Before parse_fields() is called, a variable fac of type FieldAllocationCount is defined. The class is defined as follows:
Source: classFileParser.cpp
class FieldAllocationCount: public ResourceObj {
public:
u2 count[MAX_FIELD_ALLOCATION_TYPE];
FieldAllocationCount() {
for (int i = 0; i < MAX_FIELD_ALLOCATION_TYPE; i++) { // MAX_FIELD_ALLOCATION_TYPE is 10
count[i] = 0;
}
}
FieldAllocationType update(bool is_static, BasicType type) {
FieldAllocationType atype = basic_type_to_atype(is_static, type);
// Make sure there is no overflow with injected fields.
assert(count[atype] < 0xFFFF, "More than 65535 fields");
count[atype]++;
return atype;
}
};
The count array records how many fields there are of each allocation type. These types are the values of the FieldAllocationType enum, defined as follows:
enum FieldAllocationType {
STATIC_OOP, // 0 Oops
STATIC_BYTE, // 1 Boolean, Byte, char
STATIC_SHORT, // 2 shorts
STATIC_WORD, // 3 ints
STATIC_DOUBLE, // 4 aligned long or double
NONSTATIC_OOP, // 5
NONSTATIC_BYTE, // 6
NONSTATIC_SHORT, // 7
NONSTATIC_WORD, // 8
NONSTATIC_DOUBLE, // 9
MAX_FIELD_ALLOCATION_TYPE, // 10
BAD_ALLOCATION_TYPE = -1
};
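Five categories are counted, separately for static and non-static fields, so that when memory is allocated later, the required size can be computed from these counts. The categories are:
- Oop: references (objects and arrays)
- Byte: 1-byte types (boolean, byte)
- Short: 2-byte types (short, char)
- Word: 4-byte types (int, float)
- Double: 8-byte types (long, double)
update() increments the count for the allocation type that corresponds to a field. The BasicType enum it takes as input is defined as follows:
Source: utilities/globalDefinitions.hpp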
enum BasicType {
T_BOOLEAN = 4,
T_CHAR = 5,
T_FLOAT = 6,
T_DOUBLE = 7,
T_BYTE = 8,
T_SHORT = 9,
T_INT = 10,
T_LONG = 11,
T_OBJECT = 12,
T_ARRAY = 13,
T_VOID = 14,
T_ADDRESS = 15, // the returnAddress type, i.e. the return address used by the ret instruction
T_NARROWOOP = 16,
T_METADATA = 17,
T_NARROWKLASS = 18,
T_CONFLICT = 19, // for stack value type with conflicting contents
T_ILLEGAL = 99
};
basic_type_to_atype() converts a BasicType value into the corresponding FieldAllocationType value through a lookup table:
static FieldAllocationType _basic_type_to_atype[2 * (T_CONFLICT + 1)] = {
BAD_ALLOCATION_TYPE, // 0
BAD_ALLOCATION_TYPE, // 1
BAD_ALLOCATION_TYPE, // 2
BAD_ALLOCATION_TYPE, // 3
///////////////////////////////////////////////////////////
NONSTATIC_BYTE , // T_BOOLEAN = 4,
NONSTATIC_SHORT, // T_CHAR = 5,
NONSTATIC_WORD, // T_FLOAT = 6,
NONSTATIC_DOUBLE, // T_DOUBLE = 7,
NONSTATIC_BYTE, // T_BYTE = 8,
NONSTATIC_SHORT, // T_SHORT = 9,
NONSTATIC_WORD, // T_INT = 10,
NONSTATIC_DOUBLE, // T_LONG = 11,
NONSTATIC_OOP, // T_OBJECT = 12,
NONSTATIC_OOP, // T_ARRAY = 13,
///////////////////////////////////////////////////////////
BAD_ALLOCATION_TYPE, // T_VOID = 14,
BAD_ALLOCATION_TYPE, // T_ADDRESS = 15,
BAD_ALLOCATION_TYPE, // T_NARROWOOP = 16,
BAD_ALLOCATION_TYPE, // T_METADATA = 17,
BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
BAD_ALLOCATION_TYPE, // T_CONFLICT = 19,
BAD_ALLOCATION_TYPE, // 0
BAD_ALLOCATION_TYPE, // 1
BAD_ALLOCATION_TYPE, // 2
BAD_ALLOCATION_TYPE, // 3
///////////////////////////////////////////////////////////
STATIC_BYTE , // T_BOOLEAN = 4,
STATIC_SHORT, // T_CHAR = 5,
STATIC_WORD, // T_FLOAT = 6,
STATIC_DOUBLE, // T_DOUBLE = 7,
STATIC_BYTE, // T_BYTE = 8,
STATIC_SHORT, // T_SHORT = 9,
STATIC_WORD, // T_INT = 10,
STATIC_DOUBLE, // T_LONG = 11,
STATIC_OOP, // T_OBJECT = 12,
STATIC_OOP, // T_ARRAY = 13,
///////////////////////////////////////////////////////////
BAD_ALLOCATION_TYPE, // T_VOID = 14,
BAD_ALLOCATION_TYPE, // T_ADDRESS = 15,
BAD_ALLOCATION_TYPE, // T_NARROWOOP = 16,
BAD_ALLOCATION_TYPE, // T_METADATA = 17,
BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
BAD_ALLOCATION_TYPE, // T_CONFLICT = 19,
};
static FieldAllocationType basic_type_to_atype(bool is_static, BasicType type) {
assert(type >= T_BOOLEAN && type < T_VOID, "only allowable values");
FieldAllocationType result = _basic_type_to_atype[ type + (is_static ? (T_CONFLICT + 1) : 0) ];
assert(result != BAD_ALLOCATION_TYPE, "bad type");
return result;
}
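The implementation of basic_type_to_atype() is straightforward: the table consists of two halves of T_CONFLICT + 1 = 20 entries each, the first for non-static fields and the second for static fields, so the type of a static field is offset by 20 before the lookup. The asserts reject any type outside the range T_BOOLEAN to T_ARRAY.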
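To make the two-half indexing concrete, here is a minimal standalone C++ sketch of the lookup and of FieldAllocationCount::update(). The enum values are copied from the listings above; the table is filled programmatically instead of being written out literally, and the helper names nonstatic_atype and AtypeTable are hypothetical. It is an illustration, not the HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// BasicType values copied from globalDefinitions.hpp (only the ones used here).
enum BasicType {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7,
  T_BYTE = 8, T_SHORT = 9, T_INT = 10, T_LONG = 11,
  T_OBJECT = 12, T_ARRAY = 13, T_VOID = 14, T_CONFLICT = 19
};

enum FieldAllocationType {
  STATIC_OOP, STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,
  NONSTATIC_OOP, NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,
  MAX_FIELD_ALLOCATION_TYPE,
  BAD_ALLOCATION_TYPE = -1
};

const int kHalf = T_CONFLICT + 1;  // 20 entries per half of the table

// Hypothetical helper: non-static allocation type for one BasicType value,
// using the same pairs as the table above.
static FieldAllocationType nonstatic_atype(int t) {
  switch (t) {
    case T_BOOLEAN: case T_BYTE:  return NONSTATIC_BYTE;
    case T_CHAR:    case T_SHORT: return NONSTATIC_SHORT;
    case T_FLOAT:   case T_INT:   return NONSTATIC_WORD;
    case T_DOUBLE:  case T_LONG:  return NONSTATIC_DOUBLE;
    case T_OBJECT:  case T_ARRAY: return NONSTATIC_OOP;
    default:                      return BAD_ALLOCATION_TYPE;
  }
}

// Hypothetical helper that fills the 2 * 20 entry table: first half non-static,
// second half static (each static type is its non-static twin minus 5).
struct AtypeTable {
  FieldAllocationType entries[2 * kHalf];
  AtypeTable() {
    for (int t = 0; t < kHalf; t++) {
      FieldAllocationType ns = nonstatic_atype(t);
      entries[t] = ns;
      entries[kHalf + t] = (ns == BAD_ALLOCATION_TYPE)
          ? BAD_ALLOCATION_TYPE
          : static_cast<FieldAllocationType>(ns - NONSTATIC_OOP);
    }
  }
};

static const AtypeTable kAtypeTable;

FieldAllocationType basic_type_to_atype(bool is_static, BasicType type) {
  assert(type >= T_BOOLEAN && type < T_VOID);
  // Static fields are looked up in the second half of the table.
  FieldAllocationType result = kAtypeTable.entries[type + (is_static ? kHalf : 0)];
  assert(result != BAD_ALLOCATION_TYPE);
  return result;
}

struct FieldAllocationCount {
  uint16_t count[MAX_FIELD_ALLOCATION_TYPE] = {};
  FieldAllocationType update(bool is_static, BasicType type) {
    FieldAllocationType atype = basic_type_to_atype(is_static, type);
    assert(count[atype] < 0xFFFF);
    count[atype]++;
    return atype;
  }
};
```

For example, a static long field is looked up at index T_LONG + 20 = 31, whose entry is STATIC_DOUBLE.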
1. Allocating memory for fields
Memory for the fields is allocated in ClassFileParser::parse_fields() with the following call:
u2* fa = NEW_RESOURCE_ARRAY_IN_THREAD(
THREAD, u2, total_fields * (FieldInfo::field_slots + 1));
The NEW_RESOURCE_ARRAY_IN_THREAD macro is defined as follows:
#define NEW_RESOURCE_ARRAY_IN_THREAD(thread, type, size)\
(type*) resource_allocate_bytes(thread, (size) * sizeof(type))
After macro expansion, the call is equivalent to:
u2* fa = (u2*) resource_allocate_bytes(THREAD, (total_fields * (FieldInfo::field_slots + 1)) * sizeof(u2));
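FieldInfo is a class, and field_slots is a constant of an enum defined inside it, with the value 6. The call allocates total_fields * (FieldInfo::field_slots + 1) elements of sizeof(u2) bytes each, because the data is stored according to the following layout: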
f1: [access, name index, sig index, initial value index, low_offset, high_offset]
f2: [access, name index, sig index, initial value index, low_offset, high_offset]
...
fn: [access, name index, sig index, initial value index, low_offset, high_offset]
[generic signature index]
[generic signature index]
...
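That is, with n fields, each field occupies 6 u2 slots, and each field may additionally have one generic signature index. Since it is not known in advance how many fields have one, the parser allocates enough space for the worst case (7 u2 slots per field) as temporary storage. Later, once the actual numbers are known, it allocates an array of the right size and copies the data into it, so no space is wasted on fields that have no generic signature index.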
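The slot arithmetic can be spelled out in a few lines of C++. This sketch only assumes that field_slots is 6, as stated above; the function names are hypothetical helpers, not HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>

const int field_slots = 6;  // value of FieldInfo::field_slots

// Temporary array: worst case of 6 slots plus 1 generic signature slot per field.
std::size_t temp_array_slots(int total_fields) {
  return static_cast<std::size_t>(total_fields) * (field_slots + 1);
}

// Start of the 6-slot entry for field n (0-based) in the temporary array.
std::size_t field_entry_offset(int n) {
  return static_cast<std::size_t>(n) * field_slots;
}

// Generic signature indexes are appended after the entries of all fields.
std::size_t first_generic_signature_slot(int total_fields) {
  return static_cast<std::size_t>(total_fields) * field_slots;
}

// Final array: 6 slots per field plus only the generic signatures actually present.
std::size_t final_array_slots(int fields, int num_generic_signature) {
  return static_cast<std::size_t>(fields) * field_slots
         + static_cast<std::size_t>(num_generic_signature);
}
```

With 10 fields of which 2 have a generic signature, the temporary array holds 70 u2 slots (140 bytes), field 3 starts at slot 18, the generic signature indexes start at slot 60, and the final array needs only 62 slots.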
In the class file, each field is stored in the following format:
field_info {
u2 access_flags;
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes[attributes_count];
}
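Here access_flags, name_index and descriptor_index correspond to access, name index and sig index in each fn entry. initial value index holds the constant pool index of the field's constant value (when the field is a constant, i.e. has a ConstantValue attribute). low_offset and high_offset are covered in detail later and are skipped for now.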
The resource_allocate_bytes() function, together with the functions it delegates to, is shown below:
extern char* resource_allocate_bytes(Thread* thread, size_t size, AllocFailType alloc_failmode) {
return thread->resource_area()->allocate_bytes(size, alloc_failmode);
}
char* allocate_bytes(size_t size, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
return (char*)Amalloc(size, alloc_failmode);
}
void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
// ARENA_AMALLOC_ALIGNMENT must be a power of 2
assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2");
// after macro expansion this is:
// ((((size_t)(x)) + (((size_t)((2*BytesPerWord))) - 1)) & (~((size_t)(((size_t)((2*BytesPerWord))) - 1))))
x = ARENA_ALIGN(x);
if (!check_for_overflow(x, "Arena::Amalloc", alloc_failmode))
return NULL;
if (_hwm + x > _max) {
return grow(x, alloc_failmode);
} else {
char *old = _hwm;
_hwm += x;
return old;
}
}
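Amalloc() is a bump-pointer allocator: round the request up, check that it fits, then advance _hwm. The following standalone sketch reproduces that step under two assumptions: the alignment is 2 * BytesPerWord taken as 2 * sizeof(void*), and there is a single fixed chunk, so where HotSpot would call grow() the hypothetical TinyArena class simply returns nullptr.

```cpp
#include <cassert>
#include <cstddef>

// Assumed alignment: 2 * BytesPerWord, i.e. 16 bytes on a 64-bit VM.
const std::size_t kArenaAlign = 2 * sizeof(void*);

// Same rounding as the expanded ARENA_ALIGN macro: round x up to a multiple of kArenaAlign.
inline std::size_t arena_align(std::size_t x) {
  return (x + (kArenaAlign - 1)) & ~(kArenaAlign - 1);
}

// Hypothetical single-chunk arena; where HotSpot would call grow(), it returns nullptr.
class TinyArena {
 public:
  TinyArena(char* buf, std::size_t size) : _base(buf), _hwm(buf), _max(buf + size) {}

  void* Amalloc(std::size_t x) {
    x = arena_align(x);
    // Written as a length comparison so no pointer past _max is ever formed.
    if (x > static_cast<std::size_t>(_max - _hwm)) return nullptr;
    char* old = _hwm;
    _hwm += x;  // bump the high-water mark
    return old;
  }

  std::size_t used() const { return static_cast<std::size_t>(_hwm - _base); }

 private:
  char* _base;  // start of the chunk
  char* _hwm;   // next free byte (high-water mark)
  char* _max;   // end of the chunk
};
```

Because every request is rounded up, a 3-byte request followed by a 20-byte request places the second block exactly one alignment unit after the first.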
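The memory is ultimately allocated from a ResourceArea. Each thread has a _resource_area field, and the Amalloc() function called here is very similar to the Amalloc_4() function covered earlier when discussing the release of Handles, so it is not examined further here.
The _resource_area field is declared as follows: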
// Thread local resource area for temporary allocation within the VM
ResourceArea* _resource_area;
This field is initialized when a Thread object is created; the Thread constructor contains the following call:
set_resource_area(new (mtThread)ResourceArea()); // initialize the _resource_area field
ResourceArea extends the Arena class. Memory allocated from a ResourceArea can be released with a ResourceMark, analogous to HandleArea and HandleMark.
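The mark-and-release idea can be sketched as an RAII guard that saves the arena's high-water mark and restores it on scope exit. The class names below are hypothetical; the real ResourceMark also has to cope with an arena that has grown into additional chunks, which this single-buffer sketch ignores.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical arena: one fixed buffer, bump allocation, no growth.
class Arena {
 public:
  Arena(char* buf, std::size_t size) : _hwm(buf), _max(buf + size) {}
  void* allocate_bytes(std::size_t n) {
    if (n > static_cast<std::size_t>(_max - _hwm)) return nullptr;
    char* old = _hwm;
    _hwm += n;
    return old;
  }
  char* hwm() const { return _hwm; }
  void reset_to(char* mark) { _hwm = mark; }

 private:
  char* _hwm;
  char* _max;
};

// Hypothetical ResourceMark-like guard: saves the high-water mark when
// constructed and restores it when destroyed, so everything allocated in
// between is released at once, without per-object frees.
class ScopedMark {
 public:
  explicit ScopedMark(Arena& arena) : _arena(arena), _saved(arena.hwm()) {}
  ~ScopedMark() { _arena.reset_to(_saved); }
  ScopedMark(const ScopedMark&) = delete;
  ScopedMark& operator=(const ScopedMark&) = delete;

 private:
  Arena& _arena;
  char* _saved;
};
```

Releasing is a single pointer assignment, which is why temporary buffers such as fa can be allocated freely during parsing.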
2. Reading fields
Next, let's look at how ClassFileParser::parse_fields() reads the fields:
// The generic signature slots start after all other fields' data.
int generic_signature_slot = total_fields * FieldInfo::field_slots;
int num_generic_signature = 0;
for (int n = 0; n < length; n++) {
cfs->guarantee_more(8, CHECK_NULL); // access_flags, name_index, descriptor_index, attributes_count
// read the field's access flags
AccessFlags access_flags;
jint flags = cfs->get_u2_fast() & JVM_RECOGNIZED_FIELD_MODIFIERS;
access_flags.set_flags(flags);
// read the field's name index
u2 name_index = cfs->get_u2_fast();
int cp_size = _cp->length(); // number of entries in the constant pool
Symbol* name = _cp->symbol_at(name_index);
// read the descriptor index
u2 signature_index = cfs->get_u2_fast();
Symbol* sig = _cp->symbol_at(signature_index);
u2 constantvalue_index = 0;
bool is_synthetic = false;
u2 generic_signature_index = 0;
bool is_static = access_flags.is_static();
FieldAnnotationCollector parsed_annotations(_loader_data);
// read the field's attributes
u2 attributes_count = cfs->get_u2_fast();
if (attributes_count > 0) {
parse_field_attributes(attributes_count, is_static, signature_index,
&constantvalue_index, &is_synthetic,
&generic_signature_index, &parsed_annotations,
CHECK_NULL);
if (parsed_annotations.field_annotations() != NULL) {
if (_fields_annotations == NULL) {
_fields_annotations = MetadataFactory::new_array<AnnotationArray*>(
_loader_data, length, NULL,
CHECK_NULL);
}
_fields_annotations->at_put(n, parsed_annotations.field_annotations());
parsed_annotations.set_field_annotations(NULL);
}
if (parsed_annotations.field_type_annotations() != NULL) {
if (_fields_type_annotations == NULL) {
_fields_type_annotations = MetadataFactory::new_array<AnnotationArray*>(
_loader_data, length, NULL,
CHECK_NULL);
}
_fields_type_annotations->at_put(n, parsed_annotations.field_type_annotations());
parsed_annotations.set_field_type_annotations(NULL);
}
if (is_synthetic) {
access_flags.set_is_synthetic();
}
if (generic_signature_index != 0) {
access_flags.set_field_has_generic_signature();
fa[generic_signature_slot] = generic_signature_index;
generic_signature_slot ++;
num_generic_signature ++;
}
} // done reading the field's attributes
FieldInfo* field = FieldInfo::from_field_array(fa, n);
field->initialize(access_flags.as_short(),
name_index,
signature_index,
constantvalue_index);
BasicType type = _cp->basic_type_for_signature_at(signature_index);
// Remember how many oops we encountered and compute allocation type
FieldAllocationType atype = fac->update(is_static, type);
field->set_allocation_type(atype);
// After field is initialized with type, we can augment it with aux info
if (parsed_annotations.has_any_annotations())
parsed_annotations.apply_to(field);
} // end of for loop
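The byte-level part of this loop can be reproduced outside HotSpot. The sketch below assumes a hypothetical ByteStream class standing in for ClassFileStream (class files are big-endian) and reads fields_count followed by that many field_info structures. Unlike HotSpot, it does not mask the flags with JVM_RECOGNIZED_FIELD_MODIFIERS and simply skips attributes instead of passing them to parse_field_attributes().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for ClassFileStream: reads big-endian values.
class ByteStream {
 public:
  ByteStream(const uint8_t* data, std::size_t len) : _p(data), _end(data + len) {}

  // Fails instead of reading past the end, like cfs->guarantee_more().
  void guarantee_more(std::size_t n) const {
    if (static_cast<std::size_t>(_end - _p) < n) throw std::runtime_error("truncated class file");
  }
  uint16_t get_u2() {
    guarantee_more(2);
    uint16_t v = static_cast<uint16_t>((_p[0] << 8) | _p[1]);
    _p += 2;
    return v;
  }
  uint32_t get_u4() {
    guarantee_more(4);
    uint32_t v = (uint32_t(_p[0]) << 24) | (uint32_t(_p[1]) << 16) |
                 (uint32_t(_p[2]) << 8) | uint32_t(_p[3]);
    _p += 4;
    return v;
  }
  void skip(std::size_t n) { guarantee_more(n); _p += n; }

 private:
  const uint8_t* _p;
  const uint8_t* _end;
};

struct RawField {
  uint16_t access_flags;
  uint16_t name_index;
  uint16_t descriptor_index;
};

// Reads fields_count, then that many field_info structures; attributes are skipped.
std::vector<RawField> read_fields(ByteStream& s) {
  uint16_t length = s.get_u2();  // fields_count
  std::vector<RawField> out;
  for (uint16_t n = 0; n < length; n++) {
    s.guarantee_more(8);  // access_flags, name_index, descriptor_index, attributes_count
    RawField f;
    f.access_flags = s.get_u2();
    f.name_index = s.get_u2();
    f.descriptor_index = s.get_u2();
    uint16_t attributes_count = s.get_u2();
    for (uint16_t a = 0; a < attributes_count; a++) {
      s.get_u2();                 // attribute_name_index
      uint32_t len = s.get_u4();  // attribute_length
      s.skip(len);                // attribute body
    }
    out.push_back(f);
  }
  return out;
}
```

The guarantee_more(8) check mirrors the one in the loop above: it verifies in one step that the four fixed u2 values of a field_info are present before they are read.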
Each field's values are read according to the class file format and stored in fa. FieldInfo::from_field_array() is implemented as follows:
static FieldInfo* from_field_array(u2* fields, int index) {
return ((FieldInfo*)(fields + index * field_slots));
}
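It locates the 6 u2 slots that belong to the field at position index and casts that address to FieldInfo*, so the 6 values can be accessed conveniently through the FieldInfo class, which is defined as follows: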
// This class represents the field information contained in the fields
// array of an InstanceKlass. Currently it's laid on top an array of
// Java shorts but in the future it could simply be used as a real
// array type. FieldInfo generally shouldn't be used directly.
// Fields should be queried either through InstanceKlass or through
// the various FieldStreams.
class FieldInfo VALUE_OBJ_CLASS_SPEC {
u2 _shorts[field_slots];
...
};
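FieldInfo has no virtual functions, so its objects carry no vtable pointer, and each element of _shorts is a u2, i.e. 16 bits wide. Its memory layout is therefore exactly the per-field layout described earlier, and the class's methods can operate on the _shorts array directly.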
field->initialize() stores the values that were read for the field. It is implemented as follows:
void initialize(u2 access_flags,
u2 name_index,
u2 signature_index,
u2 initval_index ){
_shorts[access_flags_offset] = access_flags;
_shorts[name_index_offset] = name_index;
_shorts[signature_index_offset] = signature_index;
_shorts[initval_index_offset] = initval_index;
_shorts[low_packed_offset] = 0;
_shorts[high_packed_offset] = 0;
}
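The overlay trick can be tried out with a trimmed-down FieldInfo. The enum names follow the offsets used in initialize() above; name_index() is a hypothetical accessor added for the demo. Like HotSpot, the sketch reinterprets a u2 array as FieldInfo objects. The static_asserts check the layout facts this relies on (standard layout, no padding), but note that the ISO C++ object model does not strictly guarantee that such a cast is well defined; it works in practice on mainstream compilers.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

typedef uint16_t u2;

// Trimmed-down FieldInfo: no virtual functions, one array of six u2 values.
class FieldInfo {
 public:
  enum FieldOffset {
    access_flags_offset    = 0,
    name_index_offset      = 1,
    signature_index_offset = 2,
    initval_index_offset   = 3,
    low_packed_offset      = 4,
    high_packed_offset     = 5,
    field_slots            = 6
  };

  // View the index-th 6-slot group of a u2 array as a FieldInfo.
  static FieldInfo* from_field_array(u2* fields, int index) {
    return reinterpret_cast<FieldInfo*>(fields + index * field_slots);
  }

  void initialize(u2 access_flags, u2 name_index, u2 signature_index, u2 initval_index) {
    _shorts[access_flags_offset]    = access_flags;
    _shorts[name_index_offset]      = name_index;
    _shorts[signature_index_offset] = signature_index;
    _shorts[initval_index_offset]   = initval_index;
    _shorts[low_packed_offset]      = 0;
    _shorts[high_packed_offset]     = 0;
  }

  u2 name_index() const { return _shorts[name_index_offset]; }

 private:
  u2 _shorts[field_slots];
};

// No vtable pointer and no padding: one FieldInfo is exactly six u2 slots.
static_assert(std::is_standard_layout<FieldInfo>::value, "standard layout");
static_assert(sizeof(FieldInfo) == FieldInfo::field_slots * sizeof(u2), "exactly six u2 slots");
```

Writing through the FieldInfo view of field 1 fills slots 6 to 11 of the plain u2 array, which is exactly the fn layout shown earlier.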
_cp->basic_type_for_signature_at() derives the field's type from its signature. The functions involved are:
BasicType ConstantPool::basic_type_for_signature_at(int which) {
return FieldType::basic_type(symbol_at(which));
}
Symbol* symbol_at(int which) {
assert(tag_at(which).is_utf8(), "Corrupted constant pool");
return *symbol_at_addr(which);
}
BasicType FieldType::basic_type(Symbol* signature) {
return char2type(signature->byte_at(0));
}
// Convert a char from a classfile signature to a BasicType
inline BasicType char2type(char c) {
switch( c ) {
case 'B': return T_BYTE;
case 'C': return T_CHAR;
case 'D': return T_DOUBLE;
case 'F': return T_FLOAT;
case 'I': return T_INT;
case 'J': return T_LONG;
case 'S': return T_SHORT;
case 'Z': return T_BOOLEAN;
case 'V': return T_VOID;
case 'L': return T_OBJECT;
case '[': return T_ARRAY;
}
return T_ILLEGAL;
}
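ConstantPool::symbol_at() fetches the Symbol holding the signature string at constant pool index which, and the field's type is then determined from the first character of the signature. With the type in hand, fac->update() increments the count for the corresponding allocation type, as described earlier in this article.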
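The path from descriptor string to counter can be sketched end to end. char2type() below follows the listing above; the Category enum, category_of(), FieldCounts and count_field() are hypothetical stand-ins for the Oop/Byte/Short/Word/Double grouping and for fac->update(), using std::string instead of a Symbol.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum BasicType {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7, T_BYTE = 8, T_SHORT = 9,
  T_INT = 10, T_LONG = 11, T_OBJECT = 12, T_ARRAY = 13, T_VOID = 14, T_ILLEGAL = 99
};

// Same mapping as char2type() above.
inline BasicType char2type(char c) {
  switch (c) {
    case 'B': return T_BYTE;
    case 'C': return T_CHAR;
    case 'D': return T_DOUBLE;
    case 'F': return T_FLOAT;
    case 'I': return T_INT;
    case 'J': return T_LONG;
    case 'S': return T_SHORT;
    case 'Z': return T_BOOLEAN;
    case 'V': return T_VOID;
    case 'L': return T_OBJECT;
    case '[': return T_ARRAY;
  }
  return T_ILLEGAL;
}

// Hypothetical size categories matching the Oop/Byte/Short/Word/Double grouping.
enum Category { CAT_OOP, CAT_BYTE, CAT_SHORT, CAT_WORD, CAT_DOUBLE, NUM_CATEGORIES };

Category category_of(BasicType t) {
  switch (t) {
    case T_OBJECT:  case T_ARRAY: return CAT_OOP;
    case T_BOOLEAN: case T_BYTE:  return CAT_BYTE;
    case T_CHAR:    case T_SHORT: return CAT_SHORT;
    case T_FLOAT:   case T_INT:   return CAT_WORD;
    case T_DOUBLE:  case T_LONG:  return CAT_DOUBLE;
    default: break;
  }
  throw std::invalid_argument("not a valid field type");
}

// counts[0] holds static fields, counts[1] non-static ones.
struct FieldCounts {
  int counts[2][NUM_CATEGORIES] = {};
};

// Derives the type from the descriptor's first character, then bumps the counter.
void count_field(FieldCounts& c, bool is_static, const std::string& descriptor) {
  Category cat = category_of(char2type(descriptor.at(0)));
  c.counts[is_static ? 0 : 1][cat]++;
}
```

Only the first character matters: "Ljava/lang/String;" and "[J" both count as oops, even though the latter is an array of longs.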
Finally, the field information temporarily held in fa is copied into a new array of the exact size, as follows:
// Now copy the fields' data from the temporary resource array.
// Sometimes injected fields already exist in the Java source so
// the fields array could be too long. In that case the
// fields array is trimed. Also unused slots that were reserved
// for generic signature indexes are discarded.
Array<u2>* fields = MetadataFactory::new_array<u2>(
_loader_data, index * FieldInfo::field_slots + num_generic_signature,
CHECK_NULL);
_fields = fields; // save in case of error
{
int i = 0;
for (; i < index * FieldInfo::field_slots; i++) {
fields->at_put(i, fa[i]);
}
for (int j = total_fields * FieldInfo::field_slots;j < generic_signature_slot; j++) {
fields->at_put(i++, fa[j]);
}
assert(i == fields->length(), "");
}
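When the fields array is created, its size in u2 elements is index * FieldInfo::field_slots + num_generic_signature. Here index is the actual total number of fields (which may include injected fields), and exactly num_generic_signature slots are reserved for generic signature indexes. The two loops then copy the data from fa into fields: first the 6-slot entries of all index fields, then the generic signature indexes, which in fa start at offset total_fields * FieldInfo::field_slots. The logic is simple, so it is not discussed in further detail.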
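The two copy loops can be replayed on a std::vector to see which slots survive. This sketch assumes the temporary layout described earlier; compact_fields is a hypothetical name, and std::vector stands in for the metaspace Array<u2>.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint16_t u2;
const int field_slots = 6;  // value of FieldInfo::field_slots

// Copies the first `index` 6-slot field entries, then the generic signature
// indexes stored from total_fields * field_slots up to (excluding)
// generic_signature_slot. Unused reserved slots are dropped.
std::vector<u2> compact_fields(const std::vector<u2>& fa, int index, int total_fields,
                               int generic_signature_slot) {
  int num_generic_signature = generic_signature_slot - total_fields * field_slots;
  std::vector<u2> fields(index * field_slots + num_generic_signature);
  int i = 0;
  for (; i < index * field_slots; i++) {
    fields[i] = fa[i];
  }
  for (int j = total_fields * field_slots; j < generic_signature_slot; j++) {
    fields[i++] = fa[j];
  }
  assert(i == static_cast<int>(fields.size()));
  return fields;
}
```

For example, with room reserved for 3 fields but only 2 actually stored and 1 generic signature, the 21-slot temporary array shrinks to 2 * 6 + 1 = 13 slots.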
The author maintains a personal blog at classloading.com.