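In ClassFileParser::parseClassFile(), once the constant pool, the superclass, and the interfaces have been parsed, parse_fields() is called to parse the field information. The call looks like this: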
    u2 java_fields_count = 0;
    // Fields (offsets are filled in later)
    FieldAllocationCount fac;
    Array<u2>* fields = parse_fields(class_name,
                                     access_flags.is_interface(),
                                     &fac,
                                     &java_fields_count,
                                     CHECK_(nullHandle));
Before parse_fields() is called, a variable fac of type FieldAllocationCount is declared. The class is defined as follows:
Source: classFileParser.cpp
    class FieldAllocationCount: public ResourceObj {
     public:
      u2 count[MAX_FIELD_ALLOCATION_TYPE];

      FieldAllocationCount() {
        // MAX_FIELD_ALLOCATION_TYPE has the value 10
        for (int i = 0; i < MAX_FIELD_ALLOCATION_TYPE; i++) {
          count[i] = 0;
        }
      }

      FieldAllocationType update(bool is_static, BasicType type) {
        FieldAllocationType atype = basic_type_to_atype(is_static, type);
        // Make sure there is no overflow with injected fields.
        assert(count[atype] < 0xFFFF, "More than 65535 fields");
        count[atype]++;
        return atype;
      }
    };
The count array tallies how many fields fall into each allocation category. The categories are given by the FieldAllocationType enum, defined as follows:
    enum FieldAllocationType {
      STATIC_OOP,                 // 0  Oops
      STATIC_BYTE,                // 1  Boolean, Byte, char
      STATIC_SHORT,               // 2  shorts
      STATIC_WORD,                // 3  ints
      STATIC_DOUBLE,              // 4  aligned long or double
      NONSTATIC_OOP,              // 5
      NONSTATIC_BYTE,             // 6
      NONSTATIC_SHORT,            // 7
      NONSTATIC_WORD,             // 8
      NONSTATIC_DOUBLE,           // 9
      MAX_FIELD_ALLOCATION_TYPE,  // 10
      BAD_ALLOCATION_TYPE = -1
    };
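The counts cover five categories, each tracked separately for static and non-static fields, so that the amount of memory required can be computed from the counts when space is allocated. The five categories are:
- Oop: reference types (objects and arrays)
- Byte: 1-byte types (boolean and byte)
- Short: 2-byte types (char and short)
- Word: 4-byte types (int and float)
- Double: 8-byte types (long and double)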
update() increments the count for the matching category. The BasicType enum that it takes as a parameter is defined as follows:
Source: utilities/globalDefinitions.hpp

    enum BasicType {
      T_BOOLEAN     =  4,
      T_CHAR        =  5,
      T_FLOAT       =  6,
      T_DOUBLE      =  7,
      T_BYTE        =  8,
      T_SHORT       =  9,
      T_INT         = 10,
      T_LONG        = 11,
      T_OBJECT      = 12,
      T_ARRAY       = 13,
      T_VOID        = 14,
      T_ADDRESS     = 15,  // the returnAddress type used by the ret instruction
      T_NARROWOOP   = 16,
      T_METADATA    = 17,
      T_NARROWKLASS = 18,
      T_CONFLICT    = 19,  // for stack value type with conflicting contents
      T_ILLEGAL     = 99
    };
basic_type_to_atype() converts a BasicType value into the corresponding FieldAllocationType value:
    static FieldAllocationType _basic_type_to_atype[2 * (T_CONFLICT + 1)] = {
      BAD_ALLOCATION_TYPE, // 0
      BAD_ALLOCATION_TYPE, // 1
      BAD_ALLOCATION_TYPE, // 2
      BAD_ALLOCATION_TYPE, // 3
      ///////////////////////////////////////////////////////////
      NONSTATIC_BYTE ,     // T_BOOLEAN     =  4,
      NONSTATIC_SHORT,     // T_CHAR        =  5,
      NONSTATIC_WORD,      // T_FLOAT       =  6,
      NONSTATIC_DOUBLE,    // T_DOUBLE      =  7,
      NONSTATIC_BYTE,      // T_BYTE        =  8,
      NONSTATIC_SHORT,     // T_SHORT       =  9,
      NONSTATIC_WORD,      // T_INT         = 10,
      NONSTATIC_DOUBLE,    // T_LONG        = 11,
      NONSTATIC_OOP,       // T_OBJECT      = 12,
      NONSTATIC_OOP,       // T_ARRAY       = 13,
      ///////////////////////////////////////////////////////////
      BAD_ALLOCATION_TYPE, // T_VOID        = 14,
      BAD_ALLOCATION_TYPE, // T_ADDRESS     = 15,
      BAD_ALLOCATION_TYPE, // T_NARROWOOP   = 16,
      BAD_ALLOCATION_TYPE, // T_METADATA    = 17,
      BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
      BAD_ALLOCATION_TYPE, // T_CONFLICT    = 19,

      BAD_ALLOCATION_TYPE, // 0
      BAD_ALLOCATION_TYPE, // 1
      BAD_ALLOCATION_TYPE, // 2
      BAD_ALLOCATION_TYPE, // 3
      ///////////////////////////////////////////////////////////
      STATIC_BYTE ,        // T_BOOLEAN     =  4,
      STATIC_SHORT,        // T_CHAR        =  5,
      STATIC_WORD,         // T_FLOAT       =  6,
      STATIC_DOUBLE,       // T_DOUBLE      =  7,
      STATIC_BYTE,         // T_BYTE        =  8,
      STATIC_SHORT,        // T_SHORT       =  9,
      STATIC_WORD,         // T_INT         = 10,
      STATIC_DOUBLE,       // T_LONG        = 11,
      STATIC_OOP,          // T_OBJECT      = 12,
      STATIC_OOP,          // T_ARRAY       = 13,
      ///////////////////////////////////////////////////////////
      BAD_ALLOCATION_TYPE, // T_VOID        = 14,
      BAD_ALLOCATION_TYPE, // T_ADDRESS     = 15,
      BAD_ALLOCATION_TYPE, // T_NARROWOOP   = 16,
      BAD_ALLOCATION_TYPE, // T_METADATA    = 17,
      BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
      BAD_ALLOCATION_TYPE, // T_CONFLICT    = 19,
    };

    static FieldAllocationType basic_type_to_atype(bool is_static, BasicType type) {
      assert(type >= T_BOOLEAN && type < T_VOID, "only allowable values");
      FieldAllocationType result = _basic_type_to_atype[ type + (is_static ? (T_CONFLICT + 1) : 0) ];
      assert(result != BAD_ALLOCATION_TYPE, "bad type");
      return result;
    }
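The implementation of basic_type_to_atype() is simple: it indexes the table with the type value, shifted by T_CONFLICT + 1 for static fields, so it is not discussed further.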
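The mapping and the counting can be illustrated with a condensed sketch. It is not the HotSpot code: instead of the 40-entry table it uses a hypothetical helper, size_class(), and relies on the fact that the non-static categories follow the static ones in the same order. FieldCounts is likewise an illustrative stand-in for FieldAllocationCount.

```cpp
#include <cassert>

// Simplified stand-ins for the HotSpot enums; the values match the listings above.
enum BasicType {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7, T_BYTE = 8,
  T_SHORT = 9, T_INT = 10, T_LONG = 11, T_OBJECT = 12, T_ARRAY = 13, T_VOID = 14
};

enum FieldAllocationType {
  STATIC_OOP, STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,
  NONSTATIC_OOP, NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,
  MAX_FIELD_ALLOCATION_TYPE
};

// Hypothetical helper: position inside one group (0 = OOP ... 4 = DOUBLE).
inline int size_class(BasicType t) {
  switch (t) {
    case T_OBJECT:  case T_ARRAY: return 0;   // reference
    case T_BOOLEAN: case T_BYTE:  return 1;   // 1 byte
    case T_CHAR:    case T_SHORT: return 2;   // 2 bytes
    case T_FLOAT:   case T_INT:   return 3;   // 4 bytes
    case T_DOUBLE:  case T_LONG:  return 4;   // 8 bytes
    default:                      return -1;  // not a field type
  }
}

inline FieldAllocationType to_atype(bool is_static, BasicType t) {
  int c = size_class(t);
  assert(c >= 0 && "only field types are allowed");
  // Static categories occupy 0..4, non-static ones 5..9.
  return static_cast<FieldAllocationType>((is_static ? 0 : NONSTATIC_OOP) + c);
}

// Illustrative counterpart of FieldAllocationCount.
struct FieldCounts {
  unsigned short count[MAX_FIELD_ALLOCATION_TYPE] = {};

  FieldAllocationType update(bool is_static, BasicType t) {
    FieldAllocationType a = to_atype(is_static, t);
    count[a]++;
    return a;
  }
};
```

With this sketch, a non-static int and a non-static float both land in NONSTATIC_WORD, while a static array lands in STATIC_OOP.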
1. Allocating memory for fields
To allocate memory for the fields, ClassFileParser::parse_fields() makes the following call:
    u2* fa = NEW_RESOURCE_ARRAY_IN_THREAD(
                 THREAD, u2, total_fields * (FieldInfo::field_slots + 1));

The NEW_RESOURCE_ARRAY_IN_THREAD macro is defined as follows:

    #define NEW_RESOURCE_ARRAY_IN_THREAD(thread, type, size) \
      (type*) resource_allocate_bytes(thread, (size) * sizeof(type))

After macro expansion the call is equivalent to:

    u2* fa = (u2*) resource_allocate_bytes(
                 THREAD,
                 (total_fields * (FieldInfo::field_slots + 1)) * sizeof(u2));
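field_slots is an enum constant defined in the FieldInfo class, and its value is 6. The call reserves total_fields * (FieldInfo::field_slots + 1) slots of sizeof(u2) bytes each, because the data is stored in the following layout: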
    f1: [access, name index, sig index, initial value index, low_offset, high_offset]
    f2: [access, name index, sig index, initial value index, low_offset, high_offset]
        ...
    fn: [access, name index, sig index, initial value index, low_offset, high_offset]
        [generic signature index]
        [generic signature index]
        ...
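In other words, n fields need 6 u2 slots each, and any of them may additionally carry a generic signature index. Since the number of generic signatures is not known up front, a buffer large enough for the worst case (one extra slot per field) is allocated as temporary storage. Later an array of exactly the required size is allocated and the data is copied over, so no space is wasted on fields that have no generic signature index.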
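To make the slot arithmetic concrete, here is a small sketch. The helper names (temp_slots, first_generic_slot, field_base, final_slots) are made up for illustration, and kFieldSlots simply mirrors the value 6 of FieldInfo::field_slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kFieldSlots = 6;  // mirrors FieldInfo::field_slots

// Worst-case size of the temporary buffer, in u2 slots:
// six slots per field plus one possible generic signature slot per field.
constexpr std::size_t temp_slots(int total_fields) {
  return static_cast<std::size_t>(total_fields) * (kFieldSlots + 1);
}

// Index of the first generic signature slot; it follows all per-field records.
constexpr int first_generic_slot(int total_fields) {
  return total_fields * kFieldSlots;
}

// Index of the first slot that belongs to field number n (0-based).
constexpr int field_base(int n) {
  return n * kFieldSlots;
}

// Exact size of the final array once the number of generic signatures is known.
constexpr std::size_t final_slots(int fields, int num_generic_signature) {
  return static_cast<std::size_t>(fields) * kFieldSlots +
         static_cast<std::size_t>(num_generic_signature);
}
```

For 3 fields, temp_slots(3) is 21 slots (42 bytes at 2 bytes per u2), the generic signature area starts at slot 18, and if only one field has a generic signature the final array needs 19 slots.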
A field is stored in the Class file in the following format:
    field_info {
        u2             access_flags;
        u2             name_index;
        u2             descriptor_index;
        u2             attributes_count;
        attribute_info attributes[attributes_count];
    }
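access_flags, name_index, and descriptor_index correspond to access, name index, and sig index in each fn. The initial value index stores the index of the constant value (when the field is a constant). low_offset and high_offset are covered in detail later and are not discussed here.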
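As a rough illustration of how the four fixed u2 values of a field_info entry come out of the raw bytes, consider the sketch below. ByteReader and FieldHeader are hypothetical names, not HotSpot's ClassFileStream; the only fact taken from the class file format is that multi-byte values are stored big-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal reader over raw class-file bytes.
class ByteReader {
 public:
  explicit ByteReader(const std::vector<std::uint8_t>& bytes) : b_(bytes), pos_(0) {}

  // True when at least n more bytes are available.
  bool has_more(std::size_t n) const { return b_.size() - pos_ >= n; }

  // Read one big-endian unsigned 16-bit value and advance.
  std::uint16_t get_u2() {
    assert(has_more(2));
    std::uint16_t v = static_cast<std::uint16_t>((b_[pos_] << 8) | b_[pos_ + 1]);
    pos_ += 2;
    return v;
  }

 private:
  const std::vector<std::uint8_t>& b_;
  std::size_t pos_;
};

// The fixed-size part of a field_info entry (attributes not included).
struct FieldHeader {
  std::uint16_t access_flags;
  std::uint16_t name_index;
  std::uint16_t descriptor_index;
  std::uint16_t attributes_count;
};

inline FieldHeader read_field_header(ByteReader& r) {
  assert(r.has_more(8));  // four u2 values
  FieldHeader h;
  h.access_flags     = r.get_u2();
  h.name_index       = r.get_u2();
  h.descriptor_index = r.get_u2();
  h.attributes_count = r.get_u2();
  return h;
}
```

Feeding it the bytes 00 0A 00 05 00 06 00 00 yields access flags 0x000A (private static), name index 5, descriptor index 6, and no attributes.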
The resource_allocate_bytes() function that is called, together with the functions it delegates to, looks like this:
    extern char* resource_allocate_bytes(Thread* thread, size_t size, AllocFailType alloc_failmode) {
      return thread->resource_area()->allocate_bytes(size, alloc_failmode);
    }

    char* allocate_bytes(size_t size, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
      return (char*)Amalloc(size, alloc_failmode);
    }

    void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
      // Check that ARENA_AMALLOC_ALIGNMENT is a power of 2
      assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2");
      // The macro expands to:
      // ((((size_t)(x)) + (((size_t)((2*BytesPerWord))) - 1)) & (~((size_t)(((size_t)((2*BytesPerWord))) - 1))))
      x = ARENA_ALIGN(x);
      if (!check_for_overflow(x, "Arena::Amalloc", alloc_failmode))
        return NULL;
      if (_hwm + x > _max) {
        return grow(x, alloc_failmode);
      } else {
        char *old = _hwm;
        _hwm += x;
        return old;
      }
    }
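The space is ultimately allocated in a ResourceArea. Every thread has a _resource_area member, and the Amalloc() function called here closely resembles the Amalloc_4() function described earlier in the discussion of releasing Handles, so it is not covered again.
The _resource_area member is declared as follows: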
    // Thread local resource area for temporary allocation within the VM
    ResourceArea* _resource_area;
This member is initialized when the Thread object is created; the constructor contains the following call:
    set_resource_area(new (mtThread)ResourceArea());  // initialize the _resource_area member
ResourceArea inherits from Arena. Memory allocated from a ResourceArea can be released through a ResourceMark, much like HandleArea and HandleMark.
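The allocate-then-release-by-mark pattern can be sketched as follows. BumpArena and ArenaMark are hypothetical simplifications, not the HotSpot classes: the arena is a single fixed chunk with no grow(), and the alignment of 2 * sizeof(void*) is an assumption chosen to mirror the 2*BytesPerWord in the expanded macro.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-chunk arena; a real arena grows by chaining chunks.
class BumpArena {
 public:
  explicit BumpArena(std::size_t capacity) : buf_(capacity), hwm_(0) {}

  // Round x up to a multiple of align (align must be a power of two).
  static std::size_t align_up(std::size_t x, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
  }

  // Bump the high-water mark; returns nullptr when the chunk is exhausted.
  void* allocate(std::size_t bytes) {
    if (bytes > buf_.size()) return nullptr;     // also guards against wrap-around
    std::size_t x = align_up(bytes, kAlign);
    if (x > buf_.size() - hwm_) return nullptr;  // a growing arena would add a chunk here
    void* p = buf_.data() + hwm_;
    hwm_ += x;
    return p;
  }

  std::size_t used() const { return hwm_; }

  // Give back everything allocated after the saved mark.
  void rollback(std::size_t mark) {
    assert(mark <= hwm_);
    hwm_ = mark;
  }

 private:
  static constexpr std::size_t kAlign = 2 * sizeof(void*);
  std::vector<unsigned char> buf_;
  std::size_t hwm_;
};

// RAII mark: remembers the high-water mark and restores it on scope exit.
class ArenaMark {
 public:
  explicit ArenaMark(BumpArena& a) : arena_(a), saved_(a.used()) {}
  ~ArenaMark() { arena_.rollback(saved_); }
  ArenaMark(const ArenaMark&) = delete;
  ArenaMark& operator=(const ArenaMark&) = delete;

 private:
  BumpArena& arena_;
  std::size_t saved_;
};
```

Anything allocated inside the scope of an ArenaMark is released in one step when the mark goes out of scope, which is why temporary buffers such as fa never need an explicit free.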
2. Reading fields
Next, look at how ClassFileParser::parse_fields() reads the fields:
    // The generic signature slots start after all other fields' data.
    int generic_signature_slot = total_fields * FieldInfo::field_slots;
    int num_generic_signature = 0;
    for (int n = 0; n < length; n++) {
      cfs->guarantee_more(8, CHECK_NULL);  // access_flags, name_index, descriptor_index, attributes_count

      // Read the field's access flags
      AccessFlags access_flags;
      jint flags = cfs->get_u2_fast() & JVM_RECOGNIZED_FIELD_MODIFIERS;
      access_flags.set_flags(flags);

      // Read the field name index
      u2 name_index = cfs->get_u2_fast();
      int cp_size = _cp->length();  // number of entries in the constant pool
      Symbol* name = _cp->symbol_at(name_index);

      // Read the descriptor index
      u2 signature_index = cfs->get_u2_fast();
      Symbol* sig = _cp->symbol_at(signature_index);

      u2 constantvalue_index = 0;
      bool is_synthetic = false;
      u2 generic_signature_index = 0;
      bool is_static = access_flags.is_static();
      FieldAnnotationCollector parsed_annotations(_loader_data);

      // Read the field's attributes
      u2 attributes_count = cfs->get_u2_fast();
      if (attributes_count > 0) {
        parse_field_attributes(attributes_count, is_static, signature_index,
                               &constantvalue_index, &is_synthetic,
                               &generic_signature_index, &parsed_annotations,
                               CHECK_NULL);
        if (parsed_annotations.field_annotations() != NULL) {
          if (_fields_annotations == NULL) {
            _fields_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                      _loader_data, length, NULL, CHECK_NULL);
          }
          _fields_annotations->at_put(n, parsed_annotations.field_annotations());
          parsed_annotations.set_field_annotations(NULL);
        }
        if (parsed_annotations.field_type_annotations() != NULL) {
          if (_fields_type_annotations == NULL) {
            _fields_type_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                           _loader_data, length, NULL, CHECK_NULL);
          }
          _fields_type_annotations->at_put(n, parsed_annotations.field_type_annotations());
          parsed_annotations.set_field_type_annotations(NULL);
        }

        if (is_synthetic) {
          access_flags.set_is_synthetic();
        }
        if (generic_signature_index != 0) {
          access_flags.set_field_has_generic_signature();
          fa[generic_signature_slot] = generic_signature_index;
          generic_signature_slot ++;
          num_generic_signature ++;
        }
      }  // done reading the field's attributes

      FieldInfo* field = FieldInfo::from_field_array(fa, n);
      field->initialize(access_flags.as_short(), name_index, signature_index, constantvalue_index);

      BasicType type = _cp->basic_type_for_signature_at(signature_index);

      // Remember how many oops we encountered and compute allocation type
      FieldAllocationType atype = fac->update(is_static, type);
      field->set_allocation_type(atype);

      // After field is initialized with type, we can augment it with aux info
      if (parsed_annotations.has_any_annotations())
        parsed_annotations.apply_to(field);
    }  // end of the for loop
Once the values have been read in this format they are stored in fa. FieldInfo::from_field_array() is implemented as follows:
    static FieldInfo* from_field_array(u2* fields, int index) {
      return ((FieldInfo*)(fields + index * field_slots));
    }
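This locates the 6 u2 slots that belong to field number index and casts their address to FieldInfo*, so that the 6 values can be read and written conveniently through the FieldInfo class. FieldInfo is defined as follows: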
    // This class represents the field information contained in the fields
    // array of an InstanceKlass.  Currently it's laid on top an array of
    // Java shorts but in the future it could simply be used as a real
    // array type.  FieldInfo generally shouldn't be used directly.
    // Fields should be queried either through InstanceKlass or through
    // the various FieldStreams.
    class FieldInfo VALUE_OBJ_CLASS_SPEC {
      u2 _shorts[field_slots];
      ...
    }
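The class has no virtual functions, and the elements of _shorts are u2 values, that is, 16 bits each. Its memory layout is therefore identical to the field layout described earlier, and the methods of the class can operate on the _shorts array directly.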
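The idea of addressing one six-slot record inside a flat u2 array can be sketched as below. Rather than casting the u2* to a class pointer as HotSpot does, this sketch wraps the pointer in a small view class, which avoids relying on object layout; FieldView and the slot names are illustrative, not the HotSpot identifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Slot offsets within one field record (illustrative names).
enum FieldSlot {
  kAccess, kNameIndex, kSigIndex, kInitVal, kLowPacked, kHighPacked,
  kFieldSlots  // number of slots per record: 6
};

// Non-owning view over the six u2 slots of field number `index`.
class FieldView {
 public:
  FieldView(std::uint16_t* fields, int index)
      : s_(fields + index * kFieldSlots) {}

  // Store the values read from the class file; the two packed slots start at 0.
  void initialize(std::uint16_t access, std::uint16_t name,
                  std::uint16_t sig, std::uint16_t initval) {
    s_[kAccess]     = access;
    s_[kNameIndex]  = name;
    s_[kSigIndex]   = sig;
    s_[kInitVal]    = initval;
    s_[kLowPacked]  = 0;
    s_[kHighPacked] = 0;
  }

  std::uint16_t access_flags() const    { return s_[kAccess]; }
  std::uint16_t name_index() const      { return s_[kNameIndex]; }
  std::uint16_t signature_index() const { return s_[kSigIndex]; }

 private:
  std::uint16_t* s_;  // points into the caller's array
};
```

Initializing the view for index 1 writes slots 6 through 11 of the underlying array and leaves the record of field 0 untouched.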
field->initialize() stores the values that were read for the field. It is implemented as follows:
    void initialize(u2 access_flags,
                    u2 name_index,
                    u2 signature_index,
                    u2 initval_index) {
      _shorts[access_flags_offset]    = access_flags;
      _shorts[name_index_offset]      = name_index;
      _shorts[signature_index_offset] = signature_index;
      _shorts[initval_index_offset]   = initval_index;
      _shorts[low_packed_offset]      = 0;
      _shorts[high_packed_offset]     = 0;
    }
_cp->basic_type_for_signature_at() derives the type from the field's signature. It is implemented as follows:
    BasicType ConstantPool::basic_type_for_signature_at(int which) {
      return FieldType::basic_type(symbol_at(which));
    }

    Symbol* symbol_at(int which) {
      assert(tag_at(which).is_utf8(), "Corrupted constant pool");
      return *symbol_at_addr(which);
    }

    BasicType FieldType::basic_type(Symbol* signature) {
      return char2type(signature->byte_at(0));
    }

    // Convert a char from a classfile signature to a BasicType
    inline BasicType char2type(char c) {
      switch( c ) {
      case 'B': return T_BYTE;
      case 'C': return T_CHAR;
      case 'D': return T_DOUBLE;
      case 'F': return T_FLOAT;
      case 'I': return T_INT;
      case 'J': return T_LONG;
      case 'S': return T_SHORT;
      case 'Z': return T_BOOLEAN;
      case 'V': return T_VOID;
      case 'L': return T_OBJECT;
      case '[': return T_ARRAY;
      }
      return T_ILLEGAL;
    }
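symbol_at(), defined in the ConstantPool class, fetches the Symbol that holds the signature string at constant pool index which; the first character of the signature is enough to determine the field's type. With the type known, fac->update() is called to increment the count for the matching category, as described earlier in this article.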
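A few concrete descriptors show why the first character is sufficient. The sketch below repeats the character mapping in a standalone form; basic_type_of() and the use of std::string in place of Symbol are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Subset of the BasicType values listed earlier.
enum BasicType {
  T_BOOLEAN = 4, T_CHAR = 5, T_FLOAT = 6, T_DOUBLE = 7, T_BYTE = 8,
  T_SHORT = 9, T_INT = 10, T_LONG = 11, T_OBJECT = 12, T_ARRAY = 13,
  T_VOID = 14, T_ILLEGAL = 99
};

// Same mapping as char2type() in the listing above.
inline BasicType char2type(char c) {
  switch (c) {
    case 'B': return T_BYTE;
    case 'C': return T_CHAR;
    case 'D': return T_DOUBLE;
    case 'F': return T_FLOAT;
    case 'I': return T_INT;
    case 'J': return T_LONG;
    case 'S': return T_SHORT;
    case 'Z': return T_BOOLEAN;
    case 'V': return T_VOID;
    case 'L': return T_OBJECT;
    case '[': return T_ARRAY;
    default:  return T_ILLEGAL;
  }
}

// Hypothetical helper: classify a whole descriptor string by its first character.
inline BasicType basic_type_of(const std::string& descriptor) {
  return descriptor.empty() ? T_ILLEGAL : char2type(descriptor[0]);
}
```

For example, the descriptor "I" gives T_INT, "Ljava/lang/String;" gives T_OBJECT, and "[[I" gives T_ARRAY regardless of the element type, since every array is allocated as a reference.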
Next, the field information held temporarily in fa is copied into a new array:
    // Now copy the fields' data from the temporary resource array.
    // Sometimes injected fields already exist in the Java source so
    // the fields array could be too long.  In that case the
    // fields array is trimed. Also unused slots that were reserved
    // for generic signature indexes are discarded.
    Array<u2>* fields = MetadataFactory::new_array<u2>(
                            _loader_data,
                            index * FieldInfo::field_slots + num_generic_signature,
                            CHECK_NULL);
    _fields = fields;  // save in case of error
    {
      int i = 0;
      for (; i < index * FieldInfo::field_slots; i++) {
        fields->at_put(i, fa[i]);
      }
      for (int j = total_fields * FieldInfo::field_slots; j < generic_signature_slot; j++) {
        fields->at_put(i++, fa[j]);
      }
      assert(i == fields->length(), "");
    }
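When the fields array is created, its length, in u2 elements, is index * FieldInfo::field_slots + num_generic_signature. Here index is the actual number of fields (it can differ from the declared count because of injected fields), and num_generic_signature slots are added for the generic signature indexes that are actually present. The data is then copied from fa into fields; the logic is simple and is not described further.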
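The compaction step can be sketched with standard containers. compact_fields() and its parameters are hypothetical; std::vector stands in for both the resource array fa and the metadata Array<u2>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kFieldSlots = 6;  // mirrors FieldInfo::field_slots

// fa:           temporary buffer of total_fields * (kFieldSlots + 1) slots
// total_fields: number of fields the buffer was sized for
// index:        number of fields actually kept
// generic_end:  one past the last generic signature slot that was written
std::vector<std::uint16_t> compact_fields(const std::vector<std::uint16_t>& fa,
                                          int total_fields, int index,
                                          int generic_end) {
  const int generic_begin = total_fields * kFieldSlots;
  assert(index >= 0 && index <= total_fields);
  assert(generic_begin <= generic_end);
  assert(static_cast<std::size_t>(generic_end) <= fa.size());

  std::vector<std::uint16_t> out;
  out.reserve(static_cast<std::size_t>(index * kFieldSlots +
                                       (generic_end - generic_begin)));
  // First the per-field records of the fields that are kept ...
  out.insert(out.end(), fa.begin(), fa.begin() + index * kFieldSlots);
  // ... then only the generic signature slots that were actually used.
  out.insert(out.end(), fa.begin() + generic_begin, fa.begin() + generic_end);
  return out;
}
```

With two fields and a single generic signature, the 14-slot temporary buffer shrinks to 13 slots: 12 for the two records, followed directly by the one generic signature index.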
The author maintains a personal blog at classloading.com.