前言
本文旨在从理论上分析JVM 在 Linux 环境下 Attach 操作的前因后果,以及 JVM 为此而设计并实现的解决方案,通过本文,我希望能够讲述清楚如下三个主要方面的内容。
原发布:我的博客
一、Attach 为什么而出现
Attach的出现究其根本原因,应该就是为了实现 Java 进程(A)与进程(B)之间的本地通信。一旦这个通信通道能够成功建立,那么进程 A 就能通知进程 B 去执行某些操作,从而达到监控进程 B 或者控制进程 B 的某些行为的目的。如 jstack、jmap等 JDK 自带的工具,基本都是通过 Attach 机制去达成各自想要的目的的。至于 jstack、jmap 能做什么、怎么做,就不再本文的讨论范围了,请自行百度或者 Google。
二、Attach 在 JVM 底层实现的根本原理是什么
Attach 实现的根本原理就是使用了 Linux 下是文件 Socket 通信(详情可以自行百度或 Google)。有人也许会问,为什么要采用文件 socket 而不采用网络 socket?我个人认为也许一方面是为了效率(避免了网络协议的解析、数据包的封装和解封装等),另一方面是为了减少对系统资源的占用(如网络端口占用)。采用文件 socket 通信,就好比两个进程通过事先约定好的协议,对同一个文件进行读写操作,以达到信息的交互和共享。简单理解成如下图所示的模型
通过/tmp/.java.pid2345这个文件,实现客户进程与目标进程2345的通信。
三、Attach 在 JVM 中实现的源码分析
源码的分析主要分三阶段进行,这里要达到的目的是,弄 Attach 的清楚来龙去脉,本文的所有源码都是基于 Open JDK 1.8的,大家可以自行去下载 Open JDK 1.8 的源码。
3.1、目标JVM 对OS信号监听的实现
或许你会想,在最开始的时候,目标 JVM 是怎么知道有某个进程想 attach 它自己的?答案很简单,就是目标 JVM 在启动的时候,在 JVM 内部启动了一个监听线程,这个线程的名字叫“Signal Dispatcher”,该线程的作用是,监听并处理 OS 的信号。至于什么是 OS 的信号(可以自行百度或 Google),简单理解就是,Linux系统允许进程与进程之间通过过信号的方式进行通信,如触发某个操作(操作由接受到信号的进程自定义)。如平常我们用的最多的就是 kill -9 ${pid}来杀死某个进程,kill进程通过向${pid}的进程发送一个编号为“9”号的信号,来通知系统强制结束${pid}的生命周期。
接下来我们就通过源码截图的方式来呈现一下“Signal Dispatcher”线程的创建过程。
首先进入 JVM 的启动类:/jdk/src/share/bin/main.c
1 int 2 main(int argc, char **argv) 3 { 4 int margc; 5 char** margv; 6 const jboolean const_javaw = JNI_FALSE; 7 #endif /* JAVAW */ 8 #ifdef _WIN32 9 { 10 int i = 0; 11 if (getenv(JLDEBUG_ENV_ENTRY) != NULL) { 12 printf("Windows original main args: "); 13 for (i = 0 ; i < __argc ; i++) { 14 printf("wwwd_args[%d] = %s ", i, __argv[i]); 15 } 16 } 17 } 18 JLI_CmdToArgs(GetCommandLine()); 19 margc = JLI_GetStdArgc(); 20 // add one more to mark the end 21 margv = (char **)JLI_MemAlloc((margc + 1) * (sizeof(char *))); 22 { 23 int i = 0; 24 StdArg *stdargs = JLI_GetStdArgs(); 25 for (i = 0 ; i < margc ; i++) { 26 margv[i] = stdargs[i].arg; 27 } 28 margv[i] = NULL; 29 } 30 #else /* *NIXES */ 31 margc = argc; 32 margv = argv; 33 #endif /* WIN32 */ 34 return JLI_Launch(margc, margv, 35 sizeof(const_jargs) / sizeof(char *), const_jargs, 36 sizeof(const_appclasspath) / sizeof(char *), const_appclasspath, 37 FULL_VERSION, 38 DOT_VERSION, 39 (const_progname != NULL) ? const_progname : *margv, 40 (const_launcher != NULL) ? const_launcher : *margv, 41 (const_jargs != NULL) ? JNI_TRUE : JNI_FALSE, 42 const_cpwildcard, const_javaw, const_ergo_class); 43 }
这个类里边最重要的一个方法就是最后的JLI_Launch,这个方法的实现存在于jdk/src/share/bin/java.c 中(大家应该都不陌生平时我们运行 java 程序时,都是采用 java com.***.Main来启动的吧)。
1 /* 2 * Entry point. 3 */ 4 int 5 JLI_Launch(int argc, char ** argv, /* main argc, argc */ 6 int jargc, const char** jargv, /* java args */ 7 int appclassc, const char** appclassv, /* app classpath */ 8 const char* fullversion, /* full version defined */ 9 const char* dotversion, /* dot version defined */ 10 const char* pname, /* program name */ 11 const char* lname, /* launcher name */ 12 jboolean javaargs, /* JAVA_ARGS */ 13 jboolean cpwildcard, /* classpath wildcard*/ 14 jboolean javaw, /* windows-only javaw */ 15 jint ergo /* ergonomics class policy */ 16 ) 17 { 18 int mode = LM_UNKNOWN; 19 char *what = NULL; 20 char *cpath = 0; 21 char *main_class = NULL; 22 int ret; 23 InvocationFunctions ifn; 24 jlong start, end; 25 char jvmpath[MAXPATHLEN]; 26 char jrepath[MAXPATHLEN]; 27 char jvmcfg[MAXPATHLEN]; 28 29 _fVersion = fullversion; 30 _dVersion = dotversion; 31 _launcher_name = lname; 32 _program_name = pname; 33 _is_java_args = javaargs; 34 _wc_enabled = cpwildcard; 35 _ergo_policy = ergo; 36 37 InitLauncher(javaw); 38 DumpState(); 39 if (JLI_IsTraceLauncher()) { 40 int i; 41 printf("Command line args: "); 42 for (i = 0; i < argc ; i++) { 43 printf("argv[%d] = %s ", i, argv[i]); 44 } 45 AddOption("-Dsun.java.launcher.diag=true", NULL); 46 } 47 48 /* 49 * Make sure the specified version of the JRE is running. 50 * 51 * There are three things to note about the SelectVersion() routine: 52 * 1) If the version running isn't correct, this routine doesn't 53 * return (either the correct version has been exec'd or an error 54 * was issued). 55 * 2) Argc and Argv in this scope are *not* altered by this routine. 56 * It is the responsibility of subsequent code to ignore the 57 * arguments handled by this routine. 58 * 3) As a side-effect, the variable "main_class" is guaranteed to 59 * be set (if it should ever be set). This isn't exactly the 60 * poster child for structured programming, but it is a small 61 * price to pay for not processing a jar file operand twice. 62 * (Note: This side effect has been disabled. See comment on 63 * bugid 5030265 below.) 64 */ 65 SelectVersion(argc, argv, &main_class); 66 67 CreateExecutionEnvironment(&argc, &argv, 68 jrepath, sizeof(jrepath), 69 jvmpath, sizeof(jvmpath), 70 jvmcfg, sizeof(jvmcfg)); 71 72 ifn.CreateJavaVM = 0; 73 ifn.GetDefaultJavaVMInitArgs = 0; 74 75 if (JLI_IsTraceLauncher()) { 76 start = CounterGet(); 77 } 78 79 if (!LoadJavaVM(jvmpath, &ifn)) { 80 return(6); 81 } 82 83 if (JLI_IsTraceLauncher()) { 84 end = CounterGet(); 85 } 86 87 JLI_TraceLauncher("%ld micro seconds to LoadJavaVM ", 88 (long)(jint)Counter2Micros(end-start)); 89 90 ++argv; 91 --argc; 92 93 if (IsJavaArgs()) { 94 /* Preprocess wrapper arguments */ 95 TranslateApplicationArgs(jargc, jargv, &argc, &argv); 96 if (!AddApplicationOptions(appclassc, appclassv)) { 97 return(1); 98 } 99 } else { 100 /* Set default CLASSPATH */ 101 cpath = getenv("CLASSPATH"); 102 if (cpath == NULL) { 103 cpath = "."; 104 } 105 SetClassPath(cpath); 106 } 107 108 /* Parse command line options; if the return value of 109 * ParseArguments is false, the program should exit. 110 */ 111 if (!ParseArguments(&argc, &argv, &mode, &what, &ret, jrepath)) 112 { 113 return(ret); 114 } 115 116 /* Override class path if -jar flag was specified */ 117 if (mode == LM_JAR) { 118 SetClassPath(what); /* Override class path */ 119 } 120 121 /* set the -Dsun.java.command pseudo property */ 122 SetJavaCommandLineProp(what, argc, argv); 123 124 /* Set the -Dsun.java.launcher pseudo property */ 125 SetJavaLauncherProp(); 126 127 /* set the -Dsun.java.launcher.* platform properties */ 128 SetJavaLauncherPlatformProps(); 129 130 return JVMInit(&ifn, threadStackSize, argc, argv, mode, what, ret); 131 }
这个方法中,进行了一系列必要的操作,如libjvm.so的加载、参数解析、Classpath 的获取和设置、系统属性的设置、JVM 初始化等等,不过和本文相关的主要是130行的 JVMInit 方法,接下来我们看下这个方法的实现(位于/jdk/src/solaris/bin/java_md_solinux.c)。
1 int 2 JVMInit(InvocationFunctions* ifn, jlong threadStackSize, 3 int argc, char **argv, 4 int mode, char *what, int ret) 5 { 6 ShowSplashScreen(); 7 return ContinueInNewThread(ifn, threadStackSize, argc, argv, mode, what, ret); 8 }
这里请关注两个点,ContinueInNewThread方法 和 ifn 入参。ContinueInNewThread位于 java.c中,而 ifn 则携带了libjvm.so中的几个非常重要的函数(CreateJavaVM/GetDefaultJavaVMInitArgs/GetCreatedJavaVMs),这里我们重点关注CreateJavaVM
1 int 2 ContinueInNewThread(InvocationFunctions* ifn, jlong threadStackSize, 3 int argc, char **argv, 4 int mode, char *what, int ret) 5 { 6 7 /* 8 * If user doesn't specify stack size, check if VM has a preference. 9 * Note that HotSpot no longer supports JNI_VERSION_1_1 but it will 10 * return its default stack size through the init args structure. 11 */ 12 if (threadStackSize == 0) { 13 struct JDK1_1InitArgs args1_1; 14 memset((void*)&args1_1, 0, sizeof(args1_1)); 15 args1_1.version = JNI_VERSION_1_1; 16 ifn->GetDefaultJavaVMInitArgs(&args1_1); /* ignore return value */ 17 if (args1_1.javaStackSize > 0) { 18 threadStackSize = args1_1.javaStackSize; 19 } 20 } 21 22 { /* Create a new thread to create JVM and invoke main method */ 23 JavaMainArgs args; 24 int rslt; 25 26 args.argc = argc; 27 args.argv = argv; 28 args.mode = mode; 29 args.what = what; 30 args.ifn = *ifn; 31 32 rslt = ContinueInNewThread0(JavaMain, threadStackSize, (void*)&args); 33 /* If the caller has deemed there is an error we 34 * simply return that, otherwise we return the value of 35 * the callee 36 */ 37 return (ret != 0) ? ret : rslt; 38 } 39 }
可以看出,这里进行了 JavaMainArgs 参数设置,设置完成之后,在32行处调用了 ContinueInNewThread0 (位于/jdk/src/solaris/bin/java_md_solinux.c)方法,该方法中传入了 JavaMain 函数指针和 args 参数,这二者至关重要。接下来看下其源码
1 /* 2 * Block current thread and continue execution in a new thread 3 */ 4 int 5 ContinueInNewThread0(int (JNICALL *continuation)(void *), jlong stack_size, void * args) { 6 int rslt; 7 #ifdef __linux__ 8 pthread_t tid; 9 pthread_attr_t attr; 10 pthread_attr_init(&attr); 11 pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); 12 13 if (stack_size > 0) { 14 pthread_attr_setstacksize(&attr, stack_size); 15 } 16 17 if (pthread_create(&tid, &attr, (void *(*)(void*))continuation, (void*)args) == 0) { 18 void * tmp; 19 pthread_join(tid, &tmp); 20 rslt = (int)tmp; 21 } else { 22 /* 23 * Continue execution in current thread if for some reason (e.g. out of 24 * memory/LWP) a new thread can't be created. This will likely fail 25 * later in continuation as JNI_CreateJavaVM needs to create quite a 26 * few new threads, anyway, just give it a try.. 27 */ 28 rslt = continuation(args); 29 } 30 31 pthread_attr_destroy(&attr); 32 #else /* ! __linux__ */ 33 thread_t tid; 34 long flags = 0; 35 if (thr_create(NULL, stack_size, (void *(*)(void *))continuation, args, flags, &tid) == 0) { 36 void * tmp; 37 thr_join(tid, NULL, &tmp); 38 rslt = (int)tmp; 39 } else { 40 /* See above. Continue in current thread if thr_create() failed */ 41 rslt = continuation(args); 42 } 43 #endif /* __linux__ */ 44 return rslt; 45 }
这里最关键的点在于,如果是 linux 环境下,则创建了一个 pthread_t 的线程来运行传入的 JavaMain 函数,并且将 args 参数也一并传入了。这时候,我们唯一要关注的便是 JavaMain (在jdk/src/share/bin/java.c )函数,请看源码
1 int JNICALL 2 JavaMain(void * _args) 3 { 4 JavaMainArgs *args = (JavaMainArgs *)_args; 5 int argc = args->argc; 6 char **argv = args->argv; 7 int mode = args->mode; 8 char *what = args->what; 9 InvocationFunctions ifn = args->ifn; 10 11 JavaVM *vm = 0; 12 JNIEnv *env = 0; 13 jclass mainClass = NULL; 14 jclass appClass = NULL; // actual application class being launched 15 jmethodID mainID; 16 jobjectArray mainArgs; 17 int ret = 0; 18 jlong start, end; 19 20 RegisterThread(); 21 22 /* Initialize the virtual machine */ 23 start = CounterGet(); 24 if (!InitializeJVM(&vm, &env, &ifn)) { 25 JLI_ReportErrorMessage(JVM_ERROR1); 26 exit(1); 27 } 28 29 if (showSettings != NULL) { 30 ShowSettings(env, showSettings); 31 CHECK_EXCEPTION_LEAVE(1); 32 } 33 34 if (printVersion || showVersion) { 35 PrintJavaVersion(env, showVersion); 36 CHECK_EXCEPTION_LEAVE(0); 37 if (printVersion) { 38 LEAVE(); 39 } 40 } 41 42 /* If the user specified neither a class name nor a JAR file */ 43 if (printXUsage || printUsage || what == 0 || mode == LM_UNKNOWN) { 44 PrintUsage(env, printXUsage); 45 CHECK_EXCEPTION_LEAVE(1); 46 LEAVE(); 47 } 48 49 FreeKnownVMs(); /* after last possible PrintUsage() */ 50 51 if (JLI_IsTraceLauncher()) { 52 end = CounterGet(); 53 JLI_TraceLauncher("%ld micro seconds to InitializeJVM ", 54 (long)(jint)Counter2Micros(end-start)); 55 } 56 57 /* At this stage, argc/argv have the application's arguments */ 58 if (JLI_IsTraceLauncher()){ 59 int i; 60 printf("%s is '%s' ", launchModeNames[mode], what); 61 printf("App's argc is %d ", argc); 62 for (i=0; i < argc; i++) { 63 printf(" argv[%2d] = '%s' ", i, argv[i]); 64 } 65 } 66 67 ret = 1; 68 69 /* 70 * Get the application's main class. 71 * 72 * See bugid 5030265. The Main-Class name has already been parsed 73 * from the manifest, but not parsed properly for UTF-8 support. 74 * Hence the code here ignores the value previously extracted and 75 * uses the pre-existing code to reextract the value. This is 76 * possibly an end of release cycle expedient. However, it has 77 * also been discovered that passing some character sets through 78 * the environment has "strange" behavior on some variants of 79 * Windows. Hence, maybe the manifest parsing code local to the 80 * launcher should never be enhanced. 81 * 82 * Hence, future work should either: 83 * 1) Correct the local parsing code and verify that the 84 * Main-Class attribute gets properly passed through 85 * all environments, 86 * 2) Remove the vestages of maintaining main_class through 87 * the environment (and remove these comments). 88 * 89 * This method also correctly handles launching existing JavaFX 90 * applications that may or may not have a Main-Class manifest entry. 91 */ 92 mainClass = LoadMainClass(env, mode, what); 93 CHECK_EXCEPTION_NULL_LEAVE(mainClass); 94 /* 95 * In some cases when launching an application that needs a helper, e.g., a 96 * JavaFX application with no main method, the mainClass will not be the 97 * applications own main class but rather a helper class. To keep things 98 * consistent in the UI we need to track and report the application main class. 99 */ 100 appClass = GetApplicationClass(env); 101 NULL_CHECK_RETURN_VALUE(appClass, -1); 102 /* 103 * PostJVMInit uses the class name as the application name for GUI purposes, 104 * for example, on OSX this sets the application name in the menu bar for 105 * both SWT and JavaFX. So we'll pass the actual application class here 106 * instead of mainClass as that may be a launcher or helper class instead 107 * of the application class. 108 */ 109 PostJVMInit(env, appClass, vm); 110 /* 111 * The LoadMainClass not only loads the main class, it will also ensure 112 * that the main method's signature is correct, therefore further checking 113 * is not required. The main method is invoked here so that extraneous java 114 * stacks are not in the application stack trace. 115 */ 116 mainID = (*env)->GetStaticMethodID(env, mainClass, "main", 117 "([Ljava/lang/String;)V"); 118 CHECK_EXCEPTION_NULL_LEAVE(mainID); 119 120 /* Build platform specific argument array */ 121 mainArgs = CreateApplicationArgs(env, argv, argc); 122 CHECK_EXCEPTION_NULL_LEAVE(mainArgs); 123 124 /* Invoke main method. */ 125 (*env)->CallStaticVoidMethod(env, mainClass, mainID, mainArgs); 126 127 /* 128 * The launcher's exit code (in the absence of calls to 129 * System.exit) will be non-zero if main threw an exception. 130 */ 131 ret = (*env)->ExceptionOccurred(env) == NULL ? 0 : 1; 132 LEAVE(); 133 }
和本小节相关的函数为InitializeJVM函数,在这个函数中,调用CreateJavaVM方法,这个方法就是之前在加载 libjvm.so 的时候,从动态库中获取的,首先看InitializeJVM的源码
1 /* 2 * Initializes the Java Virtual Machine. Also frees options array when 3 * finished. 4 */ 5 static jboolean 6 InitializeJVM(JavaVM **pvm, JNIEnv **penv, InvocationFunctions *ifn) 7 { 8 JavaVMInitArgs args; 9 jint r; 10 11 memset(&args, 0, sizeof(args)); 12 args.version = JNI_VERSION_1_2; 13 args.nOptions = numOptions; 14 args.options = options; 15 args.ignoreUnrecognized = JNI_FALSE; 16 17 if (JLI_IsTraceLauncher()) { 18 int i = 0; 19 printf("JavaVM args: "); 20 printf("version 0x%08lx, ", (long)args.version); 21 printf("ignoreUnrecognized is %s, ", 22 args.ignoreUnrecognized ? "JNI_TRUE" : "JNI_FALSE"); 23 printf("nOptions is %ld ", (long)args.nOptions); 24 for (i = 0; i < numOptions; i++) 25 printf(" option[%2d] = '%s' ", 26 i, args.options[i].optionString); 27 } 28 29 r = ifn->CreateJavaVM(pvm, (void **)penv, &args); 30 JLI_MemFree(options); 31 return r == JNI_OK; 32 }
29行处,调用 CreateJavaVM(定义在hotspot/src/share/vm/prims/jni.cpp) 方法,来进行 JVM 虚拟机的真正创建过程,源码如下
1 _JNI_IMPORT_OR_EXPORT_ jint JNICALL JNI_CreateJavaVM(JavaVM **vm, void **penv, void *args) { 2 #ifndef USDT2 3 HS_DTRACE_PROBE3(hotspot_jni, CreateJavaVM__entry, vm, penv, args); 4 #else /* USDT2 */ 5 HOTSPOT_JNI_CREATEJAVAVM_ENTRY( 6 (void **) vm, penv, args); 7 #endif /* USDT2 */ 8 9 jint result = JNI_ERR; 10 DT_RETURN_MARK(CreateJavaVM, jint, (const jint&)result); 11 12 // We're about to use Atomic::xchg for synchronization. Some Zero 13 // platforms use the GCC builtin __sync_lock_test_and_set for this, 14 // but __sync_lock_test_and_set is not guaranteed to do what we want 15 // on all architectures. So we check it works before relying on it. 16 #if defined(ZERO) && defined(ASSERT) 17 { 18 jint a = 0xcafebabe; 19 jint b = Atomic::xchg(0xdeadbeef, &a); 20 void *c = &a; 21 void *d = Atomic::xchg_ptr(&b, &c); 22 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works"); 23 assert(c == &b && d == &a, "Atomic::xchg_ptr() works"); 24 } 25 #endif // ZERO && ASSERT 26 27 // At the moment it's only possible to have one Java VM, 28 // since some of the runtime state is in global variables. 29 30 // We cannot use our mutex locks here, since they only work on 31 // Threads. We do an atomic compare and exchange to ensure only 32 // one thread can call this method at a time 33 34 // We use Atomic::xchg rather than Atomic::add/dec since on some platforms 35 // the add/dec implementations are dependent on whether we are running 36 // on a multiprocessor, and at this stage of initialization the os::is_MP 37 // function used to determine this will always return false. Atomic::xchg 38 // does not have this problem. 39 if (Atomic::xchg(1, &vm_created) == 1) { 40 return JNI_EEXIST; // already created, or create attempt in progress 41 } 42 if (Atomic::xchg(0, &safe_to_recreate_vm) == 0) { 43 return JNI_ERR; // someone tried and failed and retry not allowed. 44 } 45 46 assert(vm_created == 1, "vm_created is true during the creation"); 47 48 /** 49 * Certain errors during initialization are recoverable and do not 50 * prevent this method from being called again at a later time 51 * (perhaps with different arguments). However, at a certain 52 * point during initialization if an error occurs we cannot allow 53 * this function to be called again (or it will crash). In those 54 * situations, the 'canTryAgain' flag is set to false, which atomically 55 * sets safe_to_recreate_vm to 1, such that any new call to 56 * JNI_CreateJavaVM will immediately fail using the above logic. 57 */ 58 bool can_try_again = true; 59 60 result = Threads::create_vm((JavaVMInitArgs*) args, &can_try_again); 61 if (result == JNI_OK) { 62 JavaThread *thread = JavaThread::current(); 63 /* thread is thread_in_vm here */ 64 *vm = (JavaVM *)(&main_vm); 65 *(JNIEnv**)penv = thread->jni_environment(); 66 67 // Tracks the time application was running before GC 68 RuntimeService::record_application_start(); 69 70 // Notify JVMTI 71 if (JvmtiExport::should_post_thread_life()) { 72 JvmtiExport::post_thread_start(thread); 73 } 74 75 EventThreadStart event; 76 if (event.should_commit()) { 77 event.set_javalangthread(java_lang_Thread::thread_id(thread->threadObj())); 78 event.commit(); 79 } 80 81 #ifndef PRODUCT 82 #ifndef TARGET_OS_FAMILY_windows 83 #define CALL_TEST_FUNC_WITH_WRAPPER_IF_NEEDED(f) f() 84 #endif 85 86 // Check if we should compile all classes on bootclasspath 87 if (CompileTheWorld) ClassLoader::compile_the_world(); 88 if (ReplayCompiles) ciReplay::replay(thread); 89 90 // Some platforms (like Win*) need a wrapper around these test 91 // functions in order to properly handle error conditions. 92 CALL_TEST_FUNC_WITH_WRAPPER_IF_NEEDED(test_error_handler); 93 CALL_TEST_FUNC_WITH_WRAPPER_IF_NEEDED(execute_internal_vm_tests); 94 #endif 95 96 // Since this is not a JVM_ENTRY we have to set the thread state manually before leaving. 97 ThreadStateTransition::transition_and_fence(thread, _thread_in_vm, _thread_in_native); 98 } else { 99 if (can_try_again) { 100 // reset safe_to_recreate_vm to 1 so that retrial would be possible 101 safe_to_recreate_vm = 1; 102 } 103 104 // Creation failed. We must reset vm_created 105 *vm = 0; 106 *(JNIEnv**)penv = 0; 107 // reset vm_created last to avoid race condition. Use OrderAccess to 108 // control both compiler and architectural-based reordering. 109 OrderAccess::release_store(&vm_created, 0); 110 } 111 112 return result; 113 }
这里只关注最核心的方法是60行的Threads::create_vm(hotspot/src/share/vm/runtime/Thread.cpp) 方法,在这个方法中,进行了大量的初始化操作,不过,这里我们只关注其中的一个点,就是 os::signal_init() 方法的调用,这就是启动“Signal Dispatcher”线程的地方。先看 create_vm 的源码
1 jint Threads::create_vm(JavaVMInitArgs* args, bool* canTryAgain) { 2 3 extern void JDK_Version_init(); 4 5 // Check version 6 if (!is_supported_jni_version(args->version)) return JNI_EVERSION; 7 8 // Initialize the output stream module 9 ostream_init(); 10 11 // Process java launcher properties. 12 Arguments::process_sun_java_launcher_properties(args); 13 14 // Initialize the os module before using TLS 15 os::init(); 16 17 // Initialize system properties. 18 Arguments::init_system_properties(); 19 20 // So that JDK version can be used as a discrimintor when parsing arguments 21 JDK_Version_init(); 22 23 // Update/Initialize System properties after JDK version number is known 24 Arguments::init_version_specific_system_properties(); 25 26 // Parse arguments 27 jint parse_result = Arguments::parse(args); 28 if (parse_result != JNI_OK) return parse_result; 29 30 os::init_before_ergo(); 31 32 jint ergo_result = Arguments::apply_ergo(); 33 if (ergo_result != JNI_OK) return ergo_result; 34 35 if (PauseAtStartup) { 36 os::pause(); 37 } 38 39 #ifndef USDT2 40 HS_DTRACE_PROBE(hotspot, vm__init__begin); 41 #else /* USDT2 */ 42 HOTSPOT_VM_INIT_BEGIN(); 43 #endif /* USDT2 */ 44 45 // Record VM creation timing statistics 46 TraceVmCreationTime create_vm_timer; 47 create_vm_timer.start(); 48 49 // Timing (must come after argument parsing) 50 TraceTime timer("Create VM", TraceStartupTime); 51 52 // Initialize the os module after parsing the args 53 jint os_init_2_result = os::init_2(); 54 if (os_init_2_result != JNI_OK) return os_init_2_result; 55 56 jint adjust_after_os_result = Arguments::adjust_after_os(); 57 if (adjust_after_os_result != JNI_OK) return adjust_after_os_result; 58 59 // intialize TLS 60 ThreadLocalStorage::init(); 61 62 // Bootstrap native memory tracking, so it can start recording memory 63 // activities before worker thread is started. This is the first phase 64 // of bootstrapping, VM is currently running in single-thread mode. 65 MemTracker::bootstrap_single_thread(); 66 67 // Initialize output stream logging 68 ostream_init_log(); 69 70 // Convert -Xrun to -agentlib: if there is no JVM_OnLoad 71 // Must be before create_vm_init_agents() 72 if (Arguments::init_libraries_at_startup()) { 73 convert_vm_init_libraries_to_agents(); 74 } 75 76 // Launch -agentlib/-agentpath and converted -Xrun agents 77 if (Arguments::init_agents_at_startup()) { 78 create_vm_init_agents(); 79 } 80 81 // Initialize Threads state 82 _thread_list = NULL; 83 _number_of_threads = 0; 84 _number_of_non_daemon_threads = 0; 85 86 // Initialize global data structures and create system classes in heap 87 vm_init_globals(); 88 89 // Attach the main thread to this os thread 90 JavaThread* main_thread = new JavaThread(); 91 main_thread->set_thread_state(_thread_in_vm); 92 // must do this before set_active_handles and initialize_thread_local_storage 93 // Note: on solaris initialize_thread_local_storage() will (indirectly) 94 // change the stack size recorded here to one based on the java thread 95 // stacksize. This adjusted size is what is used to figure the placement 96 // of the guard pages. 97 main_thread->record_stack_base_and_size(); 98 main_thread->initialize_thread_local_storage(); 99 100 main_thread->set_active_handles(JNIHandleBlock::allocate_block()); 101 102 if (!main_thread->set_as_starting_thread()) { 103 vm_shutdown_during_initialization( 104 "Failed necessary internal allocation. Out of swap space"); 105 delete main_thread; 106 *canTryAgain = false; // don't let caller call JNI_CreateJavaVM again 107 return JNI_ENOMEM; 108 } 109 110 // Enable guard page *after* os::create_main_thread(), otherwise it would 111 // crash Linux VM, see notes in os_linux.cpp. 112 main_thread->create_stack_guard_pages(); 113 114 // Initialize Java-Level synchronization subsystem 115 ObjectMonitor::Initialize() ; 116 117 // Second phase of bootstrapping, VM is about entering multi-thread mode 118 MemTracker::bootstrap_multi_thread(); 119 120 // Initialize global modules 121 jint status = init_globals(); 122 if (status != JNI_OK) { 123 delete main_thread; 124 *canTryAgain = false; // don't let caller call JNI_CreateJavaVM again 125 return status; 126 } 127 128 // Should be done after the heap is fully created 129 main_thread->cache_global_variables(); 130 131 HandleMark hm; 132 133 { MutexLocker mu(Threads_lock); 134 Threads::add(main_thread); 135 } 136 137 // Any JVMTI raw monitors entered in onload will transition into 138 // real raw monitor. VM is setup enough here for raw monitor enter. 139 JvmtiExport::transition_pending_onload_raw_monitors(); 140 141 // Fully start NMT 142 MemTracker::start(); 143 144 // Create the VMThread 145 { TraceTime timer("Start VMThread", TraceStartupTime); 146 VMThread::create(); 147 Thread* vmthread = VMThread::vm_thread(); 148 149 if (!os::create_thread(vmthread, os::vm_thread)) 150 vm_exit_during_initialization("Cannot create VM thread. Out of system resources."); 151 152 // Wait for the VM thread to become ready, and VMThread::run to initialize 153 // Monitors can have spurious returns, must always check another state flag 154 { 155 MutexLocker ml(Notify_lock); 156 os::start_thread(vmthread); 157 while (vmthread->active_handles() == NULL) { 158 Notify_lock->wait(); 159 } 160 } 161 } 162 163 assert (Universe::is_fully_initialized(), "not initialized"); 164 if (VerifyDuringStartup) { 165 // Make sure we're starting with a clean slate. 166 VM_Verify verify_op; 167 VMThread::execute(&verify_op); 168 } 169 170 EXCEPTION_MARK; 171 172 // At this point, the Universe is initialized, but we have not executed 173 // any byte code. Now is a good time (the only time) to dump out the 174 // internal state of the JVM for sharing. 175 if (DumpSharedSpaces) { 176 MetaspaceShared::preload_and_dump(CHECK_0); 177 ShouldNotReachHere(); 178 } 179 180 // Always call even when there are not JVMTI environments yet, since environments 181 // may be attached late and JVMTI must track phases of VM execution 182 JvmtiExport::enter_start_phase(); 183 184 // Notify JVMTI agents that VM has started (JNI is up) - nop if no agents. 185 JvmtiExport::post_vm_start(); 186 187 { 188 TraceTime timer("Initialize java.lang classes", TraceStartupTime); 189 190 if (EagerXrunInit && Arguments::init_libraries_at_startup()) { 191 create_vm_init_libraries(); 192 } 193 194 initialize_class(vmSymbols::java_lang_String(), CHECK_0); 195 196 // Initialize java_lang.System (needed before creating the thread) 197 initialize_class(vmSymbols::java_lang_System(), CHECK_0); 198 initialize_class(vmSymbols::java_lang_ThreadGroup(), CHECK_0); 199 Handle thread_group = create_initial_thread_group(CHECK_0); 200 Universe::set_main_thread_group(thread_group()); 201 initialize_class(vmSymbols::java_lang_Thread(), CHECK_0); 202 oop thread_object = create_initial_thread(thread_group, main_thread, CHECK_0); 203 main_thread->set_threadObj(thread_object); 204 // Set thread status to running since main thread has 205 // been started and running. 206 java_lang_Thread::set_thread_status(thread_object, 207 java_lang_Thread::RUNNABLE); 208 209 // The VM creates & returns objects of this class. Make sure it's initialized. 210 initialize_class(vmSymbols::java_lang_Class(), CHECK_0); 211 212 // The VM preresolves methods to these classes. Make sure that they get initialized 213 initialize_class(vmSymbols::java_lang_reflect_Method(), CHECK_0); 214 initialize_class(vmSymbols::java_lang_ref_Finalizer(), CHECK_0); 215 call_initializeSystemClass(CHECK_0); 216 217 // get the Java runtime name after java.lang.System is initialized 218 JDK_Version::set_runtime_name(get_java_runtime_name(THREAD)); 219 JDK_Version::set_runtime_version(get_java_runtime_version(THREAD)); 220 221 // an instance of OutOfMemory exception has been allocated earlier 222 initialize_class(vmSymbols::java_lang_OutOfMemoryError(), CHECK_0); 223 initialize_class(vmSymbols::java_lang_NullPointerException(), CHECK_0); 224 initialize_class(vmSymbols::java_lang_ClassCastException(), CHECK_0); 225 initialize_class(vmSymbols::java_lang_ArrayStoreException(), CHECK_0); 226 initialize_class(vmSymbols::java_lang_ArithmeticException(), CHECK_0); 227 initialize_class(vmSymbols::java_lang_StackOverflowError(), CHECK_0); 228 initialize_class(vmSymbols::java_lang_IllegalMonitorStateException(), CHECK_0); 229 initialize_class(vmSymbols::java_lang_IllegalArgumentException(), CHECK_0); 230 } 231 232 // See : bugid 4211085. 233 // Background : the static initializer of java.lang.Compiler tries to read 234 // property"java.compiler" and read & write property "java.vm.info". 235 // When a security manager is installed through the command line 236 // option "-Djava.security.manager", the above properties are not 237 // readable and the static initializer for java.lang.Compiler fails 238 // resulting in a NoClassDefFoundError. This can happen in any 239 // user code which calls methods in java.lang.Compiler. 240 // Hack : the hack is to pre-load and initialize this class, so that only 241 // system domains are on the stack when the properties are read. 242 // Currently even the AWT code has calls to methods in java.lang.Compiler. 243 // On the classic VM, java.lang.Compiler is loaded very early to load the JIT. 244 // Future Fix : the best fix is to grant everyone permissions to read "java.compiler" and 245 // read and write"java.vm.info" in the default policy file. See bugid 4211383 246 // Once that is done, we should remove this hack. 247 initialize_class(vmSymbols::java_lang_Compiler(), CHECK_0); 248 249 // More hackery - the static initializer of java.lang.Compiler adds the string "nojit" to 250 // the java.vm.info property if no jit gets loaded through java.lang.Compiler (the hotspot 251 // compiler does not get loaded through java.lang.Compiler). "java -version" with the 252 // hotspot vm says "nojit" all the time which is confusing. So, we reset it here. 253 // This should also be taken out as soon as 4211383 gets fixed. 254 reset_vm_info_property(CHECK_0); 255 256 quicken_jni_functions(); 257 258 // Must be run after init_ft which initializes ft_enabled 259 if (TRACE_INITIALIZE() != JNI_OK) { 260 vm_exit_during_initialization("Failed to initialize tracing backend"); 261 } 262 263 // Set flag that basic initialization has completed. Used by exceptions and various 264 // debug stuff, that does not work until all basic classes have been initialized. 265 set_init_completed(); 266 267 #ifndef USDT2 268 HS_DTRACE_PROBE(hotspot, vm__init__end); 269 #else /* USDT2 */ 270 HOTSPOT_VM_INIT_END(); 271 #endif /* USDT2 */ 272 273 // record VM initialization completion time 274 #if INCLUDE_MANAGEMENT 275 Management::record_vm_init_completed(); 276 #endif // INCLUDE_MANAGEMENT 277 278 // Compute system loader. Note that this has to occur after set_init_completed, since 279 // valid exceptions may be thrown in the process. 280 // Note that we do not use CHECK_0 here since we are inside an EXCEPTION_MARK and 281 // set_init_completed has just been called, causing exceptions not to be shortcut 282 // anymore. We call vm_exit_during_initialization directly instead. 283 SystemDictionary::compute_java_system_loader(THREAD); 284 if (HAS_PENDING_EXCEPTION) { 285 vm_exit_during_initialization(Handle(THREAD, PENDING_EXCEPTION)); 286 } 287 288 #if INCLUDE_ALL_GCS 289 // Support for ConcurrentMarkSweep. This should be cleaned up 290 // and better encapsulated. The ugly nested if test would go away 291 // once things are properly refactored. XXX YSR 292 if (UseConcMarkSweepGC || UseG1GC) { 293 if (UseConcMarkSweepGC) { 294 ConcurrentMarkSweepThread::makeSurrogateLockerThread(THREAD); 295 } else { 296 ConcurrentMarkThread::makeSurrogateLockerThread(THREAD); 297 } 298 if (HAS_PENDING_EXCEPTION) { 299 vm_exit_during_initialization(Handle(THREAD, PENDING_EXCEPTION)); 300 } 301 } 302 #endif // INCLUDE_ALL_GCS 303 304 // Always call even when there are not JVMTI environments yet, since environments 305 // may be attached late and JVMTI must track phases of VM execution 306 JvmtiExport::enter_live_phase(); 307 308 // Signal Dispatcher needs to be started before VMInit event is posted 309 os::signal_init(); 310 311 // Start Attach Listener if +StartAttachListener or it can't be started lazily 312 if (!DisableAttachMechanism) { 313 AttachListener::vm_start(); 314 if (StartAttachListener || AttachListener::init_at_startup()) { 315 AttachListener::init(); 316 } 317 } 318 319 // Launch -Xrun agents 320 // Must be done in the JVMTI live phase so that for backward compatibility the JDWP 321 // back-end can launch with -Xdebug -Xrunjdwp. 322 if (!EagerXrunInit && Arguments::init_libraries_at_startup()) { 323 create_vm_init_libraries(); 324 } 325 326 // Notify JVMTI agents that VM initialization is complete - nop if no agents. 327 JvmtiExport::post_vm_initialized(); 328 329 if (TRACE_START() != JNI_OK) { 330 vm_exit_during_initialization("Failed to start tracing backend."); 331 } 332 333 if (CleanChunkPoolAsync) { 334 Chunk::start_chunk_pool_cleaner_task(); 335 } 336 337 // initialize compiler(s) 338 #if defined(COMPILER1) || defined(COMPILER2) || defined(SHARK) 339 CompileBroker::compilation_init(); 340 #endif 341 342 if (EnableInvokeDynamic) { 343 // Pre-initialize some JSR292 core classes to avoid deadlock during class loading. 344 // It is done after compilers are initialized, because otherwise compilations of 345 // signature polymorphic MH intrinsics can be missed 346 // (see SystemDictionary::find_method_handle_intrinsic). 347 initialize_class(vmSymbols::java_lang_invoke_MethodHandle(), CHECK_0); 348 initialize_class(vmSymbols::java_lang_invoke_MemberName(), CHECK_0); 349 initialize_class(vmSymbols::java_lang_invoke_MethodHandleNatives(), CHECK_0); 350 } 351 352 #if INCLUDE_MANAGEMENT 353 Management::initialize(THREAD); 354 #endif // INCLUDE_MANAGEMENT 355 356 if (HAS_PENDING_EXCEPTION) { 357 // management agent fails to start possibly due to 358 // configuration problem and is responsible for printing 359 // stack trace if appropriate. Simply exit VM. 360 vm_exit(1); 361 } 362 363 if (Arguments::has_profile()) FlatProfiler::engage(main_thread, true); 364 if (MemProfiling) MemProfiler::engage(); 365 StatSampler::engage(); 366 if (CheckJNICalls) JniPeriodicChecker::engage(); 367 368 BiasedLocking::init(); 369 370 if (JDK_Version::current().post_vm_init_hook_enabled()) { 371 call_postVMInitHook(THREAD); 372 // The Java side of PostVMInitHook.run must deal with all 373 // exceptions and provide means of diagnosis. 374 if (HAS_PENDING_EXCEPTION) { 375 CLEAR_PENDING_EXCEPTION; 376 } 377 } 378 379 { 380 MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag); 381 // Make sure the watcher thread can be started by WatcherThread::start() 382 // or by dynamic enrollment. 383 WatcherThread::make_startable(); 384 // Start up the WatcherThread if there are any periodic tasks 385 // NOTE: All PeriodicTasks should be registered by now. If they 386 // aren't, late joiners might appear to start slowly (we might 387 // take a while to process their first tick). 388 if (PeriodicTask::num_tasks() > 0) { 389 WatcherThread::start(); 390 } 391 } 392 393 // Give os specific code one last chance to start 394 os::init_3(); 395 396 create_vm_timer.end(); 397 #ifdef ASSERT 398 _vm_complete = true; 399 #endif 400 return JNI_OK; 401 }
309 行处,看到了os::signal_init() 的调用(hotspot/src/share/vm/runtime/os.cpp),这就是我们要找的。接着,我们看下其具体实现
1 void os::signal_init() { 2 if (!ReduceSignalUsage) { 3 // Setup JavaThread for processing signals 4 EXCEPTION_MARK; 5 Klass* k = SystemDictionary::resolve_or_fail(vmSymbols::java_lang_Thread(), true, CHECK); 6 instanceKlassHandle klass (THREAD, k); 7 instanceHandle thread_oop = klass->allocate_instance_handle(CHECK); 8 9 const char thread_name[] = "Signal Dispatcher"; 10 Handle string = java_lang_String::create_from_str(thread_name, CHECK); 11 12 // Initialize thread_oop to put it into the system threadGroup 13 Handle thread_group (THREAD, Universe::system_thread_group()); 14 JavaValue result(T_VOID); 15 JavaCalls::call_special(&result, thread_oop, 16 klass, 17 vmSymbols::object_initializer_name(), 18 vmSymbols::threadgroup_string_void_signature(), 19 thread_group, 20 string, 21 CHECK); 22 23 KlassHandle group(THREAD, SystemDictionary::ThreadGroup_klass()); 24 JavaCalls::call_special(&result, 25 thread_group, 26 group, 27 vmSymbols::add_method_name(), 28 vmSymbols::thread_void_signature(), 29 thread_oop, // ARG 1 30 CHECK); 31 32 os::signal_init_pd(); 33 34 { MutexLocker mu(Threads_lock); 35 JavaThread* signal_thread = new JavaThread(&signal_thread_entry); 36 37 // At this point it may be possible that no osthread was created for the 38 // JavaThread due to lack of memory. We would have to throw an exception 39 // in that case. However, since this must work and we do not allow 40 // exceptions anyway, check and abort if this fails. 41 if (signal_thread == NULL || signal_thread->osthread() == NULL) { 42 vm_exit_during_initialization("java.lang.OutOfMemoryError", 43 "unable to create new native thread"); 44 } 45 46 java_lang_Thread::set_thread(thread_oop(), signal_thread); 47 java_lang_Thread::set_priority(thread_oop(), NearMaxPriority); 48 java_lang_Thread::set_daemon(thread_oop()); 49 50 signal_thread->set_threadObj(thread_oop()); 51 Threads::add(signal_thread); 52 Thread::start(signal_thread); 53 } 54 // Handle ^BREAK 55 os::signal(SIGBREAK, os::user_handler()); 56 } 57 }
这里的完全可以看出来,在此函数中35行处,创建了一个 java 线程,用于执行signal_thread_entry 函数,那我们来看看,这个 signal_thread_entry 函数到底做了什么?
1 // sigexitnum_pd is a platform-specific special signal used for terminating the Signal thread. 2 static void signal_thread_entry(JavaThread* thread, TRAPS) { 3 os::set_priority(thread, NearMaxPriority); 4 while (true) { 5 int sig; 6 { 7 // FIXME : Currently we have not decieded what should be the status 8 // for this java thread blocked here. Once we decide about 9 // that we should fix this. 10 sig = os::signal_wait(); 11 } 12 if (sig == os::sigexitnum_pd()) { 13 // Terminate the signal thread 14 return; 15 } 16 17 switch (sig) { 18 case SIGBREAK: { 19 // Check if the signal is a trigger to start the Attach Listener - in that 20 // case don't print stack traces. 21 if (!DisableAttachMechanism && AttachListener::is_init_trigger()) { 22 continue; 23 } 24 // Print stack traces 25 // Any SIGBREAK operations added here should make sure to flush 26 // the output stream (e.g. tty->flush()) after output. See 4803766. 27 // Each module also prints an extra carriage return after its output. 28 VM_PrintThreads op; 29 VMThread::execute(&op); 30 VM_PrintJNI jni_op; 31 VMThread::execute(&jni_op); 32 VM_FindDeadlocks op1(tty); 33 VMThread::execute(&op1); 34 Universe::print_heap_at_SIGBREAK(); 35 if (PrintClassHistogram) { 36 VM_GC_HeapInspection op1(gclog_or_tty, true /* force full GC before heap inspection */); 37 VMThread::execute(&op1); 38 } 39 if (JvmtiExport::should_post_data_dump()) { 40 JvmtiExport::post_data_dump(); 41 } 42 break; 43 } 44 default: { 45 // Dispatch the signal to java 46 HandleMark hm(THREAD); 47 Klass* k = SystemDictionary::resolve_or_null(vmSymbols::sun_misc_Signal(), THREAD); 48 KlassHandle klass (THREAD, k); 49 if (klass.not_null()) { 50 JavaValue result(T_VOID); 51 JavaCallArguments args; 52 args.push_int(sig); 53 JavaCalls::call_static( 54 &result, 55 klass, 56 vmSymbols::dispatch_name(), 57 vmSymbols::int_void_signature(), 58 &args, 59 THREAD 60 ); 61 } 62 if (HAS_PENDING_EXCEPTION) { 63 // tty is initialized early so we don't expect it to be null, but 64 // if it is we can't risk doing an initialization that might 65 // trigger additional out-of-memory conditions 66 if (tty != NULL) { 67 char klass_name[256]; 68 char tmp_sig_name[16]; 69 const char* sig_name = "UNKNOWN"; 70 InstanceKlass::cast(PENDING_EXCEPTION->klass())-> 71 name()->as_klass_external_name(klass_name, 256); 72 if (os::exception_name(sig, tmp_sig_name, 16) != NULL) 73 sig_name = tmp_sig_name; 74 warning("Exception %s occurred dispatching signal %s to handler" 75 "- the VM may need to be forcibly terminated", 76 klass_name, sig_name ); 77 } 78 CLEAR_PENDING_EXCEPTION; 79 } 80 } 81 } 82 } 83 }
函数里面意思已经很清晰明了了,首先在10行处,有一个os::signal_wait()的调用,该调用的主要是阻塞当前线程,并等待接收系统信号,然后再根据接收到的信号 sig 做 switch 逻辑,对于不同的信号做不同的处理。至此,关于“目标 JVM 对OS信号监听的实现”这一点,就已经分析结束了。简单的一句话总结就是,JVM 在启动的时候,会创建一个名为“Signal Dispatcher”的线程用于接收os 的信号,以便对不同信号分别做处理。
3.2、文件 Socket 通信的通道的创建
经过3.1的分析,我们已经知道在 JVM 启动之后,内部会有线程监听并处理 os 的信号,那么,这个时候,如果我们想和已经启动的 JVM 建立通信,当然就可以毫不犹豫的使用信号来进行了。不过,基于信号的通信,也是存在限制的,一方面,os 支持的信号是有限的,二来信号的通信往往是单向的,不方便通信双方进行高效的通信。基于这些,笔者认为,为了使得 Client JVM 和 Target JVM 更好的通信,就采用了 Socket 通信来实现二者的通信。那接下来我们看看,这个通道究竟是如何创建的?
当我们需要 attach 到某个目标 JVM 进程上去的时候,我们通常会写如下代码
1 VirtualMachine vm = VirtualMachine.attach(pid);
这样我们就能得到目标 JVM 的相关信息了,是不是很简单?不过,今天我们要做的可不是这么简单的事情,我们需要深入其后,了解其根本。接下来我们就以com.sun.tools.attach.VirtualMachine的 attach 方法入手,逐层揭开其神秘面纱。
1 public static VirtualMachine attach(String id) 2 throws AttachNotSupportedException, IOException 3 { 4 if (id == null) { 5 throw new NullPointerException("id cannot be null"); 6 } 7 List<AttachProvider> providers = AttachProvider.providers(); 8 if (providers.size() == 0) { 9 throw new AttachNotSupportedException("no providers installed"); 10 } 11 AttachNotSupportedException lastExc = null; 12 for (AttachProvider provider: providers) { 13 try { 14 return provider.attachVirtualMachine(id); 15 } catch (AttachNotSupportedException x) { 16 lastExc = x; 17 } 18 } 19 throw lastExc; 20 }
这是attach的源码,入参为目标 JVM 的进程 ID,其实现委派给了 AttachProvider 了,通过provider.attachVirtualMachine(id);来实现真正的 attach 操作。由于 AttachProvider 是个抽象类,所以这个方法的真正实现在子类中,在 Linux 环境下,我们看 sun.tools.attach.BsdAttachProvider.java 的实现。
1 public VirtualMachine attachVirtualMachine(String vmid) 2 throws AttachNotSupportedException, IOException 3 { 4 checkAttachPermission(); 5 6 // AttachNotSupportedException will be thrown if the target VM can be determined 7 // to be not attachable. 8 testAttachable(vmid); 9 10 return new BsdVirtualMachine(this, vmid); 11 }
这个方法非常简单,就是 new 了一个 BsdVirtualMachine 对象,并且把目标进程 ID 带过去了。看sun.tools.attach.BsdVirtualMachine.java 的构造函数
1 /** 2 * Attaches to the target VM 3 */ 4 BsdVirtualMachine(AttachProvider provider, String vmid) 5 throws AttachNotSupportedException, IOException 6 { 7 super(provider, vmid); 8 9 // This provider only understands pids 10 int pid; 11 try { 12 pid = Integer.parseInt(vmid); 13 } catch (NumberFormatException x) { 14 throw new AttachNotSupportedException("Invalid process identifier"); 15 } 16 17 // Find the socket file. If not found then we attempt to start the 18 // attach mechanism in the target VM by sending it a QUIT signal. 19 // Then we attempt to find the socket file again. 20 path = findSocketFile(pid); 21 if (path == null) { 22 File f = new File(tmpdir, ".attach_pid" + pid); 23 createAttachFile(f.getPath()); 24 try { 25 sendQuitTo(pid); 26 27 // give the target VM time to start the attach mechanism 28 int i = 0; 29 long delay = 200; 30 int retries = (int)(attachTimeout() / delay); 31 do { 32 try { 33 Thread.sleep(delay); 34 } catch (InterruptedException x) { } 35 path = findSocketFile(pid); 36 i++; 37 } while (i <= retries && path == null); 38 if (path == null) { 39 throw new AttachNotSupportedException( 40 "Unable to open socket file: target process not responding " + 41 "or HotSpot VM not loaded"); 42 } 43 } finally { 44 f.delete(); 45 } 46 } 47 48 // Check that the file owner/permission to avoid attaching to 49 // bogus process 50 checkPermissions(path); 51 52 // Check that we can connect to the process 53 // - this ensures we throw the permission denied error now rather than 54 // later when we attempt to enqueue a command. 55 int s = socket(); 56 try { 57 connect(s, path); 58 } finally { 59 close(s); 60 } 61 }
首先看20行处的findSocketFile(pid);这里是找对应的 socket (/tmp/.java_pid${pid})文件,这个文件就是我们在第二大点图中画出来的,用于进程间通信的 socket 文件,如果不存在,即第一次进入该方法的时候。这时会运行到74行的createAttachFile(f.getPath());来创建一个attach 文件,socket 文件的命名方式为:/tmp/../.attach_pid${pid},关于这两个方法(findSocketFile和createAttachFile)的具体实现,这里就不展开了,感兴趣的可以直接去查看jdk/src/solaris/native/sun/tools/attach/BsdVirtualMachine.c的相关源码。然后就会运行到一个非常关键的方法25行的sendQuitTo(pid);这个方法的实现,我们等会进入BsdVirtualMachine.c看下源码,其主要目的就是给该进程发送一个信号。之后会进入到31行处的 do...while循环,自旋反复轮询指定的次数来获取该 socket 文件的路径,直到超时或者 path(即 socket 文件路径) 不为空,最后在55行处,建立一个 socket,并且在57行处通过 path 进行 socket 的连接,从而完成了客户端(Client JVM)到目标进程(Target JVM)的 socket 通道建立。不过,请打住,这里是不是少了点什么?我相信细心的你肯定发现了,至少还存2个问题,
1. Target JVM 的 socket 服务端是何时创建的?
2. 用于通信的 socket 文件是在哪里创建的?
带着这两个问题,我们进入25行关键方法sendQuitTo(pid);的源码解读,该方法是个本地方法,位于jdk/src/solaris/native/sun/tools/attach/BsdVirtualMachine.c中
1 /* 2 * Class: sun_tools_attach_BsdVirtualMachine 3 * Method: sendQuitTo 4 * Signature: (I)V 5 */ 6 JNIEXPORT void JNICALL Java_sun_tools_attach_BsdVirtualMachine_sendQuitTo 7 (JNIEnv *env, jclass cls, jint pid) 8 { 9 if (kill((pid_t)pid, SIGQUIT)) { 10 JNU_ThrowIOExceptionWithLastError(env, "kill"); 11 } 12 }
看到第9行的时候,是不是觉得这里必然和前面3.1中大篇幅分析的信号处理线程“Signal Dispatcher”有种必然联系了?没错,这里就是通过 kill 这个系统调用像目标 JVM,发送了一个 SIGQUIT 的信号,该信号是个#define,即宏,表示的数字“3”,即类似在 linux 命令行执行了“kill -3 ${pid}”的操做(其实,这个命令正是获取目标 JVM 线程 dump 文件的一种方式,读者可以试试)。既然这里向目标 JVM 发送了这么个信号,那么我们现在就移步到3.1中讲到过的 signal_thread_entry 方法中去。
1 static void signal_thread_entry(JavaThread* thread, TRAPS) { 2 os::set_priority(thread, NearMaxPriority); 3 while (true) { 4 int sig; 5 { 6 // FIXME : Currently we have not decieded what should be the status 7 // for this java thread blocked here. Once we decide about 8 // that we should fix this. 9 sig = os::signal_wait(); 10 } 11 if (sig == os::sigexitnum_pd()) { 12 // Terminate the signal thread 13 return; 14 } 15 16 switch (sig) { 17 case SIGBREAK: { 18 // Check if the signal is a trigger to start the Attach Listener - in that 19 // case don't print stack traces. 20 if (!DisableAttachMechanism && AttachListener::is_init_trigger()) { 21 continue; 22 } 23 // Print stack traces 24 // Any SIGBREAK operations added here should make sure to flush 25 // the output stream (e.g. tty->flush()) after output. See 4803766. 26 // Each module also prints an extra carriage return after its output. 27 VM_PrintThreads op; 28 VMThread::execute(&op); 29 VM_PrintJNI jni_op; 30 VMThread::execute(&jni_op); 31 VM_FindDeadlocks op1(tty); 32 VMThread::execute(&op1); 33 Universe::print_heap_at_SIGBREAK(); 34 if (PrintClassHistogram) { 35 VM_GC_HeapInspection op1(gclog_or_tty, true /* force full GC before heap inspection */); 36 VMThread::execute(&op1); 37 } 38 if (JvmtiExport::should_post_data_dump()) { 39 JvmtiExport::post_data_dump(); 40 } 41 break; 42 } 43 default: { 44 // Dispatch the signal to java 45 HandleMark hm(THREAD); 46 Klass* k = SystemDictionary::resolve_or_null(vmSymbols::sun_misc_Signal(), THREAD); 47 KlassHandle klass (THREAD, k); 48 if (klass.not_null()) { 49 JavaValue result(T_VOID); 50 JavaCallArguments args; 51 args.push_int(sig); 52 JavaCalls::call_static( 53 &result, 54 klass, 55 vmSymbols::dispatch_name(), 56 vmSymbols::int_void_signature(), 57 &args, 58 THREAD 59 ); 60 } 61 if (HAS_PENDING_EXCEPTION) { 62 // tty is initialized early so we don't expect it to be null, but 63 // if it is we can't risk doing an initialization that might 64 // trigger additional out-of-memory conditions 65 if (tty != NULL) { 66 char klass_name[256]; 67 char tmp_sig_name[16]; 68 const char* sig_name = "UNKNOWN"; 69 InstanceKlass::cast(PENDING_EXCEPTION->klass())-> 70 name()->as_klass_external_name(klass_name, 256); 71 if (os::exception_name(sig, tmp_sig_name, 16) != NULL) 72 sig_name = tmp_sig_name; 73 warning("Exception %s occurred dispatching signal %s to handler" 74 "- the VM may need to be forcibly terminated", 75 klass_name, sig_name ); 76 } 77 CLEAR_PENDING_EXCEPTION; 78 } 79 } 80 } 81 } 82 }
这里的17行,我们看到了有个对 SIGBREAK(宏定义) 信号处理的 case,事实上,这个SIGBREAK和前面客户端发过来的SIGQUIT 的值是一样的,都是“3”,熟悉 C语言的读者应该不难理解。所以,当客户端发送这个信号给目标 JVM 时,就理所应当的进入了这个 case 的处理逻辑。这里的27行到40行,事实上就是对“kill -3 ${pid}”执行时对应的处理逻辑“进行目标 JVM 进程的线程 dump 操作”。现在我们重点关注一下20行的 if 语句,第一个 boolean 值,某认情况下是false(可通过/hotspot/src/share/vm/runtime/globals.c)查看,表示某认情况下是不禁止attach 机制的,于是就会进入第二个条件的判断AttachListener::is_init_trigger(),这里的判断还是比较有意思的(即判断当前杀是不是需要进行 attach 的初始化操作),我们进入源码,源码的文件为:hotspot/src/os/bsd/vm/attachListener_bsd.cpp
1 // If the file .attach_pid<pid> exists in the working directory 2 // or /tmp then this is the trigger to start the attach mechanism 3 bool AttachListener::is_init_trigger() { 4 if (init_at_startup() || is_initialized()) { 5 return false; // initialized at startup or already initialized 6 } 7 char path[PATH_MAX + 1]; 8 int ret; 9 struct stat st; 10 11 snprintf(path, PATH_MAX + 1, "%s/.attach_pid%d", 12 os::get_temp_directory(), os::current_process_id()); 13 RESTARTABLE(::stat(path, &st), ret); 14 if (ret == 0) { 15 // simple check to avoid starting the attach mechanism when 16 // a bogus user creates the file 17 if (st.st_uid == geteuid()) { 18 init(); 19 return true; 20 } 21 } 22 return false; 23 }
方法进入的第一行,即判断是不是在 JVM 启动时就初始化或者之前已经初始化过,如果是,则直接返回,否则继续当前方法。方法的第11行,是在处理/tmp/attach_pid${pid}路径(这个文件就是 Client JVM 在attach 时创建的),并把 path 传入13行定义的宏进行判断,如果这个文件存在,且刚好是当前用户的创建的 attach_pid 文件,则进入18行的 init() 方法,否则什么也不做,返回 false。接着我们进入 init 的源码(hotspot/src/share/vm/services/attachListener.cpp)
1 // Starts the Attach Listener thread 2 void AttachListener::init() { 3 EXCEPTION_MARK; 4 Klass* k = SystemDictionary::resolve_or_fail(vmSymbols::java_lang_Thread(), true, CHECK); 5 instanceKlassHandle klass (THREAD, k); 6 instanceHandle thread_oop = klass->allocate_instance_handle(CHECK); 7 8 const char thread_name[] = "Attach Listener"; 9 Handle string = java_lang_String::create_from_str(thread_name, CHECK); 10 11 // Initialize thread_oop to put it into the system threadGroup 12 Handle thread_group (THREAD, Universe::system_thread_group()); 13 JavaValue result(T_VOID); 14 JavaCalls::call_special(&result, thread_oop, 15 klass, 16 vmSymbols::object_initializer_name(), 17 vmSymbols::threadgroup_string_void_signature(), 18 thread_group, 19 string, 20 THREAD); 21 22 if (HAS_PENDING_EXCEPTION) { 23 tty->print_cr("Exception in VM (AttachListener::init) : "); 24 java_lang_Throwable::print(PENDING_EXCEPTION, tty); 25 tty->cr(); 26 27 CLEAR_PENDING_EXCEPTION; 28 29 return; 30 } 31 32 KlassHandle group(THREAD, SystemDictionary::ThreadGroup_klass()); 33 JavaCalls::call_special(&result, 34 thread_group, 35 group, 36 vmSymbols::add_method_name(), 37 vmSymbols::thread_void_signature(), 38 thread_oop, // ARG 1 39 THREAD); 40 41 if (HAS_PENDING_EXCEPTION) { 42 tty->print_cr("Exception in VM (AttachListener::init) : "); 43 java_lang_Throwable::print(PENDING_EXCEPTION, tty); 44 tty->cr(); 45 46 CLEAR_PENDING_EXCEPTION; 47 48 return; 49 } 50 51 { MutexLocker mu(Threads_lock); 52 JavaThread* listener_thread = new JavaThread(&attach_listener_thread_entry); 53 54 // Check that thread and osthread were created 55 if (listener_thread == NULL || listener_thread->osthread() == NULL) { 56 vm_exit_during_initialization("java.lang.OutOfMemoryError", 57 "unable to create new native thread"); 58 } 59 60 java_lang_Thread::set_thread(thread_oop(), listener_thread); 61 java_lang_Thread::set_daemon(thread_oop()); 62 63 listener_thread->set_threadObj(thread_oop()); 64 Threads::add(listener_thread); 65 Thread::start(listener_thread); 66 } 67 }
从源码中,我们可以看出来,这里最主要的功能是,创建一个名为“Attach Listener”的 Java 线程,该线程启动后会调用attach_listener_thread_entry这个方法(52行),来完成有关的任务处理。进入attach_listener_thread_entry方法
1 // The Attach Listener threads services a queue. It dequeues an operation 2 // from the queue, examines the operation name (command), and dispatches 3 // to the corresponding function to perform the operation. 4 5 static void attach_listener_thread_entry(JavaThread* thread, TRAPS) { 6 os::set_priority(thread, NearMaxPriority); 7 8 thread->record_stack_base_and_size(); 9 10 if (AttachListener::pd_init() != 0) { 11 return; 12 } 13 AttachListener::set_initialized(); 14 15 for (;;) { 16 AttachOperation* op = AttachListener::dequeue(); 17 if (op == NULL) { 18 return; // dequeue failed or shutdown 19 } 20 21 ResourceMark rm; 22 bufferedStream st; 23 jint res = JNI_OK; 24 25 // handle special detachall operation 26 if (strcmp(op->name(), AttachOperation::detachall_operation_name()) == 0) { 27 AttachListener::detachall(); 28 } else { 29 // find the function to dispatch too 30 AttachOperationFunctionInfo* info = NULL; 31 for (int i=0; funcs[i].name != NULL; i++) { 32 const char* name = funcs[i].name; 33 assert(strlen(name) <= AttachOperation::name_length_max, "operation <= name_length_max"); 34 if (strcmp(op->name(), name) == 0) { 35 info = &(funcs[i]); 36 break; 37 } 38 } 39 40 // check for platform dependent attach operation 41 if (info == NULL) { 42 info = AttachListener::pd_find_operation(op->name()); 43 } 44 45 if (info != NULL) { 46 // dispatch to the function that implements this operation 47 res = (info->func)(op, &st); 48 } else { 49 st.print("Operation %s not recognized!", op->name()); 50 res = JNI_ERR; 51 } 52 } 53 54 // operation complete - send result and output to client 55 op->complete(res, &st); 56 } 57 }
这里需要关注两个方面的内容,
第一、第10行的AttachListener::pd_init();
第二、第15行开始的 for 循环里面的内容。
首先看AttachListener::pd_init()
1 int AttachListener::pd_init() { 2 JavaThread* thread = JavaThread::current(); 3 ThreadBlockInVM tbivm(thread); 4 5 thread->set_suspend_equivalent(); 6 // cleared by handle_special_suspend_equivalent_condition() or 7 // java_suspend_self() via check_and_wait_while_suspended() 8 9 int ret_code = BsdAttachListener::init(); 10 11 // were we externally suspended while we were waiting? 12 thread->check_and_wait_while_suspended(); 13 14 return ret_code; 15 }
以上的 pd_init() 方法是在hotspot/src/os/bsd/vm/attachListener_bsd.cpp中实现的,我们看第9行的代码,调用了BsdAttachListener::init()一个这样的方法,该方法的主要作用就是生产 socket 通信文件的。源码如下
1 // Initialization - create a listener socket and bind it to a file 2 3 int BsdAttachListener::init() { 4 char path[UNIX_PATH_MAX]; // socket file 5 char initial_path[UNIX_PATH_MAX]; // socket file during setup 6 int listener; // listener socket (file descriptor) 7 8 // register function to cleanup 9 ::atexit(listener_cleanup); 10 11 int n = snprintf(path, UNIX_PATH_MAX, "%s/.java_pid%d", 12 os::get_temp_directory(), os::current_process_id()); 13 if (n < (int)UNIX_PATH_MAX) { 14 n = snprintf(initial_path, UNIX_PATH_MAX, "%s.tmp", path); 15 } 16 if (n >= (int)UNIX_PATH_MAX) { 17 return -1; 18 } 19 20 // create the listener socket 21 listener = ::socket(PF_UNIX, SOCK_STREAM, 0); 22 if (listener == -1) { 23 return -1; 24 } 25 26 // bind socket 27 struct sockaddr_un addr; 28 addr.sun_family = AF_UNIX; 29 strcpy(addr.sun_path, initial_path); 30 ::unlink(initial_path); 31 int res = ::bind(listener, (struct sockaddr*)&addr, sizeof(addr)); 32 if (res == -1) { 33 ::close(listener); 34 return -1; 35 } 36 37 // put in listen mode, set permissions, and rename into place 38 res = ::listen(listener, 5); 39 if (res == 0) { 40 RESTARTABLE(::chmod(initial_path, S_IREAD|S_IWRITE), res); 41 if (res == 0) { 42 // make sure the file is owned by the effective user and effective group 43 // (this is the default on linux, but not on mac os) 44 RESTARTABLE(::chown(initial_path, geteuid(), getegid()), res); 45 if (res == 0) { 46 res = ::rename(initial_path, path); 47 } 48 } 49 } 50 if (res == -1) { 51 ::close(listener); 52 ::unlink(initial_path); 53 return -1; 54 } 55 set_path(path); 56 set_listener(listener); 57 58 return 0; 59 }
从方法的注释,就能看出这个方法就是用来创建一个基于文件的 socket 的 listener 端,即服务端的。具体的创建过程,代码写的已经很清楚了,我做下简单描述,11行处,构建 socket 通信文件的路径(/tmp/.java_pid${pid}),21 行,创建一个 socket,其中关注 socket 函数的第一个参数,当为 PF_UNIX 时,表示创建文件 socket,详情可以参考 linux 的 socket 函数说明,然后到29行,将 socket 文件 path 拷贝到 socket 通信地址中,即以此文件作为通信地址,然后在31行时,将 socket 和该 socket 文件地址做一个绑定,38行,表示对当前 socket 进行监听(数字5表示监听时可容纳客户端连接的队列的大小),如果有Client JVM 的客户端连接上来,并且发送了相关消息,该服务端就可以对其进行相应处理了。至此,进程间 socket 的通信的通道就建立了。
其次看下 for 循环做了什么?其实很简单,16行,BsdAttachListener::dequeue() 从监听器的队列中拿到一个 Client JVM 的AttachOperation(当客户端 attach 上 target JVM 之后,往目标 JVM 发送任意 socket 信息,都会被放置到这个队列中,等待被处理),此处会被阻塞,直到收到请求,如下源码的13行, socket 的 accept 函数处于等待状态,等待来之客户端 JVM 的相关请求,一旦获取到请求,则将请求组装好返回给调用者一个BadAttachOperation 对象。
1 // Dequeue an operation 2 // 3 // In the Bsd implementation there is only a single operation and clients 4 // cannot queue commands (except at the socket level). 5 // 6 BsdAttachOperation* BsdAttachListener::dequeue() { 7 for (;;) { 8 int s; 9 10 // wait for client to connect 11 struct sockaddr addr; 12 socklen_t len = sizeof(addr); 13 RESTARTABLE(::accept(listener(), &addr, &len), s); 14 if (s == -1) { 15 return NULL; // log a warning? 16 } 17 18 // get the credentials of the peer and check the effective uid/guid 19 // - check with jeff on this. 20 uid_t puid; 21 gid_t pgid; 22 if (::getpeereid(s, &puid, &pgid) != 0) { 23 ::close(s); 24 continue; 25 } 26 uid_t euid = geteuid(); 27 gid_t egid = getegid(); 28 29 if (puid != euid || pgid != egid) { 30 ::close(s); 31 continue; 32 } 33 34 // peer credential look okay so we read the request 35 BsdAttachOperation* op = read_request(s); 36 if (op == NULL) { 37 ::close(s); 38 continue; 39 } else { 40 return op; 41 } 42 } 43 }
所以,只要收到一个AttachOperation不为“detachall”的操纵请求就会进入到45行处进行处理,这里的目的就是为了拿到对应的操作AttachOperationFunctionInfo对象,如果不为空,则调用其func,来完成对客户端的响应,如47行所示。AttachOperationFunctionInfo(/hotspot/src/share/vm/services/attachListener.cpp)的定义如下
1 // names must be of length <= AttachOperation::name_length_max 2 static AttachOperationFunctionInfo funcs[] = { 3 { "agentProperties", get_agent_properties }, 4 { "datadump", data_dump }, 5 { "dumpheap", dump_heap }, 6 { "load", JvmtiExport::load_agent_library }, 7 { "properties", get_system_properties }, 8 { "threaddump", thread_dump }, 9 { "inspectheap", heap_inspection }, 10 { "setflag", set_flag }, 11 { "printflag", print_flag }, 12 { "jcmd", jcmd }, 13 { NULL, NULL } 14 };
从这里,我们可以看到,threaddump、dumpheap 等我们常用的操纵。到此为止,水落石出,涉及到 attach 操纵的服务端的原理基本已经理清楚了。接下来我们以 jstack 为例,来看下客户端 JVM 是不是确实是以我们上面分析出来的方式与服务端 JVM 进行通信,并获取到它想要的内容的。
3.3 JVM 对 Attach 上来的进程的命令的响应,以 jstack -l 为例
我们首先进入 jstack 的源码,源码目录为jdk/src/share/classes/sun/tools/jstack/JStack.java。进入 main 函数
1 public static void main(String[] args) throws Exception { 2 if (args.length == 0) { 3 usage(1); // no arguments 4 } 5 6 boolean useSA = false; 7 boolean mixed = false; 8 boolean locks = false; 9 10 // Parse the options (arguments starting with "-" ) 11 int optionCount = 0; 12 while (optionCount < args.length) { 13 String arg = args[optionCount]; 14 if (!arg.startsWith("-")) { 15 break; 16 } 17 if (arg.equals("-help") || arg.equals("-h")) { 18 usage(0); 19 } 20 else if (arg.equals("-F")) { 21 useSA = true; 22 } 23 else { 24 if (arg.equals("-m")) { 25 mixed = true; 26 } else { 27 if (arg.equals("-l")) { 28 locks = true; 29 } else { 30 usage(1); 31 } 32 } 33 } 34 optionCount++; 35 } 36 37 // mixed stack implies SA tool 38 if (mixed) { 39 useSA = true; 40 } 41 42 // Next we check the parameter count. If there are two parameters 43 // we assume core file and executable so we use SA. 44 int paramCount = args.length - optionCount; 45 if (paramCount == 0 || paramCount > 2) { 46 usage(1); 47 } 48 if (paramCount == 2) { 49 useSA = true; 50 } else { 51 // If we can't parse it as a pid then it must be debug server 52 if (!args[optionCount].matches("[0-9]+")) { 53 useSA = true; 54 } 55 } 56 57 // now execute using the SA JStack tool or the built-in thread dumper 58 if (useSA) { 59 // parameters (<pid> or <exe> <core> 60 String params[] = new String[paramCount]; 61 for (int i=optionCount; i<args.length; i++ ){ 62 params[i-optionCount] = args[i]; 63 } 64 runJStackTool(mixed, locks, params); 65 } else { 66 // pass -l to thread dump operation to get extra lock info 67 String pid = args[optionCount]; 68 String params[]; 69 if (locks) { 70 params = new String[] { "-l" }; 71 } else { 72 params = new String[0]; 73 } 74 runThreadDump(pid, params); 75 } 76 }
当采用 jstack -l 时,会走65行的 else 分支,最终执行77行的runThreadDump方法
1 // Attach to pid and perform a thread dump 2 private static void runThreadDump(String pid, String args[]) throws Exception { 3 VirtualMachine vm = null; 4 try { 5 vm = VirtualMachine.attach(pid); 6 } catch (Exception x) { 7 String msg = x.getMessage(); 8 if (msg != null) { 9 System.err.println(pid + ": " + msg); 10 } else { 11 x.printStackTrace(); 12 } 13 if ((x instanceof AttachNotSupportedException) && 14 (loadSAClass() != null)) { 15 System.err.println("The -F option can be used when the target " + 16 "process is not responding"); 17 } 18 System.exit(1); 19 } 20 21 // Cast to HotSpotVirtualMachine as this is implementation specific 22 // method. 23 InputStream in = ((HotSpotVirtualMachine)vm).remoteDataDump((Object[])args); 24 25 // read to EOF and just print output 26 byte b[] = new byte[256]; 27 int n; 28 do { 29 n = in.read(b); 30 if (n > 0) { 31 String s = new String(b, 0, n, "UTF-8"); 32 System.out.print(s); 33 } 34 } while (n > 0); 35 in.close(); 36 vm.detach(); 37 }
5行:执行VirtualMachine.attach(pid);则会达到3.1、3.2的效果,即服务端已经做好了所有 attach 所需的准备,如 socket 服务端、socket 通信文件、socket 请求处理线程“Attach Listener”。
23行:通过调用 HotSpotVirtualMachine 对象的 remoteDataDump 函数进行远程 dump,获得输入流 InputStream in,最后通过读取输入流的内容,来通过标准输出流输出从服务端获取的数据。至此,jstack -l 命令完成所有操作。
接下来,我们重点分析HotSpotVirtualMachine 对象的 remoteDataDump 函数。首先上HotSpotVirtualMachine(/jdk/src/share/classes/sun/tools/attach/HotSpotVirtualMachine.java) 对象的 remoteDataDump的源码
1 // Remote ctrl-break. The output of the ctrl-break actions can 2 // be read from the input stream. 3 public InputStream remoteDataDump(Object ... args) throws IOException { 4 return executeCommand("threaddump", args); 5 }
请注意4行的 cmd 字符串为“threaddump”,这个和3.2中 AttachOperationFunctionInfo 的定义是吻合的,也就是说最终在服务端会调用 thread_dump 方法,来执行线程 dump,并将结果返回给客户端。接着我们看下下 executeCommand方法,该方法只是简单的调用 execute 方法,如下
1 /* 2 * Convenience method for simple commands 3 */ 4 private InputStream executeCommand(String cmd, Object ... args) throws IOException { 5 try { 6 return execute(cmd, args); 7 } catch (AgentLoadException x) { 8 throw new InternalError("Should not get here", x); 9 } 10 }
exectue 方法在该类中为抽象方法,其具体实现放在了sun.tools.attach.BsdVirtualMachine.java中,我们看下在其具体实现这里可是最最关键的地方了
1 /** 2 * Execute the given command in the target VM. 3 */ 4 InputStream execute(String cmd, Object ... args) throws AgentLoadException, IOException { 5 assert args.length <= 3; // includes null 6 7 // did we detach? 8 String p; 9 synchronized (this) { 10 if (this.path == null) { 11 throw new IOException("Detached from target VM"); 12 } 13 p = this.path; 14 } 15 16 // create UNIX socket 17 int s = socket(); 18 19 // connect to target VM 20 try { 21 connect(s, p); 22 } catch (IOException x) { 23 close(s); 24 throw x; 25 } 26 27 IOException ioe = null; 28 29 // connected - write request 30 // <ver> <cmd> <args...> 31 try { 32 writeString(s, PROTOCOL_VERSION); 33 writeString(s, cmd); 34 35 for (int i=0; i<3; i++) { 36 if (i < args.length && args[i] != null) { 37 writeString(s, (String)args[i]); 38 } else { 39 writeString(s, ""); 40 } 41 } 42 } catch (IOException x) { 43 ioe = x; 44 } 45 46 47 // Create an input stream to read reply 48 SocketInputStream sis = new SocketInputStream(s); 49 50 // Read the command completion status 51 int completionStatus; 52 try { 53 completionStatus = readInt(sis); 54 } catch (IOException x) { 55 sis.close(); 56 if (ioe != null) { 57 throw ioe; 58 } else { 59 throw x; 60 } 61 } 62 63 if (completionStatus != 0) { 64 sis.close(); 65 66 // In the event of a protocol mismatch then the target VM 67 // returns a known error so that we can throw a reasonable 68 // error. 69 if (completionStatus == ATTACH_ERROR_BADVERSION) { 70 throw new IOException("Protocol mismatch with target VM"); 71 } 72 73 // Special-case the "load" command so that the right exception is 74 // thrown. 75 if (cmd.equals("load")) { 76 throw new AgentLoadException("Failed to load agent library"); 77 } else { 78 throw new IOException("Command failed in target VM"); 79 } 80 } 81 82 // Return the input stream so that the command output can be read 83 return sis; 84 }
17行:创建一个 socket,这个是 socket 是一个 jni 本地方法,有兴趣的可以去看对应的实现,源码在jdk/src/solaris/native/sun/tools/attach/BsdVirtualMachine.c中,其关键操作就一个return socket(PF_UNIX, SOCK_STREAM, 0) 客户端 socket 连接。
21行:这里也是一个本地方法,调用了 connect(s,p),这里的 p 就是 attach 时产生的/tmp/.java_pid${pid}的 socket 文件路径,这样,客户端就和目标 JVM 连接上了,该方法同样是一个 native 方法,可以通过查看BsdVirtualMachine.c的源码来进行查看,如下,重点在16行使用 socket 文件路径作为连接地址 和 18 行与目标 JVM 端启动的 socket server 建立连接;
1 /* 2 * Class: sun_tools_attach_BsdVirtualMachine 3 * Method: connect 4 * Signature: (ILjava/lang/String;)I 5 */ 6 JNIEXPORT void JNICALL Java_sun_tools_attach_BsdVirtualMachine_connect 7 (JNIEnv *env, jclass cls, jint fd, jstring path) 8 { 9 jboolean isCopy; 10 const char* p = GetStringPlatformChars(env, path, &isCopy); 11 if (p != NULL) { 12 struct sockaddr_un addr; 13 int err = 0; 14 15 addr.sun_family = AF_UNIX; 16 strcpy(addr.sun_path, p); 17 18 if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) { 19 err = errno; 20 } 21 22 if (isCopy) { 23 JNU_ReleaseStringPlatformChars(env, path, p); 24 } 25 26 /* 27 * If the connect failed then we throw the appropriate exception 28 * here (can't throw it before releasing the string as can't call 29 * JNI with pending exception) 30 */ 31 if (err != 0) { 32 if (err == ENOENT) { 33 JNU_ThrowByName(env, "java/io/FileNotFoundException", NULL); 34 } else { 35 char* msg = strdup(strerror(err)); 36 JNU_ThrowIOException(env, msg); 37 if (msg != NULL) { 38 free(msg); 39 } 40 } 41 } 42 }
31~44行:想 Target JVM 端发送命令 threaddump、以及可能存在的相关参数,如-l;这里的 writeString 同样是一个本地方法,涉及到的底层操作就是一个 C 语言库的 write 操作,感兴趣的可以自己看源码,不再赘述;
48~83行:这里就是对当前 socket 连接,构建一个 SocketInputStream 对象,并等待Target JVM 端数据完全返回,最后将这个 InputStream 对象作为方法返回参数返回。
四、总结
本文结合 Attach 的原理和使用案例(jstack -l),对 Attach 的各个方面都进行了深入的分析和总结,希望能对有需要的同学有所帮助。当然,以上均为本人个人所学,所以难免会有错误和疏忽的地方,如果您发现了,还麻烦指出。