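Buckle up, everyone, we're about to hit the road. Before we set off, though, let's do our customary recap of the key points from the last installment.
After the groundwork and brain-bending discussion of the previous two installments, we should have a fresh understanding of the question itself: "How do you monitor the state of a Java application process, and if the monitored process goes down, is there a mechanism to bring it back up?" So this time, let's pull out the engineer's signature moves, Ctrl + C and Ctrl + V, lift a few pieces from the Resin source code, and put them into practice in a small way.
As the figure shows, let's demonstrate the result first. Locate and run the program entry point, MonitorApp; the log output is as follows.
At this point, let's run the jps command in the console and see what we get.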
18830 MonitorApp
18831 Resin
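Two processes, MonitorApp and Resin, started successfully, exactly as with the Resin application server. What happens if we kill the process with PID 18831? The console prints some more log output, and it looks as though the maid, Resin, has been started again.
Run jps in the console once more to confirm that the PID really changed.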
18830 MonitorApp
18935 Resin
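So how do we actually implement this? Let's follow Resin's lead and imitate its implementation (that's the signature move: imitate).
First, define the monitoring application's entry point, MonitorApp. It is very simple: it just starts the task that creates the child process.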
package com.caucho.server.resin;

public class MonitorApp {
    public static void main(String[] args) {
        new WatchdogChildTask().start();
    }
}
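Next, write the WatchdogChildTask child-process task. Most of it comes from the Resin source code, only with a great deal stripped out and a great deal simplified. Look closely and it is simple too: a single loop keeps calling WatchdogChildProcess's run method, with the goal of keeping the maid process running.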
package com.caucho.server.resin;

import java.util.concurrent.Executors;
import java.util.logging.Level;
import java.util.logging.Logger;

class WatchdogChildTask implements Runnable {
    private static final Logger log = Logger.getLogger(WatchdogChildTask.class.getName());

    private WatchdogChildProcess _process;

    /**
     * Starts management of the watchdog process
     */
    public void start() {
        // TODO: creating the thread pool manually would be better [Alibaba development guidelines]
        Executors.newFixedThreadPool(1).execute(this);
    }

    /**
     * Main thread watching over the health of the Resin instances.
     */
    public void run() {
        try {
            int i = 0;
            long retry = Long.MAX_VALUE;
            // Effectively loops forever: each pass starts the child and blocks until it exits.
            while (i++ < retry) {
                WatchdogChildProcess process = new WatchdogChildProcess();
                _process = process;
                try {
                    log.log(Level.INFO, "I am the head steward, about to get the maid nicknamed Resin running");
                    _process.run();
                } catch (Exception e) {
                    log.log(Level.WARNING, e.toString(), e);
                } finally {
                    _process = null;
                    if (process != null) {
                        log.log(Level.INFO, "I am the head steward; the maid nicknamed Resin ran into trouble, so she must release her resources and start running again");
                        process.kill();
                    }
                }
            }
        } catch (Exception e) {
            log.log(Level.WARNING, e.toString(), e);
        } finally {
            if (_process != null) {
                _process.kill();
                _process = null;
            }
        }
    }
}
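The TODO in start() says that creating the thread pool by hand would be better than calling Executors.newFixedThreadPool. Here is a minimal sketch of what that might look like. WatchdogExecutors, newWatchdogExecutor and the thread name are hypothetical names made up for illustration; none of them come from Resin or from the code in this post, and the queue size and rejection policy are assumptions you would tune for your own case.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not from Resin): builds the watchdog's single-thread pool explicitly. */
final class WatchdogExecutors {
    private WatchdogExecutors() {
    }

    static ExecutorService newWatchdogExecutor(final String threadName) {
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                // A named thread is much easier to spot in jstack output.
                Thread t = new Thread(r, threadName);
                // Non-daemon, so the monitor JVM stays alive while the watch loop runs.
                t.setDaemon(false);
                return t;
            }
        };
        // One core thread, one max thread, and a bounded queue (assumed capacity: 1)
        // so that extra submissions are rejected instead of piling up silently.
        return new ThreadPoolExecutor(
                1, 1,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(1),
                factory,
                new ThreadPoolExecutor.AbortPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = newWatchdogExecutor("watchdog-child-task");
        pool.execute(new Runnable() {
            @Override
            public void run() {
                System.out.println("running on " + Thread.currentThread().getName());
            }
        });
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With something like this in place, start() could call WatchdogExecutors.newWatchdogExecutor("watchdog-child-task").execute(this) instead of going through Executors.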
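The actual work of getting the maid process running is delegated to WatchdogChildProcess. It first opens a socket port for communication, then starts the Resin process with ProcessBuilder, then waits for the maid process to establish a socket connection. Most of this also comes from the Resin source code, heavily trimmed. One point worth stressing: to reuse it, you only need to change com.caucho.server.resin.Resin to the main class of the application you want to monitor.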
package com.caucho.server.resin;

import java.io.*;
import java.net.*;
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

class WatchdogChildProcess {
    private static final Logger log = Logger.getLogger(WatchdogChildProcess.class.getName());

    private Socket _childSocket;
    private OutputStream _stdOs;
    private int _status = -1;
    private AtomicReference<Process> _processRef = new AtomicReference<Process>();

    public void run() {
        ServerSocket ss = null;
        Socket s = null;
        try {
            // Port 0 lets the OS pick a free port; bind to loopback only.
            ss = new ServerSocket(0, 5, InetAddress.getByName("127.0.0.1"));
            int port = ss.getLocalPort();
            log.log(Level.INFO, "I am the head steward; I opened a socket on port {0} so the maids can talk to me in real time", port);
            Process process = createProcess(port);
            if (process != null) {
                _processRef.compareAndSet(null, process);
                InputStream stdIs = process.getInputStream();
                _stdOs = process.getOutputStream();
                // TODO: do not create threads explicitly; use a thread pool [Alibaba development guidelines]
                new Thread(new WatchdogProcessLogThread(stdIs)).start();
                s = connectToChild(ss);
                // Blocks until the child process exits.
                _status = process.waitFor();
                logStatus(_status);
            }
        } catch (Exception e) {
            log.log(Level.WARNING, e.toString(), e);
            try {
                Thread.sleep(5000);
            } catch (Exception e1) {
            }
        } catch (Throwable e) {
            log.log(Level.WARNING, e.toString(), e);
        } finally {
            if (ss != null) {
                try {
                    ss.close();
                } catch (Throwable e) {
                }
            }
            try {
                if (s != null) {
                    s.close();
                }
            } catch (Throwable e) {
                log.log(Level.FINER, e.toString(), e);
            }
            kill();
            synchronized (this) {
                notifyAll();
            }
        }
    }

    private void logStatus(int status) {
        String code = " (exit code=" + status + ")";
        log.warning("The head steward suddenly noticed that the maid process has gone on strike!!" + code);
    }

    void kill() {
        Process process = _processRef.getAndSet(null);
        if (process != null) {
            try {
                process.destroy();
            } catch (Exception e) {
                log.log(Level.FINE, e.toString(), e);
            }
        }
        OutputStream stdOs = _stdOs;
        _stdOs = null;
        if (stdOs != null) {
            try {
                stdOs.close();
            } catch (Throwable e) {
                log.log(Level.FINE, e.toString(), e);
            }
        }
        Socket childSocket = _childSocket;
        _childSocket = null;
        if (childSocket != null) {
            try {
                childSocket.close();
            } catch (Throwable e) {
                log.log(Level.FINE, e.toString(), e);
            }
        }
        if (process != null) {
            try {
                process.waitFor();
            } catch (Exception e) {
                log.log(Level.INFO, e.toString(), e);
            }
        }
    }

    /**
     * Waits for a socket connection from the child, returning the socket
     *
     * @param ss TCP ServerSocket from the watchdog for the child to connect to
     */
    private Socket connectToChild(ServerSocket ss) throws IOException {
        Socket s = null;
        try {
            ss.setSoTimeout(60000);
            for (int i = 0; i < 120 && s == null; i++) {
                try {
                    s = ss.accept();
                } catch (SocketTimeoutException e) {
                }
            }
            if (s != null) {
                _childSocket = s;
            }
        } catch (Exception e) {
            log.log(Level.WARNING, e.toString(), e);
        } finally {
            ss.close();
        }
        return s;
    }

    /**
     * Creates a new Process for the Resin JVM, initializing the environment
     * and passing value to the new process.
     *
     * @param socketPort the watchdog socket port
     */
    private Process createProcess(int socketPort) throws IOException {
        HashMap<String, String> env = buildEnv();
        ArrayList<String> jvmArgs = buildJvmArgs();
        // Change this to the main class of the application you want to monitor.
        jvmArgs.add("com.caucho.server.resin.Resin");
        jvmArgs.add("-socketwait");
        jvmArgs.add(String.valueOf(socketPort));
        ProcessBuilder builder = new ProcessBuilder();
        builder.environment().putAll(env);
        builder = builder.command(jvmArgs);
        builder.redirectErrorStream(true);
        return builder.start();
    }

    private HashMap<String, String> buildEnv() throws IOException {
        HashMap<String, String> env = new HashMap<String, String>();
        env.putAll(System.getenv());
        StringBuilder classPath = new StringBuilder();
        // TODO: the separator differs by operating system; on Windows it is a semicolon (;)
        classPath.append(".:");
        String appPath = System.getProperty("user.dir");
        classPath.append(appPath).append("/resin/target/classes");
        env.put("CLASSPATH", classPath.toString());
        // ... a whole lot of code removed here ...
        return env;
    }

    private ArrayList<String> buildJvmArgs() {
        ArrayList<String> jvmArgs = new ArrayList<String>();
        jvmArgs.add("java");
        // ... and a lot more code removed here ...
        return jvmArgs;
    }

    /**
     * Watchdog thread responsible for writing jvm-default.log by reading the
     * JVM's stdout and copying it to the log.
     */
    class WatchdogProcessLogThread implements Runnable {
        private InputStream _is;

        /**
         * @param is the stdout stream from the Resin
         */
        WatchdogProcessLogThread(InputStream is) {
            _is = is;
        }

        @Override
        public void run() {
            try {
                int len;
                byte[] data = new byte[4096];
                while ((len = _is.read(data, 0, data.length)) > 0) {
                    System.out.print(new String(data, 0, len));
                }
            } catch (Throwable e) {
                log.log(Level.WARNING, e.toString(), e);
            } finally {
                kill();
            }
        }
    }
}
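The TODO in buildEnv points out that the class-path separator is not the same on every operating system. One way to deal with that, sketched here under assumptions, is to let the JDK supply the separator through File.pathSeparator rather than hard-coding a colon. ClassPathBuilder and buildClassPath are hypothetical names for illustration only; the resin/target/classes layout is simply carried over from the code above.

```java
import java.io.File;
import java.nio.file.Paths;

/** Hypothetical helper (not from Resin): builds a CLASSPATH value that works on any OS. */
final class ClassPathBuilder {
    private ClassPathBuilder() {
    }

    static String buildClassPath(String appPath) {
        // Paths.get joins the segments with the platform's directory separator.
        String classesDir = Paths.get(appPath, "resin", "target", "classes").toString();
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return "." + File.pathSeparator + classesDir;
    }

    public static void main(String[] args) {
        System.out.println(buildClassPath(System.getProperty("user.dir")));
    }
}
```

In buildEnv, the two classPath.append calls could then be replaced by env.put("CLASSPATH", ClassPathBuilder.buildClassPath(appPath)).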
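The next part deserves emphasis, because when you adopt this model, the only code you need to modify is the Resin class below, which is in effect the application we want to monitor. It is simple: a single connect method communicates with the head steward, and as soon as that communication fails, the process exits on its own.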
package com.caucho.server.resin;

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Resin {
    private static ExecutorService executorService = Executors.newFixedThreadPool(1);
    private static final Logger log = Logger.getLogger(Resin.class.getName());

    public static void main(String[] args) {
        log.log(Level.INFO, "I am the maid nicknamed Resin; the communication port the head steward gave me is {0} {1}", args);
        // The watchdog passes "-socketwait <port>", so the port is the second argument.
        int port = Integer.parseInt(args[1]);
        connect(port);
    }

    public static void connect(final int port) {
        log.log(Level.INFO, "I am the maid nicknamed Resin; I am about to start talking to the head steward on port {0}", port);
        executorService.execute(new Runnable() {
            @Override
            public void run() {
                Socket socket = null;
                try {
                    socket = new Socket("127.0.0.1", port);
                    InputStream s = socket.getInputStream();
                    byte[] buf = new byte[1024];
                    int len;
                    // read() returns -1 once the head steward closes its end of the connection.
                    while ((len = s.read(buf)) != -1) {
                        log.log(Level.INFO, "Message received: {0}", new String(buf, 0, len));
                    }
                } catch (IOException e) {
                    log.log(Level.WARNING, "I am the maid nicknamed Resin; talking to the head steward on port {0} failed with an exception", port);
                } finally {
                    // socket is still null if the connection attempt itself failed;
                    // without this check an NPE here would skip System.exit below.
                    if (socket != null) {
                        try {
                            socket.close();
                        } catch (IOException e) {
                            log.log(Level.WARNING, e.getMessage(), e);
                        }
                    }
                    log.log(Level.INFO, "I am the maid nicknamed Resin; communication with the head steward on port {0} has ended, so I will take my leave", port);
                    System.exit(0);
                }
            }
        });
        log.log(Level.INFO, "The maid has handed off communication with the head steward to a worker thread");
    }
}
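The trick that makes the maid exit is that a blocking read on the socket returns -1 as soon as the other side closes the connection. If you want to see just that piece in isolation, here is a small single-process sketch. SocketLivenessDemo and childSeesEofWhenParentCloses are hypothetical names for illustration, and both ends of the connection live in the same JVM purely to keep the example short; in the real setup they sit in two separate processes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo (not from Resin): a closed peer shows up as end-of-stream on read. */
final class SocketLivenessDemo {
    private SocketLivenessDemo() {
    }

    /** Returns true if the "child" side reads end-of-stream after the "parent" side closes. */
    static boolean childSeesEofWhenParentCloses() {
        try (ServerSocket ss = new ServerSocket(0, 5, InetAddress.getByName("127.0.0.1"));
             Socket child = new Socket("127.0.0.1", ss.getLocalPort());
             Socket parent = ss.accept()) {
            // Safety net so the demo cannot hang forever if no EOF arrives.
            child.setSoTimeout(5000);
            // Simulate the head steward going away.
            parent.close();
            // The maid's blocking read now reports end-of-stream.
            return child.getInputStream().read() == -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("child saw EOF: " + childSeesEofWhenParentCloses());
    }
}
```

This is the same signal the while loop in connect relies on: when read returns -1, the loop ends, the finally block runs, and the process exits.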
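That's all the code. Go ahead and grab it, run it, and get a feel for whether it behaves as described! For demo purposes a great deal of code was removed and some parts are rather inelegant; they still need tidying up according to the Alibaba development guidelines. That isn't the focus of this installment, though; our focus is the idea plus a little hands-on practice.
Well, the idea has now been put into practice, and it's up to you to make this old tree bloom anew. That's it for this share. I hope it clears up your questions and helps you cut through the thorns on your way forward. If you found it helpful, feel free to like it right away and share and repost it like crazy, because I take every one of your shares as encouragement and appreciation.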