  • Erlang process structure -- refc binary

    An Erlang process is a process at the virtual machine level. Each Erlang process consists of a PCB (process control block), a stack, and a private heap.

    This background is covered in various papers, and there are plenty of write-ups online as well, including but not limited to:

    1, http://fengchj.com/?p=2255

    2, http://blog.csdn.net/mycwq/article/details/26613275

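    From the available material we can see that, inside the Erlang virtual machine, every process has its own PCB, its own stack, and its own private heap (note that Erlang does not currently support a shared heap). For that reason Erlang's GC is not "stop the whole world"; it runs separately for each process.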

    Each process’ heap is garbage collected independently. Thus when one scheduler is collecting garbage for a process, other schedulers can keep executing other processes.

    However, not all data lives on the private heap. Some data is stored in a shared area.

    In addition, binaries larger than 64 bytes are stored in a common heap shared by all processes. ETS tables are also stored in a common heap.

    Binaries larger than 64 bytes are what people usually call refc binaries. There are many explanations of them online too, for example:

    1, http://blog.csdn.net/zhongruixian/article/details/9450361

    This is where Erlang's GC can run into trouble, because the Erlang virtual machine reclaims this shared memory by reference counting.
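    A rough way to watch these references from inside a process is erlang:process_info/2 with the binary item. The sketch below is an illustration and not part of the original program: the module name refc_count_sketch is made up, and the layout of the binary item is documented as subject to change without notice, so the code only counts entries instead of inspecting them.

```erlang
-module(refc_count_sketch).
-export([run/0]).

%% Returns {Before, After}: how many off-heap binary entries the calling
%% process reports before and after it creates a 1 MiB binary.
run() ->
    {binary, Before} = erlang:process_info(self(), binary),
    Big = binary:copy(<<1>>, 1024 * 1024),
    {binary, After} = erlang:process_info(self(), binary),
    %% Use Big after the second call so it is still live at that point.
    true = byte_size(Big) =:= 1024 * 1024,
    {length(Before), length(After)}.
```

    Calling refc_count_sketch:run() should report at least one more entry after the allocation, since the large binary lives outside the process heap and the process only holds a reference to it.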

    Now let's look at a complete small program:

    -module(refc_binary_test).

    -export([start/0,
             handle_big_binary/1]).

    %% Note: test:get_bin_address/1 is a helper that is not shown in this post.
    %% erts_debug:get_internal_state/1 is an internal debugging call and
    %% typically has to be enabled first with
    %% erts_debug:set_internal_state(available_internal_state, true).

    %% Receives the small binary and keeps it alive for a long time.
    start() ->
        Me = erlang:self(),
        erlang:spawn(?MODULE, handle_big_binary, [Me]),
        receive
            {ok, C} ->
                io:format("----- get_bin_address C : ~p~n", [test:get_bin_address(C)]),
                io:format("------- handled ~p~n", [erts_debug:get_internal_state({binary_info, C})]),
                timer:sleep(1000000),
                C;
            _ ->
                error
        after 10000 ->
                error
        end.

    %% Builds a 1 MiB binary, slices off its first byte, and sends the slice.
    handle_big_binary(Me) ->
        A = binary:copy(<<1>>, 1024*1024),
        io:format("----- get_bin_address A : ~p~n", [test:get_bin_address(A)]),
        io:format("------- resource ~p~n", [erts_debug:get_internal_state({binary_info, A})]),
        <<B:1/binary, _/binary>> = A,
        io:format("----- get_bin_address B : ~p~n", [test:get_bin_address(B)]),
        erlang:send(Me, {ok, B}).

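    Anyone familiar with ejabberd will recognize this pattern. The process that runs handle_big_binary/1 corresponds to the ejabberd_receiver module in ejabberd, and start/0 can be seen as ejabberd's c2s process. In other words, the process attached directly to the TCP socket parses the socket data and, once parsing is done, hands the result to the process that actually handles it.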

    3> refc_binary_test:start().
    ----- get_bin_address A : "bin: size=1048576, ptr=0x18fc0040"
    ------- resource {refc_binary,1048576,{binary,1048576},0}
    ----- get_bin_address B : "bin: size=1, ptr=0x18fc0040"
    ----- get_bin_address C : "bin: size=1, ptr=0x18fc0040"
    ------- handled {refc_binary,1,{binary,1048576},0}

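    At this point we can see that the variable C still holds on to the '{binary,1048576}' data (1048576 is the binary's orig_size), even though the process that ran handle_big_binary/1 ended right after send/2. Now let's amplify this small problem: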

    4> [proc_lib:spawn(refc_binary_test, start, []) || _ <- lists:seq(1, 1000)].

    You will then see that the beam.smp process is using more than 1 GB of memory :(.
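    The same growth can be observed from inside the VM with erlang:memory(binary) instead of watching beam.smp from the operating system. The following is a minimal sketch under assumptions: the module refc_leak_sketch and its functions are hypothetical, it uses a 128-byte slice instead of the 1-byte slice above, and it takes the process count as an argument so it can be tried with a small N.

```erlang
-module(refc_leak_sketch).
-export([measure/1]).

%% Spawns N holder processes and returns how many bytes of binary memory
%% the VM reports in addition while they are all alive.
measure(N) ->
    Before = erlang:memory(binary),
    Parent = self(),
    Pids = [spawn(fun() -> holder(Parent) end) || _ <- lists:seq(1, N)],
    %% Wait until every holder has received its slice.
    [receive {ready, P} -> ok end || P <- Pids],
    During = erlang:memory(binary),
    [P ! stop || P <- Pids],
    During - Before.

%% Keeps a small slice of a 1 MiB binary built by a short-lived helper.
holder(Parent) ->
    Me = self(),
    spawn(fun() ->
                  Big = binary:copy(<<1>>, 1024 * 1024),
                  <<Slice:128/binary, _/binary>> = Big,
                  Me ! {slice, Slice}
          end),
    receive
        {slice, Slice} ->
            Parent ! {ready, self()},
            %% Slice is used after the wait, so it stays referenced.
            receive stop -> byte_size(Slice) end
    end.
```

    With this sketch, refc_leak_sketch:measure(10) would be expected to return roughly 10 MiB or more, even though each holder only keeps 128 bytes of payload.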

    So, how do we fix it?

    The binary module provides referenced_byte_size/1:

    If a binary references a larger binary (often described as being a sub-binary), it can be useful to get the size of the actual referenced binary. This function can be used in a program to trigger the use of copy/1. By copying a binary, one might dereference the original, possibly large, binary which a smaller binary is a reference to.

    store(Binary, GBSet) ->
      NewBin =
          case binary:referenced_byte_size(Binary) of
              Large when Large > 2 * byte_size(Binary) ->
                 binary:copy(Binary);
              _ ->
                 Binary
          end,
      gb_sets:insert(NewBin,GBSet).
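    To see what referenced_byte_size/1 reports for a slice and for a copy of that slice, a small comparison can help. This is only a sketch: refc_slice_sketch is a made-up module name, and the slice is 128 bytes (an assumption chosen to keep the slice above the 64-byte threshold mentioned earlier).

```erlang
-module(refc_slice_sketch).
-export([sizes/0]).

%% Compares the visible size and the referenced size of a slice of a
%% 1 MiB binary with those of a fresh copy of the same slice.
sizes() ->
    Big = binary:copy(<<1>>, 1024 * 1024),
    <<Slice:128/binary, _/binary>> = Big,
    Copy = binary:copy(Slice),
    #{slice_bytes => byte_size(Slice),
      slice_refs  => binary:referenced_byte_size(Slice),
      copy_bytes  => byte_size(Copy),
      copy_refs   => binary:referenced_byte_size(Copy)}.
```

    The slice should report the full size of the binary it points into, while the copy should report only its own size. That difference is what the Large > 2 * byte_size(Binary) check in store/2 relies on.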

    Next, look at the description of binary:copy/1:

    This function will always create a new binary, even if N = 1. By using copy/1 on a binary referencing a larger binary, one might free up the larger binary for garbage collection.

    Once the small binary is copied with binary:copy/1 before it is sent, running refc_binary_test:start() again no longer shows the problem above.
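    One possible shape of that change is sketched below. It is an assumption about how the fix could be applied, not the author's exact code: the module is renamed to refc_binary_fix for illustration, the debug printing is dropped, and start/0 returns the sizes instead of sleeping.

```erlang
-module(refc_binary_fix).
-export([start/0, handle_big_binary/1]).

%% Returns {VisibleSize, ReferencedSize} of the binary that was received.
start() ->
    Me = self(),
    spawn(?MODULE, handle_big_binary, [Me]),
    receive
        {ok, C} ->
            {byte_size(C), binary:referenced_byte_size(C)}
    after 10000 ->
        error
    end.

handle_big_binary(Me) ->
    A = binary:copy(<<1>>, 1024 * 1024),
    <<B:1/binary, _/binary>> = A,
    %% Copy the slice so the message no longer points into the 1 MiB binary.
    Me ! {ok, binary:copy(B)}.
```

    Because the receiver now holds an independent 1-byte binary, the 1 MiB binary can be released as soon as the sending process exits.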

    Summary:

    1, refc binaries are stored in shared memory;

    2, refc binaries are garbage collected by reference counting;

    3, the memory of a refc binary is allocated contiguously.

  • Original article: https://www.cnblogs.com/--00/p/erlang_process_structure_refc_binary.html