  • bcc - execsnoop performance (unfinished)

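      The bcc programs used so far consist of two parts: one written in Python and one written in C. The Python part loads the BPF program, operates on the BPF program's maps, and processes the data. The C part is compiled by LLVM into BPF bytecode, which is loaded into the kernel and executed once the BPF verifier has confirmed it is safe. Unfamiliar functions that appear in the Python and C code can be looked up in the following two references.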

    Python functions: Python link

    C functions: link

    bcc installation: bcc_install

    bcc program book: book url

    https://www.ebpf.top/post/bpf_learn_path/
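
The Python/C split described above can be sketched roughly as follows. This is a minimal illustration, not part of execsnoop: the template, the `UID_FILTER` placeholder handling, and the helper names `build_program` and `run` are assumptions made for the example. It prints through `bpf_trace_printk` instead of using a map, only to keep it short.

```python
# C part: kept as a string on the Python side, compiled by LLVM at load time.
BPF_TEMPLATE = r"""
#include <uapi/linux/ptrace.h>

int syscall__execve(struct pt_regs *ctx)
{
    u32 uid = bpf_get_current_uid_gid() & 0xffffffff;
    UID_FILTER
    bpf_trace_printk("execve called\n");
    return 0;
}
"""


def build_program(uid=None):
    """Return the C source with the UID_FILTER placeholder filled in.

    Hypothetical helper: plain text substitution done before compilation.
    """
    if uid is None:
        uid_filter = ""
    else:
        uid_filter = "if (uid != %d) { return 0; }" % uid
    return BPF_TEMPLATE.replace("UID_FILTER", uid_filter)


def run(uid=None):
    """Python part: load the program and attach it (needs root and bcc)."""
    # Imported here so build_program() also works on machines without bcc.
    from bcc import BPF

    b = BPF(text=build_program(uid))  # compile, verify, load into the kernel
    b.attach_kprobe(event=b.get_syscall_fnname("execve"),
                    fn_name="syscall__execve")
    b.trace_print()                   # stream bpf_trace_printk output
```

Calling `run()` as root should print one line per `execve()` until interrupted; `run(1000)` would restrict the output to UID 1000.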

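    What is bcc

    • The bcc open-source project: https://github.com/iovisor/bcc
    • The eBPF virtual machine executes assembly-like instructions, which are very hard to write by hand. bcc provides a Python library, also named bcc, that simplifies eBPF application development.
    • bcc also ships a large collection of ready-made eBPF programs that can be used directly; the tool distribution diagram gives a feel for their coverage.

    [Figure: bcc tracing tools distribution diagram]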

    https://github.com/brendangregg/perf-tools/blob/master/execsnoop

    The execsnoop implementation from perf-tools is as follows:

    #!/bin/bash
    #
    # execsnoop - trace process exec() with arguments.
    #             Written using Linux ftrace.
    #
    # This shows the execution of new processes, especially short-lived ones that
    # can be missed by sampling tools such as top(1).
    #
    # USAGE: ./execsnoop [-hrt] [-n name]
    #
    # REQUIREMENTS: FTRACE and KPROBE CONFIG, sched:sched_process_fork tracepoint,
    # and either the sys_execve, stub_execve or do_execve kernel function. You may
    # already have these on recent kernels. And awk.
    #
    # This traces exec() from the fork()->exec() sequence, which means it won't
    # catch new processes that only fork(). With the -r option, it will also catch
    # processes that re-exec. It makes a best-effort attempt to retrieve the program
    # arguments and PPID; if these are unavailable, 0 and "[?]" are printed
    # respectively. There is also a limit to the number of arguments printed (by
    # default, 8), which can be increased using -a.
    #
    # This implementation is designed to work on older kernel versions, and without
    # kernel debuginfo. It works by dynamic tracing an execve kernel function to
    # read the arguments from the %si register. The sys_execve function is tried
    # first, then stub_execve and do_execve. The sched:sched_process_fork
    # tracepoint is used to get the PPID. This program is a workaround that should be
    # improved in the future when other kernel capabilities are made available. If
    # you need a more reliable tool now, then consider other tracing alternatives
    # (eg, SystemTap). This tool is really a proof of concept to see what ftrace can
    # currently do.
    #
    # From perf-tools: https://github.com/brendangregg/perf-tools
    #
    # See the execsnoop(8) man page (in perf-tools) for more info.
    #
    # COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
    #
    #  This program is free software; you can redistribute it and/or
    #  modify it under the terms of the GNU General Public License
    #  as published by the Free Software Foundation; either version 2
    #  of the License, or (at your option) any later version.
    #
    #  This program is distributed in the hope that it will be useful,
    #  but WITHOUT ANY WARRANTY; without even the implied warranty of
    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #  GNU General Public License for more details.
    #
    #  You should have received a copy of the GNU General Public License
    #  along with this program; if not, write to the Free Software Foundation,
    #  Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
    #
    #  (http://www.gnu.org/copyleft/gpl.html)
    #
    # 07-Jul-2014    Brendan Gregg    Created this.
    
    ### default variables
    tracing=/sys/kernel/debug/tracing
    flock=/var/tmp/.ftrace-lock; wroteflock=0
    opt_duration=0; duration=; opt_name=0; name=; opt_time=0; opt_reexec=0
    opt_argc=0; argc=8; max_argc=16; ftext=
    trap ':' INT QUIT TERM PIPE HUP    # sends execution to end tracing section
    
    function usage {
        cat <<-END >&2
        USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
                         -d seconds      # trace duration, and use buffers
                         -a argc         # max args to show (default 8)
                         -r              # include re-execs
                         -t              # include time (seconds)
                         -h              # this usage message
                         name            # process name to match (REs allowed)
          eg,
               execsnoop                 # watch exec()s live (unbuffered)
               execsnoop -d 1            # trace 1 sec (buffered)
               execsnoop grep            # trace process names containing grep
               execsnoop 'udevd$'        # process names ending in "udevd"
    
        See the man page and example file for more info.
    END
        exit
    }
    
    function warn {
        if ! eval "$@"; then
            echo >&2 "WARNING: command failed \"$@\""
        fi
    }
    
    function end {
        # disable tracing
        echo 2>/dev/null
        echo "Ending tracing..." 2>/dev/null
        cd $tracing
        warn "echo 0 > events/kprobes/$kname/enable"
        warn "echo 0 > events/sched/sched_process_fork/enable"
        warn "echo -:$kname >> kprobe_events"
        warn "echo > trace"
        (( wroteflock )) && warn "rm $flock"
    }
    
    function die {
        echo >&2 "$@"
        exit 1
    }
    
    function edie {
        # die with a quiet end()
        echo >&2 "$@"
        exec >/dev/null 2>&1
        end
        exit 1
    }
    
    ### process options
    while getopts a:d:hrt opt
    do
        case $opt in
        a)    opt_argc=1; argc=$OPTARG ;;
        d)    opt_duration=1; duration=$OPTARG ;;
        r)    opt_reexec=1 ;;
        t)    opt_time=1 ;;
        h|?)    usage ;;
        esac
    done
    shift $(( $OPTIND - 1 ))
    if (( $# )); then
        opt_name=1
        name=$1
        shift
    fi
    (( $# )) && usage
    
    ### option logic
    (( opt_pid && opt_name )) && die "ERROR: use either -p or -n."
    (( opt_pid )) && ftext=" issued by PID $pid"
    (( opt_name )) && ftext=" issued by process name \"$name\""
    (( opt_file )) && ftext="$ftext for filenames containing \"$file\""
    (( opt_argc && argc > max_argc )) && die "ERROR: max -a argc is $max_argc."
    if (( opt_duration )); then
        echo "Tracing exec()s$ftext for $duration seconds (buffered)..."
    else
        echo "Tracing exec()s$ftext. Ctrl-C to end."
    fi
    
    ### select awk
    if (( opt_duration )); then
        [[ -x /usr/bin/mawk ]] && awk=mawk || awk=awk
    else
        # workarounds for mawk/gawk fflush behavior
        if [[ -x /usr/bin/gawk ]]; then
            awk=gawk
        elif [[ -x /usr/bin/mawk ]]; then
            awk="mawk -W interactive"
        else
            awk=awk
        fi
    fi
    
    ### check permissions
    cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
        debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
    
    ### ftrace lock
    [[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
    echo $$ > $flock || die "ERROR: unable to write $flock."
    wroteflock=1
    
    ### build probe
    if [[ -x /usr/bin/getconf ]]; then
        bits=$(getconf LONG_BIT)
    else
        bits=64
        [[ $(uname -m) == i* ]] && bits=32
    fi
    (( offset = bits / 8 ))
    function makeprobe {
        func=$1
        kname=execsnoop_$func
        kprobe="p:$kname $func"
        i=0
        while (( i < argc + 1 )); do
            # p:kname do_execve +0(+0(%si)):string +0(+8(%si)):string ...
            kprobe="$kprobe +0(+$(( i * offset ))(%si)):string"
            (( i++ ))
        done
    }
    # try in this order: sys_execve, stub_execve, do_execve
    makeprobe sys_execve
    
    ### setup and begin tracing
    echo nop > current_tracer
    if ! echo $kprobe >> kprobe_events 2>/dev/null; then
        makeprobe stub_execve
        if ! echo $kprobe >> kprobe_events 2>/dev/null; then
            makeprobe do_execve
            if ! echo $kprobe >> kprobe_events 2>/dev/null; then
                edie "ERROR: adding a kprobe for execve. Exiting."
            fi
        fi
    fi
    if ! echo 1 > events/kprobes/$kname/enable; then
        edie "ERROR: enabling kprobe for execve. Exiting."
    fi
    if ! echo 1 > events/sched/sched_process_fork/enable; then
        edie "ERROR: enabling sched:sched_process_fork tracepoint. Exiting."
    fi
    echo "Instrumenting $func"
    (( opt_time )) && printf "%-16s " "TIMEs"
    printf "%6s %6s %s\n" "PID" "PPID" "ARGS"
    
    #
    # Determine output format. It may be one of the following (newest first):
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    # To differentiate between them, the number of header fields is counted,
    # and an offset set, to skip the extra column when needed.
    #
    offset=$($awk 'BEGIN { o = 0; }
        $1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
        $2 ~ /TASK/ { print o; exit }' trace)
    
    ### print trace buffer
    warn "echo > trace"
    ( if (( opt_duration )); then
        # wait then dump buffer
        sleep $duration
        cat -v trace
    else
        # print buffer live
        cat -v trace_pipe
    fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \
        -v opt_duration=$opt_duration -v opt_time=$opt_time -v kname=$kname \
        -v opt_reexec=$opt_reexec '
        # common fields
        $1 != "#" {
            # task name can contain dashes
            comm = pid = $1
            sub(/-[0-9][0-9]*/, "", comm)
            sub(/.*-/, "", pid)
        }
    
        $1 != "#" && $(4+o) ~ /sched_process_fork/ {
            cpid=$0
            sub(/.* child_pid=/, "", cpid)
            sub(/ .*/, "", cpid)
            getppid[cpid] = pid
            delete seen[pid]
        }
    
        $1 != "#" && $(4+o) ~ kname {
            if (seen[pid])
                next
            if (opt_name && comm !~ name)
                next
    
            #
            # examples:
            # ... arg1="/bin/echo" arg2="1" arg3="2" arg4="3" ...
            # ... arg1="sleep" arg2="2" arg3=(fault) arg4="" ...
            # ... arg1="" arg2=(fault) arg3="" arg4="" ...
            # the last example is uncommon, and may be a race.
            #
            if ($0 ~ /arg1=""/) {
                args = comm " [?]"
            } else {
                args=$0
                sub(/ arg[0-9]*=\(fault\).*/, "", args)
                sub(/.*arg1="/, "", args)
                gsub(/" arg[0-9]*="/, " ", args)
                sub(/"$/, "", args)
                if ($0 !~ /\(fault\)/)
                    args = args " [...]"
            }
    
            if (opt_time) {
                time = $(3+o); sub(":", "", time)
                printf "%-16s ", time
            }
            printf "%6s %6d %s\n", pid, getppid[pid], args
            if (!opt_duration)
                fflush()
            if (!opt_reexec) {
                seen[pid] = 1
                delete getppid[pid]
            }
        }
    
        $0 ~ /LOST.*EVENT[S]/ { print "WARNING: " $0 > "/dev/stderr" }
    '
    
    ### end tracing
    end
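
Two steps of the script above are easy to lose in the shell and awk syntax: `makeprobe` builds a kprobe definition that dereferences successive argv slots through `%si`, and the awk stage turns the resulting `arg1=... arg2=...` fields into a command line. The sketch below mirrors both in Python purely for illustration. The function names `make_probe` and `parse_args` are hypothetical, and it neither writes to `kprobe_events` nor reads a real trace buffer.

```python
import re


def make_probe(func, argc=8, bits=64):
    """Build the kprobe_events line the way makeprobe does."""
    offset = bits // 8                      # pointer size in bytes
    kname = "execsnoop_" + func
    parts = ["p:%s %s" % (kname, func)]
    # The script loops while i < argc + 1, so argc + 1 slots are fetched.
    for i in range(argc + 1):
        parts.append("+0(+%d(%%si)):string" % (i * offset))
    return " ".join(parts)


def parse_args(line, comm):
    """Mirror the awk logic that extracts the argument list from a trace line."""
    if 'arg1=""' in line:
        return comm + " [?]"                # arguments could not be read
    # Drop everything from the first faulted (past-the-end) argument onwards.
    args = re.sub(r' arg[0-9]*=\(fault\).*', "", line, count=1)
    # Drop the trace prefix up to and including the opening quote of arg1.
    args = re.sub(r'.*arg1="', "", args, count=1)
    # Join the remaining quoted fields with single spaces.
    args = re.sub(r'" arg[0-9]*="', " ", args)
    args = re.sub(r'"$', "", args, count=1)
    # No fault seen means the list may have been truncated at argc.
    if "(fault)" not in line:
        args += " [...]"
    return args
```

For example, `make_probe("do_execve", argc=1)` yields a definition with two string fetches, at offsets 0 and 8 from the pointer held in `%si`.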

    The Python version, which depends on bcc and BPF, is as follows:

    #!/usr/bin/python
    # @lint-avoid-python-3-compatibility-imports
    #
    # execsnoop Trace new processes via exec() syscalls.
    #           For Linux, uses BCC, eBPF. Embedded C.
    #
    # USAGE: execsnoop [-h] [-T] [-t] [-x] [-q] [-n NAME] [-l LINE]
    #                  [--max-args MAX_ARGS]
    #
    # This currently will print up to a maximum of 19 arguments, plus the process
    # name, so 20 fields in total (MAXARG).
    #
    # This won't catch all new processes: an application may fork() but not exec().
    #
    # Copyright 2016 Netflix, Inc.
    # Licensed under the Apache License, Version 2.0 (the "License")
    #
    # 07-Feb-2016   Brendan Gregg   Created this.
    
    from __future__ import print_function
    from bcc import BPF
    from bcc.utils import ArgString, printb
    import bcc.utils as utils
    import argparse
    import re
    import time
    import pwd
    from collections import defaultdict
    from time import strftime
    
    
    def parse_uid(user):
        try:
            result = int(user)
        except ValueError:
            try:
                user_info = pwd.getpwnam(user)
            except KeyError:
                raise argparse.ArgumentTypeError(
                    "{0!r} is not valid UID or user entry".format(user))
            else:
                return user_info.pw_uid
        else:
            # Maybe validate if UID < 0 ?
            return result
    
    
    # arguments
    examples = """examples:
        ./execsnoop           # trace all exec() syscalls
        ./execsnoop -x        # include failed exec()s
        ./execsnoop -T        # include time (HH:MM:SS)
        ./execsnoop -U        # include UID
        ./execsnoop -u 1000   # only trace UID 1000
        ./execsnoop -u user   # get user UID and trace only them
        ./execsnoop -t        # include timestamps
        ./execsnoop -q        # add "quotemarks" around arguments
        ./execsnoop -n main   # only print command lines containing "main"
        ./execsnoop -l tpkg   # only print command where arguments contains "tpkg"
        ./execsnoop --cgroupmap ./mappath  # only trace cgroups in this BPF map
    """
    parser = argparse.ArgumentParser(
        description="Trace exec() syscalls",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=examples)
    parser.add_argument("-T", "--time", action="store_true",
        help="include time column on output (HH:MM:SS)")
    parser.add_argument("-t", "--timestamp", action="store_true",
        help="include timestamp on output")
    parser.add_argument("-x", "--fails", action="store_true",
        help="include failed exec()s")
    parser.add_argument("--cgroupmap",
        help="trace cgroups in this BPF map only")
    parser.add_argument("-u", "--uid", type=parse_uid, metavar='USER',
        help="trace this UID only")
    parser.add_argument("-q", "--quote", action="store_true",
        help="Add quotemarks (\") around arguments."
        )
    parser.add_argument("-n", "--name",
        type=ArgString,
        help="only print commands matching this name (regex), any arg")
    parser.add_argument("-l", "--line",
        type=ArgString,
        help="only print commands where arg contains this line (regex)")
    parser.add_argument("-U", "--print-uid", action="store_true",
        help="print UID column")
    parser.add_argument("--max-args", default="20",
        help="maximum number of arguments parsed and displayed, defaults to 20")
    parser.add_argument("--ebpf", action="store_true",
        help=argparse.SUPPRESS)
    args = parser.parse_args()
    
    # define BPF program
    bpf_text = """
    #include <uapi/linux/ptrace.h>
    #include <linux/sched.h>
    #include <linux/fs.h>
    
    #define ARGSIZE  128
    
    enum event_type {
        EVENT_ARG,
        EVENT_RET,
    };
    
    struct data_t {
        u32 pid;  // PID as in the userspace term (i.e. task->tgid in kernel)
        u32 ppid; // Parent PID as in the userspace term (i.e task->real_parent->tgid in kernel)
        u32 uid;
        char comm[TASK_COMM_LEN];
        enum event_type type;
        char argv[ARGSIZE];
        int retval;
    };
    
    #if CGROUPSET
    BPF_TABLE_PINNED("hash", u64, u64, cgroupset, 1024, "CGROUPPATH");
    #endif
    BPF_PERF_OUTPUT(events);
    
    static int __submit_arg(struct pt_regs *ctx, void *ptr, struct data_t *data)
    {
        bpf_probe_read(data->argv, sizeof(data->argv), ptr);
        events.perf_submit(ctx, data, sizeof(struct data_t));
        return 1;
    }
    
    static int submit_arg(struct pt_regs *ctx, void *ptr, struct data_t *data)
    {
        const char *argp = NULL;
        bpf_probe_read(&argp, sizeof(argp), ptr);
        if (argp) {
            return __submit_arg(ctx, (void *)(argp), data);
        }
        return 0;
    }
    
    int syscall__execve(struct pt_regs *ctx,
        const char __user *filename,
        const char __user *const __user *__argv,
        const char __user *const __user *__envp)
    {
    
        u32 uid = bpf_get_current_uid_gid() & 0xffffffff;
    
        UID_FILTER
    
    #if CGROUPSET
        u64 cgroupid = bpf_get_current_cgroup_id();
        if (cgroupset.lookup(&cgroupid) == NULL) {
          return 0;
        }
    #endif
    
        // create data here and pass to submit_arg to save stack space (#555)
        struct data_t data = {};
        struct task_struct *task;
    
        data.pid = bpf_get_current_pid_tgid() >> 32;
    
        task = (struct task_struct *)bpf_get_current_task();
        // Some kernels, like Ubuntu 4.13.0-generic, return 0
        // as the real_parent->tgid.
        // We use the get_ppid function as a fallback in those cases. (#1883)
        data.ppid = task->real_parent->tgid;
    
        bpf_get_current_comm(&data.comm, sizeof(data.comm));
        data.type = EVENT_ARG;
    
        __submit_arg(ctx, (void *)filename, &data);
    
        // skip first arg, as we submitted filename
        #pragma unroll
        for (int i = 1; i < MAXARG; i++) {
            if (submit_arg(ctx, (void *)&__argv[i], &data) == 0)
                 goto out;
        }
    
        // handle truncated argument list
        char ellipsis[] = "...";
        __submit_arg(ctx, (void *)ellipsis, &data);
    out:
        return 0;
    }
    
    int do_ret_sys_execve(struct pt_regs *ctx)
    {
    #if CGROUPSET
        u64 cgroupid = bpf_get_current_cgroup_id();
        if (cgroupset.lookup(&cgroupid) == NULL) {
          return 0;
        }
    #endif
    
        struct data_t data = {};
        struct task_struct *task;
    
        u32 uid = bpf_get_current_uid_gid() & 0xffffffff;
        UID_FILTER
    
        data.pid = bpf_get_current_pid_tgid() >> 32;
        data.uid = uid;
    
        task = (struct task_struct *)bpf_get_current_task();
        // Some kernels, like Ubuntu 4.13.0-generic, return 0
        // as the real_parent->tgid.
        // We use the get_ppid function as a fallback in those cases. (#1883)
        data.ppid = task->real_parent->tgid;
    
        bpf_get_current_comm(&data.comm, sizeof(data.comm));
        data.type = EVENT_RET;
        data.retval = PT_REGS_RC(ctx);
        events.perf_submit(ctx, &data, sizeof(data));
    
        return 0;
    }
    """
    
    bpf_text = bpf_text.replace("MAXARG", args.max_args)
    
    if args.uid is not None:  # also apply the filter for UID 0 (root)
        bpf_text = bpf_text.replace('UID_FILTER',
            'if (uid != %s) { return 0; }' % args.uid)
    else:
        bpf_text = bpf_text.replace('UID_FILTER', '')
    if args.cgroupmap:
        bpf_text = bpf_text.replace('CGROUPSET', '1')
        bpf_text = bpf_text.replace('CGROUPPATH', args.cgroupmap)
    else:
        bpf_text = bpf_text.replace('CGROUPSET', '0')
    if args.ebpf:
        print(bpf_text)
        exit()
    
    # initialize BPF
    b = BPF(text=bpf_text)
    execve_fnname = b.get_syscall_fnname("execve")
    b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
    b.attach_kretprobe(event=execve_fnname, fn_name="do_ret_sys_execve")
    
    # header
    if args.time:
        print("%-9s" % ("TIME"), end="")
    if args.timestamp:
        print("%-8s" % ("TIME(s)"), end="")
    if args.print_uid:
        print("%-6s" % ("UID"), end="")
    print("%-16s %-6s %-6s %3s %s" % ("PCOMM", "PID", "PPID", "RET", "ARGS"))
    
    class EventType(object):
        EVENT_ARG = 0
        EVENT_RET = 1
    
    start_ts = time.time()
    argv = defaultdict(list)
    
    # This is best-effort PPID matching. Short-lived processes may exit
    # before we get a chance to read the PPID.
    # This is a fallback for when fetching the PPID from task->real_parent->tgid
    # returns 0, which happens in some kernel versions.
    def get_ppid(pid):
        try:
            with open("/proc/%d/status" % pid) as status:
                for line in status:
                    if line.startswith("PPid:"):
                        return int(line.split()[1])
        except IOError:
            pass
        return 0
    
    # process event
    def print_event(cpu, data, size):
        event = b["events"].event(data)
        skip = False
    
        if event.type == EventType.EVENT_ARG:
            argv[event.pid].append(event.argv)
        elif event.type == EventType.EVENT_RET:
            if event.retval != 0 and not args.fails:
                skip = True
            if args.name and not re.search(bytes(args.name), event.comm):
                skip = True
            if args.line and not re.search(bytes(args.line),
                                           b' '.join(argv[event.pid])):
                skip = True
            if args.quote:
                argv[event.pid] = [
                    b"\"" + arg.replace(b"\"", b"\\\"") + b"\""
                    for arg in argv[event.pid]
                ]
    
            if not skip:
                if args.time:
                    printb(b"%-9s" % strftime("%H:%M:%S").encode('ascii'), nl="")
                if args.timestamp:
                    printb(b"%-8.3f" % (time.time() - start_ts), nl="")
                if args.print_uid:
                    printb(b"%-6d" % event.uid, nl="")
                ppid = event.ppid if event.ppid > 0 else get_ppid(event.pid)
                ppid = b"%d" % ppid if ppid > 0 else b"?"
                argv_text = b' '.join(argv[event.pid]).replace(b'\n', b'\\n')
                printb(b"%-16s %-6d %-6s %3d %s" % (event.comm, event.pid,
                       ppid, event.retval, argv_text))
            try:
                del(argv[event.pid])
            except Exception:
                pass
    
    
    # loop with callback to print_event
    b["events"].open_perf_buffer(print_event)
    while 1:
        try:
            b.perf_buffer_poll()
        except KeyboardInterrupt:
            exit()
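
The event flow in `print_event` above (one `EVENT_ARG` per argument, then a single `EVENT_RET` that flushes the line) can be tried out without a kernel. The sketch below is a simplified stand-in: `Event` and `make_collector` are hypothetical names, the events are built by hand rather than read from the perf buffer, and the name, line, and quote options are left out.

```python
from collections import defaultdict, namedtuple

EVENT_ARG, EVENT_RET = 0, 1

# Stand-in for the struct data_t fields that the callback uses.
Event = namedtuple("Event", "type pid comm argv retval")


def make_collector(include_failed=False):
    """Return (callback, lines); lines grows as execve() returns are seen."""
    pending = defaultdict(list)   # pid -> argument chunks received so far
    lines = []

    def on_event(event):
        if event.type == EVENT_ARG:
            # Arguments arrive one event at a time; buffer them per PID.
            pending[event.pid].append(event.argv)
        elif event.type == EVENT_RET:
            # The return event closes the record; always drop the buffer.
            argv = pending.pop(event.pid, [])
            if event.retval == 0 or include_failed:
                lines.append((event.comm, event.pid, event.retval,
                              b" ".join(argv)))

    return on_event, lines
```

Feeding it two `EVENT_ARG` events for one PID followed by an `EVENT_RET` with `retval` 0 produces a single record whose argument field is the two chunks joined by a space; a failed `execve()` is dropped unless `include_failed` is set, which corresponds to the `-x` option.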

    Both implementations currently rely on kprobes.

  • Original article: https://www.cnblogs.com/codestack/p/14730076.html