zoukankan      html  css  js  c++  java
  • Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

    转自:https://www.cnblogs.com/arnoldlu/p/6253665.html

    测试环境:AOSP 7.1.1+Kernel 4.4.17 HW:HiKey

                 Ubuntu 14.04+Kernel 4.4.0-31

    联系方式:arnoldlu@qq.com

    1. Linux内核suspend状态

    Linux内核支持多种类型的睡眠状态,通过设置不同的模块进入低功耗模式来达到省电功能。

    目前存在四种模式:suspend to idle、power-on standby(Standby)、suspend to ram(STR)和sudpend to disk(Hibernate),分别对应ACPI状态的S0、S1、S3和S4。

    State in Linux Label state ACPI state 注释
    #define PM_SUSPEND_ON        ((__force suspend_state_t) 0)        一切正常
    #define PM_SUSPEND_FREEZE    ((__force suspend_state_t) 1) freeze Suspend-to-Idle S0 冻结进程+挂起设备+CPU空闲 
    #define PM_SUSPEND_STANDBY    ((__force suspend_state_t) 2) standby Standby/Power-on Suspend S1 冻结进程+挂起设备+关闭nonbootCPU 
    #define PM_SUSPEND_MEM        ((__force suspend_state_t) 3) mem Suspend-to-RAM S3 仅保留RAM自刷新 
    #define PM_SUSPEND_MAX        ((__force suspend_state_t) 4) disk Suspend-to-disk S4

    关闭所有设备包括RAM,也被称为Hibernate

    从freeze-->standby-->mem睡眠程度越来越深,唤醒花费的时间也越来越多。

    Suspend-To-Idle

    此状态包括frozen processes+suspended devices+idle processors,具有轻量化的特点;

    并且相对于相对于Idle状态能节省更多的功耗,因为此时的用户空间被冻结且I/O设备进入了低功耗状态。

    相对于Suspend-To-RAM它具有低延时的优势。

    Standby/Power-On Suspend

    此状态包括frozen processes+suspended devices+offline nonboot CPUs+suspend low-level system,对CPU的处理更近一步。

    所以相对于Suspend-To-Idle节省了更多的功耗,但是由于需要恢复CPU和一些底层功能也花费了更多的时间。

    Suspend-to-RAM

    此状态使所有的设备进入低功耗状态,仅保留RAM自刷新。

    所有的设备和系统状态都保存在RAM中,所有外设被挂起。

    (在HiKey的实际测试中,boot CPU是没有关闭的!实际上这里也没有standby,mem和standby基本上没有区别。)

    Suspend-to-disk

    此状态是最省功耗的模式。

    相对Suspend-to-RAMRAM能节省更多功耗的原因是数据会被写入磁盘中,RAM也可以被关闭。

    但是这也导致了,更多的恢复延时,在resume的时候读回到RAM,然后在进行系统和设备状态恢复工作。

    但是在一般的嵌入式设备上,此种状态不支持。

    下面用STR表示Suspend to RAM,STI表示Suspend to Idle。

    详情请参考:http://www.linaro.org/blog/suspend-to-idle/

    2. Suspend状态,以及STR 和STI区别

    写入/sys/power/state不同字符串,可以让系统进入不同睡眠状态。

    针对state sysfs节点的写入,最终会进入到state_store这个函数,将字符串转换成上表中不同状态。

    复制代码
    state_store(kernel/power/main.c)
        -->pm_suspend (kernel/power/suspend.c)-------------处理除freeze、standby、mem三种类型suspend
            -->enter_state---------------------------------在进入睡眠之前,做一些准备工作
                -->suspend_devices_and_enter
                    -->suspend_enter-----------------------这里才是freeze与standby/mem区别所在。
        -->hibernate---------------------------------------进入suspend to disk流程
    复制代码

    STR和STI的最主要区别就是下面一段代码:

    复制代码
    static int suspend_enter(suspend_state_t state, bool *wakeup)
    {
    
    …
    
        /*
         * PM_SUSPEND_FREEZE equals
         * frozen processes + suspended devices + idle processors.
         * Thus we should invoke freeze_enter() soon after
         * all the devices are suspended.
         */
    //====================================FREEZE=============================================================== if (state == PM_SUSPEND_FREEZE) {------------------------------------如果要进入freeze状态,就会执行此段代码。 trace_suspend_resume(TPS("machine_suspend"), state, true); freeze_enter(); trace_suspend_resume(TPS("machine_suspend"), state, false); goto Platform_wake;----------------------------------------------在执行结束跳转到Platform_wake,中间一段绿色代码将会被跳过。所以说freeze和standby、mem相比,多了freeze_enter,少了对non-boot CPUs、arch、syscore的操作。 } //=====================================MEM=============================================================== error = disable_nonboot_cpus(); if (error || suspend_test(TEST_CPUS)) { log_suspend_abort_reason("Disabling non-boot cpus failed"); goto Enable_cpus; } arch_suspend_disable_irqs(); BUG_ON(!irqs_disabled()); error = syscore_suspend(); if (!error) { *wakeup = pm_wakeup_pending(); if (!(suspend_test(TEST_CORE) || *wakeup)) { trace_suspend_resume(TPS("machine_suspend"), state, true); error = suspend_ops->enter(state); trace_suspend_resume(TPS("machine_suspend"), state, false); events_check_enabled = false; } else if (*wakeup) { pm_get_active_wakeup_sources(suspend_abort, MAX_SUSPEND_ABORT_LEN); log_suspend_abort_reason(suspend_abort); error = -EBUSY; } syscore_resume(); } arch_suspend_enable_irqs(); BUG_ON(irqs_disabled()); Enable_cpus: enable_nonboot_cpus(); Platform_wake: platform_resume_noirq(state); dpm_resume_noirq(PMSG_RESUME); … }
    复制代码

    3 suspend/resume流程梳理

    下面分析一下suspend/resume每个细分阶段。

    整个suspend可以分为若干阶段,每个阶段函数—>关键节点Trace—>analyze_suspend.py解析Trace—>根据Trace时间画出Timeline图表

    这样就可以分析出总的时间差异,每个阶段差异,甚至一个设备suspend/resume、一个子系统suspend/resume的时间差异。

    analyze_suspend.py 基于默认基于ftrace进行分析(在指定dmesg的时候,会发现缺失了很多log信息,无法生成timeline类型的html文件),将suspend/resume分为若干阶段。

    下面简要介绍一下各个阶段,然后基于此进行代码分析。

    在kernel版本大于等与3.15之后,解析需要的所有log信息都可以从ftrace中获取。之前的内核版本还需要借助于dmesg。

    由于使用的kernel版本是4.4.17,sysvals.usetraceeventsonly被置位,所以只会parseTraceLog()。

    下表中的各个阶段通过解析suspend_resume: XXXXXXX类型的ftrace来获取。

    各子模块、子系统的解析通过device_pm_callback_start和device_pm_callback_end来截取时间段,以及这时间段内的callgraph。

    Phase名称 ftrace关键词    
    suspend_prepare dpm_prepare    
    suspend dpm_suspend    
    suspend_late dpm_suspend_late    
    suspend_noirq dpm_suspend_noirq    
    suspend_machine machine_suspend start    
    resume_machine machine_suspend end    
    resume_noirq dpm_resume_noirq    
    resume_early dpm_resume_early    
    resume dpm_resume    
    resume_complete dpm_complete    

    下面是一组suspend/resume执行ftrace log,我们将据此进行各阶段代码分析,包括suspend_enter、suspend_prepare、suspend、suspend_late、suspend_noirq、suspend_machine、resume_machine、resume_noirq、resume_early、resume、resume_complete。

    从这里也可以看出freeze和mem/standby除了machine部分不同之外,还少了CPU开关和syscore suspend/resume操作。

    suspend_resume: suspend_enter[1] begin
    suspend_resume: sync_filesystems[0] begin
    suspend_resume: sync_filesystems[0] end
    suspend_resume: freeze_processes[0] begin
    suspend_resume: freeze_processes[0] end
    suspend_resume: suspend_enter[1] end
    suspend_resume: dpm_prepare[2] begin
    suspend_resume: dpm_prepare[2] end
    suspend_resume: dpm_suspend[2] begin
    suspend_resume: dpm_suspend[2] end
    suspend_resume: dpm_suspend_late[2] begin
    suspend_resume: dpm_suspend_late[2] end
    suspend_resume: dpm_suspend_noirq[2] begin
    suspend_resume: dpm_suspend_noirq[2] end
    No CPU_OFF…
    No syscore_suspend…
    suspend_resume: machine_suspend[1] begin
    suspend_resume: machine_suspend[1] end
    No suscore_resume…
    No CPU_ON…
    suspend_resume: dpm_resume_noirq[16] begin
    suspend_resume: dpm_resume_noirq[16] end
    suspend_resume: dpm_resume_early[16] begin
    suspend_resume: dpm_resume_early[16] end
    suspend_resume: dpm_resume[16] begin
    suspend_resume: dpm_resume[16] end
    suspend_resume: dpm_complete[16] begin
    suspend_resume: dpm_complete[16] end
    suspend_resume: resume_console[1] begin
    suspend_resume: resume_console[1] end
    suspend_resume: thaw_processes[0] begin
    suspend_resume: thaw_processes[0] end

    suspend_resume: suspend_enter[3] begin
    suspend_resume: sync_filesystems[0] begin
    suspend_resume: sync_filesystems[0] end
    suspend_resume: freeze_processes[0] begin
    suspend_resume: freeze_processes[0] end
    suspend_resume: suspend_enter[3] end
    suspend_resume: dpm_prepare[2] begin
    suspend_resume: dpm_prepare[2] end
    suspend_resume: dpm_suspend[2] begin
    suspend_resume: dpm_suspend[2] end
    suspend_resume: dpm_suspend_late[2] begin
    suspend_resume: dpm_suspend_late[2] end
    suspend_resume: dpm_suspend_noirq[2] begin
    suspend_resume: dpm_suspend_noirq[2] end
    suspend_resume: CPU_OFF[1-7] begin/end
    suspend_resume: syscore_suspend[0] begin/end
    suspend_resume: machine_suspend[3] begin
    suspend_resume: machine_suspend[3] end
    suspend_resume: syscore_resume[0] begin/end
    suspend_resume: CPU_ON[1-7] begin/end
    suspend_resume: dpm_resume_noirq[16] begin
    suspend_resume: dpm_resume_noirq[16] end
    suspend_resume: dpm_resume_early[16] begin
    suspend_resume: dpm_resume_early[16] end
    suspend_resume: dpm_resume[16] begin
    suspend_resume: dpm_resume[16] end
    suspend_resume: dpm_complete[16] begin
    suspend_resume: dpm_complete[16] end
    suspend_resume: resume_console[3] begin
    suspend_resume: resume_console[3] end
    suspend_resume: thaw_processes[0] begin
    suspend_resume: thaw_processes[0] end

    在介绍相关代码之前,先介绍一下HiKey使用的platform_suspend_ops:

    static const struct platform_suspend_ops psci_suspend_ops = {
        .valid          = suspend_valid_only_mem,  仅支持mem类型的suspend
        .enter          = psci_system_suspend_enter,  睡眠的CPU底层支持
    };

    freeze的platform_freeze_ops如下:

    static const struct platform_freeze_ops acpi_freeze_ops = {
        .begin = acpi_freeze_begin,
        .prepare = acpi_freeze_prepare,
        .restore = acpi_freeze_restore,
        .end = acpi_freeze_end,
    };

    3.1 suspend_enter

    enter_state作为suspend/resume的入口点,完成了绝大部分工作。首先确保系统没有正在进入睡眠状态;然后为suspend做一些准备,使系统进入睡眠并在唤醒后进行必要清理恢复工作。

    下面分析一下suspend之前的准备工作,即suspend_enter阶段:

    复制代码
    static int enter_state(suspend_state_t state)
    {
        int error;
    
        trace_suspend_resume(TPS("suspend_enter"), state, true);
        if (state == PM_SUSPEND_FREEZE) {--------------------------------------是否是freeze类型suspend
    #ifdef CONFIG_PM_DEBUG
            if (pm_test_level != TEST_NONE && pm_test_level <= TEST_CPUS) {
                pr_warning("PM: Unsupported test mode for suspend to idle,"
                       "please choose none/freezer/devices/platform.
    ");
                return -EAGAIN;
            }
    #endif
        } else if (!valid_state(state)) {-------------------------------------目前只支持mem类型suspend
            return -EINVAL;
        }
        if (!mutex_trylock(&pm_mutex))
            return -EBUSY;
    
        if (state == PM_SUSPEND_FREEZE)
            freeze_begin();--------------------------------------------------初始化suspend_freeze_state为FREEZE_STATE_NONE
    
    #ifndef CONFIG_SUSPEND_SKIP_SYNC
        trace_suspend_resume(TPS("sync_filesystems"), 0, true);
        printk(KERN_INFO "PM: Syncing filesystems ... ");
        sys_sync();----------------------------------------------------------sync文件系统缓存文件,确保数据sync到硬盘。
        printk("done.
    ");
        trace_suspend_resume(TPS("sync_filesystems"), 0, false);
    #endif
    
        pr_debug("PM: Preparing system for sleep (%s)
    ", pm_states[state]);
        pm_suspend_clear_flags();
        error = suspend_prepare(state);--------------------------------------注意这里面的suspend_prepare和下面的suspend_prepare阶段容易搞混。
        if (error)
            goto Unlock;
    
        if (suspend_test(TEST_FREEZER))
            goto Finish;
    
        trace_suspend_resume(TPS("suspend_enter"), state, false);
        pr_debug("PM: Suspending system (%s)
    ", pm_states[state]);
        pm_restrict_gfp_mask();
        error = suspend_devices_and_enter(state);
        pm_restore_gfp_mask();
    
    Finish:
        pr_debug("PM: Finishing wakeup.
    ");
        suspend_finish();---------------------------------------------------解冻,重启进程;发送PM_POST_SUSPEND通知;释放之前分配的console。
    Unlock:
        mutex_unlock(&pm_mutex);
        return error;
    }
    复制代码

    接着分析一下suspend_prepare函数:

    static int suspend_prepare(suspend_state_t state)
    {
        int error;

        if (!sleep_state_supported(state))  验证suspend状态
            return -EPERM;

        pm_prepare_console();  分配一个suspend console

        error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);  发送PM_SUSPEND_PREPARE通知消息
        if (error)
            goto Finish;

        trace_suspend_resume(TPS("freeze_processes"), 0, true);
        error = suspend_freeze_processes();  冻结进程
        trace_suspend_resume(TPS("freeze_processes"), 0, false);
        if (!error)
            return 0;

        suspend_stats.failed_freeze++;
        dpm_save_failed_step(SUSPEND_FREEZE);
    Finish:
        pm_notifier_call_chain(PM_POST_SUSPEND);
        pm_restore_console();
        return error;
    }

    suspend_freeze_process先处理用户空间进程,然后处理内核进程:

    static inline int suspend_freeze_processes(void)
    {
        int error;

        error = freeze_processes();  触发用户空间进程进入freeze状态。当前进程不会被冻结。因为冻结失败的进程会自动被解冻,所以不需要进行错误处理。
        /*
         * freeze_processes() automatically thaws every task if freezing
         * fails. So we need not do anything extra upon error.
         */
        if (error)
            return error;

        error = freeze_kernel_threads();  冻结内核线程
        /*
         * freeze_kernel_threads() thaws only kernel threads upon freezing
         * failure. So we have to thaw the userspace tasks ourselves.
         */
        if (error)  由于freeze_kernel_threads冻结失败,只会解冻内核线程。所以还需要对用户空间进程进行解冻。
            thaw_processes();

        return error;
    }

    下面的阶段都在suspend_devices_and_enter中,可以看出这是一个对称的流程,每一阶段的suspend,都有对应的resume。

    int suspend_devices_and_enter(suspend_state_t state)
    {
        int error;
        bool wakeup = false;

        if (!sleep_state_supported(state))
            return -ENOSYS;

        error = platform_suspend_begin(state);
        if (error)
            goto Close;

        suspend_console();  关闭console子系统,暂停printk打印
        suspend_test_start();
        error = dpm_suspend_start(PMSG_SUSPEND); suspend_prepare(dpm_prepare)、suspend(dpm_suspend)两阶段
        if (error) {
            pr_err("PM: Some devices failed to suspend, or early wake event detected ");
            log_suspend_abort_reason("Some devices failed to suspend, or early wake event detected");
            goto Recover_platform;
        }
        suspend_test_finish("suspend devices");
        if (suspend_test(TEST_DEVICES))
            goto Recover_platform;

        do {
            error = suspend_enter(state, &wakeup);  suspend_late(dpm_suspend_late)、suspend_noirq(dpm_suspend_noirq)、suspend_machine、resume_machine、resume_noirq(dpm_resume_noirq)、resume_early(dpm_resume_early)
        } while (!error && !wakeup && platform_suspend_again(state));

    Resume_devices:
        suspend_test_start();
        dpm_resume_end(PMSG_RESUME);  resume(dpm_resume)、resume_complete(dpm_complete)
        suspend_test_finish("resume devices");
        trace_suspend_resume(TPS("resume_console"), state, true);
        resume_console();  打开console子系统,恢复printk打印。
        trace_suspend_resume(TPS("resume_console"), state, false);

    Close:
        platform_resume_end(state);
        return error;

    Recover_platform:
        platform_recover(state);
        goto Resume_devices;
    }

    还有必要过一下suspend_enter:

    static int suspend_enter(suspend_state_t state, bool *wakeup)
    {
        char suspend_abort[MAX_SUSPEND_ABORT_LEN];
        int error, last_dev;

        error = platform_suspend_prepare(state); 因为suspend_ops的prepare为空,所以返回0
        if (error)
            goto Platform_finish;

        error = dpm_suspend_late(PMSG_SUSPEND);  suspend_late
        if (error) {
            last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
            last_dev %= REC_FAILED_NUM;
            printk(KERN_ERR "PM: late suspend of devices failed ");
            log_suspend_abort_reason("%s device failed to power down",
                suspend_stats.failed_devs[last_dev]);
            goto Platform_finish;
        }
        error = platform_suspend_prepare_late(state);  执行freeze_ops->prepare()
        if (error)
            goto Devices_early_resume;

        error = dpm_suspend_noirq(PMSG_SUSPEND);  suspend_noirq
        if (error) {
            last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
            last_dev %= REC_FAILED_NUM;
            printk(KERN_ERR "PM: noirq suspend of devices failed ");
            log_suspend_abort_reason("noirq suspend of %s device failed",
                suspend_stats.failed_devs[last_dev]);
            goto Platform_early_resume;
        }
        error = platform_suspend_prepare_noirq(state);
        if (error)
            goto Platform_wake;

        if (suspend_test(TEST_PLATFORM))
            goto Platform_wake;

        /*
         * PM_SUSPEND_FREEZE equals
         * frozen processes + suspended devices + idle processors.
         * Thus we should invoke freeze_enter() soon after
         * all the devices are suspended.
         */
        if (state == PM_SUSPEND_FREEZE) {  这里是freeze和mem/standy差别所在
            trace_suspend_resume(TPS("machine_suspend"), state, true);
            freeze_enter();  
            trace_suspend_resume(TPS("machine_suspend"), state, false);
            goto Platform_wake;
        }

        error = disable_nonboot_cpus(); 关闭所有boot-CPU之外的CPU
        if (error || suspend_test(TEST_CPUS)) {
            log_suspend_abort_reason("Disabling non-boot cpus failed");
            goto Enable_cpus;
        }

        arch_suspend_disable_irqs();
        BUG_ON(!irqs_disabled());

        error = syscore_suspend();  执行syscore_ops_list上所有syscore_ops的suspend回调函数
        if (!error) {
            *wakeup = pm_wakeup_pending();  检查是否需要终止suspend流程?
            if (!(suspend_test(TEST_CORE) || *wakeup)) {
                trace_suspend_resume(TPS("machine_suspend"),
                    state, true);
                error = suspend_ops->enter(state);  调用psci_suspend_ops的enter回调函数,关闭machine
                trace_suspend_resume(TPS("machine_suspend"),
                    state, false);  !!!!!!!!!!!!!!!!这里即为唤醒之后的执行路径了!!!!!!!!!!!!!!!!
                events_check_enabled = false;
            } else if (*wakeup) {
                pm_get_active_wakeup_sources(suspend_abort,
                    MAX_SUSPEND_ABORT_LEN);
                log_suspend_abort_reason(suspend_abort);
                error = -EBUSY;
            }
            syscore_resume();  执行所有syscore_ops_list的resume回调函数
        }

        arch_suspend_enable_irqs();
        BUG_ON(irqs_disabled());

    Enable_cpus:
        enable_nonboot_cpus();  打开所有non-boot CPU

    Platform_wake:
        platform_resume_noirq(state);
        dpm_resume_noirq(PMSG_RESUME);  resume_noirq

    Platform_early_resume:
        platform_resume_early(state);

    Devices_early_resume:
        dpm_resume_early(PMSG_RESUME);  resume_early

    Platform_finish:
        platform_resume_finish(state);
        return error;
    }

    3.2 suspend_prepare和suspend

    DPM是Device Power Management的意思,这些操作都是针对非系统设备(non-sysdev)进行的。那什么是系统设备呢?下面的machine应该就是所谓的sysdev了。

    dpm_prepare实际上就是遍历dpm_list上的所有设备,执行->prepare回调函数。如果设备存在->prepare回电函数,会将设备的prepare阶段打印到ftrace。

    int dpm_prepare(pm_message_t state)
    {
        int error = 0;

        trace_suspend_resume(TPS("dpm_prepare"), state.event, true);
        might_sleep();

        mutex_lock(&dpm_list_mtx);
        while (!list_empty(&dpm_list)) {  遍历dpm_list
            struct device *dev = to_device(dpm_list.next);

            get_device(dev);
            mutex_unlock(&dpm_list_mtx);

            trace_device_pm_callback_start(dev, "", state.event);
            error = device_prepare(dev, state);  执行->prepare回调函数
            trace_device_pm_callback_end(dev, error);

            mutex_lock(&dpm_list_mtx);
            if (error) {
                if (error == -EAGAIN) {
                    put_device(dev);
                    error = 0;
                    continue;
                }
                printk(KERN_INFO "PM: Device %s not prepared "
                    "for power transition: code %d ",
                    dev_name(dev), error);
                put_device(dev);
                break;
            }
            dev->power.is_prepared = true;
            if (!list_empty(&dev->power.entry))
                list_move_tail(&dev->power.entry, &dpm_prepared_list);  移动设备到dpm_prepared_list
            put_device(dev);
        }
        mutex_unlock(&dpm_list_mtx);
        trace_suspend_resume(TPS("dpm_prepare"), state.event, false);
        return error;
    }

    dpm_suspend遍历dpm_prepared_list,这点和dpm_prepare有区别。然后执行设备的->suspend回调函数。

    int dpm_suspend(pm_message_t state)
    {
        ktime_t starttime = ktime_get();
        int error = 0;

        trace_suspend_resume(TPS("dpm_suspend"), state.event, true);
        might_sleep();

        cpufreq_suspend();

        mutex_lock(&dpm_list_mtx);
        pm_transition = state;
        async_error = 0;
        while (!list_empty(&dpm_prepared_list)) { 基于dpm_prepared_list遍历设备
            struct device *dev = to_device(dpm_prepared_list.prev);

            get_device(dev);
            mutex_unlock(&dpm_list_mtx);

            error = device_suspend(dev);  执行设备->suspend回调函数

            mutex_lock(&dpm_list_mtx);
            if (error) {
                pm_dev_err(dev, state, "", error);
                dpm_save_failed_dev(dev_name(dev));
                put_device(dev);
                break;
            }
            if (!list_empty(&dev->power.entry))
                list_move(&dev->power.entry, &dpm_suspended_list);  移动设备到dpm_suspended_list
            put_device(dev);
            if (async_error)
                break;
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        if (!error)
            error = async_error;
        if (error) {
            suspend_stats.failed_suspend++;
            dpm_save_failed_step(SUSPEND_SUSPEND);
        } else
            dpm_show_time(starttime, state, NULL);
        trace_suspend_resume(TPS("dpm_suspend"), state.event, false);
        return error;
    }

    3.3 suspend_late和suspend_noirq

    dpm_suspend_late基于dpm_suspended_list操作设备,所以这也需要函数之间顺序执行。

    int dpm_suspend_late(pm_message_t state)
    {
        ktime_t starttime = ktime_get();
        int error = 0;

        trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
        mutex_lock(&dpm_list_mtx);
        pm_transition = state;
        async_error = 0;

        while (!list_empty(&dpm_suspended_list)) {  遍历dpm_suspended_list列表
            struct device *dev = to_device(dpm_suspended_list.prev);

            get_device(dev);
            mutex_unlock(&dpm_list_mtx);

            error = device_suspend_late(dev);  执行->suspend_late回调函数

            mutex_lock(&dpm_list_mtx);
            if (!list_empty(&dev->power.entry))
                list_move(&dev->power.entry, &dpm_late_early_list);  移动设备到dpm_late_early_list

            if (error) {
                pm_dev_err(dev, state, " late", error);
                dpm_save_failed_dev(dev_name(dev));
                put_device(dev);
                break;
            }
            put_device(dev);

            if (async_error)
                break;
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        if (!error)
            error = async_error;
        if (error) {
            suspend_stats.failed_suspend_late++;
            dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
            dpm_resume_early(resume_event(state));
        } else {
            dpm_show_time(starttime, state, "late");
        }
        trace_suspend_resume(TPS("dpm_suspend_late"), state.event, false);
        return error;
    }

    dpm_suspend_noirq基于dpm_late_early_list遍历所有设备。首先阻止设备驱动接收中断信息,然后执行->suspend_noirq回调函数。

    int dpm_suspend_noirq(pm_message_t state)
    {
        ktime_t starttime = ktime_get();
        int error = 0;

        trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);
        cpuidle_pause();  暂停cpuidle功能,退出idle的CPU
        device_wakeup_arm_wake_irqs();  将具有wakeirq的设备设置成wakeup resource
        suspend_device_irqs();  关闭当前所有能够关闭的irq,置成IRQS_SUSPENDED。IRQF_NO_SUSPEND类型的wakeup中断不能被关闭,并且作为wakeup唤醒源的中断不能被关闭。
        mutex_lock(&dpm_list_mtx);
        pm_transition = state;
        async_error = 0;

        while (!list_empty(&dpm_late_early_list)) {
            struct device *dev = to_device(dpm_late_early_list.prev);

            get_device(dev);
            mutex_unlock(&dpm_list_mtx);

            error = device_suspend_noirq(dev);  调用->suspend_noirq回调函数

            mutex_lock(&dpm_list_mtx);
            if (error) {
                pm_dev_err(dev, state, " noirq", error);
                dpm_save_failed_dev(dev_name(dev));
                put_device(dev);
                break;
            }
            if (!list_empty(&dev->power.entry))
                list_move(&dev->power.entry, &dpm_noirq_list);  移动设备到dpm_noirq_list
            put_device(dev);

            if (async_error)
                break;
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        if (!error)
            error = async_error;

        if (error) {
            suspend_stats.failed_suspend_noirq++;
            dpm_save_failed_step(SUSPEND_SUSPEND_NOIRQ);
            dpm_resume_noirq(resume_event(state));
        } else {
            dpm_show_time(starttime, state, "noirq");
        }
        trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, false);
        return error;
    }

    3.4 suspend_machine和resume_machine

    freeze和mem/standby在这部分是不同的。

    mem/standby直接调用suspend_ops->enter进入对应的睡眠模式。

    而freeze就要稍微复杂了:

    static void freeze_enter(void)
    {
        spin_lock_irq(&suspend_freeze_lock);
        if (pm_wakeup_pending())  检查是否有wakeup信号在处理,如果有则退出当前流程。
            goto out;

        suspend_freeze_state = FREEZE_STATE_ENTER;
        spin_unlock_irq(&suspend_freeze_lock);

        get_online_cpus();
        cpuidle_resume();  允许使用cpuidle

        /* Push all the CPUs into the idle loop. */
        wake_up_all_idle_cpus();  强制所有CPU退出idle状态
        pr_debug("PM: suspend-to-idle ");
        /* Make the current CPU wait so it can enter the idle loop too. */
        wait_event(suspend_freeze_wait_head,
               suspend_freeze_state == FREEZE_STATE_WAKE);  等待FREEZE_STATE_WAKE事件,进入idle loop
        pr_debug("PM: resume from suspend-to-idle ");  !!!!!!!!!!!!!!!!这里即为唤醒之后的执行路径了!!!!!!!!!!!!!!!!
        cpuidle_pause();  暂停使用cpuidle
        put_online_cpus();

        spin_lock_irq(&suspend_freeze_lock);

    out:
        suspend_freeze_state = FREEZE_STATE_NONE;
        spin_unlock_irq(&suspend_freeze_lock);
    }

    3.5 resume_noirq

    执行dpm_noirq_list上设备的resume_noirq回调函数。

    void dpm_resume_noirq(pm_message_t state)
    {
        struct device *dev;
        ktime_t starttime = ktime_get();

        trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, true);
        mutex_lock(&dpm_list_mtx);
        pm_transition = state;

        /*
         * Advanced the async threads upfront,
         * in case the starting of async threads is
         * delayed by non-async resuming devices.
         */
        list_for_each_entry(dev, &dpm_noirq_list, power.entry) {
            reinit_completion(&dev->power.completion);
            if (is_async(dev)) {
                get_device(dev);
                async_schedule(async_resume_noirq, dev);
            }
        }

        while (!list_empty(&dpm_noirq_list)) {  遍历dpm_noirq_list
            dev = to_device(dpm_noirq_list.next);
            get_device(dev);
            list_move_tail(&dev->power.entry, &dpm_late_early_list);  移动设备到下一级dpm_late_early_list
            mutex_unlock(&dpm_list_mtx);

            if (!is_async(dev)) {
                int error;

                error = device_resume_noirq(dev, state, false);
                if (error) {
                    suspend_stats.failed_resume_noirq++;
                    dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
                    dpm_save_failed_dev(dev_name(dev));
                    pm_dev_err(dev, state, " noirq", error);
                }
            }

            mutex_lock(&dpm_list_mtx);
            put_device(dev);
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        dpm_show_time(starttime, state, "noirq");
        resume_device_irqs();
        device_wakeup_disarm_wake_irqs();
        cpuidle_resume();
        trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, false);
    }

    3.6 resume_early

    执行前述dpm_late_early_list设备的resume_early回调函数,移动设备到dpm_suspended_list列表。

    void dpm_resume_early(pm_message_t state)
    {
        struct device *dev;
        ktime_t starttime = ktime_get();

        trace_suspend_resume(TPS("dpm_resume_early"), state.event, true);
        mutex_lock(&dpm_list_mtx);
        pm_transition = state;

        /*
         * Advanced the async threads upfront,
         * in case the starting of async threads is
         * delayed by non-async resuming devices.
         */
        list_for_each_entry(dev, &dpm_late_early_list, power.entry) {
            reinit_completion(&dev->power.completion);
            if (is_async(dev)) {
                get_device(dev);
                async_schedule(async_resume_early, dev);
            }
        }

        while (!list_empty(&dpm_late_early_list)) {
            dev = to_device(dpm_late_early_list.next);
            get_device(dev);
            list_move_tail(&dev->power.entry, &dpm_suspended_list);
            mutex_unlock(&dpm_list_mtx);

            if (!is_async(dev)) {
                int error;

                error = device_resume_early(dev, state, false);
                if (error) {
                    suspend_stats.failed_resume_early++;
                    dpm_save_failed_step(SUSPEND_RESUME_EARLY);
                    dpm_save_failed_dev(dev_name(dev));
                    pm_dev_err(dev, state, " early", error);
                }
            }
            mutex_lock(&dpm_list_mtx);
            put_device(dev);
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        dpm_show_time(starttime, state, "early");
        trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
    }

    3.7 resume

    执行所有dpm_suspended_list上设备的resume回调函数。

    void dpm_resume(pm_message_t state)
    {
        struct device *dev;
        ktime_t starttime = ktime_get();

        trace_suspend_resume(TPS("dpm_resume"), state.event, true);
        might_sleep();

        mutex_lock(&dpm_list_mtx);
        pm_transition = state;
        async_error = 0;

        list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
            reinit_completion(&dev->power.completion);
            if (is_async(dev)) {
                get_device(dev);
                async_schedule(async_resume, dev);
            }
        }

        while (!list_empty(&dpm_suspended_list)) {
            dev = to_device(dpm_suspended_list.next);
            get_device(dev);
            if (!is_async(dev)) {
                int error;

                mutex_unlock(&dpm_list_mtx);

                error = device_resume(dev, state, false);
                if (error) {
                    suspend_stats.failed_resume++;
                    dpm_save_failed_step(SUSPEND_RESUME);
                    dpm_save_failed_dev(dev_name(dev));
                    pm_dev_err(dev, state, "", error);
                }

                mutex_lock(&dpm_list_mtx);
            }
            if (!list_empty(&dev->power.entry))
                list_move_tail(&dev->power.entry, &dpm_prepared_list);
            put_device(dev);
        }
        mutex_unlock(&dpm_list_mtx);
        async_synchronize_full();
        dpm_show_time(starttime, state, NULL);

        cpufreq_resume();
        trace_suspend_resume(TPS("dpm_resume"), state.event, false);
    }

    3.8 resume_complete

    执行所有dpm_prepared_list上设备的complete回调函数。至此dpm_complete结束所有非系统设备的睡眠。

    void dpm_complete(pm_message_t state)
    {
        struct list_head list;

        trace_suspend_resume(TPS("dpm_complete"), state.event, true);
        might_sleep();

        INIT_LIST_HEAD(&list);
        mutex_lock(&dpm_list_mtx);
        while (!list_empty(&dpm_prepared_list)) {
            struct device *dev = to_device(dpm_prepared_list.prev);

            get_device(dev);
            dev->power.is_prepared = false;
            list_move(&dev->power.entry, &list);
            mutex_unlock(&dpm_list_mtx);

            trace_device_pm_callback_start(dev, "", state.event);
            device_complete(dev, state);
            trace_device_pm_callback_end(dev, 0);

            mutex_lock(&dpm_list_mtx);
            put_device(dev);
        }
        list_splice(&list, &dpm_list);
        mutex_unlock(&dpm_list_mtx);
        trace_suspend_resume(TPS("dpm_complete"), state.event, false);
    }

    4 如何让HiKey进入STR/STI并唤醒?

    可以通过配置GPIO作为唤醒源,或者通过RTC作为唤醒源,延时一定时间来唤醒。

    检查是否存在/sys/class/rtc/rtc0/wakealarm,入不存在则需要打开CONFIG_RTC_DRV_PL031。

    写入wakealarm的参数,表示在多少秒之后resume唤醒,退出suspend。

    写mem进入state,是系统进入suspend流程。

    adb root && adb remount
    adb shell "echo +10 > /sys/class/rtc/rtc0/wakealarm && echo mem > /sys/power/state"

    5. suspend/resume的latency分析手段

    5.1 analyze_suspend.py v3.0

    在kernel的scripts中,这个工具可以帮助内核和OS开发者优化suspend/resume时间。

    在打开一系列内核选项之后,此工具就可以执行suspend操作,然后抓取dmesg和ftrace数据知道resume结束。

    这些数据会按照时间线显示每个设备,并且显示占用最多suspend/resume时间的设备或者子系统的调用关系详图。

    执行工具后,会根据时间生成一个子目录,里面包含:html、dmesg和原始ftrace文件。

    下面简单看一下工具选项:

    Options:
      [general]
        -h          Print this help text
        -v          Print the current tool version
        -verbose    Print extra information during execution and analysis
        -status     Test to see if the system is enabled to run this tool
        -modes      List available suspend modes  显示当前支持的suspend模式
        -m mode     Mode to initiate for suspend ['freeze', 'mem', 'disk'] (default: mem)  设置进入何种模式的suspend
        -rtcwake t  Use rtcwake to autoresume after <t> seconds (default: disabled)  使用rtc来唤醒,参数是间隔时间
      [advanced]
        -f          Use ftrace to create device callgraphs (default: disabled)  基于ftrace生成调用关系图
        -filter "d1 d2 ..." Filter out all but this list of dev names
        -x2         Run two suspend/resumes back to back (default: disabled)
        -x2delay t  Minimum millisecond delay <t> between the two test runs (default: 0 ms)
        -postres t  Time after resume completion to wait for post-resume events (default: 0 S)
        -multi n d  Execute <n> consecutive tests at <d> seconds intervals. The outputs will
                    be created in a new subdirectory with a summary page.
      [utilities]
        -fpdt       Print out the contents of the ACPI Firmware Performance Data Table
        -usbtopo    Print out the current USB topology with power info
        -usbauto    Enable autosuspend for all connected USB devices
      [android testing]
        -adb binary Use the given adb binary to run the test on an android device.  参数需要给出adb路径,工具就会对Android设备进行测试,并将结果pull出来。有一点需要注意,在此之前确保adb具有root权限。
                    The device should already be connected and with root access.
                    Commands will be executed on the device using "adb shell"
      [re-analyze data from previous runs] 针对之前测试数据重新分析
        -ftrace ftracefile  Create HTML output using ftrace input
        -dmesg dmesgfile    Create HTML output using dmesg (not needed for kernel >= 3.15)
        -summary directory  Create a summary of all test in this dir

    在了解了工具使用方法之后,就可以进行相关测试了。

    5.1.1 Android

    ./analysze_suspend.py –modes –adb /usr/bin/adb获取当前系统支持的suspend状态。

    ['freeze', 'mem']

    1.Android上测试STR,suspend/resume共5次,每次间隔20秒。

    ./analyze_suspend.py -adb  /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m mem

    2.Android上测试STI,suspend/resume共10次,每次间隔5秒。

    ./analyze_suspend.py -adb  /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m freeze

    测试结果可以在如下获得:

    https://github.com/arnoldlu/common-use/tree/master/tools/analyze_suspend/hikey_test

    存在的问题:analyze_suspend.py不支持Android的rtcwakeup和callgraph。已经在如下fix:

    https://github.com/arnoldlu/common-use/blob/master/tools/analyze_suspend/analyze_suspend.py

    5.1.1.1 总体对比

    下面是HiKey上测试结果,可以看出两个数据都不够稳定。mem的suspend和resume平均值都比较高。

    freeze相比mem的suspend/resume平均值提高了304.3ms/613.5ms。

    image

    image

    5.1.1.2 是否suspend CPU

    对比如下两幅图,明显看出mem类型的suspend关闭了除CPU0之外的所有CPU;而freeze则没有关闭任何CPU。

    non-boot CPUs的suspend/resume时间就达到300ms/200ms。

    image

    image

    同时从log中也可以看出mem和freeze的主要区别就在于是否disabling/enabling non-boot CPU。其他设备和子系统的suspend/resume时间基本一致。

    同时还可以看出mem的suspend后,系统的timestamp是停止的;而freeze的timestamp还是一直在运行的。可以得出freeze状态持续的时间。

    因为先写rtcwake为10s,然后进入睡眠,再唤醒,所以freeze时间是小于10s的。

    [ 3385.642962] PM: suspend entry 1970-01-01 00:57:30.580909763 UTC
    [ 3385.649165] PM: Syncing filesystems ... done.
    [ 3385.661349] Freezing user space processes ... 
    [ 3385.671207] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable
    [ 3385.678933] dwc2 f72c0000.usb: GINNakEff triggered
    [ 3385.685718] (elapsed 0.019 seconds) done.
    [ 3385.689860] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
    [ 3385.700092] Suspending console(s) (use no_console_suspend to debug)
    [ 3385.736020] PM: suspend of devices complete after 27.195 msecs
    [ 3385.740811] PM: late suspend of devices complete after 4.765 msecs
    [ 3385.743919] PM: noirq suspend of devices complete after 3.090 msecs
    Disabling and Enabling non-boot CPUs
    [ 3386.209126] PM: noirq resume of devices complete after 1.865 msecs
    [ 3386.212066] PM: early resume of devices complete after 2.460 msecs
    [ 3386.234729] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
    [ 3386.311480] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0)
    [ 3386.410411] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
    [ 3386.458232] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0)
    [ 3386.458729] PM: resume of devices complete after 246.646 msecs
    [ 3386.818770] Restarting tasks ... 
    [ 3386.827026] done.
    [ 3386.844139] PM: suspend exit 1970-01-01 00:57:40.624589167 UTC


    [ 3471.760265] PM: Syncing filesystems ... done.
    [ 3471.771897] Freezing user space processes ... 
    [ 3471.780407] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable
    [ 3471.788105] dwc2 f72c0000.usb: GINNakEff triggered
    [ 3471.794916] (elapsed 0.018 seconds) done.
    [ 3471.799078] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
    [ 3471.809320] Suspending console(s) (use no_console_suspend to debug)
    [ 3471.847947] PM: suspend of devices complete after 29.905 msecs
    [ 3471.852473] PM: late suspend of devices complete after 4.497 msecs
    [ 3471.855611] PM: noirq suspend of devices complete after 3.120 msecs

    [ 3481.034722] PM: noirq resume of devices complete after 1.945 msecs
    [ 3481.037992] PM: early resume of devices complete after 2.694 msecs
    [ 3481.062803] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
    [ 3481.137795] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0)
    [ 3481.234796] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
    [ 3481.278601] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0)
    [ 3481.279396] PM: resume of devices complete after 241.388 msecs
    [ 3481.358513] Restarting tasks ... done.
    [ 3481.377766] PM: suspend exit 1970-01-01 00:59:15.332218333 UTC

     

    5.1.1.3 resume_console节省时间

    对比resume_console可以发现,mem要比freeze多210ms。

    5.1.2 Ubuntu

    此工具在Ubuntu上显示了更强大的功能。

    支持了callgraph功能之后,更能清晰地分析每个设备或者子系统的suspend/resume占用的时间。

    sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m mem
    sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m freeze

    在对比两种不同suspend模式后,发现freeze花费的时间要比mem少。这也符合预期,但是没有功耗数据?_?。

    image

    image

    下面着重分析一下如何基于此工具分析。

    5.1.3 工具界面总体分析

    最上面显示Kernel Suspend Time和Kernel Resume Time,可以从总体上查看是否有回退或者进步。

    再下面是一些缩放按钮。

    然后就是基于timeline的图表,比对颜色示意图,可以清晰看出suspend prepare、suspend、suspend late、suspend irq、suspend machine、resume machine、resume irq、resume early、resume和resume complete的分布。

    最下面是每个模块、子系统的详细函数调用图以及开始时间、消耗时间。

    image

    5.1.4 子系统、模块详细分析

    选中一个模块,会在最下面显示详细的模块在suspend/resume各个阶段消费的时间,以及函数调用关系图。

    image

    5.1.5 缩放查看细节

    ZOOM IN放大,ZOOMOUT缩小,ZOOM 1:1恢复原始尺寸。

    通过在timeline图表,放大可以查看到更小的模块消耗的时间。从宏观到模块,再到函数消耗时间,逐步细化,很有利于分析。

    image

    如果发现某个函数占用时间较大,可以逐级展开。知道发现最终占用较大的函数,发现问题所在。

    image

    5.1.6 工具代码分析

    首先从入口main开始,和大多数工具一样开始都是解析命令选项,打印help信息;将所有的待测对象参数和测试参数保存在sysvals;

    # ----------------- MAIN --------------------
    # exec start (skipped if script is loaded as library)
    if __name__ == '__main__':
        cmd = ''
        cmdarg = ''
        multitest = {'run': False, 'count': 0, 'delay': 0}
        # loop through the command line arguments
        args = iter(sys.argv[1:])
        for arg in args:
            …

        # just run a utility command and exit
        if(cmd != ''):
            if(cmd == 'status'):
                statusCheck()
            elif(cmd == 'fpdt'):
                if(sysvals.android):
                    doError('cannot read FPDT on android device', False)
                getFPDT(True)
            elif(cmd == 'usbtopo'):
                if(sysvals.android):
                    doError('cannot read USB topology '+
                        'on an android device', False)
                detectUSB(True)
            elif(cmd == 'modes'):
                modes = getModes()
                print modes
            elif(cmd == 'usbauto'):
                setUSBDevicesAuto()
            elif(cmd == 'summary'):
                print("Generating a summary of folder "%s"" % cmdarg)
                runSummary(cmdarg, True)
            sys.exit()

        # run test on android device
        if(sysvals.android):  注释此段代码可以在Android上支持callgraph
            #if(sysvals.usecallgraph): 
            #    doError('ftrace (-f) is not yet supported '+
            #        'in the android kernel', False)
            if(sysvals.notestrun):
                doError('cannot analyze test files on the '+
                    'android device', False)

        # if instructed, re-analyze existing data files
        if(sysvals.notestrun):  分析已有数据文件,不需要重新测试
            rerunTest()
            sys.exit()

        # verify that we can run a test
        if(not statusCheck()):  检查测试条件是否满足
            print('Check FAILED, aborting the test run!')
            sys.exit()

        if multitest['run']:  连续多次测试
            # run multiple tests in a separte subdirectory
            s = 'x%d' % multitest['count']
            subdir = datetime.now().strftime('suspend-'+s+'-%m%d%y-%H%M%S')
            os.mkdir(subdir)
            for i in range(multitest['count']):
                if(i != 0):
                    print('Waiting %d seconds...' % (multitest['delay']))
                    time.sleep(multitest['delay'])
                print('TEST (%d/%d) START' % (i+1, multitest['count']))
                runTest(subdir)  进行单次测试
                print('TEST (%d/%d) COMPLETE' % (i+1, multitest['count']))
            runSummary(subdir, False)  生成summary.html
        else:
            # run the test in the current directory
            runTest(".")

    sysvals.android表示是否在Android设备进行测试。

    sysvals.usecallgraph表示是否生成函数调用关系图。

    sysvals.rtcwake表示是否使用rtc进行唤醒。

    针对Ubuntu之类的host设备,测试进行的很顺利。但是针对Android设备,在callgraph还存在一点问题。

    run_Test无疑作为核心,收集log信息(ftrace、dmesg),执行suspend/resume,生成输出文件(txt、html)。

    def runTest(subdir):
        global sysvals

        # prepare for the test
        if(not sysvals.android):  针对不同的待测设备,初始化ftrace
            initFtrace()
        else:
            initFtraceAndroid()
        sysvals.initTestOutput(subdir)  生成输出目录,输出文件名等。

        vprint('Output files:     %s' % sysvals.dmesgfile)
        if(sysvals.usecallgraph or
            sysvals.usetraceevents or
            sysvals.usetraceeventsonly):
            vprint('    %s' % sysvals.ftracefile)
        vprint('    %s' % sysvals.htmlfile)

        # execute the test  执行测试,实际上命令内容基本一致。只是针对Android设备,增加了adb shell '…'。
        if(not sysvals.android):
            executeSuspend()
        else:
            executeAndroidSuspend()

        # analyze the data and create the html output
        print('PROCESSING DATA')
        if(sysvals.usetraceeventsonly):  3.15之后的版本,只需要通过ftrace即可获取足够信息。之前的版本的数据都存在dmesg中。
            # data for kernels 3.15 or newer is entirely in ftrace
            testruns = parseTraceLog()
        else:
            # data for kernels older than 3.15 is primarily in dmesg
            testruns = loadKernelLog()
            for data in testruns:
                parseKernelLog(data)
            if(sysvals.usecallgraph or sysvals.usetraceevents):
                appendIncompleteTraceLog(testruns)
        createHTML(testruns)  根据解析的数据生成html矢量图表

    executeAndroidSuspend在Android设备上操作sysfs节点来配置ftrace,抓取log,suspend/resume,然后将log拉到主机。

    def executeAndroidSuspend():
        global sysvals

        # check to see if the display is currently off
        tp = sysvals.tpath
        out = os.popen(sysvals.adb+
            ' shell dumpsys power | grep mScreenOn').read().strip()
        # if so we need to turn it on so we can issue a new suspend
        if(out.endswith('false')):
            print('Waking the device up for the test...')
            # send the KEYPAD_POWER keyevent to wake it up
            os.system(sysvals.adb+' shell input keyevent 26')
            # wait a few seconds so the user can see the device wake up
            time.sleep(3)
        # execute however many s/r runs requested
        for count in range(1,sysvals.execcount+1):
            # clear the kernel ring buffer just as we start
            os.system(sysvals.adb+' shell dmesg -c > /dev/null 2>&1')  清空dmesg
            # start ftrace
            if(sysvals.usetraceevents):
                print('START TRACING')
                os.system(sysvals.adb+" shell 'echo 1 > "+tp+"tracing_on'")  开始ftrace抓取
            # initiate suspend
            for count in range(1,sysvals.execcount+1):
                if(sysvals.usetraceevents):
                    os.system(sysvals.adb+
                        " shell 'echo SUSPEND START > "+tp+"trace_marker'")  写SUSPEND START到ftrace,作为开始标记。后面解析log,会以此为标记。
                if(sysvals.rtcwake):
                    print('SUSPEND START')
                    print('will autoresume in %d seconds' % sysvals.rtcwaketime)
                    os.system(sysvals.adb+" shell 'echo +%d > /sys/class/rtc/rtc0/wakealarm'"%(sysvals.rtcwaketime))  设置wakeup resource
                else:
                    print('SUSPEND START (press a key to resume)')

                os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+
                    " > "+sysvals.powerfile+"'")  进入suspend,之后就是resume
                # execution will pause here, then adb will exit
                while(True):  轮询adb shell pwd判断设备是否被唤醒
                    check = os.popen(sysvals.adb+
                        ' shell pwd 2>/dev/null').read().strip()
                    if(len(check) > 0):
                        break
                    time.sleep(1)
                if(sysvals.usetraceevents):
                    os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+
                        "trace_marker'")  写RESUME COMPLETE到ftrace,作为结束标记
            # return from suspend
            print('RESUME COMPLETE')
            # stop ftrace
            if(sysvals.usetraceevents):
                os.system(sysvals.adb+" shell 'echo 0 > "+tp+"tracing_on'") 关闭ftrace功能
                print('CAPTURING TRACE')
                os.system('echo "'+sysvals.teststamp+'" > '+sysvals.ftracefile)
                os.system(sysvals.adb+' shell cat '+tp+
                    'trace >> '+sysvals.ftracefile)  将/sys/kernel/debug/tracing/trace内容保存到本地log
            # grab a copy of the dmesg output
            print('CAPTURING DMESG')
            os.system('echo "'+sysvals.teststamp+'" > '+sysvals.dmesgfile)
            os.system(sysvals.adb+' shell dmesg >> '+sysvals.dmesgfile)  将dmesg保存到本地

    parseTraceLog用于解析ftrace log,phase的判断是依据suspend_resume关键词;每个模块的开始结束是以device_pm_callback_start/device_pm_callback_end作为判断;还调用FTraceCallGraph进行函数调用关系的解析。

    createHTML是这个工具真正NB的地方,对parseTraceLog结果进行了可视化,生成可缩放、查看细节的html文件。

     

    6 对工具的改进

    虽然工具非常强大,但是在使用中还是有一些视角没有覆盖到。所以做了一些改进。

    在Android上使能rtcwake;在Android上使能callgraph;针对多次测试生成csv比较不同phase消耗时间,比summary.html更细化;这对每次测试给出Phase时间和每个Phase内Device消耗时间。

    6.1 Android上使能rtcwake

    https://github.com/arnoldlu/common-use/commit/a862d8c2a4f9bd005c516c6b61b394386b882217

    可以在Android上使用rtc作为唤醒源,可以在没有实体按键的设备上进行测试。

    6.2 Android上使能callgraph

    https://github.com/arnoldlu/common-use/commit/f8e288753a472cf48ccc0e9d7ffc67978c7d165e

    如果没有callgraph只能显示Phase级别的信息,不能显示每个device的信息以及内部函数耗费的时间。

    6.3 单次测试summary结果

    https://github.com/arnoldlu/common-use/commit/53c270669bb0dfaada53e29852999d5367ec65da

    在每次测试目录下,生成一个summary_phase_dev.csv文件。可以直观的看到不同Phase、不同device消耗的时间。

    image

    如果想要发现那个模块消耗最大时间,可以使用Excel的Filter功能。比如想看suspend_prepare下Device消耗时间有大到小排列。

    这样就可以找出每个Phase中消耗资源大户。

    image

    6.4 多次测试summary结果

    https://github.com/arnoldlu/common-use/commit/d162c4827a0cdc50fe94d3f1303af682b387dc3d

    生成summary_phase.csv文件,按每次测试的不同phase显示耗费时间。

    可以比较不同测试phase的时间耗费,看出哪一个phase存在回退现象。

    image

    6.5 suspend/resume起止时间点判断

    analyze_suspend.py在解析log的时候,以SUSPEND START作为起点,以RESUME COMPLETE为终点。

    在发送SUSPEND START之后,触发suspend动作。在这期间,如果host存在一定抢占,会增加suspend时间。

    然后通poll设备的adb状态,来判断是否resume。一方面,adb可用状态要在resume结束之后,另一方面,在最坏的情况下,可能存在1s的误差,这对于毫秒级的resume来说是非常严重的一个结果。

    最后发送RESUME COMPLETE作为结束。

    if(sysvals.usetraceevents):
        os.system(sysvals.adb+
            " shell 'echo SUSPEND START > "+tp+"trace_marker'")
    print('SUSPEND START (press a key on the device to resume)')
    os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+
        " > "+sysvals.powerfile+"'")
    # execution will pause here, then adb will exit
    while(True):
        check = os.popen(sysvals.adb+
            ' shell pwd 2>/dev/null').read().strip()
        if(len(check) > 0):
            break
        time.sleep(1)
    if(sysvals.usetraceevents):
        os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+
            "trace_marker'")

    更好的方式是在enter_state的开头结尾加ftrace,然后解析的时候以此为标记。

    @@ -486,6 +496,7 @@ static int enter_state(suspend_state_t state)
    {
            int error;
     
    +       trace_suspend_resume(TPS("enter_state"), state, true);
            trace_suspend_resume(TPS("suspend_enter"), state, true);
            if (state == PM_SUSPEND_FREEZE) {
    #ifdef CONFIG_PM_DEBUG
    @@ -532,6 +543,7 @@ static int enter_state(suspend_state_t state)
            suspend_finish();
      Unlock:
            mutex_unlock(&pm_mutex);
    +       trace_suspend_resume(TPS("enter_state"), state, false);
            return error;
    }

    7 分析步骤

    本着从宏观到微观的进阶,一步步分找出可以优化的点。

    下面是从开始一次测试到每次测试到suspend/resume不同phase,再到每个phase里面device callback的关系。

    image

    下面是每一次正常suspend/resume的流程,之前每个阶段函数分析也可以看出他们的对称关系。

    image

    在修改了工具对于suspend和resume时间判断的bug过后,得到了一组的数据。

    image

    分析一下稳定性,均方差比较小,还算比较稳定。数据稳定之后,就可以进行详细分析了。

    image

    下面查看每次测试的每个phase数据,可以看出每个phase数据的稳定性,以及每个phase费时占比。找出费时大户,suspend_prepare、suspend、suspend_machine、resume_machine、resume、resume_complete。

    image

    针对上述六个phase,列出Top 10设备或者子系统。

    从下图可以看出,freeze_processes、sync_filesystems、mmc0、mmc2、CUP0~7、resume_console、tsensor是需要重点分析的设备。

    imageimage

    imageimage

    imageimage

    不区分phase列出Top 30如下,下面逐一分析可优化的空间。

    image

    7.1 resume_console

    adb shell 'echo N > /sys/module/printk/parameters/console_suspend'
    adb shell 'cat /sys/module/printk/parameters/console_suspend'

    先看一下resume_console流程函数:

    void resume_console(void)
    {
        if (!console_suspend_enabled)
            return;
        down_console_sem();  获取console_sem和console_lock_dep_map
        console_suspended = 0;
        console_unlock();  
    }

    通过分析ftrace发现,主要时间消耗在console_unlock中。因为在console_lock被占用期间,有相当一部分由printk缓存的log。所以在释放锁之前需要将其处理掉。

    void console_unlock(void)
    {
        static char ext_text[CONSOLE_EXT_LOG_MAX];
        static char text[LOG_LINE_MAX + PREFIX_MAX];
        static u64 seen_seq;
        unsigned long flags;
        bool wake_klogd = false;
        bool do_cond_resched, retry;

        trace_console_lock("console_unlock start", strlen("console_unlock start"));
        if (console_suspended) {
            up_console_sem();
            return;
        }

        /*
         * Console drivers are called under logbuf_lock, so
         * @console_may_schedule should be cleared before; however, we may
         * end up dumping a lot of lines, for example, if called from
         * console registration path, and should invoke cond_resched()
         * between lines if allowable.  Not doing so can cause a very long
         * scheduling stall on a slow console leading to RCU stall and
         * softlockup warnings which exacerbate the issue with more
         * messages practically incapacitating the system.
         */
        do_cond_resched = console_may_schedule;
        console_may_schedule = 0;

        /* flush buffered message fragment immediately to console */
        console_cont_flush(text, sizeof(text));
    again:
        for (;;) {  如果默认的LOGLEVEL定的比较高,即优先级低,则会有相当多的log需要打印。占用很多时间。
            …
        }
        console_locked = 0;

        /* Release the exclusive_console once it is used */
        if (unlikely(exclusive_console))
            exclusive_console = NULL;

        raw_spin_unlock(&logbuf_lock);

        up_console_sem();  释放console_sem和console_lock_dep_map

        /*
         * Someone could have filled up the buffer again, so re-check if there's
         * something to flush. In case we cannot trylock the console_sem again,
         * there's a new owner and the console_unlock() from them will do the
         * flush, no worries.
         */
        raw_spin_lock(&logbuf_lock);
        retry = console_seq != log_next_seq;
        raw_spin_unlock_irqrestore(&logbuf_lock, flags);

        if (retry && console_trylock())
            goto again;

        if (wake_klogd)
            wake_up_klogd();
        trace_console_lock("console_unlock end", strlen("console_unlock end"));
    }

    那么问题就变得简单了,减少printk量就可以了。

    通过cat /proc/sys/kernel/printk可以得到。在kernel/sysctl.c中有其实现。

    7    4    1    7

    这四个值分别对应:

    #define console_loglevel (console_printk[0])
    #define default_message_loglevel (console_printk[1])
    #define minimum_console_loglevel (console_printk[2])
    #define default_console_loglevel (console_printk[3])

    又对应到:

    int console_printk[4] = {
        CONSOLE_LOGLEVEL_DEFAULT,    /* console_loglevel */
        MESSAGE_LOGLEVEL_DEFAULT,    /* default_message_loglevel */
        CONSOLE_LOGLEVEL_MIN,        /* minimum_console_loglevel */
        CONSOLE_LOGLEVEL_DEFAULT,    /* default_console_loglevel */
    };

    /* We show everything that is MORE important than this.. */
    #define CONSOLE_LOGLEVEL_SILENT  0 /* Mum's the word */
    #define CONSOLE_LOGLEVEL_MIN     1 /* Minimum loglevel we let people use */
    #define CONSOLE_LOGLEVEL_QUIET     4 /* Shhh ..., when booted with "quiet" */
    #define CONSOLE_LOGLEVEL_DEFAULT 7 /* anything MORE serious than KERN_DEBUG */
    #define CONSOLE_LOGLEVEL_DEBUG    10 /* issue debug messages */
    #define CONSOLE_LOGLEVEL_MOTORMOUTH 15    /* You can't shut this one up */

    可知只要内核log优先级高于KERN_DEBUG都会被打印。由下表可知几乎所有的log都会被打印。这就会造成printk相当繁忙,console_unlock会处理相当多信息。

    #define KERN_EMERG    KERN_SOH "0"    /* system is unusable */
    #define KERN_ALERT    KERN_SOH "1"    /* action must be taken immediately */
    #define KERN_CRIT    KERN_SOH "2"    /* critical conditions */
    #define KERN_ERR    KERN_SOH "3"    /* error conditions */
    #define KERN_WARNING    KERN_SOH "4"    /* warning conditions */
    #define KERN_NOTICE    KERN_SOH "5"    /* normal but significant condition */
    #define KERN_INFO    KERN_SOH "6"    /* informational */
    #define KERN_DEBUG    KERN_SOH "7"    /* debug-level messages */

    #define KERN_DEFAULT    KERN_SOH "d"    /* the default kernel loglevel */

    想解决也很简单,提高console_loglevel的优先级。

    diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
    old mode 100644
    new mode 100755
    index e7e586b..b927d67
    --- a/kernel/printk/printk.c
    +++ b/kernel/printk/printk.c
    @@ -60,7 +60,7 @@ extern void printascii(char *);
    #endif
     
    int console_printk[4] = {
    -       CONSOLE_LOGLEVEL_DEFAULT,       /* console_loglevel */
    +       CONSOLE_LOGLEVEL_QUIET, /* console_loglevel */
            MESSAGE_LOGLEVEL_DEFAULT,       /* default_message_loglevel */
            CONSOLE_LOGLEVEL_MIN,           /* minimum_console_loglevel */
            CONSOLE_LOGLEVEL_DEFAULT,       /* default_console_loglevel */

    在进行修改后,再来进行对比测试。可以看出消耗时间得到显著提升,优化后的resume_complete时间基本上可以忽略不计。

    7, mem
        Line 748: resume_complete,resume_console[3],248.54900000002544
        Line 748: resume_complete,resume_console[3],248.6340000000382
        Line 748: resume_complete,resume_console[3],248.26499999994667
        Line 748: resume_complete,resume_console[3],248.3510000000706
        Line 748: resume_complete,resume_console[3],248.42499999999745

    7, freeze
        Line 996: resume_complete,resume_console[1],76.18400000001202
        Line 996: resume_complete,resume_console[1],76.19500000009793
        Line 996: resume_complete,resume_console[1],76.3280000001032
        Line 996: resume_complete,resume_console[1],76.1689999999362
        Line 996: resume_complete,resume_console[1],76.19999999997162

    4, freeze
        Line 996: resume_complete,resume_console[1],0.1010000000007949
        Line 996: resume_complete,resume_console[1],0.10499999999069587
        Line 996: resume_complete,resume_console[1],0.09799999997994746
        Line 996: resume_complete,resume_console[1],0.1010000000007949
        Line 996: resume_complete,resume_console[1],0.10000000003174137

    4, mem
        Line 749: resume_complete,resume_console[3],0.3370000000586515
        Line 749: resume_complete,resume_console[3],0.33800000005612674
        Line 749: resume_complete,resume_console[3],0.37700000007134804
        Line 749: resume_complete,resume_console[3],0.3359999999474894
        Line 749: resume_complete,resume_console[3],0.3429999999298161

    7.2 mmc suspend/resuem分析

    从下图可知,mmc相关suspend/resume主要在mmc0:0001和mmc2:0001两个设备的suspend/resume。下面重点分析这两个设备的suspend/resume回调函数。

    image

    在执行suspend过程中,先将bus上面的设备driver先suspend,然后在suspend bus。

    在resume时,过程相反,先bus resume,然后再逐个设备driver resume。

    mmc0:0001

    那就来看看bus和各个设备耗费的时间:

    4013.868837 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [suspend] */
    4013.868893 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */
    4013.869000 |   4)    sh-4511     |               |  /* device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [suspend] */
    4013.869053 |   4)    sh-4511     |               |  /* device_pm_callback_end: block mmcblk0, err=0 */
    4013.889229 |   5)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [suspend] */
    4013.914631 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */
    4022.787571 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [resume] */
    4022.886749 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */
    4023.059198 |   0)    sh-4511     |               |  /* device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [resume] */
    4023.059270 |   0)    sh-4511     |               |  /* device_pm_callback_end: block mmcblk0, err=0 */
    4023.059398 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [resume] */
    4023.059830 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */

    可以看出driver的suspend/resume并没有耗费太多时间,主要在mmc bus的suspend/resume耗费了太多时间。

    在drivers/mmc/core/bus.c中

    static struct bus_type mmc_bus_type = {
        .name        = "mmc",
        .dev_groups    = mmc_dev_groups,
        .match        = mmc_bus_match,
        .uevent        = mmc_bus_uevent,
        .probe        = mmc_bus_probe,
        .remove        = mmc_bus_remove,
        .shutdown    = mmc_bus_shutdown,
        .pm        = &mmc_bus_pm_ops,
    };

    mmc_bus_pm_bus对应的suspend/resume函数如下:

    static const struct dev_pm_ops mmc_bus_pm_ops = {
        SET_RUNTIME_PM_OPS(mmc_runtime_suspend, mmc_runtime_resume, NULL)
        SET_SYSTEM_SLEEP_PM_OPS(mmc_bus_suspend, mmc_bus_resume)
    };

    mmc bus的suspend/resume如下:

    static int mmc_bus_suspend(struct device *dev)
    {
        struct mmc_card *card = mmc_dev_to_card(dev);
        struct mmc_host *host = card->host;
        int ret;

        ret = pm_generic_suspend(dev);  对应设备驱动的suspend回调函数。
        if (ret)
            return ret;

        ret = host->bus_ops->suspend(host);   这里的host指的是mmc_host,bus_ops指的是mmc_ops。
        return ret;
    }

    static int mmc_bus_resume(struct device *dev)
    {
        struct mmc_card *card = mmc_dev_to_card(dev);
        struct mmc_host *host = card->host;
        int ret;

        ret = host->bus_ops->resume(host);  这里的host指的是mmc_host,bus_ops指的是mmc_ops。


        if (ret)
            pr_warn("%s: error %d during resume (card was removed?) ",
                mmc_hostname(host), ret);

        ret = pm_generic_resume(dev);  对应设备的驱动的resume回调函数。
        return ret;
    }

    pm_generic_suspend和pm_generic_resume是对子系统设备的通用回调函数。

    int pm_generic_suspend(struct device *dev)
    {
        const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

        return pm && pm->suspend ? pm->suspend(dev) : 0;
    }

    int pm_generic_resume(struct device *dev)
    {
        const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

        return pm && pm->resume ? pm->resume(dev) : 0;
    }

    从中可以看出,如果对应设备存在dev->driver->pm->suspend和dev->driver->pm->resume则,调用回调函数。

    mmc_bus_suspend

    mmc_bus_suspend花费了大概25.4ms。下面是ftrace中截取的一段,从中可以看出时间主要消耗在struct mmc_bus_ops mmc_ops的.suspend回调函数。

    /*            */   mmc_bus_suspend() {
    /*            */     pm_generic_suspend() {
    /*! 307.552 us*/     }
    /*            */     mmc_suspend() {
    /** 25060.78 us*/      } /* mmc_suspend */
    /** 25378.28 us*/    } /* mmc_bus_suspend */

    mmc_bus_resume

    mmc_resume消耗了大部分时间,整个流程才99.158ms。

    /*              */   mmc_bus_resume() {
    /*              */     mmc_resume() {
    /* * 97167.39 us*/     }
    /*              */     pm_generic_resume() {
    /* # 1980.104 us*/     }
    /* * 99158.12 us*/   }

    mmc_suspend

    对应的host->bus_ops,即mmc_ops。在host下的设备都suspend之后,suspend mmc_host。

    在mmc_host resume之后,才能进行设备的resume。

    static const struct mmc_bus_ops mmc_ops = {
        .remove = mmc_remove,
        .detect = mmc_detect,
        .suspend = mmc_suspend,
        .resume = mmc_resume,
        .runtime_suspend = mmc_runtime_suspend,
        .runtime_resume = mmc_runtime_resume,
        .alive = mmc_alive,
        .shutdown = mmc_shutdown,
        .reset = mmc_reset,
    };

    通过分析ftrace.txt文件,发现其中msleep花费了17.1ms,这里是存在问题的。

    int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
            unsigned int timeout_ms, bool use_busy_signal, bool send_status,
            bool ignore_crc)
    {
        struct mmc_host *host = card->host;
        int err;
        struct mmc_command cmd = {0};
        unsigned long timeout;
        u32 status = 0;
        bool use_r1b_resp = use_busy_signal;

        mmc_retune_hold(host);


            /*
             * We are not allowed to issue a status command and the host
             * does'nt support MMC_CAP_WAIT_WHILE_BUSY, then we can only
             * rely on waiting for the stated timeout to be sufficient.
             */
            if (!send_status) {
                mmc_delay(timeout_ms);
                goto out;
            }


    }

    mmc_resume

    通过分析ftrece.txt,可以发现mmc_resume存在4个msleep,共消耗了12646.35 +14260.78 +13881.66 +15093.22 =55.882 ms。

    关于mmc_ops的suspend/resume/runtime_suspend/runtime_resume的探讨

    先来看看这四个函数的,其流程受到MMC_CAP_AGGRESSIVE_PM和MMC_CAP_RUNTIME_RESUME两个flag的控制。执行的实体都是_mmc_suspend、_mmc_resume。

    static int mmc_suspend(struct mmc_host *host)
    {
        int err;

        err = _mmc_suspend(host, true);
        if (!err) {
            pm_runtime_disable(&host->card->dev);
            pm_runtime_set_suspended(&host->card->dev);
        }

        return err;
    }

    static int mmc_resume(struct mmc_host *host)
    {
        int err = 0;

        if (!(host->caps & MMC_CAP_RUNTIME_RESUME)) {
            err = _mmc_resume(host);
            pm_runtime_set_active(&host->card->dev);
            pm_runtime_mark_last_busy(&host->card->dev);
        }
        pm_runtime_enable(&host->card->dev);

        return err;
    }


    static int mmc_runtime_suspend(struct mmc_host *host)
    {
        int err;

        if (!(host->caps & MMC_CAP_AGGRESSIVE_PM))
            return 0;

        err = _mmc_suspend(host, true);
        if (err)
            pr_err("%s: error %d doing aggressive suspend ",
                mmc_hostname(host), err);

        return err;
    }


    static int mmc_runtime_resume(struct mmc_host *host)
    {
        int err;

        if (!(host->caps & (MMC_CAP_AGGRESSIVE_PM | MMC_CAP_RUNTIME_RESUME)))
            return 0;

        err = _mmc_resume(host);
        if (err)
            pr_err("%s: error %d doing aggressive resume ",
                mmc_hostname(host), err);

        return 0;
    }

    1.如果两flag都没有定义,则runtime_suspend和runtim_resume都是空函数。起作用的就是跟随系统的suspend/resume流程。

    2.如果只定义了MMC_CAP_RUNTIME_RESUME,则不会runtime_suspend。并且在系统resume的时候,不会执行resume回调函数。只会在根据需要执行runtime_resume。使用runtime_resume代替了resume。

    3.如果只定义了MMC_CAP_AGGRESSIVE_PM ,则suspend/resume跟随系统suspend/resume流程。并且runtime_suspend/resume_resume也根据实际情况执行。一切正常。

    4.如果两者都定义了,既可以suspend也可以runtime_suspend,但是只能runtime_resume,不能跟随系统resume流程执行resume回调函数。

    也就是说MMC_CAP_AGGRESSIVE_PM 则runtime_suspend/runtime_resume都可用,MMC_CAP_RUNTIME_RESUME则只能使用runtime_resume执行resume功能。

    那么就来看一下,在应用了MMC_CAP_RUNTIME_RESUME之后效果如何。

    mmc0:0001增加runtime-suspend属性:

    diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
    old mode 100644
    new mode 100755
    index 09e2c71..2cec392
    --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
    +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
    @@ -801,6 +801,7 @@
                            clock-names = "ciu", "biu";
                            resets = <&sys_ctrl PERIPH_RSTDIS0_MMC0>;
                            bus-width = <0x8>;
    +                      runtime-suspend;
                            vmmc-supply = <&ldo19>;
                            pinctrl-names = "default";
                            pinctrl-0 = <&emmc_pmx_func &emmc_clk_cfg_func

    修改DeviceTree解析文件,增加MMC_CAP_RUNTIME_RESUME。

    index 094202c..35fd7b5
    --- a/drivers/mmc/host/dw_mmc.c
    +++ b/drivers/mmc/host/dw_mmc.c
    @@ -2922,6 +2922,10 @@ static struct dw_mci_board *dw_mci_parse_dt(struct dw_mci *host)
                    dev_info(dev, "supports-highspeed property is deprecated. ");
                    pdata->caps |= MMC_CAP_SD_HIGHSPEED | MMC_CAP_MMC_HIGHSPEED;
            }
    +       if (of_find_property(np, "runtime-suspend", NULL)) {
    +               dev_info(dev, "supports-highspeed property is deprecated. ");
    +               pdata->caps |=  MMC_CAP_RUNTIME_RESUME;
    +       }
     
            return pdata;
    }

    修改后mmc0:0001的resume达到了预期,mmc_resume没有被执行。

    针对统计结果,效果明显。

    image

    虽然没有在系统resume过程中执行,但是mmc0:0001总要resume。只不过稍微延迟了,不再这个工具统计之中。

    延后执行的mmc0:0001的resume耗费了72.317ms,也和之前的差不多。实际上没有对整个流程作出实质贡献,只是不在统计数据之内。

    [32m[   32.486851] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31 caps=40138143 caps2=0)
    [32m[   32.500871] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 1000 1000 664
    [32m[   32.501305] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq 1000 1000 664
    [32m[   32.540313] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq 1000 1000 664
    [32m[   32.540747] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq 1000 1000 664
    [32m[   32.559168] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0 caps=40138143 caps2=0)

    mmc2:0001

    mmc2:0001和mmc0:0001的区别在于不同的mmc_bus_ops,mmc2:0001是SDIO接口,对应的应该是mmc_sdio_ops。

    4013.876306 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [suspend] */
    4013.876360 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */
    4013.876397 |   4)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [suspend] */
    4013.876437 |   4)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
    4013.876470 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [suspend] */
    4013.876525 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
    4013.876556 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [suspend] */
    4013.876596 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
    4013.881676 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [suspend] */
    4013.881698 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
    4013.881740 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [suspend] */
    4013.881765 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
    4013.882582 |   4)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [suspend] */
    4013.882603 |   4)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
    4013.882645 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [suspend] */
    4013.885524 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */
    4022.888667 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [resume] */
    4023.042980 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */
    4023.043021 |   0)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [resume] */
    4023.043037 |   0)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
    4023.043067 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [resume] */
    4023.043089 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
    4023.043128 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [resume] */
    4023.043151 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
    4023.048824 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [resume] */
    4023.048877 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
    4023.048916 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [resume] */
    4023.048979 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
    4023.049011 |   0)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [resume] */
    4023.049074 |   0)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
    4023.049113 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [resume] */
    4023.049165 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */

    由下可知不同部分在于mmc_host的suspend/resume,pm_generic_suspend/pm_generic_resume基本上耗费的时间都很少。

    所以重点看看mmc_sdio_suspend和mmc_sdio_resume两个函数。

    /*              */    mmc_bus_suspend() {
    /*   0.833 us   */      pm_generic_suspend();
    /*              */      mmc_sdio_suspend() {
    /* # 2854.687 us*/      }
    /* # 2864.115 us*/    }

    /*              */    mmc_bus_resume() {
    /*              */      mmc_sdio_resume() {
    /* @ 154277.3 us*/      }
    /*   1.563 us   */      pm_generic_resume();
    /* @ 154290.3 us*/    }

    mmc_sdio_suspend

    时间很短,不关注。

    mmc_sdio_resume

    mmc2:001的mmc_bus_resume时间达到154.313,mmc_sdio_resume包含三个msleep共75331.82+15953.43+14369.58=105654.83us=105.654ms。

    考虑:是否可以将SDIO的resume也像MMC那样延后执行呢?

    7.3 CPU_OFF/CPU_ON

    在分析了resome_console和mmc之后,再来看一下CPU_OFF/CPU_ON的执行过程。

    在disable_nonboot_cpus中选取first_cpu,除此之外的所有for_each_online_cpu都会被_cpu_down,并且将其放到frozen_cpus上。

    在enable_nonboot_cpus中,遍历frozen_cpus,将其_cpu_up。

    针对HiKey,真个流程就是对CPU 1-7进行关闭、打开的操作,所以重点分析一下_cpu_down和_cpu_up。

    耗时最大的三个地方都用粗体下划线标出,除了发送状态通知之外,还有rcu sync处理。

    对cpu_chain上所有注册notifier,逐个执行回调函数notifier_call,根据action进行处理,这是一个很耗时的过程。

    /* Requires cpu_add_remove_lock to be held */
    static int _cpu_down(unsigned int cpu, int tasks_frozen)
    {
        int err, nr_calls = 0;
        void *hcpu = (void *)(long)cpu;
        unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
        struct take_cpu_down_param tcd_param = {
            .mod = mod,
            .hcpu = hcpu,
        };

        if (num_online_cpus() == 1)  如果online只有一个CPU,则无法再进行down操作。
            return -EBUSY;

        if (!cpu_online(cpu))  如果当前CPU没有online,则无需进行down。
            return -EINVAL;

        cpu_hotplug_begin();  取得cpu_hotplug.lock锁

        err = __cpu_notify(CPU_DOWN_PREPARE | mod, hcpu, -1, &nr_calls);  在cpu_chain上发从CPU_DOWN_PREPARE状态。
        if (err) {
            nr_calls--;
            __cpu_notify(CPU_DOWN_FAILED | mod, hcpu, nr_calls, NULL);
            pr_warn("%s: attempt to take down CPU %u failed ",
                __func__, cpu);
            goto out_release;
        }

        /*
         * By now we've cleared cpu_active_mask, wait for all preempt-disabled
         * and RCU users of this state to go away such that all new such users
         * will observe it.
         *
         * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
         * not imply sync_sched(), so wait for both.
         *
         * Do sync before park smpboot threads to take care the rcu boost case.
         */
        if (IS_ENABLED(CONFIG_PREEMPT))
            synchronize_rcu_mult(call_rcu, call_rcu_sched);
        else
            synchronize_rcu();

        smpboot_park_threads(cpu);  将此CPU的由kthread_create创建的线程设置为PARKED。

        /*
         * Prevent irq alloc/free while the dying cpu reorganizes the
         * interrupt affinities.
         */
        irq_lock_sparse();

        /*
         * So now all preempt/rcu users must observe !cpu_active().
         */
        err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
        if (err) {
            /* CPU didn't die: tell everyone.  Can't complain. */
            cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
            irq_unlock_sparse();
            goto out_release;
        }
        BUG_ON(cpu_online(cpu));  如果指定的CPU还处于online状态,则触发kernel panic。

        /*
         * The migration_call() CPU_DYING callback will have removed all
         * runnable tasks from the cpu, there's only the idle task left now
         * that the migration thread is done doing the stop_machine thing.
         *
         * Wait for the stop thread to go away.
         */
        while (!per_cpu(cpu_dead_idle, cpu))
            cpu_relax();
        smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
        per_cpu(cpu_dead_idle, cpu) = false;

        /* Interrupts are moved away from the dying cpu, reenable alloc/free */
        irq_unlock_sparse();

        hotplug_cpu__broadcast_tick_pull(cpu);
        /* This actually kills the CPU. */
        __cpu_die(cpu);  调用底层架构相关的cpu_kill回调函数。

        /* CPU is completely dead: tell everyone.  Too late to complain. */
        tick_cleanup_dead_cpu(cpu);
        cpu_notify_nofail(CPU_DEAD | mod, hcpu);  通知完成offline动作的处理器状态为CPU_DEAD。

        check_for_tasks(cpu);

    out_release:
        cpu_hotplug_done();  释放cpu_hotplug.lock锁。
        trace_sched_cpu_hotplug(cpu, err, 0);
        if (!err)
            cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
        return err;
    }

    /* Requires cpu_add_remove_lock to be held */
    static int _cpu_up(unsigned int cpu, int tasks_frozen)
    {
        int ret, nr_calls = 0;
        void *hcpu = (void *)(long)cpu;
        unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
        struct task_struct *idle;

        cpu_hotplug_begin();  如果没有其他进程占有,则退出,执行后面的工作;如果被占用,则设置这个进程为TASK_INTERRUPTIBLE,等待结束。

        if (cpu_online(cpu) || !cpu_present(cpu)) {  如果该CPU已经online,则没有必要执行up;或者非present,则无法up。
            ret = -EINVAL;
            goto out;
        }

        idle = idle_thread_get(cpu);  给指定CPU生成一个idle线程
        if (IS_ERR(idle)) {
            ret = PTR_ERR(idle);
            goto out;
        }

        ret = smpboot_create_threads(cpu);  创建一个用于管理CPU hotplug动作的线程
        if (ret)
            goto out;

        ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);  通知cpu_chain中的处理器,当前正在online的CPU状态为CPU_UP_PREPARE。
        if (ret) {
            nr_calls--;
            pr_warn("%s: attempt to bring up CPU %u failed ",
                __func__, cpu);
            goto out_notify;
        }

        /* Arch-specific enabling code. */
        ret = __cpu_up(cpu, idle);  调用更底层的使能CPU操作。

        if (ret != 0)
            goto out_notify;
        BUG_ON(!cpu_online(cpu));

        /* Now call notifier in preparation. */
        cpu_notify(CPU_ONLINE | mod, hcpu);  通知cpu_chanin中的处理器,目前online动作的处理器的状态为CPU_ONLINE。

    out_notify:
        if (ret != 0)
            __cpu_notify(CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);
    out:
        cpu_hotplug_done();  释放cpu_hotplug.lock锁。
        trace_sched_cpu_hotplug(cpu, ret, 1);

        return ret;
    }

     

    RCU synchronize

    RCU即Read-Copy Update,是Linux内核比较成熟的新型读写锁,具有较高的读写并发性能,常常用在需要互斥的关键性能路径。

    在Kernel中,有两种类型实现tiny和tree,tiny rcu更简洁,常用在小型嵌入式系统中;tree rcu被广泛用在了server、desktop、android中。

    RCU的和心理链式读者访问的同时,写者可以更新访问对象的副本,但写者需要等待所有读者完成访问之后,才能删除老对象。这个过程实现的关键和难点在于如何判断所有的读者已经完成访问。通常把写者开始更新,到所有读者完成访问这段时间叫做宽限期(Grace Period)。内核中实现宽限期等待的函数是synchronize_rcu。

    synchronize_rcu_mult同时在call_rcu()函数列表的宽限期上等待,知道所有的都结束。

    总结:cpu_chain和rcu sync耗时大部是由外界因素决定的,如果cpu_chain或者call_rcu()列表很多,或者里面回调函数特别耗时,都会拉长CPU_OFF/CPU_ON时间。这部分的优化特别离散。

    参考文档:

    RCU synchronize原理分析 http://www.wowotech.net/kernel_synchronization/223.html

    synchronize_rcu()函数详解 http://blog.chinaunix.net/uid-20648784-id-1592811.html

    如何确定一个函数耗费时间?

    在函数中添加以下ftrace,可以得到执行时的timestamp,进程名称,函数名和对应的行数。

    trace_suspend_resume(TPS(__func__), __LINE__, true);

     

    结果如下:

    223.502950 |   1)    sh-2832     |               |                  /* suspend_resume: CPU_ON[4] begin */
    223.502953 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[513] begin */
    223.502957 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[516] begin */
    223.502959 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[522] begin */
    223.502969 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[529] begin */
    223.502973 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[534] begin */
    223.529988 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[544] begin */
    223.530382 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[552] begin */
    223.531451 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[559] begin */
    223.531454 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[563] begin */
    223.531456 |   1)    sh-2832     |               |                  /* suspend_resume: CPU_ON[4] end */

    在Excel中打开,可以轻松算出时间差。可知Line 534到Line 544之前耗费了最多时间。

    image

    详情请参考:

    cpu hotplug的流程 http://blog.csdn.net/u013686805/article/details/46942469

    Linux CPU core的电源管理(5)_cpu control及cpu hotplug http://www.wowotech.net/pm_subsystem/cpu_hotplug.html

    8 参考文档

    Power Management Support in Hikey (suspend-resume):http://www.96boards.org/forums/topic/power-management-support-in-hikey-suspend-resume/#gsc.tab=0

    Suspend to Idle:http://www.linaro.org/blog/suspend-to-idle/

    Suspend and Resume:https://01.org/zh/suspendresume

    SuspendAndResume github:https://github.com/arnoldlu/suspendresume

    Linux电源管理(6)_Generic PM之Suspend功能:http://www.wowotech.net/pm_subsystem/suspend_and_resume.html

    联系方式:arnoldlu@qq.com
  • 相关阅读:
    Failed to load module "canberra-gtk-module"
    [Ubuntu18]桌面美化-仿MAC主题
    [Ubuntu]18自定义切换输入法的快捷键
    2016-2017-1 《信息安全系统设计基础》 学生博客及Git@OSC 链接
    2015-2016-2 《Java程序设计》 游戏化
    2015-2016-2 《Java程序设计》项目小组博客
    博客引用书单
    2015-2016-2 《网络攻防实践》 学生博客
    2015-2016-2 《网络攻击与防范》 学生博客
    《网络攻防技术与实践》学习指导
  • 原文地址:https://www.cnblogs.com/sky-heaven/p/10287266.html
Copyright © 2011-2022 走看看