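In our in-house module we use a dedicated kernel thread to do garbage collection (gc).
The module also uses RCU, and the RCU callback takes the same gc lock. The gc thread takes that lock with spin_lock rather than spin_lock_bh, so softirqs stay enabled while it holds the lock. If the RCU softirq then runs on the same CPU, its callback spins on a lock that the interrupted gc thread can never release, and the CPU deadlocks. The soft lockup report: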
[106251.128106] NMI watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [kflow_gcd:2791]
[106251.129425] Modules linked in: newsendfile(OE) witdriver(OE) mysendmsg(OE) xfs libcrc32c fuse tipc(OE) ossmod(OE) iptable_filter mptctl mptbase bonding dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support skx_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure joydev sg mei_me mei shpchp i2c_i801 lpc_ich wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_pad acpi_cpufreq acpi_power_meter tcp_bbr sch_fq binfmt_misc ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ahci libahci libata
[106251.129481] ixgbe(OE) i40e(OE) mpt3sas(OE) ptp raid_class pps_core scsi_transport_sas i2c_core dca [last unloaded: witdriver]
[106251.129492] CPU: 45 PID: 2791 Comm: kflow_gcd Tainted: G W OEL ------------ 3.10.0-693.21.1.el7.x86_64 #1
[106251.129493] Hardware name: ZTE ZXCDN/SC621DI-16F, BIOS 1.0b 09/21/2017
[106251.129496] task: ffff8839a2e70000 ti: ffff884f3fb34000 task.ti: ffff884f3fb34000
[106251.129498] RIP: 0010:[<ffffffff811005ce>] [<ffffffff811005ce>] native_queued_spin_lock_slowpath+0x1ce/0x200
[106251.129508] RSP: 0018:ffff885fbdf43dd0 EFLAGS: 00000202
[106251.129510] RAX: 0000000000000001 RBX: 000000000000060c RCX: 0000000000000001
[106251.129512] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff885fbdf59b60
[106251.129513] RBP: ffff885fbdf43dd0 R08: 0000000000000101 R09: 0000000000000000
[106251.129515] R10: ffff88607fbc84a0 R11: ffffea00fba2ae00 R12: ffff885fbdf43d48
[106251.129517] R13: ffffffff816c6732 R14: ffff885fbdf43dd0 R15: ffff8852e87ccf40
[106251.129519] FS: 0000000000000000(0000) GS:ffff885fbdf40000(0000) knlGS:0000000000000000
[106251.129521] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106251.129523] CR2: 00007f582cb9816c CR3: 0000000001a0a000 CR4: 00000000003607e0
[106251.129525] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106251.129527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106251.129528] Call Trace:
[106251.129530] <IRQ>
[106251.129539] [<ffffffff816adeee>] queued_spin_lock_slowpath+0xb/0xf
[106251.129546] [<ffffffff816bb080>] _raw_spin_lock+0x20/0x30
[106251.129555] [<ffffffffc055a3e5>] wit_inode_replace+0x3b5/0x400 [witdriver]
[106251.129561] [<ffffffff8119243f>] ? free_pages.part.80+0x3f/0x50
[106251.129567] [<ffffffffc055a534>] free_wit_flow+0x104/0x150 [witdriver]   <-- called from the RCU callback
[106251.129571] [<ffffffffc055a592>] release_wit_flow+0x12/0x20 [witdriver]
[106251.129577] [<ffffffff81140340>] rcu_process_callbacks+0x1e0/0x580
[106251.129583] [<ffffffff8109404d>] __do_softirq+0xfd/0x290
[106251.129588] [<ffffffff816c8afc>] call_softirq+0x1c/0x30
[106251.129594] [<ffffffff8102d435>] do_softirq+0x65/0xa0
[106251.129597] [<ffffffff81094495>] irq_exit+0x175/0x180
[106251.129600] [<ffffffff816c9e88>] smp_apic_timer_interrupt+0x48/0x60
[106251.129603] [<ffffffff816c6732>] apic_timer_interrupt+0x162/0x170
[106251.129604] <EOI>
[106251.129613] [<ffffffff81347655>] ? __list_del_entry+0x35/0xd0
[106251.129615] [<ffffffff813476fd>] list_del+0xd/0x30
[106251.129620] [<ffffffffc055fdea>] prune_wait_free_list+0x9a/0x1c0 [witdriver]   <-- called from process context (the gc kthread)
[106251.129624] [<ffffffffc055ff65>] kflow_gcd_fn+0x55/0x1e0 [witdriver]
[106251.129628] [<ffffffffc055ff10>] ? prune_wait_free_list+0x1c0/0x1c0 [witdriver]
[106251.129634] [<ffffffff810b5241>] kthread+0xd1/0xe0
[106251.129637] [<ffffffff810b5170>] ? insert_kthread_work+0x40/0x40
[106251.129641] [<ffffffff816c5577>] ret_from_fork+0x77/0xb0
[106251.129644] [<ffffffff810b5170>] ? insert_kthread_work+0x40/0x40
void wit_unref_flow(void *flow)
{
    struct wit_fq_flow *f = (struct wit_fq_flow *)flow;

    if (!witdriver_init_done || !f)
        return;
    if (atomic_dec_and_test(&f->users))
        call_rcu(&f->rcu_head, release_wit_flow);
}
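The deferral pattern in wit_unref_flow can be modeled in userspace. The following is only a minimal single-threaded sketch, not kernel code: every toy_* name is hypothetical, C11 atomic_fetch_sub stands in for atomic_dec_and_test, and a plain linked list stands in for the RCU callback queue. It shows that dropping the last reference only queues the release; the release itself runs later, from whatever context drains the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct wit_fq_flow. */
struct toy_flow {
    atomic_int users;              /* reference count */
    struct toy_flow *next_pending; /* link in the pending-callback list */
    bool released;                 /* set once the deferred release has run */
};

/* Flows whose release has been queued but not yet run (models call_rcu). */
static struct toy_flow *toy_pending;

/* Drop one reference; on the last one, queue the release instead of running it. */
static void toy_unref_flow(struct toy_flow *f)
{
    if (!f)
        return;
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref. */
    int old = atomic_fetch_sub(&f->users, 1);
    assert(old >= 1);
    if (old == 1) {
        f->next_pending = toy_pending;
        toy_pending = f;
    }
}

/* Drain the queue (models the softirq that invokes RCU callbacks).
 * Returns how many releases were run. */
static int toy_run_callbacks(void)
{
    int n = 0;
    while (toy_pending) {
        struct toy_flow *f = toy_pending;
        toy_pending = f->next_pending;
        f->released = true;
        n++;
    }
    return n;
}
```

The point of the model is the gap between toy_unref_flow and toy_run_callbacks: any lock the release takes is acquired in the draining context, not in the caller's.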
Callbacks queued with call_rcu are actually invoked in softirq context:
crash> irq -b
SOFTIRQ_VEC      ACTION
    [0]     ffffffff81092e40  <tasklet_hi_action>
    [1]     ffffffff8109bcf0  <run_timer_softirq>
    [2]     ffffffff81590d60  <net_tx_action>
    [3]     ffffffff81592b80  <net_rx_action>
    [4]     ffffffff81308250  <blk_done_softirq>
    [5]     ffffffff81358f00  <irq_poll_softirq>
    [6]     ffffffff81092d00  <tasklet_action>
    [7]     ffffffff810d6610  <run_rebalance_domains>
    [9]     ffffffff8113ea70  <rcu_process_callbacks>
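release_wit_flow contends with the gc path for the same lock. When the gc path holds the lock and is then interrupted, the RCU softirq runs on that CPU on interrupt exit and release_wit_flow spins on the lock. The gc thread cannot resume to release it, so the CPU deadlocks.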
So the gc path needs the _bh variant: take the lock with spin_lock_bh and release it with spin_unlock_bh.
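The difference between the two locking variants can be illustrated with a toy single-CPU model. This is a userspace sketch under simplifying assumptions, not kernel code: all toy_* names are hypothetical, the "lock" is a flag, and the point where the real kernel would spin forever is recorded as a deadlocked flag so the program can return.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; none of these are kernel APIs. */
struct toy_cpu {
    bool lock_held;       /* models the gc spinlock */
    int  bh_disabled;     /* models the softirq-disable depth */
    bool softirq_pending; /* models a queued RCU callback */
    bool deadlocked;      /* softirq hit a lock held by the task it interrupted */
    int  callbacks_run;
};

/* Models softirq processing: the callback needs the gc lock. */
static void toy_run_softirq(struct toy_cpu *c)
{
    if (!c->softirq_pending || c->bh_disabled > 0)
        return;                   /* nothing to do, or softirqs are masked */
    if (c->lock_held) {
        c->deadlocked = true;     /* the real kernel would spin here forever */
        return;
    }
    c->softirq_pending = false;
    c->callbacks_run++;
}

/* Models a timer interrupt: softirqs are processed on interrupt exit. */
static void toy_timer_interrupt(struct toy_cpu *c)
{
    toy_run_softirq(c);
}

/* use_bh = false models spin_lock, use_bh = true models spin_lock_bh. */
static void toy_lock(struct toy_cpu *c, bool use_bh)
{
    if (use_bh)
        c->bh_disabled++;
    c->lock_held = true;
}

static void toy_unlock(struct toy_cpu *c, bool use_bh)
{
    assert(c->lock_held);
    c->lock_held = false;
    if (use_bh) {
        c->bh_disabled--;
        toy_run_softirq(c);       /* deferred softirq runs once re-enabled */
    }
}

/* One gc pass interrupted while holding the lock.
 * Returns true if the model deadlocks. */
static bool toy_gc_pass(bool use_bh, int *callbacks_run)
{
    struct toy_cpu c = {0};
    c.softirq_pending = true;
    toy_lock(&c, use_bh);
    toy_timer_interrupt(&c);      /* interrupt arrives inside the critical section */
    toy_unlock(&c, use_bh);
    *callbacks_run = c.callbacks_run;
    return c.deadlocked;
}
```

With use_bh set to false the callback runs into the held lock; with use_bh set to true it is deferred until the unlock and then completes. Only the process-context side needs the _bh variant in this model, because the softirq side cannot be interrupted by itself on the same CPU.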
Note that the softirq listing above is clearly missing entry [8]. What does vector 8 correspond to?
crash> p softirq_to_name
softirq_to_name = $5 =
 {0xffffffff8195816a "HI", 0xffffffff818f79c0 "TIMER", 0xffffffff81914a4c "NET_TX", 0xffffffff81914a53 "NET_RX", 0xffffffff819547d4 "BLOCK", 0xffffffff81914a8a "BLOCK_IOPOLL", 0xffffffff81914a63 "TASKLET", 0xffffffff81914a6b "SCHED", 0xffffffff81914a71 "HRTIMER", 0xffffffff81914a79 "RCU"}
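It looks like [8] corresponds to HRTIMER. On this kernel, however, hrtimers are handled in hard-interrupt context rather than in a softirq, so this vector appears to be obsolete.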
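The lookup done by hand above can be written out as a small userspace sketch. The tables below are copied from the two crash outputs in this post; the toy_* names are hypothetical and nothing here reads a live kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_SOFTIRQS 10

/* Names as printed by "p softirq_to_name" above. */
static const char *const toy_softirq_names[TOY_NR_SOFTIRQS] = {
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

/* Vector numbers that "irq -b" listed with a handler. */
static const int toy_registered[] = { 0, 1, 2, 3, 4, 5, 6, 7, 9 };

/* Return the lowest vector number with no handler listed, or -1 if none. */
static int toy_first_unregistered(void)
{
    size_t count = sizeof(toy_registered) / sizeof(toy_registered[0]);

    for (int nr = 0; nr < TOY_NR_SOFTIRQS; nr++) {
        int found = 0;
        for (size_t i = 0; i < count; i++) {
            if (toy_registered[i] == nr)
                found = 1;
        }
        if (!found)
            return nr;
    }
    return -1;
}

/* Map a vector number to its name. */
static const char *toy_softirq_name(int nr)
{
    assert(nr >= 0 && nr < TOY_NR_SOFTIRQS);
    return toy_softirq_names[nr];
}
```

Calling toy_softirq_name(toy_first_unregistered()) on these tables yields the HRTIMER entry, matching the reading above.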
hrtimers are the kernel's high-resolution timers. In the RH 3.10 kernel they are driven from hard-interrupt context:

void __init hrtimers_init(void)
{
    hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
                       (void *)(long)smp_processor_id());
    register_cpu_notifier(&hrtimers_nb);
}

The corresponding interrupt-handling stack looks like this:

[ 6401.459859] [<ffffffff816b9d53>] _raw_spin_lock_bh+0x33/0x40
[ 6401.466087] [<ffffffffc0358971>] wit_hrtimer_pool_notify+0x181/0x3c0 [witdriver]   <-- our timer callback
[ 6401.474060] [<ffffffffc03587f0>] ? wit_send_tasklet+0x590/0x590 [witdriver]
[ 6401.481600] [<ffffffff810b9096>] __hrtimer_run_queues+0xd6/0x260
[ 6401.488187] [<ffffffff810b962f>] hrtimer_interrupt+0xaf/0x1d0
[ 6401.494566] [<ffffffff8105467b>] local_apic_timer_interrupt+0x3b/0x60
[ 6401.501625] [<ffffffff816c8e83>] smp_apic_timer_interrupt+0x43/0x60
[ 6401.508511] [<ffffffff816c5732>] apic_timer_interrupt+0x162/0x170