Bug-Hunting Diary 0015: Plugging in the USB cable on a Freescale 8360e gadget triggers a kernel panic
From Jack's Lab
Latest revision as of 16:59, 19 May 2014 (Monday)
1 Phenomenon
Environment:
- Freescale 8360e
- Linux 2.6.34.6
The product kernel was upgraded from 2.6.27 to 2.6.34.6. After the kernel booted, the Gadget Ether module g_ether.ko was loaded manually; once it loaded successfully, the log showed:
    $ dmesg | tail
    g_ether gadget: using random self ethernet address
    g_ether gadget: using random host ethernet address
    usb0: MAC 8a:13:2e:1b:03:4f
    usb0: HOST MAC 4a:25:7f:fd:8f:c4
    g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
    g_ether gadget: g_ether ready
    fsl_qe_udc e01006c0.usb: fsl_qe_udc bind to driver g_ether
But as soon as the USB cable was plugged into a PC, the kernel panicked:
    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xc02a0d04
    Oops: Kernel access of bad area, sig: 11 [#1]
    NIP: c02a0d04 LR: c02a0cd8 CTR: c029976c
    REGS: c051bcc0 TRAP: 0300 Not tainted (2.6.34.6-WR4.0.0.0_standard)
    MSR: 00001032 <ME,IR,DR> CR: 22008042 XER: 20000000
    DAR: 00000000, DSISR: 20000000
    TASK = c04ee450[0] 'swapper' THREAD: c051a000
    GPR00: 00000003 c051bd70 c04ee450 0000003f 000022f5 ffffffff c0260f98 0000000
    GPR08: 00000000 cf9679c0 00000000 c051a000 22008044 9a6c7e8a 00000001 2000000
    GPR16: 40000000 00800000 00400000 cf98c0f0 0c000000 cf98c0d8 00000000 0000000
    GPR24: c050d560 00000000 cf98c000 00000000 c050d560 cf98c000 cf9785c0 cf97860
    NIP [c02a0d04] composite_setup+0xa68/0xb58
    LR [c02a0cd8] composite_setup+0xa3c/0xb58
    Call Trace:
    [c051bd70] [c02a0cd8] composite_setup+0xa3c/0xb58 (unreliable)
    [c051bdb0] [c029d23c] qe_udc_irq+0xbbc/0xe10
    [c051be20] [c0092e48] handle_IRQ_event+0xb8/0x30c
    [c051be70] [c0095f7c] handle_level_irq+0xb8/0x184
    [c051be90] [c0030138] qe_ic_cascade_low_ipic+0x3c/0x50
    [c051bea0] [c00064e4] native_do_IRQ+0x98/0xb4
    [c051bec0] [c0005164] do_IRQ+0x10/0x20
    [c051bed0] [c0015bb4] ret_from_except+0x0/0x14
    --- Exception: 501 at cpu_idle+0x88/0xec
        LR = cpu_idle+0x88/0xec
    [c051bf90] [c0009b78] cpu_idle+0xe8/0xec (unreliable)
    [c051bfb0] [c0003e90] rest_init+0xb0/0xe0
    [c051bfc0] [c04b3890] start_kernel+0x304/0x318
    [c051bff0] [00003438] 0x3438
    Instruction dump:
    4812b041 939e000c 3ac00000 3b200000 3ae00001 81380034 2f890000 419e00c4
    801a0010 2f800003 419e00a0 81090008 <81680000> 3949003c 2f8b0000 419e0040
    Kernel panic - not syncing: Fatal exception in interrupt
    Call Trace:
    [c051bc00] [c00089b0] show_stack+0x50/0x160 (unreliable)
    [c051bc30] [c03cbc94] panic+0x128/0x1a8
    [c051bc80] [c0012eb8] die+0x168/0x224
    [c051bca0] [c0018fb0] bad_page_fault+0x90/0xc8
    [c051bcb0] [c00159b8] handle_page_fault+0x7c/0x80
    --- Exception: 300 at composite_setup+0xa68/0xb58
        LR = composite_setup+0xa3c/0xb58
    [c051bdb0] [c029d23c] qe_udc_irq+0xbbc/0xe10
    [c051be20] [c0092e48] handle_IRQ_event+0xb8/0x30c
    [c051be70] [c0095f7c] handle_level_irq+0xb8/0x184
    [c051be90] [c0030138] qe_ic_cascade_low_ipic+0x3c/0x50
    [c051bea0] [c00064e4] native_do_IRQ+0x98/0xb4
    [c051bec0] [c0005164] do_IRQ+0x10/0x20
    [c051bed0] [c0015bb4] ret_from_except+0x0/0x14
    --- Exception: 501 at cpu_idle+0x88/0xec
        LR = cpu_idle+0x88/0xec
    [c051bf90] [c0009b78] cpu_idle+0xe8/0xec (unreliable)
    [c051bfb0] [c0003e90] rest_init+0xb0/0xe0
    [c051bfc0] [c04b3890] start_kernel+0x304/0x318
    [c051bff0] [00003438] 0x3438
    Rebooting in 180 seconds..
2 Analysis
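This is clearly a NULL pointer dereference: the faulting data address (DAR) is 0x00000000. The NIP value places the faulting instruction at 0xc02a0d04.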
Disassembling the kernel locates the instruction:
    c02a0cfc: 41 9e 00 a0 beq- cr7,c02a0d9c <composite_setup+0xb00>
    c02a0d00: 81 09 00 08 lwz r8,8(r9)
    c02a0d04: 81 68 00 00 lwz r11,0(r8)
    c02a0d08: 39 49 00 3c addi r10,r9,60
    c02a0d0c: 2f 8b 00 00 cmpwi cr7,r11,0
The instruction is inside composite_setup(). Looking at drivers/usb/gadget/composite.c, the function body is fairly large, so it is hard to pin down the exact C statement that faults. Disassembling once more, this time running objdump -S -d on g_ether.o (-S makes objdump interleave the original C source with the disassembly; the kernel source as a whole is huge, so we target just this one small file), gives:
        if (gadget->speed == USB_SPEED_HIGH)
    3800: 80 1a 00 10 lwz r0,16(r26)
    3804: 2f 80 00 03 cmpwi cr7,r0,3
    3808: 41 9e 00 a0 beq- cr7,38a8 <composite_setup+0xb00>
            descriptors = f->hs_descriptors;
        else
            descriptors = f->descriptors;
    380c: 81 09 00 08 lwz r8,8(r9)
        for (; *descriptors; ++descriptors) {
    3810: 81 68 00 00 lwz r11,0(r8)
                continue;
            ep = (struct usb_endpoint_descriptor *)*descriptors;
            addr = ((ep->bEndpointAddress & 0x80) >> 3) | (ep->bEndpointAddress & 0x0f);
            set_bit(addr, f->endpoints);
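g_ether.o contains no absolute addresses, so the only way to match it against the kernel is by instruction encoding. The faulting source line turns out to be 'for (; *descriptors; ++descriptors) '. It is immediately obvious that the dereference *descriptors is what faults, so the next step is to look further up and find out why descriptors was never given a valid value.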
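A few added printk calls quickly showed that gadget->speed was being set to the wrong mode, so descriptors picked up the wrong value: it should have been f->descriptors, not f->hs_descriptors. This is consistent with the register dump, where GPR00 holds 00000003, the value that the listing compares against for USB_SPEED_HIGH.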