Bug Hunting Diary 0015: Plugging a Cable into the Freescale 8360e Gadget Port Causes a Kernel Panic

From Jack's Lab

1 Phenomenon

Environment:

  • Freescale 8360e
  • Linux 2.6.34.6

After upgrading the product kernel from 2.6.27 to 2.6.34.6, I booted the kernel and manually loaded the Gadget Ether module g_ether.ko. The load succeeded and printed:

$ dmesg | tail

g_ether gadget: using random self ethernet address
g_ether gadget: using random host ethernet address
usb0: MAC 8a:13:2e:1b:03:4f
usb0: HOST MAC 4a:25:7f:fd:8f:c4
g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
g_ether gadget: g_ether ready
fsl_qe_udc e01006c0.usb: fsl_qe_udc bind to driver g_ether 


But as soon as the USB cable was plugged into a PC, the kernel panicked:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc02a0d04
Oops: Kernel access of bad area, sig: 11 [#1]
NIP: c02a0d04 LR: c02a0cd8 CTR: c029976c
REGS: c051bcc0 TRAP: 0300   Not tainted  (2.6.34.6-WR4.0.0.0_standard)
MSR: 00001032 <ME,IR,DR>  CR: 22008042  XER: 20000000
DAR: 00000000, DSISR: 20000000
TASK = c04ee450[0] 'swapper' THREAD: c051a000
GPR00: 00000003 c051bd70 c04ee450 0000003f 000022f5 ffffffff c0260f98 0000000
GPR08: 00000000 cf9679c0 00000000 c051a000 22008044 9a6c7e8a 00000001 2000000
GPR16: 40000000 00800000 00400000 cf98c0f0 0c000000 cf98c0d8 00000000 0000000
GPR24: c050d560 00000000 cf98c000 00000000 c050d560 cf98c000 cf9785c0 cf97860
NIP [c02a0d04] composite_setup+0xa68/0xb58
LR [c02a0cd8] composite_setup+0xa3c/0xb58
Call Trace:
[c051bd70] [c02a0cd8] composite_setup+0xa3c/0xb58 (unreliable)
[c051bdb0] [c029d23c] qe_udc_irq+0xbbc/0xe10
[c051be20] [c0092e48] handle_IRQ_event+0xb8/0x30c
[c051be70] [c0095f7c] handle_level_irq+0xb8/0x184
[c051be90] [c0030138] qe_ic_cascade_low_ipic+0x3c/0x50
[c051bea0] [c00064e4] native_do_IRQ+0x98/0xb4
[c051bec0] [c0005164] do_IRQ+0x10/0x20
[c051bed0] [c0015bb4] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0x88/0xec
    LR = cpu_idle+0x88/0xec
[c051bf90] [c0009b78] cpu_idle+0xe8/0xec (unreliable)
[c051bfb0] [c0003e90] rest_init+0xb0/0xe0
[c051bfc0] [c04b3890] start_kernel+0x304/0x318
[c051bff0] [00003438] 0x3438
Instruction dump:
4812b041 939e000c 3ac00000 3b200000 3ae00001 81380034 2f890000 419e00c4
801a0010 2f800003 419e00a0 81090008 <81680000> 3949003c 2f8b0000 419e0040
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[c051bc00] [c00089b0] show_stack+0x50/0x160 (unreliable)
[c051bc30] [c03cbc94] panic+0x128/0x1a8
[c051bc80] [c0012eb8] die+0x168/0x224
[c051bca0] [c0018fb0] bad_page_fault+0x90/0xc8
[c051bcb0] [c00159b8] handle_page_fault+0x7c/0x80
--- Exception: 300 at composite_setup+0xa68/0xb58
    LR = composite_setup+0xa3c/0xb58
[c051bdb0] [c029d23c] qe_udc_irq+0xbbc/0xe10
[c051be20] [c0092e48] handle_IRQ_event+0xb8/0x30c
[c051be70] [c0095f7c] handle_level_irq+0xb8/0x184
[c051be90] [c0030138] qe_ic_cascade_low_ipic+0x3c/0x50
[c051bea0] [c00064e4] native_do_IRQ+0x98/0xb4
[c051bec0] [c0005164] do_IRQ+0x10/0x20
[c051bed0] [c0015bb4] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0x88/0xec
    LR = cpu_idle+0x88/0xec
[c051bf90] [c0009b78] cpu_idle+0xe8/0xec (unreliable)
[c051bfb0] [c0003e90] rest_init+0xb0/0xe0
[c051bfc0] [c04b3890] start_kernel+0x304/0x318
[c051bff0] [00003438] 0x3438
Rebooting in 180 seconds..



2 Analysis

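This is clearly a NULL pointer dereference: the faulting data address (DAR) is 0x00000000. The NIP value gives the address of the faulting instruction, 0xc02a0d04.

Disassembling the kernel locates that instruction: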

c02a0cfc:       41 9e 00 a0     beq-    cr7,c02a0d9c <composite_setup+0xb00>
c02a0d00:       81 09 00 08     lwz     r8,8(r9)
c02a0d04:       81 68 00 00     lwz     r11,0(r8)
c02a0d08:       39 49 00 3c     addi    r10,r9,60
c02a0d0c:       2f 8b 00 00     cmpwi   cr7,r11,0
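The word bracketed in the Oops "Instruction dump" (<81680000>) is the faulting instruction, so it can be decoded by hand as a cross-check against the disassembly. The following is a minimal sketch, assuming the standard PowerPC D-form layout (6-bit primary opcode, two 5-bit register fields, 16-bit signed displacement); the names `dform` and `decode_dform` are made up for this illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction such as "lwz rt,d(ra)". */
struct dform {
    unsigned opcode; /* primary opcode (top 6 bits); 32 is lwz */
    unsigned rt;     /* target register number */
    unsigned ra;     /* base register number */
    int      d;      /* sign-extended 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
struct dform decode_dform(uint32_t insn)
{
    struct dform f;
    uint32_t raw = insn & 0xffffu;

    f.opcode = insn >> 26;
    f.rt = (insn >> 21) & 0x1fu;
    f.ra = (insn >> 16) & 0x1fu;
    /* Sign-extend the displacement without relying on narrowing casts. */
    f.d = (raw & 0x8000u) ? (int)raw - 0x10000 : (int)raw;
    return f;
}
```

Decoding 0x81680000 this way yields opcode 32 with rt = 11, ra = 8, d = 0, that is, lwz r11,0(r8). The register dump shows GPR08 as 00000000, which agrees with the DAR of 00000000.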


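The instruction is in composite_setup(). In drivers/usb/gadget/composite.c the function body is large, so it is hard to pin down the exact C statement that faults. The next step is to disassemble g_ether.o with objdump -S -d (-S tells objdump to interleave the original C source with the output, which requires an object built with debug info; the source of the whole kernel is huge, so we target just this one small file), which gives: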

                if (gadget->speed == USB_SPEED_HIGH)
    3800:       80 1a 00 10     lwz     r0,16(r26)
    3804:       2f 80 00 03     cmpwi   cr7,r0,3
    3808:       41 9e 00 a0     beq-    cr7,38a8 <composite_setup+0xb00>
                        descriptors = f->hs_descriptors;
                else
                        descriptors = f->descriptors;
    380c:       81 09 00 08     lwz     r8,8(r9)

                for (; *descriptors; ++descriptors) {
    3810:       81 68 00 00     lwz     r11,0(r8)
                                continue;

                        ep = (struct usb_endpoint_descriptor *)*descriptors;
                        addr = ((ep->bEndpointAddress & 0x80) >> 3)
                             |  (ep->bEndpointAddress & 0x0f);

                        set_bit(addr, f->endpoints);


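g_ether.o has no absolute addresses, so the only way to line it up with the kernel disassembly is by instruction encoding. The faulting source line turns out to be 'for (; *descriptors; ++descriptors)'. The fault is obviously the *descriptors dereference, so the next step is to look upward and find out why descriptors was never given a valid value.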
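As a cross-check on the encoding match, the symbol+offset notation in both listings can be used to translate addresses arithmetically. The sketch below is only an illustration with hypothetical helper names (`function_start`, `object_address`); it assumes the function has the same layout in vmlinux and in g_ether.o, which the identical +0xb00 branch target in both listings suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given an address inside a function and the "+0x..." offset printed
 * next to the symbol, return the address where the function starts.
 */
uint32_t function_start(uint32_t addr, uint32_t offset)
{
    assert(addr >= offset);
    return addr - offset;
}

/*
 * Translate an offset within a function into an address in the
 * unlinked object listing, given any labelled address in that listing.
 */
uint32_t object_address(uint32_t obj_addr, uint32_t obj_offset,
                        uint32_t wanted_offset)
{
    return function_start(obj_addr, obj_offset) + wanted_offset;
}
```

With the values from the listings: the object file labels 38a8 as composite_setup+0xb00, so the function starts at 0x38a8 - 0xb00 = 0x2da8 there, and the faulting offset 0xa68 from the Oops lands at 0x2da8 + 0xa68 = 0x3810, the same lwz r11,0(r8) found by matching encodings.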


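After adding a few printk calls, I quickly found that gadget->speed was set to the wrong mode, so descriptors picked up the wrong value: it should have been f->descriptors rather than f->hs_descriptors.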
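The failure mode can be reproduced in user space with a small model of the selection logic. This is a sketch under stated assumptions, not the kernel code: the struct and function names (`function_sketch`, `count_descriptors`) are hypothetical, and the NULL guard is added only so the model reports the problem instead of crashing. It is not the fix described above, which concerns gadget->speed being reported correctly.

```c
#include <stddef.h>

/* Speed values as compared in the listing (high speed is 3). */
enum { SPEED_FULL_SKETCH = 2, SPEED_HIGH_SKETCH = 3 };

/* Stand-in for a USB descriptor header; contents are irrelevant here. */
struct desc_sketch {
    unsigned char bLength;
    unsigned char bDescriptorType;
};

/* Stand-in for the two descriptor arrays a function provides. */
struct function_sketch {
    struct desc_sketch **descriptors;    /* full speed, NULL-terminated */
    struct desc_sketch **hs_descriptors; /* high speed, may be NULL */
};

/*
 * Pick the descriptor array by speed and walk it, as the loop in the
 * listing does. Returns the number of entries, or -1 if the selected
 * array is NULL (the case that faults at *descriptors in the Oops).
 */
int count_descriptors(const struct function_sketch *f, int speed)
{
    struct desc_sketch **descriptors;
    int n = 0;

    if (speed == SPEED_HIGH_SKETCH)
        descriptors = f->hs_descriptors;
    else
        descriptors = f->descriptors;

    if (descriptors == NULL) /* guard added for this model only */
        return -1;

    for (; *descriptors; ++descriptors)
        n++;
    return n;
}
```

With a function that only provides full-speed descriptors, a wrongly reported high speed selects the NULL hs_descriptors array, which matches the zero base register seen at the faulting load.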














