环境
CPU:Phytium,S2500/64 C00
内核版本:4.19.90-25.10
网讯网卡:txgbe
共两台设备,光纤直连
复现步骤
设备A、B分别执行以下操作,即可复现
modprobe fcoe
systemctl start lldpad
systemctl start fcoe
总结
重启问题是SCSI存储模块libfcoe中fcoe_ctlr_timer_work(drivers/scsi/fcoe/fcoe_ctlr.c)函数访问了非法的内存地址,地址异常原因是编码问题导致,使用结构体强制赋值而忽略了list指针成员的值。
调试记录
查看内核日志
麒麟4.19.90 25.10版本,开启CONFIG_FCOE后,加载txgbe系统发生重启,查看系统日志未发现异常信息,需要抓取串口信息
获取串口信息
串口信息抓取失败,内核在bios阶段串口日志正常,但内核阶段无输出,根据日志可知console enable时ttyS1还未创建。(4.19.90 25.10 51.40现象一样)
[root@compute ~]# dmesg | grep tty
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.19.90-51.0.v2207.fortest.ky10.aarch64 root=/dev/mapper/klas-root ro crashkernel=auto rd.lvm.lv=klas/root rd.lvm.lv=klas/swap acpi=on video=VGA-1:640x480-32@60me smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 crashkernel=1024M,high video=efifb:off video=VGA-1:640x480-32@60me console=ttyS1,115200 loglevel=7
[ 14.695352] 00:02: ttyS0 at MMIO 0x200002f8 (irq = 0, base_baud = 115200) is a 16550A
[ 15.478192] console [ttyS0] enabled
[ 15.479627] HISI0031:00: ttyS1 at MMIO 0x28001000 (irq = 7, base_baud = 3125000) is a 16550A
[ 90.445804] audit: type=1300 audit(1676610550.480:126): arch=c00000b7 syscall=105 success=yes exit=0 a0=aaaaf7e3bcb0 a1=2ab7 a2=aaaae0aff1c8 a3=aaaaf7e322d0 items=0 ppid=3107 pid=5502 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="modprobe" exe="/usr/bin/kmod" key=(null)
[ 182.268682] audit: type=1006 audit(1676610660.130:141): pid=5598 uid=0 old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=1 res=1
[ 333.116270] audit: type=1006 audit(1676610810.970:231): pid=6326 uid=0 old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=3 res=1
使用kdump
kdump部署命令:
yum install -y kexec-tools
systemctl restart kdump.service
kdumpctl restart
4.19.90 25.10 kdump 宕机
4.19.90 51.40 kdump 测试正常
kdump日志:
1950 [ 811.607882] Unable to handle kernel paging request at virtual address fffffffffffffed8
1951 [ 811.608613] Mem abort info:
1952 [ 811.608882] ESR = 0x96000005
1953 [ 811.609175] Exception class = DABT (current EL), IL = 32 bits
1954 [ 811.609697] SET = 0, FnV = 0
1955 [ 811.609989] EA = 0, S1PTW = 0
1956 [ 811.610285] Data abort info:
1957 [ 811.610576] ISV = 0, ISS = 0x00000005
1958 [ 811.610933] CM = 0, WnR = 0
1959 [ 811.611219] swapper pgtable: 64k pages, 48-bit VAs, pgdp = 000000003e78cc5f
1960 [ 811.611855] [fffffffffffffed8] pgd=0000000000000000, pud=0000000000000000
1961 [ 811.612454] Internal error: Oops: 96000005 [#1] SMP
1962 [ 811.612930] Modules linked in: qedf qed crc8 fcoe libfcoe libfc scsi_transport_fc xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_f ilter ip6_tables iptable_filter tun ebtable_filter ebtable_nat ebtables iptable_raw iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi br_netfilter bridge 8021q garp mrp ipmi_ssif stp rfkill llc sunrpc vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif _ce ghash_ce joydev ch341 ses sha2_ce usbserial sha256_arm64 enclosure txgbe ngbe sha1_ce sbsa_gwdt ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ip_tables megaraid_sas ast dm_mirror dm_region_hash dm_log gb
1963 [ 811.618343] Process kworker/6:2 (pid: 1264, s