freeipmi-devel

ipmi-dcmi: page allocation failure


From: Benedikt Braunger
Subject: ipmi-dcmi: page allocation failure
Date: Wed, 28 Jul 2021 18:44:33 +0200

Greetings!

I have a bunch of Supermicro servers which serve large ZFS storage. Once
in a while they report a strange page allocation failure when the
monitoring system uses ipmi-dcmi.
No out-of-memory situation is visible and I cannot reproduce the error on
purpose yet.
All I have found about this so far is an NVIDIA forum thread which
describes a similar error message during high I/O:
https://forums.developer.nvidia.com/t/455-23-04-page-allocation-failure-in-kernel-module-at-random-points/155250
I can confirm that the errors appear when the server does a lot of I/O,
several hundred MiB/s.

I cannot see any actual problem for our workload on these machines, but
I still wanted to report this in case the information is useful to anyone
else.

Regards,
Beni


Further information about the system, the OS and the trace:

product: X11SPi-TF
vendor: Supermicro

Firmware Revision: 01.73.03
Firmware Build Time: 06/30/2020
BIOS Version: 3.3
BIOS Build Time: 02/21/2020

CentOS 8.4
Kernel 4.18.0-193.el8.x86_64

Jun  8 04:55:40 bck-srv ipmi_exporter[3724813]:
time="2021-06-08T04:55:40+02:00" level=error msg="Error while calling
ipmi-dcmi for [local]: ipmi_ctx_find_inband: out of memory\n"
source="collector.go:278"

Jun 08 04:55:40 bck-srv011201 kernel: ipmi-dcmi: page allocation
failure: order:4, mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Jun 08 04:55:40 bck-srv kernel: ipmi-dcmi cpuset=/ mems_allowed=0
Jun 08 04:55:40 bck-srv kernel: CPU: 4 PID: 3348161 Comm: ipmi-dcmi
Tainted: P           OE    --------- -  - 4.18.0-193.el8.x86_64 #1
Jun 08 04:55:40 bck-srv kernel: Hardware name: Supermicro Super
Server/X11SPi-TF, BIOS 3.3 02/21/2020
Jun 08 04:55:40 bck-srv kernel: Call Trace:
Jun 08 04:55:40 bck-srv kernel:  dump_stack+0x5c/0x80
Jun 08 04:55:40 bck-srv kernel:  warn_alloc.cold.123+0x6f/0x101
Jun 08 04:55:40 bck-srv kernel:  ? _cond_resched+0x15/0x30
Jun 08 04:55:40 bck-srv kernel:  ? __alloc_pages_direct_compact+0x128/0x130
Jun 08 04:55:40 bck-srv kernel:  __alloc_pages_slowpath+0xccd/0xd00
Jun 08 04:55:40 bck-srv kernel:  ? enqueue_entity+0xf6/0x630
Jun 08 04:55:40 bck-srv kernel:  __alloc_pages_nodemask+0x245/0x280
Jun 08 04:55:40 bck-srv kernel:  kmalloc_order+0x14/0x30
Jun 08 04:55:40 bck-srv kernel:  kmalloc_order_trace+0x1d/0xa0
Jun 08 04:55:40 bck-srv kernel:  ipmi_create_user+0x5a/0x1e0
[ipmi_msghandler]
Jun 08 04:55:40 bck-srv kernel:  ? _cond_resched+0x15/0x30
Jun 08 04:55:40 bck-srv kernel:  ? kmem_cache_alloc_trace+0x140/0x1c0
Jun 08 04:55:40 bck-srv kernel:  ipmi_open+0x4d/0xd0 [ipmi_devintf]
Jun 08 04:55:40 bck-srv kernel:  chrdev_open+0xcb/0x1e0
Jun 08 04:55:40 bck-srv kernel:  ? cdev_default_release+0x20/0x20
Jun 08 04:55:40 bck-srv kernel:  do_dentry_open+0x132/0x330
Jun 08 04:55:40 bck-srv kernel:  path_openat+0x573/0x14d0
Jun 08 04:55:40 bck-srv kernel:  ? do_page_fault+0x32/0x110
Jun 08 04:55:40 bck-srv kernel:  do_filp_open+0x93/0x100
Jun 08 04:55:40 bck-srv kernel:  ? strncpy_from_user+0x7c/0x1b0
Jun 08 04:55:40 bck-srv kernel:  do_sys_open+0x184/0x220
Jun 08 04:55:40 bck-srv kernel:  do_syscall_64+0x5b/0x1a0
Jun 08 04:55:40 bck-srv kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Jun 08 04:55:40 bck-srv kernel: RIP: 0033:0x7fe8708c904f
Jun 08 04:55:40 bck-srv kernel: Code: 52 89 f0 25 00 00 41 00 3d 00 00
41 00 74 44 8b 05 06 d4 20 00 85 c0 75 65 89 f2 b8 01 01 00 00 48 89 fe
bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 9d 00 00 00 48 8b 4c 24
28 64 48 33 0c 25
Jun 08 04:55:40 bck-srv kernel: RSP: 002b:00007ffed94225c0 EFLAGS:
00000246 ORIG_RAX: 0000000000000101
Jun 08 04:55:40 bck-srv kernel: RAX: ffffffffffffffda RBX:
000055d126e47ec0 RCX: 00007fe8708c904f
Jun 08 04:55:40 bck-srv kernel: RDX: 0000000000000002 RSI:
00007fe870cc3720 RDI: 00000000ffffff9c
Jun 08 04:55:40 bck-srv kernel: RBP: 000055d126e47ea0 R08:
000055d126e481f0 R09: 000055d126e47eb0
Jun 08 04:55:40 bck-srv kernel: R10: 0000000000000000 R11:
0000000000000246 R12: 0000000000000000
Jun 08 04:55:40 bck-srv kernel: R13: 0000000000000000 R14:
0000000000000000 R15: 0000000000000007
Jun 08 04:55:40 bck-srv kernel: warn_alloc_show_mem: 3 callbacks suppressed
Jun 08 04:55:40 bck-srv kernel: Mem-Info:
Jun 08 04:55:40 bck-srv kernel: active_anon:665748 inactive_anon:440411
isolated_anon:0
                                       active_file:153083
inactive_file:92603 isolated_file:0
                                       unevictable:0 dirty:41
writeback:0 unstable:0
                                       slab_reclaimable:75153
slab_unreclaimable:392945
                                       mapped:62785 shmem:1044900
pagetables:2392 bounce:0
                                       free:4211681 free_pcp:0 free_cma:0
Jun 08 04:55:40 bck-srv kernel: Node 0 active_anon:2662992kB
inactive_anon:1761644kB active_file:612332kB inactive_file:370412kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:251140kB
dirty:164kB writeback:0kB shmem:4179600kB shmem_thp: 0kB
shmem_pmdmapped: 0kB>
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA free:15360kB min:8kB low:20kB
high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:15964kB
managed:15360kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB
free_>
Jun 08 04:55:40 bck-srv kernel: lowmem_reserve[]: 0 1336 95014 95014 95014
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA32 free:374900kB min:948kB
low:2316kB high:3684kB active_anon:47800kB inactive_anon:30864kB
active_file:29832kB inactive_file:24268kB unevictable:0kB
writepending:0kB present:1725332kB managed:1397652kB mlocked:0kB
kernel_stack:0kB pa>
Jun 08 04:55:40 bck-srv kernel: lowmem_reserve[]: 0 0 93677 93677 93677
Jun 08 04:55:40 bck-srv kernel: Node 0 Normal free:16456464kB
min:66620kB low:162544kB high:258468kB active_anon:2615192kB
inactive_anon:1730780kB active_file:582500kB inactive_file:346144kB
unevictable:0kB writepending:164kB present:97517568kB managed:95932772kB
mlocked:0kB >
Jun 08 04:55:40 bck-srv kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB
0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) =
15360kB
Jun 08 04:55:40 bck-srv kernel: Node 0 DMA32: 57*4kB (UM) 180*8kB (UM)
131*16kB (UM) 68*32kB (UM) 39*64kB (UM) 25*128kB (UM) 17*256kB (UM)
13*512kB (M) 10*1024kB (M) 3*2048kB (UM) 82*4096kB (UM) = 374900kB
Jun 08 04:55:40 bck-srv kernel: Node 0 Normal: 9421*4kB (UME) 318223*8kB
(UME) 368527*16kB (UME) 249283*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 16456956kB
Jun 08 04:55:40 bck-srv kernel: Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 08 04:55:40 bck-srv kernel: Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 08 04:55:40 bck-srv kernel: 1263502 total pagecache pages
Jun 08 04:55:40 bck-srv kernel: 13 pages in swap cache
Jun 08 04:55:40 bck-srv kernel: Swap cache stats: add 349058, delete
349045, find 620/1882
Jun 08 04:55:40 bck-srv kernel: Free swap  = 8374524kB
Jun 08 04:55:40 bck-srv kernel: Total swap = 8388604kB
Jun 08 04:55:40 bck-srv kernel: 24814716 pages RAM
Jun 08 04:55:40 bck-srv kernel: 0 pages HighMem/MovableOnly
Jun 08 04:55:40 bck-srv kernel: 478270 pages reserved
Jun 08 04:55:40 bck-srv kernel: 0 pages hwpoisoned
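
A note on reading the trace above: the failed request is order:4, i.e. 2^4 = 16 contiguous pages, which is 64 kB if the base page size is 4 kB (an assumption, though it is the usual case on x86_64). The per-zone free lists in the log show that Node 0 Normal has about 16 GB free but no free block of 64 kB or larger, which looks like fragmentation rather than a real shortage of memory. The following is a minimal Python sketch that checks this from the free-list lines. The function names are made up for illustration, and the sketch does not model watermarks or lowmem_reserve, so it cannot say why the larger blocks listed for DMA32 were not used.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages


def free_blocks_by_size(line):
    """Return {block size in kB: free count} parsed from a 'COUNT*SIZEkB' list."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def can_satisfy(line, order):
    """True if the line lists a free block of at least 2**order pages."""
    need_kb = PAGE_KB << order
    blocks = free_blocks_by_size(line)
    return any(count > 0 and size >= need_kb for size, count in blocks.items())


# Free-list lines copied from the trace above (line wrapping removed).
normal = ("Node 0 Normal: 9421*4kB (UME) 318223*8kB (UME) 368527*16kB (UME) "
          "249283*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB "
          "0*2048kB 0*4096kB = 16456956kB")
dma32 = ("Node 0 DMA32: 57*4kB (UM) 180*8kB (UM) 131*16kB (UM) 68*32kB (UM) "
         "39*64kB (UM) 25*128kB (UM) 17*256kB (UM) 13*512kB (M) 10*1024kB (M) "
         "3*2048kB (UM) 82*4096kB (UM) = 374900kB")

print("Normal can satisfy order 4:", can_satisfy(normal, 4))
print("DMA32 lists order-4 blocks:", can_satisfy(dma32, 4))
```

For the Normal zone the check comes out false, since every size from 64 kB upwards is listed with a count of 0.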



