開發與維運

Linux 關於內存的案例

作者:牧原

實戰案例1

記一次內存充足但是java申請不到內存的排查

背景信息

客戶8g的實例,java使用4g的內存申請,直接oom

排查如下

image.png

oom的記錄顯示為申請4g內存失敗

4294967296 /1024 /1024 = 4096 M

1,第一反應是想起來之前的vm.min_free_kbytes & nr_hugepage導致的free大於available案例有關

centos7 memavailable 小於 memfree
二者的統計方式不一樣
 MemFree: The sum of LowFree+HighFree
+MemAvailable: An estimate of how much memory is available for starting new
+              applications, without swapping. Calculated from MemFree,
+              SReclaimable, the size of the file LRU lists, and the low
+              watermarks in each zone.
+              The estimate takes into account that the system needs some
+              page cache to function well, and that not all reclaimable
+              slab will be reclaimable, due to items being in use. The
+              impact of those factors will vary from system to system.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773
memfree統計的是所有內存的free內存,而memavailable統計的是可以拿來給程序用的內存,而客戶設置了vm.min_free_kbytes(2.5G),這個內存在free統計,但是不在memavailable統計
nr_hugepage也會有這個問題

2,跟客戶要 free -m && sysctl -p && /proc/meminfo等信息分析問題
HugePages_Total 為0 說明沒有設置nr_hugepage
MemAvailable: 7418172 kB 說明這麼多內存可用

image.png

#sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 1
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.core.netdev_max_backlog=10000
vm.overcommit_memory=2
net.ipv4.conf.all.arp_filter = 1 
net.ipv4.ip_local_port_range=1025 65535
kernel.msgmni = 2048
net.ipv6.conf.all.disable_ipv6=1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_keepalive_time = 600
#cat /proc/meminfo
MemTotal:        8009416 kB
MemFree:         7347684 kB
MemAvailable:    7418172 kB
Buffers:           18924 kB
Cached:           262836 kB
SwapCached:            0 kB
Active:           315188 kB
Inactive:         222364 kB
Active(anon):     256120 kB
Inactive(anon):      552 kB
Active(file):      59068 kB
Inactive(file):   221812 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               176 kB
Writeback:             0 kB
AnonPages:        255804 kB
Mapped:            85380 kB
Shmem:               880 kB
Slab:              40660 kB
SReclaimable:      22240 kB
SUnreclaim:        18420 kB
KernelStack:        4464 kB
PageTables:         6512 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4004708 kB
Committed_AS:    2061568 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       21452 kB
VmallocChunk:   34359707388 kB
HardwareCorrupted:     0 kB
AnonHugePages:    126976 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0  
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      114560 kB
DirectMap2M:     4079616 kB
DirectMap1G:     6291456 kB

3,實際上面的meminfo已經說明了問題,但是由於經驗不足,一時沒有看明白怎麼回事,嘗試自行測試
使用java命令,去申請超出我的測試機物理內存嘗試,拿到報錯

[root@test ~]# java -Xmx8192M -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
[root@test ~]# java -Xms8192M -version
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000005c0000000, 5726797824, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 5726797824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/hs_err_pid8769.log
[root@test ~]# java -Xms4096M -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
[root@test ~]# java -Xms5000M -version
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000687800000, 3495428096, 0) failed; error='Cannot allocate memory' (errno=12)
......
---------------  S Y S T E M  ---------------
OS:CentOS Linux release 7.4.1708 (Core)
uname:Linux 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 15088, NOFILE 65535, AS infinity
load average:0.05 0.05 0.05
/proc/meminfo:
MemTotal:        3881692 kB
MemFree:         2567724 kB
MemAvailable:    2968640 kB
Buffers:           69016 kB
Cached:           536116 kB
SwapCached:            0 kB
Active:           355280 kB
Inactive:         326020 kB
Active(anon):      87864 kB
Inactive(anon):    13296 kB
Active(file):     267416 kB
Inactive(file):   312724 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                72 kB
Writeback:             0 kB
AnonPages:         72200 kB
Mapped:            31232 kB
Shmem:             24996 kB
Slab:              63032 kB
SReclaimable:      51080 kB
SUnreclaim:        11952 kB
KernelStack:        1664 kB
PageTables:         4044 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1678700 kB
Committed_AS:    2282236 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       14280 kB
VmallocChunk:   34359715580 kB
HardwareCorrupted:     0 kB
AnonHugePages:     30720 kB
HugePages_Total:     256
HugePages_Free:      256
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       57216 kB
DirectMap2M:     3088384 kB
DirectMap1G:     3145728 kB
container (cgroup) information:
container_type: cgroupv1
cpu_cpuset_cpus: 0-1
cpu_memory_nodes: 0
active_processor_count: 2
cpu_quota: -1
cpu_period: 100000
cpu_shares: -1
memory_limit_in_bytes: -1
memory_and_swap_limit_in_bytes: -1
memory_soft_limit_in_bytes: -1
memory_usage_in_bytes: 697741312
memory_max_usage_in_bytes: 0
CPU:total 2 (initial active 2) (1 cores per cpu, 2 threads per core) family 6 model 79 stepping 1, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2
, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, bmi1, bmi2, adx
/proc/cpuinfo:
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 79
model name  : Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
stepping    : 1
microcode   : 0x1
cpu MHz     : 2500.036
cache size  : 40960 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 1
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl
eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust
 bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bogomips    : 5000.07
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 79
model name  : Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
stepping    : 1
microcode   : 0x1
cpu MHz     : 2500.036
cache size  : 40960 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 1
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl
eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust
 bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bogomips    : 5000.07
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
Memory: 4k page, physical 3881692k(2567600k free), swap 0k(0k free)
vm_info: OpenJDK 64-Bit Server VM (25.242-b08) for linux-amd64 JRE (1.8.0_242-b08), built on Jan 28 2020 14:28:22 by "mockbuild" with gcc 4.8.5 20150623 (Red Hat 4.8.5-39)
time: Thu Feb 20 15:13:30 2020
timezone: CST
elapsed time: 0 seconds (0d 0h 0m 0s)

4, java測試證明正常申請內存不會有問題,超額的內存才會oom,那麼為什麼超額呢,視線迴歸到
sysctl -p有所發現

vm.overcommit_memory=2
overcommit_memory
0 — 默認設置。:當應用進程嘗試申請內存時,內核會做一個檢測。內核將檢查是否有足夠的
可用內存供應用進程使用;如果有足夠的可用內存,內存申請允許;否則,內存申請失敗,並把錯誤返回給應用進程。
舉個例子,比如1G的機器,A進程已經使用了500M,當有另外進程嘗試malloc 500M的內存時,內核就會進行check,
發現超出剩餘可用內存,就會提示失敗。
1 — 對於內存的申請請求,內核不會做任何check,直到物理內存用完,觸發OOM殺用戶態進程。同樣是上面的例子,
1G的機器,A進程500M,B進程嘗試malloc 500M,會成功,但是一旦kernel發現內存使用率接近1個G(內核有策略),
就觸發OOM,殺掉一些用戶態的進程(有策略的殺)。
2 — 當 請求申請的內存 >= SWAP內存大小 + 物理內存 * N,則拒絕此次內存申請。解釋下這個N:N是一個百分比,
根據overcommit_ratio/100來確定,比如overcommit_ratio=50(我的測試機默認50%),那麼N就是50%。
vm.overcommit_ratio
只有當vm.overcommit_memory = 2的時候才會生效,內存可申請內存為
SWAP內存大小 + 物理內存 * overcommit_ratio/100
看看上面日誌的overcommit信息
CommitLimit:     4004708 kB  小於客戶申請的4096M
Committed_AS:    2061568 kB
CommitLimit:最大能分配的內存(測試下來在vm.overcommit_memory=2時候生效),具體的值是
SWAP內存大小(ecs均未開啟) + 物理內存 * overcommit_ratio / 100
Committed_AS:當前已經分配的內存大小

5,兩相對照,說明客戶設置的vm.overcommit_memory在生效,建議改回0再試試

vm.overcommit_memory = 2
[root@test ~]# grep -i commit /proc/meminfo
CommitLimit:     1940844 kB
Committed_AS:     480352 kB
# java -Xms2048M -version 失敗了
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000080000000, 1431830528, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1431830528 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/hs_err_pid1267.log
改回0 恢復
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
[root@test ~]# java -Xms2048M -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

Leave a Reply

Your email address will not be published. Required fields are marked *