Many operating system vendors have recently released an updated Linux kernel to fix a vulnerability called Stack Clash (CVE-2017-1000364). However, after applying the new kernel version, the Grid Infrastructure fails to start.
Situation
After applying the most recent kernel update, 3.10.0-514.21.2, the Grid Infrastructure fails to start with the following error messages:
[root] /u00/app/grid/product/12.2.0.1/bin/crsctl start has
CLSU-00100: operating system function: waitpid failed with error data: 0
CLSU-00101: operating system error message: Error 0
CLSU-00103: error location: usrgetgrp12
CLSU-00104: additional error information: child returned 232
CRS-4000: Command Start failed, or completed with errors.
For more information about the bug, see the blog post "Stack Clash vs. Oracle Database: On the importance of testing Linux OS bug fixes and updates".
The Unbreakable Enterprise Kernel (UEK) provided by Oracle is not affected by this issue. As the blog post explains, Oracle tests every update against its own software stack to ensure that Oracle software runs without issues after an operating system patch is applied.
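To check a host before or after patching, a minimal sketch like the following compares a kernel release against 3.10.0-514.21.2, the threshold named in the title of Doc ID 2282371.1, and treats UEK releases as unaffected, as stated above. The function names are hypothetical, the result is based only on that version range (not on any later fix), and the comparison assumes GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Sketch only: classify a kernel release against the range named in
# MOS Doc ID 2282371.1 (3.10.0-514.21.2.el7 or higher).
# Assumption: releases containing "uek" are treated as unaffected.

# Succeeds if version $1 >= version $2, using GNU sort -V ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints "affected" or "not affected" for the given release
# (defaults to the running kernel reported by uname -r).
gi_kernel_check() {
  local rel="${1:-$(uname -r)}"
  local threshold="3.10.0-514.21.2"
  case "$rel" in
    *uek*) echo "not affected"; return ;;
  esac
  # Drop the distribution suffix, e.g. ".el7.x86_64".
  local base="${rel%%.el*}"
  if version_ge "$base" "$threshold"; then
    echo "affected"
  else
    echo "not affected"
  fi
}
```

Running `gi_kernel_check` without arguments checks the running kernel; `gi_kernel_check 3.10.0-514.21.2.el7.x86_64` prints `affected`.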
Solution
To get your Grid Infrastructure stack running again, either boot a kernel older than 3.10.0-514.21.2 or switch to the UEK (for systems running Oracle Linux or Red Hat Enterprise Linux).
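For the first option, booting an older kernel, a minimal sketch can pick the newest installed non-UEK kernel image below 3.10.0-514.21.2 (the threshold from the title of Doc ID 2282371.1). The function name `pick_older_kernel` is hypothetical; the sketch assumes kernel images live under `/boot/vmlinuz-*` as on Oracle Linux 7 and that GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel image paths on stdin and print the newest
# non-UEK image whose release is strictly below 3.10.0-514.21.2.
# UEK and rescue images are skipped.
pick_older_kernel() {
  local threshold="3.10.0-514.21.2"
  local path rel base
  while read -r path; do
    rel="${path#/boot/vmlinuz-}"
    case "$rel" in
      *uek*|*rescue*) continue ;;
    esac
    # Drop the distribution suffix, e.g. ".el7.x86_64".
    base="${rel%%.el*}"
    # Keep the path only if base sorts strictly before the threshold.
    if [ "$base" != "$threshold" ] &&
       [ "$(printf '%s\n%s\n' "$base" "$threshold" | sort -V | head -n1)" = "$base" ]; then
      printf '%s\n' "$path"
    fi
  done | sort -V | tail -n1
}
```

For example, `printf '%s\n' /boot/vmlinuz-* | pick_older_kernel` prints a path (or nothing, if no older kernel is installed) that can be passed to `grubby --set-default`.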
Example: Updating the GRUB 2 bootloader on Oracle Linux 7
- Get the list of all installed kernels
[root] grubby --info=ALL
index=0
kernel=/boot/vmlinuz-3.10.0-514.21.2.el7.x86_64
args="ro vconsole.keymap=de crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_system/lv_swap rd.lvm.lv=vg_system/lv_root rhgb quiet LANG=en_US.UTF-8"
root=/dev/mapper/vg_system-lv_root
initrd=/boot/initramfs-3.10.0-514.21.2.el7.x86_64.img
title=Oracle Linux Server 7.3, with Linux 3.10.0-514.21.2.el7.x86_64
...
index=3
kernel=/boot/vmlinuz-4.1.12-61.1.25.el7uek.x86_64
args="ro vconsole.keymap=de crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_system/lv_swap rd.lvm.lv=vg_system/lv_root rhgb quiet LANG=de_DE.UTF-8"
root=/dev/mapper/vg_system-lv_root
initrd=/boot/initramfs-4.1.12-61.1.25.el7uek.x86_64.img
title=Oracle Linux Server 7.3, with Unbreakable Enterprise Kernel 4.1.12-61.1.25.el7uek.x86_64
...
index=6
kernel=/boot/vmlinuz-0-rescue-5add6b7950bc4b87aad1b67b2425e5ca
args="ro vconsole.keymap=de crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_system/lv_swap rd.lvm.lv=vg_system/lv_root rhgb quiet"
root=UUID=0a074993-43bd-47ce-b08c-5eba809bfb12
initrd=/boot/initramfs-0-rescue-5add6b7950bc4b87aad1b67b2425e5ca.img
title=Oracle Linux Server, with Linux 0-rescue-5add6b7950bc4b87aad1b67b2425e5ca
index=7
non linux entry
- Change the default kernel using the full path
[root] grubby --set-default /boot/vmlinuz-4.1.12-61.1.25.el7uek.x86_64
- Reboot the server
[root] reboot
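The steps above can be wrapped in a small script that only reboots once the new default has actually taken effect. This is a sketch assuming root privileges and the grubby version shipped with Oracle Linux 7; `set_and_verify_default` and `running_matches` are hypothetical helper names, and `grubby --default-kernel` is used to read back the default entry.

```shell
#!/usr/bin/env bash
# Sketch only: set the default boot kernel with grubby and confirm it.
set_and_verify_default() {
  local kernel="$1"
  if [ ! -f "$kernel" ]; then
    echo "kernel image not found: $kernel" >&2
    return 1
  fi
  grubby --set-default "$kernel" || return 1
  # grubby --default-kernel prints the image path of the default entry.
  if [ "$(grubby --default-kernel)" != "$kernel" ]; then
    echo "default kernel was not changed" >&2
    return 1
  fi
  echo "default kernel: $kernel"
}

# After the reboot: succeed if the running release matches the image, e.g.
# /boot/vmlinuz-4.1.12-61.1.25.el7uek.x86_64 -> 4.1.12-61.1.25.el7uek.x86_64
running_matches() {
  local kernel="$1" running="${2:-$(uname -r)}"
  [ "${kernel#/boot/vmlinuz-}" = "$running" ]
}
```

For example, run `set_and_verify_default /boot/vmlinuz-4.1.12-61.1.25.el7uek.x86_64 && reboot`, and once the server is back, `running_matches /boot/vmlinuz-4.1.12-61.1.25.el7uek.x86_64 && /u00/app/grid/product/12.2.0.1/bin/crsctl check has` to confirm that the UEK is running and to check the state of Oracle High Availability Services.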
References
- RedHat CVE Database – CVE-2017-1000364
- Stack Clash vs. Oracle Database: On the importance of testing Linux OS bug fixes and updates
- ALERT: Grid Infrastructure Fails to Start OHASD With Linux Kernel Version 3.10.0-514.21.2.EL7.X86_64 or Higher (Doc ID 2282371.1)
- OHASD fails to start with kernel version 3.10.0-514.21.2.el7.x86_64 (Doc ID 2281492.1)