While adding a new standby node to the existing DG configuration, the following was noticed in the new standby database's alert log.
Starting background process VKTM
2021-03-30T00:08:42.781044+10:00
Errors in file /opt/app/oracle/diag/rdbms/dbx12/dbx12/trace/dbx12_vktm_32410.trc (incident=41):
ORA-00800: soft external error, arguments: [Set Priority Failed], [VKTM], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
Incident details in: /opt/app/oracle/diag/rdbms/dbx12/dbx12/incident/incdir_41/dbx12_vktm_32410_i41.trc
2021-03-30T00:08:42.782979+10:00
Error attempting to elevate VKTM's priority: no further priority changes will be attempted for this process
VKTM started with pid=5, OS id=32410
MOS Doc 2718971.1 gives a workaround for this issue (it seems this doc has since been made internal).
The problem was related to the cgroup setup. In a database setup where everything is working fine, the cgroup for VKTM would be /. For example, if the VKTM PID is 5207, the following can be used to find out the cgroup setting.
cat /proc/5207/cgroup | grep cpu
10:cpu,cpuacct:/
6:cpuset:/
Any other setting would mean VKTM runs into the above issue.
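The lookup above can be wrapped in a small helper for repeated checks. This is a sketch, not from the original post; the function name is invented here, and the cgroup v1 file format shown above is assumed.

```shell
# Hypothetical helper: print the cpu,cpuacct cgroup path recorded in a
# /proc/<pid>/cgroup file (cgroup v1 format assumed).
cpu_cgroup() {
  grep ':cpu,cpuacct:' "$1" | cut -d: -f3
}

# Against a live VKTM process this would be used as:
#   cpu_cgroup /proc/5207/cgroup
# Any output other than "/" means VKTM is not in the root cpu cgroup.
```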
One of the suggested solutions is to set the hidden parameter _high_priority_processes='VKTM'. But this was already in place, so it was not going to be the solution.
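To confirm what the hidden parameter is currently set to, a commonly used query against the X$ parameter views can be run as SYSDBA (shown as a sketch; hidden parameters are unsupported and should only be changed under Oracle Support's direction):

```sql
-- Show the current value of the hidden parameter _high_priority_processes.
SELECT a.ksppinm  AS parameter,
       b.ksppstvl AS value
FROM   x$ksppi a, x$ksppsv b
WHERE  a.indx = b.indx
AND    a.ksppinm = '_high_priority_processes';
```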
The problem server had the following cgroup setting.
# ps -eaf | grep -i vktm | grep -v grep
oracle 3315 1 0 17:53 ? 00:00:00 ora_vktm_dbx12
oracle 3357 1 0 14:19 ? 00:01:40 asm_vktm_+ASM
# cat /proc/3315/cgroup | grep cpu
11:cpuset:/
6:cpu,cpuacct:/user.slice
So the next workaround was to set the kernel parameters as below.
# echo 0 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
# echo 950000 > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us
This seemed to resolve the situation, and it was possible to stop and start the standby instance without the above error message appearing in the alert log.
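These echo commands work because VKTM requests a realtime scheduling class, and under cgroup v1 the realtime CPU budget (cpu.rt_runtime_us) defaults to 0 for non-root cgroups: a process confined to user.slice cannot be given realtime priority until that slice has a budget, and the children's budgets may not exceed the parent's. The values can be read back as a sanity check; the paths below assume the cgroup v1 layout shown in this post.

```shell
# Sketch: read back the realtime budgets after the change (cgroup v1 paths assumed).
cat /sys/fs/cgroup/cpu,cpuacct/cpu.rt_runtime_us               # root budget (kernel default 950000)
cat /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us  # 0 after the echo above
cat /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us    # 950000 after the echo above
```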
However, the question remains: why did the cgroup change? This was the first time this error had been encountered. Since this was adding a standby database server to an existing setup, there was a reference point to check against for any changes. First, the OS: the "good" servers had OEL 7.9 while the "problem" server had OEL 7.7. Both servers are Azure VMs.
Then it was decided to check inside the cgroup settings. The good server had the following (no user.slice):
ls -l /sys/fs/cgroup/cpu,cpuacct/
drwxr-xr-x. 3 root root 0 Apr 7 09:32 WALinuxAgent
-rw-r--r--. 1 root root 0 Apr 7 09:32 tasks
-rw-r--r--. 1 root root 0 Apr 7 09:32 cgroup.procs
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.cfs_period_us
-r--r--r--. 1 root root 0 Apr 7 09:32 cgroup.sane_behavior
-r--r--r--. 1 root root 0 Apr 7 09:32 cpu.stat
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_percpu_sys
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.shares
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_percpu
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.stat
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.cfs_quota_us
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_sys
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_all
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_percpu_user
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.rt_runtime_us
-rw-r--r--. 1 root root 0 Apr 7 09:32 notify_on_release
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.rt_period_us
-rw-r--r--. 1 root root 0 Apr 7 09:32 release_agent
-rw-r--r--. 1 root root 0 Apr 7 09:32 cgroup.clone_children
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_user
While the problem server had the following:
drwxr-xr-x. 2 root root 0 Apr 7 09:29 auoms
drwxr-xr-x. 2 root root 0 Apr 7 09:29 auomscollect
-rw-r--r--. 1 root root 0 Apr 7 09:29 cgroup.clone_children
-rw-r--r--. 1 root root 0 Apr 7 09:29 cgroup.procs
-r--r--r--. 1 root root 0 Apr 7 09:29 cgroup.sane_behavior
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.stat
-rw-r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage_all
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage_percpu
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage_percpu_sys
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage_percpu_user
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage_sys
-r--r--r--. 1 root root 0 Apr 7 09:29 cpuacct.usage_user
-rw-r--r--. 1 root root 0 Apr 7 09:29 cpu.cfs_period_us
-rw-r--r--. 1 root root 0 Apr 7 09:29 cpu.cfs_quota_us
-rw-r--r--. 1 root root 0 Apr 7 09:29 cpu.rt_period_us
-rw-r--r--. 1 root root 0 Apr 7 09:29 cpu.rt_runtime_us
-rw-r--r--. 1 root root 0 Apr 7 09:29 cpu.shares
-r--r--r--. 1 root root 0 Apr 7 09:29 cpu.stat
-rw-r--r--. 1 root root 0 Apr 7 09:29 notify_on_release
-rw-r--r--. 1 root root 0 Apr 7 09:29 release_agent
drwxr-xr-x. 69 root root 0 Apr 7 09:29 system.slice
-rw-r--r--. 1 root root 0 Apr 6 14:35 tasks
drwxr-xr-x. 2 root root 0 Apr 7 09:29 user.slice
drwxr-xr-x. 2 root root 0 Apr 7 09:29 WALinuxAgent
Besides user.slice, the auoms* directories seem to be the difference between the two. Auoms is an Azure management agent plugin. Could this be the reason why the cgroup has a user.slice? To test this, auoms was disabled and the server restarted.
# systemctl stop auoms.service
# systemctl disable auoms.service
Removed symlink /etc/systemd/system/multi-user.target.wants/auoms.service.
Removed symlink /etc/systemd/system/auoms.service.
# /sbin/reboot
When the server restarted, the cgroup didn't have a user.slice.
drwxr-xr-x. 3 root root 0 Apr 7 09:32 WALinuxAgent
-rw-r--r--. 1 root root 0 Apr 7 09:32 tasks
-rw-r--r--. 1 root root 0 Apr 7 09:32 cgroup.procs
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.cfs_period_us
-r--r--r--. 1 root root 0 Apr 7 09:32 cgroup.sane_behavior
-r--r--r--. 1 root root 0 Apr 7 09:32 cpu.stat
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_percpu_sys
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.shares
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_percpu
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.stat
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.cfs_quota_us
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_sys
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_all
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_percpu_user
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.rt_runtime_us
-rw-r--r--. 1 root root 0 Apr 7 09:32 notify_on_release
-rw-r--r--. 1 root root 0 Apr 7 09:32 cpu.rt_period_us
-rw-r--r--. 1 root root 0 Apr 7 09:32 release_agent
-rw-r--r--. 1 root root 0 Apr 7 09:32 cgroup.clone_children
-r--r--r--. 1 root root 0 Apr 7 09:32 cpuacct.usage_user
The DB started without the VKTM-related error in the alert log (the kernel parameters set earlier had not been made persistent). VKTM had / for its cgroup.
ps ax | grep vktm
3314 ? Ss 0:03 asm_vktm_+ASM
3506 ? Ss 0:03 ora_vktm_dbx12
4664 pts/0 S+ 0:00 grep --color=auto vktm
# cat /proc/3506/cgroup | grep cpu
6:cpu,cpuacct:/
4:cpuset:/
There were several instances to be added to the DG configuration, and on other servers the OS was upgraded from 7.7 to 7.9. This upgrade seems to have removed auoms, and there were no cgroup issues on those servers.
If facing similar issues, first check what has caused the cgroup change before attempting hidden parameter or kernel parameter workarounds.
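That check can be scripted as a pre-flight step. The helper below is a hypothetical sketch (the function name is invented here), assuming the cgroup v1 cpu controller is mounted under /sys/fs/cgroup/cpu,cpuacct as shown in this post.

```shell
# Hypothetical pre-flight check for the cpu controller hierarchy:
# flags a user.slice directory, which on this setup indicated auoms was active.
has_user_slice() {
  [ -d "$1/user.slice" ]
}

# Example usage on a live server:
#   if has_user_slice /sys/fs/cgroup/cpu,cpuacct; then
#     echo "user.slice present - check for auoms before touching rt_runtime_us"
#   fi
```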