libdrgn: x86_64: unwind call when pc is 0

While debugging a Linux vmcore with drgn, a stack trace was found to
contain a single frame, "0: 0x0". This was peculiar because the
corresponding panic dump in kmsg showed a stack trace of multiple frames
with resolved symbols. Digging into the drgn stack unwinding functions
showed that there was no way to handle a program counter (RIP on
x86_64) of zero. The main unwinding function returns an error, which
leads to a call to the fallback unwind function. The fallback does not
help because it relies on frame pointers. When the kernel is compiled
without frame pointers, the fallback function also errors out due to an
attempt to read from an unmapped register.

A program counter of zero can be assumed to be the result of a call
through a null function pointer. On that basis, the call to the null
address can be unwound in a special way even in the absence of frame
pointers, by adjusting two registers. First, the program counter needs
to point somewhere that can be unwound. At the time of the null call,
RSP points to the return address pushed by the call instruction, so
reading the value at RSP and assigning it to RIP effectively puts the
program counter just after the null call. Next, the top of the stack
needs to move to the top of the calling frame. This is done by
incrementing RSP by 8 bytes, past the saved return address. At this
point, unwinding can continue and any frames below the null call are
included in the trace. This special unwinding operation is done in the
new function unwind_call(). To handle a zero pc when frame pointers are
not present, the fallback unwind function now checks for a zero pc and
calls the new function if needed.
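
The register adjustment described above can be illustrated with a small
standalone model. This is only a sketch under stated assumptions, not
drgn code: struct toy_regs, struct toy_stack, toy_read_u64() and
toy_unwind_call() are hypothetical names invented for illustration, and
the "stack" is a plain array of 8-byte slots rather than target memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified register set. drgn keeps the real state in
// struct drgn_register_state.
struct toy_regs {
	uint64_t rip;
	uint64_t rsp;
};

// Hypothetical model of stack memory: 8-byte slots starting at base.
struct toy_stack {
	uint64_t base;
	const uint64_t *slots;
	size_t num_slots;
};

// Read one 8-byte slot. Returns false if addr is outside the modeled
// stack or is not slot-aligned.
static bool toy_read_u64(const struct toy_stack *stack, uint64_t addr,
			 uint64_t *ret)
{
	if (addr < stack->base || (addr - stack->base) % 8 != 0)
		return false;
	uint64_t index = (addr - stack->base) / 8;
	if (index >= stack->num_slots)
		return false;
	*ret = stack->slots[index];
	return true;
}

// Undo a single call instruction, assuming nothing else has changed
// since the call: the return address is at the top of the stack.
static bool toy_unwind_call(const struct toy_stack *stack,
			    const struct toy_regs *regs,
			    struct toy_regs *ret)
{
	assert(stack && regs && ret);

	// The value stored at rsp is the return address.
	uint64_t ret_addr;
	if (!toy_read_u64(stack, regs->rsp, &ret_addr))
		return false;

	// The caller resumes at the return address...
	ret->rip = ret_addr;
	// ...with rsp just past the saved return address.
	ret->rsp = regs->rsp + 8;
	return true;
}
```

Given rip == 0 and rsp pointing at a slot that holds a return address,
toy_unwind_call() yields the caller's rip and an rsp that is 8 bytes
higher. It returns false when the slot cannot be read, which loosely
mirrors stopping the unwind on a fault.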

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
inwardvessel 2023-08-15 15:17:13 -07:00 committed by Omar Sandoval
parent a397b3a990
commit 412ce956b0

@@ -243,6 +243,44 @@ get_registers_from_frame_pointer(struct drgn_program *prog,
	return NULL;
}

// Unwind from a call instruction, assuming that nothing else has been changed
// since.
static struct drgn_error *unwind_call(struct drgn_program *prog,
				      struct drgn_register_state *regs,
				      struct drgn_register_state **ret)
{
	struct drgn_error *err;

	struct optional_uint64 rsp =
		drgn_register_state_get_u64(prog, regs, rsp);
	if (!rsp.has_value)
		return &drgn_stop;

	// Read the return address from the top of the stack.
	uint64_t ret_addr;
	err = drgn_program_read_u64(prog, rsp.value, false, &ret_addr);
	if (err) {
		if (err->code == DRGN_ERROR_FAULT) {
			drgn_error_destroy(err);
			err = &drgn_stop;
		}
		return err;
	}

	// Most of the registers are unchanged.
	struct drgn_register_state *tmp = drgn_register_state_dup(regs);
	if (!tmp)
		return &drgn_enomem;

	// The PC and rip are the return address we just read.
	drgn_register_state_set_pc(prog, tmp, ret_addr);
	drgn_register_state_set_from_u64(prog, tmp, rip, ret_addr);
	// rsp is after the saved return address.
	drgn_register_state_set_from_u64(prog, tmp, rsp, rsp.value + 8);
	*ret = tmp;
	return NULL;
}

static struct drgn_error *
fallback_unwind_x86_64(struct drgn_program *prog,
		       struct drgn_register_state *regs,
@@ -250,6 +288,13 @@ fallback_unwind_x86_64(struct drgn_program *prog,
{
	struct drgn_error *err;

	// If the program counter is 0, it's likely that a NULL function pointer
	// was called. Assume that the only thing we need to unwind is a single
	// call instruction.
	struct optional_uint64 pc = drgn_register_state_get_pc(regs);
	if (pc.has_value && pc.value == 0)
		return unwind_call(prog, regs, ret);

	struct optional_uint64 rbp =
		drgn_register_state_get_u64(prog, regs, rbp);
	if (!rbp.has_value)