Commit Graph

734 Commits

Author SHA1 Message Date
Omar Sandoval
1b47b866b4 libdrgn: go back to trusting PRSTATUS PID
Commit eea5422546 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
2020-07-08 18:34:16 -07:00
Omar Sandoval
4de147e478 Add CONTRIBUTING.rst
This documents best practices for contributing to drgn. We now require a
DCO sign-off.

Also clean up some related areas in the documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-07 17:44:02 -07:00
Omar Sandoval
293418294a libdrgn: assume compiler uses sane integer implementation
I once tried to implement a generic arithmetic right shift macro without
relying on any implementation-defined behavior, but this turned out to
be really hard. drgn is fairly tied to GCC and GCC-compatible compilers
(like Clang), so let's just assume GCC's model [1]: modular conversion
to signed types, two's complement signed bitwise operators, and sign
extension for signed right shift.

1: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
2020-07-07 17:18:17 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
e4c52c5422 libdrgn: linux_kernel: use names for kmod index constants
This makes it much easier to follow along with the code and understand
the format.
2020-06-30 15:14:21 -07:00
Omar Sandoval
03d8cb0e32 libdrgn: fix hash_pair_from_non_avalanching_hash() on 64-bit without SSE 4.2
We were forgetting to mask away the extra bits. There are two places
that we use the tag without converting it to a uint8_t:
hash_table_probe_delta(), which is mostly benign since we mask it by the
chunk mask anyways; and table_chunk_match() without SSE 2, which
completely breaks.

While we're here, let's align the comments better.
2020-06-24 13:33:08 -07:00
Omar Sandoval
8e7c1f1009 drgn 0.0.5 2020-05-26 10:04:06 -07:00
Omar Sandoval
a227d0d50e Update elfutils and revert activation frame patch
After thinking about it some more, I realized that "libdwfl: simplify
activation frame logic" breaks the case where during unwinding someone
queries isactivation for reasons other than knowing whether to decrement
program counter. Revert the patch and refactor "libdwfl: add interface
for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it
differently.

Based on:

c95081596 size: Also obey radix printing for bsd format.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-20 13:38:49 -07:00
Omar Sandoval
eea5422546 libdrgn: make Linux kernel stack unwinding more robust
drgn has a couple of issues unwinding stack traces for kernel core
dumps:

1. It can't unwind the stack for the idle task (PID 0), which commonly
   appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be
   trusted.

The solution for both of these is to look up the PRSTATUS note by CPU
instead of PID.

For the live kernel, drgn refuses to unwind the stack of tasks in the
"R" state. However, the "R" state is running *or runnable*, so in the
latter case, we can still unwind the stack. The solution for this is to
look at on_cpu for the task instead of the state.
2020-05-20 12:03:00 -07:00
Omar Sandoval
146930aff8 libdrgn: replace arch frame_registers with callbacks
We currently unwind from pt_regs and NT_PRSTATUS using an array of
register definitions. It's more flexible and more efficient to do this
with an architecture-specific callback. For x86-64, this change also
makes us depend on the binary layout rather than member names of struct
pt_regs, but that shouldn't matter unless people are defining their own,
weird struct pt_regs.
2020-05-19 17:11:27 -07:00
Omar Sandoval
4d8597f0f8 libdrgn: add THREAD_SIZE to Linux kernel object finder
Despite the naming, this is the kernel stack size.
2020-05-19 17:10:54 -07:00
Omar Sandoval
971a2d3687 libdrgn/python: make Objects fully immutable
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
2020-05-18 00:07:49 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
1cc771605d tests: run black on stray change 2020-05-15 15:19:11 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
c339113f9c libdrgn: adjust program counter when looking up frame symbol
For functions that call a noreturn function, the compiler may omit code
after the call instruction. This means that the return address may not
lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is
an "activation", i.e., its program counter is guaranteed to lie within
the caller. This is only the case for the initial frame, frames
interrupted by a signal, and the signal trampoline frame. For everything
else, we need to decrement the program counter before doing any lookups.
2020-05-13 17:11:54 -07:00
Omar Sandoval
175f83fc23 Update elfutils with noreturn unwinding fix
Rebase on master and fix dwfl_frame_module/dwfl_frame_dwarf_frame to
decrement the program counter when necessary.

Based on:

a8493c12a libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: simplify activation frame logic
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-13 16:41:52 -07:00
Omar Sandoval
bf545105c6 libdrgn: build in silent mode by default
The automake/libtool compilation output is obnoxiously verbose. Switch
on automake's silent mode, and make the custom rules honor it.
2020-05-10 00:12:50 -07:00
Omar Sandoval
62ddc96350 setup.py: add 5.7 to vmtest kernels
5.7 is up to rc4 (oops). Better late than never.
2020-05-08 18:13:12 -07:00
Omar Sandoval
2d1481f5ab libdrgn: add page table walker kernel memory reader
Now that we can walk page tables, we can use it in a memory reader that
reads kernel memory via the kernel page table. This means that we don't
need libkdumpfile for ELF vmcores anymore (although I'll keep the
functionality around until this code has been validated more).
2020-05-08 17:37:56 -07:00
Omar Sandoval
e697be707c libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET
I originally wanted to avoid depending on another vmcoreinfo field, but
an the next change is going to depend on swapper_pg_dir in vmcoreinfo
anyways, and it ends up being simpler to use it.
2020-05-08 17:37:56 -07:00
Omar Sandoval
4b82d1e075 helpers: add cmdline() and environ()
These are two of the most common use cases for reading a process's
memory.
2020-05-08 17:37:56 -07:00
Omar Sandoval
8a276838ac helpers: add access_process_vm() and access_remote_vm()
Now that we can walk page tables, we can finally read memory from
userspace tasks.

Closes #53.
2020-05-08 17:37:01 -07:00
Omar Sandoval
d0a1718451 libdrgn: implement virtual address translation/page table walking
There are a few big use cases for this in drgn:

* Helpers for accessing memory in the virtual address space of userspace
  tasks.
* Removing the libkdumpfile dependency for vmcores.
* Handling gaps in the virtual address space of /proc/kcore (cf. #27).

I dragged my feet on implementing this because I thought it would be
more complicated, but the page table layout on x86-64 isn't too bad.
This commit implements page table walking using a page table iterator
abstraction. The first thing we'll add on top of this will be a helper
for reading memory from a virtual address space, but in the future it'd
also be possible to export the page table iterator directly.
2020-05-08 17:36:19 -07:00
Omar Sandoval
63299e0701 libdrgn: actually use uint64_t for two's complement unary ops
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
2020-05-08 13:50:24 -07:00
Omar Sandoval
8f81ea255f libdrgn: don't use unaligned loads to parse DWARF
-fsanitize=undefined reports that the read_u* helpers rely on unaligned
loads. Use memcpy() instead.
2020-05-08 13:50:24 -07:00
Omar Sandoval
3d59e042f4 libdrgn: don't open-code fls()
c_integer_literal() has an open-coded equivalent of fls() that assumes
that unsigned long long is 64 bits. Use fls() instead.
2020-05-08 00:20:42 -07:00
Omar Sandoval
340e00dfb5 libdrgn: improve and document bit operations
fls() can be implemented with __bitop(), and we can get rid of clz() since
it's only used by fls().
2020-05-08 00:14:25 -07:00
Omar Sandoval
f49d68d8f9 libdrgn: split generic utility functions out of internal.h
internal.h includes both drgn-specific helpers and generic utility
functions. Split the latter into their own util.h header and use it
instead of internal.h in the generic data structure code. This makes it
easier to copy the data structures into other projects/test programs.
2020-05-07 16:03:43 -07:00
Omar Sandoval
a95e42ef2e libdrgn/python: use vector for Program_load_debug_info()
Program_load_debug_info() is the last user of the
resize_array()/realloc_array() utility functions. We can clean it up by
using a vector and finally get rid of those functions.

This also happens to fix three bugs in Program_load_debug_info(): we
weren't setting a Python exception if we couldn't allocate the path_args
array, we weren't zeroing path_args after resizing the array, and we
weren't freeing the path_args array. Shame on whoever wrote this.
2020-05-07 15:47:57 -07:00
Omar Sandoval
0a100064c1 libdrgn: improve and rename DRGN_UNREACHABLE()
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.

DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
2020-05-07 15:16:22 -07:00
Omar Sandoval
d759c7ed20 libdrgn: get rid of OFF_MAX
This hasn't been used since commit 417a6f0d76 ("libdrgn: make memory
reader pluggable with callbacks").
2020-05-07 14:41:05 -07:00
Omar Sandoval
23574e59d5 libdrgn: add /proc/kcore physical segments on old kernels
Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so
it's currently not possible to read from physical memory on old kernels.
However, if we can figure out the address of the direct mapping, then we
can determine the corresponding physical addresses for the segments and
add them.
2020-05-04 13:20:27 -07:00
Omar Sandoval
f8c33518eb libdrgn: handle kernel core dumps with all zero p_paddr
We treat core dumps with all zero p_paddrs as not having valid physical
addresses. However, it is theoretically possible for a kernel core dump
to only have one segment which legitimately has a p_paddr of 0 (e.g., if
it only has a segment for the direct mapping, although note that this
isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the
BIOS [1]).

If the core dump has a VMCOREINFO note, then it is either a vmcore,
which has valid physical addresses, or it is /proc/kcore with Linux
kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to
/proc/kcore") (in v4.19), so it must also have Linux kernel commit
464920104bf7 ("/proc/kcore: update physical address for kcore ram and
text") (in v4.11) (ignoring the possibility of a franken-kernel which
backported the former but not the latter). Therefore, treat core dumps
with a VMCOREINFO note as having valid physical addresses.

1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678
2020-05-04 13:20:27 -07:00
Omar Sandoval
5505628235 libdrgn: get rid of struct drgn_program.num_file_segments
This isn't used anymore. Remove it and simplify the loop adding file
segments.
2020-05-04 13:20:27 -07:00
Omar Sandoval
510b6ef3b5 libdrgn: read vmcoreinfo physical address as uint64_t
Even on 32-bit architectures, physical addresses can be 64 bits (e.g.,
x86 with PAE). It's unlikely that the vmcoreinfo note would be in such
an address, but let's always parse it as a uint64_t just to be safe.
2020-05-04 13:20:27 -07:00
Omar Sandoval
10e58777c3 Add Program.read_{u8,u16,u32,u64,word}()
I've found that I do this manually a lot (e.g., when digging through a
task's stack). Add shortcuts for reading unsigned integers and a note
for how to manually read other formats.
2020-04-27 17:27:10 -07:00
Omar Sandoval
b1315fcaa1 libdrgn: add drgn_program_bswap()
This is clearer than open-coding the endianness check.
2020-04-27 17:08:02 -07:00
Omar Sandoval
02743b2da7 drgndoc: don't add :rtype: if function doesn't have docstring 2020-04-27 17:08:02 -07:00
Omar Sandoval
36d350bf93 Add scripts/cscope.sh
For generating the cscope index from the proper files.
2020-04-27 17:07:15 -07:00
Omar Sandoval
0d64d66c80 Add examples and tools to source distribution 2020-04-27 16:53:14 -07:00
Omar Sandoval
1af57230c4 drgn 0.0.4 2020-04-21 15:52:01 -07:00
Omar Sandoval
61a7a0c39d Fix Object.__getitem__() idx type annotation
In obj[idx], idx can also be another Object. Technically, it can be
anything that implements __index__() (), but typing.SupportsIndex was added
in v3.8. For now, allow Object explicitly.
2020-04-21 15:49:26 -07:00
Omar Sandoval
ea8cdede6d Fix Program.pointer_type() type annotation
"type" can be a string or Type object.
2020-04-21 15:03:33 -07:00
Omar Sandoval
55135d5854 Remove stray type comments
These are already emitted by drgndoc, so we don't need them.
2020-04-21 14:59:04 -07:00
Omar Sandoval
a070f1ca14 drgndoc: visit submodules in deterministic order
parse_package() visits submodules in whatever order they're returned by
the filesystem, which is arbitrary. Let's sort them so we always visit
them in the same order.
2020-04-21 10:44:52 -07:00
Omar Sandoval
ea3cdb43f7 libdrgn/python: optimize Object.__getattribute__()
DrgnObject_getattro() uses PyObject_GenericGetAttr() and catches the
AttributeError raised if the name is not an attribute of the Object
class. If the member is found, we then destroy the AttributeError.
Raising an exception only to destroy it is obviously wasteful. Luckily,
as of Python 3.7, the lower-level _PyObject_GenericGetAttrWithDict() can
suppress the AttributeError; we can raise it ourselves if we need it. In
my microbenchmarks, this makes Object.__getattribute__() at least twice
as fast when the member exists.

This also fixes a drgn_error leak.
2020-04-20 17:45:30 -07:00
Omar Sandoval
da0c637f2d Fix Object.__getattribute__() return type annotation
mypy only uses the annotated return type of __getattribute__() for
attributes which aren't otherwise annotated, so we can annotate it as
returning Object.
2020-04-20 12:37:55 -07:00
Omar Sandoval
7b08f7682e Fix mypy errors
There are a handful of mypy errors that I hadn't gotten around to
fixing. This cleans them up and makes drgn mypy-clean (although not in
strict mode).
2020-04-18 20:55:13 -07:00
Omar Sandoval
12a23a6e98 Add stubs for internal helpers
mypy complains about these functions not existing, so add stubs to let
mypy know they exist. In the future, we probably want the docstring to
be in these stubs and use them directly instead of wrapping them in
Python, but that's for another day.
2020-04-18 15:10:29 -07:00