50 Commits

Author SHA1 Message Date
Omar Sandoval
de6a4e07ae libdrgn: fix Doxygen
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-30 01:32:33 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
f83bb7c71b libdrgn: move debugging information tracking into drgn_debug_info
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.

This also reworks a couple of details of loading debugging information:

- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
  a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
  compilation units (by using OpenMP tasks).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
3ac9ae357b libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info
The current name is too verbose. Let's go with a shorter, more generic
name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-11 17:41:23 -07:00
Omar Sandoval
ff96c75da0 helpers: translate task_state_to_char() to Python
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 13:54:39 -07:00
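
As a rough illustration of what such a Python translation could look like, the sketch below maps a raw task state bitmask to its one-letter code. It is not the helper from the commit: it takes a plain integer rather than a task_struct object, it ignores special cases such as idle tasks, and the bit layout and letter table are assumptions based on recent Linux kernels.

```python
# Assumption: state bits are laid out as in recent Linux kernels
# (0 = running, 1 = interruptible, 2 = uninterruptible, 4 = stopped,
# 8 = traced, 16 = exit dead, 32 = exit zombie, 64 = parked).
TASK_STATE_CHARS = "RSDTtXZP"
TASK_REPORT_MASK = 0x7F


def task_state_to_char(state: int) -> str:
    """Return the one-letter code for a raw task state bitmask."""
    reported = state & TASK_REPORT_MASK
    # bit_length() is 0 for "running" and otherwise one more than the
    # position of the highest set bit, which indexes the letter table.
    return TASK_STATE_CHARS[reported.bit_length()]
```

For example, a state of 2 (uninterruptible sleep) maps to "D".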
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
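
The idea of types being owned by a program and deduplicated can be sketched in a few lines of Python. This is a toy model, not drgn's implementation: ToyProgram, IntType, and PointerType are hypothetical names, and only the deduplication of simple types and the default pointer size from points 3 to 5 are illustrated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class IntType:
    name: str
    size: int
    is_signed: bool


@dataclass(frozen=True)
class PointerType:
    referenced: object
    size: int


class ToyProgram:
    """Hypothetical program that owns and deduplicates its types."""

    def __init__(self, pointer_size: int = 8) -> None:
        self.pointer_size = pointer_size
        self._types: Dict[Tuple, object] = {}

    def _intern(self, key: Tuple, make: Callable[[], object]) -> object:
        # Create the type only the first time this key is requested.
        if key not in self._types:
            self._types[key] = make()
        return self._types[key]

    def int_type(self, name: str, size: int, is_signed: bool = True) -> object:
        return self._intern(
            ("int", name, size, is_signed),
            lambda: IntType(name, size, is_signed),
        )

    def pointer_type(self, referenced: object, size: Optional[int] = None) -> object:
        # A size of None falls back to the program's default pointer size.
        if size is None:
            size = self.pointer_size
        return self._intern(
            ("pointer", referenced, size),
            lambda: PointerType(referenced, size),
        )
```

Because the program holds the cache, two requests for the same simple type return the same object, which is what makes identity comparisons and per-type caching cheap.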
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
1c8181e22d libdrgn: rearrange struct drgn_program members
struct drgn_program has a bunch of state scattered around. Group it
together more logically, even if it means sacrificing some padding here
and there.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:34:44 -07:00
Omar Sandoval
d4e0771f87 libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}()
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
1b47b866b4 libdrgn: go back to trusting PRSTATUS PID
Commit eea5422546 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
2020-07-08 18:34:16 -07:00
Omar Sandoval
eea5422546 libdrgn: make Linux kernel stack unwinding more robust
drgn has a couple of issues unwinding stack traces for kernel core
dumps:

1. It can't unwind the stack for the idle task (PID 0), which commonly
   appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be
   trusted.

The solution for both of these is to look up the PRSTATUS note by CPU
instead of PID.

For the live kernel, drgn refuses to unwind the stack of tasks in the
"R" state. However, the "R" state is running *or runnable*, so in the
latter case, we can still unwind the stack. The solution for this is to
look at on_cpu for the task instead of the state.
2020-05-20 12:03:00 -07:00
Omar Sandoval
4d8597f0f8 libdrgn: add THREAD_SIZE to Linux kernel object finder
Despite the naming, this is the kernel stack size.
2020-05-19 17:10:54 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for a while now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
2d1481f5ab libdrgn: add page table walker kernel memory reader
Now that we can walk page tables, we can use it in a memory reader that
reads kernel memory via the kernel page table. This means that we don't
need libkdumpfile for ELF vmcores anymore (although I'll keep the
functionality around until this code has been validated more).
2020-05-08 17:37:56 -07:00
Omar Sandoval
e697be707c libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET
I originally wanted to avoid depending on another vmcoreinfo field, but
the next change is going to depend on swapper_pg_dir in vmcoreinfo
anyways, and it ends up being simpler to use it.
2020-05-08 17:37:56 -07:00
Omar Sandoval
d0a1718451 libdrgn: implement virtual address translation/page table walking
There are a few big use cases for this in drgn:

* Helpers for accessing memory in the virtual address space of userspace
  tasks.
* Removing the libkdumpfile dependency for vmcores.
* Handling gaps in the virtual address space of /proc/kcore (cf. #27).

I dragged my feet on implementing this because I thought it would be
more complicated, but the page table layout on x86-64 isn't too bad.
This commit implements page table walking using a page table iterator
abstraction. The first thing we'll add on top of this will be a helper
for reading memory from a virtual address space, but in the future it'd
also be possible to export the page table iterator directly.
2020-05-08 17:36:19 -07:00
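
To show why the x86-64 layout "isn't too bad", here is a minimal sketch of the first step of a 4-level page table walk: splitting a virtual address into its per-level table indices. It only does the bit arithmetic and reads no memory; the function name and the dictionary keys are illustrative choices, not names from libdrgn.

```python
def split_virtual_address(addr: int) -> dict:
    """Split an x86-64 virtual address into 4-level page table indices.

    Each level indexes a table of 512 entries (9 bits); the low 12 bits
    are the offset within a 4 KiB page.
    """
    return {
        "pgd": (addr >> 39) & 0x1FF,
        "pud": (addr >> 30) & 0x1FF,
        "pmd": (addr >> 21) & 0x1FF,
        "pte": (addr >> 12) & 0x1FF,
        "offset": addr & 0xFFF,
    }
```

A full walker would use each index to read an entry from the table at that level and follow the physical address it contains down to the next level.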
Omar Sandoval
5505628235 libdrgn: get rid of struct drgn_program.num_file_segments
This isn't used anymore. Remove it and simplify the loop adding file
segments.
2020-05-04 13:20:27 -07:00
Omar Sandoval
b1315fcaa1 libdrgn: add drgn_program_bswap()
This is clearer than open-coding the endianness check.
2020-04-27 17:08:02 -07:00
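
The check that such a helper wraps can be sketched as follows: a byte swap is needed exactly when the program's endianness differs from the host's. This is a Python analogy rather than the C code; needs_bswap and read_u32 are hypothetical names.

```python
import sys


def needs_bswap(program_is_little_endian: bool) -> bool:
    """Return True if values read from the program must be byte-swapped."""
    host_is_little_endian = sys.byteorder == "little"
    return program_is_little_endian != host_is_little_endian


def read_u32(buf: bytes, program_is_little_endian: bool) -> int:
    """Decode a 32-bit unsigned integer stored in the program's byte order."""
    # Interpret the bytes in host order first, as a C load would.
    value = int.from_bytes(buf[:4], sys.byteorder)
    if needs_bswap(program_is_little_endian):
        # Reverse the four bytes.
        value = int.from_bytes(value.to_bytes(4, "little"), "big")
    return value
```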
Omar Sandoval
7a9fad0fd2 libdrgn: move _vmemmap() to object finder
Similarly to PAGE_OFFSET, vmemmap makes more sense as part of the Linux
kernel object finder than an internal helper.

While we're here, let's fix the definition for 5-level page tables. This
only matters for kernels with commit 77ef56e4f0fb ("x86: Enable 5-level
paging support via CONFIG_X86_5LEVEL=y") but without eedb92abb9bb
("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y")
(namely, v4.14, v4.15, and v4.16); since v4.17, 5-level page table
support enables KASLR.
2020-04-10 15:33:29 -07:00
Omar Sandoval
5ac95e491a libdrgn: fix _page_offset() helper and move to object finder
The internal _page_offset() helper gets the value of PAGE_OFFSET, but
the fallback when KASLR is disabled has been out of date since Linux
v4.20 and never handled 5-level page tables. Additionally, it makes more
sense as part of the Linux kernel (formerly vmcoreinfo) object finder so
that it's cleanly accessible outside of drgn internals.
2020-04-10 15:33:27 -07:00
Omar Sandoval
1dbc718840 helpers: add pgtable_l5_enabled()
2020-04-10 15:18:46 -07:00
Serapheim Dimitropoulos
08193a97aa Support stack traces for running threads on kdumps
2020-03-27 16:12:03 -07:00
Jay Kamat
3f870603fa libdrgn: add default language to drgn_program
For operations where we don't have a type available, we currently fall
back to C. Instead, we should guess the language of the program and use
that as the default. The heuristic implemented here gets the language
of the CU containing "main" (except for the Linux kernel, which is
always C). In the future, we should allow manually overriding the
automatically determined language.
2020-02-26 19:55:42 -08:00
Omar Sandoval
a5cd92f24e libdrgn: make vmcoreinfo accessible before loading debug info
UTS_RELEASE is currently only accessible once debug info is loaded with
prog.load_debug_info(main=True). This makes it difficult to get the
release, find the appropriate vmlinux, then load the found vmlinux. We
can add vmcoreinfo_object_find as part of set_core_dump(), which makes
it possible to do the following:

  import drgn

  # core_dump_path and find_vmlinux() are supplied by the caller.
  prog = drgn.Program()
  prog.set_core_dump(core_dump_path)
  release = prog['UTS_RELEASE'].string_()
  vmlinux_path = find_vmlinux(release)
  prog.load_debug_info([vmlinux_path])

The only downside is that this ends up using the default definition of
char rather than what we would get from the debug info, but that
shouldn't be a big problem.
2020-02-19 12:11:45 -08:00
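
For context, vmcoreinfo is a block of KEY=VALUE text lines, which is why it can be consulted before any debug info is loaded. The sketch below parses such text into a dictionary. parse_vmcoreinfo is a hypothetical helper, not part of drgn, and the sample values in the usage note are made up.

```python
def parse_vmcoreinfo(text: str) -> dict:
    """Parse vmcoreinfo-style KEY=VALUE lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        # Split on the first "=" only; values may contain "=" themselves.
        key, sep, value = line.partition("=")
        if sep:
            info[key] = value
    return info
```

With input such as "OSRELEASE=5.4.0\nPAGESIZE=4096\n", looking up "OSRELEASE" yields the release string needed to locate a matching vmlinux.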
Jay Kamat
054cb54a01 libdrgn: Rename find_symbol to find_symbol_by_address
2020-02-12 14:06:49 -08:00
Omar Sandoval
0a707b0c9d libdrgn: rework drgn_find_symbol_internal()
Instead of having two internal variants (drgn_find_symbol_internal() and
drgn_program_find_symbol_in_module()), combine them into the former and
add a separate drgn_error_symbol_not_found() for translating the static
error to the user-facing one. This makes things more flexible for the
next change.
2019-12-19 11:43:54 -08:00
Omar Sandoval
326107f054 libdrgn: add task_state_to_char() helper
Add a helper to get the state of a task (e.g., 'R', 'S', 'D'). This will
be used to make sure that a task is not running when getting a stack
trace, so implement it in libdrgn.
2019-10-28 13:37:57 -07:00
Omar Sandoval
91f5c8e2e7 libdrgn: stack_trace: support unwinding stack from thread ID
When debugging the Linux kernel, it's inconvenient to have to get the
task_struct of a thread in order to get its stack trace. This adds
support for looking it up solely by PID. In that case, we do the
find_task() inside of libdrgn. This also gives us stack trace support
for userspace core dumps almost for free since we already added support
for NT_PRSTATUS.
2019-10-28 13:37:53 -07:00
Omar Sandoval
0f7ad0ed26 libdrgn: stack_trace: support unwinding stack from core dump
vmcores include a NT_PRSTATUS note for each CPU containing the PID of
the task running on that CPU at the time of the crash and its registers.
We can use that to unwind the stack of the crashed tasks.
2019-10-28 13:36:02 -07:00
Omar Sandoval
75c74022ff libdrgn: save Elf handle for core dump
Currently, we close the Elf handle in drgn_set_core_dump() after we're
done with it. However, we need the Elf handle in
userspace_report_debug_info(), so we reopen it temporarily. We will also
need it to support getting stack traces from core dumps, so we might as
well keep it open. Note that we keep it even if we're using libkdumpfile
because libkdumpfile doesn't seem to have an API to access ELF notes.
2019-10-28 13:09:25 -07:00
Omar Sandoval
4fb0e2e110 libdrgn: use new libdwfl stack trace API
2019-10-18 14:34:11 -07:00
Omar Sandoval
423d2cd500 libdrgn: dwarf_index: rework file reporting
Currently, the interface between the DWARF index, libdwfl, and the code
which finds and reports vmlinux/kernel modules is spaghetti. The DWARF
index tracks Dwfl_Modules via their userdata. However, despite
conceptually being owned by the DWARF index, the reporting code reports
the Dwfl_Modules and sets up the userdata. These Dwfl_Modules and
drgn_dwfl_module_userdatas are messy to track and pass between the
layers.

This reworks the architecture so that the DWARF index owns the Dwfl
instance and files are reported to the DWARF index; the DWARF index
takes care of reporting to libdwfl internally. In addition to making the
interface for the reporter much cleaner, this improves a few things as a
side-effect:

- We now deduplicate on build ID in addition to path.
- We now skip searching for vmlinux and/or kernel modules if they were
  already indexed.
- We now support compressed ELF files via libdwelf.
- We can now load default debug info at the same time as additional
  debug info.
2019-10-02 17:22:11 -07:00
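
The deduplication described in the first bullet can be pictured with a small sketch: a reporter that accepts a file only if neither its path nor its build ID has been seen before. FileReporter is a hypothetical stand-in for the DWARF index's reporting interface, and extracting the build ID from the ELF file is left out.

```python
from typing import List, Optional, Set


class FileReporter:
    """Toy reporter that deduplicates files by path and by build ID."""

    def __init__(self) -> None:
        self._paths: Set[str] = set()
        self._build_ids: Set[bytes] = set()
        self.indexed: List[str] = []

    def report(self, path: str, build_id: Optional[bytes]) -> bool:
        """Record a file; return False if it duplicates an earlier one."""
        if path in self._paths:
            return False
        if build_id is not None and build_id in self._build_ids:
            return False
        self._paths.add(path)
        if build_id is not None:
            self._build_ids.add(build_id)
        self.indexed.append(path)
        return True
```

Checking the build ID catches the case where the same binary is reachable through two different paths.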
Omar Sandoval
10142f922f Add basic stack trace support
For now, we only support stack traces for the Linux kernel (at least
v4.9) on x86-64, and we only support getting the program counter and
corresponding function symbol from each stack frame.
2019-08-02 00:26:28 -07:00
Serapheim Dimitropoulos
93d7ea9f01 Add support for kdump-compressed core dumps with libkdumpfile
2019-08-02 00:20:16 -07:00
Omar Sandoval
690b5fd650 libdrgn: generalize architecture to platform
For stack trace support, we'll need to have some architecture-specific
functionality. drgn's current notion of an architecture doesn't actually
include the instruction set architecture. This change expands it to a
"platform", which includes the ISA as well as the existing flags.
2019-08-02 00:11:56 -07:00
Omar Sandoval
71e6744210 libdrgn: add symbol table interface
Now that we're not overloading the name "symbol", we can define struct
drgn_symbol as a symbol table entry. For now, this is very minimal: it's
just a name, address, and size. We can then add a way to find the symbol
for a given address, drgn_program_find_symbol(). For now, this is only
supported through the actual ELF symbol tables. However, in the future,
we can probably support adding "symbol finders".
2019-07-30 09:25:34 -07:00
Omar Sandoval
0c5df56fba libdrgn: replace symbol index with object index
struct drgn_symbol doesn't really represent a symbol; it's just an
object which hasn't been fully initialized (see c2be52dff0 ("libdrgn:
rename object index to symbol index"); it used to be called a "partial
object"). For stack traces, we're going to have a notion of a symbol
that more closely represents an ELF symbol, so let's get rid of the
temporary struct drgn_symbol representation and just return an object
directly.
2019-07-29 17:04:47 -07:00
Omar Sandoval
27a27940bc libdrgn: split up drgn_program_get_dwarf()
We don't need to get the DWARF index at the time we get the Dwfl handle,
so get rid of drgn_program_get_dwarf(), add drgn_program_get_dwfl(), and
create the DWARF index right before we update in a new function,
drgn_program_update_dwarf_index().
2019-07-19 09:26:30 -07:00
Omar Sandoval
e5874ad18a libdrgn: use libdwfl
libdwfl is the elfutils "DWARF frontend library". It has high-level
functionality for looking up symbols, walking stack traces, etc. In
order to use this functionality, we need to report our debugging
information through libdwfl. For userspace programs, libdwfl has a much
better implementation than drgn for automatically finding debug
information from a core dump or PID. However, for the kernel, libdwfl
has a few issues:

- It only supports finding debug information for the running kernel, not
  vmcores.
- It determines the vmlinux address range by reading /proc/kallsyms,
  which is slow (~70ms on my machine).
- If separate debug information isn't available for a kernel module, it
  finds it by walking /lib/modules/$(uname -r)/kernel; this is repeated
  for every module.
- It doesn't find kernel modules with names containing both dashes and
  underscores (e.g., aes-x86_64).

Luckily, drgn already solved all of these problems, and with some
effort, we can keep doing it ourselves and report it to libdwfl.

The conversion replaces a bunch of code for dealing with userspace core
dump notes, /proc/$pid/maps, and relocations.
2019-07-15 12:27:48 -07:00
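
The last libdwfl issue above comes down to name matching: the kernel reports module names with underscores, while the file on disk may use dashes. A minimal sketch of matching that tolerates the difference follows. The function names and the example path in the usage note are hypothetical, and this is not the code drgn uses.

```python
import os
from typing import Iterable, Optional


def normalize_module_name(name: str) -> str:
    """Treat dashes and underscores in module names as equivalent."""
    return name.replace("-", "_")


def find_module_file(loaded_name: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path whose file name matches the module."""
    wanted = normalize_module_name(loaded_name)
    for path in candidates:
        # Drop the directory and every extension (.ko, .ko.xz, ...).
        stem = os.path.basename(path).split(".", 1)[0]
        if normalize_module_name(stem) == wanted:
            return path
    return None
```

With this, a loaded module named aes_x86_64 matches a file named aes-x86_64.ko.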
Omar Sandoval
129f1493b8 libdrgn: split kernel-specific stuff out of program.c
Almost half of program.c is stuff specific to the Linux kernel, so let's
separate that out (and combine it with the existing kernel module code).
2019-07-08 16:53:58 -07:00
Omar Sandoval
f55158c74c libdrgn: add PAGE_{SHIFT,SIZE,MASK} symbols from vmcoreinfo
Since we currently don't parse DWARF macro information, there's no easy
way to get the value PAGE_SIZE and friends in drgn. However, vmcoreinfo
contains the value of PAGE_SIZE, so let's add a special symbol finder
that returns that.
2019-05-29 00:02:48 -07:00
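
Given only the page size, the other two constants follow by arithmetic, which is what makes a single vmcoreinfo value sufficient. The sketch below shows the derivation; page_constants is an illustrative name, and the 64-bit word size is an assumption of the example.

```python
def page_constants(page_size: int, word_bits: int = 64) -> tuple:
    """Derive (PAGE_SHIFT, PAGE_SIZE, PAGE_MASK) from the page size."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a positive power of two")
    # PAGE_SHIFT is log2(PAGE_SIZE).
    page_shift = page_size.bit_length() - 1
    # PAGE_MASK clears the offset bits; truncate to the machine word.
    page_mask = ~(page_size - 1) & ((1 << word_bits) - 1)
    return page_shift, page_size, page_mask
```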
Omar Sandoval
ed6a6f0b3e libdrgn: get module section address from sysfs when possible
In the running kernel, we don't have to walk the list of modules and
module sections, since we can just look it up directly in sysfs.
2019-05-13 18:05:09 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
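
A callback-based design of this kind can be sketched with memory alone: the program keeps a list of address ranges, each backed by a user-supplied read function. ToyMemory and the callback signature here are assumptions for illustration and do not reproduce the libdrgn API.

```python
from typing import Callable, List, Tuple

# Hypothetical callback signature: (address, count, offset_in_segment) -> bytes.
ReadFn = Callable[[int, int, int], bytes]


class ToyMemory:
    """Toy memory reader built from pluggable segment callbacks."""

    def __init__(self) -> None:
        self._segments: List[Tuple[int, int, ReadFn]] = []

    def add_memory_segment(self, address: int, size: int, read_fn: ReadFn) -> None:
        self._segments.append((address, size, read_fn))

    def read(self, address: int, count: int) -> bytes:
        # Later registrations take precedence over earlier ones.
        for start, size, read_fn in reversed(self._segments):
            if start <= address and address + count <= start + size:
                return read_fn(address, count, address - start)
        raise LookupError(f"no segment covers {address:#x}")
```

A caller could back a segment with a file, a network connection, or an in-memory buffer, which is the sort of creative use the callback approach allows.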
Omar Sandoval
5200a6652c libdrgn: embed memory reader, type index, and symbol index in program
2019-05-06 14:55:34 -07:00
Omar Sandoval
ba162ac001 libdrgn: remove endianness from type index
The type index doesn't need to know or care about endianness. Move it to
the program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
77443cecd1 libdrgn: use NULL-terminated arrays for mock {type,symbol} index
This will simplify the next couple of changes.
2019-05-06 14:55:34 -07:00
Omar Sandoval
c2be52dff0 libdrgn: rename object index to symbol index
An "object index" doesn't actually index objects, but really "partial
objects" -- i.e., a type + address. "Symbol" is a better name for this.
2019-05-06 14:55:34 -07:00
Omar Sandoval
d0633d2f1a libdrgn: allow multiple cleanup callbacks
For now, this makes things slightly awkward, but it will be necessary
for the upcoming changes making drgn_program more pluggable.
2019-05-06 14:55:34 -07:00
Omar Sandoval
043cddf6d8 libdrgn: move member cache to type index
It makes more sense here than in struct drgn_program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00
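
To illustrate the move from a set of strings to a bitmask for qualifiers, here is a minimal sketch using Python's enum.Flag. The member names and values are assumptions chosen for the example, and qualifiers_from_strings is a hypothetical conversion helper for the old representation.

```python
import enum
from typing import Iterable


class Qualifiers(enum.Flag):
    """Type qualifiers as combinable bit flags (values are illustrative)."""

    NONE = 0
    CONST = 1
    VOLATILE = 2
    RESTRICT = 4
    ATOMIC = 8


def qualifiers_from_strings(names: Iterable[str]) -> Qualifiers:
    """Convert the old set-of-strings form, e.g. {"const"}, to a bitmask."""
    result = Qualifiers.NONE
    for name in names:
        result |= Qualifiers[name.upper()]
    return result
```

A bitmask makes combination and membership tests simple operations: Qualifiers.CONST | Qualifiers.VOLATILE builds a value, and Qualifiers.CONST in value tests it.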