JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-25 02:13:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	1088ef4a1e	libdrgn: platform: replace demangle_return_address() with demangle_cfi_registers() While documenting struct drgn_architecture_info, I realized that demangle_return_address() is difficult to explain. It's more straightforward to define this functionality as demangling any registers that are mangled when using CFI rather than just the return address register. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-12-02 13:52:06 -08:00
Omar Sandoval	18b12a5c7b	libdrgn: get .eh_frame from the correct file We're currently getting .eh_frame from the debug file. However, since .eh_frame is an SHF_ALLOC section, it is actually in the loaded file, and may not be in the debug file. This causes us to fail to unwind in modules whose debug file was created with objcopy --only-keep-debug (which is typical for Linux distro debug files). Fix it by getting .eh_frame from the loaded file. To make this easier, we split .eh_frame and .debug_frame data into two separate tables. We also don't bother deduplicating them anymore, since GCC and Clang only seem to generate one or the other in practice. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	bcb53d712b	libdrgn: bypass libdwfl with struct drgn_elf_file Now that we track the debug file ourselves, we can avoid calling libdwfl in a bunch of places. By tracking the bias ourselves, we can avoid a bunch more. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	34f122144a	libdrgn: debug_info: wrap ELF file information in new struct drgn_elf_file struct drgn_module contains a bunch of information about the debug info file. Let's pull it out into its own structure, struct drgn_elf_file. This will be reused for the "main"/"loaded" file in an upcoming change. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	b3bab1c5b0	libdrgn: make module vs. program platform difference more clear It's confusing that we have a platform both for the program and for each module. They usually match, but they're not required to. For example, the user can manually add a file with a different platform just to read its debug info. Our rule is that if we're parsing anything from the module, we use the module platform; and otherwise, use the program platform. There are a couple of places where the platforms must match: when using call frame information (CFI) or registers. Let's make all of this more clear in the code (by using the module's platform even when it must match the program's platform) and in comments. No functional change. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 12:53:45 -08:00
Omar Sandoval	222680b47a	Add StackFrame.sp We have some generic helpers that we'd like to add (for example, #210) that need to know the stack pointer of a frame. These shouldn't need to hard-code register names for different architectures. Add a generic shortcut, StackFrame.sp. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-22 18:47:16 -08:00
Stephen Brennan	5f3a91f80d	Add StackFrame.locals() method The StackFrame's __getitem__() method allows looking up names in the scope of a stack frame, which is an incredibly useful tool for debugging. However, the names are not discoverable -- you must already be looking at the source code or some other source to know what names can be queried. To fix this, add a locals() method to StackFrame, which lists names that can be queried in the scope. Since this method is named locals(), it stops at the function scope and doesn't include globals or class members. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>	2022-11-02 22:40:33 -07:00
Omar Sandoval	87b7292aa5	Relicense drgn from GPLv3+ to LGPLv2.1+ drgn is currently licensed as GPLv3+. Part of the long term vision for drgn is that other projects can use it as a library providing programmatic interfaces for debugger functionality. A more permissive license is better suited to this goal. We decided on LGPLv2.1+ as a good balance between software freedom and permissiveness. All contributors not employed by Meta were contacted via email and consented to the license change. The only exception was the author of commit `c4fbf7e589` ("libdrgn: fix for compilation error"), who did not respond. That commit reverted a single line of code to one originally written by me in commit `640b1c011d` ("libdrgn: embed DWARF index in DWARF info cache"). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-01 17:05:16 -07:00
Omar Sandoval	d465071651	libdrgn: replace copies of elfutils headers with generated files Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-01 15:41:53 -07:00
Omar Sandoval	70af25849c	libdrgn: rename drgn_debug_info_module to drgn_module Eventually, modules will be exposed as part of the public libdrgn API, so they should have a clean name. Additionally, the module API I'm currently working on will allow modules for which we don't have the debug info file, so "debug info module" would be a misnomer. Also rename drgn_dwarf_module_info to drgn_module_dwarf_info and drgn_orc_module_info to drgn_module_orc_info to fit the new naming scheme better. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-05 16:52:46 -07:00
Omar Sandoval	03d5c2ebac	libdrgn: string_builder: replace string_builder_finalize() Instead of string_builder_finalize(), which leaves the string_builder in an undefined state (according to the documentation, at least), define string_builder_null_terminate(), which documents exactly what it does. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-05 15:55:04 -07:00
Omar Sandoval	d76a3a338f	libdrgn: string_builder: add dedicated initializer Rather than documenting how to initialize a struct string_builder, provide an initializer, STRING_BUILDER_INIT. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-05 15:32:07 -07:00
Omar Sandoval	0b7ac5b046	Fix vmcore stack traces on Linux < 4.9 or >= 5.16 and add drgn.helpers.linux.task_cpu() task->cpu was moved to task->thread_info.cpu in Linux 5.16, which causes drgn_get_initial_registers() to think that the kernel is !SMP and use CPU 0 instead, producing incorrect stack traces. This has also always been wrong for Linux < 4.9 and on architectures that don't enable CONFIG_THREAD_INFO_IN_TASK; in those cases, it should be ((struct thread_info *)task->stack)->cpu. Fix it by factoring out a new task_cpu() helper that handles all of the above cases. Also add a test case for task_cpu() in case this changes again. Fixes: `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-03 16:21:12 -07:00
Omar Sandoval	f8ba278bc1	libdrgn: fix include-what-you-use warnings It's been awhile since I've run this. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-08-26 12:43:20 -07:00
Omar Sandoval	faaf01ad1b	Add drgn.StackTrace.prog and drgn_stack_trace_program() If we only have the stack trace available, it's useful to get the program it came from. This'll be used eventually for helpers that take a stack trace. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-08-11 14:45:54 -07:00
Omar Sandoval	63c0684b68	libdrgn: aarch64: mask away pointer authentication code in return addresses Now that we track RA_SIGN_STATE and get the pointer authentication code mask, we can remove the pointer authentication code from the return address while unwinding. Add a new architecture callback, ->demangle_return_address(), for this purpose. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-06-26 09:18:07 -07:00
Omar Sandoval	9c9a2136f1	libdrgn: cfi: add rule to set register to constant This will be used to implement DW_CFA_AARCH64_negate_ra_state. Also fix a typographical error in a nearby comment. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-06-26 09:18:07 -07:00
Omar Sandoval	42e37e72c1	libdrgn: stack_trace: fix byte order for drgn_stack_frame_register() drgn_stack_frame_register() gets the register value with copy_lsbytes() and then byte swaps it if the program's byte order is different from the host's. But, copy_lsbytes() already fixes the byte order, so this ends up with the original (wrong) byte order. We also don't need to zero out the integer that we copy into since copy_lsbytes() also does that. Fixes: `eec67768aa` ("libdrgn: replace elfutils DWARF unwinder with our own") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-06-24 09:17:56 -07:00
Kevin Svetlitski	301cc767ba	Implement a new API for representing threads Previously, drgn had no way to represent a thread – retrieving a stack trace (the only extant thread-specific operation) was achieved by requiring the user to directly provide a tid. This commit introduces the scaffolding for the design outlined in issue #92, and implements the corresponding methods for userspace core dumps, the live Linux kernel, and Linux kernel core dumps. Future work will build on top of this commit to support live userspace processes. Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>	2022-01-11 17:28:17 -08:00
Omar Sandoval	69c069b09f	libdrgn: allow NULL argument to drgn_stack_trace_destroy() This is one place where I broke the convention that I just documented. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-01-06 18:23:27 -08:00
Omar Sandoval	c0d8709b45	Update copyright headers to Meta Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-21 15:59:44 -08:00
Omar Sandoval	5591d199b1	libdrgn: debug_info: split DWARF support into its own file Continuing the refactoring from the previous commit, move the DWARF code from debug_info.c to its own file, leaving only the generic ELF file management in debug_info.c Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:54 -08:00
Omar Sandoval	d1745755f1	Fix some include-what-you-use warnings Also: * Rename struct string to struct nstring and move it to its own header. * Fix scripts/iwyu.py, which was broken by commit `5541fad063` ("Fix some flake8 errors"). * Add workarounds for a few outstanding include-what-you-use issues. There is still a false positive for include-what-you-use/include-what-you-use#970, but hopefully that is fixed soon. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-10 15:09:29 -08:00
Omar Sandoval	802d6cc9ff	libdrgn: rename drgn_program::_dbinfo to dbinfo The underscore was meant to discourage direct access in favor of using drgn_program_get_dbinfo(), but it turns out that it's more normal to access it directly. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-23 00:52:23 -07:00
Omar Sandoval	add17a9a36	libdrgn: stack_trace: fix source info without .debug_aranges dwfl_module_getsrc() relies on .debug_aranges to find the CU containing the PC. If the module has a missing or incomplete .debug_aranges, it fails. This lookup is actually redundant since we already found the CU when we unwound the stack. Use the libdw helpers that take the CU DIE instead to avoid this. We also need to save the CU for frames where we found it but couldn't find the subprogram (typically assembly files). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-07-07 13:41:17 -07:00
Omar Sandoval	bc85767e5f	libdrgn: support looking up parameters and variables in stack traces After all of the preparatory work, the last two missing pieces are a way to find a variable by name in the list of scopes that we saved while unwinding, and a way to find the containing scopes of an inlined function. With that, we can finally look up parameters and variables in stack traces. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	38573cfdde	libdrgn: stack_trace: pretty print frames and add frames for inline functions If we want to access a parameter or local variable in an inlined function, then we need a stack frame for that function. It's also much more useful to see inlined functions in the stack trace in general. So, when we've unwound the registers for a stack frame, walk the debugging information to find all of the (possibly inlined) functions at the program counter, and add a drgn stack frame for each of those. Also add StackFrame.name and StackFrame.is_inline so that we can distinguish inline frames. Also add StackFrame.source() to get the filename and line and column numbers. Finally, add the source code location to pretty-printed stack traces and add pretty-printing for individual stack frames that includes extra information. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	a4b9d68a8c	Use GPL-3.0-or-later license identifier instead of GPL-3.0+ Apparently the latter is deprecated and the former is preferred. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-03 01:10:35 -07:00
Omar Sandoval	630d39e345	libdrgn: add ORC unwinder The Linux kernel has its own stack unwinding format for x86-64 called ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is essentially a simplified, less complete version of DWARF CFI. ORC is generated by analyzing machine code, so it is present for all but a few ignored functions. In contrast, DWARF CFI is generated by the compiler and is therefore missing for functions written in assembly and inline assembly (which is widespread in the kernel). This implements an ORC stack unwinder: it applies ELF relocations to the ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule kind, parses and efficiently stores ORC data, and translates ORC to drgn CFI rules. This will allow us to stack trace through assembly code, interrupts, and system calls. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-29 10:01:52 -07:00
Omar Sandoval	eec67768aa	libdrgn: replace elfutils DWARF unwinder with our own The elfutils DWARF unwinder has a couple of limitations: 1. libdwfl doesn't have an interface for getting register values, so we have to bundle a patched version of elfutils with drgn. 2. Error handling is very awkward: dwfl_getthread_frames() can return an error even on success, so we have to squirrel away our own errors in the callback. Furthermore, there are a couple of things that will be easier with our own unwinder: 1. Integrating unwinding using ORC will be easier when we're handling unwinding ourselves. 2. Support for local variables isn't too far away now that we have DWARF expression evaluation. Now that we have the register state, CFI, and DWARF expression pieces in place, stitch them together with the new unwinder, and tweak the public API a bit to reflect it. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-15 16:43:12 -07:00
Omar Sandoval	b899a10836	Remove register numbers from API and add register aliases enum drgn_register_number in the public libdrgn API and drgn.Register.number in the Python bindings are basically exports of DWARF register numbers. They only exist as a way to identify registers that's lighter weight than string lookups. libdrgn already has struct drgn_register, so we can use that to identify registers in the public API and remove enum drgn_register_number. This has a couple of benefits: we don't depend on DWARF numbering in our API, and we don't have to generate drgn.h from the architecture files. The Python bindings can just use string names for now. If it seems useful, StackFrame.register() can take a Register in the future, we'll just need to be careful to not allow Registers from the wrong platform. While we're changing the API anyways, also change it so that registers have a list of names instead of one name. This isn't needed for x86-64 at the moment, but will be for architectures that have multiple names for the same register (like ARM). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-28 17:47:45 -08:00
Omar Sandoval	46343ae08d	libdrgn: get rid of struct drgn_stack_frame In preparation for adding a "real", internal-only struct drgn_stack_frame, replace the existing struct drgn_stack_frame with explicit trace/frame arguments. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-27 11:22:34 -08:00
Omar Sandoval	5f17281926	libdrgn: make drgn_object::is_reference an enum To prepare for a new kind of object, replace the is_reference bool with an enum drgn_object_kind. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-04 13:37:58 -08:00
Omar Sandoval	edb1fe7f2f	libdrgn: rename drgn_object_kind to drgn_object_encoding I'd like to use the name drgn_object_kind to distinguish between values and references. "Encoding" is more accurate than "kind", anyways. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-04 12:02:26 -08:00
Omar Sandoval	286c09844e	Clean up #includes with include-what-you-use I recently hit a couple of CI failures caused by relying on transitive includes that weren't always present. include-what-you-use is a Clang-based tool that helps with this. It's a bit finicky and noisy, so this adds scripts/iwyu.py to make running it more convenient (but not reliable enough to automate it in Travis). This cleans up all reasonable include-what-you-use warnings and reorganizes a few header files. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-23 16:29:42 -07:00
Omar Sandoval	f83bb7c71b	libdrgn: move debugging information tracking into drgn_debug_info Debugging information tracking is currently in two places: drgn_program finds debugging information, and drgn_dwarf_index stores it. Both of these responsibilities make more sense as part of drgn_debug_info, so let's move them there. This prepares us to track extra debugging information that isn't pertinent to indexing. This also reworks a couple of details of loading debugging information: - drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into a single structure, drgn_debug_info_module. - The first pass of DWARF indexing now happens in parallel with reading compilation units (by using OpenMP tasks). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	7a85b4188e	libdrgn: clean up read.h helpers and avoid undefined pointer behavior There are a couple of related ways that we can cause undefined behavior when parsing a malformed DWARF or depmod index file: 1. There are several places where we increment the cursor to skip past some data. It is undefined behavior if the result points out of bounds of the data, even if we don't attempt to dereference it. 2. read_in_bounds() checks that ptr <= end. This pointer comparison is only defined if ptr and end both point to elements of the same array object or one past the last element. If ptr has gone past end, then this comparison is likely undefined anyways. Fix it by adding a helper to skip past data with bounds checking. Then, all of the helpers can assume that ptr <= end and maintain that invariant. While we're here and auditing all of the call sites, let's clean up the API and rename it from read_foo() to the less generic mread_foo(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	e49a87a3d7	libdrgn: remove struct drgn_object::prog We can get it via the type now. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:21 -07:00
Omar Sandoval	1b47b866b4	libdrgn: go back to trusting PRSTATUS PID Commit `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust") overlooked that if the task is running in userspace, the stack pointer in PRSTATUS obviously won't match the kernel stack pointer. Let's bite the bullet and use the PID. If the race shows up in practice, we can try to come up with another workaround.	2020-07-08 18:34:16 -07:00
Omar Sandoval	eea5422546	libdrgn: make Linux kernel stack unwinding more robust drgn has a couple of issues unwinding stack traces for kernel core dumps: 1. It can't unwind the stack for the idle task (PID 0), which commonly appears in core dumps. 2. It uses the PID in PRSTATUS, which is racy and can't actually be trusted. The solution for both of these is to look up the PRSTATUS note by CPU instead of PID. For the live kernel, drgn refuses to unwind the stack of tasks in the "R" state. However, the "R" state is running or runnable, so in the latter case, we can still unwind the stack. The solution for this is to look at on_cpu for the task instead of the state.	2020-05-20 12:03:00 -07:00
Omar Sandoval	146930aff8	libdrgn: replace arch frame_registers with callbacks We currently unwind from pt_regs and NT_PRSTATUS using an array of register definitions. It's more flexible and more efficient to do this with an architecture-specific callback. For x86-64, this change also makes us depend on the binary layout rather than member names of struct pt_regs, but that shouldn't matter unless people are defining their own, weird struct pt_regs.	2020-05-19 17:11:27 -07:00
Omar Sandoval	8b264f8823	Update copyright headers to Facebook and add missing headers drgn was originally my side project, but for a while now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.	2020-05-15 15:13:02 -07:00
Omar Sandoval	c339113f9c	libdrgn: adjust program counter when looking up frame symbol For functions that call a noreturn function, the compiler may omit code after the call instruction. This means that the return address may not lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is an "activation", i.e., its program counter is guaranteed to lie within the caller. This is only the case for the initial frame, frames interrupted by a signal, and the signal trampoline frame. For everything else, we need to decrement the program counter before doing any lookups.	2020-05-13 17:11:54 -07:00
Omar Sandoval	0a100064c1	libdrgn: improve and rename DRGN_UNREACHABLE() DRGN_UNREACHABLE() currently expands to abort(), but assert() provides more information. If NDEBUG is defined, we can use __builtin_unreachable() instead. DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to UNREACHABLE(). It's also not really related to errors, so this moves it to internal.h.	2020-05-07 15:16:22 -07:00
Omar Sandoval	10e58777c3	Add Program.read_{u8,u16,u32,u64,word}() I've found that I do this manually a lot (e.g., when digging through a task's stack). Add shortcuts for reading unsigned integers and a note for how to manually read other formats.	2020-04-27 17:27:10 -07:00
Serapheim Dimitropoulos	08193a97aa	Support stack traces for running threads on kdumps	2020-03-27 16:12:03 -07:00
Omar Sandoval	9246094cdc	libdrgn: use dwfl_frame_register() instead of dwfl_frame_eval_expr() I thought I'd be able to avoid adding a separate API for register values and reuse dwfl_frame_eval_expr(), but this doesn't work if the frame is missing debug information but has known register values (e.g., if the program crashed with an invalid instruction pointer).	2020-02-20 14:13:08 -08:00
Jay Kamat	054cb54a01	libdrgn: Rename find_symbol to find_symbol_by_address	2020-02-12 14:06:49 -08:00
Omar Sandoval	0a707b0c9d	libdrgn: rework drgn_find_symbol_internal() Instead of having two internal variants (drgn_find_symbol_internal() and drgn_program_find_symbol_in_module()), combine them into the former and add a separate drgn_error_symbol_not_found() for translating the static error to the user-facing one. This makes things more flexible for the next change.	2019-12-19 11:43:54 -08:00
Omar Sandoval	3b22bd3022	libdrgn: rename pretty_print -> format In preparation for making drgn_pretty_print_object() more flexible (i.e., not always "pretty"), rename it to drgn_format_object(). For consistency, let's rename drgn_pretty_print_type_name(), drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.	2019-12-16 11:21:12 -08:00
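
Note on commit c339113f9c (adjusting the program counter before symbol lookup): the rule described there can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not drgn's implementation. The symbol table and the names `find_symbol` and `frame_symbol` are hypothetical, and the `is_activation` flag stands in for what the commit message says dwfl_frame_pc() reports.

```python
import bisect

# Hypothetical symbol table: (start address, size, name), sorted by start.
# "caller" ends with a call to a noreturn function, so the return address
# of that call is the first byte after "caller".
SYMBOLS = [
    (0x1000, 0x20, "caller"),
    (0x1020, 0x40, "next_function"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def find_symbol(address):
    """Return the name of the symbol containing address, or None."""
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    return name if address < start + size else None


def frame_symbol(pc, is_activation):
    """Look up the symbol for a stack frame.

    For an "activation" frame (the initial frame, a frame interrupted by a
    signal, or the signal trampoline frame), pc is used as-is. For any other
    frame, pc is a return address, so it is decremented before the lookup to
    land inside the call instruction.
    """
    lookup_pc = pc if is_activation else pc - 1
    return find_symbol(lookup_pc)


# The return address 0x1020 lies just past the end of "caller".
print(frame_symbol(0x1020, is_activation=False))
print(frame_symbol(0x1020, is_activation=True))
```

Without the decrement, the non-activation lookup would attribute the frame to whatever symbol happens to follow the caller in memory.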


57 Commits
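
Note on commit 0b7ac5b046 (task_cpu()): the three cases listed in that commit message can be sketched with plain Python objects. This is a hedged illustration of the fallback order only. It does not use drgn objects, the `SimpleNamespace` values are stand-ins for kernel structures, and the function below is a hypothetical model rather than the code of drgn.helpers.linux.task_cpu().

```python
from types import SimpleNamespace


def task_cpu(task):
    """Return the CPU number for a modeled task, trying each layout in turn."""
    # Linux >= 5.16: the CPU number lives in task->thread_info.cpu.
    thread_info = getattr(task, "thread_info", None)
    if thread_info is not None and hasattr(thread_info, "cpu"):
        return thread_info.cpu
    # Kernels where the CPU number lives directly in task->cpu.
    if hasattr(task, "cpu"):
        return task.cpu
    # Linux < 4.9, or architectures without CONFIG_THREAD_INFO_IN_TASK:
    # task->stack points to a struct thread_info that holds the CPU number.
    # Here the "cast" is modeled by storing the thread_info object directly.
    return task.stack.cpu


# Modeled tasks for each layout (all values are made up for illustration).
new_layout = SimpleNamespace(thread_info=SimpleNamespace(cpu=3))
mid_layout = SimpleNamespace(thread_info=SimpleNamespace(flags=0), cpu=5)
old_layout = SimpleNamespace(stack=SimpleNamespace(cpu=7))

for task in (new_layout, mid_layout, old_layout):
    print(task_cpu(task))
```

Probing the newest layout first means a lookup never silently falls through to a default CPU just because a member moved, which is the failure mode the commit message describes.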