JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-22 17:23:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	f43af4b037	libdrgn: fix drgn_program_crashed_thread() on !SMP kernels On !SMP kernels, crashing_cpu either doesn't exist or is always -1, so drgn_program_crashed_thread() fails. Detect those cases and treat crashing_cpu as 0. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-04-01 15:10:04 -07:00
Omar Sandoval	50e4ac8245	libdrgn: allow overriding program default language Our cheap heuristic for the default language will not always be correct, and although we can improve it as cases arise, we should also just have a way for the user to explicitly set the default language. Add drgn_program_set_language() to libdrgn and allow setting drgn.Program.language in the Python bindings. This will also make unit testing different languages easier. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-16 13:29:12 -08:00
Omar Sandoval	12c2de2956	libdrgn: implement thread API for live processes This implements the existing thread API methods for live processes other than drgn_thread_stack_trace(). It also doesn't yet add support for full-blown tracing, but it at least brings live processes to feature parity. This is taken from the non-ptrace parts of Kevin Svetlitski's PR #142, with some modifications. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-12 13:33:41 -08:00
Omar Sandoval	28c5a2016b	libdrgn: split up some thread API functions drgn_thread_iterator_create(), drgn_thread_iterator_next(), and drgn_program_find_thread() have big, divergent code paths for different targets, and this would get worse once we add live processes. Split them up into multiple functions. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-12 01:42:37 -08:00
Omar Sandoval	c71300d024	libdrgn: use exact buffer sizes when formatting decimal numbers We have a few places where we format a decimal number with sprintf() or snprintf() to a buffer with an arbitrary size. Instead of this arbitrary size, let's add a macro to get the exact number of characters required to format a decimal number, use it in all of these places, and make all of these places use snprintf() just to be safe. This is more verbose but self-documenting. The max_decimal_length() macro is inspired by https://stackoverflow.com/a/13546502/1811295 with some improvements. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-12 01:16:54 -08:00
Omar Sandoval	98577e5e23	libdrgn: fix drgn_program_find_thread() for Linux kernel when thread isn't found If a TID does not exist, then linux_helper_find_task() succeeds but returns a null pointer object. Check for that instead of returning a bogus thread. Fixes: `301cc767ba` ("Implement a new API for representing threads") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-12 01:16:49 -08:00
Mykola Lysenko	7580fffbdf	Add drgn.Program.main_thread() Currently only supported for user-space crash dumps. E.g. no support for live user-space application debugging or kernel debugging. Closes #144. Signed-off-by: Mykola Lysenko <mykolal@fb.com>	2022-02-10 15:53:50 -08:00
Kevin Svetlitski	d51843017e	Fix double-free of crashed_thread Running the test suite with ASAN enabled revealed that when the current target was a userspace core dump, the `crashed_thread` member of `struct drgn_program` was being freed twice – once indirectly via `drgn_thread_set_deinit`, and once explicitly in `drgn_prog_deinit`. Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>	2022-02-02 16:51:21 -08:00
Stephen Brennan	7970a60818	Add methods to return multiple matching symbols Currently we can lookup symbols by name or address, but this will only return one symbol, prioritizing the global symbols. However, symbols may share the same name, and symbols may also overlap address ranges, so it's possible for searches to return multiple results. Add functions which can return a list of multiple matching symbols. Signed-off-by: Stephen Brennan <stephen@brennan.io>	2022-01-15 11:44:33 -08:00
Kevin Svetlitski	301cc767ba	Implement a new API for representing threads Previously, drgn had no way to represent a thread – retrieving a stack trace (the only extant thread-specific operation) was achieved by requiring the user to directly provide a tid. This commit introduces the scaffolding for the design outlined in issue #92, and implements the corresponding methods for userspace core dumps, the live Linux kernel, and Linux kernel core dumps. Future work will build on top of this commit to support live userspace processes. Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>	2022-01-11 17:28:17 -08:00
Omar Sandoval	e6abfeac03	libdrgn: debug_info: report userspace core dump debug info ourselves There are a few reasons for this: 1. dwfl_core_file_report() crashes on elfutils 0.183-0.185. Those versions are still used by several distros. 2. In order to support --main-symbols and --symbols properly, we need to report things ourselves. 3. I'm considering moving away from libdwfl in the long term. We provide an escape hatch for now: setting the environment variable DRGN_USE_LIBDWFL_REPORT=1 opts out of drgn's reporting and uses libdwfl's. Fixes #130. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-08 12:11:10 -08:00
Omar Sandoval	02912ca7d0	libdrgn: fix handling of p_filesz < p_memsz in core dumps I implemented the case of a segment in a core file with p_filesz < p_memsz by treating the difference as zero bytes. This is correct for ET_EXEC and ET_DYN, but for ET_CORE, it actually means that the memory existed in the program but was not saved. For userspace core dumps, this typically happens for read-only file mappings. For kernel core dumps, makedumpfile does this to indicate memory that was excluded. Instead, let's return a DRGN_FAULT_ERROR if an attempt is made to read from these bytes. In the future, we need to read from the executable/library files when we can. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-08 00:02:44 -08:00
Omar Sandoval	91f6d03ee8	libdrgn: fix note name matching The current code matches the desired note name as a prefix, but we need an exact match. Fixes: `75c3679147` ("Rewrite drgn core in C") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-03 12:03:19 -08:00
Omar Sandoval	c0d8709b45	Update copyright headers to Meta Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-21 15:59:44 -08:00
Omar Sandoval	ff40f65f0d	libdrgn: allow symbol name lookup to get local symbols Global symbols are preferred over weak symbols, and weak symbols are preferred over other symbols. dwfl_module_addrinfo() seems to have the same preference, so document address lookups as having the same behavior. (This is actually incorrect in the case of STB_GNU_UNIQUE, as dwfl_module_addrinfo() treats anything other than STB_GLOBAL, STB_WEAK, and STB_LOCAL as having the lowest precedence, but STB_GNU_UNIQUE is so obscure that it probably doesn't matter.) Based on work from Stephen Brennan. Closes #121. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-21 14:30:57 -08:00
Omar Sandoval	5591d199b1	libdrgn: debug_info: split DWARF support into its own file Continuing the refactoring from the previous commit, move the DWARF code from debug_info.c to its own file, leaving only the generic ELF file management in debug_info.c Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:54 -08:00
Omar Sandoval	d1745755f1	Fix some include-what-you-use warnings Also: * Rename struct string to struct nstring and move it to its own header. * Fix scripts/iwyu.py, which was broken by commit `5541fad063` ("Fix some flake8 errors"). * Add workarounds for a few outstanding include-what-you-use issues. There is still a false positive for include-what-you-use/include-what-you-use#970, but hopefully that is fixed soon. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-10 15:09:29 -08:00
Omar Sandoval	1339dc6a2f	libdrgn: hash_table: move entry_to_key to DEFINE_HASH_TABLE_FUNCTIONS() DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type. Move that argument (and some of the derived constants) to DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table types. As a nice side effect, it also reduces the size of common header files. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-23 00:52:23 -07:00
Omar Sandoval	802d6cc9ff	libdrgn: rename drgn_program::_dbinfo to dbinfo The underscore was meant to discourage direct access in favor of using drgn_program_get_dbinfo(), but it turns out that it's more normal to access it directly. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-23 00:52:23 -07:00
Omar Sandoval	c1e16ae3ec	libdrgn: fold drgn_program_get_dbinfo() into only caller The only time that we want to create the drgn_debug_info is when we're loading debugging information. Everywhere else, we fail fast if there is no debugging information. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-23 00:40:57 -07:00
Omar Sandoval	fba5947fec	libdrgn: add array_for_each() And use it in a few appropriate places. This should hopefully make it harder to make iteration mistakes like the one fixed by commit `4755cfac7c` ("libdrgn: dwarf_index: increment correct variable when rolling back"). While we're doing this, move ARRAY_SIZE() into a new header file with array_for_each() and make it lowercase. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-08-23 17:32:00 -07:00
Stephen Brennan	3d8db22c47	libdrgn: Add kind and binding fields to drgn_symbol Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>	2021-08-20 18:16:57 -07:00
Omar Sandoval	0e3054a0ba	libdrgn: make addresses wrap around when reading memory Define that addresses for memory reads wrap around after the maximum address rather than the current unpredictable behavior. This is done by: 1. Reworking drgn_memory_reader to work with an inclusive address range so that a segment can contain UINT64_MAX. drgn_memory_reader remains agnostic to the maximum address and requires that address ranges do not overflow a uint64_t. 2. Adding the overflow/wrap-around logic to drgn_program_add_memory_segment() and drgn_program_read_memory(). 3. Changing direct uses of drgn_memory_reader_reader() to drgn_program_read_memory() now that they are no longer equivalent. (For some platforms, a fault might be more appropriate than wrapping around, but this is a step in the right direction.) Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-03 17:49:29 -07:00
Omar Sandoval	e5ff1ea7ac	libdrgn: program: use preset platform in drgn_program_set_core_dump() If the program already had a platform set, we should use its callbacks instead of the ones from the ELF file's platform. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-03 17:04:28 -07:00
Omar Sandoval	a4b9d68a8c	Use GPL-3.0-or-later license identifier instead of GPL-3.0+ Apparently the latter is deprecated and the former is preferred. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-03 01:10:35 -07:00
Omar Sandoval	630d39e345	libdrgn: add ORC unwinder The Linux kernel has its own stack unwinding format for x86-64 called ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is essentially a simplified, less complete version of DWARF CFI. ORC is generated by analyzing machine code, so it is present for all but a few ignored functions. In contrast, DWARF CFI is generated by the compiler and is therefore missing for functions written in assembly and inline assembly (which is widespread in the kernel). This implements an ORC stack unwinder: it applies ELF relocations to the ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule kind, parses and efficiently stores ORC data, and translates ORC to drgn CFI rules. This will allow us to stack trace through assembly code, interrupts, and system calls. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-29 10:01:52 -07:00
Serapheim Dimitropoulos	a68abd5de4	libdrgn: stretch minimum supported version of libelf to 0.170 Currently libdrgn requires libelf to be of version 0.175 or later. This patch allows the library to be compiled with libelf 0.170 (the newest version supported by Ubuntu 18.04 LTS). Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>	2021-03-21 14:28:29 -07:00
Omar Sandoval	a7962e9477	libdrgn: debug_info: pass around Dwfl_Module instead of bias We're going to need the module start and end in drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and get the bias when we need it. This means we don't need the bias from drgn_dwarf_index_get_die(), so get rid of that, too. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-21 10:12:29 -08:00
Omar Sandoval	798f0887a5	libdrgn: simplify language fall back handling If the language for a DWARF type is not found or unrecognized, we should fall back to the global default, not the program default (the program default language is for language-specific operations on the program, so DWARF parsing shouldn't depend on it). Add a fall_back parameter to drgn_language_from_die() and use it in DWARF parsing, and replace drgn_language_or_default() with a drgn_default_language variable. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-08 10:46:35 -08:00
Omar Sandoval	e72ecd0e2c	libdrgn: replace drgn_program_member_info() with drgn_type_find_member() Now that types are associated with their program, we don't need to pass the program separately to drgn_program_member_info() and can replace it with a more natural drgn_type_find_member() API that takes only the type and member name. While we're at it, get rid of drgn_member_info and return the drgn_type_member and bit_offset directly. This also fixes a bug that drgn_error_member_not_found() ignores the member name length. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 14:40:54 -08:00
Omar Sandoval	22c1d87aec	libdrgn: cache page_offset and vmemmap as objects instead of uint64_t This is a little cleaner and saves on conversions back and forth between C values and objects. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-10 02:40:07 -08:00
Omar Sandoval	3360170336	libdrgn: only install page table memory reader when supported If virtual address translation isn't implemented for the target architecture, then we shouldn't add the page table memory reader. If we do, we get a DRGN_ERROR_INVALID_ARGUMENT error from linux_helper_read_vm() instead of a DRGN_ERROR_FAULT error as expected. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-11-27 01:27:30 -08:00
Omar Sandoval	3c5d22637e	libdrgn: clean up hash function APIs and improve documentation Use _hash_pair() for hash functions that do the full double hashing and return a struct hash_pair and hash_() for other hashing utility functions. Also change some of the equality function names to be more symmetric and improve the documentation. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-12 16:20:08 -07:00
Omar Sandoval	286c09844e	Clean up #includes with include-what-you-use I recently hit a couple of CI failures caused by relying on transitive includes that weren't always present. include-what-you-use is a Clang-based tool that helps with this. It's a bit finicky and noisy, so this adds scripts/iwyu.py to make running it more convenient (but not reliable enough to automate it in Travis). This cleans up all reasonable include-what-you-use warnings and reorganizes a few header files. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-23 16:29:42 -07:00
Omar Sandoval	f83bb7c71b	libdrgn: move debugging information tracking into drgn_debug_info Debugging information tracking is currently in two places: drgn_program finds debugging information, and drgn_dwarf_index stores it. Both of these responsibilities make more sense as part of drgn_debug_info, so let's move them there. This prepares us to track extra debugging information that isn't pertinent to indexing. This also reworks a couple of details of loading debugging information: - drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into a single structure, drgn_debug_info_module. - The first pass of DWARF indexing now happens in parallel with reading compilation units (by using OpenMP tasks). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	3ac9ae357b	libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info The current name is too verbose. Let's go with a shorter, more generic name. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-11 17:41:23 -07:00
Jay Kamat	d1beb0184a	libdrgn: add support for objects in C++ namespaces DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the DWARF index, with each namespace being its own sub-index. We only index the namespace itself when it is first accessed, which should help with startup time and simplifies tracking. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	66ad5077c9	libdrgn: dwarf_index: return indexed DIE entry from drgn_dwarf_index_iterator_next() For namespace support, we will want to access the struct drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	7a85b4188e	libdrgn: clean up read.h helpers and avoid undefined pointer behavior There are a couple of related ways that we can cause undefined behavior when parsing a malformed DWARF or depmod index file: 1. There are several places where we increment the cursor to skip past some data. It is undefined behavior if the result points out of bounds of the data, even if we don't attempt to dereference it. 2. read_in_bounds() checks that ptr <= end. This pointer comparison is only defined if ptr and end both point to elements of the same array object or one past the last element. If ptr has gone past end, then this comparison is likely undefined anyways. Fix it by adding a helper to skip past data with bounds checking. Then, all of the helpers can assume that ptr <= end and maintain that invariant. While we're here and auditing all of the call sites, let's clean up the API and rename it from read_foo() to the less generic mread_foo(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	ff96c75da0	helpers: translate task_state_to_char() to Python Commit `326107f054` ("libdrgn: add task_state_to_char() helper") implemented task_state_to_char() in libdrgn so that it could be used in commit `4780c7a266` ("libdrgn: stack_trace: prohibit unwinding stack of running tasks"). As of commit `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust"), it is no longer used in libdrgn, so we can translate it to Python. This removes a bunch of code and is more useful as an example. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 13:54:39 -07:00
Omar Sandoval	e49a87a3d7	libdrgn: remove struct drgn_object::prog We can get it via the type now. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:21 -07:00
Omar Sandoval	c31208f69c	libdrgn: fold drgn_type_index into drgn_program This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:36:35 -07:00
Omar Sandoval	d4e0771f87	libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}() Most places that call these check has_platform and return an error, and those that don't can live with the extra check. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 16:56:28 -07:00
Omar Sandoval	1b47b866b4	libdrgn: go back to trusting PRSTATUS PID Commit `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust") overlooked that if the task is running in userspace, the stack pointer in PRSTATUS obviously won't match the kernel stack pointer. Let's bite the bullet and use the PID. If the race shows up in practice, we can try to come up with another workaround.	2020-07-08 18:34:16 -07:00
Omar Sandoval	948cda2941	libdrgn: add vector/hash table initializers and update coding style Declaring a local vector or hash table and separately initializing it with vector_init()/hash_table_init() is annoying. Add macros that can be used as initializers. This exposes several places where the C89 style of placing all declarations at the beginning of a block is awkward. I adopted this style from the Linux kernel, which uses C89 and thus requires this style. I'm now convinced that it's usually nicer to declare variables where they're used. So let's officially adopt the style of mixing declarations and code (and ditch the blank line after declarations) and update the functions touched by this change.	2020-07-01 12:48:24 -07:00
Omar Sandoval	eea5422546	libdrgn: make Linux kernel stack unwinding more robust drgn has a couple of issues unwinding stack traces for kernel core dumps: 1. It can't unwind the stack for the idle task (PID 0), which commonly appears in core dumps. 2. It uses the PID in PRSTATUS, which is racy and can't actually be trusted. The solution for both of these is to look up the PRSTATUS note by CPU instead of PID. For the live kernel, drgn refuses to unwind the stack of tasks in the "R" state. However, the "R" state is running or runnable, so in the latter case, we can still unwind the stack. The solution for this is to look at on_cpu for the task instead of the state.	2020-05-20 12:03:00 -07:00
Omar Sandoval	8b264f8823	Update copyright headers to Facebook and add missing headers drgn was originally my side project, but for a while now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.	2020-05-15 15:13:02 -07:00
Omar Sandoval	2d1481f5ab	libdrgn: add page table walker kernel memory reader Now that we can walk page tables, we can use it in a memory reader that reads kernel memory via the kernel page table. This means that we don't need libkdumpfile for ELF vmcores anymore (although I'll keep the functionality around until this code has been validated more).	2020-05-08 17:37:56 -07:00
Omar Sandoval	d0a1718451	libdrgn: implement virtual address translation/page table walking There are a few big use cases for this in drgn: * Helpers for accessing memory in the virtual address space of userspace tasks. * Removing the libkdumpfile dependency for vmcores. * Handling gaps in the virtual address space of /proc/kcore (cf. #27). I dragged my feet on implementing this because I thought it would be more complicated, but the page table layout on x86-64 isn't too bad. This commit implements page table walking using a page table iterator abstraction. The first thing we'll add on top of this will be a helper for reading memory from a virtual address space, but in the future it'd also be possible to export the page table iterator directly.	2020-05-08 17:36:19 -07:00
Omar Sandoval	23574e59d5	libdrgn: add /proc/kcore physical segments on old kernels Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so it's currently not possible to read from physical memory on old kernels. However, if we can figure out the address of the direct mapping, then we can determine the corresponding physical addresses for the segments and add them.	2020-05-04 13:20:27 -07:00
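
Commit c71300d024 replaces arbitrary buffer sizes with a macro that gives the exact number of characters needed to format a decimal number. The following is a minimal sketch of that idea, not the macro from the commit: the names `TYPE_IS_SIGNED`, `DECIMAL_STRLEN`, and `format_int64` are hypothetical, and the 28/93 approximation of log10(2) is an assumption chosen for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration; not the macro added by the commit. */
#define TYPE_IS_SIGNED(type) ((type)-1 < (type)0)

/*
 * Maximum number of characters needed to print any value of an integer type
 * in decimal, excluding the terminating NUL. 28/93 is slightly larger than
 * log10(2), so the digit count is never underestimated. One extra character
 * is reserved for the sign of signed types.
 */
#define DECIMAL_STRLEN(type)						\
	(TYPE_IS_SIGNED(type) +						\
	 (sizeof(type) * CHAR_BIT - TYPE_IS_SIGNED(type)) * 28 / 93 + 1)

/* Format value into buf and return the number of characters written. */
static int format_int64(char *buf, size_t size, int64_t value)
{
	int len = snprintf(buf, size, "%lld", (long long)value);
	/* With a buffer of DECIMAL_STRLEN(int64_t) + 1 this cannot truncate. */
	assert(len >= 0 && (size_t)len < size);
	return len;
}
```

A caller would declare `char buf[DECIMAL_STRLEN(int64_t) + 1];` and pass `sizeof(buf)` to `format_int64()`, so the buffer size documents itself.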
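
Commit 7a85b4188e describes adding a bounds-checked skip helper so that the cursor never moves past the end of the data. A rough sketch of that pattern follows, assuming the cursor and end are `const char *` pointers into the same buffer. The names `skip_bytes` and `read_u8` are hypothetical and are not the `mread_foo()` helpers from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration. Both assume and preserve the
 * invariant *ptr <= end, so no out-of-bounds pointer is ever formed.
 */

/* Advance *ptr by size bytes, or fail without moving if that would pass end. */
static bool skip_bytes(const char **ptr, const char *end, size_t size)
{
	/* end - *ptr is well defined because *ptr <= end. */
	if ((size_t)(end - *ptr) < size)
		return false;
	*ptr += size;
	return true;
}

/* Read one byte and advance, or fail if no bytes are left. */
static bool read_u8(const char **ptr, const char *end, uint8_t *ret)
{
	if (*ptr == end)
		return false;
	*ret = (uint8_t)**ptr;
	(*ptr)++;
	return true;
}
```

The check compares the remaining length against the requested size instead of computing `*ptr + size` first, which is what keeps the pointer arithmetic defined on malformed input.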
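
Commit 91f6d03ee8 fixes note name matching to require an exact match instead of a prefix match. The sketch below shows one way to write such a check; it is not the code from the commit. It relies on the ELF convention that a note's `namesz` counts the terminating NUL byte, and the function name `note_name_is` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration. Return whether an ELF note name of
 * namesz bytes (including the terminating NUL) is exactly `wanted`.
 */
static bool note_name_is(const char *name, size_t namesz, const char *wanted)
{
	size_t len = strlen(wanted);

	/*
	 * Comparing the sizes first rejects names that merely start with
	 * `wanted`; comparing len + 1 bytes also checks the NUL terminator.
	 */
	return namesz == len + 1 && memcmp(name, wanted, len + 1) == 0;
}
```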
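
Commit 0e3054a0ba defines that memory reads wrap around after the maximum address. A read that crosses that boundary can be handled by splitting it in two, and the sketch below computes the size of the first part. This is an illustration under assumptions, not libdrgn's logic: the helper name `bytes_before_wrap` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration. For a read of count bytes (count > 0)
 * starting at address, return how many bytes can be read before the address
 * wraps from max_address back to 0.
 */
static uint64_t bytes_before_wrap(uint64_t address, uint64_t count,
				  uint64_t max_address)
{
	assert(count > 0 && address <= max_address);

	/* Bytes available after the first one; this cannot overflow. */
	uint64_t remaining = max_address - address;

	/*
	 * Compare against count - 1 rather than computing remaining + 1 up
	 * front, which would overflow when address is 0 and max_address is
	 * UINT64_MAX.
	 */
	return count - 1 <= remaining ? count : remaining + 1;
}
```

If the result is smaller than `count`, the rest of the read continues at address 0.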

113 Commits