JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-23 09:43:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	0e3054a0ba	libdrgn: make addresses wrap around when reading memory Define that addresses for memory reads wrap around after the maximum address rather than the current unpredictable behavior. This is done by: 1. Reworking drgn_memory_reader to work with an inclusive address range so that a segment can contain UINT64_MAX. drgn_memory_reader remains agnostic to the maximum address and requires that address ranges do not overflow a uint64_t. 2. Adding the overflow/wrap-around logic to drgn_program_add_memory_segment() and drgn_program_read_memory(). 3. Changing direct uses of drgn_memory_reader_read() to drgn_program_read_memory() now that they are no longer equivalent. (For some platforms, a fault might be more appropriate than wrapping around, but this is a step in the right direction.) Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-03 17:49:29 -07:00
Omar Sandoval	e5ff1ea7ac	libdrgn: program: use preset platform in drgn_program_set_core_dump() If the program already had a platform set, we should use its callbacks instead of the ones from the ELF file's platform. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-03 17:04:28 -07:00
Omar Sandoval	a4b9d68a8c	Use GPL-3.0-or-later license identifier instead of GPL-3.0+ Apparently the latter is deprecated and the former is preferred. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-03 01:10:35 -07:00
Omar Sandoval	630d39e345	libdrgn: add ORC unwinder The Linux kernel has its own stack unwinding format for x86-64 called ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is essentially a simplified, less complete version of DWARF CFI. ORC is generated by analyzing machine code, so it is present for all but a few ignored functions. In contrast, DWARF CFI is generated by the compiler and is therefore missing for functions written in assembly and inline assembly (which is widespread in the kernel). This implements an ORC stack unwinder: it applies ELF relocations to the ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule kind, parses and efficiently stores ORC data, and translates ORC to drgn CFI rules. This will allow us to stack trace through assembly code, interrupts, and system calls. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-29 10:01:52 -07:00
Serapheim Dimitropoulos	a68abd5de4	libdrgn: stretch minimum supported version of libelf to 0.170 Currently libdrgn requires libelf to be of version 0.175 or later. This patch allows the library to be compiled with libelf 0.170 (the newest version supported by Ubuntu 18.04 LTS). Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>	2021-03-21 14:28:29 -07:00
Omar Sandoval	a7962e9477	libdrgn: debug_info: pass around Dwfl_Module instead of bias We're going to need the module start and end in drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and get the bias when we need it. This means we don't need the bias from drgn_dwarf_index_get_die(), so get rid of that, too. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-21 10:12:29 -08:00
Omar Sandoval	798f0887a5	libdrgn: simplify language fallback handling If the language for a DWARF type is not found or unrecognized, we should fall back to the global default, not the program default (the program default language is for language-specific operations on the program, so DWARF parsing shouldn't depend on it). Add a fall_back parameter to drgn_language_from_die() and use it in DWARF parsing, and replace drgn_language_or_default() with a drgn_default_language variable. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-08 10:46:35 -08:00
Omar Sandoval	e72ecd0e2c	libdrgn: replace drgn_program_member_info() with drgn_type_find_member() Now that types are associated with their program, we don't need to pass the program separately to drgn_program_member_info() and can replace it with a more natural drgn_type_find_member() API that takes only the type and member name. While we're at it, get rid of drgn_member_info and return the drgn_type_member and bit_offset directly. This also fixes a bug that drgn_error_member_not_found() ignores the member name length. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 14:40:54 -08:00
Omar Sandoval	22c1d87aec	libdrgn: cache page_offset and vmemmap as objects instead of uint64_t This is a little cleaner and saves on conversions back and forth between C values and objects. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-10 02:40:07 -08:00
Omar Sandoval	3360170336	libdrgn: only install page table memory reader when supported If virtual address translation isn't implemented for the target architecture, then we shouldn't add the page table memory reader. If we do, we get a DRGN_ERROR_INVALID_ARGUMENT error from linux_helper_read_vm() instead of a DRGN_ERROR_FAULT error as expected. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-11-27 01:27:30 -08:00
Omar Sandoval	3c5d22637e	libdrgn: clean up hash function APIs and improve documentation Use _hash_pair() for hash functions that do the full double hashing and return a struct hash_pair and hash_() for other hashing utility functions. Also change some of the equality function names to be more symmetric and improve the documentation. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-12 16:20:08 -07:00
Omar Sandoval	286c09844e	Clean up #includes with include-what-you-use I recently hit a couple of CI failures caused by relying on transitive includes that weren't always present. include-what-you-use is a Clang-based tool that helps with this. It's a bit finicky and noisy, so this adds scripts/iwyu.py to make running it more convenient (but not reliable enough to automate it in Travis). This cleans up all reasonable include-what-you-use warnings and reorganizes a few header files. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-23 16:29:42 -07:00
Omar Sandoval	f83bb7c71b	libdrgn: move debugging information tracking into drgn_debug_info Debugging information tracking is currently in two places: drgn_program finds debugging information, and drgn_dwarf_index stores it. Both of these responsibilities make more sense as part of drgn_debug_info, so let's move them there. This prepares us to track extra debugging information that isn't pertinent to indexing. This also reworks a couple of details of loading debugging information: - drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into a single structure, drgn_debug_info_module. - The first pass of DWARF indexing now happens in parallel with reading compilation units (by using OpenMP tasks). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	3ac9ae357b	libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info The current name is too verbose. Let's go with a shorter, more generic name. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-11 17:41:23 -07:00
Jay Kamat	d1beb0184a	libdrgn: add support for objects in C++ namespaces DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the DWARF index, with each namespace being its own sub-index. We only index the namespace itself when it is first accessed, which should help with startup time and simplifies tracking. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	66ad5077c9	libdrgn: dwarf_index: return indexed DIE entry from drgn_dwarf_index_iterator_next() For namespace support, we will want to access the struct drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	7a85b4188e	libdrgn: clean up read.h helpers and avoid undefined pointer behavior There are a couple of related ways that we can cause undefined behavior when parsing a malformed DWARF or depmod index file: 1. There are several places where we increment the cursor to skip past some data. It is undefined behavior if the result points out of bounds of the data, even if we don't attempt to dereference it. 2. read_in_bounds() checks that ptr <= end. This pointer comparison is only defined if ptr and end both point to elements of the same array object or one past the last element. If ptr has gone past end, then this comparison is likely undefined anyways. Fix it by adding a helper to skip past data with bounds checking. Then, all of the helpers can assume that ptr <= end and maintain that invariant. While we're here and auditing all of the call sites, let's clean up the API and rename it from read_foo() to the less generic mread_foo(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	ff96c75da0	helpers: translate task_state_to_char() to Python Commit `326107f054` ("libdrgn: add task_state_to_char() helper") implemented task_state_to_char() in libdrgn so that it could be used in commit `4780c7a266` ("libdrgn: stack_trace: prohibit unwinding stack of running tasks"). As of commit `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust"), it is no longer used in libdrgn, so we can translate it to Python. This removes a bunch of code and is more useful as an example. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 13:54:39 -07:00
Omar Sandoval	e49a87a3d7	libdrgn: remove struct drgn_object::prog We can get it via the type now. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:21 -07:00
Omar Sandoval	c31208f69c	libdrgn: fold drgn_type_index into drgn_program This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:36:35 -07:00
Omar Sandoval	d4e0771f87	libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}() Most places that call these check has_platform and return an error, and those that don't can live with the extra check. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 16:56:28 -07:00
Omar Sandoval	1b47b866b4	libdrgn: go back to trusting PRSTATUS PID Commit `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust") overlooked that if the task is running in userspace, the stack pointer in PRSTATUS obviously won't match the kernel stack pointer. Let's bite the bullet and use the PID. If the race shows up in practice, we can try to come up with another workaround.	2020-07-08 18:34:16 -07:00
Omar Sandoval	948cda2941	libdrgn: add vector/hash table initializers and update coding style Declaring a local vector or hash table and separately initializing it with vector_init()/hash_table_init() is annoying. Add macros that can be used as initializers. This exposes several places where the C89 style of placing all declarations at the beginning of a block is awkward. I adopted this style from the Linux kernel, which uses C89 and thus requires this style. I'm now convinced that it's usually nicer to declare variables where they're used. So let's officially adopt the style of mixing declarations and code (and ditch the blank line after declarations) and update the functions touched by this change.	2020-07-01 12:48:24 -07:00
Omar Sandoval	eea5422546	libdrgn: make Linux kernel stack unwinding more robust drgn has a couple of issues unwinding stack traces for kernel core dumps: 1. It can't unwind the stack for the idle task (PID 0), which commonly appears in core dumps. 2. It uses the PID in PRSTATUS, which is racy and can't actually be trusted. The solution for both of these is to look up the PRSTATUS note by CPU instead of PID. For the live kernel, drgn refuses to unwind the stack of tasks in the "R" state. However, the "R" state is running or runnable, so in the latter case, we can still unwind the stack. The solution for this is to look at on_cpu for the task instead of the state.	2020-05-20 12:03:00 -07:00
Omar Sandoval	8b264f8823	Update copyright headers to Facebook and add missing headers drgn was originally my side project, but for a while now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.	2020-05-15 15:13:02 -07:00
Omar Sandoval	2d1481f5ab	libdrgn: add page table walker kernel memory reader Now that we can walk page tables, we can use it in a memory reader that reads kernel memory via the kernel page table. This means that we don't need libkdumpfile for ELF vmcores anymore (although I'll keep the functionality around until this code has been validated more).	2020-05-08 17:37:56 -07:00
Omar Sandoval	d0a1718451	libdrgn: implement virtual address translation/page table walking There are a few big use cases for this in drgn: * Helpers for accessing memory in the virtual address space of userspace tasks. * Removing the libkdumpfile dependency for vmcores. * Handling gaps in the virtual address space of /proc/kcore (cf. #27). I dragged my feet on implementing this because I thought it would be more complicated, but the page table layout on x86-64 isn't too bad. This commit implements page table walking using a page table iterator abstraction. The first thing we'll add on top of this will be a helper for reading memory from a virtual address space, but in the future it'd also be possible to export the page table iterator directly.	2020-05-08 17:36:19 -07:00
Omar Sandoval	23574e59d5	libdrgn: add /proc/kcore physical segments on old kernels Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so it's currently not possible to read from physical memory on old kernels. However, if we can figure out the address of the direct mapping, then we can determine the corresponding physical addresses for the segments and add them.	2020-05-04 13:20:27 -07:00
Omar Sandoval	f8c33518eb	libdrgn: handle kernel core dumps with all zero p_paddr We treat core dumps with all zero p_paddrs as not having valid physical addresses. However, it is theoretically possible for a kernel core dump to only have one segment which legitimately has a p_paddr of 0 (e.g., if it only has a segment for the direct mapping, although note that this isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the BIOS [1]). If the core dump has a VMCOREINFO note, then it is either a vmcore, which has valid physical addresses, or it is /proc/kcore with Linux kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to /proc/kcore") (in v4.19), so it must also have Linux kernel commit 464920104bf7 ("/proc/kcore: update physical address for kcore ram and text") (in v4.11) (ignoring the possibility of a franken-kernel which backported the former but not the latter). Therefore, treat core dumps with a VMCOREINFO note as having valid physical addresses. 1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678	2020-05-04 13:20:27 -07:00
Omar Sandoval	5505628235	libdrgn: get rid of struct drgn_program.num_file_segments This isn't used anymore. Remove it and simplify the loop adding file segments.	2020-05-04 13:20:27 -07:00
Omar Sandoval	10e58777c3	Add Program.read_{u8,u16,u32,u64,word}() I've found that I do this manually a lot (e.g., when digging through a task's stack). Add shortcuts for reading unsigned integers and a note for how to manually read other formats.	2020-04-27 17:27:10 -07:00
Omar Sandoval	b1315fcaa1	libdrgn: add drgn_program_bswap() This is clearer than open-coding the endianness check.	2020-04-27 17:08:02 -07:00
Omar Sandoval	5ac95e491a	libdrgn: fix _page_offset() helper and move to object finder The internal _page_offset() helper gets the value of PAGE_OFFSET, but the fallback when KASLR is disabled has been out of date since Linux v4.20 and never handled 5-level page tables. Additionally, it makes more sense as part of the Linux kernel (formerly vmcoreinfo) object finder so that it's cleanly accessible outside of drgn internals.	2020-04-10 15:33:27 -07:00
Omar Sandoval	fa61977f60	libdrgn: fix default language detection Jay reported that the default language detection was happening too early and not finding "main". We need to make sure to do it after the DWARF index is actually populated. The problem with that is that it makes error reporting much harder, as we don't want to return a fatal error from drgn_program_set_language_from_main() if we actually succeeded in loading debug info. That means we probably need to ignore errors in drgn_program_set_language_from_main(). To reduce the surface area where we'd be failing, let's get the language directly from the DWARF index. This also allows us to avoid setting the language if it's actually unknown (information which is lost by the time we convert it to a drgn_object in the current code).	2020-04-03 16:35:38 -07:00
Serapheim Dimitropoulos	08193a97aa	Support stack traces for running threads on kdumps	2020-03-27 16:12:03 -07:00
Jay Kamat	3f870603fa	libdrgn: add default language to drgn_program For operations where we don't have a type available, we currently fall back to C. Instead, we should guess the language of the program and use that as the default. The heuristic implemented here gets the language of the CU containing "main" (except for the Linux kernel, which is always C). In the future, we should allow manually overriding the automatically determined language.	2020-02-26 19:55:42 -08:00
Omar Sandoval	a5cd92f24e	libdrgn: make vmcoreinfo accessible before loading debug info UTS_RELEASE is currently only accessible once debug info is loaded with prog.load_debug_info(main=True). This makes it difficult to get the release, find the appropriate vmlinux, then load the found vmlinux. We can add vmcoreinfo_object_find as part of set_core_dump(), which makes it possible to do the following: `prog = drgn.Program(); prog.set_core_dump(core_dump_path); release = prog['UTS_RELEASE'].string_(); vmlinux_path = find_vmlinux(release); prog.load_debug_info([vmlinux_path])` The only downside is that this ends up using the default definition of char rather than what we would get from the debug info, but that shouldn't be a big problem.	2020-02-19 12:11:45 -08:00
Jay Kamat	31d544949f	libdrgn: Add find_symbol_by_name to look up ELF symbols	2020-02-12 14:06:49 -08:00
Jay Kamat	054cb54a01	libdrgn: Rename find_symbol to find_symbol_by_address	2020-02-12 14:06:49 -08:00
Omar Sandoval	0a707b0c9d	libdrgn: rework drgn_find_symbol_internal() Instead of having two internal variants (drgn_find_symbol_internal() and drgn_program_find_symbol_in_module()), combine them into the former and add a separate drgn_error_symbol_not_found() for translating the static error to the user-facing one. This makes things more flexible for the next change.	2019-12-19 11:43:54 -08:00
Omar Sandoval	4a8152175b	libdrgn: translate EIO from /proc/$pid/mem to DRGN_ERROR_FAULT For live userspace processes, we add a single [0, UINT64_MAX) memory file segment for /proc/$pid/mem. Of course, not every address in that range is valid; reading from an invalid address returns EIO. We should translate this to a DRGN_ERROR_FAULT instead of DRGN_ERROR_OS, but only for /proc/$pid/mem.	2019-12-10 13:30:34 -08:00
Omar Sandoval	6af6159cfc	libdrgn: support loading only main debug info If we only want debugging information for vmlinux and not kernel modules, it'd be nice to only load the former. This adds a load_main parameter to drgn_program_load_debug_info() which specifies just that. For now, it's only implemented for the Linux kernel. While we're here, let's make the paths parameter optional for the Python bindings.	2019-11-22 16:38:52 -08:00
Omar Sandoval	326107f054	libdrgn: add task_state_to_char() helper Add a helper to get the state of a task (e.g., 'R', 'S', 'D'). This will be used to make sure that a task is not running when getting a stack trace, so implement it in libdrgn.	2019-10-28 13:37:57 -07:00
Omar Sandoval	0f7ad0ed26	libdrgn: stack_trace: support unwinding stack from core dump vmcores include a NT_PRSTATUS note for each CPU containing the PID of the task running on that CPU at the time of the crash and its registers. We can use that to unwind the stack of the crashed tasks.	2019-10-28 13:36:02 -07:00
Omar Sandoval	75c74022ff	libdrgn: save Elf handle for core dump Currently, we close the Elf handle in drgn_set_core_dump() after we're done with it. However, we need the Elf handle in userspace_report_debug_info(), so we reopen it temporarily. We will also need it to support getting stack traces from core dumps, so we might as well keep it open. Note that we keep it even if we're using libkdumpfile because libkdumpfile doesn't seem to have an API to access ELF notes.	2019-10-28 13:09:25 -07:00
Omar Sandoval	4fb0e2e110	libdrgn: use new libdwfl stack trace API	2019-10-18 14:34:11 -07:00
Omar Sandoval	423d2cd500	libdrgn: dwarf_index: rework file reporting Currently, the interface between the DWARF index, libdwfl, and the code which finds and reports vmlinux/kernel modules is spaghetti. The DWARF index tracks Dwfl_Modules via their userdata. However, despite conceptually being owned by the DWARF index, the reporting code reports the Dwfl_Modules and sets up the userdata. These Dwfl_Modules and drgn_dwfl_module_userdatas are messy to track and pass between the layers. This reworks the architecture so that the DWARF index owns the Dwfl instance and files are reported to the DWARF index; the DWARF index takes care of reporting to libdwfl internally. In addition to making the interface for the reporter much cleaner, this improves a few things as a side-effect: - We now deduplicate on build ID in addition to path. - We now skip searching for vmlinux and/or kernel modules if they were already indexed. - We now support compressed ELF files via libdwelf. - We can now load default debug info at the same time as additional debug info.	2019-10-02 17:22:11 -07:00
Omar Sandoval	b05cc0eb75	libdrgn: use libkdumpfile for ELF vmcores when available vmcores don't include program headers for special memory regions like vmalloc and percpu. Instead, we need to walk the kernel page table to map those addresses. Luckily, libkdumpfile already does that. So, if drgn was built with libkdumpfile support, use it for ELF vmcores. Also add an environment variable to override this behavior. Closes #15.	2019-10-02 17:15:36 -07:00
Omar Sandoval	6a13d74c0c	libdrgn: build with bundled elfutils Now that we have the bundled version of elfutils, build it from libdrgn and link to it. We can also get rid of the elfutils version checks from the libdrgn code.	2019-09-19 11:07:12 -07:00
Serapheim Dimitropoulos	93d7ea9f01	Add support for kdump-compressed core dumps with libkdumpfile	2019-08-02 00:20:16 -07:00
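Commit 0e3054a0ba defines memory reads to wrap around after UINT64_MAX, with the memory reader working on inclusive address ranges that never overflow and the wrap-around split handled at the program level. The following is a minimal sketch of that division of labor, not drgn's code: `struct segment`, `read_no_wrap()`, and `read_wrapping()` are hypothetical, and each chunk of a read is assumed to fall within a single segment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical segment covering the inclusive range [min_address,
 * max_address], so a segment can end exactly at UINT64_MAX. */
struct segment {
	uint64_t min_address;
	uint64_t max_address;
	const unsigned char *data;
};

/* Reader-level lookup (hypothetical). It assumes the range does not
 * overflow: count > 0 and address + count - 1 <= UINT64_MAX. For
 * simplicity, the whole chunk must fit in one segment. */
static bool read_no_wrap(const struct segment *segs, size_t nsegs,
			 unsigned char *buf, uint64_t address, size_t count)
{
	uint64_t last = address + (count - 1);
	for (size_t i = 0; i < nsegs; i++) {
		const struct segment *s = &segs[i];
		if (address >= s->min_address && last <= s->max_address) {
			memcpy(buf, s->data + (address - s->min_address), count);
			return true;
		}
	}
	return false; /* no segment covers the range: treat as a fault */
}

/* Program-level read (hypothetical): addresses wrap to 0 after UINT64_MAX. */
static bool read_wrapping(const struct segment *segs, size_t nsegs,
			  void *buf, uint64_t address, size_t count)
{
	unsigned char *p = buf;
	if (count == 0)
		return true;
	/* Does [address, address + count - 1] cross UINT64_MAX? */
	if (count - 1 > UINT64_MAX - address) {
		/* Bytes from address through UINT64_MAX inclusive. */
		size_t first = (size_t)(UINT64_MAX - address) + 1;
		if (!read_no_wrap(segs, nsegs, p, address, first))
			return false;
		p += first;
		count -= first;
		address = 0;
	}
	return read_no_wrap(segs, nsegs, p, address, count);
}
```

Because the split happens before the lookup, the reader-level function never has to reason about overflow, which matches the commit's note that the memory reader stays agnostic to the maximum address.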
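Commit 7a85b4188e avoids undefined pointer behavior by bounds-checking every skip before the cursor moves, so that all helpers can assume ptr <= end and preserve it. A minimal sketch of that invariant follows; `cursor_skip()`, `cursor_read_u8()`, and `cursor_read_u32_le()` are hypothetical names and are not the actual mread_*() API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor helpers. Every helper maintains the invariant
 * *ptr <= end, so end - *ptr is always a valid, non-negative length. */

/* Advance by size bytes only if that many bytes remain. Comparing the
 * remaining length avoids computing *ptr + size when it would point past
 * end, which is undefined even if the result is never dereferenced. */
static inline bool cursor_skip(const char **ptr, const char *end, size_t size)
{
	if ((size_t)(end - *ptr) < size)
		return false;
	*ptr += size;
	return true;
}

static inline bool cursor_read_u8(const char **ptr, const char *end,
				  uint8_t *ret)
{
	const char *p = *ptr;
	if (!cursor_skip(ptr, end, 1))
		return false;
	*ret = (uint8_t)*p;
	return true;
}

/* Read a little-endian 32-bit value; bounds are checked before any byte
 * is accessed. */
static inline bool cursor_read_u32_le(const char **ptr, const char *end,
				      uint32_t *ret)
{
	const unsigned char *p = (const unsigned char *)*ptr;
	if (!cursor_skip(ptr, end, 4))
		return false;
	*ret = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
	       (uint32_t)p[3] << 24;
	return true;
}
```

On failure the cursor is left unchanged, so a caller parsing a truncated file can report an error without ever having formed an out-of-bounds pointer.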
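Program.read_u64() (commit 10e58777c3) and drgn_program_bswap() (commit b1315fcaa1) both come down to reading raw bytes and swapping them when the target's byte order differs from the host's. The sketch below assumes the caller already knows whether the target is little-endian; `bswap64()`, `host_is_little_endian()`, and `read_u64_from_target()` are hypothetical helpers, not drgn functions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 64-bit value. */
static uint64_t bswap64(uint64_t x)
{
	x = (x << 32) | (x >> 32);
	x = ((x & 0x0000ffff0000ffffULL) << 16) |
	    ((x >> 16) & 0x0000ffff0000ffffULL);
	x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
	    ((x >> 8) & 0x00ff00ff00ff00ffULL);
	return x;
}

/* Hypothetical helper: detect the host byte order at run time. */
static bool host_is_little_endian(void)
{
	const uint16_t one = 1;
	unsigned char first;
	memcpy(&first, &one, 1);
	return first == 1;
}

/* Hypothetical read_u64 core: interpret 8 raw bytes from the target,
 * swapping only when the target's byte order differs from the host's. */
static uint64_t read_u64_from_target(const unsigned char raw[8],
				     bool target_little_endian)
{
	uint64_t value;
	memcpy(&value, raw, sizeof(value));
	if (target_little_endian != host_is_little_endian())
		value = bswap64(value);
	return value;
}
```

Keeping the byte-order comparison in one helper is the same motivation b1315fcaa1 gives for drgn_program_bswap(): it is clearer than open-coding the endianness check at every call site.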


91 Commits