JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-23 09:43:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	0e3054a0ba	libdrgn: make addresses wrap around when reading memory Define that addresses for memory reads wrap around after the maximum address rather than the current unpredictable behavior. This is done by: 1. Reworking drgn_memory_reader to work with an inclusive address range so that a segment can contain UINT64_MAX. drgn_memory_reader remains agnostic to the maximum address and requires that address ranges do not overflow a uint64_t. 2. Adding the overflow/wrap-around logic to drgn_program_add_memory_segment() and drgn_program_read_memory(). 3. Changing direct uses of drgn_memory_reader_read() to drgn_program_read_memory() now that they are no longer equivalent. (For some platforms, a fault might be more appropriate than wrapping around, but this is a step in the right direction.) Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-03 17:49:29 -07:00
Omar Sandoval	a4b9d68a8c	Use GPL-3.0-or-later license identifier instead of GPL-3.0+ Apparently the latter is deprecated and the former is preferred. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-03 01:10:35 -07:00
Omar Sandoval	63672be809	libdrgn: linux_kernel: save module .init section addresses Linux kernel modules usually contain ELF relocations in DWARF and ORC sections for symbols in .init sections. Since we ignore .init sections entirely in cache_kernel_module_sections(), these relocations end up being based on an address of 0 (so, e.g., a function from .init.text could be reported as having an address of 0x0). It makes a little more sense to use the address where the .init section was before it was freed. So, let's update the sections' sh_addr but continue ignoring them for determining the module's address range. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-26 15:13:47 -07:00
Omar Sandoval	da180b7274	libdrgn: handle errors from elf_strptr() For some reason, we consistently ignore errors from elf_strptr(), but we shouldn't. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-26 14:28:16 -07:00
Omar Sandoval	a24c0f5b33	libdrgn: clean up usage of drgn_stop Use drgn_not_found where it's more appropriate, and check explicitly against drgn_stop instead of err->code == DRGN_ERROR_STOP. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-05 12:46:06 -08:00
Omar Sandoval	9fda010789	Track byte order in scalar types instead of objects Currently, reference objects and buffer value objects have a byte order. However, this doesn't always make sense for a couple of reasons: - Byte order is only meaningful for scalars. What does it mean for a struct to be big endian? A struct doesn't have a most or least significant byte; its scalar members do. - The DWARF specification allows either types or variables to have a byte order (DW_AT_endianity). The only producer I could find that uses this is GCC for the scalar_storage_order type attribute, and it only uses it for base types, not variables. GDB only seems to check it for base types, as well. So, remove the byte order from objects, and move it to integer, boolean, floating-point, and pointer types. This model makes more sense, and it means that we can get the binary representation of any object now. The only downside is that we can no longer support a bit offset for non-scalars, but as far as I can tell, nothing needs that. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-02-19 21:41:29 -08:00
Omar Sandoval	30cfa40a72	libdrgn: rename "unavailable" objects to "absent" objects I was going to add an Object.available_ attribute, but that made me realize that the naming is somewhat ambiguous, as a reference object with an invalid address might also be considered "unavailable" by users. Use the name "absent" instead, which is more clear: the object isn't there at all. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-29 14:58:26 -08:00
Omar Sandoval	e72ecd0e2c	libdrgn: replace drgn_program_member_info() with drgn_type_find_member() Now that types are associated with their program, we don't need to pass the program separately to drgn_program_member_info() and can replace it with a more natural drgn_type_find_member() API that takes only the type and member name. While we're at it, get rid of drgn_member_info and return the drgn_type_member and bit_offset directly. This also fixes a bug that drgn_error_member_not_found() ignores the member name length. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 14:40:54 -08:00
Omar Sandoval	abafdd965f	Remove bit_offset from value objects There are a couple of reasons that it was the wrong choice to have a bit_offset for value objects: 1. When we store a buffer with a bit_offset, we're storing useless padding bits. 2. bit_offset describes a location, or in other words, part of an address. This makes sense for references, but not for values, which are just a bag of bytes. Get rid of union drgn_value.bit_offset in libdrgn, make Object.bit_offset None for value objects, and disallow passing bit_offset to the Object() constructor when creating a value. bit_offset can still be passed when creating an object from a buffer, but we'll shift the bytes down as necessary to store the value with no offset. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-14 12:29:17 -08:00
Omar Sandoval	22c1d87aec	libdrgn: cache page_offset and vmemmap as objects instead of uint64_t This is a little cleaner and saves on conversions back and forth between C values and objects. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-10 02:40:07 -08:00
Omar Sandoval	bce9ef5f8d	libdrgn: linux kernel: remove THREAD_SIZE object finder THREAD_SIZE is still broken and I haven't looked into the root cause (see commit `95be142d17` ("tests: disable THREAD_SIZE test")). We don't need it anymore anyways, so let's remove it entirely. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-10 02:08:13 -08:00
Omar Sandoval	5975d19580	libdrgn: report better errors when parsing DWARF/kmod index If the DWARF index encounters any error while parsing, it returns an error saying only "debug information is truncated", which makes it hard to track down parsing errors. The kmod index parser silently swallows errors. For both, replace the mread functions with a higher-level binary_buffer interface that can include more information including the location of the error. For example: /tmp/mybinary: .debug_info+0x4: expected at least 56 bytes, have 55 Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-11-13 17:00:07 -08:00
Omar Sandoval	fa081e32b9	libdrgn: update module section iterator for Linux v5.8 Linux v5.8 changed the module section structure, so we need to get the section name differently. Closes #73. Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-13 13:07:36 -07:00
Omar Sandoval	1c6465f0b0	libdrgn: fix infinite loop on error caching kernel module sections If cache_kernel_module_sections() in report_loaded_kernel_module() fails, we continue to the next iteration without advancing to the next kernel module. Then, we fail on that same kernel module and repeat. Make sure that we go to the next kernel module. Fixes: `423d2cd500` ("libdrgn: dwarf_index: rework file reporting") Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-13 12:25:22 -07:00
Omar Sandoval	431b91ddb5	libdrgn: fix use-after-free in kernel module reporting error case We're freeing path and then using it to report an error. This has some weird knock-on effects. Since we freed the path, the error message contains garbage. So, PyErr_SetString() can't decode it as a UTF-8 string. The end result is a MissingDebugInfoError with no message. Fix it by creating the error before freeing the path. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-13 11:55:06 -07:00
Omar Sandoval	2b325b9262	libdrgn: add an environment variable to disable use of /proc/modules and /sys/module We use /proc/modules and /sys/module to find loaded kernel modules for the running kernel instead of walking the module list in the core dump as an optimization. To make it easier to test the core dump path, add an environment variable to disable the optimization. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-13 11:24:39 -07:00
Omar Sandoval	661a5c56c3	libdrgn: refactor kernel module iterator The next commit will allow using the offline path for the live kernel, so the offline naming won't make much sense. Fold the offline path into the top-level functions, and make the live path an escape hatch. Also add some comments and improve naming for the file and directory handles and update the coding style. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-12 23:22:01 -07:00
Omar Sandoval	ce8540e39c	libdrgn: get rid of kernel_module_iterator::notes* These were added in commit `e5874ad18a` ("libdrgn: use libdwfl"), but they have never been used. Remove them. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-12 17:23:31 -07:00
Omar Sandoval	3c5d22637e	libdrgn: clean up hash function APIs and improve documentation Use _hash_pair() for hash functions that do the full double hashing and return a struct hash_pair and hash_() for other hashing utility functions. Also change some of the equality function names to be more symmetric and improve the documentation. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-12 16:20:08 -07:00
Omar Sandoval	fa44171ba1	libdrgn: split bit operations into their own header And improve their documentation. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-09 17:44:15 -07:00
Omar Sandoval	286c09844e	Clean up #includes with include-what-you-use I recently hit a couple of CI failures caused by relying on transitive includes that weren't always present. include-what-you-use is a Clang-based tool that helps with this. It's a bit finicky and noisy, so this adds scripts/iwyu.py to make running it more convenient (but not reliable enough to automate it in Travis). This cleans up all reasonable include-what-you-use warnings and reorganizes a few header files. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-23 16:29:42 -07:00
Omar Sandoval	f83bb7c71b	libdrgn: move debugging information tracking into drgn_debug_info Debugging information tracking is currently in two places: drgn_program finds debugging information, and drgn_dwarf_index stores it. Both of these responsibilities make more sense as part of drgn_debug_info, so let's move them there. This prepares us to track extra debugging information that isn't pertinent to indexing. This also reworks a couple of details of loading debugging information: - drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into a single structure, drgn_debug_info_module. - The first pass of DWARF indexing now happens in parallel with reading compilation units (by using OpenMP tasks). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	7a85b4188e	libdrgn: clean up read.h helpers and avoid undefined pointer behavior There are a couple of related ways that we can cause undefined behavior when parsing a malformed DWARF or depmod index file: 1. There are several places where we increment the cursor to skip past some data. It is undefined behavior if the result points out of bounds of the data, even if we don't attempt to dereference it. 2. read_in_bounds() checks that ptr <= end. This pointer comparison is only defined if ptr and end both point to elements of the same array object or one past the last element. If ptr has gone past end, then this comparison is likely undefined anyways. Fix it by adding a helper to skip past data with bounds checking. Then, all of the helpers can assume that ptr <= end and maintain that invariant. While we're here and auditing all of the call sites, let's clean up the API and rename it from read_foo() to the less generic mread_foo(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	a97f6c4fa2	Associate types with program I originally envisioned types as dumb descriptors. This mostly works for C because in C, types are fairly simple. However, even then the drgn_program_member_info() API is awkward. You should be able to look up a member directly from a type, but we need the program for caching purposes. This has also held me back from adding offsetof() or has_member() APIs. Things get even messier with C++. C++ template parameters can be objects (e.g., template <int N>). Such parameters would best be represented by a drgn object, which we need a drgn program for. Static members are a similar case. So, let's reimagine types as being owned by a program. This has a few parts: 1. In libdrgn, simple types are now created by factory functions, drgn_foo_type_create(). 2. To handle their variable length fields, compound types, enum types, and function types are constructed with a "builder" API. 3. Simple types are deduplicated. 4. The Python type factory functions are replaced by methods of the Program class. 5. While we're changing the API, the parameters to pointer_type() and array_type() are reordered to be more logical (and to allow pointer_type() to take a default size of None for the program's default pointer size). 6. Likewise, the type factory methods take qualifiers as a keyword argument only. A big part of this change is updating the tests and splitting up large test cases into smaller ones in a few places. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:41:09 -07:00
Omar Sandoval	c31208f69c	libdrgn: fold drgn_type_index into drgn_program This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:36:35 -07:00
Omar Sandoval	948cda2941	libdrgn: add vector/hash table initializers and update coding style Declaring a local vector or hash table and separately initializing it with vector_init()/hash_table_init() is annoying. Add macros that can be used as initializers. This exposes several places where the C89 style of placing all declarations at the beginning of a block is awkward. I adopted this style from the Linux kernel, which uses C89 and thus requires this style. I'm now convinced that it's usually nicer to declare variables where they're used. So let's officially adopt the style of mixing declarations and code (and ditch the blank line after declarations) and update the functions touched by this change.	2020-07-01 12:48:24 -07:00
Omar Sandoval	e4c52c5422	libdrgn: linux_kernel: use names for kmod index constants This makes it much easier to follow along with the code and understand the format.	2020-06-30 15:14:21 -07:00
Omar Sandoval	eea5422546	libdrgn: make Linux kernel stack unwinding more robust drgn has a couple of issues unwinding stack traces for kernel core dumps: 1. It can't unwind the stack for the idle task (PID 0), which commonly appears in core dumps. 2. It uses the PID in PRSTATUS, which is racy and can't actually be trusted. The solution for both of these is to look up the PRSTATUS note by CPU instead of PID. For the live kernel, drgn refuses to unwind the stack of tasks in the "R" state. However, the "R" state is running or runnable, so in the latter case, we can still unwind the stack. The solution for this is to look at on_cpu for the task instead of the state.	2020-05-20 12:03:00 -07:00
Omar Sandoval	4d8597f0f8	libdrgn: add THREAD_SIZE to Linux kernel object finder Despite the naming, this is the kernel stack size.	2020-05-19 17:10:54 -07:00
Omar Sandoval	8b264f8823	Update copyright headers to Facebook and add missing headers drgn was originally my side project, but for a while now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.	2020-05-15 15:13:02 -07:00
Omar Sandoval	2d1481f5ab	libdrgn: add page table walker kernel memory reader Now that we can walk page tables, we can use it in a memory reader that reads kernel memory via the kernel page table. This means that we don't need libkdumpfile for ELF vmcores anymore (although I'll keep the functionality around until this code has been validated more).	2020-05-08 17:37:56 -07:00
Omar Sandoval	e697be707c	libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET I originally wanted to avoid depending on another vmcoreinfo field, but the next change is going to depend on swapper_pg_dir in vmcoreinfo anyways, and it ends up being simpler to use it.	2020-05-08 17:37:56 -07:00
Omar Sandoval	23574e59d5	libdrgn: add /proc/kcore physical segments on old kernels Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so it's currently not possible to read from physical memory on old kernels. However, if we can figure out the address of the direct mapping, then we can determine the corresponding physical addresses for the segments and add them.	2020-05-04 13:20:27 -07:00
Omar Sandoval	f8c33518eb	libdrgn: handle kernel core dumps with all zero p_paddr We treat core dumps with all zero p_paddrs as not having valid physical addresses. However, it is theoretically possible for a kernel core dump to only have one segment which legitimately has a p_paddr of 0 (e.g., if it only has a segment for the direct mapping, although note that this isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the BIOS [1]). If the core dump has a VMCOREINFO note, then it is either a vmcore, which has valid physical addresses, or it is /proc/kcore with Linux kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to /proc/kcore") (in v4.19), so it must also have Linux kernel commit 464920104bf7 ("/proc/kcore: update physical address for kcore ram and text") (in v4.11) (ignoring the possibility of a franken-kernel which backported the former but not the latter). Therefore, treat core dumps with a VMCOREINFO note as having valid physical addresses. 1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678	2020-05-04 13:20:27 -07:00
Omar Sandoval	510b6ef3b5	libdrgn: read vmcoreinfo physical address as uint64_t Even on 32-bit architectures, physical addresses can be 64 bits (e.g., x86 with PAE). It's unlikely that the vmcoreinfo note would be in such an address, but let's always parse it as a uint64_t just to be safe.	2020-05-04 13:20:27 -07:00
Omar Sandoval	7a9fad0fd2	libdrgn: move _vmemmap() to object finder Similarly to PAGE_OFFSET, vmemmap makes more sense as part of the Linux kernel object finder than an internal helper. While we're here, let's fix the definition for 5-level page tables. This only matters for kernels with commit 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y") but without eedb92abb9bb ("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y") (namely, v4.14, v4.15, and v4.16); since v4.17, 5-level page table support enables KASLR.	2020-04-10 15:33:29 -07:00
Omar Sandoval	5ac95e491a	libdrgn: fix _page_offset() helper and move to object finder The internal _page_offset() helper gets the value of PAGE_OFFSET, but the fallback when KASLR is disabled has been out of date since Linux v4.20 and never handled 5-level page tables. Additionally, it makes more sense as part of the Linux kernel (formerly vmcoreinfo) object finder so that it's cleanly accessible outside of drgn internals.	2020-04-10 15:33:27 -07:00
Omar Sandoval	1dbc718840	helpers: add pgtable_l5_enabled()	2020-04-10 15:18:46 -07:00
Jay Kamat	6c264b0eae	libdrgn: add language to struct drgn_type For types obtained from DWARF, we determine it from the language of the CU. For other types, it can be specified manually or fall back to the default (C). Then, we can use the language for operations where the type is available.	2020-02-26 19:55:42 -08:00
Omar Sandoval	a5cd92f24e	libdrgn: make vmcoreinfo accessible before loading debug info UTS_RELEASE is currently only accessible once debug info is loaded with prog.load_debug_info(main=True). This makes it difficult to get the release, find the appropriate vmlinux, then load the found vmlinux. We can add vmcoreinfo_object_find as part of set_core_dump(), which makes it possible to do the following: prog = drgn.Program() prog.set_core_dump(core_dump_path) release = prog['UTS_RELEASE'].string_() vmlinux_path = find_vmlinux(release) prog.load_debug_info([vmlinux_path]) The only downside is that this ends up using the default definition of char rather than what we would get from the debug info, but that shouldn't be a big problem.	2020-02-19 12:11:45 -08:00
Omar Sandoval	cc18d9e502	libdrgn: add UTS_RELEASE to vmcoreinfo_object_find The osrelease is accessible via init_uts_ns.name.release, but we can also get it straight out of vmcoreinfo, which will be useful for the next change. UTS_RELEASE is the name of the macro defined in the kernel.	2020-02-19 12:11:20 -08:00
Serapheim Dimitropoulos	501d36c18e	libdrgn: fix regression in kernel module loading Commit `f327552229` ("libdrgn: add strstartswith()") flipped the test for a name entry in modinfo. This introduced a regression resulting in kernel modules not loading at the right offset. This patch fixes the regression.	2019-12-13 19:19:31 -05:00
Omar Sandoval	b0c4f894d4	libdrgn: really fix failed kernel module symbol lookups It turns out this wasn't a problem with dwfl_addrmodule() at all; the real problem is that .init sections are freed once the module is loaded but we're still considering them for the address range we pass to dwfl_report_module(). Ignore those sections entirely (by omitting them from the section name to section index map). While we're here, let's not bother inserting non-SHF_ALLOC sections in the map.	2019-12-12 21:14:02 -08:00
Omar Sandoval	f327552229	libdrgn: add strstartswith() Instead of open coding this check all over the place, add a helper function.	2019-12-12 13:26:50 -08:00
Omar Sandoval	6af6159cfc	libdrgn: support loading only main debug info If we only want debugging information for vmlinux and not kernel modules, it'd be nice to only load the former. This adds a load_main parameter to drgn_program_load_debug_info() which specifies just that. For now, it's only implemented for the Linux kernel. While we're here, let's make the paths parameter optional for the Python bindings.	2019-11-22 16:38:52 -08:00
Omar Sandoval	423d2cd500	libdrgn: dwarf_index: rework file reporting Currently, the interface between the DWARF index, libdwfl, and the code which finds and reports vmlinux/kernel modules is spaghetti. The DWARF index tracks Dwfl_Modules via their userdata. However, despite conceptually being owned by the DWARF index, the reporting code reports the Dwfl_Modules and sets up the userdata. These Dwfl_Modules and drgn_dwfl_module_userdatas are messy to track and pass between the layers. This reworks the architecture so that the DWARF index owns the Dwfl instance and files are reported to the DWARF index; the DWARF index takes care of reporting to libdwfl internally. In addition to making the interface for the reporter much cleaner, this improves a few things as a side-effect: - We now deduplicate on build ID in addition to path. - We now skip searching for vmlinux and/or kernel modules if they were already indexed. - We now support compressed ELF files via libdwelf. - We can now load default debug info at the same time as additional debug info.	2019-10-02 17:22:11 -07:00
Omar Sandoval	698991b27b	Get rid of DRGN_ERROR_{ELF,DWARF}_ERROR and FileFormatError We're too inconsistent with how we use these for them to be useful (and it's impossible to distinguish between a format error and some other error from libelf/libdw/libdwfl), so let's just get rid of them and make it all DRGN_ERROR_OTHER/Exception.	2019-08-15 15:03:42 -07:00
Omar Sandoval	0c5df56fba	libdrgn: replace symbol index with object index struct drgn_symbol doesn't really represent a symbol; it's just an object which hasn't been fully initialized (see `c2be52dff0` ("libdrgn: rename object index to symbol index"), it used to be called a "partial object"). For stack traces, we're going to have a notion of a symbol that more closely represents an ELF symbol, so let's get rid of the temporary struct drgn_symbol representation and just return an object directly.	2019-07-29 17:04:47 -07:00
Omar Sandoval	27a27940bc	libdrgn: split up drgn_program_get_dwarf() We don't need to get the DWARF index at the time we get the Dwfl handle, so get rid of drgn_program_get_dwarf(), add drgn_program_get_dwfl(), and create the DWARF index right before we update in a new function, drgn_program_update_dwarf_index().	2019-07-19 09:26:30 -07:00
Omar Sandoval	e5874ad18a	libdrgn: use libdwfl libdwfl is the elfutils "DWARF frontend library". It has high-level functionality for looking up symbols, walking stack traces, etc. In order to use this functionality, we need to report our debugging information through libdwfl. For userspace programs, libdwfl has a much better implementation than drgn for automatically finding debug information from a core dump or PID. However, for the kernel, libdwfl has a few issues: - It only supports finding debug information for the running kernel, not vmcores. - It determines the vmlinux address range by reading /proc/kallsyms, which is slow (~70ms on my machine). - If separate debug information isn't available for a kernel module, it finds it by walking /lib/modules/$(uname -r)/kernel; this is repeated for every module. - It doesn't find kernel modules with names containing both dashes and underscores (e.g., aes-x86_64). Luckily, drgn already solved all of these problems, and with some effort, we can keep doing it ourselves and report it to libdwfl. The conversion replaces a bunch of code for dealing with userspace core dump notes, /proc/$pid/maps, and relocations.	2019-07-15 12:27:48 -07:00
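
The wrap-around behavior described in commit 0e3054a0ba can be illustrated with a small sketch. This is not drgn's implementation: `split_wrapping_read` and its `max_address` parameter are hypothetical names chosen for illustration, and rejecting reads larger than the address space is an assumption. The sketch only shows how a read that runs past the maximum address can be split into inclusive ranges that never overflow.

```python
def split_wrapping_read(address, size, max_address=2**64 - 1):
    """Split a read into inclusive (start, end) ranges that do not overflow.

    Hypothetical helper for illustration only; not part of drgn's API.
    """
    if not 0 <= address <= max_address:
        raise ValueError("address is outside the address space")
    if size < 0 or size > max_address + 1:
        # Assumption: a single read never covers more than the whole
        # address space, so it wraps at most once.
        raise ValueError("size is larger than the address space")
    if size == 0:
        return []
    # Number of bytes available before the read reaches max_address.
    remaining = max_address - address + 1
    if size <= remaining:
        return [(address, address + size - 1)]
    # The read wraps: finish at max_address, then continue from address 0.
    return [(address, max_address), (0, size - remaining - 1)]
```

Using inclusive end addresses means a range can contain the maximum address itself without computing `max_address + 1`, which would not fit in a `uint64_t` in C.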

55 Commits