JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-26 10:35:36 +00:00

Author	SHA1	Message	Date
Omar Sandoval	18b12a5c7b	libdrgn: get .eh_frame from the correct file We're currently getting .eh_frame from the debug file. However, since .eh_frame is an SHF_ALLOC section, it is actually in the loaded file, and may not be in the debug file. This causes us to fail to unwind in modules whose debug file was created with objcopy --only-keep-debug (which is typical for Linux distro debug files). Fix it by getting .eh_frame from the loaded file. To make this easier, we split .eh_frame and .debug_frame data into two separate tables. We also don't bother deduplicating them anymore, since GCC and Clang only seem to generate one or the other in practice. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	270375f077	libdrgn: debug_info: get "loaded" ELF file For upcoming changes, we will need loaded (SHF_ALLOC) sections for modules. Some separate debug files (e.g., those created with objcopy --only-keep-debug) don't have those sections. Let's get the loaded file from libdwfl with dwfl_module_getelf() and save it in a struct drgn_elf_file. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	bcb53d712b	libdrgn: bypass libdwfl with struct drgn_elf_file Now that we track the debug file ourselves, we can avoid calling libdwfl in a bunch of places. By tracking the bias ourselves, we can avoid a bunch more. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	34f122144a	libdrgn: debug_info: wrap ELF file information in new struct drgn_elf_file struct drgn_module contains a bunch of information about the debug info file. Let's pull it out into its own structure, struct drgn_elf_file. This will be reused for the "main"/"loaded" file in an upcoming change. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	b3bab1c5b0	libdrgn: make module vs. program platform difference more clear It's confusing that we have a platform both for the program and for each module. They usually match, but they're not required to. For example, the user can manually add a file with a different platform just to read its debug info. Our rule is that if we're parsing anything from the module, we use the module platform; and otherwise, use the program platform. There are a couple of places where the platforms must match: when using call frame information (CFI) or registers. Let's make all of this more clear in the code (by using the module's platform even when it must match the program's platform) and in comments. No functional change. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 12:53:45 -08:00
Omar Sandoval	87b7292aa5	Relicense drgn from GPLv3+ to LGPLv2.1+ drgn is currently licensed as GPLv3+. Part of the long term vision for drgn is that other projects can use it as a library providing programmatic interfaces for debugger functionality. A more permissive license is better suited to this goal. We decided on LGPLv2.1+ as a good balance between software freedom and permissiveness. All contributors not employed by Meta were contacted via email and consented to the license change. The only exception was the author of commit `c4fbf7e589` ("libdrgn: fix for compilation error"), who did not respond. That commit reverted a single line of code to one originally written by me in commit `640b1c011d` ("libdrgn: embed DWARF index in DWARF info cache"). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-01 17:05:16 -07:00
Omar Sandoval	70af25849c	libdrgn: rename drgn_debug_info_module to drgn_module Eventually, modules will be exposed as part of the public libdrgn API, so they should have a clean name. Additionally, the module API I'm currently working on will allow modules for which we don't have the debug info file, so "debug info module" would be a misnomer. Also rename drgn_dwarf_module_info to drgn_module_dwarf_info and drgn_orc_module_info to drgn_module_orc_info to fit the new naming scheme better. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-05 16:52:46 -07:00
Omar Sandoval	d76a3a338f	libdrgn: string_builder: add dedicated initializer Rather than documenting how to initialize a struct string_builder, provide an initializer, STRING_BUILDER_INIT. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-05 15:32:07 -07:00
Omar Sandoval	c9265ef6d6	libdrgn: move alloc_or_reuse() to util.h Preparation for using it elsewhere. Also make it take a size_t instead of uint64_t. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-06-01 05:11:58 -07:00
Omar Sandoval	b16dad8a36	libdrgn: support SHT_REL relocations In preparation for supporting ELF relocations for more architectures, generalize ELF relocations to handle SHT_REL sections/ElfN_Rel. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-04-18 17:56:37 -07:00
Omar Sandoval	929b7de266	libdrgn: handle reading data from SHT_NOBITS sections Peilin Ye reported a couple of related crashes in drgn caused by Linux kernel modules which had been processed with objcopy --only-keep-debug (although he notes that since binutils-gdb commit 8c803a2dd7d3 ("elf_backend_section_flags and _bfd_elf_init_private_section_data") (in binutils v2.35), objcopy --only-keep-debug doesn't seem to work for kernel modules). If given an SHT_NOBITS section, elf_getdata() returns an Elf_Data with d_buf = NULL and d_size set to the size in the section header, which is often non-zero. There are a few places where this can cause us to dereference a NULL pointer: * In relocate_elf_sections() for the relocated section data. * In relocate_elf_sections() for the symbol table section data. * In get_kernel_module_name_from_modinfo(). * In get_kernel_module_name_from_this_module(). Fix it by checking the section type or directly checking Elf_Data::d_buf everywhere that could potentially get an SHT_NOBITS section. This is based on a PR from Peilin Ye. Closes #145. Reported-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-01-27 12:23:09 -08:00
Omar Sandoval	8e8e3a4f57	libdrgn: debug_info: refactor relocate_elf_section() relocate_elf_section() shouldn't need to deal with reading the sections. Pull that logic out into relocate_elf_file() (which will be shared with REL-style relocations when we support those) and rename relocate_elf_section() to apply_elf_relas(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-01-27 12:22:57 -08:00
Omar Sandoval	26ff3667cb	libdrgn: debug_info: use elf_rawdata() instead of elf_getdata() Most of the places where we call elf_getdata() (via read_elf_section()) deal with SHT_PROGBITS sections. elf_getdata() always returns the literal contents of the file for SHT_PROGBITS sections (usually straight out of the mmap'd file). The exceptions are the SHT_RELA and SHT_SYMTAB sections in relocate_elf_section(). For these, if the byte order or alignment of the file do not match the host, elf_getdata() allocates a new buffer and converts the contents. relocate_elf_section() also handles unaligned buffers and swaps the byte order, so it mistakenly ends up with the original byte order of the file. Rather than removing that from relocate_elf_section(), let's avoid the extra allocation and use elf_rawdata(), which always returns the literal contents of the file. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-01-27 12:20:10 -08:00
Omar Sandoval	061094187b	libdrgn: debug_info: serialize initial calls to dwfl_module_getdwarf dwfl_module_getdwarf() may call into debuginfod_find_executable() or debuginfod_find_debuginfo(), which aren't thread-safe. So, let's put the initial call of dwfl_module_getdwarf() (which is the call that may go into the debuginfod client) into a critical section. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-09 20:37:14 +00:00
Omar Sandoval	e6abfeac03	libdrgn: debug_info: report userspace core dump debug info ourselves There are a few reasons for this: 1. dwfl_core_file_report() crashes on elfutils 0.183-0.185. Those versions are still used by several distros. 2. In order to support --main-symbols and --symbols properly, we need to report things ourselves. 3. I'm considering moving away from libdwfl in the long term. We provide an escape hatch for now: setting the environment variable DRGN_USE_LIBDWFL_REPORT=1 opts out of drgn's reporting and uses libdwfl's. Fixes #130. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-08 12:11:10 -08:00
Omar Sandoval	02912ca7d0	libdrgn: fix handling of p_filesz < p_memsz in core dumps I implemented the case of a segment in a core file with p_filesz < p_memsz by treating the difference as zero bytes. This is correct for ET_EXEC and ET_DYN, but for ET_CORE, it actually means that the memory existed in the program but was not saved. For userspace core dumps, this typically happens for read-only file mappings. For kernel core dumps, makedumpfile does this to indicate memory that was excluded. Instead, let's return a DRGN_FAULT_ERROR if an attempt is made to read from these bytes. In the future, we need to read from the executable/library files when we can. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-08 00:02:44 -08:00
Omar Sandoval	844d82848c	libdrgn: add partial support for .gnu_debugaltlink Issue #130 reported an "unknown attribute form 0x1f20" from drgn. 0x1f20 is DW_FORM_GNU_ref_alt, which is a reference to a DIE in an alternate file. Similarly, DW_FORM_GNU_strp_alt is a string in an alternate file. The alternate file is specified by the .gnu_debugaltlink section. This is generated by dwz, which is used by at least Fedora and Debian. libdwfl already finds the alternate debug info file, so we can save its .debug_info and .debug_str and use those to support DW_FORM_GNU_ref_alt and DW_FORM_GNU_strp_alt in the DWARF index. Imported units are going to be more work to support in the DWARF index, but this at least lets drgn start up. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-07 13:49:09 -08:00
Omar Sandoval	aef144c944	libdrgn: debug_info: improve elf_address_range() Instead of iterating through every segment, we can just look at the first and last loadable segments. This even works for vmlinux on x86-64 and Arm which have some special, relocatable segments. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-06 13:33:55 -08:00
Omar Sandoval	10c66d4e99	libdrgn: get correct error when dwelf_elf_gnu_build_id() fails The documentation for libdwelf states that "functions starting with dwelf_elf will take a (libelf) Elf object as first argument and might set elf_errno on error". So, we should be using drgn_error_libelf(), not drgn_error_libdwfl(). While we're here, close the Elf handle before the file descriptor for consistency. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-06 01:51:54 -08:00
Omar Sandoval	0e318754fe	libdrgn: don't swallow errors in relocate_elf_file() Fixes: `62d98b3016` ("libdrgn: fold ELF relocation code into dwarf_index") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-12-03 11:31:05 -08:00
Omar Sandoval	c0d8709b45	Update copyright headers to Meta Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-21 15:59:44 -08:00
Omar Sandoval	4808ef72ee	libdrgn: debug_info: get address range of reported ET_EXEC files When explicitly reporting a debugging information file for a userspace program, userspace_report_debug_info() currently always reports it with a load address range of [0, 0) (i.e., not actually loaded into the program). This is because for ET_DYN and ET_REL files, we have to determine the address range by inspecting the core dump or program state, which is a bit involved. However, ET_EXEC is much easier: we can get the address range from the segment headers. In fact, we already implemented this for vmlinux files, so we can reuse that with a modification to make it more permissive. ET_CORE debug info files don't make much sense, but libdwfl seems to treat a reported ET_CORE file the same as ET_EXEC (see dwfl_report_elf()), so we do, too. Unfortunately, most executables on modern Linux distributions are ET_DYN, but this will at least make testing easier. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-19 14:58:10 -08:00
Omar Sandoval	c3f31e28f9	libdrgn: reorganize and move DWARF index into dwarf_info.c The upcoming introduction of a higher level data structure to represent a namespace has implications on the organization of the DWARF index and debug info management code. Basically, we're going to want to track what is currently known as struct drgn_dwarf_index_namespace as part of the new struct drgn_namespace. That only leaves the DWARF specification map and list of CUs in struct drgn_dwarf_index, which doesn't make much sense anymore. Instead, let's: * Move the specification map and CUs into struct drgn_dwarf_info. * Rename struct drgn_dwarf_index_namespace to struct drgn_namespace_dwarf_index to indicate that it is the "DWARF index for a namespace" rather than a "namespace of a DWARF index". * Move the DWARF index implementation into dwarf_info.c. The DWARF index and debugging information management have always been coupled, so this makes it more explicit and is more convenient. * Improve documentation and naming in the DWARF index implementation. Now, the only DWARF-specific code outside of dwarf_info.c is for stack tracing, but we'll leave that for another day. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:55 -08:00
Omar Sandoval	5591d199b1	libdrgn: debug_info: split DWARF support into its own file Continuing the refactoring from the previous commit, move the DWARF code from debug_info.c to its own file, leaving only the generic ELF file management in debug_info.c. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:54 -08:00
Omar Sandoval	c6b2bc4181	libdrgn: debug_info: split ORC support into its own file debug_info.c currently contains code for managing ELF files with debugging information, for parsing DWARF, and for parsing ORC. Let's split it up, starting by moving ORC support to its own file. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:04 -08:00
Jay Kamat	3700bb75b8	libdrgn: Follow typedefs in enum backing type lookup In C++ enums can be a typedef to an int, not just an int itself. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2021-11-18 13:48:31 -08:00
Omar Sandoval	40357b9d9e	libdrgn: debug_info: don't use strlen() in drgn_debug_info_find_object() The length of the name was passed, and the name may not be null-terminated. Fixes: `565e0343ef` ("libdrgn: make symbol index pluggable with callbacks") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-17 18:06:43 -08:00
Omar Sandoval	568f4f9c2b	libdrgn: debug_info: remove dies and length out parameters to drgn_dwarf_die_iterator_next() These are already available in it->dies. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-29 18:38:49 -07:00
Omar Sandoval	198499e74b	libdrgn: debug_info: optimize drgn_find_die_ancestors() Jay pointed out that when finding the ancestors for a DIE, we should use DW_AT_sibling to skip over subtrees that can't contain the target DIE. So, let's check each DIE that we encounter for a DW_AT_sibling attribute. dwarf_attr() also returns the end of the DIE if it doesn't find the attribute, which we can use to avoid parsing DIEs redundantly. This doesn't fit very well into drgn_dwarf_iterator, so let's just hand-roll this special type of iteration. In my measurements, this made drgn_find_die_ancestors() ~6x as fast on average. Closes #124. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-29 18:38:40 -07:00
Omar Sandoval	1339dc6a2f	libdrgn: hash_table: move entry_to_key to DEFINE_HASH_TABLE_FUNCTIONS() DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type. Move that argument (and some of the derived constants) to DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table types. As a nice side effect, it also reduces the size of common header files. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-23 00:52:23 -07:00
Omar Sandoval	802d6cc9ff	libdrgn: rename drgn_program::_dbinfo to dbinfo The underscore was meant to discourage direct access in favor of using drgn_program_get_dbinfo(), but it turns out that it's more normal to access it directly. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-10-23 00:52:23 -07:00
Omar Sandoval	fba5947fec	libdrgn: add array_for_each() And use it in a few appropriate places. This should hopefully make it harder to make iteration mistakes like the one fixed by commit `4755cfac7c` ("libdrgn: dwarf_index: increment correct variable when rolling back"). While we're doing this, move ARRAY_SIZE() into a new header file with array_for_each() and make it lowercase. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-08-23 17:32:00 -07:00
Omar Sandoval	8b4532ca0a	libdrgn: debug_info: improve handling of DW_AT_data_member_location There are a couple of issues with how we interpret DW_AT_data_member_location: 1. DW_AT_data_member_location can be a location list, and we shouldn't interpret the section offset as the member offset. 2. DW_AT_data_member_location can be a location description block, and in DWARF 2, it cannot be a constant. We should handle constant offset expressions as generated by GCC and Clang. Closes #13. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-08-16 11:57:42 -07:00
Omar Sandoval	611e4d90b2	libdrgn: debug_info: support DWARF 3 forms for loclistptr DWARF 3 uses DW_FORM_data4 or DW_FORM_data8 for DW_AT_location loclistptrs. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-08-13 17:52:24 -07:00
Omar Sandoval	26001733f6	libdrgn: debug_info: support DWARF 5 location lists The DWARF 5 format is a little more complicated than DWARF 2-4 but functionally very similar. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-07-09 01:52:08 -07:00
Omar Sandoval	215f7d79d7	libdrgn: debug_info: implement DW_OP_{addr,const}x These were added in DWARF 5. They need to know the CU that they're being evaluated in, but the parameters for drgn_eval_dwarf_expression() were already getting unwieldy. Wrap the evaluation context in a new struct drgn_dwarf_expression_context, add the additional CU information, and implement the operations. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-07-09 01:52:08 -07:00
Omar Sandoval	81053a1c57	libdrgn: dwarf_index: support DWARF 5 The main changes are: 1. Skipping the new attribute forms. 2. Handling DW_FORM_strx*, DW_FORM_line_strp, and DW_FORM_implicit_const for the attributes that we care about. 3. Parsing the new unit header format. 4. Parsing the new line number program header format. Note that Clang currently produces an incorrect DWARF 5 line number program header for the Linux kernel (https://reviews.llvm.org/D105662), so some types are not properly deduplicated in that case. Closes #104. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-07-09 01:51:59 -07:00
Omar Sandoval	347c578aa0	libdrgn: debug_info: don't open-code drgn_platform_address_size() Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-07-07 15:03:03 -07:00
Omar Sandoval	fbe102f37e	libdrgn: debug_info: handle incomplete .debug_aranges Clang does not generate .debug_aranges by default, but the GNU toolchain does. This means that a Linux kernel binary compiled with Clang and GNU binutils will have ranges in .debug_aranges for assembly files and nothing else. This breaks our assumption that a non-empty .debug_aranges has ranges for every compilation unit. Fix it by always falling back to checking every CU if a range was not found in .debug_aranges. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-07-07 11:25:13 -07:00
Omar Sandoval	6b2dda3f95	libdrgn: bring back dwfl_core_file_report() bug workaround This workaround was originally present in commit `e5874ad18a` ("libdrgn: use libdwfl"). We dropped it in commit `6a13d74c0c` ("libdrgn: build with bundled elfutils") because the bundled version of elfutils had the fix. We forgot to bring it back in commit `4c5c5f3842` ("Remove bundled version of elfutils") even though we support versions without the fix. Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-09 15:16:12 -07:00
Omar Sandoval	faad25d7b2	libdrgn: debug_info: fix address of objects with size zero The stack trace variable work introduced a regression that causes objects with size zero to always be marked absent even if they have an address. This matters because GCC sometimes seems to omit the complete array type for arrays declared without a length, so an array variable can end up with an incomplete array type. I saw this with the "swapper_spaces" variable in mm/swap_state.c from the Linux kernel. Make sure to use the address of an empty piece if the variable is also empty. Fixes: `ffcb9ccb19` ("libdrgn: debug_info: implement creating objects from DWARF location descriptions") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-07 15:46:22 -07:00
Omar Sandoval	bc85767e5f	libdrgn: support looking up parameters and variables in stack traces After all of the preparatory work, the last two missing pieces are a way to find a variable by name in the list of scopes that we saved while unwinding, and a way to find the containing scopes of an inlined function. With that, we can finally look up parameters and variables in stack traces. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	0e113ecc8d	libdrgn: debug_info: add drgn_find_die_ancestors() This will be used for finding the ancestors of the abstract instance root corresponding to a concrete inlined instance root for variable lookups in inlined functions. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	d8d4157346	libdrgn: debug_info: add drgn_debug_info_module_find_dwarf_scopes() This will be used for finding functions, inlined functions, and blocks containing a PC for stack unwinding and variable lookups. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	b6d810b344	libdrgn: debug_info: add DWARF DIE iterator We have a couple of upcoming use cases for iterating through all of the DIEs in a module: searching for scopes and searching for a DIE's ancestors. Add a DIE iterator interface to abstract away the details of walking DIEs and allow us to efficiently track ancestors. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	ffcb9ccb19	libdrgn: debug_info: implement creating objects from DWARF location descriptions Add support for evaluating a DWARF location description and translating it into a drgn object. In this commit, this is just used for global variables, but an upcoming commit will wire this up to stack traces for parameters and local variables. There are a few locations that drgn's object model can't represent yet. DW_OP_piece/DW_OP_bit_piece can describe objects that are only partially known or partially in memory; we approximate these where we can. We don't have a good way to support DW_OP_implicit_pointer at all yet. This also adds test cases for DWARF expressions, which we couldn't easily test before. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	8335450ecb	libdrgn: debug_info: implement DW_OP_fbreg Implement looking up location descriptions and evaluating DW_OP_fbreg. This isn't actually used yet since CFI expressions don't have a current function DIE, but it will be used for parameters/local variables in stack traces. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	d5b68455b8	libdrgn: debug_info: save .debug_loc .debug_loc will be used for variable resolution. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	e105be6c18	libdrgn: debug_info: add helper to cache module section Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	dcda688c9a	libdrgn: debug_info: parenthesize PUSH() macro argument It doesn't make a difference anywhere it's currently used, but let's do it just in case that changes in the future. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00

102 Commits