JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-23 09:43:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	948cda2941	libdrgn: add vector/hash table initializers and update coding style Declaring a local vector or hash table and separately initializing it with vector_init()/hash_table_init() is annoying. Add macros that can be used as initializers. This exposes several places where the C89 style of placing all declarations at the beginning of a block is awkward. I adopted this style from the Linux kernel, which uses C89 and thus requires this style. I'm now convinced that it's usually nicer to declare variables where they're used. So let's officially adopt the style of mixing declarations and code (and ditch the blank line after declarations) and update the functions touched by this change.	2020-07-01 12:48:24 -07:00
Omar Sandoval	e4c52c5422	libdrgn: linux_kernel: use names for kmod index constants This makes it much easier to follow along with the code and understand the format.	2020-06-30 15:14:21 -07:00
Omar Sandoval	03d8cb0e32	libdrgn: fix hash_pair_from_non_avalanching_hash() on 64-bit without SSE 4.2 We were forgetting to mask away the extra bits. There are two places that we use the tag without converting it to a uint8_t: hash_table_probe_delta(), which is mostly benign since we mask it by the chunk mask anyways; and table_chunk_match() without SSE 2, which completely breaks. While we're here, let's align the comments better.	2020-06-24 13:33:08 -07:00
Omar Sandoval	8e7c1f1009	drgn 0.0.5	2020-05-26 10:04:06 -07:00
Omar Sandoval	a227d0d50e	Update elfutils and revert activation frame patch After thinking about it some more, I realized that "libdwfl: simplify activation frame logic" breaks the case where during unwinding someone queries isactivation for reasons other than knowing whether to decrement program counter. Revert the patch and refactor "libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it differently. Based on: c95081596 size: Also obey radix printing for bsd format. With the following patches: configure: Add --disable-programs configure: Add --disable-shared libdwfl: add interface for attaching to/detaching from threads libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame libdwfl: add interface for evaluating DWARF expressions in a frame	2020-05-20 13:38:49 -07:00
Omar Sandoval	eea5422546	libdrgn: make Linux kernel stack unwinding more robust drgn has a couple of issues unwinding stack traces for kernel core dumps: 1. It can't unwind the stack for the idle task (PID 0), which commonly appears in core dumps. 2. It uses the PID in PRSTATUS, which is racy and can't actually be trusted. The solution for both of these is to look up the PRSTATUS note by CPU instead of PID. For the live kernel, drgn refuses to unwind the stack of tasks in the "R" state. However, the "R" state is running or runnable, so in the latter case, we can still unwind the stack. The solution for this is to look at on_cpu for the task instead of the state.	2020-05-20 12:03:00 -07:00
Omar Sandoval	146930aff8	libdrgn: replace arch frame_registers with callbacks We currently unwind from pt_regs and NT_PRSTATUS using an array of register definitions. It's more flexible and more efficient to do this with an architecture-specific callback. For x86-64, this change also makes us depend on the binary layout rather than member names of struct pt_regs, but that shouldn't matter unless people are defining their own, weird struct pt_regs.	2020-05-19 17:11:27 -07:00
Omar Sandoval	4d8597f0f8	libdrgn: add THREAD_SIZE to Linux kernel object finder Despite the naming, this is the kernel stack size.	2020-05-19 17:10:54 -07:00
Omar Sandoval	971a2d3687	libdrgn/python: make Objects fully immutable The model has always been that drgn Objects are immutable, but for some reason I went through the trouble of allowing __init__() to reinitialize an already initialized Object. Instead, let's fully initialize the Object in __new__() and get rid of __init__().	2020-05-18 00:07:49 -07:00
Omar Sandoval	ab876f3dbd	libdrgn/python: allow specifying Object value positionally It's annoying to have to do value= when creating objects, especially in interactive mode. Let's allow passing in the value positionally so that `Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's clear enough that this is creating an int with value 0.	2020-05-18 00:07:49 -07:00
Omar Sandoval	8b264f8823	Update copyright headers to Facebook and add missing headers drgn was originally my side project, but for a while now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.	2020-05-15 15:13:02 -07:00
Omar Sandoval	c339113f9c	libdrgn: adjust program counter when looking up frame symbol For functions that call a noreturn function, the compiler may omit code after the call instruction. This means that the return address may not lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is an "activation", i.e., its program counter is guaranteed to lie within the caller. This is only the case for the initial frame, frames interrupted by a signal, and the signal trampoline frame. For everything else, we need to decrement the program counter before doing any lookups.	2020-05-13 17:11:54 -07:00
Omar Sandoval	175f83fc23	Update elfutils with noreturn unwinding fix Rebase on master and fix dwfl_frame_module/dwfl_frame_dwarf_frame to decrement the program counter when necessary. Based on: a8493c12a libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree With the following patches: configure: Add --disable-programs configure: Add --disable-shared libdwfl: simplify activation frame logic libdwfl: add interface for attaching to/detaching from threads libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register libdwfl: add interface for evaluating DWARF expressions in a frame	2020-05-13 16:41:52 -07:00
Omar Sandoval	bf545105c6	libdrgn: build in silent mode by default The automake/libtool compilation output is obnoxiously verbose. Switch on automake's silent mode, and make the custom rules honor it.	2020-05-10 00:12:50 -07:00
Omar Sandoval	2d1481f5ab	libdrgn: add page table walker kernel memory reader Now that we can walk page tables, we can use it in a memory reader that reads kernel memory via the kernel page table. This means that we don't need libkdumpfile for ELF vmcores anymore (although I'll keep the functionality around until this code has been validated more).	2020-05-08 17:37:56 -07:00
Omar Sandoval	e697be707c	libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET I originally wanted to avoid depending on another vmcoreinfo field, but the next change is going to depend on swapper_pg_dir in vmcoreinfo anyway, and it ends up being simpler to use it.	2020-05-08 17:37:56 -07:00
Omar Sandoval	8a276838ac	helpers: add access_process_vm() and access_remote_vm() Now that we can walk page tables, we can finally read memory from userspace tasks. Closes #53.	2020-05-08 17:37:01 -07:00
Omar Sandoval	d0a1718451	libdrgn: implement virtual address translation/page table walking There are a few big use cases for this in drgn: * Helpers for accessing memory in the virtual address space of userspace tasks. * Removing the libkdumpfile dependency for vmcores. * Handling gaps in the virtual address space of /proc/kcore (cf. #27). I dragged my feet on implementing this because I thought it would be more complicated, but the page table layout on x86-64 isn't too bad. This commit implements page table walking using a page table iterator abstraction. The first thing we'll add on top of this will be a helper for reading memory from a virtual address space, but in the future it'd also be possible to export the page table iterator directly.	2020-05-08 17:36:19 -07:00
Omar Sandoval	63299e0701	libdrgn: actually use uint64_t for two's complement unary ops UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid signed integer overflow... except that there's a typo and the uint64_t is actually an int64_t. Fix it and add a test that would catch it with -fsanitize=undefined.	2020-05-08 13:50:24 -07:00
Omar Sandoval	8f81ea255f	libdrgn: don't use unaligned loads to parse DWARF -fsanitize=undefined reports that the read_u* helpers rely on unaligned loads. Use memcpy() instead.	2020-05-08 13:50:24 -07:00
Omar Sandoval	3d59e042f4	libdrgn: don't open-code fls() c_integer_literal() has an open-coded equivalent of fls() that assumes that unsigned long long is 64 bits. Use fls() instead.	2020-05-08 00:20:42 -07:00
Omar Sandoval	340e00dfb5	libdrgn: improve and document bit operations fls() can be implemented with __bitop(), and we can get rid of clz() since it's only used by fls().	2020-05-08 00:14:25 -07:00
Omar Sandoval	f49d68d8f9	libdrgn: split generic utility functions out of internal.h internal.h includes both drgn-specific helpers and generic utility functions. Split the latter into their own util.h header and use it instead of internal.h in the generic data structure code. This makes it easier to copy the data structures into other projects/test programs.	2020-05-07 16:03:43 -07:00
Omar Sandoval	a95e42ef2e	libdrgn/python: use vector for Program_load_debug_info() Program_load_debug_info() is the last user of the resize_array()/realloc_array() utility functions. We can clean it up by using a vector and finally get rid of those functions. This also happens to fix three bugs in Program_load_debug_info(): we weren't setting a Python exception if we couldn't allocate the path_args array, we weren't zeroing path_args after resizing the array, and we weren't freeing the path_args array. Shame on whoever wrote this.	2020-05-07 15:47:57 -07:00
Omar Sandoval	0a100064c1	libdrgn: improve and rename DRGN_UNREACHABLE() DRGN_UNREACHABLE() currently expands to abort(), but assert() provides more information. If NDEBUG is defined, we can use __builtin_unreachable() instead. DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to UNREACHABLE(). It's also not really related to errors, so this moves it to internal.h.	2020-05-07 15:16:22 -07:00
Omar Sandoval	d759c7ed20	libdrgn: get rid of OFF_MAX This hasn't been used since commit `417a6f0d76` ("libdrgn: make memory reader pluggable with callbacks").	2020-05-07 14:41:05 -07:00
Omar Sandoval	23574e59d5	libdrgn: add /proc/kcore physical segments on old kernels Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so it's currently not possible to read from physical memory on old kernels. However, if we can figure out the address of the direct mapping, then we can determine the corresponding physical addresses for the segments and add them.	2020-05-04 13:20:27 -07:00
Omar Sandoval	f8c33518eb	libdrgn: handle kernel core dumps with all zero p_paddr We treat core dumps with all zero p_paddrs as not having valid physical addresses. However, it is theoretically possible for a kernel core dump to only have one segment which legitimately has a p_paddr of 0 (e.g., if it only has a segment for the direct mapping, although note that this isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the BIOS [1]). If the core dump has a VMCOREINFO note, then it is either a vmcore, which has valid physical addresses, or it is /proc/kcore with Linux kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to /proc/kcore") (in v4.19), so it must also have Linux kernel commit 464920104bf7 ("/proc/kcore: update physical address for kcore ram and text") (in v4.11) (ignoring the possibility of a franken-kernel which backported the former but not the latter). Therefore, treat core dumps with a VMCOREINFO note as having valid physical addresses. 1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678	2020-05-04 13:20:27 -07:00
Omar Sandoval	5505628235	libdrgn: get rid of struct drgn_program.num_file_segments This isn't used anymore. Remove it and simplify the loop adding file segments.	2020-05-04 13:20:27 -07:00
Omar Sandoval	510b6ef3b5	libdrgn: read vmcoreinfo physical address as uint64_t Even on 32-bit architectures, physical addresses can be 64 bits (e.g., x86 with PAE). It's unlikely that the vmcoreinfo note would be in such an address, but let's always parse it as a uint64_t just to be safe.	2020-05-04 13:20:27 -07:00
Omar Sandoval	10e58777c3	Add Program.read_{u8,u16,u32,u64,word}() I've found that I do this manually a lot (e.g., when digging through a task's stack). Add shortcuts for reading unsigned integers and a note for how to manually read other formats.	2020-04-27 17:27:10 -07:00
Omar Sandoval	b1315fcaa1	libdrgn: add drgn_program_bswap() This is clearer than open-coding the endianness check.	2020-04-27 17:08:02 -07:00
Omar Sandoval	1af57230c4	drgn 0.0.4	2020-04-21 15:52:01 -07:00
Omar Sandoval	ea3cdb43f7	libdrgn/python: optimize Object.__getattribute__() DrgnObject_getattro() uses PyObject_GenericGetAttr() and catches the AttributeError raised if the name is not an attribute of the Object class. If the member is found, we then destroy the AttributeError. Raising an exception only to destroy it is obviously wasteful. Luckily, as of Python 3.7, the lower-level _PyObject_GenericGetAttrWithDict() can suppress the AttributeError; we can raise it ourselves if we need it. In my microbenchmarks, this makes Object.__getattribute__() at least twice as fast when the member exists. This also fixes a drgn_error leak.	2020-04-20 17:45:30 -07:00
Jeffrey Mahoney	9bb2ccecb7	Enable DWARF indexing to work with partial units Loading debuginfo from a kernel with separate or re-combined debuginfo results in the following failure: Traceback (most recent call last): File "/home/jeffm/.local/bin/drgn", line 11, in <module> load_entry_point('drgn==0.0.3+39.gbf05d9bf', 'console_scripts', 'drgn')() File "/home/jeffm/.local/lib/python3.6/site-packages/drgn-0.0.3+39.gbf05d9bf-py3.6-linux-x86_64.egg/drgn/internal/cli.py", line 121, in main prog.load_debug_info(args.symbols, **args.default_symbols) Exception: invalid DW_AT_decl_file 10 A typical kernel build results in DWARF information with DW_TAG_compile_unit tags but the objcopy --only-keep-debug and eu-unstrip commands use DW_TAG_partial_unit tags. The DWARF indexer only handles the former, so the file table isn't populated and we get the exception. Fortunately, the two are more or less interchangeable. (See http://dwarfstd.org/doc/Dwarf3.pdf, section E.2.3). Accepting both for indexing compile units results in a working session.	2020-04-14 14:11:24 -07:00
Omar Sandoval	a3248b51e3	libdrgn: fix use after free when formatting compound types compound_initializer_init_next() saves a pointer to the compound initializer stack and uses it after appending to the stack, which may have reallocated the stack.	2020-04-13 16:49:18 -07:00
Jay Kamat	ecef9d74ef	libdrgn: get rid of arrays embedded in drgn_type For C++ support, we need to add an array of template parameters to struct drgn_type. struct drgn_type already has arrays for members, enumerators, and parameters embedded at the end of the structure, because no type needs more than one of those. However, struct, union, and class types may need members and template parameters. We could add a separate array of templates, but then it gets confusing having two methods of storing arrays in struct drgn_type. Let's make these arrays separate instead of embedding them.	2020-04-13 16:47:05 -07:00
Omar Sandoval	7a9fad0fd2	libdrgn: move _vmemmap() to object finder Similarly to PAGE_OFFSET, vmemmap makes more sense as part of the Linux kernel object finder than an internal helper. While we're here, let's fix the definition for 5-level page tables. This only matters for kernels with commit 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y") but without eedb92abb9bb ("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y") (namely, v4.14, v4.15, and v4.16); since v4.17, 5-level page table support enables KASLR.	2020-04-10 15:33:29 -07:00
Omar Sandoval	5ac95e491a	libdrgn: fix _page_offset() helper and move to object finder The internal _page_offset() helper gets the value of PAGE_OFFSET, but the fallback when KASLR is disabled has been out of date since Linux v4.20 and never handled 5-level page tables. Additionally, it makes more sense as part of the Linux kernel (formerly vmcoreinfo) object finder so that it's cleanly accessible outside of drgn internals.	2020-04-10 15:33:27 -07:00
Omar Sandoval	1dbc718840	helpers: add pgtable_l5_enabled()	2020-04-10 15:18:46 -07:00
Jeff Mahoney	bf05d9bf3f	libdrgn: allow to build without openmp The configure script allows the user to build without any OpenMP implementation, but dwarf_index.c uses the locking APIs unconditionally. This compiles but fails at runtime. Add simple stubs for the locking API. This is useful when debugging crashes in DWARF indexing during development.	2020-04-08 12:33:40 -07:00
Omar Sandoval	512acbc9c8	Rebase elfutils to pick up PN_XNUM fix elfutils 0.179 was just released, and it includes my fix for vmcores with >= 2^16 phdrs. Based on: 3a7728808 Prepare for 0.179 With the following patches: configure: Add --disable-programs configure: Add --disable-shared libdwfl: add interface for attaching to/detaching from threads libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register libdwfl: add interface for evaluating DWARF expressions in a frame	2020-04-06 12:07:01 -07:00
Jay Kamat	d8fadf10ee	libdrgn: Add cpp language and tests	2020-04-03 16:35:38 -07:00
Omar Sandoval	fa61977f60	libdrgn: fix default language detection Jay reported that the default language detection was happening too early and not finding "main". We need to make sure to do it after the DWARF index is actually populated. The problem with that is that it makes error reporting much harder, as we don't want to return a fatal error from drgn_program_set_language_from_main() if we actually succeeded in loading debug info. That means we probably need to ignore errors in drgn_program_set_language_from_main(). To reduce the surface area where we'd be failing, let's get the language directly from the DWARF index. This also allows us to avoid setting the language if it's actually unknown (information which is lost by the time we convert it to a drgn_object in the current code).	2020-04-03 16:35:38 -07:00
Omar Sandoval	bd902f299c	libdrgn: return unknown language from drgn_language_from_die() Currently, drgn_language_from_die() returns the default language when it encounters an unknown DW_LANG because the dwarf_info_cache always wants a language. The next change will want to detect the unknown language case, so make drgn_language_from_die() return NULL if the language is unknown, move it to language.c, and fold drgn_language_from_dw_lang() into it.	2020-04-03 16:35:38 -07:00
Serapheim Dimitropoulos	08193a97aa	Support stack traces for running threads on kdumps	2020-03-27 16:12:03 -07:00
Omar Sandoval	79f973007b	libdrgn/python: fix reference counting on Object.type_ We need to keep the Program alive for its types to stay valid, not just the objects the Program has pinned. (I have no idea why I changed this in commit `565e0343ef` ("libdrgn: make symbol index pluggable with callbacks").)	2020-03-13 16:05:43 -07:00
Omar Sandoval	cae7336750	libdrgn: fix error when expecting identifier after tag in type name We should be looking at the kind of the previous token, not the kind of the unexpected token. Closes #52.	2020-03-13 11:07:46 -07:00
Jay Kamat	3f870603fa	libdrgn: add default language to drgn_program For operations where we don't have a type available, we currently fall back to C. Instead, we should guess the language of the program and use that as the default. The heuristic implemented here gets the language of the CU containing "main" (except for the Linux kernel, which is always C). In the future, we should allow manually overriding the automatically determined language.	2020-02-26 19:55:42 -08:00
Jay Kamat	6c264b0eae	libdrgn: add language to struct drgn_type For types obtained from DWARF, we determine it from the language of the CU. For other types, it can be specified manually or fall back to the default (C). Then, we can use the language for operations where the type is available.	2020-02-26 19:55:42 -08:00
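Commit 948cda2941 replaces separate vector_init() calls with initializer macros. It also mixes declarations with code. Both ideas are sketched below. The type struct int_vector, the macro INT_VECTOR_INIT and the helpers are hypothetical. libdrgn generates its vector types with its own macros, and their real names and layout may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical dynamic array of ints (not libdrgn's generated vector type). */
struct int_vector {
	int *data;
	size_t size;
	size_t capacity;
};

/* Initializer usable directly in a declaration instead of a separate init call. */
#define INT_VECTOR_INIT { NULL, 0, 0 }

/* Append a value, doubling the capacity when full; returns false on allocation failure. */
static bool int_vector_append(struct int_vector *vec, int value)
{
	if (vec->size == vec->capacity) {
		size_t new_capacity = vec->capacity ? 2 * vec->capacity : 1;
		int *new_data = realloc(vec->data, new_capacity * sizeof(*new_data));
		if (!new_data)
			return false;
		vec->data = new_data;
		vec->capacity = new_capacity;
	}
	vec->data[vec->size++] = value;
	return true;
}

static void int_vector_deinit(struct int_vector *vec)
{
	free(vec->data);
}

/* Sum 1..n via the vector; declarations appear where they are first needed. */
static int sum_first_n(int n)
{
	struct int_vector vec = INT_VECTOR_INIT;
	for (int i = 1; i <= n; i++) {
		if (!int_vector_append(&vec, i)) {
			int_vector_deinit(&vec);
			return -1;
		}
	}
	int sum = 0;
	for (size_t i = 0; i < vec.size; i++)
		sum += vec.data[i];
	int_vector_deinit(&vec);
	return sum;
}
```

Because INT_VECTOR_INIT is a constant initializer, a vector declared this way is immediately valid, empty and safe to deinit.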
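Commit d0a1718451 adds page table walking. Its first step is splitting a virtual address into per-level table indices, sketched below. It assumes x86-64 4-level paging with 4 KiB pages. The struct and function names are hypothetical. drgn's real walker uses a page table iterator abstraction and also has to handle large pages and 5-level paging.

```c
#include <stdint.h>

/*
 * Table indices for an x86-64 virtual address under 4-level paging with
 * 4 KiB pages. Each level is indexed by 9 bits (512 entries per table).
 */
struct pgtable_indices {
	unsigned int pml4;   /* bits 47..39 */
	unsigned int pdpt;   /* bits 38..30 */
	unsigned int pd;     /* bits 29..21 */
	unsigned int pt;     /* bits 20..12 */
	unsigned int offset; /* bits 11..0: byte offset within the 4 KiB page */
};

static struct pgtable_indices split_virtual_address(uint64_t addr)
{
	struct pgtable_indices idx = {
		.pml4 = (unsigned int)((addr >> 39) & 0x1ff),
		.pdpt = (unsigned int)((addr >> 30) & 0x1ff),
		.pd = (unsigned int)((addr >> 21) & 0x1ff),
		.pt = (unsigned int)((addr >> 12) & 0x1ff),
		.offset = (unsigned int)(addr & 0xfff),
	};
	return idx;
}
```

A walker starts at the top-level table, for example swapper_pg_dir for the kernel. At each level it uses the matching index to read an entry and follows the physical address in that entry to the next table. It stops early when an entry maps a large page. With 5-level paging, an extra top level is indexed by bits 56..48.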
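Commit 63299e0701 describes how UNARY_OP_SIGNED_2C() avoids signed overflow: it does the arithmetic on the uint64_t member of a union, where wraparound is defined, and reads the result back as int64_t. Here is a minimal sketch for negation. drgn implements this as a macro, so the function and union names below are hypothetical.

```c
#include <stdint.h>

/* Both views must differ in signedness; the fixed typo had two int64_t members. */
union int64_bits {
	int64_t svalue;
	uint64_t uvalue;
};

/* Two's complement negation without signed-overflow UB (-INT64_MIN wraps to INT64_MIN). */
static int64_t negate_2c(int64_t x)
{
	union int64_bits tmp = { .svalue = x };
	tmp.uvalue = -tmp.uvalue; /* unsigned negation is defined modulo 2^64 */
	return tmp.svalue;
}
```

If both members were int64_t, the negation would overflow for INT64_MIN. That is undefined behavior, which -fsanitize=undefined reports.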
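Commit 8f81ea255f replaces unaligned loads in the DWARF read_u* helpers with memcpy(). The pattern is sketched below. The helper names are hypothetical. The real helpers presumably also check bounds and handle byte order, and this sketch omits both.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a native-endian integer from a possibly misaligned pointer.
 * Dereferencing a cast pointer such as *(const uint32_t *)p is undefined
 * when p is misaligned; memcpy() has no alignment requirement, and compilers
 * usually lower a fixed-size memcpy() to a plain load where that is safe.
 */
static uint32_t load_u32(const void *p)
{
	uint32_t value;
	memcpy(&value, p, sizeof(value));
	return value;
}

static uint64_t load_u64(const void *p)
{
	uint64_t value;
	memcpy(&value, p, sizeof(value));
	return value;
}
```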
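Commits 3d59e042f4 and 340e00dfb5 concern fls(), which returns the 1-based position of the most significant set bit. The sketch below builds it on the GCC/Clang builtin __builtin_clzll(). It does not use drgn's __bitop() macro, and the name fls_u64 is hypothetical. The width comes from sizeof rather than a hard-coded 64, which avoids the assumption that unsigned long long is 64 bits.

```c
#include <limits.h>
#include <stdint.h>

/* Find last set: 1-based index of the most significant set bit, or 0 if x == 0. */
static unsigned int fls_u64(uint64_t x)
{
	/* __builtin_clzll() is undefined for 0, so handle it separately. */
	if (x == 0)
		return 0;
	return (unsigned int)(sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(x);
}
```

A caller such as c_integer_literal() can use this to find how many bits a value needs. For example, 255 needs 8 bits and 256 needs 9.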
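Commit a3248b51e3 fixes a use after free: a pointer into the compound initializer stack was saved before appending to the stack, and the append may reallocate it. The sketch below shows the general fix of keeping an index instead of a pointer. The frame types and functions are hypothetical and are not libdrgn's compound_initializer code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical formatter state: one frame per nested compound initializer. */
struct frame {
	int depth;
	int next_member;
};

struct frame_stack {
	struct frame *frames;
	size_t size;
	size_t capacity;
};

/* Push a frame; growing may realloc() and move stack->frames. */
static bool frame_stack_push(struct frame_stack *stack, struct frame frame)
{
	if (stack->size == stack->capacity) {
		size_t new_capacity = stack->capacity ? 2 * stack->capacity : 1;
		struct frame *new_frames =
			realloc(stack->frames, new_capacity * sizeof(*new_frames));
		if (!new_frames)
			return false;
		stack->frames = new_frames;
		stack->capacity = new_capacity;
	}
	stack->frames[stack->size++] = frame;
	return true;
}

/*
 * Descend into the next member of the top frame. The parent is remembered by
 * index: a struct frame * taken before frame_stack_push() could dangle if the
 * push reallocates the array.
 */
static bool push_child(struct frame_stack *stack)
{
	size_t parent = stack->size - 1; /* caller guarantees a non-empty stack */
	struct frame child = { stack->frames[parent].depth + 1, 0 };
	if (!frame_stack_push(stack, child))
		return false;
	stack->frames[parent].next_member++; /* indexes the possibly moved array */
	return true;
}
```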


248 Commits