For Python-based object, type, and symbol finders, the vmcoreinfo is a
critical source of information. It can contain addresses necessary for
loading certain information (such as kallsyms). Expose this information
as a special object.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The current generic vector API is pretty minimal and exposes its
internal members as part of the public interface. This has worked well
but prevents us from changing the vector implementation. In particular,
I'd like to have "small vector" variants that can store some entries
directly in the vector structure, use a smaller integer type for the
size and capacity, or both.
So, let's make the generated vector type "private" and add accessor
functions. This is very verbose in some cases, but it'll grant us much
more flexibility. While we're changing every user anyways, let's also
make use of _cleanup_(vector_deinit) where possible.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the call to drgn_debug_info_main_language() from
drgn_program_set_language_from_main() fails, then the latter needs to
bail, not write garbage from the stack into prog->lang, which will crash
later.
Fixes: 5591d199b1 ("libdrgn: debug_info: split DWARF support into its own file")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The makedumpfile flattened format is occasionally seen by users, but is
not read by libkdumpfile and thus unsupported by Drgn. A simple
'reassembly' process is all that is necessary to allow Drgn to open the
vmcore, but this fact isn't easily discoverable, resulting in issues
like #344. To help users, detect this when we're testing for kdump
signatures, and raise an error with reassembly instructions.
For further details on the flattened format, consult makedumpfile(8),
particularly the sections documenting options -F and -R.
Signed-off-by: Stephen Brennan <stephen@brennan.io>
The lack of a semicolon after these macros has always confused tooling
like cscope. We could add semicolons everywhere now, but let's enforce
it for the future, too. Let's add a dummy struct forward declaration at
the end of each macro that enforces this requirement and also provides a
useful error message.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Exceptions aren't enough to debug complicated code paths like debug info
discovery or stack unwinding. We really need logs for that, so let's add
a small logging framework. By default, we log to stderr, but we also
provide a way to direct logs to a different file, or even an arbitrary
callback so that logs can be directed to the application's logging
library of choice.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DWARF indexing can take a long time; Kevin Svetlitski notes that it can
take almost a minute on some large binaries. Let's use the new blocking
API around it so that the Python bindings drop the GIL.
Closes#247.
Suggested-by: Kevin Svetlitski <svetlitski@meta.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are places in drgn where it'd be a good idea to drop the Python
GIL. However, some of these are deep inside of libdrgn, where some code
paths are fast and dropping the GIL would be extra overhead and others
are slow (e.g., type lookups, which may be cached or may require DWARF
namespace indexing). Instead of trying to do this from the Python
bindings, add hooks to libdrgn. These hooks can be used directly or with
a new scope guard macro, drgn_blocking_guard, that we can start
sprinkling around in appropriate places in libdrgn.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We currently use crashing_cpu to determine the thread that caused a
kernel crash. However, crashing_cpu is x86-specific (it is defined in
arch/x86/kernel/reboot.c). Since Linux 4.5, the generic panic code
defines a very similar variable, panic_cpu. Use that instead so that we
support all architectures, but fall back to crashing_cpu to support
older kernels on x86 (even though we don't claim to support 4.4
anymore).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.
All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We have a couple of loops that deal with short reads/EINTR from read(2)
and pread(2), and upcoming changes would need to add more. Add some
wrappers to abstract this away.
drgn_read_memory_file() still needs the loop so it can fault on the
exact offset that returns EIO.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There's a lot more context here that we should write down. It's also
worth noting that it appears that GDB always zero fills the range
between p_filesz and p_memsz, so if we end up having any other issues
because of this, we might have to concede and go back to the behavior
before commit 02912ca7d0 ("libdrgn: fix handling of p_filesz < p_memsz
in core dumps").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
makedumpfile will exclude zero pages. We found a core file where a
structure straddled a page boundary and the end of the structure
was all zeros so the page was excluded and we were generating a
FaultError trying to access the structure.
This change reverts a portion of that behaviour such that when we are
debugging a kernel core we go back to the zero fill behaviour. To do this
we go back to creating segments based on memsz instead of filesz and
handling the filesz->memsz gap in drgn_read_memory_file.
Fixes: 02912ca7d0 ("libdrgn: fix handling of p_filesz < p_memsz in core dumps")
Signed-off-by: Glen McCready <gkm@mysteryinc.ca>
The config option is and always has been CONFIG_FW_CFG_SYSFS, not
CONFIG_FW_CFG. Also suggest the user-visible CONFIG_KEXEC instead of the
internal CONFIG_CRASH_CORE.
Fixes: 2bd861f719 ("libdrgn: program: detect QEMU guest memory dumps without VMCOREINFO")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
On x86-64, the difference between virtual addresses in the direct map
and the corresponding physical addresses is called PAGE_OFFSET, so we
exposed that via an architecture callback and the Linux kernel object
finder. However, this doesn't translate to other architectures. Namely,
on AArch64, the difference is PAGE_OFFSET - PHYS_OFFSET, and both
PAGE_OFFSET and PHYS_OFFSET have varied over time and between
configurations.
We can remove the architecture callback and avoid version-specific logic
by letting the page table tell us the offset. We just need an address in
the direct map, which is easy to find since this includes kmalloc and
memblock allocations.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
AArch64 will need different sizes of page table iterators depending on
the page size and virtual address size. Rather than the static
pgtable_iterator_arch_size, allow architectures to define callbacks for
allocating and freeing a page table iterator. Also remove the generic
page table iterator wrapper and just pass that information to the
iterator function.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Issue #182 reported that a core dump created by QEMU's dump-guest-memory
command confuses drgn: by default, it only has NT_PRSTATUS notes and
QEMU state notes for each CPU, so drgn thinks it's a userspace core
dump, and it doesn't have the necessary VMCOREINFO to use it as a Linux
kernel core dump.
It turns out that QEMU and Linux can cooperate to add a VMCOREINFO note
to the guest memory dump, which suffices for drgn. Let's detect a QEMU
guest memory dump without a VMCOREINFO note and include instructions on
how to capture a QEMU dump that makes drgn happy.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In order to support removing the pointer authentication code (PAC) from
return addresses on AArch64, we need to know what bits are being used
for the PAC. We can get this from the NT_ARM_PAC_MASK note in userspace
core dumps and from the NUMBER(KERNELPACMASK) field in VMCOREINFO for
Linux kernel core dumps.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In an upcoming commit, we will parse the AArch64 pointer authentication
code mask either from the VMCOREINFO note or the NT_ARM_PAC_MASK note.
Since it doesn't always come from VMCOREINFO, it doesn't make sense to
put it in struct vmcoreinfo; struct drgn_program makes more sense. So,
make parse_vmcoreinfo() take struct drgn_program instead of struct
vmcoreinfo, rename it to drgn_program_parse_vmcoreinfo(), and replace
struct vmcoreinfo with an anonymous struct in struct drgn_program.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
On !SMP kernels, crashing_cpu either doesn't exist or is always -1, so
drgn_program_crashed_thread() fails. Detect those cases and treat
crashing_cpu as 0.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Our cheap heuristic for the default language will not always be correct,
and although we can improve it as cases arise, we should also just have
a way for the user to explicitly set the default language. Add
drgn_program_set_language() to libdrgn and allow setting
drgn.Program.language in the Python bindings. This will also make unit
testing different languages easier.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This implements the existing thread API methods for live processes other
than drgn_thread_stack_trace(). It also doesn't yet add support for
full-blown tracing, but it at least brings live processes to feature
parity. This is taken from the non-ptrace parts of Kevin Svetlitski's
PR #142, with some modifications.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_thread_iterator_create(), drgn_thread_iterator_next(), and
drgn_program_find_thread() have big, divergent code paths for different
targets, and this would get worse once we add live processes. Split them
up into multiple functions.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We have a few places where we format a decimal number with sprintf() or
snprintf() to a buffer with an arbitrary size. Instead of this arbitrary
size, let's add a macro to get the exact number of characters required
to format a decimal number, use it in all of these places, and make all
of these places use snprintf() just to be safe. This is more verbose but
self-documenting. The max_decimal_length() macro is inspired by
https://stackoverflow.com/a/13546502/1811295 with some improvements.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If a TID does not exist, then linux_helper_find_task() succeeds but
returns a null pointer object. Check for that instead of returning a
bogus thread.
Fixes: 301cc767ba ("Implement a new API for representing threads")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently only supported for user-space crash dumps. E.g. no support for
live user-space application debugging or kernel debugging.
Closes#144.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Running the test suite with ASAN enabled revealed that when
the current target was a userspace core dump, the `crashed_thread`
member of `struct drgn_program` was being freed twice – once indirectly
via `drgn_thread_set_deinit`, and once explicitly in `drgn_prog_deinit`.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
Currently we can lookup symbols by name or address, but this will only
return one symbol, prioritizing the global symbols. However, symbols may
share the same name, and symbols may also overlap address ranges, so
it's possible for searches to return multiple results. Add functions
which can return a list of multiple matching symbols.
Signed-off-by: Stephen Brennan <stephen@brennan.io>
Previously, drgn had no way to represent a thread – retrieving a stack
trace (the only extant thread-specific operation) was achieved by
requiring the user to directly provide a tid.
This commit introduces the scaffolding for the design outlined in
issue #92, and implements the corresponding methods for userspace core
dumps, the live Linux kernel, and Linux kernel core dumps. Future work
will build on top of this commit to support live userspace processes.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
There are a few reasons for this:
1. dwfl_core_file_report() crashes on elfutils 0.183-0.185. Those
versions are still used by several distros.
2. In order to support --main-symbols and --symbols properly, we need to
report things ourselves.
3. I'm considering moving away from libdwfl in the long term.
We provide an escape hatch for now: setting the environment variable
DRGN_USE_LIBDWFL_REPORT=1 opts out of drgn's reporting and uses
libdwfl's.
Fixes#130.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I implemented the case of a segment in a core file with p_filesz <
p_memsz by treating the difference as zero bytes. This is correct for
ET_EXEC and ET_DYN, but for ET_CORE, it actually means that the memory
existed in the program but was not saved. For userspace core dumps, this
typically happens for read-only file mappings. For kernel core dumps,
makedumpfile does this to indicate memory that was excluded.
Instead, let's return a DRGN_FAULT_ERROR if an attempt is made to read
from these bytes. In the future, we need to read from the
executable/library files when we can.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The current code matches the desired note name as a prefix, but we need
an exact match.
Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Global symbols are preferred over weak symbols, and weak symbols are
preferred over other symbols.
dwfl_module_addrinfo() seems to have the same preference, so document
address lookups as having the same behavior. (This is actually incorrect
in the case of STB_GNU_UNIQUE, as dwfl_module_addrinfo() treats anything
other than STB_GLOBAL, STB_WEAK, and STB_LOCAL as having the lowest
precedence, but STB_GNU_UNIQUE is so obscure that it probably doesn't
matter.)
Based on work from Stephen Brennan. Closes#121.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Continuing the refactoring from the previous commit, move the DWARF code
from debug_info.c to its own file, leaving only the generic ELF file
management in debug_info.c
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Also:
* Rename struct string to struct nstring and move it to its own header.
* Fix scripts/iwyu.py, which was broken by commit 5541fad063 ("Fix
some flake8 errors").
* Add workarounds for a few outstanding include-what-you-use issues.
There is still a false positive for
include-what-you-use/include-what-you-use#970, but hopefully that is
fixed soon.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type.
Move that argument (and some of the derived constants) to
DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table
types. As a nice side effect, it also reduces the size of common header
files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The underscore was meant to discourage direct access in favor of using
drgn_program_get_dbinfo(), but it turns out that it's more normal to
access it directly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The only time that we want to create the drgn_debug_info is when we're
loading debugging information. Everywhere else, we fail fast if there is
no debugging information.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
And use it in a few appropriate places. This should hopefully make it
harder to make iteration mistakes like the one fixed by commit
4755cfac7c ("libdrgn: dwarf_index: increment correct variable when
rolling back"). While we're doing this, move ARRAY_SIZE() into a new
header file with array_for_each() and make it lowercase.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:
1. Reworking drgn_memory_reader to work with an inclusive address range
so that a segment can contain UINT64_MAX. drgn_memory_reader remains
agnostic to the maximum address and requires that address ranges do
not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_reader() to
drgn_program_read_memory() now that they are no longer equivalent.
(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the program already had a platform set, we should its callbacks
instead of the ones from the ELF file's platform.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The Linux kernel has its own stack unwinding format for x86-64 called
ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is
essentially a simplified, less complete version of DWARF CFI. ORC is
generated by analyzing machine code, so it is present for all but a few
ignored functions. In contrast, DWARF CFI is generated by the compiler
and is therefore missing for functions written in assembly and inline
assembly (which is widespread in the kernel).
This implements an ORC stack unwinder: it applies ELF relocations to the
ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule
kind, parses and efficiently stores ORC data, and translates ORC to drgn
CFI rules. This will allow us to stack trace through assembly code,
interrupts, and system calls.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently libdrgn requires libelf to be of version 0.175 or
later. This patch allows the library to be compiled with libelf
0.170 (the newest version supported by Ubuntu 18.04 LTS).
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
We're going to need the module start and end in
drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and
get the bias when we need it. This means we don't need the bias from
drgn_dwarf_index_get_die(), so get rid of that, too.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the language for a DWARF type is not found or unrecognized, we should
fall back to the global default, not the program default (the program
default language is for language-specific operations on the program, so
DWARF parsing shouldn't depend on it). Add a fall_back parameter to
drgn_language_from_die() and use it in DWARF parsing, and replace
drgn_language_or_default() with a drgn_default_language variable.
Signed-off-by: Omar Sandoval <osandov@osandov.com>