makedumpfile will exclude zero pages. We found a core file where a
structure straddled a page boundary and the end of the structure
was all zeros so the page was excluded and we were generating a
FaultError trying to access the structure.
This change reverts a portion of that behaviour such that when we are
debugging a kernel core we go back to the zero fill behaviour. To do this
we go back to creating segments based on memsz instead of filesz and
handling the filesz->memsz gap in drgn_read_memory_file.
Fixes: 02912ca7d0 ("libdrgn: fix handling of p_filesz < p_memsz in core dumps")
Signed-off-by: Glen McCready <gkm@mysteryinc.ca>
The config option is and always has been CONFIG_FW_CFG_SYSFS, not
CONFIG_FW_CFG. Also suggest the user-visible CONFIG_KEXEC instead of the
internal CONFIG_CRASH_CORE.
Fixes: 2bd861f719 ("libdrgn: program: detect QEMU guest memory dumps without VMCOREINFO")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
On x86-64, the difference between virtual addresses in the direct map
and the corresponding physical addresses is called PAGE_OFFSET, so we
exposed that via an architecture callback and the Linux kernel object
finder. However, this doesn't translate to other architectures. Namely,
on AArch64, the difference is PAGE_OFFSET - PHYS_OFFSET, and both
PAGE_OFFSET and PHYS_OFFSET have varied over time and between
configurations.
We can remove the architecture callback and avoid version-specific logic
by letting the page table tell us the offset. We just need an address in
the direct map, which is easy to find since this includes kmalloc and
memblock allocations.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
AArch64 will need different sizes of page table iterators depending on
the page size and virtual address size. Rather than the static
pgtable_iterator_arch_size, allow architectures to define callbacks for
allocating and freeing a page table iterator. Also remove the generic
page table iterator wrapper and just pass that information to the
iterator function.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Issue #182 reported that a core dump created by QEMU's dump-guest-memory
command confuses drgn: by default, it only has NT_PRSTATUS notes and
QEMU state notes for each CPU, so drgn thinks it's a userspace core
dump, and it doesn't have the necessary VMCOREINFO to use it as a Linux
kernel core dump.
It turns out that QEMU and Linux can cooperate to add a VMCOREINFO note
to the guest memory dump, which suffices for drgn. Let's detect a QEMU
guest memory dump without a VMCOREINFO note and include instructions on
how to capture a QEMU dump that makes drgn happy.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In order to support removing the pointer authentication code (PAC) from
return addresses on AArch64, we need to know what bits are being used
for the PAC. We can get this from the NT_ARM_PAC_MASK note in userspace
core dumps and from the NUMBER(KERNELPACMASK) field in VMCOREINFO for
Linux kernel core dumps.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In an upcoming commit, we will parse the AArch64 pointer authentication
code mask either from the VMCOREINFO note or the NT_ARM_PAC_MASK note.
Since it doesn't always come from VMCOREINFO, it doesn't make sense to
put it in struct vmcoreinfo; struct drgn_program makes more sense. So,
make parse_vmcoreinfo() take struct drgn_program instead of struct
vmcoreinfo, rename it to drgn_program_parse_vmcoreinfo(), and replace
struct vmcoreinfo with an anonymous struct in struct drgn_program.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
On !SMP kernels, crashing_cpu either doesn't exist or is always -1, so
drgn_program_crashed_thread() fails. Detect those cases and treat
crashing_cpu as 0.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Our cheap heuristic for the default language will not always be correct,
and although we can improve it as cases arise, we should also just have
a way for the user to explicitly set the default language. Add
drgn_program_set_language() to libdrgn and allow setting
drgn.Program.language in the Python bindings. This will also make unit
testing different languages easier.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This implements the existing thread API methods for live processes other
than drgn_thread_stack_trace(). It also doesn't yet add support for
full-blown tracing, but it at least brings live processes to feature
parity. This is taken from the non-ptrace parts of Kevin Svetlitski's
PR #142, with some modifications.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_thread_iterator_create(), drgn_thread_iterator_next(), and
drgn_program_find_thread() have big, divergent code paths for different
targets, and this would get worse once we add live processes. Split them
up into multiple functions.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We have a few places where we format a decimal number with sprintf() or
snprintf() to a buffer with an arbitrary size. Instead of this arbitrary
size, let's add a macro to get the exact number of characters required
to format a decimal number, use it in all of these places, and make all
of these places use snprintf() just to be safe. This is more verbose but
self-documenting. The max_decimal_length() macro is inspired by
https://stackoverflow.com/a/13546502/1811295 with some improvements.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If a TID does not exist, then linux_helper_find_task() succeeds but
returns a null pointer object. Check for that instead of returning a
bogus thread.
Fixes: 301cc767ba ("Implement a new API for representing threads")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently only supported for user-space crash dumps. E.g. no support for
live user-space application debugging or kernel debugging.
Closes#144.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Running the test suite with ASAN enabled revealed that when
the current target was a userspace core dump, the `crashed_thread`
member of `struct drgn_program` was being freed twice – once indirectly
via `drgn_thread_set_deinit`, and once explicitly in `drgn_prog_deinit`.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
Currently we can lookup symbols by name or address, but this will only
return one symbol, prioritizing the global symbols. However, symbols may
share the same name, and symbols may also overlap address ranges, so
it's possible for searches to return multiple results. Add functions
which can return a list of multiple matching symbols.
Signed-off-by: Stephen Brennan <stephen@brennan.io>
Previously, drgn had no way to represent a thread – retrieving a stack
trace (the only extant thread-specific operation) was achieved by
requiring the user to directly provide a tid.
This commit introduces the scaffolding for the design outlined in
issue #92, and implements the corresponding methods for userspace core
dumps, the live Linux kernel, and Linux kernel core dumps. Future work
will build on top of this commit to support live userspace processes.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
There are a few reasons for this:
1. dwfl_core_file_report() crashes on elfutils 0.183-0.185. Those
versions are still used by several distros.
2. In order to support --main-symbols and --symbols properly, we need to
report things ourselves.
3. I'm considering moving away from libdwfl in the long term.
We provide an escape hatch for now: setting the environment variable
DRGN_USE_LIBDWFL_REPORT=1 opts out of drgn's reporting and uses
libdwfl's.
Fixes#130.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I implemented the case of a segment in a core file with p_filesz <
p_memsz by treating the difference as zero bytes. This is correct for
ET_EXEC and ET_DYN, but for ET_CORE, it actually means that the memory
existed in the program but was not saved. For userspace core dumps, this
typically happens for read-only file mappings. For kernel core dumps,
makedumpfile does this to indicate memory that was excluded.
Instead, let's return a DRGN_FAULT_ERROR if an attempt is made to read
from these bytes. In the future, we need to read from the
executable/library files when we can.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The current code matches the desired note name as a prefix, but we need
an exact match.
Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Global symbols are preferred over weak symbols, and weak symbols are
preferred over other symbols.
dwfl_module_addrinfo() seems to have the same preference, so document
address lookups as having the same behavior. (This is actually incorrect
in the case of STB_GNU_UNIQUE, as dwfl_module_addrinfo() treats anything
other than STB_GLOBAL, STB_WEAK, and STB_LOCAL as having the lowest
precedence, but STB_GNU_UNIQUE is so obscure that it probably doesn't
matter.)
Based on work from Stephen Brennan. Closes#121.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Continuing the refactoring from the previous commit, move the DWARF code
from debug_info.c to its own file, leaving only the generic ELF file
management in debug_info.c
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Also:
* Rename struct string to struct nstring and move it to its own header.
* Fix scripts/iwyu.py, which was broken by commit 5541fad063 ("Fix
some flake8 errors").
* Add workarounds for a few outstanding include-what-you-use issues.
There is still a false positive for
include-what-you-use/include-what-you-use#970, but hopefully that is
fixed soon.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type.
Move that argument (and some of the derived constants) to
DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table
types. As a nice side effect, it also reduces the size of common header
files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The underscore was meant to discourage direct access in favor of using
drgn_program_get_dbinfo(), but it turns out that it's more normal to
access it directly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The only time that we want to create the drgn_debug_info is when we're
loading debugging information. Everywhere else, we fail fast if there is
no debugging information.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
And use it in a few appropriate places. This should hopefully make it
harder to make iteration mistakes like the one fixed by commit
4755cfac7c ("libdrgn: dwarf_index: increment correct variable when
rolling back"). While we're doing this, move ARRAY_SIZE() into a new
header file with array_for_each() and make it lowercase.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:
1. Reworking drgn_memory_reader to work with an inclusive address range
so that a segment can contain UINT64_MAX. drgn_memory_reader remains
agnostic to the maximum address and requires that address ranges do
not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_reader() to
drgn_program_read_memory() now that they are no longer equivalent.
(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the program already had a platform set, we should its callbacks
instead of the ones from the ELF file's platform.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The Linux kernel has its own stack unwinding format for x86-64 called
ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is
essentially a simplified, less complete version of DWARF CFI. ORC is
generated by analyzing machine code, so it is present for all but a few
ignored functions. In contrast, DWARF CFI is generated by the compiler
and is therefore missing for functions written in assembly and inline
assembly (which is widespread in the kernel).
This implements an ORC stack unwinder: it applies ELF relocations to the
ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule
kind, parses and efficiently stores ORC data, and translates ORC to drgn
CFI rules. This will allow us to stack trace through assembly code,
interrupts, and system calls.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently libdrgn requires libelf to be of version 0.175 or
later. This patch allows the library to be compiled with libelf
0.170 (the newest version supported by Ubuntu 18.04 LTS).
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
We're going to need the module start and end in
drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and
get the bias when we need it. This means we don't need the bias from
drgn_dwarf_index_get_die(), so get rid of that, too.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the language for a DWARF type is not found or unrecognized, we should
fall back to the global default, not the program default (the program
default language is for language-specific operations on the program, so
DWARF parsing shouldn't depend on it). Add a fall_back parameter to
drgn_language_from_die() and use it in DWARF parsing, and replace
drgn_language_or_default() with a drgn_default_language variable.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If virtual address translation isn't implemented for the target
architecture, then we shouldn't add the page table memory reader. If we
do, we get a DRGN_ERROR_INVALID_ARGUMENT error from
linux_helper_read_vm() instead of a DRGN_ERROR_FAULT error as expected.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).
This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.
This also reworks a couple of details of loading debugging information:
- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
compilation units (by using OpenMP tasks).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the
DWARF index, with each namespace being its own sub-index. We only index
the namespace itself when it is first accessed, which should help with
startup time and simplifies tracking.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
For namespace support, we will want to access the struct
drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split
drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of related ways that we can cause undefined behavior
when parsing a malformed DWARF or depmod index file:
1. There are several places where we increment the cursor to skip past
some data. It is undefined behavior if the result points out of
bounds of the data, even if we don't attempt to dereference it.
2. read_in_bounds() checks that ptr <= end. This pointer comparison is
only defined if ptr and end both point to elements of the same array
object or one past the last element. If ptr has gone past end, then
this comparison is likely undefined anyways.
Fix it by adding a helper to skip past data with bounds checking. Then,
all of the helpers can assume that ptr <= end and maintain that
invariant. while we're here and auditing all of the call sites, let's
clean up the API and rename it from read_foo() to the less generic
mread_foo().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.
Signed-off-by: Omar Sandoval <osandov@osandov.com>