Drgn is used by multiple companies and stakeholders in a variety of
situations. I think it qualifies as Production/Stable. Also, add the
Linux kernel trove classifier.
Signed-off-by: Stephen Brennan <stephen@brennan.io>
Since 'prog->aux->linked_prog' has been moved to the trampoline, linked
progs have to be inspected through the tracing link's trampoline.
Therefore, add --show-details to the link command to show the linked
progs for tracing links.
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
We got a couple of reports about drgn failing to get a stack trace
(#391) or getting the wrong stack trace (#404) from a kernel core dump.
Both were caused because drgn assumes that there is an NT_PRSTATUS note
for each CPU in order by CPU number, and in these core dumps some
NT_PRSTATUS notes were missing. There are at least a couple of things
that can cause this: offline CPUs or CPUs that were in a bad state and
didn't respond to the kdump NMI. The former is expected and could be
special-cased, but the latter basically means that we can't trust the
order of the notes. Instead, look up the notes from the crash_notes
per-CPU variable that the kernel uses to populate the ELF notes. We
still need to use the actual NT_PRSTATUS notes for QEMU
dump-guest-memory dumps, but for those we need to use the PID field to
handle CPU hotplugging.
Closes #404.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We have a few places where we parse raw ELF notes independently of
libelf, which isn't complicated but also not trivial thanks to alignment
requirements. In preparation for adding another place, factor the
parsing out into a common helper. Also document the complexities around
figuring out the correct alignment.
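As a rough illustration of the parsing described above, here is a minimal
Python sketch. It is not the libdrgn helper: the names ElfNote and
parse_elf_notes() are hypothetical, and it assumes the caller already knows
the alignment (4 or 8 bytes) and the byte order of the note data.

```python
import struct
from typing import Iterator, NamedTuple


class ElfNote(NamedTuple):
    """One parsed note (hypothetical type, for illustration only)."""

    name: bytes
    type: int
    desc: bytes


def _align_up(offset: int, align: int) -> int:
    # Round offset up to the next multiple of align (a power of two).
    return (offset + align - 1) & ~(align - 1)


def parse_elf_notes(
    data: bytes, align: int = 4, little_endian: bool = True
) -> Iterator[ElfNote]:
    # Assumption: any alignment other than 8 is treated as 4.
    if align != 8:
        align = 4
    # The note header is three 32-bit words: namesz, descsz, type.
    header = struct.Struct(("<" if little_endian else ">") + "III")
    offset = 0
    while len(data) - offset >= header.size:
        namesz, descsz, note_type = header.unpack_from(data, offset)
        offset += header.size
        name_end = offset + namesz
        if name_end > len(data):
            break  # truncated name; stop rather than guess
        name = data[offset:name_end]
        if name.endswith(b"\0"):
            name = name[:-1]  # drop the terminating NUL
        # The descriptor starts at the next aligned offset after the name.
        offset = _align_up(name_end, align)
        desc_end = offset + descsz
        if desc_end > len(data):
            break  # truncated descriptor
        desc = data[offset:desc_end]
        # The next note starts at the next aligned offset after the descriptor.
        offset = _align_up(desc_end, align)
        yield ElfNote(name, note_type, desc)
```

Note that the padding is computed on offsets rather than on the sizes
themselves, since the 12-byte header is not a multiple of 8.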
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The vmap_nodes global variable is optimized out on 32-bit kernels, so we
need to hard-code a workaround for the find_vmap_area() and
for_each_vmap_area() helpers.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The path of the Python interpreter in the Debian rootfs is
/usr/bin/python3, which may not be the same as sys.executable on the
host.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
When running tests manually with vmtest.vm, it's tedious to set up the
test kernel module. Make vmtest.vm.run_in_vm() handle it and add command
line options.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Linux 6.10 switched PageSlab() from using a page flag to a page type.
This is annoying since page types are all represented using macros
instead of enums, but there's a VMCOREINFO entry we can use. A better
solution is to change the page type definitions to enums in the kernel
source. We're looking into that upstream, but for now let's get the
tests passing on 6.10.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This one doesn't need any changes to the callback signature, just the
new interface. We also keep add_object_finder() for compatibility.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The ELF and libkdumpfile paths have duplicated logic for adding the
Linux kernel object finder and setting the default language to C. Factor
them out into a common helper.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Like for symbol finders, we want extra flexibility around configuring
type finders. The type finder callback signature also has a couple of
warts: it doesn't take the program since it was added before types
needed to be constructed from a program, and it is called separately for
each type kind since originally type lookups were for only one kind.
While we're adding a new interface, let's fix these warts: pass the
program and a set of type kinds. However, we need to keep the old
add_type_finder() interface for backwards compatibility.
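One plausible way to keep an old-style callback working behind the new
signature is a small adapter, sketched below in Python. This is only an
illustration of the idea, not necessarily how drgn implements it; the
function name and the use of plain strings for type kinds are hypothetical.

```python
from typing import Any, Callable, Iterable, Optional

# Hypothetical signatures for illustration:
#   old style: fn(kind, name, filename)        called once per type kind
#   new style: fn(prog, kinds, name, filename) called once with a set of kinds
OldFinder = Callable[[Any, str, Optional[str]], Optional[Any]]
NewFinder = Callable[[Any, Iterable[Any], str, Optional[str]], Optional[Any]]


def adapt_old_type_finder(old_fn: OldFinder) -> NewFinder:
    """Wrap an old-style finder so it can be called with the new signature."""

    def new_fn(
        prog: Any, kinds: Iterable[Any], name: str, filename: Optional[str]
    ) -> Optional[Any]:
        # The old callback does not take the program, so prog is dropped,
        # and it only understands one kind at a time, so try each in turn.
        for kind in kinds:
            result = old_fn(kind, name, filename)
            if result is not None:
                return result
        return None

    return new_fn
```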
Signed-off-by: Omar Sandoval <osandov@osandov.com>
C type finders are passed a bitset of type kinds, but Python type
finders currently get called for each type kind. An upcoming update to
the type finder interface will fix this, but we need a set of TypeKinds,
and we'd rather not construct a real Python set for it. Instead, add a
TypeKindSet bitset that satisfies the collections.abc.Set interface.
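To show what a bitset satisfying collections.abc.Set can look like, here is
a pure-Python sketch. It is not the actual TypeKindSet implementation; Kind
and KindSet are hypothetical stand-ins, and the enum members are just
examples.

```python
import enum
from collections.abc import Set
from typing import Iterable, Iterator


class Kind(enum.Enum):
    """Hypothetical stand-in for an enum of type kinds."""

    VOID = 1
    INT = 2
    STRUCT = 3
    POINTER = 4


class KindSet(Set):
    """Immutable set of Kind members stored as one integer bitmask."""

    __slots__ = ("_mask",)

    def __init__(self, kinds: Iterable[Kind] = ()) -> None:
        mask = 0
        for kind in kinds:
            mask |= 1 << kind.value
        self._mask = mask

    # collections.abc.Set only requires these three methods; comparisons,
    # &, |, -, ^ and isdisjoint() are provided as mixin methods.
    def __contains__(self, item: object) -> bool:
        return isinstance(item, Kind) and bool(self._mask & (1 << item.value))

    def __iter__(self) -> Iterator[Kind]:
        for kind in Kind:
            if self._mask & (1 << kind.value):
                yield kind

    def __len__(self) -> int:
        return bin(self._mask).count("1")

    def __repr__(self) -> str:
        return "KindSet({" + ", ".join(kind.name for kind in self) + "})"
```

Because the mixin methods build their results through the constructor,
operations like `a | b` return another KindSet rather than a regular set.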
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, the bare-bones add_symbol_finder() interface only allows
adding a symbol finder that is called before any existing finders. It'd
be useful to be able to specify the order that symbol finders should be
called in and to selectively enable and disable them. To do that, we
also need finders to have a name to identify them by. So, replace
add_symbol_finder() (which hasn't been in a release yet) with a set of
interfaces providing this flexibility: register_symbol_finder(),
set_enabled_symbol_finders(), registered_symbol_finders(), and
enabled_symbol_finders(). Also change the callback signature to take the
program.
In particular, this flexibility will be very useful for a plugin system:
pre-installed plugins can register symbol finders that the user can
choose to enable or disable.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This will be used to allow providing names for type, object, and symbol
finders and configuring which ones are called and in what order. We
might even want this for memory readers. I'm assuming there will only be
a handful of handlers on a given list, but the enabled handlers will be
called frequently, and there may be many lists. The implementation is
therefore optimized for fast iteration and small size, and we don't
cache anything that would speed up reconfiguring the list.
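The trade-off can be pictured with a toy Python model. This is a sketch
only: the class and method names are hypothetical and it is not the libdrgn
data structure. Registered handlers are looked up by name only when the
list is reconfigured, while the enabled handlers are kept as a flat list in
call order so that the hot path is a plain loop.

```python
from typing import Any, Callable, Dict, List, Sequence, Set


class HandlerList:
    """Toy model of a named, orderable handler list (hypothetical API)."""

    def __init__(self) -> None:
        self._registered: Dict[str, Callable[..., Any]] = {}
        # Enabled handlers in call order; iterated on every lookup.
        self._enabled_names: List[str] = []
        self._enabled_fns: List[Callable[..., Any]] = []

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Registering does not enable; set_enabled() decides what is called.
        if name in self._registered:
            raise ValueError(f"duplicate handler name: {name!r}")
        self._registered[name] = fn

    def registered(self) -> Set[str]:
        return set(self._registered)

    def enabled(self) -> List[str]:
        return list(self._enabled_names)

    def set_enabled(self, names: Sequence[str]) -> None:
        # Reconfiguring is the slow path: validate and rebuild from scratch.
        names = list(names)
        if len(set(names)) != len(names):
            raise ValueError("handler enabled more than once")
        fns = [self._registered[name] for name in names]  # KeyError if unknown
        self._enabled_names = names
        self._enabled_fns = fns

    def call(self, *args: Any) -> Any:
        # Fast path: try each enabled handler in order until one answers.
        for fn in self._enabled_fns:
            result = fn(*args)
            if result is not None:
                return result
        return None
```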
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This doesn't matter for anything currently in the documentation, but an
upcoming change uses collections.abc.Set, which got formatted
incorrectly as collections.Set.Set.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Just like in commit 4d970a98c1 ("libdrgn: python: set Python error
indicator if Program_hold_reserve() fails"), callers of
Program_hold_object() assume it sets the error indicator if it fails.
Fixes: a8d632b4c1 ("libdrgn/python: use F14 instead of PyDict for Program::objects")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Most core dumps contain some virtual address mappings: usually at a
minimum, the kernel's direct map is represented in ELF vmcores via a
segment. So normally, drgn can rely on the vmcore to read the virtual
address of swapper_pg_dir. However, some vmcores only contain physical
address information, so when drgn reads memory at swapper_pg_dir, it
needs to first translate that address, thus causing a recursive
translation error like below:
    >>> prog["slab_caches"]
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "/home/stepbren/repos/drgn/drgn/cli.py", line 141, in _displayhook
        text = value.format_(columns=shutil.get_terminal_size((0, 0)).columns)
    _drgn.FaultError: recursive address translation; page table may be missing from core dump: 0xffffffff9662aff8
Debuggers like crash, as well as libkdumpfile, contain fallback code
which can translate swapper_pg_dir in order to bootstrap this address
translation. In fact, the above error does not occur in drgn when using
libkdumpfile. So, let's add this fallback case to drgn as well. Other
architectures will need to have equivalent support added.
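For illustration, the bootstrap step can be sketched in Python as follows.
This assumes x86-64, where the kernel image is mapped at a fixed offset from
__START_KERNEL_map, and assumes that the VMCOREINFO provides
SYMBOL(swapper_pg_dir) in hexadecimal and NUMBER(phys_base) in decimal. The
function names are hypothetical and this is not the libdrgn code.

```python
from typing import Dict

# Assumption: x86-64 base of the kernel text mapping (__START_KERNEL_map).
START_KERNEL_MAP = 0xFFFFFFFF80000000


def parse_vmcoreinfo(text: str) -> Dict[str, str]:
    """Parse VMCOREINFO "KEY=VALUE" lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key] = value
    return info


def swapper_pg_dir_phys(vmcoreinfo_text: str) -> int:
    """Compute the physical address of swapper_pg_dir without page tables."""
    info = parse_vmcoreinfo(vmcoreinfo_text)
    virt = int(info["SYMBOL(swapper_pg_dir)"], 16)
    phys_base = int(info["NUMBER(phys_base)"])
    # Addresses in the kernel image mapping translate linearly, so no page
    # table walk (and therefore no recursion) is needed for this one symbol.
    return virt - START_KERNEL_MAP + phys_base
```

Once the physical address of the top-level page table is known, ordinary
page table walks can proceed by reading physical memory from the vmcore.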
Co-authored-by: Illia Ostapyshyn <ostapyshyn@sra.uni-hannover.de>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
It's not immediately obvious from the API, but libkdumpfile allows
setting the vmcoreinfo attribute. However, setting the vmcoreinfo is not
enough: we must also set the platform information given by the user.
Further, we need to specify these elements in the correct order with
respect to the file descriptor.
If done correctly, then libkdumpfile can successfully handle a core
whose vmcoreinfo is not present in the diskdump or ELF metadata. Of
course, the user must find the vmcoreinfo note and manually give this to
Drgn, along with the platform architecture.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Ideally these could be put under some sort of "advanced" settings so
they don't confuse new users.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Currently set_core_dump() expects to be initializing the vmcoreinfo
itself. But it could be beneficial to let callers set the vmcoreinfo
with something else, e.g. if the vmcoreinfo can't be found in the ELF
notes or kdump metadata, but has been extracted via other means. So
update these initialization steps to only set up vmcoreinfo information
if it's not already present.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
This is essentially the same fix as commit 3f3a957562 ("libdrgn:
linux_kernel: Fix module detection on kernel v6.4").
Closes #373.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
When running `list_bpf_progs()` in the drgn shell, it's meaningless to
pass `args` to `list_bpf_progs()`. Therefore, hide `args` from the drgn
shell.
Also, select the appropriate global variables and functions to expose
when running the drgn shell, so that it's ready for more bpf functions
that inspect bpf progs, bpf maps, bpf links, and so on.
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Our setup.py imports setuptools, which is no longer provided by Python
3.12. It must be installed manually via pip. We did not notice this
because the "nodeenv" package listed setuptools in its install_requires,
thus providing the dependency for us.
Four hours ago nodeenv 1.9.0 was released, and it happened to drop the
setuptools dependency. This broke CI. Add an explicit setuptools
requirement in our pip command to fix it.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
`drgn` no longer builds in EPEL 9 because `libkdumpfile-devel` is in
RHEL 9 but not pushed out to CRB: https://issues.redhat.com/browse/RHEL-38224
Instead, let's make sure it still builds on CentOS Stream 9 + EPEL Next
9, with the `c9s-build` repo enabled, since that repo is accessible from
COPR.
Fixes: #395
Signed-off-by: Michel Lind <salimma@fedoraproject.org>
drgn is now in RHEL 9.4, so it was retired from EPEL 9. The EPEL 9
Packit build is failing with
No matching package to install: 'libkdumpfile-devel'
It might still be useful to build for RHEL 9, but Davide Cavalca is
looking into what the best thing to do here is. For now, disable the
EPEL 9 builds to get the CI green.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add support for parsing and printing most of the Btrfs item types and
clean up/add some tree search and extent buffer helpers. I used this to
debug a Btrfs crash in production. It's almost ready to be turned into
proper helpers, but it needs lots of test cases and a bit more cleanup.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
When looking up a local variable, we pass the function scope DIE to the
DWARF expression evaluator, which uses it to look up DW_AT_frame_base
for DW_OP_fbreg. However, for inline frames, the function scope DIE is
the DW_TAG_inlined_subroutine DIE, which doesn't have a
DW_AT_frame_base; we're supposed to get it from the containing
DW_TAG_subprogram DIE. Fix drgn_stack_frame_find_object() to always pass
the containing DW_TAG_subprogram DIE. This fixes some cases where local
variables are reported as absent even though they are available.
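The lookup rule can be sketched in Python with a toy DIE tree. The Die
class and function names below are hypothetical stand-ins for illustration;
they are not libdrgn or libdw types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    """Toy stand-in for a DWARF DIE with a link to its parent."""

    tag: str
    parent: Optional["Die"] = None
    frame_base: Optional[str] = None  # stand-in for DW_AT_frame_base


def containing_subprogram(die: Optional[Die]) -> Optional[Die]:
    # Walk up through lexical blocks and inlined subroutines until the
    # enclosing concrete function is reached.
    while die is not None and die.tag != "DW_TAG_subprogram":
        die = die.parent
    return die


def frame_base_for_scope(scope: Die) -> Optional[str]:
    # DW_OP_fbreg is relative to the frame base of the containing
    # DW_TAG_subprogram, even when the scope is an inlined subroutine.
    subprogram = containing_subprogram(scope)
    return subprogram.frame_base if subprogram is not None else None
```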
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the host is running util-linux >= v2.40 (specifically, with commit
c14bee4d44ac ("libmount: reduce utab.lock permissions")), then the
unprivileged QEMU process can't open /run/mount/utab.lock, which causes
umount in the VM to fail. Use umount -n to skip it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>