1987 Commits

Author SHA1 Message Date
Peter Collingbourne
e99921d77b libdrgn: aarch64: Rework page table walker to only read one PTE per level
The current page table walker will on average read around half of the
entire page table for each level. This is inefficient, especially when
debugging a remote target which may have a low bandwidth connection to
the debugger. Address this by only reading one PTE per level.

I've only done the aarch64 page table walker because that's all that I
needed, but in principle the other page table walkers could work in a
similar way.

Signed-off-by: Peter Collingbourne <pcc@google.com>
2023-11-06 13:09:29 -08:00
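
As a rough illustration of the "one PTE per level" idea, here is a minimal Python sketch of a four-level walk that issues exactly one 8-byte read per level. It is not the libdrgn code: `FakeMemory`, `translate()`, and the table layout (4 KiB granule, 48-bit virtual addresses, 9 index bits per level) are assumptions made for the example.

```python
PAGE_SHIFT = 12       # assumed 4 KiB granule
BITS_PER_LEVEL = 9    # 512 entries per table
LEVELS = 4            # assumed 48-bit virtual addresses
ADDR_MASK = ((1 << 48) - 1) & ~((1 << PAGE_SHIFT) - 1)


class FakeMemory:
    """Hypothetical stand-in for a slow remote target; counts reads."""

    def __init__(self, words):
        self.words = words  # physical address -> 64-bit value
        self.reads = 0

    def read_u64(self, address):
        self.reads += 1
        return self.words.get(address, 0)


def translate(mem, table, va):
    """Translate va, reading a single descriptor at each level."""
    for level in range(LEVELS):
        shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level)
        index = (va >> shift) & ((1 << BITS_PER_LEVEL) - 1)
        entry = mem.read_u64(table + 8 * index)
        if not entry & 1:  # bit 0 clear: invalid descriptor
            return None
        offset_mask = (1 << shift) - 1
        last_level = level == LEVELS - 1
        if last_level and not entry & 2:
            return None  # bit 1 must be set for a page descriptor
        if last_level or not entry & 2:
            # Page (last level) or block (earlier level) mapping.
            return (entry & ADDR_MASK & ~offset_mask) | (va & offset_mask)
        table = entry & ADDR_MASK  # table descriptor: descend
    return None


# Build a tiny four-level hierarchy that maps one page.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
mem = FakeMemory({
    0x1000 + 8 * 1: 0x2000 | 3,
    0x2000 + 8 * 2: 0x3000 | 3,
    0x3000 + 8 * 3: 0x4000 | 3,
    0x4000 + 8 * 4: 0x5000 | 3,
})
pa = translate(mem, 0x1000, va)
print(hex(pa), mem.reads)
```

The read counter makes the cost visible: the walk touches four descriptors in total instead of a large slice of each table.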
Imran Khan
79a1ea2a33 tests: add tests for waitqueue helpers.
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
2023-11-06 11:26:44 -08:00
Imran Khan
fb87916331 drgn.helpers.linux: Add helpers to work with wait-queue(s).
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
2023-11-06 11:26:44 -08:00
Omar Sandoval
c18a1a973e Revert "vmtest.config: bump kernel.org crosstool compiler version"
This reverts commit 3fc72a92ea. GCC 13
seems to be generating DW_TAG_unspecified_type DWARF entries that older
versions of pahole don't support.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-03 15:48:14 -07:00
Omar Sandoval
3fc72a92ea vmtest.config: bump kernel.org crosstool compiler version
While we're doing a mass rebuild for only a handful of kernels, we might
as well rebuild with the latest compiler.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-03 15:25:10 -07:00
Omar Sandoval
2b9877a4fa vmtest.kbuild: add workaround for mystery QEMU hang
Kernel versions 5.11-5.13 are hanging in the CI. It feels like a QEMU
bug, but there's a workaround we can apply to the kernel. The details
are in the new patch.

See #365.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-03 15:25:10 -07:00
Omar Sandoval
8a6b8415af libdrgn: bitops: add ilog2()
We can split it out from fls().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-02 16:26:19 -07:00
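
For readers unfamiliar with the two operations, the following Python sketch shows how they relate. The commit message does not show the C implementation, so this is only an assumption-level illustration: `fls()` here is the 1-based position of the most significant set bit (0 for an input of 0), and `ilog2()` is the floor of the base-2 logarithm.

```python
def fls(x: int) -> int:
    """Find last set: 1-based index of the highest set bit, 0 if x == 0."""
    return x.bit_length()


def ilog2(x: int) -> int:
    """Floor of log2(x); only defined for positive x."""
    if x <= 0:
        raise ValueError("ilog2() is undefined for non-positive values")
    return fls(x) - 1


print(fls(8), ilog2(8))
```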
Omar Sandoval
c635adb7e6 Revert "libdrgn: add last virtual address hint to page table iterator"
This reverts commit 747e02857d (except for
the test improvements). Peter Collingbourne noticed that the change I
used to test the performance of reading a single PTE at a time [1]
didn't cache higher level entries. Keeping that caching makes the
regression I was worried about negligible. So, there's no reason to add
the extra complexity of the hint.

1: https://github.com/osandov/drgn/pull/312#issuecomment-1754082129

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-01 16:31:39 -07:00
Omar Sandoval
966ab4bdef libdrgn: python: clean up add_type_aliases()
- Fix indentation that used seven spaces instead of a tab.
- Use //-style comments.
- Put "imports" first.
- Call after setting up all other types so that future changes can set
  up aliases referring to those types.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-01 14:02:18 -07:00
Omar Sandoval
1636bbd478 drgndoc: work around Sphinx not linking to type aliases in annotations
Sphinx normally makes type names in annotations links to the
documentation for that type, but this doesn't work for type aliases
(like drgn.Path). See sphinx-doc/sphinx#10785. Add a workaround inspired
by adafruit/circuitpython#8236.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-01 09:19:19 -07:00
Omar Sandoval
a2bd731d98 drgndoc: fix mypy errors
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-31 23:41:41 -07:00
Omar Sandoval
45ca8c5e32 libdrgn: python: reduce exception type boilerplate
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-31 14:19:06 -07:00
Omar Sandoval
1487db5389 libdrgn: stack_trace: get rid of PRSTATUS PID match heuristic
Commit 1b47b866b4 ("libdrgn: go back to trusting PRSTATUS PID")
introduced a check so that we only use the PRSTATUS note for stack
unwinding if its PID field matches the PID of the task we're trying to
unwind. This was intended to detect when a CPU was in the middle of a
context switch during a crash. However, it has caused more trouble than
it's worth. For example:

1. It's broken on s390x, which puts the CPU number + 1 in the PID field.
   See also commit 7cb3e99b23 ("libdrgn: program: find crashed task
   with cpu_curr() instead of find_task()").
2. It's broken on dumps from QEMU's dump-guest-memory command, which
   also puts the CPU number + 1 in the PID field. See issue #356.
3. It breaks the reasonable assumption that
   prog.stack_trace(cpu_curr(prog, cpu)) always gets the stack trace for
   a given CPU. This slowed down an internal scheduler investigation.

It doesn't even really accomplish what it's trying to do, since the
check itself is racy. Let's remove it (which is not as simple as it
sounds, since we were also using it in lieu of a proper on_cpu fallback
for !SMP kernels).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 23:22:17 -07:00
Omar Sandoval
9a45af4ec7 libdrgn: stack_trace: check for PRSTATUS note earlier
There's no reason to go through the trouble of checking the task_struct
if we were given a PRSTATUS note; it must be a thread that was running
at the time of the core dump. Refactor drgn_get_initial_registers() so
that we can use PRSTATUS earlier.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 23:22:17 -07:00
Omar Sandoval
bc6c1ab151 libdrgn: stack_trace: return NOT_IMPLEMENTED instead of INVALID_ARGUMENT where appropriate
In places where we're missing support for an architecture or live
processes, return a NOT_IMPLEMENTED error instead of an INVALID_ARGUMENT
error.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 23:22:17 -07:00
Omar Sandoval
3e7244d25f libdrgn: ppc64: update special case for vmcore PRSTATUS r1
The weird mismatch of r1 and the program counter was fixed in Linux v6.5
and backported to several stable branches. Unfortunately, we have to
break the kernel version checking ban to handle that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 23:22:17 -07:00
Omar Sandoval
cf0b75420c contrib/vmmap.py: use new for_each_vma() helper
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 13:09:21 -07:00
Omar Sandoval
eab64bdbc4 drgn.helpers.linux.mm: add for_each_vma() helper
This takes care of the transition from the VMA linked list to maple
trees for users.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 13:09:21 -07:00
Omar Sandoval
e4100db4d5 drgn.helpers.linux: add maple tree helpers
Maple trees have been around and used for VMAs for almost a year now
(since Linux 6.1). Finally add helpers and tests for them.

Closes #261.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 13:09:19 -07:00
Omar Sandoval
31fd062f25 pre-commit: update Black, flake8, and pre-commit-hooks
No changes required.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-30 12:41:44 -07:00
Omar Sandoval
c63df3827a docs: document that drgn.helpers.linux.mm also supports ppc64 now
Fixes: 191569bf31 ("ppc64: add virtual address translation support")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-25 16:48:51 -07:00
Omar Sandoval
59702918c9 packit: reenable non-s390x ELN builds
This partially reverts commit 002b63b437 ("packit: disable ELN builds
until fedora-eln/eln#165 is fixed"). That ELN issue was fixed, but it's
still broken on s390x because of fedora-eln/eln#170.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-25 15:22:53 -07:00
Omar Sandoval
219a819659 vmtest.vm: quote command arguments from CLI
This fixes commands like `python3 -m vmtest.vm python3 -c 'foo; bar'`.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-25 10:09:59 -07:00
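
The commit message does not say how vmtest.vm does the quoting. As a sketch of the general problem, the Python standard library's `shlex.quote()` can rebuild a shell command line from an argument list without losing argument boundaries; `join_command()` is a hypothetical helper name.

```python
import shlex


def join_command(argv):
    """Quote each argument so a shell splits the string back into argv."""
    return " ".join(shlex.quote(arg) for arg in argv)


argv = ["python3", "-c", "foo; bar"]
command = join_command(argv)
print(command)
```

Joining the arguments with plain spaces instead would make the shell treat `bar` as a separate command.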
Omar Sandoval
5a8bf55652 tests: offline CPUs for cpumask tests
This makes the cpumask tests a little more thorough, as now the online
mask will be different from the possible and present masks. It also
makes the cpulist discontiguous in most cases (since you usually can't
offline CPU 0).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-25 10:02:15 -07:00
Omar Sandoval
e3e3295a57 tests: add test for cpumask_to_cpulist()
And refactor the for_each_cpu() tests.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-25 09:41:15 -07:00
Omar Sandoval
bf1d8440a4 drgn.helpers.linux.cpumask: add cpu_{online,possible,present}_mask()
Factor these out of the for_each_{online,possible,present}_cpu()
helpers. These are mainly so that we can test cpumask_to_cpulist(), but
they're also useful in their own right.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-25 09:39:09 -07:00
Imran Khan
754b7be807 contrib: add script to dump irq stats.
Sample output:

python3 -m drgn -s vmlinux -c vmcore contrib/irq.py

List of IRQs
irq: 0 name: timer (struct irq_desc *)0xffff9b42fd00d400  (struct irqaction *)0xffffffff88425240
irq: 1 name: i8042 (struct irq_desc *)0xffff9b42fd00d600  (struct irqaction *)0xffff9b42fcb13c00
irq: 4 name: ttyS0 (struct irq_desc *)0xffff9b42fd00dc00  (struct irqaction *)0xffff9b42fcb2bc00
irq: 8 name: rtc0 (struct irq_desc *)0xffff9b42fd00e400  (struct irqaction *)0xffff9b42fcb59380
irq: 9 name: acpi (struct irq_desc *)0xffff9b42fd00e600  (struct irqaction *)0xffff9b42fcafca80
irq: 11 name: eth0 (struct irq_desc *)0xffff9b42fd00ea00  (struct irqaction *)0xffff9b42fc878d80
irq: 12 name: i8042 (struct irq_desc *)0xffff9b42fd00ec00  (struct irqaction *)0xffff9b42fcb13900
irq: 14 name: ata_piix (struct irq_desc *)0xffff9b42fd00f000  (struct irqaction *)0xffff9b42fc690800
irq: 15 name: ata_piix (struct irq_desc *)0xffff9b42fd00f200  (struct irqaction *)0xffff9b42fc690780

IRQ affinities
irq: 0 name: timer affinity: 0-7
irq: 1 name: i8042 affinity: 0-7
irq: 4 name: ttyS0 affinity: 0-7
irq: 8 name: rtc0 affinity: 0-7
irq: 9 name: acpi affinity: 0-7
irq: 11 name: eth0 affinity: 0-7
irq: 12 name: i8042 affinity: 0-7
irq: 14 name: ata_piix affinity: 0-7
irq: 15 name: ata_piix affinity: 0-7

IRQ stats
irq: 0 name: timer (struct irq_desc *)0xffff9b42fd00d400
    CPU: 0  	 count: 162
    Total: 162
irq: 1 name: i8042 (struct irq_desc *)0xffff9b42fd00d600
    CPU: 0  	 count: 10
    Total: 10
irq: 4 name: ttyS0 (struct irq_desc *)0xffff9b42fd00dc00
    CPU: 0  	 count: 220
    Total: 220
irq: 8 name: rtc0 (struct irq_desc *)0xffff9b42fd00e400
    CPU: 0  	 count: 1
    Total: 1
    Total: 0
irq: 11 name: eth0 (struct irq_desc *)0xffff9b42fd00ea00
    CPU: 0  	 count: 85
    Total: 85
irq: 12 name: i8042 (struct irq_desc *)0xffff9b42fd00ec00
    CPU: 0  	 count: 125
    Total: 125
irq: 14 name: ata_piix (struct irq_desc *)0xffff9b42fd00f000
    CPU: 0  	 count: 496
    Total: 496
irq: 15 name: ata_piix (struct irq_desc *)0xffff9b42fd00f200
    CPU: 0  	 count: 11
    Total: 11

cpuwise IRQ stats
IRQ stats for cpu: 0
    irq: 0 name: timer (struct irq_desc *)0xffff9b42fd00d400 count: 162
    irq: 1 name: i8042 (struct irq_desc *)0xffff9b42fd00d600 count: 10
    irq: 4 name: ttyS0 (struct irq_desc *)0xffff9b42fd00dc00 count: 220
    irq: 8 name: rtc0 (struct irq_desc *)0xffff9b42fd00e400 count: 1
    irq: 11 name: eth0 (struct irq_desc *)0xffff9b42fd00ea00 count: 85
    irq: 12 name: i8042 (struct irq_desc *)0xffff9b42fd00ec00 count: 125
    irq: 14 name: ata_piix (struct irq_desc *)0xffff9b42fd00f000 count: 496
    irq: 15 name: ata_piix (struct irq_desc *)0xffff9b42fd00f200 count: 11
Total: 1110

IRQ stats for cpu: 1
Total: 0

IRQ stats for cpu: 2
Total: 0

IRQ stats for cpu: 3
Total: 0

IRQ stats for cpu: 4
Total: 0

IRQ stats for cpu: 5
Total: 0

IRQ stats for cpu: 6
Total: 0

IRQ stats for cpu: 7
Total: 0

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
2023-10-25 09:28:29 -07:00
Imran Khan
7013c31e3e helpers: cpumask: add helper to convert a cpumask to string of cpu ranges.
In some cases (for example, getting the affinity of an IRQ), it is better
to have an easily understandable list of CPUs corresponding to a given
cpumask.
This helper converts a given cpumask to a string that represents the
ranges of CPUs present in the mask.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
2023-10-25 09:28:29 -07:00
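
The conversion described above can be sketched in plain Python. The real helper operates on a kernel cpumask object; this hypothetical `cpus_to_cpulist()` takes an iterable of CPU numbers instead, so it only illustrates the range-collapsing step.

```python
def cpus_to_cpulist(cpus):
    """Collapse CPU numbers into a string of comma-separated ranges."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if prev is not None and cpu == prev + 1:
            prev = cpu  # extend the current run
            continue
        if start is not None:
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu  # begin a new run
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(ranges)


print(cpus_to_cpulist(range(8)))
print(cpus_to_cpulist([0, 2, 3, 5]))
```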
Omar Sandoval
7cae874b71 docs: expand documentation for operator substitutions
The replacements of * with [0], -> with ., and & with address_of_() are
documented in the Object class docstring, but they're important enough
that we should mention them in the user guide. Also expand the
documentation of __getitem__ and __getattribute__ to mention this.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-23 11:22:28 -07:00
Omar Sandoval
a34be20882 libdrgn: fix getting debug info for live processes
We're determining which callbacks to use for the Dwfl handle too early,
before the program flags are set. Instead of creating it later, shift
the flag checks to the callback itself.

Fixes: c85dd74f3e ("libdrgn: embed drgn_debug_info in drgn_program")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-23 10:51:23 -07:00
Alex Gartrell
ff2a39586b Improve debuginfo warning
Rather than the indirect link between verbiage in the docs and getting
dependencies, just include the URL. Indent the related modules for
clarity.

Signed-off-by: Alex Gartrell <alexgartrell@gmail.com>
2023-10-18 13:54:31 -07:00
Stephen Brennan
49971c9a28 Correct types/docs for object+type finder API
The callables for object and type finders may return None -- and they
must do this in the case of a lookup failure. Update the type
annotations and docstrings to reflect this.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
2023-10-16 16:57:43 -07:00
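
The convention can be shown with a toy lookup chain in plain Python. Nothing here is the drgn API: `lookup()`, the finder functions, and the string results are all hypothetical. The point is only that a finder returns None on a lookup failure so the next finder gets a chance.

```python
from typing import Callable, List, Optional

Finder = Callable[[str], Optional[str]]


def builtin_finder(name: str) -> Optional[str]:
    # Return None (rather than raising) when the name is unknown.
    return {"int": "builtin int"}.get(name)


def plugin_finder(name: str) -> Optional[str]:
    return {"struct foo": "plugin struct foo"}.get(name)


def lookup(finders: List[Finder], name: str) -> str:
    """Ask each finder in turn; the first non-None result wins."""
    for finder in finders:
        result = finder(name)
        if result is not None:
            return result
    raise LookupError(f"could not find {name!r}")


finders = [builtin_finder, plugin_finder]
print(lookup(finders, "struct foo"))
```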
Omar Sandoval
54f5fa044f contrib/find_struct_file.py: also look in binfmt_misc
This is a surprising place where file references can be hiding that I've
run into before. There are some beginnings of an lsof-like script here.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-16 10:45:45 -07:00
Omar Sandoval
185a5bf66d contrib: add stack_trace_call_fault.py
It's difficult to automatically detect calling an invalid, non-NULL
pointer when getting a stack trace. This manually recreates what we do
for calls to NULL since commit 412ce956b0 ("libdrgn: x86_64: unwind
call when pc is 0"). This was used to debug the issue fixed by "net:
tcp: fix crashes trying to free half-baked MTU probes" [1].

1: https://lore.kernel.org/all/20231010173651.3990234-1-kuba@kernel.org/T/

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-13 16:48:21 -07:00
Omar Sandoval
bbaf608af0 contrib: add find_struct_file.py
I used this to find some missing sockets that weren't showing up in lsof
or ss.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-13 16:39:17 -07:00
Omar Sandoval
06fd63b585 libdrgn: use _cleanup_free_ in relocate_elf_file()
I missed this easy conversion back in commit ee51244dc1 ("libdrgn: add
_cleanup_free_ scope guard, no_cleanup_ptr(), and return_ptr()").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-12 15:57:40 -07:00
Omar Sandoval
52092bed09 libdrgn: actually use our implementation of SHT_REL relocations
Commit b16dad8a36 ("libdrgn: support SHT_REL relocations") added our
own implementation of SHT_REL relocations, which are used by Arm and
i386. However, it failed to remove the check that skips over all
non-SHT_RELA sections, so we've been falling back to the (slow) libdwfl
implementation this whole time.

Fixes: b16dad8a36 ("libdrgn: support SHT_REL relocations")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-12 15:47:24 -07:00
Omar Sandoval
6668618b2d libdrgn: elf_file: drop .debug_line and .debug_line_str
These aren't needed since commit c4a122ead6 ("libdrgn: dwarf_info:
scalably index all DIEs per name").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 15:07:28 -07:00
Omar Sandoval
68fc40fe85 tests: replace setenv() with more general modifyenv()
No functional change for existing tests, but it will be used for future
tests.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 14:45:06 -07:00
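
The commit message does not give the signature of modifyenv(), so the following is only a guess at what a "more general" environment helper might look like: a context manager that sets or unsets several variables at once and restores the previous state on exit. The dictionary-based interface, with None meaning "unset", is an assumption.

```python
import contextlib
import os
from typing import Dict, Iterator, Optional


def _apply(name: str, value: Optional[str]) -> None:
    """Set the variable, or remove it when value is None."""
    if value is None:
        os.environ.pop(name, None)
    else:
        os.environ[name] = value


@contextlib.contextmanager
def modifyenv(changes: Dict[str, Optional[str]]) -> Iterator[None]:
    """Temporarily apply changes to os.environ (hypothetical interface)."""
    saved = {name: os.environ.get(name) for name in changes}
    try:
        for name, value in changes.items():
            _apply(name, value)
        yield
    finally:
        # Restore every touched variable, including ones that were unset.
        for name, value in saved.items():
            _apply(name, value)


with modifyenv({"EXAMPLE_SET": "1", "EXAMPLE_UNSET": None}):
    print(os.environ.get("EXAMPLE_SET"), os.environ.get("EXAMPLE_UNSET"))
```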
Omar Sandoval
882d73a523 tests: make tests.elf.SHF an IntFlag
Upcoming tests will need to combine flags.

Fixes: 104a14781d ("tests: test compressed debug sections")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 14:36:03 -07:00
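
To illustrate why combining flags calls for IntFlag, here is a small sketch using the standard ELF section flag values. The member names are assumptions and may not match tests.elf.SHF exactly.

```python
import enum


class SHF(enum.IntFlag):
    # Standard ELF section header flag values.
    WRITE = 0x1
    ALLOC = 0x2
    EXECINSTR = 0x4


# OR-ing IntFlag members yields another SHF value, not a plain int.
flags = SHF.ALLOC | SHF.EXECINSTR
print(repr(flags), int(flags))
```

With a plain IntEnum, the same expression would degrade to an ordinary integer and lose membership tests such as `SHF.ALLOC in flags`.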
Omar Sandoval
64de9ef829 libdrgn: log: fix typos in drgn_error_log_{critical,error}()
The log levels are DRGN_LOG_CRITICAL and DRGN_LOG_ERROR, not
DRGN_LOG_CRIT and DRGN_LOG_ERR. These macros aren't being used anywhere
yet, so it wasn't caught before.

Fixes: c1a2792e6a ("libdrgn: add simple logging framework")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 14:35:10 -07:00
Omar Sandoval
002b63b437 packit: disable ELN builds until fedora-eln/eln#165 is fixed
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 14:06:54 -07:00
Omar Sandoval
0b99eab65f packit: remove commented out 32-bit ARM builds
It's been commented out since commit 287fc845dd ("packit: disable
32-bit ARM builds"), and Fedora no longer supports 32-bit ARM [1].

1: https://fedoraproject.org/wiki/Changes/RetireARMv7

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 13:54:47 -07:00
Omar Sandoval
747e02857d libdrgn: add last virtual address hint to page table iterator
Peter Collingbourne reported that the over-reading we do in the AArch64
page table iterator uses too much bandwidth for remote targets. His
original proposal in #312 was to change the page table iterator to only
read one entry per level. However, this would regress large reads that
do end up using the additional entries (in particular when the target is
/proc/kcore, which has a high latency per read but also high enough
bandwidth that the over-read is essentially free).

We can get the best of both worlds by informing the page table iterator
how much we expect to need (at the cost of some additional complexity in
this admittedly already pretty complex code). Requiring an accurate end
would limit the flexibility of the page table iterator and be more
error-prone, so let's make it a non-binding hint.

Add the hint and use it in the x86-64 page table iterator to only read
as many entries as necessary. Also extend the test case for large page
table reads to test this better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 11:29:35 -07:00
Omar Sandoval
b7af85df52 libdrgn: x86_64: update code style
arch_x86_64.c is often used as a reference when implementing support
for other architectures, so make sure it uses our latest best practices.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-11 10:57:23 -07:00
Omar Sandoval
1009271721 libdrgn: mark log format as non-NULL and disable -Wformat-zero-length
In my branch for the module API (#332), I want to log an error without
any additional context. Passing an empty format string causes a
"zero-length gnu_printf format string" warning from GCC, and passing
NULL crashes in vsnprintf().

Empty format strings are totally valid, but NULL clearly isn't, so
annotate the format parameter as non-NULL and disable
-Wformat-zero-length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-09 14:01:23 -07:00
Omar Sandoval
aefe704b03 libdrgn: string_builder: add STRING_BUILDER scope guard
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-07 22:30:38 -07:00
Peter Collingbourne
cf649b741a Add DRGN_PROGRAM_IS_LOCAL flag
Currently, when drgn is used to debug a running program, we assume it to
be running on the local machine. However, with remote debugging, this will
no longer be the case. To accommodate remote debugging, introduce a flag
DRGN_PROGRAM_IS_LOCAL, and use it to decide whether to use /sys/module.

Signed-off-by: Peter Collingbourne <pcc@google.com>
2023-10-06 16:09:58 -07:00
Omar Sandoval
31bb9c0521 libdrgn: stack_trace: don't bother masking address before reading memory
drgn_program_read_memory() already takes care of masking the address.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-06 15:43:18 -07:00
Omar Sandoval
e91fcbe22a libdrgn: promise that drgn_{program,object}_read_c_string() out parameter is not modified on error
And take advantage of that to use _cleanup_free_.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-03 13:45:02 -07:00