1646 Commits

Author SHA1 Message Date
Omar Sandoval
3ce37c8002 libdrgn: python: fix creating compound value with 32-bit float member on big-endian
This is similar to commit 155ec92ef2 ("libdrgn: fix reading 32-bit
float object values on big-endian").

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-02 10:39:34 -07:00
Omar Sandoval
0bc79c877a libdrgn: fix stray bits when reading bytes of bit field
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-01 16:31:17 -07:00
Oleksandr Natalenko
54b1630370 helpers: support dmesg with RHEL 7
The RHEL 7 kernel still uses `struct log *` for a structured kernel log
instead of `struct printk_log *`, so let's try to support it.

Tested on:

* `3.10.0-229`
* `3.10.0-1160.80.1`
* `3.10.0-1160.83.1`

This should have no impact on existing supported cases.

Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
2023-07-25 08:52:13 -07:00
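
The fallback described in commit 54b1630370 can be sketched roughly as below. This is only an illustration, not the helper's actual code: `KNOWN_TYPES`, `lookup_type()` and `find_log_record_type()` are hypothetical names, and the stub assumes that a failed type lookup raises `LookupError`.

```python
# Hypothetical stand-in for the debug info of a RHEL 7 kernel, which only
# knows the older record type.
KNOWN_TYPES = {"struct log": "<type: struct log>"}


def lookup_type(name):
    """Return the named type, or raise LookupError (assumed behaviour)."""
    try:
        return KNOWN_TYPES[name]
    except KeyError:
        raise LookupError(f"could not find '{name}'") from None


def find_log_record_type():
    """Prefer struct printk_log, then fall back to struct log."""
    for name in ("struct printk_log", "struct log"):
        try:
            return name, lookup_type(name)
        except LookupError:
            continue
    raise LookupError("no structured kernel log record type found")
```

Trying the newer name first keeps the existing supported cases on their original path, which matches the commit's claim that they are not affected.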
Omar Sandoval
470b2e02de drgn.helpers.linux.net: add skb_shinfo()
Closes #335.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-20 10:23:14 -07:00
Omar Sandoval
d572ecbe4f drgn.helpers.linux.net: add netdev_priv()
Closes #334.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 23:15:08 -07:00
Omar Sandoval
50258ad0b4 pre-commit: update Black to 23.7.0
This added one minor style fix.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 14:51:36 -07:00
Omar Sandoval
f27485670a CONTRIBUTING: add Linux kernel helper guidelines
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 14:48:22 -07:00
Omar Sandoval
9022a01667 docs: update required Sphinx version to 5.3.0
This isn't the latest version, but it's the version that I use locally
on Fedora 38.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 13:50:09 -07:00
Omar Sandoval
ece86bd260 docs: document supported architectures and kernel versions
Closes #287, closes #288.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 13:45:28 -07:00
Omar Sandoval
55a3ebca6c libdrgn: dwarf_info: support DWO split DWARF
We've addressed all of the smaller differences with GNU Debug Fission
and split DWARF 5, so now all that remains is the DWARF index.

The general approach is: in drgn_dwarf_index_read_cus(), for each CU,
ask libdw for the "sub-DIE". For skeleton CUs, this is the split CU DIE
from the .dwo file. From that Dwarf_Die, we can get the Dwarf_CU and
then the Dwarf handle. Then, we wrap that in a struct drgn_elf_file
(cached in a hash table in the struct drgn_module), which the DWARF
index can work with from there.

Additionally, a couple of places (.debug_addr parsing and stack trace
local variable lookup) need to be updated to use the correct
drgn_elf_file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
c7f1d0d40c libdrgn: dwarf_info: read CU DIE with libdw in DWARF index
Split DWARF is challenging for the DWARF index for a couple of reasons:

1. We need libdw to look up the split files.
2. The file name table comes from the skeleton file, but everything else
   relevant to the index comes from the split file.

(1) requires the index to use libdw to get the CU DIE. Unfortunately,
due to the overhead of libdw, this makes the indexing step 5-10% slower.
On the plus side, getting the CU DIE upfront simplifies quite a bit: we
can read the file name table, compilation directory, and str_offsets
base before indexing, which makes supporting (2) possible.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
fc1ee46941 libdrgn: dwarf_info: parse units with dwarf_next_unit() in DWARF index
In the next change, we'll need more information about the unit, and
there's no benefit to doing it ourselves anymore.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
645950134b libdrgn: dwarf_info: move file name table parsing code
No changes, this just moves the code now so that later changes are more
obvious.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
0e6a0a5f94 libdrgn: dwarf_info: get rid of struct drgn_dwarf_index_pending_cu
Instead, reuse struct drgn_dwarf_index_cu for the pending CUs. This is
mainly so that we can save more information in the pending CU in a later
change. It also lets us merge our per-thread pending CU arrays with
memcpy() instead of element-by-element, but I didn't measure a
performance difference one way or the other.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
05c3b244bf libdrgn: dwarf_info: handle GNU Debug Fission location lists
GNU Debug Fission's location lists are a hybrid of the DWARF 5 and
non-split DWARF 4 versions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
9c307a4df4 libdrgn: dwarf_info: handle split DWARF .debug_addr
There are a couple of differences with non-split DWARF 5:

- DW_AT_addr_base/DW_AT_GNU_addr_base is in the skeleton DIE, so we need
  to use dwarf_attr_integrate().
- GNU Debug Fission for DWARF 4 doesn't have headers in .debug_addr.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 09:59:31 -07:00
Omar Sandoval
a78d30e13e libdrgn: dwarf_info: handle split DWARF in dwarf_module_find_dwarf_scopes()
dwarf_module_find_dwarf_scopes() and drgn_dwarf_die_iterator_next() just
need to go from skeleton units to split units. We need to use
dwarf_cu_info(), which was added in 0.171, which incidentally was when
elfutils gained split DWARF support anyways.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 09:59:31 -07:00
Omar Sandoval
4fa1dfc063 libdrgn: dwarf_info: handle missing DW_AT_loclists_base
It seems like GCC omits this for split units when using DWARF 5,
intending it to mean the first entry in .debug_loclists.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 09:59:31 -07:00
Omar Sandoval
28b3e016f9 libdrgn: dwarf_info: handle missing DW_AT_str_offsets_base
GNU Debug Fission doesn't have DW_AT_str_offsets_base but does have
.debug_str_offsets. GCC doesn't emit DW_AT_str_offsets_base for DWARF 5
split DWARF. In both cases, the default is the first entry in
.debug_str_offsets.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 09:59:31 -07:00
Omar Sandoval
c4ebbc29ca libdrgn: dwarf_info: fix CU header size computation for GNU Debug Fission
dwo_id was added in split DWARF 5; GNU Debug Fission doesn't have it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 09:59:31 -07:00
Omar Sandoval
69a99dde0d tests: test DWARF 5
Pick a few tests where the version difference matters rather than
running every test twice.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 09:59:31 -07:00
Omar Sandoval
4eb0a8fa85 cli: configure logger
Now that drgn is hooked up to log to the logging module, let's configure
the logging module to print logs nicely and add a --log-level command
line option. This makes the quiet parameter to run_interactive()
redundant, so we ignore it now and will remove it in a future release.
I'm not sure whether we should expose the log formatter, or maybe
run_interactive() should also set up the logger. I may also want to
break download progress out into a separate option from --quiet and then
make --quiet equivalent to --log-level=none --progress=never. All of
that can happen later.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:47:34 -07:00
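
A minimal sketch of wiring a `--log-level` option to the standard `logging` module, in the spirit of commit 4eb0a8fa85. The set of level names, the logger name `"example"`, the message format and the `configure_logger()` function are assumptions made for illustration; the commit message does not spell them out.

```python
import argparse
import logging

# Assumption: "none" is modelled as a level above CRITICAL, so that no
# standard-level message is emitted.
LOG_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
    "none": logging.CRITICAL + 1,
}


def configure_logger(argv):
    """Parse --log-level from argv and return a configured logger."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--log-level", choices=list(LOG_LEVELS), default="warning"
    )
    args = parser.parse_args(argv)

    logger = logging.getLogger("example")  # hypothetical logger name
    logger.setLevel(LOG_LEVELS[args.log_level])
    if not logger.handlers:
        # Attach the handler only once so repeated calls don't duplicate logs.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```

For example, `configure_logger(["--log-level", "debug"])` returns a logger that prints debug messages to stderr.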
Omar Sandoval
81c8672d4d libdrgn: python: log to the standard logging module
Rather than coming up with our own, separate logging API for the Python
bindings, let's integrate with the logging module. The straightforward
part is creating a logger from the C extension and adding a log callback
that calls its log() method. However, syncing the log level between the
logging module and libdrgn requires monkey patching.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:47:34 -07:00
Omar Sandoval
c1a2792e6a libdrgn: add simple logging framework
Exceptions aren't enough to debug complicated code paths like debug info
discovery or stack unwinding. We really need logs for that, so let's add
a small logging framework. By default, we log to stderr, but we also
provide a way to direct logs to a different file, or even an arbitrary
callback so that logs can be directed to the application's logging
library of choice.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:47:34 -07:00
Omar Sandoval
fa82071618 libdrgn: call blocking hooks around DWARF index
DWARF indexing can take a long time; Kevin Svetlitski notes that it can
take almost a minute on some large binaries. Let's use the new blocking
API around it so that the Python bindings drop the GIL.

Closes #247.

Suggested-by: Kevin Svetlitski <svetlitski@meta.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:47:34 -07:00
Omar Sandoval
0ad19dc37b libdrgn: python: set blocking callback to release GIL
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:47:34 -07:00
Omar Sandoval
06a825f315 libdrgn: add API for hooks around blocking operations
There are places in drgn where it'd be a good idea to drop the Python
GIL. However, some of these are deep inside of libdrgn, where some code
paths are fast and dropping the GIL would be extra overhead and others
are slow (e.g., type lookups, which may be cached or may require DWARF
namespace indexing). Instead of trying to do this from the Python
bindings, add hooks to libdrgn. These hooks can be used directly or with
a new scope guard macro, drgn_blocking_guard, that we can start
sprinkling around in appropriate places in libdrgn.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:47:34 -07:00
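
The hook-plus-scope-guard idea from commit 06a825f315 can be illustrated with a Python analogue. The real API is C inside libdrgn; `set_blocking_callback()` and `blocking_guard()` below are hypothetical Python names that only mirror the shape of the design: a begin hook, an end hook, and a guard that always pairs them.

```python
from contextlib import contextmanager

_hooks = None  # (begin, end) pair, or None when no hooks are registered


def set_blocking_callback(begin, end):
    """Register hooks to run around potentially slow operations."""
    global _hooks
    _hooks = (begin, end)


@contextmanager
def blocking_guard():
    """Call the begin hook on entry and the end hook on exit."""
    hooks = _hooks
    # The begin hook may return state (e.g. a saved thread state) that is
    # handed back to the end hook.
    state = hooks[0]() if hooks else None
    try:
        yield
    finally:
        if hooks:
            hooks[1](state)
```

A caller would wrap only the slow path in `with blocking_guard():`, so fast paths pay no hook overhead.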
Omar Sandoval
5c1b6cf764 docs: document thread safety
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:33:35 -07:00
Omar Sandoval
471e32e906 libdrgn: debug_info: try harder to get debug file path
We're getting (null) file paths in error messages (e.g., #233) because
libdwfl doesn't always return the debug file path. Fall back to the
loaded file path, which is better than nothing until we get rid of
libdwfl.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-18 12:33:35 -07:00
Omar Sandoval
859b3c5053 vmtest.config: add highmem=off to Arm QEMU config
My Arm VM fails to boot on QEMU 7.2.1 after the following sequence of
events on the kernel console:

  pci-host-generic 4010000000.pcie: can't claim ECAM area [mem 0x10000000-0x1fffffff]: address conflict with pcie@10000000 [mem 0x10000000-0x3efeffff]
  pci-host-generic: probe of 4010000000.pcie failed with error -16
  ...
  9pnet_virtio: no channels available for device /dev/root
  VFS: Cannot open root device "" or unknown-block(0,0): error -2
  Please append a correct "root=" boot option; here are the available partitions:
  Can't find any bdev filesystem to be used for mount!
  Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Turning off highmem fixes the conflict. (I think this previously worked
without highmem=off on Arch Linux, so maybe there's something different
in Fedora's QEMU.)

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-10 10:59:40 -07:00
Omar Sandoval
745217031f vmtest.config: add 6.5 to supported kernels
After a streak of changes required from 6.2-6.4, 6.5 thankfully doesn't
need any updates.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-10 10:42:35 -07:00
Omar Sandoval
e901de6137 vmtest.config: work around ppc64 build failure on v6.5-rc1
I sent a patch titled "powerpc/crypto: fix missing skcipher dependency
for aes-gcm-p10" [1] to fix the build failure, but we also have no need
for this module anyways.

1: https://lists.ozlabs.org/pipermail/linuxppc-dev/2023-July/260369.html

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-10 09:51:51 -07:00
Omar Sandoval
0bcef5b77f libdrgn: dwarf_info: get byte order from passed file in drgn_eval_cfi_dwarf_expression()
Commit 18b12a5c7b ("libdrgn: get .eh_frame from the correct file")
missed this, but it's unlikely to matter in practice.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-07 15:33:44 -07:00
Omar Sandoval
4ea46bf283 setup.py: allow testing against local kernels
Move the logic for local kernels from vmtest.kmod and vmtest.vm to
vmtest.config and use it in setup.py.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-07 14:57:08 -07:00
Omar Sandoval
c76f25b852 libdrgn: dwarf_info: ignore DW_OP_{,GNU_}entry_value
These opcodes appear in practice, and we choke on them with an exception
like "unknown DWARF expression opcode 0xf3" or "unknown DWARF expression
opcode 0xa3". In some cases, it'd be possible to recover the entry value
by looking at call site information, but that's pretty involved. For
now, just treat these operations as optimized out so we stop failing
hard.

Closes #233.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-06 22:00:25 -07:00
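
The "treat as optimized out" behaviour from commit c76f25b852 can be sketched with a toy evaluator. This is a heavy simplification and not libdrgn's code: real DWARF expressions carry operands (the entry value opcodes are followed by a length and a nested block), which this sketch ignores, and `evaluate()` and `UnknownOpcode` are hypothetical names.

```python
DW_OP_lit0 = 0x30             # DW_OP_lit0..DW_OP_lit31 push 0..31
DW_OP_entry_value = 0xA3      # DWARF 5
DW_OP_GNU_entry_value = 0xF3  # GNU extension


class UnknownOpcode(Exception):
    pass


def evaluate(expr):
    """Evaluate a toy expression; return None if it is optimized out."""
    stack = []
    for op in expr:
        if op in (DW_OP_entry_value, DW_OP_GNU_entry_value):
            # Recovering the entry value needs call site information, so
            # report the result as optimized out instead of failing.
            return None
        elif DW_OP_lit0 <= op <= DW_OP_lit0 + 31:
            stack.append(op - DW_OP_lit0)
        else:
            raise UnknownOpcode(f"unknown DWARF expression opcode {op:#x}")
    if not stack:
        raise ValueError("empty expression")
    return stack[-1]
```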
Omar Sandoval
916a7217fb libdrgn: dwarf_info: don't call dwarf_dieoffset() redundantly
When we get the DIE from the offset with dwarf_offdie(), there's no need
to go back to the offset with dwarf_dieoffset().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-06 13:56:00 -07:00
Omar Sandoval
b5018aa913 libdrgn: dwarf_info: only iterate necessary DIE subtrees in drgn_module_find_dwarf_scopes()
Thierry found that as soon as drgn_module_find_dwarf_scopes() finds any
DIE containing the PC, it walks the entire subtree rooted at that DIE.
However, we only need to look at the immediate children of a DIE
containing the PC. I think this is what I originally intended, but I
failed to reset the children flag to false when the last DIE didn't
contain the PC. Thierry's suggested check of it.dies.size == subtree is
simpler.

This is a massive performance improvement: for a kernel core dump with
10k threads, getting the stack trace of every thread took ~90 seconds
without this fix and ~50 seconds with it.

Let's also add a comment to this very subtle code.

Fixes: d8d4157346 ("libdrgn: debug_info: add drgn_debug_info_module_find_dwarf_scopes()")
Co-authored-by: Thierry Treyer <ttreyer@fb.com>
Signed-off-by: Thierry Treyer <ttreyer@fb.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-06 11:00:16 -07:00
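
The traversal described in commit b5018aa913 can be sketched as follows. This is a simplified model rather than the libdrgn implementation: the `Die` class and `find_scopes()` are hypothetical, every DIE is assumed to have a single `[low_pc, high_pc)` range, and the iterator state mentioned in the commit message is replaced by plain lists.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    name: str
    low_pc: int
    high_pc: int  # exclusive
    children: list = field(default_factory=list)

    def contains(self, pc):
        return self.low_pc <= pc < self.high_pc


def find_scopes(root, pc):
    """Return (scopes, visited): the chain of DIEs containing pc,
    outermost first, and the number of DIEs that were examined."""
    scopes = []
    visited = 0
    candidates = [root]
    while candidates:
        match = None
        for die in candidates:
            visited += 1
            if die.contains(pc):
                match = die
                break
        if match is None:
            break
        scopes.append(match)
        # Descend only into the immediate children of the DIE that
        # contains pc; sibling subtrees are never walked.
        candidates = match.children
    return scopes, visited
```

In this model, a sibling function with a large subtree costs one range check instead of a full walk, which is the source of the speedup the commit reports.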
Omar Sandoval
59ce589984 setup.py: get kmod path from build_kmod()
No need to duplicate building the path, and this way if we ever change
the location it won't need to be updated here.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-06 00:36:21 -07:00
Omar Sandoval
c69e5b1435 drgn.helpers.linux.list: add list_count_nodes()
I've needed this many times, but there wasn't a corresponding function
in the kernel so I could never decide what to name it. Linux kernel
commit 4d70c74659d9 ("i915: Move list_count() to list.h as
list_count_nodes() for broader use") (in v6.3-rc1) fixed that problem
for me.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-30 15:40:47 -07:00
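
What `list_count_nodes()` computes can be shown on a pure-Python model of a circular doubly linked list. The `ListHead` class and `list_add_tail()` below are illustrative stand-ins for the kernel's `struct list_head`; the actual drgn helper operates on kernel objects and is not reproduced here.

```python
class ListHead:
    """Minimal model of a list node; an empty list points at itself."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the end of the list."""
    prev = head.prev
    new.prev = prev
    new.next = head
    prev.next = new
    head.prev = new


def list_count_nodes(head):
    """Count the nodes in the list, not including the head itself."""
    count = 0
    pos = head.next
    while pos is not head:
        count += 1
        pos = pos.next
    return count
```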
Omar Sandoval
7cb3e99b23 libdrgn: program: find crashed task with cpu_curr() instead of find_task()
s390x populates the pid field in NT_PRSTATUS with the CPU number plus 1
[1] instead of the PID of the task that was running on that CPU. This
means that we get the wrong task_struct from drgn_program_find_thread()
in drgn_program_kernel_core_dump_cache_crashed_thread(), or don't find
the task_struct and crash because of a missing NULL check.

We can work around this and also gracefully handle the normal and idle
cases by instead getting the current task_struct from the CPU runqueue.
This is slightly racy: rq->curr is updated in __schedule() [2] before
the registers and stack are switched in context_switch() [3]. However,
it was already racy, since the pid field in NT_PRSTATUS is populated
from current, which is updated after the registers and stack are
switched (at least on x86-64) [4].

Closes #314.

1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/s390/kernel/crash_dump.c?h=v6.4#n309
2: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v6.4#n6646
3: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v6.4#n5343
4: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/process_64.c?h=v6.4#n621

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-29 16:16:27 -07:00
Omar Sandoval
cc0994a010 drgn.helpers.linux.sched: add cpu_curr() helper
This will be used internally, but it's also a nice shortcut for
per_cpu(prog["runqueues"], cpu).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-29 15:58:52 -07:00
Omar Sandoval
d5c53099d2 vmtest.enter_kdump: use kexec(8) on non-x86-64 architectures
We can trivially call the kexec_file_load() syscall on x86-64, but it's
more complicated for other architectures. We can call the kexec utility
for those.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-29 15:21:50 -07:00
Omar Sandoval
fed301f518 vmtest.enter_kdump: move body into main() function
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-29 15:19:44 -07:00
Omar Sandoval
5057308c0f drgn 0.0.23
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-28 13:59:18 -07:00
Omar Sandoval
75d2fee3e9 docs: add 0.0.23 release highlights
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-28 13:58:19 -07:00
Omar Sandoval
0d6438d994 libdrgn: orc_info: use .orc_header to detect version
My kernel patch was merged for Linux 6.4 and backported to 6.3.10, so
now we can use the .orc_header section to reliably detect the ORC format
version. Since the 6.4 release candidates and older versions of 6.3
don't have .orc_header, we'll keep the version check as a fallback.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-28 11:10:18 -07:00
Omar Sandoval
1787e25a71 drgn.helpers.common.format: fix number_in_binary_units() docstring
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-28 10:45:17 -07:00
Omar Sandoval
ec9a963571 tests: use symbol from test module instead of __schedule for identify_symbol() test
__schedule tends to end up aliased with other special symbols because
it's placed in the .sched.text section. We're handling
__sched_text_start, but on ppc64, I also saw __cpuidle_text_end. Use
drgn_test_function from the test kernel module instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-27 18:57:43 -07:00
Omar Sandoval
b47152a6e7 tests: remove unused IDR from test kernel module
The test kernel module has several -Woverflow warnings when compiled for
Arm:

  /home/osandov/repos/drgn/build/vmtest/arm/tmp9g8977mn/drgn_test.c:785:9: warning: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '6221254864074593878' to '1448498774' [-Woverflow]
    785 |         0x5656565656565656,
        |         ^~~~~~~~~~~~~~~~~~
  /home/osandov/repos/drgn/build/vmtest/arm/tmp9g8977mn/drgn_test.c:786:9: warning: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '1311768465173141112' to '305419896' [-Woverflow]
    786 |         0x1234567812345678,
        |         ^~~~~~~~~~~~~~~~~~
  /home/osandov/repos/drgn/build/vmtest/arm/tmp9g8977mn/drgn_test.c:787:9: warning: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '1311768467294899695' to '2427178479' [-Woverflow]
    787 |         0x1234567890abcdef,
        |         ^~~~~~~~~~~~~~~~~~

drgn_test_idr_dense and drgn_test_idr_ptrs aren't actually used by the
tests, so remove them.

Fixes: 4f2c8f0735 ("tests: idr: add test cases for idr.")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-27 17:31:23 -07:00
Omar Sandoval
28e7a79925 setup.py: use setuptools versions of distutils.command.build and distutils.errors
The setuptools guide for porting from distutils [1] recommends this.
However, the setuptools versions aren't available on older versions of
setuptools, so we still need to fall back to distutils.

1: https://setuptools.pypa.io/en/latest/deprecated/distutils-legacy.html

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-27 16:03:43 -07:00
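
The import fallback from commit 28e7a79925 can be sketched as below. The `import_first()` helper is a hypothetical convenience for this example; the commit itself more likely uses plain `try`/`except ImportError` around the imports, and the helper returns `None` instead of failing so that the sketch also runs where neither package is installed.

```python
import importlib


def import_first(*module_names):
    """Return the first module that can be imported, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the setuptools versions, falling back to distutils on older
# setuptools releases that don't provide them.
build_module = import_first(
    "setuptools.command.build", "distutils.command.build"
)
errors_module = import_first("setuptools.errors", "distutils.errors")
```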