The test kernel module has several -Woverflow warnings when compiled for
Arm:
/home/osandov/repos/drgn/build/vmtest/arm/tmp9g8977mn/drgn_test.c:785:9: warning: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '6221254864074593878' to '1448498774' [-Woverflow]
785 | 0x5656565656565656,
| ^~~~~~~~~~~~~~~~~~
/home/osandov/repos/drgn/build/vmtest/arm/tmp9g8977mn/drgn_test.c:786:9: warning: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '1311768465173141112' to '305419896' [-Woverflow]
786 | 0x1234567812345678,
| ^~~~~~~~~~~~~~~~~~
/home/osandov/repos/drgn/build/vmtest/arm/tmp9g8977mn/drgn_test.c:787:9: warning: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '1311768467294899695' to '2427178479' [-Woverflow]
787 | 0x1234567890abcdef,
| ^~~~~~~~~~~~~~~~~~
drgn_test_idr_dense and drgn_test_idr_ptrs aren't actually used by the
tests, so remove them.
Fixes: 4f2c8f0735 ("tests: idr: add test cases for idr.")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The setuptools guide for porting from distutils [1] recommends this.
However, the setuptools versions aren't available on older versions of
setuptools, so we still need to fall back to distutils.
1: https://setuptools.pypa.io/en/latest/deprecated/distutils-legacy.html
Signed-off-by: Omar Sandoval <osandov@osandov.com>
distutils is deprecated and was removed in Python 3.12 [1]. setuptools
provides a vendored copy of distutils, but its maintainers discourage using
it [2]. So, let's stop importing distutils, starting with replacing our
calls of distutils.file_util.copy_file() and distutils.dir_util.mkpath()
with their equivalent methods from setuptools.Command.
1: https://peps.python.org/pep-0632/
2: https://setuptools.pypa.io/en/latest/deprecated/distutils-legacy.html
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Linux 6.4 needed an unusual number of changes:
- 3f3a957562 ("libdrgn: linux_kernel: Fix module detection on kernel
v6.4")
- da7a11f83e ("drgn.helpers.linux.block: update
for_each_{disk,partition} for Linux 6.4")
- 7e7c300a139b ("libdrgn: orc_info: handle ORC changes in Linux 6.3 and
6.4")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The ORC format changed twice recently:
- Linux kernel commit ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field
to ORC metadata") (in v6.3).
- Linux kernel commit fb799447ae29 ("x86,objtool: Split
UNWIND_HINT_EMPTY in two") (in v6.4).
The former went unnoticed because the change was subtle, and the latter
completely broke x86-64 kernel stack traces.
To handle this, let's "upgrade" the format to the latest version when we
load and sort the ORC information. This is more work upfront but avoids
needing to handle the version differences every time we use ORC to
unwind.
Unfortunately, ORC currently doesn't have any sort of versioning, so we
have to break the rule of not checking kernel versions. However, I have
a kernel patch pending merging that should fix this for the future.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
It's unrealistic for there to be more than 4 billion ORC entries. Switch
to an unsigned int. The main benefit is that the indices array that we
use to sort the parallel arrays of entries and pc_offsets becomes half
the size, which also makes parsing ORC about 10% faster (down from ~5 ms
to ~4.5 ms for the Fedora vmlinux on my laptop).
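As a rough illustration of the indices-array approach described above
(a Python sketch, not the libdrgn C code; sort_parallel() and the sample
values are made up for this example), the parallel arrays can be sorted
by sorting an array of indices and then permuting both arrays:

```python
from array import array


def sort_parallel(pc_offsets, entries):
    """Sort two parallel sequences by pc_offsets via an array of indices.

    Hypothetical helper for illustration only.
    """
    # "I" is C unsigned int, which is narrower than "Q" (unsigned long long)
    # on common platforms, so the index array takes less memory.
    indices = array("I", range(len(pc_offsets)))
    order = sorted(indices, key=lambda i: pc_offsets[i])
    # Apply the same permutation to both arrays so they stay parallel.
    return [pc_offsets[i] for i in order], [entries[i] for i in order]
```

For example, sort_parallel([30, 10, 20], ["c", "a", "b"]) reorders both
sequences together, keeping each entry paired with its offset.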
Signed-off-by: Omar Sandoval <osandov@osandov.com>
.orc_unwind_ip and .orc_unwind are only referenced while initially
parsing ORC data and then never touched again, so it's wasteful to cache
them in struct drgn_elf_file. Look them up if and when we parse the ORC
data instead.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In practice, the .orc_unwind and .orc_unwind_ip sections will always be
suitably aligned. Check it, then assume the alignment later.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
By using the same temporary objects in the Linux 6.4 branch as the
pre-6.4 branch, we get slightly better code generation.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We only support .debug_* sections, but libdw also supports .zdebug_*,
.debug_*.dwo, and .gnu.debuglto_.debug_*. Mimic how libdw chooses debug
sections, with one exception: .debug_cu_index and .debug_tu_index (used
for DWP, which we don't support yet but will) should be considered DWO
sections (this needs to be fixed in libdw, too).
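The naming scheme above can be sketched roughly as follows. This is a
simplified Python illustration, not the libdrgn or libdw code;
classify_debug_section() and the "main"/"dwo"/"lto" labels are
hypothetical names chosen for this example:

```python
def classify_debug_section(name):
    """Return (canonical .debug_* name, kind) or None if not a debug section.

    Hypothetical helper; kind is one of "main", "dwo", or "lto".
    """
    lto_prefix = ".gnu.debuglto_"
    if name.startswith(lto_prefix):
        rest = name[len(lto_prefix):]
        return (rest, "lto") if rest.startswith(".debug_") else None
    if name.startswith(".zdebug_"):
        # Legacy compressed naming: treat like the plain .debug_* name.
        name = ".debug_" + name[len(".zdebug_"):]
    if not name.startswith(".debug_"):
        return None
    if name.endswith(".dwo"):
        return name[: -len(".dwo")], "dwo"
    # The exception described above: the DWP index sections count as DWO.
    if name in (".debug_cu_index", ".debug_tu_index"):
        return name, "dwo"
    return name, "main"
```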
Signed-off-by: Omar Sandoval <osandov@osandov.com>
With 3.12 beta, there are 7 Python versions to test, which is making the
test matrix too large. To reduce the number of CI test runs that occur,
by default we will only run 2: the oldest and newest released Python
versions. To further reduce the CI test runs, we will eliminate CI runs
when a branch is pushed (only main will be tested).
The "ci" action becomes a reusable workflow, which gets called by other
possible workflows:
1. pull_request.yml: On pull request, only the 2 default Python versions
are tested. The "test-all-python-versions" label can be applied to
pull requests to override this.
2. push.yml: On push to main, all Python versions are tested.
3. vmtest-build.yml: After the vmtest kernels are built
4. On manual request: the user has the option to specify
Signed-off-by: Stephen Brennan <stephen@brennan.io>
ORC_REG_SP_INDIRECT is supposed to be an indirect access via rsp, but we
have a typo and are using rbp instead. This is a partial fix for #304.
Fixes: 630d39e345 ("libdrgn: add ORC unwinder")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Instead of simply checking whether the thread ID is non-zero, write a
specific task name to /proc/self/comm before crashing and check that.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We currently use crashing_cpu to determine the thread that caused a
kernel crash. However, crashing_cpu is x86-specific (it is defined in
arch/x86/kernel/reboot.c). Since Linux 4.5, the generic panic code
defines a very similar variable, panic_cpu. Use that instead so that we
support all architectures, but fall back to crashing_cpu to support
older kernels on x86 (even though we don't claim to support 4.4
anymore).
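The lookup order can be sketched like this (a minimal Python
illustration, not the actual libdrgn code; find_crashed_cpu_variable()
is a hypothetical name, and prog is assumed to be any mapping-like
object that raises KeyError for a missing variable, as a plain dict
does):

```python
def find_crashed_cpu_variable(prog):
    """Return (name, value) for the first crash CPU variable that exists.

    Hypothetical helper: prefers the generic panic_cpu and falls back to
    the x86-specific crashing_cpu.
    """
    for name in ("panic_cpu", "crashing_cpu"):
        try:
            return name, prog[name]
        except KeyError:
            continue
    raise LookupError("neither panic_cpu nor crashing_cpu was found")
```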
Signed-off-by: Omar Sandoval <osandov@osandov.com>
btrfs_tree.py is a pile of helpers Omar gave me once that I have
cherished ever since. It mostly deals with walking and parsing btrees.
btrfs_tree_mod_log.py is a script I wrote to do a tree mod log search
and rewind while debugging a btrfs issue.
Signed-off-by: Boris Burkov <boris@bur.io>
follow_{page,pfn,phys}() translate the virtual address by walking the
page table for a given mm_struct (built on top of the existing page
table iterator interface). vmalloc_to_page() and vmalloc_to_pfn() are
special cases for vmalloc addresses.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Applying the 5.10-stable patches fixing BTF with older pahole to
5.11-5.13 missed updating the pahole flags for kernel modules (because
kernel modules didn't have BTF in 5.10). Regenerate the patches for 5.11
ourselves.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Commit bf9159cff367 ("drgn.helpers.linux.mm: fix compound_{order,nr} for
Linux 6.3+") is the only change necessary for Linux v6.3.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Linux 6.3 removed struct page::compound_order. Instead, we need to use
struct folio::_folio_order. Update compound_order() and redefine
compound_nr() in terms of compound_order() so we only need to handle the
divergence in one place.
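The idea of handling the divergence in one place can be sketched as
follows. This is a toy Python model rather than the drgn helper code:
pages are modeled as plain dicts keyed by the member names mentioned
above, and the head-page checks that a real helper needs are omitted:

```python
def compound_order(page):
    """Return the order of a (modeled) compound page.

    Toy model: page is a dict with either "_folio_order" (newer layout)
    or "compound_order" (older layout).
    """
    if "_folio_order" in page:
        return page["_folio_order"]
    return page["compound_order"]


def compound_nr(page):
    # Defined purely in terms of compound_order(), so the layout
    # difference is only handled there.
    return 1 << compound_order(page)
```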
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Linux kernel commits 884f8ce42cce ("driver core: class: implement
class_get/put without the private pointer.") and 2df418cf4b72 ("driver
core: class: remove subsystem private pointer from struct class") (in
v6.4) changed the way to go from struct class to struct subsys_private.
_for_each_block_device(), used by for_each_disk() and
for_each_partition(), needs struct subsys_private. The "new" way also
works on old kernels, so add _class_to_subsys() mirroring
class_to_subsys() in Linux 6.4 and make _for_each_block_device() use it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
With GCC 13.1.1 and the recommended build
setup (CONFIGURE_FLAGS="--enable-compiler-warnings=error"), I get the
following failure:
In function 'linux_kernel_get_vmemmap',
inlined from 'linux_kernel_object_find' at ../../libdrgn/linux_kernel_object_find.inc.strswitch:34:12:
../../libdrgn/linux_kernel.c:370:23: error: 'address' may be used uninitialized [-Werror=maybe-uninitialized]
370 | err = drgn_object_set_unsigned(&prog->vmemmap, qualified_type,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
371 | address, 0);
| ~~~~~~~~~~~
../../libdrgn/linux_kernel.c: In function 'linux_kernel_object_find':
../../libdrgn/linux_kernel.c:361:26: note: 'address' was declared here
361 | uint64_t address;
| ^~~~~~~
cc1: all warnings being treated as errors
While linux_kernel_get_vmemmap_address should always update address in a
non-error case, the compiler seems to disagree. It's easy enough to shut
up the compiler by initializing address to 0. What's more, if there is
an actual issue where linux_kernel_get_vmemmap_address does NOT
update the address variable, a 0 value will be easier to debug than
garbage from an uninitialized variable.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Running tests on Python 3.12, we get:
test_int (tests.test_language_c.TestLiteral.test_int) ... python3.12: /usr/include/python3.12/object.h:215: Py_SIZE: Assertion `ob->ob_type != &PyLong_Type' failed.
Aborted (core dumped)
We're relying on an implementation detail to check whether the object is
negative. Instead, catch the overflow error, negate, and try again.
Genuine overflows will still overflow the second time, but negative
numbers will succeed.
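The same catch-negate-retry idea can be sketched in pure Python (an
illustration only; the actual fix is in the C extension, and
to_u64_magnitude() is a hypothetical name). Here int.to_bytes() stands
in for an unsigned 64-bit conversion that raises OverflowError:

```python
def to_u64_magnitude(value):
    """Return (magnitude, is_negative) if the magnitude fits in 64 bits.

    Hypothetical helper; raises OverflowError otherwise.
    """
    try:
        # Raises OverflowError for negative ints and for ints >= 2**64.
        value.to_bytes(8, "little", signed=False)
        return value, False
    except OverflowError:
        negated = -value
        # A genuine overflow fails again here; a negative number succeeds.
        negated.to_bytes(8, "little", signed=False)
        return negated, True
```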
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Kernel commit ac3b43283923 ("module: replace module_layout with
module_memory") in v6.4 changed the layout of `struct module`, resulting
in the following drgn error [1].
Fix this by first trying to determine the base address and size of each
kernel module via the `struct module_memory mem[MOD_TEXT]` member,
before falling back to previous methods that work on older kernels.
Tested on v6.4-rc2 and on v6.3, which does not include the
above-mentioned commit.
Note that kernel commit b4aff7513df3 ("scripts/gdb: use mem instead of
core_layout to get the module address") performs a similar fix in Python
GDB scripts.
Closes #296.
[1]
```
# drgn
drgn 0.0.22 (using Python 3.11.3, elfutils 0.189, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import NULL, Object, cast, container_of, execscript, offsetof, reinterpret, sizeof
>>> from drgn.helpers.common import *
>>> from drgn.helpers.linux import *
warning: could not get debugging information for:
kernel modules (could not find loaded kernel modules: 'struct module' has no member 'core_size')
```
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Add support for walking s390x page tables. This supports page table
walks of up to 5 levels as well as huge/large pages. To figure out the
paging level in use, we read the first entry of the pgd, which is always
mapped for lowcore access, and use the level bits of the next page
table. This is necessary because drgn passes mm::pgd as the pgtable
argument to the walker function, and mm::pgd doesn't contain the ASCE
bits.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Since Linux kernel commit 8dbd76e79a16 ("tcp/dccp: fix possible race
__inet_lookup_established()") (in v5.5), hlist_nulls_head is what is
actually used for items of inet_hashinfo::listening_hash. This commit
was also backported to stable kernels, so with this change it works for
all releases starting with v4.9. Version 4.4 is not supported right now
(the nulls head list is probably not known at the time).
Fixes: #268
Signed-off-by: Martin Liska <mliska@suse.cz>
This script mimics crash>sys and provides the following output:
CPUS 16
DATE Fri Jan 27 20:26:24 2023
UPTIME 1 day, 7:29:37
LOAD AVERAGE 0.00, 0.00, 0.00
TASKS 317
NODENAME tw
RELEASE 6.1.7-1-default
VERSION #1 SMP PREEMPT_DYNAMIC Wed Jan 18 11:12:34 UTC 2023 (872045c)
MACHINE x86_64
MEMORY 12.67 GiB
Signed-off-by: Martin Liska <mliska@suse.cz>
This false positive appears to only trigger on 32-bit. I reproduced it
with GCC 10 and 12.
Fixes #242.
Reported-by: Timothée Cocault <timothee.cocault@gmail.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
LPAE uses a different page table format and enables 2GB huge pages, so
it'll be interesting to test.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We go through the trouble of enabling a real-time clock driver for other
architectures, so do the same on x86-64.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
vmtest.kbuild -b $build_dir -p $package_dir --package-format directory
fails when attempting to rename the temporary package directory if
$build_dir and $package_dir are on different filesystems or Btrfs
subvolumes. Put the temporary package directory in $package_dir so that
the rename can succeed.
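The pattern is roughly the following (a Python sketch using only the
standard library, not the actual vmtest.kbuild code;
install_package_dir() and its parameters are hypothetical):

```python
import os
import shutil
import tempfile


def install_package_dir(package_dir, name, populate):
    """Build a package in a temporary directory, then rename it into place.

    Hypothetical helper: populate is a callable that fills the directory.
    """
    os.makedirs(package_dir, exist_ok=True)
    # Creating the temporary directory inside package_dir keeps the source
    # and destination of the rename on the same filesystem.
    tmp_dir = tempfile.mkdtemp(prefix=name + ".tmp", dir=package_dir)
    final_dir = os.path.join(package_dir, name)
    try:
        populate(tmp_dir)
        os.rename(tmp_dir, final_dir)
    except BaseException:
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
    return final_dir
```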
Signed-off-by: Omar Sandoval <osandov@osandov.com>
High memory is not directly mapped, so _find_containing_slab() needs to
check against max_low_pfn, not max_pfn.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The phys <-> pfn helpers only need PAGE_SHIFT, which is always
available; the page <-> pfn helpers only need vmemmap or mem_map, which
is independent of virtual address translation; and the page <-> phys
helpers are built on top of those.
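For reference, the phys <-> pfn conversion is just a shift. A minimal
Python sketch, assuming 4 KiB pages (the PAGE_SHIFT value here is an
assumption for the example; the real helpers read it from the program):

```python
PAGE_SHIFT = 12  # Assumption for this example: 4 KiB pages.


def pfn_to_phys(pfn):
    # Physical address of the start of the page frame.
    return pfn << PAGE_SHIFT


def phys_to_pfn(phys):
    # Page frame number containing the physical address.
    return phys >> PAGE_SHIFT
```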
Signed-off-by: Omar Sandoval <osandov@osandov.com>
AArch64, ppc64, s390x, and x86-64 all default to
CONFIG_SPARSEMEM_VMEMMAP, which allows us to translate between struct
page * and PFN with trivial array operations on vmemmap. Arm and i386,
however, use CONFIG_FLATMEM, which has a different array, mem_map, that
we can use similarly. Update the helpers that use vmemmap to use a new
wrapper that gets the proper array.
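The "trivial array operations" amount to pointer arithmetic on whichever
array applies. A rough Python sketch on raw addresses (not the drgn
helper code; the function names, the sizeof_page parameter, and the
first_pfn parameter are hypothetical, with first_pfn standing for the
PFN of element 0 of the array, assumed to be 0 by default):

```python
def page_to_pfn(page_addr, array_base, sizeof_page, first_pfn=0):
    """Translate a struct page address to a PFN via array indexing.

    array_base is the address of vmemmap or mem_map, whichever is in use.
    """
    return (page_addr - array_base) // sizeof_page + first_pfn


def pfn_to_page(pfn, array_base, sizeof_page, first_pfn=0):
    """Translate a PFN to the address of its struct page."""
    return array_base + (pfn - first_pfn) * sizeof_page
```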
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The value can change between two reads. This caused test failures in the
previous commit 18a8f69ad8 ("libdrgn: linux_kernel: add object finder
for jiffies"). The important thing is that the addresses are the same.
Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>