We have multiple places where we match an input string against several
cases:
* drgn_lexer_c() checks C identifiers against a runtime hash table of C
keywords.
* linux_kernel_object_find() has an if-else ladder of checks for object
names.
* drgn_debug_info_find_sections() loops over an array of ELF section
names to look for sections we need.
* libdrgn/build-aux/gen_arch.awk generates a compile-time trie using
nested switch statements to match register names.
This commit adds a script, gen_strswitch.py, that can hopefully be used
to replace all of these. gen_strswitch.py generalizes the compile-time
trie idea from gen_arch.awk in a few ways:
* It has syntax and semantics based on C switch statements.
* It supports both null-terminated strings and strings with an explicit
length.
* It compresses unique substrings to calls to strcmp(), strncmp(), or
memcmp() when appropriate.
In benchmarks, this approach is more performant than the above options
as well as a candidate based on gperf, while resulting in machine code
around the same size as the straightforward if-else ladder approach.
Future commits will convert the use cases above to use this script.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, Python is only required to build the Python bindings. I
originally wanted to avoid having Python as a build dependency of
libdrgn, which is why gen_arch is an AWK script. However, I want to add
another code generation script which is harder to do in AWK.
Additionally, these days more people are familiar with Python than AWK,
so let's just bite the bullet and require Python to build. No one builds
libdrgn by itself anyways.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
ASAN is incredibly useful during development, especially when dealing
with non-deterministic behavior where re-running the code under a debugger
won't necessarily reproduce the bug each time. In order not to break any
existing workflows, building with ASAN is opt-in (via --enable-asan).
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
Running the test suite with ASAN enabled revealed that when
the current target was a userspace core dump, the `crashed_thread`
member of `struct drgn_program` was being freed twice – once indirectly
via `drgn_thread_set_deinit`, and once explicitly in `drgn_prog_deinit`.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
libdrgn uses rint() for formatting floating-point numbers. rint() is
provided by libm, so we need to link with -lm.
This missing library has been masked for a couple of reasons:
1. Python is linked against libm, so the drgn Python bindings implicitly
have this dependency satisfied.
2. On x86-64, GCC has a builtin implementation of rint().
This can be reproduced on x86-64 by building examples/load_debug_info
with -fno-builtin.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Peilin Ye reported a couple of related crashes in drgn caused by Linux
kernel modules which had been processed with objcopy --only-keep-debug
(although he notes that since binutils-gdb commit 8c803a2dd7d3
("elf_backend_section_flags and _bfd_elf_init_private_section_data") (in
binutils v2.35), objcopy --only-keep-debug doesn't seem to work for
kernel modules).
If given an SHT_NOBITS section, elf_getdata() returns an Elf_Data with
d_buf = NULL and d_size set to the size in the section header, which is
often non-zero. There are a few places where this can cause us to
dereference a NULL pointer:
* In relocate_elf_sections() for the relocated section data.
* In relocate_elf_sections() for the symbol table section data.
* In get_kernel_module_name_from_modinfo().
* In get_kernel_module_name_from_this_module().
Fix it by checking the section type or directly checking Elf_Data::d_buf
everywhere that could potentially get an SHT_NOBITS section. This is
based on a PR from Peilin Ye.
Closes#145.
Reported-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
relocate_elf_section() shouldn't need to deal with reading the sections.
Pull that logic out into relocate_elf_file() (which will be shared with
REL-style relocations when we support those) and rename
relocate_elf_section() to apply_elf_relas().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Most of the places where we call elf_getdata() (via read_elf_section())
deal with SHT_PROGBITS sections. elf_getdata() always returns the
literal contents of the file for SHT_PROGBITS sections (usually straight
out of the mmap'd file).
The exceptions are the SHT_RELA and SHT_SYMTAB sections in
relocate_elf_section(). For these, if the byte order or alignment of the
file do not match the host, elf_getdata() allocates a new buffer and
converts the contents. relocate_elf_section() also handles unaligned
buffers and swaps the byte order, so it mistakenly ends up with the
original byte order of the file.
Rather than removing that from relocate_elf_section(), let's avoid the
extra allocation and use elf_rawdata(), which always returns the literal
contents of the file.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Sphinx 4.4.0 warns that kyber_stack_trace.rst could use an extlink for
the link to the Kyber source code instead of hard-coding the link to the
Linux repo.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Commit 7d7aa7bf7b ("libdrgn/python: remove Type == operator") removed
the substantial part of tests.TestCase.setUp() but didn't remove the
method.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We have several for_each_entry helpers that take a type as a str or
drgn.Type and pass that directly to container_of(). If a str is given,
then this looks up the type by name for every entry. We can optimize
this and only look up the type once. In a benchmark derived from
examples/linux/fs_inodes.py, iterating over ~900k entries was reduced
from ~2 seconds to ~1.6 seconds.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We indirectly test the linked list helpers via other helpers that use
them, but let's add some dedicated tests. We test against two lists:
* "modules", which is never empty in vmtest because we load the loop
module.
* "vmcore_list", which is always empty except in the kdump kernel.
Like the red-black tree tests, these tests are written generically so we
can use a different list if necessary.
There aren't any great candidates for hlists, so we'll have to make do
with indirectly testing them for now.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The test cases use the VMA tree and cross-reference it with
/proc/$pid/maps, but they're written so it could easily be swapped out
for another tree if necessary.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We're passing the entry type to NULL instead of the entry pointer type,
which results in errors like:
File "/home/osandov/repos/drgn/drgn/helpers/linux/rbtree.py", line 210, in rb_find
return NULL(root.prog_, type)
TypeError: 'struct vm_area_struct' value must be dictionary or mapping
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Some helpers can accept either a str or a Type. If they want to always
work with a Type internally, they need to do something like:
if isinstance(type, str):
type = prog.type(type)
Instead, let's let Program.type() accept a Type and return the exact
same type, so those helpers can unconditionally do:
type = prog.type(type)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This is a common mistake:
$ drgn core_dump
Traceback (most recent call last):
File "/usr/bin/drgn", line 33, in <module>
sys.exit(load_entry_point('drgn==0.0.16', 'console_scripts', 'drgn')())
File "/usr/lib/python3.10/site-packages/drgn/internal/cli.py", line 133, in main
runpy.run_path(args.script[0], init_globals=init_globals, run_name="__main__")
File "/usr/lib/python3.10/runpy.py", line 268, in run_path
code, fname = _get_code_from_file(run_name, path_name)
File "/usr/lib/python3.10/runpy.py", line 242, in _get_code_from_file
code = compile(f.read(), fname, 'exec')
ValueError: source code string cannot contain null bytes
The user intends to debug the core dump, but they've actually specified
the core dump as a Python script to run. The error message from the
runpy internals does not make that clear. So, let's catch this earlier
by doing a quick-and-dirty test of the file magic to see if it looks
like a core dump or other ELF file. If so, we exit with a more helpful
message:
$ drgn core_dump
error: core_dump is a core dump
Did you mean "-c core_dump"?
$ drgn /usr/bin/ls
error: /usr/bin/ls is a binary, not a drgn script
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently we can lookup symbols by name or address, but this will only
return one symbol, prioritizing the global symbols. However, symbols may
share the same name, and symbols may also overlap address ranges, so
it's possible for searches to return multiple results. Add functions
which can return a list of multiple matching symbols.
Signed-off-by: Stephen Brennan <stephen@brennan.io>
Now that pre-commit is added, replace the manual commands for mypy,
isort, and black with equivalent pre-commit commands. This allows us to
avoid duplicating linter arguments. It also allows us to pin the linters
used in CI by way of the .pre-commit-config.yaml file, ensuring
reproducible lint errors.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
`pre-commit run --all-files` results in the following minor
updates, which appear to be caused by my own failure to run linters.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
During PRs, lint and mypy errors can show up in the CI tests, which is
useful, but can introduce unnecessary churn on the PR as small lint
fixes are pushed. This commit adds (optional) support for pre-commit, a
tool which can be configured to run as a git pre-commit hook, running
linters on all changed code to catch issues before you push your code.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Previously, drgn had no way to represent a thread – retrieving a stack
trace (the only extant thread-specific operation) was achieved by
requiring the user to directly provide a tid.
This commit introduces the scaffolding for the design outlined in
issue #92, and implements the corresponding methods for userspace core
dumps, the live Linux kernel, and Linux kernel core dumps. Future work
will build on top of this commit to support live userspace processes.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
The thread API needs a way to iterate over all task_structs in the
kernel. Previously, we translated the existing for_each_task helper,
which supports iterating through specific PID namespaces by walking
through the PID radix tree or PID hashtable. However, we don't need
specific namespaces for the thread API, so we can instead use the much
simpler linked lists of thread groups and threads.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
This reverts commit 2b47583c73. After
Kevin had completed this, we realized that there is a simpler method for
iterating through tasks from libdrgn, which the next commit will
implement. Revert the translation, but keep the improved
tests.helpers.linux.test_pid.TestPid.test_for_each_task.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Disabling SMP is necessary to work around a bug in QEMU's handling of
the capture kernel, but makes the tests run much slower. However, this
bug only appears to manifest when KVM acceleration is disabled, so the
testing harness has been modified to only disable SMP when this is true.
[Omar: use an environment variable instead of touching a file]
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
The majority of test cases already inherited from drgn's TestCase class.
The few outliers that inherited directly from unittest.TestCase have
been brought in line with the other tests.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
Now that the vmtest kernel supports kdump, add a script that can be used
to crash and enter the kdump environment on demand. Use that to crash
after running the normal test suite so that we can run tests against
/proc/vmcore. vmcore tests live in their own directory; presently the
only test is a simple sanity check that ensures we can can attach to
/proc/vmcore.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
We can avoid the need for the kexec tool if we load the kdump kernel
ourselves, which is much easier with kexec_file_load(). Add the config
options to enable it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We would like to test drgn against kernel core dumps (e.g., for #129).
One option would be to include some vmcore files in the repository and
test against those. But those can be huge, and we'd need a lot of them
to test different kernel versions. Instead, we can run vmtest, enable
kdump, and trigger a crash. To do that, we first need to enable a few
kernel config options.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Kernels built without multiprocessing support don't have
__per_cpu_offset; instead, per_cpu_ptr() is a no-op. Make the helper do
the same and update the test case to work on !SMP as well.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I missed that the kernel has an idle_task() function which uses
cpu_rq()->idle instead of idle_threads; the latter is technically
architecture-specific. So, replace idle_thread() with idle_task(), which
is architecture-independent and more consistent with the kernel.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add helpers to convert between sockets and inodes. As an example:
>>> file = fget(task, fd)
>>> sock = SOCKET_I(file.f_inode)
>>> sock.type.value_()
2
>>> import socket
>>> int(socket.SOCK_DGRAM)
2
>>> inode = SOCK_INODE(sock)
Also add tests for the new helpers to tests/helpers/linux/test_net.py.
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
PR #129 will need to get the idle thread for a CPU when the idle thread
crashed. Add a helper for this.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, we create a section filled with zeroes to contain the symbols
in our ELF symbol tests. We can just use a NOBITS section with no file
data instead.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for introducing an API to represent threads, the linux
helper iterators, radix_tree_for_each, idr_for_each, for_each_pid, and
for_each_task have been rewritten in C. This will allow them to be
accessed from libdrgn, which will be necessary for the threads API.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
This minor change is a quality of life improvement ensuring developers
receive more warnings and diagnostics in their editors.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
With mypy 0.920, two warnings appear on current main:
$ mypy --strict --no-warn-return-any drgn _drgn.pyi
drgn/helpers/linux/__init__.py:36: error: Need type annotation for "__all__" (hint: "__all__: List[<type>] = ...")
drgn/helpers/linux/__init__.py:38: error: unused "type: ignore" comment
Found 2 errors in 1 file (checked 33 source files)
The "unused" type:ignore directive was necessary for prior versions, so
add --no-warn-unused-ignores, so that we pass on multiple versions.
Apply a List[str] annotation to the __all__ variable to silence the
other error.
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
On gcc version 7.3, we get following compilation error
CC libdrgnimpl_la-dwarf_info.lo
../../libdrgn/dwarf_info.c:181:51: error: initializer element is not
constant
static const size_t DRGN_DWARF_INDEX_NUM_SHARDS = 1 <<
DRGN_DWARF_INDEX_SHARD_BITS;
This fixes the compilation error on older versions of gcc
Signed-off-by: Alakesh Haloi <alakesh.haloi@gmail.com>
Draft pull requests can have temporary commits, so it doesn't make much
sense to check for sign-offs. Skip the check on drafts, making sure it
runs when a draft is changed to a normal pull request.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Document conventions for init/deinit functions, create/destroy
functions, and functions which modify a struct drgn_object.
Signed-off-by: Omar Sandoval <osandov@osandov.com>