Commit Graph

573 Commits

Author SHA1 Message Date
Omar Sandoval
dd59e5431c libdrgn: fix extremely slow type comparison
Matt Ahrens reported that comparing two types would sometimes end up in
a seemingly infinite loop, which he discovered was because we repeat
comparisons of types as long as they're not in a cycle. Fix it by
caching all comparisons during a call.
2019-11-24 09:46:00 -08:00
Omar Sandoval
35b59203d8 docs: document that Type == is not for type checking
While we're here, let's make the note formatting look a bit nicer.
2019-11-24 09:45:39 -08:00
Omar Sandoval
b8b93ae3e6 libdrgn: python: fix deprecation warning in unit tests
Some tests (e.g., tests.test_object.TestSpecialMethods.test_round) are
printing:

  DeprecationWarning: an integer is required (got type float).  Implicit
  conversion to integers using __int__ is deprecated, and may be removed
  in a future version of Python.

See https://bugs.python.org/issue36048. This is coming from calls like:

  Object(prog, 'int', value=1.5)

We actually want the truncating behavior, so explicitly call
PyNumber_Long().
2019-11-22 17:18:55 -08:00
Omar Sandoval
1c9acb8eed Add virtual machine testing setup
Now that we have tests for kernel-specific functionality, we should run
them on various kernel versions. This adds a script for doing so using
QEMU with a pre-built root filesystem image and kernels that I'm hosting
on my Dropbox. The script can be run locally, but this also sets it up
to be run on Travis. For now, we're testing the mainline, stable, and
longterm releases from kernel.org (not including v3.16, which doesn't
even boot for me).
2019-11-22 16:54:00 -08:00
Omar Sandoval
40e509044c Add tests for Linux helpers
We currently have no test coverage for helpers. This is a problem, as
they can be fairly complicated and are susceptible to breaking with new
kernel versions. It's actually not too hard to test user-facing
subsystems on the running kernel as long as we're root and have debug
info for vmlinux, so let's add several tests for those. Specific data
structures will be a little trickier to test, so for now I'm not
covering those.
2019-11-22 16:38:52 -08:00
Omar Sandoval
b00a45915b cli: add --main-symbols
This loads debugging information for vmlinux but not kernel modules.
2019-11-22 16:38:52 -08:00
Omar Sandoval
6af6159cfc libdrgn: support loading only load main debug info
If we only want debugging information for vmlinux and not kernel
modules, it'd be nice to only load the former. This adds a load_main
parameter to drgn_program_load_debug_info() which specifies just that.
For now, it's only implemented for the Linux kernel. While we're here,
let's make the paths parameter optional for the Python bindings.
2019-11-22 16:38:52 -08:00
Omar Sandoval
09108d22fa libdrgn: x86_64: support unwinding stack on Linux < 4.9 2019-11-22 16:38:49 -08:00
Omar Sandoval
fef84017d7 helpers: fix path_lookup() on self-mounted bind mount
_follow_mount() needs to check that the parent mount matches like
__lookup_mnt() in the kernel, otherwise for

  mount --bind /tmp/foo /tmp/foo

path_lookup(prog, '/tmp/foo') will loop forever.
2019-11-18 17:29:46 -08:00
Omar Sandoval
20a7c8c85f helpers: fix d_path() for bind mounts
d_path() for bind mounts returns the wrong path. E.g., for

  mount --bind /tmp/foo /tmp/foo

print_mounts() shows '/tmp/foo/foo'. Let's do exactly what
prepend_path() in the kernel does, which fixes this case.
2019-11-18 17:29:46 -08:00
Omar Sandoval
5fbe1b1ba9 setup.py: fix sdist if files are removed
setuptools has a long-standing bug that if files are removed from the
list of sources but were included in a previous run of egg_info, they
remain in the generated list of sources (pypa/setuptools#436). This
affects egg_info and sdist. Let's work around this by removing the old
SOURCES.txt if we can recreate it from git.
2019-11-18 17:29:46 -08:00
Amlan Nayak
0df2152307 Add basic class type support
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
2019-11-18 10:36:40 -08:00
Omar Sandoval
b49f773fe6 libdrgn: python: fix build on Python 3.8
Python 3.8 replaced the unused void *tp_print field with Py_ssize_t
tp_vectorcall_offset, so with -Werror we get "error: initialization of
‘long int’ from ‘void *’ makes integer from pointer without a cast".
Let's just use designated initializers.
2019-11-15 10:41:58 -08:00
Omar Sandoval
aec6a279aa Fix "DeprecationWarning: invalid escape sequence \*" in docstring 2019-11-04 14:41:54 -08:00
Omar Sandoval
1c8eced0c6 libdrgn: stack_trace: support unwinding stack from struct pt_regs
Linux kernel IRQ handlers store the registers from before the interrupt
as struct pt_regs, so add a way to unwind the stack given only that
structure.
2019-10-28 13:56:54 -07:00
Omar Sandoval
4780c7a266 libdrgn: stack_trace: prohibit unwinding stack of running tasks
We currently don't check that the task we're unwinding is actually
blocked, which means that linux_kernel_set_initial_registers_x86_64()
will get garbage from the stack and we'll return a nonsense stack trace.
Let's avoid this by checking that the task isn't running if we didn't
find a NT_PRSTATUS note.
2019-10-28 13:37:57 -07:00
Omar Sandoval
326107f054 libdrgn: add task_state_to_char() helper
Add a helper to get the state of a task (e.g., 'R', 'S', 'D'). This will
be used to make sure that a task is not running when getting a stack
trace, so implement it in libdrgn.
2019-10-28 13:37:57 -07:00
Omar Sandoval
91f5c8e2e7 libdrgn: stack_trace: support unwinding stack from thread ID
When debugging the Linux kernel, it's inconvenient to have to get the
task_struct of a thread in order to get its stack trace. This adds
support for looking it up solely by PID. In that case, we do the
find_task() inside of libdrgn. This also gives us stack trace support
for userspace core dumps almost for free since we already added support
for NT_PRSTATUS.
2019-10-28 13:37:53 -07:00
Omar Sandoval
0f7ad0ed26 libdrgn: stack_trace: support unwinding stack from core dump
vmcores include a NT_PRSTATUS note for each CPU containing the PID of
the task running on that CPU at the time of the crash and its registers.
We can use that to unwind the stack of the crashed tasks.
2019-10-28 13:36:02 -07:00
Omar Sandoval
75c74022ff libdrgn: save Elf handle for core dump
Currently, we close the Elf handle in drgn_set_core_dump() after we're
done with it. However, we need the Elf handle in
userspace_report_debug_info(), so we reopen it temporarily. We will also
need it to support getting stack traces from core dumps, so we might as
well keep it open. Note that we keep it even if we're using libkdumpfile
because libkdumpfile doesn't seem to have an API to access ELF notes.
2019-10-28 13:09:25 -07:00
Omar Sandoval
c243daed59 Translate find_task() helper (and dependencies) to C
We'd like to be able to look up tasks by PID from libdrgn, but those
helpers are written in Python. Translate them to C and add some thin
bindings so we can use the same implementation from Python.
2019-10-28 13:08:57 -07:00
Omar Sandoval
b5735de8dc libdrgn: add drgn_object_read_integer()
There are some cases where we want to read an integer regardless of its
signedness, so drgn_object_read_signed() and drgn_object_read_unsigned()
are cumbersome to use, and drgn_object_read_value() is too permissive.
2019-10-28 13:06:38 -07:00
Omar Sandoval
97b5967c37 libdrgn: add a couple of helpers for working with buffer and reference objects
Expose drgn_object_buffer() and add drgn_buffer_object_size() and
drgn_reference_object_size().
2019-10-28 11:34:08 -07:00
Omar Sandoval
63911165f8 helpers: fix missing word in drgn.helpers.linux.mm docstring 2019-10-28 02:40:40 -07:00
Omar Sandoval
0da60a41cd libdrgn: support getting register values from stack frames
Currently, the only information available from a stack frame is the
program counter. Eventually, we'd like to add support for getting
arguments and local variables, but that will require more work. In the
mean time, we can at least get the values of other registers. A
determined user can read the assembly for the code they're debugging and
derive the values of variables from the registers.
2019-10-19 13:53:06 -07:00
Omar Sandoval
4fb0e2e110 libdrgn: use new libdwfl stack trace API 2019-10-18 14:34:11 -07:00
Omar Sandoval
6f43fff627 Update elfutils with new stack frame interface
Rebase the existing patches and add the patches which extend the libdwfl
stack frame interface.

Based on:

47780c9e elflint, readelf: enhance error diagnostics

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: don't bother freeing frames outside of dwfl_thread_getframes
libdwfl: only use thread->unwound for initial frame
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2019-10-18 14:34:11 -07:00
Omar Sandoval
d60c6a1d68 libdrgn: add register information to platform
In order to retrieve registers from stack traces, we need to know what
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.
2019-10-18 14:33:02 -07:00
Omar Sandoval
b8c657d760 libdrgn: python: add sizeof()
It's annoying to do obj.type_.size, and that doesn't even work for every
type. Add sizeof() that does the right thing whether it's given a Type
or Object.
2019-10-18 11:47:32 -07:00
Omar Sandoval
12b0214b4d libdrgn: work around DW_AT_upper_bound of -1 for empty arrays
For the following source code:

  int arr[] = {};

GCC emits the following DWARF:

  DWARF section [ 4] '.debug_info' at offset 0x40:
   [Offset]
   Compilation unit at offset 0:
   Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
   [     b]  compile_unit         abbrev: 1
             producer             (strp) "GNU C17 9.2.0 -mtune=generic -march=x86-64 -g"
             language             (data1) C99 (12)
             name                 (strp) "test.c"
             comp_dir             (strp) "/home/osandov"
             stmt_list            (sec_offset) 0
   [    1d]    array_type           abbrev: 2
               type                 (ref4) [    34]
               sibling              (ref4) [    2d]
   [    26]      subrange_type        abbrev: 3
                 type                 (ref4) [    2d]
                 upper_bound          (sdata) -1
   [    2d]    base_type            abbrev: 4
               byte_size            (data1) 8
               encoding             (data1) signed (5)
               name                 (strp) "ssizetype"
   [    34]    base_type            abbrev: 5
               byte_size            (data1) 4
               encoding             (data1) signed (5)
               name                 (string) "int"
   [    3b]    variable             abbrev: 6
               name                 (string) "arr"
               decl_file            (data1) test.c (1)
               decl_line            (data1) 1
               decl_column          (data1) 5
               type                 (ref4) [    1d]
               external             (flag_present) yes
               location             (exprloc)
                [ 0] addr .bss+0 <arr>

Note the DW_AT_upper_bound of -1. We end up parsing this as UINT64_MAX
and returning a "DW_AT_upper_bound is too large" error. It appears that
GCC is simply emitting the array length minus one, so let's treat these
as having a length of zero.

Fixes #19.
2019-10-18 03:18:21 -07:00
Omar Sandoval
c0cb50ac25 docs: fix enum_type() parameter documentation 2019-10-18 02:37:00 -07:00
Omar Sandoval
430732093d libdrgn: python: add converter for byteorder
Rather than open-coding the conversion where we need it, make it a
proper converter function.
2019-10-15 21:21:21 -07:00
Omar Sandoval
55a9700435 libdrgn: python: accept integer-like arguments in more places
There are a few places (e.g., Program.symbol(), Program.read()) where it
makes sense to accept, e.g., a drgn.Object with integer type. Replace
index_arg() with a converter function and use it everywhere that we use
the "K" format for PyArg_Parse*.
2019-10-15 21:10:11 -07:00
Omar Sandoval
77253dbdd8 libdrgn: dwarf_info_cache: fix wrong DW_AT_upper_bound error message
I got the error messages for DW_AT_upper_bound and DW_AT_count
backwards; fix it. Also fix the condition for word + 1 overflowing
dimension->length to be word >= UINT64_MAX. (Dwarf_Word is uint64_t so
this is kind of silly, but at least it documents the intent).
2019-10-15 17:07:27 -07:00
Omar Sandoval
181ebe1a01 Add missing entries in drgn.__all__
PlatformFlags and PrimitiveType got squashed into one string because of
a missing comma, and execscript was never added. Fix it and add some
test cases for it.
2019-10-03 16:50:00 -07:00
Omar Sandoval
4e330bbb6e cli: indicate if drgn was compiled with libkdumpfile 2019-10-03 16:22:10 -07:00
Omar Sandoval
6a0e7fb93a docs: fix Btrfs debugger example
Commit c0bc72b0ea ("libdrgn: use splay tree for memory reader")
changed the signature of add_memory_segment(), but I didn't update the
example that used it. Fix it, and while we're here make it take the
device on the command line.
2019-10-02 17:24:11 -07:00
Omar Sandoval
78192cd61e libdrgn: add environment variable to see more missing debug info errors
Sometimes, I'd like to see all of the missing debug info errors rather
than just the first 5. Allow setting this through the
DRGN_MAX_DEBUG_INFO_ERRORS environment variable.
2019-10-02 17:22:12 -07:00
Omar Sandoval
7848c17097 libdrgn: dwarf_index: tweak missing debug section error message
Make the error message more concise, and reorder the sections so that we
check the most obviously-named section (.debug_info) first and least
important section (.debug_line) last.
2019-10-02 17:22:12 -07:00
Omar Sandoval
423d2cd500 libdrgn: dwarf_index: rework file reporting
Currently, the interface between the DWARF index, libdwfl, and the code
which finds and reports vmlinux/kernel modules is spaghetti. The DWARF
index tracks Dwfl_Modules via their userdata. However, despite
conceptually being owned by the DWARF index, the reporting code reports
the Dwfl_Modules and sets up the userdata. These Dwfl_Modules and
drgn_dwfl_module_userdatas are messy to track and pass between the
layers.

This reworks the architecture so that the DWARF index owns the Dwfl
instance and files are reported to the DWARF index; the DWARF index
takes care of reporting to libdwfl internally. In addition to making the
interface for the reporter much cleaner, this improves a few things as a
side-effect:

- We now deduplicate on build ID in addition to path.
- We now skip searching for vmlinux and/or kernel modules if they were
  already indexed.
- We now support compressed ELF files via libdwelf.
- We can now load default debug info at the same time as additional
  debug info.
2019-10-02 17:22:11 -07:00
Omar Sandoval
91265c37a0 libdrgn: hash_table: fix memcmp() undefined behavior
It's undefined behavior to pass NULL to memcmp() even if the length is
zero. See also commit a17215e984 ("libdrgn: dwarf_index: fix memcpy()
undefined behavior").
2019-10-02 17:16:43 -07:00
Omar Sandoval
b05cc0eb75 libdrgn: use libkdumpfile for ELF vmcores when available
vmcores don't include program headers for special memory regions like
vmalloc and percpu. Instead, we need to walk the kernel page table to
map those addresses. Luckily, libkdumpfile already does that. So, if
drgn was built with libkdumpfile support, use it for ELF vmcores. Also
add an environment variable to override this behavior.

Closes #15.
2019-10-02 17:15:36 -07:00
Omar Sandoval
191c5ae253 libelf: clean up SHF_COMPRESSED handling
We don't need the ifdef anymore since we're using the elf.h from our
local elfutils. We can also fold a leftover nested if.
2019-09-24 17:16:17 -07:00
Omar Sandoval
4f2fa5d86f setup.py: only copy extension if newer than destination
I was always doing the copy based on a comment in the equivalent
setuptools code [1]:

    # Always copy, even if source is older than destination, to ensure
    # that the right extensions for the current Python/platform are
    # used.

On closer inspection, this isn't relevant as of PEP 3149 [2], since
extensions are tagged with the ABI version. Let's avoid the unnecessary
copies.

While we're here, let's restore self.inplace just to be safe.

1: e00f4a87ad
2: https://www.python.org/dev/peps/pep-3149/
2019-09-19 11:08:04 -07:00
Omar Sandoval
ca9cdc1991 libdrgn: autogenerate docstrings.h
I didn't want to use BUILT_SOURCES before because that would break make
$TARGET. But, now that doesn't work anyways because we're using SUBDIRS,
so we might as well use BUILT_SOURCES.
2019-09-19 11:08:04 -07:00
Omar Sandoval
aa4bfd646f libdrgn: simplify gen_constants.py header search
Instead of passing in a directory for header files, add -iquote for that
directory.
2019-09-19 11:08:04 -07:00
Omar Sandoval
6a13d74c0c libdrgn: build with bundled elfutils
Now that we have the bundled version of elfutils, build it from libdrgn
and link to it. We can also get rid of the elfutils version checks from
the libdrgn code.
2019-09-19 11:07:12 -07:00
Omar Sandoval
1cedca8ff4 Import elfutils
Based on:

c950e8a9 config: Fix spec file, add manpages and new GFDL license.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough

The plan is to stop relying on the distribution's version of elfutils
and instead ship our own. This gives us freedom to assume that we're
using the latest version and even ship our own patches (starting with a
few build system improvements). More details are in
scripts/update-elfutils.sh, which was used to generate this commit.
2019-09-05 01:04:33 -07:00
Omar Sandoval
f11a8766bf setup.py: get list of source files from git
Currently, we have a special Makefile target to output the files for a
libdrgn source tarball, and we use that for setuptools. However, the
next change is going to import elfutils, and it'd be a pain to add the
same thing for the elfutils sources. Instead, let's just use git
ls-files for everything. The only difference is that source
distributions won't have the autoconf/automake output.
2019-09-03 17:19:02 -07:00
Omar Sandoval
a3f4fe0518 libdrgn: handle get_debug_sections() errors per-module
There's no reason to fail indexing just because one file is missing
debug information.
2019-08-29 12:26:40 -07:00