Commit Graph

45 Commits

Author SHA1 Message Date
Omar Sandoval
2d8aeacb30 Allow naming and configuring order of object finders
This one doesn't need any changes to the callback signature, just the
new interface. We also keep add_object_finder() for compatibility.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2024-06-05 13:40:26 -07:00
Stephen Brennan
ff322c7070 libdrgn: introduce Symbol Finder API
Symbol lookup is not yet modular, like type or object lookup. However,
making it modular would enable easier development and prototyping of
alternative Symbol providers, such as Linux kernel module symbol tables,
vmlinux kallsyms tables, and BPF function symbols. To begin with, create
a modular Symbol API within libdrgn, and refactor the ELF symbol search
to use it.

For now, we leave drgn_program_find_symbol_by_address_internal() alone.
Its conversion will require some surgery, since the new API can return
errors, whereas this function cannot.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
2024-03-11 16:43:43 -07:00
Omar Sandoval
c85dd74f3e libdrgn: embed drgn_debug_info in drgn_program
This will simplify the implementation of the module API (#332).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-10-02 11:27:36 -07:00
Omar Sandoval
f24805486f libdrgn: embed type and object finders in drgn_debug_info
This is preparation for embedding drgn_debug_info in drgn_program and
initializing it earlier.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-09-29 12:00:31 -07:00
Omar Sandoval
30ecdd901e libdrgn: allow passing multiple type kinds to type finder function
For the next change, we want to look up a name which may have one of
multiple type kinds. Make drgn_type_kind_fn in libdrgn take a bitmask of
kinds instead of a single kind.

We could change the Python bindings to take the same bitmask, or a tuple
of drgn.TypeKind, but either would be a breaking API change. For now,
let's call the type finder function for each kind in the bitmask
instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-21 15:18:26 -07:00
Omar Sandoval
c8406e1ea0 libdrgn: require semicolon after DEFINE_{HASH,VECTOR,BINARY_SEARCH_TREE}*
The lack of a semicolon after these macros has always confused tooling
like cscope. We could add semicolons everywhere now, but let's enforce
it for the future, too. Let's add a dummy struct forward declaration at
the end of each macro that enforces this requirement and also provides a
useful error message.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-02 14:54:59 -07:00
Omar Sandoval
55a3ebca6c libdrgn: dwarf_info: support DWO split DWARF
We've addressed all of the smaller differences with GNU Debug Fission
and split DWARF 5, so now all that remains is the DWARF index.

The general approach is: in drgn_dwarf_index_read_cus(), for each CU,
ask libdw for the "sub-DIE". For skeleton CUs, this is the split CU DIE
from the .dwo file. From that Dwarf_Die, we can get the Dwarf_CU and
then the Dwarf handle. Then, we wrap that in a struct drgn_elf_file
(cached in a hash table in the struct drgn_module), which the DWARF
index can work with from there.

Additionally, a couple of places (.debug_addr parsing and stack trace
local variable lookup) need to be updated to use the correct
drgn_elf_file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
fc47ec1b78 libdrgn: add prog pointer to struct drgn_module
The next commit needs this.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-22 15:27:39 -07:00
Omar Sandoval
fc3ea4184a libdrgn: use new include-what-you-use exported declarations and fix warnings
include-what-you-use/include-what-you-use#1164 fixed
include-what-you-use/include-what-you-use#971 so that we can export
forward declarations instead of hacking around it. I can't reproduce the
issue with BINARY_OP_SIGNED_2C anymore either, so we can remove that
hack, too. Also fix any other warnings.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-05-24 00:25:25 -07:00
Omar Sandoval
18b12a5c7b libdrgn: get .eh_frame from the correct file
We're currently getting .eh_frame from the debug file. However, since
.eh_frame is an SHF_ALLOC section, it is actually in the loaded file,
and may not be in the debug file. This causes us to fail to unwind in
modules whose debug file was created with objcopy --only-keep-debug
(which is typical for Linux distro debug files).

Fix it by getting .eh_frame from the loaded file. To make this easier,
we split .eh_frame and .debug_frame data into two separate tables. We
also don't bother deduplicating them anymore, since GCC and Clang only
seem to generate one or the other in practice.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 13:37:29 -08:00
Omar Sandoval
270375f077 libdrgn: debug_info: get "loaded" ELF file
For upcoming changes, we will need loaded (SHF_ALLOC) sections for
modules. Some separate debug files (e.g., those created with objcopy
--only-keep-debug) don't have those sections. Let's get the loaded file
from libdwfl with dwfl_module_getelf() and save it in a struct
drgn_elf_file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 13:37:29 -08:00
Omar Sandoval
bcb53d712b libdrgn: bypass libdwfl with struct drgn_elf_file
Now that we track the debug file ourselves, we can avoid calling libdwfl
in a bunch of places. By tracking the bias ourselves, we can avoid a
bunch more.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 13:37:29 -08:00
Omar Sandoval
34f122144a libdrgn: debug_info: wrap ELF file information in new struct drgn_elf_file
struct drgn_module contains a bunch of information about the debug info
file. Let's pull it out into its own structure, struct drgn_elf_file.
This will be reused for the "main"/"loaded" file in an upcoming change.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 13:37:29 -08:00
Omar Sandoval
b3bab1c5b0 libdrgn: make module vs. program platform difference more clear
It's confusing that we have a platform both for the program and for each
module. They usually match, but they're not required to. For example,
the user can manually add a file with a different platform just to read
its debug info. Our rule is that if we're parsing anything from the
module, we use the module platform; and otherwise, use the program
platform. There are a couple of places where the platforms must match:
when using call frame information (CFI) or registers. Let's make all of
this more clear in the code (by using the module's platform even when it
must match the program's platform) and in comments. No functional
change.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 12:53:45 -08:00
Omar Sandoval
87b7292aa5 Relicense drgn from GPLv3+ to LGPLv2.1+
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.

All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-01 17:05:16 -07:00
Omar Sandoval
70af25849c libdrgn: rename drgn_debug_info_module to drgn_module
Eventually, modules will be exposed as part of the public libdrgn API,
so they should have a clean name. Additionally, the module API I'm
currently working on will allow modules for which we don't have the
debug info file, so "debug info module" would be a misnomer.

Also rename drgn_dwarf_module_info to drgn_module_dwarf_info and
drgn_orc_module_info to drgn_module_orc_info to fit the new naming
scheme better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-10-05 16:52:46 -07:00
Omar Sandoval
929b7de266 libdrgn: handle reading data from SHT_NOBITS sections
Peilin Ye reported a couple of related crashes in drgn caused by Linux
kernel modules which had been processed with objcopy --only-keep-debug
(although he notes that since binutils-gdb commit 8c803a2dd7d3
("elf_backend_section_flags and _bfd_elf_init_private_section_data") (in
binutils v2.35), objcopy --only-keep-debug doesn't seem to work for
kernel modules).

If given an SHT_NOBITS section, elf_getdata() returns an Elf_Data with
d_buf = NULL and d_size set to the size in the section header, which is
often non-zero. There are a few places where this can cause us to
dereference a NULL pointer:

* In relocate_elf_sections() for the relocated section data.
* In relocate_elf_sections() for the symbol table section data.
* In get_kernel_module_name_from_modinfo().
* In get_kernel_module_name_from_this_module().

Fix it by checking the section type or directly checking Elf_Data::d_buf
everywhere that could potentially get an SHT_NOBITS section. This is
based on a PR from Peilin Ye.

Closes #145.

Reported-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-01-27 12:23:09 -08:00
Omar Sandoval
e6abfeac03 libdrgn: debug_info: report userspace core dump debug info ourselves
There are a few reasons for this:

1. dwfl_core_file_report() crashes on elfutils 0.183-0.185. Those
   versions are still used by several distros.
2. In order to support --main-symbols and --symbols properly, we need to
   report things ourselves.
3. I'm considering moving away from libdwfl in the long term.

We provide an escape hatch for now: setting the environment variable
DRGN_USE_LIBDWFL_REPORT=1 opts out of drgn's reporting and uses
libdwfl's.

Fixes #130.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-12-08 12:11:10 -08:00
Omar Sandoval
844d82848c libdrgn: add partial support for .gnu_debugaltlink
Issue #130 reported an "unknown attribute form 0x1f20" from drgn. 0x1f20
is DW_FORM_GNU_ref_alt, which is a reference to a DIE in an alternate
file. Similarly, DW_FORM_GNU_strp_alt is a string in an alternate file.
The alternate file is specified by the .gnu_debugaltlink section. This
is generated by dwz, which is used by at least Fedora and Debian.

libdwfl already finds the alternate debug info file, so we can save its
.debug_info and .debug_str and use those to support DW_FORM_GNU_ref_alt
and DW_FORM_GNU_strp_alt in the DWARF index.

Imported units are going to be more work to support in the DWARF index,
but this at least lets drgn start up.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-12-07 13:49:09 -08:00
Omar Sandoval
c0d8709b45 Update copyright headers to Meta
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-21 15:59:44 -08:00
Omar Sandoval
5591d199b1 libdrgn: debug_info: split DWARF support into its own file
Continuing the refactoring from the previous commit, move the DWARF code
from debug_info.c to its own file, leaving only the generic ELF file
management in debug_info.c

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-18 15:08:54 -08:00
Omar Sandoval
c6b2bc4181 libdrgn: debug_info: split ORC support into its own file
debug_info.c currently contains code for managing ELF files with
debugging information, for parsing DWARF, and for parsing ORC. Let's
split it up, starting by moving ORC support to its own file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-18 15:08:04 -08:00
Omar Sandoval
1339dc6a2f libdrgn: hash_table: move entry_to_key to DEFINE_HASH_TABLE_FUNCTIONS()
DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type.
Move that argument (and some of the derived constants) to
DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table
types. As a nice side effect, it also reduces the size of common header
files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-10-23 00:52:23 -07:00
Omar Sandoval
26001733f6 libdrgn: debug_info: support DWARF 5 location lists
The DWARF 5 format is a little more complicated than DWARF 2-4 but
functionally very similar.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-09 01:52:08 -07:00
Omar Sandoval
215f7d79d7 libdrgn: debug_info: implement DW_OP_{addr,const}x
These were added in DWARF 5. They need to know the CU that they're being
evaluated in, but the parameters for drgn_eval_dwarf_expression() were
already getting unwieldy. Wrap the evaluation context in a new struct
drgn_dwarf_expression_context, add the additional CU information, and
implement the operations.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-09 01:52:08 -07:00
Omar Sandoval
81053a1c57 libdrgn: dwarf_index: support DWARF 5
The main changes are:

1. Skipping the new attribute forms.
2. Handling DW_FORM_strx*, DW_FORM_line_strp, and DW_FORM_implicit_const
   for the attributes that we care about.
3. Parsing the new unit header format.
4. Parsing the new line number program header format.

Note that Clang currently produces an incorrect DWARF 5 line number
program header for the Linux kernel (https://reviews.llvm.org/D105662),
so some types are not properly deduplicated in that case.

Closes #104.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-09 01:51:59 -07:00
Omar Sandoval
bc85767e5f libdrgn: support looking up parameters and variables in stack traces
After all of the preparatory work, the last two missing pieces are a way
to find a variable by name in the list of scopes that we saved while
unwinding, and a way to find the containing scopes of an inlined
function. With that, we can finally look up parameters and variables in
stack traces.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
0e113ecc8d libdrgn: debug_info: add drgn_find_die_ancestors()
This will be used for finding the ancestors of the abstract instance
root corresponding to a concrete inlined instance root for variable
lookups in inlined functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
d8d4157346 libdrgn: debug_info: add drgn_debug_info_module_find_dwarf_scopes()
This will be used for finding functions, inlined functions, and blocks
containing a PC for stack unwinding and variable lookups.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
d5b68455b8 libdrgn: debug_info: save .debug_loc
.debug_loc will be used for variable resolution.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Jay Kamat
9dabec1264 libdrgn: add support for parsing type units
Adds support for parsing of type units as enabled by
-fdebug-types-section. If a module has both a debug info section and
type unit section, both are read.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-04-23 02:37:31 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
630d39e345 libdrgn: add ORC unwinder
The Linux kernel has its own stack unwinding format for x86-64 called
ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is
essentially a simplified, less complete version of DWARF CFI. ORC is
generated by analyzing machine code, so it is present for all but a few
ignored functions. In contrast, DWARF CFI is generated by the compiler
and is therefore missing for functions written in assembly and inline
assembly (which is widespread in the kernel).

This implements an ORC stack unwinder: it applies ELF relocations to the
ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule
kind, parses and efficiently stores ORC data, and translates ORC to drgn
CFI rules. This will allow us to stack trace through assembly code,
interrupts, and system calls.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-29 10:01:52 -07:00
Omar Sandoval
eec67768aa libdrgn: replace elfutils DWARF unwinder with our own
The elfutils DWARF unwinder has a couple of limitations:

1. libdwfl doesn't have an interface for getting register values, so we
   have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
   error even on success, so we have to squirrel away our own errors in
   the callback.

Furthermore, there are a couple of things that will be easier with our
own unwinder:

1. Integrating unwinding using ORC will be easier when we're handling
   unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
   expression evaluation.

Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:43:12 -07:00
Omar Sandoval
fdaf7790a9 libdrgn: add DWARF call frame information parsing
In preparation for adding our own unwinder, add support for parsing and
finding DWARF/EH call frame information. Use a generic representation of
call frame information so that we can support other formats like ORC in
the future.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
cc1a5606d0 libdrgn: debug_info: save platform per module
Stack unwinding depends on some platform-specific information. If for
some reason a program has debugging information with different
platforms, then we need to make sure that while we're unwinding the
stack, we don't end up in a frame with a different platform, because the
registers won't make sense. Additionally, we should parse debugging
information using the module's platform rather than the program's
platform, which may not match. So, cache the platform derived from each
module's ELF file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
6065fc87af libdrgn: debug_info: save .debug_frame, .eh_frame, .text, and .got
These sections are needed for stack unwinding. However, .debug_frame and
.eh_frame don't need to be read right away, and .text and .got don't
need to be read at all, so partition them accordingly. Also, check that
the sections are specifically SHT_PROGBITS rather than not SHT_NOBITS.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
7eab40aaeb libdrgn: rename drgn_error_debug_info() to drgn_error_debug_info_scn()
An upcoming change will introduce a similar function for when the
section isn't known. Rename the original so that the new one can take
its name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-10 02:07:16 -08:00
Omar Sandoval
aaa98ccae3 libdrgn: consistently use __ for __attribute__ names
In some places, we add __ preceding and following an attribute name, and
in some places, we don't. Let's make it consistent. We might as well opt
for the __ to make clashes with macros less likely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 03:16:23 -08:00
Omar Sandoval
5975d19580 libdrgn: report better errors when parsing DWARF/kmod index
If the DWARF index encounters any error while parsing, it returns an
error saying only "debug information is truncated", which makes it hard
to track down parsing errors. The kmod index parser silently swallows
errors. For both, replace the mread functions with a higher-level
binary_buffer interface that can include more information including the
location of the error. For example:

  /tmp/mybinary: .debug_info+0x4: expected at least 56 bytes, have 55

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-13 17:00:07 -08:00
Omar Sandoval
756e5d27ad libdrgn: debug_info: put sections in an array (again)
Back in commit 9ce9094ee0 ("libdrgn: dwarf_index: don't copy sections
into each CU"), I changed the sections to be individual members. The
next change will be easier if they're in an array.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 16:22:04 -08:00
Omar Sandoval
de6a4e07ae libdrgn: fix Doxygen
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-30 01:32:33 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
f83bb7c71b libdrgn: move debugging information tracking into drgn_debug_info
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.

This also reworks a couple of details of loading debugging information:

- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
  a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
  compilation units (by using OpenMP tasks).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
3ac9ae357b libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info
The current name is too verbose. Let's go with a shorter, more generic
name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-11 17:41:23 -07:00