Commit Graph

102 Commits

Author SHA1 Message Date
Omar Sandoval
5fc879ef3e libdrgn: debug_info: limit number of DWARF expression operations executed
A malformed DWARF expression can easily get us into an infinite loop.
Avoid this by capping the number of operations that we'll execute.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
43b90ffb1b libdrgn: debug_info: add missing stack size check for DW_OP_deref
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-23 13:14:28 -07:00
Omar Sandoval
92fd967a3a libdrgn: print uint8_t as hex with PRIx8 format, not x
In practice, they're probably always the same, but PRIx8 is more
correct.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-07 15:34:30 -07:00
Omar Sandoval
e0921c5bdb libdrgn: don't use OpenMP tasking
libomp (at least in LLVM 9 and 10) seems to have buggy OpenMP tasking
support. See commit 1cc3868955 ("CI: temporarily disable Clang") for
one example. OpenMP tasks aren't buying us much; they simplify DWARF
index updates in some places but complicate it in others. Let's ditch
tasks and go back to building an array of CUs to index similar to what
we did before commit f83bb7c71b ("libdrgn: move debugging information
tracking into drgn_debug_info"). There is no significant performance
difference.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-06 16:56:02 -07:00
Jay Kamat
6be21f674a libdrgn: follow DW_AT_signature when parsing DWARF types
When using type units, skeleton declarations are made instead of
concrete ones. However, these declarations have signature tags attached
that point to the type unit with the definition, so we can simply follow
the signature to get the concrete type.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-04-23 02:37:31 -07:00
Jay Kamat
9dabec1264 libdrgn: add support for parsing type units
Adds support for parsing of type units as enabled by
-fdebug-types-section. If a module has both a debug info section and
type unit section, both are read.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-04-23 02:37:31 -07:00
Omar Sandoval
33300d426e libdrgn: debug_info: don't overwrite Dwarf_Die passed to drgn_type_from_dwarf_internal()
If the DIE passed to drgn_type_from_dwarf_internal() is a declaration,
then we overwrite it with dwarf_offdie(). As far as I can tell, this
doesn't break anything at the moment, but it's sketchy to overwrite an
input parameter and may cause issues in the future. Use a temporary DIE
on the stack in this case instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-23 02:37:31 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
b772432a86 libdrgn: cfi: don't rely on member containing a flexible array
Clang enables -Wgnu-variable-sized-type-not-at-end by default, which
warns for DRGN_CFI_ROW():

  arch_x86_64.c:735:27: warning: field 'row' with variable sized type 'struct drgn_cfi_row' not at the end of a struct or class is a GNU extension
        [-Wgnu-variable-sized-type-not-at-end]
          .default_dwarf_cfi_row = DRGN_CFI_ROW(

DRGN_CFI_ROW() is gnarly anyways, so instead of having it expand to a
pointer expression relying on this GCC extension, make it expand to an
initializer. Then, we can initialize default_dwarf_cfi_row as a separate
variable rather than directly in the initializer for struct
drgn_architecture_info.

This still relies on a GCC extension for static initialization of
flexible array members, but apparently Clang is okay with that one by
default (-Wgnu-flexible-array-initializer must be enabled explictly or
by -Wgnu or -Wpedantic).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 16:19:21 -07:00
Omar Sandoval
630d39e345 libdrgn: add ORC unwinder
The Linux kernel has its own stack unwinding format for x86-64 called
ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is
essentially a simplified, less complete version of DWARF CFI. ORC is
generated by analyzing machine code, so it is present for all but a few
ignored functions. In contrast, DWARF CFI is generated by the compiler
and is therefore missing for functions written in assembly and inline
assembly (which is widespread in the kernel).

This implements an ORC stack unwinder: it applies ELF relocations to the
ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule
kind, parses and efficiently stores ORC data, and translates ORC to drgn
CFI rules. This will allow us to stack trace through assembly code,
interrupts, and system calls.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-29 10:01:52 -07:00
Omar Sandoval
090064f20d libdrgn: x86-64: support R_X86_64_PC32 relocation type
This is used for .orc_unwind_ip for kernel modules.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 15:16:36 -07:00
Omar Sandoval
e0aaaf203d libdrgn: generalize applying ELF relocations
To support unwinding with ORC, we need to apply relocations to
.orc_unwind_ip, which libdwfl doesn't do. That means that we always need
to apply relocations on x86-64, not just as a fast path when the file's
byte order matches the host's. So, generalize handling of 64- vs 32-bit
and little- vs big-endian relocations, and move the handling of
relocation types to an arch-specific callback.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 15:16:35 -07:00
Omar Sandoval
da180b7274 libdrgn: handle errors from elf_strptr()
For some reason, we consistently ignore errors from elf_strptr(), but we
shouldn't.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 14:28:16 -07:00
Omar Sandoval
e5bc41f16c libdrgn: add latest elf.h and dwarf.h to support elfutils 0.165
The oldest LTS version of Ubuntu, 16.04, has elfutils 0.165. This
version is missing some ELF and DWARF definitions used by drgn. Add
copies of elf.h from glibc 2.33 and dwarf.h and elfutils/known-dwarf.h
from elfutils 0.183 to get the latest definitions and drop the minimum
required version of elfutils further to 0.165.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-21 23:18:39 -07:00
Serapheim Dimitropoulos
a68abd5de4 libdrgn: stretch minimum supported version of libelf to 0.170
Currently libdrgn requires libelf to be of version 0.175 or
later. This patch allows the library to be compiled with libelf
0.170 (the newest version supported by Ubuntu 18.04 LTS).

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
2021-03-21 14:28:29 -07:00
Omar Sandoval
eec67768aa libdrgn: replace elfutils DWARF unwinder with our own
The elfutils DWARF unwinder has a couple of limitations:

1. libdwfl doesn't have an interface for getting register values, so we
   have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
   error even on success, so we have to squirrel away our own errors in
   the callback.

Furthermore, there are a couple of things that will be easier with our
own unwinder:

1. Integrating unwinding using ORC will be easier when we're handling
   unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
   expression evaluation.

Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:43:12 -07:00
Omar Sandoval
35a1af7ad6 libdrgn: add DWARF expression evaluation
For DW_CFA_def_cfa_expression, DW_CFA_expression, and
DW_CFA_val_expression, we need to be evaluate a DWARF expression. Add an
interface for this. It doesn't yet support operations that aren't
applicable to CFI or some more exotic operations.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
fdaf7790a9 libdrgn: add DWARF call frame information parsing
In preparation for adding our own unwinder, add support for parsing and
finding DWARF/EH call frame information. Use a generic representation of
call frame information so that we can support other formats like ORC in
the future.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
cc1a5606d0 libdrgn: debug_info: save platform per module
Stack unwinding depends on some platform-specific information. If for
some reason a program has debugging information with different
platforms, then we need to make sure that while we're unwinding the
stack, we don't end up in a frame with a different platform, because the
registers won't make sense. Additionally, we should parse debugging
information using the module's platform rather than the program's
platform, which may not match. So, cache the platform derived from each
module's ELF file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
6065fc87af libdrgn: debug_info: save .debug_frame, .eh_frame, .text, and .got
These sections are needed for stack unwinding. However, .debug_frame and
.eh_frame don't need to be read right away, and .text and .got don't
need to be read at all, so partition them accordingly. Also, check that
the sections are specifically SHT_PROGBITS rather than not SHT_NOBITS.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
7eab40aaeb libdrgn: rename drgn_error_debug_info() to drgn_error_debug_info_scn()
An upcoming change will introduce a similar function for when the
section isn't known. Rename the original so that the new one can take
its name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-10 02:07:16 -08:00
Jay Kamat
4552d78f4a libdrgn: debug_info: try to find DIE specification when parsing type
Currently, we look up incomplete types by name, which can fail if the
name is ambiguous or the type is unnamed. Try finding the complete type
via the DW_AT_specification map in the DWARF index first.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-03-08 15:24:24 -08:00
Omar Sandoval
a24c0f5b33 libdrgn: clean up usage of drgn_stop
Use drgn_not_found where it's more appropriate, and check explicitly
against drgn_stop instead of err->code == DRGN_ERROR_STOP.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-05 12:46:06 -08:00
Omar Sandoval
25eb2abb1a libdrgn: add drgn_platform getters
Add low-level getters equivalent to the drgn_program platform-related
helpers and use them in places where we have checked or can assume that
the platform is known.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
e04eda9880 libdrgn: define HOST_LITTLE_ENDIAN
As a minor cleanup, instead of writing __BYTE_ORDER__ ==
__ORDER_LITTLE_ENDIAN__ everywhere, define and use HOST_LITTLE_ENDIAN.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Jay Kamat
c22e501295 libdrgn: debug_info: fix parsing specifications of declarations
drgn_compound_type_from_dwarf() and drgn_enum_type_from_dwarf() check
the DW_AT_declaration flag to decide whether the type is a declaration
of an incomplete type or a definition of a complete type. However, they
check DW_AT_declaration with dwarf_attr_integrate(), which follows the
DW_AT_specification reference if it is present. The DIE referenced by
DW_AT_specification typically is a declaration, so this erroneously
identifies definitions as declarations. Additionally, if
drgn_debug_info_find_complete() finds the same definition, we can end up
recursing until we hit the DWARF parsing depth limit. Fix it by not
using dwarf_attr_integrate() for DW_AT_declaration.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-02-25 10:46:34 -08:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
72b4aa9669 libdrgn: clean up object initialization
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 17:43:14 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
71c6ac6927 libdrgn: use drgn_debug_info_module instead of Dwfl_Module in more places
It's easier to go from drgn_debug_info_module to Dwfl_Module than the
other direction, and I'd rather use the "higher-level"
drgn_debug_info_module wherever possible. So, store
drgn_debug_info_module in the DWARF index (which also saves a
dereference while building the index), and pass around
drgn_debug_info_module when parsing types/objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-27 11:17:41 -08:00
Omar Sandoval
bbefc573d8 libdrgn: debug_info: make sure DW_TAG_template_value_parameter has value
Otherwise, an invalid DW_TAG_template_value_parameter can be confused
for a type parameter.

Fixes: 352c31e1ac ("Add support for C++ template parameters")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 12:07:46 -08:00
Omar Sandoval
2c612ea97f libdrgn: fix address of global per-CPU variables with KASLR
The address of a per-CPU variable is really an offset into the per-CPU
area, but we're applying the load bias (i.e., KASLR offset) to it as if
it were an address, resulting in an invalid pointer when it's eventually
passed to per_cpu_ptr().

Fix this by applying the bias only if it the address is in the module's
address range. This heuristic avoids any Linux kernel-specific logic;
hopefully it doesn't have any undesired side effects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 10:14:50 -08:00
Omar Sandoval
a7962e9477 libdrgn: debug_info: pass around Dwfl_Module instead of bias
We're going to need the module start and end in
drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and
get the bias when we need it. This means we don't need the bias from
drgn_dwarf_index_get_die(), so get rid of that, too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 10:12:29 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
b6958f920c libdrgn: debug_info: move object parsing code in debug_info.c
In preparation for calling the object parsing code from the type parsing
code, move it up in the file (and update the coding style in
drgn_object_from_dwarf_enumerator() while we're at it).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
be1bb279aa libdrgn: debug_info: pass DIE bias when parsing types
This will be needed for types containing reference objects.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
934dd36302 libdrgn: remove unused name parameter from drgn_object_from_dwarf_{subprogram,variable}()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 12:16:29 -08:00
Omar Sandoval
a57c26ed32 libdrgn: fix zero-length array GCC < 9.0 workaround for qualified types
We're not applying the zero-length array workaround when the array type
is qualified. Make sure we pass through can_be_incomplete_array when
parsing DW_TAG_{const,restrict,volatile,atomic}_type.

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 11:21:57 -08:00
Omar Sandoval
ca7682650d libdrgn: rename drgn_type_from_dwarf_child() to drgn_type_from_dwarf_attr()
The type comes from the DW_AT_type attribute of the DIE, not a child
DIE, so this is a better name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 11:02:27 -08:00
Omar Sandoval
798f0887a5 libdrgn: simplify language fall back handling
If the language for a DWARF type is not found or unrecognized, we should
fall back to the global default, not the program default (the program
default language is for language-specific operations on the program, so
DWARF parsing shouldn't depend on it). Add a fall_back parameter to
drgn_language_from_die() and use it in DWARF parsing, and replace
drgn_language_or_default() with a drgn_default_language variable.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 10:46:35 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
97fbedec1f libdrgn: return unavailable objects for DWARF objects without value or address
Now that we have the concept of unavailable objects, use it for DWARF
where appropriate.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 14:15:09 -08:00
Omar Sandoval
edb1fe7f2f libdrgn: rename drgn_object_kind to drgn_object_encoding
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:26 -08:00
Omar Sandoval
5975d19580 libdrgn: report better errors when parsing DWARF/kmod index
If the DWARF index encounters any error while parsing, it returns an
error saying only "debug information is truncated", which makes it hard
to track down parsing errors. The kmod index parser silently swallows
errors. For both, replace the mread functions with a higher-level
binary_buffer interface that can include more information including the
location of the error. For example:

  /tmp/mybinary: .debug_info+0x4: expected at least 56 bytes, have 55

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-13 17:00:07 -08:00
Omar Sandoval
756e5d27ad libdrgn: debug_info: put sections in an array (again)
Back in commit 9ce9094ee0 ("libdrgn: dwarf_index: don't copy sections
into each CU"), I changed the sections to be individual members. The
next change will be easier if they're in an array.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 16:22:04 -08:00
Omar Sandoval
f0a3629c26 libdrgn: debug_info: add dwarf_tag_str() and use it for error messages
There are several places where we manually pass around the string name
of a tag so it can be used for error messages. Do it programatically
instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 16:22:04 -08:00
Omar Sandoval
3c5d22637e libdrgn: clean up hash function APIs and improve documentation
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 16:20:08 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
f83bb7c71b libdrgn: move debugging information tracking into drgn_debug_info
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.

This also reworks a couple of details of loading debugging information:

- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
  a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
  compilation units (by using OpenMP tasks).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
3ac9ae357b libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info
The current name is too verbose. Let's go with a shorter, more generic
name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-11 17:41:23 -07:00