Commit Graph

669 Commits

Author SHA1 Message Date
Omar Sandoval
81053a1c57 libdrgn: dwarf_index: support DWARF 5
The main changes are:

1. Skipping the new attribute forms.
2. Handling DW_FORM_strx*, DW_FORM_line_strp, and DW_FORM_implicit_const
   for the attributes that we care about.
3. Parsing the new unit header format.
4. Parsing the new line number program header format.

Note that Clang currently produces an incorrect DWARF 5 line number
program header for the Linux kernel (https://reviews.llvm.org/D105662),
so some types are not properly deduplicated in that case.

Closes #104.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-09 01:51:59 -07:00
Omar Sandoval
347c578aa0 libdrgn: debug_info: don't open-code drgn_platform_address_size()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-07 15:03:03 -07:00
Omar Sandoval
add17a9a36 libdrgn: stack_trace: fix source info without .debug_aranges
dwfl_module_getsrc() relies on .debug_aranges to find the CU containing
the PC. If the module has a missing or incomplete .debug_aranges, it
fails. This lookup is actually redundant since we already found the CU
when we unwound the stack. Use the libdw helpers that take the CU DIE
instead to avoid this. We also need to save the CU for frames where we
found it but couldn't find the subprogram (typically assembly files).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-07 13:41:17 -07:00
Omar Sandoval
fbe102f37e libdrgn: debug_info: handle incomplete .debug_aranges
Clang does not generate .debug_aranges by default, but the GNU toolchain
does. This means that a Linux kernel binary compiled with Clang and GNU
binutils will have ranges in .debug_aranges for assembly files and
nothing else. This breaks our assumption that a non-empty .debug_aranges
has ranges for every compilation unit. Fix it by always falling back to
checking every CU if a range was not found in .debug_aranges.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-07 11:25:13 -07:00
Omar Sandoval
2e04e6b73c libdrgn: binary_buffer: handle non-canonical LEB128 numbers
LEB128 allows for redundant zero/sign bits, but we currently always
treat extra bytes as overflow. Let's check those bytes correctly.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 21:39:31 -07:00
Omar Sandoval
d12d4368b8 libdrgn: support passing debug info files to load_debug_info example program
And don't set the target by default; -k must be given explicitly now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 16:58:47 -07:00
Omar Sandoval
73d5a207c8 libdrgn: dwarf_index: fix skipping DW_FORM_ref_addr in DWARF 2
In DWARF 2, DW_FORM_ref_addr has the size of an address, not a size
depending on the format.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 11:53:34 -07:00
Omar Sandoval
86e966fbf8 libdrgn: dwarf_index: handle DW_FORM_block
Somehow I missed this form, and I've never seen it used. It's the same
as DW_FORM_exprloc for our purposes, so it's an easy fix.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 01:34:52 -07:00
Omar Sandoval
018b00cede libdrgn: binary_buffer: check bounds with 64-bit size
There are a few places in the DWARF indexing code that we skip a 64-bit
size. On 32-bit systems, this can wrap if the count is greater than
SIZE_MAX. Rather than requiring vigilance against this, change the size
to uint64_t.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 01:27:47 -07:00
Omar Sandoval
52df3cb5ff libdrgn: dwarf_index: properly return error when hashing path fails
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 01:21:47 -07:00
Omar Sandoval
157f8ed7dc libdrgn: dwarf_index: #define FNV constants
Older versions of GCC and Clang don't accept const variables for
initializers ("error: initializer element is not constant"), so #define
the FNV constants instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-28 11:08:30 -07:00
Omar Sandoval
87809f7692 libdrgn: dwarf_index: improve file path hashing for deduplication
We currently don't include the compilation directory when hashing file
names for deduplication. This can cause us to incorrectly deduplicate a
definition if, for example, two libraries have a definition with the
same name in files with the same name. Fix this by hashing the full file
path including the compilation directory.

This also requires reworking our strategy for path normalization to
better handle ".." components, since directories may end up outside of
the compilation directory. The new strategy keeps a linked list of
hashes (now FNV-1a instead of SipHash) for each parent directory. This
is actually more efficient than the previous approach, offsetting the
cost of the extra hash computations for the compilation directory. It
also correctly handles file names in the line number program header
which consist of multiple components.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-25 18:03:01 -07:00
Omar Sandoval
82824c0e5f libdrgn: path: simplify logic in path_iterator_next()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-25 18:03:01 -07:00
Omar Sandoval
57cc0deb98 libdrgn: replace struct path_iterator_component with struct string
The former is the same as the latter with less generic naming.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-25 18:03:01 -07:00
Omar Sandoval
420d2bb1dc libdrgn: dwarf_index: fix DW_AT_strp bounds check
The string must be null terminated, so there must be at least one byte
left in .debug_str.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-25 17:50:08 -07:00
Omar Sandoval
c4b174af74 libdrgn: fix kdump format support
I missed the drgn_program_set_kdump() code path when making sure that we
set the platform before adding memory segments.

Fixes: 0e3054a0ba ("libdrgn: make addresses wrap around when reading memory")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-09 15:30:48 -07:00
Omar Sandoval
6b2dda3f95 libdrgn: bring back dwfl_core_file_report() bug workaround
This workaround was originally present in commit e5874ad18a ("libdrgn:
use libdwfl"). We dropped in in commit 6a13d74c0c ("libdrgn: build
with bundled elfutils") because the bundled version of elfutils had the
fix. We forgot to bring it back in commit 4c5c5f3842 ("Remove bundled
version of elfutils") even though we support versions without the fix.

Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-09 15:16:12 -07:00
Omar Sandoval
82ca5634b5 libdrgn: fix copying value to big-endian from little-endian
copy_lsbytes() doesn't copy enough bytes when copying from a smaller
little-endian value to a larger big-endian value. This was caught by the
test cases for DW_OP_deref{,_size}, but it can affect other places when
debugging a little-endian target from a big-endian host or vice-versa.

Closes #105.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-08 12:24:20 -07:00
Omar Sandoval
5a03d6b13f drgn 0.0.13
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 16:17:11 -07:00
Omar Sandoval
faad25d7b2 libdrgn: debug_info: fix address of objects with size zero
The stack trace variable work introduced a regression that causes
objects with size zero to always be marked absent even if they have an
address. This matters because GCC sometimes seems to omit the complete
array type for arrays declared without a length, so an array variable
can end up with an incomplete array type. I saw this with the
"swapper_spaces" variable in mm/swap_state.c from the Linux kernel.

Make sure to use the address of an empty piece if the variable is also
empty.

Fixes: ffcb9ccb19 ("libdrgn: debug_info: implement creating objects from DWARF location descriptions")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 15:46:22 -07:00
Omar Sandoval
f7fe93e573 cli: show elfutils version in use
drgn depends heavily on libelf and libdw, so it's useful to know what
version we're using. Add drgn._elfutils_version and use that in the CLI
and in the test cases where we currently check the libdw version.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 11:10:50 -07:00
Omar Sandoval
6357cea46b drgn 0.0.12
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 01:13:28 -07:00
Omar Sandoval
bc85767e5f libdrgn: support looking up parameters and variables in stack traces
After all of the preparatory work, the last two missing pieces are a way
to find a variable by name in the list of scopes that we saved while
unwinding, and a way to find the containing scopes of an inlined
function. With that, we can finally look up parameters and variables in
stack traces.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
38573cfdde libdrgn: stack_trace: pretty print frames and add frames for inline functions
If we want to access a parameter or local variable in an inlined
function, then we need a stack frame for that function. It's also much
more useful to see inlined functions in the stack trace in general. So,
when we've unwound the registers for a stack frame, walk the debugging
information to find all of the (possibly inlined) functions at the
program counter, and add a drgn stack frame for each of those.

Also add StackFrame.name and StackFrame.is_inline so that we can
distinguish inline frames. Also add StackFrame.source() to get the
filename and line and column numbers. Finally, add the source code
location to pretty-printed stack traces and add pretty-printing for
individual stack frames that includes extra information.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
0e113ecc8d libdrgn: debug_info: add drgn_find_die_ancestors()
This will be used for finding the ancestors of the abstract instance
root corresponding to a concrete inlined instance root for variable
lookups in inlined functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
d8d4157346 libdrgn: debug_info: add drgn_debug_info_module_find_dwarf_scopes()
This will be used for finding functions, inlined functions, and blocks
containing a PC for stack unwinding and variable lookups.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
b6d810b344 libdrgn: debug_info: add DWARF DIE iterator
We have a couple of upcoming use cases for iterating through all of the
DIEs in a module: searching for scopes and searching for a DIE's
ancestors. Add a DIE iterator interface to abstract away the details of
walking DIEs and allows us to efficiently track ancestors.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
ffcb9ccb19 libdrgn: debug_info: implement creating objects from DWARF location descriptions
Add support for evaluating a DWARF location description and translating
it into a drgn object. In this commit, this is just used for global
variables, but an upcoming commit will wire this up to stack traces for
parameters and local variables.

There are a few locations that drgn's object model can't represent yet.
DW_OP_piece/DW_OP_bit_piece can describe objects that are only partially
known or partially in memory; we approximate these where we can. We
don't have a good way to support DW_OP_implicit_pointer at all yet.

This also adds test cases for DWARF expressions, which we couldn't
easily test before.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
8335450ecb libdrgn: debug_info: implement DW_OP_fbreg
Implement looking up location descriptions and evaluating DW_OP_fbreg.
This isn't actually used yet since CFI expressions don't have a current
function DIE, but it will be used for parameters/local variables in
stack traces.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
d5b68455b8 libdrgn: debug_info: save .debug_loc
.debug_loc will be used for variable resolution.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
e105be6c18 libdrgn: debug_info: add helper to cache module section
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
dcda688c9a libdrgn: debug_info: parenthesize PUSH() macro argument
It doesn't make a difference anywhere it's currently used, but let's do
it just in case that changes in the future.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
5fc879ef3e libdrgn: debug_info: limit number of DWARF expression operations executed
A malformed DWARF expression can easily get us into an infinite loop.
Avoid this by capping the number of operations that we'll execute.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
0e3054a0ba libdrgn: make addresses wrap around when reading memory
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:

1. Reworking drgn_memory_reader to work with an inclusive address range
   so that a segment can contain UINT64_MAX. drgn_memory_reader remains
   agnostic to the maximum address and requires that address ranges do
   not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
   drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_reader() to
   drgn_program_read_memory() now that they are no longer equivalent.

(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-03 17:49:29 -07:00
Omar Sandoval
e5ff1ea7ac libdrgn: program: use preset platform in drgn_program_set_core_dump()
If the program already had a platform set, we should its callbacks
instead of the ones from the ELF file's platform.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-03 17:04:28 -07:00
Omar Sandoval
43b90ffb1b libdrgn: debug_info: add missing stack size check for DW_OP_deref
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-23 13:14:28 -07:00
Omar Sandoval
ad37c79cba libdrgn: python: add documentation and type annotation for Program.__contains__()
drgn.Program has supported the "in" operator since commit 25e7a9d3b8
("libdrgn/python: implement Program.__contains__"), but it's
undocumented and unannotated. Add a type annotation with a docstring
along with a METH_COEXIST method.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-12 16:26:56 -07:00
Omar Sandoval
92fd967a3a libdrgn: print uint8_t as hex with PRIx8 format, not x
In practice, they're probably always the same, but PRIx8 is more
correct.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-07 15:34:30 -07:00
Omar Sandoval
e0921c5bdb libdrgn: don't use OpenMP tasking
libomp (at least in LLVM 9 and 10) seems to have buggy OpenMP tasking
support. See commit 1cc3868955 ("CI: temporarily disable Clang") for
one example. OpenMP tasks aren't buying us much; they simplify DWARF
index updates in some places but complicate it in others. Let's ditch
tasks and go back to building an array of CUs to index similar to what
we did before commit f83bb7c71b ("libdrgn: move debugging information
tracking into drgn_debug_info"). There is no significant performance
difference.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-06 16:56:02 -07:00
Jay Kamat
95646b47c9 libdrgn: dwarf_index: add support for DW_FORM_indirect
First, add instructions for DW_FORM_indirect. Then, we can call the
function to convert a form to an instruction whenever we see an indirect
instruction.

Note that without elfutils commit d63b26b8d21f ("libdw: handle
DW_FORM_indirect when reading attributes") (queued for elfutils
0.184), DW_FORM_indirect will cause errors later when parsing with
libdw.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-05-04 16:56:54 -07:00
Omar Sandoval
609a1cafc6 libdrgn: dwarf_index: check for attribute forms more strictly
Rather than silently ignoring attributes whose form we don't recognize,
return an error. This way, we won't mysteriously skip indexing DIEs.
While we're doing this, split the form -> instruction mapping to its own
functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-04 16:56:54 -07:00
Omar Sandoval
2ad52cb5f4 libdrgn: add option to time load_debug_info example program
I often use examples/load_debug_info to benchmark loading/DWARF
indexing, so add a -T option that prints the time it takes to load debug
info.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-23 09:29:44 -07:00
Omar Sandoval
2d40d6e146 libdrgn: add configure~ to .gitignore
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-23 09:18:16 -07:00
Jay Kamat
6be21f674a libdrgn: follow DW_AT_signature when parsing DWARF types
When using type units, skeleton declarations are made instead of
concrete ones. However, these declarations have signature tags attached
that point to the type unit with the definition, so we can simply follow
the signature to get the concrete type.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-04-23 02:37:31 -07:00
Jay Kamat
9dabec1264 libdrgn: add support for parsing type units
Adds support for parsing of type units as enabled by
-fdebug-types-section. If a module has both a debug info section and
type unit section, both are read.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-04-23 02:37:31 -07:00
Omar Sandoval
33300d426e libdrgn: debug_info: don't overwrite Dwarf_Die passed to drgn_type_from_dwarf_internal()
If the DIE passed to drgn_type_from_dwarf_internal() is a declaration,
then we overwrite it with dwarf_offdie(). As far as I can tell, this
doesn't break anything at the moment, but it's sketchy to overwrite an
input parameter and may cause issues in the future. Use a temporary DIE
on the stack in this case instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-23 02:37:31 -07:00
Omar Sandoval
155ec92ef2 libdrgn: fix reading 32-bit float object values on big-endian
Closes #99.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-22 09:45:41 -07:00
Omar Sandoval
0e2703dd4e libdrgn: python: use _PyDict_GetItemIdWithError()
CPython commit fb5db7ec5862 ("bpo-42006: Stop using PyDict_GetItem,
PyDict_GetItemString and _PyDict_GetItemId. (GH-22648)") (in v3.10)
removed _PyDict_GetItemId() because it suppresses errors. Use
_PyDict_GetItemIdWithError() instead (which we should've been using
anyways).

Closes #101.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-22 01:17:22 -07:00
Omar Sandoval
c768e97394 libdrgn: python: use _Thread_local instead of PyThreadState for drgn_in_python
Using a Python dictionary for this is much more heavyweight than just
using a thread-local variable (with no benefit as far as I can tell).
This also gets rid of a call to _PyDict_GetItem().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-22 01:17:12 -07:00
Omar Sandoval
08498967f7 libdrgn: configure with large file support
/proc/pid/mem is indexed by address. On 32-bit systems, addresses may be
out of the range of a 32-bit signed off_t. This results in pread()
returning EINVAL in drgn_read_memory_file(). Use AC_SYS_LARGEFILE in
configure.ac so that we use 64-bit off_t by default.

Closes #98.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-21 13:34:31 -07:00
Omar Sandoval
78b4188dd9 drgn 0.0.11
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:50:09 -07:00
Omar Sandoval
e7367a4a94 libdrgn: Makefile: remove generated source files from CLEANFILES
We don't actually want make clean to remove the generated files that are
included in a distribution tarball, because then the user will need to
regenerate them, and they might not have the dependencies installed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:31:14 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
76d3348a6d libdrgn: hash_table: mark table##_delete_iterator() as unused
GCC doesn't warn about table##_delete_iterator() being unused because it
is inline, but Clang does, so add the unused attribute.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 16:28:46 -07:00
Omar Sandoval
acf722d315 libdrgn: hash_table: remove unused table##_chunk_set_capacity_scale()
The folly implementation calls this elsewhere, but we only need it in
table##_chunk_mark_eof(), so it was folded in there.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 16:27:27 -07:00
Omar Sandoval
b772432a86 libdrgn: cfi: don't rely on member containing a flexible array
Clang enables -Wgnu-variable-sized-type-not-at-end by default, which
warns for DRGN_CFI_ROW():

  arch_x86_64.c:735:27: warning: field 'row' with variable sized type 'struct drgn_cfi_row' not at the end of a struct or class is a GNU extension
        [-Wgnu-variable-sized-type-not-at-end]
          .default_dwarf_cfi_row = DRGN_CFI_ROW(

DRGN_CFI_ROW() is gnarly anyways, so instead of having it expand to a
pointer expression relying on this GCC extension, make it expand to an
initializer. Then, we can initialize default_dwarf_cfi_row as a separate
variable rather than directly in the initializer for struct
drgn_architecture_info.

This still relies on a GCC extension for static initialization of
flexible array members, but apparently Clang is okay with that one by
default (-Wgnu-flexible-array-initializer must be enabled explictly or
by -Wgnu or -Wpedantic).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 16:19:21 -07:00
Omar Sandoval
5c86e30b6e libdrgn: work around Clang __muloti4 for the third time
See commit 0cb77b303c ("libdrgn: work around Clang __muloti4 again")
and commit 2dd14ad522 ("libdrgn: work around "undefined reference to
'__muloti4'" when using Clang"). These keep sneaking in because I don't
have an old enough version of Clang lying around.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 15:30:07 -07:00
Omar Sandoval
301cc3f139 libdrgn: fix UBSan "applying zero offset to null pointer" errors
There are a couple of places where we compute `NULL + 0`, which is
undefined behavior. Add a helper to do this safely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 13:38:29 -07:00
Omar Sandoval
9c31f11e35 libdrgn: object: fix UBSan error for uninitialized boolean
drgn_object_reinit() and drgn_object_copy() can both load from an
uninitialized little_endian field, causing UBSan errors like:

  libdrgn/object.h:105:27: runtime error: load of value 68, which is not a valid value for type '_Bool'

This only happens when little_endian isn't valid for the type and won't
be used anyways, but it's easy enough to work around.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 13:38:13 -07:00
Omar Sandoval
c9dc7fd574 libdrgn: type: fix memcpy() undefined behavior
It's undefined behavior to pass NULL to memcpy() even if the length is
zero. See also commit a17215e984 ("libdrgn: dwarf_index: fix memcpy()
undefined behavior").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 13:38:13 -07:00
Omar Sandoval
beb0c9d640 drgn 0.0.10 2021-03-31 13:32:05 -07:00
Omar Sandoval
f285764f8a Include full libdrgn distribution in drgn sdist
Building drgn from an sdist currently requires autotools and gawk
because libdrgn in the sdist is more or less a git checkout. It's more
user-friendly to include the autotools output and generated code. Do
this by extending the sdist command to include a full libdrgn
distribution with `make distdir`.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-30 23:19:38 -07:00
Omar Sandoval
630d39e345 libdrgn: add ORC unwinder
The Linux kernel has its own stack unwinding format for x86-64 called
ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is
essentially a simplified, less complete version of DWARF CFI. ORC is
generated by analyzing machine code, so it is present for all but a few
ignored functions. In contrast, DWARF CFI is generated by the compiler
and is therefore missing for functions written in assembly and inline
assembly (which is widespread in the kernel).

This implements an ORC stack unwinder: it applies ELF relocations to the
ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule
kind, parses and efficiently stores ORC data, and translates ORC to drgn
CFI rules. This will allow us to stack trace through assembly code,
interrupts, and system calls.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-29 10:01:52 -07:00
Omar Sandoval
090064f20d libdrgn: x86-64: support R_X86_64_PC32 relocation type
This is used for .orc_unwind_ip for kernel modules.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 15:16:36 -07:00
Omar Sandoval
e0aaaf203d libdrgn: generalize applying ELF relocations
To support unwinding with ORC, we need to apply relocations to
.orc_unwind_ip, which libdwfl doesn't do. That means that we always need
to apply relocations on x86-64, not just as a fast path when the file's
byte order matches the host's. So, generalize handling of 64- vs 32-bit
and little- vs big-endian relocations, and move the handling of
relocation types to an arch-specific callback.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 15:16:35 -07:00
Omar Sandoval
63672be809 libdrgn: linux_kernel: save module .init section addresses
Linux kernel modules usually contain ELF relocations in DWARF and ORC
sections for symbols in .init sections. Since we ignore .init sections
entirely in cache_kernel_module_sections(), these relocations end up
being based on an address of 0 (so, e.g., a function from .init.text
could be reported as having an address of 0x0). It makes a little more
sense to use the address where the .init section was before it was
freed. So, let's update the sections' sh_addr but continue ignoring them
for determining the module's address range.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 15:13:47 -07:00
Omar Sandoval
da180b7274 libdrgn: handle errors from elf_strptr()
For some reason, we consistently ignore errors from elf_strptr(), but we
shouldn't.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 14:28:16 -07:00
Omar Sandoval
e5bc41f16c libdrgn: add latest elf.h and dwarf.h to support elfutils 0.165
The oldest LTS version of Ubuntu, 16.04, has elfutils 0.165. This
version is missing some ELF and DWARF definitions used by drgn. Add
copies of elf.h from glibc 2.33 and dwarf.h and elfutils/known-dwarf.h
from elfutils 0.183 to get the latest definitions and drop the minimum
required version of elfutils further to 0.165.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-21 23:18:39 -07:00
Serapheim Dimitropoulos
a68abd5de4 libdrgn: stretch minimum supported version of libelf to 0.170
Currently libdrgn requires libelf to be of version 0.175 or
later. This patch allows the library to be compiled with libelf
0.170 (the newest version supported by Ubuntu 18.04 LTS).

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
2021-03-21 14:28:29 -07:00
Omar Sandoval
da0280016c libdrgn: python: identify bit fields in TypeMember.__repr__
If a member is a bit field, then we should format it with the underlying
Object so that it shows the bit field size.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-17 12:02:53 -07:00
Omar Sandoval
55354b3038 libdrgn: use flexible array for pgtable_iterator::arch
There's no reason to use GCC's zero-length array extension for this. Use
a standard flexible array instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-16 16:18:49 -07:00
Omar Sandoval
38d4330fec libdrgn: clean up stale comment references and Doxygen warnings
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-16 16:15:43 -07:00
Omar Sandoval
671947d185 libdrgn: remove unused drgn_program::attached_dwfl_state
I missed this when I removed the code that used it.

Fixes: eec67768aa ("libdrgn: replace elfutils DWARF unwinder with our own")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-16 15:41:07 -07:00
Omar Sandoval
4c5c5f3842 Remove bundled version of elfutils
We currently bundle a version of elfutils with patches to export
additional stack tracing functionality. This has a few drawbacks:

- Most of drgn's build time is actually building elfutils.
- Distributions don't like packages that bundle verions of other
  packages.
- elfutils, and thus drgn, can't be built with clang.

Now that we've replaced the elfutils DWARF unwinder with our own, we
don't need the patches, so we can drop the bundled elfutils and fix
these issues.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-16 00:52:09 -07:00
Omar Sandoval
eec67768aa libdrgn: replace elfutils DWARF unwinder with our own
The elfutils DWARF unwinder has a couple of limitations:

1. libdwfl doesn't have an interface for getting register values, so we
   have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
   error even on success, so we have to squirrel away our own errors in
   the callback.

Furthermore, there are a couple of things that will be easier with our
own unwinder:

1. Integrating unwinding using ORC will be easier when we're handling
   unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
   expression evaluation.

Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:43:12 -07:00
Omar Sandoval
35a1af7ad6 libdrgn: add DWARF expression evaluation
For DW_CFA_def_cfa_expression, DW_CFA_expression, and
DW_CFA_val_expression, we need to be evaluate a DWARF expression. Add an
interface for this. It doesn't yet support operations that aren't
applicable to CFI or some more exotic operations.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
fdaf7790a9 libdrgn: add DWARF call frame information parsing
In preparation for adding our own unwinder, add support for parsing and
finding DWARF/EH call frame information. Use a generic representation of
call frame information so that we can support other formats like ORC in
the future.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
0a6aaaae5d libdrgn: define structure for storing processor register values
libdwfl stores registers in an array of uint64_t indexed by the DWARF
register number. This is suboptimal for a couple of reasons:

1. Although the DWARF specification states that registers should be
   numbered for "optimal density", in practice this isn't the case. ABIs
   include unused ranges of numbers and don't order registers based on
   how likely they are to be known (e.g., caller-saved registers usually
   aren't recovered while unwinding the stack, but they are often
   numbered before callee-saved registers).
2. This precludes support for registers larger than 64 bits, like SSE
   registers.

For our own unwinder, we want to store registers in an
architecture-specific format to solve both of these problems.

So, have each architecture define its layout with registers arranged for
space efficiency and convenience when parsing saved registers from core
dumps. Instead of generating an arch_foo.c file from arch_foo.c.in,
separately define the logical register order in an arch_foo.defs file,
and use it to generate an arch_foo.inc file that is included from
arch_foo.c. The layout is defined as a macro in arch_foo.c. While we're
here, drop some register definitions that aren't useful at the moment.

Then, define struct drgn_register_state to efficiently store registers
in the defined format.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
cc1a5606d0 libdrgn: debug_info: save platform per module
Stack unwinding depends on some platform-specific information. If for
some reason a program has debugging information with different
platforms, then we need to make sure that while we're unwinding the
stack, we don't end up in a frame with a different platform, because the
registers won't make sense. Additionally, we should parse debugging
information using the module's platform rather than the program's
platform, which may not match. So, cache the platform derived from each
module's ELF file.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
6065fc87af libdrgn: debug_info: save .debug_frame, .eh_frame, .text, and .got
These sections are needed for stack unwinding. However, .debug_frame and
.eh_frame don't need to be read right away, and .text and .got don't
need to be read at all, so partition them accordingly. Also, check that
the sections are specifically SHT_PROGBITS rather than not SHT_NOBITS.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
744cc414d3 libdrgn: add copy_lsbytes()
It will be used to copy register values.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:48 -07:00
Omar Sandoval
b0a6d12501 libdrgn: binary_buffer: add binary_buffer_next_[us]int()
These will be used for parsing .debug_frame, .eh_frame, and DWARF
expressions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 12:13:45 -07:00
Omar Sandoval
b55a5f7f4b libdrgn: binary_buffer: add binary_buffer_next_sN()
Along with _into_s64 and _into_u64 variants. These will be used for
parsing .eh_frame and DWARF expressions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-10 02:07:30 -08:00
Omar Sandoval
e5219b13e3 libdrgn: binary_buffer: add binary_buffer_next_sleb128()
Revive it from all the way back in commit 90fbec02fc ("dwarfindex:
delete unused read_sleb128() and read_strlen()") and add an _into_u64
variant. These will be used for parsing .debug_frame, .eh_frame, and
DWARF expressions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-10 02:07:30 -08:00
Omar Sandoval
7eab40aaeb libdrgn: rename drgn_error_debug_info() to drgn_error_debug_info_scn()
An upcoming change will introduce a similar function for when the
section isn't known. Rename the original so that the new one can take
its name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-10 02:07:16 -08:00
Jay Kamat
4552d78f4a libdrgn: debug_info: try to find DIE specification when parsing type
Currently, we look up incomplete types by name, which can fail if the
name is ambiguous or the type is unnamed. Try finding the complete type
via the DW_AT_specification map in the DWARF index first.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-03-08 15:24:24 -08:00
Jay Kamat
3823b21e17 libdrgn: dwarf_index: uses DIE address instead of section offset
To support indexing DWARF 4 type units, we need to be able to
differentiate between DIEs in .debug_info and .debug_types. We can't do
that with just a section offset, so instead store the address of the DIE
in the index and specification map.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-03-08 15:24:24 -08:00
Omar Sandoval
ca1a2598fd libdrgn: python: add missing function name to Object.format_() exceptions
The ":function name" is missing from the PyArg_ParseTupleAndKeywords()
call in DrgnObject_format(), so errors say, for example, "'foo' is an
invalid keyword argument for this function" instead of "for format_()".

Fixes: cf3a07bdfb ("libdrgn: python: replace Object.__format__ with Object.format_")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-08 14:27:33 -08:00
Omar Sandoval
a24c0f5b33 libdrgn: clean up usage of drgn_stop
Use drgn_not_found where it's more appropriate, and check explicitly
against drgn_stop instead of err->code == DRGN_ERROR_STOP.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-05 12:46:06 -08:00
Omar Sandoval
4680b93103 libdrgn: improve truncate_signed() and truncate_unsigned()
truncate_signed() requires 5 operations (compute a mask for the lower
bits, and it, compute the sign extension mask, xor it, subtract it) and
a branch. We can do it in 3 operations and no branches if we assume that
the compiler does an arithmetic shift for signed integers, which we
already depend on. Then, we can remove sign_extend(), which is the same
as truncate_signed() except it assumes that the upper bits are zero to
save on a couple of operations.

Similarly, for truncate_unsigned() we can remove the branch.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
b5ed892481 Fix some include-what-you-use warnings and update for Bear 3
Bear 3 changed the CLI arguments, so update scripts/iwyu.py for it and
clean up some new warnings.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
98e1947d26 libdrgn: require non-NULL drgn_architecture_info::register_by_name
Instead of checking whether it's NULL, define a stub for
arch_info_unknown.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
25eb2abb1a libdrgn: add drgn_platform getters
Add low-level getters equivalent to the drgn_program platform-related
helpers and use them in places where we have checked or can assume that
the platform is known.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
e04eda9880 libdrgn: define HOST_LITTLE_ENDIAN
As a minor cleanup, instead of writing __BYTE_ORDER__ ==
__ORDER_LITTLE_ENDIAN__ everywhere, define and use HOST_LITTLE_ENDIAN.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Jay Kamat
c22e501295 libdrgn: debug_info: fix parsing specifications of declarations
drgn_compound_type_from_dwarf() and drgn_enum_type_from_dwarf() check
the DW_AT_declaration flag to decide whether the type is a declaration
of an incomplete type or a definition of a complete type. However, they
check DW_AT_declaration with dwarf_attr_integrate(), which follows the
DW_AT_specification reference if it is present. The DIE referenced by
DW_AT_specification typically is a declaration, so this erroneously
identifies definitions as declarations. Additionally, if
drgn_debug_info_find_complete() finds the same definition, we can end up
recursing until we hit the DWARF parsing depth limit. Fix it by not
using dwarf_attr_integrate() for DW_AT_declaration.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-02-25 10:46:34 -08:00
Omar Sandoval
aaa98ccae3 libdrgn: consistently use __ for __attribute__ names
In some places, we add __ preceding and following an attribute name, and
in some places, we don't. Let's make it consistent. We might as well opt
for the __ to make clashes with macros less likely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 03:16:23 -08:00
Omar Sandoval
c54ef80412 libdrgn: add missing LIBDRGN_PUBLIC exports
drgn_object_dereference_offset() and drgn_object_member_dereference()
are both in drgn.h.in but aren't exported. They should be.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 02:42:12 -08:00
Omar Sandoval
3ecb31de9f libdrgn: update stale references in drgn_object_slice() comment
drgn_program_member_info() was replaced by drgn_type_find_member() in
commit e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with
drgn_type_find_member()"). drgn_object_pointer_offset() never existed;
it's supposed to be drgn_object_dereference_offset().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 02:41:20 -08:00
Omar Sandoval
da1e72f0d5 libdrgn: remove drgn_{,qualified_}type_eq() from drgn.h.in
The definitions were removed but these public declarations weren't.

Fixes: 7d7aa7bf7b ("libdrgn/python: remove Type == operator")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 02:37:36 -08:00
Omar Sandoval
55e3a58e06 libdrgn: python: use correct member offset when creating object from value
We need to use the offset of the member in the outermost object type,
not the offset in the immediate containing type in the case of nested
anonymous structs.

Fixes: e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with drgn_type_find_member()")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 02:29:59 -08:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
72b4aa9669 libdrgn: clean up object initialization
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 17:43:14 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
f09ab62b73 drgn 0.0.9
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 02:19:09 -08:00
Omar Sandoval
36df5fc076 libdrgn: ppc64: fix fetching cr fields from pt_regs
The condition register fields are numbered from most significant to
least significant. Also, the CFI for unwinding the condition register
fields restores them in their position in the condition register, so do
the same when initially populating them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 00:45:14 -08:00
Kamalesh Babulal
221a218704 libdrgn: add powerpc stack trace support
Add powerpc specific register information required to retrive the
stack traces of the tasks on both live system and from the core dump.
It uses the existing DSL format to define platform registers and
helper functions to initial them. It also adds architecture specific
information to enable powerpc. Current support is for little-endian
powerpc only.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2021-01-29 11:31:59 -08:00
Omar Sandoval
b899a10836 Remove register numbers from API and add register aliases
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.

While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 17:47:45 -08:00
Omar Sandoval
10e6464769 libdrgn: python: clean up module creation
Add a helper based on PyModule_AddType() from Python 3.9 and use it to
simplify PyInit__drgn(). Also handle errors in PyInit__drgn().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 12:41:13 -08:00
Omar Sandoval
0d35dec8ee libdrgn: python: define Py_RETURN_BOOL
And use it instead of an if statement with
Py_RETURN_TRUE/Py_RETURN_FALSE or PyBool_FromLong().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 11:35:09 -08:00
Omar Sandoval
46343ae08d libdrgn: get rid of struct drgn_stack_frame
In preparation for adding a "real", internal-only struct
drgn_stack_frame, replace the existing struct drgn_stack_frame with
explicit trace/frame arguments.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-27 11:22:34 -08:00
Omar Sandoval
71c6ac6927 libdrgn: use drgn_debug_info_module instead of Dwfl_Module in more places
It's easier to go from drgn_debug_info_module to Dwfl_Module than the
other direction, and I'd rather use the "higher-level"
drgn_debug_info_module wherever possible. So, store
drgn_debug_info_module in the DWARF index (which also saves a
dereference while building the index), and pass around
drgn_debug_info_module when parsing types/objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-27 11:17:41 -08:00
Omar Sandoval
bbefc573d8 libdrgn: debug_info: make sure DW_TAG_template_value_parameter has value
Otherwise, an invalid DW_TAG_template_value_parameter can be confused
for a type parameter.

Fixes: 352c31e1ac ("Add support for C++ template parameters")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 12:07:46 -08:00
Omar Sandoval
2c612ea97f libdrgn: fix address of global per-CPU variables with KASLR
The address of a per-CPU variable is really an offset into the per-CPU
area, but we're applying the load bias (i.e., KASLR offset) to it as if
it were an address, resulting in an invalid pointer when it's eventually
passed to per_cpu_ptr().

Fix this by applying the bias only if it the address is in the module's
address range. This heuristic avoids any Linux kernel-specific logic;
hopefully it doesn't have any undesired side effects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 10:14:50 -08:00
Omar Sandoval
a7962e9477 libdrgn: debug_info: pass around Dwfl_Module instead of bias
We're going to need the module start and end in
drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and
get the bias when we need it. This means we don't need the bias from
drgn_dwarf_index_get_die(), so get rid of that, too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 10:12:29 -08:00
Omar Sandoval
048952f9a6 libdrgn: x86-64: fix rsp of initial stack frame
We're using task->thread.sp for rsp in the initial frame for both the
struct inactive_task_frame path and frame pointer path. This is not
correct for either.

For kernels with struct inactive_task_frame, task->thread.sp points to
to the struct inactive_task_frame. The stack pointer in the initial
frame is the address immediately after the struct inactive_task_frame.

For kernels without struct inactive_task_frame, task->thread.sp points
to the saved rbp. We follow that rbp to the rbp and return address for
the initial frame; its stack pointer is the address immediately after
those.

Fixes: 10142f922f ("Add basic stack trace support")
Fixes: 51596f4d6c ("libdrgn: x86-64: remove garbage initial stack frame on old kernels")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-15 10:57:08 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
b6958f920c libdrgn: debug_info: move object parsing code in debug_info.c
In preparation for calling the object parsing code from the type parsing
code, move it up in the file (and update the coding style in
drgn_object_from_dwarf_enumerator() while we're at it).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
be1bb279aa libdrgn: debug_info: pass DIE bias when parsing types
This will be needed for types containing reference objects.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
190062f470 libdrgn: get drgn_type_member.bit_field_size through drgn_member_type()
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
359177295d libdrgn: move type definitions in drgn.h
In preparation for struct drgn_type referencing struct drgn_object, move
the former after the latter.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
934dd36302 libdrgn: remove unused name parameter from drgn_object_from_dwarf_{subprogram,variable}()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 12:16:29 -08:00
Omar Sandoval
a57c26ed32 libdrgn: fix zero-length array GCC < 9.0 workaround for qualified types
We're not applying the zero-length array workaround when the array type
is qualified. Make sure we pass through can_be_incomplete_array when
parsing DW_TAG_{const,restrict,volatile,atomic}_type.

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 11:21:57 -08:00
Omar Sandoval
ca7682650d libdrgn: rename drgn_type_from_dwarf_child() to drgn_type_from_dwarf_attr()
The type comes from the DW_AT_type attribute of the DIE, not a child
DIE, so this is a better name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 11:02:27 -08:00
Omar Sandoval
798f0887a5 libdrgn: simplify language fall back handling
If the language for a DWARF type is not found or unrecognized, we should
fall back to the global default, not the program default (the program
default language is for language-specific operations on the program, so
DWARF parsing shouldn't depend on it). Add a fall_back parameter to
drgn_language_from_die() and use it in DWARF parsing, and replace
drgn_language_or_default() with a drgn_default_language variable.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 10:46:35 -08:00
Omar Sandoval
a8be40ca60 libdrgn: python: fix Program_hold_object() reference leak
We should only increment a held object's reference count when it is
initially inserted into the set; subsequent holds are no-ops.

Fixes: a8d632b4c1 ("libdrgn/python: use F14 instead of PyDict for Program::objects")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-06 17:48:29 -08:00
Omar Sandoval
b87070f98c libdrgn: fix vector_shrink_to_fit() with size 0
realloc(ptr, 0) is equivalent to free(ptr). It may return NULL, in which
case vector_do_shrink_to_fit() won't update the vector's data and
capacity. A subsequent append will then try to reuse the previous
allocation, causing a use-after-free. free() empty vectors explicitly
instead.

Fixes: 8d52536271 ("libdrgn: add common vector implementation")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-06 17:47:39 -08:00
Omar Sandoval
d6a840ec30 libdrgn: deinitialize empty members/parameters/enumerators when deduplicating
Right now, an empty builder vector will not have anything to free, but
if we start pre-reserving these later, it will be a leak.

Fixes: c7af566c6e ("libdrgn: deduplicate all types with no members/parameters/enumerators")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-06 17:47:25 -08:00
Omar Sandoval
c7af566c6e libdrgn: deduplicate all types with no members/parameters/enumerators
Even if a compound, function, or enumerated type is complete, we can
still deduplicate it as long as it doesn't have members, parameters, or
enumerators.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-06 01:59:48 -08:00
Omar Sandoval
988e9e7190 libdrgn/python: add Object.absent_
Without this, the only way to check whether an object is absent in
Python is to try to use the object and catch the ObjectAbsentError.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 15:06:40 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
c2eec00ae0 libdrgn/python: use None instead of 0 for TypeMember.bit_field_size
Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by
using None to represent a non-bit field instead of 0.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-25 01:53:23 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
523fd26959 libdrgn: don't allow casting to non-scalar types at all
Currently, we try to emulate the GNU C extension of casting a struct
type to itself. This does a deep type comparison, which is expensive. We
could take a shortcut like only comparing the kind and type name, but
seeing as standard C only allows casting to a scalar type, let's drop
support for casting to a struct (or other non-scalar) type entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 02:46:05 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
a595e52d22 libdrgn/python: add Type.has_member()
Add drgn_type_has_member() to libdrgn and Type.has_member() to the
Python bindings. This can simplify some version checks, like the one in
_for_each_block_device() since commit 9a10a927b0 ("helpers: fix
for_each_{disk,partition}() on kernels >= v5.1").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:38:48 -08:00
Omar Sandoval
fd04463596 libdrgn/python: add Type.member()
In Python, looking up a member in a drgn Type by name currently looks
something like:

  member = [member for member in type.members if member.name == "foo"][0]

Add a Type.member(name) method, which is both easier and more efficient.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:10:23 -08:00
Omar Sandoval
e72ecd0e2c libdrgn: replace drgn_program_member_info() with drgn_type_find_member()
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:40:54 -08:00
Omar Sandoval
cf9a068820 libdrgn/python: fix reference counting on Type.members and Type.parameters
The TypeMember and TypeParameter instances referring to a libdrgn
drgn_lazy_type are only valid as long as the Type containing them is
still alive. Hold a reference on the containing Type from LazyType. We
can do this without growing LazyType by getting rid of the enum state
and using sentinel values for LazyType::lazy_type as the state.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:09:12 -08:00
Omar Sandoval
738ae2c75f libdrgn: pack struct drgn_object better
We can get struct drgn_object down from 40 bytes to 32 bytes (on x86-64)
by moving the bit_offset and little_endian members out of the value and
reference structs.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
22c1d87aec libdrgn: cache page_offset and vmemmap as objects instead of uint64_t
This is a little cleaner and saves on conversions back and forth between
C values and objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 02:40:07 -08:00
Omar Sandoval
bce9ef5f8d libdrgn: linux kernel: remove THREAD_SIZE object finder
THREAD_SIZE is still broken and I haven't looked into the root cause
(see commit 95be142d17 ("tests: disable THREAD_SIZE test")). We don't
need it anymore anyways, so let's remove it entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 02:08:13 -08:00
Omar Sandoval
51596f4d6c libdrgn: x86-64: remove garbage initial stack frame on old kernels
On old kernels, we set the initial frame as containing only rbp and let
libdwfl unwind it assuming frame pointers from there. This means that
the initial frame has a garbage rip. Follow the frame pointer and set
the previous rbp and return address ourselves instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 02:02:54 -08:00
Omar Sandoval
6e189027be libdrgn: x86-64: pass frame object as const
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 01:55:36 -08:00
Omar Sandoval
3187453689 libdrgn: x86-64: remove unused read
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 01:38:11 -08:00
Omar Sandoval
ffa2e0acf1 libdrgn: add missing break in drgn_object_copy()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-09 10:20:37 -08:00
Omar Sandoval
97fbedec1f libdrgn: return unavailable objects for DWARF objects without value or address
Now that we have the concept of unavailable objects, use it for DWARF
where appropriate.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 14:15:09 -08:00
Omar Sandoval
6bd0c2b4d2 libdrgn: add concept of "unavailable" objects
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:58:19 -08:00
Omar Sandoval
5f17281926 libdrgn: make drgn_object::is_reference an enum
To prepare for a new kind of object, replace the is_reference bool with
an enum drgn_object_kind.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:37:58 -08:00
Omar Sandoval
edb1fe7f2f libdrgn: rename drgn_object_kind to drgn_object_encoding
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:26 -08:00
Omar Sandoval
2710b4d2aa libdrgn: add macros for strict enum switch statements
There are several places where we'd like to enforce that every
enumeration is handled in a switch. Add SWITCH_ENUM() and
SWITCH_ENUM_DEFAULT() macros for that and use them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:23 -08:00
Omar Sandoval
a4dbd7bf95 libdrgn: remove unused DRGN_NUM_ARCH
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:23 -08:00
Omar Sandoval
3360170336 libdrgn: only install page table memory reader when supported
If virtual address translation isn't implemented for the target
architecture, then we shouldn't add the page table memory reader. If we
do, we get a DRGN_ERROR_INVALID_ARGUMENT error from
linux_helper_read_vm() instead of a DRGN_ERROR_FAULT error as expected.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-27 01:27:30 -08:00
Omar Sandoval
5975d19580 libdrgn: report better errors when parsing DWARF/kmod index
If the DWARF index encounters any error while parsing, it returns an
error saying only "debug information is truncated", which makes it hard
to track down parsing errors. The kmod index parser silently swallows
errors. For both, replace the mread functions with a higher-level
binary_buffer interface that can include more information including the
location of the error. For example:

  /tmp/mybinary: .debug_info+0x4: expected at least 56 bytes, have 55

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-13 17:00:07 -08:00
Omar Sandoval
756e5d27ad libdrgn: debug_info: put sections in an array (again)
Back in commit 9ce9094ee0 ("libdrgn: dwarf_index: don't copy sections
into each CU"), I changed the sections to be individual members. The
next change will be easier if they're in an array.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 16:22:04 -08:00
Omar Sandoval
f0a3629c26 libdrgn: debug_info: add dwarf_tag_str() and use it for error messages
There are several places where we manually pass around the string name
of a tag so it can be used for error messages. Do it programatically
instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 16:22:04 -08:00
Omar Sandoval
3885697696 drgn 0.0.8
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 13:32:04 -08:00
Omar Sandoval
fa081e32b9 libdrgn: update module section iterator for Linux v5.8
Linux v5.8 changed the module section structure, so we need to get the
section name differently.

Closes #73.

Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 13:07:36 -07:00
Omar Sandoval
1c6465f0b0 libdrgn: fix infinite loop on error caching kernel module sections
If cache_kernel_module_sections() in report_loaded_kernel_module()
fails, we continue to the next iteration without advancing to the next
kernel module. Then, we fail on that same kernel module and repeat. Make
sure that we go to the next kernel module.

Fixes: 423d2cd500 ("libdrgn: dwarf_index: rework file reporting")
Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 12:25:22 -07:00
Omar Sandoval
431b91ddb5 libdrgn: fix use-after-free in kernel module reporting error case
We're freeing path and then using it to report an error.

This has some weird knock-on effects. Since we freed the path, the error
message contains garbage. So, PyErr_SetString() can't decode it as a
UTF-8 string. The end result is a MissingDebugInfoError with no message.

Fix it by creating the error before freeing the path.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 11:55:06 -07:00
Omar Sandoval
2b325b9262 libdrgn: add an environment variable to disable use of /proc/modules and /sys/module
We use /proc/modules and /sys/module to find loaded kernel modules for
the running kernel instead of walking the module list in the core dump
as an optimization. To make it easier to test the core dump path, add an
environment variable to disable the optimization.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 11:24:39 -07:00
Omar Sandoval
661a5c56c3 libdrgn: refactor kernel module iterator
The next commit will allow using the offline path for the live kernel,
so the offline naming won't make much sense. Fold the offline path into
the top-level functions, and make the live path an escape hatch. Also
add some comments and improve naming for the file and directory handles
and update the coding style.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 23:22:01 -07:00
Omar Sandoval
ce8540e39c libdrgn: get rid of kernel_module_iterator::notes*
These were added in commit e5874ad18a ("libdrgn: use libdwfl"), but
they have never been used. Remove them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 17:23:31 -07:00
Omar Sandoval
3c5d22637e libdrgn: clean up hash function APIs and improve documentation
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 16:20:08 -07:00
Omar Sandoval
761da83ddd libdrgn: add {min,max}_iconst() and rewrite min() and max()
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use it in hash_table.h. While we're here, let's split these into their
own header file and document them better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-10 23:48:03 -07:00
Omar Sandoval
fa44171ba1 libdrgn: split bit operations into their own header
And improve their documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 17:44:15 -07:00
Omar Sandoval
cae79d2676 libdrgn: add preprocessor utility macros
These will be used in upcoming changes.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 16:36:59 -07:00
Omar Sandoval
4cbb9b552a libdrgn: fix comparison of types with anonymous members
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-08 17:32:46 -07:00
Omar Sandoval
de6a4e07ae libdrgn: fix Doxygen
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-30 01:32:33 -07:00
Omar Sandoval
2704fd17c3 libdrgn: update Doxyfile
doxygen warns about a few obsolete Doxyfile options. Update it with
doxygen -u.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-29 23:04:08 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
fdbe336386 libdrgn: use -isystem for elfutils headers
The elfutils header files should be treated as if they were in the
standard location, so use -isystem instead of -I.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 15:45:10 -07:00
Omar Sandoval
89b5da2abb libdrgn: dwarf_index: free namespaces when rolling back
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
e69d0c0064 libdrgn: dwarf_index: fix use after free of pending CU
If we create a pending CU for a namespace, then add more CUs to the
index, the CU might get reallocated, resulting in a use after free. Fix
it by storing the index of the CU instead of the pointer.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
f83bb7c71b libdrgn: move debugging information tracking into drgn_debug_info
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.

This also reworks a couple of details of loading debugging information:

- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
  a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
  compilation units (by using OpenMP tasks).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
3ac9ae357b libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info
The current name is too verbose. Let's go with a shorter, more generic
name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-11 17:41:23 -07:00
Jay Kamat
d1beb0184a libdrgn: add support for objects in C++ namespaces
DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the
DWARF index, with each namespace being its own sub-index. We only index
the namespace itself when it is first accessed, which should help with
startup time and simplifies tracking.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2020-09-02 17:13:16 -07:00
Jay Kamat
a51abfcd70 libdrgn: dwarf_index: keep CUs after indexing
In order to index namespaces lazily, we need the CU structures. Rename
struct compilation_unit to the less generic struct drgn_dwarf_index_cu
and keep the CUs in a vector in the dindex.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
66ad5077c9 libdrgn: dwarf_index: return indexed DIE entry from drgn_dwarf_index_iterator_next()
For namespace support, we will want to access the struct
drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split
drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
d512964c1e libdrgn: add drgn_error_copy()
This is needed for a future change where we'll want to save an error and
return it multiple times.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
7a85b4188e libdrgn: clean up read.h helpers and avoid undefined pointer behavior
There are a couple of related ways that we can cause undefined behavior
when parsing a malformed DWARF or depmod index file:

1. There are several places where we increment the cursor to skip past
   some data. It is undefined behavior if the result points out of
   bounds of the data, even if we don't attempt to dereference it.
2. read_in_bounds() checks that ptr <= end. This pointer comparison is
   only defined if ptr and end both point to elements of the same array
   object or one past the last element. If ptr has gone past end, then
   this comparison is likely undefined anyways.

Fix it by adding a helper to skip past data with bounds checking. Then,
all of the helpers can assume that ptr <= end and maintain that
invariant. while we're here and auditing all of the call sites, let's
clean up the API and rename it from read_foo() to the less generic
mread_foo().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
c053c2b212 libdrgn: dwarf_index: handle DW_AT_specification with DW_FORM_ref_addr
Now that we can handle a DW_AT_specification that references another
compilation unit, add support for DW_FORM_ref_addr.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
26291647eb libdrgn: dwarf_index: handle DW_AT_specification DIEs with two passes
We currently handle DIEs with a DW_AT_specification attribute by parsing
the corresponding declaration to get the name and inserting the DIE as
usual. This has a couple of problems:

1. It only works if DW_AT_specification refers to the same compilation
   unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not
   DW_FORM_ref_addr. As a result, drgn doesn't support the latter.
2. It assumes that the DIE with DW_AT_specification is in the correct
   "scope". Unfortunately, this is not true for g++: for a variable
   definition in a C++ namespace, it generates a DIE with
   DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE
   which refers to the declaration with DW_AT_specification _outside_ of
   the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE.

Supporting both of these cases requires reworking how we handle
DW_AT_specification. This commit takes an approach of parsing the DWARF
data in two passes: the first pass reads the abbrevation and file name
tables and builds a map of instances of DW_AT_specification; the second
pass indexes DIEs as before, but ignores DIEs with DW_AT_specification
and handles DIEs with DW_AT_declaration by looking them up in the map
built by the first pass.

This approach is a 10-20% regression in indexing time in the benchmarks
I ran. Thankfully, it is not 100% slower for a couple of reasons. The
first is that the two passes are simpler than the original combined
pass. The second is that a decent part of the indexing time is spent
faulting in the mapped debugging information, which only needs to happen
once (even if the file is cached, minor page faults add non-negligible
overhead).

This doesn't handle DW_AT_specification "chains" yet, but neither did
the original code. If it is necessary, it shouldn't be too difficult to
add.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
507977664c libdrgn: dwarf_index: store abbrevation and file name tables in CU
This is preparation for the next change where we'll need to do two
passes over the CUs.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
0b4ab1772b libdrgn: dwarf_index: store DIE indices as uint32_t
It's very unlikely that we'll ever index more than 4 billion DIEs in a
single shard, so we can shrink the index a bit by using uint32_t
indices (and uint8_t tag).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
9ce9094ee0 libdrgn: dwarf_index: don't copy sections into each CU
I originally copied the sections into each compilation unit to avoid a
pointer indirection, but performance-wise it's a wash, so we might as
well save the memory. This will be more important when we keep the CUs
after indexing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
94e7b1f92c libdrgn: dwarf_index: avoid copying CUs for one thread
In read_cus(), the master thread can use the final CUs vector directly
and the rest of the threads can merge their private vectors in. This
consistently shaves a few milliseconds off of startup.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
53ba7262cd libdrgn: dwarf_index: handle DW_AT_declaration with DW_FORM_flag
We currently assume that if DW_AT_declaration is present, it is true.
This seems to be true in practice, and I see no reason to ever use
DW_FORM_flag with a value of zero. There's no performance hit to handle
it, though, so we might as well.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
ea9f3f3114 libdrgn: dwarf_index: don't worry about tag of CU DIE
As a small simplification, we can take commit 9bb2ccecb7 ("Enable
DWARF indexing to work with partial units") further and not look at the
tag of the top-level DIE at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
c8f84c57fb libdrgn: dwarf_index: use size_t instead of uint64_t where appropriate
The CU unit length and DIE offset are both limited by the size of the
mapped debugging information, i.e., size_t.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
2252bef1a7 libdrgn: dwarf_index: rename TAG_FLAG_* and TAG_MASK to DIE_FLAG_*
This is more clear: although these flags happen to be encoded with the
DWARF tag, they are flags regarding the DIE.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
85c4b36820 libdrgn: dwarf_index: fix leak when parsing bad line number program header
If we fail to read an include directory in read_file_name_table(), we
need to free the directory hashes.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
ff96c75da0 helpers: translate task_state_to_char() to Python
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 13:54:39 -07:00
Omar Sandoval
e96d9fd3fd libdrgn/python: don't allow None for Program.object() flags
Similar to the previous commit, this was to work around pydoc issues
that we don't have anymore.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
e49a87a3d7 libdrgn: remove struct drgn_object::prog
We can get it via the type now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:21 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
1c8181e22d libdrgn: rearrange struct drgn_program members
struct drgn_program has a bunch of state scattered around. Group it
together more logically, even if it means sacrificing some padding here
and there.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:34:44 -07:00
Omar Sandoval
d4e0771f87 libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}()
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
a8d632b4c1 libdrgn/python: use F14 instead of PyDict for Program::objects
Program::objects is used to store references to objects that must stay
alive while the Program is alive. It is currently a PyDict where the
keys are the object addresses as PyLong and the values are the objects
themselves. This has two problems:

1. Allocating the key as a full object is obviously wasteful.
2. PyDict doesn't have an API for reserving capacity ahead of time,
   which we want for an upcoming change.

Both of these are easily fixed by using our own hash table.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
arsarwade
6f6c5f272f
libdrgn: export function drgn_object_init() (#70)
drgn_object_init() is available in drgh.h file and seems to a required
call before calling drgn_program_find_object().

Without this, trying to call drgn_object_init() from an external C
application results in undefined reference.

Signed-off-by: Aditya Sarwade <asarwade@fb.com>
2020-08-21 10:24:52 -07:00
Omar Sandoval
0cf3320a89 Add type annotations to helpers
Now that drgndoc can handle overloads and we have the IntegerLike and
Path aliases, we can add type annotations to all helpers. There are also
a couple of functional changes that snuck in here to make annotating
easier.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:28:02 -07:00
Omar Sandoval
2d49ef657b Add Path type alias
Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add
an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes]
to get rid of some mypy --strict errors.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:20:29 -07:00
Omar Sandoval
66c5cc83a6 Add IntegerLike type annotation
Lots if interfaces in drgn transparently turn an integer Object into an
int by using __index__(), so add an IntegerLike protocol for this and
use it everywhere applicable.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:16:50 -07:00
Omar Sandoval
20bcde1f1d drgn 0.0.7
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 23:32:32 -07:00
Omar Sandoval
025989871b drgn 0.0.6
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:25:54 -07:00
Omar Sandoval
e3309765f9 helpers: add kaslr_offset() and move pgtable_l5_enabled()
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:00:16 -07:00
Omar Sandoval
e7f353c118 libdrgn: hash_table: clean up coding style
Clean up the coding style of the remaining few places that the last
couple of changes didn't rewrite.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 11:53:05 -07:00
Omar Sandoval
f94b0262c6 libdrgn: hash_table: implement vector storage policy
The folly F14 implementation provides 3 storage policies: value, node,
and vector. The default F14FastMap/F14FastSet chooses between the value
and vector policies based on the value size.

We currently only implement the value policy, as the node policy is easy
to emulate and the vector policy would've added more complexity. This
adds support for the vector policy (adding even more C abuse :) and
automatically chooses the policy the same way as folly. It'd be easy to
add a way to choose the policy if needed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 11:53:00 -07:00
Omar Sandoval
9ea11a7c26 libdrgn: hash_table: port reserve optimization
The only major change to the folly F14 implementation since I originally
ported it is commit 3d169f4365cf ("memory savings for F14 tables with
explicit reserve()"). That is a small improvement for small tables and a
large improvement for vector tables, which are about to be added.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
2eab47ce9e libdrgn: hash_table: use posix_memalign() instead of aligned_alloc()
posix_memalign() doesn't have the restriction that the size must be a
multiple of the alignment like aligned_alloc() does in C11.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
2409868409 libdrgn: hash_table: define chunk alignment constant
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
6d4af7e17e libdrgn: dwarf_info_cache: handle variables DW_AT_const_value
Compile-time constants have DW_AT_const_value instead of DW_AT_location.
We can translate those to a value object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 15:23:51 -07:00
Omar Sandoval
213c148ce6 libdrgn: dwarf_info_cache: handle DW_AT_endianity
Variables can have a non-default endianity. Handle it and clean up
variable endian handling.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 14:26:58 -07:00
Omar Sandoval
c840072d05 libdrgn: make drgn_object_set_buffer() take a void *
It's awkward to make callers cast to char *.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 10:25:03 -07:00
Omar Sandoval
f1eaf5b14c libdrgn: add load_debug_info example program
Really it's more of a test program than an example program. It's useful
for benchmarking, testing with valgrind, etc. It's not built by default,
but it can be built manually with:

  $ make -C build/temp.* examples/load_debug_info

And run with:

  $ ./build/temp.*/examples/load_debug_info

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-10 16:18:58 -07:00
Omar Sandoval
3028da4d1d libdrgn: compare language in drgn_type_eq()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-08 22:07:49 -07:00
Omar Sandoval
1b47b866b4 libdrgn: go back to trusting PRSTATUS PID
Commit eea5422546 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
2020-07-08 18:34:16 -07:00
Omar Sandoval
293418294a libdrgn: assume compiler uses sane integer implementation
I once tried to implement a generic arithmetic right shift macro without
relying on any implementation-defined behavior, but this turned out to
be really hard. drgn is fairly tied to GCC and GCC-compatible compilers
(like Clang), so let's just assume GCC's model [1]: modular conversion
to signed types, two's complement signed bitwise operators, and sign
extension for signed right shift.

1: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
2020-07-07 17:18:17 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
e4c52c5422 libdrgn: linux_kernel: use names for kmod index constants
This makes it much easier to follow along with the code and understand
the format.
2020-06-30 15:14:21 -07:00
Omar Sandoval
03d8cb0e32 libdrgn: fix hash_pair_from_non_avalanching_hash() on 64-bit without SSE 4.2
We were forgetting to mask away the extra bits. There are two places
that we use the tag without converting it to a uint8_t:
hash_table_probe_delta(), which is mostly benign since we mask it by the
chunk mask anyways; and table_chunk_match() without SSE 2, which
completely breaks.

While we're here, let's align the comments better.
2020-06-24 13:33:08 -07:00
Omar Sandoval
8e7c1f1009 drgn 0.0.5 2020-05-26 10:04:06 -07:00
Omar Sandoval
a227d0d50e Update elfutils and revert activation frame patch
After thinking about it some more, I realized that "libdwfl: simplify
activation frame logic" breaks the case where during unwinding someone
queries isactivation for reasons other than knowing whether to decrement
program counter. Revert the patch and refactor "libdwfl: add interface
for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it
differently.

Based on:

c95081596 size: Also obey radix printing for bsd format.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-20 13:38:49 -07:00
Omar Sandoval
eea5422546 libdrgn: make Linux kernel stack unwinding more robust
drgn has a couple of issues unwinding stack traces for kernel core
dumps:

1. It can't unwind the stack for the idle task (PID 0), which commonly
   appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be
   trusted.

The solution for both of these is to look up the PRSTATUS note by CPU
instead of PID.

For the live kernel, drgn refuses to unwind the stack of tasks in the
"R" state. However, the "R" state is running *or runnable*, so in the
latter case, we can still unwind the stack. The solution for this is to
look at on_cpu for the task instead of the state.
2020-05-20 12:03:00 -07:00
Omar Sandoval
146930aff8 libdrgn: replace arch frame_registers with callbacks
We currently unwind from pt_regs and NT_PRSTATUS using an array of
register definitions. It's more flexible and more efficient to do this
with an architecture-specific callback. For x86-64, this change also
makes us depend on the binary layout rather than member names of struct
pt_regs, but that shouldn't matter unless people are defining their own,
weird struct pt_regs.
2020-05-19 17:11:27 -07:00
Omar Sandoval
4d8597f0f8 libdrgn: add THREAD_SIZE to Linux kernel object finder
Despite the naming, this is the kernel stack size.
2020-05-19 17:10:54 -07:00
Omar Sandoval
971a2d3687 libdrgn/python: make Objects fully immutable
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
2020-05-18 00:07:49 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
c339113f9c libdrgn: adjust program counter when looking up frame symbol
For functions that call a noreturn function, the compiler may omit code
after the call instruction. This means that the return address may not
lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is
an "activation", i.e., its program counter is guaranteed to lie within
the caller. This is only the case for the initial frame, frames
interrupted by a signal, and the signal trampoline frame. For everything
else, we need to decrement the program counter before doing any lookups.
2020-05-13 17:11:54 -07:00
Omar Sandoval
175f83fc23 Update elfutils with noreturn unwinding fix
Rebase on master and fix dwfl_frame_module/dwfl_frame_dwarf_frame to
decrement the program counter when necessary.

Based on:

a8493c12a libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: simplify activation frame logic
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-13 16:41:52 -07:00
Omar Sandoval
bf545105c6 libdrgn: build in silent mode by default
The automake/libtool compilation output is obnoxiously verbose. Switch
on automake's silent mode, and make the custom rules honor it.
2020-05-10 00:12:50 -07:00
Omar Sandoval
2d1481f5ab libdrgn: add page table walker kernel memory reader
Now that we can walk page tables, we can use it in a memory reader that
reads kernel memory via the kernel page table. This means that we don't
need libkdumpfile for ELF vmcores anymore (although I'll keep the
functionality around until this code has been validated more).
2020-05-08 17:37:56 -07:00
Omar Sandoval
e697be707c libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET
I originally wanted to avoid depending on another vmcoreinfo field, but
an the next change is going to depend on swapper_pg_dir in vmcoreinfo
anyways, and it ends up being simpler to use it.
2020-05-08 17:37:56 -07:00
Omar Sandoval
8a276838ac helpers: add access_process_vm() and access_remote_vm()
Now that we can walk page tables, we can finally read memory from
userspace tasks.

Closes #53.
2020-05-08 17:37:01 -07:00
Omar Sandoval
d0a1718451 libdrgn: implement virtual address translation/page table walking
There are a few big use cases for this in drgn:

* Helpers for accessing memory in the virtual address space of userspace
  tasks.
* Removing the libkdumpfile dependency for vmcores.
* Handling gaps in the virtual address space of /proc/kcore (cf. #27).

I dragged my feet on implementing this because I thought it would be
more complicated, but the page table layout on x86-64 isn't too bad.
This commit implements page table walking using a page table iterator
abstraction. The first thing we'll add on top of this will be a helper
for reading memory from a virtual address space, but in the future it'd
also be possible to export the page table iterator directly.
2020-05-08 17:36:19 -07:00
Omar Sandoval
63299e0701 libdrgn: actually use uint64_t for two's complement unary ops
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
2020-05-08 13:50:24 -07:00
Omar Sandoval
8f81ea255f libdrgn: don't use unaligned loads to parse DWARF
-fsanitize=undefined reports that the read_u* helpers rely on unaligned
loads. Use memcpy() instead.
2020-05-08 13:50:24 -07:00
Omar Sandoval
3d59e042f4 libdrgn: don't open-code fls()
c_integer_literal() has an open-coded equivalent of fls() that assumes
that unsigned long long is 64 bits. Use fls() instead.
2020-05-08 00:20:42 -07:00
Omar Sandoval
340e00dfb5 libdrgn: improve and document bit operations
fls() can be implemented with __bitop(), and we can get rid of clz() since
it's only used by fls().
2020-05-08 00:14:25 -07:00
Omar Sandoval
f49d68d8f9 libdrgn: split generic utility functions out of internal.h
internal.h includes both drgn-specific helpers and generic utility
functions. Split the latter into their own util.h header and use it
instead of internal.h in the generic data structure code. This makes it
easier to copy the data structures into other projects/test programs.
2020-05-07 16:03:43 -07:00
Omar Sandoval
a95e42ef2e libdrgn/python: use vector for Program_load_debug_info()
Program_load_debug_info() is the last user of the
resize_array()/realloc_array() utility functions. We can clean it up by
using a vector and finally get rid of those functions.

This also happens to fix three bugs in Program_load_debug_info(): we
weren't setting a Python exception if we couldn't allocate the path_args
array, we weren't zeroing path_args after resizing the array, and we
weren't freeing the path_args array. Shame on whoever wrote this.
2020-05-07 15:47:57 -07:00
Omar Sandoval
0a100064c1 libdrgn: improve and rename DRGN_UNREACHABLE()
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.

DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
2020-05-07 15:16:22 -07:00
Omar Sandoval
d759c7ed20 libdrgn: get rid of OFF_MAX
This hasn't been used since commit 417a6f0d76 ("libdrgn: make memory
reader pluggable with callbacks").
2020-05-07 14:41:05 -07:00
Omar Sandoval
23574e59d5 libdrgn: add /proc/kcore physical segments on old kernels
Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so
it's currently not possible to read from physical memory on old kernels.
However, if we can figure out the address of the direct mapping, then we
can determine the corresponding physical addresses for the segments and
add them.
2020-05-04 13:20:27 -07:00
Omar Sandoval
f8c33518eb libdrgn: handle kernel core dumps with all zero p_paddr
We treat core dumps with all zero p_paddrs as not having valid physical
addresses. However, it is theoretically possible for a kernel core dump
to only have one segment which legitimately has a p_paddr of 0 (e.g., if
it only has a segment for the direct mapping, although note that this
isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the
BIOS [1]).

If the core dump has a VMCOREINFO note, then it is either a vmcore,
which has valid physical addresses, or it is /proc/kcore with Linux
kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to
/proc/kcore") (in v4.19), so it must also have Linux kernel commit
464920104bf7 ("/proc/kcore: update physical address for kcore ram and
text") (in v4.11) (ignoring the possibility of a franken-kernel which
backported the former but not the latter). Therefore, treat core dumps
with a VMCOREINFO note as having valid physical addresses.

1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678
2020-05-04 13:20:27 -07:00
Omar Sandoval
5505628235 libdrgn: get rid of struct drgn_program.num_file_segments
This isn't used anymore. Remove it and simplify the loop adding file
segments.
2020-05-04 13:20:27 -07:00