Commit Graph

519 Commits

Author SHA1 Message Date
Omar Sandoval
d4e0771f87 libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}()
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
a8d632b4c1 libdrgn/python: use F14 instead of PyDict for Program::objects
Program::objects is used to store references to objects that must stay
alive while the Program is alive. It is currently a PyDict where the
keys are the object addresses as PyLong and the values are the objects
themselves. This has two problems:

1. Allocating the key as a full object is obviously wasteful.
2. PyDict doesn't have an API for reserving capacity ahead of time,
   which we want for an upcoming change.

Both of these are easily fixed by using our own hash table.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
arsarwade
6f6c5f272f
libdrgn: export function drgn_object_init() (#70)
drgn_object_init() is available in drgh.h file and seems to a required
call before calling drgn_program_find_object().

Without this, trying to call drgn_object_init() from an external C
application results in undefined reference.

Signed-off-by: Aditya Sarwade <asarwade@fb.com>
2020-08-21 10:24:52 -07:00
Omar Sandoval
0cf3320a89 Add type annotations to helpers
Now that drgndoc can handle overloads and we have the IntegerLike and
Path aliases, we can add type annotations to all helpers. There are also
a couple of functional changes that snuck in here to make annotating
easier.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:28:02 -07:00
Omar Sandoval
2d49ef657b Add Path type alias
Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add
an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes]
to get rid of some mypy --strict errors.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:20:29 -07:00
Omar Sandoval
66c5cc83a6 Add IntegerLike type annotation
Lots if interfaces in drgn transparently turn an integer Object into an
int by using __index__(), so add an IntegerLike protocol for this and
use it everywhere applicable.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:16:50 -07:00
Omar Sandoval
20bcde1f1d drgn 0.0.7
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 23:32:32 -07:00
Omar Sandoval
025989871b drgn 0.0.6
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:25:54 -07:00
Omar Sandoval
e3309765f9 helpers: add kaslr_offset() and move pgtable_l5_enabled()
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:00:16 -07:00
Omar Sandoval
e7f353c118 libdrgn: hash_table: clean up coding style
Clean up the coding style of the remaining few places that the last
couple of changes didn't rewrite.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 11:53:05 -07:00
Omar Sandoval
f94b0262c6 libdrgn: hash_table: implement vector storage policy
The folly F14 implementation provides 3 storage policies: value, node,
and vector. The default F14FastMap/F14FastSet chooses between the value
and vector policies based on the value size.

We currently only implement the value policy, as the node policy is easy
to emulate and the vector policy would've added more complexity. This
adds support for the vector policy (adding even more C abuse :) and
automatically chooses the policy the same way as folly. It'd be easy to
add a way to choose the policy if needed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 11:53:00 -07:00
Omar Sandoval
9ea11a7c26 libdrgn: hash_table: port reserve optimization
The only major change to the folly F14 implementation since I originally
ported it is commit 3d169f4365cf ("memory savings for F14 tables with
explicit reserve()"). That is a small improvement for small tables and a
large improvement for vector tables, which are about to be added.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
2eab47ce9e libdrgn: hash_table: use posix_memalign() instead of aligned_alloc()
posix_memalign() doesn't have the restriction that the size must be a
multiple of the alignment like aligned_alloc() does in C11.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
2409868409 libdrgn: hash_table: define chunk alignment constant
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
6d4af7e17e libdrgn: dwarf_info_cache: handle variables DW_AT_const_value
Compile-time constants have DW_AT_const_value instead of DW_AT_location.
We can translate those to a value object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 15:23:51 -07:00
Omar Sandoval
213c148ce6 libdrgn: dwarf_info_cache: handle DW_AT_endianity
Variables can have a non-default endianity. Handle it and clean up
variable endian handling.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 14:26:58 -07:00
Omar Sandoval
c840072d05 libdrgn: make drgn_object_set_buffer() take a void *
It's awkward to make callers cast to char *.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 10:25:03 -07:00
Omar Sandoval
f1eaf5b14c libdrgn: add load_debug_info example program
Really it's more of a test program than an example program. It's useful
for benchmarking, testing with valgrind, etc. It's not built by default,
but it can be built manually with:

  $ make -C build/temp.* examples/load_debug_info

And run with:

  $ ./build/temp.*/examples/load_debug_info

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-10 16:18:58 -07:00
Omar Sandoval
3028da4d1d libdrgn: compare language in drgn_type_eq()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-08 22:07:49 -07:00
Omar Sandoval
1b47b866b4 libdrgn: go back to trusting PRSTATUS PID
Commit eea5422546 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
2020-07-08 18:34:16 -07:00
Omar Sandoval
293418294a libdrgn: assume compiler uses sane integer implementation
I once tried to implement a generic arithmetic right shift macro without
relying on any implementation-defined behavior, but this turned out to
be really hard. drgn is fairly tied to GCC and GCC-compatible compilers
(like Clang), so let's just assume GCC's model [1]: modular conversion
to signed types, two's complement signed bitwise operators, and sign
extension for signed right shift.

1: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
2020-07-07 17:18:17 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
e4c52c5422 libdrgn: linux_kernel: use names for kmod index constants
This makes it much easier to follow along with the code and understand
the format.
2020-06-30 15:14:21 -07:00
Omar Sandoval
03d8cb0e32 libdrgn: fix hash_pair_from_non_avalanching_hash() on 64-bit without SSE 4.2
We were forgetting to mask away the extra bits. There are two places
that we use the tag without converting it to a uint8_t:
hash_table_probe_delta(), which is mostly benign since we mask it by the
chunk mask anyways; and table_chunk_match() without SSE 2, which
completely breaks.

While we're here, let's align the comments better.
2020-06-24 13:33:08 -07:00
Omar Sandoval
8e7c1f1009 drgn 0.0.5 2020-05-26 10:04:06 -07:00
Omar Sandoval
a227d0d50e Update elfutils and revert activation frame patch
After thinking about it some more, I realized that "libdwfl: simplify
activation frame logic" breaks the case where during unwinding someone
queries isactivation for reasons other than knowing whether to decrement
program counter. Revert the patch and refactor "libdwfl: add interface
for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it
differently.

Based on:

c95081596 size: Also obey radix printing for bsd format.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-20 13:38:49 -07:00
Omar Sandoval
eea5422546 libdrgn: make Linux kernel stack unwinding more robust
drgn has a couple of issues unwinding stack traces for kernel core
dumps:

1. It can't unwind the stack for the idle task (PID 0), which commonly
   appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be
   trusted.

The solution for both of these is to look up the PRSTATUS note by CPU
instead of PID.

For the live kernel, drgn refuses to unwind the stack of tasks in the
"R" state. However, the "R" state is running *or runnable*, so in the
latter case, we can still unwind the stack. The solution for this is to
look at on_cpu for the task instead of the state.
2020-05-20 12:03:00 -07:00
Omar Sandoval
146930aff8 libdrgn: replace arch frame_registers with callbacks
We currently unwind from pt_regs and NT_PRSTATUS using an array of
register definitions. It's more flexible and more efficient to do this
with an architecture-specific callback. For x86-64, this change also
makes us depend on the binary layout rather than member names of struct
pt_regs, but that shouldn't matter unless people are defining their own,
weird struct pt_regs.
2020-05-19 17:11:27 -07:00
Omar Sandoval
4d8597f0f8 libdrgn: add THREAD_SIZE to Linux kernel object finder
Despite the naming, this is the kernel stack size.
2020-05-19 17:10:54 -07:00
Omar Sandoval
971a2d3687 libdrgn/python: make Objects fully immutable
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
2020-05-18 00:07:49 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
c339113f9c libdrgn: adjust program counter when looking up frame symbol
For functions that call a noreturn function, the compiler may omit code
after the call instruction. This means that the return address may not
lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is
an "activation", i.e., its program counter is guaranteed to lie within
the caller. This is only the case for the initial frame, frames
interrupted by a signal, and the signal trampoline frame. For everything
else, we need to decrement the program counter before doing any lookups.
2020-05-13 17:11:54 -07:00
Omar Sandoval
175f83fc23 Update elfutils with noreturn unwinding fix
Rebase on master and fix dwfl_frame_module/dwfl_frame_dwarf_frame to
decrement the program counter when necessary.

Based on:

a8493c12a libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: simplify activation frame logic
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-13 16:41:52 -07:00
Omar Sandoval
bf545105c6 libdrgn: build in silent mode by default
The automake/libtool compilation output is obnoxiously verbose. Switch
on automake's silent mode, and make the custom rules honor it.
2020-05-10 00:12:50 -07:00
Omar Sandoval
2d1481f5ab libdrgn: add page table walker kernel memory reader
Now that we can walk page tables, we can use it in a memory reader that
reads kernel memory via the kernel page table. This means that we don't
need libkdumpfile for ELF vmcores anymore (although I'll keep the
functionality around until this code has been validated more).
2020-05-08 17:37:56 -07:00
Omar Sandoval
e697be707c libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET
I originally wanted to avoid depending on another vmcoreinfo field, but
an the next change is going to depend on swapper_pg_dir in vmcoreinfo
anyways, and it ends up being simpler to use it.
2020-05-08 17:37:56 -07:00
Omar Sandoval
8a276838ac helpers: add access_process_vm() and access_remote_vm()
Now that we can walk page tables, we can finally read memory from
userspace tasks.

Closes #53.
2020-05-08 17:37:01 -07:00
Omar Sandoval
d0a1718451 libdrgn: implement virtual address translation/page table walking
There are a few big use cases for this in drgn:

* Helpers for accessing memory in the virtual address space of userspace
  tasks.
* Removing the libkdumpfile dependency for vmcores.
* Handling gaps in the virtual address space of /proc/kcore (cf. #27).

I dragged my feet on implementing this because I thought it would be
more complicated, but the page table layout on x86-64 isn't too bad.
This commit implements page table walking using a page table iterator
abstraction. The first thing we'll add on top of this will be a helper
for reading memory from a virtual address space, but in the future it'd
also be possible to export the page table iterator directly.
2020-05-08 17:36:19 -07:00
Omar Sandoval
63299e0701 libdrgn: actually use uint64_t for two's complement unary ops
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
2020-05-08 13:50:24 -07:00
Omar Sandoval
8f81ea255f libdrgn: don't use unaligned loads to parse DWARF
-fsanitize=undefined reports that the read_u* helpers rely on unaligned
loads. Use memcpy() instead.
2020-05-08 13:50:24 -07:00
Omar Sandoval
3d59e042f4 libdrgn: don't open-code fls()
c_integer_literal() has an open-coded equivalent of fls() that assumes
that unsigned long long is 64 bits. Use fls() instead.
2020-05-08 00:20:42 -07:00
Omar Sandoval
340e00dfb5 libdrgn: improve and document bit operations
fls() can be implemented with __bitop(), and we can get rid of clz() since
it's only used by fls().
2020-05-08 00:14:25 -07:00
Omar Sandoval
f49d68d8f9 libdrgn: split generic utility functions out of internal.h
internal.h includes both drgn-specific helpers and generic utility
functions. Split the latter into their own util.h header and use it
instead of internal.h in the generic data structure code. This makes it
easier to copy the data structures into other projects/test programs.
2020-05-07 16:03:43 -07:00
Omar Sandoval
a95e42ef2e libdrgn/python: use vector for Program_load_debug_info()
Program_load_debug_info() is the last user of the
resize_array()/realloc_array() utility functions. We can clean it up by
using a vector and finally get rid of those functions.

This also happens to fix three bugs in Program_load_debug_info(): we
weren't setting a Python exception if we couldn't allocate the path_args
array, we weren't zeroing path_args after resizing the array, and we
weren't freeing the path_args array. Shame on whoever wrote this.
2020-05-07 15:47:57 -07:00
Omar Sandoval
0a100064c1 libdrgn: improve and rename DRGN_UNREACHABLE()
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.

DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
2020-05-07 15:16:22 -07:00
Omar Sandoval
d759c7ed20 libdrgn: get rid of OFF_MAX
This hasn't been used since commit 417a6f0d76 ("libdrgn: make memory
reader pluggable with callbacks").
2020-05-07 14:41:05 -07:00
Omar Sandoval
23574e59d5 libdrgn: add /proc/kcore physical segments on old kernels
Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so
it's currently not possible to read from physical memory on old kernels.
However, if we can figure out the address of the direct mapping, then we
can determine the corresponding physical addresses for the segments and
add them.
2020-05-04 13:20:27 -07:00
Omar Sandoval
f8c33518eb libdrgn: handle kernel core dumps with all zero p_paddr
We treat core dumps with all zero p_paddrs as not having valid physical
addresses. However, it is theoretically possible for a kernel core dump
to only have one segment which legitimately has a p_paddr of 0 (e.g., if
it only has a segment for the direct mapping, although note that this
isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the
BIOS [1]).

If the core dump has a VMCOREINFO note, then it is either a vmcore,
which has valid physical addresses, or it is /proc/kcore with Linux
kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to
/proc/kcore") (in v4.19), so it must also have Linux kernel commit
464920104bf7 ("/proc/kcore: update physical address for kcore ram and
text") (in v4.11) (ignoring the possibility of a franken-kernel which
backported the former but not the latter). Therefore, treat core dumps
with a VMCOREINFO note as having valid physical addresses.

1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678
2020-05-04 13:20:27 -07:00
Omar Sandoval
5505628235 libdrgn: get rid of struct drgn_program.num_file_segments
This isn't used anymore. Remove it and simplify the loop adding file
segments.
2020-05-04 13:20:27 -07:00
Omar Sandoval
510b6ef3b5 libdrgn: read vmcoreinfo physical address as uint64_t
Even on 32-bit architectures, physical addresses can be 64 bits (e.g.,
x86 with PAE). It's unlikely that the vmcoreinfo note would be in such
an address, but let's always parse it as a uint64_t just to be safe.
2020-05-04 13:20:27 -07:00
Omar Sandoval
10e58777c3 Add Program.read_{u8,u16,u32,u64,word}()
I've found that I do this manually a lot (e.g., when digging through a
task's stack). Add shortcuts for reading unsigned integers and a note
for how to manually read other formats.
2020-04-27 17:27:10 -07:00
Omar Sandoval
b1315fcaa1 libdrgn: add drgn_program_bswap()
This is clearer than open-coding the endianness check.
2020-04-27 17:08:02 -07:00
Omar Sandoval
1af57230c4 drgn 0.0.4 2020-04-21 15:52:01 -07:00
Omar Sandoval
ea3cdb43f7 libdrgn/python: optimize Object.__getattribute__()
DrgnObject_getattro() uses PyObject_GenericGetAttr() and catches the
AttributeError raised if the name is not an attribute of the Object
class. If the member is found, we then destroy the AttributeError.
Raising an exception only to destroy it is obviously wasteful. Luckily,
as of Python 3.7, the lower-level _PyObject_GenericGetAttrWithDict() can
suppress the AttributeError; we can raise it ourselves if we need it. In
my microbenchmarks, this makes Object.__getattribute__() at least twice
as fast when the member exists.

This also fixes a drgn_error leak.
2020-04-20 17:45:30 -07:00
Jeffrey Mahoney
9bb2ccecb7 Enable DWARF indexing to work with partial units
Loading debuginfo from a kernel with separate or re-combined debuginfo
results in the following failure:

Traceback (most recent call last):
  File "/home/jeffm/.local/bin/drgn", line 11, in <module>
    load_entry_point('drgn==0.0.3+39.gbf05d9bf', 'console_scripts', 'drgn')()
  File "/home/jeffm/.local/lib/python3.6/site-packages/drgn-0.0.3+39.gbf05d9bf-py3.6-linux-x86_64.egg/drgn/internal/cli.py", line 121, in main
    prog.load_debug_info(args.symbols, **args.default_symbols)
Exception: invalid DW_AT_decl_file 10

A typical kernel build results in DWARF information with
DW_TAG_compile_unit tags but the objcopy --only-keep-debug and eu-unstrip
commands use DW_TAG_partial_unit tags.  The DWARF indexer only handles
the former, so the file table isn't populated and we get the exception.

Fortunately, the two are more or less interchangeable. (See
http://dwarfstd.org/doc/Dwarf3.pdf, section E.2.3).  Accepting both for
indexing compile units results in a working session.
2020-04-14 14:11:24 -07:00
Omar Sandoval
a3248b51e3 libdrgn: fix use after free when formatting compound types
compound_initializer_init_next() saves a pointer to the compound
initializer stack and uses it after appending to the stack, which may
have reallocated the stack.
2020-04-13 16:49:18 -07:00
Jay Kamat
ecef9d74ef libdrgn: get rid of arrays embedded in drgn_type
For C++ support, we need to add an array of template parameters to
struct drgn_type. struct drgn_type already has arrays for members,
enumerators, and parameters embedded at the end of the structure,
because no type needs more than one of those. However, struct, union,
and class types may need members and template parameters. We could add a
separate array of templates, but then it gets confusing having two
methods of storing arrays in struct drgn_type. Let's make these arrays
separate instead of embedding them.
2020-04-13 16:47:05 -07:00
Omar Sandoval
7a9fad0fd2 libdrgn: move _vmemmap() to object finder
Similarly to PAGE_OFFSET, vmemmap makes more sense as part of the Linux
kernel object finder than an internal helper.

While we're here, let's fix the definition for 5-level page tables. This
only matters for kernels with commit 77ef56e4f0fb ("x86: Enable 5-level
paging support via CONFIG_X86_5LEVEL=y") but without eedb92abb9bb
("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y")
(namely, v4.14, v4.15, and v4.16); since v4.17, 5-level page table
support enables KASLR.
2020-04-10 15:33:29 -07:00
Omar Sandoval
5ac95e491a libdrgn: fix _page_offset() helper and move to object finder
The internal _page_offset() helper gets the value of PAGE_OFFSET, but
the fallback when KASLR is disabled has been out of date since Linux
v4.20 and never handled 5-level page tables. Additionally, it makes more
sense as part of the Linux kernel (formerly vmcoreinfo) object finder so
that it's cleanly accessible outside of drgn internals.
2020-04-10 15:33:27 -07:00
Omar Sandoval
1dbc718840 helpers: add pgtable_l5_enabled() 2020-04-10 15:18:46 -07:00
Jeff Mahoney
bf05d9bf3f libdrgn: allow to build without openmp
The configure script allows the user to not use any openmp
implementation but dwarf_index.c uses the locking APIs unconditionally.
This compiles but fails at runtime.

Adding simple stubs for the locking API. This is useful when debugging
crashes in dwarf indexing during development.
2020-04-08 12:33:40 -07:00
Omar Sandoval
512acbc9c8 Rebase elfutils to pick up PN_XNUM fix
elfutils 0.179 was just released, and it includes my fix for vmcores
with >= 2^16 phdrs.

Based on:

3a7728808 Prepare for 0.179

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-04-06 12:07:01 -07:00
Jay Kamat
d8fadf10ee libdrgn: Add cpp language and tests 2020-04-03 16:35:38 -07:00
Omar Sandoval
fa61977f60 libdrgn: fix default language detection
Jay reported that the default language detection was happening too early
and not finding "main". We need to make sure to do it after the DWARF
index is actually populated. The problem with that is that it makes
error reporting much harder, as we don't want to return a fatal error
from drgn_program_set_language_from_main() if we actually succeeded in
loading debug info. That means we probably need to ignore errors in
drgn_program_set_language_from_main(). To reduce the surface area where
we'd be failing, let's get the language directly from the DWARF index.
This also allows us to avoid setting the language if it's actually
unknown (information which is lost by the time we convert it to a
drgn_object in the current code).
2020-04-03 16:35:38 -07:00
Omar Sandoval
bd902f299c libdrgn: return unknown language from drgn_language_from_die()
Currently, drgn_language_from_die() returns the default language when it
encounters an unknown DW_LANG because the dwarf_info_cache always wants
a language. The next change will want to detect the unknown language
case, so make drgn_language_from_die() return NULL if the language is
unknown, move it to language.c, and fold drgn_language_from_dw_lang()
into it.
2020-04-03 16:35:38 -07:00
Serapheim Dimitropoulos
08193a97aa Support stack traces for running threads on kdumps 2020-03-27 16:12:03 -07:00
Omar Sandoval
79f973007b libdrgn/python: fix reference counting on Object.type_
We need to keep the Program alive for its types to stay valid, not just
the objects the Program has pinned. (I have no idea why I changed this
in commit 565e0343ef ("libdrgn: make symbol index pluggable with
callbacks").)
2020-03-13 16:05:43 -07:00
Omar Sandoval
cae7336750 libdrgn: fix error when expecting identifier after tag in type name
We should be looking at the kind of the previous token, not the kind of
the unexpected token.

Closes #52.
2020-03-13 11:07:46 -07:00
Jay Kamat
3f870603fa libdrgn: add default language to drgn_program
For operations where we don't have a type available, we currently fall
back to C. Instead, we should guess the language of the program and use
that as the default. The heurisitic implemented here gets the language
of the CU containing "main" (except for the Linux kernel, which is
always C). In the future, we should allow manually overriding the
automatically determined language.
2020-02-26 19:55:42 -08:00
Jay Kamat
6c264b0eae libdrgn: add language to struct drgn_type
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
2020-02-26 19:55:42 -08:00
Omar Sandoval
3d3c32f849 libdrgn/python: add Language to Python bindings 2020-02-26 19:55:42 -08:00
Omar Sandoval
9e2df9f217 libdrgn: put language definitions in one array
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
2020-02-26 19:55:42 -08:00
Omar Sandoval
c1e1724c8e libdrgn: unify drgn_type_from_dwarf_child{,_internal}
The plain variant is a trivial wrapper around the internal variant, so
get rid of the wrapper and use the internal variant directly everywhere.
2020-02-26 19:55:42 -08:00
Omar Sandoval
efccc93e65 libdrgn: remove can_be_void from drgn_lazy_type_from_dwarf
We only lazily evaluate compound type members and function type
parameters, which are never void.
2020-02-26 19:55:42 -08:00
Omar Sandoval
376979d25a Remove stray reference to gen_docstrings.py 2020-02-25 13:58:10 -08:00
Omar Sandoval
80c9fb35ff Add type hint stubs and generate documentation from them
I've been wanting to add type hints for the _drgn C extension for
awhile. The main blocker was that there is a large overlap between the
documentation (in docs/api_reference.rst) and the stub file, and I
really didn't want to duplicate the information. Therefore, it was a
requirement that the the documentation could be generated from the stub
file, or vice versa. Unfortunately, none of the existing tools that I
could find supported this very well. So, I bit the bullet and wrote my
own Sphinx extension that uses the stub file as the source of truth (and
subsumes my old autopackage extension and gen_docstrings script).

The stub file is probably incomplete/inaccurate in places, but this
should be a good starting point to improve on.

Closes #22.
2020-02-25 13:39:06 -08:00
Omar Sandoval
52e9b2f8d8 drgn 0.0.3 2020-02-21 10:37:02 -08:00
Serapheim Dimitropoulos
e3789512ab Fix leak in kdump code
prog->kdump_ctx is never really initialized, and the kdump_ctx struct
allocated in drgn_program_set_kdump() is leaked.
2020-02-20 15:49:44 -08:00
Omar Sandoval
9246094cdc libdrgn: use dwfl_frame_register() instead of dwfl_frame_eval_expr()
I thought I'd be able to avoid adding a separate API for register values
and reuse dwfl_frame_eval_expr(), but this doesn't work if the frame is
missing debug information but has known register values (e.g., if the
program crashed with an invalid instruction pointer).
2020-02-20 14:13:08 -08:00
Omar Sandoval
016189f477 Update elfutils with improved stack frame interface
Rebase on master, add the improved
dwfl_frame_module/dwfl_frame_dwarf_frame patch, and add the
dwfl_frame_register patch.

Based on:

889edd912 PR25365: debuginfod-client: restrict cleanup to client-pattern files

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-02-20 13:49:10 -08:00
Omar Sandoval
a5cd92f24e libdrgn: make vmcoreinfo accessible before loading debug info
UTS_RELEASE is currently only accessible once debug info is loaded with
prog.load_debug_info(main=True). This makes it difficult to get the
release, find the appropriate vmlinux, then load the found vmlinux. We
can add vmcoreinfo_object_find as part of set_core_dump(), which makes
it possible to do the following:

  prog = drgn.Program()
  prog.set_core_dump(core_dump_path)
  release = prog['UTS_RELEASE'].string_()
  vmlinux_path = find_vmlinux(release)
  prog.load_debug_info([vmlinux_path])

The only downside is that this ends up using the default definition of
char rather than what we would get from the debug info, but that
shouldn't be a big problem.
2020-02-19 12:11:45 -08:00
Omar Sandoval
cc18d9e502 libdrgn: add UTS_RELEASE to vmcoreinfo_object_find
The osrelease is accessible via init_uts_ns.name.release, but we can
also get it straight out of vmcoreinfo, which will be useful for the
next change. UTS_RELEASE is the name of the macro defined in the kernel.
2020-02-19 12:11:20 -08:00
Omar Sandoval
26ef465007 libdrgn/python: add proper type for members and parameters
This continues the conversion from the last commit. Members and
parameters are basically the same, so we can do them together. Unlike
enumerators, these don't make sense to unpack or access as sequences.
2020-02-12 15:40:19 -08:00
Omar Sandoval
7c70a1a384 libdrgn/python: add proper type for enumerators
Currently, type members, enumerators, and parameters are all represented
by tuples in the Python bindings. This is awkward to document and
implement. Instead, let's replace these tuples with proper types,
starting with the easiest one, TypeEnumerator. This one still makes
sense to treat as a sequence so that it can be unpacked as (name,
value).
2020-02-12 15:37:41 -08:00
Jay Kamat
31d544949f libdrgn: Add find_symbol_by_name to look up ELF symbols 2020-02-12 14:06:49 -08:00
Jay Kamat
054cb54a01 libdrgn: Rename find_symbol to find_symbol_by_address 2020-02-12 14:06:49 -08:00
Omar Sandoval
9de2cc8410 libdrgn/python: make Object.__index__() TypeError message clearer
Currently, we print:

>>> prog.symbol(prog['init_task'])
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: cannot convert 'struct task_struct' to index

It's not obvious what it means to convert to an index. Instead, let's
use the error message raised by operator.index():

TypeError: 'struct task_struct' object cannot be interpreted as an integer
2020-02-11 09:19:53 -08:00
Serapheim Dimitropoulos
80fef04c70 Add address attribute to FaultError exception 2020-02-04 14:59:31 -08:00
Serapheim Dimitropoulos
ad82e9623a Introduce OutOfBoundsError
Decouple some of the responsibilities of FaultError to
OutOfBoundsError so consumers can differentiate between
invalid memory accesses and running out of bounds in
drgn Objects which may be based on valid memory address.
2020-02-04 14:59:31 -08:00
Omar Sandoval
653e923657 drgn 0.0.2 2020-01-31 13:25:33 -08:00
Omar Sandoval
d4cc7945af Support building with alternative OpenMP runtime libraries
At Facebook, we link OpenMP code with libomp instead of libgomp. We have
an internal patch to drgn to do this, as it can't be done by setting
CFLAGS/LDFLAGS. Let's add a way to specify the OpenMP library at
configure time so that we can drop the internal patch.
2020-01-24 10:22:38 -08:00
Omar Sandoval
0088618578 Include git revision in version
When investigating a reported bug, it's important to know which exact
version of drgn is being used. Let's include the git revision in the
version printed by the CLI in PEP440 format. This commit will also serve
as the 0.0.1 release.
2020-01-23 12:29:43 -08:00
Omar Sandoval
660276a0b8 Format Python code with Black
I'm not a fan of 100% of the Black coding style, but I've spent too much
time manually formatting Python code, so let's just pull the trigger.
2020-01-14 11:51:58 -08:00
Omar Sandoval
e8d1ef82fa Make drgn.h depend on configure.ac
The previous commit forgot to add this dependency so that when the
version number is updated drgn.h actually gets regenerated.
2020-01-11 22:34:03 -08:00
Omar Sandoval
09a64f5cba Define version in libdrgn/configure.ac
Currently the drgn version number is defined in drgn.h.in, and configure
and setup.py both parse it out of there. However, now that we're
generating drgn.h anyways, it's easier to make configure.ac the source
of truth.
2020-01-11 10:05:57 -08:00
Omar Sandoval
1443d17fb4 libdrgn: add DRGN_FORMAT_OBJECT_IMPLICIT_ELEMENTS 2019-12-19 11:43:54 -08:00
Omar Sandoval
db66952b2e libdrgn: add DRGN_FORMAT_OBJECT_IMPLICIT_MEMBERS 2019-12-19 11:43:54 -08:00
Omar Sandoval
c8434e9a9e libdrgn: add DRGN_FORMAT_OBJECT_ELEMENT_INDICES 2019-12-19 11:43:54 -08:00
Omar Sandoval
cfceb491db libdrgn: add DRGN_FORMAT_OBJECT_MEMBER_NAMES 2019-12-19 11:43:54 -08:00
Omar Sandoval
4fad941ec1 libdrgn: add DRGN_FORMAT_OBJECT_{MEMBERS,ELEMENTS}_SAME_LINE 2019-12-19 11:43:54 -08:00
Omar Sandoval
c3f69ba559 libdrgn: use c_format_initializer for struct/union/class 2019-12-19 11:43:54 -08:00
Omar Sandoval
e2d2df4024 libdrgn: factor c_format_initializer out of c_format_array
This will also be used for compound types, and we're going to add a few
more options that we should handle in one place.
2019-12-19 11:43:54 -08:00
Omar Sandoval
6bb8da04a0 libdrgn: omit trailing comma when formatting one-line array
This is somewhat arbitrary, but I think it looks more natural to only
use the trailing comma for multi-line initializers.
2019-12-19 11:43:54 -08:00
Omar Sandoval
1411ba36a8 libdrgn: remove dead code in c_format_array_object
When we're checking whether the element that we formatted on one line
would fit on the previous line, we check whether the previous line is
empty with remaining_columns == start_columns. This is never true, as
remaining_columns is always set to start_columns - 1 at most, and it
only decreases from there until we start  a new line.
2019-12-19 11:43:54 -08:00
Omar Sandoval
7a3bf73df0 libdrgn: replace drgn_object_truthiness() with drgn_object_is_zero()
drgn_object_truthiness() is a misnomer, as truthiness is a
language-specific concept. Instead, invert the return value and rename
it to drgn_object_is_zero(), which more accurately conveys the meaning.
2019-12-19 11:43:54 -08:00
Omar Sandoval
d77b7bd7e3 libdrgn: add DRGN_FORMAT_OBJECT_{TYPE_NAME,MEMBER_TYPE_NAMES,ELEMENT_TYPE_NAMES} 2019-12-19 11:43:54 -08:00
Omar Sandoval
89307c532a libdrgn: add DRGN_FORMAT_OBJECT_CHAR 2019-12-19 11:43:54 -08:00
Omar Sandoval
7cee597fff libdrgn: add DRGN_FORMAT_OBJECT_STRING 2019-12-19 11:43:54 -08:00
Omar Sandoval
5865fa4d16 libdrgn: add DRGN_FORMAT_OBJECT_SYMBOLIZE 2019-12-19 11:43:54 -08:00
Omar Sandoval
0a707b0c9d libdrgn: rework drgn_find_symbol_internal()
Instead of having two internal variants (drgn_find_symbol_internal() and
drgn_program_find_symbol_in_module()), combine them into the former and
add a separate drgn_error_symbol_not_found() for translating the static
error to the user-facing one. This makes things more flexible for the
next change.
2019-12-19 11:43:54 -08:00
Omar Sandoval
f58bc4bf3a libdrgn: add DRGN_FORMAT_OBJECT_DEREFERENCE 2019-12-19 11:43:54 -08:00
Omar Sandoval
5fb02f03fd libdrgn: add flags to drgn_format_object() 2019-12-19 11:43:54 -08:00
Omar Sandoval
cf3a07bdfb libdrgn: python: replace Object.__format__ with Object.format_
We'd like to have more control over how objects are formatted. I
considered defining a custom string format specification syntax, but
that's not easily discoverable. Instead, let's get rid of the current
format specification support and replace it with a normal method.
2019-12-19 11:43:52 -08:00
Omar Sandoval
3b22bd3022 libdrgn: rename pretty_print -> format
In preparation for making drgn_pretty_print_object() more flexible
(i.e., not always "pretty"), rename it to drgn_format_object(). For
consistency, let's rename drgn_pretty_print_type_name(),
drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.
2019-12-16 11:21:12 -08:00
Serapheim Dimitropoulos
501d36c18e libdrgn: fix regression in kernel module loading
Commit f327552229 ("libdrgn: add strstartswith()") flipped the test
for a name entry in modinfo. This introduced a regression resulting in
kernel modules not loading at the right offset. This patch fixes the
regression.
2019-12-13 19:19:31 -05:00
Omar Sandoval
54e3e4a6d6 Rebase elfutils and remove dwfl_addrmodule patches
The previous commit was the real fix for the failed symbol lookups. On
the bright side, the build fixes were merged, so we can rebase on master
and drop those.

Based on:

277c2c54f libcpu: Compile i386_lex.c with -Wno-implicit-fallthrough

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2019-12-12 21:14:51 -08:00
Omar Sandoval
b0c4f894d4 libdrgn: really fix failed kernel module symbol lookups
It turns out this wasn't a problem with dwfl_addrmodule() at all; the
real problem is that .init sections are freed once the module is loaded
but we're still considering them for the address range we pass to
dwfl_report_module(). Ignore those sections entirely (by omitting them
from the section name to section index map). While we're here, let's not
bother inserting non-SHF_ALLOC sections in the map.
2019-12-12 21:14:02 -08:00
Omar Sandoval
f327552229 libdrgn: add strstartswith()
Instead of open coding this check all over the place, add a helper
function.
2019-12-12 13:26:50 -08:00
Omar Sandoval
ad5c925aff Update elfutils with dwfl_addrmodule fix
This fixes the issue that Program.symbol() sometimes fails for kernel
module symbols.

Based on:

2c7c4037 elfutils.spec.in: Sync with fedora spec, remove rhel/fedora specifics.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
libdwfl: return error from __libdwfl_relocate_value for unloaded sections
libdwfl: remove broken coalescing logic in dwfl_report_segment
libdwfl: store module lookup table separately from segments
libdwfl: use sections of relocatable files for dwfl_addrmodule
2019-12-11 22:34:05 -08:00
Omar Sandoval
4a8152175b libdrgn: translate EIO from /proc/$pid/mem to DRGN_ERROR_FAULT
For live userspace processes, we add a single [0, UINT64_MAX) memory
file segment for /proc/$pid/mem. Of course, not every address in that
range is valid; reading from an invalid address returns EIO. We should
translate this to a DRGN_ERROR_FAULT instead of DRGN_ERROR_OS, but only
for /proc/$pid/mem.
2019-12-10 13:30:34 -08:00
Omar Sandoval
248cec7f7c libdrgn: python: fix uninitialized index_args
In commit 55a9700435 ("libdrgn: python: accept integer-like arguments
in more places"), I converted Program_symbol to use index_converter but
forgot to initialize the struct index_arg. Then, in commit c243daed59
("Translate find_task() helper (and dependencies) to C"), I added a
bunch more cases of uninitialized struct index_arg. If
index_arg.allow_none gets a non-zero garbage value, then this can end up
allowing None through when it shouldn't. Furthermore, since commit
2561226918 ("libdrgn: python: add signed integer support to
index_converter"), if index_arg.is_signed gets a non-zero garbage value,
then this will try to get a signed integer when we're expecting an
unsigned integer, which can blow up for values >= 2**63 (like kernel
symbols). Fix it by initializing struct index_arg everywhere.

Fixes #30.
2019-12-05 14:35:54 -08:00
Omar Sandoval
d3afc63ac9 Update to elfutils 0.178
Rebase on 0.178. The only additional change needed is to pass
--disable-debuginfod to configure.

Based on:

2c7c4037 elfutils.spec.in: Sync with fedora spec, remove rhel/fedora specifics.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2019-12-03 12:39:11 -08:00
Omar Sandoval
7b518fc2fd libdrgn: support negative array subscripts
This was an oversight, as negative indices are completely valid (and
occasionally useful, like when looking at a stack).
2019-11-29 21:06:37 -08:00
Omar Sandoval
2561226918 libdrgn: python: add signed integer support to index_converter
This is preparation for the next change.
2019-11-29 20:40:40 -08:00
Omar Sandoval
dd59e5431c libdrgn: fix extremely slow type comparison
Matt Ahrens reported that comparing two types would sometimes end up in
a seemingly infinite loop, which he discovered was because we repeat
comparisons of types as long as they're not in a cycle. Fix it by
caching all comparisons during a call.
2019-11-24 09:46:00 -08:00
Omar Sandoval
b8b93ae3e6 libdrgn: python: fix deprecation warning in unit tests
Some tests (e.g., tests.test_object.TestSpecialMethods.test_round) are
printing:

  DeprecationWarning: an integer is required (got type float).  Implicit
  conversion to integers using __int__ is deprecated, and may be removed
  in a future version of Python.

See https://bugs.python.org/issue36048. This is coming from calls like:

  Object(prog, 'int', value=1.5)

We actually want the truncating behavior, so explicitly call
PyNumber_Long().
2019-11-22 17:18:55 -08:00
Omar Sandoval
6af6159cfc libdrgn: support loading only load main debug info
If we only want debugging information for vmlinux and not kernel
modules, it'd be nice to only load the former. This adds a load_main
parameter to drgn_program_load_debug_info() which specifies just that.
For now, it's only implemented for the Linux kernel. While we're here,
let's make the paths parameter optional for the Python bindings.
2019-11-22 16:38:52 -08:00
Omar Sandoval
09108d22fa libdrgn: x86_64: support unwinding stack on Linux < 4.9 2019-11-22 16:38:49 -08:00
Amlan Nayak
0df2152307 Add basic class type support
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
2019-11-18 10:36:40 -08:00
Omar Sandoval
b49f773fe6 libdrgn: python: fix build on Python 3.8
Python 3.8 replaced the unused void *tp_print field with Py_ssize_t
tp_vectorcall_offset, so with -Werror we get "error: initialization of
‘long int’ from ‘void *’ makes integer from pointer without a cast".
Let's just use designated initializers.
2019-11-15 10:41:58 -08:00
Omar Sandoval
1c8eced0c6 libdrgn: stack_trace: support unwinding stack from struct pt_regs
Linux kernel IRQ handlers store the registers from before the interrupt
as struct pt_regs, so add a way to unwind the stack given only that
structure.
2019-10-28 13:56:54 -07:00
Omar Sandoval
4780c7a266 libdrgn: stack_trace: prohibit unwinding stack of running tasks
We currently don't check that the task we're unwinding is actually
blocked, which means that linux_kernel_set_initial_registers_x86_64()
will get garbage from the stack and we'll return a nonsense stack trace.
Let's avoid this by checking that the task isn't running if we didn't
find a NT_PRSTATUS note.
2019-10-28 13:37:57 -07:00
Omar Sandoval
326107f054 libdrgn: add task_state_to_char() helper
Add a helper to get the state of a task (e.g., 'R', 'S', 'D'). This will
be used to make sure that a task is not running when getting a stack
trace, so implement it in libdrgn.
2019-10-28 13:37:57 -07:00
Omar Sandoval
91f5c8e2e7 libdrgn: stack_trace: support unwinding stack from thread ID
When debugging the Linux kernel, it's inconvenient to have to get the
task_struct of a thread in order to get its stack trace. This adds
support for looking it up solely by PID. In that case, we do the
find_task() inside of libdrgn. This also gives us stack trace support
for userspace core dumps almost for free since we already added support
for NT_PRSTATUS.
2019-10-28 13:37:53 -07:00
Omar Sandoval
0f7ad0ed26 libdrgn: stack_trace: support unwinding stack from core dump
vmcores include a NT_PRSTATUS note for each CPU containing the PID of
the task running on that CPU at the time of the crash and its registers.
We can use that to unwind the stack of the crashed tasks.
2019-10-28 13:36:02 -07:00
Omar Sandoval
75c74022ff libdrgn: save Elf handle for core dump
Currently, we close the Elf handle in drgn_set_core_dump() after we're
done with it. However, we need the Elf handle in
userspace_report_debug_info(), so we reopen it temporarily. We will also
need it to support getting stack traces from core dumps, so we might as
well keep it open. Note that we keep it even if we're using libkdumpfile
because libkdumpfile doesn't seem to have an API to access ELF notes.
2019-10-28 13:09:25 -07:00
Omar Sandoval
c243daed59 Translate find_task() helper (and dependencies) to C
We'd like to be able to look up tasks by PID from libdrgn, but those
helpers are written in Python. Translate them to C and add some thin
bindings so we can use the same implementation from Python.
2019-10-28 13:08:57 -07:00
Omar Sandoval
b5735de8dc libdrgn: add drgn_object_read_integer()
There are some cases where we want to read an integer regardless of its
signedness, so drgn_object_read_signed() and drgn_object_read_unsigned()
are cumbersome to use, and drgn_object_read_value() is too permissive.
2019-10-28 13:06:38 -07:00
Omar Sandoval
97b5967c37 libdrgn: add a couple of helpers for working with buffer and reference objects
Expose drgn_object_buffer() and add drgn_buffer_object_size() and
drgn_reference_object_size().
2019-10-28 11:34:08 -07:00
Omar Sandoval
0da60a41cd libdrgn: support getting register values from stack frames
Currently, the only information available from a stack frame is the
program counter. Eventually, we'd like to add support for getting
arguments and local variables, but that will require more work. In the
mean time, we can at least get the values of other registers. A
determined user can read the assembly for the code they're debugging and
derive the values of variables from the registers.
2019-10-19 13:53:06 -07:00
Omar Sandoval
4fb0e2e110 libdrgn: use new libdwfl stack trace API 2019-10-18 14:34:11 -07:00
Omar Sandoval
6f43fff627 Update elfutils with new stack frame interface
Rebase the existing patches and add the patches which extend the libdwfl
stack frame interface.

Based on:

47780c9e elflint, readelf: enhance error diagnostics

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: don't bother freeing frames outside of dwfl_thread_getframes
libdwfl: only use thread->unwound for initial frame
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2019-10-18 14:34:11 -07:00
Omar Sandoval
d60c6a1d68 libdrgn: add register information to platform
In order to retrieve registers from stack traces, we need to know what
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.
2019-10-18 14:33:02 -07:00
Omar Sandoval
b8c657d760 libdrgn: python: add sizeof()
It's annoying to do obj.type_.size, and that doesn't even work for every
type. Add sizeof() that does the right thing whether it's given a Type
or Object.
2019-10-18 11:47:32 -07:00
Omar Sandoval
12b0214b4d libdrgn: work around DW_AT_upper_bound of -1 for empty arrays
For the following source code:

  int arr[] = {};

GCC emits the following DWARF:

  DWARF section [ 4] '.debug_info' at offset 0x40:
   [Offset]
   Compilation unit at offset 0:
   Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
   [     b]  compile_unit         abbrev: 1
             producer             (strp) "GNU C17 9.2.0 -mtune=generic -march=x86-64 -g"
             language             (data1) C99 (12)
             name                 (strp) "test.c"
             comp_dir             (strp) "/home/osandov"
             stmt_list            (sec_offset) 0
   [    1d]    array_type           abbrev: 2
               type                 (ref4) [    34]
               sibling              (ref4) [    2d]
   [    26]      subrange_type        abbrev: 3
                 type                 (ref4) [    2d]
                 upper_bound          (sdata) -1
   [    2d]    base_type            abbrev: 4
               byte_size            (data1) 8
               encoding             (data1) signed (5)
               name                 (strp) "ssizetype"
   [    34]    base_type            abbrev: 5
               byte_size            (data1) 4
               encoding             (data1) signed (5)
               name                 (string) "int"
   [    3b]    variable             abbrev: 6
               name                 (string) "arr"
               decl_file            (data1) test.c (1)
               decl_line            (data1) 1
               decl_column          (data1) 5
               type                 (ref4) [    1d]
               external             (flag_present) yes
               location             (exprloc)
                [ 0] addr .bss+0 <arr>

Note the DW_AT_upper_bound of -1. We end up parsing this as UINT64_MAX
and returning a "DW_AT_upper_bound is too large" error. It appears that
GCC is simply emitting the array length minus one, so let's treat these
as having a length of zero.

Fixes #19.
2019-10-18 03:18:21 -07:00
Omar Sandoval
430732093d libdrgn: python: add converter for byteorder
Rather than open-coding the conversion where we need it, make it a
proper converter function.
2019-10-15 21:21:21 -07:00
Omar Sandoval
55a9700435 libdrgn: python: accept integer-like arguments in more places
There are a few places (e.g., Program.symbol(), Program.read()) where it
makes sense to accept, e.g., a drgn.Object with integer type. Replace
index_arg() with a converter function and use it everywhere that we use
the "K" format for PyArg_Parse*.
2019-10-15 21:10:11 -07:00
Omar Sandoval
77253dbdd8 libdrgn: dwarf_info_cache: fix wrong DW_AT_upper_bound error message
I got the error messages for DW_AT_upper_bound and DW_AT_count
backwards; fix it. Also fix the condition for word + 1 overflowing
dimension->length to be word >= UINT64_MAX. (Dwarf_Word is uint64_t so
this is kind of silly, but at least it documents the intent).
2019-10-15 17:07:27 -07:00
Omar Sandoval
4e330bbb6e cli: indicate if drgn was compiled with libkdumpfile 2019-10-03 16:22:10 -07:00
Omar Sandoval
78192cd61e libdrgn: add environment variable to see more missing debug info errors
Sometimes, I'd like to see all of the missing debug info errors rather
than just the first 5. Allow setting this through the
DRGN_MAX_DEBUG_INFO_ERRORS environment variable.
2019-10-02 17:22:12 -07:00
Omar Sandoval
7848c17097 libdrgn: dwarf_index: tweak missing debug section error message
Make the error message more concise, and reorder the sections so that we
check the most obviously-named section (.debug_info) first and least
important section (.debug_line) last.
2019-10-02 17:22:12 -07:00
Omar Sandoval
423d2cd500 libdrgn: dwarf_index: rework file reporting
Currently, the interface between the DWARF index, libdwfl, and the code
which finds and reports vmlinux/kernel modules is spaghetti. The DWARF
index tracks Dwfl_Modules via their userdata. However, despite
conceptually being owned by the DWARF index, the reporting code reports
the Dwfl_Modules and sets up the userdata. These Dwfl_Modules and
drgn_dwfl_module_userdatas are messy to track and pass between the
layers.

This reworks the architecture so that the DWARF index owns the Dwfl
instance and files are reported to the DWARF index; the DWARF index
takes care of reporting to libdwfl internally. In addition to making the
interface for the reporter much cleaner, this improves a few things as a
side-effect:

- We now deduplicate on build ID in addition to path.
- We now skip searching for vmlinux and/or kernel modules if they were
  already indexed.
- We now support compressed ELF files via libdwelf.
- We can now load default debug info at the same time as additional
  debug info.
2019-10-02 17:22:11 -07:00
Omar Sandoval
91265c37a0 libdrgn: hash_table: fix memcmp() undefined behavior
It's undefined behavior to pass NULL to memcmp() even if the length is
zero. See also commit a17215e984 ("libdrgn: dwarf_index: fix memcpy()
undefined behavior").
2019-10-02 17:16:43 -07:00
Omar Sandoval
b05cc0eb75 libdrgn: use libkdumpfile for ELF vmcores when available
vmcores don't include program headers for special memory regions like
vmalloc and percpu. Instead, we need to walk the kernel page table to
map those addresses. Luckily, libkdumpfile already does that. So, if
drgn was built with libkdumpfile support, use it for ELF vmcores. Also
add an environment variable to override this behavior.

Closes #15.
2019-10-02 17:15:36 -07:00
Omar Sandoval
191c5ae253 libelf: clean up SHF_COMPRESSED handling
We don't need the ifdef anymore since we're using the elf.h from our
local elfutils. We can also fold a leftover nested if.
2019-09-24 17:16:17 -07:00
Omar Sandoval
ca9cdc1991 libdrgn: autogenerate docstrings.h
I didn't want to use BUILT_SOURCES before because that would break make
$TARGET. But, now that doesn't work anyways because we're using SUBDIRS,
so we might as well use BUILT_SOURCES.
2019-09-19 11:08:04 -07:00
Omar Sandoval
aa4bfd646f libdrgn: simplify gen_constants.py header search
Instead of passing in a directory for header files, add -iquote for that
directory.
2019-09-19 11:08:04 -07:00
Omar Sandoval
6a13d74c0c libdrgn: build with bundled elfutils
Now that we have the bundled version of elfutils, build it from libdrgn
and link to it. We can also get rid of the elfutils version checks from
the libdrgn code.
2019-09-19 11:07:12 -07:00
Omar Sandoval
1cedca8ff4 Import elfutils
Based on:

c950e8a9 config: Fix spec file, add manpages and new GFDL license.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough

The plan is to stop relying on the distribution's version of elfutils
and instead ship our own. This gives us freedom to assume that we're
using the latest version and even ship our own patches (starting with a
few build system improvements). More details are in
scripts/update-elfutils.sh, which was used to generate this commit.
2019-09-05 01:04:33 -07:00
Omar Sandoval
f11a8766bf setup.py: get list of source files from git
Currently, we have a special Makefile target to output the files for a
libdrgn source tarball, and we use that for setuptools. However, the
next change is going to import elfutils, and it'd be a pain to add the
same thing for the elfutils sources. Instead, let's just use git
ls-files for everything. The only difference is that source
distributions won't have the autoconf/automake output.
2019-09-03 17:19:02 -07:00
Omar Sandoval
a3f4fe0518 libdrgn: handle get_debug_sections() errors per-module
There's no reason to fail indexing just because one file is missing
debug information.
2019-08-29 12:26:40 -07:00
Omar Sandoval
62d98b3016 libdrgn: fold ELF relocation code into dwarf_index
I started with drgn_elf_relocator as a separate interface to parallelize
by relocation. However, the final result is parallelized by file, which
means that it can be done as part of the main read_cus() loop. Get rid
of the elf_relocator interface and do it in dwarf_index.c instead. This
means that if/when libdwfl gets faster at ELF relocations, we can rip
out the relocation code without any other changes.
2019-08-29 12:26:22 -07:00
Omar Sandoval
698991b27b Get rid of DRGN_ERROR_{ELF,DWARF}_ERROR and FileFormatError
We're too inconsistent with how we use these for them to be useful (and
it's impossible to distinguish between a format error and some other
error from libelf/libdw/libdwfl), so let's just get rid of them and make
it all DRGN_ERROR_OTHER/Exception.
2019-08-15 15:03:42 -07:00
Omar Sandoval
10142f922f Add basic stack trace support
For now, we only support stack traces for the Linux kernel (at least
v4.9) on x86-64, and we only support getting the program counter and
corresponding function symbol from each stack frame.
2019-08-02 00:26:28 -07:00
Serapheim Dimitropoulos
93d7ea9f01 Add support for kdump-compressed core dumps with libkdumpfile 2019-08-02 00:20:16 -07:00
Omar Sandoval
690b5fd650 libdrgn: generalize architecture to platform
For stack trace support, we'll need to have some architecture-specific
functionality. drgn's current notion of an architecture doesn't actually
include the instruction set architecture. This change expands it to a
"platform", which includes the ISA as well as the existing flags.
2019-08-02 00:11:56 -07:00
Omar Sandoval
71e6744210 libdrgn: add symbol table interface
Now that we're not overloading the name "symbol", we can define struct
drgn_symbol as a symbol table entry. For now, this is very minimal: it's
just a name, address, and size. We can then add a way to find the symbol
for a given address, drgn_program_find_symbol(). For now, this is only
supported through the actual ELF symbol tables. However, in the future,
we can probably support adding "symbol finders".
2019-07-30 09:25:34 -07:00
Omar Sandoval
0c5df56fba libdrgn: replace symbol index with object index
struct drgn_symbol doesn't really represent a symbol; it's just an
object which hasn't been fully initialized (see c2be52dff0 ("libdrgn:
rename object index to symbol index"), it used to be called a "partial
object"). For stack traces, we're going to have a notion of a symbol
that more closely represents an ELF symbol, so let's get rid of the
temporary struct drgn_symbol representation and just return an object
directly.
2019-07-29 17:04:47 -07:00
Omar Sandoval
74bd59e38a libdrgn: python: get rid of Program._symbol()
We can test with Program.object() just as easily, so get rid of this
undocumented method.
2019-07-29 17:04:47 -07:00
Omar Sandoval
62ff4e1dba libdrgn: indicate finder lookup failure with special error
Currently, finders indicate a non-fatal lookup error by setting the type
member to NULL. This won't work when we replace the symbol finder with
an object finder (which shouldn't modify the object on failure).
Instead, use a static error for this purpose.
2019-07-29 17:04:47 -07:00
Omar Sandoval
0cb77b303c libdrgn: work around Clang __muloti4 again
See 2dd14ad522 ("libdrgn: work around "undefined reference to
'__muloti4'" when using Clang").
2019-07-29 17:03:45 -07:00
Omar Sandoval
b01d1a943f libdrgn: python: make set_drgn_error() return void *
It still always returns NULL, but now we can directly return from
functions returning some PyObject subtype.
2019-07-28 00:58:36 -07:00
Omar Sandoval
0a74a610bc libdrgn: python: only repr() one level of type members
Currently, repr() of structure and union types goes arbitrarily deep
(except for cycles). However, for lots of real-world types, this is
easily deeper than Python's recursion limit, so we can't get a useful
repr() at all:

>>> repr(prog.type('struct task_struct'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
RecursionError: maximum recursion depth exceeded while getting the repr of an object

Instead, only print one level of structure and union types.
2019-07-27 15:04:31 -07:00
Omar Sandoval
d63125f133 libdrgn: python: make Program.object() flags optional
Default to FindObjectFlags.ANY.
2019-07-24 11:02:34 -07:00
Omar Sandoval
06cce1baa1 libdrgn: fix typo in drgn_enomem documentation 2019-07-22 17:23:27 -07:00
Omar Sandoval
3e95e88028 libdrgn: vector: protect against overflow when doubling capacity
It seems extremely unlikely that we'd actually overflow before we run
out of memory, but let's just be safe.
2019-07-19 09:27:20 -07:00
Omar Sandoval
27a27940bc libdrgn: split up drgn_program_get_dwarf()
We don't need to get the DWARF index at the time we get the Dwfl handle,
so get rid of drgn_program_get_dwarf(), add drgn_program_get_dwfl(), and
create the DWARF index right before we update in a new function,
drgn_program_update_dwarf_index().
2019-07-19 09:26:30 -07:00
Omar Sandoval
a17215e984 libdrgn: dwarf_index: fix memcpy() undefined behavior
Apparently, it's undefined behavior to pass NULL as the source to
memcpy(), even if the length is zero. It's an easy fix, so let's appease
UBSan.
2019-07-15 12:27:48 -07:00
Omar Sandoval
1d4854a5bc libdrgn: implement optimized x86-64 ELF relocations
After the libdwfl conversion, we apply ELF relocations with libdwfl
instead of our homegrown implementation. However, libdwfl is much slower
at it than the previous implementation. We can work around this by
(again) applying ELF relocations ourselves for architectures that we
care about (x86-64, to start). For other architectures, we can fall back
to libdwfl.

This new implementation of ELF relocation reworks the parallelization to
be per-file rather than per-relocation. The latter was done originally
because before commit 6f16ab09d6 ("libdrgn: only apply ELF relocations
to relocatable files"), we applied relocations to vmlinux, which is much
larger than most kernel modules. Now that we don't do that, it seems to
be slightly faster to parallelize by file.
2019-07-15 12:27:48 -07:00
Omar Sandoval
e5874ad18a libdrgn: use libdwfl
libdwfl is the elfutils "DWARF frontend library". It has high-level
functionality for looking up symbols, walking stack traces, etc. In
order to use this functionality, we need to report our debugging
information through libdwfl. For userspace programs, libdwfl has a much
better implementation than drgn for automatically finding debug
information from a core dump or PID. However, for the kernel, libdwfl
has a few issues:

- It only supports finding debug information for the running kernel, not
  vmcores.
- It determines the vmlinux address range by reading /proc/kallsyms,
  which is slow (~70ms on my machine).
- If separate debug information isn't available for a kernel module, it
  finds it by walking /lib/modules/$(uname -r)/kernel; this is repeated
  for every module.
- It doesn't find kernel modules with names containing both dashes and
  underscores (e.g., aes-x86_64).

Luckily, drgn already solved all of these problems, and with some
effort, we can keep doing it ourselves and report it to libdwfl.

The conversion replaces a bunch of code for dealing with userspace core
dump notes, /proc/$pid/maps, and relocations.
2019-07-15 12:27:48 -07:00
Omar Sandoval
a9a2cb7cac libdrgn: dwarf_index: move bswap from file to compilation unit
Remove an indirection.
2019-07-15 12:27:38 -07:00
Omar Sandoval
1c9ab2e7d1 libdrgn: dwarf_index: fix leak of DWARF index entries on failure
We're forgetting to unchain new entries which are chained on old
entries.
2019-07-15 12:27:36 -07:00
Omar Sandoval
996d3094ef libdrgn: dwarf_index: fold unindex_files() into index_cus() 2019-07-15 12:27:33 -07:00
Omar Sandoval
b7e1b6ede6 libdrgn: dwarf_index: rename drgn_dwarf_index_iterator_next() output parameter 2019-07-15 12:27:24 -07:00
Omar Sandoval
d423361d8a libdrgn: dwarf_index: move .debug_str null-termination check
Check it right after we read the section instead of when updating the
index.
2019-07-15 12:27:18 -07:00
Omar Sandoval
9f9bec4762 libdrgn: use common vector where applicable
This converts several open-coded dynamic arrays to the new common vector
implementation:

- drgn_lexer stack
- Array dimension array for DWARF parsing
- drgn_program_read_c_string()
- DWARF index directory name hashes
- DWARF index file name hashes
- DWARF index abbreviation table
- DWARF index shard entries
2019-07-15 12:27:16 -07:00
Omar Sandoval
8d52536271 libdrgn: add common vector implementation
drgn has enough open-coded dynamic arrays at this point to warrant a
common implementation. Add one inspired by hash_table.h. The API is
pretty minimal. I'll add more to it as the need arises.
2019-07-15 12:27:15 -07:00
Omar Sandoval
e2a27e7f59 libdrgn: add drgn_error_format_os()
There are some cases where we format a path (e.g., with asprintf()) and
keep it around only in case of errors. Add drgn_error_format_os() so we
can just reformat it if we hit the error, which simplifies cleanup.
2019-07-11 16:19:18 -07:00
Omar Sandoval
74c0aa8612 libdrgn: reorder drgn_error_create_os() arguments
To make it more consistent with the upcoming drgn_error_format_os().
2019-07-11 16:12:56 -07:00
Omar Sandoval
ce808440f7 libdrgn: move string_builder_line_break() to string_builder.c 2019-07-11 15:33:10 -07:00
Omar Sandoval
0ebcfc8178 libdrgn: use drgn_stop error in DWARF index
The (struct drgn_error *)-1 hack predates DRGN_ERROR_STOP. Get rid of
the hack.
2019-07-11 10:53:13 -07:00
Omar Sandoval
5b2da5e682 libdrgn: get rid of unnecessary gelf_getshdr() in read_elf_section() 2019-07-11 09:53:56 -07:00
Omar Sandoval
e73346b488 libdrgn: generalize IS_RUNNING_KERNEL flag to IS_LIVE
I.e., also flag running processes as live.
2019-07-08 16:55:54 -07:00
Omar Sandoval
129f1493b8 libdrgn: split kernel-specific stuff out of program.c
Almost half of program.c is stuff specific to the Linux kernel, so let's
separate that out (and combine it with the existing kernel module code).
2019-07-08 16:53:58 -07:00
Omar Sandoval
a0fc02efd3 libdrgn: don't store version in struct compilation_unit
We don't use it after checking it.
2019-07-08 16:23:38 -07:00
Omar Sandoval
8a59a7e819 libdrgn: don't preallocate DWARF index memory
This doesn't make things any faster in my benchmarks, and it complicates
DWARF index initialization.
2019-07-08 16:23:38 -07:00
Omar Sandoval
ec33f9bf73 libdrgn: get rid of DWARF index flags
We always index everything, so simplify the code a bit.
2019-07-08 16:23:38 -07:00
Omar Sandoval
426ee1e57c libdrgn/python: add missing name in Symbol argument parsing 2019-06-29 01:38:36 -07:00
Omar Sandoval
97ebc2a57c libdrgn/python: add Program.cache
For caching metadata between invocations of helpers (e.g., detected
kernel version differences or config options).
2019-06-28 16:15:07 -07:00
Omar Sandoval
25e7a9d3b8 libdrgn/python: implement Program.__contains__ 2019-06-28 16:02:52 -07:00
Omar Sandoval
f55158c74c libdrgn: add PAGE_{SHIFT,SIZE,MASK} symbols from vmcoreinfo
Since we currently don't parse DWARF macro information, there's no easy
way to get the value PAGE_SIZE and friends in drgn. However, vmcoreinfo
contains the value of PAGE_SIZE, so let's add a special symbol finder
that returns that.
2019-05-29 00:02:48 -07:00
Omar Sandoval
1614b1e6f6 libdrgn: add better vmcoreinfo fallback
Currently, if we don't get vmcoreinfo from /proc/kcore, and we can't get
it from /sys/kernel/vmcoreinfo, then we manually determine the kernel
release and KASLR offset. This has a couple of issues:

1. We look for vmlinux to determine the KASLR offset, which may not be
   in a standard location.
2. We might want to start using other information from vmcoreinfo which
   can't be determined as easily.

Instead, we can get the virtual address of vmcoreinfo from
/proc/kallsyms and read it directly from there.
2019-05-28 15:54:49 -07:00
Omar Sandoval
8e45a305fb libdrgn: fix file/memory leak in proc_kallsyms_symbol_addr() 2019-05-28 15:01:35 -07:00
Serapheim Dimitropoulos
2396cdca47 libdrgn: add /usr/lib/debug/boot in the vmlinux_paths
Ubuntu-based distros tend to put vmlinux with debug info
under /usr/lib/debug/boot/vmlinux-<version>.
2019-05-27 17:43:10 -07:00
Omar Sandoval
eeac241c65 libdrgn: make all kernel module iterator errors non-fatal when loading default symbols
kernel_module_iterator_next() can also fail in
open_loaded_kernel_modules(), so handle it in the same way that we
currently handle kernel_module_iterator_init().
2019-05-26 14:58:02 -07:00
Omar Sandoval
68f7b87d6a libdrgn: ignore physical core dump segments with address -1
/proc/kcore contains segments which don't have a valid physical address,
which it indicates with a p_paddr of -1. Skip those segments, otherwise
we got an overflow error from the memory reader.
2019-05-26 14:49:48 -07:00
Omar Sandoval
c0bc72b0ea libdrgn: use splay tree for memory reader
The current array-based memory reader has a bug in the following
scenario:

    prog.add_memory_segment(0xffff0000, 128, ...)
    # This should replace a subset of the first segment.
    prog.add_memory_segment(0xffff0020, 32, ...)
    # This moves the first segment back to the front of the array.
    prog.read(0xffff0000, 32)
    # This finds the first segment instead of the second segment.
    prog.read(0xffff0032, 32)

Fix it by using the newly-added splay tree. This also splits up the
virtual and physical memory segments into separate trees.
2019-05-24 17:48:08 -07:00
Omar Sandoval
10fb398338 libdrgn: add splay tree implementation
This will be used to track memory segments instead of the array we
currently use. The API is based on the hash table API; it can support
alternative implementations in the future, like red-black trees.
2019-05-24 17:48:08 -07:00
Omar Sandoval
dcddaa2cc1 libdrgn: revamp hash table API
This makes several improvements to the hash table API.

The first two changes make things more general in order to be consistent
with the upcoming binary search tree API:

- Items are renamed to entries.
- Positions are renamed to iterators.
- hash_table_empty() is added.

One change makes the definition API more convenient:

- It is no longer necessary to pass the types into
  DEFINE_HASH_{MAP,SET}_FUNCTIONS().

A few changes take some good ideas from the C++ STL:

- hash_table_insert() now fails on duplicates instead of overwriting.
- hash_table_delete_iterator() returns the next iterator.
- hash_table_next() returns an iterator instead of modifying it.

One change reduces memory usage:

- The lower-level DEFINE_HASH_TABLE() is cleaned up and exposed as an
  alternative to DEFINE_HASH_MAP() and DEFINE_HASH_SET(). This allows us
  to get rid of the duplicated key where a hash map value already embeds
  the key (the DWARF index file table) and gets rid of the need to make
  a dummy hash set entry to do a search (the pointer and array type
  caches).
2019-05-24 17:48:05 -07:00
Omar Sandoval
0026eeae66 libdrgn: find kernel module name in kernels < v4.13
If we can't find the module name in .modinfo, fall back to
.gnu.linkonce.this_module.
2019-05-14 15:39:16 -07:00
Omar Sandoval
e4b8af7807 libdrgn/python: fix uninitialized variable in Program.load_debug_info()
path_converter() reads arg->allow_none, so make sure we zero the array
of path arguments.
2019-05-14 14:28:03 -07:00
Omar Sandoval
39876ccbac libdrgn: find kernel module debuginfo on Debian
Debian's linux-image*-dbg packages name the ELF files without the extra
.debug suffix that Fedora includes.
2019-05-14 12:42:56 -07:00
Omar Sandoval
ac27f2c1ec libdrgn: only load debug information from loaded kernel modules
Currently, we load debug information for every kernel module that we
find under /lib/modules/$(uname -r)/kernel. This has a few issues:

1. Distribution kernels have lots of modules (~3000 for Fedora and
   Debian).
   a) This can exceed the default soft limit on the number of open file
      descriptors.
   b) The mmap'd debug information can trip the overcommit heuristics
      and cause OOM kills.
   c) It can take a long time to parse all of the debug information.
2. Not all modules are under the "kernel" directory; some distros also
   have an "extra" directory.
3. The user is not made aware of loaded kernel modules that don't have
   debug information available.

So, instead of walking /lib/modules, walk the list of loaded kernel
modules and look up their debugging information.
2019-05-14 11:55:39 -07:00
Omar Sandoval
efaec41ca2 libdrgn: add string_builder_append_error() 2019-05-14 11:49:52 -07:00
Omar Sandoval
e21ed988fb libdrgn: add drgn_error_from_string_builder()
And use that instead of exposing drgn_error_create_nodup().
2019-05-14 10:16:01 -07:00
Omar Sandoval
b0f10d3b58 libdrgn: pass enum drgn_error_code to error constructors 2019-05-14 10:13:05 -07:00
Omar Sandoval
f08d4c9a08 libdrgn: make string_builder API return bool
It can only fail with no memory, so simplify it.
2019-05-14 10:07:50 -07:00
Omar Sandoval
0135dbd0cc libdrgn: get loaded module names from /proc/modules when possible
Similar to the last optimization, for the running kernel, we can just
read /proc/modules instead of walking the kernel data structures.
2019-05-13 18:05:17 -07:00
Omar Sandoval
ed6a6f0b3e libdrgn: get module section address from sysfs when possible
In the running kernel, we don't have to walk the list of modules and
module sections, since we can just look it up directly in sysfs.
2019-05-13 18:05:09 -07:00
Omar Sandoval
f11e030aaa libdrgn: factor out kernel module iteration and section lookup 2019-05-13 16:39:30 -07:00
Omar Sandoval
9b563170f8 libdrgn: make load_debug_info() API saner
Rather than exposing the underlying open and load steps of DWARF index,
simplify it down to a single load step.
2019-05-13 15:04:27 -07:00
Omar Sandoval
60c9e26ff5 libdrgn: make C string hash table helpers work with char *
const char * const * is not compatible with char * const *, so make
c_string_hash() and c_string_eq() macros so they can work with both
const char * and char * keys.
2019-05-13 11:52:55 -07:00
Omar Sandoval
1206730730 libdrgn: fix hash_set_insert_searched_pos() documentation 2019-05-13 11:52:55 -07:00
Omar Sandoval
ab58a5bff0 libdrgn: determine default size_t and ptrdiff_t more intelligently
Currently, size_t and ptrdiff_t default to typedefs of the default
unsigned long and long, respectively, regardless of what the program
actually defines unsigned long or long as. Instead, make them refer the
whatever integer type (long, long long, or int) is the same size as the
word size.
2019-05-10 15:14:03 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
ac946ba8a7 libdrgn: fix zero-filling reads from core dump segments 2019-05-09 16:35:48 -07:00
Omar Sandoval
15bc0286b9 libdrgn: add drgn_error_create_nodup() 2019-05-09 16:35:14 -07:00
Omar Sandoval
fb10623903 libdrgn: use next_power_of_two() for string_builder 2019-05-09 16:01:07 -07:00
Omar Sandoval
6d7b0631b9 libdrgn: simplify string_builder API
Instead of maintaining a null-terminated string, null-terminate just
before returning the string as a "finalize" step.
2019-05-09 15:25:58 -07:00
Omar Sandoval
5200a6652c libdrgn: embed memory reader, type index, and symbol index in program 2019-05-06 14:55:34 -07:00
Omar Sandoval
bb2357bc09 libdrgn: don't require word size for type index initialization 2019-05-06 14:55:34 -07:00
Omar Sandoval
640b1c011d libdrgn: embed DWARF index in DWARF info cache 2019-05-06 14:55:34 -07:00
Omar Sandoval
2ed8e3148c libdrgn: get architecture info from core file instead of DWARF index 2019-05-06 14:55:34 -07:00
Omar Sandoval
ba162ac001 libdrgn: remove endianness from type index
The type index doesn't need to know or care about endianness. Move it to
the program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
565e0343ef libdrgn: make symbol index pluggable with callbacks
The last piece of making the major program components pluggable.
2019-05-06 14:55:34 -07:00
Omar Sandoval
47151c4554 libdrgn/python: add Program.object() 2019-05-06 14:55:34 -07:00
Omar Sandoval
0ea2825817 libdrgn: refactor gen_constants.py
All of the constants are generated with basically the same code, so
refactor it.
2019-05-06 14:55:34 -07:00
Omar Sandoval
9c6575e783 libdrgn: move relocation hook to drgn_info_cache 2019-05-06 14:55:34 -07:00
Omar Sandoval
52a8681a8d libdrgn: rename drgn_dwarf_type_cache to drgn_dwarf_info_cache
This is preparation for sharing this with the symbol index.
2019-05-06 14:55:34 -07:00
Omar Sandoval
a98445c277 libdrgn: make type index pluggable with callbacks
Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the dwarf index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
2019-05-06 14:55:34 -07:00
Omar Sandoval
77443cecd1 libdrgn: use NULL-terminated arrays for mock {type,symbol} index
This will simplify the next couple of changes.
2019-05-06 14:55:34 -07:00
Omar Sandoval
c2be52dff0 libdrgn: rename object index to symbol index
An "object index" doesn't actually index objects, but really "partial
objects" -- i.e., a type + address. "Symbol" is a better name for this.
2019-05-06 14:55:34 -07:00
Omar Sandoval
417a6f0d76 libdrgn: make memory reader pluggable with callbacks
I've been planning to make memory readers pluggable (in order to support
use cases like, e.g., reading a core file over the network), but the
C-style "inheritance" drgn uses internally is awkward as a library
interface; it's much easier to just register a callback. This change
effectively makes drgn_memory_reader a mapping from a memory range to an
arbitrary callback. As a bonus, this means that read callbacks can be
mixed and matched; a part of memory can be in a core file, another part
can be in the executable file, and another part could be filled from an
arbitrary buffer.
2019-05-06 14:55:34 -07:00
Omar Sandoval
e78e6c9226 libdrgn/python: improve Python exception handling
Make {set,clear}_drgn_in_python() handle nested calls, and return a
valid error to libdrgn rather than -1.
2019-05-06 14:55:34 -07:00
Omar Sandoval
d0633d2f1a libdrgn: allow multiple cleanup callbacks
For now, this makes things slightly awkward, but it will be necessary
for the upcoming changes making drgn_program more pluggable.
2019-05-06 14:55:34 -07:00
Omar Sandoval
043cddf6d8 libdrgn: move member cache to type index
It makes more sense here than in struct drgn_program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
3645ce78ea libdrgn: assume all pointers have same size in type index 2019-05-06 14:55:34 -07:00
Omar Sandoval
06960f591c libdrgn: look up primitive types on demand
Instead of caching all primitive types ahead of time, look them up on
demand. This is preparation for making the type index API more flexible.
2019-05-06 14:55:34 -07:00
Omar Sandoval
932b7857b5 libdrgn: expose primitive type concept to public interface
Previously known as c_type.
2019-05-06 14:55:34 -07:00
Omar Sandoval
bad1b9b33c libdrgn/python: use generated docstrings for generated types
I missed ProgramFlags, Qualifiers, and TypeKind.
2019-05-06 14:55:34 -07:00
Omar Sandoval
839252564a libdrgn: deduplicate files in DWARF index
Currently, we deduplicate files for userspace mappings manually.
However, to prepare for adding symbol files at runtime, move the
deduplication to DWARF index. In the future, we probably want to
deduplicate based on build ID, as well.
2019-05-06 14:55:34 -07:00
Omar Sandoval
6f16ab09d6 libdrgn: only apply ELF relocations to relocatable files
Relocations are only supposed to be applied to ET_REL files, not ET_EXEC
files like vmlinux. This hasn't been an issue with the kernel builds
that I've tested on because the relocations match the contents of the
section. However, on Fedora, the relocation sections don't match,
probably because they post-process the binary in some way. This leads to
completely bogus debug information being parsed by drgn_dwarf_index. Fix
it by only relocating ET_REL files.
2019-05-06 11:09:46 -07:00
Omar Sandoval
4bb36fc150 libdrgn: check for Python development headers at configure time 2019-05-03 14:10:48 -07:00
Omar Sandoval
7282c40a75 libdrgn: fix crash in drgn_object_slice()
We need to set the value after we've reinitialized the object, otherwise
drgn_object_deinit() may try to free a buffer that we've already
overwritten. This also adds a test which triggers the crash.
2019-04-24 17:57:36 -07:00
Omar Sandoval
97f5cf70c6 libdrgn: fix C array and function casting
Casting an array or function should first convert the array or function
into a pointer.
2019-04-12 16:40:12 -07:00
Omar Sandoval
1db8d11f84 libdrgn: allow void pointer arithmetic
This is a GCC extension, but it's used pretty often in practice.
2019-04-12 16:06:24 -07:00
Omar Sandoval
7719d76820 libdrgn: allow comparison of function pointers and incomplete arrays 2019-04-12 15:49:15 -07:00
Omar Sandoval
309dc82789 libdrgn: allow comparing any pointer types in C
There's a bug that we don't allow comparisons between void * and other
pointer types, so let's fix it by allowing all pointer comparisons
regardless of the referenced type. Although this isn't valid by the C
standard, GCC and Clang both allow it by default (with a warning).
2019-04-12 15:44:08 -07:00
Omar Sandoval
73090f6128 libdrgn: fix error message for cast to incomplete type
The errant type is the one we're trying to cast to, not the object's
type. This fixes an abort in drgn_error_incomplete_type().
2019-04-12 13:24:51 -07:00
Omar Sandoval
1b4aee1a55 libdrgn/python: add Program.pointer_type() 2019-04-11 23:35:10 -07:00
Omar Sandoval
435640faf6 Fix some linter errors 2019-04-11 15:51:20 -07:00
Omar Sandoval
393a1f3149 Document with Sphinx
drgn has pretty thorough in-program documentation, but it doesn't have a
nice overview or introduction to the basic concepts. This commit adds
that using Sphinx. In order to avoid documenting everything in two
places, the libdrgn bindings have their docstrings generated from the
API documentation. The alternative would be to use Sphinx's autodoc
extension, but that's not as flexible and would also require building
the extension to build the docs. The documentation for the helpers is
generated using autodoc and a small custom extension.
2019-04-11 12:48:15 -07:00
Omar Sandoval
321e2c210e libdrgn/python: format pointers in hex for Object.__repr__() 2019-04-05 10:36:54 -07:00
Omar Sandoval
687ea74ff2 Build Python bindings with automake
I went back and forth on using setuptools or autotools for the Python
extension, but I eventually settled on using only setuptools after
fighting to get the two to integrate well. However, setuptools is kind
of crappy; for one, it rebuilds every source file when rebuilding the
extension, which is really annoying for development. automake is a
better designed build system overall, so let's use that for the
extension. We override the build_ext command to build using autotools
and copy things where setuptools expects them.
2019-04-03 17:00:53 -07:00
Omar Sandoval
0b72e180fa libdrgn: match partial paths for type/object lookups
The declaration file name of a DIE depends on the compilation directory,
which may not always be what the user expects. Instead, make the search
match as long as the full declaration file name ends with the given file
name. This is more convenient and more intuitive.
2019-04-02 14:12:11 -07:00
Omar Sandoval
19c8fd75e6 libdrgn: don't use zero length arrays for struct drgn_type
Running tests with Clang's AddressSanitizer fails with "runtime error:
index 1 out of bounds for type 'struct drgn_type_member [0]'". Zero
length arrays are a GCC extension and aren't buying us much anyways, so
just add a helper function that gets the array payload using pointer
arithmetic.
2019-04-02 14:12:11 -07:00
Omar Sandoval
2dd14ad522 libdrgn: work around "undefined reference to '__muloti4'" when using Clang
Older versions of Clang generate a call to __muloti4() for
__builtin_mul_overflow() with mixed signed and unsigned types. However,
Clang doesn't link to compiler-rt by default. Work around it by making
all of our calls to __builtin_mul_overflow() use unsigned types only.

1: https://bugs.llvm.org/show_bug.cgi?id=16404
2019-04-02 14:12:11 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00