282 Commits

Author SHA1 Message Date
Omar Sandoval
94e7b1f92c libdrgn: dwarf_index: avoid copying CUs for one thread
In read_cus(), the master thread can use the final CUs vector directly
and the rest of the threads can merge their private vectors in. This
consistently shaves a few milliseconds off of startup.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
53ba7262cd libdrgn: dwarf_index: handle DW_AT_declaration with DW_FORM_flag
We currently assume that if DW_AT_declaration is present, it is true.
This seems to be true in practice, and I see no reason to ever use
DW_FORM_flag with a value of zero. There's no performance hit to handle
it, though, so we might as well.
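
As a rough illustration of the distinction (a Python sketch, not the libdrgn code, which is C): DW_FORM_flag carries one byte that may be zero, while DW_FORM_flag_present carries no data and is always true. The form codes come from the DWARF standard; the read_flag() helper is hypothetical.

```python
from typing import Tuple

# Form codes as assigned by the DWARF standard.
DW_FORM_flag = 0x0C          # one byte of data; zero means false
DW_FORM_flag_present = 0x19  # no data; presence alone means true


def read_flag(form: int, data: bytes, offset: int) -> Tuple[bool, int]:
    """Hypothetical helper: return (value, new_offset) for a flag attribute."""
    if form == DW_FORM_flag_present:
        return True, offset
    if form == DW_FORM_flag:
        # Honor the stored byte instead of assuming the flag is true.
        return data[offset] != 0, offset + 1
    raise ValueError(f"unexpected form {form:#x} for a flag attribute")
```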

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
ea9f3f3114 libdrgn: dwarf_index: don't worry about tag of CU DIE
As a small simplification, we can take commit 9bb2ccecb7 ("Enable
DWARF indexing to work with partial units") further and not look at the
tag of the top-level DIE at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
c8f84c57fb libdrgn: dwarf_index: use size_t instead of uint64_t where appropriate
The CU unit length and DIE offset are both limited by the size of the
mapped debugging information, i.e., size_t.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
2252bef1a7 libdrgn: dwarf_index: rename TAG_FLAG_* and TAG_MASK to DIE_FLAG_*
This is more clear: although these flags happen to be encoded with the
DWARF tag, they are flags regarding the DIE.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
85c4b36820 libdrgn: dwarf_index: fix leak when parsing bad line number program header
If we fail to read an include directory in read_file_name_table(), we
need to free the directory hashes.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
ff96c75da0 helpers: translate task_state_to_char() to Python
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.
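
A minimal sketch of the idea, not the helper as committed: it works on plain integers rather than a task_struct object, and the character table and TASK_REPORT mask are assumptions mirroring recent Linux kernels (a real helper would read them from the kernel being debugged). The special case for idle tasks ("I") is omitted.

```python
# Assumed values mirroring recent Linux kernels; not read from a real kernel.
TASK_STATE_CHARS = "RSDTtXZP"
TASK_REPORT = 0x7F


def task_state_to_char(state: int, exit_state: int = 0) -> str:
    """Map task state bits to the one-letter code used in /proc and ps."""
    reported = (state | exit_state) & TASK_REPORT
    # bit_length() is 0 when no bit is set (running); otherwise it is one
    # more than the index of the highest set bit, like the kernel's fls().
    return TASK_STATE_CHARS[reported.bit_length()]
```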

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 13:54:39 -07:00
Omar Sandoval
e96d9fd3fd libdrgn/python: don't allow None for Program.object() flags
Similar to the previous commit, this was to work around pydoc issues
that we don't have anymore.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signatures very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signatures
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
e49a87a3d7 libdrgn: remove struct drgn_object::prog
We can get it via the type now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:21 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
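
The ownership and deduplication in points 3 to 5 can be pictured with a toy model. This is a sketch under stated assumptions, not drgn's implementation: the Type class, the cache keyed on type attributes, and the int_type() method name are illustrative; only pointer_type() and its default size of None are taken from the description above.

```python
class Type:
    """Toy type descriptor that remembers the program that owns it."""

    def __init__(self, prog, kind, name, size, referenced=None):
        self.prog = prog
        self.kind = kind
        self.name = name
        self.size = size
        self.referenced = referenced


class Program:
    """Toy program that creates and deduplicates its own types."""

    def __init__(self, pointer_size=8):
        self.pointer_size = pointer_size
        self._types = {}  # deduplication cache; also keeps types alive

    def int_type(self, name, size, is_signed):
        key = ("int", name, size, is_signed)
        if key not in self._types:
            self._types[key] = Type(self, "int", name, size)
        return self._types[key]

    def pointer_type(self, type, size=None):
        if size is None:
            size = self.pointer_size  # the program's default pointer size
        # id() is a stable key here because the cache keeps `type` alive.
        key = ("pointer", id(type), size)
        if key not in self._types:
            self._types[key] = Type(self, "pointer", None, size, type)
        return self._types[key]
```

Because every type carries a reference to its program, a lookup that needs program-wide caches can start from the type alone.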

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
1c8181e22d libdrgn: rearrange struct drgn_program members
struct drgn_program has a bunch of state scattered around. Group it
together more logically, even if it means sacrificing some padding here
and there.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:34:44 -07:00
Omar Sandoval
d4e0771f87 libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}()
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
a8d632b4c1 libdrgn/python: use F14 instead of PyDict for Program::objects
Program::objects is used to store references to objects that must stay
alive while the Program is alive. It is currently a PyDict where the
keys are the object addresses as PyLong and the values are the objects
themselves. This has two problems:

1. Allocating the key as a full object is obviously wasteful.
2. PyDict doesn't have an API for reserving capacity ahead of time,
   which we want for an upcoming change.

Both of these are easily fixed by using our own hash table.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
arsarwade
6f6c5f272f libdrgn: export function drgn_object_init() (#70)
drgn_object_init() is declared in drgn.h and seems to be a required
call before calling drgn_program_find_object().

Without this, trying to call drgn_object_init() from an external C
application results in undefined reference.

Signed-off-by: Aditya Sarwade <asarwade@fb.com>
2020-08-21 10:24:52 -07:00
Omar Sandoval
0cf3320a89 Add type annotations to helpers
Now that drgndoc can handle overloads and we have the IntegerLike and
Path aliases, we can add type annotations to all helpers. There are also
a couple of functional changes that snuck in here to make annotating
easier.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:28:02 -07:00
Omar Sandoval
2d49ef657b Add Path type alias
Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add
an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes]
to get rid of some mypy --strict errors.
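
Such an alias might look like the following sketch. The alias definition follows the description above; path_to_bytes() is a hypothetical function added only to show the alias in a signature. The os.PathLike parameters are quoted because os.PathLike is not subscriptable at runtime on older Python versions.

```python
import os
import pathlib
from typing import Union

# One alias instead of repeating the Union in every signature.
Path = Union[str, bytes, "os.PathLike[str]", "os.PathLike[bytes]"]


def path_to_bytes(path: Path) -> bytes:
    """Hypothetical function: accept any path-like value, return bytes."""
    return os.fsencode(path)


print(path_to_bytes(pathlib.PurePosixPath("vmlinux")))
```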

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:20:29 -07:00
Omar Sandoval
66c5cc83a6 Add IntegerLike type annotation
Lots of interfaces in drgn transparently turn an integer Object into an
int by using __index__(), so add an IntegerLike protocol for this and
use it everywhere applicable.
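
A protocol along these lines can be sketched as follows. The Register class and page_number() function are hypothetical stand-ins for an integer-valued Object and a helper that accepts one.

```python
import operator
from typing import Protocol


class IntegerLike(Protocol):
    """Anything that can be converted to an int via __index__()."""

    def __index__(self) -> int: ...


class Register:
    """Hypothetical stand-in for an integer-valued object."""

    def __init__(self, value: int) -> None:
        self._value = value

    def __index__(self) -> int:
        return self._value


def page_number(addr: IntegerLike, page_shift: int = 12) -> int:
    # operator.index() calls __index__(), so ints and Registers both work.
    return operator.index(addr) >> page_shift
```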

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:16:50 -07:00
Omar Sandoval
20bcde1f1d drgn 0.0.7
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 23:32:32 -07:00
Omar Sandoval
025989871b drgn 0.0.6
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:25:54 -07:00
Omar Sandoval
e3309765f9 helpers: add kaslr_offset() and move pgtable_l5_enabled()
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:00:16 -07:00
Omar Sandoval
e7f353c118 libdrgn: hash_table: clean up coding style
Clean up the coding style of the remaining few places that the last
couple of changes didn't rewrite.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 11:53:05 -07:00
Omar Sandoval
f94b0262c6 libdrgn: hash_table: implement vector storage policy
The folly F14 implementation provides 3 storage policies: value, node,
and vector. The default F14FastMap/F14FastSet chooses between the value
and vector policies based on the value size.

We currently only implement the value policy, as the node policy is easy
to emulate and the vector policy would've added more complexity. This
adds support for the vector policy (adding even more C abuse :) and
automatically chooses the policy the same way as folly. It'd be easy to
add a way to choose the policy if needed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 11:53:00 -07:00
Omar Sandoval
9ea11a7c26 libdrgn: hash_table: port reserve optimization
The only major change to the folly F14 implementation since I originally
ported it is commit 3d169f4365cf ("memory savings for F14 tables with
explicit reserve()"). That is a small improvement for small tables and a
large improvement for vector tables, which are about to be added.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
2eab47ce9e libdrgn: hash_table: use posix_memalign() instead of aligned_alloc()
posix_memalign() doesn't have the restriction that the size must be a
multiple of the alignment like aligned_alloc() does in C11.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
2409868409 libdrgn: hash_table: define chunk alignment constant
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-18 01:42:37 -07:00
Omar Sandoval
6d4af7e17e libdrgn: dwarf_info_cache: handle variables DW_AT_const_value
Compile-time constants have DW_AT_const_value instead of DW_AT_location.
We can translate those to a value object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 15:23:51 -07:00
Omar Sandoval
213c148ce6 libdrgn: dwarf_info_cache: handle DW_AT_endianity
Variables can have a non-default endianity. Handle it and clean up
variable endian handling.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 14:26:58 -07:00
Omar Sandoval
c840072d05 libdrgn: make drgn_object_set_buffer() take a void *
It's awkward to make callers cast to char *.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 10:25:03 -07:00
Omar Sandoval
f1eaf5b14c libdrgn: add load_debug_info example program
Really it's more of a test program than an example program. It's useful
for benchmarking, testing with valgrind, etc. It's not built by default,
but it can be built manually with:

  $ make -C build/temp.* examples/load_debug_info

And run with:

  $ ./build/temp.*/examples/load_debug_info

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-10 16:18:58 -07:00
Omar Sandoval
3028da4d1d libdrgn: compare language in drgn_type_eq()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-08 22:07:49 -07:00
Omar Sandoval
1b47b866b4 libdrgn: go back to trusting PRSTATUS PID
Commit eea5422546 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
2020-07-08 18:34:16 -07:00
Omar Sandoval
293418294a libdrgn: assume compiler uses sane integer implementation
I once tried to implement a generic arithmetic right shift macro without
relying on any implementation-defined behavior, but this turned out to
be really hard. drgn is fairly tied to GCC and GCC-compatible compilers
(like Clang), so let's just assume GCC's model [1]: modular conversion
to signed types, two's complement signed bitwise operators, and sign
extension for signed right shift.

1: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
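
The three assumptions can be modeled in Python, whose integers are unbounded, to show what the C code now takes for granted. This is only an illustration of the semantics; both function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Modular conversion to a two's complement signed type of `bits` bits."""
    value &= (1 << bits) - 1         # reduce modulo 2**bits
    if value >= 1 << (bits - 1):     # sign bit set: value is negative
        value -= 1 << bits
    return value


def arithmetic_shift_right(pattern: int, shift: int, bits: int) -> int:
    """Right shift of a `bits`-wide bit pattern with sign extension."""
    # Python's >> on a negative int rounds toward negative infinity,
    # which matches a sign-extending shift.
    return to_signed(pattern, bits) >> shift
```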
2020-07-07 17:18:17 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
e4c52c5422 libdrgn: linux_kernel: use names for kmod index constants
This makes it much easier to follow along with the code and understand
the format.
2020-06-30 15:14:21 -07:00
Omar Sandoval
03d8cb0e32 libdrgn: fix hash_pair_from_non_avalanching_hash() on 64-bit without SSE 4.2
We were forgetting to mask away the extra bits. There are two places
that we use the tag without converting it to a uint8_t:
hash_table_probe_delta(), which is mostly benign since we mask it by the
chunk mask anyways; and table_chunk_match() without SSE 2, which
completely breaks.

While we're here, let's align the comments better.
2020-06-24 13:33:08 -07:00
Omar Sandoval
8e7c1f1009 drgn 0.0.5
2020-05-26 10:04:06 -07:00
Omar Sandoval
a227d0d50e Update elfutils and revert activation frame patch
After thinking about it some more, I realized that "libdwfl: simplify
activation frame logic" breaks the case where during unwinding someone
queries isactivation for reasons other than knowing whether to decrement
program counter. Revert the patch and refactor "libdwfl: add interface
for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it
differently.

Based on:

c95081596 size: Also obey radix printing for bsd format.

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-20 13:38:49 -07:00
Omar Sandoval
eea5422546 libdrgn: make Linux kernel stack unwinding more robust
drgn has a couple of issues unwinding stack traces for kernel core
dumps:

1. It can't unwind the stack for the idle task (PID 0), which commonly
   appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be
   trusted.

The solution for both of these is to look up the PRSTATUS note by CPU
instead of PID.

For the live kernel, drgn refuses to unwind the stack of tasks in the
"R" state. However, the "R" state is running *or runnable*, so in the
latter case, we can still unwind the stack. The solution for this is to
look at on_cpu for the task instead of the state.
2020-05-20 12:03:00 -07:00
Omar Sandoval
146930aff8 libdrgn: replace arch frame_registers with callbacks
We currently unwind from pt_regs and NT_PRSTATUS using an array of
register definitions. It's more flexible and more efficient to do this
with an architecture-specific callback. For x86-64, this change also
makes us depend on the binary layout rather than member names of struct
pt_regs, but that shouldn't matter unless people are defining their own,
weird struct pt_regs.
2020-05-19 17:11:27 -07:00
Omar Sandoval
4d8597f0f8 libdrgn: add THREAD_SIZE to Linux kernel object finder
Despite the naming, this is the kernel stack size.
2020-05-19 17:10:54 -07:00
Omar Sandoval
971a2d3687 libdrgn/python: make Objects fully immutable
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
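
The pattern can be sketched in pure Python (drgn's Object is a C extension type, so this is only an analogy; the ImmutableValue class is hypothetical). All state is set in __new__(), there is no __init__() to re-run, and attribute assignment is rejected.

```python
class ImmutableValue:
    """Hypothetical immutable object that is fully built in __new__()."""

    __slots__ = ("_type_name", "_value")

    def __new__(cls, type_name, value=None):
        self = super().__new__(cls)
        # Bypass our own __setattr__, which refuses all assignments.
        object.__setattr__(self, "_type_name", type_name)
        object.__setattr__(self, "_value", value)
        return self

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} is immutable")

    @property
    def type_name(self):
        return self._type_name

    @property
    def value(self):
        return self._value
```

Calling __init__() on an existing instance falls through to object.__init__(), which leaves the instance unchanged.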
2020-05-18 00:07:49 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for a while now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
c339113f9c libdrgn: adjust program counter when looking up frame symbol
For functions that call a noreturn function, the compiler may omit code
after the call instruction. This means that the return address may not
lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is
an "activation", i.e., its program counter is guaranteed to lie within
the caller. This is only the case for the initial frame, frames
interrupted by a signal, and the signal trampoline frame. For everything
else, we need to decrement the program counter before doing any lookups.
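
To make the off-by-one concrete, here is a sketch with a made-up symbol table in which "caller" ends with a call to a noreturn function, so the return address equals the start of the next symbol. The table, addresses, and function names are all hypothetical.

```python
import bisect

# Hypothetical symbol table: (start address, size, name), sorted by start.
SYMBOLS = [
    (0x1000, 0x20, "caller"),
    (0x1020, 0x40, "next_function"),
]
_STARTS = [start for start, _, _ in SYMBOLS]


def find_symbol(address):
    """Return the name of the symbol containing address, or None."""
    i = bisect.bisect_right(_STARTS, address) - 1
    if i >= 0:
        start, size, name = SYMBOLS[i]
        if address < start + size:
            return name
    return None


def frame_symbol(pc, is_activation):
    """Look up a frame's symbol, adjusting the PC for non-activation frames."""
    # A return address points just past the call instruction, which may be
    # past the end of the caller, so step back into the call itself.
    return find_symbol(pc if is_activation else pc - 1)
```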
2020-05-13 17:11:54 -07:00
Omar Sandoval
175f83fc23 Update elfutils with noreturn unwinding fix
Rebase on master and fix dwfl_frame_module/dwfl_frame_dwarf_frame to
decrement the program counter when necessary.

Based on:

a8493c12a libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree

With the following patches:

configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: simplify activation frame logic
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
2020-05-13 16:41:52 -07:00
Omar Sandoval
bf545105c6 libdrgn: build in silent mode by default
The automake/libtool compilation output is obnoxiously verbose. Switch
on automake's silent mode, and make the custom rules honor it.
2020-05-10 00:12:50 -07:00
Omar Sandoval
2d1481f5ab libdrgn: add page table walker kernel memory reader
Now that we can walk page tables, we can use it in a memory reader that
reads kernel memory via the kernel page table. This means that we don't
need libkdumpfile for ELF vmcores anymore (although I'll keep the
functionality around until this code has been validated more).
2020-05-08 17:37:56 -07:00
Omar Sandoval
e697be707c libdrgn: use swapper_pg_dir in vmcoreinfo for fallback PAGE_OFFSET
I originally wanted to avoid depending on another vmcoreinfo field, but
the next change is going to depend on swapper_pg_dir in vmcoreinfo
anyways, and it ends up being simpler to use it.
2020-05-08 17:37:56 -07:00