Commit Graph

34 Commits

Author SHA1 Message Date
Omar Sandoval
660276a0b8 Format Python code with Black
I'm not a fan of 100% of the Black coding style, but I've spent too much
time manually formatting Python code, so let's just pull the trigger.
2020-01-14 11:51:58 -08:00
Omar Sandoval
690b5fd650 libdrgn: generalize architecture to platform
For stack trace support, we'll need to have some architecture-specific
functionality. drgn's current notion of an architecture doesn't actually
include the instruction set architecture. This change expands it to a
"platform", which includes the ISA as well as the existing flags.
2019-08-02 00:11:56 -07:00
Omar Sandoval
0c5df56fba libdrgn: replace symbol index with object index
struct drgn_symbol doesn't really represent a symbol; it's just an
object which hasn't been fully initialized (see c2be52dff0 ("libdrgn:
rename object index to symbol index"), it used to be called a "partial
object"). For stack traces, we're going to have a notion of a symbol
that more closely represents an ELF symbol, so let's get rid of the
temporary struct drgn_symbol representation and just return an object
directly.
2019-07-29 17:04:47 -07:00
Omar Sandoval
74bd59e38a libdrgn: python: get rid of Program._symbol()
We can test with Program.object() just as easily, so get rid of this
undocumented method.
2019-07-29 17:04:47 -07:00
Omar Sandoval
b5b024ecac tests: move common helpers to top-level
2019-07-29 17:04:47 -07:00
Omar Sandoval
e5874ad18a libdrgn: use libdwfl
libdwfl is the elfutils "DWARF frontend library". It has high-level
functionality for looking up symbols, walking stack traces, etc. In
order to use this functionality, we need to report our debugging
information through libdwfl. For userspace programs, libdwfl has a much
better implementation than drgn for automatically finding debug
information from a core dump or PID. However, for the kernel, libdwfl
has a few issues:

- It only supports finding debug information for the running kernel, not
  vmcores.
- It determines the vmlinux address range by reading /proc/kallsyms,
  which is slow (~70ms on my machine).
- If separate debug information isn't available for a kernel module, it
  finds it by walking /lib/modules/$(uname -r)/kernel; this is repeated
  for every module.
- It doesn't find kernel modules with names containing both dashes and
  underscores (e.g., aes-x86_64).

Luckily, drgn already solved all of these problems, and with some
effort, we can keep doing it ourselves and report it to libdwfl.

The conversion replaces a bunch of code for dealing with userspace core
dump notes, /proc/$pid/maps, and relocations.
2019-07-15 12:27:48 -07:00
Omar Sandoval
e73346b488 libdrgn: generalize IS_RUNNING_KERNEL flag to IS_LIVE
I.e., also flag running processes as live.
2019-07-08 16:55:54 -07:00
Omar Sandoval
25e7a9d3b8 libdrgn/python: implement Program.__contains__
2019-06-28 16:02:52 -07:00
Omar Sandoval
c0bc72b0ea libdrgn: use splay tree for memory reader
The current array-based memory reader has a bug in the following
scenario:

    prog.add_memory_segment(0xffff0000, 128, ...)
    # This should replace a subset of the first segment.
    prog.add_memory_segment(0xffff0020, 32, ...)
    # This moves the first segment back to the front of the array.
    prog.read(0xffff0000, 32)
    # This finds the first segment instead of the second segment.
    prog.read(0xffff0032, 32)

Fix it by using the newly-added splay tree. This also splits up the
virtual and physical memory segments into separate trees.
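
The intended lookup behavior in the scenario above can be sketched in a few lines of Python. This is only an illustration and not drgn's code: `SegmentMap`, its `find()` method, and the string labels are hypothetical, and it uses a plain newest-first list scan rather than a splay tree. The point is only that a segment added later should shadow the part of an earlier segment it overlaps, regardless of which segment was read most recently.

```python
from typing import List, Optional, Tuple


class SegmentMap:
    """Hypothetical stand-in for a memory reader's segment lookup."""

    def __init__(self) -> None:
        # Each entry is (start address, size in bytes, label).
        self._segments: List[Tuple[int, int, str]] = []

    def add_memory_segment(self, address: int, size: int, label: str) -> None:
        self._segments.append((address, size, label))

    def find(self, address: int) -> Optional[str]:
        # Scan newest first so that a later segment shadows the part of an
        # earlier segment that it overlaps. Lookups never reorder the list.
        for start, size, label in reversed(self._segments):
            if start <= address < start + size:
                return label
        return None


segments = SegmentMap()
segments.add_memory_segment(0xFFFF0000, 128, "first")
# This shadows bytes 0x20-0x3f of the first segment.
segments.add_memory_segment(0xFFFF0020, 32, "second")

before = segments.find(0xFFFF0032)
segments.find(0xFFFF0000)  # A read from the first segment must not change the result.
after = segments.find(0xFFFF0032)
```

A linear scan is O(n) per lookup; a tree keyed by address, as the commit describes, avoids that cost while keeping the same shadowing rule.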
2019-05-24 17:48:08 -07:00
Omar Sandoval
9b563170f8 libdrgn: make load_debug_info() API saner
Rather than exposing the underlying open and load steps of DWARF index,
simplify it down to a single load step.
2019-05-13 15:04:27 -07:00
Omar Sandoval
ab58a5bff0 libdrgn: determine default size_t and ptrdiff_t more intelligently
Currently, size_t and ptrdiff_t default to typedefs of the default
unsigned long and long, respectively, regardless of what the program
actually defines unsigned long or long as. Instead, make them refer to
whatever integer type (long, long long, or int) is the same size as the
word size.
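
A minimal sketch of this selection rule, under stated assumptions: the helper `pick_word_sized_type()` and the size tables are hypothetical, and treating "long, long long, int" as a preference order is a guess based on the order in which the message lists them. The unsigned counterpart of the chosen type would back size_t, and the signed one ptrdiff_t.

```python
from typing import Dict, Optional


def pick_word_sized_type(word_size: int, sizes: Dict[str, int]) -> Optional[str]:
    """Return the first candidate type whose size equals the word size."""
    # Assumed preference order; the commit message only lists the candidates.
    for name in ("long", "long long", "int"):
        if sizes.get(name) == word_size:
            return name
    return None


# Illustrative data models (type sizes in bytes).
lp64 = {"int": 4, "long": 8, "long long": 8}   # e.g., 64-bit Linux
llp64 = {"int": 4, "long": 4, "long long": 8}  # e.g., 64-bit Windows
ilp32 = {"int": 4, "long": 4, "long long": 8}  # typical 32-bit targets

lp64_choice = pick_word_sized_type(8, lp64)
llp64_choice = pick_word_sized_type(8, llp64)
ilp32_choice = pick_word_sized_type(4, ilp32)
```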
2019-05-10 15:14:03 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
1b4aee1a55 libdrgn/python: add Program.pointer_type()
2019-04-11 23:35:10 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.
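
To illustrate the qualifier change in the list above, here is a small sketch contrasting a set of strings with a bitmask. The `Qualifiers` class below is a hypothetical stand-in built on Python's standard `enum.Flag`; its member names and values are assumptions for illustration and are not taken from this commit.

```python
import enum


class Qualifiers(enum.Flag):
    """Hypothetical qualifier bitmask; names and values are illustrative."""

    NONE = 0
    CONST = 1
    VOLATILE = 2
    RESTRICT = 4


# Set-of-strings style: flexible, but typos such as "cosnt" go unnoticed.
old_style = {"const", "volatile"}

# Bitmask style: qualifiers combine with | and are tested with `in`.
new_style = Qualifiers.CONST | Qualifiers.VOLATILE

is_const = Qualifiers.CONST in new_style
is_restrict = Qualifiers.RESTRICT in new_style

# Removing a qualifier is a mask operation.
without_const = new_style & ~Qualifiers.CONST
```

A bitmask also maps directly onto a C integer, which is convenient when the same value has to cross a C/Python boundary.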

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00
Omar Sandoval
9a4262b609 Make cast() a function instead of a method
2019-02-22 10:34:53 -08:00
Omar Sandoval
3b998be960 Make container_of() a function instead of a method
The Object class is a little too bloated in terms of functionality, so
let's start streamlining it.
2019-02-22 10:34:46 -08:00
Omar Sandoval
29cfa7d03d program: get rid of Program.object() and null()
It's silly to have these trivial factory methods. Instead, directly
construct with Object() and add a new NULL() function. We also
automatically import these in interactive mode.
2019-01-31 00:54:42 -08:00
Omar Sandoval
e912edc7d6 Move Program and Object to top-level drgn module
2019-01-29 21:04:40 -08:00
Omar Sandoval
2d340e30bb Rename ProgramObject to Object
2019-01-29 20:48:37 -08:00
Omar Sandoval
0e9ca182bc program: fix ProgramObject.__index__() for enum objects
EnumType is in fact an integer, so enum objects should be allowed as an
index.
2018-08-22 01:04:06 -07:00
Omar Sandoval
6abb2f2402 Separate internal API from public API
While we're here, clean up some rough edges of the API and document a
lot more.
2018-07-14 10:20:17 -07:00
Omar Sandoval
6a6d37c5e1 Make Program and CoreReader context managers
This is preparation for the next change, so that a Program can clean up
after itself. The Program will close the CoreReader, and the CoreReader
will close the underlying file.
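
The ownership chain described here (Program closes CoreReader, CoreReader closes the file) might look roughly like the sketch below. The class names come from the message, but the constructors, the `close()` methods, and the use of an in-memory file are assumptions made for illustration.

```python
import io
from typing import BinaryIO


class CoreReader:
    """Hypothetical reader that owns the file it was given."""

    def __init__(self, file: BinaryIO) -> None:
        self._file = file

    def close(self) -> None:
        self._file.close()

    def __enter__(self) -> "CoreReader":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.close()


class Program:
    """Hypothetical program that owns its reader."""

    def __init__(self, reader: CoreReader) -> None:
        self._reader = reader

    def close(self) -> None:
        self._reader.close()

    def __enter__(self) -> "Program":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Returning None lets any exception from the with-block propagate.
        self.close()


core_file = io.BytesIO(b"\x7fELF")  # stand-in for an open core dump
with Program(CoreReader(core_file)) as prog:
    open_inside_block = not core_file.closed
```

Leaving the `with` block closes the Program, which cascades down to the file.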
2018-07-11 19:16:34 -07:00
Omar Sandoval
800ee3ec36 corereader: take fd and list of segments instead of path
Now, we can get rid of the ELF parsing implementation in CoreReader.
Instead, we parse in ElfFile and pass the parsed information to
CoreReader.
2018-07-09 19:12:33 -07:00
Omar Sandoval
50f52ec295 program: make ProgramObject() arguments mandatory keywords
It's too hard to remember whether value or address comes first, so make
sure we're always explicit which one.
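
In Python, mandatory keywords are expressed with a bare `*` in the parameter list. The function below is a hypothetical sketch, not the actual ProgramObject signature; the parameter names `type_name`, `address`, and `value` are assumptions.

```python
from typing import Optional


def make_object(
    type_name: str, *, address: Optional[int] = None, value: Optional[int] = None
) -> dict:
    """Hypothetical constructor; everything after * must be passed by keyword."""
    if (address is None) == (value is None):
        raise ValueError("pass exactly one of address or value")
    return {"type": type_name, "address": address, "value": value}


by_address = make_object("int", address=0xFFFF0000)
by_value = make_object("int", value=42)

try:
    make_object("int", 0xFFFF0000)  # Ambiguous positional argument.
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

The call site then always states which of the two it means, which is the explicitness the message asks for.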
2018-06-28 23:40:47 -07:00
Omar Sandoval
aeb767d43e Replace lookup_variable callback with new VariableIndex class
Similar to how we replaced the lookup_type callback with TypeIndex.
2018-06-28 21:05:21 -07:00
Omar Sandoval
2ee6662639 program: handle pointers to typedefs of struct/union types
ProgramObject.__dir__(), member_(), and container_of_() all check that
the relevant type is a struct or union, but they all need to allow
typedefs of structs or unions, as well.
2018-05-25 22:32:03 -07:00
Omar Sandoval
6b178c5108 program: only check struct or union in ProgramObject.member_()
We're currently also doing this check in __getattr__(), which is
unnecessary overhead in the common case. We can just check the exception
in the error case.
2018-05-25 21:56:24 -07:00
Omar Sandoval
95c0682718 corereader: implement type reads in C
Now that read_memory() was converted to C, IntType.read() and friends
are a bottleneck. Convert them to C methods of CoreReader.
2018-05-25 00:41:12 -07:00
Omar Sandoval
15849f5795 type: make operand_type() a method of Type
This gets rid of the huge isinstance() chain in
TypeIndex.operand_type().
2018-05-18 23:51:20 -07:00
Omar Sandoval
e7a2ba32e8 program: show dereferenced value for ProgramObject.str()
I always forget to dereference pointers, which gets annoying.
2018-05-14 20:36:19 -07:00
Omar Sandoval
e327aef860 program: fix ProgramObject.__round__() with ndigits is not None
In this case, we must return a ProgramObject.
2018-05-13 00:47:44 -07:00
Omar Sandoval
35271231fe program: swap ProgramObject arguments
type, address, value makes more sense and looks more like C.
2018-05-07 22:48:00 -07:00
Omar Sandoval
15ea6e8d97 program: implement ProgramObject binary operators
2018-05-07 18:48:10 -07:00
Omar Sandoval
da83e0adb3 program: add ProgramObject tests
2018-05-03 23:26:43 -07:00