Commit Graph

52 Commits

Author SHA1 Message Date
Omar Sandoval
f7fe93e573 cli: show elfutils version in use
drgn depends heavily on libelf and libdw, so it's useful to know what
version we're using. Add drgn._elfutils_version and use that in the CLI
and in the test cases where we currently check the libdw version.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 11:10:50 -07:00
Omar Sandoval
bc85767e5f libdrgn: support looking up parameters and variables in stack traces
After all of the preparatory work, the last two missing pieces are a way
to find a variable by name in the list of scopes that we saved while
unwinding, and a way to find the containing scopes of an inlined
function. With that, we can finally look up parameters and variables in
stack traces.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
38573cfdde libdrgn: stack_trace: pretty print frames and add frames for inline functions
If we want to access a parameter or local variable in an inlined
function, then we need a stack frame for that function. It's also much
more useful to see inlined functions in the stack trace in general. So,
when we've unwound the registers for a stack frame, walk the debugging
information to find all of the (possibly inlined) functions at the
program counter, and add a drgn stack frame for each of those.

Also add StackFrame.name and StackFrame.is_inline so that we can
distinguish inline frames. Also add StackFrame.source() to get the
filename and line and column numbers. Finally, add the source code
location to pretty-printed stack traces and add pretty-printing for
individual stack frames that includes extra information.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
ad37c79cba libdrgn: python: add documentation and type annotation for Program.__contains__()
drgn.Program has supported the "in" operator since commit 25e7a9d3b8
("libdrgn/python: implement Program.__contains__"), but it's
undocumented and unannotated. Add a type annotation with a docstring
along with a METH_COEXIST method.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-12 16:26:56 -07:00
Omar Sandoval
85c367bf79 Reformat empty docstrings
Black 21.4b2 now replaces empty docstrings with a docstring containing a
single space. Apply that formatting.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-30 17:06:47 -07:00
Omar Sandoval
037a510ff2 Fix drgn.FaultError type annotations
FaultError() also takes an error message.

Fixes: 80c9fb35ff ("Add type hint stubs and generate documentation from them")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-30 17:03:16 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
eec67768aa libdrgn: replace elfutils DWARF unwinder with our own
The elfutils DWARF unwinder has a couple of limitations:

1. libdwfl doesn't have an interface for getting register values, so we
   have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
   error even on success, so we have to squirrel away our own errors in
   the callback.

Furthermore, there are a couple of things that will be easier with our
own unwinder:

1. Integrating unwinding using ORC will be easier when we're handling
   unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
   expression evaluation.

Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:43:12 -07:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
9a066b409f docs: mention that default arguments are not yet parsed from DWARF
TypeParameter.default_argument is currently basically a placeholder
because we don't parse it from DWARF and compilers don't emit it, so
document that. See #82.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 02:16:23 -08:00
Kamalesh Babulal
221a218704 libdrgn: add powerpc stack trace support
Add powerpc specific register information required to retrive the
stack traces of the tasks on both live system and from the core dump.
It uses the existing DSL format to define platform registers and
helper functions to initial them. It also adds architecture specific
information to enable powerpc. Current support is for little-endian
powerpc only.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2021-01-29 11:31:59 -08:00
Omar Sandoval
b899a10836 Remove register numbers from API and add register aliases
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.

While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 17:47:45 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
988e9e7190 libdrgn/python: add Object.absent_
Without this, the only way to check whether an object is absent in
Python is to try to use the object and catch the ObjectAbsentError.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 15:06:40 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
c2eec00ae0 libdrgn/python: use None instead of 0 for TypeMember.bit_field_size
Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by
using None to represent a non-bit field instead of 0.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-25 01:53:23 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
a595e52d22 libdrgn/python: add Type.has_member()
Add drgn_type_has_member() to libdrgn and Type.has_member() to the
Python bindings. This can simplify some version checks, like the one in
_for_each_block_device() since commit 9a10a927b0 ("helpers: fix
for_each_{disk,partition}() on kernels >= v5.1").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:38:48 -08:00
Omar Sandoval
fd04463596 libdrgn/python: add Type.member()
In Python, looking up a member in a drgn Type by name currently looks
something like:

  member = [member for member in type.members if member.name == "foo"][0]

Add a Type.member(name) method, which is both easier and more efficient.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:10:23 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
d495d65108 Use @overload for drgn.Object() constructors
Instead of the current tangle of requirements on arguments, use
overloads.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
c801e5e9b1 drgndoc: format __init__() signature separately from class
Having the signature in the class line is awkward, especially when
__init__() is overloaded. Instead, document __init__() separately, but
refer to it by the name of the class. There might still be a better way
to represent this, but this is at least better than before.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
6bd0c2b4d2 libdrgn: add concept of "unavailable" objects
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:58:19 -08:00
Omar Sandoval
dc867781e3 docs: remove stale comment in Program.pointer_type()
When pointer_type() became a Program method, I forgot to remove the
reference to the old pointer_type() method.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-11 11:56:18 -08:00
Omar Sandoval
ff96c75da0 helpers: translate task_state_to_char() to Python
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 13:54:39 -07:00
Omar Sandoval
e96d9fd3fd libdrgn/python: don't allow None for Program.object() flags
Similar to the previous commit, this was to work around pydoc issues
that we don't have anymore.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
8c7c80e2f7 Fix mypy --strict warnings
The remaining warnings are all no-any-return, which is hard to avoid in
drgn.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:28:02 -07:00
Omar Sandoval
0cf3320a89 Add type annotations to helpers
Now that drgndoc can handle overloads and we have the IntegerLike and
Path aliases, we can add type annotations to all helpers. There are also
a couple of functional changes that snuck in here to make annotating
easier.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:28:02 -07:00
Omar Sandoval
64a04a6c4f drgndoc: include attributes based on presence of docstring
We can get rid of the :include: and :exclude: options by deciding solely
based on whether a node has a docstring. Empty docstrings can be used to
indicate nodes that should be included with no additional content. The
__init__() method must now also have a docstring in order to be
documented. Additionally, the directives are now fully formatted by the
Formatter rather than being split between the Formatter and
DrgnDocDirective.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:21:29 -07:00
Omar Sandoval
f41cc7fb48 drgndoc: recursively document names imported with alias
The helpers implemented in C have Python wrappers only for the purpose
of documentation. This is because drgndoc ignores all imports when
recursively documenting attributes. However, mypy uses the convention
that aliased imports (i.e., import ... as ... or from ... import ... as
...) are considered re-exported, so we can follow that convention and
include aliased imports. (mypy also considered attributes in __all__ as
re-exported, so we should probably follow that in the future, too, but
for now aliased imports are enough). This lets us get rid of the Python
wrappers.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:21:29 -07:00
Omar Sandoval
2d49ef657b Add Path type alias
Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add
an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes]
to get rid of some mypy --strict errors.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:20:29 -07:00
Omar Sandoval
66c5cc83a6 Add IntegerLike type annotation
Lots if interfaces in drgn transparently turn an integer Object into an
int by using __index__(), so add an IntegerLike protocol for this and
use it everywhere applicable.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 11:16:50 -07:00
Omar Sandoval
e3309765f9 helpers: add kaslr_offset() and move pgtable_l5_enabled()
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:00:16 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
c339113f9c libdrgn: adjust program counter when looking up frame symbol
For functions that call a noreturn function, the compiler may omit code
after the call instruction. This means that the return address may not
lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is
an "activation", i.e., its program counter is guaranteed to lie within
the caller. This is only the case for the initial frame, frames
interrupted by a signal, and the signal trampoline frame. For everything
else, we need to decrement the program counter before doing any lookups.
2020-05-13 17:11:54 -07:00
Omar Sandoval
8a276838ac helpers: add access_process_vm() and access_remote_vm()
Now that we can walk page tables, we can finally read memory from
userspace tasks.

Closes #53.
2020-05-08 17:37:01 -07:00
Omar Sandoval
10e58777c3 Add Program.read_{u8,u16,u32,u64,word}()
I've found that I do this manually a lot (e.g., when digging through a
task's stack). Add shortcuts for reading unsigned integers and a note
for how to manually read other formats.
2020-04-27 17:27:10 -07:00
Omar Sandoval
61a7a0c39d Fix Object.__getitem__() idx type annotation
In obj[idx], idx can also be another Object. Technically, it can be
anything that implements __index__() (), but typing.SupportsIndex was added
in v3.8. For now, allow Object explicitly.
2020-04-21 15:49:26 -07:00
Omar Sandoval
ea8cdede6d Fix Program.pointer_type() type annotation
"type" can be a string or Type object.
2020-04-21 15:03:33 -07:00
Omar Sandoval
55135d5854 Remove stray type comments
These are already emitted by drgndoc, so we don't need them.
2020-04-21 14:59:04 -07:00
Omar Sandoval
da0c637f2d Fix Object.__getattribute__() return type annotation
mypy only uses the annotated return type of __getattribute__() for
attributes which aren't otherwise annotated, so we can annotate it as
returning Object.
2020-04-20 12:37:55 -07:00
Omar Sandoval
12a23a6e98 Add stubs for internal helpers
mypy complains about these functions not existing, so add stubs to let
mypy know they exist. In the future, we probably want the docstring to
be in these stubs and use them directly instead of wrapping them in
Python, but that's for another day.
2020-04-18 15:10:29 -07:00
Jay Kamat
3f870603fa libdrgn: add default language to drgn_program
For operations where we don't have a type available, we currently fall
back to C. Instead, we should guess the language of the program and use
that as the default. The heurisitic implemented here gets the language
of the CU containing "main" (except for the Linux kernel, which is
always C). In the future, we should allow manually overriding the
automatically determined language.
2020-02-26 19:55:42 -08:00
Jay Kamat
6c264b0eae libdrgn: add language to struct drgn_type
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
2020-02-26 19:55:42 -08:00