Commit Graph

77 Commits

Author SHA1 Message Date
Omar Sandoval
5541fad063 Fix some flake8 errors
Mainly unused imports, unused variables, unnecessary f-strings, and
regex literals missing the r prefix. I'm not adding it to the CI linter
because it's too noisy, though.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-11 14:52:44 -07:00
Omar Sandoval
6b79b21ab5 tests: fix test depending on repr(enum.Flag) format
CPython commit b775106d940e ("bpo-40066: Enum: modify `repr()` and
`str()` (GH-22392)") changed repr(enum.Flag) from, e.g.,
<Qualifiers.VOLATILE|CONST: 3> to Qualifiers.CONST|Qualifiers.VOLATILE.
Fix tests.test_type.TestType.test_qualifiers to not assume the format.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-22 01:17:22 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Davide Cavalca
7ca157316f tests: properly escape regexp strings
Signed-off-by: Davide Cavalca <dcavalca@fb.com>
2021-04-02 10:37:33 -07:00
Omar Sandoval
da0280016c libdrgn: python: identify bit fields in TypeMember.__repr__
If a member is a bit field, then we should format it with the underlying
Object so that it shows the bit field size.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-17 12:02:53 -07:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
c2eec00ae0 libdrgn/python: use None instead of 0 for TypeMember.bit_field_size
Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by
using None to represent a non-bit field instead of 0.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-25 01:53:23 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
fd04463596 libdrgn/python: add Type.member()
In Python, looking up a member in a drgn Type by name currently looks
something like:

  member = [member for member in type.members if member.name == "foo"][0]

Add a Type.member(name) method, which is both easier and more efficient.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:10:23 -08:00
Omar Sandoval
4cbb9b552a libdrgn: fix comparison of types with anonymous members
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-08 17:32:46 -07:00
Omar Sandoval
36068a0ea8 Fix trailing commas for Black v20.8b1
Black was recently changed to treat a trailing comma as an indicator to
put each item/argument on its own line. We have a bunch of places where
something previously had to be split into multiple lines, then was
edited to fit on one line, but Black kept the trailing comma. Now this
update wants to unnecessarily split it back up. For now, let's get rid
of these commas. Hopefully in the future Black has a way to opt out of
this.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
4e770fb18a Format imports with isort
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:55:07 -07:00
Omar Sandoval
3028da4d1d libdrgn: compare language in drgn_type_eq()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-08 22:07:49 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Jay Kamat
d8fadf10ee libdrgn: Add cpp language and tests 2020-04-03 16:35:38 -07:00
Jay Kamat
6c264b0eae libdrgn: add language to struct drgn_type
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
2020-02-26 19:55:42 -08:00
Omar Sandoval
26ef465007 libdrgn/python: add proper type for members and parameters
This continues the conversion from the last commit. Members and
parameters are basically the same, so we can do them together. Unlike
enumerators, these don't make sense to unpack or access as sequences.
2020-02-12 15:40:19 -08:00
Omar Sandoval
7c70a1a384 libdrgn/python: add proper type for enumerators
Currently, type members, enumerators, and parameters are all represented
by tuples in the Python bindings. This is awkward to document and
implement. Instead, let's replace these tuples with proper types,
starting with the easiest one, TypeEnumerator. This one still makes
sense to treat as a sequence so that it can be unpacked as (name,
value).
2020-02-12 15:37:41 -08:00
Omar Sandoval
660276a0b8 Format Python code with Black
I'm not a fan of 100% of the Black coding style, but I've spent too much
time manually formatting Python code, so let's just pull the trigger.
2020-01-14 11:51:58 -08:00
Amlan Nayak
0df2152307 Add basic class type support
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
2019-11-18 10:36:40 -08:00
Omar Sandoval
b8c657d760 libdrgn: python: add sizeof()
It's annoying to do obj.type_.size, and that doesn't even work for every
type. Add sizeof() that does the right thing whether it's given a Type
or Object.
2019-10-18 11:47:32 -07:00
Omar Sandoval
0a74a610bc libdrgn: python: only repr() one level of type members
Currently, repr() of structure and union types goes arbitrarily deep
(except for cycles). However, for lots of real-world types, this is
easily deeper than Python's recursion limit, so we can't get a useful
repr() at all:

>>> repr(prog.type('struct task_struct'))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
RecursionError: maximum recursion depth exceeded while getting the repr of an object

Instead, only print one level of structure and union types.
2019-07-27 15:04:31 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
932b7857b5 libdrgn: expose primitive type concept to public interface
Previously known as c_type.
2019-05-06 14:55:34 -07:00
Omar Sandoval
7dbf8447b2 tests: test comparison between Type and other object 2019-05-06 14:55:34 -07:00
Omar Sandoval
435640faf6 Fix some linter errors 2019-04-11 15:51:20 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00
Omar Sandoval
0e9ca182bc program: fix ProgramObject.__index__() for enum objects
EnumType is in fact an integer, so enum objects should be allowed as an
index.
2018-08-22 01:04:06 -07:00
Omar Sandoval
b70addd5ae type: get rid of unused Type._read_pretty() 2018-07-22 00:30:41 -07:00
Omar Sandoval
cab896a65b type: exclude all-zero arrays and structs from array pretty-printing 2018-07-22 00:26:10 -07:00
Omar Sandoval
f4bb55e359 type: support putting multiple array elements on the same line 2018-07-21 22:53:05 -07:00
Omar Sandoval
a69a743f55 tests: fix const_anonymous_color_type
It has negative enumerators, so it obviously must be signed.
2018-07-16 19:42:52 -07:00
Omar Sandoval
878e4017c8 type: add new CompoundType.members()
CompoundType.members() currently returns a list of member names;
sometimes, we actually want the type and offset. So, rename members() to
member_names(), and make members() return the type and offset, using a
newly added version of functools.partial() that caches the return value.
2018-07-15 08:08:42 -07:00
Omar Sandoval
6abb2f2402 Separate internal API from public API
While we're here, clean up some rough edges of the API and document a
lot more.
2018-07-14 10:20:17 -07:00
Omar Sandoval
9273de829b type: handle anonymous types thoroughly
There were a couple more corner cases missing: arrays of anonymous types
and bit fields of anonymous enums.
2018-07-11 23:56:32 -07:00
Omar Sandoval
51aa35bc05 type: also format typedefs of anonymous enums 2018-07-11 22:34:01 -07:00
Omar Sandoval
6fa2d68c0c type: add Type.unqualified() 2018-07-11 21:51:48 -07:00
Omar Sandoval
9b5b721838 type: format typedefs of anonymous types better
Currently, printing, e.g.,

typedef struct {
	int counter;
} atomic_t

results in

typedef struct atomic_t

Which isn't very useful. Instead, print the former.
2018-07-11 21:40:01 -07:00
Omar Sandoval
6a6d37c5e1 Make Program and CoreReader context managers
This is preparation for the next change, so that a Program can clean up
after itself. The Program will close the CoreReader, and the CoreReader
will close the underlying file.
2018-07-11 19:16:34 -07:00
Omar Sandoval
800ee3ec36 corereader: take fd and list of segments instead of path
Now, we can get rid of the ELF parsing implementation in CoreReader.
Instead, we parse in ElfFile and pass the parsed information to
CoreReader.
2018-07-09 19:12:33 -07:00
Omar Sandoval
95c0682718 corereader: implement type reads in C
Now that read_memory() was converted to C, IntType.read() and friends
are a bottleneck. Convert them to C methods of CoreReader.
2018-05-25 00:41:12 -07:00
Omar Sandoval
15849f5795 type: make operand_type() a method of Type
This gets rid of the huge isinstance() chain in
TypeIndex.operand_type().
2018-05-18 23:51:20 -07:00
Omar Sandoval
54f5124be8 type: format char arrays as strings
For, e.g., comm in struct task_struct, a string is much nicer to look
at.
2018-05-14 20:35:21 -07:00
Omar Sandoval
1916528f5d type: make offsetof() and typeof() accept arbitrary member designators
Instead of only accepting an identifier.
2018-05-13 00:42:50 -07:00