Commit Graph

68 Commits

Author SHA1 Message Date
Omar Sandoval
d1ffd581bd libdrgn: allow reinterpreting primitive scalar values
We don't allow this because "value objects with a scalar type cannot be
reinterpreted, as their memory layout in the program is not known". That
doesn't really make sense: we already support reconstructing the
in-memory representation with drgn_object_read_bytes().

Implement this by making drgn_object_slice() support slicing all
objects, using drgn_object_read_bytes() when necessary, then make
drgn_object_reinterpret() a trivial wrapper around it.

Closes #378.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2024-01-12 15:50:23 -08:00
Omar Sandoval
60a289fdff tests: make identical() stricter
Require exact type matches, not subclasses, and fail hard for types we
don't explicitly handle. This caught one place where we weren't testing
what we thought we were.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-11-08 13:18:44 -08:00
Omar Sandoval
243f6fb7d5 libdrgn: support value objects with >64-bit integer types
The Linux kernel's struct task_struct on AArch64 contains an array of
__uint128_t:

  >>> task = find_task(prog, 1)
  >>> task.type_
  struct task_struct *
  >>> task.thread.type_
  struct thread_struct {
          struct cpu_context cpu_context;
          struct {
                  unsigned long tp_value;
                  unsigned long tp2_value;
                  struct user_fpsimd_state fpsimd_state;
          } uw;
          enum fp_type fp_type;
          unsigned int fpsimd_cpu;
          void *sve_state;
          void *sme_state;
          unsigned int vl[2];
          unsigned int vl_onexec[2];
          unsigned long fault_address;
          unsigned long fault_code;
          struct debug_info debug;
          struct ptrauth_keys_user keys_user;
          struct ptrauth_keys_kernel keys_kernel;
          u64 mte_ctrl;
          u64 sctlr_user;
          u64 svcr;
          u64 tpidr2_el0;
  }
  >>> task.thread.uw.fpsimd_state.type_
  struct user_fpsimd_state {
          __int128 unsigned vregs[32];
          __u32 fpsr;
          __u32 fpcr;
          __u32 __reserved[2];
  }

As a result, printing a task_struct fails:

  >>> task
  Traceback (most recent call last):
    File "<console>", line 1, in <module>
    File "/host/home/osandov/repos/drgn3/drgn/cli.py", line 140, in _displayhook
      text = value.format_(columns=shutil.get_terminal_size((0, 0)).columns)
  NotImplementedError: integer values larger than 64 bits are not yet supported

PR #311 suggested treating >64-bit integers as byte arrays for now; I
tried an alternate hack of handling >64-bit integers only in the
pretty-printing code. Both of these had issues, though.

Instead, let's push >64-bit integer support a little further and allow
storing "big integer" value objects. We still don't support any
operations on them, so this still doesn't complete #170. We store the
raw bytes of the value for now, but we'll probably change this if we add
support for operations (e.g., to store the value as an mp_limb_t array
for GMP). We also print >64-bit integer types in hexadecimal for
simplicity. This is inconsistent with the existing behavior of printing
in decimal, but more readable. In the future, we might want to add
heuristics to decide when to print in decimal vs hexadecimal for all
sizes.

Closes #311.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-02 14:21:46 -07:00
Omar Sandoval
3ce37c8002 libdrgn: python: fix creating compound value with 32-bit float member on big-endian
This is similar to commit 155ec92ef2 ("libdrgn: fix reading 32-bit
float object values on big-endian").

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-02 10:39:34 -07:00
Omar Sandoval
0bc79c877a libdrgn: fix stray bits when reading bytes of bit field
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-01 16:31:17 -07:00
Omar Sandoval
87b7292aa5 Relicense drgn from GPLv3+ to LGPLv2.1+
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.

All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-01 17:05:16 -07:00
Shung-Hsi Yu
e8d0c85811 test: add test for _repr_pretty_() method
Since _repr_pretty_() uses output of str(), and the latter is already
heavily tested in tests/test_language_c.py, we can simply test whether
p.text() is called instead of duplicating all the test cases.

Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
2022-08-25 13:52:28 -07:00
Kevin Svetlitski
5aaf3db6fc libdrgn: support reference and absent objects with float types which aren't 32 or 64 bits
Very similar to a541e9b170, but adds
partial support for floats (as opposed to integers) which aren't 32 or
64 bits.

Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
2022-07-06 15:47:18 -07:00
Omar Sandoval
a541e9b170 libdrgn: support reference and absent objects with >64-bit integer types
GCC and Clang have 128-bit integer types on 64-bit targets: __int128 and
unsigned __int128. Clang additionally has N-bit integers of up to 2<<24
bits with _ExtInt(N), which was standardized in C23 as _BitInt(N).

Currently, we disallow creating objects with a >64-bit integer type. Jay
Kamat reported that this would cause errors when examining some
binaries. The reason we disallow this is that we don't have a way to
represent or do operations on >64-bit values. We could make use of a
bignum library like GMP to do this in the future.

However, for now, we can loosen this restriction and at least allow
reference and absent objects with big integer types. This requires
enforcing two things: that we never create a value object with a >64-bit
integer type, and that we never read the value of a reference object
with a >64-bit integer type.

Co-authored-by: Jay Kamat <jaygkamat@gmail.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-04-28 13:38:38 -07:00
Omar Sandoval
7f232a4815 pre-commit: update Black
Black 22.1.0 has some style changes: string prefixes are normalized and
spaces around the power operator are removed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-12 13:48:49 -08:00
Omar Sandoval
c0d8709b45 Update copyright headers to Meta
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-21 15:59:44 -08:00
Omar Sandoval
5541fad063 Fix some flake8 errors
Mainly unused imports, unused variables, unnecessary f-strings, and
regex literals missing the r prefix. I'm not adding it to the CI linter
because it's too noisy, though.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-11 14:52:44 -07:00
Omar Sandoval
7335df114c libdrgn: python: add Object.to_bytes_()
And the libdrgn implementation, drgn_object_read_bytes().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-26 17:12:34 -07:00
Omar Sandoval
9c00552007 libdrgn: python: add Object.from_bytes_()
Add a way to create an object from raw bytes. One example where I've
wanted this is creating a struct pt_regs from a PRSTATUS note or other
source.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-26 17:06:58 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
85dec2b8f6 tests: move C-specific tests from test_object to test_language_c
TestCLiteral, TestCIntegerPromotion, TestCCommonRealType,
TestCOperators, and TestCPretty in test_object all test various
operations on objects, but since they're testing language-specific
behavior, they belong in test_language_c.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 16:11:19 -08:00
Omar Sandoval
55e3a58e06 libdrgn: python: use correct member offset when creating object from value
We need to use the offset of the member in the outermost object type,
not the offset in the immediate containing type in the case of nested
anonymous structs.

Fixes: e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with drgn_type_find_member()")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 02:29:59 -08:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
72b4aa9669 libdrgn: clean up object initialization
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 17:43:14 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
988e9e7190 libdrgn/python: add Object.absent_
Without this, the only way to check whether an object is absent in
Python is to try to use the object and catch the ObjectAbsentError.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 15:06:40 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
523fd26959 libdrgn: don't allow casting to non-scalar types at all
Currently, we try to emulate the GNU C extension of casting a struct
type to itself. This does a deep type comparison, which is expensive. We
could take a shortcut like only comparing the kind and type name, but
seeing as standard C only allows casting to a scalar type, let's drop
support for casting to a struct (or other non-scalar) type entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 02:46:05 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
e72ecd0e2c libdrgn: replace drgn_program_member_info() with drgn_type_find_member()
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:40:54 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
6bd0c2b4d2 libdrgn: add concept of "unavailable" objects
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:58:19 -08:00
Omar Sandoval
5f17281926 libdrgn: make drgn_object::is_reference an enum
To prepare for a new kind of object, replace the is_reference bool with
an enum drgn_object_kind.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:37:58 -08:00
Omar Sandoval
36068a0ea8 Fix trailing commas for Black v20.8b1
Black was recently changed to treat a trailing comma as an indicator to
put each item/argument on its own line. We have a bunch of places where
something previously had to be split into multiple lines, then was
edited to fit on one line, but Black kept the trailing comma. Now this
update wants to unnecessarily split it back up. For now, let's get rid
of these commas. Hopefully in the future Black has a way to opt out of
this.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
971a2d3687 libdrgn/python: make Objects fully immutable
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
2020-05-18 00:07:49 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
63299e0701 libdrgn: actually use uint64_t for two's complement unary ops
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
2020-05-08 13:50:24 -07:00
Omar Sandoval
26ef465007 libdrgn/python: add proper type for members and parameters
This continues the conversion from the last commit. Members and
parameters are basically the same, so we can do them together. Unlike
enumerators, these don't make sense to unpack or access as sequences.
2020-02-12 15:40:19 -08:00
Omar Sandoval
7c70a1a384 libdrgn/python: add proper type for enumerators
Currently, type members, enumerators, and parameters are all represented
by tuples in the Python bindings. This is awkward to document and
implement. Instead, let's replace these tuples with proper types,
starting with the easiest one, TypeEnumerator. This one still makes
sense to treat as a sequence so that it can be unpacked as (name,
value).
2020-02-12 15:37:41 -08:00
Omar Sandoval
9de2cc8410 libdrgn/python: make Object.__index__() TypeError message clearer
Currently, we print:

>>> prog.symbol(prog['init_task'])
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: cannot convert 'struct task_struct' to index

It's not obvious what it means to convert to an index. Instead, let's
use the error message raised by operator.index():

TypeError: 'struct task_struct' object cannot be interpreted as an integer
2020-02-11 09:19:53 -08:00
Serapheim Dimitropoulos
ad82e9623a Introduce OutOfBoundsError
Decouple some of the responsibilities of FaultError to
OutOfBoundsError so consumers can differentiate between
invalid memory accesses and running out of bounds in
drgn Objects which may be based on valid memory address.
2020-02-04 14:59:31 -08:00
Omar Sandoval
660276a0b8 Format Python code with Black
I'm not a fan of 100% of the Black coding style, but I've spent too much
time manually formatting Python code, so let's just pull the trigger.
2020-01-14 11:51:58 -08:00
Omar Sandoval
1443d17fb4 libdrgn: add DRGN_FORMAT_OBJECT_IMPLICIT_ELEMENTS 2019-12-19 11:43:54 -08:00
Omar Sandoval
db66952b2e libdrgn: add DRGN_FORMAT_OBJECT_IMPLICIT_MEMBERS 2019-12-19 11:43:54 -08:00
Omar Sandoval
c8434e9a9e libdrgn: add DRGN_FORMAT_OBJECT_ELEMENT_INDICES 2019-12-19 11:43:54 -08:00
Omar Sandoval
cfceb491db libdrgn: add DRGN_FORMAT_OBJECT_MEMBER_NAMES 2019-12-19 11:43:54 -08:00
Omar Sandoval
4fad941ec1 libdrgn: add DRGN_FORMAT_OBJECT_{MEMBERS,ELEMENTS}_SAME_LINE 2019-12-19 11:43:54 -08:00
Omar Sandoval
6bb8da04a0 libdrgn: omit trailing comma when formatting one-line array
This is somewhat arbitrary, but I think it looks more natural to only
use the trailing comma for multi-line initializers.
2019-12-19 11:43:54 -08:00
Omar Sandoval
d77b7bd7e3 libdrgn: add DRGN_FORMAT_OBJECT_{TYPE_NAME,MEMBER_TYPE_NAMES,ELEMENT_TYPE_NAMES} 2019-12-19 11:43:54 -08:00
Omar Sandoval
89307c532a libdrgn: add DRGN_FORMAT_OBJECT_CHAR 2019-12-19 11:43:54 -08:00
Omar Sandoval
7cee597fff libdrgn: add DRGN_FORMAT_OBJECT_STRING 2019-12-19 11:43:54 -08:00
Omar Sandoval
5865fa4d16 libdrgn: add DRGN_FORMAT_OBJECT_SYMBOLIZE 2019-12-19 11:43:54 -08:00