Commit Graph

220 Commits

Author SHA1 Message Date
Omar Sandoval
72b4aa9669 libdrgn: clean up object initialization
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 17:43:14 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
b899a10836 Remove register numbers from API and add register aliases
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.

While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 17:47:45 -08:00
Omar Sandoval
bbefc573d8 libdrgn: debug_info: make sure DW_TAG_template_value_parameter has value
Otherwise, an invalid DW_TAG_template_value_parameter can be confused
for a type parameter.

Fixes: 352c31e1ac ("Add support for C++ template parameters")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 12:07:46 -08:00
Omar Sandoval
5f170ea3f3 helpers: add per_cpu()
The correct way to access global per-CPU variables
(per_cpu_ptr(prog[name].address_of_(), cpu)) has been a common source of
confusion (see #77). Add an analogue to the per_cpu() macro in the
kernel as a shortcut and document it as the easiest method for getting a
global per-CPU variable: per_cpu(prog[name], cpu).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 11:40:05 -08:00
Omar Sandoval
81a203c48f helpers: fix for_each_{possible,online,present}_cpu() on v4.4
Also reorder the definitions to alphabetical order and add tests.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 10:08:48 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
a57c26ed32 libdrgn: fix zero-length array GCC < 9.0 workaround for qualified types
We're not applying the zero-length array workaround when the array type
is qualified. Make sure we pass through can_be_incomplete_array when
parsing DW_TAG_{const,restrict,volatile,atomic}_type.

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 11:21:57 -08:00
Omar Sandoval
988e9e7190 libdrgn/python: add Object.absent_
Without this, the only way to check whether an object is absent in
Python is to try to use the object and catch the ObjectAbsentError.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 15:06:40 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
c2eec00ae0 libdrgn/python: use None instead of 0 for TypeMember.bit_field_size
Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by
using None to represent a non-bit field instead of 0.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-25 01:53:23 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
523fd26959 libdrgn: don't allow casting to non-scalar types at all
Currently, we try to emulate the GNU C extension of casting a struct
type to itself. This does a deep type comparison, which is expensive. We
could take a shortcut like only comparing the kind and type name, but
seeing as standard C only allows casting to a scalar type, let's drop
support for casting to a struct (or other non-scalar) type entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 02:46:05 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
fd04463596 libdrgn/python: add Type.member()
In Python, looking up a member in a drgn Type by name currently looks
something like:

  member = [member for member in type.members if member.name == "foo"][0]

Add a Type.member(name) method, which is both easier and more efficient.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:10:23 -08:00
Omar Sandoval
e72ecd0e2c libdrgn: replace drgn_program_member_info() with drgn_type_find_member()
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:40:54 -08:00
Omar Sandoval
cf9a068820 libdrgn/python: fix reference counting on Type.members and Type.parameters
The TypeMember and TypeParameter instances referring to a libdrgn
drgn_lazy_type are only valid as long as the Type containing them is
still alive. Hold a reference on the containing Type from LazyType. We
can do this without growing LazyType by getting rid of the enum state
and using sentinel values for LazyType::lazy_type as the state.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:09:12 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
bce9ef5f8d libdrgn: linux kernel: remove THREAD_SIZE object finder
THREAD_SIZE is still broken and I haven't looked into the root cause
(see commit 95be142d17 ("tests: disable THREAD_SIZE test")). We don't
need it anymore anyways, so let's remove it entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 02:08:13 -08:00
Omar Sandoval
97fbedec1f libdrgn: return unavailable objects for DWARF objects without value or address
Now that we have the concept of unavailable objects, use it for DWARF
where appropriate.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 14:15:09 -08:00
Omar Sandoval
6bd0c2b4d2 libdrgn: add concept of "unavailable" objects
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:58:19 -08:00
Omar Sandoval
5f17281926 libdrgn: make drgn_object::is_reference an enum
To prepare for a new kind of object, replace the is_reference bool with
an enum drgn_object_kind.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:37:58 -08:00
Omar Sandoval
e7caa24176 tests: test kernel module debug info loading
Now that vmtest supports kernel modules, test that we load them
correctly.

Closes #74.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-18 01:19:27 -07:00
Omar Sandoval
4431b4f918 vmtest: enable kernel modules
We currently build with CONFIG_MODULES=n for simplicity. However, this
means that we don't test kernel module support at all. Let's enable
module support. This requires changing how we distribute kernels. Now,
the /lib/modules/$(uname -r) directory (including the vmlinux and
vmlinuz) is bundled up as a tarball. We extract it, then mount it with
VirtFS, and do some extra setup for device nodes. (We lose the ability
to run kernel builds directly, but I've never actually used that
functionality.)

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-18 01:13:01 -07:00
Omar Sandoval
4cbb9b552a libdrgn: fix comparison of types with anonymous members
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-08 17:32:46 -07:00
Jay Kamat
d1beb0184a libdrgn: add support for objects in C++ namespaces
DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the
DWARF index, with each namespace being its own sub-index. We only index
the namespace itself when it is first accessed, which should help with
startup time and simplifies tracking.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
26291647eb libdrgn: dwarf_index: handle DW_AT_specification DIEs with two passes
We currently handle DIEs with a DW_AT_specification attribute by parsing
the corresponding declaration to get the name and inserting the DIE as
usual. This has a couple of problems:

1. It only works if DW_AT_specification refers to the same compilation
   unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not
   DW_FORM_ref_addr. As a result, drgn doesn't support the latter.
2. It assumes that the DIE with DW_AT_specification is in the correct
   "scope". Unfortunately, this is not true for g++: for a variable
   definition in a C++ namespace, it generates a DIE with
   DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE
   which refers to the declaration with DW_AT_specification _outside_ of
   the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE.

Supporting both of these cases requires reworking how we handle
DW_AT_specification. This commit takes an approach of parsing the DWARF
data in two passes: the first pass reads the abbrevation and file name
tables and builds a map of instances of DW_AT_specification; the second
pass indexes DIEs as before, but ignores DIEs with DW_AT_specification
and handles DIEs with DW_AT_declaration by looking them up in the map
built by the first pass.

This approach is a 10-20% regression in indexing time in the benchmarks
I ran. Thankfully, it is not 100% slower for a couple of reasons. The
first is that the two passes are simpler than the original combined
pass. The second is that a decent part of the indexing time is spent
faulting in the mapped debugging information, which only needs to happen
once (even if the file is cached, minor page faults add non-negligible
overhead).

This doesn't handle DW_AT_specification "chains" yet, but neither did
the original code. If it is necessary, it shouldn't be too difficult to
add.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
36068a0ea8 Fix trailing commas for Black v20.8b1
Black was recently changed to treat a trailing comma as an indicator to
put each item/argument on its own line. We have a bunch of places where
something previously had to be split into multiple lines, then was
edited to fit on one line, but Black kept the trailing comma. Now this
update wants to unnecessarily split it back up. For now, let's get rid
of these commas. Hopefully in the future Black has a way to opt out of
this.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
4e770fb18a Format imports with isort
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-20 16:55:07 -07:00
Omar Sandoval
e3309765f9 helpers: add kaslr_offset() and move pgtable_l5_enabled()
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-27 17:00:16 -07:00
Omar Sandoval
6d4af7e17e libdrgn: dwarf_info_cache: handle variables DW_AT_const_value
Compile-time constants have DW_AT_const_value instead of DW_AT_location.
We can translate those to a value object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-13 15:23:51 -07:00
Omar Sandoval
3028da4d1d libdrgn: compare language in drgn_type_eq()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-08 22:07:49 -07:00
Omar Sandoval
95be142d17 tests: disable THREAD_SIZE test
GCC 10 doesn't generate a DIE for union thread_union, which breaks our
THREAD_SIZE object finder. The previous change removed our internal
dependency on THREAD_SIZE, so disable this test while I investigate why
GCC changed.
2020-07-08 18:36:10 -07:00
Omar Sandoval
4d8597f0f8 libdrgn: add THREAD_SIZE to Linux kernel object finder
Despite the naming, this is the kernel stack size.
2020-05-19 17:10:54 -07:00
Omar Sandoval
971a2d3687 libdrgn/python: make Objects fully immutable
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
2020-05-18 00:07:49 -07:00
Omar Sandoval
ab876f3dbd libdrgn/python: allow specifying Object value positionally
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
2020-05-18 00:07:49 -07:00
Omar Sandoval
1cc771605d tests: run black on stray change 2020-05-15 15:19:11 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
4b82d1e075 helpers: add cmdline() and environ()
These are two of the most common use cases for reading a process's
memory.
2020-05-08 17:37:56 -07:00
Omar Sandoval
8a276838ac helpers: add access_process_vm() and access_remote_vm()
Now that we can walk page tables, we can finally read memory from
userspace tasks.

Closes #53.
2020-05-08 17:37:01 -07:00
Omar Sandoval
63299e0701 libdrgn: actually use uint64_t for two's complement unary ops
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
2020-05-08 13:50:24 -07:00
Omar Sandoval
23574e59d5 libdrgn: add /proc/kcore physical segments on old kernels
Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so
it's currently not possible to read from physical memory on old kernels.
However, if we can figure out the address of the direct mapping, then we
can determine the corresponding physical addresses for the segments and
add them.
2020-05-04 13:20:27 -07:00
Omar Sandoval
10e58777c3 Add Program.read_{u8,u16,u32,u64,word}()
I've found that I do this manually a lot (e.g., when digging through a
task's stack). Add shortcuts for reading unsigned integers and a note
for how to manually read other formats.
2020-04-27 17:27:10 -07:00
Omar Sandoval
35bb02443d tests: add mm helper tests
Add tests for pfn_to_virt(), virt_to_pfn(), pfn_to_page(), and
page_to_pfn() using the pagemap interface
(https://www.kernel.org/doc/html/latest/admin-guide/mm/pagemap.html).
Also add tests for the PAGE_SIZE, PAGE_SHIFT, and PAGE_MASK macros.
2020-04-10 15:33:29 -07:00
Omar Sandoval
1dbc718840 helpers: add pgtable_l5_enabled() 2020-04-10 15:18:46 -07:00
Jay Kamat
d8fadf10ee libdrgn: Add cpp language and tests 2020-04-03 16:35:38 -07:00