JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-23 09:43:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	bc85767e5f	libdrgn: support looking up parameters and variables in stack traces After all of the preparatory work, the last two missing pieces are a way to find a variable by name in the list of scopes that we saved while unwinding, and a way to find the containing scopes of an inlined function. With that, we can finally look up parameters and variables in stack traces. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	ffcb9ccb19	libdrgn: debug_info: implement creating objects from DWARF location descriptions Add support for evaluating a DWARF location description and translating it into a drgn object. In this commit, this is just used for global variables, but an upcoming commit will wire this up to stack traces for parameters and local variables. There are a few locations that drgn's object model can't represent yet. DW_OP_piece/DW_OP_bit_piece can describe objects that are only partially known or partially in memory; we approximate these where we can. We don't have a good way to support DW_OP_implicit_pointer at all yet. This also adds test cases for DWARF expressions, which we couldn't easily test before. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-05 16:18:51 -07:00
Omar Sandoval	0e3054a0ba	libdrgn: make addresses wrap around when reading memory Define that addresses for memory reads wrap around after the maximum address rather than the current unpredictable behavior. This is done by: 1. Reworking drgn_memory_reader to work with an inclusive address range so that a segment can contain UINT64_MAX. drgn_memory_reader remains agnostic to the maximum address and requires that address ranges do not overflow a uint64_t. 2. Adding the overflow/wrap-around logic to drgn_program_add_memory_segment() and drgn_program_read_memory(). 3. Changing direct uses of drgn_memory_reader_read() to drgn_program_read_memory() now that they are no longer equivalent. (For some platforms, a fault might be more appropriate than wrapping around, but this is a step in the right direction.) Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-06-03 17:49:29 -07:00
Omar Sandoval	cf371594f3	tests: run a few test cases with DW_FORM_indirect Pick a few DWARF parsing test cases that exercise the interesting cases for DW_FORM_indirect and run them with and without DW_FORM_indirect. We only test DW_FORM_indirect if libdw is new enough to support it. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-05-04 16:56:54 -07:00
Omar Sandoval	609a1cafc6	libdrgn: dwarf_index: check for attribute forms more strictly Rather than silently ignoring attributes whose form we don't recognize, return an error. This way, we won't mysteriously skip indexing DIEs. While we're doing this, split the form -> instruction mapping to its own functions. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-05-04 16:56:54 -07:00
Jay Kamat	c108f9a24c	tests: add basic tests for type units Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2021-04-23 02:37:31 -07:00
Omar Sandoval	6b79b21ab5	tests: fix test depending on repr(enum.Flag) format CPython commit b775106d940e ("bpo-40066: Enum: modify `repr()` and `str()` (GH-22392)") changed repr(enum.Flag) from, e.g., <Qualifiers.VOLATILE|CONST: 3> to Qualifiers.CONST|Qualifiers.VOLATILE. Fix tests.test_type.TestType.test_qualifiers to not assume the format. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-22 01:17:22 -07:00
Omar Sandoval	a4b9d68a8c	Use GPL-3.0-or-later license identifier instead of GPL-3.0+ Apparently the latter is deprecated and the former is preferred. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-03 01:10:35 -07:00
Davide Cavalca	7ca157316f	tests: properly escape regexp strings Signed-off-by: Davide Cavalca <dcavalca@fb.com>	2021-04-02 10:37:33 -07:00
Davide Cavalca	081d7773e1	tests: rename test_type_dies for pytest compatibility Signed-off-by: Davide Cavalca <dcavalca@fb.com>	2021-04-02 10:37:14 -07:00
Omar Sandoval	630d39e345	libdrgn: add ORC unwinder The Linux kernel has its own stack unwinding format for x86-64 called ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is essentially a simplified, less complete version of DWARF CFI. ORC is generated by analyzing machine code, so it is present for all but a few ignored functions. In contrast, DWARF CFI is generated by the compiler and is therefore missing for functions written in assembly and inline assembly (which is widespread in the kernel). This implements an ORC stack unwinder: it applies ELF relocations to the ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule kind, parses and efficiently stores ORC data, and translates ORC to drgn CFI rules. This will allow us to stack trace through assembly code, interrupts, and system calls. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-29 10:01:52 -07:00
Omar Sandoval	12723a0c08	tests: clean up tests.helpers.linux.test_debug_info Split the two modes into separate tests and move the environment variable fiddling into a separate helper function. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-26 12:49:06 -07:00
Omar Sandoval	da0280016c	libdrgn: python: identify bit fields in TypeMember.__repr__ If a member is a bit field, then we should format it with the underlying Object so that it shows the bit field size. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-17 12:02:53 -07:00
Jay Kamat	c22e501295	libdrgn: debug_info: fix parsing specifications of declarations drgn_compound_type_from_dwarf() and drgn_enum_type_from_dwarf() check the DW_AT_declaration flag to decide whether the type is a declaration of an incomplete type or a definition of a complete type. However, they check DW_AT_declaration with dwarf_attr_integrate(), which follows the DW_AT_specification reference if it is present. The DIE referenced by DW_AT_specification typically is a declaration, so this erroneously identifies definitions as declarations. Additionally, if drgn_debug_info_find_complete() finds the same definition, we can end up recursing until we hit the DWARF parsing depth limit. Fix it by not using dwarf_attr_integrate() for DW_AT_declaration. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2021-02-25 10:46:34 -08:00
Omar Sandoval	85dec2b8f6	tests: move C-specific tests from test_object to test_language_c TestCLiteral, TestCIntegerPromotion, TestCCommonRealType, TestCOperators, and TestCPretty in test_object all test various operations on objects, but since they're testing language-specific behavior, they belong in test_language_c. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-02-21 16:11:19 -08:00
Omar Sandoval	55e3a58e06	libdrgn: python: use correct member offset when creating object from value We need to use the offset of the member in the outermost object type, not the offset in the immediate containing type in the case of nested anonymous structs. Fixes: `e72ecd0e2c` ("libdrgn: replace drgn_program_member_info() with drgn_type_find_member()") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-02-21 02:29:59 -08:00
Omar Sandoval	9fda010789	Track byte order in scalar types instead of objects Currently, reference objects and buffer value objects have a byte order. However, this doesn't always make sense for a couple of reasons: - Byte order is only meaningful for scalars. What does it mean for a struct to be big endian? A struct doesn't have a most or least significant byte; its scalar members do. - The DWARF specification allows either types or variables to have a byte order (DW_AT_endianity). The only producer I could find that uses this is GCC for the scalar_storage_order type attribute, and it only uses it for base types, not variables. GDB only seems to check it for base types, as well. So, remove the byte order from objects, and move it to integer, boolean, floating-point, and pointer types. This model makes more sense, and it means that we can get the binary representation of any object now. The only downside is that we can no longer support a bit offset for non-scalars, but as far as I can tell, nothing needs that. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-02-19 21:41:29 -08:00
Omar Sandoval	72b4aa9669	libdrgn: clean up object initialization Rename struct drgn_object_type to struct drgn_operand_type, add a new struct drgn_object_type which contains all of the type-related fields from struct drgn_object, and use it to implement drgn_object_type() and drgn_object_type_operand(), which are replacements for drgn_object_set_common() and drgn_object_type_encoding_and_size(). This cleans up a lot of the boilerplate around initializing objects. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-02-19 17:43:14 -08:00
Omar Sandoval	78316a28fb	libdrgn: remove half-baked support for complex types We've nominally supported complex types since commit `75c3679147` ("Rewrite drgn core in C"), but parsing them from DWARF has been incorrect from the start (they don't have a DW_AT_type attribute like we assume), and we never implemented proper support for complex objects. Drop the partial implementation; we can bring it back (properly) if someone requests it. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-02-17 14:56:33 -08:00
Omar Sandoval	b899a10836	Remove register numbers from API and add register aliases enum drgn_register_number in the public libdrgn API and drgn.Register.number in the Python bindings are basically exports of DWARF register numbers. They only exist as a way to identify registers that's lighter weight than string lookups. libdrgn already has struct drgn_register, so we can use that to identify registers in the public API and remove enum drgn_register_number. This has a couple of benefits: we don't depend on DWARF numbering in our API, and we don't have to generate drgn.h from the architecture files. The Python bindings can just use string names for now. If it seems useful, StackFrame.register() can take a Register in the future, we'll just need to be careful to not allow Registers from the wrong platform. While we're changing the API anyways, also change it so that registers have a list of names instead of one name. This isn't needed for x86-64 at the moment, but will be for architectures that have multiple names for the same register (like ARM). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-28 17:47:45 -08:00
Omar Sandoval	bbefc573d8	libdrgn: debug_info: make sure DW_TAG_template_value_parameter has value Otherwise, an invalid DW_TAG_template_value_parameter can be confused for a type parameter. Fixes: `352c31e1ac` ("Add support for C++ template parameters") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-21 12:07:46 -08:00
Omar Sandoval	5f170ea3f3	helpers: add per_cpu() The correct way to access global per-CPU variables (per_cpu_ptr(prog[name].address_of_(), cpu)) has been a common source of confusion (see #77). Add an analogue to the per_cpu() macro in the kernel as a shortcut and document it as the easiest method for getting a global per-CPU variable: per_cpu(prog[name], cpu). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-21 11:40:05 -08:00
Omar Sandoval	81a203c48f	helpers: fix for_each_{possible,online,present}_cpu() on v4.4 Also reorder the definitions to alphabetical order and add tests. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-21 10:08:48 -08:00
Omar Sandoval	352c31e1ac	Add support for C++ template parameters Add struct drgn_type_template_parameter to libdrgn, the corresponding TypeTemplateParameter to the Python bindings, and support for parsing them from DWARF. With this, support for templates is almost, but not quite, complete. The main wart is that DW_AT_name of compound types includes the template parameters, so the type tag includes it as well. We should remove that from the tag and instead have the type formatting code add it only when getting the full type name. Based on a patch from Jay Kamat. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-08 17:39:51 -08:00
Omar Sandoval	d35243b354	libdrgn: replace lazy types with lazy objects In order to support static members, methods, default function arguments, and value template parameters, we need to be able to store a drgn_object in a drgn_type_member or drgn_type_parameter. These are all cases where we want lazy evaluation, so we can replace drgn_lazy_type with a new drgn_lazy_object which implements the same idea but for objects. Types can still be represented with an absent object. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-08 17:39:51 -08:00
Omar Sandoval	a57c26ed32	libdrgn: fix zero-length array GCC < 9.0 workaround for qualified types We're not applying the zero-length array workaround when the array type is qualified. Make sure we pass through can_be_incomplete_array when parsing DW_TAG_{const,restrict,volatile,atomic}_type. Fixes: `75c3679147` ("Rewrite drgn core in C") Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-08 11:21:57 -08:00
Omar Sandoval	988e9e7190	libdrgn/python: add Object.absent_ Without this, the only way to check whether an object is absent in Python is to try to use the object and catch the ObjectAbsentError. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-29 15:06:40 -08:00
Omar Sandoval	30cfa40a72	libdrgn: rename "unavailable" objects to "absent" objects I was going to add an Object.available_ attribute, but that made me realize that the naming is somewhat ambiguous, as a reference object with an invalid address might also be considered "unavailable" by users. Use the name "absent" instead, which is more clear: the object isn't there at all. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-29 14:58:26 -08:00
Omar Sandoval	c2eec00ae0	libdrgn/python: use None instead of 0 for TypeMember.bit_field_size Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by using None to represent a non-bit field instead of 0. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-25 01:53:23 -08:00
Omar Sandoval	7d7aa7bf7b	libdrgn/python: remove Type == operator The == operator on drgn.Type is only intended for testing. It's expensive and slow and not what people usually want. It's going to get even more awkward to define once types can refer to objects (for template parameters and static members and such). Let's replace == with a new identical() function only available in unit tests. Then, remove the operator from the Python bindings as well as the underlying libdrgn drgn_type_eq() and drgn_qualified_type_eq() functions. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-22 03:11:38 -08:00
Omar Sandoval	523fd26959	libdrgn: don't allow casting to non-scalar types at all Currently, we try to emulate the GNU C extension of casting a struct type to itself. This does a deep type comparison, which is expensive. We could take a shortcut like only comparing the kind and type name, but seeing as standard C only allows casting to a scalar type, let's drop support for casting to a struct (or other non-scalar) type entirely. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-22 02:46:05 -08:00
Omar Sandoval	40004e5c8f	libdrgn/python: add offsetof() offsetof() can almost be implemented with Type.member(name).offset, but that doesn't parse member designators. Add an offsetof() function that does (and add drgn_type_offsetof() in libdrgn). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 16:46:41 -08:00
Omar Sandoval	fd04463596	libdrgn/python: add Type.member() In Python, looking up a member in a drgn Type by name currently looks something like: member = [member for member in type.members if member.name == "foo"][0] Add a Type.member(name) method, which is both easier and more efficient. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 16:10:23 -08:00
Omar Sandoval	e72ecd0e2c	libdrgn: replace drgn_program_member_info() with drgn_type_find_member() Now that types are associated with their program, we don't need to pass the program separately to drgn_program_member_info() and can replace it with a more natural drgn_type_find_member() API that takes only the type and member name. While we're at it, get rid of drgn_member_info and return the drgn_type_member and bit_offset directly. This also fixes a bug that drgn_error_member_not_found() ignores the member name length. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 14:40:54 -08:00
Omar Sandoval	cf9a068820	libdrgn/python: fix reference counting on Type.members and Type.parameters The TypeMember and TypeParameter instances referring to a libdrgn drgn_lazy_type are only valid as long as the Type containing them is still alive. Hold a reference on the containing Type from LazyType. We can do this without growing LazyType by getting rid of the enum state and using sentinel values for LazyType::lazy_type as the state. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-15 14:09:12 -08:00
Omar Sandoval	abafdd965f	Remove bit_offset from value objects There are a couple of reasons that it was the wrong choice to have a bit_offset for value objects: 1. When we store a buffer with a bit_offset, we're storing useless padding bits. 2. bit_offset describes a location, or in other words, part of an address. This makes sense for references, but not for values, which are just a bag of bytes. Get rid of union drgn_value.bit_offset in libdrgn, make Object.bit_offset None for value objects, and disallow passing bit_offset to the Object() constructor when creating a value. bit_offset can still be passed when creating an object from a buffer, but we'll shift the bytes down as necessary to store the value with no offset. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-14 12:29:17 -08:00
Omar Sandoval	bce9ef5f8d	libdrgn: linux kernel: remove THREAD_SIZE object finder THREAD_SIZE is still broken and I haven't looked into the root cause (see commit `95be142d17` ("tests: disable THREAD_SIZE test")). We don't need it anymore anyways, so let's remove it entirely. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-10 02:08:13 -08:00
Omar Sandoval	97fbedec1f	libdrgn: return unavailable objects for DWARF objects without value or address Now that we have the concept of unavailable objects, use it for DWARF where appropriate. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-04 14:15:09 -08:00
Omar Sandoval	6bd0c2b4d2	libdrgn: add concept of "unavailable" objects There are some situations where we can find an object but can't determine its value, like local variables that have been optimized out, inlined functions without a concrete instance, and pure virtual methods. It's still useful to get some information from these objects, namely their types. Let's add the concept of an "unavailable" object, which is an object with a known type but unknown value/address. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-04 13:58:19 -08:00
Omar Sandoval	5f17281926	libdrgn: make drgn_object::is_reference an enum To prepare for a new kind of object, replace the is_reference bool with an enum drgn_object_kind. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-12-04 13:37:58 -08:00
Omar Sandoval	e7caa24176	tests: test kernel module debug info loading Now that vmtest supports kernel modules, test that we load them correctly. Closes #74. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-18 01:19:27 -07:00
Omar Sandoval	4431b4f918	vmtest: enable kernel modules We currently build with CONFIG_MODULES=n for simplicity. However, this means that we don't test kernel module support at all. Let's enable module support. This requires changing how we distribute kernels. Now, the /lib/modules/$(uname -r) directory (including the vmlinux and vmlinuz) is bundled up as a tarball. We extract it, then mount it with VirtFS, and do some extra setup for device nodes. (We lose the ability to run kernel builds directly, but I've never actually used that functionality.) Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-18 01:13:01 -07:00
Omar Sandoval	4cbb9b552a	libdrgn: fix comparison of types with anonymous members drgn_type_members_eq() skips comparing the types of anonymous members. Fix that and add a test for it. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-10-08 17:32:46 -07:00
Jay Kamat	d1beb0184a	libdrgn: add support for objects in C++ namespaces DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the DWARF index, with each namespace being its own sub-index. We only index the namespace itself when it is first accessed, which should help with startup time and simplifies tracking. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	26291647eb	libdrgn: dwarf_index: handle DW_AT_specification DIEs with two passes We currently handle DIEs with a DW_AT_specification attribute by parsing the corresponding declaration to get the name and inserting the DIE as usual. This has a couple of problems: 1. It only works if DW_AT_specification refers to the same compilation unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not DW_FORM_ref_addr. As a result, drgn doesn't support the latter. 2. It assumes that the DIE with DW_AT_specification is in the correct "scope". Unfortunately, this is not true for g++: for a variable definition in a C++ namespace, it generates a DIE with DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE which refers to the declaration with DW_AT_specification _outside_ of the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE. Supporting both of these cases requires reworking how we handle DW_AT_specification. This commit takes an approach of parsing the DWARF data in two passes: the first pass reads the abbreviation and file name tables and builds a map of instances of DW_AT_specification; the second pass indexes DIEs as before, but ignores DIEs with DW_AT_specification and handles DIEs with DW_AT_declaration by looking them up in the map built by the first pass. This approach is a 10-20% regression in indexing time in the benchmarks I ran. Thankfully, it is not 100% slower for a couple of reasons. The first is that the two passes are simpler than the original combined pass. The second is that a decent part of the indexing time is spent faulting in the mapped debugging information, which only needs to happen once (even if the file is cached, minor page faults add non-negligible overhead). This doesn't handle DW_AT_specification "chains" yet, but neither did the original code. If it is necessary, it shouldn't be too difficult to add. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	36068a0ea8	Fix trailing commas for Black v20.8b1 Black was recently changed to treat a trailing comma as an indicator to put each item/argument on its own line. We have a bunch of places where something previously had to be split into multiple lines, then was edited to fit on one line, but Black kept the trailing comma. Now this update wants to unnecessarily split it back up. For now, let's get rid of these commas. Hopefully in the future Black has a way to opt out of this. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:29 -07:00
Omar Sandoval	2fc514f2a4	libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers] I originally did it this way because pydoc doesn't handle non-trivial defaults in signature very well (see commit `67a16a09b8` ("tests: test that Python documentation renders")). drgndoc doesn't generate signature for pydoc anymore, though, so we don't need to worry about it and can clean up the typing. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:29 -07:00
Omar Sandoval	a97f6c4fa2	Associate types with program I originally envisioned types as dumb descriptors. This mostly works for C because in C, types are fairly simple. However, even then the drgn_program_member_info() API is awkward. You should be able to look up a member directly from a type, but we need the program for caching purposes. This has also held me back from adding offsetof() or has_member() APIs. Things get even messier with C++. C++ template parameters can be objects (e.g., template <int N>). Such parameters would best be represented by a drgn object, which we need a drgn program for. Static members are a similar case. So, let's reimagine types as being owned by a program. This has a few parts: 1. In libdrgn, simple types are now created by factory functions, drgn_foo_type_create(). 2. To handle their variable length fields, compound types, enum types, and function types are constructed with a "builder" API. 3. Simple types are deduplicated. 4. The Python type factory functions are replaced by methods of the Program class. 5. While we're changing the API, the parameters to pointer_type() and array_type() are reordered to be more logical (and to allow pointer_type() to take a default size of None for the program's default pointer size). 6. Likewise, the type factory methods take qualifiers as a keyword argument only. A big part of this change is updating the tests and splitting up large test cases into smaller ones in a few places. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:41:09 -07:00
Omar Sandoval	c31208f69c	libdrgn: fold drgn_type_index into drgn_program This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:36:35 -07:00
Omar Sandoval	4e770fb18a	Format imports with isort Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 16:55:07 -07:00
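
Illustration for commit 0e3054a0ba (address wrap-around). The following is a minimal Python sketch of the idea described in that commit message: segments are stored as inclusive address ranges so that one can contain UINT64_MAX, and the wrap-around is handled one layer above the plain segment lookup. `ToyMemory` and its methods are hypothetical names for this sketch only; they are not libdrgn's API and do not reflect how the C code is organized.

```python
UINT64_MAX = 2**64 - 1


class ToyMemory:
    """Toy memory model (hypothetical, not libdrgn) with inclusive segments."""

    def __init__(self):
        # Each entry is (start, end_inclusive, data); end never exceeds UINT64_MAX.
        self.segments = []

    def add_segment(self, address, data):
        """Register data at address, splitting it if it runs past UINT64_MAX."""
        if not data:
            return
        address &= UINT64_MAX
        # Number of bytes that fit before the address space ends.
        room = UINT64_MAX - address + 1
        head, tail = data[:room], data[room:]
        self.segments.append((address, address + len(head) - 1, head))
        if tail:
            # The remainder wraps around to address 0.
            self.segments.append((0, len(tail) - 1, tail))

    def _read_no_wrap(self, address, count):
        """Read a range that is known not to cross UINT64_MAX."""
        out = bytearray()
        while count > 0:
            for start, end, data in self.segments:
                if start <= address <= end:
                    n = min(count, end - address + 1)
                    offset = address - start
                    out += data[offset:offset + n]
                    address += n
                    count -= n
                    break
            else:
                raise LookupError(f"no segment contains {address:#x}")
        return bytes(out)

    def read(self, address, count):
        """Read count bytes, wrapping to address 0 after UINT64_MAX."""
        address &= UINT64_MAX
        room = UINT64_MAX - address + 1
        first = min(count, room)
        result = self._read_no_wrap(address, first)
        if count > first:
            result += self._read_no_wrap(0, count - first)
        return result


mem = ToyMemory()
# Four bytes placed so that they straddle the end of the address space.
mem.add_segment(UINT64_MAX - 1, b"abcd")
wrapped = mem.read(UINT64_MAX - 1, 4)
```

Keeping the low-level lookup unaware of wrapping mirrors the split the commit message describes, where the reader only requires that ranges do not overflow a uint64_t.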

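Illustration for commit 26291647eb (two-pass handling of DW_AT_specification). This is a rough Python sketch of the two passes described in that commit message, using a toy in-memory list instead of real DWARF. `ToyDie`, `build_index`, and the field names are assumptions made for this example; the real index is written in C and is considerably more involved. Skipping declarations that have no matching definition is also a simplification chosen here, not something the commit message states.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyDie:
    """Hypothetical stand-in for a DWARF DIE."""
    offset: int
    scope: Tuple[str, ...]               # enclosing namespace names
    name: Optional[str] = None
    declaration: bool = False            # models DW_AT_declaration
    specification: Optional[int] = None  # models DW_AT_specification (target offset)


def build_index(dies):
    # Pass 1: map each declaration's offset to the DIE that refers to it
    # through the specification attribute.
    definitions = {}
    for die in dies:
        if die.specification is not None:
            definitions[die.specification] = die.offset

    # Pass 2: index by (scope..., name). DIEs with a specification are
    # skipped; declarations are resolved through the map from pass 1, so the
    # entry lands in the declaration's scope but points at the definition.
    index = {}
    for die in dies:
        if die.specification is not None:
            continue
        if die.declaration:
            target = definitions.get(die.offset)
            if target is None:
                continue  # toy choice: unresolved declarations are not indexed
        else:
            target = die.offset
        if die.name is not None:
            index[die.scope + (die.name,)] = target
    return index


dies = [
    # Declaration inside namespace "ns", as g++ emits it.
    ToyDie(0x10, ("ns",), "counter", declaration=True),
    # Definition emitted outside the namespace, referring back to 0x10.
    ToyDie(0x40, (), specification=0x10),
    # An ordinary definition with no declaration/specification pair.
    ToyDie(0x60, (), "plain"),
]
index = build_index(dies)
```

Because the map is keyed by offset rather than by position in a compilation unit, the same lookup works no matter where the referring DIE was emitted, which is the scoping problem the commit message raises for g++.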

237 Commits
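
Illustration for commit 6b79b21ab5 (test depending on the repr(enum.Flag) format). The sketch below shows one way a test can avoid assuming a particular repr() format: compare flag values or member names instead of formatted strings. The `Qualifiers` class here is a stand-in defined for the example, not drgn's own class, and `describe` is a hypothetical helper; how the actual test was changed is not shown in this log.

```python
import enum


class Qualifiers(enum.Flag):
    """Stand-in flag type for this example (not drgn.Qualifiers)."""
    CONST = 1
    VOLATILE = 2


combined = Qualifiers.CONST | Qualifiers.VOLATILE


def describe(flags):
    """Return the sorted names of the members set in flags.

    This does not rely on repr() or str(), whose output differs between
    CPython versions.
    """
    return sorted(member.name for member in Qualifiers if member in flags)


names = describe(combined)
```

Comparing `combined` against `Qualifiers.VOLATILE | Qualifiers.CONST` directly, or checking `names`, gives the same result regardless of how a given CPython release formats the flag.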