Commit Graph

249 Commits

Author SHA1 Message Date
Peilin Ye
557b8152cc helpers: Add netdev_get_by_name()
Add a helper to get the network device ("struct net_device *") given an
interface name.  As an example:

    >>> netdev = netdev_get_by_name(prog["init_net"], "lo")
    >>> netdev.ifindex.value_()
    1

Or pass a "Program" as the first argument, and let the helper search
the initial network namespace (i.e. "init_net"):

    >>> netdev = netdev_get_by_name(prog, "dummy0")
    >>> netdev.ifindex.value_()
    2

Also add a test for this new helper to tests/helpers/linux/test_net.py.

This helper simply does a linear search over the name hash table of the
network namespace, since implementing hashing in drgn is non-trivial.
It is obviously slower than net/core/dev.c:netdev_name_node_lookup() in
the kernel, but still useful.
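
As a rough illustration of the linear search described above, the
following pure-Python sketch models the name hash table as a list of
buckets and scans every bucket instead of hashing the name. The
NetDevice class, the table layout, and lookup_by_name() are hypothetical
stand-ins for illustration, not drgn's implementation:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetDevice:
    # Hypothetical stand-in for the two struct net_device fields used here.
    name: str
    ifindex: int


def lookup_by_name(buckets: List[List[NetDevice]], name: str) -> Optional[NetDevice]:
    # Walk every bucket rather than computing the kernel's name hash.
    # This costs O(number of devices) but needs no knowledge of the hash.
    for bucket in buckets:
        for dev in bucket:
            if dev.name == name:
                return dev
    return None


# Toy table: bucket placement is arbitrary because the search ignores it.
name_table = [
    [NetDevice("lo", 1)],
    [],
    [NetDevice("dummy0", 2), NetDevice("enp0s3", 3)],
]
found = lookup_by_name(name_table, "dummy0")
missing = lookup_by_name(name_table, "eth9")
```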

Linux kernel commit ff92741270bf ("net: introduce name_node struct to be
used in hashlist") introduced struct netdev_name_node for name lookups.
Start by assuming that the kernel has this commit, and fall back to the
old path if that fails.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
2021-08-11 20:23:39 -07:00
Omar Sandoval
5541fad063 Fix some flake8 errors
Mainly unused imports, unused variables, unnecessary f-strings, and
regex literals missing the r prefix. I'm not adding it to the CI linter
because it's too noisy, though.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-11 14:52:44 -07:00
Peilin Ye
e9915886f6 helpers: Add netdev_get_by_index()
Add a helper to find the corresponding "struct net_device *" object
given an interface index number.  As an example:

    >>> netdev = netdev_get_by_index(prog["init_net"], 1)
    >>> netdev.name.string_().decode()
    'lo'

Or pass a "Program" as the first argument, and let the helper search
its initial network namespace (i.e. "init_net"):

    >>> netdev = netdev_get_by_index(prog, 3)
    >>> netdev.name.string_().decode()
    'enp0s3'

Also add a test for this new helper to tests/helpers/linux/test_net.py.

For now, a user may combine this new helper with socket.if_nametoindex()
to look up by interface name:

    >>> netdev = netdev_get_by_index(prog, socket.if_nametoindex("dummy0"))
    >>> netdev.name.string_().decode()
    'dummy0'

However, as mentioned by Cong, one should keep in mind that
socket.if_nametoindex() is based on the system's current name-to-index
mapping, which may be different from that of e.g. a kdump.

Thus, as suggested by Omar, a better way to do name lookups would be
simply linear-searching the name hash table, which is slower, but less
error-prone.

Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
2021-08-11 13:36:32 -07:00
Omar Sandoval
51f63bb53b helpers: add node_state() to nodemask helpers
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-03 17:06:36 -07:00
Omar Sandoval
1213eb8f49 helpers: add bit operation helpers
Extract for_each_set_bit() that was added internally for the cpumask and
nodemask helpers, and add for_each_clear_bit() and test_bit() to go with
it.
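
A minimal sketch of what these three operations compute, modeled on a
plain Python integer rather than a kernel bitmap. The names
iter_set_bits(), iter_clear_bits(), and bit_is_set() and their
signatures (an int bitmap plus a size in bits) are assumptions made for
illustration; the actual helpers operate on objects read from the
program:

```python
from typing import Iterator


def iter_set_bits(bitmap: int, size: int) -> Iterator[int]:
    # Yield the index of each 1 bit among the low `size` bits.
    for bit in range(size):
        if bitmap & (1 << bit):
            yield bit


def iter_clear_bits(bitmap: int, size: int) -> Iterator[int]:
    # Yield the index of each 0 bit among the low `size` bits.
    for bit in range(size):
        if not bitmap & (1 << bit):
            yield bit


def bit_is_set(nr: int, bitmap: int) -> bool:
    # Report whether bit number `nr` is set.
    return bool(bitmap & (1 << nr))


mask = 0b1011
set_bits = list(iter_set_bits(mask, 4))
clear_bits = list(iter_clear_bits(mask, 4))
```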

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-03 15:43:04 -07:00
Qi Zheng
2f97cb4f75 helpers: add kernel nodemask helpers
Sometimes we want to traverse numa nodes in the system,
so add kernel nodemask helpers to support this.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
2021-07-29 19:28:59 -07:00
Omar Sandoval
7335df114c libdrgn: python: add Object.to_bytes_()
And the libdrgn implementation, drgn_object_read_bytes().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-26 17:12:34 -07:00
Omar Sandoval
9c00552007 libdrgn: python: add Object.from_bytes_()
Add a way to create an object from raw bytes. One example where I've
wanted this is creating a struct pt_regs from a PRSTATUS note or other
source.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-07-26 17:06:58 -07:00
Omar Sandoval
2e04e6b73c libdrgn: binary_buffer: handle non-canonical LEB128 numbers
LEB128 allows for redundant zero/sign bits, but we currently always
treat extra bytes as overflow. Let's check those bytes correctly.
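
To make the distinction concrete, here is a hedged pure-Python sketch of
decoding an unsigned LEB128 number into 64 bits. It accepts trailing
bytes that contribute only zero bits and rejects bytes that set any bit
past bit 63. The function name decode_uleb128_u64() is hypothetical and
this is not the libdrgn code; the signed case, which must allow
redundant sign bits instead, is not shown:

```python
from typing import Tuple


def decode_uleb128_u64(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just past the number)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 number")
        byte = data[pos]
        pos += 1
        # Python integers are unbounded, so accumulate first and then
        # check whether any bit landed at position 64 or above.
        value |= (byte & 0x7F) << shift
        if value >> 64:
            raise OverflowError("LEB128 number does not fit in 64 bits")
        shift += 7
        if not byte & 0x80:
            return value, pos


canonical = decode_uleb128_u64(b"\xe5\x8e\x26")
# Same number with two redundant zero groups appended.
padded = decode_uleb128_u64(b"\xe5\x8e\xa6\x80\x00")
```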

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 21:39:31 -07:00
Omar Sandoval
86e966fbf8 libdrgn: dwarf_index: handle DW_FORM_block
Somehow I missed this form, and I've never seen it used. It's the same
as DW_FORM_exprloc for our purposes, so it's an easy fix.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-30 01:34:52 -07:00
Omar Sandoval
faad25d7b2 libdrgn: debug_info: fix address of objects with size zero
The stack trace variable work introduced a regression that causes
objects with size zero to always be marked absent even if they have an
address. This matters because GCC sometimes seems to omit the complete
array type for arrays declared without a length, so an array variable
can end up with an incomplete array type. I saw this with the
"swapper_spaces" variable in mm/swap_state.c from the Linux kernel.

Make sure to use the address of an empty piece if the variable is also
empty.

Fixes: ffcb9ccb19 ("libdrgn: debug_info: implement creating objects from DWARF location descriptions")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 15:46:22 -07:00
Omar Sandoval
f7fe93e573 cli: show elfutils version in use
drgn depends heavily on libelf and libdw, so it's useful to know what
version we're using. Add drgn._elfutils_version and use that in the CLI
and in the test cases where we currently check the libdw version.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-07 11:10:50 -07:00
Omar Sandoval
bc85767e5f libdrgn: support looking up parameters and variables in stack traces
After all of the preparatory work, the last two missing pieces are a way
to find a variable by name in the list of scopes that we saved while
unwinding, and a way to find the containing scopes of an inlined
function. With that, we can finally look up parameters and variables in
stack traces.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
ffcb9ccb19 libdrgn: debug_info: implement creating objects from DWARF location descriptions
Add support for evaluating a DWARF location description and translating
it into a drgn object. In this commit, this is just used for global
variables, but an upcoming commit will wire this up to stack traces for
parameters and local variables.

There are a few locations that drgn's object model can't represent yet.
DW_OP_piece/DW_OP_bit_piece can describe objects that are only partially
known or partially in memory; we approximate these where we can. We
don't have a good way to support DW_OP_implicit_pointer at all yet.

This also adds test cases for DWARF expressions, which we couldn't
easily test before.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-05 16:18:51 -07:00
Omar Sandoval
0e3054a0ba libdrgn: make addresses wrap around when reading memory
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:

1. Reworking drgn_memory_reader to work with an inclusive address range
   so that a segment can contain UINT64_MAX. drgn_memory_reader remains
   agnostic to the maximum address and requires that address ranges do
   not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
   drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_read() to
   drgn_program_read_memory() now that they are no longer equivalent.

(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)
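
The wrap-around rule can be sketched in pure Python as splitting one
read into contiguous pieces, each of which stays within the inclusive
range [0, max_address]. The function split_wrapping_read() is a
hypothetical illustration rather than the libdrgn logic, and it assumes
max_address has the form 2**k - 1:

```python
from typing import List, Tuple


def split_wrapping_read(address: int, size: int,
                        max_address: int = 2**64 - 1) -> List[Tuple[int, int]]:
    # Truncate the start address to the address space (needs 2**k - 1).
    address &= max_address
    chunks = []
    while size > 0:
        # Bytes available from `address` through max_address, inclusive.
        n = min(size, max_address - address + 1)
        chunks.append((address, n))
        size -= n
        # Whatever remains continues from address zero.
        address = 0
    return chunks


wrapped = split_wrapping_read(0xFFFFFFFFFFFFFFF8, 16)
plain = split_wrapping_read(0x1000, 4)
wrapped_32 = split_wrapping_read(0xFFFFFFFE, 4, max_address=0xFFFFFFFF)
```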

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-03 17:49:29 -07:00
Omar Sandoval
cf371594f3 tests: run a few test cases with DW_FORM_indirect
Pick a few DWARF parsing test cases that exercise the interesting cases
for DW_FORM_indirect and run them with and without DW_FORM_indirect. We
only test DW_FORM_indirect if libdw is new enough to support it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-04 16:56:54 -07:00
Omar Sandoval
609a1cafc6 libdrgn: dwarf_index: check for attribute forms more strictly
Rather than silently ignoring attributes whose form we don't recognize,
return an error. This way, we won't mysteriously skip indexing DIEs.
While we're doing this, split the form -> instruction mapping to its own
functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-05-04 16:56:54 -07:00
Jay Kamat
c108f9a24c tests: add basic tests for type units
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-04-23 02:37:31 -07:00
Omar Sandoval
6b79b21ab5 tests: fix test depending on repr(enum.Flag) format
CPython commit b775106d940e ("bpo-40066: Enum: modify `repr()` and
`str()` (GH-22392)") changed repr(enum.Flag) from, e.g.,
<Qualifiers.VOLATILE|CONST: 3> to Qualifiers.CONST|Qualifiers.VOLATILE.
Fix tests.test_type.TestType.test_qualifiers to not assume the format.
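
A small sketch of the format-independent approach: compare flag members
or their values instead of the repr() text. The Qualifiers enum below is
a local, hypothetical two-member stand-in defined only for this example:

```python
import enum


class Qualifiers(enum.Flag):
    # Hypothetical stand-in with two members.
    CONST = 1
    VOLATILE = 2


combined = Qualifiers.CONST | Qualifiers.VOLATILE

# Equality and membership do not depend on how the flag is printed.
same = combined == (Qualifiers.VOLATILE | Qualifiers.CONST)
has_const = Qualifiers.CONST in combined

# repr() differs between Python versions, so only display it.
text = repr(combined)
```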

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-22 01:17:22 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Davide Cavalca
7ca157316f tests: properly escape regexp strings
Signed-off-by: Davide Cavalca <dcavalca@fb.com>
2021-04-02 10:37:33 -07:00
Davide Cavalca
081d7773e1 tests: rename test_type_dies for pytest compatibility
Signed-off-by: Davide Cavalca <dcavalca@fb.com>
2021-04-02 10:37:14 -07:00
Omar Sandoval
630d39e345 libdrgn: add ORC unwinder
The Linux kernel has its own stack unwinding format for x86-64 called
ORC: https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html. It is
essentially a simplified, less complete version of DWARF CFI. ORC is
generated by analyzing machine code, so it is present for all but a few
ignored functions. In contrast, DWARF CFI is generated by the compiler
and is therefore missing for functions written in assembly and inline
assembly (which is widespread in the kernel).

This implements an ORC stack unwinder: it applies ELF relocations to the
ORC sections, adds a new DRGN_CFI_RULE_REGISTER_ADD_OFFSET CFI rule
kind, parses and efficiently stores ORC data, and translates ORC to drgn
CFI rules. This will allow us to stack trace through assembly code,
interrupts, and system calls.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-29 10:01:52 -07:00
Omar Sandoval
12723a0c08 tests: clean up tests.helpers.linux.test_debug_info
Split the two modes into separate tests and move the environment
variable fiddling into a separate helper function.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-26 12:49:06 -07:00
Omar Sandoval
da0280016c libdrgn: python: identify bit fields in TypeMember.__repr__
If a member is a bit field, then we should format it with the underlying
Object so that it shows the bit field size.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-17 12:02:53 -07:00
Jay Kamat
c22e501295 libdrgn: debug_info: fix parsing specifications of declarations
drgn_compound_type_from_dwarf() and drgn_enum_type_from_dwarf() check
the DW_AT_declaration flag to decide whether the type is a declaration
of an incomplete type or a definition of a complete type. However, they
check DW_AT_declaration with dwarf_attr_integrate(), which follows the
DW_AT_specification reference if it is present. The DIE referenced by
DW_AT_specification typically is a declaration, so this erroneously
identifies definitions as declarations. Additionally, if
drgn_debug_info_find_complete() finds the same definition, we can end up
recursing until we hit the DWARF parsing depth limit. Fix it by not
using dwarf_attr_integrate() for DW_AT_declaration.
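
The difference between the two lookups can be modeled in a few lines of
pure Python, with DIEs represented as dicts. The attr() and
attr_integrate() functions below are simplified, hypothetical analogues
of the libdw calls (the real dwarf_attr_integrate() also follows
DW_AT_abstract_origin):

```python
from typing import Any, Dict, Optional

Die = Dict[str, Any]


def attr(die: Die, name: str) -> Optional[Any]:
    # Look only at the DIE itself.
    return die.get(name)


def attr_integrate(die: Optional[Die], name: str) -> Optional[Any]:
    # Fall back to the DIE referenced by DW_AT_specification.
    while die is not None:
        if name in die:
            return die[name]
        die = die.get("DW_AT_specification")
    return None


declaration = {"DW_AT_name": "foo", "DW_AT_declaration": True}
definition = {"DW_AT_specification": declaration, "DW_AT_byte_size": 8}

# Following the specification misreports the definition as a declaration.
looks_like_declaration = bool(attr_integrate(definition, "DW_AT_declaration"))
# Checking only the DIE itself gives the right answer.
is_declaration = bool(attr(definition, "DW_AT_declaration"))
```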

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2021-02-25 10:46:34 -08:00
Omar Sandoval
85dec2b8f6 tests: move C-specific tests from test_object to test_language_c
TestCLiteral, TestCIntegerPromotion, TestCCommonRealType,
TestCOperators, and TestCPretty in test_object all test various
operations on objects, but since they're testing language-specific
behavior, they belong in test_language_c.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 16:11:19 -08:00
Omar Sandoval
55e3a58e06 libdrgn: python: use correct member offset when creating object from value
We need to use the offset of the member in the outermost object type,
not the offset in the immediate containing type in the case of nested
anonymous structs.
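
The two offsets can be compared with Python's ctypes module, which
supports anonymous members. The Inner and Outer structures are
hypothetical examples, not types from drgn:

```python
import ctypes


class Inner(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint32), ("y", ctypes.c_uint32)]


class Outer(ctypes.Structure):
    # "anon" is an anonymous member, so x and y are reachable from Outer.
    _anonymous_ = ("anon",)
    _fields_ = [("header", ctypes.c_uint64), ("anon", Inner)]


# Offset of y within its immediate containing type.
offset_in_inner = Inner.y.offset
# Offset of y within the outermost type: add the anonymous member's offset.
offset_in_outer = Outer.anon.offset + Inner.y.offset
```

Reading y from a buffer holding an Outer value requires offset_in_outer;
using offset_in_inner would read part of header instead.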

Fixes: e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with drgn_type_find_member()")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 02:29:59 -08:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to check it for
  base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
72b4aa9669 libdrgn: clean up object initialization
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 17:43:14 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
b899a10836 Remove register numbers from API and add register aliases
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future; we'll just need to be careful to not
allow Registers from the wrong platform.

While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 17:47:45 -08:00
Omar Sandoval
bbefc573d8 libdrgn: debug_info: make sure DW_TAG_template_value_parameter has value
Otherwise, an invalid DW_TAG_template_value_parameter can be confused
for a type parameter.

Fixes: 352c31e1ac ("Add support for C++ template parameters")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 12:07:46 -08:00
Omar Sandoval
5f170ea3f3 helpers: add per_cpu()
The correct way to access global per-CPU variables
(per_cpu_ptr(prog[name].address_of_(), cpu)) has been a common source of
confusion (see #77). Add an analogue to the per_cpu() macro in the
kernel as a shortcut and document it as the easiest method for getting a
global per-CPU variable: per_cpu(prog[name], cpu).
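
Conceptually, each CPU's copy of a per-CPU variable lives at the
variable's address plus a per-CPU offset. The toy model below
illustrates that arithmetic in pure Python; the function names, the
memory dict, and the addresses are all hypothetical and only mimic what
the helpers do against a real program:

```python
from typing import Dict, List


def per_cpu_address(base_address: int, cpu_offsets: List[int], cpu: int) -> int:
    # A CPU's copy lives at the variable's address plus that CPU's offset
    # (the role played by __per_cpu_offset[cpu] in the kernel).
    return base_address + cpu_offsets[cpu]


def per_cpu_value(memory: Dict[int, int], base_address: int,
                  cpu_offsets: List[int], cpu: int) -> int:
    # Analogue of per_cpu(var, cpu): compute the pointer, then read it.
    return memory[per_cpu_address(base_address, cpu_offsets, cpu)]


# Toy layout: one variable at 0x400 with copies for two CPUs.
cpu_offsets = [0x10000, 0x20000]
memory = {0x10400: 7, 0x20400: 9}
value_cpu0 = per_cpu_value(memory, 0x400, cpu_offsets, 0)
value_cpu1 = per_cpu_value(memory, 0x400, cpu_offsets, 1)
```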

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 11:40:05 -08:00
Omar Sandoval
81a203c48f helpers: fix for_each_{possible,online,present}_cpu() on v4.4
Also reorder the definitions to alphabetical order and add tests.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-21 10:08:48 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_AT_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
a57c26ed32 libdrgn: fix zero-length array GCC < 9.0 workaround for qualified types
We're not applying the zero-length array workaround when the array type
is qualified. Make sure we pass through can_be_incomplete_array when
parsing DW_TAG_{const,restrict,volatile,atomic}_type.

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 11:21:57 -08:00
Omar Sandoval
988e9e7190 libdrgn/python: add Object.absent_
Without this, the only way to check whether an object is absent in
Python is to try to use the object and catch the ObjectAbsentError.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 15:06:40 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
c2eec00ae0 libdrgn/python: use None instead of 0 for TypeMember.bit_field_size
Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by
using None to represent a non-bit field instead of 0.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-25 01:53:23 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
523fd26959 libdrgn: don't allow casting to non-scalar types at all
Currently, we try to emulate the GNU C extension of casting a struct
type to itself. This does a deep type comparison, which is expensive. We
could take a shortcut like only comparing the kind and type name, but
seeing as standard C only allows casting to a scalar type, let's drop
support for casting to a struct (or other non-scalar) type entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 02:46:05 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
fd04463596 libdrgn/python: add Type.member()
In Python, looking up a member in a drgn Type by name currently looks
something like:

  member = [member for member in type.members if member.name == "foo"][0]

Add a Type.member(name) method, which is both easier and more efficient.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:10:23 -08:00
Omar Sandoval
e72ecd0e2c libdrgn: replace drgn_program_member_info() with drgn_type_find_member()
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug where drgn_error_member_not_found() ignores the member name length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:40:54 -08:00
Omar Sandoval
cf9a068820 libdrgn/python: fix reference counting on Type.members and Type.parameters
The TypeMember and TypeParameter instances referring to a libdrgn
drgn_lazy_type are only valid as long as the Type containing them is
still alive. Hold a reference on the containing Type from LazyType. We
can do this without growing LazyType by getting rid of the enum state
and using sentinel values for LazyType::lazy_type as the state.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:09:12 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.
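
The "shift the bytes down" step can be sketched as follows for the
little-endian case. The function value_bytes_from_buffer() is a
hypothetical illustration, not the libdrgn code, and it does not cover
big-endian buffers:

```python
def value_bytes_from_buffer(buf: bytes, bit_offset: int, bit_size: int) -> bytes:
    # Little-endian: bit 0 is the least significant bit of byte 0.
    raw = int.from_bytes(buf, "little")
    # Drop the leading padding bits, then keep only bit_size bits.
    value = (raw >> bit_offset) & ((1 << bit_size) - 1)
    # Store the result with no offset, in as few bytes as needed.
    return value.to_bytes((bit_size + 7) // 8, "little")


# A 5-bit field starting at bit 4 of a two-byte buffer.
stored = value_bytes_from_buffer(bytes([0b10110000, 0b00000001]), 4, 5)
```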

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
bce9ef5f8d libdrgn: linux kernel: remove THREAD_SIZE object finder
THREAD_SIZE is still broken and I haven't looked into the root cause
(see commit 95be142d17 ("tests: disable THREAD_SIZE test")). We don't
need it anymore anyways, so let's remove it entirely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-10 02:08:13 -08:00
Omar Sandoval
97fbedec1f libdrgn: return unavailable objects for DWARF objects without value or address
Now that we have the concept of unavailable objects, use it for DWARF
where appropriate.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 14:15:09 -08:00