In Python, looking up a member in a drgn Type by name currently looks
something like:
member = [member for member in type.members if member.name == "foo"][0]
Add a Type.member(name) method, which is both easier and more efficient.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The TypeMember and TypeParameter instances referring to a libdrgn
drgn_lazy_type are only valid as long as the Type containing them is
still alive. Hold a reference on the containing Type from LazyType. We
can do this without growing LazyType by getting rid of the enum state
and using sentinel values for LazyType::lazy_type as the state.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:
1. When we store a buffer with a bit_offset, we're storing useless
padding bits.
2. bit_offset describes a location, or in other words, part of an
address. This makes sense for references, but not for values, which
are just a bag of bytes.
Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
THREAD_SIZE is still broken and I haven't looked into the root cause
(see commit 95be142d17 ("tests: disable THREAD_SIZE test")). We don't
need it anymore anyways, so let's remove it entirely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We currently build with CONFIG_MODULES=n for simplicity. However, this
means that we don't test kernel module support at all. Let's enable
module support. This requires changing how we distribute kernels. Now,
the /lib/modules/$(uname -r) directory (including the vmlinux and
vmlinuz) is bundled up as a tarball. We extract it, then mount it with
VirtFS, and do some extra setup for device nodes. (We lose the ability
to run kernel builds directly, but I've never actually used that
functionality.)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the
DWARF index, with each namespace being its own sub-index. We only index
the namespace itself when it is first accessed, which should help with
startup time and simplifies tracking.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
We currently handle DIEs with a DW_AT_specification attribute by parsing
the corresponding declaration to get the name and inserting the DIE as
usual. This has a couple of problems:
1. It only works if DW_AT_specification refers to the same compilation
unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not
DW_FORM_ref_addr. As a result, drgn doesn't support the latter.
2. It assumes that the DIE with DW_AT_specification is in the correct
"scope". Unfortunately, this is not true for g++: for a variable
definition in a C++ namespace, it generates a DIE with
DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE
which refers to the declaration with DW_AT_specification _outside_ of
the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE.
Supporting both of these cases requires reworking how we handle
DW_AT_specification. This commit takes an approach of parsing the DWARF
data in two passes: the first pass reads the abbrevation and file name
tables and builds a map of instances of DW_AT_specification; the second
pass indexes DIEs as before, but ignores DIEs with DW_AT_specification
and handles DIEs with DW_AT_declaration by looking them up in the map
built by the first pass.
This approach is a 10-20% regression in indexing time in the benchmarks
I ran. Thankfully, it is not 100% slower for a couple of reasons. The
first is that the two passes are simpler than the original combined
pass. The second is that a decent part of the indexing time is spent
faulting in the mapped debugging information, which only needs to happen
once (even if the file is cached, minor page faults add non-negligible
overhead).
This doesn't handle DW_AT_specification "chains" yet, but neither did
the original code. If it is necessary, it shouldn't be too difficult to
add.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Black was recently changed to treat a trailing comma as an indicator to
put each item/argument on its own line. We have a bunch of places where
something previously had to be split into multiple lines, then was
edited to fit on one line, but Black kept the trailing comma. Now this
update wants to unnecessarily split it back up. For now, let's get rid
of these commas. Hopefully in the future Black has a way to opt out of
this.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.
Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.
So, let's reimagine types as being owned by a program. This has a few
parts:
1. In libdrgn, simple types are now created by factory functions,
drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
Program class.
5. While we're changing the API, the parameters to pointer_type() and
array_type() are reordered to be more logical (and to allow
pointer_type() to take a default size of None for the program's
default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
argument only.
A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Compile-time constants have DW_AT_const_value instead of DW_AT_location.
We can translate those to a value object.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
GCC 10 doesn't generate a DIE for union thread_union, which breaks our
THREAD_SIZE object finder. The previous change removed our internal
dependency on THREAD_SIZE, so disable this test while I investigate why
GCC changed.
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so
it's currently not possible to read from physical memory on old kernels.
However, if we can figure out the address of the direct mapping, then we
can determine the corresponding physical addresses for the segments and
add them.
I've found that I do this manually a lot (e.g., when digging through a
task's stack). Add shortcuts for reading unsigned integers and a note
for how to manually read other formats.
The current implementation of vmtest has a few issues:
1. Building drgn for each kernel version on Travis is slow, mostly
because they don't all run in parallel.
2. For local, incremental testing, recreating the filesystem image and
rebuilding drgn is slow, and syncing the code to the filesystem image
is brittle.
3. The filesystem image is the only communication channel, and reading
the exit status from the filesystem image is awkward and fragile.
4. Creating and accessing the filesystem image requires root.
This reworks vmtest to use the build on the host via VirtFS with a
simple agent on the guest that can execute arbitrary commands and return
the exit status. This has a few more moving parts but is faster and
saner overall.
Add an verrevcmp() function based on the coreutils implementation (which
comes from gnulib, which is derived from the implementation in dpkg).
This will be used by vmtest.
We need to keep the Program alive for its types to stay valid, not just
the objects the Program has pinned. (I have no idea why I changed this
in commit 565e0343ef ("libdrgn: make symbol index pluggable with
callbacks").)
For operations where we don't have a type available, we currently fall
back to C. Instead, we should guess the language of the program and use
that as the default. The heurisitic implemented here gets the language
of the CU containing "main" (except for the Linux kernel, which is
always C). In the future, we should allow manually overriding the
automatically determined language.
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
While we're here, make generate_dwarf_constants.py use the bundled
dwarf.h, generate code that black is happy with, and use the keyword
list from the standard library.
UTS_RELEASE is currently only accessible once debug info is loaded with
prog.load_debug_info(main=True). This makes it difficult to get the
release, find the appropriate vmlinux, then load the found vmlinux. We
can add vmcoreinfo_object_find as part of set_core_dump(), which makes
it possible to do the following:
prog = drgn.Program()
prog.set_core_dump(core_dump_path)
release = prog['UTS_RELEASE'].string_()
vmlinux_path = find_vmlinux(release)
prog.load_debug_info([vmlinux_path])
The only downside is that this ends up using the default definition of
char rather than what we would get from the debug info, but that
shouldn't be a big problem.
The osrelease is accessible via init_uts_ns.name.release, but we can
also get it straight out of vmcoreinfo, which will be useful for the
next change. UTS_RELEASE is the name of the macro defined in the kernel.
This continues the conversion from the last commit. Members and
parameters are basically the same, so we can do them together. Unlike
enumerators, these don't make sense to unpack or access as sequences.
Currently, type members, enumerators, and parameters are all represented
by tuples in the Python bindings. This is awkward to document and
implement. Instead, let's replace these tuples with proper types,
starting with the easiest one, TypeEnumerator. This one still makes
sense to treat as a sequence so that it can be unpacked as (name,
value).
Currently, we print:
>>> prog.symbol(prog['init_task'])
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: cannot convert 'struct task_struct' to index
It's not obvious what it means to convert to an index. Instead, let's
use the error message raised by operator.index():
TypeError: 'struct task_struct' object cannot be interpreted as an integer