Commit Graph

59 Commits

Author SHA1 Message Date
Omar Sandoval
e7367a4a94 libdrgn: Makefile: remove generated source files from CLEANFILES
We don't actually want make clean to remove the generated files that are
included in a distribution tarball, because then the user will need to
regenerate them, and they might not have the dependencies installed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:31:14 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
f285764f8a Include full libdrgn distribution in drgn sdist
Building drgn from an sdist currently requires autotools and gawk
because libdrgn in the sdist is more or less a git checkout. It's more
user-friendly to include the autotools output and generated code. Do
this by extending the sdist command to include a full libdrgn
distribution with `make distdir`.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-30 23:19:38 -07:00
Omar Sandoval
e5bc41f16c libdrgn: add latest elf.h and dwarf.h to support elfutils 0.165
The oldest LTS version of Ubuntu, 16.04, has elfutils 0.165. This
version is missing some ELF and DWARF definitions used by drgn. Add
copies of elf.h from glibc 2.33 and dwarf.h and elfutils/known-dwarf.h
from elfutils 0.183 to get the latest definitions and drop the minimum
required version of elfutils further to 0.165.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-21 23:18:39 -07:00
Omar Sandoval
4c5c5f3842 Remove bundled version of elfutils
We currently bundle a version of elfutils with patches to export
additional stack tracing functionality. This has a few drawbacks:

- Most of drgn's build time is actually building elfutils.
- Distributions don't like packages that bundle verions of other
  packages.
- elfutils, and thus drgn, can't be built with clang.

Now that we've replaced the elfutils DWARF unwinder with our own, we
don't need the patches, so we can drop the bundled elfutils and fix
these issues.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-16 00:52:09 -07:00
Omar Sandoval
fdaf7790a9 libdrgn: add DWARF call frame information parsing
In preparation for adding our own unwinder, add support for parsing and
finding DWARF/EH call frame information. Use a generic representation of
call frame information so that we can support other formats like ORC in
the future.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
0a6aaaae5d libdrgn: define structure for storing processor register values
libdwfl stores registers in an array of uint64_t indexed by the DWARF
register number. This is suboptimal for a couple of reasons:

1. Although the DWARF specification states that registers should be
   numbered for "optimal density", in practice this isn't the case. ABIs
   include unused ranges of numbers and don't order registers based on
   how likely they are to be known (e.g., caller-saved registers usually
   aren't recovered while unwinding the stack, but they are often
   numbered before callee-saved registers).
2. This precludes support for registers larger than 64 bits, like SSE
   registers.

For our own unwinder, we want to store registers in an
architecture-specific format to solve both of these problems.

So, have each architecture define its layout with registers arranged for
space efficiency and convenience when parsing saved registers from core
dumps. Instead of generating an arch_foo.c file from arch_foo.c.in,
separately define the logical register order in an arch_foo.defs file,
and use it to generate an arch_foo.inc file that is included from
arch_foo.c. The layout is defined as a macro in arch_foo.c. While we're
here, drop some register definitions that aren't useful at the moment.

Then, define struct drgn_register_state to efficiently store registers
in the defined format.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Kamalesh Babulal
221a218704 libdrgn: add powerpc stack trace support
Add powerpc specific register information required to retrive the
stack traces of the tasks on both live system and from the core dump.
It uses the existing DSL format to define platform registers and
helper functions to initial them. It also adds architecture specific
information to enable powerpc. Current support is for little-endian
powerpc only.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2021-01-29 11:31:59 -08:00
Omar Sandoval
b899a10836 Remove register numbers from API and add register aliases
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.

While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 17:47:45 -08:00
Omar Sandoval
46343ae08d libdrgn: get rid of struct drgn_stack_frame
In preparation for adding a "real", internal-only struct
drgn_stack_frame, replace the existing struct drgn_stack_frame with
explicit trace/frame arguments.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-27 11:22:34 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
5975d19580 libdrgn: report better errors when parsing DWARF/kmod index
If the DWARF index encounters any error while parsing, it returns an
error saying only "debug information is truncated", which makes it hard
to track down parsing errors. The kmod index parser silently swallows
errors. For both, replace the mread functions with a higher-level
binary_buffer interface that can include more information including the
location of the error. For example:

  /tmp/mybinary: .debug_info+0x4: expected at least 56 bytes, have 55

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-11-13 17:00:07 -08:00
Omar Sandoval
761da83ddd libdrgn: add {min,max}_iconst() and rewrite min() and max()
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use it in hash_table.h. While we're here, let's split these into their
own header file and document them better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-10 23:48:03 -07:00
Omar Sandoval
fa44171ba1 libdrgn: split bit operations into their own header
And improve their documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 17:44:15 -07:00
Omar Sandoval
cae79d2676 libdrgn: add preprocessor utility macros
These will be used in upcoming changes.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 16:36:59 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
fdbe336386 libdrgn: use -isystem for elfutils headers
The elfutils header files should be treated as if they were in the
standard location, so use -isystem instead of -I.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 15:45:10 -07:00
Omar Sandoval
3ac9ae357b libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info
The current name is too verbose. Let's go with a shorter, more generic
name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-11 17:41:23 -07:00
Omar Sandoval
7a85b4188e libdrgn: clean up read.h helpers and avoid undefined pointer behavior
There are a couple of related ways that we can cause undefined behavior
when parsing a malformed DWARF or depmod index file:

1. There are several places where we increment the cursor to skip past
   some data. It is undefined behavior if the result points out of
   bounds of the data, even if we don't attempt to dereference it.
2. read_in_bounds() checks that ptr <= end. This pointer comparison is
   only defined if ptr and end both point to elements of the same array
   object or one past the last element. If ptr has gone past end, then
   this comparison is likely undefined anyways.

Fix it by adding a helper to skip past data with bounds checking. Then,
all of the helpers can assume that ptr <= end and maintain that
invariant. while we're here and auditing all of the call sites, let's
clean up the API and rename it from read_foo() to the less generic
mread_foo().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
f1eaf5b14c libdrgn: add load_debug_info example program
Really it's more of a test program than an example program. It's useful
for benchmarking, testing with valgrind, etc. It's not built by default,
but it can be built manually with:

  $ make -C build/temp.* examples/load_debug_info

And run with:

  $ ./build/temp.*/examples/load_debug_info

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-10 16:18:58 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
bf545105c6 libdrgn: build in silent mode by default
The automake/libtool compilation output is obnoxiously verbose. Switch
on automake's silent mode, and make the custom rules honor it.
2020-05-10 00:12:50 -07:00
Omar Sandoval
f49d68d8f9 libdrgn: split generic utility functions out of internal.h
internal.h includes both drgn-specific helpers and generic utility
functions. Split the latter into their own util.h header and use it
instead of internal.h in the generic data structure code. This makes it
easier to copy the data structures into other projects/test programs.
2020-05-07 16:03:43 -07:00
Omar Sandoval
3d3c32f849 libdrgn/python: add Language to Python bindings 2020-02-26 19:55:42 -08:00
Omar Sandoval
9e2df9f217 libdrgn: put language definitions in one array
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
2020-02-26 19:55:42 -08:00
Omar Sandoval
376979d25a Remove stray reference to gen_docstrings.py 2020-02-25 13:58:10 -08:00
Omar Sandoval
80c9fb35ff Add type hint stubs and generate documentation from them
I've been wanting to add type hints for the _drgn C extension for
awhile. The main blocker was that there is a large overlap between the
documentation (in docs/api_reference.rst) and the stub file, and I
really didn't want to duplicate the information. Therefore, it was a
requirement that the the documentation could be generated from the stub
file, or vice versa. Unfortunately, none of the existing tools that I
could find supported this very well. So, I bit the bullet and wrote my
own Sphinx extension that uses the stub file as the source of truth (and
subsumes my old autopackage extension and gen_docstrings script).

The stub file is probably incomplete/inaccurate in places, but this
should be a good starting point to improve on.

Closes #22.
2020-02-25 13:39:06 -08:00
Omar Sandoval
d4cc7945af Support building with alternative OpenMP runtime libraries
At Facebook, we link OpenMP code with libomp instead of libgomp. We have
an internal patch to drgn to do this, as it can't be done by setting
CFLAGS/LDFLAGS. Let's add a way to specify the OpenMP library at
configure time so that we can drop the internal patch.
2020-01-24 10:22:38 -08:00
Omar Sandoval
e8d1ef82fa Make drgn.h depend on configure.ac
The previous commit forgot to add this dependency so that when the
version number is updated drgn.h actually gets regenerated.
2020-01-11 22:34:03 -08:00
Omar Sandoval
09a64f5cba Define version in libdrgn/configure.ac
Currently the drgn version number is defined in drgn.h.in, and configure
and setup.py both parse it out of there. However, now that we're
generating drgn.h anyways, it's easier to make configure.ac the source
of truth.
2020-01-11 10:05:57 -08:00
Omar Sandoval
c243daed59 Translate find_task() helper (and dependencies) to C
We'd like to be able to look up tasks by PID from libdrgn, but those
helpers are written in Python. Translate them to C and add some thin
bindings so we can use the same implementation from Python.
2019-10-28 13:08:57 -07:00
Omar Sandoval
d60c6a1d68 libdrgn: add register information to platform
In order to retrieve registers from stack traces, we need to know what
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.
2019-10-18 14:33:02 -07:00
Omar Sandoval
ca9cdc1991 libdrgn: autogenerate docstrings.h
I didn't want to use BUILT_SOURCES before because that would break make
$TARGET. But, now that doesn't work anyways because we're using SUBDIRS,
so we might as well use BUILT_SOURCES.
2019-09-19 11:08:04 -07:00
Omar Sandoval
aa4bfd646f libdrgn: simplify gen_constants.py header search
Instead of passing in a directory for header files, add -iquote for that
directory.
2019-09-19 11:08:04 -07:00
Omar Sandoval
6a13d74c0c libdrgn: build with bundled elfutils
Now that we have the bundled version of elfutils, build it from libdrgn
and link to it. We can also get rid of the elfutils version checks from
the libdrgn code.
2019-09-19 11:07:12 -07:00
Omar Sandoval
f11a8766bf setup.py: get list of source files from git
Currently, we have a special Makefile target to output the files for a
libdrgn source tarball, and we use that for setuptools. However, the
next change is going to import elfutils, and it'd be a pain to add the
same thing for the elfutils sources. Instead, let's just use git
ls-files for everything. The only difference is that source
distributions won't have the autoconf/automake output.
2019-09-03 17:19:02 -07:00
Omar Sandoval
62d98b3016 libdrgn: fold ELF relocation code into dwarf_index
I started with drgn_elf_relocator as a separate interface to parallelize
by relocation. However, the final result is parallelized by file, which
means that it can be done as part of the main read_cus() loop. Get rid
of the elf_relocator interface and do it in dwarf_index.c instead. This
means that if/when libdwfl gets faster at ELF relocations, we can rip
out the relocation code without any other changes.
2019-08-29 12:26:22 -07:00
Omar Sandoval
10142f922f Add basic stack trace support
For now, we only support stack traces for the Linux kernel (at least
v4.9) on x86-64, and we only support getting the program counter and
corresponding function symbol from each stack frame.
2019-08-02 00:26:28 -07:00
Serapheim Dimitropoulos
93d7ea9f01 Add support for kdump-compressed core dumps with libkdumpfile 2019-08-02 00:20:16 -07:00
Omar Sandoval
690b5fd650 libdrgn: generalize architecture to platform
For stack trace support, we'll need to have some architecture-specific
functionality. drgn's current notion of an architecture doesn't actually
include the instruction set architecture. This change expands it to a
"platform", which includes the ISA as well as the existing flags.
2019-08-02 00:11:56 -07:00
Omar Sandoval
71e6744210 libdrgn: add symbol table interface
Now that we're not overloading the name "symbol", we can define struct
drgn_symbol as a symbol table entry. For now, this is very minimal: it's
just a name, address, and size. We can then add a way to find the symbol
for a given address, drgn_program_find_symbol(). For now, this is only
supported through the actual ELF symbol tables. However, in the future,
we can probably support adding "symbol finders".
2019-07-30 09:25:34 -07:00
Omar Sandoval
0c5df56fba libdrgn: replace symbol index with object index
struct drgn_symbol doesn't really represent a symbol; it's just an
object which hasn't been fully initialized (see c2be52dff0 ("libdrgn:
rename object index to symbol index"), it used to be called a "partial
object"). For stack traces, we're going to have a notion of a symbol
that more closely represents an ELF symbol, so let's get rid of the
temporary struct drgn_symbol representation and just return an object
directly.
2019-07-29 17:04:47 -07:00
Omar Sandoval
1d4854a5bc libdrgn: implement optimized x86-64 ELF relocations
After the libdwfl conversion, we apply ELF relocations with libdwfl
instead of our homegrown implementation. However, libdwfl is much slower
at it than the previous implementation. We can work around this by
(again) applying ELF relocations ourselves for architectures that we
care about (x86-64, to start). For other architectures, we can fall back
to libdwfl.

This new implementation of ELF relocation reworks the parallelization to
be per-file rather than per-relocation. The latter was done originally
because before commit 6f16ab09d6 ("libdrgn: only apply ELF relocations
to relocatable files"), we applied relocations to vmlinux, which is much
larger than most kernel modules. Now that we don't do that, it seems to
be slightly faster to parallelize by file.
2019-07-15 12:27:48 -07:00
Omar Sandoval
8d52536271 libdrgn: add common vector implementation
drgn has enough open-coded dynamic arrays at this point to warrant a
common implementation. Add one inspired by hash_table.h. The API is
pretty minimal. I'll add more to it as the need arises.
2019-07-15 12:27:15 -07:00
Omar Sandoval
129f1493b8 libdrgn: split kernel-specific stuff out of program.c
Almost half of program.c is stuff specific to the Linux kernel, so let's
separate that out (and combine it with the existing kernel module code).
2019-07-08 16:53:58 -07:00
Omar Sandoval
10fb398338 libdrgn: add splay tree implementation
This will be used to track memory segments instead of the array we
currently use. The API is based on the hash table API; it can support
alternative implementations in the future, like red-black trees.
2019-05-24 17:48:08 -07:00
Omar Sandoval
f11e030aaa libdrgn: factor out kernel module iteration and section lookup 2019-05-13 16:39:30 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
565e0343ef libdrgn: make symbol index pluggable with callbacks
The last piece of making the major program components pluggable.
2019-05-06 14:55:34 -07:00