Commit Graph

42 Commits

Author SHA1 Message Date
Omar Sandoval
dd08658a6e libdrgn: don't cache ORC sections in struct drgn_elf_file
.orc_unwind_ip and .orc_unwind are only referenced while initially
parsing ORC data and then never touched again, so it's wasteful to cache
them in struct drgn_elf_file. Look them up if and when we parse the ORC
data instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-22 15:27:39 -07:00
Omar Sandoval
2ee625fc74 libdrgn: handle DWARF sections exactly* like libdw
We only support .debug_* sections, but libdw also supports .zdebug_*,
.debug_*.dwo, and .gnu.debuglto_.debug_*. Mimic how libdw chooses debug
sections, with one exception: .debug_cu_index and .debug_tu_index (used
for DWP, which we don't support yet but will) should be considered DWO
sections (this needs to be fixed in libdw, too).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-06-20 13:45:04 -07:00
Omar Sandoval
02e344a7dd libdrgn: use strswitch for ELF section names
Move the definitions of the section names to a Python script,
gen_elf_sections.py, and use that to generate the enum definitions and a
lookup function. This is preparation for checking for section names with
the .dwo suffix in the future.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-02-08 13:25:22 -08:00
Omar Sandoval
6486073148 libdrgn: python: fix Py_BuildValue() type in gen_constants.py
We're calling Py_BuildValue() with the "k" format for unsigned long but
passing the enum value itself, which is promoted to int. I don't know
whether there are any ABIs where this matters in practice, but let's use
"K" and cast to unsigned long long explicitly to be safe.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-12-07 16:46:33 -08:00
Omar Sandoval
94e1407a5f libdrgn: python: don't repeat class names in gen_constants.py
Instead, define the list of constant classes in one place so we can
generate all 3 places that need it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-12-07 15:41:49 -08:00
Omar Sandoval
222680b47a Add StackFrame.sp
We have some generic helpers that we'd like to add (for example, #210)
that need to know the stack pointer of a frame. These shouldn't need to
hard-code register names for different architectures. Add a generic
shortcut, StackFrame.sp.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-22 18:47:16 -08:00
Omar Sandoval
87b7292aa5 Relicense drgn from GPLv3+ to LGPLv2.1+
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.

All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-01 17:05:16 -07:00
Omar Sandoval
33d14f7703 libdrgn: rework architecture definition files
Currently, register definitions are split across two files:
arch_foo.defs lists the names of registers, and arch_foo.c defines the
layout used to store registers in memory. The main rationale for this
was that the layout could be processed entirely by the C preprocessor,
but the register names needed an AWK script that we wanted to keep
minimal. But since commit af6f5a887d ("libdrgn: replace gen_arch.awk
with gen_arch_inc_strswitch.py"), arch_foo.defs is processed by a Python
script.

Let's define both the register names and the register layout in a new
file, arch_foo_defs.py, which is processed by gen_arch_inc_strswitch.py
This has a few benefits:

* It puts all of the register definitions for an architecture in one
  place.
* It is easier to maintain than preprocessor magic. (It also makes it
  trivial to support registers that don't exist in DWARF, which would've
  been harder to do with our preprocessor code.)
* It gets rid of our DSL in favor of Python (which also lets us reduce
  repetition for the ppc64 definitions).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-06-25 22:39:26 -07:00
Omar Sandoval
a3b72e33c8 Fix some more flake8 errors
Several have snuck in since the last time I did this in commit
5541fad063 ("Fix some flake8 errors"). Prepare for adding flake8 to
pre-commit by fixing them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-05-17 15:23:42 -07:00
Omar Sandoval
af6f5a887d libdrgn: replace gen_arch.awk with gen_arch_inc_strswitch.py
Now that we have gen_strswitch.py, there's no reason to keep this AWK
script around. Replace it with a Python script that outputs a strswitch
file. This also gets rid of our gawk dependency.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-03-02 16:10:43 -08:00
Omar Sandoval
95dff5b755 libdrgn: split some helpers out of gen_strswitch.py
These will be used by other code generation scripts.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-03-02 16:00:39 -08:00
prozak
e8dada0ec1 Enable prog.type to work with classes
Also added test for CPP class type

This is a prerequisite to #83

Signed-off-by: mykolal <nickolay.lysenko@gmail.com>
2022-02-22 14:55:23 -08:00
Omar Sandoval
c49bba41b2 libdrgn: language_c: replace c_keywords with memswitch
GCC or binutils on Fedora Rawhide for ARM seems to have a bug where
c_keywords gets placed in the .data.rel.ro section (see
https://www.airs.com/blog/archives/189):

$ readelf -s .libs/libdrgnimpl_la-language_c.o | grep -w c_keywords
   475: 00000000    16 OBJECT  LOCAL  DEFAULT  175 c_keywords
$ readelf -S .libs/libdrgnimpl_la-language_c.o | grep -F '[175]'
  [175] .data.rel         PROGBITS        00000000 051f90 000010 00  WA  0   0  4
$ readelf -s .libs/_drgn.so | grep -w c_keywords
  9267: 0008e84c    16 OBJECT  LOCAL  DEFAULT   21 c_keywords.lto_priv.0
$ readelf -S .libs/_drgn.so | grep -F '[21]'
  [21] .data.rel.ro      PROGBITS        0008e018 07e018 000a10 00  WA  0   0  8

This results in a crash on startup when c_keywords_init() attempts to
populate c_keywords.

While this appears to be a compiler or linker bug, I've been meaning to
replace c_keywords with a static lookup function anyways. Now that we
have gen_strswitch.py, we can use it to generate the lookup function.
Add a script, gen_c_keywords_inc_strswitch.py, which generates an array
mapping token kind to spelling, and a memswitch mapping spelling to
token kind.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-04 20:26:35 -08:00
Omar Sandoval
da01da3a2d Add gen_strswitch.py
We have multiple places where we match an input string against several
cases:

* drgn_lexer_c() checks C identifiers against a runtime hash table of C
  keywords.
* linux_kernel_object_find() has an if-else ladder of checks for object
  names.
* drgn_debug_info_find_sections() loops over an array of ELF section
  names to look for sections we need.
* libdrgn/build-aux/gen_arch.awk generates a compile-time trie using
  nested switch statements to match register names.

This commit adds a script, gen_strswitch.py, that can hopefully be used
to replace all of these. gen_strswitch.py generalizes the compile-time
trie idea from gen_arch.awk in a few ways:

* It has syntax and semantics based on C switch statements.
* It supports both null-terminated strings and strings with an explicit
  length.
* It compresses unique substrings to calls to strcmp(), strncmp(), or
  memcmp() when appropriate.

In benchmarks, this approach is more performant than the above options
as well as a candidate based on gperf, while resulting in machine code
around the same size as the straightforward if-else ladder approach.

Future commits will convert the use cases above to use this script.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-04 20:26:35 -08:00
Stephen Brennan
52b96aed88 Run pre-commit on all files
`pre-commit run --all-files` results in the following minor
updates, which appear to be caused by my own failure to run linters.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
2022-01-14 13:31:16 -08:00
Omar Sandoval
c0d8709b45 Update copyright headers to Meta
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-21 15:59:44 -08:00
Omar Sandoval
d1745755f1 Fix some include-what-you-use warnings
Also:

* Rename struct string to struct nstring and move it to its own header.
* Fix scripts/iwyu.py, which was broken by commit 5541fad063 ("Fix
  some flake8 errors").
* Add workarounds for a few outstanding include-what-you-use issues.

There is still a false positive for
include-what-you-use/include-what-you-use#970, but hopefully that is
fixed soon.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-10 15:09:29 -08:00
Stephen Brennan
1744d8d93c libdrgn: python: Add binding, kind to drgn.Symbol
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
2021-08-20 18:16:57 -07:00
Omar Sandoval
8d383fb89a libdrgn: fix alphabetization in gen_constants.py
PlatformFlags obviously comes before PrimitiveType.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-20 15:02:31 -07:00
Omar Sandoval
5541fad063 Fix some flake8 errors
Mainly unused imports, unused variables, unnecessary f-strings, and
regex literals missing the r prefix. I'm not adding it to the CI linter
because it's too noisy, though.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-11 14:52:44 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
eec67768aa libdrgn: replace elfutils DWARF unwinder with our own
The elfutils DWARF unwinder has a couple of limitations:

1. libdwfl doesn't have an interface for getting register values, so we
   have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
   error even on success, so we have to squirrel away our own errors in
   the callback.

Furthermore, there are a couple of things that will be easier with our
own unwinder:

1. Integrating unwinding using ORC will be easier when we're handling
   unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
   expression evaluation.

Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:43:12 -07:00
Omar Sandoval
0a6aaaae5d libdrgn: define structure for storing processor register values
libdwfl stores registers in an array of uint64_t indexed by the DWARF
register number. This is suboptimal for a couple of reasons:

1. Although the DWARF specification states that registers should be
   numbered for "optimal density", in practice this isn't the case. ABIs
   include unused ranges of numbers and don't order registers based on
   how likely they are to be known (e.g., caller-saved registers usually
   aren't recovered while unwinding the stack, but they are often
   numbered before callee-saved registers).
2. This precludes support for registers larger than 64 bits, like SSE
   registers.

For our own unwinder, we want to store registers in an
architecture-specific format to solve both of these problems.

So, have each architecture define its layout with registers arranged for
space efficiency and convenience when parsing saved registers from core
dumps. Instead of generating an arch_foo.c file from arch_foo.c.in,
separately define the logical register order in an arch_foo.defs file,
and use it to generate an arch_foo.inc file that is included from
arch_foo.c. The layout is defined as a macro in arch_foo.c. While we're
here, drop some register definitions that aren't useful at the moment.

Then, define struct drgn_register_state to efficiently store registers
in the defined format.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-15 16:36:38 -07:00
Omar Sandoval
b899a10836 Remove register numbers from API and add register aliases
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.

While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-28 17:47:45 -08:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
80c9fb35ff Add type hint stubs and generate documentation from them
I've been wanting to add type hints for the _drgn C extension for
awhile. The main blocker was that there is a large overlap between the
documentation (in docs/api_reference.rst) and the stub file, and I
really didn't want to duplicate the information. Therefore, it was a
requirement that the the documentation could be generated from the stub
file, or vice versa. Unfortunately, none of the existing tools that I
could find supported this very well. So, I bit the bullet and wrote my
own Sphinx extension that uses the stub file as the source of truth (and
subsumes my old autopackage extension and gen_docstrings script).

The stub file is probably incomplete/inaccurate in places, but this
should be a good starting point to improve on.

Closes #22.
2020-02-25 13:39:06 -08:00
Omar Sandoval
660276a0b8 Format Python code with Black
I'm not a fan of 100% of the Black coding style, but I've spent too much
time manually formatting Python code, so let's just pull the trigger.
2020-01-14 11:51:58 -08:00
Omar Sandoval
09a64f5cba Define version in libdrgn/configure.ac
Currently the drgn version number is defined in drgn.h.in, and configure
and setup.py both parse it out of there. However, now that we're
generating drgn.h anyways, it's easier to make configure.ac the source
of truth.
2020-01-11 10:05:57 -08:00
Omar Sandoval
d60c6a1d68 libdrgn: add register information to platform
In order to retrieve registers from stack traces, we need to know what
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.
2019-10-18 14:33:02 -07:00
Omar Sandoval
ca9cdc1991 libdrgn: autogenerate docstrings.h
I didn't want to use BUILT_SOURCES before because that would break make
$TARGET. But, now that doesn't work anyways because we're using SUBDIRS,
so we might as well use BUILT_SOURCES.
2019-09-19 11:08:04 -07:00
Omar Sandoval
aa4bfd646f libdrgn: simplify gen_constants.py header search
Instead of passing in a directory for header files, add -iquote for that
directory.
2019-09-19 11:08:04 -07:00
Omar Sandoval
690b5fd650 libdrgn: generalize architecture to platform
For stack trace support, we'll need to have some architecture-specific
functionality. drgn's current notion of an architecture doesn't actually
include the instruction set architecture. This change expands it to a
"platform", which includes the ISA as well as the existing flags.
2019-08-02 00:11:56 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
47151c4554 libdrgn/python: add Program.object() 2019-05-06 14:55:34 -07:00
Omar Sandoval
0ea2825817 libdrgn: refactor gen_constants.py
All of the constants are generated with basically the same code, so
refactor it.
2019-05-06 14:55:34 -07:00
Omar Sandoval
932b7857b5 libdrgn: expose primitive type concept to public interface
Previously known as c_type.
2019-05-06 14:55:34 -07:00
Omar Sandoval
bad1b9b33c libdrgn/python: use generated docstrings for generated types
I missed ProgramFlags, Qualifiers, and TypeKind.
2019-05-06 14:55:34 -07:00
Omar Sandoval
435640faf6 Fix some linter errors 2019-04-11 15:51:20 -07:00
Omar Sandoval
393a1f3149 Document with Sphinx
drgn has pretty thorough in-program documentation, but it doesn't have a
nice overview or introduction to the basic concepts. This commit adds
that using Sphinx. In order to avoid documenting everything in two
places, the libdrgn bindings have their docstrings generated from the
API documentation. The alternative would be to use Sphinx's autodoc
extension, but that's not as flexible and would also require building
the extension to build the docs. The documentation for the helpers is
generated using autodoc and a small custom extension.
2019-04-11 12:48:15 -07:00
Omar Sandoval
687ea74ff2 Build Python bindings with automake
I went back and forth on using setuptools or autotools for the Python
extension, but I eventually settled on using only setuptools after
fighting to get the two to integrate well. However, setuptools is kind
of crappy; for one, it rebuilds every source file when rebuilding the
extension, which is really annoying for development. automake is a
better designed build system overall, so let's use that for the
extension. We override the build_ext command to build using autotools
and copy things where setuptools expects them.
2019-04-03 17:00:53 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00