JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-22 17:23:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	dd08658a6e	libdrgn: don't cache ORC sections in struct drgn_elf_file .orc_unwind_ip and .orc_unwind are only referenced while initially parsing ORC data and then never touched again, so it's wasteful to cache them in struct drgn_elf_file. Look them up if and when we parse the ORC data instead. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-06-22 15:27:39 -07:00
Omar Sandoval	2ee625fc74	libdrgn: handle DWARF sections exactly* like libdw We only support .debug_* sections, but libdw also supports .zdebug_*, .debug_*.dwo, and .gnu.debuglto_.debug_*. Mimic how libdw chooses debug sections, with one exception: .debug_cu_index and .debug_tu_index (used for DWP, which we don't support yet but will) should be considered DWO sections (this needs to be fixed in libdw, too). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-06-20 13:45:04 -07:00
Omar Sandoval	02e344a7dd	libdrgn: use strswitch for ELF section names Move the definitions of the section names to a Python script, gen_elf_sections.py, and use that to generate the enum definitions and a lookup function. This is preparation for checking for section names with the .dwo suffix in the future. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-02-08 13:25:22 -08:00
Omar Sandoval	6486073148	libdrgn: python: fix Py_BuildValue() type in gen_constants.py We're calling Py_BuildValue() with the "k" format for unsigned long but passing the enum value itself, which is promoted to int. I don't know whether there are any ABIs where this matters in practice, but let's use "K" and cast to unsigned long long explicitly to be safe. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-12-07 16:46:33 -08:00
Omar Sandoval	94e1407a5f	libdrgn: python: don't repeat class names in gen_constants.py Instead, define the list of constant classes in one place so we can generate all 3 places that need it. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-12-07 15:41:49 -08:00
Omar Sandoval	222680b47a	Add StackFrame.sp We have some generic helpers that we'd like to add (for example, #210) that need to know the stack pointer of a frame. These shouldn't need to hard-code register names for different architectures. Add a generic shortcut, StackFrame.sp. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-22 18:47:16 -08:00
Omar Sandoval	87b7292aa5	Relicense drgn from GPLv3+ to LGPLv2.1+ drgn is currently licensed as GPLv3+. Part of the long term vision for drgn is that other projects can use it as a library providing programmatic interfaces for debugger functionality. A more permissive license is better suited to this goal. We decided on LGPLv2.1+ as a good balance between software freedom and permissiveness. All contributors not employed by Meta were contacted via email and consented to the license change. The only exception was the author of commit `c4fbf7e589` ("libdrgn: fix for compilation error"), who did not respond. That commit reverted a single line of code to one originally written by me in commit `640b1c011d` ("libdrgn: embed DWARF index in DWARF info cache"). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-01 17:05:16 -07:00
Omar Sandoval	33d14f7703	libdrgn: rework architecture definition files Currently, register definitions are split across two files: arch_foo.defs lists the names of registers, and arch_foo.c defines the layout used to store registers in memory. The main rationale for this was that the layout could be processed entirely by the C preprocessor, but the register names needed an AWK script that we wanted to keep minimal. But since commit `af6f5a887d` ("libdrgn: replace gen_arch.awk with gen_arch_inc_strswitch.py"), arch_foo.defs is processed by a Python script. Let's define both the register names and the register layout in a new file, arch_foo_defs.py, which is processed by gen_arch_inc_strswitch.py. This has a few benefits: * It puts all of the register definitions for an architecture in one place. * It is easier to maintain than preprocessor magic. (It also makes it trivial to support registers that don't exist in DWARF, which would've been harder to do with our preprocessor code.) * It gets rid of our DSL in favor of Python (which also lets us reduce repetition for the ppc64 definitions). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-06-25 22:39:26 -07:00
Omar Sandoval	a3b72e33c8	Fix some more flake8 errors Several have snuck in since the last time I did this in commit `5541fad063` ("Fix some flake8 errors"). Prepare for adding flake8 to pre-commit by fixing them. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-05-17 15:23:42 -07:00
Omar Sandoval	af6f5a887d	libdrgn: replace gen_arch.awk with gen_arch_inc_strswitch.py Now that we have gen_strswitch.py, there's no reason to keep this AWK script around. Replace it with a Python script that outputs a strswitch file. This also gets rid of our gawk dependency. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-03-02 16:10:43 -08:00
Omar Sandoval	95dff5b755	libdrgn: split some helpers out of gen_strswitch.py These will be used by other code generation scripts. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-03-02 16:00:39 -08:00
prozak	e8dada0ec1	Enable prog.type to work with classes Also added test for CPP class type This is a prerequisite to #83 Signed-off-by: mykolal <nickolay.lysenko@gmail.com>	2022-02-22 14:55:23 -08:00
Omar Sandoval	c49bba41b2	libdrgn: language_c: replace c_keywords with memswitch GCC or binutils on Fedora Rawhide for ARM seems to have a bug where c_keywords gets placed in the .data.rel.ro section (see https://www.airs.com/blog/archives/189): $ readelf -s .libs/libdrgnimpl_la-language_c.o | grep -w c_keywords 475: 00000000 16 OBJECT LOCAL DEFAULT 175 c_keywords $ readelf -S .libs/libdrgnimpl_la-language_c.o | grep -F '[175]' [175] .data.rel PROGBITS 00000000 051f90 000010 00 WA 0 0 4 $ readelf -s .libs/_drgn.so | grep -w c_keywords 9267: 0008e84c 16 OBJECT LOCAL DEFAULT 21 c_keywords.lto_priv.0 $ readelf -S .libs/_drgn.so | grep -F '[21]' [21] .data.rel.ro PROGBITS 0008e018 07e018 000a10 00 WA 0 0 8 This results in a crash on startup when c_keywords_init() attempts to populate c_keywords. While this appears to be a compiler or linker bug, I've been meaning to replace c_keywords with a static lookup function anyways. Now that we have gen_strswitch.py, we can use it to generate the lookup function. Add a script, gen_c_keywords_inc_strswitch.py, which generates an array mapping token kind to spelling, and a memswitch mapping spelling to token kind. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-04 20:26:35 -08:00
Omar Sandoval	da01da3a2d	Add gen_strswitch.py We have multiple places where we match an input string against several cases: * drgn_lexer_c() checks C identifiers against a runtime hash table of C keywords. * linux_kernel_object_find() has an if-else ladder of checks for object names. * drgn_debug_info_find_sections() loops over an array of ELF section names to look for sections we need. * libdrgn/build-aux/gen_arch.awk generates a compile-time trie using nested switch statements to match register names. This commit adds a script, gen_strswitch.py, that can hopefully be used to replace all of these. gen_strswitch.py generalizes the compile-time trie idea from gen_arch.awk in a few ways: * It has syntax and semantics based on C switch statements. * It supports both null-terminated strings and strings with an explicit length. * It compresses unique substrings to calls to strcmp(), strncmp(), or memcmp() when appropriate. In benchmarks, this approach is more performant than the above options as well as a candidate based on gperf, while resulting in machine code around the same size as the straightforward if-else ladder approach. Future commits will convert the use cases above to use this script. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-02-04 20:26:35 -08:00
Stephen Brennan	52b96aed88	Run pre-commit on all files `pre-commit run --all-files` results in the following minor updates, which appear to be caused by my own failure to run linters. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>	2022-01-14 13:31:16 -08:00
Omar Sandoval	c0d8709b45	Update copyright headers to Meta Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-21 15:59:44 -08:00
Omar Sandoval	d1745755f1	Fix some include-what-you-use warnings Also: * Rename struct string to struct nstring and move it to its own header. * Fix scripts/iwyu.py, which was broken by commit `5541fad063` ("Fix some flake8 errors"). * Add workarounds for a few outstanding include-what-you-use issues. There is still a false positive for include-what-you-use/include-what-you-use#970, but hopefully that is fixed soon. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-10 15:09:29 -08:00
Stephen Brennan	1744d8d93c	libdrgn: python: Add binding, kind to drgn.Symbol Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>	2021-08-20 18:16:57 -07:00
Omar Sandoval	8d383fb89a	libdrgn: fix alphabetization in gen_constants.py PlatformFlags obviously comes before PrimitiveType. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-08-20 15:02:31 -07:00
Omar Sandoval	5541fad063	Fix some flake8 errors Mainly unused imports, unused variables, unnecessary f-strings, and regex literals missing the r prefix. I'm not adding it to the CI linter because it's too noisy, though. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-08-11 14:52:44 -07:00
Omar Sandoval	a4b9d68a8c	Use GPL-3.0-or-later license identifier instead of GPL-3.0+ Apparently the latter is deprecated and the former is preferred. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-04-03 01:10:35 -07:00
Omar Sandoval	eec67768aa	libdrgn: replace elfutils DWARF unwinder with our own The elfutils DWARF unwinder has a couple of limitations: 1. libdwfl doesn't have an interface for getting register values, so we have to bundle a patched version of elfutils with drgn. 2. Error handling is very awkward: dwfl_getthread_frames() can return an error even on success, so we have to squirrel away our own errors in the callback. Furthermore, there are a couple of things that will be easier with our own unwinder: 1. Integrating unwinding using ORC will be easier when we're handling unwinding ourselves. 2. Support for local variables isn't too far away now that we have DWARF expression evaluation. Now that we have the register state, CFI, and DWARF expression pieces in place, stitch them together with the new unwinder, and tweak the public API a bit to reflect it. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-15 16:43:12 -07:00
Omar Sandoval	0a6aaaae5d	libdrgn: define structure for storing processor register values libdwfl stores registers in an array of uint64_t indexed by the DWARF register number. This is suboptimal for a couple of reasons: 1. Although the DWARF specification states that registers should be numbered for "optimal density", in practice this isn't the case. ABIs include unused ranges of numbers and don't order registers based on how likely they are to be known (e.g., caller-saved registers usually aren't recovered while unwinding the stack, but they are often numbered before callee-saved registers). 2. This precludes support for registers larger than 64 bits, like SSE registers. For our own unwinder, we want to store registers in an architecture-specific format to solve both of these problems. So, have each architecture define its layout with registers arranged for space efficiency and convenience when parsing saved registers from core dumps. Instead of generating an arch_foo.c file from arch_foo.c.in, separately define the logical register order in an arch_foo.defs file, and use it to generate an arch_foo.inc file that is included from arch_foo.c. The layout is defined as a macro in arch_foo.c. While we're here, drop some register definitions that aren't useful at the moment. Then, define struct drgn_register_state to efficiently store registers in the defined format. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-03-15 16:36:38 -07:00
Omar Sandoval	b899a10836	Remove register numbers from API and add register aliases enum drgn_register_number in the public libdrgn API and drgn.Register.number in the Python bindings are basically exports of DWARF register numbers. They only exist as a way to identify registers that's lighter weight than string lookups. libdrgn already has struct drgn_register, so we can use that to identify registers in the public API and remove enum drgn_register_number. This has a couple of benefits: we don't depend on DWARF numbering in our API, and we don't have to generate drgn.h from the architecture files. The Python bindings can just use string names for now. If it seems useful, StackFrame.register() can take a Register in the future, we'll just need to be careful to not allow Registers from the wrong platform. While we're changing the API anyways, also change it so that registers have a list of names instead of one name. This isn't needed for x86-64 at the moment, but will be for architectures that have multiple names for the same register (like ARM). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-01-28 17:47:45 -08:00
Omar Sandoval	2fc514f2a4	libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers] I originally did it this way because pydoc doesn't handle non-trivial defaults in signature very well (see commit `67a16a09b8` ("tests: test that Python documentation renders")). drgndoc doesn't generate signature for pydoc anymore, though, so we don't need to worry about it and can clean up the typing. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:29 -07:00
Omar Sandoval	8b264f8823	Update copyright headers to Facebook and add missing headers drgn was originally my side project, but for awhile now it's also been my work project. Update the copyright headers to reflect this, and add a copyright header to various files that were missing it.	2020-05-15 15:13:02 -07:00
Omar Sandoval	80c9fb35ff	Add type hint stubs and generate documentation from them I've been wanting to add type hints for the _drgn C extension for awhile. The main blocker was that there is a large overlap between the documentation (in docs/api_reference.rst) and the stub file, and I really didn't want to duplicate the information. Therefore, it was a requirement that the documentation could be generated from the stub file, or vice versa. Unfortunately, none of the existing tools that I could find supported this very well. So, I bit the bullet and wrote my own Sphinx extension that uses the stub file as the source of truth (and subsumes my old autopackage extension and gen_docstrings script). The stub file is probably incomplete/inaccurate in places, but this should be a good starting point to improve on. Closes #22.	2020-02-25 13:39:06 -08:00
Omar Sandoval	660276a0b8	Format Python code with Black I'm not a fan of 100% of the Black coding style, but I've spent too much time manually formatting Python code, so let's just pull the trigger.	2020-01-14 11:51:58 -08:00
Omar Sandoval	09a64f5cba	Define version in libdrgn/configure.ac Currently the drgn version number is defined in drgn.h.in, and configure and setup.py both parse it out of there. However, now that we're generating drgn.h anyways, it's easier to make configure.ac the source of truth.	2020-01-11 10:05:57 -08:00
Omar Sandoval	d60c6a1d68	libdrgn: add register information to platform In order to retrieve registers from stack traces, we need to know what registers are defined for a platform. This adds a small DSL for defining registers for an architecture. The DSL is parsed by an awk script that generates the necessary tables, lookup functions, and enum definitions.	2019-10-18 14:33:02 -07:00
Omar Sandoval	ca9cdc1991	libdrgn: autogenerate docstrings.h I didn't want to use BUILT_SOURCES before because that would break make $TARGET. But, now that doesn't work anyways because we're using SUBDIRS, so we might as well use BUILT_SOURCES.	2019-09-19 11:08:04 -07:00
Omar Sandoval	aa4bfd646f	libdrgn: simplify gen_constants.py header search Instead of passing in a directory for header files, add -iquote for that directory.	2019-09-19 11:08:04 -07:00
Omar Sandoval	690b5fd650	libdrgn: generalize architecture to platform For stack trace support, we'll need to have some architecture-specific functionality. drgn's current notion of an architecture doesn't actually include the instruction set architecture. This change expands it to a "platform", which includes the ISA as well as the existing flags.	2019-08-02 00:11:56 -07:00
Omar Sandoval	baba1ff3f0	libdrgn: make program components pluggable Currently, programs can be created for three main use-cases: core dumps, the running kernel, and a running process. However, internally, the program memory, types, and symbols are pluggable. Expose that as a callback API, which makes it possible to use drgn in much more creative ways.	2019-05-10 12:41:07 -07:00
Omar Sandoval	47151c4554	libdrgn/python: add Program.object()	2019-05-06 14:55:34 -07:00
Omar Sandoval	0ea2825817	libdrgn: refactor gen_constants.py All of the constants are generated with basically the same code, so refactor it.	2019-05-06 14:55:34 -07:00
Omar Sandoval	932b7857b5	libdrgn: expose primitive type concept to public interface Previously known as c_type.	2019-05-06 14:55:34 -07:00
Omar Sandoval	bad1b9b33c	libdrgn/python: use generated docstrings for generated types I missed ProgramFlags, Qualifiers, and TypeKind.	2019-05-06 14:55:34 -07:00
Omar Sandoval	435640faf6	Fix some linter errors	2019-04-11 15:51:20 -07:00
Omar Sandoval	393a1f3149	Document with Sphinx drgn has pretty thorough in-program documentation, but it doesn't have a nice overview or introduction to the basic concepts. This commit adds that using Sphinx. In order to avoid documenting everything in two places, the libdrgn bindings have their docstrings generated from the API documentation. The alternative would be to use Sphinx's autodoc extension, but that's not as flexible and would also require building the extension to build the docs. The documentation for the helpers is generated using autodoc and a small custom extension.	2019-04-11 12:48:15 -07:00
Omar Sandoval	687ea74ff2	Build Python bindings with automake I went back and forth on using setuptools or autotools for the Python extension, but I eventually settled on using only setuptools after fighting to get the two to integrate well. However, setuptools is kind of crappy; for one, it rebuilds every source file when rebuilding the extension, which is really annoying for development. automake is a better designed build system overall, so let's use that for the extension. We override the build_ext command to build using autotools and copy things where setuptools expects them.	2019-04-03 17:00:53 -07:00
Omar Sandoval	75c3679147	Rewrite drgn core in C The current mixed Python/C implementation works well, but it has a couple of important limitations: - It's too slow for some common use cases, like iterating over large data structures. - It can't be reused in utilities written in other languages. This replaces the internals with a new library written in C, libdrgn. It includes Python bindings with mostly the same public interface as before, with some important improvements: - Types are now represented by a single Type class rather than the messy polymorphism in the Python implementation. - Qualifiers are a bitmask instead of a set of strings. - Bit fields are not considered a separate type. - The lvalue/rvalue terminology is replaced with reference/value. - Structure, union, and array values are better supported. - Function objects are supported. - Program distinguishes between lookups of variables, constants, and functions. The C rewrite is about 6x as fast as the original Python when using the Python bindings, and about 8x when using the C API directly. Currently, the exposed API in C is fairly conservative. In the future, the memory reader, type index, and object index APIs will probably be exposed for more flexibility.	2019-04-02 14:12:07 -07:00

42 Commits