JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-24 02:03:05 +00:00

Author	SHA1	Message	Date
Omar Sandoval	89b5da2abb	libdrgn: dwarf_index: free namespaces when rolling back Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	e69d0c0064	libdrgn: dwarf_index: fix use after free of pending CU If we create a pending CU for a namespace, then add more CUs to the index, the CU might get reallocated, resulting in a use after free. Fix it by storing the index of the CU instead of the pointer. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	f83bb7c71b	libdrgn: move debugging information tracking into drgn_debug_info Debugging information tracking is currently in two places: drgn_program finds debugging information, and drgn_dwarf_index stores it. Both of these responsibilities make more sense as part of drgn_debug_info, so let's move them there. This prepares us to track extra debugging information that isn't pertinent to indexing. This also reworks a couple of details of loading debugging information: - drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into a single structure, drgn_debug_info_module. - The first pass of DWARF indexing now happens in parallel with reading compilation units (by using OpenMP tasks). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-22 10:58:24 -07:00
Omar Sandoval	3ac9ae357b	libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info The current name is too verbose. Let's go with a shorter, more generic name. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-11 17:41:23 -07:00
Jay Kamat	d1beb0184a	libdrgn: add support for objects in C++ namespaces DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the DWARF index, with each namespace being its own sub-index. We only index the namespace itself when it is first accessed, which should help with startup time and simplifies tracking. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2020-09-02 17:13:16 -07:00
Jay Kamat	a51abfcd70	libdrgn: dwarf_index: keep CUs after indexing In order to index namespaces lazily, we need the CU structures. Rename struct compilation_unit to the less generic struct drgn_dwarf_index_cu and keep the CUs in a vector in the dindex. Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	66ad5077c9	libdrgn: dwarf_index: return indexed DIE entry from drgn_dwarf_index_iterator_next() For namespace support, we will want to access the struct drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	d512964c1e	libdrgn: add drgn_error_copy() This is needed for a future change where we'll want to save an error and return it multiple times. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	7a85b4188e	libdrgn: clean up read.h helpers and avoid undefined pointer behavior There are a couple of related ways that we can cause undefined behavior when parsing a malformed DWARF or depmod index file: 1. There are several places where we increment the cursor to skip past some data. It is undefined behavior if the result points out of bounds of the data, even if we don't attempt to dereference it. 2. read_in_bounds() checks that ptr <= end. This pointer comparison is only defined if ptr and end both point to elements of the same array object or one past the last element. If ptr has gone past end, then this comparison is likely undefined anyways. Fix it by adding a helper to skip past data with bounds checking. Then, all of the helpers can assume that ptr <= end and maintain that invariant. While we're here and auditing all of the call sites, let's clean up the API and rename it from read_foo() to the less generic mread_foo(). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	c053c2b212	libdrgn: dwarf_index: handle DW_AT_specification with DW_FORM_ref_addr Now that we can handle a DW_AT_specification that references another compilation unit, add support for DW_FORM_ref_addr. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	26291647eb	libdrgn: dwarf_index: handle DW_AT_specification DIEs with two passes We currently handle DIEs with a DW_AT_specification attribute by parsing the corresponding declaration to get the name and inserting the DIE as usual. This has a couple of problems: 1. It only works if DW_AT_specification refers to the same compilation unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not DW_FORM_ref_addr. As a result, drgn doesn't support the latter. 2. It assumes that the DIE with DW_AT_specification is in the correct "scope". Unfortunately, this is not true for g++: for a variable definition in a C++ namespace, it generates a DIE with DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE which refers to the declaration with DW_AT_specification _outside_ of the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE. Supporting both of these cases requires reworking how we handle DW_AT_specification. This commit takes an approach of parsing the DWARF data in two passes: the first pass reads the abbreviation and file name tables and builds a map of instances of DW_AT_specification; the second pass indexes DIEs as before, but ignores DIEs with DW_AT_specification and handles DIEs with DW_AT_declaration by looking them up in the map built by the first pass. This approach is a 10-20% regression in indexing time in the benchmarks I ran. Thankfully, it is not 100% slower for a couple of reasons. The first is that the two passes are simpler than the original combined pass. The second is that a decent part of the indexing time is spent faulting in the mapped debugging information, which only needs to happen once (even if the file is cached, minor page faults add non-negligible overhead). This doesn't handle DW_AT_specification "chains" yet, but neither did the original code. If it is necessary, it shouldn't be too difficult to add. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	507977664c	libdrgn: dwarf_index: store abbreviation and file name tables in CU This is preparation for the next change where we'll need to do two passes over the CUs. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	0b4ab1772b	libdrgn: dwarf_index: store DIE indices as uint32_t It's very unlikely that we'll ever index more than 4 billion DIEs in a single shard, so we can shrink the index a bit by using uint32_t indices (and uint8_t tag). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	9ce9094ee0	libdrgn: dwarf_index: don't copy sections into each CU I originally copied the sections into each compilation unit to avoid a pointer indirection, but performance-wise it's a wash, so we might as well save the memory. This will be more important when we keep the CUs after indexing. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	94e7b1f92c	libdrgn: dwarf_index: avoid copying CUs for one thread In read_cus(), the master thread can use the final CUs vector directly and the rest of the threads can merge their private vectors in. This consistently shaves a few milliseconds off of startup. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	53ba7262cd	libdrgn: dwarf_index: handle DW_AT_declaration with DW_FORM_flag We currently assume that if DW_AT_declaration is present, it is true. This seems to be true in practice, and I see no reason to ever use DW_FORM_flag with a value of zero. There's no performance hit to handle it, though, so we might as well. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	ea9f3f3114	libdrgn: dwarf_index: don't worry about tag of CU DIE As a small simplification, we can take commit `9bb2ccecb7` ("Enable DWARF indexing to work with partial units") further and not look at the tag of the top-level DIE at all. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	c8f84c57fb	libdrgn: dwarf_index: use size_t instead of uint64_t where appropriate The CU unit length and DIE offset are both limited by the size of the mapped debugging information, i.e., size_t. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	2252bef1a7	libdrgn: dwarf_index: rename TAG_FLAG_* and TAG_MASK to DIE_FLAG_* This is more clear: although these flags happen to be encoded with the DWARF tag, they are flags regarding the DIE. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	85c4b36820	libdrgn: dwarf_index: fix leak when parsing bad line number program header If we fail to read an include directory in read_file_name_table(), we need to free the directory hashes. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-09-02 17:13:16 -07:00
Omar Sandoval	d4b1711128	setup.py: add 5.9 to vmtest kernels Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 16:29:03 -07:00
Omar Sandoval	5ce80016c5	docs: use a different example for execscript() We have a real task_state_to_char() helper now, so we shouldn't use a half-baked implementation of it as an example. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 14:16:19 -07:00
Omar Sandoval	ff96c75da0	helpers: translate task_state_to_char() to Python Commit `326107f054` ("libdrgn: add task_state_to_char() helper") implemented task_state_to_char() in libdrgn so that it could be used in commit `4780c7a266` ("libdrgn: stack_trace: prohibit unwinding stack of running tasks"). As of commit `eea5422546` ("libdrgn: make Linux kernel stack unwinding more robust"), it is no longer used in libdrgn, so we can translate it to Python. This removes a bunch of code and is more useful as an example. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 13:54:39 -07:00
Omar Sandoval	36068a0ea8	Fix trailing commas for Black v20.8b1 Black was recently changed to treat a trailing comma as an indicator to put each item/argument on its own line. We have a bunch of places where something previously had to be split into multiple lines, then was edited to fit on one line, but Black kept the trailing comma. Now this update wants to unnecessarily split it back up. For now, let's get rid of these commas. Hopefully in the future Black has a way to opt out of this. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:29 -07:00
Omar Sandoval	e96d9fd3fd	libdrgn/python: don't allow None for Program.object() flags Similar to the previous commit, this was to work around pydoc issues that we don't have anymore. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:29 -07:00
Omar Sandoval	2fc514f2a4	libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers] I originally did it this way because pydoc doesn't handle non-trivial defaults in signatures very well (see commit `67a16a09b8` ("tests: test that Python documentation renders")). drgndoc doesn't generate signatures for pydoc anymore, though, so we don't need to worry about it and can clean up the typing. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:29 -07:00
Omar Sandoval	e49a87a3d7	libdrgn: remove struct drgn_object::prog We can get it via the type now. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-27 11:31:21 -07:00
Omar Sandoval	a97f6c4fa2	Associate types with program I originally envisioned types as dumb descriptors. This mostly works for C because in C, types are fairly simple. However, even then the drgn_program_member_info() API is awkward. You should be able to look up a member directly from a type, but we need the program for caching purposes. This has also held me back from adding offsetof() or has_member() APIs. Things get even messier with C++. C++ template parameters can be objects (e.g., template <int N>). Such parameters would best be represented by a drgn object, which we need a drgn program for. Static members are a similar case. So, let's reimagine types as being owned by a program. This has a few parts: 1. In libdrgn, simple types are now created by factory functions, drgn_foo_type_create(). 2. To handle their variable length fields, compound types, enum types, and function types are constructed with a "builder" API. 3. Simple types are deduplicated. 4. The Python type factory functions are replaced by methods of the Program class. 5. While we're changing the API, the parameters to pointer_type() and array_type() are reordered to be more logical (and to allow pointer_type() to take a default size of None for the program's default pointer size). 6. Likewise, the type factory methods take qualifiers as a keyword argument only. A big part of this change is updating the tests and splitting up large test cases into smaller ones in a few places. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:41:09 -07:00
Omar Sandoval	c31208f69c	libdrgn: fold drgn_type_index into drgn_program This is preparation for associating types with a program. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:36:35 -07:00
Omar Sandoval	1c8181e22d	libdrgn: rearrange struct drgn_program members struct drgn_program has a bunch of state scattered around. Group it together more logically, even if it means sacrificing some padding here and there. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 17:34:44 -07:00
Omar Sandoval	d4e0771f87	libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}() Most places that call these check has_platform and return an error, and those that don't can live with the extra check. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 16:56:28 -07:00
Omar Sandoval	a8d632b4c1	libdrgn/python: use F14 instead of PyDict for Program::objects Program::objects is used to store references to objects that must stay alive while the Program is alive. It is currently a PyDict where the keys are the object addresses as PyLong and the values are the objects themselves. This has two problems: 1. Allocating the key as a full object is obviously wasteful. 2. PyDict doesn't have an API for reserving capacity ahead of time, which we want for an upcoming change. Both of these are easily fixed by using our own hash table. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 16:56:28 -07:00
Omar Sandoval	b0f9403ebf	drgndoc: directly use name passed as argument to drgndoc directive E.g., drgndoc:: foo.bar() should emit py:method:: foo.bar() regardless of a previous py:module directive. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 16:56:28 -07:00
Omar Sandoval	93e33513da	drgndoc: bring back :exclude: It's still useful to have an escape hatch for names we don't want documented. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-26 16:36:55 -07:00
Omar Sandoval	d40526d85d	scripts: add Python include header path to cscope Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-25 18:07:31 -07:00
arsarwade	6f6c5f272f	libdrgn: export function drgn_object_init() (#70) drgn_object_init() is available in the drgn.h file and seems to be a required call before calling drgn_program_find_object(). Without this, trying to call drgn_object_init() from an external C application results in an undefined reference. Signed-off-by: Aditya Sarwade <asarwade@fb.com>	2020-08-21 10:24:52 -07:00
Omar Sandoval	903a44d0dd	travis: upgrade to Ubuntu 20.04 This picks up a newer version of QEMU and lets us use udevadm trigger -w. Let's also explicitly add "os: linux" to silence the config validation. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 17:57:39 -07:00
Omar Sandoval	7fb196cfbf	vmtest: don't use onoatimehack on QEMU 5.1.0 As of QEMU commit a5804fcf7b22 ("9pfs: local: ignore O_NOATIME if we don't have permissions") (in v5.1.0), QEMU handles O_NOATIME sanely, so we don't need the LD_PRELOAD hack. Since we're adding a version check, make the multidevs check based on the version, too. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 17:55:57 -07:00
Omar Sandoval	656d85f2fe	travis: check Python code with black, isort, and mypy Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 16:55:07 -07:00
Omar Sandoval	4e770fb18a	Format imports with isort Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 16:55:07 -07:00
Omar Sandoval	8c7c80e2f7	Fix mypy --strict warnings The remaining warnings are all no-any-return, which is hard to avoid in drgn. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 16:28:02 -07:00
Omar Sandoval	0cf3320a89	Add type annotations to helpers Now that drgndoc can handle overloads and we have the IntegerLike and Path aliases, we can add type annotations to all helpers. There are also a couple of functional changes that snuck in here to make annotating easier. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 16:28:02 -07:00
Omar Sandoval	e4a2676cac	drgndoc: support @typing.overload() One of the blockers for adding type annotations to helpers is that some helpers need to be overloaded, but drgndoc doesn't support that. This adds support. Each function now tracks all of its overloaded signature, each of which may be documented separately. The formatted output (for functions/methods and classes with __init__()) combines all of the documented overloads. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:21:29 -07:00
Omar Sandoval	64a04a6c4f	drgndoc: include attributes based on presence of docstring We can get rid of the :include: and :exclude: options by deciding solely based on whether a node has a docstring. Empty docstrings can be used to indicate nodes that should be included with no additional content. The __init__() method must now also have a docstring in order to be documented. Additionally, the directives are now fully formatted by the Formatter rather than being split between the Formatter and DrgnDocDirective. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:21:29 -07:00
Omar Sandoval	f41cc7fb48	drgndoc: recursively document names imported with alias The helpers implemented in C have Python wrappers only for the purpose of documentation. This is because drgndoc ignores all imports when recursively documenting attributes. However, mypy uses the convention that aliased imports (i.e., import ... as ... or from ... import ... as ...) are considered re-exported, so we can follow that convention and include aliased imports. (mypy also considers attributes in __all__ as re-exported, so we should probably follow that in the future, too, but for now aliased imports are enough.) This lets us get rid of the Python wrappers. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:21:29 -07:00
Omar Sandoval	192d35c609	drgndoc: support relative imports Mainly for completeness, as I don't really like using them in my own projects. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:21:29 -07:00
Omar Sandoval	a270525f8b	drgndoc: save all modules and classes traversed to resolve name This will be used to support relative imports. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:21:29 -07:00
Omar Sandoval	4a3b8fb8e6	drgndoc: fix mypy --strict errors Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:21:29 -07:00
Omar Sandoval	2d49ef657b	Add Path type alias Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes] to get rid of some mypy --strict errors. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:20:29 -07:00
Omar Sandoval	66c5cc83a6	Add IntegerLike type annotation Lots of interfaces in drgn transparently turn an integer Object into an int by using __index__(), so add an IntegerLike protocol for this and use it everywhere applicable. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2020-08-20 11:16:50 -07:00
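Commit e69d0c0064 replaces a pointer to a pending CU with the CU's index, because adding more CUs can reallocate the vector that holds them. The C sketch below illustrates that pattern under stated assumptions: `struct cu`, `struct cu_vector`, `cu_vector_push()`, `struct pending`, and `pending_cu()` are hypothetical names chosen for illustration, not libdrgn's actual types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compilation unit record. */
struct cu {
	size_t offset;
};

/* Hypothetical growable vector; a push may realloc() and move the storage. */
struct cu_vector {
	struct cu *data;
	size_t size, capacity;
};

static int cu_vector_push(struct cu_vector *vec, struct cu cu)
{
	if (vec->size == vec->capacity) {
		size_t new_capacity = vec->capacity ? 2 * vec->capacity : 1;
		struct cu *new_data =
			realloc(vec->data, new_capacity * sizeof(*new_data));
		if (!new_data)
			return -1;
		/* Any struct cu * taken before this point may now dangle. */
		vec->data = new_data;
		vec->capacity = new_capacity;
	}
	vec->data[vec->size++] = cu;
	return 0;
}

/* A pending record refers to its CU by index, not by pointer. */
struct pending {
	size_t cu_index;
};

/* Resolve the index against the vector's current storage on every access. */
static struct cu *pending_cu(const struct cu_vector *vec,
			     const struct pending *p)
{
	assert(p->cu_index < vec->size);
	return &vec->data[p->cu_index];
}
```

Because `pending_cu()` re-resolves the index against `vec->data` on each call, it stays valid across any number of reallocations, whereas a `struct cu *` saved before a push could be left dangling.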
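Commit 7a85b4188e avoids undefined pointer arithmetic by checking the remaining length before advancing a cursor, so that `ptr <= end` holds at all times. A minimal C sketch of that idea follows; `skip_in_bounds()` and `read_u32_in_bounds()` are hypothetical helpers, not the actual mread_*() API, and the reader uses host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers. Invariant: *ptr <= end before and after every call,
 * so every pointer formed stays within the buffer or one past its end.
 */
static bool skip_in_bounds(const char **ptr, const char *end, size_t size)
{
	assert(*ptr <= end);
	/* Compare sizes instead of forming *ptr + size first. */
	if (size > (size_t)(end - *ptr))
		return false;
	*ptr += size;
	return true;
}

static bool read_u32_in_bounds(const char **ptr, const char *end,
			       uint32_t *ret)
{
	assert(*ptr <= end);
	if (sizeof(*ret) > (size_t)(end - *ptr))
		return false;
	memcpy(ret, *ptr, sizeof(*ret)); /* host byte order */
	*ptr += sizeof(*ret);
	return true;
}
```

The key choice is comparing `size` against `end - *ptr` rather than computing `*ptr + size` and comparing it with `end`: the latter forms an out-of-bounds pointer before the check has a chance to reject it.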
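Commit 26291647eb describes indexing in two passes: the first records every DW_AT_specification reference, and the second skips DIEs that carry DW_AT_specification and resolves each DW_AT_declaration DIE through that map. The C sketch below models this over a pre-parsed array of DIEs. `struct die`, the helper functions, the use of offset 0 to mean "no specification", the linear-search map, and the choice to skip declarations that have no definition are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified DIE; real code parses .debug_info. */
struct die {
	size_t offset;        /* nonzero; 0 is reserved as "none" below */
	const char *name;     /* NULL if the DIE has no DW_AT_name */
	bool declaration;     /* has DW_AT_declaration */
	size_t specification; /* offset of the declaration, or 0 if none */
};

/* Pass-one map entry: declaration offset -> offset of the defining DIE. */
struct spec_entry {
	size_t declaration;
	size_t definition;
};

struct index_entry {
	const char *name;
	size_t die_offset;
};

/* Pass 1: record every DW_AT_specification reference. */
static size_t build_spec_map(const struct die *dies, size_t n,
			     struct spec_entry *map)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++) {
		assert(dies[i].offset != 0);
		if (dies[i].specification) {
			map[count].declaration = dies[i].specification;
			map[count].definition = dies[i].offset;
			count++;
		}
	}
	return count;
}

/* Linear search for clarity; a real index would use a hash table. */
static bool lookup_definition(const struct spec_entry *map, size_t count,
			      size_t declaration, size_t *definition)
{
	for (size_t i = 0; i < count; i++) {
		if (map[i].declaration == declaration) {
			*definition = map[i].definition;
			return true;
		}
	}
	return false;
}

/*
 * Pass 2: index named DIEs, ignoring DIEs with DW_AT_specification. A
 * declaration is indexed under its own name (and thus its own scope) but
 * points at the definition found in pass 1.
 */
static size_t index_dies(const struct die *dies, size_t n,
			 const struct spec_entry *map, size_t map_count,
			 struct index_entry *out)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++) {
		const struct die *die = &dies[i];
		if (die->specification || !die->name)
			continue;
		size_t offset = die->offset;
		if (die->declaration &&
		    !lookup_definition(map, map_count, die->offset, &offset))
			continue; /* assumed policy: no definition, skip it */
		out[count].name = die->name;
		out[count].die_offset = offset;
		count++;
	}
	return count;
}
```

Indexing through the declaration is what would keep a g++ namespace variable in the namespace's scope, since the declaration is the DIE nested under DW_TAG_namespace while the definition sits outside it.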
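Commit 53ba7262cd handles DW_AT_declaration encoded with DW_FORM_flag in addition to DW_FORM_flag_present. Per the DWARF specification, DW_FORM_flag_present (0x19) stores no data and implies true, while DW_FORM_flag (0x0c) stores one byte where zero means false. The sketch below shows that decoding; `read_declaration_flag()` and the `FORM_*` constants are hypothetical names, chosen to avoid clashing with `<dwarf.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values of DW_FORM_flag and DW_FORM_flag_present from the DWARF spec. */
enum {
	FORM_flag = 0x0c,
	FORM_flag_present = 0x19,
};

/*
 * Hypothetical helper: decode a DW_AT_declaration value given its form and a
 * cursor at the attribute's data. Returns false on malformed input.
 */
static bool read_declaration_flag(uint64_t form, const uint8_t **ptr,
				  const uint8_t *end, bool *ret)
{
	assert(*ptr <= end);
	switch (form) {
	case FORM_flag_present:
		/* Implicitly true; nothing is stored in .debug_info. */
		*ret = true;
		return true;
	case FORM_flag:
		/* One byte: zero means false, anything else means true. */
		if (*ptr == end)
			return false;
		*ret = **ptr != 0;
		(*ptr)++;
		return true;
	default:
		/* Other forms are not valid for a flag attribute. */
		return false;
	}
}
```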


808 Commits