object-introspection/tools/config_gen.py
Jake Hillion fe9b4b232a nix: add building oid to the flake
OI's build is challenging and has often been a problem for the open source
community. It requires an extremely specific set of dependencies that are hard
to satisfy on most systems, and there are frequent breakages - for example when
updating to CentOS Stream 9, or when trying to update the CI's clang from
clang-12 to clang-15: OI requires version 15 of the clang libraries but can't
itself be compiled with clang-15 on the CI!

This change provides a mostly working build environment with `nix`. The
environment is pinned to a specific nixpkgs revision using `flake.lock`, and
only updates when we explicitly tell it to.

Summary of changes:
- Update the required CMake version in CMakeLists.txt to 3.24. This allows
  specifying `FIND_PACKAGE_ARGS` in `FetchContent`, meaning we can use system
  packages. CMake 3.24 is available on most up-to-date distros (3.30.2 is
  current).
- Extend `flake.nix` to be able to build OI, and add instructions for building
  and developing OI with `nix`.
- Partially run the tests in GitHub Actions. A huge number of tests must be
  excluded because of incompatibilities between the clangStdenv compiler and
  drgn. We have similar, though fewer, issues when building with the
  clang-12/libstdcxx mix on the Ubuntu 22.04 CircleCI, but this is at least
  reproducible.
- Update CircleCI to build CMake from source as we don't have a newer image
  available, and add some newly found dependencies (not sure how it was working
  without them before).

Test plan:

This change requires less testing than previous build-related changes because
it deprecates most of the build types.

- The internal BUCK build is unaffected. No special testing.
- The semi-internal CMake build is gone. Use Nix.
- The clang-15 Nix build and a subset of the tests are continuously run in
  GitHub Actions.
- Tested the set of Nix commands in the README. All work except the one that
  points to GitHub, as this change must be merged first.
- The existing CircleCI runs on Ubuntu 20.04 are maintained.
- Unable to test the new `test-report.yml` before merging because of the
  permissions it needs. Will follow up with testing after this is merged. See:
  https://github.com/dorny/test-reporter?tab=readme-ov-file#recommended-setup-for-public-repositories

The list of exclusions for GitHub Actions/nix testing is currently very long -
around 29% of the tests, I think. It should be stable and reproducible though,
and fixing it likely needs deep changes to OI, which is why those fixes are
excluded from this PR. The failures all come down to the forked drgn being
unable to parse clang's newer DWARF output, and they can't be fixed by rolling
back the compiler because we require a relatively new libcxx.
2024-08-16 12:29:31 +01:00


#!/usr/bin/env python3
#
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import getpass
import os
import pathlib
import subprocess
import typing

import toml
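
# Minimal C++ translation unit fed to the compiler invocations below. Running
# the preprocessor over these common standard-library and system headers makes
# the compiler reveal the header paths and include search directories that end
# up in the generated oid config.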
STUB_CODE = b"""
#include <iostream>
#include <vector>
#include <set>
#include <map>
#include <sys/ioctl.h>
#include <cassert>
#include <fstream>
#include <string>
#include <regex>
#include <filesystem>
#include <algorithm>
#include <stddef.h>
#include <pthread.h>
#include <linux/sched.h>
"""
# Parts of the end of include paths to strip (eg: #include sys/param.h)
END_PATH_EXCLUDE = frozenset(("bits", "backward", "debug", "types", "sys", "gnu"))
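# For example, a header at /usr/include/sys/param.h resolves to the directory
# /usr/include/sys, which filter_includes_preprocessor strips back to
# /usr/include so the config points at the include root, not a subdirectory.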


def generate_compiler_preprocessed(compiler: str) -> str:
    """Execute compiler to grab preprocessor output"""
    out = subprocess.check_output([compiler, "-E", "-x", "c++", "-"], input=STUB_CODE)
    return out.decode()


def generate_compiler_commands(compiler: str) -> str:
    """Execute compiler verbosely (-v) to grab its include search paths"""
    out = subprocess.check_output(
        [compiler, "-E", "-x", "c++", "-v", "-", "-o", "/dev/null"],
        stderr=subprocess.STDOUT,
        input=STUB_CODE,
    )
    return out.decode()


def filter_includes_preprocessor(
    preprocessor_output: str,
) -> typing.Tuple[typing.Iterable[str], typing.Iterable[str]]:
    """Filter preprocessor output and extract include paths from it

    We are expecting lines like this, for user:
        # 1 "/usr/include/c++/8/iostream" 1 3
    where,
        lineComps[0] - '#'
        lineComps[1] - line number
        lineComps[2] - file path
        lineComps[3] - '1' (Entering new header file)
        lineComps[4] - '3' (SrcMgr::C_System)
    see lib/Frontend/PrintPreprocessedOutput.cpp in clang source.

    Or like this, for system:
        # 1 "/usr/include/features.h" 1 3 4
    This time the consecutive [3 4] pattern indicates a
    SrcMgr::C_ExternCSystem file.
    """
    system_paths: typing.List[pathlib.Path] = []
    user_paths: typing.List[pathlib.Path] = []

    for line in preprocessor_output.splitlines():
        # Classify each line marker independently as a user or system header.
        is_system, is_user = False, False
        s = line.split()
        if len(s) == 5 and s[3] == "1":
            is_user = True
        elif len(s) == 6 and s[3] == "1" and s[4] == "3" and s[5] == "4":
            is_system = True
        else:
            continue

        path = pathlib.Path(s[2].strip('"'))
        if not path.exists():
            continue

        path = path.resolve().parent
        while path.name in END_PATH_EXCLUDE:
            path = path.parent

        if is_system and path not in system_paths:
            system_paths.append(path)
        elif is_user and path not in user_paths:
            user_paths.append(path)

    # TODO the order of paths matters here - what's the right way to sort these lists?
    # My best guess is that we need most specific to least specific, so let's just try
    # the depth of the paths.
    system_paths.sort(key=lambda k: len(k.parents), reverse=True)
    user_paths.sort(key=lambda k: len(k.parents), reverse=True)

    return (map(str, system_paths), map(str, user_paths))


def filter_includes_commands(output: str) -> typing.Iterable[str]:
    """Filter -v compiler output and retrieve include paths if possible.

    Unfortunately relies on non-standardized behavior...

    Example output we are parsing, to obtain the directories in order:
        ignoring nonexistent directory "/include"
        #include "..." search starts here:
        #include <...> search starts here:
         /usr/local/include
         /usr/lib64/clang/12.0.1/include
         /usr/include
        End of search list.
        # 1 "<stdin>"
        # 1 "<built-in>" 1
        # 1 "<built-in>" 3
        # 341 "<built-in>" 3
        # 1 "<command line>" 1
        # 1 "<built-in>" 2
        # 1 "<stdin>" 2
    """
    collecting = False
    paths: typing.List[pathlib.Path] = []

    for line in output.splitlines():
        if collecting:
            if line == "End of search list.":
                break

            # Just a backup - ideally this should never trigger
            if line.startswith("#"):
                continue

            path = pathlib.Path(line.strip())
            path = path.resolve()
            if path not in paths and path.exists():
                paths.append(path)

        if line.startswith("#include <...> search starts here:"):
            collecting = True
            continue

    return map(str, paths)


def pull_base_toml() -> typing.Dict:
    script = pathlib.Path(__file__)
    repo_path = script.parent.parent
    script = repo_path / "dev.oid.toml"
    if not script.exists():
        raise RuntimeError(
            "Base file dev.oid.toml not found, either replace it, or skip types."
        )

    with open(script, "r") as f:
        base = toml.load(f)

    # Now, we need to replace any placeholders that might be present in the base
    # toml file with the real versions.
    user = getpass.getuser()
    if "IN_NIX_SHELL" in os.environ and "src" in os.environ:
        # Under a nix build or dev shell the source tree lives at $src rather
        # than next to this script.
        pwd = os.environ["src"]
    else:
        pwd = str(repo_path.resolve())

    container_list = base.get("types", {}).get("containers")
    if container_list:
        for idx, c in enumerate(container_list):
            container_list[idx] = c.replace("PWD", pwd).replace("USER", user)

    return base


def generate_toml(
    system_paths: typing.Iterable[str],
    user_paths: typing.Iterable[str],
    base_object: typing.Dict,
    output_file: str,
):
    base_object.update(
        {"headers": {"system_paths": system_paths, "user_paths": user_paths}}
    )
    with open(output_file, "w") as f:
        toml.dump(base_object, f)
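

# Example invocations (the output file name is illustrative):
#   tools/config_gen.py oid.generated.toml
#   tools/config_gen.py -c g++ --include-mode preprocessor oid.generated.toml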


def main():
    parser = argparse.ArgumentParser(
        description="Run a c/c++ compiler and attempt to generate an oi config file from the results"
    )
    parser.add_argument(
        "-c",
        "--compiler",
        default="clang++",
        help="The compiler binary used to generate headers from.",
    )
    parser.add_argument(
        "--skip-types",
        action="store_true",
        help="Whether to skip pulling types from dev.oid.toml in addition to generating include headers.",
    )
    parser.add_argument(
        "--include-mode",
        choices=("preprocessor", "commands"),
        default="commands",
        help="Which strategy to use for generating includes. Right now choose between using -E (preprocessor) or -v (verbose commands)",
    )
    parser.add_argument(
        "output_file", help="Toml file to output finished config file to."
    )
    args = parser.parse_args()

    if args.include_mode == "preprocessor":
        preprocessed = generate_compiler_preprocessed(args.compiler)
        system_includes, user_includes = filter_includes_preprocessor(preprocessed)
    elif args.include_mode == "commands":
        output = generate_compiler_commands(args.compiler)
        system_includes = filter_includes_commands(output)
        user_includes = []
    else:
        raise ValueError("Invalid include mode provided!")

    if args.skip_types:
        base = {}
    else:
        base = pull_base_toml()

    generate_toml(system_includes, user_includes, base, args.output_file)


if __name__ == "__main__":
    main()