Update on Overleaf.

This commit is contained in:
jsh77 2022-05-27 08:25:04 +00:00 committed by node
parent 1a6f0a28d4
commit b7c93316c3

View File

@ -655,7 +655,7 @@ In the case of Docker, with CVE-2021-21284, a remapped user may be able to alter
\section{Control group namespaces} \section{Control group namespaces}
\label{sec:voiding-cgroup} \label{sec:voiding-cgroup}
Control group (cgroup) namespaces provide limited isolation of the cgroup hierarchy between processes. Rather than showing the full cgroups hierarchy, they instead show only the part of the hierarchy that the process was in on creation of the new cgroup namespace. Correctly creating a void process is hence as follows: Control group (cgroup) namespaces provide limited isolation of the cgroup hierarchy between processes. Rather than showing the full cgroups hierarchy, they show only the part of the hierarchy that the process was in on creation of the new cgroup namespace. Correctly creating a void process is hence as follows:
\begin{enumerate} \begin{enumerate}
\item Create an empty cgroup leaf. \item Create an empty cgroup leaf.
@ -663,14 +663,14 @@ Control group (cgroup) namespaces provide limited isolation of the cgroup hierar
\item Unshare the cgroup namespace. \item Unshare the cgroup namespace.
\end{enumerate} \end{enumerate}
By following this sequence of calls, the process in the void would only see the leaf which contains itself and nothing else, limiting access to the host system. Running the shim with ambient authority here presents an issue as the cgroup hierarchy relies on discretionary access control. In order to move the process into a leaf the shim must have sufficient authority to modify the cgroup hierarchy. Due to this behaviour, and hence the unreliability of correctly voiding cgroup processes, the void orchestrator performs only the third step - unsharing the cgroup namespace. This makes cgroups the only namespace which can't be voided with ambient authority, suggesting a strong need for kernel changes. By following this sequence of calls, the process in the void would only see the leaf which contains itself and nothing else. Running the shim with ambient authority here presents an issue as the cgroup hierarchy relies on discretionary access control. In order to move the process into a leaf the shim must have sufficient authority to modify the cgroup hierarchy. Due to this behaviour, and hence the unreliability of correctly voiding cgroup processes, the void orchestrator performs only the third step - unsharing the cgroup namespace. This makes cgroups the only namespace which can't be voided with ambient authority, advocating a need for kernel changes.
There are two problems when working with cgroups namespaces in user-space: needing sufficient discretionary access control, and leaving the control of individual application processes in a global namespace. I posture an alternative cgroup namespace: a process in a new cgroups namespace creates a detached hierarchy with the process as a leaf of the root and full permissions in the user namespace that created it. The main cgroups hierarchy sees a single application to control, while the application itself would have full access over sharing its externally limited resources. This would make the delegation of resources explicit in the design. That is, the main namespace could be handled by \texttt{systemd}, while the \texttt{/docker} namespace could be internally managed by docker. This would allow \texttt{systemd} to move the \texttt{/docker} namespace around as required, with isolation of the choices made internally. There are two problems when working with cgroups namespaces: needing sufficient discretionary access control, and leaving the control of individual application processes in a global namespace. I posture an alternative cgroup namespace: a process in a new cgroups namespace creates a detached hierarchy with itself as a leaf of the root and full permissions in the user namespace that created it. The main cgroups hierarchy sees a single application to control, while the application itself would have full access over sharing its externally limited resources. This would make the delegation of resources explicit in the design - the init namespace could be handled by \texttt{systemd}, while the \texttt{/docker} namespace could be managed internally by docker. This would allow \texttt{systemd} to move the \texttt{/docker} namespace around as required, with isolation from the choices made internally.
\section{Creation cost} \section{Creation cost}
\label{sec:void-creation-costs} \label{sec:void-creation-costs}
As shown in this chapter, creating a void requires creating 7 distinct namespaces for isolation. There are two options to create these namespaces: \texttt{clone(2)} or \texttt{unshare(2)}. As the void orchestrator uses \texttt{clone(2)} we evaluate the performance of this tool. As shown, creating a void requires creating 7 distinct namespaces for isolation. There are two options to create these namespaces: \texttt{clone(2)} or \texttt{unshare(2)}. As the void orchestrator uses \texttt{clone(2)} we evaluate the performance of this tool.
These tests were run on my development machine, using Linux 5.15.0-33-generic on Ubuntu 22.04 LTS. It is a Xen based virtual machine, hence absolute results are less important than trends. The test process calls \texttt{clone(2)} with the requisite flags, then waits for the child process to exit. The child process exits immediately after returning from clone. The time is taken from before the \texttt{clone(2)} call and after the \texttt{wait(2)} call returns using the high precision \texttt{CLOCK\_MONOTONIC}. This code is compiled into a tight C for loop, which executes 1250 times. The first 250 entries of each run are discarded. Prior to running the variety of clone tests, 12500 clone calls are made in an attempt to warm up the system. These tests were run on my development machine, using Linux 5.15.0-33-generic on Ubuntu 22.04 LTS. It is a Xen based virtual machine, hence absolute results are less important than trends. The test process calls \texttt{clone(2)} with the requisite flags, then waits for the child process to exit. The child process exits immediately after returning from clone. The time is taken from before the \texttt{clone(2)} call and after the \texttt{wait(2)} call returns using the high precision \texttt{CLOCK\_MONOTONIC}. This code is compiled into a tight C for loop, which executes 1250 times. The first 250 entries of each run are discarded. Prior to running the variety of clone tests, 12500 clone calls are made in an attempt to warm up the system.