mirror of
https://git.overleaf.com/6227c8e96fcdc06e56454f24
synced 2024-11-21 19:51:47 +00:00
Update on Overleaf.
This commit is contained in:
parent
41322b06c0
commit
4975483673
96
report.tex
@ -20,7 +20,7 @@
|
||||
\usepackage{courier} % better listings font
|
||||
\usepackage{dirtytalk} % quotations
|
||||
\usepackage[square,numbers]{natbib} % citations
|
||||
\usepackage{minted} % code listings
|
||||
\usepackage[chapter]{minted} % code listings
|
||||
\usepackage{multirow} % multi-row cells in tables
|
||||
\usepackage{makecell} % multi-line cells in tables
|
||||
\usepackage[subpreambles]{standalone} % tex files as diagrams
|
||||
@ -181,6 +181,7 @@ Much prior work exists in the space of privilege separation, including: virtual
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\includegraphics[width=0.6\textwidth]{figures/least-most-linux.png}
|
||||
|
||||
\caption{Privilege separated environments plotted to compare the number of application changes required against the remaining attack surface of the environment.}
|
||||
\label{fig:attack-vs-changes}
|
||||
\end{figure}
|
||||
@ -205,6 +206,7 @@ In 2003, privilege separation was added to the \texttt{syslogd} daemon of OpenBS
|
||||
\begin{figure}
|
||||
\centering
|
||||
\includestandalone[width=0.4\textwidth]{diagrams/openbsd-syslogd-privsep}
|
||||
|
||||
\caption{Separation of privileged access from untrusted user data in OpenBSD's privilege separated syslogd design compared to the previous. The process which handles untrusted data is separated from the privileged process and uses RPC to communicate.}
|
||||
\label{fig:openbsd-syslogd-privsep}
|
||||
\end{figure}
|
||||
@ -246,8 +248,6 @@ This work focuses on the application of namespaces to more conventional privileg
|
||||
\label{chap:entering-the-void}
|
||||
|
||||
\begin{table}
|
||||
\caption{Table showing the date and kernel version each namespace was added. The date provides the date of the first commit where they appeared, and the kernel version the kernel release they appear in the changelog of. Namespaces are ordered by kernel version then alphabetically. Some examples are provided of CVEs of each namespace, and CVEs that each namespace protects against.}
|
||||
|
||||
\begin{center}
|
||||
\begin{tabular}{l|lr|lr|l|l}
|
||||
ns & \multicolumn{2}{l}{date} & \multicolumn{2}{|l|}{kernel ver.} & ns CVEs & prot. CVEs \\ \hline
|
||||
@ -303,6 +303,7 @@ This work focuses on the application of namespaces to more conventional privileg
|
||||
\end{tabular}
|
||||
\end{center}
|
||||
|
||||
\caption{Table showing the date and kernel version each namespace was added. The date provides the date of the first commit where they appeared, and the kernel version the kernel release they appear in the changelog of. Namespaces are ordered by kernel version then alphabetically. Some examples are provided of CVEs of each namespace, and CVEs that each namespace protects against.}
|
||||
\label{tab:namespaces}
|
||||
\end{table}
|
||||
|
||||
@ -344,9 +345,9 @@ Searching the list of released CVEs for both ``clock" and ``time linux" (time it
|
||||
|
||||
Network namespaces on Linux isolate the system resources related to networking. These include network interfaces themselves, IP routing tables, firewall rules and the \texttt{/proc/net} directory. This level of isolation allows a network stack that operates completely independently to exist on a single kernel.
|
||||
|
||||
Similarly to IPC, network namespaces present the optimal namespace for running a void process. Creating a new network namespace immediately creates a namespace containing only a local loopback adapter. This means that the new network namespace has no link whatsoever to the creating network namespace, only supporting internal communication. To add a link, one can create a virtual Ethernet pair with one adapter in each namespace (Figure \ref{fig:virtual-ethernet}). Alternatively, one can create a Wireguard adapter with sending and receiving sockets in one namespace and the VPN adapter in another \citep[§7.3]{donenfeld_wireguard_2017}. These methods allow for very high levels of separation while still maintaining access to the primary resource - the Internet or wider network. Further, this design places the management of how connected a namespace is to the parent in user-space. This is a significant difference compared to some of the namespaces discussed later in this chapter.
|
||||
Similarly to IPC, network namespaces present the optimal namespace for running a void process. Creating a new network namespace immediately creates a namespace containing only a local loopback adapter. This means that the new network namespace has no link whatsoever to the creating network namespace, only supporting internal communication. To add a link, one can create a virtual Ethernet pair with one adapter in each namespace (Listing \ref{lst:virtual-ethernet}). Alternatively, one can create a WireGuard adapter with sending and receiving sockets in one namespace and the VPN adapter in another \citep[§7.3]{donenfeld_wireguard_2017}. These methods allow for very high levels of separation while still maintaining access to the primary resource - the Internet or wider network. Further, this design places the management of how connected a namespace is to the parent in user-space. This is a significant difference compared to some of the namespaces discussed later in this chapter.
|
||||
|
||||
\begin{figure}
|
||||
\begin{listing}
|
||||
\begin{minipage}{.49\textwidth}
|
||||
|
||||
\begin{minted}{shell-session}
|
||||
@ -379,8 +380,8 @@ PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
|
||||
\end{minipage}
|
||||
|
||||
\caption{Parallel shell sessions showing the creation of a virtual Ethernet pair between the root network namespace and a newly created and completely empty network namespace.}
|
||||
\label{fig:virtual-ethernet}
|
||||
\end{figure}
|
||||
\label{lst:virtual-ethernet}
|
||||
\end{listing}
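The claim that a fresh network namespace holds only a loopback adapter can also be checked programmatically. The following is a minimal sketch, assuming the text of \texttt{/proc/net/dev} as seen from inside the namespace is supplied by the caller. The function names \texttt{interface\_names} and \texttt{only\_loopback} are hypothetical and are not part of the void orchestrator.

```rust
/// Extract interface names from the text of /proc/net/dev.
/// The file starts with two header lines; every following line
/// has the form "<name>: <counters...>".
fn interface_names(proc_net_dev: &str) -> Vec<String> {
    proc_net_dev
        .lines()
        .skip(2) // skip the two header lines
        .filter_map(|line| line.split_once(':'))
        .map(|(name, _)| name.trim().to_string())
        .collect()
}

/// True when the only interface listed is the loopback adapter,
/// which is the state expected directly after creating a new
/// network namespace.
fn only_loopback(proc_net_dev: &str) -> bool {
    interface_names(proc_net_dev) == ["lo"]
}
```

A caller would typically pass in the result of \texttt{std::fs::read\_to\_string("/proc/net/dev")}, and would expect \texttt{only\_loopback} to stop returning true once a virtual Ethernet adapter has been moved into the namespace.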
|
||||
|
||||
Network namespaces are also the first mentioned to control access to \texttt{procfs}. \texttt{/proc} holds a pseudo-filesystem which controls access to many of the kernel data structures that aren't accessed with system calls. Achieving the intended behaviour here requires remounting \texttt{/proc}, which must be done with extreme care so as not to overwrite it for every other process. In a void process this is handled by automatically voiding the mount namespace, meaning that this does not need to be intentionally taken care of.
|
||||
|
||||
@ -396,9 +397,6 @@ As with network namespaces, PID namespaces have a significant effect on \texttt{
|
||||
Secondly, we see that even in a shell that appears to be working correctly, processes from outside of the new PID namespace are still visible. This behaviour occurs because the mount of \texttt{/proc} visible to the process in the new PID namespace is the same as that of the init process. This is solved by remounting \texttt{/proc}, available to \texttt{unshare(1)} with the \texttt{--mount-proc} flag. Care must be taken that this mount is completed in a new mount namespace, or else processes outside of the PID namespace will be affected. The Void Orchestrator again avoids this by voiding the mount namespace entirely, meaning that any access to \texttt{procfs} must be either freshly mounted or bound to outside the namespace intentionally. Remounting a fresh \texttt{procfs} is unfortunately not trivial on most systems, and will be discussed with user namespaces (§\ref{sec:voiding-user}).
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:unshare-pid}
|
||||
\caption{Unshare behaviour with pid namespaces, with and without forking and remounting proc. Spawning a process without explicitly forking creates a broken shell. Forking creates a shell that works, but the PID namespace appears unchanged to processes that inspect it. Remounting proc and forking provides a working shell in which processes see the new pid namespace.}
|
||||
|
||||
\begin{minted}{shell-session}
|
||||
£ unshare --pid
|
||||
-bash: fork: Cannot allocate memory
|
||||
@ -420,6 +418,9 @@ Secondly, we see that even in a shell that appears to be working correctly, proc
|
||||
15 pts/1 R+ 0:00 ps ax
|
||||
16 pts/1 S+ 0:00 tail -n 3
|
||||
\end{minted}
|
||||
|
||||
\caption{Unshare behaviour with pid namespaces, with and without forking and remounting proc. Spawning a process without explicitly forking creates a broken shell. Forking creates a shell that works, but the PID namespace appears unchanged to processes that inspect it. Remounting proc and forking provides a working shell in which processes see the new pid namespace.}
|
||||
\label{lst:unshare-pid}
|
||||
\end{listing}
|
||||
|
||||
PID namespaces are also of increased complexity as they enable something completely new in Linux: PID 1 processes that may terminate without the system. That is, the init process of an ordinary Linux system survives until reboot, whereas the init process of a container survives only until the container exits. This raises issues with cleanup, such as CVE-2019-20794 where FUSE filesystems aren't correctly cleaned up on PID namespace exit. Vulnerabilities that PID namespaces protect against are quite hard to find, but a good example is CVE-2012-0056. A bug existed where a \texttt{setuid} binary could be coerced into writing to an arbitrary process's memory. However, if one can't see the processes in their \texttt{/proc} because of the protection of PID namespaces, this bug is avoided.
|
||||
@ -437,9 +438,6 @@ The filesystem on Linux provides access to most of the system. It follows that a
|
||||
Compared to network namespaces, there is a huge difference in what occurs when a new namespace is created. When creating a new network namespace, the ideal conditions for a void process are created - a network namespace containing only a loopback adapter. That is, the process has no ability to interact with the outside network, and no immediate relation to the parent network namespace. To interact with alternate namespaces, one must explicitly create a connection between the two, or move a physical adapter into the new (empty) namespace. Mount namespaces, rather than creating a new and empty namespace, made the choice to create a copy of the parent namespace, in a copy-on-write fashion. That is, after creating a new mount namespace, the mount hierarchy appears much the same as before. This is shown in Listing \ref{lst:unshare-cat-passwd}, where the file \texttt{/etc/passwd} is shown before and after an unshare, revealing the same content.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:unshare-cat-passwd}
|
||||
\caption{Reading the same file before and after unsharing the mount namespace demonstrates no observable change in behaviour, showing that more work must be done to create an empty namespace.}
|
||||
|
||||
\begin{minted}{c}
|
||||
int main() {
|
||||
int fd;
|
||||
@ -474,6 +472,9 @@ bin:x:2:2:bin:/bin:/usr/sbin/nologin
|
||||
sys:x:3:3:sys:/dev:/usr/sbin/nologin
|
||||
...
|
||||
\end{minted}
|
||||
|
||||
\caption{Reading the same file before and after unsharing the mount namespace demonstrates no observable change in behaviour, showing that more work must be done to create an empty namespace.}
|
||||
\label{lst:unshare-cat-passwd}
|
||||
\end{listing}
|
||||
|
||||
\subsection{Shared subtrees}
|
||||
@ -484,9 +485,6 @@ While some other namespaces are copy-on-write, for example UTS namespaces, they
|
||||
Shared subtrees \citep{pai_shared_2005} were introduced to provide a consistent view of the unified hierarchy between namespaces. Consider the example in Listing \ref{lst:shared-subtrees}. \texttt{unshare(1)} creates a non-shared tree, which presents the behaviour shown. Although \texttt{/mnt/cdrom} from the parent namespace has been bind mounted in the new namespace, the content of \texttt{/mnt/cdrom} is not the same. This is because the filesystem newly mounted on \texttt{/mnt/cdrom} is unavailable in the separate mount namespace. To combat this, shared subtrees were introduced. That is, as long as \texttt{/mnt/cdrom} resides on a shared subtree, the newly mounted filesystem will be available to a bind of \texttt{/mnt/cdrom} in another namespace. \texttt{systemd} made the choice to mount \texttt{/} as a shared subtree \citep{free_software_foundation_mount_namespaces7_2021}:
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:shared-subtrees}
|
||||
\caption{Parallel shell sessions showing highly separated behaviour without shared subtrees between mount namespaces. A folder in the parent namespace that is bound may still show different results in each namespace if the mounts have changed.}
|
||||
|
||||
\begin{minipage}{.49\textwidth}
|
||||
|
||||
\begin{minted}{shell-session}
|
||||
@ -517,6 +515,9 @@ file_1 file_2
|
||||
\end{minted}
|
||||
|
||||
\end{minipage}
|
||||
|
||||
\caption{Parallel shell sessions showing highly separated behaviour without shared subtrees between mount namespaces. A folder in the parent namespace that is bound may still show different results in each namespace if the mounts have changed.}
|
||||
\label{lst:shared-subtrees}
|
||||
\end{listing}
|
||||
|
||||
\say{Notwithstanding the fact that the default propagation type for new mount is in many cases \texttt{MS\_PRIVATE}, \texttt{MS\_SHARED} is typically more useful. For this reason, \texttt{systemd(1)} automatically remounts all mounts as \texttt{MS\_SHARED} on system startup. Thus, on most modern systems, the default propagation type is in practice \texttt{MS\_SHARED}.}
|
||||
@ -533,9 +534,6 @@ Referring again to network namespaces, sockets continue to exist in their initia
|
||||
Something which behaves differently is the memory mapping of a currently running process's binary. Consider the example in Listing \ref{lst:unshare-umount}, which shows a short C program and the result of running it. It is seen that the \texttt{/} mount is busy when attempting the unmount. Given that the process was created in the parent namespace, the behaviour of file descriptors would suggest that the process would maintain a link to the parent namespace for its own memory mapped regions. However, the fact that the otherwise empty namespace has a busy mount shows that this is not the case.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:unshare-umount}
|
||||
\caption{Attempting to unmount the private root directory after an unshare results in an error that the resource is busy when no files have been opened on it in the new namespace.}
|
||||
|
||||
\begin{minted}{c}
|
||||
int main() {
|
||||
if (unshare(CLONE_NEWNS))
|
||||
@ -550,14 +548,13 @@ if (umount("/"))
|
||||
umount: Device or resource busy
|
||||
\end{minted}
|
||||
|
||||
\caption{Attempting to unmount the private root directory after an unshare results in an error that the resource is busy when no files have been opened on it in the new namespace.}
|
||||
\label{lst:unshare-umount}
|
||||
\end{listing}
|
||||
|
||||
A feature called lazy unmounting or \texttt{MNT\_DETACH} exists for situations where a busy mount still needs to be unmounted. Supplying the \texttt{MNT\_DETACH} flag to \texttt{umount2(2)} causes the mount to be immediately detached from the unified hierarchy, while remaining mounted internally until the last user has finished with it. Whilst this initially seems like a good solution, this system call is incredibly dangerous when combined with shared subtrees. This behaviour is shown in Listing \ref{lst:unshare-umount-lazy}, where a lazy (and hence recursive) unmount is combined with a shared subtree to disastrous effect.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:unshare-umount-lazy}
|
||||
\caption{Parallel shell sessions demonstrating the behaviour in the parent namespace when attempting to lazily unmount the root filesystem from an unshared shell with a shared mount. The mount of procfs in the parent is lost even though the unmount was performed in a different namespace.}
|
||||
|
||||
\begin{minipage}{.49\textwidth}
|
||||
|
||||
\begin{minted}{shell-session}
|
||||
@ -585,6 +582,9 @@ directory
|
||||
\end{minted}
|
||||
|
||||
\end{minipage}
|
||||
|
||||
\caption{Parallel shell sessions demonstrating the behaviour in the parent namespace when attempting to lazily unmount the root filesystem from an unshared shell with a shared mount. The mount of procfs in the parent is lost even though the unmount was performed in a different namespace.}
|
||||
\label{lst:unshare-umount-lazy}
|
||||
\end{listing}
|
||||
|
||||
This behaviour raises questions about why a shared subtree, which exists as an object, would need to be detached recursively - decreasing the reference count to the shared subtree itself would seem sufficient. The inconsistency is best explained by looking at the development timeline for the three features here: mount namespaces, shared subtrees, and recursive lazy unmounts. When lazy unmounting was added, in September 2001, the author said the following \citep{viro_patch_2001}:
|
||||
@ -616,9 +616,6 @@ To create an effective void process content must be written to the files \texttt
|
||||
User namespaces again interact with \texttt{procfs}, which brings up an interesting limitation to the capabilities available in user namespaces. On most systems, \texttt{procfs} has a variety of mounts over parts of it. This might be to interact with a hypervisor such as Xen, to support \texttt{binfmt\_misc} for running special applications, or Docker protecting the host from container mishaps. Most interestingly with Docker, these mounts are used to protect the host from the container accessing certain files. The series of mounts on one of my machines is shown in Listing \ref{lst:docker-procfs}. The objects mounted over include \texttt{/proc/kcore}, which presents direct access to all of the kernel's allocatable memory. Linux protects these mounts by enforcing that \texttt{procfs} with mounts below it can only be mounted in a new place if the user has root privilege in the init namespace. Fortunately, one can instead perform a small dance of first binding \texttt{/proc} to the parent namespace before remounting it, which is allowed with mounts below. Further, by running the void process with restricted authority (limited to that of the calling user even as root), the dangerous files in \texttt{/proc} are protected using discretionary access control. This avoids the requirement of adding extra mounts in the void orchestrator.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:docker-procfs}
|
||||
\caption{The mounts at and below /proc in an Ubuntu Docker container demonstrate the many additional mounts on top of procfs.}
|
||||
|
||||
\begin{minted}{shell-session}
|
||||
# docker run --rm ubuntu cat /proc/mounts | grep proc
|
||||
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
|
||||
@ -634,6 +631,9 @@ tmpfs /proc/keys tmpfs rw,nosuid,size=65536k,mode=755 0 0
|
||||
tmpfs /proc/timer_list tmpfs rw,nosuid,size=65536k,mode=755 0 0
|
||||
tmpfs /proc/scsi tmpfs ro,relatime 0 0
|
||||
\end{minted}
|
||||
|
||||
\caption{The mounts at and below /proc in an Ubuntu Docker container demonstrate the many additional mounts on top of procfs.}
|
||||
\label{lst:docker-procfs}
|
||||
\end{listing}
|
||||
|
||||
User namespaces act as both a blessing and a curse for security. In the case of Docker, with CVE-2021-21284, a remapped user may be able to alter the initial source of the mappings, causing them to be overridden and gaining root access. In contrast with containerd, with CVE-2021-23021, an always root containerd daemon mounts files that shouldn't be accessible with DAC due to a logic error. Mapped user namespaces preserve DAC, protecting against this sort of incorrect code compared to a root daemon.
|
||||
@ -691,9 +691,6 @@ Given that statically giving sockets is infeasible and adding a firewall does no
|
||||
Filling a user namespace is a slightly odd concept compared to the namespaces already discussed in this section. A user namespace comes with no implicit mapping of users whatsoever (§\ref{sec:voiding-user}). To enable applications to be run with bounded authority, a single mapping is added by the Void Orchestrator of \texttt{root} in the child user namespace to the launching UID in the parent namespace. This means that the user with highest privilege in the container, \texttt{root}, will be limited to the access of the launching user. The behaviour of mapping \texttt{root} to the calling user is shown with the \texttt{unshare(1)} command in Listing \ref{lst:mapped-root-directory}, where a directory owned by the calling user, \texttt{alice}, appears to be owned by \texttt{root} in the new namespace. A file owned by \texttt{root} in the parent namespace appears to be owned by \texttt{nobody} in the child namespace, as no mapping exists for that file's user.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:mapped-root-directory}
|
||||
\caption{A directory listing before and after entering a user namespace with mapped root demonstrates filesystem objects owned by the mapped (calling) user shown as being owned by root and any other filesystem objects shown as being owned by nobody.}
|
||||
|
||||
\begin{minted}{shell-session}
|
||||
£ ls -ld repos owned_by_root
|
||||
-rw-r--r-- 1 root root 0 May 7 22:13 owned_by_root
|
||||
@ -705,6 +702,9 @@ drwxrwxr-x 7 alice alice 4096 Feb 27 17:52 repos
|
||||
-rw-r--r-- 1 nobody nogroup 0 May 7 22:13 owned_by_root
|
||||
drwxrwxr-x 7 root root 4096 Feb 27 17:52 repos
|
||||
\end{minted}
|
||||
|
||||
\caption{A directory listing before and after entering a user namespace with mapped root demonstrates filesystem objects owned by the mapped (calling) user shown as being owned by root and any other filesystem objects shown as being owned by nobody.}
|
||||
\label{lst:mapped-root-directory}
|
||||
\end{listing}
|
||||
|
||||
The way user namespaces are currently used creates a binary system: either a file appears as owned by \texttt{root} if owned by the calling user, or appears as owned by \texttt{nobody} if not (ignoring groups for clarity, though their behaviour is similar). One questions whether more users could be mapped in, but this presents additional difficulties. Firstly, the \texttt{setgroups(2)} system call must be denied to achieve correct behaviour in the child namespace. This is because the \texttt{root} user in the child namespace has full capabilities, which include \texttt{CAP\_SETGID}. This means that the user in the namespace can drop their groups, potentially allowing access to materials which the creating user did not have (consider a file with permissions \texttt{0707}). This limits the utility of switching user in the child namespace, as the groups must remain the same. Secondly, mapping to users and groups other than oneself requires \texttt{CAP\_SETUID} or \texttt{CAP\_SETGID} in the parent namespace. Avoiding this is well advised to reduce the ambient authority of the shim.
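The single-mapping arrangement described above can be sketched as follows. This is an illustration under stated assumptions rather than the void orchestrator's own code: the function names \texttt{root\_mapping} and \texttt{write\_root\_mapping} are hypothetical, the user namespace is assumed to have been created already, and the caller supplies the procfs directory of the target process (for example \texttt{/proc/self}) along with the UID and GID recorded before unsharing.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// One uid_map or gid_map line mapping ID 0 inside the namespace
/// to `outer_id` outside it, covering a range of exactly one ID.
fn root_mapping(outer_id: u32) -> String {
    format!("0 {} 1\n", outer_id)
}

/// Write the mapping files for a process whose procfs directory is
/// `proc_dir`. The order matters for an unprivileged writer:
/// setgroups must be set to "deny" before gid_map is written.
fn write_root_mapping(proc_dir: &Path, uid: u32, gid: u32) -> io::Result<()> {
    fs::write(proc_dir.join("setgroups"), "deny")?;
    fs::write(proc_dir.join("uid_map"), root_mapping(uid))?;
    fs::write(proc_dir.join("gid_map"), root_mapping(gid))?;
    Ok(())
}
```

Because each map contains a single line with a range of one, every other owner in the parent namespace remains unmapped and is displayed as \texttt{nobody}, matching the binary system described above.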
|
||||
@ -756,9 +756,6 @@ There are two types of entrypoints: those spawned at startup, and those spawned
|
||||
To begin displaying the power of the void orchestrator system we will develop an application that requires completely minimal privilege. The application and its fixed output are shown, unmodified, in Listing \ref{lst:fibonacci-application}. The application is written in Rust, my language of choice, but there is no such requirement - an equivalent program would look very similar in C. The limited code of this example makes the privilege requirements quite clear. Computing \texttt{fib} requires no privilege at all, operating purely on numbers on the stack. Once the values are computed they are printed using the \texttt{println!} macro, which prints to stdout. Therefore the only privilege this application requires to correctly run is access to stdout.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:fibonacci-application}
|
||||
\caption{A basic Fibonacci application. The application computes elements of the Fibonacci sequence on static indices and does not process any user input.}
|
||||
|
||||
\begin{minted}{rust}
|
||||
fn main() {
|
||||
println!("fib(1) = {}", fib(1));
|
||||
@ -779,14 +776,14 @@ fib(1) = 1
|
||||
fib(7) = 13
|
||||
fib(19) = 4181
|
||||
\end{minted}
|
||||
|
||||
\caption{A basic Fibonacci application. The application computes elements of the Fibonacci sequence on static indices and does not process any user input.}
|
||||
\label{lst:fibonacci-application}
|
||||
\end{listing}
|
||||
|
||||
To run this application as a void process we require a specification (§\ref{sec:system-design}) to detail how the processes of the application should be set up. The specification for the Fibonacci application is given in Listing \ref{lst:fibonacci-application-spec}. When specifying an entrypoint for an application every privilege needed must be specified explicitly. In this case, as discussed, the application only requires special access to stdout. This is specified in the environment section of the entrypoint. We also see in the specification a variety of libraries made available, required for the application to successfully dynamically link. This information is decidable from the binary, but implementing that is left for future work (§\ref{sec:future-work-dynamic-linking}). We also see that no arguments are specified, although they are a part of the specification. When no arguments are specified none are passed, as the void orchestrator minimises privilege by default. The application void process therefore receives no arguments at all, not even \texttt{arg0} as the binary name.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:fibonacci-application-spec}
|
||||
\caption{The specification for the void orchestrator to run the application shown in Listing \ref{lst:fibonacci-application}. A single entrypoint is provided with a minimal environment, including only the content to dynamically link the binary and standard output.}
|
||||
|
||||
\begin{minted}{json}
|
||||
{"entrypoints": { "fib": { "environment": [
|
||||
"Stdout",
|
||||
@ -810,6 +807,9 @@ To run this application as a void process we require a specification (§\ref{sec
|
||||
}
|
||||
]}}}
|
||||
\end{minted}
|
||||
|
||||
\caption{The specification for the void orchestrator to run the application shown in Listing \ref{lst:fibonacci-application}. A single entrypoint is provided with a minimal environment, including only the content to dynamically link the binary and standard output.}
|
||||
\label{lst:fibonacci-application-spec}
|
||||
\end{listing}
|
||||
|
||||
More of the advanced features of the system will be shown in the future examples, but this is enough to get a basic application up and running. We can see that the Rust application looks exactly like it would without the shim, at least for now. The application is also fully deprivileged. Of course, for an application as small as this example, we can verify by hand that the program has no foul effects. We can imagine a trivial extension that would make this program more dangerous: using a user argument (a privilege the program does not currently have) to take a value on which to execute fib. One way this user input could cause damage is with flawed usage of a logging library. The recent example of Log4j2 with CVE-2021-44228 springs to mind, enabling an attacker with string control to execute arbitrary code from the Internet. A void process with privilege of only arguments and stdout would protect well against this vulnerability, as not only is there no Internet access to pull remote code, but there is nothing to take advantage of in the process even if remote code execution is gained.
|
||||
@ -842,9 +842,6 @@ Rather than presenting the complete applications as shown in the previous two se
|
||||
The special privilege required by a process which accepts TCP connections is a listening TCP socket. As discussed in Section \ref{sec:filling-net}, TCP listening sockets are handed already bound to void processes. This enables a capability model for network access, otherwise restricting inbound and outbound networking entirely. The specification for this listener is given in Listing \ref{lst:tls-tcp-listener-spec}, where the TCP listener is requested as an argument already bound. No other permissions are required to accept connections from a TCP listener. Although the code at each stage is omitted for brevity, the resulting program has to parse the argument back into an integer and then a \texttt{TcpListener} before looping to receive incoming connections. When building and debugging software it is often useful to have access to the \texttt{stdout} or \texttt{stderr} streams, even though they won't be utilised in production. The void orchestrator provides useful \texttt{--stdout} and \texttt{--stderr} flags to temporarily privilege an application for debugging without modifying its specification. Of course, we can't do much useful with them without more privilege. Thus we move on to developing the HTTP handler.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:tls-tcp-listener-spec}
|
||||
\caption{The void orchestrator specification for the TCP listener endpoint of the TLS application. The privilege to use a TCP listener is requested as an argument. Dynamic linking binds are omitted for brevity.}
|
||||
|
||||
\begin{minted}{json}
|
||||
{"entrypoints": { "tcp_listener": {
|
||||
"args": [
|
||||
@ -852,6 +849,9 @@ The special privilege required by a process which accepts TCP connections is a l
|
||||
]
|
||||
}}}
|
||||
\end{minted}
|
||||
|
||||
\caption{The void orchestrator specification for the TCP listener endpoint of the TLS application. The privilege to use a TCP listener is requested as an argument. Dynamic linking binds are omitted for brevity.}
|
||||
\label{lst:tls-tcp-listener-spec}
|
||||
\end{listing}
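The argument handling described for the TCP listener entrypoint can be sketched with the standard library alone. This is a minimal sketch, assuming the file descriptor arrives as a decimal string and refers to a listening socket that is already bound. The function name \texttt{listener\_from\_arg} is hypothetical and is not taken from the void orchestrator.

```rust
use std::io;
use std::net::TcpListener;
use std::os::unix::io::{FromRawFd, RawFd};

/// Turn an argument such as "3" into a TcpListener wrapping that
/// file descriptor. Assumes the descriptor is an open, bound,
/// listening TCP socket that nothing else in the process owns.
fn listener_from_arg(arg: &str) -> io::Result<TcpListener> {
    let fd: RawFd = arg
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    // SAFETY: relies on the assumption above. The process that set
    // up this entrypoint is trusted to have passed a valid listening
    // socket, and ownership of it moves into the returned value.
    Ok(unsafe { TcpListener::from_raw_fd(fd) })
}
```

An entrypoint would then iterate over \texttt{listener.incoming()} to accept connections, handing each accepted stream to the next stage.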
|
||||
|
||||
\subsection{HTTP handler}
|
||||
@ -864,9 +864,6 @@ In this case, we are going to add a new entrypoint for two reasons: multiprocess
|
||||
The HTTP handler entrypoint is added to the specification in Listing \ref{lst:tls-http-handler-spec}. As well as adding a single extra argument to trigger the HTTP handler, we must also add an entrypoint argument to differentiate between the two entrypoints. Much like the usage of \texttt{arg0} for symlinked binaries, we utilise \texttt{arg0} to find which intended use of the binary is being called.
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:tls-http-handler-spec}
|
||||
\caption{The void orchestrator specification for the TCP listener endpoint and HTTP handler endpoint of the TLS application. This extends on Listing \ref{lst:tls-tcp-listener-spec} by adding the HTTP handler endpoint. A new File Socket is used to link the two entrypoints together. Dynamic linking binds are omitted for brevity.}
|
||||
|
||||
\begin{minted}{json}
|
||||
{"entrypoints": {
|
||||
"tcp_listener": {
|
||||
@ -886,12 +883,12 @@ The HTTP handler entrypoint is added to the specification in Listing \ref{lst:tl
|
||||
}
|
||||
}}
|
||||
\end{minted}
|
||||
|
||||
\caption{The void orchestrator specification for the TCP listener endpoint and HTTP handler endpoint of the TLS application. This extends on Listing \ref{lst:tls-tcp-listener-spec} by adding the HTTP handler endpoint. A new File Socket is used to link the two entrypoints together. Dynamic linking binds are omitted for brevity.}
|
||||
\label{lst:tls-http-handler-spec}
|
||||
\end{listing}
|
||||
|
||||
\begin{listing}
|
||||
\label{lst:tls-main-function}
|
||||
\caption{The main function for the TLS server. This matches on the entrypoint arg0 to determine which entrypoint the application has been run for.}
|
||||
|
||||
\begin{minted}{rust}
|
||||
fn main() {
|
||||
match std::env::args().next() {
|
||||
@ -904,6 +901,9 @@ fn main() {
    }
}
\end{minted}

\caption{The main function for the TLS server. This matches on the entrypoint arg0 to determine which entrypoint the application has been run for.}
\label{lst:tls-main-function}
\end{listing}
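The \texttt{arg0} dispatch described above can be sketched in isolation. This is a minimal sketch rather than the project's actual code: the \texttt{Entrypoint} enum, the \texttt{select\_entrypoint} helper and the \texttt{http\_handler} name are hypothetical, and taking the final path component of \texttt{arg0} is an assumption made so that the same logic also works for symlinked binaries.

```rust
use std::path::Path;

/// The entrypoints this binary can be started as (hypothetical names).
#[derive(Debug, PartialEq)]
enum Entrypoint {
    TcpListener,
    HttpHandler,
}

/// Choose an entrypoint from arg0.
///
/// Assumption: only the final path component matters, so that
/// "/usr/bin/tcp_listener" and "tcp_listener" select the same entrypoint.
fn select_entrypoint(arg0: Option<&str>) -> Result<Entrypoint, String> {
    let name = arg0
        .and_then(|a| Path::new(a).file_name())
        .and_then(|n| n.to_str());

    match name {
        Some("tcp_listener") => Ok(Entrypoint::TcpListener),
        Some("http_handler") => Ok(Entrypoint::HttpHandler),
        Some(other) => Err(format!("unknown entrypoint: {}", other)),
        None => Err("missing arg0".to_string()),
    }
}

fn main() {
    // The first element of args() is arg0, which the launcher controls.
    let arg0 = std::env::args().next();
    match select_entrypoint(arg0.as_deref()) {
        Ok(entry) => println!("selected {:?}", entry),
        Err(e) => eprintln!("{}", e),
    }
}
```

Keeping the selection in a pure function separates it from the argument processing of each entrypoint, so it can be checked without launching a process.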

\subsection{TLS handler}
@ -914,9 +914,6 @@ The final stage is to add the TLS handling into the mix. Once again we have the
The resulting specification is given in Listing \ref{lst:tls-spec}. The TLS handler is added in a very similar manner to the previous HTTP handler. It is triggered by a file socket, but this time receives another file socket to trigger the next stage. It receives file descriptor capabilities to each of the certificate and private key files, along with the TCP stream. This process receives nothing but highly restricted capabilities, ensuring that there is very little attack surface for compromise.

\begin{listing}
\label{lst:tls-spec}
\caption{The void orchestrator specification for the final TLS application. This extends on Listing \ref{lst:tls-tcp-listener-spec} by adding the HTTP handler endpoint. A new File Socket is used to link the two entrypoints together. Dynamic linking binds are omitted for brevity.}

\begin{minted}{json}
{"entrypoints": {
  "connection_listener": {
@ -946,6 +943,9 @@ The resulting specification is given in Listing \ref{lst:tls-spec}. The TLS hand
  }
}}
\end{minted}

\caption{The void orchestrator specification for the final TLS application. This extends on Listing \ref{lst:tls-tcp-listener-spec} by adding the HTTP handler endpoint. A new File Socket is used to link the two entrypoints together. Dynamic linking binds are omitted for brevity.}
\label{lst:tls-spec}
\end{listing}
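To illustrate how a process might consume one of the file descriptor capabilities described above, the following is a minimal sketch under stated assumptions. It assumes the descriptor number arrives as a decimal command-line argument, which is a guess at the convention and is not taken from the void orchestrator itself. The helpers \texttt{parse\_fd} and \texttt{read\_inherited} are hypothetical names, and only the Rust standard library is used.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::io::{FromRawFd, RawFd};

/// Parse a decimal descriptor number from an argument (assumed format).
fn parse_fd(arg: &str) -> Result<RawFd, std::num::ParseIntError> {
    arg.parse::<RawFd>()
}

/// Take ownership of an inherited descriptor and read it to the end.
///
/// Safety: the caller must guarantee that `fd` is open and is not owned
/// by any other object in this process. It is closed when this returns.
unsafe fn read_inherited(fd: RawFd) -> io::Result<Vec<u8>> {
    let mut file = unsafe { File::from_raw_fd(fd) };
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    // Hypothetical convention: the first argument after arg0 is the descriptor.
    match std::env::args().nth(1) {
        Some(arg) => match parse_fd(&arg) {
            Ok(fd) => match unsafe { read_inherited(fd) } {
                Ok(bytes) => println!("read {} bytes from fd {}", bytes.len(), fd),
                Err(e) => eprintln!("failed to read fd {}: {}", fd, e),
            },
            Err(e) => eprintln!("invalid descriptor argument: {}", e),
        },
        None => eprintln!("usage: <binary> <fd>"),
    }
}
```

Because the process only ever sees an already-open descriptor, it needs no view of the filesystem path behind it, which is the property the specification relies on.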

We now have a full specification for a TLS server. In this section I have focused entirely on building up the specification and not the code behind it. There are two reasons for this: the code has a lot of boilerplate argument processing, and a variety of code implementations are available. The boilerplate argument processing could be addressed with future work using features like proc macros in Rust, which generate code at compile time based on the code that is already there (§\ref{sec:future-work-macros}). As for varying implementations, I chose to use the static library \texttt{rustls} to implement my TLS server. Perhaps someone else would prefer OpenSSL or LibreSSL, which is of course fine. For the HTTP part I use a random library I found on the Internet to parse HTTP headers before responding only to GET requests. Of course this approach is hugely error-prone, but the separation of the HTTP handler from the sensitive TLS material and other parts of the filesystem increases my confidence. The implementation therefore matters very little in this analysis, but is made available at \url{https://github.com/JakeHillion/void-orchestrator/tree/main/examples/tls} and alongside this dissertation.
@ -968,6 +968,7 @@ Every void process created requires a set of 7 unique namespaces, which is a lot
\begin{figure}
\centering
\includegraphics[width=0.7\textwidth]{graphs/namespace_times.png}

\caption{Performance of making the \texttt{clone(2)} system call with varying namespace creation flags. The test is run in a tight compiled C loop with high precision timings taken before and after each new process is cloned and waited for. \texttt{clone(2)} presents very noisy results on a system with background activity.}
\label{fig:namespace-times}
\end{figure}
@ -975,6 +976,7 @@ Every void process created requires a set of 7 unique namespaces, which is a lot
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{graphs/namespace_stacked_times.png}

\caption{Performance of making the \texttt{clone(2)} system call with an increasing number of namespace creation flags. The effects of Figure \ref{fig:namespace-times} are amplified when creating multiple namespaces in a single call this frequently. There is a clear divide between the time taken for user, pid, uts, and cgroup namespaces and ipc, ns and net namespaces.}
\label{fig:namespace-stacked-times}
\end{figure}
@ -982,6 +984,7 @@ Every void process created requires a set of 7 unique namespaces, which is a lot
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{graphs/fib_launch_times.png}

\caption{A box plot comparing the performance of the Fibonacci example (§\ref{sec:building-fib}) under the shim and called directly. The median time to run under the shim is approximately 800\% of the time without. The inter-quartile range and range of results are also much larger.}
\label{fig:fib-launch-times}
\end{figure}
@ -992,6 +995,7 @@ Every void process created requires a set of 7 unique namespaces, which is a lot
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{graphs/tls-server-requests-per-second.png}

\caption{\texttt{a2bench} requests per second results over 10 seconds with 100 simultaneous requests on varying response sizes. As the response size increases, the gap between the \texttt{apache2} TLS web server and the void process TLS web server decreases.}
\label{fig:tls-performance}
\end{figure}