Proxying packets is the process of taking packets that arrive at one location and transporting them to leave at another. This chapter focuses on the preparatory work to achieve this practically and securely, given the design outlined in the previous chapter, in which the proxy consolidates multiple connections to appear as one to both the wider Internet and devices on the local network. In Sections \ref{section:risk-analysis} and \ref{section:preparation-security}, I discuss the security risks and plans to confront them. In Section \ref{section:language-selection}, I present three candidate languages (C++, Rust and Go) and provide context for choosing Go as the implementation language. Finally, in Sections \ref{section:requirements-analysis} and \ref{section:engineering-approach}, I present a requirements analysis and a description of the engineering approach for the project.
Any connection between two computers presents a set of security risks, and a proxy adds further risks of its own, as additional attack vectors are created by the proxy itself. This section first considers layered security: if we treat the Local Proxy and Remote Proxy, together with everything in between, as a single Internet connection, layered security asks how the risks of this combined connection compare to those of a standard Internet connection, and what guarantees must be made so that a proxied connection carries the same risks as a standard one.
The transportation of packets occurs in three segments, as shown in Figure \ref{fig:security-zones}. The first segment is Client-to-Proxy, which occurs physically within the local zone. The second is Proxy-to-Proxy, which occurs across the Internet. The third is Proxy-to-Server, which also occurs across the Internet. With the goal of providing security equivalent to a standard connection, the Client-to-Proxy communication can be considered safe: it is equivalent to connecting a client directly to a modem. Therefore, this section focuses on the Proxy-to-Proxy and Proxy-to-Server transports. The threat model for this analysis is described next.
The threat model considered here is that packets can be injected, read, and black-holed at any point in the Internet. This is the model employed by \cite{dolev_security_1983}, in which the attacker has full control of a message while it is in transit across the Internet. Private networks are considered safe, covering both the connection between the client and the local proxy, and any connections within a VPN (Virtual Private Network).
There are four locations to insert or remove packets in the transport between the two proxies, as shown in Figure \ref{fig:security-zones-attackers}. In this case, Alice can insert packets to the local proxy to be sent to the client, Bob can insert packets to the remote proxy to be sent to the world, Charlie can steal packets from the local proxy destined for the remote proxy, and Dave can steal packets from the remote proxy destined for the local proxy. Each of these will be examined for the impact that it would cause.
Alice inserting packets of their choosing into the local proxy presents no additional impact over a standard connection. Considering a client connected directly to a modem, any device on the Internet is able to send packets to that modem. Rather than inserting packets at the local proxy, Alice could simply send them to the remote proxy instead, achieving the same effect. As such, inserting packets destined for the client presents no additional risk.
Bob inserting packets of their choosing into the remote proxy creates both legal risk and additional cost for the user. For example, Bob may insert packets of an illegal nature destined for Frank. As the machine running the remote proxy is the user's responsibility, these packets would appear to have come from the user. Similarly, if using a metered service, as with many cloud providers, all traffic inserted by Bob will be billed to the user. It is therefore highly important to prevent attackers such as Bob from inserting packets that will be forwarded as if they came from the user.
Charlie and Dave black-holing packets presents the same risk in either direction: denial of service. Even if only a small percentage of packets can be stolen, the increase in packet loss has a significant effect on any loss-based congestion control mechanism. This applies both to the tunnelled flows and to the congestion-controlled flows between the proxies themselves. Mitigations will focus on opportunities for stealing packets that are unique to this proxy setup, such as convincing one proxy to send a portion of its outgoing packets to the attacker. As stated in Section \ref{section:threat-model}, attackers are able to black-hole packets going to a server on the Internet regardless of the presence of this proxy, so this will not be mitigated.
Packets between the proxy and server are transmitted openly across the Internet. As this proxy transports entire IP packets at layer 3, no security guarantees need be maintained once the IP packet has left the remote proxy, as it is the responsibility of the application to provide its own security guarantees. Maintaining the same level of security as a standard connection can therefore be achieved by ensuring that the packets which leave one side of a proxy are a subset of the packets that entered the other side.
This section provides means of alleviating the risks given in Section \ref{section:risk-analysis}. To achieve this goal, the authenticity of packets will be verified. Authenticity in this context means two properties of the object hold: integrity and freshness \citep[pp. 14]{anderson_security_2008}. Integrity guarantees that any modification between the sending and receiving proxies can be detected, while freshness guarantees that reuse of a previously transmitted packet can be detected.
To provide integrity and freshness for each message, I evaluate two choices: Message Authentication Codes (MACs) and digital signatures. A MAC is a hash digest generated from a concatenation of data and a secret key, appended to the data before transmission. Anyone sharing the secret key can perform an identical operation to verify the hash and, therefore, the integrity of the data \citep[pp. 352]{menezes_handbook_1997}. A digital signature is generated using the private key of a public/private keypair, proving that the message was signed by the owner of the private key; it can be verified by anyone with the corresponding public key \citep[pp. 147-149]{anderson_security_2008}. In each case, a code is appended to the message, such that the integrity and authenticity of the message can be verified.
As both proxy servers are controlled by the same party, non-repudiation (knowing which side of the proxy provided the authenticity guarantee for a message) is not necessary. This leaves MACs as the message authentication of choice for this project: producing a MAC is less computationally complex than producing a digital signature, and the non-repudiation that MACs forgo is not required.
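As an illustration of this choice, the following minimal Go sketch shows how a per-packet MAC could be produced and verified using HMAC-SHA256 from the standard library. The construction, hash function and names here are illustrative rather than those of the final implementation.

\begin{minted}{go}
package sketch

import (
	"crypto/hmac"
	"crypto/sha256"
)

// appendMAC computes an HMAC-SHA256 tag over the packet using the
// key shared between the two proxies, and returns the packet with
// the tag appended.
func appendMAC(packet, key []byte) []byte {
	h := hmac.New(sha256.New, key)
	h.Write(packet)
	return h.Sum(packet) // Sum appends the digest to its argument
}

// verifyMAC splits the trailing tag from the received data and checks
// it in constant time, returning the original packet and whether the
// tag was valid.
func verifyMAC(data, key []byte) ([]byte, bool) {
	if len(data) < sha256.Size {
		return nil, false
	}
	packet, tag := data[:len(data)-sha256.Size], data[len(data)-sha256.Size:]
	h := hmac.New(sha256.New, key)
	h.Write(packet)
	return packet, hmac.Equal(h.Sum(nil), tag)
}
\end{minted}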
Beyond authenticating the messages themselves, the connection built between the two proxies must be authenticated. Consider a person-in-the-middle attack, in which an attacker initially forwards packets between the two proxies, then stops forwarding them and instead black-holes them. This creates the denial of service mentioned in the previous section.
To prevent such forwarding attacks, the connection itself must be authenticated. I present two methods to solve this, the first being address allow-lists, and the second authenticating the IP address and port of each sent packet. The first solution is static, and simply states that the listening proxy may only respond to new communications when the IP address of the communication is in an approved set. This verifies that the connection is from an approved party, as they must control that IP to create a two-way communication from it.
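A minimal sketch of such a check is given below, assuming a hypothetical approved set keyed by IP address; the listening proxy would consult a check of this form before responding to a new communication. The set and function name are illustrative, not the project's configuration format.

\begin{minted}{go}
package sketch

import "net"

// allowedSource reports whether a new communication originates from
// an address in the approved set.
func allowedSource(remote net.Addr, approved map[string]struct{}) bool {
	host, _, err := net.SplitHostPort(remote.String())
	if err != nil {
		return false
	}
	_, ok := approved[host]
	return ok
}
\end{minted}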
The second is a more dynamic solution. The IP authentication header \citep{kent_ip_2005} achieves this by protecting all immutable parts of the IP header with an authentication code. In the case of this software, authenticating the source IP address, source port, destination IP address, and destination port ensures connection authenticity. By authenticating these addresses, which can be checked easily at both ends, it can be confirmed that both devices knew with whom they were talking, and from where the connection was initiated. That is, an owner of the shared key authorised this communication path.
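As a hedged illustration of this second method, the sketch below binds the connection's source and destination addresses and ports to the payload before the per-packet MAC is computed, so that a packet presented from a different source or destination fails verification. The field layout and function name are hypothetical, not the project's wire format.

\begin{minted}{go}
package sketch

import (
	"encoding/binary"
	"net"
)

// authInput prepends the source and destination addresses and ports
// to the payload, so that a MAC computed over the result also
// authenticates the path the packet was sent over.
func authInput(srcIP net.IP, srcPort uint16, dstIP net.IP, dstPort uint16, payload []byte) []byte {
	buf := make([]byte, 0, 36+len(payload))
	var port [2]byte

	buf = append(buf, srcIP.To16()...) // 16-byte form covers IPv4 and IPv6
	binary.BigEndian.PutUint16(port[:], srcPort)
	buf = append(buf, port[:]...)

	buf = append(buf, dstIP.To16()...)
	binary.BigEndian.PutUint16(port[:], dstPort)
	buf = append(buf, port[:]...)

	return append(buf, payload...)
}
\end{minted}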
However, both of these solutions have shortfalls when Network Address Translation (NAT) is involved. The second solution, authenticating addresses, fails with any form of NAT: the IPs and ports of the packets sent by the sending proxy differ from those seen by the receiving proxy, and therefore cannot be authenticated. The first solution, providing a set of approved addresses, fails with Carrier-Grade NAT (CG-NAT), as many users share the same IP address, and hence anyone behind the same IP could perform an attack. In most cases one of these solutions will work; otherwise, one can fall back to the security layering presented in Section \ref{section:layered-security}.
To ensure the freshness of received packets, an anti-replay algorithm is employed. Replay protection in IP authentication headers is achieved by placing a strictly increasing sequence number on each packet. The algorithm that I have chosen to implement is the \emph{IPsec Anti-Replay Algorithm without Bit Shifting} \citep{tsou_ipsec_2012}, also employed in Wireguard \citep{donenfeld_wireguard_2017}.
When applying message authentication, it was sufficient to authenticate each message individually within its flow. However, replay protection must be applied across all flows connected to the proxy; otherwise, a packet could be repeated while appearing to belong to a different connection, and therefore remain undetected. This is similar to the design pattern of MPTCP's congestion control, where there is a separation between the sequence numbers of individual subflows and the sequence number of the data transport as a whole \citep[pp. 11]{wischik_design_2011}.
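To illustrate the principle, the sketch below implements a simplified sliding-window replay filter in Go. It is a stand-in for the chosen algorithm rather than a reproduction of it: the algorithm of \cite{tsou_ipsec_2012} partitions the window into fixed-size blocks so that advancing it never requires shifting bits, and the real window is larger than the illustrative size used here.

\begin{minted}{go}
package sketch

const windowSize = 64 // illustrative; the real window is larger and block-structured

// antiReplay is a simplified sliding-window replay filter, covering
// the windowSize sequence numbers at or immediately below the highest
// sequence number accepted so far.
type antiReplay struct {
	highest uint64
	seen    [windowSize]bool // seen[s%windowSize] for s in (highest-windowSize, highest]
}

// check reports whether seq is fresh, recording it if so.
func (a *antiReplay) check(seq uint64) bool {
	switch {
	case seq > a.highest:
		// Ahead of the window: clear the slots being skipped over,
		// then advance the top of the window to seq.
		for s := a.highest + 1; s <= seq; s++ {
			a.seen[s%windowSize] = false
			if s-a.highest >= windowSize {
				break // the whole window has already been cleared
			}
		}
		a.seen[seq%windowSize] = true
		a.highest = seq
		return true
	case a.highest-seq >= windowSize:
		return false // too old: the packet falls below the window
	case a.seen[seq%windowSize]:
		return false // already seen: a replay
	default:
		a.seen[seq%windowSize] = true
		return true
	}
}
\end{minted}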
It was previously mentioned that my solution is transparent to the higher layer security encapsulated by proxied packets. Further to this, my solution provides transparent security in the other direction, where proxied packets are encapsulated by another security solution. Consider the case of a satellite office that employs both a whole network corporate VPN and my solution. The network can be configured in each of two cases: the multipath proxy runs behind the VPN, or the VPN runs behind the multipath proxy.
Packet structures for proxied packets in each of these cases are given in Appendix \ref{appendix:layered-security}, as Figure \ref{fig:whole-network-vpn-behind} and Figure \ref{fig:whole-network-vpn-infront} for the VPN Wireguard \citep{donenfeld_wireguard_2017}. In Figure \ref{fig:whole-network-vpn-infront}, the proxies are only accessible via the VPN protected network. It can be seen that the packet in Figure \ref{fig:whole-network-vpn-infront} is shorter, given the removal of the message authentication code and the data sequence number. The data sequence number is unnecessary, given that Wireguard uses the same anti-replay algorithm, and thus replayed packets would have been caught entering the secure network. Further, the message authentication code is unnecessary, as the authenticity of packets is now guaranteed by Wireguard.
Supporting and encouraging this layering of protocols provides a second benefit: if the security of my solution degrades over time, there are two options to repair it. One can either fix the open-source application, or compose it with a security solution that is not broken, accepting that redundant security guarantees translate to additional overhead. To this end, the security features mentioned are all configurable, allowing for flexibility in implementation.
The benefits of using a VPN tunnel between the two proxies are shown in Figure \ref{fig:security-zones-vpn}. Whereas in Figure \ref{fig:security-zones} the proxy-to-proxy communication is across the unprotected Internet, in Figure \ref{fig:security-zones-vpn} this communication occurs across a secure overlay network. This allows the packet transport to be trusted, and avoids the need for additional verification. Further, it allows the application to remain secure in any situation where a VPN will work. Home users, in most cases, would use this solution with the inbuilt authentication mechanisms. Business users, who already have a need for a corporate VPN, would benefit from running my solution across VPN tunnels, avoiding the need to complete authentication work multiple times.
In this section, I evaluate three potential languages (C++, Rust and Go) for the implementation of this software. To support this evaluation, I have provided a sample program in each language. The sample program is a minimal example of reading packets from a TUN interface, placing them in a queue from a single thread, and consuming the packets from the queue with multiple threads. These examples are given in Figures \ref{fig:cpp-tun-sample} through \ref{fig:go-tun-sample}, in Appendix \ref{appendix:language-samples}. For each language, I considered the performance, code clarity, and the language ecosystem. This culminated in choosing Go for the implementation language.
I similarly evaluated two languages for the test suite: Python and Java. Though Python was initially chosen for rapid development and better ecosystem support, the final test suite is a combination of both Python and Java: Python for data processing, and Java for systems interaction.
There are two primary advantages to completing this project in C++: speed of execution, and being low-level enough to achieve the project's goals (although the latter turned out to be true of all the languages considered).
The negatives of using C++ are demonstrated by the sample program, given in Figure \ref{fig:cpp-tun-sample}: achieving even the base functionality of this project requires several times more code than Rust or Go (93 lines, compared to 34 for Rust and 48 for Go). This arises from the need to manually implement the required thread-safe queue, which is available as a library for Rust and included in the Go runtime. A manual implementation carries an additional risk of mistakes, particularly with regard to thread safety, which could cause undefined behaviour, security vulnerabilities, and great difficulty in debugging. Further, although open-source queues are available, they are not handled by a package manager, so security updates would have to be applied manually, risking the introduction of bugs. Finally, C++ does not provide any memory safety guarantees.
Rust is memory-safe and thread-safe, addressing the safety issues identified with C++. Rust also has a minimal runtime, allowing for an execution speed comparable to C or C++. The Rust sample is given in Figure \ref{fig:rust-tun-sample}, and is pleasantly concise.
For the purposes of this project, Rust's youth is a disadvantage. This is two-faceted: Integrated Development Environment (IDE) support and crate stability (crates being Rust's mechanism for package management). Firstly, Rust support in my IDE of choice is provided via a plugin to IntelliJ, and is not as mature as the support for the other languages. Secondly, the crate available for TUN support (tun-tap\footnote{\url{https://docs.rs/tun-tap/}}) does not yet provide a stable Application Programming Interface (API), which became apparent during development of the sample program. Between writing the program initially and re-testing it when documenting, the crate's API had changed to the point where the program no longer type checked. Further, the old version was no longer available, leaving me with a program that did not compile or function. Although I could write the TUN interaction layer myself, the safety benefits of Rust would be less pronounced, as direct systems interaction requires \texttt{unsafe} code, which bypasses parts of the type checker and borrow checker, increasing the potential for bugs.
The final language to evaluate is Go, often referred to as Golang. The primary difference between Go and the other two evaluated languages is the presence of a runtime. The code sample is provided in Figure \ref{fig:go-tun-sample}. Go is significantly higher level than the other two languages, and provides a memory management model that is both simpler than C++'s and more conventional than Rust's.
For the highly concurrent structure of this project, Go's focus on concurrency is extremely beneficial. Go provides channels in the standard runtime, which support any number of producers and consumers. In this project, both SPMC (Single Producer Multi Consumer) and MPSC (Multi Producer Single Consumer) queues are required, so having these provided as a first-class feature of the language is beneficial.
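The short, self-contained program below illustrates this: a single buffered channel serves as an SPMC queue between one producer goroutine and several consumers, and reversing the roles gives MPSC. It is an illustrative sketch rather than code from the project.

\begin{minted}{go}
package main

import (
	"fmt"
	"sync"
)

func main() {
	// A buffered channel acts as the queue; channels accept any number
	// of senders and receivers, so the same primitive serves as SPMC
	// here and as MPSC with the roles reversed.
	packets := make(chan []byte, 16)

	// Single producer.
	go func() {
		for i := 0; i < 8; i++ {
			packets <- []byte{byte(i)}
		}
		close(packets) // signals the consumers to finish
	}()

	// Multiple consumers.
	var wg sync.WaitGroup
	for c := 0; c < 4; c++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for p := range packets {
				fmt.Printf("consumer %d received %v\n", id, p)
			}
		}(c)
	}
	wg.Wait()
}
\end{minted}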
Garbage collection and first-class concurrency come together to make the code produced for this project highly readable, but they rely on a more complex runtime than the other two languages evaluated. The downside is that execution speed is negatively affected by this runtime. However, for the purposes of this first implementation, that compromise is acceptable. By producing code that makes the functionality of the application clear, future implementations could more easily be built to mirror it. Given the performance shown in Section \ref{section:performance-evaluation}, the benefits of compromising in favour of a well-suited, high-level language are clearly evident.
Python is a dynamically typed language, and it was chosen as the initial implementation language for the test suite. The first reason for this is \verb'matplotlib',\footnote{\url{https://matplotlib.org/}} a widely used graphing library that can produce the graphs needed for this evaluation. The second reason is \verb'proxmoxer'\footnote{\url{https://github.com/proxmoxer/proxmoxer}}, a fluent API for interacting with a Proxmox server.
Having the required modules available allowed for a swift initial development sprint, which showed that the method of evaluation was viable and effective. However, the requirements of the evaluation changed as the software grew, and an important part of an agile process is adapting to changing requirements. The lack of static typing makes refactoring Python increasingly challenging as a project grows. Therefore, after the initial proof of concept, it became necessary to explore another language for the Proxmox interaction.
Java is statically typed and became the implementation language for all external interaction within the test suite. The initial reason for not choosing Java was the lack of an equivalent library to \verb'proxmoxer'. However, as the Python implementation grew, the lack of static typing meant that making changes to the system without introducing bugs became particularly difficult. Further, productivity was reduced because \verb'proxmoxer', having no type hints, offers few code suggestions, so a large amount of API documentation had to be read for each implemented piece of code.
To this end, I developed a library in Java with an almost identical interface, but providing a high degree of type-safety. This allowed for much safer changes to the program, while also encouraging the use of IDE hints for quickly generating code. Although the data gathering was much improved by switching to Java, the code for generating graphs was perfectly manageable in Python. As such, a hybrid solution with Java for data gathering and Python for data processing was employed.
The requirements of the project are detailed in the Success Criteria of the Project Proposal (Appendix \ref{appendix:project-proposal}), and are the primary method of evaluation for project success. They are split into three categories: success criteria, extended goals and stretch goals.
The three categories of success criteria can be summarised as follows. The success criteria, or must have elements, are to provide a multi-path proxy that is functional, secure and improves speed and resilience in specific cases. The extended goals, or should have elements, are focused on increasing the performance and flexibility of the solution. The stretch goals, or could have elements, are aimed at increasing performance by reducing overheads, and supporting IPv6 alongside IPv4.
Beyond the success criteria, I wanted to demonstrate the practicality of my software on prototypical networking equipment; therefore, continuous integration testing and evaluation will run on both Linux and FreeBSD.
The development of this software followed the agile methodology. Work was organised into weekly sprints, each aiming to increase the functionality of the software. By focusing on sufficient but not excessive planning, a minimum viable product was quickly established, and from there the remaining features could be implemented in correctly sized segments. Examples of these sprints are: the initial build, including configuration, TUN adaptors and the main program; TCP transport, enabling an end-to-end connection between the two parts; repeatable testing, providing the data to evaluate each iteration of the project against its success criteria; and UDP transport for performance and control.
The agile methodology welcomes changing requirements \citep{beck_manifesto_2001}, and as the project grew, it became clear where shortcomings existed; these could be fixed in very quick pull requests. An example is given in Figure \ref{fig:changing-requirements}, in which the type of a variable was changed from \mintinline{go}{string} to \mintinline{go}{func() string}. This allowed for lazy evaluation, when it became clear that configuring fixed IP addresses or DNS names could be impractical. Static typing enables refactors like this to be completed with ease, particularly with the development tools mentioned in the next section, reducing the incidental complexity of the agile methodology.
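The sketch below gives a flavour of that change, using a hypothetical configuration structure: a field that previously held a fixed string now holds a function returning one, so the address is only resolved at the moment a connection is made.

\begin{minted}{go}
package sketch

import "net"

// peerConfig is a hypothetical configuration structure illustrating
// the change from a fixed address to a lazily evaluated one.
type peerConfig struct {
	// RemoteAddress is called each time a connection is attempted, so
	// a DNS name can be re-resolved, or an address discovered late,
	// rather than being fixed when the configuration is loaded.
	RemoteAddress func() string
}

// dial resolves the peer address lazily, at the moment it is needed.
func dial(c peerConfig) (net.Conn, error) {
	return net.Dial("udp", c.RemoteAddress())
}
\end{minted}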
A large part of the language choice focused on development tools, particularly IDE support. I used GoLand (Go), IntelliJ (Java), and PyCharm (Python). Using intelligent IDEs, particularly with the statically typed Go and Java, significantly increases programming productivity. They provide code suggestions and automated code generation for repetitive sections to reduce keystrokes, syntax highlighting for ease of reading, near-instant type checking without interaction, and many other features. Each of these reduces incidental complexity.
I used Git version control, with a self-hosted Gitea\footnote{\url{https://gitea.com/}} server as the remote. The repository contains over 180 commits, committed at regular intervals while programming. I maintained several on- and off-site backups (Multiple Computers + Git Remote + NAS + 2xCloud + 2xUSB). The Git remote was updated with every commit, the NAS and Cloud providers daily, with one USB updated every time significant work was added and the other a few days after. Having some automated and some manual backups, along with a variety of backup locations, minimises any potential data loss in the event of any failure. The backups are regularly checked for consistency, to ensure no data loss goes unnoticed.
Alongside my Gitea server, I have a self-hosted Drone\footnote{\url{http://drone.io/}} server for continuous integration: running Go tests, verifying formatting, and building artefacts. On a push, after verification, each artefact is built, uploaded to a central repository, and saved under the branch name. This dovetailed with my automated testing, which downloaded the relevant artefact automatically for the branch under test. I also built artefacts for multiple architectures to support real world testing on \texttt{AMD64} and \texttt{ARM64} architectures.
Continuous integration and Git are used in tandem to ensure that each pull request meets certain standards before merging, reducing the possibility of accidentally causing performance regressions. Pull requests also provide an opportunity to review submitted code, even with the same set of eyes, in an attempt to detect any glaring errors. Twenty-four pull requests were submitted to the repository for this project.
I had significant experience with the language Go before the start of this project, though not formally taught. My knowledge of networking is limited to that of a user, and the content of the Part IB Tripos courses \emph{Computer Networking} and \emph{Principles of Communication} (the latter given after the start of this project). The security analysis drew from the Part IA course \emph{Software and Security Engineering} and the Part IB course \emph{Security}. As the software is highly concurrent, the Part IB course \emph{Concurrent and Distributed Systems} and the Part II Unit of Assessment \emph{Multicore Semantics and Programming} were applied.
In this chapter, I described my preparation for developing, testing and securing my proxy application. I chose to implement MACs, authenticated headers and IP allow-lists for security, while maintaining composability with other solutions such as VPNs. I will use Go as the implementation language, for its high-level features that are well suited to this project, and Python and Java for the evaluation suite, chosen for programming speed and type safety respectively. I have prepared a set of development tools, including IDEs, version control and continuous integration, to encourage productivity as both a developer and a project manager.