\begin{figure}
\centering
\includegraphics{2_Preparation/Figs/security-zones.png}
\caption{A summary of the three different transportation zones in this proxy, with grey shading indicating an adversarial network.}
\label{fig:security-zones}
\end{figure}
Any connection between two computers presents a set of security risks. A proxy adds some further risks to this, as additional attack vectors are created by the proxy itself. This section first focuses on layered security. If we consider the Local Proxy and Remote Proxy, with everything in between, as a single Internet connection, layered security focuses on how the risks of this combined connection compare to those of a standard Internet connection, and what guarantees must be made to achieve the same risks for a proxied connection as for a standard connection.
The transportation of packets is in three sections, as shown in Figure \ref{fig:security-zones}. The first segment of the figure is Client-to-Proxy, which occurs physically in the local zone. The second section is Proxy-to-Proxy, which occurs across the Internet. The final section is Proxy-to-Server, which also occurs across the Internet. With the goal of providing security equivalent to a standard connection, the Client-to-Proxy communication can be considered safe - it is equivalent to connecting a client directly to a modem. Therefore, this section will focus on the transport of Proxy-to-Proxy and Proxy-to-Server communication. The threat model for this analysis will now be described.
\subsection{Threat Model}
\label{section:threat-model}
The threat model considered here will be that packets can be injected, read, and black-holed at any point in the Internet. This is the model employed by \cite{dolev_security_1983}, described as ``the attacker carries the packet''. Private networks will be considered safe, covering both the connection from client to local proxy, and any connections within a VPN (Virtual Private Network).
\subsection{Proxy-to-Proxy Communication}
\subsection{Proxy-to-Server Communication}
Packets between the proxy and server are transmitted openly across the Internet. As this proxy transports entire IP packets at layer 3, no security guarantees need be maintained once the IP packet has left the remote proxy; it is the responsibility of the application to provide its own security guarantees. Maintaining the same level of security as a standard connection can therefore be achieved by ensuring that the packets which leave one side of a proxy are a subset of the packets that entered the other side.
% ------------------------------- Security --------------------------------- %
\section{Security Solutions}
\subsection{Message Authentication}
To provide integrity and freshness for each message, I evaluate two choices: Message Authentication Codes (MACs) and Digital Signatures. A MAC is a hash digest generated from a concatenation of data and a secret key. The hash digest is appended to the data before transmission. Anyone sharing the secret key can perform an identical operation to verify the hash and, therefore, the integrity of the data \citep[pp. 352]{menezes_handbook_1997}. A digital signature is instead generated using the private key in a public/private keypair, proving that the message was signed by the owner of the private key; it can be verified by anyone with the corresponding public key \citep[pp. 147-149]{anderson_security_2008}. In each case, a code is appended to the message, such that the integrity and authenticity of the message can be verified.
As both proxy servers are controlled by the same party, non-repudiation - the knowledge of which side of the proxy provided an authenticity guarantee for the message - is not necessary. This leaves MACs as the message authentication method of choice for this project: producing a MAC is less computationally complex than producing a digital signature, and non-repudiation is the only guarantee given up.
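To make the chosen approach concrete, the following is a minimal sketch of MAC generation and verification using Go's standard library. It is illustrative rather than the exact wire format of my implementation: the function names, and the choice of HMAC-SHA256 as the keyed hash, are assumptions for this example.
\begin{minted}{go}
package security

import (
	"crypto/hmac"
	"crypto/sha256"
)

// appendMAC returns message with an HMAC-SHA256 digest appended. Anyone
// holding the same shared key can recompute and verify the digest.
func appendMAC(key, message []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(message) // a hash.Hash Write never returns an error
	return mac.Sum(message)
}

// verifyMAC splits a received buffer into message and digest, recomputes
// the digest over the message, and compares the two.
func verifyMAC(key, buf []byte) ([]byte, bool) {
	if len(buf) < sha256.Size {
		return nil, false
	}
	message, digest := buf[:len(buf)-sha256.Size], buf[len(buf)-sha256.Size:]
	mac := hmac.New(sha256.New, key)
	mac.Write(message)
	return message, hmac.Equal(digest, mac.Sum(nil))
}
\end{minted}
Note that \mintinline{go}{hmac.Equal} compares the digests in constant time, avoiding a timing side channel during verification.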
\subsection{Connection Authentication}
Beyond authenticating messages themselves, the connection built between the two proxies must be authenticated. Consider a person-in-the-middle attack, where an attacker forwards the packets between the two proxies. Then, the attacker stops forwarding packets, and instead black-holes them. This creates the denial of service mentioned in the previous section.
To prevent such forwarding attacks, the connection itself must be authenticated. I present two methods to solve this, the first being address allow-lists, and the second authenticating the IP address and port of each sent packet. The first solution is static, and simply states that the listening proxy may only respond to new communications when the IP address of the communication is in an approved set. This verifies that the connection is from an approved party, as they must control that IP to create a two-way communication from it.
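As a sketch, such an allow-list check reduces to a set membership test on the source address of any new communication; the addresses below are documentation placeholders rather than a real configuration.
\begin{minted}{go}
package security

import "net"

// allowed is the static approved set of peer addresses.
var allowed = map[string]bool{
	"203.0.113.7":  true,
	"198.51.100.2": true,
}

// permitted reports whether a new communication from addr may be answered.
func permitted(addr *net.UDPAddr) bool {
	return allowed[addr.IP.String()]
}
\end{minted}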
The second is a more dynamic solution. The IP authentication header \citep{kent_ip_2005} achieves this by protecting all immutable parts of the IP header with an authentication code. In the case of this software, authenticating the source IP address, source port, destination IP address, and destination port ensures connection authenticity. By authenticating these addresses, which can be checked easily at both ends, it can be confirmed that both devices knew with whom they were talking, and from where the connection was initiated. That is, an owner of the shared key authorised this communication path.
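One way to realise this with the MAC scheme above is to prepend the connection 4-tuple to the data over which the MAC is computed, as sketched below; the field layout is an illustrative choice for this example, not a standardised format.
\begin{minted}{go}
package security

import (
	"encoding/binary"
	"net"
)

// authenticatedData prepends the connection 4-tuple to the payload, so that
// a MAC computed over the result also authenticates the communication path.
func authenticatedData(src, dst *net.UDPAddr, payload []byte) []byte {
	buf := make([]byte, 0, len(src.IP)+len(dst.IP)+4+len(payload))
	var port [2]byte
	buf = append(buf, src.IP...)
	binary.BigEndian.PutUint16(port[:], uint16(src.Port))
	buf = append(buf, port[:]...)
	buf = append(buf, dst.IP...)
	binary.BigEndian.PutUint16(port[:], uint16(dst.Port))
	buf = append(buf, port[:]...)
	return append(buf, payload...)
}
\end{minted}
As both ends can construct this buffer independently from their own view of the connection, a verification failure indicates that the two views of the communication path disagree.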
However, both of these solutions have some shortfalls when Network Address Translation (NAT) is involved. The second solution, authenticating addresses, fails with any form of NAT. This is because the IPs and ports of the packets sent by the sending proxy differ from those seen by the receiving proxy, and therefore cannot be authenticated. The first solution, providing a set of addresses, fails with Carrier Grade NAT (CG-NAT), as many users share an IP address, and hence anyone behind the same IP could perform an attack. In most cases one of these solutions will work, else one can fail over to the security layering presented in Section \ref{section:layered-security}.
\subsection{Freshness}
To ensure freshness of received packets, an anti-replay algorithm is employed. Replay protection in IP authentication headers is achieved by using a sequence number on each packet. This sequence number is strictly increasing. The algorithm that I have chosen to implement for this is \emph{IPsec Anti-Replay Algorithm without Bit Shifting} \citep{tsou_ipsec_2012}, also employed in Wireguard \citep{donenfeld_wireguard_2017}.
When applying message authentication, it was sufficient to authenticate messages individually to their flow. However, replay protection must be applied across all flows connected to the proxy; otherwise, a packet could be replayed on a different flow and remain undetected. This is similar to the design pattern of MultiPath TCP's congestion control, where there is a separation between the sequence number of individual subflows and the sequence number of the data transport as a whole \citep[pp. 11]{wischik_design_2011}.
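The following is a simplified sketch of the block-based window at the heart of this algorithm: rather than shifting a bitmap as the window advances, whole blocks are cleared and reused. The constants and names are illustrative, not those of my implementation.
\begin{minted}{go}
package replay

const (
	blockBits  = 64                           // bits per bitmap block
	blockCount = 16                           // number of blocks in the window
	windowSize = (blockCount - 1) * blockBits // usable window width
)

// window tracks recently seen sequence numbers without bit shifting.
type window struct {
	bitmap  [blockCount]uint64
	highest uint64 // highest sequence number accepted so far
}

// check marks seq as seen, reporting whether it should be accepted. It
// rejects sequence numbers that fall behind the window or repeat.
func (w *window) check(seq uint64) bool {
	if seq > w.highest {
		// Advance: clear every block the window moves over.
		cur, next := w.highest/blockBits, seq/blockBits
		diff := next - cur
		if diff > blockCount {
			diff = blockCount
		}
		for i := uint64(1); i <= diff; i++ {
			w.bitmap[(cur+i)%blockCount] = 0
		}
		w.highest = seq
	} else if w.highest-seq >= windowSize {
		return false // too old: behind the window
	}
	block := (seq / blockBits) % blockCount
	bit := uint64(1) << (seq % blockBits)
	if w.bitmap[block]&bit != 0 {
		return false // replay: this sequence number was already seen
	}
	w.bitmap[block] |= bit
	return true
}
\end{minted}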
\subsection{Layered Security}
\label{section:layered-security}
It was previously mentioned that my solution is transparent to the higher layer security encapsulated by proxied packets. Further to this, my solution provides transparent security in the other direction, where proxied packets are encapsulated by another security solution. Consider the case of a satellite office that employs both a whole network corporate VPN and my solution. The network can be configured in each of two cases: the multipath proxy runs behind the VPN, or the VPN runs behind the multipath proxy.
Packet structures for proxied packets in each of these cases are given in Figure \ref{fig:whole-network-vpn-behind} and Figure \ref{fig:whole-network-vpn-infront}, for the VPN Wireguard \citep{donenfeld_wireguard_2017}. In Figure \ref{fig:whole-network-vpn-infront}, the portals are only accessible via the VPN protected network. It can be seen that the packet in Figure \ref{fig:whole-network-vpn-infront} is shorter, given the removal of the message authentication code and the data sequence number. The data sequence number is unnecessary, given that Wireguard uses the same anti-replay algorithm, and thus replayed packets would have been caught entering the secure network. Further, the message authentication code is unnecessary, as the authenticity of packets is now guaranteed by Wireguard.
\begin{figure}
\begin{leftfullpage}
\end{rightwordgroup}
\end{bytefield}
\caption{Packet structure for a configuration with a Wireguard client behind my multipath proxy.}
\label{fig:whole-network-vpn-behind}
\end{leftfullpage}
\end{figure}
\end{rightwordgroup}
\end{bytefield}
\caption{Packet structure for a configuration with a Wireguard client in front of my multipath proxy.}
\label{fig:whole-network-vpn-infront}
\end{fullpage}
\end{figure}
Supporting and encouraging this layering of protocols provides a second benefit: if the security in my solution degrades with time, there are two options to repair it. One can either fix the open source application, or compose it with a security solution that is not broken, but perhaps provides redundant security guarantees, translating to additional overhead. To this end, the security features mentioned are all configurable. This allows for flexibility in implementation.
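As an illustration of the shape this configuration could take (the field names here are hypothetical, not the exact options my implementation exposes):
\begin{minted}{go}
package config

// Security sketches independently toggleable security features; each can be
// disabled when a lower layer, such as a VPN, already provides the guarantee.
type Security struct {
	EnableMAC        bool     // append and verify a MAC on each packet
	EnableAntiReplay bool     // apply the windowed anti-replay check
	AllowList        []string // approved source IPs; empty disables the check
	SharedKeyPath    string   // location of the pre-shared MAC key
}
\end{minted}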
\begin{figure}
\centering
\includegraphics{2_Preparation/Figs/security-zones-vpn.png}
\caption{A summary of the three different transportation zones in this proxy behind a VPN, with grey shading indicating an adversarial network.}
\label{fig:security-zones-vpn}
\end{figure}
The benefits of using a VPN tunnel between the two proxies are shown in Figure \ref{fig:security-zones-vpn}. Whereas in Figure \ref{fig:security-zones} the proxy-to-proxy communication is across the unprotected Internet, in Figure \ref{fig:security-zones-vpn} this communication occurs across a secure overlay network. This allows the packet transport to be trusted, and avoids the need for additional verification. Further, it allows the application to remain secure in any situation where a VPN will work. Home users, in most cases, would use this solution with the inbuilt authentication mechanisms. Business users, who already have a need for a corporate VPN, would benefit from running my solution across VPN tunnels, avoiding the need to complete authentication work multiple times.
% -------------------------- Language Selection ---------------------------- %
\section{Language Selection}
\label{section:language-selection}
In this section, I evaluate three potential languages (C++, Rust and Go) for the implementation of this software. To support this evaluation, I have provided a sample program in each language. The sample program is intended to be a minimal example of reading packets from a TUN interface, placing them in a queue from a single thread, and consuming the packets from the queue with multiple threads. These examples are given in figures \ref{fig:cpp-tun-sample} through \ref{fig:go-tun-sample}, in Appendix \ref{appendix:language-samples}. The first test was whether the small example was possible at all, which passed for all three languages. I then considered the performance of each language, the clarity of code written in the style needed for this software, and the language's ecosystem. This culminated in choosing Go as the implementation language.
Alongside the implementation language, a language is chosen to evaluate the implementation. Two potential languages are considered here, Python and Java. Though Python was initially chosen for rapid development and better ecosystem support, the final test suite is a combination of both Python and Java - Python for data processing, and Java for systems interaction.
\subsection{Implementation Languages}
\subsubsection{C++}
There are two primary advantages to completing this project in C++: speed of execution, and C++ being low level enough to achieve this project's goals (which turned out to be true for all considered languages). The negatives of using C++ are demonstrated in the sample script, given in Figure \ref{fig:cpp-tun-sample}, where it is immediately obvious that to achieve even the base functionality of this project, the code in C++ is multiple times the length of equivalent code in either Rust or Go, at 93 lines compared to 34 for Rust or 48 for Go. This difference arises from the need to manually implement the required thread safe queue, while it is available as a library for Rust, and included in the Go runtime. This manual implementation gives rise to additional risk of incorrect implementation, specifically with regards to thread safety, that could cause undefined behaviour, security vulnerabilities, and great difficulty debugging. Further, although open source queues are available, they are not handled by a package manager, and thus security updates would have to be manual, leaving opportunity for unfound bugs.
The lack of memory safety in C++ is a significant negative of the language. Although C++ would provide increased performance over a language such as Go with a more feature-rich runtime, it is avoided due to the incidental complexity of manual memory management and the difficulty of manual thread safety.
\subsubsection{Rust}
Rust is memory safe and thread safe, solving the latter issues with C++. Rust also has a minimal runtime, allowing for an execution speed comparable to C or C++. The Rust sample is given in Figure \ref{fig:rust-tun-sample}, and is pleasantly concise.
For the purposes of this project, the downsides of Rust come from its youthfulness. This is two-faceted: Integrated Development Environment (IDE) support and crate stability. Firstly, the IDE support for Rust in my IDEs of choice is provided via a plugin to IntelliJ, and is not as well supported as either of the other languages. Crates are the Rust mechanism for package management, providing packages that developed software can depend upon. Secondly, the crate available for TUN support (tun-tap\footnote{\url{https://docs.rs/tun-tap/}}) does not yet provide a stable Application Programming Interface (API), which was noticed during the development of the test program. Between writing the program initially and re-testing when documenting it, the API of the crate had changed to the point where my script no longer type checked. Further, the old version had disappeared, and thus I was left with a program that didn't compile or function. Although I could write the API for TUN interaction myself, the safety benefits of Rust would be less pronounced, as the direct systems interaction requires \texttt{unsafe} code, which bypasses parts of the type-checker and borrow-checker, leading to an increased potential for bugs.
\subsubsection{Go}
The final language to evaluate is Go, often written as GoLang. The primary difference between Go and the other two evaluated languages is the presence of a runtime. A code sample is provided in Figure \ref{fig:go-tun-sample}. Go is significantly higher level than the other two languages mentioned, and provides a memory management model that is both simpler than C++ and more standard than Rust.
For the greedy structure of this project, Go's focus on concurrency is extremely beneficial. Go has channels in the standard runtime, which support any number of both producers and consumers. In this project, both SPMC (Single Producer Multi Consumer) and MPSC (Multi Producer Single Consumer) queues are required, so having these provided as a first-class feature of the language is beneficial.
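A simplified illustration of this pattern is given below, with a single producer standing in for the TUN reader and several consumers standing in for outbound flows; no explicit locking is required around the queue itself.
\begin{minted}{go}
package main

import (
	"fmt"
	"sync"
)

func main() {
	packets := make(chan []byte, 64) // SPMC queue: one producer, many consumers

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // consumers, e.g. one per outbound flow
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for pkt := range packets { // receive until the channel is closed
				fmt.Printf("consumer %d sending %d bytes\n", id, len(pkt))
			}
		}(i)
	}

	for i := 0; i < 10; i++ { // single producer, e.g. the TUN reader
		packets <- make([]byte, 1500)
	}
	close(packets) // closing the channel releases all consumers
	wg.Wait()
}
\end{minted}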
Garbage collection and first order concurrency come together to make the code produced for this project highly readable, but rely on a more complex runtime than that of the other two languages evaluated. The downside is that the speed of execution is negatively affected by this runtime. However, for the purposes of this first implementation, that compromise is acceptable. By producing code that makes the functionality of the application clear, future implementations could more easily be built to mirror it. Given the performance shown in Section \ref{section:performance-evaluation}, the compromise of using a well-suited, high-level language is clearly worthwhile.
\subsection{Evaluation Languages}
\label{section:preparation-language-choices-evaluation}
\subsubsection{Python}
Python is a dynamically typed language, and it was chosen as the initial implementation language for the test suite. The first reason for this is \verb'matplotlib',\footnote{\url{https://matplotlib.org/}} a widely used graphing library that can produce the graphs needed for this evaluation. The second reason is \verb'proxmoxer'\footnote{\url{https://github.com/proxmoxer/proxmoxer}}, a fluent API for interacting with a Proxmox server.
Having the required modules available allowed for a swift initial development sprint. This showed that the method of evaluation was viable and effective. However, the requirements of evaluation changed with the growth of the software, and an important part of an agile process is adapting to changing requirements. The lack of static typing complicates the refactorability of Python, and becomes increasingly challenging as the project grows. Therefore, after the initial proof of concept, it became necessary to explore another language for the Proxmox interaction.
\subsubsection{Java}
Java is statically typed and became the implementation language for all external interaction within the test suite. The initial reason for not choosing Java was the lack of an equivalent library to \verb'proxmoxer'. However, as the implementation size grew in Python, the lack of static typing meant that making changes to the system without adding bugs became particularly difficult. Further, productivity was reduced by the lack of code suggestions for \verb'proxmoxer' without type hints, as much API documentation had to be read for each implemented piece of code.
To this end, I developed a library in Java with an almost identical interface, but providing a high degree of type-safety. This allowed for much safer changes to the program, while also encouraging the use of IDE hints for quickly generating code. Although the data gathering was much improved by switching to Java, the code for generating graphs was perfectly manageable in Python. As such, a hybrid solution with Java for data gathering and Python for data processing was employed.
\subsubsection{Software Development Model}
The development of this software followed the agile methodology. Work was organised into weekly sprints, aiming for increased functionality in the software each time. By focusing on sufficient but not excessive planning, a minimum viable product was quickly established. From there, the remaining features could be implemented in appropriately sized segments. Examples of these sprints are: initial build including configuration, TUN adapters and main program; TCP transport, enabling an end-to-end connection between the two parts; repeatable testing, providing the data to evaluate each iteration of the project against its success criteria; UDP transport for performance and control.
One of the most important features of any agile methodology is welcoming changing requirements \citep{beck_manifesto_2001}. As the project grew, it became clear where shortcomings existed, and these could be fixed in very quick pull requests. An example is given in Figure \ref{fig:changing-requirements}, in which the type of a variable was changed from \mintinline{go}{string} to \mintinline{go}{func() string}. This allowed for lazy evaluation, when it became clear that configuring fixed IP addresses or DNS names could be impractical with certain setups. The static typing in the chosen language enables refactors like this to be completed with ease, particularly with the development tools mentioned in the next section, reducing the incidental complexity of the agile methodology.
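A minimal reconstruction of the idea behind that refactor is sketched below, with hypothetical names; the actual change is shown in Figure \ref{fig:changing-requirements}.
\begin{minted}{go}
package main

import "fmt"

// Config previously stored the peer address as a fixed string:
//     type Config struct{ RemoteAddr string }
// It is now resolved lazily on each use, so set-ups without a stable IP
// address or DNS name still work.
type Config struct {
	RemoteAddr func() string
}

func main() {
	cfg := Config{
		RemoteAddr: func() string {
			// Re-evaluated on every call; could consult DNS or a
			// discovery service rather than returning a constant.
			return "proxy.example.com:1234"
		},
	}
	fmt.Println("dialling", cfg.RemoteAddr())
}
\end{minted}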
\begin{figure}
\centering
\subsubsection{Development Tools}
A large part of the language choice focused on development tools. As discussed in Section \ref{section:language-selection}, IDE support is important for programming productivity. My preferred IDEs are those supplied by JetBrains,\footnote{\url{https://jetbrains.com/}} generously provided for education and academic research free of charge. As such, I used GoLand for the Go development of this project, IntelliJ for the Java evaluation development, and PyCharm for the Python evaluation program. Using an intelligent IDE, particularly with the statically typed Go and Java, can significantly increase programming productivity. They provide intelligent code suggestions and automated code generation for repetitive sections to reduce keystrokes, syntax highlighting for ease of reading, near-instant type checking without interaction, and many other features. Each reduces incidental complexity.
I used Git version control, with a self-hosted Gitea\footnote{\url{https://gitea.com/}} server as the remote. The repository contains over 180 commits, committed at regular intervals while programming. My repositories have a multitude of on- and off-site backups at varying frequencies (Multiple Computers + Git Remote + NAS + 2xCloud + 2xUSB). The Git remote was updated with every commit, the NAS and Cloud providers daily, with one USB updated every time significant work was added and the other a few days after. Having some automated and some manual backups, along with a wide variety of backup locations, ensures that the potential data loss in the event of any failure is minimal. The backups are regularly checked for consistency, to ensure no data loss goes unnoticed.
Alongside my self-hosted Gitea server, I have a self-hosted Drone\footnote{\url{http://drone.io/}} server for continuous integration. This made it simple to add a Drone file to the repository, allowing for the Go tests to be run, formatting verified, and artefacts built. On a push, after the verification, each artefact is built and uploaded to a central repository, where it is saved under the branch name. This is particularly useful for automated testing, as the relevant artefact can be downloaded automatically from a known location for the branch under test. Further, artefacts are built for multiple architectures, particularly useful when performing real world testing spread between \texttt{AMD64} and \texttt{ARM64} architectures.
Continuous integration and Git are used in tandem to ensure that all code in a pull request meets certain standards. By ensuring that tests are automatically run before merging, all code that is merged must be formatted correctly and able to pass the tests. This removes the possibility of reintroducing an already-tested-for regression during a merge by forgetting to run the tests. Pull requests also provide an opportunity to review submitted code, even with the same set of eyes, in an attempt to detect any glaring errors. Twenty-four pull requests were submitted to the repository for this project.
\subsubsection{Licensing}
I have chosen to license this software under the MIT license. The MIT license is simple and permissive, enabling reuse and modification of the code, subject to including the license. Alongside the hopes that the code will receive updated pull requests over time, a permissive license allows others to build upon the given solution. A potential example of a solution that could build from this is a company employing a Software as a Service (SaaS) model to configure a remote proxy on your behalf, perhaps including the hardware required to convert this fairly involved solution into a plug-and-play option.
% ---------------------------- Starting Point ------------------------------ %
\section{Starting Point}
% -------------------------------- Summary --------------------------------- %
\section{Summary}
In this chapter, I described my preparation for developing, testing and securing my proxy application. I chose to implement MACs, authenticated headers, and IP allow-lists for security, while maintaining composability with other solutions such as VPNs. I will be using Go as the implementation language for its high-level features that are well suited to this project, and Python and Java for evaluation, for programming speed and type-safety respectively. I have prepared a set of development tools, including IDEs, version control and continuous integration, to encourage productivity as both a developer and a project manager.

\graphicspath{{5_Conclusions/Figs/Vector/}{5_Conclusions/Figs/}}
\fi
The software produced in this project provides a method of combining multiple Internet connections via a proxy, prioritising throughput and resilience in the resultant aggregate connection. The proxy provides a novel approach to combining Internet connections, responding well to very dynamic Internet connections. All of the core success criteria were met, along with many of the extended goals.
The multipath proxy built in this project provides an effective method to combine dynamic Internet connections, and it works in today's Internet. Though future work may make much of this redundant, the performance gains seen today are useful in many situations. As it becomes more common to see a variety of connections in homes, such as 5G, Low Earth Orbit and DSL, a method to combine these that dynamically adapts to the variability of the wireless connections can be a significant, practical benefit, especially in situations where gaining a single faster link is difficult.
\section{Lessons Learnt}
Further, lessons were learnt on the quality of packages. A package being a part of the standard library for a language does not imply support or a full feature set, while packages from respected software companies can be superior.
On re-implementation of this work, more considerations should be made for the interface of the software. In particular, monitoring the current connections without a debugger is particularly difficult, and monitoring long term statistics is presently impossible. This compromise was made for code readability and clarity, increasing the likelihood of correct code, but does raise some issues for the network architects who implement this software.
Many of the lessons learnt relating to IP routing are detailed in Section \ref{section:implementation-system-configuration}, which would aid future implementations significantly, allowing the developer to focus only on what needs to occur in the application itself. Similarly, Figure \ref{fig:dataflow-overview} provides a comprehensive overview of the particularly complex dataflow within this application. These tools provide an effective summary of the information needed to implement this software again, reducing the complexity of such a new implementation, and allowing the developer to focus on the important features.
\section{Future Work}
Alternative methods of load balancing could take multipath proxies further. Having control of both proxies allows for a variety of load balancing mechanisms, of which congestion control is only one. An alternative method is to monitor packet loss, and use this to infer the maximum capacity of each link. These capacities can then be used to load balance packets by proportion as opposed to greedily with congestion control. This could provide performance benefits over congestion control by allowing the congestion control mechanisms of underlying flows to be better employed, while also having trade-offs with slower reaction to connection changes.
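A sketch of such proportional dispatch is given below: each flow is chosen with probability proportional to its inferred capacity. The capacity estimates would come from the loss-based inference described above; the names and structure here are illustrative only.
\begin{minted}{go}
package balance

import "math/rand"

// pickFlow chooses an outbound flow with probability proportional to its
// inferred capacity. It assumes at least one flow with positive capacity.
func pickFlow(capacities []float64) int {
	total := 0.0
	for _, c := range capacities {
		total += c
	}
	r := rand.Float64() * total
	for i, c := range capacities {
		r -= c
		if r < 0 {
			return i
		}
	}
	return len(capacities) - 1 // floating-point edge case: fall back to last
}
\end{minted}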
To increase performance, a kernel implementation of the proxy could be written. Kernel implementations avoid copying packets between kernel- and user-space, as well as removing the cost of syscalls. This can increase maximum performance significantly, as well as reducing latency. These benefits combine to make the software useful in more places, though they restrict platform compatibility to systems running the compatible kernel. Therefore, having kernel implementations maintain compatibility with a user-space implementation allows more systems to take advantage of the proxy.