%*******************************************************************************
|
|
%****************************** Second Chapter *********************************
|
|
%*******************************************************************************
|
|
|
|
\chapter{Preparation}
|
|
|
|
\ifpdf
|
|
\graphicspath{{2_Preparation/Figs/Raster/}{2_Preparation/Figs/PDF/}{2_Preparation/Figs/}}
|
|
\else
|
|
\graphicspath{{2_Preparation/Figs/Vector/}{2_Preparation/Figs/}}
|
|
\fi
|
|
|
|
Proxying packets is the process of taking packets that arrive at one location and transporting them to leave at another. This chapter focuses on the preparatory work to achieve this practically and securely, given the design outlined in the previous chapter, in which the proxy consolidates multiple connections to appear as one to both the wider Internet and devices on the local network. In sections \ref{section:risk-analysis} and \ref{section:preparation-security}, I discuss the security risks and plans to confront them. In section \ref{section:language-selection}, I present three languages: Go, Rust and C++, and provide context for choosing Go as the implementation language. Finally, in sections \ref{section:requirements-analysis} and \ref{section:engineering-approach}, I present a requirements analysis and a description of the engineering approach for the project.
|
|
|
|
% ---------------------------- Risk Analysis ------------------------------- %
|
|
\section{Risk Analysis}
|
|
\label{section:risk-analysis}
|
|
|
|
Any connection between two computers presents a set of security risks. A proxy inherits these risks and introduces further risks of its own, which I present and discuss in this section. The first part focuses on layered security: the Local Portal, the Remote Portal, and everything in between are viewed together as a single Internet connection. The focus is on how the risks compare to those of a standard Internet connection, and what guarantees must be made so that a proxied connection carries no more risk than a standard one.
|
|
|
|
The second part focuses on the connections between the Local Portal and the Remote Portal themselves, primarily the risks of accepting and forwarding a packet that was never sent by the other portal, and of sending packets to an unintended recipient.
|
|
|
|
These security problems are considered in the context of the success criterion: provide security no worse than a standard connection. That is, the solution should be as strong as or stronger than a standard connection against the threats in the first case, and provide no additional attack vectors in the second.
|
|
|
|
\subsection{Higher Layer Security}
|
|
|
|
This application proxies entire IP packets, and is therefore a layer 3 solution. As such, the goal is to maintain the same guarantees that one would normally expect at layer 3, for higher layers to build upon. Layer 3 guarantees none of anonymity, integrity, privacy or freshness, so it is up to each application to provide its own security guarantees. Maintaining the same level of security for applications can therefore be achieved by ensuring that the packets which leave one side of the proxy are a subset of the packets that entered the other side.
|
|
|
|
This ensures that guarantees managed by layers above layer 3 are maintained. Regardless of whether a user is accessing insecure websites over HTTP, running a corporate VPN connection or sending encrypted emails, the security of these applications will be unaltered. Further, this allows other guarantees to be managed, including reliable delivery with TCP.
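The property described above can be stated precisely as a check over packet traces. The following sketch is an illustrative formulation of the guarantee, not part of the implementation:

```go
package main

import "fmt"

// isSubset reports whether every packet that left one side of the
// proxy also entered the other side, treating payloads as opaque
// bytes and respecting multiplicity. Loss is permitted; injection
// and duplication are not.
func isSubset(entered, left [][]byte) bool {
	counts := make(map[string]int)
	for _, p := range entered {
		counts[string(p)]++
	}
	for _, p := range left {
		if counts[string(p)] == 0 {
			return false // left the proxy without ever entering it
		}
		counts[string(p)]--
	}
	return true
}

func main() {
	in := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	out := [][]byte{[]byte("c"), []byte("a")} // loss is allowed
	fmt.Println(isSubset(in, out))
}
```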
|
|
|
|
\subsection{Portal to Portal Communication}
|
|
|
|
\subsubsection{Denial of Service}
|
|
\label{subsubsection:threats-denial-of-service}
|
|
|
|
Proxying packets in this way provides a new method of Denial of Service. If an attacker can convince either portal to send them a portion of the packets due for the other portal, the packet loss of the overall connection from the perspective of the other portal will increase by an equivalent amount. For example, if a bad actor can convince the remote portal to send them $50\%$ of the good packets, and previously the packet loss was at $0.2\%$, the packet loss will increase to $50.1\%$.
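The arithmetic above composes directly: an attacker diverting a fraction of the packets adds to the loss already experienced by the remainder. A minimal sketch of this calculation (the function name is illustrative):

```go
package main

import "fmt"

// combinedLoss returns the overall packet loss seen by the receiver
// when a fraction `diverted` of packets is stolen by an attacker and
// the remaining packets experience `baseLoss` on the path.
func combinedLoss(diverted, baseLoss float64) float64 {
	return diverted + (1-diverted)*baseLoss
}

func main() {
	// The worked example from the text: 50% of packets diverted on a
	// path whose existing loss is 0.2%.
	fmt.Printf("%.1f%%\n", 100*combinedLoss(0.5, 0.002))
}
```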
|
|
|
|
This is of particular concern for flows carried by the proxy that use loss-based congestion control. In such a case, a TCP flow would lose, on average, one of every two packets it sends, so the window size would be unable to grow beyond one under NewReno congestion control. The performance of these flows would therefore be severely impacted.
|
|
|
|
However, even if only $25\%$ of the packets are lost, NewReno would still fail to increase the window size past 3. This demonstrates that even an attacker with a far slower connection than the victim can have a significant impact on connection performance.
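These limits can be illustrated with a toy simulation of NewReno-style congestion avoidance. This is a deliberate simplification: additive increase of $1/cwnd$ per acknowledgement, halving on loss, ignoring slow start and timeouts. Under a deterministic loss of every fourth packet ($25\%$), the window in this model saturates around three, and under every second packet it stays at the bottom of its range:

```go
package main

import "fmt"

// maxWindow runs a toy AIMD loop (a simplification of NewReno:
// +1/cwnd per ACK, halve on loss, no slow start or timeouts) with
// every lossEveryN-th packet lost, and reports the largest
// congestion window reached over the run.
func maxWindow(lossEveryN, packets int) float64 {
	cwnd, peak := 1.0, 1.0
	for i := 1; i <= packets; i++ {
		if i%lossEveryN == 0 {
			cwnd /= 2 // multiplicative decrease on loss
			if cwnd < 1 {
				cwnd = 1
			}
		} else {
			cwnd += 1 / cwnd // additive increase per ACK
		}
		if cwnd > peak {
			peak = cwnd
		}
	}
	return peak
}

func main() {
	fmt.Printf("25%% loss: peak window %.1f\n", maxWindow(4, 100000))
	fmt.Printf("50%% loss: peak window %.1f\n", maxWindow(2, 100000))
}
```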
|
|
|
|
\subsubsection{Privacy}
|
|
|
|
Though the packets leaving a modem have no reasonable expectation of privacy, having the packets enter the Internet at two points presents more locations at which a packet can be read. For example, if the remote portal lies in a data center, the content and metadata of packets can be sniffed either in the data center or at the physical connections. However, this is equivalent to the packets taking a longer route through the Internet, with more hops, and is therefore comparatively no worse.
|
|
|
|
Further, if an attacker convinces the Remote Portal that they are a valid connection from the Local Portal, a portion of packets will be sent to them. However, as a fortunate side effect, this method of sniffing would cause a significant Denial of Service to any links whose congestion control is based on packet loss, as shown above. Therefore, as long as it is ensured that each packet is never sent to multiple places, privacy is maintained at a level similar to simple Internet access, and an attacker using this active method is very easy to detect.
|
|
|
|
\subsubsection{Cost}
|
|
|
|
In many cases, the remote portal will be taking advantage of a cloud instance, for the extremely high bandwidth and well peered connections available at a reasonable price. Cloud instances are often billed per unit of outbound traffic, and as such, an attacker could cause a user to carry a high cost burden by forcing their remote portal to transmit more outbound data. This should be avoided by ensuring that the packets transmitted out are both from the local portal and fresh.
|
|
|
|
% ------------------------------- Security --------------------------------- %
|
|
\section{Security}
|
|
\label{section:preparation-security}
|
|
|
|
This section provides means of alleviating the risks given in section \ref{section:risk-analysis}. To achieve this goal, the authenticity of packets will be verified. Authenticity in this context means that two properties of the object hold: integrity and freshness \citep[pp. 14]{anderson_security_2008}. Integrity means that the object has not been altered since the last authorised modification, that is, its transmission from the other portal. Freshness means that the object has not been used for this purpose before.
|
|
|
|
\subsection{Message Authentication}
|
|
|
|
To provide integrity and freshness for each message, I evaluate two choices: Message Authentication Codes (MACs) and digital signatures. A MAC combines the data with a shared key using a specific method, before using a one-way hash function to generate a message authentication code; the result is thus only verifiable by someone holding the same shared key \citep[pp. 352]{menezes_handbook_1997}. A digital signature is instead produced with the private key of a public/private keypair, proving that the message was produced by the owner of that private key, and can be verified by anyone with the public key \citep[pp. 147-149]{anderson_security_2008}. In both cases, the code or signature is appended to the message, such that the integrity and authenticity of the message can be verified.
|
|
|
|
The comparison is as follows: signatures provide non-repudiation, while MACs do not - one can tell which private key's owner signed a message, whereas anyone holding the shared key could have produced a MAC. Secondly, digital signatures are generally more computationally expensive than MACs. Given that both ends of the connection are controlled by the same party, non-repudiation is unnecessary, so MACs are the message authentication method of choice for this project.
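As a sketch of the chosen approach, Go's standard library provides HMAC directly. The helper names below are illustrative rather than drawn from the implementation:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// appendMAC tags a message with an HMAC-SHA256 code computed under a
// shared key, so the receiver can verify integrity and origin.
func appendMAC(msg, key []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return mac.Sum(msg) // msg || 32-byte MAC
}

// verifyMAC strips and checks the trailing code, returning the
// original message and whether the check passed.
func verifyMAC(data, key []byte) ([]byte, bool) {
	n := len(data) - sha256.Size
	if n < 0 {
		return nil, false
	}
	msg, code := data[:n], data[n:]
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	// hmac.Equal compares in constant time.
	return msg, hmac.Equal(code, mac.Sum(nil))
}

func main() {
	key := []byte("shared secret")
	tagged := appendMAC([]byte("proxied packet"), key)
	_, ok := verifyMAC(tagged, key)
	fmt.Println(ok)
}
```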
|
|
|
|
\subsection{IP Authentication Header}
|
|
|
|
The security requirements for this project are equivalent to those provided by the IP Authentication Header \citep{kent_ip_2005}. The IP authentication header operates between IP and the transport layer, using IP protocol number 51. The authentication header uses a hash function and a secret shared key to provide an Integrity Check Value. This check value covers all immutable parts of the IP header, the authentication header itself, and the data below the authentication header. Combined, this provides connectionless integrity and authenticity, as the IP header is authenticated. Further, the header contains a sequence number, which is used to prevent replay attacks.
|
|
|
|
Unfortunately, there are two reasons why this solution cannot be used: difficulties with NAT traversal, and inaccessibility for user-space programs. As the authentication header provides integrity for the source and destination addresses, any NAT that alters these addresses violates the integrity of the packet. Although NAT traversal is not an explicit success criterion for this project, it is implicit, as flexibility across different network structures is a priority, including those where NAT is unavoidable. Secondly, IP authentication headers, being an IP protocol rather than a transport, would cause issues interacting with user-space programs. Given that the first transport implementation uses TCP, IP Authentication Headers would require the user-space program to handle the TCP connection without the aid of the kernel, complicating multiplexing and resulting in an unsupported setup.
|
|
|
|
Overall, using the IP authentication header would function similarly to running over a VPN, described in section \ref{section:layered-security}. Although this will be a supported configuration, the shortfalls mean that it will not be the base implementation. However, inspiration can be taken from the header structure, shown in figure \ref{fig:ip-auth-header-structure}.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{bytefield}[bitwidth=0.8em]{32}
|
|
\bitheader{0-31} \\
|
|
\bitbox{8}{Next Header} & \bitbox{8}{Payload Len} & \bitbox{16}{Reserved} \\
|
|
\wordbox{1}{Security Parameters Index} \\
|
|
\wordbox{1}{Sequence Number} \\
|
|
\wordbox[tlr]{1}{Integrity Check Value} \\
|
|
\wordbox[blr]{1}{$\cdots$}
|
|
\end{bytefield}
|
|
\caption{IP authentication header structure}
|
|
\label{fig:ip-auth-header-structure}
|
|
\end{figure}
|
|
|
|
It is first important to note the differences between the IP authentication header and the security footer used in this application. Firstly, the next header field is unnecessary, given that headers are not being chained. Secondly, as the portals have a fixed, statically set security configuration, the payload length field is unnecessary: the payloads will always be of a predetermined length. Similarly, the security parameters index is unnecessary, as the parameters at each end will be identical.
|
|
|
|
The difference in security arises from the lack of integrity protection for the headers below the application data: the IP header itself, and the TCP or UDP header. However, there is an important distinction between the TCP and UDP cases: TCP congestion control will not be covered by any application-provided security, while UDP congestion control will. That is, this application can do nothing to authenticate the ACKs of a TCP connection, as these are created outside of its control. As such, the TCP implementation provided by the solution should be used in one of two ways: as a baseline test for the performance of other algorithms, or combined with layered security as given in section \ref{section:layered-security}. The rest of this section therefore focuses on securing the UDP transport.
|
|
|
|
Further differences arising from this lack of header integrity still apply to the UDP transport: although the congestion control layer, and therefore the packet flow, is authenticated, the source and destination addresses of packets are not.
|
|
|
|
\subsubsection{Adapting for NAT}
|
|
|
|
To achieve authentication with the IP Authentication Header, one must authenticate the source and destination addresses of the packet. However, these addresses may be altered by NAT devices in transit between the local and remote portals. In the case of source NAT, the source of an outgoing packet is masqueraded to the public address of the NAT router, likely altering the outgoing port as well. For destination NAT, an inbound packet to a NAT router will have its address changed to the internal destination, possibly changing the destination port too.
|
|
|
|
However, each of these address translations is predictable at the time of packet sending and the time of packet receiving. For a packet that will go through source NAT, the eventual source address is predictable, in that it will be altered from the internal address to the public address of the router. Likewise, with destination NAT, the destination address of the packet will be predictable as the public address of the router that receives it. An example of this is shown in figure \ref{fig:network-address-translations}.
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{tikzpicture}
|
|
\draw (0, 0) rectangle (3, 1.5) node[midway,align=center] {Host B \\10.172.14.6/24};
|
|
\draw [->] (1.5, 2.5) -- (1.5, 1.5);
|
|
\draw (0, 2.5) rectangle (3, 4) node[midway,align=center] {Dest. NAT \\192.168.1.8/24};
|
|
\draw [->] (1.5, 5) -- (1.5, 4);
|
|
\draw (0, 5) rectangle (3, 6.5) node[midway,align=center] {Source NAT \\192.168.1.9/24};
|
|
\draw [->] (1.5, 7.5) -- (1.5, 6.5);
|
|
\draw (0, 7.5) rectangle (3, 9) node[midway,align=center] {Host A \\172.19.72.12/24};
|
|
|
|
\draw[dashed] (1.5, 2) -- (4, 2);
|
|
\draw[dashed] (4, 0.5) rectangle (7.5, 3.5) node[midway,align=left] {SA: 192.168.1.9\\SP: 31602\\DA: 10.172.14.6\\DP: 2048};
|
|
|
|
\draw[dashed] (1.5, 4.5) -- (-1, 4.5);
|
|
\draw[dashed] (-1, 3) rectangle (-4.5, 6) node[midway,align=left] {SA: 192.168.1.9\\SP: 31602\\DA: 192.168.1.8\\DP: 1024};
|
|
|
|
\draw[dashed] (1.5, 7) -- (4, 7);
|
|
\draw[dashed] (4, 5.5) rectangle (7.5, 8.5) node[midway,align=left] {SA: 172.19.72.12\\SP: 21941\\DA: 192.168.1.8\\DP: 1024};
|
|
\end{tikzpicture}
|
|
\caption{UDP packet passing through source and destination network address translation, and the addresses and ports at each point.}
|
|
\label{fig:network-address-translations}
|
|
\end{figure}
|
|
|
|
Therefore, to authenticate the message's source and destination, the source and destination addresses from the segment of the path between the NATs will be used. Host A can predict these by using the destination address of the flow transporting the packets, and knowledge of its own NAT's public IP as the source address. Similarly, host B can predict them by using the source address of the flow transporting the packets, and knowledge of its own NAT's public IP as the destination address. Although this means that the authentication would apply equally to any other device behind both NATs, this is an acceptable compromise for NATs controlled by the user. Achieving sufficient security under a CG-NAT is left to the implementer, where the techniques described in section \ref{section:layered-security} can be applied.
|
|
|
|
\subsubsection{Replay Attacks}
|
|
|
|
Replay protection in IP Authentication Headers is achieved by using a sequence number on each packet. This sequence number is monotonically and strictly increasing. The algorithm that I have chosen to implement for this is \emph{IPsec Anti-Replay Algorithm without Bit Shifting} \citep{tsou_ipsec_2012}, also employed in Wireguard \citep{donenfeld_wireguard_2017}.
|
|
|
|
A specific consequence of the multipath nature of this application is the need for a sequence number space shared between flows. This is similar to the design of Multipath TCP's congestion control, which separates the sequence numbers of individual subflows from the sequence number of the data transport as a whole \citep[pp. 11]{wischik_design_2011}.
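The core of the algorithm can be sketched as a ring of bitmap blocks, in which advancing the window zeroes whole blocks rather than shifting bits. This is a simplified, single-threaded rendering of the approach; a real implementation would add locking and bound the cost of large sequence jumps:

```go
package main

import "fmt"

const (
	blockBits = 64
	numBlocks = 16 // window of (16-1)*64 = 960 sequence numbers
)

// replayWindow tracks which sequence numbers have been seen, using a
// ring of bitmap blocks so that advancing the window clears whole
// blocks instead of shifting the bitmap.
type replayWindow struct {
	blocks  [numBlocks]uint64
	highest uint64 // highest sequence number accepted so far
}

// check returns true and records seq if it is fresh: neither behind
// the window nor already seen.
func (w *replayWindow) check(seq uint64) bool {
	windowSize := uint64((numBlocks - 1) * blockBits)
	if seq+windowSize < w.highest {
		return false // too old: fell off the back of the window
	}
	if seq > w.highest {
		// Advance: zero every block between the old and new positions.
		for b := w.highest/blockBits + 1; b <= seq/blockBits; b++ {
			w.blocks[b%numBlocks] = 0
		}
		w.highest = seq
	}
	block := &w.blocks[(seq/blockBits)%numBlocks]
	bit := uint64(1) << (seq % blockBits)
	if *block&bit != 0 {
		return false // replayed
	}
	*block |= bit
	return true
}

func main() {
	var w replayWindow
	// fresh, replayed, and older-but-unseen sequence numbers
	fmt.Println(w.check(5), w.check(5), w.check(4))
}
```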
|
|
|
|
\subsection{Layered Security}
|
|
\label{section:layered-security}
|
|
|
|
It was previously mentioned that this solution focuses on maintaining the higher-layer security of proxied packets. Further, the solution provides transparent security in the other direction. Consider the case of a satellite office that employs both a whole-network corporate VPN and this solution. The network can be configured in one of two ways: the multipath proxy runs behind the VPN, or the VPN runs behind the multipath proxy.
|
|
|
|
These two configurations are shown in figures \ref{fig:whole-network-vpn-behind} and \ref{fig:whole-network-vpn-infront}, using Wireguard \citep{donenfeld_wireguard_2017} as the VPN. In figure \ref{fig:whole-network-vpn-infront}, the portals are only accessible via the VPN-protected network. It can be seen that the packet in figure \ref{fig:whole-network-vpn-infront} is shorter, given the removal of the message authentication code and the data sequence number. The data sequence number is unnecessary because Wireguard uses the same anti-replay algorithm, so replayed packets would already have been caught entering the secure network. The message authentication code is likewise unnecessary, as the authenticity of packets is now guaranteed by Wireguard.
|
|
|
|
Supporting and encouraging this layering of protocols provides a second benefit: if the security in this solution is broken in future, there are two options to repair it. One can either fix the open-source application, or compose it with a security solution that is not broken, even if that solution provides extraneous guarantees and therefore reduces performance. To this end, the security features mentioned will all be configurable, allowing for flexibility in deployment.
|
|
|
|
\begin{figure}
|
|
\begin{leftfullpage}
|
|
\centering
|
|
\begin{bytefield}[bitwidth=0.6em]{32}
|
|
\bitheader{0-31} \\
|
|
\wordbox[tlr]{1}{IPv4 Header} \\
|
|
\wordbox[blr]{1}{$\cdots$} \\
|
|
\begin{rightwordgroup}{UDP\\Header}
|
|
\bitbox{16}{Source port} & \bitbox{16}{Destination port} \\
|
|
\bitbox{16}{Length} & \bitbox{16}{Checksum}
|
|
\end{rightwordgroup} \\
|
|
\begin{rightwordgroup}{CC\\Header}
|
|
\bitbox{32}{Acknowledgement number} \\
|
|
\bitbox{32}{Negative acknowledgement number} \\
|
|
\bitbox{32}{Sequence number}
|
|
\end{rightwordgroup} \\
|
|
\begin{rightwordgroup}{Proxied\\Wireguard\\Packet}
|
|
\wordbox[tlr]{1}{IPv4 Header} \\
|
|
\wordbox[blr]{1}{$\cdots$} \\
|
|
\begin{leftwordgroup}{UDP Header}
|
|
\bitbox{16}{Source port} & \bitbox{16}{Destination port} \\
|
|
\bitbox{16}{Length} & \bitbox{16}{Checksum}
|
|
\end{leftwordgroup} \\
|
|
\begin{leftwordgroup}{Wireguard\\Header}
|
|
\bitbox{8}{type} & \bitbox{24}{reserved} \\
|
|
\wordbox{1}{receiver} \\
|
|
\wordbox{2}{counter}
|
|
\end{leftwordgroup} \\
|
|
\wordbox[tlr]{1}{Proxied IP packet} \\
|
|
\skippedwords\\
|
|
\wordbox[blr]{1}{}
|
|
\end{rightwordgroup} \\
|
|
\begin{rightwordgroup}{Security\\Footer}
|
|
\bitbox{32}{Data sequence number} \\
|
|
\wordbox[tlr]{1}{Message authentication code} \\
|
|
\wordbox[blr]{1}{$\cdots$}
|
|
\end{rightwordgroup}
|
|
\end{bytefield}
|
|
|
|
\caption{A Wireguard client behind the multipath proxy.}
|
|
\label{fig:whole-network-vpn-behind}
|
|
\end{leftfullpage}
|
|
\end{figure}
|
|
|
|
\begin{figure}
|
|
\begin{fullpage}
|
|
\centering
|
|
\begin{bytefield}[bitwidth=0.6em]{32}
|
|
\bitheader{0-31} \\
|
|
\wordbox[tlr]{1}{IPv4 Header} \\
|
|
\wordbox[blr]{1}{$\cdots$}\\
|
|
\begin{rightwordgroup}{UDP\\Header}
|
|
\bitbox{16}{Source port} & \bitbox{16}{Destination port} \\
|
|
\bitbox{16}{Length} & \bitbox{16}{Checksum}
|
|
\end{rightwordgroup} \\
|
|
\begin{rightwordgroup}{Wireguard\\Header}
|
|
\bitbox{8}{type} & \bitbox{24}{reserved} \\
|
|
\wordbox{1}{receiver} \\
|
|
\wordbox{2}{counter}
|
|
\end{rightwordgroup} \\
|
|
\begin{rightwordgroup}{Tunnelled\\Proxy\\Packet}
|
|
\wordbox[tlr]{1}{IPv4 Header} \\
|
|
\wordbox[blr]{1}{$\cdots$}\\
|
|
\begin{leftwordgroup}{UDP Header}
|
|
\bitbox{16}{Source port} & \bitbox{16}{Destination port} \\
|
|
\bitbox{16}{Length} & \bitbox{16}{Checksum}
|
|
\end{leftwordgroup} \\
|
|
\begin{leftwordgroup}{CC\\Header}
|
|
\bitbox{32}{Acknowledgement number} \\
|
|
\bitbox{32}{Negative acknowledgement number} \\
|
|
\bitbox{32}{Sequence number}
|
|
\end{leftwordgroup} \\
|
|
\wordbox[tlr]{1}{Proxied IP packet} \\
|
|
\skippedwords\\
|
|
\wordbox[blr]{1}{}
|
|
\end{rightwordgroup}
|
|
\end{bytefield}
|
|
|
|
\caption{A Wireguard client in front of the multipath proxy.}
|
|
\label{fig:whole-network-vpn-infront}
|
|
\end{fullpage}
|
|
\end{figure}
|
|
|
|
% -------------------------- Language Selection ---------------------------- %
|
|
\section{Language Selection}
|
|
\label{section:language-selection}
|
|
|
|
In this section, I evaluate three potential languages (C++, Rust and Go) for the implementation of this software. To support this evaluation, I provide a sample program in each language. Each sample is a minimal example of reading packets from a TUN interface, placing them in a queue from a single thread, and consuming them from the queue with multiple threads. These samples are given in figures \ref{fig:cpp-tun-sample} through \ref{fig:go-tun-sample}. The primary considerations are the performance of the language, the clarity of code written in the style this software requires, and the language's ecosystem. This culminates in the choice of Go as the implementation language.
|
|
|
|
Alongside the implementation language, a language is chosen for the evaluation of the implementation. Two candidates are considered here: Python and Java. Though Python was initially chosen for rapid development and better ecosystem support, the final result is a combination of the two - Python for data processing, and Java for systems interaction.
|
|
|
|
\subsection{Implementation Languages}
|
|
\subsubsection{C++}
|
|
|
|
There are two primary advantages to completing this project in C++: speed of execution, and the language being low-level enough to achieve these goals. The negatives are demonstrated in the sample script in figure \ref{fig:cpp-tun-sample}: to achieve even the base of this project, the C++ code is multiple times the length of the equivalent in either Rust or Go, at 93 lines compared to 34 for Rust and 48 for Go. The difference arises from the need to manually implement a thread-safe queue, which is available as a library for both Rust and Go through their respective package managers. A manual implementation carries additional risk of incorrect behaviour, particularly with regard to thread safety, which could cause undefined behaviour and great difficulty debugging.
|
|
|
|
The lack of memory safety in C++ is a significant negative of the language. Although C++ would provide increased performance over a language such as Go with a runtime, it is avoided due to the massive incidental complexity of manual memory management and the difficulty of manual thread safety.
|
|
|
|
\subsubsection{Rust}
|
|
|
|
Rust is memory-safe and thread-safe, solving the issues identified with C++. Rust also has no runtime, allowing for execution speed comparable to C or C++. The Rust sample, given in figure \ref{fig:rust-tun-sample}, is pleasantly concise.
|
|
|
|
For the purposes of this project, the downsides of Rust stem from its youth, in two respects: IDE support and crate stability. Firstly, Rust support in my IDEs of choice is provided via a plugin to IntelliJ, and is not as mature as that for many other languages. Secondly, the crate available for TUN support (tun-tap\footnote{\url{https://docs.rs/tun-tap/}}) does not yet provide a stable API, which became apparent during the development of even this test program. Between writing the program initially and re-testing it for inclusion in this document, the crate's API had changed to the point where my program no longer type-checked, and the old version had disappeared, leaving a program that neither compiled nor functioned. Although writing my own TUN interaction layer would not be an issue, the safety benefits of Rust would then be less pronounced, as direct systems interaction requires unsafe code, increasing the potential for bugs.
|
|
|
|
\subsubsection{Go}
|
|
|
|
The final language evaluated is Go, often written Golang. The primary difference between Go and the other two languages is the presence of a runtime. Regardless, it is the language of choice for this project, with the sample provided in figure \ref{fig:go-tun-sample}. Go is significantly higher-level than the other two languages, and provides a memory management model that is both simpler than that of C++ and more conventional than that of Rust.
|
|
|
|
For the greedy, multi-threaded structure of this project, Go's focus on concurrency is extremely beneficial. Channels are provided by the standard runtime, and support any number of producers and consumers. This project requires both SPMC (Single Producer Multi Consumer) and MPSC (Multi Producer Single Consumer) queues, so having these available as a first-class feature of the language is a significant benefit.
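A minimal illustration of the SPMC pattern with a standard channel follows; this is illustrative only, not the project's worker code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// consumeAll sends n packets from a single producer into a channel
// and drains them with `workers` concurrent consumers (SPMC),
// returning the total number consumed. The same channel type
// supports MPSC by swapping the roles.
func consumeAll(n, workers int) int64 {
	packets := make(chan int, 16)
	var consumed int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range packets { // receives are safe from many goroutines
				atomic.AddInt64(&consumed, 1)
			}
		}()
	}

	for i := 0; i < n; i++ { // the single producer
		packets <- i
	}
	close(packets)
	wg.Wait()
	return consumed
}

func main() {
	fmt.Println(consumeAll(100, 4)) // every packet consumed exactly once
}
```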
|
|
|
|
Garbage collection and first-class concurrency combine to make the code produced for this project highly readable. The downside of the runtime is reduced speed of execution. However, for this first implementation, that compromise is acceptable: by producing code that makes the functionality of the application clear, future implementations can more easily be built to mirror it. Given the sample of speeds displayed in section (Ref Needed: Introduction Comments on Speed), and the performance shown in section \ref{section:performance-evaluation}, the compromise of using a well-suited high-level language is one worth taking.
|
|
|
|
\subsection{Evaluation Languages}
|
|
\label{section:preparation-language-choices-evaluation}
|
|
|
|
\subsubsection{Python}
|
|
|
|
Python is a dynamically typed language, and was chosen as the initial language for the evaluation suite. The first reason for this is \verb'matplotlib'\footnote{\url{https://matplotlib.org/}}, a widely used graphing library that can produce the graphs needed for this evaluation. The second reason is \verb'proxmoxer'\footnote{\url{https://github.com/proxmoxer/proxmoxer}}, a fluent API for interacting with a Proxmox server.
|
|
|
|
Having the required modules available allowed for a swift initial development sprint, which showed that the method of evaluation was viable and effective. However, the requirements of the evaluation changed as the software grew, and an important part of an agile process is adapting to changing requirements. The lack of static typing limits the refactorability of Python, which becomes increasingly challenging as a project grows. Therefore, after the initial proof of concept, it became necessary to explore another language for the Proxmox interaction.
|
|
|
|
\subsubsection{Java}
|
|
|
|
Java is statically typed, and became the implementation language for all external interaction. One of the initial reasons for not choosing Java was the lack of an equivalent library to \verb'proxmoxer'. Although two Java libraries for interacting with Proxmox are available, one is released under an incompatible license, and the other lacks adequate type safety. Developing in Java would therefore require writing my own Proxmox library. However, after the initial development in Python, it became clear that this would be a valuable use of time, and development began. Having learnt from the initial Python implementation, a clear path to producing a type-safe Proxmox API library in Java was available.
|
|
|
|
However, as Python is an incredibly popular language for data processing, the solution was not to use Java alone. Given that the graphing already existed in Python and worked perfectly well, a combined solution was chosen: Java gathers the data, and Python processes it.
|
|
|
|
% ------------------------- Requirements Analysis -------------------------- %
|
|
\section{Requirements Analysis}
|
|
\label{section:requirements-analysis}
|
|
|
|
The requirements of the project are detailed in the Success Criteria of the Project Proposal (Appendix \ref{appendix:project-proposal}), and are the primary method of evaluation for project success. They are split into three categories: success criteria, extended goals and stretch goals.
|
|
|
|
The three categories of success criteria can be summarised as follows. The success criteria, or must have elements, are to provide a multi-path proxy that is functional, secure and improves speed and resilience in specific cases. The extended goals, or should have elements, are focused on increasing the performance and flexibility of the solution. The stretch goals, or could have elements, are aimed at increasing performance by reducing overheads, and supporting IPv6 alongside IPv4.
|
|
|
|
% ------------------------- Engineering Approach --------------------------- %
|
|
\section{Engineering Approach}
|
|
\label{section:engineering-approach}
|
|
|
|
\subsubsection{Software Development Model}
|
|
|
|
The development of this software followed the agile methodology. Work was organised into 2-7 day sprints, each aiming for increased functionality in the software. By focusing on sufficient but not excessive planning, a minimum viable product was quickly established, from which the remaining features could be extracted in correctly sized segments. Examples of these sprints are: the initial build, including configuration, TUN adapters and the main program; TCP transport, enabling an end-to-end connection between the two portals; repeatable testing, providing the data to evaluate each iteration of the project against its success criteria; and UDP transport, for performance and control.
|
|
|
|
One of the most important features of any agile methodology is welcoming changing requirements \citep{beck_manifesto_2001}. As the project grew, it became clear where shortcomings existed, and these could be fixed in short sprints. An example is given in figure \ref{fig:changing-requirements}, in which the type of a variable was changed from \mintinline{go}{string} to \mintinline{go}{func() string}. This allowed for lazy evaluation, when it became clear that configuring fixed IP addresses or DNS names could be impractical with certain setups. The static typing in the chosen language enables refactors like this to be completed with ease, particularly with the development tools mentioned in the next section, reducing the incidental complexity of the agile methodology.
|
|
|
|
\begin{figure}
    \centering
    \begin{subfigure}[t]{0.45\textwidth}
        \centering
        \inputminted{go}{2_Preparation/Samples/string.go}
        \caption{The structure with a fixed local address.}
    \end{subfigure}
    \begin{subfigure}[t]{0.45\textwidth}
        \centering
        \inputminted{go}{2_Preparation/Samples/funcstring.go}
        \caption{The structure with a dynamic local address.}
    \end{subfigure}
    \caption{An example of refactoring for changing requirements.}
    \label{fig:changing-requirements}
\end{figure}

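The shape of this refactor can be sketched as follows; the structure and field names here are illustrative assumptions rather than the project's actual identifiers.

\begin{minted}{go}
package main

import "fmt"

// peer is a hypothetical configuration structure. After the refactor,
// the local address is produced by a function rather than stored as a
// fixed string, so it is resolved lazily each time it is needed.
type peer struct {
	remoteAddr string
	localAddr  func() string
}

func main() {
	p := peer{
		remoteAddr: "192.0.2.1:1234",
		// In practice this closure could perform a DNS lookup or
		// query a network interface at the moment of the call,
		// rather than returning a value fixed at configuration time.
		localAddr: func() string { return "198.51.100.7:0" },
	}
	fmt.Println(p.localAddr())
}
\end{minted}

Because the closure is invoked at the point of use, a change of address between configuration and connection time is picked up automatically.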
\subsubsection{Development Tools}

A large part of the language choice focused on development tools. As discussed in section \ref{section:language-selection}, IDE support was important to me. My preferred IDEs are those supplied by JetBrains\footnote{\url{https://jetbrains.com/}}, generously provided for education and academic research free of charge. As such, I used GoLand for the Go development of this project, IntelliJ for the Java evaluation development, and PyCharm for the Python evaluation program. Using an intelligent IDE, particularly with the statically typed Go and Java, significantly increases my productivity as a programmer, and thus reduces incidental complexity.

I used Git for version control, with a self-hosted Gitea\footnote{\url{https://gitea.com/}} server as the remote. My repositories have a multitude of on- and off-site backups taken at varying frequencies (two USB drives, two distinct cloud storage providers, a NAS, and multiple computers).

Alongside my self-hosted Gitea server, I run a self-hosted Drone by Harness\footnote{\url{http://drone.io/}} server for continuous integration. This made it simple to add a Drone file to the repository, allowing the Go tests to be run, formatting to be verified, and artefacts to be built. On each push, after verification, every artefact is built and uploaded to a central repository, where it is stored under the branch name. This is particularly useful for automated testing, as the relevant artefact can be downloaded automatically from a known location for the branch under test. Further, artefacts can be built for multiple architectures, which is particularly useful when performing real-world testing spread between x86\_64 and ARMv7 machines.

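As an illustration, a Drone pipeline of the kind described might look like the following sketch; the step names, image version, target paths and architecture matrix here are assumptions rather than the project's actual configuration.

\begin{minted}{yaml}
kind: pipeline
type: docker
name: verify-and-build

steps:
  - name: verify
    image: golang:1.16
    commands:
      - test -z "$(gofmt -l .)"   # fail if any file is unformatted
      - go vet ./...
      - go test ./...

  - name: build
    image: golang:1.16
    commands:
      # Cross-compile for both test architectures.
      - GOOS=linux GOARCH=amd64 go build -o out/portal-amd64 .
      - GOOS=linux GOARCH=arm GOARM=7 go build -o out/portal-armv7 .
\end{minted}

Each step runs in a container on push, so a failed format check or test blocks the artefact upload for that branch.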
\subsubsection{Licensing}

I have chosen to license this software under the MIT license. The MIT license is simple and permissive, enabling reuse and modification of the code, subject only to inclusion of the license text. Alongside the hope that the code will receive pull requests over time, a permissive license allows others to build upon the given solution. One potential example is a company employing a SaaS (Software as a Service) model to configure remote portals on a user's behalf, perhaps supplying the hardware required to convert this fairly involved solution into a plug-and-play option.

% ---------------------------- Starting Point ------------------------------ %

\section{Starting Point}

I had significant experience with Go before the start of this project, though I had not been formally taught it. My knowledge of networking was limited to that of a user, plus the content of the Part IB Tripos courses \emph{Computer Networking} and \emph{Principles of Communication} (the latter given after the start of this project). The security analysis drew from the Part IA course \emph{Software and Security Engineering} and the Part IB course \emph{Security}. As the software is highly concurrent, the Part IB course \emph{Concurrent and Distributed Systems} and the Part II Unit of Assessment \emph{Multicore Semantics and Programming} were also applied.

% -------------------------------- Summary --------------------------------- %

\section{Summary}

Security is a large area of this project, perhaps larger than the single success criterion suggests. This preparation has led to two clear security concepts: the system must be adaptable in code and flexible in deployment. Adaptability allows more options to be provided in the future, while deployment flexibility allows the solution to fit better into a network with special security requirements.

Go has a concurrency model excellently suited to this project, and its extensive standard library reduces incidental complexity. Using a high-level language allows for more readable code, upon which future implementations in a lower-level language could build. The structure of this project suggests a large initial program base, into which further features can be merged to reach the success criteria.