\graphicspath{{Preparation/Figs/Vector/}{Preparation/Figs/}}
\fi
Proxying packets is the process of taking packets that arrive at one location and transporting them to leave at another. This chapter focuses on the preparatory work to achieve this practically and securely, given the design outlined in the previous chapter, in which the proxy consolidates multiple connections to appear as one to both the wider Internet and devices on the local network. In sections \ref{section:risk-analysis} and \ref{section:preparation-security}, I discuss the security risks and plans to confront them. In section \ref{section:language-selection}, I present three languages: Go, Rust and C++, and provide context for choosing Go as the implementation language. Finally, in sections \ref{section:requirements-analysis} and \ref{section:engineering-approach}, I present a requirements analysis and a description of the engineering approach for the project.
% ---------------------------- Risk Analysis ------------------------------- %
\section{Risk Analysis}
Any connection between two computers presents a set of security risks. A proxy is subject to these risks, and more besides, which I will present and discuss in this section. Firstly, I focus on layered security: the case in which the Local Portal and Remote Portal, with everything in between, are viewed as a single Internet connection. The focus is on how the risks compare to those of a standard Internet connection, and what guarantees must be provided so that a proxied connection presents the same risks as a standard connection.
Secondly, this section focuses on the connections between the Local Portal and the Remote Portal. Here the primary risks are accepting and transmitting a packet that was not intended for the proxy, and sending packets to an unintended recipient.
These security problems will be considered in the context of the success criterion: provide security no worse than a standard connection. That is, in the first case the security should be at least as strong as that of a standard connection, and in the second case no additional attack vectors should be introduced.
\subsection{Higher Layer Security}
This application proxies entire IP packets, so is a layer 3 solution. As such, the goal is to maintain the same guarantees that one would normally expect at layer 3, for higher layers to build upon. At layer 3, none of anonymity, integrity, privacy or freshness are guaranteed, so it is up to the application to provide its own security guarantees. As such, maintaining the same level of security for applications can be achieved by ensuring that the packets which leave one side of the proxy are a subset of the packets that entered the other side.
This ensures that guarantees managed by layers above layer 3 are maintained. Regardless of whether a user is accessing insecure websites over HTTP, running a corporate VPN connection or sending encrypted emails, the security of these applications will be unaltered. Further, this allows other guarantees to be managed, including reliable delivery with TCP.
\subsubsection{Denial of Service}
\label{subsubsection:threats-denial-of-service}
Proxying packets in this way provides a new method of Denial of Service. If an attacker can convince either portal to send them a portion of the packets due for the other portal, the packet loss of the overall connection from the perspective of the other portal will increase by an equivalent amount. For example, if a bad actor can convince the remote portal to send them $50\%$ of the good packets, and previously the packet loss was at $0.2\%$, the packet loss will increase to $50.1\%$.
This is of particular concern for flows carried by the proxy that use loss-based congestion control. In such a case, a TCP flow will lose, on average, one of every two packets it sends. This means that the window size will be unable to grow beyond one with NewReno congestion control. As such, the performance of these flows will be severely negatively impacted.
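To make the arithmetic above concrete, consider an attacker who diverts a fraction $p$ of packets, while the underlying path independently loses a fraction $\ell$ of those that remain. This is a back-of-the-envelope sketch using the illustrative figures above, not a measured result:
\[
L = p + (1 - p)\,\ell, \qquad L = 0.5 + 0.5 \times 0.002 = 0.501 = 50.1\%.
\]
Under the commonly used square-root approximation for loss-based congestion control, the average congestion window is roughly $W \approx \sqrt{3/(2L)}$, giving $W \approx 1.7$ segments at $L = 0.5$, consistent with the claim that the window cannot grow meaningfully beyond one.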
\subsubsection{Privacy}
Though the packets leaving a modem have no reasonable expectation of privacy, having the packets enter the Internet at two points does present more points at which a packet can be read. For example, if the remote portal lies in a data center, the content and metadata of packets can be sniffed in either the data center or at the physical connections. However, this is equivalent to the packets taking a longer route through the Internet, with more hops. Therefore, comparatively, privacy is not made worse.
Further, if an attacker convinces the Remote Portal that they are a valid connection from the Local Portal, a portion of packets will be sent to them. However, as a fortunate side effect, this method of sniffing would cause a significant Denial of Service to any links using loss-based congestion control, as shown in section \ref{subsubsection:threats-denial-of-service}. Therefore, as long as it is ensured that each packet is not sent to multiple places, privacy should be maintained at a similar level to simple Internet access, given that an attacker using this active method of eavesdropping would be easy to detect.
\subsubsection{Cost}
In many cases, the remote portal will be taking advantage of a cloud instance, for the extremely high bandwidth and well-peered connections available at a reasonable price. Cloud instances are often billed per unit of outbound traffic, and as such, an attacker could impose a high cost burden on a user by forcing their remote portal to transmit more outbound data. This should be avoided by ensuring that packets are only transmitted outwards if they originate from the local portal and are fresh.
% ------------------------------- Security --------------------------------- %
\section{Security}
To provide integrity and freshness for each message, I evaluate two choices: Message Authentication Codes (MACs) or digital signatures. A MAC combines the data with a shared key using a one-way hash function to generate an authentication code; the result is therefore only verifiable by someone holding the same shared key \citep[pp. 352]{menezes_handbook_1997}. A digital signature is produced with the private key of a public/private keypair, proving that the message was produced by the owner of that private key, and can be verified by anyone with the corresponding public key \citep[pp. 147-149]{anderson_security_2008}. In both cases, the authentication data is appended to the message, such that the integrity and authenticity of the message can be verified.
The comparison is as follows: signatures provide non-repudiation, while MACs do not; one can verify which private key's owner signed a message, whereas anyone holding the shared key could have produced a given MAC. Further, digital signatures are generally more computationally expensive than MACs. Given that both ends of the connection are controlled by the same party, non-repudiation offers no benefit, and so MACs are the message authentication of choice for this project.
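To illustrate the MAC approach concretely, the following is a minimal Go sketch that appends an HMAC-SHA256 code to a packet and verifies it on receipt, using only the standard library. The function names, the example key and the assumption that the key is distributed by static configuration are illustrative; this is not necessarily the project's exact wire format.
\begin{verbatim}
package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "fmt"
)

// appendMAC appends an HMAC-SHA256 code to the packet. Both portals must hold
// the same key, distributed out of band (e.g. by static configuration).
func appendMAC(packet, key []byte) []byte {
    mac := hmac.New(sha256.New, key)
    mac.Write(packet)
    return mac.Sum(packet)
}

// verifyMAC splits the trailing code from the data and checks it in constant time.
func verifyMAC(data, key []byte) ([]byte, bool) {
    if len(data) < sha256.Size {
        return nil, false
    }
    packet, code := data[:len(data)-sha256.Size], data[len(data)-sha256.Size:]
    mac := hmac.New(sha256.New, key)
    mac.Write(packet)
    return packet, hmac.Equal(code, mac.Sum(nil))
}

func main() {
    key := []byte("illustrative pre-shared key")
    sealed := appendMAC([]byte("proxied packet payload"), key)
    packet, ok := verifyMAC(sealed, key)
    fmt.Println(ok, string(packet))
}
\end{verbatim}
The MAC alone provides integrity and authenticity; freshness would additionally require a data sequence number, discussed later, to be covered by the MAC.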
\subsection{IP Authentication Header}
\label{fig:ip-auth-header-structure}
\end{figure}
It is first important to note the differences between the use of IP authentication headers and the security footers used in this application. Firstly, the next header field is unnecessary, given that headers are not being chained. Secondly, given that the portals have a fixed security configuration, set statically, the payload length field is unnecessary: the payloads will always be of a predetermined length. Similarly, the security parameters index is unnecessary, as the parameters are identical at both ends.
The difference in security arises from the lack of integrity protection for the fields outside the application payload: that is, the IP header itself and the TCP or UDP header. However, there is an important distinction between the TCP and UDP cases: TCP's acknowledgements and congestion control are generated outside of the application's control and therefore cannot be covered by application-provided security, while congestion control implemented on top of UDP resides within the application and can be. As such, the TCP implementation provided by the solution should be used in one of two ways: as a baseline test for the performance of other algorithms, or by taking advantage of layered security as given in section \ref{section:layered-security}. The rest of this section will therefore focus on securing the UDP transport.
\subsubsection{Adapting for NAT}
To achieve authentication with the IP Authentication Header, one must authenticate the source and destination addresses of the packet. However, these addresses may be altered by NAT devices in transit between the local and remote portals. In the case of source NAT, the source of an outgoing packet is masqueraded to the public address of the NAT router, likely altering the outgoing port as well. For destination NAT, an inbound packet to a NAT router will have its address changed to the internal destination, possibly changing the destination port too.
However, each of these address translations is predictable at the time of packet sending and the time of packet receiving. For a packet that will go through source NAT, the eventual source address is predictable, in that it will be altered from the internal address to the public address of the router. Likewise, with destination NAT, the destination address of the packet will be predictable as the public address of the router that receives it. An example of this is shown in figure \ref{fig:network-address-translations}.
\begin{figure}
\centering
\label{fig:network-address-translations}
\end{figure}
Therefore, to authenticate the message's source and destination, the source address and destination address from the period between the NATs will be used. Host A can predict this by using the destination address of the flow transporting the packets and knowledge of its own NAT's public IP as the source address. Similarly, host B can predict this by using the source address of the flow transporting the packets and knowledge of its own NAT's public IP as the destination address. Although this does mean that the authentication would apply equally to any other device behind both NATs, this is an acceptable compromise for NATs controlled by the user. Achieving sufficient security under a CG-NAT is left as an exercise to the implementer, where the techniques described in section \ref{section:layered-security} can be applied.
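As a sketch of how each portal could bind packets to the predicted post-NAT addresses, the following Go snippet serialises them as a pseudo-header to be prepended to the packet before computing the MAC from the earlier sketch. The 36-byte layout, the addresses and the ports are purely illustrative assumptions, not the project's wire format.
\begin{verbatim}
package main

import (
    "encoding/binary"
    "fmt"
    "net"
)

// pseudoHeader serialises the source and destination as each portal predicts
// they will appear between the NATs, so that both ends compute the MAC over
// identical bytes. The layout here is illustrative only.
func pseudoHeader(src, dst net.IP, srcPort, dstPort uint16) []byte {
    buf := make([]byte, 36)
    copy(buf[0:16], src.To16())
    copy(buf[16:32], dst.To16())
    binary.BigEndian.PutUint16(buf[32:34], srcPort)
    binary.BigEndian.PutUint16(buf[34:36], dstPort)
    return buf
}

func main() {
    // Host A sits behind a source NAT whose public address (203.0.113.7) it
    // knows from static configuration; the remote portal's address
    // (198.51.100.2) is the destination of the carrying flow. Both are
    // therefore predictable before the packet is sent.
    hdr := pseudoHeader(net.ParseIP("203.0.113.7"), net.ParseIP("198.51.100.2"), 40000, 5000)
    fmt.Printf("%x\n", hdr)
    // The MAC would then be computed over append(hdr, packet...) rather than
    // over the packet alone, binding the packet to the predicted addresses.
}
\end{verbatim}
Host B performs the mirror-image prediction, using the source address of the carrying flow and its own NAT's public address, so that both ends arrive at the same bytes.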
\subsubsection{Replay Attacks}
\subsection{Layered Security}
\label{section:layered-security}
It was previously mentioned that this solution focuses on maintaining the higher layer security of proxied packets. Further to this, this solution provides transparent security in the other direction. Consider the case of a satellite office that employs both a whole-network corporate VPN and this solution. The network can be configured in one of two ways: the multipath proxy runs behind the VPN, or the VPN runs behind the multipath proxy.
These two examples are given in figures \ref{fig:whole-network-vpn-behind} and \ref{fig:whole-network-vpn-infront}, for the VPN Wireguard \citep{donenfeld_wireguard_2017}. In figure \ref{fig:whole-network-vpn-infront}, the portals are only accessible via the VPN protected network. It can be seen that the packet in figure \ref{fig:whole-network-vpn-infront} is shorter, given the removal of the message authentication code and the data sequence number. The data sequence number is unnecessary, given that Wireguard uses the same anti-replay algorithm, and thus replayed packets would have been caught entering the secure network. Further, the message authentication code is unnecessary, as the authenticity of packets is now guaranteed by Wireguard.
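The data sequence number mentioned above implies an anti-replay check at the receiving portal. The following is a minimal Go sketch of the sliding-window style of check used by IPsec and Wireguard; the 64-entry window, type and method names are assumptions for illustration, not the project's implementation.
\begin{verbatim}
package main

import "fmt"

// replayWindow tracks the highest sequence number seen and a bitmap of the
// window of sequence numbers immediately below it, in the style of the
// IPsec/Wireguard anti-replay algorithms. The window size is illustrative.
type replayWindow struct {
    highest uint64
    bitmap  uint64 // bit i set => sequence number (highest - i) has been seen
}

// check returns true if seq is fresh, and records it; replayed or too-old
// sequence numbers are rejected.
func (w *replayWindow) check(seq uint64) bool {
    const windowSize = 64
    switch {
    case seq > w.highest:
        shift := seq - w.highest
        if shift >= windowSize {
            w.bitmap = 0
        } else {
            w.bitmap <<= shift
        }
        w.bitmap |= 1 // mark seq itself as seen
        w.highest = seq
        return true
    case w.highest-seq >= windowSize:
        return false // too old to track: reject
    default:
        bit := uint64(1) << (w.highest - seq)
        if w.bitmap&bit != 0 {
            return false // already seen: replay
        }
        w.bitmap |= bit
        return true
    }
}

func main() {
    var w replayWindow
    fmt.Println(w.check(1), w.check(2), w.check(2), w.check(5), w.check(3))
    // prints: true true false true true
}
\end{verbatim}
Note that a moderately reordered packet (sequence 3 arriving after 5 in the example) is still accepted, which is relevant for a multipath proxy, where reordering between paths is to be expected.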
Supporting and encouraging this layering of protocols provides a second benefit: if the security in this solution breaks with time, there are two options to repair it. One can either fix the open-source application, or compose it with a security solution that is not broken, but perhaps provides extraneous security guarantees and therefore reduces performance. To this end, the security features mentioned will all be configurable, which allows for flexibility in implementation.
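As an indication of what such configurability might look like in Go, the following sketch uses hypothetical option names; it does not reflect the project's actual configuration surface.
\begin{verbatim}
package main

import "fmt"

// SecurityOptions illustrates how the security features discussed above could
// be toggled per deployment. The option names are hypothetical.
type SecurityOptions struct {
    EnableMAC        bool // append and verify a MAC on each packet
    EnableAntiReplay bool // include and check a data sequence number
}

func main() {
    // Standalone deployment: the proxy provides its own authenticity and
    // anti-replay protection.
    standalone := SecurityOptions{EnableMAC: true, EnableAntiReplay: true}

    // Running behind Wireguard (the layered case above): both features can be
    // disabled, as Wireguard already provides them.
    behindVPN := SecurityOptions{}

    fmt.Println(standalone, behindVPN)
}
\end{verbatim}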
\subsubsection{Rust}
Rust is memory safe and thread safe, solving the latter issues with C++. Rust also has no runtime, allowing for similar execution speed, comparable to C or C++. The Rust sample is given in figure \ref{fig:rust-tun-sample}, and is pleasantly concise.
For the purposes of this project, the downsides of Rust come from its youth. This is two-faceted: IDE support and crate stability. Firstly, IDE support for Rust in my IDEs of choice is provided via a plugin to IntelliJ, and is not as mature as that for many other languages. Secondly, the crate available for TUN support (tun-tap\footnote{\url{https://docs.rs/tun-tap/}}) does not yet provide a stable API, which became apparent even during the development of this small test program. Between writing the program initially and re-testing it to place in this document, the API of the crate had changed to the point where my program no longer type-checked. Further, the old version had disappeared, and thus I was left with a program that neither compiled nor functioned. Although writing my own bindings for TUN interaction would not be an issue, the safety benefits of Rust would be less pronounced, as the direct systems interaction would require unsafe code, leading to an increased potential for bugs.
\subsubsection{Development Tools}
A large part of the language choice focused on development tools. As discussed in section \ref{section:language-selection}, IDE support was important to me. My preferred IDEs are those supplied by JetBrains\footnote{\url{https://jetbrains.com/}}, generously provided for education and academic research free of charge. As such, I used GoLand for the Go development of this project, IntelliJ for the Java evaluation development, and PyCharm for the Python evaluation program. Using an intelligent IDE, particularly with the statically typed Go and Java, significantly increases my productivity as a programmer, and thus reduces incidental complexity.
I used Git version control, with a self-hosted Gitea\footnote{\url{https://gitea.com/}} server as the remote. My repositories have a multitude of on- and off-site backups at varying frequencies (2xUSB + 2xDistinct Cloud Storage + NAS + Multiple Computers).
Alongside my self-hosted Gitea server, I have a self-hosted Drone by Harness\footnote{\url{http://drone.io/}} server for continuous integration. This made it simple to add a Drone file to the repository, allowing the Go tests to be run, formatting to be verified, and artefacts to be built. On a push, after this verification, each artefact is built and uploaded to a central repository, where it is stored under the branch name. This is particularly useful for automated testing, as the relevant artefact can be downloaded automatically from a known location for the branch under test. Further, artefacts can be built for multiple architectures, which is particularly useful when performing real-world testing spread between x86\_64 and ARMv7 architectures.
\subsubsection{Licensing}