Update on Overleaf.

2020-12-31 20:32:32 +00:00 · 2020-12-31 20:32:32 +00:00 · ae970549cf
commit ae970549cf
parent bde59e2194
6 changed files with 121 additions and 30 deletions
--- a/Implementation/implementation.tex
+++ b/Implementation/implementation.tex
@ -252,3 +252,65 @@ TCP requires further work than UDP, as the TCP handshake is out of the control o

 TODO

+\subsection{Repeated Packets}
+\label{section:implementation-repeated-packets}
+
+As discussed in section \ref{section:preparation-repeated-packets}, care must be taken to avoid a bad actor repeating packets. Although some degree of freshness is provided by including a timestamp in the hashed packet data, it is inadequate in the case of a determined attacker. To resolve this, a data structure is built, given in figure \ref{fig:data-structure-hash-store}. The data structure has a focus on execution speed, as this check could quickly become costly. To achieve this, the data structure combines both a map, implemented efficiently in the Go runtime, and a Binary Heap. This provides the time complexity given in figure \ref{fig:time-complexity-hash-store}. The code given, although almost syntactically correct, is pseudocode: Go's lack of support for generic programming unfortunately means that \verb'BinaryHeap[*packetStore]' is not valid Go code, and must be replaced with a data structure explicitly for that type, such as \verb'PacketStoreBinaryHeap', written specifically for the \verb'*packetStore' type.
+
+\begin{figure}
+    \begin{minted}{go}
+type packetStore struct {
+  h [16]byte
+  t time.Time
+}
+
+type HashStore struct {
+  m map[[16]byte]*packetStore
+  h BinaryHeap[*packetStore]
+}
+
+func (s *HashStore) Size() int {
+  return len(s.m)
+}
+
+func (s *HashStore) RemoveOldest() {
+  o := s.h.Pop()
+  delete(s.m, o.h)
+}
+
+func (s *HashStore) Store(h [16]byte, t time.Time) (existed bool) {
+  p, found := s.m[h]
+  if found {
+    p.t = t
+    s.h.Update(p)
+    return true
+  }
+  
+  p = &packetStore{h: h, t: t}
+  s.m[h] = p
+  s.h.Insert(p)
+  
+  return false
+}
+    \end{minted}
+    \caption{Pseudocode for the implementation of HashStore, a mechanism of storing a number of most recent hashes in a Set}
+    \label{fig:data-structure-hash-store}
+\end{figure}
+
+\begin{figure}
+    \centering
+    \begin{tabular}{c|c|c}
+         Method       & Fresh Packet   & Repeated Packet \\
+         \hline
+         Store        & $O(1)$         & $O(n)$          \\
+         \hline
+         Size         & \multicolumn{2}{c}{$O(1)$}       \\
+         \hline
+         RemoveOldest & \multicolumn{2}{c}{$O(\log n)$}  \\
+         \hline
+    \end{tabular}
+    \caption{Time complexity of the HashStore methods for fresh and repeated packets}
+    \label{fig:time-complexity-hash-store}
+\end{figure}
+
+In the stable state, each of \verb'Store', \verb'Size' and \verb'RemoveOldest' is called for each received packet. Each packet first calls \verb'Store', before diverging based on freshness. A repeated packet updates the store at a cost of $O(n)$, due to the cost of traversing the heap to find the correct entry to heapify from. Although this linear-time operation is expensive, it is deemed acceptable for two reasons: the average case is far better, and the code can react in other ways to prevent repeat packets. The average case is better due to the layout of a heap: the least item in a heap can be in only a few positions.
--- a/Introduction/introduction.tex
+++ b/Introduction/introduction.tex
@ -11,8 +11,6 @@
    \graphicspath{{Introduction/Figs/Vector/}{Introduction/Figs/}}
 \fi

-\section{Motivation}
-
 Most UK residential broadband connections are advertised at between 30Mbps and 100Mbps download (Ofcom, “UK Home Broadband Performance.”, \cite{ofcom_performance_2020}). However, it is often possible to have multiple low bandwidth connections installed. More generally, a wider variety of Internet connections for fixed locations are becoming available with time. These include: DSL, Fibre To The Premises, 4G, 5G, Wireless ISPs such as LARIAT and Low Earth Orbit ISPs such as Starlink. This work focuses on a method of providing an aggregate link from multiple distinct connections, regardless of their likeness.

 \section{Existing Work}
--- a/Preparation/Figs/Middlebox.png
+++ b/Preparation/Figs/Middlebox.png
--- a/Preparation/preparation.tex
+++ b/Preparation/preparation.tex
@ -10,14 +10,15 @@
    \graphicspath{{Preparation/Figs/Vector/}{Preparation/Figs/}}
 \fi

-Proxying packets is the process of taking packets arriving at one location and transporting them to leave at another. This chapter focuses on the preparatory work to achieve this correctly, given the design laid out in the previous chapter. In sections \ref{section:threat-model} and \ref{section:preparation-security}, I discuss the security threats and plans to confront them. In section 2.3, I present three languages: Go, Rust and C++ - choosing the most appropriate.  Finally, in sections \ref{section:requirements-analysis} and \ref{section:engineering-approach}, I present a requirements analysis and a summary of the engineering approach for the project.
+Proxying packets is the process of taking packets arriving at one location and transporting them to leave at another. This chapter focuses on the preparatory work to achieve this correctly, given the design laid out in the previous chapter. In sections \ref{section:risk-analysis} to \ref{section:preparation-security}, I discuss the security threats and plans to confront them. In section \ref{section:language-selection}, I present three languages: Go, Rust and C++ - providing context for choosing Go as the implementation language for this project.  Finally, in sections \ref{section:requirements-analysis} and \ref{section:engineering-approach}, I present a requirements analysis and a description of the engineering approach for the project.

 % ---------------------------- Risk Analysis ------------------------------- %
 \section{Risk Analysis}
+\label{section:risk-analysis}

-Proxying a network connection via a Remote Portal creates an expanded set of security threats than connecting directly to the Internet via a modem. In this section, I will analyse these threats, in both isolation, and compared to the case of connecting directly.
+Proxying a network connection via a Remote Portal creates a larger set of security risks than connecting directly to the Internet via a modem. In this section, I will analyse these risks, both in isolation and compared to the case of connecting directly.

-The first focus of this analysis is the transparent security. That is, if the Local Portal is treated as a modem, what security would normally be expected? And for servers communicating with the Remote Portal, what guarantees can they expect of the packets sent and received?
+The first focus of this analysis is the transparent security. That is, if the Local Portal is treated as a modem, what security can normally be expected? And for servers communicating with the Remote Portal, what guarantees can they expect of the packets sent and received?

 The second focus is the direct interaction between the Local Portal and the Remote Portal. For example, does having this system make it easier for someone to perform a Denial of Service attack on the principal?

@ -25,9 +26,9 @@ These security problems will be considered in the context of the success criteri

 \subsection{Transparent Security}

-A convenient factor of the Internet being an interconnected set of smaller networks is that there are very few guarantees of security. At layer 3, none of anonymity, integrity, privacy or freshness are provided once the packet leaves private ranges, so it is up to the application to ensure its own security on top of this lack of guarantees. For the purposes of this software, this is very useful: if there are no guarantees to maintain, applications can be expected to act correctly regardless of how easy it is for these cases to occur.
+A convenient factor of the Internet being an interconnected set of smaller networks is that there are very few guarantees of security. At layer 3, none of anonymity, integrity, privacy or freshness are provided, so it is up to the application to ensure its own security on top of this lack of guarantees. For the purposes of this software, this is very useful: if there are no guarantees to maintain, applications can be expected to act correctly regardless of how easy it is for these cases to occur.

-Therefore, to maintain the same level of security for applications, this project can simply guarantee that the packets which leave the Remote Portal are the same as those that came in. By doing this, all of the security implemented above Layer 3 will be maintained. This means that whether a user is accessing insecure websites over HTTP, running a corporate VPN connection or sending encrypted emails, the security of these applications will be unaltered.
+Therefore, to maintain the same level of security for applications, this project can simply guarantee that the set of packets which leave the Remote Portal is a subset of those that entered the Local Portal, and vice versa. By doing this, all of the security implemented above Layer 3 will be maintained. This means that whether a user is accessing insecure websites over HTTP, running a corporate VPN connection or sending encrypted emails, the security of these applications will be unaltered.

 \subsection{Portal to Portal Communication}

@ -88,19 +89,21 @@ Though the packets leaving a modem have no reasonable expectation of privacy, ha
 \section{Threat Model}
 \label{section:threat-model}

+In this section, I discuss a set of threats that exploit the risks discussed in section \ref{section:risk-analysis}.
+
 \subsection{Stealing Packets}

-This section focuses on a bad actor that prevents packets arriving at the Remote Portal reaching the Local Portal. Recall, as stated in the Risk Analysis section, that this is high risk - taking packets causes significant packet loss and thus effectively denies service for loss based congestion control mechanisms.
+This section focuses on a bad actor that prevents packets arriving at the Remote Portal from reaching the Local Portal. Recall, as stated in the Risk Analysis section, that this is high risk - taking packets causes significant packet loss and thus effectively denies service for loss-based congestion control mechanisms. Note that this excludes methods such as cutting cables, which would apply equally without this solution.

 \subsubsection{Reflection Attacks}

 An attack vector for someone attempting to read but not write packets is a reflection attack. A reflection attack is one where the attacker presents the challenge they have been given either back to the victim or to a friend of the victim, in order to receive the correct challenge response (Anderson 2008; pp. 76-78).

 \begin{align*}
-    F &\longrightarrow B : N \\
-    B &\longrightarrow F : N \\
-    B &\longrightarrow F: \{N\}_k \\
-    F &\longrightarrow B: \{N\}_k
+    A \longrightarrow M &: N \\
+    M \longrightarrow A &: N \\
+    A \longrightarrow M &: \{N\}_k \\
+    M \longrightarrow A &: \{N\}_k
 \end{align*}

 \subsection{Sending Fresh Packets}
@ -109,31 +112,59 @@ A Bad Speaker is a bad actor that sends packets of their own to one of the serve

 \subsection{Repeating Packets}

-A bad actor having the ability to cause the Remote Portal to resend a packet relates to the Cost section of the Risk Analysis given above. As each packet forwarded has an essentially fixed cost, repeating these packets one or many times can cost the subject.
+A bad actor having the ability to cause the Remote Portal to resend a packet relates to the Cost section of the Risk Analysis given above. As each packet forwarded has an essentially fixed cost, repeating these packets one or many times can cost the subject. This threat exists if a message does not successfully guarantee freshness.

 \subsubsection{Man in the Middle}

-This threat is based in an actor wishing to force cost upon you, and having significantly faster a connection between their machine and the Remote Portal.
+This threat is based on an actor wishing to force cost upon you. In the example layout given in figure \ref{fig:mitm-middlebox}, the middlebox can sniff the packets from anywhere on the path(s) between the Local Portal and the Remote Portal. In this case, the middlebox could increase the traffic leaving the Remote Portal by 100x at full load, or even more at lower load, also increasing the cost by 100x or more.
+
+\begin{figure}
+    \centering
+    \includegraphics[width=12cm]{Middlebox.png}
+    \caption{A middlebox placed to perform a repeating packet man in the middle attack between a local and remote portal}
+    \label{fig:mitm-middlebox}
+\end{figure}

 % ------------------------------- Security --------------------------------- %
 \section{Security}
 \label{section:preparation-security}

-The security of this application is designed to target the threats mentioned in the threat model.
+This section provides means of confronting the threats given in section \ref{section:threat-model}, in order to alleviate the additional risk of proxying traffic.

-TODO
+\subsection{Message Authentication}

-\subsection{Symmetric Key Cryptography}
+When providing integrity and authentication for a message, there are two main choices: a Hash-based Message Authentication Code (HMAC) or Signing. Signing a message uses the private key in a public/private keypair to produce a digital signature for a message, stating that the message was produced by the owner of the private key. This can be verified by anyone with the public key. An HMAC instead prefaces the data with a shared key, before 

-When providing integrity and authentication for a message, there are two main choices: a Message Authentication Code (MAC) or signing.
+\begin{align*}
+    A \longrightarrow B &: N_1, A \\
+    B \longrightarrow A &: \{N_1, A, B, N_2\}_k \\
+    A \longrightarrow B &: \{A, B, N_2\}_k
+\end{align*}
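The HMAC option described at the start of this subsection can be sketched with Go's standard \verb'crypto/hmac' package. This is an illustration under stated assumptions rather than the project's code: SHA-256 is chosen arbitrarily as the hash function (the hashes used elsewhere in this work are 16 bytes), and the \verb'sign' and \verb'verify' helpers are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// sign returns an HMAC tag for msg under the shared key.
// SHA-256 is an assumption made for this sketch.
func sign(key, msg []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg) // hash.Hash writes never return an error
	return mac.Sum(nil)
}

// verify recomputes the tag and compares it in constant time.
func verify(key, msg, tag []byte) bool {
	return hmac.Equal(sign(key, msg), tag)
}

func main() {
	key := []byte("shared secret")
	msg := []byte("packet payload")
	tag := sign(key, msg)

	fmt.Println(verify(key, msg, tag))                // true
	fmt.Println(verify(key, []byte("tampered"), tag)) // false
}
```

Unlike a signature, the tag can only be checked by a holder of the shared key, which suits a system where both portals are controlled by the same party.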

 TODO: Finish this section.

+\subsection{Repeated Packets}
+\label{section:preparation-repeated-packets}
+
+Although a timestamp is included with each packet, the time delay between the packet being dispatched by one side and received by the other is significant. As this is the case, there must be significant flexibility in how old a received packet can be - chosen to be 5 seconds. An attacker, as pictured in figure \ref{fig:mitm-middlebox}, could therefore send a number of packets only limited by their own bandwidth, if they can gain a fresh packet at least once every 5 seconds.
+
+The cryptography above can also be used to prevent repeated packets. As each message is already hashed, this can be done with little additional computational load. This hash can then be compared against future packets, and if the hash is found, the packet discarded. It is clear that the system cannot store the hash of every packet received over its lifetime - that's 16 bytes per packet, which would rapidly consume memory. Therefore, the choice of data structure to provide a replay prevention mechanism is important. The data structure should conform to the following interface:
+
+\begin{minted}{go}
+type HashStore interface {
+    Store(h [16]byte, t time.Time) bool
+    Size()                         int
+    RemoveOldest()
+}
+\end{minted}
+
+This data structure functions as a set, storing and informing whether a hash is already present. It allows deletion of the least recently received hash, providing a simple mechanism of limiting the maximum memory footprint. An efficient implementation of this is presented in section \ref{section:implementation-repeated-packets}.
+
 % -------------------------- Language Selection ---------------------------- %
 \section{Language Selection}
 \label{section:language-selection}

-In this section, I evaluate three potential languages (C++, Rust and Go) for the development of this software. To support this evaluation, I have provided a sample script in each language. The sample script is intended to be a minimal example of collected packets from a TUN device, placing them in a queue from a single thread, and consuming the packets from the queue with multiple threads. These examples are given in figures \ref{fig:cpp-tun-sample} through \ref{fig:go-tun-sample}.
+In this section, I evaluate three potential languages (C++, Rust and Go) for the development of this software. To support this evaluation, I have provided a sample program in each language. The sample program is intended to be a minimal example of reading packets from a TUN interface, placing them in a queue from a single thread, and consuming the packets from the queue with multiple threads. These examples are given in figures \ref{fig:cpp-tun-sample} through \ref{fig:go-tun-sample}.

 \subsubsection{C++}

@ -175,7 +206,7 @@ Garbage collection and first order concurrency come together to make the code pr
 \section{Requirements Analysis}
 \label{section:requirements-analysis}

-TODO
+The requirements of the project are detailed in the Success Criteria of the Project Proposal (Appendix \ref{appendix:project-proposal}), and are the primary method of evaluation for project success. They are split into three categories: success criteria, extended goals and stretch goals.

 % ------------------------- Engineering Approach --------------------------- %
 \section{Engineering Approach}
@ -185,9 +216,9 @@ TODO

 The development of this software used the iterative model, with the initial iteration following a waterfall model. As mentioned above, the core deliverable of this project is large, such that much programming was required before systems testing became a possibility. The waterfall model best suited this - building the software in separately tested parts, then putting significant focus on systems testing.

-As many of the requirements laid out in the project proposal's success criteria are quantitative system performance tests, I developed a system to automate this as part of the initial waterfall. This allowed frequent testing of the software against the success criteria. 
+As many of the requirements laid out in the project proposal's success criteria are quantitative system performance tests, I developed a system to automate this as part of the initial waterfall. This allowed frequent evaluation of the software against the success criteria. 

-The rest of the iterations were much smaller than the first, with each focusing on improving a specific factor. These iterations could be continued until the quantitative success criteria were satisfied, meaning that the software had met its intended use.
+The rest of the iterations were much smaller than the first, with each focusing on improving a specific factor. These iterations were continued until the success criteria were satisfied, meaning that the software had met its intended use.

 \subsubsection{Development Tools}

@ -195,11 +226,11 @@ A large part of the language choice focused on development tools. As discussed i

 I used Git version control, with a self-hosted Gitea\footnote{\url{https://gitea.com/}} server as the remote. My repositories have a multitude of on- and off-site backups, at varying frequencies (2xUSB + 2xCloud Storage + NAS + Multiple Computers).

-Alongside my self-hosted Gitea server, I have a self hosted Drone by Harness\footnote{\url{http://drone.io/}} server for CI. This made it simple to add a Drone file to the repository, allowing for the Go tests to be ran, and using a script with the gofmt\footnote{\url{https://golang.org/cmd/gofmt/}} tool.
+Alongside my self-hosted Gitea server, I have a self-hosted Drone by Harness\footnote{\url{http://drone.io/}} server for continuous integration. This made it simple to add a Drone file to the repository, allowing for the Go tests to be run, and using a script with the gofmt\footnote{\url{https://golang.org/cmd/gofmt/}} tool.

 \mint{shell-session}`bash -c "gofmt -l . | wc -l | cmp -s <(echo 0) || (gofmt -l . && exit 1)"`

-This script, ran by Drone, rejects any pushes to the Git repository that do not conform to the formatting specified by the gofmt tool. This may seem simple, but ensuring that both branches in a merge are consistently formatted can significantly reduce merge issues.
+This script, run by Drone, rejects any pushes to the Git repository that do not conform to the formatting specified by the gofmt tool. Ensuring that all branches are consistently formatted can significantly reduce merge issues.

 \subsubsection{Licensing}

@ -208,9 +239,9 @@ I have chosen to license this software under the MIT license. The MIT license is
 % ---------------------------- Starting Point ------------------------------ %
 \section{Starting Point}

-TODO
+I had significant experience with the language Go before the start of this project, though not formally taught. My knowledge of networking is limited to that of a user, and the content of the Part IB Tripos courses \emph{Computer Networking} and \emph{Principles of Communication} (the latter given after the start of this project). The security analysis drew from the Part IA course \emph{Software and Security Engineering} and the Part IB course \emph{Security}. As the software is highly concurrent, the Part IB course \emph{Concurrent and Distributed Systems} and the Part II Unit of Assessment \emph{Multicore Semantics and Programming} proved useful.

 % -------------------------------- Summary --------------------------------- %
 \section{Summary}

-TODO
+Security is a large area in this project - perhaps more than the single success criterion suggests.
--- a/Proposal/proposal.tex
+++ b/Proposal/proposal.tex
@ -1,6 +1,6 @@
 % ************************** Proposal **************************

-\begin{proposal}
-  \includepdf[pages=-]{Proposal/project-proposal.pdf}
-\end{proposal}
+\chapter{Project Proposal}
+\label{appendix:project-proposal}

+\includepdf[pages=-]{Proposal/project-proposal.pdf}
--- a/thesis.tex
+++ b/thesis.tex
@ -177,6 +177,7 @@

 \include{GraphGeneration/graphgeneration}
 \include{OutboundGraphs/outboundgraphs}
+\include{Proposal/proposal}

 \end{appendices}

@ -184,7 +185,6 @@
 \printthesisindex % If index is present

 % ************************************** Proposal ******************************
-\include{Proposal/proposal}

 %TC:endignore
 \end{document}