Update on Overleaf.

2021-02-08 22:26:59 +00:00 · 2021-02-08 22:26:59 +00:00 · 8a5514bc0f
commit 8a5514bc0f
parent e450e1a91f
4 changed files with 5501 additions and 61 deletions
--- a/Implementation/implementation.tex
+++ b/Implementation/implementation.tex
@@ -185,6 +185,60 @@ To resolve the issues seen with TCP, an implementation using UDP was built as an

 \subsection{Congestion Control}

+Congestion control most commonly exists in the context of reliable delivery, which gives TCP congestion control algorithms a significant benefit: cumulative acknowledgements. As every byte should eventually arrive unless the connection has faulted, the acknowledgement number can simply be kept at the highest byte received in order. For an unreliable protocol this is not as simple: which packets were missed? An example of this issue is given in figure \ref{fig:sequence-ack-nack-comparison}.
+
+\begin{figure}
+  \hfill
+  \begin{subfigure}[t]{0.3\textwidth}
+    \centering
+    \begin{tabular}{|c|c|}
+      SEQ & ACK \\
+      1 & 0 \\
+      2 & 0 \\
+      3 & 2 \\
+      4 & 2 \\
+      5 & 2 \\
+      6 & 5 \\
+      6 & 6
+    \end{tabular}
+    \caption{ACKs responding to in order sequence numbers}
+    \label{fig:sequence-ack-continuous}
+  \end{subfigure}\hfill
+  \begin{subfigure}[t]{0.3\textwidth}
+    \centering
+    \begin{tabular}{|c|c|}
+      SEQ & ACK \\
+      1 & 0 \\
+      2 & 0 \\
+      3 & 2 \\
+      5 & 3 \\
+      6 & 3 \\
+      7 & 3 \\
+      7 & 3
+    \end{tabular}
+    \caption{ACKs responding to a missing sequence number}
+    \label{fig:sequence-ack-discontinuous}
+  \end{subfigure}\hfill
+  \begin{subfigure}[t]{0.35\textwidth}
+    \centering
+    \begin{tabular}{|c|c|c|}
+      SEQ & ACK & NACK \\
+      1 & 0 & 0 \\
+      2 & 0 & 0 \\
+      3 & 2 & 0 \\
+      5 & 2 & 0 \\
+      6 & 2 & 0 \\
+      7 & 6 & 4 \\
+      7 & 7 & 4
+    \end{tabular}
+    \caption{ACKs and NACKs responding to a missing sequence number}
+    \label{fig:sequence-ack-nack-discontinuous}
+  \end{subfigure}
+  \caption{ACKs and NACKs responding to sequence numbers}
+  \label{fig:sequence-ack-nack-comparison}
+  \hfill
+\end{figure}
+
 \mynote{Write this.}

 \subsubsection{New Reno}
@@ -198,11 +252,11 @@ To resolve the issues seen with TCP, an implementation using UDP was built as an

 \subsection{Routing}

-\mynote{Write this.}
+\mynote{Left for now. I have a bit of an idea about why this might or might not be necessary, and want to discuss it in slightly more context (it differs massively between use cases).}

 \subsection{Buffers}

-\mynote{Write this.}
+\mynote{Left for now. I have some specific debugging I want to do on the real connection to see if I can come up with better reasons for this. Currently it's along the lines of ``setting it to 0 solves one problem but might make another worse''.}

 % ----------------------------- Security ----------------------------------- %
 \section{Security}
@@ -239,16 +293,16 @@ func (None) Next(b []byte) ([]byte, []byte, error) { return nil, nil, nil }
    \label{fig:crypto-exchanges-none}
 \end{figure}

-The exchanges are designed to be flexible for any packet transport method. The flow of using an exchange is shown in pseudocode in figure \ref{fig:crypto-exchanges-pseudocode}.  The flow manages all packets exchanged by the connection, until the exchange confirms that it is done, or fails. The exchange can also produce data during the exchange, which is handed off to the normal packet handling mechanism, represented in the pseudocode by the \verb'yield' keyword.
+The exchanges are designed to be applicable to any packet transport method. The flow of using an exchange is shown in pseudocode in figure \ref{fig:crypto-exchanges-pseudocode}. The flow manages all packets exchanged by the connection until the exchange either confirms that it is done or fails. The exchange can also produce data during the exchange, which is handed off to the normal packet handling mechanism, represented in the pseudocode by the \verb'yield' keyword.
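
To make the flow more concrete, the loop can be sketched in Go as below. This is a minimal sketch, not the project's code. The \verb'Exchange' interface mirrors the method names in the pseudocode and the \verb'Next' signature of the \verb'None' exchange. The signature of \verb'First' is assumed. The \verb'send', \verb'receive' and \verb'deliver' hooks and the toy \verb'pingExchange' are hypothetical, with \verb'deliver' standing in for \verb'yield'.

```go
package main

import (
	"errors"
	"fmt"
)

// Exchange mirrors the pseudocode's methods. Next returns (packet to send,
// data to hand off, error), as in the None exchange; First's signature is assumed.
type Exchange interface {
	First() ([]byte, error)
	Next(in []byte) (reply []byte, out []byte, err error)
	IsDone() bool
}

// runExchange drives an exchange until it is done or fails. The send, receive
// and deliver hooks are hypothetical; deliver plays the role of yield.
func runExchange(ex Exchange, goesFirst bool,
	send func([]byte) error, receive func() ([]byte, error), deliver func([]byte)) error {
	if ex.IsDone() {
		return nil
	}
	if goesFirst {
		p, err := ex.First()
		if err != nil {
			return err
		}
		if err := send(p); err != nil {
			return err
		}
	}
	for !ex.IsDone() {
		p, err := receive()
		if err != nil {
			return err
		}
		reply, out, err := ex.Next(p)
		if err != nil {
			return err
		}
		if out != nil {
			deliver(out) // data produced during the exchange
		}
		if reply != nil {
			if err := send(reply); err != nil {
				return err
			}
		}
	}
	return nil
}

// pingExchange is a toy exchange invented for illustration: it sends "ping"
// first and is done once it receives "pong".
type pingExchange struct{ done bool }

func (e *pingExchange) First() ([]byte, error) { return []byte("ping"), nil }

func (e *pingExchange) Next(in []byte) ([]byte, []byte, error) {
	if string(in) != "pong" {
		return nil, nil, errors.New("unexpected packet")
	}
	e.done = true
	return nil, []byte("early data"), nil
}

func (e *pingExchange) IsDone() bool { return e.done }

func main() {
	var sent []string
	inbox := [][]byte{[]byte("pong")}
	err := runExchange(&pingExchange{}, true,
		func(p []byte) error { sent = append(sent, string(p)); return nil },
		func() ([]byte, error) {
			if len(inbox) == 0 {
				return nil, errors.New("no more packets")
			}
			p := inbox[0]
			inbox = inbox[1:]
			return p, nil
		},
		func(out []byte) { fmt.Println("delivered:", string(out)) },
	)
	fmt.Println(sent, err)
}
```

Because the transport is reduced to the \verb'send' and \verb'receive' hooks, the same loop would apply unchanged to any packet transport.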

 \begin{figure}
    \centering
    \begin{minted}{python}
-if !exchange.IsDone():
+if not exchange.IsDone():
  if this_side_goes_first:
    send(try(exchange.First()))
    
-  while !exchange.IsDone():
+  while not exchange.IsDone():
    packet = receive()
    next, out = try(exchange.Next(packet))
    if out is not None:
@@ -265,11 +319,10 @@ if !exchange.IsDone():

 As discussed in section \ref{section:preparation-repeated-packets}, the algorithm used to prevent repeated packets is the \emph{IPsec Anti-Replay Algorithm without Bit Shifting} \citep{tsou_ipsec_2012}. In the referenced work, a \verb'C' implementation is provided, which could be easily adapted to Go.

-Although the repeated packet protection is primarily targeted at avoiding repeated IP packets, it is possible for it to be used elsewhere, as mentioned in exchanges. As such, the interface for repeat protection is made concurrent, such that it can be used anywhere in the program it's deemed useful. Given that initial flow exchanges should happen exceedingly rarely compared to packet flow, this should not affect the size of the window in a meaningful way.
-
+Although the repeated packet protection is primarily targeted at avoiding repeated IP packets, it can also be used elsewhere, as mentioned in the discussion of exchanges. As such, the interface for repeat protection is made safe for concurrent use, so that it can be used anywhere in the program it is deemed useful. Given that initial flow exchanges should happen exceedingly rarely compared to packet flow, this should not significantly affect the size of the repeated packet window.
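
A minimal Go sketch of a concurrency-safe window is given below. Like the algorithm without bit shifting, it clears whole bitmap blocks as the window slides instead of shifting the bitmap. It is not a port of the referenced C code. The block size, the number of blocks and the \verb'ReplayWindow' type are illustrative assumptions. The window is conservatively taken to be one block smaller than the bitmap, because the block holding the highest sequence number reuses a slot.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	blockBits  = 64 // bits per bitmap block (one uint64)
	blockShift = 6  // log2(blockBits)
	numBlocks  = 4  // bitmap length in blocks; must be a power of two
	blockMask  = numBlocks - 1
	// Only numBlocks-1 full blocks are guaranteed to be tracked, because the
	// block holding the highest sequence number reuses a slot.
	windowSize = (numBlocks - 1) * blockBits
)

// ReplayWindow is a hypothetical concurrency-safe anti-replay window that
// clears whole blocks as it slides instead of shifting a bitmap.
type ReplayWindow struct {
	mu     sync.Mutex
	last   uint64 // highest sequence number accepted so far
	bitmap [numBlocks]uint64
}

// Check reports whether seq has not been seen before and, if so, records it.
func (w *ReplayWindow) Check(seq uint64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()

	if seq > w.last {
		// Slide the window: zero every block between the old and new
		// highest block (at most the whole bitmap).
		oldBlock, newBlock := w.last>>blockShift, seq>>blockShift
		diff := newBlock - oldBlock
		if diff > numBlocks {
			diff = numBlocks
		}
		for i := uint64(1); i <= diff; i++ {
			w.bitmap[(oldBlock+i)&blockMask] = 0
		}
		w.last = seq
	} else if w.last >= windowSize && seq < w.last-windowSize {
		return false // too old to judge, so treat as a replay
	}

	index := (seq >> blockShift) & blockMask
	bit := uint64(1) << (seq & (blockBits - 1))
	if w.bitmap[index]&bit != 0 {
		return false // already seen
	}
	w.bitmap[index] |= bit
	return true
}

func main() {
	var w ReplayWindow
	fmt.Println(w.Check(1), w.Check(2), w.Check(1))
	fmt.Println(w.Check(1000), w.Check(5), w.Check(990))

	// Concurrent callers: each of the ten sequence numbers is accepted once.
	var wg sync.WaitGroup
	var mu sync.Mutex
	accepted := 0
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(seq uint64) {
			defer wg.Done()
			if w.Check(seq) {
				mu.Lock()
				accepted++
				mu.Unlock()
			}
		}(2000 + uint64(i%10))
	}
	wg.Wait()
	fmt.Println(accepted)
}
```

A single mutex is the simplest way to make the window safe for concurrent callers. Since exchanges are rare, contention should be dominated by the packet flow itself.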
 \subsection{Hierarchy}

-The security features presented in this section form a hierarchy of attachment to a packet. Given in figure \ref{fig:udp-packet-dataflow} is an example of the growth of a packet with a specific configuration of UDP, replay protection, and a message authentication code. The process of sending a packet via a consumer is given right to left, and the reverse process of receiving a packet from a producer is given left to right. For a packet to consume, the first element added is the data sequence number. This data sequence number is closest to the proxy, as it is global and not unique to a flow. This is followed by the congestion control header, added before the MAC to receive integrity and authenticity protection, preventing tampering. Finally, the UDP header is prepended before the packet is dispatched. The same process is repeated in reverse in the other direction. This figure represents the flow of data through the software, and importantly, where each portion of the security must be implemented to achieve its goals.
+The security features presented in this section form a hierarchy within the data flow of a packet. Figure \ref{fig:udp-packet-dataflow} gives an example of the growth of a packet with a specific configuration of UDP, replay protection, and a message authentication code. The process of sending a packet via a consumer is given right to left, and the reverse process of receiving a packet from a producer is given left to right. For a packet being consumed, the first element added is the data sequence number. This data sequence number is closest to the proxy, as it is global and not unique to a flow. It is followed by the congestion control header, which is added before the MAC so that it receives integrity and authenticity protection, preventing tampering. Finally, the UDP header is prepended before the packet is dispatched. The same process is repeated in reverse in the other direction. This figure represents the flow of data through the software and, importantly, where each portion of the security must be implemented to achieve its goals.

 \begin{figure}
    \centering
@@ -305,7 +358,67 @@ The security features presented in this section form a hierarchy of attachment t
 % ----------------------------- Testing ------------------------------------ %
 \section{Testing and Evaluation}

-\mynote{Write this.}
+The project focuses particularly on automatic evaluation. Although some testing can be performed with mocked connections, the evaluation of the success criteria revolves around virtual hardware. The benefit of virtual hardware is the ability to spin entire testing environments up and down when required. As such, automatic evaluation software can be built to create the necessary environments and gather the resulting data. This allows each code change to be verified by confirming that the graphs show the expected trends.
+
+The application is split into two parts: data gathering and data processing. There are two reasons for this. Firstly, data gathering takes significantly longer than data processing, so being able to gather data once and then process it multiple times benefits rapid development. Secondly, a different language is chosen for each stage, as discussed in section \ref{section:preparation-language-choices-evaluation}. To complete the full process of automatic evaluation, the data is first updated by running the data gathering application, after which the data processing is completed. As the only item which needs to cross the language barrier is the output data, this can be achieved using the file system. Any loss of speed from switching from memory to files is irrelevant for testing purposes, so this is an acceptable trade-off.
+
+Data gathering is discussed in this section, while data processing is discussed in the evaluation chapter, in section \ref{INVALID}. The data gathering grew to be a significant part of this project, and was built alongside the implementation to gather the requisite data. In fact, it developed from a custom script created to make testing this project easier into a full-blown Java library that can be used for a much wider variety of applications.
+
+\subsection{Data Gathering}
+
+The system for automated data gathering is built using Java. It involves three layers: the Java application itself; the \verb'virtualtests'\footnote{\url{https://github.com/JakeHillion/virtual-tests}} library; and the \verb'proxmox'\footnote{\url{https://github.com/JakeHillion/proxmox-java}} library, on which \verb'virtualtests' depends. Each of these elements was developed by me within the constraints of this project.
+
+The data gathering system was built to make adding data points as simple as possible. As such, the program's main method follows the pseudocode in figure \ref{fig:data-gathering-pseudocode}. This succinctly demonstrates that each testing environment exists only as long as tests are running within it. The majority of tests are standard tests, with the network structure shown in figure \ref{fig:data-gathering-standard-network-structure}. This structure connects the local portal to the remote portal via $n$ connections. Each test then specifies the initial rate limit of the connections, and any changes to the connections' liveness and rate over time. Once this environment is created, tests can be run between the speed test server and the proxied client.
+
+\begin{figure}
+    \centering
+    \begin{minted}{python}
+with environment_1.build() as env:
+  test(env, test1a)
+  test(env, test1b)
+  
+with environment_2.build() as env:
+  test(env, test2a)
+  test(env, test2b)
+    \end{minted}
+    \caption{Pseudocode for data gathering main method.}
+    \label{fig:data-gathering-pseudocode}
+\end{figure}
+
+\begin{figure}
+  \centering
+  \begin{tikzpicture}[
+      squarednode/.style={rectangle, draw=black!60, fill=red!5, very thick, minimum size=5mm},
+    ]
+
+    % Nodes
+    \node[squarednode] at (0,0) (speedtest)      {Speed Test Server};
+    \node[squarednode] at (4,0) (remoteportal)   {Remote Portal};
+    \node[squarednode] at (8,0) (localportal)    {Local Portal};
+    \node[squarednode] at (11,0) (client)         {Client};
+
+    % Edges
+    \draw[->] ([yshift=6mm]speedtest.north) -- (speedtest.north);
+    \draw[->] ([yshift=6mm]remoteportal.north) -- (remoteportal.north);
+    \draw[->] ([xshift=-7mm,yshift=6mm]localportal.north) -- ([xshift=-7mm]localportal.north);
+    \draw[->] ([yshift=6mm]localportal.north) -- (localportal.north);
+    \draw[->] ([xshift=7mm,yshift=6mm]localportal.north) -- ([xshift=7mm]localportal.north);
+    \draw[->] ([yshift=6mm]client.north) -- (client.north);
+
+    \draw[-] ([yshift=6mm]speedtest.north) -- ([yshift=6mm]localportal.north);
+    \draw[-] ([xshift=7mm,yshift=6mm]localportal.north) -- ([yshift=6mm]client.north);
+
+    % Edge Label
+    \node at ([xshift=-3.5mm,yshift=9mm]localportal.north) {0 .. N};
+  \end{tikzpicture}
+
+  \caption{The network structure of standard tests}
+  \label{fig:data-gathering-standard-network-structure}
+\end{figure}
+
+The tests proposed here are expensive to compute. They involve creating virtual machines, installing the required software and, for most tests, running speed tests that take multiple seconds. As such, the number of repeats of each test is kept small, while maintaining sufficiently small uncertainty to support conclusions. To achieve this, each test is run a minimum of 5 times, and then repeatedly until the coefficient of variation ($CV = \sigma/\mu$) falls below a provided value. Further, the tests support a draft mode, which runs each test fewer times in order to produce results more quickly, albeit with a higher level of uncertainty.
+
+After performing the required data analysis to confirm the level of uncertainty for each result, the result can be saved directly to a file; for the majority of tests these are JSON files. To ensure correct communication between the Java and Python programs, a matching structure exists on both sides that can be converted into a directory name. Within this directory, each repeat of the test data is stored under its numeric index.

 % ------------------------ Repository Overview ----------------------------- %
 \section{Repository Overview}
@@ -432,58 +545,6 @@ The benefit of a NACK is demonstrated in figure \ref{fig:sequence-ack-nack-compa

 Figure \ref{fig:sequence-ack-nack-discontinuous} shows how the same situation can be handled with a NACK field. After the receiver has concluded that the intermediate packet(s) were lost in transit (a function of RTT, to be discussed further later), it updates the NACK field to the highest lost packet, allowing the ACK field to advance past the lost packet. This resolves the deadlock of not being able to increase the ACK number, without requiring reliable delivery. That is, the receiver increases its NACK in the situations where a TCP sender would retransmit.
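
The receiver-side behaviour can be sketched in Go as below. The \verb'Receiver' type and its methods are hypothetical. The RTT-based decision that a packet is lost is reduced to an explicit \verb'DeclareLost' call. Feeding the sketch the sequence numbers of figure \ref{fig:sequence-ack-nack-discontinuous} ends with an ACK of 7 and a NACK of 4, as in the final row of that table.

```go
package main

import "fmt"

// Receiver is a hypothetical receiver-side view of the ACK and NACK fields.
type Receiver struct {
	ack, nack uint32          // ack: highest seq with nothing outstanding below it
	received  map[uint32]bool // seqs above ack that have arrived
}

func NewReceiver() *Receiver {
	return &Receiver{received: make(map[uint32]bool)}
}

// Receive records an arriving sequence number and advances ACK over any
// now-contiguous run.
func (r *Receiver) Receive(seq uint32) {
	if seq > r.ack {
		r.received[seq] = true
	}
	r.advance()
}

func (r *Receiver) advance() {
	for r.received[r.ack+1] {
		delete(r.received, r.ack+1)
		r.ack++
	}
}

// DeclareLost is called once the receiver concludes that every missing
// sequence number up to upTo was lost (in practice after an RTT-based
// timeout). NACK becomes the highest lost number, and ACK moves past the gap.
func (r *Receiver) DeclareLost(upTo uint32) {
	for r.ack < upTo {
		next := r.ack + 1
		if !r.received[next] {
			r.nack = next
		}
		delete(r.received, next)
		r.ack = next
	}
	r.advance()
}

func main() {
	r := NewReceiver()
	for _, seq := range []uint32{1, 2, 3, 5, 6, 7} {
		r.Receive(seq)
	}
	fmt.Println(r.ack, r.nack) // 3 0: ACK is stuck behind the missing packet 4
	r.DeclareLost(4)
	fmt.Println(r.ack, r.nack) // 7 4
}
```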

-\begin{figure}
-  \hfill
-  \begin{subfigure}[t]{0.3\textwidth}
-    \centering
-    \begin{tabular}{|c|c|}
-      SEQ & ACK \\
-      1 & 0 \\
-      2 & 0 \\
-      3 & 2 \\
-      4 & 2 \\
-      5 & 2 \\
-      6 & 5 \\
-      6 & 6
-    \end{tabular}
-    \caption{ACKs responding to in order sequence numbers}
-    \label{fig:sequence-ack-continuous}
-  \end{subfigure}\hfill
-  \begin{subfigure}[t]{0.3\textwidth}
-    \centering
-    \begin{tabular}{|c|c|}
-      SEQ & ACK \\
-      1 & 0 \\
-      2 & 0 \\
-      3 & 2 \\
-      5 & 3 \\
-      6 & 3 \\
-      7 & 3 \\
-      7 & 3
-    \end{tabular}
-    \caption{ACKs responding to a missing sequence number}
-    \label{fig:sequence-ack-discontinuous}
-  \end{subfigure}\hfill
-  \begin{subfigure}[t]{0.35\textwidth}
-    \centering
-    \begin{tabular}{|c|c|c|}
-      SEQ & ACK & NACK \\
-      1 & 0 & 0 \\
-      2 & 0 & 0 \\
-      3 & 2 & 0 \\
-      5 & 2 & 0 \\
-      6 & 2 & 0 \\
-      7 & 6 & 4 \\
-      7 & 7 & 4
-    \end{tabular}
-    \caption{ACKs and NACKs responding to a missing sequence number}
-    \label{fig:sequence-ack-nack-discontinuous}
-  \end{subfigure}
-  \caption{ACKs and NACKs responding to sequence numbers}
-  \label{fig:sequence-ack-nack-comparison}
-  \hfill
-\end{figure}
-
 As this was a new UDP protocol, I wrote a Wireshark\footnote{\url{https://wireshark.org}} dissector, shown in figure \ref{fig:udp-wireshark-dissector}. This is a Lua script that requests that Wireshark use the given dissector function for UDP traffic on port 1234 (a port chosen for testing). It extracts the three congestion control parameters from the UDP datagram, showing them in a far easier-to-read format and allowing more efficient debugging of congestion control protocols. The extracted data can be seen in figure \ref{fig:udp-wireshark-dissector-results}.

 \begin{figure}
--- a/Preamble/preamble.tex
+++ b/Preamble/preamble.tex
@@ -88,6 +88,7 @@
 \usepackage{dirtree}

 \usepackage{tikz}
+\usepackage{sty/tikz-uml}
 \usetikzlibrary{positioning}
 \usetikzlibrary{shapes.multipart}

--- a/Preparation/preparation.tex
+++ b/Preparation/preparation.tex
@@ -308,6 +308,8 @@ For the greedy structure of this project, Go's focus on concurrency is extremely
 Garbage collection and first-class concurrency come together to make the code produced for this project highly readable. The downside of this runtime is that execution speed is negatively affected. However, for the purposes of this first implementation, that compromise is acceptable. By producing code that makes the functionality of the application clear, future implementations could more easily be built to mirror it. Given the sample of speeds displayed in section (Ref Needed: Introduction Comments on Speed), and the performance shown in section \ref{section:performance-evaluation}, the compromise of using a well-suited high-level language is one worth taking.

 \subsection{Evaluation Languages}
+\label{section:preparation-language-choices-evaluation}
+
 \subsubsection{Python}

 Python is a dynamically typed language, and was chosen as the initial implementation language. The first reason for this is \verb'matplotlib'\footnote{\url{https://matplotlib.org/}}, a widely used graphing library that can produce the graphs needed for this evaluation. The second reason is \verb'proxmoxer'\footnote{\url{https://github.com/proxmoxer/proxmoxer}}, a fluent API for interacting with a Proxmox server.
--- a/sty/tikz-uml.sty
+++ b/sty/tikz-uml.sty