From 05f30740f51b6569701be23dd6615cd4b6409fb6 Mon Sep 17 00:00:00 2001
From: Igor <38665104+ihaid@users.noreply.github.com>
Date: Mon, 10 Jul 2023 12:23:55 +0300
Subject: [PATCH] docs/testplan: add project cowbell testplan (#6001)

---
 docs/testplan/project-cowbell-testplan.md | 25 +++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 docs/testplan/project-cowbell-testplan.md

diff --git a/docs/testplan/project-cowbell-testplan.md b/docs/testplan/project-cowbell-testplan.md
new file mode 100644
index 000000000..a92a461fa
--- /dev/null
+++ b/docs/testplan/project-cowbell-testplan.md
@@ -0,0 +1,25 @@
+# Mini Cowbell Testplan
+
+&nbsp;
+
+## Background
+We want to deploy the entire Storj stack on environments that have Kubernetes running on 5 NUCs.
+
+&nbsp;
+
+## Pre-condition
+The satellite is configured for only 5 nodes, and the recommended RS scheme is [2,3,4,4], where:
+- 2 is the number of pieces required to reconstitute the segment.
+- 3 is the repair threshold, i.e. if a segment is left with only 3 healthy pieces, it will be repaired.
+- 4 is the success threshold, i.e. the number of pieces required for a successful upload or repair.
+- 4 is the total number of erasure-coded pieces that will be generated.
+
+
+| Test Scenario | Test Case | Description | Comments |
+|---------------|-----------|-------------|----------|
+| Upload | Upload with all nodes online | Every file is uploaded to 4 nodes with a 2x expansion factor, so one node holds no pieces of that file. | Happy path scenario |
+| | Upload with one node offline | If one of the five nodes fails and goes offline, 80% of the stored data loses one erasure-coded piece. The health status of these segments drops from 4 pieces to 3, marking them for repair. (overlay.node.online-window: 4h0m0s -> for about 4 hours the node will still be selected for uploads.) | Uploads continue uninterrupted if the client uses the new refactored upload path: the improved upload logic requests a replacement node from the satellite when the satellite, unaware the node is offline, selects it for the upload. With the old upload logic, uploads may fail when the satellite selects the offline node (20% chance). Once the satellite detects the offline node, all uploads succeed. |
+| Download | Download with one node offline | If one of the five nodes fails and goes offline, 80% of the stored data loses one erasure-coded piece. The health status of these segments drops from 4 pieces to 3, marking them for repair. (overlay.node.online-window: 4h0m0s -> for about 4 hours the node will still be selected for downloads.) | |
+| Repair | Repair with 2 nodes disqualified | Disqualify 2 nodes so that repair downloads are still possible but no node is available for an upload. Repair shouldn't consume download bandwidth and should error out early; it should only spend download bandwidth when there is at least one node available for an upload. | If two nodes go offline, in the worst case segments are left with only 2 remaining pieces, which cannot be repaired and is de facto data loss if the offline nodes are damaged. |
+| Audit | | Audits can't identify corrupted pieces with just the minimum number of pieces; reputation should not increase. Audits should be able to identify corrupted pieces with minimum + 1 pieces; reputation should decrease. | |
+| Upgrades | Nodes restart for upgrades | No more than a single node goes offline for maintenance at a time; otherwise, normal operation of the network cannot be ensured. | Occasionally, nodes may need to restart due to software updates. This takes the node offline for some period of time. |
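+
+&nbsp;
+
+## Notes
+
+The figures quoted above (2x expansion factor, 80% of segments affected by one offline node, unrepairable segments after losing two nodes) follow directly from the RS scheme. The Go sketch below is illustrative only and not part of the Storj codebase or this deployment; it simply recomputes those numbers from the [2,3,4,4] parameters on a 5-node cluster:
+
+```go
+package main
+
+import "fmt"
+
+func main() {
+	// RS scheme [2,3,4,4] on a 5-node cluster, as described in the pre-condition.
+	const (
+		required  = 2.0 // pieces needed to reconstitute a segment
+		total     = 4.0 // erasure-coded pieces generated per segment
+		nodeCount = 5.0 // NUCs in the cluster
+	)
+
+	// Expansion factor: stored bytes vs. logical bytes.
+	fmt.Printf("expansion factor: %.1fx\n", total/required) // 2.0x
+
+	// Each segment occupies `total` distinct nodes, so a single offline node
+	// holds a piece of total/nodeCount of all segments.
+	fmt.Printf("segments losing a piece with 1 node offline: %.0f%%\n", 100*total/nodeCount) // 80%
+
+	// Worst case with 2 nodes disqualified: a segment loses 2 of its 4 pieces,
+	// leaving exactly the minimum (2) needed to reconstitute it, with too few
+	// remaining nodes to bring it back up to the success threshold of 4.
+	fmt.Printf("worst-case remaining pieces with 2 nodes lost: %.0f\n", total-2) // 2
+}
+```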