storj/docs/blueprints/deletion-performance.md
Ivan Fraixedes 46ad27c5f2
docs/blueprints: Change adapt endpoint by creating (#3601)
Replace a sentence which indicates that we're creating and new endpoint
in the storage node side rather than adapting the current one.
2019-11-19 14:28:19 +01:00

11 KiB

Deletion Performance

Abstract

This document describes design for improvements to deletion speed.

Background

Current design requires uplinks to send deletion information to all the storage nodes. This ends up taking a considerable amount of time.

There are few goals with regards to deletions:

  • We need to ensure that uplinks are responsive.
  • We need to ensure that storage nodes aren't storing too much garbage, since it reduces overall capacity of the network.
  • We need to ensure that satellites aren't storing too much garbage, since that increases running cost of the satellite.

We have traced the uplink when removing files of different sizes and we obtained the following results:

File size Inline Communication time in milliseconds
Total Satellite Storage Nodes
Dialing Information gathering (bucket, Object, …) Mark beginning of deletion & get list of segments Begin delete segment (delete segment metadata and return the list of order limits and private key of each piece)
1 Kibyes8582772721441400
4 Kibyes9102932781441420
5 Kibno1959328275142513652
10 Kibno24513082781415601134
2.2 Mibno26433252851495601273
256 Mibno759133327514515396644

We extracted the data of the table the following trace graph SVG files that you can find in a Github gist.

Design

Delegate to the satellite to make the delete requests to the storage nodes.

Uplink will send the delete request to the satellite and wait for its response.

The satellite will send the delete requests to all the storage nodes that have segments and delete the segments from the pointerDB. Finally, it will respond to the uplink.

The satellite will communicate with the storage nodes:

  • Using the current network protocol, RPC/TCP.
  • The request will be sent with long-tail cancellation in the same way as upload.
  • It won't send requests to offline storage nodes.
  • It will send requests concurrently as much as possible.

The satellite will use a backpressure mechanism for ensuring that it's responsive with uplink. The garbage collection will delete the pieces in those storage nodes that didn't receive the delete requests.

The satellite could use a connectionless network protocol, like UDP, to send delete request to the storage nodes. We discard to introduce this change in the first implementation and consider it in the future if we struggle to accomplish the goals.

Rationale

We have found some alternative approaches. We list them, presenting their advantages and trade-offs regarding the designed approach.

As it currently does, uplink would delete the pieces from the storage nodes but with the following changes:

  1. Reduce timeouts for delete requests.
  2. Undeleted pieces will eventually get garbage collected, so we can allow some of them to get lost.
  3. Uplink would send the request without waiting for the deletion to happen. For example, nodes could internally delete things async.
  4. Send delete segments request concurrently as much as possible.

Additionally:

  • We could change transport protocol for delete requests to a connectionless protocol, like UDP, for reducing dialing time.
  • We could probabilistically skip deleting pieces to minimize the number of requests. For example, we could only send the deletion requests to only 50% of the pieces.

Advantages:

  • No extra running and maintenance costs on the satellite side.

Disadvantages:

  • Storage nodes will have more garbage than currently because of not waiting for storage nodes to confirm the operation.
  • Storage nodes will have more garbage if we use a connectionless transport protocol
  • Storage nodes will have more garbage if we use a probabilistic approach.

(2) Satellite delete pieces of the storage nodes reliably

Uplink will communicate with the satellite as it currently does.

The satellite will take care of communicating with the storage nodes for deleting the pieces using RPC.

Advantages:

  • Uplink deletion operation will be independent of the size of the file, guaranteeing always being responsive.
  • It doesn't present a risk of leaving garbage in the storage nodes when deletion operation is interrupted.
  • In general, the storage nodes will have less garbage because of deletions.

Disadvantages:

  • The satellite requires a new chore to delete the pieces of the storage nodes. The increment of network traffic, computation, and data to track the segments to delete will increase the running costs.
  • The satellite will have another component incrementing the cost of the operation as monitoring, replication, etc.

(3) Satellite delete pieces of the storage nodes unreliably

Uplink will communicate with the satellite as it currently does.

The satellite will take care of communicating with the storage nodes for deleting the pieces using a connectionless protocol like UDP.

Advantages:

  • Uplink deletion operation will be independent of the size of the file, guaranteeing always being responsive.
  • It doesn't present a risk of leaving garbage in the storage nodes when deletion operation is interrupted.

Disadvantages:

  • The satellite requires a new chore to delete the pieces of the storage nodes. The increment of network traffic, computation, and data to track the segments to delete will increase the running costs.
  • The satellite will have another component incrementing the cost of the operation as monitoring, replication, etc.

Conclusion

The alternative approach (1):

  • It is similar to the current mechanism but with some improvements towards the goals.
  • It doesn't add more load to the satellite, but we cannot trust in the uplink in deleting the pieces or informing the non-deleted ones.

The alternative approaches (2) and (3) are similar.

Approach (2) has the advantage of guaranteeing less garbage left on the storage nodes at the expense of requiring more network traffic.

Approach (3) requires less network traffic, but it may not require less computation considering that we may need to encrypt the data sent through UDP.

Approach (3) may get rid of the satellite garbage faster.

Both approaches, (2) and (3), present the problem of increasing garbage on the satellite side.

Taking one of these approaches will require a study on how to keep the less amount of garbage as possible.

Implementation

(1) Storage nodes:

  1. Adapt protocol buffers definitions for delete operation. Satellite doesn't have to send order limits; it only has to send piece ID.
  2. Create a new endpoint to receive delete requests from the satellite.

(2) Satellite:

  1. Implement delete request logic with backpressure mechanism to only confirm the operation when certain amount of storage nodes confirm successful. It's like what we do for upload long-tail cancellations.

  2. Adapt protocol buffers definitions for delete operation. The current uplink RPC requests are:

     1. `BeginDeleteObject` - Retreives the _stream ID_.
     1. `ListSegments` - Uses _stream ID_  for retrieving the position index of all the segments.
     1. `BeginDeleteSegments` - Uses the _stream ID_ and position index for retrieving a list of _addressed order limits_.
    

    Because uplink won't send the delete requests to the storage nodes, the delete operation we can simplify it with one satellite request. The satellite will respond, with an empty body, when the deletion ends.

  1. Change logic to not send delete requests to storage nodes.
  2. Uplink rm command will wait until satellite responds.

Considerations

If we plan to release the feature in several steps:

  1. Implement and independently release (1) and (2) without removing the logic of the current functionality.
  2. Implement and release (3).
  3. Announce when previous versions will stop to work properly.
  4. Remove and independently release the delete old logic from the storage node and satellite.

Open issues (if applicable)

  1. Should we track how much each storage node is storing extra due not sending deletes? For the storage nodes that accumulate too much garbage, we could send garbage collection request outside of the normal schedule.
  2. Discuss backward-incompatibility that we may introduce when adapting the protocol buffer definitions in the storage node and satellite side. Should we reuse the current definition as much as possible although it isn't ideal?
  3. How many storage nodes must confirm the deletion successful to allow the satellite to return a response to the uplink?