storj/satellite/repair/queue/queue.go
Cameron Ayer c2525ba2b5 satellite/{repair,satellitedb}: clean up healthy segments from repair queue at end of checker iteration
Repair workers prioritize the most unhealthy segments. This has the consequence that when we
finally begin to reach the end of the queue, a good portion of the remaining segments are
healthy again as their nodes have come back online. This makes it appear that there are more
injured segments than there actually are.

solution:
Any time the checker observes an injured segment it inserts it into the repair queue or
updates it if it already exists. Therefore, we can determine which segments are no longer
injured if they were not inserted or updated by the last checker iteration. To do this we
add a new column to the injured segments table, updated_at, which is set to the current time
when a segment is inserted or updated. At the end of the checker iteration, we can delete any
items where updated_at < checker start.

Change-Id: I76a98487a4a845fab2fbc677638a732a95057a94
2020-09-29 20:38:22 +00:00

31 lines
1.0 KiB
Go

// Copyright (C) 2019 Storj Labs, Inc.
// See LICENSE for copying information.
package queue
import (
"context"
"time"
"storj.io/common/pb"
)
// RepairQueue implements queueing for segments that need repairing.
// Implementation can be found at satellite/satellitedb/repairqueue.go.
//
// architecture: Database
type RepairQueue interface {
// Insert adds an injured segment.
Insert(ctx context.Context, s *pb.InjuredSegment, numHealthy int) (alreadyInserted bool, err error)
// Select gets an injured segment.
Select(ctx context.Context) (*pb.InjuredSegment, error)
// Delete removes an injured segment.
Delete(ctx context.Context, s *pb.InjuredSegment) error
// Clean removes all segments last updated before a certain time
Clean(ctx context.Context, before time.Time) (deleted int64, err error)
// SelectN lists limit amount of injured segments.
SelectN(ctx context.Context, limit int) ([]pb.InjuredSegment, error)
// Count counts the number of segments in the repair queue.
Count(ctx context.Context) (count int, err error)
}