satellite/repair: fix flaky test TestECREpairerGetOffline
It was possible to reach a state where successfulPieces == es.RequiredCount(), errorCount < minFailures, and inProgress == 0 (when all of the succeeding gets completed before the failures). In that state, the last goroutine in the limiter would sit and wait forever for another goroutine to finish. This change corrects the handling of that situation.

As an aside, this is really pretty confusing code and we should think about redoing the whole function.

Change-Id: Ifa3d3ad92bc755e563fd06b2aa01ef6147075a69
parent 4a6e34bb2c
commit 20bcdeb8b1
@@ -121,7 +121,14 @@ func (ec *ECRepairer) Get(ctx context.Context, limits []*pb.AddressedOrderLimit,
 				return
 			}
 
-			if successfulPieces+inProgress >= es.RequiredCount() {
+			if successfulPieces+inProgress >= es.RequiredCount() && errorCount+inProgress >= minFailures {
+				// we know that inProgress > 0 here, since we didn't return on the
+				// "successfulPieces >= es.RequiredCount() && errorCount >= minFailures" check earlier.
+				// There may be enough downloads in progress to meet all of our needs, so we won't
+				// start any more immediately. Instead, wait until all needs are met (in which case
+				// cond.Broadcast() will be called) or until one of the inProgress workers exits
+				// (in which case cond.Signal() will be called, waking up one waiter) so we can
+				// reevaluate the situation.
 				cond.Wait()
 				continue
 			}
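To make the stuck state concrete, here is a minimal sketch that evaluates the old and new wait conditions against the counters described in the commit message. The state struct, the two helper functions, and the sample values (required = 2, minFailures = 1) are assumptions made for illustration; they are not part of the repairer code, which evaluates these conditions inline inside Get.

```go
package main

import "fmt"

// state is a hypothetical snapshot of the counters tracked inside Get.
type state struct {
	successfulPieces int
	errorCount       int
	inProgress       int
}

// oldShouldWait mirrors the wait condition before this change.
// minFailures is accepted only so both helpers share a signature.
func oldShouldWait(s state, required, minFailures int) bool {
	return s.successfulPieces+s.inProgress >= required
}

// newShouldWait mirrors the wait condition after this change: waiting is
// only worthwhile if in-progress downloads could also cover minFailures.
func newShouldWait(s state, required, minFailures int) bool {
	return s.successfulPieces+s.inProgress >= required &&
		s.errorCount+s.inProgress >= minFailures
}

func main() {
	// Assumed sample values: all successes finished first, nothing in flight.
	required, minFailures := 2, 1
	s := state{successfulPieces: 2, errorCount: 0, inProgress: 0}

	fmt.Println("old condition waits:", oldShouldWait(s, required, minFailures))
	fmt.Println("new condition waits:", newShouldWait(s, required, minFailures))
}
```

With inProgress == 0 nobody is left to call cond.Signal(), so a goroutine that waits in this state never wakes up. The old condition chooses to wait here and the new one does not.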