storj/storagenode/storagenodedb/orders.go

// Copyright (C) 2019 Storj Labs, Inc.
// See LICENSE for copying information.

package storagenodedb

import (
	"context"
	"database/sql"
	"errors"
	"time"

	"github.com/zeebo/errs"

	"storj.io/common/pb"
	"storj.io/common/storj"
	"storj.io/private/tagsql"
	"storj.io/storj/storagenode/orders"
	"storj.io/storj/storagenode/orders/ordersfile"
)

// ErrOrders represents errors from the ordersdb database.
var ErrOrders = errs.Class("ordersdb")

// OrdersDBName represents the database name.
const OrdersDBName = "orders"

type ordersDB struct {
	dbContainerImpl
}
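
// NOTE: dbContainerImpl is presumably the shared container type for the
// storage node databases; it is assumed here to expose the underlying
// tagsql.DB methods (ExecContext, QueryContext, BeginTx) that the receiver
// methods below call directly.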

// Enqueue inserts an order into the unsent list.
func (db *ordersDB) Enqueue(ctx context.Context, info *ordersfile.Info) (err error) {
	defer mon.Task()(&ctx)(&err)

	limitSerialized, err := pb.Marshal(info.Limit)
	if err != nil {
		return ErrOrders.Wrap(err)
	}

	orderSerialized, err := pb.Marshal(info.Order)
	if err != nil {
		return ErrOrders.Wrap(err)
	}

	// TODO: remove uplink_cert_id
	_, err = db.ExecContext(ctx, `
		INSERT INTO unsent_order(
			satellite_id, serial_number,
			order_limit_serialized, order_serialized, order_limit_expiration,
			uplink_cert_id
		) VALUES (?,?, ?,?,?, ?)
	`, info.Limit.SatelliteId, info.Limit.SerialNumber, limitSerialized, orderSerialized, info.Limit.OrderExpiration.UTC(), 0)

	return ErrOrders.Wrap(err)
}
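
// A minimal usage sketch for Enqueue (hypothetical values; assumes limit and
// order came from a signed upload exchange):
//
//	info := &ordersfile.Info{Limit: limit, Order: order}
//	if err := db.Enqueue(ctx, info); err != nil {
//		log.Printf("failed to enqueue order: %v", err)
//	}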

// ListUnsent returns orders that haven't been sent yet.
//
// If an order fails to unmarshal while being read, the method proceeds with
// the remaining ones and returns the orders that were read successfully,
// together with an error describing the ones that were not. On a database or
// other system error, the method stops without further processing and returns
// an error without any orders.
func (db *ordersDB) ListUnsent(ctx context.Context, limit int) (_ []*ordersfile.Info, err error) {
	defer mon.Task()(&ctx)(&err)

	rows, err := db.QueryContext(ctx, `
		SELECT order_limit_serialized, order_serialized
		FROM unsent_order
		LIMIT ?
	`, limit)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, nil
		}
		return nil, ErrOrders.Wrap(err)
	}

	var unmarshalErrors errs.Group
	defer func() { err = errs.Combine(err, unmarshalErrors.Err(), rows.Close()) }()

	var infos []*ordersfile.Info
	for rows.Next() {
		var limitSerialized []byte
		var orderSerialized []byte

		err := rows.Scan(&limitSerialized, &orderSerialized)
		if err != nil {
			return nil, ErrOrders.Wrap(err)
		}

		var info ordersfile.Info
		info.Limit = &pb.OrderLimit{}
		info.Order = &pb.Order{}

		err = pb.Unmarshal(limitSerialized, info.Limit)
		if err != nil {
			unmarshalErrors.Add(ErrOrders.Wrap(err))
			continue
		}

		err = pb.Unmarshal(orderSerialized, info.Order)
		if err != nil {
			unmarshalErrors.Add(ErrOrders.Wrap(err))
			continue
		}

		infos = append(infos, &info)
	}

	return infos, ErrOrders.Wrap(rows.Err())
}
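
// NOTE: the unmarshalErrors group above collects per-row decode failures so a
// single corrupt row does not block the rest of the queue; the deferred
// errs.Combine merges those failures with any close error, which means callers
// can receive both a non-empty result and a non-nil error from ListUnsent.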

// ListUnsentBySatellite returns orders that haven't been sent yet and are not expired.
// The orders are grouped by satellite ID.
//
// If an order fails to unmarshal while being read, the method proceeds with
// the remaining ones and returns the orders that were read successfully,
// together with an error describing the ones that were not. On a database or
// other system error, the method stops without further processing and returns
// an error without any orders.
func (db *ordersDB) ListUnsentBySatellite(ctx context.Context) (_ map[storj.NodeID][]*ordersfile.Info, err error) {
	defer mon.Task()(&ctx)(&err)

	// TODO: add some limiting
	rows, err := db.QueryContext(ctx, `
		SELECT order_limit_serialized, order_serialized
		FROM unsent_order
		WHERE order_limit_expiration >= $1
	`, time.Now().UTC())
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, nil
		}
		return nil, ErrOrders.Wrap(err)
	}

	var unmarshalErrors errs.Group
	defer func() { err = errs.Combine(err, unmarshalErrors.Err(), rows.Close()) }()

	infos := map[storj.NodeID][]*ordersfile.Info{}
	for rows.Next() {
		var limitSerialized []byte
		var orderSerialized []byte

		err := rows.Scan(&limitSerialized, &orderSerialized)
		if err != nil {
			return nil, ErrOrders.Wrap(err)
		}

		var info ordersfile.Info
		info.Limit = &pb.OrderLimit{}
		info.Order = &pb.Order{}

		err = pb.Unmarshal(limitSerialized, info.Limit)
		if err != nil {
			unmarshalErrors.Add(ErrOrders.Wrap(err))
			continue
		}

		err = pb.Unmarshal(orderSerialized, info.Order)
		if err != nil {
			unmarshalErrors.Add(ErrOrders.Wrap(err))
			continue
		}

		infos[info.Limit.SatelliteId] = append(infos[info.Limit.SatelliteId], &info)
	}

	return infos, ErrOrders.Wrap(rows.Err())
}
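
// NOTE: expiration is filtered in SQL (order_limit_expiration >= now), so
// expired unsent orders are simply skipped by this listing; they are
// presumably removed by a separate cleanup rather than here.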

// Archive marks the given orders as being handled.
//
// If any of the requests refers to an order which doesn't exist, the method
// continues with the remaining ones without interrupting the operation and
// returns an error of class orders.OrderNotFoundError. Any other error aborts
// the operation, rolling back the transaction.
func (db *ordersDB) Archive(ctx context.Context, archivedAt time.Time, requests ...orders.ArchiveRequest) (err error) {
	defer mon.Task()(&ctx)(&err)

	// change input parameter to UTC timezone before we send it to the database
	archivedAt = archivedAt.UTC()

	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return ErrOrders.Wrap(err)
	}

	var notFoundErrs errs.Group
	defer func() {
		if err == nil {
			err = tx.Commit()
			if err == nil {
				if len(notFoundErrs) > 0 {
					// Return a class error to allow the caller to identify this case
					err = orders.OrderNotFoundError.Wrap(notFoundErrs.Err())
				}
			}
		} else {
			err = errs.Combine(err, tx.Rollback())
		}
	}()

	for _, req := range requests {
		err := db.archiveOne(ctx, tx, archivedAt, req)
		if err != nil {
			if orders.OrderNotFoundError.Has(err) {
				notFoundErrs.Add(err)
				continue
			}
			return err
		}
	}

	return nil
}
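
// NOTE: the deferred block in Archive drives commit vs. rollback off the named
// return err: missing orders are collected in notFoundErrs and only surfaced
// after a successful commit, so an OrderNotFoundError never rolls back the
// orders that were archived successfully.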

// archiveOne marks a single order as being handled.
func (db *ordersDB) archiveOne(ctx context.Context, tx tagsql.Tx, archivedAt time.Time, req orders.ArchiveRequest) (err error) {
	defer mon.Task()(&ctx)(&err)

	result, err := tx.ExecContext(ctx, `
		INSERT INTO order_archive_ (
			satellite_id, serial_number,
			order_limit_serialized, order_serialized,
			uplink_cert_id,
			status, archived_at
		) SELECT
			satellite_id, serial_number,
			order_limit_serialized, order_serialized,
			uplink_cert_id,
			?, ?
		FROM unsent_order
		WHERE satellite_id = ? AND serial_number = ?;

		DELETE FROM unsent_order
		WHERE satellite_id = ? AND serial_number = ?;
	`, int(req.Status), archivedAt, req.Satellite, req.Serial, req.Satellite, req.Serial)
	if err != nil {
		return ErrOrders.Wrap(err)
	}

	count, err := result.RowsAffected()
	if err != nil {
		return ErrOrders.Wrap(err)
	}
	if count == 0 {
		return orders.OrderNotFoundError.New("satellite: %s, serial number: %s",
			req.Satellite.String(), req.Serial.String(),
		)
	}

	return nil
}
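
// NOTE: the single ExecContext above runs two statements (INSERT ... SELECT,
// then DELETE). With the SQLite driver the reported RowsAffected appears to
// reflect the trailing DELETE; since both statements match the same row, a
// zero count is taken to mean no such unsent order existed, which archiveOne
// converts into OrderNotFoundError.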

// ListArchived returns orders that have been sent.
func (db *ordersDB) ListArchived(ctx context.Context, limit int) (_ []*orders.ArchivedInfo, err error) {
	defer mon.Task()(&ctx)(&err)

	rows, err := db.QueryContext(ctx, `
		SELECT order_limit_serialized, order_serialized, status, archived_at
		FROM order_archive_
		LIMIT ?
	`, limit)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, nil
		}
		return nil, ErrOrders.Wrap(err)
	}
	defer func() { err = errs.Combine(err, rows.Close()) }()

	var infos []*orders.ArchivedInfo
	for rows.Next() {
		var limitSerialized []byte
		var orderSerialized []byte
		var status int
		var archivedAt time.Time

		err := rows.Scan(&limitSerialized, &orderSerialized, &status, &archivedAt)
		if err != nil {
			return nil, ErrOrders.Wrap(err)
		}

		var info orders.ArchivedInfo
		info.Limit = &pb.OrderLimit{}
		info.Order = &pb.Order{}

		info.Status = orders.Status(status)
		info.ArchivedAt = archivedAt

		err = pb.Unmarshal(limitSerialized, info.Limit)
		if err != nil {
			return nil, ErrOrders.Wrap(err)
		}

		err = pb.Unmarshal(orderSerialized, info.Order)
		if err != nil {
			return nil, ErrOrders.Wrap(err)
		}

		infos = append(infos, &info)
	}

	return infos, ErrOrders.Wrap(rows.Err())
}
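
// NOTE: unlike ListUnsent and ListUnsentBySatellite, ListArchived aborts on
// the first unmarshal failure instead of collecting it, so a single corrupt
// archived row makes the whole listing fail.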

// CleanArchive deletes all archived orders that were archived at or before
// deleteBefore. It returns the number of deleted rows.
func (db *ordersDB) CleanArchive(ctx context.Context, deleteBefore time.Time) (_ int, err error) {
	defer mon.Task()(&ctx)(&err)

	result, err := db.ExecContext(ctx, `
		DELETE FROM order_archive_
		WHERE archived_at <= ?
	`, deleteBefore.UTC())
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return 0, nil
		}
		return 0, ErrOrders.Wrap(err)
	}

	count, err := result.RowsAffected()
	if err != nil {
		return 0, ErrOrders.Wrap(err)
	}
	return int(count), nil
}
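
// A minimal usage sketch for CleanArchive (the 7-day retention below is a
// hypothetical value, not a project default):
//
//	removed, err := db.CleanArchive(ctx, time.Now().Add(-7*24*time.Hour))
//	if err == nil {
//		log.Printf("removed %d archived orders", removed)
//	}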