storj

Author	SHA1	Message	Date
Ivan Fraixedes	c4fd84ad3e	satellite/metainfo: Add metrics and traces DeletePieces Trace the calls to DeletePiecesService.DeletePieces method and add metrics for having statistics about the rate that specific storage node is dialed and duration time spent on dialing storage nodes. These statistics will help us to find out if we should implement connection queues to storage node for reducing the deletion time in case that we see that we're spending too much time dialing frequent storage nodes. Ticket: https://storjlabs.atlassian.net/browse/SM-85 Change-Id: I9601676c3a8ad96c73c93833145929e4817755e2	2020-02-12 15:38:50 +00:00
Michal Niewrzal	cea4c25f53	mod: bump common and uplink version Change-Id: Ia063d33c087dd91a46c008e154b078f11fa21527	2020-02-12 14:33:54 +00:00
Michal Niewrzal	2472554826	uplinkc: add ability to not set encryption restrictions with restrict_scope Change-Id: I445a945c9dad9d6b9ac5d973619914f50d973185	2020-02-12 11:56:13 +00:00
littleskunk	76849558cb	satellite/gracefulexit: increase performance and tolerate higher error rate Graceful exit is very slow at the moment. Over the last couple of days we increased the batch size on Stefan's satellite to 1000 but as a side effect the error rate was increased. With a batch size of 500 the error rate looks stable. This PR will increase the default batch size to 300. Graceful exit will still be painfully slow but at least it will be a bit faster. At the same time this PR also increases the number of errors we tolerate. We don't want to DQ slow storage nodes just because they didn't finish all 300 transfers in time. We want to give them more retries. Change-Id: I92e3f99e116d4988457d8b902a88e85ed1bcc1a7	2020-02-12 11:40:15 +00:00
Kaloyan Raev	37cf42a9ae	satellite/metainfo: overwrite zombie segments Fixes https://storjlabs.atlassian.net/browse/USER-240 - Adds UnsynchronizedPut method to metainfo service that overwrites any existing pointer under the same path - Uses UnsynchronizedPut in the metainfo endpoint for committing the segments Change-Id: Icb43f31ea33f14066ca9dfdcf226eb3079b90948	2020-02-12 11:10:38 +00:00
Jeff Wendling	5d6cb68cd7	storage/{cockroachkv,postgreskv}: detailed monitoring for list Change-Id: Iedba10776367233e59f3a6523efdb303b836b241	2020-02-12 10:55:07 +00:00
Egon Elbre	dbf46c4aa7	satellite/admin: administrative endpoint Admin server allows creating basic REST and html API-s for different administrative tasks. Change-Id: I3dc1786abe1c87350eed60ec90e48130f44e63cf	2020-02-12 12:12:50 +02:00
Jeff Wendling	2d2f5e1a7f	satellite/satellitedb/dbx: remove typo in dbx file and format it Change-Id: I756315d6228ac9edd35cad8b496d36ecf2b5d26f	2020-02-11 14:15:13 -07:00
Cameron Ayer	f10b22eae9	accounting/tally: if delta < 0, delta = 0 if redis crashed in the middle of tally we could have a situation where we erroneously subtract from a project total. Currently, `latest` should never be less than `initial` Change-Id: Ibb5ab724ac0ad4d684f7954fad7a9e061104b7df	2020-02-11 19:48:55 +00:00
Cameron Ayer	33d696b096	storage/redis/redisserver: simplify redisserver creation Change-Id: I881576a7881db671b5abeeca7120a022987cc47f	2020-02-11 19:11:57 +00:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report; all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before they completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
VitaliiShpital	dba647199a	web/satellite: multiple storj coin transactions bug fix Change-Id: If69c5ff65741de1d4ed1d555816df3710d6ee721	2020-02-11 17:30:07 +00:00
Michal Niewrzal	c59938479c	Adding Monty Anderson to the CLA Change-Id: I21ea8babb65637ecd57cb995d6848e8f34fe5cee	2020-02-11 17:11:24 +00:00
Simon Guindon	961944f24d	satellite/orders: Resolve storage node addresses to IP addresses. This change resolves all the storage node addresses to their IP addresses before giving them to the uplink so that the uplink doesn't have to resolve a hundred hosts and can immediately connect to improve uplink performance. Change-Id: Idb834351e0fece409d74c8a1c29b0b8c9b09c9ff	2020-02-11 18:44:45 +02:00
Egon Elbre	429f08b4f0	satellite: add Admin peer This peer will contain our administrative panels. It's completely separated from our other satellite processes because it allows better control for restricting access to it. Change-Id: Ifca473bee82ff6c680b346918ba32b835a7a6847	2020-02-11 16:15:33 +00:00
Egon Elbre	7d62b1cf3b	cmd/storj-sim: don't wait for process to start indefinitely In case the endpoint doesn't start, it might end up indefinitely waiting for it to come up stalling jenkins. Change-Id: Ib10bf1a25461e7532ec56ca705178bc9a7f85d12	2020-02-11 16:15:18 +00:00
Michal Niewrzal	426c8eb31a	private/testplanet: add DeleteBucket method for uplink New method added to be able to easily delete a bucket during tests. Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2	2020-02-11 15:58:13 +00:00
Yaroslav Vorobiov	bd9cebda5b	satellite/payments: fix transaction list pagination Change-Id: I533f637e5cb12b47d7f7248f8bf7de93bd8be000	2020-02-11 16:22:53 +02:00
Ethan	208c05e3db	Add metrics to track rate limit. Add monkit metric for the rate-limit when the rate limit is hit Logs warning with projectID https://storjlabs.atlassian.net/browse/SM-165 Change-Id: I352dc40006021990d1bc66a999f62bbf8deb54db	2020-02-11 14:02:12 +00:00
Egon Elbre	ccd8b7f107	satellite/satellitedb: add benchmark for satellitedb setup and close Change-Id: Ifb561f2eb81e439ea7cfa2ca2dad6b15aa50417e	2020-02-11 13:30:23 +00:00
Egon Elbre	d2fca76146	cmd/uplink/cmd: set exact argument counts It was possible to call uplink cp a b c d e sj://bucket/something Change-Id: I731da0da4530a3b3f8fbc569f363ba40cf84853a	2020-02-11 13:09:38 +00:00
NickolaiYurchenko	6cd86b2145	web/storagenode: npm audit fix Change-Id: I09cb16ca8196d36b931e0a148cf73a2ce7ab5be0	2020-02-11 12:53:29 +00:00
Yaroslav Vorobiov	984ed26737	satellite/payments: fix invoice project records pagination Change-Id: I68de69de78256280a6bbf0b744963b9c8c813007	2020-02-11 14:31:55 +02:00
Qweder93	dc075eaa96	satellite/payments : deposit bonuses (credits) added Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0	2020-02-11 11:11:42 +00:00
Yingrong Zhao	3331b443e7	satellite/metainfo: Delete all the pieces of a storage node in one single request Change-Id: Ia8758d36f1a113b545e4f746d74d172421f14b24	2020-02-11 00:28:30 +00:00
Natalie Ventura Villasana	3900dadafd	satellite/overlay: find new nodes with ExcludedIPs Adds ExcludedIPs to the NodeCriteria for selecting new storage nodes. Previously, ExcludedIPs was only added to the NodeCriteria for selecting reputable storage nodes. Now that both are included in the FindStorageNodesWithPreferences call, it should no longer be possible to repair pieces to nodes that are on the same IP as nodes already storing pieces from that segment. Adds TestSelectNewStorageNodesExcludedIPs to make sure that SelectNewStorageNodes returns nodes with different IP addresses. https://storjlabs.atlassian.net/browse/V3-3011 Change-Id: Ic2d5e607cadeba6e8d5c40f9717149cb30880335	2020-02-10 23:45:17 +00:00
Moby von Briesen	c4a9a5d48b	satellite/downtime: update detection and estimation downtime chores for more trustworthy downtime tracking Detection chore: Do not update downtime at all from the detection chore. We only want to include downtime between two explicitly failed ping attempts (the duration between last contact success and the first failed ping is no longer included in downtime calculation) Estimation chore: If the satellite started after the last failed ping for a node, do not include offline time since the last failed ping time - only estimate based on two failed pings with no satellite downtime in between. This protects us from including satellite downtime in our storagenode downtime calculations. Change-Id: I1fddc9f7255a7023e02474255d70c64faae75b8a	2020-02-10 22:37:01 +00:00
Jeff Wendling	99c3ba5bbf	testplanet: log stack trace for error during creation Change-Id: Ifcd2cba4195413a7213ba4d113c43f9fb3cbc3e5	2020-02-10 21:59:20 +00:00
Jeff Wendling	bde302fdb8	jenkins: disable npm audit as it's failing everything Change-Id: I399e73b86144588758e50bdebbd0e65e8052c8b6	2020-02-10 13:32:40 -07:00
VitaliiShpital	a90955eced	web/satellite: billing history refreshing bug fix Change-Id: I221e2dc13179b9368dea1c7e1a46b8b49b79c729	2020-02-10 18:56:43 +02:00
littleskunk	68d5b1d6ec	build/jenkins: allow -rc release tags Change-Id: Ib1942a8029313debe672eaae6b4de835b74d5035	2020-02-10 12:01:18 +00:00
VitaliiShpital	55a3a90391	web/satellite: uplink CLI docs link behaviour on API keys page reworked Change-Id: I5565c3c8e6e55720c2cbf527aa37b6d881047818	2020-02-10 11:35:54 +00:00
NikolaiYurchenko	6679036ace	web/satellite: unauthorized error handled Change-Id: I12c6937ed1660af097d6930fe2a90fac5f298311	2020-02-10 11:14:51 +00:00
Cameron Ayer	13903449c7	satellite/accounting: fix flaky TestProjectUsageStorage Sometimes the upload that is supposed to fail due to excess usage would pass. This looks to be because it's overwriting another object uploaded earlier in the test and deleting the old pointer. If tally happened to run after the pointer is deleted but before the current upload reaches the live accounting check, it might pass through. The solution is to upload to a different path each time. Change-Id: Ie6c825b9c6eab9ed53426ae262e7997bcb6beb7f	2020-02-07 20:58:24 -05:00
Yingrong Zhao	f151a0b9e1	installer/windows: add service dependency for storagenode When system restarts, local dns resolver may not be ready before our application starts up. Adding a dependency for dns service will help prevent dns lookup not available error for storagenode on system reboot. Change-Id: Ie4be2813736e377df551fd8190f2247d3ae05ccd	2020-02-07 17:05:30 +00:00
Cameron Ayer	75355547c2	satellite/satellitedb: don't include GET_AUDIT and GET_REPAIR with chargeable BW In the methods we use to retrieve a user's chargeable BW, we were summing GET, GET_AUDIT, and GET_REPAIR. We only want to charge for GET Change-Id: Icead7695494b22c7c835482cf8b1512a980d59f1	2020-02-07 12:02:44 +00:00
Egon Elbre	34f38bf6ce	mod: upgrade miniredis to latest miniredis 2.5.0 had a bug with matching keys with newlines. Change-Id: I9bcf998459be6d7d4e03bca3589e989e5ed2304d	2020-02-06 13:31:17 +00:00
NikolaiYurchenko	8147d6ccce	web/storagenode: chart date bug fixed Change-Id: Id781bb8973958510e390286a8e6d4d79e3a36725	2020-02-06 13:53:19 +02:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Isaac Hess	17580fdf57	storagenode/pieces: Add test to cache store This test checks that we are actually walking over the pieces when starting the cache, and that it is returning expected values. A recent outage was partially caused by the fact that this cache was accidentally reading itself (via the pieces store, which has the cache embedded). This test ensures that does not happen, and checks that when the cache's `Run` method is called, the space used values are read from disk and accurately update the cache. Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e	2020-02-05 11:39:06 -07:00
igor gaidaienko	efa0f6d443	storagenode/monitor: set MinimumDiskSpace default to 500GB. As a workaround it was set to 0 in previous release. Now according to the TOC must be set to 500GB. Change-Id: Ia2743d49e86683396958aff51b95df743af4f872	2020-02-04 15:55:42 +00:00
NikolaiYurchenko	384bdf1f58	web/storagenode: code refactoring pt.1 Change-Id: I948261c61e7ed7f9703a85314d7a14ca9a59b16d	2020-02-04 12:06:29 +00:00
Egon Elbre	91a480f5a0	Jenkins: add storagenode npm checks Change-Id: I93e3cc009c628e3c97a24541e7b01c75a342bda6	2020-02-03 20:35:33 +02:00
NikolaiYurchenko	109d733dde	web/storagenode: npm dependencies updated Change-Id: I660d65cc38171d94d21225da8d1b82a08850eb27	2020-02-03 18:15:17 +00:00
Jessica Grebenschikov	dd9d18f152	upgrade drpc so that we have the monkit metric capability Change-Id: Icdd08478aeff4fbd7148975eca8a21fac41289d7	2020-02-03 17:05:54 +00:00
Egon Elbre	9e5679fdaa	storagenode/console/consoleserver: set content-type manually http.FileServer relies on mime types defined in the operating system. These values may be misconfigured, so a javascript file might end up being served as "text/plain". Change-Id: I3c13c8a9ac484bd765a4de0f8253bfe40dde7513	2020-02-03 15:37:47 +02:00
Jeff Wendling	bd78945116	statreceiver: add v2/v3 splitter and downgrade this allows for us to roll out monkit v3 while keeping the old v2 code and dependencies working. Change-Id: I0758ee2bb66685b0819502368fc2c20cb35b958a	2020-02-02 22:56:14 +00:00
Egon Elbre	eaf3318a58	diagrams: satellite graph per process Currently the whole satellite diagram can be quite overwhelming. This change makes graphs for api, core and repair processes separately. Change-Id: Iea906f51c3bcc46c71d7c8f6d8964034b317b3b4	2020-02-01 19:56:15 +00:00
Jeff Wendling	d20db90cff	private/dbutil/txutil: create new transactions for retries it was noticed that if you had a long lived transaction A that was blocking some other transaction B and A was being aborted due to retriable errors, then transaction B was never given priority. this was due to using savepoints to do lightweight retries. this behavior was problematic because we had some queries blocked for over 16 hours, so this commit addresses the issue with two prongs: 1. bound the amount of time we will retry a transaction 2. create new transactions when a retry is needed the first ensures that we never wait for 16 hours, and the value chosen is 10 minutes. that should be long enough for an ample amount of retries for small queries, and huge queries probably shouldn't be retried, even if possible: it's more preferable to find a way to make them smaller. the second ensures that even in the case of retries, queries that are blocked on the aborted transaction gain priority to run. between those two changes, the maximum stall time due to retries should be bounded to around 10 minutes. Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4	2020-02-01 18:34:28 +00:00
Jeff Wendling	71ff044edb	storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month Change-Id: I390569a8702164c42edddd3be020e93782227c2e	2020-01-31 16:25:52 -07:00

3463 Commits