datahoarder

8253 readers
1 user here now

Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 5 years ago
ZFS backup strategy (lemmy.sdfeu.org)
submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/[email protected]

Hello,

I've lately been thinking about my backup strategy as I'm finalising building my NAS. I want to use ZFS, and my idea was to have two drives in a mirror (RAID 1) configuration and just execute periodic snapshots on that dataset. I want to do the same thing in a second location, so in the end my files would be on 4 different drives in 2 different locations, protected by snapshots from deletion or any other unwanted modification.

Would it be possible with this setup to just swap one of the drives in one location and have ZFS automatically rebuild the data on the new drive, then take the drive to the second location and do the same, so that all drives end up exactly the same, instead of copying data manually? Though I believe all of the drives would need to be exactly the same size, is that right?
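For reference, the swap-and-rebuild idea maps onto standard zpool commands. A sketch, with a hypothetical pool name ("tank") and placeholder device IDs:

```shell
# Take the outgoing mirror member offline, physically swap disks,
# then tell ZFS to rebuild (resilver) onto the new one:
zpool offline tank ata-OLD_DISK_ID
zpool replace tank ata-OLD_DISK_ID ata-NEW_DISK_ID

# The pool stays online during the resilver; watch progress with:
zpool status tank
```

On the size question: the replacement disk only needs to be at least as large as the one it replaces, not exactly the same size.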

Is it a good idea in general or should I ditch it? Or maybe just ditch the ZFS rebuilding part and use some kind of software for that instead?

Thank you for your help in advance!


I have an old Synology 1819 NAS that was the unfortunate victim of the Intel Atom clock bug. I would like to repurpose the case as I like the compact form factor. Has anyone tried replacing a Synology mobo with something else (e.g. an ITX board or a Raspberry Pi) and using an alternative operating system?


I've just received this email from Backblaze announcing a slight increase in storage cost.

In exchange, they offer a free download budget of three times the stored capacity.


Storage Price Increase: Effective October 3, 2023, we are increasing the monthly pay-as-you-go storage rate from $5/TB to $6/TB. The price of B2 Reserve will not change.

Free Egress: Also effective October 3, we’re making egress free (i.e. free download of data) for all B2 Cloud Storage customers—both pay-as-you-go and B2 Reserve—up to three times the amount of data you store with us, with any additional egress priced at just $0.01/GB. Because supporting an open cloud environment is central to our mission, expanding free egress to all customers so they can move data when and where they prefer is a key next step.

Product Upgrades: From Object Lock for ransomware protection, to Cloud Replication for redundancy, to more data centers to support data location needs, Backblaze has consistently improved B2 Cloud Storage. Stay tuned for more this fall, when we’ll announce upload performance upgrades, expanded integrations, and more partnerships.
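For a sense of what the announced numbers mean in practice, here's a rough monthly-cost sketch using only the figures quoted above ($6/TB/month storage, egress free up to 3x stored data, $0.01/GB beyond that):

```python
# Rough monthly-cost sketch under the new B2 pricing quoted above.

def monthly_cost(stored_tb, egress_tb):
    storage = stored_tb * 6.0                # $6 per TB stored per month
    free_egress = 3 * stored_tb              # egress up to 3x stored is free
    overage_tb = max(0.0, egress_tb - free_egress)
    egress = overage_tb * 1000 * 0.01        # $0.01/GB beyond the free tier
    return storage + egress

print(monthly_cost(10, 25))  # 10 TB stored, 25 TB out: egress stays free -> 60.0
print(monthly_cost(10, 35))  # 5 TB over the free tier -> 60.0 + 50.0 = 110.0
```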


Hey everyone,

I'm reaching out because I could use some help or guidance here. I kind of ended up being the go-to person for collecting and storing data in a project I'm involved in with my team. We've got a bunch of data, tens of thousands of files, and we're using Nextcloud to make it accessible to our users. The thing is, we're facing a challenge with Nextcloud when it's accessed through a public link: there is no search function at all. While we really appreciate the features it offers from an administrative standpoint, it's not working that well for our users.

I was wondering if anyone has suggestions or resources that could point us in the right direction for this project? It would be super awesome if it's something open-source and privacy-preserving. 😄

Thanks a bunch in advance!


I have a ThinkPad mini whose disk storage needs expanding, and I have a couple of WD Red 4TB drives with basically no use.

I wanted to plug one of them over USB, but it seems that docker just doesn't like to have volumes on external drives. AFAIK docker starts before the drive is fully mounted, preventing it from doing so. I couldn't find any reliable way to work around this (but I'm open to suggestions!).
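One workaround that may be worth trying (assuming systemd, and that the drive has an /etc/fstab entry; /mnt/data is a placeholder mount point) is a drop-in unit that forces the Docker service to wait for the mount:

```ini
# /etc/systemd/system/docker.service.d/wait-for-mount.conf
# Create with "systemctl edit docker.service", then
# "systemctl daemon-reload" and restart docker.
[Unit]
RequiresMountsFor=/mnt/data
```

RequiresMountsFor orders docker.service after the systemd mount unit for that path, so volumes on the USB drive are present before containers start.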

I was wondering if it makes sense to just get a SATA extension cable, if those even exist, and have the drive outside the case. One of my worries is whether it would get enough power (the port being originally meant for 2.5" drives).

The alternative would be to either change the server itself or buy a larger 2.5" drive instead, to keep it going for a while.

submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/[email protected]

My old laptop died, so I took the SSD from it hoping to use it as an external drive. I wanted to just overwrite it with dd for security, but I decided to go with f3 instead, as that would also give me the opportunity to test the drive. Sadly, bad results came back:

Data OK: 111.75 GB (234352247 sectors)
Data LOST: 14.13 MB (28937 sectors)
       Corrupted: 14.11 MB (28905 sectors)
Slightly changed: 0.00 Byte (0 sectors)
     Overwritten: 16.00 KB (32 sectors)
Average reading speed: 250.69 MB/s

S.M.A.R.T. data if you're curious

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0013   100   100   050    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       359
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       995
161 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       236
163 Unknown_Attribute       0x0003   100   100   050    Pre-fail  Always       -       96
165 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       84
166 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       56
172 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       339
194 Temperature_Celsius     0x0023   059   059   000    Pre-fail  Always       -       41 (Min/Max 33/41)
196 Reallocated_Event_Count 0x0000   100   100   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0012   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       1847
242 Total_LBAs_Read         0x0012   100   100   000    Old_age   Always       -       2424

Yeah, barely used. It's just the LBAs written/read that don't seem to make sense.
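The LBA numbers may just be a unit quirk: read literally as 512-byte sectors, 1847 would be under a megabyte written, which is impossible for a drive that booted an OS. Many consumer SSD controllers instead report attributes 241/242 in gigabytes written/read. A quick sanity check of both interpretations:

```python
# Sanity-check two unit interpretations of SMART attribute 241
# (Total_LBAs_Written = 1847) from the output above.
raw = 1847

as_sectors_bytes = raw * 512   # literal 512-byte LBAs
as_gib_bytes = raw * 1024**3   # some controllers report GiB written instead

print(as_sectors_bytes)        # 945664 bytes, under 1 MB: implausible
print(as_gib_bytes / 1024**4)  # ~1.8 TiB written: plausible for light use
```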

Any better ideas than using it as a paperweight?
I tested it when it was new, and it had no errors.


Last Friday, the Southern District of New York court issued its final order in Hachette v. Internet Archive, thus bringing the lower court proceedings to a close. We disagree with the court’s decision and intend to appeal. In the meantime, however, we will abide by the court’s injunction.

The lawsuit only concerns our book lending program. The injunction clarifies that the Publisher Plaintiffs will notify us of their commercially available books, and the Internet Archive will expeditiously remove them from lending. Additionally, Judge Koeltl also signed an order in favor of the Internet Archive, agreeing with our request that the injunction should only cover books available in electronic format, and not the publishers’ full catalog of books in print. Separately, we have come to agreement with the Association of American Publishers (AAP), the trade organization that coordinated the original lawsuit with the four publishers, that the AAP will not support further legal action against the Internet Archive for controlled digital lending if we follow the same takedown procedures for any AAP-member publisher.

So what is the impact of these final orders on our library? Broadly, this injunction will result in a significant loss of access to valuable knowledge for the public. It means that people who are not part of an elite institution or who do not live near a well-funded public library will lose access to books they cannot read otherwise. It is a sad day for the Internet Archive, our patrons, and for all libraries.

Because this case was limited to our book lending program, the injunction does not significantly impact our other library services. The Internet Archive may still digitize books for preservation purposes, and may still provide access to our digital collections in a number of ways, including through interlibrary loan and by making accessible formats available to people with qualified print disabilities. We may continue to display “short portions” of books as is consistent with fair use—for example, Wikipedia references (as shown in the image above). The injunction does not affect lending of out-of-print books. And of course, the Internet Archive will still make millions of public domain texts available to the public without restriction.

Regarding the monetary payment, we can say that “AAP’s significant attorney’s fees and costs incurred in the Action since 2020 have been substantially compensated by the Monetary Judgement Payment.”


My current setup is an Odroid H3+ with 2x8TB hard drives configured in raid 1 via mdadm. It is running Ubuntu server and I put in 16GB of RAM. It also has a 1TB SSD on it for system storage.

I'm looking to expand my storage via an attached storage drive enclosure that connects through a USB 3 port. My ideal enclosure has 4 bays with hardware raid support, which I plan to configure in raid 10.

My main question: how do I pool these together so I don't need to balance between the two? Ideally all bound together to appear as a single mount point. I tried looking at mergerfs, but I'm not sure that it would work.

I ask primarily because I write scripts that move data to the current RAID 1 setup. As I expand the storage, I want them to continue working without worrying about checking for space between the two sets.

Be real with me - is this dumb? What would you suggest given my current stack?
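For what it's worth, mergerfs handles exactly this kind of pooling, and it can sit on top of both the mdadm array's mount and the enclosure's mount. A sketch of an /etc/fstab entry, with hypothetical mount points:

```
# Pool the md RAID 1 mount and the USB enclosure mount under /mnt/pool.
# category.create=mfs sends new files to the branch with the most free
# space; moveonenospc retries on another branch if one fills up mid-write.
/mnt/raid1:/mnt/enclosure  /mnt/pool  fuse.mergerfs  defaults,allow_other,category.create=mfs,moveonenospc=true  0 0
```

Scripts would then just write to /mnt/pool without checking free space on either set.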


Hey everyone

I was curious: has anyone ever had issues with a cloud service, say Google Drive, and data disappearing? Or any other provider, such as AWS S3?

I know there must be redundancy going on in data centers, but I've always wondered about the what-if scenario.

It'd be interesting to hear if anyone has any stories.


I'm wondering how to effectively keep track of the data I hoard. I'm almost at 80TB and currently use a very basic folder structure to keep track of it. I'm looking for something like a full-text index, maybe even something like a search engine for local data. Is there such a tool?
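One lightweight option (dedicated indexers like Recoll also exist) is SQLite's FTS5 full-text extension. A minimal sketch; in real use you'd walk the hoard with os.walk() and index each readable file, but two hypothetical entries stand in here:

```python
import sqlite3

# Minimal local full-text index using SQLite's FTS5 extension.
con = sqlite3.connect(":memory:")  # use a file path to persist the index
con.execute("CREATE VIRTUAL TABLE files USING fts5(path, body)")
con.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("/data/docs/zfs-notes.txt", "snapshot schedule and scrub results"),
        ("/data/docs/backup-plan.txt", "offsite mirror rotation checklist"),
    ],
)

# FTS5 handles tokenizing; MATCH runs a full-text query, rank orders hits.
rows = con.execute(
    "SELECT path FROM files WHERE files MATCH ? ORDER BY rank", ("mirror",)
).fetchall()
print(rows)  # [('/data/docs/backup-plan.txt',)]
```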


Anyone have thoughts about backblaze?

I pushed some data to it. Pretty happy with their default tool for uploading from a linux box.

Only issue is I picked the wrong region, and there's no way to flip it without a new account, which is a bit annoying.

$0.005 per GB/month is pretty decent.


I plan to start working on my small setup for personal use. I'm already pretty certain about what I'll use on the software side: two different servers with Ceph over the network, using erasure coding (k=2 and k+m=4).

I plan to start with four 4TiB HDDs (plus maybe a small SSD for cache). I already have two laptops with broken screens lying around and, to minimize the initial investment, I would like to reuse them and just plug the drives into their USB connections (and connect the laptops with a switch via Ethernet, plus maybe an overlay network if it works well).

I would like to know if anyone has recommendations for getting these hard drives. For tax-avoidance reasons, it is important for them to be internal HDDs (as they are considerably less taxed than USB HDDs, for stupid reasons).
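A quick note on the erasure-coding profile mentioned above: with k data chunks out of k+m total chunks per object, usable capacity is k/(k+m) of raw. For k=2, k+m=4 on four 4TiB drives:

```python
# Capacity math for the erasure-coding profile above: k=2, k+m=4.
k, m = 2, 2
raw_tib = 4 * 4.0  # four 4 TiB drives

usable_tib = raw_tib * k / (k + m)  # data chunks / total chunks
overhead = (k + m) / k              # raw bytes stored per logical byte

print(usable_tib)  # 8.0 -> 8 TiB usable
print(overhead)    # 2.0 -> mirroring-level overhead, but tolerates losing any 2 chunks
```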


cross-posted from: https://lemmy.world/post/3125897

From: https://storage.courtlistener.com/recap/gov.uscourts.nysd.537900/gov.uscourts.nysd.537900.214.1.pdf

‘“Covered Book” shall mean any in-copyright book or portion thereof, whether in existence as of the date hereof or later created, in which any Plaintiff (or any subsidiary or corporate affiliate of a Plaintiff) (a) owns or controls an exclusive right under the Copyright Act …’

‘the “Internet Archive Parties” … are permanently enjoined and restrained from engaging in any of the following acts in, from or to the United States … the distribution to the public, public display, and/or public performance, of Covered Books in, from or to the United States in Case 1:20-cv-04160-JGK-OTW Document 214-1 Filed 08/11/23 Page 3 of 6 any digital or electronic form, including without limitation on the Internet Archive website (collectively “Unauthorized Distribution”)’

So while backing up the entire lending library might have been a challenge, perhaps the books of just the plaintiff publishers can be backed up?

Some tools:

https://gitea.com/bipinkrish/DeGourou

https://github.com/MiniGlome/Archive.org-Downloader

Might also be an opportunity to punish the publishers by distributing their copyrighted works and hurting their pockets (though it seems they have yet to prove that piracy actually hurts profits!)


Another lawsuit against the Internet Archive, sigh.


Saw this posted on [email protected] and thought it ideal to crosspost here too.

At the end of Q2 2023, Backblaze was monitoring 245,757 hard drives and SSDs in our data centers around the world. Of that number, 4,460 are boot drives, with 3,144 being SSDs and 1,316 being HDDs.

This graph looks at the annual failure rate for drives more than 5 years old. The higher capacity ones look a little bit concerning IMO. This is discussed within a short section later on in the blog post.


Hey all,

I’m looking to grab the audio files from Blinkist (I have an annual subscription, if that matters) because I want to be able to listen to them at a slightly faster speed, and their in-app player doesn’t allow this.

Is there a way to either rip these from the site using a scraper (the Blinkist-m4a-downloader on GitHub doesn’t work anymore), or, I guess, automate a process to save the books in the app, download them (the audio files are streamed, but you can save and download them for offline playback within the app), and then somehow pull them off my phone’s storage?


This is really cool, I have never heard of dual actuator drives before.


Wanted to share my "even my 8 TB hard drive isn't enough" solution.

The old drive isn't even included in the storage yet.


cross-posted from: https://lemmy.world/post/2499498 (because I'm new to posting and have a lot to learn)

As I said, I made a lossy reformat of the database, and a lossless one at 6.0 GiB (6,477,905,920 bytes), compared to ~26 GiB from Reddit, where fields are almost intentionally anti-compressed to take up more room.

If there is somewhere I can host it, let me know.

Also, I couldn't figure this out: do SQLite databases store any information about the creator or editor of a document?

Why it's lossy: it's missing a large table of base64 urandom technically required to recreate the document fully.
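On the SQLite question: the file format itself carries no creator/editor metadata, only a fixed 100-byte header (magic string, page size, version counters); anything author-like would have to be stored in the application's own tables. A quick look at what the header actually holds:

```python
import os
import sqlite3
import struct
import tempfile

# Create a throwaway database, then inspect its fixed 100-byte header.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x)")
con.commit()
con.close()

with open(path, "rb") as f:
    header = f.read(100)

magic = header[:16]                                # b'SQLite format 3\x00'
page_size = struct.unpack(">H", header[16:18])[0]  # big-endian uint16 at offset 16
print(magic)
print(page_size)  # usually 4096; no creator/editor field exists anywhere in the header
```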


Info for anyone switching from Google Drive to Dropbox like me: know that there may be a weekly upload limit of 8TB.

view more: ‹ prev next ›