It's A Digital Disease!


This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

51
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Inevitable-Bank-8614 on 2025-06-08 05:38:11+00:00.


I'm going through a stack of old HDDs, all over a decade old. Most survived, but two of them give me the click of death and one stopped spinning on me. I never got a chance to back up the two clicking drives or zero-fill them, unfortunately, so it's smashy time, then maybe e-recycling.

Got me thinking. I've always read that data is still technically recoverable from loose damaged platters, but realistically what is the risk here? If you drill a few holes, scrape up the platter with sandpaper, then bend the platter or even cut it into quarters, who in their right mind is going to spend the time, effort, and presumably lots of money to recover data from a random damaged platter they find in the trash?

When you have no other option, how safe is your data if you just destroy the drive without first wiping it?
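For drives that still spin and respond, a plain zero-fill before disposal makes the question moot. A minimal sketch of that step, assuming a Linux box and a hypothetical /dev/sdX device path that has been triple-checked; it is destructive by design:

```python
# Minimal zero-fill sketch for a drive that still responds (destructive!).
# "/dev/sdX" is a placeholder -- triple-check the device with lsblk first.
import os

DEVICE = "/dev/sdX"          # hypothetical device path
BLOCK = 4 * 1024 * 1024      # 4 MiB writes keep syscall overhead low
zeros = bytes(BLOCK)

fd = os.open(DEVICE, os.O_WRONLY)   # needs root
written = 0
try:
    while True:
        try:
            n = os.write(fd, zeros)
        except OSError:              # ENOSPC: reached the end of the device
            break
        if n == 0:
            break
        written += n
    os.fsync(fd)
finally:
    os.close(fd)

print(f"wrote {written / 1e12:.2f} TB of zeros to {DEVICE}")
```

This has the same effect as dd if=/dev/zero of=/dev/sdX; drives that click or no longer spin, as in the post, are already past this option.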

52
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/kettu92 on 2025-06-07 16:17:53+00:00.

53
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/mikepm07 on 2025-06-06 20:33:39+00:00.


Hey, I recently started a new job at a company with nearly 600TB of video footage. About 80% of it sits on hard drives that are over 10 years old, with no copy kept at an alternate location.

It sounds like some of these drives haven't been turned on and verified in three years.

My new boss just requested we come up with some proposals on how we could safely update our storage and protect from hard drive failure.

We have a DAM (Digital Asset Management Tool) that keeps a lot of the footage we need regularly accessible, but I know he won't want to delete any of the 600TB of footage.

What's our best option here?

My thought is just to buy new hard drives and make it a policy to verify each drive once a year. In addition to that, we need to clone the contents of each drive to a backup and keep it at a separate location as a safety precaution.

I think that would be cheaper than a server or NAS-type system?

Would love any thoughts from people who work in this field more than I do.

Thank you
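On the "verify each drive once a year" policy: the verification itself is easy to automate with a per-drive checksum manifest, generated when the drive is written and re-checked at each yearly spin-up. A minimal sketch with hypothetical paths; real deployments might prefer existing tools, but the idea is just this:

```python
# Build or verify a SHA-256 manifest for one archive drive.
# Usage: python manifest.py build  /mnt/drive42 drive42.sha256
#        python manifest.py verify /mnt/drive42 drive42.sha256
import hashlib, os, sys

def sha256(path, bufsize=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def build(root, manifest):
    with open(manifest, "w") as out:
        for dirpath, _, names in os.walk(root):
            for name in sorted(names):
                p = os.path.join(dirpath, name)
                out.write(f"{sha256(p)}  {os.path.relpath(p, root)}\n")

def verify(root, manifest):
    bad = 0
    with open(manifest) as mf:
        for line in mf:
            digest, rel = line.rstrip("\n").split("  ", 1)
            p = os.path.join(root, rel)
            if not os.path.exists(p) or sha256(p) != digest:
                bad += 1
                print("FAILED:", rel)
    print("all files verified" if bad == 0 else f"{bad} files failed")

if __name__ == "__main__":
    mode, root, manifest = sys.argv[1:4]
    build(root, manifest) if mode == "build" else verify(root, manifest)
```

Run it in "build" mode once per drive, keep the manifest somewhere other than the drive itself, and run "verify" at each annual check; bit rot or an unreadable file shows up as a failed entry.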

54
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/D3VEstator on 2025-06-06 19:07:28+00:00.


I have a bunch of DVDs and I'm debating whether I should rip them, given the quality.

The Blu-rays I rip, but I'm not sure about DVDs in this day and age.

Thoughts?

55
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/ElectricalGuava1971 on 2025-06-06 00:39:58+00:00.

56
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/sprfreek on 2025-06-06 18:00:00+00:00.

57
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/nando1969 on 2025-06-06 17:49:10+00:00.

58
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/LordGAD on 2025-06-06 16:40:13+00:00.

59
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/AshleyAshes1984 on 2025-06-06 14:57:56+00:00.

60
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/NateUrBoi on 2025-06-06 00:08:09+00:00.


Right now I've got a NAS running SHR with 4x 18TB drives. I've heard RAID isn't enough, and while I agree, everything is just so expensive. Am I expected to buy an additional 50TB worth of cold storage? Are all cloud storage providers abhorrently expensive at this amount of data? I'm only storing non-personal media files, meaning they're replaceable, so I'm not too worried about it, but I'd still like to know if I'm missing something. Thanks.

61
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/rexyuan on 2025-06-05 16:59:11+00:00.


https://www.techradar.com/pro/mission-impossible-the-final-reckoning-gets-surprise-guest-appearance-a-revolutionary-360tb-silica-storage-media

How far are these alternative storage materials for cold storage from reaching the consumer market? And what would it mean for data hoarding?

I think it would turn the "2" in the 3-2-1 backup principle into one copy stored on your usual drive and one copy stored on this kind of specialized cold-storage medium.

62
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/kommandantredundant on 2025-06-05 10:14:44+00:00.


Yeah you read right, not SanDisk. Got it for free with my AliExpress order.

I tested it with h2testw: 3.9GB OK, 1.9TB lost. Well. So what can I do with it now? Is it just going into the bin? I know I shouldn't rely on it whatsoever, but will this thing actually only hold 3.9GB of data, or can I put more data onto it and it will be random whether that data gets corrupted?
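For reference, what h2testw reports can be reproduced by hand: write blocks whose contents encode their own position, then read them back and see where the data stops matching. A rough sketch along those lines (the mount point is hypothetical, and like h2testw it should only be run on a card you are willing to erase):

```python
# Rough fake-capacity probe in the spirit of h2testw / f3.
import hashlib, os

MOUNT = "/media/fake-card"                 # hypothetical mount point of the card
TEST_FILE = os.path.join(MOUNT, "capacity_probe.bin")
BLOCK = 1024 * 1024                        # 1 MiB blocks

def block_data(i):
    # Deterministic, position-dependent content so misdirected writes are detectable.
    seed = hashlib.sha256(str(i).encode()).digest()
    return (seed * (BLOCK // len(seed) + 1))[:BLOCK]

written = 0
with open(TEST_FILE, "wb") as f:
    try:
        while True:
            f.write(block_data(written))
            written += 1
    except OSError:
        pass                               # filesystem reports "full"

good = 0
with open(TEST_FILE, "rb") as f:
    for i in range(written):
        if f.read(BLOCK) != block_data(i):
            break
        good += 1

print(f"wrote {written} MiB, read back {good} MiB intact")
```

For a strict result, unmount and remount the card between the write and read phases so the reads hit the flash rather than the OS page cache, which is effectively what h2testw and f3 do.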

63
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Owltiger2057 on 2025-06-04 21:53:35+00:00.

64
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/TestFlightBeta on 2025-06-04 14:00:18+00:00.


I need to pull 5TB of data from Drive, or else my entire account will be deleted, which I must absolutely avoid. Here are some options I've considered:

1a. rclone. I used this to put a lot of data onto Drive. Unfortunately it only sees ~1.5TB of data on Drive. Maybe I'm doing something wrong, but for my purposes rclone is inadequate.

1b. Google Takeout. This seems to be my only hope. Creates 50x 50GB ZIP files. However, it has a lot of problems.

2a. I'm not even going to consider the possibility of trying to download 50x huge ZIP files in Chrome.

2b. I tried Chrono download manager, but it has strange issues where it doesn't want to download a lot of files simultaneously.

2c. JDownloader doesn't reliably grab downloads from Chrome, even with the extension installed.

2d. Neither does Folx (I'm on macOS)

2e. Xtreme Download Manager was supposed to have a built-in browser, but after installing it on macOS I don't see an app. I Googled it; it's supposed to be a browser extension, but it certainly doesn't appear in Edge, and it isn't specified which browsers it works with. All in all, XDM's macOS support is extremely sloppy, to say the least.

2f. I tried manually downloading them one by one and copying the download link and pasting them into one of the aforementioned download managers, but this did not work (the token expires).

2g. Tried using curl/aria2c with cookies, this does not work either.

2h. Free Download Manager is the only download manager that worked to grab Google Takeout links reliably from Edge. So I can queue them from Google Takeout into FDM.

3a. However, FDM often tries to download serially, one by one, and this only works for the first 5 links. The rest error out because of authentication issues.

3b. I tried enabling the ability to download up to 20 files simultaneously. At least then I'd only need to add download links 3 times to download all files. However, a lot of the downloads stay "queued" and not all of them download simultaneously. Meaning I probably have to download 5 at a time.

I'm really at my wits' end... is there no good way to download these links reliably?
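One more route that may be worth trying before giving up on download managers entirely: pulling the files straight from the Drive API with an OAuth credential, which avoids the expiring browser tokens that break 2f and 2g. A minimal sketch using google-api-python-client; the credentials file and folder ID are placeholders, and files stored as native Google Docs formats would need files().export_media() instead:

```python
# Minimal Google Drive API download loop (credentials.json and FOLDER_ID are placeholders).
# pip install google-api-python-client google-auth-oauthlib
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
FOLDER_ID = "YOUR_FOLDER_ID"               # placeholder

flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
drive = build("drive", "v3", credentials=flow.run_local_server(port=0))

page_token = None
while True:
    resp = drive.files().list(
        q=f"'{FOLDER_ID}' in parents and trashed = false",
        fields="nextPageToken, files(id, name, size)",
        pageToken=page_token,
    ).execute()
    for f in resp.get("files", []):
        print("downloading", f["name"])
        request = drive.files().get_media(fileId=f["id"])   # regular uploads only
        with open(f["name"], "wb") as out:
            downloader = MediaIoBaseDownload(out, request)
            done = False
            while not done:
                _, done = downloader.next_chunk()
    page_token = resp.get("nextPageToken")
    if not page_token:
        break
```

Whether this sees the full 5TB depends on the same ownership/scope questions as rclone, so it's worth checking what files().list() actually returns before committing to it.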

65
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/SuperBox4776 on 2025-06-04 23:32:27+00:00.

66
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/tashjiann on 2025-06-04 09:20:59+00:00.

67
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Glad_Obligation1790 on 2025-06-04 00:14:51+00:00.


I learned hoarding from my grandfather. For as long as I can remember, he bought DVDs and Blu-Rays at yard sales and gathered a collection of roughly 2000 disks (no joke), while I argued streaming was better. Except, I learned I was wrong...in the worst way. Two-ish years ago I went to watch my silver boxed Evangelion Neon Genesis DVDs and found, oh no, disk one won't load....in anything and disk 3 sometimes won't either. Since it's expensive to replace and it's pretty old, there's no way to know for sure a new set would even work. Then last year I got my first NAS, a little UGREEN NASync DXP2800 (2 bay, N100, 16GB RAM, 2x 10TB drives, RAID 1) and realized that physical media > streaming. So I began ripping all my DVDs using a cheap portable DVD drive. I got my hands on an OWC Mercury enclosure with an HL Blu-ray drive, and Blu-rays got added to the list too. As I went I started to realize, oh shit, disk rot is showing on a lot of my disks (M*A*S*H was by far the worst). Clearly, hoarding physical media isn't my strong suit. With a lot of work I've gotten almost every disk to eventually rip including Eva. Thank god.

At the start of this year, I moved to a southern state and upgraded to a 6800 Pro when I started running out of space (6 bay, i5, 64GB RAM, 3x 10TB drives, RAID 5), then discovered flea markets selling used DVDs for $1 and TV shows for $5. Obviously, they're older movies and shows, but it's nice to find Psych, House, and others, along with movies I've wanted to watch but haven't, or ones that I can't find available to stream. I found a place near me too that has a small wall that's similarly priced. I bought a lot of 4 Blu-ray drives, got adapters to connect them to my PC, and did the same with some older Sony OptiArc DVD drives, using OWC enclosures again, albeit for laptop drives this time. Now I have 2 Blu-ray and 3 OptiArcs connected and can batch rip my disks.

Last weekend I went to the place with the wall of disks, and they were running a fill-a-box of DVDs sale for $10. The only rule: the box must close. I got 71 cases (4 TV seasons, 2 of 3 disks in a Back to the Future box set, and the rest individual movies). Best deal so far.

Over the past year my goal has evolved. I started by aiming to cancel my streaming services and build my own personal Netflix-sized catalogue (at the time, 6600 individual TV shows and movies was the goal) that can grow with me over time without having to worry about something disappearing on me (ahem, Netflix removing Fringe was a bad day), and it's also become an archival project. At the start of the year I switched from VideoByte Blu-ray Ripper to DVDFab and MakeMKV, which didn't change what I was doing so much as the quality I could achieve. Now I can save more space on the video end, get better color, fewer artifacts, and original audio (legit Atmos is amazing).

My process involves ripping every disk to ISO using MakeMKV, then batch encoding in DVDFab to H.265 for movies and TV and AV1 for anime, both with remuxed audio and subtitles. It's been a fun project and I have so many more TV shows, anime, and movies to buy. I try to get them used to save money, but for shows like Frieren: Beyond Journey's End, Mushoku Tensei, and Mieruko-chan I have to buy them new, since they aren't exactly readily available used and Blu-rays are few and far between where I go, especially anime. My next goal is to get the Topaz upscaling software so I can upscale certain DVDs like John Wick until I eventually track down their Blu-rays.

Once I finish ripping to ISO, I put them in a tote and store them in the attic. No point keeping them out once they're digitized and re-encodable whenever I want!

I'm sure my collection is smaller than a lot of people's right now, but I am proud to have a private and legitimate collection. Best hoarding hobby ever.

Stats (Type - Space - Number):

  • Disks - 4.26TB
  • Anime (Seasons) - 145GB - 13 series
  • Anime (OVAs) - 17.4GB - 11 OVAs
  • Movies - 992GB - 337 movies
  • TV Shows - 605GB - 13 series

Hardware:

  • PC (handles all the encoding) - 13th Gen i7, RTX 4080, 128GB RAM
    • 1x HL BH16NS40 BD-RE
    • 1x HL CH20N BD-ROM
    • 3x Sony OptiArc AD-7740H
  • UGREEN NASync DXP 6800 Pro (hosts Plex and stores the ISOs and content)
    • 12th Gen i5, 64GB RAM, 2x HGST HE10 10TB Drives, 1x Toshiba N300 10TB, 3 Free Bays, setup in RAID 5
  • Various Streaming Devices - Apple TV 4k (1st Gen) w/ Sonos Arc, Roku TV, iPhone 13 Pro Max, iPad Pro M2 (2022), Windows PC
    • All Apple devices play via Infuse

Process:

  • MakeMKV - Back up DVDs to ISO
  • xreveal - Back up Blu-rays to ISO
  • DVDFab - Convert movies and TV shows
    • MP4, H.265, web optimized, match resolution and frame rate, preserve chapters, 2-pass, high quality, copy audio, subtitles set to remux into file - VobSub Subtitle
  • DVDFab - Convert anime and OVAs
    • MP4, AV1, match resolution and frame rate, preserve chapters, 2-pass, high quality, copy audio, subtitles set to remux into file - VobSub Subtitle
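For anyone without a DVDFab license, the encode step in the list above can be approximated with ffmpeg. A rough sketch, assuming titles already extracted to MKV with MakeMKV; the CRF and preset values are illustrative, not the exact DVDFab profile, and output is kept as MKV so VobSub subtitle streams can be copied as-is:

```python
# Batch-encode ripped MKVs to H.265, copying audio and subtitles untouched.
# Requires ffmpeg built with libx265 on PATH.
import pathlib, subprocess

SRC = pathlib.Path("rips")                 # hypothetical folder of MakeMKV output
DST = pathlib.Path("encoded")
DST.mkdir(exist_ok=True)

for mkv in sorted(SRC.glob("*.mkv")):
    out = DST / (mkv.stem + ".x265.mkv")
    if out.exists():
        continue                           # makes the batch restartable
    subprocess.run([
        "ffmpeg", "-i", str(mkv),
        "-map", "0",                       # keep every video/audio/subtitle stream
        "-c:v", "libx265", "-crf", "20", "-preset", "slow",
        "-c:a", "copy",                    # pass the original audio through untouched
        "-c:s", "copy",                    # keep VobSub/PGS subtitle streams as-is
        str(out),
    ], check=True)
```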

Edit: Since I clearly touched a nerve: I flatly disagree that buying used is the same as, or even similar to, piracy. It was bought. Somewhere along the line, money was paid to purchase it new. Torrenting or downloading it is straight-up theft, and it's a disingenuous argument to make. No one was paid at any point. In the case of torrenting a ripped Blu-ray, one person paid so 1000+ don't. That neither supports those who did the work nor supports a primary or secondary market for physical media. There is nothing wrong with buying a used Blu-ray or DVD simply because the makers aren't paid a second time, just like Ford doesn't get paid again when you buy a used car, or a designer when you go thrift shopping. There's a difference between being paid and never being paid, and that doesn't change because a disk is used. Regardless, it's a moot point since, as a few people have asked, all but 3 TV series are new, all the anime was new, and more than 200 movies (some still in my pile) are new.

68
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Harry_Yudiputa on 2025-06-03 19:50:59+00:00.

69
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/RoachedCoach on 2025-06-03 15:22:15+00:00.

70
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/vff on 2025-06-03 14:18:52+00:00.


You may be familiar with the book A Million Random Digits with 100,000 Normal Deviates from the RAND corporation that was used throughout the 20th century as essentially the canonical source of random numbers.

I’m working towards putting together a similar collection, not of one million random decimal digits, but of at least one quadrillion random binary digits (so 128 terabytes). Truly random numbers, not pseudorandom ones. As an example, one source I’ve been using is video noise from an old USB webcam (a Raspberry Pi Zero with a Pi NoIR camera) in a black box, with every two bits fed into a Von Neumann extractor.
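For readers unfamiliar with it, the Von Neumann extractor mentioned above is the classic debiasing step: read the raw bits in pairs, output a bit only when the pair differs, and discard 00 and 11 pairs. A minimal sketch, with a synthetic biased source standing in for the webcam noise:

```python
# Von Neumann extractor: turns biased-but-independent bits into unbiased bits.
import random

def bytes_to_bits(data):
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def von_neumann(bits):
    # Pairs of input bits map 01 -> 0 and 10 -> 1; 00 and 11 are discarded.
    it = iter(bits)
    for a, b in zip(it, it):
        if a != b:
            yield a

# Synthetic biased source (~75% ones) standing in for raw camera noise.
raw = bytes(random.getrandbits(8) | random.getrandbits(8) for _ in range(100_000))
out = list(von_neumann(bytes_to_bits(raw)))
print(f"{len(out)} output bits, fraction of ones = {sum(out) / len(out):.3f}")
```

The usual caveat applies: the extractor removes bias but assumes the input bits are independent, so correlated camera noise still benefits from the downstream testing described later in the post.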

I want to save everything because randomness is by its very nature ephemeral. By storing randomness, this gives permanence to ephemerality.

What I’m wondering is how people sort, store, and organize random numbers.

Current organization

I’m trying to keep this all neatly organized rather than just having one big 128TB file. What I’ve been doing is saving them in 128KB chunks (1 million bits) and naming them “random-values/000/000/000.random” (in a zfs dataset “random-values”) and increasing that number each time I generate a new chunk (so each folder level has at most 1,000 files/subdirectories). I’ve found 1,000 is a decent limit that works across different filesystems; much larger and I’ve seen performance problems. I want this to be usable on a variety of platforms.
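The index-to-path scheme described above is mechanical enough to pin down in a few lines, which also makes it easy for any tool to locate a chunk by number. A sketch matching the layout as described (three levels, at most 1,000 entries each, 128 KiB per chunk):

```python
# Map a sequential chunk index to the hierarchical path described above.
from pathlib import Path

ROOT = Path("random-values")

def chunk_path(index, root=ROOT, ext=".random"):
    # e.g. chunk 1234567 -> random-values/001/234/567.random
    if not 0 <= index < 1000 ** 3:
        raise ValueError("index out of range for a 3-level, 1000-wide layout")
    top, rest = divmod(index, 1000 ** 2)
    mid, low = divmod(rest, 1000)
    return root / f"{top:03d}" / f"{mid:03d}" / f"{low:03d}{ext}"

def write_chunk(index, data: bytes):
    assert len(data) == 128 * 1024, "chunks are fixed at 128 KiB"
    path = chunk_path(index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)

print(chunk_path(0))           # random-values/000/000/000.random
print(chunk_path(1_234_567))   # random-values/001/234/567.random
```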

Then, in a separate zfs dataset, “random-metadata,” I also store metadata under the same filename but with different extensions, such as “random-metadata/000/000/000.sha512” (and 000.gen-info.txt and so on). Yes, I know this could go in a database instead. But that makes sharing this all hugely more difficult. To share a SQL database properly requires the same software, replication, etc. So there’s a pragmatic aspect here. I can import the text data into a database at any time if I want to analyze things.

I am open to suggestions if anyone has any better ideas on this. There is an implied ordering to the blocks, by numbering them in this way, but since I'm storing them in generated order, at least it should be random. (Emphasis on should.)

Other ideas I explored

Just as an example of another way to organize this, an idea I had but decided against was to randomly generate a numeric filename instead, using a large enough number of truly random bits to minimize the chances of collisions. In the end, I didn’t see any advantage to this over temporal ordering, since such random names could always be applied after-the-fact instead by taking any chunk as a master index and “renaming” the files based on the values in that chunk. Alternatively, if I wanted to select chunks at random, I could always choose one chunk as an “index”, take each N bits of that as a number, and look up whatever chunk has that index.

What I do want to do in the naming is avoid accidentally introducing bias in the organizational structure. As an example, breaking the random numbers into chunks, then sorting those chunks by the values of the chunks as binary numbers, would be a bad idea. So any kind of sorting is out, and to that end even naming files with their SHA-512 hash introduces an implied order, as they become “sorted” by the properties of the hash. We think of SHA-512 as being cryptographically secure, but it’s not truly “random.”

Validation

Now, as an aside, there is also the question of how to validate the randomness, although this is outside the scope of data hoarding. I’ve been validating the data, as it comes in, in those 128KB chunks. Basically, I take the last 1,048,576 bits as a 128KB binary string and use various functions from the TestU01 library to validate its randomness, always going once forwards and once backwards, as TestU01 is more sensitive to the lower bits in each 32-bit chunk. I then store the results as metadata for each chunk, 000.testu01.txt.

An earlier thought was to try compressing the data with zstd, and reject data that compressed, figuring that meant it wasn’t random. I realized that was naive since random data may in fact have a big string of 0’s or some repeating pattern occasionally, so I switched to TestU01.
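A cheap statistical pre-check that avoids the compression pitfall is the classic monobit frequency test from NIST SP 800-22: count the ones in a chunk and ask how surprising that count would be for a fair coin. A small sketch, purely as a first-pass filter alongside TestU01, not a replacement for it:

```python
# Cheap first-pass sanity check on a 128 KiB chunk: monobit frequency test.
import math

def monobit_p_value(chunk: bytes) -> float:
    # NIST SP 800-22 frequency test: normalized excess of ones over zeros.
    n = len(chunk) * 8
    ones = sum(bin(b).count("1") for b in chunk)
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))   # tiny p-value means "suspiciously biased"

with open("random-values/000/000/000.random", "rb") as f:
    print(f"monobit p-value: {monobit_p_value(f.read()):.4f}")
```

A p-value near zero flags gross bias cheaply; structured but balanced data will still sail through, which is why the heavier TestU01 batteries remain the real gate.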

Questions

I am not married to how I am doing any of this. It works, but I am pretty sure I’m not doing it optimally. Even 1,000 files in a folder is a lot, although it seems OK so far with zfs. But storing as one big 128TB file would make it far too hard to manage.

I’d love feedback. I am open to new ideas.

For those of you who store random numbers, how do you organize them? And, if you have more random numbers than you have space, how do you decide which random numbers to get rid of? Obviously, none of this can be compressed, so deletion is the only way, but the problem is that once these numbers are deleted, they really are gone forever. There is absolutely no way to ever get them back.

(I’m also open to thoughts on the other aspects of this outside of the data hoarding and organizational aspects, although those may not exactly be on-topic for this subreddit and would probably make more sense to be discussed elsewhere.)


TLDR

I’m generating and hoarding ~128TB of (hopefully) truly random bits. I chunk them into 128KB files and use hierarchical naming to keep things organized and portable. I store per-chunk metadata in a parallel ZFS dataset. I am open to critiques on my organizational structure, metadata handling, efficiency, validation, and strategies for deletion when space runs out.

71
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/svper-user on 2025-06-03 03:46:39+00:00.


After discovering BTRFS, I was amazed by its capabilities. So I started using it on all my systems and backups. That was almost a year ago.

Today I was researching small "UPS" with 18650 batteries and I saw posts about BTRFS being very dangerous in terms of power outages.

How much should I worry about this? I'm afraid that a power outage will cause me to lose two of my backups on my server. The third backup is disconnected from the power, but only has the most important part.

EDIT: I was thinking about it before I went to sleep. I have one of those Chinese emulation handhelds and its first firmware version used some FAT or ext. It was very easy to corrupt the file system if it wasn't shut down properly. They implemented btrfs to solve this and now I can shut it down any way I want, directly from the power supply and it never corrupts the system. That made me feel more at ease.

72
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Far_Marsupial6303 on 2025-06-01 22:05:56+00:00.


https://seekingalpha.com/article/4789561-seagate-technology-holdings-plc-stx-seagate-2025-investor-and-analyst-conference-transcript

Understand that these presentations are, of course, optimistic about the future, but they still have to be given with a high degree of honesty.

I'm still digesting all the great info, particularly in the Q&A section.

73
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Cultural-Victory3442 on 2025-06-01 23:10:36+00:00.


I'm about to buy a higher-capacity hard drive for saving my files, because right now I only use 500GB hard drives that I've accumulated over the years.

So I want to move to a larger drive.

But I'm not sure how much $ per TB is a good price.

Any suggestions?
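The comparison itself is just price divided by capacity, so a few lines can help shortlist candidates; the entries below are made-up examples, not current market prices:

```python
# Rank hypothetical drive deals by $/TB (prices below are made up for illustration).
candidates = [
    ("12TB recertified enterprise", 12, 129.99),
    ("8TB external (shuckable)", 8, 139.99),
    ("20TB new NAS drive", 20, 299.99),
]
for name, tb, price in sorted(candidates, key=lambda c: c[2] / c[1]):
    print(f"{name:30s} ${price / tb:6.2f} per TB")
```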

74
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Ok_Quantity_5697 on 2025-06-01 21:37:07+00:00.

75
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/JJPath005 on 2025-05-31 17:35:26+00:00.


Every week I take in about 15 GB of footage, and it adds up pretty quickly. What is the most efficient way to upload and store this content? I'm saying 1 TB because it gives me space to work with and avoids bigger crashing issues. Is an SSD the best option?
