I am still :ablobcatsweatsip: ing on Nostr: theprimeape I got out of EXACTLY the same situation a while back. 24 TB of situation. ...
theprimeape (npub14gg…79df) I got out of EXACTLY the same situation a while back. 24 TB of situation. The entire process took, no shit, about 4 months. Learned a hell of a lot from it though.
The first lesson is that modern HDDs have not only failed to improve but seem to have dramatically backslid in their handling of many small files. If you are archiving many small files (I'm talking millions), be prepared for any operation you run on an HDD to take WEEKS at the very least. I would recommend getting a 2 TB external SSD, which runs about $150. Search for the largest files and move those out of the folder if you can, then transfer the small files onto the SSD as a backup. For me, the transfer took 2.5 weeks to an HDD vs 36 hours to an SSD.
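If it helps, here is a rough sketch of the "find the largest files" step in Python. The function name `largest_files` and the default of 20 results are just my picks for illustration, not anything standard. It walks the tree once and keeps only the top entries, so it should stay light on memory even with millions of files.

```python
import heapq
import os
import stat


def largest_files(root, n=20):
    """Return the n largest regular files under root as (size_bytes, path) pairs."""

    def sizes():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # lstat: look at the entry itself, do not follow symlinks
                    st = os.lstat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if stat.S_ISREG(st.st_mode):
                    yield st.st_size, path

    # nlargest keeps only n items in memory while consuming the generator
    return heapq.nlargest(n, sizes())
```

Calling `largest_files("/path/to/archive", n=50)` gives you a list sorted from biggest to smallest, which you can then move out by hand before copying the small stuff.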
Generally annex does not help with small files unless they can benefit from duplicate reduction. Mine benefited massively: some of my archives shrank to a third of their size on ext4. However, small files carry an overhead with annex, so unless you are making them part of a huge archive directory spread across multiple disks I don't particularly recommend it.
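One way to guess ahead of time whether duplicate reduction will pay off is to hash everything and compare total bytes against unique bytes. This is only a sketch under my own assumptions: `dedup_estimate` and `file_digest` are made-up names, it uses plain SHA-256 over file contents, and it ignores whatever per-file overhead annex adds, so treat the result as a ballpark. It also has to read every file, so on an HDD with millions of small files expect it to be slow.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_estimate(root):
    """Return (total_bytes, unique_bytes) for regular files under root."""
    total = 0
    unique = {}  # digest -> size of one copy
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # skip symlinks and anything that is not a regular file
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            total += size
            unique.setdefault(file_digest(path), size)
    return total, sum(unique.values())
```

If `unique_bytes` comes out close to `total_bytes`, there is little duplication to exploit and the small-file overhead probably isn't worth it.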
What I do recommend annex for is media archives. It's particularly good at managing a relatively small set (think 20,000) of huge files. You can do selective clones to different disks, and you're able to ask where files are from any of them.
I also use annex as an "archive bin" on my Windows computers. All of my computers have a gitannex/downloads/<computer_name> and they all have a copy, but only <computer_name> has the content of the <computer_name> folder. However, all computers can see the file names of what all the other computers have, and if I need a file from another computer I just ask annex for it and it finds it. When I need space I just "delete" (drop) parts or the entirety of folders. I can still see everything that I have, but it only takes up a few kilobytes after a drop operation.
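For anyone who wants to script that ask/fetch/drop routine, a minimal sketch follows. The only real commands it relies on are `git annex whereis`, `git annex get`, and `git annex drop`. The helper names `annex_command` and `run_annex` and the example path are hypothetical, and it assumes `git` and git-annex are on your PATH and that `repo_dir` is an existing annex repository.

```python
import subprocess

# The three git-annex subcommands this sketch wraps
ALLOWED_ACTIONS = {"whereis", "get", "drop"}


def annex_command(action, path):
    """Build the argument list for a git-annex call on one path."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["git", "annex", action, path]


def run_annex(repo_dir, action, path):
    """Run the command inside repo_dir and return its stdout as text.

    Raises subprocess.CalledProcessError if git-annex exits non-zero.
    """
    result = subprocess.run(
        annex_command(action, path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

So `run_annex(repo, "whereis", "downloads/laptop/video.mkv")` asks which repositories hold the content, `"get"` pulls it in, and `"drop"` frees the local space again. Keep in mind that by default git-annex checks that enough other copies exist before it lets a drop go through.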
Unfortunately NTFS is not particularly compatible with annex, so you really should not use NTFS as a server, but it's fine to use as a client.