Macrobius on Nostr: My reply in thread (mid Oct 2024) ...
https://thephora.net/phoranova/index.php?threads/the-continued-wayback-machine-censorship-thread.1581/#post-15680
responding to a comment on another forum:
> I happened to need archive.org yesterday and found out you already posted here about it being down.
> But Archive.org is back online now!
From TBC version of this thread ^^
This is partly true -- they put up a 'read-only' instance.
However:
1/ all inbound links to them are still broken
2/ they also host the purl.org domain, which runs the global persistent URL (pURL) resolver. You might not care, but quite a few things in academia will break for as long as that stays down. Have you ever seen an academic paper with a doi:10.1000/blah sort of thing in it? That's a Digital Object Identifier (DOI), used to identify academic papers and other things as 'objects'. Those use pURLs (a sketch of what that lookup actually does is below).
Some of those systems might run their own embedded purl resolvers, some may not. YMMV
[[ so far, recapitulating above phora thread ]]
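On that pURL point: here is a minimal sketch (Python, standard library only) of what a persistent-identifier lookup actually is on the wire -- a single HTTP redirect served by the resolver. If the resolver is down, nothing else on the web holds that mapping. The pURL path used below is illustrative, not a real record.

```python
# Sketch: a persistent identifier is just an HTTP redirect served by the
# resolver (purl.org here).  If the resolver is down, nothing else on the
# web can tell you where the object actually lives.
# The pURL path below is illustrative, not a real record.
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None   # stop at the first hop so the Location header stays visible

opener = urllib.request.build_opener(NoRedirect)
try:
    opener.open("https://purl.org/NET/example-id", timeout=10)
except urllib.error.HTTPError as err:
    # A 3xx surfaces here because we refused to follow it; the Location
    # header is the resolver's copy of the identifier -> destination mapping.
    print(err.code, err.headers.get("Location"))
```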
To see the problem, suppose Twitter/X went under the wave... who cares, right? But the t.co (URL shortener) domain is embedded in every tweet everywhere, to route each click back through Twitter for market analytics -- and, in particular, only Twitter can translate a t.co link into the actual destination.
archive.org understands the problem here -- if Twitter ever goes for the final dirt nap, a very large portion of their curated collections will have 404 links and be useless. Solution: resolve and cache that information when they scrape, as part of the process. It's not just the content... WHAT ABOUT THE METADATA? (link resolution).
Er, Houston: they forgot to cache their own metadata. And for anything like that on the web, where the metadata is sometimes more important than caching the content (which is most of us -- most people don't cache a trillion webpages the way the Wayback Machine does), the data will still be there... they just won't be able to find it.
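One way to do that scrape-time caching, sketched in Python with only the standard library: follow each shortened link to its destination while scraping, and store the short-URL-to-destination map next to the cached page, so the archive still resolves after the shortener is gone. The t.co link and output filename below are made up for illustration.

```python
# Sketch: resolve shortened links while scraping and keep the mapping
# (the metadata) alongside the archived content.
import json
import urllib.request

def resolve(short_url, timeout=10.0):
    """Follow HTTP redirects and return the final destination URL."""
    req = urllib.request.Request(short_url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()              # URL after all redirects

def archive_link_metadata(short_urls, out_path="link-map.json"):
    """Store a short-URL -> destination map next to the cached content."""
    mapping = {}
    for url in short_urls:
        try:
            mapping[url] = resolve(url)
        except OSError:
            mapping[url] = None           # shortener unreachable: note it, keep going
    with open(out_path, "w") as fh:
        json.dump(mapping, fh, indent=2)
    return mapping

# Usage (hypothetical short link):
# archive_link_metadata(["https://t.co/abc123"])
```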
Another instance: Google 'went down' for 5 minutes in Dec 2022... and internet traffic dropped 40%. What happened? Is 40% of internet traffic people using Google web properties, watching YouTube, or what? Hardly -- my guess is that 40% of internet traffic goes to sites that use Google Analytics, and the backend for that stopped working. Pages loaded slowly or hardly at all, people got bored with the internet for 5 minutes... and half of them found something better to do with their lives than watching blank screens and spinning cursors.
The 'internet' was just fine, of course -- all the sites were there, still functional. Two possible and easy fixes that no one will know how to do the NEXT TIME IT HAPPENS:
1/ if you are a webmaster, turn off the broken Analytics. Now your site works again, and at least YOUR customers are happily browsing YOU. You won't be getting those analytics from Google anyway until they fix their shit.
2/ but the real problem is at the edge -- the server isn't responsible for sending analytics to Google -- YOUR BROWSER IS. It downloaded the page just fine, plus all the .js assets it needed to get the job done, and it followed orders... but those orders didn't work, so now your browser is useless and uninteresting.
Fix? Simple... find your /etc/hosts file and point the failing Google hostnames at 127.0.0.1 -- those requests will now fail for you instantly, and your browser will in almost all cases give you a now fully armed and operational Death Page. Ad- and junk-filtering software, and privacy software, can probably do this more easily (a fix at the proxy level, rather than the old hacker method of blacklisting hostnames by resolving them locally).
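For the hosts-file route, a minimal sketch -- the exact hostnames your pages pull in will vary, but these are the usual Google Analytics / Tag Manager hosts; delete the entries again once the outage is over:

```
# /etc/hosts -- temporary entries while the analytics backend is down
127.0.0.1   www.google-analytics.com
127.0.0.1   analytics.google.com
127.0.0.1   www.googletagmanager.com
```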