Wulfy on Nostr: nprofile1q…7gqp6 So I've done some preliminary number crunching (using AI natch) On ...
nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpq8c7wjmr8txk9u3xzrxl5rsx8mpt4dr84nyluufn4qg4x9xnar52qr7gqp6 (nprofile…gqp6)
So I've done some preliminary number crunching (using AI, natch) on rolling your own search engine.
There are approximately (conservatively) 200 million websites with content.
That does not include the deep web.
Total words: ~4.5 trillion words ≈ 33.75 trillion bytes
Approximately 33.75 terabytes of text
Daily New Content:
New daily words: ~225 billion words
Approximately 1.69 terabytes of new text per day
So your new search engine will have to be provisioned for roughly 33.75 terabytes of raw text, and your network/CPU build will have to index an additional 1.69 terabytes of data a day. I'm getting anxious just thinking about managing a database of this size.
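For anyone who wants to check the storage arithmetic, here is a rough sketch in Python. It assumes an average of 7.5 bytes per word, which is not stated in the post but is the ratio implied by the word and byte figures above, and it uses decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes). The constant and function names are made up for illustration.

```python
# Back-of-a-napkin check of the raw text volume.
# ASSUMPTION: 7.5 bytes per word, the ratio implied by the post's own figures.
BYTES_PER_WORD = 7.5
TOTAL_WORDS = 4.5e12   # ~4.5 trillion words across ~200 million sites
DAILY_WORDS = 225e9    # ~225 billion new words per day

TB = 10**12  # decimal terabyte
PB = 10**15  # decimal petabyte


def text_bytes(words, bytes_per_word=BYTES_PER_WORD):
    """Raw text size in bytes for a given word count."""
    return words * bytes_per_word


total = text_bytes(TOTAL_WORDS)
daily = text_bytes(DAILY_WORDS)

print(f"corpus: {total / TB:.2f} TB ({total / PB:.5f} PB)")
print(f"daily:  {daily / TB:.2f} TB")
print(f"daily growth as a share of the corpus: {daily / total:.1%}")
```

This only counts raw text. An index, replicas, and crawl metadata would add to it, and how much depends on the design.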
My back-of-a-napkin calculation is that the initial cost outlay would be about $300,000 if you do it on the smell of an oily rag, plus about $100,000 per month, which is probably unreasonably low.