nusa on Nostr: It’s tempting to scrape and crawl everywhere because it is open and accessible. But ...
It’s tempting to scrape and crawl everywhere because it is open and accessible. But sheer volume soon becomes a problem. The whole point is that you don’t need to hold all the content yourself. Smart scrapers retain only pointers to where the content was seen, not the complete compendium, and they sort and process as they go: keep this, dump that, convert to a vector, save a summary, guess topics, feed to an agent… then forget.
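The process-as-you-go approach above can be sketched as a small stream processor. This is a minimal Python sketch under stated assumptions, not how any particular Nostr crawler works. Events are assumed to arrive as hypothetical `(relay_url, event)` pairs. The helpers `summarize`, `guess_topics` and `embed` are naive stand-ins for a real summarizer, topic classifier and embedding model. `keep` is whatever filter the operator chooses. Only the event id, the relays it was seen on, and the derived data survive; the content itself is never stored.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical topic keywords; a real pipeline might use a trained classifier instead.
TOPIC_KEYWORDS = {
    "scraping": {"scrape", "scraper", "crawl", "crawler"},
    "ai": {"vector", "agent", "embedding"},
}


@dataclass
class Pointer:
    """What survives processing: where the note lives, not the note itself."""
    event_id: str
    relays: set = field(default_factory=set)  # relay URLs the event was seen on
    summary: str = ""
    topics: set = field(default_factory=set)
    vector: list = field(default_factory=list)


def summarize(text: str, max_len: int = 80) -> str:
    """Naive stand-in for a summarizer: the first sentence, truncated."""
    first = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    return first[:max_len]


def guess_topics(text: str) -> set:
    """Tag a note with every topic whose keywords appear in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords}


def embed(text: str, dim: int = 16) -> list:
    """Toy hashed bag-of-words vector (stand-in for a real embedding model)."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        bucket = hashlib.sha256(word.encode()).digest()[0] % dim
        vec[bucket] += 1.0
    return vec


def process_stream(events, keep):
    """Consume (relay_url, event) pairs as they arrive.

    Kept events are reduced to a Pointer plus derived data. Dropped event
    ids are remembered so duplicates from other relays are skipped cheaply.
    """
    index = {}
    dropped = set()
    for relay_url, event in events:
        event_id = event["id"]
        if event_id in dropped:
            continue
        ptr = index.get(event_id)
        if ptr is None:
            content = event["content"]
            if not keep(content):
                dropped.add(event_id)  # dump that
                continue
            ptr = Pointer(
                event_id,
                summary=summarize(content),
                topics=guess_topics(content),
                vector=embed(content),
            )
            index[event_id] = ptr  # keep this, as a pointer only
        ptr.relays.add(relay_url)
    return index
```

To re-fetch a kept note later, you need only its id and one of the recorded relays. An "e" tag carries the same information: an event id plus an optional relay hint, which is empty in the event below.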
Published at 2025-04-04 09:59:25
Event JSON
{
  "id": "875abfd28d382ea96973b73576ede36b313d38e7f88d887122ba54f70458dd16",
  "pubkey": "d475ce4b3977507130f42c7f86346ef936800f3ae74d5ecf8089280cdc1923e9",
  "created_at": 1743760765,
  "kind": 1,
  "tags": [
    [
      "e",
      "b5c09f93e29efa1aa1e8a214e0635c1e7a266d4407702e6306f08af904432b1a",
      "",
      "root"
    ],
    [
      "p",
      "b90c3cb71d66343e01104d5c9adf7db05d36653b17601ff9b2eebaa81be67823"
    ]
  ],
  "content": "It’s tempting to scrape and crawl everywhere because it is open and accessible. But sheer volume will surface as a problem. The whole point is that you don’t need to hold all the content yourself. Smart scrapers will only retain pointers to where the content was seen and not the complete compendium, and sort and process as they go. Keep this, dump that, convert to vector, save a summary, guess topics, feed to an agent,… then forget.",
  "sig": "9bf9a2ea0a6f2c4ba7d4bd8a141e9d8f282f825432f853e6c5feed77a7324b5ca2906970cab7f5cf70095b8e728291ab93304b74633a98278b46208eff6b2151"
}
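The "id" field is what makes pointer-only retention safe. Under NIP-01 it is the SHA-256 of the event's serialized fields, so a scraper that kept only the id can re-fetch the note later and check that it is unchanged. Below is a minimal Python sketch assuming NIP-01's serialization: a compact JSON array [0, pubkey, created_at, kind, tags, content] with non-ASCII characters left as raw UTF-8. Python's `json` escaping matches that for ordinary text but may differ for rare control characters. The helper names are illustrative. The sketch does not check "sig", which needs a secp256k1 Schnorr library.

```python
import hashlib
import json


def serialize_event(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """Compact JSON array hashed for the event id (NIP-01 layout)."""
    return json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep characters like ’ and … as raw UTF-8
    )


def nostr_event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """Lowercase hex SHA-256 of the UTF-8 serialized event."""
    serialized = serialize_event(pubkey, created_at, kind, tags, content)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def matches(event: dict) -> bool:
    """True if a re-fetched event still hashes to the id that was kept as a pointer."""
    recomputed = nostr_event_id(
        event["pubkey"], event["created_at"], event["kind"], event["tags"], event["content"]
    )
    return recomputed == event["id"]
```

Running `matches` on the event above should return True, provided its fields were captured byte-for-byte from the relay.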