f6XF on Nostr: ...
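Fortune magazine reports that ByteDance, trying to catch up on large-model training, has pushed its own data crawler to full throttle, scraping the web at 25 times the rate of OpenAI's crawler. The report also says that the crawlers run by ByteDance, OpenAI, Anthropic and similar companies all ignore the robots.txt protocol: whether or not a site has disallowed crawler access, they crawl it anyway. [facepalm]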
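For context, here is a minimal sketch of the check a crawler that honors robots.txt would make before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` and `OtherBot` user agents, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen only for illustration; nothing here describes how any of the named companies' crawlers actually work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot is disallowed everywhere by its own rule group.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
    # Other agents fall back to the "*" group, which only blocks /private/.
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/index.html"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/data"))
```

robots.txt is advisory: nothing on the server enforces it, so a crawler that skips this check will simply fetch the page anyway, which is the behavior the report describes.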
Published at
2024-10-05 11:43:02
Event JSON
{
"id": "211ef37ff350e33c7c57c7fdc0b6059b950da51d32cb92e68ccd3065f66776c6",
"pubkey": "2936462bda8612e290f17231fddca9a658b472680cb661b537b1121d5b3d683b",
"created_at": 1728128582,
"kind": 1,
"tags": [
[
"e",
"9ca31705212eb454d564fc8b5e7c4b383cf61cd70beab5ae771c32a8c81c1a8f",
"",
"root"
],
[
"p",
"2936462bda8612e290f17231fddca9a658b472680cb661b537b1121d5b3d683b"
]
],
"content": "「财富」杂志报道,字节跳动为了追赶大模型的训练进度,把自家的数据爬虫功率拉满了,全网抓取速度达到了OpenAI爬虫的25倍,还说字节、OpenAI、Anthropic这些公司的爬虫全都无视了robots.txt协议,也就是不管一个网站有没有拒绝爬虫访问,它们都照爬无误。[允悲] ",
"sig": "de0ad9d4edb922c226980010d1d9968ca0571a8492b0a72474a08eb0ebb3b1ef36b87fb7918c91c5fbc92187ca924ab9550da3d386af97fc679f3315aeeb63d2"
}