Tyler the Enginigger on Nostr:
I am going to try this today: https://github.com/Tencent/Hunyuan3D-2
Not the deepseek-r1 because I got bored of it
You can run almost any model on the CPU with your normal RAM, but I think, for some cards, some CPUs, and some Linux versions on some drivers, you can either do CPU offload alongside a GPU or run GPU-only.
Because it's such a fast-moving field, I can't tell you that ##B models are good, or better, or behave a certain way, or fit in this much VRAM, etc., because it also depends on whether they're using fp32/fp16/fp8 or int8/int4 quantization to reduce total size at the cost of accuracy.
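To make the precision point concrete, here's a back-of-the-envelope sketch of weights-only VRAM footprint at different precisions. This is my own rough helper, not anything from a library, and it ignores activations, KV cache, and runtime overhead, which add real memory on top:

```python
def weight_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough weights-only memory estimate in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8

# The same hypothetical 7B model at different precisions:
for bits, name in [(32, "fp32"), (16, "fp16"), (8, "int8/fp8"), (4, "int4")]:
    print(f"{name:>8}: ~{weight_vram_gb(7, bits):.1f} GB")
# fp32 ~28.0 GB, fp16 ~14.0 GB, int8 ~7.0 GB, int4 ~3.5 GB
```

That's why the same "7B" model can be impossible on one card and comfortable on another: the quantization matters as much as the parameter count.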
For example, for the one I linked above, I don't have enough VRAM. I do for ONE model, but the gradio_app.py wants more. Sometimes you can set cpu_offload=True somewhere and it will be MUCH slower, but you can at least run it.
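To illustrate what CPU offload buys you conceptually — only one submodule resident on the GPU at a time, the rest parked in system RAM — here's a toy simulation. The Layer class and sizes are made up for illustration; real frameworks do the actual tensor moves for you (e.g. Diffusers pipelines expose `enable_model_cpu_offload()`):

```python
class Layer:
    """Toy stand-in for a model submodule with a fixed memory footprint."""
    def __init__(self, size_gb: float):
        self.size_gb = size_gb
        self.device = "cpu"  # everything starts out in system RAM

def peak_gpu_memory(layers, offload: bool) -> float:
    """Simulate one forward pass and return the peak 'VRAM' used, in GB."""
    if not offload:
        # Whole model resident on the GPU at once.
        return sum(layer.size_gb for layer in layers)
    peak = 0.0
    for layer in layers:
        layer.device = "gpu"              # copy this layer onto the GPU
        peak = max(peak, layer.size_gb)   # only one layer resident at a time
        layer.device = "cpu"              # evict it before loading the next
    return peak

model = [Layer(2.0), Layer(3.5), Layer(1.5)]  # hypothetical 7 GB model
print(peak_gpu_memory(model, offload=False))  # 7.0 — won't fit a small card
print(peak_gpu_memory(model, offload=True))   # 3.5 — fits, but the copies cost time
```

The slowness comes from those CPU-to-GPU copies happening on every pass, which is exactly the trade-off: you swap VRAM pressure for PCIe transfer time.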
It's a pretty complicated field, as far as new concepts and what affects what, and I'm still trying to figure it all out.
Ironically, this memory issue could be fixed if NVIDIA would just support sharing system RAM with VRAM by extending the address space, but that would defeat the purpose of them wanting you to buy newer, more expensive cards with more VRAM, wouldn't it?
