o1 is impressive at reasoning. it looks like they fine-tuned the model with chain-of-thought (CoT) reasoning and it got better at it. previously, CoT was done using prompts.
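a minimal sketch of what prompt-based CoT looks like, using the classic zero-shot "let's think step by step" cue; the question and cue string here are just illustrative:

```python
# minimal sketch of prompt-based CoT: the step-by-step reasoning is
# elicited purely through the prompt, with no change to the model itself.
question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

# the classic zero-shot CoT cue: nudge the model to reason step by step
# before giving a final answer.
cot_prompt = f"{question}\nLet's think step by step."

print(cot_prompt)  # send this to any chat/completions endpoint
```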
it looks like after gobbling up everything pre-trainable, closed ai spent a lot of time on post-training engineering, with CoT and reinforcement learning.
is this a general rule of thumb then: when something works in a prompt, training on it can have a dramatic effect on model performance?
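a hypothetical sketch of that idea: take the step-by-step answers a CoT prompt elicits and bake them into fine-tuning data. the example records and filename are made up; the record layout follows the chat-style JSONL shape used by e.g. OpenAI's fine-tuning API:

```python
import json

# hypothetical sketch: distilling prompt-elicited CoT into training data.
# each record pairs a plain question (no CoT cue) with a worked,
# step-by-step answer, so fine-tuning bakes in the behavior the prompt
# used to have to ask for.
examples = [
    {
        "messages": [
            {"role": "user",
             "content": "A train travels 60 km in 45 minutes. "
                        "What is its speed in km/h?"},
            {"role": "assistant",
             "content": "45 minutes is 0.75 hours. "
                        "speed = 60 km / 0.75 h = 80 km/h."},
        ]
    },
]

# write chat-style JSONL, one training example per line
with open("cot_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```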
but as always, be mindful of what AI is saying. don't get super excited by smartness; also seek truthfulness. as i have shown, they don't always tell the truth when it matters.