Macrobius on Nostr:
Translating the jargon:
RL == reinforcement learning, i.e., Social control (training Meta users by controlling their 'like' supply is the canonical example; it's morally dubious, and the AI researchers who had qualms about it quit a few years ago).
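If you want the mechanism minus the moralizing, here's a toy sketch of the RL loop: behaviour gets shaped by whatever reward signal the platform controls. Everything below is illustrative; no real platform API, just the feedback-loop idea.

```python
# Toy sketch of RL as reward shaping. 'likes' stands in for the reward signal.
import random

behaviours = {"post_outrage": 0.0, "post_cats": 0.0}  # action preferences

def reward(action):
    # The platform controls the 'like' supply: this is the control knob.
    return 1.0 if action == "post_outrage" else 0.1

for step in range(1000):
    # Toy policy: pick the currently preferred action, explore 10% of the time.
    action = max(behaviours, key=behaviours.get)
    if random.random() < 0.1:
        action = random.choice(list(behaviours))
    # Nudge the preference toward whatever the reward signal endorses.
    behaviours[action] += 0.01 * (reward(action) - behaviours[action])

print(behaviours)  # the rewarded behaviour dominates
```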
SFT == supervised fine-tuning (requires 'supervised labels', i.e., Mechanical Turk if you are Amazon). Meta got slapped down for using cheap labour in Africa to produce labels, not at the 10-15 cents per label rate of Indian outsourcing but for mere pennies per label. The shocking 'minimum wage' reported a few years ago in this scandal was, btw, the wage paid to the SUPERVISORS, not the WORKERS.
You just have to love the globalist commie set grinding down workers in East Africa by making them watch child sexual abuse material all day so they can flag it ('ayup, that one's abuse, don't show it, Zuck')... all to underbid the Bezos 'Turking' empire.
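For the record, here is what one of those pennies-per-label items actually buys in training terms: one (prompt, human-approved label) pair fed through a cross-entropy update. A toy sketch, assuming PyTorch; the model is a stand-in, not a real LLM.

```python
# Sketch of one supervised fine-tuning (SFT) step: the scarce input is the
# human-written label, not the compute. Toy model; real SFT uses an LLM.
import torch
import torch.nn as nn

vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One labelled example: token ids for a prompt, and the annotator-approved
# continuation (ids are arbitrary toy values).
prompt = torch.tensor([[5, 17, 42]])
label  = torch.tensor([[17, 42, 7]])

logits = model(prompt)                      # (1, seq, vocab)
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), label.view(-1))
loss.backward()
opt.step()                                  # model nudged toward the human label
```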
So... supply and demand. Supervised labels are the most costly part of building search engines (or of using deep learning, or LLMs), but the price per label is pretty much fixed, since human intelligence tasks (HITs) come down to IQ and its avatar, reaction time, no matter how you cut it. The demand for even a percent or two of improvement on benchmarks is such that Sam Altman is asking for 7 trillion USD (from Saudi Arabia) and Trump is talking 0.5 trillion USD in US funding (not just to OpenAI, though).
So... if the process becomes more efficient, then more quantity will be demanded at the same price (the demand curve, which slopes downward to the right, shifts up). The equilibrium quantity stays the same, fixed by the HIT supply constraint (a vertical supply curve), but the clearing price rises, so the total amount expended goes up to meet the higher demand.
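The whole argument in one toy calculation, under the assumptions above (linear demand P = a - bQ, perfectly inelastic supply). Every number here is invented for illustration.

```python
# Toy equilibrium with a vertical (perfectly inelastic) supply curve.
# Demand: P = a - b*Q. Supply: Q = Q_fixed. All numbers invented.
Q_fixed = 1_000_000            # labels per day the labour pool can produce

def clearing_price(a, b=1e-5):
    return a - b * Q_fixed     # price where demand meets the fixed quantity

before = clearing_price(a=20.0)      # demand intercept before the efficiency gain
after  = clearing_price(a=30.0)      # demand curve shifts up

print(before, Q_fixed * before)      # $10 per label, $10M total spend
print(after,  Q_fixed * after)       # $20 per label, $20M: same Q, more spend
```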
There is pretty high demand for training AI base models... this latest fracas in China happened because they used 2.6 million H800 GPU-hours, meaning you need lots of GPUs, or lots of hours, and Nation-States are impatient in an arms race, which is what this is.
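Back of the envelope on what 2.6 million GPU-hours means in wall-clock terms (the cluster sizes below are hypothetical; the arithmetic is not):

```python
# GPU-hours are fungible across cluster size; wall-clock time is not.
gpu_hours = 2_600_000          # figure quoted above

for n_gpus in (1, 256, 2048):  # hypothetical cluster sizes
    days = gpu_hours / n_gpus / 24
    print(f"{n_gpus:5d} GPUs -> {days:10.1f} days")
# 1 GPU: ~297 years of days; 2048 GPUs: ~53 days. Hence the impatient shopping.
```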
tl;dr - they started out trying to beat two constraints: the supply constraint on human supervised labels for SFT (special data is needed for each individual academic discipline), and the fact that LLMs do *not* do any kind of reasoning or logic on their own (Chain of Thought, or CoT, is the workaround for that).
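For the uninitiated: CoT is a prompting trick, not new machinery. You show the model worked steps so it emits steps before an answer. A generic illustration, nobody's actual recipe:

```python
# Chain-of-Thought is a prompting pattern, not an architecture change:
# you ask for intermediate steps so the sampled tokens do the "reasoning".
cot_prompt = (
    "Q: A label costs $0.12 and we need 50,000 labels. What is the total cost?\n"
    "A: Let's think step by step.\n"
    "1. Cost per label is $0.12.\n"
    "2. 50,000 * 0.12 = 6,000.\n"
    "So the total cost is $6,000.\n"
    "Q: A label costs $0.05 and we need 200,000 labels. What is the total cost?\n"
    "A: Let's think step by step.\n"
)
# Sent to an LLM, the worked example above elicits step-by-step output.
print(cot_prompt)
```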
This seems to be a 'breakthrough' in cost-effective (test-time scaling) INFERENCE. Whether these results will hold up, or what they imply, remains to be seen.
Anyway, the Chinese team tried to add an extra 'social control' (RL) layer: instead of doing SFT then RL, they did massive RL first, in parallel, before proceeding to the final polish of SFT and one last RL pass (the real social-control/censorship polishing that real-world Nation-States demand this tech have, for ideological reasons).
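The reordering, sketched. Stage names paraphrase the description above; this is not anyone's actual training code.

```python
# Conventional post-training vs. the RL-first recipe described above.
# Stage names are labels for the description, not real APIs.
conventional = ["pretrain", "SFT", "RL"]
rl_first     = ["pretrain", "large-scale RL", "SFT",
                "final RL (the censorship polish)"]

def run(pipeline, model="base"):
    for stage in pipeline:
        model = f"{model} -> {stage}"   # each stage consumes the last checkpoint
    return model

print(run(conventional))
print(run(rl_first))
```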
All clear? China needs watchin'.
I hope it's not as simple as 'prior restraint of Human Intelligence Tasks (HITs) is more efficient when you apply Social Control'. But it could be as simple as that. Not sure.
