Ignacio Castaño on Nostr: nprofile1q…0t3wn Yeah, my initial inclination was to avoid the 2x cmovs, since ...
nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqykddq7nmfd8cd0hupl8fnsmmuhtg55zusm6de4fjecmrhsx4x97qs0t3wn (nprofile…t3wn) Yeah, my initial inclination was to avoid the 2x cmovs, since these are not available in RDNA and I assumed they would be missing in other ISAs too. Turns out that was a win on some, but a regression on others.
Integer bit ops execute at a quarter rate on some GPUs, so if cmov is available, it's better to use fewer instructions. What I don't get is why option 3 is slower than 2!
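
To illustrate the trade-off, here is a minimal sketch in C of the two styles of branchless select. This is not the actual code from the options discussed in the thread; the function names `select_cmov` and `select_bitops` are hypothetical, and how each one lowers depends entirely on the compiler and target ISA.

```c
#include <stdint.h>

/* Conditional form: where the ISA has a conditional move or select,
   a compiler will typically (not always) lower this to one such instruction. */
static uint32_t select_cmov(uint32_t cond, uint32_t a, uint32_t b) {
    return cond ? a : b;
}

/* Bit-op form: avoids cmov, but costs several integer ops
   (compare, negate, and, and-not, or), which hurts where those run at reduced rate. */
static uint32_t select_bitops(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = 0u - (uint32_t)(cond != 0); /* all ones if cond is nonzero, else zero */
    return (a & mask) | (b & ~mask);
}
```

Both return the same value for every input, so the only difference is instruction count and which execution units get used.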