Dan Piponi on Nostr
npub15swlxudlhx4ttcgsd4556zuqrl57qndxmt4n3dnzrkqn89nxv6lsjzx855 This is a fuzzy lookup in a key-value store.
The rows of Q are queries (q_i). The columns of K are keys (k_i). The rows of V are values (v_i).
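Written out row by row, and assuming the formula in question is the usual attention expression with the keys stored as the columns of K (so no transpose is needed) and any 1/√d scaling left out, the ith output row is a softmax-weighted sum of the values. With the more common softmax(QK^T/√d)V convention, the keys are the rows of K instead, but the structure is the same.

```latex
\[
\operatorname{softmax}(QK)\,V, \qquad (QK)_{ij} = q_i \cdot k_j,
\]
\[
\bigl(\operatorname{softmax}(QK)\,V\bigr)_i
  = \sum_j \frac{\exp(q_i \cdot k_j)}{\sum_l \exp(q_i \cdot k_l)}\, v_j .
\]
```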
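In the "ideal" case where the k_i are orthonormal (mutually orthogonal and of unit length) and q_i is exactly equal to one of them, k_j, the dot product q_i.k_j = 1 and every other q_i.k_l = 0. Applying softmax to each row of the score matrix then concentrates the weight in the jth column of the ith row. Strictly, softmax of (1, 0, 0, ...) is not exactly one-hot, but it approaches a 1 in that column as the scores are scaled up (the hard-argmax limit). That row of weights then picks out v_j. So in this case the formula literally looks up q_i in the table of (k_i, v_i) pairs and returns the matching v_j.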
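A small pure-Python sketch of the ideal case. The helpers `softmax`, `dot` and `attend` and the `scale` parameter are hypothetical, made up only for illustration; `scale` multiplies every score before the softmax, acting like an inverse temperature. With orthonormal keys and a query equal to k_1, the unscaled scores are (0, 1, 0) and the weights are noticeably spread out. Scaling the scores up drives the weights toward a one-hot vector and the output toward v_1.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating; this avoids overflow and
    # leaves the result unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values, scale=1.0):
    # Hypothetical helper: score the query against every key, softmax the
    # (optionally scaled) scores, and return (weights, weighted sum of values).
    weights = softmax([scale * dot(query, k) for k in keys])
    out = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    return weights, out

# Orthonormal keys (the standard basis) and arbitrary example values.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 20.0], [5.0, 5.0]]
query = keys[1]  # the query equals k_1 exactly

w_plain, out_plain = attend(query, keys, values)            # scores (0, 1, 0)
w_sharp, out_sharp = attend(query, keys, values, scale=20)  # scores (0, 20, 0)

print([round(w, 3) for w in w_plain])    # [0.212, 0.576, 0.212]
print([round(x, 3) for x in out_sharp])  # [0.0, 20.0], i.e. essentially v_1
```

The `scale` knob is only there to show that softmax behaves like a hard argmax lookup once one score dominates the others; it is not something the post itself proposes.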
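The given formula is then a smooth extension of that to the more general case (i.e. q_i doesn't exactly match any k_j and the k_j aren't orthogonal): each output row becomes a softmax-weighted average of the v_j, with more weight on the values whose keys have a larger dot product with q_i.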
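A minimal sketch of that general case, assuming the formula is softmax(QK)V with the keys as the columns of K, as described above, and with any 1/√d scaling left out. The helper names (`matmul`, `softmax_rows`, `attention`) and the example numbers are hypothetical, chosen for illustration. The two keys are deliberately not orthogonal and the query matches neither, so the output is a blend of v_0 and v_1, tilted toward v_0 because q scores higher against k_0.

```python
import math

def matmul(A, B):
    # Plain nested-list matrix product: (n x d) times (d x m) -> (n x m).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax_rows(M):
    # Softmax applied independently to each row; subtracting the row max
    # keeps math.exp from overflowing and does not change the result.
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(Q, K, V):
    # Q: rows are queries q_i. K: COLUMNS are keys k_j. V: rows are values v_j.
    # Row i of the result is sum_j softmax_j(q_i . k_j) * v_j.
    return matmul(softmax_rows(matmul(Q, K)), V)

# Two keys that are NOT orthogonal, stored as the columns of K:
# k_0 = (1, 0), k_1 = (0.6, 0.8), so k_0 . k_1 = 0.6.
K = [[1.0, 0.6],
     [0.0, 0.8]]
V = [[1.0, 0.0],   # v_0
     [0.0, 1.0]]   # v_1

# A query that matches neither key exactly but leans toward k_0:
# q . k_0 = 0.9 and q . k_1 = 0.62.
Q = [[0.9, 0.1]]

result = attention(Q, K, V)
print(result)  # one row: a mix of v_0 and v_1, weighted toward v_0
```

Because V is the identity here, the output row is just the softmax weights themselves, which makes the "fuzzy" part of the lookup easy to read off.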