gentoobro on Nostr:
AI is limited by the functions it models, which are an approximation of some generalization of the training data. It can never generate something inherently different from the training data. In the case of image generation, the models use convolutional algorithms, which means a "feature" is a finite-size region of the image.
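To make the "finite feature" point concrete, here is a minimal sketch of a 2D convolution in plain NumPy. It is a toy, not a real image model, but it shows the key property: each output value depends only on the kernel-sized patch beneath it, never on the whole image.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D convolution ('valid' padding): every output value is a
    function of only the finite patch of pixels under the kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # finite receptive field
            out[i, j] = np.sum(patch * kernel)
    return out

# A 3x3 kernel can only ever respond to 3x3 patches: no single output
# value "sees" the whole picture, only local features of a fixed size.
image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # image with a vertical edge
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])    # responds to vertical edges
response = conv2d_valid(image, kernel)
```

The output is strong where the kernel's 3x3 window straddles the edge and zero where the window sees a uniform region, which is the sense in which a convolutional feature is "a finite feature of the image of some size".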
So, if the training data only included squares and triangles, it's impossible to get a circle out of it. You can get many combinations of squares and triangles in various sizes and orientations, but never a circle. In this way AI is limited in the content it can make.
Prompting is similar. The model has no idea what is inside images, nor what words are; it has no concept of ideas or concepts at all. The training data was manually labeled by humans. A picture of a square inside a triangle might be labeled "small red square inside large blue triangle". The model has no idea what any of those sequences of letters mean. It merely collects statistics to model a function that maps words into pictures (more or less). Many photos labeled "small red square" that actually contain one will produce strong coefficients between those words and a block of matrix weights that converges toward drawing a red square in the image.
So, if the training data never included a phrase like "to the left of", the internal mapping would not be able to do anything with that phrase. In this way AI prompting is limited to the descriptions used in the training data, and thereby by the creativity of the humans who wrote them.
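The word-to-picture statistics argument above can be sketched with a deliberately tiny toy "model" (purely illustrative; real text-to-image systems use learned embeddings, not raw counts, but the limitation is analogous). Training just counts co-occurrences between caption words and labeled image features; a phrase never seen in training contributes nothing at generation time.

```python
from collections import defaultdict

def train(labeled_data):
    """'Training' here is just counting: each caption word that co-occurs
    with an image feature strengthens the word->feature coefficient."""
    weights = defaultdict(lambda: defaultdict(float))
    for caption, features in labeled_data:
        for word in caption.split():
            for feat in features:
                weights[word][feat] += 1.0
    return weights

def generate(weights, prompt):
    """Score features by summed word->feature weights. Words absent from
    the training captions contribute nothing, so a never-seen phrase
    cannot steer the output at all."""
    scores = defaultdict(float)
    for word in prompt.split():
        for feat, w in weights.get(word, {}).items():
            scores[feat] += w
    return max(scores, key=scores.get) if scores else None

# Hypothetical labeled training data: captions paired with feature tags.
data = [
    ("small red square", ["red_square"]),
    ("small red square", ["red_square"]),
    ("large blue triangle", ["blue_triangle"]),
]
w = train(data)
```

With this toy model, prompting "red square" picks out the red-square feature, while "to the left of" matches nothing in the learned mapping and produces no output: the model is bounded by the vocabulary of its labels.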
tl;dr: It's cool technology, but math says that it's limited to compositions and modifications of its training data.