I recently started thinking about the sources of data that are currently fed to AI, and about the claim that "AI like GPT does not reproduce samples of the data it was trained on, but instead creates new content based on context."
So let’s run a thought experiment. Some person A scrapes websites like IMDb for movie reviews and later feeds them to his AI. Next, he defines what the AI should output: new reviews that carry the context of the movies it has learned about. Here, context means whether a review is positive or negative.
So when you ask this AI to generate a review of Scott Pilgrim vs The World, it produces text that is completely different from every review written on IMDb, but the context of those reviews is remembered. It’s important that this context comes only from the data sources.
So this AI is capable of generating reviews for every movie on IMDb, and the reviews are different each time. The catch is that you ask the AI to write a review based on some parameter, say the overall rating of the movie. The AI is aware of this rating, so it always generates positive reviews for Scott Pilgrim vs The World.
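To make the setup concrete, here is a minimal toy sketch in Python of what person A's system does. It is not how GPT or any real model works: the phrase pools stand in for a trained model, the corpus entries and ratings are invented placeholders rather than real IMDb data, and the names `CORPUS`, `PHRASES`, `learned_sentiment`, `generate_review` and the threshold of 6.0 are all hypothetical.

```python
import random

# Hypothetical scraped corpus: (movie, rating out of 10, review text).
# Every entry is an invented placeholder, not real IMDb data.
CORPUS = [
    ("Scott Pilgrim vs The World", 8, "Loved the visual style and the soundtrack."),
    ("Scott Pilgrim vs The World", 9, "A funny, inventive film with great fights."),
    ("Scott Pilgrim vs The World", 4, "Too hectic for my taste."),
]

# Phrase pools the toy generator draws from; a stand-in for a trained model.
PHRASES = {
    "positive": [
        "an energetic and clever movie",
        "a real treat from start to finish",
        "well worth watching twice",
    ],
    "negative": [
        "a tiring and unfocused movie",
        "a letdown from start to finish",
        "not worth a second look",
    ],
}


def learned_sentiment(movie, corpus=CORPUS, threshold=6.0):
    """Reduce all scraped reviews of a movie to one label via the mean rating."""
    ratings = [rating for title, rating, _ in corpus if title == movie]
    if not ratings:
        raise KeyError(movie)
    mean = sum(ratings) / len(ratings)
    return "positive" if mean >= threshold else "negative"


def generate_review(movie, rng=None):
    """Write a new review whose wording varies but whose sentiment is fixed."""
    rng = rng or random.Random()
    label = learned_sentiment(movie)
    first, second = rng.sample(PHRASES[label], 2)
    return f"{movie} is {first}, {second}."


if __name__ == "__main__":
    for _ in range(3):
        print(generate_review("Scott Pilgrim vs The World"))
```

Under these assumptions, none of the generated sentences appears in the corpus, yet every one of them is positive, because the only thing the generator keeps from the scraped reviews is their aggregate opinion.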
Is it right for person A to do this? The AI does not repeat content as "samples", but it does repeat the context of the data. It repeats the general opinion of people, which is the intellectual content of IMDb.