Attributive machine learning

There's a lot of buzz around things like DALL-E image generation, GitHub Copilot source code generation, and so on. I'm worried that they don't attribute their sources properly, which might lead to unintentional plagiarism (I call AI-assisted plagiarism plAIgiarism). I spent some time over the last month thinking about how to improve the situation. The main idea is that the machine learning model should keep track of what it is learning from in the training stage, so that it can attribute its influences properly in the generation stage.

The first machine learning model I tried was a Markov Chain for text generation, where the probabilities of the next output (character or word) depend on the recent history (previous characters or words). A Markov Chain can be generated from a text corpus by counting the occurrences of groups of characters or words. I used characters (in Haskell) or bytes (in Lua), which are not quite the same thing (some characters take up multiple bytes in the common UTF-8 encoding).
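
As a rough sketch (not the project's actual code), a character-level model of order n can be built as a map from each n-character history to a histogram of next characters, something like this in Haskell:

    import qualified Data.Map.Strict as M
    import Data.List (tails)

    -- map each n-character history to a histogram of next characters
    type Model = M.Map String (M.Map Char Int)

    buildModel :: Int -> String -> Model
    buildModel n corpus = M.fromListWith (M.unionWith (+))
      [ (history, M.singleton next 1)
      | chunk <- tails corpus
      , let (history, rest) = splitAt n chunk
      , next : _ <- [rest]    -- skip chunks shorter than n+1 characters
      ]

Generation then samples the next character from the histogram keyed by the last n characters of the output so far.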

To add attribution to the basic Markov Chain implementation, I added an extra histogram of how often each source file in the corpus contained each group of characters. Then when emitting characters, I include this attribution data alongside the output. I wrote a bit of JavaScript that collates the embedded histograms over the selected part of the output HTML, so you can explore the generated text's attribution in the browser.
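
One way to picture the extra bookkeeping (again a hypothetical sketch, assuming the corpus is a list of (filename, contents) pairs): alongside each history's next-character histogram, keep a second histogram counting which source files that history occurred in.

    import qualified Data.Map.Strict as M
    import Data.List (tails)

    type Hist k = M.Map k Int

    -- per history: next-character counts paired with per-source-file counts
    type AttributedModel = M.Map String (Hist Char, Hist FilePath)

    buildAttributed :: Int -> [(FilePath, String)] -> AttributedModel
    buildAttributed n corpus = M.fromListWith merge
      [ (history, (M.singleton next 1, M.singleton file 1))
      | (file, text) <- corpus
      , chunk <- tails text
      , let (history, rest) = splitAt n chunk
      , next : _ <- [rest]
      ]
      where
        merge (nexts1, srcs1) (nexts2, srcs2) =
          (M.unionWith (+) nexts1 nexts2, M.unionWith (+) srcs1 srcs2)

Each emitted character can then carry the source histogram of the history that produced it, and the in-browser script just sums those histograms over the selected span.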

Then I tried an artificial neural network for a classification task. After some false starts, I settled on classifying music by genre, using the rhythm fingerprints from my disco project. As the unit of attribution I decided to use the releasing label, mainly because attributing individual artists (let alone individual tracks) might use too many resources, mainly memory.

Adding attribution to the neural network turned out to be quite easy: in the weight update step in training, where w += dw, I now also do dwda += outer(dw, a), where a is a one-hot attribution vector. Then in the classification (feed forward) step, where O = w @ I, I do dOda = w @ dIda + I @ dwda. At the end I process the final dOda to give an indication of whether a particular source was more significant than expected in generating the output classification.
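
To make the shapes concrete, here is a hypothetical list-based sketch (not the project's code) in which dwda is stored as one matrix per attribution source, so the outer product with a one-hot vector reduces to adding dw into the matrix of the active source:

    type Vec = [Double]
    type Mat = [[Double]]    -- row-major: one row per output unit

    matVec :: Mat -> Vec -> Vec
    matVec m v = map (sum . zipWith (*) v) m

    matAdd :: Mat -> Mat -> Mat
    matAdd = zipWith (zipWith (+))

    -- training: w += dw and dwda += outer(dw, a), with a one-hot on source k
    updateStep :: Int -> Mat -> (Mat, [Mat]) -> (Mat, [Mat])
    updateStep k dw (w, dwda) =
      ( matAdd w dw
      , [ if j == k then matAdd dWj dw else dWj | (j, dWj) <- zip [0 ..] dwda ] )

    -- feed forward: O = w @ I and dOda = w @ dIda + I @ dwda, per source
    forwardStep :: Mat -> [Mat] -> (Vec, [Vec]) -> (Vec, [Vec])
    forwardStep w dwda (i, dida) =
      ( matVec w i
      , [ zipWith (+) (matVec w dIk) (matVec dWk i) | (dIk, dWk) <- zip dida dwda ] )

The final dOda then has one entry per attribution source for each output, which is what gets compared against the expected contribution of each source.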

The attributive Markov Chain was much more satisfying, because of the way it worked as a generator - you could see snippets taken verbatim from particular sources, annotated with their attribution. The genre classifier is much less immediate - most of the attributions differ by tiny amounts, and the fun part (is the classification more or less accurate than the genre in the track metadata?) is unrelated to attribution.

Project page with source code repository and example output: mathr.co.uk/attributive-machine-learning