2022-06-17
https://mathr.co.uk/attributive-machine-learning
Machine learning algorithms often put the AI into PLAIGIARISM.
This repository contains machine learning algorithms that properly attribute the sources used for the output.
Browse at https://code.mathr.co.uk/attributive-machine-learning.
Download with git:
git clone https://code.mathr.co.uk/attributive-machine-learning.git
Code is implemented in the Haskell, Lua and JavaScript programming languages.
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
The chain is built by analysing a source corpus to construct the probability tables for the next token in each state (determined by the previous tokens). Generation starts from a prompt; each following token is determined by weighted random choice given the current context.
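The construction and generation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names (buildTable, weightedChoice, generate) and the fixed context length `order` are assumptions made for the example.

```javascript
// Hypothetical sketch: character-level Markov chain with a fixed context length.
// `order` (number of previous tokens forming the state) is an assumption, must be >= 1.

// Count, for every context of `order` tokens, how often each next token follows it.
function buildTable(corpus, order) {
  const table = new Map(); // context string -> Map(next token -> count)
  const tokens = Array.from(corpus); // one token per Unicode code point
  for (let i = order; i < tokens.length; i++) {
    const context = tokens.slice(i - order, i).join("");
    if (!table.has(context)) table.set(context, new Map());
    const counts = table.get(context);
    counts.set(tokens[i], (counts.get(tokens[i]) || 0) + 1);
  }
  return table;
}

// Pick a token with probability proportional to its count.
// `random` must return a number in [0, 1), like Math.random.
function weightedChoice(counts, random) {
  let total = 0;
  for (const n of counts.values()) total += n;
  let r = random() * total;
  for (const [token, n] of counts) {
    r -= n;
    if (r < 0) return token;
  }
  return Array.from(counts.keys()).pop(); // guard against rounding error
}

// Extend the prompt by up to `length` tokens, stopping at an unseen context.
function generate(table, prompt, order, length, random = Math.random) {
  const out = Array.from(prompt);
  for (let i = 0; i < length; i++) {
    const counts = table.get(out.slice(-order).join(""));
    if (!counts) break;
    out.push(weightedChoice(counts, random));
  }
  return out.join("");
}
```

For example, generate(buildTable(text, 3), "pd.", 3, 200) would extend the prompt "pd." by up to 200 characters, assuming `text` holds the corpus and contains that context.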
The source corpus is made of many files. Attribution takes the form of listing how much each source file influenced the choice of each output token. Selecting text in the output HTML shows the corresponding attribution (requires JavaScript).
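One possible way to compute such an attribution is to keep the transition counts separately per source file, then report each file's share of the count for the transition that was chosen. The sketch below assumes this scheme; it is not necessarily how the repository's code measures influence, and the names buildAttributedTable and attribute are hypothetical.

```javascript
// Hypothetical sketch: per-source transition counts for attribution.
// `sources` maps an (illustrative) file name to that file's text.
function buildAttributedTable(sources, order) {
  // context -> Map(next token -> Map(source name -> count))
  const table = new Map();
  for (const [name, text] of Object.entries(sources)) {
    const tokens = Array.from(text);
    for (let i = order; i < tokens.length; i++) {
      const context = tokens.slice(i - order, i).join("");
      if (!table.has(context)) table.set(context, new Map());
      const nexts = table.get(context);
      if (!nexts.has(tokens[i])) nexts.set(tokens[i], new Map());
      const bySource = nexts.get(tokens[i]);
      bySource.set(name, (bySource.get(name) || 0) + 1);
    }
  }
  return table;
}

// Fraction of the observed (context -> token) transitions contributed by each source.
// Returns an empty object if the transition never occurred in the corpus.
function attribute(table, context, token) {
  const nexts = table.get(context);
  const bySource = nexts ? nexts.get(token) : undefined;
  if (!bySource) return {};
  let total = 0;
  for (const n of bySource.values()) total += n;
  const shares = {};
  for (const [name, n] of bySource) shares[name] = n / total;
  return shares;
}
```

With sources { "a.txt": "abab", "b.txt": "abac" } and order 1, the transition from "a" to "c" occurs only in b.txt, so attribute(table, "a", "c") gives b.txt a share of 1.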
This implementation uses tokens of a single character (Unicode code points for the Haskell version, bytes for the Lua version). The Lua version is much faster than the Haskell version and uses much less memory.
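The difference between the two token definitions only shows up for non-ASCII text, where one Unicode code point is encoded as several UTF-8 bytes. The following JavaScript sketch illustrates the two tokenisations; the helper names are hypothetical and this is not code from the repository.

```javascript
// Hypothetical helpers contrasting the two token definitions.

// One token per Unicode code point (as the Haskell version is described).
function codePointTokens(text) {
  return Array.from(text); // string iteration yields whole code points
}

// One token per UTF-8 byte (as the Lua version is described).
function byteTokens(text) {
  return Array.from(new TextEncoder().encode(text));
}

const sample = "na\u00efve \u03bb"; // "naïve λ"
console.log(codePointTokens(sample).length); // 7 code points
console.log(byteTokens(sample).length); // 9 bytes: ï and λ take 2 bytes each
```

With byte tokens, a multi-byte character spans several tokens, so a context of a given length covers fewer characters of non-ASCII text than the same length in code points.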
Example output from sources github.com/agraef/pd-lua/examples/*.pd_lua at commit f07953b4f7586d936e57a437ed9f66af8240a839, with prompt pd.Class:new: examples/pd-lua-examples.html
Lua version:
lua attributive-markov-chain.lua "prompt" source ... > output.html
Also works with luajit.
Haskell version:
runghc attributive-markov-chain.hs "prompt" source ... > output.html
You may want to cd to the directory containing your sources first, otherwise long path names may be included in the output (causing both size and privacy issues). The generated output.html expects the JavaScript file attributive-markov-chain.js to be adjacent to it.
Note: sources must be UTF-8 text; you may use iconv to convert from other encodings.
Attributive Machine Learning
Copyright (C) 2022 Claude Heiland-Allen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.