Attributive Machine Learning

Claude Heiland-Allen

2022-06-17

https://mathr.co.uk/attributive-machine-learning

Machine learning algorithms often put the AI into PLAIGIARISM.

This repository contains machine learning algorithms that properly attribute the sources used for the output.

Source Code Repository

Browse at https://code.mathr.co.uk/attributive-machine-learning.

Download with git:

git clone https://code.mathr.co.uk/attributive-machine-learning.git

Code is implemented in the Haskell, Lua and JavaScript programming languages.

Attributive Markov Chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

https://en.wikipedia.org/wiki/Markov_chain

The chain is constructed by analysing a source corpus to build the probability tables for the next token in each state (determined by the previous tokens). Generation starts from a prompt; each following token is determined by weighted random choice given the current context.
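The construction and generation steps can be sketched in JavaScript as follows. This is a minimal illustration, not the repository's code: the function names `buildTable`, `weightedChoice` and `generate` are hypothetical, and the fixed context length `order` (assumed to be at least 1) is an assumption, since the text above does not say how many previous tokens form the state.

```javascript
// Build a table: context string -> Map(next token -> occurrence count).
// `order` is the number of previous tokens that form the state (assumed >= 1).
function buildTable(corpus, order) {
  const table = new Map();
  const tokens = Array.from(corpus); // one token per Unicode code point
  for (let i = order; i < tokens.length; i++) {
    const context = tokens.slice(i - order, i).join("");
    if (!table.has(context)) table.set(context, new Map());
    const counts = table.get(context);
    counts.set(tokens[i], (counts.get(tokens[i]) || 0) + 1);
  }
  return table;
}

// Pick a token with probability proportional to its count.
// `random` must return a number in [0, 1), like Math.random.
function weightedChoice(counts, random) {
  let total = 0;
  for (const c of counts.values()) total += c;
  let r = random() * total;
  let last;
  for (const [token, c] of counts) {
    last = token;
    r -= c;
    if (r < 0) return token;
  }
  return last; // only reached through floating point rounding
}

// Extend the prompt by up to `length` tokens; stop early at an unseen context.
function generate(table, prompt, order, length, random = Math.random) {
  const out = Array.from(prompt);
  for (let n = 0; n < length; n++) {
    const counts = table.get(out.slice(-order).join(""));
    if (!counts) break;
    out.push(weightedChoice(counts, random));
  }
  return out.join("");
}
```

In this sketch a prompt shorter than `order`, or one ending in a context that never occurs in the corpus, produces no further output; a real implementation might fall back to a shorter context instead.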

The source corpus is made of many files. Attribution takes the form of listing how much each source file influenced the choice of each output token. Selecting text in the output HTML shows the corresponding attribution (requires JavaScript).
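One way to picture the attribution is to keep the counts separate per source file, so that each chosen token can be traced back to the files that supplied it. The JavaScript sketch below is an assumption about how such a measure could work, not a description of the repository's method: it defines influence as each file's share of the occurrences of a (context, token) pair, and the names `buildAttributedTable` and `attribute`, the context length `order`, and the example file names are all hypothetical.

```javascript
// Build a table: context -> Map(next token -> Map(source name -> count)).
// `sources` is an object mapping file names to their text contents.
function buildAttributedTable(sources, order) {
  const table = new Map();
  for (const [name, text] of Object.entries(sources)) {
    const tokens = Array.from(text); // one token per Unicode code point
    for (let i = order; i < tokens.length; i++) {
      const context = tokens.slice(i - order, i).join("");
      if (!table.has(context)) table.set(context, new Map());
      const nexts = table.get(context);
      if (!nexts.has(tokens[i])) nexts.set(tokens[i], new Map());
      const bySource = nexts.get(tokens[i]);
      bySource.set(name, (bySource.get(name) || 0) + 1);
    }
  }
  return table;
}

// Return each source's share (0..1) of the occurrences of `token`
// after `context`, or an empty object if the pair was never seen.
function attribute(table, context, token) {
  const nexts = table.get(context);
  const bySource = nexts ? nexts.get(token) : undefined;
  if (!bySource) return {};
  let total = 0;
  for (const c of bySource.values()) total += c;
  const shares = {};
  for (const [name, c] of bySource) shares[name] = c / total;
  return shares;
}

// Usage: two tiny hypothetical sources.
const exampleTable = buildAttributedTable({ "a.txt": "abab", "b.txt": "abac" }, 1);
const exampleShares = attribute(exampleTable, "a", "b");
```

Here `exampleShares` credits `a.txt` with two thirds of the choice of "b" after "a" and `b.txt` with one third, because the pair occurs twice in the first file and once in the second.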

This implementation uses tokens of a single character (Unicode code points for the Haskell version, bytes for the Lua version). The Lua version is much faster than the Haskell version and uses much less memory.
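The difference between the two tokenisations only shows up for non-ASCII text. The JavaScript sketch below is for illustration only (the repository's tokenisers are written in Haskell and Lua, and the function names here are hypothetical): it splits the same string once by Unicode code point and once by UTF-8 byte.

```javascript
// One token per Unicode code point (as described for the Haskell version).
function tokensByCodePoint(text) {
  return Array.from(text); // string iteration yields whole code points
}

// One token per UTF-8 byte (as described for the Lua version).
function tokensByByte(text) {
  return Array.from(new TextEncoder().encode(text));
}

// "na\u00EFve \u2192" is "naïve →": U+00EF takes 2 bytes in UTF-8,
// U+2192 takes 3 bytes, and the remaining five characters take 1 byte each.
const sample = "na\u00EFve \u2192";
const codePointCount = tokensByCodePoint(sample).length; // 7
const byteCount = tokensByByte(sample).length;           // 10
```

With byte tokens, a context of a given length covers fewer visible characters whenever the text contains multi-byte sequences.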

Usage

Lua version:

lua attributive-markov-chain.lua "prompt" source ... > output.html

Also works with luajit.

Haskell version:

runghc attributive-markov-chain.hs "prompt" source ... > output.html

You may want to cd to the directory containing your sources first, otherwise long path names may be included in the output (causing both size and privacy issues). The generated output.html expects the JavaScript file attributive-markov-chain.js to be adjacent to it.

Note: sources must be UTF-8 text; you can use iconv to convert other encodings.

Attributive Machine Learning

Copyright (C) 2022 Claude Heiland-Allen

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.


https://mathr.co.uk