Attributive Machine Learning

Claude Heiland-Allen

2023-01-06

https://mathr.co.uk/attributive-machine-learning

Machine learning algorithms often put the AI into PLAIGIARISM.

This repository contains machine learning algorithms that properly attribute the sources used for the output.

Source Code Repository

Browse at https://code.mathr.co.uk/attributive-machine-learning.

Download with git:

git clone https://code.mathr.co.uk/attributive-machine-learning.git

Code is implemented in the C, Haskell, JavaScript, Lua, and Python programming languages, with support code in Bash and Make.

Attributive Markov Chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

https://en.wikipedia.org/wiki/Markov_chain

The chain is constructed by analysing a source corpus to build probability tables for the next token in each state, where the state is determined by the previous tokens. Generation starts from a prompt; each following token is then determined by weighted random choice given the current context.

The source corpus is made of many files. Attribution takes the form of listing how much each source file influenced the choice of each output token. Selecting text in the output HTML shows the corresponding attribution (requires JavaScript).
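As a rough illustration of the idea (not the code in this repository), the following Python sketch builds a character-level chain that remembers which source contributed each observation, then generates text while recording a per-token attribution. The function names, the in-memory dictionary layout, and the definition of "influence" as each source's share of the observations behind the chosen token are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_chain(sources, order):
    """Count next tokens per context, keeping the counts separate per source.

    `sources` maps a source name to its text. The result has the shape
    {context: {token: {source name: count}}}. Assumes order >= 1.
    """
    chain = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for name, text in sources.items():
        for i in range(len(text) - order):
            context = text[i:i + order]
            token = text[i + order]
            chain[context][token][name] += 1
    return chain


def generate(chain, prompt, length, rng=random):
    """Extend `prompt` by up to `length` tokens.

    Returns the text and, for each generated token, a dict giving each
    source's share of the observations of that token in that context
    (an assumed definition of attribution, for illustration only).
    """
    order = len(prompt)  # the prompt length must equal the build order
    output = prompt
    attribution = []
    for _ in range(length):
        choices = chain.get(output[-order:])
        if not choices:
            break  # dead end: this context was never followed by anything
        tokens = sorted(choices)
        weights = [sum(choices[t].values()) for t in tokens]
        token = rng.choices(tokens, weights=weights)[0]
        total = sum(choices[token].values())
        attribution.append(
            {name: count / total for name, count in choices[token].items()}
        )
        output += token
    return output, attribution


if __name__ == "__main__":
    corpus = {"a.txt": "abcabcabd", "b.txt": "abcabc"}
    text, shares = generate(build_chain(corpus, 2), "ab", 10)
    print(text)
    print(shares)
```

Keeping the counts per source, rather than summing them when the chain is built, is what makes the attribution possible at generation time.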

This implementation uses tokens of a single character (Unicode code points for the Haskell version, bytes for the Lua version). The Lua version is much faster than the Haskell version and uses much less memory.
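The difference between the two token types can be seen in a few lines of Python. This is only an illustration of code points versus bytes, assuming UTF-8 input; it is not taken from either implementation.

```python
# "naïve", written with an escape so the ï is a single code point (U+00EF).
text = "na\u00efve"

code_points = list(text)                  # one token per Unicode code point
byte_tokens = list(text.encode("utf-8"))  # one token per byte

print(len(code_points))  # 5
print(len(byte_tokens))  # 6, because U+00EF takes two bytes in UTF-8
```

One consequence is that a context of a given order covers fewer characters when counted in bytes than in code points, wherever the text contains non-ASCII characters.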

Examples

Usage

Lua version:

First generate chain.lua from sources, where order is an integer (e.g. 8):

lua attributive-markov-chain.lua build chain.lua order source ...

Then optionally censor it (modifies in place) to forbid certain words or phrases (censor order can be smaller than build order):

lua attributive-markov-chain.lua censor chain.lua order forbidden ...

And prune it (modifies in place) to remove dead ends:

lua attributive-markov-chain.lua prune chain.lua

Finally generate text (the prompt should be the same length as the order used to build the chain):

lua attributive-markov-chain.lua generate chain.lua prompt > output.html

Also works with luajit.
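The censor and prune steps can be pictured on a simplified in-memory chain of the form {context: {token: count}}. The Python sketch below is a guess at the general idea, not a description of what the Lua script does: the function names and data layout are hypothetical, it ignores the separate censor order, and it can only catch forbidden phrases no longer than the context plus one token.

```python
def censor(chain, forbidden):
    """Delete, in place, every transition whose token would complete a forbidden phrase."""
    for context, choices in chain.items():
        for token in list(choices):
            produced = context + token
            if any(produced.endswith(phrase) for phrase in forbidden):
                del choices[token]


def prune(chain):
    """Delete, in place, transitions that can only lead to a dead end.

    A dead end is a context with no remaining next tokens. Removing one can
    create another, so the scan repeats until nothing changes.
    Assumes every context has the same length (the order), at least 1.
    """
    changed = True
    while changed:
        changed = False
        for context in list(chain):
            choices = chain[context]
            for token in list(choices):
                next_context = (context + token)[1:]
                if not chain.get(next_context):
                    del choices[token]
                    changed = True
            if not choices:
                del chain[context]
                changed = True


if __name__ == "__main__":
    chain = {"ab": {"c": 1, "a": 1}, "bc": {"a": 1}, "ca": {"b": 1}, "ba": {"z": 1}}
    prune(chain)
    print(chain)
```

Pruning after censoring makes sense in this picture because censoring can itself create new dead ends.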

Haskell version:

make attributive-markov-chain
./attributive-markov-chain "prompt" source ... > output.html

You may want to cd to the directory containing your sources first, otherwise long path names may be included in the output (causing both size and privacy issues). The generated output.html expects the JavaScript file attributive-markov-chain.js to be adjacent to it.

Note: sources must be UTF-8 text; you can use iconv to convert from other encodings.
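If you are unsure which sources need converting, a check along the following lines may help. It is a small Python sketch with hypothetical helper names, not part of this repository, and it assumes you already know the original encoding of any file that fails the check.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode `data` from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    latin1 = "caf\u00e9".encode("latin-1")  # "café" as Latin-1 bytes
    print(is_utf8(latin1))                      # False
    print(is_utf8(to_utf8(latin1, "latin-1")))  # True
```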

Attributive Neural Network

Artificial neural networks are computing systems inspired by biological brains. A neural network is based on a collection of connected neurons, each of which can transmit a signal to other neurons. The signal at a connection is a real number, and the output of each neuron is computed by some non-linear function of the weighted sum of its inputs. The weights, which are adjusted during learning, increase or decrease the strength of the signals arriving at the neurons. Typically, neurons are aggregated into layers. Signals travel from the input layer, to the output layer, possibly through multiple hidden layers in between.

– adapted from https://en.wikipedia.org/wiki/Artificial_neural_network
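The description above can be made concrete with a minimal forward pass in plain Python. This sketch only illustrates the general definition: tanh is an arbitrary choice of non-linear function, the helper names are made up, and it does not show how the attributive network in this repository is structured, trained, or attributed.

```python
import math


def layer(inputs, weights):
    """Compute one layer: each neuron applies tanh to the weighted sum of its inputs.

    `weights` has one row per neuron, with one weight per input.
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)))
        for row in weights
    ]


def forward(inputs, layers):
    """Pass a signal from the input layer through each layer in turn."""
    signal = inputs
    for weights in layers:
        signal = layer(signal, weights)
    return signal


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [0.0, 1.0]]  # two hidden neurons
    output = [[1.0, 1.0]]              # one output neuron
    print(forward([1.0, 2.0], [hidden, output]))
```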

Examples

Usage

Download the data set I created from: data/genres.data.gz

Python version:

gunzip -k data/genres.data.gz
python3 attributive-neural-network.py --train data/genres.data genres/
python3 attributive-neural-network.py --classify genres/network.npz data/genres.data > genres.html

To generate your own data set, you need rhythm-analysis from Disco (browse at https://code.mathr.co.uk/disco), which must be findable via the PATH environment variable:

git clone https://code.mathr.co.uk/disco.git
make -C disco rhythm-analysis

You also need a collection of music in MP3 format. Tracks should be stored with paths like label/release/track.mp3. The tracks of each release are decoded to perceptually normalized WAV with ffmpeg, and analysed with rhythm-analysis. The results are concatenated into output files per genre, which are then randomly sampled into a smaller corpus. This is all orchestrated by data/genres.sh.
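To check that a collection matches the expected label/release/track.mp3 layout before running the script, something like the following Python sketch could be used. The function name is hypothetical and the code is not part of data/genres.sh; it only groups path strings by their label and release directories and ignores anything that does not fit the pattern.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_release(paths):
    """Group track paths shaped like label/release/track.mp3 by (label, release)."""
    releases = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[-1].lower().endswith(".mp3"):
            label, release = parts[-3], parts[-2]
            releases[(label, release)].append(path)
    return dict(releases)


if __name__ == "__main__":
    tracks = ["label/first/01.mp3", "label/first/02.mp3", "label/second/01.mp3"]
    for (label, release), files in group_by_release(tracks).items():
        print(label, release, len(files))
```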

If you don’t have a collection of music, you can download some from the Internet Archive netlabels collections using unarchive_collection from Unarchive (homepage at https://mathr.co.uk/unarchive).

Attributive Machine Learning

Copyright (C) 2022,2023 Claude Heiland-Allen

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.


https://mathr.co.uk