A command-line tool for masking authorship of text, by changing the writing style with a Large Language Model.
The main use cases of masking an author's writing style are:
- anonymizing the author of a text
- protecting the identity of whistleblowers and activists
- see more use cases at Adversarial Stylometry
Despite it's pre-production status, this library has several known limitations:
- Only a limited number of transformations are implemented (see
transform.py
). - Long chains of transformations have observed to make the LLM output artifacts.
- Sensitive content can trigger an LLMs censoring, and thus ruin the output.
In this case it is advised to try uncensored LLMs, e.g. of thewizard-vicuna-uncensored
type. - Currently, unique names of places or persons are not removed/anonymized.
- Locally serve a Large Language Model server with ollama:
$ ollama serve
- Make sure a potent model is served, e.g. a version of
nous-hermes2
:
$ ollama run nous-hermes2:10.7b-solar-q6_K
- Mask your writing style by transforming it into a different one:
$ llmask -v -i "this was a triumph. i'm making a note here: huge success."
User-provided input:
> this was a triumph. i'm making a note here: huge success.
Result after applying transformation 'thesaurus':
> This was an astonishing achievement. I'll jot down: extraordinary victory.
Result after applying transformation 'simplify':
> This was a great success. I'll write down: wonderful win.
For larger-scale text work, the text input and output can also be piped:
$ cat input.txt | llmask > output.txt
LLMs can run on ordinary CPUs, e.g. with ollama
.
However, GPU acceleration greatly accelerates execution speed.
Please note that this project is tested most thoroughly on Apple Silicon hardware.
This command line tool can be installed with: pipx install llmask
$ llmask --help
Usage: llmask [OPTIONS]
Transform input text with chained transformations by a Large Language Model.
Options:
-t, --transformations TEXT Sequence of transformations to apply in order,
e.g. 'tsp' for the steps 'thesaurus ->
simplify -> persona', where 't' applies
thesaurus, 's' simplifies, and 'p' imitates a
persona. [default: ts]
-i, --input TEXT Input text that will be transformed.
-p, --persona TEXT Name of persona whose writing style to
imitate. [default: Ernest Hemingway]
-m, --model TEXT Name of model to use (as known to model
server). [default: nous-
hermes2:10.7b-solar-q6_K]
-u, --url TEXT URL of Open AI compatible model API.
[default: http://localhost:11434/v1]
-v, --verbose Verbosity level. At default, only the final
output is returned. [default: 0]
-r, --randomness FLOAT RANGE Higher values make the output more
random.Parameter value is passes as 'sampling
temperature' to language model. [default:
0.5; 0.0<=x<=2.0]
-s, --seed INTEGER Repeated requests with the same `seed` and
parameters should return the same result.
[default: 42]
-h, --help Show this message and exit.
The development environment can be installed via: poetry install
.
- support transformations from and into text files
- measure success of obfuscation
- measure success of anonymzation with de-anonymization tools (e.g.
faststylometry
) - check with GPTZero if suspected author is an LLM
- measure success of anonymzation with de-anonymization tools (e.g.
- re-introduce test suite