Librarian is an easy-to-use viewer for scanned home documents
Features:
- support for PDFs, JPGs and PNGs
- document backups to a mounted volume (or a NAS via NFS!)
- search engine for scanned text (OCR via Google Cloud Vision)
- tagging, folders, organize how you want
Check out a demo at https://librarian-demo.montanadev.com
$ docker run -p 8000:8000 \
-e DATABASE_URL=postgresql://user:password@address/database \
ghcr.io/montanadev/librarian:main
apiVersion: v1
kind: ConfigMap
metadata:
  name: librarian-config
  labels:
    app: librarian
data:
  DATABASE_URL: 'postgresql://user:password@address/database'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: librarian
  labels:
    app: librarian
spec:
  selector:
    matchLabels:
      app: librarian
  replicas: 1
  template:
    metadata:
      labels:
        app: librarian
    spec:
      containers:
        - name: librarian
          image: ghcr.io/montanadev/librarian:main
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: librarian-config
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "2000m"
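To try the manifest on a cluster, something like the following should work. This is only a sketch: it assumes the YAML above is saved as `librarian.yaml` (a filename picked for illustration) and that `kubectl` already points at your cluster. The function names are made up for this example. The manifest defines no Service, so a port-forward is just one way to reach the pod.

```shell
# Assumption: the manifest above was saved as librarian.yaml (hypothetical filename).
MANIFEST="${MANIFEST:-librarian.yaml}"

deploy_librarian() {
  # Create or update the ConfigMap and Deployment, then wait for the rollout.
  kubectl apply -f "$MANIFEST" &&
    kubectl rollout status deployment/librarian
}

forward_librarian() {
  # Forward local port 8000 to containerPort 8000 of the Deployment's pod.
  kubectl port-forward deployment/librarian 8000:8000
}

# Usage:
#   deploy_librarian && forward_librarian
#   then open http://localhost:8000
```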
Librarian's OCR is performed by Google Cloud Vision (GCV), so it can't detect text without credentials. To get an API key:
- Go to https://cloud.google.com/docs/authentication/getting-started
- Follow the `Creating a service account > Cloud Console` instructions. Create a new project, if necessary.
- Visit the API library page and search for `Cloud Vision API`
- Enable the `Cloud Vision API` for the service account you just created
- Go back to Librarian, click `Settings`, and paste the service account JSON key into the `Cloud Vision API Key` box
At the time of writing, the first 1,000 pages each month are free, and each additional 1,000 pages costs $1.60.
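For a rough idea of what a month of scanning might cost, the figures above can be turned into a small calculation. This is a sketch, not Google's billing logic: it assumes pages beyond the free tier are billed pro rata, and `estimate_ocr_cost` is a name invented for this example.

```shell
# Estimate the monthly OCR bill in dollars from a page count, using the
# figures quoted above: first 1000 pages free, then $1.60 per 1000 pages.
# Assumption: pages beyond the free tier are billed pro rata.
estimate_ocr_cost() {
  pages=$1
  free_pages=1000
  rate=16   # hundredths of a cent per page ($1.60 / 1000 pages)

  billable=$(( pages > free_pages ? pages - free_pages : 0 ))
  cents=$(( (billable * rate + 50) / 100 ))   # round to the nearest cent
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

estimate_ocr_cost 3500   # 2500 billable pages, prints 4.00
```

Integer arithmetic keeps the function portable across shells that lack floating point.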
The only required environment variable is `DATABASE_URL`, which should point at a working Postgres instance. The rest are optional.
Name | Default | Example | Description |
---|---|---|---|
DATABASE_URL | | postgresql://username:password@127.0.0.1/librarian | Database to store document metadata |
ALLOWED_HOSTS | * | localhost,my-site.com | Django setting |
SECRET_KEY | | | Django setting |
ALLOW_REUPLOAD | false | | Set to true to allow the same document to be reuploaded as a unique document |
DISABLE_ANNOTATION | false | | Set to true if you don't like OCR and document search |
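
One way to keep these settings together is to write them to an env file and pass it to the container with Docker's `--env-file` flag. The sketch below makes a few assumptions: the file name `librarian.env` is arbitrary, the values are placeholders taken from the table, and generating a random `SECRET_KEY` is a common Django practice rather than something the project requires.

```shell
# Generate 32 random bytes as a 64-character hex string for SECRET_KEY.
secret_key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Write the settings to an env file (librarian.env is a hypothetical name).
# The values are placeholders; replace the database URL with your own.
cat > librarian.env <<EOF
DATABASE_URL=postgresql://username:password@127.0.0.1/librarian
ALLOWED_HOSTS=localhost,my-site.com
SECRET_KEY=${secret_key}
ALLOW_REUPLOAD=false
DISABLE_ANNOTATION=false
EOF

# Then start the container with:
#   docker run -p 8000:8000 --env-file librarian.env ghcr.io/montanadev/librarian:main
```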
It would be a real bad idea to put Librarian in a public environment.
Librarian doesn't (currently) require logins or block anonymous access. I also haven't made XSS prevention or file type enforcement a priority.
Tools used to build Librarian
You can install some of these on macOS via Homebrew
$ brew install node python@3.9 poetry libnfs imagemagick postgres openssl
For the backend
# on Macs with an M1/M2 chip, you may encounter grpcio issues; use the following command to install
$ LDFLAGS="-L/opt/homebrew/opt/openssl@3/lib -L/opt/homebrew/opt/libnfs/lib ${LDFLAGS}" \
CPPFLAGS="-I/opt/homebrew/opt/libnfs/include -I/opt/homebrew/opt/openssl@3/include ${CPPFLAGS}" \
GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1 \
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
poetry install
# the LDFLAGS/CPPFLAGS may need to be adjusted based on where brew installs libnfs and openssl
# ex.
# LDFLAGS="-L/opt/homebrew/opt/openssl@3/lib -L/usr/local/Cellar/libnfs/5.0.2/lib" \
# CPPFLAGS="-I/usr/local/Cellar/libnfs/5.0.2/include -I/opt/homebrew/opt/openssl@3/include ${CPPFLAGS}" \
# GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1 \
# GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
# poetry install
# for everyone else
$ poetry install
$ createdb librarian
# run database migrations (if postgres isn't running, start it with `brew services start postgres`)
$ make migrate
# start the server
$ make run
For the frontend
$ cd client
$ npm i
$ npm start
See Makefile for additional commands.
Test uploads without drag-n-dropping on the frontend
$ curl 'http://0.0.0.0:8000/api/documents/home-title.pdf' -H 'Content-Type: application/pdf' --data-binary '@home-title.pdf'
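To upload a whole folder of scans the same way, the curl call above can be wrapped in a small loop. This sketch assumes the server is running locally on port 8000, that every file is a PDF, and that filenames need no URL encoding. `upload_document`, `upload_directory`, and `LIBRARIAN_URL` are names made up for this example.

```shell
# Base URL of a locally running Librarian (assumption: same address as the curl example).
LIBRARIAN_URL="${LIBRARIAN_URL:-http://0.0.0.0:8000}"

# Upload one PDF using the same request as the curl example above.
# Assumes the filename contains no characters that need URL encoding.
upload_document() {
  file=$1
  name=$(basename "$file")
  curl --fail --silent --show-error \
    "$LIBRARIAN_URL/api/documents/$name" \
    -H 'Content-Type: application/pdf' \
    --data-binary "@$file"
}

# Upload every PDF in a directory, stopping at the first failure.
upload_directory() {
  dir=$1
  for file in "$dir"/*.pdf; do
    [ -e "$file" ] || continue   # skip when the glob matches nothing
    echo "uploading $file"
    upload_document "$file" || return 1
  done
}

# Usage:
#   upload_directory ./scans
```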
See roadmap.md