pdfLibrary is an API that provides basic CRUD operations on PDF documents, with features like text extraction and searching.
- Clone the project:
git clone https://github.com/aigeoo/pdfLibrary.git
- Run
cd pdfLibrary
- Create a .env file from the .env.example file and adjust the database and port parameters
- Install the dependencies:
npm install
- Build the application:
npm run build
- Run
npm start
- Browse the application on http://127.0.0.1:3000
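The .env step above can be scripted so that repeated runs never overwrite local edits. The following is a minimal sketch and not part of the project: `init_env` is a hypothetical helper name, and the only assumption about the repository is that it ships a `.env.example` file, as the steps state.

```shell
# Create .env from .env.example inside the given directory (default: current directory).
# An existing .env is left untouched so manual database/port settings survive.
init_env() {
  env_dir="${1:-.}"
  if [ -f "$env_dir/.env" ]; then
    echo "exists"
  elif [ -f "$env_dir/.env.example" ]; then
    cp "$env_dir/.env.example" "$env_dir/.env" && echo "created"
  else
    echo "missing .env.example" >&2
    return 1
  fi
}
```

Running `init_env pdfLibrary` after cloning prints `created` the first time and `exists` afterwards; the database and port values still have to be edited by hand.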
- Alternatively, run the application with Docker. Make sure you have Docker installed; if it is not, follow the official Docker installation instructions.
- Clone the project:
git clone https://github.com/aigeoo/pdfLibrary.git
- Run
cd pdfLibrary
- Create a .env file from the .env.example file and adjust the database and port parameters
- Build the Docker image:
docker build -t <image-name> .
- Run the Docker container:
docker run -p 3000:3000 <image-name>
- Browse the application on http://127.0.0.1:3000
/auth/register: Register a new user:
curl -X POST \
-H "Content-Type: application/json" \
-d '{"username": "<USERNAME>", "password": "<PASSWORD>"}' \
http://localhost:3000/api/v1/auth/register
/auth/login: Get an authorization token:
curl -X POST \
-H "Content-Type: application/json" \
-d '{"username": "<USERNAME>", "password": "<PASSWORD>"}' \
http://localhost:3000/api/v1/auth/login
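Every request after login needs the token, so it can help to capture it in a shell variable. The sketch below assumes the login response is JSON with a top-level string field named `token`; that shape is not confirmed by this README, so adjust the field name to whatever the server actually returns. `extract_token`, `login`, and `BASE_URL` are hypothetical names introduced for illustration.

```shell
BASE_URL="${BASE_URL:-http://localhost:3000/api/v1}"

# Print the value of a "token" string field from a JSON body read on stdin.
# ASSUMPTION: the response looks like {"token": "..."}; prints nothing if the field is absent.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Log in and print the token. Arguments: username, password.
# Values are pasted into the JSON body unescaped, so avoid quotes and backslashes in them.
login() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "{\"username\": \"$1\", \"password\": \"$2\"}" \
    "$BASE_URL/auth/login" | extract_token
}

# Usage (requires a running server):
#   TOKEN=$(login "<USERNAME>" "<PASSWORD>")
#   curl -H "Authorization: Basic $TOKEN" "$BASE_URL/data/all"
```

The `sed` expression is a deliberately small stand-in for a JSON parser; a tool such as `jq` is more robust if it is available.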
/data/create: Create a new data record by uploading a PDF file:
curl -X POST \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Basic <token>" \
-F "file=@test.pdf" \
http://localhost:3000/api/v1/data/create
/data/all: Retrieve all of the registered files in the database:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
http://localhost:3000/api/v1/data/all
/data/download/<id>: Retrieve a stored PDF given its ID:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
http://localhost:3000/api/v1/data/download/<id>
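The download command above writes the response to standard output, which is rarely what you want for a binary PDF. A minimal sketch of saving it to a file follows; `download_url`, `download_pdf`, `BASE_URL`, and `TOKEN` are hypothetical names, and the code assumes the endpoint returns the raw PDF bytes.

```shell
BASE_URL="${BASE_URL:-http://localhost:3000/api/v1}"

# Build the download URL for a document ID.
download_url() {
  printf '%s/data/download/%s\n' "$BASE_URL" "$1"
}

# Save document $1 to file $2 (default: <id>.pdf), authenticating with $TOKEN.
# --fail makes curl exit non-zero on an HTTP error instead of writing the error body to the file.
download_pdf() {
  curl --fail -s -X GET \
    -H "Authorization: Basic $TOKEN" \
    -o "${2:-$1.pdf}" \
    "$(download_url "$1")"
}

# Usage (requires a running server and a token in $TOKEN):
#   download_pdf "<id>" report.pdf
```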
/data/delete/<id>: Delete a PDF file and all its related data:
curl -X DELETE \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
http://localhost:3000/api/v1/data/delete/<id>
/search/sentences/<id>: Return all the parsed sentences for a given PDF ID:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
http://localhost:3000/api/v1/data/search/sentences/<id>
/search/word/<word>: Search for a keyword across all stored PDFs:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
http://localhost:3000/api/v1/data/search/word/<word>
/search/topwords/<id>: Retrieve the top 5 most frequent words in a PDF:
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
http://localhost:3000/api/v1/data/search/topwords/<id>
/search/wordcount: Count the occurrences of a word in a PDF:
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
-d '{"id": "<ID>", "keyword": "<KEYWORD>"}' \
http://localhost:3000/api/v1/data/search/wordcount
/search/image: Retrieve an image for a given page of a PDF:
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Basic <token>" \
-d '{"id": "<ID>", "page": "<PAGE_NUMBER>"}' \
http://localhost:3000/api/v1/data/search/image