Paperbaum is a decentralized academic paper publishing and verification system built on a custom Substrate-based parachain. It addresses issues of authorship verification, restricted access, and inefficient paper linking in academic publishing.
- Substrate Parachain: Custom runtime that stores paper metadata in a Merkle tree and verifies it.
- IPFS Integration: Decentralized storage for full paper content.
- Vector Similarity Engine: NLP-based system for semantic paper linking.
The core of Paperbaum is a custom Substrate parachain that handles academic paper management and verification. Its custom pallet uses a Merkle tree to natively link papers together and provides functionality for:
- Managing a Merkle tree of paper hashes
- Verifying Merkle proofs
- Storing and retrieving paper metadata
- Enforcing size limits on various paper attributes
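To illustrate the first two items, here is a minimal sketch of building a Merkle tree over paper hashes and verifying an inclusion proof. It is written in JavaScript for readability and is not the pallet's code: the SHA-256 hasher, the rule of pairing an odd node with itself, and the names `buildLevels`, `makeProof`, and `verifyProof` are all assumptions made for this example.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 is an assumption for illustration; the pallet's hasher may differ.
const sha256 = (data) => createHash("sha256").update(data).digest();

// Hash two child nodes into their parent node.
const hashPair = (left, right) => sha256(Buffer.concat([left, right]));

// Build every level of the tree, from the leaves (level 0) up to the root.
// An odd node at the end of a level is paired with itself.
function buildLevels(leaves) {
  if (leaves.length === 0) throw new Error("at least one leaf is required");
  const levels = [leaves];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(hashPair(prev[i], prev[i + 1] ?? prev[i]));
    }
    levels.push(next);
  }
  return levels;
}

// Collect the sibling hashes needed to recompute the root from leaf `index`.
function makeProof(levels, index) {
  const proof = [];
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth];
    const siblingIndex = index ^ 1; // neighbour within the same pair
    const sibling = level[siblingIndex] ?? level[index];
    proof.push({ sibling, siblingIsLeft: siblingIndex < index });
    index = Math.floor(index / 2);
  }
  return proof;
}

// Recompute the root from a leaf and its proof, then compare.
function verifyProof(leaf, proof, root) {
  let hash = leaf;
  for (const { sibling, siblingIsLeft } of proof) {
    hash = siblingIsLeft ? hashPair(sibling, hash) : hashPair(hash, sibling);
  }
  return hash.equals(root);
}

// Usage with three made-up paper identifiers.
const leaves = ["paper-a", "paper-b", "paper-c"].map((p) => sha256(p));
const levels = buildLevels(leaves);
const root = levels[levels.length - 1][0];
console.log(verifyProof(leaves[2], makeProof(levels, 2), root));
```

A proof holds one sibling hash per tree level, so checking that a paper is included needs only the leaf, the proof, and the root rather than the full set of papers.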
Paperbaum leverages the InterPlanetary File System (IPFS) for decentralized storage of full paper content. This integration ensures that papers are stored in a distributed, content-addressed manner, enhancing accessibility and permanence.
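As a rough illustration of what "content-addressed" means, the sketch below keys each stored paper by a hash of its bytes. It is a toy in-memory store, not an IPFS client: real IPFS identifiers (CIDs) use multihash and multibase encoding, and the `ContentStore` class and its plain hex SHA-256 addresses are assumptions made for this example.

```javascript
const { createHash } = require("node:crypto");

// Simplified stand-in for a CID: a hex SHA-256 digest of the content.
const addressOf = (bytes) => createHash("sha256").update(bytes).digest("hex");

// Toy content-addressed store: the key is derived from the bytes themselves.
class ContentStore {
  constructor() {
    this.blocks = new Map();
  }

  // Store the content and return the address it can be fetched by.
  put(bytes) {
    const address = addressOf(bytes);
    this.blocks.set(address, Buffer.from(bytes));
    return address;
  }

  // Fetch content and re-hash it, so substituted content is detected.
  get(address) {
    const bytes = this.blocks.get(address);
    if (bytes === undefined) return undefined;
    if (addressOf(bytes) !== address) {
      throw new Error("content does not match its address");
    }
    return bytes;
  }
}

const store = new ContentStore();
const address = store.put("full text of a paper");
console.log(address, store.get(address).toString());
```

Because the address depends only on the content, uploading the same paper twice yields the same address, and any change to the bytes yields a different one.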
Paperbaum implements a vector similarity engine for semantic paper linking. This system uses OpenAI's text embedding model to generate vector representations of papers, enabling efficient similarity searches. The `generateEmbedding` function creates a vector representation of text, while `cosineSimilarity` computes the similarity between two vectors.
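For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows one way such a function could be written. It is not necessarily the project's implementation, and the `rankBySimilarity` helper, the shape of the paper records, and the choice to return 0 for zero-length vectors are assumptions made for this example.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), a value between -1 and 1.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; returning 0 is a convention of this sketch.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank stored papers by similarity to a query vector.
function rankBySimilarity(queryVector, papers) {
  return papers
    .map((paper) => ({
      id: paper.id,
      score: cosineSimilarity(queryVector, paper.vector),
    }))
    .sort((x, y) => y.score - x.score);
}

// Usage with tiny made-up vectors; real embeddings have many more dimensions.
const ranked = rankBySimilarity(
  [1, 0],
  [
    { id: "unrelated", vector: [0, 1] },
    { id: "related", vector: [1, 0] },
  ]
);
console.log(ranked.map((r) => r.id));
```

A linear scan like this is fine for a small in-memory collection. A larger corpus would usually call for an approximate nearest-neighbour index instead.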
When a paper is uploaded, Paperbaum processes the PDF, extracts key metadata, and generates a vector representation:
- PDF text extraction
- Metadata extraction using GPT-4o mini
- Vector embedding generation
- IPFS upload
- In-memory storage of the metadata and vector in a Merkle tree
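The steps above can be sketched as a single async function. This only illustrates the order of operations: every step is passed in as a function, and the names `processUpload`, `extractText`, `extractMetadata`, `uploadToIpfs`, and `store`, as well as the shape of the returned record, are hypothetical and not taken from the backend code.

```javascript
// Sketch of the upload flow. Each step is injected, so the flow can be read
// and exercised without the real PDF, OpenAI, or IPFS services.
async function processUpload(pdfBytes, steps) {
  const text = await steps.extractText(pdfBytes); // PDF text extraction
  const metadata = await steps.extractMetadata(text); // e.g. title, authors
  const vector = await steps.generateEmbedding(text); // embedding vector
  const cid = await steps.uploadToIpfs(pdfBytes); // content identifier
  const record = { ...metadata, cid, vector };
  await steps.store(record); // in-memory Merkle tree storage
  return record;
}

// Usage with stand-in steps that return fixed values.
processUpload(Buffer.from("%PDF-"), {
  extractText: async () => "paper body",
  extractMetadata: async () => ({ title: "Example Paper" }),
  generateEmbedding: async () => [0.1, 0.2, 0.3],
  uploadToIpfs: async () => "example-cid",
  store: async () => {},
}).then((record) => console.log(record.title, record.cid));
```

Injecting the steps keeps the sequence visible in one place and lets each external service be swapped for a stub during testing.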
For the parachain, first compile it using `cargo build --release`, then run `./target/release/node-template --dev` to start the Substrate node on 127.0.0.1:9944.
To run the backend server, enter the `backend` directory and run `npm install`, then run `node server.js` to start the server on localhost:3000.
To run the frontend, enter the `frontend` directory and run `npm install`, then run `npm run dev` to start the frontend on localhost:3001.
- Develop a more sophisticated Merkle tree structure for efficient paper linking and verification.
- Implement Merkle Mountain Ranges (MMR) for dynamic dataset management, allowing efficient updates and proofs of inclusion.
- Develop a ZK-based reputation system for anonymous yet credible peer reviews.
- Create ZK proofs for citation verification without revealing full paper contents.
- Implement double-blind review processes using ZK proofs.
- Develop a reputation system for reviewers based on the quality and timeliness of their reviews.
- Develop cross-chain citation verification and tracking.
- Create a system for recognizing academic credentials and reputations across different blockchain networks.
- Implement versioning and provenance tracking of papers using OriginTrail's blockchain-agnostic protocol.
- Develop an AI-assisted discovery system leveraging OriginTrail's semantic data structure.
See MIT License