We propose Atomas, a hierarchical molecular representation learning framework that jointly learns representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to automatically learn the fine-grained fragment correspondence between two modalities and align these representations at three semantic levels.

yikunpku/Atomas

【ICLR 2025】🧪 Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation

Project Page | Paper | Report Bug | Citation

Table of Contents
  1. Updates
  2. About The Project
  3. Getting Started
  4. Citation
  5. Contact

📣 Updates

  • [2025/2/24]: We've released the source code and pretrained checkpoint for Atomas. This includes everything needed to start using and extending Atomas in your own projects. Enjoy easy setup and quick integration!

📘 About The Project

We propose Atomas, a hierarchical molecular representation learning framework that jointly learns representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model that automatically learns the fine-grained fragment correspondence between the two modalities and aligns their representations at three semantic levels. Atomas's end-to-end training framework supports both molecule understanding and molecule generation, enabling a wider range of downstream tasks. Extensive experiments on retrieval and generation tasks demonstrate superior performance, highlighting the efficacy of our method. Scaling experiments reveal Atomas's robustness and scalability. Additionally, visualization and qualitative analysis of Atomas confirm the chemical significance of our approach.
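To make the idea of aligning molecule and text representations concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over paired embeddings at a single level. This is an illustration only and not the Atomas implementation: the function names, the temperature value, and the choice of a contrastive objective are all assumptions, and the hierarchical, three-level alignment described above is not reproduced here. See the paper and source code for the actual method.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def symmetric_contrastive_loss(mol_emb, text_emb, temperature=0.07):
    """Hypothetical single-level alignment loss (assumption, not the Atomas code).

    Row i of mol_emb and row i of text_emb are assumed to describe the same
    molecule; every other row in the batch acts as a negative.
    """
    mol = [normalize(v) for v in mol_emb]
    txt = [normalize(v) for v in text_emb]
    n = len(mol)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(m, t)) / temperature for t in txt] for m in mol]
    # Molecule-to-text direction: each row should peak on its diagonal entry.
    m2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-molecule direction: each column should peak on its diagonal entry.
    t2m = sum(
        logsumexp([logits[j][i] for j in range(n)]) - logits[i][i] for i in range(n)
    ) / n
    return 0.5 * (m2t + t2m)


# Toy data: random "molecule" embeddings and slightly perturbed "text" embeddings.
random.seed(0)
mol = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
text_matched = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in mol]
text_mismatched = text_matched[::-1]  # pair each molecule with the wrong text

aligned = symmetric_contrastive_loss(mol, text_matched)
misaligned = symmetric_contrastive_loss(mol, text_mismatched)
print(f"matched pairs: {aligned:.3f}, mismatched pairs: {misaligned:.3f}")
```

The loss is lower when each molecule is paired with its own description than when the pairs are scrambled, which is the property an alignment objective of this kind is meant to reward.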

🚀 Getting Started

To get a local copy up and running, follow these steps.

Requirements

To install requirements:

pip install -r requirements.txt

Pre-trained Model

You can download the pre-trained model here:

  • Atomas pre-trained on the PubchemSTM-distill dataset and fine-tuned on the CHEBI-20 dataset for the molecule generation task.

Training

To train the model(s) in the paper, run this command:

python main.py --project Atomas --data_dir <your data path> --dataset <choose pubchem or chebi-20 dataset> --model_size <choose base or large> --task <choose genmol or gentext>
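As a convenience, the command above can be assembled from Python so that a typo in one of the options is caught before a long training run starts. The sketch below is an assumption-laden helper, not part of the repository: `build_train_command` is a hypothetical name, the allowed values are copied from the placeholders in the command above (the exact strings `main.py` accepts may differ), and `./data` is a made-up path.

```python
import shlex

# Allowed values are inferred from the placeholders in the training command
# above; the exact spellings accepted by main.py are an assumption.
DATASETS = ("pubchem", "chebi-20")
MODEL_SIZES = ("base", "large")
TASKS = ("genmol", "gentext")


def build_train_command(data_dir, dataset, model_size, task, project="Atomas"):
    """Return the argument list for main.py (hypothetical helper)."""
    for name, value, allowed in (
        ("dataset", dataset, DATASETS),
        ("model_size", model_size, MODEL_SIZES),
        ("task", task, TASKS),
    ):
        if value not in allowed:
            raise ValueError(f"--{name} must be one of {allowed}, got {value!r}")
    return [
        "python", "main.py",
        "--project", project,
        "--data_dir", data_dir,
        "--dataset", dataset,
        "--model_size", model_size,
        "--task", task,
    ]


# Example: fine-tune the base model on CHEBI-20 for molecule generation.
# "./data" is a placeholder path, not one the repository prescribes.
cmd = build_train_command("./data", "chebi-20", "base", "genmol")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch training.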

Evaluation

To evaluate a trained model, run:

python eval.py --resume_from_checkpoint mymodel.ckpt 

📌 Citation

If you find our work useful in your research, or if you use parts of this code, please consider citing our paper:

@article{zhang2024atomas,
  title={Atomas: Hierarchical alignment on molecule-text for unified molecule understanding and generation},
  author={Zhang, Yikun and Ye, Geyan and Yuan, Chaohao and Han, Bo and Huang, Long-Kai and Yao, Jianhua and Liu, Wei and Rong, Yu},
  journal={arXiv preprint arXiv:2404.16880},
  year={2024}
}

☎️ Contact

Yikun Zhang - yikun.zhang@stu.pku.edu.cn
