Based on the PMC-15 dataset, we interact with GPT-4V and collect 80K unique language-image instruction-following samples in total. Please check out ``BioMed-VITAL-Instruct-80K'' on [HuggingFace Dataset].
| Data file name | File size | Sample size |
| --- | --- | --- |
| BioMed-VITAL-instructions-80K.json | 142 MB | 80K |
| BioMed-VITAL-instructions-150K.json | 309 MB | 60K + 10K + 80K |
| BioMed-VITAL-instructions-210K.json | 462 MB | 60K + 10K + 80K + 60K |
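After downloading, an instruction file can be loaded with the standard `json` module. The sketch below is a minimal, hypothetical example: the field names (`id`, `image`, `conversations`, `from`, `value`) follow the common LLaVA-style instruction layout and are assumptions, not confirmed details of this release; check the actual file before relying on them.

```python
import json

def load_instructions(path):
    """Load an instruction-following JSON file (assumed to be a list of samples)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Toy sample in the assumed LLaVA-style layout (field names are hypothetical).
toy = [{
    "id": "sample_0",
    "image": "example.jpg",
    "conversations": [
        {"from": "human", "value": "Describe the image."},
        {"from": "gpt", "value": "A frontal chest X-ray."},
    ],
}]
with open("toy_instructions.json", "w", encoding="utf-8") as f:
    json.dump(toy, f)

samples = load_instructions("toy_instructions.json")
print(len(samples))  # number of samples in the file
```

Swap in the real file name (e.g. `BioMed-VITAL-instructions-80K.json`) to inspect the released data the same way.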
@misc{cui2024biomedical,
  title={Biomedical Visual Instruction Tuning with Clinician Preference Alignment},
  author={Hejie Cui and Lingjun Mao and Xin Liang and Jieyu Zhang and Hui Ren and Quanzheng Li and Xiang Li and Carl Yang},
  year={2024},
  eprint={2406.13173},
  archivePrefix={arXiv}
}
This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We thank the LLaVA and LLaVA-Med teams for giving us access to their models, and the authors of open-source projects including BioMed-CLIP.
Usage and License Notices: The data, code, and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of CLIP, LLaVA, and GPT-4. The dataset is released under CC BY-NC 4.0 (non-commercial use only), and models trained on the dataset should not be used outside of research purposes.
The source code of this repository is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages.