Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
Our codebase covers the following stages:
- Fine-tune the Vision Encoder
  - Fine-tune the CTViT model on the Vietnamese PET/CT report dataset. Details can be found in pet-clip/README.md
  - Fine-tune the Cosmos model on the Vietnamese PET/CT report dataset. Details can be found in Cosmos/README.md
- Train and Run Inference with VLMs
  - Train the vision-language model and run inference with it. Details can be found in VLMs/README.md
- Clinical Evaluation
  - Extract structured lesion information from the LLM output and clinically evaluate the predictions against the reference reports. Details can be found in clinical_evaluation/README.md
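
To give a rough idea of what the clinical evaluation stage involves, here is a minimal sketch of extracting structured lesion entries from free-form model output and scoring them against a reference. It is an illustration only, not the code in clinical_evaluation/: the JSON schema (`location`, `finding`), the function names `extract_lesions` and `lesion_scores`, and the exact-match precision/recall/F1 metric are all assumptions made for this example.

```python
import json
import re
from typing import Dict, Set, Tuple

Lesion = Tuple[str, str]  # (location, finding), both lowercased


def extract_lesions(llm_output: str) -> Set[Lesion]:
    """Pull a JSON array of lesions out of free-form model output.

    Assumed (hypothetical) schema: [{"location": "...", "finding": "..."}, ...].
    Returns an empty set when no parseable array is found.
    """
    # Greedy match from the first "[" to the last "]" in the text.
    match = re.search(r"\[.*\]", llm_output, flags=re.DOTALL)
    if match is None:
        return set()
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return set()

    lesions: Set[Lesion] = set()
    for item in items:
        # Skip entries that do not follow the assumed schema.
        if isinstance(item, dict) and "location" in item and "finding" in item:
            location = str(item["location"]).strip().lower()
            finding = str(item["finding"]).strip().lower()
            lesions.add((location, finding))
    return lesions


def lesion_scores(predicted: Set[Lesion], reference: Set[Lesion]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over (location, finding) pairs."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Exact matching on normalized strings is the simplest possible criterion; a real clinical evaluation would likely need synonym handling or expert review, so refer to clinical_evaluation/README.md for the actual procedure.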