Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation

Our codebase includes the following stages:

Fine-tune the CTViT model on the Vietnamese PET/CT-report dataset. Details can be found in pet-clip/README.md
Fine-tune the Cosmos model on the Vietnamese PET/CT-report dataset. Details can be found in Cosmos/README.md

Train and run inference with the vision-language model. Details can be found in VLMs/README.md

Extract structured lesion information from the LLM output and clinically evaluate the predictions. Details can be found in clinical_evaluation/README.md
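
To illustrate what comparing extracted lesion information against a reference might look like, here is a minimal sketch. It is not the evaluation code in clinical_evaluation/ and does not reflect its actual metrics or data format. It assumes, purely for illustration, that each report has already been reduced to a list of lesion location strings, and it computes set-based precision, recall, and F1. The function name `lesion_prf` and the input format are hypothetical.

```python
from typing import Dict, Iterable


def lesion_prf(predicted: Iterable[str], reference: Iterable[str]) -> Dict[str, float]:
    """Set-based precision/recall/F1 over lesion locations.

    Hypothetical helper: assumes lesions are plain location strings
    (e.g. "liver"); the repository's real format may differ.
    """
    # Normalize case and whitespace so "Liver " and "liver" match.
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = lesion_prf(["Liver", "lung"], ["liver", "bone"])
    print(scores)
```

Exact string matching is the simplest possible choice here; a real clinical evaluation would likely need to handle synonyms, laterality, and lesion attributes. See clinical_evaluation/README.md for the approach actually used.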
