Pre-print available on tba.
This code relies on Python 3.9. To install OpenSlide, run:
apt-get update -qq && apt-get install openslide-tools libgeos-dev -y 2>&1
Then to install miso in a dedicated environment:
conda create --name miso_env python=3.9
conda activate miso_env
pip install -e .
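As a quick sanity check after installation, the sketch below reports whether the interpreter is Python 3.9 and whether the required packages can be located. The module names openslide and miso are assumptions (the import names of the OpenSlide Python bindings and of this package), and check_environment is a hypothetical helper, not part of miso.

```python
import importlib.util
import sys


def check_environment(modules=("openslide", "miso"), required=(3, 9)):
    """Return a dict saying whether Python matches `required` and whether each module can be found.

    The default module names are assumptions about the import names.
    """
    report = {"python_ok": sys.version_info[:2] == tuple(required)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```

find_spec only locates a package without importing it. The OpenSlide Python bindings typically load the native library at import time, so actually running import openslide is the stronger test that the apt-get step worked.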
Public 10x Genomics Visium samples are available for download on the 10x Genomics website (filter by "Spatial Gene Expression"). As examples, we provide tools to process such samples, and we make available the outputs for the following samples:
- Human Colorectal Cancer, 11 mm Capture Area (FFPE)
- Human Lung Cancer, 11 mm Capture Area (FFPE)
- Human Ovarian Cancer, 11 mm Capture Area (FFPE)
For a given sample, you will need to download:
- The filtered_feature_bc_matrix.h5 file
- The spatial folder
- The H&E image tissue_image.tif (be careful to use the full-resolution image and not the CytAssist view if you are using CytAssist samples)
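A minimal sketch for checking that a downloaded sample folder contains these inputs before processing. The helper missing_visium_inputs is hypothetical, and the glob patterns are an assumption meant to accept both the bare file names above and sample-prefixed names such as CytAssist_11mm_FFPE_Human_Colorectal_Cancer_tissue_image.tif.

```python
from pathlib import Path

# Assumed patterns: match bare names and sample-prefixed 10x Genomics names.
REQUIRED_FILES = ("*filtered_feature_bc_matrix.h5", "*tissue_image.tif")


def missing_visium_inputs(sample_dir):
    """Return the expected inputs that are absent from sample_dir (empty list if complete)."""
    sample_dir = Path(sample_dir)
    missing = [pattern for pattern in REQUIRED_FILES if not any(sample_dir.glob(pattern))]
    # The spatial outputs come as a folder rather than a single file
    if not (sample_dir / "spatial").is_dir():
        missing.append("spatial/")
    return missing


if __name__ == "__main__":
    import sys

    print(missing_visium_inputs(sys.argv[1] if len(sys.argv) > 1 else "."))
```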
The HER2ST dataset (Andersson et al.) can be downloaded by following the instructions given in its repository.
- Rewrite the H&E image in a pyramidal format compatible with the OpenSlide library using the script miso/data/processing/rewrite_slide.py.
For instance, after downloading the Human Colorectal Cancer sample into PATH_TO_RAW_DATA, you can run:
python miso/data/processing/rewrite_slide.py --path_visium PATH_TO_RAW_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer/spatial --path_slide PATH_TO_RAW_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer/CytAssist_11mm_FFPE_Human_Colorectal_Cancer_tissue_image.tif --path_output_folder PATH_TO_PROCESSED_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer
This will save the new file CytAssist_11mm_FFPE_Human_Colorectal_Cancer_tissue_image_pyr.tif in PATH_TO_PROCESSED_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer.
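A pyramidal TIFF stores the same slide at several resolutions, so a reader such as OpenSlide can fetch a region at the level it needs instead of decoding the full image. The sketch below only computes the level sizes of such a pyramid. The downsampling factor of 2 and the minimum level size are assumptions, not necessarily what rewrite_slide.py writes, and pyramid_level_dimensions is a hypothetical helper.

```python
def pyramid_level_dimensions(width, height, min_side=256, factor=2):
    """Return (width, height) for each level of a hypothetical image pyramid.

    Level 0 is the full-resolution image; each following level is downsampled
    by `factor` (integer division) until the shorter side would fall below `min_side`.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = [(width, height)]
    while min(width, height) // factor >= min_side:
        width, height = width // factor, height // factor
        levels.append((width, height))
    return levels


if __name__ == "__main__":
    for level, dims in enumerate(pyramid_level_dimensions(20000, 10000)):
        print(level, dims)
```

If the --level argument of process_data.py indexes OpenSlide pyramid levels (an assumption), level 1 would correspond to half the full resolution in such a pyramid. To check the levels actually written, openslide.OpenSlide(path).level_dimensions and .level_downsamples list the real sizes and downsampling factors.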
- Run the pre-processing script miso/scripts/process_data.py:
python miso/scripts/process_data.py --path_visium PATH_TO_RAW_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer/ --path_slide PATH_TO_PROCESSED_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer/CytAssist_11mm_FFPE_Human_Colorectal_Cancer_tissue_image_pyr.tif --path_output_folder PATH_TO_PROCESSED_DATA/CytAssist_11mm_FFPE_Human_Colorectal_Cancer --level 1 --knn 37
This script will:
- Select 224 x 224 pixel tiles centered on each spot that passed Space Ranger's QC.
- Extract features for each tile with a pre-trained ViT-16 feature extractor, both at the tile level and for each 16 x 16 pixel patch. By default, we use the phikon model available on Hugging Face.
- Compute a list of neighbors for each tile.
- Rewrite the counts into NumPy files.
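The geometry behind these steps can be sketched with NumPy. This is an illustration under assumptions, not the code of process_data.py: spot centers are assumed to be pixel coordinates in the image being tiled, neighbors are ranked by Euclidean distance between spot centers, and tile_box, split_into_patches, knn_indices and hex_grid are hypothetical helpers. A 224 x 224 tile split into 16 x 16 patches gives a 14 x 14 grid, i.e. 196 patch features per tile. The hex_grid demo also shows one possible reading of --knn 37: on an ideal hexagonal lattice such as the Visium spot grid, the 37 nearest spots (counting the spot itself) are exactly the spot and its first three rings of neighbors (1 + 6 + 12 + 18 = 37). Whether this is why the example uses 37 is an assumption.

```python
import numpy as np

TILE = 224  # tile side in pixels, as described above
PATCH = 16  # ViT patch side in pixels, as described above


def tile_box(center_x, center_y, tile=TILE):
    """Return (left, top, width, height) of a square tile centered on a spot.

    Rounding the center to the nearest pixel is an assumption of this sketch.
    """
    half = tile // 2
    return (int(round(center_x)) - half, int(round(center_y)) - half, tile, tile)


def split_into_patches(tile_rgb, patch=PATCH):
    """Split an (H, W, C) tile into shape (n_patches, patch, patch, C), in row-major order."""
    h, w, c = tile_rgb.shape
    if h % patch or w % patch:
        raise ValueError("tile size must be a multiple of the patch size")
    return (
        tile_rgb.reshape(h // patch, patch, w // patch, patch, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(-1, patch, patch, c)
    )


def knn_indices(coords, k):
    """For each spot, indices of its k nearest spots by Euclidean distance (itself first if unique)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def hex_grid(n_rings, spacing=1.0):
    """Hypothetical demo lattice: hexagonal grid points within n_rings of the origin.

    Returns rows (x, y, ring), where ring is the hexagonal distance to the origin.
    """
    rows = []
    for q in range(-n_rings, n_rings + 1):
        for r in range(-n_rings, n_rings + 1):
            ring = max(abs(q), abs(r), abs(q + r))
            if ring <= n_rings:
                # axial (q, r) -> Cartesian coordinates
                rows.append((spacing * (q + r / 2), spacing * np.sqrt(3) / 2 * r, ring))
    return np.array(rows)


if __name__ == "__main__":
    print(split_into_patches(np.zeros((TILE, TILE, 3))).shape)  # (196, 16, 16, 3)
    grid = hex_grid(5)
    coords, rings = grid[:, :2], grid[:, 2]
    center = int(np.argmin(np.abs(coords).sum(axis=1)))  # the origin
    neighbors = knn_indices(coords, 37)[center]
    print(sorted(set(rings[neighbors].astype(int).tolist())))  # [0, 1, 2, 3]
```

The dense distance matrix grows quadratically with the number of spots; for full slides, scipy.spatial.cKDTree(coords).query(coords, k=37) returns the same neighbors (up to ties) far more cheaply.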
Once downloaded, the data folder PATH_TO_HER2ST_DATA contains four subfolders: count-matrices, images, meta and spot-selection. To extract tile and subtile features, run:
python miso/scripts/process_her2st.py --path_dataset PATH_TO_HER2ST_DATA
This will create a fifth subfolder, processed_data, in PATH_TO_HER2ST_DATA.
To train a model, run the script miso/train.py. You can pick a config file from the folder confs and specify it with the command-line argument --config-name, e.g.:
python miso/train.py --config-name train_her2st.yaml
The performance of models trained on HER2ST can be compared with the extensive benchmark carried out by Wang et al. [1], using the source data provided in their paper. The default config train_her2st.yaml uses the same split, saved in miso/assets/splits_benchmark_her2st.pkl.
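To inspect the benchmark split before training, here is a minimal sketch that loads the pickle and summarises its top level. The internal structure of splits_benchmark_her2st.pkl is not described here, so load_split and outline are hypothetical helpers that assume nothing beyond the object being a dict or a sequence. If the pickle stores miso-specific objects, miso must be importable for loading to succeed; as with any pickle, only load files from a trusted source.

```python
import pickle
from pathlib import Path


def load_split(path="miso/assets/splits_benchmark_her2st.pkl"):
    """Load a pickled split object (structure deliberately not assumed)."""
    with open(Path(path), "rb") as f:
        return pickle.load(f)


def outline(obj, max_items=5):
    """Summarise the top level of a dict or sequence: its type, length and first entries."""
    lines = [f"{type(obj).__name__} of length {len(obj)}"]
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for i, (key, value) in enumerate(items):
        if i == max_items:
            lines.append("  ...")
            break
        size = f", len={len(value)}" if hasattr(value, "__len__") else ""
        lines.append(f"  {key!r}: {type(value).__name__}{size}")
    return lines


if __name__ == "__main__":
    print("\n".join(outline(load_split())))
```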
Once a model is trained, you can use it to generate pseudolabels for distillation with miso/distillation/generate_distillation_labels.py.
For instance, to generate pseudolabels with a model trained with the config miso/confs/train.yaml and save them in the same folders as the raw counts, run:
python miso/distillation/generate_distillation_labels.py --config-name=train.yaml
You can then train a weakly supervised model for super-resolved prediction of gene expression by running:
python miso/train.py --config-name=distil.yaml
Footnotes
[1] Wang, C., Chan, A. S., Fu, X., Ghazanfar, S., Kim, J., Patrick, E., & Yang, J. Y. (2025). Benchmarking the translational potential of spatial gene expression prediction from histology. Nature Communications, 16(1), 1544.