This repository contains the code and analysis for our paper, "Cell shapes decode molecular phenotypes in image-based spatial proteomics" (Le et al., 2025).
We introduce Shapespace, a computational framework that captures interpretable shape variations and maps single-cell protein localization and pathway activity onto a common coordinate system defined by cell and nuclear morphology. This enables robust, interpretable analysis across morphological variation, conditions, and perturbations.
notebooks/ and analysis/ – Python scripts and Jupyter notebooks for reproducing figures, performing downstream analysis, and exploring results. For step-by-step details, please refer to the Methods section of the manuscript.
This repo is not maintained.
Clone this repository and set up a Python environment:
git clone https://github.com/CellProfiling/2D_shapespace.git
cd 2D_shapespace
We recommend using conda or venv to create an isolated environment:
# With conda
conda create -n shapespace python=3.11
conda activate shapespace
# Or with venv
python -m venv shapespace
source shapespace/bin/activate # Linux/macOS
Install the repository as a Python package in editable mode:
pip install -e .
Before running the analysis, you need to:
- Edit config.py to set your project directory, alignment mode, cell line, mapping method, etc.
- Follow the workflow steps, which are described in detail in the manuscript and summarized below in the [Shapespace construction](#shapespace-construction) section.
We also provide a test dataset (single-cell crops and masks) that lets you quickly test shape parameterization, shapespace construction, and mapping of protein intensity onto the average cell shape. NOTE: This dataset is intentionally small for testing and may not preserve the true average shape representation of the cell line. The number of samples/organelles is also limited, so it will not recover the true organelle map.
wget https://ell-vault.stanford.edu/dav/trangle/www/K-562.zip
unzip K-562.zip -d K-562
python -m coefficients.s2_calculate_fft
python -m analysis.cell_nucleus_ratio
python -m shapemodes.s3_calculate_shapemodes
python -m warps.s4_concentric_rings_intensity --cell_line K-562 --n_isos 10 20 # check cfg.N_ISOS and cfg.LANDMARKS
python -m warps.s4_tsp --cell_line K-562 # check cfg.LANDMARKS
For large datasets or when analyzing multiple cell lines, consider using a workflow manager such as Snakemake, or submitting separate jobs to a compute cluster using SLURM. Example workflow files and job scripts can be found inside each folder.
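For a quick local run without a workflow manager, the commands above can also be chained from a small Python driver. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical helper names, not part of this repo, and the step list simply mirrors the commands shown above.

```python
import subprocess
import sys

# Pipeline steps copied from the commands listed above, in the same order.
# "{cell_line}" is a placeholder filled in by build_commands().
STEPS = [
    ["-m", "coefficients.s2_calculate_fft"],
    ["-m", "analysis.cell_nucleus_ratio"],
    ["-m", "shapemodes.s3_calculate_shapemodes"],
    ["-m", "warps.s4_concentric_rings_intensity",
     "--cell_line", "{cell_line}", "--n_isos", "10", "20"],
    ["-m", "warps.s4_tsp", "--cell_line", "{cell_line}"],
]


def build_commands(cell_line):
    """Return the full argv list of every step for one cell line."""
    return [
        [sys.executable] + [arg.format(cell_line=cell_line) for arg in step]
        for step in STEPS
    ]


def run_pipeline(cell_line, dry_run=True):
    """Print each step; when dry_run is False, run it and stop on first failure."""
    for cmd in build_commands(cell_line):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_pipeline("K-562", dry_run=True)
```

Keeping `dry_run=True` by default only prints the commands, which is a cheap way to check the order before committing compute time.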
Steps for the pipeline:
Segmentation can be manual or done with any DL model (here, HPACellSegmentator was used for inference; the training code is currently in a private repo). I've also provided an example of training on and segmenting a dataset with the popular cellpose v2.0 (credits to their starter notebook; I only wrapped it in a more comprehensible, concise form). The training set for this part is only 9 images/FOVs.
Folder: segmentation
Cells whose nucleus touches the image border are removed. Cells whose cell segmentation touches the border are still kept (a percentage rule to remove them may be added in the future).
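The border rule above can be sketched roughly as follows. This is an illustration with NumPy on a toy label image, not the code in s1_get_single_cell_shapes.py; `touches_border` and `keep_cell_ids` are hypothetical names, and I assume nuclei come as an integer label image with 0 as background.

```python
import numpy as np


def touches_border(mask):
    """True if any foreground pixel of a 2D boolean mask lies on the image edge."""
    mask = np.asarray(mask, dtype=bool)
    return bool(mask[0, :].any() or mask[-1, :].any()
                or mask[:, 0].any() or mask[:, -1].any())


def keep_cell_ids(nuclei_labels):
    """Return the label ids whose nucleus does not touch the image border.

    nuclei_labels: 2D integer array, 0 = background, k > 0 = nucleus of cell k.
    """
    ids = np.unique(nuclei_labels)
    ids = ids[ids != 0]  # drop background
    return [int(i) for i in ids if not touches_border(nuclei_labels == i)]


# Toy field of view: nucleus 1 touches the top edge, nucleus 2 is interior.
fov = np.zeros((6, 6), dtype=int)
fov[0:2, 1:3] = 1
fov[3:5, 3:5] = 2
print(keep_cell_ids(fov))
```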
python s1_get_single_cell_shapes.py
Folder: coefficients
- Alignment and centering: major axis, nucleus-cell centroid vector, or major axis + nucleus centroid (center of mass) alignment
- Calculate the FFT of the x,y coordinates of the nucleus and cell segmentation outlines (equally spaced samples along the shapes): fast Fourier coefficients, elliptical Fourier descriptors, wavelets
- Save the results of the multiprocessing pool
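To make the FFT step above concrete, here is a minimal sketch of equally spaced resampling followed by an FFT of x and y. It assumes a closed outline given as an (N, 2) array without a repeated end point, skips the alignment step, and uses hypothetical function names; the actual s2 script may differ in all of these.

```python
import numpy as np


def resample_contour(xy, n_points):
    """Resample a closed contour (N, 2) to n_points equally spaced by arc length."""
    xy = np.asarray(xy, dtype=float)
    closed = np.vstack([xy, xy[:1]])                 # repeat first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)
    return np.column_stack([np.interp(t, s, closed[:, 0]),
                            np.interp(t, s, closed[:, 1])])


def fourier_coefficients(xy, n_coef):
    """One-sided FFT of x and y separately, keeping the n_coef lowest frequencies."""
    cx = np.fft.rfft(xy[:, 0])[:n_coef]
    cy = np.fft.rfft(xy[:, 1])[:n_coef]
    return cx, cy


def reconstruct(cx, cy, n_points):
    """Inverse FFT; the dropped high frequencies are zero-padded by irfft."""
    return np.column_stack([np.fft.irfft(cx, n=n_points),
                            np.fft.irfft(cy, n=n_points)])


# Toy outline: a circle of radius 2 centered at (1, -1).
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
outline = np.column_stack([1 + 2 * np.cos(theta), -1 + 2 * np.sin(theta)])
samples = resample_contour(outline, 64)
cx, cy = fourier_coefficients(samples, 5)
approx = reconstruct(cx, cy, 64)
print(np.abs(approx - samples).max())  # small: a circle needs very few coefficients
```

Truncating to a few low frequencies is what makes the coefficients a compact shape description; irregular outlines need more coefficients than this toy circle.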
python s2_calculate_fft.py
Folder: shapemodes
Fit a PCA on the coefficients produced by s2, transform them, and calculate the shapemodes (n_PCs explaining xx% of the variance).
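As a rough illustration of this step, the sketch below runs PCA via an SVD and keeps the smallest number of components that reaches a chosen variance fraction. It assumes real-valued feature vectors (for example, real and imaginary parts of the coefficients stacked side by side); `fit_shapemodes`, `shape_along_mode`, and the 0.95 threshold are placeholders of mine, not values from the repo.

```python
import numpy as np


def fit_shapemodes(features, variance_kept):
    """PCA via SVD on (n_cells, n_features); keep PCs up to variance_kept."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S ** 2 / np.sum(S ** 2)                  # explained variance ratio per PC
    n_pcs = int(np.searchsorted(np.cumsum(ratio), variance_kept) + 1)
    n_pcs = min(n_pcs, len(ratio))
    components = Vt[:n_pcs]                          # rows are orthonormal PCs
    scores = Xc @ components.T                       # per-cell coordinates on each PC
    return mean, components, ratio[:n_pcs], scores


def shape_along_mode(mean, component, std, k):
    """Feature vector located k standard deviations along one shape mode."""
    return mean + k * std * component


# Toy data: two dominant directions of variation plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * [5.0, 2.0]
features = latent @ np.eye(6)[:2] + 0.01 * rng.normal(size=(200, 6))

mean, components, ratio, scores = fit_shapemodes(features, variance_kept=0.95)
stds = scores.std(axis=0, ddof=1)
plus_two_sigma = shape_along_mode(mean, components[0], stds[0], k=2.0)
print(components.shape, ratio.round(3))
```

Walking `k` over a few values (for example -2 to 2) along one component gives the series of representative shapes for that shapemode.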
python s3_calculate_shapemodes.py
Folder: warp
Protein parameterization based on concentric rings running from the nucleus centroid to the nucleus membrane to the cell membrane. Final shape for all proteins: (n_rings, n_points).
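One way to picture the ring parameterization is linear interpolation between matched points: centroid to nucleus membrane for the inner rings, nucleus membrane to cell membrane for the outer rings. The sketch below assumes the two outlines are sampled with the same number of points in corresponding order and uses nearest-pixel lookup; the function names and the 10/20 ring split are illustrative assumptions, not the repo's implementation.

```python
import numpy as np


def concentric_rings(centroid, nucleus_xy, cell_xy, n_inner, n_outer):
    """Ring coordinates (x, y) with shape (n_inner + n_outer, n_points, 2)."""
    centroid = np.asarray(centroid, dtype=float)
    nucleus_xy = np.asarray(nucleus_xy, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)
    # Fractions exclude 0, so the last inner ring is the nucleus membrane
    # and the last outer ring is the cell membrane.
    t_in = np.linspace(0.0, 1.0, n_inner + 1)[1:, None, None]
    t_out = np.linspace(0.0, 1.0, n_outer + 1)[1:, None, None]
    inner = centroid + t_in * (nucleus_xy - centroid)
    outer = nucleus_xy + t_out * (cell_xy - nucleus_xy)
    return np.concatenate([inner, outer], axis=0)


def sample_intensity(image, rings):
    """Nearest-pixel lookup of a 2D image at ring coordinates: (n_rings, n_points)."""
    cols = np.clip(np.rint(rings[..., 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(rings[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols]


# Toy cell: circular nucleus (radius 10) inside a circular cell (radius 40).
angles = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
unit = np.column_stack([np.cos(angles), np.sin(angles)])
center = np.array([50.0, 50.0])
nucleus = center + 10.0 * unit
cell = center + 40.0 * unit

# Toy "protein" image whose intensity equals the distance from the center.
yy, xx = np.mgrid[0:101, 0:101]
image = np.hypot(xx - 50, yy - 50)

rings = concentric_rings(center, nucleus, cell, n_inner=10, n_outer=20)
profile = sample_intensity(image, rings)
print(profile.shape)
```

Because every cell ends up as an array of the same (n_rings, n_points) shape, intensities can be averaged across cells regardless of their individual outlines.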
python s4_concentric_rings_intensity.py
OR
Protein morphing onto the shape based on thin-plate splines given landmarks: nucleus centroid, 32 points on the nucleus membrane, and 32 points on the cell membrane. Final shape for all proteins = shape of the average cell in that shapemode (bin).
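For readers unfamiliar with thin-plate splines, here is a small NumPy-only sketch of fitting a 2D TPS from source landmarks to target landmarks and applying it to arbitrary points. It only maps coordinates (resampling an image through the mapping is not shown), `fit_tps` and `apply_tps` are hypothetical names, and the landmark layout (1 centroid + 32 + 32 points) follows the description above on toy circles.

```python
import numpy as np


def _tps_kernel(r):
    """Thin-plate spline radial basis U(r) = r^2 log(r), with U(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out


def fit_tps(src, dst):
    """Solve for TPS parameters mapping src landmarks (n, 2) onto dst (n, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    K = _tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])            # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)                     # rows: n weights, then 3 affine


def apply_tps(params, src, pts):
    """Map arbitrary points (m, 2) through a fitted TPS."""
    src = np.asarray(src, dtype=float)
    pts = np.asarray(pts, dtype=float)
    U = _tps_kernel(np.linalg.norm(pts[:, None] - src[None, :], axis=-1))
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ params[:-3] + P @ params[-3:]


# Landmarks: centroid, 32 nucleus points, 32 cell points (toy circles).
angles = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
src = np.vstack([[0.0, 0.0], 1.0 * circle, 3.0 * circle])
# Target: same nucleus, cell outline stretched 1.5x along x.
dst = np.vstack([[0.0, 0.0], 1.0 * circle, 3.0 * circle * [1.5, 1.0]])

params = fit_tps(src, dst)
warped = apply_tps(params, src, src)
print(np.abs(warped - dst).max())  # landmarks are interpolated (near zero error)
```

In practice the same mapping would be evaluated on a pixel grid so that each cell's protein image can be resampled into the average cell shape of its bin.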
python s4_protein_image_warp.py
python s5_organelle_heatmap.py
The pilot was performed on a small subset of the U2OS cell line (private images, manual segmentation and annotations): cells (1776 images), as well as on HPA (public images, automatic segmentation): 297108 cells (23272 images). For historical reasons, some files, such as shapemode_pipeline.py, contain fixed paths.