progress report

Valentin Marteau

2025-03-18

Shears are more powerful scissors

Scissor criticism


  • method to associate single cells from scRNA-seq datasets with phenotypic or clinical information of bulk RNA-seq samples (PMID:34764492)
  • computationally expensive (several thousand CPU hours for the CRC atlas)
  • biological replicates in the single-cell data are not taken into account
  • despite being based on a GLM, covariates cannot be included in the regression analysis
  • Scissor results are always relative to the other cells present in the single-cell RNA-seq dataset

Alternative approach: shears

Two-step approach:

1. Generate bulk x cell matrix with cell weights

-> Assumption: bulk = weights * single cells

We need to compute a weight for each single cell and each bulk sample. This is similar to deconvolution, except that we use the single-cell matrix directly instead of a signature matrix aggregated by cell type. The weights can be estimated with, e.g., linear regression. Since there are far too many features (one per cell), we apply a ridge penalty to the regression.
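A minimal sketch of step 1, assuming a genes x bulk matrix and a genes x cells matrix with matching gene order. The function name `ridge_cell_weights`, the penalty `alpha`, and the toy data are hypothetical and not taken from the shears code. No intercept is fitted, following the assumption bulk = weights * single cells.

```python
import numpy as np

def ridge_cell_weights(bulk, single_cell, alpha=1.0):
    """Estimate a (n_bulk x n_cells) weight matrix by ridge regression.

    bulk:        (n_genes, n_bulk) expression matrix
    single_cell: (n_genes, n_cells) expression matrix
    alpha:       ridge penalty (hypothetical default, would need tuning)
    """
    S = np.asarray(single_cell, dtype=float)
    B = np.asarray(bulk, dtype=float)
    n_genes = S.shape[0]
    # Dual form of the ridge solution: W = S^T (S S^T + alpha I)^-1 B.
    # It only needs a (n_genes x n_genes) solve, which is cheaper when
    # there are more cells than genes.
    gram = S @ S.T + alpha * np.eye(n_genes)
    weights = S.T @ np.linalg.solve(gram, B)  # (n_cells, n_bulk)
    return weights.T                          # (n_bulk, n_cells)

# Toy data (assumption): 50 genes, 200 cells, 3 bulk samples
rng = np.random.default_rng(0)
S = rng.random((50, 200))
B = rng.random((50, 3))
W = ridge_cell_weights(B, S, alpha=1.0)
```

The dual form gives the same solution as the usual (S^T S + alpha I)^-1 S^T B, but avoids building a cells x cells matrix.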


2. Compute the importance of each cell for a given phenotype

Use an individual linear model per cell. Fit a model

-> Phenotype ~ cell_weight

for each cell. The coefficient of the model can be considered the effect size for that cell. The problem with this approach is that a cell can be significantly associated even if the effect could be explained by other cells. I’d expect this to be mostly a problem when using a correlation matrix instead of a weight matrix, since the weight matrix already takes the importance of each cell into account.
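A minimal sketch of step 2, assuming a numeric (e.g. 0/1 coded) phenotype and a bulk x cell weight matrix. The function name `per_cell_effect_sizes` and the toy data are hypothetical. Each cell gets its own single-predictor model, so the slope has a closed form.

```python
import numpy as np

def per_cell_effect_sizes(weights, phenotype):
    """Slope of `phenotype ~ cell_weight`, fitted separately for each cell.

    weights:   (n_bulk, n_cells) weight matrix
    phenotype: (n_bulk,) numeric phenotype (e.g. 0/1 coded)
    """
    W = np.asarray(weights, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    Wc = W - W.mean(axis=0)   # centre each cell's weights
    yc = y - y.mean()         # centre the phenotype
    # OLS slope for a single predictor: cov(x, y) / var(x)
    return (Wc * yc[:, None]).sum(axis=0) / (Wc ** 2).sum(axis=0)

# Toy data (assumption): 20 bulk samples, 5 cells
rng = np.random.default_rng(0)
W = rng.random((20, 5))
phenotype = 2.0 * W[:, 0] + 1.0   # phenotype driven entirely by cell 0
effects = per_cell_effect_sizes(W, phenotype)
```

In this toy example only cell 0 drives the phenotype, yet the other cells still receive non-zero slopes, because every model is fitted in isolation.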

Single-cell colorectal cancer atlas

Compute cell weights for each bulk sample


  • Quantile normalization to make bulk and single-cell data comparable (as done by Scissor)
  • Ridge regression to estimate the weights in the following equation

\[ B = w_1 S_1 + w_2 S_2 + \dots + w_n S_n \]
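The quantile normalization step listed above can be sketched as follows. This is a generic textbook version, not necessarily what Scissor or shears do internally; the function name `quantile_normalize` and the toy matrix are hypothetical. To make bulk and single-cell data comparable, one would presumably apply it to the bulk and single-cell columns together, which is an assumption here.

```python
import numpy as np

def quantile_normalize(matrix):
    """Quantile-normalize the columns of a (n_genes, n_samples) matrix.

    Ties are broken by sort order here; other implementations average them.
    """
    X = np.asarray(matrix, dtype=float)
    order = np.argsort(X, axis=0)                # row indices that sort each column
    ranks = np.argsort(order, axis=0)            # rank of each value within its column
    reference = np.sort(X, axis=0).mean(axis=1)  # mean of the k-th smallest values
    return reference[ranks]                      # replace each value by its rank's mean

# Toy matrix (assumption): 4 genes x 3 samples
bulk = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 4.5, 6.0],
                 [4.0, 2.0, 8.0]])
normalized = quantile_normalize(bulk)
```

After normalization every column has the same distribution of values, while the ranking of genes within each sample is preserved.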


weights for three different AC-ICAM cohort patients

Aggregate to cell-type fractions

Similar to other deconvolution methods, this yields a bulk sample x cell type matrix.
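A minimal sketch of the aggregation, assuming one cell-type label per cell. The function name `aggregate_by_cell_type` is hypothetical. Ridge weights can be negative, so this sketch clips them at zero before normalizing each bulk sample to sum to 1; that clipping is an assumption, not something stated for shears.

```python
import numpy as np

def aggregate_by_cell_type(weights, cell_types):
    """Sum per-cell weights within each cell type and normalize rows to 1.

    weights:    (n_bulk, n_cells) weight matrix
    cell_types: length n_cells sequence of labels
    Returns (labels, fractions), fractions has shape (n_bulk, n_types).
    """
    W = np.clip(np.asarray(weights, dtype=float), 0.0, None)  # assumption: drop negative weights
    labels, inverse = np.unique(np.asarray(cell_types), return_inverse=True)
    indicator = np.eye(len(labels))[inverse]   # one-hot (n_cells, n_types) matrix
    summed = W @ indicator                     # total weight per sample and cell type
    fractions = summed / summed.sum(axis=1, keepdims=True)
    return labels, fractions

# Toy data (assumption): 2 bulk samples, 3 cells
W = np.array([[0.2, 0.2, 0.6],
              [0.5, -0.1, 0.5]])
cell_types = ["T cell", "T cell", "B cell"]
labels, fractions = aggregate_by_cell_type(W, cell_types)
```

Labels come back in sorted order, so the columns of `fractions` here correspond to "B cell" and "T cell".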

Comparison of cell weights between conditions


For each cell:

weight ~ condition + covariates

-> coefficient and p-value for condition
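A minimal sketch of the per-cell comparison, assuming numeric (e.g. 0/1 coded) condition and covariates. The function name `compare_weights`, the covariate `age`, and the toy data are hypothetical. The p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large cohorts; a statistics library would give exact t-based p-values.

```python
import math
import numpy as np

def compare_weights(weights, condition, covariates=None):
    """Fit `weight ~ condition + covariates` by OLS for every cell.

    weights:    (n_bulk, n_cells) weight matrix
    condition:  (n_bulk,) numeric or 0/1-coded condition
    covariates: optional (n_bulk,) or (n_bulk, n_cov) numeric array
    Returns (coef, pvalue) for the condition term, each of shape (n_cells,).
    """
    Y = np.asarray(weights, dtype=float)
    n = Y.shape[0]
    columns = [np.ones(n), np.asarray(condition, dtype=float)]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).reshape(n, -1).T)
    X = np.column_stack(columns)                  # design matrix, intercept first
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # all cells fitted at once
    resid = Y - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof       # residual variance per cell
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])          # standard error of the condition coefficient
    t = beta[1] / se
    # Two-sided p-value, normal approximation to the t distribution (assumption)
    pvalue = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return beta[1], pvalue

# Toy data (assumption): 100 bulk samples, 4 cells
rng = np.random.default_rng(1)
n_bulk = 100
condition = np.repeat([0.0, 1.0], n_bulk // 2)   # e.g. wild type vs. mutant
age = rng.normal(60, 10, n_bulk)                 # hypothetical covariate
W = rng.normal(0, 0.1, (n_bulk, 4))
W[:, 0] += 0.5 * condition + 0.01 * age          # only cell 0 depends on the condition
coef, pvalue = compare_weights(W, condition, covariates=age)
```

With thousands of cells tested, the resulting p-values would also need a multiple-testing correction, which this sketch leaves out.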

shears survival

shears KRAS mutation

shears BRAF mutation

Limitations


  • Multicollinearity, i.e. can shears distinguish between similar cell types?
  • How to validate?
  • Code is available at icbi-lab/shears