Abstract
Massively-Parallel Cytometry (MPC) experiments allow cost-effective quantification of more than 200 surface proteins at single-cell resolution. The Infinity Flow analysis protocol was developed to measure highly informative protein ‘backbone’ markers on all cells in all wells across three 96-well plates, together with well-specific exploratory protein ‘infinity’ markers, and then to use the backbone markers to impute each infinity marker on the cells in all other wells with machine learning methods. This protocol offers unprecedented opportunities for more comprehensive classification of cell types. However, several aspects of the analysis protocol can be improved, including data normalisation steps such as background correction and removal of unwanted variation. We propose an end-to-end toolbox that carefully pre-processes the raw data in FCS format and then imputes the ‘missing’ infinity markers in the wells where they were not measured. First, our pipeline performs background correction on the raw intensities, adapting a normal-exponential convolution model to remove the background noise introduced by electronic baseline restoration and fluorescence compensation. Second, unwanted technical variation such as batch effects is removed using a log-normal model with plate, column, and row factors. Third, the infinity markers are imputed using the informative backbone markers as predictors. Finally, clustering and other statistical analyses can be performed on the completed dataset. Unique features of our approach relative to the existing method include performing background correction prior to imputation and removing unwanted variation at the cell level while explicitly accounting for the potential association between biology and unwanted factors.
We benchmark our pipeline against alternative pipelines and demonstrate that our approach is better at preserving biological signals, removing unwanted variation, imputing unmeasured infinity markers, and refining cell types.
Keywords: single-cell, surface proteins, Massively-Parallel Cytometry (MPC), Infinity Flow, data normalisation, background correction, imputation, cluster analysis.
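To make the background-correction step concrete, the following is a minimal sketch of the textbook normal-exponential convolution model, not the adapted version used in the pipeline. It assumes an observed intensity X = S + B, with true signal S exponentially distributed with mean alpha and background B normally distributed with mean mu and standard deviation sigma; the corrected value is the posterior mean E[S | X = x], which is the mean of a normal distribution truncated to positive values. The function name `normexp_signal` and all parameter values are hypothetical; in practice mu, sigma, and alpha would be estimated from the data.

```python
import math


def normexp_signal(x, mu, sigma, alpha):
    """Posterior mean E[S | X = x] under X = S + B,
    with S ~ Exponential(mean=alpha) and B ~ Normal(mu, sigma^2).

    Illustrative textbook form only; the parameters are assumed known here.
    """
    if sigma <= 0 or alpha <= 0:
        raise ValueError("sigma and alpha must be positive")
    # Given X = x, S is normal with this mean and sd sigma, truncated to S > 0.
    mu_sx = x - mu - sigma ** 2 / alpha
    z = mu_sx / sigma
    if z < -30.0:
        # Far in the lower tail the normal CDF underflows; use the
        # asymptotic Mills-ratio approximation of the truncated mean.
        return -sigma / z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    # Mean of a normal distribution truncated to the positive half-line.
    return mu_sx + sigma * pdf / cdf


# Hypothetical raw intensities, including a negative value of the kind
# that compensation and baseline restoration can produce.
raw = [-50.0, 0.0, 100.0, 1000.0, 10000.0]
corrected = [normexp_signal(x, mu=100.0, sigma=50.0, alpha=2000.0) for x in raw]
```

Under this model the corrected intensities are always positive and increase with the raw intensity, so negative or near-zero raw values are mapped to small positive signals, while bright cells are shifted by roughly mu + sigma^2 / alpha.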