As I’ve recently been diving into the code for AutoSpill, this seems like a good moment for a post on how it works. This is also relevant for the autofluorescence series, which we'll return to in the future. For this post, I’ll be focusing on the AutoSpill R code, but there are also implementations in FlowJo and OMIQ, at least, so use those if you don’t code.
There are three key ways in which AutoSpill differs from traditional compensation.
Data usage. AutoSpill uses all of the data rather than selected positive and negative populations to define the spillover.
AF handling. AutoSpill uses robust linear regression to reduce the impact of autofluorescence on the spillover calculated from the fluorophore signal.
Refinement. AutoSpill applies the compensation to the controls and checks the spillover again. It does this until the error reaches a plateau or falls below a certain amount.
First, let’s look at how traditional compensation works. Traditional compensation works best with neatly bimodal data, like compensation beads or CD4-type staining. In a standard compensation workflow, you need to select the positive and negative events using a gate on a histogram for the channel.
Example of autocomp gating
Then, all we do is follow the basic process laid out by Bagwell & Adams in 1993. We determine the MFIs for the target channel and the spillover channel on both the negative and positive populations. From this, we can determine the linear relationship between the two (think y = mx + b), which is the spillover as a function of target signal.
Example calculation for single colour control: (1379 - 29.5)/(4179-79.8) = 0.329
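The same arithmetic can be written out in a few lines of R. This is only a sketch of the two-population calculation using the MFIs from the example above; the variable names are made up for illustration.

```r
# MFIs read off the gated negative and positive populations (values from the example above)
mfi.target    <- c( negative = 79.8, positive = 4179 )   # target channel
mfi.spillover <- c( negative = 29.5, positive = 1379 )   # spillover channel

# slope of the line through the two populations = spillover coefficient (the "m")
spillover.coef <- unname( diff( mfi.spillover ) / diff( mfi.target ) )

# intercept (the "b"): spillover-channel signal expected at zero target signal
spillover.int <- unname( mfi.spillover[ "negative" ] -
                         spillover.coef * mfi.target[ "negative" ] )

round( spillover.coef, 3 )   # 0.329
```

Only two summary points per channel go into this, which is why the result depends so heavily on where the positive and negative gates are drawn.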
Because we want to know about the relationship between the target and spillover channels, we typically express the spillover values as fractions of 1. If it’s easier, you can think of 1 being 100%, and the other numbers being proportional to that. That's what FlowJo does. A value of 1 (or 100% spillover) would mean we have as much signal in that channel as in the target channel. By definition, this is true for the target channel.
Spillover matrix for BUV395 and BUV496 for these data calculated by FlowJo using the traditional method
We repeat this for all the spillover channels, getting a row of spillover values like so:
Example of spillover values
An aside----------------
Why can we use a linear relationship (a set percentage) to adjust the spillover signals? The relationship between photons emitted at the different wavelengths is linear. Ignoring the fact that most of our instruments can't detect fluorescence in a strictly linear way across the entire range (example of this with older instruments from Mario Roederer), a linear relationship is a good approximation. This can be hard to see when the data are plotted on biexponential scales, as they usually are, because that distorts the low end differently in each channel. When viewed on linear scales, it's easier to see the relationship.
Biexponential view Linear view
---------------------
Right, back to it. We repeat this process for all the single colour controls, obtaining a spillover matrix like this:
Full spillover matrix
But wait, we want a compensation matrix, so what is this spillover matrix thing? Well, the compensation matrix is the inverse of the spillover matrix. The spillover matrix tells us how much off-target signal we’re getting per channel. The compensation matrix tells us how much we need to adjust the signal in the spillover channel as a proportion of the signal in the target channel. This is taken care of for you if you're using software.
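To make the spillover/compensation relationship concrete, here is a minimal sketch in base R with a made-up two-colour spillover matrix. The fluorophore names "A" and "B" and all of the numbers are invented for illustration; rows are fluorophores and columns are detectors.

```r
# toy spillover matrix: rows = fluorophores, columns = detectors (values invented)
spillover <- matrix( c( 1.00, 0.30,
                        0.05, 1.00 ),
                     nrow = 2, byrow = TRUE,
                     dimnames = list( c( "A", "B" ), c( "A", "B" ) ) )

# the compensation matrix is the matrix inverse of the spillover matrix
compensation <- solve( spillover )

# true fluorophore amounts for three events (rows = events, columns = fluorophores)
true.signal <- matrix( c( 1000,    0,
                             0, 2000,
                           500,  500 ),
                       ncol = 2, byrow = TRUE )

# what the detectors record once spillover has been mixed in
measured <- true.signal %*% spillover

# applying the compensation matrix recovers the true amounts
compensated <- measured %*% compensation
```

Comparing `compensated` with `true.signal` shows the inverse undoing the spillover exactly, which only holds here because the toy spillover matrix is known perfectly. With real controls, the quality of the compensation is limited by how well the spillover matrix was estimated.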
And that’s it for traditional compensation, nice and simple. And, you know what, that works pretty well for beads (inasmuch as your beads can provide accurate spectra). Once we get cells involved, the autofluorescence starts throwing things off, and we can have lots of cases where gating positive and negative events can be tricky, particularly for rare markers. The more weakly expressed the marker is, the worse traditional compensation will work. Notice that this is still true for unmixing: in both cases we depend fundamentally on identifying clean spectral signatures for the fluorophores.
Stuff that's harder to compensate accurately using a simple positive/negative approach
Right, so how does AutoSpill (try to) fix this? AutoSpill takes its inspiration from how I was doing manual compensation of FACSymphony panels using cell controls. Rather than picking a specific population, I would try to adjust the compensation value to bring the trend of the data to the horizontal with respect to the spillover channel, ignoring anything off the main trendline, which would be autofluorescence or other noise.
Example of manually adjusting the compensation with a messy control
I’d then repeat this for each channel, apply that partial compensation matrix to the next control (because FACSDiva), and continue through all the controls. I’d then repeat this whole process for the entire set of controls until the matrix stabilized. I’m not going to show the manual workflow for that here because I’m out of practice and it would take me much longer to do than preparing this blog post.
So, how do we automate that and what is the math behind it?
First, we recognize that adjusting the trend line is performing linear regression. We use the data to calculate the slope between the target and the spillover channels. In R, that would normally be the lm() function for a linear model. But we don’t want the autofluorescence to be the driver of the slope. For that, we use rlm() from the MASS package, which is to say, a robust linear model. This in itself is an iterative process, fitting and refitting a best-fit line to the data for a single colour control while down-weighting the data points that are doing something different and are trying to skew the slope.
Example of AutoSpill plot with points ignored
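To see why the robust fit matters, here is a small simulation comparing lm() with MASS::rlm(). Everything about the data is invented (a true slope of 0.3 plus a cluster of bright "autofluorescent" events that are dim in the target channel), and rlm() is called with its default settings apart from a higher maxit, which is not necessarily how AutoSpill configures it.

```r
library( MASS )   # provides rlm()

set.seed( 42 )

# simulated single-colour control with a true spillover slope of 0.3
true.slope <- 0.3
target     <- runif( 1000, 0, 10000 )
spill      <- true.slope * target + rnorm( 1000, sd = 50 )

# simulated autofluorescent events: dim in the target channel, bright in the spillover channel
af.target <- runif( 100, 0, 500 )
af.spill  <- runif( 100, 2000, 4000 )

control <- data.frame( target = c( target, af.target ),
                       spill  = c( spill, af.spill ) )

# ordinary least squares versus robust regression on the same events
fit.lm  <- lm( spill ~ target, data = control )
fit.rlm <- rlm( spill ~ target, data = control, maxit = 100 )

slope.lm  <- unname( coef( fit.lm )[ "target" ] )
slope.rlm <- unname( coef( fit.rlm )[ "target" ] )

# rlm() keeps a weight per event; weights near 0 mark the points it has effectively ignored
n.downweighted <- sum( fit.rlm$w < 0.5 )
```

In this toy case the off-trend events drag the lm() slope away from 0.3, while the rlm() slope should stay close to it because those events end up with very small weights.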
As in the manual version, we repeat this robust linear modeling for the target channel and every other channel to get spillover values for every channel. From the model, we get a coefficient (slope) and an intercept, just like we do with the positive/negative population method.
As before, we repeat this for every single colour control. This gives us the spillover matrix, which we can solve to get the compensation matrix.
We don’t stop there, though; that’s just the beginning. The compensation matrix is a best-fit solution for the spillover matrix, but we can see how well it actually fits by applying it to the data. So, we apply that compensation matrix to the controls and check for spillover. Any residual spillover results in a small adjustment to the spillover matrix and, subsequently, the compensation matrix. After a certain point in this process, we switch from looking at the data in an untransformed (linear) format to using a biexponential transform. This compresses the events around zero, causing them to have less weight in the modeling of the slope, which gives slightly better results.
Once AutoSpill detects that the error has started to oscillate or has fallen below a certain threshold, the iterative improvements stop, and the compensation matrix is finalized.
AutoSpill iteration error plot
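The refinement loop can be illustrated with a toy version in base R. This is a sketch under my own simplifying assumptions, not the AutoSpill code: the controls are simulated from an invented spillover matrix, ordinary lm() stands in for rlm(), there is no switch to the biexponential scale, and the update rule (multiply the current spillover estimate by the matrix of residual slopes, then rescale each row so its diagonal is 1) is a simplification chosen for clarity.

```r
set.seed( 1 )

n.event <- 2000

# invented "true" spillover matrix: rows = fluorophores, columns = detectors
spill.true <- matrix( c( 1.00, 0.30, 0.05,
                         0.10, 1.00, 0.20,
                         0.02, 0.15, 1.00 ),
                      nrow = 3, byrow = TRUE )
n.chan <- nrow( spill.true )

# one simulated single-colour control per fluorophore (rows = events, columns = detectors)
controls <- lapply( seq_len( n.chan ), function( i ) {
  signal <- runif( n.event, 0, 10000 )
  signal %o% spill.true[ i, ] +
    matrix( rnorm( n.event * n.chan, sd = 20 ), n.event, n.chan )
} )

spill.est <- diag( n.chan )   # start from "no spillover"
error.log <- numeric( 0 )     # largest residual slope at each iteration

for ( iter in 1:50 ) {
  comp <- solve( spill.est )

  # residual slopes left in each control after compensating with the current estimate
  resid.slope <- diag( n.chan )
  for ( i in seq_len( n.chan ) ) {
    x <- controls[[ i ]] %*% comp
    for ( j in setdiff( seq_len( n.chan ), i ) )
      resid.slope[ i, j ] <- coef( lm( x[ , j ] ~ x[ , i ] ) )[ 2 ]
  }

  error.log <- c( error.log, max( abs( resid.slope - diag( n.chan ) ) ) )
  if ( error.log[ iter ] < 1e-8 ) break

  # fold the residual slopes into the estimate, then rescale rows so the diagonal is 1
  spill.est <- resid.slope %*% spill.est
  spill.est <- spill.est / diag( spill.est )
}
```

Plotting `error.log` against the iteration number gives the same kind of picture as an iteration error plot: the largest residual slope drops at each pass until it is below the stopping threshold.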
How to use AutoSpill:
The original AutoSpill R package works only on Mac or Unix systems. This is because it employs mclapply to process the controls in parallel, and a different type of parallel processing is required on Windows machines. This version of the package should work in R on Windows, but lacks parallel processing altogether. It’s still somehow about as fast as the FlowJo implementation in my hands. Use the original version if you're working on a cluster.
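For anyone curious about the platform issue, here is a minimal sketch of one common way to handle it using the base parallel package. The helper name apply.over.controls is hypothetical and this is not how either version of the package is written; it simply forks with mclapply() on Mac/Unix and falls back to a serial lapply() on Windows, where forking isn't available.

```r
library( parallel )

# hypothetical helper: fork on Unix-alikes, fall back to a serial loop on Windows
apply.over.controls <- function( items, fun, n.cores = 2 ) {
  if ( .Platform$OS.type == "windows" || n.cores < 2 ) {
    lapply( items, fun )
  } else {
    mclapply( items, fun, mc.cores = n.cores )
  }
}

# toy usage: the "controls" here are just vectors of numbers
control.mean <- apply.over.controls( list( a = 1:10, b = 11:20 ), mean )
```

Both branches return a list in the same order as the input, so the calling code doesn't need to know which one ran.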
AutoSpill requires two things as input:
Your single stained control fcs files.
A CSV file listing the files by their names, assigning them to the channels to use from the cytometer and the names of the markers. In the original version, you also need to add the approximate emission wavelength for each fluorophore in a separate column. This information is hardly used, and I’ll remove this requirement in the updated version soon.
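As a rough illustration, a control file along these lines could be built and written from R. Treat the column names (filename, dye, antigen, wavelength) as my assumption about the expected layout and check them against the package vignette; the file names, channel names and markers are all hypothetical.

```r
# ASSUMPTION: column names and every value below are illustrative only;
# check the package vignette for the exact headers it expects
control.table <- data.frame(
  filename   = c( "BUV395_CD4.fcs", "BUV496_CD8.fcs" ),   # hypothetical FCS file names
  dye        = c( "BUV395-A", "BUV496-A" ),               # cytometer channel for each control
  antigen    = c( "CD4", "CD8" ),                         # marker names
  wavelength = c( 395, 496 )                              # approximate emission wavelength (nm)
)

# write to a temporary location for the example; point this at your own working directory
example.file <- file.path( tempdir(), "fcs_control_file.csv" )
write.csv( control.table, example.file, row.names = FALSE )
```

One row per single-stained control keeps the mapping between file, channel and marker explicit, and it's easy to check by eye before starting a run.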
Set the location of these items:
fcs.control.dir <- "working_directory/single_stained_controls"
control.def.file <- "working_directory/fcs_control_file.csv"
You can then set the parameters for the AutoSpill run. "final.step" is what you want if you just want it to produce a good spillover matrix.
asp <- get.autospill.param( "final.step" )
Next, read in the single-stained controls:
flow.control <- read.flow.control( fcs.control.dir, control.def.file, asp)
Now, automatically gate the cells to be used for calculating the compensation. This step uses Voronoi tessellation to figure out where your cells are on FSC/SSC. If you have a lot of debris or dead cells, at higher density than the live cells, this may not work well. That said, you may have bigger problems if you have poor cell preps. To get around this, you can pre-gate in the software of your choice and use exported pre-gated cells as the input. This may also help if your positive cells are not lymphocytes, as the Voronoi tessellation tends to pick the lymphocyte area since it tends to be the area of highest density. The gating parameters can be tweaked a bit, but auto-gating for any possible arrangement of cells on a scatter plot is not a simple coding problem.
flow.gate <- gate.flow.data( flow.control, asp )
We then extract the spillover using robust linear models on both the untransformed and biexponentially transformed data.
marker.spillover.unco.untr <- get.marker.spillover( TRUE, flow.gate, flow.control, asp )
marker.spillover.unco.tran <- get.marker.spillover( FALSE, flow.gate, flow.control, asp )
Finally, we iteratively refine the spillover by applying the calculated compensation to the controls and checking to see if they’re actually straight versus the other channels.
refine.spillover.result <- refine.spillover( marker.spillover.unco.untr,
    marker.spillover.unco.tran,
    flow.gate, flow.control, asp )
If you want a .mtx compensation matrix file that you can use in FlowJo, run the convert_spillover_to_flowjo script (available in both R and Python versions).
Thanks so much for posting this Oliver! I've been wanting to use autospill for a while but wasn't ever able to get very far with just the vignette.
On a separate note, I still ran into issues with mclapply from the original version of the package on an M2 Mac at the `gate.flow.data(flow.control, asp)` step and had to reinstall the "Windows-version", which then worked.