RNA-seq: The Pitfalls of Technical Replicates

Science, like most things, isn’t perfect. Because individual mice (or other biological samples, like humans) are different and because the way we measure things isn’t perfect, every time we do an experiment, we get slightly different results.

We cope with this variation by doing the same experiment a bunch of times. Once we’ve collected a lot of samples, we typically calculate the average and the standard deviation. If we’re worried about the biological variation (the variation from each individual mouse or human) we do lots of biological replicates, and if we’re worried about technical variation, we do technical replicates (remeasure the same individual multiple times). Often we’ll do both.

For simple and inexpensive experiments, it’s easy to do lots of replicates, both biological and technical. However, RNA-seq is time consuming and expensive, and we want to squeeze as much as we can out of the samples that we have (which can be very limited!) The good news is that because of the way technical variation affects RNA-seq results, we only need to do biological replicates with this type of experiment.

Here’s an example of biological variation alone (imagine there is no technical variation). In this example, we’re measuring read counts for an imaginary gene “X”. Each mouse has either more or less than the average for all mice. If we do enough samples, our estimate of the average of all mice will be pretty good because the measurements that are above the mean will be canceled out by the measurements that are below the mean.

This figure shows how biological variation gives us measurements that are always a little larger or smaller than the mean.

This figure shows how biological variation gives us measurements that are always a little larger or smaller than the mean.

When we add technical variation to the mix, RNA-seq will result in random values that either add or subtract read counts from each mouse’s measurement. Here’s an example where I’ve colored the biological variation orange and the technical variation green.

This figure shows how technical variation can increase or decrease the read counts we get for each sample.

This figure shows how technical variation can increase or decrease the read counts we get for each sample. It also shows the equation for the average read count.

When we calculate the average measurement from our samples, we end up with a term for biological variation and a term for technical variation. Just like before, the term for biological variation will go to zero as we add samples, because the positive values will be canceled out by the negative values. However, because the technical variation is similarly distributed, it will cancel itself out for the same reason. Thus, without doing a single technical replicate (and only doing biological replicates) we can reduce the affects of biological and technical variation and get a good estimate of the mean.

This last figure shows how the biological and technical variation can be separated into separate terms. Each of these terms goes to zero as we add more biological replicates.

This last figure shows how the biological and technical variation can be separated into separate terms. Each of these terms goes to zero as we add more biological replicates.

One problem that is often overlooked is that technical replicates alone will only eliminate the term for technical variation. The term for biological variation will only be reinforced, and this can lead to misleading results. See the video for more details.