Forkcasting
As clear as a puddle of mud

Dithering

Sometimes, worse is better

Let's say that you want to record the day's mean temperature accurately. You hook up a temperature sensor to an analogue-to-digital converter, then record that value with a Raspberry Pi.

When building this kit, you measure the output of your temperature sensor with your fancy oscilloscope, and see a signal:

A continuously-varying signal hovering just above 101 mV

Your oscilloscope output. A nice smooth line.

This has a mean of 101.1 mV.

Running your logger and looking at the output, you see something odd:

A quantized signal: a flat line at 101 mV. The original signal is also shown; it is not flat, and it sits consistently above the quantized line.

That seems to have lost some information.

Here, you find the mean is 101 mV. This is wrong! It's not that wrong, but it is wrong.

You're seeing quantization error. You must throw away some data when digitizing a signal; it's an unfortunate fact of life. You have to round the input to the nearest whole value your system can represent [1].
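Here's a tiny NumPy sketch of the same thing, with a made-up signal that drifts around 101.1 mV, well within one 1 mV step:

```python
import numpy as np

# A made-up "true" signal: a slow drift around 101.1 mV, staying well inside
# one quantization step (1 mV here), so every sample rounds the same way.
true_signal = 101.1 + 0.2 * np.sin(np.linspace(0, 20, 1000))

# A 1 mV-resolution converter: round to the nearest whole millivolt.
quantized = np.round(true_signal)

print(f"true mean:      {true_signal.mean():.1f} mV")  # ~101.1 mV
print(f"quantized mean: {quantized.mean():.1f} mV")    # 101.0 mV
```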

For whatever reason, you're not happy with this. You want to get the "true" mean of your signal, side-stepping the quantization error. Dithering fixes this.

Dithering is a little counter-intuitive. It works by making things a little worse: it adds noise to the original signal. Intuitively, this makes the signal less useful: you now have to account for the randomness in all later processing.

Adding noise to the original signal works with quantization, not against it. As the signal bounces around, some of the original data points go high enough to reach the next quantization level. Some go low enough to drop down to the one below.

If the randomness is "fair" -- just as likely to push a sample up as down -- then the jumps up and down are fair too. The only remaining "unfairness" in the signal is the original mean: the thing we want to measure. A sample sitting slightly above 101 mV is slightly more likely to be bumped up a level than dropped down one, so the small bias survives quantization and shows up in the average. After dithering your signal, you measure again:

A dithered, quantized signal. It does not look like the original: it bounces up and down between steps at 100, 101, and 102 mV. The centre of the dithered signal sits exactly over the original signal's centre, showing no bias.

Messy, but somehow tells you more about the original signal

The mean for this signal is 101.1 mV -- a more accurate measurement! In this case, worse is better.
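Here's the same made-up signal from the earlier sketch, this time with "fair" uniform noise spanning exactly one quantization step added before rounding:

```python
import numpy as np

rng = np.random.default_rng(0)

# The same made-up signal: drifts around 101.1 mV, within one 1 mV step.
true_signal = 101.1 + 0.2 * np.sin(np.linspace(0, 20, 1000))

# Dither: add zero-mean uniform noise spanning one quantization step *before*
# rounding. Each individual sample gets worse, but the rounding errors now
# cancel out on average instead of all landing on the same side.
dither = rng.uniform(-0.5, 0.5, size=true_signal.shape)
dithered = np.round(true_signal + dither)

print(f"plain quantization: {np.round(true_signal).mean():.1f} mV")  # 101.0 mV
print(f"dithered:           {dithered.mean():.1f} mV")               # ~101.1 mV
```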

You can now happily report your local temperatures with better accuracy. Your home-brew weather-station is the talk of the village. Hurray!

Worse is better

This principle isn't restricted to signal processing. You can improve a system by making part of it worse, or by injecting randomness. Let's take a few examples:

  1. Machine learning
  2. Software engineering
  3. Process engineering
  4. Forecasting

Machine learning

Many machine learning algorithms work best when you give them less detail per example. For instance, many computer vision algorithms work only on gray-scale images, or deliberately reduce the resolution.
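As a toy illustration (synthetic pixel values, the standard luma weights, and naive 2x2 block averaging -- none of it from any particular library's pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 128x128 RGB image with pixel values in 0..255.
image = rng.integers(0, 256, size=(128, 128, 3)).astype(float)

# Throw away colour: collapse the three channels with standard luma weights.
gray = image @ np.array([0.299, 0.587, 0.114])

# Throw away resolution: average each 2x2 block of pixels into one.
h, w = gray.shape
small = gray.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(image.shape, "->", gray.shape, "->", small.shape)
# (128, 128, 3) -> (128, 128) -> (64, 64)
```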

Similarly, you can over-fit if you train the algorithm too hard. The algorithm "learns" the training data itself, rather than the patterns that hold in real life. You get great scores during training, but the system does worse when applied to real problems.
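Here's a crude sketch of the effect with made-up data: a degree-9 polynomial (standing in for a model flexible enough to memorise its training set) against a plain straight line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy samples of a simple underlying trend (y = x plus noise).
x_train = np.linspace(0, 1, 10)
y_train = x_train + rng.normal(0, 0.1, size=x_train.shape)

# "Real life": clean points inside the same range, never seen in training.
x_test = np.linspace(0.05, 0.95, 100)
y_test = x_test

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-9 fit passes (almost) exactly through the noisy training
    # points, but typically does worse than the plain line on unseen points.
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```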

Sometimes, the system even gets better with fewer input features! Extra features can "distract" the algorithm with random correlations that aren't really there. This is closely related to over-fitting.

Software engineering

Unreasonable availability goals cause problems of their own -- problems that only crop up once you push past a sensible target.

First, you need every dependency to reach 100% uptime -- a very expensive requirement! And if you try to patch over flaky dependencies with retries, you might cause a retry storm.
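Fittingly, a standard way to soften a retry storm is to add randomness: jittered exponential backoff spreads the retries out so clients don't all hammer the struggling dependency at once. A sketch (the flaky_weather_api in the usage comment is made up):

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry `operation`, sleeping a random ("jittered") backoff between tries.

    The randomness is the point: if every client retried on the same fixed
    schedule, they would all hit the struggling dependency at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, cap))  # "full jitter"

# Hypothetical usage:
# temperature = call_with_backoff(lambda: flaky_weather_api.read_celsius())
```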

Second, you might annoy your customers. If you spend 6 months perfecting every change, they won't be happy that their "small change" took so long to reach them. Customers sometimes value features more than a low error rate; it's often better for the customer if you just give them something that works 99.5% of the time.

Property testing is "objectively" worse -- you generate random inputs and test a "property." Often, this property is not a perfect representation of the system. However, it is fast and easy to check, so you can run thousands of tests in the blink of an eye. This often finds more bugs than slower, more precise testing.
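For example, using the Hypothesis library, you can pin down Python's built-in sorted() with a deliberately loose property: the output is ordered and is a permutation of the input. That's not a full specification, but it's cheap to check thousands of times.

```python
from collections import Counter

from hypothesis import given, strategies as st

# A deliberately crude property: it doesn't specify sorted() exactly, only
# that the output is ordered and contains the same elements as the input.
@given(st.lists(st.integers()))
def test_sorted_is_ordered_permutation(xs):
    result = sorted(xs)
    assert all(a <= b for a, b in zip(result, result[1:]))
    assert Counter(result) == Counter(xs)

# Run with pytest; Hypothesis generates (and shrinks) the random inputs.
```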

You can deliberately break things in production to improve overall reliability. If you do have a high availability goal, then finding the bad decisions baked into your system is worthwhile. Maybe you can't actually fail over when latency is high; maybe your service just hangs. It's better to find that out when you can turn the problem off than when circumstance forces it on you. This is chaos engineering.
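A toy sketch of the idea (the names and rates are made up; real chaos-engineering tools do this with far more care):

```python
import random
import time

def chaos(operation, failure_rate=0.01, extra_latency=2.0):
    """Wrap a dependency call so a small fraction of requests fail or stall.

    Only run this somewhere you can turn it off: the point is to learn whether
    your timeouts, retries, and fail-overs actually work.
    """
    def wrapped(*args, **kwargs):
        roll = random.random()
        if roll < failure_rate:
            raise ConnectionError("injected failure")
        if roll < 2 * failure_rate:
            time.sleep(extra_latency)  # injected slow response
        return operation(*args, **kwargs)
    return wrapped

# Hypothetical usage:
# get_price = chaos(pricing_client.get_price, failure_rate=0.01)
```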

Often, people try to optimise software systems by looking at utilization. If utilization is low, then that's Bad. They'll look for things that are blocking or waiting. This works up to a point, but it can go too far: at 100% utilization a system might have high throughput, but also very high latency -- sometimes so high that customers can't get results before their browsers time out. The best thing to do in these cases is to back off from full utilization (and from maximum throughput).
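The textbook single-server queue (M/M/1 -- an idealised model, not a measurement of any real system) shows why: average latency is 1 / (service rate - arrival rate), which blows up as utilization approaches 100%.

```python
# Average time in an M/M/1 queue: W = 1 / (service_rate - arrival_rate).
service_rate = 100.0  # requests per second the server can handle

for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    arrival_rate = utilization * service_rate
    latency_ms = 1000.0 / (service_rate - arrival_rate)
    print(f"{utilization:.0%} busy -> average latency {latency_ms:.0f} ms")

# 50% busy -> 20 ms, 90% -> 100 ms, 99% -> 1000 ms: throughput is nearly
# maxed out long before the latency becomes intolerable.
```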

Process engineering

Let's say you're running a custard factory, and you're having trouble getting enough custard to the packaging line. You call in an engineer to help improve the flow rate; they reduce the pressure in the pipes, and the flow rate goes up. The engineer recognised that custard is a non-Newtonian fluid: adding pressure made it more solid and harder to move around, and releasing the pressure let it flow freely.

In Goldratt's The Goal, the protagonist's factory automates part of the work. This increases the output of one step, but overloads the bottleneck behind it. A big queue builds up in front of the bottleneck, every order gets later and later, customers get angry, and the factory starts losing money. The solution? Don't use the automation, or only use it when the bottleneck has spare capacity (i.e. a very short queue).
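A back-of-the-envelope version of the same trap, with made-up rates:

```python
def queue_after(hours, upstream_rate, bottleneck_rate):
    """Work piled up in front of the bottleneck after `hours` of running."""
    queue = 0
    for _ in range(hours):
        queue += upstream_rate                 # parts arriving from upstream
        queue -= min(queue, bottleneck_rate)   # parts the bottleneck clears
    return queue

print(queue_after(hours=40, upstream_rate=10, bottleneck_rate=10))  # 0 waiting
print(queue_after(hours=40, upstream_rate=12, bottleneck_rate=10))  # 80 waiting
```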

Forecasting

When making a forecast, many people's instinct is to gather more data: find out everything they can about the situation, try to "simulate" it in their head, and so on. This takes a long time and a lot of effort.

It also opens you up to biases: you think this project will be easy, you forget all the problems the last project had, you believe that "your" political appointee will do better than "theirs," and so on.

Strangely, it's often better to ignore the problem at hand and look for outside examples. Let's say you were trying to work out what would happen with Brexit. You could deep-dive on the local papers, polling, pundits, and so on -- or you could look at recent secession attempts. The first approach gave you roughly 4:1 odds on Remain; the second says about half of secession attempts succeed. Brexit was, at bottom, just another secession attempt, so even (1:1) odds was the better forecast.
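One way to make "better forecast" concrete after the fact is the Brier score -- the squared error of the predicted probability, where lower is better -- given that the referendum result was Leave:

```python
# P(leave) implied by each forecast.
forecasts = {
    "inside view, 4:1 on remain": 1 / 5,   # 4:1 odds for remain => P(leave) = 0.2
    "outside view, base rate":    1 / 2,   # about half of secession attempts succeed
}
outcome = 1.0  # Leave happened

for name, p_leave in forecasts.items():
    brier = (p_leave - outcome) ** 2
    print(f"{name}: Brier score {brier:.2f}")

# 0.64 vs 0.25 -- the crude base rate was the better-calibrated forecast here.
```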

In geopolitical forecasting this is called "the outside view." When estimating projects, this is called reference class forecasting.

Conclusion

Dithering adds noise to a signal so we can measure it more accurately. We can apply this principle in other places, like machine learning, software engineering, process engineering, and forecasting.

Sometimes, perfection is actively harmful. Stop trying to be locally optimal and you might stumble into something that's better overall.

[1] You could use a floating-point number, but floating-point numbers still have finite precision. That might be enough for this problem, but you'll bump into the same limit if you ever need even more precision. Plus, floating-point hardware can be expensive.