
At Analytical Flavor Systems, our job is to monitor our clients’ batches for variations in real time. Clients review their batches regularly, scoring each sensory attribute from 0 to 5 along 24 universal flavor dimensions. Additionally, every reviewer assigns the product an overall Perceived Quality (PQ) score from 1 to 7.

Our algorithms run 24 hours a day, 365 days a year, protecting our clients from shipping a bad batch and hurting their brand. We notify clients of any significant drop in post-production quality by email and phone. This blog post explains how we moved our models from the Western Electric rules to a more precise and accurate Adaptive Bayesian model.

The original model we used was the Western Electric rules, which operate on the assumption that sample means (\(\bar x\) values), or sequences of them, lying more than a certain number of standard deviations from the mean have a very low probability of being generated by random processes, and could thus be a symptom of meaningful variation (such as a beer batch flawed with dimethyl sulfide).

For example, say that in the last few weeks, three consecutive overall Perceived Quality values were more than one standard deviation below the mean. Frequentist statisticians will point out that the odds of such an event occurring (if the mean is where we expect it to be and the variations are generated by random chance) are less than 1%. They would then conclude that the variation was caused by contamination and alert the producer.
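The arithmetic behind that 1% figure is easy to check. This minimal sketch assumes in-control scores are independent and normally distributed; it is an illustration of the reasoning, not our production code.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probability that a single in-control point falls more than 1 sd below the mean.
p_single = normal_cdf(-1.0)

# Probability that three consecutive, independent points all do so.
p_three = p_single ** 3

print(f"P(one point below mean - 1 sd)       = {p_single:.4f}")
print(f"P(three in a row below mean - 1 sd)  = {p_three:.4f}")
```

A single point lands that low about 16% of the time, but three in a row happen only about 0.4% of the time, which is why the rule treats that pattern as a signal.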

Data scientists and mathematicians will note that the Western Electric rules rely heavily on Frequentist assumptions and a normal distribution, which does not always fit our data very well. Because PQ scores are bounded above and below, the normal distribution is a poor approximation for scores close to 1 or 7. Behind the scenes of the Bayesian model, we assume that the data follows a Poisson rather than a normal distribution: a Poisson distribution's variance equals its mean, so its spread changes with how high or low the expected PQ values are.
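To see the property that motivates the switch, this short numerical check sums the Poisson probability mass function for a few arbitrary rates and confirms that the variance tracks the mean, unlike a normal model with a fixed standard deviation. The rates are illustrative only.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_mean_and_variance(lam: float, k_max: int = 100) -> tuple[float, float]:
    """Estimate mean and variance by summing the PMF from 0 up to k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


for lam in (0.5, 2.0, 6.0):
    mean, var = poisson_mean_and_variance(lam)
    print(f"lambda={lam}: mean={mean:.3f}, variance={var:.3f}")
```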

Furthermore, there are several confounding factors in the data that the model needs to be robust enough to account for. For example, a person's mood has an unconscious impact on how they perceive a batch's quality; this could mean that reviews done on Monday have a lower-than-normal Perceived Quality score on average. Similarly, there could be long-term seasonal effects, such as a preference for stouts in the winter and India Pale Ales in the summer. These are perception shifts in the underlying population of reviewers and consumers of our clients' products, not true batch variations.
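One simple way to handle a day-of-week effect like the Monday dip is to estimate each weekday's average offset from the overall mean and subtract it before looking for quality drops. This is a minimal sketch that assumes the effect is additive; the function names and review data are hypothetical, and the same idea could be applied per month for seasonal effects.

```python
from collections import defaultdict
from statistics import mean


def weekday_offsets(reviews: list[tuple[str, float]]) -> dict[str, float]:
    """Average PQ deviation from the overall mean for each weekday."""
    overall = mean(pq for _, pq in reviews)
    by_day: dict[str, list[float]] = defaultdict(list)
    for day, pq in reviews:
        by_day[day].append(pq)
    return {day: mean(scores) - overall for day, scores in by_day.items()}


def adjust_for_weekday(
    reviews: list[tuple[str, float]], offsets: dict[str, float]
) -> list[tuple[str, float]]:
    """Remove the estimated weekday effect from each score (unknown days unchanged)."""
    return [(day, pq - offsets.get(day, 0.0)) for day, pq in reviews]


# Hypothetical history in which Monday reviews run lower than other days.
history = [("Mon", 4.5), ("Mon", 4.4), ("Tue", 5.0), ("Tue", 5.1), ("Wed", 4.9), ("Wed", 5.0)]
offsets = weekday_offsets(history)
adjusted = adjust_for_weekday([("Mon", 4.6), ("Tue", 5.0)], offsets)
print(offsets)
print(adjusted)
```

After adjustment, a 4.6 on a Monday is no longer unusually low, because Mondays are expected to score below the overall mean.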

We switched our model to the Bayesian Poisson statistic because it gives us more useful information. After we generate two hypotheses (\(PQ_{normal}\) and \(PQ_{low}\)) from past data, the model tells us how likely it is that the Perceived Quality is at a normal value versus how likely it is that it has dropped significantly (to \(PQ_{low}\)), given the data we have. This is more useful than the Western Electric statistic, which only tells us how probable data this extreme would be if the product were normal, not how probable it is that the product is normal.

When deciding how much the data supports either hypothesis, the Bayesian statistic takes two factors into account.

\(P(PQ_{low})\), the prior: based on past information, how likely is it that this hypothesis is true (i.e., how probable is a variation in beer X)?

\(P(data | PQ_{low})\), the likelihood: if \(PQ_{low}\) were true, how likely is it that we would see this data?
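Assuming \(PQ_{normal}\) and \(PQ_{low}\) are the only two hypotheses under consideration, so that \(P(PQ_{normal}) = 1 - P(PQ_{low})\), Bayes' rule combines these two factors into the posterior probability that quality has dropped:

\[
P(PQ_{low} | data) = \frac{P(data | PQ_{low})\, P(PQ_{low})}{P(data | PQ_{low})\, P(PQ_{low}) + P(data | PQ_{normal})\, P(PQ_{normal})}
\]

A small prior can still yield a large posterior when the data is far more likely under \(PQ_{low}\) than under \(PQ_{normal}\), and vice versa.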

For example, if you tasted earthiness in a beer, you might guess that the flavor was caused by 2‑isobutyl‑3‑methoxypyrazine (IBMP) or 2‑isopropyl‑3‑methoxypyrazine (IPMP). If IBMP is the more common culprit in general, then before tasting anything the prior \(P(IBMP)\) would be higher and \(P(IPMP)\) lower.

The likelihoods tell the other half of the story. If IPMP is the compound more strongly associated with an earthy taste, \(P(earthiness | IPMP)\) would be high while \(P(earthiness | IBMP)\) would be much lower. Background knowledge updates the priors too: knowing that your water came from a source with a relatively high concentration of IPMP would raise \(P(IPMP)\). Both factors are important when considering any theory, and the Bayesian statistic does a good job of balancing both, provided you start with the right assumptions and enough background data to determine the underlying probabilities of your intended inference.
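To make the balance concrete, here is a sketch with invented numbers: IBMP gets the larger prior, IPMP the larger likelihood of producing earthiness. The probabilities are hypothetical placeholders, not measured values, and the two compounds are treated as the only possible causes.

```python
def posteriors(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over mutually exclusive, exhaustive hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical priors: IBMP assumed to be the more common cause overall.
priors = {"IBMP": 0.7, "IPMP": 0.3}
# Hypothetical likelihoods: P(earthiness | compound).
likelihoods = {"IBMP": 0.1, "IPMP": 0.8}

post = posteriors(priors, likelihoods)
for compound, p in post.items():
    print(f"P({compound} | earthiness) = {p:.2f}")
```

With these numbers the strong likelihood outweighs the weaker prior, leaving IPMP at about 77%.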

We calculate \(P(data | PQ_{low})\) and \(P(data | PQ_{normal})\) using our database of over 10,000 coffee reviews, 6,000 beer reviews, and thousands of reviews across our clients' other products. We assume a Poisson distribution, which resembles a normal distribution when the expected count is large but is skewed when it is small. The key difference is that, instead of measuring each point's distance from the mean in standard deviations, the model counts only the points that deviate significantly from the mean, then compares that count to the number we would expect to deviate significantly through normal random variation.
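The counting step can be sketched as follows. Here "significant" is assumed to mean more than one standard deviation below the historical mean, and the bad-review probability is estimated as the historical fraction of such reviews; the threshold choice, function name, and scores are all hypothetical.

```python
from statistics import mean, stdev


def count_bad_reviews(scores: list[float], threshold: float) -> int:
    """Number of reviews whose PQ falls below the threshold."""
    return sum(1 for s in scores if s < threshold)


# Hypothetical historical PQ scores for one product (1-7 scale).
history = [5.0, 5.5, 4.5, 6.0, 5.0, 4.0, 5.5, 6.0, 5.0, 4.5]
threshold = mean(history) - stdev(history)  # assumed definition of "significant"

# Historical fraction of bad reviews: an estimate of P_badrev.
p_badrev = count_bad_reviews(history, threshold) / len(history)

# A new window of reviews for the latest batch.
window = [4.0, 3.5, 5.0, 4.0, 5.5]
observed = count_bad_reviews(window, threshold)
expected = len(window) * p_badrev
print(f"threshold={threshold:.2f}, observed={observed}, expected={expected:.2f}")
```

Three bad reviews where about half of one was expected is the kind of excess the Poisson comparison is designed to weigh.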

We find \(\lambda\), the expected number of deviations, from the probability of generating a bad review (\(P_{badrev}\)): for a window of \(n\) reviews, \(\lambda = n \cdot P_{badrev}\). Now that the initial model is built, we have to take the other confounding factors into account and expand the model to a joint probability distribution over \(\mathbb{R}^{24}\), one dimension per flavor attribute.
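Putting the pieces together for a single score (not the 24-dimensional joint model), this sketch turns a count of bad reviews into Poisson likelihoods under each hypothesis and then into a posterior. It assumes each hypothesis has its own bad-review rate; the function name, rates, and prior are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_low(k: int, n: int, p_bad_normal: float, p_bad_low: float, prior_low: float) -> float:
    """Posterior probability of PQ_low after seeing k bad reviews out of n."""
    lam_normal = n * p_bad_normal  # expected bad reviews if quality is normal
    lam_low = n * p_bad_low        # expected bad reviews if quality has dropped
    like_normal = poisson_pmf(k, lam_normal)
    like_low = poisson_pmf(k, lam_low)
    numerator = like_low * prior_low
    return numerator / (numerator + like_normal * (1.0 - prior_low))


# Hypothetical numbers: 10 reviews, 4 of them bad, and a 5% prior on a drop.
post = prob_low(k=4, n=10, p_bad_normal=0.1, p_bad_low=0.4, prior_low=0.05)
print(f"P(PQ_low | data) = {post:.3f}")
```

Even with a 5% prior, four bad reviews out of ten pushes the posterior to roughly 40%, while zero bad reviews would drive it well below the prior.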

The initial values for the Bayesian model's parameters will be found using not only the means and standard deviations of the data points, but also a neural network and principal component analysis to isolate key variables. This, again, is because our perception of taste changes depending on a wide variety of factors, all of which this model must account for. In contrast to standard Western Electric Frequentist analysis, which operates under strict, simplistic normality assumptions, our model can dynamically infer what a normal and a contaminated batch would look like given conditions we have encountered in the past.
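The post does not detail how principal component analysis is wired into the model, so the following only illustrates what PCA does: a pure-Python sketch that finds the leading principal component of a small, made-up score matrix by power iteration. It assumes the leading eigenvalue is unique; a real pipeline would more likely use a linear-algebra library, and the neural-network part is not shown.

```python
def center(rows: list[list[float]]) -> list[list[float]]:
    """Subtract each column's mean so every attribute is centered at zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def first_component(cov: list[list[float]], iters: int = 500) -> list[float]:
    """Leading eigenvector of a covariance matrix, found by power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Hypothetical scores for three attributes across 11 reviews: the first two
# move together, the third is small alternating noise unrelated to them.
rows = [[float(t), 2.0 * t, 0.5 if t % 2 == 0 else -0.5] for t in range(-5, 6)]
pc1 = first_component(covariance(center(rows)))
print([round(x, 3) for x in pc1])
```

The first component points along roughly (0.447, 0.894, 0): the two correlated attributes dominate it, and the unrelated noise contributes almost nothing.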

We are pleased that our new Bayesian model will find deviant batches more accurately than the Frequentist model. However, what we are really excited about is that our models give us more usable information. For example, by using feature extraction to dynamically predict the normal and low Perceived Quality values, we can determine which correlated factors have the biggest impact on Perceived Quality.
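As a much simpler stand-in for feature extraction, this sketch ranks candidate factors by the absolute Pearson correlation between each factor and PQ. The factor names and values are hypothetical, and correlation alone does not establish that a factor causes a change in quality.

```python
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def rank_factors(factors: dict[str, list[float]], pq: list[float]) -> list[tuple[str, float]]:
    """Factors sorted by the strength (absolute value) of their correlation with PQ."""
    scored = [(name, pearson(values, pq)) for name, values in factors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical PQ scores and candidate factors for five reviews.
pq = [5.5, 5.0, 4.0, 6.0, 4.5]
factors = {
    "is_monday": [0, 0, 1, 0, 1],
    "fermentation_temp_c": [18, 19, 22, 18, 21],
    "reviewer_count": [3, 4, 3, 4, 3],
}
ranked = rank_factors(factors, pq)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```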

This information could help producers see how to optimize their brewing or reviewing process. Thus, instead of just flagging an unexplained variation, our model will return a full diagnosis, so that producers know what they need to fix.

If you’re a producer of beer, coffee, or spirits who wants to benefit from quality control, sign up for our 31-day free trial here. If you’re a data scientist passionate about food & beverage manufacturing and applying data science to human sensory data, check out our openings here.