Central Limit Theorem Board

Recently I was looking into buying a Galton Board, but didn't find one I liked that was available in the Netherlands. Then yesterday I was thinking about the Central Limit Theorem, which I had looked into some time ago, and got the idea to create a board analogous to the Galton Board that demonstrates the Central Limit Theorem. So today I did some JavaScripting to implement this idea.

So, based on the size of the initial dataset, the start integer, and the max integer, the initial dataset is filled with random values. This initial dataset can be regenerated using the 'Update dataset' button. Then, using the sample size and the number of samples, a random sample of the given sample size is drawn from the dataset, the average of that subset is calculated, and a 'ball' is dropped into the 'swim lane' that matches this average. This is repeated for the given number of samples.
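To make the mechanics concrete, here is a minimal sketch in JavaScript of what the board does behind the scenes. The function names (generateDataset, sampleAverage, runSamples) and the binning of balls by rounded average are my own assumptions for illustration, not the actual code of the board.

```javascript
// Minimal sketch of the board's logic (illustrative names, not the actual implementation).

// Fill the initial dataset with random integers between start and max (inclusive).
function generateDataset(size, start, max) {
  const data = [];
  for (let i = 0; i < size; i++) {
    data.push(start + Math.floor(Math.random() * (max - start + 1)));
  }
  return data;
}

// Draw one random sample of sampleSize values (with replacement) and return its average.
function sampleAverage(dataset, sampleSize) {
  let sum = 0;
  for (let i = 0; i < sampleSize; i++) {
    sum += dataset[Math.floor(Math.random() * dataset.length)];
  }
  return sum / sampleSize;
}

// Drop one 'ball' per sample: count how many sample averages land in each swim lane.
function runSamples(dataset, sampleSize, numberOfSamples) {
  const lanes = {}; // lane (rounded average) -> number of balls
  for (let i = 0; i < numberOfSamples; i++) {
    const lane = Math.round(sampleAverage(dataset, sampleSize));
    lanes[lane] = (lanes[lane] || 0) + 1;
  }
  return lanes;
}
```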

These values can be changed to manipulate the outcome and see the effects. The dataset is only regenerated after clicking 'Update dataset', so you can run multiple times with the same random data as the initial data source. When the sample size is around 30 or above, we should more or less recognize a usable normal distribution. Have fun trying this out a few times with different values, especially the sample size, so you can see its effect.
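Continuing the sketch above, you could compare how the lanes fill up for different sample sizes: with a sample size of 1 the shape of the counts simply follows the (roughly uniform) dataset, while around 30 it starts to look bell-shaped.

```javascript
// Compare lane counts for a small and a larger sample size (using the sketch above).
const dataset = generateDataset(1000, 1, 10);

console.log('sample size 1: ', runSamples(dataset, 1, 5000));
console.log('sample size 30:', runSamples(dataset, 30, 5000));
// With sample size 1 the counts stay roughly flat across lanes 1..10;
// with sample size 30 they concentrate around the dataset mean (about 5.5).
```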

Additional remarks:

I don't really explain the Central Limit Theorem here, so please find out more about it if it isn't clear to you (for example, watch some explanation videos). For me, when I learned about the Central Limit Theorem, I finally started understanding statistics (think of t-tests, ANOVA, ANCOVA). Oh, and by the way, I see I have used the term average here; that's because I am more used to working with a spreadsheet than with statistics tools like SPSS, so you can interchange that word with the word mean, as that might be a better fit.

So, I think there are some key points we can learn from this. Randomness is of utmost importance. n=1 (sample size 1) says nothing (try it on the board), although it does seem to convince us sometimes if we are that one person experiencing it. Even repeated n=1 results are not convincing; here we have to take into account that we might remember the hits while forgetting the misses (or vice versa) (availability heuristic), and/or that there is a confirmation bias, or that the n=1 outcomes are pre-selected on the outcome, like with the testimonials that quacks and outright charlatans use. So don't trust testimonials, and create a thinking heuristic to be aware/sceptical when you see testimonials being used. Oh, and not to forget, repeated n=1 might also violate the randomness.

Another key takeaway is that statistical significance is related to the sample size. Don't be blinded by statistical significance, especially with a large sample size; we also need to look at the effect size (now I remember a university teacher telling me this, but hearing it and really understanding it are two different things). I was doing a search while writing this and found a nice quote from a scientific paper: "Very large samples tend to transform small differences into statistically significant differences - even when they are clinically insignificant."
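To make that concrete, here is a small sketch (my own illustration, not taken from the quoted paper) showing how the same tiny difference in means becomes 'statistically significant' purely by increasing the sample size, using a simple two-sample z statistic with a known standard deviation:

```javascript
// Two-sample z statistic for a difference in means, assuming a known standard deviation
// and equal group sizes. Illustrative only.
function zStatistic(meanDiff, stdDev, nPerGroup) {
  const standardError = stdDev * Math.sqrt(2 / nPerGroup);
  return meanDiff / standardError;
}

const meanDiff = 0.1; // a tiny, practically unimportant difference
const stdDev = 1.0;

for (const n of [30, 1000, 100000]) {
  const z = zStatistic(meanDiff, stdDev, n);
  console.log(`n per group = ${n}, z = ${z.toFixed(2)}, significant at 5%: ${Math.abs(z) > 1.96}`);
}
// n = 30     -> z ≈ 0.39 (not significant)
// n = 1000   -> z ≈ 2.24 (significant)
// n = 100000 -> z ≈ 22.36 (highly significant), while the effect size is still just as tiny
```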

These are some of the insights that looking at the Central Limit Theorem can give you. It is really worthwhile learning about it if you haven't heard of it before, so keep on learning.