Central Limit Theorem Board

Recently I was looking into buying a Galton Board, but didn't find one I liked that was available in the Netherlands. Then yesterday I was thinking about the Central Limit Theorem, which I had looked into some time ago, and got the idea to create a board analogous to the Galton Board that demonstrates the Central Limit Theorem. So today I did some JavaScripting to implement this idea.

So, based on the size of the initial dataset, the start integer, and the max integer, the initial dataset is filled with random values. This initial dataset can be regenerated using the 'Update dataset' button. Then, using the sample size and the number of samples, a random sample of the given sample size is drawn from the dataset, the average of that subset is calculated, and a 'ball' is dropped into the 'swim lane' that matches this average. This is repeated for the given number of samples.
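To make the mechanics concrete, here is a minimal sketch in JavaScript of what the board does behind the scenes. The function names (generateDataset, sampleAverage, runSamples) and the binning of balls by rounded average are my own assumptions for illustration, not the actual code of the board.

```javascript
// Minimal sketch of the board's logic (illustrative names, not the actual implementation).

// Fill the initial dataset with random integers between start and max (inclusive).
function generateDataset(size, start, max) {
  const data = [];
  for (let i = 0; i < size; i++) {
    data.push(start + Math.floor(Math.random() * (max - start + 1)));
  }
  return data;
}

// Draw one random sample of sampleSize values (with replacement) and return its average.
function sampleAverage(dataset, sampleSize) {
  let sum = 0;
  for (let i = 0; i < sampleSize; i++) {
    sum += dataset[Math.floor(Math.random() * dataset.length)];
  }
  return sum / sampleSize;
}

// Drop one 'ball' per sample: count how many sample averages land in each swim lane.
function runSamples(dataset, sampleSize, numberOfSamples) {
  const lanes = {}; // lane (rounded average) -> number of balls
  for (let i = 0; i < numberOfSamples; i++) {
    const lane = Math.round(sampleAverage(dataset, sampleSize));
    lanes[lane] = (lanes[lane] || 0) + 1;
  }
  return lanes;
}
```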

These values can be changed to manipulate the outcome and see the effects. The dataset is only regenerated after clicking 'Update dataset', so you can run multiple times with the same random data as the initial data source. When the sample size is around 30 or above, we should more or less recognize a usable normal distribution. Have fun trying this out a few times with different values, especially the sample size, so you can see its effect.
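Continuing the sketch above, you could compare how the lanes fill up for different sample sizes: with a sample size of 1 the shape of the counts simply follows the (roughly uniform) dataset, while around 30 it starts to look bell-shaped.

```javascript
// Compare lane counts for a small and a larger sample size (using the sketch above).
const dataset = generateDataset(1000, 1, 10);

console.log('sample size 1: ', runSamples(dataset, 1, 5000));
console.log('sample size 30:', runSamples(dataset, 30, 5000));
// With sample size 1 the counts stay roughly flat across lanes 1..10;
// with sample size 30 they concentrate around the dataset mean (about 5.5).
```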

Additional remarks:

I don't really explain the Central Limit Theorem here, so please find out more about it if it isn't clear to you (for example, watch some explanation videos). For me, when I learned about the Central Limit Theorem, I finally started understanding statistics (think of t-tests, ANOVA, ANCOVA). Oh, and by the way, I see I have used the term average here; that's because I am more used to working with a spreadsheet than with statistics tools like SPSS, so you can interchange that word with the word mean, as that might be a better fit.

So, I think there are some key points we can learn from this. Randomness is of utmost importance. n=1 (sample size 1) says nothing (try it on the board), although it does seem to convince us sometimes if we are that one person experiencing it. Even repeated n=1 results are not convincing; here we have to take into account that we might remember the hits while forgetting the misses (or vice versa) (availability heuristic), and/or that there is a confirmation bias, or that the n=1 outcomes are pre-selected on the outcome, like with the testimonials that quacks and outright charlatans use. So don't trust testimonials, and create a thinking heuristic to be aware/sceptical when you see testimonials being used. Oh, and not to forget, repeated n=1 might also violate the randomness.

Another key takeaway is that statistical significance is related to the sample size. Don't be blinded by statistical significance, especially with a large sample size; we also need to look at the effect size (now I remember a university teacher telling me this, but hearing it and really understanding it are two different things). I was doing a search while writing this and found a nice quote from a scientific paper: "Very large samples tend to transform small differences into statistically significant differences - even when they are clinically insignificant."
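To make that concrete, here is a small sketch (my own illustration, not taken from the quoted paper) showing how the same tiny difference in means becomes 'statistically significant' purely by increasing the sample size, using a simple two-sample z statistic with a known standard deviation:

```javascript
// Two-sample z statistic for a difference in means, assuming a known standard deviation
// and equal group sizes. Illustrative only.
function zStatistic(meanDiff, stdDev, nPerGroup) {
  const standardError = stdDev * Math.sqrt(2 / nPerGroup);
  return meanDiff / standardError;
}

const meanDiff = 0.1; // a tiny, practically unimportant difference
const stdDev = 1.0;

for (const n of [30, 1000, 100000]) {
  const z = zStatistic(meanDiff, stdDev, n);
  console.log(`n per group = ${n}, z = ${z.toFixed(2)}, significant at 5%: ${Math.abs(z) > 1.96}`);
}
// n = 30     -> z ≈ 0.39 (not significant)
// n = 1000   -> z ≈ 2.24 (significant)
// n = 100000 -> z ≈ 22.36 (highly significant), while the effect size is still just as tiny
```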

These are some of the insights that looking at the Central Limit Theorem can give you. It is really worthwhile learning about it if you haven't heard of it before, so keep on learning.