Central Limit Theorem Board
(updated)
Recently I was looking for a Galton Board, but I didn't find one I liked that was available in the Netherlands. Then yesterday I was thinking about the Central Limit Theorem, which I had looked into some time ago, and got the idea to create a board analogous to the Galton Board that demonstrates the Central Limit Theorem, so today I did some
JavaScripting to implement this idea.
Based on the size of the initial dataset, the start integer and the max integer, the initial dataset is filled with random integers. This initial dataset can be regenerated using the
Update dataset button. Then, for each of the given number of samples, a random sample of the given sample size is drawn from the dataset, the mean of that sample is
calculated, and a 'ball' is dropped into the 'swim-lane' that fits that mean.
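For the curious, here is a minimal sketch of that core loop in JavaScript. It is an illustration of the idea rather than the board's actual code; the function names, the sampling with replacement, and the binning by a fixed lane width are my assumptions.

// Hypothetical sketch of the board's core loop; names and bin logic are assumptions.

// Fill the initial dataset with random integers between startInt and maxInt (inclusive).
function buildDataset(size, startInt, maxInt) {
  const data = [];
  for (let i = 0; i < size; i++) {
    data.push(startInt + Math.floor(Math.random() * (maxInt - startInt + 1)));
  }
  return data;
}

// Draw one random sample (with replacement) and return its mean.
function sampleMean(data, sampleSize) {
  let sum = 0;
  for (let i = 0; i < sampleSize; i++) {
    sum += data[Math.floor(Math.random() * data.length)];
  }
  return sum / sampleSize;
}

// Drop one 'ball' per sample: count how many sample means land in each swim-lane.
function runBoard(data, sampleSize, numSamples, laneWidth) {
  const lanes = {};
  for (let s = 0; s < numSamples; s++) {
    const lane = Math.floor(sampleMean(data, sampleSize) / laneWidth);
    lanes[lane] = (lanes[lane] || 0) + 1;
  }
  return lanes; // lane index -> number of balls in that lane
}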
These values can be changed to manipulate the outcome and to see the effects. The dataset is only regenerated after clicking 'Update dataset', so you can run multiple times using the same random data
as the initial data source. When the sample size is higher (commonly somewhere around or above 30 is used), the standard error of the mean shrinks, so the sample means cluster in a narrower, higher peak; this makes it easier (more reliable) to see differences between two
normal distributions. Please note that I haven't converted the graph to a standard normal distribution, so the 'balls' don't fall into
the swim-lane based on the standard deviation; I might look at that some other time. Have fun trying this out a few times with different values, especially the sample size, so you can see their effect.
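To see that 'narrower, higher peak' numerically, this small sketch (reusing buildDataset and sampleMean from the sketch above, so again an illustration rather than the board's code) prints the spread of the sample means for a few sample sizes:

// Standard deviation of a list of numbers.
function stdDev(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// For each sample size, collect 10000 sample means and measure how spread out they are.
const data = buildDataset(1000, 1, 100);
for (const n of [1, 5, 30, 100]) {
  const means = [];
  for (let s = 0; s < 10000; s++) means.push(sampleMean(data, n));
  console.log('n=' + n + ': sd of sample means ~ ' + stdDev(means).toFixed(2));
}
// The spread shrinks roughly like sd(data)/sqrt(n), so the peak gets higher and narrower.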
Additional remarks:
I don't really explain the Central Limit Theorem here, so please find out more about it if it isn't clear to you (for example, watch some
explanation videos). In short: the distribution of sample means approaches a normal distribution as the sample size grows, whatever the shape of the underlying distribution. For me, when I learned about
the Central Limit Theorem I finally started understanding statistics (think of t-tests, ANOVA, ANCOVA).
So, I think there are some key points we can learn from this. Randomness is of utmost importance. n=1 (sample size 1) says nothing (try it on the board), although it does seem to convince us sometimes if we are that one person experiencing it.
Even repeated n=1 results are not convincing; here we have to take into account that we may remember the hits while forgetting the misses (or vice versa) (availability heuristic)
and/or that there is confirmation bias, or that the n=1 outcomes are pre-selected on the outcome, as with the testimonials that quacks and outright charlatans use. So don't trust testimonials, and
create a thinking heuristic to be aware/skeptical when you see testimonials being used. Oh, and not to forget, repeated n=1 most probably also violates randomness.
Another key takeaway is that statistical significance is related to the sample size. Don't be blinded by statistical significance, especially if we have a large sample size; we also need to look
at the effect size (now I remember the university teacher telling us this, but hearing it and really understanding it are two different things). I was doing a search while writing this, and found a nice quote from
a scientific paper:
"Very large samples tend to transform small differences into statistically significant differences - even when they are clinically insignificant."
These are some of the insights that looking at the Central Limit Theorem can give you. It is really worthwhile learning about it if you haven't heard of it before, so keep on learning.
I have updated this text (May 30th 2024) to correct some inaccuracies that I realized were in here.