Normal distribution and histogram in R
I recently spent a lot of time looking for a tool that would let me easily draw a histogram with a normal distribution curve on the same diagram. I could create the histogram in OOCalc by using the FREQUENCY() function and creating a column chart, but I found no way to add a curve, so I gave up and started searching for something more powerful than OpenOffice. Of course, no Windows applications were allowed.
I googled my problem before trying Maxima or something similar, and I found R. I hadn’t heard of the R project before, but I decided to give it a try, and it was worth it. Even if I had been able to do the same in Octave or Maxima, I don’t think it could have been any easier.
Installation
In my case it was just:
emerge dev-lang/R
You should probably search for the package specific to your distribution (as far as I know, R is available in the Ubuntu repository), or download it from the official site.
After installation, type
R
in the terminal, to launch the R console.
Where the magic begins
We’ll do everything in just a few lines of code. Let’s start by preparing the data. All we need is a comma-separated list of numbers (probably much longer than in this example), and we’ll create a vector to hold this data:
x = c(4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17)
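For a longer data set, typing the values by hand gets tedious. Here is a minimal sketch, assuming the numbers are already available as one comma-separated string (the variable name raw is made up for this example). scan() can also read the same kind of data from a file through its file= argument.

```r
# Hypothetical input: the measurements as one comma-separated string
raw = "4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17"

# scan() splits the text on commas and returns a numeric vector
x = scan(text=raw, sep=",", quiet=TRUE)

length(x)  # number of values read
```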
Creating the histogram is as simple as this:
hist(x)
but you may want to use something like:
hist(x, col="#d3d3d3", xlim=c(4.10, 4.22), ylim=c(0, 20), probability=TRUE)
where col="#d3d3d3" is the histogram color, xlim and ylim define the range of x and y values, and probability=TRUE gives you probability density on the y axis.
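To see what probability=TRUE changes, it may help to look at the numbers hist() computes. This is a small sketch that uses plot=FALSE so that nothing is drawn; the variable name h is only for illustration.

```r
x = c(4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17)

# plot=FALSE returns the computed histogram without drawing it
h = hist(x, plot=FALSE)

h$breaks   # bin boundaries chosen by R
h$counts   # number of values in each bin (the default y axis)
h$density  # what probability=TRUE puts on the y axis instead

# The bar areas (density times bin width) add up to 1
sum(h$density * diff(h$breaks))
```

Because the bars have a total area of 1, they are on the same scale as a probability density function such as dnorm().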
Then, keeping in mind that sd(x) returns the standard deviation and mean(x) returns the mean of the values in x, and that dnorm() gives the normal density, we can add a normal distribution curve to our diagram:
s = sd(x)
m = mean(x)
curve(dnorm(x, mean=m, sd=s), add=TRUE)
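The add=TRUE option tells R to add the curve to the existing diagram instead of replacing it. The col= option for changing the curve’s color is also available. Note that inside curve() the name x stands for the plotting variable, not for our data vector, which is why the mean and standard deviation are computed beforehand.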
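Putting the pieces together, here is one possible version of the whole script. Instead of a hard-coded ylim, this sketch derives the limit from the data so that neither the bars nor the curve is cut off. The names h, peak and top, the 5% headroom, and the red curve color are assumptions made for illustration.

```r
x = c(4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17)
s = sd(x)
m = mean(x)

# Bar heights, taken from a histogram that is computed but not drawn
h = hist(x, plot=FALSE)
# Highest point of the normal curve, which is reached at the mean
peak = dnorm(m, mean=m, sd=s)
# Upper y limit with a little headroom (5% is an arbitrary choice)
top = 1.05 * max(h$density, peak)

hist(x, col="#d3d3d3", xlim=c(4.10, 4.22), ylim=c(0, top), probability=TRUE)
curve(dnorm(x, mean=m, sd=s), add=TRUE, col="red")
```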
And that’s all. Quite simple, isn’t it?