Normal distribution and histogram in R

I recently spent a lot of time looking for a tool that would let me easily draw a histogram and a normal distribution curve on the same diagram. I could create the histogram in OOCalc by using the FREQUENCY() function and making a column chart, but I found no way to add a curve, so I gave up and started looking for something more powerful than OpenOffice. Of course, no Windows applications were allowed.

I googled my problem before trying Maxima or something similar, and I found R. I hadn’t heard of the R project before, but I decided to give it a try, and it was worth it. Even if I had been able to do the same in Octave or Maxima, I don’t think it could have been done more easily.

Installation

In my case (Gentoo) it was just:

emerge dev-lang/R

You should probably look for the package specific to your distribution (as far as I know, R is available in the Ubuntu repositories), or download it from the official site.

After installation, type

R

in the terminal, to launch the R console.

Where the magic begins

We’ll do everything in just a few lines of code. Let’s start by preparing the data. All we need is a comma-separated list of numbers (probably much longer than in this example), which we’ll store in a vector:

x = c(4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17)
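If the numbers live in a file, they don’t have to be pasted into the console by hand. Here is a minimal sketch, assuming a plain text file of comma-separated values. The file below is a temporary one created only for illustration (your real path will differ); scan() is the base R function that reads such data.

```r
# Hypothetical input file: a temporary file stands in for your real data file.
path <- tempfile(fileext = ".txt")
writeLines("4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17", path)

# scan() reads numeric values by default; sep = "," splits on commas.
x <- scan(path, sep = ",", quiet = TRUE)

length(x)   # number of values read
summary(x)  # quick sanity check of the range
```

After this, x can be used exactly like the hand-typed vector.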

Creating the histogram is as simple as this:

hist(x)

but you may want to use something like:

hist(x, col="#d3d3d3", xlim=c(4.10, 4.22), ylim=c(0, 20), probability=TRUE)

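where col="#d3d3d3" is the fill color of the bars, xlim and ylim set the ranges of the x and y axes, and probability=TRUE puts probability density on the y axis instead of counts. The density scale matters here: it makes the total area of the bars equal to 1, which is the same scale the normal curve uses. The limits above suit my data, so adjust them to yours.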
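To see what probability=TRUE actually changes, the histogram can be computed without drawing it. This is a small sketch using the example vector from above; plot = FALSE makes hist() return the bins instead of plotting them.

```r
x <- c(4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17)

# Compute the histogram without drawing it.
h <- hist(x, plot = FALSE)

h$breaks   # bin edges chosen by R
h$counts   # number of values in each bin
h$density  # bar heights used when probability = TRUE

# On the density scale the bar areas (height * width) add up to 1.
sum(h$density * diff(h$breaks))
```

The largest value in h$density is a handy guide when choosing the upper limit for ylim.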

Then, keeping in mind that sd(x) returns the sample standard deviation of the values in x, mean(x) returns their mean, and dnorm() gives the density of the normal distribution, we can add a normal distribution curve to our diagram:

s = sd(x)
m = mean(x)
curve(dnorm(x, mean=m, sd=s), add=TRUE)

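The add=TRUE option tells R to draw the curve on the existing diagram instead of replacing it; the curve then spans the x range of that diagram. Note that the x inside curve() is not our data vector: curve() treats x as the plotting variable and evaluates the expression along the x axis. The col= option for changing the curve’s color is also available.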
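Putting the pieces together, here is a sketch of the whole thing as one script. Two details are my own additions rather than part of the steps above: the upper y limit is computed from the data so that neither the bars nor the top of the curve get cut off, and the curve is drawn in red. The normal density is highest at the mean, so dnorm(m, mean = m, sd = s) gives the peak of the curve.

```r
x <- c(4.14, 4.14, 4.16, 4.15, 4.19, 4.13, 4.16, 4.17)

m <- mean(x)  # sample mean
s <- sd(x)    # sample standard deviation

# Highest bar on the density scale, and the highest point of the curve.
bars <- hist(x, plot = FALSE)
peak <- dnorm(m, mean = m, sd = s)
top  <- 1.05 * max(bars$density, peak)  # 5% headroom (arbitrary choice)

hist(x, col = "#d3d3d3", xlim = c(4.10, 4.22), ylim = c(0, top),
     probability = TRUE)

# Inside curve(), x is the plotting variable, not the data vector above.
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")
```

With a different data set, only the first line and the xlim values should need changing.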

And that’s all. Quite simple, isn’t it?