Lab #1 Photon counting with a PM tube - statistics of light

Two weeks - 8/29 & 9/5 (Labor Day is 9/4). Report is due on 9/12.

For show-and-tell on 9/5 you should have progressed to step 5.

Goals 

What are the fundamental physical limitations on the detection of
light?  How do you specify how bright a source of light is?  How
precisely can that brightness be specified?  What determines the
precision?  What are the statistical properties of photons?

Reading assignments: 

 - Unix tutorial
 - IDL tutorial
 - TeX tutorial
 - Taylor Chapters 1 & 2

Summary

This lab comprises four topics: 

Collect data from the photometer experiment and make a plot of the
counts per sample versus time.

Plot histograms of sets of samples from the photometer.

Compute the mean and standard deviation for your samples and
investigate the variability of the count rate.

Compare the observed histograms with the theoretical Poisson
probability distribution function.


1) GETTING DATA FROM THE PHOTOMULTIPLIER TUBE (PMT)

Before you modify anything associated with the PMT make sure you
have read and understood the following.

 - Read the PMT web page 

http://ugastro.berkeley.edu/class/ay122/photon/photon.html

The PMT is very sensitive: it detects individual photons.  Do not
expose the PMT to direct room light!  Do not expose it to any light
that drives the count rate above 1 MHz.  To figure the count rate,
multiply the counts per sample by the sample rate, e.g., 1000 counts
per sample at a sample rate of 1 kHz corresponds to 1,000,000 counts
per second, or 1 MHz.
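
The count rate check can be done at the IDL prompt.  This is only a
minimal sketch: the variable names and the numbers are illustrative,
not measurements.

```idl
; Estimate the count rate from the counts per sample and the sample rate.
; The values below are illustrative, not measured.
counts_per_sample = 1000.0                     ; counts recorded in one sample
sample_rate       = 1000.0                     ; samples per second (Hz)
count_rate = counts_per_sample * sample_rate   ; counts per second
print, 'Count rate (Hz): ', count_rate
; Warn at or above the 1 MHz limit
if count_rate ge 1.0e6 then print, 'Too bright: reduce the light level!'
```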

The first step is to acquire a time sequence of digitized data and
save the data as a file.  Log into one of the Sun workstations.  Make
a Unix subdirectory for your project and type the following at the
Unix prompt:

   % echo counter nsamples=100 rate=1000 fname=c.100.1000.dat | sendphot

(Note: % indicates the Unix prompt.)  This cryptic command takes 100
samples of data from the photomultiplier tube at a rate of 1000 Hz,
and puts the data in a file called c.100.1000.dat.

You should get a message indicating success. Since you asked for 100
samples, this file should contain 100 lines. In this example how long
is the counter active for? What about nsamples=1000 rate=100?

A sample rate of 1000 Hz means that the computer accumulates counts
from the PMT for 1/1000 of a second per sample.  At this rate you
should get a few counts per sample.  If the number of counts is
significantly higher or lower than this then ask an instructor for
help.  Play around with the sample rate and convince yourself that the
length of the sample is inversely proportional to the sample rate.

The maximum sample rate is 5000 Hz. Note that not all sample rates are
permitted. If you ask for a counter rate that is not supported then
the counter program will choose the closest one available.


2) READING DATA

The first challenge is to read in the data file. View your file using
emacs and notice the format. It is a single column of numbers.  The
IDL procedure READCOL reads a column of numbers from a file into
computer memory and stores it in an IDL variable; in this example the
variable is called "mydata":

   IDL> readcol, 'c.100.1000.dat', mydata

Notice that the file name must be enclosed in single quotes.  In IDL,
text enclosed in single quotes is called a string.
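
After reading a file it is worth checking that IDL holds what you
expect.  The following is a sketch, assuming the file from step 1 is in
the current directory; READCOL comes from the IDL Astronomy User's
Library, and the FORMAT keyword is used here on the assumption that the
file holds whole-number counts.

```idl
; Read the single column of counts and check what was read.
readcol, 'c.100.1000.dat', mydata, format='L'   ; 'L' = read as long integers
help, mydata                                    ; shows the type and array size
print, 'Number of samples: ', n_elements(mydata)
print, 'First five samples: ', mydata[0:4]
```

If you asked for 100 samples, the number of samples printed should be
100.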


3) PLOTTING DATA

Once you have the data in an IDL variable you can plot it using the
IDL command

   IDL> plot, mydata

To make a useful graph you need to plot your data as a function of
time. Label the x-axis (time) and the y-axis (counts).  Make sure that
your labels include units. A plot title is also helpful and can be
used to denote the sample rate. Invoke IDL help by typing a question
mark at the IDL command line. Look up "PLOT" in the index and figure out
how to add a title and axis labels to your plot.
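
One possible way to build the time axis and label the plot is sketched
below.  It assumes "mydata" was read from c.100.1000.dat and that the
sample rate was 1000 Hz; the variable names "rate" and "t" are
illustrative.

```idl
; Plot counts per sample against time with labelled axes.
; Assumes mydata was taken at a sample rate of 1000 Hz.
rate = 1000.0                                ; sample rate in Hz
t = findgen(n_elements(mydata)) / rate       ; start time of each sample (s)
plot, t, mydata, psym=10, $
      xtitle='Time (s)', ytitle='Counts per sample', $
      title='PMT counts, sample rate = 1000 Hz'
```

PSYM=10 draws the data in histogram (stair-step) style, which suits
counts accumulated over a finite sample time.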

A useful on-line resource is 

http://idlastro.gsfc.nasa.gov/idl_html_help/home.html


4) STATISTICS---Making a Histogram

Calculate the mean and standard deviation of the count rate.  The IDL
function "TOTAL" will turn out to be very useful.  Bin the data and
make a histogram of the counts. Binning means sorting the data into
unique categories, and counting the number of occurrences of those
categories.  Use my example program in the statistics hand-out to help
you solve this programming problem.
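
For orientation, here is a minimal sketch of these calculations.  It is
not the example program from the hand-out; it uses IDL's built-in
HISTOGRAM function with bins one count wide, and assumes "mydata" holds
non-negative counts.

```idl
; Mean, standard deviation, and histogram of the counts per sample.
; Assumes mydata holds non-negative counts read with READCOL.
n    = n_elements(mydata)
xbar = total(mydata) / n                             ; sample mean
s    = sqrt(total((mydata - xbar)^2) / (n - 1.0))    ; sample standard deviation
print, 'Mean = ', xbar, '   Standard deviation = ', s

; Bin the counts: h[k] is the number of samples containing exactly k counts.
h = histogram(long(mydata), min=0, binsize=1)
k = lindgen(n_elements(h))                           ; count values 0, 1, 2, ...
plot, k, h, psym=10, xtitle='Counts per sample', ytitle='Number of samples'
```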

Does the histogram plot that you have made really reflect the data
that you collected? Carefully compare the list of counts in the data
file and the plot.

Once you can plot histograms with confidence, repeat the experiment,
say, six times and compare the results. Does the histogram change?  Do
you always get the same mean count rate?  Become adept at inspecting
the histogram plot and guessing what the mean and standard deviation
are.

Calculate the mean and standard deviation of the six mean count rates
you just measured.  If you want to automate the procedure of acquiring
data, create a file containing the sendphot command above, then cut and
paste the line as many times as needed, changing whatever parameters
you need. When your file is ready for execution save it and type at the
Unix prompt
 
  % source myfile.script

Be sure to use a unique file name for each sequence of data. Examining
the data in IDL will challenge your ability to write "FOR" loops. A
quick and sophisticated way to approach repetitive tasks involves
writing the entire sequence of data acquisition in IDL using FOR
loops.

The simplest IDL FOR loop can be executed at the command line. Try this:

   IDL> for i=0,9 do print,i
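
A longer loop needs BEGIN and ENDFOR and is easiest to run from a file.
The sketch below takes six data sets from inside IDL and prints the mean
of each.  The file names c.100.1000.0.dat to c.100.1000.5.dat are
hypothetical, and using SPAWN to send the acquisition command to the
Unix shell is one possible approach, not necessarily the intended one.

```idl
; Sketch: take six data sets from inside IDL and print the mean of each.
; The output file names are hypothetical; any unique names will do.
nsets = 6
xbar  = fltarr(nsets)
for i = 0, nsets-1 do begin
   fname = 'c.100.1000.' + strtrim(i, 2) + '.dat'
   ; SPAWN hands the same acquisition command used above to the Unix shell
   spawn, 'echo counter nsamples=100 rate=1000 fname=' + fname + ' | sendphot'
   readcol, fname, d
   xbar[i] = total(d) / n_elements(d)
   print, fname, '  mean = ', xbar[i]
endfor
end
```

Save this in a file, e.g., sixsets.pro (a hypothetical name), and run
it from the IDL prompt with ".run sixsets".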

If you are plotting multiple sequences of data investigate what
happens when you set the IDL plotting variable

   IDL> !p.multi = [0,2,3]

What does this do? To get back to normal plotting set

   IDL> !p.multi=0

Now repeat with a bigger sample of data, i.e., increase the number of
samples by a factor of 10, but keep the sample rate the same as
before.

   % echo counter nsamples=1000 rate=1000 fname=c.1000.1000.dat | sendphot

Again, take six sets of data and repeat the exercise of calculating
the mean and standard deviation for each longer set of data. What do
you notice about the mean count rate and the standard deviation for
these sequences?  Calculate the means and standard deviations of the six
count rates you just measured. Why is the ensemble of measurements
different when you take 100 samples and 1000 samples?


5) MEAN AND STANDARD DEVIATION

Let's try and get to the root of these variations.
Show that there is a relation between the number of counts and the
standard deviation. Take a sequence of data with increasingly long
sample times (i.e., slower sample rates), e.g.,

 echo counter nsamples=100 rate=5000 fname=c.100.5000.dat | sendphot
 echo counter nsamples=100 rate=3333 fname=c.100.3333.dat | sendphot
 echo counter nsamples=100 rate=2500 fname=c.100.2500.dat | sendphot
 echo counter nsamples=100 rate=1666 fname=c.100.1666.dat | sendphot
 echo counter nsamples=100 rate=1250 fname=c.100.1250.dat | sendphot
 echo counter nsamples=100 rate=1111 fname=c.100.1111.dat | sendphot
 echo counter nsamples=100 rate=1000 fname=c.100.1000.dat | sendphot
 echo counter nsamples=100 rate=833  fname=c.100.833.dat  | sendphot
 echo counter nsamples=100 rate=714  fname=c.100.714.dat  | sendphot
 echo counter nsamples=100 rate=666  fname=c.100.666.dat  | sendphot
 echo counter nsamples=100 rate=625  fname=c.100.625.dat  | sendphot
 echo counter nsamples=100 rate=555  fname=c.100.555.dat  | sendphot
 echo counter nsamples=100 rate=500  fname=c.100.500.dat  | sendphot
 echo counter nsamples=100 rate=454  fname=c.100.454.dat  | sendphot
 echo counter nsamples=100 rate=416  fname=c.100.416.dat  | sendphot
 echo counter nsamples=100 rate=384  fname=c.100.384.dat  | sendphot
 echo counter nsamples=100 rate=370  fname=c.100.370.dat  | sendphot

Where do these odd sample rates come from? The counter in the PC has a
clock rate of 10 kHz, and the sample rate must be this clock rate
divided by an integer. So if you ask for a sample rate of 3000 Hz the
counter will give you the nearest allowed value: 10000 Hz/3 = 3333 Hz.
The fastest that the counter can operate at is 10000 Hz/2 = 5000 Hz.
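
The arithmetic can be tried out in IDL.  This is a sketch under the
assumption that the allowed rates are exactly 10 kHz divided by an
integer of 2 or more; how the counter program itself chooses the
nearest rate is not specified here, and the variable names are
illustrative.

```idl
; Sketch: find the allowed sample rate closest to a requested rate,
; ASSUMING allowed rates are (10 kHz clock) / n for integers n >= 2.
clock     = 10000.0                   ; counter clock rate in Hz
requested = 3000.0                    ; rate asked for, in Hz
n         = findgen(9999) + 2.0       ; candidate divisors 2, 3, ..., 10000
allowed   = clock / n                 ; corresponding sample rates in Hz
diff      = min(abs(allowed - requested), imin)   ; imin = index of closest
print, 'Divisor: ', n[imin], '   Allowed rate (Hz): ', allowed[imin]
```

For a requested rate of 3000 Hz this selects the divisor 3, i.e., about
3333 Hz, in agreement with the example above.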

Calculate the mean count and the standard deviation for each
sequence.  If the arrays "xbar" and "s" hold the means and standard
deviations, use the command

   IDL> plot, xbar, s^2

to make a plot of the variance (the standard deviation squared) versus
the mean.  Now overplot a line representing y=x,

   IDL> oplot, xbar, xbar

i.e., plot the mean versus itself.  What does this tell you about the
relation between mean and variance for counting (Poisson) statistics?
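
One way to fill the arrays "xbar" and "s" is sketched below.  It
assumes the seventeen files listed above exist in the current
directory; the loop variable and the other helper names are
illustrative.

```idl
; Sketch: mean and standard deviation for each sample rate,
; followed by a plot of variance versus mean.
rates = [5000, 3333, 2500, 1666, 1250, 1111, 1000, 833, 714, $
         666, 625, 555, 500, 454, 416, 384, 370]
nr   = n_elements(rates)
xbar = fltarr(nr)
s    = fltarr(nr)
for i = 0, nr-1 do begin
   fname = 'c.100.' + strtrim(rates[i], 2) + '.dat'   ; e.g. c.100.5000.dat
   readcol, fname, d
   n = n_elements(d)
   xbar[i] = total(d) / n
   s[i]    = sqrt(total((d - xbar[i])^2) / (n - 1.0))
endfor
plot, xbar, s^2, psym=4, xtitle='Mean counts per sample', $
      ytitle='Variance of counts per sample'
oplot, xbar, xbar                                     ; the line y = x
end
```

Because the loop spans several lines, save the sketch in a file and run
it with ".run" rather than typing it at the prompt.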


6) POISSON DISTRIBUTION

Plot a histogram for one of your sequences with a small count rate, e.g.,
2-4 counts per sample, and lots of samples, e.g., 1000. Calculate the
mean count rate and compare the histogram with the theoretical Poisson
distribution,

   P(x,mu) = mu^x / ( factorial(x) * exp(mu) ). 

Use IDL's OPLOT procedure to compare the observations and
prediction. 

Think about that! How do you compare a histogram and a theoretical
probability distribution!? The Poisson distribution gives a
probability. You have measured counts. Explain how to choose the
correct scaling factor (or normalization) to compare the measured and
theoretical distributions.
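
A sketch of such a comparison follows.  It assumes "mydata" holds the
counts per sample for one long sequence with a non-zero mean.  The
scaling used (number of samples times probability) is one natural
choice; you should still convince yourself, and explain in your report,
why it is appropriate.

```idl
; Sketch: overplot the Poisson prediction on a histogram of counts.
; Assumes mydata holds the counts per sample and that the mean is > 0.
n  = n_elements(mydata)
mu = total(mydata) / n                        ; measured mean counts per sample
h  = histogram(long(mydata), min=0, binsize=1)
x  = findgen(n_elements(h))                   ; count values 0, 1, 2, ...
; P(x,mu) = mu^x exp(-mu) / x!, computed with logarithms; gamma(x+1) = x!
p  = exp(x * alog(mu) - mu - lngamma(x + 1.0))
plot, x, h, psym=10, xtitle='Counts per sample', ytitle='Number of samples'
oplot, x, n * p, psym=4                       ; expected number of samples
```

Working with logarithms through LNGAMMA avoids overflow in the
factorial when the counts per sample become large.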

Does the Poisson distribution provide a good description of the data?
Now arrange for the counts per sample to be larger (be careful
that the count rate does not exceed 1 MHz).  Aim for several hundred
counts per sample. Plot the histogram again.  What has happened to the
shape of the histogram?  Calculate the mean and standard deviation and
over-plot the corresponding Gaussian probability distribution,

   P(x,mu,sigma) = exp(-0.5*((x-mu)/sigma)^2)/(sigma * sqrt(2 * !pi)).

Is a Gaussian curve a good approximation to the Poisson distribution?
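
The Gaussian comparison can be sketched in the same way.  This again
assumes "mydata" holds the counts per sample, uses bins one count wide,
and scales the curve by the number of samples as an assumed
normalization.

```idl
; Sketch: overplot a Gaussian with the measured mean and standard deviation.
n     = n_elements(mydata)
mu    = total(mydata) / n
sigma = sqrt(total((mydata - mu)^2) / (n - 1.0))
h = histogram(long(mydata), min=0, binsize=1)
x = findgen(n_elements(h))                    ; count values 0, 1, 2, ...
g = exp(-0.5 * ((x - mu) / sigma)^2) / (sigma * sqrt(2 * !pi))
plot, x, h, psym=10, xrange=[mu - 4*sigma, mu + 4*sigma], $
      xtitle='Counts per sample', ytitle='Number of samples'
oplot, x, n * g                               ; Gaussian scaled to the sample size
```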


7) STANDARD DEVIATION OF THE MEAN

The more events you count the more accurately you can measure the
number of counts per sample (i.e., the count rate).  To illustrate the
effect take *ten* sets of data with a given number of samples, say
16. Choose a fixed sample rate, say 1000 Hz.

For each of these ten sets calculate the mean.  Due to statistical
variations the ten means will be different, so also calculate the mean
of the means and the standard deviation of the means (SDOM).  The SDOM
is a measure of how precisely we know the average counts per sample.
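
A sketch of the calculation for one sample size is given below.  The
file names s.16.0.dat to s.16.9.dat are hypothetical; substitute
whatever names you used for your ten data sets.

```idl
; Sketch: mean of the means (MOM) and standard deviation of the means (SDOM)
; for ten data sets of 16 samples each. File names are hypothetical.
nsets = 10
means = fltarr(nsets)
for i = 0, nsets-1 do begin
   readcol, 's.16.' + strtrim(i, 2) + '.dat', d
   means[i] = total(d) / n_elements(d)        ; mean of data set i
endfor
mom  = total(means) / nsets                            ; mean of the means
sdom = sqrt(total((means - mom)^2) / (nsets - 1.0))    ; std. dev. of the means
print, 'MOM = ', mom, '   SDOM = ', sdom
end
```

Wrapping this in an outer loop over the number of samples gives the
arrays needed for the plots requested below.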

How does the SDOM vary with the number of samples? Intuition suggests
that if we have more samples in each of our ten measurements the SDOM
will be smaller. To quantify this effect repeat with 2, 4, 8, 16, 32,
64, 128, 256, 512, 1024, 2048 etc. samples. Don't vary the sample
rate.

For each sample size consider the ten data sets and calculate the mean
of the means (MOM) and the standard deviation of the means (SDOM).
Plot the MOM and the SDOM as a function of the number of samples.
Describe how the MOM and SDOM vary as the sample size increases.
Based on your knowledge of Poisson statistics predict the SDOM given
the measured mean count per sample and the sample size. Use the IDL
OPLOT procedure to compare your prediction with the data. If I want to
improve the accuracy of a measurement of the mean by a factor of two,
by what factor do I need to increase the number of samples?

How accurate is your best estimate of the count rate, i.e., how
accurate is the MOM?

Is it possible to construct a light source for the photometer
experiment that would not show variations in the count rate?


Write up your lab report as a LaTeX document describing each of the
above exercises. Show your results by including IDL plots in your
report.

Your report is due on September 12 at 6PM. NO EXTENSIONS WILL BE
GRANTED.

EQUIPMENT 

PMT + dark box, counter, dim light source of controllable
intensity. Access to IDL programming environment.