\documentclass[psfig,preprint]{aastex}
%\documentstyle[12pt,aasms4]{article}
%\documentstyle[11pt,psfig,aaspp4]{article}
%\documentstyle[aas2pp4]{article}
\begin{document}
\title{FINE TIMES WITH FOURIER TRANSFORMS (FT's WITH FT's) \\ or \\
``WHY DOES THAT FFT OUTPUT LOOK SO WEIRD???''}
``Spectral analysis'' means describing a time-varying signal
by its power spectrum, otherwise known as its Fourier spectrum or its
frequency spectrum. It should be obvious that such spectra are essential for
astronomers: for example, spectra tell us not only about pulsar periods,
but also about chemical composition in stars and Doppler velocities.
But the description of signals in terms of spectra or Fourier components
has importance in many fields, and this becomes obvious on just a little
reflection. A prime example: a bridge is subject to shaking by
earthquakes; if its natural resonance frequency is close to the
frequency of ground movement, you've got trouble!
Spectral analysis is an integral part of some of our labs. We
will sample signals at regular intervals with our computers, a process
known as ``discrete sampling'', and our computers will change the
voltages to digital numbers, a process known as ``digitization''. We'll
then do a discrete Fourier transform to calculate the power spectrum.
There are some intricacies with this process\dots
\tableofcontents
\section{THE INTEGRAL FOURIER TRANSFORM: TIME $\leftrightarrow$ FREQUENCY}
\subsection{Classical Fourier transforms}
We use the Fourier transform (FT) to convert a time-varying
voltage $E(t)$ from the time to the frequency domain $E(\nu)$, and {\it
vice versa}. The formally correct equations are
\begin{mathletters} \label{eqone}
\begin{equation}
E(\nu) = \int_{-\infty}^{\infty} E(t) e^{[2 \pi i]\nu t} dt \;
\end{equation}
\noindent and, to go the other way,
\begin{equation}
E(t) = \int_{-\infty}^{\infty} E(\nu) e^{-[2 \pi i]\nu t} d\nu \;
\end{equation}
\end{mathletters}
In our case, we are going to modify this a bit so that we don't run into
infinities. We will write\dots
\begin{equation} \label{eqtwo}
E(\nu) = \lim_{T \to \infty} {1 \over 2T}
\int_{-T}^{+T} E(t) e^{[2 \pi i]\nu t} dt \; .
\end{equation}
\noindent We are often interested in the {\it power spectrum}. Power
goes as amplitude squared, so the power spectrum $P(\nu ) \propto E(\nu
)^2$. A small complication: $E(\nu)$ can be complex, but the power
spectrum is just watts per Hz and must be real. We take care of this by
defining
\begin{equation}
P(\nu ) = E(\nu ) \times [E(\nu )]^* \; ,
\end{equation}
\noindent where the superscripted $[E(\nu)]^*$ means the complex
conjugate. This multiplication is the equivalent of ``detection''.
\section{EFFECTS OF $T$ BEING FINITE: INTEGRAL TRANSFORMS}
There's a fundamental fact of the real world: we don't live
forever, and even if we did TAC's wouldn't allow us infinite telescope
time for our precious projects, so we never can allow $T \rightarrow
\infty$. This frustrates the old politicians who want absolute control
over everything, forever. Less seriously in some circles---and more in
others---it has two major ramifications for Fourier transform devotees:
\begin{enumerate}
\item It limits the frequency resolution.
\item It produces ``spectral leakage'': The result at a given
frequency in the Fourier transform spectrum is contaminated by those at
{\it all other} frequencies in the spectrum!
\end{enumerate}
We treat these two ramifications in the following subsections.
\subsection{Spectral Resolution} \label{spctres}
Equations \ref{eqone} and \ref{eqtwo} are fine as they stand,
but when we sample a signal $T$ is finite. This limits our spectral
resolution. To understand this, consider the analytic solution of
equation \ref{eqtwo} for a monochromatic cosine wave of frequency
$\nu_s$, that is $E(t) = \cos(2 \pi \nu_s t)$. For $T \rightarrow
\infty$, $E(\nu) = 0$ except for $\nu = \nu_s$; this is commonly
expressed by the term ``Dirac delta function''. But more generally, we
have as the analytic solution
\begin{equation}
E(\nu) = {\sin (2 \pi (\nu - \nu_s) T) \over 2 \pi (\nu - \nu_s)
T} \; .
\end{equation}
\noindent and the corresponding power spectrum is
\begin{equation}
P(\nu) = \left( {\sin (2 \pi (\nu - \nu_s) T) \over 2 \pi (\nu - \nu_s)
T}
\right)^2 \; .
\end{equation}
\noindent It is simpler to define $\delta \nu = (\nu - \nu_s)$ and write
\begin{equation}
E(\nu) = {\sin (2 \pi \delta \nu T) \over 2 \pi \delta \nu T} \; .
\end{equation}
\noindent and
\begin{equation}
P(\nu) = \left ({\sin (2 \pi \delta \nu T) \over 2 \pi \delta \nu T}
\right)^2 \; .
\end{equation}
\noindent We see that this is just the famous ${\sin x \over x}$
behavior. This quantitatively expresses the degradation in frequency
resolution when you sample for a finite time; roughly, the spectral
resolution is the width of this function, $\Delta \nu \sim {1 \over T}$.
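This behavior is easy to check numerically. Here is a minimal sketch in
Python (standard library only; the window half-length $T = 10$ s and the
0.443 half-power factor are illustrative choices, not from the text)
that evaluates the power pattern from the equation above and confirms
the unit peak and the first null at $\delta \nu = {1 \over 2T}$:

```python
import math

def power_pattern(dnu, T):
    """P(nu) = (sin(2*pi*dnu*T) / (2*pi*dnu*T))**2 for a cosine
    observed over the finite window [-T, +T]; dnu = nu - nu_s."""
    x = 2.0 * math.pi * dnu * T
    if x == 0.0:
        return 1.0                      # the limit sin(x)/x -> 1 as x -> 0
    return (math.sin(x) / x) ** 2

T = 10.0                                # arbitrary window half-length, seconds
peak = power_pattern(0.0, T)            # 1.0: full response at dnu = 0
null = power_pattern(1.0 / (2 * T), T)  # first null at dnu = 1/(2T)
half = power_pattern(0.443 / (2 * T), T)  # roughly the half-power point
```

The half-power width works out to $\sim 1/T$, matching the rough
statement above.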
In optical and IR astronomy, the need for high spectral
resolution leads to the requirement of long time delays, which in turn
means long path lengths. For grating and prism spectrometers, this can
be achieved only by using physically large devices. For a Fabry-Perot,
the long path is achieved by multiple reflections. In terms of the
conventional definition of resolving power, ${\Delta \lambda \over
\lambda} \sim {\lambda \over {\rm size}}$. There's no way around this!
\subsection{Spectral Leakage}
Not only is the infinitesimally-narrow delta function replaced by
a function of width $\sim {1 \over 2T}$, but the function ${\sin x
\over x}$ also has ``sidelobes'': it spreads power out over a huge range
of frequencies, albeit---after squaring, anyway---at a low level. The
spread-out power gets weaker as you move away from the signal, that is,
as $\delta \nu$ increases---but it doesn't decrease monotonically.
Rather, the sidelobes mean that the spread-out power goes to zero
periodically, at intervals $\delta \nu = {1 \over 2T}$. In colloquial
terms, the power of the monochromatic signal ``leaks'' away in the power
spectrum to all frequencies. Not surprisingly, this is called
``spectral leakage''. It's covered in \S 12.7 of {\it Numerical
Recipes}.
You can reduce spectral leakage by using a ``window function''
$W(t)$ that is not ``square''. What on Earth does this mean? In equation
\ref{eqtwo}, when you use a finite $T$ you are integrating over a
``window'' of total length $2T$, as follows:
\begin{equation}
E(\nu) = {1 \over 2T}
\int_{-T}^{+T} W(t) E(t) e^{[2 \pi i]\nu t} dt \; .
\end{equation}
\noindent In equation \ref{eqtwo}, the window function $W(t)$ is unity;
when we let $W(t) = 1$ for the time window of length $2T$, then this is
called a ``square window function''. A square window function produces
``sharp edges'' in the sense that the function $E(t)$ has its full
amplitude for all $t$ within the window and then cuts off precipitously
at the ends of the window. {\it It's this sudden cutoff---the sharp
edge---that produces spectral leakage.}
We can reduce spectral leakage by making $W(t)$ a function that
produces a {\it gradual} cutoff so as to eliminate the sharp edge. We
make $W(t)$ a function that gradually falls to zero at the window edges
$-T$ and $T$. As one possible example among many, consider the Hanning
weighting function, which is commonly used in radio astronomy:
\begin{equation}
W(t) = {1 \over 2} \left( 1 + \cos{\pi t \over T} \right) \; .
\label{weighting}
\end{equation}
\noindent This drastically reduces the sidelobes. This comes at a
price, though: the spectral resolution gets worse---by about a factor
of two. The reason is that the window function $W(t)$ goes to zero
at the ends of the time window $2T$, which effectively makes the window
shorter. {\it Note:} We also use the term ``weighting function'' for
$W(t)$.
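You can see the tradeoff in a short numerical experiment. This sketch
(Python, standard library only; the 64-point record and the off-bin test
frequency $k_0 = 8.3$ are assumed illustrative values) computes the
power spectrum of a cosine with a square window and with the Hanning
taper of equation (\ref{weighting}), then compares the strongest
far-out sidelobe relative to the peak:

```python
import cmath
import math

N = 64
k0 = 8.3       # signal frequency in cycles per record -- deliberately off-bin

signal = [math.cos(2 * math.pi * k0 * n / N) for n in range(N)]
square = [1.0] * N
# Hanning taper W(t) = (1/2)(1 + cos(pi t / T)) mapped onto samples 0..N-1
hanning = [0.5 * (1.0 - math.cos(2 * math.pi * n / N)) for n in range(N)]

def power_spectrum(window):
    spec = []
    for k in range(N):
        e = sum(window[n] * signal[n] * cmath.exp(2j * math.pi * k * n / N)
                for n in range(N))
        spec.append(abs(e) ** 2)
    return spec

def sidelobe_ratio(spec):
    """Strongest spectral point well away from the k ~ 8 peak, as a
    fraction of the peak power (positive-frequency half only)."""
    peak = max(spec[:N // 2 + 1])
    side = max(spec[k] for k in range(13, N // 2 + 1))
    return side / peak

ratio_square = sidelobe_ratio(power_spectrum(square))
ratio_hanning = sidelobe_ratio(power_spectrum(hanning))
# Hanning leaks far less power into distant channels than the square window
```

The Hanning sidelobes come out far below the square-window ones, at the
cost of a main lobe roughly twice as wide.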
\section{DISCRETELY SAMPLED SIGNALS}
When I was a teenager we had vinyl records. They contained {\it
continuously-sampled} signals that were directly imprinted on the vinyl.
Today we focus on CD's. These contain {\it discretely-sampled} signals
that are digitized and written on to the little silver disk. The fact
that CD's contain discretely sampled data means that the signal is known
only for certain specific times. In particular, the signal is not
sampled {\it between} these times. Does this mean that CD's don't sound
as good as the vinyl records?
Regarding the calculation of our power spectrum with a FT, you'd
think that the replacement of an integral by a sum would be
straightforward, with no subtleties. But this {\it isn't the case}.
There {\it are} subtleties, and they are {\it crucial}. Understanding
them, and dealing with them properly, makes your CD's sound just as good
as vinyl. So read on: we'll first provide a qualitative introduction
and then go into details.
\subsection {Discretely-Sampled Signals: The Nyquist Criterion.}
When your CD music is recorded, or in astronomy when we observe
a pulsar or sample an incoming signal to determine the spectrum, we
sample the incoming signal sequentially in time, at regular intervals
called the {\it sampling interval} $\Delta t = t_{smpl}$. Equivalently,
we are sampling at a regular rate called the sampling frequency
$\nu_{smpl} = {1 \over t_{smpl}}$.
In discrete sampling, we first require that the signal be
limited in bandwidth. That is, the highest frequency in its spectrum
must be limited to an upper cutoff, which we call the bandwidth $B$;
thus, the frequencies in the signal extend from 0 to $B$ Hz. Then we
must sample the signal periodically at a fast enough
rate---specifically, we require $\nu_{smpl} \ge 2 B$. This is the
``Nyquist criterion''. If the Nyquist criterion is violated, we have
the problem of {\it aliasing}, which means that signals with frequencies
$\nu > B$ {\it appear} as {\it lower} frequencies. Remember those
movies showing cars and the wheels that seem to rotate backwards? That's
a real-world example of aliasing: the movie frames discretely sample the
scene at too slow a rate to faithfully reproduce the wheel's rotation.
You can understand aliasing by looking at Figure 1.
To put it another way: When we sample at rate $\nu_{smpl}$, the
maximum signal frequency that can be faithfully reproduced is
$\nu_{smpl} \over 2$. This is called the Nyquist frequency and here we
denote this by $f_N = {\nu_{smpl} \over 2}$. Clearly, the signal's
bandwidth $B$ must satisfy $B \le f_N$. All this is called the {\it
sampling theorem}.
How rapidly must music be sampled for recording on a CD? The
human ear responds to something like 20 kHz. To prevent aliasing, we
need to sample at twice this, at about 40 kHz.
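Aliasing is not an abstraction: once sampled, an out-of-band tone is
{\it literally indistinguishable} from its alias. In this sketch
(Python, standard library; the 30 kHz test tone is an assumed
illustrative out-of-band signal), a tone above the Nyquist frequency of
a 40 kHz sampler produces exactly the same samples as a 10 kHz tone
($\nu_{smpl} - 30$ kHz $= 10$ kHz):

```python
import math

nu_smpl = 40_000.0                       # sample at 40 kHz, so f_N = 20 kHz
times = [n / nu_smpl for n in range(16)]

tone_30k = [math.cos(2 * math.pi * 30_000.0 * t) for t in times]  # above f_N!
tone_10k = [math.cos(2 * math.pi * 10_000.0 * t) for t in times]  # its alias

# once sampled, the two tones are indistinguishable
max_diff = max(abs(a - b) for a, b in zip(tone_30k, tone_10k))
```

No Fourier transform was computed anywhere above---the information was
lost at the sampling step itself.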
\subsection {A Popular MISCONCEPTION!!!}
Somebody, sometime, will probably tell you that aliasing has to
do with {\it discrete Fourier transforms}. Tell them to go jump in the
lake. It has to do with the sampling rate. Period. Aliasing occurs if
you don't sample fast enough, and affects your results {\it whether or
not} you are dealing with Fourier transforms. If you don't believe me,
just look again at Figure 1. And remember what happens to rotating
wheels in movies.
\section{DISCRETE SAMPLING AND THE FOURIER TRANSFORM}
We use the Fourier transform to convert from the time to the
frequency domain, and {\it vice versa}. For continuously-sampled
signals we use the Fourier integral, equation \ref{eqtwo}. For
discretely-sampled signals we have to replace this by a summation. This
is the {\it discrete Fourier transform.}
\subsection {The maximum recoverable bandwidth: the Nyquist frequency}
OK, we've sampled our signal at regular intervals $t_{smpl}$,
meaning that the sampling frequency is $\nu_{smpl} = {1 \over t_{smpl}}$
and $f_N = {\nu_{smpl} \over 2}$. If we do this for time $2T$ then we
obtain $2J = {2T \over t_{smpl}}$ independent points. From these we wish
to compute the spectrum.
If we want to digitally compute the Fourier transform, then the
first thing we must realize is that the original $2J$ channels provide
only $J$ spectral points! This seems like a loss of information, but
it's not: the spectrum is {\it complex}, so those $J$ complex numbers
contain $2J$ independent numbers.
With sample frequency $\nu_{smpl}$ we obtain the spectrum for
the range of frequencies $0 \rightarrow {\nu_{smpl} \over 2}$ or $0
\rightarrow f_N$; this is known as the {\it bandwidth}, denoted by $B$.
With $J$ independent points, the separation (and the frequency
resolution) is $\Delta \nu = {\nu_{smpl} \over 2J}$. Fortunately,
${\nu_{smpl} \over 2J} = {1 \over 2 J t_{smpl}} = {1 \over 2T}$; this
confirms our earlier discussion in \S \ref{spctres} about frequency
resolution.
But more importantly for the present discussion, when sampling
at rate $\nu_{smpl}$ you cannot recover a spectrum wider in bandwidth
than $f_N = {\nu_{smpl} \over 2}$. For digitally sampled signals this is
obvious, but it also applies to other discretely sampled time series.
Consider, for example, the classical optical grating spectrometer. The
incoming light strikes the grating at angle $\theta$ away from the
normal and for the first order reflection leaves at the same angle. Thus
the difference in time delay from one groove to the next is ${2s \cos
\theta \over c}$, where $s$ is the groove separation. You don't have any
information for time differences between these values, so this is
exactly equivalent to $t_{smpl}$; thus the maximum fractional bandwidth is
${f_N \over f} = {1 \over 4 {s \over \lambda} \cos \theta}$. To attain a
substantial fractional bandwidth, e.g.\ ${f_N \over f} > {1/2}$ say, we
require the spacing $s < {\lambda /2 \cos \theta}$. Moreover, to avoid
aliasing it's absolutely necessary to insert a filter in front of the
grating to limit the input bandwidth.
\subsection {Summary: The Two Fundamental Parameters in Discrete
Sampling and Fourier Transforms}
The above two parameters are the fundamental ones.
{\bf (1)} To prevent aliasing, we must satisfy the sampling
theorem: the total signal bandwidth $B$ must be small enough, i.e.\
$B \leq f_N = {\nu_{smpl} \over 2}$.\footnote{For
complex input, which can occur in radio astronomy and some other
applications, the factor of 2 doesn't apply.} Usually you need to limit
the bandwidth with a filter.
{\bf (2)} In the Fourier-transformed power spectrum, the spectral
resolution is the reciprocal of the total time over which the FT is
computed: $\Delta \nu ={1 \over T_{tot}}$.
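As a concrete worked example of these two parameters (the numbers
here---a 40 kHz sampler run for 2 seconds---are arbitrary assumptions
for illustration):

```python
t_smpl = 1.0 / 40_000.0        # sampling interval: 25 microseconds
n_samples = 80_000             # 2J samples in all

nu_smpl = 1.0 / t_smpl         # sampling frequency, 40 kHz
f_N = nu_smpl / 2.0            # (1) maximum alias-free bandwidth: 20 kHz
T_tot = n_samples * t_smpl     # total record length: 2 s
delta_nu = 1.0 / T_tot         # (2) spectral resolution: 0.5 Hz
```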
\section{THE DISCRETE FOURIER TRANSFORM AND DISCRETE SAMPLING}
\label{four}
With the discrete FT, we have to replace the Fourier integral in
equation (2) by a summation. Let's do this with a one-to-one
correspondence of the terms. First we rewrite equation \ref{eqtwo} to
make it easier for a direct comparison:
\begin{equation} \label{eqtwoa}
E(\nu) = {1 \over 2T}
\int_{-T}^{+T} E(t) e^{[2 \pi i]\nu t} dt \; .
\end{equation}
In our discretely-sampled case we can replace $t$ by $j
t_{smpl}$, defined for $j = -J \rightarrow J$; $\nu$ by ${k\nu_{smpl}
\over 2J}$; and $dt$ by $t_{smpl}$. We would calculate
$E({k\nu_{smpl}\over 2J})$ for $k = -J \rightarrow J$, so the summation
looks like\dots
\begin{equation} \label{eqfivea}
E({k\nu_{smpl} \over 2J}) = {1 \over 2Jt_{smpl}} \sum_{j=-J}^{J-1} E(j
t_{smpl}) e^{[2 \pi i] \nu_{smpl}t_{smpl} {jk \over 2J}} t_{smpl} \; .
\end{equation}
\noindent Here we've taken $dt = \Delta t = t_{smpl} \Delta j =
t_{smpl}$ (i.e., $\Delta j=1$). The product $t_{smpl}\nu_{smpl} = 1$,
so we can simplify our notation by eliminating these variables,
replacing ${k\nu_{smpl} \over 2J}$ by the much simpler $k$, and
replacing $j t_{smpl}$ by just $j$ and writing\dots
\begin{equation} \label{five}
E(k) = {1 \over 2J} \sum_{j=-J}^{J-1} E(j) e^{[\pi i]
{kj \over J}} \; . \end{equation}
\noindent This is conventionally written
\begin{equation} E(k) = {1 \over 2J} \sum_{j=0}^{2J-1} E(j) e^{[\pi i]
{kj \over J}} \; \; {\rm or} \; \;
E(k) = {1 \over N} \sum_{n=0}^{N-1} E(n) e^{[2\pi i]
{kn \over N}} \ . \end{equation}
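This summation translates directly into code. Here is a minimal
brute-force sketch in Python (standard library only, rather than the
IDL used later; note the $+$ sign in the exponent, matching the
convention of the equations above), tried on a cosine with exactly 8
cycles across a 64-point record:

```python
import cmath
import math

def dft(E):
    """E(k) = (1/N) * sum_n E(n) * exp(+2 pi i k n / N), per the
    equation above: forward transform with a + sign and 1/N out front."""
    N = len(E)
    return [sum(E[n] * cmath.exp(2j * math.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

N = 64
E_j = [math.cos(2 * math.pi * 8 * n / N) for n in range(N)]
E_k = dft(E_j)
# all the power lands in k = 8 and its negative-frequency twin k = N - 8 = 56,
# each with amplitude 1/2 (the cosine splits between +nu and -nu)
```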
\subsection{IMPORTANT DETAIL Regarding Limits of Summation and
Periodicities}
Are you paying attention? If so, you should be asking why the
upper limit in equation \ref{five} on the sum is $J-1$ instead of $J$.
One reason is that a sum from $-J \rightarrow J$ has $(2J+1)$
samples. But we have only $2J$ samples. So we can't possibly sum from
$-J \rightarrow J$.
This doesn't matter at all. To begin with, look at equation
\ref{five} carefully: you can verify for yourself that the trig
portion---that is, the $e^{[\pi i] {kj \over J}}$ term---is {\it
periodic} in both $j$ and $k$, with period $2J$. That is, you can use
either $k$ or $(k + 2J)$---you'll get the same answer. Same with
$j$.\footnote{With the qualification that $k$ is an integer, i.e.\ that
you are calculating the result only for integral multiples of
$\nu_{smpl} \over 2J$. We discuss this in more detail below.} And one
other thing: the discrete Fourier transform makes the implicit,
intrinsic, completely irrevocable {\it assumption} that the input signal
$E(j)$ is {\it also} periodic with period $2J$. Thus the {\it entire
quantity being summed} is periodic with period $2J$. This means, also,
that the {\it result} of the summation, namely $E(k)$, is {\it also}
periodic with period $2J$. The intrinsic, irrevocable assumption that
$E(j)$ is periodic means that we only need to know $2J$ values of
$E(j)$; the $j = 2J$ value is equal to the $j=0$ value.
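You can verify the periodicity of the trig portion in a couple of lines
(Python sketch; $J = 8$ and the particular $j$, $k$ values are arbitrary
choices):

```python
import cmath

J = 8

def kernel(k, j):
    """The trig portion exp(pi*i*k*j/J) of the discrete FT."""
    return cmath.exp(1j * cmath.pi * k * j / J)

k, j = 3, 5
# shifting either index by 2J leaves the kernel unchanged...
same_k = abs(kernel(k, j) - kernel(k + 2 * J, j))
same_j = abs(kernel(k, j) - kernel(k, j + 2 * J))
# ...because the shift multiplies by exp(2*pi*i * integer) = 1
```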
Now, everybody knows\footnote{If you don't believe me, go look
at your elementary calculus book, in the chapter where integration was
introduced.} that when we replace an integral by a digital summation we
regard each sample as centered on a bin of width $1 \times \Delta
t$. However, the samples at the ends, with $j=-J$ and $j=J$, must have
bins of width ${1 \over 2} \times \Delta t$, so we're supposed to
include both $E(-J)$ and $E(J)$ in the sum, each with a weight of $1
\over 2$. But because of the $2J$ periodicity, we can instead include
just one and use a weight of unity.
\subsection{ Again: The Periodic Nature of the Discrete FT}
Above we discussed the periodic nature of both $E(j)$ and
$E(k)$. Let's delve deeper.
By its very nature, the Fourier transform wants to integrate to
$\infty$. But the input signal doesn't go on forever! We have to resolve
this basic incompatibility: the input signal is sampled for a finite
time, but the Fourier transform needs it to go on forever.
There's only one way to resolve this incompatibility: we must
assume that the input signal {\it does} go on forever, and because we
don't know what happens outside the sampling interval $-J \rightarrow
J$, the only sensible thing to do is to assume that the input signal is
{\it periodic} with period $2J$.\footnote{You might be surprised: why
not assume the signal is {\it zero} outside the interval? {\it Answer}:
most signals don't stop simply because we stop sampling them---for
example, a pulsar. {\it More Fundamental Answer}: the math requires this
assumption!}
This assumption of ``forever'', together with the associated
$2J$ periodicity, leads to the necessary {\it result} that the spectrum
$E(k)$ and its power spectrum $P(k) = E(k) \times [E(k)]^*$ {\it also}
go on forever, $k = -\infty \rightarrow +\infty$, and are periodic with
period $2J$.
Because of these periodicities, we gain complete information on
the spectrum by restricting our attention to {\it windows} of length
$2J$: these windows are the finite intervals in $(j, k)$ between the
dotted lines. They can begin and end anywhere---all that matters is
that their length is $2J$.
\section{THAT WEIRD-LOOKING POWER SPECTRUM---IT'S JUST A MATTER OF
STANDARD CONVENTION}
Above we let the indices $j$ and $k$ run from $-J \rightarrow
J-1$. But in the real world of numerical computing, we don't use
negative indices. You might think that the reasonable way to handle
this would be to displace the whole set of $2J$ values of $E(j)$ and
$E(k)$ upwards by $J$ so that the indices run from $0 \rightarrow 2J-1$
(this would be perfectly compatible with IDL's indexing scheme). But
this would put the $t=0$ or $f=0$ point out in the middle, at $j$ or
$k=J$, and this isn't very convenient for lots of reasons.
Instead, realize that the FT is periodic in $j$ with period
$2J$. Therefore, in equation (5) it doesn't matter whether you sum from
$j = -J \to J-1$, from $j = 0 \to 2J-1$, or even (god forbid!) from $j =
-3J + 7 \to -J + 6$. So we might as well just sum from $j = 0 \to 2J-1$
and not displace anything at all. {\it This is the standard convention,
and it leads to the standard way in which FT arrays are stored in
memory}---not just in IDL but in just about {\it every} software
package. It has the great advantage that the $t=0$ or $f=0$ point is
the first one in the array.
Above we were discussing ``the FT array'', without specifying
whether it was the input or output array. The arrangement for the {\it
input array} works in exactly the same way as that for the {\it output}
array. And it doesn't matter whether the independent variable for the
array is time or frequency. {\it All FT arrays, whether input or output
and no matter what the independent variable, are arranged {\it
identically} with respect to the negative and positive values}.
\subsection{ Enough Preaching! Let's Try an Example in IDL}
Here we take an example with $J=32$, with the discretely-sampled
input signal being the sum of three cosine waves: $E(j) = \cos(\pi {8j
\over 32}) + 0.5[\cos(\pi {7j \over 32}) + \cos(\pi {9j \over 32})]$.
Thus we have power at three frequencies: $k=7, 8, 9$. There is twice as
much voltage, thus four times the power at $k=8$ than there is at $k=7,
9$. The power spectrum consists of these three nonzero points, plus a
bunch of zeros at all other frequencies.
\begin{figure}
\begin{center}
\leavevmode
%\includegraphics[scale=.55, angle=90]{lsfitfig.ps}
\includegraphics[width=6.0in]{fourierbfig.ps}
\end{center}
\caption{Upper plots in Figs 2 and 3 show a 64-point time series and
power spectrum. Lower plots show a portion of an infinitely-long
periodic time series and power spectrum from which the 64-point ones are
extracted. }
\end{figure}
In IDL, we denote $E(j)$ by {\bf EJ} and generate this
64-element array with the statements\dots
\begin{mathletters}
\begin{equation}
\bf{ EJ = cos( !pi * 8 * findgen(64)/32) }
\end{equation}
\begin{equation}
\bf{ EJ = EJ + 0.5*( cos( !pi * 7
* findgen(64)/32) + cos( !pi * 9 * findgen(64)/32) ) }
\end{equation}
\end{mathletters}
\noindent Figure 2 (top) shows the plot of the 64-element array {\bf
EJ}, with the 64 crosses marking the computed points (or, in our
parlance, the discretely-sampled points). The interference of the sine
waves makes the sampled signal look like a wave packet of frequency
$k=8$, attenuated at the middle of the time interval where $j \sim 32$.
The signal $E(j)$ must be periodic with period $2J = 64$ and it
must go on forever. Figure 2 (bottom) shows this for a larger slice of
time in which $j = -128 \rightarrow 128$. Our points are computed for
IDL indices $0 \rightarrow 63$; this window is shown by the dotted lines
on Figure 2 (bottom).
We take the Fourier transform [$E(k) = FT( E(j))$]. In IDL, we
denote $E(k)$ by {\bf EK} and use IDL's Fast Fourier Transform (FFT)
procedure\dots
\begin{mathletters}
\begin{equation}
{\bf EK = fft(EJ)}
\end{equation}
\noindent and then get the power spectrum $P(k)$ ({\bf PK} in IDL)\dots
\begin{equation}
{\bf PK = float(EK * conj(EK))}
\end{equation}
\end{mathletters}
\noindent For convenience, we use the {\bf float} to discard the
imaginary portion of the product, which is zero---this makes {\bf PK}
real, so that in later operations we don't have to deal with a complex
array (e.g., to plot the power spectrum we can type {\bf plot, PK}
instead of {\bf plot, float(PK)}).
Figure 3 (top) shows the plot of the 64-element array {\bf PK},
with the 64 crosses marking the computed points returned by IDL. The
{\it positive} frequencies lie in the IDL index range $1 \rightarrow 32$
and the {\it negative} ones in the range $32 \rightarrow 63$. As
expected, there is nonzero power at only three positive frequencies, the
ones with IDL indices 7, 8, 9. There is also power at the three
corresponding negative frequencies, the ones with IDL indices 55, 56,
57.
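If you'd like to check this without IDL, here is an equivalent sketch in
Python (standard library only). The brute-force sum below stands in for
IDL's {\bf fft} procedure; IDL's sign convention may differ from
equation \ref{five}, but that doesn't affect the power spectrum:

```python
import cmath
import math

J = 32
N = 2 * J

# E(j) = cos(pi*8j/32) + 0.5*[cos(pi*7j/32) + cos(pi*9j/32)]
EJ = [math.cos(math.pi * 8 * n / 32)
      + 0.5 * (math.cos(math.pi * 7 * n / 32) + math.cos(math.pi * 9 * n / 32))
      for n in range(N)]

# brute-force discrete FT standing in for IDL's fft()
EK = [sum(EJ[n] * cmath.exp(2j * math.pi * k * n / N) for n in range(N)) / N
      for k in range(N)]

# PK = float(EK * conj(EK)) in IDL: keep only the (real) power
PK = [(ek * ek.conjugate()).real for ek in EK]

nonzero = [k for k in range(N) if PK[k] > 1e-12]
# power appears only at indices 7, 8, 9 and 55, 56, 57, and channel 8
# holds four times the power of channels 7 and 9
```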
The power spectrum must be periodic with period $2J = 64$ and it
must go on forever. Figure 3 (bottom) shows this for a larger slice of
frequency in which $k = -128 \rightarrow 128$. IDL indices for {\bf PK}
are $0 \rightarrow 63$; this window is shown by the dotted lines on
Figure 3 (bottom).
\subsection{ Is this weirdness perfectly clear? }
Probably not. So, in the following two subsections we go
through this again in excruciating detail. We provide first a detailed
verbal description of the arrangement, and then the shortest of short
numerical examples. In the verbal description we focus on the output
array to make things specific, but we just as well could have focussed
on the input array and replaced the word ``frequency'' by ``time''. As
you'll see, this leads to something surprising\dots we'll discuss it
explicitly (\S6.3).
\subsubsection{Verbal Description}
Again, we let the time for $E(t)$ run from $t = -T \rightarrow
+T$, with $T = J t_{smpl}$. The corresponding frequencies run from $f =
-f_N \rightarrow +f_N$, with $f_N = {\nu_{smpl} \over 2}$.
All FT arrays are arranged so that the {\it first} $J+1$
channels contain the {\it positive-frequency} portion of the transform.
Again, here we focus on the output array. Thus, for channel $k = 0
\rightarrow J$ the frequency of channel $k$ is $\nu _k = +{k f_N \over
J}$. Thus, channel 0 contains the $\nu =0$ result, channel 1 the $\nu =
+{f_N \over J}$ result, \dots, up to channel $J$ which contains the
maximum frequency $\nu = +f_N$.
The remaining $J-1$ channels contain the {\it
negative-frequency} portion of the transform. As we go from channel $J$
to $J+1$ we would go to the next highest {\it positive}-frequency point;
but because the FT is periodic with period $2J$, this must be identical
to the corresponding point at {\it negative} frequency. This means that
channel $J$ contains {\it not only} the result for $\nu = +f_N$, but {\it
also} the result for $\nu = -f_N$. And the remaining $J-1$ higher
channels contain the rest of the {\it negative}-frequency points, so for
$k = J \rightarrow (2J-1)$ the frequency is $\nu _k = -f_N + {(k - J) f_N
\over J}$. Thus channel $J$ has $\nu = -f_N$, channel $(J+1)$ has $\nu =
-f_N + {f_N \over J}$, \dots, and channel $(2J-1)$ has the $\nu = -{f_N
\over J}$ result. If you were to consider channel number $2J$ it would
contain the next highest frequency, which is $\nu = 0$; of course, this
is identical to the result in channel 0.
Note that frequency can {\it always be considered to increase
with channel number}: you can even regard the big backwards jump from
$+f_N$ to $-f_N$ at channel $J$ as {\it not} being a jump because the
periodic nature of the transform means that the results for these two
frequencies must be identical, and successively higher negative
frequencies are equivalent to successively higher positive frequencies
above $+f_N$.
So any FT array, for example the spectrum, contains $2J$
independent points. More generally, the FT spectrum could contain $4J$
points, or $6J$ points, etc. We could calculate as many points as we
wish. However, because the FT is periodic, with period $2J$, channel
$2J+1$ would contain the same result as in channel 1, etc. So there is
no sense in calculating more than $2J$ points, because the calculations
would be redundant.
\subsubsection{An Example}
Consider a simple case with $J=4$; you've taken 8 time
samples. Then the proper way to arrange the input to the FT is the
following, where the {\it left-hand matrix contains the IDL indices} and
the {\it right hand the times or frequencies}:
\begin{mathletters}
\begin{eqnarray}
\left[
\begin{array}{r}
0 \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7
\end{array} \right] \leftrightarrow
\left[
\begin{array}{r}
0 \times {T \over J} \\
1 \times {T \over J} \\
2 \times {T \over J} \\
3 \times {T \over J} \\
\pm 4 \times {T \over J} \\
-3 \times {T \over J} \\
-2 \times {T \over J} \\
-1 \times {T \over J} \\
\end{array} \right]
\end{eqnarray}
\noindent and the output looks like
\begin{eqnarray}
\left[
\begin{array}{r}
0 \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7
\end{array} \right] \leftrightarrow
\left[
\begin{array}{r}
0 \times {f_N \over J} \\
1 \times {f_N \over J} \\
2 \times {f_N \over J} \\
3 \times {f_N \over J} \\
\pm 4 \times {f_N \over J} \\
-3 \times {f_N \over J} \\
-2 \times {f_N \over J} \\
-1 \times {f_N \over J} \\
\end{array} \right]
\end{eqnarray}
\end{mathletters}
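The right-hand columns above can be generated mechanically. Here is a
Python sketch of the index bookkeeping (the choice of labeling channel
$J$ as $-J$ is one common library convention; as discussed above, that
channel really represents {\it both} $\pm f_N$):

```python
def channel_values(two_J):
    """Map array index k = 0 .. 2J-1 to its time or frequency multiplier
    (units of T/J for the input array, f_N/J for the output array)."""
    J = two_J // 2
    return [k if k < J else k - two_J for k in range(two_J)]

# the J = 4 case of the matrices above:
vals = channel_values(8)
# vals == [0, 1, 2, 3, -4, -3, -2, -1]; index 4 stands for both +4 and -4
```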
\subsubsection{Wait a Minute! You Mean the TIME Isn't CONTINUOUS?}
Above we said---and we meant it---that in our verbal description
we focussed on {\it frequency}, but we could just as well have focussed
on {\it time}. And in equation (8a) above, where we explicitly listed
the time versus the IDL index, the time {\it begins} at $0$ for IDL
index 0, runs up to $3 \times {T \over J}$ for IDL index $3$,
and then ``wraps around'' to {\it negative} times for IDL index $\ge 4$.
So it seems that time versus IDL index is discontinuous---just like
frequency.
{\it Yes it's true!!} If you take a bunch of time samples and
compute the FT, you should put the negative time samples into the upper
index ranges.
But for the calculation of just the power spectrum, it doesn't
matter whether you bother with this reshuffling or not---you get the
same answer whether or not you reshuffle. What it {\it does} matter for
is cases in which you are really interested in the {\it phase} of
the FT output array. Remember that the output array $E(k)$ in equation
(5) is complex; the ratio of imaginary to real parts gives the phase of
the signal. This phase is defined with respect to $t=0$. If you want
the phase to be correctly defined, and if you want to regard the samples
as extending from $-T \rightarrow T$ instead of $0 \rightarrow 2T$, then
you must reshuffle the discrete time samples according to the above
prescription.
Of course, the power spectrum doesn't have any phase
information: it's only a specification of the power versus frequency.
In other words, detected signals have no phase information. In our lab
work we generally don't care about the absolute phase of the signal, so
you have no need to carry out the index reshuffling before computing the
FT.
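A quick numerical check of this claim (Python sketch, standard library
only; the 8-point cosine with an arbitrary 0.7 radian phase offset is an
assumed test signal): circularly shifting the samples by half the
record---the ``reshuffling''---leaves the power spectrum untouched but
changes the phase of the transform.

```python
import cmath
import math

N = 8
x = [math.cos(2 * math.pi * n / N + 0.7) for n in range(N)]
shuffled = x[N // 2:] + x[:N // 2]          # the reshuffling: shift by N/2

def dft(s):
    return [sum(s[n] * cmath.exp(2j * math.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

X, Xs = dft(x), dft(shuffled)

power_diff = max(abs(abs(a) ** 2 - abs(b) ** 2) for a, b in zip(X, Xs))
phase_diff = abs(cmath.phase(X[1]) - cmath.phase(Xs[1]))
# power_diff ~ 0, but phase_diff is pi: the half-record shift flips E(1)
```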
\section{The spectrum at an arbitrary frequency}
Equation \ref{five} provides results for discretely-spaced
frequencies at intervals $\Delta \nu$. It's often nice to have results
for arbitrary frequencies. There are interpolation techniques. However,
here we offer the brute force approach of equation
\ref{eqfivea}, which we repeat here in slightly modified form:
\begin{equation} \label{eqfiveb}
E({\kappa \nu_{smpl} \over 2J}) = {1 \over 2J} \sum_{j=-J}^{J-1} E(j)
e^{[2 \pi i] {j\kappa \over 2J} } \; .
\end{equation}
\noindent Here we have eliminated $t_{smpl}$ and $\nu_{smpl}$ on the
right hand side and replaced $k$ by $\kappa$ to emphasize the fact that
$\kappa$ can be a non-integer.
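In code, this brute-force evaluation is just the same sum with a
floating-point frequency index. A Python sketch (standard library only;
it sums over $j = 0 \rightarrow 2J-1$ rather than $-J \rightarrow J-1$,
which changes only the phase, not the magnitude):

```python
import cmath
import math

def E_at(E, kappa):
    """Spectrum at an arbitrary, possibly non-integer, frequency index
    kappa, per the equation above; len(E) = 2J samples."""
    N = len(E)
    return sum(E[n] * cmath.exp(2j * math.pi * n * kappa / N)
               for n in range(N)) / N

N = 64
# a cosine at nu_smpl/10 sits at kappa = N/10 = 6.4, between integer channels
E_j = [math.cos(2 * math.pi * n / 10) for n in range(N)]
on_peak = abs(E_at(E_j, 6.4))    # ~0.5: right on the signal frequency
off_peak = abs(E_at(E_j, 6.0))   # smaller: the nearest integer channel
```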
\subsection{Two examples}
We consider two examples. Both have 64 datapoints; note that $64
= 2^6$.
\subsubsection{The first example: a spike centered on integral $k$}
First, the example of $E(j) = \cos(2 \pi {\nu_{smpl} \over 8}
t)$. This is a monochromatic wave whose frequency is exactly
one-quarter the Nyquist frequency, i.e.\ when plotted it is exactly
one-quarter the way between 0 and $f_N$. It falls {\it exactly} on an
integral value of $k$. We show the resulting $E({k \nu_{smpl} \over
2J})$ by the squares in Figure \ref{three} (top), and we show the
almost-continuous curve from fractional values of $\kappa$ in equation
\ref{eqfiveb} as the solid curve. The squares are nonzero {\it only} for
one single value of $k$. This is because the signal frequency is {\it
exactly centered} on the frequency represented by this value of $k$.
But look at the solid curve. You might have thought that this
would be {\it symmetric} about the signal frequency; this is what the
analytic equation \ref{eqtwoa} seems to suggest. {\it But it's not
symmetric.} The reason is that, because the signal is real, the spectrum
is Hermitian: the negative-frequency component is the complex conjugate
of the positive-frequency one, so the real part is even in frequency and
the imaginary part is odd. (In this case, the signal is
symmetric in time, so the imaginary components are zero.) If you include
the negative frequencies in the analytic calculation, then you also find
that the sidelobes are not symmetric.
Similar asymmetries in sidelobe structure occur in the high end
of the spectrum because of aliasing: The sidelobes of the $\sin x \over
x$ function go on forever, so they exist at frequencies {\it beyond the
Nyquist frequency}. They are aliased back into the displayed spectrum
and this also produces asymmetry.
\subsubsection{The second example: a spike {\it not} centered on
integral $k$}
Second, the example of $E(j) = \cos(2 \pi {\nu_{smpl} \over 10}
t)$. This is a monochromatic wave whose frequency is exactly one-fifth
the Nyquist frequency, i.e.\ when plotted it is exactly one-fifth the
way between 0 and $f_N$. But, in contrast to the previous example, it
does {\it not} fall on an integral value of $k$. Again we show the
resulting $E({k \nu_{smpl} \over 2J})$ by the squares in Figure
\ref{three} (bottom), and we show the almost-continuous curve from
fractional values of $\kappa$ in equation \ref{eqfiveb} as the solid
curve. The squares are nowhere zero because the signal is centered on a
non-integral value of $\kappa$. Comments about the sidelobe asymmetry
apply here too, of course.
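Both examples are easy to reproduce numerically. The following numpy sketch (ours, not the lab code) computes the two 64-point power spectra; numpy's FFT uses the opposite sign convention to equation \ref{eqfiveb}, which has no effect on the power spectrum.

```python
import numpy as np

J = 32                         # 2J = 64 samples, as in the text
jj = np.arange(2 * J)

# Example 1: frequency nu_smpl/8 falls exactly on bin k = 64/8 = 8
x1 = np.cos(2 * np.pi * jj / 8)
P1 = np.abs(np.fft.fft(x1) / (2 * J)) ** 2

# Example 2: frequency nu_smpl/10 -> kappa = 64/10 = 6.4, non-integer
x2 = np.cos(2 * np.pi * jj / 10)
P2 = np.abs(np.fft.fft(x2) / (2 * J)) ** 2
```

For the first example the power lands in exactly two bins ($k=8$ and its mirror $k=56$); for the second, the $\sin x / x$ leakage makes every bin nonzero, just as the squares in Figure \ref{three} (bottom) show.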
\subsection{The repeating windows for nonintegral $k$}
We have extensively discussed the periodic nature of the
discrete FT in \S \ref{four}. Specifically, equation \ref{five} shows
this periodicity: both the input signal and the output spectrum are
periodic with period $2J$. But for the input signal, this periodicity
exists only if $k$ ($\kappa$ in equation \ref{eqfiveb}) is an
integer.\footnote{Similarly, for the spectrum this periodicity exists
only if $j$ is an integer; but $j$ is always an integer!}
If $\kappa$ is not an integer, then the interpolated spectra
(i.e., for nonintegral $\kappa$) at frequency offsets of $J \nu_{smpl}$
(equal to $2J f_N$) are {\it not} identical. Fundamentally this is an
effect of the time-shift property of FT's. Nevertheless, for the
integral values of $k$, which provide the basic spectral information,
the spectra at frequency offsets of $J \nu_{smpl}$ (equal to $2J f_N$)
{\it are} identical. Here the digital world triumphs over the time-shift
theorem!
\section{THE FAST FOURIER TRANSFORM}
The Fourier transform as defined in equation \ref{five}, for
example, is an operation that requires of order $(2J)^2$ operations:
$2J$ frequencies to be computed, each of which requires a sum over $2J$
datapoints. Many applications generate huge values of $J$ and the
computing time becomes impractically long.
Enter the FFT. The fundamental idea is to split the FT into
smaller chunks. Suppose, for example, you have $2^N$ datapoints. You
split the FT into $2^{N-1}$ chunks, each of which contains only two
numbers; you FT each pair; then you put the chunks back together again.
This works and requires only $\sim 2^N \log_2 (2^N) = N \, 2^N$
operations instead of $\sim (2^N)^2$. This has two
advantages: The obvious one is computing time. The less obvious one is
numerical accuracy: fewer operations means smaller roundoff errors in
the final result.
Suppose you have an arbitrary number $M$ of datapoints. IDL's
FFT routine factors $M$ into as many chunks as possible. It's most
efficient for chunks that are powers of 2, 3, or 5. The more factors,
the faster the runtime. People tend to think of, and always use, powers
of 2, and this is indeed the fastest kind of FFT, but the presence of
other factors can be quite acceptable.
This can lead to large apparent anomalies in runtime. You might
have $M$ equal to some large integral power of 2. If you then add just
one more point, $M$ might become a prime number! (An easy, but not
interesting, example is $M=16$, for which $M+1=17$ is prime.) The
difference in computing time can be
enormous. IDL's FFT doesn't warn you about these matters, so you have to
think of them yourself---beforehand!
If you have an awkward value of $M$, you have two choices:
discard some datapoints to make the FFT fast, or pad the datapoints with
zeros. We'll discuss this technique below in \S \ref{***}.
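Zero-padding is easy to check numerically. A numpy sketch (ours): padding the data to exactly twice their length returns the original spectrum on the even-numbered bins, with the odd bins interpolating the same underlying spectrum halfway between.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 1000
x = rng.standard_normal(M)

X  = np.fft.fft(x)                                  # length-M spectrum
X2 = np.fft.fft(np.concatenate([x, np.zeros(M)]))   # zero-pad to 2M

# Every even bin of the padded spectrum reproduces an original bin
assert np.allclose(X2[::2], X)
```

Padding with zeros therefore loses no spectral information; it only resamples the spectrum on a finer frequency grid (and can move $M$ to a length the FFT handles quickly).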
\section{COSINE AND SINE TRANSFORMS}
The Fourier transform is by its intrinsic definition a complex
operation. However, there are many instances when you need to take a
cosine or sine transform. This is straightforward, but it's worth
spending some space on this because some people get it wrong.
Suppose you have $J$ datapoints and you wish to take the cosine
transform using the FFT method. That is, you use equation \ref{five},
which we reproduce here:
\begin{equation}
E(k) = {1 \over 2J} \sum_{j=-J}^{J-1} E(j) e^{[\pi i]
{kj \over J}} \; . \end{equation}
\noindent To take the cosine transform, you need to make sure that the
argument $E(j)$ is symmetric in $j$. The datapoints $D(j)$ are defined
only for $j \ge 0$. Defining the symmetric counterpart would seem to be
easy: just define
\begin{equation}
E(j) = D(j) \; , \qquad E(-j) = D(j) \; , \qquad (j \ge 0) \; .
\end{equation}
\noindent But you immediately run into a problem: you don't have $2J$
numbers, but instead you have $(2J-1)$ numbers---because the $j=0$
number cannot be duplicated at $-j$. To perform the FFT, you must
account for this bookkeeping when you construct the symmetric input
array.
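As a hedged illustration of the even-symmetry requirement (one standard construction, not necessarily the resolution intended in the text), the following numpy sketch mirrors $J$ datapoints into a sequence of length $2J-2$ whose FFT comes out purely real, i.e.\ a cosine transform:

```python
import numpy as np

rng = np.random.default_rng(2)
J = 8
D = rng.standard_normal(J)        # data defined only for j >= 0

# Even extension: E has length 2J-2 and satisfies E[n] == E[(2J-2)-n]
# (the j=0 and j=J-1 points appear once, the rest twice), so the DFT
# of this real, even sequence is purely real -- a cosine transform.
E = np.concatenate([D, D[-2:0:-1]])
X = np.fft.fft(E)
```

The real parts of $X$ are then exactly the cosine sums $\sum_n E(n) \cos(2\pi kn/(2J-2))$, with no sine (imaginary) contamination.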
%#############
\section{SOME FINAL POINTS}
\subsection{What's This Business About {\it Negative Frequencies}?}
There are some cases in which one can distinguish between
negative and positive frequencies. Specifically, these are cases in
which the input to the FT is {\it complex}. To be complex, the input
must have both a real and imaginary part: in other words, each sample
consists of two numbers, and these two numbers can be regarded as the
real and imaginary parts of a complex number. If you take AY120B, you
will encounter such a case.
\subsection{For Real Inputs, How Do These Negative Frequencies Enter the
Power Calculation?}
In the vast majority of applications, the samples consist
only of one number: each time sample represents
a real voltage (or a real number of photons), and there is nothing
imaginary---or complex (mathematically speaking, that is)---about them.
But it is perhaps surprising that the FT {\it output} numbers {\it are}
complex: the imaginary part is {\it not} zero. The phase angle of each
complex number represents the phase of that Fourier component with
respect to $t=0$. For the case of real numbers as input, the outputted
complex numbers have a simplification: the imaginary parts are odd and
the real parts even (in other words, the negative-frequency number is
the complex conjugate of the positive-frequency number).
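This conjugate symmetry takes one line to verify. A numpy sketch (ours; numpy's opposite sign convention only swaps which bin is labeled "positive frequency," the symmetry itself is identical):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 16
x = rng.standard_normal(N)        # real samples: one number each
X = np.fft.fft(x)                 # output is nevertheless complex

# Hermitian symmetry for real input: bin N-k (the "negative
# frequency") is the complex conjugate of bin k, so the real
# parts are even and the imaginary parts odd.
for k in range(1, N):
    assert np.allclose(X[N - k], np.conj(X[k]))
```

The zero-frequency bin, having no partner, must be (and is) purely real.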
This means that when you use the complex output spectral numbers
to calculate the corresponding power numbers (by $P(k) = E(k) \times
[E(k)]^*$), negative and positive frequencies have identical powers.
The proper way to combine the powers for the negative and positive
frequencies is simply to add them; but because the numbers are
identical, it's equivalent to simply use twice the full value of, say,
the positive-frequency number. It should be obvious that there is only
one number representing zero frequency, so you should {\it not} multiply
this by two; the same is true of the Nyquist-frequency number when the
number of samples is even.
Thus, in the example above in \S6.2, after calculating $P(k) =
E(k) \times [E(k)]^*$, your power spectrum is most simply given by the
first value $P(0)$ and twice the next four values of $P(k)$ ($k = 1
\rightarrow 4$).
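A numpy sketch of this bookkeeping (ours), for $2J = 10$ real samples; here we also keep the single Nyquist-frequency value, bin $k=5$, which like $k=0$ has no partner to double. With the ${1 \over 2J}$ normalization, the total power equals the mean squared voltage (Parseval's theorem), which makes a convenient check:

```python
import numpy as np

rng = np.random.default_rng(4)
twoJ = 10
x = rng.standard_normal(twoJ)
E = np.fft.fft(x) / twoJ            # normalized as in equation (5)
P = (E * np.conj(E)).real           # P(k) = E(k) x [E(k)]^*

# One-sided power: k=0 once, k=1..4 doubled, Nyquist bin k=5 once
one_sided = P[0] + 2 * P[1:twoJ // 2].sum() + P[twoJ // 2]
```

Because negative- and positive-frequency powers are identical, this one-sided sum reproduces the total power exactly.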
\subsection{A Detail on Normalization and ``Going Back and Forth''}
In IDL, the FFT is normalized by multiplying the sum by ${1
\over 2J}$, exactly as we've done in equation (5). Not all FFT routines
do the normalization in this way. This way has the advantage that the
scaling of the output is the same as that of the input---in other words,
it's sort of like taking an average, because we divide by the number of
points that contribute to the sum.
As we've mentioned in \S1, you should know that {\it apart from
normalization constants}, you can convert willy-nilly back and forth
from frequency to time by applying FT's in succession. That is, $E(k) =
FT(E(j))$ and $E(j) = FT^-(E(k))$. Here the superscript minus sign
indicates using the negative complex exponential in the transform, as in
equation (1b); this is called the {\it inverse Fourier transform}. More
graphically, $E(j) = FT^-[ FT (E(j))]$.
With the normalization used by IDL, the inverse transform must
{\it not} multiply the sum by ${1 \over 2J}$. Of course, IDL's inverse
transform does all this correctly. In IDL, you invoke the inverse
transform by using the {\bf inverse} keyword---see IDL's help on the
{\bf fft} procedure.
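numpy's own convention differs from IDL's (numpy puts the $1/N$ on the {\it inverse} transform, and uses the opposite sign in the exponent), but the IDL-style bookkeeping described above can be emulated as a sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 32
x = rng.standard_normal(N)

# IDL-style forward transform: divide the sum by the number of points
forward = np.fft.fft(x) / N
# IDL-style inverse: no 1/N factor.  numpy's ifft includes one,
# so multiply it back out to recover the original samples.
back = N * np.fft.ifft(forward)
assert np.allclose(back, x)
```

Whatever the convention, the round trip must return the input exactly (to rounding error); that is the real content of the normalization rule.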
\section { REFERENCE READING}
The ``bible'' of FT's is Bracewell's book: it contains
detailed discussions, from the viewpoint of the sophisticated
intellectual, of most aspects of FT's. But this book is long and
detailed. For the {\it practical} aspects, you can't beat the quick,
useful summary of essentials in chapter 12 of {\it Numerical Recipes}.
I cannot recommend this book strongly enough. You will greatly benefit
from reading, at a minimum, the following sections.
{\it Chapter 12.0} gives the basic theory of the FT and
summarizes the basic symmetry properties, and in particular the first
entry in its un-numbered table covers our case in which the samples are
real, not complex.
{\it Chapter 12.1} describes FT's of discretely sampled
data---the discrete FT---and why it differs in major fundamental
respects from the analytic integral FT. It tells about the Nyquist
criterion and aliasing.
{\it Chapter 12.2} begins with a readable account of why and
how the {\it Fast Fourier Transform} (FFT) works. It's called {\it
Fast} because the number of arithmetic operations in the transform is $N
\log_2 N$, where $N$ is the number of points; in contrast, if you
evaluated the numerical integral of equation (5) in the straightforward,
naive way, there would be $N^2$ operations. This is a savings in
computer time by a factor ${N \over \log_2 N}$, which can be huge: for
our Crab pulsar data set, we will have $N \sim 2 \times 10^6$ so the
savings is a factor $\sim 10^5$ in computer time. There's no need to
understand the details of what makes the FFT fast, so you can read this
part over lightly. What's important is {\it why it is} that a numerical
FFT of $2N$ time samples always comes out with $2N$ points that reflect
about frequency channel $N$. This has to do with the arrangement of
FFT's results, which we discussed {\it ad nauseam} above.
{\it Chapter 12.7} gives details about what a power spectrum
means and, importantly for you, how to calculate the power spectrum from
the FT. The FT is not without its peculiarities, however. In
particular, the section on ``data windowing'' deals with the
$\sin x \over x$ ``spectral leakage'' and discusses how to reduce it.
This isn't important for our particular application, but it is important
in many others.
\end{document}