Here is the chapter on Fourier transforms from my linear algebra book that goes into more detail: https://minireference.com/static/excerpts/fourier_transforma...
As for the math, there really is no other way to convince yourself that sin(x) and sin(2x) are orthogonal with respect to the inner product int(f,g,[0,2pi]) other than to try it out: https://live.sympy.org/?evaluate=integrate(%20sin(x)*sin(2*x... Try also with sin(3x) etc. and cos(n*x) etc.
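If you'd rather check numerically than symbolically, here's a quick Riemann-sum sketch (my own throwaway code, not from the book) of the same inner product:

```python
# Numerically check that sin(x) and sin(2x) are orthogonal under
# <f, g> = integral of f(x)*g(x) over [0, 2pi], via a midpoint sum.
import math

def inner(f, g, n=100_000):
    a, b = 0.0, 2 * math.pi
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

print(inner(math.sin, lambda x: math.sin(2 * x)))   # ~0: orthogonal
print(inner(math.sin, math.sin))                    # ~pi: not orthogonal to itself
```

Swapping in sin(3x), cos(2x), etc. gives ~0 for every distinct pair, which is the whole point.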
A function is like a vector, but instead of having two or three dimensions, it has a continuum of them. Adding functions pointwise works just like adding vectors component-wise.
Just like with regular vectors, you can choose to represent functions in a different basis. So you choose a family of other functions (call it a basis) that's big enough to represent any function you want. For a lot of reasons [1, 2], a very good choice is the set of complex exponentials g_w(x) = exp(2πiwx), for every real w. It's an infinite family, but that's what you need to deal with the diversity of functions that exist.
So you try to find the linear combination of exponentials that sums to your original function. You need a coefficient for each w, so call it c(w) for simplicity. After fixing the basis, the coefficients carry all the information needed to describe your function. They're an important object, and we call c(w) the Fourier transform.
How do you find the coefficients? Just project your original function onto a particular exp(2πiwx), that is, take the inner product with it. Usually an inner product is the sum of products of components; since a function has a continuum of components, you use an integral instead of a sum. This is your formula for the Fourier transform.
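As a sketch of that projection (my own illustration; function and helper names are made up, and I use a 1-periodic function so the integral runs over [0,1] and the coefficients are the Fourier series coefficients):

```python
# c(n) = integral_0^1 f(x) * exp(-2*pi*i*n*x) dx, approximated by a
# Riemann sum: the "inner product" of f with the n-th basis exponential.
import cmath
import math

def coeff(f, n, samples=10_000):
    h = 1.0 / samples
    return sum(f(k * h) * cmath.exp(-2j * math.pi * n * k * h)
               for k in range(samples)) * h

f = lambda x: math.cos(2 * math.pi * 3 * x)   # a pure "3 Hz" tone
print(coeff(f, 3))   # ~ (0.5+0j): the projection picks up the tone
print(coeff(f, 2))   # ~ 0: nothing at frequency 2
```

The conjugate exp(-2πinx) in the integrand is the inner-product conjugation; cos splits its amplitude between the +3 and -3 coefficients, hence the 0.5.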
I know there are technical conditions I am glossing over, but this is the intuition of it for me.
[1] There is an intuition for these exponentials. Complex exponentials are periodic functions, so you are decomposing a function into its constituent frequencies. You could also separate the exponential into a sin and a cos, and you would obtain other common formulas for the Fourier transform.
[2] Exponentials are like "eigenvectors" of the derivative operator (taking the derivative is just multiplying by a constant), so they're really useful in differential equations as well.
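Concretely, that constant is 2πiw:

```latex
\frac{d}{dx}\, e^{2\pi i w x} = 2\pi i w \; e^{2\pi i w x}
```

so each basis function is an eigenvector of d/dx with eigenvalue 2πiw, and differential equations turn into algebra in this basis.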
If this helps, then it can also help with understanding other projections such as the Laplace transform (a dot product against complex exponential signals).
While this analogy has helped me, I still have no clue why real valued signals result in an even FT.
edit: grammar
Weights on neural networks don't have to be independent functions.
Independence gives you a set of mathematical guarantees that ensure you fully cover the space you're representing. For example, in a two-dimensional space, X and Y point in different directions; if they pointed in the same direction, you could not decompose every vector in the plane into two coefficients of X and Y.
I disagree with that. It's pretty easy to prove it in general by calculating \int_0^{2\pi} sin(mx)sin(nx) dx etc. for m ≠ n.
The "no other way..." was referring to me not having an intuitive explanation to offer about why sin(x) and sin(2x) are orthogonal.
1. The magical orthogonal basis functions: complex sinusoids. Shifting of a time signal just multiplies the Fourier counterpart by a new phase (relative to its represented frequency). Thus transforming to the Fourier basis enables an alternate method of implementing a lot of linear operations (like convolution, i.e. filtering).
2. The magic of the fast implementation of the Discrete Fourier Transform (DFT) as the Fast Fourier Transform (FFT) makes the above alternate method faster. It can be most easily understood by a programmer as a clever reuse of intermediate results from inner loops. The FFT is O(N log N); a direct DFT would be O(N^2).
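Here's a minimal sketch of that reuse, as the textbook radix-2 recursion (not kissfft's actual code; assumes the length is a power of two): one size-N DFT becomes two size-N/2 DFTs whose results are shared across output bins.

```python
# Minimal radix-2 Cooley-Tukey FFT: reuse the half-size DFTs of the
# even and odd samples for both the low and high halves of the output.
import cmath
import math

def fft(x):
    n = len(x)
    if n == 1:
        return x[:]
    even = fft(x[0::2])          # DFT of even-indexed samples, size n/2
    odd = fft(x[1::2])           # DFT of odd-indexed samples, size n/2
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t                   # reuse, don't recompute
    return out

print(fft([1, 0, 0, 0]))   # impulse -> flat spectrum [1, 1, 1, 1]
```

The recurrence T(N) = 2 T(N/2) + O(N) is exactly where the N log N comes from.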
A mathy demonstration of this is at https://sourceforge.net/projects/kissfft/
For some intuition, consider music, especially on a violin. A Fourier series applies to a periodic function (wave), and represents the whole wave as sine waves that fit the one period exactly. So you get sine waves at frequencies 1, 2, ... times the fundamental frequency of the period. In music, these waves are called overtones.
Playing a violin, the overtones are fully real and even important! E.g., get a tuning fork and tune the A string (second from the right as the violinist sees them) to 440 cycles per second (440 Hertz, 440 Hz). The D string, the next one to the left, is supposed to have a frequency 2/3rds that of the A string. So, bow the two strings together and listen for the pitch 880 Hz, that is, 3 times the desired frequency of the D string and twice that of the A string. You are listening to the second overtone of the D string and the first overtone of the A string; that is, you are hearing the third Fourier series term of the D string and the second Fourier series term of the A string.

Adjust the tuning peg of the D string until you don't hear beats: if the D string's second overtone is at, say, 881 Hz, then you will get 1 beat a second -- so this is an accurate method of tuning. Similarly for tuning the E string from the A string and the G string from the D string -- on a violin, the frequencies of adjacent strings are in the ratio 3:2, that is, a perfect fifth. That's how violinists tune their violins -- which is needed often since violins are just wood and glue and less stable than, say, the cast iron frame of a piano.
For one more, hold a finger lightly against a string at 1/2 the length of the string and you hear a note one octave, twice the frequency, higher. That's often done in music, e.g., playing harmonics. And it's a good way to get the left hand where it belongs at the start of the famous Bach Preludio in E major that starts on the E half way up the E string. Touch lightly one third of the way up the string and you get three times the fundamental frequency, sometimes done in music to give a special tone color. Net, Fourier series, harmonics, and overtones are everyday realities for violinists.
E.g., on a piano, hold down a key, then play and release the key one octave lower, and notice that the strings of the held key still vibrate. That vibration was stimulated by the first overtone of the key struck and released.
The Fourier integral applies to functions on the whole real line. Very careful math is in Rudin, Real and Complex Analysis.
Yes, Fourier series and integrals can be looked at as all about perpendicular projections of rank 1 as emphasized in Halmos, Finite Dimensional Vector Spaces, written in 1942 when Halmos was an assistant to John von Neumann at the Institute for Advanced Study. That Halmos book is a finite dimensional (linear algebra) introduction to Hilbert space apparently at least partly due to von Neumann. So, right, Fourier theory can be done in Hilbert space.
Fourier integrals and series are very close both intuitively and mathematically, one often an approximation to the other. E.g., if you multiply in one (time, frequency) domain, then you convolve in the other (frequency, time) domain.

E.g., take a function on the whole real line, call it a box, that is 0 everywhere but 1 on, say, [-1,1]. The Fourier transform of the box is a sinc wave: a central lobe roughly like a bell curve, with ripples that decay away from 0. A convolution is just a moving weighted average, usually a smoothing.

Then given a function on the whole real line, regard that line as the time domain and multiply by the box. Now you can regard the result as one period, under the box, of a periodic function to which you can apply Fourier series. And in the frequency domain, the Fourier transform of the product is the Fourier transform of the function smoothed by the Fourier transform of the box -- and, thus, an approximation of the Fourier series coefficients of a periodic function with the one period under the box. Nice. That is partly why the fast Fourier transform algorithm is presented as applying both to Fourier series and the Fourier transform.
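A quick numerical sketch of that multiply/convolve duality, in the discrete setting (my own illustration, using a naive DFT and circular convolution; the continuous statement is analogous):

```python
# Convolution theorem for the DFT: the DFT of a circular convolution
# equals the pointwise product of the DFTs.
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def circ_conv(a, b):
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

a = [1.0, 2.0, 0.0, -1.0]
b = [0.5, 0.0, 1.0, 0.0]
lhs = dft(circ_conv(a, b))                      # convolve, then transform
rhs = [p * q for p, q in zip(dft(a), dft(b))]   # transform, then multiply
print(max(abs(u - v) for u, v in zip(lhs, rhs)))   # ~0
```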
Mostly Fourier theory is done with an L^2, that is, finite square integral, assumption, but somewhere in my grad school notes I have some of the theory with just an L^1 assumption. Nice notes!
Essentially the Fourier transform of a Gaussian bell curve is a Gaussian bell curve -- if the curve is wide in one domain, then it is narrow in the other.
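In formulas (with the exp(-2πiξx) convention; normalizations vary by text):

```latex
\int_{-\infty}^{\infty} e^{-\pi x^2}\, e^{-2\pi i \xi x}\, dx = e^{-\pi \xi^2},
\qquad
f(x) = e^{-\pi x^2 / a^2} \;\longmapsto\; \hat{f}(\xi) = a\, e^{-\pi a^2 \xi^2}.
```

So widening the Gaussian by a factor of a in the time domain narrows it by 1/a in the frequency domain.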
The uncertainty principle in quantum mechanics is essentially just the uncertainty principle from Fourier theory -- a function and its Fourier transform can't both be narrow -- with Plancherel's theorem doing the heavy lifting in the proof.
You can do a lot with Fourier theory just with little pictures such as for that box -- and get a lot of intuition for what is actually correct.
I got all wound up with this Fourier stuff when working on US Navy sonar signal processing.
Then at one point I moved on to power spectral analysis of wave forms, signals, sample paths of stochastic processes, as in
Blackman and Tukey, The Measurement of Power Spectra: From the Point of View of Communications Engineering.
Can get more on the relevant wave forms, signals, stochastic processes from
Athanasius Papoulis, Probability, Random Variables, and Stochastic Processes, ISBN 07-048448-1.
with more on the math of the relevant stochastic processes in a chapter of
J. L. Doob, Stochastic Processes.
Doob was long a leader in stochastic processes in the US and the professor of Halmos.
At one time a hot area for applications of Fourier theory and the fast Fourier transform was to looking for oil, that is, mapping underground layers, as in
Enders A. Robinson, Multichannel Time Series Analysis with Digital Computer Programs.
Antenna theory quickly gets deep into Fourier theory, which is how you can do beam forming, etc.
Can also see
Ron Bracewell, The Fourier Transform and its Applications.
Of course, one application is to holography. That's why you can cut a hologram in half and still get the whole image, except with less resolution: the cutting in half is like applying that box, and the resulting Fourier transform is just the same as before except smoothed some by the Fourier transform of the box.
As I recall, in
David R. Brillinger, Time Series Analysis: Data Analysis and Theory, Expanded Edition, ISBN 0-8162-1150-7,
every time-invariant linear system (maybe with some meager additional assumptions) has sine waves as eigenvectors. That is, feed in a sine wave and you will get out a sine wave with the same frequency, but maybe with amplitude and phase adjusted.
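A sketch of that eigenvector property for a discrete LTI system (an arbitrary made-up FIR filter, not from Brillinger): the steady-state output is the input sinusoid scaled by |H(w)| and shifted by arg H(w), where H is the frequency response.

```python
# Sinusoid in -> same-frequency sinusoid out, for any FIR (LTI) filter.
import cmath
import math

h = [0.5, 0.3, 0.2]                  # arbitrary impulse response
w = 2 * math.pi / 16                 # input frequency, radians/sample
H = sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))
gain, phase = abs(H), cmath.phase(H)

x = [math.cos(w * n) for n in range(64)]
# Filter output once all taps are filled (steady state), n = 2..63:
y = [sum(h[k] * x[n - k] for k in range(len(h)))
     for n in range(len(h) - 1, 64)]
# Prediction from the eigenvalue H: amplitude and phase adjusted only.
pred = [gain * math.cos(w * (i + len(h) - 1) + phase) for i in range(len(y))]
print(max(abs(a - b) for a, b in zip(y, pred)))   # ~0
```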
So, in a concert hall, the orchestra plays, and up in the cheap seats what you hear is the wave filtered by a convolution, that is, with the amplitudes and phases of the Fourier transform of the signal adjusted by the characteristics of the concert hall.
In particular, the usual audio tone controls are essentially just such adjustments of Fourier transform amplitudes and phases.
Since there are a lot of systems that are time-invariant and linear, or nearly so, there is no shortage of applications of Fourier theory.
Any signal can be represented as a value at each time: x(0) = 1, x(1) = 2, ..., x(100) = 5, etc. We can visualize this as you shouting 1 at time 0, 2 at time 1, and 5 at time 100. Alternatively, we can do the same with a larger number of persons.
Representation using dirac delta
--------------------------------------
Let's say that you have 100 persons at your disposal. You ask the first person to shout 1 at time 0, the second person to shout 2 at time 1, and the last person to shout 5 at time 100. At other times they stay silent. So with these persons you can represent the signal X. We call each of these persons a basis element. Mathematically they are delta functions of time, ie they get activated only at their specified time; at other times they are silent, ie 0. The advantage of this representation is that you have fine control over the signal: if you want to modify the value at time=5, you can just inform the 5th guy.
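The shouting-guys picture translates directly to code (a tiny sketch of my own, names made up):

```python
# A length-101 signal is exactly a sum of scaled unit impulses
# (discrete Dirac deltas), one "guy" per time slot.
N = 101
x = [0.0] * N
x[0], x[1], x[100] = 1.0, 2.0, 5.0      # the signal from the example

def delta(t):
    # The basis "guy" for time t: shouts 1 at time t, silent otherwise.
    return [1.0 if n == t else 0.0 for n in range(N)]

# Rebuild the signal as a linear combination of the deltas.
recon = [sum(x[t] * delta(t)[n] for t in range(N)) for n in range(N)]
print(recon == x)   # True
```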
Introduction to bases
--------------------------
Dirac deltas are not the only basis. You can ask multiple guys to shout at multiple times. They can even shout negative numbers. All you have to ensure is that their shouts add up to the value of X. Together, the guys should be able to produce any signal that X could be. We name this property "SPAN".
Instead of 100 guys, we could have 200 guys, ie 2 guys for each time, each telling half of the original value. However, this is wasteful since you have to pay for extra guys with no use. Hence we ask that the basis be orthogonal, ie each guy should have no correlation with the others in the group. So once we have uncorrelated, spanning guys, we can represent any signal using them.
Fourier transform
--------------------------
In the case of the Fourier transform, each guy shouts according to a sinusoidal wave. Let's say sine waves: guy 1 shouts the value of sin(f0·t) at each time t, guy 2 shouts the value of sin(f1·t), and so on. The f0, f1, etc. are the frequencies for each guy. It turns out that these guys are orthogonal to each other, and they can span all the signals. Thus we have the Fourier transform: instead of representing a signal as a value at each time step, we represent it as a value at each frequency.
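You can check that orthogonality directly in the discrete setting (a sketch using the DFT's complex sinusoids rather than plain sines, since that's where the orthogonality is cleanest):

```python
# The N discrete sinusoids e_k[n] = exp(2*pi*i*k*n/N) are orthogonal
# "guys": inner product N when k == l, and 0 otherwise.
import cmath
import math

N = 8

def e(k):
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))

print(abs(inner(e(2), e(5))))   # ~0: different frequencies don't overlap
print(abs(inner(e(3), e(3))))   # ~8: each guy overlaps only with himself
```

N orthogonal guys in an N-dimensional space automatically span it, which is why the DFT is invertible.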
Why Fourier transform
-------------------------
We have seen that as long as bases span and are orthogonal, they can define a transformation. But why is the Fourier transform so famous? This comes from the systems we use. The most common systems are LTI (linear time-invariant) systems. A property of such systems is that they act simply on sinusoidal waves: if a sinusoid of frequency f is passed through an LTI system, all the system can do is multiply it by a scalar. Any other wave will have a more complex effect. Hence if we can represent signals as sums of sinusoids, we can represent our system as just an amplifier at each frequency. This turns the whole of system analysis into a set of linear equations, which we are good at solving. So we love the Fourier transform.
The cool thing about this insight is that the converse is true: you can disaggregate any waveform into its additive harmonics. This means you can jam multiple signals into a single channel (eg a fibre optic cable) and then apply a Fourier transform at the other end to "untangle" them.
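A small sketch of that untangling (two made-up tones on one "channel", then a naive DFT at the receiving end):

```python
# Two tones added into one channel separate cleanly into DFT bins.
import cmath
import math

N = 64
# One channel carrying a tone at bin 5 plus a half-amplitude tone at bin 12.
x = [math.sin(2 * math.pi * 5 * n / N) + 0.5 * math.sin(2 * math.pi * 12 * n / N)
     for n in range(N)]
X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
     for k in range(N)]
mags = [abs(v) for v in X]
# All the energy lands in bins 5 and 12 (and their mirrors N-5, N-12):
print(mags[5], mags[12], mags[3])   # ~32, ~16, ~0
```

Real multiplexing (e.g. OFDM, WDM) is fancier, but this is the core idea: disjoint frequency bins don't interfere, so one transform recovers each signal.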