Friday, July 28, 2017

Section 6–4 A probability distribution

(Distribution of distances / Normal distribution / Distribution in velocity)

In this section, the three interesting concepts discussed are a distribution of distances, normal distribution, and distribution in velocity.

1. Distribution of distances:
“What would we expect now for the distribution of distances D? What is, for example, the probability that D = 0 after 30 steps? The answer is zero! (Feynman et al., 1963, section 6.4 A probability distribution).”

Dr. Sands modifies the previous random walk by varying the length of each step such that the root-mean-square step length is one unit. Mathematically, Srms = 1, in which the length S of a step may take any value, though most steps are close to one unit. In essence, this modification helps to model the thermal motion of a gas molecule. In this case, physicists define P(x, Δx) as the probability that the distance D will lie in an interval Δx located at x (say from x to x + Δx). One may write P(x, Δx) = p(x)Δx, in which the function p(x) is the probability density for ending up at the distance x from the original position.

According to Dr. Sands, the probability for any particular value of D is zero because there is no chance at all that the sum of the backward steps (of varying lengths) would exactly equal the sum of forward steps. However, I would emphasize that the concept of probability is now defined for continuous random variables instead of discrete random variables. Thus, the probability must be calculated by using an integral, and the integral over any single value is always zero. In other words, the probability of any single value of D is exactly zero because the integral over a single point (the area under a point) is zero.
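
A minimal simulation sketch of my own (Python with NumPy; the sample sizes and the normally distributed step lengths are illustrative assumptions chosen so that Srms = 1) makes this concrete: no simulated walk ends at exactly D = 0, yet the estimated probability density near D = 0 is finite.

import numpy as np

# Simulate random walks of 30 steps whose lengths vary but have rms length 1.
# (Assumption for illustration: steps drawn from a standard normal, so Srms = 1.)
rng = np.random.default_rng(0)
n_walks, n_steps = 100_000, 30

steps = rng.standard_normal((n_walks, n_steps))
D = steps.sum(axis=1)  # final distance of each walk

# No walk lands at exactly D = 0: a single value has zero probability.
print("walks with D exactly 0:", np.count_nonzero(D == 0.0))

# But the probability of landing *near* 0 is finite: P(x, dx) is about p(x) dx.
dx = 0.5
p_at_0 = np.mean(np.abs(D) < dx / 2) / dx  # estimate of the density p(0)
print(f"estimated p(0) = {p_at_0:.4f}")    # close to 1/sqrt(2*pi*30), i.e. 0.073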

Note: You may prefer Feynman’s insightful explanation of random walk in chapter 41: “We have already answered this question, because once we were discussing the superposition of light from a whole lot of different sources at different phases, and that meant adding a lot of arrows at different angles (Chapter 32). There we discovered that the mean square of the distance from one end to the other of the chain of random steps, which was the intensity of the light, is the sum of the intensities of the separate pieces… (Feynman et al., 1963, section 41–4 The random walk).”

2. Normal distribution:
“…The probability density function we have been describing is one that is encountered most commonly. It is known as the normal or Gaussian probability density (Feynman et al., 1963, section 6.4 A probability distribution).”

Dr. Sands briefly describes the normal or Gaussian probability density, for which the total probability for all possible events between x = −∞ and x = +∞ is exactly 1. The probability density function can be represented as p(x) = (1/σ√[2π]) exp[−x²/2σ²], where σ is the standard deviation. There are five characteristics of the normal distribution: (1) The bell curve is symmetric about the mean, m. (2) The mode occurs at x = m. (3) The curve approaches the horizontal axis asymptotically. (4) The curve has its points of inflection at x = m ± σ. (5) The total area under the curve is equal to 1 (Walpole & Myers, 1985). However, the term normal distribution is arguably a misnomer: it actually denotes a family of distributions, and the name carries the connotation that other distributions are abnormal.
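
As a quick check of property (5), here is a short sketch of my own (plain Python; the value σ = 2 and the integration range of ±8σ are arbitrary choices) that evaluates this density and integrates it numerically.

import math

def normal_pdf(x, mean=0.0, sigma=1.0):
    """Gaussian probability density with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Crude trapezoidal integration over [-8*sigma, 8*sigma].
sigma = 2.0
n = 10_000
xs = [(-8.0 + 16.0 * i / n) * sigma for i in range(n + 1)]
area = sum((normal_pdf(xs[i], 0.0, sigma) + normal_pdf(xs[i + 1], 0.0, sigma)) / 2
           * (xs[i + 1] - xs[i]) for i in range(n))
print(f"total area under the curve = {area:.6f}")  # very close to 1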

In Theory of the motion of the heavenly bodies moving about the sun in conic sections, Gauss (1809) uses the method of least squares to deduce the orbits of celestial bodies. Historically speaking, his work superseded Laplace’s method of estimation by combining the method of least squares with principles of probability and the normal distribution to minimize the error of estimation. That is, the least squares estimates of orbital paths are the same as the maximum likelihood estimates if the errors in the observations follow a normal distribution.

Note: In his seminal paper titled On the motion of small particles suspended in liquids at rest required by the molecular-kinetic theory of heat, Einstein (1905) derives the probability distribution of a molecule’s resulting displacement x in a given time t as f(x, t) = (n/√[4πD]) (exp[−x²/4Dt])/√t.
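
As a side note of my own: the hallmark of this distribution is that the mean-square displacement grows linearly with time, ⟨x²⟩ = 2Dt. A small numerical sketch (the values of D and t below are arbitrary illustrative choices) confirms this.

import math

D, t, n = 1.0, 0.5, 1.0  # arbitrary illustrative values (n is the particle count)

def f(x):
    """Einstein's displacement distribution at time t (normalized to n)."""
    return n * math.exp(-x * x / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

# Trapezoidal estimate of <x^2> = integral of x^2 f(x) dx over a wide range.
width = math.sqrt(4 * D * t)
m = 20_000
xs = [(-10 + 20 * i / m) * width for i in range(m + 1)]
msd = sum((xs[i] ** 2 * f(xs[i]) + xs[i + 1] ** 2 * f(xs[i + 1])) / 2
          * (xs[i + 1] - xs[i]) for i in range(m))
print(f"<x^2> = {msd:.6f}, 2Dt = {2 * D * t:.6f}")  # the two agree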

3. Distribution in velocity:
“…We call Np(v) the ‘distribution in velocity.’ The area under the curve between two velocities v1 and v2 … represents the expected number of molecules with velocities between v1 and v2 (Feynman et al., 1963, section 6.4 A probability distribution).”

Physicists may want to know how fast molecules escaping from a bottle of an organic compound are moving as a result of collisions with other molecules. Dr. Sands clarifies that the spread of such molecules in still air may be detected from their color or odor (e.g., colored smoke grenades). In general, these molecules have different velocities, and they continue to change their velocities after collisions. Thus, we describe the probability that any particular molecule will have a velocity between v and v + Δv as p(v)Δv, where p(v), a probability density, is a function of the velocity v. Importantly, molecular velocities are described by the Maxwellian velocity distribution instead of a normal distribution.
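
A hedged sketch of my own (NumPy; the units are chosen so that kT/m = 1, an assumption for illustration): drawing each velocity component from a Gaussian, as kinetic theory suggests, yields a distribution of molecular speeds that is clearly not normal, since a speed is never negative.

import numpy as np

rng = np.random.default_rng(1)
n_molecules = 100_000

# Three Cartesian velocity components, each Gaussian with unit variance
# (units chosen so that kT/m = 1).
components = rng.standard_normal((n_molecules, 3))
speeds = np.linalg.norm(components, axis=1)  # Maxwellian-distributed speeds

print(f"mean speed = {speeds.mean():.3f}")                  # ~ sqrt(8/pi) = 1.60
print(f"rms speed  = {np.sqrt((speeds ** 2).mean()):.3f}")  # ~ sqrt(3)    = 1.73
print(f"speeds below zero: {(speeds < 0).sum()}")           # 0, unlike a Gaussian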

Dr. Sands mentions that we shall see later how Maxwell, using common sense and the ideas of probability, found a mathematical expression for p(v). However, in footnote 1 of Chapter 39, it is stated that “[t]his argument, which was the one used by Maxwell, involves some subtleties. Although the conclusion is correct, the result does not follow purely from the considerations of symmetry that we used before, since, by going to a reference frame moving through the gas, we may find a distorted velocity distribution. We have not found a simple proof of this result (Feynman et al., 1963).” Interested readers may want to read Peliti’s (2007) refinement of an argument due to Maxwell for the equipartition of kinetic energy in a mixture of ideal gases with different masses.

Questions for discussion:
1. Why should we ask for the probability of obtaining distances D near 0, 1, or 2 instead of exactly 0, 1, or 2?
2. Should we define Gaussian distribution such that it is not exactly the same as a normal distribution?
3. Could we speak of the speed of a molecule instead of using a probability description?

The moral of the lesson: the distribution in velocities of gas molecules is not described by a normal distribution.

References:
1. Einstein, A. (1905). On the motion of small particles suspended in liquids at rest required by the molecular-kinetic theory of heat. Annalen der Physik, 17, 549-560.
2. Feynman, R. P., Leighton, R. B., & Sands, M. (1963). The Feynman Lectures on Physics, Vol I: Mainly mechanics, radiation, and heat. Reading, MA: Addison-Wesley.
3. Gauss, K. F. (1809). Theory of Motion of the Heavenly Bodies Moving About the Sun in Conic Sections: A Translation of Theoria Motus. New York: Dover Phoenix Editions.
4. Peliti, L. (2007). On the equipartition of kinetic energy in an ideal gas mixture. European Journal of Physics, 28(2), 249-254.
5. Walpole, R. E., & Myers, R. H. (1985). Probability and Statistics for Engineers and Scientists (3rd ed.). New York: Macmillan.

Friday, July 21, 2017

Section 6–3 The random walk

(Average distance / Root-mean-square distance / Empirical probability)

In this section, the three interesting concepts discussed are average distance, root-mean-square distance, and empirical probability.

1. Average distance:
“We might, therefore, ask what is his average distance traveled in absolute value, that is, what is the average of |D| (Feynman et al., 1963, section 6.3 The random walk).”

Dr. Sands explains that the problem of a random walk is related to the motion of atoms as well as the coin-tossing problem. He characterizes a walker’s progress by the net distance DN moved in N steps. We may expect the walker’s average progress in a one-dimensional walk to be zero because he is equally likely to move forward or backward. However, one may intuitively feel that, as the number of steps N increases, the walker is likely to have strayed farther from the original position. This motivates asking for the average distance moved in absolute value, that is, the average of |D|.

Importantly, a random walk in one or two dimensions is recurrent, which means the walker returns to the original position with probability one. Conversely, in a three-dimensional random walk, the walker is unlikely to return to the initial position. Thus, Shizuo Kakutani, a mathematician, describes these two different consequences of the random walk as “[a] drunk man will find his way home, but a drunk bird may get lost forever (Durrett, 2010, p. 163).” In general, mathematicians say that a random walk is recurrent if it returns to its original position infinitely often with probability one, and transient if it returns to its original position only finitely often with probability one.
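
A rough simulation of my own (NumPy; the walk counts and step budgets are arbitrary) illustrates Kakutani’s remark: within a fixed number of steps, almost all one-dimensional walks revisit the origin, while a large fraction of three-dimensional walks never do (the exact return probability in three dimensions is about 0.34).

import numpy as np

rng = np.random.default_rng(2)

def return_fraction(dim, n_walks=5_000, n_steps=500):
    """Fraction of lattice walks that revisit the origin within n_steps."""
    # Each step moves +1 or -1 along one randomly chosen coordinate axis.
    axes = rng.integers(0, dim, size=(n_walks, n_steps))
    signs = rng.choice([-1, 1], size=(n_walks, n_steps))
    steps = np.zeros((n_walks, n_steps, dim), dtype=np.int64)
    np.put_along_axis(steps, axes[..., None], signs[..., None], axis=2)
    positions = steps.cumsum(axis=1)
    at_origin = (positions == 0).all(axis=2)  # True where a walk sits at the origin
    return at_origin.any(axis=1).mean()

print(f"1D walks returning home: {return_fraction(1):.3f}")  # close to 1
print(f"3D walks returning home: {return_fraction(3):.3f}")  # roughly 0.3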

Note: The term “random walk” was coined by Karl Pearson in 1905. In his own words, “A man starts from a point O and walks l yards in a straight line; he then turns through any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these n stretches he is at a distance between r and r+dr from this starting point, O (p. 294).” His question was answered by Lord Rayleigh (1905), who had solved a general form of this problem in 1880.

2. Root-mean-square distance:
“… to represent the ‘progress made away from the origin’ in a random walk, we can use the ‘root-mean-square distance’ Drms = √⟨D²⟩ = √N (Feynman et al., 1963, section 6.3 The random walk).”

Dr. Sands elaborates that it is more convenient to measure “progress” or random wandering by the square of the distance, D², which is positive for both forward and backward motion. Mathematically, we can show that the expected value of DN² is just the number N of steps taken. The “expected value” means the probable value based on our best guess about the average behavior in many repeated sequences. We may represent this expected value by ⟨DN²⟩ and refer to it as the “mean square distance.” In addition, we can use the “root-mean-square distance” Drms = √N to represent the “progress made away from the origin” in a random walk. These distances are measured in units of one step (instead of meters or other units) for simplicity.

Alternatively, we can derive the root-mean-square distance of a one-dimensional random walker as follows:
If there are N steps, D² = (Δx1 + Δx2 + … + ΔxN)²
= Δx1² + Δx2² + … + ΔxN² + 2Δx1Δx2 + 2Δx1Δx3 + … + 2Δx(N−1)ΔxN
On the average, the cross terms (2Δx1Δx2 + … + 2Δx(N−1)ΔxN) contain equal amounts of positive and negative quantities.
That is, we expect 2Δx1Δx2 + … + 2Δx(N−1)ΔxN = 0.
If we let every step be one unit distance, D² = Δx1² + Δx2² + … + ΔxN² = N.
Therefore, the root-mean-square distance √(D²) = √N.
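
A quick Monte-Carlo check of this derivation (a sketch of my own; the sample sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
n_walks = 20_000

# Each walk takes N unit steps of random sign; D_rms should approach sqrt(N).
for N in (10, 100, 1000):
    steps = rng.choice([-1, 1], size=(n_walks, N))
    D = steps.sum(axis=1)
    print(f"N = {N:4d}: D_rms = {np.sqrt(np.mean(D ** 2)):7.3f}, "
          f"sqrt(N) = {np.sqrt(N):7.3f}")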

Note: Feynman later explains the problem of random walk again as follows: “[i]t is like the famous drunken sailor problem: the sailor comes out of the bar and takes a sequence of steps, but each step is chosen at an arbitrary angle, at random… (Feynman et al., 1963, section 41–4 The random walk).”

3. Empirical probability:
“…An experimental physicist usually says that an ‘experimentally determined’ probability has an ‘error,’ and writes P(H) = NH/N ± 1/(2√N) (Feynman et al., 1963, section 6.3 The random walk).”

According to Dr. Sands, an experimental physicist commonly says that an empirical probability has an “error,” and one may write this probability as P(H) = NH/N ± 1/(2√N). In other words, this expression implies that there is a “correct” probability which could be computed if we had sufficient knowledge, and that an observation may have an “error” due to a fluctuation. However, the empirical probability P(H) of an event may vary depending on the experimental conditions or the experimenter performing the experiment. In essence, we should be cognizant of the subjectivity of the probability concept: it is always based on uncertain knowledge, and its quantitative determination is subject to change as we obtain more information.
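
A small sketch of my own (NumPy; the number of tosses per experiment and the number of repetitions are arbitrary choices) shows that the spread of the empirical probability NH/N over many repeated experiments is indeed about 1/(2√N).

import numpy as np

rng = np.random.default_rng(4)
N, n_experiments = 3000, 10_000

# Number of heads in each repetition of an N-toss experiment with a fair coin.
heads = rng.binomial(N, 0.5, size=n_experiments)
p_hat = heads / N  # empirical probabilities P(H)

print(f"mean of P(H)    = {p_hat.mean():.4f}")    # ~ 0.5
print(f"std dev of P(H) = {p_hat.std():.4f}")     # ~ the quoted error
print(f"1/(2*sqrt(N))   = {1 / (2 * np.sqrt(N)):.4f}")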

The concept of empirical probability is applicable in a casino: for example, a die may be “loaded” with metal so that one particular side is more likely to land facing up. Furthermore, it is not always possible to perform experiments in which the same experimental conditions are maintained. Thus, Mayants (2012) explains that “experimental determination of probability distribution is in the general case a practically unsolvable problem (p. 70).” Nevertheless, physicists still need experiments to verify the correctness of a hypothetical probability distribution based on theoretical considerations. More importantly, a different physicist performing experiments under slightly different conditions may conclude that P(H) is different.

Questions for discussion:
1. What is the average distance of a walker in a one-dimensional random walk?
2. What is the root-mean-square distance of a walker in a one-dimensional random walk?
3. What is the empirical probability of a drunk walker located at the original position?

The moral of the lesson: the probability concept is in a sense subjective because it is always based on uncertain knowledge and an empirical probability has an experimental “error.”

References:
1. Durrett, R. (2010). Probability: Theory and Examples (4th ed.). Cambridge: Cambridge University Press.
2. Feynman, R. P., Leighton, R. B., & Sands, M. (1963). The Feynman Lectures on Physics, Vol I: Mainly mechanics, radiation, and heat. Reading, MA: Addison-Wesley.
3. Mayants, L. (2012). The Enigma of Probability and Physics. Dordrecht: D. Reidel.
4. Pearson, K. (1905). The Problem of the Random Walk. Nature, 72(1865), 294.
5. Rayleigh, J. W. S. (1905). The Problem of the Random Walk. Nature, 72(1866), 318.

Friday, July 14, 2017

Section 6–2 Fluctuations

(Binomial distribution / Pascal’s triangle / Binomial probability)

In this section, the three interesting concepts discussed are binomial distribution, Pascal’s triangle, and binomial probability.

1. Binomial distribution:
“We can get a better feeling for the details of these results if we plot a graph of the distribution of the results (Feynman et al., 1963, section 6.2 Fluctuations).”

Dr. Sands initially asks how many “heads” we are expected to get if a coin is tossed N times. To illustrate the concept of the binomial distribution, 100 experiments were actually done by shaking 30 coins violently in a box and then counting the number of heads. As a result, there were 3000 coin tosses in all, and the number of heads obtained was 1493. Simply put, the fraction of tosses that gave heads is 0.498, which is slightly less than half. Thus, we should not conclude that the probability of throwing heads is less than 0.5. However, it is possible to have a fluctuation from the binomial distribution such that one particular set of observations of 30 coins gives 16 heads most often instead of 15 heads.

A binomial distribution can be related to a repeatable experiment or observation that has two different outcomes. This distribution is important in physics because it can be used to describe a Bernoulli process, such as a random walk of a molecule in one dimension. There are four important properties of a binomial distribution: (1) There is a fixed number of repeatable trials, for example, toss a coin 30 times. (2) The trials are independent of each other, i.e., the result of one trial has no influence on any other trial. (3) The probability of an outcome denoted by p remains constant throughout the experiment. (4) There are two possible outcomes, such as a “head” and a “tail.” (Remember “fict.”)
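
The following minimal sketch (my own, in NumPy) mimics the experiment described above: 100 trials of shaking 30 fair coins. Rerunning it with different seeds shows the kind of fluctuation discussed earlier, where the most frequent count need not be 15.

import numpy as np

rng = np.random.default_rng(5)
n_trials, n_coins = 100, 30

# Number of heads in each trial of 30 fair coins.
heads_per_trial = rng.binomial(n_coins, 0.5, size=n_trials)
total_heads = heads_per_trial.sum()

print(f"total heads: {total_heads} out of {n_trials * n_coins} tosses")
print(f"fraction of heads: {total_heads / (n_trials * n_coins):.3f}")  # near 0.5
values, counts = np.unique(heads_per_trial, return_counts=True)
print(f"most frequent number of heads: {values[counts.argmax()]}")     # 15 or nearby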

2. Pascal’s triangle:
“…The set of numbers which appears in such a diagram is known as Pascal’s triangle. The numbers are also known as the binomial coefficients because they also appear in the expansion of (a + b)ⁿ (Feynman et al., 1963, section 6.2 Fluctuations).”

Dr. Sands explains that the number of “ways” of getting different numbers of heads and tails can be illustrated by the set of numbers in Pascal’s triangle. The numbers are also known as the binomial coefficients (or combinatorial numbers) because they appear as the coefficients in the expansion of (a + b)ⁿ. If n is the number of tosses and k is the number of heads obtained, then the coefficient can be represented as C(n, k) or nCk. In general, the binomial coefficients can be computed from C(n, k) = n!/[k!(n−k)!], in which n! is commonly called “n-factorial” and represents the product (n)(n−1)(n−2)…(3)(2)(1). In essence, the expression C(n, k) counts the different combinations of k “heads” that could occur in a sequence of n tosses, without regard to the order in which they occur.
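
For illustration, here is a sketch of my own using Python’s built-in math.comb: the coefficients C(n, k) can be computed directly and arranged into the rows of Pascal’s triangle.

import math

def pascal_row(n):
    """The n-th row of Pascal's triangle: C(n, 0), C(n, 1), ..., C(n, n)."""
    return [math.comb(n, k) for k in range(n + 1)]

for n in range(6):
    print(pascal_row(n))  # [1], [1, 1], [1, 2, 1], [1, 3, 3, 1], ...

# The number of ways of getting 2 heads in 4 tosses:
print(math.comb(4, 2))  # 6 = 4!/(2! * 2!)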

Note: There are claims that the triangle of binomial coefficients was discovered earlier by mathematicians in India, Greece, Iran, China, Germany, and Italy. For example, Yang Hui (1238 – 1298) presented the binomial coefficients in his book, titled Xiangjie Jiuzhang Suanfa (Needham, 1959, p. 135). More importantly, he acknowledged that his method of finding square roots and cubic roots by using these numbers was discovered earlier by another mathematician Jia Xian (1010 – 1070). Jia Xian’s book entitled Shi Suo Suan Shu was written about 600 years before Pascal (1623 – 1662).

3. Binomial probability:
“…This probability function is called the Bernoulli or, also, the binomial probability (Feynman et al., 1963, section 6.2 Fluctuations).”

Dr. Sands elaborates that we can compute the probability P(k, n) of throwing k heads in n tosses by using the definition of probability mentioned earlier. First, the total number of possible outcomes in n tosses is 2ⁿ because there are 2 outcomes for each toss. Next, the number of ways of obtaining k heads in n tosses is C(n, k), and thus the probability P(k, n) is equal to C(n, k)/2ⁿ. In general, we can designate the two outcomes by W (for “win”) and L (for “lose”), in which the probability of W or L in a single trial need not be equal to 0.5. If we let p be the probability of obtaining the result W, then the probability of L is (1−p), which can be represented as q. In short, the probability P(k, n) that W will be obtained k times in n trials is P(k, n) = C(n, k) pᵏqⁿ⁻ᵏ.
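
A short sketch of my own that packages this formula as a function (plain Python; the example values are illustrative):

import math

def binomial_probability(k, n, p=0.5):
    """Probability P(k, n) = C(n, k) * p^k * q^(n-k) of exactly k wins in n trials."""
    q = 1.0 - p
    return math.comb(n, k) * p ** k * q ** (n - k)

# Probability of exactly 15 heads in 30 tosses of a fair coin:
print(f"P(15, 30) = {binomial_probability(15, 30):.4f}")  # about 0.1445

# The probabilities over all k must sum to 1:
print(sum(binomial_probability(k, 30) for k in range(31)))  # 1.0 (up to rounding)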

Curiously, Dr. Sands simply calls this probability function the Bernoulli or the binomial probability. However, the Bernoulli distribution is commonly known as the special case of the binomial distribution in which there is only one trial (n = 1). In other words, we can use the binomial distribution to find the probability of repeated Bernoulli trials. Importantly, if the Bernoulli trials are independent, then the number of “wins” in a sequence of such trials has a binomial distribution. Furthermore, some may use 1 and 0 to represent “head” and “tail” (or vice versa) in a coin toss.

Questions for discussion:
1. What are the properties of a binomial distribution?
2. How are binomial coefficients related to the number of “ways” of getting a certain number (k) of heads in a number (n) of trials without considering the order of heads (k!) and tails ([n−k]!)?
3. Suppose 1000 randomly selected residents of New York are asked whether they would vote for Clinton or Trump. Assume that the residents of New York are equally divided on this issue. What is the binomial probability that 501 or more of them vote for Trump?

The moral of the lesson: the binomial probability refers to a set of n trials in which the probability P(k, n) that the number of “wins” or successes will be k is P(k, n) = C(n, k) pᵏqⁿ⁻ᵏ.

References:
1. Feynman, R. P., Leighton, R. B., & Sands, M. (1963). The Feynman Lectures on Physics, Vol I: Mainly mechanics, radiation, and heat. Reading, MA: Addison-Wesley.
2. Needham, J. (1959). Science and Civilization in China, vol. 3: Mathematics and the Sciences of the Heavens and the Earth. New York: Cambridge University Press.