Continuous Distributions

3. $\chi^2$ distributions

3.1 Definition

Let $X$ be a continuous random variable. By definition, $X$ is said to follow the $\chi^2$ distribution with $k$ degrees of freedom if

$$X\sim \Gamma\left(\tfrac{k}{2},\tfrac{1}{2}\right)=\chi^2_k$$

3.2 Significance

For $k\in\mathbb{N}^*$, the $\chi^2_k$ distribution is the distribution of the sum of the squares of $k$ independent standard normal random variables.
The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in the construction of confidence intervals.

The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, for the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.
3.3 Square of a standard normal random variable

Let $X\sim\mathcal{N}(0,1)$, and let $Y=X^2$.

$$\begin{align*}
\forall x\in\mathbb{R}_+^*,\quad F_Y(x)&=\mathcal{P}(X^2<x)\\
&=\mathcal{P}(-\sqrt{x}<X<\sqrt{x})\\
&=\frac{1}{\sqrt{2\pi}}\int_{-\sqrt{x}}^{\sqrt{x}}e^{-\frac{t^2}{2}}\,\mathrm{d}t\\
&=\frac{\sqrt{2}}{\sqrt{\pi}}\int_{0}^{\sqrt{x}}e^{-\frac{t^2}{2}}\,\mathrm{d}t\\
\implies\forall x\in\mathbb{R}_+^*,\quad f_Y(x)&=F_Y'(x)\\
&=\frac{1}{2\sqrt{x}}\cdot\left(\frac{\sqrt{2}}{\sqrt{\pi}}e^{-\frac{(\sqrt x)^2}{2}}\right)\\
&=\frac{1}{\sqrt{2\pi x}}e^{-\frac{x}{2}}
\end{align*}$$

For $x\leq0$, it is trivial that $F_Y(x)=0$, so consequently $f_Y(x)=0$ as well.
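Equivalently, the CDF computed above can be written with the error function as $F_Y(x)=\operatorname{erf}(\sqrt{x/2})$. A minimal Monte Carlo sketch (Python, standard library only; the seed and sample size are arbitrary choices) checks this against simulated squares of standard normals:

```python
import math
import random

def F_Y(x):
    # CDF of Y = X^2 derived above: P(-sqrt(x) < X < sqrt(x)) = erf(sqrt(x/2))
    return math.erf(math.sqrt(x / 2))

random.seed(0)
N = 200_000
samples = [random.gauss(0, 1) ** 2 for _ in range(N)]

for x in (0.5, 1.0, 2.0):
    empirical = sum(s < x for s in samples) / N
    assert abs(empirical - F_Y(x)) < 0.01
```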
So we can conclude that:

$$\boxed{X^2\sim\Gamma\left(\tfrac{1}{2},\tfrac{1}{2}\right)=\chi^2_1}$$

3.4 Sum of squares of independent standard normal random variables

Let $n\in\mathbb{N}^*$, and let $X_1,\dots,X_n \sim \mathcal{N}(0,1)$ be independent standard normal random variables. Then:

$$\boxed{\sum_{i=1}^nX_i^2\sim \Gamma\left(\tfrac{n}{2},\tfrac{1}{2}\right)=\chi^2_n}$$

This follows immediately from the sum of independent gamma distributions with a common rate parameter.
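For even $n$, the $\Gamma(\tfrac{n}{2},\tfrac{1}{2})$ tail has a closed form (an Erlang tail), which gives a convenient way to sanity-check the boxed result by simulation. A sketch for $n=4$, assuming nothing beyond the Python standard library:

```python
import math
import random

random.seed(1)
N = 200_000
n = 4  # even degrees of freedom, so the Gamma(n/2, 1/2) tail has a closed form

# Sum of squares of n independent standard normals
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(N)]

def chi2_4_sf(x):
    # Survival function of chi^2_4 = Gamma(2, 1/2), i.e. an Erlang(2, 1/2) tail
    return math.exp(-x / 2) * (1 + x / 2)

for x in (2.0, 4.0, 8.0):
    empirical = sum(s > x for s in samples) / N
    assert abs(empirical - chi2_4_sf(x)) < 0.01
```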
3.5 Sum of chi-squared distributions

Let $n\in\mathbb{N}^*$, let $d_1,\dots,d_n\in\mathbb{N}^*$, and let $r=\sum_{i=1}^nd_i$. Let $X_1\sim \chi^2_{d_1},\dots,X_n\sim \chi^2_{d_n}$ be independent. Then:

$$\boxed{\sum_{i=1}^nX_i\sim \chi^2_{r}}$$

This follows immediately from the sum of gamma distributions.
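As a sanity check of this additivity, a sketch (with arbitrary parameters $d_1=2$, $d_2=4$) can compare $\chi^2_2+\chi^2_4$ against the closed-form $\chi^2_6$ tail, using the fact that a $\chi^2$ with even degrees of freedom $k$ is a sum of $k/2$ independent $\operatorname{Exp}(\tfrac{1}{2})$ variables:

```python
import math
import random

rng = random.Random(2)
N = 200_000

def chi2_even(k):
    # chi^2 with even df k is a sum of k/2 independent Exp(rate 1/2) variables
    return sum(rng.expovariate(0.5) for _ in range(k // 2))

# X1 ~ chi^2_2 and X2 ~ chi^2_4 independent, so X1 + X2 should follow chi^2_6
samples = [chi2_even(2) + chi2_even(4) for _ in range(N)]

def chi2_6_sf(x):
    # Erlang(3, 1/2) tail: exp(-x/2) * (1 + x/2 + (x/2)^2 / 2)
    h = x / 2
    return math.exp(-h) * (1 + h + h * h / 2)

for x in (3.0, 6.0, 12.0):
    empirical = sum(s > x for s in samples) / N
    assert abs(empirical - chi2_6_sf(x)) < 0.01
```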
3.6 Moments

Let $X\sim\chi_n^2$. Since a $\chi^2$-distribution is a $\Gamma$-distribution, the calculation of the moments follows from the moments of the $\Gamma$-distribution. We list here the expected value and the variance.
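Both values listed next (mean $n$, variance $2n$) can be checked numerically; a minimal Monte Carlo sketch with an arbitrary choice $n=5$:

```python
import random

rng = random.Random(3)
N = 200_000
n = 5

# Simulate chi^2_n as a sum of n squared standard normals
samples = [sum(rng.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(N)]

mean = sum(samples) / N
var = sum((s - mean) ** 2 for s in samples) / N

assert abs(mean - n) < 0.05       # E[X] = n
assert abs(var - 2 * n) < 0.3     # V[X] = 2n
```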
3.6.1 Expected Value

$$\mathbb{E}[X]=\frac{\frac{n}{2}}{\frac{1}{2}}=n$$

3.6.2 Variance

$$\mathbb{V}[X]=\frac{\frac{n}{2}}{\left(\frac{1}{2}\right)^2}=2n$$

4. $\mathcal{F}$-distributions

4.1 Definition

Let $d_1,d_2 \in\mathbb{N}^*$ and let $X$ be a continuous random variable. By definition, we say that $X$ follows the $\mathcal{F}$ distribution with parameters $(d_1,d_2)$ if there exist independent $X_1\sim\chi^2_{d_1}$ and $X_2\sim \chi^2_{d_2}$ such that:
$$X=\frac{\tfrac{X_1}{d_1}}{\tfrac{X_2}{d_2}}$$

By definition:

$$\boxed{\forall d_1,d_2\in\mathbb{N}^*,\quad\forall X_1\sim\chi^2_{d_1},\forall X_2\sim\chi^2_{d_2} \text{ independent}:\quad \frac{\tfrac{X_1}{d_1}}{\tfrac{X_2}{d_2}}\sim\mathcal{F}(d_1,d_2)}$$

4.2 Significance

The $\mathcal{F}$-distribution arises frequently as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA) and other F-tests.
A random variate of the F-distribution with parameters $d_1$ and $d_2$ arises as the ratio of two appropriately scaled chi-squared variates with respective degrees of freedom $d_1$ and $d_2$.
4.3 Probability Density Function

Let $d_1,d_2\in\mathbb{N}^*$. We have $\chi_{d_1}^2,\chi_{d_2}^2>0$ almost surely, so:
$$\begin{align*}
\forall x \in \mathbb{R}_+^*,\quad f_{\mathcal{F}(d_1,d_2)}(x)&=\int_{\mathbb{R}_+^*}t\,f_{\chi^2_{d_1}/d_1}(xt)\,f_{\chi^2_{d_2}/d_2}(t)\,\mathrm{d}t\\
&=d_1d_2\int_{\mathbb{R}_+^*}t\,f_{\chi^2_{d_1}}(xd_1t)\,f_{\chi^2_{d_2}}(d_2t)\,\mathrm{d}t\\
&=d_1d_2\int_{\mathbb{R}_+^*}t\,\frac{d_1^{\tfrac{d_1}{2}-1}t^{\tfrac{d_1}{2}-1}x^{\tfrac{d_1}{2}-1}e^{-\tfrac{d_1x}{2}t}\,d_2^{\tfrac{d_2}{2}-1}t^{\tfrac{d_2}{2}-1}e^{-\tfrac{d_2}{2}t}}{2^{\tfrac{d_1+d_2}{2}}\Gamma(\tfrac{d_1}{2})\Gamma(\tfrac{d_2}{2})}\,\mathrm{d}t\\
&=\frac{d_1^{\tfrac{d_1}{2}}d_2^{\tfrac{d_2}{2}}x^{\tfrac{d_1}{2}-1}}{2^{\tfrac{d_1+d_2}{2}}\Gamma(\tfrac{d_1}{2})\Gamma(\tfrac{d_2}{2})}\int_{\mathbb{R}_+^*}t^{\tfrac{d_1+d_2}{2}-1}e^{-\tfrac{d_1x+d_2}{2}t}\,\mathrm{d}t\\
&=\frac{d_1^{\tfrac{d_1}{2}}d_2^{\tfrac{d_2}{2}}x^{\tfrac{d_1}{2}-1}}{2^{\tfrac{d_1+d_2}{2}}\Gamma(\tfrac{d_1}{2})\Gamma(\tfrac{d_2}{2})}\int_{\mathbb{R}_+^*}\left(\frac{2}{d_1x+d_2}\right)^{\tfrac{d_1+d_2}{2}}u^{\tfrac{d_1+d_2}{2}-1}e^{-u}\,\mathrm{d}u \quad\text{with }u=\frac{(d_1x+d_2)t}{2}\\
&=\frac{d_1^{\tfrac{d_1}{2}}d_2^{\tfrac{d_2}{2}}x^{\tfrac{d_1}{2}-1}}{\left(d_1x+d_2\right)^{\tfrac{d_1+d_2}{2}}\Gamma(\tfrac{d_1}{2})\Gamma(\tfrac{d_2}{2})}\int_{\mathbb{R}_+^*}u^{\tfrac{d_1+d_2}{2}-1}e^{-u}\,\mathrm{d}u\\
&=\frac{d_1^{\tfrac{d_1}{2}}d_2^{\tfrac{d_2}{2}}x^{\tfrac{d_1}{2}-1}}{\left(d_1x+d_2\right)^{\tfrac{d_1+d_2}{2}}\Gamma(\tfrac{d_1}{2})\Gamma(\tfrac{d_2}{2})}\Gamma\left(\frac{d_1+d_2}{2}\right)\\
&=\frac{d_1^{\tfrac{d_1}{2}}d_2^{\tfrac{d_2}{2}}x^{\tfrac{d_1}{2}-1}}{\left(d_1x+d_2\right)^{\tfrac{d_1+d_2}{2}}\mathrm{B}(\tfrac{d_1}{2},\tfrac{d_2}{2})}\\
&=\frac{1}{x\,\mathrm{B}(\tfrac{d_1}{2},\tfrac{d_2}{2})}\sqrt{\frac{(d_1x)^{d_1}d_2^{d_2}}{(d_1x+d_2)^{d_1+d_2}}}
\end{align*}$$
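The derived density can be sanity-checked against simulated ratios $\tfrac{X_1/d_1}{X_2/d_2}$; `math.gamma` handles the half-integer arguments. A standard-library sketch with arbitrary parameters $d_1=3$, $d_2=5$:

```python
import math
import random

d1, d2 = 3, 5

def f_pdf(x):
    # Density derived above for F(d1, d2)
    B = math.gamma(d1 / 2) * math.gamma(d2 / 2) / math.gamma((d1 + d2) / 2)
    return (d1 ** (d1 / 2) * d2 ** (d2 / 2) * x ** (d1 / 2 - 1)
            / ((d1 * x + d2) ** ((d1 + d2) / 2) * B))

rng = random.Random(4)
N = 100_000

def chi2(k):
    # chi^2_k as a sum of k squared standard normals
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

samples = [(chi2(d1) / d1) / (chi2(d2) / d2) for _ in range(N)]

# P(0.5 < F < 2) from the derived density (midpoint rule) vs simulation
a, b, steps = 0.5, 2.0, 2000
h = (b - a) / steps
integral = sum(f_pdf(a + (i + 0.5) * h) for i in range(steps)) * h

empirical = sum(a < s < b for s in samples) / N
assert abs(empirical - integral) < 0.015
```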
5. Student's $t$-distribution

5.1 Definition

In probability and statistics, Student's $t$-distribution (or simply the $t$-distribution) is any member of a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situations where the sample size is small and the population's standard deviation is unknown. It was developed by the English statistician William Sealy Gosset under the pseudonym "Student".
Let $\nu \in\mathbb{N}^*$ and let $X$ be a continuous random variable. By definition, we say that $X$ follows the $t$ distribution with $\nu$ degrees of freedom if there exist independent $P\sim\mathcal{N}(0,1)$ and $S\sim \chi^2_{\nu}$ such that:
$$X=\frac{P}{\sqrt{\tfrac{S}{\nu}}}$$

By definition:

$$\boxed{\forall \nu\in\mathbb{N}^*,\quad\forall P\sim\mathcal{N}(0,1),\forall S\sim\chi^2_{\nu} \text{ independent}:\quad \frac{P}{\sqrt{\tfrac{S}{\nu}}}\sim\mathcal{T}(\nu)}$$

5.2 Significance

The $t$-distribution plays a role in a number of widely used statistical analyses, including Student's t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and linear regression analysis. Student's t-distribution also arises in the Bayesian analysis of data from a normal family.
If we take a sample of $n$ observations from a normal distribution, then the $t$-distribution with $\nu=n-1$ degrees of freedom can be defined as the distribution of the location of the sample mean relative to the true mean, divided by the sample standard deviation, after multiplying by the standardizing term $\sqrt{n}$. In this way, the $t$-distribution can be used to construct a confidence interval for the true mean.
The $t$-distribution is symmetric and bell-shaped, like the normal distribution. However, the $t$-distribution has heavier tails, meaning that it is more prone to producing values that fall far from its mean. This makes it useful for understanding the statistical behavior of certain types of ratios of random quantities, in which variation in the denominator is amplified and may produce outlying values when the denominator of the ratio falls close to zero. The Student's $t$-distribution is a special case of the generalised hyperbolic distribution.
5.3 Probability Density Function

$$\forall x\in\mathbb{R},\quad f_{\mathcal{T}(\nu)}(x)=\frac{\Gamma(\tfrac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\tfrac{\nu}{2})}\left(1+\frac{x^2}{\nu}\right)^{-\tfrac{\nu+1}{2}}$$
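This density can be checked against simulated ratios $P/\sqrt{S/\nu}$ from the definition above; a standard-library sketch with an arbitrary $\nu=7$:

```python
import math
import random

nu = 7

def t_pdf(x):
    # Density of Student's t with nu degrees of freedom
    return (math.gamma((nu + 1) / 2)
            / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
            * (1 + x * x / nu) ** (-(nu + 1) / 2))

rng = random.Random(5)
N = 100_000

# T = P / sqrt(S / nu) with P standard normal and S chi^2_nu, independent
samples = [rng.gauss(0, 1)
           / math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(nu)) / nu)
           for _ in range(N)]

# P(-1 < T < 1) by midpoint integration of the density vs simulation
a, b, steps = -1.0, 1.0, 2000
h = (b - a) / steps
integral = sum(t_pdf(a + (i + 0.5) * h) for i in range(steps)) * h

empirical = sum(a < s < b for s in samples) / N
assert abs(empirical - integral) < 0.015
```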