# Advanced Probability

## 1. Chebyshev's Inequality

### 1.1 Statement

Let $X\sim \mathcal{D}$ with finite variance $\sigma^2$ and finite expectation $\mu$. Then:
$$\forall r\in\mathbb{R}_+^*,\quad \mathcal{P}\left(\lvert X-\mu \rvert \ge r\sigma\right) \le \frac{1}{r^2}$$

### 1.2 Proof

Let $X\sim \mathcal{D}$ with finite variance $\sigma^2$ and finite expectation $\mu$.
Suppose, for the sake of contradiction, that there exists $r\in\mathbb{R}_+^*$ such that:
$$\mathcal{P}(\lvert X-\mu \rvert \ge r\sigma) > \frac{1}{r^2}$$

Since $\lvert X-\mu \rvert \ge r\sigma$ if and only if $(X-\mu)^2 \ge r^2\sigma^2$, we have:
$$\begin{align*}
\mathcal{P}((X-\mu)^2 \ge r^2\sigma^2)&> \frac{1}{r^2}\\
\implies r^2\sigma^2\,\mathcal{P}((X-\mu)^2 \ge r^2\sigma^2)&> \sigma^2\\
\mathbb{V}[X]&= \mathbb{E}\left[(X-\mu)^2\right]\\
&=\mathbb{E}\left[(X-\mu)^2 \mid (X-\mu)^2 \ge r^2\sigma^2\right]\cdot \mathcal{P}((X-\mu)^2 \ge r^2\sigma^2)\\
&\quad + \mathbb{E}\left[(X-\mu)^2 \mid (X-\mu)^2 < r^2\sigma^2\right]\cdot \mathcal{P}((X-\mu)^2 < r^2\sigma^2) \\
&\ge \mathbb{E}\left[(X-\mu)^2 \mid (X-\mu)^2 \ge r^2\sigma^2\right]\cdot \mathcal{P}((X-\mu)^2 \ge r^2\sigma^2) \\
&\ge r^2\sigma^2\,\mathcal{P}((X-\mu)^2 \ge r^2\sigma^2) \\
&> \sigma^2
\end{align*}$$

This contradicts $\mathbb{V}[X]=\sigma^2$. $\blacksquare$
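To see the bound in action, here is a minimal Python sketch (assuming NumPy is available; the exponential distribution and the values of $r$ are arbitrary illustrative choices) comparing empirical tail probabilities against the $1/r^2$ bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential distribution with rate 1: mu = 1, sigma = 1
samples = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = 1.0, 1.0

for r in (1.5, 2.0, 3.0):
    tail = np.mean(np.abs(samples - mu) >= r * sigma)
    print(f"r={r}: P(|X-mu| >= r*sigma) = {tail:.4f} <= {1/r**2:.4f}")
```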
## 2. Discrete Compound Distributions

A discrete compound distribution is a discrete distribution whose parameter is a random variable $X$.
For example, if $X\sim \mathcal{U}(0,1)$, then $\mathcal{B}(X)$ is a discrete compound distribution.
We will analyse two cases:
- The random variable $X$ is discrete
- The random variable $X$ is continuous

### 2.1 Compounding with a discrete random variable

#### 2.1.1 Definition

Let $\mathcal{D}$ be a family of distributions with parameter $s\in S$ and values in $A$, and let $X$ be a discrete random variable with values in $Q\subseteq S$. A discrete random variable $Y$ is said to follow the compound distribution $\mathcal{D}(X)$ if:
$$\forall k \in Q, \quad Y[X=k]\sim \mathcal{D}(k)$$

#### 2.1.2 Probability Mass Function

$$\forall y \in A,\quad \mathcal{P}(Y=y)=\sum_{k\in Q}\mathcal{P}(Y=y\mid X=k)\cdot \mathcal{P}(X=k)=\sum_{k\in Q}\mathcal{P}(Y[X=k]=y)\cdot \mathcal{P}(X=k)$$

#### 2.1.3 Example

Let $n\in\mathbb{N}^*$, let $X\sim \mathcal{D}(1,n)$, and let $Y\sim \mathcal{D}(1,X)$. $Y$ can be thought of as following a discrete uniform distribution with a random upper bound $X$. Our goal is to calculate the probability distribution of $Y$, and then its expected value and variance.
We will start with the probability mass function, as it fully describes the distribution of $Y$:
$$\begin{align*}
\forall k \in \{1,\dots,n\},\quad \mathcal{P}(Y=k)&=\sum_{s=1}^n\mathcal{P}(Y=k \mid X=s)\,\mathcal{P}(X=s)\\
&=\frac{1}{n}\sum_{s=1}^n \frac{\mathbb{1}_{[1,s]}(k)}{s} \\
&=\frac{1}{n}\sum_{s=k}^n \frac{1}{s}
\end{align*}$$
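As a sanity check, the following Python sketch (assuming NumPy; the value $n=10$ is arbitrary) evaluates this PMF, confirms it sums to 1, and compares it with a simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# Exact PMF: P(Y=k) = (1/n) * sum_{s=k}^{n} 1/s
pmf = np.array([sum(1 / s for s in range(k, n + 1)) / n for k in range(1, n + 1)])
print(pmf.sum())  # 1.0 up to floating-point error

# Monte Carlo: X ~ D(1,n), then Y | X=s ~ D(1,s)
x = rng.integers(1, n + 1, size=500_000)
y = rng.integers(1, x + 1)  # NumPy broadcasts the per-sample upper bound
print(np.bincount(y, minlength=n + 1)[1:] / y.size)  # close to pmf
```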
For the expected value $\mathbb{E}[Y]$, we have:

$$\begin{align*}
\forall k\in\{1,\dots,n\},\quad \mathbb{E}[Y\mid X=k]&=\frac{k+1}{2}\\
\implies \mathbb{E}[Y\mid X]&=\frac{X+1}{2}\\
\implies \mathbb{E}[Y]&=\mathbb{E}[\mathbb{E}[Y\mid X]] \\
&=\frac{1}{2}\mathbb{E}[X]+\frac{1}{2}\\
&=\frac{n+1}{4}+\frac{1}{2}\\
&=\frac{n+3}{4}
\end{align*}$$

To calculate the variance, we start with the conditional variance:
$$\begin{align*}
\forall k\in\{1,\dots,n\},\quad \mathbb{V}[Y\mid X=k]&=\frac{k^2-1}{12}\\
\implies \mathbb{V}[Y\mid X]&=\frac{X^2-1}{12}
\end{align*}$$

Now we can compute $\mathbb{V}[Y]$ using the law of total variance:
$$\begin{align*}
\mathbb{V}[Y]&=\mathbb{V}[\mathbb{E}[Y\mid X]] + \mathbb{E}[\mathbb{V}[Y\mid X]]\\
&=\mathbb{V}\left[\frac{X+1}{2}\right]+\mathbb{E}\left[\frac{X^2-1}{12}\right] \\
&=\frac{1}{4}\mathbb{V}[X]+\frac{1}{12}\mathbb{E}[X^2]-\frac{1}{12}\\
&=\frac{1}{4}\mathbb{V}[X]+\frac{1}{12}(\mathbb{V}[X]+\mathbb{E}[X]^2)-\frac{1}{12}\\
&=\frac{1}{3}\mathbb{V}[X]+\frac{1}{12}\mathbb{E}[X]^2-\frac{1}{12}\\
&=\frac{n^2-1}{36}+\frac{(n+1)^2}{48}-\frac{1}{12}\\
&=\frac{4n^2-4+3n^2+6n+3-12}{144}\\
&=\frac{7n^2+6n-13}{144}\\
&=\frac{(n-1)(7n+13)}{144}
\end{align*}$$
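Both closed forms can be checked by simulation, as in this sketch (same assumptions and conventions as the previous snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

x = rng.integers(1, n + 1, size=1_000_000)  # X ~ D(1,n)
y = rng.integers(1, x + 1)                  # Y | X=s ~ D(1,s)

print("E[Y]:", y.mean(), "vs", (n + 3) / 4)                  # 3.25 for n=10
print("V[Y]:", y.var(), "vs", (n - 1) * (7 * n + 13) / 144)  # ~5.19 for n=10
```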
### 2.2 Compounding with a continuous random variable

#### 2.2.1 Definition

Let $\mathcal{D}$ be a family of distributions with parameter $s\in S$ and values in $A$, and let $X$ be a continuous random variable with values in $Q\subseteq S$. A discrete random variable $Y$ is said to follow the compound distribution $\mathcal{D}(X)$ if:

$$\forall k \in Q, \quad Y[X=k]\sim \mathcal{D}(k)$$

#### 2.2.2 Probability Mass Function

$$\forall y \in A,\quad \mathcal{P}(Y=y)=\int_Q\mathcal{P}(Y=y\mid X=t)\cdot f_X(t)\,\mathrm{d}t=\int_Q\mathcal{P}(Y[X=t]=y)\cdot f_X(t)\,\mathrm{d}t$$

#### 2.2.3 Example

Let $n\in\mathbb{N}$, let $X\sim \mathcal{U}(0,1)$, and let $Y\sim \mathcal{B}(n,X)$. $Y$ can be thought of as following a binomial distribution with a random success probability $X$. Our goal is to calculate the probability distribution of $Y$, and then its expected value and variance.
We will start with the probability mass function, as it fully describes the distribution of $Y$:
$$\begin{align*}
\forall k \in \{0,\dots,n\},\quad \mathcal{P}(Y=k)&=\int_{0}^1\mathcal{P}(Y=k \mid X=p)\cdot f_X(p)\,\mathrm{d}p\\
&=\int_{0}^{1}\binom{n}{k}p^k(1-p)^{n-k} \,\mathrm{d}p \\
&=\binom{n}{k}\mathrm{B}(k+1,n-k+1) \quad \text{where } \mathrm{B} \text{ is the Beta function}\\
&=\binom{n}{k}\frac{\Gamma(k+1)\Gamma(n-k+1)}{\Gamma(n+2)} \quad \text{as } \mathrm{B}(x,y)=\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}\\
&=\binom{n}{k}\frac{k!\,(n-k)!}{(n+1)!}\\
&=\binom{n}{k}\frac{k!\,(n-k)!}{(n+1)\,n!}\\
&=\frac{1}{n+1}
\end{align*}$$

So $Y$ follows the discrete uniform distribution $\mathcal{D}(0,n)$. Its expected value and variance are therefore:
$$\begin{align*}
\mathbb{E}[Y]&=\frac{n}{2}\\
\mathbb{V}[Y]&=\frac{(n+1)^2-1}{12}\\
&=\frac{n(n+2)}{12}
\end{align*}$$
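This uniformity result is easy to verify numerically; a minimal sketch (assuming NumPy, with an arbitrary $n=5$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

p = rng.uniform(0.0, 1.0, size=1_000_000)  # X ~ U(0,1)
y = rng.binomial(n, p)                     # Y | X=p ~ B(n, p)

pmf = np.bincount(y, minlength=n + 1) / y.size
print(pmf)  # every entry close to 1/(n+1) ≈ 0.1667
```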
## 3. Continuous Compound Distributions

A continuous compound distribution is a continuous distribution whose parameter is a random variable $X$.

For example, if $X\sim \mathcal{U}(0,1)$, then $\mathcal{N}(X,1)$ is a continuous compound distribution.
We will analyse two cases:
- The random variable $X$ is discrete
- The random variable $X$ is continuous

### 3.1 Compounding by a discrete distribution

#### 3.1.1 Definition

Let $\mathcal{D}$ be a family of distributions with parameter $s\in S$ and values in $A$, and let $X$ be a discrete random variable with values in $Q\subseteq S$. A continuous random variable $Y$ is said to follow the compound distribution $\mathcal{D}(X)$ if:
$$\forall k \in Q, \quad Y[X=k]\sim \mathcal{D}(k)$$

#### 3.1.2 Probability Density Function

$$\forall y \in A,\quad f_Y(y)=\sum_{k\in Q}f_{Y[X=k]}(y)\cdot \mathcal{P}(X=k)$$

#### 3.1.3 Example

Let $p\in\,]0,1]$, let $X\sim \mathcal{G}(p)$, and let $Y\sim \mathcal{U}(0,X)$. $Y$ can be thought of as following a continuous uniform distribution with a random upper bound $X$. Our goal is to calculate the probability distribution of $Y$, and then its expected value.
We will start with the probability density function, as it fully describes the distribution of $Y$:
$$\begin{align*}
\forall y \in\mathbb{R}^*_+,\quad f_Y(y)&= \sum_{n\in\mathbb{N}^*}f_{Y[X=n]}(y)\,\mathcal{P}(X=n) \\
&=\sum_{n\in\mathbb{N}^*}\frac{p}{n}(1-p)^{n-1}\,\mathbb{1}_{[0,n]}(y)\\
&=\sum_{n=\lceil y \rceil}^{+\infty}\frac{p}{n}(1-p)^{n-1}
\end{align*}$$
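Since the density is only available as a series, a numerical check is reassuring. The sketch below (assuming NumPy; the values $p=0.3$ and the truncation at 1000 terms are arbitrary choices) compares a histogram of simulated values against the truncated series:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3

def f_y(y, terms=1000):
    # f_Y(y) = sum_{n >= ceil(y)} (p/n) (1-p)^(n-1), truncated
    n = np.arange(np.ceil(y), np.ceil(y) + terms)
    return np.sum(p / n * (1 - p) ** (n - 1))

x = rng.geometric(p, size=1_000_000)  # X ~ G(p)
y = rng.uniform(0.0, x)               # Y | X=n ~ U(0, n)

# Empirical density on [0, 4]; normalize by hand so that the
# probability mass beyond 4 is not folded back into the bins
counts, edges = np.histogram(y, bins=40, range=(0.0, 4.0))
emp = counts / (y.size * (edges[1] - edges[0]))
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(emp - [f_y(m) for m in mid])))  # should be small
```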
The expected value $\mathbb{E}[Y]$ is:

$$\begin{align*}
\mathbb{E}[Y]&=\int_{\mathbb{R}_+}y\,f_Y(y)\,\mathrm{d}y\\
&=\int_{\mathbb{R}_+}y\sum_{n=\lceil y \rceil}^{+\infty}\frac{p}{n}(1-p)^{n-1} \,\mathrm{d}y \\
&=\sum_{m\in\mathbb{N}}\int_{m}^{m+1}\sum_{n=m+1}^{+\infty}y\,\frac{p}{n}(1-p)^{n-1} \,\mathrm{d}y \\
&=\sum_{m\in\mathbb{N}}\sum_{n=m+1}^{+\infty}\frac{p}{n}(1-p)^{n-1}\int_{m}^{m+1}y\,\mathrm{d}y\\
&=\sum_{m\in\mathbb{N}}\sum_{n=m+1}^{+\infty}\frac{p}{n}(1-p)^{n-1}\left[\frac{y^2}{2}\right]^{m+1}_m\\
&=\sum_{m\in\mathbb{N}}\sum_{n=m+1}^{+\infty}\frac{p}{n}(1-p)^{n-1}\,\frac{2m+1}{2}\\
&=\sum_{n\in\mathbb{N}^*}\sum_{m=0}^{n-1}\frac{p}{n}(1-p)^{n-1}\,\frac{2m+1}{2}\\
&=\sum_{n\in\mathbb{N}^*}\frac{p}{2n}(1-p)^{n-1}\sum_{m=0}^{n-1}(2m+1)\\
&=\sum_{n\in\mathbb{N}^*}\frac{p}{2n}(1-p)^{n-1}\,n^2\\
&=\frac{1}{2}\sum_{n\in\mathbb{N}^*}n\,p(1-p)^{n-1}\\
&=\frac{1}{2}\mathbb{E}[X]\\
&=\frac{1}{2p}
\end{align*}$$

It can also be calculated directly using the law of total expectation:
$$\begin{align*}
\mathbb{E}[Y]&=\mathbb{E}[\mathbb{E}[Y\mid X]] \\
&= \sum_{n\in\mathbb{N}^*}\mathbb{E}[Y\mid X=n]\cdot \mathcal{P}(X=n)\\
&=\sum_{n\in\mathbb{N}^*}\frac{n}{2}\,p(1-p)^{n-1} \quad \text{since } Y[X=n] \sim \mathcal{U}(0,n)\\
&=\frac{1}{2}\mathbb{E}[X]\\
&=\frac{1}{2p}
\end{align*}$$

A third method, close to the second one, is to express $\mathbb{E}[Y\mid X]$ as a function of $X$:
$$\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathbb{E}[Y\mid X=n]&=\frac{n}{2} \quad \text{as } Y[X=n]\sim\mathcal{U}(0,n)\\
\implies \mathbb{E}[Y\mid X]&=\frac{X}{2}\\
\implies \mathbb{E}[Y]&=\mathbb{E}[\mathbb{E}[Y\mid X]]\\
&=\mathbb{E}\left[\frac{X}{2}\right]\\
&=\frac{1}{2}\mathbb{E}[X]\\
&=\frac{1}{2p}
\end{align*}$$

This example illustrates how using the right tools can simplify your work and make your engineering life easier 😄
We can go further and calculate the variance of $Y$.
First of all, we calculate the conditional variance given $X$:
$$\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathbb{V}[Y\mid X=n]&=\frac{n^2}{12} \quad \text{since } Y[X=n]\sim\mathcal{U}(0,n) \\
\implies \mathbb{V}[Y \mid X]&=\frac{1}{12}X^2
\end{align*}$$

Now, using the law of total variance:
$$\begin{align*}
\mathbb{V}[Y]&=\mathbb{E}[\mathbb{V}[Y\mid X]]+\mathbb{V}[\mathbb{E}[Y\mid X]]\\
&= \frac{1}{12}\mathbb{E}[X^2]+\mathbb{V}\left[\frac{1}{2}X\right] \\
&=\frac{1}{12}\mathbb{V}[X]+\frac{1}{12}\mathbb{E}[X]^2+\frac{1}{4}\mathbb{V}[X]\\
&= \frac{1}{3}\mathbb{V}[X]+\frac{1}{12}\mathbb{E}[X]^2\\
&=\frac{1}{3}\cdot\frac{1-p}{p^2}+\frac{1}{12p^2}\\
&=\frac{1}{12}\cdot \frac{5-4p}{p^2}
\end{align*}$$
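The same simulation setup confirms both moments (a sketch assuming NumPy, with $p=0.3$ arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3

x = rng.geometric(p, size=1_000_000)  # X ~ G(p) on {1, 2, ...}
y = rng.uniform(0.0, x)               # Y | X=n ~ U(0, n)

print("E[Y]:", y.mean(), "vs", 1 / (2 * p))               # ~1.667
print("V[Y]:", y.var(), "vs", (5 - 4 * p) / (12 * p**2))  # ~3.519
```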
### 3.2 Compounding by a continuous distribution

#### 3.2.1 Definition

Let $\mathcal{D}$ be a family of distributions with parameter $s\in S$ and values in $A$, and let $X$ be a continuous random variable with values in $Q\subseteq S$. A continuous random variable $Y$ is said to follow the compound distribution $\mathcal{D}(X)$ if:

$$\forall k \in Q, \quad Y[X=k]\sim \mathcal{D}(k)$$

#### 3.2.2 Probability Density Function

$$\forall y \in A,\quad f_Y(y)=\int_Q f_{Y[X=t]}(y)\cdot f_X(t)\,\mathrm{d}t$$

#### 3.2.3 Example

Let $X\sim \mathcal{U}(0,1)$ and let $Y\sim \mathcal{E}(X)$. $Y$ is a compound random variable; it can be thought of as following an exponential distribution with a random rate $X$. Our goal is to determine the distribution of $Y$, its expected value and its variance.
We will start with the probability density function:
$$\begin{align*}
\forall y \in\mathbb{R}^*_+,\quad f_Y(y)&=\int_{\mathbb{R}}f_{Y[X=t]}(y)\cdot f_X(t)\,\mathrm{d}t\\
&=\int_{0}^{1}te^{-ty} \,\mathrm{d}t \\
&=\left[\frac{-te^{-ty}}{y}\right]^{1}_0+\frac{1}{y}\int_0^1e^{-ty}\,\mathrm{d}t \\
&=\frac{-e^{-y}}{y}+\left[\frac{-e^{-ty}}{y^2}\right]^{1}_0\\
&=\frac{-e^{-y}}{y}+\frac{1-e^{-y}}{y^2}\\
&=\frac{1-e^{-y}-ye^{-y}}{y^2}\\
&=\frac{e^y-1-y}{y^2e^y}\\
f_Y(0)&=\int_{0}^1 t\,\mathrm{d}t\\
&=\frac{1}{2}
\end{align*}$$
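As before, the closed form can be compared against a simulated histogram. A minimal sketch (assuming NumPy; note that NumPy parametrizes the exponential by scale $=1/\text{rate}$):

```python
import numpy as np

rng = np.random.default_rng(0)

t = rng.uniform(0.0, 1.0, size=1_000_000)  # X ~ U(0,1), t > 0 almost surely
y = rng.exponential(1.0 / t)               # Y | X=t ~ E(t), scale = 1/rate

def f_y(y):
    # Derived density: (1 - e^{-y} - y e^{-y}) / y^2
    return (1 - np.exp(-y) - y * np.exp(-y)) / y**2

# Manual normalization, since Y has substantial mass beyond the plot range
counts, edges = np.histogram(y, bins=50, range=(0.0, 5.0))
emp = counts / (y.size * (edges[1] - edges[0]))
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(emp - f_y(mid))))  # should be small
```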
For the expected value, we use the law of total expectation:

$$\begin{align*}
\mathbb{E}[Y]&=\mathbb{E}[\mathbb{E}[Y\mid X]] \\
&=\int_0^1 \mathbb{E}[Y\mid X=t]\,f_X(t)\,\mathrm{d}t\\
&=\int_0^1\frac{1}{t} \,\mathrm{d}t =+\infty
\end{align*}$$

The integral diverges, so this probability distribution does not have a mean value, and by extension it does not have a variance.
This example shows that not every distribution has a mean value and/or a variance.
By contrast, the random variable $Z=\sqrt{Y}$ has a finite mean value but still no variance (if $Z$ had a finite variance, then $Y=Z^2$ would have a finite expected value):
$$\begin{align*}
\mathbb{E}[Z]&=\mathbb{E}[\mathbb{E}[\sqrt{Y}\mid X]] \\
&=\int_0^1 \mathbb{E}[\sqrt{Y}\mid X=t]\,f_X(t)\,\mathrm{d}t\\
&=\int_0^1\frac{\sqrt{\pi}}{2\sqrt{t}} \,\mathrm{d}t \quad \text{since } \mathbb{E}[\sqrt{W}]=\frac{\Gamma(3/2)}{\sqrt{\lambda}}=\frac{\sqrt{\pi}}{2\sqrt{\lambda}} \text{ for } W\sim\mathcal{E}(\lambda)\\
&= \frac{\sqrt{\pi}}{2}\left[2\sqrt{t}\right]^{1}_0\\
&=\sqrt{\pi}
\end{align*}$$
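A quick simulation supports this value (a sketch assuming NumPy; convergence is slow because $Z$ has infinite variance):

```python
import numpy as np

rng = np.random.default_rng(0)

t = rng.uniform(0.0, 1.0, size=1_000_000)  # X ~ U(0,1)
z = np.sqrt(rng.exponential(1.0 / t))      # Z = sqrt(Y), Y | X=t ~ E(t)

print(z.mean(), "vs", np.sqrt(np.pi))  # both approximately 1.77
```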