1.2 Probability Theory

Uncertainty is one of the key concepts in pattern recognition. Probability theory provides a consistent framework for quantifying uncertainty precisely. Combined with decision theory, it lets us make optimal predictions from the information currently available.

The book introduces probability with the following example. The random variable $B$ denotes the box in Figure 1.2.1, which is either red ($r$) or blue ($b$). The kind of fruit drawn from a box is also a random variable, $F$, which is either an apple ($a$) or an orange ($o$).

Figure 1.2.1

Before we start, define the probability of an event as the number of times the event occurs divided by the total number of trials. With this definition we can assign probabilities to the following events: the red box is chosen with probability 40% and the blue box with probability 60%.

$$\begin{aligned} p(B=r) &= \dfrac{4}{10} \\ p(B=b) &= \dfrac{6}{10} \end{aligned}$$

By this definition, a probability always lies between 0 and 1. Moreover, if the events are mutually exclusive and together cover every possible outcome, their probabilities must sum to 1.

Before going further, we take a short detour through the sum rule and the product rule of probability.

Figure 1.2.2

Figure 1.2.2 shows two random variables $X$ and $Y$. $X$ can take the values $x_i$ for $i = 1, \cdots, M$, and $Y$ can take the values $y_j$ for $j = 1, \cdots, L$. Suppose we sample both $X$ and $Y$ over a total of $N$ trials, and let $n_{ij}$ be the number of trials in which $X = x_i$ and $Y = y_j$. The probability of this outcome is written $p(X=x_i, Y=y_j)$ and is called the joint probability of $X=x_i$ and $Y=y_j$.

$$\tag{1.5} p(X=x_i, Y=y_j) = \dfrac{n_{ij}}{N}$$

Let us work through this with an arbitrarily chosen table of counts.

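Figure 1.2.3

```python
import matplotlib.pyplot as plt
import numpy as np

# Draw an arbitrary 3x5 table of counts n_ij (rows index Y, columns index X).
np.random.seed(777)
A = np.random.randint(1, 10, size=(3, 5))

fig, ax = plt.subplots(1, 1)
ax.matshow(A, cmap="coolwarm")  # plot the table A itself

# Write each count inside its cell.
for (i, j), z in np.ndenumerate(A):
    ax.text(j, i, f"{z}", ha="center", va="center")

# Fix the tick positions first, then label the cells starting from 1.
ax.set_xticks(np.arange(5))
ax.set_xticklabels(np.arange(1, 6))
ax.set_yticks(np.arange(3))
ax.set_yticklabels(np.arange(1, 4))
ax.set_xlabel("$X$", fontsize=20)
ax.set_ylabel("$Y$", fontsize=20).set_rotation(0)

plt.show()
```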

Let $c_i$ be the number of trials in which $X=x_i$ regardless of the value of $Y$, and let $r_j$ be the number of trials in which $Y=y_j$ regardless of the value of $X$. These can be written as follows.

$$\begin{aligned} c_i &= \sum_j n_{ij} \\ r_j &= \sum_i n_{ij} \end{aligned}$$

From these we can derive the sum rule of probability. $p(X=x_i)$ is also called the marginal probability.

$$\tag{1.7} p(X=x_i) = \dfrac{c_i}{N} = \sum_j \dfrac{n_{ij}}{N} = \sum_j p(X=x_i, Y=y_j)$$

If we consider only the trials in which $X=x_i$, the fraction of them in which $Y=y_j$ is the conditional probability $p(Y=y_j \vert X=x_i)$. In terms of Figure 1.2.2, it is the share of the count $c_i$ for $X=x_i$ that falls in the cell $Y=y_j$.

$$\tag{1.8} p(Y=y_j \vert X=x_i) = \dfrac{n_{ij}}{c_i}$$

Combining equations 1.5, 1.7, and 1.8 gives the product rule of probability.

$$\tag{1.9} \begin{aligned} p(X=x_i, Y=y_j) &= \dfrac{n_{ij}}{N} = \dfrac{n_{ij}}{c_i} \dfrac{c_i}{N} \\ &= p(Y=y_j \vert X=x_i)p(X=x_i) \end{aligned}$$

This notation is cumbersome, so from here on we write $p(X)$ for the distribution over the random variable $X$ and $p(x)$ for that distribution evaluated at a particular value $x$.

$$\begin{aligned} \text{sum rule} && p(X) &= \sum_Y p(X, Y) \\ \text{product rule} && p(X, Y) &= p(Y \vert X)p(X) \end{aligned}$$

Using the symmetry $p(X, Y) = p(Y, X)$ together with the product rule, we obtain the relationship between the two conditional probabilities known as Bayes' theorem.

$$\tag{1.12} p(Y \vert X) = \dfrac{p(X \vert Y)p(Y)}{p(X)}$$
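The counting argument above can be checked numerically. The sketch below is only illustrative: it draws a hypothetical 3x5 table of counts (the names `counts`, `joint`, `p_x`, and so on are assumptions, not part of the original notes) and verifies the sum rule, the product rule, and Bayes' theorem on it.

```python
import numpy as np

# Hypothetical count table n_ij: rows index the values of Y, columns the values of X.
rng = np.random.default_rng(777)
counts = rng.integers(1, 10, size=(3, 5))
N = counts.sum()                      # total number of trials

joint = counts / N                    # p(X=x_i, Y=y_j) = n_ij / N
c = counts.sum(axis=0)                # c_i: trials with X=x_i, whatever Y is
r = counts.sum(axis=1)                # r_j: trials with Y=y_j, whatever X is
p_x = c / N                           # marginal p(X=x_i)
p_y = r / N                           # marginal p(Y=y_j)
p_y_given_x = counts / c              # p(Y=y_j | X=x_i) = n_ij / c_i
p_x_given_y = counts / r[:, None]     # p(X=x_i | Y=y_j) = n_ij / r_j

# Sum rule: marginalizing the joint over Y recovers p(X).
sum_rule_ok = np.allclose(p_x, joint.sum(axis=0))
# Product rule: joint = conditional times marginal.
product_rule_ok = np.allclose(joint, p_y_given_x * p_x)
# Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X).
bayes_ok = np.allclose(p_y_given_x, p_x_given_y * p_y[:, None] / p_x)
print(sum_rule_ok, product_rule_ok, bayes_ok)
```

Because every quantity is a ratio of counts from the same table, the three identities hold exactly up to floating-point rounding, whatever table is drawn.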

With what we have learned so far, we can return to the example in Figure 1.2.1: if a piece of fruit is picked and turns out to be an orange, which box did it most likely come from?

  1. The probability of drawing each fruit (random variable $F$) given the chosen box (random variable $B$) is as follows.

    $$\begin{aligned} p(F=a \vert B=r) &= 1/4 \\ p(F=o \vert B=r) &= 3/4 \\ p(F=a \vert B=b) &= 3/4 \\ p(F=o \vert B=b) &= 1/4 \end{aligned}$$
  2. Applying the sum and product rules gives the overall probability of picking an orange.

    $$\begin{aligned} p(F=o) &= p(F=o \vert B=r)p(B=r) + p(F=o \vert B=b)p(B=b) \\ &= \dfrac{3}{4}\times \dfrac{4}{10} + \dfrac{1}{4}\times\dfrac{6}{10} = \dfrac{9}{20} \end{aligned}$$
  3. Bayes' theorem then gives the probability we are after.

    $$\begin{aligned} p(B=r \vert F=o) &= \dfrac{p(F=o \vert B=r)p(B=r)}{p(F=o)} = \dfrac{3}{4} \times \dfrac{4}{10} \times \dfrac{20}{9} = \frac{2}{3} \\ p(B=b \vert F=o) &= 1 - \frac{2}{3} = \frac{1}{3} \end{aligned}$$
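The three steps above can be reproduced with exact fractions. This is a minimal sketch using the probabilities stated in the text; the dictionary layout and the helper `posterior_box` are illustrative names, not something the original notes define.

```python
from fractions import Fraction

# Prior over boxes p(B) and conditionals p(F | B), taken from the text.
p_box = {"r": Fraction(4, 10), "b": Fraction(6, 10)}
p_fruit_given_box = {
    "r": {"a": Fraction(1, 4), "o": Fraction(3, 4)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

def posterior_box(fruit):
    """Return p(B | F=fruit) for every box using Bayes' theorem."""
    # Sum rule plus product rule: p(F) = sum over B of p(F | B) p(B).
    evidence = sum(p_fruit_given_box[b][fruit] * p_box[b] for b in p_box)
    return {b: p_fruit_given_box[b][fruit] * p_box[b] / evidence for b in p_box}

# Step 2: overall probability of picking an orange.
p_orange = sum(p_fruit_given_box[b]["o"] * p_box[b] for b in p_box)
# Step 3: posterior over boxes given that an orange was observed.
post = posterior_box("o")
print(p_orange, post["r"], post["b"])
```

Using `Fraction` rather than floats keeps the results in the same form as the hand calculation, so they can be compared term by term.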

This result can be interpreted as follows. The probability $p(B)$ of the random variable $B$, which indicates which box was chosen, is called the prior probability, because it is the probability available before we observe the quantity of interest, namely which fruit was picked. Once we learn that the fruit is an orange, Bayes' theorem gives $p(B \vert F)$. This is called the posterior probability, because it is the probability after the event $F$ has been observed.

Finally, two random variables are said to be independent when their joint probability equals the product of their marginal probabilities: $p(X, Y) = p(X)p(Y)$.

1.2.1 Probability densities

So far we have dealt with probabilities of discrete events; we now turn to continuous variables. If the probability that a real-valued random variable $x$ falls in the interval $(x, x+\delta x)$ is given by $p(x)\delta x$ as $\delta x \to 0$, then $p(x)$ is called the probability density over $x$. The probability that $x$ lies in the interval $(a, b)$ is then

$$\tag{1.24} p(x \in (a, b)) = \int_a^b p(x) dx$$

In addition, because probabilities are nonnegative and $x$ must take some value on the real line, the density must satisfy two conditions.

  1. $p(x) \geq 0$

  2. $\int_{-\infty}^{\infty} p(x) dx = 1$

The location of the maximum of a probability density depends on the choice of variable. Under a change of variables $x=g(y)$, an ordinary function $f(x)$ simply becomes $\hat{f}(y) = f(g(y))$. A density behaves differently: the density $p_x(x)$ over $x$ and the density $p_y(y)$ over $y$ are different functions. Observations falling in the range $(x, x + \delta x)$ are mapped to the range $(y, y + \delta y)$, so $p_x(x)\delta x \simeq p_y(y)\delta y$ and therefore $p_y(y) = p_x(g(y)) \vert g'(y) \vert$. The extra term is the Jacobian factor that accompanies a nonlinear change of variables.
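A small numerical experiment makes the effect of the Jacobian factor concrete. The choices here are assumptions made purely for illustration: $p_x$ is a standard normal density and the change of variables is $x = g(y) = \ln y$. The sketch checks that the transformed density still integrates to roughly 1 and that its maximum is not where the maximum of $p_x$ gets mapped.

```python
import numpy as np

def p_x(x):
    """Standard normal density over x (an illustrative choice)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def g(y):
    """Hypothetical change of variables x = g(y) = ln(y), defined for y > 0."""
    return np.log(y)

def p_y(y):
    """Transformed density p_y(y) = p_x(g(y)) * |g'(y)|, with g'(y) = 1/y."""
    return p_x(g(y)) * np.abs(1.0 / y)

y = np.linspace(1e-4, 60.0, 600_001)
dy = y[1] - y[0]

total = np.sum(p_y(y)) * dy          # Riemann sum of p_y over the grid
mode_y = y[np.argmax(p_y(y))]        # location of the maximum of p_y
mapped_mode = np.exp(0.0)            # maximum of p_x (x = 0) mapped to y = e^0

print(total, mode_y, mapped_mode)
```

Simply substituting $g(y)$ into $p_x$ without the factor $\vert g'(y) \vert$ would put the maximum at the mapped location and would not give a normalized density.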

The probability that $x$ lies in the interval $(-\infty, z)$ is given by the cumulative distribution function.

$$\tag{1.28} P(z) = \int_{-\infty}^{z} p(x) dx$$

It satisfies $P'(x) = p(x)$.

Figure 1.2.4 shows the shapes of a probability density function (red) and its cumulative distribution function (blue). Note that a density is not itself a probability: it becomes one only when multiplied by an interval width, so $p(x)\delta x$ is the probability that $x$ falls in a small interval of width $\delta x$.

Figure 1.2.4

In the multivariate case, where the variable is a vector $\mathbf{x} = (x_1, x_2, \cdots, x_D)$, we can define a joint probability density $p(\mathbf{x}) = p(x_1, x_2, \cdots, x_D)$ in the same way. As in the univariate case, it must satisfy two conditions.

  1. $p(\mathbf{x}) \geq 0$

  2. $\int_{-\infty}^{\infty} p(\mathbf{x}) d\mathbf{x} = 1$

If $x$ is a discrete random variable, $p(x)$ is also called a probability mass function.

The sum rule, the product rule, and Bayes' theorem apply to probability densities as well.

$$\begin{aligned} p(x) &= \int p(x,y) dy \\ p(x, y) &= p(y \vert x) p(x)\end{aligned}$$

1.2.2 Expectations and covariances

The average value of a function $f(x)$ under a probability distribution $p(x)$ is called the expectation of $f(x)$ and is written $\Bbb{E}[f]$.

  • For a probability mass function: $\Bbb{E}[f] = \sum_x p(x)f(x)$

  • For a probability density function: $\Bbb{E}[f] = \int p(x)f(x)dx$

Given a finite number $N$ of points drawn from the distribution, the expectation can be approximated by a finite sum over those points (this is used for the sampling methods in Chapter 11).

$$\tag{1.35} \Bbb{E}[f] \simeq \dfrac{1}{N}\sum_{n=1}^{N} f(x_n)$$
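The finite-sample approximation can be compared against the exact sum. In this sketch the distribution (a fair six-sided die) and the function $f(x) = x^2$ are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Function whose expectation we want (illustrative choice)."""
    return x**2

# Probability mass function of a fair six-sided die.
values = np.arange(1, 7)
pmf = np.full(6, 1.0 / 6.0)

# Exact expectation: sum over x of p(x) f(x).
exact = np.sum(pmf * f(values))

# Approximation: average f over N samples drawn from p(x).
N = 100_000
samples = rng.choice(values, size=N, p=pmf)
approx = f(samples).mean()

print(exact, approx)
```

The gap between the two numbers shrinks roughly like $1/\sqrt{N}$ as more samples are drawn.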

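For a function of several variables, we can indicate which variable is being averaged over with a subscript. For example, $\Bbb{E}_x[f(x, y)]$ denotes the average of $f(x, y)$ with respect to the distribution of $x$, and the result is a function of $y$.

As with conditional probability, we can also define a conditional expectation.

$$\tag{1.37} \Bbb{E}_x[f \vert y] = \sum_x p(x \vert y) f(x)$$

The variance is defined as follows.

$$\begin{aligned} \text{var}[f] &= \Bbb{E}\left[(f(x) - \Bbb{E}[f(x)])^2\right] \\ &= \Bbb{E}[f(x)^2] - \Bbb{E}[f(x)]^2 \end{aligned}$$

The covariance of two random variables is defined as follows.

$$\begin{aligned} \text{cov}[x, y] &= \Bbb{E}_{x,y}\left[\{x - \Bbb{E}[x]\}\{y - \Bbb{E}[y]\}\right] \\ &= \Bbb{E}_{x,y}[xy] - \Bbb{E}[x]\Bbb{E}[y] \end{aligned}$$

For two vectors of random variables $\mathbf{x}$ and $\mathbf{y}$, the covariance is a matrix.

$$\begin{aligned} \text{cov}[\mathbf{x}, \mathbf{y}] &= \Bbb{E}_{\mathbf{x},\mathbf{y}}\left[\{\mathbf{x} - \Bbb{E}[\mathbf{x}]\}\{\mathbf{y}^T - \Bbb{E}[\mathbf{y}^T]\}\right] \\ &= \Bbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^T] - \Bbb{E}[\mathbf{x}]\Bbb{E}[\mathbf{y}^T] \end{aligned}$$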
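The two equivalent forms of the variance and covariance can be checked on sampled data. The data-generating process below (a variable `y` that is twice `x` plus noise) is a hypothetical example; with it, the covariance should come out close to 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # hypothetical: y depends linearly on x plus noise

# Variance: E[(x - E[x])^2] and E[x^2] - E[x]^2.
var_def = np.mean((x - x.mean()) ** 2)
var_alt = np.mean(x**2) - x.mean() ** 2

# Covariance: E[(x - E[x])(y - E[y])] and E[xy] - E[x]E[y].
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()

# np.cov with bias=True divides by N, matching the definitions above.
cov_np = np.cov(x, y, bias=True)[0, 1]

print(var_def, var_alt, cov_def, cov_alt, cov_np)
```

Here expectations are replaced by sample averages, so the two forms agree up to rounding while their common value only approximates the true covariance.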

1.2.3 Bayesian probabilities

There are two views of probability.

  1. Frequentist (or classical) view: probability is the frequency of random, repeatable events.

  2. Bayesian view: probability is a tool for quantifying uncertainty, and evidence is used to reduce that uncertainty.

Taking the Bayesian view of the parameters $\mathbf{w}$ in the example of Section 1.1, we can use the machinery of probability theory to describe the uncertainty in the model parameters. Our assumptions about $\mathbf{w}$ before observing any data are expressed as a prior probability distribution $p(\mathbf{w})$. The effect of the observed data $\mathcal{D} = \{t_1, \cdots, t_N\}$ enters through the conditional probability $p(\mathcal{D}\vert \mathbf{w})$. Bayes' theorem then gives the probability of the parameters after observing the data, $p(\mathbf{w}\vert \mathcal{D})$.

$$\tag{1.43} p(\mathbf{w} \vert \mathcal{D}) = \dfrac{p(\mathcal{D} \vert \mathbf{w})p(\mathbf{w})}{p(\mathcal{D})}$$

The term $p(\mathcal{D}\vert \mathbf{w})$ on the right-hand side of equation 1.43 is called the likelihood function; it is evaluated for the observed data and viewed as a function of the parameter vector $\mathbf{w}$. It expresses how probable the observed data set is for a given setting of $\mathbf{w}$. The likelihood is not a probability distribution over $\mathbf{w}$, so its integral with respect to $\mathbf{w}$ need not equal 1.

The difference between the frequentist and Bayesian views shows up in how the likelihood function is used.

Frequentist view:

  • $\mathbf{w}$ is a fixed parameter whose value is determined by some form of estimator, and the error of the estimate is assessed by considering the distribution of possible data sets $\mathcal{D}$.

  • A widely used estimator is maximum likelihood, which sets $\mathbf{w}$ to the value that maximizes the likelihood function $p(\mathcal{D}\vert \mathbf{w})$. In practice the negative log likelihood is usually taken as an error function and minimized; because the negative logarithm is monotonically decreasing, maximizing the likelihood is equivalent to minimizing this error.

  • One way to measure the error of an estimate is the bootstrap: draw points at random with replacement from the original data set to build several new data sets, estimate the parameters on each, and judge the statistical accuracy of the estimate from the variability across them.
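The bootstrap procedure described in the last bullet can be sketched in a few lines. The data set, the helper name `bootstrap_std_error`, and the number of resamples are all assumptions made for illustration. The estimator is the sample mean, whose standard error has a textbook formula to compare against.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # hypothetical observed data set

def bootstrap_std_error(x, estimator, n_resamples=2000):
    """Estimate the standard error of `estimator` by resampling x with replacement."""
    n = len(x)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        resample = rng.choice(x, size=n, replace=True)   # same size as the original
        estimates[b] = estimator(resample)
    # Spread of the estimates across resampled data sets.
    return estimates.std(ddof=1)

se_boot = bootstrap_std_error(data, np.mean)
se_formula = data.std(ddof=1) / np.sqrt(len(data))   # analytic standard error of the mean

print(se_boot, se_formula)
```

For the mean the two numbers should be close; the appeal of the bootstrap is that the same code works for estimators with no simple formula.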

Bayesian view:

  • There is only a single data set $\mathcal{D}$, the one actually observed out of the many that could have occurred, and the uncertainty in the parameters is expressed as a probability distribution over $\mathbf{w}$.

  • One advantage is that prior knowledge enters the inference naturally, which guards against extreme conclusions. For example, if a coin is tossed three times and lands heads every time, the maximum likelihood estimate of the probability of heads is 1.

  • One common criticism is that the conclusions depend on the choice of prior, so the inference inevitably contains a subjective element. Noninformative priors are sometimes used to reduce this dependence.

  • Applying the Bayesian procedure in full requires marginalizing (summing or integrating) over the whole parameter space. Monte Carlo methods, together with advances in computing speed and memory, have made this practical.
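The coin example can be made concrete. The sketch below assumes a uniform Beta(1, 1) prior over the probability of heads, which is a choice made here for illustration and not one the notes specify. It compares the maximum likelihood estimate with the posterior mean, computed once from the known Beta posterior and once by summing over a grid of parameter values.

```python
from fractions import Fraction

import numpy as np

heads, tosses = 3, 3

# Maximum likelihood estimate of p(heads): the observed frequency.
mu_mle = Fraction(heads, tosses)

# Assumed prior: Beta(a, b) with a = b = 1 (uniform). This choice is illustrative.
a, b = 1, 1
# Conjugacy gives a Beta(a + heads, b + tails) posterior, whose mean is:
posterior_mean = Fraction(a + heads, a + b + tosses)

# The same posterior mean by brute-force marginalization over a grid of mu values.
mu = np.linspace(0.0, 1.0, 10_001)
unnormalized = mu**heads * (1.0 - mu) ** (tosses - heads)   # likelihood times uniform prior
weights = unnormalized / unnormalized.sum()
posterior_mean_grid = float(np.sum(mu * weights))

print(mu_mle, posterior_mean, posterior_mean_grid)
```

The posterior mean stays strictly below 1 after three heads, which is the sense in which the prior tempers an extreme conclusion. A different prior would give a different number.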

1.2.4 The Gaussian distribution

Before surveying a variety of probability distributions in Chapter 2, we first look at one that appears everywhere: the Gaussian distribution, also called the normal distribution.

For a single real-valued random variable $x$, the Gaussian distribution is defined as follows.

$$\tag{1.46} \mathcal{N}(x \vert \mu, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\dfrac{1}{2\sigma^2}(x-\mu)^2\right\}$$

  • $\mu$ is the mean, $\sigma^2$ is the variance, and $\sigma$ is the standard deviation. The reciprocal of the variance, $\beta = 1/\sigma^2$, is called the precision.

  • The Gaussian distribution satisfies the requirements for a probability density.

    $$\begin{aligned} \mathcal{N}(x \vert \mu, \sigma^2) &> 0 \\ \int_{-\infty}^{\infty} \mathcal{N}(x \vert \mu, \sigma^2) dx &= 1 \end{aligned}$$

The expectation of $x$ under the Gaussian distribution is as follows.

$$\tag{1.49} \Bbb{E}[x] = \int_{-\infty}^{\infty} \mathcal{N}(x \vert \mu, \sigma^2) x dx = \mu$$

The variance is as follows.

$$\begin{aligned} \text{var}[x] &= \Bbb{E}[x^2] - \Bbb{E}[x]^2 \\ &= \int_{-\infty}^{\infty} \mathcal{N}(x \vert \mu, \sigma^2)x^2 dx - \mu^2 \\ &= \mu^2 + \sigma^2 - \mu^2 \\ &= \sigma^2 \end{aligned}$$
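These properties of the Gaussian can be confirmed by numerical integration. The sketch below uses hypothetical parameter values and a plain Riemann sum on a wide grid; `gaussian_pdf` is an illustrative helper, not a library function.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

mu, sigma2 = 1.5, 0.64            # illustrative parameter values
sigma = np.sqrt(sigma2)

# Grid covering 12 standard deviations on each side of the mean.
x = np.linspace(mu - 12.0 * sigma, mu + 12.0 * sigma, 200_001)
dx = x[1] - x[0]
p = gaussian_pdf(x, mu, sigma2)

total = np.sum(p) * dx                     # normalization, should be close to 1
mean = np.sum(p * x) * dx                  # E[x], should be close to mu
second_moment = np.sum(p * x**2) * dx      # E[x^2], should be close to mu^2 + sigma^2
variance = second_moment - mean**2         # var[x], should be close to sigma^2

print(total, mean, second_moment, variance)
```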

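We now extend this to a $D$-dimensional vector of continuous variables $\mathbf{x} = (x_1, x_2, \cdots, x_D)^T$. The Gaussian distribution over $\mathbf{x}$ is as follows.

$$\tag{1.52} \mathcal{N}(\mathbf{x} \vert \mathbf{\mu}, \Sigma) = \dfrac{1}{(2\pi)^{D/2}} \dfrac{1}{\vert \Sigma \vert^{1/2}} \exp\left\{-\dfrac{1}{2}(\mathbf{x}-\mathbf{\mu})^T \Sigma^{-1} (\mathbf{x}-\mathbf{\mu})\right\}$$

  • The $D$-dimensional vector $\mathbf{\mu}$ is the mean, the $D \times D$ matrix $\Sigma$ is the covariance, and $\vert \Sigma \vert$ is the determinant of $\Sigma$.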

Returning to the single real-valued variable, suppose we have observations $X = (x_1, x_2, \cdots, x_N)^T$ in which each $x_n$ is drawn independently from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Data points drawn this way are said to be independent and identically distributed (i.i.d.). Because the data are i.i.d., the probability of $X$ given $\mu$ and $\sigma^2$ factorizes.

$$\tag{1.53} p(X \vert \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \vert \mu, \sigma^2)$$

Viewed as a function of $\mu$ and $\sigma^2$, equation 1.53 is the likelihood function. One way to determine the parameters from the observed data set $X$ is to find the values that maximize the likelihood. Because $\log$ is monotonic, this is equivalent to maximizing the log likelihood.

$$\tag{1.54} \ln p(X \vert \mu, \sigma^2) = -\dfrac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n-\mu)^2 - \dfrac{N}{2}\ln\sigma^2 - \dfrac{N}{2}\ln(2\pi)$$

Maximizing equation 1.54 with respect to $\mu$ gives the sample mean, the average of the observed values; maximizing with respect to $\sigma^2$ then gives the sample variance measured about that mean.

$$\begin{aligned} \mu_{MLE} &= \dfrac{1}{N}\sum_{n=1}^N x_n \\ \sigma_{MLE}^2 &= \dfrac{1}{N}\sum_{n=1}^N (x_n - \mu_{MLE})^2 \end{aligned}$$
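The closed-form estimates can be checked against the log likelihood directly. In this sketch the data set is synthetic and the function `log_likelihood` is an illustrative helper that evaluates the Gaussian log likelihood of i.i.d. data. Moving either parameter away from its maximum likelihood value should lower the log likelihood.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # hypothetical i.i.d. observations

def log_likelihood(x, mu, sigma2):
    """Gaussian log likelihood ln p(X | mu, sigma^2) for i.i.d. data."""
    n = len(x)
    return (
        -np.sum((x - mu) ** 2) / (2.0 * sigma2)
        - 0.5 * n * np.log(sigma2)
        - 0.5 * n * np.log(2.0 * np.pi)
    )

# Closed-form maximum likelihood estimates.
mu_mle = x.mean()
sigma2_mle = np.mean((x - mu_mle) ** 2)

best = log_likelihood(x, mu_mle, sigma2_mle)

# Log likelihood at nearby parameter settings, for comparison.
nearby = [
    log_likelihood(x, mu_mle + dmu, sigma2_mle + dvar)
    for dmu, dvar in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]
]

print(mu_mle, sigma2_mle, best, max(nearby))
```

Note that `sigma2_mle` divides by $N$, which is the same convention as NumPy's default `np.var(x)`.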

However, this approach underestimates the variance of the distribution. Taking the expectations of the two estimates gives the following.

$$\begin{aligned} \Bbb{E}[\mu_{MLE}] &= \mu \\ \Bbb{E}[\sigma_{MLE}^2] &= \dfrac{N-1}{N} \sigma^2 \end{aligned}$$

That is, on average the maximum likelihood estimate is smaller than the true variance by a factor of $\dfrac{N-1}{N}$. A systematic discrepancy of this kind is called bias. An unbiased estimate of the true variance, $\tilde{\sigma}^2$, is therefore obtained as follows.

$$\tag{1.59} \tilde{\sigma}^2 = \dfrac{N}{N-1}\sigma_{MLE}^2 = \dfrac{1}{N-1}\sum_{n=1}^{N}(x_n - \mu_{MLE})^2$$

Equation 1.59 shows that the bias of the maximum likelihood solution shrinks as the number of data points $N$ grows. For more complex models, however, the bias associated with maximum likelihood becomes much more severe. This bias is also at the root of the over-fitting problem.
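The bias can be observed by simulation. The sketch below draws many small synthetic data sets from a Gaussian with known variance (all settings are hypothetical) and averages the two variance estimates across them. The maximum likelihood average should land near $(N-1)/N$ times the true variance, and the corrected average near the true variance.

```python
import numpy as np

rng = np.random.default_rng(4)
true_mu, true_sigma2 = 0.0, 4.0    # illustrative ground truth
N, n_datasets = 5, 200_000         # small N makes the bias easy to see

# Each row is one data set of N i.i.d. Gaussian observations.
data = rng.normal(true_mu, np.sqrt(true_sigma2), size=(n_datasets, N))

sigma2_mle = data.var(axis=1, ddof=0)        # divides by N (maximum likelihood)
sigma2_unbiased = data.var(axis=1, ddof=1)   # divides by N - 1 (bias-corrected)

mean_mle = sigma2_mle.mean()
mean_unbiased = sigma2_unbiased.mean()
expected_mle = (N - 1) / N * true_sigma2

print(mean_mle, expected_mle, mean_unbiased, true_sigma2)
```

Increasing `N` while keeping everything else fixed brings the two averages together, which is the shrinking bias described above.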
