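Reference: Li Hang (李航), 《统计学习方法》 (Statistical Learning Methods)
The naive Bayes method is a classification method based on Bayes' theorem and the assumption that the features are conditionally independent.
Given a training data set, it first learns the joint probability distribution $p(x,y)$ of input and output under the conditional independence assumption; then, based on this model, for a given input $x$ it uses Bayes' theorem to find the output $y$ with the largest posterior probability $p(y|x)$.
Estimates of $p(x|y)$ and $p(y)$ are learned from the training data, which gives the joint probability distribution: $p(x,y)=p(y)p(x|y)$
The probabilities can be estimated by maximum likelihood estimation or by Bayesian estimation.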
Basic assumption
The basic assumption of the naive Bayes method is conditional independence:
$$\begin{aligned} P(X=x|Y=c_{k})&=P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},...,X^{(n)}=x^{(n)}|Y=c_{k})\\ &=\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k}) \end{aligned}$$
This is a strong assumption. It greatly reduces the number of conditional probabilities in the model, so learning and prediction with naive Bayes become much simpler, and the method is efficient and easy to implement. Its classification performance, however, is not necessarily high.
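To make the reduction concrete: if the $j$-th feature can take $S_j$ values and there are $K$ classes, the unrestricted model needs a conditional probability for each of the $K\prod_{j} S_j$ combinations, while the naive Bayes model only needs $K\sum_{j} S_j$. The following is a minimal sketch of that count; the function names and the example sizes ($K=2$, ten features with three values each) are assumptions chosen for illustration.

```python
from math import prod

def n_params_full(K, S):
    """Count of probabilities P(X=x | Y=c_k) with no independence assumption."""
    return K * prod(S)

def n_params_naive(K, S):
    """Count of probabilities P(X^(j)=a_jl | Y=c_k) under conditional independence."""
    return K * sum(S)

K = 2          # number of classes (illustrative)
S = [3] * 10   # S_j: ten features, three possible values each (illustrative)

print(n_params_full(K, S))   # 2 * 3**10 = 118098
print(n_params_naive(K, S))  # 2 * 30 = 60
```

The product grows exponentially with the number of features, while the sum grows only linearly.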
$$P(Y|X)=\frac {P(X,Y)}{P(X)}=\frac {P(Y)P(X|Y)}{\sum_{Y}P(Y)P(X|Y)}$$
The input $x$ is assigned to the class $y$ with the largest posterior probability. The denominator is the same for every class, so it can be dropped from the maximization:
$$y=\arg\max_{c_{k}}P(Y=c_{k})\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k})$$
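As a small numeric illustration of the two formulas above, the sketch below computes the posterior from a prior and a class-conditional likelihood, and checks that the class with the largest posterior is also the class with the largest unnormalized product. The class names `c1`, `c2` and all the numbers are made up for this example.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem: P(Y=c|X=x) = P(Y=c) P(X=x|Y=c) / sum_c' P(Y=c') P(X=x|Y=c')."""
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(X=x), identical for every class
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers, chosen only for illustration.
prior = {"c1": Fraction(3, 5), "c2": Fraction(2, 5)}
likelihood = {"c1": Fraction(1, 27), "c2": Fraction(1, 6)}  # P(X=x | Y=c)

joint = {c: prior[c] * likelihood[c] for c in prior}
post = posterior(prior, likelihood)

print(post)                        # posteriors 1/4 for c1 and 3/4 for c2
print(max(joint, key=joint.get))   # c2
print(max(post, key=post.get))     # c2, same class with or without normalization
```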
Maximizing the posterior probability is equivalent to minimizing the expected risk under the 0-1 loss function.
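A short derivation sketch of this equivalence follows. The symbols are introduced here for illustration: $f$ is the decision function, and $L(Y,f(X))$ is the 0-1 loss, equal to 1 when $Y\neq f(X)$ and 0 otherwise. The expected risk can be minimized separately for each $X=x$:

$$\begin{aligned}
R_{\exp}(f)&=E\big[L(Y,f(X))\big]=E_X\sum_{k=1}^{K}L(c_k,f(X))\,P(Y=c_k|X)\\
f(x)&=\arg\min_{y}\sum_{k=1}^{K}L(c_k,y)\,P(Y=c_k|X=x)\\
&=\arg\min_{y}\sum_{k=1}^{K}I(y\neq c_k)\,P(Y=c_k|X=x)\\
&=\arg\min_{y}\big(1-P(Y=y|X=x)\big)\\
&=\arg\max_{y}P(Y=y|X=x)
\end{aligned}$$

The last line is exactly the rule of choosing the class with the largest posterior probability.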
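What the naive Bayes method actually learns is the mechanism that generates the data, so it is a generative model.
The conditional independence assumption says that the features used for classification are independent of one another once the class is fixed. This assumption makes the naive Bayes method simple, but sometimes at the cost of some classification accuracy.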
Maximum likelihood estimation
The maximum likelihood estimate of the prior probability $P(Y=c_{k})$ is
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})}{N},\quad k=1,2,...,K$$
Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1},a_{j2},...,a_{jS_j}\}$.
The maximum likelihood estimate of the conditional probability $P(X^{(j)}=a_{jl}|Y=c_{k})$ is
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac {\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
$$j=1,2,...,n;\quad l=1,2,...,S_{j};\quad k=1,2,...,K$$
Here $x_{i}^{(j)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function.
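Both maximum likelihood estimates are just ratios of counts. The sketch below computes them with exact fractions. The function name `mle_estimates` and the toy data set (two features, two classes, 15 samples) are assumptions of this example and do not appear in the text above; features are indexed from 0 in the code.

```python
from collections import Counter
from fractions import Fraction

def mle_estimates(X, y):
    """Return (prior, cond): prior[c] = P(Y=c), cond[(j, v, c)] = P(X^(j)=v | Y=c)."""
    N = len(y)
    class_count = Counter(y)                      # sum_i I(y_i = c_k)
    joint = Counter()                             # sum_i I(x_i^(j) = a_jl, y_i = c_k)
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            joint[(j, v, yi)] += 1
    prior = {c: Fraction(n_c, N) for c, n_c in class_count.items()}
    cond = {key: Fraction(n, class_count[key[2]]) for key, n in joint.items()}
    return prior, cond

# Toy data, assumed for illustration only.
X = [(1, "S"), (1, "M"), (1, "M"), (1, "S"), (1, "S"),
     (2, "S"), (2, "M"), (2, "M"), (2, "L"), (2, "L"),
     (3, "L"), (3, "M"), (3, "M"), (3, "L"), (3, "L")]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

prior, cond = mle_estimates(X, y)
print(prior[1], prior[-1])   # 3/5 2/5
print(cond[(0, 2, 1)])       # P(X^(1)=2 | Y=1)  = 1/3
print(cond[(1, "S", -1)])    # P(X^(2)=S | Y=-1) = 1/2
```

Combinations that never occur in the data are simply absent from `cond`, which corresponds to an estimated probability of 0.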
Bayesian estimation
The naive Bayes method and Bayesian estimation are different concepts.
With maximum likelihood estimation, an estimated probability may turn out to be 0, which makes the whole product for that class 0. Bayesian estimation is used to solve this problem.
The Bayesian estimate of the conditional probability is
$$P_{\lambda}(X^{(j)}=a_{jl}|Y=c_{k})=\frac {\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})+\lambda}{\sum_{i=1}^{N}I(y_{i}=c_{k})+S_{j}\lambda}$$
where $\lambda>0$. A common choice is $\lambda=1$, which is called Laplace smoothing. Clearly, for any $l=1,2,...,S_{j}$ and $k=1,2,...,K$,
$$P_{\lambda}(X^{(j)}=a_{jl}|Y=c_{k})>0$$
$$\sum_{l=1}^{S_{j}}P_{\lambda}(X^{(j)}=a_{jl}|Y=c_{k})=1$$
This shows that the Bayesian estimate is indeed a probability distribution. Similarly, the Bayesian estimate of the prior probability is
$$P_{\lambda}(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})+\lambda}{N+K\lambda},\quad k=1,2,...,K$$
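The effect of the smoothing term is easiest to see on a count of zero. The following sketch applies the two smoothed formulas to hypothetical counts: a class with 6 samples and a feature with $S_j=3$ values observed 3, 3 and 0 times. The function names and the counts are assumptions made for this example.

```python
from fractions import Fraction

def smoothed_conditional(count_xy, count_y, S_j, lam=1):
    """(count(x^(j)=a_jl, y=c_k) + lam) / (count(y=c_k) + S_j * lam)."""
    lam = Fraction(lam)
    return (count_xy + lam) / (count_y + S_j * lam)

def smoothed_prior(count_y, N, K, lam=1):
    """(count(y=c_k) + lam) / (N + K * lam)."""
    lam = Fraction(lam)
    return (count_y + lam) / (N + K * lam)

# Hypothetical counts: the third value never occurs together with this class.
counts_xy = [3, 3, 0]
count_y = sum(counts_xy)   # 6 samples in this class

mle = [Fraction(c, count_y) for c in counts_xy]
laplace = [smoothed_conditional(c, count_y, S_j=3, lam=1) for c in counts_xy]

print(mle)           # the last entry is 0
print(laplace)       # 4/9, 4/9, 1/9: every entry is positive
print(sum(laplace))  # 1
```

Setting `lam=0` in this sketch reproduces the maximum likelihood estimates.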
Naive Bayes algorithm
Input: training data $T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$, where $x_{i}=(x_{i}^{(1)},x_{i}^{(2)},...,x_{i}^{(n)})$, $x_{i}^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_{i}^{(j)}\in \{a_{j1},a_{j2},...,a_{jS_{j}}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2,...,n$, $l=1,2,...,S_{j}$, $y_{i}\in\{c_{1},c_2,...,c_K\}$; an instance $x$.
Output: the class of the instance $x$.
(1) Compute the prior probabilities and the conditional probabilities
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})}{N},\quad k=1,2,...,K$$
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac {\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
$$j=1,2,...,n;\quad l=1,2,...,S_{j};\quad k=1,2,...,K$$
(2) For the given instance $x=(x^{(1)},x^{(2)},...,x^{(n)})$, compute
$$P(Y=c_{k})\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k}),\quad k=1,2,...,K$$
(3) Determine the class of the instance $x$
$$y=\arg\max_{c_{k}}P(Y=c_{k})\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_{k})$$
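Steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `fit`, `scores` and `predict`, the `lam` parameter (0 for maximum likelihood, a positive value for the smoothed estimates), and the toy data are all assumptions of this example. Features are indexed from 0, and the instance to classify is assumed to contain only feature values seen in training.

```python
from collections import Counter
from fractions import Fraction

def fit(X, y, lam=0):
    """Step (1): estimate P(Y=c_k) and P(X^(j)=a_jl | Y=c_k) by counting."""
    lam = Fraction(lam)
    N, n = len(y), len(X[0])
    classes = sorted(set(y))
    values = [sorted({xi[j] for xi in X}) for j in range(n)]   # a_j1, ..., a_jSj
    class_count = Counter(y)
    joint = Counter((j, xi[j], yi) for xi, yi in zip(X, y) for j in range(n))
    prior = {c: (class_count[c] + lam) / (N + len(classes) * lam) for c in classes}
    cond = {(j, v, c): (joint[(j, v, c)] + lam) / (class_count[c] + len(values[j]) * lam)
            for j in range(n) for v in values[j] for c in classes}
    return classes, prior, cond

def scores(model, x):
    """Step (2): P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k) for every class."""
    classes, prior, cond = model
    result = {}
    for c in classes:
        s = prior[c]
        for j, v in enumerate(x):
            s *= cond[(j, v, c)]   # KeyError if v was never seen in training
        result[c] = s
    return result

def predict(model, x):
    """Step (3): the class with the largest score."""
    s = scores(model, x)
    return max(s, key=s.get)

# Toy data, assumed for illustration only.
X = [(1, "S"), (1, "M"), (1, "M"), (1, "S"), (1, "S"),
     (2, "S"), (2, "M"), (2, "M"), (2, "L"), (2, "L"),
     (3, "L"), (3, "M"), (3, "M"), (3, "L"), (3, "L")]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

model = fit(X, y)                 # maximum likelihood estimates
print(scores(model, (2, "S")))    # 1/15 for class -1, 1/45 for class 1
print(predict(model, (2, "S")))   # -1
```

Calling `fit(X, y, lam=1)` on the same data gives scores 28/459 for class -1 and 5/153 for class 1, so the predicted class is unchanged here.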
Exercise 4.1
Exercise: use maximum likelihood estimation to derive the probability estimation formulas of naive Bayes
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})}{N},\quad k=1,2,...,K$$
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac {\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
Solution: treat $P(Y=c_{k})$ and $P(X^{(j)}=a_{jl}|Y=c_{k})$ as parameters and solve for them under the constraint $\sum \limits_{k=1}^{K}P(y=c_k)=1$ (and, for the conditional probabilities, $\sum \limits_{l=1}^{S_j}P(X^{(j)}=a_{jl}|Y=c_{k})=1$ for every $j$ and $k$).
By assumption:
$$P(y)=\prod_{k=1}^{K}P(y=c_{k})^{I(y=c_{k})}$$
$$P(x|y=c_k)=\prod_{j=1}^{n}P(x^{(j)}|y=c_{k})=\prod_{j=1}^{n} \prod_{l=1}^{S_j}P(x^{(j)}=a_{jl}|y=c_{k})^{I(x^{(j)}=a_{jl},y=c_k)}$$
Let $\varphi = \{P(Y=c_{k}),P(X^{(j)}=a_{jl}|Y=c_{k})\}$. The log-likelihood function is:
$$\begin{aligned}
L(\varphi)&=\log\prod_{i=1}^{N}P(x_i,y_i;\varphi)=\log\prod_{i=1}^{N}P(x_i|y_i;\varphi)P(y_{i};\varphi)\\
&=\log\prod_{i=1}^{N}\Big[P(y_{i};\varphi)\prod_{j=1}^{n}P(x_i^{(j)}|y_i;\varphi)\Big]\\
&=\sum_{i=1}^{N}\Big[\log P(y_{i};\varphi)+\sum_{j=1}^{n}\log P(x_i^{(j)}|y_i;\varphi)\Big]\\
&=\sum_{i=1}^{N}\Big[\sum_{k=1}^{K}\log P(y=c_k)^{I(y_i=c_k)}+\sum_{j=1}^{n}\sum_{l=1}^{S_j}\sum_{k=1}^{K}\log P(x^{(j)}=a_{jl}|y=c_k)^{I(x_i^{(j)}=a_{jl},y_i=c_k)}\Big]\\
&=\sum_{i=1}^{N}\Big[\sum_{k=1}^{K}I(y_i=c_k)\log P(y=c_k)+\sum_{j=1}^{n}\sum_{l=1}^{S_j}\sum_{k=1}^{K}I(x_i^{(j)}=a_{jl},y_i=c_k)\log P(x^{(j)}=a_{jl}|y=c_k)\Big]
\end{aligned}$$
Differentiate with respect to the first parameter $P(Y=c_{k})$. Only the first term of $L(\varphi)$ depends on it, so (writing the summation index as $a$ to keep it distinct from the fixed $k$):
$$\frac {\partial {L(\varphi)}}{\partial P(y=c_k)}=\frac {\partial}{\partial P(y=c_k)}\sum_{i=1}^{N}\sum_{a=1}^{K}I(y_i=c_a)\log P(y=c_a)$$
From the constraint: $P(y=c_K)=1-\sum \limits_{k=1}^{K-1}P(y=c_k)$
$$\begin{aligned}
\Rightarrow\frac {\partial {L(\varphi)}}{\partial P(y=c_k)}&=\frac {\partial}{\partial P(y=c_k)}\sum_{i=1}^{N}\Big[\sum_{a=1}^{K-1}I(y_i=c_a)\log P(y=c_a)+I(y_i=c_K)\log P(y=c_K)\Big]\\
&=\frac {\partial}{\partial P(y=c_k)}\sum_{i=1}^{N}\Big[\sum_{a=1}^{K-1}I(y_i=c_a)\log P(y=c_a)+I(y_i=c_K)\log\Big(1-\sum_{a=1}^{K-1}P(y=c_a)\Big)\Big]
\end{aligned}$$
First find the estimate of $P(y=c_1)$ by setting the derivative to zero:
$$\begin{aligned}
0&=\frac {\partial}{\partial P(y=c_1)}\sum_{i=1}^{N}\Big[\sum_{a=1}^{K-1}I(y_i=c_a)\log P(y=c_a)+I(y_i=c_K)\log\Big(1-\sum_{a=1}^{K-1}P(y=c_a)\Big)\Big]\\
&=\sum_{i=1}^{N}\Big[\frac{I(y_i=c_1)}{P(y=c_1)}-\frac{I(y_i=c_K)}{1-\sum_{a=1}^{K-1}P(y=c_a)}\Big]\\
&=\sum_{i=1}^{N}\Big[\frac{I(y_i=c_1)}{P(y=c_1)}-\frac{I(y_i=c_K)}{P(y=c_K)}\Big]
\end{aligned}$$
Here $P(y=c_K)$ is a value determined by $P(y=c_1),P(y=c_2),...,P(y=c_{K-1})$.
$$\sum_{i=1}^{N}\Big[\frac{I(y_i=c_1)}{P(y=c_1)}-\frac{I(y_i=c_K)}{P(y=c_K)}\Big]=0$$
$$\Rightarrow P(y=c_K)\sum_{i=1}^{N}I(y_i=c_1)-P(y=c_1)\sum_{i=1}^{N}I(y_i=c_K)=0$$
The same argument applies to $c_2,...,c_{K-1}$, and the relation holds trivially for $c_K$, so:
$$\begin{aligned}
P(y=c_1) &= \frac {\sum_{i=1}^{N}I(y_i=c_1)}{\sum_{i=1}^{N}I(y_i=c_K)} P(y=c_K)\\
P(y=c_2) &= \frac {\sum_{i=1}^{N}I(y_i=c_2)}{\sum_{i=1}^{N}I(y_i=c_K)} P(y=c_K)\\
&\;\;\vdots \\
P(y=c_K) &= \frac {\sum_{i=1}^{N}I(y_i=c_K)}{\sum_{i=1}^{N}I(y_i=c_K)} P(y=c_K)
\end{aligned}$$
Summing the equations above for $P(y=c_1),P(y=c_2),...,P(y=c_K)$, and using $\sum \limits_{k=1}^{K}\sum \limits_{i=1}^{N}I(y_i=c_k)=N$, gives:
$$P(y=c_1)+P(y=c_2)+...+P(y=c_K)=\frac{N}{\sum_{i=1}^{N}I(y_i=c_K)} P(y=c_K)$$
$$\Rightarrow 1=\frac{N}{\sum_{i=1}^{N}I(y_i=c_K)} P(y=c_K)$$
$$\Rightarrow P(y=c_K)=\frac{\sum_{i=1}^{N}I(y_i=c_K)} {N}$$
Substituting this back into the equations above gives:
$$P(y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)} {N},\quad k=1,2,...,K$$
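The result for the prior can be checked numerically. The sketch below evaluates the part of the log-likelihood that depends on the prior, $\sum_k n_k \log P(y=c_k)$ with $n_k=\sum_i I(y_i=c_k)$, at the relative frequencies and at 1000 random points of the probability simplex. The labels, the seed and the number of trials are arbitrary choices for this illustration, and a random search is only a sanity check, not a proof.

```python
import math
import random
from collections import Counter

def prior_log_likelihood(p, counts):
    """sum_k n_k * log P(y=c_k), where n_k = sum_i I(y_i = c_k)."""
    return sum(n_k * math.log(p[c]) for c, n_k in counts.items())

# Toy labels, assumed for illustration only.
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
counts = Counter(y)
N = len(y)

mle = {c: n_k / N for c, n_k in counts.items()}   # relative frequencies
best = prior_log_likelihood(mle, counts)

random.seed(0)
never_beaten = True
for _ in range(1000):
    # Random strictly positive weights, normalized to sum to 1.
    w = {c: random.random() + 1e-9 for c in counts}
    total = sum(w.values())
    p = {c: w_c / total for c, w_c in w.items()}
    if prior_log_likelihood(p, counts) > best + 1e-12:
        never_beaten = False

print(mle)            # relative frequencies: 0.4 for class -1, 0.6 for class 1
print(never_beaten)   # True
```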
Differentiating with respect to $P(X^{(j)}=a_{jl}|Y=c_{k})$ in the same way, now under the constraint $\sum \limits_{l=1}^{S_j}P(X^{(j)}=a_{jl}|Y=c_{k})=1$, gives
$$P(X^{(j)}=a_{jl}|Y=c_{k})=\frac {\sum_{i=1}^{N}I(x_{i}^{(j)}=a_{jl},y_{i}=c_{k})}{\sum_{i=1}^{N}I(y_{i}=c_{k})}$$
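For completeness, here is one way to carry out this second step. It is a sketch that uses a Lagrange multiplier instead of the substitution used for the prior above; the shorthand $\theta_{l}=P(X^{(j)}=a_{jl}|Y=c_k)$, $n_{l}=\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)$ and the multiplier $\mu$ are introduced here for illustration, with $j$ and $k$ held fixed.

$$\begin{aligned}
\mathcal{L}(\theta,\mu)&=\sum_{l=1}^{S_j}n_{l}\log\theta_{l}+\mu\Big(1-\sum_{l=1}^{S_j}\theta_{l}\Big)\\
\frac{\partial \mathcal{L}}{\partial \theta_{l}}&=\frac{n_{l}}{\theta_{l}}-\mu=0\;\Rightarrow\;\theta_{l}=\frac{n_{l}}{\mu}\\
1&=\sum_{l=1}^{S_j}\theta_{l}=\frac{1}{\mu}\sum_{l=1}^{S_j}n_{l}\;\Rightarrow\;\mu=\sum_{l=1}^{S_j}n_{l}=\sum_{i=1}^{N}I(y_i=c_k)\\
\theta_{l}&=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum_{i=1}^{N}I(y_i=c_k)}
\end{aligned}$$

The third line uses the fact that every sample of class $c_k$ takes exactly one of the values $a_{j1},...,a_{jS_j}$ in its $j$-th feature, so the counts $n_l$ add up to the number of samples in that class.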