Log-likelihood similarity: a Python implementation

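The previous article introduced the log-likelihood similarity function. This post gives the code, which is modeled on the Mahout implementation. If you want to see how Mahout does it, the official Apache source code is also fairly easy to follow.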

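Without further ado, here is the code. It is quite short. Note that the entropy here is unnormalized, which differs from the usual entropy calculation: it is computed on raw counts rather than probabilities, so it equals the total count N times the ordinary Shannon entropy.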
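
To make the "unnormalized" remark concrete, here is a small self-contained check. It is not part of the Mahout port, and the names `x_log_x`, `unnormalized_entropy` and `shannon_entropy` are my own. For counts k_i with total N, N·ln(N) - Σ k_i·ln(k_i) = -Σ k_i·ln(k_i / N) = N·H, where H is the ordinary Shannon entropy (in nats) of p_i = k_i / N. A minimal sketch:

```python
import math


def x_log_x(x: float) -> float:
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def unnormalized_entropy(*counts: float) -> float:
    """N*ln(N) - sum(k*ln(k)) over raw counts, with N = sum(counts)."""
    total = sum(counts)
    return x_log_x(total) - sum(x_log_x(k) for k in counts)


def shannon_entropy(*counts: float) -> float:
    """Ordinary Shannon entropy, in nats, of p_i = k_i / N."""
    total = sum(counts)
    return -sum((k / total) * math.log(k / total) for k in counts if k > 0)


counts = (10, 20, 30, 40)
n = sum(counts)
# Both lines print the same value, N * H, up to floating-point rounding.
print(unnormalized_entropy(*counts))
print(n * shannon_entropy(*counts))
```

Because nothing is divided by N, every entropy in the log-likelihood computation scales with the amount of data, and so does the final statistic.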

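```python
import math


def entropy(*elements) -> float:
    # Unnormalized entropy: N*ln(N) - sum(k*ln(k)), where N is the sum of the counts
    total = 0
    result = 0.0
    for element in elements:
        result += xLogX(element)
        total += element
    return xLogX(total) - result


def xLogX(x) -> float:
    # x * ln(x), with the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def checkargs(*args):
    # All counts must be non-negative
    for x in args:
        if x < 0:
            raise ValueError("counts must be non-negative, got %r" % (x,))


def logLikelihoodRatio(k11, k12, k21, k22) -> float:
    # k11: both events occurred; k12: second event without the first;
    # k21: first event without the second; k22: neither event
    checkargs(k11, k12, k21, k22)
    # note that we have counts here, not probabilities, and that the entropy is not normalized.
    rowEntropy = entropy(k11 + k12, k21 + k22)
    columnEntropy = entropy(k11 + k21, k12 + k22)
    matrixEntropy = entropy(k11, k12, k21, k22)
    if rowEntropy + columnEntropy < matrixEntropy:
        # round-off error
        return 0.0
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy)
```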
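
As a sanity check, the entropy form should agree with the textbook G-test statistic, 2·Σ k_ij·ln(k_ij / E_ij), where the expected count is E_ij = (row total × column total) / N. The sketch below is self-contained, so it redefines the helpers in snake_case. `g_statistic` and `item_similarity` are hypothetical names of my own. It compares the two formulas and then turns co-occurrence counts for two items into a similarity in [0, 1). That last step assumes the 1 - 1/(1 + LLR) mapping that, as far as I can tell, Mahout's `LogLikelihoodSimilarity` uses.

```python
import math


def x_log_x(x: float) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    # Unnormalized entropy over raw counts, same as in the article
    total = sum(counts)
    return x_log_x(total) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    # Entropy-based log-likelihood ratio, same formula as logLikelihoodRatio
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    if row + col < mat:  # round-off error
        return 0.0
    return 2.0 * (row + col - mat)


def g_statistic(k11: int, k12: int, k21: int, k22: int) -> float:
    # Textbook G-test: 2 * sum(observed * ln(observed / expected)); empty cells contribute 0
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    row_totals = [k11 + k12, k21 + k22]
    col_totals = [k11 + k21, k12 + k22]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def item_similarity(users_a: set, users_b: set, num_users: int) -> float:
    # Hypothetical helper: build the 2x2 table from the users who interacted with items A and B
    both = len(users_a & users_b)
    k11 = both                                # users with both A and B
    k12 = len(users_b) - both                 # B without A
    k21 = len(users_a) - both                 # A without B
    k22 = num_users - len(users_a | users_b)  # neither A nor B
    # Map the LLR from [0, inf) to [0, 1), assuming the 1 - 1/(1 + LLR) convention
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))


print(llr(10, 20, 30, 40), g_statistic(10, 20, 30, 40))  # equal up to rounding
print(item_similarity({1, 2, 3, 4}, {2, 3, 4, 5}, num_users=100))
```

Keep in mind that the LLR measures any departure from independence, so two items that are rarely used together can also score high. If only positive association matters, one option is to also check that k11 exceeds its expected count before trusting a high score.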
