  • A brief look at two methods for non-negative matrix factorization

    1. Using non-negative least squares

    Non-negative matrix factorisation using non-negative least squares

    Problem

    Given a matrix \(A\), factor it into two non-negative factors:

    \[A_{M \times N} \approx W_{M \times K} \times H_{K \times N}, \quad \text{such that } W_{M \times K} \geq 0 \text{ and } H_{K \times N} \geq 0\]
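
    One way to make the approximation precise, stated here as an assumption that matches the cost functions used in the code of this post (up to a square root in the first example), is to minimize the squared error over the observed entries of \(A\). Writing \(\Omega\) for the set of index pairs \((i, j)\) where \(A_{ij}\) is not missing:

    \[\min_{W \geq 0,\; H \geq 0} \sum_{(i,j) \in \Omega} \left(A_{ij} - \sum_{k=1}^{K} W_{ik} H_{kj}\right)^2\]

    When no entry is missing, this is the squared Frobenius norm \(\|A - WH\|_F^2\).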

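    Solution

    Our approach has two steps. First, with A given, fix W and solve for H. Then fix H and solve for W. These two steps are repeated iteratively. Each step is a least-squares problem, which is why the method is called alternating least squares (ALS). Our problem has one extra requirement, though: W and H are constrained to be non-negative, so ordinary least squares is replaced by non-negative least squares (NNLS).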
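
    To make the NNLS building block concrete, here is a minimal sketch of a single subproblem: W is held fixed and one column of H is solved for. The sizes, the random W, and the names h_true, h_nnls and h_clamped are illustrative assumptions, not part of the original post; scipy.optimize.nnls returns the solution together with the residual norm.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Hypothetical small problem: W is fixed (6 x 2) and we solve for one column h of H.
W = rng.uniform(size=(6, 2))
h_true = np.array([0.5, 2.0])
a = W @ h_true                      # one column of A, generated without noise

# NNLS solves min ||W h - a||_2 subject to h >= 0.
# Here the unconstrained solution is already non-negative, so NNLS recovers it.
h_nnls, residual = nnls(W, a)

# A case where the constraint is active: with the target negated, the
# unconstrained solution would be negative, so NNLS returns zeros instead.
h_clamped, residual_clamped = nnls(W, -a)
```

    In the full algorithm this solve is repeated for every column of H, and then, with the roles swapped, for every row of W.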

    Code example

    import numpy as np
    from scipy.optimize import nnls
    
    M, N = 20, 10
    K = 4
    
    np.random.seed(2019)
    A_orig = np.abs(np.random.uniform(low=0.0, high=1.0, size=(M,N)))
    
    A = A_orig.copy()
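    # In practice A often has missing values, especially in collaborative filtering
    # (np.nan is used because the np.NAN alias was removed in NumPy 2.0)
    A[0, 0] = np.nan
    A[3, 1] = np.nan
    A[6, 3] = np.nan
    A[3, 6] = np.nan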
    
    W = np.abs(np.random.uniform(low=0, high=1, size=(M, K)))
    H = np.abs(np.random.uniform(low=0, high=1, size=(K, N)))
    
    def cost(A, W, H):
        # Ignore the missing entries of A when computing the cost
        mask = ~np.isnan(A)
        WH = np.dot(W, H)
        WH_mask = WH[mask] # WH_mask is now a 1-D vector holding only the entries where A is observed
        A_mask = A[mask]
        A_WH_mask = A_mask-WH_mask
        return np.linalg.norm(A_WH_mask, 2)
    
    num_iter = 1000
    
    for i in range(num_iter):
        if i%2 ==0:
            # Fix W and solve for H
            for j in range(N): # H is solved one column at a time
                mask_rows = ~np.isnan(A[:,j])
                H[:,j] = nnls(W[mask_rows], A[:,j][mask_rows])[0]
        else:
            # Fix H and solve for W
            for j in range(M): # W is solved one row at a time
                mask_cols = ~np.isnan(A[j,:]) # observed columns in row j of A
                W[j,:] = nnls(H.T[mask_cols], A[j,:][mask_cols])[0]
        if i%100 == 0:
            print(i,cost(A,W,H))
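
    The loop above can also be packaged as a function, which makes it easier to see how the learnt factors fill in the missing entries. The following is a sketch under assumptions: the names nmf_nnls and masked_cost are hypothetical, and each sweep updates H and then W (rather than alternating on even and odd iterations). Because every NNLS subproblem is solved exactly, the masked cost is expected not to increase from one sweep to the next.

```python
import numpy as np
from scipy.optimize import nnls

def masked_cost(A, W, H):
    """Euclidean norm of the residual over the observed (non-NaN) entries of A."""
    mask = ~np.isnan(A)
    return np.linalg.norm(A[mask] - (W @ H)[mask])

def nmf_nnls(A, K, num_sweeps=50, seed=0):
    """Hypothetical helper: alternating NNLS on the observed entries of A."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    observed = ~np.isnan(A)
    W = rng.uniform(size=(M, K))
    H = rng.uniform(size=(K, N))
    costs = []
    for _ in range(num_sweeps):
        for j in range(N):                      # fix W, solve each column of H
            rows = observed[:, j]
            H[:, j] = nnls(W[rows], A[rows, j])[0]
        for i in range(M):                      # fix H, solve each row of W
            cols = observed[i, :]
            W[i, :] = nnls(H[:, cols].T, A[i, cols])[0]
        costs.append(masked_cost(A, W, H))
    return W, H, costs

rng = np.random.default_rng(2019)
A_full = rng.uniform(size=(20, 10))
A_missing = A_full.copy()
A_missing[[0, 3, 6, 3], [0, 1, 3, 6]] = np.nan   # same missing positions as above

W, H, costs = nmf_nnls(A_missing, K=4)
A_hat = W @ H                                    # dense reconstruction
filled = A_hat[np.isnan(A_missing)]              # predictions for the missing cells
```

    In a collaborative-filtering setting, the values in filled would be read as the predicted ratings for the unobserved user and item pairs.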
    
    

    2. Using TensorFlow

    https://nipunbatra.github.io/blog/2017/nnmf-tensorflow.html

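    This approach relies mainly on gradient descent: the squared error on the observed entries is minimized by gradient steps on W and H, and after every step the negative entries of W and H are clipped to zero so that both factors stay non-negative.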
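
    To show what the optimizer is doing, here is a NumPy sketch of the same idea with the gradients written out by hand. It is an illustration under assumptions, not the linked author's code: the function name masked_nmf_pgd is hypothetical, and W and H are initialized with absolute values of normal samples. With R the residual WH - A restricted to the observed entries, the gradient of the summed squared error is 2 R Hᵀ with respect to W and 2 Wᵀ R with respect to H.

```python
import numpy as np

def masked_nmf_pgd(A, rank, steps=1000, learning_rate=0.001, seed=0):
    """Hypothetical NumPy sketch of projected gradient descent for masked NMF."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(A)
    A0 = np.where(mask, A, 0.0)          # replace NaN so arithmetic stays finite
    W = np.abs(rng.standard_normal((A.shape[0], rank)))
    H = np.abs(rng.standard_normal((rank, A.shape[1])))
    costs = []
    for _ in range(steps):
        R = mask * (W @ H - A0)          # residual on observed entries only
        grad_W = 2.0 * R @ H.T           # d cost / d W
        grad_H = 2.0 * W.T @ R           # d cost / d H
        # Gradient step on both factors, then clip negatives to zero.
        W = np.maximum(W - learning_rate * grad_W, 0.0)
        H = np.maximum(H - learning_rate * grad_H, 0.0)
        costs.append(np.sum(R ** 2))     # cost measured before this step's update
    return W, H, costs

A = np.array([[np.nan, 4, 5, 2],
              [4, 4, np.nan, 3],
              [5, 5, 4, 4]]).T           # 4 users, 3 movies, as in the TensorFlow example
W, H, costs = masked_nmf_pgd(A, rank=3)
A_hat = W @ H                            # includes predictions for the two missing ratings
```

    The learning rate and step count mirror the values used in the TensorFlow code; a larger learning rate may diverge, so treat them as starting points rather than tuned settings.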

    Code example

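    # This example uses the TensorFlow 1.x graph API (tf.Session, tf.train).
    # Importing compat.v1 keeps it runnable under TensorFlow 2.x as well.
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()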
    import numpy as np
    
    np.random.seed(2019)
    
    A = np.array([[np.nan, 4, 5, 2],
                  [4, 4, np.nan, 3],
                  [5, 5, 4, 4]], dtype=np.float32).T # 4 users,3 movies
    
    # Boolean mask for computing the cost only on non-missing values
    tf_mask = tf.Variable(~np.isnan(A))
    
    shape = A.shape
    A = tf.constant(A)
    
    # latent factors
    rank = 3
    
    H = tf.Variable(np.random.randn(rank,shape[1]).astype(np.float32))
    W = tf.Variable(np.random.randn(shape[0],rank).astype(np.float32))
    
    WH = tf.matmul(W,H)
    
    # Cost: squared Frobenius norm of the residual over the observed entries
    cost = tf.reduce_sum(tf.pow(tf.boolean_mask(A,tf_mask)
                                - tf.boolean_mask(WH,tf_mask),2))
    
    learning_rate = 0.001
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    
    # Clipping operation. This ensures that the learnt W and H are non-negative
    clip_W = W.assign(tf.maximum(tf.zeros_like(W),W))
    clip_H = H.assign(tf.maximum(tf.zeros_like(H),H))
    clip = tf.group(clip_W,clip_H)
    
    steps = 1000
    with tf.Session() as sess:
        sess.run(init)
        for i in range(steps):
            sess.run(train_step)
            sess.run(clip)
            if i%100==0:
                print("Cost: ",sess.run(cost))
        learnt_W = sess.run(W)
        learnt_H = sess.run(H)
    
    
  • Original post: https://www.cnblogs.com/ZeroTensor/p/10262704.html