  • Apache Mahout source code reading notes, DataModel: UserBasedRecommender

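    Let's first walk through the usage flow:
    1) Obtain a DataModel
    2) Define the similarity model: PearsonCorrelationSimilarity
    3) Define the user neighborhood model: NearestNUserNeighborhood
    4) Define the recommender: GenericUserBasedRecommender
    5) Generate recommendations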
      @Test
      public void testHowMany() throws Exception {
        DataModel dataModel = getDataModel(
                new long[] {1, 2, 3, 4, 5},
                new Double[][] {
                        {0.1, 0.2},
                        {0.2, 0.3, 0.3, 0.6},
                        {0.4, 0.4, 0.5, 0.9},
                        {0.1, 0.4, 0.5, 0.8, 0.9, 1.0},
                        {0.2, 0.3, 0.6, 0.7, 0.1, 0.2},
                });
        // Used to find the most similar users, i.e. the user's neighborhood
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, dataModel);
        
        Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
        List<RecommendedItem> fewRecommended = recommender.recommend(1, 2);
        List<RecommendedItem> moreRecommended = recommender.recommend(1, 4);
        for (int i = 0; i < fewRecommended.size(); i++) {
          assertEquals(fewRecommended.get(i).getItemID(), moreRecommended.get(i).getItemID());
        }
        recommender.refresh(null);
        for (int i = 0; i < fewRecommended.size(); i++) {
          assertEquals(fewRecommended.get(i).getItemID(), moreRecommended.get(i).getItemID());
        }
      }

    For the similarity computation, see the previous post on PearsonCorrelationSimilarity.
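
    As a reminder of what that similarity measures, here is a minimal plain-Java sketch of the Pearson correlation over the items two users have both rated. It does not use any Mahout classes: the class name PearsonSketch and the map-based representation of preferences (item ID to rating) are assumptions made for illustration, and Mahout's own implementation may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
final class PearsonSketch {

  /**
   * Pearson correlation between two users, computed only over co-rated items.
   * Returns NaN when there are fewer than two co-rated items or when the
   * denominator is zero (one user gave all co-rated items the same rating).
   */
  static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
    // Collect the rating pairs for items present in both maps.
    List<double[]> pairs = new ArrayList<>();
    for (Map.Entry<Long, Double> e : a.entrySet()) {
      Double other = b.get(e.getKey());
      if (other != null) {
        pairs.add(new double[] {e.getValue(), other});
      }
    }
    int n = pairs.size();
    if (n < 2) {
      return Double.NaN;
    }
    // First pass: means of the co-rated ratings.
    double meanA = 0.0;
    double meanB = 0.0;
    for (double[] p : pairs) {
      meanA += p[0];
      meanB += p[1];
    }
    meanA /= n;
    meanB /= n;
    // Second pass: centered sums of products and squares.
    double sumAB = 0.0;
    double sumAA = 0.0;
    double sumBB = 0.0;
    for (double[] p : pairs) {
      double da = p[0] - meanA;
      double db = p[1] - meanB;
      sumAB += da * db;
      sumAA += da * da;
      sumBB += db * db;
    }
    double denominator = Math.sqrt(sumAA * sumBB);
    return denominator == 0.0 ? Double.NaN : sumAB / denominator;
  }
}
```

    Two users whose co-rated ratings rise and fall together score close to 1, users with opposite tastes score close to -1, and users with too little overlap get NaN, which the recommender code below treats as "no usable similarity".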

    NearestNUserNeighborhood returns the N nearest users. How is it used to produce recommendations? See recommend() in:
    ~/mahout-core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericUserBasedRecommender.java

      @Override
      public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        Preconditions.checkArgument(howMany >= 1, "howMany must be at least 1");
    
        log.debug("Recommending items for user ID '{}'", userID);
        
        // Use the similarity model to find the N most similar users
        long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    
        if (theNeighborhood.length == 0) {
          return Collections.emptyList();
        }
        // Collect the items that the neighborhood users have rated but the current user has not;
        // this is the candidate pool for recommendation
        FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID);
    
        // Estimate the current user's preference for each item in the pool and keep the top N
        TopItems.Estimator<Long> estimator = new Estimator(userID, theNeighborhood);
    
        List<RecommendedItem> topItems = TopItems
            .getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator);
    
        log.debug("Recommendations are: {}", topItems);
        return topItems;
      }
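
    recommend() relies on two steps whose code is not shown above: neighborhood.getUserNeighborhood(userID) and getAllOtherItems(theNeighborhood, userID). The plain-Java sketch below illustrates the idea behind both, assuming preferences are held in nested maps (user ID to item ID to rating) and similarities come from a caller-supplied function. The names NeighborhoodSketch, nearestN and candidateItems are hypothetical, not Mahout APIs, and Mahout's own implementations may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper, not part of Mahout.
final class NeighborhoodSketch {

  /** Returns up to n other users, most similar first; NaN similarities are skipped. */
  static List<Long> nearestN(long userId, int n, Set<Long> allUsers,
                             ToDoubleBiFunction<Long, Long> similarity) {
    // Score every other user once.
    Map<Long, Double> scores = new HashMap<>();
    for (long other : allUsers) {
      if (other == userId) {
        continue;
      }
      double s = similarity.applyAsDouble(userId, other);
      if (!Double.isNaN(s)) {
        scores.put(other, s);
      }
    }
    // Sort by descending similarity and keep the first n.
    List<Long> ranked = new ArrayList<>(scores.keySet());
    ranked.sort(Comparator.comparingDouble((Long u) -> scores.get(u)).reversed());
    return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
  }

  /** Items rated by at least one neighbor but not by the target user. */
  static Set<Long> candidateItems(long userId, List<Long> neighbors,
                                  Map<Long, Map<Long, Double>> prefs) {
    Map<Long, Double> none = Collections.emptyMap();
    Set<Long> items = new TreeSet<>();
    for (long neighbor : neighbors) {
      items.addAll(prefs.getOrDefault(neighbor, none).keySet());
    }
    items.removeAll(prefs.getOrDefault(userId, none).keySet());
    return items;
  }
}
```

    Sorting every user is the simplest way to express "nearest N"; for a large user base a bounded heap would avoid the full sort.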
    The Estimator is implemented like this:

      private final class Estimator implements TopItems.Estimator<Long> {
        
        private final long theUserID;
        private final long[] theNeighborhood;
        
        Estimator(long theUserID, long[] theNeighborhood) {
          this.theUserID = theUserID;
          this.theNeighborhood = theNeighborhood;
        }
        
        @Override
        public double estimate(Long itemID) throws TasteException {
          return doEstimatePreference(theUserID, theNeighborhood, itemID);
        }
      }
    doEstimatePreference, which the Estimator delegates to, looks like this:

      protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
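        // Sum the similar users' preferences for this item, weighted by similarity, then divide by
        // the total similarity; the weighted average is the current user's estimated preference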
        if (theNeighborhood.length == 0) {
          return Float.NaN;
        }
        DataModel dataModel = getDataModel();
        double preference = 0.0;
        double totalSimilarity = 0.0;
        int count = 0;
        for (long userID : theNeighborhood) {
          if (userID != theUserID) {
            // See GenericItemBasedRecommender.doEstimatePreference() too
            Float pref = dataModel.getPreferenceValue(userID, itemID);
            if (pref != null) {
              double theSimilarity = similarity.userSimilarity(theUserID, userID);
              if (!Double.isNaN(theSimilarity)) {
                preference += theSimilarity * pref;
                totalSimilarity += theSimilarity;
                count++;
              }
            }
          }
        }
        // Throw out the estimate if it was based on no data points, of course, but also if based on
        // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
        // The reason is that in this case the estimate is, simply, the user's rating for one item
        // that happened to have a defined similarity. The similarity score doesn't matter, and that
        // seems like a bad situation.
        if (count <= 1) {
          return Float.NaN;
        }
        float estimate = (float) (preference / totalSimilarity);
        if (capper != null) {
          estimate = capper.capEstimate(estimate);
        }
        return estimate;
      }
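
    To make the arithmetic concrete, here is a small plain-Java sketch of the same similarity-weighted average, estimate = sum(s_v * p_v) / sum(s_v) over the neighbors v that rated the item. It uses arrays instead of a DataModel; EstimateSketch and its parameters are hypothetical, a null preference stands for "this neighbor has not rated the item", and the capping step is left out.

```java
// Hypothetical helper, not part of Mahout.
final class EstimateSketch {

  /**
   * similarities[i] is the similarity between the target user and neighbor i;
   * prefs[i] is neighbor i's preference for the item, or null if unrated.
   * Returns NaN when fewer than two neighbors contribute, as in the code above.
   */
  static float estimate(double[] similarities, Float[] prefs) {
    double weightedSum = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (int i = 0; i < similarities.length; i++) {
      if (prefs[i] != null && !Double.isNaN(similarities[i])) {
        weightedSum += similarities[i] * prefs[i];
        totalSimilarity += similarities[i];
        count++;
      }
    }
    if (count <= 1) {
      return Float.NaN;
    }
    return (float) (weightedSum / totalSimilarity);
  }
}
```

    For example, with similarities 0.8 and 0.4 and preferences 4.0 and 1.0, the estimate is (0.8 * 4.0 + 0.4 * 1.0) / (0.8 + 0.4) = 3.0. Because Pearson similarities can be negative, the denominator can end up close to zero, so a raw estimate may fall outside the range of actual ratings; the capper step in the original code appears to be where such values get clamped.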
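    Summary:
    1) Compute the N most similar users
    2) From those N users, collect the items the current user has not rated
    3) Estimate the current user's preference for each of those items
    4) Recommend the howMany items with the highest estimated preference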
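
    The last step can be sketched with a bounded min-heap: keep at most howMany candidates and evict the lowest-scoring one whenever the heap overflows. This is one common way to take a top N, written in plain Java with hypothetical names (TopNSketch, topN); it is not a copy of Mahout's TopItems. Items whose estimate is NaN are skipped, since NaN is what doEstimatePreference returns when there is not enough data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.LongToDoubleFunction;

// Hypothetical helper, not part of Mahout.
final class TopNSketch {

  /** Returns the IDs of up to howMany items with the highest estimates, best first. */
  static List<Long> topN(int howMany, Iterable<Long> candidates, LongToDoubleFunction estimator) {
    // Min-heap on the estimate, so the weakest kept item sits at the head.
    PriorityQueue<Map.Entry<Long, Double>> heap = new PriorityQueue<>(
        Comparator.comparingDouble((Map.Entry<Long, Double> e) -> e.getValue()));
    for (long itemId : candidates) {
      double estimate = estimator.applyAsDouble(itemId);
      if (Double.isNaN(estimate)) {
        continue; // no usable estimate for this item
      }
      heap.add(new AbstractMap.SimpleEntry<Long, Double>(itemId, estimate));
      if (heap.size() > howMany) {
        heap.poll(); // drop the current lowest estimate
      }
    }
    // Draining the heap yields ascending order, so reverse it at the end.
    List<Long> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      result.add(heap.poll().getKey());
    }
    Collections.reverse(result);
    return result;
  }
}
```

    The heap never holds more than howMany + 1 entries, so memory stays bounded no matter how large the candidate pool is.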



  • Original post: https://www.cnblogs.com/zhangqingping/p/4118840.html