  • [Spark][Python] PageRank program

    PageRank program:

    Contents of the input file tst001.txt (one "source target" link per line):

    page1 page3
    page2 page1
    page4 page1
    page3 page1
    page4 page2
    page3 page4


    def computeContribs(neighbors, rank):
        # Split this page's rank evenly among the pages it links to.
        for neighbor in neighbors:
            yield (neighbor, rank / len(neighbors))
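
To see what computeContribs produces, here is a minimal sketch in plain Python, without Spark. The function is the one defined above; the neighbor list and the rank of 1.0 are illustrative inputs that mirror page4's two outgoing links in the sample file.

```python
def computeContribs(neighbors, rank):
    # Split this page's rank evenly among the pages it links to.
    for neighbor in neighbors:
        yield (neighbor, rank / len(neighbors))

# page4 links to page1 and page2 in the sample file; 1.0 is the starting rank.
contribs = list(computeContribs(["page1", "page2"], 1.0))
print(contribs)  # [('page1', 0.5), ('page2', 0.5)]
```

In the Spark job, neighbors is the iterable that groupByKey builds for each page rather than a plain list, but the even split works the same way.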

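    # sc is the SparkContext provided by the pyspark shell.
    # Parse each line into a (source, target) pair, drop duplicate links,
    # and group the targets by source page.
    links = (sc.textFile("tst001.txt")
             .map(lambda line: line.split())
             .map(lambda pages: (pages[0], pages[1]))
             .distinct().groupByKey().persist())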

    # Every page starts with a rank of 1.0.
    ranks = links.map(lambda kv: (kv[0], 1.0))


    In [4]: for x in range(1):
       ...:     print("links count: " + str(links.count()))
       ...:     print("ranks count: " + str(ranks.count()))


    In [11]: for x in range(3):
       ....:     # kv is (page, (neighbors, rank)) after the join.
       ....:     contribs = links.join(ranks).flatMap(lambda kv: computeContribs(kv[1][0], kv[1][1]))
       ....:     ranks = contribs.reduceByKey(lambda v1, v2: v1 + v2).map(lambda kv: (kv[0], kv[1] * 0.85 + 0.15))
       ....:


    for rank in ranks.collect(): print(rank)

    (u'page2', 0.394375)
    (u'page3', 1.2619062499999998)
    (u'page4', 0.8820624999999999)
    (u'page1', 1.4616562499999997)
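
The printed ranks can be cross-checked without a cluster. The sketch below repeats the same three iterations in plain Python over the six links of the sample file. The `edges` list and the dictionaries `links`, `contribs`, and `ranks` are illustrative stand-ins for the RDDs, not part of the Spark job. It also relies on the fact that, in this sample, every page both links out and is linked to, so no page drops out between iterations.

```python
from collections import defaultdict

# The six (source, target) links from the sample file.
edges = [("page1", "page3"), ("page2", "page1"), ("page4", "page1"),
         ("page3", "page1"), ("page4", "page2"), ("page3", "page4")]

# Adjacency sets: the plain-Python counterpart of distinct().groupByKey().
links = defaultdict(set)
for src, dst in edges:
    links[src].add(dst)

# Every page starts with a rank of 1.0.
ranks = {page: 1.0 for page in links}

for _ in range(3):
    # Each page sends an equal share of its rank to every page it links to.
    contribs = defaultdict(float)
    for page, neighbors in links.items():
        for neighbor in neighbors:
            contribs[neighbor] += ranks[page] / len(neighbors)
    # Same update as the Spark job: 0.85 * summed contributions + 0.15.
    ranks = {page: total * 0.85 + 0.15 for page, total in contribs.items()}

for page in sorted(ranks):
    print(page, ranks[page])
```

The values should agree with the ranks printed above up to floating-point rounding, since the order in which contributions are summed may differ.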

  • Original article: https://www.cnblogs.com/gaojian/p/7614711.html