  • [Spark][Python] PageRank program

    PageRank program:

    Contents of the input file tst001.txt (each line is one link: source page, then target page):

    page1 page3
    page2 page1
    page4 page1
    page3 page1
    page4 page2
    page3 page4


    def computeContribs(neighbors, rank):
        # Each neighbor receives an equal share of this page's rank.
        for neighbor in neighbors:
            yield (neighbor, rank / len(neighbors))
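To see what computeContribs yields for a single page, it can be called outside Spark on a plain list. This is only a minimal sketch: the rank of 1.0 and the two-element neighbor list are illustrative values (they happen to match page4 in the first iteration), not part of the original session.

```python
def computeContribs(neighbors, rank):
    # Split the page's rank evenly across its outgoing links.
    for neighbor in neighbors:
        yield (neighbor, rank / len(neighbors))

# Illustrative input: a page with rank 1.0 that links to page1 and page2.
contribs = list(computeContribs(["page1", "page2"], 1.0))
print(contribs)
```

Each of the two neighbors receives half of the rank, which is the pair stream that flatMap later feeds into reduceByKey.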

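    # Parse each line into a (source, target) pair, drop duplicate links, and
    # group the targets by source page. Persisted because every iteration reuses it.
    links = (sc.textFile("tst001.txt")
               .map(lambda line: line.split())
               .map(lambda pages: (pages[0], pages[1]))
               .distinct().groupByKey().persist())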

    # Every page that has outgoing links starts with rank 1.0.
    ranks = links.mapValues(lambda neighbors: 1.0)


    In [4]: for x in range(1):
       ...:     print("links count: %d" % links.count())
       ...:     print("ranks count: %d" % ranks.count())


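    In [11]: for x in range(3):
       ....:     # join gives (page, (neighbors, rank)); only the value pair is needed
       ....:     contribs = links.join(ranks).values().flatMap(lambda nr: computeContribs(nr[0], nr[1]))
       ....:     # sum the contributions per page, then apply the damping factor 0.85
       ....:     ranks = contribs.reduceByKey(lambda v1, v2: v1 + v2).mapValues(lambda contrib: contrib * 0.85 + 0.15)
       ....: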


    for rank in ranks.collect(): print(rank)

    Output after the three iterations above (the u'' prefix comes from Python 2):

    (u'page2', 0.394375)
    (u'page3', 1.2619062499999998)
    (u'page4', 0.8820624999999999)
    (u'page1', 1.4616562499999997)
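The ranks above can be cross-checked without a Spark cluster. The following is a minimal plain-Python sketch of the same three iterations, assuming the six links listed in the input file; the dictionary-based structure and the variable names are illustrative stand-ins for the RDD operations, not the original program.

```python
from collections import defaultdict

# The six links from the input file, one "source target" pair per line.
edges = """page1 page3
page2 page1
page4 page1
page3 page1
page4 page2
page3 page4"""

# page -> set of pages it links to (the set plays the role of distinct()).
links = defaultdict(set)
for line in edges.splitlines():
    src, dst = line.split()
    links[src].add(dst)

# Every page with outgoing links starts with rank 1.0.
ranks = {page: 1.0 for page in links}

for _ in range(3):
    contribs = defaultdict(float)
    for page, neighbors in links.items():
        if page not in ranks:
            continue  # mirrors the inner join: pages without a rank are skipped
        share = ranks[page] / len(neighbors)
        for neighbor in neighbors:
            contribs[neighbor] += share
    # Same update as the Spark version: damping factor 0.85 plus 0.15.
    ranks = {page: 0.85 * total + 0.15 for page, total in contribs.items()}

for page in sorted(ranks):
    print(page, ranks[page])
```

Up to floating-point rounding in the last digits, the four values agree with the Spark output listed above.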

  • Original article: https://www.cnblogs.com/gaojian/p/7614711.html