  • Computing the average age of a population

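    Create the experimental data:
    from pyspark import SparkContext
    import random
    # Output location; saveAsTextFile writes a directory of part files here
    OutputFile = "file:///usr/local/spark/mycode/exercise/people"
    sc = SparkContext('local','createPeopleAgeData')
    peopleAge = []
    # Generate 1000 records of the form "<id> <age>", with ages from 1 to 100
    for i in range(1,1001):
        rand = random.randint(1,100)
        peopleAge.append(str(i)+" "+str(rand))
    # Distribute the list as an RDD and save it as text
    RDD = sc.parallelize(peopleAge)
    RDD.saveAsTextFile(OutputFile)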
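
To check the record format without starting Spark, the generator loop can be run on its own. The sketch below is plain Python; the fixed seed and the helper name `make_people_records` are assumptions added for reproducibility and are not part of the original script.

```python
import random

def make_people_records(n=1000, seed=42):
    """Build n lines of the form "<id> <age>", with ages drawn from 1..100."""
    rng = random.Random(seed)  # assumed seed, only so reruns give the same data
    return [f"{i} {rng.randint(1, 100)}" for i in range(1, n + 1)]

records = make_people_records()
print(len(records))   # number of generated lines
print(records[:3])    # first few lines, each "<id> <age>"
```

Each string here corresponds to one element passed to `sc.parallelize` in the script above.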

    Compute the average age:
    from pyspark import SparkContext
    # Configure the SparkContext
    sc = SparkContext('local','CountAverAge')
    # Create the RDD by reading the directory written by the data-generation script
    RDD = sc.textFile("file:///usr/local/spark/mycode/exercise/people")
    # Get the total number of records
    Count = RDD.count()
    # Split each line, keep only the age field, convert it to int, then sum the ages with reduce
    AgeSum = RDD.map(lambda line: line.split(" ")[1]).map(lambda a: int(a)).reduce(lambda a, b: a + b)

    print(Count)
    print(AgeSum)
    print("Average age: {0}".format(AgeSum / Count))
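
The script above scans the data twice, once for count() and once for reduce(). One alternative, sketched here in plain Python with `functools.reduce` standing in for the RDD's reduce, maps each line to an (age, 1) pair and adds both components in a single pass. The sample lines and the names `to_pair` and `add_pairs` are illustrative assumptions, not part of the original code.

```python
from functools import reduce

# Hypothetical sample in the same "<id> <age>" format as the generated file
lines = ["1 20", "2 30", "3 40", "4 50"]

def to_pair(line):
    """Turn "<id> <age>" into (age, 1) so the sum and the count travel together."""
    return int(line.split(" ")[1]), 1

def add_pairs(a, b):
    """Add two (age_sum, count) pairs component-wise."""
    return a[0] + b[0], a[1] + b[1]

age_sum, count = reduce(add_pairs, map(to_pair, lines))
average = age_sum / count
print(age_sum, count, average)  # 140 4 35.0
```

On an RDD, the same two functions could presumably be passed to `map` and `reduce` to get the sum and count from one job; that variant is not tested here.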
  • Original article: https://www.cnblogs.com/SoftwareBuilding/p/9473533.html