  • Python Spark: computing the max, min, and mean of an RDD

    // Build a JavaDoubleRDD from a List<Double> (Java Spark API)
    JavaDoubleRDD rdd = sc.parallelizeDoubles(testData);

    Now we can calculate the mean of the dataset by calling rdd.mean().

    There are similar methods for the other statistics, such as max(), min(), stdev(), and sum().

    Every time one of these methods is invoked, Spark scans the entire RDD. If several statistics are needed, the data is scanned once per call, which is very inefficient. To avoid this, Spark provides the StatCounter class: a single call to rdd.stats() makes one pass over the data and returns all the basic statistics at once.

    The results can then be read from the returned StatCounter object, via its count(), mean(), min(), max(), and stdev() methods.
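    The single-pass idea behind StatCounter can be sketched in plain Python. This is an illustrative sketch only, not Spark's implementation; the class name SimpleStatCounter and its fields are made up here, and the update rule is Welford's online algorithm:

    ```python
    # Minimal single-pass statistics accumulator, mirroring the idea behind
    # Spark's StatCounter (illustrative sketch, not Spark's actual code).
    import math

    class SimpleStatCounter:
        def __init__(self):
            self.n = 0           # count of values seen so far
            self.mu = 0.0        # running mean
            self.m2 = 0.0        # running sum of squared deviations (Welford)
            self.mn = math.inf   # running minimum
            self.mx = -math.inf  # running maximum

        def merge(self, x):
            # Welford's online update: one pass, numerically stable.
            self.n += 1
            delta = x - self.mu
            self.mu += delta / self.n
            self.m2 += delta * (x - self.mu)
            self.mn = min(self.mn, x)
            self.mx = max(self.mx, x)
            return self

        def count(self): return self.n
        def mean(self): return self.mu
        def min(self): return self.mn
        def max(self): return self.mx
        def variance(self):
            # Population variance, like Spark's StatCounter.variance().
            return self.m2 / self.n if self.n else float("nan")
        def stdev(self):
            return math.sqrt(self.variance())

    # One pass over the data yields all of the statistics at once.
    stats = SimpleStatCounter()
    for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
        stats.merge(x)
    print(stats.count(), stats.mean(), stats.min(), stats.max(), stats.stdev())
    ```

    This is why a single rdd.stats() call is cheaper than calling mean(), max(), and stdev() separately: each value is folded into the accumulator once, and every statistic is derived from the same pass.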

  • Original post: https://www.cnblogs.com/bonelee/p/7154042.html