Spark RDD Programming

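Prepare a text file.
Create an RDD from the file: lines=sc.textFile()
Select the lines that contain a given word: lines.filter()
lambda syntax: lambda parameters: conditional expression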
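The function passed to filter() is an anonymous function whose body is a single expression that evaluates to True or False. Here is a minimal plain-Python sketch that needs no Spark. The names contains_spark and contains_spark_def are hypothetical and used only for illustration:

```python
# A lambda is an anonymous one-expression function: lambda <params>: <expression>
contains_spark = lambda line: "Spark" in line

# The same predicate written with def, for comparison
def contains_spark_def(line):
    return "Spark" in line

print(contains_spark("Spark is fast"))     # True
print(contains_spark("Hadoop MapReduce"))  # False
print(contains_spark_def("I like Spark"))  # True
```

The `in` test is case-sensitive, so a line containing only "spark" in lowercase would not match.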

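>>> lines = sc.textFile("file:///home/hadoop/word.txt")
>>> lines.foreach(print)  # prints from the executors; shows in this terminal in local mode, goes to executor logs on a cluster
>>> lineWithSpark = lines.filter(lambda line: "Spark" in line)
>>> lineWithSpark.foreach(print)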
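To see what textFile() followed by filter() produces without starting a Spark shell, here is a plain-Python analogue. It is a sketch, not the Spark API. The sample file content is hypothetical and stands in for word.txt. Like textFile(), it yields one element per line, and it uses the built-in filter() with the same lambda:

```python
import os
import tempfile

# Hypothetical sample content standing in for /home/hadoop/word.txt
sample = "Hadoop is good\nSpark is fast\nSpark is better\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    path = f.name

# Like sc.textFile(): one element per line, without the trailing newline
with open(path, encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
os.remove(path)

# Like lines.filter(lambda line: "Spark" in line): keep only matching lines
line_with_spark = list(filter(lambda line: "Spark" in line, lines))
for line in line_with_spark:
    print(line)
```

One difference matters in practice. This version runs eagerly, while Spark's filter() is a lazy transformation that does nothing until an action such as foreach() or collect() is called.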

Generate a list of words.
Create an RDD from the list: words=sc.parallelize()
Select the words longer than 2 characters: words.filter()

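>>> wordsList = 'Spark power a stack of libraries.'.split()
>>> wordsList
['Spark', 'power', 'a', 'stack', 'of', 'libraries.']
>>> wordsRDD = sc.parallelize(wordsList)
>>> wordsRDD.collect()
['Spark', 'power', 'a', 'stack', 'of', 'libraries.']
>>> wordsRDD.filter(lambda word: len(word) > 2).collect()
['Spark', 'power', 'stack', 'libraries.']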
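parallelize() distributes the list across partitions. filter() then runs on each partition independently, and collect() brings the results back to the driver in partition order. Below is a simplified plain-Python model of that flow. The helper split_into_partitions and the choice of 2 partitions are hypothetical, and Spark's real slicing logic may differ:

```python
# Hypothetical, simplified model of sc.parallelize(wordsList, 2):
# split the list into contiguous partitions (not Spark's exact algorithm)
def split_into_partitions(data, num_slices):
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

words_list = "Spark power a stack of libraries.".split()
partitions = split_into_partitions(words_list, 2)

# filter() is applied to each partition independently ...
filtered = [[w for w in part if len(w) > 2] for part in partitions]

# ... and collect() concatenates the partition results, in order, on the driver
collected = [w for part in filtered for w in part]
print(partitions)
print(collected)
```

Note that "libraries." keeps its trailing period, because str.split() with no argument splits only on whitespace.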

Map the selected words to (word, 1) key-value pairs: words.map()

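>>> wordsRDD.filter(lambda word: len(word) > 2).map(lambda word: (word, 1)).collect()
[('Spark', 1), ('power', 1), ('stack', 1), ('libraries.', 1)]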
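In PySpark, an RDD whose elements are 2-tuples is treated as a pair RDD. The first element is the key and the second is the value, and key-based operations such as reduceByKey() expect this shape. The plain-Python analogue below uses the built-in filter() and map() with the same lambdas. It is a sketch of the data transformation only, not the Spark API:

```python
words_list = "Spark power a stack of libraries.".split()

# Like wordsRDD.filter(lambda word: len(word) > 2)
long_words = filter(lambda word: len(word) > 2, words_list)

# Like .map(lambda word: (word, 1)): each word becomes a (key, value) tuple
pairs = list(map(lambda word: (word, 1), long_words))
print(pairs)
```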

Original article: https://www.cnblogs.com/shawncs/p/14583229.html