The word count program on Hadoop

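Following on from the previous post, this one runs the word count program in pseudo-distributed mode; it is the Hadoop equivalent of hello world.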

These steps follow the official documentation. (The official steps are already good, so I won't write up my own.)

1 Format a new distributed-filesystem:
$ bin/hadoop namenode -format
This formats the distributed filesystem.

2 Start the hadoop daemons:
$ bin/start-all.sh
This starts the daemons. If you skip this step, later commands fail with an error saying they cannot connect to the host.

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/

3 Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
This uploads the local conf directory to the input directory on the DFS.

4 Run some of the examples provided:
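$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
This is the command for running a (Java) Hadoop program; this particular example searches the input for strings matching the pattern. The word count program is run with:
$ bin/hadoop jar hadoop-examples-*.jar wordcount input output
The * stands for the version number; when typing the command, use Tab completion to fill in the jar name. Note that a job fails if its output directory already exists, so give the second job a different output directory or remove the first one's output before running it.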
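
To get a feel for what the two example jobs above compute, here is a rough local approximation using ordinary Unix tools, with no Hadoop involved. It is only a sketch: the function names local_wordcount and local_grep_count are hypothetical, and it assumes that the wordcount example splits on whitespace and that the grep example reports each match with its count, most frequent first. Check the real job output if the details matter.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): count whitespace-separated words
# in the given files and print "word<TAB>count" lines sorted by word.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |   # one word per line
        grep -v '^$' |             # drop the empty line left by leading whitespace
        LC_ALL=C sort |
        uniq -c |                  # prefix each distinct word with its count
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Hypothetical helper (not part of Hadoop): count every match of an extended
# regex in the given files and print "count<TAB>match", most frequent first.
# Assumes the matches themselves contain no whitespace.
local_grep_count() {
    pattern=$1
    shift
    cat "$@" |
        grep -oE "$pattern" |      # print only the matching parts, one per line
        LC_ALL=C sort |
        uniq -c |
        LC_ALL=C sort -k1,1nr -k2 |
        awk '{ printf "%s\t%s\n", $1, $2 }'
}
```

For example, after sourcing the functions, local_wordcount conf/* and local_grep_count 'dfs[a-z.]+' conf/* run against the same local conf directory that was uploaded in step 3, so the results can be compared with what the Hadoop jobs write.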

Examine the output files:

5 Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
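
The raw word count output can be long, so it may help to look at only the most frequent words. The sketch below assumes each output line has the form word<TAB>count, which is what I would expect from the wordcount example; the function name top_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the
# N lines with the highest counts (default 10), ties broken by word.
top_words() {
    n=${1:-10}
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}
```

It would be used as a filter on either of the commands above, for example: bin/hadoop fs -cat output/* | top_words 20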

6 When you're done, stop the daemons with:
$ bin/stop-all.sh

Once this runs successfully, your pseudo-distributed setup is working. Next, we will learn to develop MapReduce programs with Eclipse; see my next post:

http://www.cnblogs.com/xioyaozi/archive/2012/05/28/2521595.html

Original article: https://www.cnblogs.com/xioyaozi/p/2521161.html