
Testing spark-1.5 standalone mode on Ubuntu

Step 1: create the user uspark

root@hadoop1:~# adduser uspark
Adding user `uspark' ...
Adding new group `uspark' (1002) ...
Adding new user `uspark' (1002) with group `uspark' ...
Creating home directory `/home/uspark' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for uspark
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
root@hadoop1:~#
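On Debian/Ubuntu the same account can be created with fewer prompts, which helps when repeating the setup on several nodes. This is a minimal sketch. It assumes the commands run as root and that `adduser` is the Debian/Ubuntu version, whose `--gecos ""` option skips the Full Name / Room Number questions. The password is still requested. `user_exists` and `ensure_user` are hypothetical helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 1; assumes Debian/Ubuntu's adduser and root privileges.

# Return 0 if the named account already exists, non-zero otherwise.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Create the account only if it is missing.
# --gecos "" skips the Full Name / Room Number / ... questions;
# adduser still prompts for the UNIX password.
ensure_user() {
  local name="$1"
  if user_exists "$name"; then
    echo "user $name already exists"
    return 0
  fi
  adduser --gecos "" "$name"
}
```

For example, as root: `ensure_user uspark`. If the account already exists, the helper does nothing, so it is safe to rerun.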

Step 2: configure the Java environment variables

uspark@hadoop:~$ java -version
The program 'java' can be found in the following packages:
* default-jre
* gcj-4.8-jre-headless
* openjdk-7-jre-headless
* gcj-4.6-jre-headless
* openjdk-6-jre-headless
Ask your administrator to install one of them
uspark@hadoop:~$ vi .bashrc

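Append the following to the end of .bashrc (this assumes JDK 1.8.0_60 has already been unpacked into /home/uspark/jdk1.8.0_60):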

# set Java environment
export JAVA_HOME=/home/uspark/jdk1.8.0_60
export CLASSPATH=".:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"

uspark@hadoop:~$ source .bashrc
uspark@hadoop:~$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) Server VM (build 25.60-b23, mixed mode)
uspark@hadoop:~$
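Before moving on, it can be worth confirming that JAVA_HOME really points at a Java installation. A typo in .bashrc tends to show up only later, when Spark's scripts try to start the JVM. This is a minimal sketch. `check_java_home` is a hypothetical helper, and it only checks that `$JAVA_HOME/bin/java` exists and is executable. It does not check the Java version.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for Step 2.

# Succeeds if the given directory (default: $JAVA_HOME) contains an
# executable bin/java; prints a short diagnostic either way.
check_java_home() {
  local home="${1:-$JAVA_HOME}"
  if [ -z "$home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/java" ]; then
    echo "no executable java under $home/bin" >&2
    return 1
  fi
  echo "ok: $home/bin/java"
}
```

For example, after `source .bashrc`, run `check_java_home && "$JAVA_HOME/bin/java" -version`.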

Step 3: download Spark

Open http://spark.apache.org/downloads.html

Copy the download link:

uspark@hadoop:~/backup$ wget http://mirrors.cnnic.cn/apache/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz

Once the download finishes, extract the archive:

tar xf spark-1.5.1-bin-hadoop2.6.tgz
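The extraction step can also be scripted with a quick check of the result. This is a minimal sketch, and it makes two assumptions. First, the tarball follows the `spark-<version>-bin-<hadoop>.tgz` naming used above and unpacks into a directory of the same name. Second, the presence of an executable `bin/spark-shell` plus `sbin/` and `conf/` is enough to call the result a usable distribution. All three function names are hypothetical. The mirror used above may no longer carry 1.5.1. Old releases should still be available under https://archive.apache.org/dist/spark/.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 3.

# Directory a Spark binary tarball unpacks into, e.g.
# spark-1.5.1-bin-hadoop2.6.tgz -> spark-1.5.1-bin-hadoop2.6
spark_dir_from_tarball() {
  basename "$1" .tgz
}

# Rough check that a directory looks like an unpacked Spark distribution.
verify_spark_home() {
  local dir="$1"
  [ -x "$dir/bin/spark-shell" ] && [ -d "$dir/sbin" ] && [ -d "$dir/conf" ]
}

# Extract the tarball into the current directory, verify the layout,
# and print the resulting directory name on success.
unpack_spark() {
  local tgz="$1"
  local dir
  dir="$(spark_dir_from_tarball "$tgz")"
  tar xf "$tgz" || return 1
  if verify_spark_home "$dir"; then
    echo "$dir"
  else
    echo "unexpected layout in $dir" >&2
    return 1
  fi
}
```

For example: `cd "$(unpack_spark spark-1.5.1-bin-hadoop2.6.tgz)"`.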

uspark@liuhy:~$ cd spark-1.5.1-bin-hadoop2.6/
uspark@liuhy:~/spark-1.5.1-bin-hadoop2.6$ ll
total 1068
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:13 bin/
-rw-r--r-- 1 uspark uspark 960539 Oct  8 05:13 CHANGES.txt
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:13 conf/
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:12 data/
-rw-rw-r-- 1 uspark uspark    747 Oct  8 05:23 derby.log
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:12 ec2/
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:13 examples/
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:12 lib/
-rw-r--r-- 1 uspark uspark  50972 Oct  8 05:12 LICENSE
drwxrwxr-x 5 uspark uspark   4096 Oct  8 05:23 metastore_db/
-rw-r--r-- 1 uspark uspark  22559 Oct  8 05:12 NOTICE
drwxr-xr-x 6 uspark uspark   4096 Oct  8 05:12 python/
drwxr-xr-x 3 uspark uspark   4096 Oct  8 05:12 R/
-rw-r--r-- 1 uspark uspark   3593 Oct  8 05:12 README.md
-rw-r--r-- 1 uspark uspark    120 Oct  8 05:12 RELEASE
drwxr-xr-x 2 uspark uspark   4096 Oct  8 05:12 sbin/
uspark@liuhy:~/spark-1.5.1-bin-hadoop2.6$

Interactive Analysis with the Spark Shell

Reference: http://spark.apache.org/docs/latest/quick-start.html

uspark@liuhy:~/spark-1.5.1-bin-hadoop2.6$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/08 05:23:26 WARN Utils: Your hostname, liuhy resolves to a loopback address: 127.0.1.1; using 192.168.1.112 instead (on interface eth0)
15/10/08 05:23:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/10/08 05:23:30 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
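The two `WARN Utils` lines mean that the hostname liuhy resolves to the loopback address 127.0.1.1, which is the usual Ubuntu /etc/hosts default. Spark therefore fell back to the eth0 address. The log suggests setting SPARK_LOCAL_IP to pin the address. Spark's scripts read such settings from conf/spark-env.sh. This is a minimal sketch. `set_spark_local_ip` is a hypothetical helper, and 192.168.1.112 is simply the address reported in this log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin the address Spark binds to by writing
# SPARK_LOCAL_IP into <spark_home>/conf/spark-env.sh.

set_spark_local_ip() {
  local spark_home="$1" ip="$2"
  local env_file="$spark_home/conf/spark-env.sh"
  # Start from an empty file if none exists yet.
  touch "$env_file" || return 1
  # Drop any previous SPARK_LOCAL_IP line so repeated runs stay idempotent.
  grep -v '^export SPARK_LOCAL_IP=' "$env_file" > "$env_file.tmp"
  mv "$env_file.tmp" "$env_file"
  echo "export SPARK_LOCAL_IP=$ip" >> "$env_file"
}
```

For example: `set_spark_local_ip ~/spark-1.5.1-bin-hadoop2.6 192.168.1.112`, then restart bin/spark-shell. For a one-off run, `SPARK_LOCAL_IP=192.168.1.112 bin/spark-shell` sets the variable for that process only.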

scala> val tf = sc.textFile("README.md")
tf: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21

scala> tf.count
count                 countApprox           countApproxDistinct   countByValue
countByValueApprox

scala> tf.count
res2: Long = 98

scala>

scala> val lineWithSpark = tf.filter(_.contains("Spark"))
lineWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:23

scala> lineWithSpark.first
res5: String = # Apache Spark

scala> lineWithSpark.count
count                 countApprox           countApproxDistinct   countByValue
countByValueApprox

scala> lineWithSpark.count
res6: Long = 18

scala> lineWithSpark.foreach
foreach            foreachPartition   foreachWith

scala> lineWithSpark.foreach(println)
# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
and Spark Streaming for stream processing.
You can find the latest Spark documentation, including a programming
## Building Spark
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
The easiest way to start using Spark is through the Scala shell:
Spark also comes with several sample programs in the `examples` directory.
    ./bin/run-example SparkPi
    MASTER=spark://host:7077 ./bin/run-example SparkPi
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
Hadoop, you must build Spark against the same version that your cluster runs.
for guidance on building a Spark application that works with a particular
in the online documentation for an overview on how to configure Spark.

scala>
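As a sanity check on the session above, the same three questions can be answered for README.md with standard tools:

- `textFile(...).count` is the number of lines.
- `filter(_.contains("Spark")).count` is the number of lines containing the case-sensitive substring "Spark".
- `first` returns the first such line.

This is a minimal sketch, assuming GNU or BSD grep. The helper names are hypothetical. If README.md is unchanged, running these in the Spark directory should match the results above (98 and 18).

```shell
#!/usr/bin/env bash
# Hypothetical command-line cross-check of the spark-shell session.

# Like sc.textFile(f).count: number of lines, including a final line
# without a trailing newline (grep -c '' counts it, wc -l would not).
count_lines() {
  grep -c '' "$1"
}

# Like .filter(_.contains(word)).count: lines containing the
# case-sensitive, fixed-string substring $2.
count_lines_with() {
  grep -cF -- "$2" "$1"
}

# Like .filter(_.contains(word)).first: the first matching line.
first_line_with() {
  grep -m 1 -F -- "$2" "$1"
}

# Usage, from ~/spark-1.5.1-bin-hadoop2.6:
#   count_lines README.md              # compare with tf.count
#   count_lines_with README.md Spark   # compare with lineWithSpark.count
#   first_line_with README.md Spark    # compare with lineWithSpark.first
```

One caveat on `lineWithSpark.foreach(println)`: its output appeared in this console because spark-shell was started without `--master`, so it presumably ran in local mode. On a standalone cluster the `println` calls run on the executors instead. The Spark programming guide suggests `collect()` or `take(n)` before printing on the driver.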

Done.


Original post: https://www.cnblogs.com/ihongyan/p/4859905.html