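After setting up the Spark cluster, I first wrote two small examples in pyspark, but the Tab key gave no completion, so I decided to try Scala instead. spark-shell did offer completion, but Backspace did not work, and completions were appended to what I had already typed instead of replacing it, which made it impossible to write programs.
Solution:
1. Open Session Options.
2. Under Terminal > Emulation, select Linux as the terminal type.
3. Under Mapped Keys, check both options.
4. That fixes the problem. However, if the remote session stays idle for a long time the connection is dropped, and the next operation has to wait for it to come back, which also gets in the way. The fix for that is included here as well (optional).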
val lines = sc.textFile("hdfs://alamps:9000/wordcount/input/test.txt")
lines.count()
-----
scala> val lines =sc.textFile("hdfs://alamps:9000/wordcount/input/test.txt")
17/10/13 23:09:24 INFO MemoryStore: ensureFreeSpace(77922) called with curMem=179665, maxMem=280248975
17/10/13 23:09:24 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 76.1 KB, free 267.0 MB)
17/10/13 23:09:24 INFO MemoryStore: ensureFreeSpace(31262) called with curMem=257587, maxMem=280248975
17/10/13 23:09:24 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 30.5 KB, free 267.0 MB)
17/10/13 23:09:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:41619 (size: 30.5 KB, free: 267.2 MB)
17/10/13 23:09:24 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
17/10/13 23:09:24 INFO SparkContext: Created broadcast 1 from textFile at <console>:12
lines: org.apache.spark.rdd.RDD[String] = hdfs://alamps:9000/wordcount/input/test.txt MappedRDD[3] at textFile at <console>:12
scala> lines.count()
17/10/13 23:09:45 INFO FileInputFormat: Total input paths to process : 1
17/10/13 23:09:48 INFO SparkContext: Starting job: count at <console>:15
17/10/13 23:09:48 INFO DAGScheduler: Got job 0 (count at <console>:15) with 1 output partitions (allowLocal=false)
17/10/13 23:09:48 INFO DAGScheduler: Final stage: Stage 0(count at <console>:15)
17/10/13 23:09:48 INFO DAGScheduler: Parents of final stage: List()
17/10/13 23:09:48 INFO DAGScheduler: Missing parents: List()
17/10/13 23:09:48 INFO DAGScheduler: Submitting Stage 0 (hdfs://alamps:9000/wordcount/input/test.txt MappedRDD[3] at textFile at <console>:12), which has no missing parents
17/10/13 23:09:48 INFO MemoryStore: ensureFreeSpace(2544) called with curMem=288849, maxMem=280248975
17/10/13 23:09:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 267.0 MB)
17/10/13 23:09:48 INFO MemoryStore: ensureFreeSpace(1898) called with curMem=291393, maxMem=280248975
17/10/13 23:09:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1898.0 B, free 267.0 MB)
17/10/13 23:09:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:41619 (size: 1898.0 B, free: 267.2 MB)
17/10/13 23:09:48 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
17/10/13 23:09:48 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
17/10/13 23:09:48 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (hdfs://alamps:9000/wordcount/input/test.txt MappedRDD[3] at textFile at <console>:12)
17/10/13 23:09:48 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/13 23:09:48 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 1307 bytes)
17/10/13 23:09:48 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/13 23:09:49 INFO HadoopRDD: Input split: hdfs://alamps:9000/wordcount/input/test.txt:0+88
17/10/13 23:09:49 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/10/13 23:09:49 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/10/13 23:09:49 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/10/13 23:09:49 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/10/13 23:09:49 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/10/13 23:09:53 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1920 bytes result sent to driver
17/10/13 23:09:53 INFO DAGScheduler: Stage 0 (count at <console>:15) finished in 4.875 s
17/10/13 23:09:53 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4812 ms on localhost (1/1)
17/10/13 23:09:53 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/13 23:09:53 INFO DAGScheduler: Job 0 finished: count at <console>:15, took 5.480197 s
res2: Long = 8
[hadoop@alamps sbin]$ jps
3596 Master
3733 Worker
2558 DataNode
2748 SecondaryNameNode
3814 Jps
2884 ResourceManager
2986 NodeManager
2467 NameNode
[hadoop@alamps sbin]$ hadoop fs -ls /
Found 11 items
drwxr-xr-x - hadoop supergroup 0 2017-10-02 06:29 /aaa
drwxr-xr-x - hadoop supergroup 0 2017-10-06 04:04 /external
drwxr-xr-x - hadoop supergroup 0 2017-10-04 09:14 /flowsum
-rw-r--r-- 1 hadoop supergroup 43 2017-10-02 02:52 /hello.txt
drwxr-xr-x - hadoop supergroup 0 2017-10-04 21:10 /index
-rw-r--r-- 1 hadoop supergroup 143588167 2017-10-01 08:38 /jdk-7u65-linux-i586.tar.gz
drwx------ - hadoop supergroup 0 2017-10-05 22:43 /tmp
drwxr-xr-x - hadoop supergroup 0 2017-10-02 06:18 /upload
drwxr-xr-x - hadoop supergroup 0 2017-10-05 22:44 /user
drwxr-xr-x - hadoop supergroup 0 2017-10-03 06:20 /wc
drwxr-xr-x - hadoop supergroup 0 2017-10-01 09:07 /wordcount
[hadoop@alamps sbin]$ hadoop fs -cat /wordcount
cat: `/wordcount': Is a directory
[hadoop@alamps sbin]$ hadoop fs -ls /wordcount
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-10-01 09:00 /wordcount/input
drwxr-xr-x - hadoop supergroup 0 2017-10-01 09:07 /wordcount/out
[hadoop@alamps sbin]$ hadoop fs -ls /wordcount/input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 88 2017-10-01 09:00 /wordcount/input/test.txt
[hadoop@alamps sbin]$ hadoop fs -cat /wordcount/input/test.txt
hello tom
hello java
hello c
hello python
hello scala
hello spark
hello baby
hello java
[hadoop@alamps sbin]$
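The count of 8 returned by lines.count() matches the eight lines that hadoop fs -cat prints for test.txt. As a rough cross-check that needs neither a SparkContext nor HDFS, the same numbers can be reproduced with plain Scala collections. This is only a sketch: the object name LocalWordCount and its two helper methods are hypothetical names introduced for illustration, and the input lines are copied by hand from the listing above.

```scala
// Local sketch using plain Scala collections: no SparkContext or HDFS involved.
// LocalWordCount, countLines and wordCounts are illustrative names, not Spark APIs.
object LocalWordCount {

  // The eight lines of /wordcount/input/test.txt, as printed by `hadoop fs -cat`.
  val sampleLines: Seq[String] = Seq(
    "hello tom",
    "hello java",
    "hello c",
    "hello python",
    "hello scala",
    "hello spark",
    "hello baby",
    "hello java"
  )

  // Local counterpart of lines.count() on the RDD.
  def countLines(lines: Seq[String]): Long = lines.size.toLong

  // Split each line on whitespace, then tally how often each word occurs.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    println(countLines(sampleLines))
    wordCounts(sampleLines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In spark-shell, the usual RDD form of the same tally would be along the lines of lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect(), which should likewise report hello eight times and java twice for this file.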