  • Spark Notes: Reading Hive Data from the Spark-Shell Client

    1. Copy hive-site.xml into spark/conf, and copy mysql-connector-java-xxx-bin.jar into hive/lib.

    2. Start the Hive metastore service: hive --service metastore

    3. Start the Hadoop daemons: sh $HADOOP_HOME/sbin/start-all.sh

    4. Start the Spark daemons: sh $SPARK_HOME/sbin/start-all.sh

    5. Launch the shell: spark-shell

    6. Query Hive from Scala (Spark SQL):

    scala> val conf = new SparkConf().setAppName("SparkHive").setMaster("local")  // optional: spark-shell creates a context automatically

    scala> val sc = new SparkContext(conf)  // optional: spark-shell creates a context automatically

    scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' ")  // make sure the delimiter matches the data file

    scala> sqlContext.sql("LOAD DATA INPATH '/user/spark/src.txt' INTO TABLE src")

    scala> sqlContext.sql("SELECT * FROM src").collect().foreach(println)

    scala> sc.stop()
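    Note that the listing above declares FIELDS TERMINATED BY ' ' while the session log below uses '\t': whichever delimiter the DDL declares must match src.txt exactly, or the columns typically come back NULL. A plain-Scala sketch of how one line splits into (key, value) under a given delimiter (no Spark required; the sample rows are hypothetical):

    ```scala
    object DelimiterCheck {
      // Split one line of src.txt the way the table's
      // ROW FORMAT DELIMITED FIELDS TERMINATED BY clause declares.
      def parseRow(line: String, sep: Char): (Int, String) = {
        val parts = line.split(sep)
        (parts(0).trim.toInt, parts(1))
      }

      def main(args: Array[String]): Unit = {
        // Hypothetical sample rows, tab-separated to match a '\t' delimiter.
        val rows = Seq("1\thello", "2\tworld").map(parseRow(_, '\t'))
        rows.foreach(println) // prints (1,hello) then (2,world)
      }
    }
    ```

    Parsing "1\thello" with sep = ' ' instead would throw on `toInt`, which is the plain-Scala version of the delimiter mismatch described above.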

    A sample spark-shell session log follows:

    SQL context available as sqlContext.
    
    scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    17/12/05 10:38:51 INFO HiveContext: Initializing execution hive, version 1.2.1
    17/12/05 10:38:51 INFO ClientWrapper: Inspected Hadoop version: 2.4.0
    17/12/05 10:38:51 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:38:51 INFO metastore: Mestastore configuration hive.metastore.warehouse.dir changed from file:/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore to file:/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore
    17/12/05 10:38:51 INFO metastore: Mestastore configuration javax.jdo.option.ConnectionURL changed from jdbc:derby:;databaseName=/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore;create=true to jdbc:derby:;databaseName=/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore;create=true
    17/12/05 10:38:51 INFO HiveMetaStore: 0: Shutting down the object store...
    17/12/05 10:38:51 INFO audit: ugi=root	ip=unknown-ip-addr	cmd=Shutting down the object store...	
    17/12/05 10:38:51 INFO HiveMetaStore: 0: Metastore shutdown complete.
    17/12/05 10:38:51 INFO audit: ugi=root	ip=unknown-ip-addr	cmd=Metastore shutdown complete.	
    17/12/05 10:38:51 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    17/12/05 10:38:51 INFO ObjectStore: ObjectStore, initialize called
    17/12/05 10:38:51 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    17/12/05 10:38:51 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:38:56 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:39:01 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    17/12/05 10:39:01 INFO ObjectStore: Initialized ObjectStore
    17/12/05 10:39:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    17/12/05 10:39:02 INFO HiveMetaStore: Added admin role in metastore
    17/12/05 10:39:02 INFO HiveMetaStore: Added public role in metastore
    17/12/05 10:39:02 INFO HiveMetaStore: No user is added in admin role, since config is empty
    17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/d66a519b-e512-4295-b707-0f688aa238ea_resources
    17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea
    17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/root/d66a519b-e512-4295-b707-0f688aa238ea
    17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea/_tmp_space.db
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:39:02 INFO HiveContext: default warehouse location is /user/hive/warehouse
    17/12/05 10:39:02 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    17/12/05 10:39:02 INFO ClientWrapper: Inspected Hadoop version: 2.4.0
    17/12/05 10:39:03 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:39:08 INFO metastore: Trying to connect to metastore with URI thrift://192.168.66.66:9083
    17/12/05 10:39:08 INFO metastore: Connected to metastore.
    17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/4989df94-ba31-4ef6-ab78-369043e2067e_resources
    17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e
    17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e
    17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e/_tmp_space.db
    sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3be94b12
    
    scala> sqlContext.sql("use siat")
    17/12/05 10:39:36 INFO ParseDriver: Parsing command: use siat
    17/12/05 10:39:41 INFO ParseDriver: Parse Completed
    17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:45 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:45 INFO ParseDriver: Parsing command: use siat
    17/12/05 10:39:49 INFO ParseDriver: Parse Completed
    17/12/05 10:39:50 INFO PerfLogger: </PERFLOG method=parse start=1512441585044 end=1512441590042 duration=4998 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:50 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Semantic Analysis Completed
    17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441590188 end=1512441591560 duration=1372 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=compile start=1512441584491 end=1512441591758 duration=7267 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Concurrency mode is disabled, not creating a lock manager
    17/12/05 10:39:51 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Starting command(queryId=root_20171205103945_2f994f07-9e52-456b-97ee-d03e722116ff): use siat
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441584488 end=1512441592212 duration=7724 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO Driver: Starting task [Stage-0:DDL] in serial mode
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=runTasks start=1512441592212 end=1512441592496 duration=284 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441591760 end=1512441592497 duration=737 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO Driver: OK
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592571 end=1512441592571 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441584478 end=1512441592571 duration=8093 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592612 end=1512441592613 duration=1 from=org.apache.hadoop.hive.ql.Driver>
    res0: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> sqlContext.sql("drop table src")
    17/12/05 10:40:13 INFO ParseDriver: Parsing command: drop table src
    17/12/05 10:40:13 INFO ParseDriver: Parse Completed
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO ParseDriver: Parsing command: DROP TABLE src
    17/12/05 10:40:17 INFO ParseDriver: Parse Completed
    17/12/05 10:40:17 INFO PerfLogger: </PERFLOG method=parse start=1512441617979 end=1512441617998 duration=19 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Semantic Analysis Completed
    17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441617999 end=1512441619115 duration=1116 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=compile start=1512441617977 end=1512441619116 duration=1139 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Hive: Dumping metastore api call timing information for : compilation phase
    17/12/05 10:40:19 INFO Hive: Total time spent in this metastore function was greater than 1000ms : getTable_(String, String, )=3999
    17/12/05 10:40:19 INFO Driver: Concurrency mode is disabled, not creating a lock manager
    17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Starting command(queryId=root_20171205104017_dd3db388-5058-4af4-9076-90035b4837d9): DROP TABLE src
    17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441617977 end=1512441619119 duration=1142 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Starting task [Stage-0:DDL] in serial mode
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=runTasks start=1512441619119 end=1512441664030 duration=44911 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO Hive: Dumping metastore api call timing information for : execution phase
    17/12/05 10:41:04 INFO Hive: Total time spent in this metastore function was greater than 1000ms : dropTable_(String, String, boolean, boolean, boolean, )=44266
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441619118 end=1512441664031 duration=44913 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO Driver: OK
    17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664032 end=1512441664032 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441617976 end=1512441664051 duration=46075 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664054 end=1512441664054 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    res1: org.apache.spark.sql.DataFrame = []
    
    scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' ")  
    17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	'
    17/12/05 10:41:57 INFO ParseDriver: Parse Completed
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	'
    17/12/05 10:41:57 INFO ParseDriver: Parse Completed
    17/12/05 10:41:57 INFO PerfLogger: </PERFLOG method=parse start=1512441717568 end=1512441717619 duration=51 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO CalcitePlanner: Starting Semantic Analysis
    17/12/05 10:41:58 INFO CalcitePlanner: Creating table siat.src position=27
    17/12/05 10:41:58 INFO Driver: Semantic Analysis Completed
    17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441717619 end=1512441718637 duration=1018 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=compile start=1512441717565 end=1512441718637 duration=1072 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Concurrency mode is disabled, not creating a lock manager
    17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Starting command(queryId=root_20171205104157_e9b5ed54-e7dc-448a-984c-6d5cb37f964f): CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	'
    17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441717565 end=1512441718735 duration=1170 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Starting task [Stage-0:DDL] in serial mode
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=runTasks start=1512441718735 end=1512441721846 duration=3111 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO Hive: Dumping metastore api call timing information for : execution phase
    17/12/05 10:42:01 INFO Hive: Total time spent in this metastore function was greater than 1000ms : createTable_(Table, )=2431
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441718638 end=1512441721849 duration=3211 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO Driver: OK
    17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721852 end=1512441721882 duration=30 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441717564 end=1512441721883 duration=4319 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721883 end=1512441721883 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    res2: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> sqlContext.sql("select * from src").collect().foreach(println)
    17/12/05 10:42:54 INFO ParseDriver: Parsing command: select * from src
    17/12/05 10:42:54 INFO ParseDriver: Parse Completed
    17/12/05 10:42:56 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/12/05 10:42:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 467.6 KB, free 142.8 MB)
    17/12/05 10:43:02 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.8 MB)
    17/12/05 10:43:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:02 INFO SparkContext: Created broadcast 0 from collect at <console>:30
    17/12/05 10:43:04 INFO FileInputFormat: Total input paths to process : 0
    17/12/05 10:43:04 INFO SparkContext: Starting job: collect at <console>:30
    17/12/05 10:43:04 INFO DAGScheduler: Job 0 finished: collect at <console>:30, took 0.043396 s
    
    scala> val res=sqlContext.sql("select * from src").collect().foreach(println)
    17/12/05 10:43:25 INFO ParseDriver: Parsing command: select * from src
    17/12/05 10:43:25 INFO ParseDriver: Parse Completed
    17/12/05 10:43:26 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 467.6 KB, free 142.3 MB)
    17/12/05 10:43:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.3 MB)
    17/12/05 10:43:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:27 INFO SparkContext: Created broadcast 1 from collect at <console>:29
    17/12/05 10:43:27 INFO FileInputFormat: Total input paths to process : 0
    17/12/05 10:43:27 INFO SparkContext: Starting job: collect at <console>:29
    17/12/05 10:43:27 INFO DAGScheduler: Job 1 finished: collect at <console>:29, took 0.000062 s
    
    scala> res
    
    scala> val res=sqlContext.sql("select count(*) from src").collect().foreach(println)
    17/12/05 10:43:47 INFO ParseDriver: Parsing command: select count(*) from src
    17/12/05 10:43:47 INFO ParseDriver: Parse Completed
    17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 467.0 KB, free 141.8 MB)
    17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 40.4 KB, free 141.8 MB)
    17/12/05 10:43:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.66.66:36024 (size: 40.4 KB, free: 143.1 MB)
    17/12/05 10:43:48 INFO SparkContext: Created broadcast 2 from collect at <console>:29
    17/12/05 10:43:49 INFO FileInputFormat: Total input paths to process : 0
    17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:49 INFO SparkContext: Starting job: collect at <console>:29
    17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:49 INFO DAGScheduler: Registering RDD 15 (collect at <console>:29)
    17/12/05 10:43:49 INFO DAGScheduler: Got job 2 (collect at <console>:29) with 1 output partitions
    17/12/05 10:43:49 INFO DAGScheduler: Final stage: ResultStage 1 (collect at <console>:29)
    17/12/05 10:43:49 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
    17/12/05 10:43:49 INFO DAGScheduler: Missing parents: List()
    17/12/05 10:43:49 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29), which has no missing parents
    17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.0 KB, free 142.7 MB)
    17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.0 KB, free 142.7 MB)
    17/12/05 10:43:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.66.66:36024 (size: 6.0 KB, free: 143.2 MB)
    17/12/05 10:43:49 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
    17/12/05 10:43:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29)
    17/12/05 10:43:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
    17/12/05 10:44:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:44:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:44:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:57 INFO AppClient$ClientEndpoint: Executor added: app-20171205103712-0001/0 on worker-20171204180628-192.168.66.66-7078 (192.168.66.66:7078) with 2 cores
    17/12/05 10:45:57 INFO SparkDeploySchedulerBackend: Granted executor ID app-20171205103712-0001/0 on hostPort 192.168.66.66:7078 with 2 cores, 512.0 MB RAM
    17/12/05 10:45:59 INFO AppClient$ClientEndpoint: Executor updated: app-20171205103712-0001/0 is now RUNNING
    17/12/05 10:46:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:46:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:46:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:46:46 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (xinfang:10363) with ID 0
    17/12/05 10:46:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xinfang, partition 0,PROCESS_LOCAL, 1999 bytes)
    17/12/05 10:46:48 INFO BlockManagerMasterEndpoint: Registering block manager xinfang:34620 with 143.3 MB RAM, BlockManagerId(0, xinfang, 34620)
    17/12/05 10:46:51 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on xinfang:34620 (size: 6.0 KB, free: 143.2 MB)
    17/12/05 10:47:07 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xinfang:10363
    17/12/05 10:47:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 82 bytes
    17/12/05 10:47:14 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 27243 ms on xinfang (1/1)
    17/12/05 10:47:14 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
    17/12/05 10:47:14 INFO DAGScheduler: ResultStage 1 (collect at <console>:29) finished in 204.228 s
    17/12/05 10:47:14 INFO DAGScheduler: Job 2 finished: collect at <console>:29, took 204.785107 s
    [0]
    
    scala> res
    
    scala> sc.stop()
    17/12/05 10:48:32 INFO SparkUI: Stopped Spark web UI at http://192.168.66.66:4041
    17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Shutting down all executors
    17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
    17/12/05 10:48:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    17/12/05 10:48:36 INFO MemoryStore: MemoryStore cleared
    17/12/05 10:48:36 INFO BlockManager: BlockManager stopped
    17/12/05 10:48:36 INFO BlockManagerMaster: BlockManagerMaster stopped
    17/12/05 10:48:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    17/12/05 10:48:36 INFO SparkContext: Successfully stopped SparkContext
    
    scala> 17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    17/12/05 10:48:38 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
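
    The `[0]` printed by count(*) above reflects an empty table: src had just been dropped and recreated, and no LOAD DATA was re-run, so the scan found no files (note `Total input paths to process : 0` in the log). A plain-Scala analogue of what the aggregation computes (sample rows hypothetical):

    ```scala
    object CountSketch {
      // Plain-Scala analogue of SELECT COUNT(*): count the parsed rows.
      def countRows(lines: Seq[String]): Long = lines.size.toLong

      def main(args: Array[String]): Unit = {
        println(countRows(Nil))                   // 0 -- no files loaded, as in the log above
        println(countRows(Seq("1\ta", "2\tb")))   // 2 -- after a successful LOAD DATA
      }
    }
    ```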
    

      

  • Original post: https://www.cnblogs.com/xinfang520/p/7985939.html