  • Spark Notes: Reading Hive Data with the spark-shell Client

    1. Copy hive-site.xml into spark/conf, and copy mysql-connector-java-xxx-bin.jar into hive/lib (a minimal hive-site.xml sketch follows this list)

    2. Start the Hive metastore service: hive --service metastore

    3. Start the Hadoop services: sh $HADOOP_HOME/sbin/start-all.sh

    4. Start the Spark services: sh $SPARK_HOME/sbin/start-all.sh

    5. Enter the Spark shell: spark-shell

    6. Query Hive from Scala (Spark SQL); the shell commands appear after the hive-site.xml sketch below, and a consolidated runnable version follows them
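
    Referring back to step 1: the copied hive-site.xml must point Spark at the running metastore. A minimal sketch, assuming the metastore listens at thrift://192.168.66.66:9083 and the warehouse sits at /user/hive/warehouse, both as reported in the transcript below; adjust host, port, and path for your cluster:

    <configuration>
      <!-- Thrift URI of the metastore service started in step 2 -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://192.168.66.66:9083</value>
      </property>
      <!-- Default warehouse directory on HDFS -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
    </configuration>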

    scala>val conf=new SparkConf().setAppName("SparkHive").setMaster("local")   // optional: spark-shell has already created this automatically

    scala>val sc=new SparkContext(conf)  // optional: spark-shell has already created sc automatically

    scala>val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    scala>sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' ")//这里需要注意数据的间隔符

    scala>sqlContext.sql("LOAD DATA INPATH '/user/spark/src.txt' INTO TABLE src ");

    scala>sqlContext.sql(" SELECT * FROM src").collect().foreach(println)

    scala>sc.stop()
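
    Putting the commands together: a minimal self-contained sketch of the same session, assuming the Spark 1.x API shown above, a tab-delimited data file at /user/spark/src.txt on HDFS, and a standalone application (inside spark-shell, skip the conf/sc lines and use the provided sc):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SparkHive {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SparkHive").setMaster("local")
        val sc = new SparkContext(conf)
        val sqlContext = new HiveContext(sc)

        // The ROW FORMAT delimiter must match the data file; the transcript
        // below loads a tab-separated file, hence '\t'.
        sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) " +
          "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'")
        sqlContext.sql("LOAD DATA INPATH '/user/spark/src.txt' INTO TABLE src")
        sqlContext.sql("SELECT * FROM src").collect().foreach(println)

        sc.stop()
      }
    }

    A sample spark-shell session transcript follows: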

    SQL context available as sqlContext.
    
    scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    17/12/05 10:38:51 INFO HiveContext: Initializing execution hive, version 1.2.1
    17/12/05 10:38:51 INFO ClientWrapper: Inspected Hadoop version: 2.4.0
    17/12/05 10:38:51 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:38:51 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:38:51 INFO metastore: Mestastore configuration hive.metastore.warehouse.dir changed from file:/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore to file:/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore
    17/12/05 10:38:51 INFO metastore: Mestastore configuration javax.jdo.option.ConnectionURL changed from jdbc:derby:;databaseName=/tmp/spark-ecfcdcc1-2bb0-4efc-aa00-96ad1dd47840/metastore;create=true to jdbc:derby:;databaseName=/tmp/spark-ea48b58b-ef90-43c0-8d5e-f54a4b4cadde/metastore;create=true
    17/12/05 10:38:51 INFO HiveMetaStore: 0: Shutting down the object store...
    17/12/05 10:38:51 INFO audit: ugi=root	ip=unknown-ip-addr	cmd=Shutting down the object store...	
    17/12/05 10:38:51 INFO HiveMetaStore: 0: Metastore shutdown complete.
    17/12/05 10:38:51 INFO audit: ugi=root	ip=unknown-ip-addr	cmd=Metastore shutdown complete.	
    17/12/05 10:38:51 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    17/12/05 10:38:51 INFO ObjectStore: ObjectStore, initialize called
    17/12/05 10:38:51 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    17/12/05 10:38:51 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:38:56 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:38:56 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:38:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:39:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    17/12/05 10:39:01 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    17/12/05 10:39:01 INFO ObjectStore: Initialized ObjectStore
    17/12/05 10:39:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    17/12/05 10:39:02 INFO HiveMetaStore: Added admin role in metastore
    17/12/05 10:39:02 INFO HiveMetaStore: Added public role in metastore
    17/12/05 10:39:02 INFO HiveMetaStore: No user is added in admin role, since config is empty
    17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/d66a519b-e512-4295-b707-0f688aa238ea_resources
    17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea
    17/12/05 10:39:02 INFO SessionState: Created local directory: /tmp/root/d66a519b-e512-4295-b707-0f688aa238ea
    17/12/05 10:39:02 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/d66a519b-e512-4295-b707-0f688aa238ea/_tmp_space.db
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:39:02 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:39:02 INFO HiveContext: default warehouse location is /user/hive/warehouse
    17/12/05 10:39:02 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    17/12/05 10:39:02 INFO ClientWrapper: Inspected Hadoop version: 2.4.0
    17/12/05 10:39:03 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.4.0
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
    17/12/05 10:39:07 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
    17/12/05 10:39:08 INFO metastore: Trying to connect to metastore with URI thrift://192.168.66.66:9083
    17/12/05 10:39:08 INFO metastore: Connected to metastore.
    17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/4989df94-ba31-4ef6-ab78-369043e2067e_resources
    17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e
    17/12/05 10:39:10 INFO SessionState: Created local directory: /tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e
    17/12/05 10:39:10 INFO SessionState: Created HDFS directory: /user/hive/tmp/root/4989df94-ba31-4ef6-ab78-369043e2067e/_tmp_space.db
    sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3be94b12
    
    scala> sqlContext.sql("use siat")
    17/12/05 10:39:36 INFO ParseDriver: Parsing command: use siat
    17/12/05 10:39:41 INFO ParseDriver: Parse Completed
    17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:44 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:45 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:45 INFO ParseDriver: Parsing command: use siat
    17/12/05 10:39:49 INFO ParseDriver: Parse Completed
    17/12/05 10:39:50 INFO PerfLogger: </PERFLOG method=parse start=1512441585044 end=1512441590042 duration=4998 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:50 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Semantic Analysis Completed
    17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441590188 end=1512441591560 duration=1372 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    17/12/05 10:39:51 INFO PerfLogger: </PERFLOG method=compile start=1512441584491 end=1512441591758 duration=7267 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Concurrency mode is disabled, not creating a lock manager
    17/12/05 10:39:51 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:51 INFO Driver: Starting command(queryId=root_20171205103945_2f994f07-9e52-456b-97ee-d03e722116ff): use siat
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441584488 end=1512441592212 duration=7724 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO Driver: Starting task [Stage-0:DDL] in serial mode
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=runTasks start=1512441592212 end=1512441592496 duration=284 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441591760 end=1512441592497 duration=737 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO Driver: OK
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592571 end=1512441592571 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441584478 end=1512441592571 duration=8093 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:39:52 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441592612 end=1512441592613 duration=1 from=org.apache.hadoop.hive.ql.Driver>
    res0: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> sqlContext.sql("drop table src")
    17/12/05 10:40:13 INFO ParseDriver: Parsing command: drop table src
    17/12/05 10:40:13 INFO ParseDriver: Parse Completed
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO ParseDriver: Parsing command: DROP TABLE src
    17/12/05 10:40:17 INFO ParseDriver: Parse Completed
    17/12/05 10:40:17 INFO PerfLogger: </PERFLOG method=parse start=1512441617979 end=1512441617998 duration=19 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:17 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Semantic Analysis Completed
    17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441617999 end=1512441619115 duration=1116 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=compile start=1512441617977 end=1512441619116 duration=1139 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Hive: Dumping metastore api call timing information for : compilation phase
    17/12/05 10:40:19 INFO Hive: Total time spent in this metastore function was greater than 1000ms : getTable_(String, String, )=3999
    17/12/05 10:40:19 INFO Driver: Concurrency mode is disabled, not creating a lock manager
    17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Starting command(queryId=root_20171205104017_dd3db388-5058-4af4-9076-90035b4837d9): DROP TABLE src
    17/12/05 10:40:19 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441617977 end=1512441619119 duration=1142 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:40:19 INFO Driver: Starting task [Stage-0:DDL] in serial mode
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=runTasks start=1512441619119 end=1512441664030 duration=44911 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO Hive: Dumping metastore api call timing information for : execution phase
    17/12/05 10:41:04 INFO Hive: Total time spent in this metastore function was greater than 1000ms : dropTable_(String, String, boolean, boolean, boolean, )=44266
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441619118 end=1512441664031 duration=44913 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO Driver: OK
    17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664032 end=1512441664032 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441617976 end=1512441664051 duration=46075 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:04 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441664054 end=1512441664054 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    res1: org.apache.spark.sql.DataFrame = []
    
    scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' ")  
    17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	'
    17/12/05 10:41:57 INFO ParseDriver: Parse Completed
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	'
    17/12/05 10:41:57 INFO ParseDriver: Parse Completed
    17/12/05 10:41:57 INFO PerfLogger: </PERFLOG method=parse start=1512441717568 end=1512441717619 duration=51 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:57 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO CalcitePlanner: Starting Semantic Analysis
    17/12/05 10:41:58 INFO CalcitePlanner: Creating table siat.src position=27
    17/12/05 10:41:58 INFO Driver: Semantic Analysis Completed
    17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1512441717619 end=1512441718637 duration=1018 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=compile start=1512441717565 end=1512441718637 duration=1072 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Concurrency mode is disabled, not creating a lock manager
    17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Starting command(queryId=root_20171205104157_e9b5ed54-e7dc-448a-984c-6d5cb37f964f): CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	'
    17/12/05 10:41:58 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1512441717565 end=1512441718735 duration=1170 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:41:58 INFO Driver: Starting task [Stage-0:DDL] in serial mode
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=runTasks start=1512441718735 end=1512441721846 duration=3111 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO Hive: Dumping metastore api call timing information for : execution phase
    17/12/05 10:42:01 INFO Hive: Total time spent in this metastore function was greater than 1000ms : createTable_(Table, )=2431
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.execute start=1512441718638 end=1512441721849 duration=3211 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO Driver: OK
    17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721852 end=1512441721882 duration=30 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=Driver.run start=1512441717564 end=1512441721883 duration=4319 from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
    17/12/05 10:42:01 INFO PerfLogger: </PERFLOG method=releaseLocks start=1512441721883 end=1512441721883 duration=0 from=org.apache.hadoop.hive.ql.Driver>
    res2: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> sqlContext.sql("select * from src").collect().foreach(println)
    17/12/05 10:42:54 INFO ParseDriver: Parsing command: select * from src
    17/12/05 10:42:54 INFO ParseDriver: Parse Completed
    17/12/05 10:42:56 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/12/05 10:42:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 467.6 KB, free 142.8 MB)
    17/12/05 10:43:02 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.8 MB)
    17/12/05 10:43:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:02 INFO SparkContext: Created broadcast 0 from collect at <console>:30
    17/12/05 10:43:04 INFO FileInputFormat: Total input paths to process : 0
    17/12/05 10:43:04 INFO SparkContext: Starting job: collect at <console>:30
    17/12/05 10:43:04 INFO DAGScheduler: Job 0 finished: collect at <console>:30, took 0.043396 s
    
    scala> val res=sqlContext.sql("select * from src").collect().foreach(println)
    17/12/05 10:43:25 INFO ParseDriver: Parsing command: select * from src
    17/12/05 10:43:25 INFO ParseDriver: Parse Completed
    17/12/05 10:43:26 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 467.6 KB, free 142.3 MB)
    17/12/05 10:43:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 40.5 KB, free 142.3 MB)
    17/12/05 10:43:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.66.66:36024 (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:27 INFO SparkContext: Created broadcast 1 from collect at <console>:29
    17/12/05 10:43:27 INFO FileInputFormat: Total input paths to process : 0
    17/12/05 10:43:27 INFO SparkContext: Starting job: collect at <console>:29
    17/12/05 10:43:27 INFO DAGScheduler: Job 1 finished: collect at <console>:29, took 0.000062 s
    
    scala> res
    
    scala> val res=sqlContext.sql("select count(*) from src").collect().foreach(println)
    17/12/05 10:43:47 INFO ParseDriver: Parsing command: select count(*) from src
    17/12/05 10:43:47 INFO ParseDriver: Parse Completed
    17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 467.0 KB, free 141.8 MB)
    17/12/05 10:43:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 40.4 KB, free 141.8 MB)
    17/12/05 10:43:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.66.66:36024 (size: 40.4 KB, free: 143.1 MB)
    17/12/05 10:43:48 INFO SparkContext: Created broadcast 2 from collect at <console>:29
    17/12/05 10:43:49 INFO FileInputFormat: Total input paths to process : 0
    17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:49 INFO SparkContext: Starting job: collect at <console>:29
    17/12/05 10:43:49 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.66.66:36024 in memory (size: 40.5 KB, free: 143.2 MB)
    17/12/05 10:43:49 INFO DAGScheduler: Registering RDD 15 (collect at <console>:29)
    17/12/05 10:43:49 INFO DAGScheduler: Got job 2 (collect at <console>:29) with 1 output partitions
    17/12/05 10:43:49 INFO DAGScheduler: Final stage: ResultStage 1 (collect at <console>:29)
    17/12/05 10:43:49 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
    17/12/05 10:43:49 INFO DAGScheduler: Missing parents: List()
    17/12/05 10:43:49 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29), which has no missing parents
    17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 12.0 KB, free 142.7 MB)
    17/12/05 10:43:49 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 6.0 KB, free 142.7 MB)
    17/12/05 10:43:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.66.66:36024 (size: 6.0 KB, free: 143.2 MB)
    17/12/05 10:43:49 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
    17/12/05 10:43:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[18] at collect at <console>:29)
    17/12/05 10:43:49 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
    17/12/05 10:44:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:44:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:44:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:50 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:45:57 INFO AppClient$ClientEndpoint: Executor added: app-20171205103712-0001/0 on worker-20171204180628-192.168.66.66-7078 (192.168.66.66:7078) with 2 cores
    17/12/05 10:45:57 INFO SparkDeploySchedulerBackend: Granted executor ID app-20171205103712-0001/0 on hostPort 192.168.66.66:7078 with 2 cores, 512.0 MB RAM
    17/12/05 10:45:59 INFO AppClient$ClientEndpoint: Executor updated: app-20171205103712-0001/0 is now RUNNING
    17/12/05 10:46:05 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:46:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:46:35 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    17/12/05 10:46:46 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (xinfang:10363) with ID 0
    17/12/05 10:46:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xinfang, partition 0,PROCESS_LOCAL, 1999 bytes)
    17/12/05 10:46:48 INFO BlockManagerMasterEndpoint: Registering block manager xinfang:34620 with 143.3 MB RAM, BlockManagerId(0, xinfang, 34620)
    17/12/05 10:46:51 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on xinfang:34620 (size: 6.0 KB, free: 143.2 MB)
    17/12/05 10:47:07 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to xinfang:10363
    17/12/05 10:47:08 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 82 bytes
    17/12/05 10:47:14 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 27243 ms on xinfang (1/1)
    17/12/05 10:47:14 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
    17/12/05 10:47:14 INFO DAGScheduler: ResultStage 1 (collect at <console>:29) finished in 204.228 s
    17/12/05 10:47:14 INFO DAGScheduler: Job 2 finished: collect at <console>:29, took 204.785107 s
    [0]
    
    scala> res
    
    scala> sc.stop()
    17/12/05 10:48:32 INFO SparkUI: Stopped Spark web UI at http://192.168.66.66:4041
    17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Shutting down all executors
    17/12/05 10:48:35 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
    17/12/05 10:48:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    17/12/05 10:48:36 INFO MemoryStore: MemoryStore cleared
    17/12/05 10:48:36 INFO BlockManager: BlockManager stopped
    17/12/05 10:48:36 INFO BlockManagerMaster: BlockManagerMaster stopped
    17/12/05 10:48:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    17/12/05 10:48:36 INFO SparkContext: Successfully stopped SparkContext
    
    scala> 17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    17/12/05 10:48:36 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    17/12/05 10:48:38 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
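
    Two closing notes. First, the repeated "Initial job has not accepted any resources" warnings above simply mean the standalone master had not yet granted an executor; the count job proceeded once executor app-20171205103712-0001/0 registered. Second, this transcript uses the Spark 1.x API; on Spark 2.x and later HiveContext is deprecated, and a rough sketch of the equivalent session (an assumption about your Spark version, not part of the original post) looks like:

    import org.apache.spark.sql.SparkSession

    // Spark 2.x+ sketch: SparkSession with Hive support replaces HiveContext.
    // Assumes hive-site.xml is on the classpath as in step 1.
    val spark = SparkSession.builder()
      .appName("SparkHive")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SELECT * FROM src").collect().foreach(println)
    spark.stop()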
    

      

  • Original post: https://www.cnblogs.com/xinfang520/p/7985939.html