  • Spark DataFrame type conversion

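    Suppose you read a table and want to apply a binarization feature transform to it. Binarization, however, requires the input column to be of type double. So how do you convert the type?

    You can do the conversion directly with a Spark Column: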

     

    DataFrame dataset = hive.sql("select age,sex,race from hive_race_sex_bucktizer ");

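    /*
     * Type conversion: cast the "age" column to double and pass "sex" and "race" through unchanged.
     * DoubleType is presumably a static import from org.apache.spark.sql.types.DataTypes.
     */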

    dataset = dataset.select(dataset.col("age").cast(DoubleType).as("age"),dataset.col("sex"),dataset.col("race"));

     

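    Simple, isn't it? Compare that with how I used to convert types: iterate over the rows to build a second RDD that meets the type requirements, then create a DataFrame from that RDD. So complicated!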

     

    		// Old approach: map each Row by hand into a new Row whose fields have the required types
    		JavaRDD<Row> parseDataset =   dataset.toJavaRDD().map(new Function<Row,Row>() {
    
    			@Override
    			public Row call(Row row) throws Exception {
    				System.out.println(row);
    				long age = row.getLong(row.fieldIndex("age"));
    				String sex = row.getAs("sex");
    				String race =row.getAs("race");
    				double raceV  = -1;
    				if("white".equalsIgnoreCase(race)){
    					raceV = 1;
    				} else if("black".equalsIgnoreCase(race)) {
    					raceV = 2;
    				} else if("yellow".equalsIgnoreCase(race)) {
    					raceV = 3;
    				} else if("Asian-Pac-Islander".equalsIgnoreCase(race)) {
    					raceV = 4;
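    				}else if("Amer-Indian-Eskimo".equalsIgnoreCase(race)) {
    					// NOTE: this reuses code 3, already assigned to "yellow"; use a distinct value if the two categories must stay separate
    					raceV = 3;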
    				}else {
    					raceV = 0;
    				}
    				
    				return RowFactory.create(age,("male".equalsIgnoreCase(sex)?1:0),raceV);
    			}
    		});
    		
    		StructType schema = new StructType(new StructField[]{
    				 createStructField("_age", LongType, false),
    				  createStructField("_sex", IntegerType, false),
    				  createStructField("_race", DoubleType, false)
    				});
    		
    		DataFrame  df  =  hive.createDataFrame(parseDataset, schema);
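
The long if/else chain in the map function above could also be written as a lookup table, which keeps the label-to-code mapping in one place. The following is only a minimal sketch in plain Java with no Spark dependency. The class name `RaceEncoder` and its methods are hypothetical and not part of the original code. The codes are copied from the snippet above, including the shared value 3 for "yellow" and "Amer-Indian-Eskimo".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps the labels from the example above to numeric codes. */
final class RaceEncoder {

    // Label-to-code table, keyed by lower-cased label.
    private static final Map<String, Double> CODES = new HashMap<>();
    static {
        CODES.put("white", 1.0);
        CODES.put("black", 2.0);
        CODES.put("yellow", 3.0);
        CODES.put("asian-pac-islander", 4.0);
        CODES.put("amer-indian-eskimo", 3.0); // same code as "yellow", as in the original snippet
    }

    /** Returns the code for a race label, or 0.0 for null or unknown labels. */
    static double encodeRace(String race) {
        if (race == null) {
            return 0.0;
        }
        Double code = CODES.get(race.toLowerCase(Locale.ROOT));
        return code == null ? 0.0 : code;
    }

    /** Returns 1 for "male" (case-insensitive), 0 for anything else. */
    static int encodeSex(String sex) {
        return "male".equalsIgnoreCase(sex) ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(encodeRace("White"));
        System.out.println(encodeSex("Male"));
    }
}
```

Inside the map function, the row could then be built with something like `RowFactory.create(age, RaceEncoder.encodeSex(sex), RaceEncoder.encodeRace(race))`, assuming the helper class is available on the executors.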
    

      Keep exploring, keep experimenting!

     

  • Original article: https://www.cnblogs.com/likehua/p/6203520.html