  • The Avro serialization framework

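    1. Introduction to Avro

    • Avro started as a subproject of Hadoop and is now a top-level Apache project
    • Avro is high-performance middleware built on binary data transfer [the author notes that HBase and Hive also use it to move data between client and server]
    • Avro serializes data, which makes it suitable for exchanging large volumes of data both remotely and locally
    • Avro supports dynamic loading of a defined data structure (schema), which improves performance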

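    2. Avro features

    • Rich data structure types: 8 primitive types (null, boolean, int, long, float, double, bytes, string) and 6 complex types (record, enum, array, map, union, fixed)
    • A fast, compact binary format
    • A container file format for persisting data
    • A remote procedure call (RPC) framework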
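
    To make the "compact binary format" point concrete, here is a minimal sketch of the variable-length zig-zag encoding that the Avro specification describes for int values. It uses only the JDK and is not Avro's own code; the class and method names (ZigZagVarint, encodeInt, hex) are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;

public class ZigZagVarint {
    // Encode a 32-bit int as described in the Avro specification:
    // zig-zag maps the sign into the lowest bit, then the value is written
    // 7 bits per byte, least-significant group first, with the high bit set
    // on every byte except the last one.
    static byte[] encodeInt(int n) {
        int z = (n << 1) ^ (n >> 31); // 0, -1, 1, -2, ... becomes 0, 1, 2, 3, ...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80); // 7 payload bits plus continuation bit
            z >>>= 7;
        }
        out.write(z); // final byte, continuation bit clear
        return out.toByteArray();
    }

    // Helper for the demo only: render bytes as lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, -1, 1, 64, 6500}) {
            System.out.println(n + " -> " + hex(encodeInt(n)));
        }
    }
}
```

    Small values, positive or negative, take a single byte, and a value such as 6500 takes two bytes (c8 65) instead of the four bytes of a fixed-width int.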

    Getting-started demo

    1. Create a Maven project and add the dependencies to the POM

    <?xml version="1.0" encoding="UTF-8"?>
    
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.blb</groupId>
      <artifactId>Avro</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <name>Avro</name>
      <!-- FIXME change it to the project's website -->
      <url>http://www.example.com</url>
    
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
      </properties>
    
      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.11</version>
          <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
        <dependency>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro</artifactId>
          <version>1.8.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.avro/avro-tools -->
        <dependency>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro-tools</artifactId>
          <version>1.8.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro-maven-plugin</artifactId>
          <version>1.8.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro-compiler</artifactId>
          <version>1.8.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro-ipc</artifactId>
          <version>1.8.2</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.8.2</version>
            <executions>
              <execution>
                <phase>generate-sources</phase>
                <goals>
                  <goal>schema</goal>
                </goals>
                <configuration>
                  <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                  <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                </configuration>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>1.8</source>
              <target>1.8</target>
            </configuration>
          </plugin>
        </plugins>
        <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
          <plugins>
            <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
            <plugin>
              <artifactId>maven-clean-plugin</artifactId>
              <version>3.1.0</version>
            </plugin>
            <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
            <plugin>
              <artifactId>maven-resources-plugin</artifactId>
              <version>3.0.2</version>
            </plugin>
            <plugin>
              <artifactId>maven-compiler-plugin</artifactId>
              <version>3.8.0</version>
            </plugin>
            <plugin>
              <artifactId>maven-surefire-plugin</artifactId>
              <version>2.22.1</version>
            </plugin>
            <plugin>
              <artifactId>maven-jar-plugin</artifactId>
              <version>3.0.2</version>
            </plugin>
            <plugin>
              <artifactId>maven-install-plugin</artifactId>
              <version>2.5.2</version>
            </plugin>
            <plugin>
              <artifactId>maven-deploy-plugin</artifactId>
              <version>2.8.2</version>
            </plugin>
            <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
            <plugin>
              <artifactId>maven-site-plugin</artifactId>
              <version>3.7.1</version>
            </plugin>
            <plugin>
              <artifactId>maven-project-info-reports-plugin</artifactId>
              <version>3.0.0</version>
            </plugin>
          </plugins>
        </pluginManagement>
      </build>
    </project>

    2. Create the User.avsc file in the configured directory (src/main/avro/, the plugin's sourceDirectory in the POM above)

    {
      "namespace": "com.blb",
      "type": "record",
      "name": "User",
      "fields": [
        {"name": "name","type": "string"},
        {"name": "id","type": "int"},
        {"name": "salary","type": "int"},
        {"name": "age","type": "int"},
        {"name": "address","type": "string"}
      ]
    }
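
    As a rough illustration of what this schema means on the wire: per the Avro specification, a record is encoded as its field values in declaration order, with no field names or type tags; a string is a zig-zag varint length followed by UTF-8 bytes, and an int is a zig-zag varint. The sketch below hand-encodes one User with only the JDK. It is not the Avro library, and the class name UserRecordLayout, its methods, and the sample values are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UserRecordLayout {
    // Zig-zag varint; Avro encodes int and long with the same scheme.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a varint) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields are written in the order they appear in User.avsc:
    // name, id, salary, age, address.
    static byte[] encodeUser(String name, int id, int salary, int age, String address) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, id);
        writeLong(out, salary);
        writeLong(out, age);
        writeString(out, address);
        return out.toByteArray();
    }

    // Helper for the demo only: render bytes as lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for this illustration.
        byte[] encoded = encodeUser("Tom", 1, 6500, 25, "NY");
        System.out.println(encoded.length + " bytes: " + hex(encoded));
    }
}
```

    The sample record comes out as 11 bytes: 06 54 6f 6d (name), 02 (id), c8 65 (salary), 32 (age), 04 4e 59 (address). Because the bytes carry no field names, a reader needs the schema to interpret them, which is why the container file stores the schema alongside the data.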

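    3. Use avro-maven-plugin to generate the Java class from the avsc file (the plugin's schema goal is bound to the generate-sources phase in the POM above, so running mvn generate-sources triggers it)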

     

    4. Serialization example

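    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.specific.SpecificDatumWriter;
    import org.junit.Test;

    /**
     * Serialization test
     */
    @Test
    public void write(){
        // Initialize the User objects
        User user = new User("张三",1,6500,25,"云南");
        User user1 = new User("李四",2,7000,20,"湖北");
    
        DatumWriter<User> dw = new SpecificDatumWriter<>(User.class);
    
        // try-with-resources closes the writer even if create or append fails
        try (DataFileWriter<User> dfw = new DataFileWriter<>(dw)) {
            // create(schema, file) opens the underlying output file
            // schema - the schema of the class being serialized
            // file - the output file
            dfw.create(user.getSchema(),new File("G://hadoop//Avro//src//test//user.txt"));
            dfw.append(user);
            dfw.append(user1);
        } catch (IOException e) {
            // IOException covers more than a missing file, so report the actual cause
            System.out.println("Failed to write the Avro file: " + e.getMessage());
        }
    }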

    5. Deserialization test

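    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.io.DatumReader;
    import org.apache.avro.specific.SpecificDatumReader;
    import org.junit.Test;

    /**
     * Deserialization test
     */
    @Test
    public void read(){
        DatumReader<User> dr = new SpecificDatumReader<>(User.class);
        // try-with-resources closes the reader; the generic type replaces the raw DataFileReader
        try (DataFileReader<User> dfr = new DataFileReader<>(new File("G://hadoop//Avro//src//test//user.txt"), dr)) {
            // Iterate over the stored objects
            while(dfr.hasNext()){
                System.out.println(dfr.next());
            }
        } catch (IOException e) {
            // IOException covers more than a missing file, so report the actual cause
            System.out.println("Failed to read the Avro file: " + e.getMessage());
        }
    }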
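
    The file written above is an Avro object container file even though it is named user.txt. According to the Avro specification, such a file starts with the four bytes 'O', 'b', 'j', 1, followed by metadata that includes the schema. The JDK-only sketch below checks for that magic header. It is a quick sanity check rather than a validator, and the class and method names (AvroMagicCheck, hasAvroMagic, looksLikeAvroContainer) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class AvroMagicCheck {
    // Magic header of an Avro object container file: ASCII "Obj" followed by the byte 1.
    private static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', (byte) 1};

    // True if the given bytes start with the container-file magic.
    static boolean hasAvroMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    // Read the first four bytes of a file and compare them with the magic.
    static boolean looksLikeAvroContainer(Path file) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int off = 0;
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    return false; // file is shorter than the magic header
                }
                off += n;
            }
        }
        return hasAvroMagic(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java AvroMagicCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " looks like an Avro container: " + looksLikeAvroContainer(file));
    }
}
```

    Passing the path of the file produced by the serialization test as the only argument should report true, while an ordinary text file should report false.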

  • Original article: https://www.cnblogs.com/IT_CH/p/12690481.html