  • LeetCode 393. UTF-8 Validation

    Original problem: https://leetcode.com/problems/utf-8-validation/

    Problem:

    A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

    1. For a 1-byte character, the first bit is 0, followed by its Unicode code.
    2. For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, followed by n-1 bytes whose 2 most significant bits are 10.

    This is how the UTF-8 encoding would work:

       Char. number range  |        UTF-8 octet sequence
          (hexadecimal)    |              (binary)
       --------------------+---------------------------------------------
       0000 0000-0000 007F | 0xxxxxxx
       0000 0080-0000 07FF | 110xxxxx 10xxxxxx
       0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
       0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
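
Each row of the table can be recognized from a byte's leading bits alone. The following is a minimal sketch of that classification; the class name `Utf8ByteKind`, the method `classify`, and its return-value convention are assumptions made for illustration and do not come from the original post.

```java
// Hypothetical helper for illustration; not part of the original post.
class Utf8ByteKind {
    // Returns the total sequence length (1 to 4) when the byte is a leading byte,
    // 0 when it is a continuation byte (10xxxxxx),
    // and -1 when it matches no pattern in the table (11111xxx).
    static int classify(int value) {
        int b = value & 0xFF;                 // only the low 8 bits carry data
        if ((b & 0x80) == 0x00) return 1;     // 0xxxxxxx
        if ((b & 0xC0) == 0x80) return 0;     // 10xxxxxx
        if ((b & 0xE0) == 0xC0) return 2;     // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;     // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;     // 11110xxx
        return -1;                            // five or more leading ones
    }
}
```

Each test masks off one more bit than the pattern has leading ones, so the terminating 0 bit is checked together with the ones.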
    

    Given an array of integers representing the data, return whether it is a valid utf-8 encoding.

    Note:
    The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
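
To see which octet an input integer stands for, the value can be masked with 0xFF and printed as 8 binary digits. This is a small sketch for inspecting inputs; the class `OctetView` and the method `toOctet` are hypothetical names introduced here.

```java
// Hypothetical helper for illustration; not part of the original post.
class OctetView {
    // Keeps the least significant 8 bits and renders them as an 8-character binary string.
    static String toOctet(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        // Left-pad with zeros so that every octet is exactly 8 characters wide.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (int value : new int[]{197, 130, 1}) {
            System.out.println(toOctet(value));
        }
    }
}
```

Bits above the lowest 8 are discarded by the mask, so 453 (256 + 197) renders the same octet as 197.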

    Example 1:

    data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.
    
    Return true.
    It is a valid utf-8 encoding for a 2-byte character followed by a 1-byte character.

    Example 2:

    data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100.
    
    Return false.
    The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
    The next byte is a continuation byte that starts with 10, which is correct.
    But the second continuation byte does not start with 10, so it is invalid.

    Solution:

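    Keep a counter, preCount, of how many continuation bytes are still expected after the most recent leading byte.

    If preCount is 0, the current byte starts a new character, and there are 2 cases:

    First, the high bit is 0, so the current byte is a 1-byte character. Skip it.

    Second, the current byte is the leading byte of a multi-byte character. Count its leading 1 bits. A count of 1 (a stray continuation byte) or more than 4 is invalid; otherwise assign count - 1, the number of bytes that must follow, to preCount.

    If preCount is greater than 0, the current byte must be a continuation byte, so check that it starts with 10 and decrement preCount.

    After the last byte, preCount must be 0. Otherwise the final character is cut off.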
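
For the second case, the number of following bytes comes from the count of leading 1 bits in the byte. As an alternative to a bit-by-bit loop, that count can be sketched with `Integer.numberOfLeadingZeros` from the standard library. The class `LeadingOnes` and the method `leadingOnes` are hypothetical names, and this is a variation for illustration rather than the approach used in the post.

```java
// Hypothetical helper for illustration; not part of the original post.
class LeadingOnes {
    // Counts how many consecutive 1 bits the low byte of value starts with (0 to 8).
    static int leadingOnes(int value) {
        // Move the byte into the top 8 bits of the int, then invert:
        // leading ones of the byte become leading zeros of the inverted word.
        int shifted = (value & 0xFF) << 24;
        return Integer.numberOfLeadingZeros(~shifted);
    }
}
```

With this helper, a count of 0 means a 1-byte character, 2 to 4 means a leading byte with count - 1 bytes to follow, and 1 or anything above 4 is invalid at the start of a character.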

    Note: to test the leading bit, use (num & (1 << 7)) != 0 rather than == 1. When the bit is set, the masked value is 10000000 in binary (128), not 1.
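
The difference between the two comparisons is easy to check directly. In this short sketch the class `HighBitCheck` and both method names are made up for the demonstration.

```java
// Hypothetical demonstration; not part of the original post.
class HighBitCheck {
    // Correct: a masked value is non-zero exactly when the tested bit is set.
    static boolean highBitSet(int num) {
        return (num & (1 << 7)) != 0;
    }

    // Incorrect: num & 128 can only be 0 or 128, so comparing with 1 never succeeds.
    static boolean highBitSetBuggy(int num) {
        return (num & (1 << 7)) == 1;
    }

    public static void main(String[] args) {
        System.out.println(197 & (1 << 7));        // prints 128
        System.out.println(highBitSet(197));       // prints true
        System.out.println(highBitSetBuggy(197));  // prints false
    }
}
```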

    Time Complexity: O(n), where n = data.length.

    Space: O(1).

    Accepted Java solution:

    class Solution {
        public boolean validUtf8(int[] data) {
            if (data == null || data.length == 0) {
                return true;
            }

            int preCount = 0;      // continuation bytes still expected
            int mask1 = 1 << 7;    // 10000000
            int mask2 = 1 << 6;    // 01000000
            for (int num : data) {
                if (preCount == 0) {
                    // 1-byte character: the high bit is 0
                    if ((num & mask1) == 0) {
                        continue;
                    }

                    // Count the leading 1 bits of the leading byte
                    int count = 0;
                    int mask = 1 << 7;
                    while ((num & mask) != 0 && count <= 5) {
                        count++;
                        mask = mask >> 1;
                    }

                    // 1 means a stray continuation byte; more than 4 is too long
                    if (count == 1 || count > 4) {
                        return false;
                    }

                    preCount = count - 1;
                } else {
                    // A continuation byte must look like 10xxxxxx
                    if (!((num & mask1) != 0 && (num & mask2) == 0)) {
                        return false;
                    }

                    preCount--;
                }
            }

            // A multi-byte character must not be cut off at the end
            return preCount == 0;
        }
    }
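
To try the idea on the two examples from the problem statement, here is a compact, self-contained variant with a small driver. It is a sketch and not the accepted solution above: the class name `Utf8ValidationDemo` and the variable `pending` are assumptions, and fixed bit masks replace the counting loop.

```java
// Hypothetical self-contained variant for experimentation; not the original solution.
class Utf8ValidationDemo {
    static boolean validUtf8(int[] data) {
        int pending = 0;                      // continuation bytes still expected
        for (int value : data) {
            int b = value & 0xFF;             // only the low 8 bits carry data
            if (pending > 0) {
                if ((b & 0xC0) != 0x80) {     // must look like 10xxxxxx
                    return false;
                }
                pending--;
            } else if ((b & 0x80) == 0x00) {
                // 0xxxxxxx: a 1-byte character, nothing to track
            } else if ((b & 0xE0) == 0xC0) {
                pending = 1;                  // 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                pending = 2;                  // 1110xxxx
            } else if ((b & 0xF8) == 0xF0) {
                pending = 3;                  // 11110xxx
            } else {
                return false;                 // stray 10xxxxxx or 11111xxx
            }
        }
        return pending == 0;                  // reject a truncated final character
    }

    public static void main(String[] args) {
        System.out.println(validUtf8(new int[]{197, 130, 1}));  // prints true
        System.out.println(validUtf8(new int[]{235, 140, 4}));  // prints false
    }
}
```

Like the solution above, this checks only the bit patterns described in the problem statement.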
  • Original article: https://www.cnblogs.com/Dylan-Java-NYC/p/12154530.html