  • SQL Support and Workarounds

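    This article is taken from the official documentation. It describes workarounds for standard PostgreSQL SQL queries that Citus does not support, and is a very useful guide in practice.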

    As Citus provides distributed functionality by extending PostgreSQL, it is compatible with PostgreSQL constructs. This means that users can use the tools and features that come with the rich and extensible PostgreSQL ecosystem for distributed tables created with Citus.

    Citus supports all SQL queries on distributed tables, with only these exceptions:

    Furthermore, in multi-tenant applications, when a query is filtered on the table's distribution column down to a single tenant, all SQL features work, including the ones above.
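
    As a rough sketch of what this means in practice, assume a table github_events distributed by user_id (the column list and the tenant value 42 are illustrative assumptions). Because the WHERE clause pins the query to one tenant, a window function that partitions by a non-distribution column should be able to run:

    ```sql
    -- Assumes github_events is distributed by user_id; 42 is a made-up tenant id.
    -- The filter on the distribution column restricts the query to one tenant.
    SELECT repo_id,
           count(*) OVER (PARTITION BY repo_id) AS events_per_repo
      FROM github_events
     WHERE user_id = 42;
    ```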

    To learn more about PostgreSQL and its features, you can visit the PostgreSQL documentation.

    For a detailed reference of the PostgreSQL SQL command dialect (which can be used as is by Citus users), you can see the SQL Command Reference.

    Workarounds

    Before attempting workarounds consider whether Citus is appropriate for your situation. Citus’ current version works well for real-time analytics and multi-tenant use cases.

    Citus supports all SQL statements in the multi-tenant use case. Even in real-time analytics use cases, with queries that span nodes, Citus supports the majority of statements. The few types of unsupported queries are listed in “Are there any PostgreSQL features not supported by Citus?” Many of the unsupported features have workarounds; the most useful ones are described below.

    JOIN a local and a distributed table

    Attempting to execute a JOIN between a local table “local” and a distributed table “dist” causes an error:

    SELECT * FROM local JOIN dist USING (id);
    
    /*
    ERROR:  relation local is not distributed
    STATEMENT:  SELECT * FROM local JOIN dist USING (id);
    ERROR:  XX000: relation local is not distributed
    LOCATION:  DistributedTableCacheEntry, metadata_cache.c:711
    */
    

    Although you can’t join such tables directly, wrapping the local table in a subquery or CTE makes Citus’ recursive query planner copy the local table data to the worker nodes. Colocating the data in this way allows the query to proceed.

    -- either
    
    SELECT *
      FROM (SELECT * FROM local) AS x
      JOIN dist USING (id);
    
    -- or
    
    WITH x AS (SELECT * FROM local)
    SELECT * FROM x
    JOIN dist USING (id);
    

    Remember that the coordinator will send the results of the subquery or CTE to every worker that requires them for processing. It is therefore best either to add the most specific filters and limits possible to the inner query, or else to aggregate the table. Doing so reduces the network overhead that such a query can cause. More about this in Subquery/CTE Network Overhead.
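
    The two options can be sketched as follows. This is only an illustration: the created_at column on the local table is a hypothetical name, not something taken from the original example.

    ```sql
    -- Option 1: filter inside the subquery so fewer rows are sent to the workers.
    -- "created_at" is a hypothetical column of the local table.
    SELECT *
      FROM (SELECT id, created_at
              FROM local
             WHERE created_at >= now() - interval '1 day') AS x
      JOIN dist USING (id);

    -- Option 2: aggregate the local table first, so one row per id is sent.
    SELECT *
      FROM (SELECT id, count(*) AS local_rows
              FROM local
             GROUP BY id) AS x
      JOIN dist USING (id);
    ```

    In both cases only the inner query's result is copied to the workers, so narrowing it is what keeps the transfer small.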

    INSERT…SELECT upserts lacking distribution column

    Citus supports INSERT…SELECT…ON CONFLICT statements between co-located tables when the distribution column is among the columns selected and inserted. In addition, any aggregates in the statement must include the distribution column in the GROUP BY clause. Failing to meet these conditions raises an error:

    ERROR: ON CONFLICT is not supported in INSERT ... SELECT via coordinator
    

    If the upsert is an important operation in your application, the ideal solution is to model the data so that the source and destination tables are co-located, and so that the distribution column can be part of the GROUP BY clause in the upsert statement (if aggregating). However, if this is not feasible, the workaround is to materialize the select query in a temporary distributed table and upsert from there.

    -- workaround for
    -- INSERT INTO dest_table <query> ON CONFLICT <upsert clause>
    
    BEGIN;
    CREATE UNLOGGED TABLE temp_table (LIKE dest_table);
    SELECT create_distributed_table('temp_table', 'tenant_id');
    INSERT INTO temp_table <query>;
    INSERT INTO dest_table SELECT * FROM temp_table <upsert clause>;
    DROP TABLE temp_table;
    END;
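
    To make the template above more concrete, here is one way the placeholders might be filled in. Everything specific in it is an assumption for illustration: a source table page_views that is not co-located with dest_table, and a dest_table with columns (tenant_id, view_date, views), distributed by tenant_id, with a unique constraint on (tenant_id, view_date).

    ```sql
    -- Hypothetical schema: dest_table(tenant_id, view_date, views) with a unique
    -- constraint on (tenant_id, view_date); page_views is not co-located with it.
    BEGIN;
    CREATE UNLOGGED TABLE temp_table (LIKE dest_table);
    SELECT create_distributed_table('temp_table', 'tenant_id');

    -- Step 1: materialize the query result in the temporary distributed table.
    INSERT INTO temp_table
      SELECT tenant_id, view_date, count(*) AS views
        FROM page_views
       GROUP BY tenant_id, view_date;

    -- Step 2: upsert from the temporary table into the destination table.
    INSERT INTO dest_table
      SELECT * FROM temp_table
      ON CONFLICT (tenant_id, view_date)
      DO UPDATE SET views = EXCLUDED.views;

    DROP TABLE temp_table;
    END;
    ```

    The second INSERT relies on temp_table and dest_table being co-located, which is why the temporary table is distributed by the same column as the destination.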
    

    Temp Tables: the Workaround of Last Resort

    There are still a few queries that are unsupported even with the use of push-pull execution via subqueries. One of them is running window functions that partition by a non-distribution column.

    Suppose we have a table called github_events, distributed by the column user_id. Then the following window function will not work:

    -- this won't work
    
    SELECT repo_id, org->'id' as org_id, count(*)
      OVER (PARTITION BY repo_id) -- repo_id is not distribution column
      FROM github_events
     WHERE repo_id IN (8514, 15435, 19438, 21692);
    

    There is another trick though. We can pull the relevant information to the coordinator as a temporary table:

    -- grab the data, minus the aggregate, into a local table
    
    CREATE TEMP TABLE results AS (
      SELECT repo_id, org->'id' as org_id
        FROM github_events
       WHERE repo_id IN (8514, 15435, 19438, 21692)
    );
    
    -- now run the aggregate locally
    
    SELECT repo_id, org_id, count(*)
      OVER (PARTITION BY repo_id)
      FROM results;
    

    Creating a temporary table on the coordinator is a last resort. It is limited by the disk size and CPU of the node.

  • Original post: https://www.cnblogs.com/rongfengliang/p/9875800.html