zoukankan      html  css  js  c++  java
  • Greenplum 5.21.1 集群安装部署

    简单来说GPDB是一个分布式数据库软件,其可以管理和处理分布在多个不同主机上的海量数据。对于GPDB来说,一个DB实例实际上是由多个独立的PostgreSQL实例组成的,它们分布在不同的物理主机上,协同工作,呈现给用户的是一个DB的效果。Master是GPDB系统的访问入口,其负责处理客户端的连接及SQL 命令、协调系统中的其他Instance(Segment)工作,Segment负责管理和处理用户数据。

    环境准备:
    操作系统:CentOS Linux release 7.6.1810 (Core) 64位
    master 1台(架构图中的主节点),Standby 1台(架构图中的从节点),Segment 2台。共4台服务器。

    一、Master主机 Root 用户上操作

    1. 修改/etc/hosts文件,添加下面内容(注:4台服务器相同的配置)

    vim /etc/hosts

    192.168.18.130 gp-master
    192.168.18.131 gp-standby
    192.168.18.132 gp-node1
    192.168.18.133 gp-node2

    2. 服务器关闭selinux,防火墙4台服务器相互开放,测试环境可以直接先关闭防火墙。(注:4台服务器相同的配置)

    关闭Firewalld

    systemctl stop firewalld
    systemctl disable firewalld

    永久关闭Selinux

    vim /etc/selinux/conf

    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    #     enforcing - SELinux security policy is enforced.
    #     permissive - SELinux prints warnings instead of enforcing.
    #     disabled - No SELinux policy is loaded.
    SELINUX=disabled
    # SELINUXTYPE= can take one of three two values:
    #     targeted - Targeted processes are protected,
    #     minimum - Modification of targeted policy. Only selected processes are protected. 
    #     mls - Multi Level Security protection.
    SELINUXTYPE=targeted 

    注:查看Selinux运行状态:getenforce,CLI界面非永久生效设置SeLinux:setenforce 0(0-1对应关闭和开启)

    3. 操作系统参数设置

     vim /etc/sysctl.conf   (注:4台服务器相同的配置)

    kernel.shmmax = 500000000
    kernel.shmmni = 4096
    kernel.shmall = 4000000000
    kernel.sem = 250 512000 100 2048
    kernel.sysrq = 1
    kernel.core_uses_pid = 1
    kernel.msgmnb = 65536
    kernel.msgmax = 65536
    net.ipv4.tcp_syncookies = 1
    net.ipv4.ip_forward = 0
    net.ipv4.conf.default.accept_source_route = 0
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_max_syn_backlog = 4096
    net.ipv4.conf.all.arp_filter = 1
    net.ipv4.conf.default.arp_filter = 1
    net.core.netdev_max_backlog = 10000
    vm.overcommit_memory = 2
    kernel.msgmni = 2048
    net.ipv4.ip_local_port_range = 1025 65535

     vim /etc/security/limits.conf  (注:4台服务器相同的配置)

    * soft nofile 65536
    * hard nofile 65536
    * soft nproc 131072
    * hard nproc 131072

    磁盘预读参数及 deadline算法修改 (注:4台服务器相同的配置)

    blockdev --setra 65536 /dev/sda
    echo deadline > /sys/block/sda/queue/scheduler

    注:磁盘盘符sda需根据自己的实际情况进行配置

    软件下载地址:https://network.pivotal.io/products/pivotal-gpdb,下载:greenplum-db-5.21.1-rhel7-x86_64.rpm

    在Master主机上安装GP二进制文件,也就是主机名是mdw的服务器。(注:在master上安装即可,后面通过批量的方法安装剩下的服务器)

    rpm -ivh greenplum-db-5.21.1-rhel7-x86_64.rpm

    注:默认安装目录:/usr/local

     在Master上添加gpadmin用户

    adduser gpadmin
    echo gpadmin | passwd --stdin gpadmin

     注:设置密码为了后面gpssh-exkeys -f hostfile_allhosts 使用

     在Master上给gpadmin用户提权

    [root@gp-master ~]# visudo  
    gpadmin    ALL=(ALL)       ALL
    gpadmin    ALL=(ALL)       NOPASSWD:ALL

    在Master主机上赋予gpadmin用户Greenplum文件夹的的权限

    chown -R gpadmin.gpadmin /usr/local/greenplum-db*

     二、Master 主机 Gpadmin用户上操作

    准备用于批量安装软件以及后续集群的初始化文件,hostfile_allhosts,hostfile_segments,hostfile_mshosts,存放到/home/gpadmin

    su - gpadmin

    vim hostfile_allhosts

    gp-master
    gp-standby
    gp-node1
    gp-node2

    vim hostfile_segments

    gp-node1
    gp-node2

    vim hostfile_mshosts

    gp-master
    gp-standby

    设置各主机之间免密登录

    gpssh-exkeys -f hostfile_allhosts

    注:需输入gpadmin用户的密码,此处为:gpadmin

    设置用于安装Greenplum的文件夹权限

    gpssh -f hostfile_allhosts
    => sudo chown gpadmin.gpadmin /usr/local
    => exit

    创建及赋权master/standby主机元数据存储目录

    gpssh -f hostfile_mshosts
    =>sudo mkdir /data/greenplum_data/gpmaster
    =>sudo chown -R gpadmin.gpadmin /data
    =>exit

    创建及赋权Segments主机数据存储目录

    gpssh -f hostfile_segments
    =>sudo mkdir /data/greenplum_data/{primary,mirror}
    =>sudo chown -R gpadmin.gpadmin /data
    =>exit

    批量安装软件(GP)

    cd /home/gpadmin/
    source /usr/local/greenplum-db/greenplum_path.sh
    gpseginstall -f hostfile_allhosts -u gpadmin -p gpadmin

    设置NTP同步


     Yum下载安装NTP服务器,已安装的可以略过

    sudo yum install ntp -y

    若出现如下报错,可看下一步解决方法

    There was a problem importing one of the Python modules 
    required to run yum. The error leading to this problem was: 
      
      No module named yum 
      
    Please install a package which provides this module, or 
    verify that the module is installed correctly. 
      
    It's possible that the above module doesn't match the 
    current version of Python, which is: 
    2.7.13 (r266:84292, Jan 22 2014, 09:37:14)  
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] 
      
    If you cannot solve this problem yourself, please go to  
    the yum faq at: 
      http://yum.baseurl.org/wiki/Faq
    View Code

    解决方法:

    unset PYTHONHOME
    unset PYTHONPATH
    unset LD_LIBRARY_PATH

     再进行yum安装之后,再修改回来,使得GP能正常使用

    source /usr/local/greenplum-db/greenplum_path.sh

    注:报错原因:在安装GP集群之后,会在master节点中的环境变量中会增加 PYTHONHOME,PYTHONPATH,LD_LIBRARY_PATH几项,并且会修改原本的path。

    补充:LD_LIBRARY_PATH 该环境变量主要用于指定查找共享库(动态链接库)时除了默认路径之外的其他路径。


    在每个Segment主机,编辑/etc/ntp.conf文件。设置第一个server参数指向Master主机,第二个server参数指向Standby主机。如下面:

    sudo vim /etc/ntp.conf

    server gp-master prefer
    server gp-standby

    在Standby主机,编辑/etc/ntp.conf文件。设置第一个server参数指向Master主机,第二个参数指向数据中心的时间服务器。

    sudo vim /etc/ntp.conf

    server gp-master prefer

    在Master主机,使用NTP守护进程同步所有Segment主机的系统时钟。例如,使用gpssh来完成:

    gpssh -f hostfile_allhosts -v -e 'ntpd'

     输出如下代表成功:

    [root@gp-master gpadmin]# gpssh -f all_hosts -v -e 'ntpd'
    [WARN] Reference default values as $MASTER_DATA_DIRECTORY/gpssh.conf could not be found
    Using delaybeforesend 0.05 and prompt_validation_timeout 1.0
    
    [Reset ...]
    [INFO] login mdw
    [INFO] login smdw
    [INFO] login sdw1
    [INFO] login sdw2
    [ mdw] ntpd
    [smdw] ntpd
    [sdw1] ntpd
    [sdw2] ntpd
    [INFO] completed successfully
    
    [Cleanup...]

    配置Greenplum初始化文件

    cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config   /home/gpadmin/gpinitsystem_config
    chmod 775 gpinitsystem_config

     相关配置如下:

    [gpadmin@gp-master ~]$ cat gpinitsystem_config 
    # FILE NAME: gpinitsystem_config
    
    # Configuration file needed by the gpinitsystem
    
    ################################################
    #### REQUIRED PARAMETERS
    ################################################
    
    #### Name of this Greenplum system enclosed in quotes.
    ARRAY_NAME="Greenplum Data Platform"
    
    #### Naming convention for utility-generated data directories.
    SEG_PREFIX=gpseg
    
    #### Base number by which primary segment port numbers 
    #### are calculated.
    PORT_BASE=40000
    
    #### File system location(s) where primary segment data directories 
    #### will be created. The number of locations in the list dictate
    #### the number of primary segments that will get created per
    #### physical host (if multiple addresses for a host are listed in 
    #### the hostfile, the number of segments will be spread evenly across
    #### the specified interface addresses).
    declare -a DATA_DIRECTORY=(/data/greenplum_data/primary)
    
    #### OS-configured hostname or IP address of the master host.
    MASTER_HOSTNAME=k8s-master
    
    #### File system location where the master data directory 
    #### will be created.
    MASTER_DIRECTORY=/data/greenplum_data/gpmaster
    
    #### Port number for the master instance. 
    MASTER_PORT=5432
    
    #### Shell utility used to connect to remote hosts.
    TRUSTED_SHELL=ssh
    
    #### Maximum log file segments between automatic WAL checkpoints.
    CHECK_POINT_SEGMENTS=8
    
    #### Default server-side character set encoding.
    ENCODING=UTF-8
    
    ################################################
    #### OPTIONAL MIRROR PARAMETERS
    ################################################
    
    #### Base number by which mirror segment port numbers 
    #### are calculated.
    MIRROR_PORT_BASE=43000
    
    #### Base number by which primary file replication port 
    #### numbers are calculated.
    REPLICATION_PORT_BASE=34000
    
    #### Base number by which mirror file replication port 
    #### numbers are calculated. 
    MIRROR_REPLICATION_PORT_BASE=44000
    
    #### File system location(s) where mirror segment data directories 
    #### will be created. The number of mirror locations must equal the
    #### number of primary locations as specified in the 
    #### DATA_DIRECTORY parameter.
    declare -a MIRROR_DATA_DIRECTORY=(/data/greenplum_data/mirror)
    
    
    ################################################
    #### OTHER OPTIONAL PARAMETERS
    ################################################
    
    #### Create a database of this name after initialization.
    DATABASE_NAME=testDB
    
    #### Specify the location of the host address file here instead of
    #### with the the -h option of gpinitsystem.
    MACHINE_LIST_FILE=/home/gpadmin/hostfile_segments
    View Code

     运行初始化工具初始化数据库

    source /usr/local/greenplum-db/greenplum_path.sh 
    gpinitsystem -c gpinitsystem_config

    初始化日志:

    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
    20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------

    现在只有1个master,2个segment,没有standby,那么接下来把standby加入集群。

    在Master服务器上执行

    gpinitstandby -s gp-standby

    输出如下:

    [gpadmin@mdw ~]$ gpinitstandby -s smdw
    20160827:16:59:24:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking for filespace directory /data/master/gpseg-1 on smdw
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname               = mdw
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory         = /data/master/gpseg-1
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port                   = 5432
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname       = smdw
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port           = 5432
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /data/master/gpseg-1
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog         = On
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:- Filespace locations
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
    20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_system -> /data/master/gpseg-1
    Do you want to continue with standby master initialization? Yy|Nn (default=N):
    > y
    20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
    20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent.
    20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog...
    20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully.
    20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file...
    20160827:16:59:37:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
    20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating filespace flat files...
    20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Filespace flat file updated successfully.
    20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master
    20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw  in directory: /data/master/gpseg-1
    20160827:16:59:40:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
    20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
    20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on gp-standby
    View Code

    查看启动进程:

    [gpadmin@gp-master ~]$ ps -ef | grep postgres
    gpadmin  10975     1  0 00:57 ?        00:00:00 /usr/local/greenplum-db-5.21.1/bin/postgres -D /data/greenplum_data/gpmaster/gpseg-1 -p 5432 --gp_dbid=1 --gp_num_contents_in_cluster=2 --silent-mode=true -i -M master --gp_contentid=-1 -x 0 -E
    gpadmin  10976 10975  0 00:57 ?        00:00:00 postgres:  5432, master logger process   
    gpadmin  10979 10975  0 00:57 ?        00:00:00 postgres:  5432, stats collector process   
    gpadmin  10980 10975  0 00:57 ?        00:00:01 postgres:  5432, writer process   
    gpadmin  10981 10975  0 00:57 ?        00:00:00 postgres:  5432, checkpointer process   
    gpadmin  10982 10975  0 00:57 ?        00:00:00 postgres:  5432, seqserver process   
    gpadmin  10983 10975  0 00:57 ?        00:00:00 postgres:  5432, ftsprobe process   
    gpadmin  10984 10975  0 00:57 ?        00:00:00 postgres:  5432, sweeper process   
    gpadmin  10985 10975  0 00:57 ?        00:00:05 postgres:  5432, stats sender process   
    gpadmin  10986 10975  0 00:57 ?        00:00:01 postgres:  5432, wal writer process   
    gpadmin  11279 10975  0 00:59 ?        00:00:00 postgres:  5432, wal sender process gpadmin 192.168.18.131(53573) streaming 0/C05A028
    gpadmin  16800 16608  0 04:15 pts/0    00:00:00 grep --color=auto postgres

    设置gpadmin用户环境变量,Master,Standby都需设置。 

    vim /home/gpadmin/.bashrc

    [gpadmin@gp-master ~]$ cat .bashrc 
    # .bashrc
    
    # Source global definitions
    if [ -f /etc/bashrc ]; then
            . /etc/bashrc
    fi
    
    # Uncomment the following line if you don't like systemctl's auto-paging feature:
    # export SYSTEMD_PAGER=
    
    # User specific aliases and functions
    
    source /usr/local/greenplum-db/greenplum_path.sh 
    
    export MASTER_DATA_DIRECTORY=/data/greenplum_data/gpmaster/gpseg-1
    export PGPRORT=5432
    export PGDATABASE=testDB
    [gpadmin@gp-master ~]$ scp .bashrc gp-standby:`pwd`

    启动和停止数据库测试是否能正常启动和关闭,命令如下

    gpstart
    gpstop 

    到此 Greenplum 就部署完成了。下面进行一些简单的测试。

    登录数据库:psql -d postgres

    建表,插入,查询

    postgres=# create table student ( no int primary key,student_name varchar(40),age int);
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "student_pkey" for table "student"
    CREATE TABLE
    postgres=# insert into student values(1,'yayun',18);
    INSERT 0 1
    postgres=# select * from student;
     no | student_name | age 
    ----+--------------+-----
      1 | yayun        |  18
    (1 row)

     

  • 相关阅读:
    日志框架 log4j2 全解析
    SpringMVC开发RESTful接口
    SpringMVC 进阶
    SpringMVC 参数映射与文件上传
    SSM整合
    算法分析
    SpringMVC 入门
    数据结构与算法概念
    在普通WEB项目中使用Spring
    《算法导论》——重复元素的随机化快排Optimization For RandomizedQuickSort
  • 原文地址:https://www.cnblogs.com/sini157/p/11353866.html
Copyright © 2011-2022 走看看