  • OpenShift 3.6 Installation

    Because a customer needed it, I had to give this a try. Unfortunately all I had on hand was the offline installation document for 3.7, and since the earlier 3.11 install had gone smoothly thanks to a colleague's excellent write-up, I jumped in without studying things carefully.

    Sure enough, the problems all traced back to a badly written /etc/ansible/hosts file. Here is a record of what went wrong.

    1. Problems Encountered

    • Images not ready

    The images were not ready: everything had been pulled, but because I had not read the document carefully I only ran docker save -o on the handful of images it listed, which caused the error below and forced me to start the downloads over (a sketch of the re-download follows the error output).

    One or more required container images are not available:
                       openshift3/registry-console:v3.6,
                       registry.example.com/openshift3/ose-deployer:v3.6.173.0.130,
                       registry.example.com/openshift3/ose-docker-registry:v3.6.173.0.130,
                       registry.example.com/openshift3/ose-haproxy-router:v3.6.173.0.130,
                       registry.example.com/openshift3/ose-pod:v3.6.173.0.130
                   Checked with: skopeo inspect [--tls-verify=false] [--creds=<user>:<pass>] docker://<registry>/<image>
                   Default registries searched: registry.example.com, registry.access.redhat.com
                   Failed connecting to: registry.example.com, registry.access.redhat.com
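
    The re-download was just a matter of pulling the images the check listed and saving each one. A minimal sketch of that, assuming it runs on an internet-connected host and the tarballs are then carried over to the offline registry host, retagged for registry.example.com and pushed:

    # run on an internet-connected host; tarballs end up in /root/images
    mkdir -p /root/images;
    for img in \
        openshift3/registry-console:v3.6 \
        openshift3/ose-deployer:v3.6.173.0.130 \
        openshift3/ose-docker-registry:v3.6.173.0.130 \
        openshift3/ose-haproxy-router:v3.6.173.0.130 \
        openshift3/ose-pod:v3.6.173.0.130; do
            docker pull registry.access.redhat.com/$img;
            # turn e.g. registry-console:v3.6 into registry-console_v3.6.tar.gz
            docker save registry.access.redhat.com/$img | gzip -c > /root/images/$(basename $img | tr ':' '_').tar.gz;
    done;
    # after transfer: docker load, retag to registry.example.com and docker push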
    • Registry port 443 not configured

    Copying the 3.11 install, I put the registry on port 80 and assumed I could get away with it, but it failed (a sketch of moving the registry to 443 follows the log):

    [root@master ~]# oc logs  registry-console-1-deploy -n default
    --> Scaling registry-console-1 to 1
    --> Waiting up to 10m0s for pods in rc registry-console-1 to become ready
    E1114 13:34:58.912499       1 reflector.go:304] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:509: Failed to watch *api.Pod: Get https://172.30.0.1:443/api/v1/namespaces/default/pods?labelSelector=deployment%3Dregistry-console-1%2Cdeploymentconfig%3Dregistry-console%2Cname%3Dregistry-console&resourceVersion=1981&timeoutSeconds=412&watch=true: dial tcp 172.30.0.1:443: getsockopt: connection refused
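
    Presumably the fix is to serve the offline registry on 443 instead. A sketch of that change, assuming the registry is the stock docker-distribution service with its default config at /etc/docker-distribution/registry/config.yml (the registry is declared insecure in the inventory, so no TLS is added here):

    # switch the listen address from the default :5000 to :443
    # (assumption: default config path of the docker-distribution package)
    sed -i 's/addr: :5000/addr: :443/' /etc/docker-distribution/registry/config.yml;
    systemctl restart docker-distribution;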
    • registry-console / service-catalog images need a retag

    Pulling the service-catalog image failed. This one was a big trap: every install attempt ran for more than an hour before surfacing an error like the following.

    15m        13m        4    kubelet, master.example.com    spec.containers{apiserver}    Normal        Pulling        pulling image "registry.access.redhat.com/openshift3/ose-service-catalog:v3.6"
      15m        13m        4    kubelet, master.example.com    spec.containers{apiserver}    Warning        Failed        Failed to pull image "registry.access.redhat.com/openshift3/ose-service-catalog:v3.6": rpc error: code = 2 desc = All endpoints blocked.
      15m        13m        6    kubelet, master.example.com    spec.containers{apiserver}    Normal        BackOff        Back-off pulling image "registry.access.redhat.com/openshift3/ose-service-catalog:v3.6"
      15m        4m        46    kubelet, master.example.com                    Warning        FailedSync    Error syncing pod
      

    The fix was as follows:

    docker pull registry.example.com/openshift3/registry-console:v3.6.173.0.130
    docker tag registry.example.com/openshift3/registry-console:v3.6.173.0.130 registry.example.com/openshift3/registry-console:v3.6
    
    docker push registry.example.com/openshift3/registry-console:v3.6
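
    The error log above is actually about the ose-service-catalog image being pulled with the bare :v3.6 tag, so presumably the same retag applies there as well (a sketch, assuming the fully versioned image is already in the offline registry):

    docker pull registry.example.com/openshift3/ose-service-catalog:v3.6.173.0.130
    docker tag registry.example.com/openshift3/ose-service-catalog:v3.6.173.0.130 registry.example.com/openshift3/ose-service-catalog:v3.6
    docker push registry.example.com/openshift3/ose-service-catalog:v3.6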
    • yum configured, but docker not found

    Installing docker on the master failed because the package could not be found, even though every host was pointed at the same yum repository. In the end I registered the host online with subscription-manager to get around it.
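
    For reference, the online registration went roughly like this (a sketch; the username, password and pool ID are placeholders, and the repo IDs match the ocp.repo file further below):

    subscription-manager register --username=<rh-user> --password=<rh-pass>
    subscription-manager attach --pool=<pool-id>
    subscription-manager repos --enable=rhel-7-server-rpms \
        --enable=rhel-7-server-extras-rpms \
        --enable=rhel-7-fast-datapath-rpms \
        --enable=rhel-7-server-ose-3.6-rpms
    yum install -y docker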

    • The apiserver pod started but could not be reached; the error was
    curl: (6) Could not resolve host: apiserver.kube-service-catalog.svc; Unknown error

             Fixed by changing /etc/resolv.conf on the node to:

    [root@node2 ~]# cat /etc/resolv.conf 
    # nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
    # Generated by NetworkManager
    search cluster.local example.com
    nameserver 192.168.0.105
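
    A quick way to confirm the node can now resolve service names (dig comes from bind-utils, which is installed in the steps further below; the address returned should fall inside the 172.30.0.0/16 portal net):

    dig +short apiserver.kube-service-catalog.svc.cluster.local @192.168.0.105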

    Unlike 3.11, 3.6 has no separate prerequisites check; the install just runs straight through, so you end up waiting to see whether an error appears, and every attempt takes a long time.

    The hosts file options are documented below; required reading if you want to avoid these pitfalls.

    https://docs.okd.io/3.6/install_config/install/advanced_install.html#enabling-service-catalog

    • Install finished, but metrics and other components were missing

    The tail of the install log:

    TASK [openshift_excluder : Enable openshift excluder] *******************************************************************************************************************
    changed: [node1.example.com]
    changed: [master.example.com]
    changed: [node2.example.com]
    
    PLAY RECAP **************************************************************************************************************************************************************
    localhost                  : ok=15   changed=0    unreachable=0    failed=0   
    master.example.com         : ok=740  changed=72   unreachable=0    failed=0   
    nfs.example.com            : ok=91   changed=3    unreachable=0    failed=0   
    node1.example.com          : ok=250  changed=18   unreachable=0    failed=0   
    node2.example.com          : ok=250  changed=18   unreachable=0    failed=0   

    A check showed only these few pods; none of the metrics components I had configured came up, so something had to be wrong in the hosts file.

    [root@master ~]# oc get pods --all-namespaces
    NAMESPACE              NAME                       READY     STATUS              RESTARTS   AGE
    default                docker-registry-1-x0hlq    1/1       Running             7          2d
    default                registry-console-2-p84p6   1/1       Running             2          1d
    default                router-10-ttqq9            0/1       MatchNodeSelector   0          1d
    default                router-12-rfpxc            1/1       Running             1          1d
    kube-service-catalog   apiserver-3ls5x            1/1       Running             1          1d
    kube-service-catalog   controller-manager-7zdbc   0/1       CrashLoopBackOff    1          1d
    [root@master ~]# oc get nodes
    NAME                 STATUS    AGE       VERSION
    master.example.com   Ready     2d        v1.6.1+5115d708d7
    node1.example.com    Ready     2d        v1.6.1+5115d708d7
    node2.example.com    Ready     2d        v1.6.1+5115d708d7
    • Uninstall playbook
    ansible-playbook  /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml;
    • DNS failed to start, so the atomic-openshift-node.service unit failed to start
    Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: I1117 18:55:51.787479   32772 mount_linux.go:203] Detected OS with systemd
    Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: I1117 18:55:51.787497   32772 docker.go:364] Connecting to docker on unix:///var/run/docker.sock
    Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: I1117 18:55:51.787510   32772 docker.go:384] Start docker client with request timeout=2m0s
    Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: W1117 18:55:51.789279   32772 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
    Nov 17 18:55:51 master.example.com atomic-openshift-node[32772]: F1117 18:55:51.798668   32772 start_node.go:140] could not start DNS, unable to read config file: open /etc/origin/node/resolv.conf: no such file or directory
    Nov 17 18:55:51 master.example.com systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
    Nov 17 18:55:51 master.example.com systemd[1]: Failed to start OpenShift Node.
    -- Subject: Unit atomic-openshift-node.service has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit atomic-openshift-node.service has failed.

    Solution: copy a resolv.conf into /etc/origin/node.

    [root@master ansible]# cd /etc/origin/node
    [root@master node]# ls
    ca.crt            node-dnsmasq.conf  server.key                          system:node:master.example.com.key
    node-config.yaml  server.crt         system:node:master.example.com.crt  system:node:master.example.com.kubeconfig
    [root@master node]# cp /etc/resolv.conf .
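
    After copying the file, restart the node service so it picks it up:

    systemctl restart atomic-openshift-node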
    • The router failed to start; the deployment to node2.example.com failed because it could not bind port 443
    [root@master node]# oc get pods -o wide 
    NAME              READY     STATUS             RESTARTS   AGE       IP              NODE
    router-1-deploy   0/1       Error              0          30m       10.129.0.14     node2.example.com
    router-2-55bpf    1/1       Running            0          5m        192.168.0.104   node1.example.com
    router-2-deploy   1/1       Running            0          5m        10.128.0.14     node1.example.com
    router-2-dw31q    1/1       Running            0          5m        192.168.0.103   master.example.com
    router-2-xn9cp    0/1       CrashLoopBackOff   6          5m        192.168.0.105   node2.example.com
    [root@master node]# oc logs router-2-xn9cp  
    I1117 12:19:27.665452       1 template.go:246] Starting template router (v3.6.173.0.130)
    I1117 12:19:27.679413       1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1936
    I1117 12:19:27.700732       1 router.go:240] Router is including routes in all namespaces
    E1117 12:19:27.777551       1 ratelimiter.go:52] error reloading router: exit status 1
    [ALERT] 320/121927 (45) : Starting frontend public_ssl: cannot bind socket [0.0.0.0:443]

    Analysis: the registry on node2 also binds port 443, which presumably conflicted, so I edited the Ansible inventory and removed the router label from node2 (a sketch of the equivalent change on a live cluster follows).
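
    On a live cluster the equivalent would be to drop the label and scale the router down by hand (a sketch; what actually fixed it here was the reinstall with the inventory below):

    # remove the router label so the selector region=infra,router=true no
    # longer matches node2, then scale the router to the two remaining nodes
    oc label node node2.example.com router-
    oc scale dc/router --replicas=2 -n default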

    To bring the monitoring components up I reworked the hosts file once more; the hosts file from the final successful install is below for reference:

    # Create an OSEv3 group that contains the masters and nodes groups
    [OSEv3:children]
    masters
    nodes
    etcd
    nfs
    
    [OSEv3:vars]
    ansible_ssh_user=root
    openshift_deployment_type=openshift-enterprise
    
    osm_cluster_network_cidr=10.128.0.0/14
    openshift_portal_net=172.30.0.0/16
    openshift_master_api_port=8443
    openshift_master_console_port=8443
    
    openshift_hosted_registry_storage_kind=nfs
    openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
    openshift_hosted_registry_storage_nfs_directory=/exports
    openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
    openshift_hosted_registry_storage_volume_name=registry
    openshift_hosted_registry_storage_volume_size=10Gi
    oreg_url=registry.example.com/openshift3/ose-${component}:${version}
    openshift_docker_additional_registries=registry.example.com
    openshift_docker_insecure_registries=registry.example.com
    openshift_docker_blocked_registries=registry.access.redhat.com,docker.io
    openshift_image_tag=v3.6.173.0.130
    
    openshift_enable_service_catalog=true
    openshift_service_catalog_image_prefix=registry.example.com/openshift3/ose-
    openshift_service_catalog_image_version=v3.6.173.0.130
    ansible_service_broker_image_prefix=registry.example.com/openshift3/ose-
    ansible_service_broker_etcd_image_prefix=registry.example.com/rhel7/
    template_service_broker_prefix=registry.example.com/openshift3/
    oreg_url=registry.example.com/openshift3/ose-${component}:${version}
    openshift_examples_modify_imagestreams=true
    openshift_clock_enabled=true
    
    openshift_metrics_storage_kind=nfs
    openshift_metrics_install_metrics=true
    openshift_metrics_storage_access_modes=['ReadWriteOnce']
    openshift_metrics_storage_host=nfs.example.com
    openshift_metrics_storage_nfs_directory=/exports
    openshift_metrics_storage_volume_name=metrics
    openshift_metrics_storage_volume_size=10Gi
    openshift_metrics_hawkular_hostname=hawkular-metrics.apps.example.com
    #openshift_metrics_cassandra_storage_type=emptydir
    openshift_metrics_image_prefix=registry.example.com/openshift3/
    openshift_hosted_metrics_deploy=true
    openshift_hosted_metrics_public_url=https://hawkular-metrics.apps.example.com/hawkular/metrics
    openshift_metrics_image_version=v3.6.173.0.130
    
    
    openshift_template_service_broker_namespaces=['openshift']
    template_service_broker_selector={"node": "true"}
    openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
    # Default login account: admin / handhand
    openshift_master_htpasswd_users={'admin': '$apr1$gfaL16Jf$c.5LAvg3xNDVQTkk6HpGB1'}
    
    
    #openshift_repos_enable_testing=true
    openshift_disable_check=docker_image_availability,disk_availability,memory_availability,docker_storage
    
    docker_selinux_enabled=false
    openshift_docker_options=" --selinux-enabled --insecure-registry 172.30.0.0/16 --log-driver json-file --log-opt max-size=50M --log-opt max-file=3 --insecure-registry registry.example.com --add-registry registry.example.com"
    osm_etcd_image=rhel7/etcd
    openshift_logging_image_prefix=registry.example.com/openshift3/
    
    openshift_hosted_router_selector='region=infra,router=true'
    openshift_master_default_subdomain=app.example.com
    
    
    # host group for masters
    [masters]
    master.example.com
    # host group for etcd
    [etcd]
    master.example.com
    
    # host group for nodes, includes region info
    [nodes]
    master.example.com openshift_node_labels="{'region': 'infra', 'router': 'true', 'zone': 'default'}" openshift_schedulable=true
    node1.example.com openshift_node_labels="{'region': 'infra', 'router': 'true', 'zone': 'default'}" openshift_schedulable=true
    node2.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true
    
    [nfs]
    nfs.example.com

    I reinstalled using this final hosts file, and this time everything finally came up:

    [root@master ~]# oc get pods --all-namespaces 
    NAMESPACE              NAME                         READY     STATUS    RESTARTS   AGE
    default                docker-registry-1-p8p0s      1/1       Running   2          2h
    default                registry-console-1-t4bw2     1/1       Running   0          1h
    default                router-1-1nnt3               1/1       Running   2          2h
    default                router-1-4h8tg               1/1       Running   3          2h
    kube-service-catalog   apiserver-z6nmz              1/1       Running   2          1h
    kube-service-catalog   controller-manager-d2jgc     1/1       Running   0          1h
    openshift-infra        hawkular-cassandra-1-m6r4x   1/1       Running   0          1h
    openshift-infra        hawkular-metrics-4j828       1/1       Running   1          1h
    openshift-infra        heapster-rgwrw               1/1       Running   6          2h

    Check the PVs and PVCs:

    [root@master ~]# oc get pv,pvc
    NAME                 CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                    STORAGECLASS   REASON    AGE
    pv/registry-volume   10Gi       RWX           Retain          Bound     default/registry-claim                            26m
    
    NAME                 STATUS    VOLUME            CAPACITY   ACCESSMODES   STORAGECLASS   AGE
    pvc/registry-claim   Bound     registry-volume   10Gi       RWX                          26m

    2. Bulk Image Save Script

    # save every local image as a separate gzipped tarball under /root/images
    mkdir -p /root/images;
    for i in $(docker images --format '{{.Repository}}:{{.Tag}}'); do
            # use the last path component (minus the tag) as the file name
            imagename=$(echo $i | awk -F '/' '{print $NF}' | awk -F ':' '{print $1}');
            echo $imagename;
            docker save $i | gzip -c > /root/images/$imagename.tar.gz;
    done;
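
    On the offline hosts the reverse direction is a matching loop (assuming the tarballs were copied into /root/images):

    for f in /root/images/*.tar.gz; do
            gunzip -c "$f" | docker load;
    done;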

    3. Putting Docker Images on a Separate Disk

    Add a new disk in VirtualBox, then run

    fdisk -l

    to find the corresponding device, e.g. /dev/sdb.

    Create a partition on it:

    echo "n
    p
    1
    
    
    w" | fdisk /dev/sdb;

    Create the volume group:

    pvcreate /dev/sdb1;
    vgcreate docker-vg /dev/sdb1;

    Point docker's storage at docker-vg:

    vgs;
     
    cat <<EOF > /etc/sysconfig/docker-storage-setup
    VG=docker-vg
    EOF
    
    docker-storage-setup
    
    lvextend -l 100%VG /dev/docker-vg/docker-pool
    touch /etc/containers/registries.conf
    systemctl start docker
    systemctl enable docker
    
    lvs  
    getenforce

    4. The ocp.repo File

    [root@master ~]# cat /etc/yum.repos.d/ocp.repo 
    [server]
    name=server
    baseurl=http://192.168.56.103:8080/repo/rhel-7-server-rpms/
    enabled=1
    gpgcheck=0
    [datapath]
    name=datapath
    baseurl=http://192.168.56.103:8080/repo/rhel-7-fast-datapath-rpms/
    enabled=1
    gpgcheck=0
    [extra]
    name=extra
    baseurl=http://192.168.56.103:8080/repo/rhel-7-server-extras-rpms/
    enabled=1
    gpgcheck=0
    [ose]
    name=ose
    baseurl=http://192.168.56.103:8080/repo/rhel-7-server-ose-3.6-rpms/
    enabled=1
    gpgcheck=0
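
    The post does not show how the repo server at 192.168.56.103:8080 was built. One hedged way to do it is to sync the four channels on a subscribed host and publish them over plain HTTP; the paths, tools and port below are assumptions chosen to match the baseurl entries above:

    yum install -y yum-utils createrepo;
    mkdir -p /srv/http/repo;
    for r in rhel-7-server-rpms rhel-7-fast-datapath-rpms \
             rhel-7-server-extras-rpms rhel-7-server-ose-3.6-rpms; do
            reposync -p /srv/http/repo --repoid=$r;
            createrepo /srv/http/repo/$r;
    done;
    # publish /srv/http on port 8080 with python's built-in HTTP server
    cd /srv/http && python -m SimpleHTTPServer 8080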

    5. Main Installation Steps

    systemctl stop firewalld
    systemctl disable firewalld
    systemctl mask firewalld
    setenforce 0;
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
    
    yum clean all
    yum repolist
    
    yum install -y docker
    
    
    yum -y install wget git net-tools bind-utils iptables-services bridge-utils bash-completion vim atomic-openshift-excluder atomic-openshift-docker-excluder lrzsz unzip atomic-openshift-utils;
    yum -y install python-setuptools
    
    yum -y update;
    
    
    ssh-keygen
    
    ssh-copy-id root@master.example.com
    ssh-copy-id root@node1.example.com
    ssh-copy-id root@node2.example.com
    
    echo "n
    p
    1
    
    
    w" | fdisk /dev/sdb;
    
    pvcreate /dev/sdb1;
    vgcreate docker-vg /dev/sdb1;
    
    vgs;
     
    cat <<EOF > /etc/sysconfig/docker-storage-setup
    VG=docker-vg
    EOF
    
    docker-storage-setup
    
    lvextend -l 100%VG /dev/docker-vg/docker-pool
    touch /etc/containers/registries.conf
    systemctl start docker
    systemctl enable docker
    
    lvs  
    getenforce
    
    
    yum -y install docker-distribution;
    systemctl enable docker-distribution;
    systemctl start docker-distribution;
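
    The steps above stop just short of the install itself; with /etc/ansible/hosts in place, the 3.6 advanced install is driven by the byo playbook:

    ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml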

    The service catalog shows up greyed out in the console; one look tells you it is a Technology Preview in this release.
