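For the past few days I have been setting up a disconnected (offline) OperatorHub in a customer environment. An OpenShift 4.3 cluster was already installed, so the goal was only to add the two modules under evaluation, Service Mesh and Serverless. I had installed similar components in an offline 4.2 environment on an earlier job, so I set off after only a little preparation. I still hit quite a few problems and pitfalls. Compared with 4.2, 4.3 is much better for disconnected installs, but it also hides some new traps of its own. This post is a rough record of them.
Thanks also to the more experienced folks who went before me; their pointers helped me think clearly in the middle of the chaos.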
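1. Build the catalog image
The network here is very slow, so I recommend mirroring the registry image into your local registry first and working from there: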
oc image mirror registry.redhat.io/openshift4/ose-operator-registry:v4.3 registry.example.com/openshift4/ose-operator-registry
Build the local catalog image:
oc adm catalog build --appregistry-org redhat-operators --from=registry.example.com/openshift4/ose-operator-registry:v4.3 --to=registry.example.com/olm/redhat-operators:v1 --insecure
Generate the manifests that list the images to mirror:
oc adm catalog mirror --manifests-only registry.example.com/olm/redhat-operators:v1 registry.example.com --insecure
The resulting directory structure is:
[root@registry test]# tree redhat-operators-manifests/
redhat-operators-manifests/
├── imageContentSourcePolicy.yaml
└── mapping.txt
Open mapping.txt and take a look:
registry.redhat.io/openshift-service-mesh/istio-rhel8-operator:1.0.5=registry.example.com/openshift-service-mesh/istio-rhel8-operator:1.0.5
registry.redhat.io/openshift-service-mesh/3scale-istio-adapter-rhel8@sha256:00fb544a95b16c652cc571396679c65d5889b2cfe6f1a0176f560a1678309a35=registry.example.com/openshift-service-mesh/3scale-istio-adapter-rhel8
registry.redhat.io/container-native-virtualization/kubevirt-kvm-info-nfd-plugin@sha256:bb120df34c6eef21431a074f11a1aab80e019621e86b3ffef4d10d24cb64d2df=registry.example.com/container-native-virtualization/kubevirt-kvm-info-nfd-plugin
It is essentially a list of every sha256-pinned image needed to install the operators, each paired with its location on the local registry server.
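The best approach would be to bring in all of the images with the command below, but since we only needed two modules we went the manual route (which is exactly what cost so much time and so many repeated image imports).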
oc apply -f ./redhat-operators-manifests
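The command above is the official approach. When I checked it in the afternoon, I found it needs a running cluster, so I wrote my own script for batch downloads: first cut the image list down by namespace, then mirror the remaining entries in bulk with the script:
[root@registry redhat-operators-manifests]# cat batchmirror.sh
#!/bin/bash
# Mirror every "source=target" pair in eventing.txt (a subset of mapping.txt)
i=0
while IFS= read -r line || [ -n "$line" ]
do
  i=$((i + 1))
  echo "$i"
  src=${line%%=*}   # text before the first '='
  dst=${line#*=}    # text after the first '='
  echo "$src"
  echo "$dst"
  # --all copies the whole manifest list, so the sha256 digest is preserved
  skopeo copy --all "docker://$src" "docker://$dst"
  sleep 20
done < eventing.txt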
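The script reads eventing.txt, which was produced by trimming mapping.txt by namespace; that trimming step is not shown. Below is a minimal bash sketch of one way to do it. `filter_mapping` is a hypothetical helper, and it assumes every line has the `source=target` form shown above, with the namespace being the first path segment after the registry host.

```bash
#!/bin/bash
# Hypothetical helper: print only the mapping.txt lines whose *source* image
# sits in one of the given namespaces (the first path segment after the
# registry host), e.g. "openshift-service-mesh".
# Usage: filter_mapping mapping.txt NAMESPACE [NAMESPACE ...] > subset.txt
filter_mapping() {
  local file=$1
  shift
  local line src repo ns
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue       # skip blank lines
    src=${line%%=*}                  # source part, before the first '='
    repo=${src#*/}                   # drop the registry host
    for ns in "$@"; do
      case "$repo" in
        "$ns"/*) printf '%s\n' "$line"; break ;;
      esac
    done
  done < "$file"
}
```

For example, `filter_mapping mapping.txt openshift-service-mesh distributed-tracing > servicemesh.txt` would give the batch script a much shorter input file. Matching on the first path segment, rather than grepping the whole line, avoids picking up entries whose image name merely contains the namespace string.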
2. Create the offline OperatorHub catalog
This step is fairly easy. First disable the default sources:
oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
Then create a file named catalogsource.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry.example.com/olm/redhat-operators:v1
  displayName: My Operator Catalog
  publisher: grpc
Create it and check the result; the OperatorHub page in the console should then list all of the Red Hat operators:
oc create -f catalogsource.yaml
oc get pods -n openshift-marketplace
oc get catalogsource -n openshift-marketplace
oc describe catalogsource my-operator-catalog -n openshift-marketplace
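3. Download the operator and component images per module
This is where the pitfalls really pile up. I first installed the Elasticsearch Operator and got an image pull error, then looked up the corresponding sha256 digest in mapping.txt, for example: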
registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:aa0c7b11a655454c5ac6cbc772bc16e51ca5004eedccf03c52971e8228832370
In 4.2, all you had to do was run:
oc image mirror registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f registry.example.com/openshift4/ose-elasticsearch-operator
Once that succeeds, pull the image locally to verify it:
podman pull registry.example.com/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f
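You will find that it simply will not pull. Reportedly this is because in 4.3 some images are referenced by a multi-level sha256 digest (the digest of a manifest list rather than of a single image manifest). The fix is: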
skopeo copy --all docker://registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f docker://registry.example.com/openshift4/ose-elasticsearch-operator
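Then pack the registry's storage directory into a tar archive and unpack it in the offline environment.
Because most operator images are referenced by sha256 digest, they have to be copied with skopeo one at a time. This is where most of the time went.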
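To estimate up front how much of mapping.txt needs this one-by-one skopeo treatment, a small counter can help. This is a bash sketch; `count_ref_kinds` is a hypothetical helper that only checks whether the source half of each line contains `@sha256:`.

```bash
#!/bin/bash
# Hypothetical helper: count mapping.txt entries whose source image is pinned
# by digest ("@sha256:", copied with skopeo copy --all) versus by tag.
count_ref_kinds() {
  local file=$1 line src digest=0 tagged=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue        # skip blank lines
    src=${line%%=*}                   # source part, before the first '='
    case "$src" in
      *@sha256:*) digest=$((digest + 1)) ;;
      *)          tagged=$((tagged + 1)) ;;
    esac
  done < "$file"
  printf 'digest=%d tag=%d\n' "$digest" "$tagged"
}
```

Run against the three sample lines of mapping.txt shown in section 1, it prints `digest=2 tag=1`.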
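4. The sample-registries.conf file
This file maps source locations to mirror locations, so that CRI-O on the OCP nodes knows where to fetch images that are referenced by their original source address.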
unqualified-search-registries = ["docker.io"]

[[registry]]
location = "quay.io/openshift-release-dev/ocp-release"
insecure = false
blocked = false
mirror-by-digest-only = false
prefix = ""

[[registry.mirror]]
location = "YOUR_REGISTRY_URL/ocp4/openshift4"
insecure = false

[[registry]]
location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
insecure = false
blocked = false
mirror-by-digest-only = false
prefix = ""

[[registry.mirror]]
location = "YOUR_REGISTRY_URL/ocp4/openshift4"
insecure = false

[[registry]]
location = "registry.redhat.io/distributed-tracing"
insecure = false
blocked = false
mirror-by-digest-only = false
prefix = ""

[[registry.mirror]]
location = "YOUR_REGISTRY_URL/distributed-tracing"
insecure = false

[[registry]]
location = "registry.redhat.io/openshift-service-mesh"
insecure = false
blocked = false
mirror-by-digest-only = false
prefix = ""

[[registry.mirror]]
location = "YOUR_REGISTRY_URL/openshift-service-mesh"
insecure = false

[[registry]]
location = "registry.redhat.io/openshift4"
insecure = false
blocked = false
mirror-by-digest-only = false
prefix = ""

[[registry.mirror]]
location = "YOUR_REGISTRY_URL/openshift4"
insecure = false
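Writing one of these blocks by hand for every namespace gets repetitive. Here is a bash sketch of a generator, assuming each mirrored namespace under registry.redhat.io keeps the same name on the local registry (as the distributed-tracing, openshift-service-mesh and openshift4 entries above do). The function names are hypothetical, and the two quay.io release entries map differently, so they would still be added by hand.

```bash
#!/bin/bash
# Hypothetical helper: print one [[registry]] entry that redirects pulls
# from SRC to MIRROR, using the same fields as sample-registries.conf.
registry_mirror_block() {
  local src=$1 mirror=$2
  cat <<EOF
[[registry]]
location = "$src"
insecure = false
blocked = false
mirror-by-digest-only = false
prefix = ""

[[registry.mirror]]
location = "$mirror"
insecure = false

EOF
}

# Hypothetical wrapper: one entry per registry.redhat.io namespace, all
# mirrored to the same namespace name on LOCAL_REGISTRY.
# Usage: build_registries_conf LOCAL_REGISTRY NAMESPACE [NAMESPACE ...]
build_registries_conf() {
  local local_registry=$1 ns
  shift
  printf 'unqualified-search-registries = ["docker.io"]\n\n'
  for ns in "$@"; do
    registry_mirror_block "registry.redhat.io/$ns" "$local_registry/$ns"
  done
}
```

For example, `build_registries_conf registry.example.com distributed-tracing openshift-service-mesh openshift4` would reproduce the last three entries of the file above with the placeholder filled in.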
This configuration has to be rolled out to every machine in the cluster, which is the job of the machine-config cluster operator. The normal procedure is to create a machineconfig.yaml with the base64-encoded file content and apply it:
# -w0 keeps the base64 output on a single line (plain base64 wraps at 76 columns)
base64 -w0 sample-registries.conf

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-worker-container-registry-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${YOUR_FILE_CONTENT_IN_BASE64}
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/containers/registries.conf

oc apply -f machineconfig.yaml
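The encoding step and the YAML can be combined so the base64 string never has to be pasted by hand. This is a bash sketch assuming GNU coreutils `base64` (for `-w0`) and the ignition 2.2.0 layout shown above; `render_registries_mc` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: embed CONF_FILE as /etc/containers/registries.conf in a
# MachineConfig for the given pool role (default: worker).
# Assumes GNU base64, where -w0 disables line wrapping.
render_registries_mc() {
  local conf_file=$1 role=${2:-worker}
  local b64
  b64=$(base64 -w0 < "$conf_file") || return 1
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: $role
  name: 50-$role-container-registry-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,$b64
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/containers/registries.conf
EOF
}
```

Usage would be `render_registries_mc sample-registries.conf worker > machineconfig.yaml` followed by `oc apply -f machineconfig.yaml`. The role label selects the machine config pool, so the master nodes would need a second MachineConfig rendered with `master`.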
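However, the machine-config cluster operator in this cluster was already reporting False, and my attempts to repair it failed. So I came up with a workaround: copy sample-registries.conf directly over /etc/containers/registries.conf on every machine. After overwriting it, remember to restart CRI-O: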
systemctl restart crio
If you want to be sure, run the following directly on a node; if the configuration is correct, the image should pull successfully:
podman pull registry.redhat.io/.....@sha256....
5. Knative
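With everything installed, I tried helloworld-go and hit an x509 error. After a long search it turned out to be a known issue. I had always tested on AWS (public cloud) before, so I never ran into it, but it reproduces every time once the sample image lives in a local registry.
See: https://github.com/knative/serving/issues/5126
The fix is blunt: skip tag resolution for that registry in the ConfigMap (the snippet below is for reference only; I made the change through the web console):
oc -n knative-serving edit configmap config-deployment

apiVersion: v1
data:
  queueSidecarImage: gcr.azk8s.cn/knative-releases/knative.dev/serving/cmd/queue@sha256:5ff357b66622c98f24c56bba0a866be5e097306b83c5e6c41c28b6e87ec64c7c
  registriesSkippingTagResolving: registry.example.com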
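The change above was made by hand in the console. A scripted alternative could look like the bash sketch below. `append_registry` and `patch_skip_tag_resolving` are hypothetical names; the sketch assumes `registriesSkippingTagResolving` holds a comma-separated list (which is how Knative documents it), so it appends to the existing value instead of replacing it.

```bash
#!/bin/bash
# Hypothetical helper: add REGISTRY to a comma-separated list unless it is
# already present, so existing entries survive.
append_registry() {
  local current=$1 registry=$2
  case ",$current," in
    *",$registry,"*) printf '%s\n' "$current" ;;    # already listed
    ",,")            printf '%s\n' "$registry" ;;   # list was empty
    *)               printf '%s,%s\n' "$current" "$registry" ;;
  esac
}

# Hypothetical wrapper: read the current value, then write the extended list
# back with a JSON merge patch. Needs oc and a logged-in cluster session.
patch_skip_tag_resolving() {
  local registry=$1 current new
  current=$(oc -n knative-serving get configmap config-deployment \
    -o jsonpath='{.data.registriesSkippingTagResolving}') || return 1
  new=$(append_registry "$current" "$registry")
  oc -n knative-serving patch configmap config-deployment --type merge \
    -p "{\"data\":{\"registriesSkippingTagResolving\":\"$new\"}}"
}
```

Calling `patch_skip_tag_resolving registry.example.com` twice is harmless, since the second call finds the registry already in the list and writes the same value back.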
Once that worked, I found that the way event sources are created had changed: CronJobSource is deprecated and can no longer be created, so I had to use the following commands instead:
$ oc get inmemorychannel
NAME         READY   REASON   URL                                                      AGE
imc-msgtxr   True             http://imc-msgtxr-kn-channel.kn-demo.svc.cluster.local   24s

$ kn source ping create msgtxr-pingsource --schedule="* * * * *" --data="This message is from PingSource" --sink=http://imc-msgtxr-kn-channel.kn-demo.svc.cluster.local
After that, everything finally worked, and I finally had a moment to catch my breath and write this down. :(