
Lars Lehmann

Repairing OpenShift after an interrupted OVN migration

Manual MachineConfig rollout

Category: OpenShift

There are currently two CNI solutions in OpenShift that come directly from Red Hat and are supported accordingly: OpenShift SDN, which has been around since OpenShift 3, and the newer OVN-Kubernetes. As of OpenShift 4.17, however, the old OpenShift SDN is no longer supported, which is why all clusters still running SDN must be migrated to OVN.

The migration

To migrate a cluster from SDN to OVN, the configuration for the OVN components must first be created on the nodes. OpenShift 4 has the machine-config-operator for this purpose, which controls the configuration of the nodes via MachineConfigPools (MCPs).

Once the nodes have been configured, normally only the CNI needs to be changed, which then causes a short downtime.
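The switch itself is just a change of the networkType in the cluster network configuration; in the offline migration procedure this is roughly the following patch (shown here only for context):

# tell the cluster network operator to use OVN-Kubernetes as the CNI
$ oc patch Network.config.openshift.io cluster --type='merge' \
    --patch '{"spec":{"networkType":"OVNKubernetes"}}'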

This is the theory in short; more information can be found in the official documentation.

Problem

During the migration of one of our test clusters (OpenShift 4.14), the cluster network was accidentally switched from SDN to OVN while the MachineConfigPool update was still running. As a result, all CNI pods were replaced even though most of the nodes had no OVN config yet, which in the end meant that OVN could not start and the cluster was left without a network.

Options

Since various files were missing on the nodes, which was quite obvious from the logs of the ovnkube-node pods (NS: openshift-ovn-kubernetes), it was clear that these files needed to be put onto the nodes. However, the official way via the MCPs was no longer possible because the machine-config-controller (NS: openshift-machine-config-operator), which controls the rollout, could no longer reach the API due to the missing network.
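The logs can be checked with something like the following (the pod name is only a placeholder; there is one ovnkube-node pod per node):

# list the OVN node pods and see which node each one runs on
$ oc -n openshift-ovn-kubernetes get pods -o wide
# inspect all containers of one of the failing pods
$ oc -n openshift-ovn-kubernetes logs ovnkube-node-xxxxx --all-containers=true | less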

Alternatively, it would be possible to copy the files from a working node or to use the files from the MachineConfig templates of the nodes.
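The file contents are part of the Ignition config embedded in the rendered MachineConfigs, so which paths a rendered config manages can at least be listed directly from the API, for example with:

# print every file path contained in the rendered MachineConfig
$ oc get mc rendered-app-c2144a20e3d0e37e96785165ba81f4e8 \
    -o jsonpath='{range .spec.config.storage.files[*]}{.path}{"\n"}{end}'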

But manually copying files around on RHEL CoreOS is a rather messy solution and would cause further problems after the repair, as the machine-config-daemon (NS: openshift-machine-config-operator) would complain about every touched file, which could admittedly be fixed easily with a force apply.
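For reference: as far as we know, such a force apply boils down to creating the machine-config-daemon's force file on the affected node, for example via a debug shell (treat the exact path as an assumption and double-check the MCO documentation):

# force the machine-config-daemon on the node to (re)apply the config despite modified files
$ oc debug node/qsu-app-01 -- chroot /host touch /run/machine-config-daemon-force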

Or we do the MachineConfig rollout ourselves, in other words we play the role of the machine-config-controller on our own.

Function of MCPs

If a config change is to be rolled out to nodes in OpenShift, e.g. a customized chrony config, a MachineConfig is created for one of the MachineConfigPools. The machine-config-controller then creates a new MachineConfig named rendered-<pool>-<id>, which simply merges the individual MachineConfigs for the pool.
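Which rendered config a pool is currently aiming for, and which one it considers applied, can be read from the MachineConfigPool itself, for example:

# rendered config the pool wants to roll out
$ oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'
# rendered config the pool currently has applied
$ oc get mcp worker -o jsonpath='{.status.configuration.name}{"\n"}'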

The machine-config-controller then selects one node per pool (e.g. master, infra, app/worker) to be updated and sets the annotation machineconfiguration.openshift.io/desiredConfig on the node object to the name of the previously generated rendered MachineConfig for the pool, e.g. rendered-app-c2144a20e3d0e37e96785165ba81f4e8. The machine-config-daemon, which runs on each node, watches the node object and is responsible for rolling out the config at node level. If machineconfiguration.openshift.io/desiredConfig is set to a value different from machineconfiguration.openshift.io/currentConfig, the machine-config-daemon starts the update process.
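The current and desired config of a node can be checked directly on the node object, e.g.:

# print currentConfig and desiredConfig of the node
$ oc get node qsu-app-01 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\n"}{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'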

First, the node has to be drained to move the workload to other nodes. To do this, the machine-config-daemon sets the machineconfiguration.openshift.io/desiredDrain annotation on the node object to the desired config with the drain prefix (e.g. drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8). When this annotation is set, the machine-config-controller normally starts to drain the node. Once the drain is completed, the controller also sets the machineconfiguration.openshift.io/lastAppliedDrain annotation to the MachineConfig with the drain prefix, so that the machine-config-daemon knows the node is drained and the rollout can continue with a reboot.

After the rollout and the reboot of the node, the machine-config-daemon sets the machineconfiguration.openshift.io/desiredDrain annotation on the node object to the rendered MachineConfig with the uncordon prefix (e.g. uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8), whereupon the controller uncordons the node and again confirms the action via the machineconfiguration.openshift.io/lastAppliedDrain annotation. Finally, the machineconfiguration.openshift.io/currentConfig annotation is updated to the new rendered MachineConfig, and the whole process is repeated for the next node in the MCP.
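While a node is being updated, the whole exchange of drain and uncordon annotations can be observed with something as simple as:

# show all machineconfiguration annotations of the node every 5 seconds
$ watch -n 5 'oc get node qsu-app-01 -o yaml | grep machineconfiguration.openshift.io/'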

Solution

To repair the network of the cluster we decided to take over the function of the machine-config-controller. This basically meant adjusting the individual annotations as described above in order to update the nodes to the correct configuration.

We performed the following steps:

Get the newest MachineConfigs

To make sure that we were rolling out the correct MachineConfigs, we simply checked which ones were the newest and whether their creation time matched the start of the change.

$ oc get mc --sort-by=.metadata.creationTimestamp | tail -n 4
rendered-infra-295e23d9e2d49bdd5944747144b7d400         5409abac417622d748dfe58f392e79f1b83bbbe4   3.4.0             10d
rendered-worker-c2144a20e3d0e37e96785165ba81f4e8        5409abac417622d748dfe58f392e79f1b83bbbe4   3.4.0             10d
rendered-master-07828908f6fbd3d47c74169bb02bcba7        5409abac417622d748dfe58f392e79f1b83bbbe4   3.4.0             10d
rendered-app-c2144a20e3d0e37e96785165ba81f4e8           5409abac417622d748dfe58f392e79f1b83bbbe4   3.4.0             10d

Start the update

To start the update of the node, we set the annotation machineconfiguration.openshift.io/desiredConfig on the first node (here qsu-app-01) to the name of the desired MachineConfig, rendered-app-c2144a20e3d0e37e96785165ba81f4e8.
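This can be done with oc edit node qsu-app-01 or in one step with oc annotate:

# point the node at the new rendered MachineConfig
$ oc annotate node qsu-app-01 --overwrite \
    machineconfiguration.openshift.io/desiredConfig=rendered-app-c2144a20e3d0e37e96785165ba81f4e8

Afterwards the relevant annotations on the node object looked like this: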

apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-app-412d73acd12c50adb849543f725dd419
    machineconfiguration.openshift.io/desiredConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-app-412d73acd12c50adb849543f725dd419
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-app-412d73acd12c50adb849543f725dd419
  name: qsu-app-01

After a few seconds, the machine-config-daemon changed the value of machineconfiguration.openshift.io/desiredDrain to drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8.

apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-app-412d73acd12c50adb849543f725dd419
    machineconfiguration.openshift.io/desiredConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredDrain: drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-app-412d73acd12c50adb849543f725dd419
  name: qsu-app-01

Drain the Node

Now that a drain is requested, the node needs to be drained, which can be done with oc adm drain qsu-app-01 --ignore-daemonsets --delete-emptydir-data. After the node was drained, we told the machine-config-daemon that the drain was finished by setting the annotation machineconfiguration.openshift.io/lastAppliedDrain to drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8.
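Again, setting the annotation boils down to a single oc annotate call:

# confirm the drain towards the machine-config-daemon
$ oc annotate node qsu-app-01 --overwrite \
    machineconfiguration.openshift.io/lastAppliedDrain=drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8

The node object then looked like this: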

apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-app-412d73acd12c50adb849543f725dd419
    machineconfiguration.openshift.io/desiredConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredDrain: drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/lastAppliedDrain: drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
  name: qsu-app-01

Changing this annotation also triggered a reboot of the node after some seconds.

Finish the update

After the reboot of the node, the machine-config-daemon updated the annotation machineconfiguration.openshift.io/desiredDrain to uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8, whereupon we uncordoned the node with oc adm uncordon qsu-app-01.

apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-app-412d73acd12c50adb849543f725dd419
    machineconfiguration.openshift.io/desiredConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/lastAppliedDrain: drain-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
  name: qsu-app-01

And then we also set the annotation machineconfiguration.openshift.io/lastAppliedDrain to uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8.
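This is the same pattern once more:

# confirm the uncordon towards the machine-config-daemon
$ oc annotate node qsu-app-01 --overwrite \
    machineconfiguration.openshift.io/lastAppliedDrain=uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8

The node object then looked like this: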

apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-app-412d73acd12c50adb849543f725dd419
    machineconfiguration.openshift.io/desiredConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
  name: qsu-app-01

Afterwards the machine-config-daemon marked the update as finished by updating machineconfiguration.openshift.io/currentConfig to rendered-app-c2144a20e3d0e37e96785165ba81f4e8.

apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredConfig: rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-app-c2144a20e3d0e37e96785165ba81f4e8
  name: qsu-app-01

Cleanup afterwards

After we had carried out the steps just described for all nodes, the OVN pods were able to find all the required configurations for the individual nodes and were finally able to start.

The problem now was that the existing pods could not communicate properly with each other, probably because the pod IPs were no longer clearly assigned to a single CNI or something similar. We did not investigate this too deeply and simply deleted all pods with oc delete pod -A --all.

After “restarting” the entire workload, most of the services started up again correctly, and the remaining problems had to be fixed individually.
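A few final sanity checks are worthwhile at this point, for example that all MachineConfigPools report the new config, that the cluster operators are healthy again and that the cluster network is really running on OVN:

# pools should be Updated and not Degraded
$ oc get mcp
# all cluster operators should be Available and not Degraded
$ oc get co
# the active network type should now be OVNKubernetes
$ oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}{"\n"}'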

Conclusion

Due to this small error during the SDN to OVN migration, we have learned how to manually trigger MachineConfig updates if the machine-config-controller is not “willing” or able to do so.